INL-ROM custom MMC3 hybrid mapper design

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
INL-ROM custom MMC3 hybrid mapper design
by on (#98626)
So I finished the mess of wires that is my current prototype of INL-ROM:
Image

I had high hopes of ordering the boards Monday, but as luck would have it my Xilinx JTAG programmer crapped out. I wasted all night yesterday trying to get it running and took a stab at making my own, but no luck... To make my life easier for when a new programmer shows up in a couple days, I created testbenches and fully verified my code running on the CPLDs. No issues there, so I'm pretty confident with the whole design now. Assuming all is well with this new programmer, I'll be sending off the files this week after checking a couple things I can only do with hardware.

tepples wrote:
Could you consider a 512/32 version for people who don't need CHR ROM support and think they can make the next Mega Man 4/6?

I think I can do a little better, what about using the last 4KB to facilitate 4 screen mirroring as well?

Dropping CHR down to 32KB (presumably VRAM) gives a LOT of breathing room for the available logic. I know the idea of making new mappers raises a lot of brows around here, but I can't help but spill the beans on what is coming down the pipe. Getting the thing released as STOCK FULL MMC3 (and others) is the priority here, to support repros/hacks and the like. All of this nonsense to follow won't be ready until I've had time to play around with it, write a test ROM etc. Basically I'm not looking to change the MMC3 much for this 'homebrew/hybrid' version, just attempt to make full use of what hardware is already on the board. To keep things simple I don't think I'll allow much configurability to the user for this alternative mapper choice. Some things will be limited to make room for improved functionality in other areas.

Firstly I was reminded the other day of how something like this would be great with chykn's ENIO that I'll have in my hands shortly. The simple thing is to support the "direct addressing" mode of his device by porting PRG R/W and /CE through the EXP pins. Beyond that, I'm looking to make some slight modifications to the MMC3 allowing for Flash PRG-ROM programming via his IO. Not sure how I'll do the boot-ROM yet, I've thought about reserving a ~64KB sector of the 512KB flash for the bootrom. Somewhat dangerous, but I can build some safeguards into the mapper blocking unintentional writes to the bootrom sector.

Aside from compiling up the idea of using the 32KB for name, attribute, and pattern tables, I also compiled a way to utilize the 32KB of WRAM. Previously I was thinking of adding some register below the WRAM. I found an even better solution though: since it's only 2 bits, it'd actually fit in the first 'control' register's unused bits, just directly mapping 2 of those unused bits (D5-D3) to WRAM A14 & A13. The other idea would be to use the $A000/$A001 registers instead. I've still got around 20 macrocells to work with at this point.
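
As a rough illustration of the WRAM banking idea above, the sketch below models two spare bits of the $8000 control register driving WRAM A14 and A13. The post does not say which two of D5-D3 would be used, so picking D4 and D3 is purely an assumption, and the function and constant names are made up for the example.

```python
# Hypothetical model: two unused bits of the $8000 "control" register
# select one of four 8KB pages of a 32KB WRAM.
# ASSUMPTION: D4 -> WRAM A14 and D3 -> WRAM A13 (the post does not say which bits).

WRAM_WINDOW_START = 0x6000  # standard MMC3 PRG-RAM window $6000-$7FFF
WRAM_WINDOW_END = 0x7FFF


def wram_address(control_reg: int, cpu_addr: int) -> int:
    """Return the 15-bit address presented to a 32KB WRAM chip."""
    if not WRAM_WINDOW_START <= cpu_addr <= WRAM_WINDOW_END:
        raise ValueError("CPU address is outside the WRAM window")
    bank = (control_reg >> 3) & 0b11   # D4:D3 -> WRAM A14:A13
    offset = cpu_addr & 0x1FFF         # CPU A12-A0 pass straight through
    return (bank << 13) | offset


if __name__ == "__main__":
    for bank_bits in range(4):
        reg = bank_bits << 3
        print(f"control=${reg:02X} -> WRAM ${wram_address(reg, 0x6000):04X}")
```

Because the bank bits share a register with the bank-select index and mode bits, software would have to keep them set consistently on every write to that register.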

I've got a few other ideas kicking around, but not enough data at this point to be sure that I can provide them yet.
Re: MMC3 (or similar) reproduction circuit boards. INL-ROM
by on (#98627)
Impressive prototype. ;) Sorry to hear about the dead programmer. I hope you get it all sorted out and we see new MMC3 boards soon.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98654)
Quote:
Aside from compiling up the idea of using the 32KB for name, attribute, and pattern tables
Anything specific fleshed out here yet? (I kinda like the MMC5's mechanism for specifying extended banking)

infiniteneslives wrote:
Previously I was thinking of adding some register below the WRAM. I found an even better solution though, since it's only 2 bits, it'd actually fit in the first 'control' register's unused bits. Just directly mapping 2 of those bits (D5-D3) to WRAM A14 & 13.
A yearish ago, Tepples was asking about the possibility of banking WRAM with MMC3 (discussion on the wiki); one thought that would be trivial but perhaps awkward would be enabling a 4kB RAM window in the $5000-$5FFF page. This would then track the bank setting for the $C000-$DFFF bank in the MMC3. Although not directly supported by the MMC3, it's a hardware modification that's just a simple variant on the standard "add prg ram" circuit— half of a 74'20 (/(M2·/ROMSEL·A14·A12)) to get a 4+4F RAM banking.

Quote:
The other idea would be to use the $A000/A001 registers instead.
See also the MMC6: It's not unprecedented to reinterpret the bits of the PRG-RAM protect register.
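
A minimal sketch of the gate expression quoted above, /(M2·/ROMSEL·A14·A12), to see which CPU pages it enables. It assumes the usual console behaviour where /ROMSEL goes low only when M2 and A15 are both high; the function names are hypothetical.

```python
def romsel_n(m2: int, cpu_addr: int) -> int:
    """/ROMSEL as seen by the cartridge: low only when M2 and A15 are both high."""
    a15 = (cpu_addr >> 15) & 1
    return 0 if (m2 and a15) else 1


def ram_ce_n(m2: int, cpu_addr: int) -> int:
    """Active-low RAM enable from one 4-input NAND: /(M2 . /ROMSEL . A14 . A12)."""
    a14 = (cpu_addr >> 14) & 1
    a12 = (cpu_addr >> 12) & 1
    return 0 if (m2 and romsel_n(m2, cpu_addr) and a14 and a12) else 1


def selected_pages() -> list:
    """List the 4KB CPU pages for which the RAM is enabled while M2 is high."""
    return [page for page in range(0x0000, 0x10000, 0x1000)
            if ram_ce_n(1, page) == 0]


if __name__ == "__main__":
    print([f"${page:04X}" for page in selected_pages()])
```

With the expression exactly as written, A13 is not part of the gate, so the enable fires for both $5000-$5FFF and $7000-$7FFF. That may be what "4+4F" alludes to, but that reading is a guess.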
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98657)
lidnariq wrote:
Quote:
Aside from compiling up the idea of using the 32KB for name, attribute, and pattern tables
Anything specific fleshed out here yet? (I kinda like the MMC5's mechanism for specifying extended banking)


So the MMC5's setup of mirroring is still limited to 2 NTs plus EXRAM for a 3rd choice, and fill mode for a fourth. In contrast my idea was just straight up 4screen mirroring. I would completely disable the 2KB VRAM on board, and make a simple change to the CHR bank select circuitry that would enable the last 4KB of CHR RAM when CHR A13 is high. So from a programmer's perspective you'd just have 4 sets of NT and AT from PPU $2000-3FFF. So really there wouldn't be any mechanism to control from the programmer's view. The only thing you'd want to keep in mind though is that you wouldn't want to set any of the CHR bank select registers to point to the last 4 banks (4KB). You actually could if you really wanted to, but you'd end up rendering the NT/AT as pattern tables (trash/static basically). My setup is really just double using 32KB of SRAM for name, attribute, and pattern tables to give you true four screen mirroring without any additional cost. The logic required to do so is about the same as to implement normal MMC3 mirroring, so it effectively comes at no logic cost either.
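
The four-screen idea above can be sketched as a simple address translation. This is only a model under stated assumptions: the pattern-table side is simplified to eight already-resolved 1KB bank numbers (the real MMC3 uses a mix of 2KB and 1KB registers), and all names are hypothetical.

```python
from typing import Sequence

CHR_RAM_SIZE = 32 * 1024
NT_BASE = CHR_RAM_SIZE - 4 * 1024   # $7000: last 4KB holds the four nametables
FIRST_NT_BANK = NT_BASE >> 10       # 1KB bank 28; banks 28-31 alias the nametables


def chr_ram_address(ppu_addr: int, banks_1k: Sequence[int]) -> int:
    """Translate a PPU address into a 32KB CHR RAM address."""
    ppu_addr &= 0x3FFF
    if ppu_addr & 0x2000:                      # CHR A13 high: nametable space
        return NT_BASE | (ppu_addr & 0x0FFF)   # $3000-$3FFF lands on the same 4KB
    slot = ppu_addr >> 10                      # which 1KB slot of $0000-$1FFF
    return ((banks_1k[slot] & 0x1F) << 10) | (ppu_addr & 0x03FF)


def bank_aliases_nametables(bank_1k: int) -> bool:
    """True if a pattern-table bank would show nametable data as tiles."""
    return (bank_1k & 0x1F) >= FIRST_NT_BANK


if __name__ == "__main__":
    banks = [0, 1, 2, 3, 4, 5, 6, 7]
    print(hex(chr_ram_address(0x2400, banks)))
    print(bank_aliases_nametables(29))
```

Selecting banks 28 to 31 for pattern data is the "rendering the NT/AT as pattern tables" case mentioned in the post.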

Quote:
A yearish ago, Tepples was asking about the possibility of banking WRAM with MMC3 (discussion on the wiki); one thought that would be trivial but perhaps awkward would be enabling a 4kB RAM window in the $5000-$5FFF page. This would then track the bank setting for the $C000-$DFFF bank in the MMC3. Although not directly supported by the MMC3, it's a hardware modification that's just a simple variant on the standard "add prg ram" circuit— half of a 74'20 (/(M2·/ROMSEL·A14·A12)) to get a 4+4F RAM banking.


So if I were to answer Tepples' question from that discussion, my MMC3 would output high on all upper address bits when enabling WRAM. That's because I only base A13 and above upon the current status of PRG A13 & A14 on the NES. Them being high for $5000-5FFF would be the same as enabling the last bank of PRG ROM. I'm pretty sure that's what you and Drag were saying.

I could base the WRAM bank off of the bank selected for $C000-DFFF, but I don't see much benefit. It'd actually make the logic more complex than just using something like the lower bits of the $A001 register and always mapping those bits to WRAM A13/14. This also seems simpler to program because you don't have to keep track of what reg6 is mapping to.

Aside from this stuff I'm hoping to tidy up the scanline counter. I did some checking with my logic analyzer previously that confirmed that Tepples' idea of sensing scanlines based on CHR A13 should be pretty simple. It would end up firing IRQ later than the actual MMC3 (a little after sprite fetching instead of immediately). This should remove sprite/background restrictions and no longer require careful use of $2006/7. Additionally it would require significantly less logic to sense scanlines in this manner, so it should free up some logic space. One thought I had was to try and implement a RAMBO-1 style counter that could be used to count scanlines or CPU cycles and be able to change on the fly. None of this has been tested yet though, just ideas and brainstorming for the most part.

Beyond that I didn't really have anything on the plate for minor improvements/additions to my MMC3. If you have ideas/requests though feel free to share; there won't be room for a lot of logic but there is a sizable amount left, so simple stuff like more PRG-ROM is possible. Hmm now that I think of it, with some luck there should be enough for some MMC2/4 latch style bankswitching though. ;)
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98659)
infiniteneslives wrote:
Quote:
A yearish ago, Tepples was asking about the possibility of banking WRAM with MMC3 (discussion on the wiki); one thought that would be trivial but perhaps awkward would be enabling a 4kB RAM window in the $5000-$5FFF page. This would then track the bank setting for the $C000-$DFFF bank in the MMC3. Although not directly supported by the MMC3, it's a hardware modification that's just a simple variant on the standard "add prg ram" circuit— half of a 74'20 (/(M2·/ROMSEL·A14·A12)) to get a 4+4F RAM banking.
I could base the WRAM bank off of the bank selected for $C000-DFFF, but I don't see much benefit. It'd actually make the logic more complex than just using something like the lower bits of the $A001 register and always mapping those bits to WRAM A13/14. This also seems simpler to program because you don't have to keep track of what reg6 is mapping to.
Sorry, I must have stated myself poorly. What I meant was "because the original MMC3 ignores /ROMSEL for banking, if you map RAM into $5000-$5fff, it'll follow the bank in $d000-$dfff". Shouldn't be any more complicated in hardware, although it is definitely weird enough to be of dubious utility.

Quote:
Beyond that I didn't really have anything on the plate for minor improvements/additions to my MMC3. If you have ideas/requests though feel free to share; there won't be room for a lot of logic but there is a sizable amount left, so simple stuff like more PRG-ROM is possible. Hmm now that I think of it, with some luck there should be enough for some MMC2/4 latch style bankswitching though. ;)
The ideas that come to mind in the bucket of "probably sufficiently simple" are 1- FME-7's ability to bank PRG ROM into $6000-$7fff, 2- automatic IRQ redirection so that the vector for the MMC3 IRQ is pre-separated so that the software doesn't need to check/only use one of the different IRQ sources, and 3- automatic controller DPCM-deglitcher by watching for reads and writes to $4xxx, clocking on reads and stopping the shift register after 8 bits have been read.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98669)
infiniteneslives wrote:
In contrast my idea was just straight up 4screen mirroring. I would completely disable the 2KB VRAM on board, and make a simple change to the CHR bank select circuitry that would enable the last 4KB of CHR RAM when CHR A13 is high. So from a programmer's perspective you'd just have 4 sets of NT and AT from PPU $2000-3FFF. So really there wouldn't be any mechanism to control from the programmer's view.

Which would make it harder to have a status bar. When revising one of NovaYoshi's mapper ideas over the past couple days, I came up with a different plan: allow switching between standard mirroring modes in CIRAM and a four screen mode in CHR RAM $7000-$7FFF.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98697)
lidnariq wrote:
Sorry, I must have stated myself poorly. What I meant was "because the original MMC3 ignores /ROMSEL for banking, if you map RAM into $5000-$5fff, it'll follow the bank in $d000-$dfff". Shouldn't be any more complicated in hardware, although it is definitely weird enough to be of dubious utility.


Ahh I get it now, I didn't pick up on the fact it would be different at $5000. Yeah it wouldn't take much more logic to allow something like that, assuming my direct mapping of $A001 bits wasn't implemented. The combination of yours and my solution wouldn't be a good use of the limited logic though. One thought would be to use my idea but only control A14 with the mapper, and just connect A13 to the NES. It'd be even simpler logic, but you'd lose 8KB of WRAM because you'd just have two 12KB banks. If one had a hard time even utilizing 4 8KB pages of WRAM, that loss of 8KB might not be an issue. More linear WRAM seems like a better benefit than the full 32KB, I'm just speculating though...
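
A quick sanity check of the "two 12KB banks" arithmetic, as a sketch. It assumes the RAM is visible across a 12KB CPU window at $5000-$7FFF, with WRAM A14 driven by one mapper bit and A13-A0 wired straight to the CPU bus; the window and the names are assumptions, not something the post spells out.

```python
WRAM_SIZE = 32 * 1024
WINDOW = range(0x5000, 0x8000)  # ASSUMED 12KB CPU window $5000-$7FFF


def wram_address(bank_bit: int, cpu_addr: int) -> int:
    """WRAM A14 comes from the mapper; A13-A0 come straight from the CPU bus."""
    return ((bank_bit & 1) << 14) | (cpu_addr & 0x3FFF)


def unreachable_bytes() -> int:
    """Count WRAM bytes that no (bank, CPU address) pair can ever select."""
    reachable = {wram_address(bank, addr) for bank in (0, 1) for addr in WINDOW}
    return WRAM_SIZE - len(reachable)


if __name__ == "__main__":
    print(unreachable_bytes() // 1024, "KB of WRAM can never be addressed")
```

Under these assumptions the lowest 4KB of each 16KB half is never selected, which is where the lost 8KB comes from.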


Quote:
The ideas that come to mind in the bucket of "probably sufficiently simple" are 1- FME-7's ability to bank PRG ROM into $6000-$7fff, 2- automatic IRQ redirection so that the vector for the MMC3 IRQ is pre-separated so that the software doesn't need to check/only use one of the different IRQ sources, and 3- automatic controller DPCM-deglitcher by watching for reads and writes to $4xxx, clocking on reads and stopping the shift register after 8 bits have been read.


1- what is it exactly that makes having a prg rom bank at $6000 appealing? This probably wouldn't cost too much logic, but it'd probably be low priority unless I'm missing some benefit that'd make it worth the logic cost.

2- how would you imagine preseparating the IRQ? I'm not familiar enough with other IRQs to realize a way to do this more easily than how they are normally distinguished. Oh wait... What if the mapper always switched to something besides the last bank upon the issue of a mapper IRQ. How much time/trouble would this really save though?

3- DPCM-deglitcher: How would you stop the shift register exactly?

Tepples: I like some of your ideas there with the Nova. One comment though, I think you might be underestimating the CHR window logic. That arithmetic is going to suck up some considerable logic I would guess.

But I see what you mean about the status bar difficulty. I'd have to check some things, but I think it'd be simpler logic to implement H/V mirroring from the same on cart VRAM by just fixing A10/11.

Also, would you mind branching off these last few days of discussion about a hybrid mapper to another thread? I don't want to confuse people interested in only the stock mapper styles. Thx.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98698)
lidnariq wrote:
The ideas that come to mind in the bucket of "probably sufficiently simple" are 1- FME-7's ability to bank PRG ROM into $6000-$7fff, 2- automatic IRQ redirection so that the vector for the MMC3 IRQ is pre-separated so that the software doesn't need to check/only use one of the different IRQ sources, and 3- automatic controller DPCM-deglitcher by watching for reads and writes to $4xxx, clocking on reads and stopping the shift register after 8 bits have been read.

None of these ideas are really worth wasting hardware resources on, IMO.

1) When you already have two switchable 8K banks and two fixed 8K banks, having a third switchable bank doesn't really help much (two switchable banks + a fixed bank really is the sweet spot IMO, it's so much better than for example UxROM's one switchable bank + one fixed bank because you can map code in one bank and data in the other, whereas with 16K banking you have to either fit all of your code in the fixed bank or always worry about not being able to easily switch data in when running code from the switched bank).

EDIT: OK, there's one case where a third switchable bank could be useful: DPCM. Even in that case it would be just as useful to have 3 switchable banks at $8000-DFFF and one fixed bank at $E000-FFFF.

2) If you have a scanline IRQ, I can't really think of much reason to have any other IRQ sources (e.g. the DPCM IRQ and frame IRQs are mostly useless).

3) Not worth it to save the ~500 or so extra cycles that it would take to do this in software.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98700)
infiniteneslives wrote:
Also, would you mind branching off these last few days of discussion about a hybrid mapper to another thread?

PM me exactly which post IDs need split and I'll do it.
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98703)
It was just brainstorming :p

infiniteneslives wrote:
1- what is it exactly that makes having a prg rom bank at $6000 appealing? This probably wouldn't cost too much logic, but it'd probably be low priority unless I'm missing some benefit that'd make it worth the logic cost.

thefox wrote:
1) When you already have two switchable 8K banks and two fixed 8K banks, having a third switchable bank doesn't really help much (two switchable banks + a fixed bank really is the sweet spot IMO, it's so much better than for example UxROM's one switchable bank + one fixed bank because you can map code in one bank and data in the other, whereas with 16K banking you have to either fit all of your code in the fixed bank or always worry about not being able to easily switch data in when running code from the switched bank).

EDIT: OK, there's one case where a third switchable bank could be useful: DPCM. Even in that case it would be just as useful to have 3 switchable banks at $8000-DFFF and one fixed bank at $E000-FFFF.

I was specifically referring to FME-7's having 4 switchable banks plus the fixed bank at $e000. Is it obviously useful? I dunno, it keeps you from having to save context. The four banks could be something like "DPCM stream", "music data", "level data", "game code".

infiniteneslives wrote:
2- how would you imagine preseparating the IRQ? I'm not familiar enough with other IRQs to realize a way to do this more easily than how they are normally distinguished. Oh wait... What if the mapper always switched to something besides the last bank upon the issue of a mapper IRQ. How much time/trouble would this really save though?

A fair amount, depending on how frequently interrupts come; interrupt ringdowns are present in almost all computing designs.
thefox wrote:
2) If you have a scanline IRQ, I can't really think of much reason to have any other IRQ sources (e.g. the DPCM IRQ and frame IRQs are mostly useless).

But that's a fine point.

Quote:
3- DPCM-deglitcher: How would you stop the shift register exactly?

A 9-bit shift register; when 0b???????1 is written to $4xxx the register is preloaded with 0b000000001, reads clock it, and when the 256s bit is 1 it blocks itself.
Quote:
3) Not worth it to save the ~500 or so extra cycles that it would take to do this in software.

I disagree; 500cy is actually pretty huge, reliably always the same amount of time is better, and DPCM no longer conflicts with certain not-rereadable things (e.g. Arkanoid, SNES mouse).
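
A behavioural sketch of the 9-bit shift register described above, to show why it stops after exactly eight bits. It assumes the mapper sees every read in $4xxx on the bus, including any the CPU itself misses, and that it samples D0 only; the class and method names are hypothetical.

```python
class ControllerSnooper:
    """Model of a mapper-side shift register that shadows controller reads."""

    def __init__(self) -> None:
        self.shift = 0x100  # bit 8 set: blocked until software strobes

    def cpu_write(self, addr: int, value: int) -> None:
        # A write of 0b???????1 anywhere in $4xxx preloads the marker bit.
        if (addr & 0xF000) == 0x4000 and (value & 1):
            self.shift = 0b000000001

    def bus_read(self, addr: int, data: int) -> None:
        # Every read seen in $4xxx clocks one bit in, until the marker
        # bit reaches the 256s place and the register blocks itself.
        if (addr & 0xF000) != 0x4000 or (self.shift & 0x100):
            return
        self.shift = ((self.shift << 1) | (data & 1)) & 0x1FF

    @property
    def done(self) -> bool:
        return bool(self.shift & 0x100)

    @property
    def value(self) -> int:
        return self.shift & 0xFF


if __name__ == "__main__":
    snoop = ControllerSnooper()
    snoop.cpu_write(0x4016, 1)
    snoop.cpu_write(0x4016, 0)
    for bit in [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]:  # extra reads are ignored
        snoop.bus_read(0x4016, bit)
    print(snoop.done, f"{snoop.value:08b}")
```

The preloaded 1 acts as a sentinel: after eight shifts it sits in bit 8, so no separate bit counter is needed.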
Re: MMC3 (or similar) reproduction circuit boards. INL-ROM
by on (#98707)
What is DPCM deglitching? Is there a thread somewhere or a wiki page I could read about this? (Trying not to derail topic further.)
Re: MMC3 (or similar) reproduction circuit boards. INL-ROM
by on (#98709)
rainwarrior wrote:
What is DPCM deglitching? Is there a thread somewhere or a wiki page I could read about this? (Trying not to derail topic further.)

viewtopic.php?f=2&t=4116
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98711)
I know about the read conflict, but how would a mapper "deglitch" it?
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98713)
The seeming read still happens; the 6502 inside the NES just isn't listening. By mapping a shift register that will pay attention to the same bits that the software would, you can make sure something listens, and then save the results for the NES. (Is that clear enough?)
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98714)
In other words, something like this would work:
Code:
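  lda #$01
  sta $4016  ; strobe high; the mapper snoops on writes to $4016
  lsr a      ; A = $00
  sta $4016  ; strobe low; the controller starts shifting out bits

  bit $4016  ; eight reads clock out eight bits; the CPU ignores the values
  bit $4016
  bit $4016
  bit $4016

  bit $4016
  bit $4016
  bit $4016
  bit $4016

  lda $4036  ; the mapper has been watching the data bus during $4016 reads too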

But given the existence of devices that return 16-bit (Super NES controller), 24-bit (Four Score), or 32-bit (mouse) records on D0, or records on D3 and D4 (Power Pad and Arkanoid paddle), that could take a big CPLD with a lot of memory.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98715)
So the mapper can pick up the extra read even though software can't? I guess that makes sense then. Afterwards you have 8 or 9 bits you could compare against the value you read in software?
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98724)
rainwarrior wrote:
So the mapper can pick up the extra read even though software can't? I guess that makes sense then. Afterwards you have 8 or 9 bits you could compare against the value you read in software?

Nah you don't need to compare it to anything, you'll know it's right. Something to keep in mind though is that according to some rumors, some pads (?) don't like being read too fast: viewtopic.php?f=2&t=4841. (I may try to reproduce the problem some day since I have a PAL NES.)
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98727)
lidnariq wrote:
Quote:
3- DPCM-deglitcher: How would you stop the shift register exactly?

A 9-bit shift register; when 0b???????1 is written to $4xxx the register is preloaded with 0b000000001, reads clock it, and when the 256s bit is 1 it blocks itself.


So let me see if we're on the same page. It seems like we've got some complex ideas going on here that'll need to be simplified greatly if they're going to be a viable choice. I see the 500 cycle benefit to be substantial, not required obviously but worth a look in any event. Although I think it might be valid to just assume use of the DPCM costs you an extra 500 cycles, or controller inputs that can afford to be delayed a few frames.

I think you're alluding to the possibility of just watching for writes to $4000-$4FFF. That would require two additional inputs (PRG A11&12) which is feasible especially if we're using those to decode $5000-5FFF for PRG RAM potentially as well.

So if I understand this correctly the thought is to have a 8bit shift register that would read ONLY 8 bits after ANY write to $4xxx. Then the NES could then read ACTUAL controller data from the mapper in a single read. So the shift register would hold trash most of the time but we don't care, you can only consider the data valid after a single controller read. Not sure how you'd read back that shift register, it'd require a lot of address lines or mirroring which there may not be much left of either. Only other option being a special sequence of operations to read things back which requires even MORE logic.

Ah wait none of that nonsense works because the controller is being clocked by a signal which is only available at the controllers and EXP port. So ASSUMING one had a jumper or modified ENIO, the clock signal could be routed to the cart's mapper. Because you can't clock off of reads from $4xxx, that's why the NES is broke in the first place...

So unless I'm being oblivious to something, all I can see is a large cost for a modest benefit. So yeah I'm officially stumped as to how this would work at all even if we weren't trying to fix things like the paddle and other long winded peripherals. It'd probably be a lot easier to implement without such limited resources.

But I do have one other question... Do we KNOW that the controller data is actually VALID when the DPCM clocks it randomly? I guess what I'm wondering is if there is potential for bus conflicting on Data-0 when the controller is driving the line out of sync. If this were actually an issue I don't see how you'd ever resolve the problem with hardware.

Quote:
Something to keep in mind though is that according to some rumors, some pads (?) don't like being read too fast: viewtopic.php?f=2&t=4841. (I may try to reproduce the problem some day since I have a PAL NES.)


Can't remember how those extra resistors are connected on the PAL NES pads, but the faster clock and some effect of that resistor could be the issue. I've got a PAL controller board sitting around here somewhere, but can't find it at the moment...
Re: MMC3 (or similar) reproduction circuit boards anyone?
by on (#98728)
Let me try unpacking some things here:

infiniteneslives wrote:
I think you're alluding to the possibility of just watching for writes to $4000-$4FFF. That would require two additional inputs (PRG A11&12) which is feasible especially if we're using those to decode $5000-5FFF for PRG RAM potentially as well.
It could even just be watching $4000-$5FFF; the point is that you're not reading or writing to anything else in that range while you're reading the controller. (Also, how'd A11 get involved?)

Quote:
Not sure how you'd read back that shift register, it'd require a lot of address lines or mirroring which there may not be much left of either.
No argument there; but I don't think (e.g.) mirroring it over all of $5xxx is that awful.

Quote:
Ah wait none of that nonsense works because the controller is being clocked by a signal which is only available at the controllers and EXP port.
But it's not! That's the beauty of it. The NES asserts the /RD4016 line, but the address bus still contains $4016, and the data bus is still driven by the 74hc368. The only hardish part is coming up with a convention so that the cartridge doesn't need to full decode the address bus.

Quote:
even if we weren't trying to fix things like the paddle and other long winded peripherals.
For the Vaus controller, some cleverness needs to be present to know whether the shift register should pay attention to D4 or D0. The mouse, I sadly agree, is a lost cause (32 bits of which you can throw away the first 8? eh...)

Quote:
Do we KNOW that the controller data is actually VALID when the DPCM clocks it randomly?
The 74HC368 has an OE propagation delay of 45ns max at 4.5V; the CD4021 has a maximum shift frequency of 2.5MHz (350ns). That's all the hardware that produces this is; the only way for it to screw up is if the 2A03 changed the address bus in the middle of an M2 cycle (it doesn't).
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98766)
Yeah sorry I don't know where the A11 came from, I must have miscounted. Is there anywhere that documents exactly how this glitch works/behaves? The previous thread that was linked to didn't even label the signals, though I think I can guess which ones they are. Being a glitch and all I don't feel any of my normal assumptions are necessarily valid.

I suppose I could verify things more for myself with my logic analyzer. Honestly the more I think about this whole thing the less interest I have in trying to design, test, and implement a feature with modest returns at best IMO.

So if you all are seriously interested in this deglitcher, I suggest drawing up the schematic. I'll code it up in verilog and see how much logic it'll cost. If we're able to keep it simple enough to fit in the available room, then someone can write a test ROM and I'll test it out. Even still though I'd leave it up to a vote if it boiled down to this feature beating out some other feature of interest.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98769)
infiniteneslives wrote:
Is there anywhere that documents exactly how this glitch works/behaves?
The Visual2A03 answers all.

DPCM DMA deasserts RDY to take the CPU off the bus for (at least) two M2 cycles. At least one of those is junk, a repetition of the last value from the CPU. On the second, the DPCM hardware drives the address bus to the byte desired. (Example with reading $2002 instead.)

The Joystick read strobes are known to be NOT (Address == $4016 AND Readnotwrite AND M2). Each time all 18 of those lines match, /RD4016 goes to 0V. Similar for /RD4017.
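
The strobe equation above is easy to restate as code. This is just the boolean expression from the post written out as a sketch; the function names are hypothetical.

```python
def read_strobe_n(addr: int, read_not_write: int, m2: int, port: int) -> int:
    """Active-low strobe: NOT (Address == port AND Readnotwrite AND M2)."""
    asserted = (addr & 0xFFFF) == port and bool(read_not_write) and bool(m2)
    return 0 if asserted else 1


def rd4016_n(addr: int, read_not_write: int, m2: int) -> int:
    return read_strobe_n(addr, read_not_write, m2, 0x4016)


def rd4017_n(addr: int, read_not_write: int, m2: int) -> int:
    return read_strobe_n(addr, read_not_write, m2, 0x4017)


if __name__ == "__main__":
    # A read of $4016 with M2 high pulls /RD4016 low, no matter who drives the bus.
    print(rd4016_n(0x4016, 1, 1), rd4016_n(0x4016, 0, 1), rd4016_n(0x4017, 1, 1))
```

The expression depends only on the bus state (16 address lines, R/W, and M2), which is why a repeated bus cycle at $4016 during DMA clocks the controller again even though the CPU is not the one reading.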

Quote:
So if you all are seriously interested in this deglitcher, I suggest drawing up the schematic. I'll code it up in verilog and see how much logic it'll cost. If we're able to keep it simple enough to fit in the available room, then someone can write a test ROM and I'll test it out. Even still though I'd leave it up to a vote if it boiled down to this feature beating out some other feature of interest.
I'm really not trying to say "dude, this is so exciting, please build it!". I'm just trying to say "This should be a fairly inexpensive way of freeing up about 2% of CPU time for games that use DPCM".
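
For scale, the "about 2%" figure can be checked with a couple of lines. This sketch assumes the controller is read once per frame on an NTSC console (about 29780.5 CPU cycles per frame on average) and takes the rough 500-cycle cost mentioned earlier in the thread at face value.

```python
NTSC_CPU_CYCLES_PER_FRAME = 29780.5  # average for an NTSC console
SOFTWARE_REREAD_COST = 500           # rough per-frame figure quoted in the thread


def cpu_share_percent(cycles: float,
                      cycles_per_frame: float = NTSC_CPU_CYCLES_PER_FRAME) -> float:
    """Percentage of one frame's CPU time taken by the given cycle count."""
    return 100.0 * cycles / cycles_per_frame


if __name__ == "__main__":
    print(f"{cpu_share_percent(SOFTWARE_REREAD_COST):.2f}% of a frame")
```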

Anyway, the attached image is how I'd build it using 74xx parts; something synthesized should be able to skip the inverters and nor gates.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#98774)
lidnariq wrote:
Quote:
So if you all are seriously interested in this deglitcher, I suggest drawing up the schematic. I'll code it up in verilog and see how much logic it'll cost. If we're able to keep it simple enough to fit in the available room, then someone can write a test ROM and I'll test it out. Even still though I'd leave it up to a vote if it boiled down to this feature beating out some other feature of interest.
I'm really not trying to say "dude, this is so exciting, please build it!". I'm just trying to say "This should be a fairly inexpensive way of freeing up about 2% of CPU time for games that use DPCM".


No I know, I don't mean to be a stick in the mud or anything either. I appreciate the ideas and discussion. Thanks for drawing that up, I'll code it up and compile it and see what we get. Assuming it fits, we can keep it as an option and pick and choose once I've got a few more compiled up and tested.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99567)
So the idea came up here to use a serial flash to store large amounts of data for a cart carrying 32KB of VRAM and WRAM for the level, music, and graphic data to be unpacked into.

I agree that the data rate should be effectively as fast as the rate the NES can take it, making it look as if it were normal parallel memory reads. So ASSUMING I've got enough logic available with the 2 CPLDs on board to make this happen, I started thinking about what commonly discussed problems this could effectively solve at little to no extra cost to the setup this 'hybrid MMC3' board already has.

The idea would be to maintain the code that is always used on a standard parallel ROM, things like the game engine, user input, and music routines. Tepples suggested you'd need 128KB, which seems more than sufficient and still cheap for a 'boot/engine' ROM. Then you'd access the SPI flash through the CPLD and load level, music, and graphic data into the 32KB of VRAM and 32KB of WRAM. Additionally the goal would be to allow WRAM to be mapped to $8000-BFFF or something similar.

lidnariq brought up good points about the cost comparison of parallel vs. serial. I agree that if you're ONLY looking for large amounts of ROM, parallel is still probably your best option. But with this project I've got my sights set on this MMC3-like capable board. Depending on what one has already bought into placing on the board, the cost could effectively be free or even money saving.

Here are the benefits/costs I see:
1) Cost: Assuming you've already got WRAM, VRAM, 3.3v regulator, the logic available and level conversion (CPLD), you're only buying 1-2MBytes of ROM for ~$0.50. Now you're trading a large PRG ROM ($2.70+) for a smaller 128KB boot/engine ROM ($1), so you're actually saving money if you need/use the extra memory.

2) Game saves: you could effectively solve the age old issue of saves and ditch the problems of battery backing. This not only allows for better save integrity, but saves ~$1 worth of expense for the components.

3) Graphics: I can't really speak from much experience here since I'm only in the planning stages of my game. I've only really spent time doing artwork, but the idea of being able to create effectively as many tiles as I'd ever desire is appealing. Basically this just seems like it would make game development easier and less time intensive if I didn't have this constant thought that I need to save every byte of ROM space I can. I know there aren't a lot of artists around here so this might be falling on deaf ears, but I happen to enjoy drawing on the NES, albeit time consuming to do well. Being armed with 32KB of VRAM (or 28KB with 4screen), 1MB or more of tiles, and MMC3 fine bank switching would allow for some pretty detailed and less repetitive graphics even with the color limitations.

4) Compression: Maybe it's just me but this sounds like the least fun part of making a homebrew. You could obviously still do it with this setup, but you could also just spend a few dimes more for another MByte and ignore the problem :). It would definitely save development time and possibly even CPU time when comparing loading to decompressing. Again there is less motivation to pinch every byte of ROM you can, though I agree it's still important to conserve for other reasons. But I imagine it'd save development time if I didn't feel the need to do it with everything I wrote/drew/composed.

5) Music: opens up possibilities for streaming audio without the 'hard' limits of ROM space. I'm not a music/sound buff but I know some people might appreciate it.

6) Other: One interesting thought is that with the engine stored on its own ROM, and some level-specific code, level data, and graphics data stored separately, it'd be simpler to run as a dev cart once the boot/engine ROM was complete. You could make most of your modifications to just the SPI memory, which is more manageable than a large parallel ROM behind a mapper. Not sure how worthwhile this really is, but it's an interesting thought. It'd also be an interesting way for someone to share a 'generic platformer engine' and allow someone else to fill in the details of animations, graphics, levels, and music. Or to release a sequel using effectively the same engine, where you could sell the serial EPROM alone as an upgrade that included both games or something. Head in the clouds thoughts I know, but interesting to think about...

I know, I know, "where's my game to support this mapper???" "make the game, then add capabilities to the mapper as needed!!" I agree with these arguments for the most part. But being early in the game development process, these things seem like they would change how the game is developed. But I'm curious to see what you guys think who are further along in game development or have something substantial under your belt, unlike myself. Does something like this sound appealing? Am I being too naive? Thoughts?
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99574)
I like the idea of using a serial RAM to save data, especially when they can be flashed and such. But, you still have to have a backup of the current "file" in WRAM so it honestly doesn't help much, unless your game uses multiple states. But, if it does use multiple save slots/states, then it sure would be nice to be able to use this idea instead.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99577)
You could make your own version of the Famicom Disk System basically. ;)

I find it hard to believe that a homebrew game is going to come along and say, be bigger than Kirby's Adventure. And if one does, I'm sure they will tackle getting more memory when that challenge presents itself. But it's a neat idea.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99579)
If I had the option to use vast amounts of data for a low cost, I'd make a couple of kick-ass animation sequences just for the heck of it, even if the game itself would fit in a traditional cartridge.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99580)
3gengames wrote:
I like the idea of using a serial RAM to save data, especially when they can be flashed and such. But, you still have to have a backup of the current "file" in WRAM so it honestly doesn't help much, unless your game uses multiple states. But, if it does use multiple save slots/states, then it sure would be nice to be able to use this idea instead.


The idea would be that you wouldn't bother trying to retain the save data in battery backed WRAM. It wouldn't be worth the cost of components and risk of lost data when you've got MBytes worth of flash at your disposal. That alone would offset the cost of the SPI flash chip with what's already on the board. You could have dozens of save slots if you desired with no added cost.

One more benefit I thought of: Increased PRG ROM space does NOT come at the cost of more logic. Once you've consumed the logic to implement serial flash, an extra MByte or two doesn't really change the mapper, as compared to parallel extensions which are costly with logic, especially with MMC3.

MottZilla wrote:
You could make your own version of the Famicom Disk System basically. ;)

I find it hard to believe that a homebrew game is going to come along and say, be bigger than Kirby's Adventure. And if one does, I'm sure they will tackle getting more memory when that challenge presents itself. But it's a neat idea.


Yeah very similar to FDS. And I agree about the homebrew size comment. Really I guess If I'm making the mapper and the game I can do whatever floats my boat. If people want to tag along great, if they just want to watch me make big headed ideas that never come to life and laugh I'm okay with that too :) at least I had fun and they got a chuckle.

It just seems like having no regard for ROM space would result in a different, more enjoyable design process as compared to developing the game with the intent to save space where possible until you run out, only then looking for a solution. You wouldn't want to go back and implant more detail in your levels, music, and graphics now that you effectively don't have a limit. I feel that it would be required to prove it can be done with the hardware before hastily programming without concern for space. Really my best point of reference is thefox's streemerz project; something he discussed often, it seemed, was space and compression (maybe it just seemed that way because it made my head hurt), and I'm not looking forward to that part of my project...

tokumaru wrote:
If I had the option to use vast amounts of data for a low cost, I'd make a couple of kick-ass animation sequences just for the heck of it, even if the game itself would fit in a traditional cartridge.

My thoughts exactly... Characters and animations are the first thing I've decided to work on and that's where I'm currently at. I keep telling myself, oh well how could I reuse this tile? Or how can I do this animation with fewest frames?
EDIT: I should be asking myself how can I make this as kick-ass as possible?

I feel like I'm holding myself back where I wouldn't with this setup. And I'm not going to want to revisit this step much once I move on to the next part. Not that you can't make good animations within normal limits, it was obviously done by MANY production games. But those were also made at a time when memory of this size didn't exist or wasn't cost effective, but they undoubtedly had to put a lot of effort in to making that happen. I'd think the leverage of dirt cheap bits should be utilized if it makes my life easier.

Aside from time, I just can't get myself to start a serious NROM project because I know I'll end up throwing most of the code out for a follow on project using a mapper I truly want to develop on.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99586)
infiniteneslives wrote:
The idea would be to maintain the code that is always used on a standard parallel ROM, things like the game engine, user input, and music routines. Tepples suggested you'd need 128KB which seems more than sufficient and still cheap for a 'boot/engine' ROM.

Actually I was thinking of an 8 KiB boot ROM and a 128 KiB RAM so that the engine could be loaded from NAND too.

Quote:
It'd also be an interesting way for someone to share a 'generic platformer engine' and allow someone else to fill in the details of animations, graphics, levels, and music.

Sort of like my original plan for President before I got sidetracked with other projects.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99596)
tepples wrote:
Actually I was thinking of an 8 KiB boot ROM and a 128 KiB RAM so that the engine could be loaded from NAND too.

This would certainly be more versatile.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99603)
tepples wrote:
infiniteneslives wrote:
The idea would be to maintain the code that is always used on a standard parallel ROM, things like the game engine, user input, and music routines. Tepples suggested you'd need 128KB which seems more than sufficient and still cheap for a 'boot/engine' ROM.

Actually I was thinking of an 8 KiB boot ROM and a 128 KiB RAM so that the engine could be loaded from NAND too.

I see, well something like that would also be possible. I'm slightly regretting my choice to leave DIP as the only choice for PRG-ROM. I dismissed the fact I couldn't get PLCC to fit well, but I should have considered SOIC. Not too much of a biggie though, something to consider for the next rev I guess. 128KB does come in DIP for a couple dollars more and less common but will do just fine for development. Then the 8KB boot ROM could go in the location on the PCB designed for WRAM.

This idea would make the board a lot more universal as well. Effectively ALL game data is stored on the SPI flash. So it really would be like our own FDS but using DIP-8 SPI flash instead of diskettes. More comparable to the Aladdin deck enhancer I guess. More interesting thoughts.... If we were to use something like this for the bi-annual homebrew compo you could just sell the new SPI chip and users could swap it out with their cart. A fair number of people own programmers capable of SPI flash programming so updates would be more achievable.


Quote:
Quote:
It'd also be an interesting way for someone to share a 'generic platformer engine' and allow someone else to fill in the details of animations, graphics, levels, and music.

Sort of like my original plan for President before I got sidetracked with other projects.

Yes but your original plan is probably more likely, you'd at least have my vote ;)
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99611)
infiniteneslives wrote:
If we were to use something like this for the bi-annual homebrew compo you could just sell the new SPI chip and users could swap it out with their cart.

Which is a pretty good argument for considering using (Micro)SD, actually. More expensive, but reprogrammable by almost everyone.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99617)
lidnariq wrote:
infiniteneslives wrote:
If we were to use something like this for the bi-annual homebrew compo you could just sell the new SPI chip and users could swap it out with their cart.

Which is a pretty good argument for considering using (Micro)SD, actually. More expensive, but reprogrammable by almost everyone.


True, although significantly more difficult to design/implement, with it effectively requiring an mcu. For the SPI flash I 'merely' need to toss a shift register with proper controls into my CPLD. Going through the work with an mcu, I'd rather have USB connectivity with the mcu to reprogram the SPI flash, especially since I've already got most of that work done with the NESDEV1; I just need to swap to SPI vice parallel. A USB socket is also cheaper than a microSD socket and card. Having USB would have the added benefit of making game development less cumbersome, and not necessarily slower if the whole ROM didn't need to be programmed. Plus if you only wanted to publish a game with this setup you wouldn't have to include the added cost of the mcu, socket, and flash card.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99655)
Okay, so going with the assumption that this is a go, I'd like to try and come up with an idea of how the data exchange might go so I can try to come up with the hardware design. Sorry for the HUGE post, I was basically bored this evening and decided to type this up while I thought through everything.

So we've got an issue of where ROM/RAM/mapper registers are all located. I figure the best way is to keep MMC3 registers untouched, which would require all loading/writes to PRG memory to occur while the memory is mapped to $6000-7FFF only. So you'd map the 8KB bank there by some variant of MMC3 PRG bank control, then prepare to load it up with your game data.

So the SPI byte can't very easily be mapped to $6000-FFFF then if we want to keep the data exchange simple. To keep from requiring too many more address inputs we could place it at $5000-5FFF with the addition of PRG A12 as an input. So the most recent read byte would be there until the mapper was 'signaled' to read the next byte.

Now to figure out how to handle the commands and addresses sent to the SPI via the mapper. For anyone interested in details this is the data sheet I'm looking at for winbond 2MB SPI flash.

So I was trying to figure out some way to make it so that writes to the SPI wouldn't be serial, so that writing the 8bit instruction and 24bit address wouldn't be so slow. But after some looking into things I don't think it's worth spending the logic to keep writes from being serial. I figure start with bare bones essentials, then if additional things are needed/desired we'll consider adding them and weighing the trade off between CPU time and mapper logic. Additionally this keeps things independent of what type of SPI you're using. Basically the mapper doesn't care how big it is, how large the pages are, whether it's EEPROM or flash etc. So even if someone were interested in this for something small like save data alone, the mapper doesn't care. Emulator authors you're on your own I guess... Good news is there are data sheets for this stuff and the commands and such are pretty universal.

So for anyone unaware or not interested in reading the data sheet the SPI flash is pretty simple I'll spell out the basics. You write a 8 bit command followed by the address if applicable. For reads you just continue to clock the chip and it spits out data bit by bit, byte after byte on each clock until you disable it by taking /CS high. Similarly for writes you just continue to write the data you'd like to save, assuming you set things up properly and erased the page in flash and everything before hand. Once you're done with the long stream of reading/writing you take /CS high to finish the process. To start another access you take /CS low and repeat the process with the next command, address, data etc. Trust me though, if you want to write anything to the chip from the NES you'll have to look through the data sheet. If you're just reading data the discussion below is probably enough.
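As a rough illustration of the "8 bit command followed by the address" part, here is a small Python sketch of how a 03h read command and a 24bit address could be turned into single-bit stores, MSB first. It assumes only bit 7 of each store reaches the flash, as proposed for $5000; the function names are made up for this example and are not part of any real tool.

```python
READ_DATA = 0x03  # SPI flash "read data" opcode used in this thread

def d7_writes(byte):
    """Return the 8 values a CPU would store for one byte, MSB first.

    Only bit 7 of each store is assumed to reach the flash, so the byte is
    shifted left once per store (the equivalent of STA followed by ASL A).
    """
    values = []
    for _ in range(8):
        values.append(byte & 0xFF)
        byte = (byte << 1) & 0xFF
    return values

def read_command_bits(address):
    """Bits (one per store) for a 03h read at a 24bit flash address."""
    frame = [READ_DATA, (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF]
    return [(value >> 7) & 1 for byte in frame for value in d7_writes(byte)]

bits = read_command_bits(0x012345)
print(len(bits))   # 32 single-bit stores
print(bits[:8])    # [0, 0, 0, 0, 0, 0, 1, 1] which is 03h, MSB first
```

So setting up a read costs 32 stores before the first data byte can be fetched, which is why long sequential reads are the cheap case.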

I figure the best way to signal the mapper to read the next byte is to write to a control register. But conveniently we've also got PRG A0 as an input, so I figure we'll have two 'SPI registers' at $5000 and $5001 (more specifically: $5xxEVEN and $5xxODD in normal MMC3 style). Here are the definitions I'm thinking:

-----------------------
$5000 "SPI WRITE" All writes to this register are fed directly to the SPI flash. This is where you can write commands and data directly to the SPI flash. Only PRG D7 is seen by the SPI flash. Here is where you'll have to give the read command followed by the address before data can be pumped out by the mapper. You'll also have to write save data here serially bit by bit (like controllers but writes). Don't forget you'll have to supply the write command followed by the SPI address you want to save data to. This is ALSO the address you'll read full bytes from once the mapper has pumped them out for you.

-----------------------
$5001 "SPI READ/Mapper command" So we need to use this register to enable and disable the SPI flash by controlling the /CS pin on the chip. I figure we'll just use D7. So writing any value with D7=0 enables the SPI flash and disables it when D7=1. Additionally this is the register to use to command the mapper to fetch the next byte from the SPI flash so you can read it out in one full byte. So for now we'll say writing any value with D7=0 commands the mapper to fetch the next byte. Writing D7=1 will disable the flash and stop the read data stream.
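For anyone writing an emulator or test bench, the even/odd decode described above might look like the following Python sketch. It assumes the registers are decoded from PRG A12-A15 and A0 only, so they mirror across all of $5000-5FFF; the function name and the "data"/"control" labels are just for illustration.

```python
def decode_spi_register(address):
    """Classify a CPU address under the proposed $5xxEVEN / $5xxODD split."""
    if (address & 0xF000) != 0x5000:
        return None                      # outside the SPI register window
    return "data" if (address & 1) == 0 else "control"

print(decode_spi_register(0x5000))  # data
print(decode_spi_register(0x5ABD))  # control
print(decode_spi_register(0x6000))  # None
```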


So I basically need 9 CPU clock cycles to fetch each byte. one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte. I *COULD* cut this down to 8 cycles and ABSOLUTELY REQUIRE 8 cycles no more no less. Basically I'd only clock in 7 bits and the 8th bit wouldn't be clocked into the shift register, it'd just be placed on PRG D0 for the required READ on the 8th cycle. I don't think this is very user friendly though, and could easily cause data to be read improperly. In my loop I insert a NOP to make an 8 cycle STA/LDA cycle stretch out to 10 cycles.



TL;DR:

Here would be the sequence of operations to read data from the SPI flash:

1) Write to $5001 with D7=0 to enable the SPI flash. (While this does reset my pump/shift register control circuitry, this initial command WILL NOT count as the command to fetch the FIRST byte.)

2) Write serially to the SPI flash via $5000 bit PRG D7. Things are written MSB first. So you'll have to write 03h for the read command followed by the 24bit address.

3) Now write to $5001 with PRG D7=0 to give the 'read full byte' command to the mapper.

4) Wait 9 clock cycles. You can do anything during this time except read/write to $5xxx. In a loop this is where I store the data from the previous read.

5) Read the first/next byte from $5000.

6) Read the next byte by looping back to step 3.

7) When DONE reading, write to $5001 with PRG D7=1.

Now you can save yourself some CPU time with step 7. Basically if you know the next stream of data you'd like to read is sequential from your current read, you can just let the mapper and flash sit there idle. You could then come back 5 minutes later and read the next byte in the stream. Maybe the best way is to just leave it enabled after the read. Then, before you start your next read/write cycle, you decide whether you need to disable, enable, and issue another command.
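The read sequence in steps 1-7 can be sketched as a tiny behavioural model in Python. This is only a guess at the register behaviour described in this post, not a description of real hardware: the class and method names are invented, only the 03h read command is handled, and the 9 cycle wait of step 4 is not modelled at all.

```python
class SpiMapperModel:
    """Toy model of the proposed $5000/$5001 registers (timing is not modelled)."""

    def __init__(self, flash):
        self.flash = flash      # flash contents as a bytes-like object
        self.selected = False   # True while /CS is held low
        self.bits = []          # command/address bits shifted in so far
        self.pointer = None     # next read address once a 03h read is set up
        self.latch = 0xFF       # last byte fetched for the CPU

    def write(self, address, value):
        d7 = (value >> 7) & 1
        if address & 1:
            self._control(d7)
        elif self.selected and self.pointer is None:
            self._serial_bit(d7)

    def _control(self, d7):
        if d7:                                  # D7=1: /CS high, stream ends
            self.selected, self.bits, self.pointer = False, [], None
        elif not self.selected:                 # first D7=0 write only selects the chip
            self.selected = True
        elif self.pointer is not None:          # later D7=0 writes fetch one byte
            self.latch = self.flash[self.pointer % len(self.flash)]
            self.pointer += 1

    def _serial_bit(self, d7):
        self.bits.append(d7)
        if len(self.bits) == 32:                # 8bit opcode + 24bit address
            word = int("".join(map(str, self.bits)), 2)
            if word >> 24 == 0x03:              # only the read opcode is modelled
                self.pointer = word & 0xFFFFFF

    def read(self, address):
        return self.latch                       # a $5000 read returns the latched byte


flash = bytes(range(256)) * 4
mapper = SpiMapperModel(flash)
mapper.write(0x5001, 0x00)                      # step 1: enable
frame = (0x03 << 24) | 0x000010                 # step 2: 03h + address $000010
for shift in range(31, -1, -1):
    mapper.write(0x5000, ((frame >> shift) & 1) << 7)
received = []
for _ in range(4):
    mapper.write(0x5001, 0x00)                  # step 3: fetch command
    received.append(mapper.read(0x5000))        # step 5 (the step 4 delay is skipped here)
mapper.write(0x5001, 0x80)                      # step 7: disable
print(received)                                 # [16, 17, 18, 19]
```

Leaving out the final write in this model keeps the stream open, which matches the idea of coming back later for the next sequential byte.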

Here is the code I wrote up as an example obviously there may be better ways to do this. But this should explain how it all works.
Code:
;;;;;;copy SPI to $6000-$7FFF routine;;;;;;;;
;first you must place the desired PRG RAM bank at $6000-7FFF via the MMC3 style control registers. (details later)

LDY #$00
STY $5001   ; Writing to $5001 with D7=0 enables the SPI flash for access. (takes /CS low)

;Now you must serially write to the SPI via $5000 bit 7: the read command (03h) followed by the 24bit address, MSB first.

;Start unloading data now that everything is set up!
LDY #$00
STY $5001   ;command to read the FIRST byte (with D7=0 still)
LDX #$00   ;2cyc; set up loop counter and provide 2 cycle delay for SPI data pump
NOP         ;2cyc;
NOP         ;2cyc; need total of 9 cycles to setup pump timing for entry to loop
NOP         ;2cyc;
NOP         ;2cyc; okay it's been 10 cycles since STY $5001, enter loop
load_spi_to_wram:   ;copies 8KB from SPI flash into page at $6000-7FFF
   LDA $5000      ;mapper places most recent flash read at $5000 (decoded by PRG A0,12-15)
   STY $5001      ;command to mapper to fetch next byte
   STA $6000, x   ;store first byte that was read
   NOP            ;provides at least 9 cycle delay from STY $5001
   LDA $5000      ;read byte
   STY $5001      ;fetch command
   NOP            ;delay
   STA $6100, x   ;store byte
   LDA $5000
   STY $5001
   NOP
   STA $6200, x
   LDA $5000      ;4cyc
   STY $5001      ;4cyc
   NOP            ;2cyc
   STA $6300, x   ;5cyc (STA absolute,X always takes 5)
   ...
   LDA $5000
   STY $5001      ;fetch the byte the top of the loop will read (one extra fetch at the very end is harmless)
   STA $7F00, x   ;STA, not STY: STY has no absolute,X mode and Y only holds the fetch command
   INX
   BNE load_spi_to_wram

;;end the read stream if you know your next SPI access isn't going to be a sequential read.
LDY #$80
STY $5001   ;writing to $5001 with D7=1 disables the SPI flash (takes /CS high)

   ;;;15cyc per byte * 8192bytes = ~123K cycles / 29800 = ~4.1 NTSC frames (plus a little INX/BNE overhead)


Obviously you wouldn't have to do an entire 8KB loop, but assuming I haven't made too many mistakes that should work I'd think. Additionally it'd require your data to be arranged in the correct order on the SPI flash to support this non-sequential copy loop. Maybe you guys can come up with a better solution/loop. I just did this to sort things out for myself. Copying data to pattern tables is even easier with just a repetitive read $5000, delay, write $2007 loop.
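To make the "correct order on the SPI flash" point concrete, here is a hedged Python sketch of a build-time step that reorders an 8KB image so the unrolled loop above drops every byte at its intended offset. It assumes the loop stores one byte to each of the 32 pages ($6000,x through $7F00,x) before incrementing X; the function names are hypothetical.

```python
PAGES = 32          # 32 unrolled stores: $6000,x  $6100,x ... $7F00,x
PAGE_SIZE = 256     # X counts 0..255

def interleave_for_flash(image):
    """Order an 8KB image so the unrolled loop lands each byte at its intended offset."""
    assert len(image) == PAGES * PAGE_SIZE
    return bytes(image[(n % PAGES) * PAGE_SIZE + n // PAGES] for n in range(len(image)))

def simulate_copy_loop(stream):
    """Replay the loop's store pattern: one byte per page for each value of X."""
    ram = bytearray(PAGES * PAGE_SIZE)
    source = iter(stream)
    for x in range(PAGE_SIZE):
        for page in range(PAGES):
            ram[page * PAGE_SIZE + x] = next(source)
    return bytes(ram)

image = bytes((i * 7 + 3) & 0xFF for i in range(8192))
assert simulate_copy_loop(interleave_for_flash(image)) == image
```

A plain sequential layout would work too, but only with a loop that stores through an incrementing pointer, which costs more cycles per byte than the absolute,X stores.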
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99664)
infiniteneslives wrote:
So I basically need 9 CPU clock cycles to fetch each byte. one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte.
Could you use the NES's 21/26MHz clock source? I guess the down side is that it's not famicom/famiclone compatible. A crystal/resonator (digikey:HWZT-12.00MD,12MHz,28¢/1)? Or use both edges of M2 somehow? Winbond's large SPI EEPROMs can be clocked at up to 104MHz so there doesn't seem to be a relevant upper bound. Or can you use the winbond quad/dual SPI modes?

On the other hand, you can't really beat 224kB/s and you're still talking about aggregate read speeds of 200kB/s so whatever.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99665)
lidnariq wrote:
infiniteneslives wrote:
So I basically need 9 CPU clock cycles to fetch each byte. one cycle per bit and one more to reset my control circuitry and set things up to pump out the next byte.
Could you use the NES's 21/26MHz clock source? I guess the down side is that it's not famicom/famiclone compatible. A crystal/resonator (digikey:HWZT-12.00MD,12MHz,28¢/1)? Or use both edges of M2 somehow? Winbond's large SPI EEPROMs can be clocked at up to 104MHz so there doesn't seem to be a relevant upper bound. Or can you use the winbond quad/dual SPI modes?

On the other hand, you can't really beat 224kB/s and you're still talking about aggregate read speeds of 200kB/s so whatever.


Yeah I considered most of those things actually. I also thought about doing something like using a RMW instruction and direct reads from the SPI and writes to $6000. So LARGE unrolled loop could conceivably do it in 6 cycles (~290KB/s) with a lot of trickery, complexity, logic expense, I/O, components, etc. Like yourself, I realized it was plenty fast anyways so none of it's really justified.

Super simple, super cheap, plenty fast, tons of ROM so I'm happy ;).
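For reference, the rates tossed around above fall straight out of the CPU clock divided by the cycles spent per byte. A quick Python check, assuming an NTSC CPU clock of about 1.789773 MHz and kB meaning 1000 bytes:

```python
NTSC_CPU_HZ = 1_789_773   # approximate NTSC NES CPU clock

def kb_per_second(cycles_per_byte, clock_hz=NTSC_CPU_HZ):
    """Read rate in kB/s (1 kB = 1000 bytes) for a given cycle budget per byte."""
    return clock_hz / cycles_per_byte / 1000

for cycles in (6, 8, 9):
    print(cycles, round(kb_per_second(cycles)))
# 6 298
# 8 224
# 9 199
```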
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99669)
Writing the saved game serially might be a little slow, but I guess players expect that.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99677)
tepples wrote:
Writing the saved game serially might be a little slow, but I guess players expect that.


Slow for the CPU, but pretty quick for the player. I'm used to a few seconds on everything but the NES, which takes no time with battery backing. So even if you somehow managed to come up with 256 bytes of save data (a full page of flash), it'll still be under a frame's worth of time.

I wrote up the routine quick, keep in mind it could be even faster if you unrolled each byte. Here I figured worst case looping on each bit.

Additionally I think I'm going to change my mind about D0 being connected to the SPI for the $5000 register. The SPI handles everything MSB first. So instead of hassling with rolling the MSB around to the LSB it just makes more sense to connect D7 to the SPI flash data input.

Code:
;;;;save data to SPI routine;;;;
;this routine writes a full page of SPI flash
;before running you must erase the page
;and write the page program command (02h) and 24bit address
;alternatively you could load the command and address into your 'save_data' array:
      ;02h, addr2, addr1, addr0, save data (252 bytes)
;then this routine would give the program page command, address, and save_data all at once

LDY #$00
STY $5001   ; Writing to $5001 with D7=0 enables the SPI flash for access. (takes /CS low)

LDX #$00
write_to_SPI:
   LDA save_data, X   ;4cyc; load byte
   LDY #$08         ;2cyc; bit counter
   save_byte:
      STA $5000      ;4cyc * 8; write MSB to SPI (only D7 is connected)
      ASL A         ;2cyc * 8; move bit 6 to D7
      DEY         ;2cyc * 8;
      BNE save_byte   ;3cyc * 7 + 2cyc last;
   INX            ;2cyc; increment byte counter
   BNE write_to_SPI   ;3cyc

LDY #$80
STY $5001   ;writing to $5001 with D7=1 disables the SPI flash to end the write (takes /CS high)

   ;TOTAL time: ~100 cycles per byte = ~25.6K cycles = ~1frame


At a glance unrolling the byte loop would take around 55 cycles making it twice as fast which is around 32KB/sec. I guess if you wanted to be safe and read the data back, verify every byte then it'll take longer obviously. You could read back and compare all in one loop with only a few instructions so it's still not going to take more than a frame or two. And 256 is a lot of save data, you don't have to program the entire page if you don't have that much data.
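The cycle estimates for the save routine can be tallied with a short Python sketch. It uses standard 6502 timings and assumes no page-crossing penalties on LDA save_data,X or on the branches, and it counts the outer BNE as taken on every byte.

```python
NTSC_CYCLES_PER_FRAME = 29_780.5   # approximate CPU cycles per NTSC frame

def looped_cycles_per_byte():
    # inner loop: STA abs (4) + ASL A (2) + DEY (2) per bit; BNE taken 7x (3), not taken once (2)
    inner = 8 * (4 + 2 + 2) + (7 * 3 + 2)
    # LDA abs,X (4) + LDY #imm (2) + inner loop + INX (2) + BNE taken (3)
    return 4 + 2 + inner + 2 + 3

def unrolled_cycles_per_byte():
    # LDA abs,X (4) + 8 x STA abs (4) + 7 x ASL A (2) + INX (2) + BNE taken (3)
    return 4 + 8 * 4 + 7 * 2 + 2 + 3

for name, cycles in (("looped", looped_cycles_per_byte()),
                     ("unrolled", unrolled_cycles_per_byte())):
    total = cycles * 256
    print(name, cycles, total, round(total / NTSC_CYCLES_PER_FRAME, 2))
# looped 98 25088 0.84
# unrolled 55 14080 0.47
```

Both versions fit a full 256 byte page inside one NTSC frame, before counting any read-back verification.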
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99689)
I'm sure with good game design, you can make it seamless. Palette fade? Save on top of it since the game won't be playing. Screen switches? Write a page. Save point in your game? Make it save and then play a sound effect for the player to know. I'm sure you can find 1-10 frames in game play which you can reuse as a save point to seamlessly add it.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99712)
I really don't think this is an issue... People are used to messages like "saving the game, please don't turn the power off" being displayed for a few seconds and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99746)
tokumaru wrote:
I really don't think this is an issue... People are used to messages like "saving the game, please don't turn the power off" being displayed for a few seconds and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message.
If it takes longer than one frame, display a message anyways, just to make sure.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99747)
zzo38 wrote:
tokumaru wrote:
I really don't think this is an issue... People are used to messages like "saving the game, please don't turn the power off" being displayed for a few seconds and I honestly don't remember anyone complaining. And apparently this will be fast enough to not even require a message. As long as it takes less than 1 second, I don't think you need a message.
If it takes longer than one frame, display a message anyways, just to make sure.

Come on. It's actually BAD to display messages if the message won't be visible long enough for the player to see it properly.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99748)
Or use an icon, rather than a message. I know games that save extraordinarily fast and have a little SD card/floppy disk icon that appears in a bottom corner while it's saving.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99758)
Yeah, sort of like the little floppy disk that would blink in the corner when Doom 1 would stall for loading. Super Smash Bros. Melee has a "saving" icon in the corner as well.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99821)
If you have the SPI flash and a shift register, would it be hard to add a way to stream PCM or DPCM audio from the flash to the expansion sound pin? I guess 1-bit PCM would be trivial (just connecting the LSB of the shift register to a spare CPLD pin and then automatically retriggering the read command) but for 4 or 8 bit audio you would need another latch and more spare pins on the CPLD.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99822)
Grapeshot wrote:
If you have the SPI flash and a shift register, would it be hard to add a way to stream PCM or DPCM audio from the flash to the expansion sound pin? I guess 1-bit PCM would be trivial (just connecting the LSB of the shift register to a spare CPLD pin and then automatically retriggering the read command) but for 4 or 8 bit audio you would need another latch and more spare pins on the CPLD.


That's an interesting thought... I had only considered storing DPCM samples on the SPI, loading them to RAM and playing. But I'd think something like you're imagining could be possible as well assuming an EXP audio jumper/resistors were installed.

You'll have to forgive me, I'm not much of a sound buff but I am interested in the possibilities, so feel free to correct me on this stuff or suggest better solutions. I made the exp pins easily accessible by extending all the pins into the cart (don't have to chip away at the cart shell to access them). The CPLD that's going to handle the SPI flash should have a free pin that could be assigned to the task. Or if you were accepting of a 0-3.3v signal you wouldn't even need a CPLD pin, the SPI could be connected directly to the EXP pin.

So really I'd imagine doing it a little differently than using the SPI for game/save/graphics data. It could be set up to just run free, so after writing the command and address to the SPI via $5000 bit 7, reads would be automatically enabled (all this really means is the SPI needs to be continually clocked after the read cmnd/addr). And the SPI would just spit out the data stream until the chip was disabled by writing to $5001 with D7=1. You wouldn't even bother with the shift register, just let the flash stream bits on each clock pulse. I'm guessing 1.79MHz would be a little faster than desired for an audio stream. Instead of a shift register, a clock divider could be put in its place.

I'd guess you'd also want a low pass filter and could easily locate that in the perf area.

If there was logic to spare both the shift register and clock divider could be implemented at once. I'd just have to add another definition to $5001. Perhaps something like D6=0 divided clock bit stream to EXP pin, D6=1 byte feeding as discussed previously. D7 would still enable/disable the SPI which would stop either bit stream or byte feed reads.

EDIT: it wouldn't be required, but might be nice. The SPI's hold pin/function would basically act like a 'pause' for the bit stream. So you could stop the stream and pick up where you left off if control was given to that pin. Perhaps by D5 on $5001. We'll see how much logic and pins are available, but if desired this could be considered as well.
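Pulling the tentative $5001 bit assignments together, a decoder might look like this Python sketch. Nothing here is final: the D6 and D5 roles are only the ideas floated above, the polarity of the hold bit was never specified, and the field names are invented for the example.

```python
def decode_control(value):
    """Split a $5001 write into the fields floated in this thread (tentative layout)."""
    return {
        "flash_enabled": (value & 0x80) == 0,   # D7=0 takes /CS low, D7=1 takes it high
        "byte_feed": (value & 0x40) != 0,       # D6=1 byte feeding, D6=0 divided-clock bitstream to EXP
        "d5": (value >> 5) & 1,                 # proposed HOLD control; polarity not decided
    }

print(decode_control(0x40))  # {'flash_enabled': True, 'byte_feed': True, 'd5': 0}
print(decode_control(0x80))  # {'flash_enabled': False, 'byte_feed': False, 'd5': 0}
```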
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99830)
infiniteneslives wrote:
I had only considered storing DPCM samples on the SPI, loading them to RAM and playing.

Remember that DPCM samples can only be played from the $C000-FFFF memory area.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99854)
1-bit audio at 1.7 MHz would have decent quality (about equivalent to 8-bit at 22 kHz) but at that rate you would only get 10 seconds of audio from a 2MB chip. Oversampled audio is not very efficient to store uncompressed.

8-bit audio could be implemented without a latch by zeroing the outputs when shifting new data into the register. For around 44 kHz operation, that would mean loading a new byte every 40 cycles, and the DAC would be off 22.5% of the time. This would require a more aggressive low pass filter to be used for the sound quality to be acceptable, and probably an op amp to amplify the output of a resistor ladder DAC, but it gives 45 seconds of recording time at a higher quality than the 1-bit solution. That's plenty for drum and bass samples and maybe a few voice clips, at a quality about as good as raw PCM with no CPU usage required. Below 44 kHz it would become increasingly difficult to filter the waveform.
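Those figures can be sanity-checked with a few lines of Python. The sketch assumes an NTSC CPU clock of about 1.789773 MHz, a 2MB (2 x 1024 x 1024 byte) flash, and that the shift register is busy for 9 of every 40 cycles, which is where the 22.5% comes from.

```python
CPU_HZ = 1_789_773               # approximate NTSC CPU clock
FLASH_BYTES = 2 * 1024 * 1024    # the 2MB part discussed in this thread

one_bit_seconds = FLASH_BYTES * 8 / CPU_HZ   # 1-bit stream, one bit per CPU cycle
byte_period = 40                             # CPU cycles between 8-bit samples
shift_cycles = 9                             # cycles spent shifting in the next byte
sample_rate = CPU_HZ / byte_period
dac_off_fraction = shift_cycles / byte_period
eight_bit_seconds = FLASH_BYTES / sample_rate

print(round(one_bit_seconds, 1))     # 9.4
print(round(sample_rate))            # 44744
print(dac_off_fraction)              # 0.225
print(round(eight_bit_seconds, 1))   # 46.9
```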
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99857)
Grapeshot wrote:
1-bit audio at 1.7 MHz would have decent quality (about equivalent to 8-bit at 22 kHz) but at that rate you would only get 10 seconds of audio from a 2MB chip.

Yeah, you'd be trying to push near SACD quality (SACD is 1-bit audio at 64*44100 Hz = about 2.8 MHz) through an NES. But the overall technique of a delta-sigma DAC remains useful for an output stage.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99864)
Huh, it's possible to get acceptable 1-bit audio quality at a much lower sample rate than I thought. (I was thinking of oversampling on a normal DAC, not a delta-sigma bitstream.) If 1/8th of the NES clock frequency is acceptable for samples, that gives much more flexibility.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#99870)
thefox wrote:
infiniteneslives wrote:
I had only considered storing DPCM samples on the SPI, loading them to RAM and playing.

Remember that DPCM samples can only be played from the $C000-FFFF memory area.

Good point, I forgot about that. It shouldn't be much of an issue with the 128KB PRG RAM though; each bank would be fully mappable to $8000-FFFF in normal MMC3 style. You'd just have to map the bank to $6000-7FFF while loading from SPI, before placing it in $C000-FFFF. A PRG-ROM/WRAM variant, however, would require WRAM to be mapped to $C000-DFFF.

I wrote:
The CPLD that's going to handle the SPI flash should have a free pin that could be assigned to the task. Or if you were accepting of a 0-3.3v signal you wouldn't even need a CPLD pin, the SPI could be connected directly to the EXP pin.

Oh, I forgot something else though. The CPLD actually runs from a 3.3v supply; it's only 5v tolerant. So the output would be 0-3.3v without making use of an output buffer. But I'm guessing that isn't much of an issue, perhaps even preferred, assuming the EXP resistors are properly tuned. If not, a simple 5v inverter would be enough to step it up to 0-5v before filtering.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107232)
Sorry for another necropost, but I had a random idea:

What if there were a way to boot out of the SPI ROM, so that we don't need a separate parallel flash at all? Two ideas occur: 1- use a fast clock (external RC, crystal, NES's 21/26MHz) so that the CPLD could fetch a byte from SPI at the full instruction fetch rate, or 2- use some very simple tiny padding loop that will still allow the use of the 1.7MHz main clock, stuffed with some simple or repetitive byte sequence, possibly deliberately using bus conflicts.

Either way, this could load a small (256b?) "boot sector" which would then finish the rest of the game load "correctly".

It could also use way too much CPLD space for the cost savings, I don't know.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107298)
Oh, I don't consider it a necro, the idea is still alive and well in my mind at least :)

I actually had what I thought was a decent idea to ditch the parallel bootrom that just might work. I was holding back from rambling on about my harebrained idea, but since there are people apparently still interested in the idea I'll share.

You're right, with a small cheap 5v tolerant CPLD it would probably consume too much logic to handle starting up from SPI flash. And there is no longer such a thing as small cheap parallel ROMs currently in production. We've discussed the idea of using a micro-controller as ROM before, which I still believe would be prohibitively slow for random access and additional tasks. BUT a bootrom/bootstrap wouldn't have to be random access. You could have the mcu act as a ROM that loads the bootloader into NES SRAM with LDA/STA's and then jumps to the bootloader. The CPLD would have a 'startup' state which would blanket decode $8000-FFFF (or some subset) to the mcu. The bootloader in SRAM would then turn off the CPLD 'startup' mode and start pumping the SPI flash through the CPLD into (32-128KB) PRG-RAM.

So you could use a cheap little 50cent mcu as the 'boot-ROM' and keep all data on SPI flash. Have a simple CPLD mapper, and 32-128KB of CHR-RAM and PRG-RAM. That completely removes all parallel ROM, making parallel ROMs a thing of the past. ;)

There's all kinds of other things you could use the mcu for as well while still keeping it cheap. Things like USB programmability of the SPI flash, IRQ's, and sound to name a few...
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107318)
I'd almost recommend loading the first 512 bytes of the volume into $FC00-$FDFF, with some space reserved for the FDC descriptor and other fields in the PC volume boot record, so that a familiar file system can be used when loading files onto the SPI. Then that code can chain-load the game itself. To distinguish it from a boot sector for PCs, it could end with 02 65 instead of 55 AA.
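The signature idea can be sketched as a small check on the last two bytes of the 512-byte sector. The 55 AA and 02 65 values come from the post; the function name and return strings are hypothetical.

```python
SECTOR_SIZE = 512
PC_SIGNATURE = bytes([0x55, 0xAA])    # standard PC boot sector ending
NES_SIGNATURE = bytes([0x02, 0x65])   # ending proposed in the post

def classify_boot_sector(sector):
    """Classify a 512-byte boot sector by its final two bytes.

    Returns "pc", "nes", or "unknown" (hypothetical labels).
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("boot sector must be exactly 512 bytes")
    tail = bytes(sector[-2:])
    if tail == PC_SIGNATURE:
        return "pc"
    if tail == NES_SIGNATURE:
        return "nes"
    return "unknown"
```

A loader could refuse to chain-load anything that does not come back as "nes", so a volume carrying a PC boot sector is never executed as 6502 code.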
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107330)
tepples wrote:
I'd almost recommend loading the first 512 bytes of the volume into $FC00-$FDFF,


So I understand your recommendation: you're saying this descriptor would reside in the first 512 bytes of the SPI and describe the SPI volume. The bootloader would put that 512bytes into PRG-RAM @ $FC00-$FDFF so the program can reference it. I'm curious what uses this 512bytes would serve. Aside from size all that FDC descriptor stuff seems kind of useless on the NES.

tepples wrote:
To distinguish it from a boot sector for PCs, it could end with 02 65 instead of 55 AA.

cute :)

My thought was fairly simple for homebrew use. The bootloader would load the first 8KB of SPI flash into the PRG-RAM's fixed bank at $E000-FFFF and hand over control to the game with the reset vector. Only thing is, the first 512bytes of SPI wouldn't end up in $FC00-$FDFF, they'd probably be at $E000-E1FF. Game data could be arranged however the user chose, they'd just have to make sure it was loaded from SPI before making use of it. I figured it'd be nice to put a set of helper routines in there as well to abstract the SPI enough to keep the developer from needing to know much about SPI. They'd only have to specify which address and range of SPI to load where (or vice versa for saves) and jump to my subroutine.
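As a rough model of that boot flow, the sketch below copies the first 8KB of an SPI image into $E000-FFFF of a flat 64KB array and then reads the reset vector at $FFFC/$FFFD. The helper name and its argument order are assumptions, not the actual routine interface.

```python
FIXED_BANK_BASE = 0xE000
FIXED_BANK_SIZE = 0x2000   # 8KB fixed bank at $E000-FFFF

def load_range(spi_image, ram, spi_addr, cpu_addr, length):
    """Hypothetical helper: copy length bytes from SPI flash into RAM."""
    if spi_addr + length > len(spi_image):
        raise ValueError("read past end of SPI image")
    if cpu_addr + length > len(ram):
        raise ValueError("write past end of CPU address space")
    ram[cpu_addr:cpu_addr + length] = spi_image[spi_addr:spi_addr + length]

def boot(spi_image):
    """Load the first 8KB into the fixed bank and return (ram, entry point)."""
    ram = bytearray(0x10000)   # flat model of the CPU address space
    load_range(spi_image, ram, 0, FIXED_BANK_BASE, FIXED_BANK_SIZE)
    # The 6502 reset vector is stored little-endian at $FFFC/$FFFD.
    entry = ram[0xFFFC] | (ram[0xFFFD] << 8)
    return ram, entry
```

With this layout, SPI offset 0 lands at $E000 and offset $1FFC holds the low byte of the reset vector, which matches the observation that the first 512 bytes end up at $E000-E1FF.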
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107346)
infiniteneslives wrote:
you're saying this descriptor would reside in the first 512 bytes of the SPI and describe the SPI volume. The bootloader would put that 512bytes into PRG-RAM @ $FC00-$FDFF so the program can reference it. I'm curious what uses this 512bytes would serve.

Same as a PC boot sector: to load the main executable from the root directory (like NTLDR or IO.SYS on a PC or PRODOS on an Apple II) and jump to it. Using the standard size boot sector allows files to be copied on in the usual way.

Quote:
aside from size all that FDC descriptor stuff seems kind of useless on the NES.

The MCU is exposing the SPI flash to the PC as a mass storage device. The descriptor is so that the computer (which more likely than not runs Windows or Mac OS X) knows what to do with the volume rather than "helpfully" reformat it.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107486)
tepples wrote:
infiniteneslives wrote:
you're saying this descriptor would reside in the first 512 bytes of the SPI and describe the SPI volume. The bootloader would put that 512bytes into PRG-RAM @ $FC00-$FDFF so the program can reference it. I'm curious what uses this 512bytes would serve.

Same as a PC boot sector: to load the main executable from the root directory (like NTLDR or IO.SYS on a PC or PRODOS on an Apple II) and jump to it. Using the standard size boot sector allows files to be copied on in the usual way.

I'll have to learn up on boot sectors and such so I can figure out how this might work in this case. The bootloader wouldn't actually be in the SPI as I had planned, it'd be in the mcu's rom.

Quote:
Quote:
aside from size all that FDC descriptor stuff seems kind of useless on the NES.

The MCU is exposing the SPI flash to the PC as a mass storage device. The descriptor is so that the computer (which more likely than not runs Windows or Mac OS X) knows what to do with the volume rather than "helpfully" reformat it.

I didn't actually plan on making the SPI flash look like a mass storage device. I would set it up like I did the NESDEV1, where the mcu acted as a USB programmer for the SPI flash. So the OS formatting the 'drive' isn't a concern. The only use of the descriptor would be for the host PC's programmer application and for the NES itself. Something more along the lines of an iNES header seems more useful than an FDC descriptor.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107490)
So what device class would your MCU implement? And who would make drivers for Windows (32-bit), Windows (64-bit), Mac (32-bit), Mac (64-bit), Linux (32-bit), and Linux (64-bit), including the cost of getting the Windows (64-bit) driver signed? Using the mass storage class means the user will already have the driver installed.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#107493)
It'd use V-USB

The drivers are pretty easy to install, and pretty popular. I've bought several devices that use this library, and made many others including the kazzo. Their devices were successful using it, and I've been successful implementing it myself. I'm no guru on all the details of device classes, signed drivers, etc. It uses libusb drivers and I've never had an issue with them. Are they signed? I dunno, they just work so I never took the time to dig into such details. As far as I'm concerned it's problem solved. I don't think it's that big of a deal to ask the user to install a driver...

That and I find it a lot easier and more user friendly to have a little app that programs the cart. No copy pasting, or finding some input to allow data transfer to occur. After compiling your latest build you just click the program button.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#108190)
The MCU would need to feed the CPU enough of a boot loader to read the first sector from flash and execute it. This means it'd need to output on D7-D0 long enough to put a reset vector, a stream of A9 (program byte) 85 (address byte), and finally 4C 00 00 on the data bus before taking itself high-Z. How many inputs and outputs are on such an MCU?
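The byte sequence described here can be sketched as a stream builder: a reset vector, then one LDA #imm / STA zp pair per loader byte, then JMP $0000. The function name and the default vector are hypothetical, and the sketch assumes the loader is copied to zero page starting at $00, as the 85 and 4C 00 00 bytes imply. It ignores the extra bus cycles of the 6502 reset sequence.

```python
LDA_IMM = 0xA9   # LDA #imm
STA_ZP = 0x85    # STA zp
JMP_ABS = 0x4C   # JMP abs

def build_boot_stream(loader, reset_vector=0x8000):
    """Build the bytes an MCU would present on D7-D0, in read order.

    loader: bytes of the boot loader to place at $0000 upward (max 256).
    reset_vector: any address in $8000-FFFF (assumed decode range).
    """
    if len(loader) > 256:
        raise ValueError("zero page holds at most 256 bytes")
    if not 0x8000 <= reset_vector <= 0xFFFF:
        raise ValueError("reset vector must be in $8000-FFFF")
    stream = bytearray([reset_vector & 0xFF, reset_vector >> 8])
    for address, value in enumerate(loader):
        stream += bytes([LDA_IMM, value, STA_ZP, address])
    stream += bytes([JMP_ABS, 0x00, 0x00])   # jump to the copied loader
    return bytes(stream)
```

Each loader byte costs four bytes of stream, so a 256-byte loader needs roughly 1KB of MCU ROM plus the vector and the final jump.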
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#108197)
tepples wrote:
How many inputs and outputs are on such an MCU?

The beauty of it is you really don't need many i/o to have a mcu act as a non-random access ROM. All you'd need is 8 i/o for data and 1 i/o for enabling. You don't even bother to decode the address bus because you KNOW what the address will be, you're the ROM after all. You're telling the CPU what the next addresses will be, so you don't need the CPU to tell the same info back to you again. The bootrom doesn't contain any conditional code, which is what allows it to be non-random. There would be a fair amount of time needed to do this depending on how long the boot routine was, but this really shouldn't be a limit for the mcu. The only limit that really applies is how much ROM the mcu has, which is drastically more than necessary.

All the mcu has to do is watch its enable signal which could be as simple as PRG /CE. If it's low output the first byte and retrieve the next. When it goes high release the bus and prepare the next byte. When PRG /CE goes low again output next byte and repeat. You could even cheat and not release the bus when you know the CPU won't be writing to RAM on the following cycle. Shoot you could cheat even further if you wanted and output the same value that the CPU is putting on the data bus since you know what it's going to be. That makes it a little more complex though since you have to do a little more than watch PRG /CE.
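A toy model of that loop, sampling PRG /CE once per step: drive the current byte while /CE is low, release the bus and advance when it goes high. The class and method names are hypothetical, and real firmware would be a tight polling loop on the mcu rather than Python.

```python
class SequentialRom:
    """Models an MCU acting as a non-random-access ROM (sketch)."""

    def __init__(self, stream):
        self._stream = bytes(stream)
        self._index = 0
        self._driving = False

    def step(self, ce_n):
        """Sample PRG /CE (0 = asserted).

        Returns the byte driven onto the data bus, or None for high-Z.
        """
        if ce_n == 0:
            self._driving = True
            if self._index < len(self._stream):
                return self._stream[self._index]
            return None            # stream exhausted: stay off the bus
        if self._driving:
            # Rising edge of /CE: release the bus, prepare the next byte.
            self._driving = False
            self._index += 1
        return None
```

Note that the model never looks at the address bus, which is the point made above: the order of reads is fixed by the stream itself.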

The loop the mcu is performing is pretty small especially since it's not conditional aside from enabling to output on the bus. So timing should be pretty easy to satisfy. If I was able to emulate a '161 for bankswitching which had a couple conditional loops with a 12-16Mhz mcu this should be cake in comparison.

The mcu would start off by feeding a reset vector that really doesn't matter as long as it's in $8000-FFFF and execution stays in that range through the bootrom access. Then you just feed a sequence of LDA and STA's copying the boot routine from the mcu into RAM. Then jump to RAM and execute the whole boot routine. The routine would be an SPI access routine, fetching the game data and code from the mapper/SPI and dumping it into PRG-RAM.

My initial thought was to use an ATtiny20 with its 12 i/o's, which is plenty if you ONLY want it to substitute as a bootrom. The only thing you need is 8 i/o for data and one i/o for enabling the mcu. This could work for a published cart, but is limited and not easily converted to a devcart with USB programmability. You need 4 more i/o for USB (two for USB itself and two for a calibrated clock), which doesn't leave any i/o for enabling the mcu. It's conceivable to multipurpose the USB i/o when running on the NES and disconnected from USB, but you'd also have to multipurpose the data lines for the SPI interface through the CPLD. That'd be pretty limiting, and it doesn't allow you to move a lot of the logic from the CPLD to the mcu. The more logic that can be moved to the mcu the better, so we can save the CPLD logic for time critical things like bankswitching.

Because of that I've pretty much settled on the ATmega48/88/168/328 family, size depending upon how much ram/rom I need/want on the mcu. The ATmega88 should be plenty of space when running USB as well.

Here's my current i/o plan:
Code:
Port B:
(2x) USB
(4x) SPI
(2x) crystal/resonator

Port C:
(1x) enable signal from mapper
(1x) PRG R/W
(1x) Sound out to EXP6
(1x) IRQ
(1-2x) JTAG TCK for 1-2 CPLDs?
(1x) spare for mcu boot loader select perhaps?

Port D:
(8x) PRG Data
JTAG TDI, TDO, TMS connected to the data bus D0-2 to allow for reprogramming the CPLD(s) when the cart is disconnected from the NES which would multi-purpose these i/o. (EDIT:verified possible)


So this would take utilizing the mcu to another level. Pairing the CPLD(s) and mcu together should allow for some pretty sophisticated mapper features with a small amount of low cost hardware in comparison to my NESDEV1 project. Something like this would actually be cost effective enough to produce hardcopies for a homebrew game. And if the sophisticated mapper wasn't desired because something closer to a discrete mapper was enough, you could cut back to the ATtiny20 as a bootrom. While the mapper does cost more in comparison to a cheap little discrete mapper, something like this allows you to have HUGE amounts of ROM, giving you a lower total cost when you consider how expensive large parallel 5v ROMs are in comparison to SPI flash.

Some of the features like USB programmability, mapper JTAG reflashing, and such are more tailored towards a devcart version. But having the mcu at the ready could really open up a large amount of features to a homebrew game. The CPLD would do the address decoding for the mcu, simply sending it a mcu enable signal, and the mcu could sense PRG R/W. So my plan is to implement a set of opcodes for the mcu which would be accessed at $5000/5001.

I've still got some details to iron out but there's a ton of potential here. Moving SPI access to the mcu will be a little more tricky but frees up logic in the CPLD quite a bit. Aside from tossing a synth in the mcu you could also put things in there like IRQs, a hardware multiplier, etc. Anything that fits well in an opcode-operand format and doesn't require immediate processing should be feasible. You'd probably have to leave the mcu alone for a while as it goes and performs whatever operation you just asked of it. For longer operations like fetching SPI data it might be fitting to just have it fire an IRQ when it's ready to pump out data. That way the CPU can perform other stuff at the same time, truly making it a co-processor. :D While all this is going on you'd still have the CPLD handling bankswitching, scanline/cycle counter, etc., which is all the stuff the mcu doesn't have the time, i/o, or speed to take care of.
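To make the opcode-operand idea concrete, here is a toy model of such an interface. Everything specific in it is an assumption for illustration: the opcode number, the split of $5000 as the opcode register and $5001 as the operand/result register, and the multiply operation itself.

```python
OPCODE_REG = 0x5000     # assumption: opcode written here
DATA_REG = 0x5001       # assumption: operands written / results read here
OP_MULTIPLY = 0x01      # hypothetical opcode: 8-bit x 8-bit -> 16-bit

class McuCoprocessor:
    """Toy model of an mcu co-processor behind two registers."""

    def __init__(self):
        self._opcode = None
        self._operands = []
        self._results = []

    def write(self, addr, value):
        value &= 0xFF
        if addr == OPCODE_REG:
            # A new opcode discards any unfinished operation.
            self._opcode = value
            self._operands = []
            self._results = []
        elif addr == DATA_REG:
            self._operands.append(value)
            self._execute_if_ready()

    def read(self, addr):
        if addr == DATA_REG and self._results:
            return self._results.pop(0)
        return 0

    def _execute_if_ready(self):
        if self._opcode == OP_MULTIPLY and len(self._operands) == 2:
            product = self._operands[0] * self._operands[1]
            self._results = [product & 0xFF, product >> 8]  # low byte first
            self._operands = []
```

On real hardware the result would not be available immediately; the CPU would either wait a known number of cycles or, for longer jobs, respond to the IRQ mentioned above.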

Here's the best part: the cost of the hardware is actually reasonable, one might even consider it free when compared to a comparable setup. Consider having a MMC3 with 32KB CHR RAM, battery backed WRAM, and 512KB PRG-ROM. Swap the parallel PRG ROM and battery backing for the mcu, 128KB PRG-RAM, and 1MB of SPI flash for around the same cost. Basically you get twice the ROM space, hassle free saves, and all the complex mapper features for FREE. Want more ROM space? You don't have to worry about supporting a bigger mapper, just buy larger SPI flash. Double it (2MB) for less than 50cents. Quadruple it (4MB) for less than 75cents.

Even if you didn't care for all the complex mapper stuff and just wanted to use an ATtiny20 as a boot rom with function comparable to a discrete mapper from a small CPLD, it still compares well. Against UOROM, the hardware difference is less than $2 to get 1MB flash, 128KB PRG-RAM, and saves. You'd probably pay the same amount to add battery backed WRAM to a UOROM style mapper...

Now I'm really starting to get excited about the potential 8-)
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#108202)
I tried a MCU only approach on SMS for banking a parallel Flash, and the 16MHz ATmega is not gonna make it unless it adds a waitstate for every Z80 cycle. 20MHz chip would have no problems however.... but no time whatsoever for any other task than listening bus and receiving and putting bank values...
NES has ~half the bus speed though, so that will tremendously help.
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#108205)
Even with the NES at half the bus speed PRG bank switching isn't even half the game. CHR is where the complex bankswitching is really going on which is ~3 times as fast, that's a real deal breaker for mcu bankswitching especially if you're trying to do both PRG & CHR. That's why I like this approach where the mcu has nothing to do with bankswitching. Bankswitching really only belongs in the hands of a CPLD.
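The timing argument in the last two posts can be put in numbers by counting how many mcu clock cycles fit into one bus cycle. This is a sketch with assumed clock figures: NTSC values for the NES CPU and SMS Z80, and the "~3 times as fast" figure for CHR taken at face value as three times the CPU clock.

```python
NES_CPU_HZ = 1_789_773        # NTSC NES CPU clock
SMS_Z80_HZ = 3_579_545        # NTSC SMS Z80 clock, about twice the NES CPU
NES_CHR_HZ = 3 * NES_CPU_HZ   # assumption: the post's ~3x figure for CHR

def mcu_cycles_per_bus_cycle(mcu_hz, bus_hz):
    """How many mcu clocks elapse during one cycle of the target bus."""
    return mcu_hz / bus_hz

budgets = {
    "16MHz mcu vs SMS": mcu_cycles_per_bus_cycle(16_000_000, SMS_Z80_HZ),
    "20MHz mcu vs SMS": mcu_cycles_per_bus_cycle(20_000_000, SMS_Z80_HZ),
    "16MHz mcu vs NES PRG": mcu_cycles_per_bus_cycle(16_000_000, NES_CPU_HZ),
    "16MHz mcu vs NES CHR": mcu_cycles_per_bus_cycle(16_000_000, NES_CHR_HZ),
}
```

Under these assumptions a 16MHz mcu gets about nine of its own cycles per NES CPU cycle but only about three per CHR-side cycle, which is too few to sample the bus and respond, consistent with leaving bankswitching to the CPLD.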
Re: INL-ROM custom MMC3 hybrid mapper design
by on (#108455)
This project looks great!! Me personally I've been looking for a MMC3 tkrom 256kchr / 512kprg / 8kwram cart I could buy both for testing on the actual NES and to make carts of my homebrew once it's done.
Powerpak seems to have some unique issues.