I just finished a small project which uses the free / open source digital logic simulator
Logisim to simulate the MMC1 chip.
The Logisim file is uploaded here.
For starters, you are supposed to shift bits into the 4 registers by toggling the ROMSEL input (while R/W is low, to simulate the 6502 CPU writing to $8000-$FFFF), and then you can see how the outputs react to different combinations (different address ranges, etc.).
It's probably not 100% perfect, but it should do most things like the real chip.
The only part of the chip which is lacking is something that sets all latches when the chip is powered up.
That's pretty slick, thanks for posting. I'd never heard of Logisim before. How long did it take to put together the MMC1 layout?
Nice work!
I've been meaning to do something like this, I'm surprised no one has yet. Or at least it hasn't been made available. Some images of the logic would seem to be a great add to the wiki.
I'll definitely use this as a starting point for when I'm replicating the MMC1 on a CPLD for the NESDEV1 cart. I'll be sure to share the results of actual testing on the NES. If I get around to it I would want to do the same for MMC3.
The only thing about this circuit is that it doesn't simulate the "ignore 2 consecutive writes" behavior of the MMC1.
I believe this comes from the fact that, on the 5th write, the last bit is shifted into the shift register and the shift register is copied to one of the 4 MMC1 regs simultaneously. It works fine in Logisim, but in reality it's a race between the MMC1 regs and the shift register over which updates first. If the shift register updates first then all is fine, but if you're unlucky and the shift register hasn't finished updating when the MMC1 reg is clocked, the reg would load a wrong value.
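The race described here can be illustrated with a small behavioral sketch. It assumes the 5-bit shift register arrangement from the post; the function name `fifth_write` and its flag are hypothetical, and real hardware timing is of course not modeled.

```python
def fifth_write(shift5, d0, register_samples_first):
    """Return what a data register captures on the fifth write.

    shift5: 5-bit shift register contents after four writes.
    d0: CPU D0 on the fifth write.
    register_samples_first: True if the data register latches before the
    shift register has finished its final shift (the "unlucky" case).
    """
    # Shift right by one and insert the new bit at the top (bit 4).
    shifted = (shift5 >> 1) | ((d0 & 1) << 4)
    return shift5 if register_samples_first else shifted


# Writing 0b10110 LSB first: after four writes the register holds 0b01100.
good = fifth_write(0b01100, 1, register_samples_first=False)
bad = fifth_write(0b01100, 1, register_samples_first=True)
print(f"{good:05b} {bad:05b}")
```

When the register samples too early it keeps the stale four-write contents, which is the wrong value the post is worried about.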
To fix this I believe there is some logic that adds a 1-clock delay before the MMC1 registers actually get updated, and that an additional shift register write in this state would not have any effect.
This requires some additional latch that would be 1 whenever the MMC1 regs should be clocked on next M2 clock (instead of simply being clocked when the counter equals '5'), and blocks writes when in '1' state, but also PRG A13 and A14 should be latched for the same reason.
Or alternatively, all 4 MMC1 regs should have this delay latch, and shift register writes should be blocked if ANY of those 4 latches is set to '1'.
I'll see if I can implement this in logisim.
Also this is pure speculation, and how the chip is actually implemented might be completely different.
Finally I'd say it's sad I can't test this on hardware but oh well. It sucks that the PowerPak is such a private, closed source device. It could almost be an iPowerPak.
Bregalad wrote:
It sucks that the PowerPak is such a private, closed source device. It could almost be an iPowerPak.
I guess the "almost" is that if it were really an iPowerPak, the firmware would act like the Atari 7800 firmware: .nes files would have to be either A. signed by bunnyboy or B. signed with a certificate that costs $99 per year to buy from bunnyboy.
Actually that's not how it works, I thought so too and I couldn't get a working MMC1 for the life of me. Loopy posted his shift reg logic though and he was clever enough to realize that each register's MSB comes directly from D0 (so it's a 4-bit SR).
If you're doing things at the gate level there is some room for improvement and simplification. Probably the real thing uses clock gating and latches.
Also the PowerPak isn't entirely closed source since Bunnyboy released an example mapper ages ago. If you tried it wouldn't be hard to make a full schematic of it to understand the boot logic. At that point the only thing missing would be an open firmware.
Well it seems this is pretty correct except for one thing.
The 4-bit shift register on the bottom apparently shifts on all writes, even during the 5th write - so the problem I mentioned in my second post still happens.
I think there should be a way to hold the 4 bits in the shift register and copy its content into the actual register on the 5th write without it shifting at the same time. This is only a matter of an additional logic gate though.
EDIT : I've updated the Logisim file so that it now uses only a 4-bit shift register, and blocks the shifting on every 5th write, when the data should be transferred to a true register.
Also something looks wrong to me in Kyuusaku's circuit : When you write something with D7 set, the counter will reset to 0, which is the same as what happens after a 5th write. In other words the first write is done at the same time as D7 is set, while on a real MMC1 the first write is the next one after a write with D7 set. I had to be very careful with this in my circuit : I load '1' in the counter after the 5th write, and reset the counter to '0' only if a write with D7 set is done.
I couldn't set it to '5' because if this was done data would have been copied to a register which is not what we want !
Therefore the additional '0' state is required, and is accessible only by writing something with D7 set.
On the 5th write the shift register shifts, but the 8/A/C/E data register is updated more or less at the same moment so the final shift won't corrupt the register. Using flip flops with enable instead of clock gating with a decoder will ensure that the data register is updated safely, but the timing isn't that critical and I really doubt a chip designer back then would use so many gates.
On the other side I think it's really unreliable to load a bit in a latch at the same time this bit is potentially changing. This is the kind of stuff which can work very well one day and not work at all another day if you changed a chip or something like that.
In an IC like the MMC1 maybe there is a way to control the circuit more precisely and make sure it behaves as intended, by placing the transistors carefully on the die, but I'm not even that sure.
Blocking the shifting on the 5th write is necessary, but it only takes a single gate to add this functionality so I guess it's not "terribly more logic".
If the data registers are clocked solely by /ROMSEL (fully synchronous) there are no stability issues, all sequential logic is like this including the shift register. Everything relies on the propagation of the asynchronous circuit and master latch to be longer than the clock inverter to switch from the master to slave latch in a D-FF. This stability is compromised in my example since in addition to the clock inverter inside the register, the decoder's propagation has to be taken into account, but it should still be safe. It always uses less logic to gate the clock or use delays, but I think in practice that's even less predictable.
You are right, shift registers are entirely based on this "unstable" behavior, so it seems it will work if both latches are clocked simultaneously.
Quote:
This stability is compromised in my example since in addition to the clock inverter inside the register, the decoder's propagation has to be taken into account, but it should still be safe.
What makes you think it should still be safe ? There is no evidence for this.
Also what about the reset mode (D7=1) bug I mentioned in my older post ?
Unless a particular game is using the reset mode bug, does the reset mode bug really matter enough to be worth emulating? What does exploiting the bug save you in a homebrew game, one byte?
No, no that's not what I was talking about.
I was saying Kyuusaku's circuit would bug when you use Reset, as the reset write will act as the "first" shift register write, and the next write will be considered the second, when it is supposed to be the first. Either it's me who misunderstood a part of the circuit, or this bug actually exists, and this would break *all* MMC1 games, homebrew or not.
As I understand it, the MMC1 front end works as follows:
- A reset (CPU D7 = 1) MUST copy 1 into the PRG bank mode bits of the $8000 latch, changing it to fixed-$C000 mode.
- A first through fourth write that is not a reset (CPU D7 = 0 and shift register D0 = 0) MUST shift CPU D0 right into a 5-bit shift register.
- A fifth write that is not a reset (CPU D7 = 0 and shift register D0 = 1) MUST copy D4-D1 from the shift register to D3-D0 of the latch selected by A14-A13 and MUST copy CPU D0 to D4 of that same latch.
- A reset or a fifth write (CPU D7 = 1 or shift register D0 = 1) MUST reset the shift register to a value of 10000 afterward.
- Behavior of writes in consecutive CPU cycles (INC, DEC, ASL, LSR, ROL, ROR) is probably officially unspecified, but in Nintendo's chip, the second write appears to be ignored, and a handful of games rely on this.
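The first four rules above translate fairly directly into a behavioral model. The sketch below is one possible reading, not a gate-level description: the class name `Mmc1FrontEnd`, the helper `write_serial`, and the all-zero initial register values are assumptions made for illustration, and the consecutive-write rule is deliberately left out.

```python
SR_EMPTY = 0b10000  # marker bit in D4; it reaches D0 after four shifts


class Mmc1FrontEnd:
    def __init__(self):
        self.shift = SR_EMPTY
        # Latches at $8000, $A000, $C000, $E000 (initial values are arbitrary here).
        self.regs = [0, 0, 0, 0]

    def write(self, addr, data):
        """Model one CPU write; writes below $8000 do not reach the mapper."""
        if addr < 0x8000:
            return
        if data & 0x80:
            # Reset: force the PRG bank mode bits of the $8000 latch to 1.
            self.regs[0] |= 0x0C
            self.shift = SR_EMPTY
        elif self.shift & 1:
            # Fifth write: shift register D4-D1 become latch D3-D0, CPU D0 becomes D4.
            value = ((data & 1) << 4) | (self.shift >> 1)
            self.regs[(addr >> 13) & 3] = value  # latch selected by A14-A13
            self.shift = SR_EMPTY
        else:
            # First through fourth write: shift CPU D0 in from the left.
            self.shift = (self.shift >> 1) | ((data & 1) << 4)


def write_serial(mapper, addr, value):
    """Send a 5-bit value LSB first, as five separate writes."""
    for i in range(5):
        mapper.write(addr, (value >> i) & 1)


m = Mmc1FrontEnd()
write_serial(m, 0xE000, 0b10110)
print(f"{m.regs[3]:05b}")
```

Because the marker bit doubles as the write counter, no separate counter state is needed in this model.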
It appears you were talking about the fact that a reset doesn't shift the shift register afterward. I was talking about the last point, the chip appearing to ignore consecutive writes.
You know, your five points sound complex to a non-native English speaker like me. Why can't you keep things simple ?
(I haven't emulated the reset affecting reg0, nor did I emulate the behavior where 2 consecutive writes are blocked, and neither does kyuusaku's circuit; this is too complex for now)
I don't think it matters at all whether the shift register is affected when you do a reset write with D7 set.
Affecting it in any way should have absolutely no effect, as all the bits will be shifted out before being used, so I guess what requires less logic is to just shift D0 in as a normal write, even if this bit will end up unused. (This is what both my Logisim circuit and kyuusaku's circuit do.)
However, what DOES matter is that none of the 4 MMC1 regs are updated on a "reset" write, and that they get updated 5 writes AFTER the reset write, NOT including the reset write itself. Then, they should be updated 5 writes after that, etc...
In my Logisim circuit I do this with a 0->5 counter, and the registers get updated when the counter reaches 5. The only way to access state '0' is by resetting the shift register; after the 5th write it returns to '1'.
In kyuusaku's circuit, he does this with a 0->4 counter, and both resetting the shift register and doing a final write will reset the counter to '0'.
The problem is that I think this will not work, because after a reset, the regs will be updated after 4 writes instead of 5, which would lead to disastrous results.
I hope I made it clear now what is the "bug" I was talking about... And I am sorry if I misunderstood something about his circuit.
Bregalad wrote:
You know, your five points sound complex to a non-native English speaker like me. Why can't you keep things simple ?
I was trying to write them in a way that one can easily translate into an HDL, and I used "must" to mean "MUST" as defined in RFC 2119.
Quote:
I don't think it matters at all whether the shift register is affected when you do a reset write with D7 set.
At the very least, after a reset write, the shift register must be ready to take a first write, not a second, third, fourth, or fifth write.
Quote:
Affecting it in any way should have absolutely no effect, as all the bits will be shifted out before being used
Doing it the way I said allows the shift register and the 0->5 counter to be implicit. This saves two bits of internal register vs. a separate 4-bit shift register and 0->5 counter. If you want, I can draw you a picture of how this method works.
Quote:
However, what DOES matter is that none of the 4 MMC1 regs are updated on a "reset" write
Not even the PRG bank mode bits of $8000? I'm under the impression that resetting the MMC1 forces the last bank into $C000-$FFFF. But you're right that a reset never copies five bits from the shift register into one of the four registers.
Oh I got it, you suggested yet another method of simulating the register that would remove the counter completely, at the price of a 5-bit register holding one "enable" bit (as in a Johnson counter) and 4 data bits, while the 5th bit would come directly from D0.
This might actually be pretty simple to implement in hardware even using discrete chips; you'd probably just need an 8-bit SR and use only 5 of its bits, as only the 7496 seems to be a 5-bit SR, and it hasn't been re-released in HC form, so this chip is old, rare, expensive, and power-hungry.
But again I'm afraid the problem of a "natural reset" after a 5th write vs a "forced reset" after a write with D7 set would be a bit tricky to implement.
Resetting R0 to some value is as simple as triggering its "parallel load" mode with a constant when you detect a write with D7 set. I'll definitely add this to the next version of my Logisim circuit.
This should also happen on power on by the way, at least for MMC1A and higher I think.
Bregalad wrote:
5-bit register holding one "enable" bit (as in a Johnson counter)
"Johnson counter": That's the word I was looking for. It's closer to an Overbeck counter according to Wikipedia, but now I know what to call the carry-as-counter technique I use in my controller reading code and my 16-bit binary to decimal conversion code.
Quote:
But again I'm afraid the problem of a "natural reset" after a 5th write vs a "forced reset" after a write with D7 set would be a bit tricky to implement
Parallel load 10000 enabled by (CPU D7 OR shift register D0) perhaps?
Quote:
This should also happen on power on by the way, at least for MMC1A and higher I think.
But how does the mapper detect power on?
Maybe a power-on delay? With it the mapper could detect whether it has been powered up or merely reset, by checking against the power-on delay.
Ack! too much to read!
In my example after a reset it does take *5* clocks.
reset clock -> counter is 0
clock #1 -> counter is 1
clock #2 -> counter is 2
clock #3 -> counter is 3
clock #4 -> counter is 4 (now data register is selected by the counter's bit 3)
clock #5 -> data register is loaded and counter is reset simultaneously
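The sequence above, and the 0->5 counter described earlier in the thread, can be compared with a short behavioral sketch. Both functions are hypothetical models written for this comparison; they assume each counter starts in the state it would have right after a reset write, and they only track which write triggers a register load.

```python
def kyuusaku_counter(writes):
    """0..4 counter with synchronous reset; returns indices of loading writes."""
    count, loads = 0, []
    for i, data in enumerate(writes):
        if data & 0x80:      # reset write: counter returns to 0, nothing is loaded
            count = 0
        elif count == 4:     # fifth write: load and reset on the same clock
            loads.append(i)
            count = 0
        else:
            count += 1
    return loads


def bregalad_counter(writes):
    """0..5 counter; state 0 is reachable only through a reset write."""
    count, loads = 0, []
    for i, data in enumerate(writes):
        if data & 0x80:
            count = 0
        else:
            count = 1 if count == 5 else count + 1
            if count == 5:   # registers update when the counter reaches 5
                loads.append(i)
    return loads


plain = [0] * 10                      # ten ordinary writes
with_reset = [0, 0, 0x80] + [0] * 6   # reset after two writes
print(kyuusaku_counter(plain), bregalad_counter(plain))
print(kyuusaku_counter(with_reset), bregalad_counter(with_reset))
```

In this model both schemes load on the fifth write after a reset, which is the point of the synchronous reset: the counter value 4 is still visible during the fifth clock.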
Kyuusaku, thanks for the clarification, apparently the trick is to use a synchronous reset instead of an asynchronous one.
I've uploaded an updated circuit that reflects those changes :
http://dl.dropbox.com/u/23465629/logisim/MMC1_logism.circ
For some reason I had the stability problem where the MMC1 regs loaded the data after the shift register had already shifted, so I had to block the shifts during the 5th write.
What makes you so sure your circuit (using 74 series chips) will be stable on this aspect ?
I'm not sure it will work well, there is a race condition between the decoder disable speed and the shift register propagation + data register setup time. The decoder should win this race because disable is fast, but it's possible it won't. The point is that this logic is correct, and the design can definitely be made stable by making all registers synchronous to /ROMSEL. To do this the data registers need to be "enable" type master-slave flip-flops with a MUX selecting D or Q. Perhaps Nintendo took this route, but it uses a lot of gates / bit.
Quote:
The point is that this logic is correct, and the design can definitely be made stable by making all registers synchronous to /ROMSEL. To do this the data registers need to be "enable" type master-slave flip-flops with a MUX selecting D or Q. Perhaps Nintendo took this route, but it uses a lot of gates / bit.
I'm not sure what you are talking about, but what about just blocking the shift register from shifting when the registers are about to load, like I do in my Logisim circuit ? I think it should be as simple as OR-ing either /ROMSEL or R/W with Q4 before clocking / loading the shift register. The only con is that it requires an additional OR gate, but stability is guaranteed.
OK I tried the program and looked at the design.
You're right, you can gate the shift register with Q2, but if everything is synchronous to /ROMSEL it shouldn't be necessary, that's all I meant. And since you're using flip-flops with enable inputs already there's no reason for this section of the design to not be synchronous.
Right now both "good" high-gate components like DFFE are being used alongside "bad" low-gate practices of clock gating, so it's like a confusing mix of paradigms. Clock gating doesn't matter much here because the timing isn't very strict on a 6502 since it doesn't drive the bus during Phi1, but in some other system the data registered will be invalid.
Something I noticed though is that you're asynchronously clocking the data registers using the counter's Q2. This won't work because they will be loaded on the 4th clock, not the 5th. In my diagram the decoder is enabled by Q4, /ROMSEL and R//W so the registers will be asynchronously strobed during Phi2 of the 5th clock. As soon as /ROMSEL is deasserted (Phi1) the registers are loaded so there is minimal delay. This is the standard practice when asynchronously clocking registers, but as you noticed it could have problems without precautions.
OK so I guess to fix this, I have to add some logic that clocks the registers only when Q2 is high AND ROMSEL and R/W are low, so that I'll be able to get rid of the enable on the shift register.
I'll do that as soon as I'm back home.
EDIT : I've fixed things to work like Kyuusaku's circuit, using the enable pin on the address decoder, and apparently it works well. I also made it so that Reg0 is OR-ed with 0xC when a shift register reset is executed (it was quite an annoying thing to do in fact). I'll now be working on an MMC2.
EDIT 2 :
Now here are the MMC2 and MMC4.
(The only differences between the two are SRAM support and PRG banking.)
You are supposed to write the registers by setting R/W to zero and toggling !ROMSEL as usual, but this time the registers are directly loaded in parallel. The CPU address will determine which register is written to.
At the bottom reside the infamous CHR latches. On this mapper, only two of the 4 CHR registers are used at a time; CA12 decides it as usual, but there is also one latch for each pattern table (left / right) deciding which CHR register (A or B) is used.
You set a latch by driving the PPU address bus to $xFD or $xFE and toggling CHR !RD. Two (negative) read pulses are necessary, to simulate the PPU fetching two pattern table bitplanes. This is why there are two latches in series, so that the actual switching only occurs on the next read after the first switch.
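The two-latches-in-series idea can be sketched as a two-stage pipeline per pattern table. This is only one possible reading of the description above: it assumes "$xFD / $xFE" means PPU address bits 11-4, that both stages are clocked by the same read pulse, that index 0 stands for the register chosen by $FD and 1 for the one chosen by $FE, and the class name `Mmc2ChrLatch` is hypothetical.

```python
class Mmc2ChrLatch:
    def __init__(self):
        self.pending = [0, 0]  # first latch in series, one per pattern table
        self.active = [0, 0]   # second latch in series, drives the bank select

    def chr_read(self, ppu_addr):
        """Model one CHR /RD pulse; return the register index used for this read."""
        table = (ppu_addr >> 12) & 1        # CA12 picks left or right pattern table
        selected = self.active[table]       # selection seen during this read
        # At the end of the pulse the second stage takes the first stage's value...
        self.active[table] = self.pending[table]
        # ...and the first stage captures a new value if a trigger tile is addressed.
        tile = (ppu_addr >> 4) & 0xFF
        if tile == 0xFD:
            self.pending[table] = 0
        elif tile == 0xFE:
            self.pending[table] = 1
        return selected


latch = Mmc2ChrLatch()
first = latch.chr_read(0x0FE0)   # first bitplane of tile $FE, left table
second = latch.chr_read(0x0FE8)  # second bitplane of the same tile
third = latch.chr_read(0x0000)   # next tile fetch sees the new selection
print(first, second, third)
```

Under these assumptions both bitplanes of the trigger tile still come from the old bank, and the switch becomes visible on the following read.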
The only thing I find weird is that CA0-CA3 and M2 are physically connected to the MMC2 but I didn't find any use for those signals.
EDIT 3 : I suddenly remembered posts about the inner MMC2 schematics being discussed here loooong ago. I'll compare my schematics with Nintendo's and look for any differences.
Here comes the MMC3.
I haven't stress-tested the simulation yet; there might be a few bugs, especially around the scanline counter, but it gets the job done for now.
You're supposed to write the registers as usual by toggling /ROMSEL, but this time there are a lot of different latches for PRG and CHR banks; this is a bit spaghetti and I'm sorry about that.
(I purposely left out the circuit which ignores quick A12 edges during sprite fetches to keep things simple).
Nice work!
I'm looking forward to using this as a reference to write these up in Verilog and test everything on the actual hardware. I've got a CPLD large enough for MMC1 right now and will have something big enough for the MMC3 once I get the NESDEV1 prototype together early next year.
The real issue for me is time, I have high hopes of emulating the MMC1 during break next month but we'll see.
EDIT: now all that's left is the MMC5...
Well I've been playing around with this lately. Thanks again for your designs Bregalad. I've got what I think is a pretty solid design right now. It's a bit different than yours in regard to the front end. Instead of gating and generating clock signals (which only results in troubles in my experience) I'm clocking everything with M2 and generating enable signals for all the registers with a state machine. I know it's more logic and everything but it's the only clean way to do it in my mind. And what makes you guys think that Nintendo was trying to reduce the number of gates and logic? I doubt it would have reduced die size by much, which is all they really would have cared about.
Checking for consecutive writes is also easy to do when your state machine is driven by the CPU's clock. After drawing most of the waveforms out, I'm curious if anyone has found a reason Nintendo's design ignores the second write. There are only two possibilities as I see it: either A) there was a minimization they made that had this side effect, which they didn't care about, or B) they intentionally put it in there as a means of security or anti-piracy, hoping that pirate designs wouldn't easily realize the logic was there, so that ROMs could use it to help minimize pirating of their games. To some degree that is the only thing that really makes sense to me. Why else would the few games that double write do it? They had to know the hardware would respond like it does. Chances are Nintendo would have told them that it would ignore the second write, unless they found out accidentally. Unless they were somehow oblivious to the consecutive writes, which doesn't make much sense; you'd think they would know the 6502 pretty well if they were developing on it.
I don't see the minimization that supports A) after drawing out all the signals. The only thing I can see is that PRG R/W never goes high between the two writes (I think, hard to tell on my analog oscope). In order to sense that or check CPU clock cycles, it seems to me that they would have had to add logic to check for the double write. Which makes me think Nintendo put it in there on purpose, supporting B). Not really surprising when you consider all the effort they put into the CIC. What do you guys think?
Anyways I've got it coded up in verilog but I need to write up a testbench to debug and validate my design so I can test it out on the NESDEV1. I'll post progress as I make it.
I'm pretty sure it's A).
I mean, I don't see how ignoring the second consecutive write would prevent anyone to pirate MMC1 games.
Using the inc $xxxx instruction on a ROM address that contains a value of $80 or higher seems to be the standard way to reset the MMC1; almost all games I've checked do it this way.
I think the fact that some games do the tricky double write, by doing an inc instruction on a location that contained $ff, was pure chance, and that since it worked they kept it this way and didn't even realize a $00 was written to the MMC1 but ignored.
Bregalad wrote:
I'm pretty sure it's A).
But how did they implement the logic to ignore the second write? For A) to be true there would presumably have to be less or the same amount of logic to have it ignore the second write. But from what I see, I'm unable to find a solution that will ignore the second write without adding more logic. And if they added more logic to ignore the second write then they would have had to have done it on purpose. Nintendo had to have at least known about it if INC with an initial value of $FF is used so commonly to reset the MMC1. They would have to have known that the first write would reset it and the second would be ignored.
I'm sure there is a possible logic solution that could support A) I just don't see it, so it leaves me to doubt that it wasn't intentional to ignore the second write.
Just remember that my Logisim implementation might be very different from the true logic that sits inside an MMC1 die.
I also fail to see how ignoring the second write would take less logic, but only a decap of the MMC1 will tell us.
Bregalad wrote:
I also fail to see how ignoring the second write would take less logic, but only a decap of the MMC1 will tell us.
Or just some tricky test equipment to test things without using the NES
I got motivated to check into this more from the following post: http://nesdev.com/bbs/viewtopic.php?p=91512#91512, but wanted to follow up with the discussion on this thread also.
There are more details there about how I figured all this out, but I'm now certain that Nintendo ignored the consecutive writes intentionally. I'm also pretty sure they did it with a simple state machine clocked by M2. It will only allow a write if PRG R/W was high on the last CPU clock cycle.
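The rule stated here (a write is only allowed if PRG R/W was high on the previous CPU cycle) fits in a few lines. The sketch below is a hypothetical model of that rule, not the actual chip logic; it assumes the filter starts out as if the previous cycle had been a read, and it represents each M2 cycle as 'R' or 'W'.

```python
def accepted_writes(cycles):
    """Return the indices of write cycles that the filter lets through.

    cycles: sequence of 'R' or 'W', the state of PRG R/W on successive M2 cycles.
    """
    prev_was_read = True  # assumed initial state
    accepted = []
    for i, rw in enumerate(cycles):
        if rw == 'W' and prev_was_read:
            accepted.append(i)
        prev_was_read = (rw == 'R')   # one bit of state, updated every cycle
    return accepted


# INC absolute on a 6502: three fetches, one data read, then two writes
# (the unmodified value first, the incremented value second).
inc_abs = "RRRRWW"
# Two separate STA absolute instructions: each is three fetches and one write.
two_stores = "RRRWRRRW"
print(accepted_writes(inc_abs), accepted_writes(two_stores))
```

With an INC on a location holding $FF, the accepted write is the first one ($FF, a reset) and the $00 that follows is dropped, which matches the behavior discussed earlier in the thread.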
Aside from that I think you might have an error in your MMC1 schematic. When I tested your PRG switch in my verilog testbench PRG mode1 and mode2 (H and F based on Kevtris' docs) they were backwards. I may have just misinterpreted your schematic but that's how it appeared to me.
So much talk about MMC1, it's frickin MMC1!
kyuusaku wrote:
So much talk about MMC1, it's frickin MMC1!
Amen! I really don't see why so much interest in a mapper that can do little more than the discrete logic ones can and must have its registers written to in such a clumsy/slow way.
tokumaru wrote:
kyuusaku wrote:
So much talk about MMC1, it's frickin MMC1!
Amen! I really don't see why so much interest in a mapper that can do little more than the discrete logic ones can and must have its registers written to in such a clumsy/slow way.
Yeesh. When did this forum start hatin'?
FWIW, quirky "why the hell did they do it that way?" logic sorta turns me on...
tokumaru wrote:
I really don't see why so much interest
Some of what might be called "archivists" frequent this board, and they want to preserve these games exactly.
Quote:
in a mapper that can do little more than the discrete logic ones can and must have its registers written to in such a clumsy/slow way.
MMC1 can do more than the extant mapper 2 boards:
- Twice the PRG ROM capacity (512 KiB)
- Up to 32 KiB of PRG RAM
- Nametable mirroring switch (1-screen $000, 1-screen $400, H, V)
I know Rad Racer used SGROM just for switchable mirroring. (Another topic delves into how it draws its road.) I guess much of the rest of the SGROM game list used it for much the same reason.
720°, Bad Street Brawler, Battle Chess, Battle of Olympus, Bionic Commando, Defender of the Crown, Faxanadu, Ikari Warriors 2, Indiana Jones and the Last Crusade, Kid Niki, Mega Man 2, Muppet Adventure: Chaos at the Carnival, Phantom Fighter, Princess Tomato, Robin Hood: Prince of Thieves, Rocket Ranger, The Rocketeer, Space Shuttle Project, Strider, Tecmo Baseball, Tombs and Treasure, Win Lose or Draw, and Winter Games
And there are a bunch of games that use SNROM just for PRG RAM. No Nintendo board modified mapper 2 with PRG RAM in the way that Family BASIC modified mapper 0 with PRG RAM.
tokumaru wrote:
kyuusaku wrote:
So much talk about MMC1, it's frickin MMC1!
Amen! I really don't see why so much interest in a mapper that can do little more than the discrete logic ones can and must have its registers written to in such a clumsy/slow way.
Well before anyone else decides to get on their high horse...
Perhaps I should say I am merely starting with the MMC1. I've got all the discrete mappers down which obviously didn't take much. My end goal is to do something like this for the MMC5. But I can't just jump into the MMC5 acting like I really know what I'm doing. Gotta start simple. Lots of things are well understood about how to program with most of the MMC's but not much info as to how the hardware works or was designed. Lots of things that are a mystery about these mappers are fairly simple to figure out with proper test equipment.
I want to ACCURATELY replicate/emulate all of these mappers in hardware/emulators and it doesn't appear that the documentation is currently available to do so. Everyone's solution seems to be 'just' decap it, but there is still a lot that can be found by stimulus and response testing outside the NES. And if I can make it look identical from the outside I would think that's good enough for most people.
Man, 4 extra writes and 4 LSR A's make the mapper automatically terrible? It's the best thing between a surface mount MMC3 and a crap logic board. AKA, The best mapper for tweeners that need more than UNROM and less than MMC3.
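For readers who have not written an MMC1 register before, the "4 extra writes and 4 LSR A's" refer to sending a 5-bit value one bit at a time. The sketch below lists the bytes such a routine would store; the function name `mmc1_write_sequence` is hypothetical, and the value is assumed to fit in 5 bits so that D7 is never set by accident.

```python
def mmc1_write_sequence(value):
    """Return the five bytes stored to the mapper for one register update."""
    a = value & 0x1F          # accumulator; only D0 matters to the mapper
    stores = []
    for i in range(5):
        stores.append(a)      # STA to the target register
        if i < 4:
            a >>= 1           # LSR A between stores (four in total)
    return stores


seq = mmc1_write_sequence(0b10110)
print([f"{b:05b}" for b in seq])
print([b & 1 for b in seq])   # the bits the mapper actually sees, LSB first
```

The overhead compared with a single-write mapper is therefore four extra stores and four shifts per register update.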
tepples wrote:
MMC1 can do more than the extant mapper 2 boards:
- Twice the PRG ROM capacity (512 KiB)
- Up to 32 KiB of PRG RAM
- Nametable mirroring switch (1-screen $000, 1-screen $400, H, V)
Even though a discrete logic board with these features doesn't exist, they could all be implemented with few chips.
infiniteneslives wrote:
I want to ACCURATELY replicate/emulate all of these mappers in hardware/emulators and it doesn't appear that the documentation is currently available to do so.
Okay, I get what you mean. Getting these properly documented is indeed a good thing to do.
3gengames wrote:
Man, 4 extra writes and 4 LSR A's make the mapper automatically terrible?
I guess all of my designs require more frequent bankswitching than most people are used to... If you switch banks less than a dozen times per frame I guess the MMC1 isn't such a bad deal, but when you need more than that the overhead starts making a difference.
Quote:
It's the best thing between a surface mount MMC3 and a crap logic board.
One of the most advanced NES games ever, Battletoads, uses a crap logic board, and it's better than a shitload of MMC3 games.
This thread was made to discuss the simulation of the MMC1 in Logisim, not to debate which mappers are good/bad or whatever. People not interested in this particular mapper are welcome to stay away from this thread instead of making dumb non-constructive bashing posts. Thank you.
Also this thread is significant to accurate emulation, so please at least respect this.
Back on topic, @infiniteneslives, how can you be so sure two consecutive writes were blocked intentionally ? I don't really follow you on this point. And yes I read the other thread about this subject, I just didn't understand what led you to this conclusion.
I wasn't meaning to mapper bash, it was more of a complaint that we've received a lot of progress updates and conjecture, some of which has been wrong, about one of the simplest ASIC mappers and one that is already very well understood. I'm interested in new behavioral details as long as they're factual, but I can't imagine there being much left to discover about something so beaten to death. Maybe I'm cranky because I started a thread like this 5.5 years ago (though I was inexperienced and the logic was wrong):
http://nesdev.com/bbs/viewtopic.php?t=1866
Maybe I should point out something that I haven't noticed being taken into consideration: the MMC1 isn't likely to be a fully-custom chip, it's probably an early, very low capacity CMOS gate array by Sharp. It would be implemented in CAD using idiot-proof megafunctions provided by Sharp, like a low-tech FPGA. Any errors or wonky behavior are almost certain to be logical, not poor analog characteristics. And yes conserving logic is important because gate arrays are inefficient at routing so a significant amount of the array is typically not usable. Megafunction blocks are however designed at the transistor level so are more efficient than the same function being implemented in gates so there isn't a big penalty for a DFFE vs a DFF for example, but you might run into grid/placement issues depending on the shape of the block I guess.
Quote:
Maybe I'm cranky because I started a thread like this 5.5 years ago (though I was inexperienced and the logic was wrong):
http://nesdev.com/bbs/viewtopic.php?t=1866
Why did you remove the schematic anyways?
Quote:
about one of the simplest ASIC mappers and one that is already very well understood.
Yes and this thread is about simulating its behavior. The fact the chip is well understood doesn't prevent one from doing a simulation of the chip, does it? Note that I didn't do only the MMC1 but I did all the MMCs except MMC5 (for now).
Quote:
Why did you remove the schematic anyways?
Mostly because it was wrong, and 74 series couldn't fit the design efficiently so I didn't think it was very elegant.
Quote:
Yes and this thread is about simulating its behavior. The fact the chip is well understood doesn't prevent one from doing a simulation of the chip, does it? Note that I didn't do only the MMC1 but I did all the MMCs except MMC5 (for now).
Of course not, anyone should carry out whatever experiments they wish. I think a thread like this should be definitive though if it's going to be made, and especially if it's referenced. I could draw a very rough and incomplete MMC5 now but it's useless if it only shows that I know how to design a kind-of-working mapper and doesn't bring anything new to the table (either new mapper behavior or clever insight to the logic's implementation).
Okay, well, the more I dig around the more interesting things I find... It also appears this is stirring things up a bit more around here than I expected. As of now my plan is to keep probing away at this thing. I'll post what I find here, but keep in mind this is still experimental data. I could have broken something inside my MMC1 at some point, have a bad connection somewhere, or anything really. At the risk of being wrong about something, I'll still post it here. If I should start a new thread just let me know, but I don't see a reason to right now.
Once I figure things out as best I can, if someone wants to write a test ROM to verify these things on other MMC1s while running on the NES etc., I would think it's a good idea before considering any of this official. I'm just performing experiments to try and figure out how this thing is actually built. Once things have been verified, I'll release my official design of the MMC1, as accurate as I can make it. With all that said, hopefully nobody gets their panties in a wad...
To answer your question Bregalad. Like we discussed earlier if the double write ignore isn't intentional then they wouldn't have added extra logic to check for double (or more than double) consecutive writes and only acknowledge the first. So some optimization or 'don't care' would have had to be made use of if it was unintentional. There are only a few ways to do this. Here are some possibilities that I disproved in my experiments.
The biggest possibility I saw was the fact that PRG R/W stays low between sequential writes. It will only go high when a read is done. So maybe they are clocking something with PRG R/W (even though it sounds like a bad idea). With my tests I was able to actually pull PRG R/W high between the two writes without changing M2 or PRG /CE. Even if I did this, the follow on writes were ignored.
Another idea was some bad circuit, such as a capacitance that requires time to discharge, so that the second write is ignored. But since I can wait seconds between writes, this is disproved: it didn't matter how much time I waited between writes (usually 50msec for my rig).
The real kicker for me was the question, "what if I perform 3, 4, 5 or more writes consecutively?" What I found: only the FIRST will be acknowledged. ALL follow-on writes are ignored.
Then I asked, "what if I write to ROM (MMC1), then WRAM, then ROM?" Even though it's impossible to do with the NES, it turns out only the first will still be acknowledged. Even if you write below where the MMC1 can see ($0000-$5FFF) for several cycles but then come back and write to ROM, it still only acknowledges the first write.
I then asked, "Well, if it isn't checking PRG R/W solely (based on my earlier statement), how is it sensing the follow-on writes?" The only thing I could figure is that it uses M2 and only checks PRG R/W when clocked. But how to check this? Well, what if we DON'T clock M2 but still do everything else normally? It turns out that consecutive writes WILL be acknowledged if we do this, regardless of whether M2 is held high or low. This also further proves that PRG R/W can be held low between writes and still get each write acknowledged. This also means M2 is used for more than just WRAM CE. It's not used for clocking the shift register, but it is used for checking for consecutive writes.
And if they did this, then they MUST have ignored consecutive writes on purpose. Unless they somehow accidentally threw in an extra flip-flop to create an 'enable write' signal by clocking PRG R/W each CPU clock cycle, which I think everyone will agree isn't plausible.
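The observations above can be condensed into a small behavioral sketch. This is only one possible reading of the test results, not a verified description of the silicon: it assumes a single flip-flop that samples PRG R/W once per M2 cycle, and the class and parameter names are made up for illustration.

```python
class ConsecutiveWriteFilter:
    """Hypothetical model: one flip-flop, sampled once per M2 cycle,
    remembers whether the previous CPU cycle was a write (R/W low)."""

    def __init__(self):
        self.prev_was_write = False

    def cycle(self, rw_low, romsel_low, m2_toggles=True):
        """Simulate one CPU cycle; return True if the shift register
        would acknowledge a write on this cycle."""
        # A write is acknowledged only when the chip is selected and the
        # flip-flop says the previous cycle was not a write.
        accepted = bool(rw_low and romsel_low and not self.prev_was_write)
        if m2_toggles:
            # Assumption: the flip-flop tracks R/W on every M2 cycle,
            # no matter which address range is being written.
            self.prev_was_write = bool(rw_low)
        return accepted


f = ConsecutiveWriteFilter()
burst = [f.cycle(rw_low=True, romsel_low=True) for _ in range(4)]
print(burst)  # only the first write of the burst is acknowledged
```

With M2 toggling, a burst of writes yields one acknowledged write followed by ignored ones, and a write to WRAM or lower memory in between still blocks the next ROM write. Passing `m2_toggles=False` freezes the flip-flop, which reproduces the "M2 disconnected" test where every write is acknowledged.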
One question I have yet to answer, though, is, "will blocked writes with D7=1 cause resets or not?" So really this could also be asked as, "will DEC $00 cause a reset, or cause a 0 to be loaded into the shift register with no reset?" I'll get around to this at some point.
I also did some testing today to check into the details of how only the address of the last write matters, as stated in Kevtris' docs. Turns out this is only true to a certain point. I found this out accidentally at first, but it turns out it's actually possible to write TWO registers at ONCE. Well, almost: you can write to one full register and bits 4-0 of a second. If one were to implement this, the 5th bit of the second register appears to always be set to 1. Now the exact details of this I'm not 100% certain of yet, but here's what it looks like to me.
So according to kevtris:
Code:
LDA #[data to load]
STA 08000h ;It does not matter where these first 4 writes occur. only the last write matters.
LSR A
STA 08000h
LSR A
STA 08000h
LSR A
STA 08000h
LSR A
STA 0E000h ;NOTE: register 3 is what gets loaded!
But what I've found is that this is true if the last 'STA 0E000h' instruction is located in $E000-FFFF. However, if this instruction were in $8000-9FFF, I believe that in addition to reg3 getting loaded as normal, reg0 bits 4-0 (F, H, and M1, M0) would also get loaded with the same value that we loaded reg3 bits 4-0 with. The one thing I'm not certain of is what the 5th bit gets set to. I did some similar testing with the CHR registers (Reg1 and Reg2) and the 5th bit always gets set to 1.
I found this by only changing the address the last byte is written to, and keeping the address of the previous cycle's read the same as the previous writes. Implementing this on the NES the read that occurs the cycle before the write of a STA is from the address of the STA instruction. So wherever that read is performed from, the associated register gets partially written to. Looking at the opcodes this is only true for addressing modes that read from PC (+) before the write. Looks like you might get something funky for some addressing modes.
So it would seem this is somewhat of a bug, and I wouldn't think it's intentional. But it gives hints as to how the chip is constructed. It looks like the copying from the shift register to the other registers gets enabled a little early, as a result of an optimization from not caring about this issue.
Well, that's all I've got for now. At this point I'm not sure how much I care to dig deeper. I should have enough info to get mine working at least. It's been fun solving some of this puzzle, but I don't think it's very fruitful to dig around much deeper. Most of this register stuff could be checked with ROM testing on the NES; it just takes a lot longer to check each step compared to my setup. If someone wants something specific tested, let me know. I can also burn EPROMs and test on the NES.
OK, so what happens if I write, for example, to $2000, then immediately write to $8000? Is the second write ignored by the MMC1, or is it taken into account?
Also, you didn't mention the possibility of the shift register being clocked by R/W AND /ROMSEL.
If this were the case, R/W would not go high between two consecutive writes, so the shift register would not be clocked. But if you do what you experimented with, that is, if I understood it well, toggle both M2 and R/W (but keep R/W low during write cycles when M2 is high), then the shift register will ignore writes even though R/W is cycling, because /ROMSEL is held low.
I didn't understand the last bug you mentioned at all. It sounds completely new to me.
I'll definitely fix my logisim model once this MMC1 behavior is definitively established.
Bregalad wrote:
OK, so what happens if I write, for example, to $2000, then immediately write to $8000? Is the second write ignored by the MMC1, or is it taken into account?
My guess is yes it will be ignored. But I do have that down as one thing to check.
Quote:
Also you didn't mention the possibility for the shift register to be clocked by R/W logically AND-ed with /ROMSEL.
If this were to be the case, R/W would not go high between two consecutive writes, so this means the shift register is not clocked. But if you do what you experimented, that is, if I understood it well, toggle both M2 and R/W (but keep R/W low during write cycles when M2 is high) then the shift register will ignore writes even though R/W is cycling, because /ROMSEL is held low.
Actually I did account for this. I didn't explicitly state it but it's what I was implying when I said this:
Quote:
Well what if we DON'T clock M2 but still do everything else normally (for the writes to the MMC1 shift register). It turns out that consecutive writes WILL be acknowledged if we do this. Regardless if M2 is held high or low. This also further proves that PRG R/W can be held low between writes and still get each write acknowledged.
So for that test I operated PRG /CE and PRG R/W naturally and did one test with M2 held high and another with it held low. Basically, if you disconnect M2 you can get it to ACKNOWLEDGE consecutive writes. PRG R/W stayed low for the entirety of several sequential writes and ALL of those writes were acknowledged, even though they occurred one 'cycle' (less M2 toggling) after another.
Quote:
I didn't understand the last bug you mentioned at all. It sounds completely new to me.
Basically, it appears to me that the copying from the shift register to reg0-3 happens partially (bits 4-0) on the STA instruction read, the CPU clock cycle preceding the STA's write cycle. Then on the write cycle the same bits 4-0, plus the final 5th bit from that cycle's write, are copied as well. Additionally, it looks like the 5th, unaccounted-for bit of the first copy during the read cycle is always a 1. So if the STA instruction is read from a different ROM location (A14-13) than the register location you're trying to write to, you'll end up writing to BOTH registers with that single (5 bit) MMC1 write operation. Sorry, I know it's a bit confusing. I'll have to come up with an example...
Yes, I've never heard of it either; I think it's safe to say this is the first time it's come to light. But it's definitely there to some degree, and I need to characterize it further. Once I modify my design a bit it may make more sense as to why it's happening. I tested it by writing to reg0 and reg3 at the same time. Additionally, I tested it with reg1 and reg2, with similar behavior (I don't know enough to say it's identical for all registers yet, but I'm guessing so).
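As a rough illustration of the behavior described above, here is a minimal sketch of the serial load with the suspected early copy. It is a guess at the observed behavior, not confirmed hardware: it reads "bits 4-0" as the four bits already sitting in the shift register after four writes, with the fifth position forced to 1, and it leaves out the consecutive-write filter and any reset side effects on reg0. `MMC1Sketch` and `serial_load` are hypothetical names.

```python
class MMC1Sketch:
    """Hypothetical model of the MMC1 serial port with the early partial copy."""

    def __init__(self):
        self.shift = 0            # bits collected so far, LSB arrives first
        self.count = 0            # number of bits collected
        self.regs = [0, 0, 0, 0]  # reg0..reg3, 5 bits each

    @staticmethod
    def reg_index(addr):
        return (addr >> 13) & 3   # register is selected by A14-A13

    def read(self, addr):
        # Assumed bug: once four bits are in the shift register, a ROM read
        # copies them into the register selected by the READ address,
        # with the still-missing fifth bit seen as 1.
        if addr >= 0x8000 and self.count == 4:
            self.regs[self.reg_index(addr)] = self.shift | 0x10

    def write(self, addr, data):
        if addr < 0x8000:
            return
        if data & 0x80:           # D7 set: clear the shift register
            self.shift, self.count = 0, 0
            return
        self.shift |= (data & 1) << self.count
        self.count += 1
        if self.count == 5:       # fifth write: the write address picks the register
            self.regs[self.reg_index(addr)] = self.shift
            self.shift, self.count = 0, 0


def serial_load(chip, value, target_addr, code_addr):
    """Five STA writes; each is preceded by an instruction read from code_addr."""
    for i in range(5):
        chip.read(code_addr)
        chip.write(target_addr, (value >> i) & 1)


chip = MMC1Sketch()
serial_load(chip, 0b01010, target_addr=0xE000, code_addr=0x8000)
print([format(r, "05b") for r in chip.regs])
```

With the code running from $8000-9FFF, reg3 receives the full value while reg0 picks up the low four bits with its top bit set. Running the same load with `code_addr=0xE000` leaves reg0 untouched, which matches Kevtris' description.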
Good news. I got my MMC1 working.
Thanks for making your logisim version Bregalad. I didn't copy it exactly with some of my differences noted previously. But your design was a great reference and starting point to help me to understand how things worked when the docs left details to be desired.
I'll post my design and everything soon in another thread.
Well, I've still got a bug to figure out with my MMC1 before I publish my design. Something is wrong when I test out Zelda. Other SNROM games like Metroid are fine, but Zelda has issues when drawing the first map screen with the cave and everything.
I decided to jump into MMC3 and found something that may be wrong with your MMC3 Bregalad. I think it's due to some ambiguity in all the MMC3 docs (wiki, kevtris, & disch). The issue is with the first two CHR registers/bank numbers.
I made the same assumption as Bregalad that those two 7-bit-wide registers ignore the MSB from PRG D7, which is similar to the PRG banks, which are only 6 bits wide. But it appears that they ignore the LSB (PRG D0), as they use CHR A10 in its place. All the docs I found were ambiguous as to which 7 of the 8 bits are used for holding the bank number. I did find some agreement with what I'm saying in the mapper 118 wiki, which implies that D7 is always mapped to CHR A17. Thinking about it like this makes a little sense now. Basically, Bregalad's design shouldn't shift the first two CHR banks by 1 bit; they should stay as is. The value of D7 always corresponds to CHR A17. And all the games I'm testing with look MUCH better after I implemented this change, so I'm pretty sure I'm not mistaken.
I know you made mention of it Bregalad, it's probably common knowledge to most, but I think it's worth pointing out that your design WILL NOT WORK as you've posted it because you're clocking your scanline counter with each positive edge of CHR A12. This behavior really isn't pointed out in Kevtris' docs well and your design led me further down the garden path. I didn't realize how the MMC3 ignores closely timed CHR A12 posedges until checking things with disch's docs, so a more explanatory note would have been helpful to me at least.
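To make the point about closely timed CHR A12 edges concrete, here is a small sketch of an edge filter in front of the scanline counter. The filtering rule used (A12 must have stayed low for a minimum number of M2 falling edges before a rising edge counts) and the default of 3 are assumptions that do not come from this thread; `A12Filter` is a made-up name.

```python
class A12Filter:
    """Hypothetical filter: a CHR A12 rising edge clocks the scanline
    counter only if A12 stayed low for enough M2 falling edges first."""

    def __init__(self, min_low_cycles=3):   # threshold is an assumption
        self.min_low_cycles = min_low_cycles
        self.low_cycles = 0
        self.a12 = 0

    def m2_falling(self):
        # Count CPU cycles for which A12 has been continuously low.
        if self.a12 == 0:
            self.low_cycles += 1

    def set_a12(self, level):
        """Update A12; return True if this change should clock the counter."""
        rising = level == 1 and self.a12 == 0
        clock = rising and self.low_cycles >= self.min_low_cycles
        if level == 1:
            self.low_cycles = 0   # restart the low-time measurement
        self.a12 = level
        return clock
```

A design that clocks the counter on every positive edge corresponds to `min_low_cycles=0`; with a non-zero threshold, the rapid A12 toggling during a run of pattern fetches collapses into a single counter clock.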
Yeah - in other words the 2kb CHR bank select registers select a 2kb bank, but with the corresponding 1kb bank number, ignoring the LSB, just like the MMC1 in 8kb CHR bank modes uses 4kb page numbers.
This makes sense, as it simplifies the internal logic.
Only the MMC5 does it the complex way.
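The 2kb bank selection described above can be written out as a short sketch. It assumes the register value is treated as a 1kb bank number whose LSB is replaced by CHR A10 coming from the PPU, so that D7 lands on CHR A17; the function names are invented for this example.

```python
def chr_bank_2k(reg_value, ppu_a10):
    """1kb bank number driven onto CHR A17-A10 for a 2kb bank register.
    The register's LSB (D0) is ignored; CHR A10 from the PPU replaces it."""
    return (reg_value & 0xFE) | (ppu_a10 & 1)


def chr_bank_2k_shifted(reg_value, ppu_a10):
    """The mistaken reading for comparison: drop D7 and shift left by one."""
    return ((reg_value & 0x7F) << 1) | (ppu_a10 & 1)


def chr_rom_address(bank_1k, ppu_addr):
    """Full CHR ROM address: 1kb bank number above PPU A9-A0."""
    return (bank_1k << 10) | (ppu_addr & 0x3FF)


reg = 0x85  # example register value with D7 and D0 both set
print(hex(chr_bank_2k(reg, 0)), hex(chr_bank_2k(reg, 1)))
```

For a register value of 0x85 the two halves of the 2kb bank use 1kb banks 0x84 and 0x85, so an odd value selects the same 2kb bank as the even value below it.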
I'm sorry for bringing this old topic back up, but I'm actually interested in designing a PCB for these, as there are no known cheap MMC1 replacements, and making one involves using CPLDs. Those maybe aren't that expensive, but learning to use them would require a bit of time, which I currently don't have much of. My question is: I noticed that you're using a lot of splitters coming out of D-type flip-flops. How is that realised? Shift registers? Then where is the latch bit?