MMC1 A12 demo

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

MMC1 A12 demo
by Bregalad on 2010-10-16 (#68735)

I made a demo intended to test the SNROM board's additional WRAM enable bit.
The demo abuses the WRAM enable/disable by setting a different state for each 4 KB CHR region, effectively allowing you to read back A12 through WRAM (if you read back the value you wrote to WRAM, then it is enabled and A12 is low; if you read open bus, then the WRAM is disabled and A12 is high).

My demo is supposed to let you display a variable-height grayscale area on the screen (you can vary the height with the D-Pad). However, I have no SNROM devcart to test the demo on right now (if nobody has one I will make one), and the PowerPak isn't an option currently, as I'm pretty sure this disable bit isn't implemented.

So could someone test the demo for me? The .nes and sources are here: http://jonathan.microclub.ch/MMC1_a12/

Note that currently the .nes freezes in all emus, because they keep the WRAM enabled in all cases.

by 3gengames on 2010-10-16 (#68736)

I would test but I am not in-depth enough to do so and still need to make a SNROM cart. On the SNROM carts, does the revision matter for anything because there are SNROM-0 to like SNROM-5 or something....And why would you need to disable the RAM anyway? XD

by Bregalad on 2010-10-16 (#68738)

The revision shouldn't matter, but it really matters that WRAM /CE is connected to MMC1's CHR A16.

This is useful because by reading WRAM, you can indirectly read /CE (if you read open bus, then the WRAM is disabled), which makes you indirectly read MMC1's CHR A16, which makes you indirectly read CHR A12. The MMC1 internally uses A12 to select which of the two latches, R1 and R2, is sent to the CHR A12 - CHR A16 outputs, effectively creating two switchable 4 KB banks. Writing $00 to R1 and $10 to R2 makes CHR A16 out = CHR A12 in.
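
A minimal Python sketch of the mechanism just described, as a toy model rather than a hardware reference. The register names R1 and R2 come from the post; the function names and the open-bus value of $7F are assumptions made for illustration only.

```python
# Toy model of the SNROM trick: CHR A16 (bit 4 of the selected CHR latch)
# drives WRAM /CE, so a CPU read of WRAM reveals the current PPU A12.
# Function names and the open-bus value are illustrative assumptions.

def chr_a16(ppu_a12: bool, r1: int, r2: int) -> int:
    """Return the CHR A16 output: bit 4 of R2 if A12 is high, else of R1."""
    selected = r2 if ppu_a12 else r1
    return (selected >> 4) & 1


def wram_read(stored: int, open_bus: int, ppu_a12: bool,
              r1: int = 0x00, r2: int = 0x10) -> int:
    """Return what the CPU sees when reading WRAM (/CE is active low)."""
    wram_enabled = chr_a16(ppu_a12, r1, r2) == 0
    return stored if wram_enabled else open_bus


if __name__ == "__main__":
    for a12 in (False, True):
        value = wram_read(stored=0xFF, open_bus=0x7F, ppu_a12=a12)
        print(f"A12 {'high' if a12 else 'low'}: CPU reads ${value:02X}")
```

With R1 = $00 and R2 = $10, the model returns the stored byte while A12 is low and the open-bus value while A12 is high, which is the read-back the demo relies on.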

Reading A12 is "useful" in this case because the BG uses the left pattern table and sprites the right pattern table, so it toggles exactly once per frame, making it a scanline counter that doesn't depend on CPU speed (so it will work on both NTSC and PAL without any modifications). Not that useful, I know, but it has never been done before.

I know this is kind of complex and hard to understand; it's also hard for me to explain. I'm far from sure this will behave as I expect on an SNROM board, which is why it should be tested.

by 3gengames on 2010-10-16 (#68739)

Yeah I don't get it entirely but I sort of get it. Well....I'll get to the dragon warrior cart later tonight.

by blargg on 2010-10-16 (#68755)

Great idea!

I'd like to try it, but wlalink is crashing on me when I try to link it, no idea why. I need to link it for $E000, because that's all my MMC1 carts support. OK, apparently my download of one of the .lib files was corrupt (would be good to post a zip file of everything, simpler to download too). Would be nice if wla had told me the file was corrupt rather than crashing.

Couldn't get your code working, it hung waiting for WRAM. I wrote my own code and found that it does detect A12 toggling as you planned, but that it detects it every scanline, regardless of where the sprites are. It also misses the A12 toggles every few frames on some resets, due I assume to CPU-PPU synchronization differences.

by Bregalad on 2010-10-17 (#68763)

blargg, you seem to like corrupt downloads. (no personal offense though...)

I replaced the individual files with a zip file. You should be able to recompile it now... at least I hope so.

by blargg on 2010-10-17 (#68765)

It's because download software doesn't always recognize non-standard things like .nes and .lib as binary files. Actually, my browser recognizes them fine, but it's tedious to download each file individually, so I used a batch downloader that apparently didn't recognize correctly. That's two reasons for zipping archives of files rather than putting them up individually.

As I wrote above, I got it assembling. I'm not sure whether this was meant to allow waiting until the middle of the screen with one wait loop, or waiting until the next scanline. It seems only possibly useful for the next scanline, but even then, it misses them depending on synchronization. I even tried doing four WRAM reads in a row when checking, to increase the chance of seeing the change. I was hoping this would work really well, since it's such a clever abuse of the disable line.

by Bregalad on 2010-10-17 (#68775)

I tried to make myself an SNROM devcart but it refuses to work.
I guess I have probably damaged the PCB or something like that... I'll double-check the connections.

by 3gengames on 2010-10-17 (#68777)

Don't curse me!

That sucks. :/ I am going to put a 32-Pin socket on the board, and have a little protoboard wire up with socket pins so I can plug the boards in or a MASK ROM in the plain socket, minimizing the haxoring of the cart's board, so hopefully it won't die.

by Bregalad on 2010-10-17 (#68780)

Never mind, I got it working.
The "gray bar" shakes by a few scanlines, though. This is probably due to A12 randomly going low while fetching sprites, or something like that.

Also, I unfortunately can only test this on my PAL NES: the socket + EPROM is too thick to fit into the toploader (without altering the system irreversibly).

by 3gengames on 2010-10-17 (#68785)

Ahh yes. That sucks. :/ For my NROM with eprom's, I need to take the bar out inside the system for the tray, so I understand what you mean. :/

by blargg on 2010-10-17 (#68797)

Yeah, sorry, I realized last night that there's no way I could have run this on any of my devcarts, because I run the code out of... WRAM. Let's see, code in WRAM, WRAM getting disabled/enabled randomly by PPU. Can you say crash? Yeah, and I had even noticed this when trying to do the effect myself, and switched to running code out of internal RAM, but forgot that would have crashed your code too.

Basically, this seems to suffer from the same kind of thing reading $2004 suffers from: synchronization affects whether the CPU sees the magic value for a particular scanline, so it misses them sometimes. It'd be great to work around this, because it's such a roundabout way of counting scanlines. CHR banking happens to also enable WRAM. Then you rely on open bus to detect whether it's enabled. And then count the number of times that occurs to count scanlines. Hahaha.

by Bregalad on 2010-10-18 (#68821)

Haha, running code from WRAM won't work, obviously. In system RAM, if the code fits in $300-$7FF, this shouldn't cause problems.... but with the CHR set that isn't compressed, I doubt it'll fit in this area.

Quote:
It'd be great to work around this, because it's such a roundabout way of counting scanlines. CHR banking happens to also enable WRAM. Then you rely on open bus to detect whether it's enabled. And then count the number of times that occurs to count scanlines. Hahaha.

yeah, haha. I'm glad that I had such a crazy idea, too bad it didn't turn out useful (yet).

Quote:
Basically, this seems to suffer from the same kind of thing reading $2004 suffers from: synchronization affects whether CPU sees the magic value for a particular scanline, so it misses them sometimes.

I don't know. I guess that even when fetching sprites, A12 sometimes goes low for a very short amount of time, and that's why they added a small low-pass filter on the line on MMC3 boards. I tried to identify this behavior in my code by having two consecutive checks for WRAM being re-enabled (instead of just one), but apparently this fails. If both checks happen to land during one of those short low pulses during sprite fetch, this creates a false positive, and the delay is shorter than expected.

EDIT: I should do another version with a grid on the BG so you can actually see how many scanlines tall the gray area is.

by tepples on 2010-10-18 (#68831)

Bregalad wrote:
Haha, running code from WRAM won't work, obviously. In system RAM, if the code fits in $300-$7FF, this shouldn't cause problems

Where is the NMI vector?

Quote:
I don't know. I guess that even when fetching sprites, A12 sometimes becomes low for a very short amount of time

I've always thought these were garbage nametable fetches continuing during hblank.

by Bregalad on 2010-10-19 (#68899)

Quote:
I've always thought these were garbage nametable fetches continuing during hblank.

Does anyone have more info about this?
I thought the dummy NT fetches were made before the actual NT/PT fetches, meaning that the A12 line should be constantly low.

However, if they added a cap on MMC3 boards, there was a good reason for it.

by tepples on 2010-10-19 (#68900)

As I understand it, the VRAM accesses straddling the start of hblank look like this:

241: nametable
243: attribute
245: bg pattern
247: bg pattern
249: nametable (never used)
251: attribute (never used)
253: bg pattern (never used)
255: bg pattern (never used)
257: nametable (never used)
259: attribute (never used)
261: sprite pattern
263: sprite pattern
265: nametable (never used)
267: attribute (never used)
269: sprite pattern
271: sprite pattern
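
The fetch order above can be turned into a rough Python model of PPU A12 over one rendered scanline. This is only a sketch under stated assumptions: 341 dots per line, 8-dot fetch groups aligned to multiples of 8 (0-based), 8x8 sprites, and nametable/attribute fetches always leaving A12 low. The function names and the exact dot alignment are illustrative, not confirmed hardware behavior.

```python
# Rough model of PPU A12 during one rendered scanline, following the
# fetch order listed above. Assumptions: 341 dots per line, 8-dot fetch
# groups aligned to multiples of 8 (0-based), 8x8 sprites, and
# nametable/attribute fetches always having A12 low.

DOTS_PER_LINE = 341


def a12_at(dot: int, bg_right: bool, sprites_right: bool) -> bool:
    """Return the modelled A12 level at a given dot of the scanline."""
    if dot >= 336 or dot % 8 < 4:
        return False              # nametable/attribute (or trailing NT) fetch
    if 256 <= dot < 320:
        return sprites_right      # sprite pattern fetches
    return bg_right               # background pattern fetches


def rising_edges(bg_right: bool, sprites_right: bool) -> int:
    """Count low-to-high A12 transitions over one scanline."""
    edges = 0
    previous = a12_at(DOTS_PER_LINE - 1, bg_right, sprites_right)
    for dot in range(DOTS_PER_LINE):
        current = a12_at(dot, bg_right, sprites_right)
        if current and not previous:
            edges += 1
        previous = current
    return edges


if __name__ == "__main__":
    for bg_right in (False, True):
        for sprites_right in (False, True):
            print(f"BG right={bg_right}, sprites right={sprites_right}: "
                  f"{rising_edges(bg_right, sprites_right)} rising edges")
```

Under this model, the BG-left/sprites-right setup gives eight A12 rises per scanline (one per sprite slot), not a single one.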

Brad Taylor's 2C02 doc appears to suggest that the accesses at 257, 259 through 313, 315, are the same as 321, 323 (the first background tile's nametable and attribute entry). Was this ever independently confirmed?

by Bregalad on 2010-10-19 (#68902)

Oh, so A12 is actually half high, half low while fetching sprites, and always low when fetching NT/AT/PT. This complicates things.

Is there a way to reliably detect A12 going from always low to oscillating, and from oscillating to always low, that works on both NTSC and PAL (with the same code, if possible; else you'd just use timed code instead)?
If not, then this kills any chance of this method being useful one day (other than to test the accuracy of emulators... so far ALL emus fail on this one).

by tokumaru on 2010-10-19 (#68904)

Can someone explain what exactly we're trying to accomplish here? You are trying to come up with a new way to count scanlines... but from what I understand it's a "busy" counting (i.e. you have to continuously monitor PPU accesses), so what's the advantage over timed code? If I missed something, please point it out to me.

EDIT: If the only advantage is that it would work for both PAL and NTSC, I'd rather use a waiting function that adapts itself to PAL or NTSC depending on a variable (which I already have coded).

by lidnariq on 2010-10-19 (#68905)

Purely in code? I don't know; is getting a single A12 read as high good enough? During sprite fetches, PPU A8-A13 should be stable for four pixels for each read, or ~800 ns, so a 50% square wave with a period of 1.6 us. Doing something like LDA $7FFF LDX $7FFF LDY $7FFF really should collide with at least one of the A12-high periods.
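
The claim about three back-to-back reads can be checked with a small Python sketch. It works in master clocks and assumes an idealized A12 (low for 4 dots, then high for 4 dots) that each CPU read samples at a single instant; real reads take time, and the line may change during them. The clock dividers are the standard NTSC (CPU /12, PPU /4) and PAL (CPU /16, PPU /5) ones; the function names are illustrative.

```python
# Check whether back-to-back absolute reads (one read every 4 CPU cycles)
# always sample A12 high at least once during sprite fetches.
# Time is measured in master clocks. Assumptions: A12 is an ideal 50%
# square wave (low for 4 dots, then high for 4 dots) and each CPU read
# samples it at a single instant.

NTSC = {"dot": 4, "cpu": 12}   # master clocks per PPU dot / CPU cycle
PAL = {"dot": 5, "cpu": 16}


def reads_hit_high(phase, system, n_reads=3, cycles_between_reads=4):
    """True if any of the reads lands in the high half of the A12 period."""
    period = 8 * system["dot"]
    spacing = cycles_between_reads * system["cpu"]
    return any((phase + i * spacing) % period >= period // 2
               for i in range(n_reads))


def always_hits(system, n_reads=3):
    """True if the reads catch A12 high for every possible starting phase."""
    period = 8 * system["dot"]
    return all(reads_hit_high(phase, system, n_reads)
               for phase in range(period))


if __name__ == "__main__":
    for name, system in (("NTSC", NTSC), ("PAL", PAL)):
        for n in (1, 2, 3):
            print(f"{name}, {n} read(s): always hits = {always_hits(system, n)}")
```

In this idealized model, three reads catch a high phase on both systems, while two reads are enough on NTSC but not on PAL. It says nothing about the marginal cases where A12 changes in the middle of a read.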

by blargg on 2010-10-19 (#68910)

I was using a loop that repeatedly ANDed and ORed with WRAM, so it could find any time when it became disabled (or enabled):
Code:
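wram = $7FFF

setup:
        lda #$FF
        sta wram        ; seed WRAM with $FF (bit 7 set)
        ...

count:
        ...
        lda #$FF
        ldx #scanlines_to_skip
line:
        ; Wait until WRAM is disabled
        ; (relies on the open bus read having bit 7 clear)
:       and wram
        ...
        and wram
        bmi :-

        ; Wait until WRAM is enabled (reading $FF back sets bit 7)
:       ora wram
        ...
        ora wram
        bpl :-

        dex
        bne line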

This was fairly consistent, but after some resets it would count several more/fewer than usual. I tried the loops with 1-4 reads, and with a ,Y (with Y=0) to add an extra clock here and there, and couldn't ever get it reliably at the same scanline. This is on NTSC.

Maybe I should have the scanline count merely do a few reads, then wait a scanline before reading again?

by Bregalad on 2010-10-20 (#68917)

tokumaru wrote:
Can someone explain what exactly we're trying to accomplish here? You are trying to come up with a new way to count scanlines... but from what I understand it's a "busy" counting (i.e. you have to continuously monitor PPU accesses), so what's the advantage over timed code? If I missed something, please point it out to me.

EDIT: If the only advantage is that it would work for both PAL and NTSC, I'd rather use a waiting function that adapts itself to PAL or NTSC depending on a variable (which I already have coded).

So far there is absolutely no advantage. I just wanted to explore this unexplored aspect of the MMC1, that's all. Maybe I'll figure out an advantage if I can reliably count scanlines this way. For example, maybe this allows you to do timed code while playing DPCM samples, something not possible under normal conditions, as the DPCM fetches randomly slow down the code.

I'll try what blargg suggests as soon as I get home.

by Bregalad on 2010-10-21 (#68945)

Well, I didn't try it, but I think it would be exactly the same as what I did. It just reads the SRAM with AND/ORA instructions instead of LDA instructions, but I see no reason the results would differ.
What does the "..." mean exactly? Do you use some kind of unrolled loop?

Now it makes more sense to me why the MMC3 counter also works when sprites use the left PT and the BG uses the right PT.
In the "normal" configuration (BG left, sprites right), A12 oscillates when fetching sprites and is low when fetching BG.
In the "reverse" configuration (BG right, sprites left), A12 is always low when fetching sprites and oscillates when fetching BG.
If both are left, A12 is low all the time and the counter is never clocked.
If both are right, A12 oscillates all the time.

Because of the capacitor on the board, the oscillation becomes a constant 2.5 V for the MMC3, which is considered a logical '1', right? Then the counter is clocked once per scanline. What surprises me is that the edge's timing should be highly dependent on the PPU's internal output impedance on the A12 line, which shouldn't be identical on all NES units. Then this might alter the exact time MMC3 IRQs fire. Did I get anything wrong here?

by blargg on 2010-10-21 (#68946)

Bregalad wrote:
Well, I didn't try it, but I think it would be exactly the same as what I did. It just reads the SRAM with AND/ORA instructions instead of LDA instructions, but I see no reason the results would differ.

The point of the ORA/AND is to allow reading it more often than every 7 cycles, in case that helps any with catching it changed. That's why I switched to OR/AND, so I could do this. CMP doesn't allow checking every 4 cycles.
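
The 7-cycle versus 4-cycle figures can be reproduced with a short Python sketch. It assumes the usual 6502 timings of 4 cycles for an absolute-addressed read instruction (with the read on its last cycle) and 3 cycles for a taken branch that does not cross a page; the helper name is illustrative.

```python
# Gaps (in CPU cycles) between successive WRAM reads in a polling loop
# made of n absolute-addressed reads followed by a taken branch.
# Assumes 4 cycles per absolute read instruction (read on the last cycle)
# and 3 cycles for a taken branch with no page crossing.

READ_CYCLES = 4
BRANCH_TAKEN_CYCLES = 3


def read_gaps(n_reads: int) -> list:
    """Return the cycle gaps between consecutive reads over one loop pass."""
    if n_reads < 1:
        raise ValueError("the loop needs at least one read")
    within_loop = [READ_CYCLES] * (n_reads - 1)
    across_branch = BRANCH_TAKEN_CYCLES + READ_CYCLES
    return within_loop + [across_branch]


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} read(s) per loop: gaps = {read_gaps(n)}")
```

A single-read loop samples every 7 cycles; chaining AND/ORA reads brings most gaps down to 4 cycles, with one 7-cycle gap left at the branch.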

Quote:
What does the "..." mean exactly. Do you use some kind of unrolled loop ?

That's where you could insert more ORA/AND instructions, to alter the timing and how often it reads on average.

Quote:
If both are left, A12 is low all the time and the counter is never clocked. If both are right, A12 oscillates all the time.

There's still nametable fetches at least, so I'd think it would still go low many times per scanline.

Quote:
Because of the capacitor on the board, the oscillation becomes a constant 2.5 V for the MMC3, which is considered a logical '1', right?

I removed the capacitor on my SMB3 board and the game ran fine, so it's not clear what it's for.

by Bregalad on 2010-10-21 (#68947)

Mmh.. Then there should be additional logic inside the MMC3 that handles this "delay"?
As far as I understand, the PPU fetches nametable bytes all the time, no matter whether it's fetching BG or sprite tiles. So A12 can be either oscillating or low; it'll never be steady high.

by blargg on 2010-10-21 (#68950)

Yeah, I don't think you can get A12 steady high unless you disable rendering and set the PPU address manually. I think the core problem is that it's never high long enough for the CPU to see it consistently. Or it might change state while the CPU is reading, so it might see open bus for part of the time only.

by Bregalad on 2010-10-21 (#68951)

I just realized how lacking the documentation of the MMC3's scanline counter is. Not the operation of the counter, but how it is clocked.

I always thought, according to existing documentation, that it was simply clocked by A12's rising edges, but apparently this isn't the case, else it would be clocked like 8 times per scanline (not just 1). The docs say this is due to the capacitor filtering the signal.

This is fishy as hell: since I don't see any resistor in series with it (the resistor can't be internal to the MMC3 either, because it would have to sit BEFORE the capacitor to have a low-pass effect), the "internal" impedance of the PPU (which should be close to 0) would have to play the role of the resistor for the filter... but this won't work, because it would completely screw up PPU fetches. Since the MMC3 uses A12 for bank-switching as well, having it filtered is not acceptable in any way.

I guess the capacitor is only there to filter really short pulses that are significantly shorter than a PPU clock cycle. Because the internal impedance of the PPU is close to 0, the filter cutoff will be really high.

Look at this for proof. This board uses a 74HC08 chip instead of a capacitor. I guess it ANDs A12 with VCC (or itself) 3 times to delay the edges, and ANDs it with the original. As far as I know, this should kill pulses of ~50 ns and shorter, but have no effect on anything slower than this.

There is really something lacking/fishy here. How come nobody ever figured out how wrong/incomplete the current theories behind the counter are?

by tepples on 2010-10-21 (#68952)

Bregalad wrote:
how come nobody ever figured out how wrong/incomplete the current theories behind the counter are?

Because nobody has the $$$ to decap the MMC3 the way someone did for the CIC. Therefore, the MMC3 is a black box, and any theories behind its operation are empirical.

by kevtris on 2010-10-21 (#68953)

Bregalad wrote:
I just realized how lacking the documentation of the MMC3's scanline counter is. Not the operation of the counter, but how it is clocked.

Look at this for proof. This board uses a 74HC08 chip instead of a capacitor. I guess it ANDs A12 with VCC (or itself) 3 times to delay the edges, and ANDs it with the original. As far as I know, this should kill pulses of ~50 ns and shorter, but have no effect on anything slower than this.

There is really something lacking/fishy here. How come nobody ever figured out how wrong/incomplete the current theories behind the counter are?

That chip is not performing a delay there. Also, the capacitor is just for power supply bypassing.

The way the MMC3 does it, I'm pretty sure, is that it uses a counter to make a simple monostable ("pulse stretcher").

What you do is this:

When A12 goes high, AND a delay counter equals zero, you feed a clock to the scanline counter, AND you set this delay counter to a non-zero value at the same time.

Each M2, you decrement your delay counter.

The net result is this: when A12 goes high, the IRQ counter is clocked exactly once, and this delay is loaded, which prevents the IRQ counter from being clocked more times. Each time A12 is high, this delay counter is reloaded. So, if A12 is toggling, the delay counter is constantly being reloaded. When A12 is low long enough, the delay counter finally hits 0, and the next time A12 goes high it will pulse the IRQ counter.

A12 will be high and low fairly long, a lot longer than 50 ns. The PPU reads a byte every 8 21 MHz clocks, which is every 372 ns. The fetch cycle is 4 accesses long, so that means A12 will toggle at least every 2 accesses (2 garbage fetches, 2 sprite tile fetches), so the A12 toggle rate is thusly 16 21 MHz cycles high and 16 21 MHz cycles low.
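
The timing figures above can be reproduced with a few lines of Python. This sketch assumes the NTSC master clock of roughly 21.477 MHz and a CPU (M2) period of 12 master clocks; the variable names are illustrative.

```python
# Timing arithmetic for the figures quoted above, using the NTSC master
# clock (about 21.477 MHz). Variable names are illustrative.

MASTER_HZ = 21_477_272          # NTSC master clock, approximately


def master_clocks_to_ns(clocks: int) -> float:
    """Convert a number of master clocks to nanoseconds."""
    return clocks / MASTER_HZ * 1e9


access_ns = master_clocks_to_ns(8)      # one PPU memory access
a12_half_ns = master_clocks_to_ns(16)   # A12 high (or low) time in sprite fetch
m2_ns = master_clocks_to_ns(12)         # one CPU (M2) cycle

if __name__ == "__main__":
    print(f"PPU access:      {access_ns:.1f} ns")
    print(f"A12 high or low: {a12_half_ns:.1f} ns")
    print(f"M2 period:       {m2_ns:.1f} ns")
    print(f"A12 half period in M2 cycles: {a12_half_ns / m2_ns:.2f}")
```

Each A12 half period during sprite fetches therefore lasts about one and a third M2 cycles, so a delay of a few M2 cycles comfortably bridges the low gaps.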

This means that the A12 delay counter should be 3 or 4 M2 cycles long, which isn't very many at all. For safety, you could use 7 or 8, which gives a nice round number for a 3-bit counter (7, +1 depending on how the other logic works).

Of course, you can actually TEST how long this is with some carefully written code on the NES that toggles A12 via $2006 or something... but I think the delay is too short to check it this way.

So, in a nutshell, that's pretty much how it has to work. An RC delay most likely isn't accurate enough for this, but might be made to work. The internal delay counter is reliable and, more importantly, "free" since it's on the chip.
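
The delay-counter idea described above can be sketched in Python. This is a toy model of the proposed scheme, not a description of the actual MMC3 silicon: the class and function names are made up, the reload value is a free parameter, M2 is assumed to tick every 3 dots (NTSC), and A12 is assumed to be high only during sprite pattern fetches (BG on the left pattern table, sprites on the right).

```python
# Toy model of the proposed "monostable via counter" scheme. The reload
# value, the names, and the A12 waveform are assumptions for illustration.


class A12DelayCounter:
    def __init__(self, reload: int):
        if reload < 1:
            raise ValueError("reload must be at least 1")
        self.reload = reload
        self.delay = 0

    def step(self, a12_high: bool, m2_tick: bool) -> bool:
        """Advance one PPU dot; return True if the IRQ counter is clocked."""
        clocked = False
        if a12_high:
            clocked = self.delay == 0   # clock only if the delay has expired
            self.delay = self.reload    # A12 high always reloads the delay
        elif m2_tick and self.delay > 0:
            self.delay -= 1             # count down on M2 while A12 is low
        return clocked


def sprite_fetch_a12(dot: int) -> bool:
    """A12 high for 4 dots out of every 8 during the sprite fetch window."""
    return 256 <= dot < 320 and dot % 8 >= 4


def count_clocks(reload: int, scanlines: int = 10) -> int:
    """Count IRQ-counter clocks over several 341-dot scanlines."""
    counter = A12DelayCounter(reload)
    clocks = 0
    for t in range(scanlines * 341):
        clocks += counter.step(sprite_fetch_a12(t % 341), m2_tick=(t % 3 == 0))
    return clocks


if __name__ == "__main__":
    for reload in (1, 3, 7):
        print(f"reload={reload}: {count_clocks(reload)} clocks in 10 scanlines")
```

In this model a reload of 3 or more M2 cycles gives exactly one clock per scanline, while a reload of 1 lets every sprite-fetch pulse through.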

by blargg on 2010-10-21 (#68955)

Bregalad wrote:
I guess the capacitor is only there to filter really short pulses that are significantly shorter than a PPU clock cycle. Because the internal impedance of the PPU is close to 0, the filter cutoff will be really high.

Yeah, this makes more sense. Nice to have a better explanation of this. Nothing else counts the number of pulses, so glitches like this don't matter (and presumably RAM and the MMC writes somehow ignore these via some other method).

kevtris wrote:
The capacitor is just for power supply bypassing also.

It goes between ground and CHR A12, and it's only 220 pF.

by kevtris on 2010-10-21 (#68956)

blargg wrote:
Bregalad wrote:
I guess the capacitor is only there to filter really short pulses that are significantly shorter than a PPU clock cycle. Because the internal impedance of the PPU is close to 0, the filter cutoff will be really high.

Yeah, this makes more sense. Nice to have a better explanation of this. Nothing else counts the number of pulses, so glitches like this don't matter (and presumably RAM and the MMC writes somehow ignore these via some other method).

kevtris wrote:
The capacitor is just for power supply bypassing also.

It goes between ground and CHR A12, and it's only 220 pF.

Oh that little ceramic one. Yeah that's for minor pulse filtering. Nintendo must not have synchronized the A12 input to their logic (that is, run A12 through a flipflop off M2) before using it, so thin pulses on A12 could clock the counter multiple times. Honestly, that cap might've just been N being paranoid. I've taken it off of many boards for fun/testing and it doesn't seem to matter... but I'm sure there's SOME NES' where it WOULD matter, or the timing calculations say it's possible, so to be safe they included it. Even if only 1 out of 10000 NES units would be affected, it's cheap insurance.

on all the pirate MMC3 boards I've seen (multicarts, etc), they don't have the capacitor on A12. Either it was slightly redesigned with the synchronizer flipflop, or they just didn't put it on since it seems to work without it. Also interestingly, the N made MMC3's are a 44 pin QFP while the pirate ones are all 40 pin DIP.

by Bregalad on 2010-10-21 (#68957)

kevtris, your "monostable" approach would make at least some sense.
It's not good to talk about a "monostable" here, because this would refer to a circuit that uses R and C components for a delay (which is subject to a big margin of error), but A12 clocks being blocked by another circuit that counts M2 clocks after the first A12 rise would be a plausible solution.

Even if the PPU's internal output impedance is 100 ohms (which it normally shouldn't be), this corresponds to a filter time constant of 22 ns. The short pulses of a few ns should be due to asynchronous issues inside the PPU itself, or something like that. Chances are that the long lines on the NES connector already filter them a bit.
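
The 22 ns figure follows from the RC time constant. The Python sketch below uses the 100 ohm guess from this post and the 220 pF value reported earlier in the thread; the -3 dB cutoff is the textbook first-order RC formula, and treating the PPU output as a pure resistance is an assumption.

```python
# RC time constant and first-order cutoff for the A12 capacitor.
import math

R_OHMS = 100.0        # assumed PPU output impedance (the post's upper guess)
C_FARADS = 220e-12    # capacitor value reported earlier in the thread

tau_s = R_OHMS * C_FARADS                     # time constant, tau = R * C
cutoff_hz = 1.0 / (2.0 * math.pi * tau_s)     # -3 dB point of a simple RC filter

if __name__ == "__main__":
    print(f"time constant: {tau_s * 1e9:.1f} ns")
    print(f"cutoff:        {cutoff_hz / 1e6:.2f} MHz")
```

A lower output impedance would shrink the time constant proportionally, which fits the idea that the capacitor only removes very short glitches.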

I believe people made tests and clocked A12 manually with $2006 writes, and figured that each rise (at slow speed) would result in an IRQ, RIGHT? If nobody has ever done this, then it's about time somebody really tested it.
Has it been tested exactly how many cycles between A12 rises are needed to clock the counter twice?

This brings me to another question. If Nintendo had to use something this complicated (two counters, including one that isn't in sync) for a counter on A12, why didn't they use a counter on A13 instead, which would just require a divide-by-42 prescaler before the actual scanline counter and would work wonders in theory? (Yes, there are unused pins on the chip.) Maybe Nintendo just didn't know their hardware that well.
Or maybe the actual circuit is different from what we imagine. As tepples says, there's no way to know without decapping the chip.

I'll try to simulate the "monostable" approach in software for my MMC1 demo. Hopefully I'll get something working.

by Memblers on 2010-10-21 (#68959)

I don't know if it's much help, but on my old Squeedo board back in 2005, scanline counting was one of the first things I tried. I hooked PPU-A12 to a timer input on the PIC16/18 (no capacitor or anything else), set it to prescale by 8, and it worked perfectly.

by kevtris on 2010-10-21 (#68960)

Bregalad wrote:
kevtris, your "monostable" approach would make at least some sense.
It's not good to talk about a "monostable" here, because this would refer to a circuit that uses R and C components for a delay (which is subject to a big margin of error), but A12 clocks being blocked by another circuit that counts M2 clocks after the first A12 rise would be a plausible solution.

I believe people made tests and clocked A12 manually with $2006 writes, and figured that each rise (at slow speed) would result in an IRQ, RIGHT? If nobody has ever done this, then it's about time somebody really tested it.
Has it been tested exactly how many cycles between A12 rises are needed to clock the counter twice?

This brings me to another question. If Nintendo had to use something this complicated (two counters, including one that isn't in sync) for a counter on A12, why didn't they use a counter on A13 instead, which would just require a divide-by-42 prescaler before the actual scanline counter and would work wonders in theory? (Yes, there are unused pins on the chip.) Maybe Nintendo just didn't know their hardware that well.
Or maybe the actual circuit is different from what we imagine. As tepples says, there's no way to know without decapping the chip.

I'll try to simulate the "monostable" approach in software for my MMC1 demo. Hopefully I'll get something working.

"Monostable" is the proper term for this. A monostable is any circuit that is only stable in one state, and can switch to the second "unstable" state for a predetermined amount of time. It doesn't matter if an RC controls the time, or a counter.

And yes, the Blargg test does the "manual A12" clocking test for his MMC3 tester program. As for the length of the delay I don't know what it is.

The monostable via counter on A12 is a pretty simple and elegant solution, IMO. It uses fewer gates/flipflops than a divide by 42 on A13 would use.

My FPGA NES uses the method I outlined in a previous post (the delay counter, etc.) for its MMC3 IRQ scanline counter, and it works with every single game. I tested every MMC3 game in the GoodNES set, along with some not in the set, and it works properly with every one. Even the trickiest games like Klax (the Japanese one specifically), which fires A LOT of IRQs in a frame. 16 or so, from what I recall.

by Bregalad on 2010-10-21 (#68961)

I uploaded my test ROM and sources simulating the delay at the same place.
It works fine for most delay values but is randomly unstable for some values, which is weird.
Again, I can only test it on PAL because it's too thick for my toploader; if someone in an NTSC country could test it (NOT on a PowerPak), that'd be great.
PS: The ugly dash pattern is intentional, to be able to count pixels on a TV, with a dash occurring every 8 pixels.

@Memblers: That's great. I wonder why Nintendo didn't do it that way (with the 8-prescaler). This wouldn't work with the BG using the right PT and sprites the left one, like Armadillo does. But is this really an issue? Honestly, Nintendo had the choice between using /RD with an 85 prescaler, A13 with a 42 prescaler, or A12 with an 8 prescaler, and they took the worst option, that is, A12 with a weird digital monostable inside. WTF??

by lidnariq on 2010-10-21 (#68965)

Dividing by 8 is so amazingly trivial in circuits... that's clearly why they did it. To divide by 85 or 42 would have required a 6-or-7 bit counter (instead of 3) and a 3-or-4 input AND gate.
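
The counter widths mentioned above can be checked with a short Python sketch. It only computes how many bits a binary counter needs to count through one full divide cycle (states 0 to divisor - 1); the helper name is illustrative, and it says nothing about the extra decode gates.

```python
# Width of a binary counter able to divide by a given amount, i.e. to
# count through the states 0 .. divisor - 1.


def counter_bits(divisor: int) -> int:
    """Return the number of flip-flops a divide-by-`divisor` counter needs."""
    if divisor < 1:
        raise ValueError("divisor must be positive")
    return max(1, (divisor - 1).bit_length())


if __name__ == "__main__":
    for divisor in (8, 42, 85):
        print(f"divide by {divisor}: {counter_bits(divisor)}-bit counter")
```

Dividing by 8 also needs no decode logic at all, since the counter simply wraps, whereas 42 and 85 need a comparison to reset at the right count.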

by blargg on 2010-10-21 (#68966)

I found the posting from way back when I tried toggling A12 quickly. I realize that I could have toggled it for just 4 cycles by setting the address to say $1FFF, then reading from $2007 immediately.... nope, still gets clocked for a four-cycle high and also for a four-cycle low:

Code:
lda #0
sta $2006
sta $2006
lda #$1F
sta $2006
lda #$FF
sta $2006 ; A12 = high; clocked
lda $2007 ; A12 = low

lda #$1F
sta $2006
lda #$FF
sta $2006
lda #$0F
sta $2006
lda #$FF
sta $2006 ; A12 = low
lda $2007 ; A12 = high; clocked

Another thread about this

by Bregalad on 2010-10-21 (#68972)

OK, so two A12 rises 4 CPU cycles apart are valid; on NTSC this is 12 PPU cycles, that's 6 read cycles. In reality the line toggles every 2 read cycles, so after a clock the MMC3 apparently counts the equivalent of somewhere between 3 and 6 read cycles before enabling clocks again... complicated, but possible.
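
The unit conversion used here can be written out as a small Python sketch. It assumes 3 PPU dots per CPU cycle on NTSC, 16/5 dots per CPU cycle on PAL, and 2 dots per PPU read cycle; the names are illustrative.

```python
# Convert CPU cycles to PPU dots and PPU read cycles.
from fractions import Fraction

DOTS_PER_CPU_CYCLE = {"NTSC": Fraction(3), "PAL": Fraction(16, 5)}
DOTS_PER_READ = 2   # one PPU memory access takes 2 dots


def cpu_cycles_to_dots(cycles: int, system: str) -> Fraction:
    """Return the number of PPU dots spanned by `cycles` CPU cycles."""
    return cycles * DOTS_PER_CPU_CYCLE[system]


def cpu_cycles_to_reads(cycles: int, system: str) -> Fraction:
    """Return the number of PPU read cycles spanned by `cycles` CPU cycles."""
    return cpu_cycles_to_dots(cycles, system) / DOTS_PER_READ


if __name__ == "__main__":
    for system in ("NTSC", "PAL"):
        dots = cpu_cycles_to_dots(4, system)
        reads = cpu_cycles_to_reads(4, system)
        print(f"{system}: 4 CPU cycles = {float(dots)} dots = {float(reads)} reads")
```

On PAL the same 4 CPU cycles span slightly more dots, so any threshold expressed in CPU cycles does not map to the same number of PPU reads on both systems.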

As lidnariq says, dividing by 8 is trivial (it just requires 3 flip-flops), so I don't know why they didn't do it that way. Dividing by 85 or 42 would be more complex, but this monostable thing sounds even more complex to me, but oh well...

It's funny how even fundamental aspects of very common mappers are badly covered... first this MMC1 WRAM enable and now the MMC3's counter... all in one thread. I think the nesdev community should make its best effort to fill those rabbit holes in order to get accurate mapper emulation for homebrew games...

With current mapper emulation, it is now possible to make a game that would pass on all emus and the PowerPak and fail on real HW, or vice versa.

by tepples on 2010-10-21 (#68973)

Bregalad wrote:
why didn't they use a counter on A13 instead, which would just require a divide-by-42 prescaler before the actual scanline counter and would work wonders in theory? (Yes, there are unused pins on the chip.) Maybe Nintendo just didn't know their hardware that well.

Or maybe it was designed while Nintendo was still planning to make changes to PPU sprite evaluation, such as killing the dummy nametable reads in hblank to allow for 16 sprites instead of 8. I've been told Famicom back-compat in the Super Famicom PPU was dropped fairly early on, but the plans might still have been in the works during the MMC3 era. Or they were trying to save a few gates/latches in their design; a divide-by-42 counter is six extra latches whether implemented as a binary counter or as a polynomial counter.

A 4-cycle high is 12 dots wide. The PPU A12 pulses are only 4 dots wide, or one CPU cycle and change.

by blargg on 2010-10-21 (#68980)

Bregalad wrote:
OK so two A12 rises 4 CPU cycles apart are valid, on NTSC this is 12 PPU cycles, that's 6 read cycles.

Oh, my code didn't test two clocks (the code is up there to examine). For some reason I was thinking it was the pulse width. Here's some new code that verifies what you just stated:
Code:
lda #$0F
sta $2006
lda #$FF
sta $2006 ; A12 = low
lda #$0F
sta $2006
lda #$FF
bit $2007 ; A12 = high; clocked
sta $2006 ; A12 = low
bit $2007 ; A12 = high; clocked

Quote:
Dividing by 85 or 42 would be more complex, but this monostable thing sounds even more complex to me, but oh well...

It's just a counter that prevents further clocking until it counts down.

Quote:
It's funny how even fundamental aspects of very common mappers are badly covered... first this MMC1 WRAM enable

The MMC1 disable has been covered well all along. Most of the pinouts show how its /CE is connected to CHR A16.

by Bregalad on 2010-10-22 (#68987)

Quote:
The MMC1 disable has been covered well all along. Most of the pinouts show how its /CE is connected to CHR A16.

Yeah, but it was never covered in ANY MMC1 doc I've ever looked at, and no emulators ever emulated it so far, even the so called accurate ones.
Quote:
Oh, my code didn't test two clocks (the code is up there to examine). For some reason I was thinking it was the pulse width. Here's some new code that verifies what you just stated:

I think it's impossible to test less than 4 cycles apart (which is 6 read cycles on NTSC; a read cycle is 2 PPU clocks, so 2/3 of a CPU clock). Because sprite fetches take 2 read cycles and dummy fetches take 2 read cycles, this is 3 times faster than your test code.

Quote:
Or maybe it was designed while Nintendo was still planning to make changes to PPU sprite evaluation, such as killing the dummy nametable reads in hblank to allow for 16 sprites instead of 8.

Could make sense. This would break Ninja Gaiden, Castlevania II and Legend of Zelda, though.

To make sure the MMC3 counter actually works as we suppose it does, someone would need to disconnect M2 from the MMC3 chip (effectively breaking the circuit for enabling SRAM, but that doesn't matter here) and see if the counter is affected. If it triggers only a single IRQ and then stops working, then we know we are on the right track. If the counter is unaffected, we are wrong, and are missing something about the counter.
Then, with a manual switch on the M2 line, it should be possible to find out how many CPU cycles the MMC3 waits before enabling the counter clock again (between 2 and 4, from what we know).