This is the last one for a while, honest.
In part of my code, I am able to cause glitches with the following setup:
-bg and sprites are off
-nmi is still on
-nmi is doing nothing but a bunch of empty loops
-the main thread is uploading a lot of data to the PPU
Are there any situations where writing to the PPU getting interrupted by NMI can cause issues? I haven't read anything to that effect. I don't want to turn nmi off, because this is useful for seamless sound continuing to play while uploading new PPU data. Previously, I had synchronized this upload with nmi, and had no glitches, but it was a bit slower. To be absolutely safe, I may go back to that approach---I dislike knowingly leaving bugs in my code! But, I thought it was supposed to be safe to write to the PPU in the main thread with graphics and sprites turned off---I did not think nmi would affect this?
The NMI in itself only changes behaviour of the CPU - this doesn't affect the PPU who is in iddle mode but still does it's cycle counting and fires NMI at 60 Hz.
What can be problematic is if you acknownledge the interrupts by reading the $2002 register in your NMI. If you do this, it can screw up the address you're writing to in VRAM in the main thread.
I am not using $2002 anywhere in my code. I'm sort of tempted to whittle this down over time to some concise example that causes a glitch so I can understand it better...I'm pretty stumped right now. Not holding up development, its more because I'm getting more curious about the nes hardware than in the past.
Are you absolute sure you're not doing anything PPU related in the NMI? Setting the scroll, maybe?
Yep, the nmi is totally blank. I kept taking things out---OAM, bankswitching, sound updates, til I had nothing but empty loops in nmi, while uploading ppu data in main thread (bg and sprites off). The glitch is very rare and seems to rely on extremely precise timing. *edit* happens in several emulators and on real hardware, though.
And when you disable NMIs everything is fine?
Could you by any chance be trashing registers in the NMI? If the main thread is doing logic you have to make sure you return all 3 registers (A, X and Y) exactly as they were when the NMI fired (P too, but the CPU takes care of that).
NMI could be trashing not only registers, but also the states of some variables for example. It could also trash the mapper state if you use a mapper like MM1 or MMC3 where multiple writes are required.
And yes even a simple innocent $2000 or $2005 write could potentially affect the scrolling = affect the VRAM adress but I think this can't be the case, as the change is only effective at the end of VBlank if no forced blanking (which is obviously not the case as we're in forced blanking) or at $2006.2 write, so as long as you don't read $2002 it should be ok.
So, I actually have not yet tried turning off nmi to weed it out. I will try that next. As for protecting registers, yes, I'm already doing that.
Movies and save states can be super useful when weeding out rarely occurring bugs like this. Record a movie, wait for the bug to occur, then seek the movie back to just before the bug occurred and step through it or enable instruction logging. It's still not much fun though, usually lots of code to go through.
I've confirmed that switching off NMI before the main thread ppu upload routine runs and then switching it back on after seems to make the bug go away, or at least mask it. I can't reproduce it in this case. I found the only place I'm using $2002 is startup where you're supposed to wait for the ppu to warm up, nowhere else am I using it.
Any registers or status flags getting trashed, or stack memory overwriting anything else?
Are you updating the palette in the main code and thus leaving the PPU address within the palette?
UGH! It turned out to be an embarrassingly simple bug. Fortunately, I found that the changes I made over the weekend actually did correctly fix the bug, I just didn't fully understand what the bug was until this evening. I was, in fact, clobbering a variable in a vblank routine. I'm glad I stuck with this. I'm a very right brained person so true engineering discipline doesn't come easy. Thanks for your patience and helping me rule out NES hardware being to blame. 99.99% of the time on ANY platform old or new, if you have a bug, chances are it is in your own code...it is a hard lesson. Keep learning it over and over and over. Maybe I'll get it this time
Glad you stuck with it until you figured out exactly what it was. Certainty is satisfying.
This is a duplicate of
Sprite OAM issue? (updated: bug in own code). I just thought I'd cross-reference these two so people don't get confused and think there's something to learn here other than: "You have a bug in your own code."
Except perhaps that misidentifying a bug can still be a good exercise in reviewing one's own code and making other improvements...
I'll lock this thread to keep future readers/posters focused on the other thread. :-)