I've got quite a nasty crash/reset bug in my project. It occurs when the program is running right on the edge of using too much CPU time in a frame. Is it normal for the NES to reset itself in those kinds of circumstances? It happens both on hardware (via PowerPak) and in Nestopia, but oddly I can't make it happen in Nintendulator.
There's a lot of PRG and WRAM bank-switching going on, so that's probably where I'm going to start looking, but I'm pretty fastidious about making sure the correct banks are switched in (excessively, some might say).
Problem is, because of the configuration, I can only run it in Nestopia (which doesn't have debugging) or Nintendulator (which does, but the crash doesn't happen there).
Any tips or ideas?
This sure sounds like an annoying bug.
I'd check how you handle interrupts and the stack. Maybe a new NMI fires before the previous NMI handler has finished its work for the frame, then the next NMI does the same, and so on, until the stack overflows. This is pure speculation though.
Definitely sounds like a possibility.
OK, this might be a dumb question but: how would you prevent that from happening?
That was a dumb question. I just moved a big chunk of code out of the NMI and put it in the background loop.
Seems a lot more stable. Schoolboy error.
Still isn't that just hiding the problem? Your program shouldn't crash because too much cpu time is used in a frame. While you could go on like that just trying to make it impossible for that to happen, what if you miss something? Crashing bugs are definitely not something you want to leave unfixed.
You can increment a counter in the NMI, and decrement it before returning. If the increment makes it higher than 1, then it's a recursive NMI. I think I had seen code like that in Metroid.
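Roughly what I mean, modeled in Python instead of 6502 so anyone can run it. This is only a sketch: `nmi_depth`, `recursive_nmis` and `frames_done` are made-up names standing in for RAM variables, and the nested call stands in for a second vblank arriving before the first handler has returned.

```python
# Toy model of an NMI handler guarded by a nesting counter.
# All names here are hypothetical stand-ins for RAM variables.
state = {"nmi_depth": 0, "recursive_nmis": 0, "frames_done": 0}

def nmi(state, next_nmi_arrives_early=False):
    state["nmi_depth"] += 1            # increment on entry
    if state["nmi_depth"] > 1:
        # The previous NMI has not returned yet: this one is recursive.
        state["recursive_nmis"] += 1
    else:
        if next_nmi_arrives_early:
            nmi(state)                 # simulate vblank hitting mid-handler
        state["frames_done"] += 1      # the normal per-frame work
    state["nmi_depth"] -= 1            # decrement before returning

nmi(state)                               # a frame that finishes in time
nmi(state, next_nmi_arrives_early=True)  # a frame that overruns
print(state)
```

The recursive entry is counted and bails out early, so the counter is back at zero once the outer handler returns.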
MottZilla wrote:
Still isn't that just hiding the problem? Your program shouldn't crash because too much cpu time is used in a frame. While you could go on like that just trying to make it impossible for that to happen, what if you miss something? Crashing bugs are definitely not something you want to leave unfixed.
Oh, I totally agree.
Evidence would definitely point to a recursive NMI though - if I added deliberate delay loops to the block of code (which I've now moved to the background loop), it would crash far more often.
Now that I (hopefully) know what conditions are causing the issue I can try to figure out how to cope with them.
So far though, without changing any other code, it survived an extended period of 'stress testing'.
If you did it the "everything in NMI way", then you'd want to make sure only the first "quick" part of the NMI (video + sound) is done each time, but that the next part (logic) is only done if there isn't any previous NMI going on.
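Something like this, as a Python model rather than real 6502 code. It's a sketch under assumptions: the `busy` flag and the log strings are invented for illustration, and the nested call stands in for the next vblank firing while the logic is still running.

```python
class Nmi:
    """Toy model: quick part always runs, logic only when not already busy."""

    def __init__(self):
        self.busy = False   # hypothetical "logic in progress" flag in RAM
        self.log = []

    def handler(self, frame, overrun=False):
        # Quick part (video + sound): done on every NMI.
        self.log.append(f"video+sound {frame}")
        if self.busy:
            return          # logic from an earlier NMI is still running
        self.busy = True
        self.log.append(f"logic {frame} start")
        if overrun:
            self.handler(frame + 1)   # next vblank fires mid-logic
        self.log.append(f"logic {frame} end")
        self.busy = False

n = Nmi()
n.handler(0, overrun=True)   # logic for frame 0 runs long
n.handler(2)                 # back to normal
print("\n".join(n.log))
```

The overrunning frame still gets its video and sound update from the nested NMI, but the logic is never entered twice.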
I have a feeling that what happens is that because I'm saving the current PRG bank and the current WRAM bank at the start of the NMI and then restoring it at the end, if I get a recursive NMI the PRG/WRAM banks could get restored to the wrong values.
For example, say the PRG bank was set to 0 when the NMI occurred and then in the NMI I'm switching the PRG bank to 1 (where, say, my screen code is located), then restoring it back to 0 on exiting the NMI. If the PRG bank hasn't been restored by the time another NMI occurs, the saved PRG bank then becomes 1, so when it does eventually get restored from the saved value (next frame), it gets restored to 1 instead of the 0 that it was originally.
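Here is that sequence as a small Python model (not my actual code, just a sketch to show the failure). `prg_bank` and `saved_bank` are made-up names for the current bank and the single fixed save location, and the nested call plays the part of the second NMI.

```python
# Toy model of saving the PRG bank in ONE fixed memory location.
machine = {"prg_bank": 0, "saved_bank": None}

def nmi(m, interrupted_by_next_nmi=False):
    m["saved_bank"] = m["prg_bank"]   # save current bank to the fixed slot
    m["prg_bank"] = 1                 # switch to the bank with the screen code
    if interrupted_by_next_nmi:
        nmi(m)                        # second NMI overwrites saved_bank with 1
    m["prg_bank"] = m["saved_bank"]   # restore from the fixed slot

nmi(machine, interrupted_by_next_nmi=True)
print(machine["prg_bank"])   # the bank the main code is left with
```

The main code was running in bank 0, but after the nested NMI it comes back in bank 1, because the inner save overwrote the outer one.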
I'm pretty sure that's what was happening. I'd be interested in suggestions of how to get around that although I have moved code around (out of the NMI) now and it's a lot more stable.
I'd suggest saving the old bank on the stack instead of using a fixed memory address.
If you save data to the stack instead of fixed memory locations (variables), as long as the stack doesn't overflow everything can be restored correctly.
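A quick Python model of why this works, as a sketch only. A list plays the part of the hardware stack, with `append` and `pop` standing in for a push and a pull, and `prg_bank` is a made-up name for the current bank.

```python
# Toy model of saving the PRG bank on the stack instead of a fixed slot.
machine = {"prg_bank": 0}
stack = []   # stands in for the hardware stack

def nmi(m, stack, interrupted_by_next_nmi=False):
    stack.append(m["prg_bank"])   # push the current bank
    m["prg_bank"] = 1             # switch to the bank the NMI code needs
    if interrupted_by_next_nmi:
        nmi(m, stack)             # nested NMI pushes and pops its own copy
    m["prg_bank"] = stack.pop()   # pull the bank this invocation saved

nmi(machine, stack, interrupted_by_next_nmi=True)
print(machine["prg_bank"], stack)
```

Each invocation restores the value it pushed itself, so the nested NMI unwinds to bank 1 and the outer one to bank 0. The cost is one extra stack byte per nesting level, which is why the stack still must not overflow.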