- This is a quick implementation of a pixel-by-pixel core, plus the chance to learn how to deal with it made me to code a single & fast PPU core that works out of default NES PPU "method". In single words, the 1st cycle already outputs a pixel. This is done 256 times. Later, the Y offset is increased & reset if required. At cycle 341, the X offset + nametable bit is reset.
- By compairing it with the "correctness", I only see as "inaccuracy" the Y offset reset, that might occur at cycle ~231. I can't see a sprite zero collision problem, neither any other thing... The collateral effect, as far as I can tell you, is the Battletoads issue. It hangs during the stage 2 randomly. For the rest of games, it seems OK to continue. Even those PPU tests passed.
If you want a fast PPU, separate graphics rendering from other behavior (like statuses). This lets you downgrade timing accuracy of mid-scanline visual behavior while maintaining full accuracy in other respects (sprite hit, status flags, etc.). Quite a few games rely on accurate non-graphical behavior of the PPU, but far fewer rely on mid-scanline effects.
However, some commercial games do rely on this behavior. For instance, if you don't support mid-scanline alterations to PPU registers and/or CHR bank registers, Marble Madness, Pirates!, and Mother will all exhibit various levels of graphical corruption. Also, the orb-relighting beam on Final Fantasy will not appear correctly unless the rendering is accurate to the pixel. This may not matter; if 95% accuracy is good enough, you can get away with a scanline based renderer.
Someone a while back said that they had gotten Battletoads completely working on a scanline-based emulator (despite its timing pickiness) by splitting CPU execution for each scanline into two parts. In other words, instead of something like cpu->execute(114), they did something like cpu->execute(?), then render and set collision/overflow if relevant, update Loopy_V, then cpu->execute(?), then draw the OBJs and move to the next scanline. I don't know what the exact numbers they used were, though.
I think that if you emulate the NES on a PocketPC you may have no choice but to do a scanline based renderer. The current systems simply aren't beefy enough for a pixel based renderer. The same holds true of digital cameras, cellphones, and other limited platforms of this nature.
On this note... When you blit a 256-color bitmap to the PocketPC screen using the standard Windows CE DIB functions (*not* GAPI), is the upsampling from palettized to hi-color mode usually done in hardware or software? Or does that vary from machine to machine?