As we all know, the temporary PPU address (usually set by dual writes to $2005) gets copied to the real PPU address at the start of the frame (provided BG/sprite rendering is on). What I'd like to know is: when exactly does this happen?
It's got to be near the end of scanline -1 (the dummy scanline after the 20 lines of VBlank and before the first rendered scanline) -- but on which cycle? Up until now I've been doing it on 260 (which was really just a guess), but recent difficulties with Cobra Triangle have me suspecting that this is too soon. When I push it back to 320 it works -- but is that too late?
Anyway, here's what I do: scanlines 0 through 19 are VBlank, with the VBlank hit on scanline 0. Scanline 20 is the dummy one: once the PPU pixel counter reaches 256 (I mean >= 256 here), I reload the PPU address (loopy_v = loopy_t).
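To make the scheme above concrete, here is a minimal C sketch of a per-dot PPU step that does the reload on the dummy scanline. The struct, its field names, and ppu_tick are hypothetical and not taken from any emulator in this thread; the 341 dots per line and 262 lines per frame are the usual NTSC figures. The reload dot is a field, so the candidate values discussed here (256, 260, 320, ...) can be swapped in without touching the logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PPU state for illustration only. Line numbering follows
   the post above: lines 0-19 are VBlank, line 20 is the dummy line. */
typedef struct {
    uint16_t loopy_v;      /* real PPU address */
    uint16_t loopy_t;      /* temporary PPU address (set via $2005/$2006) */
    int      scanline;     /* 0..261 */
    int      dot;          /* 0..340 */
    int      reload_dot;   /* dot on the dummy line where v = t happens */
    bool     rendering_on; /* BG or sprite rendering enabled */
} Ppu;

enum { DUMMY_LINE = 20, DOTS_PER_LINE = 341, LINES_PER_FRAME = 262 };

/* Advance the PPU by one dot, doing the reload when the chosen dot is hit. */
static void ppu_tick(Ppu *p)
{
    if (p->rendering_on && p->scanline == DUMMY_LINE && p->dot == p->reload_dot)
        p->loopy_v = p->loopy_t;

    if (++p->dot == DOTS_PER_LINE) {
        p->dot = 0;
        if (++p->scanline == LINES_PER_FRAME)
            p->scanline = 0;
    }
}
```

Using == rather than >= means the reload fires once per frame; with >= it would be repeated on every remaining dot of the line, which only matters if something changes loopy_v or loopy_t in between.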
Are you stealing 4 cycles for DMC byte fetches? And if so, does Cobra Triangle work?
I tested this on my NES and came up with 2373 cycles after NMI. This might be a few cycles off since I'm not sure of the exact interrupt latency. I put together the test source and a NES ROM that tests the proper delay with adjustments of +1, 0, and -1, to verify that the emulator is handling it correctly.
ppu_t_to_v.zip
Here is the essential code which I calculated the delay from:
Code:
nmi: lda #0 ; 2 set scroll to 0
sta $2005 ; 4
sta $2005 ; 4
ldy #4 ; 2325 delay
dl: ldx #115
dl2: dex
bne dl2
dey
bne dl
pha ; 21 delay
pla
pha
pla
pha
pla
lda #128 ; 2 set scroll again
sta $2005 ; 4
sta $2005 ; 4 (write occurs on fourth cycle)
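For anyone who wants to check the cycle counts in the comments, here is a small C sketch that tallies them. The function names are made up for this example, and it assumes standard 6502 timings (ldx/ldy immediate 2, dex/dey 2, branch 3 when taken and 2 when not, pha 3, pla 4, lda immediate 2, sta absolute 4) with no branch crossing a page boundary.

```c
/* Cycles for:  ldy #outer / dl: ldx #inner / dl2: dex / bne dl2 / dey / bne dl
   Assumes neither branch crosses a page boundary. */
static int nested_delay_cycles(int outer, int inner)
{
    /* dex (2) + taken bne (3) each pass; the final bne falls through (2) */
    int inner_loop = inner * (2 + 3) - 1;
    /* ldx (2) + inner loop + dey (2) + taken bne (3) */
    int per_outer = 2 + inner_loop + 2 + 3;
    /* ldy (2) + all outer passes; the final outer bne falls through */
    return 2 + outer * per_outer - 1;
}

/* Cycles from the first instruction of the handler through the last $2005 write. */
static int handler_cycles(void)
{
    int first_scroll  = 2 + 4 + 4;                   /* lda #0, sta, sta */
    int big_delay     = nested_delay_cycles(4, 115); /* 2325 */
    int small_delay   = 3 * (3 + 4);                 /* three pha/pla pairs = 21 */
    int second_scroll = 2 + 4 + 4;                   /* lda #128, sta, sta */
    return first_scroll + big_delay + small_delay + second_scroll;
}
```

handler_cycles() comes to 2366. The 2373 quoted above is 7 more than that, which would be consistent with the 6502's seven-cycle interrupt sequence, though the post doesn't spell out how the latency was counted.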
blargg, what's the expected result? I see an orange background with black vertical bars. The screen becomes black after a few seconds.
Read the comments at the beginning of the asm source.
Edit: I'm getting different results occasionally. Sometimes the pattern table doesn't even get loaded properly if I power up rather than reset! I tried offsetting in the horizontal direction instead and got nothing. I've spent too many hours on this; I'm sticking to APU tests only. Maybe someone else with more PPU experience can use my code as a starting point for a proper test. Sorry for the inconvenience.
I've changed the code in my emulator to reflect this, that is, by doing the reload at cc offset 299 of the PPU dummy scanline instead of 256. I haven't noticed any problems with the few timing-sensitive games I've tested so far, so I think I'll stick with this for now. If it causes problems with any other game I'll post it here. Thanks for your effort.
2373 cpu cycles = 7119 PPU cycles
20 scanlines x 341 = 6820 PPU cycles
7119 - 6820 = 299
So about at cycle 300 in the pre-render dummy scanline? Plus there's some latency between the start of the frame and when an NMI actually occurs (enough for one instruction to be executed before the NMI is tripped).
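The same arithmetic as a small C sketch, in case anyone wants to plug in other delays. PpuPos and ppu_pos_after_nmi are names invented for this example; it assumes NTSC timing (3 PPU cycles per CPU cycle, 341 PPU cycles per scanline) and counts from the start of VBlank, ignoring the NMI latency mentioned above.

```c
/* Assumed NTSC constants: 3 PPU cycles per CPU cycle, 341 per scanline. */
enum { PPU_PER_CPU = 3, PPU_CYCLES_PER_LINE = 341 };

typedef struct {
    int line; /* scanlines since the start of VBlank (20 = dummy line) */
    int dot;  /* PPU cycle within that scanline */
} PpuPos;

/* Convert a CPU cycle count measured from the start of VBlank
   into a scanline/cycle position on the PPU. */
static PpuPos ppu_pos_after_nmi(int cpu_cycles)
{
    int ppu_cycles = cpu_cycles * PPU_PER_CPU;
    PpuPos pos = { ppu_cycles / PPU_CYCLES_PER_LINE,
                   ppu_cycles % PPU_CYCLES_PER_LINE };
    return pos;
}
```

ppu_pos_after_nmi(2373) gives line 20, cycle 299. The -1 and +1 variants of the test (2372 and 2374 CPU cycles) land on cycles 296 and 302 of the same line, so a one-CPU-cycle uncertainty moves the estimate by 3 PPU cycles either way.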
I've been able to reproduce what I think is the desired result in the test ROM by putting the reload at cycle 305. This adjustment didn't cause any problems with Cobra Triangle or any other games I tried ^^
Thanks blargg
Well, you can't say it's correct because of just 1 game, can you?
No, but I can say it's incorrect because of one game ;)
cc 300 works here. But it just seems so... random, doesn't it?