I had a nasty surprise this evening when I tested some of my experiments on a real NES. I thought I had tested this particular experiment on a real NES before, but apparently not. I'm getting bad sprite flicker. It sucks, because this doesn't happen on any of the four emulators I've been testing with.
The only explanation I can think of is that I've written some highly inefficient, cycle-heavy code that runs past the end of vblank (but shouldn't an accurate emulator catch this?). In fact, when I added one (harmless) instruction to my code, the flicker got worse... That would lend credence to that possibility, would it not?
There are two things I can see that would cause differences between Nintendulator/Nestopia and a real NES:
- How are you clearing RAM at the start of your program?
- Are you turning off rendering early? I helped discover a glitch in the OAM refresh while developing Tetramino.
Also be sure you don't have any of the emulators' "reduced accuracy/enhanced visuals" options enabled. Nestopia in particular has an option that removes the limit of 8 sprites per scanline.
I first disable rendering, then wait two vblanks, then clear RAM from $0000 to $07FF (pages $00-$07).
My program is an experiment with displaying meta sprites. In my current test program, I am displaying two meta sprites. If each meta sprite is composed of 8 sprites (4 wide x 2 high), there is no flicker. If each is composed of more than that, the sprites begin to flicker. The largest I've tried is two 4x4 meta sprites. From what I understood from the documentation, this would bring the maximum number of sprites on any scanline to 8 (when the two meta sprites are side by side), which is the most the NES allows. I assumed from that that there would be no flicker... is that incorrect?
*edit* After modifying my test programs to display two 4x4 meta sprites, the flicker did show up in both Nestopia and Nintendulator. There seems to be a small discrepancy between these emulators and the real hardware in the number of sprites at which flicker begins.
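To check the 8-per-scanline arithmetic, here is a rough Python model (a sketch, not how any particular emulator does it). Assumptions: all sprites are 8x8, each one is described only by the top row it covers, the one-line offset between the OAM Y value and the first drawn row is ignored, and `metasprite` is a hypothetical helper that lays out a grid of sprites. The PPU evaluates OAM in index order, so on a crowded line the first eight sprites found are drawn and the rest are dropped.

```python
SPRITE_HEIGHT = 8      # assumes 8x8 sprite mode
MAX_PER_SCANLINE = 8   # PPU limit on sprites drawn per line
VISIBLE_LINES = 240


def metasprite(top, left, cols, rows):
    """Hypothetical helper: (y, x) positions for a cols x rows grid of 8x8 sprites."""
    return [(top + r * 8, left + c * 8) for r in range(rows) for c in range(cols)]


def sprites_on_line(oam, line):
    """OAM indices of sprites covering `line`, in evaluation (index) order."""
    return [i for i, (y, _x) in enumerate(oam) if y <= line < y + SPRITE_HEIGHT]


def max_per_scanline(oam):
    return max(len(sprites_on_line(oam, line)) for line in range(VISIBLE_LINES))


def dropped_sprites(oam):
    """Map each overcrowded line to the OAM indices that would not be drawn."""
    dropped = {}
    for line in range(VISIBLE_LINES):
        hits = sprites_on_line(oam, line)
        if len(hits) > MAX_PER_SCANLINE:
            dropped[line] = hits[MAX_PER_SCANLINE:]
    return dropped


# Two 4x4 meta sprites side by side: exactly 8 per line, nothing dropped.
two = metasprite(100, 40, 4, 4) + metasprite(100, 72, 4, 4)
print(max_per_scanline(two), dropped_sprites(two))  # -> 8 {}

# A third one on the same lines: 12 per line, the last 4 in OAM order drop out.
three = two + metasprite(100, 104, 4, 4)
print(max_per_scanline(three), dropped_sprites(three)[100])  # -> 12 [32, 33, 34, 35]
```

Under this model, two 4x4 meta sprites side by side sit exactly at the limit, so nothing should disappear. If sprites still flicker at that count, the cause is probably something other than the 8-sprite limit (vblank timing, for example).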
*edit* I found tepples' post:
http://www.nesdev.com/bbs/viewtopic.php?t=4647&postdays=0&postorder=asc&start=0&sid=ca06ee7f0b57499c8d5129d1ee7456b4
I'm not yet certain whether I'm seeing the same problem, though.
I don't think any NES emulator accurately emulates what can happen when you write to the PPU during an active scanline. An easy 'fix' for your program is: vblank NMI -> disable rendering -> upload data to the PPU -> enable rendering.
Also, be sure to do a sprite DMA every vblank, as sprite attribute RAM (OAM) contents may deteriorate if left untouched for a while (I'm not sure of this myself). This effect is also not emulated.
I noticed that if I draw a large enough meta sprite (28 sprites in Nintendulator), my background starts to flicker and scroll vertically, even with your suggestion of turning off rendering -> uploading to the PPU -> turning rendering back on. This looks like my vblank code is taking longer than vblank lasts. Maybe I should test in PAL mode to confirm this. *edit* Sure enough, there are no problems in PAL mode in Nintendulator.
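For a rough sense of why PAL mode hides the problem, here are the vblank and whole-frame budgets in CPU cycles. Assumptions: 341 PPU dots per scanline in both regions; NTSC has 262 lines per frame with 20 in vblank and 3 PPU dots per CPU cycle; PAL has 312 lines with 70 in vblank and 3.2 dots per CPU cycle; the dot skipped on odd NTSC frames is ignored.

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 341

# region: (scanlines per frame, vblank scanlines, PPU dots per CPU cycle)
REGIONS = {
    "NTSC": (262, 20, Fraction(3)),
    "PAL": (312, 70, Fraction(16, 5)),  # 3.2
}


def cpu_cycles(scanlines, dots_per_cpu_cycle):
    """Exact number of CPU cycles spanned by `scanlines` full scanlines."""
    return Fraction(scanlines * DOTS_PER_SCANLINE) / dots_per_cpu_cycle


def budget(region):
    """Return (vblank cycles, whole-frame cycles) for a region."""
    total, vblank, ratio = REGIONS[region]
    return cpu_cycles(vblank, ratio), cpu_cycles(total, ratio)


for region in REGIONS:
    vblank, frame = budget(region)
    print(f"{region}: vblank {float(vblank):.1f}, frame {float(frame):.1f}")
# NTSC: vblank 2273.3, frame 29780.7
# PAL: vblank 7459.4, frame 33247.5
```

So code that overruns NTSC vblank by a few hundred cycles still fits comfortably in PAL's roughly 7459-cycle vblank. It also shows that most of an NTSC frame (about 27,500 cycles) falls outside vblank.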
If you haven't implemented a "flicker" routine and all sprites hold the same places in the OAM page every frame, the sprites should not flicker at all. Have you ever played SMB1 and watched one of those pulley platforms disappear when it's on the same scanline as another? That's what it should look like: one and only one should completely disappear.
My guess is it's something unusual, like a PPU glitch. Always be very aware of how long your vblank code takes; you should count the cycles. If it works in PAL mode like you said, it's very possible that your vblank code is overrunning NTSC vblank, which is much shorter than PAL's.
I've read on this forum that vblank is over 2000 CPU cycles long, yet when testing my own programs in Nintendulator, I seem to get flicker with the following code. (I have already loaded a background, and in vblank all I do is disable rendering and re-enable it.)
Code:
vblank:
pha ;1
txa ;1
pha ;1
tya ;1
pha ;1
php ;1
ldx #0
stx $2001 ; disable rendering
ldx #$ff ;2
- dex ;1
bne - ;2
ldx #$d3 ;2
;ldx #$d4 ;2 ;uncomment this line, you will get flicker
- dex ;1
bne - ;2
lda #%00011110
sta $2001
plp ;1
pla ;1
tay ;1
pla ;1
tax ;1
pla ;1
irq:
rti
I don't know how long it takes to enable/disable rendering, but I put some loops in there which, if I've done my arithmetic right, take 1402 cycles. Where are the remaining 600+ cycles going?
A taken branch is 3 cycles (4 if it crosses a page boundary), not 2.
And all instructions take at least two cycles. Looks like you're totaling the number of BYTES each instruction is encoded in, rather than the amount of work it's doing. Here's a more realistic cycle timing:
Code:
vblank:
; 14 for previous instruction + vectoring
pha ; 3
txa ; 2
pha ; 3
tya ; 2
pha ; 3
php ; 3
ldx #0 ; 2
stx $2001 ; 4 disable rendering
ldx #$ff ; 2
- dex ; 2
bne - ; 3
; -1 (the final BNE falls through: 2 cycles, not 3)
ldx #$d3 ; 2
;ldx #$d4 ; 2 uncomment this line, you will get flicker
- dex ; 2
bne - ; 3
; -1 (the final BNE falls through: 2 cycles, not 3)
lda #%00011110; 2
sta $2001 ; 4
plp ; 4
pla ; 4
tay ; 2
pla ; 4
tax ; 2
pla ; 4
rti ; 6
Adding those up, I get 2374 cycles from the NMI through the final STA $2001. Uncommenting that line adds 7 cycles.
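The arithmetic in the listing above can be double-checked in Python. This sketch encodes the rule that a `dex`/`bne` loop entered with X = n costs 2n + 3(n - 1) + 2 = 5n - 1 cycles (assuming the branch never crosses a page, which would add a cycle per taken branch), then sums the per-instruction annotations. The 14-cycle NMI entry is the same estimate used in the listing.

```python
from fractions import Fraction

DEX = 2
BNE_TAKEN = 3      # assumes no page crossing; crossing would make it 4
BNE_NOT_TAKEN = 2


def delay_loop(x):
    """Cycles for '- dex / bne -' entered with X = x (X = 0 runs 256 times)."""
    n = x if x else 256
    return n * DEX + (n - 1) * BNE_TAKEN + BNE_NOT_TAKEN   # = 5n - 1


def cycles_to_reenable(uncomment_d4=False):
    """Cycles from the NMI through the final 'sta $2001' in the listing."""
    steps = [
        14,                      # estimate: finish current instruction + NMI entry
        3, 2, 3, 2, 3, 3,        # pha, txa, pha, tya, pha, php
        2, 4,                    # ldx #0, stx $2001 (rendering off)
        2, delay_loop(0xFF),     # ldx #$ff, first loop
        2,                       # ldx #$d3
    ]
    if uncomment_d4:
        steps += [2, delay_loop(0xD4)]   # ldx #$d4 overrides X: one more pass
    else:
        steps += [delay_loop(0xD3)]
    steps += [2, 4]              # lda #%00011110, sta $2001 (rendering on)
    return sum(steps)


# Assumed timing: NMI at dot 1 of scanline 241; the pre-render line's
# vertical scroll reload window ends at dot 304 of scanline 261.
reload_deadline = Fraction(20 * 341 + (304 - 1), 3)

print(cycles_to_reenable(), cycles_to_reenable(True), round(float(reload_deadline), 1))
# -> 2374 2381 2374.3
```

One possible reason the threshold sits right there (an assumption, not something established in this thread): on the pre-render scanline the PPU reloads the vertical scroll bits during dots 280-304, but only if rendering is enabled at that point. Dot 304 of scanline 261 is about 2374.3 CPU cycles after the NMI, so re-enabling rendering at 2374 just makes it, while 2381 misses the reload, which would fit the vertical scrolling glitch described earlier. Since the 14-cycle entry figure is only an estimate, treat this match as suggestive.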
Thank you for the corrections.
Is there a good reference for how many CPU cycles each instruction takes? I mistakenly assumed the numbers in a certain column of a 6502 reference were cycle counts, rather than the size of the instruction in memory... my bad.
Wow, that was fast. Thanks!
Thanks to you guys, I've had a leap of understanding about NES programming this evening. So vblank is only around 2000 cycles long, which must mean there are a lot more cycles available while the PPU is rendering. Thus I would imagine game engines are often (I realize there's more than one way to do this) set up roughly as follows:
Code:
loop:
;wait for the NMI handler to finish this frame's updates (frameDone is a hypothetical zero-page flag)
- lda frameDone
beq -
lda #0
sta frameDone
;none of this code will update graphics or sound, just make various game-related calculations (which may be quite CPU intensive, so this is the most appropriate place for them)
jsr updateAllGameObjectsAndAI
jmp loop
vblank:
;(push A, X and Y here and pull them back before the rti, since the NMI can interrupt the game logic)
;all of these update graphics and sound as calculated by the game engine during the previous frame
jsr updateSprites ;sprite DMA
jsr updateBackground ;scrolling, nametable updates
jsr updateSound ;sound and music engine
lda #1
sta frameDone ;tell the game engine it can do whatever it wants until the next vblank
rti
ZomCoder wrote:
Code:
-
lda $2002
clc
rol a
bcs -
I have never seen anyone poll for $2002 that way before! To me, that's not even a newbie way of doing it; it's more complex than the standard way to do it:
Code:
-
lda $2002
bmi -
And besides, you wouldn't need to clear the carry in that loop, as it gets overwritten with the value shifted into it.
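A quick Python model of why the `clc` is unnecessary and why the `rol`/`bcs` loop makes exactly the same decision as `bmi`: ROL moves bit 7 of A into the carry and the old carry into bit 0, so the carry afterwards depends only on bit 7 of the value read from $2002.

```python
def rol(a, carry_in):
    """6502 ROL A: shift left through carry. Returns (new A, new carry)."""
    return ((a << 1) & 0xFF) | carry_in, (a >> 7) & 1


def loops_rol_bcs(status, carry_before=0):
    """'lda $2002 / rol a / bcs -': repeat if the carry comes out set."""
    _a, carry = rol(status, carry_before)
    return carry == 1


def loops_bmi(status):
    """'lda $2002 / bmi -': repeat if bit 7 (the N flag after LDA) is set."""
    return (status & 0x80) != 0


# Same decision for every possible $2002 value, whether or not CLC ran first.
same = all(loops_rol_bcs(s, c) == loops_bmi(s) for s in range(256) for c in (0, 1))
print(same)  # -> True
```

The incoming carry only ends up in bit 0 of A, which the loop never looks at.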
I'm pretty sure you're right about the game loop thing, though; it's just executed differently (probably with no polling). You never want the game loop to run at more than 60/50 Hz (NTSC/PAL), and you never want the NMI to upload data when it's not ready. That basically means you should have flags that indicate whether or not the data is ready. Whether this approach is worth the complexity is debatable, though.
I also had another thought: one could run the music engine outside of vblank too, as long as something (like waiting for a flag from vblank) keeps it synchronized and timed well.
*edit* Apparently you found my earlier edits... sorry about that. Bad habit. I was playing around with different ways of checking bit 7 in a 6502 simulator.
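A toy Python model of that flag handshake, to show what happens when the game logic occasionally needs more than a frame. Assumptions: the main loop works on one update at a time and sets a hypothetical `ready` flag when it finishes; the NMI uploads only when the flag is set, then lets the next update start; the cycles the NMI handler itself uses are ignored.

```python
def simulate(update_costs, cycles_per_frame):
    """Return one entry per NMI: True if it uploaded a finished update,
    False if the flag was not set (a lag frame that re-shows old data)."""
    if not update_costs:
        return []
    queue = list(update_costs)
    remaining = queue.pop(0)        # cycles left in the update being computed
    nmis = []
    while True:
        # Main loop: spend this frame's cycles on the current update. Once it
        # finishes, it sets the flag and idles, so it never runs ahead of 60/50 Hz.
        remaining -= cycles_per_frame
        ready = remaining <= 0
        # NMI: upload only if the data is marked ready.
        nmis.append(ready)
        if ready:
            if not queue:
                return nmis
            remaining = queue.pop(0)   # flag cleared; the next update may start


# The third update needs about 1.5 frames of NTSC CPU time -> one lag frame.
print(simulate([20000, 20000, 45000, 20000], 29780))
# -> [True, True, False, True, True]
```

A slow update costs a repeated frame rather than a half-finished upload, which is the point of checking the flag in the NMI.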
Celius wrote:
[...] To me, that's not even a newbie way of doing it; it's more complex than the standard way to do it:
Code:
-
lda $2002
bmi -
That's the standard incorrect newbie way (just wanted to be sure others knew you were showing it heh). Here's a correct way:
Code:
- lda $2002
bpl -
That is, you're waiting until the flag gets set, not clear (for one, reading it clears it, so the first loop will never iterate more than once).
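To see the difference concretely, here is a toy Python model of $2002 polling. Assumptions: `PPUStatus` is a made-up class in which time advances one step per read, the vblank flag is set at a fixed step in each period, and every read returns the flag in bit 7 and then clears it. It ignores the real hardware race where a read landing right as the flag is set can miss it.

```python
class PPUStatus:
    """Toy $2002: bit 7 is set when vblank starts and cleared by any read."""

    def __init__(self, vblank_step, period):
        self.step = 0
        self.flag = False
        self.vblank_step = vblank_step   # step within each period where vblank begins
        self.period = period

    def read(self):
        self.step += 1
        if self.step % self.period == self.vblank_step:
            self.flag = True
        value = 0x80 if self.flag else 0x00
        self.flag = False                # reading $2002 clears the vblank flag
        return value


def reads_until_exit(ppu, loop_while_set):
    """Count reads done by '- lda $2002 / bmi -' (loop_while_set=True)
    or '- lda $2002 / bpl -' (loop_while_set=False) before falling through."""
    reads = 0
    while True:
        reads += 1
        bit7_set = bool(ppu.read() & 0x80)
        if bit7_set != loop_while_set:   # branch not taken
            return reads


print(reads_until_exit(PPUStatus(50, 100), loop_while_set=False))  # bpl: -> 50
print(reads_until_exit(PPUStatus(1, 100), loop_while_set=True))    # bmi: -> 2
```

The `bpl` loop keeps reading until vblank actually starts, while the `bmi` loop reads at most twice, because its first read clears the very flag it is looping on.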
I'm sorry, it was supposed to be a better way to loop while bit 7 is set, which is what was given. I guess I didn't really think about the meaning of the loop. There were actually two loops posted:
-
lda $2002
bmi -
-
lda $2002
bpl -
I just commented on one of them. The other one (bpl), I gather, loops until the start of the next vblank (so you don't do stuff mid-vblank), which would make sense.
Don't worry buddy, WedNESday'll run that ROM just fine for you. Any chance of you releasing it?
Edit: I maid sum speling mistaks, agen.
Are you speaking of my ROM? I'm having no trouble getting it to work fine in multiple emulators and on real hardware since I realized my code was slow and how to split up my code between vblank and the main loop. There's nothing special about it as I am just learning, but if you'd like to run it through your emulator pm me and I'll be happy to send it to you.
ZomCoder wrote:
Are you speaking of my ROM? I'm having no trouble getting it to work fine in multiple emulators and on real hardware since I realized my code was slow and how to split up my code between vblank and the main loop. There's nothing special about it as I am just learning, but if you'd like to run it through your emulator pm me and I'll be happy to send it to you.
Please.
If you want to be accurate, the NTSC NES VBlank period is not over 2000 CPU cycles. It's closer to 1886 2/3 cycles. VBlank is 20 scanlines, each scanline takes 283 PPU cycles, and 1 CPU cycle is equal to 3 PPU cycles. So:
20*283/3 = 1886.6666...
I just don't want someone to waste their time coding something that takes 1900 cycles of VBlank and have them go "Why doesn't it work?"
Err... scanlines are 341 dots (PPU cycles), not 283. I don't know where you got 283 from.
341*20/3 = 2273.333 = over 2000
The 283 figure might refer to the portion of the NES scanline inside the NTSC production aperture, which includes the NES's left and right borders and the nominal analog blanking (production aperture minus clean aperture) but excludes NTSC horizontal blanking.
Ah.. well that certainly explains why I didn't understand it.
Thanks, Disch, for the corrected and exact figure. It's just that "over 2000" is imprecise, and you usually want something more precise when you're timing how long your code takes to execute.