Logic tells me that $4014 would start copying data from the given address and putting it in OAM, starting at the sprite address specified by $2003. For example:
Code:
LDA #$80
STA $2003
LDA #$02
STA $4014
That would copy $0200->OAM[$80], $0201->OAM[$81], etc
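As a rough sketch of the model described above (Python for brevity; `oam_dma`, `cpu_mem`, and `oam_addr` are hypothetical names, and the wrap at 256 bytes is an assumption about what happens when the copy runs past the end of OAM):

```python
def oam_dma(cpu_mem, page, oam, oam_addr):
    """Copy 256 bytes from CPU page `page` into OAM, starting at oam_addr.

    Assumption: the destination index wraps at 256, so the copy always
    fills all of OAM and the sprite address ends where it started.
    """
    base = page << 8
    for i in range(256):
        oam[(oam_addr + i) & 0xFF] = cpu_mem[base + i]
    return oam_addr  # 256 increments wrap back to the starting value


cpu_mem = bytearray(0x10000)
for i in range(256):
    cpu_mem[0x0200 + i] = i          # $0200+i holds the value i

oam = bytearray(256)
end_addr = oam_dma(cpu_mem, 0x02, oam, 0x80)
print(hex(oam[0x80]), hex(oam[0x81]), hex(oam[0x00]))
```

Under this model the byte from $0200 lands in OAM[$80], and the second half of the page wraps around to fill OAM[$00] onward.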
However various games give me trouble when emulating like this -- and I find I have to start DMA at OAM[0], REGARDLESS of the sprite address. In my most recent example.... Akira (J) does the following:
Code:
C130: A2 00 LDX #$00 30 20 10 [..I.C] FD
C132: 8E 03 20 STX $2003 [2003=BA] 30 00 10 [..IZC] FD
C135: BD DA C1 LDA $C1DA,X [C1DA=7F] 30 00 10 [..IZC] FD
C138: 8D 04 20 STA $2004 [2004=BA] 7F 00 10 [..I.C] FD
C13B: E8 INX 7F 00 10 [..I.C] FD
C13C: E0 04 CPX #$04 7F 01 10 [..I.C] FD
C13E: 90 F5 BCC $F5 [C135] 7F 01 10 [N.I..] FD
C135: BD DA C1 LDA $C1DA,X [C1DB=81] 7F 01 10 [N.I..] FD
C138: 8D 04 20 STA $2004 [2004=BA] 81 01 10 [N.I..] FD
C13B: E8 INX 81 01 10 [N.I..] FD
C13C: E0 04 CPX #$04 81 02 10 [..I..] FD
C13E: 90 F5 BCC $F5 [C135] 81 02 10 [N.I..] FD
C135: BD DA C1 LDA $C1DA,X [C1DC=21] 81 02 10 [N.I..] FD
C138: 8D 04 20 STA $2004 [2004=BA] 21 02 10 [..I..] FD
C13B: E8 INX 21 02 10 [..I..] FD
C13C: E0 04 CPX #$04 21 03 10 [..I..] FD
C13E: 90 F5 BCC $F5 [C135] 21 03 10 [N.I..] FD
C135: BD DA C1 LDA $C1DA,X [C1DD=F0] 21 03 10 [N.I..] FD
C138: 8D 04 20 STA $2004 [2004=BA] F0 03 10 [N.I..] FD
After that code, the game writes to $4014 some time later -- however between the above code and the DMA, there are no further writes to $2003 or $2004. The above code leaves the sprite address at $04, so my emu is currently copying sprite data starting at OAM[4] (making sprite 0 use the last 4 bytes copied instead of the first 4).
Of course... this is causing sprite 0 to never be rendered, and the game falls into an infinite wait-for-sprite-0-hit loop afterwards.
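To make the failure concrete, here is a small sketch that replays the traced writes and then a DMA. The `PPU` class, the DMA page ($02), and the sprite bytes in `cpu_mem` are all made up for illustration; only the four values written to $2004 come from the trace above.

```python
class PPU:
    def __init__(self):
        self.oam = bytearray(256)
        self.oam_addr = 0

    def write_2003(self, value):
        self.oam_addr = value & 0xFF

    def write_2004(self, value):
        # Each $2004 write stores one byte and advances the sprite address.
        self.oam[self.oam_addr] = value & 0xFF
        self.oam_addr = (self.oam_addr + 1) & 0xFF

    def write_4014(self, cpu_mem, page):
        # Model DMA as 256 consecutive $2004 writes (no reset of $2003).
        base = page << 8
        for i in range(256):
            self.write_2004(cpu_mem[base + i])


cpu_mem = bytearray(0x10000)
cpu_mem[0x0200:0x0204] = bytes([0x10, 0x01, 0x00, 0x20])  # hypothetical sprite 0
cpu_mem[0x02FC:0x0300] = bytes([0xEE] * 4)                # hypothetical sprite 63

ppu = PPU()
ppu.write_2003(0x00)
for value in (0x7F, 0x81, 0x21, 0xF0):   # the four bytes from $C1DA-$C1DD
    ppu.write_2004(value)
addr_before_dma = ppu.oam_addr           # sprite address left at $04

ppu.write_4014(cpu_mem, 0x02)
print(ppu.oam[0:4].hex(), ppu.oam[4:8].hex())
```

With the sprite address left at $04, the intended sprite 0 ends up in OAM[4..7] and the last four bytes of the page end up in OAM[0..3], which is the rotation described above.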
Any clarification on this? Does the DMA write $00 to $2003 before copying bytes (perhaps that's why it takes 513 cycles instead of 512)? Or does the DMA just ignore the sprite address completely (seems less likely)? Or is there some other reason behind this?
How many CPU cycles elapse between the last write to $2003/$2004 and the writes to $4014?
Perhaps the OAM destination address, like OAM itself, is DRAM that leaks down to $00.
Between the last $2004 write and the next $4014 write, it's about 5 scanlines.
There are 5 $4014 writes total after the $2004 write (one every frame)... then the game gets stuck.
One of my misc PPU test ROMs tests sprite DMA and shows that the value in $2003 is used, and left intact after DMA (since it copies 256 bytes). In my emulator I clear $2003 at the beginning of VBL and am pretty sure I found this to be the case on the NES. Where in the frame does this trace occur?
The $2003 write occurs around cycle 279 of scanline 234 (where 0-239 are rendered scanlines)
The PPU is off at the time the above code is run.
In game -- the hang happens after you press Start at the title screen (Start/Continue)
I tried Akira (J) (after quickly implementing its mapper) and it hangs when I don't have my PPU clear $2003 at the beginning of VBL. Based on the timing you stated, 5 scanlines would put the $4014 write just after VBL where $2003 is cleared. What am I missing?
Is it possible that screen rendering messes with $2003? Perhaps the PPU changes it as it fetches sprite data from OAM for rendering, and by the end of the frame it's found its way back to $00?
</wild guess>
Quote:
Is it possible that screen rendering messes with $2003? Perhaps the PPU changes it as it fetches sprite data from OAM for rendering, and by the end of the frame it's found its way back to $00?
Looks to be the case. PPU rendering disabled for several frames, $2003 unchanged. Enabling rendering at VBL+29544 (scanline 259, 316 PPU clocks) then disabling causes sprite address to become zero. Enabling rendering one CPU clock later leaves sprite address unaffected. Also enabling then disabling before the end of the frame leaves various values in the sprite address (not shown).
Code:
lda #0
sta $2001
lda #$EA
sta $2003
jsr sync_ppu_20 ; after return, time = VBL-20
ldy #140 ; 29559 delay
lda #41
jsr delay_ya2
lda #$18
sta $2001 ; writes at VBL+29544
lda #0
sta $2001
ldy #57 ; 30000 delay
lda #104
jsr delay_ya1
jsr determine_spr_addr
jsr print_y ; prints $00
lda #$EA
sta $2003
jsr sync_ppu_20 ; after return, time = VBL-20
ldy #140 ; 29560 delay
lda #41
jsr delay_ya3
lda #$18
sta $2001 ; writes at VBL+29545
lda #0
sta $2001
ldy #57 ; 30000 delay
lda #104
jsr delay_ya1
jsr determine_spr_addr
jsr print_y ; prints $34
jmp forever
blargg wrote:
Enabling rendering at VBL+29544 (scanline 259, 316 PPU clocks) then disabling causes sprite address to become zero. Enabling rendering one CPU clock later leaves sprite address unaffected.
That kind of makes sense when you think about it. For the PPU to check in-range values for sprites, it needs cycles from pretty much the whole scanline. If you flip the PPU on mid-scanline, it might already be past the time it would need to start in-range checks, so they might not be performed until the next scanline.
Since the end of scanline 259 is near the start of the last rendered scanline -- I guess enabling sprites past that point makes it impossible to do sprite evaluations on the next line
Quote:
Also enabling then disabling before the end of the frame leaves various values in the sprite address (not shown).
I would assume these values reflect the OAM fetches made during sprite evaluations. Kind of like how reading $2004 during rendering exposes what OAM byte the PPU is using -- turning the PPU off during rendering exposes the OAM address the PPU was using.
Ah well -- more assumptions and guessing on my part. Thanks for shedding some light on this mystery blargg. Does this mean we'll be fortunate enough to get a new batch of test ROMs soon? ;D
According to the nesdev wiki:
Code:
Cycles 0-63: Secondary OAM (32-byte buffer for current sprites on scanline) is initialized to $FF - attempting to read $2004 will return $FF
And...
Code:
On even cycles, data is read from (primary) OAM
On odd cycles, data is written to secondary OAM (unless writes are inhibited, in which case it will read the value in secondary OAM instead)
Perhaps $2003 is used as an index pointer into both primary and secondary OAM by the PPU's internal logic? This would explain why the PPU needs two cycles to copy a single byte from primary to secondary OAM. It would also mean that $2003 would be reset to zero not only at the beginning of the screen, but at the beginning of each active scanline (so that secondary OAM, starting at index $00, could be reset). This could be checked by disabling rendering at carefully timed intervals within a scanline, and then determining whether $2003 points to the location that would be predicted by this hypothesis.
Alternatively, it's possible that $2003 is only used for reads from the primary OAM. After all, during evaluation, the first byte read from the main OAM is OAM[n][0], which is the first byte of OAM (index $00). Obviously, empirical testing on a real NES will be needed in order to determine what exactly the PPU is doing, but I think it's highly likely given the design philosophy of the PPU that $2003 serves a dual use as both a user register and an internal register during rendering (just as $2004 and the PPU VRAM address do).
Quote:
Does this mean we'll be fortunate enough to get a new batch of test ROMs soon?
I (currently) care little about the details of mid-scanline PPU operation. There are too many details that have little practical use. If it's something that has a significant effect on many games, like clearing $2003 (near) the end of the frame, I'm interested in the general result, but not the details.
Fair enough.
Clearing $2003 at the end of the frame when the PPU is on will suffice for Akira and the games like it (iirc, some Donkey Kong Country pirate had a similar problem, and some other pirate RPG game.... Phantasy Star I think?)
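A minimal sketch of that workaround, assuming an emulator with a per-frame hook. `PPU`, `rendering_enabled`, and `end_of_frame` are hypothetical names; the only hardware detail used is that bits 3 and 4 of $2001 enable background and sprite rendering.

```python
SHOW_BG = 0x08        # $2001 bit 3
SHOW_SPRITES = 0x10   # $2001 bit 4


class PPU:
    def __init__(self):
        self.mask = 0        # last value written to $2001
        self.oam_addr = 0    # $2003

    def rendering_enabled(self):
        return bool(self.mask & (SHOW_BG | SHOW_SPRITES))

    def end_of_frame(self):
        # Hypothetical hook, called once after the last rendered scanline.
        # The sprite address is only cleared if the PPU was rendering.
        if self.rendering_enabled():
            self.oam_addr = 0


ppu = PPU()
ppu.oam_addr = 0x04
ppu.end_of_frame()
addr_with_ppu_off = ppu.oam_addr

ppu.mask = 0x18
ppu.end_of_frame()
addr_with_ppu_on = ppu.oam_addr
```

This is the coarse, frame-level version only; it says nothing about what $2003 holds if rendering is switched off mid-frame.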
So, are we sure that $2003 is cleared at the *end* of the frame, rather than the beginning? Also, do we know if $2003 is cleared *only* at the end of the frame, or is it cleared at the end of each scanline rendered?
Josh wrote:
So, are we sure that $2003 is cleared at the *end* of the frame, rather than the beginning?
Its value likely fluctuates as the screen is rendering. As blargg has mentioned, turning the PPU off mid-frame leaves $2003 with different values (presumably with what the PPU set it to last)
Quote:
Also, do we know if $2003 is cleared *only* at the end of the frame, or is it cleared at the end of each scanline rendered?
Probably every scanline. Most likely it repeats a similar pattern every scanline as it does sprite fetches (I'd even assume that $2003 reflects the address of the OAM byte the PPU is fetching at any given cycle on the scanline).
Since $2003 cannot be written to or utilized by the CPU during rendering... how it changes mid-frame doesn't really impact emulation at all. Therefore from an emulator standpoint -- the only thing that really matters is what $2003 is at the end of the frame... and when the PPU is turned off mid-frame.
We already know the former is 00... and an educated guess could be made on the latter using the known OAM fetches during the scanlines. But as of yet the latter is unknown.
I know this thread hasn't been active for a while, but I figured I'd mention my findings.
My results are based on testing the following roms:
oam_stress.nes
Huge Insect
Akira
Firstly, I implement the OAM evaluation as described here:
http://wiki.nesdev.com/w/index.php/PPU_ ... evaluation
The only addition is that I had to make
Code:
OAM[n][1]
really mean
Code:
OAM[(OAMADDR + n) & 0xff][1]
This allows Huge Insect to rotate sprite 0 without rotating the contents of RAM; it simply changes OAMADDR before rendering.
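One way to read that expression, sketched in Python. The assumptions here are that OAMADDR is a byte address, that each entry is 4 bytes, and that evaluation wraps around the end of OAM; `eval_entry` is a hypothetical helper, not something from the wiki page.

```python
def eval_entry(oam, oam_addr, n):
    """Return the 4 bytes that evaluation would treat as entry n.

    Assumption: evaluation walks OAM as a flat 256-byte array starting
    at oam_addr and wraps at the end.
    """
    start = (oam_addr + 4 * n) & 0xFF
    return bytes(oam[(start + k) & 0xFF] for k in range(4))


oam = bytearray(256)
for s in range(64):
    oam[4 * s] = s   # tag each entry's Y byte with its position in RAM order

print(eval_entry(oam, 0x00, 0)[0], eval_entry(oam, 0x20, 0)[0])
```

With OAMADDR = $20, the entry stored at OAM[$20] (the ninth one in RAM order) is the one treated as sprite 0, without any bytes in RAM having moved.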
* If I clear OAMADDR when vblank starts, oam_stress.nes passes, but Huge Insect hangs (no sprite 0 hit).
* If I clear OAMADDR at some point after sprite evaluation each scanline, Huge Insect works, but oam_stress.nes fails.
Then I tried one more thing: clearing OAMADDR at DMA start. I did this because I recalled some people saying it appeared to be $00 after DMA.
When I combined clearing at DMA start with clearing late in the scanline (I do it at dot 256 currently), all 3 seem to work correctly. So in summary:
I don't know if this is correct when compared to actual hardware behavior, but it appears to make things work:
Clearing OAMADDR when DMA is initiated and post sprite evaluation every scanline makes all 3 ROMs appear to work.
Huge Insect seems to expect that whatever offset OAMADDR points to is treated as sprite 0 when sprite evaluation occurs.
Thoughts? Comments?
Some of this stuff could probably be verified pretty easily in Visual 2C02. I put together a tutorial on it in my PPU diagram thread that might help with getting started.
We did some Visual 2C02 tracing on IRC and figured out what's probably going on. See
http://wiki.nesdev.com/w/index.php/PPU_ ... of_.242003.
As a followup/summation, here's the findings I've been discussing with Ulfalizer and Quietust on IRC.
* Sprite evaluation appears to start at OAM[$2003], not OAM[0]
* At dot 257-320 $2003 is set to $00
* On pre-render, at dot 257, before/during the $2003 reset, it **appears** that 8 bytes are copied from OAM[$2003 & 0xf8] to OAM[0]
* My previous hack of having $4014 reset $2003 was incorrect and unnecessary, it was just that, a hack.
Using these rules, oam_read.nes, oam_stress.nes, and Huge Insect all run as expected
proxy wrote:
* Sprite evaluation appears to start at OAM[$2003], not OAM[0]
I had missed this bit in my write-up. Does that also affect Huge Insect?
Is this known behavior explicitly assumed by the test roms btw, only undocumented on the wiki?
I can't reproduce the copying behavior in Visual 2C02 if I set $2003 = 8 and fill S007-S00F with some non-zero bytes. Nothing seems to get copied at tick 257 of the pre-render scanline. :/
Has it actually ever been seen in practice? Seems like it'd be a pretty bad hw bug since it means setting $2003 to anything >7 without resetting it messes up your first two sprites.
Edit: I'll revert the wiki edit until things have been verified.
Edit 2: If there's no copying, it seems the changed start address for sprite evaluation might still affect things, since it might change which sprite gets treated as "sprite zero" (and also change the priority between sprites).
Let's clear one thing up: oam_stress.nes is NOT a test to test whether your emulator is accurate when compared to the real hardware. "PASS" doesn't mean that your emulator matches the real hardware, it means that the OAM reading works (whereas on the real NTSC PPU hardware, OAM reading does NOT work consistently).
The "test" originates from here:
viewtopic.php?p=62044#p62044
Agreed that oam_stress.nes isn't reliable on hardware, and it may turn out to be an invalid test.
But, as far as I understand the test is designed to pass on an idealized version of the NES, in other words, one that behaves the same way, but has more reliable OAM reads. After all, all it is doing is writing random values to OAM and reading them back, this *should* work if the reads and writes are reliable.
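To illustrate what "idealized" means here, a sketch of a write-then-read-back check against a perfectly reliable OAM. This is not the actual oam_stress.nes algorithm; `IdealOAM` and `readback_ok` are hypothetical names, and the model assumes $2004 writes advance the address while $2004 reads do not.

```python
import random


class IdealOAM:
    """OAM whose reads and writes through $2003/$2004 never glitch."""

    def __init__(self):
        self.oam = bytearray(256)
        self.addr = 0

    def write_2003(self, value):
        self.addr = value & 0xFF

    def write_2004(self, value):
        self.oam[self.addr] = value & 0xFF
        self.addr = (self.addr + 1) & 0xFF

    def read_2004(self):
        return self.oam[self.addr]   # reads leave the address unchanged


def readback_ok(ppu, seed=0):
    """Fill OAM with pseudo-random bytes, then verify every byte."""
    rng = random.Random(seed)
    values = [rng.randrange(256) for _ in range(256)]
    ppu.write_2003(0)
    for v in values:
        ppu.write_2004(v)
    for a, v in enumerate(values):
        ppu.write_2003(a)
        if ppu.read_2004() != v:
            return False
    return True


result = readback_ok(IdealOAM())
```

An emulator built like `IdealOAM` passes trivially, which is the point being debated: passing says the model is self-consistent, not that it matches a 2C02.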
I imagine it is somewhat akin to when PC emulators simulate a floppy disk, the real thing fails often and requires retries, but almost all emulations (bochs, qemu, etc) simulate an "ideal" floppy drive which never fails.
@Ulfalizer: I've never experienced the copying behavior; it was hypothesized by Quietust to explain some of the observed effects of $2003 on sprite 0 hits. When I implemented the hypothesized behavior, the ROMs were happy. It is entirely possible that the behavior has another explanation.
I had some note in my NES development files that OAM behaved properly on some CPU-PPU clock alignments. I do also seem to remember PAL being problem-free.
proxy wrote:
I imagine it is somewhat akin to when PC emulators simulate a floppy disk, the real thing fails often and requires retries, but almost all emulations (bochs, qemu, etc) simulate an "ideal" floppy drive which never fails.
I think the situation here is a little bit different, because whereas something like a floppy disk drive is basically an essential feature that really benefits from "idealisation", OAM readback is used by very, very few games. And even they don't require it to work properly (because they can't, because it doesn't).
If you are making an emulator for gaming, it doesn't matter much how accurate your emulator is with regard to OAM writing/reading. If it's targeted at developers, I would much rather have it work like the real hardware does.
Anyway, the point I was trying to make was that this specific test differs from most of blargg's other tests in this regard (blargg, too, makes the same point in the thread I linked in the earlier post). Just wanted to make the distinction so people don't get confused.
You may be right. Unfortunately, at least two games do depend on the inner workings of OAM reading.
Micro Machines seems to use it for timing.
Huge Insect seems to play with $2003 in order to rotate sprites (and therefore sprite 0) without reordering them in RAM.
All in all, my primary goal is to get these things to work. If I can do that, AND have oam_stress.nes pass, it's a good day.
The problem is that oam_stress should explicitly return failure when emulating a 2C02 (and success when emulating a 2C07). Otherwise we could end up with even more homebrew that runs only in emulators, and not on hardware.
Never mind that I still haven't gotten any closer to figuring out what's wrong here:
viewtopic.php?t=9912
proxy:
Is the glitchy copying required to get Huge Insect to run, or is it sufficient to emulate the $2003 clear and changed start of sprite evaluation (which should affect which sprite gets treated as sprite 0, I think)?
I did some tracing in Visual 2C02 to figure out what's going on with OAMADDR and sprite evaluation. Turns out to be pretty straightforward. Here's what I added to the PPU register page:
Code:
==== Obscure details of OAMADDR ====
OAMADDR is set to 0 during each of ticks 257-320 (the sprite tile loading interval) of the pre-render and visible scanlines.
The value of OAMADDR when sprite evaluation starts at tick 65 of the visible scanlines will determine where in OAM sprite evaluation starts, and hence which sprite gets treated as sprite 0 as well as relative sprite priority. The first OAM entry to be checked during sprite evaluation is the one starting at OAM[OAMADDR]. If OAMADDR is unaligned and does not point to the y position (first byte) of an OAM entry, then whatever it points to (tile index, attribute, or x coordinate) will be reinterpreted as a y position, and the following bytes will be similarly reinterpreted.
Huge Insect might depend on this behavior.
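A dot-level sketch of those two rules, for anyone wiring them into an emulator. `SpriteUnit`, `tick`, and `first_entry` are hypothetical names; the sketch ignores how OAMADDR advances during evaluation itself, and the wrap in `first_entry` is an assumption, not something the trace covered.

```python
class SpriteUnit:
    def __init__(self):
        self.oam = bytearray(256)
        self.oam_addr = 0     # OAMADDR ($2003)
        self.eval_start = 0   # where evaluation began on this scanline

    def tick(self, dot, visible=True, rendering=True):
        """Apply the OAMADDR rules for one PPU dot of a scanline."""
        if not rendering:
            return                          # PPU off: OAMADDR is untouched
        if visible and dot == 65:
            self.eval_start = self.oam_addr  # evaluation starts here
        if 257 <= dot <= 320:
            self.oam_addr = 0                # sprite tile loading interval


def first_entry(oam, start):
    """Bytes reinterpreted as (y, tile, attr, x) for the first entry."""
    return tuple(oam[(start + k) & 0xFF] for k in range(4))


unit = SpriteUnit()
unit.oam_addr = 0x20
for dot in range(341):
    unit.tick(dot)
start_line1, addr_after_line1 = unit.eval_start, unit.oam_addr
for dot in range(341):
    unit.tick(dot)
start_line2 = unit.eval_start

unit.oam[0:8] = bytes([0x10, 0x01, 0x02, 0x30, 0x50, 0x03, 0x00, 0x60])
unaligned = first_entry(unit.oam, 0x01)   # tile index read as a y position
```

Note that only the first scanline after the write sees the non-zero start; by the next scanline OAMADDR has already been cleared, so a game relying on this would have to rewrite $2003 accordingly.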
So what happens when $FC is written to OAMADDR during the prefetch (321-340) or the start of the next line (0-64)? Does that effectively hide most sprites?
I wonder what causes the two skipped sprites on the first rendered frame after a reset, as demonstrated here.
I wonder how this affects OAM overflow behavior.
Code:
3b. If the value is not in range, increment n AND m (without carry). If n overflows to 0, go to 4; otherwise go to 3
In cases where OAM_ADDR has been written to values other than 0, it seems likely that this behavior would change. Unless the hardware literally relies on an $FC->$00 transition.
tepples wrote:
So what happens when $FC is written to OAMADDR during the prefetch (321-340) or the start of the next line (0-64)? Does that effectively hide most sprites?
Looks like it. Once the end of OAM is reached during sprite evaluation, no more data gets written into the secondary OAM.
After it wraps around, it still reads from the primary OAM as if it's looking for more matching sprites (but never finds any, even if the y is within range). What I'm guessing happens is that the same sprite evaluation logic keeps ticking throughout the entire sprite evaluation period, only some flag is raised once the end of OAM is reached to prevent further sprites from matching.
This might be needed since the end of OAM could be reached well before the end of the sprite evaluation period if few sprites match, and if it just kept going, it might see the same sprite match twice and so copy its data twice into secondary OAM.
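A simplified sketch of why starting at $FC would hide most sprites. The assumptions are that OAMADDR is 4-byte aligned, that at most 8 entries are kept, and that nothing matches once the address has run past the end of OAM; `evaluate` is a hypothetical helper and it leaves out the overflow quirks entirely.

```python
def evaluate(oam, oam_addr, scanline, height=8):
    """Return OAM byte addresses of the entries picked on `scanline`.

    Assumption: evaluation begins at oam_addr and no entry can match
    after the address passes $FF (the "end of OAM" flag guessed above).
    """
    picked = []
    addr = oam_addr
    while addr < 256 and len(picked) < 8:
        y = oam[addr]
        if y <= scanline < y + height:
            picked.append(addr)
        addr += 4
    return picked


oam = bytearray(256)
for s in range(64):
    oam[4 * s] = 100                     # every sprite is in range at line 100

from_start = evaluate(oam, 0x00, 100)    # normal case
from_fc = evaluate(oam, 0xFC, 100)       # OAMADDR = $FC at evaluation start
print(len(from_start), len(from_fc))
```

In this model only the entry at OAM[$FC] is ever checked, so at most one sprite can appear on that scanline.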
Edit: Maybe this is the same flag that gets raised once 8 sprites have been found and the secondary OAM is full, since you get the same kind of access pattern after that.
Edit 2: Seems writes to the secondary OAM would have to be disabled too, since the y coordinate for sprites that don't match still gets written into the secondary OAM. (There seems to be something magic about the rest of the bytes for those sprites being FF that prevents them from showing up. Do sprites show up at x = 255? Edit 3: I guess those sprites might be rejected at the same time that it calculates which row of the sprite to use, since y will always be out of range then...)