I understand that when reading from $2007, palette requests are served immediately because palette RAM is on board the PPU (the same goes for OAM access through $2004). All other accesses need to be routed through the I/O bus, which means that the actual result is not available when the CPU asks for it but one PPU cycle later, so it only becomes available on the next read*.
The value that the PPU loads from the I/O is available in some kind of cache internal to the PPU. Nintendulator calls it "buf2007", Nestopia calls it "io.buffer".
However, my question is: Which actions overwrite the buffer? Is the buffer affected, for instance, when writing to $2007? Is it affected when the PPU internally accesses memory during screen rendering? How long is the value stored? In Nestopia and Nintendulator the buffer is maintained indefinitely and only affected by $2007 reads, but lacking documentation on the matter, I cannot trust that this behavior is correct. Both of these should be easy to test on real hardware, because you do not need to analyze the meaning of the value that is read, only whether it differs from what it would be if the previous $2007 read was stored indefinitely.
*) Presumably, $2007 writes are also delayed by ½ .. 1 cycle for the same reason.
Bisqwit wrote:
Which actions overwrite the buffer?
None, to my knowledge. That buffer supposedly is only used for reading from $2007. Its contents are however filled with VRAM on palette reads. There is a test ROM for that as well.
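For reference, the behavior described above can be sketched roughly like this. It is a minimal Python sketch of how an emulator might model it, not a verified description of the hardware. The class and field names are made up for illustration, the address used to refill the buffer on palette reads ($3Fxx minus $1000) is an assumption, and palette mirroring and the $2000 increment setting are left out.

```python
class PPUReadPort:
    """Hypothetical model of the $2007 read path; all names are illustrative."""

    def __init__(self, vram):
        self.vram = vram              # 0x4000-byte bytearray covering PPU $0000-$3FFF
        self.palette = bytearray(32)  # palette RAM, internal to the PPU
        self.addr = 0                 # current VRAM address (normally set via $2006)
        self.read_buffer = 0          # the "buf2007" / "io.buffer" latch

    def read_2007(self):
        a = self.addr & 0x3FFF
        if a >= 0x3F00:
            # Palette read: answered immediately from on-chip palette RAM.
            value = self.palette[a & 0x1F]
            # The buffer is still refilled from VRAM (assumed: $3Fxx -> $2Fxx).
            self.read_buffer = self.vram[a - 0x1000]
        else:
            # Normal read: return the stale buffer, then refill it.
            value = self.read_buffer
            self.read_buffer = self.vram[a]
        self.addr = (a + 1) & 0x3FFF  # increment of 1 assumed for simplicity
        return value
```

With this model, the first read after setting the address returns whatever was left in the buffer, and only the second read returns the byte at that address.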
Bisqwit wrote:
Presumably, $2007 writes are also delayed by ½ .. 1 cycle for the same reason.
AFAIK, only the reads are delayed, since they have to be fetched from external memory, and that buffer is a function of design to sort of work around this.
The delay on write he's mentioning would be the one inherent in the PPU needing two cycles to access memory, so the data from the CPU probably can't go directly out. This doesn't matter much, as there's pretty much nothing that could actually detect that short of a scope/analyzer.
You can't write during rendering, and if the PPU isn't rendering, the only thing doing any reading is $2007 access, which you can't do in less than 3 PPU cycles.
One possibility comes to mind that would remove the need for any real delay or buffering on the read, but it certainly isn't testable without an analyzer. When rendering is disabled, either via $2001 or vblank, the PPU could be constantly latching the $2006 address, so that when you write to $2007, it could switch the ALE pin off and drive the data out AD0-7 more or less immediately. Determining that would take an analyzer, or some otherwise custom hardware to record or display the PPU output pins. Kevtris had a board that could do something like that at one point.
An auto-prefetch on the second write to $2006 would probably have taken too many gates compared to how often $2007 is actually read.
Not so much an auto-prefetch, as just having the A13-8/AD7-0 lines default to the $2006 address.
beannaich wrote:
Bisqwit wrote:
Which actions overwrite the buffer?
None, to my knowledge. That buffer supposedly is only used for reading from $2007. Its contents are however filled with VRAM on palette reads. There is a test ROM for that as well.
So it is like a dedicated register only used for that purpose? The PPU has an extra 8 bits of RAM (or register or whatever is the right term) for that single purpose? Nothing else updates it?
Bisqwit wrote:
So it is like a dedicated register only used for that purpose? The PPU has an extra 8 bits of RAM (or register or whatever is the right term) for that single purpose? Nothing else updates it?
I can't say definitively, because I haven't actually seen a RP2C02 datasheet. But, I have never run into any situation where anything needed to use that buffer for a purpose other than $2007 reads.
To sum up:
The buffer isn't directly accessible to the programmer. It's internal to the PPU, and the PPU only seems to use it to buffer reads from external memory. Whether or not it also uses the register to buffer writes, I can't say, but what I can say is that emulators work perfectly well only using the buffer for $2007 reads. That's not to say that the behavior is accurate, but it's as good as you can get without decapping an RP2C02.
One can do better:
read, write, then read to see if it changes the buffer contents
read, render for a frame, then read to see if it changes the buffer contents
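The logic of those two experiments can be sketched against a toy model. This is only a Python illustration of what each test would distinguish; the `PPUModel` class, its two hypothesis flags, and the helper names are all hypothetical, and a real test would of course be a ROM run on hardware. The model ignores palette addresses and the $2000 increment setting.

```python
class PPUModel:
    """Toy PPU with two switchable hypotheses about the $2007 read buffer."""

    def __init__(self, writes_touch_buffer=False, rendering_touches_buffer=False):
        self.vram = bytearray(0x4000)
        self.addr = 0
        self.read_buffer = 0
        self.writes_touch_buffer = writes_touch_buffer
        self.rendering_touches_buffer = rendering_touches_buffer

    def set_addr(self, a):
        self.addr = a & 0x3FFF

    def read_2007(self):
        value = self.read_buffer
        self.read_buffer = self.vram[self.addr]
        self.addr = (self.addr + 1) & 0x3FFF
        return value

    def write_2007(self, v):
        self.vram[self.addr] = v
        if self.writes_touch_buffer:
            self.read_buffer = v          # hypothesis: writes pass through the buffer
        self.addr = (self.addr + 1) & 0x3FFF

    def render_frame(self):
        if self.rendering_touches_buffer:
            # hypothesis: rendering fetches land in the same latch
            # (the fetched address here is an arbitrary stand-in)
            self.read_buffer = self.vram[0x0FFF]


def do_write(ppu):
    ppu.set_addr(0x2100)
    ppu.write_2007(0x99)


def do_render(ppu):
    ppu.render_frame()


def buffer_survives(ppu, disturb):
    """Load a marker into the buffer, disturb the PPU, then check the next read."""
    ppu.vram[0x2000] = 0x11
    ppu.set_addr(0x2000)
    ppu.read_2007()                       # buffer now holds the marker 0x11
    disturb(ppu)
    return ppu.read_2007() == 0x11        # True if the old value was kept
```

Calling `buffer_survives(PPUModel(), do_write)` and `buffer_survives(PPUModel(), do_render)` corresponds to the two tests listed above; on hardware, a result that differs from the marker would mean the buffer is not stored indefinitely.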
beannaich wrote:
The buffer isn't directly accessible to the programmer.
Sure it is! Just issue a VRAM read, and only use the first result and don't wait for the second result where the buffer has been replaced with the data you requested.
Bisqwit wrote:
beannaich wrote:
The buffer isn't directly accessible to the programmer.
Sure it is! Just issue a VRAM read, and only use the first result and don't wait for the second result where the buffer has been replaced with the data you requested.
That is an indirect way of accessing it; it also won't yield anything of use.
tepples wrote:
One can do better:
read, write, then read to see if it changes the buffer contents
read, render for a frame, then read to see if it changes the buffer contents
I assume someone has already tested this?
beannaich wrote:
That is an indirect way of accessing it; it also won't yield anything of use.
As in "I don't know what it yields but prolly nothing useful", or "I know it just gives the same value you would have gotten earlier if you read it earlier"?
Bisqwit wrote:
As in "I don't know what it yields but prolly nothing useful", or "I know it just gives the same value you would have gotten earlier if you read it earlier"?
The former. It's useless to rely on that read regardless of whether the PPU initializes the buffer to a known state or not, because that value is irrelevant to your program. Say it returns a garbage value after reset, it should be obvious that you can't rely on a random (or pseudo-random) value for anything other than a seed for a RNG (typically). Say it's initialized to 00h, what good does that do you? Either way, the $2007 read buffer is just that, a buffer for PPU $2007. Its only use is to interact with PPU memory, and it's not directly modifiable.
beannaich wrote:
Bisqwit wrote:
As in "I don't know what it yields but prolly nothing useful", or "I know it just gives the same value you would have gotten earlier if you read it earlier"?
The former. It's useless to rely on that read regardless of whether the PPU initializes the buffer to a known state or not, because that value is irrelevant to your program.
This is NESemdev, not NESdev. I have no intention of writing NES programs that rely on the value read; I only seek to know what is the proper way to implement in the emulator (as opposed to the lazy way that enables all currently known games to work).
I've never seen any documentation about the behavior you were asking about. My personal guess is the buffer is used for this specific purpose only. But like you said, it would be easy to write a test ROM to test this, so why are so few emulator authors actually doing that?
Bisqwit wrote:
This is NESemdev, not NESdev. I have no intention of writing NES programs that rely on the value read; I only seek to know what is the proper way to implement in the emulator (as opposed to the lazy way that enables all currently known games to work).
I was attempting to answer your question giving examples as to why the behavior makes sense, as it's currently all we have. But if you'd like to continue to give attitude, then I can stop responding to your questions.
For the time being, the buffer is for $2007 reads only, and I don't hold out much hope for a test ROM showing that it does much else.
thefox wrote:
I've never seen any documentation about the behavior you were asking about. My personal guess is the buffer is used for this specific purpose only. But like you said, it would be easy to write a test ROM to test this, so why are so few emulator authors actually doing that?
I would guess no one is really interested because it most likely is only used for a specific purpose, as you guessed. I think by now if there were another use, a game would have made use of it, or it would have been discovered. There isn't enough mystery surrounding it to warrant a test ROM, as easy as it would be to make one. I would gladly do it, if I had a power pak. I have an NTSC front loading NES sitting around collecting dust, but no way to run homebrew code on it.
I am just wondering whether, in my emulator, I can use the same buffer for $2007 reads by the CPU and for VRAM reads triggered by the scanline rendering.
Knowing how PPU works either way would be helpful.
Using the same buffer would make the emulator a bit shorter and less filled with special cases, so there is the motive. But I don't want to do it if it means making the emulator less accurate.
Telling me what a game programmer should do or not do on the NES platform really does not help me in this question, though I appreciate the effort; it may be valid documentation for others. It's just not something that addresses my question.
Bisqwit wrote:
I am just wondering whether, in my emulator, I can use the same buffer for $2007 reads by the CPU and for VRAM reads triggered by the scanline rendering.
This will probably introduce more special cases, only allowing $2007 reads when rendering is off. I also believe the PPU has special shifters for VRAM reads triggered by scanline rendering. So I don't think emulating in the way you are talking about is accurate. The $2007 read buffer should only be accessed/modified by $2007 reads. The rest of the PPU hardware should be functionally unaware of that buffer, as it has its own hardware set aside for rendering purposes.
Otherwise, I could see all sorts of weird bugs occurring. If the PPU modified that buffer, and the program did some tricky reads via $2007 during rendering, for example.
Also, I often find it very helpful to consider things from an NES programmer's perspective. Although I have never once in my life written any code to run on an NES.
beannaich wrote:
There isn't enough mystery surrounding it to warrant a test ROM, as easy as it would be to make one. I would gladly do it, if I had a power pak.
I have a PowerPak, but a PowerPak won't help me write ROMs that test power-up state. It would help someone write ROMs that test whether a frame's worth of rendering corrupts the $2007 readback buffer, which is all Bisqwit wants to know.
The only programs I know of that purposely read from $2007 during rendering are the game Young Indy Chronicles and the demo Boing 2007. See previous topic.
beannaich wrote:
thefox wrote:
I've never seen any documentation about the behavior you were asking about. My personal guess is the buffer is used for this specific purpose only. But like you said, it would be easy to write a test ROM to test this, so why are so few emulator authors actually doing that?
I would guess no one is really interested because it most likely is only used for a specific purpose, as you guessed. I think by now if there were another use, a game would have made use of it, or it would have been discovered. There isn't enough mystery surrounding it to warrant a test ROM, as easy as it would be to make one.
My message probably was a bit unclear. I wasn't really talking about this case only, it just seems in general that surprisingly few emulator authors are willing to test stuff on the real hardware. There's so much hearsay and guessing on these types of threads, I wish more people would get down and dirty with the hardware to discover some new facts. Then again, I'm not so surprised that this is the case. Like you said, most emulator authors only care about running the games correctly.
Quote:
Otherwise, I could see all sorts of weird bugs occurring. If the PPU modified that buffer, and the program did some tricky reads via $2007 during rendering, for example.
Weird bugs are to be expected anyways if the program attempts to read $2007 during rendering.
I just noticed that Zelda II's title screen relies on $2007 reads (probably due to a bug in the code) to scroll the bottom half of the title screen up 2 scanlines lower. Fceux and Nintendulator get it right, Nestopia doesn't.
Dwedit wrote:
I just noticed that Zelda II's title screen relies on $2007 reads (probably due to a bug in the code) to scroll the bottom half of the title screen up 2 scanlines lower. Fceux and Nintendulator get it right, Nestopia doesn't.
This is a good find as it validates the behavior of $2007 reads when $2000.2 == 0 (Young Indiana Jones Chronicles reads $2007 while $2000.2 == 1). Thanks!
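For anyone unfamiliar with the $2000.2 notation: bit 2 of $2000 selects how far the VRAM address advances after each $2007 access, 1 when clear and 32 when set. A minimal Python sketch of that selection follows. The function name is made up for illustration, and it only models the ordinary case with rendering off, not the address behavior during rendering that these two games exercise.

```python
def next_vram_addr(v, ppuctrl):
    """Advance the VRAM address after a $2007 access (rendering disabled).

    v       -- current 15-bit VRAM address register
    ppuctrl -- last value written to $2000
    """
    step = 32 if ppuctrl & 0x04 else 1   # $2000 bit 2: 0 = +1, 1 = +32
    return (v + step) & 0x7FFF           # keep the result within 15 bits
```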