This is probably a really dumb question.
I know that the CPU has to write to $2007 twice for the low and high byte. Does that mean that the PPU performs writes whenever the second write to $2007 occurs, i.e. every 6 CPU cycles assuming that LDA is used? Are all reads and writes guaranteed to happen during a vblank (because the nametables should not be altered while the frame is rendering)?
Fjotten wrote:
I know that the CPU has to write to $2007 twice for the low and high byte.
You've gotten confused. The CPU has to write to $2006 twice to set the full address.
Data sent via $2007 is only a byte at a time.
Quote:
Does that mean that the PPU performs writes whenever the second write to $2007 occurs, i.e. every 6 CPU cycles assuming that LDA is used?
The PPU will attempt to perform a write during the 44 master clock cycles starting when the CPU starts the write to $2007.
Even if it's currently rendering.
Quote:
Are all reads and writes guaranteed to happen during a vblank (because the nametables should not be altered while the frame is rendering)?
Nope. Reads and writes via $2007 always happen ≈immediately.
I meant to write $2006 for the memory address…
So if I understand this correctly, the address is written to $2006, and once the data has been written to $2007, the PPU will keep writing whatever is in $2006 until the CPU reads from $2002?
No, you write to $2006 twice to select the address you want to access, and then you write the data, one byte at a time, to $2007. After each write, the address auto increments, so that you can write consecutive bytes without having to set the address again.
Reading from $2002 is just something you do as a safety measure to make sure that the next $2006 write does indeed set the high byte of the VRAM address, but it isn't required if the program always writes to $2006 and $2005 in pairs ($2005 and $2006 share the toggle that selects between first and second write).
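To make the safety read concrete, here is a minimal sketch of the sequence described above. The target address $2100 is just an example value, and BIT is only one way to do the read (LDA $2002 works as well).
Code:
bit $2002 ;read $2002: resets the toggle shared by $2005/$2006
lda #$21
sta $2006 ;first write after the reset = high byte
lda #$00
sta $2006 ;second write = low byte, address is now $2100
lda #$01
sta $2007 ;value 1 goes to $2100, then the address auto-increments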
Also, nothing is guaranteed to always happen in vblank. The programmer is responsible for timing everything so that the program doesn't try to write to VRAM when the screen is rendering, otherwise the screen will get corrupted.
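As a rough illustration of what "timing everything" can look like, the sketch below polls bit 7 of $2002 until vblank starts and only then touches VRAM. The label name and the address $2100 are made up for the example. Polling like this can occasionally miss a vblank if the read lands right as the flag gets set, so real programs usually do their VRAM updates from the NMI handler instead.
Code:
vblankwait:
bit $2002 ;bit 7 of $2002 (vblank flag) is copied into N
bpl vblankwait ;loop while the flag is clear
lda #$21 ;the $2002 read also reset the $2006 toggle
sta $2006
lda #$00
sta $2006 ;address = $2100
lda #$01
sta $2007 ;this write now lands inside vblank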
You control how far the address advances after each $2007 access with bit 2 of $2000...
---- -0-- = address +1 after every access
---- -1-- = address +32 after every access (=one row down, since the screen is 32 tiles wide).
...bit set to 0
Code:
lda #$21
sta $2006
lda #0
sta $2006 ;sets PPU address to $2100
lda #1
sta $2007 ;value 1 now at $2100
lda #2
sta $2007 ;value 2 now at $2101
etc.
...bit set to 1
Code:
lda #$21
sta $2006
lda #0
sta $2006 ;sets PPU address to $2100
lda #1
sta $2007 ;value 1 now at $2100
lda #2
sta $2007 ;value 2 now at $2120
etc.
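The examples above don't show the $2000 write itself, so here is a sketch of drawing a short column with the +32 setting. It assumes every other bit of $2000 can be cleared (NMI off and so on), which a real program would normally avoid by keeping a copy of its $2000 value in RAM and only changing bit 2. The label name is made up.
Code:
lda #%00000100 ;bit 2 set = +32 per $2007 write (other bits cleared here)
sta $2000
lda #$21
sta $2006
lda #$00
sta $2006 ;sets PPU address to $2100
ldx #4 ;write four tiles
lda #$01
column:
sta $2007 ;value 1 at $2100, $2120, $2140, $2160
dex
bne column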
lidnariq wrote:
The PPU will attempt to perform a write during the 44 master clock cycles starting when the CPU starts the write to $2007. Even if it's currently rendering.
Can you elaborate on this? I was always curious about the read/write access of $2007.
I assume $2007 access sets some sort of signal which the PPU polls to actually perform the read/write. But what is the timing?
And if the writes are not immediate, they must be buffered somewhere. Is there a separate $2007 write buffer used or does it share the read buffer?
Disch wrote:
I assume $2007 access sets some sort of signal which the PPU polls to actually perform the read/write. But what is the timing?
Writes to and reads from $2007 start an FSM inside the PPU. (In visual2c02 check nodes "write_2007_trigger" and "read_2007_trigger"). Writes completely ignore whether the PPU is rendering, and reads mostly ignore it.
Quote:
And if the writes are not immediate, they must be buffered somewhere. Is there a separate $2007 write buffer used or does it share the read buffer?
I believe it's the same buffer, although I've run out of patience to run around Visual2c02 to find the specific nodes.
lidnariq wrote:
Disch wrote:
And if the writes are not immediate, they must be buffered somewhere. Is there a separate $2007 write buffer used or does it share the read buffer?
I believe it's the same buffer, although I've run out of patience to run around Visual2c02 to find the specific nodes.
The "write buffer" isn't really a buffer - the value simply floats on the internal data bus until it's ready to go out the data pins, and the PPU relies on the fact that this value takes a while to decay. It's actually the same value you get back when you read a write-only register (and modify when you write to any other register).
Well, if the value is just sitting on the internal bus, the delay between the $2007 write and when it's actually written to VRAM must be pretty short. Seems like it wouldn't even be worth it to emulate a delay, and it's probably best to just have the write take effect immediately.
44 master clock cycles is only 3⅔ CPU cycles or 5½ pixels... it's only "slow" in comparison to, say, DMA, or individual PPU reads.
So let's bring this back to my original question, then. What is the timing for the read/write?
You said "up to" 44 master cycles so I assume it can vary depending on what the PPU is currently doing. Is it like scroll updates where the PPU does it on certain fixed points in the frame? Or is it a fixed 44 master cycle delay between the $2007 write and the PPU write?
It's clocked by the pixel clock, so it depends on the CPU-PPU phase. But the default Visual2c02 CPU-PPU phase timing is:
1⅞px/7½mcy during /WR2007
1⅛px/4½mcy idle after /WR2007 finishes (which is where the variation will come from)
1px/4mcy for ALE cycle
½px/2mcy idle
1px/4mcy for PPU /WR
Aren't there four master clock cycles per pixel? Or are you talking about the half-cycles used by the color encoder?
Quote:
The "write buffer" isn't really a buffer - the value simply floats on the internal data bus until it's ready to go out the data pins, and the PPU relies on the fact that this value takes a while to decay. It's actually the same value you get back when you read a write-only register (and modify when you write to any other register).
Wow, I'm surprised this works at all. Sounds like really choppy electronics design; I would certainly have had bad grades as a student if I had ever designed a chip like that.
There should still be some kind of buffer, since the PPU is connected to the CPU data lines (which will send the 8 data bits written to $2007), but then those data lines will immediately be used for fetching the next instruction, and as such another value will appear on this bus. I do not see how the PPU can rely on it when doing a later write to its own bus, even if it's only 1 cycle later.
Bregalad wrote:
Quote:
The "write buffer" isn't really a buffer - the value simply floats on the internal data bus until it's ready to go out the data pins, and the PPU relies on the fact that this value takes a while to decay.
Wow, I'm surprised this works at all. Sounds like really choppy electronics design; I would certainly have had bad grades as a student if I had ever designed a chip like that.
Then you'd end up flunking along with a lot of professional computer engineers, as a dynamic latch is a fairly common design pattern.
Wikipedia's article says it's distinguished "by exploiting temporary storage of information in stray and gate capacitances. [...] Dynamic logic circuits are usually faster than static counterparts, and require less surface area, but are more difficult to design."
Perhaps Nintendo engineers considered 44 master clocks (11 dots) fast enough when the CPU is guaranteed not to write to $2007 more often than every 48 master clocks (4 cycles) so long as RMW instructions (ASL, LSR, ROL, ROR, INC, DEC) are not used.
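For reference, the tightest ordinary case looks something like the sketch below: two absolute-mode stores back to back, each taking 4 CPU cycles, so the two $2007 writes are 4 cycles (48 master clocks at 12 per CPU cycle) apart. RMW instructions are the exception because they write to the target twice on consecutive cycles (first the old value, then the modified one).
Code:
lda #$00
sta $2007 ;4 cycles, the write happens on the last one
sta $2007 ;next write is 4 cycles after the previous one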