I put together a test ROM to verify the addressing modes which have dummy reads. It tests STA and LDA with modes (ZP,X), (ZP),Y and ABS,X. Dummy reads are made for the following cases:
STA ABS,X or (ZP),Y
LDA ABS,X or (ZP),Y when carry is generated from low byte
The dummy read is at (ABS & 0xFF00) | ((ABS + index) & 0xFF). Presumably the same applies to the many other read instructions like ADC, ORA, CMP, etc.
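For emulator authors, here is a minimal Python sketch of that address calculation. The function name `dummy_read_address` and its `is_write` parameter are made up for illustration and do not come from the test ROM. It only models the indexed cases listed above.

```python
def dummy_read_address(base, index, is_write=False):
    """Return the dummy read address for an indexed access, or None.

    base     -- 16-bit base address (the ABS operand, or the pointer
                fetched from zero page for (ZP),Y)
    index    -- value of X or Y (0..255)
    is_write -- True for stores such as STA, which always dummy-read
    """
    carry = (base & 0xFF) + index > 0xFF
    if not (carry or is_write):
        return None  # a read with no carry goes straight to the final address
    # Low byte already has the index added; high byte is not yet fixed up.
    return (base & 0xFF00) | ((base + index) & 0xFF)
```

Without a carry, a store's dummy read lands on the same address the store then writes to.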
Test ROM and full ca65 source code:
cpu_dummy_reads.zip
Some other things come to mind, like the timing of these accesses, and testing the writes of RMW instructions (maybe use $2007 for that).
Does this mean that lda $4016,X won't work (assuming X is #0 for joypad #1 and #1 for joypad #2), because it will do a dummy read and you'll only get one bit out of every two from the serial joypad port?
Quote:
Does this mean that lda $4016,X won't work
It'll work fine since there is no carry generated when the index is added to the low byte of the address. The source code for the test ROM covers it pretty well. I always suggest people read it since it's the final word on what is really being tested.
Code:
ldx #$22
lda $2000,x ; no dummy read
ldx #$22
lda $20E0,x ; dummy read from $2002
ldx #$22
lda $20E2,x ; dummy read from $2004
ldx #$22
lda $3FE0,x ; dummy read from $3F02
Quote:
Some other things come to mind, like the timing of these accesses, and testing the writes of RMW instructions (maybe use $2007 for that).
Yes, I want to make a full instruction timing test some time that verifies the timing of every non-opcode access.
LDA $4016,X is fine because it only reads $4016 once (assuming X is zero)
For read-only ops, the dummy read is only performed when an extra cycle is used (when adding the index crosses a page boundary).
For write ops like STA, the dummy read is performed every time (which is why STA always takes an extra cycle even if it doesn't cross a page boundary).
RMW instructions (e.g. ASL $0246, X) will also perform a dummy read regardless of whether or not a page boundary is crossed. RMW instructions also include the famous "dummy write" where the memory address gets written twice, once with the old value and once with the new value.
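As a rough sketch of the three ABS,X cases just described, the Python function below lists the bus accesses that follow the three opcode/operand fetches. The name `abs_x_accesses` and the `kind` strings are hypothetical labels for illustration only.

```python
def abs_x_accesses(kind, base, x):
    """List the bus accesses after the opcode and operand fetches.

    kind -- 'read' (LDA etc.), 'write' (STA etc.) or 'rmw' (ASL, INC etc.)
    Returns (operation, address) tuples in cycle order.
    """
    final = (base + x) & 0xFFFF
    # Address before the high byte has been fixed up.
    uncorrected = (base & 0xFF00) | (final & 0xFF)
    crossed = uncorrected != final
    if kind == "read":
        dummy = [("dummy read", uncorrected)] if crossed else []
        return dummy + [("read", final)]
    if kind == "write":
        return [("dummy read", uncorrected), ("write", final)]
    if kind == "rmw":
        # The dummy write puts the old value back before the new one is written.
        return [("dummy read", uncorrected), ("read", final),
                ("dummy write", final), ("write", final)]
    raise ValueError("unknown kind: " + kind)
```

For example, `abs_x_accesses("rmw", 0x0246, 0x10)` yields four accesses, which together with the three fetches accounts for the seven cycles of ASL $0246,X.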
Branch instructions may contain up to two dummy PC reads, depending on where the branch goes and whether a page boundary is crossed.
JSR has a dummy stack read before anything gets pushed to the stack (no idea why). All stack pull instructions (PLA, PLP, RTI, and RTS) have a dummy stack read before S is incremented. Further, RTS has a dummy PC read before the return address is incremented.
All single-byte instructions have a dummy PC read during the second cycle, at the address of the next instruction.
The modes Zp,X, Zp,Y, and (Zp,X) all have a dummy read in zero page, at the base address (before X or Y is added).
BRK's second instruction byte is read in the second cycle of the instruction.
Remember that a memory access occurs on every clock cycle, so if an instruction takes five cycles, there are five accesses (no more, no less). Of course, a lot of these dummy reads don't really have any consequence - the only really important ones are the indexed addressing modes and the extra write during RMW instructions.
Yeah. If (# of cycles the instruction takes - # of bytes in the instruction - # of reads/writes the instruction is supposed to do) is greater than zero, then you get dummy reads and writes.
For example, lda [$00],Y is 2 bytes long, is supposed to read $00 and $01, and then read the address pointed to by [$00],Y. So it's supposed to do 5 memory accesses, and the instruction takes 5 cycles (when no page boundary is crossed), so you know there is no extra read or write. That's easy enough.
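That rule of thumb is easy to put into a few lines of Python. This is only a sketch: the function name `dummy_access_count` is invented here, and the cycle counts in the table are the standard documented 6502 timings rather than anything measured by the test ROM.

```python
def dummy_access_count(cycles, size_bytes, intended_accesses):
    """Cycles left over after instruction fetches and intended data accesses."""
    return cycles - size_bytes - intended_accesses

# (cycles, instruction bytes, intended reads/writes)
examples = {
    "LDA ($00),Y, no page cross": (5, 2, 3),  # 2 pointer reads + 1 data read
    "STA $1234,X": (5, 3, 1),                 # 1 write
    "ASL $0246,X": (7, 3, 2),                 # 1 read + 1 write
    "NOP": (2, 1, 0),
    "JSR $8000": (6, 3, 2),                   # 2 stack pushes
}

results = {name: dummy_access_count(*args) for name, args in examples.items()}
for name, count in results.items():
    print(name, "->", count)
```

ASL $0246,X comes out at two extra accesses, which matches the dummy read plus the dummy write of RMW instructions.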
- I don't get it yet. Like, lda $20E0,X with X = $22 does a dummy read from $2002, but what data is actually fetched? Garbage?
A dummy read is simply one whose result is ignored by the CPU; in all other aspects it is a normal read. So a dummy read of $2002 should clear the VBL flag, etc. just as a normal read would.
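To make that concrete, here is a toy Python model of a read-sensitive register. The class `StatusRegister` and the helper `bus_read` are hypothetical, and the model is deliberately simplified: it decodes only $2002 and ignores register mirroring and the other status bits.

```python
class StatusRegister:
    """Toy stand-in for a register whose flag is cleared by reading it."""

    def __init__(self):
        self.vbl = True

    def read(self):
        value = 0x80 if self.vbl else 0x00
        self.vbl = False  # side effect happens on every read, dummy or not
        return value


def bus_read(reg, ram, addr):
    """Route a CPU read to the register or to plain RAM (a dict)."""
    if addr == 0x2002:
        return reg.read()
    return ram.get(addr, 0)


reg = StatusRegister()
ram = {}
a = 0x55                        # accumulator before the dummy read
bus_read(reg, ram, 0x2002)      # dummy read: the returned value is discarded
flag_after_dummy = reg.vbl      # flag is already cleared at this point
```

The accumulator variable `a` is untouched, but the flag is gone, so a later real read of $2002 would see it cleared.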
So, the accumulator does not change?
Additionally, there's another problem: in the test ROM, there's an lda $3FE0,X with X = $22, making a dummy read from $3F02. Yes, it is a page cross, but my CPU core takes the effective sum and reads from $3FE0+$22 = $4002 instead of $3F02. So, should the dummy read come from (offset & 0xFF00) | ((offset + X) & 0xFF) for every page crossing?
Correct for both. If (abs & 0xFF) + X > 0xFF, then LDA abs,X makes a dummy read from (abs & 0xFF00) | ((abs + X) & 0xFF), or more simply in an emulator, abs + X - 0x100 (since you know it's going to generate a carry in this case, you can just subtract it).
The dummy read does not change the accumulator (or the flags). Internally the CPU does a memory access every cycle. In this case, it sees that there was a carry when adding X to the low byte of the address, so it throws away the value read from memory, increments the high byte of the address, and then accesses the correct byte of memory.
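In emulator terms, that sequence might look like the Python sketch below. The names `lda_abs_x` and `logged_read` and the memory contents are made up for the example. The opcode and operand fetches are left out, and only the data reads are modeled.

```python
def lda_abs_x(read, base, x):
    """Return the value LDA base,X loads, calling read(addr) once per bus read."""
    low_sum = (base & 0xFF) + x
    addr = (base & 0xFF00) | (low_sum & 0xFF)
    value = read(addr)                  # first read; a dummy if a carry occurred
    if low_sum > 0xFF:
        addr = (addr + 0x100) & 0xFFFF  # fix up the high byte
        value = read(addr)              # real read replaces the discarded value
    return value


memory = {0x3F02: 0xAA, 0x4002: 0x55}   # arbitrary test values
log = []

def logged_read(addr):
    log.append(addr)
    return memory.get(addr, 0)

a = lda_abs_x(logged_read, 0x3FE0, 0x22)
```

After this runs, `log` holds both addresses in order and `a` holds the byte from the corrected address, not the one seen by the dummy read.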
OK, I got it. Thanks a lot. ^_^;;
*bump* Could someone update the link for the test ROM, please? ^_^;;
I would, but applying the mapping from blargg's previous post gives a 404.
Cpow has archived most of the known test ROMs here (including this one):
https://github.com/christopherpow/nes-test-roms