Quote:
I was thinking about the other interrupts (could mess up stack).
I didn't think of them at the time.
You can get around this "the poor man's way" by starting your stream from $0103 (or even higher up, if you think an interrupt could fire inside another interrupt during your NMI). That way, if the absolute worst case happens and an IRQ occurs immediately after you transfer #$02 to the stack pointer, the program counter and processor status flags get pushed to $0102-$0100, so nothing wraps around to $01FF and corrupts your real stack. Of course, the beginning of the data stream can still be corrupted, but that should only affect data you have already pulled from the stack. Unless I'm wrong again.
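As a rough sketch of that idea (not tested in a real project), the stack setup in the NMI routine further down would change along these lines. It assumes the main program fills the buffer starting at $0103 instead of $0100, and that PPUstream itself stays at $0100, since the OBPA code indexes it with the raw stack pointer value.

```
 ldx #$02
 txs            ;the first pla now reads $0103 instead of $0100
 ldx #$00
 stx <safetiles ;inx no longer yields zero here, so load it explicitly
```

An IRQ arriving right after the txs pushes its three bytes to $0102, $0101 and $0100, which leaves whatever the real stack holds near $01FF alone.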
tokumaru wrote:
Actually, the first color of the first sprite palette will be your background color... I'm too lazy to check, but I'm almost sure this is the case.
You're absolutely right, I just checked. Sorry for that bit of misinformation.
I think I finally understand the point of the 33rd write. It was to rewrite the background color after the byte from the sprite palette overwrote it. Writing to it separately isn't needed either, though. Tokumaru's code shows that.
Quote:
BTW, I'm of the opinion that using 32 bytes to define the palette is a bit of a waste... I use only 25 (3 colors for each palette * 8 + the background color), and I repeat the background color for all the palettes.
Heh. I had even written this out in my post, but deleted it. It saves only 7 bytes, and 28 cycles. I'm often told I go too far with that sort of thing.
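For anyone following along, here is roughly what the 25-byte version could look like. This is only my sketch of the idea, not tokumaru's code. The names bgcolor, palette and palloop are made up, and it assumes it runs during vblank (or with rendering off) with the PPU set to increment by 1.

```
 bit $2002      ;reset the $2005/$2006 write latch
 lda #$3F
 sta $2006
 lda #$00
 sta $2006      ;PPU address = $3F00
 ldx #$00       ;index into the 24 non-background colors
 ldy #$08       ;eight palettes
palloop:
 lda bgcolor
 sta $2007      ;entry 0 of every palette gets the background color
 lda palette,x
 sta $2007
 inx
 lda palette,x
 sta $2007
 inx
 lda palette,x
 sta $2007
 inx
 dey
 bne palloop
```

Since $3F10 receives the same background color as $3F00, the mirror overwrite does no harm and no separate 33rd write is needed.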
For reference, here's my NMI routine. It has two formats.
The first thing it does is pull a "check byte". If the check byte doesn't set the negative flag, it is actually the high byte of the address to write a string of bytes to. It writes this to $2006, then pulls the low byte of the address and writes that to $2006 as well. The next byte is the number of bytes to write. The byte after that selects whether the PPU address increments by 1 or 32; it gets ORed into the $2000 mirror, so it should be %00000000 or %00000100. Following that are the actual data bytes to write to the PPU.
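To make the layout concrete, here is a made-up example of one regular string as it would sit in the stream buffer. The address and tile numbers are arbitrary. The increment values come from how the code below ORs the byte into the $2000 mirror after clearing bit 2.

```
 .db $20             ;check byte = address high byte (bit 7 clear)
 .db $41             ;address low byte, so the write starts at $2041
 .db 4               ;number of data bytes (0 would loop 256 times)
 .db %00000000       ;increment select: %00000000 = +1, %00000100 = +32
 .db $10,$11,$12,$13 ;the data bytes themselves
```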
If the check byte does set the negative flag, it checks if the byte is equal to #$FF. The stream ends on a "check byte" that is #$FF.
If the negative flag is set but the byte isn't #$FF, a "One byte per address" (OBPA) stream is starting. The check byte's value isn't used for anything else in this type of stream. So it pulls another byte, which is the number of bytes to write minus one (we'll call it Z), so 0 means there is one byte to write. The next Z+1 bytes are the bytes to write to the PPU. The (Z+1)*2 bytes after that are the high and low bytes of the addresses the corresponding data bytes get written to.
It has unrolled code for this type of stream.
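Here is a made-up OBPA stream that changes two palette entries and then ends the whole stream. The check byte value and the colors are arbitrary; any check byte with bit 7 set other than #$FF should behave the same.

```
 .db $80     ;check byte: bit 7 set and not $FF, value otherwise unused
 .db 1       ;Z = number of bytes to write minus one, so two bytes
 .db $0F,$30 ;the Z+1 data bytes
 .db $3F,$01 ;address for the first data byte, high then low
 .db $3F,$11 ;address for the second data byte
 .db $FF     ;check byte that ends the stream
```

With Z = 1 the routine jumps to obpa.2, runs two copies of the unrolled body, then goes back to the main loop and pulls the #$FF.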
OBPA mode is used for y attributes of course, but it could also be used for updating only a few palette colors or whatever else isn't sequential. Right now it fails if a single OBPA stream needs to write more than 10 bytes. To support more, you have to add more obpabody macro invocations, along with matching entries in the two jump tables.
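For example, going from 10 to 11 should only take one more label plus one more entry at the end of each table, something like the fragment below (untested; obpa.11 is a new label, everything else is from the listing that follows).

```
obpa.11:
 obpabody
obpa.10:
 obpabody
;obpa.9 down to obpa.1 stay as they are

obpa.jmplow:
;existing entries for obpa.1 through obpa.10 go here
 .db low(obpa.11)
obpa.jmphigh:
;existing entries for obpa.1 through obpa.10 go here
 .db high(obpa.11)
```

An 11-byte OBPA stream would then use Z = 10.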
Apologies in advance for the nesasm format.
Code:
;Note: The NMI jumps to the "NMI" label and NOT the "NMI.minus" label.
;PPUstream is $0100.
NMI.minus:
cmp #$FF
beq spriteDMA.stackres
pla;Loads the number of bytes to write (minus one)
tay
lda obpa.jmplow,y
sta <nmiaddrlow
lda obpa.jmphigh,y
sta <nmiaddrhigh
tsx;S still points at the count byte just pulled, so PPUstream,x is Z (the count minus one)
txa
tay;Y = stack index of the count byte
sec;The set carry adds the extra one that the stored count (minus one) is missing
adc PPUstream,x;A = S + Z + 1, the index of the last data byte in the byte stream
tax
txs;pla reads the NEXT byte, so the next pull gets the first address byte
iny; y now contains the index of the first data byte
jmp [nmiaddrlow];jumps to the unrolled loop
NMI:;2270 cycles?
sta <nmia;Storing the registers so when this returns from the interrupt
stx <nmix;A, X, and Y can be reloaded so the expected values will be there
sty <nmiy;rather than the ones the nmi used
;One should probably use the stack to backup a, x, and y. I don't because... I don't.
lda <safetiles;A flag that tells if the stream is fully written. If it's not
bpl spriteDMA;We only sprite DMA
tsx
stx <nmistack
ldx #$FF
txs
inx
stx <safetiles
nmitileloopstart:
pla
bmi NMI.minus;If the high bit isn't set
sta $2006;It's an address
pla;Byte 2 of the address
sta $2006
pla;Number of Bytes to write
tay
lda <PPUmirror
and #%11111011
sta <PPUmirror
pla
ora <PPUmirror
sta <PPUmirror
sta $2000
nmitileloop:
pla
sta $2007
dey
bne nmitileloop
beq nmitileloopstart
spriteDMA.stackres:
ldx <nmistack
txs
spriteDMA:
ldy #$00 ; Must be done before a sprite DMA
sty $2003 ; Must be done before a sprite DMA
lda #$07
sta $4014
;sta $401F;remove
lda <PPUmirror
and #%11111100
sta $2000
sta <PPUmirror
lda <scrollxhigh
and #%00000001
beq nminametablexsetskip
ora <PPUmirror
sta <PPUmirror
nminametablexsetskip:
lda <scrollyscreenhigh
and #%00000001
beq nminametableysetskip
asl a
ora <PPUmirror
sta <PPUmirror
nminametableysetskip:
lda <PPUmirror
sta $2000
lda <scrollxlow
sta $2005
lda <scrollyscreenlow
sta $2005
lda #$FF
sta <vblank
lda <nmia
ldx <nmix
ldy <nmiy
rti
.macro obpabody
pla
sta $2006
pla
sta $2006
lda PPUstream,y
sta $2007
iny
.endm
;obpa = one byte per address
obpa.10:
obpabody
obpa.9:
obpabody
obpa.8:
obpabody
obpa.7:
obpabody
obpa.6:
obpabody
obpa.5:
obpabody
obpa.4:
obpabody
obpa.3:
obpabody
obpa.2:
obpabody
obpa.1:
obpabody
NMIreturntostream:
jmp nmitileloopstart
obpa.jmplow:
.db low(obpa.1)
.db low(obpa.2)
.db low(obpa.3)
.db low(obpa.4)
.db low(obpa.5)
.db low(obpa.6)
.db low(obpa.7)
.db low(obpa.8)
.db low(obpa.9)
.db low(obpa.10)
obpa.jmphigh:
.db high(obpa.1)
.db high(obpa.2)
.db high(obpa.3)
.db high(obpa.4)
.db high(obpa.5)
.db high(obpa.6)
.db high(obpa.7)
.db high(obpa.8)
.db high(obpa.9)
.db high(obpa.10)
There are ways to make it better I'm sure, like not changing how the PPU increments for every regular stream, or using the check byte for OBPA by ANDing out the high bit and using what's left to specify the number of bytes. I could also partially unroll the regular stream format. Still, I'm pretty happy with it right now. If any part of it is unclear or stupid, let me know. I didn't really clean it up for posting, but it does work and is fast enough to scroll 8 pixels in each direction in the same frame.