I'm sure this has been covered before, but I wanted to share it anyway.
I was profiling my code to see which routines were using the most cycles overall. I was surprised to see that one of my most cycle-hungry routines was just a simple loop I was using to clear out my shadow OAM before reloading sprite data into it:
Code:
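LDX #0                  ; start at offset 0
LDA #$FE                ; fill value (a sprite Y of $FE is off-screen)
.Loop:
STA spriteTable, X      ; one byte per pass
INX
BNE .Loop               ; keep going until X wraps back to 0 (256 passes)
RTS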
I wanted to see if I could reduce the number of cycles used here, and tried the following instead:
Code:
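LDX #31                     ; 32 passes, X counts down from 31 to 0
LDA #$FE                    ; same fill value as before
.Loop:
STA spriteTable, X          ; each pass writes one byte into each
STA spriteTable + 32, X     ; of eight 32-byte slices of the table
STA spriteTable + 64, X
STA spriteTable + 96, X
STA spriteTable + 128, X
STA spriteTable + 160, X
STA spriteTable + 192, X
STA spriteTable + 224, X
DEX
BPL .Loop                   ; exit once X wraps from 0 to $FF
RTS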
I was really surprised to see that this uses roughly half as many cycles, for just a handful of extra bytes of code. I understood in theory that a loop which runs its increment and branch 256 times would take more cycles than a loop that only runs them 32 times, but I had never actually tried it and looked at the difference in speed.
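For anyone who wants to check the numbers, here is a rough tally of both versions. This is only an estimate from the standard 6502 timings, not a measurement: it assumes spriteTable sits outside zero page (so the stores use absolute,X addressing), that neither loop's branch crosses a page boundary, and it leaves out the LDX/LDA setup and the RTS, which cost the same in both versions.
Code:
; Original loop, 256 passes
;   STA abs,X      5 cycles   3 bytes
;   INX            2 cycles   1 byte
;   BNE (taken)    3 cycles   2 bytes
;   per pass:     10 cycles
;   total:        256 * 10 - 1 = 2559 cycles
;                 (the last BNE falls through, which is 1 cycle cheaper)
;
; Unrolled loop, 32 passes
;   8 x STA abs,X 40 cycles  24 bytes
;   DEX            2 cycles   1 byte
;   BPL (taken)    3 cycles   2 bytes
;   per pass:     45 cycles
;   total:        32 * 45 - 1 = 1439 cycles
;
; 1439 / 2559 is about 0.56, so a bit over half the cycles.
; Whole routine including LDX, LDA and RTS: 11 bytes vs. 32 bytes.
The stores cost the same 1280 cycles either way (256 x 5). The saving comes entirely from the loop overhead: the INX/BNE pair runs 256 times in the first version, while the DEX/BPL pair only runs 32 times in the second.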
I know this is probably old hat to many of you, but I'm excited about it. It's enough of a difference to visibly reduce lag in some areas, and I'm very pleased with it.