I just made a change to two functions that are called exactly once or twice per frame. They basically just load the high/low byte of an address so I can load indirectly from it. All I did was update them to a slightly newer format.
But here's the kicker: before I made the change, my entire game ran a frame in 8,000-12,000 cycles in one situation.
After the change? 29,765-33,712 in the same situation.
I'm fairly sure the extra time is not within the logic of the subroutines I changed, but in case you don't believe me, here are the two versions:
Before:
Code:
load16metatileaddresses:;{
    ldy #$00                ;Replace with 16 by 16 tileset number
    ;Loading all the addresses. Shouldn't need to change.
    lda meta16addrs0high,y
    sta <dress9
    lda meta16addrs0low,y
    sta <dress8
    lda meta16addrs1high,y
    sta <dressB
    lda meta16addrs1low,y
    sta <dressA
    lda meta16addrs2high,y
    sta <dressD
    lda meta16addrs2low,y
    sta <dressC
    lda meta16addrs3high,y
    sta <dressF
    lda meta16addrs3low,y
    sta <dressE
    rts;}
After:
Code:
load16metatileaddresses:;{
    ldx #$00                ;Replace with 16 by 16 tileset number
    ;Loading all the addresses. Shouldn't need to change.
    lda met16header
    sta <reserved0          ;Temp RAM (pointer low byte)
    lda met16header+1
    sta <reserved1          ;Temp RAM (pointer high byte)
    lda #$F6                ;-$0A, so the first pass through the loop gives offset 0
l16metaloop:
    clc
    adc #$0A                ;Number of elements in the set*2
    dex
    bpl l16metaloop
    tay                     ;Y = offset of this set's addresses
    lda [reserved0],y
    sta <dress8
    iny
    lda [reserved0],y
    sta <dress9
    iny
    lda [reserved0],y
    sta <dressA
    iny
    lda [reserved0],y
    sta <dressB
    iny
    lda [reserved0],y
    sta <dressC
    iny
    lda [reserved0],y
    sta <dressD
    iny
    lda [reserved0],y
    sta <dressE
    iny
    lda [reserved0],y
    sta <dressF
    rts;}
The other subroutine I changed was more or less identical, except it loads the fifth byte stream for my 16x16 tiles and stores it.
Basically, what I've done is add an extra layer of abstraction so every bank can have its own variable number of sets. I load from the header, which is in a fixed position in all banks, and from it I can get the addresses of each set of elements of each tile set, which could be anywhere in the bank since those can be any size.
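As a rough illustration of that lookup, here is a minimal Python model of what the new routine computes. Only the $F6/$0A arithmetic and the low-byte-first order come from the listing. The function names, the $8000 header address, the $8010 table address, and the memory contents are made up for the example.

```python
def set_offset(set_number, entry_size=0x0A):
    """Mirror l16metaloop: start at -entry_size and add entry_size
    (set_number + 1) times, wrapping at 8 bits like the A register."""
    a = (0x100 - entry_size) & 0xFF      # $F6 when entry_size is $0A
    x = set_number & 0xFF
    while True:
        a = (a + entry_size) & 0xFF      # clc / adc #$0A
        x = (x - 1) & 0xFF               # dex
        if x & 0x80:                     # bpl falls through once X is negative
            return a


def read_stream_pointers(memory, header_addr, set_number, streams=4):
    """Follow the 16-bit pointer stored at header_addr, then read `streams`
    little-endian addresses from the entry for set_number."""
    table = memory[header_addr] | (memory[header_addr + 1] << 8)
    y = set_offset(set_number)
    pointers = []
    for _ in range(streams):
        lo = memory[table + y]           # lda [reserved0],y
        y = (y + 1) & 0xFF               # iny
        hi = memory[table + y]
        y = (y + 1) & 0xFF
        pointers.append(lo | (hi << 8))
    return pointers


# Hypothetical bank image: header at $8000 points to a table at $8010,
# and set 1 (offset 10 in the table) holds four stream addresses.
memory = bytearray(0x10000)
memory[0x8000:0x8002] = bytes([0x10, 0x80])
memory[0x801A:0x8022] = bytes([0x00, 0x90, 0x00, 0x91, 0x00, 0x92, 0x00, 0x93])
print([hex(p) for p in read_stream_pointers(memory, 0x8000, 1)])
```

In this model the offset is simply the set number times 10, so the 8-bit offset wraps once a bank holds more than 25 sets.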
Note: I'm actually still on NROM, I'm just preparing for the eventual move.
I've looked at some things this could be. The new one uses two temp variables, and X. I looked around, and I can't see anything that expects X or these temp RAM variables to have a certain value. I first thought maybe this was called in the middle of some loop, and this function was making it run A LOT.
But in fact, all three are actually set to expected values almost immediately after these subroutines are called.
The only other thing I can think of is that this change pushed a piece of, like, every SINGLE one of my loops past a page boundary. But I certainly don't have any single loop running the THOUSANDS of times this would require.
I realize this isn't much to go on, but I have no idea. This is the only change between the two versions. Has anybody had a similar experience? I'm looking for debugging tips, I guess. Maybe ways to detect when an instruction crosses a page boundary, or any other clever things you guys might think this could be.
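For the page-boundary theory, the 6502 timing rules are simple enough to check with a script. A taken branch costs one extra cycle when its target is on a different page from the instruction after the branch. An indexed read costs one extra cycle when adding the index carries into the high byte of the address. Below is a minimal Python sketch, assuming the label addresses are copied by hand from an assembler listing. The addresses and the second loop name are invented for the example.

```python
def branch_cycles(branch_addr, target_addr, taken):
    """Cycle cost of a 6502 relative branch located at branch_addr."""
    if not taken:
        return 2
    next_pc = (branch_addr + 2) & 0xFFFF          # branches are two bytes long
    crossed = (next_pc & 0xFF00) != (target_addr & 0xFF00)
    return 4 if crossed else 3


def indexed_read_penalty(base_addr, index):
    """Extra cycle for an abs,X / abs,Y / (zp),Y read that crosses a page."""
    return 1 if (base_addr & 0xFF) + (index & 0xFF) > 0xFF else 0


def crossing_branches(branches):
    """Keep the (name, branch_addr, target_addr) entries that cross a page."""
    return [b for b in branches if branch_cycles(b[1], b[2], True) == 4]


# Hypothetical addresses: the first bpl sits at $C0FE, so the instruction
# after it is on page $C1 while its target is still on page $C0.
loops = [
    ("l16metaloop", 0xC0FE, 0xC0FA),
    ("otherloop", 0xC120, 0xC110),
]
for name, branch_addr, target_addr in crossing_branches(loops):
    print(f"{name}: branch at ${branch_addr:04X} crosses a page")

# Extra cycles per frame at the low end of the two measurements above.
extra_cycles = 29_765 - 8_000
print(extra_cycles)
```

Each crossing costs only one cycle per execution, so page crossings alone would need on the order of 21,700 penalized branches or reads every frame to account for the difference.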
This has actually happened to me before. I made a few large changes and CPU usage skyrocketed, but back then I wasn't even sure EXACTLY what was added. And as I added more code, it just went away.
Now I know exactly what I changed, and want to know what this is about.