Can somebody give me a simple (6502 is my first foray into programming) explanation of what loop unrolling is? These concepts are quite hard to understand because (it seems) a lot of prior programming knowledge is assumed in many online explanations. Makes sense though I guess, why would someone with no programming experience care about what loop unrolling is?
At the risk of sounding like a right numpty, is it something like this? (original code from Nerdy Nights 5)
Code:
;INSTEAD OF DOING THIS
LoadSprites:
  LDX #$00              ; start at 0
LoadSpritesLoop:
  LDA sprites, x        ; load data from address (sprites + x)
  STA $0200, x          ; store into RAM address ($0200 + x)
  INX                   ; X = X + 1
  CPX #$10              ; Compare X to hex $10, decimal 16
  BNE LoadSpritesLoop   ; Branch to LoadSpritesLoop if X is not equal to 16
                        ; if X is equal to 16, continue down

;DO THIS?
LoadSprites:
  LDX #$00              ; start at 0
LoadSpritesLoop:
  LDA sprites, x        ; load data from address (sprites + x)
  STA $0200, x          ; store into RAM address ($0200 + x)
  INX                   ; X = X + 1 (1st byte of this pass done)
  LDA sprites, x        ; load data from address (sprites + x)
  STA $0200, x          ; store into RAM address ($0200 + x)
  INX                   ; X = X + 1 (2nd byte of this pass done)
  LDA sprites, x        ; load data from address (sprites + x)
  STA $0200, x          ; store into RAM address ($0200 + x)
  INX                   ; X = X + 1 (3rd byte of this pass done)
  LDA sprites, x        ; load data from address (sprites + x)
  STA $0200, x          ; store into RAM address ($0200 + x)
  INX                   ; X = X + 1 (4th byte of this pass done)
  CPX #$10              ; Compare X to hex $10, decimal 16
  BNE LoadSpritesLoop   ; Branch to LoadSpritesLoop if X is not equal to 16
                        ; if X is equal to 16, continue down
The way I see it (with my very limited math skills), instead of repeating the loop 16 times and copying 1 byte each iteration, you copy 4 bytes each iteration. This way, the loop is only repeated 4 times (4*4=16).
I work out the total clock cycles for the first loop to be 18 if the branch is taken, and 17 if the branch is not taken. So 18*15=270, and then add 17 for the last run through, in which the branch is not taken: 270+17=287.
I work out the second loop as taking 51 cycles if the branch is taken, and 50 if the branch is not taken. So 51*3=153, and then add 50 for the last run through, in which the branch is not taken: 153+50=203.
Therefore, ya save 84 cycles?
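As a cross-check on those counts, here is a per-instruction tally of the first loop. Treat it as a sketch under assumptions, not a definitive answer: it uses the commonly documented NMOS 6502 timings, assumes "LDA sprites, x" never crosses a page boundary, assumes the BNE target sits in the same page as the branch, and counts LDX #$00 once (it is outside the loop) rather than on every pass. Under those assumptions the totals come out a bit lower than 287 and 203, but the saving still comes entirely from the CPX/BNE pairs that no longer run.

```
; Cycle tally (assumed timings: see the notes above)
LoadSprites:
  LDX #$00              ; 2 cycles, runs once, before the loop starts
LoadSpritesLoop:
  LDA sprites, x        ; 4 cycles (absolute,X read, no page crossing assumed)
  STA $0200, x          ; 5 cycles (absolute,X write is always 5)
  INX                   ; 2 cycles
  CPX #$10              ; 2 cycles
  BNE LoadSpritesLoop   ; 3 cycles if taken, 2 if not taken

; One pass of the original loop: 4+5+2+2+3 = 16 (taken), 15 (not taken)
; Original loop total:           2 + 15*16 + 15 = 257
;
; One pass of the 4x unrolled loop: 4*(4+5+2) + 2 + 3 = 49 (taken), 48 (not taken)
; Unrolled loop total:              2 + 3*49 + 48 = 197
;
; Difference: 257 - 197 = 60 cycles
;           = 12 skipped CPX/BNE pairs * (2 + 3) cycles each
```

The last line is a handy sanity check: unrolling by 4 removes 12 of the 16 compare-and-branch pairs, and each removed pair was a CPX plus a taken BNE.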
The last time I did any serious math (though I doubt you guys would consider this serious) was when I was in school, so please, be nice!
Thanks y'all