I made a 65816 routine (not 100% optimized yet though) for shell sort. I got the gap sequence (1, 4, 10, 23, 57) from several places that say it's the fastest gap sequence on average, but I'm curious about it's worst case scenario because 20 is divisible by both 4 and 10, so some sequences of numbers such as (0,x,x,x,1,x,x,x,2,x,3,x,4,x,x,x,5,x,x,x,6) would slow it down quite a bit, but that probably rarely happen.
Code:
shell_sort:
sep #$30
ldx.b #57
jsr +
ldx.b #23
jsr +
ldx.b #10
jsr +
ldx.b #4
jsr +
ldx.b #1
+;
stx {temp}
ldy #$00
-;
lda {z},x
cmp {z},y
bcs ++
phy
phx
sta {temp2}
lda {entry},x
pha
-;
lda {entry},y
sta {entry},x
lda {z},y
sta {z},x
tyx
tya
cmp {temp}
bcc +
sbc {temp}
tay
lda {temp2}
cmp {z},y
bcc -
+;
lda {temp2}
sta {z},x
pla
sta {entry},x
plx
ply
+;
iny
inx
cpx #$80
bne --
rts
The above code looks at 8-bit numbers. If 8-bit is only 256 values, and my own homebrew game already uses 8 levels of sprite priorities using a linked list system, I can probably just expand it to 256 levels of priorities. Do you think 8-bit is enough precision for Z layering? Because if it is, I don't think I need any sorting algorithm.