The CMP #$80 and CMP #$01 trick is definitely awesome.
Another one: if you have a jsr immediately followed by an rts, replace the pair with a jmp. It only saves one byte (and 9 cycles) each time, but it can add up to a lot of bytes in a large program.
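As a rough sketch of that jsr/rts replacement (UpdateSprites and DrawSprites are made-up routine names, just for illustration; the two versions are alternatives, not one listing):
Code:
; before: jsr + rts = 4 bytes, 12 cycles
UpdateSprites
....
jsr DrawSprites
rts
; after: jmp = 3 bytes, 3 cycles
UpdateSprites
....
jmp DrawSprites ; DrawSprites' own rts returns to UpdateSprites' caller
This assumes DrawSprites ends with a normal rts and doesn't do anything clever with the return address on the stack.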
Another thing I like to do is to avoid branches. For example, if I want A = $04 if C is clear, and A = $06 if C is set, I could do something like this:
Code:
bcc +
lda #$06
bcs ++
+ lda #$04
++ ....
But I find that less elegant than noticing that A = $04 + 2*C and coding it like this (which is also 6 bytes instead of 8):
Code:
lda #$00
rol A ; A = C (0 or 1)
asl A ; A = 2*C
ora #$04 ; A = $04 + 2*C
....
Note that this also shows that you can sometimes use ORA instead of ADC (if you know the values being added together never have the same bits set) and then you don't have to deal with the carry.
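Going one step further for this particular pair of values, the constant can probably be folded into the initial load, since the $04 bit can be shifted into place together with the carry. A sketch, assuming you only care about A afterwards:
Code:
lda #$01 ; A = %00000001
rol A ; A = %0000001C = $02 + C
asl A ; A = %000001C0 = $04 + 2*C
....
That's 4 bytes instead of 6, but it only works here because the constant is a multiple of 4; the ORA version is the more general pattern.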
Also, whenever you use lookup tables, be sure to have them pre-formatted as much as possible. I'd avoid doing stuff like this:
Code:
lda LUTable,X
asl A
asl A
clc
adc #$03
sta Wathever,Y
....
LUTable
.db Val_1, Val_2, Val_3
But do it like this instead:
Code:
lda LUTable,X
sta Wathever,Y
....
LUTable
.db Val_1*4+3, Val_2*4+3, Val_3*4+3
It seems obvious put that way, but trust me, you don't always think of it.
Finally, if you have a table of pointers, it's usually good practice to split it into separate low-byte and high-byte tables. Instead of doing this:
Code:
asl A
tax
lda Adr,X
sta PointerL
lda Adr+1,X
sta PointerH
ldy #$00
lda (Pointer),Y
....
Adr
.dw Adr1, Adr2, Adr3, ...
You do it like this:
Code:
tax
lda AdrL,X
sta PointerL
lda AdrH,X
sta PointerH
ldy #$00
lda (Pointer),Y
....
AdrL
.db <Adr1, <Adr2, <Adr3
AdrH
.db >Adr1, >Adr2, >Adr3
That said, I admit I don't always do it, because it only saves 1 byte (the ASL A, possibly a second one for the TAX if the routine is called with the index already in X), and it's very annoying to split long tables manually to save just ONE byte. The split tables do also allow 256 entries instead of 128, if you ever need that many.
On the other hand, if the data is small, all the high address bytes are likely to be equal. If all the data pointed to by the AdrN entries fits in less than 256 bytes, you can take advantage of this, provided your assembler supports an align directive (so the data doesn't cross a page boundary), and do this:
Code:
tax
lda AdrL,X
sta PointerL
lda #>Adr1 ;Same high address for all pointers
sta PointerH
ldy #$00
lda (Pointer),Y
....
AdrL
.db <Adr1, <Adr2, <Adr3
Then it's also likely that <Adr1 is equal to $00 (I don't know if there's a way to take advantage of it).
Also, I don't know if there is a way to do something like that for data larger than 256 bytes. Something evil would be a loop that walks through all entries of the low table up to the one requested, and increments the high byte whenever an entry is smaller than the previous one. A variant would be to have an 8-bit size table (instead of a 16-bit pointer table) and add up all the sizes before the requested entry to find its address. I don't know how many bytes this would save, but it'd definitely be slower. That's an interesting concept for NROM or CNROM games though.
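The size-table variant could look something like this. It's only an untested sketch: Data, SizeTbl and Size1..Size3 are hypothetical names, the entries are assumed to be stored back to back starting at Data, each entry is assumed to be at most 255 bytes, and PointerL/PointerH are assumed to be in zero page.
Code:
; in: X = index of the wanted entry
; out: PointerL/PointerH = address of that entry
lda #<Data
sta PointerL
lda #>Data
sta PointerH
txa ; test X
beq SeekDone ; entry 0 starts right at Data
SeekLoop
lda PointerL
clc
adc SizeTbl-1,X ; add the size of entry X-1
sta PointerL
bcc SeekNext
inc PointerH ; low byte wrapped, go to next page
SeekNext
dex
bne SeekLoop
SeekDone
....
SizeTbl
.db Size1, Size2, Size3 ; 8-bit size of each entry
The table shrinks by one byte per entry compared to 16-bit pointers, but the loop is a good deal longer than a plain lookup, so it would only pay off once the table has more entries than the extra code costs in bytes.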