From the linked page on kkfos.aspect.fi:
Quote:
Use the optimizer (“-Oirs” switch) BUT be aware that it might in some cases produce broken code. One such case is when you read the controllers by strobing $4016, then reading it eight times. The first read is optimized away. Of course when you’re using this library you can simply use read_joy().
True, I/O routines on the NES should usually be written in assembly language because they need to be fast, unlike cc65 output. But why does it optimize out reads even when the joystick ports are declared as volatile char *? Does it disregard volatile?
Code:
#define JOY1 (*(volatile unsigned char *)0x4016)
unsigned char broken(void) {
    unsigned char presses = 0;
    JOY1 = 1;
    JOY1 = 0;
    for (unsigned char i = 8; i > 0; --i) {
        presses = (presses << 1) | ((JOY1 & 0x03) > 0);
    }
    return presses;
}
But this doesn't compile, because cc65 still enforces the C89 rule that declarations must come at the start of a block, a restriction that C99 removed from the C standard sixteen years ago. Moving the declaration of i to the top of the function fixes that:
Code:
#define JOY1 (*(volatile unsigned char *)0x4016)
unsigned char broken(void) {
    unsigned char presses = 0;
    unsigned char i;
    JOY1 = 1;
    JOY1 = 0;
    for (i = 8; i > 0; --i) {
        presses = (presses << 1) | ((JOY1 & 0x03) > 0);
    }
    return presses;
}
Resulting assembly language with -Oirs:
Code:
;
; File generated by cc65 v 2.14 - Git N/A
;
        .fopt compiler,"cc65 v 2.14 - Git N/A"
        .setcpu "6502"
        .smart on
        .autoimport on
        .case on
        .debuginfo off
        .importzp sp, sreg, regsave, regbank
        .importzp tmp1, tmp2, tmp3, tmp4, ptr1, ptr2, ptr3, ptr4
        .macpack longbranch
        .export _broken
; ---------------------------------------------------------------
; unsigned char __near__ broken (void)
; ---------------------------------------------------------------
.segment "CODE"
.proc _broken: near
.segment "CODE"
        lda #$00
        jsr pusha
        jsr decsp1
        lda #$01
        sta $4016
        lda #$00
        sta $4016
        lda #$08
        ldy #$00
L001A:  sta (sp),y
        lda (sp),y
        beq L000A
        iny
        ldx #$00
        lda (sp),y
        asl a
        bcc L0019
        inx
L0019:  jsr pushax
        lda $4016
        and #$03
        jsr boolne
        jsr tosora0
        ldy #$01
        sta (sp),y
        dey
        lda (sp),y
        sec
        sbc #$01
        jmp L001A
L000A:  iny
        tax
        lda (sp),y
        jmp incsp2
.endproc
How many times does this assembly language execute the code at L0019? Is cc65 -Oirs respecting or ignoring volatile?
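One way to check the read count without stepping through the listing by hand is to run the same C loop on a host machine against a stand-in for the port. The following is only a sketch under stated assumptions, not cc65 output: mock_joy1_write, mock_joy1_read, button_bits and read_count are hypothetical names, and the mock hands back one button bit per read, roughly the way a standard controller's shift register does. If the compiled loop matches the C source, the counter ends at 8.

```c
/* Host-side sketch. Everything here is a hypothetical mock of $4016,
   not real NES hardware and not part of cc65 or any library. */
static unsigned char button_bits; /* button states, first-reported button in bit 7 */
static unsigned char shift_reg;   /* bits still waiting to be read */
static unsigned int read_count;   /* reads seen since the last strobe */

/* Writing a value with bit 0 set reloads the shift register (the strobe). */
static void mock_joy1_write(unsigned char value) {
    if (value & 1) {
        shift_reg = button_bits;
        read_count = 0;
    }
}

/* Each read returns the next button bit in bit 0 and counts the access. */
static unsigned char mock_joy1_read(void) {
    unsigned char bit = (unsigned char)((shift_reg >> 7) & 1);
    shift_reg = (unsigned char)(shift_reg << 1);
    ++read_count;
    return bit;
}

/* Same loop as broken(), with the port accesses routed through the mock. */
unsigned char read_pad_mock(void) {
    unsigned char presses = 0;
    unsigned char i;
    mock_joy1_write(1);
    mock_joy1_write(0);
    for (i = 8; i > 0; --i) {
        presses = (unsigned char)((presses << 1) | ((mock_joy1_read() & 0x03) > 0));
    }
    return presses;
}
```

Setting button_bits, calling read_pad_mock(), and then inspecting read_count shows how many port reads the C source asks for, which is the number the compiled loop at L0019 has to match if volatile is being honored.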
[ Thematic break ]
But here's why your I/O should be in assembly language. Compare the above to a formally equivalent translation of the C code that is only halfway optimized:
Code:
.export _broken
.proc _broken
        lda #1
        sta $4016
        lda #0
        sta $4016
        ldy #8
L000P:
        tax             ; save `presses` in a register
        lda $4016
        and #$03
        cmp #$01        ; boolne in one instruction!
        txa
        rol a
        dey
        bne L000P
        rts
.endproc
Not to mention that a ring counter, which is impractical in C because C has no language construct resembling a carry flag, can produce something even more efficient:
Code:
.export _broken
.proc _broken
        ldx #1
        stx $4016
        dex
        stx $4016
        inx             ; init the ring counter to 1
L000P:
        lda $4016
        and #$03
        cmp #$01        ; boolne in one instruction!
        txa
        rol a
        tax
        bcc L000P       ; 1 shifted left 8 times fills carry
        rts
.endproc
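For comparison, the ring-counter idea can be approximated in C by testing the top bit before each shift, which costs an extra mask and a temporary where the assembly version gets the carry flag for free. This is a minimal host-side sketch, assuming a hypothetical mock_read_bit in place of the real $4016 read; next_bits and ring_reads are likewise made up for illustration.

```c
/* Host-side sketch. mock_read_bit, next_bits and ring_reads are
   hypothetical stand-ins, not part of cc65 or any NES library. */
static unsigned char next_bits;  /* bits the mock will return, first one in bit 7 */
static unsigned int ring_reads;  /* number of mock reads performed */

static unsigned char mock_read_bit(void) {
    unsigned char bit = (unsigned char)((next_bits >> 7) & 1);
    next_bits = (unsigned char)(next_bits << 1);
    ++ring_reads;
    return bit;
}

unsigned char read_pad_ring(void) {
    unsigned char presses = 1;  /* sentinel bit doubles as the loop counter */
    unsigned char done;
    do {
        /* Bit 7 is what ROL would move into carry on the 6502. */
        done = (unsigned char)(presses & 0x80);
        presses = (unsigned char)((presses << 1) | ((mock_read_bit() & 0x03) > 0));
    } while (!done);
    /* After eight shifts the sentinel has been pushed out of the byte. */
    return presses;
}
```

The loop still ends after eight reads without a separate counter variable, but the explicit done test is exactly the work that bcc does implicitly in the assembly version.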