I've discovered that a taken branch that doesn't cross a page ignores IRQ/NMI during its last clock, so the next instruction executes before the interrupt is serviced. Any other instruction would have the interrupt serviced before the next instruction in this case. This doesn't occur for a branch that isn't taken, or for one that crosses a page. It also doesn't occur for JMP. The cpu_interrupts_v2 test on the Wiki now tests this behavior.
I encountered this while improving the new PPU synchronization scheme. I was using a HERE: BCC HERE wait loop for NMI, and my NMI was occurring later than expected. When I changed it back to JMP HERE, it worked fine. This made absolutely no sense, as I thought the two loops were identical. I made sure there was no page crossing, that the carry flag wasn't being set, etc., and finally realized that the branch's timing must actually be different. This behavior is probably already known in 6502 circles, maybe even here, but it was definitely news to me.
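To see why the BCC loop lags, here is a minimal Python sketch of the two wait loops. It assumes only what is described above: both loop instructions take 3 clocks, and a taken branch that doesn't cross a page ignores an interrupt requested on its last clock. The function name `loop_latencies` is made up for illustration, and the fixed length of the interrupt sequence itself is left out.

```python
from typing import List

def loop_latencies(cycles: int, ignores_last_clock: bool) -> List[int]:
    """Clocks from an interrupt request until the one-instruction wait loop
    lets the interrupt through, for a request on each clock of the loop."""
    latencies = []
    for clock in range(cycles):
        remaining = cycles - clock  # clocks left in the current iteration
        if ignores_last_clock and remaining == 1:
            # Request landed on the ignored clock: one more full
            # iteration of the loop runs before the interrupt is taken.
            remaining += cycles
        latencies.append(remaining)
    return latencies

jmp_loop = loop_latencies(3, ignores_last_clock=False)  # HERE: JMP HERE
bcc_loop = loop_latencies(3, ignores_last_clock=True)   # HERE: BCC HERE

print("JMP loop:", jmp_loop)
print("BCC loop:", bcc_loop)
```

Under these assumptions the JMP loop never waits more than 3 clocks, while the BCC loop can wait 4, which is consistent with the NMI arriving later than expected.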
The test requests an IRQ at each successive cycle of a test sequence, starting at some arbitrary point, and shows how many clocks the IRQ was delayed. T+ is the number of clocks between the arbitrary starting point and the IRQ request, and CK is the number of clocks the IRQ was delayed, also relative to some arbitrary value. Only the relative values of these matter. PC is the saved PC found on the stack inside the IRQ handler (the address of the next instruction), relative to some starting point. The example code has comments showing these offsets, so you can see where the IRQ was actually vectored.
The first three tests show nothing out of the ordinary, but the fourth does:
Code:
nop
; 04
jmp :+
; 07
: nop
; 08
: jmp :-
test_jmp
T+ CK PC
00 02 04 NOP
01 01 04
02 03 07 JMP
03 02 07
04 01 07
05 02 08 NOP
06 01 08
07 03 08 JMP
08 02 08
09 01 08
clc
; 04
bcs :+
; 06
nop
; 07
: lda $100
; 0A
: jmp :-
test_branch_not_taken
T+ CK PC
00 02 04 CLC
01 01 04
02 02 06 BCS
03 01 06
04 02 07 NOP
05 01 07
06 04 0A LDA $100
07 03 0A
08 02 0A
09 01 0A JMP
clc
; 0D
bcc :+
; 0F
nop
; 00
: lda $100
; 03
: jmp :-
test_branch_taken_pagecross
T+ CK PC
00 02 0D CLC
01 01 0D
02 04 00 BCC
03 03 00
04 02 00
05 01 00
06 04 03 LDA $100
07 03 03
08 02 03
09 01 03
clc
; 04
bcc :+
; 06
nop
; 07
: lda $100
; 0A
: jmp :-
test_branch_taken
T+ CK PC
00 02 04 CLC
01 01 04
02 03 07 BCC
03 02 07
04 05 0A LDA $100 *** This is the special case
05 04 0A
06 03 0A
07 02 0A
08 01 0A
09 03 0A JMP
The timing looks similar to that of the NOT taken branch. Note how an IRQ requested during the last cycle of the BCC isn't serviced immediately after the branch (PC 07), but only after the LDA (PC 0A). So you get a 5-cycle delay in this case, even though there are no 5-cycle instructions in the test sequence.
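The rule can be checked against the test_branch_taken table with a small Python model. This is only a sketch under stated assumptions, not how the test ROM works: T+ 00 is taken to be the first clock of the first instruction, CK counts the clocks left until the interrupt is taken (including the clock of the request), and `Instr`, `irq_table` and the `ignores_last` flag are names invented for this example.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    name: str
    cycles: int                 # clocks the instruction takes as executed
    next_pc: int                # offset of the instruction that runs next
    ignores_last: bool = False  # taken branch that doesn't cross a page

def irq_table(executed: List[Instr], clocks: int) -> List[Tuple[int, int, int]]:
    """Return (T+, CK, PC) rows for an IRQ requested on each clock.
    The last instruction is assumed to loop on itself (jmp :-)."""
    def instr(i: int) -> Instr:
        return executed[min(i, len(executed) - 1)]

    rows = []
    for t in range(clocks):
        # Find the instruction that is executing on clock t.
        start, i = 0, 0
        while start + instr(i).cycles <= t:
            start += instr(i).cycles
            i += 1
        cur = instr(i)
        ck = start + cur.cycles - t
        pc = cur.next_pc
        if cur.ignores_last and ck == 1:
            # Request on the ignored last clock: the following
            # instruction runs in full before the IRQ is taken.
            nxt = instr(i + 1)
            ck += nxt.cycles
            pc = nxt.next_pc
        rows.append((t, ck, pc))
    return rows

# Instructions actually executed in test_branch_taken (the NOP is skipped).
TAKEN = [
    Instr("CLC", 2, 0x04),
    Instr("BCC", 3, 0x07, ignores_last=True),
    Instr("LDA $100", 4, 0x0A),
    Instr("JMP", 3, 0x0A),
]

print("T+ CK PC")
for t, ck, pc in irq_table(TAKEN, 10):
    print(f"{t:02X} {ck:02X} {pc:02X}")
```

With `ignores_last` cleared on the BCC, the row for T+ 04 would read 01 07 instead of 05 0A, which is what you would expect if the branch behaved like every other instruction.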