Last Week: Opcodes and Looping
This Week: More opcodes: Finite Loops, Key Changes, Chord Progressions
Opcodes
Last week we learned how to use opcodes. Opcodes allow a song's streams to call a subroutine mid-play. This is a very powerful tool. We learned some of the most common opcodes: infinite loop (really a jump), change volume envelopes and change duty cycles. Today we are going to expand on opcodes and learn some cool opcode tricks that can save us a lot (!) of bytes and time.
Finite Looping
Last week we added the infinite loop opcode, which was really just an unconditional jump back to an earlier part of the song. Today we're going to add a finite loop opcode. A finite loop opcode tells the sound engine to repeat a particular section of a song X times, where X is some number defined by you. In the Battle Kid theme song I added last week there is a passage that looks like this:

    .byte sixteenth
    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, E4, E3, E2

This is really just the same 4 notes repeated over and over again. Wouldn't it be cooler if we could do something like this instead:

    .byte sixteenth
    .byte A3, C4, E4, A4
    .byte loop_13_times_please
    .byte A3, E4, E3, E2

That saves a lot of bytes. We go from 56 bytes all the way down to around 10. The Battle Kid song actually plays this same phrase on both square channels, so really we go from 100+ bytes down to 20 or so. That's a big deal! If we consider how common repetitions of 4 or 8 are in music, we can easily see that having a finite loop opcode could potentially save us hundreds if not thousands of bytes in our sound data.
Finite Looping?
So what is a finite loop really? We saw that with an infinite loop it was really more like an unconditional jump. When the sound engine hits the infinite loop opcode, it jumps back, always, no matter what, no questions asked. A finite loop on the other hand is a conditional jump. It checks a counter. If the counter isn't 0 it jumps. If it is 0, it doesn't jump.
Loop Counter
First things first: we need a loop counter. Each stream will have the ability to loop, so each stream will need its own loop counter:

stream_loop1 .rs 6      ;loop counter variable (one for each stream)

We will want to initialize this to 0 in our sound_load code:

    lda #$00
    sta stream_loop1, x

Next we will need a way to set this counter to some value. Some games bundle this up together in the finite loop opcode, but I prefer to make it its own opcode:

;-----------------------------------------------------------------------
;this is our JUMP TABLE!
sound_opcodes:
    .word se_op_endsound            ;$A0
    .word se_op_infinite_loop       ;$A1
    .word se_op_change_ve           ;$A2
    .word se_op_duty                ;$A3
    .word se_op_set_loop1_counter   ;$A4
    ;etc, one entry per subroutine

;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4

se_op_set_loop1_counter:
    lda [sound_ptr], y      ;read the argument (# times to loop)
    sta stream_loop1, x     ;store it in the loop counter variable
    rts

Now we have an easy way to set the loop counter any time we want, like this:

;somewhere in sound data:
    .byte set_loop1_counter, $04    ;repeat 4 times

Looping With The Counter
Our finite loop opcode will work like the infinite loop opcode, with two changes:
1) it will decrement the loop counter
2) it will check the result and only jump on a non-zero result
Let's write it:

;-----------------------------------------------------------------------
;this is our JUMP TABLE!
sound_opcodes:
    .word se_op_endsound            ;$A0
    .word se_op_infinite_loop       ;$A1
    .word se_op_change_ve           ;$A2
    .word se_op_duty                ;$A3
    .word se_op_set_loop1_counter   ;$A4
    .word se_op_loop1               ;$A5
    ;etc, one entry per subroutine

;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5

se_op_loop1:
    dec stream_loop1, x     ;decrement the counter
    lda stream_loop1, x     ;and check it
    beq .last_iteration     ;if zero, we are done looping
.loop_back:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position

    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
.last_iteration:
    iny                     ;skip the first byte of the address argument
                            ; the second byte will be skipped automatically upon return
                            ; (see se_fetch_byte. There is an "iny" after "jsr se_opcode_launcher")
    rts

Now we can loop. To use the Battle Kid example above, we go from this (56 bytes):

    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
    .byte A3, C4, E4, A4, A3, E4, E3, E2

to this (13 bytes):

    .byte set_loop1_counter, 13     ;repeat 13 times.
.intro_loop:                        ;make sure our loop point is AFTER we set the counter!
    .byte A3, C4, E4, A4            ;the phrase to repeat.
    .byte loop1                     ;finite loop opcode
    .word .intro_loop               ;address to jump back to
    .byte A3, E4, E3, E2            ;the last 4 notes

Pretty nice savings. Chances are we will be using this opcode set a lot.
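To double-check the loop arithmetic, here is a small Python model of how se_op_set_loop1_counter and se_op_loop1 walk a single stream. It is an illustration, not engine code: notes are plain strings, opcodes are tuples, and the .word jump target is modeled as a list index instead of a ROM address (all assumptions of this sketch). The counter is masked to 8 bits to mirror what dec does on the 6502.

```python
def play(stream):
    """Return the notes a stream would play, honoring set_loop1_counter/loop1."""
    loop1 = 0           # stream_loop1 for this stream (sound_load clears it)
    played = []
    pos = 0             # models the stream's read position
    while pos < len(stream):
        item = stream[pos]
        if isinstance(item, tuple):
            op, arg = item
            if op == "set_loop1_counter":
                loop1 = arg
            elif op == "loop1":
                loop1 = (loop1 - 1) & 0xFF      # dec stream_loop1, x (8-bit)
                if loop1 != 0:                  # beq .last_iteration only taken on zero
                    pos = arg                   # jump back to the loop label
                    continue
        else:
            played.append(item)
        pos += 1
    return played


def size_in_bytes(stream):
    """ROM bytes: 1 per note, 2 for set_loop1_counter + arg, 3 for loop1 + .word."""
    sizes = {"set_loop1_counter": 2, "loop1": 3}
    return sum(sizes[item[0]] if isinstance(item, tuple) else 1 for item in stream)


original = ["A3", "C4", "E4", "A4"] * 13 + ["A3", "E4", "E3", "E2"]
compressed = [
    ("set_loop1_counter", 13),
    "A3", "C4", "E4", "A4",     # index 1 plays the role of the .intro_loop label
    ("loop1", 1),
    "A3", "E4", "E3", "E2",
]
assert play(compressed) == original
assert len(original) == 56 and size_in_bytes(compressed) == 13
```

Because the counter is decremented before it is tested, a counter of N plays the phrase N times. A side effect of the 8-bit decrement is that a counter of 0 would wrap to $FF and play the phrase 256 times.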
Bonus
We can save a few more bytes here. You may have noticed that the code in the .loop_back section of our finite loop opcode is identical to the infinite loop code:

se_op_loop1:
    ;---snip---
.loop_back:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position

    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
    ;---snip---

Compare with:

se_op_infinite_loop:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position

    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts

Why have identical code in two places? Let's cut out the whole .loop_back section and replace it with a "jmp se_op_infinite_loop":

se_op_loop1:
    dec stream_loop1, x         ;decrement the counter
    lda stream_loop1, x         ;check the counter
    beq .last_iteration         ;if zero, we are done looping
    jmp se_op_infinite_loop     ;if not zero, loop back
.last_iteration:
    iny                         ;skip the first byte of the address argument
                                ; the second byte will be skipped automatically upon return
                                ; (see se_fetch_byte after "jsr se_opcode_launcher")
    rts

Multiple Finite Loops
You may have been wondering why I named the finite loop opcode "loop1". Why stick a 1 on the end there? This is because sometimes one finite loop opcode isn't enough. Consider the following song structure. Assume each letter represents a long series of notes:

A A A B C A A A B C A A A B C A A A B C

With one finite loop opcode you could reduce it to this:

(A A A B C)x4

But if you had two finite loop opcodes available, you could nest them to reduce it even further:

(Ax3 B C)x4

If the music you write has a lot of patterns like this, it may be worth your while to have two or more finite loop opcodes available to you so that you can nest them. To add another finite loop opcode you need to:
1) declare another loop counter variable block in RAM (stream_loop2 .rs 6)
2) initialize the new loop counter to 0 in the sound_load routine.
3) add a new opcode for setting the new loop counter (se_op_set_loop2_counter)
4) add a new opcode to check the new counter and loop (se_op_loop2)
5) make sure to add the new opcodes to the jump table and give them an alias (set_loop2_counter, loop2).
Each
finite loop opcode you add requires 6 bytes of RAM (a limited
resource!), so please consider carefully if it is worth the tradeoff.
It all depends on your music data.
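The nesting idea behind (Ax3 B C)x4 can be checked with a small Python model that has two counters. This is a sketch, not engine code: letters stand in for long runs of notes, opcodes are tuples, and jump targets are list indices (assumptions of the model). The detail it highlights is that set_loop1_counter has to sit inside the outer loop, so the inner counter is re-armed on every outer pass.

```python
SETTERS = {"set_loop1_counter": "loop1", "set_loop2_counter": "loop2"}


def play(stream):
    """Return the notes a stream plays, with two independent loop counters."""
    counters = {"loop1": 0, "loop2": 0}     # stream_loop1 / stream_loop2
    played, pos = [], 0
    while pos < len(stream):
        item = stream[pos]
        if isinstance(item, tuple):
            op, arg = item
            if op in SETTERS:
                counters[SETTERS[op]] = arg
            else:                           # "loop1" or "loop2"; arg is the jump target
                counters[op] = (counters[op] - 1) & 0xFF
                if counters[op] != 0:
                    pos = arg
                    continue
        else:
            played.append(item)
        pos += 1
    return played


flat = "A A A B C A A A B C A A A B C A A A B C".split()

# (A A A B C)x4 with a single finite loop
one_loop = [("set_loop1_counter", 4), "A", "A", "A", "B", "C", ("loop1", 1)]

# (Ax3 B C)x4 with two nested finite loops
nested = [
    ("set_loop2_counter", 4),
    ("set_loop1_counter", 3),   # index 1: outer loop label, re-arms the inner counter
    "A",                        # index 2: inner loop label
    ("loop1", 2),
    "B", "C",
    ("loop2", 1),
]
assert play(one_loop) == flat
assert play(nested) == flat
```

If the set_loop1_counter were placed before the outer loop label instead, the inner counter would already be 0 on the second outer pass, and the 8-bit wrap would make the inner loop run 256 times.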
Changing Keys
Another useful feature to have is the ability to change keys. Imagine you write a song and you have it all done. Then at the last minute you decide you want it to be in another key, say a whole step (2 semitones) lower. Rather than rewrite the whole song by hand (it takes forever), wouldn't it be nice if there was an opcode that you could set to automatically subtract two from every note? What if you have a song pattern that gets played in more than one key (a rhythm track for a Blues song, for example)? We could save lots of bytes if we can figure out a way to write the pattern once, and then loop it while changing keys each iteration. Let's do it.
Note Offset
We will implement keys by having a note offset variable:

stream_note_offset .rs 6    ;note offset

The note offset is a value that gets added to the note value before pulling the period out of the note_table. We will initialize stream_note_offset to 0 so that the default behavior is to add 0 to the note (resulting in no change). However, if we set stream_note_offset to some value via an opcode, it will change the notes. Here is an updated se_fetch_byte that demonstrates how this works:

se_fetch_byte:
    ;...snip...
.note:
    ;do Note stuff
    sty sound_temp1             ;save our index into the data stream
    clc
    adc stream_note_offset, x   ;add note offset
    asl a
    tay
    lda note_table, y
    sta stream_note_LO, x
    lda note_table+1, y
    sta stream_note_HI, x
    ldy sound_temp1             ;restore data stream index
    ;...snip...

Imagine what would happen if we have stream_note_offset set to 2. Say we read a C4 note from the data stream:
1. A C4 note is equivalent to hex value #$1b (see aliases in note_table.i)
2. we add stream_note_offset to this value. #$1b + #$02 = #$1d.
3. hex value #$1d is equivalent to a D4 note (see note_table.i)
4. wow, we raised the note up a step!
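The four steps above are plain index arithmetic. Here is a minimal Python sketch of it, assuming note_table.i numbers the notes one semitone apart with index 0 = A1 (an assumption, chosen because it agrees with C4 = $1B and D4 = $1D) and uses an "s" suffix for sharps, as aliases like Ds3 and As2 in this lesson suggest. The play_note helper is hypothetical; it mirrors the clc/adc/asl sequence in the .note code.

```python
# Pitch classes in table order, starting from A (index 0 is assumed to be A1).
NAMES = ["A", "As", "B", "C", "Cs", "D", "Ds", "E", "F", "Fs", "G", "Gs"]


def note_name(index):
    """Alias for a note_table index; octave numbers roll over at C, not at A."""
    octave = 1 + (index + 9) // 12
    return NAMES[index % 12] + str(octave)


# Reverse lookup: alias -> index (96 entries is just a convenient range here).
NOTE_INDEX = {note_name(i): i for i in range(96)}


def play_note(alias, note_offset):
    """Add stream_note_offset to the note, then double it to index the word table."""
    index = (NOTE_INDEX[alias] + note_offset) & 0xFF   # clc / adc stream_note_offset, x
    table_byte_offset = index * 2                      # asl a: each entry is a 2-byte .word
    return note_name(index), table_byte_offset


print(NOTE_INDEX["C4"] == 0x1B)     # C4 is $1B under the assumed numbering
print(play_note("C4", 2))           # ('D4', 58): D4 is index $1D, byte offset $3A
```

Because the offset is applied to the index before the table lookup, the same arithmetic transposes any run of notes: every alias simply moves up or down that many table slots.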
Using the same value for stream_note_offset, if we had a string of notes like this:

    C4, E4, G4, B4, C5, E5, G5, B5, C6      ;Cmaj7

it would get translated to:

    D4, Fs4, A4, Cs5, D5, Fs5, A5, Cs6, D6  ;Dmaj7

Using stream_note_offset we can easily transpose entire sections of music into other keys. As mentioned above, we will initialize a stream's stream_note_offset to zero:

sound_load:
    ;---snip---
    lda #$00
    sta stream_note_offset, x
    ;---snip---

Set Note Offset
Now let's make an opcode that will set stream_note_offset to a specific value:

;-----------------------------------------------------------------------
;this is our JUMP TABLE!
sound_opcodes:
    .word se_op_endsound            ;$A0
    .word se_op_infinite_loop       ;$A1
    .word se_op_change_ve           ;$A2
    .word se_op_duty                ;$A3
    .word se_op_set_loop1_counter   ;$A4
    .word se_op_loop1               ;$A5
    .word se_op_set_note_offset     ;$A6

;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6

se_op_set_note_offset:
    lda [sound_ptr], y          ;read the argument
    sta stream_note_offset, x   ;set the note offset.
    rts

Now we can set the note offset anytime we want in the data stream:

;oops, after writing the song, I realized I wanted it to be in D instead. No problem.
sound_data:
    .byte set_note_offset, 2
    .byte C2, C3, C4, C5        ;etc.. more notes in the key of C.

Adjust Note Offset
Setting the note offset to a specific value has very limited application. It's like a one-time key change. More often we will want to set the note offset to some relative value. For example, instead of setting stream_note_offset to 2, we might want to set stream_note_offset to "the current offset + 2". If we had an opcode that let us adjust stream_note_offset by a relative value, we could use it together with loops. First let's write the opcode:

;-----------------------------------------------------------------------
;this is our JUMP TABLE!
sound_opcodes:
    .word se_op_endsound            ;$A0
    .word se_op_infinite_loop       ;$A1
    .word se_op_change_ve           ;$A2
    .word se_op_duty                ;$A3
    .word se_op_set_loop1_counter   ;$A4
    .word se_op_loop1               ;$A5
    .word se_op_set_note_offset     ;$A6
    .word se_op_adjust_note_offset  ;$A7

;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6
adjust_note_offset = $A7

se_op_adjust_note_offset:
    lda [sound_ptr], y          ;read the argument (what value to add)
    clc
    adc stream_note_offset, x   ;add it to the current offset
    sta stream_note_offset, x   ;and save.
    rts

Let's look at this opcode in use. Say we have a long arpeggiated line like this:

    C2, E2, G2, B2, C3, E3, G3, B3, C4, E4, G4, B4, C5, E5, G5, B5, C6, E6, G6, B6, C7     ;Cmaj7 (21 bytes)

This passage just repeats the same 4 notes (C E G B) over 5 octaves:

    .byte set_loop1_counter, 5      ;loop 5 times
.loop:
    .byte C2, E2, G2, B2            ;these are the 4 notes to loop
    .byte adjust_note_offset, 12    ;each iteration add 12 to the offset (ie, go up an octave)
    .byte loop1
    .word .loop
    .byte C2                        ;will be a C7. Cmaj7 (12 bytes)

The first time through the loop it will play C2, E2, G2, B2. The second time through the loop it will play C3, E3, G3, B3. The third time through will be C4, E4, G4, B4, etc. Using our opcodes, we reduce the size of our data from 21 bytes to 12 bytes. That's a savings of about 43%.
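Here is a rough Python model of that loop, again a sketch rather than engine code. Notes are modeled as note_table indices, assuming index 0 = A1 (which makes C2 = 3 and agrees with C4 = $1B from earlier); opcodes are tuples and the .word target is a list index. The offset is added when each note is fetched, which is why the lone C2 after the loop comes out as C7.

```python
# Assumed note_table indices (index 0 = A1, one semitone per entry).
C2, E2, G2, B2 = 3, 7, 10, 14
OCTAVE = 12


def play(stream):
    """Return the note indices a stream plays (loop1 + adjust_note_offset only)."""
    loop1, note_offset, pos, played = 0, 0, 0, []
    while pos < len(stream):
        item = stream[pos]
        if isinstance(item, tuple):
            op, arg = item
            if op == "set_loop1_counter":
                loop1 = arg
            elif op == "adjust_note_offset":
                note_offset = (note_offset + arg) & 0xFF    # clc / adc / sta
            elif op == "loop1":
                loop1 = (loop1 - 1) & 0xFF
                if loop1 != 0:
                    pos = arg
                    continue
        else:
            played.append((item + note_offset) & 0xFF)     # offset added at note fetch
        pos += 1
    return played


compact = [
    ("set_loop1_counter", 5),
    C2, E2, G2, B2,                 # index 1 plays the role of the .loop label
    ("adjust_note_offset", OCTAVE),
    ("loop1", 1),
    C2,                             # offset is 60 by now, so this plays as C7
]
# The 21-note line written out: C E G B in five octaves, then C7.
long_line = [n + OCTAVE * k for k in range(5) for n in (C2, E2, G2, B2)] + [C2 + 5 * OCTAVE]
assert play(compact) == long_line
```

Note that stream_note_offset is still 60 after the final C2; any notes that follow in the same stream would need a set_note_offset to bring them back down.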
Battle Kid
To take a better example, let's look at the bassline to the Battle Kid theme song. Last week, it looked like this:

song6_tri:
    .byte eighth
    .byte A3, A3, A4, A4, A3, A3, A4, A4
    .byte G3, G3, G4, G4, G3, G3, G4, G4            ;down a step (-2)
    .byte F3, F3, F4, F4, F3, F3, F4, F4            ;down a step (-2)
    .byte Eb3, Eb3, Eb4, Eb4, Eb3, Eb3, Eb4, Eb4    ;down a step (-2)
    .byte loop
    .word song6_tri
;36 bytes

We have a pattern here:
X3, X3, X4, X4, X3, X3, X4, X4,
where X = some note. It just so happens that each new X is just the
previous X minus 2. Using our new opcode, we can rewrite the bassline
like this:

song6_tri:
    .byte eighth
    .byte set_loop1_counter, 4              ;repeat 4 times
.loop:
    .byte A3, A3, A4, A4, A3, A3, A4, A4    ;series of notes to repeat
    .byte adjust_note_offset, -2            ;go down a step
    .byte loop1
    .word .loop
    .byte set_note_offset, 0                ;after 4 repeats, reset note offset to 0.
    .byte loop                              ;infinite loop
    .word song6_tri
;21 bytes

We drop from 36 bytes to 21 bytes of ROM space. About 40% savings!
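One detail worth checking is how the -2 argument behaves on an 8-bit CPU. The byte stored for -2 is $FE, and clc/adc simply drops the carry out of bit 7, so adding $FE is the same as subtracting 2. The Python sketch below models that with & 0xFF. It assumes note_table indices with 0 = A1 (so A3 = 24), and it unrolls the four loop passes instead of interpreting the byte stream; adc8 is a hypothetical helper name.

```python
A3, A4 = 24, 36                      # assumed note_table indices (0 = A1)
PATTERN = [A3, A3, A4, A4, A3, A3, A4, A4]


def adc8(a, b):
    """8-bit add as clc/adc does it: the carry out of bit 7 is discarded."""
    return (a + b) & 0xFF


step = -2 & 0xFF                     # .byte -2 is stored as $FE
note_offset = 0                      # stream_note_offset after sound_load
played = []
for _ in range(4):                   # set_loop1_counter, 4 ... loop1
    played += [adc8(note, note_offset) for note in PATTERN]
    note_offset = adc8(note_offset, step)    # se_op_adjust_note_offset

# Last week's hand-written bassline: roots A, G, F, Eb, each a whole step lower.
original = [x for root in (24, 22, 20, 18)
            for x in (root, root, root + 12, root + 12, root, root, root + 12, root + 12)]
assert step == 0xFE
assert played == original
assert note_offset == 0xF8           # -8 once the loop ends
```

That leftover $F8 (-8) is why the song issues set_note_offset, 0 before the infinite loop: without it, every later pass through song6_tri would start 8 semitones lower than the last.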
Loopy Sound Effects
We can produce some cool sound effects if we combine loops and key changes at high tempos. Look at this one (tempo is $FF):

song7_square2:
    .byte set_loop1_counter, $08        ;repeat 8 times
.loop:
    .byte thirtysecond, D7, D6, G6      ;play two D notes at different octaves and a G. Pretty random
    .byte adjust_note_offset, -4        ;go down 2 whole steps
    .byte loop1
    .word .loop
    .byte endsound

This sound effect plays a simple 3-note pattern in descending keys super fast. The sound data is only 12 bytes, but it produces a pretty complex sound effect. Listen to song7 in this week's sample files to hear it. By experimenting with loops like this we can come up with some sounds that would be difficult to compose by hand.
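To see what the effect actually plays, the Python sketch below unrolls the eight passes. It assumes note_table indices with 0 = A1, so D6 = 53, G6 = 58 and D7 = 65; these numbers are an assumption, not values taken from note_table.i. It also checks that no index goes below 0, since on the 6502 a negative result would wrap around to a large 8-bit value and fetch the wrong period.

```python
D6, G6, D7 = 53, 58, 65              # assumed note_table indices (0 = A1)
FIGURE = [D7, D6, G6]
STEP = -4                            # adjust_note_offset, -4: two whole steps down


def unroll(passes):
    """Note indices played by song7_square2 for a given loop count."""
    return [note + STEP * k for k in range(passes) for note in FIGURE]


notes = unroll(8)
assert len(notes) == 24
assert notes[-3:] == [D7 - 28, D6 - 28, G6 - 28]   # the last pass sits 28 semitones lower
assert min(notes) >= 0               # stays inside the table for 8 passes
```

Under these assumed indices the lowest note is 25, so there is headroom, but stretching the loop to 15 passes would push the D6 voice below index 0. When tweaking loop counts for effects like this, it is worth checking the lowest note the figure reaches.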
Complex Chord Progressions
We made some good savings percentage-wise on the bassline to Battle Kid. But we were lucky. The chord progression went down in consistent steps: -2, -2, -2. It was possible to loop this because we adjust stream_note_offset by the same value (-2) each time. But what if we had a pattern that was repeated in a more complicated way? We do. Let's look at the rhythm pattern for our Guardian Legend boss song:

song1_square1:
    .byte eighth
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    .byte F3, F3, F3, F4, F3, F4, F3, F4                    ;+8 (A2 + 8 = F3)
    .byte A2, A2, A2, A3, A2, A3, A2, A3                    ;-8
    .byte F3, F3, F3, F4, F3, F4, F3, F4                    ;+8
    .byte E3, E3, E3, E4, E3, E4, E3, E4                    ;-1
    .byte E3, E3, E3, E4, E3, E4, E3, E4                    ;+0
    .byte Ds3, Ds3, Ds3, Ds4, Ds3, Ds4, Ds3, Ds4            ;-1
    .byte D3, D3, D3, D4, D3, D4, D3, D4                    ;-1
    .byte C3, C3, C3, C4, C3, C4, C3, C4                    ;-2
    .byte B2, B2, B2, B3, B2, B3, B2, B3                    ;-1
    .byte As2, As2, As2, As3, As2, As3, As2, As3            ;-1
    .byte A2, A2, A2, A3, A2, A3, A2, A3                    ;-1
    .byte Gs2, Gs2, Gs2, Gs3, Gs2, Gs3, Gs2, Gs3            ;-1
    .byte G2, G2, G2, G3, G2, G3, G2, G3                    ;-1
    .byte loop                                              ;+2 (loop back to A2)
    .word song1_square1

Here we have another pattern: Xi, Xi, Xi, X(i+1), Xi, X(i+1), Xi, X(i+1), where X = some note and i = some octave. Cool. A pattern means we have an opportunity to save bytes by looping. But wait. Unlike Battle Kid, this pattern jumps around in an inconsistent way. What should we do?
Super TGL Transposition Trick
I learned this trick from The Guardian Legend, so I call it the TGL Transposition Trick.
What we do is loop the pattern, and then use the loop counter as an index into a lookup table. The lookup table contains note offset values. Because the loop counter counts down, our lookup table has to be stored in reverse order: the entry used on the first iteration comes last.
Wait, what? Let's look at our example:

song1_square1:
    .byte eighth
    .byte set_loop1_counter, 14             ;repeat 14 times
.loop:
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    ;pull a value from lookup_table and
    ; add it to stream_note_offset
    .byte loop1                             ;finite loop (14 times)
    .word .loop
    .byte loop                              ;infinite loop
    .word song1_square1
.lookup_table:
    .byte 2, -1, -1, -1, -1, -1, -2
    .byte -1, -1, 0, -1, 8, -8, 8           ;14 entries long, reverse order

I'm going to break it down in a second here, but first let me tell you that the placeholder comment above ("pull a value from lookup_table and add it to stream_note_offset") will be covered by a single opcode, transpose. The transpose opcode takes a 2-byte argument, so altogether that commented section will be replaced with 3 bytes of data. So if we count up all of the bytes in our rhythm sound data we get 34 bytes. The original was 116 bytes. By using the TGL Transposition Trick, we save 82 bytes. That's 70%!

song1_square1:
    .byte eighth
    .byte set_loop1_counter, 14             ;repeat 14 times
.loop:
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    .byte transpose                         ;the transpose opcode takes a 2-byte argument
    .word .lookup_table                     ;which is the address of the lookup table
    .byte loop1                             ;finite loop (14 times)
    .word .loop
    .byte loop                              ;infinite loop
    .word song1_square1
.lookup_table:
    .byte 2, -1, -1, -1, -1, -1, -2
    .byte -1, -1, 0, -1, 8, -8, 8           ;14 entries long, reverse order
;*** altogether 34 bytes ***

The transpose opcode will set up a pointer variable to point to the lookup table. Then it will take the loop counter, subtract 1, and use the result as an index into the table. We subtract 1 because the table indexes start from zero. If we loop 14 times, our table will have 14 entries numbered 0-13. Once the transpose opcode has its index, it will pull a value from the table. This value will be added to stream_note_offset.
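A compact Python model of the transpose opcode may help before the trace. It is a sketch under assumptions: note indices use 0 = A1 (so A2 = 12), the lookup table keeps the signed values exactly as written in the source, and the interpreter is unrolled into a plain loop; play_pass is a hypothetical helper. Each pass plays the pattern, reads table[counter - 1], adds it to the offset with 8-bit wraparound, and then decrements the counter the way se_op_loop1 does.

```python
A2, A3 = 12, 24                              # assumed note_table indices (0 = A1)
PATTERN = [A2, A2, A2, A3, A2, A3, A2, A3]
LOOKUP = [2, -1, -1, -1, -1, -1, -2,
          -1, -1, 0, -1, 8, -8, 8]          # .lookup_table, reverse order


def play_pass(pattern, table):
    """One full run of song1_square1's finite loop; returns (notes, final offset)."""
    counter = len(table)                     # set_loop1_counter, 14
    note_offset = 0
    played = []
    while True:
        played += [(n + note_offset) & 0xFF for n in pattern]
        entry = table[counter - 1] & 0xFF    # se_op_transpose: index = counter - 1
        note_offset = (note_offset + entry) & 0xFF
        counter = (counter - 1) & 0xFF       # se_op_loop1: dec, then test for zero
        if counter == 0:
            return played, note_offset


# Roots of the hand-written version: A2 F3 A2 F3 E3 E3 Ds3 D3 C3 B2 As2 A2 Gs2 G2.
ROOTS = [12, 20, 12, 20, 19, 19, 18, 17, 15, 14, 13, 12, 11, 10]
original = [x for r in ROOTS for x in (r, r, r, r + 12, r, r + 12, r, r + 12)]

notes, final_offset = play_pass(PATTERN, LOOKUP)
assert notes == original
assert final_offset == 0      # the table sums to 0, so the infinite loop restarts on A2
```

The final check is the reason the song can jump straight back to song1_square1 without a set_note_offset: the 14 table entries add up to zero, so each pass ends with stream_note_offset back where it started.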
Before
we write the opcode, let's trace through the data to see how it works.
We'll start at the very first byte of song1_square1:
1) set note length to eighth notes
2) set the loop counter to 14
(.loop iteration 1)
3) play a series of notes: A2, A2, A2, A3, A2, A3, A2, A3
4) transpose opcode. Set up a pointer to .lookup_table. Use our loop counter, minus one, as an index. The loop counter is 14 now, so we will pull out .lookup_table+13, which is an 8. Add 8 to the current stream_note_offset: 0 + 8 = 8.
5) decrement the loop counter (14->13) and loop back to the .loop label
(iteration 2)
6) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
7) transpose opcode. Loop counter is 13. Grab .lookup_table+12, which is -8. Add -8 to stream_note_offset: 8 + -8 = 0.
8) decrement loop counter (13->12) and loop back to .loop label
(iteration 3)
9) our new string of notes with the +0: A2, A2, A2, A3, A2, A3, A2, A3
10) transpose opcode. Loop counter is 12. Grab .lookup_table+11, which is 8. Add 8 to stream_note_offset: 0 + 8 = 8.
11) decrement loop counter (12->11) and loop back to .loop label
(iteration 4)
12) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
13) transpose opcode. Loop counter is 11. Grab .lookup_table+10, which is -1. Add -1 to stream_note_offset: 8 + -1 = 7.
14) decrement loop counter (11->10) and loop back to .loop label
(iteration 5)
15) our new string of notes with the +7: E3, E3, E3, E4, E3, E4, E3, E4.
16) transpose opcode. Loop counter is 10. Grab .lookup_table+9, which is 0. Add 0 to stream_note_offset: 7 + 0 = 7.
17) decrement loop counter (10->9) and loop back to .loop label
etc.
On the last iteration our loop counter is 1. We grab .lookup_table+0
and add it to stream_note_offset. Then we decrement the loop counter
(1->0). Our loop counter is now 0, so our loop breaks. Pretty
cool, no? Let's write it.

;-----------------------------------------------------------------------
;this is our JUMP TABLE!
sound_opcodes:
    .word se_op_endsound            ;$A0
    .word se_op_infinite_loop       ;$A1
    .word se_op_change_ve           ;$A2
    .word se_op_duty                ;$A3
    .word se_op_set_loop1_counter   ;$A4
    .word se_op_loop1               ;$A5
    .word se_op_set_note_offset     ;$A6
    .word se_op_adjust_note_offset  ;$A7
    .word se_op_transpose           ;$A8

;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6
adjust_note_offset = $A7
transpose = $A8

se_op_transpose:
    lda [sound_ptr], y          ;read low byte of the pointer to our lookup table
    sta sound_ptr2              ;store it in a new pointer variable
    iny
    lda [sound_ptr], y          ;read high byte of pointer to table
    sta sound_ptr2+1

    sty sound_temp1             ;save y because we are about to destroy it
    lda stream_loop1, x         ;get loop counter, put it in Y
    tay                         ;   this will be our index into the lookup table
    dey                         ;subtract 1 because indexes start from 0.

    lda [sound_ptr2], y         ;read a value from the table.
    clc
    adc stream_note_offset, x   ;add it to the note offset
    sta stream_note_offset, x

    ldy sound_temp1             ;restore Y
    rts

There is a new pointer variable here, sound_ptr2. Actually, what I really did was rename jmp_ptr to sound_ptr2. The new name lets me know it's for sound engine use only. Since the opcode launcher is finished with the pointer as soon as it jumps, reusing it here causes no conflicts.
Conclusion
This is just an example of how clever use of opcodes and looping can save you lots of bytes. Keep in mind that this transpose opcode is only useful if you write music that has repeating patterns in the rhythm section. If you don't, then save yourself some bytes and cut the opcode from your sound engine.
Putting It All Together
Download and unzip the opcodes2.zip sample files. Make sure the following files are in the same folder as NESASM3:
opcodes2.asm
sound_engine.asm
sound_opcodes.asm
opcodes2.chr
note_table.i
note_length_table.i
vol_envelopes.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
song6.i
song7.i
opcodes2.bat
Double click opcodes2.bat. That will run NESASM3 and should produce the opcodes2.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable.
Song1-Song6 are the same as last week, but they take up less ROM space now.
Song7 is a new sound effect created by looping a key change at high tempo.
As usual, try adding your own songs and sound effects using the new opcodes. Experiment.
Next Week:
Noise, Simple Drums