Hey! It's been a while since I last posted anything, mainly because I never had time to even attempt working on anything, but then I realised that these things take a while anyway to make.
I don't exactly know what I'm after; maybe all I'm gonna do is a simple sound engine, maybe more, but I have a couple of questions before I go deeper. (The only things I've made so far were simple DPCM playback and a ROM that plays a pitch-sweeping noise sound, so at least I understand how game logic and NMI have to cooperate.)
So what I want to make for now is a simple sound driver, if possible I'm trying to go for just a 32k NROM program, and if the size of the individual programs gets too large (assuming that I'm going for a full game), switch to 128k UNROM (preferably without CHR-RAM).
I've spent a lot of time drawing and sketching my engine on paper, and I know how everything is supposed to work out (bitflags, manipulation of shadow registers, etc.), but there are still a couple of features I'm not sure how to implement (or whether it's possible to implement them at all).
So no code here yet folks, just my ideas of how I want to do these things.
What I ultimately want to know is whether this is a good way of thinking when it comes to programming or not.
1. I'm planning to implement vibrato using the sweep units (so no triangle vibrato), and I'm not sure if this is possible. I want it this way so that I can get away with less code: all I have to do is alternate between upward and downward pitch bends on a fixed timer (half a period down, a full period up, then half a period down to complete the triangle shape). Vibrato kicks in once half the note duration has elapsed, if vibrato "mode" has been enabled by an effect.
2. Effects to apply to notes are all contained in a single byte, which is changed by an effect
msb| TDV- IIII |lsb
T - Tone drum mode (pulse only): if set, discard note and play a pre-defined drum sound with the sweep unit and envelope
D - Detune mode: if set, add one to the low period-shadow register after fetching it if a new note byte is read
V - Vibrato mode (pulse only): if set, apply vibrato at a pre-defined rate with the sweep units at half note duration.
I - Select instrument (pulse only): Selects the instrument for the pulse channels. All instruments are 8 bytes, written successively to $4000/$4004; at entry $03, playback waits for note off (so the last 4 entries are the "release" phase).
3. Should every possible note pitch be directly accessible with a single byte, with duration set independently (which would allow more flexibility with song speed and such)? Or should the high 4 bits select a duration in frames from a lookup table and the low 4 bits select the note (with octaves set by an effect)? Which one is more efficient?
4. What size should I expect? What always gets me and takes my motivation is thinking about possibly ending up with a program way too large or way too CPU-intensive. Or should I just not worry about that at all?
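To make the effect byte from point 2 concrete, here is a minimal Python model of how the `TDV- IIII` layout could be unpacked. It is only a sketch of the bit layout as described above; the names `Effects` and `decode_effects` are made up for illustration and are not part of any engine.

```python
from typing import NamedTuple


class Effects(NamedTuple):
    tone_drum: bool   # T, bit 7
    detune: bool      # D, bit 6
    vibrato: bool     # V, bit 5
    instrument: int   # IIII, bits 3-0 (bit 4 is unused)


def decode_effects(byte: int) -> Effects:
    """Split one effect byte into its flags and instrument number."""
    return Effects(
        tone_drum=bool(byte & 0x80),
        detune=bool(byte & 0x40),
        vibrato=bool(byte & 0x20),
        instrument=byte & 0x0F,
    )


print(decode_effects(0b10100011))
```

On the 6502 itself, putting T and D in bits 7 and 6 is convenient because a single BIT instruction copies those two bits into the N and V flags, so they can be tested with BMI/BVS without masking.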
A simple sound driver and its data will certainly fit in 32 kilobytes, more likely 16 if you don't use any DPCM. Look at the size of a well-optimized NSF for instance. And even then, dealing with CHR RAM isn't that hard; you just need to copy an 8192 byte chunk from one of the banks of PRG ROM to PPU $0000.
Are you planning to use the NES as an instrument, where the music driver has most of the RAM and CPU time available for itself? Or are you planning a music engine to be used by a game?
As for pitch, my music engine's phrase data format uses a range of 0-24 semitones above the phrase's base pitch. Semitone 25 means "tie", or hold the previous note and don't start a new note. Semitone 26 means "rest", or cut the current note. Semitones 27-31 have special meanings related to effects such as arpeggio, shifting the phrase's base pitch, and the like. This leaves three bits for duration, selected among 1, 2, 3, 4, 6, 8, 12, or 16 rows, where in-between durations are made with ties. A phrase's initial base pitch is specified in units of a semitone in the "conductor track" that tells when to play each phrase, not in the phrase itself, so that the phrase can be transposed up or down at various parts of a song.
Code:
76543210  Phrase bytecodes $00-$C7
||||||||
|||||+++- Duration index
+++++---- Pitch (0-24: semitone offset from base pitch,
          25: continue note, 26: stop note)
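As a rough illustration of that layout, the following Python sketch splits a phrase byte into its pitch field and duration. It assumes the eight durations are stored in ascending order in the lookup table, which the description above does not actually state; `decode_phrase_byte` is a hypothetical name.

```python
# Durations in rows, indexed by the low three bits.
# Ascending order is an assumption made for this sketch.
DURATIONS = (1, 2, 3, 4, 6, 8, 12, 16)
TIE, REST = 25, 26


def decode_phrase_byte(byte: int):
    """Return (kind, pitch_field, rows) for one phrase byte."""
    pitch = byte >> 3             # upper five bits
    rows = DURATIONS[byte & 0x07]  # lower three bits
    if pitch <= 24:
        kind = "note"             # semitone offset from the base pitch
    elif pitch == TIE:
        kind = "tie"
    elif pitch == REST:
        kind = "rest"
    else:
        kind = "effect"           # 27-31: arpeggio, base pitch shift, etc.
    return kind, pitch, rows


print(decode_phrase_byte(0xC7))
```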
Vibrato using the sweep units may drift from the intended center pitch when the APU's updates don't line up with the music engine's updates. In Tetris for NES, for example, I can change the line clear sound by rotating the falling piece at just the right moment.
I would use it in a game if anything (I want to get better at programming by doing something with the part of the NES I'm the most familiar with) and sound effects would only be played with the Pulse 2 and Noise channels. The SFX data format is simply 16 note bytes (raw period for noise straight away), and then 16 raw data bytes for $4004 and $400C. Sweep is disabled for sfx. Or maybe I could go for having vibrato only on Pulse 1 and on Pulse 2 I only use it for sfx, I'll see how it goes.
And so, whenever I play sfx, I interrupt music playback by letting all the updates for the channel run, except that I never write the shadow registers to the real ones while an sfx is active, right? Not that I'd have to be overly concerned with this in the case of noise sounds. I want to use the envelope generator for the drums, so I can't recover the current volume and continue the envelope anyway.
My music engine uses software envelopes for everything, with the attack phase and start of the decay specified frame-by-frame and the rest as a linear decrease in volume specified as a starting level of x and a decrease rate in units per 16 frames.
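The "units per 16 frames" idea can be sketched in Python as fixed-point arithmetic: keep the level multiplied by 16 and subtract the rate once per frame. This is only one plausible reading of the description; the rounding and the function name `envelope_volume` are assumptions, not taken from the actual engine.

```python
def envelope_volume(frame, attack, decay_start, decay_rate):
    """Volume (0-15) at a given frame of a note.

    attack      -- per-frame volumes for the start of the note
    decay_start -- level x at which the linear decay begins
    decay_rate  -- volume units lost per 16 frames
    """
    if frame < len(attack):
        return attack[frame]
    elapsed = frame - len(attack)
    # Work in sixteenths so a rate of 1 means "one unit per 16 frames".
    level_x16 = decay_start * 16 - decay_rate * elapsed
    return max(level_x16, 0) // 16


attack = [15, 12, 10]
print([envelope_volume(f, attack, 8, 4) for f in range(12)])
```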
For each channel, it interprets the phrase bytecode if needed, runs the envelope code, and writes the pitch and duty/volume values to locations in low zero page. Then it reads from the current sound effect on that channel, and if the sound effect is louder, it uses the duty/volume and pitch from the sound effect instead. A sound effect interrupts an existing effect only if it is longer than the existing effect's data. Square wave sound effects get played on channel 1 or 2, whichever has less remaining data.
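A minimal Python sketch of the two rules in that paragraph, under the assumption that "louder" means a strictly greater value in the low four bits of the duty/volume byte. The names `pick_output` and `should_replace_sfx` are hypothetical.

```python
def pick_output(music, sfx):
    """Choose which (duty_volume, period) pair reaches the APU this frame.

    music and sfx are (duty_volume_byte, period) tuples; sfx is None
    when no sound effect is active on the channel.
    """
    if sfx is None:
        return music
    music_vol = music[0] & 0x0F   # volume lives in bits 3-0
    sfx_vol = sfx[0] & 0x0F
    return sfx if sfx_vol > music_vol else music


def should_replace_sfx(new_length, remaining_length):
    """A new effect interrupts the current one only if it is longer
    than what is left of the current effect's data."""
    return new_length > remaining_length


print(pick_output((0xB8, 0x0FD), (0x7C, 0x050)))
```

Because the music keeps running into its own variables every frame, the song simply reappears once the effect gets quieter or ends.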
Drums are actually sound effects in my engine. For example, a kick drum has two components: noise at high pitch ($3) for one frame followed by a few frames of noise at the lowest pitch ($F), plus a few frames of triangle at descending pitches. This allows the triangle part of the drum to interrupt the bass line in a reasonable way. Hi-hats alternate between the long-period (hiss) and short-period (tonal) modes of the noise channel, which to me sounds slightly more realistic especially for open hats.
You can listen to an NSF of my engine, and the source code for the latest version is part of RHDE.
za909 wrote:
4. What size should I expect? What always gets me and takes my motivation is thinking about possibly ending up with a program way too large or way too CPU-intensive. Or should I just not worry about that at all?
I wrote a music engine that is about 1.6k of code, and it supports a subset of Famitracker features, so I can use Famitracker to make tunes and SFX for it. It runs in about 1700 cycles, typically, peaking at around 2400. I haven't done any significant optimization of it, so it could probably be a little smaller or a little faster if I needed either of those things. I have fit my whole game soundtrack and SFX and music driver into a single 32k bank (I am using BNROM), which seemed like a pretty reasonable size target.
Alright, I've finally started working on this, and at least I think I'll get this. It's not like you people figure out everything the first time in a matter of seconds, right? (Because I feel like an alien here, my brain is wired for art stuff and not exactly "science" stuff like this, but that doesn't make me give up without trying you know)
You know, I just want to know if I'm on the right track or should find another hobby.
So this is what I have for now, initiating playback of a new song. I load the song header address to temporary zero page ram and then the pointers for the 4 channels. I hope I'll be able to do this with the planned virtual registers and counters in 32 zero page bytes and 32 regular RAM bytes. (Maybe that's too much as it is, but I have nothing to compare to)
It does what it's supposed to in a test file, and then crashes with stack overflow, because it just kind of ends for now.
Code:
.org $8200
SoundBegin:
; First of all, check if the sound engine is
; enabled at all
lda prog_flag1
and #%10000000
bpl EndSound
jmp PlayBack
EndSound:
rts
PlayBack:
; First, check if the song to be played is a new
; one, or the same as in the last frame
lda cur_song ; Other stuff requests a song here
cmp prev_song
sta prev_song
bne InitSongl_00
jmp ProcessFrame
InitSongl_00:
; Fetch the start address for all 4 channels
ldy #$00
InitSongl_01:
ldx cur_song
lda songTBL,x
sta temp_1
inx
lda songTBL,x
sta temp_0
InitSongl_02:
; CH addresses will have to be copied to temp. memory for
; use with indirect read!
lda (temp_0),y
sta p1_addrhi,y
cpy #$07
beq InitSongl_03
iny
jmp InitSongl_02
InitSongl_03:
; Clear sound memory
lda #$00
ldx #$00
InitSongl_04:
sta p1_shvol,x
cpx #$15
beq InitSongl_05
inx
jmp InitSongl_04
InitSongl_05:
; Clear on page 3
ldx #$00
InitSongl_06:
sta p1_timerload,x
cpx #$08
beq ProcessFrame
inx
jmp InitSongl_06
songTBL:
; The data doesn't make any sense for now
.dw $9000,$9000,$9000,$9000,$9000,$9000,$9000,$9000
.dw $9000,$9000,$9000,$9000,$9000,$9000,$9000,$9000
ProcessFrame:
nop
I have a few pretty easy questions here, since it could save a couple bytes during certain conditional jumps, and additions if it works. So for example if I lda #$00, the zero flag is set. But what happens if I load a non-zero value? Is the zero flag cleared or unaffected?
Does the state of the carry flag actually affect the result in the accumulator during additions and subtractions? As in, does it change any of its 8 bits? So can I get rid of clc in this piece of code?
Code:
EffEnd:
; Increment ch address and return
; If wrap around occurs, increment hi address
clc
adc p1_addrlo,x
sta p1_addrlo,x
bcs EffEndl_00
rts
EffEndl_00:
inc p1_addrhi,x
rts
Yes. The adc instruction computes (A + value from memory + value from carry). Bits 7-0 of the result go to A, and bit 8 goes to carry. So if the carry is set when the adc instruction is executed, the CPU adds 1 to the result. The clc prevents the CPU from adding 1 by ensuring that the contribution of the value from carry to the sum is 0. If you can find some other way of proving that carry is 0 before it hits that line of code, you can drop the clc.
The carry does not affect the inc and dec instructions (including inx, iny, dex, and dey), nor do they affect the carry.
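To see the effect of the carry on paper, here is a small Python model of adc (binary mode only) together with the pointer increment from the EffEnd routine. It is a sketch for experimenting with values, not emulator-grade code, and the function names are made up.

```python
def adc(a, operand, carry):
    """Model of ADC: returns (new A, new carry)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8   # bits 7-0 to A, bit 8 to carry


def advance_pointer(lo, hi, step):
    """Same idea as EffEnd: add step to the low byte, bump the
    high byte when the addition wraps around."""
    lo, carry = adc(lo, step, 0)      # clc / adc
    if carry:                         # bcs
        hi = (hi + 1) & 0xFF          # inc
    return lo, hi


print(adc(0x10, 0x20, 0))   # carry clear
print(adc(0x10, 0x20, 1))   # carry set: one more than expected
print(advance_pointer(0xFE, 0x90, 3))
```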
Thanks, so really in this case it acts as if the accumulator was a sort of "9-bit" register. My first question did not get answered though, I guess you just forgot about it or didn't notice it.
No big deal because while I'm at it I feel the need to get the rest out of the way (it might be worth just contacting some of you privately instead of polluting the forum with my shit in the long run)
1. BRK. Why do I see no use for this (yet)? It's all dependent on software, so the only way I can make use of it, is by endlessly looping and waiting for NMI to do something. But even then, I can just jsr or jmp. Is this just some feature that's not very useful for the NES, but instead in other 6502 based machines? OR am I missing the point entirely?
2. ASL & LSR vs. ROL & ROR. How are they any different from each other? They seem to be doing the same things, have the same addressing modes available and even affect the status flags in exactly the same way.
3. This one is not very important because I'm not going to use unofficial opcodes, but are there any that affect the unused status flag?
1. Sharing a vector with IRQ, NMI/IRQ hijacking, and the stack juggling needed to read the byte after the opcode (as on other 6502-based machines) do make it less useful.
2. ROL and ROR shift in the carry bit, whereas ASL and LSR shift in a constant 0 bit.
3. No, and as far as I know the unofficial opcodes that use the ALU are not affected by the D flag.
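The difference in point 2 is easiest to see side by side. Below is a small Python model of the four instructions operating on one byte; each function returns the new value and the new carry. The 16-bit helper at the end shows the usual reason rotates exist: chaining a shift across several bytes. Function names are illustrative only.

```python
def asl(value, carry):
    return (value << 1) & 0xFF, value >> 7            # 0 shifted in at bit 0


def rol(value, carry):
    return ((value << 1) & 0xFF) | carry, value >> 7  # old carry into bit 0


def lsr(value, carry):
    return value >> 1, value & 1                      # 0 shifted in at bit 7


def ror(value, carry):
    return (value >> 1) | (carry << 7), value & 1     # old carry into bit 7


def shift16_left(lo, hi):
    """asl lo / rol hi: the bit that falls out of lo lands in hi."""
    lo, carry = asl(lo, 0)
    hi, carry = rol(hi, carry)
    return lo, hi


print(asl(0x81, 1), rol(0x81, 1))
print(shift16_left(0x80, 0x01))
```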
43110 wrote:
3. No, and as far as I know the unofficial opcodes that use the ALU are not affected by the D flag.
I don't know what he meant by the "unused" flag, but bits 4 and 5 don't even physically exist in the CPU, so they can't be affected by anything. The decimal flag (bit 3) is also completely disconnected in 2A03, so it doesn't affect anything, be it official or unofficial instructions, even though the flag exists.
You can still set the D flag with CLD / SED but there is not much point. You could use it for a boolean, but it's probably more trouble than it is worth to read it. Maybe:
Code:
php
pla
and #$08
; is D set?
bne there
; is it clear?
beq here
Thank you for your answers, I'd really like to keep this going because programming is such a good remedy for stress for me. (Does anyone else feel that way too?)
So really there's not much point in trying to use the D flag. (If you're craving every single bit of memory you can get that badly, you're probably not doing a very good job?)
And again, I've run into something fairly specific. I want to test if a value is higher than another fixed value, so I use subtraction and if the result is below zero, I do a certain action. Though I don't know which status flag to check.
The 6502 reference I'm using tells me this about sbc:
Carry Flag Clear if overflow in bit 7
...
Negative Flag Set if bit 7 set
So do I check N, or C to know what the result is? In my code (to avoid having to include a bunch of $00 bytes in the high period table) I use sec before the subtraction, so C should be cleared after the instruction if the result is negative, and if I subtract a value too large, I can't rely on N getting set.
Code:
; If the note is high, automatically load 0 for hi period
lda temp_3
sec
sbc #$25
bcc LoadPeriodl_00 ; Manually find the correct hi period
lda #$00
sta p1_shhi,x
rts
I recommend reading these:
http://6502.org/tutorials/compare_instructions.html
http://6502.org/tutorials/compare_beyond.html
For unsigned comparisons you want to check C and/or Z. CMP is equivalent to SEC SBC, but it doesn't change the value in A.
For signed comparisons you need to do a little more work, see the second article.
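A small Python model of the flags after CMP (or SEC followed by SBC) may help show why C is the one to test for unsigned values and why N can mislead. It only models 8-bit unsigned inputs, and `cmp_flags` is a made-up name.

```python
def cmp_flags(a, operand):
    """Flags after CMP: C set if A >= operand, Z set if equal,
    N is bit 7 of the 8-bit difference."""
    diff = (a - operand) & 0xFF
    return {"C": a >= operand, "Z": a == operand, "N": bool(diff & 0x80)}


print(cmp_flags(0x30, 0x25))   # A above the threshold
print(cmp_flags(0x10, 0x25))   # A below the threshold: C clear
print(cmp_flags(0xF0, 0x10))   # A far above, yet N is set
```

In the last case the difference is $E0, so N is set even though A is the larger value; that is why N is unreliable here while C is not.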
Alright, now I'm at the point where the pulse channel handling is 90% finished, so I need to start thinking about what to do with the triangle. I'm planning to have two modes for it, one with infinitely held notes (but upon reading a delay $00 byte I turn it off with its bit in $4015) and one using the linear counter (in which case I'll probably ignore note delays, since silencing the channel is automated)
An effect is used to change this, the parameter is saved to RAM, and during data processing for the triangle channel, it reads this value (if it's $00, use infinite length, if non-zero, use it as the linear counter load)
But I don't see how the counter load affects the note length at all. I thought it was just a 7-bit countdown at 240Hz, and that's it, and I tested it with the SNDTEST.nes ROM but it's just all weird and I'm not sure what's going on, and why I randomly get endless notes even though bit 7 of $4008 is set.
So linear counters, how do they work?
The length counter loads a value from a built-in table. My recommendation is to disable the length counter and just use the volume to control on/off for the channel (except the triangle, which doesn't have volume). When it's disabled, though, you still need to enable the channel by writing to $4003/4007/400B/400F to load a non-zero value into the length counter. There's really nothing that can be done with the length counter that can't be done with software control of the volume, so it's probably only worth using for an extremely tiny music engine.
The triangle's linear counter (different from the length counter) can be used to make very short blips, since it can operate at 240 Hz. Sunsoft used this for croaking noises, for example.
I never use the length counters for anything. My pulse channels are fed with volume/duty register writes every frame from a table accessed via a table of pointers (these are the instruments), so it's just a matter of always sending the correct bit to keep the length counter disabled. The triangle will not use it either (except that I should probably write 0000 1xxx to $400B on every new note while using the linear counter so that it doesn't interfere) and since I use the envelope for the noise channel, I have nothing better to do than writing 0000 1000 to this length counter as well, so it doesn't cut anything. Sfx will be entirely done by software, so that won't be much of a problem either.
What I couldn't figure out was that bit 7 of $4008 is not a toggle between the length and linear counters, but rather a switch for both, so while using the linear counter, the length counter can still mess up the intended length if its value is in the range of the other's. So really, what the linear counter essentially is, is an extension of the length counter, which allows 240 Hz accuracy for the first 32 frames.
Yeah, the length counter and linear counter unfortunately have to both be on at the same time. It just means that you must select a long enough table entry for the length counter if you are using the linear counter.
It's pretty dumb.
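As a back-of-the-envelope check, the following Python sketch estimates how long a triangle note lasts when both counters are running. It assumes NTSC timing with the linear counter clocked about four times per video frame (roughly 240 Hz) and the length counter about twice per frame; the function name is made up and the result is an approximation, not cycle-exact.

```python
def triangle_note_frames(linear_reload, length_value):
    """Approximate note length in 60 Hz frames.

    linear_reload -- 7-bit value written to $4008 (0-127)
    length_value  -- the value the length counter was loaded with
                     (the table entry, not the 5-bit index)
    The note stops when either counter reaches zero.
    """
    linear_frames = linear_reload / 4   # ~4 clocks per frame
    length_frames = length_value / 2    # ~2 clocks per frame
    return min(linear_frames, length_frames)


print(triangle_note_frames(127, 254))   # linear counter decides
print(triangle_note_frames(127, 10))    # a short length entry cuts it off
```

With the largest reload value the linear counter runs for a little under 32 frames, which matches the "first 32 frames" figure above.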
ORA & EOR vs. ADC & SBC
So the situation is the following: sometimes I need to set or clear certain bits in memory without affecting the rest of them. I figured this could be done by simply adding or subtracting in binary, but sometimes the bit is already set/clear, so a carry or borrow spills into the neighbouring bits and destroys them. That's how I started using the two or-s instead (ora for setting new bits, eor for clearing).
But can I still use addition/subtraction if I'm 100% certain that the bit is in such a state before it hits the instruction, that it won't mess up anything?
Example
Setting bit 4 in two ways:
Code:
lda #%10000000
adc #%00010000
Code:
lda #%10000000
ora #%00010000
Yes, but you need to make sure the carry flag is in a predictable state before the instruction.
za909 wrote:
ora for setting new bits, eor for clearing
ORA sets bits, AND clears bits, EOR flips bits.
za909 wrote:
But can I still use addition/subtraction if I'm 100% certain that the bit is in such a state before it hits the instruction, that it won't mess up anything?
Can you give an example where using ADC/SBC to set/unset bits actually has benefits over AND and OR? If not, then don't use them, unless you want to intentionally obfuscate your code.
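For comparison, here is a short Python sketch of the three bitwise operations next to plain addition, using the "set bit 4" example from the question. The helper names are made up; on the 6502 they correspond to ORA, AND with the inverted mask, and EOR.

```python
def set_bits(value, mask):      # ORA #mask
    return value | mask


def clear_bits(value, mask):    # AND #(mask inverted)
    return value & ~mask & 0xFF


def flip_bits(value, mask):     # EOR #mask
    return value ^ mask


BIT4 = 0b00010000

# Bit 4 clear beforehand: OR and addition agree.
print(bin(set_bits(0b10000000, BIT4)), bin((0b10000000 + BIT4) & 0xFF))
# Bit 4 already set: OR leaves it alone, addition carries into bit 5.
print(bin(set_bits(0b10010000, BIT4)), bin((0b10010000 + BIT4) & 0xFF))
```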
Ah OK, that makes sense. It was just the context that made it confusing for me, because the only thing that matters is that certain bits must be cleared by this code. It doesn't matter whether I flip them or AND them with the inverted mask, because when this fires off, both bits are always 1.
But at least now I know why controller reading requires EOR a lot.
This is part of a three-way conditional jump, which decides which bits to clear in the "susoff_flags" variable. All 4 channels have a "sustain level reached"-bit and a "note stop command was read, so process the key off part"-bit, which need to be reset when a new note is read. Noise is handled entirely differently. The instructions in caps are what used to be EORs (with all bits flipped in their parameter)
Code:
ClearPu1:
; If X is 0
AND #%01110111
jmp ClearPuCommon
ClearPu2:
; If X is 2 or greater. If greater, advance to ClearTri
bne ClearTri
AND #%10111011
ClearPuCommon:
; Common parts of resetting pulse channels
sta susoff_flags
lda #$00
sta p1_patchseq,x
sta p1_vibtimer,x
sta p1_vibphase,x
rts
...
I guess the JMP can be replaced with a conditional jump if I can find one that always succeeds.
ClearPu2 starts with a BNE instruction. What sets the Z flag used by BNE before you jumped or branched to ClearPu2?
Code:
lda foo
ldx bar
beq ClearPu1 ; if X == 0
cpx #2
bcs ClearPu2 ; if X >= 2
jmp XWasOne
ClearPu1:
...
jmp ClearPuCommon
ClearPu2:
bne ClearTri ; when you reach this point, the Z flag is still the result of cpx #2
...
There's no pressing need to replace the jmp with a branch. It's not faster than a jmp, just saves a byte, and might even be slower if the branch crosses a page. Unless you're really out of space, I'd recommend against it, as it will make that area of code harder to edit in the future (as you need to review the branch and make sure the condition is still set).
This
Code:
lda susoff_flags
cpx #$02
bcs ClearPu2
This decides which channel bits have to be cleared.
X serves as an index, depending on which channel's turn it is
00 means Pulse 1, 02 means Pulse 2, 04 = Tri, 06 = Noise
I don't know why I included the lda, but that's where I load, and then select a value to AND it with.
It looks like you are thinking of the branch instructions as "if <condition> goto label", which is okay, but it can sometimes work out better for code structure to invert the test and branch over the code that you want to execute.
As a simple example if you wanted to do something if A >30h
Code:
cmp #$30
beq skip
bcc skip
; do stuff here when A is greater than $30
skip:
So it's come to this yet again. So many really little things I just need to ask, but thank you all for the help up until now. You're helping me set sail and explore the amazing waters of asm, and it's a wonderful experience (and who knows, if I get experienced, who's to stop me from moving on to other languages?)
This time my lack of knowledge is somewhat asm6-specific.
Are constant declarations local between different assembly sources, and can I use declared constants in byte definitions like this?
Code:
; Data definitions
OFF = #$00
A-0 = #$01
A#0 = #$02
B-0 = #$03
C-1 = #$04
...
.db C-1,A-0,C-1,A-0,OFF...
Is there any way to make a jump table or a "read table" more versatile? To be precise, is there any way to put a variable in the place of p2sfxvolTBL? In this code for example, I want to read the Y-th byte from one of many .db lines, and the absolute addresses of all those lines are stored in a table, but I need to specify which table, and that makes my code impossible to reuse like this.
Code:
lda sfx_p2,x
asl a
tay
lda p2sfxvolTBL,y
sta temp_E
lda p2sfxvolTBL+1,y
sta temp_F
ldy sfx_p2seq
lda (temp_E),y
sta sfx_p2vol,x
Again, thank you for all your troubles, it'll take some time until I can give something back, and even longer to help back, but I'm willing to spend as much time and energy as I need to.
za909 wrote:
can I use declared constants in byte definitions like this?
Sure, but you aren't going to be able to use some of those characters in your symbol names. This should work though:
Code:
; Data definitions
OFF = $00
A_0 = $01
As0 = $02 ; A#0 ("s" for sharp is just one possible naming)
B_0 = $03
C_1 = $04
; ...
.db C_1,A_0,C_1,A_0,OFF ; ...
Using a subtraction symbol in your .db statement is going to result in a subtraction, and you aren't going to be able to place it in a name to begin with. Plus, you only want a number sign (hash mark) when specifying that an instruction is to use immediate mode. ( http://www.obelisk.demon.co.uk/6502/addressing.html )
za909 wrote:
.. is there any way to put a variable in the place of p2sfxvolTBL?
There may be a better way to approach that problem, but for that specific question, you may want to use a macro:
Code:
; ca65 style macro:
.macro loadPtr table
lda table,y
sta temp_E
lda table+1,y
sta temp_F
.endmacro
; ...
lda sfx_p2,x
asl a
tay
loadPtr p2sfxvolTBL
ldy sfx_p2seq
lda (temp_E),y
sta sfx_p2vol,x
Alright, now comes the final stretch. I got it to work, sort of. The triangle and noise channels work perfectly after a little bit of debugging and adding safety measures, and the whole thing is stable so I just need to find out what's wrong with the pulse channel code. Which brings me to a question. The FCEUX debugger really works wonders, but are there any other tools with more features you would recommend? One thing I could use a lot is adding breakpoints for certain CPU registers (stop if Y is loaded with the specified value, etc.) and that doesn't seem to be very common.
I'm also not sure how well I'm doing with using the resources, especially the RAM. The code is 1381 bytes at the moment with instrument/song/sfx data stripped, and my budget was 2k at most. The engine takes 1100-1300 cycles to run, though I'd expect a couple hundred more once everything in it is utilised. As for RAM, I gave it 32 bytes from zero page and 32 on page 3, plus it uses 10 bytes for temporary storage, which is going to be shared with all the other programs anyway so I'm not counting that. Not sure if it's ideal to give more RAM to sound or not (but if I want to add transposing and arpeggio, I'll have to).
Though it says Bank 7 ($C000-$FFFF is another source file), I'm using NROMs to test it for now, since I'm devoting a single 16k bank to sound so there's no bankswitching needed in the engine.
za909 wrote:
10 bytes for temporary storage, which is going to be shared with all the other programs anyway so I'm not counting that.
You should only share temporary variables within the same thread. If music runs in your NMI but your gameplay doesn't, for example, those variables will get clobbered by the music routine if it ever interrupts gameplay (e.g. if one frame runs long). Either divide your temporaries up by thread, or alternatively use the stack for them (the stack is always thread-safe).
That does seem like a good idea now that I realised the operation of the stack is not lethal to data you just write to $0100-$01FF. If pushing and pulling only affect S it's fine for this. I thought stack affecting operations rotate bytes as well, and whatever is at $0100 is lost if I push something onto the stack, which would make it unreliable outside of individual subroutines. I will probably use the bottom of the stack for NMI temporaries, it's a waste of RAM to have that many bytes untouched during rendering.
Yeah, it has become my standard practice to use $0100-$019F for the data that will be copied to VRAM in the next vblank.
Hmm, I'm a little worried because I use the sweep units for percussion (and I could create an instant cut command later), but what if the channel is silenced due to overflow from the sweep unit adder, or because of a period less than 8? Do I have to write to the length counter of the channel to restart the sequencer? Percussion uses envelopes and the longest length counter table entry; musical notes are completely software controlled.
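For reasoning about when that silencing happens, here is a small Python model of the pulse sweep muting rule as I understand it from the APU documentation: the channel is muted while the current period is below 8 or while the sweep's target period is above $7FF. Function names are made up, and this models only the muting condition, not the timing of the sweep itself.

```python
def sweep_target(period, shift, negate, is_pulse1):
    """Target period computed by the sweep unit's adder."""
    change = period >> shift
    if negate:
        # Pulse 1 subtracts one extra compared with pulse 2.
        return period - change - (1 if is_pulse1 else 0)
    return period + change


def is_muted(period, shift, negate, is_pulse1):
    """True while the sweep unit forces the channel silent."""
    target = sweep_target(period, shift, negate, is_pulse1)
    return period < 8 or target > 0x7FF


# A drum that sweeps down in pitch (period growing) eventually mutes itself.
for period in (0x200, 0x400, 0x600):
    print(hex(period), is_muted(period, 1, False, True))
```

As far as I understand it, this muting only forces the output to zero and does not touch the length counter, so the channel should sound again as soon as a valid period is written, without needing a separate length counter write. Treat that as something to confirm in an emulator debugger rather than as a guarantee.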