Why PM? Why not keep the conversation public so that others can benefit from it?
PCM data is fairly simple to understand. You can think of a waveform as a digital recording of how the cone in the speaker is going to vibrate. Several "samples" are taken, each of which is effectively the cone's position at a particular point in time.
This image may help:
https://upload.wikimedia.org/wikipedia/ ... cm.svg.png
The Y axis here would be the value of the sample (position of the cone), and the X axis would be time. The "taller" the wave, the louder it is. The "wider" the wave, the lower pitch it is.
The NES produces its waveforms by rapidly changing output values for a channel. For example, the pulse channels form a wave by rapidly alternating between '0' output and 'V' output (their volume). Effectively, the channel is clocked every cycle*, and that clock is divided by a down-counter (the "period"). When the period loops, the channel changes its output**. The period then acts as a sort of "delay" which dictates how much of a gap there is between changes in output. Smaller period = smaller delay.
So with a small period (and with volume=8), the pulse channel might output the following:
00008888000088880000888800008888
But a longer period might output this:
00000000888888880000000088888888
If you were to plot that output in a manner similar to the above image (Y axis=output value, X axis=time), you'd see that it forms a sort of "square" shaped wave. The wave is basically how the speaker cone moves, generating your output.
The interesting thing to note here is that the 'height' of the wave is dictated by V. Higher values for V will produce a taller wave, resulting in a louder sound. And the 'width' of the wave is dictated by the period. So a longer period will create a longer delay which results in a wider wave, and therefore a lower pitch. So the period is effectively the pitch control.
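To make the period/volume relationship concrete, here is a rough Python sketch of the simplified model described above. It is only an illustration: the function name pulse_output and its parameters are made up for this example, it assumes a plain 50% duty wave, and it ignores the duty cycle generator mentioned in the footnotes, so it is not how the real hardware is wired.

```python
def pulse_output(period, volume, num_cycles):
    """Return one output value per clock for a simplified 50% duty pulse wave.

    Hypothetical helper for illustration only; it models just the
    down-counter "delay" idea, not the real APU.
    """
    samples = []
    counter = period   # down-counter, reloaded with the period when it expires
    high = False       # False -> output 0, True -> output `volume`
    for _ in range(num_cycles):
        samples.append(volume if high else 0)
        counter -= 1
        if counter == 0:
            counter = period   # the period "loops"
            high = not high    # ...and the output changes
    return samples


def as_text(samples):
    """Join the samples into a digit string like the examples in this post."""
    return "".join(str(v) for v in samples)


print(as_text(pulse_output(period=4, volume=8, num_cycles=32)))
# 00008888000088880000888800008888
print(as_text(pulse_output(period=8, volume=8, num_cycles=32)))
# 00000000888888880000000088888888
```

Changing volume only changes the height of the nonzero runs, while changing period only changes how long each run lasts.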
You can think of the NES as outputting one sample every CPU cycle... effectively having a sample rate of about 1.79 MHz. The tricky part is scaling that data down... because PCs typically only want to output 44100 samples every second. The easiest way to do this (and the way I recommend starting) is just to do a "nearest neighbor" approach:
1789772.7272 / 44100 = ~40.58
So you can downsample the audio by keeping roughly one APU output out of every 40.58 cycles: ignore 40 cycles and use the 41st, then ignore 39 and use the 40th, and so on, mixing steps of 40 and 41 cycles so that the average stays close to 40.58.
This will more or less work, and is VERY easy to implement, but produces low quality sound (it'll sound grainy, particularly with higher tones).
There are higher quality approaches that can be done here -- but I wouldn't concern myself with that yet. Just get it outputting something, and worry about making it sound good later.
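For what it's worth, the nearest neighbor idea could be sketched like this in Python. This is a minimal sketch, not a drop-in implementation: downsample_nearest and the constant names are invented for this example, and it assumes you already have one APU output value per CPU cycle in a list (a real emulator would more likely do this incrementally as it runs).

```python
CPU_CLOCK_HZ = 1789772.7272   # CPU rate quoted in this post
OUTPUT_RATE_HZ = 44100        # typical PC output rate


def downsample_nearest(apu_samples, in_rate=CPU_CLOCK_HZ, out_rate=OUTPUT_RATE_HZ):
    """Keep roughly one input sample out of every (in_rate / out_rate).

    Hypothetical helper: tracks a fractional position so the gaps between
    kept samples mix 40 and 41 cycles and average out to the true ratio.
    """
    step = in_rate / out_rate   # ~40.58 input cycles per output sample
    next_pick = step            # cycle number at which to keep the next sample
    out = []
    for cycle, value in enumerate(apu_samples, start=1):
        if cycle >= next_pick:
            out.append(value)   # use this cycle's output...
            next_pick += step   # ...and ignore everything until the next pick
    return out


# Usage: feed it one output value per CPU cycle.
one_frame = [0] * 29780         # placeholder data, roughly one video frame of cycles
pcm = downsample_nearest(one_frame)
```

Everything between picks is simply thrown away, which is why this is easy to write but sounds grainy.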
*Technically I think the squares are clocked every APU cycle, which is every OTHER CPU cycle.
**The period divider actually clocks the duty cycle generator, which is responsible for changing the output.
I'm happy to help with questions / clarifications.