APU Triangle Popping Question

by jwdonal on 2011-07-08 (#81307)

My triangle channel seems to be working very well. All of the notes sound right when I compare them to Nestopia's triangle channel output with various games. But I'm still getting really bad "popping".

I'm silencing the triangle channel whenever the channel's period (i.e. $400A/$400B) is set to $7FF, $7FE, $001, or $000. I'm also making sure that when the triangle sequencer is not being clocked, I maintain the current output sample value rather than making it 0 or something (which I've read can also cause popping).

Is there anything else I need to do to prevent the popping? Here is a sample of my emu's triangle channel output for the Journey to Silius title screen song:

http://dl.dropbox.com/u/36237540/2011_07_09_jts_title_tri_silencing_bad.wav

EDIT: My logic to silence the triangle channel is as follows:

Code:

bad_period = (timer_period == 0x7FF) ||
             (timer_period == 0x7FE) ||
             (timer_period == 0x001) ||
             (timer_period == 0x000) ? 1 : 0;

tri_out = (!bad_period && len_cntr_nonzero && linear_cntr_nonzero) ? sequencer_value : 0;

Is the above logic correct for silencing the channel?

Thanks!

by thefox on 2011-07-08 (#81308)

Did you look at the wav in an audio editor? There are some strange sudden jumps in the samples when they get turned off, which cause the popping. It's not maintaining whatever value the triangle waveform was stopped at.

E: And the problem is obvious from your edit, you should still output the sequencer value when the channel is "silenced", but just not clock it.

by jwdonal on 2011-07-08 (#81309)

So I guess I should not be silencing the triangle channel when the counters are zero, but only when bad_period is true? The sequencer_value variable in my code above is maintaining a constant value whenever it is not clocked, but with my current logic when either of the counters is zero the channel will be pulled to 0. Which completely defeats the purpose of the sequencer maintaining its value!! Haha. I'm going to fix it and see what happens. Thanks for pointing that out fox!! I think you're on to something!! I will let you know what happens....stay tuned (design is synthesizing now)...

by jwdonal on 2011-07-08 (#81310)

YES! That fixed it! FINALLY! No more popping!

Thanks a lot fox!!!

I'm gonna post some new youtube vids of my new and improved APU!

by thefox on 2011-07-08 (#81311)

I would also get rid of the bad period exceptions unless you have verified that they are actually needed. But even if you leave them in, you probably should change the logic so that instead of forcing the value to 0 when it detects a bad period, it simply would stop clocking the sequencer. Otherwise you'll still get an occasional pop.

by jwdonal on 2011-07-08 (#81312)

thefox wrote:
I would also get rid of the bad period exceptions unless you have verified that they are actually needed. But even if you leave them in, you probably should change the logic so that instead of forcing the value to 0 when it detects a bad period, it simply would stop clocking the sequencer. Otherwise you'll still get an occasional pop.

Yes, I completely agree with you. So the final tri_out logic is:

Code:
tri_out = sequencer_value;

lol (really not logic at all, just pass-through straight to DAC, as I now realize it should have been all along)

_However_, the triangle sequencer clocking-enable logic is:

Code:
clk_tri_sequencer = (tri_clk_enable_from_timer && len_cntr_nz && lin_cntr_nz && !ultrasonic) ? 1 : 0;
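
For readers following along in software, here is a minimal Python sketch of the same idea: the output is always the current sequencer value, and "silencing" only gates the clock. The class and attribute names are hypothetical, and the `ultrasonic` threshold (period below 2) is an assumption for illustration, not something verified in this thread.

```python
# 32-step triangle sequence: 15 down to 0, then 0 up to 15.
TRI_SEQUENCE = list(range(15, -1, -1)) + list(range(16))


class TriangleChannel:
    """Hypothetical software model; names are illustrative only."""

    def __init__(self):
        self.step = 0            # current position in TRI_SEQUENCE
        self.length_counter = 0
        self.linear_counter = 0
        self.timer_period = 0

    def on_timer_clock(self):
        """Call each time the channel's timer expires."""
        ultrasonic = self.timer_period < 2   # assumed threshold
        if self.length_counter and self.linear_counter and not ultrasonic:
            self.step = (self.step + 1) % 32
        # Otherwise the step is simply held, so the output does not jump.

    def output(self):
        # Pass-through: the DAC always sees the current sequencer value.
        return TRI_SEQUENCE[self.step]
```

When either counter reaches zero the waveform freezes at its last level instead of snapping to 0, which is what removes the step that was heard as a pop.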

Here is my new and improved audio output (with the old to compare to):

http://dl.dropbox.com/u/36237540/2011_07_09_jts_title_tri_silencing_bad.wav
http://dl.dropbox.com/u/36237540/2011_07_09_jts_title_tri_silencing_good.wav

That's it! Jeez, everything is so much clearer to me now! Again, thanks so much for your help fox!

Jonathon

by Zepper on 2011-07-09 (#81319)

Uh... well, sort of. You never get a zeroed (centered) waveform, but a huge rectangle. To avoid such problem, I added volume decay to center the waveform.

by jwdonal on 2011-07-09 (#81327)

I have posted a new vid on my youtube channel demonstrating my new and improved APU! I broke out each channel individually so you can hear each one clearly. I annotated when each channel was enabled/disabled. Enjoy!

Journey to Silius (*Annotated* APU Demo):
http://www.youtube.com/watch?v=qn8HvL2Xheo

EDIT (some more vids)
--------------------------
Shatterhand (APU Demo):
http://www.youtube.com/watch?v=r5dV0OdjVj4
Super Mario Bros. 1 (APU Demo):
http://www.youtube.com/watch?v=Ihx-jyQjVvM
Super Mario Bros. 2 (APU Demo):
http://www.youtube.com/watch?v=fAaJ73lBN4E
Super Mario Bros. 3 (APU Demo):
http://www.youtube.com/watch?v=gTa9B8Bvgag

Note that these vids are primarily for demonstrating the APU. I know that I have lots of PPU bugs still so you don't need to let me know....unless of course you can tell me how to fix'em.

Jonathon

by Dwedit on 2011-07-09 (#81330)

Lots of incorrect duty cycles in SMB3. How long are you waiting before you update the duty cycle? That should happen instantly.

by jwdonal on 2011-07-09 (#81332)

Which world/level is it happening? All of them? I think I may know what I'm doing wrong. I'm going to try a fix and then I will repost a WAV of the world/level that you name.

Thanks Dwedit!

by Dwedit on 2011-07-09 (#81333)

First note in Grass Land's theme is the wrong duty cycle.
Jump sound effect is using the wrong duty cycle.
Some notes in SMB2 are wrong after it switches duty cycles. You can hear this clearly in the "overworld" music, it continues to use the old duty for the entire first note of the song.
It's also very annoying in the "underworld" music as well.

My guess is that you are updating the duty cycle only after writes to the period, instead of updating it immediately. SMB2 writes the period, then changes the duty cycle afterwards.
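
A rough Python sketch of the register handling this implies: the duty bits are latched on the write to the first register, independent of any period write. The class is hypothetical, the duty table is the commonly documented set of waveforms, and the detail that a period-high write restarts the sequence is stated as an assumption here.

```python
# Commonly documented pulse waveforms (assumed here for illustration).
DUTY_TABLE = [
    [0, 1, 0, 0, 0, 0, 0, 0],  # 12.5%
    [0, 1, 1, 0, 0, 0, 0, 0],  # 25%
    [0, 1, 1, 1, 1, 0, 0, 0],  # 50%
    [1, 0, 0, 1, 1, 1, 1, 1],  # 25% negated
]


class PulseChannel:
    """Hypothetical model of the duty/period registers only."""

    def __init__(self):
        self.duty = 0
        self.step = 0
        self.timer_period = 0

    def write(self, reg, value):
        """reg is the offset 0..3 from the channel's base address."""
        if reg == 0:
            # Duty takes effect immediately, with no period write needed.
            self.duty = (value >> 6) & 3
        elif reg == 2:
            self.timer_period = (self.timer_period & 0x700) | (value & 0xFF)
        elif reg == 3:
            self.timer_period = (self.timer_period & 0x0FF) | ((value & 7) << 8)
            self.step = 0   # assumed: period-high write restarts the sequence

    def clock_sequencer(self):
        self.step = (self.step + 1) % 8

    def waveform_bit(self):
        return DUTY_TABLE[self.duty][self.step]
```

With this arrangement, a game that writes the period first and the duty afterwards (the SMB2 order described above) still gets the new duty on the very first note.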

by jwdonal on 2011-07-09 (#81334)

Ok, thanks, that helps a lot. Here are some WAVs. The first is the original implementation that was used to make the youtube vid. The second is a new WAV sample with my corrected (I hope) implementation. Lemme know what you think. I'm not an audiophile but it does sound a little different - I think I at least made some type of positive (read "good") change.

The following WAVs are of me playing through World 1, Levels 1, 2, 3, then the first small castle.
http://dl.dropbox.com/u/36237540/2011_07_09_smb3_duty_cycle_update_bad.wav
http://dl.dropbox.com/u/36237540/2011_07_09_smb3_duty_cycle_update_good.wav

Thanks Dwedit!

I will also upload some clips of just the 2 pulse channels by themselves of the specific sounds that you mentioned (in case those would be more useful).

UPDATE:
As promised, here are some clips of _only_ pulse channel 1 in SMB3 which is used for the jump sound effect. It sounds better to me, but I'll let the experts decide.

http://dl.dropbox.com/u/36237540/2011_07_09_smb3_jump_pulse1_duty_cycle_update_bad.wav
http://dl.dropbox.com/u/36237540/2011_07_09_smb3_jump_pulse1_duty_cycle_update_good.wav

by Banshaku on 2011-07-09 (#81336)

Is it just me, or does the SMB1 jump sound not seem quite right? I don't know the reason; my ears just feel that it's wrong.

by Dwedit on 2011-07-09 (#81337)

In the youtube video based on the older buggy code (which he has since fixed), it's using the wrong duty cycle.

by jwdonal on 2011-07-09 (#81338)

Dwedit wrote:
My guess is that you are updating the duty cycle only after writes to the period, instead of updating it immediately.

Yup. That's exactly what I was doing. Jeez, how the heck you guys "just know" this stuff never ceases to amaze me. I asked Kevtris the other day about this stupid triangle bug (that I couldn't get rid of to save my life) and he only had to listen to a short sample and was like, "oh, I know exactly what the problem is." Lol...what the??

Dwedit wrote:
(which he has since fixed)

Just to be sure, are you saying that the new implementation sounds correct now?

Thanks Dwedit!

by jwdonal on 2011-07-09 (#81339)

Banshaku wrote:
Is it just me, or does the SMB1 jump sound not seem quite right? I don't know the reason; my ears just feel that it's wrong.

Yes, it was wrong, as Dwedit confirmed. Here is the before and after version of the SMB1 jump (pulse channel 1 only).

http://dl.dropbox.com/u/36237540/2011_07_09_smb1_jump_pulse1_duty_cycle_update_bad.wav
http://dl.dropbox.com/u/36237540/2011_07_09_smb1_jump_pulse1_duty_cycle_update_good.wav

I can't thank you guys enough for your help. If you notice anything else at all that sounds messed up please let me know. Show no mercy!!

by Dwedit on 2011-07-09 (#81341)

If you want some good testcases for frequency sweeps, try Zelda 2 (sword beam sound effect, and hitting a rock with the sword), and Mike Tyson's Punch Out's sound effects. FCEUX even gets those wrong.

Also, Dr. Mario will reveal whether you are correctly shutting off the channel when the sound period gets too low.

by jwdonal on 2011-07-09 (#81346)

Thanks for the tip! How do these sound?

Zelda 2 Sword Beam Effect (Pulse Channel 1 Only):
http://dl.dropbox.com/u/36237540/2011_07_09_zelda2_sword_beam_pulse1.wav

Mike Tyson's Punch-Out SFX (Pulse Channels 1&2):
http://dl.dropbox.com/u/36237540/2011_07_09_punchout_sfx_pulse12.wav

Dr. Mario (All Channels):
http://dl.dropbox.com/u/36237540/2011_07_09_drmario_all_channels.wav

by tepples on 2011-07-09 (#81347)

Dr. Mario seems to sound OK, except for what might be a problem with channel balance: noise is a bit louder than I remember it. How are you handling the case where the output level of a channel changes multiple times during one output sample period? Are you using a box filter or something else to smooth out fast changes, or are you just sampling once every 1/44100 of a second? On the NES, noise periods 3, 2, 1, and especially 0 sound noticeably quieter than 4 because more of their energy is shifted out of the <20 kHz audible band.

by jwdonal on 2011-07-10 (#81351)

Hey tepples! Hmm...so my audio codec (which is an AC'97 chip on my Xilinx board, and the board that I used for generating these WAV files) is configured for 48kHz. It also supports 44.1kHz. I just arbitrarily chose 48kHz. I don't know which is better, I'm not an audio expert, but if someone could tell me that would be much appreciated.

For the mixer I use the LUT method described on the wiki: (http://wiki.nesdev.com/w/index.php/APU_Mixer_Emulation)

The 16-bit data output of the mixer is then sent straight to the AC'97 codec. I have no low/high-pass filters between the mixer and the AC'97 data buffer. Although Kevtris mentioned on mIRC a couple days ago that I should think about adding a low-pass filter.

Does that answer your question? Sorry, I'm kind of lacking in my comprehension of filters and sampling rates and such. Hopefully my reply wasn't too dumb.

by tepples on 2011-07-10 (#81356)

Let me rephrase: The APU generates a sample on every CPU cycle, or 315/176*1000000 = 1789773 Hz. This means at a 48000 Hz output, the APU generates 315/176*1000000/48000 = 37.3 samples for every output sample. How do you turn the 37.3 samples that the APU makes into one sample that you send to the AC'97 codec?

by cpow on 2011-07-10 (#81358)

tepples wrote:
Let me rephrase: The APU generates a sample on every CPU cycle, or 315/176*1000000 = 1789773 Hz. This means at a 48000 Hz output, the APU generates 315/176*1000000/48000 = 37.3 samples for every output sample. How do you turn the 37.3 samples that the APU makes into one sample that you send to the AC'97 codec?

I'm not trying to hijack this thread, but my method is just to keep track of the APU-generated samples between each sound-card-needed sample and do a simple average of the APU-generated samples to get the sound-card-needed sample. That's probably not the best way, but it sounds alright. What is "the best way"?

by tepples on 2011-07-10 (#81361)

cpow wrote:
my method is just to keep track of the APU-generated samples between each sound-card-needed sample and do a simple average of the APU-generated samples to get the sound-card-needed sample.

That's a box filter or mean filter, and it's an OK first approximation. I suspect jwdonal is just point-sampling, which will produce noticeable artifacts with high square or noise frequencies.

Quote:
That's probably not the best way, but it sounds alright. What is "the best way"?

Probably blargg's blip-buf library, which resamples stepwise functions like those generated by the APU using "band-limited synthesis".
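
To make the difference between point sampling and a box filter concrete, here is a small Python sketch. It is only an illustration under simple assumptions (one input sample per CPU cycle, a hypothetical `downsample` helper); it is not band-limited synthesis and does not show how blip-buf works.

```python
from fractions import Fraction

CPU_HZ = Fraction(315, 176) * 1_000_000   # NTSC CPU clock, about 1789773 Hz


def downsample(apu_samples, out_hz=48000, box=True):
    """Reduce one-per-CPU-cycle samples to out_hz.

    box=True averages each group of inputs (box/mean filter);
    box=False keeps only the first input of each group (point sampling).
    """
    ratio = CPU_HZ / out_hz           # about 37.3 inputs per output
    out = []
    start = 0
    n = 1
    while True:
        end = int(n * ratio)          # fractional boundary, rounded down
        if end > len(apu_samples):
            break
        group = apu_samples[start:end]
        out.append(sum(group) / len(group) if box else group[0])
        start = end
        n += 1
    return out
```

Feeding in a square wave that toggles every CPU cycle (far above the audible band), the box filter returns values close to the midpoint, while point sampling returns full-scale values that land wherever the sample instants happen to fall. That second behaviour is the aliasing described above.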

by Jarhmander on 2011-07-10 (#81364)

tepples wrote:
I suspect jwdonal is just point-sampling, which will produce noticeable artifacts with high square or noise frequencies.

That's nearest neighbor interpolation, isn't it?

by jwdonal on 2011-07-10 (#81374)

tepples wrote:
Let me rephrase: The APU generates a sample on every CPU cycle, or 315/176*1000000 = 1789773 Hz. This means at a 48000 Hz output, the APU generates 315/176*1000000/48000 = 37.3 samples for every output sample. How do you turn the 37.3 samples that the APU makes into one sample that you send to the AC'97 codec?

(Apologies for the lateness of my reply.)

I honestly never really thought about this before because what I have now always just "seemed" to work/sound OK. But now that I've actually taken a few minutes to think about how I have it set up - it's pretty kludgy. Haha.

Hardware Layout
------------------------------------------------------------------------------------
In order for my APU to write samples out to the AC97 codec I need to cross a clock domain. That is, I need to transfer data from the 1.79MHz clock domain to the 12.288MHz AC97 domain. Since these two clock domains are completely asynchronous from one another the typical solution in FPGA design is to insert a bisynchronous FIFO/RAM in between the 2 clock domains (otherwise you will get all sorts of insane timing errors). I use the RAM method.

The RAM block is a simple dual-port, that is, one side of the RAM is write-only (the APU side), while the other side is read-only (the AC97 side). The write-side is synchronous to the 1.79MHz clock while the read-side is sync to the 12.288MHz clock. Additionally, I simply tie the RAM's address lines to all zeros and only ever read/write from/to a single RAM address (this is not as big a waste as it seems because these internal FPGA RAMs are very small and plentiful). Essentially, I create a single bisynchronous register to use between the APU and AC97 clock domains.

Sampling Control
------------------------------------------------------------------------------------
On each CPU cycle the current 16-bit value on the output of the APU's mixer is written into the bisync RAM register. There is no handshaking/waiting or communication to the AC97 controller. The bisync register is just updated every CPU cycle with the current mixer output without fail.

On the AC97 side of the bisync register (i.e. the read side) it is a bit more complex. The AC97 codec (i.e. the actual chip on the board) has a serial data input interface to receive audio samples. The 16-bit audio samples are shifted out 1 bit at a time by the AC97 *controller* (i.e. the HDL AC97 control module that resides in the FPGA and sends data to the codec chip).

However, it's actually much more complex than that because AC97 codecs utilize a communication protocol based on "slots" and "frames". That is to say, you can't just shift out the 16-bit audio sample all by itself and then move on to shifting out the next sample. What really happens is that for every two 16-bit samples (two are needed for stereo audio) sent to the codec, you must send 13 "slots". These 13 slots make up one frame (one stereo sample), including all codec control overhead.

Each AC97 "frame" is 256 clock cycles and is organized as follows:

- 16 cycles for Slot 0 (16 bits)
- 20 cycles each for slots 1-12 (20 bits each)

(Just FYI, slots 2&3 hold the left/right 16-bit audio samples with 4 bits of 0 appended to the end in order to fill up the 20-bit slot. Also, I simply copy the mono APU mixer output to both the left/right channels. All other slots are just codec control overhead.)

The total frame time is therefore [16 + 12*20 = 256 cycles]. With a bit-clock frequency of 12.288 MHz, the frame frequency is 48,000Hz (12.288MHz/256). This results in a frame period of 20.83us (48kHz^-1).
------------------------------------------------------------------------------------
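
The frame arithmetic above can be checked with a few lines of Python. The constant names and the `pack_pcm_slot` helper are made up for this sketch; the numbers are the ones given in the post.

```python
SLOT0_BITS = 16            # tag slot
DATA_SLOTS = 12            # slots 1-12
DATA_SLOT_BITS = 20
BIT_CLOCK_HZ = 12_288_000  # AC97 bit clock

frame_bits = SLOT0_BITS + DATA_SLOTS * DATA_SLOT_BITS   # bits per frame
frame_rate_hz = BIT_CLOCK_HZ / frame_bits               # frames per second
frame_period_us = 1e6 / frame_rate_hz                   # microseconds per frame

cpu_hz = 315 / 176 * 1e6                                # about 1789773 Hz
cpu_cycles_per_frame = cpu_hz / frame_rate_hz           # APU samples per frame


def pack_pcm_slot(sample16):
    """Place a 16-bit sample in a 20-bit slot with four zero bits appended."""
    return (sample16 & 0xFFFF) << 4
```

This gives 256 bits per frame, a 48000 Hz frame rate, a frame period of about 20.83 us, and about 37.3 CPU cycles per frame.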

Well, that was probably more than you ever wanted to know about how my APU works. Haha. But it did help me to actually type out how it all works. I can already see where there are some issues.

So to answer tepples' question about how I turn 37.3 (i.e. 20.83us/(1.79MHz^-1)) samples into 1 AC97 sample...the answer is...uh, I don't do anything. LMAO. And, yes, I know that's very bad and stupid, but when I originally wrote my codec interface I just implemented the absolute simplest method possible to get data out to the codec without considering what havoc I might be causing. Hehe. I was much too eager to get working on the fun stuff!

Would like to hear anybody's thoughts on this implementation and how I might make it better. Tepples, I think your sample averaging would be helpful, along with an APU-to-AC97 "handshake" in between each audio sample...or something like that. I really appreciate your post. You really made me stop and think about the mess I'm actually making in my audio output. LOL

cpow wrote:
I'm not trying to hijack this thread, but my method is just to keep track of the APU-generated samples between each sound-card-needed sample and do a simple average of the APU-generated samples to get the sound-card-needed sample. That's probably not the best way, but it sounds alright. What is "the best way"?

You aren't hijacking the thread - what you said was completely on-topic. In fact, I think I might just take what you have implemented in software (basically what tepples suggested) and recreate it in hardware. Should be pretty easy and it's about 100x better than what I have now.

by jwdonal on 2011-07-10 (#81382)

So how about a running average of the mixer output? Would that make sense? I tested it out and here are two recordings from SMB1 (triangle channel only).

No averaging (original implementation described in previous post):
http://dl.dropbox.com/u/36237540/2011_07_10_smb1_level1song_triangle_noavg.wav

Running average of last 32 samples:
http://dl.dropbox.com/u/36237540/2011_07_10_smb1_level1song_triangle_runavg.wav

Which one do you guys think sounds better? Personally, I think the running average sounds less "buzzy" than the non-averaged version, which is a positive improvement.
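
For anyone wanting to try the same thing in a software emulator, a running average over the last 32 samples can be kept with one add and one subtract per sample. This is a Python sketch with a hypothetical `RunningAverage` class; it is not a description of the actual HDL.

```python
from collections import deque


class RunningAverage:
    """Mean of the last n inputs, updated in O(1) per sample."""

    def __init__(self, n=32):
        self.n = n
        self.window = deque([0] * n, maxlen=n)   # starts filled with zeros
        self.total = 0

    def push(self, sample):
        self.total += sample - self.window[0]    # add newest, drop oldest
        self.window.append(sample)                # maxlen evicts the oldest
        return self.total // self.n               # for n = 32, a 5-bit shift
```

Because the window length is a power of two, the division reduces to a shift, which is convenient in hardware as well.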

Pz!

by tepples on 2011-07-11 (#81390)

Yeah, much of the aliasing is gone in the average-filtered version. Good job.

by jwdonal on 2011-07-11 (#81400)

Awesome! Thanks tepples!

by kyuusaku on 2011-07-11 (#81404)

So has anybody calculated the response of this filter? It's a 32-tap FIR where every coefficient is 0.03125?

by jwdonal on 2011-07-11 (#81408)

Here is a re-recording of Dr.Mario, but now with the 32-sample, running average.

>> 0:00-2:13 = intro music and demo playing (all channels)
>> 2:13-2:36 = new game menu (all channels)
>> 2:39-2:50 = new game menu (pulse 1 only)
>> 2:50-3:04 = new game menu (pulse 2 only)
>> 3:04-3:19 = new game menu (triangle only)
>> 3:20-3:35 = new game menu (noise only)
>> 3:35-3:53 = new game menu (dmc only)
>> 3:54-4:58 = game play (all channels)

http://dl.dropbox.com/u/36237540/2011_07_11_drmario_various_channels.wav

Tepples, does the noise problem you mentioned sound any better in this version than the previous version?

by jwdonal on 2011-07-11 (#81436)

kyuusaku wrote:
So has anybody calculated the response of this filter?

I have not analyzed the impulse response of this filter.....all I know is that it does sound better on all the games I've tried. So that's good enough for me. Lol. Really all the filter is doing is "smoothing" out the waveform - it's not specifically a low/high/band-pass type of FIR filter. It's just a smoothing FIR function. Maybe one day I will add a proper low-pass filter...but right now I'm not sure I care enough to bother with it since this filter seems to work just fine.

kyuusaku wrote:
It's a 32-tap FIR where every coefficient is 0.03125?

Yes, technically speaking it is a "31st-order, 32-tap Moving Average FIR Filter" where each tap has an equal "tap-weight" (i.e. coefficient) of 1/32 (or 0.03125).

by kyuusaku on 2011-07-12 (#81446)

Well it's definitely a LPF, just a crazy one. Each sample can change 1/32 at most so logically it must reject 1789772/32 = 55.9 kHz at least. If the difference is audible then it's probably lower than that.
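
The response can be worked out directly. For an N-tap filter with equal weights running at sample rate fs, the magnitude is |sin(N*pi*f/fs) / (N*sin(pi*f/fs))|. The Python sketch below evaluates that, assuming the filter runs at the CPU rate as described earlier in the thread; the function name is made up for illustration.

```python
import math

FS = 315 / 176 * 1e6     # filter runs at the CPU rate, about 1789773 Hz
N = 32                   # number of equal-weight taps


def moving_average_gain_db(f_hz, n=N, fs=FS):
    """Magnitude response of an n-tap equal-weight FIR, in dB."""
    x = math.pi * f_hz / fs
    if math.sin(x) == 0:
        return 0.0                      # DC (and multiples of fs): unity gain
    mag = abs(math.sin(n * x) / (n * math.sin(x)))
    return 20 * math.log10(mag) if mag > 0 else float("-inf")


first_null_hz = FS / N                      # about 55.9 kHz
gain_20k = moving_average_gain_db(20_000)   # about -1.9 dB
gain_24k = moving_average_gain_db(24_000)   # about -2.8 dB at the 48 kHz Nyquist
```

So the first null does sit at 1789772/32 = 55.9 kHz, with a gentle droop of a couple of dB toward the top of the audible band. Content just above 24 kHz is only lightly attenuated, which is one reason this is a weak anti-aliasing filter.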

by jwdonal on 2011-07-12 (#81447)

Hmm...while we're on the subject...if I were to create a proper low-pass filter what would be the best cutoff frequency and number of taps to use for the NES? Any suggestions?

Thanks!

by jwdonal on 2011-07-12 (#81474)

kyuusaku wrote:
So has anybody calculated the response of this filter?

In case you're still interested, here is the frequency response of various orders of smoothing FIR filters with all coeffs = 1/(N+1), where N is the order of the filter. It is essentially showing a slight attenuation across the entire frequency range - which makes sense.

by kyuusaku on 2011-07-12 (#81484)

Cool, but how do you interpret it?

Both axes are in decibels? Is Y in amplitude or power?

X is pretty confusing. If X is frequency and 10^0 is the sample rate, then there's really poor roll off way beyond the sample rate... I don't get how there could be much power (only harmonics?) above f/32.

As for another LPF I guess going for accuracy you could approximate the response of the passive filters in the audio path. I know we had a thread on this but I'm not sure what the results were.

by jwdonal on 2011-07-14 (#81539)

I am now working on a proper low-pass filter which has a much more recognizable frequency response shape (like this).

And just to make it clear. Even though the resulting audio from the output of the smoothing filter sounded a bit better (i.e. less aliased) smoothing FIR filters are generally a horrible idea for audio output. But it was still fun to try out.