MintaBOOM - A WIP sample-based music engine for the NES

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
MintaBOOM - A WIP sample-based music engine for the NES
by on (#175146)
This is a project I've been doing on the side for a while now, and it's time to finally reveal it and show my current progress. MintaBOOM (the word "minta" [mintɒ] being the Hungarian word for "sample") is a new, work-in-progress music player, which aims to make it possible to play sample-based music in games without having to completely suspend gameplay. The additional cost of an IRQ source in a cartridge has also been considered, and as a result, no complicated mapper hardware is needed to make use of the music player, as it runs using the DMC channel for its timing!
Here is a list of the current features, along with the advantages and disadvantages, since using the player obviously comes at a significant cost:

Features:
- Streams 7-bit PCM audio at a sampling rate of 8 272 Hz
- A "measure" system providing up to 128 reusable sections of music to make the most of the available cartridge space
- Hardware-controlled noise sounds can be played at any time to make up for the loss of high-frequency components in percussive instruments
- Planned: using the most significant bit of every sample for primitive RLE compression, or lowering the resolution to 6 bits to allow more compression

Advantages:
- Runs using the DMC IRQ, so there is no need for an extra IRQ source in the game cartridge
- Does not require full attention from the CPU, so a game can still be run in the meantime
- Since manual sample playback is not limited to $C000-$FFFF, any amount of space may be used for music
- OAM DMA does not affect the sound severely

Disadvantages:
- Requires a lot of PRG ROM space: ~485kB/minute of unique music
- The current version is guaranteed to use ~49.5% CPU time every frame (with fully optimized sample data - no bit shifting needed by the 2A03 - this could be lowered to 47.7%)
- The game must either be very quick to run, or be forced to run at 30/25 frames per second
- Use of certain instruments is heavily discouraged if they contain a lot of high-frequency components (harpsichord, distorted guitars, etc.)
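
To put the PRG ROM figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes every sample is stored uncompressed in one whole byte; the function names and the 512 KiB ROM size are only illustrative and do not come from the player itself.

```python
# Rough storage estimate, assuming one uncompressed byte per 7-bit sample.
SAMPLE_RATE_HZ = 8272      # sampling rate stated in the post
BYTES_PER_SAMPLE = 1       # assumption: no compression, no packing


def bytes_per_minute(sample_rate_hz=SAMPLE_RATE_HZ):
    """Bytes of sample data consumed by one minute of unique music."""
    return sample_rate_hz * BYTES_PER_SAMPLE * 60


def seconds_that_fit(rom_bytes, sample_rate_hz=SAMPLE_RATE_HZ):
    """Seconds of unique music that fit into rom_bytes of sample data."""
    return rom_bytes / (sample_rate_hz * BYTES_PER_SAMPLE)


if __name__ == "__main__":
    per_minute = bytes_per_minute()
    print(per_minute, "bytes per minute")
    print(round(per_minute / 1024, 1), "KiB per minute")
    # Hypothetical ROM size, chosen only for illustration.
    print(round(seconds_that_fit(512 * 1024), 1), "seconds in 512 KiB")
```

One minute works out to 496 320 bytes, which matches the ~485kB figure above when counted in units of 1024 bytes.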

How it works:
MintaBOOM relies on the IRQ generated by the DMC channel. At rate $F, a 1-byte dummy sample with the bit pattern %10101010 is played, so the resulting high-pitched square wave does not affect the sound and the channel generates interrupts as frequently as possible. On its own, this would only allow a sampling rate of 4 136 Hz, meaning one write to $4011 every 432 clock cycles. However, MintaBOOM waits in the IRQ handler just long enough to make a second write, so that only 216 clock cycles pass between writes, effectively doubling the sampling rate. This waiting time can be used for decompression, or to process one of the available effect commands.
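
The timing above can be checked with a little arithmetic. The sketch below assumes an NTSC 2A03 clock of about 1.789773 MHz and 54 CPU cycles per DMC output bit at rate $F; both constants are assumptions brought in for the calculation, not values taken from the player's source. With this clock value the results land within a fraction of a percent of the 4 136 Hz and 8 272 Hz quoted in the post; the exact figures depend on the clock value assumed.

```python
# Timing arithmetic for the DMC-driven sample clock (NTSC assumptions).
NTSC_CPU_HZ = 1_789_773     # assumption: NTSC CPU clock in Hz
CYCLES_PER_DMC_BIT = 54     # assumption: DMC rate $F on NTSC
BITS_PER_DMC_BYTE = 8       # the dummy sample is one byte long

# One IRQ fires each time the 1-byte dummy sample finishes playing.
cycles_per_irq = CYCLES_PER_DMC_BIT * BITS_PER_DMC_BYTE

# Two $4011 writes per IRQ, spaced evenly, halve the interval.
cycles_between_writes = cycles_per_irq // 2

irq_rate_hz = NTSC_CPU_HZ / cycles_per_irq
sample_rate_hz = NTSC_CPU_HZ / cycles_between_writes

if __name__ == "__main__":
    print("cycles per IRQ:", cycles_per_irq)
    print("cycles between writes:", cycles_between_writes)
    print("IRQ rate: %.1f Hz" % irq_rate_hz)
    print("sample rate: %.1f Hz" % sample_rate_hz)
```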

When an interrupt fires, a sample buffer (filled during the last IRQ) is sent to the output. Then, the waiting period is spent fetching the next byte for the buffer, and then another to be output right before exiting the IRQ handler. This means that every "buffer byte" will come from an odd address, and every directly output byte will come from an even address.

Data format:
Currently the data structure is very simple. Every time an interrupt happens, two bytes are read. To save time, only the "buffer byte" is checked to see if it's actually an effect command. As a result, effect commands MUST be at an even address, and their argument byte must come BEFORE the command itself.
$00 means that an effect must be processed, and the argument byte denotes which effect:
$00-$7F: play the corresponding noise sound
$80-$FF: get the next segment ID from the song sequence - if bit 7 of the segment ID is set (i.e. it is negative as a signed byte), the player will loop back to the segment of the song specified by the remaining 7-bit value

The song sequence specifies which of the 128 possible segments are played, and where to loop back to at the end.
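
As an illustration of the effect encoding described above, here is a minimal Python sketch that classifies an effect's argument byte and a song sequence entry. The function names and the returned tuples are hypothetical, and treating bit 7 of a sequence entry as the loop flag is one reading of the description, not the player's actual 6502 code.

```python
EFFECT_MARKER = 0x00  # a "buffer byte" of $00 flags an effect command


def decode_effect_argument(arg):
    """Classify the argument byte that accompanies a $00 effect marker."""
    if not 0 <= arg <= 0xFF:
        raise ValueError("argument must fit in one byte")
    if arg <= 0x7F:
        # $00-$7F: play the corresponding noise sound
        return ("noise", arg)
    # $80-$FF: fetch the next segment ID from the song sequence
    return ("next_segment", None)


def decode_sequence_entry(entry):
    """Classify one byte of the song sequence (assumed layout)."""
    if not 0 <= entry <= 0xFF:
        raise ValueError("entry must fit in one byte")
    value = entry & 0x7F          # one of the 128 possible segments
    if entry & 0x80:
        # Bit 7 set: loop back to the point given by the 7-bit value
        return ("loop_to", value)
    return ("play", value)


if __name__ == "__main__":
    print(decode_effect_argument(0x05))
    print(decode_effect_argument(0x80))
    print(decode_sequence_entry(0x03))
    print(decode_sequence_entry(0x83))
```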

The test ROM provided uses the NES 2.0 header format, so please make sure to use an emulator that is compatible, otherwise the song will loop prematurely. If you'd like to compare the music to the original, please follow this link! This test does not demonstrate the usefulness of the noise accompaniment.
Any comments or feedback are much appreciated!
Re: MintaBOOM - A WIP sample-based music engine for the NES
by on (#175149)
This is just impressive.

Sadly my games tend to use most of the frame time (C coder here). But I'm sure this could be used in title screens and stuff like that. Or in visual novels and adventure games.
Re: MintaBOOM - A WIP sample-based music engine for the NES
by on (#175151)
Pretty cool stuff. Could be a fun gimmick in a simple game that doesn't require all that much CPU/VBlank time.

NDX says "Header format not recognized" for some reason, even though it should support NES 2.0. (Didn't test with the latest version of Nintendulator.)
Re: MintaBOOM - A WIP sample-based music engine for the NES
by on (#175156)
thefox wrote:
NDX says "Header format not recognized" for some reason, even though it should support NES 2.0. (Didn't test with the latest version of Nintendulator.)
The NES 2.0 magic number in the ROM's header is wrong, so it trips this check in Nintendulator:
        if ((Header[7] & 0x0C) == 0x0C)
                return _T("Header format not recognized - please repair it and try again.");

Nintendulator also complains that there's explicitly no CHR-RAM nor CHR-ROM, even though that's accurate.
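
For reference, the header check discussed above can be sketched in Python as follows. NES 2.0 is identified by bits 2-3 of header byte 7 being %10 (value $08), so a header with both bits set falls into the rejected case. The function name and the returned labels are made up for this example, and the sketch skips the further checks a real emulator performs.

```python
INES_MAGIC = b"NES\x1a"   # first four bytes of an iNES / NES 2.0 header


def classify_header(header):
    """Classify a 16-byte header by bits 2-3 of byte 7 (simplified)."""
    if len(header) < 16 or bytes(header[:4]) != INES_MAGIC:
        return "not iNES"
    bits = header[7] & 0x0C
    if bits == 0x08:
        return "NES 2.0"
    if bits == 0x0C:
        # The case rejected by the check quoted above.
        return "unrecognized"
    return "iNES or archaic iNES"


if __name__ == "__main__":
    good = bytearray(INES_MAGIC + bytes(12))
    good[7] = 0x08
    bad = bytearray(INES_MAGIC + bytes(12))
    bad[7] = 0x0C
    print(classify_header(good))
    print(classify_header(bad))
```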
Re: MintaBOOM - A WIP sample-based music engine for the NES
by on (#191937)
Minor Update:
I am still working on this little project every now and then, and I have created a simple compression tool that optimizes a raw 8-bit PCM file sampled at 8 272 Hz for the player. Currently the player uses a primitive 1-bit RLE compression that makes use of the MSB (since the $4011 register is only 7 bits wide), which can cut the data size by 5-20%, depending on the high-frequency content of the audio. In the future I might make a version that sacrifices one bit of fidelity in exchange for 2 bits of run length, and also add a feature to the compressor that analyzes the raw data and replaces neighboring samples whose values differ by only 1 or 2 with the same sample twice, allowing the RLE to save even more space.
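
To make the idea concrete, here is a minimal Python sketch of a 1-bit RLE of this kind. It assumes that a set MSB means "output this 7-bit sample twice"; the real tool's bit layout is not given in the post, so treat the format, the function names and the tolerance parameter as hypothetical. The sketch also ignores effect commands, which reserve the byte $00 in the actual data format.

```python
def to_7bit(pcm8):
    """Reduce unsigned 8-bit samples to 7 bits by dropping the low bit."""
    return [s >> 1 for s in pcm8]


def smooth_near_repeats(samples, tolerance=2):
    """Lossy pre-pass: copy the previous sample over a close neighbor."""
    out = list(samples)
    i = 1
    while i < len(out):
        if out[i] != out[i - 1] and abs(out[i] - out[i - 1]) <= tolerance:
            out[i] = out[i - 1]
            i += 2   # skip ahead so that forced pairs do not chain
        else:
            i += 1
    return out


def rle_encode(samples):
    """Pack 7-bit samples; MSB set marks a sample that occurs twice."""
    out = []
    i = 0
    while i < len(samples):
        s = samples[i]
        if not 0 <= s <= 0x7F:
            raise ValueError("samples must be 7-bit")
        if i + 1 < len(samples) and samples[i + 1] == s:
            out.append(0x80 | s)
            i += 2
        else:
            out.append(s)
            i += 1
    return bytes(out)


def rle_decode(data):
    """Inverse of rle_encode."""
    out = []
    for b in data:
        s = b & 0x7F
        out.append(s)
        if b & 0x80:
            out.append(s)
    return out


if __name__ == "__main__":
    raw = smooth_near_repeats([10, 10, 11, 40, 40, 40, 90])
    packed = rle_encode(raw)
    print(len(raw), "samples ->", len(packed), "bytes")
    print(rle_decode(packed) == raw)
```

The pre-pass is lossy, while the encode/decode pair on its own is lossless, so the two can be evaluated separately.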
In order to properly decompress the data during runtime, I'm thinking of introducing a large buffer (138 bytes) that would be filled once per frame, and then output in the interrupts. This could also reduce the time spent during an IRQ, and open up the possibility of running some fixed-time tasks from the IRQ handler during the wait between the two successive samples. Alternatively, this new available time could be used to generate and output averaged samples to have some kind of primitive linear interpolation in the sound (which would be missing between every two samples, when the CPU is outside the IRQ).
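
The 138-byte buffer size follows from the sampling rate and the frame rate, and the averaging idea can be sketched in a few lines. The Python below assumes a nominal 60 frames per second; the function names are hypothetical, and inserting one midpoint between each pair of samples is only one possible reading of the interpolation described above.

```python
import math

SAMPLE_RATE_HZ = 8272   # sampling rate stated in the post
FRAME_RATE_HZ = 60      # assumption: nominal NTSC frame rate


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, fps=FRAME_RATE_HZ):
    """Samples that must be buffered to cover one video frame."""
    return math.ceil(sample_rate_hz / fps)


def midpoint(a, b):
    """Average of two 7-bit samples, rounded down."""
    return (a + b) >> 1


def interpolate(samples):
    """Insert an averaged sample between each pair of neighbors."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append(midpoint(a, b))
    if samples:
        out.append(samples[-1])
    return out


if __name__ == "__main__":
    print(samples_per_frame(), "bytes per frame")
    print(interpolate([0, 10, 20]))
```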

So this project is not dead, and I'm very much open to suggestions because I really really want to make a digital audio soundtrack possible in an NES game.