I'm a relative outsider to a lot of this stuff, and I haven't had any dealings with NSF before; this is my first time looking at the format. So some of these questions may have obvious answers that I just didn't think of... but anyway. Looking at the NSF specification on the wiki, I have some questions from the perspective of an implementer:
Where does player code go? Looking at the memory map, it seems like all I have to work with is $4080:$5bff, and even that is peppered with IO ports here and there to complicate things. If you're not supporting FDS, MMC5 or N163 audio, it gets somewhat simpler, but that means you wouldn't be able to support tunes using those expansions. With expansions on, it seems like the largest contiguous region of memory available is about 2KiB, and it would take a lot of work to decode it properly.
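To get a feel for how fragmented that window is, here is a rough Python sketch that subtracts a list of reserved IO ranges from $4080:$5bff and lists the gaps that remain. The function name and the register ranges in RESERVED are placeholders I filled in for illustration, not figures taken from the spec; swap in the real FDS, N163 and MMC5 ranges from the wiki before trusting the result.

```python
def free_regions(window, reserved):
    """Return the (start, end) gaps inside `window` not covered by `reserved`.

    All ranges are inclusive (start, end) address pairs.
    """
    lo, hi = window
    gaps = []
    cursor = lo  # lowest address not yet accounted for
    for start, end in sorted(reserved):
        if end < cursor or start > hi:
            continue  # range lies entirely outside what is left of the window
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps


# ASSUMPTION: illustrative register ranges only, check them against the spec.
RESERVED = [
    (0x4080, 0x4092),  # FDS sound registers (assumed)
    (0x4800, 0x4FFF),  # N163 data port and mirrors (assumed)
    (0x5000, 0x5015),  # MMC5 sound registers (assumed)
    (0x5205, 0x5206),  # MMC5 multiplier (assumed)
]

if __name__ == "__main__":
    for start, end in free_regions((0x4080, 0x5BFF), RESERVED):
        print(f"${start:04x}:${end:04x}  {end - start + 1} bytes")
```

Every gap boundary in that list is another address comparison the cart has to decode, which is where the "lot of work" comes from.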
Where does player data go? I get to have a few things on the stack, but that's it; the rest of $0000:$07ff and $6000:$7fff is reserved for the tune. The player doesn't even get 2 bytes of zero page space to put a tmp pointer into? What? As above, it seems like the only option is to put about 2KiB or so of RAM awkwardly sandwiched into $4080:$5bff somewhere, being careful to stay clear of the IO ports.
How am I supposed to make standards-compliant RAM access? Tunes are not supposed to read or write in $0800:$1fff, and that space is not supposed to contain a mirror of $0000:$07ff. There's no way to enforce that; the cart doesn't control any of that. The best thing I can think of is watching for accesses to $0800:$1fff and then crashing the whole operation when one occurs. For that matter, why is this rule in place? Are there really famiclones out there that don't mirror the main RAM in this way?
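For reference, the watchdog idea boils down to something like the sketch below. It assumes a stock console, where the 2KiB of internal RAM ignores the upper address bits and so repeats through $1fff; the function names are made up for illustration. The cart can see these addresses on the bus, but all it can do is notice the access, not stop the console's RAM from answering.

```python
RAM_SIZE = 0x0800  # 2KiB of internal RAM at $0000:$07ff


def is_mirror_access(addr):
    """True if addr falls in the $0800:$1fff range the tune must not touch."""
    return 0x0800 <= addr <= 0x1FFF


def mirrored_target(addr):
    """RAM cell a stock console actually hits for an address below $2000."""
    return addr & (RAM_SIZE - 1)


if __name__ == "__main__":
    addr = 0x1234
    if is_mirror_access(addr):
        print(f"violation: ${addr:04x} aliases ${mirrored_target(addr):04x}")
```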
How does banking with interrupt vectors work? $fffa:$ffff is reserved for the player, so I get to put interrupt vectors there. But that's also part of $f000:$ffff, which needs to be swappable on demand in response to the IO port at $5fff. That means that those 6 bytes need to be separately decoded and passed to part of some other RAM or ROM bank? That's a lot of silicon.
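To make the decoding cost concrete, here is a Python sketch of the address decode this seems to require. It assumes the 4KiB scheme, with eight bank registers at $5ff8:$5fff each selecting the bank for one 4KiB slot of $8000:$ffff; the function names and the ("player", ...) / ("tune", ...) return values are invented for illustration.

```python
BANK_SIZE = 0x1000    # 4KiB banks (assumed: one register per slot, $5ff8:$5fff)
VECTOR_BASE = 0xFFFA  # start of the 6 bytes reserved for the player


def is_vector(addr):
    """Bitwise form of $fffa <= addr <= $ffff, as a hardware decoder would see it.

    A15..A3 must all be 1, and at least one of A2/A1 must be 1.
    """
    return (addr & 0xFFF8) == 0xFFF8 and (addr & 0x0006) != 0


def decode(addr, bank_regs):
    """Map a CPU address in $8000:$ffff to (source, offset).

    bank_regs holds the 8 bank numbers last written to $5ff8..$5fff.
    """
    if not 0x8000 <= addr <= 0xFFFF:
        raise ValueError("address outside the banked region")
    if is_vector(addr):
        # Vectors bypass banking and come from the player's own memory.
        return ("player", addr - VECTOR_BASE)
    slot = (addr - 0x8000) // BANK_SIZE
    return ("tune", bank_regs[slot] * BANK_SIZE + (addr & (BANK_SIZE - 1)))
```

The override itself is a 13-input AND plus one OR on the address lines, on top of whatever already does the bank lookup.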
Why 4KiB PRG banking? Amongst the other stumbling blocks, this one isn't bad at all; I just don't understand what possible motivation there was for it. If you're writing tunes from scratch, you can easily make do with 8KiB banking only; you have a massive amount of space and banks to work with even with that. If you're extracting tunes from existing NES games, not one single mapper ever in the history of mankind that I'm aware of ever provided finer PRG banking than 8KiB. (And if there is some obscure oddball pirate original that does, that couldn't have possibly been what anyone was thinking of when they designed this spec.) So the ported tunes don't need 4KiB banking.
What am I supposed to do with a play routine that doesn't return? My thought would be that the simple user interface of the player could be handled entirely in NMI, and the play scheduling would be handled in IRQ (the board would have some custom M2 counter with reload that could be used to handle the fine-grained timing that tunes require). The main thread would simply do this:
Code:
1. Check and see if the NMI routine set a new song to play. If so, call init with the new song number.
2. Check and see if the IRQ routine set a memory flag indicating that an IRQ has occurred. If so, call the play routine.
3. Goto 1
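Spelled out as a Python model, the loop above looks roughly like this. It is only a sketch of the control flow: the class and method names are hypothetical, and init and play stand in for the tune's entry points. The interrupt handlers do nothing but set flags; all calls into the tune happen from the main thread.

```python
class Player:
    def __init__(self, init, play):
        self.init = init            # tune's init routine, takes a song number
        self.play = play            # tune's play routine
        self.pending_song = None    # set by the NMI handler (user interface)
        self.play_due = False       # set by the IRQ handler (M2 counter)

    def on_nmi(self, song):
        """NMI side: the UI picked a new song."""
        self.pending_song = song

    def on_irq(self):
        """IRQ side: the timer says a play call is due."""
        self.play_due = True

    def step(self):
        """One pass of the main loop (steps 1 and 2; the caller does the goto)."""
        if self.pending_song is not None:
            song, self.pending_song = self.pending_song, None
            self.init(song)
        if self.play_due:
            self.play_due = False
            self.play()


if __name__ == "__main__":
    p = Player(lambda s: print("init", s), lambda: print("play"))
    p.on_nmi(0)
    p.on_irq()
    p.step()
```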
But that would have the possibility of NMI and IRQ occurring while the tune is in its play function, which isn't allowed, because it would have the player using stack space that's expressly reserved for the tune. The player only has $01f0:$01ff as its reserved stack space; the rest is for the tune. If I get an NMI or IRQ inside play, S would presumably be in the tune's range, and so the CPU pushing the NMI/IRQ return address and status byte would already be a disallowed operation.
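The stack problem can be checked with a few lines of Python. On a 6502, taking an IRQ or NMI pushes three bytes (PCH, PCL, then the status register) starting at $0100+S and working downward; the $01f0:$01ff range is the player's reserved stack from the spec. The function names are made up for the example.

```python
PLAYER_STACK = range(0x01F0, 0x0200)  # $01f0:$01ff, reserved for the player


def interrupt_pushes(s):
    """Addresses written when an IRQ/NMI is taken with stack pointer s.

    Order is PCH, PCL, P; S wraps within page 1.
    """
    return [0x0100 + ((s - i) & 0xFF) for i in range(3)]


def interrupt_is_safe(s):
    """True if all three pushed bytes land in the player's reserved range."""
    return all(addr in PLAYER_STACK for addr in interrupt_pushes(s))


if __name__ == "__main__":
    for s in (0xFF, 0xF2, 0xF1, 0xE0):
        print(f"S=${s:02x} safe={interrupt_is_safe(s)}")
```

So an interrupt only stays inside the player's 16 bytes if S is at $f2 or above when it hits, which a tune in the middle of play has no obligation to guarantee.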