I start from the first principles of a programmable cartridge, show how a boot loader fits in, and then get to the design of a standard PC link cable and the issues it faces. That way we can focus on those issues and not be distracted by tangential things like the boot loader. I'll comment in a follow-up posting, so that it stays separate from this coverage.
Programmable Cartridge
We've got a programmable cartridge and we want to run programs on it. We don't want to remove it from the NES every time we need to change the program, so we connect it to a PC via a link cable. The NES must boot to a loader on the cartridge that receives data from the PC and programs it into the cartridge. This loader knows how to do everything, so the PC just needs to send the data.
We run into problems. We must first write and debug the loader itself. This will involve lots of cartridge swapping, since we don't yet have a loader that uses the link cable. If we ever find any bugs or improve the loader, we must reprogram the cartridge. This requires that the boot area be reprogrammable. If updating the loader ever fails, the cartridge might become unbootable and must be reprogrammed with an external programmer. If we change the protocol, the updater program must be able to send the new loader using the OLD protocol.
Boot loader
A boot loader solves the problems posed by a full loader on the cartridge. The boot loader is a tiny loader whose only purpose is to receive code from the PC and put it in NES RAM, and then execute it. It need not support loading of arbitrary-sized programs to arbitrary regions of memory. The boot loader is simple to get debugged and working. It gives the PC control of the NES, where it can then send the main loader and run it. This allows the main loader to be written and debugged easily. Updates aren't an issue, because the latest version is always running.
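As a rough illustration of how little the boot loader has to do, here is a minimal Python model of it. The fixed block size, the load address, and the lack of any checksum or handshake are all assumptions made for this sketch; nothing above specifies a protocol.

```python
# Hypothetical model of a minimal boot loader: receive a fixed-size block
# into RAM, then hand control to it. LOAD_ADDR and BLOCK_SIZE are assumptions.
LOAD_ADDR = 0x0200   # assumed start of the received block in NES RAM
BLOCK_SIZE = 256     # assumed fixed payload size

def boot_load(receive_byte, ram):
    """Copy BLOCK_SIZE bytes from the link into ram; return the entry address."""
    for offset in range(BLOCK_SIZE):
        ram[LOAD_ADDR + offset] = receive_byte()
    return LOAD_ADDR  # a real boot loader would jump here

# Usage: feed the model from an in-memory stand-in for the link cable.
payload = bytes(range(256))
link = iter(payload)
ram = bytearray(0x800)  # 2 KB of internal RAM
entry = boot_load(lambda: next(link), ram)
```

Everything beyond this (arbitrary sizes, arbitrary destinations, programming the cartridge) would live in the main loader that the PC sends as the payload.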
The boot loader for a particular cartridge-link cable pair only needs to handle that link cable. If you have programmable cartridge and cable X, and programmable cartridge and cable Y, there's no need for them to use the same boot loader, since the main loader must know about each cartridge anyway. This avoids complicating the boot loader with code to handle many different link cable types. The only reason to have a universal boot loader would be if there were a need for a single cartridge that would work with multiple PC link cable designs.
Low-cost link cable
The lowest-cost PC link cable can be made by anyone with just a few discrete parts (transistor, resistors), and a NES controller and RS-232 cable. Someone constructing this wouldn't necessarily have a programmable cartridge, so he would need to get it from someone else. To make this easier, a boot loader supporting this cable could be put on various homebrew game cartridges as an extra feature. This boot loader would only need to support this low-cost cable; other cable types would be distributed with an associated development cartridge, so that the user didn't need anything extra to use them.
A similar low-cost cable can be made for the Famicom, but it would use different I/O lines. At first it might seem like the same boot loader would need to support this as well as NES, but that's not the case. The boot loader will be on a cartridge, either NES or Famicom, which can only be used on the respective system. Thus, the boot loader put on to NES homebrew would only need to support the NES cable, and on Famicom homebrew would only need to support the Famicom cable.
Even though homebrew cartridges wouldn't be programmable, the boot loader would still allow loading small programs into NES RAM, or if the homebrew cartridge has WRAM, there as well. This covers the boot loader; if user programs want to use the link cable as well, then you might want a standard for that so that programs can work with various link cable types.
PC link cable differences
Different cartridges might use different PC link cables. One might use a USB cable that does parallel transfer over the expansion port, another a synchronous parallel connection using three data inputs on the controller port, and a third an asynchronous serial connection to RS-232. These are each handled by the software on their respective programmable cartridges.
But, user programs might also want to communicate with the PC. To work with different cables, they must either know about every cable type, have hooks for plugging in the proper driver code, or the cables must all behave in a way that a single driver can work with (though not necessarily optimally).
Trying to include driver code for every possible link cable isn't practical. Having hooks is reasonable, and could either be some reserved area of the user code where the appropriate driver can be patched in just before loading the program onto the cartridge, an area in ROM that user code can call, or even the driver routines put into NES RAM by the loader.
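The first kind of hook, a reserved area that gets patched before loading, could be handled on the PC side along these lines. This is only a sketch: the function name, the reserved-area offset and size, and the NOP padding are assumptions, and the driver bytes in the usage example are a placeholder, not a working driver.

```python
NOP = 0xEA  # 6502 NOP opcode, used here to pad unused reserved space

def patch_driver(program: bytes, offset: int, reserved: int, driver: bytes) -> bytes:
    """Return a copy of program with driver placed in its reserved area."""
    if len(driver) > reserved:
        raise ValueError("driver does not fit in the reserved area")
    if offset + reserved > len(program):
        raise ValueError("reserved area lies outside the program")
    padded = driver + bytes([NOP]) * (reserved - len(driver))
    return program[:offset] + padded + program[offset + reserved:]

# Usage: a 64-byte program image with a 16-byte reserved area at offset 16.
program = bytes(64)
driver = bytes([0xAD, 0x16, 0x40, 0x4A, 0x60])  # placeholder: lda $4016 / lsr a / rts
patched = patch_driver(program, 16, 16, driver)
```

The user program then just calls into the reserved area and never needs to know which cable is attached.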
The third approach spreads the complexity across hardware and software. The hardware is constrained to something that can be read by common driver code, and the common driver is made to allow some flexibility in hardware, but not so much that the driver becomes unwieldy. Note that the hardware need not support only this common way of receiving data; it can support other more capable methods as well, as long as this common one is available by default.
Standard PC link cable
We want a set of standard PC link cable designs for which at least one is easy to build, and all are easy to interface with in software. This will allow the cable to be used by more than just the loader programs. So the boot loader alone shouldn't drive decisions about the cable; the cable must work easily with user programs as well.
Coming up with driver code that works with different cables, and cable designs that can all work with that driver, is challenging and involves tradeoffs. We don't want to make the driver unnecessarily general, because that makes it more complex, harder to test, and larger. We don't want to over-constrain the link cable design either. So it's important to avoid designing in flexibility just in case it's needed. To strike that balance, we should examine the various configurations we might actually encounter.
Note that a user program is much more likely to merely receive data from the PC, rather than send it. There are lots of useful programs which have the PC stream various kinds of data, without the NES needing to send anything back. So having a standard for the PC-to-NES direction should be the main focus.
Hardware-wise, there's the front-loader NES, redesigned NES, NTSC/PAL versions, Famicom, and NES/Famicom clones. Each has variations on how the link cable can be connected, and the CPU timing.
The standard five-wire NES controller cable only allows for one approach: D0 for data to the NES, and Strobe for data from the NES. The NES connector also has D3 and D4, but the standard cable doesn't connect these, and third-party cables might not either. So this must be one of the supported cable connections. But on the Famicom, D0 is hardwired to the controllers, so another bit must be used for input. The Famicom allows access to Strobe, so that can be used as an output there as well.
Software-wise, there are several considerations.
Ideally, input is in bit 0 of a register. This allows easy shifting into the carry bit without disturbing A. The NES supports this, but the Famicom hardwires the controllers to bit 0, so another bit must be supported as well if this is to support the Famicom.
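To show why bit 0 is convenient, here is a small Python model of the shift-into-carry idea: a right shift drops bit 0 of the register into carry, and a rotate right moves carry into the byte being assembled. It assumes data bits arrive least significant bit first, as is usual for asynchronous serial, and it ignores any inversion the cable hardware might introduce. The function names are made up for the sketch.

```python
def lsr(value):
    """Model 6502 LSR: return (shifted value, carry), carry being the old bit 0."""
    return value >> 1, value & 1

def ror(value, carry):
    """Model 6502 ROR: carry enters bit 7; return (rotated value, new carry)."""
    return ((carry << 7) | (value >> 1)) & 0xFF, value & 1

def receive_byte(register_reads):
    """Assemble one byte from eight register reads, data in bit 0, LSB first."""
    result = 0
    for reg in register_reads:
        _, carry = lsr(reg)             # bit 0 of the register -> carry
        result, _ = ror(result, carry)  # carry -> bit 7 of the result
    return result

# Usage: 0xA5 sent LSB first, with an unrelated upper bit set in each read.
bits = [1, 0, 1, 0, 0, 1, 0, 1]
reads = [0x40 | b for b in bits]
value = receive_byte(reads)
```

With the input on any other bit, the driver would need an extra masking or comparing step per bit before it could do the same thing.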
Supporting inputs on more than one register is not very practical. The loop that waits for the start bit must introduce as little jitter as possible. This requires that it not take very many cycles per iteration, since that determines jitter. Checking a single register requires 7 cycles per iteration, giving a jitter of +/- 3.5 cycles.
Code:
lda #mask
wait: bit register ; 4
beq wait ; 3
Checking two registers cannot take less than 11 cycles, and realistically would take 17 cycles, giving a jitter of +/- 8.5 cycles.
Code:
wait: lda register1 ; 4
and #mask1 ; 2
bne start ; 2
lda register2 ; 4
and #mask2 ; 2
beq wait ; 3
start:
At 57600 bits per second on a PAL NES, this represents a +/- 30% deviation from the center of a bit, as compared to the +/- 12% a single register gives.
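The percentages can be checked with a few lines of Python. The clock figures are the usual ones (PAL master clock divided by 16, NTSC master clock divided by 12); the function name is just for this sketch.

```python
PAL_CPU_HZ = 26_601_712 / 16   # about 1.6626 MHz
NTSC_CPU_HZ = 21_477_272 / 12  # about 1.7898 MHz

def jitter_percent(loop_cycles, cpu_hz, baud=57600):
    """Jitter of a polling loop as +/- percent of one bit time."""
    cycles_per_bit = cpu_hz / baud
    return 100 * (loop_cycles / 2) / cycles_per_bit

single = jitter_percent(7, PAL_CPU_HZ)   # one-register loop
double = jitter_percent(17, PAL_CPU_HZ)  # two-register loop
print(f"PAL, 7-cycle loop:  +/- {single:.1f}% of a bit")
print(f"PAL, 17-cycle loop: +/- {double:.1f}% of a bit")
```

On PAL a bit lasts a little under 29 CPU cycles, so the 7-cycle loop comes out at about 12% and the 17-cycle loop at a bit over 29%, which is the roughly 30% quoted above. NTSC is slightly better in both cases because a bit lasts about 31 cycles there.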
Output-wise, bit 0 of a register is also easier to support. Again, the NES supports this via the strobe, and the Famicom does as well. Outputting on multiple bits would need some justifying hardware reason. Note that even if data were output on another bit, it would still need to be output on bit 0 to support the low-cost link cable.
The PAL NES can be supported by two different drivers (or one with a variable delay), with the right one selected at run time after detecting whether the code is running on an NTSC or PAL NES. This wouldn't support other CPU clock rates that a clone might have, though.
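To see why one fixed delay can't serve both systems, here is a rough Python estimate of how far the sampling point drifts by the last data bit when each bit is timed with a whole number of CPU cycles. The 8.5 bit times from the start edge to the center of the last data bit assumes 8 data bits after one start bit; the function name and the chosen delays are assumptions for the sketch.

```python
PAL_CPU_HZ = 26_601_712 / 16
NTSC_CPU_HZ = 21_477_272 / 12

def drift_percent(delay_cycles, cpu_hz, baud=57600, bits_to_last_sample=8.5):
    """Sampling drift at the last data bit, as a percent of one bit time."""
    cycles_per_bit = cpu_hz / baud
    drift_cycles = (delay_cycles - cycles_per_bit) * bits_to_last_sample
    return 100 * drift_cycles / cycles_per_bit

ntsc_ok = drift_percent(31, NTSC_CPU_HZ)  # 31-cycle delay on NTSC
pal_ok = drift_percent(29, PAL_CPU_HZ)    # 29-cycle delay on PAL
pal_wrong = drift_percent(31, PAL_CPU_HZ) # NTSC delay used on PAL
print(f"NTSC with 31 cycles: {ntsc_ok:+.1f}%")
print(f"PAL with 29 cycles:  {pal_ok:+.1f}%")
print(f"PAL with 31 cycles:  {pal_wrong:+.1f}%")
```

With a delay matched to each system the drift stays within a few percent of a bit, but the NTSC delay on PAL drifts by more than half a bit, so the last bits would be sampled in the wrong place.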