Okay, so I've come to the conclusion that it would be of great benefit if I could design a mapper suited to my needs. I don't really know where to begin, so I'm posting my questions and goals here in hopes of getting a little direction.
I don't need to go overboard. One of the main goals of making the custom mapper is affordability in a (relatively) large quantity, so if it gets too expensive, it's not practical. I don't need a full MMC5 here, but I'd like almost any features that are affordable, so I'm going to make a prioritized list.
1. Clever Scanline IRQs
This is the thing that's missing from the available mapper selection. I'm not sure how the one on the MMC5 functions, because I've always considered it unavailable for production. (The wiki says it's not known how it senses scanlines. Interesting)
I have played around a little with the MMC3, VRC7, and FME-7. Overall, what is probably the ideal scanline counter combines things from all three.
The MMC3 will actually work for what I'm doing. The way it times scanlines automatically snaps them to rendering timing, which a cycle-based counter can't do. The downside is that you have to burn cycles waiting for hBlank. In a lot of circumstances you can put some IRQ setup code in that portion, but there are almost always going to be wasted cycles.
VRC7, as a cycle-based scanline counter, has the issue of compounding latency. The neat thing about this one is the scanline prescaler in hardware, which scales CPU cycles to scanlines for you. It's still cycle-based, though, so it won't lock its timing to rendering.
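To make the prescaler idea concrete, here is a rough model of how I understand a VRC-style prescaler to work on NTSC: it starts at 341 (PPU dots per scanline), drops by 3 every CPU cycle, and ticks the scanline counter whenever it reaches zero or below, at which point 341 is added back. This is my reading of the documented behavior, not something verified on hardware, and the function name is made up for illustration.

```python
# Rough model of a VRC-style scanline prescaler (assumed behavior, NTSC only).
DOTS_PER_SCANLINE = 341    # PPU dots in one scanline
DOTS_PER_CPU_CYCLE = 3     # PPU dots that pass per CPU cycle on NTSC


def scanline_tick_intervals(count):
    """Return the CPU-cycle gaps between the first `count` scanline ticks."""
    prescaler = DOTS_PER_SCANLINE
    cycles_since_tick = 0
    intervals = []
    while len(intervals) < count:
        prescaler -= DOTS_PER_CPU_CYCLE
        cycles_since_tick += 1
        if prescaler <= 0:
            # One scanline's worth of dots has elapsed: reload and tick.
            prescaler += DOTS_PER_SCANLINE
            intervals.append(cycles_since_tick)
            cycles_since_tick = 0
    return intervals


if __name__ == "__main__":
    print(scanline_tick_intervals(6))
```

In this model the gaps repeat as 114, 114, 113 cycles, which averages out to the 113.67 cycles a scanline really takes. That keeps the count from drifting, but the count still starts whenever the software starts it, which is why it never snaps to rendering the way the MMC3 does.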
FME7 is a great mapper. I've been using it. Its PRG-ROM and RAM bankswitching options are the best as far as I know, outside of the MMC5. Its scanline counter, however, leaves a lot to be desired. The precise cycle-based timing is nice to negate the need for NOPs, but without a way to read the counter and compensate for latency, it becomes inflexible.
I have been experimenting with whether or not I can do what I need to do with this mapper, and I am still unsure. I need the scanlines on which IRQs fire to vary based on Y scroll. If I can get it to work with FME7, it's going to require a table of high byte and low byte for scanlines 0-239. That in itself isn't terrible, but linking the different scanline IRQs is a pain. In my tests, on some numbers of scanlines, I can get more than 16 X/Y scroll settings to fall into hBlank. On some numbers, for example, trying to reset the scroll every two scanlines, I can only fit a couple into hBlank before they start glitching out before and after. I probably haven't thought about this enough. Maybe I can use some other timing source to correct IRQ latency, but I haven't seen the answer. In any case, it's a pain to use for complex raster effects.
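For reference, the high byte / low byte table could be generated offline with something like the sketch below. It assumes NTSC timing (341 dots per scanline, 3 dots per CPU cycle) and ignores the skipped dot on odd frames. The `latency_cycles` parameter is a hypothetical fudge factor for IRQ response time, and the exact value to write to the counter also depends on where in the frame it gets loaded, which this does not model.

```python
# Sketch: split low/high byte tables of CPU cycles needed to reach each scanline.
DOTS_PER_SCANLINE = 341
DOTS_PER_CPU_CYCLE = 3
VISIBLE_SCANLINES = 240


def build_irq_tables(latency_cycles=0):
    """Return (lo, hi) lists with one entry per scanline 0-239.

    latency_cycles is a hypothetical correction subtracted from every entry.
    """
    lo, hi = [], []
    for line in range(VISIBLE_SCANLINES):
        # Whole CPU cycles elapsed after `line` full scanlines (rounded down).
        cycles = (line * DOTS_PER_SCANLINE) // DOTS_PER_CPU_CYCLE
        cycles = max(cycles - latency_cycles, 0)
        lo.append(cycles & 0xFF)
        hi.append((cycles >> 8) & 0xFF)
    return lo, hi


if __name__ == "__main__":
    lo, hi = build_irq_tables()
    for line in (1, 2, 3, 239):
        print(line, f"${hi[line]:02X}{lo[line]:02X}")
```

That is 480 bytes of ROM for the two tables. Because each entry is rounded down to a whole cycle, the fractional part of the true timing is lost per entry, separate from whatever latency the IRQ handler adds.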
Something like the second level background on Recca would be borderline impossible with FME7. MMC3 would waste a lot of cycles, but it wouldn't be hard.
2. Fine PRG Banking
For PRG banking, I honestly want basically what the FME-7 has. The only thing the MMC5 has over the FME-7 in this respect is that it can bankswitch RAM into more than one slot. That's cool, and if it weren't expensive I'd want that improvement, but it's not that big of a deal. Bankswitching RAM at all is a big deal, even with one window.
One idea, which would be fantastic if it could work but is probably too complicated, would be to bankswitch CHR-RAM into the PRG window. This is the only way I know you'd really be able to address it during rendering, since it would have to have a location in the CPU memory map. This is just a thought, and if possible, would be neat, but I'm not hinging anything on it. I would expect this idea to be infeasible in a monetary sense. The downside, if it could work, is that it would require the board to support a larger PRG-ROM to hold the data for both code and graphics.
3. Fine CHR Banking
FME-7, VRC7, and MMC5 all support 8 x 1KB. This is great. MMC3's 4 x 1KB + 2 x 2KB is alright, but finer is better. Since the banks are already split into 1KB on MMC3, you don't gain a benefit of larger capacity or anything for it.
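To spell out the difference between the two layouts, here is a small sketch that maps a PPU pattern-table address to the bank window that covers it. The MMC3 layout shown assumes the CHR inversion bit is off (2KB windows in the lower pattern table); the names are just for illustration.

```python
# (start address, size in bytes) of each CHR bank window in $0000-$1FFF.
LAYOUT_8X1K = [(slot * 0x400, 0x400) for slot in range(8)]
LAYOUT_MMC3 = [
    (0x0000, 0x800), (0x0800, 0x800),                    # two 2KB windows
    (0x1000, 0x400), (0x1400, 0x400),                    # four 1KB windows
    (0x1800, 0x400), (0x1C00, 0x400),
]


def window_for(layout, ppu_addr):
    """Return the index of the bank window that covers ppu_addr."""
    for index, (start, size) in enumerate(layout):
        if start <= ppu_addr < start + size:
            return index
    raise ValueError(f"address ${ppu_addr:04X} is outside the pattern tables")


if __name__ == "__main__":
    for addr in (0x0000, 0x0400, 0x1000, 0x1C00):
        print(f"${addr:04X}", window_for(LAYOUT_8X1K, addr), window_for(LAYOUT_MMC3, addr))
```

With the 8 x 1KB layout, $0000 and $0400 land in different windows and can be swapped independently. With the MMC3 layout they share window 0, so the tiles in that half of the pattern table always move in 2KB pairs.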
The only board that I saw that looked like it has significant improvements to the CHR bankswitching was the Namco 163 with bankswitching nametables. I honestly don't really understand what this is for. I don't know if it would provide any benefit that I could use in games.
4. Large ROM capacity
What's the downside of this? Having to use a 16-bit value for bank numbers? That's not so bad. I can't imagine getting a 512 KB or 1024 KB ROM manufactured these days would cost much more than a 256KB ROM. If the bigger ROM makes the game better, then having the board allow that would be nice.
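On the bank number question, a quick back-of-the-envelope helper shows how wide a bank register has to be for a given ROM size and bank size. It assumes power-of-two sizes, and the function name is my own.

```python
def bank_register_bits(rom_kib, bank_kib):
    """Bits needed to select any bank of bank_kib KiB within a rom_kib KiB ROM."""
    banks = rom_kib // bank_kib
    return max(banks - 1, 0).bit_length()


if __name__ == "__main__":
    for rom_kib in (256, 512, 1024):
        print(rom_kib, "KiB:",
              bank_register_bits(rom_kib, 8), "bits for 8KB PRG banks,",
              bank_register_bits(rom_kib, 1), "bits for 1KB CHR banks")
```

By this arithmetic, 8KB PRG banks fit in a single 8-bit register all the way up to 2MB, so a 16-bit bank number is not needed on the PRG side. It is 1KB CHR banks that outgrow 8 bits once the CHR ROM goes past 256KB.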
5. 8x8 attributes
8x8 attributes would be great from a design perspective, although I have no idea what it takes to implement them in hardware. It would be nice if they could be turned off for cases where you don't need 8x8 and don't want to spend the cycles or ROM on the data.
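For a sense of the data cost, here is a quick calculation of how many bytes of attribute data one 256x240 screen needs at a given cell size, assuming 2 palette bits per cell packed four to a byte. This is only the raw storage figure; it says nothing about how the mapper would feed the bits to the PPU, and the function name is hypothetical.

```python
def attribute_bytes(cell_px, width_px=256, height_px=240, bits_per_cell=2):
    """Bytes of packed attribute data for one screen at the given cell size."""
    cells = (width_px // cell_px) * (height_px // cell_px)
    return (cells * bits_per_cell + 7) // 8   # round up to whole bytes


if __name__ == "__main__":
    print("16x16 cells:", attribute_bytes(16), "bytes")
    print("8x8 cells:  ", attribute_bytes(8), "bytes")
```

Going from 16x16 to 8x8 cells quadruples the attribute data per screen, from 60 bytes to 240 bytes if packed this way. The stock attribute table occupies 64 bytes because of how it is laid out, so the real numbers per nametable would be a little higher than the packed figures.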
Other features would just be basically a wishlist of stuff from the MMC5. If it was really cheap and easy to design, I'd like it, but I don't need the other things. Mostly it's combining a workable scanline counter with fine PRG and CHR banking. If I'm going that far, I'd like to be able to put a good sized ROM on there, and I'd like to explore the possibility of enabling 8x8 attributes.
I don't even know where to begin. I get the general idea that I'm going to have to design a PCB with ICs and other components, but I'm no expert on the subject. Even if I were to hire someone, I highly doubt I could ask a company to build something to the specifications of the Nintendo Entertainment System and expect them to do so without me providing a lot of the fundamental information. I'm sure that it can be done, although I don't know how, but I'm willing to put the work into it, as I am with my game.