8x8 attributes without MMC5 have been discussed at some point, but I'm not sure anyone's actually tried them out yet.
So after experimenting with my new pass-through hardware debugging board, I figured I'd post it here and ask if anyone would find this to be a useful enough feature to spend logic on - despite the very quirky memory layout.
I chose mapper30 as an experimental platform, mainly because I did an Everdrive/Powerpak implementation (actually more like a copy'n'paste job from the available UNROM implementations). But also because I wanted to have a think about the minimum number of TTL circuits that would be needed to add it to a popular discrete-circuit board.
The logic is really simple (and has been suggested on the forum): Latch the lowest bit of the tile X and Y coordinates during nametable accesses, and apply them during the attribute table accesses. To keep this as simple as possible, these two bits are used directly as CHR_A14 and CHR_A13 on the 32kB CHR-RAM chip. This means:
1) The new attribute table has a bit of an awkward layout: 4 different 64-byte areas for each nametable, mapped to addresses 03C0-03FF, 07C0-07FF, 0BC0-0BFF and 0FC0-0FFF (one range per nametable), across all 8kB CHR-RAM pages.
2) You gain some, you lose some: Because these areas overlap with the CHR, there are going to be 16 particular tiles in pattern table 0 of each 8kB CHR page that you need to avoid using. (Though this could probably be cut to 8 tiles if you take advantage of mirroring, or 4 if you do single-screen mirroring.)
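To make the layout above a bit more concrete, here's a rough Python model of the address mapping. This is only a sketch, not the actual Everdrive code: the function names are made up, and which latched bit goes to CHR_A14 versus CHR_A13 is an assumption on my part (Y to A14, X to A13). It also ignores mirroring.

```python
# Behavioural sketch only; names and bit assignment are assumptions.

def latch_fine_bits(nt_addr):
    """Return (x0, y0): the lowest tile X and tile Y bits of a nametable fetch."""
    x0 = nt_addr & 1          # PPU A0 = lowest bit of tile X
    y0 = (nt_addr >> 5) & 1   # PPU A5 = lowest bit of tile Y
    return x0, y0

def attribute_chr_address(attr_addr, x0, y0):
    """Map an attribute fetch address to an address in the 32kB CHR-RAM."""
    low = attr_addr & 0x0FFF  # nametable select plus the 3C0-3FF offset
    # Assumed wiring: y0 drives CHR_A14, x0 drives CHR_A13.
    return (y0 << 14) | (x0 << 13) | low

def clobbered_tiles():
    """Tile numbers in pattern table 0 that the four 64-byte areas overlap."""
    tiles = []
    for base in (0x03C0, 0x07C0, 0x0BC0, 0x0FC0):
        for offset in range(0, 64, 16):  # one tile is 16 bytes of CHR
            tiles.append((base + offset) // 16)
    return tiles

if __name__ == "__main__":
    x0, y0 = latch_fine_bits(0x2021)
    print(hex(attribute_chr_address(0x23C0, x0, y0)))
    print([hex(t) for t in clobbered_tiles()])
```

Under these assumptions the tiles to avoid in each 8kB page come out as 3C-3F, 7C-7F, BC-BF and FC-FF.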
I haven't had time to make a test ROM, so I've tried it out with a Megaman ROM instead. Here's the colourful mess of attribute accesses it currently displays:
And here's the very simple Everdrive Verilog code:
I originally tried to use an asynchronous latch that would latch the value whenever 'ppu_oe' was low and 'attribute_access' was high. But this resulted in glitchy attributes. Not sure if this is due to actual quirks of the PPU signals, or just some FPGA-specific limitation.
So instead I treated ppu_oe as a clock and latched the attribute fine coordinates on the falling edge. Seems kind of naughty as it is not a dedicated clock input - but it appears to work.
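For anyone who wants to poke at the idea without hardware, here's a small Python model of the edge-triggered version. It's a sketch under assumptions rather than a translation of the Verilog: the class and helper names are hypothetical, ppu_oe is treated as an active-low read strobe, and the latch only captures on nametable fetches as described earlier.

```python
# Behavioural sketch only; class and helper names are hypothetical.

def is_nametable_fetch(addr):
    """True for 2000-2FFF fetches that are not in an attribute table."""
    addr &= 0x3FFF
    return 0x2000 <= addr < 0x3000 and (addr & 0x03FF) < 0x03C0

class FineCoordLatch:
    def __init__(self):
        self.prev_oe = 1  # ppu_oe assumed active low, idle high
        self.x0 = 0
        self.y0 = 0

    def sample(self, ppu_oe, ppu_addr):
        """Feed one (ppu_oe, address) sample; return the held (x0, y0)."""
        falling = self.prev_oe == 1 and ppu_oe == 0
        self.prev_oe = ppu_oe
        # Capture only on the falling edge, and only for nametable fetches.
        if falling and is_nametable_fetch(ppu_addr):
            self.x0 = ppu_addr & 1
            self.y0 = (ppu_addr >> 5) & 1
        return self.x0, self.y0
```

Because the value only changes on the edge, anything the address bus does while ppu_oe is held low can't disturb the latched bits, which is the main difference from a transparent latch.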
If sticking to discrete components, I think it'd take say a 7408 'AND' TTL circuit, a 7400 'NAND' TTL circuit, a TTL latch circuit, and maybe an OR circuit. So it'd roughly double the number of TTL circuits on a mapper30 board, which isn't great but also wouldn't break the bank. These are mostly guesstimates though - I'd like to sketch up and implement this stuff on actual TTL circuits at some point in the future just as another fun experiment - but feel free to beat me to it.
A few thoughts of mine:
1) Infiniteneslives' 4-screen variant of mapper30 could be used to provide a configuration that doesn't overlap with CHR, because you've got 4kB spare CHR-RAM and would only need 1kB for 8x8 attributes for all 4 screens. This would probably increase the number of logic circuits required slightly, as you'd have to multiplex signals. It also can't be prototyped on an Everdrive/Powerpak, because neither of them can address anything smaller than a 1kB page.
2) Likewise, the single-screen variant of mapper30 would probably be more useful if the attribute fetches were redirected to the other screen that's currently not selected. This could give you an attribute table that does not overlap with CHR-RAM, and could still leave some room for a status bar. But the same notes about increased IC cost/incompatibility with Powerpak/Everdrive apply.
3) 8x1 attributes should be fairly simple to implement by latching the lowest 3 bits from the PPU's CHR-fetch, but again it means an increased number of discrete ICs. Also not sure how useful 8x1 attributes would really be? You'd still have a pretty awkward memory layout, and would need to have a 2kB attribute table for a single screen - twice the memory of a vanilla nametable to update.
4) And obviously, this solution might be a better fit for a CPLD-based implementation of MMC3/FME-7 and similar more advanced mappers, where you've already ditched discrete mappers and stepped up the cost. I haven't tried prototyping any of these mappers in a CPLD, but would hope this feature could be added without breaking the macrocell budget.
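As a sanity check on the memory figures in the list above, here's a quick Python calculation of the raw attribute data per screen at different cell sizes. It assumes 2 bits per cell on a 32x30-tile screen and ignores any padding up to power-of-two table sizes; the function name is just for illustration.

```python
# Rough arithmetic only; assumes 2 attribute bits per cell, no padding.

TILES_X, TILES_Y = 32, 30   # one nametable, in 8x8-pixel tiles
BITS_PER_CELL = 2           # palette select bits per attribute cell

def attribute_bytes(cell_w, cell_h):
    """Raw attribute bytes for one screen with cells of cell_w x cell_h pixels."""
    cells_x = (TILES_X * 8) // cell_w
    cells_y = (TILES_Y * 8) // cell_h
    return cells_x * cells_y * BITS_PER_CELL // 8

if __name__ == "__main__":
    for w, h in ((16, 16), (8, 8), (8, 1)):
        print(f"{w}x{h}: {attribute_bytes(w, h)} bytes per screen")
```

So 8x8 works out to 240 bytes per screen (960 for four screens, which fits in 1kB), while 8x1 needs 1920 bytes, i.e. roughly the 2kB mentioned above.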
Let me know which of these hypothetical configurations you think would be the most useful.