Being the indecisive person I am, I've gone back and forth on whether my object table should be indexed by direct page or by x/y. I've thought hard about it, and now I'm kind of curious why I ever wanted it the other way, because x/y indexing is faster and overall just better for every application I can think of:
For any routine where you're constantly searching through object slots, each load or store takes an extra cycle (2 extra cycles if the index registers are 16 bit, but you only need 8 bit here, because 128 objects with 2-byte slots just fit in 256 bytes). In exchange, you only need two inx or iny instructions instead of tdc, clc, adc, tcd. Heck, the accumulator might not even be 16 bit at that point, so that's an extra instruction or two (rep/sep) on top of that.
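The trade-off above is easy to put in numbers. Below is a rough cycle model in Python using load timings from the WDC 65C816 datasheet: LDA dp is 3 cycles and LDA abs,X is 4, each +1 with a 16-bit accumulator; abs,X adds 1 more with 16-bit index registers or a page crossing, and direct page adds 1 whenever the low byte of D is nonzero. It only models loads, assumes 2-byte slots stepped by two INX, and the function names are made up for illustration, so treat it as a back-of-the-envelope sketch rather than a cycle-exact profiler.

```python
# Back-of-the-envelope 65C816 cycle model (loads only) comparing:
#   "dp"  : D points at the current object; step with TDC/CLC/ADC #size/TCD
#   "idx" : object fields are arrays of 2-byte entries; step X with INX, INX
# Function names are hypothetical; timings follow the WDC 65C816 datasheet.

def lda_dp(a16: bool, dl_nonzero: bool = False) -> int:
    """LDA dp: 3 cycles, +1 for 16-bit A, +1 if the low byte of D is nonzero."""
    return 3 + int(a16) + int(dl_nonzero)

def lda_abs_x(a16: bool, x16: bool = False, page_cross: bool = False) -> int:
    """LDA abs,X: 4 cycles, +1 for 16-bit A, +1 if X is 16-bit or a page is crossed."""
    return 4 + int(a16) + int(x16 or page_cross)

def step_dp(a16: bool) -> int:
    """TDC(2) + CLC(2) + ADC #imm(3, done in 16-bit) + TCD(2); REP/SEP (3+3) if A was 8-bit."""
    cost = 2 + 2 + 3 + 2
    if not a16:
        cost += 3 + 3
    return cost

def step_idx() -> int:
    """INX + INX for 2-byte slots: 2 cycles each."""
    return 2 + 2

def scan_cost(objects: int, loads: int, a16: bool,
              x16: bool = False, dl_nonzero: bool = False) -> tuple[int, int]:
    """Total cycles (dp layout, x/y layout) to visit `objects` slots doing `loads` loads each."""
    dp = objects * (loads * lda_dp(a16, dl_nonzero) + step_dp(a16))
    idx = objects * (loads * lda_abs_x(a16, x16) + step_idx())
    return dp, idx

# 128 slots * 2 bytes = 256 bytes, so the largest offset (254) fits in an 8-bit X.
assert 127 * 2 <= 0xFF

for a16 in (False, True):
    dp, idx = scan_cost(objects=128, loads=4, a16=a16)
    print(f"A {'16' if a16 else '8'}-bit: dp layout {dp} cycles, x/y layout {idx} cycles")
```

With 4 loads per slot this works out to 3456 vs 2560 cycles with an 8-bit accumulator and 3200 vs 3072 with a 16-bit one. Per load the dp layout saves one cycle, but each step costs 5 (16-bit A) or 11 (8-bit A) cycles more, so the x/y layout comes out ahead whenever a routine does fewer loads than that per slot, which is the "searching through slots" case. If D isn't page-aligned as it steps, the dp layout loses its per-load advantage entirely.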
Direct page is freed up to actually be useful as a 16-bit index register, and as I said earlier, there's no reason to index the object table by a 16-bit number. I've had situations where x and y were both 16 bit even though only one of them needed to be; you just can't set their widths separately. Now, though, whatever needs a 16-bit index can be indexed by direct page instead.
The object table no longer has to be in the first 8KB of RAM (the part mirrored into bank 0, where direct page lives), and parts of it can even sit outside the first 8KB.
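The 8KB limit comes from the memory map: direct page always lives in bank $00, and only the first 8KB of WRAM ($7E:0000-$7E:1FFF) is mirrored there, while absolute indexed addressing goes through the data bank register. Here is a small Python check of where a block of the object table can live, assuming DBR = $7E and, for simplicity, that each block stays inside bank $7E. The helper name, the 256-byte block size, and the example field layout are all illustrative.

```python
# Where can a 256-byte block of the object table sit in WRAM? (illustrative helper)
# WRAM is 128 KB at $7E:0000-$7F:FFFF; only $7E:0000-$7E:1FFF is mirrored into
# bank $00 (as $0000-$1FFF), which is the only RAM that direct page can point at.
BANK0_WRAM_MIRROR_END = 0x2000   # first 8 KB of WRAM
BANK_SIZE = 0x10000

def reachable(wram_offset: int, size: int = 256) -> dict[str, bool]:
    """wram_offset is relative to $7E:0000; returns which addressing schemes reach the block."""
    end = wram_offset + size   # one past the last byte
    return {
        # direct page: the whole block must be inside the bank-0 mirror
        "direct_page": end <= BANK0_WRAM_MIRROR_END,
        # abs,X with DBR = $7E: kept inside bank $7E in this simplified check
        "abs_x_dbr_7e": end <= BANK_SIZE,
    }

# Hypothetical layout: some fields low in WRAM, others past the 8 KB line.
fields = {"x_pos": 0x0400, "y_pos": 0x0500, "state": 0x3000, "anim_ptr": 0x8000}
for name, offset in fields.items():
    print(f"$7E:{offset:04X} {name}: {reachable(offset)}")
```

Only the first two fields would work with direct-page indexing; all four work with abs,X once DBR points at bank $7E.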
So yeah, I'm pretty dumb. I found I could make my metasprite routine significantly faster, because most of what that routine loads is metasprite data, which will now be read through direct page. That's 1 cycle faster per load, but all of that data has to fit in the 32KB of ROM visible in bank 0. I could even shrink each sprite entry in a metasprite from 8 to 6 bytes at the cost of a little more processing time, and it would still be much faster than indexing the metasprite data by x or y. That means x and y can now be 8 bit: the object table can be indexed by an 8-bit number, and I could easily index OAM by an 8-bit number too if I create an otherwise duplicate routine, which I'm more than willing to do to save CPU time.
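The OAM part works out because the OAM low table is 512 bytes (128 sprites × 4 bytes: X, Y, tile, attributes), twice what an 8-bit index can cover. Splitting it into two 256-byte halves, each filled by its own copy of the routine with a different base address, keeps X at 8 bits. A minimal Python sketch of that arithmetic; the function name and the idea of splitting a RAM shadow buffer this way are assumptions, not the actual routine.

```python
# Map a sprite number to (which 256-byte half of the OAM low table, 8-bit index).
# Each half would be filled by its own copy of the routine with a different base
# address, so the index register never needs more than 8 bits.
SPRITES = 128
BYTES_PER_SPRITE = 4                          # X, Y, tile, attributes
LOW_TABLE_BYTES = SPRITES * BYTES_PER_SPRITE  # 512

def oam_half_and_index(sprite: int) -> tuple[int, int]:
    if not 0 <= sprite < SPRITES:
        raise ValueError("sprite must be 0-127")
    half, index = divmod(sprite * BYTES_PER_SPRITE, 256)
    return half, index   # half 0 -> base + $000, half 1 -> base + $100

for s in (0, 63, 64, 127):
    print(s, oam_half_and_index(s))
```

Sprites 0-63 land in the first half and 64-127 in the second, and every index stays between 0 and 252.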