Pretty cool! I saw this thread a few days ago but didn't have time to look through the code until today. Now that I've looked through it I want to throw some ideas your way. BTW, what you wrote is just fine and you don't have to change anything. These are just ideas, so feel free to ignore them if they don't work for you.
Just some tips that could improve performance:
1) You can save some more bytes on your background data by removing the $FF terminator on all columns. Usually terminators are used when you have data of variable length, where you don't have any idea when it's going to end (dialogue text, for example). Since all your columns are the same size, you can know when you reach the end by keeping a count of how many tiles you've processed. I'd load the column size into an index register (or a temp variable if X and Y are both being used) and then decrement it every time you write a tile to the buffer. Column is finished when your counter reaches 0. Cutting the terminator will save you 16 bytes per screen, which makes a significant difference if you have a bunch of screens.
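To make the counting idea concrete, here is a rough Python model of reading fixed-size columns without a terminator. It is only a sketch: `COLUMN_HEIGHT`, `read_column`, and the value 15 are made-up names and numbers for illustration, not anything from your code. On the NES the `remaining` variable would be X, Y, or a temp.

```python
COLUMN_HEIGHT = 15  # assumed number of metatiles per column; use your real size

def read_column(data, offset, height=COLUMN_HEIGHT):
    """Copy one fixed-size column out of `data`, starting at `offset`.

    Returns the column and the offset of the next column, so no $FF
    terminator is needed to find where the next one starts.
    """
    column = []
    remaining = height           # plays the role of the index register counter
    while remaining > 0:
        column.append(data[offset])
        offset += 1
        remaining -= 1           # the 6502 equivalent would be DEX / BNE
    return column, offset
```

Because each call returns the next offset, columns can sit back to back in ROM with nothing between them.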
2) Not quite sure why you have a special case for the sky metatile. My guess is that you were trying to save a byte by not having to declare the sky metatile id the way a normal RLE run does, i.e. for a run of 8 you save a byte like this: $FE, sky_metatile_id, $08 -> $FD, $08.
It makes some sense since sky is used a lot, but you spend a lot of bytes hardcoding the subroutine that handles the special sky case, and if you change the ordering of your tiles in the pattern tables, you have to hunt down the hardcoded special cases and change the codes manually.
Another way to do RLE is to have a single bit (say, bit 7) signify RLE mode instead of a whole byte. Then you can pack the RLE indicator in with the metatile ids. So instead of:
.byte $01, $02 ;2 metatiles
.byte $FE, $10, $04 ;a run of 4 metatiles
.byte $FE, $07, $06 ;a run of 6 metatiles
you could have:
.byte $01, $02 ;2 metatiles
.byte $90, $04 ;a run of 4 (metatile id $10, with the RLE bit set)
.byte $87, $06 ;a run of 6 (metatile id $07, with the RLE bit set)
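This way every run is 2 bytes and it's generic, so nothing needs to be hardcoded. The tradeoff is that bit 7 is no longer available for ids, so metatile ids have to stay in the range $00-$7F.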
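Here is a small Python model of a decoder for that bit-7 format, just to show the logic is short and has no special cases. The names `RLE_FLAG` and `decode_rle` are made up for this sketch, and it assumes the byte after a flagged id is always the run length.

```python
RLE_FLAG = 0x80  # bit 7 set means "the next byte is a run length"

def decode_rle(data):
    """Expand bit-7 RLE data into a flat list of metatile ids."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte & RLE_FLAG:
            tile = byte & 0x7F        # strip the flag to recover the id
            count = data[i + 1]       # run length follows the flagged id
            out.extend([tile] * count)
            i += 2
        else:
            out.append(byte)          # plain metatile, copied as-is
            i += 1
    return out
```

Feeding it the example bytes above ($01, $02, $90, $04, $87, $06) gives two single metatiles, then four of id $10, then six of id $07.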
3) Your drawing buffer appears to draw every frame, even if there is no movement/change. In other words, when the player isn't moving, you keep drawing the same offscreen columns over and over again. This is ok for a demo, but in a game it wastes some drawing time that you might need to do other PPU updates. One way to solve this is to put a cap on the drawing buffer (say a $00). After you draw a column, put the cap at the beginning of your buffer. Then in your buffer->ppu subroutine, skip to RTS if the first byte is $00.
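A quick Python model of the cap idea, assuming your current buffer layout of hi_address, lo_address, tile triples followed by a $00 marker. `CAP`, `flush_buffer`, and `ppu_log` are hypothetical names for this sketch. $00 works as a marker here because a nametable address never has $00 as its high byte.

```python
CAP = 0x00  # assumed end marker; nametable high bytes are $20-$2F, never $00

def flush_buffer(buffer, ppu_log):
    """Model of the buffer->PPU routine that runs once per frame.

    Copies every (hi, lo, tile) triple up to the cap into `ppu_log`,
    then moves the cap to the start so the next frame does nothing
    unless new data has been queued. Returns the number of tiles drawn.
    """
    i = 0
    while buffer[i] != CAP:
        hi, lo, tile = buffer[i], buffer[i + 1], buffer[i + 2]
        ppu_log.append(((hi << 8) | lo, tile))
        i += 3
    buffer[0] = CAP   # mark the buffer as empty
    return i // 3
```

When the first byte is already the cap, the loop body never runs, which is the "skip to RTS" case.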
4) Your drawing buffer could be set up to draw strings of bytes instead of just individual bytes. Right now, you set the target PPU address via $2006 for each byte. But really you only need to write an address to $2006 once per column. The PPU supports column drawing (bit 2 of $2000 selects whether the address advances by 1 or by 32 after each $2007 write, i.e. row or column drawing). So instead of your drawing buffer looking like this:
hi_address, lo_address, tile, hi_address, lo_address, tile, hi_address, lo_address, tile, etc ;3 PPU writes per tile!
you could have:
count, hi_address, lo_address, tile, tile, tile, tile, tile, tile ;set address once, and then 1 PPU write per tile
The key is to set the PPU to increment by 32 via bit 2 in $2000 so that consecutive writes draw a column instead of a row (see http://wiki.nesdev.com/w/index.ph... ).
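To show the difference in write counts, here is a Python model of the count-prefixed format with the address stepping by 32. It is a sketch only: `draw_strings`, the dict standing in for VRAM, and the choice of a $00 count as the end marker are assumptions for illustration.

```python
COLUMN_STEP = 32  # address increment when bit 2 of $2000 is set

def draw_strings(buffer, vram):
    """Draw count-prefixed strings of tiles as columns.

    Buffer layout per string: count, hi_address, lo_address, then `count`
    tiles. A count of 0 ends the buffer. Returns the number of PPU register
    writes the real routine would make (two $2006 writes per string plus
    one $2007 write per tile).
    """
    i = 0
    writes = 0
    while buffer[i] != 0:
        count = buffer[i]
        addr = (buffer[i + 1] << 8) | buffer[i + 2]
        writes += 2                       # address set once per string
        for tile in buffer[i + 3 : i + 3 + count]:
            vram[addr] = tile
            addr += COLUMN_STEP           # the PPU does this step on its own
            writes += 1
        i += 3 + count
    return writes
```

For a string of 3 tiles this comes to 5 register writes, compared with 9 for the address-per-tile layout, and the gap grows with column height.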
Oops, gotta go to class. No time to edit. Hope that made some sense.
Good job, btw. Your code is starting to look slick!