Basic use of CHR-RAM is quite simple. The only difference from CHR-ROM is that its contents are undefined at power-on, so your program has to copy tiles from PRG-ROM to CHR-RAM before it can display them. Here's a simple example of how to populate CHR-RAM so you can start using the name tables, palettes, etc. to show something on the screen:
Code:
;set up a pointer to read the tiles
lda #<MyTilesStart
sta Pointer+0
lda #>MyTilesStart
sta Pointer+1
;set the destination address to the beginning of the pattern tables
;(rendering must be off; reading $2002 resets the $2006 write latch)
bit $2002
lda #$00
sta $2006
sta $2006
;prepare to loop as many times as necessary to copy all the tiles
;(the low byte is taken so that a count of 256 becomes 0, which still loops 256 times)
ldx #<((MyTilesEnd - MyTilesStart) / 16)
CopyTile:
;prepare to copy the first byte of the tile
ldy #$00
CopyByte:
;copy one byte and move on to the next
lda (Pointer), y
sta $2007
iny
;go copy another byte if we haven't copied all 16 yet
cpy #$10
bne CopyByte
;update the tile counter and skip to the end if done
dex
beq Done
;move the pointer over to the next tile and go copy that tile
clc
lda Pointer+0
adc #$10
sta Pointer+0
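bcc CopyTile
;the low byte wrapped, so bump the high byte (INC leaves the carry set, so the BCS always branches)
inc Pointer+1
bcs CopyTile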
Done:
(...)
MyTilesStart:
.incbin "tiles.chr" ;<- this is still in the PRG-ROM area, not after it as is the case with CHR-ROM
MyTilesEnd:
This will copy all the tiles in the CHR file (up to 256, or 4KB) to the beginning of the first pattern table. Note that this can be optimized in many ways; it is intended simply as a straightforward example. After this code runs, you can use those tiles just as if you were using CHR-ROM.
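As one possible optimization, here's a rough sketch that copies whole 256-byte pages (16 tiles each) instead of one tile at a time, so the pointer's low byte never needs updating. It assumes rendering is off, the PPU address increment is set to +1 (bit 2 of $2000 clear), and the tile data is a whole number of pages. NumPages and CopyPage are names I made up for this example; Pointer and MyTilesStart are the same as above.
Code:
;hypothetical constant: 16 pages = 4KB = 256 tiles
NumPages = 16
;set up a pointer to read the tiles
lda #<MyTilesStart
sta Pointer+0
lda #>MyTilesStart
sta Pointer+1
;reset the write latch and point the PPU at $0000
bit $2002
lda #$00
sta $2006
sta $2006
;X counts pages, Y indexes the bytes within the current page
ldx #NumPages
ldy #$00
CopyPage:
lda (Pointer), y
sta $2007
iny
;Y wraps back to 0 after 256 bytes
bne CopyPage
;move on to the next source page, if there are any left
inc Pointer+1
dex
bne CopyPage
The inner loop has no compare instruction and the pointer is only touched once per page, so it spends fewer cycles per byte than the tile-by-tile version.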
You can do the same thing later on to update a few tiles each frame as your game runs, to create animations and the like. Keep in mind that you can only do this during vblank, which is a very short time, so you won't be able to update a lot of tiles each frame. Normally it's possible to update 8 to 12 tiles per frame, depending on what other updates you have to do. That estimate also assumes much more optimized code than the slow example I wrote above. One way to do it is to buffer the data on the stack before vblank starts, and when it does you just run a series of PLA + STA $2007 commands, which take 8 CPU cycles per byte (i.e. 128 cycles per tile).
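To make the stack trick more concrete, here's a minimal sketch for a single tile. It's not the only way to lay this out. TileBuffer, TileAddr and SavedSP are hypothetical names: I'm assuming the main code has already filled the 16 bytes at TileBuffer (the bottom of the stack page) and stored the VRAM destination in TileAddr, high byte first, and that the real stack never grows down into the buffer. The .repeat directive is ca65 syntax; other assemblers have their own equivalent, or you can just write out the 16 pairs by hand.
Code:
;hypothetical location: the lowest 16 bytes of the stack page
TileBuffer = $0100
;this part runs in the NMI handler, after A and X have been saved
;remember the real stack pointer
tsx
stx SavedSP
;PLA increments S before reading, so S = $FF makes the first PLA read $0100
ldx #$FF
txs
;reset the write latch and set the destination address
bit $2002
lda TileAddr+0
sta $2006
lda TileAddr+1
sta $2006
;16 x (PLA + STA $2007) = 16 x 8 = 128 cycles for one tile
.repeat 16
pla
sta $2007
.endrepeat
;put the real stack pointer back before anything else uses the stack
ldx SavedSP
txs
Don't use JSR or let an IRQ fire between the two TXS instructions: while S is parked at $FF, anything pushed would land on top of the real stack's contents.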