What is the max number of CHR tiles that can be updated during vblank on NTSC and PAL? (I'm assuming the full 16 bytes are rewritten, and assuming that sprite DMA takes place)
Edit: The search feature really sucks, I already found out the answer to this question for NTSC, but the search was very unhelpful. Fix the search, and I won't have to ask redundant questions!
I guess it all depends on the method you use. I once heard of a game that would put the data on the stack before VBlank, so that when it's time to write all you have to do is pop the bytes and write to VRAM.
Anyway, I guess that with the quickest methods you'd be able to write 256 bytes safely during NTSC VBlank. That'd be 16 tiles. But then I don't think there would be time for a sprite DMA. Since PAL VBlank lasts more than 3 times as long as the NTSC one, that'd be a lot more tiles. I'm against PAL-only code though.
Anyway, I think it would be smart to give up on a few scanlines so that you could upload a much higher number of tiles.
Are you planning on doing some sort of bitmap display or something?
Fastest way to modify tile memory? Maybe doing INC $2007 over and over, since it does a double write in a 6-clock instruction, though it also reads first, so it'd modify two out of every three bytes. You can fill tile memory quickly using an unrolled loop consisting of a bunch of STA $2007 instructions. If the data can be determined in advance, you can dynamically generate code consisting of LDA #value, STA $2007 for each byte to modify. As tokumaru mentioned, you can fill the stack with the reversed data then pop it off in an unrolled loop consisting of PLA, STA $2007.
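To make the generated-code idea concrete, here is a rough Python sketch of the byte layout such a routine would have. On the console itself the generator would be 6502 code writing into RAM before VBlank; Python is used here only to show the bytes. The function name `generate_upload_code` is made up for illustration. The opcodes are the standard 6502 encodings for LDA immediate, STA absolute, and RTS.

```python
LDA_IMM = 0xA9  # LDA #imm (2 bytes, 2 cycles)
STA_ABS = 0x8D  # STA abs  (3 bytes, 4 cycles)
RTS = 0x60      # return to the caller (e.g. the NMI handler)


def generate_upload_code(data, port=0x2007):
    """Build 'LDA #value / STA port' for every byte in data, then RTS.

    Hypothetical helper: it only models the byte layout of the routine.
    """
    code = bytearray()
    for value in data:
        # Absolute addresses are stored low byte first on the 6502.
        code += bytes([LDA_IMM, value & 0xFF, STA_ABS, port & 0xFF, port >> 8])
    code.append(RTS)
    return bytes(code)


if __name__ == "__main__":
    routine = generate_upload_code([0x12, 0x34])
    print(routine.hex(" "))
```

Each data byte costs 5 bytes of generated code, so the RAM needed for the routine grows quickly with the number of tiles.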
With about 2270 CPU clocks for VBL, reduced by 513 for sprite DMA, that leaves about 1757 clocks. At 4, 6, and 8 clocks per byte respectively, I calculate 439 bytes (27 tiles) filled, 292 bytes (18 tiles) for dynamic writing, and 219 bytes (13 tiles) for the stack method.
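The arithmetic behind those figures can be checked with a short Python sketch. It uses the cycle counts quoted above and the standard 6502 timings (STA absolute 4, LDA immediate 2, PLA 4). It assumes no overhead for NMI entry, $2006 address setup, or anything else in the handler, so real budgets will be somewhat lower. The names are made up for illustration.

```python
VBLANK_CYCLES = 2270     # approximate NTSC VBlank length quoted above
SPRITE_DMA_CYCLES = 513  # cost of the sprite DMA quoted above
BYTES_PER_TILE = 16

# CPU cycles spent per byte written to $2007 for each method.
METHODS = {
    "fill (unrolled STA $2007)": 4,
    "dynamic (LDA #value, STA $2007)": 2 + 4,
    "stack (PLA, STA $2007)": 4 + 4,
}


def budget(cycles_per_byte, vblank=VBLANK_CYCLES, dma=SPRITE_DMA_CYCLES):
    """Return (bytes, whole tiles) that fit in VBlank after sprite DMA."""
    available = vblank - dma
    n_bytes = available // cycles_per_byte
    return n_bytes, n_bytes // BYTES_PER_TILE


if __name__ == "__main__":
    for name, cycles in METHODS.items():
        n_bytes, n_tiles = budget(cycles)
        print(f"{name}: {n_bytes} bytes, {n_tiles} tiles")
```

Passing `dma=0` shows how much is gained on a frame where the sprite DMA is skipped.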
If you need to quickly modify every pixel of many tiles per frame, you could double the tile counts above by modifying only one bit plane of the tiles. To get around the problem of quickly skipping every other 8 bytes, you could use the PPU's +32 address increment mode and write the data in a somewhat convoluted fashion that reduces the number of $2006 address changes to 8:
$0000, $0020 ...
$0001, $0021 ...
This wouldn't complicate the code much since it would only involve storing the source data in this scrambled order. It would mean that you use even tiles only: 0, 2, 4 ...
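Here is a small Python sketch of that scrambled order, with a toy model of the +32 increment to check that the bytes land where expected. It assumes the upload starts at $0000 and stays within one pattern table; `scramble_plane` and `simulate_upload` are hypothetical names, not part of any existing tool.

```python
def scramble_plane(planes):
    """Reorder plane-0 data for upload with the +32 increment.

    planes: list of 8-byte sequences, one per even tile (tile 0, 2, 4, ...).
    Output order: row 0 of every tile, then row 1 of every tile, and so on.
    """
    out = bytearray()
    for row in range(8):
        for plane in planes:
            out.append(plane[row])
    return bytes(out)


def simulate_upload(stream, tile_count, base=0x0000):
    """Toy model of the writes: returns {vram_address: value}."""
    vram = {}
    i = 0
    for row in range(8):
        addr = base + row  # one address change via $2006 per row
        for _ in range(tile_count):
            vram[addr] = stream[i]  # one $2007 write
            i += 1
            addr += 32  # PPU adds 32 after each $2007 access in this mode
    return vram


if __name__ == "__main__":
    planes = [bytes(range(t * 8, t * 8 + 8)) for t in range(3)]
    vram = simulate_upload(scramble_plane(planes), len(planes))
    for addr in sorted(vram)[:8]:
        print(f"${addr:04X} = {vram[addr]}")
```

Since each tile is 16 bytes, consecutive writes 32 bytes apart hit the same row of tiles 0, 2, 4, and so on, which is why only even tiles get used.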
This has been covered here.
Tell us if you still need more explanations.