pops wrote:
Thanks for the replies, everyone. I was particularly interested to learn what percentage of a frame was spent in sprite rendering for everyone else. If anyone would like to chime in on this subject, I'd be interested in hearing your statistics as well.
The following post was written while very tired; look forward to missing words, or words that rhyme with the intended word.
(Edit 3: I just removed some evidence of the tiredness. A whole block of text was in this post twice. >_>)
I typically love to share this sort of info because I also usually want it.
I always optimize a lot, but hearing other people's cycle counts helps me know if someone's doing something faster, so I can then ask how they did it.
But in this case, it's fairly difficult for me to figure out how long rendering alone takes, because in my game each object has its own render behavior, which can be used solely for rendering or for many other things. Many objects check for collisions in their render routine rather than their main routine. This ensures that every object's position is already final for that frame, so you don't get situations where player one can't grab player two, but player two could grab player one if their positions/speeds were swapped. If I checked in the main routine instead, player one's grab attempt could fail, then player two could move into player one's grab range later in the same frame, and since player one's position would already be final by the time player two tried to grab him, player two's grab would succeed... (PS: I highly advise absolutely everyone doing NES development to ignore things like this. People don't really notice one-frame advantages between player ports unless they specifically look for them, and the time it takes to make everything totally fair is huge. I'm somewhat proud that I've eliminated most, if not all, cases of frame-unfairness between players, but at the end of the day it just makes my game slower for something no one cares about.)
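A minimal sketch of the ordering problem described above, in Python rather than 6502 assembly. The `Player` class, `GRAB_RANGE`, and both frame functions are hypothetical; the point is only that checking grabs after every position is final (which is what running the checks in the render pass achieves) makes the result independent of which player is processed first.

```python
from dataclasses import dataclass

GRAB_RANGE = 16  # hypothetical grab distance in pixels


@dataclass
class Player:
    name: str
    x: int
    speed: int
    grabbing: bool = False  # does this player attempt a grab this frame?


def move(p):
    p.x += p.speed


def can_grab(grabber, target):
    return grabber.grabbing and abs(grabber.x - target.x) <= GRAB_RANGE


def frame_single_pass(players):
    """Move and check in the same loop: later players see earlier players'
    NEW positions, while earlier players see later players' OLD positions."""
    grabs = []
    for p in players:
        move(p)
        for other in players:
            if other is not p and can_grab(p, other):
                grabs.append((p.name, other.name))
    return grabs


def frame_two_phase(players):
    """Move everyone first, then check grabs once every position is final
    (analogous to doing the collision checks in the render pass)."""
    for p in players:
        move(p)
    grabs = []
    for p in players:
        for other in players:
            if other is not p and can_grab(p, other):
                grabs.append((p.name, other.name))
    return grabs
```

With player one standing still and player two closing in from 20 pixels away, the single-pass version lets only player two grab, and reversing the processing order changes the outcome; the two-phase version reports both grabs either way.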
Anyway, no one asked about any of that. Here's the info you're looking for:
My fastest render routine that draws a 2x2 object to the screen clocks in at around 439 cycles. It does a few things that aren't directly related to rendering, but those probably take fewer than 20 cycles, whatever. Doing it 16 times would be 7,024 cycles plus some for the loop, so we'll say 7,324, whatever.
In this case, the horizontal/vertical flip bits are known in advance rather than calculated (but they still have to be combined with the object's palette), and the tiles are not fixed: I have to add an offset to each tile number, because the same object's set of tiles might be stored at a different place in a different CHR bank. This routine also will not draw, say, the right side of the object on the left side of the screen when the object's top-left corner is on screen but its right side hangs past the right edge. It's a per-sprite check. If you don't care about this, you can do it even faster.
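Here's a rough sketch of what a 2x2 routine like this has to do per sprite, written in Python for readability rather than speed. It assumes 8x8 sprites, OAM entries represented as (Y, tile, attribute, X) tuples, the standard NES attribute bits (palette in bits 0-1, horizontal flip in bit 6, vertical flip in bit 7), and hypothetical parameter names. Flip bits are fixed per corner and ORed with the palette, a tile offset is added for the CHR bank layout, and each sprite is skipped individually if it falls off screen instead of wrapping.

```python
HFLIP = 0x40  # attribute bit 6: horizontal flip
VFLIP = 0x80  # attribute bit 7: vertical flip

# Pixel offsets of the four 8x8 sprites: top-left, top-right, bottom-left, bottom-right.
CORNER_OFFSETS = [(0, 0), (8, 0), (0, 8), (8, 8)]


def draw_2x2(oam, world_x, world_y, cam_x, cam_y,
             tiles, corner_flips, palette, tile_offset):
    """Append up to four (y, tile, attr, x) entries for one 2x2 metasprite."""
    base_x = world_x - cam_x
    base_y = world_y - cam_y
    for (dx, dy), tile, flip in zip(CORNER_OFFSETS, tiles, corner_flips):
        sx, sy = base_x + dx, base_y + dy
        # Per-sprite check: skip anything off screen instead of letting the
        # 8-bit X wrap around to the other side.
        if not (0 <= sx <= 255 and 0 <= sy <= 239):
            continue
        attr = flip | (palette & 0x03)          # fixed flip bits + palette
        tile_id = (tile + tile_offset) & 0xFF   # relocate within the CHR bank
        # Real hardware draws a sprite one scanline below its stored Y;
        # that adjustment is ignored here.
        oam.append((sy, tile_id, attr, sx))
    return oam
```

For an object whose top-left corner sits at X=250, the two right-hand sprites would land at X=258, so they are dropped rather than wrapped to X=2.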
... And wait, I guess I lied, because what I thought was going to be my slowest 2x2 routine is actually faster, at 382 cycles max. Those 382 cycles are for an object whose horizontal/vertical flipping is not known in advance, but which adds no offsets to its CHR tile numbers. (It's the main character, so its sprites are always in the same part of the CHR bank.) Doing it 16 times would be 6,112 cycles plus some for the loop, so 6,412, whatever.
I thought it would be slower because it also accounts for wrapping (in levels with wrapping, it will draw the right side of the object on the left side of the screen; in levels without, it doesn't) and makes a few more decisions. The reason it's faster is that it skips the setup (storing to temp RAM so the generic function can act on it, then loading the return values), plus the tiny 12 cycles of overhead you get from each subroutine call (6 for JSR, 6 for RTS).
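On the NES, sprite X is a single byte, so adding 8 for the right column naturally wraps past 255 and sets the carry flag. A sketch of how that carry could drive the wrap/no-wrap decision, assuming a wrapping level exactly one screen (256 pixels) wide so that the 8-bit wrap matches the level's wrap; the function name is hypothetical.

```python
def right_column_x(left_x, wrapping_level):
    """Return the right column's sprite X for a 2x2 object, or None to skip it."""
    total = left_x + 8
    carry = total > 0xFF       # what ADC's carry flag would report
    right_x = total & 0xFF     # the byte that would actually be stored in OAM
    if carry and not wrapping_level:
        return None            # hanging off the right edge: don't draw it
    return right_x             # on screen, or wrapped to the left side
```

In a wrapping level the carry is simply ignored, which costs nothing; the non-wrapping case is the one that needs the extra branch.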
So 10,500 cycles just for translating "real" positions to screen positions for sprites/sprite tiles seems a little high to me, for what it's worth.
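The arithmetic above, as a quick budget calculator to answer the "what percentage of a frame" question. It assumes an NTSC NES (about 29,780.5 CPU cycles per frame) and uses the rough 300-cycle loop overhead estimated above; `sprite_budget` is a hypothetical helper.

```python
NTSC_CYCLES_PER_FRAME = 29780.5  # approx. CPU cycles per NTSC frame


def sprite_budget(cycles_per_object, objects=16, loop_overhead=300):
    """Return (total cycles, percent of an NTSC frame) for drawing `objects` objects."""
    total = cycles_per_object * objects + loop_overhead
    return total, 100 * total / NTSC_CYCLES_PER_FRAME


for name, cycles in [("tile-offset 2x2", 439), ("main character 2x2", 382)]:
    total, pct = sprite_budget(cycles)
    print(f"{name}: {total} cycles, {pct:.1f}% of a frame")
```

By this estimate, 16 objects take roughly 24.6% of an NTSC frame with the 439-cycle routine and roughly 21.5% with the 382-cycle one, while 10,500 cycles would be about 35%.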
Edit 2: I guess I should mention that none of my sprite rendering routines will flip metasprites. In most cases I have to have both a left-facing and a right-facing metasprite defined in ROM. Though for objects like a rotating ball, you can just swap the tiles drawn in the upper-right and upper-left corners to flip horizontally, etc. In fact, the object that took 439 cycles does exactly that. That's why its flipping is known in advance: it's always the same for each corner. If flipping metasprites is where your time is going, it's probably not that bad, though you might still be able to make it slightly faster.
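For comparison, here's what flipping a 2x2 metasprite at runtime would involve if you didn't store both facings in ROM: swap the left and right columns (or top and bottom rows) and toggle the matching flip bit on every sprite. A sketch with hypothetical function names and the standard NES flip bits. The test case assumes a fully symmetric object, such as a ball built from one quarter tile mirrored into all four corners (an assumption for illustration), which comes back unchanged from either flip, so it needs no runtime flip work at all.

```python
HFLIP = 0x40  # attribute bit 6
VFLIP = 0x80  # attribute bit 7

# A 2x2 metasprite is [top-left, top-right, bottom-left, bottom-right],
# each entry a (tile, attribute) pair.


def flip_h(corners):
    """Mirror left-right: swap the columns and toggle each sprite's H-flip bit."""
    tl, tr, bl, br = corners
    return [(tile, attr ^ HFLIP) for tile, attr in (tr, tl, br, bl)]


def flip_v(corners):
    """Mirror top-bottom: swap the rows and toggle each sprite's V-flip bit."""
    tl, tr, bl, br = corners
    return [(tile, attr ^ VFLIP) for tile, attr in (bl, br, tl, tr)]
```

For an asymmetric character, `flip_h` turns tiles 1, 2, 3, 4 into 2, 1, 4, 3 with the H-flip bit set, and applying it twice gives back the original.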
For scrolling, again it's hard for me to figure out (but I won't rant about it this time). But 15% of frame time for both a horizontal and a vertical update sounds pretty fast to me, depending on what it includes. (Does it include the NMI routine drawing from the buffers, or decompressing your level map?) Even if it includes none of that, it may still be faster than mine, which is pretty cool.
If you're really interested, I can try to get exact counts, but I will say that in rare cases scrolling alone in my game can take up to about half of my frame time, if the player happens to scroll past both the x screen boundary and the y screen boundary in the same frame and the screens themselves are fairly hard on the decompression routine. (This includes the NMI drawing time and decompressing 3 new screens of the level.) For cases where screen decompression is not involved, I think mine does around the same as yours, maybe worse.
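One way to see why crossing both boundaries at once is the expensive case: a 256x240 viewport overlaps at most a 2x2 block of screens, so crossing an x boundary and a y boundary in the same frame brings in a new column, a new row, and the diagonal corner screen, three in all. This sketch assumes whole screens are decompressed when they first overlap the viewport; it's an illustrative model with hypothetical function names, not a description of the engine above.

```python
SCREEN_W, SCREEN_H = 256, 240  # one NES screen in pixels


def screens_in_view(cam_x, cam_y):
    """Set of (screen_col, screen_row) overlapped by a 256x240 viewport."""
    cols = {cam_x // SCREEN_W, (cam_x + SCREEN_W - 1) // SCREEN_W}
    rows = {cam_y // SCREEN_H, (cam_y + SCREEN_H - 1) // SCREEN_H}
    return {(c, r) for c in cols for r in rows}


def new_screens(old_cam, new_cam):
    """Screens that became visible this frame (and would need decompressing)."""
    return screens_in_view(*new_cam) - screens_in_view(*old_cam)
```

Moving the camera from (250, 230) to (260, 245) crosses both boundaries and yields three new screens; moving from (250, 0) to (260, 0) crosses only the x boundary and yields one.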
Edit: Here's an old post of mine, in a whole topic you might be interested in:
viewtopic.php?p=86756#p86756
I think I've made my scrolling faster since then. But the huge CPU sucker in my game is the main character. His routine takes even more time now that he does more, and I'm trying to support two of them with minimal slowdown...