How many cycles takes in your program each section?
-Game logic
-Scrolling
-Apply objects physics
-Drawing objects (to shadow oam)
-Object-Object collision
-Object-Background collision
I think I will run out of time if there's more than 6-7 objects in a frame, only drawing the metasprites (with H or V flip possible) takes about ySize * xSize * 73 cycles (+setup) per metasprite.
Your metasprite drawing routine sounds incredibly slow. Are you sure you can't optimize it anymore?
Anyway, one of the slowest things in my engine is the scrolling, because of the realtime map decoding that has to be done. I believe it takes something close to 30 scanlines to process a new row or column of metatiles (meaning that the time taken to update both in the same frame is considerable). I'm considering unrolling some of the code to make it faster.
tokumaru wrote:
Your metasprite drawing routine sounds incredibly slow. Are you sure you can't optimize it anymore?
Anyway, one of the slowest things in my engine is the scrolling, because of the realtime map decoding that has to be done. I believe it takes something close to 30 scanlines to process a new row or column of metatiles (meaning that the time taken to update both in the same frame is considerable). I'm considering unrolling some of the code to make it faster.
Each sprite copy is something like:
Code:
lda _tmpMetaSprite.y
sta shadowOAM, x
lda _tmpMetaSprite.firstTile
sta shadowOAM+1, x
clc
adc _vidXTileInc
sta _tmpMetaSprite.firstTile
lda _tmpMetaSprite.attributes
sta shadowOAM+2, x
lda _tmpMetaSprite.x
sta shadowOAM+3, x
clc
adc #TILE_X_LENGTH
sta _tmpMetaSprite.x
if(not carry) {
txa
clc
adc #sizeof(OAM_ENTRY)*7
if(zero) {
lda #sizeof(OAM_ENTRY)*7
}
sta _vidLastHWSpriteUsed
tax
dec _vidSpritesFree
} else {
return
}
dey
Plus you have to reverse X order if Hflip and invert y order if Vflip
I don't think I have any suggestions for improving your routine, since a lot of game aspects have influence over this kind of thing. I just thought the times you mentioned sounded bad, but then again this is a complex process due to the flipping, clipping of parts that are off screen and even OAM cycling, so it might not be as slow as I first thought it was. Mine is fairly slow too, but I don't have any numbers.
tokumaru wrote:
I don't think I have any suggestions for improving your routine, since a lot of game aspects have influence over this kind of thing. I just thought the times you mentioned sounded bad, but then again this is a complex process due to the flipping, clipping of parts that are off screen and even OAM cycling, so it might not be as slow as I first thought it was. Mine is fairly slow too, but I don't have any numbers.
I use the virtuanes referred here:
http://nesdev.com/bbs/viewtopic.php?t=5836
To check routines time, it's really useful.
You only have to write to a register to start counting and write to another to stop the count.
The engine of my game was programmed in this order :
- Move the player arround
- Execute enemies (objects) AI
- Draw graphical effects (explosions etc...)
- Sort player and objects in function of their Y coordinate (my game use Zelda-style top down view so sprite sorting is required)
- Draw player and objects
- Update status bar if necessarly
- Check if level beaten or game over, and if not, repeat indefinitely
Music is done in the NMI thread and scrolling is only executed when wrapping from a screen to another.
Metasprite mazing is fairly slow but to improve it a little I used self-modifying code so that I could do sprite cycling while being as fast as if no sprite cycling chances (it's the code itself that will change the order in which the sprites are mazed, instead of having to check a variable and draw the sprite in a different order in function of this variable).
Ah, I remember that. I guess I'll time my routines with it once I finish migrating my code to a new mapper (again!... I'm going for BxROM now, so that I can go up to 512KB with really simple hardware).
Hardwired banks : to hell
I think my scrolling + NMI takes like... max 14,000 cycles? And that is SUPER uncommon.
When my game is doing nothing it's 2,400 cycles for the NMI+other basic stuff.
Scrolling left or right direction is 4,440 or 6,900 (these counts all include the NMI and basic game stuff as far as I know) cycles when a new screen isn't being decoded. This is when it's scrolling fast and passed both a tile boundary and an attribute boundary in the same frame. If it's scrolling slower than 4 pixels a frame, it's only ~5,000 cycles. ~9,000 when it scrolls passed a screen boundary+tile boundary+attribute boundary. (It can be more, depending on the screens, but it's uncommon)
Scrolling up down is a little worse. ~9,000 for scrolling passed a screen boundary, ~6,000 for regular scrolling.
The absolute worse is when it's scrolling in both directions at once, and passed both screen boundaries at once, and even then it ensures that the shared screen both scroll to is not loaded twice.
My main character's routine (sonic style slope physics, + 8 direction shooting) takes a max of like 5,000 cycles. And I wanna support co-op, so I have my work cut out for me. The absolute worse case is the frame when jumping, and the frame when landing, but other than those two cases, it's 2,500 to 3,000.
As it stands though, if both players land in the same frame, on a frame where it also passed two scroll boundaries I'm using 24,000 cycles. This is with no object-object collision detection, and no object other than the bullets.
I hope that with simple AI, and my current plan to optimize the main character's routine I can pull this off. Otherwise, I might be stuck with an N+ style game with few objects, because I don't wanna compromise my awesome physics.
I'll probably even support a 4 player battle mode, since with a screen that doesn't scroll that can definitely happen.
Bregalad wrote:
Hardwired banks : to hell
Hehehe! I used to think that switching the entire 32KB at once would be a pain, but by using a "virtual" fixed bank (small piece of code repeated at the end of every bank), it's actually not that bad.
Wave wrote:
How many cycles takes in your program each section?
-Game logic
-Scrolling
-Apply objects physics
-Drawing objects (to shadow oam)
-Object-Object collision
-Object-Background collision
I think I will run out of time if there's more than 6-7 objects in a frame, only drawing the metasprites (with H or V flip possible) takes about ySize * xSize * 73 cycles (+setup) per metasprite.
One strategy I utilized in my game was splitting logic between even and odd frames. I'm sure not everyone here is a fan of this strategy, but it ended up working incredibly well for me and I couldn't tell a major difference in sprite movement or lag on either my CRT (running from the PowerPak/NES), my flatscreen in the living room (again, running from my PowerPak/NES) or on an emulator on my laptop.
For instance, the "Apply objects physics", "Object-Object collision", and "Drawing objects (to shadow oam)" together would end up being too much and push things out of frame about 80-90% of the time. So what I ended doing was running the calculations in even frames and then doing the drawing part of it on odd frames. Of course this means you are essentially slowing your game down by half, so you need to re-tweak the speeds at which things move in order to keep the movement close to how you originally designed the game. Another drawback is you lose versatility in object movement. If you're using 16bit numbers for your movement calculations (I used them for parabolic type movements), your paths won't look as smooth because fewer frames will be used to display the updated positions. But again, I really couldn't tell that much of a difference once I re-tweaked everything. And a major advantage for me was the ability to get more stuff on the screen. And the game crashing from out-of-frame type conflicts became nil.
Obviously, the first solution you want to aim for is to optimize your code. But if you get to a point where you feel you can't optimize it much more and you reeeeeeally want it done, splitting logic between frames is another possibility.
Thwaite runs most logic at 60 or 50 Hz. But collision detection is effectively 30 or 25 Hz, as it tests half the explosions against all the missiles. So is the smoke, as only half the particles are drawn in each frame. Some tasks are run at only 10 Hz (once every 6 frames on NTSC or 5 on PAL), running one step in each frame:
- Spawn one new missile if needed
- Look for threatened buildings
- Remove expired tip from screen
- Move half the villagers by one step (scurry.s)
- Increment the game clock by one tenth of a second and redraw the score area at 2 Hz (0/10 and 5/10) to flash the colon
tepples wrote:
Thwaite runs most logic at 60 or 50 Hz. But collision detection is effectively 30 or 25 Hz, as it tests half the explosions against all the missiles. So is the smoke, as only half the particles are drawn in each frame. Some tasks are run at only 10 Hz (once every 6 frames on NTSC or 5 on PAL), running one step in each frame:
- Spawn one new missile if needed
- Look for threatened buildings
- Remove expired tip from screen
- Move half the villagers by one step (scurry.s)
- Increment the game clock by one tenth of a second and redraw the score area at 2 Hz (0/10 and 5/10) to flash the colon
Yep, it was actually from watching Thwaite @ MGC last year and asking you about it that I "borrowed" this idea. Didn't do the exact same thing as I also draw at 30Hz, but still ended up being a really useful solution. Placing lower priority tasks at lower rates (like the tasks going at 10Hz is a really effective idea, as well.)
If you have the rom to waste, you can pre-flip your sprites. I combined this with self-modifying code (to avoid indirection with that huge table of data). The biggest drawback so far has been palette swapping. I don't have any numbers, but I remember when I made the change, the worst case was reduced by about half.
For my sprite drawing, I have separate routines for different types of sprites. In my current project, I have different routines for single tiled sprites, metasprites, metasprites flipped horizontally, and absolute sprite maps. Absolute sprite maps are the simplest type to draw, because they are just drawn as they are defined on the screen with 8-bit coordinates (like the word "PAUSE" when you pause the game). It also saves a ton of cycles when you know you're drawing only 1 sprite, so having a predefined routine for this is a great thing as well.
I don't bother with flipping metasprites vertically. I don't have much purpose for it in my game, and if I do, I'll just have a predefined vertically flipped version of that particular metasprite. However, flipping horizontally is very common. For all of my metasprites, I define the relative distance of each single sprite from the upper left corner, and I just add that distance to the upper left corner's coordinates when I draw each sprite. I have a separate routine that does the same, except it subtracts the relative distance from the upper left corner, thus flipping the sprite horizontally.
In terms of my main loop, it looks like this:
1. Handle Character
2. Handle AI
3. Handle Scrolling
4. Handle Animation
5. Handle Spontaneous objects
6. Draw Sprites
All together, my engine can handle about 5 intelligent enemies on screen with bullets flying around (the player's and enemies'), and of course the player and everything else without slowing down. My engine handles up to 8 active objects, but this slows things down.
My NMI routine is tied up until the sprite 0 hit for my status bar, but I use this time for handling sound.
Is it really that hard to use a single routine that draws the same metasprites both normal and flipped? Can't you just XOR each sprite piece's X coordinate with a zero page variable that can be set to $0F or $1F before adding it to the metasprite's X origin point?
tepples wrote:
Is it really that hard to use a single routine that draws the same metasprites both normal and flipped? Can't you just XOR each sprite piece's X coordinate with a zero page variable that can be set to $0F or $1F before adding it to the metasprite's X origin point?
My routine that draws sprites handles flipping dynamically. I can't remember the details right now, but the trick is indeed XOR'ing the coordinates with some ZP variable.
tepples wrote:
Is it really that hard to use a single routine that draws the same metasprites both normal and flipped? Can't you just XOR each sprite piece's X coordinate with a zero page variable that can be set to $0F or $1F before adding it to the metasprite's X origin point?
It's not hard... It's just slow. The problem arises if you are using 16-bit coordinates (I mean for off-screen activity). Unless you are absolutely certain all of the sprites that make an object will fit on screen, you have to do 16-bit arithmetic with the origin point and individual sprites. So adding 8 pixels is actually adding $0008 to the origin point. With XORing for flipping, you have to add $FFF8 to the origin point. This requires you to perform XORing on two bytes, which in the end, can slow things down a lot. A lot less cycles are involved with A) hardcoding the high part of the 16-bit addition, and B) just subtracting instead of calculating an opposite to add.
Celius wrote:
With XORing for flipping, you have to add $FFF8 to the origin point. This requires you to perform XORing on two bytes
I get away with XOR'ing only the low byte by displacing the origin by 128 pixels in both axis. This way all sprite coordinates are positive, even after you XOR them, so there's no need to XOR the high byte (it's always 0). There's of course the added cost of subtracting 128 from both coordinates, but maybe that can be incorporated in the part that calculates screen coordinates by subtracting the camera's position from the object's position (i.e. just subtract an additional 128 pixels).