I was just wondering how games usually handle this...
Most objects in a game are composed of multiple sprites, we all know that. They probably store the information about how to compose each object in tables. What I'm wondering is how games efficiently convert that data into OAM values, especially when only part of the object is visible.
Calculating the position of each individual sprite based on the actual position of the object and verifying whether the result is inside the screen seems like a lot of trouble, especially if nearly all 64 sprites are used.
And there are other things to take into consideration, such as sprite flipping. The mappings for each object would have to be scanned in a different order depending on the side it's facing.
Handling all of that could take a lot of the processing time. Does anyone have anything to say about this?
tokumaru wrote:
Calculating the position of each individual sprite based on the actual position of the object and verifying whether the result is inside the screen seems like a lot of trouble, especially if nearly all 64 sprites are used.
Super Mario 3 slowed down once half of that many sprites were in use.
Quote:
And there are other things to take into consideration, such as sprite flipping. The mappings for each object would have to be scanned in a different order depending on the side it's facing.
Handling all of that could take a lot of the processing time.
How many metasprites do you plan to have in a shot at once?
tepples wrote:
Super Mario 3 slowed down once half of that many sprites were in use.
That is the problem... I used to think that games slowed down when many enemies were present because of the AI of each one... but since enemy AI is usually very simple, I'm starting to think that drawing their sprites is what takes most of the time...
Quote:
How many metasprites do you plan to have in a shot at once?
I don't really know yet, but I expect to have a lot going on at once.
I have already coded a routine that will render sprite mappings while doing all the necessary stuff, like flipping and making sure only sprites that are inside the screen are rendered, but I haven't tested its speed yet.
I'm finishing my scrolling engine now; as soon as it's ready I'll pay some attention to the sprites. I was just trying to think a little ahead. I guess there is no shortcut for this.
This isn't a hard task, but you'll most probably never make a super-ultra-fast sprite mazing routine either.
What I use is basically an indexed table for each enemy, position and direction (three-dimensional tables!) in order to get each individual sprite set with its own relative position, palette/flipping, and so on. I incorporated a feature that lets a row of "hardware" sprites with consecutive tile indices be stored as one single "software" sprite, allowing 16x8, 24x8 and 32x8 "software" sprites in order to slightly reduce the amount of ROM used.
The sprite mazing routine tests screen overflow manually for each sprite. The only other alternative is to make a whole object disappear at once; this should be harder to do but maybe less time-consuming.
In the end I think sprites eat more ROM than processing time. You can also impose the limitation that all sprites in a single object are mazed on an 8x8 grid, so that you only need a couple of bits for the position of each sprite (instead of a full byte for both horizontal and vertical position), but you may lose that ROM space again in the CHR ROM/RAM data. Also, forcing each object to use a single palette would reduce this, but would also reduce graphics quality.
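To illustrate the "software sprite" rows mentioned above, here is a minimal Python sketch (a model of the idea, not 6502 code and not the actual table format used in that engine). The entry layout and the name `expand_software_sprite` are assumptions made up for this example.

```python
def expand_software_sprite(rel_x, rel_y, first_tile, attributes, width_in_tiles):
    """Expand one 'software' sprite into a row of 8x8 'hardware' sprites.

    Hypothetical layout: the row uses consecutive tile indices, so only the
    first tile and the width (1 to 4 tiles) have to be stored in ROM.
    """
    if not 1 <= width_in_tiles <= 4:
        raise ValueError("width_in_tiles must be between 1 and 4")
    sprites = []
    for i in range(width_in_tiles):
        sprites.append({
            "rel_x": rel_x + 8 * i,           # each hardware sprite is 8 pixels wide
            "rel_y": rel_y,                   # all sprites share the same row
            "tile": (first_tile + i) & 0xFF,  # consecutive tile indices
            "attributes": attributes,         # shared palette/flip bits
        })
    return sprites


# A 24x8 software sprite starting at tile $40, 12 pixels left of the object origin.
row = expand_software_sprite(-12, 0, 0x40, 0x01, 3)
```

One stored entry stands for up to four OAM entries here, which is where the ROM saving would come from.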
If you make every object disappear at once, it'll actually be a lot easier. That way, you'll only have to check if the whole sprite is on screen. If it is, then render it. If not, don't render it.
This'll actually give me a lot to think about for my game. I haven't thought about sprites being halfway on screen yet.
EDIT:
You can check whether or not the object is on screen. If it is, just use 16 bits to calculate the X/Y coordinates of each sprite. If the high byte comes out not equal to $00, you'll know it's off screen. For example, if the enemy's right edge is 20 pixels past the right edge of the screen, you'll come out with a value of $114 for the X coordinate, with $01 as the high byte and $14 as the low byte.
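The high-byte test can be modeled in a few lines of Python. This is only a sketch of the arithmetic, assuming 16-bit world coordinates for both the object and the camera; the function names are made up for the example.

```python
def screen_coordinate(object_pos, camera_pos):
    """Return (high_byte, low_byte) of the 16-bit difference object - camera."""
    diff = (object_pos - camera_pos) & 0xFFFF  # wraps like a 16-bit subtraction
    return diff >> 8, diff & 0xFF


def is_on_screen(object_pos, camera_pos):
    """The coordinate is on screen only when the high byte is $00."""
    high, _low = screen_coordinate(object_pos, camera_pos)
    return high == 0x00


# Camera at $0300, point at $0414: 20 pixels past the right edge.
high, low = screen_coordinate(0x0414, 0x0300)  # high = $01, low = $14
```

A point to the left of the camera wraps around to a high byte of $FF, so the same "high byte is not $00" test rejects it as well.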
And also, you can use bounding boxes to determine if the enemy is on screen. If the enemy collides with the edge of the screen, you'll know it should be displayed.
What you describe disallows mismatches between an enemy/object's graphics width/height and its collision window width/height. I usually give my enemies a collision window smaller than they actually appear, to give the player some margin (and because not all of the tile area is used).
However, a per-object drawing condition can be the best choice for a game with fixed-size objects, such as an RPG.
Bregalad wrote:
What you describe disallows mismatches between an enemy/object's graphics width/height and its collision window width/height.
I don't really see what you mean. Could you clarify a little?
Well, you'd want an enemy to be wider and taller on the screen than in the "internal" calculation used to detect hits with the BG or other sprites. The technique you describe uses hits to do the screen mazing stuff, and personally I like to keep the two completely separate.
Celius, my routine works (does it? I haven't tested!) exactly like you described! It first calculates where the central point of the object is, relative to the camera (object coordinates minus camera coordinates). Some tweaking is done to the resulting coordinates if the sprite is flipped. Then it scans through the sprite definitions, which have the following format:
Code:
.sprite count;
.relative X coordinate;
.relative Y coordinate;
.tile index;
.sprite attributes;
(repeat "sprite count" times)
I chose this format for the sprite definitions instead of more compact ones because of the freedom it gives me. Each sprite can use a different palette, and they can be easily layered and don't have to be arranged as a grid.
Anyway, the relative coordinates are inverted if the sprite is flipped in that direction, and are then added to the screen coordinates of the object. If the high byte is 0, it is output. The attributes byte is in the regular format that the PPU uses, but this routine receives a mask that is XOR'ed with the attribute byte of each sprite, so that the tiles can be flipped, the priority and palette can be changed, etc. When the flipping bits are set in this mask, this means that the whole sprite is flipped, and the coordinates are to be inverted before the addition.
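The steps described above can be sketched in Python as a rough model. It assumes the relative coordinates are signed bytes measured from the object's central point, that "inverting" a coordinate mirrors it around that point (so the sprite size has to be subtracted as well), and that an output entry uses the OAM byte order Y, tile, attributes, X. All names here are hypothetical.

```python
HFLIP = 0x40  # PPU attribute bit 6: flip horizontally
VFLIP = 0x80  # PPU attribute bit 7: flip vertically


def to_signed(byte):
    """Interpret a byte as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def draw_metasprite(definition, object_x, object_y, camera_x, camera_y,
                    mask=0x00, sprite_w=8, sprite_h=8):
    """Return the visible sprites as (y, tile, attributes, x) tuples."""
    base_x = object_x - camera_x  # object position relative to the camera
    base_y = object_y - camera_y
    count = definition[0]         # first byte: sprite count
    oam = []
    for i in range(count):
        rel_x, rel_y, tile, attr = definition[1 + 4 * i: 5 + 4 * i]
        dx, dy = to_signed(rel_x), to_signed(rel_y)
        if mask & HFLIP:
            dx = -dx - sprite_w   # mirror around the object's central point
        if mask & VFLIP:
            dy = -dy - sprite_h
        x, y = base_x + dx, base_y + dy
        if 0 <= x <= 255 and 0 <= y <= 255:  # same as "high byte is 0"
            oam.append((y, tile, attr ^ mask, x))
    return oam


# Two sprites side by side, 8 pixels above the central point.
definition = [2,
              0xF8, 0xF8, 0x10, 0x00,
              0x00, 0xF8, 0x11, 0x00]
normal = draw_metasprite(definition, 100, 50, 0, 0)
flipped = draw_metasprite(definition, 100, 50, 0, 0, mask=HFLIP)
clipped = draw_metasprite(definition, 4, 50, 0, 0)
```

With the flip mask set, the two tiles swap sides and each gets its flip bit toggled; with the object at X = 4, the left sprite falls off the screen and only the right one is output.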
In theory, it works great, and I can't see a reason for it not to. I'll check soon how well it performs and I'll let everyone know. I'm not sure if I'll perform the bounding box check before rendering them, because the routine will handle objects that are out of the screen just fine, although it will waste time going through the definitions without ever outputting a sprite. If not many objects are active, this is a good solution; otherwise, the bounding box check may greatly reduce the calls to the sprite drawing routine.
And I believe that what Bregalad said is that each object will then need 2 different bounding boxes, because the one used for collisions between objects is usually smaller than the graphics of the object, to compensate for the fact that it is a box, while most objects are not rectangular.
I suppose you're right about that, but whatever the case, you should just do a check somehow to see if it's on screen. It'll greatly reduce the time of calculations.
So do most games refresh the page in RAM used for DMA completely every update? This would be much more reliable than editing the data that's already there, but it may take a while.
Well, in my current game engine, all active objects are on the screen because it only loads one screen at a time and scrolls whole screens, so this avoids the problem. The only way a sprite of an object can be off screen is if the object is close to the border of the screen. The "collision" box of the object can never cross the screen edge (my routines refuse to move the object if it tries to do so), but that doesn't prevent actual sprites from overflowing the screen, because you'd want the sprite to take up more space than its own bounding box. If you don't, the game will be hard, as the player will (frustratingly) get hurt unfairly just by getting close to an object.
Oh, you have SO much less to worry about than someone making a game that's constantly scrolling (such as Sonic or Castlevania). My game, which is almost exactly like Castlevania SOTN, has many objects that are off screen. In most rooms in my game, there'll be about 8 to 12 enemies. Only 4 of them can be on screen at once, but the rest are doing stuff off screen. I pretty much have to check whether or not an object is on screen in order to calculate its sprite positions, because if I were to check where every object's sprites are in relation to the screen, I would be calculating the positions of around 96 sprites, while only 64 can actually be on screen. I'd rather not waste that much time.
Getting definitions out of the way early helps resolve Layne's Law issues before they become problems:
- Actor: A game object.
- Cel: An image used to represent the position and state of an actor.
- Sprite: An entry in the display list used to draw cels.
Each cel has two different bounding boxes: the cel border and the hit box.
Figure 1: A cel, 24 pixels by 16 pixels, with three colors plus transparency. The "cel border" encloses the entire cel.
Figure 2: Six 8x8 pixel hardware sprites make up this sprite cel. (Enemy sprite cels in Super Mario Bros. are the same size.)
Figure 3: The "hit box" is the rectangle that encloses most of the non-transparent area of the sprite cel, used for collisions against other objects (not against the screen).
To determine whether an actor shall be rendered, you compare the cel border to the view border. Reject all cels that do not overlap the view in some way. Then for each sprite in the cel, you compare this sprite to the view border and write it to the shadow-OAM if it is inside. Hit boxes are ignored here.
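The two-level test above might look like this in Python. It is a sketch under a few assumptions: rectangles are `(left, top, right, bottom)` tuples with exclusive right and bottom edges, a sprite counts as "inside" when it overlaps the view at all, and the function names are invented for the example.

```python
def overlaps(a, b):
    """True if rectangles a and b share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def visible_sprites(cel_border, sprites, view):
    """Reject the whole cel with one test, then test each sprite."""
    if not overlaps(cel_border, view):
        return []  # none of the sprites needs to be looked at
    return [s for s in sprites if overlaps(s, view)]


view = (0, 0, 256, 240)
# A 24x16 cel made of six 8x8 sprites, hanging 8 pixels off the left edge.
cel_border = (-8, 100, 16, 116)
sprites = [(x, y, x + 8, y + 8) for y in (100, 108) for x in (-8, 0, 8)]
shown = visible_sprites(cel_border, sprites, view)
```

Here the leftmost column of the cel is rejected and four of the six sprites would go to the shadow OAM. The hit box plays no part in any of these tests.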
So the case described by Celius would involve 44 tests against the view: 12 for entire cels, and 8 for each of the four actors that can be partially on the screen at once. Depending on how fast the view and actors are moving relative to each other, some of the tests might be skipped from one frame to the next, causing probably-not-noticeable artifacts.
Yeah, Bregalad's case sure is simpler than games with scrolling, because you know for sure that all active objects are being displayed, entirely even. So you don't have to clip them or sort them.
I'm still not sure that performing preliminary bounding box checks before rendering sprites is the best choice... In my game, I have RAM for 32 active objects, so I'd call the sprite drawing routine at most 32 times. But I'll hardly have that many active objects (the object RAM is recycled as the camera moves, with objects constantly being loaded and unloaded), and even if I do, not many of them will be represented with sprites.
If I ever approach this limit of 32 objects, I expect it to be because of rows and columns of rings, which appear very often in Sonic games, but these are drawn with the background. A separate individual ring object that is rendered with sprites is available for places where the background rings can't be used, but I don't expect to have many of these.
When you look at Sonic games, you see that not that many objects are active at a time (excluding rings), even though they have 96 RAM slots for objects (maybe the only reason for this is the crazy amount of rings that bounce around when Sonic loses them). And since my game is for the NES, many things that were rendered as sprites in the original games (such as rings, lamp posts, spikes and monitors) will be rendered to the background most of the time. Sprites will only be used in parts where the background can't be used.
In fact, just to make something clear, the fact that many objects can be rendered to the background as well as with sprites is the reason I like 8x16 sprites so much, because I can reuse the tiles.
tepples wrote:
So the case described by Celius would involve 44 tests against the view: 12 for entire cels, and 8 for each of the four actors that can be partially on the screen at once.
Where does the number 8 come from?
Celius wrote:
tepples wrote:
So the case described by Celius would involve 44 tests against the view: 12 for entire cels, and 8 for each of the four actors that can be partially on the screen at once.
Where does the number 8 come from?
"I would be calculating the positions of around 96 sprites" divided by "about 8 to 12 enemies" equals 96/12 = 8 sprites in a cel.
Oh, okay. Most of my enemies are 8 sprites per cel, I just didn't know how you figured that. But now it's completely obvious.
For my game, I think I'll be able to use the "hit box" to detect whether or not the enemy is on screen. The hit box ranges from head to toe, and the width is generally how wide the enemy is. There'd at most be 2 pixels of the actor's graphic outside of the hit box. I know that if these few pixels should be rendered on screen while the rest of the actor is off screen, the graphic won't be displayed. But I don't think it'll be too noticeable.
Take your actor for example (The one you displayed). If you were to check whether or not the actor's hit box was on screen instead of the cel border, I don't think it'd make much difference. It'd only be a few pixels worth of clipping. I know many of you probably disagree with me, but that's just what I think.
Well, I agree. The hit box should be enough for detecting if an object is visible or not. I've seen games where objects disappeared before they were completely off the screen, but only because I was looking for that kind of thing. It's usually unnoticeable.
Also, I'd actually benefit from using the hit box. Since a TON of RAM will be used to handle intelligent objects, I will save space by using the hit box instead of sacrificing another 8 bytes per object for cel border definitions.
Celius wrote:
I will save space by using the hit box instead of sacrificing another 8 bytes per object for cel border definitions.
I think that calculating the coordinates of the new hit box is worse than wasting a few more bytes (I'd waste bytes over time any day)... and I don't even see why you'd have to waste 8 more bytes... AFAIK, you could just use the same bytes to hold the information about the new box after you're done with the old one.
BTW, I just finished implementing my sprite drawing routine. After some tweaking, I got it working. Flipping, clipping, everything seems to work fine. The bad part is how long it takes to execute... Here is a Nintendulator shot of some results:
Don't mind the terrible color scheme, I just picked random colors. Here I just called the sprite drawing routine 3 times, once for each of those tall blocky things (notice that the one to the far left is clipped). I turned red emphasis on after all the processing was done, which means that the lighter strip at the top represents the time it took to draw everything (actually, there are a few other things happening too, so it's unfair to say that all of that time was spent with the sprites, but most of it was). I wasn't expecting on spending so much time just drawing sprites.
I don't think I'll be able to optimize this any further, but I could probably render sprites every other frame (alternating that with some other task), and just modify the coordinates of the sprites that were rendered last time to reflect any changes in the coordinates of the camera (there are games that don't do this and it looks weird, don't you think?).
Anyway, any new ideas on how to reduce the amount of time used to render sprites is greatly appreciated.
EDIT: I feel I should say something about what kind of sprite cycling I'm using. I'm not cycling the individual sprites, instead I process the objects in a somewhat random order, so that each time different objects get to use sprites of different priorities. I always fill the OAM page sequentially, but I have 2 pointers to it, one to the start and one to the end, so that I can draw to either end (by sending a "SpriteStep" value to the drawing routine, which can be 4 or -4) and force a few objects to be in front of all the others (explosions, for example) if I have to. Also, the priority of the sprites in the same object are respected, which allows for layered sprites and such.
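The two-pointer scheme can be modeled roughly as below. This is a Python sketch, not the routine itself; the class and method names are invented, and only the 4 / -4 step and the two slot pointers come from the description above. On the NES, sprites with lower OAM indices are drawn in front of those with higher indices, which is why the choice of end affects priority.

```python
class OamAllocator:
    """Hands out 4-byte OAM slots from either end of the 256-byte page."""

    def __init__(self, slots=64):
        self.slot_a = 0                # next free offset counting up from the start
        self.slot_b = (slots - 1) * 4  # next free offset counting down from the end
        self.slots_left = slots

    def allocate(self, step):
        """step is 4 (fill from the start) or -4 (fill from the end)."""
        if step not in (4, -4):
            raise ValueError("step must be 4 or -4")
        if self.slots_left == 0:
            return None                # no slots left: the sprite is dropped
        if step > 0:
            offset = self.slot_a
            self.slot_a += 4
        else:
            offset = self.slot_b
            self.slot_b -= 4
        self.slots_left -= 1
        return offset


oam = OamAllocator()
first = oam.allocate(4)    # offset 0, the frontmost slot
last = oam.allocate(-4)    # offset 252, the rearmost slot
second = oam.allocate(4)   # offset 4
```

Because both ends share one `slots_left` counter, the two regions can never run into each other.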
Quote:
So you don't have to clip them or sort them.
Yes, I sort my objects so that the bottommost object is rendered first and the topmost object is rendered last. This completely avoids sprite cycling, but gives a top-down 3D effect (not many games did this; Double Dragon comes to mind). I only cycle sprites if their positions are close enough.
Quote:
I know that if these few pixels should be rendered on screen while the rest of the actor is off screen, the graphic won't be displayed. But I don't think it'll be too noticeable.
I 100% agree with you here, but what about the other way around? If the sprite's hit box is entirely on the screen, but the leftmost (or rightmost) sprites that make it up are off the screen, and your routine doesn't check for this, the sprites will wrap around and this will look terrible. So you have to check for overflowing sprites anyway. Checking whether the entire object is on the screen at all is necessary only if you're short of time. (I mean one check is probably better than two additions followed by several checks.)
My sprite displaying routine isn't very optimised, yet it works fine, and my game doesn't lag with the maximum of 8 normal enemies on the screen (enemies are probably the most time-consuming objects). In this worst case I guess about 60% of the CPU was used. However, I haven't made any huge enemies with lots of sprites yet (no inspiration). I'll post more details later, because I don't remember how I handled all of this; I wrote the routine more than a year ago.
Also, tepples's example is rather bad, because the hit box almost matches the bounding box. In my game, I have enemies with a hit box half the size of the space they take up on the screen, mainly because the hit box of my player is large, and because the "head" of an enemy isn't supposed to collide with the "foot" of another object, since it's a top-down view (think of it being like Secret of Mana) and not a 2D side view.
tokumaru wrote:
Celius wrote:
I will save space by using the hit box instead of sacrificing another 8 bytes per object for cel border definitions.
I think that calculating the coordinates of the new hit box is worse than wasting a few more bytes (I'd waste bytes over time any day)... and I don't even see why you'd have to waste 8 more bytes... AFAIK, you could just use the same bytes to hold the information about the new box after you're done with the old one.
I wouldn't be wasting time. I'd only need to calculate coords for 1 box. Making 2 boxes takes more time than making one.
So yeah, it looks like that's gonna take a while. Do you refresh OAM every frame?
One thing that's always kind of confused me was how people talk about doing all these things in different frames, and having your game run at 30 fps. How does this go unnoticed? Do most games not run at 60 fps?
EDIT: I only meant for the hit box to be checked for whether or not the object was on screen at all. I don't need to check if it's all the way on screen, or partially on screen, since I will have a universal routine that calculates for both. This problem that you mention, Bregalad, would only occur if I told the game to draw the entire object when the whole hit box was on screen. I don't think there'll be any wrapping, just the slight clipping.
Plenty of Game Boy games run at 30 fps so that the slow LCD can keep up. The launch titles were developed on Wide Boy (SGB predecessor), so they show up blurrier on the LCD than the developers had seen on the CRT, but later titles used a lower frame rate to correct for this.
Films run at 24 fps, but they have natural motion blur to hide things.
Celius wrote:
Do most games not run at 60 fps?
I believe that most games do run at 60 fps (meaning that every frame presents some sort of change over the previous one), but not all tasks are performed every frame. I can't think of a specific game right now, but I'm sure I've seen games where the sprites seem to shake a little, because their coordinates are not always aligned to the background. I think some of the Game Gear Sonic games do this (Triple Trouble?), and some of the NES Mega Man games too, I think. I can only assume this happens because they alternate between updating the sprites and updating the background?
Anyway, if I decide to not update the sprites every frame, I'd still want to maze the previously rendered ones a bit (so that sprite flickering happens at 60Hz, not 30Hz) and update all the coordinates to compensate for the movement of the camera, getting rid of the ones that went out of the screen. However, this also seems very time-consuming and not such a big advantage after all.
I see how some things don't need to be updated every frame, but those things are usually game mechanics, such as your character's health. That doesn't need to be so immediate and constant. But the things that matter really should be done every frame. Cutting your frame rate in half is a big deal to me. Have any of you played Bugs Bunny's Birthday Blowout? Everyone that has ever played that game knows that it has a terrible framerate (I still love that game though). It's really important that sprites and scrolling are updated every frame.
It's funny how those things take up so much time. I counted the cycles of my scrolling/updating routine, and I like to overestimate, so I assume that every instruction takes 5 cycles. It took about 2300 cycles with that assumption, so it takes a couple hundred less cycles than that, but it's still a lot. That does deal with scrolling, updating the name table, and attribute table correctly.
I suppose there is quite a bit of time left in each frame. I don't think it's impossible to have the game run at 60hz.
Yeah, you still have a lot of time left in the frame. I'm worried about my sprite drawing routine though. It's taking about 4000 cycles just to render those 3 metasprites.
I'm considering the idea of having more than one type of sprite drawing routine, and more than one type of sprite definitions. I could make another one to use with simpler objects, that do not need layering, special attributes or different palettes. It would always draw sprites arranged in grids (something that makes it possible to skip entire rows or columns that are outside of the screen). I'm sure many of the objects in the game could be drawn like this. This way I can get some speed while not giving up on complex sprites where I really need them.
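A grid-only routine of the kind considered above could be modeled like this. It is a Python sketch with assumed details: 8x8 sprites, tiles numbered row by row from `first_tile`, one attribute byte for the whole object, and `base_x`/`base_y` as the screen position of the grid's top-left corner (which may be negative or above 255).

```python
def draw_grid(base_x, base_y, columns, rows, first_tile, attributes=0x00):
    """Return the visible sprites of a grid as (y, tile, attributes, x) tuples."""
    # Test each column and each row once instead of testing every sprite.
    visible_cols = [c for c in range(columns) if 0 <= base_x + 8 * c <= 255]
    visible_rows = [r for r in range(rows) if 0 <= base_y + 8 * r <= 255]
    oam = []
    for r in visible_rows:
        for c in visible_cols:
            tile = (first_tile + r * columns + c) & 0xFF  # row-major tile layout
            oam.append((base_y + 8 * r, tile, attributes, base_x + 8 * c))
    return oam


# A 3x2 grid whose left column hangs off the left edge of the screen.
grid = draw_grid(-8, 100, 3, 2, 0x20)
```

For a grid of `columns` by `rows` sprites this takes `columns + rows` visibility tests instead of `columns * rows`, at the price of giving up layering and per-sprite attributes.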
Another option would be to tell if the object is completely inside the screen or if it's clipped, so that an optimized routine could handle the first case without checking if each one of the damn sprites is visible or not!
Celius, do you have a routine that renders sprites yet? I believe yours will be a bit simpler than mine, because it seems that your game is a side-scroller (which eliminates a bunch of the Y coordinate logic), but I'd still like to know your times.
I also expect to use more time rendering the background than you do, because of the vertical scrolling, in addition to horizontal. The only thing that makes me a bit more confident is that the enemy AI in sonic games is very simple, most of the time the enemies don't even collide with the level map, they just move in a certain pattern, so I expect to spend more time drawing sprites than processing their logic.
The physics for the player will be quite complex though, and that scares me. I'll probably have to face some serious optimization step soon!
Celius wrote:
Have any of you played Bugs Bunny's Birthday Blowout?
That one's closer to between 15 and 20 frames per second. A lot of 16-bit (EGA/VGA era) PC games also ran at 18.2 fps due to how one of the built-in timers was programmed. And Castlevania (at least the first one on the NES; I haven't played Dracula X) is less of a twitch game than, say, Sonic.
I haven't made my sprite routine yet. When I sat down to start coding for the object engine, I realized there were a ton of things I don't know. As in everything. I'm still working on the ideas for how everything works. Once I know all that, I'll start coding.
My game will scroll in all 4 directions just as yours does (not nearly as fast), so I'll have to worry about the same things. I haven't even started coding my object engine; I'm getting all the ideas down on paper before starting. But I can probably do my drawing routine by tomorrow or the next day. I can probably get that figured out without knowing too much else. It'll be a while before I can even start coding for enemy handling and whatnot.
So it took you 4000 cycles just to draw those three metasprites? I guess I'll have to see how long it'll take for my routine. But I have to ask. What exactly is happening in your drawing routine?
4000 cycles, that's like...almost 4 whole scanlines!
Dwedit wrote:
4000 cycles, that's like...almost 4 whole scanlines!
What GBA are you thinking of? This is NESdev, where four thousand cycles equal 35.2 scanlines.
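The 35.2 figure follows from NTSC NES timing: a scanline is 341 PPU dots and the CPU runs one cycle per 3 dots, so a scanline lasts about 113.67 CPU cycles. A small Python check, assuming NTSC and ignoring the single dot that is skipped on some frames:

```python
PPU_DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC
SCANLINES_PER_FRAME = 262   # NTSC, including vertical blanking

CPU_CYCLES_PER_SCANLINE = PPU_DOTS_PER_SCANLINE / PPU_DOTS_PER_CPU_CYCLE
CPU_CYCLES_PER_FRAME = CPU_CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME


def cycles_to_scanlines(cycles):
    """Convert a CPU cycle count to the equivalent number of scanlines."""
    return cycles / CPU_CYCLES_PER_SCANLINE


def frame_fraction(cycles):
    """Fraction of one whole frame that the given cycle count uses up."""
    return cycles / CPU_CYCLES_PER_FRAME


scanlines = cycles_to_scanlines(4000)  # about 35.2
share = frame_fraction(4000)           # about 0.13 of a frame
```

So the 4000 cycles spent on three metasprites come to roughly 13% of a frame.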
tepples wrote:
This is NESdev, where four thousand cycles equal 35.2 scanlines.
Yes... Boy, I wish it were just 4!
Celius wrote:
My game will scroll all 4 directions just as yours does
Oh, I see... I kinda remembered Castlevania games to be side-scrollers, with stairs taking you to the other floors, but with no vertical scrolling. Well, then you know that scrolling in both directions is not exactly trivial! =)
Quote:
So it took you 4000 cycles just to draw those three metasprites? I guess I'll have to see how long it'll take for my routine. But I have to ask. What exactly is happening in your drawing routine?
It'd probably be easier for me to just paste it here than to explain everything, so here it is:
Code:
DrawMetaSprite:
;-- SUBROUTINE --------------------------------------------------
;DESCRIPTION:
; Processes the sprites in a sprite definition.
;INPUT:
; A: Mask used to modify the attributes;
; X: value to add to the slot index after each sprite (4 or -4),
;    saved to SpriteStep by this routine;
; SpriteDefinition: Address of the sprite definition;
; SpriteX, SpriteY: coordinates of the object;
;DESTROYS: A, X, Y, SpriteX, SpriteY;
;----------------------------------------------------------------
;Verify if there are slots left
ldy SlotsLeft
bne +
rts
+
;Save the attributes
sta SpriteAttrib
;Point to the first byte in the definition
ldy #$00
;Copy the sprite count
lda (SpriteDefinition), y
sta SpritesLeft
;Calculate the central X coordinate of the sprite
sec
lda SpriteX+0
sbc CameraX+0
sta SpriteX+0
lda SpriteX+1
sbc CameraX+1
sta SpriteX+1
;Fix the coordinate if the sprite is flipped horizontally
bit SpriteAttrib
bvc NoHorFlip
sec
lda SpriteX+0
sbc #$07
sta SpriteX+0
lda SpriteX+1
sbc #$00
sta SpriteX+1
NoHorFlip:
;Calculate the central Y coordinate of the sprite
sec
lda SpriteY+0
sbc CameraY+0
sta SpriteY+0
lda SpriteY+1
sbc CameraY+1
sta SpriteY+1
;Compensate for the sprite delay and blank scanlines
clc
lda SpriteY+0
adc #$0f
sta SpriteY+0
lda SpriteY+1
adc #$00
sta SpriteY+1
;Fix the coordinate if the sprite is flipped vertically
bit SpriteAttrib
bpl NoVertFlip
sec
lda SpriteY+0
sbc #$0f
sta SpriteY+0
lda SpriteY+1
sbc #$00
sta SpriteY+1
NoVertFlip:
;Load the correct index of the slot
stx SpriteStep
txa
bmi +
ldx SpriteSlotA
jmp DrawSprite
+ ldx SpriteSlotB
jmp DrawSprite
OutOfScreen:
dec SpritesLeft
beq SpritesFinished
;Advance to the next definition block
clc
tya
and #%11111100
adc #%00000100
tay
DrawSprite:
;Advance to the next definition byte
iny
;Load the relative X coordinate
lda (SpriteDefinition), y
;Check if the sprite is flipped horizontally
bit SpriteAttrib
bvc +
;Invert the value if it is
eor #$ff
+ sta SpriteTemp
;Add the displacement
clc
adc SpriteX+0
;Store the result
sta SpritePage+3, x
;Check if the result was valid
php
lda #$7f
cmp SpriteTemp
adc #$80
plp
adc SpriteX+1
and ScreenXMask
bne OutOfScreen
;Advance to the next definition byte
iny
;Load the relative Y coordinate
lda (SpriteDefinition), y
;Check if the sprite is flipped vertically
bit SpriteAttrib
bpl +
;Invert the value if it is
eor #$ff
+ sta SpriteTemp
;Add the displacement
clc
adc SpriteY+0
;Store the result
sta SpritePage+0, x
;Check if the result was valid
php
lda #$7f
cmp SpriteTemp
adc #$80
plp
adc SpriteY+1
and ScreenYMask
bne OutOfScreen
;Advance to the next definition byte
iny
;Load the index of the sprite
lda (SpriteDefinition), y
;Store it in the slot
sta SpritePage+1, x
;Advance to the next definition byte
iny
;Load the byte with the attributes of the sprite
lda (SpriteDefinition), y
;Modify it as necessary
eor SpriteAttrib
;Store it in the slot
sta SpritePage+2, x
;Move on to the next slot
clc
txa
adc SpriteStep
tax
dec SlotsLeft
beq SpritesFinished
;Move on to the next definition
dec SpritesLeft
bne DrawSprite
SpritesFinished:
lda SpriteStep
bmi +
stx SpriteSlotA
rts
+ stx SpriteSlotB
rts
This is the working version. As far as I tested, no errors. There are probably ways to optimize it, and I'll look into it soon. But since the sprites are fully working now, I'll go back to working on the background code, which is almost ready.
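The least obvious part of the routine above is the `lda #$7f` / `cmp SpriteTemp` / `adc #$80` sequence, which appears to sign-extend the relative coordinate so it can be added to the high byte. The Python model below emulates just those byte operations to show the effect. It is a sketch, not a full 6502 emulator, and the helper names are made up. The value of `ScreenXMask` is not given in this thread, so the model stops at the high byte that the mask would be applied to.

```python
def adc(a, operand, carry):
    """6502-style add with carry on bytes: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def cmp_carry(a, operand):
    """CMP sets the carry flag when A >= operand (unsigned)."""
    return 1 if a >= operand else 0


def add_relative(rel, base_low, base_high):
    """Add a signed byte 'rel' to the 16-bit base, the way the loop does."""
    low, low_carry = adc(rel, base_low, 0)           # clc / adc SpriteX+0 (php saves the carry)
    sign, _ = adc(0x7F, 0x80, cmp_carry(0x7F, rel))  # $00 if rel is positive, $FF if negative
    high, _ = adc(sign, base_high, low_carry)        # plp / adc SpriteX+1
    return low, high


left_edge = add_relative(0xF8, 0x04, 0x00)   # 4 + (-8): high byte is $FF
right_edge = add_relative(0x10, 0xF8, 0x00)  # 248 + 16: high byte is $01
inside = add_relative(0xF8, 0x80, 0x00)      # 128 + (-8): high byte is $00
```

In the first two cases the high byte is nonzero, which is what sends the routine to `OutOfScreen` (assuming the mask keeps at least those bits).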
EDIT: Oh, you must remember to clear the unused sprites after you're done drawing all the objects. I do this with the following code:
Code:
;Clear the unused sprite slots
lda SlotsLeft
beq SlotsCleared
ldx SpriteSlotA
lda #$ef
ClearSlot:
sta SpritePage+0, x
inx
inx
inx
inx
dec SlotsLeft
bne ClearSlot
SlotsCleared:
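For reference, the clearing loop above behaves like the Python model below. A Y value of $EF places a sprite on scanline 240, just below the visible picture, so the slot shows nothing. The function name and the `bytearray` standing in for the sprite page are assumptions of this sketch.

```python
HIDDEN_Y = 0xEF  # sprites with this Y value fall below the visible picture


def clear_unused_slots(sprite_page, slot_a, slots_left):
    """Hide 'slots_left' sprites, starting at byte offset 'slot_a'."""
    x = slot_a
    for _ in range(slots_left):
        sprite_page[x] = HIDDEN_Y  # only the Y byte of each slot is touched
        x = (x + 4) & 0xFF         # four INX instructions; X is an 8-bit register
    return sprite_page


page = bytearray(256)
clear_unused_slots(page, 8, 3)  # hides the slots at offsets 8, 12 and 16
```

Since the drawing routine fills from both ends, the unused slots sit in the middle of the page, which is why the loop starts at `SpriteSlotA` and only runs `SlotsLeft` times.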
I think no game where the player is supposed to interact in real time should ever run slower than the console's frame rate. All Mega Man games run at the console's frame rate, and all Castlevania games too. Don't accept false information as true, or you'll end up making wrong decisions.
Hey, I'm impressed. If I put in the maximum of 8 objects, the sprite mazing routine does indeed take a lot of time: about 50% of the whole CPU time, I assume. About 2/3 of the screen is grayed when I gray this part through $2001. However, it's fairly rare for that many objects to be active, and this is the only time-consuming task during gameplay if you're not scrolling.
Also, I wrote my routine with ease of use in mind, and it's not very well optimised. For every single sprite, the program does a lot of checks before actually mazing it. The idea of having different sprite configuration tables is crazy, but maybe it could work, who knows?
EDIT: Tokumaru, it's amazing how differently you program things from the way I do. You take your sprite coordinates, add something to them, store them back, then add something to them again, store them again, etc. You do everything step by step. I would never do anything like this myself; I'd always take the coordinates, perform all checks and calculations on them, and then store them back at the end. I guess your way of doing things is clearer to understand than mine, but in the end maybe it's slightly less optimised.
Bregalad wrote:
You do everything step by step. I would never do anything like this myself; I'd always take the coordinates, perform all checks and calculations on them, and then store them back at the end.
I'd rather do that too (I'm all for performance rather than making understandable code), but when you are working with 16-bit values there is not much choice... you have to store the result because you have to use A again for the high byte! Unless you used X and/or Y to hold temporary results, something I do in other parts of my code, but for just 1 extra CPU cycle this is hardly worth it.
Do you have any ideas on how I could optimize the code above? Optimizing the loop has a much bigger effect than optimizing the setup that comes before it, that's for sure.
On a somewhat related topic, the background-drawing routine seems to perform much better than this one. Even when rendering a row and a column in the same frame, it uses far fewer cycles than the sprite routine.
Bregalad wrote:
I think no game where the player is supposed to interact in real time should ever run slower than the console's frame rate.
Doom for PC ran much slower than the 70 fps of VGA mode 13h. A lot of PS1 games ran at 30 or even 20 fps.
Quote:
All Mega Man games run at the console frame rate, and all Castlevania games too.
Including the Castlevania games on Game Boy?
Quote:
Hey, I'm impressed. If I put in the maximum of 8 objects, the sprite mazing routine really does take a lot of time.
Where did the word "mazing" come from?
@ Tokumaru: The inc and dec instructions are handy for 16-bit operations when the high byte is only there for testing purposes. For example this:
Quote:
Code:
clc
lda SpriteY+0
adc #$0f
sta SpriteY+0
lda SpriteY+1
adc #$00
sta SpriteY+1
;Fix the coordinate if the sprite is flipped vertically
bit SpriteAttrib
bpl NoVertFlip
sec
lda SpriteY+0
sbc #$0f
sta SpriteY+0
lda SpriteY+1
sbc #$00
sta SpriteY+1
Could be optimized into this:
Code:
lda #$00
sta Temp
clc
lda SpriteY+0
adc #$0f
sta SpriteY+0
bcc +
inc Temp
+
;Fix the coordinate if the sprite is flipped vertically
bit SpriteAttrib
bpl NoVertFlip
sec
lda SpriteY+0
sbc #$0f
sta SpriteY+0
bcs +
dec Temp
+
etc...
Or even better, use X or Y instead of a temporary variable (but this isn't always manageable, especially in a sprite mazing routine where you keep an index into the OAM all the time, at least I do).
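A minimal sketch of the same trick applied straight to the high byte, assuming SpriteY is a 16-bit zero-page variable as in the code above and that its high byte already holds a meaningful value (so no Temp is needed):
Code:
clc
lda SpriteY+0
adc #$0f
sta SpriteY+0
bcc +
inc SpriteY+1 ;carry out of the low byte
+
;and the same thing for a subtraction:
sec
lda SpriteY+0
sbc #$0f
sta SpriteY+0
bcs +
dec SpriteY+1 ;borrow from the high byte
+
With zero-page variables, the lda/adc/sta on the high byte costs 8 cycles every time. The branch costs 3 cycles when taken (assuming it doesn't cross a page), and the branch plus inc or dec costs 7 when it isn't, so this version is never slower.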
@ tepples: I don't remember where mazing comes from, but I'm pretty sure I didn't make it up. Isn't this a correct English word?
Oh and by the way I don't know any games that run slower than the console frame rate while looking good. I never played Doom, but this is an early 3D game, so I think the lag is excusable. I also never played any of the original Game Boy Castlevania games; I was talking about the NES Castlevania games, which run at full frame rate (60 fps on NTSC and 50 fps on PAL).
I don't know much about original Game Boy games, but I'm pretty sure the only Game Boy game I really love, which is Final Fantasy Adventure, runs at full speed. All the 2D Game Boy Color and Advance games I played seem to run at full speed.
I see what you mean... but this can only be done when one of the numbers is 8-bits and positive. But yeah, I could do what you suggested. This part of the code is still outside of the loop, so I could use X or Y if I needed to. But inside the loop, X is used to point to the sprite slots (OAM mirror) and Y is used to point to the sprite definitions.
About the word "mazing", I believe it makes some sense because of sprite cycling, where the sprites are distributed semi-randomly (or "mazed") across the sprite slots. I don't know. =)
Bregalad wrote:
Oh and by the way I don't know any games that run slower than the console frame rate while looking good. I never played Doom, but this is an early 3D game, so I think the lag is excusable. I also never played any of the original Game Boy Castlevania games; I was talking about the NES Castlevania games, which run at full frame rate (60 fps on NTSC and 50 fps on PAL).
I don't know much about original Game Boy games, but I'm pretty sure the only Game Boy game I really love, which is Final Fantasy Adventure, runs at full speed. All the 2D Game Boy Color and Advance games I played seem to run at full speed.
Or maybe they did a good job of it and you didn't even notice the difference! =)
So I made my sprite drawing routine, and I think I'm gonna have to come up with a better idea. What I did was I had the tile values and attribute values in an array in RAM. After fetching those, I calculated the coordinates of every sprite, and I copied all the data from the arrays into the OAM page. In the tile fetching routine, I checked to see if there was a flip. If so, I copied the values accordingly. The problem is that I didn't do it for the coloring.
It took me about 8 scanlines to draw a 2x2 sprite, which I don't think is very good. If I took a 4x4, it'd take about 32 scanlines. So I think I might want to take a different approach.
Tokumaru, I looked at your code, and I really don't understand how you handle flips. Could you explain?
Yeah, your times are not looking very good... 32 scanlines is the time it took me to draw 3 2x4 sprites, and that's not very good either.
Celius wrote:
Tokumaru, I look at your code, and I really don't understand how you handle flips. Could you explain?
You mean vertical and horizontal flipping? Well, my definitions hold coordinates relative to the position of the object (SpriteX and SpriteY in my routine).
So, I can say that a sprite is 8 pixels to the left of the central point and 16 pixels above it, for example. Before adding the relative X value, I check if the sprite is flipped horizontally. If so, I negate the offset, so the sprite is moved to the other side. But this is still not enough, because the coordinates of a sprite are for its top left corner, and when flipping you'd want those coordinates to refer to the top right corner. Since that is impossible, I just move the coordinates of the object to the side to compensate for this before entering the loop. Flipping vertically works exactly the same.
About negating the number: to do it you have to invert all the bits (eor #$ff) and add one. To avoid having to add one for each sprite, I take this 1 into account when compensating for the width of the sprite as I said above.
The idea is that I tweak the coordinates of the object in case of flipping, so that each relative coordinate can be flipped with a simple EOR instruction. When outputting the attributes of each sprite, the individual flipping bits are EOR'ed with the flipping bits of the whole object, so if the object used any flipped sprites originally, they'd be unflipped, causing them to look flipped relative to the other ones that were just flipped. Well, this sounded confusing, but trust me: the definitions can contain flipped and unflipped sprites, and the final structure of the object is maintained in case of flipping because all the sprites will be flipped, even the ones that were already flipped.
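A rough sketch of that idea for the X axis, under several assumptions: sprites are 8 pixels wide, the object is mirrored around the line just left of its central pixel (so the base moves by 7, which is the width 8 minus the 1 that the EOR leaves out), and FlipMask, RelX, OAMPage and OffScreen are hypothetical names. SpriteX is the 16-bit screen position of the object and bit 6 of SpriteAttrib is the horizontal flip bit:
Code:
;Setup, once per object
lda #$00
bit SpriteAttrib ;bit 6 goes to the V flag, A is untouched
bvc +
sec ;flipped: move the base 7 pixels to the left
lda SpriteX+0
sbc #$07
sta SpriteX+0
bcs ++
dec SpriteX+1
++
lda #$ff
+
sta FlipMask ;$00 = as defined, $FF = mirrored
;Per sprite, inside the loop (X = offset into the OAM mirror)
lda RelX ;signed offset, however the definition is read
eor FlipMask ;mirrors it, minus the 1 handled in the setup
bmi Negative
clc
adc SpriteX+0
sta OAMPage+3, x
lda SpriteX+1
adc #$00 ;positive offset: its high byte is $00
jmp CheckX
Negative:
clc
adc SpriteX+0
sta OAMPage+3, x
lda SpriteX+1
adc #$ff ;negative offset: its high byte is $FF
CheckX:
bne OffScreen ;high byte not zero: the sprite is off screen
;...write Y, tile and attributes here, then advance X by 4...
OffScreen:
With this mask an unflipped offset dx lands at SpriteX + dx, and a flipped one at SpriteX - dx - 8, which is the mirrored position of an 8-pixel-wide sprite.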
But if I'm not mistaken, your sprites are arranged in grids, right? So you don't define the coordinates of each sprite, but only of the whole block, right? I must admit that this seems harder to flip. But since this was my original design, I had a solution for this.
My design used relative coordinates for the top left corner of the grid, its width and height (in sprite units), and then the indexes and attributes of each individual sprite. To flip that, you'd also have to invert the relative coordinates to have them go to the other side of the block. When inverting, you'd probably have to account for the width of the sprites (8) too. After that you've got the coordinates of the first sprite, and can enter the loop that will draw them all.
In this loop, you should check the high byte of each coordinate and if both are 0, output the sprite. Increment the X coordinate for the next sprite. The amount you use to increment should probably be in a variable, because you'll want to add 8 when it's not flipped and -8 when it is. Just set this variable with the proper value before entering the loop.
I'd keep the number of horizontal sprites (width of the block) in an index register, so that I could decrement it and detect when the first row ended. When the row ends, reset the X coordinate (to the number you calculated right before entering the loop), and increment the Y coordinate by 16 or -16 (the amount should be in a variable, like for the X coordinate), assuming you are using 8x16 sprites. When the number of vertical sprites (height) ends, you're done.
That would not need any buffers, you could just keep updating the same pair of coordinates for all the sprites (and just keep the calculated X coordinate for when starting new rows). This is how I'd do it.
I'm considering implementing a routine like this and using both types of sprites in my game, because this other type seems to leave more room for optimization. Depending on the type of the object, it will call one routine or the other, and I won't waste precious cycles when they are not needed.
Oh, I think I should advise against using a lot of RAM buffers/arrays, especially when it's possible to output the data directly. Handling arrays is a very time-consuming process, because of the loops and all that. Did you see how the output in my routine works? When I output the X and Y coordinates of the sprite, I always write them to the OAM page directly, even before knowing if they are valid or not. I leave the validity check for later (and I don't even store the high byte anywhere, I just need to know if it is zero or not), and in case a coordinate was not valid, I simply do not advance a slot, and that invalid information will be overwritten by the next valid sprite. This makes the cases when the coordinates are valid much faster than buffering the results.
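In case it helps, a stripped-down sketch of that output step. It assumes X holds the offset of the next free slot in the OAM mirror (OAMPage), that ScreenY is a hypothetical 16-bit screen coordinate of the object, and that RelY is a hypothetical non-negative offset of the current sprite. The X coordinate, tile and attributes are left out:
Code:
clc
lda RelY
adc ScreenY+0
sta OAMPage+0, x ;written before knowing if it's valid
lda ScreenY+1
adc #$00 ;the high byte only ever lives in A
bne Skip ;not zero: the sprite is off screen
;...write tile, attributes and X coordinate the same way...
inx ;valid: claim the slot
inx
inx
inx
Skip:
;an invalid sprite leaves X alone, so the next sprite overwrites it
If the very last sprite is invalid, its leftover bytes stay in the first unclaimed slot, so this assumes the unused slots are hidden afterwards.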
Heh, I had never thought that this task could use so much CPU time!
Bregalad, you said you used tables to make this process faster... what kind of tables are those? I can't think of anything you could pre-calculate to make this whole process faster...
You know something that sucks? Having to switch banks to access different types of data (level maps, sprite mappings, etc) with the MMC1, which requires a lot of time to complete a register write.
I'm saying this because I need the screen mappings to be loaded when the object routines are executed, because objects may need that information when walking, and so on. The actual level map and the object definitions are in RAM, so those are fine. But to render their sprites, the sprite mappings must be loaded. They can't be in the same bank, because there are many different screen mappings, spread across multiple banks.
The only solution I see is to buffer the parameters that would otherwise be sent to the drawing routine, and send them all at once after all the objects have been processed, so I'd bankswitch only once. This solution is annoying, because it uses more RAM, and wastes more time with the managing of this new list.
Another option would be to dedicate part of the object RAM itself to hold the buffered values, and just scan all objects again sending the buffered values to the drawing routine, when these are present. In any case, the sprites should be rendered last.
Oh my, it's amazing how complicated things can get from something simple.
Well, I don't see the problem with having all sprite definitions in a single bank. And bankswitching on the MMC1 takes a bit longer than with a discrete logic mapper, but it's really nothing to worry about I think. 5 writes and 4 shifts take something like 30 cycles or so.
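For reference, a minimal sketch of the usual MMC1 PRG bank switch, assuming the bank number is in A. Writes to $E000-$FFFF feed the PRG bank register one bit per write, lowest bit first; SetPRGBank is just a name picked for the example:
Code:
SetPRGBank:
sta $e000 ;bit 0
lsr a
sta $e000 ;bit 1
lsr a
sta $e000 ;bit 2
lsr a
sta $e000 ;bit 3
lsr a
sta $e000 ;bit 4, the register is updated on this fifth write
rts
That is 5 sta (4 cycles each) and 4 lsr (2 cycles each), so 28 cycles before counting the jsr and rts.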
And myself, I've used completely different sprite definitions for flipped sprites, so that they aren't forced to be symmetric. However it's a top-down game, so this is really different from a platformer where everything is flipped horizontally anyway.
That gives me some great ideas! Inverting the object as a whole sounds much simpler than going through some inverting loop that takes a million cycles.
And also, I did the same thing in my code. I took the sprite, calculated its coordinates, and if the high byte was used, I simply moved on to the next cel in the object. This I would not change.
I just have to think about how to go about this wisely. I won't use arrays, because in the end, it's a waste of time and RAM. The only thing I'll use arrays for is updating while scrolling. But yeah, inverting the top left corner to the other side is a really smart idea.
Also, in my sprite tables, I had the color data compressed, and this is why it took a lot longer. I had 4 attributes compressed into one byte, so I had to do numerous shifts to get these out. I think I'll just stick to using decompressed values, but I feel like I'm wasting so much by only using 2 bits in every definition. What do you think about that? Should I leave them decompressed?
And yes, I have my sprites in arranged grids, so I have one general coordinate for that object. I really have to think about how to do this routine wisely.
EDIT: I've tested my new routine. It takes about as long as yours, Tokumaru. I can probably shorten it a bit. I would post it up, but I don't have any comments or anything on it, and it wouldn't make much sense. I'll post it up later.
You said something about inverting your positions. I avoided inverting pretty much. Last time, I took a different tile/color for a certain set of coordinates if the object was flipped. BAD IDEA. I just read the data as is this time, and calculated the coordinates for that specific cell depending on whether or not it was flipped. For a flip, I took the tile width of the sprite - 1, multiplied it by 8, and just added it to the X coord. This will give me the X coord for the tile on the right side of the metasprite. I then subtract 8 for every tile placement instead of adding. I did the same for vertical flips, except I multiplied the vertical position -1 by 16.
Drawing a 2x4 sprite took about 12 scanlines, but this can be shortened. I would really rather draw from an array in RAM, because I could compress my tables to not take up so much space.
I'll modify my routine to take less time. I'll also add some comments and post it up.
Also, Tokumaru, I see in your first screenshot that the top of the screen is partially pink to show how long the routine takes. Did you take Vblank into account when measuring your routine? At the beginning of my routine, I waste time so I can get out of Vblank to see how many scanlines it takes.
Celius wrote:
EDIT: I've tested my new routine. It takes about as long as yours, Tokumaru. I can probably shorten it a bit.
When you optimize it, tell me what you did, maybe you can give me some ideas! =) I will not work on this again until my scrolling engine fully works. I got the columns updating fine now, I just gotta do the rows, but I got everything pretty much worked out already. There's some tweaking to the code that handles attributes too.
Quote:
I just read the data as is this time, and calculated the coordinates for that specific cell depending on whether or not it was flipped. For a flip, I took the tile width of the sprite - 1, multiplied it by 8, and just added it to the X coord. This will give me the X coord for the tile on the right side of the metasprite. I then subtract 8 for every tile placement instead of adding. I did the same for vertical flips, except I multiplied the vertical position -1 by 16.
Yeah, I think this is the way to go for grid-aligned metasprites. Now let me ask you one thing: from what I can see, your coordinates always indicate the top left corner of the sprite, right? This is the only part where I seem to disagree with you, as I chose to have a pair of coordinates relative to the central point of the object (Sonic's is at the bottom, by his feet, centered horizontally) indicate where the sprites are. This keeps me from having to manually calculate the position of the sprites every time... Well, unless you consider your player's coordinates to be at the top left corner, like the sprite. I wouldn't do that, but if it works for you, OK.
Quote:
Also, Tokumaru, I see in your first screenshot that the top of the screen is partially pink to show how long the routine takes. Did you take Vblank into account when measuring your routine? At the beginning of my routine, I waste time so I can get out of Vblank to see how many scanlines it takes.
Yeah, there are other things before the sprite code that take up most of VBlank (I update the palette, draw a few patterns, and there's some other test code), so I guess that was pretty accurate.
Here's the code, but it didn't really end up taking less time. But it's commented:
Code:
;The first three bytes of the sprite definition are the number of tiles in the metasprite,
;The X dimension, and the Y dimension of the metasprite. The rest of the bytes
;define colors and tile IDs. So the next byte after the Y dimension byte will represent
;the Tile ID for the first tile. The next one will be the color data for that tile.
;The next two will represent the tile ID and the color for the next cel in the metasprite.
;It goes on for however many cels are in the metasprite.
DrawMetaSprite:
ldx #4
ldy #0
-
iny
bne -
dex
bne -
lda #$00
sta $2001
lda #<NoFlipX ;We may be jumping to these locations
sta TempAddL ;Depending on if there's a flip or not.
lda #>NoFlipX
sta TempAddH
lda #<NoFlipY
sta TempAdL1
lda #>NoFlipY
sta TempAdH1
ldy #0 ;Start at the beginning. Obviously.
lda (SampleL),y ;Load the number of cels in the metasprite.
sta SpritesLeft
iny ;Go to the next byte.
lda (SampleL),y ;Load the width of the metasprite
sta DimX
iny ;Go to the next byte.
lda (SampleL),y ;Load the Height of the metasprite.
sta DimY
iny ;Go to the next byte.
;**************************************************
sec ;Here we take the coords of the object,
lda ObjectXL ;Subtract the coordinates of the screen
sbc ScreenXL ;And it becomes the relative coordinates of the metasprite.
sta StartingXL ;But we need to remember it for when we start a new row
sta CurrentXL ;Of sprites, so we have a starting X value.
lda ObjectXH ;All coordinates are 16-bit. So we need to take that into account.
sbc ScreenXH ;The only reason for a 16-bit X coordinate is so we can determine
sta StartingXH ;if a cel in a sprite will be displayed or not.
sta CurrentXH
;*********************
sec ;The same goes for the Y coord. However, a starting value and
lda ObjectYL ;The current value do not need to be separate, because we don't
sbc ScreenYL ;Need to refresh the value once we're done with it.
sta CurrentYL
lda ObjectYH
sbc ScreenYH
sta CurrentYH
;**************************************************
bit FlipStatus ;We'll test to see if it's flipped horizontally.
bvc + ;If not, skip ahead.
ldx DimX ;To calculate the X position of the opposite side,
dex ;We use the formula NewXPos = (Width - 1) * 8 + CurrentXPos
txa ;After getting that, we'll tell the routine to subtract 8
asl a ;For every tile instead of adding. We lay the tiles right to left
asl a ;Instead of left to right.
asl a
clc
adc StartingXL
sta StartingXL
sta CurrentXL
lda StartingXH
adc #0
sta StartingXH
sta CurrentXH
lda #<FlipHrzntl ;Instead of doing comparisons to see if it's flipped or not, we'll just jump
sta TempAddL ;Directly to where we need to go with a Temporary address.
lda #>FlipHrzntl
sta TempAddH
;*********************
+
bit FlipStatus ;We check here to see if there's a vertical flip
bpl + ;If not, just skip ahead.
ldx DimY ;We can use a formula very similar to the one to
dex ;calculate the Y coord of the bottom cels.
txa ;NewYPos = (Height - 1) * 16 + CurrentYPos
asl a
asl a
asl a
asl a
clc
adc CurrentYL
sta CurrentYL
lda CurrentYH
adc #0
sta CurrentYH
lda #<FlipVrtcl ;We also tell the routine to go bottom to top instead of
sta TempAdL1 ;Top to bottom if there's a vertical flip.
lda #>FlipVrtcl
sta TempAdH1
;**************************************************
+
lda DimX ;Here we copy the value of the X dimension because
sta Variable1 ;We'll be needing to do a certain loop for however many tiles the sprite is wide.
ldx CurrentPos ;Start where we left off if we call this routine more than once. (It starts off as 0)
DrawSprites:
lda CurrentYH ;Before doing anything, we need to check if the cel is actually on screen
beq + ;If the High byte is used, it's off screen.
iny ;Move on to the next set of definitions
iny
jmp ++ ;Skip past table copying
+
lda CurrentXH ;See if the high byte is used for the X coord
beq +
iny ;Move on to the next set of definitions
iny
jmp ++ ;Skip past the copying
+
lda CurrentYL ;Copy the current Y value
sta OAMPage,x
inx
lda (SampleL),y ;Copy the current tile ID
sta OAMPage,x
iny ;Get the next byte
inx
lda (SampleL),y ;Copy the Attribute data
ora FlipStatus ;This byte can include priority data, I just called it FlipStatus for some reason.
sta OAMPage,x
iny
inx
lda CurrentXL ;Copy the X position
sta OAMPage,x
inx
++
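jmp (TempAddL) ;Step X via NoFlipX or FlipHrzntl (TempAddL/TempAddH must be adjacent in memory)
--
dec Variable1 ;One less cel to go in this row
bne DrawSprites
lda DimX ;Row finished: reload the column counter
sta Variable1
lda StartingXL ;And go back to the starting X coordinate
sta CurrentXL
lda StartingXH
sta CurrentXH
jmp (TempAdL1) ;Step Y via NoFlipY or FlipVrtcl (TempAdL1/TempAdH1 must be adjacent too)
-
dec DimY ;One less row to go
bne DrawSprites
lda #$1E ;Turn the screen back on (testing only)
sta $2001
stx CurrentPos ;Remember the next free OAM slot
jsr Clear_Unused ;Fill the rest of the OAM page with $FF
ldx #0
stx CurrentPos
jmp Return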
NoFlipX:
clc
lda CurrentXL
adc #8
sta CurrentXL
lda CurrentXH
adc #0
sta CurrentXH
jmp --
NoFlipY:
clc
lda CurrentYL
adc #16
sta CurrentYL
lda CurrentYH
adc #0
sta CurrentYH
jmp -
FlipHrzntl:
sec
lda CurrentXL
sbc #8
sta CurrentXL
lda CurrentXH
sbc #0
sta CurrentXH
jmp --
FlipVrtcl:
sec
lda CurrentYL
sbc #16
sta CurrentYL
lda CurrentYH
sbc #0
sta CurrentYH
jmp -
Clear_Unused:
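lda #0 ;Number of bytes left in the page = 256 - CurrentPos
sec
sbc CurrentPos
tay
ldx CurrentPos
lda #$FF ;A Y coordinate of $FF puts a sprite below the visible screen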
-
sta OAMPage,x
inx
dey
bne -
rts
At the beginning, I waste time just to get it out of Vblank. Then I shut the screen off until it's done with the loop. And the thing at the end will be changed. I won't jump directly into the clearing routine after the first sprite is done being drawn. There are many things just there for testing purposes. I'll also be doing a different routine to check whether or not the metasprite is touching the screen. After confirming, I'll call the routine.
EDIT: I had to hurry, so I left some things out of my post. Yes, my object positions are always defined by the top left coordinate. I don't really see a reason to change it. I think it works fine the way it is.
But after looking at your routine again, I notice that it allows for objects that aren't completely surrounded by a box, while mine doesn't. This would be really good in some cases, but generally metasprites are so small that you wouldn't really have sprites displayed that are blank tiles. In my game, most of the background enemies are the big ones. But yours allows for it because you define all the positions in the metasprite. I personally see this as a lot of ROM being used, but if it works for you, that's good.
tokumaru wrote:
I will not work on this again until my scrolling engine fully works. I got the columns updating fine now, I just gotta do the rows, but I got everything pretty much worked out already. There's some tweaking to the code that handles attributes too.
I took a long break from NESdev a couple months ago. As soon as I got back in, I finally conquered that task once and for all. I hope to never have to make another scrolling routine. I felt really really good once I finished it, because I can use it in pretty much any game that uses scrolling. I just need to tweak it to allow scrolling speeds faster than 4 pixels. If my rows or columns are split between two nametables, I write the data for one half in one frame, and write the data for the second half in the next. This is the reason I can't scroll faster than 4 pixels a frame, because I update every section of 8 pixels. By the time the second part of the column/row needs to be written, it's already displaying a new column/row that needs to be updated. So the first half of the column/row would be updated correctly, while the next part appears in the newly displayed row/column. It's dumb, and I have to fix it. Then I'll be able to scroll 8 pixels a frame. This will be a problem for my character falling down a long pit or something, because gravity will grow to have the character falling faster than 4 pixels a frame, and my camera needs to follow the character.
I suppose yours has to support really really high speeds, huh?
Celius wrote:
If my rows or columns are split between two nametables, I write the data for one half in one frame, and write the data for the second half in the next.
Boy, you'd flip if you saw my routines that draw columns and rows!
Quote:
I suppose yours has to support really really high speeds, huh?
16 pixels per frame, in both directions if necessary!
I always draw full metatiles, never just tiles. Rows are always 17 metatiles long, and columns 15 metatiles tall. I always assume they will cross the name table barrier (rows in fact always do, because they are wider than the name table, and columns almost always do too). It's really not hard at all to handle this...
See, you most likely have the destination address (the one you write to $2006) stored somewhere, because you use it to write the first half of the row/column being updated. After you write the first half, with a small modification to that address you are ready to draw the second half! When drawing rows, for example: if you crossed the edge of the name table and entered the other one, you should flip the bit in the address that selects between the 2 name tables (so, if you updated name table 1, now you'll update name table 0). The other little modification is to clear all the bits that select the X coordinate, because since you just entered a new name table, you'll be sure to start updating it from the far left.
What I'm saying is that you don't have to spread your update across 2 frames, since with this simple modification of the address you can find the address to where the rest of the tiles should go.
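A small sketch of that address tweak for a row, assuming the two name tables sit side by side at $2000 and $2400 (vertical mirroring, or a four-screen layout), that the $2006 write latch is in its initial state, and that RowAddress is a hypothetical 16-bit variable holding the address used for the first half. The name table is selected by bit 2 of the high byte, and the tile column lives in the low 5 bits of the low byte:
Code:
lda RowAddress+1
eor #$04 ;switch to the name table on the other side
sta RowAddress+1
sta $2006 ;high byte first
lda RowAddress+0
and #$e0 ;column 0, same tile row
sta RowAddress+0
sta $2006 ;then the low byte
For a column crossing the bottom edge, the same idea would toggle $08 in the high byte and clear the row bits (the low 2 bits of the high byte and the top 3 bits of the low byte).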
Now, if your problem is speed, it most likely is because you are drawing the tiles with a loop. Loops are slow, and for maximum speed I use a series of LDA & STA instead of loops. You may ask how I can do this if I don't know how many tiles will go to each name table... the answer is pretty simple... a jump table. So, my drawing "loop" looks something like this:
Code:
DrawMetatiles:
;Y holds the number of metatiles to draw
lda SkipDrawLo, y
sta TempAddress+0
lda SkipDrawHi, y
sta TempAddress+1
;Skip a number of metatiles
jmp (TempAddress)
Draw16:
lda TileBufferA-16, x
sta $2007
lda TileBufferB-16, x
sta $2007
Draw15:
lda TileBufferA-15, x
sta $2007
lda TileBufferB-15, x
sta $2007
(...)
Draw02:
lda TileBufferA-02, x
sta $2007
lda TileBufferB-02, x
sta $2007
Draw01:
lda TileBufferA-01, x
sta $2007
lda TileBufferB-01, x
sta $2007
Draw00:
rts
Two tables hold the address to jump to, depending on how many metatiles I have to draw:
Code:
SkipDrawLo:
.db <Draw00, <Draw01, <Draw02, <Draw03, <Draw04, (...)
SkipDrawHi:
.db >Draw00, >Draw01, >Draw02, >Draw03, >Draw04, (...)
The routine must be called twice, once for each half. Note that because of the "Draw00" label, you can always assume the tiles are divided, because even if they aren't, there will be no harm done.
Then there is the value of X... This is a big part of the trick: the first time the routine is called, it should be the number of metatiles you want to draw. So if you wanted to draw 4 metatiles, X would be 4. The jump would send you directly to the "Draw04" label, where the value at "TileBufferA-04, x" is loaded. If X is 4, the address will be TileBufferA, which is the beginning of the buffer, and this is exactly what we want.
For the second time, X should be whatever makes the last copy command see the last slot of your buffer. Since this is a row of 17 metatiles, the last slot is numbered 16, and for that last address evaluation (TileBufferA-01, x) to be 16, X must be 17. So, the calls to the drawing routine will look like this:
Code:
;WRITE THE ADDRESS TO $2006 HERE!
ldy TileCount0
ldx TileCount0
jsr DrawMetatiles
;MODIFY THE ADDRESS AND WRITE TO $2006 HERE!
ldy TileCount1
ldx #$11
jsr DrawMetatiles
There you have it, the secret for my fast scrolling! =) Of course, since I draw full metatiles, I actually call the drawing routine 4 times for a row, and 4 times for a column, for a total of 8 calls if both rows and columns are being rendered, and the value sent in X is more complex because it selects between rows and columns, first half or second half, left or right side, etc. But it's still pretty fast.
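To make the numbers concrete, take a hypothetical row whose first metatile lands on column 12 of a name table (a name table is 16 metatiles wide). Four metatiles fit before the edge, so TileCount0 = 4 and TileCount1 = 13:
First call: Y = 4, X = 4. The jump enters at Draw04, which reads TileBufferA-04+4 through TileBufferA-01+4, that is buffer slots 0 to 3.
Second call: Y = 13, X = 17. The jump enters at Draw13, which reads TileBufferA-13+17 through TileBufferA-01+17, that is buffer slots 4 to 16.
Together the two calls cover all 17 slots exactly once, without any loop counter.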
Great idea! Right now, I have a destination address already calculated for the next frame. I really should stay away from loops. Loops really do add up, but that looks like it doesn't take very long at all. I think I CAN modify my routine to do 1 string of writes per 8 pixels, however I would have to really change my code.
I'm thinking of writing all of my routines at least twice. Since I rewrote my sprite drawing code yesterday, I feel that if I write rough drafts of code, and then write the real routine, I'd do a lot better. My first draft of my code yesterday was really sloppy looking, and it was barely understandable. I wonder if most programmers do this...
Celius wrote:
I'm thinking of writing all of my routines at least twice. Since I rewrote my sprite drawing code yesterday, I feel that if I write rough drafts of code, and then write the real routine, I'd do a lot better. My first draft of my code yesterday was really sloppy looking, and it was barely understandable. I wonder if most programmers do this...
Heh, I think I know what you mean. When I started my project, I used to plan everything very carefully, so everything looked nice. Now I just write the stuff directly into the code files, and I'll write whatever it takes to test my new ideas, even if the code is not very pretty.
After I see that the ideas work fine, I rewrite the code, making it look better, adding comments and maybe even optimizing a little.