I have an idea for rendering bullets for a bullet hell shmup. Store 2bpp bullet sprite patterns and their transparency masks in the ROM at 8 horizontal scroll values. One row of 8 pixels would work like this:
lda buffer,y
and mask,x
ora pattern,x
sta buffer,y
That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
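For anyone following along, here is a rough Python model of that four-instruction row draw. It is only a sketch: the function names are made up for illustration, a 2bpp sliver is packed into one 16-bit word (low byte = plane 0, high byte = plane 1), and bit 7 is taken as the leftmost pixel.

```python
# Hypothetical model of one 8-pixel 2bpp row, not the actual ROM layout.

def shift_sliver(word, n):
    """Shift both bitplane bytes right by n pixels (bit 7 = leftmost pixel)."""
    lo = (word & 0x00FF) >> n
    hi = ((word >> 8) & 0x00FF) >> n
    return (hi << 8) | lo

def build_tables(pattern, mask):
    """Precompute the 8 shifted copies that would be stored in ROM.

    mask has 1 bits where the bullet is opaque; the table holds the
    inverted value so it can be used directly with AND.
    """
    patterns = [shift_sliver(pattern, n) for n in range(8)]
    and_masks = [shift_sliver(mask, n) ^ 0xFFFF for n in range(8)]
    return patterns, and_masks

def draw_row(buffer_word, patterns, and_masks, n):
    """Equivalent of: lda buffer / and mask / ora pattern / sta buffer."""
    return (buffer_word & and_masks[n]) | patterns[n]
```

This only covers the left-hand tile. Pixels shifted out past bit 0 would have to be drawn into the tile to the right with a second pass.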
I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern? Wouldn't it be easier to have a single sprite shared between as many bullets as you want, and move the bullets on-screen via object position?
You're going to need a value with which to calculate collisions anyway. Perhaps I'm missing something, but the explanation is very vague.
It could be that I'm missing something because this is SNES and not NES, but you can still move sprites on SNES, right?
If you can figure out what they did for Aero Fighters, from what I've seen I feel like that game has probably the best bullet:slowdown ratio for the console.
darryl.revok wrote:
I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern?
But they aren't sprites.
It's BG3 being used as a screen with a bunch of bullets being "painted" on it. The reason you'd do this is to avoid the sprite limit and the sprite pixel per scanline limit. It's really a shame that OAM can only be updated during vblank, because I think sprites are being drawn to a linebuffer then?
Wait... Couldn't you just write to OAM during active display? I must be missing something, because this is way too obvious. I just don't see how OAM could be used during hblank and active display, because I thought I remembered hearing about how sprites are drawn and then BGs or something like that.
Sorry for derailing this already. Sprites are really one of those situations where I'm not completely sure why they aren't just meant to be CPU driven, as in it wouldn't just have the same number of sprites per scanline as the total and you'd just multiplex it. Doesn't the Amiga actually work this way? Oh wait, I just said I was sorry for interrupting this.
Honestly, if I were to make a bullet hell for a 4th gen platform, I'd try to come up with regular patterns that are easy to recreate (e.g. by using tilemaps). Bonus in that it simplifies collision calculations (e.g. if there's a row of bullets, first check if the ship is inside the row, then if within the loop it's a bullet or a gap)
psycopathicteen wrote:
That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.
Anyway, I thought about coding a bullet hell game. Let me find my notes.
There would be 2 buffers, one for player bullets, one for enemy bullets. Each buffer would be a 1bpp bitmap, 256x192 px in size (6144bytes). Bullets would be a single pixel in size.
Some VMAIN magic would combine the two buffers into a single 2bpp tileset.
Transfer One: player bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAL with VMAIN set to $04 (increment on VMDATAL, 8 bit address shift).
Transfer Two: enemy bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAH with VMAIN set to $84 (increment on VMDATAH, 8 bit address shift).
I never actually implemented this. My napkin math suggested that I would not have been able to fit 250 bullets and 10 enemies onto the screen at 30fps (I lost that sheet and I can't remember how I got that conclusion).
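To illustrate the two-transfer idea, here is a small Python sketch of how the two 1bpp buffers would end up as the two bitplanes of each VRAM word. It ignores the VMAIN address remapping, and the function names are hypothetical.

```python
# Sketch only: models VRAM as a list of 16-bit words.

def combine_planes(player, enemy):
    """Transfer one fills the low bytes, transfer two fills the high bytes."""
    vram = [0] * len(player)
    for addr, byte in enumerate(player):   # DMA to VMDATAL
        vram[addr] = (vram[addr] & 0xFF00) | byte
    for addr, byte in enumerate(enemy):    # DMA to VMDATAH
        vram[addr] = (vram[addr] & 0x00FF) | (byte << 8)
    return vram

def pixel_color(word, x):
    """2bpp colour index of pixel x (0 = leftmost) in one 8x1 sliver."""
    bit = 7 - x
    plane0 = (word >> bit) & 1
    plane1 = (word >> (8 + bit)) & 1
    return plane0 | (plane1 << 1)
```

With this layout a player bullet shows up as colour 1, an enemy bullet as colour 2, and an overlap as colour 3.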
Draw Bullet Code:
Code:
.A8
.I16
; DP = bullet address
LDA z:Bullet::xPos
AND #$07
TAY
LDA z:Bullet::xPos
LSR
LSR
LSR
STA tmp
REP #$30
.A16
LDA z:Bullet::yPos
AND #$00FF
XBA
LSR
LSR
LSR
; C always clear
; value at address tmp+1 is always 0
ADC tmp
TAX
; X = yPos * 32 + xPos / 8
; Y = (xPos & 7), assuming the high byte of A was 0 at the TAY (16-bit index copies all 16 bits)
SEP #$20
.A8
LDA buffer, X
ORA SetBulletTable, Y
STA buffer, X
Code:
SetBulletTable:
.repeat 8, i
.byte 1 << i
.endrepeat
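As a cross-check of the address arithmetic in that routine, here is a rough Python equivalent. It is a sketch that assumes the 256x192 1bpp buffer described above (32 bytes per row) and keeps the same `1 << i` bit table; the function names are made up.

```python
BYTES_PER_ROW = 32  # 256 pixels / 8 pixels per byte

def bullet_address(x_pos, y_pos):
    """Return (byte offset into the buffer, bit index within that byte)."""
    bit_index = x_pos & 7                                    # ends up in Y
    offset = (y_pos & 0xFF) * BYTES_PER_ROW + (x_pos >> 3)   # ends up in X
    return offset, bit_index

def set_bullet(buffer, x_pos, y_pos):
    """Equivalent of: lda buffer,x / ora SetBulletTable,y / sta buffer,x."""
    offset, bit_index = bullet_address(x_pos, y_pos)
    buffer[offset] |= 1 << bit_index
```

For example, a bullet at (13, 2) lands in byte 65 of the buffer, bit 5.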
Collision code would have been pixel perfect:
Code:
.A16
.I16
Check_8x8Collision:
; X = frame collision data offset + (xPos & 7) * 2
; Y = yPos * 32 + xPos / 8
.repeat 8, i
LDA buffer + i * 32, Y
AND frameCollisionData + i * 16, X ; each row is 8 words = 16 bytes
BNE CollisionOccurred
.endrepeat
; no collision
RTS
CollisionOccurred:
; collision code
Code:
; CollisionData
; -------------
.macro _buildRow data
.repeat 8, i
.word data << i
.endrepeat
.endmacro
CollisionDataFrame1:
_buildRow %00011000
_buildRow %00011000
_buildRow %00111100
_buildRow %00111100
_buildRow %00111100
_buildRow %00111100
_buildRow %01111110
_buildRow %11111111
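Here is how I read the collision check, as a Python sketch. It assumes the same conventions as the code above (bit 0 is the first pixel of a byte, 16-bit little-endian reads, 32 bytes per buffer row), and the names are hypothetical.

```python
def build_row(data):
    """Same as the _buildRow macro: one 16-bit word per x offset 0..7."""
    return [(data << i) & 0xFFFF for i in range(8)]

FRAME1 = [build_row(d) for d in
          (0x18, 0x18, 0x3C, 0x3C, 0x3C, 0x3C, 0x7E, 0xFF)]

def read_word(buffer, offset):
    """16-bit little-endian read, like LDA with a 16-bit accumulator."""
    return buffer[offset] | (buffer[offset + 1] << 8)

def check_8x8_collision(buffer, frame_rows, x_pos, y_pos):
    """True if any bullet pixel overlaps the 8x8 collision frame."""
    shift = x_pos & 7
    offset = y_pos * 32 + (x_pos >> 3)
    for i, row in enumerate(frame_rows):
        if read_word(buffer, offset + i * 32) & row[shift]:
            return True
    return False
```

Each row costs one load, one AND and one branch, which is why the unrolled version is attractive. Note that the 16-bit read touches one byte past the row's last column, so the caller has to keep the ship away from the very end of the buffer.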
EDIT: Added info about combining buffers.
I was basically thinking of doing this.
Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it. (Or underneath it, wait I need to check how priorities work again) The whole game would probably run at 30fps, alternating between updating the normal sprites and backgrounds, and updating bullets. It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
This is 256 bullets moving at 20fps.
psycopathicteen wrote:
This is 256 bullets moving at 20fps.
Nice.
The movements are smoother than I expected them to be.
Are you going to do more with this?
Quote:
Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.
At first I didn't know what you were talking about, but then I found this at
http://www.defence-force.org/computing/ ... /annexe_2/:
Quote:
3) Add 1 cycle if adding index crosses a page boundary
I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
psycopathicteen wrote:
I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
psycopathicteen wrote:
It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
Why not double buffer?
Presumably you need memory for everything else too. (also could be referring to transfer bandwidth)
Also I was thinking, most bullet hells are vertical. You could probably just use that as an excuse to render only half the screen (the extra space would presumably be used for the HUD).
psycopathicteen wrote:
I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
It is because absolute indexed addressing can increment the bank when the index crosses the bank boundary. This means that two processing cycles are needed to perform the 24-bit addition with the 65816's 16-bit ALU.
The 65816 first performs an 8-bit addition between the low byte of the address and the low byte of the index while it reads ADDR.H.
In the next cycle it performs a 16-bit addition between the 16-bit DB:ADDR.H and IH.
If the page boundary is never crossed (8-bit index && carry of {ADDR.L + I} is 0) then DB:ADDR.H is unchanged and the addition is skipped, saving an unneeded cycle.
(source) With absolute long addressing, the second addition is processed in the half-cycle after the bank byte is read from memory, so skipping it would not save a cycle.
EDIT: added source, reordered sentences.
Quote:
Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
psycopathicteen wrote:
Quote:
Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
Yes, that extra cycle always occurs when the Index registers are 16 bit.
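To tie the cycle discussion together, here is a small Python tally of the four-instruction row draw. It is a sketch based on the rules quoted above (base 4 cycles for an absolute indexed read, base 5 for a store, +1 for a 16-bit accumulator, +1 on reads when the index is 16-bit or a page is crossed). It counts CPU cycles only and ignores memory speed.

```python
def read_abs_indexed(a16, index16, page_cross=False):
    """Cycles for LDA/AND/ORA abs,X or abs,Y."""
    return 4 + (1 if a16 else 0) + (1 if (index16 or page_cross) else 0)

def write_abs_indexed(a16):
    """Cycles for STA abs,X or abs,Y (the index penalty is always paid)."""
    return 5 + (1 if a16 else 0)

def row_cycles(a16, index16, page_cross=False):
    """lda buffer,y / and mask,x / ora pattern,x / sta buffer,y"""
    return 3 * read_abs_indexed(a16, index16, page_cross) + write_abs_indexed(a16)
```

With 16-bit A and 16-bit index registers this gives the 24 cycles mentioned earlier; with a 16-bit A, 8-bit index registers and no page crossing it comes out at 21.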
Espozo wrote:
psycopathicteen wrote:
I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
Yeah, for a port of an existing game. But I'm not doing software rendering on the S-CPU, partly because of the sheer number, size, and colour depth of bullets in the original, and partly because I got stubborn about look&feel and blew 3/4 of my CPU budget on raster effects. I'm using the Super FX chip for bullets and collisions, which moves me out of direct competition with anything that doesn't need a coprocessor.
Also, it's been almost two years and I still haven't done a bullet test. Advantage: not me...
Wait a minute... Does this "existing game" use 3D graphics and you're just using 2D on the SNES, or is the original game 2D?
2D, the SuperFX is because there are way too many bullets for the 65816 to render.
The original game is functionally 2D, but some of the backgrounds are good candidates for Mode 7. Meaning that in a lot of cases the bullet layer has to be tiled sprites. Meaning the sidebar can't also be done in sprites or there will be glitching the moment the player starts shooting.
That's what eats the majority of the CPU time - the jitter-corrected H-IRQ in the middle of the screen to change from whatever the playfield is in to Mode 1, reset the scroll, mosaic, and colour math registers, and decrement a counter to see if it's time to do one of a couple of other things I ran out of HDMA channels for. I had already decided to use the Super FX because of the bullet load, so I told myself the loss was acceptable in terms of total available computing power...
From what psycopathicteen is posting, I imagine that if his idea results in a game it will be designed to make efficient use of the console's existing capabilities rather than brute-forcing non-native graphical elements that don't actually affect gameplay in any way. It would be really cool to see an intense bullet hell game running on a Super NES with no special chip assistance...
93143: Did you ever consider rendering the game exclusively in TATE orientation? It opens up a whole lot of possibilities for vertical shooters.
Come on, guys, can we talk about the subject of the thread? I knew I shouldn't have posted...
...
I had considered tate a bit, but not much because the original was yoko with a sidebar, and I was trying to be as unreasonably accurate as possible. Which is why I didn't just do it in yoko without a sidebar and stick the necessary information in the top and bottom borders. In fact, since my initial inspiration was some random website commenter who said this game was impossible on the SNES, anything that looked impossible on the SNES was actually a plus (one of the advantages of working on real hardware is that you don't have to stick to the console's "limitations" if you can trick it into ignoring them).
I will admit that there are a number of things this game does that would be much easier to pull off if they were flipped 90 degrees, the playfield/sidebar split being the most notable. But now that I come to consider the idea more fully, there are a lot that wouldn't; ironically the Mode 7 candidate backgrounds are in this category. Not only are a lot of them in perspective à la F-Zero, but using sprites for the bullet layer would result in the player's lasers glitching out pretty much the entire scanline. Sure, I could draw all that stuff on the Super FX if I kept the playfield at 144 or 152 pixels wide and put the sidebar data on top or something, which gives me enough bandwidth for 8bpp, or two 4bpp layers, at 30 fps - but wasn't the whole point of this to save S-CPU time and thus modestly reduce the load on the Super FX? And if I'm willing to toss the Mode 7 backgrounds entirely for the sake of a little extra compute time, why not just leave the screen in yoko and use Mode 1 for the whole thing?
Besides, who in 1995 would have been willing to flip a TV on its side to play a SNES game? I know shmup nuts do it now, but...
...
On topic: It strikes me that smaller bullets would render significantly faster than 8x8s. For example, 4x4 bullets could be nearly three times as fast, since they're half as tall and only require one sliver per line for the majority of horizontal positions (not sure if the current code has any way to take advantage of this). If the movement and collision code is fast enough, and judging by that demo it seems pretty quick, I imagine you could potentially get quite a swarm going even at 30 fps...
Also, if I've understood this technique correctly it should scale fairly well to 4bpp and 8bpp, though obviously the screen coverage per frame would get a lot worse. By temporarily stashing the mask in direct page and unrolling the sliver to avoid needing three index registers, it should be possible to somewhat limit the data storage requirements.
psycopathicteen wrote:
This is 256 bullets moving at 20fps.
This is dope.
I do agree that the bullets should be smaller, which would allow you to do this faster.
How hard will it be to incorporate this layer with backgrounds, the player, enemies, and collision detection? I'd love to see a game come out of these techniques with awesome danmaku action for the SNES.
I'm guessing Radiant Silvergun 93143? I know you were talking about Ikaruga, but I don't see how that's possible. Alright, sorry...
Anyway...
Quote:
lda buffer,y
and mask,x
ora pattern,x
sta buffer,y
Wait, how are you dealing with the case where the bullet is between two tiles horizontally?
Espozo wrote:
I'm guessing Radiant Silvergun 93143? I know you were talking about Ikaruga, but I don't see how that's possible. Alright, sorry...
Anyway...
Quote:
lda buffer,y
and mask,x
ora pattern,x
sta buffer,y
Wait, how are you dealing with the case where the bullet is between two tiles horizontally?
I draw two tiles.
Really now?
I'm wondering how you draw two tiles, more specifically how you know which tiles to draw in.
darryl.revok wrote:
I do agree that the bullets should be smaller
Oh, I never said they should be smaller. I'm sure there's room for an assortment of bullet sizes, maybe even big ones done with real sprites. I just noted that they could be smaller.
Most bullets I've seen on this kind of resolution are 5x5 or 6x6.
I think I'll move the screen up a bit, and use IRQ instead of NMI. Should I make the picture full screen? I'd have to double buffer it though.
psycopathicteen wrote:
Should I make the picture full screen?
I would. You wouldn't really need both framebuffers to be full size, would you? Maybe you could have 1 full framebuffer, and then 2/3 of a framebuffer, but you'd need two tilemaps that you'd alternate between (assuming you don't want to update the tilemap). Wait, couldn't you use 1 32x32, 16x16 tile tilemap and just change the vertical scroll value to change between "two tilemaps"? I mean, in terms of data size, that's the same as a 32x32, 8x8 tile tilemap.
Also, just thinking, if you ever plan on making this into a game, I imagine you could do single pixel collision with the bullets to process everything faster. Maybe you could also have the option for the bullet layer to not check collisions against the background when you want it to. Generally in these games, if there are a ton of bullets, there isn't anything else to run into. (I never understood why. I always thought it would be neat to do a duck and cover sort of thing.)
There's already enough to worry about with having to dodge the bullets.
It would need double buffering on the CPU side, so it can work on one frame while the previous frame is being uploaded.
I actually have a very sneaky trick with collision detection up my sleeve.
psycopathicteen wrote:
Should I make the picture full screen?
Wouldn't that limit you to 20 fps at best? Frankly it looks a little choppy to me - still impressive, but a bit suboptimal for a shooter where tracking these bullets by eye is the whole point. If you ended up wanting a pattern with a huge number of fairly slow bullets, you could always drop to 20 temporarily...
With 256x208, you could easily do 30 fps with a couple KB free every frame to do other stuff in. With 256x216, you could still do 30 fps, but there's not much room left for anything besides updating OAM, and even that's getting tight... A fully-occluding status bar reduces the amount of data to transfer, but doesn't help DMA bandwidth - 256x208 plus an 8-line status bar leaves less than a KB per frame...
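Those figures can be roughly reproduced with a back-of-the-envelope DMA budget. The Python sketch below rests on assumptions that are not stated in the thread: NTSC timing with 262 scanlines per frame, about 165.5 bytes of DMA per blanked scanline (1364 master clocks minus 40 for DRAM refresh, at 8 clocks per byte), forced blanking on every non-displayed line, and no setup overhead.

```python
TOTAL_LINES = 262           # assumed NTSC scanlines per frame
DMA_BYTES_PER_LINE = 165.5  # assumed DMA throughput per blanked scanline

def bitmap_bytes(lines, width=256, bpp=2):
    """Size of the bullet layer in bytes."""
    return width * lines * bpp // 8

def free_bytes_per_frame(displayed_lines, bitmap_lines, frames_per_update):
    """DMA bytes left over per frame after uploading the bullet layer.

    displayed_lines: lines where rendering is on (bitmap plus any status bar)
    bitmap_lines:    lines of bullet bitmap that actually get uploaded
    """
    blank_lines = TOTAL_LINES - displayed_lines
    budget = blank_lines * DMA_BYTES_PER_LINE * frames_per_update
    return (budget - bitmap_bytes(bitmap_lines)) / frames_per_update
```

Under these assumptions, `free_bytes_per_frame(208, 208, 2)` leaves a bit over 2 KB per frame, `free_bytes_per_frame(216, 216, 2)` leaves about 700 bytes, and adding an 8-line status bar to 256x208 (`free_bytes_per_frame(216, 208, 2)`) leaves just under 1 KB. A 184-line layer fits in a single frame's blanking while a 192-line one does not, which lines up with the 184-line crop mentioned earlier.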
I'd suggest trimming the edges, but I wonder if that would complicate wrap detection in the renderer...
...
Another thing I thought of: would you try to finish blitting a contiguous transfer chunk as soon as possible, so you could get a jump on the DMA, or would you finish the whole screen before starting the transfer and just eat the lag?
Espozo wrote:
Wait, couldn't you use 1 32x32, 16x16 tile tilemap and just change the vertical scroll value to change between "two tilemaps"? I mean, in terms of data size, that's the same as a 32x32, 8x8 tile tilemap.
Clever... and it can be switched per-layer too, so you aren't locking yourself into anything this way. I guess this is one of those SNES features I noticed but never put much thought into, since I figured I didn't need it...
As for the partial buffering in VRAM, 30 fps has an advantage there too. You don't have to actually sustain 30 fps, but if you have enough blanking time that you theoretically could, you can do 3/2 buffering, where you only need to buffer half the screen size over and above what the display is using. You'd save a couple of KB as compared with 5/3 buffering, and several as compared with full double buffering...
Quote:
Generally in these games, if there are a ton of bullets, there isn't anything else to run into. (I never understood why. I always thought it would be neat to do a duck and cover sort of thing.)
It would be cool, but it would require the terrain collision routine to be run for every single bullet.
93143 wrote:
Clever...
I'm good at coming up with ideas, just not implementing them.
Irrelevant, but I made the realization that if you were to have a 4096 color image on the SNES using color math between an 8bpp and a 4bpp layer, then you'd only need one tilemap.
Actually, back to the discussion. It's still crazy, but like I said, the collision for the bullets could be a single point. Really, BG collision would be easy, that is, if it weren't being run several hundred times... Because bosses in these games often fire the most bullets and are more often than not around obstacles, you could switch between checking for BG collision and not when appropriate. Actually, you could do it for everything.
Off topic, but I really want to know how they handled BG collision here, even if it is slow as all get out:
Espozo wrote:
I made the realization that if you were to have a 4096 color image on the SNES using color math between an 8bpp and a 4bpp layer, then you'd only need one tilemap.
Well, 7/16 of a tilemap. If the rest of it never shows up on screen, you can put tile data in it. Meaning a 224x192 still image (perfect 4:3 with 16-pixel borders all around) is feasible at 12bpp.
Hang on; that's just 3/8 of the tilemap, even if you only count free space if tiles will fit in it... I suppose you could fit a few sprite tiles into the remaining 256 bytes...
I figured this subject wasn't likely to trigger an extended digression... I'll shut up now...
93143 wrote:
With 256x208, you could easily do 30 fps with a couple KB free every frame to do other stuff in. With 256x216, you could still do 30 fps, but there's not much room left for anything besides updating OAM, and even that's getting tight... A fully-occluding status bar reduces the amount of data to transfer, but doesn't help DMA bandwidth - 256x208 plus an 8-line status bar leaves less than a KB per frame...
Try 256x192, some MD games use that. There's a good reason: it pretty much doubles blanking time in NTSC, but the borders still aren't very noticeable, especially with the image centered (and overscan will eat part of the borders as well). That could be worth trying.
What would really speed up rendering would be to have the graphics embedded into the code itself as immediate values.
I really kind of wondered why you didn't do that, but I thought maybe you had a specific reason which was why I asked how it worked.
Again though, how do you know when to draw an extra tile horizontally?
Espozo wrote:
how do you know when to draw an extra tile horizontally?
if (bullet_left % tile_width) + bullet_width - 1 >= tile_width then you need two tiles across.
I don't have a clue what that means, but I was also wondering if you could also use a lookup table for that and see if it would go faster. For this, I'd do the craziest things just to save a couple of cycles.
bullet_left: The X coordinate of the left side of a bullet
tile_width: 8 on most platforms, 4 if trying to use 16-bit reads and writes on 4bpp packed pixel platforms such as Genesis or GBA, 7 on oddball platforms such as Apple II
bullet_left % tile_width: The remainder when dividing bullet_left by tile_width, which will be in the range 0 through 7. Interpret it as the distance in pixels from the left side of the first tile that the bullet occupies.
bullet_width: The width of the bullet in pixels, such as 5
bullet_width - 1: The distance from the center of the leftmost pixel of a bullet to the center of its rightmost pixel
(bullet_left % tile_width) + bullet_width - 1: The distance in pixels from the left side of the first tile to the rightmost pixel of the bullet
(bullet_left % tile_width) + bullet_width - 1 >= tile_width: Whether this distance is at least one tile width, meaning the bullet's rightmost pixel lands in the next tile
Let's try an example, with bullet_left=101 and bullet_width=5
bullet_left % tile_width = 5, meaning that the bullet starts 5 pixels from the left side of the tile
bullet_left % tile_width + bullet_width - 1 = 5 + 5 - 1 = 9, meaning that the bullet's rightmost pixel is 9 pixels from the left side of the leftmost tile containing the bullet.
Because this distance is at least 8 pixels (the width of a tile), the bullet will occupy two tiles.
Now move this bullet a bit, with bullet_left=97 and bullet_width=5
bullet_left % tile_width = 1, meaning that the bullet starts 1 pixel from the left side of the tile
bullet_left % tile_width + bullet_width - 1 = 1 + 5 - 1 = 5, meaning that the bullet's rightmost pixel is 5 pixels from the left side of the leftmost tile containing the bullet.
Because this distance is less than 8 pixels (the width of a tile), the bullet will occupy only one tile.
These two inequalities are equivalent:
bullet_left % tile_width + bullet_width - 1 >= tile_width
bullet_left % tile_width + bullet_width >= tile_width + 1
The first means that the last pixel goes into a new tile. The second means that the right edge of the last pixel is past the end of the first tile.
The following subroutine calculates the second inequality and puts the result in carry:
Code:
; as usual, capitals denote constants
TILE_WIDTH = 8
MOD_TILE_WIDTH = TILE_WIDTH - 1
.assert TILE_WIDTH & MOD_TILE_WIDTH = 0, error, "TILE_WIDTH must be a power of 2 to calculate remainders with AND"
.proc bullet_x_needs_two_tiles
lda bullet_left,x
and #MOD_TILE_WIDTH ; A = bullet_left % tile_width
clc
adc bullet_width,x ; A = (bullet_left % tile_width) + bullet_width
cmp #TILE_WIDTH + 1
rts
.endproc
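The same test can be checked quickly on the host side. This Python sketch mirrors the first inequality and uses the two worked examples from above; the function name is just for illustration.

```python
def needs_two_tiles(bullet_left, bullet_width, tile_width=8):
    """True if the bullet's rightmost pixel falls in the next tile over."""
    return (bullet_left % tile_width) + bullet_width - 1 >= tile_width

def two_tile_table(bullet_width, tile_width=8):
    """Lookup-table variant: one entry per possible x offset within a tile."""
    return [needs_two_tiles(x, bullet_width, tile_width)
            for x in range(tile_width)]
```

For a fixed bullet width the answer depends only on `bullet_left % tile_width`, so an 8-entry table per bullet size would be enough if a lookup turns out to be faster than the add and compare.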
Setting $2115 to $84 allows you to DMA 2bpp tiles as if they are arranged in a bitmap. The buffer is arranged from left to right, up to down, with each pair of bytes representing an 8x1 sliver.
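As I understand it, the address translation behind this is a 3-bit rotate of the low 8 bits of the VRAM word address (aaaaaaaaYYYxxxxx becomes aaaaaaaaxxxxxYYY). The Python sketch below models that and checks it against the normal 2bpp tile layout; the function names are made up.

```python
def remap_8bit(addr):
    """VMAIN 8-bit remap: aaaaaaaaYYYxxxxx -> aaaaaaaaxxxxxYYY."""
    return (addr & 0xFF00) | ((addr & 0x001F) << 3) | ((addr >> 5) & 0x07)

def linear_word_index(x_tile, y):
    """Word index of an 8x1 sliver in a left-to-right, top-to-bottom buffer."""
    return y * 32 + x_tile

def tile_word_address(x_tile, y):
    """Word address of the same sliver in 2bpp tile order (8 words per tile)."""
    tile_number = (y >> 3) * 32 + x_tile
    return tile_number * 8 + (y & 7)
```

For every sliver on a 256-pixel-wide screen, `remap_8bit(linear_word_index(x, y))` equals `tile_word_address(x, y)`, so the CPU-side buffer can stay a plain bitmap.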
Here is 30fps, 256x216, with 256 4x4 "sprites." I think they look a little too tiny.
psycopathicteen wrote:
Here is 30fps, 256x216
Now we're talking!
psycopathicteen wrote:
I think they look a little too tiny.
I agree. Try this: (I tried 6x6, but I couldn't get a convincing circle)
Attachment: 5x5 Bullet.png
You know though, I still don't necessarily understand how you're finding the tiles (although it's changed now that you said it acts like a bitmap), but I tried my own code where you'd index two different tables by the x and y positions, and it kind of exploded...
I must seriously be doing something wrong... I pretty much just doubled the amount of data for actual drawing code just to get rid of one "tay"...
Code:
rep #$30 ;A=16, X/Y=16
ldx BulletYPosition
sec
sbc #BulletHeight-1
cmp #ScreenWidth
beq done
lda BulletYPositionBufferOffsetTable,x
tax
jsr (StartOfBulletYPositionCode,x)
bullet_x_position_code_finder:
ldx BulletXPosition
sec
sbc #BulletWidth-1
cmp #ScreenWidth
beq done
lda BulletXPositionCodeJumpTable,x
tax
jsr (StartOfBulletXPositionCode,x)
;============================================================
bullet_y_position=5_start:
ldy #$0200
ldx #$0200+BulletHeight*32
stx EndOfBullet
ldx #$0200+8
stx BottomOfTile
;============================================================
bullet_x_position=23_start:
sep #$20 ;A=8
lda Buffer+4,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4,y
tya
clc
adc #$08
cpy BottomOfTile
beq bullet_x_position=23_next_tile_start
cpy EndOfBullet
beq bullet_x_position=23_loop
rts
bullet_x_position=23_loop:
tay
lda Buffer+4,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4,y
tya
clc
adc #$08
cpy BottomOfTile
beq bullet_x_position=23_next_tile_start
cpy EndOfBullet
bne bullet_x_position=23_next_tile_start
rts
bullet_x_position=23_next_tile_start:
lda Buffer+4+128,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+128,y
tya
clc
adc #$08
cpy EndOfBullet
bne bullet_x_position=23_next_tile_loop
rts
bullet_x_position=23_next_tile_loop:
tay
lda Buffer+4+128,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+128,y
tya
clc
adc #$08
cpy EndOfBullet
bne bullet_x_position=23_next_tile_loop
rts
Edit: hey, psycopathicteen, you saw the discussion 93143 and I were having about tilemap data and stuff, right? I'm just wondering because you could make the tilemap half as large if you use 16x16 tiles and move the tilemap up and down to "switch" it. Also, what's with the gaps between the buffers?
Because you said you could have it like a regular bitmap, I thought I'd improve the disastrous code I made earlier: (It's a bullet that's 5 pixels tall.)
Code:
rep #$30 ;A=16, X/Y=16
ldx BulletYPosition
sec
sbc #BulletHeight-1
cmp #ScreenWidth
beq done
lda BulletYPositionBufferOffsetTable,x
tay
bullet_x_position_code_finder:
ldx BulletXPosition
sec
sbc #BulletWidth-1
cmp #ScreenWidth
beq done
lda BulletXPositionCodeJumpTable,x
tax
jsr (StartOfBulletXPositionCode,x)
;============================================================
bullet_x_position=23_start:
lda Buffer+4,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4,y
lda Buffer+4+64,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+64,y
lda Buffer+4+128,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+128,y
lda Buffer+4+192,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+192,y
lda Buffer+4+256,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+256,y
rts
So if the CPU is running between 2.5 and 3.5 MHz or something, we can say it's running at 3, and 3,000,000 / 60 = 50,000 cycles per frame. I was thinking about the feasibility of running just this at 60fps because, although you can't upload tiles that fast, collision detection and whatnot will certainly bring it down, and you wouldn't want the game running at 20fps.
I wonder whether you could use the 8x8 bit multiplier to avoid having to store eight versions of the bullet sprite. Load the shift amount ($80, $40, $20, $10, $08, $04, $02) into $4202 before drawing a bullet. Then write each byte of the bullet sprite to $4203, and eight cycles later, read the tile out of $4217 (left tile) and $4216 (right tile). That'd let you use a wider variety of bullet sprites and even reuse the same routine for a proportional font.
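Here is a quick Python model of that multiplier trick, just to show the arithmetic. It is a sketch that treats the 16-bit product as the high byte at $4217 and the low byte at $4216, as described above, and assumes bit 7 is the leftmost pixel; the function name is hypothetical.

```python
def multiplier_shift(sliver, n):
    """Shift an 8-pixel sliver right by n pixels using an 8x8 multiply.

    Returns (left_tile_byte, right_tile_byte).
    """
    if n == 0:
        return sliver, 0            # no multiply needed, nothing spills over
    factor = 0x100 >> n             # $80, $40, ... $02 for n = 1..7
    product = sliver * factor       # 16-bit result of the 8x8 multiply
    return product >> 8, product & 0xFF
```

The high byte is the part of the sliver that stays in the left tile and the low byte is the part that spills into the right tile, so one multiply yields both halves.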
Collision detection with that many bullets will have to use either 1D sorting (if the current bullet is too far from the player, all subsequent bullets in the same direction will be likewise) or the 2D "sector method" described in a 1995 Dr. Dobb's article by Dave Roberts.
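A minimal sketch of the 1D sorting idea in Python, assuming the bullets' x coordinates are kept in a sorted list. The names and the `reach` parameter are made up for illustration, and the binary search stands in for whatever scan a real implementation on the console would use.

```python
import bisect

def bullets_near_player(sorted_xs, player_x, reach):
    """Indices of bullets whose x lies within `reach` pixels of the player.

    Because the list is sorted, everything outside the returned range is
    too far away and never needs a detailed check.
    """
    lo = bisect.bisect_left(sorted_xs, player_x - reach)
    hi = bisect.bisect_right(sorted_xs, player_x + reach)
    return range(lo, hi)
```

Only the bullets in the returned range would then go through the precise overlap test.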
tepples wrote:
I wonder whether you could use the 8x8 bit multiplier to avoid having to store eight versions of the bullet sprite.
If it's slower, then don't bother.
In some cases, you might be right. One difference between the NES and the Super NES is that the memory in a Super NES Game Pak is typically about eight times as big. This is large enough to store eight copies of each bullet and each glyph in a font, each shifted by a different amount, without causing unacceptable compromises to the detail of other graphics. But you'll need to write a program that makes said eight copies of each bullet graphic, and you'll need to have your build process re-run that program every time you edit the bullet graphics.
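Such a build step can be quite small. The Python sketch below generates the eight shifted copies of a 1bpp bullet as (left tile byte, right tile byte) pairs per row. It assumes bit 7 is the leftmost pixel; the function names and the output format are hypothetical, not taken from any existing tool.

```python
def shifted_copies(rows):
    """rows: list of 8-bit row bytes. Returns 8 lists of (left, right) pairs."""
    copies = []
    for n in range(8):
        shifted = []
        for row in rows:
            word = (row << 8) >> n      # 16-bit window covering two tiles
            shifted.append((word >> 8, word & 0xFF))
        copies.append(shifted)
    return copies

def format_ca65(label, copies):
    """Emit the table as ca65-style .byte lines, one label per shift amount."""
    lines = []
    for n, shifted in enumerate(copies):
        lines.append(f"{label}_shift{n}:")
        for left, right in shifted:
            lines.append(f"  .byte ${left:02X}, ${right:02X}")
    return "\n".join(lines)
```

Hooking something like this into the build means the shifted data is regenerated every time the source graphic changes, so a stale copy cannot sneak into the ROM.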
tepples wrote:
But you'll need to write a program that makes said eight copies of each bullet graphic,
It's not exactly difficult to do that manually...
It is when you have 32 different bullet types and 96 different VWF glyphs, and you don't want an old version of a bullet to get included in the final build of your ROM sent to lot check.
tepples wrote:
32 different bullet types
32? That's extremely unlikely. Player bullets will be sprites, and they're usually the most varied. I doubt there'll be more than 8.
Even with a single bullet type you're very prone to forgetting to update one of the shifts though. Anything that reduces the number of steps needed is useful; that's fewer places to forget to update.
Whatever. He can handle it the way he wants to. I'm just saying, I'd do nearly anything to reduce the number of cycles on a routine like this.
Espozo wrote:
Because you said you could have it like a regular bitmap, I thought I'd improve the disastrous code I made earlier: (It's a bullet that's 5 pixels tall.)
Code:
rep #$30 ;A=16, X/Y=16
ldx BulletYPosition
sec
sbc #BulletHeight-1
cmp #ScreenWidth
beq done
lda BulletYPositionBufferOffsetTable,x
tay
bullet_x_position_code_finder:
ldx BulletXPosition
sec
sbc #BulletWidth-1
cmp #ScreenWidth
beq done
lda BulletXPositionCodeJumpTable,x
tax
jsr (StartOfBulletXPositionCode,x)
;============================================================
bullet_x_position=23_start:
lda Buffer+4,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4,y
lda Buffer+4+64,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+64,y
lda Buffer+4+128,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+128,y
lda Buffer+4+192,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+192,y
lda Buffer+4+256,y
and #BulletMask>>7
ora #BulletPattern>>7
sta Buffer+4+256,y
rts
So if the CPU is running between 2.5 and 3.5 MHz or something, we can say it's running at 3, and 3,000,000 / 60 = 50,000 cycles per frame. I was thinking about the feasibility of running just this at 60fps because, although you can't upload tiles that fast, collision detection and whatnot will certainly bring it down, and you wouldn't want the game running at 20fps.
That gives me an idea. You can probably set up a macro with the shift amount as the parameter.
macro draw_bullet(n){
lda buffer,x
and.w #(mask>>n&$00ff)|(mask&$ff00>>n)
ora.w #(pattern>>n&$00ff)|(pattern&$ff00>>n)
sta buffer,x
}
I'm not sure if I got the syntax correct though.
If it's possible, I'd also make the "bullet_x_position=XXX_start" thing a macro too, because the way it's currently set up, you'll need over 256 copies of the second half of the code... Now that I think about it, that is a really stupid idea. Doing this:
Code:
lda bullet_left,x
and #MOD_TILE_WIDTH ; A = bullet_left % tile_width
clc
adc bullet_width,x ; A = (bullet_left % tile_width) + bullet_width
cmp #TILE_WIDTH + 1
rts
And something like this:
Code:
and #%0000000000000111
tax
lda BulletXPositionCodeJumpTable,x ;Only 8 different locations now
tax
jsr (StartOfBulletXPositionCode,x)
;============================================================
lda BulletXPosition
ror
ror
ror
and #%0001111111111111
clc
adc VerticalOffset ;Calculated earlier like in the other code
tay
Is... Actually over twice as long...
Maybe it wasn't such a stupid idea? I feel like this is the kind of situation where you have to go to extreme measures. Wait a minute though, you'd have to copy the code not only 256+ times for the different x positions, but also for every different bullet... Yeah, I give up...
Wait... I just noticed something... I really wouldn't even need the top part, bringing the code down to 13 instructions. That's not as good as 8, but I feel that only having 8 copies of the bottom code is a bit more reasonable than 256...
This way, you could recopy the 8 different routines for every different bullet. It's still a little ridiculous, but it's at least within reason. I think this is the perfect speed/memory tradeoff.
You know what I also just now noticed? I haven't done anything about bullets that are partially off the screen... I mean, with the older 256+ copies code, it could work horizontally, but not vertically.
Dang it psychopathicteen, how are you even doing this? I want to see how what I've been doing stacks up. I don't even like bullet hell games, but this is an interesting project.
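The split discussed above (the low three bits of x pick one of 8 pre-shifted routines, the remaining bits pick the byte column) can be modelled in a few lines of Python. This is only a sketch: the 64-byte row stride is taken from the +64 steps in the unrolled code, and `locate` is a hypothetical helper, not anything from the actual game.

```python
ROW_STRIDE = 64       # bytes per pixel row: 32 columns x 2 bytes
BYTES_PER_COLUMN = 2  # one 16-bit word holds 8 pixels of 2bpp data


def locate(x, y):
    """Return (shift, buffer_offset) for the top-left pixel of a bullet."""
    shift = x & 7  # picks one of 8 pre-shifted routines
    offset = y * ROW_STRIDE + (x >> 3) * BYTES_PER_COLUMN
    return shift, offset


print(locate(23, 0))
```

For x = 23 this gives a shift of 7 and a byte offset of 4, which lines up with the `Buffer+4` and `>>7` used in the hand-written x = 23 routine.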
Now I got 5x5 sprites at 30fps.
I came up with a macro (in bass assembler) to automatically create routines for 8 different scroll values.
Code:
define pattern_0($0070)
define pattern_1($70e8)
define pattern_2($70c8)
define pattern_3($7088)
define pattern_4($0070)
define mask_0($7070)
define mask_1($f8f8)
define mask_2($f8f8)
define mask_3($f8f8)
define mask_4($7070)
macro draw_bullet(n) {
lda $0000,y
and.w #((({mask_0} & 0x00ff) >> {n}) + (({mask_0} >> {n}) & 0xff00) ^ 0xffff)
ora.w #((({pattern_0} & 0x00ff) >> {n}) + (({pattern_0} >> {n}) & 0xff00))
sta $0000,y
lda $0040,y
and.w #((({mask_1} & 0x00ff) >> {n}) + (({mask_1} >> {n}) & 0xff00) ^ 0xffff)
ora.w #((({pattern_1} & 0x00ff) >> {n}) + (({pattern_1} >> {n}) & 0xff00))
sta $0040,y
lda $0080,y
and.w #((({mask_2} & 0x00ff) >> {n}) + (({mask_2} >> {n}) & 0xff00) ^ 0xffff)
ora.w #((({pattern_2} & 0x00ff) >> {n}) + (({pattern_2} >> {n}) & 0xff00))
sta $0080,y
lda $00c0,y
and.w #((({mask_3} & 0x00ff) >> {n}) + (({mask_3} >> {n}) & 0xff00) ^ 0xffff)
ora.w #((({pattern_3} & 0x00ff) >> {n}) + (({pattern_3} >> {n}) & 0xff00))
sta $00c0,y
lda $0100,y
and.w #((({mask_4} & 0x00ff) >> {n}) + (({mask_4} >> {n}) & 0xff00) ^ 0xffff)
ora.w #((({pattern_4} & 0x00ff) >> {n}) + (({pattern_4} >> {n}) & 0xff00))
sta $0100,y
}
macro draw_bullet_second_half(n) {
lda $0002,y
and.w #((({mask_0} & 0xff00) << (8 - {n})) + (({mask_0} << (8 - {n})) & 0x00ff) ^ 0xffff)
ora.w #((({pattern_0} & 0xff00) << (8 - {n})) + (({pattern_0} << (8 - {n})) & 0x00ff))
sta $0002,y
lda $0042,y
and.w #((({mask_1} & 0xff00) << (8 - {n})) + (({mask_1} << (8 - {n})) & 0x00ff) ^ 0xffff)
ora.w #((({pattern_1} & 0xff00) << (8 - {n})) + (({pattern_1} << (8 - {n})) & 0x00ff))
sta $0042,y
lda $0082,y
and.w #((({mask_2} & 0xff00) << (8 - {n})) + (({mask_2} << (8 - {n})) & 0x00ff) ^ 0xffff)
ora.w #((({pattern_2} & 0xff00) << (8 - {n})) + (({pattern_2} << (8 - {n})) & 0x00ff))
sta $0082,y
lda $00c2,y
and.w #((({mask_3} & 0xff00) << (8 - {n})) + (({mask_3} << (8 - {n})) & 0x00ff) ^ 0xffff)
ora.w #((({pattern_3} & 0xff00) << (8 - {n})) + (({pattern_3} << (8 - {n})) & 0x00ff))
sta $00c2,y
lda $0102,y
and.w #((({mask_4} & 0xff00) << (8 - {n})) + (({mask_4} << (8 - {n})) & 0x00ff) ^ 0xffff)
ora.w #((({pattern_4} & 0xff00) << (8 - {n})) + (({pattern_4} << (8 - {n})) & 0x00ff))
sta $0102,y
}
draw_bullet_at_0:
draw_bullet(0)
rts
draw_bullet_at_1:
draw_bullet(1)
rts
draw_bullet_at_2:
draw_bullet(2)
rts
draw_bullet_at_3:
draw_bullet(3)
rts
draw_bullet_at_4:
draw_bullet(4)
draw_bullet_second_half(4)
rts
draw_bullet_at_5:
draw_bullet(5)
draw_bullet_second_half(5)
rts
draw_bullet_at_6:
draw_bullet(6)
draw_bullet_second_half(6)
rts
draw_bullet_at_7:
draw_bullet(7)
draw_bullet_second_half(7)
rts
bullet_routine_table:
dw draw_bullet_at_0
dw draw_bullet_at_1
dw draw_bullet_at_2
dw draw_bullet_at_3
dw draw_bullet_at_4
dw draw_bullet_at_5
dw draw_bullet_at_6
dw draw_bullet_at_7
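To see what the macro expressions actually evaluate to, here is a Python sketch that mirrors them. The `& 0xFFFF` is an assumption that the assembler truncates the immediate to 16 bits, and the helper names are made up; only `mask_1` comes from the defines above.

```python
def first_half(word, n):
    """Immediate for the bullet's own column: each plane byte shifted right by n."""
    return (((word & 0x00FF) >> n) + ((word >> n) & 0xFF00)) & 0xFFFF


def second_half(word, n):
    """Immediate for the next column: the bits pushed out of each plane byte."""
    spill = 8 - n
    return (((word & 0xFF00) << spill) + ((word << spill) & 0x00FF)) & 0xFFFF


def needs_second_half(n, width=5):
    """True when a bullet `width` pixels wide, shifted by n, crosses the column edge."""
    return n + width > 8


mask_1 = 0xF8F8  # second row mask from the defines above
for n in range(8):
    line = f"{n}: and #${first_half(mask_1, n) ^ 0xFFFF:04x}"
    if needs_second_half(n):
        line += f", next column and #${second_half(mask_1, n) ^ 0xFFFF:04x}"
    print(line)
```

For a 5-pixel-wide bullet only shifts 4 to 7 cross into the next column, which is why only those four routines call `draw_bullet_second_half`.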
That looks complicated...
Wow though!
Do you know how much processing time you have left over on the next frame?
You know, I'm trying something out and I want to know something... You said there was some kind of register you could set to where you can upload pixels as if it were a bitmap if they're 2bpp. How would this work for 4bpp?
Espozo wrote:
You know, I'm trying something out and I want to know something... You said there was some kind of register you could set to where you can upload pixels as if it were a bitmap if they're 2bpp. How would this work for 4bpp?
Quote:
2115 wb++?- VMAIN - Video Port Control
i---mmii
i = Address increment mode^:
0 => increment after writing $2118/reading $2139
1 => increment after writing $2119/reading $213a
ii = Address increment amount
00 = Normal increment by 1
01 = Increment by 32
10 = Increment by 128
11 = Increment by 128
mm = Address remapping
00 = No remapping
01 = Remap addressing aaaaaaaaBBBccccc => aaaaaaaacccccBBB (2bpp buffer)
10 = Remap addressing aaaaaaaBBBcccccc => aaaaaaaccccccBBB (4bpp buffer)
11 = Remap addressing aaaaaaBBBccccccc => aaaaaacccccccBBB (8bpp buffer)
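To make the bit shuffling in the quote a bit more concrete, here is a Python model of the three remapping modes. It only follows the bit patterns quoted above; `remap` is a made-up name, and the address is assumed to be a 16-bit VRAM word address.

```python
def remap(addr, mode):
    """Apply the $2115 'mm' address remapping to a 16-bit VRAM word address."""
    if mode == 0:
        return addr
    c_bits = 4 + mode                # 5, 6 or 7 'c' bits for modes 1, 2, 3
    c = addr & ((1 << c_bits) - 1)   # low field: ccccc / cccccc / ccccccc
    b = (addr >> c_bits) & 0b111     # middle field: BBB
    a = addr & ~((1 << (c_bits + 3)) - 1) & 0xFFFF  # untouched high bits
    return a | (c << 3) | b


print(hex(remap(0x0020, 1)))  # 2bpp: next pixel row of the buffer
print(hex(remap(0x0001, 1)))  # 2bpp: next 8-pixel column of the buffer
print(hex(remap(0x0001, 2)))  # 4bpp: next word of the buffer
```

If I'm reading the bit layout right, in the 4bpp mode the lowest 'c' bit ends up choosing between the first and second 8 words of a 16-word tile, so each 8-pixel group in the linear buffer would be two consecutive words (planes 0/1, then planes 2/3).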
Actually, for what I decided I want to do, this wouldn't be too useful, because although I want to upload data from a buffer, I only want to upload parts of the buffer at a time, and the buffer is going to be larger than the screen.