Today I just thought of a new animation scheme that fixes most of the issues I've had with previous techniques.
Use 8x8 and 16x16 sprites. Each VRAM slot would be the size of two 16x16 sprites. Metasprites can use as many slots as they want.
Isn't this a whole lot more limited? I guess it takes less processing time though.
A couple of reasons:
1) Uploading a lot of individual 16x16s causes a lot of DMA overhead. My game is barely keeping everything within vblank.
2) 8x8s allow me to experiment more with skeletal animation and sprite shearing effects.
Quote:
Each VRAM slot would be the size of two 16x16 sprites
I don't see how it fixes:
Quote:
1) Uploading a lot of individual 16x16s causes a lot of DMA overhead. My game is barely keeping everything within vblank.
Can you elaborate a bit, so I can get it?
Anyway, a sprite engine on the SNES will almost always depend on the game design. If your code is too generic, you will almost always have problems with vblank.
I'm currently porting Kung Fu Master (arcade) to the SNES, and when I see how I have to handle sprites, it's very specific and can't be done with something generic (DMA transfers and the VRAM size allowed for sprites).
Anyway, I would like to know more about your new scheme.
++ Lint
lint wrote:
Quote:
Each VRAM slot would be the size of two 16x16 sprites
I don't see how it fixes:
Quote:
1) Uploading a lot of individual 16x16s causes a lot of DMA overhead. My game is barely keeping everything within vblank.
Can you elaborate a bit, so I can get it?
++ Lint
It takes more time to set up DMA registers twice as often for the same amount of data.
OK, so your gain is just dividing the DMA setup for sprites by 2... is that really what is slowing down your game?
Can you explain what you consider a VRAM slot, and why 2 x 16x16 sprites?
It won't slow down my game, it just causes the top few scanlines to be blacked out.
The DMA time used can be estimated like this:
DMA time = "total data size" + "CPU setup time" * "number of chunks"
By doing bigger chunks at once, there will be fewer chunks for the same total data size, so the upload takes less time and is less likely to cause black scanlines at the top of the screen.
A slot is just a designated VRAM location that is usually updated in one chunk.
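To make the formula above concrete, here is a minimal Python sketch. The function name dma_cost and the setup cost of 20 byte-times are placeholders for illustration only, not measured values; everything is expressed in units of "time to copy one byte".

```python
def dma_cost(total_bytes, chunk_bytes, setup_cost):
    """Data size plus per-chunk setup, all in byte-time units."""
    chunks = -(-total_bytes // chunk_bytes)  # ceiling division
    return total_bytes + setup_cost * chunks

SETUP = 20  # hypothetical setup cost per chunk, in byte-times

small_chunks = dma_cost(1024, 64, SETUP)   # sixteen 64-byte chunks
large_chunks = dma_cost(1024, 128, SETUP)  # eight 128-byte chunks
print(small_chunks, large_chunks)
```

The data term is the same in both calls; only the setup term shrinks when the chunks get bigger.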
I don't get it, how is uploading only 32x16 any more beneficial than uploading 32x32's and 16x16's, unless all you're uploading is 16x16's?
I thought this code I made was fine:
Code:
tile_uploader:
    rep #$30                ; A=16, X/Y=16
    ldx #$0000              ; X = byte offset into the 16x16 request table
    lda #$1801              ; Set DMA mode (word, normal increment) and destination register (VRAM write register)
    sta $4300
    sta $4310               ; channels 1-3 are used below, so they need the same mode and destination
    sta $4320
    sta $4330
    lda #$0080              ; increment the VRAM address after the high byte is written
    sta $2115

tile_uploader_16x16:
    lda TileRequestCounter16x16
    beq tile_uploader_32x32_start
    lda #$0040              ; 64 bytes per half; reloaded every pass because a DMA counts its size down to zero
    sta $4305
    sta $4315
    ;16x16 Top Half
    lda TileRequestCounter16x16+VramAdress,x
    sta $2116
    lda TileRequestCounter16x16+BankNumber,x
    sta $4303               ; 16-bit store: the high byte lands in the bank register ($4304)
    sta $4313
    lda TileRequestCounter16x16+TileAddress,x
    sta $4302               ; source address, overwrites the low byte stored above
    clc
    adc #$0040              ; bottom half follows the top half in the source data
    sta $4312
    lda #%0000000100000000  ; Initiate DMA transfer (channel 0); the high byte goes to $420B
    sta $420A
    ;16x16 Bottom Half
    lda TileRequestCounter16x16+VramAdress,x
    clc
    adc #$0100              ; next tile row in VRAM
    sta $2116
    lda #%0000001000000000  ; Initiate DMA transfer (channel 1)
    sta $420A
    txa
    clc
    adc #$0006              ; each request entry is 6 bytes
    tax
    cpx TileRequestCounter16x16
    bne tile_uploader_16x16

tile_uploader_32x32_start:
    lda TileRequestCounter32x32
    beq tile_uploader_done
    ldx #$0000              ; restart the index for the 32x32 request table
tile_uploader_32x32:
    lda #$0080              ; 128 bytes per row; reloaded every pass because a DMA counts its size down to zero
    sta $4305
    sta $4315
    sta $4325
    sta $4335
    lda TileRequestCounter32x32+VramAdress,x
    sta $2116
    lda TileRequestCounter32x32+BankNumber,x
    sta $4303
    sta $4313
    sta $4323
    sta $4333
    lda TileRequestCounter32x32+TileAddress,x
    sta $4302
    clc
    adc #$0080              ; each row is 128 bytes further into the source data
    sta $4312
    clc
    adc #$0080
    sta $4322
    clc
    adc #$0080
    sta $4332
    lda #%0000000100000000  ; Initiate DMA transfer (channel 0)
    sta $420A
    ;Second Row
    lda TileRequestCounter32x32+VramAdress,x
    clc
    adc #$0100
    sta $2116
    lda #%0000001000000000  ; Initiate DMA transfer (channel 1)
    sta $420A
    ;Third Row
    lda TileRequestCounter32x32+VramAdress,x
    clc
    adc #$0200
    sta $2116
    lda #%0000010000000000  ; Initiate DMA transfer (channel 2)
    sta $420A
    ;Fourth Row
    lda TileRequestCounter32x32+VramAdress,x
    clc
    adc #$0300
    sta $2116
    lda #%0000100000000000  ; Initiate DMA transfer (channel 3)
    sta $420A
    txa
    clc
    adc #$0006              ; each request entry is 6 bytes
    cmp TileRequestCounter32x32
    beq tile_uploader_done
    tax
    bra tile_uploader_32x32

tile_uploader_done:
    rts
I think "32x16" is supposed to refer to two 16x16s that are always consecutive.
I mean, I know that, I just don't know what kind of advantage you're getting.
The top or bottom half of a 16x16 is 64 bytes, which can be copied in the equivalent of 86 fast cycles. But it takes about 36 cycles just to set up the registers for one copy. If you make longer copies, you can set up the registers fewer times.
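As a rough check of those figures, here is a small Python sketch. It assumes the commonly cited timings of 8 master clocks per DMA byte and 6 master clocks per fast CPU cycle, and takes the 36-cycle setup estimate from the post above; the function names are made up for illustration.

```python
MASTER_CLOCKS_PER_DMA_BYTE = 8    # assumed DMA speed
MASTER_CLOCKS_PER_FAST_CYCLE = 6  # assumed fast CPU cycle length
SETUP_CYCLES = 36                 # "about 36 cycles", from the post above

def copy_cycles(num_bytes):
    """Fast-cycle equivalent of a DMA copy, rounded up."""
    master = num_bytes * MASTER_CLOCKS_PER_DMA_BYTE
    return -(-master // MASTER_CLOCKS_PER_FAST_CYCLE)  # ceiling division

def setup_share(num_bytes):
    """Fraction of the total time that goes to register setup."""
    return SETUP_CYCLES / (copy_cycles(num_bytes) + SETUP_CYCLES)

for size in (64, 128):
    print(size, copy_cycles(size), round(setup_share(size), 3))
```

Under these assumptions a 64-byte copy spends close to 30% of its time on setup, while a 128-byte copy spends a bit over 17%.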
Basically, since each slot is 32x16 instead of 16x16, instead of
Code:
set up dma to transfer one 16x16 tile
transfer 16x16 tile
set up dma to transfer one 16x16 tile
transfer 16x16 tile
it's
Code:
set up dma to transfer two 16x16 tiles
transfer 16x16 tile
transfer 16x16 tile
So you only have to set up DMA once per pair.
Yeah, but one 32x32 is...
Code:
set up dma to transfer four 16x16 tiles
transfer 16x16 tile
transfer 16x16 tile
transfer 16x16 tile
transfer 16x16 tile
Of course, you will also be doing
Code:
set up dma to transfer one 16x16 tile
transfer 16x16 tile
Which will about balance it out, so I have no clue how the new system is any more beneficial.
A 16x16 has two transfers: a 2-tile top half (64 bytes) and a 2-tile bottom half (64 bytes). A 32x16 also has two transfers: a 4-tile top half (128 bytes) and a 4-tile bottom half (128 bytes). Yet it transfers more data. This halves register setup overhead, which helps when register setup takes the same time as 24 bytes. Halving the number of transfers will more than halve the time spent on register setup during vblank, as it allows the register setups during active picture to cover more bytes.

Of course, the ideal situation for vblank time is to transfer a set of eight 16x16s at once, as that adds up to a single 1024-byte transfer. But that also adds more work during active picture, as the data must be copied to a transfer buffer, and if you're using HDMA for OPT rocking or a mode 7 floor, you can't use DMA copies to WRAM without crashing a 1/1/1 console.
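Here is a rough Python comparison of the three approaches for the same 1024 bytes, using the 24-byte setup figure from the post above. It is a naive count that charges every setup to vblank, so it does not model setups prepared during active picture or the cost of filling a transfer buffer. The names are made up for illustration.

```python
SETUP_BYTES = 24  # register setup cost in byte-times, figure from the post above

# (label, number of DMA transfers, bytes per transfer); each row moves 1024 bytes
STRATEGIES = [
    ("eight separate 16x16s", 16, 64),
    ("four 32x16 slots", 8, 128),
    ("one buffered 1024-byte copy", 1, 1024),
]

def vblank_cost(transfers, bytes_each, setup=SETUP_BYTES):
    """Total cost in byte-times: every transfer pays its data plus one setup."""
    return transfers * (bytes_each + setup)

for label, transfers, bytes_each in STRATEGIES:
    total = vblank_cost(transfers, bytes_each)
    data = transfers * bytes_each
    print(f"{label}: {total} byte-times, {total - data} of them setup")
```

With these numbers the setup share drops from 384 byte-times to 192 to 24, while the data share stays at 1024.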
Oh, right, I forgot that there's a gap between the top and bottom halves of a 16x16 tile in VRAM. So it's more like:
Code:
; transfer two 16x16 tiles individually
set up dma to transfer two 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer two 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer two 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer two 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
; transfer 32x16 slot
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
; transfer 32x32 slot
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
set up dma to transfer four 8x8 tiles
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
transfer 8x8 tile
In other words, you don't really gain anything from having 32x32 slots, since it takes just as much time as two 32x16 slots, while being more wasteful of space.
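The transfer counts above can be derived from the VRAM layout. This Python sketch assumes 4bpp tiles (32 bytes each) and the $0100-word step between tile rows that the uploader code earlier in the thread uses; slot_transfers is a made-up helper name.

```python
TILE_BYTES = 32           # one 8x8 tile at 4bpp (assumption)
ROW_STRIDE_WORDS = 0x100  # VRAM word offset between tile rows, as in the uploader code

def slot_transfers(width_px, height_px):
    """List of (VRAM word offset, byte count) DMA transfers for one slot."""
    tiles_wide = width_px // 8
    tiles_high = height_px // 8
    return [(row * ROW_STRIDE_WORDS, tiles_wide * TILE_BYTES)
            for row in range(tiles_high)]

for size in ((16, 16), (32, 16), (32, 32)):
    print(size, slot_transfers(*size))
```

Each tile row needs its own transfer, so a 32x32 slot costs four setups, the same as two 32x16 slots.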
Just another reason why the way tiles are laid out for sprites is dumb.
I would have said it was the only reason. The OBJ table format is nice for visualization; any sprite, regardless of size, looks the same in VRAM as it does on the screen. The trouble is that this format was designed for the Famicom, which used CHR-ROM and thus didn't have to worry about setting up the OBJ graphics with DMA at runtime...
Unless you're referring to the way you can only use 16 kB at a time. That's kinda unfortunate too...
The format is also bad in my opinion because it makes the sprite thing I devised more complicated. The format is definitely better suited to CHR-ROM systems, where you actually care about how tiles are arranged in ROM. Visualization doesn't matter in RAM, where everything is constantly changing, or at least that's my opinion.
One thing that could help is if the sprites are laid out in ROM so that for each metasprite, you would only need to set the source address once. I would've set it up like this, if only I knew that I was going to use the 32x32 and 16x16 slots method. Before that, I was using arbitrary sized rectangles, and the sprite frames were still left in the ROM in that format, with the 32x32 and 16x16 slot method being hacked on.
You guys might hate me for this, but ...
psycopathicteen wrote:
Today I just thought of a new animation scheme that fixes most of the issues I've had with previous techniques.
Use 8x8 and 16x16 sprites. Each VRAM slot would be the size of two 16x16 sprites. Metasprites can use as many slots as they want.
... huh, what?!
Am I the only one here who thinks that throwing out some, any random idea that crosses your mind and possibly (i.e., hopefully) makes sense within the context of your own codebase™ but totally fails to deliver as a starting post of a SNESdev thread (which I'd expect to be informative/talkative/questionable/inquiring/asking for help, or whatever meaningful content you might have in store for me) is just useless? Or even silly?
Seriously, what the heck???