I wonder why games like Super Mario RPG and Kirby Super Star needed the SA-1 chip for compression, when it doesn't look like they had more graphical data than the DKC series.
It's just a wild guess, but perhaps in DKC they don't decompress much graphics on the fly, so they can do it by software during forced VBlank, while in SA-1 they could decompress on the fly for any kind of VRAM updates?
It's also possible they used a different compression scheme that is harder and more complex to decompress.
Bregalad wrote:
It's just a wild guess, but perhaps in DKC they don't decompress much graphics on the fly, so they can do it by software during forced VBlank, while in SA-1 they could decompress on the fly for any kind of VRAM updates?
You can't do any kind of decompression during VBlank! VBlank is a really precious resource on the SNES, since it is the only time window when you can send data to VRAM for the next frame. Decompression has to use the rest of the frame's processing time.
psycopathicteen wrote:
I wonder why games like Super Mario RPG and Kirby Super Star needed the SA-1 chip for compression, when it doesn't look like they had more graphical data than the DKC series.
My guess is that they do some other kinds of calculations using the SA-1, like collision detection in isometric perspective (Super Mario RPG) or sprite scaling (Kirby); besides, the SA-1 has instructions dedicated to reading bitstreams (not "bytestreams" as usual) and to converting bitmap images to character-based images (like the ones used by the SNES PPU). All of these boost the SNES's ability to generate graphics.
For graphics, stuff like image scaling and variable width fonts take a little more processing power.
If VWF takes so much processing power, then how do RHDE and the Action 53 menu manage it so quickly on previous-gen hardware? The VWFs in KSS and SMRPG can't be more than a couple characters per frame, unlike the VWF engine that Blargg and I put together that sets a complete 128-pixel line of text in 10,000 6502 cycles. Has anyone profiled these games to see where they spend their CPU time?
Quote:
You can't do any kind of decompression during VBlank!
I didn't intend to. You can decompress graphics "on the fly" without doing it during VBlank - you just need to do it anywhere in the frame.
Some decompression algorithms can be extremely fast, barely slower than reading data.
Bregalad wrote:
Some decompression algorithms can be extremely fast, barely slower than reading data.
RLE for example can be even faster some times, since repeating a value is faster than loading a new one.
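To make the RLE point concrete, here is a minimal sketch of a run-length decoder. The (count, value) pair format is an assumption made for illustration, not the format used by any of the games discussed here. The inner step is just "write the same value N times", with no new source read per output byte.

```python
def rle_expand(data: bytes) -> bytes:
    """Expand a stream of (count, value) byte pairs (hypothetical format)."""
    out = bytearray()
    # Step through the input two bytes at a time; a trailing odd byte is ignored.
    for i in range(0, len(data) - 1, 2):
        count, value = data[i], data[i + 1]
        out.extend([value] * count)  # repeat one value, no further reads needed
    return bytes(out)


packed = bytes([3, 0xAA, 1, 0x55])
unpacked = rle_expand(packed)
```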
Yes, but RLE is not really useful for compressing graphics.
As silly as this sounds, byte pair encoding (which was originally meant to compress text) sometimes works well on graphics, and it is barely any slower than just copying data.
Static dictionary coding is also barely slower than just copying data.
The only drawback is that those algorithms don't work if the data is really random and uses all 256 possible byte values. But you'd be surprised how much data actually leaves a large range of values unused, leaving room to compress information.
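A rough sketch of what byte pair decoding looks like, to show why it stays close to a plain copy. The pair table and the choice of 0xFE/0xFF as unused byte values are made up for the example; a real encoder would pick whichever values the data leaves free.

```python
def bpe_expand(data: bytes, pairs: dict) -> bytes:
    """Expand byte-pair-encoded data.

    `pairs` maps an otherwise unused byte value to the two bytes it stands for.
    Either of those may itself be a pair code, so a small stack is used.
    """
    out = bytearray()
    for byte in data:
        stack = [byte]
        while stack:
            b = stack.pop()
            if b in pairs:
                left, right = pairs[b]
                stack.append(right)  # push right first so left comes out first
                stack.append(left)
            else:
                out.append(b)        # literal byte: a plain copy
    return bytes(out)


# Hypothetical table: 0xFE and 0xFF are assumed not to occur in the raw data.
PAIRS = {0xFE: (0x00, 0x00), 0xFF: (0xFE, 0xFE)}
packed = bytes([0xFF, 0x3C, 0xFE])   # 3 bytes in
unpacked = bpe_expand(packed, PAIRS)  # 7 bytes out
```

Literal bytes go straight to the output, and a pair code costs one table lookup, which is why the unused byte values matter so much.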
tepples wrote:
If VWF takes so much processing power, then how do RHDE and the Action 53 menu manage it so quickly on previous-gen hardware? The VWFs in KSS and SMRPG can't be more than a couple characters per frame, unlike the VWF engine that Blargg and I put together that sets a complete 128-pixel line of text in 10,000 6502 cycles. Has anyone profiled these games to see where they spend their CPU time?
I have, but it was a long time ago, and from what I remember Super Mario RPG makes great use of BWRAM.
VWF shouldn't take up much time at all. Unless the game code is brain dead and it's building whole lines of chars at a time instead of one at a time (like you normally should/would).
The SA-1 has specialized Bitmap-to-bitplane RAM. Therefore the SA-1 can handle VWF and scaling faster than SNES CPU. The Bitmap RAM is only accessible from the SA-1 side.
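For anyone wondering what bitmap-to-bitplane conversion involves when done in software, here is a sketch for one 8-pixel row at 2bpp. This only illustrates the data transformation; it says nothing about how the SA-1 exposes it, and the function name is made up.

```python
def row_to_bitplanes(pixels):
    """Convert 8 colour indices (0..3, leftmost first) into (plane0, plane1) bytes.

    In the planar tile format the leftmost pixel lands in bit 7 of each plane.
    """
    plane0 = 0
    plane1 = 0
    for p in pixels:
        plane0 = (plane0 << 1) | (p & 1)         # low bit of each pixel
        plane1 = (plane1 << 1) | ((p >> 1) & 1)  # high bit of each pixel
    return plane0, plane1


planes = row_to_bitplanes([3, 0, 1, 2, 0, 0, 0, 3])
```

Doing this per pixel on the main CPU means a lot of shifting and masking, which is the work the dedicated hardware takes over.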
I never knew VWF is something that is supposed to be CPU taxing. In this video
https://www.youtube.com/watch?v=hGmuaMoVO9I it looks like it only draws 1 or 2 letters per frame, with 2bpp tiles.
VWF is taxing when you're printing a whole freaking page of it at once, one line per frame (RHDE help, Action 53 menu). I think that's why Tetris 2 for NES prerenders the VWF on its copyright notice screen.
I just wrote a VWF routine and counted the cycles. It takes 50 cycles total to shift a 16-bit word (well, actually multiply) and OR it with a VWF buffer. Each letter is 16 pixels tall and 2bpp, so that makes 32x50=1600 cycles, not including the time it takes to rearrange the bytes into tiles and add the box background. Some extra wide letters, like M and W, would need a second round, so that adds up to 3200 cycles, or a little more than 5% of the CPU's time.
Code:
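-;
sep #$20 //2 - 8-bit accumulator
lda $0000,y //4 6 - load one glyph byte
sta $4203 //4 10 - write multiplier, starts the hardware multiply
rep #$20 //2 12 - 16-bit accumulator
lda {vwf_buffer},x //5 17 - current buffer word
nop //2 19 - wait for the multiply to finish
ora $4216 //5 24 - OR in the 16-bit product (the shifted glyph byte)
sta {vwf_buffer},x //5 29 - store back
iny //2 31 - next glyph byte
txa //2 33
clc //2 35
adc #$0020 //3 38 - advance buffer index by 32 bytes
tax //2 40
dec {temp} //8 48 - loop counter
bne - //3 51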
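Here is the same idea modelled in Python, as a sketch only: multiplying by a power of two stands in for the shift, and the product is ORed into the buffer word. The frame budget uses assumed round figures (about 3.58 MHz and 60 frames per second), so the percentage is only a ballpark.

```python
def shift_or(buffer_word: int, glyph_byte: int, shift: int) -> int:
    """OR an 8-bit glyph byte, shifted left by `shift` bits (0..7), into a 16-bit word."""
    factor = 1 << shift                       # power of two, fits an 8-bit multiplier
    product = (glyph_byte * factor) & 0xFFFF  # 8-bit x 8-bit product fits in 16 bits
    return buffer_word | product


# Cycle estimate from the post: 50 cycles per word, 16 rows x 2 bitplanes.
CYCLES_PER_WORD = 50
WORDS_PER_LETTER = 16 * 2
letter_cycles = CYCLES_PER_WORD * WORDS_PER_LETTER  # 1600
wide_letter_cycles = 2 * letter_cycles              # 3200, e.g. M and W

# Assumed figures, not measured: CPU clock and frame rate.
ASSUMED_CPU_HZ = 3_580_000
ASSUMED_FPS = 60
frame_share = wide_letter_cycles / (ASSUMED_CPU_HZ / ASSUMED_FPS)
```

With slower ROM access the CPU gets fewer cycles per frame, so the share would come out somewhat higher than this estimate.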
psycopathicteen wrote:
not including the time it takes to rearrange the bytes into tiles
You can skip this part actually, just arrange tiles vertically then horizontally, e.g.
Code:
0 2 4 6 8
1 3 5 7 9
Then you know that the distance between two consecutive horizontal spans is exactly 32 bytes, while two consecutive rows are just 2 bytes away. No need to convert from bitmap to tiles or anything like that, just render directly into the tiles as-is!
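The layout above can be checked with a small offset calculation. This sketch assumes 2bpp tiles (16 bytes each, 2 bytes per pixel row) and a text line two tiles tall; the function name is made up for the example.

```python
TILE_BYTES = 16  # one 8x8 tile at 2bpp: 8 rows x 2 bitplane bytes
TILES_TALL = 2   # a 16-pixel-tall line of text


def span_offset(column: int, y: int) -> int:
    """Byte offset of the 8-pixel span at tile column `column`, pixel row `y` (0..15)."""
    tile = column * TILES_TALL + y // 8  # tiles numbered down first, then across
    return tile * TILE_BYTES + (y % 8) * 2


step_down = span_offset(0, 1) - span_offset(0, 0)    # next row, same column
step_right = span_offset(1, 0) - span_offset(0, 0)   # same row, next column
```

Because the second tile of a column starts right where the first one ends, the offset simplifies to column * 32 + y * 2 for the whole 16-pixel height.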
Then you wouldn't be able to use 16-bit shifting.