I know I have started a topic about this before, but nothing good came out of it... =)
I'd like to ask you guys if anyone had any luck trying to compress pattern table data so that you can store it in CHR-RAM carts using less space?
I just tried out my latest idea for this, something based on quadtrees. I encode each bit plane separately, because they seem to have many more flat areas than when combined to make the 2bpp tile.
The pixels in each tile are checked, and if they are all equal, a flag indicating this is output, followed by the color. So a tile that uses only one color will be encoded in only 4 bits, 2 for each plane. But that rarely happens, of course.
If any of the pixels is different, a flag is output to indicate this, and the tile is divided into 4 4x4-pixel blocks, and each of them is processed the same way. These blocks will be divided into 2x2-pixel blocks if needed, and those will either share the same color or all 4 pixels are output.
The worst case would be when all squares need to be divided into smaller squares, because all the pixels will be there, and so will the flags that indicate that the blocks should be divided. In that case, the tile would be encoded in 149 bits, instead of the usual 128. Quite a bit of expansion there.
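For anyone curious, the scheme described above can be sketched roughly like this (my own illustrative Python, not the actual encoder; the bit-level output is simplified to a plain list of bits):

```python
# Rough sketch of the quadtree idea for one 8x8 bit plane.
# A flat block costs 2 bits (flag + shared color); a non-flat block
# costs a flag plus four recursions, down to 2x2 blocks whose pixels
# are emitted raw.

def encode_plane(plane):  # plane: 8x8 list of lists of 0/1
    bits = []

    def encode_block(x, y, size):
        block = [plane[y + dy][x + dx]
                 for dy in range(size) for dx in range(size)]
        if all(p == block[0] for p in block):
            bits.append(0)          # flag: flat block
            bits.append(block[0])   # the shared color bit
        elif size == 2:
            bits.append(1)          # flag: not flat, emit the 4 raw pixels
            bits.extend(block)
        else:
            bits.append(1)          # flag: split into 4 quadrants
            half = size // 2
            for qy in (0, half):
                for qx in (0, half):
                    encode_block(x + qx, y + qy, half)

    encode_block(0, 0, 8)
    return bits

# A completely flat plane costs only 2 bits:
flat = [[0] * 8 for _ in range(8)]
print(len(encode_plane(flat)))  # 2
```

With this exact flag layout a pixel-level checkerboard plane costs 85 bits (1 + 4 flags + 16 * 5), so the worst-case figure depends on how the flags are actually packed.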
I was not so happy with the results. SMB's tileset was compressed to 71% of its size. 8KB of some Mega Man tiles compressed about the same, 72%. So I guess the results don't get much better than this.
A really good compression ratio would be about 50% I guess... But is that a realistic goal for lossless compression of NES graphics? When I ZIP the SMB tileset I get a file about 1KB smaller than my own compression. When I ZIP my compressed file, nothing seems to happen, so that results in a bigger file than compressing the original data.
Just for fun, I also tested the algorithm on other types of data, such as JPEG images and NES code. All those tests resulted in some expansion of the original files.
Anyway, this method was my best idea so far, but I'm not really pleased with the results. Does anyone have any ideas on how to improve compression of tiles?
EDIT: Just to give you an idea of what my algorithm did, it turned this:
into this:
The compressed data was successfully decompressed back into the original data, the only problem is the sub-optimal compression ratio.
I was gonna use LZ77, with Mickael Pointier's FilePack, but I had some kind of bug in the decompression that corrupted the gfx a bit. Was kinda weird modifying it to do all the reading through $2007, heh. But I've stuck with RLE for going direct to VRAM, it's quick and easy.
From my Garage Cart project for example:
Solar Wars CHR
32,768 bytes originally (unused ones removed, many blank tiles)
15,248 bytes with RLE
13,191 bytes with LZ77
992 bytes squid.chr
736 bytes squid.rle
484 bytes squid.lz7
Looking at CHR data in a hex editor, it always seems like there'd be some really good way of compressing it. But it really depends on the tiles.
Memblers wrote:
Was kinda weird modifying it to do all the reading through $2007, heh.
Did you actually set the PPU address twice for each byte copied (once for reading it and once for writing it)? That is crazy!
Quote:
But I've stuck with RLE for going direct to VRAM, it's quick and easy.
Yeah, good old easy RLE. I'd never have thought it would work well with CHR data.
Quote:
15,248 bytes with RLE
This is impressive, but then again, there were the unused and blank ones you talked about.
Quote:
736 bytes squid.rle
484 bytes squid.lz7
The difference is quite big here. But is the difference worth the trouble of implementing more complex compression schemes? I don't know.
Quote:
Looking at CHR data in a hex editor, it always seems like there'd be some really good way of compressing it.
This is true.
I don't think LZ77 is a good option for the NES, mainly because of the linear nature of the PPU. On the other hand, if you are dealing with decompression at load time, speed is hardly an issue, so it'd be OK to set the PPU address a million times.
My decompression scheme needs a 16-byte buffer, to decompress one plane of the tiles at a time. But it's kinda complex when compared to RLE, and maybe the difference in size does not pay off.
Maybe I should try my compression against RLE, and see how that goes. But there has to be another option... I'll keep thinking about it.
I think the results are pretty good. A 70% ratio isn't awesome, but in some cases it'll still be significant.
Compression may or may not be worth it depending on how far you are from a power-of-2 overall size. Say, you may use compression if your game is just going to fit in a 128KB PRG-ROM and you don't want it to be 256KB. Or if your game is so big you'll do anything to make it fit a 256KB EPROM and not need a cartridge with 512KB, which often means a bigger mapper (MMC1 and UxROM aren't possible any longer, only SUROM, TGROM, etc... could do this).
If, on the other side, your game is something like 196KB in size and you know you'll never strip it down to 128KB, but you're wondering what data you could add to fill a total of 256KB while wasting as little space as possible on blank padding, don't bother using compression.
I just want to fit more graphics in my 256KB cart. Tiles used during the levels will not be compressed, because those can be modified during the course of the level, so they must be copied quickly (and always taking the same amount of time). There will be 64KB dedicated to those tiles.
But a 32KB chunk will hold compressed graphics, to be used in the title screen, cutscenes, and so on. Since these are typically bigger images, I'm not sure if I should use a totally different (maybe even lossy) compression scheme that works on larger blocks than a tile. Maybe including palette information and all that, like a standalone graphics file format.
With graphics only using 96KB of the space, there will be a lot of room for all the level maps and code. My level maps take another 96KB of the ROM. The rest is for code and any extra data it needs.
EDIT: You know what? Now that I think of it, the compression ratio I achieved is very good, considering that the algorithm only works with individual planes of tiles, making no use of global redundancy in the data. Tiles do not use data from other tiles for their compression. So the fact that, on average, each tile was reduced to 71% of its size, with no use of external data, is impressive.
So, if there is a way to combine this method with some way of using global redundancy, the compression ratio might grow significantly. Maybe some sort of delta coding of adjacent tiles (and planes), so that the difference between similar tiles results in a tile full of flat areas, that compress better. Maybe before encoding each tile, I could look for a similar tile already encoded/decoded, then point to that tile and encode the difference. I'll think about it today.
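To make the delta idea a bit more concrete, here's a rough sketch (the function name and the popcount similarity metric are my own choices, nothing final):

```python
# Sketch of the delta idea: before encoding an 8-byte bit plane, XOR
# it with the most similar already-seen plane; similar planes XOR to
# mostly zeros, which a flat-area coder handles well. Popcount of the
# XOR is used as the similarity metric here, one plausible choice.

def xor_delta(plane, previous_planes):
    best_ref = best_diff = best_ones = None
    for i, ref in enumerate(previous_planes):
        diff = bytes(a ^ b for a, b in zip(plane, ref))
        ones = sum(bin(b).count("1") for b in diff)
        if best_ones is None or ones < best_ones:
            best_ref, best_diff, best_ones = i, diff, ones
    plain_ones = sum(bin(b).count("1") for b in plane)
    if best_ones is not None and best_ones < plain_ones:
        return best_ref, best_diff   # store reference index + XOR diff
    return None, plane               # no useful reference: store as-is
```

The decoder would just XOR the stored diff with the referenced plane to get the original back.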
Anyone tried Huffman yet?
I, personally, am trying to avoid anything that uses a significant amount of RAM, as that is very scarce on the NES, and reserving a relatively large chunk of it for the decompression routine may be a big disadvantage to the main engine of most complex games. Plus, I don't think Huffman alone will do very well either.
I coded a compressor that gets me around 80% ratio... the decompression routines only use 32 bytes of RAM to work on each tile, though. It also allows seeking to a certain tile to start decompression from. It uses no official methods; I can provide the binary, source code, and asm, if anyone is interested. (I'd hate to waste my time if not.)
I know this isn't NES related, but does anyone know how to decompress Wario Land 2?
Matrixz wrote:
I coded a compressor that gets me around 80% ratio..
I always get confused when talking about compression ratios and percentages... when you say "80%" do you mean that the encoded data is 80% the size of the original? Or that the compression ratio is 80%, meaning that the encoded file is 20% of the original?
Quote:
the decompression routines only use 32 bytes of RAM to work each tile, though.
Mine is like that too, but I currently use only 16 bytes, since I work with each plane of the tile at a time. If I do implement the delta coding, I'll probably use 32 bytes of RAM too.
Quote:
It also allows seeking for a certain tile to start decompression from.
This can't be done with my compressed data, as it's all bit-oriented, and the idea was to have "packages" of tiles, with a byte indicating the number of tiles in the stream, followed by the stream of compressed tiles. One would have to know the address of each package to decompress all tiles in it.
Quote:
It uses no official methods, i can provide the binary, source code, asm, if anyone is interested. (I'd hate to waste my time if not.)
I think that a brief description of your method would be more interesting now. I'd like to know what kind of technique has the characteristics you described above.
tokumaru wrote:
I always get confused when talking about compression ratios and percentages... when you say "80%" do you mean that the encoded data is 80% the size of the original? Or that the compression ratio is 80%, meaning that the encoded file is 20% of the original?
It's 80% the size of the original.
Quote:
I think that a brief description of your method would be more interesting now. I'd like to know what kind of technique has the characteristics you described above.
First, it works byte-by-byte on the tile data instead of pixel-wise.
Also, the compressed data is packed together as values with different bit lengths.
I'll use an example tile:
FF FF FF 1F 00 00 00 00 1F 1F 1F 1F 00 00 00 00
Basically, each tile is divided into 3 components. Before all that, a 5-bit value tells how many bytes the compressed tile uses.
First, there's an area telling which bytes of the tile have 00's or FF's (those are kind of common). A 1-bit value at the start tells if this data is used at all.
I'll make up a loop thing to explain.
The cursor starts at the first tile byte.
loop start:
* A 4-bit value tells how many bytes from the cursor to skip before starting to output a series of 00 or FF. This value is added to the cursor.
* Next, a 4-bit value tells how many bytes make up the 00/FF series.
* Next, a 1-bit value tells if the series is of 00's or FF's.
* Set bytes in the tile memory to 00/FF, starting from the cursor.
* The size of the 00/FF series is added to the cursor.
* Then loop back to start. The loop is broken when the cursor is past the tile's 16 bytes.
Also, the decompression routine registers which bytes have been set to any value.
The 2nd area is the common area. A bit tells if the common area is used. If so, an 8-bit value tells the common byte (for the example this would be 1F, which makes up all the bytes except the 00's and FF's).
This component works similarly to the 00/FF area: it switches between skipping bytes, leaving them alone for the moment, and setting a run of bytes to the common byte, using 4-bit values to tell the distances.
The last area is the fill area. Any byte which hasn't been set by the two previous steps is put here. The whole array is now filled.
I'm not sure I managed to make complete sense... :/
Anyway, writing this gave me some new ideas.
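If I followed correctly, the 00/FF pass would look something like this. This is only my interpretation in Python, not Matrixz's actual code; the BitReader and the exact field order are assumptions, and only the skip/length/value loop follows the description above:

```python
# Sketch of the 00/FF pass as described: alternate between skipping
# bytes and writing a run of 00 or FF, using 4-bit skip and length
# fields and a 1-bit value flag, until the cursor passes 16 bytes.

class BitReader:
    def __init__(self, bits):      # bits: list of 0/1
        self.bits, self.pos = bits, 0
    def read(self, n):             # read n bits, MSB first
        v = 0
        for _ in range(n):
            v = (v << 1) | self.bits[self.pos]
            self.pos += 1
        return v

def decode_00ff_pass(r):
    tile = [None] * 16             # None = byte not set yet
    cursor = 0
    while cursor < 16:
        cursor += r.read(4)        # bytes to skip
        if cursor >= 16:           # safety stop (an assumption of mine)
            break
        length = r.read(4)         # length of the 00/FF series
        value = 0xFF if r.read(1) else 0x00
        for _ in range(length):
            if cursor < 16:
                tile[cursor] = value
            cursor += 1
    return tile
```

For the example tile, three series (3x FF, 4x 00, 4x 00, with skips of 0, 1 and 4 bytes) would reconstruct everything except the 1F bytes, which the common-byte area then handles.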
Yes, tokumaru, I think you pick mostly the right decisions for coding your graphics. Having a part not compressed and a part compressed is a good idea.
And I really think you should not use any lossy compression, because of the low resolution of the NES. That makes little sense. Your compression scheme seems pretty good (but I'm no expert), and this will allow you to fit 1.4 times the graphics you could fit uncompressed, which is quite nice, especially considering the very low cost in RAM. Using your delta thing may be even better, so I really think you've mastered the task here.
Umm... firstly, you should try "overlayed tiles". You know, each tile supports 4 colors. Technically, it's possible to overlay 3 letters, for example, using different "colors", since your font seems to be monochrome. Anyway, if memory serves, I saw this in a demo made by tepples...
Bregalad wrote:
Having a part not compressed and a part compressed is a good idea.
Yeah, whatever can be dynamically loaded during game play is not compressed, because those tiles have to be copied fast. I have given uncompressed tiles twice the space I reserved for compressed graphics, so it'd be nice to compress them really well, so that the difference in the number of tiles (when compared to the uncompressed ones) is not very big.
Quote:
And I really think you should not use any lossy compression, because of the low resolution of the NES. That makes little sense.
Yeah, I guess you are right. Even a few pixels out of place can look weird...
Quote:
Your compression scheme seems pretty good (but I'm no expert)
I'd say that a 30% reduction when working only with each individual bit plane indicates good compression. Can't wait to see the results when I try the delta coding!
About the RAM, this will always use 16 bytes of it for the buffer, even after I implement the delta coding. Those 16 bytes plus the few variables (I have to implement the decoding in 6502 asm to see how many bytes they'll use) are all the memory it'll use. Anyway, the variables will probably use more memory than the buffer!
Fx3 wrote:
Umm... firstly, you should try "overlayed tiles". You know, each tile supports 4 colors. Technically, it's possible to overlay 3 letters, as example, using different "colors"
I don't see how it'd be possible to have 3 different bitmaps with only 2 bit planes. After you draw something with color 1 (plane 0) and something with color 2 (plane 1), what you'll see using color 3 is the intersection of those two, and you can't really manipulate that without screwing up the other bitmaps.
Are you talking about that technique of using the tiles to hold two 1-bit bitmaps so that when you display them using different palettes you see one pattern or the other? That's not really useful for compression, as it can only represent monochrome graphics, and you have to waste a full palette to display only 2 colors.
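The intersection problem is easy to see with a quick sketch (my own illustration, using the standard 2bpp combination where plane 1 supplies the high bit of each pixel):

```python
# Why two 1-bit images can't be overlaid freely in one 2bpp tile:
# wherever plane 0 and plane 1 both have a bit set, the pixel comes
# out as color 3, not as "both color 1 and color 2".

def tile_colors(plane0_row, plane1_row):
    """Combine one row of each bit plane into 8 pixel color indices."""
    return [((plane1_row >> (7 - x)) & 1) << 1 | ((plane0_row >> (7 - x)) & 1)
            for x in range(8)]

# Endpoints drawn in plane 0, a full stroke drawn in plane 1:
print(tile_colors(0b10000001, 0b11111111))
# -> [3, 2, 2, 2, 2, 2, 2, 3]  (color 3 where the strokes overlap)
```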
With my compression scheme, if an empty plane is found (as happens with all the letters in that tile set), it's coded using only 2 bits, so I really see no reason to encode them any differently.
Quote:
since your font seems to be monochrome.
Not really my font, those are the tiles from Super Mario Bros. I used just for testing. =)
UPDATE:
I implemented the delta-coding thing, and the Super Mario Bros. tile set now compresses to about 64.5% of its original size. I think I was expecting more, but I guess it's still a good improvement. I don't know if I can improve it much further, but this is still a respectable amount of compression.
The good thing is that decompression is just a little more complex than before: an additional step is required to XOR the similar (already decoded) tile into the decompressed data, but that's not complicated at all.
Compression got a lot more complex (and slow), but that's not an issue since the only goal here is to have the data ready, so that a fast decompressor in the game can use it.
Wow, if you keep the same compression rate, you'll be able to fit 1.55 times the tiles you could before, which means over half again as much data. I think this is great; I don't understand why you don't seem satisfied with that.
Bregalad wrote:
I think this is great; I don't understand why you don't seem satisfied with that.
Hum... I guess it's because regular file compressors can compress the same data to about 60% of the size. I think I'd like to reach that mark too.
Anyway, I still have some tweaking to do. I'll experiment with other ways of transforming the tiles based on previous ones, maybe even implement multiple transforms so that the encoder picks the one that results in the best compression, PNG style. The bad part is the overhead of indicating the type of transform used, but the numbers will tell. The delta coding itself came with a lot of overhead, as each bit plane must indicate whether it was XOR'ed with a previous one, and if so, which one. But it still improved compression.
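The "try several transforms, keep the winner" idea could be sketched like this (hypothetical names; a real encoder would use the actual encoded bit count as the cost, not the stand-in used here):

```python
# PNG-filter-style transform selection: try each candidate transform
# on a plane, estimate the encoded cost of each result (stand-in
# cost: number of 1 bits), and keep the cheapest, at the price of a
# small transform-ID overhead per plane.

TRANSFORMS = [
    ("raw",      lambda plane, prev: plane),
    ("xor_prev", lambda plane, prev: bytes(a ^ b for a, b in zip(plane, prev))),
]

def cost(plane):                       # stand-in for real encoded size
    return sum(bin(b).count("1") for b in plane)

def pick_transform(plane, prev):
    name, fn = min(TRANSFORMS, key=lambda t: cost(t[1](plane, prev)))
    return name, fn(plane, prev)       # transform ID + data to encode

print(pick_transform(bytes([0xF0] * 8), bytes([0xF1] * 8))[0])  # xor_prev
```

More candidates (XOR with the plane two tiles back, the other plane of the same tile, etc.) would just be more entries in the list.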
Heh, it's fun to try this thing with other types of data... most things actually expand, but it did compress some text documents I had...!
That does seem really good. For comparison I tried FilePack on SMB1's CHR, and it reduced to 71% of the original size. Your newer method did better than that.
I hope you'll release a utility for people like me, who always use CHR-RAM.
Compression makes a huge difference when memory is limited. Hot Seat Harry used RLE compression, even for a data table. On Garage Cart #1, a lot of the intro (characters for the menu and Nerdtracker 2 + 3 songs) fit in the space reclaimed by compressing CHR. It's a 64kB ROM, and Solar Wars on there was originally 64kB by itself.
If I would've had better compression than RLE, I could've thrown more music on there.
Memblers wrote:
That does seem really good. For comparison I tried FilePack on SMB1's CHR, and it reduced to 71% of the original size. Your newer method did better than that.
I still have to test it with other data, SMB has only been my primary test material, but later today when I get the time I'll try other tile sets, larger ones, smaller ones, and so on.
Quote:
I hope you'll release a utility for people like me, who always use CHR-RAM.
Heh, you know, I have been programming for the web for so long that I kinda forgot how to code anything other than that (and the NES)! I've been trying this out with FreeBASIC, but the code is really messy and I don't know if I can turn it into a good standalone utility. I'll try, though. I had some trouble switching to object-oriented programming, and I guess I still have trouble with it sometimes.
I'm also still experimenting a bit to see if I can improve compression even more, and when I think I can't improve it anymore I'll decide the final format and code the final compressors and decompressors.
I also have to code the decompression routines in 6502 asm, and see how fast it goes. Manipulating bits is never a fast thing! =)
Quote:
Compression makes a huge difference when memory is limited. Hot Seat Harry used RLE compression, even for a data table. On Garage Cart #1, a lot of the intro (characters for the menu and Nerdtracker 2 + 3 songs) fit in the space reclaimed by compressing CHR. It's a 64kB ROM, and Solar Wars on there was originally 64kB by itself.
That is impressive! =)
You probably want to test your algorithm(s) on multiple tile sets. You don't want to optimize for one tile set only to make the algorithm less effective on the average case.
Yes, yes, of course. I just didn't have time to test with other tile sets after modifying the compression scheme. I'm not at home, and if I wanted other tile sets, I'd have to get a hex editor, get ROMs, and so on.
I'll go back home tonight, where I already have some tile sets ready for testing.
dvdmth wrote:
You probably want to test your algorithm(s) on multiple tile sets. You don't want to optimize for one tile set only to make the algorithm less effective on the average case.
If it were possible to make a compression scheme effective for a particular tileset, then if you were coding a game with compressed tiles, you'd definitely want a compression scheme particularly efficient for your game's tileset. I don't think it works that way, though.
NB: A fun thing is that I remember Tokumaru stating earlier that CHR-ROM was better than CHR-RAM, and I was stating the opposite. And now, Tokumaru has a project where he's using CHR-RAM, and I have a project where I use CHR-ROM. I think that's kind of ironic.
Bregalad wrote:
A fun thing is that I remember Tokumaru stating earlier that CHR-ROM was better than CHR-RAM, and I was stating the opposite. And now, Tokumaru has a project where he's using CHR-RAM, and I have a project where I use CHR-ROM. I think that's kind of ironic.
You are absolutely right! =D I remember that! Know what? I got really spoiled by CHR-RAM now! =D I guess I do take back some of the things I said back then!
Turns out that, by taking some of the time of the rendered frame, I can simulate CHR bankswitching, in small chunks of 16 tiles per frame. I'm using this to update the main character, doing some background animations and so on.
Yes, but this is almost cheating (i.e. Battletoads)
I really think the best is to take the best from CHR ROM and CHR RAM, because both have their ups and downs, depending on what you want to show on the screen.
Bregalad wrote:
tokumaru wrote:
Turns out that, by taking some of the time of the rendered frame, I can simulate CHR bankswitching, in small chunks of 16 tiles per frame. I'm using this to update the main character, doing some background animations and so on.
Yes, but this is almost cheating (i.e. Battletoads)
I really think the best is to take the best from CHR ROM and CHR RAM, because both have their ups and downs, depending on what you want to show on the screen.
It's not cheating. It's what just about every game for Genesis, Super NES, and Game Boy Advance does. Seriously, back up your Sonic Advance or Kirby: Nightmare in Dream Land Game Pak, load it up in VisualBoyAdvance, open the pattern table viewer (Tools > Tile Viewer), and turn on automatic refresh. Scroll down to 0x06010000 (sprite cel VRAM), make Sonic run and jump, and see what CHR changes.
This white paper was written near the beginning of the GBA homebrew scene.
I know a lot of non-NES games do this (in the end they don't even bother mapping tiles, they just map hard-coded tiles to hard-coded positions on the screen, and put whatever graphics they need for the game's protagonist(s) there). I noticed that at least Chrono Trigger, Golden Sun and Mega Man Battle Network do that (some of my all-time favourite games), so I wouldn't be surprised if that were the case for others as well.
However, Battletoads is the only one that was ever able to do this on the NES. Of course I exaggerate, and this is by no means cheating, but you have to admit doing this on the NES is insane, due to the slow CPU and lack of $2007 DMA.
I noticed Final Fantasy IV Advance goes a step further: in the menu, the map is just filled with contiguous tiles, and the whole tileset is modified to change the content of the screen (like a bitmap image). This reduces the response rate to scrolling menu commands, so they did it slightly differently in FF5 and FF6 Advance.
Bregalad wrote:
I noticed Final Fantasy IV Advance goes a step further: in the menu, the map is just filled with contiguous tiles, and the whole tileset is modified to change the content of the screen (like a bitmap image).
Like how Qix (NES), Qix (Game Boy), or Videomation (NES) works, right?
Well, I think so, and that's because they don't use a regular 8x8 font, but a smaller font (because the menu originally had to fit a larger screen on the SNES).
I think Moon Crystal updates CHR-RAM to show animation, considering how smoothly the characters move. I'll have to check.
EDIT: Strangely enough, it doesn't use CHR-RAM, but it does dynamically update the characters (sort of). I've noticed that there is a 32-tile area in the pattern table that changes depending on how the player character is moved, but does not update if the character continually moves in the same direction. So I'm guessing that the ROM is bankswitched depending on how the character is moved, and that that bank holds all of the animations needed for that movement (since the character is 8 tiles, there is enough for 4 still frames per movement).
Many games do this. I think most machines at the time had their patterns in RAM, and the NES was a rare exception that had many games with patterns stored in ROM.
It was only natural that programmers would find some free time to update the patterns during gameplay for more dynamic graphics.
CartCollector wrote:
EDIT: Strangely enough, it doesn't use CHR-RAM, but it does dynamically update the characters (sort of). I've noticed that there is a 32-tile area in the pattern table that changes depending on how the player character is moved, but does not update if the character continually moves in the same direction. So I'm guessing that the ROM is bankswitched depending on how the character is moved, and that that bank holds all of the animations needed for that movement
Just like Super Mario Bros. 3 and Kirby's Adventure, right?
This is also very, very common in NES games.
In MMC3 games using CHR-ROM, at least, because the CHR-ROM switching size of 1KB is small enough to allow one whole KB to be "wasted" on the main player sprites, with the 3 other KBs of sprite patterns free for the other objects on screen; then the player can be mapped in from any CHR-ROM bank.
Alternatively, they could have one whole 2KB bank switched for the player and the other 2KB for other objects, allowing one whole 2KB bank to be used for all animations of a given player (without bankswitching), and allowing different protagonists to appear on the same screen at different times.
Older games using CHR-ROM (a.k.a. MMC1 or CNROM) that would have done any of these techniques couldn't have used CHR-ROM without wasting a big number of banks, due to the whole 4KB/8KB switching.
Just out of curiosity I tried applying a delta codec to the smb_orig pattern data and got a file that was 11956 bytes (that is, about 18% of the original file, or 73% if compared to a plain 2-bit representation). It'd be interesting to try to implement the decoder in 6502 assembly sometime, since it's not very complex IIRC (no multiplication/division needed).
Speaking of compression, I've spent some time figuring out the GFX compression in 'BEE 52'. Damn, that's pretty hard - Codemasters are geniuses of course, so I screwed up a little
Anyway - what compression format does this game use? Dunno, maybe this will help - at least you'll learn about addresses...
Use a ROM corrupter to find where the data is stored in the file, then use breakpoints to find when the data is read and when its decompressed version is written. Log what bits/bytes are read, and what bytes are written as a result.
Hmm, I don't think that will give results... Maybe it would work on an easy compression scheme, but this algorithm is extremely complex. The output stream becomes some kind of encryption or something... Reversing is the only way, I think.
Bump! Solution to the puzzle posted here. It was quite complex to figure out...