I have an idea on how to compress level data on the NES that I believe could achieve 50% or more compression. In Ecco the Dolphin for PC, the game is made up of tiles that like the NES are 8x8 in size, the levels on the other hand are made up of large blocks made up of 8 tiles, if I were to use a screen made up of blocks instead of tiles, than a single nametable could be filled up with level data that is in total 240 bytes, instead of the normal 960 bytes, technically that means I compressed the level by 75% but you got to account that I may have 50-100 blocks each 16 bytes (80-160 bytes total).
I want to implement this but at the moment I can't see any flaws.
I guess my question is, are there better ways of compressing level data, keep in mind I'm implementing a multi-directional scroll.
Errrr....
There is nothing new here. 99% of NES games in existence uses metatiles...
I figured that would be the case but would/could metatiles achieve 50% compression or more if used in that manner?
The tech that you just described is used everywhere, including most of NES games. Compression ratio depends from size and number of the metatiles, generally it gives good ratios for medium size blocks (typically 4x4 tiles) that reused all the time. If you have many unique blocks, this won't work well.
Ah, interesting! I'm trying to get a multi-directional scroll off the ground, but before I go jumping into making it work, I figured I should come up with a way to shrink the data down as much as possible. I guess what I'm wondering is if it could work better than lz77 on raw data, or is it about the same.
"Metatile" is the name for any kind of map structure that consists of multiple physical 8x8 tiles. The most common size is 16x16 pixels (2x2 physical tiles), mainly because that's also how the attributes work. Some games (Battletoads) use metatiles consisting of 4x4 physical tiles.
Whenever I say "tile", I'm usually referring to the 16x16 pixel squares that people also call Metatiles.
Map compression techniques used in games vary. Some games use RLE (Run-Length encoding) for the maps (Dragon Quest series, Zelda 2, etc). Other games place variable-sized 'objects' that expand to multiple tiles and can be positioned freely (Super Mario Bros series, Zelda 2, Metroid, etc.)
Some games use larger structures of tiles, like 2x2 (mega man 3), 4x4, 8x8, 16x14 (single full screens that can be reused as you scroll the map around), 16x16, etc.
Many games just simply use uncompressed tiles placed in ROM.
Some games use something homegrown and original. Lolo uses a scheme where it does RLE-like runs of whatever tile is above the current tile, effectively making it LZ77 with a fixed offset.
Some games don't use metatiles at all. Like racing games, pinball games, or other kinds of games where there are no tiled obstacles to interact with.
lz77 is difficult to use for a free directional scroll. It may work for one-way scroll, but you would need a modification that refers to packed data instead of depacked, you can't store all depacked data because NES does not have enough RAM for this.
I think the reason I didn't hear much about metatiles in NES games is either the folks here had more efficient methods, or didn't use it but may have been talked about once.
I've talked with the creators of Super Bat Puncher and Nomolos, one uses lz77 while the other uses no compression, I was gonna use no compression myself but than I realized how huge a single screen of level data was, and even though I'll have space for it, I don't want to write myself into a corner later. I'm working on a multi-directional scroll right now and I was thinking before I really start working on it, I should figure out a way to compress this data down. After what I heard here, it seems it will be well worth it, glad to hear it's common! I came to the right place
Shiru: Wow that does help to know, it was one of the options I was considering but I can already tell it won't be effective decompressing 4x4 tiles.
I'm making a multidirectional scroller like you plan to.
Disclaimer: I am only using test tiles with slopes right now, so they allow me to reuse things much easier than actual graphics would. (The test tiles look sort of like this:
http://tcrf.net/File:StriderArcTestTileset.png)
My levels contain a 3 byte header and a list of 256x256 screens which are made of an RLE compressed list of 32x32 metatiles which are made of four 16x16 metatiles which are made of 4 8x8 metatiles and a collision byte.
I have 5 levels.
Level 0 is 15x9 (135) screens.
Levels 1-3 are 2x2.
Level 4 is 5x2.
All of them use the same screenset, 32x32set, and 16x16 set.
Currently I'm using 82 unique screens, which take an average of 29 bytes each. Obviously 135+4+4+4+10 < 82, so I am reusing screens but not as excessively as it may appear. For instance, the sky screen which is just filled with blue is used for the entire top row of level 4. I have a similar solid screen that takes up quite a few parts of Level 0. I try not to reuse "gameplay" screens.
Because I wanted RLE without using an extra byte for tiles that aren't compressed, I reserved the high bit so there can only be 128 32x32 tiles. A screen with all the same tile (like the sky screen) is two bytes, but this is obviously very rare.
A level just takes as many bytes as screens it's made out of+3 for the header. Every used 16x16 tile is 5 bytes (one extra because of a reference to collision data), every 32x32 is 4, and screens are variable.
All the levels with all the sets takes up 2998 bytes. It breaks down like this:
(Map Editor Output)
Code:
Export Started:
Working on bank 0...
Found level 0_0...
Looking for levels that use the same screenset...
Exporting 16 set 0 to bank 0 as set 0...
Size was 235, Bank 0 is now 235
Exporting 32 set 0 to bank 0 as set 0...
Size was 248, Bank 0 is now 483
Exporting screen set 0 to bank 0 as set 0...
Size was 2343, Bank 0 is now 2826
Exporting level 0_0 to bank 0 as level 0...
Size was 138, Bank 0 is now 2964
Exporting level 0_1 to bank 0 as level 1...
Size was 7, Bank 0 is now 2971
Exporting level 0_2 to bank 0 as level 2...
Size was 7, Bank 0 is now 2978
Exporting level 0_3 to bank 0 as level 3...
Size was 7, Bank 0 is now 2985
Exporting level 0_4 to bank 0 as level 4...
Size was 13, Bank 0 is now 2998
Looking for levels that use the same 32set...
Looking for levels that use the same 16set...
Working on bank 1...
No level uses bank 1.
If I also RLE'd the list of screens in the levels, I could probably nickel and dime like... 16 bytes, which isn't worth the extra decoding time. In some ways I'm worried, because actual graphics may raise the average screen size and this is still with no enemies. I also have sets for other things that my collision byte refers to, but that stuff is actually pretty small. (And unnecessary if you're not using slopes.)
In any case, those are some hard numbers for you. Hope they're useful. I'd post pictures of the maps too, but... my map editor actually can't do that yet and I ain't posting a rom anytime soon.
Edit: Heh. I actually thought to do the math for (mostly) uncompressed.
82 unique screens * 1024 (my screens are 256x256 not 256x240) = 83968.
2343(size of my screens alone)/83968*100 = 2.79% I guess the savings are pretty significant and will only get larger with more unique screens.
edit: Well, there's an error in the math, since the screens obviously rely on the 16x16 sets and 32x32 sets which take up space. So it's more like 3.5%.
Arkonviox wrote:
I've talked with the creators of Super Bat Puncher and Nomolos, one uses lz77 while the other uses no compression
I'm pretty sure Nomolos uses metatiles though. Super Bat Puncher can afford to use lz77 (on top of metatiles), because it has extra 8KB of WRAM on board where it can decompress the map.
The Super Mario games use object compression, which has some of the advantages of LZ77 while retaining multidirectional access. Each screen is made up of objects, such as a platform of length L whose top left corner is at (x, y), and whenever the scroll reaches a row or column, it's checked against the objects that could overlap it. It's like the difference between Illustrator or Inkscape on the one hand and Photoshop or GIMP on the other. Play with Lunar Magic for a while to get an idea of how objects in SMW work.
240 bytes per screen is still pretty huge for platformers or RPGs, which typically have very large maps and require thousands of unique screens. A common solution is to use larger blocks, so that each screens uses even less space. Of course this increases the space required by the blocks themselves, but you can still subdivide the blocks into smaller blocks to minimize that. Several levels of blocks obviously result in level maps that are more complex to decode, but the compression is worth it.
Another common way to minimize the space required by level maps is to make them object-based, like in SMB1, so that there's a direct relation between the complexity and the size of each screen (vs. the other way where all screens occupy the same amount of bytes).
Common compression schemes such as RLE and LZ77/LZSS/LZW typically require a lot of RAM (because you can't start decoding at any point of the stream, so larger blocks of data must be processed at a time), so they are more common in games that have extra RAM on the cart.
I think I like the method you guys have mentioned super mario using, object based, I could see it being very effective as I could easily expand it later if I needed to.
Kasumi: That's a pretty awesome scheme you got there, I will probably explore such methods down the road, but it's very interesting to see what you did there.
thefox: I think you may be right, he mentioned to be he used no compression but I can't see how he didn't employ metatiles, I've put together two nametables worth of data already just to test a simple scroll test and already I think something needs to be done before I proceed with the engine design I was thinking about.
Ultima 6 for MS-DOS used 8x8 tile blocks (Chunks) for its maps, and an overworld map of 128x128 chunks. Up to 1024 chunks were allowed, so that's 10 bits to select each chunk. Underworlds were 64x64, and there were four of them. So we effectively have two 128x128 maps, and 1024 definitions for the chunks themselves. 40960 bytes for "maps", and 65536 bytes for the "chunks". The maps and chunks themselves were not compressed at all.
This game had a very large world map, and it fit in 104k of storage.
Of course, Ultima 6 also had additional layers of objects that went on top of the map.
I'm just trying to say don't rule out maps built out of larger blocks of tiles. The Sonic the Hedgehog games also used this structure. Sonic 1 used screen sized (256x256 pixel) sized blocks, and Sonic 3 used 128x128 pixel-sized blocks.
This is my idea of a level-data format:
Overhead bytes: nnnniiii
i - object id
n - number of objects
That's one byte for each unique object in the level or screen. i can also index an object set for that area instead of all objects in the game.
Position bytes: yyyyxxxx; If 5 or 6 bits are needed, store the extra bits for several objects in a row in the same byte. That way 8 objects can be stored in 10 bytes using 5-bit position or 4 objects for 6-bit.
Everyone here's given me a lot of ideas to try, it's gonna take time to process (especially since I been having insomnia for the past 3 days and am ready to shut down).
As for game development, the NES is new to me, I learned 6502 pretty quick, and have made a game demo before in HTML5
http://www.arkonviox.net which I ported to flash, but I felt I needed both a challenge and something to show for my effort, and after reading up on cheetahmen I thought to myself, if they can do it, I can do it. So far it's been pretty easy but I could use all the optimization tips I can get, especially since the NES's memory is so limited and assembly is both a source of power and a nasty beast.
strat: it's you mentioned sonic, ecco the dolphin uses 128x128 metatiles, which break down further and further.
I personally don't consider using metatiles as a form of compression, because your level IS made of mtatiles.
It is however common to use compression on a metatile-coded maps, because even if you use for example 16x16 pixel metatiles, a totally uncompressed screen is still 16x15 = 240 bytes which is still a lot.
You can use repetition of the same tiles as well as probabilities to get a high compression ratio on maps, but this is a complex subject.
Does anyone else have some numbers from their own compression scheme? I'm sort of curious how other methods do.
Also, I realize I can use the two free bits in an RLE length byte (which has a max value of 63) to possibly do some more tricky things, but I'll have to sleep on that.
For my Chu Chu Rocket compression scheme, I'm packing 125 maps (12x9 mazes with objects) into about 7800 bytes. About 62 bytes per map on average. The text for the level name is separate, currently uncompressed ASCII, and takes up a significant amount of space compared to the level data itself.
Mazes are simply bit-packed, two bits (top and left wall) for each tile. That's a fixed size of 27 bytes. Objects are RLE compressed, dividing bits differently depending on what kind of tile it is, but each run of objects fits into a byte. CD byte ends a level.
Originally made this scheme for the TI83 version a long time ago, then re-used it for the NES version.
One of RLE's biggest foes is a checkerboard pattern, where you place things every other tile. So in my version of RLE used there, I use a bit to determine whether to advance one square or two squares.
Currently investigating compression and large maps.
I'm experimenting with 4096x4096 pixel maps (eg: Dragon Warrior world maps).
Messing around on paper and in Photoshop just looking at things.
First recognizing that I only need 6 bits per metatile (64 metatiles in CHR-ROM bank at a time), or 5 bits (Dragon Warrior only uses half the active BG CHR-ROM tileset for world tiles, other half for text/dialogs).
Next is playing around with RLE length bits. I picked a random row of 1 x 256 metatiles of 16x16 pixel size spanning the world in a average moderately complex layout (single tile runs like crossing multiple continents, rivers, a town, moats, etc, not just all water and mountains).
Too low like 3 bits len + 5 bits value and simple 1 byte per run reads I got 40 something bytes but there was alot of:
8 water
8 water
8 water
8 water
due to the 3 bits for length.
Changing to 6 bits length, 6 bits value gives 1.5 bytes RLE block AND allows for full CHR-ROM access instead of half, and I got it to around 36 bytes for the same 1 x 256 metatile run. Typical compression trade off of more space wasted for the smallest cases to save more space in the largest case. At minimum I want my RLE blocks to be nibble aligned for rapid decompression with fixed length shifts/masks, etc, for fast real time decompression.
So with just basic RLE I'm managing about 30-40 bytes per 256 metatiles, or 1.07 screens worth of data.
~40 bytes vs 960 per screen = ~4 % of uncompressed size.
Shooting to have 256 x 256 of metatile data (64k) compressed to about 8k for an entire world map consisting of ~273 screens. Thats going on the average, because there will be strips of just ocean that will take 6 bytes, and there are probably more complex strips that will take more than 60+ bytes to cancel it out.
I haven't written code yet to run a compression of the whole map, but I'm estimating it will be less than 8k for something like a RPG world map where there are more regions of mountains/ocean that will dwarf the rows with more details.
Decompression should be fast enough to do on the fly from ROM straight to 15-16 metatile rows/columns to CIRAM. Just running sums until you hit the run that cooresponds to the location of the tile. Worse case is vertical columns, where you have to unRLE 15 rows, but even then its just a tight loop summing 10-20 times per row. 1-2 scanlines of draw time per single tile when updating a 15x column seems acceptable. If not, I'll just break the RLE into quadrants or implement some kind of skip lists to jump into the right ballpark to save some time.
Another worse case is each of the 256 metatiles are non repeating from the previous, and youd end up with 384 bytes of RLE data, but I'll call it a "level design restriction" of not doing this excessively and the "level editor" will warn "too much unique data" and not allow any RLE row to be more than 256 bytes, for the sole purpose of keeping the decompression operation manageble with 8 bit instructions to keep the decompression loop small, tight, and fast.
Any time a new row/column is needed, you either unRLE the one row from the start until you get the 16 tiles you need, or unRLE 15 rows from the start until you get each single tile you need for a column. Only RAM required is temp vars for the running sums, etc. The tile index would be pulled out of the RLE data, and used to immediately compute 2 real tile indices and write to CIRAM.
This works well for most any adventure games, and platformers in most cases, but not for any kind of grid/puzzle games where you have repeating alternating patterns.
One thing you might want to try is unary coded length and 6 bits value, or gamma coded length and 6 bits value. That should essentially always save space.
If metatiles repeat or share physical tiles, you may be able to fit more than 64 distinct metatiles in 256 physical tiles.
The way Dragon Warrior 2, 3, and 4 encoded maps:
xxxxx... length
.....xxx tile number
(not 100% sure of the order of the bits, I'd need to look that up)
If tile number is 0, instead use the length as the tile number of a single tile. Using a different set of tile numbers than the others.
tepples wrote:
If metatiles repeat or share physical tiles, you may be able to fit more than 64 distinct metatiles in 256 physical tiles.
I thought about this.
But that's a matter of just having a lookup table that maps a metatile to 4 physical tiles. So whether it's fixed (eg: metatile 0 = phys tile {0, 1, 2, 3} or {0, 1, 16, 17} and using logic/arithmetic instructions to compute y<<4+x and y<<4+x+1, or using 2-4 ldas from a metatile to physical tile mapping table, it's going to be approximately similar in the code size/execution time.
But this didn't have anything to do with the metatile map compression at all, which is my primary focus. Just the series of instructions between the decompressed metatile number and how that translates to 2-4 CIRAM writes.
Well, pehaps it's time to make some publicity for my recently released
CompressToolsI know it feels unfinished because one the last algorithm (arbitrary Static Dictionary) is so slow it'll hang up with most data, but it is still a very powerful tool so see which schemes works best to compress X data.
Compression is a domain where it's impossible to predict if X or Y will work best without trying, the only way is to compress with X, compress with Y and tie your conclusions.
Dwedit wrote:
The way Dragon Warrior 2, 3, and 4 encoded maps:
xxxxx... length
.....xxx tile number
(not 100% sure of the order of the bits, I'd need to look that up)
If tile number is 0, instead use the length as the tile number of a single tile. Using a different set of tile numbers than the others.
Makes perfect sense. Your 7 tiles would be the most common: water, mountain, hill, field, swamp, desert, ice, while 0 gives you up to 32 tiles for things that are unique anyway like towns, parts of castles, bridges, rivers, caves, numerous shoreline orientations, etc.
I picked Dragon Warrior games as my baseline due to its large 2D world maps which are much much larger than your typical 1D 6-8 screen platformer levels. And it helps that I just completed a playthrough of DW II this weekend and the complete 4096x4096 world map images are easy to find for test data.
The DW games smooth the shoreline after it's decompressed.
Dwedit wrote:
The DW games smooth the shoreline after it's decompressed.
Makes sense. The compressed data refers to metatiles. Keeping metatiles of all possible shore combinations would consume a large amount of CHR-ROM with duplicate ocean pixel data. Keeping the shoreline outside of the compressed map data means you can have the bare minimum 4 unique 8x8 physical tiles for the shorelines. And it's data you don't have to store anywhere; it can be computed on the fly any time you update a row/column of tiles and see a water tile next to a non water tile.
Otherwise there is no loss in compression efficiency since it would only have been a single byte single tile run in the RLE stream, and it occurs at a transition that breaks a contiguous RLE span anyway.
Interesting, I'm going to have to run it at 1% frame rate and watch as this happens, both during a single row/col update while scrolling, and during an entire screen rebuild after a battle/town. Curious if it does it on the fly as it converts metatile indices to physical tile numbers when writing to CIRAM or if it performs it as a separate pass since it would be more difficult to detect transitions in the vertical direction while decompressing RLE horizontally.
I suppose if you buffer the new row/col in RAM you could program the PPU address registers to read out the row/col adjacent to the one decompressing (n) and perform continuous single LDA CIRAM reads on the tiles next to the ones you decompressing (n-1). Then while decompressing a row/col: a) if the current decompressed tile to be written XOR the current CIRAM tile is water, it's a transition, amend the metatile to tile conversion to contain shore tiles, and b) track the same thing when starting/ending water runs in the horizonal direction. You have to map a metatile index to 2 or 4 CIRAM nametable tiles to begin with (along with 1 attribyte byte), so it would be pretty straight forward to look for transitions and inject the appropriate shore tiles in flight.Or I'm overthinking as always... decompression of a row is the trivial case, you can detect transitions as you decompress and inject the appropriate physical indices before writing to CIRAM. For columns, just keep the n-1 tile in the column from the previous decompression and compare the to current tile n, inject shoreline physical tiles as needed.
I think it just makes it a different metatile based on whether or not the surrounding tiles are water/bridges/shoals.
And more metatiles doesn't necessarily mean duplicate tiles, since metatiles can share physical tiles.
Dwedit wrote:
I think it just makes it a different metatile based on whether or not the surrounding tiles are water/bridges/shoals.
And more metatiles doesn't necessarily mean duplicate tiles, since metatiles can share physical tiles.
True if you have a mapping table. For simplicity in my current train of through I'm hard coding metatiles to physical tiles the same way OAM does (metatile 0 = physical tiles {0, 1, 16, 17}
I'm just not seeing why move the shorelines out of the RLE. Unless all those 1 byte entries really really add up by themselves that a block of code to check and inject the shorelines really saved a few more kb. We are talking 65k metatiles after all, and the shorelines probably do take up the most data since they far outnumber the other incompressible single instances like caves and bridges.
Ok so got home and looked at the map again.
Two types of data, compressible runs, and 1 offs. Compressible data like "63 water, 13 tree, 16 mountain, 13 water" compresses to almost nothing.
Leaves 1 off data to worry about.
Of the 1 off data, things like towns, caves, bridges are so infrequent so as to not even matter. Rough terrain with lots of breaks and things like rivers also doesn't occur terribly often as naturally IRL terrain tends to be localized.
The shorelines however are a significant source of 1 off tiles in every single row, potentially thousands.
So basically just saved a few more kilobytes by having code check for adjacent tiles to be 1 land 1 water and inject a shoreline tile in flight.
Could mean the difference between the world map nametable going from 64k to 8.1k vs 6.6k and fitting the entire world map into a single 8k bank frame along with it's other supporting data like metatile to tile tables, attributes, and other map related vars other than nametable data.