Well, here's the deal. Not long ago, a demo for the Mega Drive was released that effectively outputs a high-color (512 colors!) linear bitmap on screen. It's pretty stable, too; the only serious issue is that VRAM refresh cycles cause some columns to be wider. Well, that and the fact that the massive DMA eats up practically all CPU time and needs more memory than is available in RAM, but it's still useful when combined with the Mega CD (which even allows hardware-based double buffering with this method, hah).
Here is the demo, if anybody cares (run on real hardware, no emulator will get this right at all).
For those curious, the demo disables the display and then performs a massive DMA to the background color, overwriting the color being shown on screen as it's being output (which results in something akin to a bitmap). Yes, this also means even the border area is affected... so you could easily output a widescreen image with this method too. That's pretty darn overkill for a fourth-generation console =P
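For readers who don't know the Mega Drive side: each CRAM entry is a 9-bit colour stored in a 16-bit word laid out as `----BBB-GGG-RRR-`, which is where the 512-colour figure comes from. A minimal sketch of packing a 3-bit-per-channel colour into that word (the helper name is made up for illustration):

```python
def md_cram_word(r: int, g: int, b: int) -> int:
    """Pack 3-bit R, G, B levels (0-7 each) into a Mega Drive CRAM word: ----BBB-GGG-RRR-."""
    for c in (r, g, b):
        if not 0 <= c <= 7:
            raise ValueError("each component must be 0-7")
    return (b << 9) | (g << 5) | (r << 1)


# 8 levels per component -> 8 * 8 * 8 = 512 distinct colours.
all_words = {md_cram_word(r, g, b) for r in range(8) for g in range(8) for b in range(8)}
print(len(all_words), hex(md_cram_word(7, 7, 7)))  # 512 0xeee
```

A stream of such words is what the demo's DMA pushes into the background colour entry, one word per displayed "pixel".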
Now, the SNES is generally known for being able to output more colors, so getting something similar there should be easy, right? ...or so I thought. Even after reading through Nintendo's docs, I can't come up with a way to do it. I've never really programmed for the SNES, so I'm most likely missing a lot (help filling in the gaps?), but whatever, here's what I came up with and why each option wouldn't make a suitable replacement:
DMA: the most obvious trick would be to do exactly the same thing, right? Except that DMA on the SNES is limited to 64KB (as opposed to the 128KB of the Mega Drive), which means only half the amount of pixels (it may be possible to work around this by using two DMA channels, but no idea how well that would work). Moreover, it's byte-based instead of word-based, and I have no idea whether CRAM entries are updated only when both bytes are overwritten or whether they change immediately (in the latter case you have even more issues, as every half pixel would have the wrong color).
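On the SNES side, a CGRAM entry is a 15-bit colour in `0BBBBBGGGGGRRRRR` order, written through $2122 as two bytes, low byte first. CGRAM is commonly documented as latching the first byte and committing the whole word on the second write, which would answer the half-pixel worry, but that is worth confirming on hardware. The sketch below (helper name hypothetical) shows the byte pair a byte-based DMA would have to feed per colour, and the pixel counts behind "half the amount of pixels", assuming 2 bytes per colour on both machines:

```python
def snes_cgram_bytes(r: int, g: int, b: int) -> tuple[int, int]:
    """Return the (low, high) bytes written to $2122 for a 5-bit-per-channel colour."""
    for c in (r, g, b):
        if not 0 <= c <= 31:
            raise ValueError("each component must be 0-31")
    word = (b << 10) | (g << 5) | r     # 0BBBBBGGGGGRRRRR
    return word & 0xFF, word >> 8       # low byte first, then high byte


BYTES_PER_COLOUR = 2
snes_pixels = 64 * 1024 // BYTES_PER_COLOUR   # one 64KB SNES DMA
md_pixels = 128 * 1024 // BYTES_PER_COLOUR    # one 128KB Mega Drive DMA
print(snes_cgram_bytes(31, 0, 0), snes_pixels, md_pixels)  # (31, 0) 32768 65536
```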
H-DMA: why not just split it into multiple DMAs and let the hardware do the synchronization for us while we're at it? Sounds like the best thing we could do, right? Except H-DMA is limited to 1-, 2- or 4-byte transfers per line. Definitely not what we need here, so H-DMA is out of the question.
CG Direct: I should have looked at this earlier, the SNES has its own way to output high-color graphics! Up to 2048 colors with this method. Sadly, it doesn't seem to be well documented at all. How is this meant to work? Also, the bottom bits of the R, G and B values are taken from the palette. Does this even have pixel granularity? We'd like to be able to manipulate every RGB value individually in every pixel of the image. VRAM usage may also be an issue.
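A sketch of how direct colour is usually described for 8bpp backgrounds, assuming this layout: the pixel byte is read as `BBGGGRRR` instead of a palette index, and the three palette bits of the tilemap entry (`bgr`) supply one extra low bit per component, giving `0 BBb00 GGGg0 RRRr0`. That is 8 + 3 = 11 bits, matching the 2048-colour figure. The function name is hypothetical.

```python
def direct_colour(pixel: int, palette_bits: int = 0) -> int:
    """Expand an 8bpp direct-colour pixel (BBGGGRRR) plus tilemap palette bits (bgr)
    into a 15-bit SNES colour word 0BBBBBGGGGGRRRRR (assumed layout 0 BBb00 GGGg0 RRRr0)."""
    if not 0 <= pixel <= 0xFF or not 0 <= palette_bits <= 7:
        raise ValueError("pixel is 8 bits, palette_bits is 3 bits")
    r3, g3, b2 = pixel & 7, (pixel >> 3) & 7, (pixel >> 6) & 3
    r_lo, g_lo, b_lo = palette_bits & 1, (palette_bits >> 1) & 1, (palette_bits >> 2) & 1
    r5 = (r3 << 2) | (r_lo << 1)
    g5 = (g3 << 2) | (g_lo << 1)
    b5 = (b2 << 3) | (b_lo << 2)
    return (b5 << 10) | (g5 << 5) | r5


colours = {direct_colour(p, pal) for p in range(256) for pal in range(8)}
print(len(colours))  # 2048
```

Under this layout the top bits have pixel granularity, while the extra low bits come from the tilemap entry and so only change per tile.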
Blending: another idea was to use the blending hardware (in additive mode) to merge two colors into one and get a more colorful output. Sadly, palette limitations kick in... The only mode with a large palette (256 colors) is mode 7, which only has one tilemap (we need two) and a limited number of tiles. The other modes offer at most 16 colors per tile, and we can only mix two layers, so we can get at most 256 colors... at which point mode 7 seems more useful (this may be useful for YUV or YCrCb though!). Again, VRAM usage may also be an issue. I suppose you could also try to mix mode 7 with the sprite layer... not sure how well that would work though.
I'm running out of ideas. Does anybody else know what other options would be out there? Did I get any of the technical facts wrong?
Sik wrote:
DMA: the most obvious trick would be to do exactly the same thing, right? Except that DMA on the SNES is limited to 64KB (as opposed to the 128KB of the Mega Drive), which means only half the amount of pixels (it may be possible to work around this by using two DMA channels, but no idea how well that would work).
You can transfer more than 64KBytes using multiple DMA channels; e.g. for 128KBytes you could use channels 0 and 1. There are a total of 8 channels available to use. The CPU will block (wait) until all selected DMA transfers are done (register $420B selects which channels you want to initiate transfers on), and they are done in sequential order (e.g. if you picked 2 channels (0 and 1) of 64KBytes each, channel 0 would transfer first, then channel 1 would transfer next, then control would be relinquished back to the CPU).
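To make the two-channel setup concrete, here is a sketch that only computes the register writes for N back-to-back 64KB channels aimed at CGRAM, using $43x0 (DMAPx), $43x1 (BBADx, low byte of the $21xx destination), $43x2-$43x4 (source address and bank) and $43x5-$43x6 (byte count, where 0 means 65536), then one write to $420B with one bit per channel. Assumptions: HiROM-style source banks ($C0, $C1) where the whole 64KB of a bank is ROM, and the commonly documented behaviour that CGADD auto-increments after each colour. The function name and the (register, value) output format are hypothetical; nothing here talks to real hardware.

```python
def cgram_dma_writes(source_banks, pin_entry=False):
    """Return (register, value) pairs that set up one 64KB general-purpose DMA per
    source bank, all aimed at CGRAM, followed by a single $420B (MDMAEN) write."""
    if not 1 <= len(source_banks) <= 8:
        raise ValueError("the SNES has 8 DMA channels")
    # CGADD auto-increments after each 2-byte colour, so a plain mode-0 stream into
    # $2122 walks through the palette. pin_entry=True uses transfer mode 3 with B-bus
    # base $21 ($2121, $2121, $2122, $2122) instead, so the source data must hold
    # [entry, entry, colour_lo, colour_hi] per pixel: 4 bytes instead of 2.
    mode, bbad = (0x03, 0x21) if pin_entry else (0x00, 0x22)
    writes = []
    enable_mask = 0
    for channel, bank in enumerate(source_banks):
        base = 0x4300 | (channel << 4)     # registers $43x0-$43x6 for channel x
        writes += [
            (base + 0x0, mode),            # DMAPx: A->B, increment source, transfer mode
            (base + 0x1, bbad),            # BBADx: low byte of the $21xx destination
            (base + 0x2, 0x00),            # A1TxL: source offset, low byte
            (base + 0x3, 0x00),            # A1TxH: source offset, high byte
            (base + 0x4, bank),            # A1Bx: source bank
            (base + 0x5, 0x00),            # DASxL: byte count, low (0x0000 = 65536)
            (base + 0x6, 0x00),            # DASxH: byte count, high
        ]
        enable_mask |= 1 << channel
    writes.append((0x420B, enable_mask))   # MDMAEN: channels then run 0, 1, 2, ...
    return writes


setup = cgram_dma_writes([0xC0, 0xC1])
print(setup[-1])  # (16907, 3), i.e. $420B = $03
```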
Sik wrote:
Moreover, it's byte-based instead of word-based, and I have no idea whether CRAM entries are updated only when both bytes are overwritten or whether they change immediately (in the latter case you have even more issues, as every half pixel would have the wrong color).
See register $43x0, bits 2-0, for how the writes can be done, as well as bit 3 (which controls/refers to the data being read from the source and how). It's entirely dependent upon what/where you're writing to (see other bits for that register, as well as register $43x1); for destination increment methods available see register $2115 bits 3 through 0. Registers $2118 and $2119 in the official docs also explain what happens.
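A sketch of how a $43x0 value is commonly documented to decode: bits 2-0 pick the B-bus write pattern, bit 3 fixes the A-bus source address, bit 4 decrements it when bit 3 is clear, and bit 7 is the direction. The helper and its output format are made up for illustration.

```python
# B-bus register offsets written for one transfer unit, per DMAPx bits 2-0.
TRANSFER_PATTERNS = {
    0: (0,),            # 1 register, write once
    1: (0, 1),          # 2 registers, write once each (e.g. $2118/$2119)
    2: (0, 0),          # 1 register, write twice
    3: (0, 0, 1, 1),    # 2 registers, write twice each
    4: (0, 1, 2, 3),    # 4 registers, write once each
    5: (0, 1, 0, 1),    # 2 registers, alternating
    6: (0, 0),          # same as mode 2
    7: (0, 0, 1, 1),    # same as mode 3
}


def decode_dmap(value: int, bbad: int) -> dict:
    """Describe a general DMA from its DMAPx byte and BBADx destination byte."""
    if value & 0x08:
        step = "fixed"
    elif value & 0x10:
        step = "decrement"
    else:
        step = "increment"
    targets = [0x2100 + ((bbad + off) & 0xFF) for off in TRANSFER_PATTERNS[value & 0x07]]
    return {
        "direction": "B->A" if value & 0x80 else "A->B",
        "source_step": step,
        "b_bus_sequence": [f"${t:04X}" for t in targets],
    }


print(decode_dmap(0x01, 0x18))
# {'direction': 'A->B', 'source_step': 'increment', 'b_bus_sequence': ['$2118', '$2119']}
```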
Also, there is no such term "CRAM" (this is the first time I've ever heard of it). I believe what you're trying to say is, simply, either VRAM or PPU RAM (same thing); RAM is RAM. You tell the SNES which areas of RAM you want to use for which purpose. The purposes are:
- OBJ ("OBJ Base Address"; see register $2101)
- BG SC ("background screen", i.e. tile layout/tilemap data; see registers $2107 through $210a)
- BG NBA ("background name base address", i.e. actual tile character data; see registers $210b and $210c)
- CG (colour/palette; see register $2121)
Hope that clarifies some of the technical aspects, since the last line of your post did ask if you understood things correctly (technically).
The bottom line is this: the SNES is not intended to do "full screen graphics changes" -- by this I'm referring to things like playing back full-screen movies (every pixel changing, etc.). It just doesn't have the capabilities or CPU time to do it. From developers I know who worked at Tiburon and one who worked for Konami, this was one of (many) reasons why the PlayStation product for the SNES was scrapped.
I say this up front without having checked out the demo/thing you wrote. The Genesis/Megadrive is a very, very different beast altogether.
I would say that byuu probably has some better insights as to what you may/may not be able to do with the console, so don't take my word as purely authoritative.
koitsu wrote:
You can transfer more than 64KBytes using multiple DMA channels; e.g. for 128KBytes you could use channels 0 and 1. There are a total of 8 channels available to use. The CPU will block (wait) until all selected DMA transfers are done (register $420B selects which channels you want to initiate transfers on), and they are done in sequential order (e.g. if you picked 2 channels (0 and 1) of 64KBytes each, channel 0 would transfer first, then channel 1 would transfer next, then control would be relinquished back to the CPU).
OK, so that part seems to be a no-brainer unless it breaks timing somehow... which is what I wonder about, since it's one of those things that can break badly by being just one cycle off =S
koitsu wrote:
Also, there is no such term "CRAM" (this is the first time I've ever heard of it).
Right, forgot that CRAM is Sega's term and Nintendo used CG (CRAM stands for Color RAM).
koitsu wrote:
The bottom line is this: the SNES is not intended to do "full screen graphics changes" -- by this I'm referring to things like playing back full-screen movies (every pixel changing, etc.). It just doesn't have the capabilities or CPU time to do it. From developers I know who worked at Tiburon and one who worked for Konami, this was one of (many) reasons why the PlayStation product for the SNES was scrapped.
Nor was the Mega Drive. We're talking about abusing the hardware like crazy =P
koitsu wrote:
I say this up front without having checked out the demo/thing you wrote. The Genesis/Megadrive is a very, very different beast altogether.
Er, it isn't mine, I thought the ZIP included some README inside or something o_O' It was Oerg866's (and Jorge's, though no idea how much he contributed). I only provided the image (which has 336 colors - no, we couldn't find anything with more colors, seriously).
Sik wrote:
koitsu wrote:
You can transfer more than 64KBytes using multiple DMA channels; e.g. for 128KBytes you could use channels 0 and 1. There are a total of 8 channels available to use. The CPU will block (wait) until all selected DMA transfers are done (register $420B selects which channels you want to initiate transfers on), and they are done in sequential order (e.g. if you picked 2 channels (0 and 1) of 64KBytes each, channel 0 would transfer first, then channel 1 would transfer next, then control would be relinquished back to the CPU).
OK, so that part seems to be a no-brainer unless it breaks timing somehow... which is what I wonder about, since it's one of those things that can break badly by being just one cycle off =S
I don't know how DMA can "break timing". The CPU is held/suspended while the DMA transfers happen, so it's not like your underlying 65816 program would have any idea what's going on. For large amounts of data, DMA transfers are by far faster, "cycle-count-wise" (for lack of a better term), than a native 65816 rolled loop, unrolled loop, or MVN/MVP opcodes.
I believe VBlank can happen in the middle of DMA, but I don't know the repercussions (I don't think there are any, at least not like the NES -- I guess it depends greatly on what B-Bus address you're writing to in the PPU). If you can find the SNES Developers Manual (send me a PM if you're interested) there is a known quirk with HDMA (not DMA) and "timing", but it's documented + known in the manual itself.
In general, maybe you could accomplish something on the SNES similar to blargg's wild-wacky-crazy-awesome-neat palette/colour demo, but I'm not sure how to get *that* degree of control over the underlying video circuitry. Writing to VRAM during HBlank, for example, I'm pretty sure you can do and get some neat effects. Palette adjustments could possibly be done this way too, giving you more colours than the stock number you'd get without changing things during HBlank.
I'd need byuu to chime in about now, since it's been a very long while (maybe 10 years or so) since I've worked on a SNES/SFC console and tinkered about with things to this degree. :-)
koitsu wrote:
I don't know how DMA can "break timing". The CPU is held/suspended while the DMA transfers happen, so it's not like your underlying 65816 program would have any idea what's going on. For large amounts of data, DMA transfers are by far faster, "cycle-count-wise" (for lack of a better term), than a native 65816 rolled loop, unrolled loop, or MVN/MVP opcodes.
The switch between both DMAs (since you'd need to issue two DMAs to fill the entire screen). I have no idea how many cycles it takes up for DMA to start, and this delay will happen between both DMAs and needs to be taken into account. Although I suppose it's always the same amount so it shouldn't be hard to synchronize (although since transfers are byte-based, there's the chance the second DMA may be shifted by half a pixel rather than a whole amount - good luck fixing that if it happens, maybe you'd need to somehow issue two DMAs in a row to get the delay twice).
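The half-pixel worry can be put as simple arithmetic. Assuming the commonly cited general-DMA rate of one byte per 8 master cycles, 2 bytes per colour, and some fixed start-up overhead for the next channel (the overheads tried below are placeholders to replace with a measured value, not known figures), the second channel's data is shifted by `overhead / 8 / 2` colour slots; a fractional result means a half-pixel shift. The same assumptions, plus 4 master cycles per dot, put each streamed colour at about 4 dots wide. Names are hypothetical.

```python
MASTER_CYCLES_PER_BYTE = 8   # commonly cited general-DMA rate (about 2.68 MB/s)
MASTER_CYCLES_PER_DOT = 4    # assumed width of one dot of the 256-wide picture
BYTES_PER_COLOUR = 2         # one CGRAM entry = low byte + high byte


def channel_switch_shift(overhead_cycles: int) -> float:
    """Shift, in colour slots, of the second channel's data caused by a fixed
    start-up overhead between two back-to-back DMA channels."""
    return overhead_cycles / MASTER_CYCLES_PER_BYTE / BYTES_PER_COLOUR


# Width of one streamed colour on screen, in dots, under the assumptions above.
dots_per_colour = MASTER_CYCLES_PER_BYTE * BYTES_PER_COLOUR / MASTER_CYCLES_PER_DOT

for overhead in (8, 16, 24):   # placeholder overheads to compare against a measurement
    shift = channel_switch_shift(overhead)
    verdict = "half-pixel misalignment" if shift % 1 else "whole pixels"
    print(overhead, shift, verdict)
```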
koitsu wrote:
I believe VBlank can happen in the middle of DMA, but I don't know the repercussions
Display would be disabled, so for all we know the PPU is in a permanent VBlank state.
Another thing is that I have no idea about the refresh cycles. Remember, I mentioned that the Mega Drive trick suffers from wider columns (every 31st column is double the width, more specifically) because in those columns a VRAM refresh happens and no transfer is performed (causing the previous color to repeat). No idea why it affects all of the VDP's memories (CRAM and VSRAM shouldn't need refresh), but that's how it works. The SNES may suffer from the same issue, though I have no idea to what degree, or whether it's even stable enough to work around.
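The wide-column effect can be modelled in a few lines: if every Nth output slot is taken by a refresh and no new colour arrives, the previously written colour stays on screen for two slots. A toy model (the 31-slot period is from the post; treating slots as evenly spaced is a simplification, not the VDP's real slot layout):

```python
def stream_with_refresh(colours, period=31):
    """Return the colours actually displayed when every `period`-th slot is lost to a
    refresh and the previously written colour simply stays on screen."""
    shown = []
    source = iter(colours)
    current = None
    slot = 0
    while True:
        slot += 1
        if slot % period != 0:          # normal slot: the next colour gets written
            try:
                current = next(source)
            except StopIteration:
                break
        shown.append(current)           # on a refresh slot this repeats `current`
    return shown


shown = stream_with_refresh(list(range(40)))
print(shown[28:33])  # [28, 29, 29, 30, 31]: colour 29 is shown twice as wide
```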
Sik wrote:
The switch between both DMAs (since you'd need to issue two DMAs to fill the entire screen). I have no idea how many cycles it takes up for DMA to start, and this delay will happen between both DMAs and needs to be taken into account. Although I suppose it's always the same amount so it shouldn't be hard to synchronize (although since transfers are byte-based, there's the chance the second DMA may be shifted by half a pixel rather than a whole amount - good luck fixing that if it happens, maybe you'd need to somehow issue two DMAs in a row to get the delay twice).
You issue two DMAs (e.g. transfer of 64KBytes of data using channel 0, and 64KBytes of data using channel 1) using a single write to $420b. I don't know how long (meaning on the hardware itself, in microseconds/milliseconds or whatever) the actual DMA initialisation takes. I'm sure it would be easy enough to find out, but there may be highly technical documents out there (written from a hardware engineering point of view) that document how long it takes. The official Developers Manual may have this in it somewhere too (I believe it has it in it for HDMA, but could be wrong).
The point I'm trying to make is that you can do back-to-back transfers of data using DMA without relinquishing control back to the CPU between those two transfers -- you can do it in one single write to $420b.
If you want me to write you some example 65816 code that shows how to do it (it's very very simple, nothing magical or amazing -- just standard code) I can do so.
Sik wrote:
Display would be disabled, so for all we know the PPU is in a permanent VBlank state.
Well then not much to worry about there! :-)
Sik wrote:
Another thing is that I have no idea about the refresh cycles. Remember, I mentioned that the Mega Drive trick suffers from wider columns (every 31st column is double the width, more specifically) because in those columns a VRAM refresh happens and no transfer is performed (causing the previous color to repeat). No idea why it affects all of the VDP's memories (CRAM and VSRAM shouldn't need refresh), but that's how it works. The SNES may suffer from the same issue, though I have no idea to what degree, or whether it's even stable enough to work around.
Sorry, no idea -- what you just described to me is completely and totally over my head.
koitsu wrote:
If you want me to write you some example 65816 code that shows how to do it (it's very very simple, nothing magical or amazing -- just standard code) I can do so.
Nah, I was talking about the time it takes for the PPU itself to start the DMA (since I assume it does the initialization for each of the DMA channels). Although the PPU is always in sync with itself (d'oh), so it's probably a no-brainer unless it turns out to take too long to be usable (but then we'd have some serious issues overall; DMA init shouldn't take that long...).
Which reminds me, there also needs to be a way to synchronize the 65816 with the PPU, and more specifically, to a given position on screen. The 65816 running at different clock speeds depending on which address range it accesses doesn't make things any easier; we'd need to account for all of them while looking for the timing.
On the Mega Drive synchronization is done by turning on display, overflowing the FIFO in active scan (so the VDP forces the 68000 to halt - this is where both get in sync), then turning off display and starting the DMA. Is it possible to do something similar with the 65816 and the PPU?
Sik wrote:
Nah, I was talking about the time it takes for the PPU itself to start the DMA (since I assume it does the initialization for each of the DMA channels). Although the PPU is always in sync with itself (d'oh), so it's probably a no-brainer unless it turns out to take too long to be usable (but then we'd have some serious issues overall; DMA init shouldn't take that long...).
DMA capability previously discussed is either done within the physical CPU (e.g. some extension/capability Sony added to the chip) itself or a separate chip on the mainboard somewhere. It is definitely not done within the PPU.
Per the official developers documentation:
* NOTE: If 2 or more channel are designated, the DMA transfer will be performed continuously according to the priority order described on page B-1. The CPU will also stop operation until all general purpose DMAs are completed.
* Page B-1 shows that channel 0 has higher priority than channel 1, channel 1 has higher priority than channel 2, etc... all the way down to channel 7.
* Page B-1 also shows that DMA being done between the CPU (A-BUS) and VRAM (B-BUS) should be done during VBlank (or, obviously, when the screen is off). HDMA, on the other hand, can be done per-scanline between the CPU and VRAM.
* Section 25.1 documents the known issue (mentioned previously in my post) with HDMA vs. DMA timing and discloses some timing variables, also citing that "a real time trace function of the ICE can be utilised to confirm timing problems". Apparently, at least with HDMA, one horizontal line takes 63.5 microseconds, and an increment of the H-count timer (HTIME; see below) equates to 0.186 microseconds.
* Table 2-17-1 indicates in a footnote that "in the case of a screen with 224 scanline resolution, general purpose DMA can transfer 6KBytes of data maximum in the VBlank period".
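Using the two figures quoted from the manual (about 63.5 microseconds per line and about 0.186 microseconds per H-count step), a quick sketch for turning a horizontal position in microseconds into an H-count value. The helper name and the round-to-nearest choice are my own assumptions.

```python
LINE_US = 63.5      # one scanline, per the manual's figure
H_STEP_US = 0.186   # one H-count increment, per the manual's figure


def h_count_for(us_into_line: float) -> int:
    """Nearest H-count value for a point `us_into_line` microseconds into a scanline."""
    if not 0 <= us_into_line < LINE_US:
        raise ValueError("position must fall within one scanline")
    return round(us_into_line / H_STEP_US)


steps_per_line = round(LINE_US / H_STEP_US)   # roughly 341 H-count steps per line
print(steps_per_line, h_count_for(40.0))      # 341 215
```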
So there's no official statement (from what I can discern) how long things take, but you can get a general idea. I'm sure someone with some kind of hardware analyser could figure it out for certain.
Sik wrote:
Which reminds me, there also needs to be a way to synchronize the 65816 with the PPU, and more specifically, to a given position on screen. The 65816 running at different clock speeds depending on which address range it accesses doesn't make things any easier; we'd need to account for all of them while looking for the timing.
I'm not sure what you mean by "to a given position on screen". Are you requesting that the SFC/SNES have some way to get X/Y coordinates of the electron gun while the PPU + video hardware draws pixels?
The options as I see them are:
1. Use of HDMA, which won't give you locations of things, but does guarantee during which HBlank scanline you'll be doing PPU modifications.
2. Register $2137 (SLHV) can be used as a "software latch" for the H/V location, read through the corresponding $213c (OPHCT / Horizontal) and $213d (OPVCT / Vertical) registers. Both of the latter registers are read twice and return 9 bits of data. You have to read $213f (STAT78) to "reset" the latch registers to make sure you get the correct low byte/high byte (bit) values from $213c/$213d.
3. Registers $4207 and $4208 (HTIME; horizontal), and $4209 and $420a (VTIME; vertical). These can be used to generate an IRQ when the electron gun is at specific horizontal and vertical locations.
4. Cycle counting (known to have been used in games like Chrono Trigger).
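For options 2 and 3 the 9-bit values are spread over byte-wide registers. A sketch of just the byte bookkeeping, assuming the commonly documented behaviour: $213C/$213D each need two reads (low byte first, then a byte whose bit 0 is bit 8 and whose other bits are open bus), and HTIME/VTIME take the low byte in $4207/$4209 and bit 8 in bit 0 of $4208/$420A. Nothing here touches hardware; the function names are invented for illustration.

```python
def combine_counter_reads(first: int, second: int) -> int:
    """Rebuild a 9-bit OPHCT/OPVCT value from two successive reads of $213C or $213D.
    Only bit 0 of the second read is meaningful; the rest is open bus, so mask it off."""
    return (first & 0xFF) | ((second & 0x01) << 8)


def irq_position_writes(h: int, v: int):
    """(register, value) pairs that program an H/V IRQ position via HTIME and VTIME."""
    for name, value in (("h", h), ("v", v)):
        if not 0 <= value <= 0x1FF:
            raise ValueError(f"{name} must fit in 9 bits")
    return [
        (0x4207, h & 0xFF), (0x4208, h >> 8),   # HTIME low byte / bit 8
        (0x4209, v & 0xFF), (0x420A, v >> 8),   # VTIME low byte / bit 8
    ]


print(combine_counter_reads(0x2C, 0xFF))   # 300: open-bus bits of the 2nd read ignored
print(irq_position_writes(300, 100))
```

The H/V IRQ itself still has to be enabled through $4200 (NMITIMEN), which is not shown.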
Sik wrote:
On the Mega Drive synchronization is done by turning on display, overflowing the FIFO in active scan (so the VDP forces the 68000 to halt - this is where both get in sync), then turning off display and starting the DMA. Is it possible to do something similar with the 65816 and the PPU?
See above for your options, unless byuu knows of others.
What about changing the palette each scanline? It worked on the Game Boy Color. But I'm not as familiar with the SNES's architecture, so I don't know if that works there too.
Dwedit wrote:
What about changing the palette each scanline? It worked on the Game Boy Color. But I'm not as familiar with the SNES's architecture, so I don't know if that works there too.
Yes, I believe that's doable (*especially* with HDMA -- that's partially what it's for!), but for whatever reason Sik is looking at things from a different perspective (mainly from a "comparison to the MegaDrive/Genesis" viewpoint). The two consoles are significantly different, thus accomplishing Fun Thing X on the MegaDrive is very different than on the SFC/SNES.
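A sketch of the per-scanline palette approach, assuming one HDMA channel in transfer mode 3 with B-bus base $21, so each line writes $2121, $2121, $2122, $2122 (a CGRAM address plus one full colour, i.e. the 4-byte maximum). The table layout assumed is the usual direct-mode one: a line-count byte with bit 7 set meaning "new data on each of the next N & 0x7F lines", followed by that data, and a 0 byte ending the table. The function name is hypothetical and it only builds bytes.

```python
def per_line_colour_table(entry: int, colours) -> bytes:
    """Build a direct-mode HDMA table (transfer mode 3, B-bus base $21) that rewrites
    CGRAM entry `entry` with a new 15-bit colour on every scanline."""
    if not 0 <= entry <= 0xFF:
        raise ValueError("CGRAM entry index is 8 bits")
    colours = list(colours)
    table = bytearray()
    for start in range(0, len(colours), 127):       # repeat mode covers at most 127 lines
        chunk = colours[start:start + 127]
        table.append(0x80 | len(chunk))             # bit 7 set: new data every line
        for colour in chunk:
            if not 0 <= colour <= 0x7FFF:
                raise ValueError("SNES colours are 15-bit")
            table += bytes((entry, entry, colour & 0xFF, colour >> 8))
    table.append(0x00)                              # end of table
    return bytes(table)


gradient = [b << 10 for b in range(32)] * 7         # 224 lines of a repeating blue ramp
table = per_line_colour_table(0, gradient)
print(len(table))  # 899 = 1 + 127*4 + 1 + 97*4 + 1
```

This gives one new colour per channel per line; how many channels fit into HBlank on real hardware is something to measure rather than assume.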
koitsu wrote:
DMA capability previously discussed is either done within the physical CPU (e.g. some extension/capability Sony added to the chip) itself or a separate chip on the mainboard somewhere. It is definitely not done within the PPU.
Is this not mentioned anywhere in the docs? Is this done by the bus controller, by any chance? (also H-DMA implies the PPU is involved somehow, since only the PPU knows when a line starts)
koitsu wrote:
I'm not sure what you mean by "to a given position on screen". Are you requesting that the SFC/SNES have some way to get X/Y coordinates of the electron gun while the PPU + video hardware draws pixels?
That can be useful, but I'm not necessarily talking about that - I was talking about getting the 65816 and the PPU in sync when the beam hits exactly a specific position of the screen. It's the only feasible way to do the DMA trick - otherwise we'll have to resort to other methods.
koitsu wrote:
Yes, I believe that's doable (*especially* with HDMA -- that's partially what it's for!), but for whatever reason Sik is looking at things from a different perspective (mainly from a "comparison to the MegaDrive/Genesis" viewpoint). The two consoles are significantly different, thus accomplishing Fun Thing X on the MegaDrive is very different than on the SFC/SNES.
It's that I find it pathetic that the one console known for sucking badly at color count can pull off an image with this much color precision but the SNES can't. There must be some way to get a similar effect on the SNES (be it the same approach or not).
Also the documentation implies that HDMA is limited in the amount of data you can transfer... I repeat, it's 1, 2 or 4 bytes per line, we're looking at transferring many more if we go the DMA route.
What are you talking about?
The SNES' mode 3, 4 and 7 can have 256 colour BG. The palette is 256 colours anyways so you can only have more by changing the palette midframe with HDMA, which is possible, as Secret of Mana does it.
But can you really see the difference between a 256-colour image and a 512-colour one? I think you wouldn't notice any difference.
Bregalad wrote:
What are you talking about?
The SNES' mode 3, 4 and 7 can have 256 colour BG. The palette is 256 colours anyways so you can only have more by changing the palette midframe with HDMA, which is possible, as Secret of Mana does it.
Wait, so it isn't just mode 7? Then the blending idea would work because there's quite a large amount of colors to use - enough to give us some reasonable amount of precision per component. Then the issue would be to see if there's enough VRAM to hold it all XD (otherwise we'd have to resort to something that isn't fullscreen)
Bregalad wrote:
But can you really see the difference between a 256-colour image and a 512-colour one? I think you wouldn't notice any difference.
Why are we using higher color depths these days? =P Well, that, and the ability to manipulate each RGB component of every color individually, which can help with some stuff. And really... just a way to prove it can be done on the SNES >_>
Also I swear, Nintendo's docs are a complete disaster when it comes to organization. It's nearly impossible to find what I'm looking for unless I already know where it is.
Quote:
Then the issue would be to see if there's enough VRAM to hold it all XD (otherwise we'd have to resort to something that isn't fullscreen)
In mode 7, you only have 256 tiles, and only the first 32k half of the VRAM is used, so no full-screen without repeated tiles.
However, with mode 3 or 4 you have 1024 tiles, and if all of them are defined they take up exactly the entire 64k of VRAM. Therefore you'll have to reserve at least 1024 bytes for the map, which leaves 63k of VRAM, or 1008 tiles. Because at most "only" 32x30 = 960 tiles are visible on the screen, it is possible to make a full-screen image in 256 colours without even resorting to any "dirty tricks".
Quote:
Why are we using higher color depths these days?
You're confusing the number of colours available (the depth of the palette) and the number of colours used at a time (the size of the palette).
If you take a high resolution image, and save it as a 256 colour BMP (3, 3, 2 bit RGB) chances are you'll immediately note a huge loss of quality.
However if you reduce the number of colours to 256 and save it as PNG, chances are you won't notice a difference, or a very slight one, especially if dithering is used.
Quote:
Also I swear, Nintendo's docs are a complete disaster when it comes to organization. It's nearly impossible to find what I'm looking for unless I already know where it is.
Why not use Anomie's docs then?
Bregalad wrote:
However, with mode 3 or 4 you have 1024 tiles, and if all of them are defined they take up exactly the entire 64k of VRAM. Therefore you'll have to reserve at least 1024 bytes for the map, which leaves 63k of VRAM, or 1008 tiles. Because at most "only" 32x30 = 960 tiles are visible on the screen, it is possible to make a full-screen image in 256 colours without even resorting to any "dirty tricks".
The problem is that with the blending method (which is what we'd be using there) we need to use two tilemaps, so we'd need to store double the number of tiles (assuming we don't repeat anything), meaning that in practice only half the screen can be filled that way =/ (also, I don't care if the second tilemap has fewer colors, because we could just put two of the RGB components into the 256-color tilemap and the other component into the other tilemap and still get a decent number of colors).
Maybe there's some way to work around it? Just thinking: since the Mega Drive demo has pixels that are double the usual width (so it'd only be fair to let the SNES do that too for this idea if needed), could we use mosaic mode to skip every other horizontal pixel? Then we'd need to store only half the pixels in each tile. What we can do then is fill the top half with all 960 tiles, then reuse those tiles in the bottom half but scroll the tilemaps horizontally by 1 in that area (so the remaining pixels are shown instead). Dunno if it's understandable what I'm trying to say.
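A toy model of the mosaic idea, assuming the behaviour suggested later in the thread by GS Mikami: with a mosaic size of 2, each 2-pixel block on screen repeats the background pixel sampled at the block's left edge after horizontal scrolling. Under that assumption, scroll 0 shows the even source columns and scroll 1 shows the odd ones, so one set of tiles can serve both halves of the screen. This covers one scanline only and is not hardware-exact; real mosaic is also applied vertically.

```python
def mosaic_line(source, hscroll, size=2, width=None):
    """Simulate one scanline of a horizontally mosaiced BG (toy model).
    Screen pixel x shows source[(block_start(x) + hscroll) % len(source)]."""
    width = len(source) if width is None else width
    line = []
    for x in range(width):
        block_start = x - (x % size)            # left edge of this mosaic block
        line.append(source[(block_start + hscroll) % len(source)])
    return line


row = list("abcdefgh")
print("".join(mosaic_line(row, 0)))  # aacceegg: even columns
print("".join(mosaic_line(row, 1)))  # bbddffhh: odd columns
```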
On that note: the Mega Drive in NTSC mode can only show 224 lines (28 tiles vertically), so I guess it's fair game to just do 32×28 tiles and still claim it's fullscreen with the SNES.
Bregalad wrote:
Why not use Anomie's docs then?
Because Nintendo's docs are what I have here right now >_>
EDIT: it seems mosaic mode must have the same value both horizontally and vertically... *sigh* So that'd make 2×2 pixels, not 2×1 pixels. It'd be worse than what we had before, but it's an improvement I guess. I suppose we could use H-DMA and change the vertical scroll every line to work around this?
2 tilemaps? Blending? What's the point?
In all cases, if you use mode 4, you can get a second tilemap with 2BP tiles. A complete image with it takes 16k + 1k for the map.
If you use this, the space remaining for the 8BP tiles of BG 0 is 64k - 1k - 16k - 1k = 46k = 736 tiles = 23 tile rows. Not quite fullscreen, but not too far away.
Since this also means fewer tiles will be used for the second tilemap, a compromise could be found between 23 and 30 rows.
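One way to look for that compromise: let both layers cover the same number of tile rows and solve for the largest count that fits. A sketch assuming 32-tile rows, no tile reuse and two 32x32 tilemaps. Note that SNES tilemap entries are 16 bits, so a 32x32 map takes 2KB rather than 1KB; the sketch uses that figure, so its totals come out slightly lower than the ones above. Names are illustrative.

```python
VRAM_BYTES = 64 * 1024
TILES_PER_ROW = 32
MAP_BYTES = 32 * 32 * 2        # one 32x32 tilemap, 2 bytes per entry


def max_rows(bpp_layers, maps=2):
    """Largest number of 32-tile rows that every layer can cover without reusing tiles."""
    per_row = sum(TILES_PER_ROW * 8 * bpp for bpp in bpp_layers)  # 8x8 tile = 8*bpp bytes
    return (VRAM_BYTES - maps * MAP_BYTES) // per_row


print(max_rows([8, 2]))   # mode 4: 8bpp BG1 + 2bpp BG2 over the same rows -> 24
print(max_rows([8, 4]))   # mode 3: 8bpp BG1 + 4bpp BG2 over the same rows -> 20
```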
Bregalad wrote:
2 tilemaps? Blending? What's the point?
Get past the 256 colors limit and retaining control over every RGB component. The idea would be to use additive blending (so the components from both layers are added), then put two components in the 256 color tilemap and the remaining component in the other (OK, other kind of combinations could be done if needed, but the idea is to add the values essentially).
And yes, I know, this may not seem to have much practical use - the Mega Drive demo has even less practical use since the bitmap is in ROM (it can't fit in RAM) and almost all CPU time is eaten up by the DMA (as the DMA halts the CPU). The idea is just to do it somehow.
(that said, it's not that useless... as I said, in the case of the Mega Drive the idea works like wonders when combined with the Mega CD, and in the case of the SNES not only you aren't halting the CPU if we go the way we're discussing right now, but there's always the possibility of using co-processors in the cartridge too like several games did)
Bregalad wrote:
In all cases, if you use mode 4, you can get a second tilemap with 2BP tiles. A complete image with it takes 16k + 1k for the map.
If you use this, the space remaining for the 8BP tiles of BG 0 is 64k - 1k - 16k - 1k = 46k = 736 tiles = 23 tile rows. Not quite fullscreen, but not too far away.
I managed to find the tables describing each mode, it seems mode 3 may be more useful (BG1 is 256 colors, BG2 is 16 colors). Is there any catch regarding mode 3 that makes mode 4 better, size usage aside? (and as I said, abusing mosaic would halve VRAM usage at the expense of doubling the pixel width - we can always just go that way if we have no option)
Bregalad wrote:
Since this also means fewer tiles will be used for the second tilemap, a compromise could be found between 23 and 30 rows.
Don't forget you could also always eat a bit from the horizontal borders to get some extra room.
Mode 4 has the "offset-per-tile" mode that mode 3 lacks.
You can't save VRAM by using mosaic, period. Remember how tiles are stored in memory: each byte is one bitplane of an 8-pixel row.
What you'd do by setting the mosaic to 2 is make every odd bit in all bytes of VRAM unused, but there is no way to use those bits in any way.
And I really don't think you'd notice if there is more than 256 colours or not, given the assumption that the 256 colours have been well chosen.
Bregalad wrote:
Mode 4 has the "offset-per-tile" mode that mode 3 lacks.
I really wasn't taking it into account, so I doubt that's an issue.
Bregalad wrote:
You can't save VRAM by using mosaic, period. Remember how tiles are stored in memory: each byte is one bitplane of an 8-pixel row.
What you'd do by setting the mosaic to 2 is make every odd bit in all bytes of VRAM unused, but there is no way to use those bits in any way.
GS Mikami disagrees (8:20 in case it doesn't seek). Mosaic mode seems to apply at the scroll level, not at the tile level, otherwise the background wouldn't suffer from those weird changes when it moves around. Unless there's a severe emulation error there (it'd be nice to see how it looks in real hardware).
If we can get mosaic to work like that then what I'm saying should be perfectly doable.
Bregalad wrote:
And I really don't think you'd notice if there is more than 256 colours or not, given the assumption that the 256 colours have been well chosen.
Again, the whole point of this is just to beat the Mega Drive, not to be practical... (for starters trying to modify the image in itself would be a pain)
I'm working on mock-ups, similar to these that I made for "blending" on an NES, for direct-color (RGB 332) and optimized-palette conversions on a Super NES, compared to best-case operation (half res horizontally, RGB 333) on the MD.
Bregalad: What happens if you horizontally flip mosaiced tiles?
Oh I see what you mean. You're right it would be possible to combine two different BGs with mosaic set for a higher colour depth.
However, there is no mode with multiple 8BP BGs, therefore you'll have to deal with mode 1 or mode 2's 4BP BGs.
But then there is only 16*16 = 256 colours possible on each tile - no gain from mode 3 or 4.
So you'll have to use the same VRAM area for 8BP tiles and 4BP tiles at the same time in mode 3 - possible, but it sounds complex to me!
In other words, yes, you'll get more than 256 colours, but at the price of using huge "pixels" and two BGs, when you could just stick with 256 colours in mode 3/4, which would be simpler and look better in 99% of cases.
Bregalad wrote:
However, there is no mode with multiple 8BP BGs, therefore you'll have to deal with mode 1 or mode 2's 4BP BGs.
Not necessarily. Let's assume we went for the same amount of precision - RGB 3.3.3 (512 colors). Yes, we could do better, but let's start there. Each RGB component would require 3 bits. So, what we could do is store two of the components using the 8bpp palette (64 colors would be needed) and store the remaining component in the 4bpp palette (8 colors would be needed). That should certainly be doable with mode 3. We're only using 72 colors in CG, so we could probably expand this further (that's a topic for later).
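A minimal Python sketch of that 64 + 8 color split, assuming BG1 (8bpp) on the main screen, BG2 (4bpp) on the subscreen and color math set to plain addition. The CGRAM placement (BG1 pixels 128-191, BG2 palette 0 pixels 1-8) is a hypothetical layout picked so that the two layers, which share CGRAM in mode 3 as I understand it, don't collide and pixel 0 (transparent) is never needed.

```python
def cgram_word(r5, g5, b5):
    """SNES CGRAM color word: 0bbbbbgggggrrrrr (5 bits per channel)."""
    return (b5 << 10) | (g5 << 5) | r5


def add_colors(c1, c2):
    """Per-channel color addition clamped at 31, i.e. color math 'add' without halving."""
    out = 0
    for shift in (0, 5, 10):
        out |= min(((c1 >> shift) & 31) + ((c2 >> shift) & 31), 31) << shift
    return out


def split_palettes():
    """Hypothetical layout: BG1 pixel 128 + r*8 + g holds red/green,
    BG2 (palette 0) pixel 1 + b holds blue."""
    bg1 = {128 + r * 8 + g: cgram_word(r << 2, g << 2, 0)
           for r in range(8) for g in range(8)}
    bg2 = {1 + b: cgram_word(0, 0, b << 2) for b in range(8)}
    return bg1, bg2


def pixels_for_rgb333(r, g, b):
    """(BG1 pixel, BG2 pixel) that together show the 3.3.3 color (r, g, b)."""
    return 128 + r * 8 + g, 1 + b
```

For example, `pixels_for_rgb333(5, 2, 7)` gives BG1 pixel 170 and BG2 pixel 8, and adding their palette entries yields `cgram_word(20, 8, 28)`. Since no channel ever exceeds 28, the clamp never kicks in.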
Bregalad wrote:
So you'll have to use the same VRAM area for 8BP tiles and 4BP tiles at the same time in mode3 - possible but sounds complex to me !
Do you mean sharing the tilemap table or what? I was calculating, and the tiles would take up 42KB (assuming we do 32×28 and not 32×30, and we do the mosaic thing)... that leaves 22KB for other stuff. Isn't that doable?
Bregalad wrote:
In other words, yes you'll get more than 256 colours, but at the price of using huge "pixels" and two BGs, when you could just stick with 256 colours in mode 3/4 it would be simpler and look better in 99% of cases.
Considering the thing we're trying to beat doesn't even have sprites and needs an add-on to be even remotely useful (since it eats up lots of CPU time, and to make it editable you'd need to put RAM in the cartridge as it won't fit in 64KB), this is a lot better.
Then again, the whole thing here is just to beat that, not to make something practical. If our aim was the latter then I'd probably avoid using 8bpp graphics (mode 7 aside, as that's unavoidable if I want deformation).

Original true-color image


Dithered down to optimized 256- and 128-color palettes


Dithered down to direct color: 3x3x2-bit (SNES) and 3x3x3-bit palette (Genesis)
In the 256+16 layers, is it possible to have the 256-color layer as direct color and another paletted background sum-blended onto it? If so, here's what can be done with one red bit, one green bit, and two blue bits in the second layer:

Dithered down to 4-bit-per-channel direct color and cropped to the 256x160 pixel safe area for widescreen TVs
It took so long to post this because I had to explain to a family member what a velar fricative was.
Sik wrote:
And yes, I know, this may not seem to have much practical use
Which is why I'm describing this as a "title screen display technique" in the descriptions when I upload these mock-up images to my web space.
tepples wrote:
In the 256+16 layers, is it possible to have the 256-color layer as direct color and another paletted background sum-blended onto it?
Seems so: direct color applies only to BG1 (and works in mode 3), while BG2 is unaffected. I suppose you could do RGB 3.3.2 on BG1 (not 4.4.3, since the extra bits only have per-tile granularity) and then RGB 1.1.2 on the 16-color palette, giving effectively RGB 4.4.4. No downsides, other than it being messier.
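The 3.3.2 + 1.1.2 = 4.4.4 arithmetic can be checked with a short Python sketch. It assumes the direct color mapping as I read it in nocash's docs (pixel BBGGGRRR plus tile palette bits bgr gives R = RRRr0, G = GGGg0, B = BBb00), palette bits left at 0, and plain color addition with a black fixed color, so a transparent BG2 pixel adds nothing. `split_rgb444` and `bg2_color` are hypothetical names.

```python
def direct_color(pixel, pal_bits=0):
    """5-bit (r, g, b) for a direct-color pixel BBGGGRRR with tile palette bits bgr."""
    r = ((pixel & 7) << 2) | ((pal_bits & 1) << 1)
    g = (((pixel >> 3) & 7) << 2) | (((pal_bits >> 1) & 1) << 1)
    b = (((pixel >> 6) & 3) << 3) | (((pal_bits >> 2) & 1) << 2)
    return r, g, b


def split_rgb444(r4, g4, b4):
    """Top bits go to the BG1 direct-color pixel; the leftover 1.1.2 bits
    pick one of 16 BG2 colors (value 0 = transparent, adds black)."""
    bg1_pixel = (b4 >> 2) << 6 | (g4 >> 1) << 3 | (r4 >> 1)
    bg2_pixel = (r4 & 1) | (g4 & 1) << 1 | (b4 & 3) << 2
    return bg1_pixel, bg2_pixel


def bg2_color(v):
    """Hypothetical BG2 palette entry v: supplies the low bits of each 5-bit channel."""
    return (v & 1) << 1, ((v >> 1) & 1) << 1, ((v >> 2) & 3) << 1
```

Adding `direct_color(bg1_pixel)` and `bg2_color(bg2_pixel)` channel by channel gives exactly `(r4 << 1, g4 << 1, b4 << 1)`; the largest sum per channel is 30, so nothing clips. A BG1 pixel of 0 is also transparent, but with a black backdrop it still reads as black.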
I'm not sure how you did your calculation, but basically we'd need:
32*28 8BP tiles = 32*28*64 bytes = 56 KB
32*28 4BP tiles = 32*28*32 bytes = 28 KB
Two maps: 2*2 KB = 4 KB
-> total = 88 KB > 64 KB
If you want to save RAM using mosaic, it'll be complex, as the 4BP tiles and 8BP tiles would have to share the same memory location!
You'd be better off stretching the image vertically using HDMA, simulating pixels that are taller than they are wide. This is an extremely trivial trick: we only use 14 rows of tiles instead of 28, which cuts the tile usage in half since both maps now use half the tiles they were using before: 42 KB for both sets of tiles, 46 KB in total. You even have room left for some sprites.
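The VRAM budget above is easy to double-check. This Python snippet assumes one unique tile per map cell (no reuse), 8 * bpp bytes per 8x8 tile and two 32x32 tilemaps of 2-byte entries, which matches the figures in the post.

```python
KIB = 1024
VRAM_BYTES = 64 * KIB


def bg_tile_bytes(cols, rows, bpp):
    """Bytes for one unique 8x8 tile per map cell: each tile takes 8 * bpp bytes."""
    return cols * rows * 8 * bpp


def budget(rows):
    tiles = bg_tile_bytes(32, rows, 8) + bg_tile_bytes(32, rows, 4)
    maps = 2 * 32 * 32 * 2          # two 32x32 tilemaps, 2 bytes per entry
    return tiles, maps, tiles + maps


for rows in (28, 14):
    tiles, maps, total = budget(rows)
    print(f"{rows} rows: tiles {tiles // KIB} KiB + maps {maps // KIB} KiB = "
          f"{total // KIB} KiB, fits: {total <= VRAM_BYTES}")
```

This prints 84 + 4 = 88 KiB (does not fit) for 28 rows and 42 + 4 = 46 KiB (fits) for 14 rows.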

PS: Tepples, your image shows perfectly that stripping to a 3,3,2 256-colour palette is not the same as restricting the total colours to 256, which is what I was mentioning. Thank you.
CG Direct Select (the direct color) mode allows using 2048 colors on a 256-color layer in Mode 3 and 4 without any tricks.
Bregalad wrote:
If you want to save RAM using mosaic it'll be complex as the 4BP tiles and 8BP tiles would have to share the same memory location !
I really don't get this one, period. Do you mean that both tilemaps have to share the same table or what? Because if you mean that the tiles share the same base address don't forget you can simply use different tile IDs... you have two tilemaps after all, they don't have to be the same.
Shiru wrote:
CG Direct Select (the direct color) mode allows to use 2048 colors on a 256-color layer in Mode 3 and 4 without any tricks.
Except that the bottom bit of each component has to be set through the palette bits, which are set on a per-tile granularity (when we want per-pixel), and one of those bits is needed to make the blue component bump up to 3-bit (otherwise the Mega Drive would get an advantage here).
I'm not sure what you don't understand.
You agree that if you save VRAM by using mosaic, this will make pixels 4 times bigger, and only the even or odd bits from the input tile will be used. However, this has no purpose if you can't re-use those even/odd bits for the second BG layer.
This will mean both tiles share the same memory location - but will be distorted differently by mosaic.
In other words it's simpler and better to stretch the image vertically only. This will make the pixels bigger only in the vertical direction - they'll not be square, ok but who cares ? At least the high resolution is preserved horizontally, we have enough VRAM for it, and this can only be for the best.
Bregalad wrote:
I'm not sure what you don't understand.
You agree that if you save VRAM by using mosaic, this will make pixels 4 times bigger, and only the even or odd bits from the input tile will be used. However, this has no purpose if you can't re-use those even/odd bits for the second BG layer.
This will mean both tiles share the same memory location - but will be distorted differently by mosaic.
Um, no, that wasn't my idea. My idea was to use the odd bits for the top half of the screen, then use the even bits for the bottom half of the screen (which is why the bottom half is scrolled horizontally by 1). There is no weird overlap between tiles like you say.
Okay, sorry, I didn't understand it.
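The even/odd idea can be modelled in a few lines of Python. This is only a sketch under one assumption taken from the GS Mikami observation: a 2-pixel mosaic block is aligned to the screen, not to the layer, and shows the layer pixel under its first column after scrolling. Whether real hardware behaves exactly like this is unverified, and vertical doubling is ignored.

```python
def pack_halves(top_row, bottom_row):
    """Interleave two half-width rows: top image in even columns, bottom in odd."""
    assert len(top_row) == len(bottom_row)
    packed = []
    for t, b in zip(top_row, bottom_row):
        packed += [t, b]
    return packed


def mosaic2_scanline(layer_row, hscroll, width):
    """Assumed model of a 2-pixel horizontal mosaic applied after scrolling:
    each 2-pixel screen block repeats the layer pixel under its first column."""
    n = len(layer_row)
    out = []
    for x in range(width):
        block_start = x - (x % 2)
        out.append(layer_row[(block_start + hscroll) % n])
    return out
```

With `packed = pack_halves([1, 2, 3, 4], [5, 6, 7, 8])`, a scroll of 0 shows `[1, 1, 2, 2, 3, 3, 4, 4]` and a scroll of 1 shows `[5, 5, 6, 6, 7, 7, 8, 8]`, i.e. the same VRAM serves both halves of the screen.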
I wonder if there's a way to screw with the mosaic register to get it to work only horizontally, probably by changing it to a different value during draw than it has during blanking.
I could mention quite a few games that change mosaic mid-frame with HDMA:
- Final Fantasy V when you warp in the last dungeon
- Star Ocean when you encounter a random battle
I don't think it "screws up" anything, the SNES seems to handle it just fine.
I think he meant "screw up" as in "make it behave in a completely different way from what it was intended" (ideally in a stable way).
As I mentioned before, for what we want we could easily work around it by just changing the vertical scroll every line. If you wanted to get the same effect horizontally you'd start having issues though (as trying to change the scroll every pixel is just... plain insane, and it may even be cached at the beginning of the line making that impossible).
When I was playing a bit of Ristar, I kept noticing that the top of the water always had palette-write artifacts in the same spots, and the idea of using these writes as a means of actual output crossed my mind, but I definitely don't have the know-how to actually do so myself. I'm glad someone utilized the idea!
Sik wrote:
I think he meant "screw up" as in "make it behave in a completely different way from what it was intended" (ideally in a stable way).
Exactly. In English, there's a difference between "screw with", which is neutral, and "screw up", which carries a connotation of breakage. The various raster effects on NES hardware, especially the $2006-$2005-$2005-$2006 writes and extending vblank with forced blank, are methods of "screwing with" the PPU to produce a desired, stable result. So are the methods to manipulate so-called "bad lines" on the Commodore 64's VIC-II in order to produce smaller attribute areas, larger sprite counts, border intrusions, and the like. So are the various extended modes of the MMC5, in fact. I was just wondering if toggling mosaic at just the right point on the scanline would trick the PPU into forgetting the mosaic data that it had saved for the scanline and fetch new data for the next scanline instead.
I have no idea how the SNES PPU works internally at all, but if it's anything like the Mega Drive, it doesn't cache the tilemaps at all but reads them on-the-fly as it rasters. No idea if the mosaic data is cached, though (which could be a possibility, especially since it can work with non-power-of-two values). It's also possible it may work, but only on a 16 pixel granularity (considering that mosaic seems to work on that step amount).
Is there any research on how the SNES PPU renders each scanline? Maybe Byuu has more info?
I'm a bit late to the party here, I guess...
This is a demo of a palette expansion technique using HDMA. The actual SNES coding is not especially interesting; it's basically the same thing everyone does for background gradients, but with all channels engaged and using much funkier-looking tables, with a palette refresh in the NMI routine.
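For readers new to HDMA palette tricks, here is a Python sketch of the kind of table involved. It is not 93143's code; it assumes the common setup where the channel uses transfer mode 3 aimed at $2121 ($43x0 = $03, $43x1 = $21), so each entry writes CGADD, CGADD, CGDATA low, CGDATA high. Line counts are kept at 127 or less with bit 7 clear (write once, then wait). Exactly which scanline an entry takes effect on is best checked against a timing doc.

```python
def cgram_hdma_table(writes, total_lines=224):
    """Build an HDMA table (mode 3, B-bus $2121) from a list of
    (scanline, cgram_index, bgr15) with strictly increasing scanlines, first at 0."""
    assert writes and writes[0][0] == 0
    table = bytearray()
    for i, (line, index, color) in enumerate(writes):
        end = writes[i + 1][0] if i + 1 < len(writes) else total_lines
        hold = end - line
        assert hold > 0, "scanlines must be strictly increasing"
        while hold > 0:
            count = min(hold, 127)   # bit 7 clear: transfer once, then wait
            table += bytes([count, index, index, color & 0xFF, color >> 8])
            hold -= count            # long waits just repeat the same write
    table.append(0)                  # a zero line count ends the table
    return bytes(table)
```

One channel like this can change one CGRAM entry per scanline; using several channels at once is what allows more entries per HBlank.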
(Let me guess - someone's already done this, and I just haven't stumbled across it yet?)
It isn't really like the Mega Drive version from the OP, as it requires a lot of offline processing to generate the HDMA tables, whereas the FantomBitmap technique can apparently be used to render games live if you have a coprocessor that isn't halted by the DMA - say, the one in the Sega CD. But this looks much better and can at least be used for title screens...
Also, my HDMA scheduler tool could probably be improved; for the moment I've satisfied myself with beating the colour count in the MD demo. Even as it stands, I've gotten colour counts in excess of 500 with other (copyrighted) images, but since this was the one used in this thread...
I absolutely cannot find the original MD demo image. I can only assume this wasn't it...
93143 wrote:
(Let me guess - someone's already done this, and I just haven't stumbled across it yet?)
Not that I'm aware of.
93143 wrote:
It isn't really like the Mega Drive version from the OP, as it requires a lot of offline processing to generate the HDMA tables, whereas the FantomBitmap technique can apparently be used to render games live if you have a coprocessor that isn't halted by the DMA - say, the one in the Sega CD. But this looks much better and can at least be used for title screens...
I imagine that one of the co-processors used on the SNES may be able to work with this.
Even then, there's also the option of not making it fullscreen. Overdrive was originally going to include a screen that was half FantomBitmap and half normal (with a scrolling parallax), using that second half to do processing. In practice though I doubt there's enough CPU time left to do anything useful (remember, rendering on such a large bitmap takes up a lot of time).
93143 wrote:
I absolutely cannot find the original MD demo image. I can only assume this wasn't it...
If I could remember what the search query was, I'd just look it up again... I do have the image (since I was the one who took it), but it's on my other hard disk, so I can't access it at the moment =/
Sik wrote:
93143 wrote:
It isn't really like the Mega Drive version from the OP, as it requires a lot of offline processing to generate the HDMA tables
I imagine that one of the co-processors used on the SNES may be able to work with this.
Uh... it requires a LOT of offline processing to generate the HDMA tables.
Seriously, just processing a pre-quantized image (and simply flagging unhandled pixels instead of trying to re-quantize to fit them in) takes at least a few seconds on a 3 GHz Pentium 4. Admittedly, it's written in Matlab and wasn't coded for speed, but still...
Maybe I'm just being uncreative.
Quote:
I do have the image (since I was the one that took it) but it's on my other hard disk so I can't access it at the moment =/
If you get the chance, could you post it? I'd be interested to see what this method could do with it...
93143 wrote:
Uh... it requires a LOT of offline processing to generate the HDMA tables.
Seriously, just processing a pre-quantized image (and simply flagging unhandled pixels instead of trying to re-quantize to fit them in) takes at least a few seconds on a 3 GHz Pentium 4. Admittedly, it's written in Matlab and wasn't coded for speed, but still...
OK, so it's definitely not working even remotely similar to the Mega Drive demo. If it were, you could get away with just two well-timed DMA operations (it would be one if it weren't for the 64KB limit). In that case you'd literally have a linear high color bitmap (with some gap between lines).
93143 wrote:
If you get the chance, could you post it? I'd be interested to see what this method could do with it...
Yeah, but it'll take a while...
Sik wrote:
OK, so it's definitely not working even remotely similar to the Mega Drive demo.
Nope. Sorry if I got your hopes up. It's not actually direct colour; it's just a dynamic palette, changing entries during HBlank as the opportunity arises. It doesn't do anything fancy with the hardware.
But at least it doesn't halt the CPU, and the pixels are normal-sized...
According to byuu (circa bsnes v0.22) CGRAM internal addresses are scrambled during active display, like the OAM, and according to nocash force blank is always black. Also I don't think the SNES displays glitch pixels like the MD when you change the palette (but I don't have a source for that and you can probably disregard it). So I'm not sure how one would go about accessing the full 15-bit palette directly.
...on the other hand, Joshua Cain's 32,768 colour demo from 2002 really does display 32,768 colours - it just looks more like that "Font colour" box to the right of this post editor I'm using than any sort of actual picture. And it doesn't come close to filling the screen. And it works fine in ZSNES. I suspect it's just a combination of fairly conventional methods.
Either way, precisely-timed hardware exploitation isn't really my speed right yet - this is literally my first ROM. It's just a bloody hack of the slideshow demo from Neviksti's SNES Starter Kit, and the HDMA code started out as a copy/paste from the tutorial on smwcentral...
Sik wrote:
93143 wrote:
If you get the chance, could you post it? I'd be interested to see what this method could do with it...
Yeah, but it'll take a while...
Cool, thanks. I'm in no hurry...
93143 wrote:
I'm a bit late to the party here, I guess...
This is a demo of a palette expansion technique using HDMA. The actual SNES coding is not especially interesting; it's basically the same thing everyone does for background gradients, but with all channels engaged and using much funkier-looking tables, with a palette refresh in the NMI routine.
(Let me guess - someone's already done this, and I just haven't stumbled across it yet?)
It isn't really like the Mega Drive version from the OP, as it requires a lot of offline processing to generate the HDMA tables, whereas the FantomBitmap technique can apparently be used to render games live if you have a coprocessor that isn't halted by the DMA - say, the one in the Sega CD. But this looks much better and can at least be used for title screens...
Also, my HDMA scheduler tool could probably be improved; for the moment I've satisfied myself with beating the colour count in the MD demo. Even as it stands, I've gotten colour counts in excess of 500 with other (copyrighted) images, but since this was the one used in this thread...
I absolutely cannot find the original MD demo image. I can only assume this wasn't it...
Nicely done, the image looks really nice! I don't know how many colors are actually displayed, but in conjunction with the RGB555 master palette the result is definitely very good and far better than what the MD RGB333 BITMAP mode can do!
Actually, you used the "classical" scanline palette reprogramming, but probably pushed to the maximum capabilities of the SNES.
The same method is used in the "Overdrive" MD demo to display a 512-color image: they abuse palette reprogramming during HBlank. The advantage on the SNES is that you can use HDMA for that and so leave the CPU free (it just needs to prepare the HDMA tables).
In "Overdrive" the HBlank area is extended so we can send more colors during the blank period than usual... but because of that, we can't use the H-Int anymore and the CPU is almost 100% busy handling it (only the VBlank period is left to do something else).
93143 wrote:
Sik wrote:
OK, so it's definitely not working even remotely similar to the Mega Drive demo.
Nope. Sorry if I got your hopes up. It's not actually direct colour; it's just a dynamic palette, changing entries during HBlank as the opportunity arises.
So it's more like the 3200-color mode of DreamGrafix for Apple IIGS.
Quote:
According to byuu (circa bsnes v0.22) CGRAM internal addresses are scrambled during active display, like the OAM, and according to nocash force blank is always black. Also I don't think the SNES displays glitch pixels like the MD when you change the palette (but I don't have a source for that and you can probably disregard it). So I'm not sure how one would go about accessing the full 15-bit palette directly.
Now I'm thinking of how to get 12-bit color. Mode 3 gives you a 256-color BG1 and a 16-color BG2, and 256-color layers can be set to "direct color" (a constant BBGGGRRR palette). This "direct color" alone nearly equals anything that can be done with the Genesis. But then you can do color addition between a direct color layer and a 16-color layer with a BBGR palette. Or did we already rule that out pages ago?
93143 wrote:
According to byuu (circa bsnes v0.22) CGRAM internal addresses are scrambled during active display, like the OAM, and according to nocash force blank is always black.
For the original method active scan doesn't matter since it uses forced blank. So can somebody confirm if the SNES displays black instead of the background color while blanking?
93143 wrote:
Also I don't think the SNES displays glitch pixels like the MD when you change the palette (but I don't have a source for that and you can probably disregard it).
The original method didn't use the glitch pixels so don't bother with that... they were hidden by making them the same color as the background. The idea is that you change the background color constantly to trick the video hardware into rendering a bitmap.
Could you do HAM (hold and modify) by DMAing to COLDATA ($2132)?
Stef wrote:
Nice done, the image looks really nice !
Thanks!
Quote:
Actually you used the "classical" scanline palette reprogramming but maybe at the maximum capabilities for the SNES.
I kinda suspected something like this had been attempted before, and last night I stumbled across a discussion of the Overdrive demo in which it became evident that it was doing something similar...
But my demo wasn't at max capability. This afternoon I changed the preprocessing algorithm, from only checking the earliest scanline with a stale colour to using a two-constant quality weighting system on all scanlines with stale colours. (This boosted the execution time from 3-4 seconds to 30-40 seconds...) The result is attached - the first version had 337 colours; this one has 417. And it's still only using about 34% of the available HDMA bandwidth. I guess it's just a stubborn picture; the success of the HDMA scheduling seems to depend to a significant degree on the parameters used in the image quantizer...
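To illustrate the scheduling problem (not 93143's Matlab tool, and not his weighting system), here is a hypothetical earliest-deadline-first scheduler in Python. It treats the palette as abstract slots, loads each slot's first color during VBlank, assumes the writes listed for line y land in the HBlank just before it, and flags stale pixels instead of re-quantizing, roughly like the "earliest scanline with a stale colour" approach described above.

```python
def schedule(required, num_slots, writes_per_hblank):
    """required: one dict per scanline mapping palette slot -> color it needs.
    Returns (initial palette, per-line list of (slot, color) writes, stale list)."""
    # VBlank refresh: each slot starts with the color of its first user.
    palette = [None] * num_slots
    for line in required:
        for slot, color in line.items():
            if palette[slot] is None:
                palette[slot] = color
    initial = list(palette)
    writes, stale = [], []
    for y in range(len(required)):
        # Nearest future use of each slot. Rewriting a slot early is safe
        # because no line before that use needs it.
        deadlines = {}
        for yy in range(y, len(required)):
            for slot, color in required[yy].items():
                deadlines.setdefault(slot, (yy, color))
            if len(deadlines) == num_slots:
                break
        pending = sorted((yy, slot, color) for slot, (yy, color) in deadlines.items()
                         if palette[slot] != color)
        line_writes = []
        for yy, slot, color in pending[:writes_per_hblank]:
            palette[slot] = color
            line_writes.append((slot, color))
        writes.append(line_writes)
        stale += [(y, s) for s, c in required[y].items() if palette[s] != c]
    return initial, writes, stale
```

`writes_per_hblank` stands in for the number of HDMA channels devoted to palette writes; a smarter scheduler would weigh how many pixels each stale color affects rather than only its deadline.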
tepples wrote:
So it's more like the 3200-color mode of DreamGrafix for Apple IIGS.
I... guess so, yeah. I'd never seen that before.
Quote:
Mode 3 gives you a 256-color BG1 and a 16-color BG2, and 256-color layers can be set to "direct color" (a constant BBGGGRRR palette). This "direct color" alone nearly equals anything that can be done with the Genesis. But then you can do color addition between a direct color layer and a 16-color layer with a BBGR palette. Or did we already rule that out pages ago?
No, it was mentioned (by you, as a matter of fact), and AFAICT it should work. The problem is that it overloads the VRAM, so you have to letterbox it a bit.
tepples wrote:
Could you do HAM (hold and modify) by DMAing to COLDATA ($2132)?
That's what the SNES counterpart of the original method would be, pretty much.
There's only one thing I'm not sure about which is why I started this thread in the first place. Writes are byte-wide, not word-wide (unlike in the Mega Drive). This means that transferring a color takes up two writes. So here's the question: does the first byte get latched until the second is written, or do bytes get written immediately? Because if the latter, the method won't work (every other column would have an invalid color).
EDIT: also that's not what HAM does (HAM basically just takes the previous pixel and replaces one of the RGB components to get the new pixel), but eh, you get the idea.
Sik wrote:
also that's not what HAM does (HAM basically just takes the previous pixel and replaces one of the RGB components to get the new pixel), but eh, you get the idea.
According to wiki.superfamicom.org, COLDATA writes transfer 3 bits of "which components to overwrite" and 5 bits of value. That looked very HAMmy to me.
I don't have the SNES documents with me right now, I just assumed you meant DMAing to the CG memory register (which is literally "copy this word to CG RAM"). I'm not aware of any other method to write to it =S
No, COLDATA is separate from CGRAM. It's the subscreen background colour. Writing an arbitrary colour to it takes six pixels (assuming standard DMA at 8 cycles per byte), and intermediate values would be displayed - assuming writing to it during display takes immediate effect. Grayscale or monochrome could be done with two-pixel granularity...
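The six-pixel and two-pixel figures follow from the COLDATA byte format and DMA speed. This Python sketch assumes bit 5 = red, bit 6 = green, bit 7 = blue with a 5-bit value in bits 0-4, 8 master cycles per DMA byte and roughly 4 master cycles per dot (ignoring the occasional long dot); `coldata_bytes` and `dma_pixels` are hypothetical helpers.

```python
MASTER_CYCLES_PER_DOT = 4          # approximate; a few long dots are ignored
DMA_MASTER_CYCLES_PER_BYTE = 8
DRAM_REFRESH_MASTER_CYCLES = 40


def coldata_bytes(r5, g5, b5):
    """COLDATA ($2132) writes for one color. Channels that share a value
    share a byte, so gray needs a single write."""
    masks = {}
    for bit, value in ((0x20, r5), (0x40, g5), (0x80, b5)):
        masks[value] = masks.get(value, 0) | bit
    return [mask | value for value, mask in masks.items()]


def dma_pixels(num_bytes):
    """Screen pixels that go by while DMA pushes num_bytes."""
    return num_bytes * DMA_MASTER_CYCLES_PER_BYTE // MASTER_CYCLES_PER_DOT


refresh_gap = DRAM_REFRESH_MASTER_CYCLES // MASTER_CYCLES_PER_DOT   # 10 pixels
```

An arbitrary color such as (1, 2, 3) needs three writes, so `dma_pixels(3)` is 6 pixels, while gray (16, 16, 16) is the single byte 0xF0 and costs 2 pixels. The 40-cycle DRAM refresh works out to the 10-pixel gap mentioned below.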
Isn't the CPU (and by extension DMA) halted for 40 cycles in the middle of every scanline? That wouldn't look very nice...
...
I do kinda like the mosaic idea combined with colour math, if you really want full-screen direct colour and don't mind horizontally doubled pixels. It seems like it should work, though the underlying palette would only be 12-bit...
Unfortunately it still wouldn't be the functional equivalent of the MD version, since a single frame would exceed the available DMA bandwidth, whereas the MD version by definition does not. And since a full screen of this would take up more than half the VRAM, paging would be impossible without tearing...
Is it possible to send separate pixels to SNES screen, or one can only feed it tiles that need to be prepared beforehand? If the latter, how do you guys convert image parts to SNES tiles (and store them)? I'd be interested in seeing the source code of both hcolor demos.
feos wrote:
Is it possible to send separate pixels to SNES screen, or one can only feed it tiles that need to be prepared beforehand?
Normally, aside from the very limited graphical capabilities afforded by the backdrop colours and main/sub screen functionality in combination with HDMA and window masking, you pretty much have to use tiles. You can render tiles on the S-CPU, but it's slow, and to the best of our knowledge you can only transfer them to VRAM during VBlank or forced blank.
It may be possible to continuously modify the subscreen backdrop colour, but it can only take one 5-bit value at a time and apply it to any combination of the three colour channels, so even DMA wouldn't be able to do arbitrary colour changes faster than two pixels per colour channel, which would look weird. Plus DMA doesn't work during DRAM refresh in the middle of every scanline, so you'd have 10 pixels of no changes at all unless you masked it with tiles (you could perhaps modify the relevant palette entries during HBlank).
Also, byuu recently informed me that he's "never once seen a CGRAM write fail", and that if the PPU is accessing it too the worst that can happen is that the colour will go to the wrong entry. This was in the context of HBlank writes like I use in my hcolor demos, but the generality of his wording suggests to me that FantomBitmap might just be possible on the SNES after all, though it would have to use quadruple-wide pixels instead of double-wide because of the 8-bit DMA. I plan to try this, but I'm under a lot of pressure right now so I can't really hobby much.
Quote:
If the latter, how do you guys convert image parts to SNES tiles (and store them)?
I use custom Matlab utilities to generate tiles, tilemaps, palettes, and HDMA tables from Windows bitmap data, since I'm good at Matlab and happen to have a copy (it's not cheap). Unfortunately Octave (free Matlab) doesn't run my scripts properly...
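For anyone without Matlab, the tile/tilemap step can be sketched in a few lines of Python. This is a generic illustration, not the scripts mentioned above: it assumes the image is already quantized to color indices, drops duplicate tiles, and ignores flipped duplicates and palette assignment. The map entry layout (vhopppcc cccccccc) is the standard SNES BG map format.

```python
def image_to_tiles(pixels, width, height):
    """Cut an indexed image (list of rows) into 8x8 tiles, dropping duplicates.
    Returns (unique_tiles, tilemap) with tile numbers in row-major order."""
    assert width % 8 == 0 and height % 8 == 0
    unique, lookup, tilemap = [], {}, []
    for ty in range(height // 8):
        for tx in range(width // 8):
            tile = tuple(tuple(pixels[ty * 8 + y][tx * 8:tx * 8 + 8]) for y in range(8))
            if tile not in lookup:
                lookup[tile] = len(unique)
                unique.append(tile)
            tilemap.append(lookup[tile])
    return unique, tilemap


def tilemap_entry(tile_number, palette=0, priority=0, hflip=0, vflip=0):
    """16-bit BG map entry: vhopppcc cccccccc."""
    return (vflip << 15) | (hflip << 14) | (priority << 13) | (palette << 10) | tile_number
```

The unique tiles would then be converted to planar bytes and written out as binary files for `.incbin`.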
Storage is easy. Once I have the binary data, I just include it in the ROM via the .incbin directive. Not sure what you're asking here...
Quote:
I'd be interested in seeing the source code of both hcolor demos.
Do you mean the Matlab data preparation script, or just the SNES ASM? I can do both, but I'd need to happen across a bit of free time to prepare the SNES stuff as it is a very ugly hack of someone else's slideshow and thus contains a lot of superfluous code. The actual method is very simple, and all that extra infrastructure would only confuse the issue.
Thanks for the answer. My goal is to show just anything on the screen; it can even consist of only black and white, if that'd be easy to implement. And being easy is actually critical for me, since I've never seen SNES code, only NES and Genesis.
So how much freedom does the backdrop layer allow? Can it change colors every pixel? Color depth is not an issue at all, as I mentioned. If it only allows blocks of one color, it's still okay for me. If only lines, then no.
SNES ASM source is enough, ideally with some comments and other things that would simplify understanding.
Oh, hiya feos! I missed this thread back in the day and I haven't fully caught up on it, but sounds kinda related to a project of mine from a while back?
I made a python script that takes a bitmap image and processes it into tiles and a tilemap for SNES. The color depth isn't fantastic since it's in mode 1 and not mode 7, therefore having to share 8 palettes of 16 colors for the whole image, but I'm still quite impressed with the results I got from it. There is definitely still room for improvement too - you could use sprites for a boost to the color depth for example. This was mostly just a proof-of-concept.
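The "8 palettes of 16 colors" constraint boils down to choosing one palette per tile. Here is a small Python sketch of that choice, assuming the palettes have already been built and using squared RGB distance as the error; it is a hypothetical illustration, not Khaz's script, and it ignores that color 0 of each BG palette is transparent.

```python
def nearest_index(color, palette):
    """Index of the palette entry with the smallest squared RGB distance."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))


def best_palette_for_tile(tile_colors, palettes):
    """Pick the palette (of up to 8) with the least total error for one tile.
    tile_colors: the tile's 64 RGB tuples. Returns (palette number, pixel indices)."""
    best = None
    for p, pal in enumerate(palettes):
        indices = [nearest_index(c, pal) for c in tile_colors]
        err = sum(sum((a - b) ** 2 for a, b in zip(c, pal[i]))
                  for c, i in zip(tile_colors, indices))
        if best is None or err < best[0]:
            best = (err, p, indices)
    return best[1], best[2]
```

The palette number goes into bits 10-12 of the tile's map entry, and the indices become the tile's pixel data.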
With a loooot of pre-processing (like, weeks of it), you can even use this method to render a video frame-by-frame into SNES format. As an example I did the "Sonic Boom" video from Sonic CD. You need bsnes or an SD2SNES to play it, unfortunately, due to reliance on MSU1, but the same concept can still be done without it. It's just a ROM size thing; a 4 MB ROM can only hold a couple seconds of video like that. Also, CD-audio!
My only complaint about the video results is that some frames that are almost but not quite one solid color came out looking like crap. If I had a cleaner source video that probably wouldn't happen, and the SNES's more limited color depth amplifies the smooth gradients into ugly jagged transitions, but yeah. I should really record video of it playing on my SNES and put it on youtube or something.
Anyways, hope that was relevant to the topic at hand.
EDIT: in the interest of the "easy" factor, I even wrote a small "picture displayer" ROM that takes static images output from my program and displays them on screen, in a slideshow-like fashion. If you're interested I could throw the source on my github when I get home.
Woah, haven't looked at those, but sounds powerful.
My plan is to write an article about arbitrary code execution, and the best explanation would be to try and execute some code that is exclusive, like displaying a logo of the 'zine I'm writing for. So it's not a ROM with inserted images, but an actual program (fed through executing contents of input registers as code) that generates them. Hence, to be fast and easy, I'd prefer some methods that are as direct and straightforward as possible, not impressive or demoscene-ish.
And yes, by all means, post your sources.
PS: I know you, Khaz, I accepted your runs! Hi.
feos wrote:
My plan is to write an article about arbitrary code execution, and the best explanation would be to try and execute some code that is exclusive, like displaying a logo of the 'zine I'm writing for. So it's not a ROM with inserted images, but an actual program (fed through executing contents of input registers as code) that generates them. Hence, to be fast and easy, I'd prefer some methods that are as direct and straightforward as possible, not impressive or demoscene-ish.
And yes, by all means, post your sources.
I will make a note to stick the sourcecode up on github when I get home if it's not there already.
I don't know how big of a block of arbitrary code you have to work with, but I can think of a few ways that could be done. My first take on it would be something like...
- Disable interrupts, turn off screen
- Ensure in Mode 1 (for example). Disable BG1 and 2, so you just have BG3 (2bpp)
- Set up a DMA to clear BG3's tilemap in VRAM. (I think you can do that with ACE? Set the DMA source as a zero value somehow?)
- Write a blank tile and then a few tiles for your logo to BG3's CHR VRAM through $2118, and then a few corresponding tilemap entries to show them on the screen
- Write a palette to CGRAM if you have time/inclination
- Ensure BG3's scrolls, tile sizes, etc... are set right, turn on screen!
- If you don't care what happens next, a STP instruction should suffice to keep it on screen.
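The steps above can be sketched as a Python generator of (register, byte) writes, which could later be turned into LDA #imm / STA pairs or an ACE payload. Register addresses are the ones listed on the superfamicom.org Registers page and in nocash's docs; the VRAM addresses are hypothetical defaults, the tilemap clear (the DMA step) and the STP at the end are left out, and tile data is expected in 2bpp format (16 bytes per tile, plane 0 in even bytes).

```python
INIDISP, BGMODE, BG3SC, BG34NBA = 0x2100, 0x2105, 0x2109, 0x210C
BG3HOFS, BG3VOFS = 0x2111, 0x2112
VMAIN, VMADDL, VMADDH, VMDATAL, VMDATAH = 0x2115, 0x2116, 0x2117, 0x2118, 0x2119
CGADD, CGDATA, TM, NMITIMEN = 0x2121, 0x2122, 0x212C, 0x4200


def logo_writes(tiles_2bpp, tilemap_cells, palette, map_word=0x0400, chr_word=0x0000):
    """tiles_2bpp: list of 16-byte tiles (tile 0 should be blank).
    tilemap_cells: (cell_index, tile_number) pairs; palette: up to 4 BGR15 words.
    map_word/chr_word: VRAM word addresses (hypothetical, must not overlap)."""
    w = [(NMITIMEN, 0x00), (INIDISP, 0x80)]     # no NMI/IRQ, forced blank
    w += [(BGMODE, 0x01), (TM, 0x04)]           # mode 1, only BG3 on main screen
    w += [(BG3SC, (map_word >> 10) << 2), (BG34NBA, chr_word >> 12)]
    w += [(BG3HOFS, 0), (BG3HOFS, 0), (BG3VOFS, 0), (BG3VOFS, 0)]  # write-twice regs
    w += [(VMAIN, 0x80)]                        # increment after the high byte
    w += [(VMADDL, chr_word & 0xFF), (VMADDH, chr_word >> 8)]
    for tile in tiles_2bpp:
        for lo, hi in zip(tile[0::2], tile[1::2]):   # one VRAM word per row
            w += [(VMDATAL, lo), (VMDATAH, hi)]
    for cell, tile_number in tilemap_cells:
        addr = map_word + cell
        w += [(VMADDL, addr & 0xFF), (VMADDH, addr >> 8)]
        w += [(VMDATAL, tile_number & 0xFF), (VMDATAH, tile_number >> 8)]
    w += [(CGADD, 0)]                           # BG3 palette 0 = CGRAM 0-3
    for color in palette:
        w += [(CGDATA, color & 0xFF), (CGDATA, color >> 8)]
    w += [(INIDISP, 0x0F)]                      # screen on, full brightness
    return w
```

Every byte here is an immediate store, which keeps the payload simple at the cost of size; a real payload would likely loop over the tile data instead.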
Anyone have opinions on my method? Shouldn't be hard to do or take too many instructions to pull off, I don't think. If it sounds viable I'd be happy to write up some sample code of what I mean when I get home.
feos wrote:
PS: I know you, Khaz, I accepted your runs! Hi.
Glad to see I am so memorable.

If I were to try all that, I'd keep the screen on during the whole build, just for the sake of it. Unfortunately, I'm simply unable to write something as complex as your suggested payload. But I can try learning from examples that are noob-simple.
However, if you feel interested, I can just collaborate with you on this and make it actually something nice with your help. Though at first I'd need to learn the very basics, to understand what's going on. Because, you know, to explain something to people, I have to be able to pull it off myself.

No problem, and hey, I wouldn't mind collaborating either. Always been fascinated by ACE and would love to see the process more closely. We'll see how it goes! I'll write up what I mean by that code when I get home.
The turn off screen part is only because you can't write to VRAM while the screen is on, so you'd need to either do that or wait for a vBlank. Turning off the screen just seems easier.
If you want to leave rendering on:
- Set the mode and scroll
- During vblank, use DMA to copy a palette to CGRAM and a few logo tiles to VRAM
- Prepare a tilemap in 2048 bytes of main RAM
- During vblank, copy this tilemap and more logo tiles to VRAM
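The "prepare a tilemap in 2048 bytes of main RAM, then copy it during vblank" steps can be sketched in Python. The buffer layout (32x32 little-endian entries) and the DMA register roles ($43x0 mode 1, $43x1 = $18 for VMDATAL/VMDATAH, source address/bank, byte count, trigger via MDMAEN $420B) follow the usual SNES docs. The source address in the example is hypothetical, and VMAIN/VMADD must already be set.

```python
def build_tilemap_buffer(cells):
    """2048-byte copy of a 32x32 BG map: 1024 little-endian 16-bit entries.
    cells: dict (x, y) -> map entry; every other entry stays 0."""
    buf = bytearray(2048)
    for (x, y), entry in cells.items():
        offset = (y * 32 + x) * 2
        buf[offset] = entry & 0xFF
        buf[offset + 1] = entry >> 8
    return buf


def vram_dma_regs(channel, src_addr, src_bank, size):
    """Register values for a DMA to VRAM through $2118/$2119.
    Start it by writing 1 << channel to MDMAEN ($420B) during vblank."""
    base = 0x4300 + channel * 0x10
    return {
        base + 0: 0x01,                # DMAP: A->B, transfer mode 1 (p, p+1)
        base + 1: 0x18,                # BBAD: $2118 (VMDATAL), pairs with $2119
        base + 2: src_addr & 0xFF,
        base + 3: src_addr >> 8,
        base + 4: src_bank,
        base + 5: size & 0xFF,
        base + 6: (size >> 8) & 0xFF,  # a size of 0 means 65536 bytes
    }
```

For example, `vram_dma_regs(0, 0x0000, 0x7E, 2048)` describes copying a buffer at $7E:0000 (hypothetical location) into VRAM in one vblank.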
I decided to just go ahead and clean up the hcolor2 code. It didn't take all that long, as it turns out - I have a much better understanding of what's going on than I did back then. It's not an ideal model even now, but it should at least be readable.
Attachment:
hcolor2.7z [122.98 KiB]
However, I would recommend taking a look at Neviksti's actual SNES Starter Kit, as well as everything on superfamicom.org (especially the Registers page) and nocash's docs. Also, I believe the SNES Starter Kit uses a very old, buggy version of WLA DX, so watch out for that. You may want to use an updated version or a different assembler entirely.
feos wrote:
So how much freedom does backdrop layer allow? Can it change colors every pixel? Color depth is not an issue at all, as I mentioned. If it only allows going in blocks of the one color, it's still okay for me. If only in lines, then no.
You have two backdrops, the main screen one (which uses CGRAM colour #0) and the subscreen one (which uses the separate COLDATA setting). You can change each one during HBlank using HDMA or an IRQ, or by polling HVBJOY, and I believe it is also possible to change COLDATA and possibly CGRAM #0 mid-scanline with timed code and/or an IRQ (as I said above, I plan to try this using bulk DMA to see if I can get a picture). The main and subscreen backdrops can be combined with colour math. I'm 99% sure the colour math mode defined by CGADSUB can be changed mid-scanline (I don't know how else to explain my results), but I don't know if any artifacting occurs.
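To make the COLDATA side concrete: COLDATA ($2132) takes a 5-bit intensity in bits 4-0 plus channel-select bits (bit 5 red, bit 6 green, bit 7 blue, per the usual register docs), so setting an arbitrary subscreen backdrop takes up to three writes. This Python sketch only computes those bytes, plus a hypothetical per-scanline gradient table; how you feed them to the register (HDMA, IRQ or timed code) is left open.

```python
def coldata_writes(r, g, b):
    """Bytes to write to COLDATA ($2132) for a 5-bit-per-channel colour.

    Bit 5 selects red, bit 6 green, bit 7 blue; bits 4-0 are the intensity.
    """
    for c in (r, g, b):
        if not 0 <= c <= 31:
            raise ValueError("channels are 5-bit (0-31)")
    return [0x20 | r, 0x40 | g, 0x80 | b]


def coldata_gradient(height=224):
    """Hypothetical per-scanline table: red fades out while blue fades in."""
    rows = []
    for y in range(height):
        t = y * 31 // (height - 1)
        rows.append(coldata_writes(31 - t, 0, t))
    return rows


print(["$%02X" % v for v in coldata_writes(31, 0, 0)])  # ['$3F', '$40', '$80']
```

Since the select bits can be combined, a single write of $E0 | intensity sets all three channels at once, which makes grey backdrops the cheapest case.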
There are two masking windows available; each one has a left and right position that can be changed during HBlank (or possibly during a scanline, though this could fail or result in garbage; I haven't tried it). The windows can be set to mask individual BG layers and/or the sprite layer, and can also prevent colour math or even force masked areas to black. Effects like the keyholes in Super Mario World and the rank numbers in Super Mario Kart are done by modifying the window edge positions with HDMA, and if your logo is vertical and doesn't contain M, N, or W this should work fine.
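The keyhole-style effect boils down to a per-scanline table of window edges. As a rough sketch, assuming window 1's left/right registers WH0/WH1 ($2126/$2127), HDMA transfer unit 1 (two registers, one byte each), and the simplest table layout of one non-repeat entry per scanline (line count 1 followed by the two data bytes, with a 0 byte ending the table), this Python builds a circular window. Lines outside the circle get left = 255, right = 0, which leaves the window empty because left > right. The function name and circle parameters are hypothetical.

```python
import math


def keyhole_hdma_table(cx, cy, radius, height=224):
    """Per-line HDMA table for WH0/WH1: [1, left, right] * height, then 0."""
    table = bytearray()
    for y in range(height):
        dy = y - cy
        if abs(dy) <= radius:
            half = math.isqrt(radius * radius - dy * dy)
            left = max(0, cx - half)
            right = min(255, cx + half)
        else:
            left, right = 255, 0  # left > right: empty window on this line
        table += bytes([1, left, right])  # non-repeat entry lasting 1 line
    table.append(0)  # end-of-table marker
    return bytes(table)


tbl = keyhole_hdma_table(cx=128, cy=112, radius=40)
print(len(tbl))  # 673
```

Whether the inside or the outside of the window gets masked is chosen by the window settings registers, which this sketch doesn't touch; shrinking the radius each frame and rebuilding the table gives the usual iris-out.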
There's actually a commercial game (Air Strike Patrol) that changes the screen brightness in INIDISP mid-scanline. It's used to draw the shadow of the player's aircraft, though it's a bit jagged due to timing issues.
Mid-scanline writes are apparently dicey when moving from a regular SNES to a SNES Jr., as the PPU was radically redesigned and doesn't work the same. HDMA and windowing are no problem, besides which they are generally much better looking and less finicky to get working.
Quote:
My plan is to write an article about arbitrary code execution, and the best explanation would be to try and execute some code that is exclusive, like displaying a logo of the 'zine I'm writing for. So it's not a ROM with inserted images, but an actual program (fed through executing contents of input registers as code) that generates them.
That sounds like a pretty ambitious plan for someone who's never seen SNES code before.
I don't suppose you could load a graphics loader and then upload a bunch of tiles through the controller ports...
hcolor looks pretty impressive. I'll have to read back to the start when I have a night with some actual spare time and see how you did it...
Anyways, here's roughly what I was thinking of for the Arbitrary Code, in WLA format:
Code:
sep #$20 ;8 bit A
rep #$10 ;16 bit X / Y
lda.b #$00
pha
plb ;set bank to zero, in case it was in a bank without the hardware registers
stz.w $4200 ;disable all interrupts (and joypad reading)
lda.b #$80
sta.w $2100 ;turn screen off
lda.b #$04
sta.w $212C ;enable BG3, disable all others and sprites
lda.b #$41
sta.w $2105 ;set BG3 tile size to 16x16 and Mode 1
stz.w $2109 ;set BG3 TileMAP VRAM location to $0000 and map size to 32x32
lda.b #$01
sta.w $210C ;set BG3 TileSET VRAM location to $1000 (low nibble counts in $1000-word steps; VRAM addresses are measured in WORDS not bytes)
stz.w $2111
stz.w $2111 ;BG3HOFS is write-twice (low byte, then high byte)
stz.w $2112
stz.w $2112 ;same for BG3VOFS, so both scrolls are now zero
;===== SET UP DMA TO CLEAR BG3 TILEMAP =====
lda.b #$80
sta.w $2115 ;set up VRAM port to accept the DMA properly
ldx.w #$0000
stx.w $2116 ;VRAM write address is $0000 for tilemap
lda.b #$09
sta.w $4300 ;Set DMA Control: CPU->PPU, fixed source address (bits 4-3 = 01), 2 registers ($2118/$2119). ($11 would decrement the source instead.)
lda.b #$18
sta.w $4301 ;set destination address to VRAM port - $2118
ldx.w #$???? ;address of a place in ROM that has a $0000
stx.w $4302
lda.b #$?? ;bank of said place in ROM
sta.w $4304
ldx.w #$0800
stx.w $4305 ;Number of bytes to transfer - $800 should cover a 32x32 tilemap
lda.b #$01
sta.w $420B ;initiate DMA to clear BG3 Tilemap.
;======= WRITE TILES TO VRAM MANUALLY ======
;$2115 should still be set up correctly
ldx.w #$1000
stx.w $2116 ;VRAM write address is $1000 for tileSET
ldx.w #$????
stx.w $2118
ldx.w #$????
stx.w $2118 ;write your tiles like so
[...]
ldx.w #$????
stx.w $2118
;====== WRITE TILEMAP TO VRAM MANUALLY =====
;same idea as writing the tile set, just reset $2116 to point to the right part of the tilemap
;======= WRITE SOME PALETTES TO CGRAM ======
lda.b #$01 ;Color zero is background colour, so unless it's bad, start at one.
sta.w $2121 ;remember, for BG3 (2bpp), a palette is four colours.
;and color zero of each palette is always transparent.
lda.b #$??
sta.w $2122
lda.b #$??
sta.w $2122 ;write color data - format is 0BBBBBGG GGGRRRRR. Write lo byte then hi to same register
lda.b #$??
sta.w $2122
lda.b #$??
sta.w $2122 ;color 2...
lda.b #$??
sta.w $2122
lda.b #$??
sta.w $2122 ;color 3. So that gives you one full palette, do more if you want/have room
;============= DISPLAY RESULTS =============
lda.b #$0F
sta.w $2100 ;turn screen on at full brightness (stz would leave brightness at 0, i.e. black)
STP ;halt processor. Done!
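On the tilemap-clearing question: going by the DMAPx layout in the usual register docs (bit 7 direction, bit 6 HDMA indirect, bits 4-3 A-bus address step where 0 = increment, 1 or 3 = fixed, 2 = decrement, bits 2-0 transfer unit), this small Python decoder shows what a given $4300 value does. Note that $11 decodes as a decrementing source, while $09 gives a fixed source with the two-register ($2118/$2119) unit. Transfer units 5-7 are left as "reserved/undocumented" here rather than guessed at.

```python
STEP = {0: "increment", 1: "fixed", 2: "decrement", 3: "fixed"}
UNIT = {
    0: "1 byte: p",
    1: "2 bytes: p, p+1",
    2: "2 bytes: p, p",
    3: "4 bytes: p, p, p+1, p+1",
    4: "4 bytes: p, p+1, p+2, p+3",
}


def decode_dmap(value):
    """Split a DMAPx ($43x0) byte into its fields."""
    return {
        "direction": "B->A (I/O to CPU)" if value & 0x80 else "A->B (CPU to I/O)",
        "hdma_indirect": bool(value & 0x40),
        "a_bus_step": STEP[(value >> 3) & 0x03],
        "unit": UNIT.get(value & 0x07, "reserved/undocumented"),
    }


for v in (0x01, 0x09, 0x11):
    d = decode_dmap(v)
    print("$%02X" % v, d["a_bus_step"], "|", d["unit"])
# $01 increment | 2 bytes: p, p+1
# $09 fixed | 2 bytes: p, p+1
# $11 decrement | 2 bytes: p, p+1
```

With a fixed source, every byte of the transfer is read from the same address, so a single $00 byte in ROM is enough to clear the tilemap.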
Can anyone tell me if I have the right idea on how to do the tilemap-clearing DMA? Set the "do not increment" bit in $4300, then find an address in ROM with $0000 and set it as the source? I think that's how it works.
Then, all you need to do is come up with some simple tiles for the logo and a colour palette, and it should work.
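For the tiles and palette, a small Python helper can work out the `$????` values. It assumes the standard SNES 2bpp tile layout (eight rows, each stored as a bitplane-0 byte followed by a bitplane-1 byte, leftmost pixel in bit 7) and the 0BBBBBGG GGGRRRRR colour format from the code comments. Because a 16-bit `stx.w $2118` writes the low byte to $2118 and the high byte to $2119, each tile row becomes one word with plane 0 in the low byte. The helper names and the sample tile are placeholders.

```python
def bgr555(r, g, b):
    """Pack 5-bit channels (0-31) as 0BBBBBGG GGGRRRRR."""
    return (b << 10) | (g << 5) | r


def cgdata_bytes(colour):
    """Bytes in the order they go to CGDATA ($2122): low, then high."""
    return [colour & 0xFF, (colour >> 8) & 0x7F]


def encode_2bpp_tile(rows):
    """rows: 8 rows of 8 pixel values (0-3). Returns the 16-byte tile."""
    out = bytearray()
    for row in rows:
        plane0 = plane1 = 0
        for x, pix in enumerate(row):
            plane0 |= (pix & 1) << (7 - x)
            plane1 |= ((pix >> 1) & 1) << (7 - x)
        out += bytes([plane0, plane1])
    return bytes(out)


def tile_words(tile):
    """One 16-bit word per row (plane 0 low, plane 1 high) for stx.w $2118."""
    return [tile[i] | (tile[i + 1] << 8) for i in range(0, len(tile), 2)]


# Placeholder tile: colour 3 in the left column, colour 1 everywhere else.
tile = encode_2bpp_tile([[3] + [1] * 7 for _ in range(8)])
print(["$%04X" % w for w in tile_words(tile)][:1])  # ['$80FF']
print(cgdata_bytes(bgr555(31, 0, 0)))                # [31, 0]
```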
93143 wrote:
I don't suppose you could load a graphics loader and then upload a bunch of tiles through the controller ports...
... I'm really not sure of how the process works now that I think about it. Can you just execute controller input directly? Does "open bus" have to get involved, like I hear was used in the SMW ACE TAS?
93143 wrote:
That sounds like a pretty ambitious plan for someone who's never seen SNES code before.
I don't suppose you could load a graphics loader and then upload a bunch of tiles through the controller ports...
Khaz wrote:
... I'm really not sure of how the process works now that I think about it. Can you just execute controller input directly? Does "open bus" have to get involved, like I hear was used in the SMW ACE TAS?
http://www.youtube.com/watch?v=YHyaTCuZRzM#t=245
Notice how little time it took to program that thing in. We have 8 controllers plugged in via Multitap, so we can feed data really fast.
http://tasvideos.org/3957S.html
There's already a setup to jump there, so the only thing left is the payload itself.
Oh, it's we, is it?
Okay, yeah, I hadn't seen that. All I saw was the Pong clone. If you can load data that quickly, you can pretty much do whatever you want.
feos wrote:
Notice how little time it took to program that thing in. We have 8 controllers plugged via Multitap, so we can feed data really fast.
I worry about the stresses on the power supply from using two multitaps long-term. You might remember that some Super FX games display an error message and halt if a high-current-draw peripheral, such as the multitap or Super NES Mouse, is connected. But that'd still be impressive with only two multitaps.
feos wrote:
http://tasvideos.org/3957S.html
There's already a setup to jump to there, so the only thing left is the very payload.
Okay, so now that I've refreshed my memory... Does the entire thing happen in that same controller loop? Execute three bytes, WAI-NOP-WAI, branch back and read another three bytes? The link there mentions that being "unstable", and it seems like it would take forever to do anything useful if you only get one instruction off every other frame, so am I misunderstanding how it works?
If that is stable and predictable though, sounds trivial to send it the program I wrote there and I don't see that there'd be any real limit on how long you could do it for...
Here's an lsnes movie that was replayed in the vid I posted.
http://tasvideos.org/userfiles/info/19873869950560708
You can open the movie as an archive and read the inputs as plain text.
Another attempt at the game-end glitch has slightly different input than the one I posted above, it seems:
http://tasvideos.org/4315S.html
Just a guess based on the description of 3957S:
- SMW glitchy gameplay loads a first-stage bootloader optimized heavily for size over speed. This is where WAI/NOP/WAI comes in, to ensure the autoreader has had a chance to read the controller.
- The first-stage bootloader slowly loads a second-stage bootloader that can make full use of the multitaps.
- The second-stage bootloader rapidly loads the game into RAM.
In the 'BGMODE or parameter changes during scanline' thread, byuu wrote:
I've never once seen a CGRAM write fail.
Thanks for the tip. I broke higan again.
Writing to CGRAM during active display to transfer data would require absolutely perfect CGRAM fetch timings. It's possible that with a few small tweaks we can make it look nicer, but we're a very long way from being able to pull that off flawlessly. Especially if you're using actual tiles there in any way (which I'm guessing you are to avoid having to write $2121 as well?)
Still though, it's pretty close. At least, it's infinitely better than literally everything else ;)
At any rate, you're a monster, congrats :P
93143 wrote:
In the 'BGMODE or parameter changes during scanline' thread, byuu wrote:
I've never once seen a CGRAM write fail.
Thanks for the tip. I broke higan again.
I wonder how well that would look with a transparent tiled overlay on it. Although it wouldn't be 60fps, you could do something like a JPEG, where you have one color in a 4x1 pixel area and 4 shades of that pixel using color math, like just subtracting various amounts. As you already know, there are 4 possible colors with a 2bpp pixel. I bet it would look amazing if you overlaid an 8bpp layer on it, as you could handle more than just shading. (You could change the tint to an extent.) This whole thing is possible, isn't it? I mean, as long as you're changing color 1 instead of color 0.
I didn't inspect what was actually going on, just looked at the screen after running in higan.
If I were to guess, he's designed it so that each scanline contains tiledata that represent color palette entries 0,0,0,0,1,1,1,1,....62,62,62,62,63,63,63,63
That way each DMA can just write to $2122 twice per pixel. That takes 16 clocks, which is 4 pixels. Which explains the 4:1 ratio. Having to write to $2121 to reset the CGRAM index every time would result in an 8:1 ratio.
Essentially, he's created a 64x224x15bpp@60fps video mode :P
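The 4:1 figure falls out of simple arithmetic. A Python sketch, assuming 8 master clocks per DMA byte and 4 master clocks per ordinary dot (ignoring the two long dots in each line): two CGDATA bytes per colour cost 16 clocks, i.e. 4 dots, so a 256-dot line holds 64 colours. If the index had to be rewritten as well, one way to do it within a single DMA would be transfer unit 3 aimed at $2121 ($2121, $2121, $2122, $2122), which is 4 bytes per colour and gives the 8:1 ratio mentioned above.

```python
CLOCKS_PER_DMA_BYTE = 8  # master clocks per byte of general-purpose DMA
CLOCKS_PER_DOT = 4       # ordinary dot; the long dots are ignored here
DOTS_PER_LINE = 256


def dots_per_colour(bytes_per_colour):
    return bytes_per_colour * CLOCKS_PER_DMA_BYTE // CLOCKS_PER_DOT


def colours_per_line(bytes_per_colour):
    return DOTS_PER_LINE // dots_per_colour(bytes_per_colour)


# Two $2122 writes per colour (index supplied by the tile pattern):
print(dots_per_colour(2), colours_per_line(2))  # 4 64
# $2121, $2121, $2122, $2122 via transfer unit 3:
print(dots_per_colour(4), colours_per_line(4))  # 8 32
```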
With MSU1, you could play back such a stream of video, too. But ... I doubt it'd be very pleasing or useful.
A 2bpp layer blended onto this layer to enhance luminance detail would provide the equivalent of 4:1:1 video, with brightness coming mostly from the 2bpp layer and color coming from the 15bpp layer.
You're not going to be able to do anything in addition to this.
He's literally bit-blasting the final output color fetches themselves and rewriting the entire palette as he goes along.
EDIT: actually, shit. Not even DMA can overcome DRAM refresh, yet I don't see any penalty.
Alright I give up, I have no idea what the hell he's doing here. I guess we have to sit around and wait for him to decide to tell us.
byuu wrote:
If I were to guess, he's designed it so that each scanline contains tiledata that represent color palette entries 0,0,0,0,1,1,1,1,....62,62,62,62,63,63,63,63
Actually, I just used 1 through 5 in a repeating pattern, since that's all I needed to buffer through refresh. But that's the general idea, yeah.
Originally, I was planning to do the refresh area with escalating-index tiles and preload it during HBlank, and do everything else on a constant index. But since DMA aligns to a multiple of 8 master cycles since the last reset, and scanlines aren't divisible by that, there would have been an unavoidable offset of at least one dot every scanline. And since every other frame is a dot cycle short, a static pattern would have been impossible. I thought of trying to pull off a scrolling diagonal pattern, which might have resulted in a higher perceived horizontal resolution due to the larger amount of data present when averaging all frames, but it would have required complicated timed code, and besides, I really wanted to do a direct static equivalent to the Mega Drive's FantomBitmap mode.
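The one-dot-per-scanline offset can be checked with a few lines of Python. This assumes NTSC timing of 1364 master clocks per scanline, 262 scanlines per frame, one scanline that is 4 clocks short on alternate non-interlaced frames, 4 clocks per dot, and (as described above) DMA starting on an 8-clock boundary counted from reset. Since 1364 is 4 more than a multiple of 8, the grid's position relative to the start of each line alternates by one dot from line to line, and a short frame flips that alternation.

```python
LINE_CLOCKS = 1364    # master clocks per normal NTSC scanline
SHORT_BY = 4          # one line is this much shorter on alternate frames
LINES_PER_FRAME = 262
DMA_GRID = 8          # DMA starts on a multiple of 8 master clocks
DOT_CLOCKS = 4


def grid_phase_per_line(lines, start_clock=0):
    """Dots from the start of each line to the next 8-clock DMA boundary."""
    return [((-(start_clock + n * LINE_CLOCKS)) % DMA_GRID) // DOT_CLOCKS
            for n in range(lines)]


def frame_phase_shift(short_frame):
    """How far (in dots) one whole frame moves the grid relative to line 0."""
    total = LINES_PER_FRAME * LINE_CLOCKS - (SHORT_BY if short_frame else 0)
    return (total % DMA_GRID) // DOT_CLOCKS


print(grid_phase_per_line(6))                              # [0, 1, 0, 1, 0, 1]
print(frame_phase_shift(False), frame_phase_shift(True))  # 0 1
```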
In testing, I noticed that ordinary CPU writes to CGDATA would randomly target indices from the main screen and subscreen, about 50:50. Which makes sense, because CPU activity is only quantized to half-dot resolution. But DMA does that 8-clock alignment thing before it starts, and at least on my SNES this results in DMA reliably targeting the main screen.
So what I did was I put a preload pattern in BG1 on the main screen (to set the CGRAM address to receive writes) and a display pattern in BG2 on the subscreen (to output the colours written to the preload pattern), turned on colour addition, and had the PPU clip the main screen to black before math. Turns out it still reads the main screen pixel colour when doing that, so the preload pattern works fine even though you can't see it. (I was originally hoping DMA would target the subscreen, because the PPU will read CGRAM for a BG on the subscreen even if colour math isn't turned on, but this trick works just as well.)
I may have misunderstood something here, but the upshot would appear to be the same...
...
This implementation is interrupt-driven, with a separate DMA transfer for each line (and as you can see from the bsnes image below, the timing is just barely tight enough; IIRC kicking the H-IRQ one dot in either direction results in misses). I imagine it's possible to do it with timed code, which would free up enough cycles to allow an extra DMA to a different destination - say, four bytes to the APU each line... I might try that some time.
The first few colours of the line are preloaded during HBlank, so as to buffer enough colours to get through the DRAM refresh. The required order of the colours is a bit garbled, but that's the one part that every emulator gets right (even ZSNES), so I'm guessing I didn't feed pcx2snes carefully enough (garbled colour indices in my patterns, I imagine). If I can fix it, the data format should be a simple raster list of 15-bit values with no wasted entries. I'll post source once I've sussed that out.
Quote:
Still though, it's pretty close. At least, it's infinitely better than literally everything else

Being lazy, I'm still using bsnes v072 a fair bit (it can open an SFC file on double-click, and it has blargg's NTSC filter); the accuracy core is naturally worse than higan's latest but far better than the rest of the pack. On the other end of the spectrum, ZSNES somehow manages to bodge up even the easy part...
Attachment:
dmacolor_bsnes072.png [47.27 KiB]
Attachment:
dmacolor_zsnes.png [2.16 KiB]
Quote:
At any rate, you're a monster, congrats

Thankyew, thankyew...
Espozo wrote:
I wonder how well that would look with a transparent tiled overlay on it.
I don't think that would work, unless there's another way to do this that doesn't completely kill the main screen*. As it stands, colour blending of any kind is impossible (well, you could do averaging instead of addition and get half brightness, but a fat lot of good that does you, especially with the IRQ approach where the whole main screen has to be indexed to receive writes).
However, while I haven't tried it, I imagine the sprite system and the spare 2bpp BG3 layer would display fine if you just sent them to the subscreen. If sprites work, that's at least half a screenful of Quantomatic on top of what I've got here... or heck, you could play Super Mario Bros. with a video of F-Zero GX playing in the background (will the code for SMB fit in VBlank without egregious slowdown?)...
...
* Actually, it occurs to me that my original idea with the diagonal scrolling pattern (or just vertical columns with 30fps edge dither) should work with transparency, since it would be writing directly to the visible layer, and in my experience DMA never accidentally writes to the subscreen. But I don't know how good it'd look...
Quote:
Essentially, he's created a 64x224x15bpp@60fps video mode

With MSU1, you could play back such a stream of video, too. But ... I doubt it'd be very pleasing or useful.
True - unfortunately the aesthetic difference between double-wide pixels and quadruple-wide pixels is fairly substantial. One could add static horizontal dither to the display pattern, but it might look funny (haven't tried it), and the actual amount of information wouldn't change... This technique might be good for impressionist backgrounds, or impressing people with raw colour counts...
Attachment:
6315colours.png [33.04 KiB]
But I think this does finally answer the OP's question. You can, in fact, do DMA colour on the SNES. Shame about the bus width...
Well, that answers one of my questions about the S-PPU. During rendering, it evidently always does a CGRAM read for every main and sub pixel, regardless of how any of the math registers are set. That's pretty much what I expected, but it's nice to have it confirmed. Also, it's interesting that DMA accesses to CGDATA always line up with the main fetches rather than the sub fetches. I wonder if that's true on all SNES models, including the mini-SNES (should be easy to test; if I understand how the demo works correctly, the picture will be more or less completely garbled if it isn't the case).
> The first few colours of the line are preloaded during HBlank, so as to buffer enough colours to get through the DRAM refresh.
Ooooooooooooooh you sneaky, sneaky bastard! :P
That's brilliant, seriously. Double congratulations, you not only broke higan, you've completely stumped and outsmarted me too :D
> Being lazy, I'm still using bsnes v072 a fair bit (it can open an SFC file on double-click, and it has blargg's NTSC filter)
blargg's filter's not coming back, but I'm sure I'll have something in place for direct-launch SFC eventually. Still working on icarus.
As for your test ROM, it might not require very significant changes to pass this one test. But I think as we move around and toggle other effects, it'll break again very easily. It's definitely a fun test and would be nice to run correctly, but I'm thinking that it'd probably be a better use of time to start working on timing out as much as we can through clever OAM/CGRAM accesses. I don't think we're ever going to find someone amazing at both writing SNES test ROMs and using a logic analyzer to trace out what the PPU chips are fetching and when, so I really don't know how we're going to get perfected VRAM access timings (or if it's even important, given it's untestable through emulation.)
> or impressing people with raw colour counts
It's easy to hit all 32768 colors on one screen already. And in fact, it's possible to go beyond 15-bit color, as the INIDISP luminance is an analog effect (which actually makes it really hard to say just how many colors the SNES can really produce. No telling how many colors are matches to other colors at different luminance levels.)
The best usage of this trick is probably to annoy emulator authors :D
byuu wrote:
The best usage of this trick is probably to annoy emulator authors

Such as to make screens where a picture of a character appears on the real thing but a skull and crossbones appear on emulators.
tepples wrote:
Such as to make screens where a picture of a character appears on the real thing but a skull and crossbones appear on emulators.
Are you guys seriously talking about inventing new DRM for a 26 year old console?
... Can I put it in my game? >.>
Khaz wrote:
Are you guys seriously talking about inventing new DRM for a 26 year old console?
Either that or giving an incentive for people to make their emulators more accurate.
Okay, apparently the palette ordering in pcx2snes has nothing to do with the order in which the colours appear in the image. It seems to be in ascending numerical order or something...
Anyway, I seem to have fixed it. The input data format is now sane: 15-bit BGR values left-to-right, row by row top-to-bottom, 64x224. 28 KB exactly. Unless I'm much mistaken, 64x239 should work without any changes other than switching the display height and extending the input data.
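Producing data in that format from an ordinary image is straightforward. A Python sketch, assuming 8-bit-per-channel source pixels that are simply truncated to 5 bits (no dithering) and 16-bit words stored low byte first, the order CGDATA wants them in; `pack_raster` is a made-up helper name, not part of the demo's tools. 64 x 224 pixels x 2 bytes gives the 28 KB (28,672 bytes) quoted above.

```python
import struct

WIDTH, HEIGHT = 64, 224


def to_bgr555(r8, g8, b8):
    """Truncate 8-bit channels to 5 bits and pack as 0BBBBBGG GGGRRRRR."""
    return ((b8 >> 3) << 10) | ((g8 >> 3) << 5) | (r8 >> 3)


def pack_raster(pixels, width=WIDTH, height=HEIGHT):
    """pixels: rows top to bottom, each row a list of (r, g, b) left to right."""
    if len(pixels) != height or any(len(row) != width for row in pixels):
        raise ValueError("expected %dx%d pixels" % (width, height))
    words = [to_bgr555(*px) for row in pixels for px in row]
    return struct.pack("<%dH" % len(words), *words)


# Hypothetical test image: a horizontal red ramp, identical on every row.
ramp = [[(x * 4, 0, 0) for x in range(WIDTH)] for _ in range(HEIGHT)]
data = pack_raster(ramp)
print(len(data))  # 28672
```

If you'd rather start from a PNG, Pillow's `Image.open(path).convert("RGB").resize((64, 224))` plus `getdata()` can supply the pixel tuples.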
Please excuse the name change; I was using a different name entirely in testing so I didn't realize until just now that my Super Everdrive won't load a file with exactly 8 letters in the name... and anyway it's more correct now...
Attachment:
dmacolour.7z [108 KiB]
byuu wrote:
That's brilliant, seriously. Double congratulations, you not only broke higan, you've completely stumped and outsmarted me too

Careful, my head's gonna pop...
tepples wrote:
A 2bpp layer blended onto this layer to enhance luminance detail would provide the equivalent of 4:1:1 video, with brightness coming mostly from the 2bpp layer and color coming from the 15bpp layer.
I wonder if that might actually produce a half-decent-looking video mode. To be able to use colour math, I'd have to use the timed-code version with a shifting offset pattern, which I haven't written yet, so I'm not sure it works... and even at 2bpp, there's not enough bandwidth for the whole screen, so either each frame would have to be optimized to fit (potentially very lossy) or else the video would have to run at a lower frame rate or screen size (which kinda defeats the purpose of the technique)... Even 1-bit luminance would eat 7 KB per frame, or 8 with multiple subpalettes (which seems like a good idea) or a bit more if you want to overwrite the actual subpalettes each frame... but 208 lines at 30fps should work without compressing the 2bpp layer, and 216 lines looks like it should work at 60fps (edit: 50fps) on PAL...
I don't see why the RGB layer couldn't supply peak brightness information, with the subpaletted 2bpp layer just providing deltas via subtraction. Kinda like BRR...
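The raw sizes behind those figures are easy to tabulate. This Python only counts pixel data (a planar layer is width x height x bpp / 8 bytes, and the RGB layer is 2 bytes per colour); tilemap, palette and any compression overhead are left out, so treat it as a lower bound rather than a bandwidth budget.

```python
def plane_bytes(width, height, bpp):
    """Pixel data of a planar tile layer, ignoring tilemap/palette overhead."""
    return width * height * bpp // 8


def rgb_layer_bytes(width=64, height=224, bytes_per_colour=2):
    """Pixel data for the direct-colour DMA layer."""
    return width * height * bytes_per_colour


print(plane_bytes(256, 224, 1))  # 7168 (7 KB)
print(plane_bytes(256, 208, 2))  # 13312
print(plane_bytes(256, 216, 2))  # 13824
print(rgb_layer_bytes())         # 28672
```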
93143 wrote:
60fps on PAL...
Huh?

You probably mean that 1 hardware frame = 1 video frame, which would be 50fps on PAL, right? Personally, I think that video at 30/25fps using the techniques you're describing would look amazing on the SNES, no need to shoot for 60/50fps.
With the improperly spelled test ROM, I get this pattern:
(Hcounter in decimal) (CGRAM addr[actual write location])=(value written)
Where actual write location = override of CGRAM write address based on palette fetch from PPU itself
Code:
0024 0003[0002]=21
0032 0004[0003]=04
0040 0005[0004]=21
0048 0006[0005]=04
0056 0007[0006]=22
0064 0008[0007]=04
0072 0009[0008]=42
0080 000a[0009]=04
[[Hstart]]
0088 000b[0002]=42 <- will write 4204 (takes the last write before Hstart for the low byte latch)
0096 000c[0004]=08
0104 000d[0004]=21
0112 000e[0006]=04
0120 000f[0006]=21
0128 0010[0008]=04
0136 0011[0008]=21
0144 0012[000a]=08
0152 0013[000a]=41
0160 0014[0002]=08 0168 0015[0002]=42
0176 0016[0004]=08 0184 0017[0004]=65
0192 0018[0006]=0c 0200 0019[0006]=63
0208 001a[0008]=0c 0216 001b[0008]=63
0224 001c[000a]=10 0232 001d[000a]=62
0240 001e[0002]=0c 0248 001f[0002]=a4
0256 0020[0004]=18 0264 0021[0004]=83
0272 0022[0006]=10 0280 0023[0006]=29
0288 0024[0008]=19 0296 0025[0008]=6b
0304 0026[000a]=25 0312 0027[000a]=ad
0320 0028[0002]=29 0328 0029[0002]=84
0336 002a[0004]=0c 0344 002b[0004]=28
0352 002c[0006]=19 0360 002d[0006]=6d
0368 002e[0008]=5f 0376 002f[0008]=ef
0384 0030[000a]=73 0392 0031[000a]=08
0400 0032[0002]=57 0408 0033[0002]=62
0416 0034[0004]=21 0424 0035[0004]=44
0432 0036[0006]=36 0440 0037[0006]=64
0448 0038[0008]=3e 0456 0039[0008]=48
0464 003a[000a]=29 0472 003b[000a]=cf
0480 003c[0002]=41 0488 003d[0002]=ae
0496 003e[0004]=39 0504 003f[0004]=4b
0512 0040[0006]=29 0520 0041[0006]=af
0528 0042[0006]=35 0536 0043[0006]=54
[[DRAM refresh]] <- skips 0008,0009 [color# 4]
0584 0044[000a]=4e 0592 0045[000a]=50
0600 0046[0002]=63 0608 0047[0002]=f1
0616 0048[0004]=7b 0624 0049[0004]=ab
0632 004a[0006]=52 0640 004b[0006]=26
0648 004c[0008]=25 0656 004d[0008]=27
0664 004e[000a]=3a 0672 004f[000a]=27
0680 0050[0002]=3a 0688 0051[0002]=4c
0696 0052[0004]=46 0704 0053[0004]=d3
0712 0054[0006]=5e 0720 0055[0006]=14
0728 0056[0008]=67 0736 0057[0008]=57
0744 0058[000a]=6f 0752 0059[000a]=98
0760 005a[0002]=7b 0768 005b[0002]=55
0776 005c[0004]=77 0784 005d[0004]=d0
0792 005e[0006]=6a 0800 005f[0006]=cf
0808 0060[0008]=66 0816 0061[0008]=94
0824 0062[000a]=5a 0832 0063[000a]=56
0840 0064[0002]=18 0848 0065[0002]=56
0856 0066[0004]=14 0864 0067[0004]=54
0872 0068[0006]=14 0880 0069[0006]=70
0888 006a[0008]=18 0896 006b[0008]=54
0904 006c[000a]=14 0912 006d[000a]=79
0920 006e[0002]=4e 0928 006f[0002]=d0
0936 0070[0004]=3d 0944 0071[0004]=77
0952 0072[0006]=42 0960 0073[0006]=12
0968 0074[0008]=42 0976 0075[0008]=8c
0984 0076[000a]=35 0992 0077[000a]=55
1000 0078[0002]=52
1008 0079[0002]=54
1016 007a[0004]=4a
1024 007b[0004]=6f
1032 007c[0006]=3a
1040 007d[0006]=ca
1048 007e[0008]=29
1056 007f[0008]=6c
... kind of goes haywire here
1064 0080[0002]=46
1072 0081[0002]=ab
1080 0082[0002]=2d
[[Hend]]
The data doesn't look good at all.
Sometimes it starts at Hcounter=20, sometimes at Hcounter=24.
But it's starting way before the screen is supposed to be rendering.
Screen rendering is, as far as I know, around Hclock 88 - 1095. I based that off actual writes to CGRAM during active display and seeing when the write address was overridden. So that's where the timings in the MMIO write come from.
My main PPU render loop sleeps 28 clocks, then processes seven 'negative pixels' (to handle when you have Hscroll&7!=0 on tiledata) which is another 28 clocks. That totals 56 clocks.
The minimum value that runs your test correctly is 54. And 54+28=82. The highest value that works is 60, which of course 60+28=88.
This gives us much nicer data:
Code:
0024 0003[0002]=21 0032 0004[0003]=04
0040 0005[0004]=21 0048 0006[0005]=04
0056 0007[0006]=22 0064 0008[0007]=04
0072 0009[0008]=42 0080 000a[0009]=04
[[Hbegin]]
0088 000b[000a]=42 0096 000c[000a]=08
0104 000d[0002]=21 0112 000e[0002]=04
0120 000f[0004]=21 0128 0010[0004]=04
0136 0011[0006]=21 0144 0012[0006]=08
0152 0013[0008]=41 0160 0014[0008]=08
0168 0015[000a]=42 0176 0016[000a]=08
0184 0017[0002]=65 0192 0018[0002]=0c
0200 0019[0004]=63 0208 001a[0004]=0c
0216 001b[0006]=63 0224 001c[0006]=10
0232 001d[0008]=62 0240 001e[0008]=0c
0248 001f[000a]=a4 0256 0020[000a]=18
0264 0021[0002]=83 0272 0022[0002]=10
0280 0023[0004]=29 0288 0024[0004]=19
0296 0025[0006]=6b 0304 0026[0006]=25
0312 0027[0008]=ad 0320 0028[0008]=29
0328 0029[000a]=84 0336 002a[000a]=0c
0344 002b[0002]=28 0352 002c[0002]=19
0360 002d[0004]=6d 0368 002e[0004]=5f
0376 002f[0006]=ef 0384 0030[0006]=73
0392 0031[0008]=08 0400 0032[0008]=57
0408 0033[000a]=62 0416 0034[000a]=21
0424 0035[0002]=44 0432 0036[0002]=36
0440 0037[0004]=64 0448 0038[0004]=3e
0456 0039[0006]=48 0464 003a[0006]=29
0472 003b[0008]=cf 0480 003c[0008]=41
0488 003d[000a]=ae 0496 003e[000a]=39
0504 003f[0002]=4b 0512 0040[0002]=29
0520 0041[0004]=af 0528 0042[0004]=35
0536 0043[0006]=54 0584 0044[0006]=4e <- DRAM refresh in the middle
0592 0045[0008]=50 0600 0046[0008]=63
0608 0047[000a]=f1 0616 0048[000a]=7b
0624 0049[0002]=ab 0632 004a[0002]=52
0640 004b[0004]=26 0648 004c[0004]=25
0656 004d[0006]=27 0664 004e[0006]=3a
0672 004f[0008]=27 0680 0050[0008]=3a
0688 0051[000a]=4c 0696 0052[000a]=46
0704 0053[0002]=d3 0712 0054[0002]=5e
0720 0055[0004]=14 0728 0056[0004]=67
0736 0057[0006]=57 0744 0058[0006]=6f
0752 0059[0008]=98 0760 005a[0008]=7b
0768 005b[000a]=55 0776 005c[000a]=77
0784 005d[0002]=d0 0792 005e[0002]=6a
0800 005f[0004]=cf 0808 0060[0004]=66
0816 0061[0006]=94 0824 0062[0006]=5a
0832 0063[0008]=56 0840 0064[0008]=18
0848 0065[000a]=56 0856 0066[000a]=14
0864 0067[0002]=54 0872 0068[0002]=14
0880 0069[0004]=70 0888 006a[0004]=18
0896 006b[0006]=54 0904 006c[0006]=14
0912 006d[0008]=79 0920 006e[0008]=4e
0928 006f[000a]=d0 0936 0070[000a]=3d
0944 0071[0002]=77 0952 0072[0002]=42
0960 0073[0004]=12 0968 0074[0004]=42
0976 0075[0006]=8c 0984 0076[0006]=35
0992 0077[0008]=55 1000 0078[0008]=52
1008 0079[000a]=54 1016 007a[000a]=4a
1024 007b[0002]=6f 1032 007c[0002]=3a
1040 007d[0004]=ca 1048 007e[0004]=29
1056 007f[0006]=6c 1064 0080[0006]=46
1072 0081[0008]=ab 1080 0082[0008]=2d
[[Hend]]
But any value above 44 will break Air Strike Patrol's "Good Luck" rotating text (the very last tile will start to flicker.) ASP of course uses BG3HOFS writes. So most likely, the HOFS caching is happening at the wrong time. And I'm pretty sure even lower numbers were needed to square away some glitchiness in other games (probably Super Mario World's water level.)
So basically, we're stuck in a whack-a-mole situation here. We're not ever going to perfectly emulate the PPU by futzing numbers until tests pass. We have to be able to time this stuff out properly. And I don't believe it's possible to do it through software and screenshots.
But, for the time being, enjoy this screenshot:
tepples wrote:
Such as to make screens where a picture of a character appears on the real thing but a skull and crossbones appear on emulators.

OK probably Fusion wasn't the best for this (due to the display off gap being visible with it) but oh well.
Khaz wrote:
Are you guys seriously talking about inventing new DRM for a 26 year old console?
Custom hardware on the cartridge would be much easier for that. Especially with the social pressure for emulator authors to not add emulation for it (because somehow it's OK to copy licensed games but not homebrew).
Quote:
Custom hardware on the cartridge would be much easier for that. Especially with the social pressure for emulator authors to not add emulation for it (because somehow it's OK to copy licensed games but not homebrew).
We've always had a huge double standard there. I've pointed out how hypocritical it was in the past.
Reproductions of game carts are fine, but of fan translations / ROM hacks are a sin.
Distributing commercially copyrighted ROMs is fine, but distributing pre-patched ROM hacks is very wrong.
Emulating commercial games and Chinese bootlegs is fine, but emulating Watermelon stuff is sacrilege.
We have to respect copyright unless it's our scene violating it, then it's okay.
Luckily, my own stance is not to emulate anything but commercial hardware (and my own stuff, heh), so I don't have to be faced with the dilemma of Project-N coprocessor emulation (assuming it ever comes out :P) I mean, I still wouldn't do it anyway out of personal respect to d4s, but yeah. There's an ethical quandary about doling out special favors like that. Choosing not to emulate anything unofficial is a wonderful way to avoid it entirely.
If history's any indication though, they can probably expect someone else to release an unofficial SNES emulator build that supports it; like they did for Pier Solar and MESS/HazeMD. Maybe they'll anticipate that and make the chip use some really nasty tricks like crypto.
byuu wrote:
We've always had a huge double standard there. I've pointed out how hypocritical it was in the past.
Reproductions of game carts are fine, but of fan translations / ROM hacks are a sin.
Distributing commercially copyrighted ROMs is fine, but distributing pre-patched ROM hacks is very wrong.
Is this likewise hypocritical?
For Sale: Star Fox 2 ROM and CIC chips
Whether you're okay with that is based on several factors:
* whether it's the translated ROM or not (I don't personally care)
* whether you're for or against people destroying retail carts (I am very much against it)
* whether you're for or against the original copyright owner's rights to the game (I don't care at all since they won't sell it)
These factors will vary from person to person. But in my own case, I'm against it because it's destroying real SuperFX carts.
The word hypocritical would be whether you engage in any of the above, but are against this guy's work. Or whether he's against something despite what he's selling.
byuu wrote:
* whether you're for or against the original copyright owner's rights to the game (I don't care at all since they won't sell it)
Whether the game's copyright owner can get nesdev.com shut down.
tepples wrote:
Whether the game's copyright owner can get nesdev.com shut down.
Might as well throw every other fan site in there too, then... It's all small beans IMO.
Sik wrote:
tepples wrote:
Such as to make screens where a picture of a character appears on the real thing but a skull and crossbones appear on emulators.

OK, Fusion probably wasn't the best choice for this (the display-off gap is visible in it), but oh well.
How many colors are you updating per line in that part of the demo?
15 colors per line. If I recall correctly kabuto told me he could get 23 in before getting serious artifacts, so there's some time left for sprite processing (hence the spinning text - the "YOUR EMULATOR SUXX" message appears if too many sprites get rendered, as only about 30 can be processed with the remaining time).
byuu wrote:
Distributing commercially copyrighted ROMs is fine, but distributing pre-patched ROM hacks is very wrong.
Reminds me of a variant: somehow standalone patches are OK, despite requiring the copyrighted ROM to work anyway. Even worse, the patches are made from a modified ROM, meaning that even the standalone patches are derivative works (and hence illegal). Ugh. And those patches usually include stuff that would be infringement anyway (e.g. artwork from existing characters).
But maybe this is just me being grumpy about having to find an IPS patcher that doesn't crash every time you attempt to use it.
Sik wrote:
But maybe this is just me being grumpy about having to find an IPS patcher that doesn't crash every time you attempt to use it.
I'd hope that ones written in Perl or Python wouldn't crash...
Sik wrote:
Reminds me of a variant: somehow standalone patches are OK, despite requiring the copyrighted ROM to work anyway. Even worse, the patches are made from a modified ROM, meaning that even the standalone patches are derivative works (and hence illegal). Ugh. And those patches usually include stuff that would be infringement anyway (e.g. artwork from existing characters).
Technically, the translation itself is illegal under the Berne Convention. Only the copyright owner can authorize translations. So fan translation patches are already entirely illegal.
And like you said, the patches do often carry over modified sprites, and IPS patchers especially tend to optimize by sometimes including parts of the original file instead of starting a new pointer.
There's no reason companies can't sue over free patches, they're just smart enough to realize that nothing would come out of it but terrible PR and lots of lawyer fees they'll never recoup.
It's pretty much impossible to explain anything to the ROM hacking scene. I'm not really sure why it is, but that scene seems to have just never grown up like the rest of us have. I'm only half surprised they don't still believe in the "24-hour rule" or the "Bill Clinton's bill says the authorities can't view this site!" nonsense from the '90s.
Quote:
But maybe this is just me being grumpy about having to find an IPS patcher that doesn't crash every time you attempt to use it.
Use Alcaro's Flips tool for the remaining IPS patches, and switch over to BPS going forward :D
BPS beats IPS in pretty much every possible way. Smaller files, integrity checking, validation checking, no limit on file size, delta encoding, can store authorship metadata, no chance for glitches (file offset of 'EOF'), no weird extensions (cut, etc), no trouble with copier headers (where successful patching is a coin toss), still a dead simple format, etc.
byuu wrote:
no chance for glitches (file offset of 'EOF')
To be fair, the whole "EOF" thing wasn't a problem originally since IPS was meant for Mega Drive ROMs (which could be at most 4MB (SSF2 wasn't dumped yet), thereby not running into the range taken by that sentinel value). Why it used a sentinel instead of just checking for the file's actual EOF is beyond me though.
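For anyone who hasn't looked at the format: an IPS patch is the ASCII header PATCH, then records made of a 3-byte big-endian offset, a 2-byte size and the data (size 0 means an RLE record: 2-byte count plus 1 fill byte), terminated by the ASCII bytes EOF. Read as an offset, "EOF" is 0x454F46, about 4.33 MiB, which is why a 4MB Mega Drive ROM never collides with it. A minimal Python sketch of an applier (function names are just for illustration, and it ignores the later truncation extension):

```python
EOF_MARKER = b"EOF"
# The sentinel read as a 24-bit big-endian offset: 0x454F46 = 4,542,278 bytes.
EOF_AS_OFFSET = int.from_bytes(EOF_MARKER, "big")
MEGA_DRIVE_MAX_ROM = 4 * 1024 * 1024  # 4 MiB, the limit mentioned above


def apply_ips(rom: bytes, patch: bytes) -> bytes:
    """Apply a plain IPS patch (no truncation extension) to a ROM image."""
    if patch[:5] != b"PATCH":
        raise ValueError("missing PATCH header")
    out = bytearray(rom)
    pos = 5
    while True:
        tag = patch[pos:pos + 3]
        if len(tag) < 3:
            raise ValueError("patch ended without EOF marker")
        if tag == EOF_MARKER:
            # Sentinel check: a genuine record at offset 0x454F46 stops here too.
            break
        offset = int.from_bytes(tag, "big")
        size = int.from_bytes(patch[pos + 3:pos + 5], "big")
        pos += 5
        if size == 0:
            # RLE record: 2-byte run length followed by a single fill byte.
            run = int.from_bytes(patch[pos:pos + 2], "big")
            data = bytes([patch[pos + 2]]) * run
            pos += 3
        else:
            data = patch[pos:pos + size]
            pos += size
        end = offset + len(data)
        if end > len(out):
            out.extend(bytes(end - len(out)))  # IPS is allowed to grow the file
        out[offset:end] = data
    return bytes(out)


print(hex(EOF_AS_OFFSET), EOF_AS_OFFSET > MEGA_DRIVE_MAX_ROM)  # 0x454f46 True
```

A record aimed at offset 0x454F46 is silently swallowed as end-of-patch, which is exactly the glitch byuu lists; checking the real end of the file instead would have avoided it.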
I have a question for byuu, or anyone else who might happen to know the answer and/or possess a TV capable of visibly displaying all 224 lines of a SNES video signal (mine has too much overscan).
I couldn't help but notice that when my DMA colour demo is run in bsnes or higan, it shows a brown line at the top of the screen, and the rest of the image is one line lower than it should be. So I moved the IRQ activation target from line 0 to line 261, with the result that the picture moved up one line to its correct position - but the top line was still not correct; it was just white instead of brown.
This seems exactly as though the H-IRQ on line 0 simply isn't firing. If it's the first DMA, it gets delayed one line, and the first line displays a repeating pattern consisting of the last few colours written at the end of the last frame. If it's the second DMA, the first one was in VBlank and presumably blew past the index range of the display pattern due to the lack of PPU fetches to redirect the writes, so the first line just displays the first few colours of the image in a repeating pattern.
I could be wrong, but that's certainly what it looks like it's doing. And I can't find any documentation that says it should behave this way. The closest I can find is this:
anomie wrote:
Also, no IRQ will trigger for dot 153 on the short scanline in non-interlace mode, and no IRQ will trigger for dot 153 on the last scanline of any frame.
But I'm not triggering anywhere near dot 153 (or 153h for that matter), nor is either of the noted scanlines the one that's giving me trouble.
...
My question is: Is this accurate behaviour? Or would a real SNES display the image correctly? Or could there be something else going on other than what I've inferred above; some sort of goofy bug in my code?
I've come up with a fix, and it seems to work in higan, but I'd rather not post it until I'm reasonably sure that (a) there was an actual problem, and (b) I actually fixed it...
My HDTV can display all 224 lines. I tested it with 240p Test Suite (for Super NES) 1.02.
[Attachment: grid_test_success.jpg]
I was planning on showing a zoomed-in version of the top row of pixels, but it turns out that on my 1/1/1 Super NES and SNES PowerPak running MUFASA firmware 11325, "dmacolour.7z" from this post runs like crap. Did I load the wrong ROM?
[Attachment: dmacolour_test_fail.jpg]
tepples wrote:
I was planning on showing a zoomed-in version of the top row of pixels, but it turns out that on my 1/1/1 Super NES and SNES PowerPak running MUFASA firmware 11325, "dmacolour.7z" from this post runs like crap. Did I load the wrong ROM?
There's a known bug related to -- you guessed it -- (H)DMA in the MUFASA firmware prior to build #11331, causing e.g. King of Dragons to fail completely. Shame on me.

The good news is that a fix has been available for over a month now.

Please download the latest build and try again. If it still doesn't work, you can at least safely rule out any PowerPak-related issues.
On 11331 I still get glitches, but now they're moving.
At least that would imply the PowerPak is at fault. How can firmware break HDMA, though?
That's not HDMA. It's regular DMA during active display, just like FantomBitmap on the Mega Drive.
And it's possible that it doesn't work properly on every console revision - it depends on the timing of the PPU fetch pattern, and on DMA aligning itself in such a way as to reliably target the CGRAM entry last read for the main screen as opposed to the subscreen (half-dot alignment). It works fine on my SNES, with a Super Everdrive, but mine is at least a 2/1/3 and potentially a 1CHIP (serial number is UN231084565). Disappointing if true (though it might be possible to change things around and get it working again), but I can't rule it out.
Those screenshots look an awful lot like what bsnes v072 gives me...
...except that it looks like the glitch line is missing, but considering how hard the rest of it failed I can't say I'm satisfied...
NTSC 2/1/1 with sd2snes v0.1.7pre4, before and after pressing the reset button.
Actually, sometimes I'm able to get it to come up glitch-free the first time, but sometimes a reset is necessary (and sometimes resetting from a "good" screen will boot back into a glitched one).


In other words, it looks like the timing of the DMA is still hit and miss and worked "reliably" in previous tests by pure coincidence?
For what it's worth, its MD counterpart relies on a refresh slot for timing (as those are longer). Without the refresh delay it would simply be impossible to make it reliable (the VDP is too fast for the 68000). What are you using as an anchor point on the SNES?
> In other words, it looks like the timing of the DMA is still hit and miss and worked "reliably" in previous tests by pure coincidence?
When I looked at the timing of the CGRAM writes via my tracer, it was pretty clear that 93143 wasn't synchronizing the start of the DMAs perfectly. There was fluctuation of 4-8 clocks for every transfer.
I know it's an extremely complicated issue, but it is possible to synchronize the H,V counters and DMA clock all to zero. And from there, have a perfectly reproducible state every time. Although it's still going to be hard to get around DRAM refresh moving around between hardware revisions.
It's really not worth the effort. It's a fun proof-of-concept, but totally useless for any real effects. The horizontal resolution is straight up Atari 2600-level garbage :(
byuu wrote:
Although it's still going to be hard to get around DRAM refresh moving around between hardware revisions.
Well, couldn't the program see what console version it is and adjust accordingly?
byuu wrote:
It's a fun proof-of-concept, but totally useless for any real effects.
If it is possible to overlay a BG layer on top of this, then it won't be. I remember someone saying that even a 1bpp layer (well, half of a 2bpp layer) couldn't be fully updated at 60fps, but I imagine it would work if you just took a couple of pixels off the top and the bottom. I'm not sure how much you'd gain in terms of picture quality, though, and I wonder how much you'd want to add if you were using color addition. I forgot, why can't color math work the way you have it, 93143?
You know, what's the technical explanation as to why this goes by every 4 pixels instead of every 2 like on the Genesis?
Espozo wrote:
byuu wrote:
Although it's still going to be hard to get around DRAM refresh moving around between hardware revisions.
Well, couldn't the program see what console version it is and adjust accordingly?
That might be a slight challenge when like 80% of consoles in the wild all call themselves 2/1/3 even though the hardware is vastly different. I don't know if DRAM refresh timing differs as much as the overall picture quality does, but reading the hardware revision numbers might not be enough.
It may not be the most "professional" thing ever, but you could always have a sort of manual calibration test at the beginning.
I've never had it not work on my SNES, reset or cold boot, reload or rerun. It's rock-solid every time. Maybe it's the rev.1 PPU2 causing the problem? Or maybe the Super Everdrive is shielding me from the failure mode somehow?
Sik wrote:
What are you using as an anchor point on the SNES?
I'm kinda cheating, actually. This is IRQ-driven with no special measures; hence the jitter byuu found.
There's one DMA transfer per scanline, which means the data is exactly what fits on the screen (28KB) in a linear bitmap format. Originally I was going to trigger the DMA with timed code, which would have required me to use an IRQ at the top of the screen followed by a cycle-counted stagger-stepping technique based on an H-counter read to align to a particular dot (I've done this before). And once I found out that DMA always targets the main screen (on my SNES) I started thinking about exploiting that to align the CPU to half-dot precision. Unfortunately, if I've correctly understood the way DMA aligns itself, just writing straight to the active layer would inevitably result in a one-dot offset between scanlines, and the offset pattern would change every two frames.
So I set up a loading pattern on the main screen and a display pattern on the subscreen, turned on colour addition, and set it to clip the main screen to black before math (which results in the subscreen being displayed by itself). So the DMA now writes to the CGRAM index pointed to by the (invisible) main screen, and a bit later the subscreen displays the colour. This allows DMA writes to happen with up to three dots worth of jitter with no visible artifacting, and it also allows me to start the DMA before the beginning of the active scanline and buffer enough colours to get through the DRAM refresh (I could have done that with the live-write technique, but it would have required a less straightforward data format).
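The 28KB figure is 64 colours per line x 2 bytes x 224 lines = 28,672 bytes, i.e. one 128-byte DMA per scanline. Here's a rough Python sketch of producing such a stream from an RGB image, assuming the usual SNES CGRAM format (15-bit 0bbbbbgggggrrrrr, low byte written first) and a 64-column width; the helper names are made up for illustration, and this isn't 93143's actual converter:

```python
WIDTH, HEIGHT = 64, 224  # colour columns x visible scanlines


def rgb888_to_bgr555(r: int, g: int, b: int) -> int:
    """Truncate 8-bit channels to 5 bits and pack as 0bbbbbgggggrrrrr."""
    return ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3)


def pack_bitmap(rows) -> bytes:
    """rows: HEIGHT lists of WIDTH (r, g, b) tuples -> linear CGRAM stream.

    Each scanline becomes one 128-byte block (one DMA per line); each colour
    is stored low byte first, the order in which CGDATA takes its two writes.
    """
    if len(rows) != HEIGHT:
        raise ValueError("expected %d rows" % HEIGHT)
    out = bytearray()
    for row in rows:
        if len(row) != WIDTH:
            raise ValueError("expected %d colours per row" % WIDTH)
        for r, g, b in row:
            out += rgb888_to_bgr555(r, g, b).to_bytes(2, "little")
    return bytes(out)


image = [[(255, 0, 0)] * WIDTH for _ in range(HEIGHT)]
data = pack_bitmap(image)
print(len(data), len(data) // HEIGHT)  # 28672 128
```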
What it looks like to me is that the failure cases seem to involve the DMA targeting the subscreen instead of the main screen. This would actually be trivial to detect without checking the version number. If the alignment is consistent within a single runtime (ie: only a reset or power cycle can change it), it may be possible to fix this...
byuu wrote:
DRAM refresh moving around between hardware revisions.
How much? fullsnes doesn't mention this...
Espozo wrote:
I forgot, why can't using color math work the way you have it 93143?
Because I need an uninterrupted loading pattern on the main screen, and it has to be invisible.
Using timed code to write directly to a display layer would probably allow colour math, but because of the DMA alignment issue (8 master clocks since reset, and a scanline isn't a multiple of that; neither is every other frame) I wouldn't get nice static vertical columns like you see here. This might not matter as much for a video mode, but it'd probably look goofy for a still.
Quote:
You know, what's the technical explanation as to why this goes by every 4 pixels instead of every 2 like on the Genesis?
DMA on the Mega Drive goes at two pixels per word, just like on the SNES, but the word size is 16-bit. VRAM only accepts half a word, which is why screen coverage per VBlank is almost identical to the SNES (a touch lower, actually, but you can keep going into active display at a much slower rate if you don't mind the CPU hit). But CRAM and VSRAM take 16-bit words, meaning you can DMA a 9-bit value to CRAM in one shot.
The SNES has an 8-bit data bus, and the DMA unit is on the CPU, so it's 8-bit no matter what. This means you need two writes to update a colour, and at two pixels per byte that's four pixels.
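Put as arithmetic: on the SNES a colour costs two byte writes, and (per byuu's figures further down) each DMA byte takes 8 master clocks while a pixel takes 4, so each write spans two pixels. A trivial Python sketch using only the numbers from these posts:

```python
MASTER_CLOCKS_PER_DMA_BYTE = 8   # byuu's figure for one DMA write
MASTER_CLOCKS_PER_PIXEL = 4


def pixels_per_colour(writes_per_colour: int, pixels_per_write: int) -> int:
    """Screen pixels that go by while one full palette entry is written."""
    return writes_per_colour * pixels_per_write


snes_pixels_per_write = MASTER_CLOCKS_PER_DMA_BYTE // MASTER_CLOCKS_PER_PIXEL
SNES = pixels_per_colour(2, snes_pixels_per_write)  # 8-bit bus: low byte, high byte
MEGA_DRIVE = pixels_per_colour(1, 2)                # one 16-bit word per CRAM entry
print(SNES, MEGA_DRIVE, 256 // SNES)  # 4 2 64
```

The last figure is why the SNES version ends up 64 colours wide across a 256-pixel line.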
93143 wrote:
VRAM only accepts half a word
You mean if it could accept a whole word, you could upload twice as much tile data? What happens to the unused 8 bits?
93143 wrote:
I wouldn't get nice static vertical columns like you see here.
You mean like some lines would be shifted over a pixel or something like that?
I tried it again on my 1/1/1 and it loaded just fine, nice and stable. Reset and it's glitchy as ass. Reset again and "No Signal!" Reset again and it's fine. But I can see some banding, especially on the brown table at top center and on Leah's black shirt (at left). In 256-color, one could use dithering to make this banding less visible, but not so much luck when pixels are that wide (PAR: 32:7).
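Where the 32:7 comes from: each colour column is four dots wide, and the SNES dot is usually quoted as 8:7 on NTSC (taken as an assumption here), so 4 x 8:7 = 32:7. In Python, with exact fractions:

```python
from fractions import Fraction

# Commonly quoted pixel aspect ratio for the SNES's 256-wide NTSC output (assumption).
SNES_DOT_ASPECT = Fraction(8, 7)


def column_aspect(dots_per_column: int) -> Fraction:
    """Width:height ratio of one colour column spanning several dots."""
    return SNES_DOT_ASPECT * dots_per_column


print(column_aspect(4))  # 32/7
```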
Now as for the reason I chose to try it in the first place: I can't see a brown line at top, but it might be blending with the black of the border. (My TV shows 226 lines, including one line of border at top and bottom.)
Espozo wrote:
93143 wrote:
VRAM only accepts half a word
You mean if it could accept a whole word, you could upload twice as much tile data? What happens to the unused 8 bits?
I think that one refers to the MD case... the VRAM bus is 8-bit, so words take twice as long to be transferred. So yeah, nothing is lost, it's just wasteful.
You can configure it to use 16-bit VRAM, and indeed that'll double the transfer bandwidth (oh boy). This setup also supports twice as much VRAM. The reason this wasn't used is probably that it was too expensive (Sega engineers thought it wasn't much of a gain...)
EDIT: also wait what, "no signal"?
tepples wrote:
I tried it again on my 1/1/1 and it loaded just fine, nice and stable. Reset and it's glitchy as ass. Reset again and "No Signal!" Reset again and it's fine.
I've tried various combinations of boot/reset and load/reload with my SNES+Super Everdrive, and I still can't get it to not work. It could be the Everdrive, or it could be the system revision...
No signal? That doesn't sound like a software bug on my end... the PPU should be outputting a signal of some description no matter what, right?
Quote:
But I can see some banding, especially on the brown table at top center and on Leah's black shirt (at left).
That's just what 15-bit colour looks like. Only way out is dither, and it'd have to be done with reference to the original true-colour image (and, as you note, probably with the assistance of a higher-resolution luminance mask, which I can't use with this version of the technique).
Quote:
I can't see a brown line at top
Neither can I, and I can see the top line of the image just fine, especially in Revenant's stable image but on your glitchy ones too. Not that that matters, since this ROM isn't the one that cuts off the top line... More importantly, I can see the bottom line on Revenant's image, which should rule out a downshift. If you can see it too, then either the rev.3 PPU2 (which I understand is the standard byuu is targeting) is causing a problem with the top line, or higan is doing it wrong.
This is what higan's output looks like when the emulator is fudged to make the timing work (thank you byuu):
[Attachment: Untitled.png]
And here's the raw image (take Wii_kids_truecolor.png, linearly interpolate to 128x224, linearly interpolate to 64x224, posterize to 32 levels, nearest-neighbour back to 256x224 (for display purposes), and save):
[Attachment: dmacolour.png]
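The preparation steps for that image can be sketched per channel, per row, in plain Python. This is an approximation, not the image editor 93143 used: each 2:1 linear step is modelled as averaging adjacent pairs (what linear interpolation reduces to under the pixel-centre convention), and "posterize to 32 levels" as keeping the top 5 bits, which is what CGRAM stores anyway. The function names are hypothetical.

```python
def halve_linear(row):
    """2:1 linear downscale of one channel row (pixel-centre convention).

    With pixel centres at x + 0.5, each output sample falls exactly halfway
    between two source pixels, so interpolation reduces to averaging pairs
    (rounded to nearest here).
    """
    if len(row) % 2:
        raise ValueError("row length must be even")
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(0, len(row), 2)]


def posterize_5bit(row):
    """Keep the top 5 bits of each 8-bit value: 32 levels."""
    return [v >> 3 for v in row]


def nearest_upscale(row, factor):
    """Nearest-neighbour upscale for previewing: repeat each sample."""
    return [v for v in row for _ in range(factor)]


def prepare_channel_row(row256):
    """256 -> 128 -> 64 samples, then 32-level posterize."""
    return posterize_5bit(halve_linear(halve_linear(row256)))


ramp = list(range(256))  # one channel of a test gradient
levels = prepare_channel_row(ramp)
preview = nearest_upscale([v << 3 for v in levels], 4)  # back to 8-bit, 256 wide
print(len(levels), len(preview), levels[0], levels[-1])
```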
Espozo wrote:
93143 wrote:
I wouldn't get nice static vertical columns like you see here.
You mean like some lines would be shifted over a pixel or something like that?
Every other line.
According to the documentation, DMA always aligns itself to a multiple of 8 master clocks since the last reset, and since a line is 1364 master clocks long (not a multiple of 8), DMA would inevitably be offset by 4 master clocks (one pixel) every line. And since frames alternate between being divisible by 8 master clocks and not being divisible by 8 master clocks, the pattern would change at 30 fps.
It might be possible to exploit this to produce a zigzag pattern, or to treat it as chroma dither in that pseudo-4:1:1 video mode tepples was talking about. But straight columns are impossible that way.
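The alternating pattern is easy to model. The Python below uses the 1364-master-clock NTSC line (341 dots x 4), the figure byuu gives further down; any line length that leaves a remainder of 4 mod 8 produces the same alternation. It also assumes reset lands at frame 0, line 0, that odd frames lose one dot on line 240 (per byuu's note about the missing dot), and it ignores DMA setup overhead, so treat it as a simplified model rather than cycle-exact timing.

```python
LINE_CLOCKS = 1364       # 341 dots x 4 master clocks
SHORT_LINE_SAVING = 4    # one dot missing from one line on odd frames
LINES_PER_FRAME = 262
SHORT_LINE = 240         # line that loses the dot
DMA_ALIGN = 8            # DMA starts on a multiple of 8 master clocks since reset


def line_start_clock(frame: int, line: int) -> int:
    """Master clocks from reset to the start of (frame, line)."""
    # frame // 2 odd (short) frames precede this one.
    clock = frame * LINES_PER_FRAME * LINE_CLOCKS - SHORT_LINE_SAVING * (frame // 2)
    clock += line * LINE_CLOCKS
    if frame % 2 == 1 and line > SHORT_LINE:
        clock -= SHORT_LINE_SAVING
    return clock


def dma_start_in_line(frame: int, line: int, trigger: int) -> int:
    """Line-relative clock at which a DMA requested at `trigger` can begin."""
    start = line_start_clock(frame, line)
    requested = start + trigger
    aligned = -(-requested // DMA_ALIGN) * DMA_ALIGN  # round up to a multiple of 8
    return aligned - start


for frame in range(4):
    print(frame, [dma_start_in_line(frame, line, 100) for line in range(4)])
```

The start point zigzags by 4 clocks (one dot) from line to line, and the zigzag flips phase every two frames, matching the 30 fps pattern change described above.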
I guess No Signal could happen if you mess with the video output, e.g. on the Mega Drive you could be constantly changing resolution mid-screen to mess with the timings of the scanlines (also I think there's a bit that outright removes the sync signals, period). I don't know if there's anything on the SNES that could be used to do that... or that could be being used here.
Maybe some really badly timed transfer messed with outputting the hblank sync?
Sik wrote:
Maybe some really badly timed transfer messed with outputting the hblank sync?
I really don't see how. The DMA unit is on the CPU, not the PPU, and during the main loop I'm not touching anything on the PPU bus except CGADD and CGDATA, plus the H/V counter registers in a loop in VBlank, plus one write to INIDISP (controls brightness, turns rendering on and off) at the end of VBlank. And before that there's nothing unusual; just standard VRAM and display setup. This is all supposed to be more or less isolated from the actual picture generation, which is on the far side of the PPU. Even if nothing is being shown on screen, it should still send a black picture...
Unless it can be replicated on multiple setups, I'm going to assume it was a hardware issue.
...
In other news, I've managed to get a stable picture in bsnes v072, by switching the roles of the main screen and subscreen based on an alignment test. Unfortunately I think I screwed up the alignment test, because it now breaks on my real SNES. Lemme see if I can't fix it...
Got it. I stupidly forgot that you have to mask off the top 7 bits of the H/V counter values, so the test DMA was happening outside active display...
This ROM works both in bsnes v072 (except for the brown line) and on my real SNES (as far as I can tell), and thus might work reliably on earlier-rev consoles.
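For reference, the latched H/V counters ($213C/$213D) are 9-bit values read as two bytes; per the commonly used hardware docs, only bit 0 of the second read is counter data and bits 1-7 are open bus. Comparing an unmasked value against a target line can therefore fire the test at the wrong time. A Python illustration of the mask (not the actual 65816 code; function names are hypothetical):

```python
def latched_counter(low_read: int, high_read: int) -> int:
    """Combine the two reads of a latched SNES H or V counter into 9 bits.

    Only bit 0 of the second read belongs to the counter; bits 1-7 are
    open bus, so they must be masked off before any comparison.
    """
    return (low_read & 0xFF) | ((high_read & 0x01) << 8)


def unmasked_counter(low_read: int, high_read: int) -> int:
    """The buggy version: open-bus bits leak into the result."""
    return (low_read & 0xFF) | (high_read << 8)


print(latched_counter(0x20, 0xFE), unmasked_counter(0x20, 0xFE))
```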
> Well, couldn't the program see what console version it is and adjust accordingly?
You could run heuristics (like a test ROM) to detect the version and then run different programs, but that's getting a bit nuts.
> If it is possible to overlay a BG layer on top of this, then it won't.
If there is a BG layer or sprites, then the PPU will be fetching different color indexes, and the DMA writes to CGRAM won't go to the correct locations.
The way this trick works ... during active display, the CGRAM address is asserted by the PPU, overriding the user-specified address. It points at the pixel color being fetched in, and apparently it manages to write the new color before the PPU loads in the value through abuse of main/sub screen fetching patterns.
It's really very impressive 93143 made this work at all, but there's not really any more room to push things more.
> You know, what's the technical explanation as to why this goes by every 4 pixels instead of every 2 like on the Genesis?
It's the length of time it takes to write via DMA to the palette register twice for each color (8 cycles per write, 4 cycles per pixel). You can't get away with only one write to CGRAM registers.
> it also allows me to start the DMA before the beginning of the active scanline and buffer enough colours to get through the DRAM refresh
I still don't entirely get this because DRAM refresh is smack in the middle of the screen for 40 clocks. But obviously you made it work :D
> How much? fullsnes doesn't mention this...
It's always at 534 on CPUr2; whereas it toggles between 530 and 538 each scanline on CPUr1 (and the toggle doesn't happen on NTSC non-interlaced frame 1 scanline 240 due to the missing dot.) The way the latter one works is actually a clock divider by 8 triggering it, and 1364 not being evenly divisible by 8 is why the position shifts each scanline. So CPUr1 probably piggy-backed on the DMA clock divider. I have no idea why Nintendo felt the need to change the DRAM refresh timing for CPUr2. It wasn't related to the DMA/HDMA CPU crashing bug.
> EDIT: also wait what, "no signal"?
It's tepples. He apparently lives inside of a black hole (or something else that causes huge amounts of electronic interference) with decade-old hardware, so his computing equipment always experiences weird failures that nobody else ever sees :P
byuu wrote:
You could run heuristics (like a test ROM) to detect the version and then run different programs, but that's getting a bit nuts.
Uniracers is nuts.
byuu wrote:
> EDIT: also wait what, "no signal"?
It's tepples. He apparently lives inside of a black hole (or something else that causes huge amounts of electronic interference) with decade-old hardware, so his computing equipment always experiences weird failures that nobody else ever sees

My Super NES is from launch in 1991, but my TV isn't quite a decade old (a Vizio from 2007, shortly before Circuit City went under). I imagine a lot of people use TVs even older than that so that the Super Scope will still work.
Newer TVs are the most prone to barf to the video signals from old consoles =P
I was wondering if something caused the SPPU to output a tiny burst of color or something like that during hblank that could cause such a TV to outright treat it as invalid.
dmac_align.sfc now seems to work flawlessly on the 2/1/1. Nice work!
Tested this on my 1/1/1 Super Famicom (PowerPak/MUFASA build #11331): No errors at all with dmacolor.sfc (downloaded Jan 09, 2016), but immediate glitches with dmac_align.sfc (no glitches after tapping Reset though).
Then again, with that poor resolution, I don't see the point of all this. If it's just about displaying an image, why not just use Mode 3, with a decent 256-color indexed palette?

Though this demo is of a still image as a proof of concept, a subsequent demo could use MSU1 video. This 64-pixel-wide mode can be updated at 60 fps, unlike ordinary DMA to VRAM, which is limited to about 6K/frame without letterboxing. Loading a new 256x224 pixel frame that way would take ten frames, with visible tearing due to not enough VRAM for a double buffer.
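Rough arithmetic behind those figures, treating "about 6K/frame" as 6 KiB and counting only the 8bpp tile data of a Mode 3 picture (tilemap ignored); both simplifications are assumptions:

```python
import math

MODE3_FRAME_BYTES = 256 * 224     # 8bpp tile data for a full 256x224 picture
DMA_BYTES_PER_FRAME = 6 * 1024    # "about 6K/frame", a rough figure
COLOUR_MODE_BYTES = 64 * 2 * 224  # 64 colours x 2 bytes x 224 lines

frames_to_load = math.ceil(MODE3_FRAME_BYTES / DMA_BYTES_PER_FRAME)
print(frames_to_load, COLOUR_MODE_BYTES)  # 10 28672
```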
tepples wrote:
This 64-pixel-wide mode can be updated at 60 fps
Once again, the sheer resolution is scaring me off.
tepples wrote:
unlike DMA which is limited to about 6K/frame without letterboxing.
What's so bad about letterboxing though?

Ramsis wrote:
tepples wrote:
unlike DMA which is limited to about 6K/frame without letterboxing.
What's so bad about letterboxing though?

The fact that you'd need a heck of a lot of it to display at 60 fps. I imagine most people are seeking at least 256x160 pixels to fill an HDTV when zoomed.
There are 262 lines in an NTSC frame. Let's assume that one of those lines is needed as overhead to set up blanking, transfer, unblanking, and scroll, leaving 261 lines. These are divided into "display" lines and "load" lines. Each "load" line provides 165 bytes; each "display" line consumes 256 bytes.
So we have D + L = 261, and 256*D = 165*L.
Substituting:
256*D = 165*(261 - D)
Distributive law:
256*D = 43065 - 165*D
Add 165*D to both sides:
421*D = 43065
That leaves you with roughly a 256x102 pixel window that can be updated at 60 Hz.
I can try the calculation again with a 192-pixel-wide window:
D + L = 261, and 192*D = 165*L
Substituting:
192*D = 165*(261 - D)
Distributive law:
192*D = 43065 - 165*D
Add 165*D to both sides:
357*D = 43065
This leaves 192x120 pixels.
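The same derivation, generalised into a small Python helper so other widths or bit depths can be plugged in. It keeps tepples' assumptions (261 usable lines, 165 bytes loaded per load line, one byte per pixel at 8bpp); rounding down keeps the load side large enough:

```python
def window_height(width_px: int, bytes_per_px: int = 1,
                  usable_lines: int = 261, load_bytes_per_line: int = 165) -> int:
    """Largest D with D + L = usable_lines and consumed*D <= load*L.

    Solving consumed*D = load*(usable_lines - D) gives
    D = load*usable_lines / (consumed + load); round down to whole lines.
    """
    consumed = width_px * bytes_per_px
    return (load_bytes_per_line * usable_lines) // (consumed + load_bytes_per_line)


for width in (256, 192):
    print(width, window_height(width))  # 256 -> 102, 192 -> 120
```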
Revenant wrote:
dmac_align.sfc now seems to work flawlessly on the 2/1/1. Nice work!
Excellent.
Ramsis wrote:
Tested this on my 1/1/1 Super Famicom (PowerPak/MUFASA build #11331): No errors at all with dmacolor.sfc (downloaded Jan 09, 2016), but immediate glitches with dmac_align.sfc (no glitches after tapping Reset though).
1) I really hope you got that backwards.
2) If you didn't, what did the glitching look like? Was it an unholy mess like tepples and Revenant posted earlier, or was it localized around the DRAM refresh, or something else? A screenshot would be ideal, if you can get one.
Quote:
Then again, with that poor resolution, I don't see the point of all this.
There are two: first, to answer Sik's question at the beginning of this thread (also because I wanted to see if it could be done).
Second, as tepples has suggested, it may be possible to get a useful video mode out of this. Depending on the application, it might be fine as it stands, but it would probably look much better if a higher-resolution luminance mask could be added. The way the trick works now, it's impossible, but if some predictable flickering and shifting of the direct-colour pixels is allowable it should be possible to just write directly to an active layer, freeing up either the main screen or the subscreen for colour math. The DMA transfers would have to be triggered by raster-aligned timed code rather than IRQs to get the necessary precision, but I'm pretty sure my buffering trick to hide the DRAM refresh would still work at the cost of a less straightforward data format.
...actually, it occurs to me that since subtraction is the most obvious way to implement a luminance mask, the active layer should probably be on the main screen. So if I can't rely on DMA consistently targeting the main screen after every reset, and I can't change the DMA/PPU alignment in software, the program might have to prompt the user to reset until the correct alignment shows up, either directly or by pretending to have failed to boot... but both of those are spectacularly ugly hacks, on top of the assumption that no SNES exists in which DMA reliably targets the subscreen after every reset... Maybe it'd be better to just use addition or averaging and accept the loss of saturation...
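To see why subtraction suits a luminance mask better than addition, here's a Python model of per-channel colour math as it's usually documented (each 5-bit component added or subtracted, optionally halved, clamped to 0..31). It's a simplification that ignores windowing and the clip-to-black settings, and the function names are hypothetical.

```python
def split_bgr555(c: int):
    """Return the (r, g, b) 5-bit components of a 15-bit SNES colour."""
    return c & 0x1F, (c >> 5) & 0x1F, (c >> 10) & 0x1F


def join_bgr555(r: int, g: int, b: int) -> int:
    return r | (g << 5) | (b << 10)


def colour_math(main: int, sub: int, subtract: bool = False, half: bool = False) -> int:
    """Per-channel main +/- sub, optionally halved, clamped to 0..31."""
    result = []
    for m, s in zip(split_bgr555(main), split_bgr555(sub)):
        v = m - s if subtract else m + s
        if half:
            v //= 2
        result.append(min(31, max(0, v)))
    return join_bgr555(*result)


red, grey = join_bgr555(31, 0, 0), join_bgr555(8, 8, 8)
print(split_bgr555(colour_math(red, grey, subtract=True)))  # darker red, hue kept
print(split_bgr555(colour_math(red, grey)))                 # red washed toward pink
```

Subtracting a grey mask darkens the main-screen colour without touching its hue, while adding grey lifts the other two channels, which is the saturation loss mentioned above.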
Is there an existing special chip that can allow software access to /RESET?
93143 wrote:
Ramsis wrote:
Tested this on my 1/1/1 Super Famicom (PowerPak/MUFASA build #11331): No errors at all with dmacolor.sfc (downloaded Jan 09, 2016), but immediate glitches with dmac_align.sfc (no glitches after tapping Reset though).
1) I really hope you got that backwards.
I did not.
93143 wrote:
2) If you didn't, what did the glitching look like? Was it an unholy mess like tepples and Revenant posted earlier, or was it localized around the DRAM refresh, or something else?
It looked almost exactly like in tepples' video.
That kind of makes me want to try it again on my console just to make sure the good results weren't a fluke. I was able to get it to display correctly all 10 or so times that I tried with the newer ROM, but...
I guess you're going to need to test on a lot of revisions and flashcarts to be 100% sure.
Yeah, it's a somewhat brittle method; even with the timing test it depends on assumptions that are hard to fully verify. Using timed code to line up the writes and just putting up with the shifting DMA alignment pattern would probably have been more reliable (since it wouldn't matter which screen it targeted, so I could just put the same thing on both), but I wanted straight columns because the MD version has them, and that necessitated the main/sub double shuffle that's the source of the weird timing requirements.
The newer code (dmac_align) tests DMA/PPU alignment by putting a constant nonzero index on the main screen and a different constant nonzero index on the subscreen, zeroing both corresponding entries in CGRAM, turning the screen on, triggering a two-byte DMA to CGRAM during active display, turning the screen back off, and checking where the write went. Unless the timing shifts during runtime, this should guarantee a working method, right? Except...
...based on testing in Snes9XW combined with sober second thought, there's a possibility that my timing test as originally written didn't have sufficient accuracy to guarantee the correct result. I have rewritten it to use an IRQ instead of polling H/V values; the write should now happen during the early portion of an active scanline, well before DRAM refresh and nowhere near the end of the colour fetch pattern. Once again, I've tested it in bsnes v072 and on my real SNES via Super Everdrive, and it works as expected in both.
If this doesn't work I'm not sure what to say, other than "how did the PowerPak manage to screw up (H)DMA with the old firmware, and could it still be doing something like it?". If the good boots stay good indefinitely, it means the half-dot alignment is consistent within a run. So as I see it, there should be no way for this to fail unless (a) there's sub-half-dot variance going on and quarter-dot alignment is too electrically finicky for reliable writes, or (b) something is kicking the alignment somehow in between the test and the display loop. Can anyone think of any other reason?
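The decision logic of the alignment test described above is simple enough to sketch. Here it is in Python as a model only: the CGRAM readback is represented as a list, the index values are arbitrary, and the function names are made up. The real thing is 65816 code poking PPU registers, and the colour math setup has to be swapped along with the screen roles.

```python
def dma_target(cgram_after, main_index: int, sub_index: int) -> str:
    """Decide which screen's CGRAM fetch the test DMA landed on.

    Both entries are assumed zeroed before the test and the DMA value nonzero.
    """
    main_hit = cgram_after[main_index] != 0
    sub_hit = cgram_after[sub_index] != 0
    if main_hit and not sub_hit:
        return "main"
    if sub_hit and not main_hit:
        return "sub"
    return "inconclusive"  # missed both, or hit both: retry the test


def screen_roles(target: str) -> dict:
    """Put the invisible loading pattern wherever the DMA actually writes."""
    if target == "main":
        return {"loading": "main", "display": "sub"}
    if target == "sub":
        return {"loading": "sub", "display": "main"}
    raise RuntimeError("alignment test inconclusive")


cgram = [0] * 256
cgram[2] = 0x7FFF  # pretend the write landed on the subscreen's index
target = dma_target(cgram, main_index=1, sub_index=2)
print(target, screen_roles(target))
```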
DMA is described as always aligning itself to an even multiple of 8 master clocks since reset before starting a transfer. Is this true of all models?
More like "how can flashcart firmware screw up with DMA at all?". Unless it's leaving some register in a non-default state...
Regarding timing, on the MD the HV counter is used for loose timing, then NOPs for some fine timing, then it causes a FIFO overflow to sync with the VDP (the 68000 on its own simply can't do it). Is there a way on the SNES to make the SPPU mess with the timing of the 65816 in a similar way?
Also, I wonder what the differences between slow ROM and fast ROM are here.
93143 wrote:
"how did the PowerPak manage to screw up (H)DMA with the old firmware, and could it still be doing something like it?".
I used to leave $420B/$420C in whatever state they were in before starting a game. As of build #11331, I do stz $420B : stz $420C (details in the commit).
I can't test your new ROM right now but will do so once I get home.

EDIT: Okay, I tried it just now on both the PowerPak and sd2snes. I didn't encounter any glitches (game loading: ~3 times; tapping Reset: ~20 times for each cart). I also tested the older dmac_align.sfc (without the _i in the file name) on sd2snes, since I was curious whether it would differ from the PowerPak. I got those moving glitches after about the fifth press of Reset. (IRQ hooks on sd2snes were disabled, BTW.)
That's great news; thanks!
I don't mean to torment you with goalpost-shifting, but did you power-cycle a few times? It might be different from reset.
Anyone else who wants to test the ROM and report the results is welcome to do so as well. Has anybody got a 1CHIP or SNES Jr.? I know it's not that important in the grand scheme of things, but I kinda want this to work across the board.
I imagine that any effect power cycling could have would be greatly nullified by a firmware running first. Is there any flashcart that allows running games directly without any firmware running first?
Sik wrote:
I imagine that any effect power cycling could have would be greatly nullified by a firmware running first.
But if the firmware code can change whatever power-up state affects the visual effect (which you're assuming it does), couldn't the ROM itself do the same and completely avoid glitches, making power cycles pointless?
Sik wrote:
More like "how can flashcart firmware screw up with DMA at all?". Unless it's leaving some register in a non-default state...
Regarding timing, on the MD the HV counter is used for loose timing, then NOPs for some fine timing, then the code deliberately overflows the VDP FIFO to sync with the VDP (the 68000 on its own simply can't do it). Is there a way on the SNES to make the S-PPU mess with the timing of the 65816 in a similar way?
Also, I wonder what the differences between slow ROM and fast ROM are here.
As far as I know, nothing external can halt the S-CPU or affect its timing. That's why the SA-1, for example, has to defer to the S-CPU when both access ROM at the same time.
Since the NES, Nintendo has liked to cram as much logic as possible onto the CPU die. On the SNES, WRAM refresh cycles and memory wait states are actually generated by the S-CPU itself, and the S-CPU also decodes the WRAM and cartridge ROM address ranges and outputs the corresponding chip select signals directly.
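On the slow ROM vs. fast ROM question: since the S-CPU generates its own wait states, the difference shows up only as the number of master clocks each CPU bus access takes. The sketch below encodes the commonly documented speed map (as given in fullsnes, for instance). access_cycles is a hypothetical helper, and internal CPU cycles are not modelled. As far as I know, general-purpose DMA runs at 8 master clocks per byte regardless of the MEMSEL ($420D) setting. Fast ROM would then mainly affect the CPU code around the transfer, not the transfer itself.

```python
def access_cycles(bank, addr, memsel_fast=False):
    """Hypothetical helper: master clocks per S-CPU bus access at bank:addr,
    following the commonly documented SNES memory speed map.

    memsel_fast models bit 0 of MEMSEL ($420D).
    """
    fast_rom = 6 if memsel_fast else 8
    if bank >= 0xC0:
        return fast_rom          # banks $C0-$FF: ROM, speed set by MEMSEL
    if 0x40 <= bank <= 0x7F:
        return 8                 # banks $40-$7F (incl. WRAM at $7E-$7F): always slow
    # banks $00-$3F and $80-$BF
    if addr >= 0x8000:
        return fast_rom if bank >= 0x80 else 8
    if addr < 0x2000:
        return 8                 # WRAM mirror
    if addr < 0x4000:
        return 6                 # includes B-bus registers at $2100-$21FF
    if addr < 0x4200:
        return 12                # includes old-style joypad ports $4016/$4017
    if addr < 0x6000:
        return 6                 # CPU I/O and DMA registers
    return 8                     # $6000-$7FFF, cartridge expansion area


print(access_cycles(0x80, 0x8000, memsel_fast=True))   # 6: fast ROM mirror
print(access_cycles(0x00, 0x8000, memsel_fast=True))   # 8: bank $00 is always slow
```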
tokumaru wrote:
But if the firmware code can change whatever power-up state affects the visual effect (which you're assuming it does), couldn't the ROM itself do the same and completely avoid glitches, making power cycles pointless?
Pretty much all hardware will interfere.
The Super UFO resets all WRAM to $00, leaves its logo in PPU VRAM (enable the display register and you'll see the UFO loading screen), and somehow manages to shatter open bus. When you read from e.g. $20FF, you're supposed to get back $20 (the last byte the CPU fetched successfully), but instead you get $00. This causes bugs in DKC2 with barrel rolling, and completely breaks Speedy Gonzales in level 6-2.
The sd2snes seems to be the best so far, but it also destroys state upon loading games, and the reset trick causes very weird issues reminiscent of the NES's different PPU phases at startup (asserting reset from the cartridge seems to reset only the CPU/APU, and not the PPUs).
Well, to be fair, you want the ROM to work even if somebody presses Reset, so it has to work from a completely unpredictable state.
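The open-bus behaviour in the Super UFO example can be illustrated with a toy model. An unmapped read returns whatever byte was last on the data bus, which for LDA $20FF (AD FF 20) is the operand's high byte, $20. This is a simplification that assumes bank $00 only, no data bank register and no timing. OpenBusModel and its methods are hypothetical names. A device that actively drives unmapped addresses, as the Super UFO appears to, is modelled by mapping the address to $00.

```python
class OpenBusModel:
    """Toy 65816-style bus: unmapped reads return the last byte on the data bus."""

    def __init__(self, mapped):
        self.mapped = dict(mapped)  # address -> byte, for devices that respond
        self.mdr = 0                # last value seen on the data bus

    def read(self, addr):
        if addr in self.mapped:
            self.mdr = self.mapped[addr]
        return self.mdr             # unmapped: nothing drives the bus, old value remains

    def lda_absolute(self, pc):
        """Run LDA abs at pc: fetch opcode and 16-bit operand, then read the target."""
        self.read(pc)               # opcode ($AD)
        lo = self.read(pc + 1)      # operand low byte
        hi = self.read(pc + 2)      # operand high byte (last byte fetched)
        return self.read((hi << 8) | lo)


program = {0x8000: 0xAD, 0x8001: 0xFF, 0x8002: 0x20}    # LDA $20FF
print(hex(OpenBusModel(program).lda_absolute(0x8000)))   # 0x20, as on a stock console
ufo = OpenBusModel({**program, 0x20FF: 0x00})            # cart driving $00 instead
print(hex(ufo.lda_absolute(0x8000)))                     # 0x0
```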
Well, it looks like we're about done here for the time being. Considering I seem to have correctly guessed why the code wasn't working on my first try, both times, I'm going to just assume it works properly until such time as I can justify buying a SNES Jr. (for unrelated reasons: I'll want to do compatibility tests for my shmup port).
Thanks to everyone who helped out! Maybe something interesting will come of this knowledge someday...
...
Only one thing still bugs me. It looks like the dead line problem doesn't happen on a real SNES. So why does it happen on all versions of bsnes/higan?
byuu wrote:
93143 wrote:
it also allows me to start the DMA before the beginning of the active scanline and buffer enough colours to get through the DRAM refresh
I still don't entirely get this because DRAM refresh is smack in the middle of the screen for 40 clocks.
The main/sub trick allows me to preload a number of colours in advance by increasing the offset between the loading and display patterns. This allows me to start with a buffer that stays filled until I need it, without requiring me to break up the linear bitmap format of the image.
I start each line well back in HBlank. I manually set the CGRAM index to the index used at the beginning of the display pattern and start the DMA early enough that four colours get written during HBlank, with the CGRAM address incrementing normally without interference from the PPU. The first column of the loading pattern is actually the same index as the fifth column of the display pattern, and it remains four columns (16 dots) ahead until DRAM refresh.
DRAM refresh is handled by making one of the columns in the loading pattern 14 pixels wide instead of 4. This leaves the loading pattern only 6 dots ahead of the display pattern rather than 16. (If what fullsnes says about half-dot CPU 'wakeups' during refresh is true, it should be possible in principle to hit one of those and have the write go to the wrong pattern, but I... guess I dodged a bullet?)
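The numbers in this post add up if you assume the commonly quoted figures of 4 master clocks per dot, 8 master clocks per DMA byte and a 40-master-clock refresh stall. A 2-byte colour takes 4 dots, which is where the 4-dot columns come from. The refresh stall is 10 dots, so stretching one loading column from 4 to 14 dots absorbs it and drops the lead from 16 dots to 6. lead_budget is a hypothetical helper for checking that arithmetic, not part of the ROM, and long dots are ignored.

```python
MASTER_PER_DOT = 4         # normal dot length in master clocks (long dots ignored)
MASTER_PER_DMA_BYTE = 8    # general-purpose DMA byte transfer time
REFRESH_MASTER = 40        # DRAM refresh stall, in master clocks

DOTS_PER_COLOUR = 2 * MASTER_PER_DMA_BYTE // MASTER_PER_DOT   # 2-byte colour -> 4 dots
REFRESH_DOTS = REFRESH_MASTER // MASTER_PER_DOT               # 10 dots


def lead_budget(lead_columns=4):
    """Return (lead before refresh, refresh column width, lead after refresh), in dots."""
    lead_before = lead_columns * DOTS_PER_COLOUR
    refresh_column = DOTS_PER_COLOUR + REFRESH_DOTS   # column stretched to cover the stall
    lead_after = lead_before - (refresh_column - DOTS_PER_COLOUR)
    if lead_after <= 0:
        raise ValueError("loading pattern would fall behind the display pattern")
    return lead_before, refresh_column, lead_after


print(lead_budget())   # (16, 14, 6)
```

With fewer than three columns of lead, the budget goes negative: the loading pattern would fall behind the display pattern during refresh.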
Attachment: loading.png (loading pattern)
Attachment: display.png (display pattern)
You may notice that the actual data in the ROM has some extra colour and palette indices offscreen. That's a relic of my (successful) attempt to fix the dead line problem in bsnes/higan. But if the problem doesn't show up on a real SNES, then it doesn't need fixing, does it?
As for why there's a colour #1 stripe at the right-hand edge of the loading pattern... I was trying to get the PPU to reset the CGRAM index for me, so I wouldn't have to waste time doing it manually. But it didn't work.
...
Source is attached. There are a few inconsequential details that are different from last time, such as the IRQ clear during the NMI routine - I was paranoid about breaking it this time, so I didn't change anything but the comments and formatting. The checksum should be the same as dmac_align_i.sfc.
@tepples: Should I try to find a different picture? I've just kinda been using your photo as a reference standard...