Hi !
I'm trying to improve my video emulation, and now only a couple of games still have issues.
One of those is this video with the Pokémon intro. I read the other thread, implemented that additional mode 2 (OAM) interrupt, and now it plays to the end.
However, something is wrong in my tile logic: the video looks jerky.
The ROM draws its graphics by changing the scroll Y register (SCY) during mode 3 so that it points at differently coloured tiles, with precise timing; every line is rendered this way. The quality of each individual frame is terrible.
But then I discovered that it uses a little trick to roughly double the apparent quality: it alternates the scroll X register (SCX) between 0 and 4 every frame. I guess that on a real Game Boy LCD, which blends consecutive frames, it looks better.
In my emulator, changing scroll X to 4 changes nothing: since the tiles are just solid colours (they don't vary along the X axis), the output is exactly the same as with scroll X at 0.
I see that BGB, when SCX = 4, cuts off the first 4 pixels of every tile in the first column, and then draws the rest as usual. What's the logic behind this ?
I'm also getting distorted letters in the lava tunnel of the Demotronic demo, which I think is related to this. The weird thing is that scrolling works perfectly in all other games, so it looks like I'm missing some edge case here ?
Thanks !
I really like how my GBVideoPlayer became such a headache for accurate emulation developers, including myself.
BGB definitely emulates this specific quirk correctly. Basically what happens (of course, this is a theory that I can't prove, but I couldn't prove it incorrect either) is this: In mode 3, the LCDC renders 8 pixels at a time. These pixels must be aligned to a tile. So if SCX % 8 is 0, it all "works" as expected, and it first draws the first 8 pixels, then the next 8, and so on.
However, if SCX % 8 = 4, the first (leftmost) pixel it has to render on a line is not aligned to a tile. Therefore, in order to align itself to a tile the LCDC only renders the first 4 pixels at its first iteration. Then it continues drawing the next 8 pixels, and the next 8, and so on.
Since the LCDC renders a single color in each iteration in GBVP, this causes the entire image to be shifted left 4 pixels.
This effect is entirely documented at GBVP's GitHub.
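To make the theory above a bit more concrete, here is a minimal sketch in C of how one 160-pixel scanline would be split into rendering iterations for a given SCX. The names `chunk_widths` and `LCD_WIDTH` are made up for illustration, and the code only restates the (unproven) theory; it is not taken from GBVP or from any emulator.

```c
#include <stddef.h>

#define LCD_WIDTH 160u /* visible pixels per scanline */

/*
 * Hypothetical helper: stores the width of every rendering iteration of one
 * scanline in widths[] and returns how many iterations were stored.
 * Assumption (the theory above): the first iteration is shortened so that
 * every later iteration starts on a tile boundary.
 */
size_t chunk_widths(unsigned scx, unsigned widths[], size_t cap)
{
    size_t count = 0;
    unsigned x = 0;
    unsigned width = 8u - (scx % 8u); /* 8 when SCX is already tile-aligned */

    while (x < LCD_WIDTH && count < cap) {
        if (width > LCD_WIDTH - x)
            width = LCD_WIDTH - x; /* clip the last iteration at the right edge */
        widths[count++] = width;
        x += width;
        width = 8u; /* every later iteration covers a full tile width */
    }
    return count;
}
```

Under this model SCX = 0 gives 20 iterations of 8 pixels, while SCX = 4 gives a 4-pixel iteration, nineteen 8-pixel iterations and a final 4-pixel one, which is where the 4-pixel shift of GBVP's solid-colour slivers would come from.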
That's awesome, I implemented it, and it looks like it works, so I'm ok with it.
Another thing that some people may struggle with at first is that the ROM's header claims it has 2 ROM banks, when in reality it has more than 300.
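For anyone hitting the same header problem, one possible workaround is to size the ROM from the file itself and treat the header value only as a hint. A rough sketch in C follows; the function names are made up, and the only facts assumed are the usual 16 KiB bank size and the standard encoding of the ROM size byte at 0x0148 (codes 0x00 to 0x08 meaning `2 << code` banks).

```c
#include <stddef.h>

#define ROM_BANK_SIZE 0x4000u /* 16 KiB per ROM bank */

/* Bank count announced by the ROM size byte (header offset 0x0148).
 * Returns 0 for codes outside the plain 0x00..0x08 range. */
unsigned banks_from_header_code(unsigned char code)
{
    return code <= 8u ? 2u << code : 0u;
}

/* Bank count implied by the size of the ROM file, rounded up. */
unsigned banks_from_file_size(size_t file_size)
{
    return (unsigned)((file_size + ROM_BANK_SIZE - 1u) / ROM_BANK_SIZE);
}

/*
 * Hypothetical policy: always trust the file size, and only report through
 * *mismatch whether the header disagrees with it.
 */
unsigned rom_bank_count(size_t file_size, unsigned char code, int *mismatch)
{
    unsigned actual = banks_from_file_size(file_size);
    if (mismatch)
        *mismatch = (actual != banks_from_header_code(code));
    return actual;
}
```

Whether to trust the file or the header is a design choice; trusting the file is simply what makes a ROM like this one bank-switch correctly.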
That link had everything I needed, now back to improve other aspects =)
Yeah, I didn't really bother with the header... ><
LIJI wrote:
BGB definitely emulates this specific quirk correctly. Basically what happens (of course, this is a theory that I can't prove, but I couldn't prove it incorrect either) is this: On mode 3, the LCDC renders 8 pixels at a time. These pixels must be aligned to a tile. So if the SCX % 8 is 0, it all "works" as expected, and it first draws the first 8 pixels, then the next 8, and so on.
However, if SCX % 8 = 4, the first (leftmost) pixel it has to render on a line is not aligned to a tile. Therefore, in order to align itself to a tile the LCDC only renders the first 4 pixels at its first iteration. Then it continues drawing the next 8 pixels, and the next 8, and so on.
God, this explanation makes absolutely zero sense to me, sorry.
It sounds like you're trying to describe basic scrolling. But there's nothing remotely unusual about that. Every tile-based system I emulate does this the exact same way. Start at scroll=4, and the entire line is shifted by four pixels, obviously.
I have absolutely no idea if you're describing some behavior I don't know about or not >_>
The actual effect in this specific case is indeed simple scrolling, but it is definitely not obvious that it would work this way, because GBVideoPlayer doesn't render by tiles. Remember that the image GBVideoPlayer constructs with its tilemap and tileset (right part in the picture below) is just plain horizontal lines, and it draws its frames with SCY hacks, so theoretically SCX shouldn't have had any effect on this.
What I'm describing isn't what the LCDC is rendering, but when it renders each pixel on a scanline, taking SCX into account. Note that in this specific case SCX = 0, SCX = 8, and SCX = 16 will look exactly the same, not shifting the image at all.
Higan, BGB and SameBoy all correctly emulate this behavior in this specific scenario, but probably for different reasons since I never found any documentation of it. I haven't tried to create a test ROM that proves my theory incorrect. In fact, I highly suspect it is incorrect, but I don't have a better theory yet, mostly because I don't have an appropriate test ROM.
I hope that explanation makes more sense now.
Now considering Higan does support this behavior, what is your theory on this? Why would SCX even have an effect on that scenario? What would the timings be if you change SCX inside a scanline? That last question is where my theory breaks since this allows you to make a REALLY long scanline.
tl;dr: A vertical scroll value affects an entire 8x1-pixel pattern sliver.
Fine background scrolling on the NES is accomplished by feeding the pattern and attribute bits to the equivalent of four 74164 serial-in, parallel-out shift registers as a delay line, which produces pixel data delayed by 1 to 8 pixels on the 8 outputs, and then selecting a tap through the equivalent of four 74151 8-to-1 multiplexers. But the addresses of the nametable and pattern bytes don't depend on the scroll value at the time the pixel is output but on the coarse scroll value at the time the pattern data is fetched.
I remember reading in kev's Game Boy timing doc that the LCD operates slightly differently from the NES and Super NES PPU because it's allowed to pause the LCD's pixel clock. Thus instead of using a SIPO+mux as a delay, it can just not clock the LCD for the first SCX%8 pixels. But I imagine it still uses the scroll value at the time a sliver of pattern data is fetched, and the fetches occur on a regular cadence: one every eight pixels as on the NES.
The big observable timing differences between NES-style SIPO+mux rendering and Game Boy-style are twofold: The Game Boy may pause between the main background and the window, and mid-line changes to fine SCX might not work as expected.
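The "just don't clock the LCD for the first SCX%8 pixels" idea can be sketched roughly like this in C. Everything here is illustrative: `fetch_sliver_fn`, `solid_sliver` and `render_line` are made-up names, coarse scrolling (SCX / 8) and timing are deliberately left out, and the solid-colour fetcher only mimics the kind of data GBVideoPlayer feeds the background.

```c
#include <stdint.h>

#define LCD_WIDTH 160u /* visible pixels per scanline */

/* Hypothetical fetcher type: fills out[] with the 8 pixels of sliver n. */
typedef void (*fetch_sliver_fn)(unsigned n, uint8_t out[8]);

/* Example fetcher: every sliver is one solid colour, as in GBVideoPlayer. */
void solid_sliver(unsigned n, uint8_t out[8])
{
    for (unsigned i = 0; i < 8; i++)
        out[i] = (uint8_t)(n & 3u);
}

/* Renders one scanline, dropping the first SCX % 8 fetched pixels. */
void render_line(unsigned scx, fetch_sliver_fn fetch, uint8_t line[LCD_WIDTH])
{
    unsigned discard = scx % 8u; /* pixels for which the LCD is not clocked */
    unsigned x = 0;

    for (unsigned n = 0; x < LCD_WIDTH; n++) {
        uint8_t px[8];
        fetch(n, px); /* one fetch per sliver, on a regular cadence */
        for (unsigned i = 0; i < 8 && x < LCD_WIDTH; i++) {
            if (discard > 0) {
                discard--; /* fetched, but never reaches the LCD */
                continue;
            }
            line[x++] = px[i];
        }
    }
}
```

With `solid_sliver` as the source, SCX = 0 puts the first colour change at pixel 8, while SCX = 4 puts it at pixel 4, so the whole line moves left by four pixels even though no tile contains any horizontal detail.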
Managed to answer my own question with a test ROM – Changing SCX while on a scan line has no effect. I tested the ROM on a real Gameboy Color, BGB, Gambatte and SameBoy (Higan 0.99 won't compile on my Mac and I don't have a 64-bit Windows VM).
Gambatte's result is what a real CGB really displays, SameBoy and BGB fail. This means the LCDC ignores changes in SCX if they happen during a scanline. The test ROM basically has consecutive inc [hl] (hl = SCX) instructions during the entire scanline.
I attached the ROM and the results.
Very interesting stuff.
So the idea is, that when the LCD enters mode = 3, the scrollX gets locked ? Or should that be mode 2 ?
Just for the sake of comparisons, here's your test with my rom (also failing of course).
Attachment:
nefustogb.png
DarkMoe wrote:
Very interesting stuff.
So the idea is, that when the LCD enters mode = 3, the scrollX gets locked ? Or should that be mode 2 ?
Just for the sake of comparisons, here's your test with my rom (also failing of course).
Attachment:
nefustogb.png
I think it's still writable, but won't take effect until a new line starts.
Well, at least you got the timing of the first line right, this one is driving me crazy ><
OK, fixed it by doing exactly that: as soon as mode 3 starts, I lock the scroll X register, but it can still be written to.
This is the result, I wonder if it's accurate enough (is it supposed to have those white blocks at the right edge ?)
It's still different from the actual result due to SCX locking on the wrong value. This result is similar to what SameBoy displays after applying the same fix, other than the top line. (I still can't figure out why the first interrupt triggers a little earlier in all other emulators and the real hardware, did I miss a quirk everybody else didn't?)
> tl;dr: A vertical scroll value affects an entire 8x1-pixel pattern sliver.
Ah, that's what I was looking for, thanks :D
And no worries LIJI, I just unfortunately have a hell of a time reading textual descriptions of things (as well as proper math notation.)
> Now considering Higan does support this behavior, what is your theory on this?
Latching. Each time we hit a tile boundary, we fetch all the tiledata for it at that time (we probably fetch it slightly in advance so it's ready to start being drawn right away.)
There's not enough bandwidth for the DMG to refetch all the tiledata on every single pixel.
> Why would SCX even have an effect on that scenario?
tilex = (x + scx) & 7;
if(tilex == 0) fetch_tile();
SCX changes the point at which we hit a tile boundary.
> Changing SCX while on a scan line has no effect
Well that's good. That means it's latched once per scanline.
Makes sense it'd work this way, but mid-scanline writes can be really screwy sometimes.
> did I miss a quirk everybody else didn't?
Probably the way LY seems to act like 0 partway through LY=153, at least for the sake of LYC IRQs and possibly reading the LY register. But like with everything on the DMG, nobody really knows what the hell is going on and it's all guesswork that's never fully confirmed by anyone. So take that with all of the salt in the Pacific Ocean.
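If you want to experiment with that guess, it can be written down as a tiny C sketch. The names are made up, and the point within line 153 at which LY starts to look like 0 is passed in as `wrap_dot` precisely because nobody here has pinned it down; treat it as a knob to tune against test ROMs, not as a known value.

```c
/*
 * Hypothetical model of the guess above: the LY value used for LYC
 * comparisons. wrap_dot is the (unknown) dot within line 153 from which
 * LY already behaves as 0.
 */
unsigned ly_for_compare(unsigned line, unsigned dot, unsigned wrap_dot)
{
    if (line == 153u && dot >= wrap_dot)
        return 0u; /* late in line 153, LY acts like line 0 */
    return line;
}

/* Nonzero when the LY == LYC condition holds under this model. */
int lyc_match(unsigned line, unsigned dot, unsigned wrap_dot, unsigned lyc)
{
    return ly_for_compare(line, dot, wrap_dot) == lyc;
}
```

Under this model an LYC = 0 interrupt can already fire during line 153, which might account for the first interrupt arriving slightly earlier than expected.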
Any idea how to get the Visual 6502 project back up and decapping so we can get some transistor-level simulation up in this female dog?
DarkMoe wrote:
So the idea is, that when the LCD enters mode = 3, the scrollX gets locked ?
More likely explanation: it fetches SCX into an internal counter (used to keep track of its current position at each pixel) and then uses that for the rest of the scanline. So changes to SCX mid-scanline go ignored precisely because it isn't even checked in the first place.
Do new values still remain for the next scanline though or is the write actually masked out?
Sik wrote:
More likely explanation: it fetches SCX into an internal counter (used to keep track of its current position at each pixel) and then uses that for the rest of the scanline. So changes to SCX mid-scanline go ignored precisely because it isn't even checked in the first place.
That's exactly what I thought.
Sik wrote:
Do new values still remain for the next scanline though or is the write actually masked out?
I still don't have a test ROM for that, but considering somewhat similar behavior with registers related to the window (See 007 - The World Is Not Enough), I believe it is still writable.
Higan 0.99 fails this test too. It also misses the first line timing, meaning I'm not alone on this.
LIJI wrote:
I still don't have a test ROM for that, but considering somewhat similar behavior with registers related to the window (See 007 - The World Is Not Enough), I believe it is still writable.
You want me to make a test ROM? Shouldn't take me until this afternoon. If what I have in mind works, visual confirmation should be a lot easier than staring at some of the test pictures I see floating around this thread
Shonumi wrote:
LIJI wrote:
I still don't have a test ROM for that, but considering somewhat similar behavior with registers related to the window (See 007 - The World Is Not Enough), I believe it is still writable.
You want me to make a test ROM? Shouldn't take me until this afternoon. If what I have in mind works, visual confirmation should be a lot easier than staring at some of the test pictures I see floating around this thread
That would be great, as I'm currently trying to figure out why I'm missing that first line.
As for a nightmarish-epilepsy-causing test image, I assumed the behavior would be more non-trivial when I wrote the test ROM, so I wanted it to contain as much info as possible. Also, basing the ROM on GBVideoPlayer didn't help making the result any simpler ><
Sorry LIJI, it's taking me a bit more time to make a good test ROM. I think I've run into some unexpected behavior. What I'm trying to do is make a test pattern and change the value of SCX at different points during scanline rendering. Early testing suggests our "SCX latching" theory may need some adjustments. I can definitely write to SCX during Mode 2 (OAM), but curiously Mode 0 (HBlank) seems to ignore any new values of SCX. The next line isn't scrolled at all.
I don't know if that is news to anyone, but it makes sense why so many games use the LY == LYC IRQ when changing SCX midscreen. A timely LY == LYC IRQ should happen during Mode 2. No idea what happens when we try writing SCX during Mode 3, most likely the same as Mode 0. Don't know if that helps, or if that muddies up things even more. I'll have to adjust the test ROM to account for this, so I'll probably have something ready later during the day/night.
Alright, I think this is good enough. Forget about what I said earlier. Seems you can change SCX in Mode 2, Mode 3, and Mode 0 and still have it take effect on the next line. Not being able to get it working in HBlank was just my bad coding.
Anyway, it's simple. The test ROM just shifts a fullscreen pattern by 0 or 1 every other line (the original pattern is already shifted every other line, so the SCX changes "correct" it). The result should create a series of vertical bars. The source code is attached as well as the ROM. Nothing fancy, but I've messed around with the OAM-STAT IRQ a few times to get it working with Modes 2, 3, and 0. Same result on real hardware. Nothing changes on the current line, just the next one. My guess is the latched value for SCX is determined between the end of Mode 0 and the start of Mode 2.
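For emulator authors, the behaviour the test ROM suggests (writes always land in the register, but only the next line sees them) could be modelled along these lines in C. The struct and function names are made up, and the exact latch moment is an assumption based on the guess above: here it is simply "whenever the emulator starts a new scanline".

```c
#include <stdint.h>

/* Hypothetical scroll state: the CPU-visible register plus a per-line copy. */
typedef struct {
    uint8_t scx;         /* register value: always readable and writable */
    uint8_t scx_latched; /* copy the renderer uses for the current scanline */
} ppu_scroll;

/* CPU write to SCX: accepted in any mode, never masked out. */
void ppu_write_scx(ppu_scroll *p, uint8_t value)
{
    p->scx = value;
}

/* CPU read of SCX: returns the last value written. */
uint8_t ppu_read_scx(const ppu_scroll *p)
{
    return p->scx;
}

/* Call once per scanline, somewhere between the end of Mode 0 and the
 * start of Mode 2 (assumed latch point). */
void ppu_begin_line(ppu_scroll *p)
{
    p->scx_latched = p->scx;
}

/* Value the renderer should use for every pixel of the current line. */
uint8_t ppu_line_scx(const ppu_scroll *p)
{
    return p->scx_latched;
}
```

A write made during Mode 2, 3 or 0 of line N then leaves line N untouched and shows up on line N+1, which is the vertical-bar result the test ROM expects.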
Hmm, I think something is still wrong with my implementation. Look at this: the first line is inverted, and the rest start with a black line.
What's the correct pattern for your test ?
The 1st line should be in sync with the rest. Technically, the very last pixel on some of the scanlines should be messed up as well, but I haven't pinpointed why that seems to happen on real hardware.
To me, it looks like you're applying SCX = 1 to the 1st scanline. This shouldn't happen. Every even line should have SCX = 0, and every odd line should have SCX = 1. The only time SCX is changed is in those OAM-STAT interrupts. I think what you're dealing with is some obscure (and poorly documented) behavior with the last scanline in VBlank that affects Line 0's STAT interrupts and LYC interrupts. For more info, check out Kirby's Dream Land 2's intro with LYC == 0 IRQ. Line 153 does some funky stuff, and it looks like that's at play here. Try stepping the test ROM through BGB for a closer look.