PPU questions

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

by n6 on 2006-08-10 (#16096)

Just a few questions about PPU behaviour...

1. For NTSC, the top and bottom 8 scanlines should be removed from the screen output. Does this mean that Sprite0Hit and LostSprites should be discarded as well if they occur inside this region?

2. Is the sprite0Hit flag set even if sprite 0 is removed because there are more than 8 sprites on a scanline?

3. One last thing about lost sprites: if more than 8 sprites are found on a scanline, should just the current line of the sprite be removed, or the whole sprite? In some cases only part of a sprite is on a scanline containing more than 8 sprites.

Thanks

by hap on 2006-08-10 (#16097)

1: You should really leave that up to the end user; e.g., my PAL TV clips the first scanline, shows all bottom scanlines, and clips some pixels on the left and right. No matter how many pixels the TV clips, the NES renders all of them, so you shouldn't discard your Sprite0Hit and LostSprites.

2: No, if it's not rendered, the flag is not set. Though that's only possible if the current sprite address isn't 0. *edit* come to think of it, the 8 sprites limit can actually never even affect sprite 0.

3: just the current line

by n6 on 2006-08-10 (#16098)

Thanks for the fast answer.

Quote:
the 8 sprites limit can actually never even affect sprite 0.

Say that sprite 0 has position (20, 0)
and all other sprites have position (0, 0).
Isn't the render order sprites 1-63, and then sprite 0 last?

by Memblers on 2006-08-10 (#16099)

n6 wrote:
isn't the render order sprites 1-63, and then sprite 0 last?

Yes. Since it's rendered last, that gives it the opportunity to overwrite any other one (besides 1-7 of course). They're all evaluated before being drawn to the screen. For each scanline.

by Disch on 2006-08-10 (#16100)

Quote:
Do this mean that Sprite0Hit and LostSprites should be discarded as well if its inside this region?

No. Sprite 0 will hit (and sprite overflow will be detected) on these scanlines even if the scanlines are not visible on the user's display. All 240 lines are still rendered by the NES even on NTSC -- whether or not the TV shows them is irrelevant to how the NES operates.

Sprite 0 / Overflow will still work even if no TV is connected to the NES.
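To make the point concrete for an emulator, here is a minimal sketch assuming a hypothetical design that renders every frame into a full 256x240 buffer and only crops when presenting (e.g. 8 lines top and bottom for a 224-line NTSC view). `framebuffer` and `present_cropped` are illustrative names, not from any emulator in this thread.

```c
#include <stdint.h>
#include <string.h>

#define NES_W 256
#define NES_H 240

/* Full output of the (not shown) rendering step. Sprite 0 hit and
   overflow would be decided there, for all 240 lines. */
static uint8_t framebuffer[NES_H][NES_W];

/* Copy the visible window into `out` (which must hold at least
   (NES_H - top - bottom) * NES_W bytes), skipping `top` and `bottom`
   lines. Cropping is purely a presentation choice.
   Returns the number of lines copied. */
int present_cropped(uint8_t *out, int top, int bottom)
{
    if (top < 0 || bottom < 0 || top + bottom >= NES_H)
        return 0;
    int lines = NES_H - top - bottom;
    memcpy(out, framebuffer[top], (size_t)lines * NES_W);
    return lines;
}
```

Because cropping happens after rendering, changing `top`/`bottom` can never change when sprite 0 hit or overflow occur.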

Quote:
isn't the render order sprites 1-63, and then sprite 0 last?

Sprites are not evaluated by their X position, they're evaluated by their index number (ie, Sprite 0 will always be on "top" of all the other sprites, and sprite 63 will always be on "bottom").

X position is irrelevant when it comes to determining which sprites are displayed and which get priority. Lower number sprites always have priority over higher number ones.
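A minimal sketch of evaluation by index number, assuming 8x8 sprites and an idealized overflow flag (the real PPU's overflow check is known to be buggy, which this ignores). `Sprite` and `evaluate_line` are hypothetical names. X is never consulted; the first 8 in-range sprites in OAM order win, so sprite 0 can never be the one that drops out.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 4-byte OAM entry. */
typedef struct {
    uint8_t y;      /* top line minus one */
    uint8_t tile;
    uint8_t attr;
    uint8_t x;
} Sprite;

/* Select the sprites on `line` in OAM index order. Writes up to 8
   indices to `slots`, returns how many were found, and sets *overflow
   if a 9th in-range sprite exists. */
int evaluate_line(const Sprite oam[64], int line, int slots[8], bool *overflow)
{
    int count = 0;
    *overflow = false;
    for (int i = 0; i < 64; i++) {
        int top = oam[i].y + 1;          /* drawn one line below OAM Y */
        if (line < top || line >= top + 8)
            continue;                    /* not on this scanline */
        if (count == 8) {
            *overflow = true;            /* 9th sprite: dropped */
            break;
        }
        slots[count++] = i;
    }
    return count;
}
```

With n6's example (sprite 0 at X=20, sprites 1-63 at X=0, all at Y=0), line 1 selects sprites 0-7 and sets overflow. The check is per line, so a sprite can lose only the rows that fall on crowded lines.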

by n6 on 2006-08-10 (#16102)

Quote:
Sprite 0 / Overflow will still work even if no TV is connected to the NES.

Yes, this makes perfect sense. I was just a bit unsure because the systems are different and my doc says that the screen is 224 lines high; I just wanted to be sure. But I understand that this only depends on the TV.

Quote:
Sprites are not evaluated by their X position, they're evaluated by their index number (ie, Sprite 0 will always be on "top" of all the other sprites, and sprite 63 will always be on "bottom").

X position is irrelevant when it comes to determining which sprites are displayed and which get priority. Lower number sprites always have priority over higher number ones.

But what I meant with my example was that only sprites 1-8 will be drawn, right? Now I'm confused: since sprite 0 isn't drawn, will the sprite0hit flag be set or not?

by tepples on 2006-08-10 (#16103)

Sprite 0 is always drawn, unless it's offscreen or transparent. If you have sprites 0 through 8 on a scanline, sprite 8 always drops out.

by n6 on 2006-08-10 (#16105)

Okay, so to find out which sprites should be drawn I should do something like this (ignoring X position):

Code:
int sprCount = 0;
for (int i = 0; i < 64; i++)
{
    // an 8x8 sprite covers lines y+1 .. y+8
    if (curScanline >= spr[i].y + 1 && curScanline < spr[i].y + 9)
    {
        // draw sprite pixels
        sprCount++;
        if (sprCount >= 8)
            break;
    }
}

by Quietust on 2006-08-10 (#16106)

Memblers wrote:
n6 wrote:
isnt the render order sprite 1-63 and then last sprite 0

Yes. Since it's rendered last, that gives it the opportunity to overwrite any other one (besides 1-7 of course). They're all evaluated before being drawn to the screen. For each scanline.

That's a rather bad way to think about it, since the PPU works in the exact opposite direction. The PPU doesn't render the 8th sprite first and then draw the 7th sprite on top of it - rather, it draws sprite 0 first and then, if it finds another sprite in the same location, it ignores it.

Rather than thinking of it as lower numbered sprites overriding higher numbered sprites, think of it as an ordinary priority encoder. After all, the PPU renders the background and all of the sprites simultaneously, one pixel per cycle, and it only has time to pick one sprite pixel per cycle (and then decide whether it should draw the sprite or the background).
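The priority-encoder view can be sketched as below, assuming the eight slots have already been filled in OAM order with each sprite's X and its decoded 8-pixel row. `SlotLine` and `sprite_pixel` are hypothetical names; this only picks the winning sprite pixel and leaves the sprite/background decision to the caller.

```c
#include <stdint.h>

/* Per-slot data for the current scanline. */
typedef struct {
    uint8_t x;
    uint8_t attr;
    uint8_t pixels[8];   /* 2-bit colors, 0 = transparent */
} SlotLine;

/* Scan slots front to back and return the first opaque pixel covering
   screen column `px`; the winning slot goes to *slot_out (-1 if none).
   Nothing is drawn and then overwritten: lower slots simply win. */
uint8_t sprite_pixel(const SlotLine slots[], int count, int px, int *slot_out)
{
    for (int s = 0; s < count; s++) {
        int dx = px - slots[s].x;
        if (dx < 0 || dx > 7)
            continue;                    /* sprite does not cover px */
        if (slots[s].pixels[dx] != 0) {
            *slot_out = s;
            return slots[s].pixels[dx];
        }
    }
    *slot_out = -1;
    return 0;                            /* no sprite pixel here */
}
```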

n6 wrote:
Okay, so to find out which sprites should be drawn I should do something like this (ignoring X position):

Code:
int sprCount = 0;
for (int i = 0; i < 64; i++)
{
    // an 8x8 sprite covers lines y+1 .. y+8
    if (curScanline >= spr[i].y + 1 && curScanline < spr[i].y + 9)
    {
        // draw sprite pixels
        sprCount++;
        if (sprCount >= 8)
            break;
    }
}

You don't actually draw any sprite pixels during the Y-coordinate evaluation. You just throw the sprite numbers (or the sprite data itself) into a list which you can reference during the next scanline. At that point, for each pixel on the scanline, you loop from 1 to 8 and break; out of the loop once you find a non-transparent sprite pixel.
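The "list for the next scanline" amounts to double buffering: the list filled during line N is the one consumed during line N+1. A minimal sketch, with `LineSprites`, `SpritePipeline`, `add_to_next` and `end_of_scanline` all hypothetical names; the evaluation and drawing themselves are not shown.

```c
#include <string.h>

/* Sprites selected for one scanline, in OAM (priority) order. */
typedef struct {
    int count;      /* 0-8 */
    int index[8];   /* OAM indices */
} LineSprites;

typedef struct {
    LineSprites current;   /* drawn during this scanline */
    LineSprites next;      /* filled during this scanline for the next one */
} SpritePipeline;

/* Record an in-range sprite for the next line. Returns 0 if 8 are
   already stored (the caller would set the overflow flag). */
int add_to_next(SpritePipeline *p, int oam_index)
{
    if (p->next.count == 8)
        return 0;
    p->next.index[p->next.count++] = oam_index;
    return 1;
}

/* Called once at the end of every scanline. */
void end_of_scanline(SpritePipeline *p)
{
    p->current = p->next;                 /* evaluated list becomes drawn list */
    memset(&p->next, 0, sizeof p->next);  /* start fresh for the next line */
}
```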

by Memblers on 2006-08-10 (#16108)

n6: seems about right, as long as the lower-numbered sprites, when finally displayed, will be on top of the higher-numbered ones.

Quietust wrote:
Memblers wrote:
Yes. Since it's rendered last, that gives it the opportunity to overwrite any other one (besides 1-7 of course). They're all evaluated before being drawn to the screen. For each scanline.

That's a rather bad way to think about it, since the PPU works in the exact opposite direction. The PPU doesn't render the 8th sprite first and then draw the 7th sprite on top of it - rather, it draws sprite 0 first and then, if it finds another sprite in the same location, it ignores it.

Yeah I could've worded it better, I was thinking of rendering as being more like putting it into the buffer of what will be displayed (and overwriting it in that buffer). Backwards maybe, but I guess it would work too. I've never written a PPU emulator though.
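For the question of which sprite wins, the buffer-overwrite approach and the priority encoder agree: drawing slots back to front, with opaque pixels overwriting, leaves the lowest-numbered opaque pixel on top. A minimal sketch comparing the two for one screen column, assuming each slot's pixel at that column is already known (0 = transparent). Both function names are hypothetical, and the sprite/background priority decision is not covered.

```c
#include <stdint.h>

/* Painter's approach: draw slot 7 first and slot 0 last; opaque pixels
   overwrite whatever is already there. */
uint8_t painter_pick(const uint8_t px[8])
{
    uint8_t out = 0;
    for (int s = 7; s >= 0; s--)
        if (px[s] != 0)
            out = px[s];
    return out;
}

/* Priority encoder: start at slot 0 and stop at the first opaque pixel. */
uint8_t encoder_pick(const uint8_t px[8])
{
    for (int s = 0; s < 8; s++)
        if (px[s] != 0)
            return px[s];
    return 0;
}
```

The encoder can stop early, while the painter version always touches all eight slots.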

by n6 on 2006-08-10 (#16109)

I don't understand when the Lost Sprites flag is set and when the pixels are drawn.
Sprites can change their positions during rendering, right?

Is this correct? And is it done every PPU cycle, or once at the beginning of each scanline?

Code:
sprCount = 0;

for (int i = 0; i < 64; i++)
{
    // an 8x8 sprite covers lines y+1 .. y+8
    if (curScanline >= spr[i].y + 1 && curScanline < spr[i].y + 9)
    {
        if (sprCount == 8)
        {
            // a 9th sprite is in range: set Lost Sprites flag
            break;
        }
        // Add sprite to list
        sprCount++;
    }
}

for (int i = sprCount - 1; i >= 0; i--)
{
    // DrawSpritePixel from list entry i (back to front)
}

by dvdmth on 2006-08-10 (#16110)

Sprites cannot change their positions during rendering, AFAIK. Sprite memory cannot be written (through $2004 or $4014) except during VBlank or if rendering is disabled ($2001 bits 3-4 are both clear).

Sprites are processed one scanline before they are drawn. This is why games have to subtract 1 from the sprite's Y position before updating sprite memory. The PPU first scans sprite RAM, looking for sprites that are in range. Each sprite that is found is placed in a temporary buffer. There are only 32 bytes in this buffer, enough for 8 sprites' worth of data, which is why you cannot have more than eight sprites on a scanline. Then, during HBlank (the time between scanlines), the PPU goes to CHR-ROM or CHR-RAM (whichever is present) and gets the appropriate tile data for these sprites. The results are placed in another buffer, eight pixels per sprite, so that they can be rendered on the next scanline.

During that scanline, the PPU scans each buffer looking for a non-transparent pixel to draw. It looks at the first sprite and checks whether its pixel is non-transparent at that X position. If it is, the other sprites aren't even looked at: the pixel is rendered and drawn (assuming the background doesn't overlap it and have priority). If the pixel is transparent, it is thrown out and the second sprite's data is examined. The process is repeated until either a non-transparent pixel is found or all eight sprites have been examined (at which point no pixel is drawn in the sprite layer).

Thus, during each scanline, the PPU draws the sprites that were processed on the previous scanline, while simultaneously processing sprites to be drawn on the next scanline.
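The fetch step described above (turning an in-range sprite into eight pixels for the next line) can be sketched as follows, assuming 8x8 sprites and a pattern table already resolved to a flat array of 16-byte tiles (low bitplane in bytes 0-7, high bitplane in bytes 8-15). `decode_sprite_row` is a hypothetical helper; the hardware uses shift registers, and 8x16 sprites and bank switching are ignored.

```c
#include <stdint.h>

/* Decode one 8-pixel row of an 8x8 sprite for screen line `line`.
   Attribute bit 7 = vertical flip, bit 6 = horizontal flip.
   Returns 0 if the sprite is not on `line`. */
int decode_sprite_row(const uint8_t *chr, uint8_t y, uint8_t tile,
                      uint8_t attr, int line, uint8_t out[8])
{
    int row = line - (y + 1);            /* OAM Y is one less than the top line */
    if (row < 0 || row > 7)
        return 0;
    if (attr & 0x80)                     /* vertical flip */
        row = 7 - row;

    uint8_t lo = chr[tile * 16 + row];
    uint8_t hi = chr[tile * 16 + row + 8];

    for (int i = 0; i < 8; i++) {
        int bit = (attr & 0x40) ? i : 7 - i;   /* bit 7 is the leftmost pixel */
        out[i] = (uint8_t)(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
    }
    return 1;
}
```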

by Disch on 2006-08-10 (#16126)

My way isn't 100% accurate as far as when sprite patterns are fetched, but it's "accurate enough" for practical purposes. The only real way I could practically improve the accuracy would be to be more precise about when I load the CHR, but meh...

ANYWAY... my method is as follows:

I keep an intermediate buffer for sprite pixels (256+8 bytes wide).

On cycle 257 (near start of HBlank) I do all my sprite evaluations for the NEXT scanline, and fill the intermediate buffer with sprite pixels as they will be rendered on the next scanline. When rendering pixels, I take the appropriate byte from the buffer and render it. This means copying sprite pixels twice (once to intermediate buffer and once to output buffer), so it's not the most efficient method ever, but it's very easy and flexible.

In addition to having the desired pixel, the intermediate buffer also tracks two flags to signal the properties of that pixel. Bit 7 indicates foreground/background priority. If set, this sprite pixel has background priority. Bit 6 indicates the pixel belongs to sprite 0 (so that checks for sprite-0 hit can be made with the appropriate BG pixel during rendering).
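A sketch of such a per-line buffer. Bits 7 and 6 follow the description above; putting the 4-bit sprite palette index (palette select plus 2-bit color) in the low bits, filling in priority order (first opaque pixel in a column is kept), and all names are assumptions for illustration, not Disch's actual code. The 8 spare bytes let a sprite at X=255 be written without clipping.

```c
#include <stdint.h>
#include <string.h>

#define SPR_BEHIND_BG 0x80   /* bit 7: sprite pixel has background priority */
#define SPR_IS_ZERO   0x40   /* bit 6: pixel belongs to sprite 0 */

/* One sprite already evaluated and decoded for the coming scanline. */
typedef struct {
    int oam_index;           /* 0-63 */
    uint8_t x, attr;         /* attr bits 0-1: palette, bit 5: behind BG */
    uint8_t pixels[8];       /* 2-bit colors, 0 = transparent */
} EvaluatedSprite;

static uint8_t line_buf[256 + 8];

/* Lowest slot first; a column keeps the first opaque pixel written. */
void fill_line_buffer(const EvaluatedSprite *spr, int count)
{
    memset(line_buf, 0, sizeof line_buf);
    for (int s = 0; s < count; s++) {
        for (int i = 0; i < 8; i++) {
            uint8_t c = spr[s].pixels[i];
            uint8_t *dst = &line_buf[spr[s].x + i];
            if (c == 0 || *dst != 0)
                continue;                          /* transparent or taken */
            *dst = (uint8_t)(((spr[s].attr & 3) << 2) | c);
            if (spr[s].attr & 0x20)
                *dst |= SPR_BEHIND_BG;
            if (spr[s].oam_index == 0)
                *dst |= SPR_IS_ZERO;
        }
    }
}

/* Decide column x against a background color (0 = transparent).
   Returns the 4-bit sprite palette index to draw, or 0 to draw the
   background. Sets *hit on sprite 0 hit; the X=255 and left-clip
   exceptions are not modeled. */
uint8_t composite(int x, uint8_t bg_color, int *hit)
{
    uint8_t e = line_buf[x];
    if (e == 0)
        return 0;                                  /* no sprite here */
    if ((e & SPR_IS_ZERO) && bg_color != 0)
        *hit = 1;
    if ((e & SPR_BEHIND_BG) && bg_color != 0)
        return 0;                                  /* background in front */
    return (uint8_t)(e & 0x0F);
}
```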

by hap on 2006-08-11 (#16137)

Then how do you handle MMC2/4 CHR bankswitching when fetching sprites? And is it possible to have an accurate MMC3/6 IRQ counter?

Quote:
In addition to having the desired pixel, the intermediate buffer also tracks two flags to signal the properties of that pixel. Bit 7 indicates foreground/background priority. If set, this sprite pixel has background priority. Bit 6 indicates the pixel belongs to sprite 0 (so that checks for sprite-0 hit can be made with the appropriate BG pixel during rendering).
Sounds familiar. I'm using something similar; I got the idea from you over a year ago. You reminded me that I forgot to credit you for it.

by Disch on 2006-08-11 (#16139)

hap wrote:
Then how do you handle MMC2/4 CHR bankswitching when fetching sprites?

That'd be done where the sprites are fetched for the next scanline. After I put the sprite pixels in the intermediate buffer I simply check which tile was loaded and if it was $FD/$FE, I notify the mapper.

Code:
// Tiles $FD/$FE trip the MMC2/MMC4 latch; tl == 0xFE tells the mapper which one
if(MprMMC2Latch && (count < 8) && (tl >= 0xFD) && (tl <= 0xFE))
    (this->*MprMMC2Latch)(nSpCHRPage,tl == 0xFE);

(the "count < 8" thing is so that it will only swap if this is one of the first 8 sprites being rendered -- since I have the option to not limit drawing to 8 sprites per scanline, I wouldn't want additional sprites to mess with the CHR page)
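A minimal sketch of that notification, assuming a hypothetical mapper struct with one latch per pattern table, updated after each sprite's tile fetch. Only the tile-number check and the first-8-sprites rule from the post are modeled; the exact fetch addresses that trip real MMC2/MMC4 latches and the CHR bank selection itself are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapper state: one latch per pattern table ($0000/$1000),
   each holding 0xFD or 0xFE. */
typedef struct {
    uint8_t latch[2];
} Mmc2State;

/* Called after the `count`-th sprite (0-based) on the line has its tile
   fetched from pattern table `page`. Sprites past the 8th are ignored,
   so an option to show more than 8 sprites can't disturb the banking. */
void on_sprite_tile_fetched(Mmc2State *m, int page, uint8_t tile, int count)
{
    if (count < 8 && (tile == 0xFD || tile == 0xFE))
        m->latch[page & 1] = tile;
}

/* True if the latch for `page` currently selects the "FE" bank. */
bool latch_selects_fe(const Mmc2State *m, int page)
{
    return m->latch[page & 1] == 0xFE;
}
```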

Quote:
And is it possible to have an accurate MMC3/6 IRQ counter?

Sure -- since I keep my IRQ counter tracking events in their own way (completely unrelated to how the PPU emu is running).

I did a basic outline of my method in another thread if you're interested:

http://nesdev.com/bbs/viewtopic.php?t=1822

The code IS pretty atrocious though, and very hard to follow. But if you are interested I could upload it for you.

Quote:
Sounds familiar, I'm using something similar, got the idea from you over a year ago. You reminded me I forgot to credit you for it.

Hah. Don't worry about crediting me for an idea. Man if I credited every person I got a coding idea or technique from, I'd have a 10 page list.

by hap on 2006-08-11 (#16151)

Quote:
I did a basic outline of my method in another thread if you're interested
I've read it now. It's at a higher level than what I have: a function call for every PPU read, and if MMC3 is used, overriding them by setting a few function pointers to its IRQ handler.

by Disch on 2006-08-11 (#16159)

hap wrote:
I've read it now. It's at a higher level than what I have: a function call for every PPU read, and if MMC3 is used, overriding them by setting a few function pointers to its IRQ handler.

How do you predict upcoming IRQs with that method? It sounds like you'd have to constantly be keeping the CPU and PPU in sync (sloooooooow).

by Quietust on 2006-08-11 (#16162)

Disch wrote:
hap wrote:
I've read it now. It's at a higher level than what I have: a function call for every PPU read, and if MMC3 is used, overriding them by setting a few function pointers to its IRQ handler.

How do you predict upcoming IRQs with that method? It sounds like you'd have to constantly be keeping the CPU and PPU in sync (sloooooooow).

That's funny, since that's exactly how Nintendulator does it. And it isn't really that slow - any Pentium 4 system can run it fine at full speed with zero frameskip with at least 25% CPU to spare.

by blargg on 2006-08-11 (#16164)

Polling something hundreds of thousands or millions of times per emulated second is a lot slower than predicting it in advance (possibly updating that a few times) and then carrying it out at the predicted time. Things really add up if you're polling multiple subsystems. It may not be a problem for emulating at normal speed on modern systems, but when you want to do fast-forward or run multiple emulators at once, it becomes a problem.
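The predict-then-run idea can be sketched as a tiny scheduler: instead of polling the mapper every cycle, the mapper reports when its next IRQ is due and the core runs the CPU in one stretch up to that point. Everything here is hypothetical, including the simplification that an MMC3-style counter is clocked exactly once per 341-PPU-cycle scanline; real MMC3 clocking depends on PPU A12 and its reload behavior is not modeled.

```c
#include <stdint.h>

#define PPU_CYCLES_PER_LINE 341

/* Hypothetical, simplified MMC3-style IRQ counter. */
typedef struct {
    int counter;          /* clocks left before the IRQ */
    int enabled;          /* IRQ enable */
    int64_t next_clock;   /* PPU timestamp of the next scanline clock */
} IrqCounter;

/* Predict the PPU timestamp of the next IRQ, or -1 if none is pending. */
int64_t predict_irq(const IrqCounter *c)
{
    if (!c->enabled || c->counter <= 0)
        return -1;
    return c->next_clock + (int64_t)(c->counter - 1) * PPU_CYCLES_PER_LINE;
}

/* The CPU can run uninterrupted up to the returned timestamp. */
int64_t next_stop(int64_t frame_end, const IrqCounter *c)
{
    int64_t irq = predict_irq(c);
    return (irq >= 0 && irq < frame_end) ? irq : frame_end;
}
```

If the game writes to the mapper before that point, the prediction is simply recomputed.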

by tepples on 2006-08-11 (#16165)

Inefficiency also becomes a problem if you want to port your emulator to handheld devices.

by hap on 2006-08-11 (#16166)

Yeah, PPU and CPU have to be kept in sync constantly for MMC3, but I don't think that's a big speed hit: MMC3 games on my emulator run about 5% slower than an NROM game that uses a catch-up method to keep CPU and PPU in sync.

Having every read and write going through a function, instead of direct, is more of a 'problem' for speed. It allows for greater flexibility though, not just for MMC3. My whole emulator state is mostly based on a bunch of function pointers.
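A minimal sketch of that kind of design: every PPU read goes through a function pointer, and an MMC3-style mapper installs its own handler that watches address line A12 before forwarding the read. All names are hypothetical; the pattern tables are a flat array, and the pulse filtering real MMC3 chips apply to A12 is omitted, so this just counts rising edges.

```c
#include <stdint.h>

typedef uint8_t (*PpuReadFn)(uint16_t addr);

static uint8_t chr_mem[0x2000];   /* pattern tables, flat for simplicity */
static PpuReadFn ppu_read;        /* every PPU fetch goes through this */

static uint8_t plain_read(uint16_t addr)
{
    return chr_mem[addr & 0x1FFF];
}

/* MMC3-style hook: count rising edges of A12 (address bit 12),
   which the MMC3 uses to clock its scanline counter. */
static int last_a12;
static int a12_rises;

static uint8_t mmc3_read(uint16_t addr)
{
    int a12 = (addr >> 12) & 1;
    if (a12 && !last_a12)
        a12_rises++;              /* real chips also filter short pulses */
    last_a12 = a12;
    return plain_read(addr);
}

void install_mapper(int is_mmc3)
{
    ppu_read = is_mmc3 ? mmc3_read : plain_read;
    last_a12 = 0;
    a12_rises = 0;
}
```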

by Disch on 2006-08-11 (#16171)

Quietust wrote:
That's funny, since that's exactly how Nintendulator does it. And it isn't really that slow - any Pentium 4 system can run it fine at full speed with zero frameskip with at least 25% CPU to spare.

Eehhh..

I would call Nintendulator slow. Not that I'm trying to knock it or anything, but it's the only NES emu I can't get a steady framerate on (even with 1x scaling and sound disabled), while I can run most others at full speed with graphics filters, full sound, and speedy fast-forwarding without a hitch.

Granted my computer is several years old and it's not all that hot -- but I guess my point is what is considered "slow" is subjective. I can say that in my emu, when I catch the PPU up between every CPU instruction, the hit in speed is very noticeable.

by blargg on 2006-08-11 (#16172)

Quote:
I can say that in my emu, when I catch the PPU up between every CPU instruction, the hit in speed is very noticeable.

A design meant for catch-up won't be optimized for having the various subsystem emulator functions called every cycle, so it will naturally give much poorer performance than one originally intended for continuous operation (like Nintendulator). Thinking about this more, both designs require extra attention to different key areas: catch-up needs reliable prediction and precise "run until" functionality, and continuous needs heavy optimization for the code paths that run every cycle.

And now I can't remember what this thread was originally about.

by hap on 2006-08-12 (#16181)

Quote:
Granted my computer is several years old and it's not all that hot
Mine's several years old too (2002 technology) and runs Nintendulator just fine, so yours must be an antique... but seriously, I think it's actually an advantage for a developer to have a slower computer: it gives you the will (and the need) to prioritize optimization.

by Disch on 2006-08-12 (#16186)

P3 1GHz, 512MB RAM. It wasn't top of the line when I bought it (probably around 1999? Can't remember), but it was pretty decent.

Nowadays, yeah I suppose it's a relic. I'd upgrade if I could, but you need money for that kind of thing. ;P

by Quietust on 2006-08-12 (#16191)

Disch wrote:
P3 1GHz, 512MB RAM. It wasn't top of the line when I bought it (probably around 1999? Can't remember), but it was pretty decent.

Nowadays, yeah I suppose it's a relic. I'd upgrade if I could, but you need money for that kind of thing. ;P

Same system I use, except mine's got 768MB RAM.

Yes, Nintendulator doesn't even run full speed on my own system.
This is one of the reasons why I want to upgrade.