Stuck on PPU Implementation

by RobertLoggia on 2015-04-05 (#144556)

I just finished writing my CPU and tested it against Klaus' 6502 test program. Everything works except BCD mode. I'm now starting to write the PPU. The CPU seems like a walk in the park compared to what I'm trying to grok on the nesdev wiki.

What's a great/easy ROM for testing while programming the PPU?

Also, is there any recommended documentation (for beginners) for writing the PPU? I don't have any experience in game programming, which might be why I'm struggling to understand the concepts.

Re: Stuck on PPU Implementation
by nIghtorius on 2015-04-05 (#144570)

Do you also keep track of instruction cycles? You need to, because the NTSC PPU runs 3 cycles per CPU cycle. This synchronization is very important, as a lot of games depend on it.
You could try Donkey Kong. It's the easiest game to start with for PPU development.

Super Mario Bros. is actually a lot harder to emulate.
Also, test NROM games, not games that use MMCx or other mappers, because you'd need to emulate those mappers too to get those games running.
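Keeping track of instruction cycles can be as simple as having the CPU core report how many cycles each instruction took and then running the PPU for 3 dots per cycle. This is a minimal lockstep sketch; cpu_step() and ppu_tick() are hypothetical stand-ins, and the fixed 2-cycle return value only exists so the example runs.

```c
#include <stdint.h>

/* Counts PPU dots; a real ppu_tick() would render one dot. */
static uint64_t ppu_dots = 0;
static void ppu_tick(void) { ppu_dots++; }

/* Hypothetical stand-in for a CPU core: executes one instruction and
   returns how many CPU cycles it took (a NOP, for example, takes 2). */
static int cpu_step(void) { return 2; }

/* Run one instruction, then give the PPU 3 dots per CPU cycle (NTSC). */
static void step_machine(void) {
    int cycles = cpu_step();
    for (int i = 0; i < cycles * 3; i++)
        ppu_tick();
}
```

Calling step_machine() in a loop keeps the two chips within one instruction of each other, which is already enough for simple NROM games.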

Re: Stuck on PPU Implementation
by mkwong98 on 2015-04-06 (#144628)

I started with the BKG Graphics Test, which can be found on the projects page of the NesDev wiki. It is very simple, so it is easy to see what's wrong.

Re: Stuck on PPU Implementation
by MottZilla on 2015-04-06 (#144636)

First, do you understand the general concept of how the PPU generates an image? For example, do you understand the name tables and pattern tables? It would help anyone trying to help you if you could detail what you understand so far.

Re: Stuck on PPU Implementation
by tokumaru on 2015-04-06 (#144638)

Just like the CPU, the PPU repeats a series of operations over and over. While the CPU is stuck fetching, decoding and executing instructions, the PPU is stuck on a more complex loop, generating video based on some internal parameters (which can be changed by the program) and VRAM/ROM.

This wiki page describes what happens every frame, and that's what the PPU does over and over. There's also sprite evaluation, which runs in parallel with the background rendering.
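A simplified sketch of sprite evaluation, assuming a hypothetical evaluate_sprites() that scans all 64 OAM entries at once instead of spreading the work over the scanline's dots, and that ignores the real hardware's buggy sprite overflow check. OAM holds 4 bytes per sprite (Y, tile, attributes, X). During scanline N the PPU picks the (up to 8) sprites drawn on scanline N+1, which is why a sprite's OAM Y byte is one less than its top line.

```c
#include <stdint.h>

typedef struct {
    uint8_t index[8];   /* OAM indices of sprites found on the line */
    int count;          /* how many were found (0-8) */
    int overflow;       /* set if a 9th sprite was in range */
} LineSprites;

/* sprite_height is 8 or 16, depending on PPUCTRL bit 5. */
static LineSprites evaluate_sprites(const uint8_t oam[256], int scanline,
                                    int sprite_height) {
    LineSprites r = {{0}, 0, 0};
    for (int n = 0; n < 64; n++) {
        int row = scanline - oam[n * 4];   /* distance below the sprite's Y */
        if (row >= 0 && row < sprite_height) {
            if (r.count < 8) {
                r.index[r.count++] = (uint8_t)n;
            } else {
                r.overflow = 1;            /* simplified: hardware is buggier */
                break;
            }
        }
    }
    return r;
}
```

Sprites with lower OAM indices win when more than 8 share a line, which is what produces the familiar flicker when games rotate their OAM order.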

The CPU and the PPU run in parallel, so you have to find a way to emulate that. People who are worried about speed often switch between the PPU and CPU every scanline (i.e. run the CPU for 1 scanline, then run the PPU for 1 scanline), but those who are worried about accuracy may switch every cycle. Another option is the "catch up" method, which lets each chip run until something it does affects the other, and then catches the other one up to that point. The CPU can affect the PPU by writing to its registers ($2000-$2007), and the PPU can affect the CPU with NMIs and the flags in the status register ($2002: VBlank, sprite 0 hit, etc.).
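A minimal sketch of the CPU-to-PPU half of the catch-up method, assuming NTSC timing (3 dots per CPU cycle). Machine, ppu_tick(), ppu_catch_up() and cpu_write_ppu_register() are hypothetical names; the point is only that the PPU lags behind until a register access forces it to catch up.

```c
#include <stdint.h>

/* Hypothetical emulator state for illustrating catch-up. */
typedef struct {
    uint64_t cpu_cycles;   /* CPU cycles executed so far */
    uint64_t ppu_dots;     /* PPU dots executed so far */
} Machine;

/* One PPU dot; a real implementation would render and fetch here. */
static void ppu_tick(Machine *m) { m->ppu_dots++; }

/* Run the PPU until it reaches the CPU's current point in time. */
static void ppu_catch_up(Machine *m) {
    uint64_t target = m->cpu_cycles * 3;   /* NTSC: 3 dots per CPU cycle */
    while (m->ppu_dots < target)
        ppu_tick(m);
}

/* A CPU write to $2000-$2007: bring the PPU up to date first, so the
   write takes effect on the correct dot. */
static void cpu_write_ppu_register(Machine *m, uint16_t addr, uint8_t value) {
    ppu_catch_up(m);
    /* ... apply value to register (addr & 7) here ... */
    (void)addr;
    (void)value;
}
```

Reads of $2002 would need the same catch-up call. The other direction (the PPU raising an NMI) is not shown; one common approach is to also stop the CPU at the cycle where the next NMI is due.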

A very very very crude way to get a basic PPU working is to draw an entire picture at once using the current state of the PPU (palettes, name tables, attribute tables, pattern tables, scroll, etc.). This will help you understand how the different parts are combined to form the picture before you have to worry about timing details, but it is so rudimentary that it will fail with any game that modifies PPU parameters mid-frame (i.e. raster effects). Note that you still have to implement the basic CPU-PPU communication with some degree of timing accuracy, such as NMIs, the VBlank flag and the sprite 0 hit flag; otherwise the program might get stuck waiting on those.
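Here is a crude sketch of that "whole picture at once" idea for the background only, assuming nametable 0, no scrolling and no sprites. render_background() and its parameter layout are hypothetical; it shows how a pattern table tile (two 8-byte bitplanes), the name table, the attribute table and the palette combine into one color index per pixel.

```c
#include <stdint.h>

/* chr: 8 KB of pattern tables; nametable: 960 tile bytes followed by
   64 attribute bytes; palette: 32 bytes of palette RAM. */
static void render_background(const uint8_t chr[0x2000],
                              const uint8_t nametable[0x400],
                              const uint8_t palette[32],
                              int bg_table,   /* 0 or 0x1000 (PPUCTRL bit 4) */
                              uint8_t out[240][256]) {
    for (int y = 0; y < 240; y++) {
        for (int x = 0; x < 256; x++) {
            /* Which tile covers this pixel? 32 tiles per row. */
            uint8_t tile = nametable[(y / 8) * 32 + x / 8];
            /* Each tile is 16 bytes: low bitplane rows, then high bitplane rows. */
            int addr = bg_table + tile * 16 + (y % 8);
            int bit = 7 - (x % 8);
            int pix = ((chr[addr] >> bit) & 1) | (((chr[addr + 8] >> bit) & 1) << 1);
            /* One attribute byte per 32x32 area, 2 bits per 16x16 quadrant. */
            uint8_t attr = nametable[0x3C0 + (y / 32) * 8 + x / 32];
            int shift = ((y / 16) % 2) * 4 + ((x / 16) % 2) * 2;
            int pal = (attr >> shift) & 3;
            /* Pixel value 0 shows the universal backdrop color. */
            out[y][x] = (pix == 0 ? palette[0] : palette[pal * 4 + pix]) & 0x3F;
        }
    }
}
```

The output still has to be mapped from NES color indices (0-63) to RGB before it can be displayed.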

Re: Stuck on PPU Implementation
by zeroone on 2015-04-08 (#144756)

Based on my own experience and the experiences that I read about on this forum, most NES emulator developers end up writing the PPU in iterative stages of complexity. In other words, the PPU is virtually rewritten several times, with each version approximating the actual hardware more closely.

There are several reasons to do this. It takes time to comprehend each aspect of the PPU, and part of the learning process involves coding; working with multiple versions may be the only practical way to learn the material. Another reason is the amount of time that you have to spend on the project, which, depending on your current programming knowledge and experience, may be quite immense. Each version of the PPU will be able to play some subset of games, and you can call it quits at any of the iterative stages. But if you jump right into the most complex PPU design, you might never get anything playable completed.

Related to that is motivation. Once you see some games running with a simple PPU implementation, you'll probably find it a lot easier to work on the next version, as opposed to waiting and waiting for the complex one to get done.

Understanding timing is key to making the emulator work. On NTSC, each frame needs to be displayed approximately every 16.6 milliseconds (about 60 frames per second). You'll need some sort of sleep function that delays until it is time to generate the next frame:

Code:

while(true) {
  renderFrame();
  waitForNextFrameTime();
}
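One way to write waitForNextFrameTime() is to sleep until an absolute deadline and then advance the deadline by one frame, rather than sleeping a fixed amount each time; that way rendering time and rounding errors don't accumulate as drift. This sketch assumes POSIX (clock_gettime and nanosleep) and a FRAME_NS constant of about 1/60.0988 s for NTSC; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define FRAME_NS 16639267L   /* ~1 / 60.0988 s, NTSC frame period */

/* Add ns nanoseconds to a timespec, keeping tv_nsec below one second. */
static void timespec_add_ns(struct timespec *t, long ns) {
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

/* Sleep until the absolute deadline *next, then move it one frame ahead. */
static void wait_for_next_frame_time(struct timespec *next) {
    struct timespec now, delay;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long long diff = (long long)(next->tv_sec - now.tv_sec) * 1000000000LL
                   + (next->tv_nsec - now.tv_nsec);
    if (diff > 0) {   /* if we're already late, don't sleep at all */
        delay.tv_sec = (time_t)(diff / 1000000000LL);
        delay.tv_nsec = (long)(diff % 1000000000LL);
        nanosleep(&delay, NULL);
    }
    timespec_add_ns(next, FRAME_NS);
}
```

To use it, initialize the deadline to the current CLOCK_MONOTONIC time plus FRAME_NS before entering the loop, then call wait_for_next_frame_time(&next) after each rendered frame.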

The CPU and PPU execute in parallel, and they are synchronized by a common clock. But a first approximation of this might look like:

Code:

void renderFrame() {
  renderBackground();
  renderSprites();
  generateNMI();
  runCpuForNumberOfCyclesInFrame();
}

That is sufficient for the simplest games like Donkey Kong and Popeye.

The next approximation of PPU is scanline based:

Code:

void renderFrame() {
  for(int i = -1; i < 240; i++) {    // -1 = pre-render line, 0-239 = visible
    renderScanline(i);
    runCpuForNumberOfCyclesInScanline();
  }
  generateNMI();
  for(int i = 240; i < 261; i++) {   // 240 = post-render, 241-260 = VBlank
    runCpuForNumberOfCyclesInScanline();
  }
}
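One wrinkle in runCpuForNumberOfCyclesInScanline() is that an NTSC scanline is 341 dots, i.e. 113 2/3 CPU cycles, and an instruction can't be stopped halfway. A possible approach, sketched below with hypothetical names, is to budget in PPU dots and carry any overshoot (a negative balance) into the next scanline. cpu_step() is a stand-in that pretends every instruction takes 7 cycles.

```c
/* Hypothetical CPU stand-in: executes one instruction, returns its cycles. */
static long total_cpu_cycles = 0;
static int cpu_step(void) { total_cpu_cycles += 7; return 7; }

/* Give the CPU one scanline's worth of time. The budget is kept in PPU
   dots so the 2/3-cycle fraction stays exact; overshoot carries over. */
static void run_cpu_for_scanline(int *dot_budget) {
    *dot_budget += 341;                 /* one scanline of PPU dots */
    while (*dot_budget > 0)
        *dot_budget -= cpu_step() * 3;  /* NTSC: 3 dots per CPU cycle */
}
```

Start with a budget of 0 and call run_cpu_for_scanline(&budget) once per scanline; over a frame the CPU never drifts more than one instruction away from the PPU.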

Ultimately, you should create a PPU function that renders a single pixel:

Code:

void renderFrame() {
  for(int i = -1; i < 262; i++) {
    for(int j = 0; j < 341; j++) {
      renderDot(i, j);
    }
  }
}

In this model, the PPU drives the CPU. For NTSC, the ratio is 3:1 (3 dots per CPU cycle). For PAL, the ratio is 16:5 (3.2 dots per CPU cycle) and there are additional VBlank scanlines. The sleep delay between frames will also be slightly different. These ratios can be maintained by using floats or integer counters.
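A sketch of keeping either ratio exact with integers, using a remainder accumulator (the same idea as Bresenham's line algorithm). ClockDivider and on_ppu_dot() are hypothetical names; each PPU dot adds num to the accumulator, and every time it reaches den one CPU cycle is run.

```c
/* NTSC: 3 dots per CPU cycle  -> num = 1, den = 3.
   PAL:  16 dots per 5 cycles  -> num = 5, den = 16. */
typedef struct {
    int num, den, acc;
    long cpu_cycles;
} ClockDivider;

/* Called once per PPU dot. */
static void on_ppu_dot(ClockDivider *c) {
    c->acc += c->num;
    while (c->acc >= c->den) {
        c->acc -= c->den;
        c->cpu_cycles++;   /* a real emulator would clock the CPU here */
    }
}
```

Because only integers are involved, there's no floating-point rounding that could slowly drift over a long play session.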

The PPU does several things in parallel. Such a renderDot() function will contain a lot of switching logic that decides what to do based on the current scanline and the current dot index. The wikis that describe the PPU are not written in procedural pseudo code. Instead, they are written as a bunch of possible cases. You'll need the switching logic to direct execution to each case.
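A heavily trimmed sketch of that switching logic, assuming the PPU's own counters (scanlines 0-261, dots 0-340) instead of the -1 numbering used in the loop above. The Ppu struct and render_dot() are hypothetical, and almost all of the real per-dot work (fetches, shifters, scroll updates) is reduced to comments; it only shows where the VBlank flag, NMI and flag clearing land.

```c
typedef struct {
    int vblank;           /* PPUSTATUS bit 7 */
    int sprite0_hit;      /* PPUSTATUS bit 6 */
    int sprite_overflow;  /* PPUSTATUS bit 5 */
    int nmi_enabled;      /* PPUCTRL bit 7 */
    int nmi_pending;      /* set when the CPU should take an NMI */
    long pixels_drawn;    /* stands in for real pixel output */
} Ppu;

/* 0-239 visible, 240 post-render, 241-260 VBlank, 261 pre-render. */
static void render_dot(Ppu *p, int scanline, int dot) {
    if (scanline < 240) {
        if (dot >= 1 && dot <= 256)
            p->pixels_drawn++;      /* one visible pixel per dot */
        /* tile fetches, sprite evaluation, scroll updates go here */
    } else if (scanline == 241 && dot == 1) {
        p->vblank = 1;              /* VBlank starts at (241, 1) */
        if (p->nmi_enabled)
            p->nmi_pending = 1;
    } else if (scanline == 261 && dot == 1) {
        p->vblank = 0;              /* flags cleared on the pre-render line */
        p->sprite0_hit = 0;
        p->sprite_overflow = 0;
    }
    /* all other dots: nothing to do in this sketch */
}
```

Each branch corresponds to one of the cases described on the wiki; a fuller version adds more branches rather than changing the overall shape.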

Finally, do not optimize early. Modern CPUs are insanely fast. Keep your code clean and readable, and your emulator will likely run at full speed with plenty of time to spare for each frame.

Re: Stuck on PPU Implementation
by tepples on 2015-04-08 (#144757)

zeroone wrote:

Modern CPUs are insanely fast.

Only PC or also mobile?

Re: Stuck on PPU Implementation
by zeroone on 2015-04-08 (#144758)

tepples wrote:

Only PC or also mobile?

Who is the audience for an emulator project? The world is saturated with super optimized and accurate emulators for all possible devices. The reality is that these projects are done for the experience. The only relevant platform is the machine that the developer develops on.

Besides, someone will likely stumble upon this post 5 years from now (hello future person), in a world where mobile devices run just as fast as a typical desktop does today.

Re: Stuck on PPU Implementation
by James on 2015-04-08 (#144788)

zeroone wrote:

Code:

void renderFrame() {
  for(int i = -1; i < 262; i++) {
    for(int j = 0; j < 341; j++) {
      renderDot(i, j);
    }
  }
}

That's one scanline too many. Should be:

Code:

for(int i = 0; i < 262; i++)

Edit: oops

Re: Stuck on PPU Implementation
by Sik on 2015-04-08 (#144794)

Erm, that's identical...

Re: Stuck on PPU Implementation
by James on 2015-04-08 (#144795)

Sik wrote:

Erm, that's identical...

Oops. Fixed.

Re: Stuck on PPU Implementation
by tokumaru on 2015-04-09 (#144803)

I think the -1 was for the pre-render scanline, in which case 0-239 would be the visible picture, 240 would be the post-render scanline, and 241-260 would be VBlank, so the loop should be for(int i = -1; i < 261; i++). I think that numbering a scanline as -1 is a bit confusing, though.

Re: Stuck on PPU Implementation
by zeroone on 2015-04-09 (#144804)

My bad. Yep, there was an extra scanline in there. Some of the docs refer to the pre-render scanline as -1; so, I put that into the for-loop to stress that.

Re: Stuck on PPU Implementation
by tokumaru on 2015-04-09 (#144805)

I never coded an emulator, but I guess it would make sense to start the render loop at the same point a real PPU would. According to this wiki page, "The PPU comes out of reset at the top of the picture", but to me it's unclear if the top of the picture is the pre-render scanline or scanline 0.

EDIT: forgot to link to the page.

Re: Stuck on PPU Implementation
by thefox on 2015-04-09 (#144806)

tokumaru wrote:

It's scanline 0: http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

I'd also suggest going by the counters actually used by the PPU (since we now know them), since it makes it easier to compare the implementation to that diagram and Visual 2C02.

Re: Stuck on PPU Implementation
by tokumaru on 2015-04-09 (#144811)

thefox wrote:

It's scanline 0: http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

Yeah, my confusion came from the fact that the diagram implies that scanline 0 is the first, but the page where the diagram is starts the description from the pre-render scanline.

Quote:

I'd also suggest going by the counters actually used by the PPU (since we now know them), since it makes it easier to compare the implementation to that diagram and Visual 2C02.

Care to refresh my memory?

Re: Stuck on PPU Implementation
by thefox on 2015-04-09 (#144815)

tokumaru wrote:

Quote:

I'd also suggest going by the counters actually used by the PPU (since we now know them), since it makes it easier to compare the implementation to that diagram and Visual 2C02.

Care to refresh my memory?

Hmm? The counters used by the PPU are exactly the ones used in that diagram, i.e. (0, 0) is the first pixel of the 256x240 view area, (240, x) is the post-render line and (261, x) is the pre-render line.
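Going by those hardware counters, classifying a scanline comes down to a few comparisons, and the "-1 = pre-render" numbering used earlier in the thread maps onto line 261. line_kind() and to_hw_scanline() are hypothetical helper names in this sketch, which assumes NTSC (262 lines per frame).

```c
typedef enum { LINE_VISIBLE, LINE_POST_RENDER, LINE_VBLANK, LINE_PRE_RENDER } LineKind;

/* Classify a scanline using the PPU's own counter values (NTSC). */
static LineKind line_kind(int scanline) {
    if (scanline < 240)  return LINE_VISIBLE;      /* 0-239 */
    if (scanline == 240) return LINE_POST_RENDER;  /* 240 */
    if (scanline < 261)  return LINE_VBLANK;       /* 241-260 */
    return LINE_PRE_RENDER;                        /* 261 */
}

/* Convert the "-1 = pre-render" numbering to the hardware counter. */
static int to_hw_scanline(int doc_scanline) {
    return doc_scanline < 0 ? 261 : doc_scanline;
}
```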

Re: Stuck on PPU Implementation
by tokumaru on 2015-04-09 (#144817)

thefox wrote:

The counters used by the PPU are exactly the ones used in that diagram.

Oh, OK.