When rendering sprites, the circuitry searches through the OAM DRAM for sprites within range and fetches them into the temporary buffer. It seems to be a comparator that checks some upper bits against the next line's Y location.
Could a 4-bit adder be used to perform the same function and produce the within-range bit?
A simple digital magnitude comparator and a simple ripple-carry adder are approximately equivalent in complexity, so there was no advantage at the time the NES was made.
I don't think there's a comparator equivalent of a carry-lookahead adder; if there were, it would have helped when speed mattered ... but modern hardware has been designed for a more programmatic approach, without sprite evaluation happening on every scanline.
Could it be simplified by taking the difference (current Y - sprite Y) and feeding the bits into some glue logic, producing a true within-range bit when the remaining upper 5 bits of the difference are all zero, using a 5-input NOR or equivalent?
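For 8-pixel-tall sprites, that idea could be sketched like this in C (my own illustration of the proposal, not the actual PPU logic):

```c
#include <stdint.h>

/* Sketch of the proposed in-range test for 8-pixel-tall sprites:
   take the 8-bit difference and require the upper 5 bits to be zero,
   which is equivalent to 0 <= diff <= 7. */
static int sprite_in_range(uint8_t next_line_y, uint8_t sprite_y)
{
    uint8_t diff = (uint8_t)(next_line_y - sprite_y);
    return (diff & 0xF8) == 0;  /* the 5-input NOR on bits 7..3 */
}
```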
Sure, but the individual parts of a magnitude comparator are simpler devices than the XOR3 and carry logic needed per bit for a real adder (subtractor).
On the other hand, the real question is what the PPU actually does, and I have no idea where to look in Visual2C02.
The PPU has to actually subtract in order to find which line of the sprite to draw if it is in range.
tepples wrote:
The PPU has to actually subtract in order to find which line of the sprite to draw if it is in range.
I guess it's mandatory in order to fetch the right CHR row into the sprite FIFO, cutting the circuitry size in half.
I heard somewhere the PPU needs to subtract in order to find which line of the sprite to draw if it's in range, but I don't remember where.
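For what it's worth, the row selection falls right out of the same subtraction; a sketch assuming 8-pixel-tall sprites (the real PPU also has 8x16 mode, and the names here are mine):

```c
#include <stdint.h>

/* Sketch: the same subtraction that proves a sprite is in range also
   yields the fine-Y row for the pattern table fetch. Assumes 8-pixel
   sprites; vertical flip just inverts the low 3 bits. */
static uint8_t sprite_row(uint8_t next_line_y, uint8_t sprite_y, int flip_v)
{
    uint8_t diff = (uint8_t)(next_line_y - sprite_y); /* assumed in range */
    uint8_t row  = diff & 0x07;
    return flip_v ? (uint8_t)(row ^ 7) : row;
}
```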
So, if within range, the circuitry consumes about 8 cycles, and it skips 6 cycles if not in range. How feasible is it to represent this circuit in a schematic?
As a state machine. If the sprite is in range, go to the state of copying the next 3 bytes to secondary OAM. Otherwise, stay in the same state and advance to the next sprite.
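A rough software model of that state machine (names and simplifications are mine; the overflow-flag quirks and 8x16 mode are omitted):

```c
#include <stdint.h>

/* Sketch of per-scanline sprite evaluation: scan primary OAM
   (64 sprites x 4 bytes) and copy up to 8 in-range sprites into
   secondary OAM (8 x 4 bytes). Assumes 8-pixel-tall sprites. */
static int evaluate_sprites(const uint8_t oam[256], uint8_t sec[32],
                            uint8_t next_line_y)
{
    int found = 0;
    for (int n = 0; n < 64 && found < 8; n++) {
        uint8_t diff = (uint8_t)(next_line_y - oam[n * 4]);
        if ((diff & 0xF8) == 0) {        /* in range: copy all 4 bytes */
            for (int b = 0; b < 4; b++)
                sec[found * 4 + b] = oam[n * 4 + b];
            found++;
        }                                /* else: advance to next sprite */
    }
    return found;
}
```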
At first I thought the 192 cycles of sprite evaluation were divided into 3 cycles per sprite Y location.
So, it's like 2 cycles per byte, I think, because both the primary and secondary OAM portions are on the same bus, right?
ExtraOrdinary wrote:
So, it's like 2 cycles per byte, I think, because both the primary and secondary OAM portions are on the same bus, right?
They are indeed on the same bus - "primary OAM" is effectively $000-$0FF while "secondary OAM" is $100-$11F.
So this is a state machine that shares one bus; if it had two separate buses for primary and secondary OAM, the circuitry could double the rate, using one cycle per byte instead of two.
How could the OAM FIFO possibly determine which sprite to show using the binary down counters?
Quote:
How could the OAM FIFO possibly determine which sprite to show using the binary down counters?
The down counters determine where each sprite is positioned horizontally, by counting pixels until it's time to start feeding pixels out of the shift register to the compositor. Vertical positioning is handled by the evaluation logic.
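A simple behavioral model of one sprite unit's down counter (my own naming, ignoring the attribute bits):

```c
#include <stdint.h>

/* Sketch of one sprite unit: each dot, the X counter decrements until
   it hits 0; from then on the unit is active and its two PISO shift
   registers feed out one 2-bit pixel per dot. */
struct sprite_unit {
    uint8_t x_counter;           /* loaded with the sprite's X position */
    uint8_t shift_lo, shift_hi;  /* pattern bitplanes */
};

static uint8_t sprite_tick(struct sprite_unit *u)
{
    if (u->x_counter > 0) {      /* not yet at this sprite's column */
        u->x_counter--;
        return 0;                /* outputs transparent */
    }
    uint8_t px = (uint8_t)(((u->shift_hi >> 7) << 1) | (u->shift_lo >> 7));
    u->shift_lo <<= 1;
    u->shift_hi <<= 1;
    return px;                   /* 2-bit pixel value */
}
```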
So, when the counter reaches 0, does that trigger some "horizontal evaluation" logic?
And could a priority encoder be used to determine which sprite shows when they overlap?
The term "OAM FIFO" itself doesn't seem to fit here, though.
Each scanline, the sprites are scanned in order from 0 to 63. The first 8 sprites detected are recorded. Since the sprites are scanned from 0 to 63, this recording will always be sorted from lowest sprite number to highest sprite number.
The X position of each sprite is stored in a down counter. When a down counter reaches 0, that sprite becomes "active" and its pixels begin being drawn.
To select which pixel is sent to the screen, the sprite logic scans the list of 8 recorded sprites, and the first non-transparent pixel it finds gets sent to the screen. Remember that those 8 sprites are always recorded from lowest sprite number to highest sprite number. That means the pixels of lower-numbered sprites will always be preferred over higher-numbered sprites.
If there are fewer than 8 sprites on a scanline, the leftover sprite units are given an all-transparent bitmap.
But what you probably still wonder is exactly where the priority encoder fits into the process.
The eight sprite units are loaded from secondary OAM during horizontal blanking. Then during draw time, for each screen pixel, each of the eight sprite units outputs a 5-bit value: 1 bit priority (attribute 2 bit 5), 2 bits color set (attribute 2 bits 1-0), and a 2-bit pixel value (from two PISO left shifters). The two bits of each sprite unit's pixel value are OR'd together to produce an opacity bit, and these opacity bits are fed to an 8:3 priority encoder, producing a 3-bit value of which sprite is frontmost. This in turn is fed as the select to a set of five 8:1 multiplexers, which choose the 5-bit sprite layer value (4 bits color, 1 bit priority) that will be composited against the background unit's output. In addition, if sprite 0 is in range, the first sprite unit's opacity bit is ANDed with the background's opacity to produce the set input to the sprite 0 latch.
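The priority-encoder part can be modeled behaviorally like this (a software sketch of the selection, not the gate-level circuit):

```c
#include <stdint.h>

/* Sketch of the frontmost-sprite selection: each of the 8 sprite units
   presents a 2-bit pixel; the pixel is opaque if either bitplane bit is
   set. The priority encoder picks the lowest-numbered opaque unit,
   which corresponds to the lowest sprite number. */
static int frontmost_sprite(const uint8_t pixel[8])
{
    for (int i = 0; i < 8; i++)   /* 8:3 priority encoder: lowest wins */
        if (pixel[i] != 0)        /* OR of the two bitplane bits */
            return i;
    return -1;                    /* all units transparent */
}
```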
All that stuff happens in the OAM FIFO module. Sorry, I can't give a detailed schematic right now.
One interesting fact is that the SNES is capable of displaying 34 sprite tiles simultaneously on a scanline. That's over 1088 bits of registers just to hold the tile data at 4bpp.
A huge leap from NMOS to CMOS.