This might be an interesting topic.
I don't think the rotation math itself is the slow part, since a lot of it could be sped up with lookup tables (for example, a table giving the coordinates of the source pixel for each position at every needed angle). It's the pixel manipulation that's the killer, at least on the NES. I think dynamic sprite rotation is out of the question for actual games on the NES... If anyone really needs it, it's better to pre-render the sprites at the needed angles.
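To make the lookup-table idea concrete, here is a rough sketch in Python rather than 6502 code. The 8x8 sprite size, the 16 angle steps, and the names `build_rotation_table` and `rotate` are all assumptions for illustration. The table is built offline; at run time each destination pixel is just one table lookup, which is why the remaining cost is mostly reading and writing pixels.

```python
import math

SIZE = 8     # hypothetical sprite size (8x8 pixels)
ANGLES = 16  # hypothetical number of precomputed angle steps


def build_rotation_table(size=SIZE, angles=ANGLES):
    """For each angle, map every destination pixel to a source (x, y) or None."""
    center = (size - 1) / 2.0
    table = []
    for step in range(angles):
        theta = 2.0 * math.pi * step / angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        entries = []
        for dy in range(size):
            for dx in range(size):
                # Inverse-rotate the destination pixel around the sprite center.
                rx, ry = dx - center, dy - center
                sx = round(cos_t * rx + sin_t * ry + center)
                sy = round(-sin_t * rx + cos_t * ry + center)
                inside = 0 <= sx < size and 0 <= sy < size
                entries.append((sx, sy) if inside else None)
        table.append(entries)
    return table


def rotate(sprite, entries, size=SIZE, background=0):
    """Rotate using only the precomputed table: no trigonometry at run time."""
    out = []
    for dy in range(size):
        row = []
        for dx in range(size):
            src = entries[dy * size + dx]
            row.append(sprite[src[1]][src[0]] if src is not None else background)
        out.append(row)
    return out
```

Even with the table, every output pixel still needs a read and a write, and on the NES those pixels are spread across bitplanes.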
On the SNES it might be possible for a few sprites at a time I think, but I don't know enough about it to make more specific comments.
I tried it earlier in the year and managed to pixel-rotate a little 8x8 sprite, but when I tried a 16x16 sprite it caused lag. Since I was planning on rotating big sprites, I tried to find a way to split the work of rotating a sprite across frames, but it got pretty complicated and I gave up.
Now I'm just using a combination of prerendered sprites and metasprite placement + dynamic tile scrolling tricks. If my game takes up too much memory, I might use the CPU to rotate sprites in 90-degree steps.
I was writing a game that used sprite rotation on the NES a while back and pre-rendered was the way to go. That also means you can do touch-ups on the graphics since there's sure to be at least one odd-looking pixel in one of the frames. Just simply moving the data will always be the fastest.
If you look at arcade games that do lots of scaling and rotation, it seems like they often had 2 (or 3) 68000 CPUs running at 10 or 12 MHz and probably varying degrees of hardware assistance.
A common bottleneck in a lot of early computers when it comes to CPU pixel manipulation is the lack of an 8-bit packed-pixel mode. Most computers used nibbles to define colors or, even worse, used a planar format.
Maybe a 4-bit packed-pixel mode isn't that bad, since you can rotate two pixels at a time, but that would get pretty obvious if you rotate too far in either direction.
I just thought of an easy way of doing this with a 65816. Use the memory like a 256x256 bitmap image, where the low address byte is the X pixel and the high address byte is the Y pixel. Instead of doing it in a tight loop pixel by pixel, an easier method is to calculate every row of pixels as a whole.
First calculate the starting X and Y position of the first pixel of the row. (The low bytes are the fractional parts, the high bytes are the actual pixel locations.) Then calculate the Y position of every pixel in the row by loading the accumulator with the Y position, then repeatedly adding a constant to the accumulator and storing the results in a list with an unused byte between results. After that do the same with the X position, but store each result exactly one byte before the matching Y position. Now the high byte of the X position is stored over the low byte of the Y position, and together with the high byte of Y it forms the memory address of the pixel. Now all you have to do is copy the pixels from the calculated memory addresses to wherever you're storing the rotated sprite.
This method takes approximately 30 cycles per pixel, without planar conversion. That's 3,600,000 / 60 / 30 = 2,000 pixels or almost 2 32x32 sprites. Too bad Super Nintendo uses planar mapping, and would take a little longer.
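The byte-overlap trick is easier to see simulated. This is only a Python model of the idea, not 65816 code; the function names and the 3-byte stride are my reading of the description above, and it makes no attempt to model cycle counts.

```python
def row_addresses(x_start, y_start, dx, dy, count):
    """Return the source address of each pixel in one destination row.

    x_start, y_start, dx and dy are 8.8 fixed-point values (high byte =
    pixel, low byte = fraction) that wrap at 16 bits like the accumulator.
    """
    buf = bytearray(3 * count)  # scratch list, 3 bytes per pixel

    # Pass 1: store each 16-bit Y (low byte first) at offsets 3*i+1 and
    # 3*i+2, which leaves one unused byte between consecutive results.
    y = y_start & 0xFFFF
    for i in range(count):
        buf[3 * i + 1] = y & 0xFF
        buf[3 * i + 2] = y >> 8
        y = (y + dy) & 0xFFFF

    # Pass 2: store each 16-bit X one byte earlier, so its high byte
    # lands on top of the low (fraction) byte of the matching Y.
    x = x_start & 0xFFFF
    for i in range(count):
        buf[3 * i] = x & 0xFF
        buf[3 * i + 1] = x >> 8
        x = (x + dx) & 0xFFFF

    # Bytes 3*i+1 and 3*i+2 now read as a little-endian address:
    # low byte = X pixel, high byte = Y pixel.
    return [buf[3 * i + 1] | (buf[3 * i + 2] << 8) for i in range(count)]


def copy_row(memory, addresses):
    """Fetch one rotated row from a 64 KB, 256x256 one-byte-per-pixel bitmap."""
    return bytes(memory[a] for a in addresses)
```

For example, starting at X = 10.0, Y = 20.0 with a step of 0.5 in both directions, `row_addresses(0x0A00, 0x1400, 0x0080, 0x0080, 4)` gives `[0x140A, 0x140A, 0x150B, 0x150B]`.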
psycopathicteen wrote:
This method takes approximately 30 cycles per pixel, without planar conversion. That's 3,600,000 / 60 / 30 = 2,000 pixels or almost 2 32x32 sprites. Too bad Super Nintendo uses planar mapping, and would take a little longer.
You'll get much less than 2k pixels after bitplane conversion and all those reads + writes to RAM happening at 2.68 MHz. FastROM can only help so much there. If it had the Genesis packed pixel format it would've been nicer. It must have helped plenty for the games that did 3D with nothing but the 68k. Buffering pre-rotated and/or scaled graphics in RAM to use on demand is the only use I can see for this, and it's a lot of RAM to burn depending on the graphics required. Would be nice on ROM space though.
Memblers wrote:
If you look at arcade games that do lots of scaling and rotation, it seems like they often had 2 (or 3) 68000 CPUs running at 10 or 12 MHz and probably varying degrees of hardware assistance.
Sega Y board seems to be really powerful in this regard. Triple 68k each with 'instant' multiply and divide custom chips along with sprite scaling in hardware for fast pseudo-3D. Galaxy Force II looks great running at 60FPS.
It would take another approx 30 cycles to convert a packed pixel into a planar pixel. So it's more like 1k pixels.
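A small sketch of the numbers and of what the conversion step has to do, in Python for readability. The function names are made up for illustration, the bit order (leftmost pixel in bit 7) is an assumption, and the cycle figures are just the estimates from this thread, not measurements.

```python
def packed_row_to_planar(pixels):
    """Convert 8 pixel values (0-15) into 4 bitplane bytes, plane 0 first.

    Assumes the leftmost pixel ends up in bit 7 of each plane byte.
    """
    if len(pixels) != 8:
        raise ValueError("expected exactly 8 pixels")
    planes = [0, 0, 0, 0]
    for px in pixels:
        for p in range(4):
            # Shift in bit p of this pixel as the next bit of plane p.
            planes[p] = (planes[p] << 1) | ((px >> p) & 1)
    return planes


def pixels_per_frame(cycles_per_second, frames_per_second, cycles_per_pixel):
    """Rough per-frame pixel budget if the CPU did nothing else."""
    return cycles_per_second // frames_per_second // cycles_per_pixel
```

With the thread's figures, `pixels_per_frame(3_600_000, 60, 30)` is 2000 and `pixels_per_frame(3_600_000, 60, 60)` is 1000, so the estimate roughly halves once conversion is included.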
I wrote a demo that did that 2 years ago. Too bad I suck and don't have it on my HD any longer, but I'm pretty sure, thank god, someone made a back-up of at least the ROM somewhere, so you should be able to find it on the net.
Quote:
I don't think the rotation math itself is the slow part, since a lot of it could be sped up with lookup tables (for example, a table giving the coordinates of the source pixel for each position at every needed angle). It's the pixel manipulation that's the killer, at least on the NES. I think dynamic sprite rotation is out of the question for actual games on the NES... If anyone really needs it, it's better to pre-render the sprites at the needed angles.
In fact, to rotate an X*Y sized image, you need to multiply X*Y vectors by a 2*2 matrix (3*3 if you need to change the origin of the rotation, but that's another story), resulting in 4*X*Y multiplications per frame (or 9*X*Y if you want to change the center point). Of course, symmetries inside the matrix let you speed things up a bit, but that's still a crazy number of multiplications, and even if they're done fast with lookup tables they take up a crazy amount of CPU time.
As you say, addressing individual pixels is also complex, and you'd definitely want to use some lookup tables for it. I remember it was a headache when I wrote that demo 2 years ago.
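Written out, the 2*2 case looks like this. The symbols are my own naming, not from any particular source: θ is the rotation angle, (x, y) is a pixel coordinate relative to the rotation origin, and (x', y') is the rotated coordinate.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}
```

That is four multiplications per pixel, and the symmetry is that the matrix only contains two distinct magnitudes, cos θ and sin θ.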
For rotation, you don't need any matrix muls in the inner loop, just two adds per pixel and two adds per scanline. I've implemented a software mode 7 engine on the PC before.
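Roughly, the idea looks like the sketch below, written in Python for readability. The 8.8 fixed-point format, the integer rotation center, and the name `rotate_incremental` are assumptions for this example, not the actual engine code. The multiplies happen once per frame in the setup; the loops only add.

```python
import math

FRAC_BITS = 8  # 8.8 fixed point, an assumption for this sketch


def to_fixed(value):
    """Convert a float to fixed point with FRAC_BITS fractional bits."""
    return int(round(value * (1 << FRAC_BITS)))


def rotate_incremental(src, theta, background=0):
    """Rotate a 2D list of pixels using only adds inside the loops."""
    height, width = len(src), len(src[0])
    cos_f, sin_f = to_fixed(math.cos(theta)), to_fixed(math.sin(theta))
    cx, cy = width // 2, height // 2
    half = 1 << (FRAC_BITS - 1)  # rounding offset

    # Per-frame setup: source coordinate of destination pixel (0, 0).
    row_u = (cx << FRAC_BITS) - cos_f * cx - sin_f * cy + half
    row_v = (cy << FRAC_BITS) + sin_f * cx - cos_f * cy + half

    out = []
    for _ in range(height):
        u, v = row_u, row_v
        row = []
        for _ in range(width):
            sx, sy = u >> FRAC_BITS, v >> FRAC_BITS
            if 0 <= sx < width and 0 <= sy < height:
                row.append(src[sy][sx])
            else:
                row.append(background)
            u += cos_f  # two adds per pixel
            v -= sin_f
        out.append(row)
        row_u += sin_f  # two adds per scanline
        row_v += cos_f
    return out
```

Stepping one pixel to the right always changes the source coordinate by the same (cos, -sin) amount, so the per-pixel multiply collapses into a running sum.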
I'd like to know if this really is any faster. I think that if both techniques are mathematically equivalent, they likely need about the same amount of computation and so are about equally slow. I might be wrong, but I tried the "add" approach to scale a bitmap on the NES and it more or less failed: it was slow as well.
Bregalad wrote:
I think that if both techniques are equivalent mathematically, it's likely both are about the same amount of computations so are about the same slowness.
Unless one kind of computation is faster on a given piece of hardware than another kind of computation. For example, the 6502 family has hardware adds and software muls unless you're using a memory-mapped multiplier (like the one in the MMC5 or Super NES). See "Strength reduction" on Wikipedia. Perhaps the steps "read texel here" and "write pixel" overwhelmed "find next texture coordinate".