Everywhere I see a review of this game it always says there are:
"hundreds of enemies onscreen"
"no slowdown"
"super fast action"
"20 layers of parallax"
All of that is bullshit. Here are the facts:
"up to 8 enemies onscreen at max"
"lags when there are at least 5 enemies onscreen"
"they scroll the background very fast just to distract you from the slow moving sprites."
"6 layers of parallax: 1 water, 2 mountains, and 3 clouds. There are visible line-scrolling boundaries between the cloud layers."
You're only about 2 decades late.
psycopathicteen wrote:
Everywhere I see a review of this game it always says there are:
"hundreds of enemies onscreen"
"no slowdown"
"super fast action"
"20 layers of parallax"
Reference?
This thread = overemotional
shawnleblanc wrote:
psycopathicteen wrote:
Everywhere I see a review of this game it always says there are:
"hundreds of enemies onscreen"
"no slowdown"
"super fast action"
"20 layers of parallax"
Reference?
http://www.meanmachinesmag.co.uk/review ... orce-4.php
It's ass-hats like "Damo" that are the reason why I hate Sega fanboys.
Still a nice game, isn't it? Didn't play it much, though. I remember the third better.
Shiru wrote:
Still a nice game, isn't it? Didn't play it much, though. I remember the third better.
It's a nice game. It's just that the SNES has tons of games with more sprites and less slowdown than Thunder Force IV. You just have to dig a little deeper than just playing arcade ports and Capcom/Konami classics.
Does TF4 even beat Recca in complexity and smoothness of motion? If not, we know the reason behind this Easter egg. Nintendon't? Ha!
MottZilla wrote:
You're only about 2 decades late.
So is everyone else on this board. I thought being 2 decades late was the entire point of NESdev.
There is another easter egg as well.
tepples wrote:
Does TF4 even beat Recca in complexity and smoothness of motion? If not, we know the reason behind
Not even close.
psycopathicteen wrote:
Shiru wrote:
Still a nice game, isn't it? Didn't play it much, though. I remember the third better.
It's a nice game. It's just that the SNES has tons of games with more sprites and less slowdown than Thunder Force IV. You just have to dig a little deeper than just playing arcade ports and Capcom/Konami classics.
Not to mention games like the Parodius one that uses the SA-1, which I don't recall any slowdown in. Some people really read a bit too much into the SNES's slowish CPU, or overestimate the Genesis's CPU for being "fast". Plenty of Genesis games have slowdown. The 68000 at around 7.5 MHz isn't exactly lightning fast. The 12 MHz 68000 in the Sega CD starts to make the SNES CPU look really slow, but then if you're counting add-ons, give the SNES the SA-1 and it doesn't look so bad.
It all comes down to how the game is programmed anyway. You can certainly have slowdowns even if you are on a platform with a faster CPU if you waste time.
MottZilla wrote:
psycopathicteen wrote:
Shiru wrote:
Still a nice game, isn't it? Didn't play it much, though. I remember the third better.
It's a nice game. It's just that the SNES has tons of games with more sprites and less slowdown than Thunder Force IV. You just have to dig a little deeper than just playing arcade ports and Capcom/Konami classics.
Not to mention games like the Parodius one that uses the SA-1, which I don't recall any slowdown in. Some people really read a bit too much into the SNES's slowish CPU, or overestimate the Genesis's CPU for being "fast". Plenty of Genesis games have slowdown. The 68000 at around 7.5 MHz isn't exactly lightning fast. The 12 MHz 68000 in the Sega CD starts to make the SNES CPU look really slow, but then if you're counting add-ons, give the SNES the SA-1 and it doesn't look so bad.
It all comes down to how the game is programmed anyway. You can certainly have slowdowns even if you are on a platform with a faster CPU if you waste time.
I like this post.
It takes less than 1% of a frame on any post-NES system for the CPU to touch every byte in the sprite attribute table at least once, and the spatial distance a sprite moves from one frame to the next has no impact on how long it takes the CPU to "touch" that sprite. Collision detection between 2 sprites doesn't take very long either. I think the CPU usually spends more time looking for objects to apply the collision detection to than it does actually calculating collisions, and even that can be eliminated by storing "bullet objects" in a separate batch of object slots.
Speaking of object slots, I know a little speed trick that can be done on the 65816 but not on a 6502 or 68000. I like to move the direct page to the object slot of the object being processed. This way I can easily reach everything in the object slot, and when I have two identical objects, it tricks the CPU into thinking it's writing into the same registers when it isn't.
'Less than 1%' sounds very optimistic to me. That's 512 to 640 bytes, and in the SNES's case it's arranged in a not very convenient way (bit 8 of the X coordinate and the size bit are packed into a byte shared with three other sprites).
Rough calculation for the Genesis: 7.68 MHz is 128000 cycles per frame, and a memory write opcode is ~10 cycles (it varies a lot). That's 12800 bytes written per frame; divided by the 640-byte sprite list, that's 20 sprite lists per frame, so one sprite list is 5% of the frame in the best case (doing nothing but writing data, like clearing or copying the list). Is my math seriously off?
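The arithmetic checks out; here it is as a quick Python sketch (every input is the rough estimate from the post above, not a measured value):

```python
# Back-of-envelope Genesis frame budget; all inputs are rough estimates.
cycles_per_frame = 7_680_000 / 60                 # ~128000 cycles at 7.68 MHz, 60 fps
cycles_per_write = 10                             # rough cost of one memory-write opcode
bytes_per_frame = cycles_per_frame / cycles_per_write
lists_per_frame = bytes_per_frame / 640           # 640-byte Genesis sprite list
percent_per_list = 100 / lists_per_frame          # one list copy as a % of the frame
print(lists_per_frame, percent_per_list)          # 20.0 5.0
```

So in the best case (pure copying, nothing else) one full sprite-list pass really does come to 5% of the frame.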
I think a fairly considerable amount of time is spent "building" the sprite list each frame, at least depending on the game. If you have more complex sprites, just moving a character like Sub-Zero in Mortal Kombat may require a different number of total sprite cells depending on the animation frame, and even if the animation frame is exactly the same from one frame to the next, you have to move each cell. And you aren't going to move a sprite by only modifying the X/Y coordinates except in some very special case.
You're going to draw characters instead with a function that takes a position and a "metasprite" definition and draws that character where needed. Now this isn't going to take all your time, but it will take some time. Calculating collisions and going through the whole list of potential collisions could take a lot of time too.
Really it all depends how you spend your time and how well you optimize and eliminate waste of cpu time. You only have so much time to draw the sprites, process all game logic including collisions and movements, uploading new vram data, etc.
Nothing is free or instant. Any action is going to take some CPU time. The SNES has little CPU time to spare, since it will range between 2.68 MHz and 3.58 MHz on average. The Genesis doesn't exactly have time to burn either.
The whole reason this is brought up is fanboys with no technical understanding spouting the same old hype. All they need is the clock speed number of the CPUs to justify that the Genesis has a "much faster processor" or whatever. Never mind the totally different designs and how it all applies to programming a video game.
Okay, I was wrong with the 1% thing, but it does take less than 1% for a post-NES system to perform a collision detection between 2 sprites.
I counted that the 65816 takes about 30 cycles to do a rectangular box collision algorithm. There are about 60000 cycles in an SNES frame. 30/60000 = .0005 = .05%
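For reference, the check that 30-cycle figure prices out is just a rectangular overlap test: four compares and some branches. Sketched in Python for clarity (the argument order and 16x16 box sizes here are illustrative, not from any real game):

```python
# Minimal axis-aligned bounding-box overlap test: the whole "algorithm" is
# four comparisons, which is why it costs so little per pair of sprites.
def boxes_overlap(ax, ay, aw, ah, bx, by, bw, bh):
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# Two 16x16 sprites 8 pixels apart overlap; 32 pixels apart they don't.
print(boxes_overlap(0, 0, 16, 16, 8, 0, 16, 16))   # True
print(boxes_overlap(0, 0, 16, 16, 32, 0, 16, 16))  # False
```

The expensive part, as noted above, is deciding which pairs to test at all, not the test itself.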
Quote:
and in the SNES's case it's arranged in a not very convenient way (bit 8 of the X coordinate and the size bit are packed into a byte shared with three other sprites).
A little while ago I attempted to find what was causing slowdown in Gradius III, and I found the OAM clearing routine; the way they dealt with the hi-OAM was inefficient. Gradius III shuffles back and forth between the low OAM and high OAM, one sprite at a time.
The way I manage the high OAM: when I am writing a sprite to the OAM, I "AND #$01ff" the x-coordinate, and "ORA #$0200" for big sprites. Then I store the result in both the OAM buffer and a second list 544 bytes after the OAM buffer. Then I overwrite the high byte of the x-coordinate with the y-coordinate, and overwrite the high byte of the y-coordinate with the attribute word. After all the game logic has been calculated, I use the table of 16-bit x-coordinates to build the high OAM.
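The final build step looks something like this in Python (buffer names are mine, but the bit layout follows the scheme just described: X in bits 0-8 from the "AND #$01ff", size select at $0200 from the "ORA #$0200", and the 32-byte high table packing 2 bits per sprite, 4 sprites per byte):

```python
# Build the 32-byte SNES high-OAM table from 128 packed 16-bit x values.
def build_high_oam(packed_x):
    high = bytearray(32)
    for i, v in enumerate(packed_x):      # up to 128 entries
        bits = (v >> 8) & 0b11            # bit 0 = X bit 8, bit 1 = size select
        high[i // 4] |= bits << (2 * (i % 4))   # sprite 0 lands in the low 2 bits
    return bytes(high)

# Sprite 0 at x=256 (X bit 8 set), sprite 1 large at x=10 (size bit set):
table = build_high_oam([0x100, 0x20A] + [0] * 126)
print(hex(table[0]))   # 0x9
```

Because each entry already has both high-table bits sitting in bits 8-9, the loop is just a shift, a mask, and an OR per sprite, with no per-sprite read-modify-write on shared bytes the way Gradius III does it.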
psycopathicteen wrote:
I know a little speed trick that can be done on the 65816 but not on a 6502 or 68000. I like to move the direct page to the object slot of the object to be processed. This way I can easily reach everything in the object slot, and when I have two identical objects, it tricks the CPU into thinking it's writting into the same registers when it isn't.
Isn't that the same as base+constant (4,A4) addressing on a 68000? Or is that slow due to an extra memory access? In any case, direct page and absolute,X addressing are the same speed on a 65816 if the direct page isn't 256-byte aligned and the absolute address is page-aligned.
It's at least 12 cycles for the 68000: 4 for the opcode, 4 for the address, and 4 for the memory fetch.
On the 65816, it takes 3-5 cycles depending on word size and where the DP is located.
Today's secret word please, Conky:
Code:
gencycle
The Super NES dot rate is the same as that of the NES: 3/2 Fsc, or 945/176 = 5.369 MHz. The master clock rate is four times that, and cycles take six or eight master clocks depending on whether they access slow memory (RAM or slow ROM); fast ROM and internal operation cycles always take six clocks. This gives an effective CPU rate somewhere between 2.7 and 3.6 MHz.
Because of the 68000's state machine, we can consider the Genesis to have a master clock of 7.67 MHz and actually run at 1.92 MHz, where each cycle of the internal state machine takes 4 master clocks. But how exactly is the 7.67 MHz rate related to the pixel clock? It's slightly faster than the Amiga and TG16 clock speed of 2*Fsc = 7.159 MHz. The dot rate in 256px mode is the same as the NES and Super NES, and the dot rate in 320px mode is 5/4 that: 6.712 MHz. Is the 68000 clock supposed to equal 8/7 of this 320px dot rate, which would equal 10/7 times the 256px dot rate, or 15/7*Fsc?
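Checking those ratios exactly, with Fsc = 315/88 MHz for NTSC (this is just arithmetic on the conjecture, not a hardware measurement):

```python
from fractions import Fraction

# Exact clock relationships, taking Fsc = 315/88 MHz (NTSC color subcarrier).
Fsc = Fraction(315, 88)
dot_256 = Fraction(3, 2) * Fsc       # NES/SNES/Genesis 256px dot rate, 945/176 MHz
dot_320 = Fraction(5, 4) * dot_256   # Genesis 320px dot rate
m68k = Fraction(8, 7) * dot_320      # conjectured 68000 clock
print(float(dot_256), float(dot_320), float(m68k))
# ~5.369 ~6.712 ~7.670
```

So 8/7 of the 320px dot rate works out to 15/7*Fsc exactly, about 7.670 MHz, which matches the commonly quoted 7.67 MHz figure.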
To derive a clock speed useful for comparison between the Super NES and Genesis, I derive an abstract unit called the gencycle (AAAAAAAAA!), which is 1/2 of a fast (6 clock) cycle (that is, 7.159 MHz) or 1/3 of a slow (8 clock) cycle (that is, 8.054 MHz). Over the long run, the average period of a gencycle should be close to that of a Genesis master clock, allowing 65816 instruction timings to be quoted in units nearly commensurate with 68000 timings.
This 12-clock 68000 instruction corresponds to 10 or 12 65816 gencycles, as seen here:
Ordinary direct page instruction: 10 gc
- opcode fetch: 2 gc (fast ROM)
- offset fetch: 2 gc (fast ROM)
- data low: 3 gc (slow RAM)
- data high: 3 gc (slow RAM) (if 16-bit M/X)
Direct page instruction, D not 256-byte aligned: 12 gc
- opcode fetch: 2 gc (fast ROM)
- offset fetch: 2 gc (fast ROM)
- address generation: 2 gc (internal)
- data low: 3 gc (slow RAM)
- data high: 3 gc (slow RAM) (if 16-bit M/X)
Absolute indexed instruction: 12 gc
- opcode fetch: 2 gc (fast ROM)
- address fetch low: 2 gc (fast ROM)
- address fetch high: 2 gc (fast ROM)
- data low byte: 3 gc (slow RAM)
- data high byte: 3 gc (slow RAM) (if 16-bit M/X)
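Those three breakdowns can be tallied mechanically with the gencycle weights defined above (fast 6-clock cycle = 2 gc, slow 8-clock cycle = 3 gc, internal cycle = 2 gc); a small Python sketch, with the cycle lists transcribed from the tables:

```python
# Gencycle weights: fast (6-clock) and internal cycles = 2 gc, slow (8-clock) = 3 gc.
GC = {"fast": 2, "slow": 3, "internal": 2}

def total_gc(cycles):
    return sum(GC[c] for c in cycles)

dp_aligned   = total_gc(["fast", "fast", "slow", "slow"])              # direct page, D aligned
dp_unaligned = total_gc(["fast", "fast", "internal", "slow", "slow"])  # D not 256-byte aligned
abs_indexed  = total_gc(["fast", "fast", "fast", "slow", "slow"])      # absolute indexed
print(dp_aligned, dp_unaligned, abs_indexed)   # 10 12 12
```

Which reproduces the 10/12/12 gc totals quoted above for the 12-clock 68000 case.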
But to what extent do direct MOVEs, adds and subtracts with a memory destination, hardware MULs and DIVs, and 32-bit addition and subtraction give 68000 the edge?
Don't forget (pre)decrement and (post)increment of address registers within instructions.
psycopathicteen wrote:
Quote:
and in the SNES's case it's arranged in a not very convenient way (bit 8 of the X coordinate and the size bit are packed into a byte shared with three other sprites).
A little while ago I attempted to find what was causing slowdown in Gradius III, and I found the OAM clearing routine; the way they dealt with the hi-OAM was inefficient. Gradius III shuffles back and forth between the low OAM and high OAM, one sprite at a time.
The way I manage the high OAM: when I am writing a sprite to the OAM, I "AND #$01ff" the x-coordinate, and "ORA #$0200" for big sprites. Then I store the result in both the OAM buffer and a second list 544 bytes after the OAM buffer. Then I overwrite the high byte of the x-coordinate with the y-coordinate, and overwrite the high byte of the y-coordinate with the attribute word. After all the game logic has been calculated, I use the table of 16-bit x-coordinates to build the high OAM.
See what I mean? It sounds like they wasted CPU cycles, causing more slowdown than needed to achieve what had to be done. Here's a fun challenge, since you might still have notes: did you try rewriting the routine optimized, to see what sort of performance boost you could get? It would be an impressive patch if you could significantly reduce slowdown in Gradius III.
Keep in mind though I think Gradius III was Konami's first Super Famicom game. It is not that strange that it would not be coded very efficiently.
http://www.emulationzone.org/consoles/snes/tech.htm
More idiots passing off incorrect technical information. They mentioned the CPU being "slow" 5 times, and wrote "slow" in all caps 4 of the 5 times.
My god, this page is just so, so inaccurate - obviously written by biased Sonic fanboys.
Last updated in 1999 though so it's 13 y.o.
Yeah, that page is horrid. Have you tried e-mailing corrections to the maintainer? Say the presence of DMA to VRAM is equivalent to Blast Processing, and my gencycle theory might help alleviate the capitalized SLOW-ness. Apparently the most current e-mail address is wacko413 at Hotmail.
The "64K at a time" part appears to have something to do with the data segment register set with the PLB instruction. On the 68000, on the other hand, a pointer fits in a single register.
That was a very funny read. It reminds you how bad the internet was for information back then.
Quote:
That was a very funny read. It reminds you how bad the internet was for information back then.
You still see this kind of incorrect information on Sega-16.com. Those people just can't get over the fact that they were wrong and we were right.
psycopathicteen wrote:
The way I manage the high OAM: when I am writing a sprite to the OAM, I "AND #$01ff" the x-coordinate, and "ORA #$0200" for big sprites. Then I store the result in both the OAM buffer and a second list 544 bytes after the OAM buffer. Then I overwrite the high byte of the x-coordinate with the y-coordinate, and overwrite the high byte of the y-coordinate with the attribute word. After all the game logic has been calculated, I use the table of 16-bit x-coordinates to build the high OAM.
In other words, you emulate Metal Combat's OBC1 in software. I'm using this too, but I've discovered that the 32-byte buffer can be generated overlapping the 512-byte buffer.
Meh, since this thread was bumped I may as well read it... and wow, there are so many problems here.
Also: memory accesses on the 68000 are probably the worst part of it. They're so pathetically slow I'd compare them to cache misses on modern CPUs, except you can't work around it. Avoid memory accesses at all costs, no matter what. Luckily there are enough registers that in practice RAM will pretty much get used only to store permanent state rather than variables in a subroutine, but yeah...
The biggest advantages of the 68000 are 1) having a much easier time manipulating large numbers (here is where most of the important optimizations can come in) and 2) being a tad easier to program for. Although I think people are underestimating the amount of time available in a frame, unless you're being wasteful a frame is usually plenty of time for a game (my code tends to stay around vblank duration even when I'm sloppy...). This goes for both the 68000 and the 65816.
tepples wrote:
But to what extent do direct MOVEs, adds and subtracts with a memory destination, hardware MULs and DIVs, and 32-bit addition and subtraction give 68000 the edge?
Hardware MUL and DIV don't give any edge at all since they're so pathetically slow nobody in their right mind would use them (and if you have raster effects, you outright can't use them because they'll delay the interrupt and you risk missing the timing window - DIVU/DIVS outright can delay as much as almost a third of a scanline).
32-bit operations help a lot, from experience. Yeah, most of the time you'll use 16-bit (in fact, maybe even 8-bit more so than 16-bit), but there are a lot of places where having 32-bit operations helps, and they're a lot faster than doing the equivalent with 16-bit operations. They can also be useful if you're dealing with a lot of data in one go.
Adds and subtracts to memory tend to be useful with counters and such. No need to waste time loading the value into a register when all you want is to increment it. Direct moves fall under a similar usage, and also make things easier when you want to store constants.
PS: Thunder Force IV sometimes slows down with a single enemy on screen, so huh... =P (although the PCM playback cutting the music is a much bigger offense to me, honestly)
Hardware multiply is a huge boon if you want to do 3D transforms.
Ah, all those Sega fanboys spreading bullshit lies about the SNES. That's nothing new. The fact is, the SNES has about the same processing power as the MD, despite a much lower clock rate, but better graphics and sound. I understand it's annoying; however, there's no hope of ever getting them to shut up their lies over the so-called superiority of the MD.
In any case, who cares about the superiority or inferiority of a system? The NES is inferior to so many gaming systems and I still love it.
Actually, the format of the data used by the SNES hardware tends to get in the way too, and makes code slower than it should be. The sprite table is probably the prime example: adding a sprite gets cumbersome when two of its bits are in a different place in the table, in a byte shared with other sprites. There's also the fact that the only way the SPC can get any new data at all is through a busy loop with the 65816 (which is why so many games pause for a couple of seconds when switching the background music); considering it's a sample-based chip, that can get really bad, since samples need a lot more memory than synthesized sounds.
Using a planar format can hurt performance-wise too: modifying a single pixel requires touching four bytes in RAM (assuming the usual 4bpp formats), while if it were packed (like the Mega Drive's) it would require touching only one byte, as well as potentially simplifying code logic (mode 7 makes this even easier since 1 byte = 1 pixel, but it has a very limited number of tiles, which reduces its usefulness). Granted, in practice maybe only 1% of games are affected by this, but I assume this is the main reason why the SuperFX is so slow at rendering (for comparison, the 68000 alone on the Mega Drive can do something comparable to Star Fox).
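The four-bytes-versus-one difference looks like this in a Python sketch (the plot helpers are made-up names for illustration, but the offsets follow the standard SNES 4bpp planar tile layout, planes 0/1 interleaved in the first 16 bytes and planes 2/3 in the next 16, and the Genesis packed layout, two 4-bit pixels per byte, high nibble on the left):

```python
# Plot one 4bpp pixel into an 8x8 SNES-style planar tile: four bytes touched.
def plot_planar(tile, x, y, color):        # tile: bytearray(32)
    mask = 0x80 >> x
    for plane in range(4):
        off = (plane // 2) * 16 + y * 2 + (plane % 2)
        if color & (1 << plane):
            tile[off] |= mask
        else:
            tile[off] &= ~mask & 0xFF

# Plot one pixel into a Genesis-style packed tile: one byte touched.
def plot_packed(tile, x, y, color):        # high nibble = left pixel
    off = y * 4 + x // 2
    if x & 1:
        tile[off] = (tile[off] & 0xF0) | (color & 0x0F)
    else:
        tile[off] = (tile[off] & 0x0F) | ((color & 0x0F) << 4)

t = bytearray(32)
plot_planar(t, 0, 0, 0xF)                  # writes land in bytes 0, 1, 16, 17
print([i for i, b in enumerate(t) if b])   # [0, 1, 16, 17]
```

Four scattered read-modify-writes versus one is exactly why software rendering into planar tiles costs more.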
Of course, in practice it's generally the sloppiness of the code that actually tends to affect performance the most (both systems are full of games that are slow as hell when there isn't any justification for it). Prime examples would be Sonic 2 in the case of the Mega Drive, while Super Mario World on the SNES seems prone to getting slow easily (though I don't recall the game itself slowing down? It could be mostly hacks that are affected; that game does have a reputation for slowing down easily for some reason).
Super Mario World was also a launch title, and launch titles tend to be inefficient for two reasons. First, they're developed on preliminary hardware and much of the development effort is spent on continuously porting it to final hardware. Second, developers haven't yet discovered good workarounds for the hardware's quirks such as psycopathicteen's soft-OBC1 routine.
Finally, hacks tend to have more enemies per screen than the original game, when the original game's developers may have removed enemies from the map or, if a big group of them is necessary for a particular scene, taken time to optimize their code. And hacks, having been made by experienced players for experienced players, need more enemies for more difficulty.
The first two Sonic the Hedgehog games for Genesis used time-intensive compression methods to store all the tiles, meaning they couldn't replace as many tiles between sections of a level as Sonic 3 and later, which used a codec with faster (but less space-efficient) decompression.
I just went with those two because they were popular.
tepples wrote:
The first two Sonic the Hedgehog games for Genesis used time-intensive compression methods to store all the tiles, meaning they couldn't replace as many tiles between sections of a level as Sonic 3 and later, which used a codec with faster (but less space-efficient) decompression.
Um, no? Sonic 1 doesn't use any more compression than your average game. Sonic 2 does, but even then, in both cases it's only at load time, so compression doesn't factor into slowdown at all (only into increasing load times, hence the title cards). Sonic 3 had some important changes to the engine, but then again it doesn't slow down unless you abuse debug mode.
In other words, Sonic 2 is just sloppy as hell.
For the record:
STH1 uses the Huffman-based format cracked by Nemesis, used widely throughout the Genesis library. STH2 uses a mix of "Nemesis" and an LZSS-variant format cracked by Brett Kosinski. STH3 uses the Kosinski format more often.
Even code that does not have to deal directly with hardware is written inefficiently in SNES games, but not in Genesis games. A lot of SNES games are filled with subroutines that push every register onto the stack, do something that takes 3 instructions, pull everything back from the stack, and return. I never saw this happen in Genesis game code.
I wonder if the habit of pushing registers on the 65816 but not on the 68000 is because the 68000 has enough registers that it can get away with having an ABI that designates some as caller-saved. The 65816 has A for data and X, Y, D, and S for addresses; the 68000 has D0-D7 for data and A0-A6 and A7* for addresses. For equivalent code on both platforms, the 65816 might have to spill variables to direct page more often.
Or the fact that C compilers for 65816 might not have been very advanced, and the publisher of a multi-platform game didn't necessarily want to have to pay someone to code something in 65816 assembly and again in 68000 assembly. Even compilers targeting ARM aren't perfect; look at the grumbling about "HLL" in GBATEK.
* I list A7 separately because exceptions treat A7 specially.
I almost never use pushing and pulling in subroutines. I just don't expect the registers to be the same before and after a subroutine.
In other words, you use an all-caller-saved convention. Does this include only A/X/Y, or also the data segment (B), frame pointer (D), and the accumulator size, index size, and decimal bits of the flags (the M, X, and D bits of P)?
In any case, parts of my games use a convention of treating X as callee-saved, referring to the index of the player, unit, sound channel, or other thing being worked on at the moment. Think of it as like the "this pointer" in C++.
tepples wrote:
In other words, you use an all-caller-saved convention. Does this include only A/X/Y, or also the data segment (B), frame pointer (D), and the accumulator size, index size, and decimal bits of the flags (the M, X, and D bits of P)?
Probably just whatever the caller subroutine is expecting to use later.
And yes, the 65816 having far fewer registers is likely the cause. Also, I don't think anybody even dared to use compilers on the 65816, it'd be way too slow; C on the 68000 was reasonably efficient, though, so compilers were more common there (still rare, games were mostly asm). I think it was mostly Western studios that used compilers, not Japanese studios (and Japan made the bulk of games).
That said, it was probably easier to just contract a company to do a port than to work on both platforms simultaneously =P
When I optimize my 68k code, I end up using more than half of each set (A and D) most of the time. I'd say that I pushed/popped more on the 68k than I have ever done on the 65x. Though I don't want to make it out to seem I did a lot of stack work on the 68k, 'cause I don't. Just more so than the 65x.
Japanese developers of that era were never that great at writing ASM code, IMO, or from what I've seen. Euro coders on the other hand were obsessively proficient at it - but I never liked their games. Go figure. The 68k gets you pretty good bang for the buck (code) without much effort. That's probably why it was so popular with Japanese arcade developers. But then again, they still wrote crappy code. Just a few examples: the Gradius II arcade board was a dual 12 MHz 68k setup and it still slows down. One of the Metal Slug games (the 2nd or 3rd one?) had to be fixed because the game slowed down like crazy.
psycopathicteen: You should really try to hack/patch SGnG and Gradius III. Even if that would require a faster speed rom setup, too.
I nearly never find myself using the stack on the 68000. In fact I only do it in two situations: when I literally run out of registers (by this I mean I've exhausted all of D0-D7 and A0-A6, which is extremely rare) or when I'm storing an array (e.g. the pathfinding algorithm in Star Chaser). Otherwise RAM is pretty much touched just to store permanent state meant to stay around across subroutines (and often across frames).
Also I'm not surprised European developers were proficient at it, they had the Amiga after all =P
And yeah, arcades are probably the worst example of optimization - why optimize when you can just throw more money at it? Code was usually extremely inefficient (once they got beyond what they had in the early '80s). It also reminds me of how Sega used three 68000s in their arcades, but two of them went to waste doing a single task that barely took up any time - all of the real processing was in the first 68000. I can't help but think the only reason they added those 68000s was to make the hardware more expensive to clone.
I did post a slowdown reduced patch for SGnG in another thread.
Sik wrote:
And yeah, arcades are probably the worst example of optimization - why optimize when you can just throw more money at it?
Same could be said for gaming PCs today. In the '90s and the early part of the 2000s, PC tech was expensive and advanced VERY quickly, so developers had to target specific hardware and hope to get it done before the hardware was too old. Today, companies such as Bethesda can churn out the sloppiest game ever, but people will still buy and play these games because fans provide free patches that fix every massive issue with the game, and if that doesn't work, they just spend money on a new GPU. At least with consoles, you have a set of limitations you need to abide by, which is far stricter than PC hardware and never changes. A game developed for a console has a guarantee of working the way it should; look at Metal Gear Rising. The PC version doesn't even need powerful hardware to run at full settings, even if they recommended an i7. Then there are curious cases like Saints Row 2, where they apparently took the console code and ran it through an interpreter, essentially making it glorified emulation. Quite sad that the proud and arrogant PC Master Race focuses its efforts on running years-old console ports the way they should be played.