rainwarrior wrote:
What is "sub-pixel latency" supposed to mean?
I was responding to the claim, regarding lag, that an emulator could have "none". In the context of a SNES emulator, I interpreted that as meaning zero (or, to be generous, less than two)
extra master clocks between input and output as compared with real hardware (not necessarily the perceptible result, since that's partly on the display device). I would probably be satisfied with less timing precision than that, personally, but that was the claim.
Quote:
If you want 2 simulated chips to correlate with cycle-accurate timing, that's entirely doable on PCs and RPis with emulators.
I know an emulator can be very accurate. I also know that multi-frame latency isn't inherent to software emulation in principle. But in practice, achieving both high accuracy and low latency at the same time is very hard, particularly for a specific console/chip combo that byuu has been complaining about quite recently. As you say, most of the latency is not due to the emulator, but is imposed by the computing environment that runs it. And while it may be different for the NES, you can't run an accurate SNES emulator on a system that offers easier low-level access; you need a PC.
Quote:
This is not a performance problem, and I don't know why you think it must be. Many emulators are already doing this kind of thing just fine.
It is absolutely a performance problem. Why do you think higan takes so much CPU power? It's not the individual chips; it's all the syncing. You cannot run higan anywhere near full speed on a RPi.
Which brings up another way in which very low latency could potentially be expensive. If you're simulating exactly what the console does in real time, you have to sync all the chips every cycle. Unless things have changed quite a bit since the last time I checked, the only reason higan runs at full speed even on a modern high-powered PC is that it's smart about only syncing when it has to. This means it can't generate half-dots every 93 ns on the tick (a half-dot is two ~21.477 MHz master clocks); it's asynchronous, and the output only comes together properly because the results are buffered.
(Correct me if I'm wrong about this.)
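To illustrate what I mean by "only syncing when it has to", here's a toy sketch of the scheduling idea as I understand it. This is my own guess at the structure; all the names are made up, and it's nothing like higan's actual code:

```cpp
#include <cstdint>
#include <cstdio>

// Toy model of sync-on-demand scheduling. Two chips keep independent
// cycle counters; instead of lock-stepping every cycle, one chip only
// waits for the other at a shared access. All names here are made up.
struct Chip {
    const char* name;
    int64_t clock;  // cycles this chip has executed so far
};

// Bring `behind` up to `ahead`'s point in time before an ordered access.
// In a libco-style emulator this would be a context switch, not a loop.
void syncTo(Chip& behind, const Chip& ahead) {
    while (behind.clock < ahead.clock)
        behind.clock++;  // stand-in for "execute one cycle of `behind`"
}

int main() {
    Chip cpu{"cpu", 0}, ppu{"ppu", 0};

    cpu.clock += 1000;  // 1000 cycles of internal-only work: no syncing

    // Now the cpu touches state the ppu can also see, so the ppu must
    // be caught up first. Lock-step emulation pays this cost every cycle.
    syncTo(ppu, cpu);
    std::printf("synced %s to %s at cycle %lld\n",
                ppu.name, cpu.name, (long long)cpu.clock);
}
```

The point being: the emulation is only coherent at the sync points, so there's no moment where you could tap it for true real-time half-dot output without forcing it back into per-cycle lock-step, and per-cycle lock-step is exactly the performance problem.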
Quote:
that's a video device problem
The video device problem is a separate problem, and it applies equally to FPGAs or any other method of making new hardware pretend to be an old console. With a CRT it's not a problem. With an HDTV, the problem is actually worse for original hardware than for clones capable of using HDMI. I'd rather leave the display technology argument aside because it's a whole other discussion.
Quote:
Incidentally, you could probably do scanline-by-scanline output on many PCs' built-in video hardware in VGA mode while connected to a CRT, if you really wanted to go down this road. (Shader language would not help with this in any way, IMO.)
Interesting. But could you do pixel-by-pixel?
I mentioned shader language because I know that GPUs can be used for massively parallel general computing, and it occurred to me that it might be possible to leverage this in an emulator if the goal was a combination of very high accuracy and very low latency. I was basically handwaving at that point because I don't have any expertise with GPU programming.
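For what it's worth, I imagine the scanline-by-scanline idea would look roughly like this in software. This is pure speculation on my part; getRasterLine(), emulateScanline(), and writeLineToFrontBuffer() are hypothetical stand-ins, stubbed here only so the sketch compiles:

```cpp
#include <cstdio>

// Hypothetical stand-ins. On a real system getRasterLine() would poll
// the display's current scanline (on old VGA hardware, a status-register
// read); the buffer write would hit the front buffer directly.
static int fakeRaster = 0;
int  getRasterLine()                      { return fakeRaster++ % 262; }
void emulateScanline(int /*line*/)        { /* produce one row of pixels */ }
void writeLineToFrontBuffer(int /*line*/) { /* no back buffer: write live */ }

// Speculative shape of "racing the beam": stay about one scanline ahead
// of the beam, so worst-case video latency is roughly one line (~64 us).
void raceTheBeamFrame() {
    for (int line = 0; line < 240; line++) {
        emulateScanline(line);         // generate pixels for this row
        writeLineToFrontBuffer(line);  // must land before the beam arrives
        while (getRasterLine() < line) { /* spin just behind the beam */ }
    }
}

int main() {
    raceTheBeamFrame();
    std::puts("frame done");
}
```

Pixel-by-pixel would be the same loop at the ~5 MHz dot clock instead of the ~15.7 kHz line rate, which is where I'd expect a CPU busy-wait to stop being viable and the GPU handwaving to start.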
Quote:
But even if you did this, this concept of zero latency vs 1 frame of latency is almost meaningless.
One frame is absolutely perceptible. Ever play Mario Golf? Even on a CRT, the shot control timing is noticeably late. Also, I've worked with digital music creation tools, and an ASIO buffer size of 20 ms (only a little more than a frame) is unacceptably long for live playing. It's on the same order as the amount of time it takes for a piano key to fully depress after being struck firmly.
According to that keyboard latency page linked earlier, humans can detect as little as 2 ms (0.12 frames) of lag, and perceived lag does make you worse at what you're doing.
...
Dwedit wrote:
Emulators are able to use tricks involving savestates (RunAhead) to skip the game's internal lag frames, and show frames from the future, and thus reduce input lag.
Interesting trick.
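As I understand it, the single-instance version of the trick looks something like this per real frame. This is my own reading of the idea, with made-up method names, not code from any actual emulator:

```cpp
#include <cstdint>
#include <cstdio>

// Toy "core": its entire state is one frame counter, so a savestate is
// trivial. A real emulator would serialize all chip state instead.
struct Emulator {
    int64_t frame = 0;
    void runFrame(uint32_t /*input*/, bool render) {
        frame++;
        if (render) std::printf("displayed frame %lld\n", (long long)frame);
    }
    int64_t saveState() const { return frame; }
    void loadState(int64_t s) { frame = s; }
};

// One real frame with a lookahead of n: the frame we display is n frames
// ahead of the internal state we keep. Note that every peeked frame
// reuses the same input, since future input doesn't exist yet.
void runAheadFrame(Emulator& emu, uint32_t input, int n) {
    emu.runFrame(input, false);        // advance real time, hide this frame
    int64_t state = emu.saveState();
    for (int i = 0; i < n; i++)        // peek n frames into the future
        emu.runFrame(input, i == n - 1);
    emu.loadState(state);              // roll back to real time
}

int main() {
    Emulator emu;
    for (int t = 0; t < 3; t++)
        runAheadFrame(emu, /*input=*/0, /*n=*/2);
    // Internal state advanced 3 frames; each displayed frame was 2 ahead.
}
```

So the cost is n+1 emulated frames plus a savestate round trip per real frame, which is where the performance requirement comes from.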
Quote:
If you have a CRT plugged in, you have beaten the original hardware at latency.
Not if you're using a framebuffer, you haven't (well, the typical full-frame buffer, anyway). Also, there are other factors besides the monitor that induce latency on a PC, even if you do figure out how to do direct line-by-line output.
What is hard GPU sync?
Quote:
good enough performance to run multiple frames at once.
That's kinda the catch, isn't it?
Besides, once you try to compensate for more lag than the original game's internal lag frames, you no longer have the necessary input data to emulate the future frames, regardless of how fast you can render them, and run-ahead is no longer a perfect display of what the real system would show. If it takes five frames for your controller input to make it through the USB driver to the emulator to the graphics card to the monitor to the actual screen, you need to handle some of that latency some other way, because run-ahead will give you glitches.
All of this game-specific hacking doesn't really damage the case that an FPGA is a "purer" way to get high accuracy at low latency (if anyone were to attempt to make such a case)...
...
Oziphantom wrote:
93143 wrote:
Super Accelerator System
This is an SA-1 cart? Adding a 10 MHz 65816 is not going to be a problem at all; even if we write the code to run at the minimal step and emulate the various bus levels, I don't see that a current CPU would have any issue. Is there somewhere that documents the issues faced?
From the Mesen-S thread:
byuu wrote:
The thing that hurts libco with the SA1 is that both the SNES CPU and SA1 can simultaneously access BWRAM and IRAM, which are of course volatile, and ROM can be dynamically remapped. So in effect, for perfect synchronization you would have to synchronize to the other component every time ROM, BWRAM, IRAM, and I/O registers were accessed, which is almost every cycle.
And again:
byuu wrote:
The design of the SA1 is ingenious and evil: the CPU cannot be stalled because the SNES CPU has no concept of external wait states (/DTACK on the Genesis, for instance). So instead, the SA1 detects when the SA1 CPU tries to access ROM, BWRAM, or IRAM while the SNES CPU is accessing it, and will insert wait states into the SA1 CPU.
Three years ago in a different thread:
byuu wrote:
SA-1 memory conflict stalling is going to be the thing that totally destroys us. We're chasing our tails over a bit of SFX timing issues, but the SA1 is probably running 30% faster than it should.
Now, byuu is sometimes a bit hyperbolic about SNES emulation issues, but that doesn't sound trivial to me.
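To make the problem concrete, here's a toy sketch of what a shared BWRAM access turns into if you take those quotes at face value. Hypothetical code with made-up names, not from higan or any other emulator:

```cpp
#include <cstdint>

// Toy model of the SA-1 conflict rule byuu describes: the S-CPU can't
// be stalled, so on a simultaneous access the SA-1 side eats the wait
// states. BWRAM is truncated here; the real chip maps up to 256 KiB.
struct Bus {
    bool snesCpuOnBus = false;       // is the S-CPU touching this region?
    uint8_t bwram[0x800] = {};
};

struct Sa1Cpu {
    int64_t clock = 0;
    Bus* bus = nullptr;

    void syncToSnesCpu() {
        // In a libco-style emulator this is a context switch to the
        // S-CPU thread until it catches up. Having to do it on nearly
        // every ROM/BWRAM/IRAM access is what kills performance.
    }

    uint8_t readBwram(uint32_t addr) {
        syncToSnesCpu();             // must know the S-CPU's position *now*
        while (bus->snesCpuOnBus) {  // conflict: SA-1 gets the wait state
            clock++;                 // stall one SA-1 cycle
            syncToSnesCpu();         // re-check after the other CPU moves
        }
        return bus->bwram[addr & 0x7ff];
    }
};

int main() {
    Bus bus;
    Sa1Cpu sa1;
    sa1.bus = &bus;
    return sa1.readBwram(0x1234);
}
```

The expensive part isn't the wait-state arithmetic; it's that you can't know whether the S-CPU is on the bus without synchronizing first, and the access pattern makes that nearly every cycle.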
Oziphantom wrote:
Rahsennor wrote:
getting low latency on a 'modern' PC is a stone bitch.
This isn't an Emulation vs FPGA argument; this is a "custom designed thing to do task X" vs "giant general-purpose machine that multitasks and runs lots of different software" argument.
That's the entirety of the argument. An FPGA is not a unique philosophical primitive. If you think about it, it's really just a computer with an unusual architecture, programmed in an unusual language. Running a simulated console on an FPGA *is* software emulation.
And for a variety of practical reasons, it's much better suited to certain low-latency parallel applications than C++ on a Windows PC.