Hi, nesdev.
I think this info will be interesting.
21.01.2017
Q:
I wrote:
Hi, Marty.
I just want to thank you again
for the great Nestopia emulation core.
I ran a test comparing the performance of modern cycle-accurate
emulators (written in C and C++) against Nestopia
on an old Intel Atom N550 1.50 GHz machine.
The results are amazing.
- puNES 0.100
- Nintendulator 0.975b
- Mesen 0.7.0
- BizHawk 1.11.9
- RockNES 5.41
All of them eat 100% of a CPU core and cannot
run at full speed on the old low-powered netbook CPU. They manage only 30-40 FPS without frame skipping.
(The real-world performance of the Atom N550 is roughly that of a good Pentium III at ~1000 MHz.)
Nestopia uses only 40-45% CPU and runs at a full 60 FPS!
FCEUX, with its old, inaccurate scanline-based PPU renderer and low sound quality, has the same performance.
Today the nestopia-libretro core (in fact your core with minimal modifications by Rdanbrook)
works perfectly on the Raspberry Pi 3.
I wonder how you managed such _heavy_ optimization of your cycle-accurate emulator!
A:
Marty wrote:
Thanks, Eugene. Nice to hear from you again; I hope you are well.
Doing code optimizations without sacrificing accuracy can be
real fun, and I'm happy to see it paid off.
As for the various optimizations I did to Nestopia at the time,
I relied heavily on the Intel VTune and AMD CodeAnalyst profilers to
find hotspots in the code, and also let the compiled IA-32 assembly
code guide me.
I also made heavy use of (or abused, if you will) C++ template-style
programming, or concept-oriented programming as I like to call it,
to let the compiler do as much work for me as possible and to
avoid repeating myself in code.
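As an illustration of the idea, not code from Nestopia: the sketch below turns a runtime flag into a template parameter, so the compiler generates one specialized loop per case and the flag is tested once per tile row rather than once per pixel. The name decodeTileRow and the horizontal-flip scenario are assumptions chosen for this sketch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical example: decode one 8-pixel row of a 2-bitplane tile.
// FlipX is a compile-time constant, so each instantiation keeps only one
// side of the conditional below and the loop body has no branch on it.
template <bool FlipX>
std::array<std::uint8_t, 8> decodeTileRow(std::uint8_t plane0, std::uint8_t plane1)
{
    std::array<std::uint8_t, 8> pixels{};
    for (int x = 0; x < 8; ++x)
    {
        // Bit 7 holds the leftmost pixel unless the row is flipped.
        const int bit = FlipX ? x : 7 - x;
        pixels[x] = static_cast<std::uint8_t>(
            ((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1));
    }
    return pixels;
}

// Runtime entry point: picks the specialized version once per row.
inline std::array<std::uint8_t, 8> decodeTileRow(std::uint8_t plane0,
                                                 std::uint8_t plane1,
                                                 bool flipX)
{
    return flipX ? decodeTileRow<true>(plane0, plane1)
                 : decodeTileRow<false>(plane0, plane1);
}
```

The same loop body is written once, yet two branch-free variants end up in the binary, which is the "let the compiler do the work" effect described above.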
Using the Intel C++ Compiler and Microsoft Visual Studio at the time, I
also fine-tuned many parts of the code through compiler directives to give
hints to the compiler on what to optimize for speed and what to optimize for
size.
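A rough sketch of what such per-function hints can look like. The macro names EMU_HOT and EMU_COLD and both functions are hypothetical, and which directives Nestopia actually used is not stated in the mail. The attributes shown are the GCC/Clang ones; with Microsoft's compiler a comparable effect is usually obtained by placing #pragma optimize("t", on) (favor speed) or #pragma optimize("s", on) (favor size) before the functions concerned.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical hint macros (names are assumptions for this sketch).
#if defined(__GNUC__) || defined(__clang__)
  #define EMU_HOT  __attribute__((hot))   // optimize aggressively for speed
  #define EMU_COLD __attribute__((cold))  // rarely executed, favor size
#else
  #define EMU_HOT                         // no-op on other compilers
  #define EMU_COLD
#endif

// Runs for every audio sample, so it is marked as a hot path.
EMU_HOT std::int16_t mixSamples(std::int16_t a, std::int16_t b)
{
    const int sum = static_cast<int>(a) + static_cast<int>(b);
    // Clamp to the signed 16-bit range instead of wrapping around.
    return static_cast<std::int16_t>(std::max(-32768, std::min(32767, sum)));
}

// Runs only when loading fails, so code size matters more than speed.
EMU_COLD std::string describeUnsupportedMapper(int id)
{
    return "unsupported mapper " + std::to_string(id);
}
```

Hints like these only pay off when they follow profiler data; marking everything as hot gives the compiler nothing to prioritize.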
As a programmer, knowing low-level details such as branch prediction, cache lines
and so on helped a lot during development. Even if you're developing in a high-level
language such as Java, C# or Python, I believe you can still influence performance a great deal through the way
you structure and arrange your code.
For reference, and maybe not surprisingly, the most performance-critical method in the whole Nestopia code
base, as far as I remember, was Ppu::renderPixel(). I remember making it ~20 FPS faster just by rearranging
some statements. It was surely a branch-condition killer, but letting the CPU avoid stalls and do other work in parallel made it almost free.
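The actual change to Ppu::renderPixel() is not shown in the mail. As a generic illustration of trading hard-to-predict branches for straight-line arithmetic, here is a hypothetical background/sprite priority mux written both ways. The function names, and the convention that a pixel whose low two bits are zero is transparent, are assumptions of this sketch. Whether the branch-free form is really faster depends on the compiler and CPU, so it should be confirmed with a profiler.

```cpp
#include <cstdint>

// Hypothetical pixel mux, branchy version: up to three data-dependent jumps.
inline std::uint8_t muxBranchy(std::uint8_t bg, std::uint8_t spr, bool sprBehind)
{
    const bool bgOpaque  = (bg  & 3) != 0;
    const bool sprOpaque = (spr & 3) != 0;
    if (!bgOpaque && !sprOpaque) return 0;   // backdrop
    if (!bgOpaque)  return spr;
    if (!sprOpaque) return bg;
    return sprBehind ? bg : spr;
}

// Same result computed with masks: every statement is independent
// arithmetic that the CPU can overlap, with no jump to mispredict.
inline std::uint8_t muxBranchFree(std::uint8_t bg, std::uint8_t spr, bool sprBehind)
{
    const unsigned bgOpaque  = (bg  & 3u) != 0u;   // 0 or 1
    const unsigned sprOpaque = (spr & 3u) != 0u;   // 0 or 1
    const unsigned inFront   = static_cast<unsigned>(!sprBehind);

    // Sprite wins if it is opaque and the background is either
    // transparent or lower priority.
    const unsigned useSpr = sprOpaque & ((bgOpaque ^ 1u) | inFront);
    const unsigned useBg  = bgOpaque & (useSpr ^ 1u);

    // 0u - 1u is all ones, 0u - 0u is zero: turn the flags into bit masks.
    const unsigned sprMask = 0u - useSpr;
    const unsigned bgMask  = 0u - useBg;
    return static_cast<std::uint8_t>((spr & sprMask) | (bg & bgMask));
}
```

Both functions return the same value for every input, so the second can replace the first without any loss of accuracy.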