I've been researching for the CPU emulator I'm building for my IDE, and I've found that nobody executes instructions the way I do.
What I've done is pack all the procedures into one function, called runProgram(). It starts at the (currently) hard-coded address of $C000, reads an instruction, processes the following data (depending on the addressing mode), and then increments the program counter by the recorded length of the opcode.
What I've seen others do is create one function that calls other functions to (1) fetch the instruction, (2) decode the data, (3) execute, and (4) write it back (if necessary). Whatever analogue of my runProgram() they use (usually the function that makes these calls) does this in a loop, with a loop-control variable of sorts serving as a cycle counter.
Now for my question: how does the 6502 process an instruction, and how do I account for interrupts (both IRQs and NMIs)?
The CPU interacts with the rest of the hardware through memory-mapped registers, which means that every read and every write can trigger some action somewhere else in the system. Reads and writes are probably best handled in dedicated functions, because it would be counterproductive to check for all the memory-mapped hardware in multiple parts of the program.
For example, a write to address $2007 sends the written value to the PPU's data port. If you have a function to handle writes, it can detect that the address used is a PPU register and it can send the value to the PPU right away, at the precise cycle the write actually happened.
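A minimal sketch of such a centralized write handler in Java (the class and method names here are invented; only the "$2007 goes to the PPU's data port" behavior comes from the description above, and the `lastPpuReg` fields stand in for a real PPU hookup):

```java
// Invented sketch of a centralized CPU bus with a single write handler.
class Bus {
    final int[] ram = new int[0x800];        // the NES's 2K of internal RAM
    int lastPpuReg = -1, lastPpuValue = -1;  // stand-in for a real PPU connection

    void write(int addr, int value) {
        addr &= 0xFFFF;
        value &= 0xFF;
        if (addr < 0x2000) {
            ram[addr & 0x07FF] = value;                   // RAM, mirrored every 2K
        } else if (addr < 0x4000) {
            writePpuRegister(0x2000 | (addr & 7), value); // $2000-$2007, mirrored
        }
        // else: APU/IO registers, mapper registers, cartridge space...
    }

    int read(int addr) {
        addr &= 0xFFFF;
        return addr < 0x2000 ? ram[addr & 0x07FF] : 0;    // reads need dispatch too
    }

    void writePpuRegister(int reg, int value) {
        lastPpuReg = reg;       // a real emulator forwards this to the PPU here,
        lastPpuValue = value;   // at the precise cycle the write happened
    }
}
```

Because everything funnels through write(), the "is this a PPU register?" check lives in exactly one place.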
You can't get the timing right if you treat all of memory as if it were plain RAM/ROM and don't execute the individual cycles of each instruction. Instructions need different numbers of cycles to complete because they're doing something meaningful each cycle, so if you're going for accuracy it's important to execute these steps in the correct order.
Interrupts are more annoying to handle, IMO... The CPU checks whether an interrupt is pending before starting each new instruction, and when one is, it runs a fixed sequence to call the interrupt handler (pushes the current PC and the CPU flags to the stack, jumps through the IRQ/NMI vector, etc.), but there are some priority and delay issues I don't really understand, so someone with deeper CPU knowledge will have to explain those to you.
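The mechanical part of that sequence can be sketched in Java. The vector addresses ($FFFA for NMI, $FFFE for IRQ) and the NMI-over-IRQ priority are real 6502 behavior; the field names, method names, and the pushed status byte are simplified inventions, and this deliberately ignores the delay quirks mentioned above:

```java
// Hedged sketch of servicing a pending NMI/IRQ before the next instruction.
class Cpu {
    final int[] mem = new int[0x10000];
    int pc = 0xC000, sp = 0xFD;
    boolean nmiPending, irqPending, flagInterruptDisable;

    void push(int v) { mem[0x0100 + sp] = v & 0xFF; sp = (sp - 1) & 0xFF; }
    int read(int a)  { return mem[a & 0xFFFF]; }

    // Called once before each instruction fetch. NMI has priority over IRQ,
    // and IRQ is ignored while the interrupt-disable flag is set.
    void pollInterrupts() {
        if (nmiPending) {
            serviceInterrupt(0xFFFA);
            nmiPending = false;
        } else if (irqPending && !flagInterruptDisable) {
            serviceInterrupt(0xFFFE);
        }
    }

    void serviceInterrupt(int vector) {
        push(pc >> 8);            // PC high byte first...
        push(pc & 0xFF);          // ...then PC low byte...
        push(0x24);               // ...then status (simplified; B flag clear)
        flagInterruptDisable = true;
        pc = read(vector) | (read(vector + 1) << 8);
    }
}
```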
So then, would you recommend this sort of processing routine? (written in pseudocode, of course)
variables:
    <global scope>
    dword cycles;       // 4-byte variable to track the current cycle
    boolean isRunning;

function runProgram(word startAddr)    // startAddr for start address
    while isRunning is true
        fetch instruction
        determine addressing mode
        fetch necessary data
        perform the operation
        write-back (if necessary)
    endwhile
endfunction
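In Java, that loop might be sketched like this. The toy opcode table (NOP and LDA immediate only, with anything else treated as halt) is purely illustrative; a real table has 256 entries with per-opcode lengths and cycle counts:

```java
// Illustrative-only fetch/decode/execute loop with a two-entry opcode table.
class MiniCpu {
    final int[] mem = new int[0x10000];
    int pc, a;
    long cycles;              // wide counter for total cycles executed so far
    boolean isRunning = true;

    void runProgram(int startAddr) {
        pc = startAddr & 0xFFFF;
        while (isRunning) {
            int opcode = mem[pc];                  // fetch
            switch (opcode) {                      // decode + execute
                case 0xEA:                         // NOP: 1 byte, 2 cycles
                    pc += 1; cycles += 2; break;
                case 0xA9:                         // LDA #imm: 2 bytes, 2 cycles
                    a = mem[pc + 1]; pc += 2; cycles += 2; break;
                default:                           // anything else: halt (toy rule)
                    isRunning = false; break;
            }
            pc &= 0xFFFF;
        }
    }
}
```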
(Edited twice: once in an attempt to correct what I thought was a mistake, and once to put it back. *DWORD is 32 bits, as I thought in the first place.*)
That's basically it, but you can't simulate the CPU very far ahead of the other chips in the NES, or interrupts won't happen on the correct CPU cycle and odd bugs will pop up in games with strict timing requirements (especially Codemasters games). So every time you advance the CPU a cycle, you have to advance certain counters in the APU, PPU, and mapper chips, or you have to be able to save and restore the CPU state so you can backtrack when an interrupt is determined to have happened. (This gets complicated quickly.)
Also, the cycle counter will have to be reset once a frame or so, because a 32-bit counter will eventually roll over (at the CPU's ~1.79 MHz, after roughly 40 minutes of emulated time).
Yeah, emulating only the CPU isn't so hard; the real problem is emulating the CPU, the PPU, the APU, and the mapper all in parallel, because what one does affects the others. For example, writes to PPU addresses will change how the picture is rendered, the picture being rendered will trigger a mapper IRQ, and a mapper IRQ will interrupt the CPU... So you can't simply run one of these parts and forget about the others, because on a real system they're all running in parallel.
To emulate the NES you'll either have to alternate the emulation of the different parts in steps of the same amount of time (the step size determines how accurate your emulator is: it could be 1 cycle or a full frame!) or predict when the changes that affect other parts will happen, so that you can stop at those moments and apply the changes as necessary.
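The first option, advancing the parts in lockstep, can be sketched with plain counters. The 3-PPU-dots-per-CPU-cycle ratio is the real NTSC figure; the class and method names are invented, and the PPU "work" is reduced to incrementing a counter:

```java
// Lockstep sketch: run one CPU instruction, then catch the PPU up.
class Interleaver {
    long cpuCycles, ppuDots;

    // Pretend each CPU instruction takes a caller-supplied number of cycles;
    // a real emulator would get this from the instruction it just executed.
    void stepCpuInstruction(int instructionCycles) {
        cpuCycles += instructionCycles;
        // Catch the PPU up: on NTSC the PPU runs 3 dots per CPU cycle.
        while (ppuDots < cpuCycles * 3) {
            ppuDots++;   // a real PPU would render a dot / check for NMI here
        }
    }
}
```

Shrinking the step from "one instruction" to "one cycle" is the same loop with a smaller quantum, which is exactly the accuracy trade-off described above.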
Tepples posted what I was going to say. You can have a system that keeps track of register writes and when they occurred, to read them back later. Of course, some actions will force you to sync up. Any game that uses the sprite 0 hit flag to detect a raster position will constantly need to jump back and forth between CPU and PPU emulation, but in many other games you could run the CPU for a long time before needing to run the PPU and update the screen.
The main reason to do this is performance, which isn't very relevant on modern PCs. If your target is a typical PC, you might as well run things a CPU instruction at a time and then catch the PPU up. Or go even further and execute a CPU cycle at a time and then do the PPU. Either way, your modern PC is going to be very fast and have a huge amount of cache. In the past this certainly was not true: think back to 1999 or so, and optimizing your program probably involved making certain pieces of code fit into cache.
I don't think emulating these all at once is "the real problem". Actually, it's very easy to just run an instruction at a time and then however many PPU cycles it takes to catch up; it's very straightforward and easy to understand. Programming a catch-up type optimization is when things get tricky. Or when you want to be *really* accurate and run a single CPU cycle at a time, for issues with instructions that do RMW (read-modify-write), which can affect MMC1. It's a bit more confusing to run a CPU emulator where an instruction executes in pieces rather than a full instruction at a time.
Thank you all for answering!
My final product (with all things implemented) will have an emulator that runs entirely in modules. The user will be able to select a console system, and the IDE will select a default set of plugins. To use NES emulation as an example, there will be separate modules for PPU, APU, and CPU emulation, plus whatever peripherals need to be emulated. I am confident that I can use Java for the framework, and JNI to interface with code written in any other language (though I realize this limits me to anything compatible with C). I'll deal more with this when I get to it. Right now my focus is strictly limited to achieving CPU emulation.
Speaking of which, I'll plan on running the CPU emulator for one cycle, then bringing the other processing units up to the same point, as my method of emulation. Perhaps I'll create an ADT for instruction cycle information?
Also, I've got the source code for the HalfNES emulator, which is also written in Java. HalfNES separates the operations of the CPU from the RAM, and I want to do the same thing. My question, however, is whether I should allocate space in a CPU RAM ADT for the full 64K of address space, or just for the first 2K.
The NES only has 2K of RAM, so you shouldn't allocate 64KB of space. You need to account for memory mirroring and ROM banking. When programming in C++, pointers are often used; if you're not using pointers, you can set up your own kind of pointer system.
The point is that the 64K of address space on the NES is not just a flat chunk of memory. You have 2K of RAM, which is the most straightforward section, but then you have memory-mapped registers, and cartridge space that could technically contain anything, and you must be able to alter it depending on the game being run, and while it is being run. It's a huge mistake to just allocate 64KB of memory and copy chunks in to handle ROM banking.
I'm actually the author of most of HalfNES, so feel free to ask me about the code. Most things in that emulator are not synchronized to cycle-level accuracy, or even instruction-level accuracy, unless games rely on it. This was needed to get decent speed in Java on the computer I had at the time. Probably now I could go back and increase the granularity to fix a few games like Fire Hawk.
CPU reads and writes go through two levels of indirection and switch statements: one for the fixed memory mappings for RAM and the APU and PPU registers, and one for all of the address space controlled by the mapper. A pointer table might be used in C, but especially across different mappers, the significant bits for selecting a register can be anywhere in the address.
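That two-level dispatch might be sketched like this in Java (this is not HalfNES's actual code; the Mapper interface, the region split by the top address bits, and the stubbed PPU register read are all assumptions for illustration):

```java
// Sketch of two-level read dispatch: fixed outer regions, mapper inner.
interface Mapper {
    int read(int addr);   // second level: the cartridge decides what this means
}

class CpuBus {
    final int[] ram = new int[0x800];   // only 2K actually allocated
    final Mapper mapper;

    CpuBus(Mapper m) { mapper = m; }

    int read(int addr) {
        addr &= 0xFFFF;
        switch (addr >> 13) {            // top 3 bits select an 8K region
            case 0:  return ram[addr & 0x07FF];        // $0000-$1FFF: RAM, mirrored
            case 1:  return readPpuRegister(addr & 7); // $2000-$3FFF: PPU regs, mirrored
            default: return mapper.read(addr);         // $4000-$FFFF: APU/IO + cartridge
        }
    }

    int readPpuRegister(int reg) { return 0; }  // stub for this sketch
}
```

Swapping mappers is then just constructing the bus with a different Mapper implementation, with no 64KB array to copy banks into.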
Also, in Java it is a good idea to store bytes packed into ints for most of your memory, because otherwise you end up casting to a byte after every operation (Java auto-promotes to int) and having to be careful about signs (Java lacks an unsigned byte type).
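A small demonstration of the sign-extension problem being described (nothing here is from HalfNES itself):

```java
// Why ints holding byte values (masked with & 0xFF) beat byte[] in Java:
// byte is signed, and arithmetic auto-promotes operands to int.
class ByteDemo {
    public static void main(String[] args) {
        byte signed = (byte) 0xFF;
        int asIs   = signed;          // -1: sign-extended on promotion
        int masked = signed & 0xFF;   // 255: the unsigned value a register holds

        int a = 0xF0, b = 0x20;
        int sum = (a + b) & 0xFF;     // 8-bit wraparound: 0x10

        System.out.println(asIs + " " + masked + " " + sum); // prints "-1 255 16"
    }
}
```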
@Grapeshot: wow, I honestly didn't think it was you. I mean, I did notice that the package was named "com.grapeshot.halfnes", but I... never mind. Now that I can communicate with you, would you mind if I used the HalfNES source code to create the modules I was talking about? It would be so much simpler, as HalfNES is already well developed, and I know Java pretty well (that int promotion thing is a big pain, I know). Of course, this would be subject to any conditions you choose to name, but if I can use a working emulator instead of building my own, I'm already 1/3 of the way to the first prototype of my IDE.
Why not take source from MAME and a 6502 game? That's exactly why it exists, AFAIK.
To be honest, my reason is that I'd never heard of MAME. Now that you've told me about it, though, I'll look into it. In fact, before logging in (and after seeing your post) I did a Google search for MAME and found the web site. I'll be sure to examine the files over the weekend. Thanks!
The MAME source code is licensed with some restrictions that might not be appropriate for all projects.
[Years later, this was changed.]
Feel free to base things on my code, but I can't license the CPU under anything but the GPL anymore, since I'm not the only one who has contributed. The CPU and mappers are in decent shape; everything else should not be used as a reference (at least not for timings).