So I recently got my emulator to draw stuff (yay). Right now it draws the entire background over roughly 341 * 262 cycles, and when the background is done, it draws every sprite it has to draw at once. I know the method is extremely flawed, and I'm going to improve it, but for now I have a different issue:
When running Mario Bros. (not Super, the original), Mario's sprite is drawn facing left after I start a single-player game (i.e. the un-flipped sprite as stored in the pattern table), and the character starts moving to the right on his own while still facing left. The sprite stays static. When I press Right, the character flips to face the correct direction (right), shows the walking animation, and speeds up slightly. Releasing Right flips the sprite back to the left, and he keeps moving right at a slower pace. Pressing Left keeps the sprite facing left and shows the running animation, but he speeds up to the right!
Even the enemy turtles all move to the right, even though some of them face left and some face right. Basically, this game seems unable to move anything to the left. What could be the cause?
Holding no key (ALL sprites move to the right, Mario sprite facing left, Mario animation static):
Holding right key (ALL sprites move to the right, Mario sprite facing right, Mario running animation showing):
Note that the turtle facing to the left is also moving to the right.
Holding left key (ALL sprites move to the right, Mario sprite facing left, Mario running animation showing):
Note that the turtle in the upper right is moving to the right over the green pipe even though it's supposed to go left.
I also tested:
- Donkey Kong: works perfectly.
- Donkey Kong Jr.: works perfectly.
- Astro Robo Sasa: works nearly perfectly, except for vertical sprite flipping, which I have yet to implement.
- Dig Dug: gets stuck after starting the game; the character digs himself to the middle and there are weird artifacts all over the screen.
- Arkanoid: writes to $4025 but never reads from it for some reason. I haven't implemented expansion ROM, so I have a hacky "pseudo-write" in there just to stop the emulator from giving errors, but it works perfectly fine for the 3 minutes I've tested it.
- Galaxian: doesn't work well; it shows too many enemy sprites, destroyed enemy sprites don't disappear, and some movements are glitchy.
Does anyone have a clue? I've been on this issue for about 10 hours, and normally there'd be a way to trace the error, but I couldn't find one. Disassembling Mario Bros was an idea, but the code is so unclear to me that it's practically useless.
ArsonIzer wrote:
So I recently got my emulator to draw stuff (yay)
Congrats. That's a big and satisfying step, for sure.
ArsonIzer wrote:
...
Have you checked your emulator's SBC implementation? I had strange movement bugs in SMB [I know...not the original] when my SBC had a weird bug. But it was a ridiculously long time ago and I've slept since then, so I can't remember exactly what it was, although I did have trouble with the overflow flag. I also recall that various other games worked seemingly flawlessly even with my SBC bug.
Something about always moving in a positive direction makes me think that something's wrong with your SBC.
Now that you have controls, you can run individual tests in nestest. Do all official instructions pass?
You don't need to implement SBC as a separate opcode. It's the same as ADC, except the memory operand is EOR'd with #$FF first. If it's not in SBC, it might be in CMP.
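To make that concrete, here is a minimal sketch of ADC, SBC-as-ADC and CMP, assuming a hypothetical `Cpu` struct that keeps each flag as a 0/1 int. Decimal mode is left out since the NES CPU doesn't have it. CMP sets C, Z and N the way a subtraction with carry preset would, but leaves A and V alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state for illustration; the field names are assumptions. */
typedef struct {
    uint8_t a;
    int c, z, v, n;   /* carry, zero, overflow, negative (each 0 or 1) */
} Cpu;

/* ADC: A = A + m + C, binary mode only (as on the NES). */
static void adc(Cpu *cpu, uint8_t m) {
    unsigned sum = cpu->a + m + (unsigned)cpu->c;
    uint8_t result = (uint8_t)sum;
    cpu->c = sum > 0xFF;
    /* Overflow: both inputs share a sign and the result's sign differs from it. */
    cpu->v = ((cpu->a ^ result) & (m ^ result) & 0x80) != 0;
    cpu->a = result;
    cpu->z = result == 0;
    cpu->n = (result & 0x80) != 0;
}

/* SBC is ADC with the operand EOR'd with $FF; carry acts as "no borrow". */
static void sbc(Cpu *cpu, uint8_t m) {
    adc(cpu, (uint8_t)(m ^ 0xFF));
}

/* CMP: compare A with m, setting C, Z and N; A and V are untouched. */
static void cmp(Cpu *cpu, uint8_t m) {
    uint8_t diff = (uint8_t)(cpu->a - m);
    cpu->c = cpu->a >= m;
    cpu->z = cpu->a == m;
    cpu->n = (diff & 0x80) != 0;
}
```

Because SBC reuses ADC, any carry or overflow bug shows up in both, which is one reason to test the two side by side.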
ArsonIzer wrote:
Arkanoid (writes to $4025 but never reads from it for some reason, and I haven't implemented expansion ROM, so I have a hacky "pseudo-write" in there just to silence the emulator from giving errors, but it works perfectly fine for the 3 minutes I've tested it)
Arkanoid seems to be an FDS port, or something; by elimination, those writes to $4025 have to be to the FDS motor control register. You don't need to implement it; the cartridge couldn't coexist with the FDS.
Quote:
and Galaxian (doesn't work well, shows too many enemy sprites, and destroyed enemy sprites don't disappear. Also some movements are glitchy).
Galaxian requires correct emulation of mid-screen X scroll splits, but nothing fancier.
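A minimal sketch of the register side of an X scroll split, following the v/t/x/w description on the NESdev wiki's PPU scrolling page: $2005 writes only update t (and fine X), and t's horizontal bits are copied into v at dot 257 of each rendering scanline, so a mid-frame write takes effect from the next scanline. $2000 writes (nametable select into t) and the $2002 read that resets w are left out; `Scroll`, `write_2005` and `copy_horizontal` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PPU scroll state, using the NESdev wiki's v/t/x/w names. */
typedef struct {
    uint16_t v;   /* current VRAM address (15 bits) */
    uint16_t t;   /* temporary VRAM address */
    uint8_t  x;   /* fine X scroll (3 bits) */
    int      w;   /* first/second write toggle */
} Scroll;

/* CPU write to $2005 (PPUSCROLL). */
static void write_2005(Scroll *s, uint8_t value) {
    if (!s->w) {
        /* First write: coarse X into t bits 0-4, fine X into x. */
        s->t = (uint16_t)((s->t & ~0x001F) | (value >> 3));
        s->x = value & 0x07;
    } else {
        /* Second write: fine Y into t bits 12-14, coarse Y into t bits 5-9. */
        s->t = (uint16_t)((s->t & ~0x73E0) | ((value & 0x07) << 12) | ((value & 0xF8) << 2));
    }
    s->w = !s->w;
}

/* Dot 257 of each rendering scanline: copy coarse X and the horizontal
 * nametable bit from t into v. */
static void copy_horizontal(Scroll *s) {
    s->v = (uint16_t)((s->v & ~0x041F) | (s->t & 0x041F));
}
```

If an emulator only picks up scroll values once per frame, a game that changes X scroll mid-screen will draw the whole frame with a single scroll value.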
tepples wrote:
Now that you have controls, you can run individual tests in nestest. Do all official instructions pass?
You don't need to implement SBC as a separate opcode. It's the same as ADC, except the memory operand is EOR'd with #$FF first. If it's not in SBC, it might be in CMP.
All official instructions pass using NESTest. I just press start and they all show OK. Does that still leave SBC and CMP as suspects?
EDIT: Also, disassembling any game shows that CMP and SBC are being used GENEROUSLY to say the least. I can't imagine every other game working perfectly, using like a hundred of those instructions, and Mario Bros being the one which fails.
ArsonIzer wrote:
EDIT: Also, disassembling any game shows that CMP and SBC are being used GENEROUSLY to say the least. I can't imagine every other game working perfectly, using like a hundred of those instructions, and Mario Bros being the one which fails.
It depends on how they use the SBC result; for example, whether they're merely looking for the oVerflow bit being set after an SBC. I don't remember nestest well and I'm not in front of my home computer, so I can't check it out, but it's possible that you pass nestest while nestest doesn't test every possible SBC result/flag variation. Have you tried other opcode test ROMs, like this one or this one?
cpow wrote:
...
I did attempt other tests, but I currently don't support mapper 1, which seems to be a prerequisite for some. Other than that, testing the individual Blargg ROMs reveals that, if I remember correctly, everything works perfectly except the BRK instruction (which seems to make the program take a wrong turn at some point). Games don't seem to mind (I think), but I'm not sure how the I/B/U flags are supposed to behave: is the U flag always supposed to be 1, as some sources claim, or always 0, which FCEUX often shows and which seems to pass the BRK test? Other than that, everything works fine as far as I know.
Proper handling of flags on the stack is essential for Galaxian.
There is no "U flag" that I know of, but bit 5 of P is always pushed as 1. PHP and BRK always push B (bit 4 of P) as 1, and /IRQ and /NMI push it as 0. PLP and RTI ignore bits 5 and 4.
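Put as code, a minimal sketch of the byte that gets pushed, assuming P is kept internally as a byte whose bits 4 and 5 are ignored (`status_to_push` is a made-up helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Flag masks in the status byte as it appears on the stack. */
enum {
    FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_I = 0x04, FLAG_D = 0x08,
    FLAG_B = 0x10, /* only exists in the pushed copy */
    FLAG_5 = 0x20, /* always pushed as 1 */
    FLAG_V = 0x40, FLAG_N = 0x80
};

/* Byte to push for P. from_instruction is nonzero for PHP and BRK,
 * zero for /IRQ and /NMI. Whatever bits 4 and 5 of p hold is ignored. */
static uint8_t status_to_push(uint8_t p, int from_instruction) {
    uint8_t byte = (uint8_t)((p & ~(FLAG_B | FLAG_5)) | FLAG_5);
    if (from_instruction)
        byte |= FLAG_B;
    return byte;
}
```

With this, PHP/BRK call `status_to_push(p, 1)` and the interrupt entry path calls `status_to_push(p, 0)`; P itself is never modified to produce the B bit.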
tepples wrote:
Proper handling of flags on the stack is essential for Galaxian.
There is no "U flag" that I know of, but bit 5 of P is always pushed as 1. PHP and BRK always push B (bit 4 of P) as 1, and /IRQ and /NMI push it as 0. PLP and RTI ignore bits 5 and 4.
So do PHP and BRK also set bit 4 to 1 in P itself, or do they just push it as 1 and let the P register keep its original value? In my source, BRK always pushes bit 4 as 1 but leaves bit 4 of the current P unchanged. I thought I read that somewhere, but I forgot where.
Also, if PLP and RTI ignore bits 5 and 4, does it mean that bits 5 and 4 are both set to 0 when pulled, or is bit 5 set to 1 and bit 4 set to 0?
PS: Oh, forgot, by U I mean the 'Unused' flag, which is indeed bit 5 in the P register.
Galaxian does this in its NMI handler:
Code:
TSX ; S -> X
CPX #<stack ; check if stack overflow
BCC Error ; carry = S>=$30
LDA $0100+4,x ; check 4 bytes above TOS i.e. flags
AND #$18 ; check if BRK or DECimal was set
BNE Error
LDA $0100+6,x ; check 6 bytes above TOS i.e. MSB of return addr
CMP #>Reset ; see where the NMI was called from
BCC Error ; carry = NMI called from 0xE000-0xFFFF
Correct handling of the $20s bit isn't necessary for Galaxian.
Anyway: the B bit doesn't exist. The P register is 6 bits in size. Nothing "sets" the B bit in P because it doesn't exist. There's nothing to keep track of because it doesn't exist.
Your NMI and IRQ calls should always push the value of P on the stack with the $10s bit clear, and PHP and BRK should always push the value on the stack with the $10s bit set.
PLP and RTI shouldn't do anything to the $10s and $20s bits because they don't exist.
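To see why the pushed $10s bit matters for Galaxian, here is a sketch that replays the quoted check against an emulated stack. It assumes the handler pushed three bytes (for example A, X and Y) before TSX, which is what the +4 and +6 offsets imply, and that Reset is on page $E0; `Stack`, `push` and `galaxian_style_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of page $0100-$01FF and the stack pointer. */
typedef struct {
    uint8_t stack[256];
    uint8_t s;
} Stack;

/* Write to $0100+S, then decrement S (wrapping within the page). */
static void push(Stack *st, uint8_t value) {
    st->stack[st->s] = value;
    st->s--;
}

/* Mirrors the quoted NMI check. Returns 1 where the real code would branch to Error. */
static int galaxian_style_check(const Stack *st, uint8_t stack_limit, uint8_t reset_page) {
    uint8_t x = st->s;                                      /* TSX */
    if (x < stack_limit) return 1;                          /* CPX #<stack / BCC Error */
    if (st->stack[(uint8_t)(x + 4)] & 0x18) return 1;       /* pushed P: B or D set */
    if (st->stack[(uint8_t)(x + 6)] < reset_page) return 1; /* return address high byte */
    return 0;
}
```

If the emulator's NMI pushes P with the $10s bit set, the second test fails and the game goes to its error path even though the CPU core is otherwise correct.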
lidnariq wrote:
...
I see, thanks for the explanation. It seems that I am handling those bits correctly. It's weird that my program fails Blargg's individual BRK test ROM, though. There might be a tiny bug I haven't noticed somewhere that sends it down a branch it shouldn't take, since it ends up jumping to $03A0, which shouldn't happen. I'll try fixing that tomorrow, and if I'm lucky I'll kill two birds with one stone and fix the glitchy Mario Bros behavior too. I doubt it, but who knows. If anyone has any other suggestions, please keep them coming.
Got bad news for 'ya. nestest ain't perfect.
viewtopic.php?f=5&t=9893&hilit=nestest
WedNESday wrote:
...
Figured as much. Thing is, my BRK function was indeed broken: it incremented the PC after jumping to the address at the BRK/IRQ vector, which made my emulator execute a wrong instruction at some point and throw an exception (or in non-Java terms: give an error). Now Blargg's 14-brk.nes test fails instead of crashing my emulator (which is good-ish), but I have no clue what the new problem could be. Are there any other tests or things I could do that might pinpoint the issue, or at least get me a bit closer? This is what my BRK and PHP instructions do:
BRK:
- Pushes high byte of PC value to stack
- Pushes low byte of PC value to stack
- Sets Interrupt (I) bit in status register
- Temporarily sets Break (B) bit in status register
- Pushes status register
- Sets the PC value to whatever's at the BRK/IRQ vector
- Sets Break (B) bit in status register back to its original value before the temporary change
PHP:
- Temporarily sets Break (B) bit in status register
- Pushes status register
- Sets Break (B) bit in status register back to its original value before the temporary change
Any remarks on what might be wrong?
Firstly, I've never encountered a game that uses BRK myself, so I can assure you no NROM games rely on it. The B bit doesn't exist in the 6502's P register; it only exists in the value pushed onto the stack by BRK and PHP.
BRK cycles:
1. Read Opcode, Increment PC
2. Read next byte (do nothing with it), Increment PC
3. Push PC High, Decrement S
4. Push PC Low, Decrement S
5. Push P with B bit set, Decrement S
6. Read BRK/IRQ vector low ($FFFE) into temp
7. Read BRK/IRQ vector high ($FFFF) into temp, copy temp to PC, set I
PHP cycles:
1. Read Opcode, Increment PC
2. Read next byte (do nothing with it)
3. Push P with B bit set, Decrement S
http://nesdev.com/6502_cpu.txt (if you didn't already know about it)
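A minimal sketch of BRK's net effect, assuming a hypothetical flat 64 KiB `Machine` struct with PC pointing at the BRK opcode. It pushes PC+2 (the opcode and the padding byte are skipped), pushes P with bits 4 and 5 set, sets I only after the push, and builds the new PC from $FFFE (low) and $FFFF (high), as in 6502_cpu.txt. Cycle timing is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state for illustration only. */
typedef struct {
    uint8_t  mem[0x10000];
    uint16_t pc;
    uint8_t  s;
    uint8_t  p;   /* six real flags; bits 4 and 5 are not meaningful here */
} Machine;

static void push8(Machine *m, uint8_t v) {
    m->mem[0x0100 | m->s] = v;
    m->s--;
}

static void brk(Machine *m) {
    uint16_t ret = (uint16_t)(m->pc + 2);      /* skip opcode and padding byte */
    push8(m, (uint8_t)(ret >> 8));             /* PCH */
    push8(m, (uint8_t)(ret & 0xFF));           /* PCL */
    push8(m, (uint8_t)(m->p | 0x30));          /* pushed copy has bits 4 and 5 set */
    m->p |= 0x04;                              /* set I after pushing, not before */
    m->pc = (uint16_t)(m->mem[0xFFFE] | (m->mem[0xFFFF] << 8)); /* low, then high */
}
```

RTI in the handler then returns to PC+2, i.e. the instruction after the padding byte.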
WedNESday wrote:
...
So what you're saying is that when P is pushed with the B bit set, the actual P register never changes, and when you use RTI or PLP, the pulled byte's 4th and 5th bits are ignored? That I somewhat understand, but even though the P register's 4th and 5th bits "don't exist", they're still supposed to have a value after RTI, after PLP and at the initial start of the emulator, no? So from what I've understood, the nonexistent 5th bit is ALWAYS a 1 (probably as a convention), and the nonexistent 6th bit can be anything because it's not supposed to be used anyway?
Also, when disassembling games, there are BRK instructions scattered throughout the code, although those are very likely just filler zeroes. Does no NROM game really rely on it? Then I suppose I'm semi-wasting my time trying to fix it, since my emu goal for now is to just run NROM games perfectly before I start implementing more complex mappers... and now I'm still stuck with my Mario Bros issue ._.
ArsonIzer wrote:
my emu goal for now is to just run NROM games perfectly before I start implementing more complex mappers
That goal can get very complicated very fast. A relatively simple UNROM game might prove easier to emulate than an NROM game that relies on obscure PPU or APU behaviors. But in general, you're right that games made later in the NES's life are more likely to rely on these behaviors.
I can't think of a single game that uses BRK to be honest. Maybe there are some out there. You are right, most of those 00s are simply filler.
Bit 5 (0x20, $20, 00100000) is always set to 1, so the value that is pushed onto the stack always has this bit set. That's the only place the bit is known to exist, because no opcode uses it.
Bit 4 (0x10, $10, 00010000) doesn't exist at all. When BRK/PHP push P, the value on the stack has it set. As above, that's the only place it is known to exist, because no opcode relies on it.
Since these bits don't exist, PLP/RTI simply ignore them when updating the P register. So PLP and RTI have bit 5 'set' afterwards and bit 4 'clear' afterwards.
http://slack.net/~ant/nes-emu/6502.html
Please read blargg's guide on how to handle the 6502's P register. Not only is it faster and simpler, it'll put your concerns to rest.
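For the pull side, a minimal sketch assuming P is kept as a single byte (blargg's guide describes its own approach, and this is not necessarily it). PLP and RTI drop the pulled byte's bits 4 and 5 and store bit 5 as 1 and bit 4 as 0, matching the description above; PHP pushes with both set. `status_from_pull` and `php_byte` are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

/* P after PLP or RTI: bits 7, 6, 3, 2, 1, 0 come from the pulled byte;
 * bit 5 is stored as 1 and bit 4 as 0 regardless of what was pulled. */
static uint8_t status_from_pull(uint8_t pulled) {
    return (uint8_t)((pulled & 0xCF) | 0x20);
}

/* PHP: the pushed copy has bits 4 and 5 set; P itself is unchanged. */
static uint8_t php_byte(uint8_t p) {
    return (uint8_t)(p | 0x30);
}
```

A PHP followed by a PLP leaves bits 7, 6, 3, 2, 1 and 0 unchanged, which is the property blargg's tests describe as "PHP and PLP should preserve bits 7,6,3,2,1,0".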
tepples wrote:
...
Super Mario Bros. springs instantly to mind...
http://wiki.nesdev.com/w/index.php/Tric ... late_games
Read up on the section about Super Mario Bros.
WedNESday wrote:
...
Yup, I know. It shows the title screen just fine in my emu, but it freezes right there.
To get back to the PHP/BRK instructions, apparently the 01-basics.nes Blargg test ROM fails on those. It says "PHP and PLP should preserve bits 7,6,3,2,1,0." and the fail message is "Failed #4". Let's see if fixing that helps.
ArsonIzer wrote:
...
SMB uses sprite 0 hit detection not only for its scroll split but also to get the game running. Here is a hack to get it working real fast.
Code:
if (Scanline == 30)
PPUStatus |= 0x40;
else
PPUStatus &= 0xBF;
WedNESday wrote:
Code:
if (Scanline == 30)
PPUStatus |= 0x40;
else
PPUStatus |= 0xBF;
I assume you meant "PPUStatus &= 0xBF;" (or "PPUStatus &= ~0x40;") for that last line.
Also, it probably ought to trigger if the scanline is greater than or equal to 30.
Quietust wrote:
...
Whoops! Edited.........
30 works for me perfectly.
WedNESday wrote:
30 works for me perfectly.
The point is that the sprite 0 hit flag stays set for the remainder of the frame, not just for the duration of that one scanline, thus a greater-or-equal check.
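A simplified sketch of that lifetime, assuming the renderer already reports when an opaque sprite 0 pixel overlaps an opaque background pixel (real hit detection has more conditions, such as left-edge clipping and x = 255). The flag is set on a hit during the visible scanlines, stays set, and is cleared only at dot 1 of the pre-render scanline (261 on NTSC) together with vblank and sprite overflow. `update_status` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_OVERFLOW     0x20
#define STATUS_SPRITE0_HIT  0x40
#define STATUS_VBLANK       0x80
#define PRERENDER_LINE      261   /* NTSC: 0-239 visible, 261 pre-render */

/* Called once per PPU dot with a hypothetical "sprite 0 overlap" signal. */
static void update_status(uint8_t *status, int scanline, int dot, int sprite0_opaque_overlap) {
    if (scanline == PRERENDER_LINE && dot == 1) {
        /* The only place these flags are cleared by the PPU itself. */
        *status &= (uint8_t)~(STATUS_SPRITE0_HIT | STATUS_VBLANK | STATUS_OVERFLOW);
    } else if (scanline < 240 && sprite0_opaque_overlap) {
        /* Once set, the hit flag stays set for the rest of the frame. */
        *status |= STATUS_SPRITE0_HIT;
    }
}
```

A game polling $2002 after the hit therefore keeps seeing bit 6 set no matter which later scanline it reads on.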
WedNESday wrote:
...
Hahahaha, that actually works, nice. Turns out my horizontal scrolling is beyond messed up, but this Mario moves just fine! Then again, scrolling is so messed up that the background switches to areas of the nametable where it shouldn't be, and at some glitchy point Mario just starts blinking like he just ingested one of those invincibility thingies, after which he starts falling through the screen over and over until the game resets the level.
Thanks for the hack, I think I at least have SOMETHING to fix for now.
And here are Quietust and I arguing over a hack. Shame on us!
Delete it now please. NEVER introduce hacks into an emulator. Ever. It's such bad practice.
WedNESday wrote:
...
Of course I wouldn't actually use it. I want an emulator, not a crappy compilation of bad hacks. It was just for testing purposes, and well... there's still a lot to test, I suppose.
Adding a hack while testing other parts of a program with intent to remove it soon, as in this case, is called the mock object pattern.
tepples wrote:
...
I've heard similar things, but doesn't that only apply to complete constructions of objects rather than 5 random lines of code? I.e. a quick print line inside an if-statement that you use for one test run, versus mimicking the behavior of a hard drive so as not to crash your own, which you will probably use extensively for unit tests and whatnot.
I wouldn't exactly say that the former is a Mock Object.
IMHO you've a large CPU bug somewhere. Any chance of some source code?
WedNESday wrote:
IMHO you've a large CPU bug somewhere. Any chance of some source code?
I fixed the BRK bug. This is what was wrong:
- I set the I (Interrupt) flag BEFORE pushing the P register onto the stack, rather than after
- I pushed the BRK address + 1 instead of the BRK address + 2 onto the stack
Now it passes the BRK and the Special test, but it seems that there's something wrong with my implied addressing. Now fixing that.
About the source: that's not really necessary now, although there seems to be something wrong with my implied addressing (brb fixing). If you're talking about the Mario Bros bug, then I'd prefer not to share large chunks of my source right now, simply because it's kind of questionably written in my eyes (I'm not a neat person when I'm trying to get things done quickly).
EDIT:
Never mind, it seems the Implied test ROM uses unofficial instructions (at least $1A, which is an unofficial NOP). Just incrementing the PC by 1 every time it encounters an unknown instruction lets my emu pass the implied test.
Increasing PC by 1 won't help you with Puzznic though.
Dwedit wrote:
Increasing PC by 1 won't help you with Puzznic though.
No intention of emulating that just yet
Incrementing the PC by 1 for every unofficial opcode is like doing the same for unsupported legal opcodes; it normally spells immediate death.
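For reference, a sketch listing the NMOS 6502's unofficial NOPs and their total lengths. Only the six implied ones ($1A, $3A, $5A, $7A, $DA, $FA) are 1 byte long, so a blanket PC+1 misdecodes every other form. The zero page and absolute forms also perform a read, which can matter if the address is a register with read side effects; this sketch only returns lengths. `unofficial_nop_length` is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction length in bytes (opcode included) of an unofficial NOP,
 * or 0 if the opcode is not one of them. */
static int unofficial_nop_length(uint8_t opcode) {
    switch (opcode) {
    case 0x1A: case 0x3A: case 0x5A: case 0x7A: case 0xDA: case 0xFA:
        return 1;                                   /* implied */
    case 0x80: case 0x82: case 0x89: case 0xC2: case 0xE2:
        return 2;                                   /* immediate */
    case 0x04: case 0x44: case 0x64:
        return 2;                                   /* zero page */
    case 0x14: case 0x34: case 0x54: case 0x74: case 0xD4: case 0xF4:
        return 2;                                   /* zero page,X */
    case 0x0C:
        return 3;                                   /* absolute */
    case 0x1C: case 0x3C: case 0x5C: case 0x7C: case 0xDC: case 0xFC:
        return 3;                                   /* absolute,X */
    default:
        return 0;                                   /* not an unofficial NOP */
    }
}
```

The official NOP ($EA) is deliberately not in this table.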
WedNESday wrote:
Incrementing the PC by 1 for every unofficial opcode is like doing the same for unsupported legal opcodes; it normally spells immediate death.
Except that the unofficial opcodes used in the implied Blargg test ROM were unofficial NOPs, so it makes no difference, since PC+1 does exactly what they do. I did notice that it messed up the rest of the tests that use unofficial opcodes, but that's irrelevant for now, since I'm going straight for basic emulation rather than paying attention to all the "less important" side artifacts.
BTW: I fixed the SUPER Mario Bros horizontal scrolling issue (still using the hack to test scrolling, of course), which turned out to be very simple. I have a for-loop that fetches 33 horizontal tiles per scanline, and instead of reading the nametable bits (loopyV & 0xC00) on every iteration, it read them only once per scanline. So when the fetch crossed into the next nametable, it kept using the current one instead of the one to its "right", causing the nametable to wrap around instead of scrolling into the next. I simply moved the line that reads the nametable into the loop.
Wrong:
Code:
int currentNameTable = loopyV & 0xC00;
for(int tileX = 0; tileX < 33; tileX++) {
//Proceed to fetch tile data from incorrect nametable
}
Right:
Code:
for(int tileX = 0; tileX < 33; tileX++) {
int currentNameTable = loopyV & 0xC00;
//Proceed to fetch tile data from correct nametable
}
I highly doubt anyone would make such a simple mistake and not realize it immediately, but just in case, there you go ^.
Now my original Mario Bros problem still remains.
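The reason the nametable has to be re-read per tile is the coarse X increment: when coarse X wraps from 31 to 0, bit 10 of v (the horizontal nametable select) flips. A minimal sketch following the pseudocode on the NESdev wiki's PPU scrolling page; the function names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Advance v to the next tile horizontally. */
static uint16_t increment_coarse_x(uint16_t v) {
    if ((v & 0x001F) == 31) {
        v &= (uint16_t)~0x001F;   /* coarse X back to 0 */
        v ^= 0x0400;              /* switch to the horizontally adjacent nametable */
    } else {
        v += 1;                   /* next tile in the same nametable */
    }
    return v;
}

/* Nametable byte address for the tile v currently points at. */
static uint16_t nametable_address(uint16_t v) {
    return (uint16_t)(0x2000 | (v & 0x0FFF));
}
```

Because the nametable bits can change partway through a scanline, caching `loopyV & 0xC00` before the loop reads from the wrong nametable for every tile after the wrap.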
You appear to be going for illegal opcodes and full PPU emulation when you clearly still have issues with legal opcodes.
I know that it is a pain to have to go back and do so but it looks like you will have to test each legal opcode again (just 1 of each). I am currently writing my first Game Boy CPU emulator and I was at one point 50%+ complete only to delete everything and go back to the start.
Unless of course you posted your CPU code. Then we can help you a lot faster, possibly without a rewrite.
WedNESday wrote:
...
Maybe if it really gets out of hand and I can't solve it in any way, I'll upload my source, but I'd prefer not to for now. Also, if you re-read what I said, I said I DIDN'T want to worry about less important side artifacts (like unofficial opcodes) and would rather focus on getting the basics to work perfectly. I'm not planning on cycle/pixel-precise rendering yet when I can't even run one of the most basic games
To add to that, I'm going to rewrite the emulator once I have the basics running anyway. That was my plan from the start, and I've already rewritten it twice, simply because it's my first and I'm learning so many things along the way that a rewrite is much more viable and realistic (and helps refresh my memory of code I haven't looked at for a while) than correcting the issues. When I look at the first version of my emu, it almost makes me vomit (so to speak).
Cool. In that case you can now do 2 things; either retest each opcode or go for a total rewrite now. I favour the first.
WedNESday wrote:
Cool. In that case you can now do 2 things; either retest each opcode or go for a total rewrite now. I favour the first.
Problem is, it could be a multitude of things. Tracing my emulator's log next to FCEUX's debugger used to help, but this time there's way too much code to go through and I have no idea where the bug could be. Other than that, I tried disassembling the game and looking at the NMI handler and the places where it reads controller input, but there are so many jumps and subroutines that it confused me. I bet it's something ridiculously simple that would make me facepalm the second I read it. I guess I'll write some unit tests for each opcode that could be relevant.
You can PM me your code if you don't want it public.
WedNESday wrote:
You can PM me your code if you don't want it public.
No, that is not the issue at all. I just don't want to rely on someone else reviewing my code if it's not absolutely necessary. It's pretty basic 6502 interpretation code, nothing special whatsoever. I have yet to see a bug that I didn't manage to fix within a week. Like I said, once it gets really out of hand, I will definitely request more help (and hopefully people like you will be nice enough to be eager to help me)
ArsonIzer wrote:
I have yet to see a bug that I didn't manage to fix within a week.
Ouch!
Testing with games is fine. But trying to figure out what's wrong with your emulator relying solely on bad game behavior is futile--as you're discovering. Go back to the plethora of CPU test ROMs at your disposal. Get your CPU working perfectly for all of the official opcodes.
I've already pointed you to a couple of CPU test ROM bundles. There are more in my GitHub collection. I know most of those are on the Wiki... I just wanted to keep them all in one place so I could run automated tests on my emulator.
EDIT: And...I can't get to the Wiki from work.
WedNESday wrote:
ArsonIzer wrote:
I have yet to see a bug that I didn't manage to fix within a week.
Ouch!
Ouch? What do you mean? Am I going to bite the dust?
cpow wrote:
...
That is one wide variety of test ROMS... I'll try them tomorrow (it's past midnight in my country). Thanks for the easy access to the bunch.
PS: Your boss must be proud that you're such a hard worker and only concentrate on your job rather than going to forums to help strangers
A week to fix a CPU bug seems like a long time that's all. To be fair, when you have a perfectly working CPU and you understand what you didn't before then you will look back and say to yourself 'Whoah, it seems so obvious now!'.
P.S. I'm unemployed... and loving it!
WedNESday wrote:
A week to fix a CPU bug seems like a long time that's all. To be fair, when you have a perfectly working CPU and you understand what you didn't before then you will look back and say to yourself 'Whoah, it seems so obvious now!'.
It took me about a month to code my CPU (badly) and another month to get it working to the point where its log would be similar to that of NESTest's (and I'm talking 5+ hours a day). Of course, after that I noticed that NESTest does by no means test everything, and it was INCREDIBLY frustrating, but when I look back at it, it's like "DAMN, how could I have missed that? That was so obvious!". I hope the same thing will apply to the APU... that stuff seems HARD AS HELL.
Of course (as I've said many times before) this is my first emulator, and I could not tell you what a register or an instruction was before I started this project. Hell, I didn't even know exactly what the purpose of the CPU was.
WedNESday wrote:
P.S. I'm unemployed... and loving it!
PS: I'm a college student. I'm studying Software Engineering, and it's pretty OK. The only problem with it is that I spend 80% of the time writing documents rather than doing actual programming
I shouldn't even have time to work on my emulator, let alone have a job. Good thing I'm not a big spender
ArsonIzer wrote:
The only problem with it is that I spend 80% of the time writing documents rather than doing actual programming
Then you won't want to get a job working in either the medical equipment or avionics equipment fields.
cpow wrote:
Then you won't want to get a job working in either the medical equipment or avionics equipment fields.
Who knows. It's not that I dislike the documentation, I just like the actual programming more (who doesn't?). On top of that, the stuff we have to document and the way of documenting isn't exactly in my view of "exciting", and even my teachers tend to agree. If I had to document the process of creating and optimizing an emulator, I'd be thrilled to, because at least this stuff is interesting. Heck, maybe one day when I'm actually good at this, I'll have some free time on my hands and I'll do it voluntarily (for some reason I have yet to figure out).
By the way, is the majority of people here more of a hobby programmer, or do most people do it for a living?
ArsonIzer wrote:
By the way, is the majority of people here more of a hobby programmer, or do most people do it for a living?
Most of us have answered that here.
You guys sound awesome at what you do, and I'm here whining about writing documentation xD I can't even read a PCB design or a datasheet to save my life. Nice to know what the general population of this forum does, thanks captain pow
ArsonIzer wrote:
You guys sound awesome at what you do, and I'm here wining about writing documentation xD I can't even read a PCB design or a datasheet to save my life. Nice to know what the general population of this forum does, thanks captain pow
Haha I am in no way an expert here. But I have traveled the path you are traveling. The key to reading PCB designs or datasheets is to know what it is you need to get out of the thing. I don't know any engineer [besides the author, perhaps] that knows everything about any particular device because they read the datasheet from front to back. Rather, you use datasheets as a reference...getting a pinout diagram when you need it for board bring-up testing...getting a register description or two when you need it for driver development...getting signalling characteristics when you need it for scoping/debugging...getting contact information when you need it for FAE support...etc. Similarly with PCB design...you don't need to digest the whole thing at once [unless you're the author or an initial reviewer]. Every project I work on I deal with devices individually as much as I can and usually those devices are constrained to one or two pages of the schematic and a few off-page connections to other devices. I don't worry about irrelevant pages of the schematic until I need to. But I do insist that the HW designer provide me with all relevant datasheets for the board...to save the headaches of using the wrong document. They always have the datasheets handy because they had to get them to do the PCB design...so it always works out.
cpow wrote:
Haha I am in no way an expert here. But I have traveled the path you are traveling.
If you are no expert, what does that make me D: ? Of course I'm kidding, but if you've traveled my path, then I hope I'll one day reach your current level of knowledge.
cpow wrote:
The key to reading PCB designs or datasheets is to know what it is you need to get out of the thing. I don't know any engineer [besides the author, perhaps] that knows everything about any particular device because they read the datasheet from front to back. Rather, you use datasheets as a reference...getting a pinout diagram when you need it for board bring-up testing...getting a register description or two when you need it for driver development...getting signalling characteristics when you need it for scoping/debugging...getting contact information when you need it for FAE support...etc. Similarly with PCB design...you don't need to digest the whole thing at once [unless you're the author or an initial reviewer]. Every project I work on I deal with devices individually as much as I can and usually those devices are constrained to one or two pages of the schematic and a few off-page connections to other devices. I don't worry about irrelevant pages of the schematic until I need to. But I do insist that the HW designer provide me with all relevant datasheets for the board...to save the headaches of using the wrong document. They always have the datasheets handy because they had to get them to do the PCB design...so it always works out.
I have no clue what you just said, and yet it's strangely interesting. To be honest, I was never interested in the low-level aspects of a computer, but I was always fascinated by emulators (playing GBA and NDS games on them with my little brother when those were a luxury to us), so when I started understanding how emulators work, I became eager to learn more. Now I wish to be able to read that stuff and be like "oh, now I understand why this piece of hardware does that!". I remember seeing a 6502 die shot (I think that's what it was called) when I was starting to learn about CPUs in general and someone was talking about how it proved that there was no decimal mode and I was just thinking: "HOW THE **** CAN YOU SEE THAT? WHAT THE ****? IT'S JUST A BUNCH OF LINES!". Haha anyway, in a few years I hope to be able to look back at this and laugh. My priority of finishing this emulator remains, though.
I'll plug my (with help from lidnariq and Quietust and others) Visual * circuit reading tutorial in case you haven't seen it:
http://wiki.nesdev.com/w/index.php/Visu ... t_tutorial . I can read digital (NMOS) circuits decently now, but am still pretty lost when it comes to many analog aspects of circuits.
ulfalizer wrote:
I'll plug my (with help from lidnariq and Quietust and others) Visual * circuit reading tutorial in case you haven't seen it:
http://wiki.nesdev.com/w/index.php/Visu ... t_tutorial . I can read digital (NMOS) circuits decently now, but am still pretty lost when it comes to many analog aspects of circuits.
I am the same way. Analog = voodoo. I understand pull-up, pull-down, and even to some degree filtering caps, transistors, and diodes. But beyond that, when I see a mesh of Rs and Cs and Ls and other crap I don't yet have the "oh yeah that's a bandpass filter at 1KHz" intuition.
I have been wanting to go back and re-learn the analog stuff I should have paid more attention to in college for many many years.
cpow wrote:
I am the same way. Analog = voodoo. I understand pull-up, pull-down, and even to some degree filtering caps, transistors, and diodes. But beyond that, when I see a mesh of Rs and Cs and Ls and other crap I don't yet have the "oh yeah that's a bandpass filter at 1KHz" intuition.
I have been wanting to go back and re-learn the analog stuff I should have paid more attention to in college for many many years.
Yeah, never got very deep into it. I find it hard to master stuff unless I'm immediately applying it in some way.
WedNESday wrote:
I can't think of a single game that uses BRK to be honest. Maybe there are some out there. You are right, most of those 00s are simply filler.
I'm pretty certain there are games that do. I want to say one of the Dragon Warrior games uses it. But I don't recall for sure.
MottZilla wrote:
WedNESday wrote:
I can't think of a single game that uses BRK to be honest. Maybe there are some out there. You are right, most of those 00s are simply filler.
I'm pretty certain there are games that do. I want to say one of the Dragon Warrior games uses it. But I don't recall for sure.
Now that you mention Dragon Warrior, maybe you're right.
Effectively, it is never used.
Dragon Warrior 4:
Code:
0F:C968:20 83 C9 JSR $C983
0F:C96B:20 2F C5 JSR $C52F
0F:C96E:00 BRK
For a moment, I thought about making C10H15Noid, an Arkanoid clone where the IRQ handler makes up part of the subroutine that handles removal of a block from the field. If the emulator breaks BRK, it'll break breaking bricks. This way the player can see that the emulator is BRKing bad.
But then I realized that the 6502 itself breaks BRK when NMI happens at the same time, and that's probably why games don't use it.
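For reference, a minimal Java sketch of what BRK does on the 6502: it pushes the address of the opcode plus 2 (BRK skips a padding byte), then P with the B bit set in the pushed copy, sets I, and jumps through the $FFFE/$FFFF vector. The class and field names are hypothetical, the flat 64 KB array stands in for a real bus, and the NMI hijack tepples mentions is not modeled.

```java
// Hypothetical minimal CPU state for illustrating BRK; not from any emulator in this thread.
class BrkCpu {
    final int[] mem = new int[0x10000]; // flat 64 KB address space, sketch only
    int pc, sp = 0xFD, p = 0x24;        // stack pointer and status register

    static final int FLAG_I = 0x04, FLAG_B = 0x10, FLAG_U = 0x20;

    int read(int addr) { return mem[addr & 0xFFFF]; }
    void write(int addr, int v) { mem[addr & 0xFFFF] = v & 0xFF; }
    void push(int v) { write(0x100 | sp, v); sp = (sp - 1) & 0xFF; }

    // Executes a BRK whose opcode byte sits at 'pc'.
    void brk() {
        int ret = (pc + 2) & 0xFFFF;             // return address skips the padding byte
        push(ret >> 8);                          // high byte first
        push(ret & 0xFF);
        push(p | FLAG_B | FLAG_U);               // B exists only in the pushed copy
        p |= FLAG_I;                             // mask further IRQs
        pc = read(0xFFFE) | (read(0xFFFF) << 8); // IRQ/BRK vector
    }
}
```

With the Dragon Warrior 4 BRK at $C96E, the handler would later return to $C970.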
I see what you did there, Tepples.
tepples wrote:
For a moment, I thought about making C10H15Noid, an Arkanoid clone where the IRQ handler makes up part of the subroutine that handles removal of a block from the field. If the emulator breaks BRK, it'll break breaking bricks. This way the player can see that the emulator is BRKing bad.
But then I realized that the 6502 itself breaks BRK when NMI happens at the same time, and that's probably why games don't use it.
Let me guess... Breaking Bad fan?
Isn't the BRK opcode executed every time Mario breaks a block in Super Mario Bros.?
Getting back on topic, I have a question. I'm still having the Mario Bros bug, and I was wondering: how likely is it that this bug is caused by the lack of dummy reads/writes? I've heard that some instructions should also perform those, but I haven't implemented them. My emulator passes NESTest (official opcodes) and NEStress perfectly (CPU-wise), so maybe it's related to the lack of dummy reads/writes? Anyway, I thought I should consider this possibility since I can't find a mistake in my CPU to save my life. Any thoughts?
PS: It seems that the green/red fireball-type sprite thingies that randomly appear on the screen don't have trouble moving to the left. It's just Mario and the "live" enemies.
ArsonIzer wrote:
Getting back on topic, I have a question. I'm still having the Mario Bros bug, and I was wondering: how likely is it that this bug is caused by the lack of dummy reads/writes? I've heard that some instructions should also perform those, but I haven't implemented them. My emulator passes NESTest (official opcodes) and NEStress perfectly (CPU-wise), so maybe it's related to the lack of dummy reads/writes? Anyway, I thought I should consider this possibility since I can't find a mistake in my CPU to save my life. Any thoughts?
PS: It seems that the green/red fireball-type sprite thingies that randomly appear on the screen don't have trouble moving to the left. It's just Mario and the "live" enemies.
The dummy reads/writes wouldn't cause a problem like that. My first CPU core wasn't cycle accurate and didn't have those problems.
Without some code or some footage of the problem on YouTube (could you upload it?) we aren't really gonna be able to help you anymore.
WedNESday wrote:
Without some code or some footage of the problem on YouTube (could you upload it?) we aren't really gonna be able to help you anymore.
I agree. Footage would help.
WedNESday wrote:
Without some code or some footage of the problem on YouTube (could you upload it?) we aren't really gonna be able to help you anymore.
I understand your point. I can't keep asking you to just fish in the dark (or whatever the saying is). I'll try to capture some footage of the emulator running tomorrow, and I'll upload the source of the CPU core then as well (just give me some time to clean up all the BS print lines, rename some variables and maybe add some comments).
Here you go:
http://codepad.org/kYweCFHV
It's the CPU core. I hope you guys can read some Java (I know it says C++ but it's Java).
I'll try to capture some footage and upload it to YouTube later today. I really appreciate that you guys are willing to look at this for me. Thanks
EDIT:
http://www.youtube.com/watch?v=6UWg_MZ_89A
^ Some footage of the emu showing what's going on when no key is pressed, left key is pressed, and right key is pressed. Also shows the turtles walking backwards.
WOW that's a big program for such a simple CPU.
I'll ignore telling you what I don't like about your style of programming (nothing disrespectful) and get straight down to the action. This is the only bug that I could see at a quick glance. It is probably causing the problems that you have mentioned.
Code:
public void checkOverflow(int... vals) {
    int res = (byte) vals[0];
    for (int i = 1; i < vals.length; i++) {
        res += (byte) vals[i];
    }
    if (res > 127) {
        p.setBit(FLAG_V, 1);
    } else {
        p.setBit(FLAG_V, 0);
    }
}
If A + Byte + C is greater than 127 >>OR<< less than -128 then the overflow flag is set, else it is cleared. You are only doing the first bit.
Oh, and you must convert the A and Byte to a signed value too. Observe WedNESday's ADC code. temp is int;
Code:
temp = (signed char)A + (signed char)DataBus + C; // signed char: plain char may be unsigned on some compilers
if (temp < -128 || temp > 127)
    V = 0x40;
else
    V = 0x00;
temp = A + DataBus + C;
N = Z = A += DataBus + C;
C = temp >> 8;
Edit: Above code works perfectly but is ANCIENT and has been superseded.
1. Never declare a variable inside of a switch/case.
2. Your Absolute X/Y addressing modes never check to see if an extra cycle is needed.
3. I'm no Java programmer but you seem to be using functions to set variable values. Is that.....normal in Java?
4. Going back to what I said about massive code;
Code:
/**
 * AND (bitwise AND with accumulator)
 */
public void and(int currValue) {
    int res = a.getVal() & currValue;
    checkZero(res);
    checkSign(res);
    a.setValue(res);
}
When compared to:
Code:
void AND()
{
    N = Z = A &= DataBus;
}
Edit:
http://slack.net/~ant/nes-emu/6502.html Please read what blargg says about the flags register.
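On point 2: for absolute,X and absolute,Y *read* instructions (LDA, ADC, CMP and friends), the 6502 takes one extra cycle only when adding the index carries into the high byte, i.e. when the effective address lands on a different page than the base address. Stores and read-modify-write instructions always take their fixed, longer count. A minimal Java sketch; `AbsIndexed`, `absIndexed` and the `cycles` counter are hypothetical names, not from the posted core.

```java
class AbsIndexed {
    long cycles; // hypothetical running cycle counter

    // Effective address for absolute,X / absolute,Y addressing.
    // Read instructions pay one extra cycle on a page crossing; pass isRead = false
    // for stores and read-modify-write instructions, whose cycle count is always fixed.
    int absIndexed(int base, int index, boolean isRead) {
        int addr = (base + index) & 0xFFFF;
        boolean pageCrossed = (base & 0xFF00) != (addr & 0xFF00);
        if (isRead && pageCrossed) {
            cycles++;
        }
        return addr;
    }
}
```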
WedNESday wrote:
WOW that's a big program for such a simple CPU.
I'll ignore telling you what I don't like about your style of programming (nothing disrespectful) and get straight down to the action. This is the only bug that I could see at a quick glance. It is probably causing the problems that you have mentioned.
Code:
public void checkOverflow(int... vals) {
    int res = (byte) vals[0];
    for (int i = 1; i < vals.length; i++) {
        res += (byte) vals[i];
    }
    if (res > 127) {
        p.setBit(FLAG_V, 1);
    } else {
        p.setBit(FLAG_V, 0);
    }
}
If A + Byte + C is greater than 127 >>OR<< less than -128 then the overflow flag is set, else it is cleared. You are only doing the first bit.
Doubtful. I also have a method called checkNOverflow which checks overflow in the case of subtraction, and does indeed check if it's less than -128, and it works. I think something that simple would be picked up by NESTest or NESTress, but it wasn't.
WedNESday wrote:
Oh, and you must convert the A and Byte to a signed value too. Observe WedNESday's ADC code. temp is int;
Code:
temp = (signed char)A + (signed char)DataBus + C; // signed char: plain char may be unsigned on some compilers
if (temp < -128 || temp > 127)
    V = 0x40;
else
    V = 0x00;
temp = A + DataBus + C;
N = Z = A += DataBus + C;
C = temp >> 8;
Edit: Above code works perfectly but is ANCIENT and has been superseded.
Again, that does happen in the check(N)Overflow method. Java's "byte" (which is lowercase, by the way; uppercase "Byte" I will explain shortly) is the only signed 8-bit primitive type in Java, and is therefore used when casting to signed 8-bit values, so no, that's not an issue. Your code is identical to mine in result.
But you don't use it in your ADC code. Both ADC and SBC need the code I posted above.
For instance, $80 + $FF = -128 + -1 = -129, which should set the V flag, but it doesn't in your code.
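To make the point concrete, here is a minimal Java sketch of ADC in binary mode (the NES CPU has no decimal mode) that applies the signed-range test with both bounds. Registers are plain `int`s kept in 0..255 and the flags are booleans; the class and field names are hypothetical, not from the posted core.

```java
class Adc {
    int a;                  // accumulator, always kept in 0..255
    boolean c, v, n, z;     // carry, overflow, negative, zero

    void adc(int m) {
        int carryIn = c ? 1 : 0;
        int unsignedSum = a + m + carryIn;             // 0..511
        int signedSum = (byte) a + (byte) m + carryIn; // -256..255
        c = unsignedSum > 0xFF;
        v = signedSum < -128 || signedSum > 127;       // both bounds matter
        a = unsignedSum & 0xFF;
        n = (a & 0x80) != 0;
        z = a == 0;
    }
}
```

With carry clear, `adc(0xFF)` on A = $80 leaves A = $7F with both C and V set, which is exactly the case a "> 127 only" check misses.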
WedNESday wrote:
1. Never declare a variable inside of a switch/case.
2. Your Absolute X/Y addressing modes never check to see if an extra cycle is needed.
3. I'm no Java programmer but you seem to be using functions to set variable values. Is that.....normal in Java?
4. Going back to what I said about massive code;
Code:
/**
 * AND (bitwise AND with accumulator)
 */
public void and(int currValue) {
    int res = a.getVal() & currValue;
    checkZero(res);
    checkSign(res);
    a.setValue(res);
}
When compared to:
Code:
void AND()
{
    N = Z = A &= DataBus;
}
Edit:
http://slack.net/~ant/nes-emu/6502.html Please read what blargg says about the flags register.
1. I did not know that. I didn't think it could harm anyone, and I'm not sure if it's considered bad practice in Java as well.
2. They do check in the increaseCycles method.
3. No. I'm using something called "wrappers". Byte is my custom-made wrapper, while byte is the Java primitive type for an 8-bit signed value. Using a wrapper like Byte enables me to use a byte with extra functionality and without having to go like this all the time: a & 0xFF or x & 0xFF. Since Java doesn't have an 8-bit unsigned type, I use a method in my custom-made wrapper which just does this:
Code:
public int getVal() {
    return val & 0xFF;
}
4. Yeah, again, using a Byte (which is an object) is different than using a byte (which is a primitive type). Doing what you did (N = Z = A &= DataBus) would basically screw my program into oblivion and beyond.
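For comparison, Java can get close to the compact style without wrapper objects: keep registers as plain `int`s masked to 0..255 and flags as booleans. A minimal sketch with hypothetical names; `Byte.toUnsignedInt` is the standard `java.lang.Byte` method (Java 8+) that does the same as `getVal()`. Note that a custom class named `Byte` in your own package shadows `java.lang.Byte`, so that call would then refer to your class.

```java
class CompactCpu {
    int a;          // accumulator, kept in 0..255
    boolean n, z;   // sign and zero flags

    // Sets N and Z from an 8-bit result and returns it, so it can be chained.
    int nz(int value) {
        n = (value & 0x80) != 0;
        z = value == 0;
        return value;
    }

    // AND: roughly the Java equivalent of "N = Z = A &= DataBus;"
    void and(int m) {
        a = nz(a & m);
    }

    // Reads a signed Java byte as an unsigned 0..255 value.
    static int unsigned(byte b) {
        return Byte.toUnsignedInt(b); // same as b & 0xFF
    }
}
```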
The reason my coding seems f'd up is because Java programmers generally work towards: clarity, readability, maintainability, etc. This is preferred much more over performance, and of course I would change it to make it faster/more compact/etc, but like I said, this is the first thing I've done this low level, and I still had to get used to Java's signed/unsigned crap which I also only started learning about after I had started the emulation process.
On my level of Java, performance and compactness of code is not yet that big of an issue. Other than those Java/convention-specific issues, not much seems to be wrong with my code, does it?
Maybe it's the fact that my CPU runs for an entire frame before actual sprites are drawn? I mean, couldn't that cause some kind of synchronization issues? I doubt it, but that might be something to look into, no? I quickly implemented Mapper 2 and Mapper 7, and while some games work fine (Metal Gear, Megaman), a lot of games mess up in big ways (Contra (keeps falling through the ground), Battletoads (completely f'd up, obviously), Marble Madness (same as Battletoads), Castlevania (idem), Ducktales (hits a faulty opcode), and some more). I suppose maybe I should implement a cycle-specific PPU and use the catch-up method to make them as synchronized as possible. Maybe that could help.
Any thoughts?
WedNESday wrote:
But you don't use it in your ADC code. Both ADC and SBC need the code I posted above.
For instance $80 + $FF = -129 which would set the V flag but not on your code.
But does addition also apply to the -128 rule? I thought addition checked > 127 for the overflow flag, and subtraction checked < -128 for the overflow flag. Does that mean I need to check both < -128 and > 127 for every instruction, or just ADC and SBC? The overflow flag did confuse me, but not this much O_O
ArsonIzer wrote:
WedNESday wrote:
But you don't use it in your ADC code. Both ADC and SBC need the code I posted above.
For instance, $80 + $FF = -128 + -1 = -129, which should set the V flag, but it doesn't in your code.
But does addition also apply to the -128 rule? I thought addition checked > 127 for the overflow flag, and subtraction checked < -128 for the overflow flag. Does that mean I need to check both < -128 and > 127 for every instruction, or just ADC and SBC? The overflow flag did confuse me, but not this much O_O
Check both -128 and 127, but only for ADC/SBC.
The hell.... I looked at some other emulators' sources, and you're right, but my emulator seems to just f*ck up when I try it. It's like "nope, rejected". It even fails NESTest and NEStress when I implement it like this:
Code:
int tempResult = (byte) a.getVal() + (byte) currValue + p.getBit(FLAG_C);
if(tempResult > 127 || tempResult < -128) {
    p.setBit(FLAG_V, 1);
} else {
    p.setBit(FLAG_V, 0);
}
and I did the same thing with SBC except with subtraction rather than addition.
This doesn't work either (which is basically the same thing):
Code:
boolean v = (((a.getVal() ^ currValue) & 0x80) == 0)
        && (((a.getVal() ^ res) & 0x80) != 0);
p.setBit(FLAG_V, v ? 1 : 0);
I think you're right though, I need to get to the bottom of this (debating the purchase of a Sherlock Holmes-style outfit). I'll figure out what's wrong with my emulator's interpretation of the overflow flag and I hope that will fix the issue. I'll post tomorrow if I make any progress. Thanks for the suggestion, I really hope it turns out to be the problem.
PS: using (byte) in Java casts to a signed 8 bit value if you were having doubts, so it should have worked. Anyway, will still try fixing it tomorrow. Hope it progresses in some way or another.
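For what it's worth, the two V-flag tests above are equivalent for ADC. Here is a brute-force Java check over all 131,072 (A, operand, carry) combinations; it assumes `res` means the plain sum a + m + c. If swapping one test for the other breaks nestest, the culprit is more likely how `res` is computed or how SBC feeds these checks than the formulas themselves (an assumption, since the rest of the core isn't shown here). Class and method names are hypothetical.

```java
class OverflowCheck {
    // Signed-range test, as in the first snippet.
    static boolean vByRange(int a, int m, int c) {
        int sum = (byte) a + (byte) m + c;
        return sum > 127 || sum < -128;
    }

    // Sign-bit test, as in the second snippet; res is the plain sum a + m + c.
    static boolean vBySignBits(int a, int m, int c) {
        int res = a + m + c;
        return ((a ^ m) & 0x80) == 0 && ((a ^ res) & 0x80) != 0;
    }

    // Counts (a, m, carry) inputs where the two formulas disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int a = 0; a < 256; a++)
            for (int m = 0; m < 256; m++)
                for (int c = 0; c < 2; c++)
                    if (vByRange(a, m, c) != vBySignBits(a, m, c)) mismatches++;
        return mismatches;
    }
}
```

`OverflowCheck.countMismatches()` returns 0. For SBC neither formula can be reused unchanged on the raw operand.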
NEStress is AWFUL beyond belief so don't bother with it.
NEStest is good but not perfect as I found out not that long ago...
viewtopic.php?f=5&t=9893&hilit=nestest
That code that you posted checks out. Which other emulator core did you check?
I know that you don't wanna hear this, no emulator author does, but you should really delete everything you've done and go back to the start...
This time use the techniques that we have explained to you in this topic.
Regarding ADC/SBC, emulator authors always get these wrong because they don't understand twos-complement and how the overflow flag actually works. This has been discussed to no end on this forum. Here are relevant threads, with code:
viewtopic.php?f=3&t=6331
viewtopic.php?f=3&t=8703
viewtopic.php?f=10&t=2468
Be aware many of the posters in these threads (esp. the first one) get it wrong (and others point that out), so you need to read the threads fully/slowly to understand.
If you want the quick and easy way out (as in "I want to see code that works"), no problem -- authoritative and correct answer (not to mention clever), leave it to blargg. :-) (But be sure to read the posts under his, as people try fooling around and run into different caveats)
viewtopic.php?p=19080#p19080
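Not necessarily what the linked post does, but one widely used way to keep SBC consistent with ADC in binary mode is the identity A - M - (1 - C) = A + (M XOR $FF) + C (mod 256), where C = 1 means "no borrow". The carry out and the V flag of that ADC are then exactly SBC's. A sketch with hypothetical class and field names:

```java
class AdcSbc {
    int a;                  // accumulator, 0..255
    boolean c, v, n, z;

    void adc(int m) {
        int sum = a + m + (c ? 1 : 0);
        c = sum > 0xFF;
        // Overflow: operands share a sign bit and the result's sign differs.
        v = ((a ^ m) & 0x80) == 0 && ((a ^ sum) & 0x80) != 0;
        a = sum & 0xFF;
        n = (a & 0x80) != 0;
        z = a == 0;
    }

    // A - M - (1 - C) == A + (M ^ 0xFF) + C (mod 256); C set means "no borrow".
    void sbc(int m) {
        adc(m ^ 0xFF);
    }
}
```

For example, with carry set, $50 - $B0 (80 - (-80) = 160) leaves A = $A0 with V set and C clear.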
ArsonIzer wrote:
WedNESday wrote:
1. Never declare a variable inside of a switch/case
1. I did not know that. I didn't think it could harm anyone, and I'm not sure if it's considered bad practice in Java as well.
Nothing wrong with that, and it keeps things local for clarity. Sometimes people forget to put them in a block, and since switch and case are basically just goto and labels, you can skip initialization easily:
Code:
switch ( n )
{
    case 0:
        int x = foo();
        ...
        break;
    case 1:
        int y = x; // oops, used uninitialized variable
        ...
}
So be sure to wrap the case's statements in a block:
Code:
switch ( n )
{
    case 0: {
        int x = foo();
        ...
        break;
    }
    case 1:
        int y = x; // compile error, much better
        ...
}
WedNESday wrote:
NEStress is AWFUL beyond belief so don't bother with it.
NEStest is good but not perfect as I found out not that long ago...
viewtopic.php?f=5&t=9893&hilit=nestest
That code that you posted checks out. Which other emulator core did you check?
I know that you don't wanna hear this, no emulator author does, but you should really delete everything you've done and go back to the start...
This time use the techniques that we have explained to you in this topic.
I was planning on rewriting it anyway, but I wish I could fix these issues first so I don't have to keep stressing for a week while I'm rewriting the code.
As for the sources I've looked at, I've only looked at their Overflow flag interpretation in the ADC and SBC instructions, so I don't think there's anything useful I can provide by giving names. If you really want to know, however: HalfNES, LambNES (don't know if this one actually works, but it was on Google Code so I thought I'd take a peek), Nintendulator and FCEUX (which, although kind of different, seemed to have the same effect as the pieces of code I just showed).
PS: Every emulator I've ever tried which seemed to work (Nintendulator, Nestopia, FCEUX, HalfNES, Yanese, and I'm sure there are more which I forgot) passes NEStress and NESTest. By no means am I saying that those tests should be the standard, but they definitely show that the very basics of an emulator are functioning decently, and mine failing the tests should most certainly be at least worrisome.
blargg wrote:
Nothing wrong with that, and it keeps things local for clarity. Sometimes people forget to put them in a block, and since switch and case are basically just goto and labels, you can skip initialization easily:
So be sure to wrap the case's statements in a block;
Thanks. I didn't think there was something wrong with it. What's the significance of wrapping case statements in a block though? I know that it makes the scope more limited, but could it cause issues?
koitsu wrote:
Regarding ADC/SBC, emulator authors always get these wrong because they don't understand twos-complement and how the overflow flag actually works. This has been discussed to no end on this forum. Here are relevant threads, with code:
viewtopic.php?f=3&t=6331
viewtopic.php?f=3&t=8703
viewtopic.php?f=10&t=2468
Be aware many of the posters in these threads (esp. the first one) get it wrong (and others point that out), so you need to read the threads fully/slowly to understand.
If you want the quick and easy way out (as in "I want to see code that works"), no problem -- authoritative and correct answer (not to mention clever), leave it to blargg. (But be sure to read the posts under his, as people try fooling around and run into different caveats)
viewtopic.php?p=19080#p19080
I do understand two's complement, and I do understand the overflow flag, but I guess I'm just overlooking something. I would think that I might benefit from some extra knowledge on the ADC/SBC instructions though, so I will definitely look at those threads as soon as I start messing with my emulator again tomorrow.
ArsonIzer wrote:
Thanks. I didn't think there was something wrong with it. What's the significance of wrapping case statements in a block though? I know that it makes the scope more limited, but could it cause issues?
Wrapping them in blocks avoids issues. The example code I gave showed the problem you can encounter when you don't wrap in a block. Another is when you use the same variable name in two cases:
Code:
switch ( n )
{
    case 0:
        int x = foo();
        ...
        break;
    case 1:
        int x = bar(); // error: x already exists in this scope
        ...
}
switch ( n )
{
    case 0: {
        int x = foo();
        ...
        break;
    }
    case 1: {
        int x = bar(); // no problem
        ...
    }
}
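Java follows the same rules, which may help since the core is in Java: in a classic switch every case shares one scope, so javac rejects the first unbraced example above because `x` is not definitely assigned in `case 1`, and the second because `x` is declared twice. Braces give each case its own scope. A small sketch; `foo`/`bar` are placeholders.

```java
class SwitchScope {
    static int foo() { return 1; } // placeholder
    static int bar() { return 2; } // placeholder

    static int pick(int n) {
        int result;
        switch (n) {
            case 0: {
                int x = foo();   // x exists only inside this block
                result = x * 10;
                break;
            }
            case 1: {
                int x = bar();   // reusing the name is fine: different block
                result = x * 10;
                break;
            }
            default:
                result = -1;
        }
        return result;
    }
}
```

Java 14 and later also offer arrow-form cases (`case 0 -> { ... }`), where each case body is its own block and there is no fall-through.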
blargg wrote:
Wrapping them in blocks avoids issues.
But what's the reasoning behind such "issues"? What does wrapping something in a block actually do? And how consistent is this behavior across different languages that use { and } as block delimiters?
Probably because C++ has the variable destructors inside those blocks of code, while the breaks and cases all get run as one entity of code, right? This shit is what makes C++ complete crap.
You don't even need to be talking about C++, even plain C will benefit from using scoping. The only point is to keep you from copypasta'ing some variable from one place to another without noticing and having accidental variable reuse that mysteriously makes for sometimes-functional and sometimes-not code.
By putting them in separate scopes, you can't screw that up.
tokumaru wrote:
blargg wrote:
Wrapping them in blocks avoids issues.
But what's the reasoning behind such "issues"? What does wrapping something in a block actually do? And how consistent is this behavior across different languages that use { and } as block delimiters?
See my previous two posts in this thread for C examples of why putting the case's code in a compound block avoids problems. At this point I'm wondering whether they were invisible because you're the second person to ask for examples of the problems.
And 3gengames no, it's not specific to C++, and my example was in C. You'll have to come up with a better reason to hate on C++. Meanwhile I'm happily using it on an 8-bit embedded processor with 8K of ROM, because it doesn't force me to use any feature unless I deem that feature worth its cost (assuming it even costs anything to use).
blargg wrote:
See my previous two posts in this thread for C examples of why putting the case's code in a compound block avoids problems.
An example is not the same as an explanation...
EDIT: had written a longer answer before, but never mind.
tokumaru wrote:
But what's the reasoning behind such "issues"? What does wrapping something in a block actually do? And how consistent is this behavior across different languages that use { and } as block delimiters?
By giving a variable a limited scope, you can prevent its accidental reuse. I think blargg's examples did explain this in their annotations.
When the scope is closed with } any variables within that scope are to be cleaned up / destroyed at that point. For a complex C++ class with a destructor, yes this means the destructor should be called at that point. However, for a primitive type like int, its destruction could be lightweight or nonexistent (e.g. it could be stored in a register, or it could be reused by another scope later in the function; it's up to the compiler). Even for a C++ class, an inline destructor would give the compiler opportunity to streamline the activity at the } point.
Scoping is the answer to the question (as others have said), re: using {} blocks within the case rather than leaving it open-ended, specifically when re-using a variable name.
Anyway, give the ADC/SBC method blargg mentioned a shot and report back. Those two opcodes are usually the ones most people get wrong, so don't feel bad. :-)
koitsu wrote:
Anyway, give the ADC/SBC method blargg mentioned a shot and report back.
I assume you mean me, not blargg.
Anyway he has already given it a shot and it has broken NEStest. That means that there are bigger problems elsewhere in his code. I'm all for a rewrite to be honest. If he rewrites it now with the new style that has been described here, the bug(s) may fix themselves. If he doesn't, he may waste x amount of time looking for a bug we'll never find.
By the way if you declare a variable inside of a switch/case in C++ it gives you a warning.
WedNESday wrote:
By the way if you declare a variable inside of a switch/case in C++ it gives you a warning.
Wouldn't that depend on the IDE, or is it an actual compiler thing?
WedNESday wrote:
koitsu wrote:
Anyway, give the ADC/SBC method blargg mentioned a shot and report back.
I assume you mean me, not blargg.
Anyway he has already given it a shot and it has broken NEStest. That means that there are bigger problems elsewhere in his code. I'm all for a rewrite to be honest. If he rewrites it now with the new style that has been described here, the bug(s) may fix themselves. If he doesn't, he may waste x amount of time looking for a bug we'll never find.
I agree with you. That way I can use fewer objects to improve performance, and make it cleaner/more compact. While I'm at it, I'll completely rewrite my PPU to make it more accurate, since drawing sprites ~30k CPU cycles too late might actually be over-the-top inaccurate.
The thing that bothers me though, is not knowing what precisely causes this bug. Rewriting the core won't magically fix the problem if it's an actually serious misconception in my code, because I'll be making the same mistake again. Even if it does fix the issue, I'd still like to know what caused it initially.
Anyway, if I don't manage to fix it within 2 or 3 days at most, I'll just go ahead and do a complete rewrite. Hope that fixes a lot of things which are wrong now.
PS: Maybe he did mean Blargg. Koitsu posted a couple of links to some topics where the ADC/SBC instructions were debated, and Blargg mentioned a pretty clever implementation in there, so maybe he meant that.
ArsonIzer wrote:
The thing that bothers me though, is not knowing what precisely causes this bug. Rewriting the core won't magically fix the problem if it's an actually serious misconception in my code, because I'll be making the same mistake again. Even if it does fix the issue, I'd still like to know what caused it initially.
Never ponder as to what caused bug a or bug b for any long period of time when it has either been fixed or isn't important. You'll probably never end up finding it. Trust me.
ArsonIzer wrote:
Rewriting the core won't magically fix the problem...
Don't bet on it.
Alright, thanks for the help guys. I've started implementing the CPU now using fewer objects. I hope it works and does indeed fix the issue. I'll make sure to thoroughly check every method against other emulators like FCEUX, Nintendulator, etc., and use a more sensible approach to things that can be done easily.
Just a quick general question though: is the catch-up method used frequently in good emulators, or is it just an in-between stage for emulators which go for actual CPU/PPU-cycle specific implementation? I've heard that people generally don't go for a cycle-specific CPU, but rather a cycle-specific PPU and use this catch-up method. Are there better ways out there which are frequently used, or is this the go-to technique?
To add to that, when using the catch-up method and implementing a cycle-specific PPU, does the timing of the fetches (nametable, attribute table, pattern table, etc) matter, or can the data be fetched from memory when it's needed and keep the emulator working? What I mean is, let's consider the following:
Cycle 0: idle
Cycle 1: fetch nametable byte
Cycle 2: fetch attribute table byte
Cycle 3: fetch first pattern table byte
Cycle 4: fetch second pattern table byte and draw 4 pixels (I think)
If I were to just do everything on cycle 4, would it matter in any game, or is this procedure necessary? I.e. would this (in quick pseudo code) be sufficient:
Code:
if(cycle >=1 && cycle <= 256) {
    if(cycle % 4 == 0) {
        //fetch stuff
        //draw 4 pixels
    }
}
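For reference, the nesdev wiki's PPU rendering page describes a different grouping than the 4-cycle one in the question: each PPU memory access takes two dots, so the nametable, attribute, pattern-low and pattern-high fetches for one 8-pixel tile span 8 dots, while one pixel is output every dot. A sketch that maps a dot to the background fetch in progress (hypothetical names). Dots 257-320 fetch sprite data and dots 337-340 do two extra nametable fetches, which this sketch simply reports as NONE. Those sprite pattern fetches are what make PPU A12 toggle at the right time for the MMC3's counter.

```java
class PpuFetchPhase {
    enum Fetch { NAMETABLE, ATTRIBUTE, PATTERN_LOW, PATTERN_HIGH, NONE }

    // Background fetch in progress on a given dot (0..340) of a rendering scanline.
    // Each fetch occupies two dots, so one tile (8 pixels) takes 8 dots.
    static Fetch fetchAt(int dot) {
        boolean fetching = (dot >= 1 && dot <= 256) || (dot >= 321 && dot <= 336);
        if (!fetching) return Fetch.NONE;
        switch ((dot - 1) % 8) {
            case 0: case 1: return Fetch.NAMETABLE;
            case 2: case 3: return Fetch.ATTRIBUTE;
            case 4: case 5: return Fetch.PATTERN_LOW;
            default:        return Fetch.PATTERN_HIGH;
        }
    }
}
```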
Yeah, the fetches need to be right for MMC3, as its scanline counter "watches" PPU A12 or whatnot.
Catch-up is entirely an optimization, and apparently unnecessary for a NES emulator on today's desktop PCs. You can always do some catch-up later where it would help most (e.g. when the PPU will be running for several scanlines, use something optimized for the full scanlines). I think that unless you've already written an emulator without catch-up, you won't be able to reasonably write one with it, because you won't understand the full ramifications of everything and will be constantly finding problems with the catch-up implementation not handling these corner cases. I wish I had done a normal implementation before I did catch-up back when I was working on my NES emulator. It's far easier to get right, and serves as a reference that works.
Wait a minute. If I understand correctly, the catch-up method is used to run the CPU for X amount of cycles, and while that is happening, if the instruction writes to the PPU, you let the PPU run until it "catches up" with the CPU before doing the actual write (or read or whatever). What is considered the "normal" method here if that's not it? Just letting the CPU run one instruction and then running the cycle-specific PPU for the elapsed amount of cycles * 3, or is it a simple scanline-based PPU where the CPU runs a full scanline before letting the PPU draw?
I get that the catch-up thing can be tricky, but wouldn't it be necessary for a game like Battletoads?
The precise method is to run one CPU cycle then three* PPU cycles. Anything else is an optimization. Optimization good; premature optimization bad.
* Assuming NTSC
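A minimal Java sketch of that precise method, assuming each component exposes a one-cycle `tick()` (the `Component` interface is hypothetical). The ratio is 3 PPU dots per CPU cycle on NTSC; PAL uses 3.2 (16 dots per 5 CPU cycles), which this sketch does not handle.

```java
class Lockstep {
    // Hypothetical component interface; a real core would emulate one cycle per call.
    interface Component { void tick(); }

    static final int PPU_TICKS_PER_CPU_TICK = 3; // NTSC only

    // Runs the system for the given number of CPU cycles in lockstep.
    static void run(Component cpu, Component ppu, long cpuCycles) {
        for (long i = 0; i < cpuCycles; i++) {
            cpu.tick();
            for (int j = 0; j < PPU_TICKS_PER_CPU_TICK; j++) {
                ppu.tick();
            }
        }
    }
}
```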
ArsonIzer wrote:
Wait a minute. If I understand correctly, the catch-up method is used to run the CPU for X amount of cycles, and while that is happening, if the instruction writes to the PPU, you let the PPU run until it "catches up" with the CPU before doing the actual write (or read or whatever). What is considered the "normal" method here if that's not it? Just letting the CPU run one instruction and then running the cycle-specific PPU for the elapsed amount of cycles * 3, or is it a simple scanline-based PPU where the CPU runs a full scanline before letting the PPU draw?
I get that the catch-up thing can be tricky, but wouldn't it be necessary for a game like Battletoads?
Catch-up means catching up on components whenever their current state becomes significant. For example, if the value of the monochrome flag changes mid-frame, you'd catch up by first running the PPU up to that point using the old value of the monochrome flag, and then changing it. Or, in pseudo-code:
Code:
run_ppu_up_to_current_position();
monochrome_flag = new_monochrome_flag;
The current state that's significant in this case is the rendering position. The advantage of this approach is that you can draw in a more efficient fashion than pixel-for-pixel most of the time, and that each component can be run in a tight loop instead of switching between components millions of times per second. The disadvantage is that you need to catch all points where the state becomes significant. It's easy to miss stuff.
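To make the catch-up idea concrete, here is a minimal Java sketch. `CatchUpPpu`, `runUpTo` and `writeMask` are hypothetical names, and "rendering" is reduced to counting dots drawn with grayscale on; the point is only the ordering: run the PPU up to the current CPU cycle with the old register value, then apply the write.

```java
// Minimal catch-up sketch (hypothetical names). The PPU only advances
// when something needs its state to be current, e.g. a $2001 write.
class CatchUpPpu {
    long dot;           // PPU dots emulated so far
    int mask;           // last value written to $2001
    int grayscaleDots;  // dots "rendered" with grayscale on, for illustration

    // Emulate dots up to the CPU's current cycle (3 dots per CPU cycle, NTSC).
    void runUpTo(long cpuCycle) {
        long target = cpuCycle * 3;
        while (dot < target) {
            renderDot();
            dot++;
        }
    }

    private void renderDot() {
        // A real PPU would output a pixel here using the *current* register state.
        if ((mask & 0x01) != 0) grayscaleDots++;
    }

    // Called from the CPU's write handler for $2001.
    void writeMask(int value, long cpuCycle) {
        runUpTo(cpuCycle);   // finish everything that used the old value first
        mask = value & 0xFF; // only then change the state
    }
}
```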
A related concept is prediction. In prediction, you predict when the state of some component will become significant (e.g. because it fires an interrupt) to avoid having to do low-level emulation of that component up to that point. The advantage is the same as for catch-up. The disadvantage is that many things can invalidate those predictions, making it very tricky to get right in some cases. Getting it absolutely solid for some things might involve so much prediction and invalidation that it becomes as slow as or slower than no prediction.
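A sketch of prediction for the simplest case, the vblank NMI, assuming NTSC timing (341 dots x 262 lines, vblank flag set at scanline 241, dot 1) and ignoring the odd-frame skipped dot. Class and method names are hypothetical. Instead of stepping the PPU, the CPU can be run for `cpuCyclesUntilNmi(...)` cycles; any write that could change the outcome (for example enabling NMI through $2000 mid-vblank) would have to invalidate and recompute the prediction.

```java
// Prediction sketch (hypothetical names). Simplifications: NTSC timing,
// vblank starts at scanline 241, dot 1, odd-frame dot skip ignored.
class NmiPredictor {
    static final int DOTS_PER_LINE = 341;
    static final int LINES_PER_FRAME = 262;
    static final long DOTS_PER_FRAME = (long) DOTS_PER_LINE * LINES_PER_FRAME;
    static final long VBLANK_DOT = 241L * DOTS_PER_LINE + 1;

    // PPU dots from the given frame position until the *next* vblank start.
    static long dotsUntilVblank(int scanline, int dot) {
        long now = (long) scanline * DOTS_PER_LINE + dot;
        long delta = VBLANK_DOT - now;
        return delta > 0 ? delta : delta + DOTS_PER_FRAME;
    }

    // CPU cycles (3 dots each) until the NMI, rounded up.
    static long cpuCyclesUntilNmi(int scanline, int dot) {
        return (dotsUntilVblank(scanline, dot) + 2) / 3;
    }
}
```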
Without catch-up and prediction, you simply run three PPU ticks (NTSC) and one APU tick for each CPU tick. An easy way to do this for the 6502 is to do the PPU and APU calls in the read and write routines (with a separate tick() function that can be used elsewhere too). Provided you get the timing right, this makes interactions between components work automagically for all cases, at the cost of performance.
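A minimal sketch of ticking from the bus routines, assuming only the 2 KB of internal RAM is mapped; `TickingBus` and its counters are hypothetical stand-ins for real PPU/APU step calls. Because every bus access costs exactly one CPU cycle, putting `tick()` in `read()` and `write()` keeps the components in lockstep without any cycle bookkeeping in the instruction code.

```java
// "No catch-up" timing sketch (hypothetical names): each bus access is one
// CPU cycle, so read/write advance the PPU 3 dots (NTSC) and the APU 1 tick.
class TickingBus {
    final int[] ram = new int[0x800];   // only internal RAM is modeled
    long cpuCycles, ppuDots, apuTicks;

    private void ppuStep() { ppuDots++; }   // stand-in for rendering one dot
    private void apuStep() { apuTicks++; }  // stand-in for clocking the APU

    // Advance everything else by one CPU cycle; also usable for dummy cycles.
    void tick() {
        cpuCycles++;
        ppuStep();
        ppuStep();
        ppuStep();
        apuStep();
    }

    int read(int address) {
        tick();
        return ram[address & 0x7FF];   // $0000-$1FFF mirrors the 2 KB RAM
    }

    void write(int address, int value) {
        tick();
        ram[address & 0x7FF] = value & 0xFF;
    }
}
```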
Even without prediction and catch-up there are still optimizations you can do, like having a channel_updated flag for the APU that's set to true whenever the output level of some channel changes. That way you don't have to do channel mixing each CPU cycle. You can also have a flag that's set whenever some event that needs to be handled between CPU instructions (like a pending interrupt) occurs.
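A sketch of the `channel_updated` idea (here `channelUpdated`), assuming a placeholder linear mix; a real mixer would use the non-linear NES mixing formula or a lookup table. The mix is recomputed only when a level actually changes; otherwise the cached value is returned.

```java
// Dirty-flag mixing sketch (hypothetical names, placeholder linear mix).
class DirtyMixer {
    final int[] levels = new int[5];  // pulse1, pulse2, triangle, noise, DMC
    boolean channelUpdated = true;
    int cachedMix;
    int mixCount;                     // how often the mix was recomputed

    void setLevel(int channel, int level) {
        if (levels[channel] != level) {   // only a real change marks it dirty
            levels[channel] = level;
            channelUpdated = true;
        }
    }

    // Called once per CPU cycle.
    int sample() {
        if (channelUpdated) {
            int sum = 0;
            for (int l : levels) sum += l;
            cachedMix = sum;
            mixCount++;
            channelUpdated = false;
        }
        return cachedMix;
    }
}
```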
Tepples - I am aware that that is the PRECISE method, but I'm assuming that not every emulator uses that method. What do people normally use for their emulators? I don't assume it's common to have a cycle-specific CPU.
Ulfalizer - So what you're saying, is that the norm for today's emulators is to have a CPU, which makes the PPU and APU tick for every single one of its cycles? I thought that was the slow way to do things.
As a separate query: Let's take the ADC instruction using Zero page which is supposed to take 3 cycles. The 3 cycles do: Fetch opcode, fetch address and then fetch data from the address and doing the actual addition. On every one of these actions, I'm supposed to let the PPU cycle 3 times to achieve good accuracy (leaving out the APU for now)?
ArsonIzer wrote:
Ulfalizer - So what you're saying, is that the norm for today's emulators is to have a CPU, which makes the PPU and APU tick for every single one of its cycles? I thought that was the slow way to do things.
Not sure how much of the norm it is, but today's desktop computers seem fast enough at least (my emulation thread currently uses around 37% of one core on my two-year-old 2600K Core i7), and it has the advantage of simple and very robust emulation. I think Nintendulator uses this approach too.
Arsonizer wrote:
As a separate query: Let's take the ADC instruction using Zero page which is supposed to take 3 cycles. The 3 cycles do: Fetch opcode, fetch address and then fetch data from the address and doing the actual addition. On every one of these actions, I'm supposed to let the PPU cycle 3 times to achieve good accuracy (leaving out the APU for now)?
Yup, that's what I do.
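A sketch of ADC zero page ($65) done this way, with each of the three bus accesses advancing the PPU three dots, for nine dots per instruction. Names are hypothetical, only this one opcode is modeled, and decimal mode is left out because the 2A03 has no BCD arithmetic.

```java
// ADC zero page with per-cycle PPU ticking (hypothetical names, one opcode only).
class AdcZeroPageDemo {
    final int[] mem = new int[0x10000];
    int a, pc, carry, zero, negative, overflow;
    long ppuDots;

    int busRead(int addr) {
        ppuDots += 3;             // stand-in for three ppu.step() calls
        return mem[addr & 0xFFFF];
    }

    void step() {
        int opcode = busRead(pc++);            // cycle 1: fetch opcode
        if (opcode != 0x65) throw new IllegalStateException("only ADC zp modeled");
        int zp = busRead(pc++);                // cycle 2: fetch zero-page address
        int m = busRead(zp);                   // cycle 3: read operand
        int sum = a + m + carry;
        // Signed overflow: operands had the same sign and the result's sign differs.
        overflow = (~(a ^ m) & (a ^ sum) & 0x80) != 0 ? 1 : 0;
        carry = sum > 0xFF ? 1 : 0;
        a = sum & 0xFF;
        zero = a == 0 ? 1 : 0;
        negative = (a >> 7) & 1;
    }
}
```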
The wiki has a page about catch-up.
In the bad old days (late 1990s), it was common to run the CPU for a whole scanline and then run the PPU for a whole scanline.
So long as NMI, APU IRQs, and mapper IRQs are predicted, the CPU can run thousands of cycles ahead if necessary. It can store the address, data, and cycle number of each write. This is called "timestamping", and it can help the cache performance of your emulator. But the PPU does have to catch up on a $2007 read, and it may have to on a $2002 read on a line that contains sprite 0 or eight sprites. Those scanlines can be predicted in advance after OAM is rewritten. Though a couple of games (such as Bigfoot) rewrite OAM mid-frame, invalidating the prediction, you can get away with predicting $2002 16 or so lines ahead without too much performance hit.
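A sketch of timestamping, with hypothetical names throughout: the CPU only logs each PPU register write with its cycle number, and `catchUp()` replays the log in order, running the PPU up to each write's timestamp before applying it. A $2007 read (or a $2002 read on a line where the result could differ) would call `catchUp(currentCycle)` first.

```java
import java.util.ArrayDeque;

// Timestamped write log sketch (hypothetical names).
class TimestampedPpu {
    static final class Write {
        final long cpuCycle;
        final int address;
        final int value;
        Write(long cpuCycle, int address, int value) {
            this.cpuCycle = cpuCycle;
            this.address = address;
            this.value = value;
        }
    }

    private final ArrayDeque<Write> pending = new ArrayDeque<>();
    long dot;                       // PPU dots emulated so far
    final int[] regs = new int[8];  // $2000-$2007 write side, simplified

    // CPU side: record the write; the PPU is not run here.
    void logWrite(long cpuCycle, int address, int value) {
        pending.addLast(new Write(cpuCycle, address, value));
    }

    // PPU side: run to the target cycle, applying each write at its timestamp.
    void catchUp(long cpuCycle) {
        while (!pending.isEmpty() && pending.peekFirst().cpuCycle <= cpuCycle) {
            Write w = pending.pollFirst();
            runDots(w.cpuCycle * 3);     // render with the old register values
            regs[w.address & 7] = w.value & 0xFF;
        }
        runDots(cpuCycle * 3);
    }

    private void runDots(long targetDot) {
        while (dot < targetDot) dot++;   // stand-in for rendering one dot
    }
}
```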
Sure, optimization isn't quite as necessary on modern Core i7 PCs as it was on old Pentium IIs. But phones and tablets are the current area of growth in the computing market, and the Atom in a netbook or tablet is about as fast as a P4 of the same clock frequency. Switching to a catch-up architecture with timestamping will help you multithread your emulator, which will improve efficiency on dual- to quad-core ARM devices. And if you want to run multiple emulators at once, as in an emulator UI inspired by the Wii Menu or a 3D view of an arcade full of Vs. or PlayChoice machines, you'll need efficiency there too.
Yeah, you need to keep your target system in mind. At least the case of running several instances is embarrassingly parallel though, so if your emulator is single-threaded you can scale it up to at least the number of available cores (just experimented with 10 or so instances, and they ran fine until X11 decided to crash).
Guess issues like memory contention might come in too, but seemed to work fine in practice in this case. Prolly shouldn't have a huge cache footprint...
ArsonIzer wrote:
As a separate query: Let's take the ADC instruction using Zero page which is supposed to take 3 cycles. The 3 cycles do: Fetch opcode, fetch address and then fetch data from the address and doing the actual addition. On every one of these actions, I'm supposed to let the PPU cycle 3 times to achieve good accuracy (leaving out the APU for now)?
It's not a question of accuracy, but of implementation difficulty and efficiency. If you run the PPU three dots every CPU cycle, you don't have to worry about what the CPU or PPU is doing; they stay in sync and the program structure is trivial.
I do want to eventually port my emulator to my Android phone (which is a 3+ year old HTC Legend with a 600 MHz ARM11 processor), and I want to be able to make it work at full speed, but that is still a distant goal. For now, I just want a working, accurate emulator which runs decently on my Core i7 laptop. I suppose I'll go implement the cycle-accurate CPU/PPU for now, and if that fixes the Mario Bros problem and the graphical glitches I'm having with a lot of games right now, I'll be happy for a long time (the time it takes to implement the APU, which I'm barely looking forward to).
Anyway, thanks a lot for the help and input guys. I can't imagine not being able to ask these questions and still move forward
When I first started to write WedNESday back in late 2003, performance was a really big issue. Hardly anyone could run Nintendulator, and it was only this year that I actually rewrote the 6502 core for readability and size instead of performance.
To be honest, it's such a huge waste of time to base any emulator on performance, as in 3-4 years even the most demanding emulators suddenly become normal.
WedNESday wrote:
When I first started to write WedNESday back in late 2003, performance was a really big issue. Hardly anyone could run Nintendulator, and it was only this year that I actually rewrote the 6502 core for readability and size instead of performance.
To be honest, it's such a huge waste of time to base any emulator on performance, as in 3-4 years even the most demanding emulators suddenly become normal.
Yup, for now it's indeed not an issue, since I'm at least running it on a 2 GHz Core i7, but like I said before, eventually I intend to make it run on a 500 MHz ARM11, which will probably require some optimization. This, however, is still in the future, and I don't think any optimizations will be necessary for at least another 6 months (if I'm really fast with the APU and mappers). Other than that, I have to ask: freaking 2003? Damn, that's a long time. Back then I was still a careless, annoying kid (might still be that). The NES is just a gateway for me though. I probably won't keep myself busy with the NES for more than 2 years or so, for progression purposes. Do you guys have other, more advanced emulators you work on (SNES, GB(A), x86, PS1, etc.) or are you really just stuck in the NES scene?
The ability to apply Moore's law to raw CPU speed has long since disappeared. Transistor density and count have continued to rise, and transistors even get more efficient, but algorithmic improvements are far more effective at saving power than just waiting for Intel to fix it for you. Don't throw away an optimized version unless it's getting in the way of correctness.
TL;DR: Targeting a device with a battery? You darn well better optimize.
Hadn't actually tried running the emulator as fast as possible in a while, and it seems to manage 7-8x speed on one core despite ~37% CPU usage being reported for 1x.
So yeah, complicated prediction and catch-up might have gotten pretty moot for modern desktop systems.
ulfalizer wrote:
Hadn't actually tried running the emulator as fast as possible in a while, and it seems to manage 7-8x speed on one core despite ~37% CPU usage being reported for 1x.
So yeah, complicated prediction and catch-up might have gotten pretty moot for modern desktop systems.
Currently I've implemented semi-cycle-accurate background drawing (does most things cycle accurate, but draws a tile sliver every 8 cycles rather than 1 pixel every single cycle), and I'm implementing the sprites as we speak, but even with unfinished rendering, no audio, no controllers and no complicated modern emulator functions, it requires ~145 - 165 ms to draw 60 frames in SMB, i.e. I can at most run it 6 - 7x the normal speed, even though it's less than half finished. Does the APU use a lot of resources compared to the CPU/PPU (in general that is)? I fear that it's going to slow down much more when I've "fully" implemented all the components, in which case optimization will definitely be required.
I'm running it on a laptop with a 2.00 GHz i7 and a GeForce GT540M (if the GPU is of any relevance). No fancy threading. Using Java 1.7.
What kind of optimizations have you implemented, and on what processor are you running it?
ArsonIzer wrote:
Currently I've implemented semi-cycle-accurate background drawing (does most things cycle accurate, but draws a tile sliver every 8 cycles rather than 1 pixel every single cycle), and I'm implementing the sprites as we speak, but even with unfinished rendering, no audio, no controllers and no complicated modern emulator functions, it requires ~145 - 165 ms to draw 60 frames in SMB, i.e. I can at most run it 6 - 7x the normal speed, even though it's less than half finished. Does the APU use a lot of resources compared to the CPU/PPU (in general that is)? I fear that it's going to slow down much more when I've "fully" implemented all the components, in which case optimization will definitely be required.
I'm running it on a laptop with a 2.00 GHz i7 and a GeForce GT540M (if the GPU is of any relevance). No fancy threading. Using Java 1.7.
What kind of optimizations have you implemented, and on what processor are you running it?
I'm using a 2600K Core i7 @ 3.4 GHz.
APU should be much cheaper than PPU. You'll mostly have a few down counters for the different channels that tick the channel and get reloaded with the period when they reach zero. The frame counter could also use a down counter, though it's still a switch in my code (downcounterification seems to be a good optimization strategy in many instances). The most important optimization for me was using a channel_updated flag and only doing mixing when the output level on one of the channels changes (could be viewed as caching the output level). That brought the APU's share of the runtime down to about 5% from 15% (with both cases using a LUT for non-linear mixing). I suspect the frame counter accounts for a sizeable chunk of what remains.
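A sketch of the down-counter pattern, with hypothetical names and a placeholder where the channel would be clocked. It counts down and reloads at zero, the same reload-at-zero behaviour the APU channel timers use, so a counter with period P clocks its channel every P + 1 ticks.

```java
// Down counter sketch (hypothetical names): count down, reload at zero,
// and clock the channel on reload instead of comparing against the period.
class DownCounter {
    int period;   // reload value
    int counter;  // ticks left until the next channel clock
    int steps;    // how many times the channel was clocked

    DownCounter(int period) {
        this.period = period;
        this.counter = period;
    }

    void tick() {
        if (counter == 0) {
            counter = period;  // reload
            steps++;           // stand-in for clocking the channel's sequencer
        } else {
            counter--;
        }
    }
}
```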
Here are three random optimizations for the PPU off the top of my head. Not sure how much they help, but it's something concrete at least. You should profile and check where time is being spent first.
- Store $2001:0 (grayscale) directly as a mask that you AND the color with instead of using a conditional. 0x30 means grayscale on, 0x3F grayscale off.
- When the leftmost 8 BG pixels are hidden, set a variable bg_clip_comp to 8. When the background is disabled, set it to 256. Otherwise, set it to 0. That way you can check whether the current BG pixel is hidden with a single if (pixel < bg_clip_comp) conditional. A similar optimization is possible for sprite pixels.
- For palette writes, do mirroring by actually writing the mirrored values. The palette is read way more often than it's written, so not having to do mirroring when reading it helps a bit.
In general, try to move stuff off the hot path (tick_ppu()) where possible, and micro-optimize it a bit otherwise.
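The three PPU tricks from the list above could look roughly like this in Java; `PpuTricks` and its members are hypothetical. `writeMask` turns PPUMASK bits 0 (grayscale), 1 (show background in the leftmost 8 pixels) and 3 (show background) into precomputed values, and palette writes to $3F00/$3F04/$3F08/$3F0C are mirrored into $3F10/$3F14/$3F18/$3F1C (and vice versa), so the read path needs no special case.

```java
// PPU micro-optimization sketch (hypothetical names).
class PpuTricks {
    int grayscaleMask = 0x3F;  // 0x30 = grayscale on, 0x3F = off
    int bgClipComp = 0;        // 0, 8 (left 8 pixels hidden) or 256 (BG disabled)
    final int[] palette = new int[32];

    // Write to $2001 (PPUMASK): precompute everything the hot path needs.
    void writeMask(int value) {
        grayscaleMask = (value & 0x01) != 0 ? 0x30 : 0x3F;
        boolean showBgLeft = (value & 0x02) != 0;
        boolean showBg = (value & 0x08) != 0;
        bgClipComp = !showBg ? 256 : (showBgLeft ? 0 : 8);
    }

    // One compare covers both "left column clipped" and "BG disabled".
    boolean bgPixelHidden(int x) {
        return x < bgClipComp;
    }

    // Mirror on write: entries 0x00/0x04/0x08/0x0C pair with 0x10/0x14/0x18/0x1C.
    void writePalette(int address, int value) {
        int i = address & 0x1F;
        palette[i] = value & 0x3F;
        if ((i & 0x03) == 0) palette[i ^ 0x10] = value & 0x3F;
    }

    // Hot path during rendering: no mirroring check, no grayscale branch.
    int colorForPixel(int address) {
        return palette[address & 0x1F] & grayscaleMask;
    }
}
```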
Should say that I assumed C/C++ when talking about performance btw. Not sure if you can get away with the lazy approach as easily in higher-level languages like Java. 6-7x doesn't seem too bad for a first stab though.
The overhead of the non-emulation parts seems negligible in comparison by the way. Rewinding is just copying a few megs worth of data per second at most - most of it in contiguous chunks - which is nothing on a modern system. Should be very cheap even if some compression is added.
Stuff like hardware scaling is nice to have though. Would want 3D-accelerated rendering and shaders for fancy effects anyway.
6 - 7x at the 4th CPU stab and 2nd PPU stab, although this might be the first time I've gotten them working this decently. I've got most of the rendering working right now, but many games still show rendering defects, like Skate or Die, which has (what I assume are) random tiles drawn as black instead of their original colors, or Battletoads, which is completely messed up to say the least. I guess I need absolute cycle-precision. Anyway, I mainly need optimizations to be able to run it on a 600 MHz ARM in the end. I wonder how far I'll get it working on such a slow processor with cycle-precise rendering. I don't think micro-optimizations will have a major impact on that.
PS: After a complete rewrite, the Mario Bros bug still remains
Does Mario Bros use any non-basic PPU functionality like sprite 0 hit?
PPS: Does anyone know why the Wiki says that Marble Madness is a tricky game to emulate because it switches CHR banks mid-scanline? As far as I know, AxROM doesn't switch CHR banks but rather PRG banks.
ArsonIzer wrote:
Anyway, I mainly need optimizations to be able to run it on a 600 MHz ARM in the end.
PocketNES, which offloads tile and sprite rendering to the Game Boy Advance PPU, runs on a 16.8 MHz ARM.
Quote:
Does anyone know why the Wiki says that Marble Madness is a tricky game to emulate because it switches CHR banks mid-scanline? As far as I know, AxROM doesn't switch CHR banks but rather PRG banks.
Even without mapper control of the CHR address, it's still possible to use PPU port $2000 to switch the background between $0000-$0FFF and $1000-$1FFF.
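A sketch of why this matters even without CHR banking (hypothetical names): PPUCTRL bit 4 picks the background pattern table, so the address of every tile fetch after a mid-scanline $2000 write moves between the $0000 and $1000 halves.

```java
// Background pattern table select via $2000 bit 4 (hypothetical names).
class BgPatternSelect {
    int ctrl;  // last value written to $2000

    // Address of the low bitplane byte for one row of a tile (16 bytes per tile;
    // the high bitplane would be at +8).
    int patternAddress(int tileIndex, int fineY) {
        int base = (ctrl & 0x10) != 0 ? 0x1000 : 0x0000;
        return base + tileIndex * 16 + (fineY & 7);
    }
}
```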
ArsonIzer wrote:
PS: After a complete rewrite, the Mario Bros bug still remains
Does Mario Bros use any non-basic PPU functionality like sprite 0 hit?
Could you just give us an update of what is different from the 1st post of this thread? (so that we know what changes you have made so that we can have a better idea of what the problem is)
Incorrect SBC for Mario Bros. removes all collision detection. Is sprite RAM being affected by your controller input somehow?
There is no sprite 0 hit detection in Mario Bros. This is a blatant CPU problem IMO. Could you post the CPU source?
tepples wrote:
PocketNES, which offloads tile and sprite rendering to the Game Boy Advance PPU, runs on a 16.8 MHz ARM.
Oh... that is quite efficient then, but that also makes many of the optimizations to emulators almost unnecessary for target CPUs of 100+ MHz, unless it's just for the sake of creating optimizations. Then again, I have an NES emulator on my PlayStation Portable, which typically runs on a 222 MHz MIPS CPU, so I don't know why I was worrying in the first place.
tepples wrote:
Even without mapper control of the CHR address, it's still possible to use PPU port $2000 to switch the background between $0000-$0FFF and $1000-$1FFF.
Ah, that kind of CHR bank switching. I was thinking of the kind that mappers do when a value is written to the cartridge board, like UxROM, which switches between active PRG ROM banks.
WedNESday wrote:
Could you just give us an update of what is different from the 1st post of this thread? (so that we know what changes you have made so that we can have a better idea of what the problem is)
Incorrect SBC for Mario Bros. removes all collision detection. Is sprite RAM being affected by your controller input somehow?
There is no sprite 0 hit detection in Mario Bros. This is a blatant CPU problem IMO. Could you post the CPU source?
1. Not much. I just rewrote the code so that it used less objects and just basically looked at the documentation, wrote as I saw fit, and checked with some other (working) emulator sources whether my implementation looked correct.
2. My controller input doesn't touch anything but memory address $4016, so no, it doesn't. I don't get why this is the one game not functioning control-wise.
3. I'm messing around with the source code right now (implemented some things a bit dirty with how I'm handling reads/writes which invoke the PPU), but I'll post it as soon as I'm done refactoring some messy code.
Okay, so here's my CPU source (http://codepad.org/42q4YS5s). Couple of things you might want to know:
1. I know, excessive commenting, but that's simply so I can remember stuff more easily.
2. static final variables are basically the constants of Java.
3. Methods with an underscore at the start (i.e. _read _write _push _pull) invoke the PPU's cycle method 3x, implying 1 CPU cycle has passed. Methods with the same name but with no underscore (i.e. read write push pull) don't invoke the PPU.
4. The code is in Java, not C++, although the paste site does say C++.
5. I have no cycle counter yet, I just invoke the PPU's cycle method 3 times every cycle, so I don't actually need to count cycles (yet).
6. I have no IRQ implementation yet.
7. The code looks a lot like that of HalfNES (another Java NES emulator), mainly because I took the layout from there after having written 3 crappy and (IMO) badly structured CPU cores. Also whenever I was doubting a certain implementation aspect, HalfNES would be one of the emulators I'd look at immediately.
I'll hear it if anything else needs to be said. Thanks in advance guys.
Just read through the code and it all seems OK to me. It would be a good idea to probably rewrite the entire CPU code and fix that bug now, rather than leaving until later on when it will **** up all other emulated components.
For a massive performance boost and to make your code smaller and more readable, please read this guide about deferring status flags calculation (written by blargg). I know that you haven't written the CPU with performance in mind.
http://www.slack.net/~ant/nes-emu/6502.html
For instance, here is my AND code for all 8 AND opcodes (A and DataBus are 8-bit unsigned). Compare that with what you have written and you will see what I mean:
N = Z = A &= DataBus;
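The same deferred-flag idea in Java, as a sketch with hypothetical names: the result is stored once, and N and Z are derived from it only when the status register is actually needed (PHP, BRK, interrupts, BEQ/BNE/BMI/BPL).

```java
// Deferred N/Z flags sketch (hypothetical names).
class DeferredFlags {
    int a;   // accumulator, 0-255
    int nz;  // last result affecting N and Z; flags are computed from it lazily

    // AND #imm / AND mem: one store replaces separate N and Z updates.
    void opAnd(int value) {
        nz = a &= value;
    }

    boolean flagN() { return (nz & 0x80) != 0; }
    boolean flagZ() { return (nz & 0xFF) == 0; }
}
```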
WedNESday wrote:
Just read through the code and it all seems OK to me. It would be a good idea to probably rewrite the entire CPU code and fix that bug now, rather than leaving until later on when it will **** up all other emulated components.
I wish it was that easy but as I mentioned before... this is a rewrite. If you look at one of my earlier posts, I've uploaded my old source code which used a lot of unnecessary objects. This was a pretty clean rewrite.
WedNESday wrote:
For a massive performance boost and to make your code smaller and more readable, please read this guide about deferring status flags calculation (written by blargg). I know that you haven't written the CPU with performance in mind.
Indeed, performance is the last thing on my mind right now. It's useless if I can't even get the simplest game to work. That said, I do know the article, and I already had it in my bookmarks in case I ever planned on optimizing my emulator.
Holy crap...
I dirtily implemented Mapper 1 for the sake of test ROMs, despite not having much confidence in finding the issue, and I started the official_only.nes test by Blargg...
The program proceeds to execute the individual tests:
Test 1..
Test 2..
Test 3..
Test 4..
Hold up, test 4 failed? LSR Zero Page doesn't work. Hmm, let's check:
Code:
public void lsr(int address) {
    int data = _read(address);
    // ...
    carry = A & 1; // <<< bug: the carry should come from data, not A
    // ...
}
All this time I was setting the carry to bit 0 of A rather than bit 0 of the data read from memory. Just look at the source code I attached in one of my posts a few pages back. Mario Bros now works!
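For reference, a sketch of a corrected memory LSR in the same spirit as the snippet above; `LsrDemo` and its helpers are hypothetical, and the dummy write of the unmodified value reflects how 6502 read-modify-write instructions behave on the bus.

```java
// Corrected LSR on memory (hypothetical names).
class LsrDemo {
    final int[] mem = new int[0x10000];
    int a;  // accumulator: deliberately NOT used by memory LSR
    int carry, zero, negative;

    int read(int addr) { return mem[addr & 0xFFFF]; }
    void write(int addr, int value) { mem[addr & 0xFFFF] = value & 0xFF; }

    void lsr(int address) {
        int data = read(address);
        write(address, data);  // dummy write of the unmodified value (RMW behaviour)
        carry = data & 1;      // bit 0 of the operand, not of A
        data >>= 1;
        zero = data == 0 ? 1 : 0;
        negative = 0;          // bit 7 is always 0 after a logical right shift
        write(address, data);
    }
}
```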
Blargg, I just want to give you a big ol' hug, man; your tests saved my sanity. Thank you so much! And thanks to everyone who was patient enough to spend time helping a seemingly lost cause. You guys have no idea how happy I am.
Nice job. How do mapper 1 games run? Give Mega Man 2 a try, it's a pretty forgiving game for inaccurate emulators.
By the way, those CPU tests will even catch you doing something incorrect like x++ or s |= 0x08 (SED) in your LSR. Obscure things like this can occur when copying and pasting instruction implementations.
miker00lz wrote:
Nice job. How do mapper 1 games run? Give Mega Man 2 a try, it's a pretty forgiving game for inaccurate emulators.
Thanks. My current version (the one with MMC1 support) is cycle accurate but has no sprite implementation yet. Drawing backgrounds seems to be just fine with games like Mega Man 2, Final Fantasy and Zelda. The issue is that some games tend to draw random tiles of black rather than the actual color, e.g. the guy with the white tank top in the Contra start screen, the entire upper half of the start screen in Zelda II and several screens which are heavy on big background characters in Skate or Die. I think that's a problem with my drawing methods in the PPU though, so it should not be too much of a hassle to fix. I guess I'll quickly implement sprites and then I'll see how far I can get.
Blargg wrote:
By the way, those CPU tests will even catch you doing something incorrect like x++ or s |= 0x08 (SED) in your LSR. Obscure things like this can occur when copying and pasting instruction implementations.
I guess I can get pretty careless at times when I'm trying to work towards a goal too fast.
I'm happy that there are so many tests though, I'd be lost without them.
ArsonIzer wrote:
The issue is that some games tend to draw random tiles of black rather than the actual color, e.g. the guy with the white tank top in the Contra start screen, the entire upper half of the start screen in Zelda II and several screens which are heavy on big background characters in Skate or Die. I think that's a problem with my drawing methods in the PPU though, so it should not be too much of a hassle to fix.
My guess is that you're using the background color from the palette specified for the tile rather than always using the background from palette 0.
Quietust wrote:
My guess is that you're using the background color from the palette specified for the tile rather than always using the background from palette 0.
Actually, this is my code for it:
Code:
if (paletteIndex % 4 != 0) {
    c = Palette.COLORS[memory.read(0x3F00 + paletteIndex)];
} else {
    // transparent pixel: use the universal background color
    c = Palette.COLORS[memory.read(0x3F00)];
}
c being a Color object in Java (just R, G, B and A values with some methods), and paletteIndex being a number from 0x0 to 0xF denoting which of the palettes and which of the colors to use. I think this is right, since if paletteIndex % 4 == 0, meaning it's one of the values 0x00, 0x04, 0x08 or 0x0C, it will automatically fetch the background color at 0x3F00.
@ArsonIzer
Contra uses 8x16 sprites to draw the clothes of the men in the title screen.
Here is some code from WedNESday on how to handle the palette thing you just mentioned. This is the best way to go.
int Mask[16] = {0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15};
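The same table in Java, as a sketch with hypothetical names, assuming the index is built as (attribute bits << 2) | pattern bits. Every entry whose low two bits are 0 maps to 0, so transparent background pixels always use the universal background color at $3F00, which is what the if/else above does, without the branch.

```java
// Background palette lookup table (hypothetical names).
class BgPaletteIndex {
    // Entries 0, 4, 8 and 12 (transparent pixels) all map to $3F00.
    static final int[] INDEX = {0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15};

    // attributeBits: 0-3 from the attribute table; patternBits: 0-3 from the two bitplanes.
    static int paletteAddress(int attributeBits, int patternBits) {
        return 0x3F00 + INDEX[((attributeBits & 3) << 2) | (patternBits & 3)];
    }
}
```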
WedNESday wrote:
@ArsonIzer
Contra uses 8x16 sprites to draw the clothes of the men in the title screen.
Here is some code from WedNESday on how to handle the palette thing you just mentioned. This is the best way to go.
int Mask[16] = {0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15};
Of course! That's why my previous version with only 8x8 sprite implementation draws the shirts partially, while my cycle-accurate BG only PPU draws jack diddly-squat. I really need to implement more than this measly 20% of the PPU before I start complaining -_-
Thanks mate. Also, that mask thing looks good; I'll use it instead of the if-else statement.
Yeah as a general rule, avoid if-else statements whenever you can in heavily-used functions. They hurt performance quite a bit, at least on x86. Taken branches hurt. When you DO have to use if-else, you should put what's most likely to be the true case under "else". It takes a fair amount more clock cycles when a branch is taken because it then has to also calculate the effective address of the branch destination. When not taken, the code just falls through.
Modern x86 chips may not be so awful, I'm not sure, but at least in the old days on the 8088/286 era chips it was a substantial penalty.
miker00lz wrote:
Modern x86 chips may not be so awful, I'm not sure, but at least in the old days on the 8088/286 era chips it was a substantial penalty.
Switches hurt too. Even my Pentium III felt the pain.
miker00lz wrote:
Yeah as a general rule, avoid if-else statements whenever you can in heavily-used functions. They hurt performance quite a bit, at least on x86. Taken branches hurt. When you DO have to use if-else, you should put what's most likely to be the true case under "else". It takes a fair amount more clock cycles when a branch is taken because it then has to also calculate the effective address of the branch destination. When not taken, the code just falls through.
Modern x86 chips may not be so awful, I'm not sure, but at least in the old days on the 8088/286 era chips it was a substantial penalty.
Branch penalties on the 8086 up through the 80386 weren't good, but they were comparable to many other operation times, so branching was still usually the right choice. With later machines, the cost of a cache miss became large enough, even with speculative fetch, that using tables became better for things where that never used to be true (at least as long as CS:IP/EIP/RIP is still in-order).
ArsonIzer wrote:
I guess I can get pretty careless at times when I'm trying to work towards a goal too fast.
I'm happy that there are so many tests though, I'd be lost without them.
If you're using a UI toolkit or environment that supports parsing XML files, your emulator has the capability to feed in captured controller inputs, your emulator has the ability to stop after a specific number of PPU cycles, and your UI toolkit or environment has SHA-1 capability, you could do what I did and automate the process of running multiple [almost two hundred in my case] test ROMs. Some test cases repeat test ROMs with different inputs to produce different result screens. It is very handy. For example, I just realized I'm failing one of the sprite overflow tests now. Coupled with git bisect, test automation can shave *minutes* off of bug chasing.
Doesn't even need to be XML; it can be JSON or tab-separated or a variant on INI. All you need is a way to start the emulator playing a movie from the command line (which you need anyway for the TAS crowd) at warp speed (which the TAS crowd will appreciate once you add AVI export) and hash the current screenshot. Of course you need a beefy multicore PC and an efficient emulator if you want to run 200 test ROMs in a reasonable time.
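A sketch of the comparison step using the standard `java.security.MessageDigest` API, assuming the emulator exposes its final frame as an `int[]` of pixels (an assumption; how the frame is produced by movie playback at warp speed is emulator-specific). Each pixel is fed to SHA-1 as four big-endian bytes, and the hex digest can be compared against a stored known-good value.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Screenshot hashing sketch for automated test-ROM runs (hypothetical class name).
class ScreenHash {
    static String sha1Hex(int[] frameBuffer) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        for (int pixel : frameBuffer) {  // each pixel as 4 big-endian bytes
            md.update((byte) (pixel >>> 24));
            md.update((byte) (pixel >>> 16));
            md.update((byte) (pixel >>> 8));
            md.update((byte) pixel);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b & 0xFF));
        return hex.toString();
    }

    // True if the frame matches the known-good hash stored for this test case.
    static boolean matches(int[] frameBuffer, String expectedSha1) {
        return sha1Hex(frameBuffer).equalsIgnoreCase(expectedSha1);
    }
}
```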
WedNESday wrote:
Here is some code from WedNESday on how to handle the palette thing you just mentioned. This is the best way to go.
int Mask[16] = {0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15};
What do you mask (AND) with these entries?
blargg wrote:
What do you mask (AND) with these entries?
Mask[Tile1Bits + Tile2Bits + AttributeBits];
Why? How do you do yours?
WedNESday wrote:
Mask[Tile1Bits + Tile2Bits + AttributeBits];
Why? How do you do yours?
I think the point was the naming. "Mask" implies you're ANDing something with them, but it looks like the table just contains an index into the palette.
Yeah, a mask is like a literal mask, blocking things out and only letting some things show through. For example:
Code:
const uint8_t oam_mask [4] = { 0xff, 0xff, 0xe3, 0xff };
...
result = oam [addr] & oam_mask [addr % 4];
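A Java equivalent of the OAM mask, as a sketch with hypothetical names: bits 2-4 of each sprite's attribute byte (every fourth byte, starting at 2) are not implemented and read back as 0.

```java
// OAM read mask (hypothetical names).
class OamMask {
    static final int[] OAM_MASK = {0xFF, 0xFF, 0xE3, 0xFF};
    final int[] oam = new int[256];

    // Read through $2004: unimplemented attribute bits come back as 0.
    int read(int addr) {
        addr &= 0xFF;
        return oam[addr] & OAM_MASK[addr & 3];
    }
}
```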
Hehe, it was the first verb that came to mind when I wrote it years ago.
Naming is one of the biggest challenges when creating programs. You want it descriptive of what the thing does or is used for, distinguishing from what other things do, unchanging as the program evolves, true to the usual meaning of the word, preferably a common word most programmers understand, not overly long to type. You've got to solve these for dozens of names a day, and they don't have any impact on generated code so it's difficult to find the motivation to think of the long-term.
blargg wrote:
Naming is one of the biggest challenges when creating programs. You want it descriptive of what the thing does or is used for, distinguishing from what other things do, unchanging as the program evolves, true to the usual meaning of the word, preferably a common word most programmers understand, not overly long to type. You've got to solve these for dozens of names a day, and they don't have any impact on generated code so it's difficult to find the motivation to think of the long-term.
I couldn't agree more, but in this case Mask[] is only used twice for obvious reasons, so it's not important. I was gonna change its name later on anyway.