C++ WTF - NESdev BBS

C++ WTF
by WedNESday on 2013-01-01 (#105359)

On creation of a mapper 3 game...

Code:

case 3:
   VideoRAM = 0;
   break;

While fetching either VRAM or VROM data...

Code:

if (VideoRAM)
   Data = VRAM[....];
else
   Data = VROM[....];

After the working Arkanoid title screen (which means VROM has been used so far), just as you go to play, my emulator crashes with an error. Upon debugging it says that it tried to access VRAM. Nowhere in the program is VideoRAM being set to 1 by mistake, and if I try to manually trap the illegal access with...

Code:

if (VideoRAM)
{
   MessageBox(NULL, "WARNING", "", 0);
   Data = VRAM[....];
}
else
   Data = VROM[....];

...nothing happens because VideoRAM never equals 1 but it still goes on to crash straight away on the next line.

I have deleted my PCH once already and that helped but it came back after a few tries. What's more, it only seems to happen about 80% of the time. Just WTF is going on?

Re: C++ WTF
by thefox on 2013-01-01 (#105361)

"Happens 80% of the time" usually means some kind of concurrency problem. It's impossible to say based on the information that you gave. Because you talked about PCH files, I assume you use Visual Studio, so why don't you debug from the point where the crash happened? (Look at the disassembly if you have to.)

Re: C++ WTF
by WedNESday on 2013-01-01 (#105365)

I can't debug any further from the point where it crashes as the Ignore button is greyed out.

It has nothing to do with the PCH, as regardless of whether I delete that or not the problem still persists.

Re: C++ WTF
by natt on 2013-01-01 (#105370)

WedNESday wrote:

I have deleted my PCH once already and that helped but it came back after a few tries.

I highly doubt it has anything to do with PCH files, but if you think it does, just turn them off.

Re: C++ WTF
by Dwedit on 2013-01-01 (#105371)

Sounds like an access violation? I've seen some programs put in SEH handlers (structured exception handlers) to allow execution to continue despite an access violation.

But anyway, are you sure it's not just uninitialized or corrupted memory?

Re: C++ WTF
by Zelex on 2013-01-01 (#105380)

thefox wrote:

That, or it could also be memory corruption.

Re: C++ WTF
by rainwarrior on 2013-01-01 (#105381)

Why not put a data breakpoint on the variable in question?

Re: C++ WTF
by cpow on 2013-01-02 (#105396)

WedNESday wrote:

Code:

case 3:
   VideoRAM = 0;
   break;

While fetching either VRAM or VROM data...

Code:

if (VideoRAM)
   Data = VRAM[....];
else
   Data = VROM[....];

There's no C++ here. :?:

It's things like this that give C++ a bad name. :roll:

WedNESday wrote:

I have deleted my PCH once already and that helped but it came back after a few tries. What's more, it only seems to happen about 80% of the time. Just WTF is going on?

Check your CHR bank switching code. Make sure you're never selecting VROM banks other than 0 or 1. I'm assuming you mean Arkanoid (U).nes. When I run that it is CNROM and has 2 CHR banks. Make sure that no matter what is written to $8000 - $FFFF you only use the number of bits necessary to select between the actual number of banks present in the image.

I looked at my CNROM implementation and I don't mask any bits so that implies that the correct values are written to $8000 - $FFFF at least by Arkanoid (U).nes [which works for me].
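A rough sketch of the masking described above, for illustration only. The names `select_chr_bank` and `chr_bank_count` are made up, and the bank count is assumed to be a nonzero power of two (as it would be for a 2-bank CNROM image):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reduce a value written to $8000-$FFFF to a valid
// CHR bank index. Assumes chr_bank_count is a nonzero power of two.
inline unsigned select_chr_bank(std::uint8_t written, unsigned chr_bank_count)
{
    assert(chr_bank_count != 0 && (chr_bank_count & (chr_bank_count - 1)) == 0);
    return written & (chr_bank_count - 1);   // keep only the bits that matter
}
```

With two banks, a write of $FF selects bank 1 and a write of $02 selects bank 0, so the index can never run past the end of VROM.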

Re: C++ WTF
by WedNESday on 2013-01-02 (#105402)

1. What's wrong with my C++?

2. The problem seems to have fixed itself now since I have mended the PPU renderer. Although I must admit that this is not the first time that I have had code self modify.

Re: C++ WTF
by Jsolo on 2013-01-02 (#105403)

Code does not modify itself unless explicitly told to. You are probably a victim of the common buffer overflow (bufferus overflowis).
C++ provides many abstractions to prevent buffer overflows, for instance std::vector<> or std::array<>.
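As a minimal sketch of what that buys you (the function name `read_entry` and the four-entry table are only assumptions for the example): `std::array::at()` checks the index and throws `std::out_of_range` instead of quietly touching whatever sits next to the array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical example: a bounds-checked read from a fixed-size table.
// Returns -1 when the index is out of range rather than reading past the end.
inline int read_entry(const std::array<int, 4>& table, std::size_t index)
{
    try {
        return table.at(index);   // at() throws std::out_of_range on a bad index
    } catch (const std::out_of_range&) {
        return -1;
    }
}
```

Note that `operator[]` on `std::array` and `std::vector` does no such check; only `at()` does.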

Re: C++ WTF
by mic_ on 2013-01-03 (#105497)

For issues like these you might want to make use of utilities like Valgrind or Electric-Fence in order to find potential error sources faster.

Quote:

Upon debugging it says that it tried to access VRAM.

So it actually halted execution and entered the VS debugger? In that case you should be able to see what value VideoRAM has when execution halts. Does it look valid (you seem to be writing only 0 and 1 to it, so then it should never contain anything else)? If it doesn't you might be having problems with buffer overflows as Jsolo suggested.
Another potential error source would be concurrency, though I have no idea how you've structured your emulator, and it's impossible to say from the snippets of code pasted here. If you've got dependencies on a global variable in more than one thread and at least one of those threads is modifying the variable, you'll typically want to make those pieces of code mutually exclusive with one another.

Quote:

Nowhere in the program is VideoRAM being set to 1 by mistake

At which points do you set it to 1 on purpose though? Might that have happened sometime earlier during execution, and now you're looking at an old value of VideoRAM while VRAM has been freed?

Quote:

are you sure it's not just uninitialized ... memory?

Only local variables (i.e. those allocated on the stack) are truly uninitialized in C++. A global or local static variable should contain a zero value if not explicitly initialized.
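A small sketch of that rule, with made-up names (`g_counter`, `read_global`, `read_static_local`). Objects with static storage duration are zero-initialized before any other initialization happens, so both reads below yield 0:

```cpp
#include <cassert>

int g_counter;              // namespace scope: zero-initialized before main() runs

inline int read_global()
{
    return g_counter;
}

inline int read_static_local()
{
    static int calls;       // local static: also zero-initialized
    return calls;
}

// By contrast, a plain local such as `int x;` has an indeterminate value,
// and reading it before assigning is undefined behaviour, so no such read
// is shown here.
```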

Re: C++ WTF
by tepples on 2013-01-04 (#105503)

I was under the impression that the use of global and static variables was "considered harmful" nowadays. Global variables make it harder to turn an object into one that supports multiple instances (such as the parallel emulators that power nemulator's Wii Menu-style game chooser). Global variables also interfere with multithreading, especially now that computers have multiple cores, and cores in low-cost or low-power CPUs such as Atom and PPE have simultaneous multithreading to hide the negative effects of in-order execution.

Re: C++ WTF
by koitsu on 2013-01-04 (#105508)

tepples wrote:

I was under the impression that the use of global and static variables was "considered harmful" nowadays. ... Global variables also interfere with multithreading, especially now that computers have multiple cores, and cores in low-cost or low-power CPUs such as Atom and PPE have simultaneous multithreading to hide the negative effects of in-order execution.

Utter nonsense.

Re: C++ WTF
by Jsolo on 2013-01-04 (#105510)

tepples wrote:

I was under the impression that the use of global and static variables was "considered harmful" nowadays.

Global variables have their place, as do macros and all other language features. There are situations where using global variables is perfectly fine.

tepples wrote:

Global variables also interfere with multithreading.

C++11 fixed this by introducing thread_local.
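A minimal sketch of what `thread_local` does, assuming a C++11 compiler with thread support (the names `t_counter` and `run_worker` are invented for the example). Each thread gets its own copy of the variable, so this only helps when the state does not actually need to be shared; anything that must be shared still needs synchronization.

```cpp
#include <cassert>
#include <thread>

thread_local int t_counter = 0;   // every thread gets its own copy

// Hypothetical demo: a worker thread bumps its own t_counter 1000 times and
// reports the result; the calling thread's copy is left untouched.
inline int run_worker()
{
    int worker_result = 0;
    std::thread worker([&worker_result] {
        for (int i = 0; i < 1000; ++i)
            ++t_counter;
        worker_result = t_counter;
    });
    worker.join();                // after join(), reading worker_result is safe
    return worker_result;
}
```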

Re: C++ WTF
by WedNESday on 2013-01-05 (#105603)

Just a quick update for you guys. As I have fixed elements of broken mapping/mirroring/timing etc. the whole thing has gotten a lot better. I think that maybe before the PPU was getting into a loop that it couldn't get out of or something.

Re: C++ WTF
by thefox on 2013-01-05 (#105616)

When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.

Re: C++ WTF
by koitsu on 2013-01-06 (#105622)

thefox wrote:

When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.

+1

Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue. I have a friend who writes code like this and makes preposterous claims like "calloc() crashes my program but using malloc() works just fine so the compiler or underlying C libraries obviously are broken" (I gave up talking to him about this sort of thing long ago).

Re: C++ WTF
by WedNESday on 2013-01-06 (#105630)

thefox wrote:

When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.

+1

Couldn't agree more. But to be honest I think that if it was getting stuck in the PPU loop because there were not enough cycles being emulated then the problem has since gone away. Of course I will go over all my code at some point to check for the likes of buffer overflows.

Re: C++ WTF
by tepples on 2013-01-06 (#105638)

koitsu wrote:

Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue.

It could be that, or it could be that he did consult the documentation and found it incomplete. This was the case for some of the OAM refresh "bugs" in the NES PPU that were characterized starting in early 2009.

Quote:

I have a friend who writes code like this and makes preposterous claims like "calloc() crashes my program but using malloc() works just fine so the compiler or underlying C libraries obviously are broken"

I seem to remember one version of the C library for Windows segfaulting on free(NULL); when that's supposed to be a no-op according to the C standard. If a difference between the library's behavior and the standard can be demonstrated in a 20-line program, isn't it the library's fault?

Re: C++ WTF
by Zelex on 2013-01-07 (#105703)

tepples wrote:

koitsu wrote:

Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue.

Quote:

Possibly, but these situations are exceedingly rare. 99.9999% of the time it is user error.

Re: C++ WTF
by rainwarrior on 2013-01-07 (#105713)

I'd say I encounter a bug produced by a compiler about once every year or two. It's pretty common for the programmer to blame the compiler at first, but yeah, it's generally pretty rare that this is really the case. Slightly more common is being able to crash/fatal-error the compiler, which doesn't produce a bug since it failed to build your code, but can be pretty annoying to resolve. Library errors are not terribly uncommon, I'd say, especially with younger ones, or platform specific things that don't have as wide a testing net.

One of my favourite bugs that we tried to blame on a compiler was corruption of 32-bit floats that were being endian swapped for big-endian platforms. The process involved reinterpreting a float as a 32-bit int, swapping its bytes, then reinterpreting it back into a float. This occasionally produced a NaN, which if loaded into the floating point unit would end up changing a few bits. However, with optimization on sometimes the conversion back to a float was optimized away, and the FPU was bypassed, leaving it intact. The result was something that produced occasional corrupt data if run in debug, and very little corrupt data if run in release, so this bug actually lived for maybe 6 months before somebody figured out that something was wrong with the data being produced, and yeah, our initial reaction was to blame the compiler, until we looked hard at the assembly and couldn't find anything wrong with it. (The lesson learned was never to put the data back into a float type after endian swapping it, just keep it as an integer until you write it to disk.)
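A sketch of that lesson, not the code from the project in question: the helper name `swapped_float_bits` is hypothetical and a 32-bit `float` is assumed. The swapped pattern is returned as an integer and never stored back into a `float`, so the FPU never gets a chance to touch it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the byte-swapped bit pattern of a float as an
// integer, ready to be written out. Assumes float is 32 bits wide.
inline std::uint32_t swapped_float_bits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // well-defined way to get the bits
    return  (bits >> 24)
         | ((bits >>  8) & 0x0000FF00u)
         | ((bits <<  8) & 0x00FF0000u)
         |  (bits << 24);
}
```

On a platform with IEEE 754 floats, 1.0f has the bit pattern 0x3F800000, so the function returns 0x0000803F for it.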

Anyhow, it's a valid instinct to suspect the compiler, I think. It will be wrong most of the time, but on those few occasions where you can follow through and actually find a problem with the code it's producing, it pays off when you can report the bug and the compiler gets fixed.

Re: C++ WTF
by WedNESday on 2013-01-07 (#105715)

Code:

...
int Mapper;
int Mirror[4];
int NMI;
...

becomes

Code:

...
int WhatFetch;
int X;
int Mirror[4];

...fixes most of my errors.

W. T. F.

Re: C++ WTF
by Dwedit on 2013-01-07 (#105718)

Stack corruption? Out of bounds writes on that last array?

Re: C++ WTF
by WedNESday on 2013-01-07 (#105720)

No out of bound writes on that array whatsoever. How would I check for stack corruption?

...btw I did uninstall Visual Studio from D: and reinstall to C: WITHOUT reinstalling the OS. When it first ran I did get a load of weird error messages. Could that affect something?

Re: C++ WTF
by rainwarrior on 2013-01-07 (#105721)

When you find a variable is getting an unexpected value in it, put a data breakpoint on it. This will break whenever it gets written to, and you can often find the offending piece of code quite easily this way. (You can even make it break only when the offending value gets written.)

Stack corruption can be really hard to deal with, since it usually leaves you with unreadable/garbage call stack information in the debugger. I've usually had to do a lot of printf-style instrumentation of the code to try and figure out exactly how far it gets before crashing to figure out where to look.

Re: C++ WTF
by WedNESday on 2013-01-07 (#105722)

The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.

Re: C++ WTF
by cpow on 2013-01-07 (#105723)

WedNESday wrote:

The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.

Then you're memcpy()ing or otherwise walking beyond the end of some other nearby array in memory...

If you ever see something in memory you know your code isn't writing there, you're usually wrong. Your code is writing it there but not the code you'd expect. For example, looking at all of your accesses to Mirror[] will not lead you to the culprit--it may actually lead you to some other problem if you discover that you're accessing Mirror[] with an out-of-bounds index, but that's not what's causing this problem. Looking at accesses to nearby arrays could help...but it's a long shot.

Set a breakpoint somewhere that you'll hit each time through a frame. See if you can narrow down when the corruption is occurring. Does it start out bad?
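One way to narrow it down without stepping by hand is a cheap sanity check called once per frame. This is only a sketch: the function name is made up, and it assumes Mirror[] is a four-entry int array whose valid values are 0 to 3, as described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-frame sanity check: returns the index of the first entry
// outside 0..3, or -1 if all four entries look valid.
inline int first_bad_mirror_entry(const int (&mirror)[4])
{
    for (std::size_t i = 0; i < 4; ++i)
        if (mirror[i] < 0 || mirror[i] > 3)
            return static_cast<int>(i);
    return -1;
}
```

Log or break on the first frame where this returns something other than -1; whatever ran during that frame is where to start looking.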

Re: C++ WTF
by koitsu on 2013-01-07 (#105730)

cpow wrote:

WedNESday wrote:

The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.

Then you're memcpy()ing or otherwise walking beyond the end of some other nearby array in memory...

Given this earlier thread showing exactly that, I would say your theory is likely. Not picking on ya WedNESday! Just saying cpow's proposal sounds likely given some established history.

Too bad Windows doesn't have native valgrind; it can usually detect this kind of thing.

I'll expand on what cpow wrote here with something that's a little more technical but might make more sense to you:

cpow wrote:

If you ever see something in memory you know your code isn't writing there, you're usually wrong. Your code is writing it there but not the code you'd expect. For example, looking at all of your accesses to Mirror[] will not lead you to the culprit--it may actually lead you to some other problem if you discover that you're accessing Mirror[] with an out-of-bounds index, but that's not what's causing this problem. Looking at accesses to nearby arrays could help...but it's a long shot.

The reason it's a "long shot" and so on has to do with how OSes handle memory allocation. When a program starts there's actually a boatload of memory allocated all over the place (based on relevant executable header data and all the underlying segments defined in the executable itself -- yes I'm greatly and intentionally simplifying). A crash/exception (again keeping it simple) only happens when trying to access memory that is outside of the allocated space for your program -- anything that your program has allocated (either intentionally, or the kernel allocating for your program as a result of the program loading, etc.) is game for being accessed (read or written) without any complaint.

Memory allocation schemes in an OS work in pages -- sequential amounts of memory that are not necessarily back-to-back linear. Phrased differently, let's say you have this line: foo = malloc(65536); bar = malloc(65536); You might be inclined to think that the underlying VM would allocate both 64KBytes back-to-back so that you could technically access foo[65536] and foo[65537] and actually be accessing the memory pointed to by the bar pointer. That assumption is wrong -- however, there may be memory (for other reasons) allocated for your program past that 64KByte allocation (referring to what foo points to) that can be accessed without an exception being generated. It could be for some variables you allocated on the heap or the stack (either one). It could be for some underlying API bits that your program uses that allocate memory themselves. All this is memory your program technically owns, which means you're actually free to access it in whatever ways you wish -- intentional or unintentional. This is how, for lack of a better term, "memory gets corrupted" when a program does something it shouldn't be doing.

The result of this is often the programmer resorting to stupid ideas that "seem" to work and make him/her think they've solved the problem. Things like "I turned off optimisation and the problem is gone", "the issue doesn't happen if I enable debug symbols", "if I run 5 instances of the program the 4th one works fine", or screwing around with stack size (I really hate it when I see people do this). All these result in the programmer suddenly believing the underlying OS or system "is unstable" when in fact it's their software that's broken.

I've mentioned this before (in the same thread I linked above actually). My point in bringing that up is that depending on where the VM decided to allocate memory for the pointer called Pixel, it could be next to memory used for other things. When I say "other things" I mean quite literally anything relating to your program. Again: accessing something out-of-bounds that's still associated with your process memory space won't result in an exception.

I myself learned about this the hard way, maybe a year or so after I had started learning C. I had a piece of code (a simple fread() call and nothing more) that worked when using -O2 (optimisation level 2, i.e. more optimisations), but broke (crashed) when using -O1 or -O0. I had no idea why; I started blaming the compiler because the situation seemed backwards (I'd heard of optimiser bugs but generating working code with -O2 but crashing code with -O1 or no optimisation?) and I was pompous. It wasn't until "other mysterious issues" happened a week later that I compared my code to that of an open-source program. It took me a while to understand what was going on, but it was quite simply the exact same thing you experienced above with COLORREF *Pixel (but for me it was with FILE *fp vs. FILE fp and how -- or rather, what -- I was passing to fread()).

Tracking down out-of-bounds accesses like this is somewhat difficult and often requires that you build your binaries with a kind of "guard" or "wrapper" that may wrap itself around every single system or library call in an attempt to do the messy work for you. I mentioned valgrind above; it does some of this as a wrapper, but there are other solutions that involve compiler features or third-party libraries that inject themselves into things (i.e. malloc() might now actually call a third-party library to do some tracking, which then hands off to the real malloc()). I'm sure there are tools for this under Windows; I just don't have enough familiarity with Windows development to be able to recommend any. :(
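To make the "guard" idea concrete, here is a hand-rolled sketch. It is not how valgrind or any particular library works, and every name and constant in it (`guarded_alloc`, `guard_intact`, `kGuardSize`, `kGuardByte`) is invented for the example. Extra bytes with a known pattern are placed behind each allocation and checked later; if the pattern has changed, something wrote past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kGuardSize = 16;        // assumed guard length
constexpr unsigned char kGuardByte = 0xA5;    // assumed fill pattern

// Hypothetical wrapper: allocate `size` usable bytes followed by a guard
// region filled with a known pattern. Returns nullptr on failure.
inline unsigned char* guarded_alloc(std::size_t size)
{
    auto* p = static_cast<unsigned char*>(std::malloc(size + kGuardSize));
    if (p != nullptr)
        std::memset(p + size, kGuardByte, kGuardSize);
    return p;
}

// Returns true if nothing has written into the guard region behind the
// `size` usable bytes, i.e. no overrun of up to kGuardSize bytes was seen.
inline bool guard_intact(const unsigned char* p, std::size_t size)
{
    for (std::size_t i = 0; i < kGuardSize; ++i)
        if (p[size + i] != kGuardByte)
            return false;
    return true;
}
```

A check like this only catches small overruns, and only at the moment you call `guard_intact()`, which is why dedicated tools are usually the better choice when they are available.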

Re: C++ WTF
by blargg on 2013-01-07 (#105733)

Not sure if anyone mentioned, but enabling all warnings and adjusting code to quiet them is a good way to let the compiler help you.

Re: C++ WTF
by koitsu on 2013-01-07 (#105738)

blargg wrote:

Not sure if anyone mentioned, but enabling all warnings and adjusting code to quiet them is a good way to let the compiler help you.

...until you find people forcing typecasts to squelch said warnings (which in my experience is only necessary maybe 20-30% of the time; the rest are usually indicators of something anomalous). I strongly recommend -Wall -Werror. I still remember back in the early 90s when I was learning C and literally everyone I knew who did C kept telling me to "just ignore warnings". This came from a good 8 or 9 people. To this day it was the worst advice a large number of people (who now do professional programming) ever gave me. They were so incredibly wrong.

For folks using gcc, this is what I've used for many years on my own projects for debug/beta builds; all of these are removed only for final production releases:

Code:

-g3 -ggdb -Werror -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wsign-compare -Wstrict-prototypes -Wunreachable-code -Wwrite-strings

For production, the only thing I use is:

Code:

-fno-inline

HTH.

Re: C++ WTF
by blargg on 2013-01-07 (#105740)

Just in case anyone is wondering, there's no way to turn on all GCC warnings (despite -Wall), thus the list is necessary. It's a pity this is the case, since it means that each time they add new warnings, you must be sure you've enabled them. If they truly had a "turn all warnings on", you could add that to your options, then turn any off that you don't want. So when they add new warnings, you'd have them on, and if they were duds, you'd turn them off (-Wno...).

Re: C++ WTF
by WedNESday on 2013-01-08 (#105752)

All fixed now guys. Turns out it was a buffer overflow in FetchData[], not Mirror[]. Thanks for all your help.