Ok - so I'm working up a theory here based on things I think I once saw somewhere.
I have a graphic glitch that I cannot explain. Certain objects are drawn accompanied by bizarre glitched-out horizontal lines. This seems to have nothing to do with the object data itself, as duplicates of the objects work fine.
I tried very hard to comb through everything for memory corruption and couldn't find any. I have isolated it to my draw routines, but can't quite pinpoint the exact place where it is happening, and like I said, it doesn't seem to be any particular input data that is doing it.
A few emulators, including our custom one, run fine with no glitch at all on these objects. But FCEUX and actual hardware-on-cart present the problem, leading me to believe it has something to do with timing in some way.
So...it got me thinking. I'm snagging *what sprite to draw* in a complex routine involving some indirect addressing, pulling data from a LUT that is generated by my screen tool (all the various objects' animation data and whatnot). Could this problem result from indirect addressing crossing a page boundary? I mean, the amount of data is sort of variable (monsters can be different sizes, have more animations, etc., which changes the size of the tables) ... so this seems to track. But...
I have no idea how to check for this or to solve the problem. Like - does this sound probable/possible as a reason to cause this sort of problem? And how would I determine it for sure? And...how would I go about fixing it?
Any thoughts? Thanks!
Can't think of anything based on the description alone. Page crossing wouldn't be my first guess, as all that'd do is cause your code to be a little slower (unless you're doing some funny pointer math). This could be a problem if you were running this sort of complex processing during vblank, but I hope you're not.
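To put a number on "a little slower": a minimal Python sketch of the cycle cost of `LDA (zp),Y`, which takes 5 cycles plus 1 more when adding Y carries into the next page. The function name and the example addresses are made up for illustration.

```python
def lda_indirect_y_cycles(pointer_target: int, y: int) -> int:
    """Cycles for LDA (zp),Y: 5, plus 1 if adding Y crosses a page."""
    base = pointer_target & 0xFFFF
    effective = (base + y) & 0xFFFF
    # A page is the high byte of the address; a change means a crossing.
    crossed = (base & 0xFF00) != (effective & 0xFF00)
    return 5 + (1 if crossed else 0)


if __name__ == "__main__":
    print(lda_indirect_y_cycles(0x8010, 0x20))  # stays inside $80xx
    print(lda_indirect_y_cycles(0x80F0, 0x20))  # spills into $81xx
```

The data that gets read is the same either way. Only the timing shifts, which is why this matters mostly for cycle-counted code.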
Can you post a video or an animated gif to let us know what the glitch looks like, precisely? "Glitched out horizontal lines" could mean a lot of things.
BTW, FCEUX is not the best option for testing PPU timing issues. Its PPU emulation leaves a lot to be desired (especially the "Old PPU", but the new one has also given me some weird results a couple of times). If possible, use a more accurate emulator to capture the glitch.
The 6502 has a "bug" if the address of the indirect pointer itself crosses a page, but it's really obscure. For most instructions, the indirect address has to be on the ZP, so it specifically only applies if you were trying to store the pointer at $FF. (This does not apply to the place the pointer points to. That part crosses pages just fine.)
The indirect JMP is the only one not restricted to ZP, and ca65 does give a warning for this case: Warning: "jmp (abs)" across page border
The ZP indexed modes (not indirect) all wrap to ZP, but it would be highly unusual to try to store an array in a way that triggered this problem. E.g. if you had a 32 byte array at $0F0-$110 somehow? (Why would you put part of an array in the stack area?)
So... unlikely that this is your problem, I think? It's not normally something that has to be worried about.
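For anyone who wants to see the pointer-fetch quirk spelled out, here is a rough Python model of how the NMOS 6502 fetches the two pointer bytes. The function names and the memory contents are made up for illustration. This is a sketch of the address arithmetic only, not an emulator.

```python
def jmp_indirect_target(memory, pointer):
    """JMP (abs): the high byte is fetched without carrying into the next page."""
    lo = memory[pointer]
    # Only the low byte of the pointer address increments, so $10FF wraps to $1000.
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    return (memory[hi_addr] << 8) | lo


def zp_indirect_base(memory, zp):
    """(zp),Y: both pointer bytes come from zero page, so $FF wraps to $00."""
    lo = memory[zp & 0xFF]
    hi = memory[(zp + 1) & 0xFF]
    return (hi << 8) | lo


if __name__ == "__main__":
    memory = bytearray(0x10000)
    memory[0x10FF] = 0x34  # low byte of the intended target $1234
    memory[0x1100] = 0x12  # high byte the programmer meant to use
    memory[0x1000] = 0x56  # high byte the CPU actually reads
    print(f"${jmp_indirect_target(memory, 0x10FF):04X}")

    memory[0x00FF] = 0x00  # pointer low byte stored at $FF
    memory[0x0000] = 0x80  # high byte is taken from $00 ...
    memory[0x0100] = 0x90  # ... not from $0100
    print(f"${zp_indirect_base(memory, 0xFF):04X}")
```

Note that in both cases the quirk is about where the pointer itself lives. The address it points to can sit anywhere and cross pages freely.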
Here is a link to a video where I show the bug fairly thoroughly and why it's vexing me so...
https://vimeo.com/205810305
I'm pretty certain the only way you should be able to get that kind of glitch is if something is writing to the PPU during rendering—especially if it shows up in FCEUX. ... so hopefully you should be able to use FCEUX's breakpoints to trace back what's causing that write.
This is sort of what I thought too - but if you watch the video, and how it doesn't happen with the object, but does with the copy, and the copy copy, and does with the copy copy copy, but NOT with the copy copy copy copy of the object....it tends to lend more credence to it being in some way memory related, no? I mean, I'm all for trying to go tracing back through to try to find writes to PPU while rendering is on, but I'm pretty sure they're all covered...and that still doesn't seem to explain why it's happening when it is.
Thank you for the response though. And I'll certainly try tracing back through again!
Agree with the theory that the PPU address is pointing at palette. This might be the case of multiple (software) bugs working together to manifest some weird behavior. Also I noticed the new object you added was #17, maybe there's some buggy 16-bit math, like a multiplication by 16, that is corrupting memory somehow.
Dustmop - that occurred to me too, though I can pretty definitely rule it out (the 16 thing). It is sort of variable depending on objects being loaded. So for instance, the first time this happened, I'd loaded the monsters first, whereas this time, the NPCs...I think it manifested on the 18th object that time. This lends credence to it being memory-location related, too, as each object might be made up of more or fewer sprites, have more or fewer animations, etc.
Also, there doesn't seem to be any logical reason where, in this instance, by the 19th or 20th object, it would have righted itself that I can think of?
I'm certainly open to it being *multiple bugs* working together. Just how it is manifesting is incredibly bizarre to me. I appreciate all the thoughts, for sure...hopefully someone spins my brain in a different way!
What's that "NES Maker" program you're using? Custom made tool?
Punch - yes, and info will be forthcoming about that when the game is done...that's our first priority. Honestly, some info about it will likely be hitting as early as next week. If you're unfamiliar with the project -
http://www.thenew8bitheroes.com/ a ton of people here are featured and were integral in the whole project's development
But anyway, back to the glitch...I added a dummy loop in the main code just to evaluate the timing...it *moved* the glitch lines down the screen a bit. So it definitely feels like a timing issue. But in context, I don't get it...
Definitely looks like mid-screen messing with the PPU to me as well.
In FCEUX, I'd recommend doing this:
- 1. Put a breakpoint on $2000-2007 writes. Leave it disabled.
- 2. Advance to the glitch screen.
- 3. Stop execution (click "Step Into" in the debugger).
- 4. Look at the scanline counter. Click "Run Line" a bunch of times until you're out of NMI, maybe scanline 10 or so.
- 5. Now enable the breakpoint and hit "Run".
If you're writing to PPU in the middle of the screen that frame, it should break and it hopefully would be easy to figure out from there. If it's not happening every frame, you might have to try steps 3-5 a few times until you catch a "glitch" frame.
For some extra information on what's going on, you can enable a trace between step 4 and 5 and save an execution log.
Thanks - I did try something similar to this...from my understanding it would've had essentially the same effect...
I set up a breakpoint on writes to 2000-2007 from the start. I just let it catch the writes during the screen loading nonsense (skipped past them) and then ran it during normal gameplay (when the screen was glitching). It didn't catch any writes during that time while the glitch was happening.
This should've had the same result, right? If it was a stray write to 2000-2007, it *should* have caught it then?
If there's RAM contamination, I guess it's also possible it's writing to one of the PPU mirrors. Try extending the breakpoint range all the way up to $3FFF...
(Also, what cart hardware are you targeting?)
Reads of $2007 and its mirrors also affect PPU state, so maybe extend the breakpoint to reads as well.
Yeah, that's a good point. Also break on reads, and mirrors. Break on $2000-$3FFF, not just $2000-$2007. (If you have a bug that's writing to the PPU by accident, it could easily be at a mirrored address.)
From the look of it, I'd bet it's a write to an address that ends with $5 or $D.
Actually, if you've already ruled out writes to $2000-$2007 you might break on $2008-$3FFF only, and save yourself the trouble of filtering out the false positives from well-behaved code during NMI.
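To make the mirroring concrete: the eight PPU registers repeat every 8 bytes across $2000-$3FFF, so only the low three address bits pick the register. A small Python sketch follows. The function name is made up for illustration, and the register names are the ones conventionally used in NES documentation.

```python
PPU_REGISTER_NAMES = (
    "PPUCTRL", "PPUMASK", "PPUSTATUS", "OAMADDR",
    "OAMDATA", "PPUSCROLL", "PPUADDR", "PPUDATA",
)


def ppu_register_for(address):
    """Return the canonical $2000-$2007 register a CPU address hits, or None."""
    if not 0x2000 <= address <= 0x3FFF:
        return None
    # Only the low three bits are decoded, so the registers repeat every 8 bytes.
    return 0x2000 + (address & 0x0007)


if __name__ == "__main__":
    for addr in (0x2005, 0x200D, 0x3FFD, 0x2008, 0x4000):
        reg = ppu_register_for(addr)
        if reg is None:
            print(f"${addr:04X} is not a PPU register")
        else:
            print(f"${addr:04X} -> ${reg:04X} {PPU_REGISTER_NAMES[reg & 7]}")
```

This is why an address ending in $5 or $D lands on $2005, and why a breakpoint limited to $2000-$2007 can miss a stray write.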
My money is on a 2001 write...actually 2 writes, 1 to disable rendering, then immediately one to enable it.
Writes to 2005 or 2006 or 2007 would cause scroll changes. I see no jitter, so NO.
Writes to 2000, could switch the nametable, so that could be possible, if the other nametable were blank (Black).
PLACE YOUR BETS NOW!
I'd reckon this is actually a big combination of a lot of things that have been suggested. A write to the PPU, caused by a flawed 16-bit calculation that produced an indirect pointer.
Brian Parker for the win.
This one was hiding in plain sight. It took him 4 minutes to find with just the ROM, and took me three to fix once I knew *where* the problem was. Found it instantly. And it was certainly user error. I'll tell you how the problem manifested for anyone who is interested.
My static bank = full. I mean...buckling. Maybe five or six bytes left in it. I have had to continue to move chunks of data into overflow banks for the last year as the engine has grown in complexity. Sometimes I'm just in a rush when I do this, and I decide, "no...no, let's keep that part in static bank....what else can we move?".
It seems that I started to move some of the drawing code (specifically, the *figure out what sprite should I draw* code) to an overflow bank, and then must've changed my mind. I deleted the bank switch...but never the switch back.
I'm guessing, if I'm tracing it right, this gave me a wonky y value, and it was trying to change momentarily to bank #$20, which means it was trying to load chr bank 1. Brian saw that there was a chr bank swap, and that made me straighten...I am not doing anything with chr bank swapping (at least, not yet). When he found that, I found the problem almost immediately.
By the way - can I say once again how humbled I am to be part of this community, and how awesome and supportive all of you have been with all of this? No joke, this is like the last sane safe haven on the internet. I hope I get to meet all of you in person one day and buy you all drinks!
Noticing that nobody mentioned CHR banking, myself included. For some reason I'd presumed you were using CHR-RAM, but I don't know why I thought that. What mapper are you using, by the way?
I am using CHR RAM, indeed, which was why this was funky, and as soon as he noticed the CHR bank switching, I knew right where to look (since I'm not doing that! haha).
Using INL's *mapper 30*...essentially an UnRom with 512.
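With the mapper named, the #$20 value from the stray bank switch can be decoded. This is a hedged Python sketch that assumes the commonly documented mapper 30 (UNROM 512) register layout: bits 0-4 select the PRG bank, bits 5-6 the CHR RAM bank, and bit 7 the nametable. The function and key names are made up for illustration, and the layout should be checked against the board documentation before relying on it.

```python
def decode_mapper30_write(value):
    """Split a bank register write into fields (assumed layout: N CC PPPPP)."""
    value &= 0xFF
    return {
        "prg_bank": value & 0x1F,          # bits 0-4 (assumed)
        "chr_bank": (value >> 5) & 0x03,   # bits 5-6 (assumed)
        "nametable_bit": (value >> 7) & 1, # bit 7 (assumed)
    }


if __name__ == "__main__":
    print(decode_mapper30_write(0x20))
```

Under that assumption, writing #$20 asks for PRG bank 0 and CHR bank 1, which fits the unexpected CHR bank swap that gave the bug away.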