How do you guys debug your code?

by albailey on 2007-02-19 (#22034)

This is a question about 6502 code written to run on the NES (and its emulators).

I'm wondering how you guys debug your 6502 code: what tools and/or techniques do you use?

Personally, when I write a snippet of 6502 and I'm not sure if I've done it right, I double-check it in 6502 simulator, then add it into my code stream.

But over the weekend I made a bone-headed mistake. I tested my code in 6502 simulator as best I could, but I introduced the mistake when I ported it to CA65.
I was using a mask as part of my joypad input routine and had declared the mask:
JOYPAD1_SELECT_MASK = $04

And then I used it incorrectly, like this:
AND JOYPAD1_SELECT_MASK ; zero page: ANDs A with the byte stored at address $04

instead of like this:
AND #JOYPAD1_SELECT_MASK ; immediate: ANDs A with the value $04
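The difference is the addressing mode. With #, the operand is the mask itself; without it, ca65 treats JOYPAD1_SELECT_MASK as an address, and since $04 fits in one byte it assembles a zero-page AND $04 (opcode $25 instead of the immediate form, $29). A minimal Python model of the two, assuming a flat 64 KB memory array; the function names and the leftover value at $04 are made up for illustration:

```python
def and_immediate(a, operand):
    """AND #operand (opcode $29): the operand byte is the mask itself."""
    return a & operand


def and_zero_page(a, operand, memory):
    """AND operand (opcode $25): the operand is a zero-page address, and the
    mask is whatever byte is currently stored there."""
    return a & memory[operand & 0xFF]


JOYPAD1_SELECT_MASK = 0x04
memory = [0] * 0x10000
memory[0x04] = 0x9A  # hypothetical leftover value in zero-page RAM at $04

a = 0x04  # joypad byte with only Select (bit 2) pressed
print(hex(and_immediate(a, JOYPAD1_SELECT_MASK)))          # 0x4: Select detected
print(hex(and_zero_page(a, JOYPAD1_SELECT_MASK, memory)))  # 0x0: Select missed
```

Both forms assemble without complaint, which is why this kind of bug only shows up at run time, and only when the byte at $04 happens to have bit 2 clear.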

My point is, it was a dumb mistake, but what was more frustrating is that, being a newbie, I'm still unfamiliar with effective techniques for tracking down these errors.

I tracked it down by writing debug values to some memory addresses and viewing them in FCEU's memory viewer.
I had tried stepping through FCEU's debugger, but had no luck because I lack the patience for code snippets like this:
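: LDA $2002 ; read PPUSTATUS; bit 7 (vblank flag) goes into the N flag
BPL :- ; loop while N is clear, i.e. until vblank starts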

and the debugger keeps looping over and over, waiting for VBLANK. I tried entering the address of the instruction after this loop as a breakpoint, but I assume I'm doing it wrong in FCEU because it never stops there for me.
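Stepping is painful here because the loop body runs once per poll until vblank starts, while a single execute breakpoint on the instruction after the loop fires only once. A small Python model of that difference; the $C000/$C003/$C005 addresses and the sequence of $2002 values are hypothetical:

```python
LOOP_ADDR = 0xC000   # hypothetical address of ": LDA $2002" (3 bytes)
BPL_ADDR = 0xC003    # "BPL :-" (2 bytes)
AFTER_LOOP = 0xC005  # first instruction after the loop


def run_poll_loop(status_reads):
    """Yield the address of every instruction executed by
        : LDA $2002
          BPL :-
    LDA copies bit 7 of the value read into the N flag and BPL branches back
    while N is clear, so the loop exits on the first read with bit 7 set."""
    for status in status_reads:
        yield LOOP_ADDR
        yield BPL_ADDR
        if status & 0x80:  # N set: BPL not taken, fall through
            yield AFTER_LOOP
            return


def run_until(trace, breakpoint):
    """Count executed instructions until the PC reaches the breakpoint."""
    for steps, pc in enumerate(trace, start=1):
        if pc == breakpoint:
            return steps
    return None
```

With 5000 polls before the flag is set, run_until(run_poll_loop([0x00] * 5000 + [0x80]), AFTER_LOOP) returns 10003: single-stepping would walk through 10,002 loop instructions, while the breakpoint stops exactly once.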

So let me hear how you guys debug.
Thanks,
Al

by Bananmos on 2007-02-19 (#22037)

When developing code, I used to test it directly on the hardware as long as no hard-to-track-down bugs appeared. When they did, I always loaded the binary in FCE Ultra, having a list file output by the assembler in notepad next to the emulator's window. I never had a problem with the LDA $2002 snippet you mention, since you could easily make the emulator run until it hit the location after the polling loop.

At the time, I thought I couldn't get my EPROM emulator working in Windows (it turned out to be just a conflict with my scanner's driver, which also used the parallel port), so I didn't resort to using an emulator to debug the code unless I really had to, since I had to copy everything by disk from my DOS-only laptop to my desktop PC. If I were to start coding something now, things ought to be much simpler.

If the assembler won't output a list file you're kind of screwed though, so be sure to use one which does.

For checking that the pattern tables had been written as expected, I always used Nesticle. Partly because Nintendulator (which can also display PPU contents graphically) was far too slow for my PC, especially when displaying PPU contents. But mostly because Nesticle worked in DOS so I didn't have to do the copy to my desktop.

Then of course, you have bugs which are graphical glitches resulting from not being properly synced with the PPU. These you need to track down on the actual hardware, so a good way is replacing your $2005/$2006 write with a code snippet that takes the same number of cycles, but sets/clears the monochrome bit or the color emphasis bits in $2001 instead. That way, you can easily check whether the write happens where you expect it to, and notice if your code is out of sync altogether.
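Because STA $2001 and STA $2006 are both 4-cycle absolute stores, writing $2001 in place of the timed write keeps the cycle count the same. The values to alternate between can be sketched in Python like this; the 0x1E "normal" value (background and sprites on, including the left 8 pixels) is just an example, so substitute whatever your code normally writes:

```python
# PPUMASK ($2001) bits
PPUMASK_GREYSCALE = 0x01      # monochrome
PPUMASK_SHOW_BG_LEFT = 0x02
PPUMASK_SHOW_SPR_LEFT = 0x04
PPUMASK_SHOW_BG = 0x08
PPUMASK_SHOW_SPR = 0x10
PPUMASK_EMPHASIS = 0xE0       # bits 5-7; which colour each tints differs NTSC/PAL


def marker_values(normal_mask, use_emphasis=False):
    """Return (marker_on, marker_off) to write to $2001 in place of a timed
    $2005/$2006 write. Only the greyscale or emphasis bits change, so
    rendering stays enabled and the picture visibly changes at the point
    where the replaced write would have happened."""
    flag = PPUMASK_EMPHASIS if use_emphasis else PPUMASK_GREYSCALE
    return normal_mask | flag, normal_mask & ~flag & 0xFF
```

For example, marker_values(0x1E) gives (0x1F, 0x1E) and marker_values(0x1E, use_emphasis=True) gives (0xFE, 0x1E).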

I'd really wish for an emulator that has (besides a good graphical display of PPU contents) a debugger which can take a list file as input, and let you debug your code using symbols in the same way the MSVC debugger does. Don't expect any emulator to be up to the task though, as that feature's potential user base would be rather small. :)

by Bregalad on 2007-02-19 (#22039)

I often debug my code with FCEUltra, and sometimes I add manual breakpoints. I sometimes use "inc $100" as an easy breakpoint (with a write breakpoint on $0100 set in the debugger), because you'll never need location $100 for anything else (unless you run out of system RAM). It's in the stack, but the stack will never use the whole page and grow down to $100 (assuming you initialized the stack pointer to $FF, i.e. $1FF, at startup).
For the $2002 trick you mentioned, either double-click on the next instruction, forcing the CPU to jump there (knowing the emulation won't be timing-accurate any longer), or use a breakpoint on the next instruction's address (which is accurate, but happens to be annoying, since you'll usually have to remove the previous breakpoint and set it back afterwards).
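The "inc $100" trick relies on two facts: INC is a read-modify-write instruction, so it trips a write breakpoint on $0100, and the stack starts at $01FF and grows downward, so it would have to be completely full before it touched $0100 itself. A Python sketch of both; WatchedMemory is a hypothetical stand-in for an emulator's write breakpoint, not FCEU's actual API:

```python
STACK_PAGE = 0x0100
BREAK_ADDR = 0x0100  # bottom byte of the stack page, used as a breakpoint target


class WatchedMemory:
    """2 KB of NES RAM plus a write watchpoint, modelling an emulator
    debugger set to break on writes to $0100 (hypothetical class)."""

    def __init__(self, watch_addr=BREAK_ADDR):
        self.ram = bytearray(0x800)
        self.watch_addr = watch_addr
        self.hits = []  # PC values where the debugger would have stopped

    def write(self, addr, value, pc):
        self.ram[addr & 0x7FF] = value & 0xFF
        if addr == self.watch_addr:  # mirrors such as $0900 are ignored here
            self.hits.append(pc)

    def inc(self, addr, pc):
        """INC addr: read, add one, write back, so it triggers the watchpoint."""
        self.write(addr, self.ram[addr & 0x7FF] + 1, pc)


def push(mem, sp, value, pc):
    """PHA: store at $0100 + SP, then decrement SP (the stack grows downward)."""
    mem.write(STACK_PAGE + sp, value, pc)
    return (sp - 1) & 0xFF
```

Starting from SP = $FF, 255 pushes fill $01FF down to $0101 without a single hit; only an explicit INC $0100 (or a 256th push) stops the debugger.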

If you need to debug precise scanline stuff, I recommend getting it working in Nintendulator, which is very accurate when it comes to tricky PPU sync. However, it is very slow and sometimes freezes, which is why I use Nintendulator only when I have to.

Of course, I won't claim anything "works" without trying it on hardware when it comes to PPU or APU timing quirks.

by never-obsolete on 2007-02-19 (#22044)

Bananmos wrote:
I'd really wish for an emulator that has (besides a good graphical display of PPU contents) a debugger which can take a list file as input, and let you debug your code using symbols in the same way the MSVC debugger does.

FCEUXDSP has support for symbolic debugging. It uses this format:

#address#labelname#comment

The readme gives a more detailed explanation of how to set it up.
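If your assembler can give you label addresses (from a map or list file), turning them into lines of that format is simple. A Python sketch, assuming the fields are exactly as shown above; whether the address needs a leading $ (the default here) is an assumption to check against the readme:

```python
def to_fceuxdsp_symbols(symbols, addr_fmt="${:04X}"):
    """Render (address, label, comment) tuples as '#address#labelname#comment'
    lines, sorted by address. Since '#' is the field separator, this sketch
    refuses labels or comments that contain it."""
    lines = []
    for addr, label, comment in sorted(symbols):
        if "#" in label or "#" in comment:
            raise ValueError(f"'#' not allowed in {label!r} / {comment!r}")
        lines.append(f"#{addr_fmt.format(addr)}#{label}#{comment}")
    return "\n".join(lines)
```

For example, [(0xC003, "WaitVBlank", "polls $2002"), (0xC000, "Reset", "")] becomes the two lines #$C000#Reset# and #$C003#WaitVBlank#polls $2002.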

by albailey on 2007-02-19 (#22047)

Bregalad wrote:
For the $2002 trick you mentioned, either double-click on the next instruction, forcing the CPU to jump there (knowing the emulation won't be timing-accurate any longer), or use a breakpoint on the next instruction's address (which is accurate, but happens to be annoying, since you'll usually have to remove the previous breakpoint and set it back afterwards).

I didn't know about the double-click "trick". Thanks. That'll be very useful for me next time.

I'm going to read the docs. Whenever I try setting a breakpoint, I never seem to hit it (meaning I'm not doing it right).

Al

by Bananmos on 2007-02-22 (#22106)

Quote:
FCEUXDSP has support for symbolic debugging. It uses this format:

#address#labelname#comment

The readme gives a more detailed explanation of how to set it up.

Just checked out FCEUXD SP. Really neat indeed! I suppose this will be my favourite emulator from now on. Its reverse-engineering features are just lovely. :)

Still, even if I were to make a program for converting all labels and comments in my list file to this format, the code in the debugger wouldn't look much like my source anyway.

What I'd really like is a way to single-step the source code in the list file (generated by X816 in this case). I guess that's asking too much though, especially considering each and every assembler has its own peculiar syntax for the list files it generates.

by Bregalad on 2007-02-23 (#22118)

The only solution to that is to create your own development environment, including assembler, linker and emulator, all in a common IDE. Of course, the code for the various assembler and emulator parts could be borrowed from elsewhere with their respective authors' agreement, so it is possible.

by Bananmos on 2007-02-23 (#22122)

It wouldn't necessarily be needed if there were a clear consensus on which 6502 assembler is preferable, but there isn't. I guess ca65 is the most used, but I prefer x816 myself, just because I'm so in love with its anonymous labels. :)

by Disch on 2007-02-23 (#22123)

Bananmos wrote:
I prefer x816 myself, just because I'm so in love with its anonymous labels. :)

They seemed awkward... like I never understood exactly how they worked. Then again maybe I'm just too used to ca65's anonymous labels.

by Bananmos on 2007-02-23 (#22124)

What I love so much about them (besides not having to spend half your coding time thinking up pointless label names) is that after just a few weeks of using them, your eye will be able to "see the bigger picture" in your algorithms just as easily as if you were using for- and while-loops, if not better.

An example:
Code:
--
-
lda DecrunchingState
beq -
bpl -

lda DecrunchingBank
jsr MMC1_WriteRegister3

lda #<DecrunchedTune
sta TempWord2
lda #>DecrunchedTune
sta TempWord2+1
ldx CrunchedTune
ldy CrunchedTune+1
jsr DecodeLZ77

lda #$01
sta DecrunchingState

-
lda DecrunchingState
bne -

jmp --

A second one:
Code:
ldx #8
-
lda _666addHi
beq +
+

lda _666 ;3
clc
adc _666add
bcs + ;3/2
+ sta _666 ;3 19.xxx/18.xxx

dex
bne -
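Based on how the examples here use them, an x816 reference appears to jump to the nearest label spelled exactly the same way ("-" and "--" are different names), searching backward for minus labels and forward for plus labels. A Python sketch of that rule, useful for reading the code; it is a model, not x816's actual implementation:

```python
def is_anon(token):
    """True for x816-style anonymous names: '-', '--', '+', '++', ..."""
    return token != "" and len(set(token)) == 1 and token[0] in "+-"


def resolve_x816(lines):
    """Map each line that references an anonymous label to the index of the
    line defining it (None if no matching label exists)."""
    labels, refs = [], []
    for i, text in enumerate(lines):
        tokens = text.split(";", 1)[0].split()  # drop comments
        if tokens and is_anon(tokens[0]):       # label, maybe followed by code
            labels.append((i, tokens[0]))
            tokens = tokens[1:]
        if len(tokens) >= 2 and is_anon(tokens[1]):
            refs.append((i, tokens[1]))
    targets = {}
    for i, name in refs:
        if name[0] == "-":  # nearest identically named label at or above
            found = [j for j, n in labels if n == name and j <= i]
            targets[i] = found[-1] if found else None
        else:               # nearest identically named label below
            found = [j for j, n in labels if n == name and j > i]
            targets[i] = found[0] if found else None
    return targets
```

Run on the second example (one source line per list entry, starting at ldx #8), it returns {3: 4, 8: 9, 11: 1}: both beq + and bcs + land on the very next line, and the final bne - goes back to the - after ldx #8 even though + labels sit in between.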

by tepples on 2007-02-23 (#22126)

Bananmos wrote:
What I love so much about them (besides not having to spend half your coding time thinking up pointless label names)

You could use label names that start with '@'. Labels named this way are local; that is, they are valid only between two label names that don't start with '@'. This means you can use '@loop' in more than one subroutine. CA65 also has a slightly different flavor of anonymous labels, called ':', which can be referenced backward with ':-' or forward with ':+'.
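The scoping rule for '@' labels can be modelled by renaming every @label after the most recent ordinary label, which shows why @loop can appear in more than one subroutine without clashing. A Python sketch of the rule only; ca65 itself does not rename anything this way:

```python
import re

CHEAP_LOCAL = re.compile(r"@\w+")                    # @loop, @done, ...
ORDINARY_LABEL = re.compile(r"^\s*([A-Za-z_]\w*):")  # foo:, bar: (not @x: or :)


def scope_cheap_locals(lines):
    """Prefix every @label with the ordinary label that owns its scope."""
    owner = ""
    scoped = []
    for text in lines:
        match = ORDINARY_LABEL.match(text)
        if match:
            owner = match.group(1)  # an ordinary label starts a new scope
        scoped.append(CHEAP_LOCAL.sub(lambda m: owner + m.group(0), text))
    return scoped
```

Two routines foo and bar that each contain @loop: and bne @loop come out as foo@loop and bar@loop, so the two loops never collide.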

Code:
-
lda DecrunchingState
bne -

Is this supposed to loop forever? Or does an interrupt handler change DecrunchingState?

Anyway, I translated the first code to CA65, using an @-label:
Code:
@mainLoop:
:
lda DecrunchingState
beq :-
bpl :-

lda DecrunchingBank
jsr MMC1_WriteRegister3

lda #<DecrunchedTune
sta TempWord2
lda #>DecrunchedTune
sta TempWord2+1
ldx CrunchedTune
ldy CrunchedTune+1
jsr DecodeLZ77

lda #$01
sta DecrunchingState

:
lda DecrunchingState
bne :-

jmp @mainLoop

And the second one. The biggest difference is that a CA65 program can't easily use backward references that overlap forward references. (But what exactly does _666 do?)
Code:
ldx #8
@loop:
lda _666addHi
beq :+
:

lda _666 ;3
clc
adc _666add
bcs :+ ;3/2
:
sta _666 ;3 19.xxx/18.xxx

dex
bne @loop

by Bananmos on 2007-02-23 (#22127)

Thanks for showing the difference, tepples. That shows pretty well why I prefer the x816 style. I find those colons somewhat ugly, and the inability to overlap references kills most of the benefits of unnamed labels IMO.

But to each his own... some might say that too many unnamed labels make the code less readable as well, and I can't say they don't have a point. But I find code equally hard to read when there's too much "label clutter".

Though I'm not saying I could never become a ca65 convert if I had good enough reasons. (an emulator which allowed source-code debugging would probably qualify as a good one :)

Yeah, an interrupt handler changes DecrunchingState in the first example. Both sections are from Years Behind btw. The code that handles music switching in the NMI routine sets DecrunchingState to #$80 once all the variables for the DecodeLZ77 routine have been initialized. This will put the main loop into the decrunching state. Once the tune is decrunched, the main loop sets DecrunchingState to #$01 to signal that decrunching is complete. The NMI routine will then call NED_SetupNED once and then reset DecrunchingState to #$00.

The main loop gets a few scanlines of cycles at the end of screen rendering, but even more of them in the volume bar window where the replay routine would normally be called. The NMI routine grants it these extra scanlines by setting off a DPCM IRQ and returning prematurely from the NMI. The IRQ handler then jumps back into the NMI routine and the code continues where it left off. If anyone ever wondered why YB buzzes when switching tunes, this is the reason.

In the second example, _666 is a "cycle accumulator" that keeps track of how to adjust the cycles to keep the scanline code in phase with the PPU. Since there is a fractional number of CPU cycles per scanline on both NTSC and PAL, we need to emulate (harhar) a fraction of a cycle somehow, and the only way to do that is to make the scanline code take an extra cycle at regular intervals, so that the average number of cycles per scanline becomes a non-integer value. The code would normally look like this (the clc can be skipped if you're running low on cycles):

Code:
lda _666
clc
adc #113 ;use #85 for NTSC
bcs +
+ sta _666

This will make your code take an extra cycle every time the adc gives a carry. The _666 variable was named so by Loopy, who showed me this nifty trick in the first place.
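Since _666 is an 8-bit accumulator, adding a constant K to it produces a carry on exactly K out of every 256 scanlines, so the bcs (3 cycles when taken, 2 when not, assuming no page crossing) adds K/256 of a cycle per scanline on average. A Python simulation of just that accumulator; it says nothing about how the constants were derived for NTSC or PAL:

```python
def extra_cycles(add_value, scanlines, start=0):
    """Simulate, once per scanline,
        lda _666
        clc
        adc #add_value
        bcs +          ; 3 cycles if taken (carry set), 2 if not
      + sta _666
    Return the extra cycle (0 or 1) spent by bcs on each scanline, plus the
    final value of _666."""
    acc = start
    extras = []
    for _ in range(scanlines):
        total = acc + add_value
        carry = total > 0xFF
        acc = total & 0xFF           # sta _666 keeps only the low byte
        extras.append(1 if carry else 0)
    return extras, acc
```

Over 256 scanlines the total added is 256·K, so the accumulator returns to its start value after exactly K carries. With 113 that is 113/256, about 0.44 extra cycles per scanline; with 85 it is about 0.33.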

But since there's DPCM in Years Behind, the DPCM DMA will steal some extra cycles every now and then, so you must adjust the #113 constant to other values which depend on the playback frequency. And at the higher frequencies, it takes more than a cycle per scanline, so the _666 variable alone won't be enough any longer. _666addHi works as a flag for this case. Both are changed accordingly in the replay routine.

by Disch on 2007-02-23 (#22129)

EDIT -- nevermind I just realized you answered my Q already in one of your examples =P

by Bregalad on 2007-02-24 (#22131)

WLA-DX also supports '-' and '+' labels, in theory up to 8 '+' or '-' signs in a row, but in practice I never use more than 3 to keep the code readable.
What is great about WLA-DX is that it seems to be the only assembler that has serious provisions for bankswitched code and RAM.

by Bananmos on 2007-02-24 (#22132)

But what's really bad about WLA-DX (at least last time I tried it, which was a couple of years ago) is:

1) Its buggy 65816 support
2) Its buggy and incomplete list file support, which make the abovementioned bugs extremely hard to track down
3) Its inability to automatically choose between zero-page and absolute addressing for an instruction, due to the separation of the assembler and linker stages

I really liked WLA-DX's features when I first tried it, but eventually realized it was lacking many elementary features. Unless these issues have been fixed, my recommendation is to avoid it.

by Bregalad on 2007-02-24 (#22133)

Well, if the 65816 support is actually buggy, that's a lot of trouble for SNESdev, but not so much for NESdev. What kind of bugs did it have?
And about the list file, it does seem to be lacking, but I never use list files anyway.
I haven't noticed any inability to detect zero page; however, you have to manually tell it to reduce instructions to zero-page addressing when possible, with the '.8bit' directive. It just doesn't do it automatically.

The only real annoyance with WLA-DX is that after the .8bit directive, some instructions (but not all of them) will try to force themselves to 8-bit when you expect them to address memory with all 16 bits, generating a linking error... So you have to manually place a ".w" after the memory location on some operands, which is annoying, but you quickly get used to it.

What I really like about it is that it tells you how many bytes are free in each ROM bank each time you compile your code. Useful for planning ROM space efficiently.

by Bananmos on 2007-02-24 (#22136)

To be honest, I can't even remember what bugs it had anymore. I think it had something to do with it changing some instructions depending on 8/16-bit mode when they should have been unaffected by it. But like I said, I can't really remember now. What I do remember, though, is that it was a royal PITA trying to track down the bugs when the list files weren't working properly. To begin with, they contained no address information whatsoever. Also, some things like macros would put the list generator into some crazy state where it seemed to output nonsense data for lines until it somehow got back on the right track. Here's an excerpt from a list file to illustrate my point:

Code:
$8D $81 $40 isrSplitScreen
$A9 $80    phb
$8D $82 $40    phk
$20 $CD $D1    plb
$68    pha
$A8    txa
$68    pha
$AA    tya
$68    pha
$AB
$40
$8B ; lda.b #0
$4B ; sta GPU_RenderFlags
$AB
$48
$8A    lda.b #0
$48    sta GPU_hBlankStatus
$98 ; sta GPU_hBlankStatus
$48 ; sta GPU_hBlankStatus
$A9 $00 ; sta GPU_hBlankStatus
$8D $82 $40

$A5 $42    lda Joypad
$0A    asl
$10 $02    bpl +
$C6 $59    dec scroll2
+
$A5 $59    lda scroll2
$AA    tax
$BD $66 $CE    lda SineTable.w,X
$8D $00 $40    sta GPU_ScrollXLo
;asl
$A5 $59    lda scroll2
$18    clc
$69 $57    adc.b #$57
$AA    tax
$BD $66 $CE    lda SineTable.w,X
$8D $02 $40    sta GPU_ScrollYLo

Like I mentioned earlier, I consider list files to be absolutely vital when you need to examine and/or debug the assembled code. I don't know how you get by without them...
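Decoding the byte column on its own with a small opcode table suggests what went wrong in that excerpt: the bytes appear to lag the source text. The first eleven instructions look like the tail of the previous interrupt handler (ending in RTI), and the phb ... sta GPU_hBlankStatus lines only match bytes further down. A Python sketch covering just the opcodes in the excerpt, assuming 8-bit accumulator mode so LDA # takes one operand byte:

```python
# Opcodes seen in the excerpt (65816; PHB/PHK/PLB are not on the 6502).
OPCODES = {
    0x8D: ("STA", "abs"), 0xA9: ("LDA", "imm"), 0x20: ("JSR", "abs"),
    0x68: ("PLA", None), 0xA8: ("TAY", None), 0xAA: ("TAX", None),
    0xAB: ("PLB", None), 0x40: ("RTI", None), 0x8B: ("PHB", None),
    0x4B: ("PHK", None), 0x48: ("PHA", None), 0x8A: ("TXA", None),
    0x98: ("TYA", None),
}


def disassemble(data):
    """Decode bytes into instruction text using the table above."""
    out, i = [], 0
    while i < len(data):
        mnemonic, mode = OPCODES[data[i]]
        if mode == "abs":    # little-endian 16-bit address
            out.append(f"{mnemonic} ${data[i + 2]:02X}{data[i + 1]:02X}")
            i += 3
        elif mode == "imm":  # one operand byte (8-bit accumulator assumed)
            out.append(f"{mnemonic} #${data[i + 1]:02X}")
            i += 2
        else:                # implied: opcode only
            out.append(mnemonic)
            i += 1
    return out
```

The bytes next to isrSplitScreen through the first $40 decode to STA $4081, LDA #$80, STA $4082, JSR $D1CD, PLA, TAY, PLA, TAX, PLA, PLB, RTI. The following $8B ... $8D $82 $40 run decodes to PHB, PHK, PLB, PHA, TXA, PHA, TYA, PHA, LDA #$00, STA $4082, which matches the source lines that were printed higher up.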

And yes, the "operand hints" you mentioned were one of the biggest problems with WLA-DX that I experienced. In 65C816 mode, the size of addresses (which translate to different opcodes) and the size of operands (which depends on whether the 65C816 is in 8-bit index/accumulator mode) were treated the same way. On a 6502 you don't have separate CPU modes, but having to manually specify the size of your addresses on each instruction is just not worth the effort, and something you shouldn't have to "get used to". That's the assembler's job.
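The cost of getting this wrong is one byte and one cycle per instruction: LDA zero page ($A5) is 2 bytes and 3 cycles, LDA absolute ($AD) is 3 bytes and 4 cycles, as in the "$A5 $42 lda Joypad" line of the excerpt. The size has to be decided when the instruction is assembled, which becomes a problem once addresses are only known at link time. A Python sketch of that decision; the known flag is a stand-in for whether the assembler can see the symbol's value:

```python
LDA_ZERO_PAGE = 0xA5  # 2 bytes, 3 cycles
LDA_ABSOLUTE = 0xAD   # 3 bytes, 4 cycles


def encode_lda(address, known=True):
    """Encode LDA <address>. If the value is known while assembling, an
    address in page zero gets the short zero-page form. If it is only
    resolved by the linker, the instruction size must be fixed now, so this
    sketch falls back to the 3-byte absolute form (operand patched later)."""
    if known and address <= 0xFF:
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, (address >> 8) & 0xFF])
```

The same $42 variable costs 2 bytes when its address is visible and 3 bytes when it is not, which is why assemblers with a separate link step need some hint about which symbols live in zero page.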

Like I said, I did like a lot of features in WLA-DX that other assemblers lack, and the good part was that Ville Helin is very responsive to mails and fixed a lot of the problems in the assembler that I mailed him about. But in the end, there were just too many of them and fixing all of them would require a total rewrite where many of the linker's responsibilities would need to be transferred to the assembler.

Therefore, I don't consider WLA-DX a wise choice for 6502/65C816 development. It might still be good for Z80 development though. (that's what Ville Helin made it for after all) But I haven't written enough Z80 code to be able to judge that properly.

by Bregalad on 2007-02-24 (#22137)

Quote:
Like I mentioned earlier, I consider list files to be absolutely vital when you need to examine and/or debug the assembled code. I don't know how you get by without them...

I don't know either, but at least I have no trouble doing so. The list file shows your code, which you can see in your source anyway, and the binary data, which you can see in the compiled binary file. The only interesting thing you can see is the correlation between the two. But WLA can create a symbol list, which is much more handy when you want to know which variable has been assigned to which location or something like that. I don't use it often, though, because you usually see it immediately when tracing your code in FCEU. If you trace code you just wrote, you can immediately see which variables are in which location, but if you trace code you wrote long ago, it may be worth looking at the symbol table so that you can see the correlation between the code in your source and in FCEU.
Apart from that, I guess most bugs have been corrected since. Anyway, what I like most about WLA-DX is that the author is very open to corrections and suggestions, and no other assembler I know of has that strength.

by AWal on 2007-02-25 (#22145)

I have grown fond of FCEUXD(SP) for its debug capabilities. I've been hex hacking (mostly Game Genie codes) for a lot longer than I've been actually writing code, so seeing the hex numbers for common things like LDA $xxxx, STA $xx, and BEQ $xx is just natural for me.

Of course I have the .asm file to the side of the emulator, just in case I run into some deep trouble.

I've been using the same recycled code snippets for some time, so most minor code flaws are not much of an issue to me.

P.S.: Am I the only person still using TASM to compile code?

by Anders_A on 2007-02-25 (#22159)

Uhm, Bananmos's examples in ca65 syntax using anonymous labels:

Code:
:
lda DecrunchingState
beq :-
bpl :-

lda DecrunchingBank
jsr MMC1_WriteRegister3

lda #<DecrunchedTune
sta TempWord2
lda #>DecrunchedTune
sta TempWord2+1
ldx CrunchedTune
ldy CrunchedTune+1
jsr DecodeLZ77

lda #$01
sta DecrunchingState

:
lda DecrunchingState
bne :-

jmp :--

Code:
ldx #8
:
lda _666addHi
beq :+
:

lda _666 ;3
clc
adc _666add
bcs :+ ;3/2
: sta _666 ;3 19.xxx/18.xxx

dex
bne :---

You use n '+' or '-' signs to reference the nth anonymous label forward or backward. The implementation of this was buggy in older versions, though.
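The counting rule can be modelled in a few lines: a reference with n minus signs goes to the nth ':' label at or before it, and one with n plus signs to the nth ':' label after it. A Python sketch of that rule (not ca65's source), checked against the second example above:

```python
import re

ANON_REF = re.compile(r"^:(-+|\++)$")  # :-  :--  :+  :++ ...


def resolve_ca65(lines):
    """Map each line that references an anonymous label to the index of the
    ':' line it points at (None if there are not enough labels)."""
    labels, refs = [], []
    for i, text in enumerate(lines):
        tokens = text.split(";", 1)[0].split()  # drop comments
        if tokens[:1] == [":"]:                 # ':' alone or ': sta foo'
            labels.append(i)
            tokens = tokens[1:]
        if len(tokens) >= 2 and ANON_REF.match(tokens[1]):
            refs.append((i, tokens[1]))
    targets = {}
    for i, ref in refs:
        n = len(ref) - 1                        # number of + or - signs
        if ref[1] == "-":
            candidates = [j for j in labels if j <= i][::-1]  # nearest first
        else:
            candidates = [j for j in labels if j > i]
        targets[i] = candidates[n - 1] if len(candidates) >= n else None
    return targets
```

Adding one more ':' inside the loop (say, before clc) makes the final bne :--- resolve to the second label instead of the first, so the count has to be updated whenever the loop body gains a label.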

I prefer ca65 syntax, since a label isn't really anonymous if you have to name it -- or -. Those could just as well have been named @loop1 and @loop2 or whatever (using ca65 syntax for local labels).

If you have a large chunk of code, ca65 syntax will of course get harder to follow, but you shouldn't really use anonymous labels for jumps longer than a couple of rows anyway, IMO.

by tepples on 2007-02-25 (#22160)

Anders_A wrote:
You use a number of + or - to go forward or back a number of anonymous labels.

But then you have to count the labels, and adding another anonymous label inside the loop throws off the count.

Quote:
If you have a large chunk of code, ca65 syntax will of course get harder to follow, but you shouldn't really use anonymous labels for jumps longer than a couple of rows anyway, IMO.

Like the outer loops in the examples.

by Bananmos on 2007-02-26 (#22164)

On the other hand, since ca65 is open source, I guess you could hack x816-style anonymous labels into it in a few rainy days. Since the convention it uses seems to be mutually exclusive with x816's convention, it looks to me like it could be added without interfering with ca65's existing anonymous label conventions at all. (Correct me if I'm wrong.)

by albailey on 2007-02-26 (#22165)

What's interesting about the direction this thread took is that it's relevant to a debugging problem I encountered (and fixed) this weekend.

I use CA65 and had some subroutines something like this

foo:
... do some stuff...
bne :+
... do some more stuff...
:
rts

bar:
... do some stuff...
bne :-
... do some more stuff...
:
rts

Well, I meant to write bne :+ in the subroutine "bar".

My code compiled fine because I had another anonymous label in subroutine "foo" declared before these, and since both had an rts, I wasn't getting stack issues.
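A check that would have caught this flags any anonymous-label reference whose target lies on the other side of an ordinary label, since that usually means the branch leaves its subroutine. A Python sketch using the same counting rule as ca65's :+/:- references; the lint itself is hypothetical, not a ca65 feature:

```python
import re

NAMED_LABEL = re.compile(r"^\s*[A-Za-z_]\w*:")  # bar: (not ':' or '@x:')
ANON_REF = re.compile(r"^:(-+|\++)$")           # :-  :++ ...


def cross_routine_refs(lines):
    """Return (line, target) pairs where an anonymous reference jumps across
    an ordinary label; target is None if the label does not exist."""
    anon = [i for i, t in enumerate(lines) if t.split(";", 1)[0].split()[:1] == [":"]]
    named = [i for i, t in enumerate(lines) if NAMED_LABEL.match(t)]
    problems = []
    for i, text in enumerate(lines):
        tokens = text.split(";", 1)[0].split()
        if tokens[:1] == [":"]:
            tokens = tokens[1:]
        m = ANON_REF.match(tokens[1]) if len(tokens) >= 2 else None
        if not m:
            continue
        n = len(m.group(1))
        if m.group(1)[0] == "-":
            candidates = [j for j in anon if j <= i][::-1]  # nearest first
        else:
            candidates = [j for j in anon if j > i]
        if len(candidates) < n:
            problems.append((i, None))
            continue
        target = candidates[n - 1]
        lo, hi = min(i, target), max(i, target)
        if any(lo < k <= hi for k in named):  # an ordinary label in between
            problems.append((i, target))
    return problems
```

On a cut-down version of the foo/bar example above, it reports only the bne :- in bar, which lands on the ':' inside foo.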

The actual code pertained to my input handling.

The code "appeared to work" in FCEU, both in memory and in the final output (which doesn't make any sense).

But the bug revealed itself in Nintendulator and several other emulators I downloaded. I was only able to find it by scanning my code line by line (again, back to my debugging problem).

So, I no longer trust FCEU as much as I used to, and I now run everything in both it and Nintendulator.

Al

by tepples on 2007-02-26 (#22172)

albailey wrote:
My code compiled fine because I had another anonymous label in subroutine "foo" declared before these, and since both had an rts, I wasn't getting stack issues.

The actual code pertained to my input handling.

The code "appeared to work" in FCEU, both in memory and in the final output (which doesn't make any sense).

But the bug revealed itself in Nintendulator and several other emulators I downloaded. I was only able to find it by scanning my code line by line (again, back to my debugging problem).

So, I no longer trust FCEU as much as I used to, and I now run everything in both it and Nintendulator.

That's a good idea.

In general: If one emulator's behavior differs markedly from that of the top-tier emulators, write a test case that amplifies this behavior difference and post it on a web site. Have the experts here on nesdev try it out on an NES, and you'll know which emulators are misbehaving and where to report the bug. I've done this several times for differences between VisualBoyAdvance and the GBA hardware.

by albailey on 2007-02-27 (#22181)

Here is a chunk of broken code:

getJoyPad1Input:
LDA CURRENT_JOY1_STATUS
STA LAST_JOY1_STATUS
;LDA CURRENT_JOY2_STATUS
;STA LAST_JOY2_STATUS

; strobe joypad
ldx #$09 ; bit zero is 1
stx JOY1
DEX
stx JOY1 ; bit 0 is zero
; Now we read 8 times from Joy1
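: lda JOY1 ; bit 0 = current button (1 = pressed)
LSR A ; bit 0 -> carry
ROR CURRENT_JOY1_STATUS ; carry -> bit 7; after 8 reads, A ends up in bit 0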
;lda JOY2
;AND #$03
;CMP #$01
;ROL CURRENT_JOY2_STATUS
DEX
BNE :-
rts

processAButton:
; Check if the A button was pressed
LDA #JOY_A_MASK
STA ACTIVE_MASK
jsr checkChangeForInput
LDA MASK_RESULT
CMP #$01
BNE :- ; This is the error line. It should be BNE :+
; To Do: process A button
:
rts

You can see the problem. Normally I jsr getJoyPad1Input and later jsr processAButton (I didn't include all that code).
The bug means I query JOY1 ($4016) some more. Since I don't know what X is set to, it reads it an indeterminate number of times and alters my CURRENT_JOY1_STATUS variable. So later, when I want to move left, it thinks I've moved right, etc.
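The effect of the extra reads can be modelled in Python. This sketch assumes the usual report order (A, B, Select, Start, Up, Down, Left, Right) and that a standard controller returns 1 in bit 0 once all 8 buttons have been read; open_bit lets you see what happens if an emulator returns 0 instead, which is one plausible reason the emulators disagreed:

```python
def read_joypad(buttons, reads=8, open_bit=1):
    """Model of the getJoyPad1Input loop:
        : lda JOY1
          LSR A                   ; bit 0 of the read -> carry
          ROR CURRENT_JOY1_STATUS ; carry -> bit 7, older bits move right
    `buttons` holds the report in read order (bit 0 = A ... bit 7 = Right).
    Reads past the eighth return `open_bit` (assumed 1 on a standard pad)."""
    status = 0
    for n in range(reads):
        bit = (buttons >> n) & 1 if n < 8 else open_bit
        status = (status >> 1) | (bit << 7)
    return status
```

After exactly 8 reads the status byte equals the report, so Select lands on $04 as JOYPAD1_SELECT_MASK expects. With Left ($40) held and a single extra read, the byte becomes $A0, which the rest of the code would read as Right + Down; if the extra read returns 0, it becomes $20 (Down) instead.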

Nintendulator and Nestopia process the input in a backwards fashion (expected behaviour).

FCEU doesn't show me an error (although it should).

RockNES and JNES just ignore my input.

If anyone would like the .nes file, let me know and I can email it to you, or tell me about a free hosting service. I can't post a link to the .nes file here: I have a GeoCities account, but they seem to be blocking it.

Al