Disassembler for NES 6502 - Problem with size of opcodes

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#46101)
While trying to write a debugger, I wrote a disassembler, hoping to make a great tool like this one:
Nintendulator

I'm heading in this direction:
Look at mine

But...
In some games, such as Super Mario Brothers and others, the disassembler works fine, but on Excite Bike it generates output that is not correct (mainly around the reset vector).

I always disassemble from $8000 to $FFFA, reading and disassembling as I go.

After getting all the asm code,

I just point the cursor at the RESET location. In Super Mario and other games this works fine, but in Excite Bike the cursor can't find the "correct" location.

In the Excite Bike case:

The reset vector points to $C184, and my disassembler generated:
Code:
C17D:CMP $E800, X
C180:NOT IMPLEMENTED [0X17]
C181:ORA ($00,X)
C183:ORA ($78,X)
C185:CLD


So in my disassembly there is no instruction at $C184 to go to.

Starting from the initial address $8000, the output shows:
Code:
8000:LSR $C3
8002:STA $C3, X
8004:TXS
8005:CMP #$2C
8007:NOT IMPLEMENTED [0XCB]
8008:ADC $C4
800A:TXS
800B:CMP #$2C
800D:NOT IMPLEMENTED [0XCB]
800E:ADC $C4
8010:LDA $7CC3 ,X
8013:CMP #$9A
8015:CMP #$2C
8017:NOT IMPLEMENTED [0XCB]
8018:AND $1CC4, X
801B:CPY $00
801D:ORA ($02,X)
801F:NOT IMPLEMENTED [0X03]
8020:NOT IMPLEMENTED [0X04]
8021:NOT IMPLEMENTED [0X02]
8022:NOT IMPLEMENTED [0X03]
8023:NOT IMPLEMENTED [0X04]


So I thought this was because the compiler puts in some garbage, which my disassembler then tries to interpret. Am I right?

Does the compiler generate illegal opcodes or garbage just to pad the remaining memory? (Or maybe parameters or something...)

Or do I need to change the way I disassemble? I mean, not start at $8000 and finish at $FFFA.

ps1: I also believe that one of my opcodes may have the wrong size.
ps2: Some games report 2 PRG banks, but the two are identical; in that case I just ignore the addresses that hold the VECTORS.
ps3: Sorry for my lack of knowledge of the English language. If my thoughts aren't well explained, I can try to explain them another way.

by on (#46102)
Nothing requires a game to store any code at $8000-$80FF. It could be a data table for all anyone cares, and in the case of games with non-trivial mappers, it often is. The only restriction is that $FFFC (the reset vector) has to point to a valid instruction. So read the starting location from $FFFD and $FFFC, and start disassembling from there.
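
As a rough illustration of reading the reset vector, here is a minimal Python sketch. The function name `read_reset_vector` is made up for this example, and it assumes `prg` holds the raw PRG ROM bytes (no file header) with the end of the data visible at the top of the CPU address space.

```python
def read_reset_vector(prg: bytes) -> int:
    """Return the 16-bit address stored at CPU $FFFC/$FFFD.

    Assumption: the last bytes of `prg` are what the CPU sees at
    $FFFA-$FFFF (NMI, RESET and IRQ vectors, two bytes each).
    """
    if len(prg) < 6:
        raise ValueError("PRG data too short to contain the vectors")
    lo = prg[-4]  # byte at CPU $FFFC (low byte of the reset address)
    hi = prg[-3]  # byte at CPU $FFFD (high byte of the reset address)
    return (hi << 8) | lo


if __name__ == "__main__":
    # Hypothetical 16 KiB image whose reset vector points to $C184.
    image = bytearray(0x4000)
    image[-4] = 0x84
    image[-3] = 0xC1
    print("Reset at ${:04X}".format(read_reset_vector(bytes(image))))
```

The 6502 stores addresses low byte first, which is why $FFFC holds the low byte and $FFFD the high byte.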

by on (#46103)
Since you disassemble everything beforehand, the resulting disassembly is probably not correctly aligned to the code. This most likely happens because you interpreted data as code and that screwed up actual code.

For example, if the source of my game starts like the following:
Code:
   .org $8000

Reset:
   lda $2002

You'll probably disassemble it without problems. But what if I decide to add some data before that code, such as the number of lives the player starts with:

Code:
   .org $8000

Lives:
   .db $05

Reset:
   lda $2002

Execution will still start at "Reset", and the program will still work fine. However, if you try disassembling from $8000, that $05 will become a zero page ORA instruction, which requires an operand (the zero page location of the variable). Your disassembler will then interpret the LDA opcode ($AD) as this operand, so you'll get the instruction "ORA $AD", which is obviously wrong. Then it will interpret the next byte ($02) as an instruction, but that's an invalid instruction.
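
The misalignment can be reproduced with a toy linear disassembler in Python. This is only a sketch: the `OPCODES` table and the `disassemble` function are invented for this example, and the table knows just the two opcodes involved, so every other byte is printed as `.db`.

```python
# opcode -> (output format, total instruction length in bytes).
# Deliberately incomplete: only the opcodes needed for this demo.
OPCODES = {
    0x05: ("ORA ${:02X}", 2),  # ORA zero page
    0xAD: ("LDA ${:04X}", 3),  # LDA absolute
}


def disassemble(data: bytes, base: int, start: int):
    """Linear sweep from `start`; returns a list of (address, text)."""
    out = []
    end = base + len(data)
    pc = start
    while pc < end:
        opcode = data[pc - base]
        fmt, size = OPCODES.get(opcode, (None, 1))
        if fmt is None or pc + size > end:
            # Unknown opcode or truncated instruction: emit a raw byte.
            out.append((pc, ".db ${:02X}".format(opcode)))
            pc += 1
            continue
        first = pc - base + 1
        operand = int.from_bytes(data[first:first + size - 1], "little")
        out.append((pc, fmt.format(operand)))
        pc += size
    return out


rom = bytes([0x05, 0xAD, 0x02, 0x20])  # .db $05, then LDA $2002

if __name__ == "__main__":
    for start in (0x8000, 0x8001):
        print("Starting at ${:04X}:".format(start))
        for addr, text in disassemble(rom, 0x8000, start):
            print("  {:04X}: {}".format(addr, text))
```

Starting at $8000 swallows the LDA opcode as an operand, while starting at $8001 (where "Reset" really is) decodes the LDA correctly.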

Well, you get the picture. Data will cause the code to not be aligned properly. Even if it is correct in some places, it might be wrong in others. I believe that most emulators with integrated disassemblers do it on the fly, disassembling only what they are currently displaying, and since they know the alignment of the current instruction they can correctly display the surrounding ones (although there will still be errors if this code is near some data).

EDIT: tepples' solution (starting from the location pointed to by the reset vector) is the way to go if you insist on static disassembly, but keep in mind that there will still be errors, as that solution only guarantees the alignment of the reset code. Subroutines could still show up misaligned. One way to improve that is to follow every branch, JMP and JSR and keep disassembling from the locations those instructions point to, but still, there are many cases where a program jumps to a dynamic address stored in RAM, which you won't know until you actually execute the program. So I insist, the best way is probably to dynamically disassemble only the instructions that surround the current one while you emulate, instead of doing it all beforehand.
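
The "follow every branch, JMP and JSR" idea could be sketched in Python roughly like this. It is an illustration only: the `OPS` table, the `trace` function and the sample bytes are made up for this example, the table covers just a dozen opcodes, and it assumes every JSR eventually returns to the instruction after it.

```python
from collections import deque

# opcode -> (mnemonic, length, kind). Deliberately partial; a real
# table needs every opcode the target program uses.
OPS = {
    0x78: ("SEI", 1, "seq"),     0xD8: ("CLD", 1, "seq"),
    0xA9: ("LDA #", 2, "seq"),   0xAD: ("LDA abs", 3, "seq"),
    0x8D: ("STA abs", 3, "seq"), 0x10: ("BPL", 2, "branch"),
    0xD0: ("BNE", 2, "branch"),  0x20: ("JSR", 3, "call"),
    0x4C: ("JMP abs", 3, "jump"),
    0x6C: ("JMP (ind)", 3, "stop"),  # target only known at run time
    0x60: ("RTS", 1, "stop"),    0x40: ("RTI", 1, "stop"),
}


def trace(data: bytes, base: int, entry_points):
    """Return {address: mnemonic} for code reachable from entry_points."""
    seen = {}
    work = deque(entry_points)
    end = base + len(data)
    while work:
        pc = work.popleft()
        while base <= pc < end and pc not in seen:
            op = OPS.get(data[pc - base])
            if op is None:
                break  # unknown opcode: stop following this path
            name, size, kind = op
            if pc + size > end:
                break  # instruction would run past the end of the data
            seen[pc] = name
            nxt = pc + size
            if kind == "branch":
                offset = data[pc - base + 1]
                if offset >= 0x80:
                    offset -= 0x100  # signed 8-bit displacement
                work.append((nxt + offset) & 0xFFFF)
            elif kind in ("call", "jump"):
                work.append(data[pc - base + 1] | (data[pc - base + 2] << 8))
            if kind in ("jump", "stop"):
                break  # execution does not fall through
            pc = nxt
    return seen


# Hypothetical program at $8000: one data byte, then code.
program = bytes([
    0x05,              # 8000: .db $05 (data, never executed)
    0x78,              # 8001: SEI
    0x20, 0x08, 0x80,  # 8002: JSR $8008
    0x4C, 0x05, 0x80,  # 8005: JMP $8005
    0xAD, 0x02, 0x20,  # 8008: LDA $2002
    0x10, 0xFB,        # 800B: BPL $8008
    0x60,              # 800D: RTS
])

if __name__ == "__main__":
    for addr, name in sorted(trace(program, 0x8000, [0x8001]).items()):
        print("{:04X}: {}".format(addr, name))
```

The data byte at $8000 is never visited because nothing jumps to it, but an indirect JMP or a manipulated return address simply ends the trace, which is the limitation described above.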

by on (#46104)
tepples wrote:
Nothing requires a game to store any code at $8000-$80FF. It could be a data table for all anyone cares, and in the case of games with non-trivial mappers, it often is. The only restriction is that $FFFC (the reset vector) has to point to a valid instruction. So read the starting location from $FFFD and $FFFC, and start disassembling from there.


Suppose that the initial address is $C000. Does this mean that all the code in the $8000-$BFFF range is useless? (That there are no subroutines there?)

by on (#46105)
dreampeppers99 wrote:
Suppose that the initial address is $C000. Does this mean that all the code in the $8000-$BFFF range is useless? (That there are no subroutines there?)

No, there could be routines there. You simply can't disassemble it all linearly and expect to get something meaningful. Machine language is very versatile: sometimes you'll even have code mixed with data, instructions that intentionally provoke code misalignment (see here), and other things that make disassembling a very complicated task.

by on (#46106)
If the reset code starts at $C000, then it might jump to code in $8000-$BFFF. A lot of games that use mappers are like this. Because the mapper's power-on state is unpredictable except that $C000-$FFFF is fixed to the last bank of the ROM, they reset into a piece of code in $C000-$FFFF that sets up the mapper and then jumps into code in $8000-$BFFF. So if you're doing a static disassembly, you really have to follow the JMP, JSR, and branch instructions just to get started.

But if your ROM has only one 16 KiB PRG bank, then it's replicated in both $8000-$BFFF and $C000-$FFFF.
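
For the no-mapper case, translating a CPU address to an offset inside the PRG data might look like the Python sketch below. The function name `cpu_to_prg_offset` is invented for this example, and it assumes a ROM without bank switching whose PRG is either 16 KiB or 32 KiB.

```python
def cpu_to_prg_offset(addr: int, prg_size: int) -> int:
    """Map a CPU address in $8000-$FFFF to an offset into the PRG data.

    Assumption: no mapper. A 16 KiB PRG appears twice (at $8000 and at
    $C000), a 32 KiB PRG fills the whole range once.
    """
    if not 0x8000 <= addr <= 0xFFFF:
        raise ValueError("address is outside the PRG ROM window")
    if prg_size not in (0x4000, 0x8000):
        raise ValueError("this sketch only handles 16 KiB or 32 KiB PRG")
    return (addr - 0x8000) % prg_size


if __name__ == "__main__":
    # With a single 16 KiB bank, $8184 and $C184 are the same byte.
    print(hex(cpu_to_prg_offset(0x8184, 0x4000)))
    print(hex(cpu_to_prg_offset(0xC184, 0x4000)))
```

With a single bank, disassembling both $8000-$BFFF and $C000-$FFFF just decodes the same bytes twice.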

by on (#46110)
tepples wrote:
So if you're doing a static disassembly, you really have to follow the JMP, JSR, and branch instructions just to get started.

Also remember that RTS can be used as a jump instruction too. Although this instruction is commonly used to return from subroutines, its actual behaviour is that of jumping to the location pointed by the top of the stack, and there's a very well known trick that consists of manually placing an address there with 2 PHA instructions and then RTS'ing. That's probably impossible to follow in a static disassembly.

by on (#46111)
Quote:
[RTS's] actual behaviour is that of jumping to the location pointed by the top of the stack

the byte just AFTER the location pointed to (that is, plus one).
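
The "plus one" arithmetic can be written out as a small Python sketch. The function names `rts_target` and `bytes_to_push` are made up for this example, and it assumes the high byte is pushed first and the low byte second, the same order JSR uses.

```python
def rts_target(pushed_hi: int, pushed_lo: int) -> int:
    """Address where execution continues after RTS pulls this value."""
    return (((pushed_hi << 8) | pushed_lo) + 1) & 0xFFFF


def bytes_to_push(target: int):
    """Return (hi, lo) to push so that RTS lands on `target`."""
    value = (target - 1) & 0xFFFF  # RTS adds one, so push target - 1
    return value >> 8, value & 0xFF


if __name__ == "__main__":
    hi, lo = bytes_to_push(0xC184)
    print("push ${:02X} then ${:02X}".format(hi, lo))
    print("RTS jumps to ${:04X}".format(rts_target(hi, lo)))
```

A disassembler that wants to follow such a jump has to know the pushed values, which in general means running the code.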

by on (#46112)
Yeah, well, the sentence was already too long as it was, which is why I left that detail out.

by on (#46146)
Thank you all VERY MUCH! :o :D :o
I work on the Jpcsp emulator, and in it the instructions (MIPS) are all the same size! So I was trying to do it the same way... :?

I will try dynamic disassembly.