First, some background: I've learned a bit about 6502 disassembly by porting someone's Java code to C++. Now for my questions:
1) While the code does function for mapper #0, are all other mappers similar enough to allow me to reuse the code directly?
1a) If not, what do I need to account for? (The only difference between mappers that I understand is the need to switch banks, because different mappers allow different amounts of code.)
2) I may be hoping for too much, but are there any patterns I can use to mark a probable data label? (For instance, is a branch instruction a reason to check for a possible label?)
For my last question, I would simply like this information for a future project:
3) What syntax/directives/features does an assembler need to be 'useful'?
Most code that functions for mapper 0 will be able to run well on any other mapper.
The difference is that mappers give writes to the memory-mapped region side effects, such as switching banks, controlling IRQs, or whatever else the mapper does. If there are no writes to the mapper's registers (likely, since such writes do nothing on mapper 0 anyway), the code should run fine.
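For the disassembler, one cheap check is to flag instructions that write into the mapper's address range, since those are the ones that would behave differently on another mapper. A minimal sketch, assuming the registers are decoded somewhere in $8000-$FFFF (true for many common mappers, though some put registers elsewhere) and that `pc` already points at an instruction start; `writesAbsolute` and `isLikelyMapperWrite` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcodes that write to a 16-bit absolute address: the stores, plus the
// absolute read-modify-write instructions (which also write to memory).
// Indexed RMW, zero-page, and indirect forms are left out of this sketch.
bool writesAbsolute(std::uint8_t opcode) {
    switch (opcode) {
        case 0x8D: case 0x9D: case 0x99:  // STA abs, abs,X, abs,Y
        case 0x8E:                        // STX abs
        case 0x8C:                        // STY abs
        case 0x0E: case 0x2E: case 0x4E:  // ASL, ROL, LSR abs
        case 0x6E: case 0xCE: case 0xEE:  // ROR, DEC, INC abs
            return true;
        default:
            return false;
    }
}

// Heuristic: true if the instruction at code[pc] writes to a base address
// in $8000-$FFFF. Indexed stores are judged by their base address only.
bool isLikelyMapperWrite(const std::vector<std::uint8_t>& code, std::size_t pc) {
    assert(pc < code.size());
    if (!writesAbsolute(code[pc]) || pc + 2 >= code.size()) return false;
    // 6502 operands are little-endian: low byte first.
    auto addr = static_cast<std::uint16_t>(code[pc + 1] | (code[pc + 2] << 8));
    return addr >= 0x8000;
}
```

With this, `STA $8000` is flagged while `STA $2000` (a PPU register) and `LDA $8000` are not.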
The best way to sort out code vs data is to log memory access with an emulator.
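As a sketch of how such a log could be consumed, assuming an FCEUX-style .cdl layout: one log byte per PRG ROM byte, bit 0 set when the byte was fetched as code and bit 1 set when it was read as data (any other bits are ignored here). Check your emulator's documentation for the exact format; `ByteKind`, `classify`, and `classifyAll` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteKind { Unknown, Code, Data, Both };

// Assumed flag layout (FCEUX-style CDL); verify against your emulator's docs.
constexpr std::uint8_t kCdlCode = 0x01;  // byte was fetched as an opcode/operand
constexpr std::uint8_t kCdlData = 0x02;  // byte was read as data

ByteKind classify(std::uint8_t cdlByte) {
    bool code = (cdlByte & kCdlCode) != 0;
    bool data = (cdlByte & kCdlData) != 0;
    if (code && data) return ByteKind::Both;
    if (code) return ByteKind::Code;
    if (data) return ByteKind::Data;
    return ByteKind::Unknown;  // never touched during the logged session
}

// Classify every PRG ROM byte; the log must cover the whole PRG ROM.
std::vector<ByteKind> classifyAll(const std::vector<std::uint8_t>& cdl, std::size_t prgSize) {
    assert(cdl.size() == prgSize);
    std::vector<ByteKind> kinds;
    kinds.reserve(cdl.size());
    for (std::uint8_t b : cdl) kinds.push_back(classify(b));
    return kinds;
}
```

Bytes that stay `Unknown` were simply never reached while logging, so playing more of the game shrinks that set.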
I dunno about useful assembly features. I'm pretty happy with the ca65 feature set, maybe give its docs a read for some ideas. (Though it has things I don't use, and doesn't have other things that I would.)
Like with emulators, the NES has far too few assemblers. We need at least a dozen, so we can have more discussions about which is best.
I wouldn't even bother trying to detect code vs. data heuristically without a code/data log from an emulator. I just don't think it's worth it: it's a lot of work for shitty results, whereas a code/data log takes little work and gives good results.
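On question 2: a branch does imply a label, but a code label rather than a data label. Once code bytes are known (for example from the log), each relative branch's target is the address after the 2-byte branch plus a signed 8-bit offset. A minimal sketch; the `Instruction` struct stands in for whatever the disassembler's decoder produces and is hypothetical, and JSR/JMP absolute targets could be collected the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// The eight 6502 relative branches (BPL BMI BVC BVS BCC BCS BNE BEQ)
// all have opcodes matching the bit pattern xxx10000.
bool isRelativeBranch(std::uint8_t opcode) {
    return (opcode & 0x1F) == 0x10;
}

// Target = address of the next instruction (pc + 2) plus the signed offset.
std::uint16_t branchTarget(std::uint16_t pc, std::uint8_t offset) {
    return static_cast<std::uint16_t>(pc + 2 + static_cast<std::int8_t>(offset));
}

struct Instruction {
    std::uint16_t address;  // CPU address of the opcode byte
    std::uint8_t opcode;
    std::uint8_t operand;   // first operand byte (only used for branches here)
};

// Collect every branch target as a label address.
std::set<std::uint16_t> branchLabels(const std::vector<Instruction>& decoded) {
    std::set<std::uint16_t> labels;
    for (const Instruction& ins : decoded)
        if (isRelativeBranch(ins.opcode))
            labels.insert(branchTarget(ins.address, ins.operand));
    return labels;
}
```

For example, `BNE $FE` at $C000 branches back to itself ($C000), and `BEQ $05` at $C010 lands at $C017.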
Interactivity is a nice feature to have in a disassembler (see IDA).
About assembler syntax/directives/etc., I think you'll have a hard time writing a good assembler if you're not an assembler user yourself.
@blargg: sir/madam, let's get onto the same page. Are you trying to be sarcastic, or are you seriously of that opinion?
@thefox: I do use asm6 currently, but I've never tried to use a lot of its stuff.
@all: The final plan for the assembler is to bring it into an IDE. I'm under no illusion that I'll finish this soon, but I find that the more I take on at once, the less I actually get done. Currently, I plan to implement an interface for pluggable tools (i.e., among other things, the product will let users use their favorite assembler). I am willing to put in up to 3 years for this.
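One way to structure a pluggable-tools interface is an abstract base class plus a registry keyed by name. Everything here (`AssemblerPlugin`, `PluginRegistry`, and the toy `TinyAssembler`, which only knows `nop` and `rts`) is hypothetical; a plugin wrapping ca65 or asm6 would more likely write the source to a file and run the external program.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Interface the IDE talks to, whichever assembler sits behind it.
class AssemblerPlugin {
public:
    virtual ~AssemblerPlugin() = default;
    virtual std::string name() const = 0;
    // Assemble `source`; on success fill `output`, otherwise append to `errors`.
    virtual bool assemble(const std::string& source,
                          std::vector<std::uint8_t>& output,
                          std::string& errors) = 0;
};

// Owns plugins and looks them up by name (e.g. from a user setting).
class PluginRegistry {
public:
    void add(std::unique_ptr<AssemblerPlugin> plugin) {
        assert(plugin);
        std::string key = plugin->name();
        plugins_[key] = std::move(plugin);
    }
    AssemblerPlugin* find(const std::string& name) const {
        auto it = plugins_.find(name);
        return it == plugins_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<AssemblerPlugin>> plugins_;
};

// Toy in-process plugin for testing the plumbing: one instruction per line.
class TinyAssembler : public AssemblerPlugin {
public:
    std::string name() const override { return "tiny"; }
    bool assemble(const std::string& source, std::vector<std::uint8_t>& output,
                  std::string& errors) override {
        std::istringstream in(source);
        std::string line;
        while (std::getline(in, line)) {
            if (line == "nop") output.push_back(0xEA);
            else if (line == "rts") output.push_back(0x60);
            else if (!line.empty()) { errors += "unknown: " + line + "\n"; return false; }
        }
        return true;
    }
};
```

The IDE only ever sees `AssemblerPlugin*`, so swapping assemblers is a registry lookup rather than a code change.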
Features of a good multi-platform, retargetable IDE:
-Plugins for video, audio, memory, etc. features
-Table-based Assembler that can load 8, 16 and 32-bit CPUs (Starting with 6502, Z80, 65816 and x86)
-an Emulator and Debugger that utilize those Functions
-and of course, Syntax Highlighting using the Tables from the Assembler!
Are you a bad enough dude to bring this to the table?
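A sketch of the syntax-highlighting idea: the highlighter queries the same mnemonic table the assembler loads for the current CPU, so adding a CPU table updates highlighting too. `highlight` and `TokenClass` are hypothetical names, and the tokenizer is deliberately naive (it splits on spaces, tabs, and commas and ignores comments and strings).

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class TokenClass { Mnemonic, Literal, Other };

struct Span {
    std::string text;
    TokenClass cls;
};

// Classify each word of `line` using the assembler's (lowercase) mnemonic table.
std::vector<Span> highlight(const std::string& line, const std::set<std::string>& mnemonics) {
    std::vector<Span> spans;
    std::string word;
    auto flush = [&]() {
        if (word.empty()) return;
        std::string lower;
        for (char c : word) lower += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        TokenClass cls = TokenClass::Other;
        if (mnemonics.count(lower)) cls = TokenClass::Mnemonic;
        else if (word[0] == '$' || word[0] == '#' ||
                 std::isdigit(static_cast<unsigned char>(word[0])))
            cls = TokenClass::Literal;  // hex, immediate, or decimal operand
        spans.push_back({word, cls});
        word.clear();
    };
    for (char c : line) {
        if (c == ' ' || c == '\t' || c == ',') flush();
        else word += c;
    }
    flush();
    return spans;
}
```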
My friend, I think that I can do it. For the a/v, etc. plugins I think that I can use OOD to great effect. I must ask, though, what is a table-based assembler? Is it the same as a multi-pass assembler that I've been reading about?
Either way, I'm going to get the basics implemented first: I'll build an assembler, text editor, and then a basic emulator. Then I'll start work on additional features.
If you want, we can continue this conversation via PM.
A table-based assembler supports multiple processors by having a table of each one's instruction set. Someone who codes for more than one of the supported processors doesn't have to get comfortable with as many different assemblers and can reuse some techniques/macros between them. Some even support user-supplied processor tables, allowing someone to make it support all their favorite processors. A complication is that each architecture has its own fine points in memory layout and banking, requiring the assembler to have a flexible and rich set of memory organization primitives.
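A minimal sketch of the table idea, using a handful of real 6502 opcodes: the encoder only looks things up, so supporting another CPU means loading another table (plus that CPU's operand rules, which this sketch glosses over). `Mode`, `OpTable`, `make6502Subset`, and `encode` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class Mode { Implied, Immediate, ZeroPage, Absolute };

// Operand size in bytes for each addressing mode (6502 subset).
int operandSize(Mode m) {
    switch (m) {
        case Mode::Implied: return 0;
        case Mode::Immediate:
        case Mode::ZeroPage: return 1;
        case Mode::Absolute: return 2;
    }
    return 0;
}

// (mnemonic, addressing mode) -> opcode byte.
using OpTable = std::map<std::pair<std::string, Mode>, std::uint8_t>;

// A tiny excerpt of a 6502 table; a real one would be loaded from a file.
OpTable make6502Subset() {
    return {
        {{"lda", Mode::Immediate}, 0xA9},
        {{"lda", Mode::ZeroPage}, 0xA5},
        {{"lda", Mode::Absolute}, 0xAD},
        {{"sta", Mode::ZeroPage}, 0x85},
        {{"sta", Mode::Absolute}, 0x8D},
        {{"jmp", Mode::Absolute}, 0x4C},
        {{"rts", Mode::Implied}, 0x60},
    };
}

// Encode one instruction; empty if the table has no such mnemonic/mode pair.
std::optional<std::vector<std::uint8_t>> encode(const OpTable& table, const std::string& mnemonic,
                                                Mode mode, std::uint16_t operand) {
    auto it = table.find({mnemonic, mode});
    if (it == table.end()) return std::nullopt;
    std::vector<std::uint8_t> bytes{it->second};
    int size = operandSize(mode);
    assert(size != 1 || operand <= 0xFF);
    if (size >= 1) bytes.push_back(static_cast<std::uint8_t>(operand & 0xFF));  // low byte first
    if (size == 2) bytes.push_back(static_cast<std::uint8_t>(operand >> 8));
    return bytes;
}
```

Unknown combinations such as `sta #imm` fall out naturally as "not in the table", which is also where an error message would come from.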
blargg wrote:
A table-based assembler supports multiple processors by having a table of each one's instruction set. Someone who codes for more than one of the supported processors doesn't have to get comfortable with as many different assemblers and can reuse some techniques/macros between them. Some even support user-supplied processor tables, allowing someone to make it support all their favorite processors.
The ca65 assembler, popular among some users of this board, has a mode, .setcpu none, that disables recognition of all 6502 instructions. From there, you can reimplement all needed instructions as macros using ca65's macro language. I ought to try reimplementing 6502 in macros just to prove it can be done and give an example for those who would implement SPC700, 8080-derivatives such as Z80 and Game Boy, CHIP-8, 68000, etc.
Quote:
A complication is that each architecture has its own fine points in memory layout and banking, requiring the assembler to have a flexible and rich set of memory organization primitives.
And the ca65 toolchain certainly has a flexible linker in ld65. One limitation I can see is that it assumes program memory is linear, so you can't target a platform with a polynomial (LFSR-stepped) program counter, such as a certain infamous 4-bit microcontroller.
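To make "polynomial program counter" concrete: instead of incrementing, the PC is stepped by a linear-feedback shift register, so consecutive instructions sit at scattered addresses. The sketch uses a 6-bit LFSR with characteristic polynomial x^6 + x + 1, chosen only because it is primitive (it cycles through all 63 nonzero states); it is not claimed to match any particular chip, and `nextPc`/`fetchOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Advance a 6-bit Fibonacci LFSR: shift left and feed bit5 XOR bit4 into bit0.
// Illustrative feedback only (x^6 + x + 1); state 0 would lock up, so avoid it.
std::uint8_t nextPc(std::uint8_t pc) {
    assert(pc < 64);
    std::uint8_t feedback = ((pc >> 5) ^ (pc >> 4)) & 1;
    return static_cast<std::uint8_t>(((pc << 1) | feedback) & 0x3F);
}

// Addresses visited, in execution order, by `count` straight-line instructions.
std::vector<std::uint8_t> fetchOrder(std::uint8_t start, int count) {
    std::vector<std::uint8_t> order;
    std::uint8_t pc = start;
    for (int i = 0; i < count; ++i) {
        order.push_back(pc);
        pc = nextPc(pc);
    }
    return order;
}
```

Starting from address 1, six straight-line instructions live at 1, 2, 4, 8, 16, 33, so a linker has to place the k-th byte of a segment at the k-th LFSR state rather than at start + k.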
tepples wrote:
The ca65 assembler [...] has a mode .setcpu none that disables recognition of all 6502 instructions. From there, you can reimplement all needed instructions as macros using ca65's macro language.
Wow, this is inspiring. I'd love to ditch wla-dx. I suspect it has snags, though: shortcomings in the macro system that are fine for normal use but pose serious problems for something like this. Z-80's nn versus (nnnn) comes to mind as one that might be tricky to parse, though I know that ca65 supports substring functions that might work for these.
blargg wrote:
...though I know that ca65 supports substring functions that might work for these.
Not substring functions, but token list extraction. Using .mid(), .left(), and .right(), it can extract and match the tokens listed here. That should be enough to parse what you need, though.
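The Z80 case boils down to one rule over the operand's token list: it is a memory reference only if it opens with "(" and that parenthesis closes at the last token, so (nnnn) is indirect while (1+2)*3 is an immediate expression. The sketch below is C++ showing that rule on a hypothetical pre-tokenized operand; in ca65 the same test would have to be built from .left/.right/.mid, which this sketch does not attempt to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if the operand is "( expr )" as a whole, i.e. a memory reference,
// rather than an expression that merely starts with a parenthesis.
bool isIndirect(const std::vector<std::string>& tokens) {
    if (tokens.size() < 3 || tokens.front() != "(") return false;
    int depth = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] == "(") ++depth;
        else if (tokens[i] == ")") --depth;
        // The opening parenthesis has just been closed: indirect only if
        // nothing follows it.
        if (depth == 0) return i == tokens.size() - 1;
    }
    return false;  // unbalanced parentheses
}
```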