As I recently mentioned in another thread, after years of trying to adapt to many of the existing 6502 assemblers and feeling constantly frustrated by their quirks and missing features, not to mention the time I've spent trying to customize them to suit my needs, I've decided to write my own assembler. It's not supposed to be the ultimate assembler to dethrone them all (far from it!), but it'll pack everything I need out of the box so I don't have to overcomplicate things with intricate macros and jury-rigged workarounds. The goal is to write something simple (so it doesn't take forever to get done), easy to use (no need for complex configurations) and generic enough to produce binaries for any 6502 machine (no need for NES header directives, for example, since that can be done with macros). If I can make it flexible enough so it's easy to add support for other CPUs, even better!
I'm modeling this mostly after ASM6, which is the assembler that most closely resembles the ideal tool for me, but I'm picking up ideas from other assemblers as well. You might be asking: "If this is so close to ASM6, then why not just create a fork of it?" Well, besides not knowing HOW exactly to do that (I'm not particularly skilled in C and I can hardly understand the design of ASM6 from just looking at the source code), I'd need to modify a few core features of the program, and that would probably be too hard or even impossible for me to do, so I might as well write the whole thing from scratch. Plus I'm a control freak and don't want to be bound by other people's design choices, if I decide to include even more stuff later on. For now I'll be using a higher-level language than C, most likely JavaScript (one of the languages I'm most proficient in) running on Node.js, at least for prototyping, in order to get something working ASAP. If everything goes well, I may or may not port it to something more efficient at a later time.
The reason I'm making this thread is not to advertise my assembler, create hype (as if!), take requests, or anything like that. I'm actually a little insecure about my design ideas and ways to implement them, seeing as I've never written a program like this before, so I'd like to discuss some of these ideas with you guys first, to make sure I'm not missing anything important and making bad decisions left and right. If the end result is something people will be interested in using, I'll be more than happy to share the program, but like I said, my ultimate goal is to be able to write 6502 programs in a way that *I* am comfortable with.
First, I'd like to talk about the things I plan to carry over from ASM6, which are the following:
- LINEAR ASSEMBLY: I never cared much for outputting to different segments and linking a bunch of separate modules to put a ROM together; to me it makes more sense to simply fill the ROM linearly. Even when using ca65 I never felt the need to use segments or link individually assembled modules, I just used what was necessary to write my code as linearly as possible.
- MULTIPLE PASSES: I value this a lot because it allows for complex symbol resolving, which in turn allows for more dynamic memory arrangements, such as overlays, right-aligned sections and relocatable code, without specific features to handle those cases. All in all, the ability to freely use symbols/labels in expressions and directives greatly improves the versatility of the assembler.
- SIMPLE MACRO SYSTEM: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler. Labels inside macros are all local. Recursion is not allowed.
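To make the multiple-pass idea a bit more concrete, here's a rough JavaScript sketch of how I imagine a right-aligned section getting resolved. It's only a toy under my own assumptions, not actual assembler code: the statement types (label, bytes, padTo), the assemble function and the TailStart/TailEnd labels are all made up for illustration. Each pass evaluates expressions against the symbol values from the previous pass, and assembly stops once a pass changes nothing.

```javascript
// Toy multi-pass resolver. Statement shapes are hypothetical:
//   { type: "label", name }   defines a label at the current PC
//   { type: "bytes", count }  emits `count` bytes
//   { type: "padTo", expr }   moves the PC to expr(symbols), which may use
//                             labels defined later in the source
function assemble(statements, origin, maxPasses = 10) {
  let symbols = new Map(); // values known from the previous pass
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = new Map(); // values computed during this pass
    let pc = origin;
    let unresolved = false;
    for (const st of statements) {
      if (st.type === "label") {
        next.set(st.name, pc);
      } else if (st.type === "bytes") {
        pc += st.count;
      } else if (st.type === "padTo") {
        const target = st.expr(symbols);
        // NaN means a label wasn't known yet: leave the PC alone and retry.
        if (Number.isNaN(target)) unresolved = true;
        else pc = target;
      }
    }
    // Done when this pass produced exactly the same table as the last one.
    const stable = next.size === symbols.size &&
      [...next].every(([name, value]) => symbols.get(name) === value);
    if (stable && !unresolved) return { symbols: next, passes: pass };
    symbols = next;
  }
  throw new Error("symbols did not settle after " + maxPasses + " passes");
}

// A 6-byte tail that must end exactly at $10000 (interrupt vectors, say).
const program = [
  { type: "bytes", count: 16 },
  { type: "padTo", expr: (s) => 0x10000 - (s.get("TailEnd") - s.get("TailStart")) },
  { type: "label", name: "TailStart" },
  { type: "bytes", count: 6 },
  { type: "label", name: "TailEnd" },
];
const result = assemble(program, 0xC000);
console.log(result.symbols.get("TailStart").toString(16), result.passes);
```

In this toy case the values settle on the third pass: the first pass can't evaluate the padding target at all, the second moves the tail to $FFFA, and the third confirms that nothing changed. A real implementation would also need to reject a padding target that falls behind the current PC, which this sketch doesn't check.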
Now here are a few things I'm also basing off of ASM6 but I plan on changing, taking other assemblers and my own use cases into account:
- OVERLAYS: ENUM is still the primary way to declare variables, but in order to facilitate the creation of overlays, ENUMs can optionally be named. This allows for an ENUM to pick up from where another ENUM left off. If two or more ENUMs pick up from the end of the same ENUM, they're effectively overlays. ASM6's ENUM can already kinda be used like that, you just have to define a label at the very end of the ENUM, and use that label as the starting address of another ENUM to keep going from where the first one stopped. The problem with that is that all these extra labels will get exported to label files along with the ones that are actually relevant for debugging. Maybe the solution is not to change ENUM, but to take a cue from ca65 and implement two forms of assignment for symbols, one that marks the symbol as a label (:=) and one that doesn't (=).
- ANONYMOUS LABELS: Anonymous labels are unidirectional in ASM6, and I have had to awkwardly put a + label and a - label on consecutive lines because I needed to access that point both from before and after it. For this reason, I feel like ca65's way is superior - the label is just a colon, and the direction is specified in the reference instead. Matching the number of colons to the number of + and - symbols is a bit error-prone though, especially if you need to add or remove an anonymous label between others that were already there, requiring you to double check and adjust all the nearby references, but I can live with that, especially considering that you're not supposed to abuse anonymous labels in the first place.
- LOCAL LABELS: Local labels start with "@", but having their scope delimited by non-local labels is too restrictive IMO. I constantly write subroutines with multiple entry points, which are obviously defined via global labels, and having those global labels break the scope of the local labels that should be visible in the entire subroutine is a major annoyance. To fix this, scopes now must be explicitly started with the SCOPE directive. Unlike in ca65, scopes are not blocks, you can only end a scope by starting a new one, meaning it's not possible to have nested scopes. Scopes can be named, which allows their local labels to be accessed from the outside (e.g. SomeScope.@LocalVariable).
- REPEATED LABELS: Several NES mappers require reset stubs and common subroutines to be repeated across multiple banks, and this is a problem when assembling a program because labels can't be repeated. One way to work around that in ASM6 is to define labels by assigning the PC to a symbol (e.g. MyLabel = $), since symbols can be reassigned without errors. However, I created a problem when I introduced the SCOPE directive, because now I have to get around repeated scope names too. The only solution I could think of was to ignore repeated scope names, and create a nameless scope whenever a repeated name is supplied. This will cause any duplicates to be essentially invisible to the rest of the program.
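Here's a small JavaScript sketch of the scoping rules described above, mostly to check that they hang together. It assumes a hypothetical SymbolTable class of my own invention (none of these names come from ASM6 or ca65): SCOPE replaces the current scope instead of nesting, global labels never end a scope, and a repeated scope name silently produces a nameless scope.

```javascript
// Hypothetical symbol table for the SCOPE rules; all names are illustrative.
class SymbolTable {
  constructor() {
    this.globals = new Map(); // global label -> address
    this.scopes = new Map();  // scope name -> Map of its local labels
    this.current = new Map(); // locals of the scope that is currently open
  }

  // SCOPE [name]: the only way to end a scope is to start a new one.
  startScope(name) {
    this.current = new Map();
    // Only the first scope with a given name gets registered; a repeated
    // name leaves the new scope nameless, so it can't be reached from outside.
    if (name !== undefined && !this.scopes.has(name)) {
      this.scopes.set(name, this.current);
    }
  }

  // Labels starting with "@" are local; everything else is global.
  define(label, address) {
    if (label.startsWith("@")) this.current.set(label, address);
    else this.globals.set(label, address);
  }

  // Accepts "@local", "Scope.@local" or "Global".
  lookup(ref) {
    if (ref.startsWith("@")) return this.current.get(ref);
    const dot = ref.indexOf(".@");
    if (dot !== -1) {
      const scope = this.scopes.get(ref.slice(0, dot));
      return scope ? scope.get(ref.slice(dot + 1)) : undefined;
    }
    return this.globals.get(ref);
  }
}

const table = new SymbolTable();
table.startScope("DrawSprite");
table.define("DrawSprite", 0x8000);
table.define("@loop", 0x8004);
table.define("DrawSpriteAlt", 0x8010); // second entry point, scope stays open
console.log(table.lookup("@loop") === 0x8004); // still visible here

table.startScope("DrawSprite"); // same name again, e.g. a copy in another bank
table.define("@loop", 0x9004);
console.log(table.lookup("DrawSprite.@loop") === 0x8004); // the first copy wins
```

One consequence worth noting: with this rule, the copy in the second bank can only be reached from inside its own scope, so anything that needs to call into it from elsewhere has to go through a global symbol.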
And finally, a few things that ASM6 doesn't have at all:
- ZP ADDRESSING OVERRIDING: ASM6 uses ZP addressing whenever possible, but when you're writing timed code, you may want to access ZP locations using absolute addressing. Maybe an address size modifier like in ca65 (a:address) is the answer.
- MEMORY BANKS: Keeping track of what's where when dealing with bank switching is a big annoyance, and doing that manually is just too error-prone. My solution is to simply create a BANK directive that you can use to set the current bank number, causing any subsequent labels to be assigned that bank number. This information can then be extracted from the labels whenever necessary. The same numbers can be used over and over, since you may need to index PRG-ROM banks, CHR-ROM banks, RAM banks, and so on.
- FUNCTIONS: As far as I can tell, ASM6 doesn't have any built-in functions, and doesn't offer any means for users to create their own. User defined functions are probably outside of the scope of my simple assembler, but a few built-in functions could be really useful. There will certainly be a function to extract bank numbers from labels, and maybe a function to test whether macro parameters are blank or not.
- TEXT OUTPUT: ASM6 can only output error messages, but other kinds of messages are also very important. The OUT directive can output text without aborting the assembly process. Since this is a multi-pass assembler, text output must be buffered during each pass, and only the text generated during the last pass should be displayed.
- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).
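As a quick sanity check of the CHARMAP idea, here's how I picture the auto-incrementing index working, sketched in JavaScript. The charmap and encodeText functions are hypothetical names, and I'm assuming that a bare number after the index (like the $0D in the example) stands for a source character code rather than a target index.

```javascript
// Hypothetical CHARMAP handler: maps each character to the next free index.
// `items` may be strings or numeric source character codes (an assumption).
function charmap(table, index, ...items) {
  for (const item of items) {
    const chars = typeof item === "number"
      ? [String.fromCharCode(item)]
      : Array.from(item);
    for (const ch of chars) table.set(ch, index++);
  }
  return index; // the next unused index
}

// Translates a string into tile indices, rejecting unmapped characters.
function encodeText(table, text) {
  return Array.from(text, (ch) => {
    if (!table.has(ch)) {
      throw new Error("unmapped character: " + JSON.stringify(ch));
    }
    return table.get(ch);
  });
}

// Equivalent of: CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D
const tiles = new Map();
charmap(tiles, 0x00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", 0x0D);
console.log(encodeText(tiles, "HI!")); // [ 7, 8, 28 ]
```

Failing on unmapped characters is just my choice for the sketch; falling back to the plain ASCII value would be another option.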
These are all my ideas so far. If anyone was brave enough to read through all of this, please share your opinions on what I have so far. Am I missing something important? Am I doing something in a dumb way? Please comment.