SNES in CA65: Long Instructions Help

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
SNES in CA65: Long Instructions Help
by on (#86256)
I have read CA65's manual, and here is what I am trying to figure out: what does CA65 accept for 65816 24-bit jumps and pointers?

I think it must be "JSR !Label" or "JMP !Label", with a "!" symbol before a label marking it as a "long" value. Is that correct?

by on (#86261)
Do JSL and JML work?

by on (#86280)
tepples wrote:
Do JSL and JML work?


I can't verify because there are no examples (that I know of) of using those opcodes in CA65, so if you have any examples, now would be a great time to share them.

Another thing, about data pointers: how do I define 24-bit (long) data? Is it .DL, .LONG, or something else?
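
For the data-pointer question, a minimal sketch: the ca65 manual documents .faraddr as the directive that emits a 24-bit address, alongside .addr/.word for 16-bit values and .dword for 32-bit values. The labels handler_a and handler_b are hypothetical, used only for illustration.
Code:
.p816                    ; enable 65816 mode

handler_a:
  rtl
handler_b:
  rtl

far_table:
  .faraddr handler_a     ; 3 bytes: low byte, high byte, bank byte
  .faraddr handler_b
near_table:
  .addr handler_a        ; 2 bytes: low byte and high byte only, no bank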

by on (#86369)
I am still a bit confused about how longs work in CA65. It is hard to make progress when I do not know what actually works.

Could anyone lend me a hand by explaining how 24-bit pointers and jumps work in CA65? I would really appreciate it.

(Sorry for double posting; I waited at least 2 days before doing so. Next time, I'll avoid it.)
Re: SNES in CA65: Long Instructions Help
by on (#237893)
Seven years later, I finally got around to experimenting with this. Hamtaro, do you still need insights? If so, I can provide some. The short of it is: ca65 is quite stupid about knowing when to use 24-bit addressing, even in situations where segments are explicitly declared as far. There are some exceptions based on "aliased" opcodes/instructions (e.g. jml), but not on addressing modes (e.g. lda label,x where you want this to assemble to opcode $BF, absolute long indexed, i.e. 24-bit). The solution for the latter is to use the f: prefix on your labels, and to examine your post-assembly listing file very, very carefully. Let me know here in the thread if you need advice.
Re: SNES in CA65: Long Instructions Help
by on (#237897)
So
Code:
  JML jump_point
  JSL subroutine
are guaranteed to use 24-bit addressing while
Code:
  JMP f:jump_point
  JSR f:subroutine
  LDA f:table,x
are not?
Re: SNES in CA65: Long Instructions Help
by on (#237903)
JSR f:label might produce an error. The mnemonic for long jumps to subroutines is JSL.

The other f: instructions should work. However, references to things in other modules might produce error messages; I think you need to explicitly import and export them as far addresses, like:

.export bar: far
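
A minimal two-file sketch of that, assuming a routine bar that lives in one module and is called from another. The file names and the segment names "BANK1" and "CODE" are hypothetical and would need matching entries in the linker config.
Code:
; bar.s (hypothetical file)
.p816
.export bar : far          ; publish bar as a 24-bit address
.segment "BANK1" : far
bar:
  rtl                      ; far routines return with rtl

; main.s (hypothetical file)
.p816
.import bar : far          ; declare the size before any use
.segment "CODE"
  jsl bar                  ; opcode $22 followed by a 24-bit address
Keeping the .import at the top of the calling file means the address size is known before the first use.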
Re: SNES in CA65: Long Instructions Help
by on (#237922)
I see. So declare the segment as far, use the "f:" prefix, and for long jumps always use the special mnemonics JML/JSL, and you should hopefully be fine.
Re: SNES in CA65: Long Instructions Help
by on (#237957)
Yes, that's correct.

A qualm rainwarrior and I have is with the fact that if you "make a mistake" (e.g. declare the segment as far + there's a label in the segment, but don't use f:label), the assembler silently assembles to the 16-bit absolute address which results in obvious runtime mistakes/errors. I personally ran into this last week and discussed it on the nesdev Discord with rainwarrior and others.
Re: SNES in CA65: Long Instructions Help
by on (#237960)
Yes, JSL/JML and the f: prefix are both safe. They will ensure you get the correct instruction for a far jump.

Otherwise these are automatically selected by whether or not the label is declared : far somewhere above its usage in the file. I don't really like the conflation of JSR with JSL, but there seems to be some precedent for this in older assemblers and references. (I'd love a directive to disable JSR as JSL.)

There are three ways I know of to declare something as far, using address modifiers:
Code:
; put it in a segment with a modifier
.segment "s" : far
label: ; this label is now a 24-bit address

; use .import, .export, or .global to forward-declare it with a modifier
.global label : far

label: ; this label inherits 24-bit size from its .global declaration above

.proc label : far ; this labelled procedure is now a 24-bit sized address
rts ; be careful if using .smart mode, which will change rts in a far context into an rtl.
.endproc

Because of the one-pass model of ca65, this only works if the address modifier declaration is above the place you used it. Unfortunately, accidentally putting the address declaration below fails to produce an error for the uses above, and the address gets silently truncated to 16-bits currently. I consider this a bug and have raised an issue on GitHub to report it.
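
A minimal sketch of the ordering problem described above, assuming a hypothetical table placed in a far segment named "BANK1". The comments restate the behaviour reported in this post; they are not output from a specific ca65 build.
Code:
.p816

; .global table : far      ; a forward declaration here, above the uses,
                           ; is one way to avoid the problem

.segment "CODE"
early_use:
  lda table,x              ; table is not yet known to be far at this point,
                           ; so this is treated as a 16-bit absolute,X access
  lda f:table,x            ; explicit prefix: long,X ($BF) whatever the order
  rts

.segment "BANK1" : far
table:                     ; the far declaration arrives only here, below the uses
  .byte 1, 2, 3
Checking the listing file for the opcode actually emitted at early_use is the quickest way to confirm which form was chosen.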

Also, I strongly advise against using .org at all with ca65. It turns off most of the linker's capability to do safety checking. As a feature in ca65 it mostly exists to permit use of legacy assembly code that relies on it, but the idiomatic ca65 way to place code is through segments and your linker config.

Incidentally, I have been reading the ORCA/M manual after koitsu linked it to me. It also recommends against ORG: "The ORG directive, a relic from the older Apple // machines, is really not needed on this system. It is used to force code to be located at a specified address. We recommend that you not use it at all." This is basically for the same reason, as it is also an assembler with a linker, though in this case much of the link work seems to be done by the Apple II executable format, similar to DOS executables? (I don't know the Apple II territory very well.)


However, what I think I would suggest, at least as far as ca65 goes, is to avoid the "all in one file" model of doing things. If your translation units don't have more than one bank's material in them, they are a lot less prone to this error. I was personally surprised that I'd never run into it, and realized this practice is why; I'm used to this style of organization from C coding, where it was normal to separate your code into multiple files with imports and exports (in C, the header file externs) to manage the things that would connect.

When all your "far" references are imports at the top of the file, there's a lot less chance to mess up the ordering, or have problems with the one-pass model of things.
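
A minimal sketch of that layout, assuming one source file per bank. The names far_routine, shared_table, local_data, and the segment "BANK1" are hypothetical.
Code:
; bank1_code.s (hypothetical file): everything defined here lives in one bank
.p816
.a16

.import far_routine : far    ; lives in another bank
.import shared_table : far   ; lives in another bank

.segment "BANK1"
local_data:
  .word $1234

.proc local_routine
  lda local_data             ; same bank: 16-bit absolute, relies on DB
  jsl far_routine            ; other bank: 24-bit call
  lda f:shared_table,x       ; other bank: 24-bit long,X
  rts
.endproc
Because every far symbol is an import at the top, there is no later declaration that could arrive after its first use.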


This got me thinking more about the movable direct page issue. I started an issue thread about this as well on GitHub: 65816 direct page and data bank assembly directives?

However, I raised the issue not really expecting ca65 to ever change in this way, but more as an opportunity to foster discussion about it. I listed the coding practices I use that seem to make it a very manageable issue for me so far, and the change I'm proposing I think is very high impact on the compiler. So... unfortunately the work (especially testing) required vs. what I'd get out of it is below my threshold of wanting to undertake it. I'd welcome comments about it, though, which is why I started the issue thread. Can you think of a lower impact way to do it? Do you have other coding strategies that work well? Am I wrong to think it would be a useful feature at all? (I consider myself very experienced with ca65, but not so experienced with 65816, so I'd really appreciate commentary.)


Also, take note that the address and operand modifiers discussed above work for data instructions as well, i.e. LDA f: is always 24-bit, but a regular LDA will be 24-bit only if the label is already declared as far. By default they will be 16-bit addresses, though, which means you need to ensure the correct DB manually. Again, this has the same problem that it won't report an error if your : far appears after instead of before, and trying to mix abs bank-local data fetches with far cross-bank ones, or otherwise mixing banks in one file, is a bit of a nuisance.
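
A minimal sketch of the two data-fetch options, assuming a hypothetical table in a segment named "RODATA" that is not declared far, and an 8-bit accumulator.
Code:
.p816
.a8

.segment "RODATA"          ; no : far, so labels here default to 16-bit
table:
  .byte 10, 20, 30

.segment "CODE"
  lda f:table,x            ; forced long,X ($BF): works whatever DB holds

  phb                      ; save the current data bank
  lda #^table              ; bank byte of table's full address
  pha
  plb                      ; DB now matches table's bank
  lda table,x              ; absolute,X ($BD): correct only because DB was set
  plb                      ; restore the saved data bank
The second form pays off when many accesses share one bank; for a single fetch the f: form is shorter overall.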
Re: SNES in CA65: Long Instructions Help
by on (#237963)
Ah, finally found the relevant stuff in ORCA/M. It actually doesn't seem that far from CC65 in structure, but critically here it was designed for 65816 specifically, rather than being a 6502 thing with a 65816 mode added on.

It has a similar concept of a relocatable SEGMENT.

The data bank problem is solved with a USING directive. Within a SEGMENT you can write USING segment_name and any references to labels from that segment will be considered bank-local. Any references to other segments will be considered far. You still have to manage DB yourself, but that's how you can tell the linker that other segments are still local.

It also has a DIRECT directive, which changes the direct page. From the documentation it sounds like it has the same constraints as ca65, i.e. DP variables must have an address known at assemble time for the shorter instruction to be selected automatically. However, it uses <, | and > as operand prefixes more or less like ca65's z:, a: and f:, and < is explicitly usable with direct page here. So I think the approach is to prefix with < when you want DP instructions (or maybe to allocate directly with EQU). The nice thing is that, combined with DIRECT, it can do error checking at link time to make sure the variable you're using is on the correct page. ca65 currently can't do that with z:, which can't be relocated.


So... I guess the DIRECT thing is more or less equivalent to what people have been proposing for a long time with ca65.

The USING thing is a slightly more "automated" approach to what I proposed in that github thread about an equivalent way to set the assumed DB. Might actually be less error prone? Though it is very asymmetrical with DP... I will mention it in the thread though, because I think it's an interesting alternative.

Though maybe worth pointing out that the currently viable technique I use of encapsulating banks as translation units is somewhat equivalent to USING, i.e. you're USING anything you .import as abs and not far. At that point it would have similar resistance to errors and safety checking, I think.
Re: SNES in CA65: Long Instructions Help
by on (#237968)
I can't think of an instance where I would NEED to change the DP from zero. Yes, you will get added efficiency. But consider this... with index registers set to 16 bit, you can adjust the DP by setting the X or Y register to $100 or $200.

(65816 in native mode does not restrict indexing to 1 page, unlike all previous 65xx chips)

And, for the far issue, you can manage far addresses by changing the DB (B) and using standard 16 bit absolute addresses.
Re: SNES in CA65: Long Instructions Help
by on (#237969)
dougeff wrote:
I can't think of an instance where I would NEED to change the DP from zero. Yes, you will get added efficiency. But consider this... with index registers set to 16 bit, you can adjust the DP by setting the X or Y register to $100 or $200.

That does defeat the purpose of using DP.

The usual cited use case is making an array of objects out of DP pages, which seems a pretty sensible use to me? (You don't have to use 256 bytes for each object, it could be shared with other arrays, etc.)
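
A minimal sketch of that use case, assuming a hypothetical object layout: OBJ_X, OBJ_Y, OBJ_SIZE, and OBJ_BASE are invented constants, and the routine expects the object's byte offset in a 16-bit accumulator.
Code:
.p816
.a16
.i16

OBJ_X    = $00             ; field offsets inside one object (hypothetical)
OBJ_Y    = $02
OBJ_SIZE = $10             ; objects need not be 256 bytes apart
OBJ_BASE = $0400           ; hypothetical RAM address of the first object

.segment "CODE"
.proc move_object          ; in: A = object index * OBJ_SIZE
  phd                      ; save the caller's direct page register
  clc
  adc #OBJ_BASE
  tcd                      ; D = address of this object
  inc z:OBJ_X              ; direct page access, relative to D
  dec z:OBJ_Y
  pld                      ; restore the caller's direct page register
  rts
.endproc
Note that the 65816 adds one cycle to each direct page access when the low byte of D is nonzero, so page-aligned objects keep the full speed benefit.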

Also TG16/PC Engine requires it to be fixed at $2000, which is a bit of a different need but would be effectively serviced by the same feature.

dougeff wrote:
And, for the far issue, you can manage far addresses by changing the DB (B) and using standard 16 bit absolute addresses.

The main issue was the bug that generated non-working near addressing instructions from addresses that needed to be far. (Hopefully that won't be too hard to fix.)

The deeper issue is that it's a little bit obtuse to specify whether a label is near or far, and that if you want to have code from two different banks in the same translation unit it becomes trickier to use them both ways in the two different contexts. Some assembler level integration would permit much better error resolution and simplify this a lot, I think. (Though like I said, my solution was kinda just to break it into different files.)

Consider jumping to another bank in the same file, and changing the DB. Now all those labels that were local should be far, and vice versa. There's no way to make a label act like both at once. You can use operand prefixes (z: a: f:), but these aren't allowed to truncate a label that's already far; they can only expand 1-byte operands to 2 or 3 bytes, and check for range errors. Without knowledge of which bank the near labels are in, though, the assembler can't error check or correctly truncate. The alternative is to truncate manually, which throws away the bank without the error check. (You could maybe make macros for this, but this feels worse than the alternatives to me.)
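
A minimal sketch of manual truncation, assuming a hypothetical far segment "BANK2" whose code runs with DB already set to its own bank. It uses ca65's .loword function to keep bits 0-15 of the address; how a given ca65 version sizes the resulting operand is worth confirming in the listing file.
Code:
.p816
.a8

.segment "BANK2" : far
bank2_data:
  .byte $AA

bank2_code:
  ; Assumption: DB already equals the bank of bank2_data at this point.
  lda .loword(bank2_data)  ; manual truncation, intended as absolute ($AD);
                           ; nothing checks that DB matches ^bank2_data
  lda f:bank2_data         ; long ($AF): correct for any DB, one byte larger
  rts
The truncated form is the one that silently breaks if the code is later moved or DB changes, which is the missing safety check discussed above.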

There's a huge opportunity here for the assembler to both keep you safe from accidentally mixing banks, and effectively auto-select efficient instructions, but it can't do either without giving the programmer an opportunity to specify a little more information about it.


...though TBH the more I say it out loud the better it seems to just break it into files like I've been doing and let that existing encapsulation ability take care of it. :S
Re: SNES in CA65: Long Instructions Help
by on (#237974)
Quote:
That does defeat the purpose of using DP.


Not really. If the purpose is to reduce the size of the binary, then what I described is no bigger.

Scenario 1. Changing the DP from 0000 to 0100 and loading from it, address $185, starting from 16 bit A and X

LDA #$0100 ;3 bytes
TCD ;1 byte
REP #$20 ; 2 bytes
LDA $85 ;2 bytes

Scenario 2. Leaving DP at zero, adjusting index

LDX #$0100 ;3 bytes
REP #$20 ;2 bytes
LDA $85, X ;2 bytes
Re: SNES in CA65: Long Instructions Help
by on (#237975)
The purpose I was referring to is speed, not size.

Not using DP avoids the problem entirely, of course, but I've seen many people complain here and elsewhere about not having a good way to do it in ca65. I can think of a few ways to do it, but built-in support would make it safer, easier, faster, simpler, you know?
Re: SNES in CA65: Long Instructions Help
by on (#238149)
I think I have a fix for the bug where you declare a segment as : far too late in the file for the first pass to catch it:
https://github.com/cc65/cc65/pull/885

I don't have a lot of internal cc65 experience, so I'm not 100% confident it's a complete solution to that bug, but it passes my own tests and doesn't fail cc65's regression test suite. Binary build is attached if anyone wants to help give it a test.

All the above suggestions still apply, this has nothing to do with having the assembler understand DP/DB, but I think it addresses the critical error of accidentally generating 16-bit addresses where you wanted 24-bit.

Basically, as long as a label is in a : far segment it should be error checked for you now, regardless of whether it's declared above or below. The default addressing is still 16-bit (as it should be), but if the symbol isn't declared as far before it's used, you'll get a range error because it would have assumed the default, which you can quickly resolve by adding an f: prefix on the operand (or e.g. using JSL instead of JSR).

Also note that if you use .global : far as a forward declaration, the .segment the label is in still has to be : far. Segment overrides import/export declarations. (This is good, though, it means you can use a symbol as a near address within the file, then .export it with a far address to be used elsewhere.)
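
A minimal sketch of that last pattern, assuming a hypothetical helper_table that is near inside its own file and far everywhere else. File and segment names are invented, and depending on version and warning level ca65 may report the size difference between the symbol and its export.
Code:
; util.s (hypothetical file)
.p816
.export helper_table : far   ; other modules see a 24-bit address

.segment "BANK1"             ; not : far, so uses in this file are 16-bit
helper_table:
  .byte 1, 2, 3

.proc local_user
  lda helper_table,x         ; absolute,X here (DB must be this bank)
  rts
.endproc

; elsewhere.s (hypothetical file)
.p816
.import helper_table : far
.segment "CODE"
  lda helper_table,x         ; imported as far, so long,X is selected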