I've been putting this off indefinitely, and now that it's summer and I have absolutely no excuse, I thought I'd have a crack at making a simple Pong demo. I was trying to take baby steps by first having an 8x8 ball that moves at a constant speed and changes direction at the screen edges. I figured I'd do an eor #$FFFF on the velocity, which should kind of make it a negative number. It does flip it, but it makes the ball travel 2 pixels instead of one, and the ball also exits the screen. I thought I could just check whether the ball's X position was over 248 (256 - 8) or its Y position was over 216 (224 - 8) and then do the eor there. Although that stops it the first time around, it doesn't stop it on the way back, when the ball's position wraps below 0 to around 65535, which also counts as over the limit, probably because it's going back two and then adding one. Why is this phenomenon even happening? I figured adding #$FFFE would be the same thing as subtracting #$0001.
Code:
lda BallXPosition
cmp #248
bcc continue_finding_ball_x_position
lda BallXVelocity
eor #$FFFF
sta BallXVelocity
continue_finding_ball_x_position:
lda BallXPosition
clc
adc BallXVelocity
sta BallXPosition
lda BallYPosition
cmp #216
bcc continue_finding_ball_y_position
lda BallYVelocity
eor #$FFFF
sta BallYVelocity
continue_finding_ball_y_position:
lda BallYPosition
clc
adc BallYVelocity
sta BallYPosition
I can always add more code I made, but I think this is enough.
$FFFE is equal to -2, so you most likely want $FFFF (-1) instead. Adding 1 to the value after the eor #$FFFF will get you the value you want ($0001 to $FFFF and vice-versa) for both the positive and negative direction (this is how two's complement negation works).
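To make the arithmetic concrete, here is a minimal sketch in Python that models 16-bit registers with a mask. It is only a model of the math, not 65816 code, and the helper names (`to_signed16`, `eor_only`, `negate16`) are made up for this illustration.

```python
MASK16 = 0xFFFF  # keep every result inside 16 bits, like a 16-bit accumulator


def to_signed16(value):
    """Interpret a 16-bit word as a signed two's complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def eor_only(value):
    """What `eor #$FFFF` alone does: invert every bit (one's complement)."""
    return (value ^ 0xFFFF) & MASK16


def negate16(value):
    """`eor #$FFFF` followed by `inc a`: two's complement negation."""
    return (eor_only(value) + 1) & MASK16


for velocity in (0x0001, 0xFFFF, 0x0002):
    print(f"${velocity:04X}: eor only -> {to_signed16(eor_only(velocity))}, "
          f"eor + inc -> {to_signed16(negate16(velocity))}")
```

Inverting the bits of 1 gives $FFFE, which reads as -2, and that matches the "travels 2 pixels instead of one" symptom in the first post. Adding the 1 afterwards lands on $FFFF, which is -1.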
Don't be so hard on yourself, dude, no idiot could even get this far with 65816 assembly language
Adding $FFFF, not $FFFE, is the same as subtracting one (because $FFFF is the two's complement 16-bit representation of -1, and adding -1 is the same as subtracting 1, blah blah blah...)
EOR #$FFFF followed by INC A is the same as negation (it's not clear whether you know this, you probably do but just sayin').
I'm not sure what's wrong - based on your code, if the X velocity starts at 1, it gets flipped - 1 once X = 249. Then the velocity = -2, so on subsequent iterations X = 247, 245...1, -1 = $FFFF. Then the X velocity should get flipped - 1 again, so it'll equal 1 again, so then X = 0 and things should start over...
I will say, as a rule, once your velocities start getting a little more unruly the safest thing to do is force the X position back to a known state on collision. So once you bounce on the left, just STZ BallXPosition and be done with it, and likewise on the right just set it to the max value (248). That probably won't fix your problem here, though.
Try stepping through it in a debugger?
Is there any reason why you didn't consider just doing v = 0 - v for negation? I mean, maybe you can save a cycle or two by futzing with bitwise stuff, but if you were wondering why that didn't work, why not take the simple approach first? (i.e. Get it working before you pursue optimization.)
Though the other problem you might run into is that simply negating is not entirely stable either; if the negated velocity is not enough to push the ball out of collision in a single step, it would get negated again on the next frame, and from that point it would end up trapped. The simple solution is to negate only when the position is below the floor AND the velocity is greater than or equal to 0. Won't come up if your velocity is always sufficiently large at collision, though.
Another way to solve that issue is to move the ball out of collision first, and dampen the velocity a bit, so that the loss of velocity compensates for this gain in energy from moving it out of collision, but this is only really applicable if you want a ball that loses height with each bounce.
Though it sounds like you're not using gravity / parabolic motion anyway, so ignore that last bit.
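The two suggestions above (snap the position back to a known value on collision, and only negate when the velocity still points out of bounds) can be combined. Here is a rough sketch in Python, again just modelling 16-bit values. The edge value 248 comes from the first post; the function names and the choice to treat $8000-$FFFF as "past the left edge" are assumptions of this sketch, not anything from the actual demo.

```python
MASK16 = 0xFFFF
RIGHT_EDGE = 248  # 256 - 8, the limit used in the first post


def negate16(value):
    """Two's complement negation of a 16-bit word."""
    return (-value) & MASK16


def is_negative(value):
    """True when bit 15 is set, i.e. the word reads as a negative number."""
    return bool(value & 0x8000)


def step_x(position, velocity):
    """Advance one frame and bounce off both edges without getting trapped."""
    position = (position + velocity) & MASK16
    if is_negative(position):
        # Wrapped past the left edge: snap to 0 (like STZ BallXPosition)
        # and flip only if the ball is still heading left.
        position = 0
        if is_negative(velocity):
            velocity = negate16(velocity)
    elif position > RIGHT_EDGE:
        # Past the right edge: snap to the edge and flip only if the
        # ball is still heading right (velocity >= 0).
        position = RIGHT_EDGE
        if not is_negative(velocity):
            velocity = negate16(velocity)
    return position, velocity


position, velocity = 240, 3
for _ in range(8):
    position, velocity = step_x(position, velocity)
    print(position, hex(velocity))
```

Because the flip is tied to the direction of travel and not just to the position, the velocity cannot be negated twice in a row while the ball is still at the edge.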
adam_smasher wrote:
Don't be so hard on yourself, dude, no idiot could even get this far with 65816 assembly language
What's funny, is that I found out I made a grammatical error in the title, and I'm a native English speaker.
adam_smasher wrote:
Adding $FFFF, not $FFFE, is the same as subtracting one (because $FFFF is the two's complement 16-bit representation of -1, and adding -1 is the same as subtracting 1, blah blah blah...)
EOR #$FFFF followed by INC A is the same as negation (it's not clear whether you know this, you probably do but just sayin').
Oh, I didn't actually think about that...
Yeah, it works now.
rainwarrior wrote:
Is there any reason why you didn't consider just doing v = 0 - v for negation? I mean, maybe you can save a cycle or two by futzing with bitwise stuff, but if you were wondering why that didn't work, why not take the simple approach first? (i.e. Get it working before you pursue optimization.)
I don't get it. The reason I'm doing the bitwise stuff is that the velocity flips no matter what direction the ball is traveling.
Espozo wrote:
rainwarrior wrote:
Is there any reason why you didn't consider just doing v = 0 - v for negation? I mean, maybe you can save a cycle or two by futzing with bitwise stuff, but if you were wondering why that didn't work, why not take the simple approach first? (i.e. Get it working before you pursue optimization.)
I don't get it. The reason I'm doing the bitwise stuff is that the velocity flips no matter what direction the ball is traveling.
The part you quoted has nothing to do with the initial direction. Maybe the stuff that followed was confusing, but what I meant by this particular paragraph was that if you want to flip velocity...
You have: velocity
You want: -velocity
So: velocity = 0 - velocity
Code:
lda #0
sec
sbc velocity
sta velocity
I was just trying to suggest that this is the simplest way of thinking about negation. It requires no knowledge of two's complement, only how to subtract. I was asking whether you had considered this at all, or whether you thought negation must be a bitwise operation only?
Code:
lda velocity
eor #$FFFF
inc
sta velocity
The bitwise version works too, of course, but you started this thread because it's a little harder to get correct, right?
I agree that you should use SBC where it's clearer.
In-register negation with EOR and INC is more useful if you're trying to negate the result of a previous calculation rather than what's currently in a variable, such as calculating -f(y) or v - f(y). You could store f(y) and then load 0 or v and then subtract f(y), but the in-register method avoids the sort of round trips to memory that make psycopathicteen cry.
I'm not arguing that you should do one or the other.
I'm just trying to point out a solution that Espozo seemed to be unaware of.
I'm pissed. I wanted to create a simple Pong demo to "prove my worthiness", but it ended up being a total disaster. Bad organization, near zero optimization, but the worst part is that there are a heap of bugs. Sometimes the ball goes backwards through a paddle when "serving" it, and sometimes the ball just goes outright through a paddle instead of bouncing back. I did a ton of lazy programming, like only checking the very surface of the paddle for collisions, and this is where it got me.
Maybe I was lazy because I knew I wouldn't be using this later anyway. Whatever. It would be much easier with an Object setup like I had going.
Attachment:
SNES Pong.zip [1.09 MiB]
The velocity of the ball when serving it is the same as when it hits the wall, so I have no clue how it's doing the thing where it goes backwards. Also, I tried to implement a pause feature, but then I realized that it pauses on and off every frame, but I was too lazy to get rid of it.
This is sad, but the fact that it's LoROM is kind of bugging me... I just don't want to make something in LoROM and then need the extra 32Mb later, but the odds of me getting there are looking increasingly slim...
How do you control player 2 on a computer?
My pong game on the NES is nothing special (its physics are not as complex as you're designing at all) but maybe it can give some inspiration:
https://github.com/nicklausw/pong-conso ... master/nes
psycopathicteen wrote:
How do you control player 2 on a computer?
Assign different inputs for the second player. How else?
nicklausw wrote:
(its physics are not as complex as you're designing at all)
I wouldn't exactly call this "complex". I originally tried to have it to where the ball's trajectory was influenced by the direction the paddle was traveling when it was hit by the ball, but that was kind of ridiculous. I also wanted it to act differently if the ball hit the edges and bottom part of the paddle. If I make a run and gun ever, I've always wanted to have it to where interactive explosions actually push enemies in the direction the blast is going and also do damage according to how close the object is to the center of the explosion, (I'd do another circular collision check after the original box) but I'm doubtful I'll get there.
Somehow, I feel like I've actually gotten worse at programming. The HiROM thing is driving me insane for no reason, so I think I'll suck it up and try to at least get the screen to appear in HiROM. All I'd really have to convert is the initialization routine, and the Vblank handler, if I'm not mistaken, to get it to where I can turn the screen white. (Well, and that snippet of code of course.)
Espozo wrote:
I just don't want to make something in LoROM and then need the extra 32Mb later
Oh, LoROM and HiROM don't have anything to do with maximum ROM size. Both have a maximum of 4 MB.
The difference is just that LoROM gives you 128 banks of 32 KB, whereas HiROM gives you 64 banks of 64 KB. It adds up to the same size, it's just a different layout is all.
Take another look at this diagram I made earlier. Notice how each "ROM" rectangle (which represents a mirror of the same 4 MB ROM) has the same area; they're just laid out differently.
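For anyone who wants to check the numbers, here is a small sketch in Python of the two layouts. It is deliberately simplified: it assumes LoROM code is addressed through banks $80-$FF at $8000-$FFFF and HiROM through banks $C0-$FF at $0000-$FFFF, and it ignores the other mirrors, copier headers and anything cartridge-specific. The function names are invented for the example.

```python
KIB = 1024


def lorom_offset(bank, addr):
    """ROM file offset for a LoROM CPU address in the upper half of a bank."""
    if not 0x8000 <= addr <= 0xFFFF:
        raise ValueError("this sketch only maps $8000-$FFFF in LoROM")
    # 128 banks, each contributing one 32 KB chunk.
    return ((bank & 0x7F) << 15) | (addr & 0x7FFF)


def hirom_offset(bank, addr):
    """ROM file offset for a HiROM CPU address in a full 64 KB bank."""
    # 64 banks, each contributing one 64 KB chunk.
    return ((bank & 0x3F) << 16) | (addr & 0xFFFF)


lorom_size = 128 * 32 * KIB
hirom_size = 64 * 64 * KIB
print(lorom_size, hirom_size)                    # both totals are the same 4 MB
print(hex(lorom_offset(0x81, 0x8000)))           # second 32 KB chunk
print(hex(hirom_offset(0xC1, 0x0000)))           # second 64 KB chunk
```

Same total, different slicing: the last byte of the last bank maps to the same final ROM offset in both layouts.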
Well, I mean, it's not actually HiROM. I think it's like Mode 25, which is identical to HiROM except that one of the parts that is normally mirrored is unique ROM. I just won't use that part in HiROM, and if I need more ROM, I'll switch to Mode 25 and then utilize the space. I believe I could potentially use this much space because I'm more interested in doing extra stuff than decompressing graphics. I probably couldn't store them in RAM either, because you're kind of limited in terms of space. (Hell, the Splatoon idea would actually need more than 128KB of RAM.)
You aren't going to need 32mbit, or anywhere close to it, for a pong game. Please stop getting hung up on this. Your brain has created intentional blockades as a way to avoid accomplishing your goal; ignore your brain. Just use mode 20 and accomplish your goal. A substantial number of commercial games are all mode 20 (example: Super Mario World). Once you accomplish that fully, then you can go back and try something like converting your game from mode 20 to mode 21 as a learning experience.
koitsu wrote:
You aren't going to need 32mbit, or anywhere close to it, for a pong game.
It definitely wasn't for this.
I'm sick of Pong, I liked my object system because it was easier to keep track of things, but that takes longer to make.
koitsu wrote:
Once you accomplish that fully, then you can go back and try something like converting your game from mode 20 to mode 21 as a learning experience.
I have to figure though, there has to be some demo that somebody has made that uses HiROM, right? All I really need is the initialization routine, and although I could try and make it, I'm not going to reinvent the wheel if I don't have to.
Do you think that it would be easier for me to make a full-fledged game in LoROM and then convert it to HiROM than to try and convert the initialization routine to HiROM and then make the game, because I'd have more experience? I really liked the Splatoon SNES idea. I could even render the character in Blender and draw over them to clean them up for sprites (be prepared for another topic asking how to actually do this), and I've figured out how everything will work from a graphical standpoint.
By the way, when I say HiROM, I really mean Mode 25.
I will admit though, the likelihood of me ever needing over 32Mb of ROM is very slim (I like a lot of graphical variety and frames of animation, but at the rate I draw, I won't ever get that much done), but like I've stated before, I want to make a game engine that I can release publicly that could get more people into SNES development. The best day will be when we get people to collaborate in making games here.
Espozo wrote:
I'm sick of Pong.
If you're already sick of Pong, do you really expect to make it through a bigger project?
I've said it elsewhere before, but if you want finished projects, pick projects that you can finish.
If you can't finish pong, make something smaller first.
1. There are no demos I know of that use mode 21 or mode 25. Because, again, people don't really care: 32mbit of addressing space is a lot for most homebrew (and most (all?) homebrew I've seen has been mode 20). There are ***very*** few games that used more than 32mbit of space. (Don't let anyone show up here and tell you otherwise -- they are in the tiny minority, like maybe 1 or 2% of all SNES/SFC titles?)
2. You need more than an initialisation routine. You need to understand fully how the memory map works, because you're going to need to know how to organise your code and configure the assembler/linker so that labels/addresses/etc. refer to the proper bank. I've explained it as best I can, tepples has explained it a few times too, and some other folks have as well. I can't speak for others, but I'm at my wit's end trying to explain it.
3. I think you need to make a mode 20 game. I will repeat myself in a way: stop getting hung up on mode 20/21/25/whatever. USE MODE 20. You are absolutely in *no way/shape/form* at a skill level (right now) to be worrying about other memory modes. You just aren't. Nobody goes out and for their very first game, with absolutely no development or programming knowledge prior, creates something like Splatoon. Step back and be realistic. Baby steps (I've said this before). Nothing **at present** is going to stop you from making a mode 20 Splatoon-like game.
I'm sorry if this is harsh, but: the only thing I've seen from you to date is a screen that showed some graphics you ripped off from Metal Slug, converted them to one of the SNES's graphics formats (I don't remember which mode you used), threw in some sprites on top of it, and made joypad input move some sprites around (if I remember right). That's actually a **great** start! You should stick with improving that and doing with it whatever you want. I think overall you spend more time worrying about shit like mode 20/21/25 than you do actually accomplishing whatever your goals are. Turn off that part of your brain (I can tell you this because I myself have some form of OCD and have to turn that part of my brain off at times).
Worry about "running out of ROM space" when you get there. You ain't even close to being there yet with what you've done/got so far. You're trying to wear very "big boy" pants without even wearing pull-ups yet; wearing pull-ups is part of the process, and it's OK! That's me being respectful and realistic combined, k?
And keep in mind: **I** have never done an actual SNES (or NES!) game myself either. I've done lots of modifications and reverse-engineering of existing things, and of course written fresh code (for things like the Demiforce FF2j/FF2e intro), but game engines and actual game development is something I haven't done. Just the thought of it makes me cringe in fear -- it's a HUGE undertaking, even for someone that does have knowledge of the systems. Most of the guys on nesdev know way more than I do, and have accomplished way more than I do. So maybe take what I say with a grain of salt, but I assure you (no ego here) there is wisdom in my words.
4. The terms "LoROM" and "HiROM" have, since I have been doing this (since the very early 90s), always referred to mode 20 and mode 21. Mode 25 is a completely different beast. Please going forward say mode 25 and not HiROM, because the memory map is different for mode 25 than 21.
5. Then go work on your game engine! Do it in mode 20. If there becomes some kind of limit that would be alleviated switching to mode 21 or mode 25, then worry about that when you get there. Once you have that level of skill/knowledge, I think you'll find that (if deemed necessary) migrating your engine to use mode 21 or 25 won't be that hard for you. Trust me.
rainwarrior wrote:
If you're already sick of Pong, do you really expect to make it through a bigger project?
I just don't know what's going on, and I have a lack of motivation to fix it. Everything just fit so much more nicely with an object routine and a collision routine and whatnot: all I'd need to do is say the metasprite address, the size of the objects and what type of collision needs to be checked, and bam, done, and this is useful for all types of objects. Instead, I'm writing stuff like:
Code:
lda Paddle1YPosition
sta SpriteBuf+5
clc
adc #$08
sta SpriteBuf+9
clc
adc #$08
sta SpriteBuf+13 ;store (not add) the third sprite's Y position
clc
adc #$08
sta SpriteBuf+17 ;store (not add) the fourth sprite's Y position
Maybe I can make an object routine and all the stuff that goes with it, and then come back to making Pong using it.
rainwarrior wrote:
If you can't finish pong, make something smaller first.
Well, I sort of finished it, it was just buggy and unpolished as all get out. I guess that's a step toward modern game development.
Anyway, what could I possibly make that's more simplistic than Pong? I already got the screen to turn on flawlessly, does that count?
And yeah, thanks again koitsu for knocking some sense into me. I guess I'll just leave the other topic to die. I got too compulsive and jumped straight ahead. I'm not a fumbling idiot, but I've got a fair share of downsides, many that you are (unfortunately) well aware of.
As far as I can gather based on discussions of Tales of Phantasia, mode $25 is called "ExHiROM" because it shares a lot more with HiROM ($21) than with LoROM ($20).
But I agree with koitsu: If staying in mode $20 helps you learn, do so.
How I work when I'm motivated: (I did this in the timespan of a couple of comments ago to now)
Code:
lda #.BANKBYTE(dummy)
rep #$30 ;A=16, X/Y=16
sta ObjectTable+Identity
lda #.LOWORD(dummy)
sta ObjectTable+Identity+2
;====================================================================================
infinite_loop:
wai ;Wait for interrupts to finish (NMI/VBlank)
rep #$30 ;A=16, X/Y=16
jsr start_object_identifier
rep #$30 ;A=16, X/Y=16
inc ColorCounter
sep #$20 ;A=8
lda #$00
sta $2121
lda ColorCounter
sta $2122
lda ColorCounter+1
sta $2122
jmp infinite_loop ;Do this forever
.endproc
;====================================================================================
;====================================================================================
.proc start_object_identifier
rep #$30 ; A=16, X/Y=16
lda #ObjectTable
sta ObjectTableOffset
tcd
object_identifier_loop:
lda Identity ;load the object identification byte of the object we're currently on
beq next_object
pha
sep #$20 ;A=8
lda Identity+2
pha
rts
next_object:
rep #$30 ; A=16, X/Y=16
lda a:ObjectTableOffset ;address of the object slot we just examined
clc
adc #$0020 ;add $20 (32 bytes) to step to the next slot
sta a:ObjectTableOffset ;store the result for the next time we go through the loop
tcd ;point the direct page at the next slot so "lda Identity" reads its fields
cmp #ObjectTableSize+ObjectTable ;have we reached the end of the object table?
bne object_identifier_loop ;if not, keep going
lda #$0000
tcd
jsr process_oam
rts
.endproc
;====================================================================================
;====================================================================================
.proc dummy
rts
.endproc
Basically, it's just a piece of code that looks through an object table, loads the 3 identification bytes which signify the address, pushes them to the stack, and then does an rts to get there, a trick tepples told me he found from a book about the Apple IIGS. (This is better than my old setup, where I used a lookup table that only allowed for 16 bit addresses.)
I can tell it works because I have code that changes the background color every frame that gets run after this, and it works.
Again, thanks for knocking some sense into me, and, of course, putting up with me.
I have to say though, and I know I've already been told this, but how do you create a sort of list that works like ".res", but only for the list and not the entire SNES memory? I want to create one for the object table. Also, I want to have it to where the object table size is automatically calculated based on multiplying the slot size (which should be automatically calculated to equal the list size) times the number of slots. Hopefully, this isn't asking too much.
Yeah, I think you were just a bit rusty from putting this off because of school.
Anyway, I have good news for you. I found a way to simplify my animation engine, without noticeable compromises.
That's good news. I'm assuming you're still doing the 16x16 and 32x32 searching thing. Did you ever get it to check for redundant tiles?
Anyway, I'm not too happy with myself, because I found a bug that had actually made it work.
I think you can figure out how this is wrong:
Code:
lda #.BANKBYTE(dummy)
sta ObjectTable+Identity
rep #$30 ;A=16, X/Y=16
lda #.LOWORD(dummy)
sta ObjectTable+Identity+2
So then, I fixed it, and I get a crash:
Code:
lda #.BANKBYTE(dummy)
sta ObjectTable+Identity
rep #$30 ;A=16, X/Y=16
lda #.LOWORD(dummy)
sta ObjectTable+Identity+1
What's going on in the first example is the number actually ends up being 0, which counts as a negative object, so it skips the part where it tries to do the rts to jump to the code. I found I was only checking to see if the top 16 bits were 0 instead of all 24, so I corrected it, and now even the top example doesn't work, but filling all 24 bits with 0 does.
This has to be the problem, but although I'm staring right at it, I don't see it:
Code:
object_identifier_loop:
lda Identity
bne continue_object_identifier_loop
lda Identity+1
beq next_object
continue_object_identifier_loop:
pha
sep #$20 ;A=8
lda Identity+2
pha
rts
next_object:
rep #$30 ; A=16, X/Y=16
Espozo wrote:
I think you can figure out how this is wrong:
In fact, the first and the second code snippets are both faulty.
Here's a hint to help you work out the real fix all by yourself:
The bank byte of any given 24-bit address = the highest 8 bits of the address.
(This is of course assuming that ObjectTable+Identity is indeed supposed to contain a 24-bit address, and that your third fragment of code is bug-free.)
Ramsis wrote:
The bank byte of any given 24-bit address = the highest 8 bits of the address.
Hmm... I guess this is right then:
Code:
lda #.BANKBYTE(dummy)
sta ObjectTable+Identity+2
rep #$30 ;A=16, X/Y=16
lda #.LOWORD(dummy)
sta ObjectTable+Identity
Well, unfortunately, I can't get it to work. I guess I'm not pushing the addresses in the correct order? (I tried with the original code, and it didn't work either)
Code:
rep #$30 ; A=16, X/Y=16
object_identifier_loop:
lda Identity
bne continue_object_identifier_loop
lda Identity+1
beq next_object
lda Identity
continue_object_identifier_loop:
pha
sep #$20 ;A=8
lda Identity+2
pha
rts
next_object:
rep #$30 ; A=16, X/Y=16
Should I instead be pushing from 3-1 instead of 1-3? I can't think right now.
I really can't tell what's wrong with your routine from those code snippets, but this looks fishy:
Espozo wrote:
Code:
lda Identity+2
pha
rts
Remember that rts pulls 16 bits of data (namely, a return address minus 1) off the stack. So if you push some random data immediately before, unexpected behavior (i.e., a program crash) is the unavoidable consequence.
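To illustrate what the stack looks like, here is a toy model in Python of how RTS and RTL consume bytes. It is a sketch, not an emulator: the `Stack` class and helper names are invented, and the addresses ($81:9000 and so on) are arbitrary examples. It does follow the documented 65816 behavior that a 16-bit push stores the high byte first, that RTS pulls 16 bits and adds 1, and that RTL additionally pulls the bank byte afterwards.

```python
class Stack:
    """A byte stack; the end of the list is the top of the stack."""

    def __init__(self):
        self.bytes = []

    def push8(self, value):
        self.bytes.append(value & 0xFF)

    def push16(self, value):
        # A 16-bit push stores the high byte first, then the low byte.
        self.push8(value >> 8)
        self.push8(value)

    def pull8(self):
        return self.bytes.pop()

    def pull16(self):
        low = self.pull8()
        high = self.pull8()
        return low | (high << 8)


def rts(stack, current_bank):
    """RTS: pull 16 bits, add 1, stay in the current program bank."""
    return current_bank, (stack.pull16() + 1) & 0xFFFF


def rtl(stack):
    """RTL: pull 16 bits, add 1, then pull the bank byte."""
    address = (stack.pull16() + 1) & 0xFFFF
    bank = stack.pull8()
    return bank, address


def push_far_target(stack, bank, address):
    """Push order that RTL expects: bank first, then address minus 1."""
    stack.push8(bank)
    stack.push16((address - 1) & 0xFFFF)


good = Stack()
push_far_target(good, 0x81, 0x9000)
print([hex(part) for part in rtl(good)])

# The order from the snippet above: 16-bit address, then the bank byte, then RTS.
bad = Stack()
bad.push16(0x9000)
bad.push8(0x81)
print([hex(part) for part in rts(bad, 0x80)], bad.bytes)
```

In the second case RTS treats the bank byte as the low byte of the return address, so it lands somewhere unrelated and leaves a stray byte on the stack.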
Ramsis wrote:
Remember that rts pulls 16 bit of data
Oh...
I thought it did 24: I did a 16 bit pha followed by an 8 bit pha, which I thought rts would cancel it, but I realize I need to use rtl.
I just tested it, but it unfortunately doesn't work, and I also changed the rts in "dummy" to rtl. I might just give up on trying to make this a long jump, but I'm kind of doubtful I can fit all the object program code I want in one bank.
Actually, I just thought of something... I'm sending the location of where I want to go to the stack, pulling it, and then on the next rtl, I'm actually pulling more than I'm pushing (and I'm not pulling the return address), which isn't good. So, hopefully, if I push the address of the object identifier code loop first, it should work.
There is a long indirect jump instruction.
Code:
rep #$30 ; A=16, X/Y=16
object_identifier_loop:
lda Identity
sta temp_address
bne continue_object_identifier_loop
lda Identity+1
beq next_object
continue_object_identifier_loop:
lda Identity+1
sta temp_address+1
jmp [temp_address]
next_object:
rep #$30 ; A=16, X/Y=16
Quote:
That's good news. I'm assuming you're still doing the 16x16 and 32x32 searching thing. Did you ever get it to check for redundant tiles?
No, but I found an easy way to make duplicated sprites when necessary.
So basically, my simplified slot searching algorithm works like this. At start up it copies a list of 20 32x32 slot CHR numbers, and a list of 32 16x16 slot CHR numbers. When it needs a 32x32 slot, it pulls a number off the top of the list, and pushes it back on when it's done. The same goes for 16x16 slots.
Grabs a slot CHR number:
Code:
ldx {large_slot_stack_index}
dex
dex
lda {large_slot_stack},x
stx {large_slot_stack_index}
Clears the slot:
Code:
ldx {large_slot_stack_index}
sta {large_slot_stack},x
inx
inx
stx {large_slot_stack_index}
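The same free-list idea, sketched in Python for anyone who finds it easier to follow than the index juggling. This is only an illustration: the class name, the bounds checks and the example CHR numbers are assumptions added here, and the index counts entries rather than bytes (the assembly steps by 2 because each entry is a 16-bit word).

```python
class SlotStack:
    """A stack of free sprite slot CHR numbers."""

    def __init__(self, chr_numbers):
        self.slots = list(chr_numbers)
        self.index = len(self.slots)  # points just past the top free entry

    def grab(self):
        """Take a free slot: step the index down, then read (dex / lda)."""
        if self.index == 0:
            raise RuntimeError("no free slots left")
        self.index -= 1
        return self.slots[self.index]

    def release(self, chr_number):
        """Give a slot back: write, then step the index up (sta / inx)."""
        if self.index == len(self.slots):
            raise RuntimeError("released more slots than were grabbed")
        self.slots[self.index] = chr_number
        self.index += 1


large_slots = SlotStack([0x00, 0x04, 0x08])  # hypothetical CHR numbers
first = large_slots.grab()
second = large_slots.grab()
large_slots.release(first)
print(hex(first), hex(second), hex(large_slots.grab()))
```

Grabbing and releasing are both constant time, which is the appeal over scanning for a free slot each time.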
I did what you're saying, and I know I can't do an RTL anymore, so I wrote:
Code:
.proc dummy
jml object_identifier_loop
And it's telling me that "object_identifier_loop" is undefined, except it definitely is defined, and the code even comes after "object_identifier_loop" in the file, so I don't know what's up with the assembler.
You know, does this actually give me any advantage over what I tried earlier? It only wouldn't work because of the RTL in "dummy", but I can't do it here either.
IIRC, ca65 does this because it assumes you're going to define that label later, within the same scope it's being referenced, but then you don't, so an error occurs.
If that label is global, you can do jml ::object_identifier_loop, if it isn't, use the name of its scope. Working with scopes in ca65 (.proc creates a new scope) can be tricky sometimes.
tokumaru wrote:
if it isn't, use the name of its scope.
So I use the name of its scope in conjunction with the "sub label", somehow? Doesn't it go before?
You know, since you seem to be knowledgeable in ca65, how would I put stuff, like "XPosition" underneath something like "ObjectTableSlot"? It kind of works like .res, although it's not global, but I don't remember how to do it. Also, is there a way to have it to where the size of "ObjectTableSlot" is automatically calculated?
tokumaru wrote:
IIRC, ca65 does this because it assumes you're going to define that label later, within the same scope it's being referenced, but then you don't, so an error occurs.
This should only be a problem if the value needs to be a compile-time constant (its size matters, or it's used in an .if statement, and so on.) I'm not really familiar with 65816, but if ca65 always assembles JML to a 4-byte instruction then it shouldn't be a problem. If it tries to automatically pick whether to use a 3 or 4 byte variant, then it might give an error there.
EDIT: Gave it a quick try in ca65:
Code:
.p816
bar: nop
.proc xyzzy
jml bar
.endproc
^ Assembles fine, as expected.
Code:
.p816
.proc foo
bar: nop
.endproc
.proc xyzzy
jml bar
.endproc
^ Doesn't assemble, as expected.
Code:
.p816
.proc foo
bar: nop
.proc xyzzy
jml bar
.endproc
.endproc
^ Also works.
Summa summarum: nothing out of the ordinary here.
I'm not particularly knowledgeable about ca65... But I've been able to build a framework using a subset of its features that I'm comfortable with. I did get the hang of scopes, but some of the quirks of single-pass assembly still take me by surprise.
I believe I already suggested you use .struct for things like the fields within object slots.
thefox wrote:
... I'm not really familiar with 65816, but if ca65 always assembles JML to a 4-byte instruction then it shouldn't be a problem. If it tries to automatically pick whether to use a 3 or 4 byte variant, then it might give an error there.
JML is a pseudo-op (or "alias") for JMP with an explicit 24-bit address. In other words: JML will always assemble to 4 bytes (1 opcode + 3 bytes for 24-bit address).
It seems from this page that JML can also be used as an indirect long jump with a 16-bit pointer as an operand.
Great. Now what is wrong? (The file actually assembles, it's just that it crashes.)
Code:
rep #$30 ;A=16, X/Y=16
lda #.LOWORD(dummy)
sta ObjectTable+Identity
sep #$20 ;A=8
lda #.BANKBYTE(dummy)
sta ObjectTable+Identity+2
Code:
rep #$30 ; A=16, X/Y=16
object_identifier_loop:
lda Identity
bne continue_object_identifier_loop
lda Identity+1
beq next_object
lda Identity
continue_object_identifier_loop:
sta LongJumpLocation
lda Identity+1
sta LongJumpLocation+1
jmp [LongJumpLocation]
next_object:
rep #$30 ; A=16, X/Y=16
Code:
.proc dummy
jmp object_identifier::next_object
Edit: I fixed some stuff, but it still doesn't work.
I don't know why it just now occurred to me to check the listing file...
I don't think this is correct. Shouldn't there be 3 sets of "rr"?
Code:
0003DDr 1 DC rr rr jmp [LongJumpLocation]
I actually wrote "jmp dummy" and it worked perfectly, so this must be the problem.
It occurred to me to write "jml" to force the assembler to do 24 bit addressing, but it still assembled to be 16 bit.
Code:
0003DDr 1 DC rr rr jml [LongJumpLocation]
Is this a bug with ca65?
Your config file is pointing to $808000 and so on for code and not $8000, right? (Or $800000 and $0000, can't remember).
EDIT: Wait, are you using a pointer to BSS for a location to jump to? Then it should assemble as 16-bit, so that's not the problem.
I wrote a whole big deal, and then I realized that I'm just being an idiot... It's 16 bit addressing to load the 24 bit number... I'm moving direct page and I don't think there's an absolute variant of this, so I guess this could be screwing it up? I think I'm just going to do the jsr thing.
I see you just worked out the same thing I did, nicklausw. I guess this isn't the problem, but the code starts at $008000.
Espozo wrote:
I don't know why it just now occurred to me to check the listing file...
I don't think this is correct. Shouldn't there be 3 sets of "rr"?
Code:
0003DDr 1 DC rr rr jmp [LongJumpLocation]
I actually wrote "jmp dummy" and it worked perfectly, so this must be the problem.
It occurred to me to write "jml" to force the assembler to do 24 bit addressing, but it still assembled to be 16 bit. :(
Code:
0003DDr 1 DC rr rr jml [LongJumpLocation]
Is this a bug with ca65?
No, the bug is in you. :-) You're using jml [Address]. The brackets here are important. This is asking for the addressing mode absolute indirect long, which assembles to opcode $DC (correct), and consists of 3 total bytes (i.e. 1 opcode byte, 2 operand bytes). This is **indirect** addressing, which means the full 24-bit address is what's stored in memory location LongJumpLocation (i.e. if LongJumpLocation = $1100, then byte at $1100 = low byte of address to jump to, byte at $1101 = high byte of address to jump to, byte at $1102 = bank byte of address to jump to). That probably isn't going to make any sense to you, so I'll rewrite it in code:
Code:
LongJumpLocation .res 3 ; Needs to be in RAM or direct page
sep #$20
lda #$45
sta LongJumpLocation
lda #$23
sta LongJumpLocation+1
lda #$01
sta LongJumpLocation+2
jml [LongJumpLocation] ; Will assemble to $DC {2-byte address of LongJumpLocation}
;
; This will end up jumping to $012345.
;
Now compare that to:
Code:
LongJumpLocation = $6789ab
jml LongJumpLocation ; Will assemble to $5C AB 89 67
;
; This will end up jumping to $6789ab.
;
If you wanted an absolute 24-bit jump, you'd want jml LongJumpLocation. Whether or not you want indirect addressing depends on what you're trying to do and "how" you're using LongJumpLocation. There's no way any of us would know this.
As for the JMP vs. JML syntax: you can use the pseudo-op JML for two opcodes: either absolute long addressing (i.e. jml LongJumpLocation) or absolute indirect long addressing (i.e. jml [LongJumpLocation]). Both are considered acceptable.
I strongly urge you to go look at the WDC 65816 documentation - specifically PDF page 459 - and read the opcode chart there very carefully.
There is no "absolute indirect long jump" that has a full 24-bit address in the operand list (i.e. a 4 byte instruction). Your choices are absolute long, or absolute indirect long.
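If it helps to see the two forms side by side, here is a small Python sketch that builds the instruction bytes and resolves the indirect form against a fake 64 KB bank 0. The helper names are made up for this example. The opcodes ($5C, $DC) and the example addresses ($6789AB, $1100, $012345) are the ones used in the explanation above.

```python
def encode_jml_long(target):
    """jml target: opcode $5C followed by the 24-bit target, low byte first."""
    return bytes([0x5C, target & 0xFF, (target >> 8) & 0xFF, (target >> 16) & 0xFF])


def encode_jml_indirect(pointer):
    """jml [pointer]: opcode $DC followed by a 16-bit pointer, low byte first."""
    return bytes([0xDC, pointer & 0xFF, (pointer >> 8) & 0xFF])


def resolve_indirect(bank0, pointer):
    """Read the 3-byte target that jml [pointer] would jump to."""
    low, high, bank = bank0[pointer], bank0[pointer + 1], bank0[pointer + 2]
    return low | (high << 8) | (bank << 16)


bank0 = bytearray(0x10000)                       # stand-in for bank 0 memory
bank0[0x1100:0x1103] = bytes([0x45, 0x23, 0x01])  # low, high, bank

print(encode_jml_long(0x6789AB).hex(" "))
print(encode_jml_indirect(0x1100).hex(" "))
print(hex(resolve_indirect(bank0, 0x1100)))
```

The listing line with only two "rr" bytes after $DC is therefore expected: the operand is the 16-bit location of the pointer, and the 24-bit destination lives in memory at that location.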
Yeah, what you're describing seems to be what I figured out. I looked at the page in the 65816 book, but it didn't have anything except the different types of jumps.
To break it down...
"Absolute" means it's not affected by Direct Page, "Indirect" means that it's loading the address to jump to from RAM, and "Long" means that the address it loads is 24 bits (the operand that says where in RAM to look is still only 16 bits).
So, really, I don't even need to copy over the contents of "Identity". In fact, I think the problem is that I didn't write "a:" in front of "LongJumpLocation".
I'll try to fix this, and see if it'll work...
Edit: I now remember why I used "LongJumpLocation" instead of just "Identity": I tried jumping through "Identity" directly, but this opcode's operand is always absolute (it ignores Direct Page), so that didn't work.
However, I did write
Code:
object_identifier_loop:
lda Identity
bne continue_object_identifier_loop
lda Identity+1
beq next_object
lda Identity
continue_object_identifier_loop:
sta a:LongJumpLocation
lda Identity+1
sta a:LongJumpLocation+1
jml [LongJumpLocation]
next_object:
And I finally got it to run correctly.
Thank you for tolerating me.
Sure thing. And you're probably setting this somewhere else in your code, but: don't forget about a:LongJumpLocation+2 -- that'd hold the bank byte needed for the jml [LongJumpLocation] to be correct. It could be working by total chance right now.
Oh, the accumulator is in 16 bit mode, so it's fine if there's only 2 loads and stores. The middle byte is loaded and stored twice, but it's still faster than going to an 8 bit accumulator and then back. I try to do everything with a 16 bit accumulator, as it ends up being the most efficient in 90% of cases.
Espozo wrote:
Oh, the accumulator is in 16 bit mode, so it's fine if there's only 2 loads and stores. The middle byte is loaded and stored twice, but it's still faster than going to an 8 bit accumulator and then back. I try to do everything with a 16 bit accumulator, as it ends up being the most efficient in 90% of cases.
Ahh, smart and clever! For a moment I was wondering how this worked correctly, but I worked it out and you're absolutely right (the "middle byte" of the 24-bit address gets written twice). I was expecting to see things like Identity and Identity+2, not Identity+1. I'm one of those who switches register sizes fairly often, so you'll have to excuse me, haha :-)
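For anyone else who did a double take, here is a short Python sketch of why two overlapping 16-bit moves cover all three bytes, and why the two 16-bit loads in the zero check also cover all 24 bits. Memory is modelled as a plain bytearray and the function names are invented for the example.

```python
def read16(memory, address):
    """16-bit little-endian load, like lda with a 16-bit accumulator."""
    return memory[address] | (memory[address + 1] << 8)


def write16(memory, address, value):
    """16-bit little-endian store, like sta with a 16-bit accumulator."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


def copy24_with_two_word_moves(memory, source, destination):
    """Copy a 3-byte address using two overlapping 16-bit load/store pairs."""
    write16(memory, destination, read16(memory, source))          # bytes 0 and 1
    write16(memory, destination + 1, read16(memory, source + 1))  # bytes 1 and 2


def is_zero24(memory, address):
    """True only when all three bytes are zero, using two 16-bit loads."""
    return read16(memory, address) == 0 and read16(memory, address + 1) == 0


memory = bytearray(16)
memory[0:3] = bytes([0x45, 0x23, 0x01])  # $012345 stored low byte first
copy24_with_two_word_moves(memory, 0, 8)
print(memory[8:11].hex(" "), is_zero24(memory, 0), is_zero24(memory, 4))
```

The middle byte is written twice with the same value, and nothing past the third byte of the destination is touched.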