adam_smasher wrote:
On the GBZ80, the processor I'm most familiar with, I generally try to keep all of the temporary data I'm working on in the CPU's registers (A, B, C, D, E, F, H, and L) at all times. If I find that I can't, that's almost always a cue for me to rethink my code or to refactor, wrapping some inner logic in a procedure. At the start of this procedure, I just push the registers I need onto the stack and restore them at the end. It's mechanical, which means that I don't need to think at all about which registers are clobbered (by convention, I do assume A is clobbered; the Z80's other registers are more general purpose than the 6502's X and Y, but the accumulator does still get special treatment). It's precisely the inability to (conveniently) do this that makes ZP seem like a set of "poor man's registers" to me, and that makes me feel anxious about using it. If I have to think about what procedures use which temporaries (and what temporaries the procedures they call use, and...) I'm almost bound to get it wrong, and non-local bugs like that are really hard to figure out.
So it's just a matter of getting used to the 65x design, then. I went from x86, to GBz80, then to 65x (later 68k, and other processors). 65x was definitely strange at first, for me, and you do have to think about things differently, or else you'll end up simulating the design and habits of other processors on the 65x, which rarely turns out to be optimal.
However you end up viewing and treating ZP (emulating whatever style you like), it shouldn't be daunting if you understand which of the routines you write will be primary and which will be called inside others (nested/layered/whatever). I mean, this is assembly and not a higher level language like C - so you are in control of everything. You are designing everything, unless you're resorting to someone else's code/library. But even then, I would still argue otherwise.
Like I said, you'll know first hand which routines benefit from using ZP as fast RAM access, which routines should have untouchable address vectors, etc. You have 256 bytes to decide this. It's not like you only have 16 bytes or such. I always start with the most critical routines; they get optimized for ZP usage first and foremost. Then I look at routines that are expected to be nested (called inside other routines), and set up ZP accordingly. I've never found myself worrying about overwriting or inadvertently corrupting defined ZP usage. The most important function of ZP is address vectors, and second comes saving a cycle as temp vars for critical routines or loops. Really, that's not so hard to manage. If you do find yourself running out of ZP room, you can always write non-critical routines to push ZP bytes onto the stack at the beginning of the routine and restore them at the end. Although this really depends on the requirements of the surrounding architecture, like the NES, in which one might want to prioritize ZP as a fast buffer for transfers to VRAM - but I think such designs are on the fringe of optimizations and aren't representative of normal 65x code design/constructs (i.e. on the PCE, which is a 65x design, I would never do something like this).
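To make the push/restore idea concrete, here's a rough sketch in generic 6502 syntax. The labels tmp0/tmp1, their addresses, and the routine name are made up for illustration; adjust the equate syntax for whatever assembler you use.

```asm
; Hypothetical ZP temporaries; the addresses are arbitrary.
tmp0 = $10
tmp1 = $11

NonCritical:
    lda tmp0        ; save the ZP bytes this routine will overwrite
    pha
    lda tmp1
    pha

    ; ... routine body is free to use tmp0/tmp1 here ...

    pla             ; restore in reverse order of the pushes
    sta tmp1
    pla
    sta tmp0
    rts             ; note: A is clobbered by the save/restore
```

Each saved byte costs an extra load/push and pull/store pair, which is why this only makes sense for routines that aren't speed critical.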
I do more 6280 than 6502 coding, but I tend to reserve 32 bytes of ZP for general address and data registers: A0-A7 and D0-D7, all 16 bits wide. For certain routines, I might reserve additional ZP usage with direct names instead of generic register labels. I've also used equates to rename the already defined Ax and Dx pseudo regs for the sake of clarity, especially when I'm juggling several of each in a routine. Interestingly, NEC/Hudson reserves some ZP bytes with x86 names: AX, BX, CX, DX, DI, and SI (as well as al, bl, cl, etc.) for interfacing with the BIOS routines of the PCE CD unit.
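As a rough sketch of that pseudo-register layout (the base address $20, the alias names, and the CopyBytes routine are assumptions for illustration, not my actual definitions or anything from the NEC/Hudson BIOS):

```asm
; 16 pseudo registers x 2 bytes = 32 bytes of ZP.
; Low byte at the label, high byte at label+1.
A0 = $20
A1 = $22
A2 = $24
A3 = $26
A4 = $28
A5 = $2A
A6 = $2C
A7 = $2E
D0 = $30
D1 = $32
D2 = $34
D3 = $36
D4 = $38
D5 = $3A
D6 = $3C
D7 = $3E

; Per-routine aliases for clarity (hypothetical names).
src   = A0
dst   = A1
count = D0          ; only the low byte is used below

; Copies 'count' bytes from (src) to (dst).
; A count of 0 copies 256 bytes, since Y wraps around.
CopyBytes:
    ldy #0
copy_loop:
    lda (src),y     ; address registers serve as indirect vectors
    sta (dst),y
    iny
    cpy count
    bne copy_loop
    rts
```

The aliases cost nothing at runtime; they just make it easier to see what each pseudo register holds inside a given routine.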