Apologies if this is in the wrong section, please move it if it is.
I'm struggling to understand some concepts. I understand bit masking, but how does the following work? 16 4k pages for PRG-ROM was suggested to make the math easier somehow...
Code:
/* uint8_t *prg_page[16]; */
value = prg_page[addr >> 12][addr & 0x0FFF];
Let's say addr == 0x8123:
Code:
value = prg_page[8][123];
How does [8][123] map to one of the 16 pages?
I'm really not grasping something here, any help would be great!
I'm not sure what you mean or where you got that code.
You can use the calculator that comes with windows to do Hex and operations like AND and OR. The idea behind masking in your suggested usage is to make off bits higher or lower than a desired range. For example if you are masking the bottom twelve bits off by ANDing with 0xFFF that means you know the number left over won't be higher than that.
Since you seem to have arrays or pointers cut into smaller pieces (which you should, the address space shouldn't be one big array) when you access an address in these small windows you need only the address bits that apply to your window size. So you are taking the low bits to address the memory in that window, and the higher bits are being used to figure out which window to fetch from.
Hopefully that helps.
Shifting right by 12 gets you the 4k bank being accessed. ANDing the address by 0xFFF gets the 4k offset.
From this code one can assume prg_page is an array of pointers to 4k segments of memory.
So for example:
prg_page[8] = pointer to $8000 bank
prg_page[9] = pointer to $9000 bank etc
The bankswitch routine would thus do something like this (UNROM):
offset = &ROM[writeval << 4];
prg_page[0x8] = offset;
prg_page[0x9] = offset + 0x1000;
prg_page[0xA] = offset + 0x2000;
prg_page[0xB] = offset + 0x3000;
Personally I don't see the need for this kind of simplified addressing... I decode stuff in functions (function pointers are great) so there's only as much pointer arithmetic as necessary and everything can be easily hooked without hacky code.
I am probably just saying stuff others have but....
The left value is 4 bits. That equals a page number from the code. In this case, $8, which should be bank 8 of the 4K banks you have in your program. The value A would be bank 10, 2 bank 2, and so on.
Yeah, as kyuusaku says, get it working before you try to optimize it. Bit shifting and page pointers is optimization. Actually, you don't need any bit shifting; just use / and %. If having a divide is unbearable, make addr an unsigned int and the compiler will convert the / and % into a right shift and mask.
Code:
#define BANK_SIZE 4096
uint8_t* banks [65536 / BANK_SIZE];
uint8_t* memory( int addr )
{
return &banks [addr / BANK_SIZE] [addr % BANK_SIZE];
}
That's great, thanks guys, I understand it now. And blargg, you're right, premature optimization on my part wasn't doing me any favours, I'll take that on board
Thanks again.
blargg wrote:
Yeah, as kyuusaku says, get it working before you try to optimize it. Bit shifting and page pointers is optimization. Actually, you don't need any bit shifting; just use / and %. If having a divide is unbearable, make addr an unsigned int and the compiler will convert the / and % into a right shift and mask.
Code:
#define BANK_SIZE 4096
uint8_t* banks [65536 / BANK_SIZE];
uint8_t* memory( int addr )
{
return &banks [addr / BANK_SIZE] [addr % BANK_SIZE];
}
I don't agree with your point here. Bit shifting isn't that much of an optimisation, it is just the proper way of manipuling bits because it is its purpose, and using those bitwise operators makes code clearer when it's used right. For example, [var = val / 16;] makes me think the programmer wanted to do some arithmetics with a variable, while [var = val>>4;] clearly makes me thinks the programmer wanted to get the upper part of the integer, above the first 4 bits. If val and var are [unsigned int], my two examples yields the same results, but there is a difference in the intend in the code.
Good point about intention; code should reflect that, and any optimization should avoid disturbing that, if possible. If you've got some hardware that's specified to use the upper 4 bits of a byte, you should write b>>4, not b/16. If you're dividing the address space into 4096-byte banks, and you want the bank index and offset, you should use addr/4096 and addr%4096, not addr>>12 and addr&0xFFF.
blargg wrote:
Good point about intention; code should reflect that, and any optimization should avoid disturbing that, if possible. If you've got some hardware that's specified to use the upper 4 bits of a byte, you should write b>>4, not b/16. If you're dividing the address space into 4096-byte banks, and you want the bank index and offset, you should use addr/4096 and addr%4096, not addr>>12 and addr&0xFFF.
- Yes, but I bet you suppose a value that's power of 2..? I don't know if the compiler rounds a division or takes the integer part only. Anyway, an obvious thing: do not use shifts with signed numbers.
I don't understand your comment about a power of 2. / and % work regardless of the divisor, with / yielding the quotient and % the remainder. For positive values, this works as expected, for example 15/8=1 and 15%8=7.
Right shifting is fine on positive signed values, it's just negative values which yield an implementation-defined result. If you're reserving right shift for bit operations only, you'll never have to worry about a negative value in the first place.
blargg wrote:
If you're dividing the address space into 4096-byte banks, and you want the bank index and offset, you should use addr/4096 and addr%4096, not addr>>12 and addr&0xFFF.
Unless you're thinking of it as separating the address bus into A15-A12 and A11-A0, and then treating A15-A12 as the input to a decoder and/or a register file.
Zepper wrote:
I don't know if the compiler rounds a division or takes the integer part only.
In C, integer / integer rounds toward 0. This is true of both signed and unsigned integer arithmetic.
Only C99 specifies rounding towards zero; C89 and C++ leave it up to the implementation. It's unfortunate that algebraic rounding was chosen for C99, as it's the less-useful of the two possible approaches. From the respective language standards, in order of publication:
C89 wrote:
When integers are divided and the division is inexact, if both operands are positive the result of the / operator is the largest integer less than the algebraic quotient and the result of the % operator is positive. If either operand is negative, whether the result of the / operator is the largest integer less than the algebraic quotient or the smallest integer greater than the algebraic quotient is implementation-defined, as is the sign of the result of the % operator. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a.
C++98 wrote:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined*.
* According to work underway toward the revision of ISO C, the preferred algorithm for integer division follows the rules defined in the ISO Fortran standard, ISO/IEC 1539:1991, in which the quotient is always rounded toward zero.
C99 wrote:
When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded*. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a.
* This is often called "truncation toward zero".
- addr/4096 -> 4096 is 2^12.
- addr/8 -> 8 is 2^3.
- This is easily translated with shifts. What about addr/7 ? Got it?
So you're talking about optimization by the compiler? I'm just not clear of your point, and where it fits in with things. I'd like to understand, but your terseness is making that difficult.
You also sometimes have the other scenario, where your shifts aren't constant. Then you can't simply use division or multiplication and rely on magic that makes the compiler turn them into shifts.
You should still code as a multiply/divide if that better expresses the intent, and leave optimization to those sections of code that you've identified as hotspots.
True, but as a rule, division by a variable is hard to pipeline in modern CPU hardware. So any inner loop with a division in it should be one of the first targets of your profiling instrumentation.
The target of your profiling should be the entire program, because you shouldn't make assumptions as to what the hotspots are. Profiling then tells you where to focus your optimization efforts. If a loop has a division but isn't a hotspot, then you ignore it.
Profiling every function call in the entire program will just tell you that 50% of the time is spent in the profiler.
Relying on division, floating-point operations, etc. on platforms where they are slow will result in
uniformly slow code.
tepples wrote:
Profiling every function call in the entire program will just tell you that 50% of the time is spent in the profiler.
Then you're using the wrong profiling tool. A good one shouldn't impact program performance much at all. There are two basic types I know about. One hooks into every function call, giving information about number of calls, average time spent, call graph. The other doesn't modify the code at all, instead sampling the program counter hundreds of times a second, seeing where it falls. From this it determines which functions it spends the most time in. Even for the one that hooks into functions, it should have a way to avoid doing so for inline functions, lessening the impact.