I wasn't aware of the different rounding for signed divide vs signed shift right. But I think that I do now understand why disassembled C code is so crappy!
In ASM code, almost everything is almost always unsigned (e.g. memory addresses, loop counters, tile numbers, and so on). In most cases it really doesn't make sense to use signed numbers (and least of all to shift or divide them).
For C programmers, the main difference between "int" and "uint" is probably that "int" is shorter (and easier to pronounce). And so they might end up with signed "int", without actually being aware of what they are doing (and what the compiler will do if they use a supposedly harmless expression like "i/32" instead of right shifting).
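To make the rounding difference concrete, here is a minimal C sketch. The function names are made up for illustration. C leaves ">>" on negative signed values implementation-defined (ARM compilers normally emit an arithmetic shift for it), so the second function models the arithmetic shift with a portable floor division instead of relying on ">>".

```c
#include <assert.h>
#include <stdint.h>

/* Signed divide: C99 and later truncate the quotient towards zero. */
int32_t div32_trunc(int32_t x)
{
    return x / 32;
}

/* Models an arithmetic shift right by 5, which rounds towards
   negative infinity (hypothetical helper, not from the bootrom). */
int32_t asr5_floor(int32_t x)
{
    return x / 32 - (x % 32 < 0);
}

/* The two only differ for negative values that are not multiples of 32. */
void check_rounding(void)
{
    assert(div32_trunc(33) == 1 && asr5_floor(33) == 1);
    assert(div32_trunc(-32) == -1 && asr5_floor(-32) == -1);
    assert(div32_trunc(-33) == -1 && asr5_floor(-33) == -2);
}
```

So a compiler that wants to turn a signed "i/32" into a shift has to fix up the negative case somehow, even if the value never goes negative in practice.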
For example, the 3DS bootrom has some interrupt handling code like this:
Code:
;in: r4 = irq.no (range 0..7Fh)
;out: r3 = address of 32bit word: (17E01200h+(irq.no/20h*4))
;out: r1 = bit number within 32bit word: (irq.no AND 1Fh)
;---
0001247C 17E1 asrs r1,r4,1Fh ;sign-bit of irq.no
0001247E 4B0B ldr r3,=17E01200h
00012480 0EC9 lsrs r1,r1,1Bh ;sign*1Fh (=00h or 1Fh)
00012482 1909 adds r1,r1,r4 ;irq.no + sign*1Fh
00012484 114A asrs r2,r1,5h ;irq.no/20h
00012486 0949 lsrs r1,r1,5h ;irq.no/20h
00012488 0092 lsls r2,r2,2h ;irq.no/20h*4
0001248A 0149 lsls r1,r1,5h ;irq.no/20h*20h
0001248C 1A61 subs r1,r4,r1 ;irq.no - (irq.no/20h*20h) ;aka AND 1Fh
0001248E 18D3 adds r3,r2,r3 ;17E01200h + (irq.no/20h*4)
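The compiler apparently tried to optimize "div 20h" into "shift 5", but then ran amok on making the signed result round towards zero: an arithmetic shift right alone would round negative values towards negative infinity, so it first adds 1Fh to the value if the sign bit is set.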
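As a sanity check, the instruction sequence can be modelled in C, one statement per opcode. This is only a sketch: the struct and function names are hypothetical, and the registers are modelled as uint32_t so that every shift is well defined in C. The self-check compares the result against plain signed "/" and "%", including negative inputs that the real irq.no never takes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t addr; uint32_t bit; } irq_slot; /* hypothetical */

/* Model of the listing; comments name the opcode being mimicked. */
irq_slot irq_slot_model(int32_t irq_no)
{
    irq_slot s;
    uint32_t r4 = (uint32_t)irq_no;
    uint32_t r1 = (irq_no < 0) ? 0xFFFFFFFFu : 0u; /* asrs r1,r4,1Fh */
    uint32_t r3 = 0x17E01200u;                     /* ldr  r3,=17E01200h */
    uint32_t r2;
    r1 >>= 27;                                     /* lsrs r1,r1,1Bh (0 or 1Fh) */
    r1 += r4;                                      /* adds r1,r1,r4 */
    r2 = (r1 >> 5) | ((r1 & 0x80000000u) ? 0xF8000000u : 0u); /* asrs r2,r1,5h */
    r1 >>= 5;                                      /* lsrs r1,r1,5h */
    r2 <<= 2;                                      /* lsls r2,r2,2h */
    r1 <<= 5;                                      /* lsls r1,r1,5h */
    r1 = r4 - r1;                                  /* subs r1,r4,r1 */
    r3 = r2 + r3;                                  /* adds r3,r2,r3 */
    s.addr = r3;
    s.bit = r1;
    return s;
}

/* The model should agree with signed divide/remainder for any sign. */
void check_model(void)
{
    for (int32_t i = -200; i <= 200; i++) {
        irq_slot s = irq_slot_model(i);
        assert(s.addr == 0x17E01200u + (uint32_t)(i / 32 * 4));
        assert(s.bit == (uint32_t)(i % 32));
    }
}
```

In other words, all the extra opcodes only matter for negative inputs; for the actual 0..7Fh range the "sign*1Fh" term is always zero.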
The code would probably be about half the size if the programmer had declared the value in r4 as unsigned (or if the source code had used a shift instead of a divide).
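A sketch of what that could look like at the source level. I don't have the original source, so the signed version is only a guess at it, and all names (including the IRQ_REG_BASE macro) are invented for illustration. With an unsigned argument the divide and remainder reduce to a plain shift and mask, and no sign fix-up is needed.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_REG_BASE 0x17E01200u /* address from the listing; name is hypothetical */

/* Guess at the original style: signed int with "/" and "%". */
uint32_t irq_word_addr_signed(int irq_no)
{
    return IRQ_REG_BASE + (uint32_t)(irq_no / 32 * 4);
}

uint32_t irq_bit_signed(int irq_no)
{
    return (uint32_t)(irq_no % 32);
}

/* Unsigned variant: shift and mask, no rounding correction required. */
uint32_t irq_word_addr_unsigned(uint32_t irq_no)
{
    return IRQ_REG_BASE + ((irq_no >> 5) << 2);
}

uint32_t irq_bit_unsigned(uint32_t irq_no)
{
    return irq_no & 0x1Fu;
}

/* Both variants agree over the whole valid irq.no range 0..7Fh. */
void check_irq_range(void)
{
    for (int i = 0; i <= 0x7F; i++) {
        assert(irq_word_addr_signed(i) == irq_word_addr_unsigned((uint32_t)i));
        assert(irq_bit_signed(i) == irq_bit_unsigned((uint32_t)i));
    }
}
```

How much shorter the generated code gets will depend on the compiler and its settings, but the sign handling should disappear either way.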
Assuming that it's a pretty common problem, and that it's impossible to teach C programmers not to use signed numbers... it would almost make sense to implement a "shift-and-round-towards-zero" opcode in newer processors. The newer ARM CPUs do actually have a fairly useless "uxt" opcode which helps with similar compiler-world issues, e.g. when compilers think that they must ensure that "mov r0,15h" won't exceed FFFFh; that usually required two useless opcodes, but they can now be replaced with only one useless opcode.