I’m looking at what might be an issue in optimized code, where we are seeing signs that expected byteswapping is not occurring when optimization is enabled. Finding exactly where the code that does this swapping lives has been a bit challenging, so I decided to look at a simple 2-byte swap sequence in a standalone program first. For such code I see the following in the listing file (xlC compiler):
    182| 00002C rlwinm 5405C23E 1 SRL4 gr5=gr0,8
    182| 000030 rlwinm 5400063E 1 RN4 gr0=gr0,0,0xFF
    182| 000034 rldimi 7805402C 1 RI8 gr5=gr0,8,gr5,0xFFFFFF00
The SRL4, RN4, and RI8 “mnemonics” are internal compiler codes meant to be “intuitive”, but I didn’t find them intuitive until I figured out what the instructions actually did. Here are a couple of reverse engineering notes for that task. Tackling the first rlwinm instruction, note that the raw instruction corresponds to:
0x5405C23E == rlwinm r5,r0,24,8,31
Here’s what the Power ISA says about rlwinm:
Rotate Left Word Immediate then AND with Mask

    rlwinm RA,RS,SH,MB,ME

    n <= SH
    r <= ROTL_32((RS)_32:63, n)
    m <= MASK(MB+32, ME+32)
    RA <= r & m

The contents of register RS are rotated_32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.
To interpret this we have to look up the meanings of the MASK and ROTL operations. On my first attempt I got the meaning of MASK() wrong, since I was counting bits from the wrong end (the Power ISA numbers bit 0 as the most significant bit). I resorted to the following to figure out the instruction, using the gcc inline asm immediate operand constraint “i” to build up the instruction whose effects I wanted to examine:
    #include <stdio.h>

    #define rlwinm( output, input, sh, mb, me ) \
       __asm__( "rlwinm %0, %1, %2, %3, %4" \
              : "=r"(output) \
              : "r"(input), "i"(sh), "i"(mb), "i"(me) \
              : )

    int main()
    {
       long x = 0x1122334455667788L ;
       long y ;

       rlwinm( y, x, 24, 8, 31 ) ;

       printf("0x%016lX -> 0x%016lX\n", x, y ) ;

       return 0 ;
    }
This generates an rlwinm instruction with the SH,MB,ME=24,8,31 triplet that I’d found in the listing. This code produces:
0x1122334455667788 -> 0x0000000000556677
Seeing the instruction act on a concrete value makes it easier to interpret. The effect seems to be:
    unsigned int w = (unsigned int)x ;        /* low 32 bits of x */
    unsigned long y = (w << 24) | (w >> 8) ;  /* rotate that word left by 24 bits */
    y |= (y << 32) ;  /* The ROTL_32 operation in the ISA appears to have this effect, but it is killed in the mask application */
    y &= 0xFFFFFF ;
Now the internal mnemonic “SRL4 …,8” has a specific meaning. It looks like it means Shift-Right-Lower4ByteWord 8 bits. It’s intuitive when you know that the L here means Lower. I didn’t guess that, and wondered what the hell shift RightLeft meant.
What does RN4 mean? That instruction was:
0x5400063E == rlwinm r0,r0,0,24,31
This has no shift, but applies a mask, and that mask has 16 fewer one-bits in it (bits 24 through 31 rather than 8 through 31). This appears to be an AND with 0xFF. A little test program using “rlwinm( y, x, 0, 24, 31 )” this time confirms this, as it produces:
0x1122334455667788 -> 0x0000000000000088
What could the R and N have meant? Knowing what the instruction does, I’d now guess RotateNone(andMask).
Finally, how about the RI8 operation? This time we have
0x7805402C == rldimi r5,r0,8,32
The PowerISA says of this:
Rotate Left Doubleword Immediate then Mask Insert

    rldimi RA,RS,SH,MB

    n <= sh_5 || sh_0:4
    r <= ROTL_64((RS), n)
    b <= mb_5 || mb_0:4
    m <= MASK(b, ¬n)
    RA <= r&m | (RA) & ¬m

The contents of register RS are rotated_64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.
Let’s see if an example makes this easier to understand too. This time a read/write modifier “+” is required on the output operand, since rldimi keeps the bits of RA that fall outside the mask:
    #include <stdio.h>

    #define rldimi( inout, input, sh, mb ) \
       __asm__( "rldimi %0, %1, %2, %3" \
              : "+r"(inout) \
              : "r"(input), "i"(sh), "i"(mb) \
              : )

    int main()
    {
       long x = 0x1122334455667788L ;
       long y = 0x99aabbccddeeff12L ;
       long yo = y ;

       rldimi( y, x, 8, 32 ) ;

       printf("0x%016lX,0x%016lX -> 0x%016lX\n", x, yo, y ) ;

       return 0 ;
    }
This produces:
0x1122334455667788,0x99AABBCCDDEEFF12 -> 0x99AABBCC66778812
It appears that the effect is:
I find it tricky to understand this from the PowerISA description, so if I encountered different values of SH,MB I’d probably run them through this little reverse engineering program. That said, at least the meaning of RI8 in -qlist output is now clear.
Writing some emulation code, so thank you for this tutorial on how to test instruction operation! I find that the PPC documentation is self-contradictory about some of the Fixed-Point Rotate and Shift instructions; for example, slw says both "r <- ROTL32((RS)32:63, n)" and "Zeros are supplied to the vacated positions on the right"; but earlier, "The rotation operations rotate a 64-bit quantity left by a specified number of bit positions. Bits that exit from position 0 enter at position 63" and "For the second type, denoted rotate32 or ROTL32, the value rotated consists of two copies of bits 32:63 of the given 64-bit value, one copy in bits 0:31 and the other in bits 32:63"; so ROTL32 does not "supply zeros to the vacated positions on the right". I look forward to using your methods to resolve this contradiction!
Good luck Karl. It’s been so long since I wrote this particular post, that I’d have to reverse engineer the post itself to understand what I was talking about — it doesn’t help that I no longer have access to a powerpc xlC compiler though.