I’m looking at what might be an issue in optimized code, where we are seeing some signs that expected byteswapping is not occuring when optimization is enabled. Trying to find exactly where the code is that does this swapping was been a bit challenging, so I decided to look a simple 2-byte swap sequence in a standalone program first. For such code I see the following in the listing file (xlC compiler)

  182| 00002C rlwinm   5405C23E   1     SRL4      gr5=gr0,8
  182| 000030 rlwinm   5400063E   1     RN4       gr0=gr0,0,0xFF
  182| 000034 rldimi   7805402C   1     RI8       gr5=gr0,8,gr5,0xFFFFFF00

The SRL4, RN4, and RI8 “mnemonics” are internal compiler codes meant to be “intuitive”, but I didn’t find them intuitive until I figured out what the instructions actually did. Here’s a couple reverse engineering notes for that task. Tackling the first rlwinm instruction, note that the raw instruction corresponds to:

0x5405C23E == rlwinm r5,r0,24,8,31

Here’s what the Power ISA says about rlwinm:

Rotate Left Word Immediate then AND with Mask 

rlwinm RA,RS,SH,MB,ME

n  <= SH
r  <= ROTL_32 ((RS)_32:63 , n)
m  <= MASK(MB+32, ME+32)
RA <= r & m

The contents of register RS are rotated 32 left SH bits. A
mask is generated having 1-bits from bit MB+32
through bit ME+32 and 0-bits elsewhere. The rotated
data are ANDed with the generated mask and the
result is placed into register RA.

To interpret this we have to look up the meanings of all the MASK and ROTL operations. My first attempt at that, I got the meaning of MASK() wrong, since I was counting bits from the wrong end. I resorted to the following to figure out the instruction, using gcc inline asm immediate operand constraints “i”, to build up the instruction I wanted to examine the effects of:

#include <stdio.h>

#define rlwinm( output, input, sh, mb, me ) \
   __asm__( "rlwinm %0, %1, %2, %3, %4" \
          : "=r"(output) \
          : "r"(input), "i"(sh), "i"(mb), "i"(me) \
          : )

int main()
{
   long x = 0x1122334455667788L ;
   long y ;

   rlwinm( y, x, 24, 8, 31 ) ;

   printf("0x%016lX -> 0x%016lX\n", x, y ) ;

   return 0 ;
}

This generates an rlwinm instruction with the SH,MB,ME=24,8,31 triplet that I’d found in the listing. This code produces:

0x1122334455667788 -> 0x0000000000556677

Observing the effects of the instruction in this concrete example makes it easier to interpret the effects of the instruction. That seems to be:

   long y = ((int)x << 24) | (char)(x >> 8) ;
   y  |= (y << 32) ; (* The ROTL_32 operation in the ISA appears to have this effect, but it is killed in the mask application *)
   y &= 0xFFFFFF ;

Now the internal mnemonic “SRL4 …,8” has a specific meaning. It looks like it means Shift-Right-Lower4ByteWord 8 bits. It’s intuitive when you know that the L here means Lower. I didn’t guess that, and wondered what the hell shift RightLeft meant.

What does RN4 mean? That instruction was:

0x5400063E == rlwinm r0,r0,0,24,31

This has no shift, but applies a mask, and that mask has 16 bits less ones in it. This appears to be an AND with 0xFF. A little test program using “rlwinm( y, x, 0, 24, 31 )” this time confirms this, as it produces:

0x1122334455667788 -> 0x0000000000000088

What could the R and N have meant? Knowing what the instruction does, I’d now guess RotateNone(andMask).

Finally, how about the RI8 operation? This time we have

0x7805402C == rldimi r5,r0,8,32

The PowerISA says of this:

Rotate Left Doubleword Immediate then Mask Insert  

rldimi RA,RS,SH,MB 

n  <= sh_5 || sh_0:4
r  <= ROTL_64 ((RS), n)
b  <= mb_5 || mb_0:4
m  <= MASK(b, ¬n)
RA  <= r&m | (RA) & ¬ m
The contents of register RS are rotated 64 left SH bits. A
mask is generated having 1-bits from bit MB through bit
63-SH and 0-bits elsewhere. The rotated data are
inserted into register RA under control of the generated
mask.

Let’s also see if an example makes this easier to understand. This time a read/write modifier + is required on the output operand

#include <stdio.h>

#define rldimi( inout, input, sh, mb ) \
   __asm__( "rldimi %0, %1, %2, %3" \
          : "+r"(inout) \
          : "r"(input), "i"(sh), "i"(mb) \
          : )

int main()
{
   long x = 0x1122334455667788L ;
   long y = 0x99aabbccddeeff12L ;
   long yo = y ;

   rldimi( y, x, 8, 32 ) ;

   printf("0x%016lX,0x%016lX -> 0x%016lX\n", x, yo, y ) ;

   return 0 ;
}

This produces:

0x1122334455667788,0x99AABBCCDDEEFF12 -> 0x99AABBCC66778812

It appears that the effect is:

y = (y & ~0xFFFFFF00L) | ((x << 8) & 0xFFFFFF00L) ;

I find it tricky to understand this from the PowerISA description, so if I encountered different values of SH,MB I’d probably run them through this little reverse engineering program. That said, at least the meaning of RI8 in -qlist output is now clear.