C/C++ development and debugging.

Multilanguage debugging in lldb: print call to function.

December 13, 2023 C/C++ development and debugging. , , , , , ,

There probably aren’t many people that care about debugging multiple languages, but I learned a new trick today that is worth making a note of, even if that note is for a future amnesiatic self.

Here’s a debug session where C code is calling COBOL, but in the COBOL frame, the language rules prohibit running print to show the results of a C function call (example: printf, strlen, strspn, …)

To make a function call in lldb, I used to go up the stack to a C language frame.  For example, if this was the COBOL code I was debugging:

(lldb) n
12/13/23 19:27:26 LTE14039I Opening LzMQZ connection. QMGR: MQZ1 MQZCONN: 0x7ff920625170 API: 0x7fed0008e0e0
Process 1673776 stopped
* thread #57, name = 'LZOCREG1', stop reason = step over
    frame #0: 0x00007ff9243b31f2 WINDC.NATIVE.LZPDS.A0116662(LTESVCXC).f3968a73`LTESVCXC at LTESVCXC.cbl:36:1
   33  
   34                  DISPLAY 'WSCHECK: "' WORK-VAR '"'
   35  
-> 36                 EXEC CICS LINK PROGRAM ('LTESVCXC')
   37                      COMMAREA(WORK-COMMAREA)
   38                      LENGTH   (LENGTH OF WORK-COMMAREA)
   39                 END-EXEC
(lldb) p &WORK-VAR
(*char [10]) $4 = 0x00007fadef810478
(lldb) p WORK-VAR
(char [10]) WORK-VAR = "STORISOK  "
(lldb) fr v -format x WORK-VAR
(char [10]) WORK-VAR = {
  [0] = 0xe2
  [1] = 0xe3
  [2] = 0xd6
  [3] = 0xd9
  [4] = 0xc9
  [5] = 0xe2
  [6] = 0xd6
  [7] = 0xd2
  [8] = 0x40
  [9] = 0x40
}

Aside: If you object to the use of a C address-of operator against a COBOL variable, that’s just because our debugger has C like & notational shorthand for the COBOL ‘ADDRESS OF …’, which is very useful.

If I want to run a C function against that COBOL WORKING-STORAGE variable, like strchr, to look for the address of the first EBCDIC space (0x40) in that string, I used to do it by going up the stack into a C frame, like so:

(lldb) up 2
frame #2: 0x00007ff9243b3f7e WINDC.NATIVE.LZPDS.A0116662(LTESVCXC).f3968a73`pgm_ltesvcxc + 382
WINDC.NATIVE.LZPDS.A0116662(LTESVCXC).f3968a73`pgm_ltesvcxc:
->  0x7ff9243b3f7e <+382>: jmp    0x7ff9243b3f88            ; <+392>
    0x7ff9243b3f80 <+384>: addq   $0x128, %rsp              ; imm = 0x128 
    0x7ff9243b3f87 <+391>: retq   
    0x7ff9243b3f88 <+392>: leaq   0x201039(%rip), %rdi
(lldb) print (char *)strchr(0x00007fadef810478, 0x40)
(char *) $6 = 0x00007fadef810480 "@@"

Sure enough, that space is found 8 bytes into the string, as expected. This is a very short string, and I could have seen that by inspection, but it’s just to illustrate that we can make calls to functions within the debugger, and they can even be functions that aren’t in the program or language that we are debugging.

I noticed today that ‘print’ is an alias for ‘expression –‘, and that expression takes a language option. This means that I can do cross language calls like this in any frame, provided I specify the language I want. Example:

(lldb) down 2
frame #0: 0x00007ff9243b31f2 WINDC.NATIVE.LZPDS.A0116662(LTESVCXC).f3968a73`LTESVCXC at LTESVCXC.cbl:36:1
   33  
   34                  DISPLAY 'WSCHECK: "' WORK-VAR '"'
   35  
-> 36                 EXEC CICS LINK PROGRAM ('LTESVCXC')
   37                      COMMAREA(WORK-COMMAREA)
   38                      LENGTH   (LENGTH OF WORK-COMMAREA)
   39                 END-EXEC
(lldb) expression -l c -- (char *)strchr(0x00007fadef810478, 0x40)
(char *) $7 = 0x00007fadef810480 "@@"

Ten points to me for learning yet another obscure debugger trick.

Letting a gdb controlled program read from stdin.

December 8, 2023 C/C++ development and debugging. , , ,

I was asked how to let a gdb controlled slave read from stdin, and couldn’t remember how it was done, so I wrote the following little test program, figuring that muscle memory would remind me once I had gdb running.

#include <stdio.h>

int main()
{
   size_t rc = 0;
   char buf[2];

   do {
      rc = fread( buf, 1, 1, stdin );
      if ( rc == 1 ) {
         printf( "%c\n", buf[0] );
      }
   } while ( rc );

   return 0;
}

Turns out the answer is just the gdb run command, which can specify stdin for the program. Here’s a demo session that shows this in action

(gdb) b main
Breakpoint 1 at 0x40061e: file x.c, line 5.
(gdb) run < x.c
Starting program: /home/pjoot/tmp/readit/a.out < x.c

Breakpoint 1, main () at x.c:5
5          size_t rc = 0;
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-225.0.4.el8_8.6.x86_64
(gdb) n
9             rc = fread( buf, 1, 1, stdin );
(gdb) n
10            if ( rc == 1 ) {
(gdb) p rc
$1 = 1
(gdb) p buf[0]
$2 = 35 '#'
(gdb) n
11               printf( "%c\n", buf[0] );
(gdb) 
#
13         } while ( rc );
(gdb) 
9             rc = fread( buf, 1, 1, stdin );
(gdb) 
10            if ( rc == 1 ) {
(gdb) p rc
$3 = 1
(gdb) p buf[0]
$4 = 105 'i'

The x.c input that I passed to run, was the source of the program that I debugging.

C++ compiler diagnostic gone horribly wrong: error: explicit specialization in non-namespace scope

September 23, 2022 C/C++ development and debugging. , , , , , , , ,

Here is a g++ error message that took me an embarrassingly long time to figure out:

In file included from /home/llvm-project/llvm/lib/IR/Constants.cpp:15:
/home/llvm-project/llvm/lib/IR/LLVMContextImpl.h:447:11: error: explicit specialization in non-namespace scope ‘struct llvm::MDNodeKeyImpl<llvm::DIBasicType>’
 template <> struct MDNodeKeyImpl<DIStringType> {
           ^

This is the code:

template <> struct MDNodeKeyImpl<DIStringType> {
  unsigned Tag;
  MDString *Name;
  Metadata *StringLength;
  Metadata *StringLengthExp;
  Metadata *StringLocationExp;
  uint64_t SizeInBits;
  uint32_t AlignInBits;
  unsigned Encoding;

This specialization isn’t materially different than the one that preceded it:

template <> struct MDNodeKeyImpl<DIBasicType> {
  unsigned Tag;
  MDString *Name;
  MDString *PictureString;
  uint64_t SizeInBits;
  uint32_t AlignInBits;
  unsigned Encoding;
  unsigned Flags;
  Optional<DIBasicType::DecimalInfo> DecimalAttrInfo;

  MDNodeKeyImpl(unsigned Tag, MDString *Name, MDString *PictureString,
               uint64_t SizeInBits, uint32_t AlignInBits, unsigned Encoding,
                unsigned Flags,
                Optional<DIBasicType::DecimalInfo> DecimalAttrInfo)
      : Tag(Tag), Name(Name), PictureString(PictureString),
        SizeInBits(SizeInBits), AlignInBits(AlignInBits), Encoding(Encoding),
        Flags(Flags), DecimalAttrInfo(DecimalAttrInfo) {}
  MDNodeKeyImpl(const DIBasicType *N)
      : Tag(N->getTag()), Name(N->getRawName()), PictureString(N->getRawPictureString()), SizeInBits(N->getSizeInBits()),
        AlignInBits(N->getAlignInBits()), Encoding(N->getEncoding()),
        Flags(N->getFlags(), DecimalAttrInfo(N->getDecimalInfo()) {}

  bool isKeyOf(const DIBasicType *RHS) const {
    return Tag == RHS->getTag() && Name == RHS->getRawName() &&
           PictureString == RHS->getRawPictureString() &&
           SizeInBits == RHS->getSizeInBits() &&
           AlignInBits == RHS->getAlignInBits() &&
           Encoding == RHS->getEncoding() && Flags == RHS->getFlags() &&
           DecimalAttrInfo == RHS->getDecimalInfo();
  }

  unsigned getHashValue() const {
    return hash_combine(Tag, Name, SizeInBits, AlignInBits, Encoding);
  }
};

However, there is an error hiding above it on this line:

        Flags(N->getFlags(), DecimalAttrInfo(N->getDecimalInfo()) {}

i.e. a single missing brace in the initializer for the Flags member, a consequence of a cut and paste during rebase that clobbered that one character, when adding a comma after it.

It turns out that the compiler was giving me a hint that something was wrong before this in the message:

error: explicit specialization in non-namespace scope

as it states that the scope is:

‘struct llvm::MDNodeKeyImpl

which is the previous class definition. Inspection of the code made me think that the scope was ‘namespace llvm {…}’, and I’d gone looking for a rebase error that would have incorrectly terminated that llvm namespace scope. This is a classic example of not paying enough attention to what is in front of you, and going off looking based on hunches instead. I didn’t understand the compiler message, but in retrospect, non-namespace scope meant that something in that scope was incomplete. The compiler wasn’t smart enough to tell me that the previous specialization was completed due to the missing brace, but it did tell me that something was wrong in that previous specialization (which was explicitly named), and I didn’t look at that because of my “what the hell does that mean” reaction to the compilation error message.

In this case, I was building on RHEL8.3, which uses an ancient GCC toolchain. I wonder if newer versions of g++ fare better (i.e.: a message like “possibly unterminated brace on line …” would have been much nicer)? I wasn’t able to try with clang++ as I was building llvm+clang+lldb (V14), and had uninstalled all of the llvm related toolchain to avoid interference.

The C compiler is too forgiving! sizeof(variable_name+1) allowed?

April 28, 2022 C/C++ development and debugging. , , , ,

I carelessly passed:

sizeof(s.st_size+1)

to an allocator call, instead of:

s.st_size+1

and corrupted memory nicely.

What the hell would sizeof(variable+1) even mean, and why on earth would the compiler think that is anything close to valid? Both gcc and clang, each with -Wall, are completely quiet about this error!

Debugging a C coding error from an XPLINK assembly listing.

March 12, 2021 C/C++ development and debugging., Mainframe , , , , , , ,

There are at least two\({}^1\) z/OS C calling conventions, the traditional “LE” OSLINK calling convention, and the newer\({}^2\) XPLINK convention.  In the LE calling convention, parameters aren’t passed in registers, but in an array pointed to by R1.  Here’s an example of an OSLINK call to strtof():

*  float strtof(const char *nptr, char **endptr);
LA       r0,ep(,r13,408)
LA       r2,buf(,r13,280)
LA       r4,#wtemp_1(,r13,416)
L        r15,=V(STRTOF)(,r3,4)
LA       r1,#MX_TEMP3(,r13,224)
ST       r4,#MX_TEMP3(,r13,224)
ST       r2,#MX_TEMP3(,r13,228)
ST       r0,#MX_TEMP3(,r13,232)
BASR     r14,r15
LD       f0,#wtemp_1(,r13,416)

R1 is pointed to r13 + 224 (a location on the stack). If the original call was:

float f = strtof( mystring, &err );

The compiler has internally translated it into something of the form:

STRTOF( &f, mystring, &err );

where all of {&f, mystring, &err} are stuffed into the memory starting at the 224(R13) location. Afterwards the value has to be loaded from memory into a floating point register (F0) so that it can be used.  Compare this to a Linux strtof() call:

* char * e = 0;
* float x = strtof( "1.0", &e );
  400b74:       mov    $0x400ef8,%edi       ; first param is address of "1.0"
  400b79:       movq   $0x0,0x8(%rsp)       ; e = 0;
  400b82:       lea    0x8(%rsp),%rsi       ; second param is &e
  400b87:       callq  400810 <strtof@plt>  ; call the function, returning a value in %xmm0

Here the input parameters are RDI, RSI, and the output is XMM0. Nice and simple. Since XPLINK was designed for C code, we expect it to be more sensible. Let’s see what an XPLINK call looks like. Here’s a call to fmodf:

*      float r = fmodf( 10.0f, 3.0f );
            LD       f0,+CONSTANT_AREA(,r9,184)
            LD       f2,+CONSTANT_AREA(,r9,192)
            L        r7,#Save_ADA_Ptr_9(,r4,2052)
            L        r6,=A(__fmodf)(,r7,76)
            L        r5,=A(__fmodf)(,r7,72)
            BASR     r7,r6
            NOP      9
            LDR      f2,f0
            STE      f2,r(,r4,2144)
*
*      printf( "fmodf: %g\n", (double)r );

There are some curious details that would have to be explored to understand the code above (why f0, f2, and not f0,f1?), however, the short story is that all the input and output values in (floating point) registers.

The mystery that led me to looking at this was a malfunctioning call to strtof:

*      float x = strtof( "1.0q", &e );
            LA       r2,e(,r4,2144)
            L        r7,#Save_ADA_Ptr_12(,r4,2052)
            L        r6,=A(strtof)(,r7,4)
            L        r5,=A(strtof)(,r7,0)
            LA       r1,+CONSTANT_AREA(,r9,20)
            BASR     r7,r6
            NOP      17
            LR       r0,r3
            CEFR     f2,r0
            STE      f2,x(,r4,2148)
*
*      printf( "strtof: v: %g\n", x );

The CEFR instruction converts an integer to a (hfp32) floating point representation, so we appear to have strtof returning it’s value in R3, which is an integer register. That then gets copied into R0, and finally into F2 (and after that into a stack spill location before the printf call.) I scratched my head about this code for quite a while, trying to figure out if the compiler had some mysterious way to make this work that I wasn’t figuring out. Eventually, I clued in. I’m so used to using a C++ compiler that I forgot about the old style implicit int return for an unprototyped function. But I had included <stdlib.h> in this code, so strtof should have been prototyped? However, the Language Runtime reference specifies that on z/OS you need an additional define to have strtof visible:

#define _ISOC99_SOURCE
#include <stdlib.h>

Without the additional define, the call to strtof() is as if it was prototyped as:

int strtof( const char *, char ** );

My expectation is that with such a correction, the call to strtof() should return it’s value in f0, just like fmodf() does. The result should also not be garbage!

 

Footnotes:

  1.  There is also a “metal” compiler and probably a different calling convention to go with that.  I don’t know how metal differs from XPLINK.
  2. Newer in the lifetime of the mainframe means circa 2001, which is bleeding edge given the speed that mainframe development moves.