Mainframe

Debugging a C coding error from an XPLINK assembly listing.

March 12, 2021 | C/C++ development and debugging, Mainframe

There are at least two [1] z/OS C calling conventions, the traditional “LE” OSLINK calling convention, and the newer [2] XPLINK convention.  In the LE calling convention, parameters aren’t passed in registers, but in an array pointed to by R1.  Here’s an example of an OSLINK call to strtof():

*  float strtof(const char *nptr, char **endptr);
LA       r0,ep(,r13,408)
LA       r2,buf(,r13,280)
LA       r4,#wtemp_1(,r13,416)
L        r15,=V(STRTOF)(,r3,4)
LA       r1,#MX_TEMP3(,r13,224)
ST       r4,#MX_TEMP3(,r13,224)
ST       r2,#MX_TEMP3(,r13,228)
ST       r0,#MX_TEMP3(,r13,232)
BASR     r14,r15
LD       f0,#wtemp_1(,r13,416)

R1 points to r13 + 224 (a location on the stack). If the original call was:

float f = strtof( mystring, &err );

The compiler has internally translated it into something of the form:

STRTOF( &f, mystring, &err );

where all of {&f, mystring, &err} are stuffed into the memory starting at the 224(R13) location. Afterwards the value has to be loaded from memory into a floating point register (F0) so that it can be used.  Compare this to a Linux strtof() call:

* char * e = 0;
* float x = strtof( "1.0", &e );
  400b74:       mov    $0x400ef8,%edi       ; first param is address of "1.0"
  400b79:       movq   $0x0,0x8(%rsp)       ; e = 0;
  400b82:       lea    0x8(%rsp),%rsi       ; second param is &e
  400b87:       callq  400810 <strtof@plt>       ; call the function, returning a value in %xmm0

Here the input parameters are passed in RDI and RSI, and the output comes back in XMM0. Nice and simple. Since XPLINK was designed for C code, we expect it to be more sensible. Let’s see what an XPLINK call looks like. Here’s a call to fmodf:

*      float r = fmodf( 10.0f, 3.0f );
            LD       f0,+CONSTANT_AREA(,r9,184)
            LD       f2,+CONSTANT_AREA(,r9,192)
            L        r7,#Save_ADA_Ptr_9(,r4,2052)
            L        r6,=A(__fmodf)(,r7,76)
            L        r5,=A(__fmodf)(,r7,72)
            BASR     r7,r6
            NOP      9
            LDR      f2,f0
            STE      f2,r(,r4,2144)
*
*      printf( "fmodf: %g\n", (double)r );

There are some curious details that would have to be explored to understand the code above (why f0 and f2, and not f0 and f1?), however, the short story is that all the input and output values are passed in (floating point) registers.

The mystery that led me to looking at this was a malfunctioning call to strtof:

*      float x = strtof( "1.0q", &e );
            LA       r2,e(,r4,2144)
            L        r7,#Save_ADA_Ptr_12(,r4,2052)
            L        r6,=A(strtof)(,r7,4)
            L        r5,=A(strtof)(,r7,0)
            LA       r1,+CONSTANT_AREA(,r9,20)
            BASR     r7,r6
            NOP      17
            LR       r0,r3
            CEFR     f2,r0
            STE      f2,x(,r4,2148)
*
*      printf( "strtof: v: %g\n", x );

The CEFR instruction converts an integer to a (hfp32) floating point representation, so we appear to have strtof returning its value in R3, which is an integer register.  That then gets copied into R0, and finally into F2 (and after that, into a stack spill location before the printf call).  I scratched my head about this code for quite a while, trying to figure out if the compiler had some mysterious way to make this work that I wasn’t seeing.  Eventually, I clued in: I’m so used to using a C++ compiler that I forgot about the old style implicit int return for an unprototyped function.  But I had included <stdlib.h> in this code, so strtof should have been prototyped?  However, the Language Runtime reference specifies that on z/OS you need an additional define to have strtof visible:

#define _ISOC99_SOURCE
#include <stdlib.h>

Without the additional define, the call to strtof() is treated as if it had been prototyped as:

int strtof( const char *, char ** );

My expectation is that with such a correction, the call to strtof() should return its value in F0, just like fmodf() does. The result should also not be garbage!
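To test that theory, here’s a minimal sketch of the corrected program; it assumes (as the runtime reference says) that defining _ISOC99_SOURCE before including <stdlib.h> is what exposes the proper prototype, and it reuses the “1.0q” string and printf format from the listing above:

/* strtof.c: with the feature test macro defined before <stdlib.h>,
 * strtof is properly prototyped as returning float, so the result
 * should come back in F0 rather than being treated as an implicit
 * int return in R3. */
#define _ISOC99_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
   char * e = NULL;
   float x = strtof( "1.0q", &e );

   printf( "strtof: v: %g\n", (double)x );
   printf( "strtof: stopped at: '%s'\n", e );

   return 0;
}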

 

Footnotes:

  1.  There is also a “metal” compiler and probably a different calling convention to go with that.  I don’t know how metal differs from XPLINK.
  2. Newer in the lifetime of the mainframe means circa 2001, which is bleeding edge given the speed that mainframe development moves.

Example of PL/I macro

January 27, 2021 | Mainframe

Up until last week, PL/I macros were a bit of a mystery to me.  Most of the ones that I’d seen in customer code were impressively inscrutable, and if I had to look at any of them, my reaction was to throw my hands in the air and plead with the compiler backend guys for help.  Implementing one such macro has been very helpful in understanding how these work.

Here is a C program that roughly models some PL/I code of interest
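Something along these lines, where ‘foo’ and its exact signature are only stand-ins for the real API (the stub below just fakes a successful call so the example is self contained):

/* model.c: rough C model of the PL/I caller */
#include <stdio.h>
#include <string.h>
#include "rcvalues.h"                        /* shown just below */

typedef struct { char bytes[12]; } foo_rc;   /* 12 byte return code area */

/* stand-in for the documented 'foo' API */
static void foo( const char * input, foo_rc * rc )
{
   (void)input;
   memset( rc, 0, sizeof( *rc ) );           /* pretend success: RC000 */
}

int main( void )
{
   foo_rc rc;
   long long expected = RC000;               /* RCCHECK needs an object to point at */

   foo( "some input", &rc );

   if ( RCCHECK( rc, expected ) )
      printf( "foo: success\n" );
   else
      printf( "foo: failure\n" );

   return 0;
}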

The documentation for the ‘foo’ function says that the final return code parameter is 12 bytes long, and that the ‘rcvalues.h’ header file has a set of RCNNN constants and an RCCHECK macro that can be used to test for any one of those constants.  A possible C implementation of that header might look something like:

/* rcvalues.h */
#include <string.h>   /* RCCHECK uses memcmp() */

#define RC000 0x0000000000000000LL
#define RC001 0x0000000123456789LL
/* ... */

#define RCCHECK( urc, crc ) ( memcmp( &(urc), &(crc), 8 ) == 0 )

PL/I APIs do not typically use modern constructs like typedefs.  The closest that I have seen is for an API header file (copybook, in the mainframe lingo) to declare a variable (which becomes a local variable with a specific name in the including module), which the programmer can then refer to using the LIKE keyword, as in the following example:

I believe there is also a DEFINE keyword available in newer PL/I compilers, which provides a typedef-like mechanism, but most existing code probably doesn’t use such new-fangled nonsense, when cut and paste has far superior maintenance characteristics.  For that reason, the API would be unlikely to have a typedef equivalent for the return code structure.  Instead, the PL/I equivalent of the C code above would probably look like:

(i.e. the C code is really modeled on the PL/I code of this form, and if this was a C API, the API would have a struct declaration or a typedef for the return code structure)

The RCNNN constants would actually be found as named variables (not immutable constants) in the copy book, perhaps declared something like:

I struggled a bit to figure out what the PL/I equivalent of my C RCCHECK macro would be.  The following inner function correctly did the required type casting and comparisons:

The implementation is very long, since the entire declaration of the input parameter type has to be duplicated.

If I were to put this RCCHECK implementation into my RCVALUES.inc header file, it would only work if all of the customers’ declarations of their return code structure objects were field-by-field compatible.  What I really want is for my RCCHECK function to take the address of the parameter, and pass that instead of the underlying type.  How to do that was not at all obvious, but with some help I was eventually able to construct a PL/I macro (with a helper inner-function) of the following form:

It’s clearly no longer a one liner.  Some notes on this PL/I macro:

  • The PL/I macro body looks like a regular PL/I function, but the begin-PROCEDURE and END statements start with % (% is not part of the PROC name.)
  • Macro parameters and return values are explicit strings, regardless of the types of the parameters that were actually passed.
  • In PL/I the || symbol is used for string concatenation, so this constructs output that inserts an ADDR() call around the ARG1 token and then passes the ARG2 token as-is.
  • I don’t know if there’s a way to implement this macro in a way that doesn’t require a helper function, and still have the output work in the context of an IF statement.
  • You have to explicitly enable the macro, using %ACTIVATE.  In my case, without %ACTIVATE, the RCCHECK symbol ended up as an undeclared external entry, and there was no call to the @RCCHECK_HELPER function [1].
  • Observe that the PL/I macro provides a mechanism to jam whatever you want into the code, as the compiler’s macro preprocessor replaces the macro call tokens with the string that you have provided, leaving that string for the final compiler pass to interpret instead.
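For comparison, a C analogue of what this PL/I macro ends up generating might look like the following, where rccheck_helper is a hypothetical stand-in for the @RCCHECK_HELPER inner function:

#include <string.h>

/* hypothetical helper, analogous to @RCCHECK_HELPER: compare the first
 * 8 bytes of the caller's return code area against an expected value
 * (byte-for-byte, which matches the RCNNN values on big-endian z) */
static int rccheck_helper( const void * rc, long long expected )
{
   return memcmp( rc, &expected, 8 ) == 0;
}

/* the macro wraps & (the C spelling of ADDR()) around its first
 * argument and passes the second argument through unchanged */
#define RCCHECK( urc, crc ) ( rccheck_helper( &(urc), (crc) ) )

With that, RCCHECK( rc, RC001 ) expands to rccheck_helper( &(rc), RC001 ), the same shape as the macro output described above: ADDR() around the first token, the second token passed as-is.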

If I compile the code using this macro version of RCCHECK, the preprocessor output looks like:

I’m still pretty horrified at some of the macros that I’ve seen in customer code — they almost seem like the source equivalent of self modifying code.  You can’t figure out what is going on without also looking at all the output of the precompiler passes.  This is especially evil, since you can write PL/I preprocessor macros that generate preprocessor macros and require multiple preprocessor passes to produce the final desired output!

Footnotes

[1] Note that @ is a valid PL/I character to use in a symbol name, as are # and $ — so if you want your functions to look like swear words, this is a language where that is possible.  Something like the following is probably valid PL/I:

V = #@$1A#@(1);

For added entertainment, your file names (i.e. PDS member names) can also be like ‘#@$1A#@’. Storing files with names like that on a Unix filesystem results in hours of fun, as you are then left with the task of figuring out how to properly quote file names with embedded $’s and #’s in scripts and makefiles.

EBCDIC, thou art evil.

November 18, 2020 | Mainframe

Here’s a bit of innocuous code. It was being compiled with gcc -fexec-charset=1047, so all the characters and strings were being treated as EBCDIC. For example ‘0’ = ‘\xF0’.

    if (c >= '0' && c <= '9')
         c -= '0';
    else if (c >= 'A' && c <= 'Z')
         c -= 'A' - 10;
    else if (c >= 'a' && c <= 'z')
         c -= 'a' - 10;
    else
         break;

Specifying the charset is not enough to port this code to the mainframe.  The problem is that EBCDIC is completely braindead, and DOESN’T PUT THE FRIGGEN LETTERS TOGETHER!

The letters are clustered in groups:

  • a-i
  • j-r
  • s-z
  • A-I
  • J-R
  • S-Z

with whole piles of crap between each range of characters, so comparisons like c >= ‘A’ && c <= ‘Z’ are useless, as are constructions like c -= ‘A’ - 10, since a c in the J-R or S-Z ranges will break that.
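One charset-agnostic way to rewrite that conversion (a sketch of the idea, not necessarily the fix I ended up making) is to search for the character in a table of valid digits and letters, instead of assuming the letters occupy contiguous code points:

#include <string.h>

/* Convert a digit or letter to its value (0..35), or return -1.
 * This works for both ASCII and EBCDIC execution character sets,
 * because it never assumes 'a'..'z' or 'A'..'Z' are contiguous. */
static int digit_value( int c )
{
   static const char lower[] = "0123456789abcdefghijklmnopqrstuvwxyz";
   static const char upper[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
   const char * p;

   if ( c == '\0' )
      return -1;                       /* strchr() would match the NUL */

   if ( ( p = strchr( lower, c ) ) != NULL )
      return (int)( p - lower );

   if ( ( p = strchr( upper, c ) ) != NULL )
      return (int)( p - upper );

   return -1;                          /* plays the role of the break */
}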

Now I have a big hunt-and-destroy task ahead of me.  I can fix this code, but where else are problems like this lurking?

An invalid transformation of a COBOL data description entry

August 28, 2020 | COBOL

Here’s a subtle gotcha that we saw recently.  A miraculous tool transformed some putrid DELTA-generated COBOL code from GOTO soup into human readable form.  Among the transformations that this tool performed were modifications to working storage data declarations (removing unused variables in the source, and simplifying some others).  One of those transformations was problematic.  In that problematic case, the pre-transformed declarations were:

This declaration is basically a union of a char[8] with a structure that has four char[2]’s, with the COBOL language imposed restriction that the character values can only be numeric (EBCDIC) digits (i.e. ‘\xF0’, …, ‘\xF9’).  In the code in question, none of the U044-BIS* variables (neither the original field nor its aliases) were ever used explicitly, but they were passed into another COBOL program as LINKAGE SECTION variables and used in the called program.
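In C terms, the layout being described is roughly this (the names are stand-ins, not the real U044-BIS* declarations):

/* the same 8 bytes viewed either as one character field or, via the
 * COBOL REDEFINES, as four 2-character numeric fields */
union u044_bis_model
{
   char whole[8];            /* the original field          */
   struct
   {
      char part1[2];         /* the four redefining aliases */
      char part2[2];
      char part3[2];
      char part4[2];
   } parts;
};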

Here’s how the tool initially transformed the declaration:

It turns out that dropping that first PIC and removing the corresponding REDEFINES clause was an invalid transformation in this case, because the code used INITIALIZE on the level 01 object that contained these variables.

On page 177 of the “Enterprise COBOL for z/OS, V6.3 Language Reference”, we have:

(copyright IBM)

FILLER
A data item that is not explicitly referred to in a program. The keyword FILLER is optional. If specified,
FILLER must be the first word following the level-number.

… snip …

In an INITIALIZE statement:
• When the FILLER phrase is not specified, elementary FILLER items are ignored.

The transformation of the code in question would have been correct provided the “INITIALIZE foo” was replaced with “INITIALIZE foo WITH FILLER”.  The bug in the tool was fixed, and the transformed code in question was, in this case, changed to drop all the aliasing:

As a side effect of encountering this issue, I learned a number of things:

  • FILLER is actually a COBOL language keyword, with specific semantics, and not just a variable naming convention.
  • Both ‘INITIALIZE’ and ‘INITIALIZE … WITH FILLER’ are allowed.
  • INITIALIZE (without FILLER) doesn’t do PIC-appropriate initialization of FILLER variables (we had binary zeros instead of EBCDIC zeros as a result).
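To make that last point concrete, here is a rough C model of the two behaviours for an 8-byte group whose fields are all effectively FILLER (an illustration of the quoted rule, not the actual tool output):

#include <string.h>

/* 8 bytes of PIC 9 DISPLAY digits; 0xF0 is the EBCDIC character '0' */
struct group_model { unsigned char bytes[8]; };

/* INITIALIZE: elementary FILLER items are ignored, so storage that
 * started out as binary zeros stays binary zeros */
static void initialize( struct group_model * g )
{
   (void)g;
}

/* INITIALIZE ... WITH FILLER: FILLER items get PIC-appropriate values,
 * which for numeric DISPLAY fields means EBCDIC zeros */
static void initialize_with_filler( struct group_model * g )
{
   memset( g->bytes, 0xF0, sizeof( g->bytes ) );
}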

Using the debugger to understand COBOL level 88 semantics.

August 10, 2020 | COBOL

COBOL has an enumeration mechanism called a LEVEL-88 variable.  I found a few aspects of this counter-intuitive, as I’ve mentioned in a few previous posts.  With the help of the debugger, I now think I finally understand what’s going on.  Here’s an example program that uses a couple of LEVEL-88 variables:

We can use a debugger to discover the meaning of the COBOL code. Let’s start by single stepping past the first MOVE statement to just before the SET MY88-VAR-1:

Here, I’m running the program LZ000550 from a PDS named COBRC.NATIVE.LZ000550. We expect EBCDIC spaces (‘\x40’) in the FEEDBACK structure at this point and that’s what we see:

Line stepping past the SET statement, our structure memory layout now looks like:

By setting the LEVEL-88 variable to TRUE all the memory starting at the address of feedback is now overwritten by the numeric value 0x0000000141C33A3BL. If we continue the program, the SYSOUT ends up looking like:

This condition was expected.
This condition was expected.

The first ‘IF MY88-VAR-1 THEN’ fires, and after the subsequent ‘SET MY88-VAR-2 TO TRUE’, the second ‘IF MY88-VAR-2 THEN’ fires. The SET overwrites the structure memory at the point where the 88 was declared, and an IF check of that same variable name tests whether the memory at that location matches the value given in the 88 variable; the specific layout of the structure at that point does not matter. In this case there is only one level-88 variable with each of these names in the program, so the ‘IF MY88-VAR-2 OF feedback’ that was used was redundant, and could have been coded as just ‘IF MY88-VAR-2’, or as ‘IF MY88-VAR-2 OF CONDITION-TOKEN-VALUE OF Feedback’.

We can infer that the COBOL code’s WORKING-STORAGE has the following equivalent C++ layout:

struct CONDITION_TOKEN_VALUE
{
   short SEVERITY;
   short MSG_NO;
   char CASE_SEV_CTL;
   char FACILITY_ID[3];
};

enum my88_vars
{
   MY88_VAR_1 = 0x0000000141C33A3BL,
   MY88_VAR_2 = 0x0000000241C33A3BL
};

struct feedback
{
   union {
      CONDITION_TOKEN_VALUE c;
      my88_vars e;
   } u;
   int I_S_INFO;
};

and that the control flow of the program can be modeled as the following:

//...
feedback f;

int main()
{
   memset( &f, 0x40, sizeof(f) );
   f.u.e = MY88_VAR_1;
   if ( f.u.e == MY88_VAR_2 )
   {
      impossible();
   }
   if ( f.u.e == MY88_VAR_1 )
   {
      expected();
   }

   f.u.e = MY88_VAR_2;
   if ( f.u.e == MY88_VAR_1 )
   {
      impossible();
   }
   if ( f.u.e == MY88_VAR_2 )
   {
      expected();
   }

   return 0;
}

Things also get even more confusing if the LEVEL-88 variable specifies fewer bytes than the structure that it is embedded in.  In that case, a SET of the variable pads out the structure with spaces, and a check of the variable also looks for those additional trailing spaces:

The CONDITION-TOKEN-VALUE object uses a total of 8 bytes.  We can see the spaces in the display of the FEEDBACK structure if we look in the debugger:

See the four trailing 0x40 spaces here.
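Modeled in C, the SET and the IF for a LEVEL-88 variable like this behave roughly as follows (a sketch; it assumes the same 8-byte value as the earlier example, embedded in the 12-byte FEEDBACK structure):

#include <string.h>

#define EBCDIC_SPACE 0x40

/* the 8 byte 88-value 0x0000000141C33A3B as raw bytes */
static const unsigned char MY88_VALUE[8] =
   { 0x00, 0x00, 0x00, 0x01, 0x41, 0xC3, 0x3A, 0x3B };

struct feedback_bytes { unsigned char b[12]; };    /* 12 byte FEEDBACK */

/* SET MY88-VAR-1 TO TRUE: write the value, pad the rest with spaces */
static void set_my88_var_1( struct feedback_bytes * f )
{
   memcpy( f->b, MY88_VALUE, sizeof( MY88_VALUE ) );
   memset( f->b + sizeof( MY88_VALUE ), EBCDIC_SPACE,
           sizeof( f->b ) - sizeof( MY88_VALUE ) );
}

/* IF MY88-VAR-1 THEN: compare both the value and the trailing spaces */
static int test_my88_var_1( const struct feedback_bytes * f )
{
   unsigned char expected[ sizeof( f->b ) ];

   memcpy( expected, MY88_VALUE, sizeof( MY88_VALUE ) );
   memset( expected + sizeof( MY88_VALUE ), EBCDIC_SPACE,
           sizeof( expected ) - sizeof( MY88_VALUE ) );

   return memcmp( f->b, expected, sizeof( expected ) ) == 0;
}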

Incidentally, it can be hard to tell what the total storage requirement of a COBOL structure is just by looking at the code, because the mapping between digits and storage depends on the USAGE clauses.  If the structure also uses REDEFINES clauses (embedded unions), as was the case in the program that I was originally looking at, the debug output is also really nice for understanding how big the various fields are, and where they are situated.

Here are a few of the lessons learned:

  • You might see a check like ‘IF MY88-VAR-1 THEN’, but nothing in the program explicitly sets MY88-VAR-1. It is effectively a global variable value that is set as a side effect of some other call (in the real program I was looking at, what modified this “variable” was actually a call to CEEFMDA, an LE system service). We have pass-by-reference in the function calls, so it can be a reverse engineering task to read any program and figure out how any given field may have been modified, and that doesn’t get any easier when LEVEL-88 variables are introduced into the mix.
  • This effective enumeration mechanism is not typed in any sense. Correct use of a LEVEL-88 relies on the variables that follow it having the appropriate types. In this case, ‘IF MY88-VAR-1 THEN’ is essentially shorthand for a comparison of the storage at that location against the 88 value, as modeled by the f.u.e == MY88_VAR_1 checks above.
  • There is a disconnect between the variables modified or checked by a given LEVEL-88 variable reference that must be inferred by the context.
  • An IF check of a LEVEL-88 variable may include an implicit check that the trailing part of the structure contains EBCDIC spaces, if the fields that follow the 88 variable take more space than the value of the variable. Similarly, setting such a variable to TRUE may effectively memset the trailing part of the structure to EBCDIC spaces.
  • Exactly what is modified by a given 88 variable depends on the context.  For example, if the level 88 variables were found in a copybook, and if I had a second structure that had the same layout as FEEDBACK, with both structures including that copybook, then I’d have two instances of this “enumeration”, and would need a set of “OF foo OF BAR” type clauses to disambiguate things.  Level 88 variables aren’t like a set of C defines.  Their meaning is context dependent, even if they masquerade as constants.