## Debugging a C coding error from an XPLINK assembly listing.

There are at least two[1] z/OS C calling conventions: the traditional “LE” OSLINK calling convention, and the newer[2] XPLINK convention.  In the OSLINK calling convention, parameters aren’t passed in registers, but in an array pointed to by R1.  Here’s an example of an OSLINK call to strtof():

*  float strtof(const char *nptr, char **endptr);
LA       r0,ep(,r13,408)
LA       r2,buf(,r13,280)
LA       r4,#wtemp_1(,r13,416)
L        r15,=V(STRTOF)(,r3,4)
LA       r1,#MX_TEMP3(,r13,224)
ST       r4,#MX_TEMP3(,r13,224)
ST       r2,#MX_TEMP3(,r13,228)
ST       r0,#MX_TEMP3(,r13,232)
BASR     r14,r15
LD       f0,#wtemp_1(,r13,416)


R1 points to r13 + 224 (a location on the stack). If the original call was:

float f = strtof( mystring, &err );

The compiler has internally translated it into something of the form:

STRTOF( &f, mystring, &err );

where all of {&f, mystring, &err} are stuffed into the memory starting at the 224(R13) location. Afterwards the value has to be loaded from memory into a floating point register (F0) so that it can be used.  Compare this to a Linux strtof() call:

* char * e = 0;
* float x = strtof( "1.0", &e );
400b74:       mov    $0x400ef8,%edi       ; first param is address of "1.0"
400b79:       movq   $0x0,0x8(%rsp)       ; e = 0;
400b82:       lea    0x8(%rsp),%rsi       ; second param is &e
400b87:       callq  400810 <strtof@plt>  ; call the function, returning a value in %xmm0


Here the input parameters are RDI, RSI, and the output is XMM0. Nice and simple. Since XPLINK was designed for C code, we expect it to be more sensible. Let’s see what an XPLINK call looks like. Here’s a call to fmodf:

*      float r = fmodf( 10.0f, 3.0f );
LD       f0,+CONSTANT_AREA(,r9,184)
LD       f2,+CONSTANT_AREA(,r9,192)
L        r6,=A(__fmodf)(,r7,76)
L        r5,=A(__fmodf)(,r7,72)
BASR     r7,r6
NOP      9
LDR      f2,f0
STE      f2,r(,r4,2144)
*
*      printf( "fmodf: %g\n", (double)r );


There are some curious details that would have to be explored to understand the code above (why f0, f2, and not f0, f1?), however, the short story is that all the input and output values are in (floating point) registers.
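
The difference between the two conventions can be modelled in portable C. This is only a sketch, not what the z/OS compiler emits: `strtof_plist_model`, `call_via_plist` and `call_direct` are hypothetical names invented here. The first pair mimics the OSLINK shape, where the callee sees only the address of a parameter list (the role R1 plays) and the result comes back through memory. The last function mimics the register style, where arguments and result are passed directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OSLINK-style callee: its only input is the
 * address of the parameter list (the role R1 plays in the listing). */
static void strtof_plist_model(void **plist)
{
    assert(plist != NULL);
    float *result    = plist[0];  /* hidden first slot: where to store the result */
    const char *nptr = plist[1];
    char **endptr    = plist[2];

    *result = strtof(nptr, endptr);
}

/* Caller side: build the three-slot list in memory, "call", then reload
 * the result from memory, much as the listing reloads f0 after the BASR. */
float call_via_plist(const char *s, char **end)
{
    float f = 0.0f;
    void *plist[3] = { &f, (void *)s, end };

    strtof_plist_model(plist);
    return f;
}

/* Register-style call for comparison: arguments and result passed directly. */
float call_direct(const char *s, char **end)
{
    return strtof(s, end);
}
```

Both functions produce the same answer; the only difference is how much memory traffic is needed to get the values in and out.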

The mystery that led me to looking at this was a malfunctioning call to strtof:

*      float x = strtof( "1.0q", &e );
LA       r2,e(,r4,2144)
L        r6,=A(strtof)(,r7,4)
L        r5,=A(strtof)(,r7,0)
LA       r1,+CONSTANT_AREA(,r9,20)
BASR     r7,r6
NOP      17
LR       r0,r3
CEFR     f2,r0
STE      f2,x(,r4,2148)
*
*      printf( "strtof: v: %g\n", x );


The CEFR instruction converts an integer to a (hfp32) floating point representation, so we appear to have strtof returning its value in R3, which is an integer register. That value then gets copied into R0, and finally converted into F2 (and after that stored into a stack spill location before the printf call.)

I scratched my head about this code for quite a while, trying to figure out if the compiler had some mysterious way to make this work that I wasn’t seeing. Eventually, I clued in. I’m so used to using a C++ compiler that I forgot about the old style implicit int return for an unprototyped function. But I had included <stdlib.h> in this code, so shouldn’t strtof have been prototyped? It turns out that the Language Runtime reference specifies that on z/OS you need an additional define to make strtof visible:

#define _ISOC99_SOURCE
#include <stdlib.h>


Without the additional define, the call to strtof() is compiled as if the function had been declared as:

int strtof( const char *, char ** );


My expectation is that with such a correction, the call to strtof() should return its value in f0, just like fmodf() does. The result should also not be garbage!
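
A minimal sketch of the corrected source follows. The wrapper name `parse_float` is hypothetical and exists only for illustration. The important detail is that the feature test macro has to be defined before the first #include. On a Linux/glibc system the define is not required for strtof, but as far as I know it is accepted there too.

```c
#define _ISOC99_SOURCE   /* must come before the first #include */
#include <assert.h>
#include <stdlib.h>

/* With the prototype visible, the compiler knows strtof() returns a float,
 * so no integer-to-float conversion of a garbage register is generated. */
float parse_float(const char *text, char **end)
{
    assert(text != NULL);
    return strtof(text, end);
}
```

With GCC or clang, building with `-Werror=implicit-function-declaration` turns this class of mistake into a compile error. I haven't checked what the equivalent option is for the z/OS compiler.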

Footnotes:

1.  There is also a “metal” compiler and probably a different calling convention to go with that.  I don’t know how metal differs from XPLINK.
2. Newer in the lifetime of the mainframe means circa 2001, which is bleeding edge given the speed that mainframe development moves.

## Interesting z/OS (clang based) compiler release notes.

December 13, 2019, C/C++ development and debugging.

The release notes for the latest z/OS C/C++ compiler are interesting.  When I was at IBM they were working on “clangtana”, a clang frontend melded with the legacy TOBY backend.  This really surprised me, but was consistent with the fact that the IBM compiler guys kept saying that they were continually losing their internal funding, so that project was a clever way to do more with fewer resources.  I think they’d made the clangtana switch for zLinux by the time I left, with AIX to follow once they had resolved some ABI incompatibility issues.  At the time, I didn’t know (nor care) about the status of that project on z/OS.

Well, years later, it looks like they’ve now switched to a clang based compiler frontend on z/OS too.  This major change appears to have a number of side effects that I can imagine will be undesirable to existing mainframe customers:

• Compiler now requires POSIX(ON) and Unix System Services.  No more compilation using JCL.
• Compiler support for 31-bit applications appears to be dropped (64-bit only!)
• Only ibm-1047 is supported for both source and runtime character set encoding.
• C89 support appears to have been dropped.
• Hex floating support has been dropped.
• No decimal floating point support.
• SIMD support isn’t implemented.
• Metal C support has been dropped.

In other words, if you want C++14, you have to be willing to give up a lot to get it.  They must be using an older clang, because this “new” compiler doesn’t include C++17 support.  I’m surprised that they didn’t even manage multiple character set support for this first compiler release.

It is interesting that they’ve also dropped IPA and PDF support, and that the optimization options have changed.  Does that mean that they’ve actually not only dropped the old Montana frontend, but also gutted the whole backend, switching to clang exclusively?

## COBOL code! Where’s the eyewash station?

In code that I am writing for work, I’m calling into COBOL code from C, and in order to set up the parameters and interpret the results, I have to know a little bit about how variables are declared in COBOL. I got an explanation of a little bit of COBOL syntax today that takes some of the mystery away.

Here’s the equivalent of something like a declaration of compile time constant variables in COBOL, a hierarchical beast something akin to a structure:

004500 01  CONSTANT-VALUES.                                             ORIG_SRC
004600     02  AN-CONSTANT PIC X(5) VALUE "IC104".                      ORIG_SRC
004700     02  NUM-CONSTANT PIC 99V9999 VALUE 0.7654.                   ORIG_SRC


This is roughly the equivalent of the following pseudo-c++11:

struct
{
    char AN_CONSTANT[5]{'I','C','1','0','4'};
    struct {
        char digits1[2]{'0', '0'};
        // the V marks an implied decimal point here; it occupies no storage
        char digits2[4]{'7', '6', '5', '4'};
    } NUM_CONSTANT;
} ;


Some points:

• The first 6 characters are source sequence numbers.  They aren’t line numbers like in BASIC (i.e. you wouldn’t do a ‘goto 004500’), but were related to punch cards to make sure that out of sequence cards weren’t inserted into the card reader, or a card wasn’t fed into the reader by the operator by accident.
• The ‘ORIG_SRC’ stuff in column 73+ is ignored.  These columns are also related to punch cards, as an additional card sequence number could be encoded in those locations.
• The 01 indicates the first level of the ‘structure’.
• The 02 means a second level.  I don’t know if the indenting of the 01, 02 is significant, but I suspect not.
• PIC or PICTURE basically means the structure line is a variable and not the name of a new level.
• A sequence of 9’s means that the variable takes numeric digits in those locations, whereas the V marks the position of an implied decimal point (it does not occupy a character in storage).
• A sequence of X’s (or the X(5) here that means XXXXX), means that those characters can be alphanumeric.
• There is no reference to ‘CONSTANT-VALUES’ when the variables are referenced.  That is like a namespace of some sort.
• The level indicators 01, 02 are arbitrary, but have to be in the range 01 to 49 (66, 77 and 88 are reserved for special purposes).  For example 05, 10 could have been used, so that another level could have been inserted in between without renumbering things.
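
To make the PICTURE rules above concrete, here is a rough C sketch of how a caller might interpret the bytes of NUM-CONSTANT. It assumes that the field is stored as six EBCDIC digit characters (0xF0 to 0xF9) and that the V takes no storage. The function names `decode_display_digits` and `apply_implied_point` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Decode an unsigned DISPLAY field of EBCDIC digits into a scaled integer.
 * Returns -1 if any byte is not an EBCDIC digit (0xF0..0xF9). */
long decode_display_digits(const unsigned char *field, size_t ndigits)
{
    assert(field != NULL);
    long value = 0;

    for (size_t i = 0; i < ndigits; i++) {
        if (field[i] < 0xF0 || field[i] > 0xF9)
            return -1;
        value = value * 10 + (field[i] - 0xF0);
    }
    return value;
}

/* Apply the implied decimal point: scale is the number of 9s after the V. */
double apply_implied_point(long scaled, unsigned scale)
{
    double v = (double)scaled;

    while (scale-- > 0)
        v /= 10.0;
    return v;
}
```

For PIC 99V9999 the call would be `apply_implied_point(decode_display_digits(bytes, 6), 4)`.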

The 01, 02 level indicators are also used for global variable declarations, also somewhat struct like:

004900 01  GRP-01.                                                      ORIG_SRC
005000     02  AN-FIELD PICTURE X(5).                                   ORIG_SRC
005100     02  NUM-DISPLAY PIC 99.                                      ORIG_SRC
005200     02  GRP-LEVEL.                                               ORIG_SRC
005300         03  A-FIELD PICTURE A(3).                                ORIG_SRC


This might be considered equivalent to:

struct
{
char AN_FIELD[5];
char NUM_DISPLAY[2];
struct {
char A_FIELD[3];
} GRP_LEVEL;
} GRP_01;


Here:

• A(3), equivalent to AAA, means the field can hold alphabetic characters (letters or spaces).
• The name ‘GRP-LEVEL’ header for the 03 structure level is not referenced in the code.
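
Since the C side has to match the COBOL layout byte for byte, it may be worth letting the compiler check the sizes. The following is a sketch using C11 `_Static_assert`. The lower-case struct and member names, and the helper `set_an_field`, are my own and hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C view of GRP-01: every member is a char array, so no padding is expected. */
struct grp_01 {
    char an_field[5];
    char num_display[2];
    struct {
        char a_field[3];
    } grp_level;
};

/* Fail the build if the layout drifts from the COBOL declaration. */
_Static_assert(sizeof(struct grp_01) == 10, "GRP-01 should be 10 bytes");
_Static_assert(offsetof(struct grp_01, num_display) == 5, "NUM-DISPLAY offset");
_Static_assert(offsetof(struct grp_01, grp_level) == 7, "GRP-LEVEL offset");

/* Fill AN-FIELD from exactly five characters; COBOL fields are fixed
 * width and carry no NUL terminator, hence memcpy rather than strcpy. */
void set_an_field(struct grp_01 *g, const char *five_chars)
{
    assert(g != NULL && five_chars != NULL);
    memcpy(g->an_field, five_chars, sizeof g->an_field);
}
```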

It is also possible to declare a variable as binary, like so:

005400 77  ELEM-01 PIC  V9(4) COMPUTATIONAL.                            ORIG_SRC

• Here 77 is a special magic level number, that really means what follows is a variable and not a “structure”.
• The V here means an implied decimal place in the interpretation of the value.
• The 9(4), equivalent to 9999, means the variable must be able to hold 4 numeric digits.
• COMPUTATIONAL means the value is stored in binary rather than as characters.  The underlying variable must be able to hold a value as big as 9999, i.e. a short or unsigned short must be used, and not a char or unsigned char.

The final variable group in the code I was looking at was:

005500 01  GRP-02.                                                      ORIG_SRC
005600     02  GRP-03.                                                  ORIG_SRC
005700         03  NUM-ITEM PICTURE S99.                                ORIG_SRC
005800         03  EDITED-FIELD  PIC XXBX0X.                            ORIG_SRC


which is roughly equivalent to:

struct
{
struct {
char NUM_ITEM[2];
struct
{
char digits1[2];
char blank1[1]{' '};
char digits2[1];
char zero1[1]{'0'};
char digits3[1];
} EDITED_FIELD;
} GRP_03;
} GRP_02;


Here

• EDITED-FIELD includes fixed blank and zero markers (B, 0 respectively).  When a four character variable is copied into this field, only the X positions receive characters from the source; the blank and zero positions keep their fixed values.
• NUM-ITEM is a signed numeric value.  Its representation is strange:

The signed representation is also char based, and uses what is referred to as an “over-punch” to encode the sign.  The normal (EBCDIC) encoding of a two digit variable 42 without a sign, would be:

‘4’, ‘2’ == 0xF4, 0xF2

When the S modifier is used in the PICTURE declaration, the F zone (high nibble) of the last digit in the EBCDIC encoding is changed to either C or D, for positive and negative values respectively.  That means a positive ‘4’, ‘2’ is encoded as:

0xF4, 0xC2

whereas the signed value “-42” is encoded as:

0xF4, 0xD2
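
The two GRP-03 encodings described above can be sketched in C as follows. This is an illustration under my own assumptions, not a full implementation of the COBOL rules. `decode_signed_display` only looks at the zone of the last byte and does not validate the digits. `move_to_edited` handles only the X, B and 0 picture characters and works in the C compiler’s native character set rather than EBCDIC. Both names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a signed DISPLAY field (PIC S9...) from EBCDIC bytes.  The sign
 * is over-punched into the high nibble of the last byte: 0xD is negative,
 * 0xC (or an unsigned 0xF) is treated as positive. */
int decode_signed_display(const unsigned char *field, size_t ndigits)
{
    assert(field != NULL && ndigits > 0);
    int value = 0;

    for (size_t i = 0; i < ndigits; i++)
        value = value * 10 + (field[i] & 0x0F);

    return ((field[ndigits - 1] & 0xF0) == 0xD0) ? -value : value;
}

/* Fill an edited field from its picture string: each X consumes the next
 * source character, B inserts a blank, 0 inserts a zero.  The output
 * buffer needs room for strlen(picture) + 1 bytes. */
void move_to_edited(char *out, const char *picture, const char *src)
{
    assert(out != NULL && picture != NULL && src != NULL);
    size_t i;

    for (i = 0; picture[i] != '\0'; i++) {
        switch (picture[i]) {
        case 'B': out[i] = ' '; break;
        case '0': out[i] = '0'; break;
        default:  /* X: take the next source character, blank pad if exhausted */
            out[i] = (*src != '\0') ? *src++ : ' ';
            break;
        }
    }
    out[i] = '\0';
}
```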

The procedure prototype, that is, the list of parameters to the function, is given in a ‘PROCEDURE DIVISION’ block, like so:

005900 PROCEDURE DIVISION USING GRP-01 ELEM-01 GRP-02.


Here

• The first 6 characters are still just punch card junk.
• Three variables are passed to and from the function: GRP-01, ELEM-01, GRP-02.  These are 10, 2, and 8 bytes respectively (ELEM-01 being a binary halfword).
• On the mainframe the COBOL function could be called with R1 something like:

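#include <stdint.h>
#include <string.h>

struct parms {
    void * pointers[3];
    char ten[10];
    uint16_t h;
    char eight[8];
};

//...
struct parms p;

p.pointers[0] = &p.ten[0];
p.pointers[1] = &p.h;
// flag the last pointer with the high (end-of-list) bit
p.pointers[2] = (void *)((uintptr_t)&p.eight[0] | 0x80000000u);

// the fields are fixed width, with no trailing NUL
strncpy( p.ten, "XXXXX00ZZZ", 10 );
p.h = 0;
strncpy( p.eight, "99XXBX0X", 8 );

setregister( R1, &p ); // pseudocode: point R1 at the pointer list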


The 0x80000000 is the mainframe “31-bit” way of indicating the end of list. It relies on the fact that virtual memory addresses in 32-bit z/OS processes have only 31-bits of addressable space available, so you can hack one extra bit into a pointer to indicate end of list of pointers.
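
The end-of-list trick can be modelled portably by treating each slot as a 32-bit integer rather than a real pointer. In this sketch the names `PLIST_END_BIT`, `build_plist`, `count_plist_entries` and `plist_address` are all hypothetical, and the addresses are just numbers, not live pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLIST_END_BIT 0x80000000u  /* high bit of a 32-bit slot marks the last entry */

/* Caller side: store 31-bit addresses into the list and flag the final slot. */
void build_plist(uint32_t *plist, const uint32_t *addrs, size_t n)
{
    assert(plist != NULL && addrs != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        plist[i] = addrs[i] & ~PLIST_END_BIT;
    plist[n - 1] |= PLIST_END_BIT;
}

/* Callee side: count the entries by walking until the flagged slot. */
size_t count_plist_entries(const uint32_t *plist)
{
    assert(plist != NULL);
    size_t n = 0;

    while ((plist[n] & PLIST_END_BIT) == 0)
        n++;
    return n + 1;
}

/* Recover the usable 31-bit address from a slot by clearing the marker. */
uint32_t plist_address(uint32_t slot)
{
    return slot & ~PLIST_END_BIT;
}
```

The callee never needs to be told how many parameters were passed, which is presumably why the convention exists.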

Suppose the program has statements like the following to populate its output fields:

006400 MOVE AN-CONSTANT TO AN-FIELD.
006500 ADD 25 TO NUM-DISPLAY.
006600 MOVE "YES" TO A-FIELD.
006700 MOVE NUM-CONSTANT TO ELEM-01.
006800 MOVE NUM-DISPLAY TO NUM-ITEM.
006900 MOVE "ABCD" TO EDITED-FIELD.


The results of this are roughly:

strncpy( p.ten, "IC104", 5 ); // MOVE AN-CONSTANT TO AN-FIELD (GRP-01)
strncpy( p.ten + 5, "25", 2 ); // ADD 25 TO NUM-DISPLAY (GRP-01): since the initial value was "00"
strncpy( p.ten + 7, "YES", 3 ); // MOVE "YES" TO A-FIELD.
p.h = 7654; // MOVE NUM-CONSTANT TO ELEM-01.
strncpy( p.eight, "25", 2 ); // MOVE NUM-DISPLAY TO NUM-ITEM (ignoring the sign over-punch on the last digit).
strncpy( p.eight + 2, "AB C0D", 6 ); // MOVE "ABCD" TO EDITED-FIELD.


It appears that the assignment of NUM-CONSTANT, a number of the form 99.9999, to the numeric value ELEM-01, which is of the form .9999, just truncates any whole portion of the number.
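
If that reading is right, the MOVE behaves like taking the scaled value modulo 10000. Here is a tiny C model of it, assuming that the source is represented as a six digit scaled integer (the value times 10000) and that the truncation really does just drop the high-order digits. The function name is hypothetical.

```c
#include <assert.h>

/* Model of MOVE NUM-CONSTANT (PIC 99V9999) TO ELEM-01 (PIC V9(4) COMP).
 * The receiving field has no integer digits, so only the low four
 * (fractional) digits of the scaled source survive. */
unsigned short move_99v9999_to_v9_4(unsigned long scaled_source)
{
    assert(scaled_source <= 999999UL);
    return (unsigned short)(scaled_source % 10000UL);
}
```

For the constant in this program, 0.7654 is 007654 when scaled, so the result is 7654 and nothing is lost. A source of 12.7654 would also end up as 7654.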