I found myself faced with the task of understanding the effects of a PL/I WRITE loop that does the initial sequential load of a VSAM DATASET.  The block of code I was looking at had the following declarations:

     dcl IXUPD FILE UNBUFFERED KEYED env(vsam);
     dcl
        01 recArea,
            03 recPrefix,
                05 recID        PIC'(4)9' init (0),
                05 recKeyC      CHAR (4)  init (' '),
            03 recordData       CHAR (70) init (' ');

     dcl recIndx FIXED BIN(31) INITIAL(0);

     dcl keyListSize fixed bin(31) initial(10);
     dcl keyList(10) char(8);

To a C++ programmer, there are a few oddities here:

  • Options for the FILE are specified at the file declaration point (or can be), not at the OPEN point.  They can also be specified at the OPEN point.  The designers of PL/I seem to have been guided by the general principle of “why have one way of doing something, when it can be done in an infinite variety of possible ways”.
  • There is a hybrid “structure & variable” declaration above.  recArea is like an object of an unnamed structure, containing nested parts (with lots of ugly COBOL-like level numbers to show the depth of the various “structure” members).  It’s something like the following struct declaration (with C++11 default member initializers):

    #include <stdio.h>
    
    int main() {
        struct {
            struct {
                char recID[4]{'0', '0', '0', '0'};
                char recKeyC[4]{' ', ' ', ' ', ' '};
            } recPrefix;
            char recordData[70]{ ' ', ' ', /* ... 70 spaces total */ };
        } recArea;
    
        printf( "recID: %.4s\n", recArea.recPrefix.recID );
        printf( "recKeyC: '%.4s'\n", recArea.recPrefix.recKeyC );
    
        return 0;
    }
    

    To PL/I’s credit, it took C++ until roughly 45 years after the creation of PL/I to add a simple way of encoding default structure member initializers.

    We’ll see below that PL/I lets you access the inner members without any qualification if desired (i.e. recID == recArea.recPrefix.recID). The PL/I compiler writer is basically faced with the unenviable task of continually trying to guess what the programmer could have possibly meant.

  • The int32_t types have the annoying “mainframe”ism of being referred to as 31-bit integers (FIXED BIN(31)). Even if the high bit in pointers is ignored by the hardware (allowing the programmer to set 0x80000000 as a flag, for example to mark the end of a list of pointers), that doesn’t mean that the registers aren’t fully 32-bit, nor does it mean that a 32-bit integer isn’t representable. I can’t for the life of me understand why a 32-bit integer variable should be declared as FIXED BINARY(31), unless the 31 is simply counting the value bits and leaving out the sign bit.
  • The recID variable is declared with a PICTURE specification, as we also saw in COBOL code. PIC '9999' (or PIC'(4)9', for “short”) means that the character array will have four (EBCDIC) digits in it. I don’t quite understand this specification in this case, since the code (to follow) seems to put 'RNNN' (where N is a digit) in this field.

Here’s how the declarations above are used:

    
     keyList(1) = 'R001';
     keyList(2) = 'R002';
...
     OPEN FILE(IXUPD) OUTPUT;

     put skip list ('====== Write record to file by key.');
     do while (recIndx < keyListSize);
        recIndx = recIndx + 1;
        recID = recIndx;
        recKeyC = 'Abcd';
        recordData = 'Data for ' || keyList(recIndx);
        write FILE(IXUPD) FROM(recArea) KEYFROM(keyList(recIndx));
     end;
     put skip list (recIndx, ' records is written to file by key.');

     CLOSE FILE(IXUPD);

My guess about what this ‘WRITE FROM(recArea)’ would do is to create records of the form:

0001AbcdData for R001
0002AbcdData for R002
0003AbcdData for R003
...

However, the VSAM DATASET (which was created with key offset 0, and key size 8), actually ends up with:

R001    Data for R001
R002    Data for R002
R003    Data for R003
...

Despite the fact that we are writing from recArea, which includes the recID and recKeyC fields (numeric and character respectively), only the non-key portion of the WRITE “data payload” ends up hitting the disk; the first eight bytes of each record hold the KEYFROM value instead.

If that is the case, where do the spaces in the key-portion of the records come from? Again, the C programmer in me is interfering with my understanding. I look at:

dcl keyList(10) char(8);
keyList(1) = 'R001';

and think that keyList(1) = "R001\x00\x00\x00\x00", but it must actually be space filled in PL/I! This seems to be confirmed empirically, based on the expected results for the test, but I can also see it in the debugger after manually relocating the 32-bit mainframe address:

(gdb) p keyLen
$1 = 8
(gdb) p /x aKey + 0x7ffbc4000000
$2 = 0x7ffbc5005740
(gdb) set target-charset EBCDIC-US
(gdb) p (char *)$2
$3 = 0x7ffbc5005740 "R001    R002    R003    R004    R005    R006    R007    R008    R009    R010    "

The final form of the records in the VSAM DATASET (mainframe for a file) is now fully understood. Note that the data disagrees with the PICTURE specification for the recID field in the recArea declaration, but that’s okay, at least for this part of the program, since there is never any store to that field that is non-numeric. Would anything even have to have been written to recID or recKeyC? I suspect not. Once we have R00N in that part of the record, what happens if we read it into recArea with the numeric-only PICTURE specification? Does that raise a PL/I condition?

ps. Notice how the payload for the keyList array entries is nicely packed into memory. This is done in a very un-C-like fashion, with no requirement for an array of pointers and the corresponding cache misses those pointers cause when accessing a big multilevel C array.