COBOL has an enumeration mechanism called a LEVEL-88 variable. I found a few aspects of this counter-intuitive, as I’ve mentioned in a few previous posts. With the help of the debugger, I now think I finally understand what’s going on. Here’s an example program that uses a couple of LEVEL-88 variables:
We can use a debugger to discover the meaning of the COBOL code. Let’s start by single stepping past the first MOVE statement to just before the SET MY88-VAR-1:
Here, I’m running the program LZ000550 from a PDS named COBRC.NATIVE.LZ000550. We expect EBCDIC spaces (‘\x40’) in the FEEDBACK structure at this point and that’s what we see:
Line stepping past the SET statement, our structure memory layout now looks like:
Setting the LEVEL-88 variable to TRUE overwrites the eight bytes of memory starting at the address of FEEDBACK with the numeric value 0x0000000141C33A3BL. If we continue the program, the SYSOUT ends up looking like:
The first ‘IF MY88-VAR-1 THEN’ fires, and after the subsequent ‘SET MY88-VAR-2 TO TRUE’, the second ‘IF MY88-VAR-2 THEN’ fires. The SET overwrites the structure memory at the point where the 88 was declared, and an IF check of that same variable name just tests whether the memory at that location has the pattern specified in the 88 variable. It does not matter what the specific layout of the structure is at that point. In this case, we have only one level-88 variable with the given name in the program, so the ‘OF feedback’ qualifier in ‘IF MY88-VAR-2 OF feedback’ was redundant. The check could have been coded as just ‘IF MY88-VAR-2’, or as ‘IF MY88-VAR-2 OF CONDITION-TOKEN-VALUE OF feedback’.
We can infer that the COBOL code’s WORKING-STORAGE has the following equivalent C++ layout:
    struct CONDITION_TOKEN_VALUE {
        short SEVERITY;
        short MSG_NO;
        char CASE_SEV_CTL;
        char FACILITY_ID[3];
    };

    enum my88_vars {
        MY88_VAR_1 = 0x0000000141C33A3BL,
        MY88_VAR_2 = 0x0000000241C33A3BL
    };

    struct feedback {
        union {
            CONDITION_TOKEN_VALUE c;
            my88_vars e;
        } u;
        int I_S_INFO;
    };
and that the control flow of the program can be modeled as the following:
    //...
    feedback f;

    int main() {
        memset( &f, 0x40, sizeof(f) );

        f.u.e = MY88_VAR_1;
        if ( f.u.e == MY88_VAR_2 ) {
            impossible();
        }
        if ( f.u.e == MY88_VAR_1 ) {
            expected();
        }

        f.u.e = MY88_VAR_2;
        if ( f.u.e == MY88_VAR_1 ) {
            impossible();
        }
        if ( f.u.e == MY88_VAR_2 ) {
            expected();
        }

        return 0;
    }
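The union model above only reproduces the mainframe byte layout on a big-endian host, and reading a union member other than the one last written is formally undefined in C++. A more portable sketch, assuming the same 8-byte layout, treats the storage as raw bytes: SET becomes a copy and IF becomes a byte comparison. The names `Token`, `set88`, `is88`, `msgNo` and `checkSetAndTest` are made up for this illustration, and the MSG-NO offset is inferred from the struct above rather than taken from the COBOL source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The 8 bytes occupied by CONDITION-TOKEN-VALUE, modeled as raw storage.
using Token = std::array<std::uint8_t, 8>;

// Byte patterns as they would sit in memory on the big-endian mainframe.
constexpr Token MY88_VAR_1 = {0x00, 0x00, 0x00, 0x01, 0x41, 0xC3, 0x3A, 0x3B};
constexpr Token MY88_VAR_2 = {0x00, 0x00, 0x00, 0x02, 0x41, 0xC3, 0x3A, 0x3B};

// Model of 'SET <88-name> TO TRUE': overwrite the storage with the pattern.
inline void set88(Token& storage, const Token& pattern) {
    storage = pattern;
}

// Model of 'IF <88-name>': compare the storage against the pattern.
inline bool is88(const Token& storage, const Token& pattern) {
    return storage == pattern;
}

// Decode MSG-NO (assumed to be bytes 2..3, big-endian) to show that the
// underlying fields change as a side effect of the SET.
inline unsigned msgNo(const Token& storage) {
    return (static_cast<unsigned>(storage[2]) << 8) | storage[3];
}

// Mirrors the control flow of the model program above.
inline void checkSetAndTest() {
    Token t;
    t.fill(0x40);                 // MOVE SPACES: EBCDIC blanks everywhere
    set88(t, MY88_VAR_1);
    assert(is88(t, MY88_VAR_1) && !is88(t, MY88_VAR_2));
    set88(t, MY88_VAR_2);
    assert(is88(t, MY88_VAR_2) && !is88(t, MY88_VAR_1));
}
```

Because the patterns are spelled out byte by byte, this version behaves the same on little-endian hardware, at the cost of no longer looking like an enumeration.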
Things get even more confusing if the LEVEL-88 variable specifies fewer bytes than the structure that it is embedded in. In that case, a SET of the variable pads out the structure with spaces, and a check of the variable also looks for those additional trailing spaces:
The CONDITION-TOKEN-VALUE object uses a total of 8 bytes. We can see the spaces in the display of the FEEDBACK structure if we look in the debugger:
See the four trailing 0x40 spaces here.
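The padding behaviour can be modeled the same way. This is only a sketch, assuming an 8-byte structure and a 4-byte value as described above; the names `set88Padded`, `is88Padded`, `checkPadding` and the byte pattern used in it are hypothetical and not taken from the program.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint8_t EBCDIC_SPACE = 0x40;

using Token = std::array<std::uint8_t, 8>;       // CONDITION-TOKEN-VALUE storage
using ShortValue = std::array<std::uint8_t, 4>;  // assumed 4-byte 88 value

// Model of SET: copy the value, then blank-fill the rest of the structure.
inline void set88Padded(Token& storage, const ShortValue& value) {
    auto rest = std::copy(value.begin(), value.end(), storage.begin());
    std::fill(rest, storage.end(), EBCDIC_SPACE);
}

// Model of IF: the value must match AND the remaining bytes must be spaces.
inline bool is88Padded(const Token& storage, const ShortValue& value) {
    return std::equal(value.begin(), value.end(), storage.begin()) &&
           std::all_of(storage.begin() + value.size(), storage.end(),
                       [](std::uint8_t b) { return b == EBCDIC_SPACE; });
}

inline void checkPadding() {
    const ShortValue v = {0x00, 0x00, 0x00, 0x01};  // made-up pattern
    Token t{};                                      // starts as all zeros
    set88Padded(t, v);
    assert(t[4] == EBCDIC_SPACE && t[7] == EBCDIC_SPACE);
    assert(is88Padded(t, v));
    t[7] = 0x00;                  // disturb one trailing byte
    assert(!is88Padded(t, v));    // the check now fails
}
```

The last two lines of `checkPadding` show the surprising part: changing a byte that lies beyond the 88 value is enough to make the condition false.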
Incidentally, it can be hard to tell what the total storage requirements of a COBOL structure are just by looking at the code, because the mapping between digits and storage depends on the usage clauses. If the structure also uses REDEFINES clauses (embedded unions), as was the case in the program that I was originally looking at, the debug output is also a really nice way to understand how big the various fields are, and where they are situated.
Here are a few of the lessons learned:
- You might see a check like ‘IF MY88-VAR-1 THEN’, but nothing in the program explicitly sets MY88-VAR-1. It is effectively a global variable value that is set as a side effect of some other call (in the real program I was looking at, what modified this “variable” was actually a call to CEEFMDA, an LE system service). Because function calls pass by reference, it can be a reverse engineering task to read any program and figure out how any given field may have been modified, and introducing LEVEL-88 variables into the mix doesn’t make that any easier.
- This effective enumeration mechanism is not typed in any sense. Correct use of a LEVEL-88 relies on the variables that follow it having the appropriate types. In this case, the ‘IF MY88-VAR-1 THEN’ is essentially shorthand for:
- A LEVEL-88 variable reference does not name the variables that it modifies or checks. Those must be inferred from the context.
- An IF check of a LEVEL-88 variable may include an implicit check of the trailing part of the structure for EBCDIC spaces, if the fields that follow the 88 variable take more space than the value of the variable. Similarly, setting such a variable to TRUE may effectively memset a trailing subset of the structure to EBCDIC spaces.
- Exactly what is modified by a given 88 variable depends on the context. For example, if the level 88 variables were found in a copybook, and if I had a second structure that had the same layout as FEEDBACK, with both structures including that copybook, then I’d have two instances of this “enumeration”, and would need a set of “OF foo OF BAR” type clauses to disambiguate things. Level 88 variables aren’t like a set of C defines. Their meaning is context dependent, even if they masquerade as constants.
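The context dependence in the last point can be sketched in C++ by making the 88 “constant” a pair of member functions on whatever storage includes it. This is only an illustration: `TokenWithConditions` stands in for a hypothetical copybook, and `Feedback` and `OtherFeedback` for two structures that include it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Token = std::array<std::uint8_t, 8>;

constexpr Token VAR_1_PATTERN = {0x00, 0x00, 0x00, 0x01, 0x41, 0xC3, 0x3A, 0x3B};

// Hypothetical stand-in for a copybook that declares the storage and its 88.
struct TokenWithConditions {
    Token bytes;

    // 'SET MY88-VAR-1 OF <owner> TO TRUE'
    void setVar1() { bytes = VAR_1_PATTERN; }

    // 'IF MY88-VAR-1 OF <owner>'
    bool isVar1() const { return bytes == VAR_1_PATTERN; }
};

// Two structures with the same layout, each including the "copybook".
struct Feedback      { TokenWithConditions token; int I_S_INFO; };
struct OtherFeedback { TokenWithConditions token; int I_S_INFO; };

inline void checkTwoInstances() {
    Feedback a{};
    OtherFeedback b{};
    a.token.setVar1();            // only a's storage is modified
    assert(a.token.isVar1());
    assert(!b.token.isVar1());    // same 88 name, different storage
}
```

Here `a.token.isVar1()` and `b.token.isVar1()` play the role of the ‘OF’ qualified references: the same name refers to different memory depending on which structure it is reached through.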