COBOL

An absurd COBOL library: 2D Euclidean GA

December 31, 2023 COBOL, math and physics play , , , ,

I’ve achieved a new pinnacle of obscurity, and have now written a rudimentary COBOL implementation of a geometric algebra library for \( \mathbb{R}^2 \) calculations.

Who will use this?  Absolutely nobody.  Effectively, nobody knows geometric algebra.  Nobody wants to know COBOL, but some do.  The union of those two groups is vanishingly small (probably one: argued below.)

I understand that some Opus Dei members have taught themselves COBOL, as looking at COBOL has been found to be equally painful as a course of self flagellation.

Figure 0. A flagellation representation of COBOL.

Assuming that no Opus Dei practitioners know geometric algebra, that means that there is exactly one person in the world that both knows COBOL and geometric algebra.  Me.

Why did I write this little library?  Well, I was tickled to write something so completely stupid, and I’ve been laughing at the absurdity of it. I also thought I might learn a few things about COBOL in the process of trying to use it for something slightly non-trivial.  I’m adept at writing simple test programs that exercise various obscure compiler features, but those are usually fairly small.  On the flip side of complexity, I have to debug through a number of horribly complicated customer programs as part of my compiler validation work.  A simple real life test scenario might run 100+ COBOL programs in a set of CICS transactions, executing thousands of EXEC DLI and EXEC CICS statements as well as all of the rest of the COBOL language statements!  Despite having gained familiarity with COBOL from that sort of observational use, walking through stuff in the debugger doesn’t provide the same level of comfort with the language as writing code from scratch.  Since I have no interest in simulating a boring business application, why not do something just for fun as a learning game.

The compiler I am using does not seem to support object-COBOL (which would have been nicely suited for this project), so I’ve written my little toy in conventional COBOL, using one external procedure for each type of mathematical operation.  In the huge set of customer COBOL code that I’ve examined and done test compilations of, none of it has used object-COBOL.  I am guessing that the object-COBOL community is as large as the user base for my little toy COBOL geometric algebra library will ever be.

I’ve implemented methods to construct multivectors with scalar, vector and pseudoscalar components, or a general multivector with all of the above.  I’ve also implemented multiply, add, subtract, scalar multiplication, grade selection, and a DISPLAY function to write a multivector to SYSOUT (stdout equivalent.)

The multivector “type”

Figure 1 shows the implementation of my multivector type, implemented in copybook (include file) named MVI.  I have an alternate MV copybook that doesn’t have the VALUE (initialization) clauses, as you don’t want initialization for LINKAGE-SECTION values (i.e.: program parameters.)

Figure 1. Copybook with multivector declaration and initialization.

If you are wondering what the hell a ‘PIC S9(9) USAGE IS COMP-5’ is, well, that’s the “easy to remember” way to declare a 32-bit signed integer in COBOL.  A COMP-2, on the other hand, is a floating point value.

Figure 2 shows an example of the use of this copybook:

Figure 2. Using the multivector copybook.

Figure 3 shows these two copybook declarations after preprocessor expansion

Figure 3. Multivector global variable examples after preprocessing.

The global variable declarations above are roughly equivalent to the following pseudo C++ code (pretending that we can have anonymous unions that match the COBOL declarations above):

#include <complex>

using complex = std::complex<double>;

struct ga20{
   int grade{};
   union {
      struct { double sc{}; double ps{}; };
      complex g02{};
   };
   union { 
      struct { double x{}; double y{}; };
      complex g1{};
   };
};

ga20 a;
ga20 b;

COBOL is inherently untyped, but requires matching types for CALL parameters, or else all hell ensues, so you have to rely on naming conventions and other mechanisms to enforce the required type equivalences.  In this toy GA library, I’ve used copybooks to enforce the types required for everything.  Global variable declarations like these A-MV and B-MV variables are declared only using a copybook that knows the representation required, and all the uses in sub-programs of the effective -MV “type” use a matching copybook for their declarations.  However, I’ve also made use of the lack of typing to treat A-G02, B-G02, A-G1, and B-G1 as if they were complex numbers, and pass those “variables” off to complex number sub-programs, knowing that I’ve constructed the parameters to those programs in a way that is bit compatible with the MV field values.  You can screw things up really nicely doing stuff like this, especially because all COBOL sub-program parameters are (generally) passed by reference.  If you don’t match up the types right “fun ensues.”

Also observe that the nested level specifiers are optional in COBOL.  For nested fields in C++, we might write a.g1.x.  With a nested variable like this in COBOL, we could write something equivalent to that, like:

A-X OF A-G1 OF A-MV

but we can leave out any of the intermediate “level” specifications if we want.  This gets really confusing in complicated real-life COBOL code.  If you are looking to see where something is modified, you have to not only look for the variable of interest, but also any of the higher level fields, since any of those could have been passed off to other code, which implicitly wrote the value you are interested in.

Here’s what one of these multivectors looks like in memory on my (Linux x86-64) system

(lldb) c
Process 3903259 resuming
Process 3903259 stopped
* thread #10, name = 'GA20', stop reason = breakpoint 7.1
    frame #0: 0x00007fffd9189a02 PJOOT.GA20V01.LOADLIB(MULT).ec73dc4b`MULT at MULT.cob:50:1
   47              CALL GA-MKVECTOR-MODIFY USING C-MV, A-X, A-Y
   48              CALL GA-MKPSEUDO-MODIFY USING D-MV, A-PS
   49  
-> 50              MOVE 'A' TO WS-DISPPARM-N
   51              CALL GA-DISPLAY USING
   52                WS-DISPPARM-N,
   53                A-MV
(lldb) p A-MV
(A-MV) A-MV = {
  A-GRADE = -1
  A-G02 = (A-SC = 1, A-PS = 4)
  A-G1 = (A-X = 2, A-Y = 3)
}

i.e.: this has the value \( 1 + 2 \mathbf{e}_{12} + 3 \mathbf{e}_1 + 4 \mathbf{e}_1 \).

Looking at the multivector in it’s hex representation:

(lldb) fr v -format x A-MV
(A-MV) A-MV = {
  A-GRADE = 0xffffffff
  A-G02 = {
    A-SC = 0x3ff0000000000000
    A-PS = 0x4010000000000000
  }
  A-G1 = {
    A-X = 0x4000000000000000
    A-Y = 0x4008000000000000
  }
}

we see that the debugger is showing an underlying IEEE floating point representation for the COMP-2 variables in the program as it was compiled.

I have a multivector print routine that prints multivectors to SYSOUT:

Figure 4. Calling the multivector DISPLAY function.

where WS-DISPPARM-N is a PIC X(20).  (i.e.: a fixed size character array.)  Output for the A-MV value showing in the debug session above looks like:

A                     ( .10000000000000000E 01)                                                                         
                    + ( .20000000000000000E 01) e_1 + ( .30000000000000000E 01) e_2                                     
                    + ( .40000000000000000E 01) e_{12}            

End of sentence required for nested IFs?

I encountered a curious language issue in my multivector multiply function.  Here’s an example of how I’ve been coding IF statements

Figure 5. An IF END-IF pair without a period to terminate the sentence.

Notice that I don’t do anything special between the END-IF and the statement that follows it.  However, if I have an IF statement that includes nested IF END-IFs, then it appears that I need a period after the final END-IF, like so:

Figure 6. An IF with nested conditions that seems to require a period to terminate the sentence.

If I don’t include that period after the final END-IF (ending the COBOL sentence), then in some circumstances, I was seeing the program exit after the last interior basic block within this nested IF was executed.  In COBOL parlance, it seems as if a GOBACK (i.e.: return) was implicitly executed once we fell out of the big nested IF.  Why is that period required for a nested IF, but not for a simple IF?

In my “Murach’s mainframe COBOL”, he ends ALL if statements with a period, even simple IFs.  I don’t see a rationale for that in the book anywhere, but it’s a ~700 page book, so perhaps he says why at some point.

I’ve asked our compiler guys if this is a bug or expected behaviour, but I am guessing the latter…. I just don’t know why.

The multiplication kernel for this library

The workhorse of this GA(2,0) implementation, is a multivector multiplication operation, which can be implemented in two lines in Mathematica (or C++)

multivector /: multivector[_, m1_, m2_] ** multivector[_, n1_, n2_] := 
   multivector[-1, m1 n1 + Conjugate[m2] n2, n1 m2 + Conjugate[m1] n2 ]

In COBOL, it takes a lot more, and as usual, COBOL verbosity obfuscates things considerably. Here’s the equivalent code in my library:

Figure 7. GA(2,0) multiplication kernel in COBOL.

The library and a little test program.

If you are curious, you can poke around in the code for this library and the test program on github.  The sample/test program is src/MULT.cob, and running the job gives the following SYSOUT:

Figure 8. Sample SYSOUT for MULT.cob

A less evil COBOL toy complex number library

December 29, 2023 COBOL , , , , , , , , , ,

In a previous post ‘The evil of COBOL: everything is in global variables’, I discussed the implementation of a toy complex number library in COBOL.

That example code was a single module, having one paragraph for each function. I used a naming convention to work around the fact that COBOL functions (paragraphs) are completely braindead and have no input nor output parameters, and all such functions in a given loadmodule have access to all the variables of the entire program.

Perhaps you try to call me on my claim that COBOL doesn’t have parameters, nor does it have return values.  That’s true if you consider COBOL paragraphs to be the equivalent to functions.  I’ve heard paragraphs described as not-really-functions, and there’s some truth to that, especially since you can do a PERFORM range that executes a set of paragraphs, and there can be non-intuitive control flow transfers between paragraphs of such a range of execution, that is entirely non-function like.

There is one circumstance where COBOL parameters can be found.  It is actually possible to have both input and output parameters in COBOL, but it can only be done at a program level (i.e.: int main( int argc, char ** )). So, you can write a less braindead COBOL library, with a set of meaningful input and output parameters for each function, by using CALL instead of PERFORM, and a whole set of external programs, one for each of the operations that is desired. With that in mind, I’ve reworked my COBOL complex number toy library to use this program-level organization.  This is still a toy library implementation, but serves to illustrate the ideas.  The previous paragraph implementation can be found in the same repository, in the ../paragraphs-as-library/ directory.

Here are some examples of the functions in this little library, and examples of calls to them.

Multiply code:

And here’s a call to it:

Notice that I’ve opted to use dynamic calls to the COBOL functions, using a copybook that lists all the possible function names:

This frees me from the constraint of having to use inscrutable 8-character function names, which will get confusing as the library grows.

Like everything in COBOL, the verbosity makes it fairly unreadable, but refactoring all paragraphs into external programs, does make the calling code, and even the library functions themselves, much more readable.  It still takes 49 lines of code, to initialize two complex numbers, multiply them and display them to stdout.

Compare to the same thing in C++, which is 18 lines for a grow-your-own complex implementation with multiply:

#include <iostream>

struct complex{
   double re_;
   double im_;
};

complex mult(const complex & a, const complex & b) {
   // (a + b i)(c + d i) = a c - b d + i( b c + a d) 
   return complex{ a.re_ * b.re_ - a.im_ * b.im_,
                   a.im_ * b.re_ + a.re_ * b.im_ };
}

int main()
{
   complex a{1,2};
   complex b{3,4};
   complex c = mult(a, b);
   std::cout << "c = " << c.re_ << " +(" << c.im_ << ") I\n";

   return 0;
}

and only 11 lines if we use the standard library complex implementation:

#include <iostream>
#include <complex>

using complex = std::complex<double>;

int main() 
{  
   complex a{1,2}; 
   complex b{3,4};
   complex c = a * b;
   std::cout << "c = " << c << "\n";

   return 0;
}

Basically, we have one line for each operation: init, init, multiply, display, and all the rest is one-time fluff (the includes, main decl, return, …)

It turns out that the so-called OBJECT oriented COBOL extension to the language (circa Y2K), is basically a packaging of external-style-programs into collections that are class prefixed, just as I’ve done above.  This provides the capability for information hiding, and allows functions to have parameters and return values.  However, this doesn’t try to rectify the fundamental failure of the COBOL language: everything has to be in a global variable.  This language extension appears to be a hack that may have been done primarily for Java integration, which is probably why nobody uses it.  You just can’t take the dinosaur out of COBOL.

Sadly, it didn’t take people long to figure out that it’s incredibly dumb to require all variables to be global.  Even PL/I, which is 59 years old at the time I write this (only five years younger than COBOL), got it right.  They added parameters and return values to functions, and allow functions to have variables that are limited to that scope.  PL/I probably went too far, and added lots of features that are also braindead (like the PL/I macro preprocessor), but the basic language is at least sane.  It’s interesting that COBOL never evolved.  A language like C++ may have evolved too much, and still is, but the most flagrant design flaw in the COBOL language has been there since inception, despite every other language in the world figuring out that sort of stupidity should not be propagated.

Note that I work on the development of a COBOL and PL/I compilation stack.  I really like my work, which is challenging and great fun, and I work with awesome people. That doesn’t stop me from acknowledging that COBOL is a language spawned in hell by Satan. I can love my work, which provides tools for customers allowing them to develop, maintain and debug COBOL code, but also have great pity and remorse for those customers, having inherited ancient code built with an ancient language, and having no easy migration path away from that language.

The evil of COBOL: everything is in global variables

December 7, 2023 COBOL , , , , , , ,

COBOL does not have stack variables.  Everything is a global variable.  There is a loose equivalent of a function, called a paragraph, which can be called using a PERFORM statement, but a paragraph does not have any input or output variables, and no return code, so if you want it to behave like a function, you have to construct some sort of complicated naming convention using your global variables.

I’ve seen real customer COBOL programs with many thousands of global variables.  A production COBOL program is usually a giant sequence of MOVEs, MOVE A TO B, MOVE B TO C, MOVE C TO D, MOVE D TO E, … with various PERFORMs or GOTOs, or other things in between.  If you find that your variable has a bad value in it, that is probably because it has been copied from something that was copied from something, that was copied from something, that’s the output of something else, that was copied from something, 9 or 10 times.

I was toying around with the idea of coding up a COBOL implementation of 2D Euclidean geometric algebra, just as a joke, as it is surely the worst language in the world.  Yes, I work on a COBOL compiler project. The project is a lot of fun, and the people I work with are awesome, but I don’t have to like the language.

If I was to implement this simplest geometric algebra in COBOL, the logical starting place for that would be to implement complex numbers in COBOL first.  That is because we can use a pair of complex numbers to implement a 2D multivector, with one complex number for the vector part, and a complex number for the scalar and pseudoscalar parts.  That technique has been detailed on this blog previously, and also in a Mathematica module Cl20.m.

Trying to implement a couple of complex number operations in COBOL got absurd really fast.  Here’s an example.  First step was to create some complex number types.  I did that with a copybook (include file), like so:

This can be included multiple times, each time with a different name, like so:

The way that I structured all my helper functions, was with one set of global variables for input (at least one), and if appropriate, one output global variable.  Here’s an example:

So, if I want to compute and display a value, I have a whole pile of stupid MOVEs to do in and out of the appropriate global variables for each of the helper routines in question:

I wrote enough of this little complex number library that I could do conjugate, real, imaginary, multiply, inverse, and divide operations.  I can run that little program with the following JCL

//COMPLEX JOB
//A EXEC PGM=COMPLEX
//SYSOUT   DD SYSOUT=*
//STEPLIB  DD DSN=PJOOT.SAMPLE.COMPLEX,
//  DISP=SHR

and get this SYSOUT:

STEP A SYSOUT:
A                    =  .10000000000000000E 01 + ( .20000000000000000E 01) I
B                    =  .30000000000000000E 01 + ( .40000000000000000E 01) I
CONJ(A)              =  .10000000000000000E 01 + (-.20000000000000000E 01) I
RE(A)                =  .10000000000000000E 01
IM(A)                =  .20000000000000000E 01
A * B                = -.50000000000000000E 01 + ( .10000000000000000E 02) I
1/A                  =  .20000000000000000E 00 + (-.40000000000000000E 00) I
A/B                  =  .44000000000000000E 00 + ( .80000000000000000E-01) I

If you would like your eyes burned further, you can access the full program on github here. It takes almost 200 lines of code to do almost nothing.

An invalid transformation of a COBOL data description entry

August 28, 2020 COBOL , , , , ,

Here’s a subtle gotcha that we saw recently.  A miraculous tool transformed some putrid DELTA generated COBOL code from GOTO soup into human readable form.  Among the transformations that this tool did, were modifications to working storage data declarations (removing unused variables in the source, and simplifying some others).  One of those transformations was problematic.  In that problematic case the pre-transformed declarations were:

This declaration is basically a union of char[8] with a structure that has four char[2]’s, with the COBOL language imposed restriction that the character values can be only numeric (EBCDIC) digits (i.e. ‘\xF0’, …, ‘\xF9’).  In the code in question none of the U044-BIS* variables (neither the first, nor the aliases) were ever used explicitly, but they were passed into another COBOL program as LINKAGE SECTION variables and used in the called program.

Here’s how the tool initially transformed the declaration:

It turns out that dropping that first PIC and removing the corresponding REDEFINES clause, was an invalid transformation in this case, because the code used INITIALIZE on the level 01 object that contained these variables.

On page 177, of the “178 Enterprise COBOL for z/OS: Enterprise COBOL for z/OS, V6.3 Language Reference”, we have:

(copyright IBM)

FILLER
A data item that is not explicitly referred to in a program. The keyword FILLER is optional. If specified,
FILLER must be the first word following the level-number.

… snip …

In an INITIALIZE statement:
• When the FILLER phrase is not specified, elementary FILLER items are ignored.

The transformation of the code in question would have been correct provided the “INITIALIZE foo” was replaced with “INITIALIZE foo WITH FILLER”.  The bug in the tool was fixed, and the transformed code in question was, in this case, changed to drop all the aliasing:

As a side effect of encountering this issue, I learned a number of things:

  • FILLER is actually a COBOL language keyword, with specific semantics, and not just a variable naming convention.
  • Both ‘INITIALIZE’ and ‘INITIALIZE … WITH FILLER’ are allowed.
  • INITIALIZE (without FILLER) doesn’t do PIC appropriate initialization of FILLER variables (we had binary zeros instead of EBCDIC zeros as a result.)

Using the debugger to understand COBOL level 88 semantics.

August 10, 2020 COBOL , ,

COBOL has an enumeration mechanism called a LEVEL-88 variable.  I found a few aspects of this counter-intuitive, as I’ve mentioned in a few previous posts.  With the help of the debugger, I now think I finally understand what’s going on.  Here’s an example program that uses a couple of LEVEL-88 variables:

We can use a debugger to discover the meaning of the COBOL code. Let’s start by single stepping past the first MOVE statement to just before the SET MY88-VAR-1:

Here, I’m running the program LZ000550 from a PDS named COBRC.NATIVE.LZ000550. We expect EBCDIC spaces (‘\x40’) in the FEEDBACK structure at this point and that’s what we see:

Line stepping past the SET statement, our structure memory layout now looks like:

By setting the LEVEL-88 variable to TRUE all the memory starting at the address of feedback is now overwritten by the numeric value 0x0000000141C33A3BL. If we continue the program, the SYSOUT ends up looking like:

This condition was expected.
This condition was expected.

The first ‘IF MY88-VAR-1 THEN’ fires, and after the subsequent ‘SET MY88-VAR-2 TO TRUE’, the second ‘IF MY88-VAR-2 THEN’ fires. The SET overwrites the structure memory at the point that the 88 was declared, and an IF check of that same variable name checks if the memory in that location has the value in the 88 variable. It does not matter what the specific layout of the structure is at that point. We see that an IF check of the level-88 variable just tests whether or not the value at that address has the pattern specified in the variable. In this case, we have only on level-88 variable with the given name in the program, so the ‘IF MY88-VAR-2 OF feedback’ that was used was redundant, and could have been coded as just ‘IF MY88-VAR-2’, or could have been coded as ‘IF MY88-VAR-2 OF CONDITION-TOKEN-VALUE of Feedback’

We can infer that the COBOL code’s WORKING-STORAGE has the following equivalent C++ layout:

struct CONDITION_TOKEN_VALUE
{
   short SEVERITY;
   short MSG_NO;
   char CASE_SEV_CTL;
   char FACILITY_ID[3];
};

enum my88_vars
{
   MY88_VAR_1 = 0x0000000141C33A3BL,
   MY88_VAR_2 = 0x0000000241C33A3BL
};

struct feedback
{
   union {
      CONDITION_TOKEN_VALUE c;
      my88_vars e;
   } u;
   int I_S_INFO;
};

and that the control flow of the program can be modeled as the following:

//...
feedback f;

int main()
{
   memset( &f, 0x40, sizeof(f) );
   f.u.e = MY88_VAR_1;
   if ( f.u.e == MY88_VAR_2 )
   {
      impossible();
   }
   if ( f.u.e == MY88_VAR_1 )
   {
      expected();
   }

   f.u.e = MY88_VAR_2;
   if ( f.u.e == MY88_VAR_1 )
   {
      impossible();
   }
   if ( f.u.e == MY88_VAR_2 )
   {
      expected();
   }

   return 0;
}

Things also get even more confusing if the LEVEL-88 variable specifies less bytes than the structure that it is embedded in.  In that case, SET of the variable pads out the structure with spaces and a check of the variable also looks for those additional trailing spaces:

The CONDITION-TOKEN-VALUE object uses a total of 8 bytes.  We can see the spaces in the display of the FEEDBACK structure if we look in the debugger:

See the four trailing 0x40 spaces here.

Incidentally, it can be hard to tell what the total storage requirements of a COBOL structure is by just looking at the code, because the mappings between digits and storage depends on the usage clauses.  If the structure also uses REDEFINES clauses (embedded unions), as was the case in the program that I was originally looking at, the debug output is also really nice to understand how big the various fields are, and where they are situated.

Here are a few of the lessons learned:

  • You might see a check like ‘IF MY88-VAR-1 THEN’, but nothing in the program explicitly sets MY88-VAR-1. It is effectively a global variable value that is set as a side effect of some other call (in the real program I was looking at, what modified this “variable” was actually a call to CEEFMDA, a LE system service.) We have pass by reference in the function calls so it can be a reverse engineering task to read any program and figure out how any given field may have been modified, and that doesn’t get any easier by introducing LEVEL-88 variables into the mix.
  • This effective enumeration mechanism is not typed in any sense. Correct use of a LEVEL-88 relies on the variables that follow it to have the appropriate types. In this case, the ‘IF MY88-VAR-1 THEN’ is essentially shorthand for:
  • There is a disconnect between the variables modified or checked by a given LEVEL-88 variable reference that must be inferred by the context.
  • An IF check of a LEVEL-88 variable may include an implicit check of the trailing part of the structure with EBCDIC spaces, if the fields that follow the 88 variable take more space than the value of the variable. Similarly, seting such a variable to TRUE may effectively memset a trailing subset of the structure to EBCDIC spaces.
  • Exactly what is modified by a given 88 variable depends on the context.  For example, if the level 88 variables were found in a copybook, and if I had a second structure that had the same layout as FEEDBACK, with both structures including that copybook, then I’d have two instances of this “enumeration”, and would need a set of “OF foo OF BAR” type clauses to disambiguate things.  Level 88 variables aren’t like a set of C defines.  Their meaning is context dependent, even if they masquerade as constants.