COBOL

Enterprise COBOL 6.5 TYPEDEF and “User Defined Functions”

June 28, 2026 COBOL

I continually disparage and make fun of COBOL, a “language” deserving of all levels of mockery and condescension.  My last LinkedIn mockery of COBOL paragraphs, to my great amusement, resulted in a COBOL defender saying it’s not a programming language, but a report writing language.  That’s an even stronger case than I tried to make!  Even people that like the language cannot defend it.

I’d like to share some features that IBM has recently retrofitted into their Enterprise COBOL Language Reference that actually go a long way toward making COBOL a real programming language.  I have to hand it to IBM for adding these features.  Hopefully, these will also make it into ISO COBOL, should there ever be a refresh of that standard.

I’m not saying that COBOL is no longer the worst programming language in the universe.  However, it could be improved quite a bit by two specific fairly innocuous seeming features, if used well:

  • New TYPEDEF/TYPE keywords, and qualified member access.
  • User Defined Functions (UDFs)

TYPEDEF/TYPE

Anybody who has used a real programming language takes it for granted that one can make type definitions.  COBOL has been around since 1959, and IBM’s Enterprise COBOL 6.5 was released in 2024, so it took about 65 years before a mainstream version of COBOL was available with a basic type mechanism.  (Micro Focus and TypeCOBOL both did it earlier, but I’m sure that neither of these has much of the COBOL market compared to IBM’s compiler.)

Here’s an example of a declaration in COBOL, taken from the NIST suite

If you want a second instance of such a variable, you have to copy this and give every symbol a different name.  That’s such a common pattern in COBOL that the #include-like statement (COPY) has a REPLACING keyword that can be used to change a boilerplate prefix or suffix into a name that is specific to the new use.  Here’s an example of a copybook fragment that is meant to be used with COPY REPLACING:

where the #include site may look like:

and the result, after preprocessing, would be:

When you have two instances of the same type in COBOL, you have no way of knowing if that’s the case.  There’s no real type information — instead, you have to know that the membership is all identical.  For large structures, this can be very unintuitive, to say the least.
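A rough C++ analogy may help here.  This is only a sketch: the `COMPLEX_COPYBOOK` macro is a hypothetical stand-in for a copybook used with COPY REPLACING, and `complex_t` is a hypothetical named type.  The lowercase `l_in1`/`l_in2` prefixes just mirror the L-IN1/L-IN2 style of names.  Two macro expansions give two records with identical layout but no shared type, whereas two variables of a named type are visibly the same kind of thing.

```cpp
#include <type_traits>

// Hypothetical macro standing in for a copybook used with COPY ... REPLACING:
// every expansion stamps out a new anonymous record with prefixed field names.
#define COMPLEX_COPYBOOK(prefix) \
   static struct { double prefix##_re; double prefix##_im; } prefix

COMPLEX_COPYBOOK(l_in1);   // fields: l_in1_re, l_in1_im
COMPLEX_COPYBOOK(l_in2);   // fields: l_in2_re, l_in2_im

// Same layout, but the compiler sees two unrelated types.
static_assert(!std::is_same<decltype(l_in1), decltype(l_in2)>::value,
              "copybook-style instances do not share a type");

// A named type: both instances carry the same type and the same member names.
struct complex_t { double re; double im; };

static complex_t in1;
static complex_t in2;

static_assert(std::is_same<decltype(in1), decltype(in2)>::value,
              "instances of a named type share that type");
```

With the named type, `in2 = in1;` is a checked whole-record copy.  With the macro version, the only thing tying `l_in1` to `l_in2` is the reader's knowledge that both came from the same boilerplate.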

Enterprise COBOL 6.5 provides the TYPEDEF mechanism taken from TypeCOBOL and Micro Focus (according to Claude).  Instead of defining a type with a copybook that has to be copied with REPLACING, one can now define it in a structured fashion.  Example:

And at the use point:

Unlike conventional COBOL where every instance of “the same type” has a different name and different member names, this new declaration mechanism gives the same name to each member.  You can do that in conventional COBOL, and write something like:


but this new COBOL TYPEDEF comes with a (double-colon) qualified member access syntax. Here’s an example:

Conventional COBOL qualified access would look like:

with verbosity that obscures any meaning, typical of most COBOL code.  In this specific case, since the COPY REPLACING guaranteed different names for all fields, the use case could have implied membership access:

If you go looking for L-IN1-RE or L-IN2-RE, you will never find it, because it exists only as a member of the copybook that has been processed with REPLACING.  This is a great example of a COBOL software development problem.  Even when you have the COBOL source code, understanding the code is a reverse engineering exercise.

This TYPE/TYPEDEF syntax is nice, but it’s really just syntactic sugar.  It doesn’t actually introduce any notion of strict typing to the language.

There are some minor exceptions to this lack of type safety.  For example, gcobol will not permit ZERO to be assigned to a USAGE POINTER item.  Instead you have to assign NULL.  However, for the most part you can still convert anything to anything, and most of the time you’ll never get any sort of warning or error for doing so.

Despite the weak nature of this type mechanism, I think that it’s actually a very important feature.  When you have a typical production COBOL program with 10000 lines of global variables (WORKING-STORAGE), you now have the capability of running an analysis that retroactively extracts the underlying implicit type representation, only known to the original programmers long since dead.  You can now assign names to types that are common to a given program, and even better, names to entities that are common to a software suite.  The act of reverse engineering COBOL program behaviour from the source code can then be made a little bit easier.

It only took 65 years to jam this little bit of sanity into the COBOL programming language.

User Defined Functions

The next little bit of sanity added to the language is a mechanism to define a function.  You might say: wait a sec, every programming language has functions, so are you sure that COBOL doesn’t?

IBM’s Enterprise COBOL 6.4, released in 2022 (according to Claude), added user defined functions.  So how does this differ from the other function-like constructs available in COBOL?  A COBOL programmer may characterize the COBOL paragraph as a function.  You can call such a “function” using the PERFORM statement.  Here’s an example of a paragraph IN-4, called twice in the paragraph IN-3:

There’s a key observation to make here.  Notice how the caller does not pass any parameter to the callee.  That’s not just because this particular “function” takes no parameters, but because it is NOT POSSIBLE to pass parameters.  COBOL paragraphs are only function-like in that they can be called, but

  • they cannot take parameters,
  • cannot return anything,
  • cannot have local variables,
  • and may or may not, depending on the whim of the author, implicitly fall through to whatever code happens to follow them.

That last point is probably why there is a GO TO in this case.  Without that GO TO, the IN-4 paragraph may actually be executed twice.  But you can’t actually know if that will be the case unless you know how you got to IN-3.  If IN-3 was “called” implicitly, due to IN-2 before it finishing and falling through, then that same loop of two could be performed by:

In that case, IN-4 will be executed once by IN-3, then IN-3 will fall through to IN-4 and it will execute a second time.  The tricky thing about this example is that you can’t look at this code and know how many times IN-4 will be called.  If IN-3 was called with PERFORM, then IN-4 will be called once, but if IN-3 was “called” by fall-through or GO TO, then IN-4 will be called twice.
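That ambiguity can be modelled in a few lines of C++.  This is a sketch under stated assumptions, not a translation of any real program: it models the variant without the GO TO, it assumes IN-4 is the last paragraph so that control stops after it, and the `Program` struct, `enter_in3`, and the `performed` flag are all hypothetical names.

```cpp
// Minimal model of paragraphs IN-3 and IN-4, where IN-3 contains a single
// PERFORM IN-4 and is immediately followed in the source by IN-4.
struct Program {
   int in4_calls = 0;

   // Body of paragraph IN-4: just count how often it runs.
   void in4() { ++in4_calls; }

   // performed == true  models "PERFORM IN-3": control returns at the end of IN-3.
   // performed == false models reaching IN-3 by fall-through or GO TO:
   //                    nothing stops control at the end of IN-3.
   void enter_in3(bool performed) {
      in4();                 // PERFORM IN-4
      if (performed) {
         return;             // end of the PERFORM range
      }
      in4();                 // fall through into the paragraph that follows
   }
};
```

The same source text runs IN-4 once or twice depending only on how control arrived, which is exactly the information that is missing when you read IN-3 in isolation.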

Basically, the language,  and all programs written in it, are sufficient to make any programmer have a terrible, horrible, no good, very bad day.

Given that paragraphs are “functions” that can have neither parameters nor return codes, how is it justifiable to call them functions?  That justification is only possible because you can use global variables to simulate parameters and returns.  Here’s an example:

These are real and imaginary complex “functions”, each relying on you to copy values into specific “input” global variables, and get “return” values from specific output global variables.

What you will often see in production code is that programmers use naming conventions to simulate parameters, perhaps like this, for example:

A naming convention like PARAGRAPH-<something>-IN, and PARAGRAPH-<something>-OUT, can make it more obvious that the expected use case for the code is to MOVE things to the -IN variables before a PERFORM and to grab stuff out of the -OUT variables when it’s done.  You don’t have any guarantee that this is anything more than a naming convention, and any paragraph or section could potentially change the “parameter” and “return” variables of any other paragraph or section (a section is a collection of one or more paragraphs).
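In C++ terms, the convention amounts to something like the following sketch.  The names `complex_real_in`, `complex_real_out`, `complex_real`, and `real_of` are hypothetical, chosen only to echo the -IN/-OUT pattern; nothing in the language ties the two globals to the routine that uses them.

```cpp
struct complex {
   double re_;
   double im_;
};

// "Parameter" and "return value" globals, related to the routine below
// only by their names.
static complex complex_real_in{};
static double  complex_real_out{};

// Stand-in for a paragraph: no parameters, no return value, no locals.
void complex_real() {
   complex_real_out = complex_real_in.re_;
}

// What every call site has to spell out by hand.
double real_of(const complex& z) {
   complex_real_in = z;       // MOVE the input to the -IN variable
   complex_real();            // PERFORM the paragraph
   return complex_real_out;   // MOVE the result out of the -OUT variable
}
```

Any other routine in the program can read or overwrite `complex_real_in` and `complex_real_out` at any time, which is why the convention needs auditing before it can be trusted.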

It is at least conceptually possible to take a well structured COBOL program that uses a naming convention like this to model parameters, and translate it to a sane language, but you need an audit to make sure that the convention is actually followed before any sort of automatic translation can occur.  And even if a program follows a function-like naming convention like this, that doesn’t mean that each paragraph doesn’t also have a thousand other side effects that have to be figured out.  With all variables usually accessed without qualification, it is very hard to look at COBOL code and have any sort of idea what data structures it is operating on.

Somebody that knows COBOL may ask: what about nested subprograms and external program calls?  Aren’t those “function”-like?  That I would agree with.  A subprogram or external program is very much like a void function that takes all of its parameters by reference.  When used in a structured way, this could model a function in a modern programming language that has a set of inputs and a set of outputs, or inputs, outputs and mixed in/outs.

Here’s an example of that, using the COMPLEX-REAL example above:

This might be equivalent to the following C++:

struct complex{
   double re_;
   double im_;
};

void crreal(complex * in, double * out) {
   *out = in->re_; 
}

You could think of this as a function with a return *out, where *in is agreed to be used in a read only fashion.

As an abstract entity, the UDF has the characteristics of a PROGRAM, specifying a FUNCTION-ID instead of a PROGRAM-ID, and also requiring a RETURNS.  One might look like:

This is probably not an ideal recoding as a UDF, as it is possible to use BY VALUE parameter passing (and I assume BY VALUE return).  However, it illustrates the rough idea.  One of the other differences between the UDF and a program (or subprogram) is that a UDF can be prototyped.  Here’s an example of that:

Typical of COBOL, the verbosity is horrendous.  This isn’t a one line prototype like you would have in C, and is much harder to read and understand.  You can, however, put all the bloated PROTOTYPEs for your library functions in a copybook, and have your program COPY that.  The fact that it can be prototyped is a big improvement over a COBOL program, which is completely untyped.  You can have a COBOL program that takes an int by value, but pass it a parameter by reference, or vice versa; or one that is supposed to take 3 parameters, but is passed only one, or is passed 5.  The call site has no way of knowing if the type or nature of the parameters matches the implementation.

With a program being far superior to a paragraph as a function-like entity, you have to wonder why it is not a common COBOL paradigm.  I suspect that a big part of that is the expense of a paragraph vs. a PROGRAM.

At LzLabs, a call to a program, subprogram or otherwise, was not cheap.  A big part of that was WORKING-STORAGE related.  WORKING-STORAGE is something like static storage in C, if static storage persisted across multiple invocations of a program (until “rununit” termination).  Contrast that to LOCAL-STORAGE which is more “stack” like.  I don’t know whether an IBM Enterprise COBOL subprogram call is any cheaper than a call to an entry function.  It was not typical to see customer code that made use of PROGRAMs as function calls, except for very specific use cases.

I also don’t know if the new IBM User Defined Functions are cheap enough that programmers would opt to use them instead of paragraphs.  I saw only one or two programs out of thousands (both from one customer) that used this feature, but it was still a very new feature, and relative to the age of COBOL, it still is.

At least conceivably, a UDF that doesn’t use WORKING-STORAGE (only LOCAL-STORAGE and the LINKAGE SECTION), and that is used only with pass by value parameters, could, theoretically, be as cheap as a C function call.  I don’t know if that’s the case on the mainframe, but it’s at least possible.  You could imagine a “fastcall” convention with pass by registers for the parameters for a UDF, instead of the usual indirect “PARM” mechanism.  Regardless of the cost, you could use this to model procedural programming in a way that can be translated to another language.

You could imagine that it would be possible to find paragraphs for which a set of WORKING-STORAGE variables are always re-written when the paragraph is executed (i.e.: find the set of variables that are effectively local to a paragraph), and then factor a paragraph out into a UDF with well defined semantics, with inputs, outputs, and local variables, reducing that giant set of “10000” working storage variables in the original program.  If you repeat that process across an entire program, also extracting named representations for the implicit types used, would you be able to systematically “find” the structure of the program, and then have a candidate for translation to something not as intrinsically evil as COBOL?

An absurd COBOL library: 2D Euclidean GA

December 31, 2023 COBOL, math and physics play

I’ve achieved a new pinnacle of obscurity, and have now written a rudimentary COBOL implementation of a geometric algebra library for \( \mathbb{R}^2 \) calculations.

Who will use this?  Absolutely nobody.  Effectively, nobody knows geometric algebra.  Nobody wants to know COBOL, but some do.  The intersection of those two groups is vanishingly small (probably one: argued below.)

I understand that some Opus Dei members have taught themselves COBOL, as looking at COBOL has been found to be equally painful as a course of self flagellation.

Figure 0. A flagellation representation of COBOL.

Assuming that no Opus Dei practitioners know geometric algebra, that means that there is exactly one person in the world that both knows COBOL and geometric algebra.  Me.

Why did I write this little library?  Well, I was tickled to write something so completely stupid, and I’ve been laughing at the absurdity of it. I also thought I might learn a few things about COBOL in the process of trying to use it for something slightly non-trivial.  I’m adept at writing simple test programs that exercise various obscure compiler features, but those are usually fairly small.  On the flip side of complexity, I have to debug through a number of horribly complicated customer programs as part of my compiler validation work.  A simple real life test scenario might run 100+ COBOL programs in a set of CICS transactions, executing thousands of EXEC DLI and EXEC CICS statements as well as all of the rest of the COBOL language statements!  Despite having gained familiarity with COBOL from that sort of observational use, walking through stuff in the debugger doesn’t provide the same level of comfort with the language as writing code from scratch.  Since I have no interest in simulating a boring business application, why not do something just for fun as a learning game.

The compiler I am using does not seem to support object-COBOL (which would have been nicely suited for this project), so I’ve written my little toy in conventional COBOL, using one external procedure for each type of mathematical operation.  In the huge set of customer COBOL code that I’ve examined and done test compilations of, none of it has used object-COBOL.  I am guessing that the object-COBOL community is as large as the user base for my little toy COBOL geometric algebra library will ever be.

I’ve implemented methods to construct multivectors with scalar, vector and pseudoscalar components, or a general multivector with all of the above.  I’ve also implemented multiply, add, subtract, scalar multiplication, grade selection, and a DISPLAY function to write a multivector to SYSOUT (stdout equivalent.)

The multivector “type”

Figure 1 shows the implementation of my multivector type, implemented in a copybook (include file) named MVI.  I have an alternate MV copybook that doesn’t have the VALUE (initialization) clauses, as you don’t want initialization for LINKAGE SECTION values (i.e.: program parameters.)

Figure 1. Copybook with multivector declaration and initialization.

If you are wondering what the hell a ‘PIC S9(9) USAGE IS COMP-5’ is, well, that’s the “easy to remember” way to declare a 32-bit signed integer in COBOL.  A COMP-2, on the other hand, is a floating point value.

Figure 2 shows an example of the use of this copybook:

Figure 2. Using the multivector copybook.

Figure 3 shows these two copybook declarations after preprocessor expansion

Figure 3. Multivector global variable examples after preprocessing.

The global variable declarations above are roughly equivalent to the following pseudo C++ code (pretending that we can have anonymous unions that match the COBOL declarations above):

#include <complex>

using complex = std::complex<double>;

struct ga20{
   int grade{};
   union {
      struct { double sc{}; double ps{}; };
      complex g02{};
   };
   union { 
      struct { double x{}; double y{}; };
      complex g1{};
   };
};

ga20 a;
ga20 b;

COBOL is inherently untyped, but requires matching types for CALL parameters, or else all hell ensues, so you have to rely on naming conventions and other mechanisms to enforce the required type equivalences.  In this toy GA library, I’ve used copybooks to enforce the types required for everything.  Global variable declarations like these A-MV and B-MV variables are declared only using a copybook that knows the representation required, and all the uses in sub-programs of the effective -MV “type” use a matching copybook for their declarations.  However, I’ve also made use of the lack of typing to treat A-G02, B-G02, A-G1, and B-G1 as if they were complex numbers, and pass those “variables” off to complex number sub-programs, knowing that I’ve constructed the parameters to those programs in a way that is bit compatible with the MV field values.  You can screw things up really nicely doing stuff like this, especially because all COBOL sub-program parameters are (generally) passed by reference.  If you don’t match up the types right “fun ensues.”

Also observe that the nested level specifiers are optional in COBOL.  For nested fields in C++, we might write a.g1.x.  With a nested variable like this in COBOL, we could write something equivalent to that, like:

A-X OF A-G1 OF A-MV

but we can leave out any of the intermediate “level” specifications if we want.  This gets really confusing in complicated real-life COBOL code.  If you are looking to see where something is modified, you have to not only look for the variable of interest, but also any of the higher level fields, since any of those could have been passed off to other code, which implicitly wrote the value you are interested in.

Here’s what one of these multivectors looks like in memory on my (Linux x86-64) system

(lldb) c
Process 3903259 resuming
Process 3903259 stopped
* thread #10, name = 'GA20', stop reason = breakpoint 7.1
    frame #0: 0x00007fffd9189a02 PJOOT.GA20V01.LOADLIB(MULT).ec73dc4b`MULT at MULT.cob:50:1
   47              CALL GA-MKVECTOR-MODIFY USING C-MV, A-X, A-Y
   48              CALL GA-MKPSEUDO-MODIFY USING D-MV, A-PS
   49  
-> 50              MOVE 'A' TO WS-DISPPARM-N
   51              CALL GA-DISPLAY USING
   52                WS-DISPPARM-N,
   53                A-MV
(lldb) p A-MV
(A-MV) A-MV = {
  A-GRADE = -1
  A-G02 = (A-SC = 1, A-PS = 4)
  A-G1 = (A-X = 2, A-Y = 3)
}

i.e.: this has the value \( 1 + 2 \mathbf{e}_1 + 3 \mathbf{e}_2 + 4 \mathbf{e}_{12} \).

Looking at the multivector in its hex representation:

(lldb) fr v -format x A-MV
(A-MV) A-MV = {
  A-GRADE = 0xffffffff
  A-G02 = {
    A-SC = 0x3ff0000000000000
    A-PS = 0x4010000000000000
  }
  A-G1 = {
    A-X = 0x4000000000000000
    A-Y = 0x4008000000000000
  }
}

we see that the debugger is showing an underlying IEEE floating point representation for the COMP-2 variables in the program as it was compiled.
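Those bit patterns are easy to check by hand.  The following is a small C++ sketch, assuming (as on x86-64 Linux) that `double` is an IEEE 754 binary64 value and that the 32-bit integer is two's complement; `from_bits` and `grade_from_bits` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as an IEEE 754 binary64 value.
double from_bits(std::uint64_t bits) {
   static_assert(sizeof(double) == sizeof(std::uint64_t),
                 "expected a 64-bit double");
   double d;
   std::memcpy(&d, &bits, sizeof d);
   return d;
}

// Reinterpret a 32-bit pattern as a signed 32-bit integer
// (the PIC S9(9) COMP-5 grade field).
std::int32_t grade_from_bits(std::uint32_t bits) {
   std::int32_t g;
   std::memcpy(&g, &bits, sizeof g);
   return g;
}
```

Fed the values from the debugger session, `from_bits(0x3ff0000000000000)` is 1.0 (A-SC), `from_bits(0x4010000000000000)` is 4.0 (A-PS), `from_bits(0x4000000000000000)` is 2.0 (A-X), `from_bits(0x4008000000000000)` is 3.0 (A-Y), and `grade_from_bits(0xffffffff)` is -1 (A-GRADE).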

I have a multivector print routine that prints multivectors to SYSOUT:

Figure 4. Calling the multivector DISPLAY function.

where WS-DISPPARM-N is a PIC X(20).  (i.e.: a fixed size character array.)  Output for the A-MV value showing in the debug session above looks like:

A                     ( .10000000000000000E 01)                                                                         
                    + ( .20000000000000000E 01) e_1 + ( .30000000000000000E 01) e_2                                     
                    + ( .40000000000000000E 01) e_{12}            

End of sentence required for nested IFs?

I encountered a curious language issue in my multivector multiply function.  Here’s an example of how I’ve been coding IF statements

Figure 5. An IF END-IF pair without a period to terminate the sentence.

Notice that I don’t do anything special between the END-IF and the statement that follows it.  However, if I have an IF statement that includes nested IF END-IFs, then it appears that I need a period after the final END-IF, like so:

Figure 6. An IF with nested conditions that seems to require a period to terminate the sentence.

If I don’t include that period after the final END-IF (ending the COBOL sentence), then in some circumstances, I was seeing the program exit after the last interior basic block within this nested IF was executed.  In COBOL parlance, it seems as if a GOBACK (i.e.: return) was implicitly executed once we fell out of the big nested IF.  Why is that period required for a nested IF, but not for a simple IF?

In my “Murach’s mainframe COBOL”, he ends ALL if statements with a period, even simple IFs.  I don’t see a rationale for that in the book anywhere, but it’s a ~700 page book, so perhaps he says why at some point.

I’ve asked our compiler guys if this is a bug or expected behaviour, but I am guessing the latter…. I just don’t know why.

The multiplication kernel for this library

The workhorse of this GA(2,0) implementation, is a multivector multiplication operation, which can be implemented in two lines in Mathematica (or C++)

multivector /: multivector[_, m1_, m2_] ** multivector[_, n1_, n2_] := 
   multivector[-1, m1 n1 + Conjugate[m2] n2, n1 m2 + Conjugate[m1] n2 ]
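For comparison, here is roughly what the C++ version could look like.  This is a sketch that mirrors the Mathematica rule term by term, not code from the library: the `multivector` struct and its `g02`/`g1` member names are assumptions borrowed from the pseudo C++ layout shown earlier, and the grade field is left out.

```cpp
#include <complex>

using complex = std::complex<double>;

// A GA(2,0) multivector stored as a pair of complex numbers.
struct multivector {
   complex g02;   // scalar + pseudoscalar: sc + ps e_{12}, stored as sc + ps i
   complex g1;    // vector: x e_1 + y e_2, stored as x + y i
};

// Same rule as the Mathematica definition, with m1 = m.g02, m2 = m.g1,
// n1 = n.g02 and n2 = n.g1.
multivector operator*(const multivector& m, const multivector& n) {
   return multivector{ m.g02 * n.g02 + std::conj(m.g1) * n.g1,
                       n.g02 * m.g1 + std::conj(m.g02) * n.g1 };
}
```

As a sanity check, with \( \mathbf{e}_1 \) stored as `g1 = 1` and \( \mathbf{e}_2 \) stored as `g1 = i`, this gives \( \mathbf{e}_1 \mathbf{e}_2 = \mathbf{e}_{12} \), \( \mathbf{e}_2 \mathbf{e}_1 = -\mathbf{e}_{12} \), and \( \mathbf{e}_1 \mathbf{e}_1 = 1 \).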

In COBOL, it takes a lot more, and as usual, COBOL verbosity obfuscates things considerably. Here’s the equivalent code in my library:

Figure 7. GA(2,0) multiplication kernel in COBOL.

The library and a little test program.

If you are curious, you can poke around in the code for this library and the test program on github.  The sample/test program is src/MULT.cob, and running the job gives the following SYSOUT:

Figure 8. Sample SYSOUT for MULT.cob

A less evil COBOL toy complex number library

December 29, 2023 COBOL

In a previous post ‘The evil of COBOL: everything is in global variables’, I discussed the implementation of a toy complex number library in COBOL.

That example code was a single module, having one paragraph for each function. I used a naming convention to work around the fact that COBOL functions (paragraphs) are completely braindead and have no input nor output parameters, and all such functions in a given loadmodule have access to all the variables of the entire program.

Perhaps you’ll try to call me on my claim that COBOL doesn’t have parameters, nor does it have return values.  That’s true if you consider COBOL paragraphs to be the equivalent of functions.  I’ve heard paragraphs described as not-really-functions, and there’s some truth to that, especially since you can do a PERFORM range that executes a set of paragraphs, and there can be non-intuitive control flow transfers between paragraphs of such a range of execution that are entirely non-function-like.

There is one circumstance where COBOL parameters can be found.  It is actually possible to have both input and output parameters in COBOL, but it can only be done at a program level (i.e.: int main( int argc, char ** )). So, you can write a less braindead COBOL library, with a set of meaningful input and output parameters for each function, by using CALL instead of PERFORM, and a whole set of external programs, one for each of the operations that is desired. With that in mind, I’ve reworked my COBOL complex number toy library to use this program-level organization.  This is still a toy library implementation, but serves to illustrate the ideas.  The previous paragraph implementation can be found in the same repository, in the ../paragraphs-as-library/ directory.

Here are some examples of the functions in this little library, and examples of calls to them.

Multiply code:

And here’s a call to it:

Notice that I’ve opted to use dynamic calls to the COBOL functions, using a copybook that lists all the possible function names:

This frees me from the constraint of having to use inscrutable 8-character function names, which will get confusing as the library grows.

Like everything in COBOL, the verbosity makes it fairly unreadable, but refactoring all paragraphs into external programs does make the calling code, and even the library functions themselves, much more readable.  It still takes 49 lines of code to initialize two complex numbers, multiply them and display them to stdout.

Compare to the same thing in C++, which is 18 lines for a grow-your-own complex implementation with multiply:

#include <iostream>

struct complex{
   double re_;
   double im_;
};

complex mult(const complex & a, const complex & b) {
   // (a + b i)(c + d i) = a c - b d + i( b c + a d) 
   return complex{ a.re_ * b.re_ - a.im_ * b.im_,
                   a.im_ * b.re_ + a.re_ * b.im_ };
}

int main()
{
   complex a{1,2};
   complex b{3,4};
   complex c = mult(a, b);
   std::cout << "c = " << c.re_ << " +(" << c.im_ << ") I\n";

   return 0;
}

and only 11 lines if we use the standard library complex implementation:

#include <iostream>
#include <complex>

using complex = std::complex<double>;

int main() 
{  
   complex a{1,2}; 
   complex b{3,4};
   complex c = a * b;
   std::cout << "c = " << c << "\n";

   return 0;
}

Basically, we have one line for each operation: init, init, multiply, display, and all the rest is one-time fluff (the includes, main decl, return, …)

It turns out that the so-called OBJECT oriented COBOL extension to the language (circa Y2K), is basically a packaging of external-style-programs into collections that are class prefixed, just as I’ve done above.  This provides the capability for information hiding, and allows functions to have parameters and return values.  However, this doesn’t try to rectify the fundamental failure of the COBOL language: everything has to be in a global variable.  This language extension appears to be a hack that may have been done primarily for Java integration, which is probably why nobody uses it.  You just can’t take the dinosaur out of COBOL.

Sadly, it didn’t take people long to figure out that it’s incredibly dumb to require all variables to be global.  Even PL/I, which is 59 years old at the time I write this (only five years younger than COBOL), got it right.  They added parameters and return values to functions, and allow functions to have variables that are limited to that scope.  PL/I probably went too far, and added lots of features that are also braindead (like the PL/I macro preprocessor), but the basic language is at least sane.  It’s interesting that COBOL never evolved.  A language like C++ may have evolved too much, and still is, but the most flagrant design flaw in the COBOL language has been there since inception, despite every other language in the world figuring out that sort of stupidity should not be propagated.

Note that I work on the development of a COBOL and PL/I compilation stack.  I really like my work, which is challenging and great fun, and I work with awesome people. That doesn’t stop me from acknowledging that COBOL is a language spawned in hell by Satan. I can love my work, which provides tools for customers allowing them to develop, maintain and debug COBOL code, but also have great pity and remorse for those customers, having inherited ancient code built with an ancient language, and having no easy migration path away from that language.

The evil of COBOL: everything is in global variables

December 7, 2023 COBOL

COBOL does not have stack variables.  Everything is a global variable.  There is a loose equivalent of a function, called a paragraph, which can be called using a PERFORM statement, but a paragraph does not have any input or output variables, and no return code, so if you want it to behave like a function, you have to construct some sort of complicated naming convention using your global variables.

I’ve seen real customer COBOL programs with many thousands of global variables.  A production COBOL program is usually a giant sequence of MOVEs, MOVE A TO B, MOVE B TO C, MOVE C TO D, MOVE D TO E, … with various PERFORMs or GOTOs, or other things in between.  If you find that your variable has a bad value in it, that is probably because it has been copied from something that was copied from something, that was copied from something, that’s the output of something else, that was copied from something, 9 or 10 times.

I was toying around with the idea of coding up a COBOL implementation of 2D Euclidean geometric algebra, just as a joke, as COBOL is surely the worst language in the world.  Yes, I work on a COBOL compiler project.  The project is a lot of fun, and the people I work with are awesome, but I don’t have to like the language.

If I was to implement this simplest geometric algebra in COBOL, the logical starting place for that would be to implement complex numbers in COBOL first.  That is because we can use a pair of complex numbers to implement a 2D multivector, with one complex number for the vector part, and a complex number for the scalar and pseudoscalar parts.  That technique has been detailed on this blog previously, and also in a Mathematica module Cl20.m.

Trying to implement a couple of complex number operations in COBOL got absurd really fast.  Here’s an example.  First step was to create some complex number types.  I did that with a copybook (include file), like so:

This can be included multiple times, each time with a different name, like so:

The way that I structured all my helper functions, was with one set of global variables for input (at least one), and if appropriate, one output global variable.  Here’s an example:

So, if I want to compute and display a value, I have a whole pile of stupid MOVEs to do in and out of the appropriate global variables for each of the helper routines in question:

I wrote enough of this little complex number library that I could do conjugate, real, imaginary, multiply, inverse, and divide operations.  I can run that little program with the following JCL

//COMPLEX JOB
//A EXEC PGM=COMPLEX
//SYSOUT   DD SYSOUT=*
//STEPLIB  DD DSN=PJOOT.SAMPLE.COMPLEX,
//  DISP=SHR

and get this SYSOUT:

STEP A SYSOUT:
A                    =  .10000000000000000E 01 + ( .20000000000000000E 01) I
B                    =  .30000000000000000E 01 + ( .40000000000000000E 01) I
CONJ(A)              =  .10000000000000000E 01 + (-.20000000000000000E 01) I
RE(A)                =  .10000000000000000E 01
IM(A)                =  .20000000000000000E 01
A * B                = -.50000000000000000E 01 + ( .10000000000000000E 02) I
1/A                  =  .20000000000000000E 00 + (-.40000000000000000E 00) I
A/B                  =  .44000000000000000E 00 + ( .80000000000000000E-01) I

If you would like your eyes burned further, you can access the full program on github here. It takes almost 200 lines of code to do almost nothing.

An invalid transformation of a COBOL data description entry

August 28, 2020 COBOL

Here’s a subtle gotcha that we saw recently.  A miraculous tool transformed some putrid DELTA generated COBOL code from GOTO soup into human readable form.  Among the transformations that this tool did, were modifications to working storage data declarations (removing unused variables in the source, and simplifying some others).  One of those transformations was problematic.  In that problematic case the pre-transformed declarations were:

This declaration is basically a union of char[8] with a structure that has four char[2]’s, with the COBOL language imposed restriction that the character values can be only numeric (EBCDIC) digits (i.e. ‘\xF0’, …, ‘\xF9’).  In the code in question none of the U044-BIS* variables (neither the first, nor the aliases) were ever used explicitly, but they were passed into another COBOL program as LINKAGE SECTION variables and used in the called program.

Here’s how the tool initially transformed the declaration:

It turns out that dropping that first PIC and removing the corresponding REDEFINES clause, was an invalid transformation in this case, because the code used INITIALIZE on the level 01 object that contained these variables.

On page 177 of the “Enterprise COBOL for z/OS, V6.3 Language Reference”, we have:

(copyright IBM)

FILLER
A data item that is not explicitly referred to in a program. The keyword FILLER is optional. If specified,
FILLER must be the first word following the level-number.

… snip …

In an INITIALIZE statement:
• When the FILLER phrase is not specified, elementary FILLER items are ignored.

The transformation of the code in question would have been correct provided the “INITIALIZE foo” was replaced with “INITIALIZE foo WITH FILLER”.  The bug in the tool was fixed, and the transformed code in question was, in this case, changed to drop all the aliasing:

As a side effect of encountering this issue, I learned a number of things:

  • FILLER is actually a COBOL language keyword, with specific semantics, and not just a variable naming convention.
  • Both ‘INITIALIZE’ and ‘INITIALIZE … WITH FILLER’ are allowed.
  • INITIALIZE (without FILLER) doesn’t do PIC appropriate initialization of FILLER variables (we had binary zeros instead of EBCDIC zeros as a result.)
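The effect can be modelled with a small C++ sketch.  This is a toy model under stated assumptions, not the compiler's actual algorithm: it only handles numeric DISPLAY items, the `Field` descriptor and `initialize` function are hypothetical, and the 8-byte record of four 2-byte FILLER items is an assumed layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One elementary numeric DISPLAY item inside a record (hypothetical model).
struct Field {
   std::size_t offset;
   std::size_t length;
   bool is_filler;
};

constexpr std::uint8_t EBCDIC_ZERO = 0xF0;

// Toy model of INITIALIZE: numeric DISPLAY items are set to EBCDIC zeros,
// and FILLER items are skipped unless with_filler is true.
template <std::size_t N>
void initialize(std::array<std::uint8_t, N>& storage,
                const std::vector<Field>& fields, bool with_filler) {
   for (const Field& f : fields) {
      assert(f.offset + f.length <= N);   // the item must lie inside the record
      if (f.is_filler && !with_filler) {
         continue;                        // plain INITIALIZE ignores FILLER
      }
      for (std::size_t i = 0; i < f.length; ++i) {
         storage[f.offset + i] = EBCDIC_ZERO;
      }
   }
}
```

With a record that starts out as binary zeros and contains only FILLER items, `initialize(storage, fields, false)` leaves every byte at 0x00, while `initialize(storage, fields, true)` sets every byte to 0xF0, matching the binary zeros versus EBCDIC zeros difference described above.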