## The evil of COBOL: everything is in global variables

COBOL does not have stack variables.  Everything is a global variable.  There is a loose equivalent of a function, called a paragraph, which can be called using a PERFORM statement, but a paragraph does not have any input or output variables, and no return code, so if you want it to behave like a function, you have to construct some sort of complicated naming convention using your global variables.

I’ve seen real customer COBOL programs with many thousands of global variables.  A production COBOL program is usually a giant sequence of MOVEs, MOVE A TO B, MOVE B TO C, MOVE C TO D, MOVE D TO E, … with various PERFORMs or GOTOs, or other things in between.  If you find that your variable has a bad value in it, that is probably because it has been copied from something that was copied from something, that was copied from something, that’s the output of something else, that was copied from something, 9 or 10 times.

I was toying around with the idea of coding up a COBOL implementation of 2D Euclidean geometric algebra, just as a joke, as it is surely the worst language in the world.  Yes, I work on a COBOL compiler project. The project is a lot of fun, and the people I work with are awesome, but I don’t have to like the language.

If I was to implement this simplest geometric algebra in COBOL, the logical starting place for that would be to implement complex numbers in COBOL first.  That is because we can use a pair of complex numbers to implement a 2D multivector, with one complex number for the vector part, and a complex number for the scalar and pseudoscalar parts.  That technique has been detailed on this blog previously, and also in a Mathematica module Cl20.m.

Trying to implement a couple of complex number operations in COBOL got absurd really fast.  Here’s an example.  First step was to create some complex number types.  I did that with a copybook (include file), like so:

This can be included multiple times, each time with a different name, like so:

The way that I structured all my helper functions, was with one set of global variables for input (at least one), and if appropriate, one output global variable.  Here’s an example:

So, if I want to compute and display a value, I have a whole pile of stupid MOVEs to do in and out of the appropriate global variables for each of the helper routines in question:

I wrote enough of this little complex number library that I could do conjugate, real, imaginary, multiply, inverse, and divide operations.  I can run that little program with the following JCL

```//COMPLEX JOB
//A EXEC PGM=COMPLEX
//SYSOUT   DD SYSOUT=*
//STEPLIB  DD DSN=PJOOT.SAMPLE.COMPLEX,
//  DISP=SHR
```

and get this SYSOUT:

```STEP A SYSOUT:
A                    =  .10000000000000000E 01 + ( .20000000000000000E 01) I
B                    =  .30000000000000000E 01 + ( .40000000000000000E 01) I
CONJ(A)              =  .10000000000000000E 01 + (-.20000000000000000E 01) I
RE(A)                =  .10000000000000000E 01
IM(A)                =  .20000000000000000E 01
A * B                = -.50000000000000000E 01 + ( .10000000000000000E 02) I
1/A                  =  .20000000000000000E 00 + (-.40000000000000000E 00) I
A/B                  =  .44000000000000000E 00 + ( .80000000000000000E-01) I

```

If you would like your eyes burned further, you can access the full program on github here. It takes almost 200 lines of code to do almost nothing.

## Unexpected COBOL implicit operator distribution!

Another day, another surprise from COBOL.  I was debugging a failure in a set of COBOL programs, and it seemed that the place things started going wrong was a specific IF check, which basically looked like:

The original code was triple incomprehensible, as it:

• Was in German.
• Was in COBOL.
• Was generated by DELTA and was completely disgusting spaghetti code.  A map of the basic blocks would have looked like it was colored by a three year old vigorously scribbling with a crayon.

It turns out that there was a whole pile of error handling code that happened after the IF check, and I correctly guessed that there was something wrong with how our compiler handled the IF statement.

What I didn’t guess was what the actual operator precedence in this IF check was.  Initially, my C programmer trained mind looked at that IF condition, and said “what the hell is that!?”  I then guessed, incorrectly, that it meant:

if ( X != SPACES and X = ZERO)

where X is the array slice expression.  That interpretation did not explain the runtime failure, so I was hoping that I was both wrong about what it meant, but right that there was a compiler bug.

It turns out that in COBOL the implicit operator for the second part of the IF statement is  ‘NOT =’.  i.e. the NOT= distributes over the AND, so this IF actually meant:

if ( X != SPACES and X != ZERO)

In the original program, that IF condition actually makes sense.  After some reflection, I see there is some sense to this distribution, but it certainly wasn’t intuitive after programming C and C++ for 27 years. I’d argue that the root cause of the confusion is COBOL itself. A real programming language would use a temporary variable for the really long array slice expression, which would obliterate the need for counter-intuitive operator distribution requirements. Imagine something like:

```  VAR X = PAYLOAD-DATA(PAYLOAD-START(TALLY): PAYLOAD-END(TALLY))

IF (X NOT = SPACES) AND (X NOT = LOW-VALUE)
NEXT SENTENCE ELSE GO TO CHECK-IT-DONE.
```

(Incidentally LOW-VALUE means binary-zero, not a ‘0’ character that has a 0xF0 value in EBCDIC).

COBOL is made especially incomprehensible since you can’t declare an in-line temporary in COBOL.  If you want one, you have to go thousands of lines up in the code to the WORKING-STORAGE section, and put a variable in there somewhere.  Such a variable is global to the whole program, and you have to search to determine it’s usage scope.  You probably also need a really long name for it because of namespace collision with all the other global variables.  Basically, you are better off not using any helper variables, unless you want to pay an explicit cost in overall code complexity.

In my test program that illustrated the compiler bug, I made other COBOL errors. I blame the fact that I was using gross GOTO ridden code like the original. Here was my program:

Because I misinterpreted the NOT= distribution, I thought this should produce:

```000000001: !(not space and low-value.)
000000002: !(not space and low-value.)
000000003: !(not space and low-value.)
000000003: not space and low-value.
```

Once the subtle compiler bug was fixed, the actual SYSOUT from the program was:

```000000001: not space and low-value.
000000001: !(not space and low-value.)
000000002: !(not space and low-value.)
000000003: !(not space and low-value.)
```

See how both the TRUE and FALSE basic blocks executed in my code. That didn’t occur in the original code, because it used an additional dummy EXIT paragraph to end the PERFORM range, and had a GOTO out of the first paragraph.

There is more modern COBOL syntax that can avoid this GOTO hell, but I hadn’t used it, as I kept the reproducer somewhat like the original code.

## peeking into relocation of function static in shared library

Here’s GUI (TUI) output of a function with a static variable access:

```B+ |0x7ffff7616800 <st32>           test   %edi,%edi                                                                                                                     |
|0x7ffff7616802 <st32+2>         je     0x7ffff7616811 <st32+17>                                                                                                      |
|0x7ffff7616804 <st32+4>         mov    %edi,%eax                                                                                                                     |
|0x7ffff7616806 <st32+6>         bswap  %eax                                                                                                                          |
|0x7ffff7616808 <st32+8>         mov    %eax,0x200852(%rip)        # 0x7ffff7817060 <st32.yst32>                                                                        |
|0x7ffff761680e <st32+14>        mov    %edi,%eax                                                                                                                     |
|0x7ffff7616810 <st32+16>        retq                                                                                                                                 |
>|0x7ffff7616811 <st32+17>        mov    0x200849(%rip),%edi        # 0x7ffff7817060 <st32.yst32>                                                                        |
|0x7ffff7616817 <st32+23>        bswap  %edi                                                                                                                          |
|0x7ffff7616819 <st32+25>        mov    %edi,%eax                                                                                                                     |
|0x7ffff761681b <st32+27>        retq                                                                                                                                 |
|0x7ffff761681c <_fini>          sub    \$0x8,%rsp                                                                                                                     |
|0x7ffff7616824 <_fini+8>        retq                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

The associated code is:

```int st32( int v ) {
static int yst32 = 0x1a2b3c4d;

if ( v ) {
yst32 = v;
}

return yst32;
}
```

The object code dump (prior to relocation) just has zeros in the offset for the variable:

```\$ objdump -d g.bs.o | grep -A12 '<st32>'
0000000000000050 <st32>:
50:   85 ff                   test   %edi,%edi
52:   74 0d                   je     61 <st32+0x11>
54:   89 f8                   mov    %edi,%eax
56:   0f c8                   bswap  %eax
58:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 5e <st32+0xe>
5e:   89 f8                   mov    %edi,%eax
60:   c3                      retq
61:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 67 <st32+0x17>
67:   0f cf                   bswap  %edi
69:   89 f8                   mov    %edi,%eax
6b:   c3                      retq
```

The linker has filled in the real offsets in question, and the dynamic loader has collaborated to put the data segment in the desired location.

The observant reader may notice bwsap instructions in the listings above that don’t make sense for x86_64 code. That is because this code is compiled with an LLVM pass that performs byte swapping at load and store points, making it big endian in a limited fashion.

The book Linkers and Loaders has some nice explanation of how relocation works, but I wanted to see the end result first hand in the debugger. It turned out that my naive expectation that the sum of \$rip and the constant relocation factor is the address of the global variable (actually static in this case) is incorrect. Check that out in the debugger:

```(gdb) p /x 0x200849+\$rip
\$1 = 0x7ffff781705a

(gdb) x/10 \$1
0x7ffff781705a <gy+26>: 0x22110000      0x2b1a4433      0x00004d3c      0x00000000
0x7ffff781706a: 0x00000000      0x00000000      0x00000000      0x30350000
0x7ffff781707a: 0x20333236      0x64655228
```

My magic value 0x1a2b3c4d looks like it is 6 bytes into the \$rip + 0x200849 location that the disassembly appears to point to, and that is in fact the case:

```(gdb) x/10 \$1+6
0x7ffff7817060 <st32.yst32>:      0x4d3c2b1a      0x00000000      0x00000000      0x00000000
0x7ffff7817070 <y32>:   0x00000000      0x00000000      0x32363035      0x52282033
0x7ffff7817080: 0x48206465      0x34207461
```

My guess was the mysterious offset of 6 required to actually find this global address was the number of bytes in the MOV instruction, and sure enough that MOV is 6 bytes long:

```(gdb) disassemble /r
Dump of assembler code for function st32:
0x00007ffff7616800 <+0>:     85 ff   test   %edi,%edi
0x00007ffff7616802 <+2>:     74 0d   je     0x7ffff7616811 <st32+17>
0x00007ffff7616804 <+4>:     89 f8   mov    %edi,%eax
0x00007ffff7616806 <+6>:     0f c8   bswap  %eax
0x00007ffff7616808 <+8>:     89 05 52 08 20 00       mov    %eax,0x200852(%rip)        # 0x7ffff7817060 <st32.yst32>
0x00007ffff761680e <+14>:    89 f8   mov    %edi,%eax
0x00007ffff7616810 <+16>:    c3      retq
=> 0x00007ffff7616811 <+17>:    8b 3d 49 08 20 00       mov    0x200849(%rip),%edi        # 0x7ffff7817060 <st32.yst32>
0x00007ffff7616817 <+23>:    0f cf   bswap  %edi
0x00007ffff7616819 <+25>:    89 f8   mov    %edi,%eax
0x00007ffff761681b <+27>:    c3      retq
End of assembler dump.
```

So, it appears that the %rip reference in the disassembly is really the value of the instruction pointer after the instruction executes, which is curious.

Note that this 4 byte relocation requires the shared library code segment and the shared library data segment be separated by no more than 4G. The linux dynamic loader has put all the segments back to back so that this is the case. This can be seen from /proc/PID/maps for the process:

```\$ ps -ef | grep maindl
pjoot    17622 17582  0 10:50 pts/3    00:00:00 /home/pjoot/workspace/pass/global/maindl libglobtestbs.so

\$ grep libglob /proc/17622/maps
7ffff7616000-7ffff7617000 r-xp 00000000 fc:00 2492653                    /home/pjoot/workspace/pass/global/libglobtestbs.so
7ffff7617000-7ffff7816000 ---p 00001000 fc:00 2492653                    /home/pjoot/workspace/pass/global/libglobtestbs.so
7ffff7816000-7ffff7817000 r--p 00000000 fc:00 2492653                    /home/pjoot/workspace/pass/global/libglobtestbs.so
7ffff7817000-7ffff7818000 rw-p 00001000 fc:00 2492653                    /home/pjoot/workspace/pass/global/libglobtestbs.so
```

We’ve got a read-execute mmap region, where the code lies, and a read-write mmap region for the data. There’s a read-only segment which I presume is for read only global variables (my shared lib has one such variable and we have one page worth of space allocated for read only memory).

I wonder what the segment that has none of the read, write, nor execute permissions set is?