EBCDIC

EBCDIC, thou art evil.

November 18, 2020 Mainframe , , ,

Here’s a bit of innocuous code. It was being compiled with gcc -fexec-charset=1047, so all the characters and strings were being treated as EBCDIC. For example ‘0’ = ‘\xF0’.

    if (c >= '0' && c <= '9')                                                                                                         
         c -= '0';                                                                                                                    
    else if (c >= 'A' && c <= 'Z')                                                                                                    
         c -= 'A' - 10;                                                                                                               
    else if (c >= 'a' && c <= 'z')                                                                                                    
         c -= 'a' - 10;                                                                                                               
    else                                                                                                                              
         break;         

Specifying the charset is not enough to port this code to the mainframe.  The problem is that EDCDIC is completely braindead, and DOESN’T PUT THE FRIGGEN LETTERS TOGETHER!

The letters are clustered in groups:

  • a-i
  • j-r
  • s-z
  • A-I
  • J-R
  • S-Z

with whole piles of crap between each range of characters, so comparisons like c >= ‘A’ && c <= ‘Z’ are useless, as are constructions like (c-‘A’-10) since c in J-R or S-Z will break that.

Now I have a big hunt and destroy task ahead of me.  I can fix this code, but where else are problems like this lurking!

An invalid transformation of a COBOL data description entry

August 28, 2020 COBOL , , , , ,

Here’s a subtle gotcha that we saw recently.  A miraculous tool transformed some putrid DELTA generated COBOL code from GOTO soup into human readable form.  Among the transformations that this tool did, were modifications to working storage data declarations (removing unused variables in the source, and simplifying some others).  One of those transformations was problematic.  In that problematic case the pre-transformed declarations were:

This declaration is basically a union of char[8] with a structure that has four char[2]’s, with the COBOL language imposed restriction that the character values can be only numeric (EBCDIC) digits (i.e. ‘\xF0’, …, ‘\xF9’).  In the code in question none of the U044-BIS* variables (neither the first, nor the aliases) were ever used explicitly, but they were passed into another COBOL program as LINKAGE SECTION variables and used in the called program.

Here’s how the tool initially transformed the declaration:

It turns out that dropping that first PIC and removing the corresponding REDEFINES clause, was an invalid transformation in this case, because the code used INITIALIZE on the level 01 object that contained these variables.

On page 177, of the “178 Enterprise COBOL for z/OS: Enterprise COBOL for z/OS, V6.3 Language Reference”, we have:

(copyright IBM)

FILLER
A data item that is not explicitly referred to in a program. The keyword FILLER is optional. If specified,
FILLER must be the first word following the level-number.

… snip …

In an INITIALIZE statement:
• When the FILLER phrase is not specified, elementary FILLER items are ignored.

The transformation of the code in question would have been correct provided the “INITIALIZE foo” was replaced with “INITIALIZE foo WITH FILLER”.  The bug in the tool was fixed, and the transformed code in question was, in this case, changed to drop all the aliasing:

As a side effect of encountering this issue, I learned a number of things:

  • FILLER is actually a COBOL language keyword, with specific semantics, and not just a variable naming convention.
  • Both ‘INITIALIZE’ and ‘INITIALIZE … WITH FILLER’ are allowed.
  • INITIALIZE (without FILLER) doesn’t do PIC appropriate initialization of FILLER variables (we had binary zeros instead of EBCDIC zeros as a result.)

Listing the code pages for gdb ‘set target-charset’

August 21, 2020 C/C++ development and debugging. , , , , , ,

I wanted to display some internal state as an IBM-1141 codepage, but didn’t know the name to use.  I knew that EBCDIC-US could be used for IBM-1047, but gdb didn’t like ibm-1147:

(gdb) set target-charset EBCDIC-US
(gdb) p (char *)0x7ffbb7b58088
$2 = 0x7ffbb7b58088 "{Jim       ;012}", ' ' <repeats 104 times>
(gdb) set target-charset ibm-1141
Undefined item: "ibm-1141".

I’d either didn’t know or had forgotten that we can get a list of the supported codepages. The help shows this:

(gdb) help set target-charset
Set the target character set.
The `target character set' is the one used by the program being debugged.
GDB translates characters and strings between the host and target
character sets as needed.
To see a list of the character sets GDB supports, type `set target-charset'<TAB>

I had to hit tab twice, but after doing so, I see:

(gdb) set target-charset 
Display all 200 possibilities? (y or n)
1026               866                ARABIC7            CP-HU              CP1129             CP1158             CP1371             CP4517             CP856              CP903
1046               866NAV             ARMSCII-8          CP037              CP1130             CP1160             CP1388             CP4899             CP857              CP904
1047               869                ASCII              CP038              CP1132             CP1161             CP1390             CP4909             CP860              CP905
10646-1:1993       874                ASMO-708           CP1004             CP1133             CP1162             CP1399             CP4971             CP861              CP912
10646-1:1993/UCS4  8859_1             ASMO_449           CP1008             CP1137             CP1163             CP273              CP500              CP862              CP915
437                8859_2             BALTIC             CP1025             CP1140             CP1164             CP274              CP5347             CP863              CP916
500                8859_3             BIG-5              CP1026             CP1141             CP1166             CP275              CP737              CP864              CP918
500V1              8859_4             BIG-FIVE           CP1046             CP1142             CP1167             CP278              CP770              CP865              CP920
850                8859_5             BIG5               CP1047             CP1143             CP1250             CP280              CP771              CP866              CP921
851                8859_6             BIG5-HKSCS         CP1070             CP1144             CP1251             CP281              CP772              CP866NAV           CP922
852                8859_7             BIG5HKSCS          CP1079             CP1145             CP1252             CP282              CP773              CP868              CP930
855                8859_8             BIGFIVE            CP1081             CP1146             CP1253             CP284              CP774              CP869              CP932
856                8859_9             BRF                CP1084             CP1147             CP1254             CP285              CP775              CP870              CP933
857                904                BS_4730            CP1089             CP1148             CP1255             CP290              CP803              CP871              CP935
860                ANSI_X3.110        CA                 CP1097             CP1149             CP1256             CP297              CP813              CP874              CP936
861                ANSI_X3.110-1983   CN                 CP1112             CP1153             CP1257             CP367              CP819              CP875              CP937
862                ANSI_X3.4          CN-BIG5            CP1122             CP1154             CP1258             CP420              CP850              CP880              CP939
863                ANSI_X3.4-1968     CN-GB              CP1123             CP1155             CP1282             CP423              CP851              CP891              CP949
864                ANSI_X3.4-1986     CP-AR              CP1124             CP1156             CP1361             CP424              CP852              CP901              CP950
865                ARABIC             CP-GR              CP1125             CP1157             CP1364             CP437              CP855              CP902              auto
*** List may be truncated, max-completions reached. ***

There’s my ibm-1141 in there, but masquerading as CP1141, so I’m able to view my data in that codepage, and lookup the value of characters of interest in 1141:

(gdb) set target-charset CP1141
(gdb) p (char *)0x7ffbb7b58088
$3 = 0x7ffbb7b58088 "äJim       ;012ü", ' ' <repeats 104 times>
(gdb) p /x '{'
$4 = 0x43
(gdb) p /x '}
Unmatched single quote.
(gdb) p /x '}'
$5 = 0xdc
(gdb) p /x *(char *)0x7ffbb7b58088
$6 = 0xc0

I’m able to conclude that the buffer in question appears to be in CP1047, not CP1141 (the first character, which is supposed to be ‘{‘ doesn’t have the CP1141 value of ‘{‘).

File organization in really old COBOL code.

May 7, 2020 Mainframe , , , , , , , , , , , , ,

I encountered customer COBOL code today with a file declaration of the following form:

000038   SELECT AUSGABE ASSIGN TO UR-S-AUSGABE            
000039    ACCESS IS SEQUENTIAL.                   
...
000056 FD  AUSGABE                                                     
000057     RECORDING F                                                  
000058     BLOCK 0 RECORDS                                              
000059     LABEL RECORDS OMITTED.                                       

where the program’s JCL used an AUSGABE (German “output”) DDNAME of the following form:

//AUSGABE   DD    DUMMY

The SELECT looked completely wrong to me, as I thought that SELECT is supposed to have the form:

SELECT cobol-file-variable-name ASSIGN TO ddname

That’s the syntax that my Murach’s Mainframe COBOL uses, and also what I’d seen in big-blue’s documentation.

However, in this customer’s code, the identifier UR-S-AUSGABE is longer than 8 characters, so it sure didn’t look like a DDNAME. I preprocessed the code looking to see if UR-S-AUSGABE was hiding in a copybook (mainframe lingo for an include file), but it wasn’t. How on Earth did this work when it was compiled and run on the original mainframe?

It turns out that [LABEL-]S- or [LABEL]-AS- are ways that really old COBOL code used to specify file organization (something like PL/I’s ENV(ORGANIZATION) clauses for FILEs). This works on the mainframe because a “modern” mainframe COBOL compiler strips off the LABEL- prefix if specified and the organization prefix S- as well, essentially treating those identifier fragments as “comments”.

For anybody reading this who has only programmed in a sane programming language, on sane operating systems, this all probably sounds like verbal diarrhea.  What on earth is a file organization and ddname?  Do I really have to care about those just to access a file?  Well, on the mainframe, yes, you do.

These mysterious dependencies highlight a number of reasons why COBOL code is hard to migrate. It isn’t just a programming language, but it is tied to the mainframe with lots of historic baggage in ways that are very difficult to extricate.  Even just to understand how to open a file in mainframe COBOL you have a whole pile of obstacles along the learning curve:

  • You don’t just run the program in a shell, passing in arguments, but you have to construct a JCL job step to do so.  This specifies parameters, environment variables, file handles, and other junk.
  • You have to know what a DDNAME is.  This is like a HANDLE in the JCL code that refers to a file.  The file has a filename (DSNAME), but you don’t typically use that.  Instead the JCL’s job step declares an arbitrary DDNAME to refer to that handle, and the program that is run in that job step has to always refer to the file using that abstract handle.
  • The file has all sorts of esoteric attributes that you have to know about to access it properly (fixed, variable, blocked, record length, block size, …).  The program that accesses the file typically has to make sure that these attributes are all encoded with the equivalent language specific syntax.
  • Files are not typically just byte streams on the mainframe but can have internal structure that can be as complicated as a simple database (keyed records, with special modes to access them to initialize vs access/modify.)
  • To make life extra “fun”, files are found in a variety of EBCDIC code pages.  In some cases these can’t be converted to single byte iso-8859-X code pages, so you have to use utf-8, and can get into trouble if you want to do round trip conversions.
  • Because of the internal structure of a mainframe file, you may not be able to transfer it to a sane operating system unless special steps are taken.  For example, a variable format file with binary data would typically have to be converted to a fixed format representation so that it’s possible to seek from record to record.
  • Within the (COBOL) code you have three sets of attributes that you have to specify to “declare” a file, before you can even attempt to open it: the DDNAME to COBOL-file-name mapping (SELECT), the FD clause (file properties), and finally record declarations (global variables that mirror the file data record structure that you have to use to read and write the file.)

You can’t just learn to program COBOL, like you would any sane programming language, but also have to learn all the mainframe concepts that the COBOL code is dependent on.  Make sure you are close enough to your eyewash station before you start!

Mainframe development: a story, chapter 1.

April 19, 2018 Mainframe , , , , , , , , , , , , ,

Once upon a time, in a land far from any modern developers, were languages named COBOL and PL/I, which generated programs that were consumed by a beast known as Mainframe. Developers for those languages compiled and linked their applications huddled around strange luminous green screens and piles of hole filled papers while chanting vaguely latin sounding incantations like “Om-padre-JCL-beget-loadmodule-pee-dee-ess.”

In these ancient times, version control tools like git were not available. There was no notion of makefiles, so compilation and link was a batch process, with no dependency tracking, and no parallelism. Developers used printf-style debugging, logging trace information to files.  In order to keep the uninitiated from talking to the Mainframe, files were called datasets.  In order to use graphical editors, developers had to repeatedly feed their source to the Mainframe using a slave named ftp, while praying that the evil demon EBCDIC-conversion didn’t mangle their work. The next day, they could go back and see if Mainframe accepted their offering.

[TO BE CONTINUED.]

Incidentally, as of a couple days ago, I’ve now been working for LzLabs for 2 years.  My work is not yet released, nor announced, so I can’t discuss it here yet, but it can be summarized as really awesome.  I’m still having lots of fun with my development work, even if I have to talk in languages that the beast understands.