## An invalid transformation of a COBOL data description entry

August 28, 2020 COBOL No comments

Here’s a subtle gotcha that we saw recently.  A miraculous tool transformed some putrid DELTA generated COBOL code from GOTO soup into human readable form.  Among the transformations that this tool made were modifications to working storage data declarations (removing unused variables in the source, and simplifying some others).  One of those transformations was problematic.  In that problematic case, the pre-transformed declarations were:

This declaration is basically a union of char[8] with a structure that has four char[2]’s, with the COBOL language imposed restriction that the character values can be only numeric (EBCDIC) digits (i.e. ‘\xF0’, …, ‘\xF9’).  In the code in question none of the U044-BIS* variables (neither the first, nor the aliases) were ever used explicitly, but they were passed into another COBOL program as LINKAGE SECTION variables and used in the called program.

Here’s how the tool initially transformed the declaration:

It turns out that dropping that first PIC and removing the corresponding REDEFINES clause was an invalid transformation in this case, because the code used INITIALIZE on the level 01 object that contained these variables.

On page 177 of the “Enterprise COBOL for z/OS, V6.3 Language Reference”, we have:

FILLER
A data item that is not explicitly referred to in a program. The keyword FILLER is optional. If specified,
FILLER must be the first word following the level-number.

… snip …

In an INITIALIZE statement:
• When the FILLER phrase is not specified, elementary FILLER items are ignored.

The transformation of the code in question would have been correct provided the “INITIALIZE foo” was replaced with “INITIALIZE foo WITH FILLER”.  The bug in the tool was fixed, and the transformed code in question was, in this case, changed to drop all the aliasing:

As a side effect of encountering this issue, I learned a number of things:

• FILLER is actually a COBOL language keyword, with specific semantics, and not just a variable naming convention.
• Both ‘INITIALIZE’ and ‘INITIALIZE … WITH FILLER’ are allowed.
• INITIALIZE (without WITH FILLER) doesn’t do PIC appropriate initialization of FILLER variables (we had binary zeros instead of EBCDIC zeros as a result).

## Listing the code pages for gdb ‘set target-charset’

I wanted to display some internal state as an IBM-1141 codepage, but didn’t know the name to use.  I knew that EBCDIC-US could be used for IBM-1047, but gdb didn’t like ibm-1141:

    (gdb) set target-charset EBCDIC-US
    (gdb) p (char *)0x7ffbb7b58088
    $2 = 0x7ffbb7b58088 "{Jim ;012}", ' ' <repeats 104 times>
    (gdb) set target-charset ibm-1141
    Undefined item: "ibm-1141".

I either didn’t know or had forgotten that we can get a list of the supported codepages. The help shows this:

    (gdb) help set target-charset
    Set the target character set.
    The `target character set' is the one used by the program being debugged.
    GDB translates characters and strings between the host and target
    character sets as needed.
    To see a list of the character sets GDB supports, type `set target-charset'<TAB>

I had to hit tab twice, but after doing so, I see:

    (gdb) set target-charset
    Display all 200 possibilities? (y or n)
    1026  866  ARABIC7  CP-HU  CP1129  CP1158  CP1371  CP4517  CP856  CP903
    1046  866NAV  ARMSCII-8  CP037  CP1130  CP1160  CP1388  CP4899  CP857  CP904
    1047  869  ASCII  CP038  CP1132  CP1161  CP1390  CP4909  CP860  CP905
    10646-1:1993  874  ASMO-708  CP1004  CP1133  CP1162  CP1399  CP4971  CP861  CP912
    10646-1:1993/UCS4  8859_1  ASMO_449  CP1008  CP1137  CP1163  CP273  CP500  CP862  CP915
    437  8859_2  BALTIC  CP1025  CP1140  CP1164  CP274  CP5347  CP863  CP916
    500  8859_3  BIG-5  CP1026  CP1141  CP1166  CP275  CP737  CP864  CP918
    500V1  8859_4  BIG-FIVE  CP1046  CP1142  CP1167  CP278  CP770  CP865  CP920
    850  8859_5  BIG5  CP1047  CP1143  CP1250  CP280  CP771  CP866  CP921
    851  8859_6  BIG5-HKSCS  CP1070  CP1144  CP1251  CP281  CP772  CP866NAV  CP922
    852  8859_7  BIG5HKSCS  CP1079  CP1145  CP1252  CP282  CP773  CP868  CP930
    855  8859_8  BIGFIVE  CP1081  CP1146  CP1253  CP284  CP774  CP869  CP932
    856  8859_9  BRF  CP1084  CP1147  CP1254  CP285  CP775  CP870  CP933
    857  904  BS_4730  CP1089  CP1148  CP1255  CP290  CP803  CP871  CP935
    860  ANSI_X3.110  CA  CP1097  CP1149  CP1256  CP297  CP813  CP874  CP936
    861  ANSI_X3.110-1983  CN  CP1112  CP1153  CP1257  CP367  CP819  CP875  CP937
    862  ANSI_X3.4  CN-BIG5  CP1122  CP1154  CP1258  CP420  CP850  CP880  CP939
    863  ANSI_X3.4-1968  CN-GB  CP1123  CP1155  CP1282  CP423  CP851  CP891  CP949
    864  ANSI_X3.4-1986  CP-AR  CP1124  CP1156  CP1361  CP424  CP852  CP901  CP950
    865  ARABIC  CP-GR  CP1125  CP1157  CP1364  CP437  CP855  CP902  auto
    *** List may be truncated, max-completions reached. ***

There’s my ibm-1141 in there, but masquerading as CP1141, so I’m able to view my data in that codepage, and look up the value of characters of interest in 1141:

    (gdb) set target-charset CP1141
    (gdb) p (char *)0x7ffbb7b58088
    $3 = 0x7ffbb7b58088 "äJim       ;012ü", ' ' <repeats 104 times>
    (gdb) p /x '{'
    $4 = 0x43
    (gdb) p /x '}
    Unmatched single quote.
    (gdb) p /x '}'
    $5 = 0xdc
    (gdb) p /x *(char *)0x7ffbb7b58088
    $6 = 0xc0


I’m able to conclude that the buffer in question appears to be in CP1047, not CP1141: the first character, which is supposed to be ‘{’, has the value 0xc0, not the CP1141 value of ‘{’ (0x43).

## Using the debugger to understand COBOL level 88 semantics.

August 10, 2020 COBOL No comments

COBOL has an enumeration mechanism called a LEVEL-88 variable.  I found a few aspects of this counter-intuitive, as I’ve mentioned in a few previous posts.  With the help of the debugger, I now think I finally understand what’s going on.  Here’s an example program that uses a couple of LEVEL-88 variables:

We can use a debugger to discover the meaning of the COBOL code. Let’s start by single stepping past the first MOVE statement to just before the SET MY88-VAR-1:

Here, I’m running the program LZ000550 from a PDS named COBRC.NATIVE.LZ000550. We expect EBCDIC spaces (‘\x40’) in the FEEDBACK structure at this point and that’s what we see:

Line stepping past the SET statement, our structure memory layout now looks like:

By setting the LEVEL-88 variable to TRUE all the memory starting at the address of feedback is now overwritten by the numeric value 0x0000000141C33A3BL. If we continue the program, the SYSOUT ends up looking like:

This condition was expected.
This condition was expected.


The first ‘IF MY88-VAR-1 THEN’ fires, and after the subsequent ‘SET MY88-VAR-2 TO TRUE’, the second ‘IF MY88-VAR-2 THEN’ fires. The SET overwrites the structure memory at the point where the 88 was declared, and an IF check of that same variable name checks whether the memory in that location has the value in the 88 variable. It does not matter what the specific layout of the structure is at that point. We see that an IF check of the level-88 variable just tests whether or not the value at that address has the pattern specified in the variable.

In this case, we have only one level-88 variable with the given name in the program, so the ‘IF MY88-VAR-2 OF feedback’ that was used was redundant, and could have been coded as just ‘IF MY88-VAR-2’, or could have been coded as ‘IF MY88-VAR-2 OF CONDITION-TOKEN-VALUE of Feedback’.

We can infer that the COBOL code’s WORKING-STORAGE has the following equivalent C++ layout:

    struct CONDITION_TOKEN_VALUE
    {
        short SEVERITY;
        short MSG_NO;
        char CASE_SEV_CTL;
        char FACILITY_ID[3];
    };

    enum my88_vars
    {
        MY88_VAR_1 = 0x0000000141C33A3BL,
        MY88_VAR_2 = 0x0000000241C33A3BL
    };

    struct feedback
    {
        union {
            CONDITION_TOKEN_VALUE c;
            my88_vars e;
        } u;
        int I_S_INFO;
    };


and that the control flow of the program can be modeled as the following:

    #include <cstring> // memset
    //...
    feedback f;

    int main()
    {
        // MOVE SPACES: fill the whole structure with EBCDIC spaces.
        memset( &f, 0x40, sizeof(f) );

        // SET MY88-VAR-1 TO TRUE
        f.u.e = MY88_VAR_1;
        if ( f.u.e == MY88_VAR_2 )
        {
            impossible();
        }
        if ( f.u.e == MY88_VAR_1 )
        {
            expected();
        }

        // SET MY88-VAR-2 TO TRUE
        f.u.e = MY88_VAR_2;
        if ( f.u.e == MY88_VAR_1 )
        {
            impossible();
        }
        if ( f.u.e == MY88_VAR_2 )
        {
            expected();
        }

        return 0;
    }


Things get even more confusing if the LEVEL-88 variable specifies fewer bytes than the structure that it is embedded in.  In that case, SET of the variable pads out the structure with spaces and a check of the variable also looks for those additional trailing spaces:

The CONDITION-TOKEN-VALUE object uses a total of 8 bytes.  We can see the spaces in the display of the FEEDBACK structure if we look in the debugger:

See the four trailing 0x40 spaces here.

Incidentally, it can be hard to tell what the total storage requirements of a COBOL structure are by just looking at the code, because the mappings between digits and storage depend on the usage clauses.  If the structure also uses REDEFINES clauses (embedded unions), as was the case in the program that I was originally looking at, the debug output is also really helpful for understanding how big the various fields are, and where they are situated.

Here are a few of the lessons learned:

• You might see a check like ‘IF MY88-VAR-1 THEN’, but nothing in the program explicitly sets MY88-VAR-1. It is effectively a global variable value that is set as a side effect of some other call (in the real program I was looking at, what modified this “variable” was actually a call to CEEFMDA, an LE system service). We have pass by reference in the function calls, so it can be a reverse engineering task to read any program and figure out how any given field may have been modified, and that doesn’t get any easier by introducing LEVEL-88 variables into the mix.
• This effective enumeration mechanism is not typed in any sense. Correct use of a LEVEL-88 relies on the variables that follow it to have the appropriate types. In this case, the ‘IF MY88-VAR-1 THEN’ is essentially shorthand for:
• There is a disconnect between the variables modified or checked by a given LEVEL-88 variable reference that must be inferred by the context.
• An IF check of a LEVEL-88 variable may include an implicit check of the trailing part of the structure with EBCDIC spaces, if the fields that follow the 88 variable take more space than the value of the variable. Similarly, setting such a variable to TRUE may effectively memset a trailing subset of the structure to EBCDIC spaces.
• Exactly what is modified by a given 88 variable depends on the context.  For example, if the level 88 variables were found in a copybook, and if I had a second structure that had the same layout as FEEDBACK, with both structures including that copybook, then I’d have two instances of this “enumeration”, and would need a set of “OF foo OF BAR” type clauses to disambiguate things.  Level 88 variables aren’t like a set of C defines.  Their meaning is context dependent, even if they masquerade as constants.

## Small update to “Basic Statistical Mechanics” is now live.

August 8, 2020 Uncategorized No comments

A new version of these notes is now posted, available on amazon, leanpub, and as a free pdf:

phy452.V0.1.12.pdf, Wed Aug 5, 2020 (commit 7bbcdf66b26e950fa01ae6cbae86f987bc2c8d49)

• Fix hyphens in listing, typos in bio.
• Remove appendix part so that the index and bib aren’t grouped with the appendix.
• Tweak the preface and backcover
• Group intro probability text together, and expand on probability distribution definition.
• Remove singleton part heading so that chapters are the highest level.
• Fix pdfbookmarks for contents and list of figures (so that they don’t show up under the preface)
• Streamline FrontBack specialization.

These are mostly cosmetic changes, where my primary objective was to correct the bash listing that shows the reader how to make their own git clone of the book text.

## Leanpub editions of my books.

August 5, 2020 Uncategorized No comments

I’d had a leanpub version of my geometric algebra book available for a while and have now added editions of all my older class notes compilations that I have on amazon.  My complete leanpub selection now looks like:

I believe that leanpub essentially provides a pdf to the purchaser (I haven’t tried buying a copy to verify), and I give the pdfs away for free, so you (and I) might ask why somebody would opt to buy such a copy?

There are a few possible reasons that I can think of:

1. Many of the leanpub purchases have been above the minimum price, so at least some of the purchasers are compensating proportionally to their personal valuation of the material, and aren’t strictly trying to buy for the minimum price.
2. A leanpub purchase is subscription-like.  Anybody who purchases a copy will automatically receive any updates made, without having to check for a new version manually.
3. There is a per-book forum available for each of the books (if the author enables it.)  I didn’t realize that feature was available, and have now enabled the forum for my geometric algebra book.  I’ve also enabled a forum for each of the class notes compilations as I configured them.
4. The purchaser did not know that I also offer the pdf for free, and found the title in leanpub search, not through my website where I make that obvious.

I’ve been putting all my leanpub proceeds into my kiva loan portfolio, so if somebody had the bad luck to buy a copy of my book because of (4) above, I don’t feel very guilty about it.

## [Part 1. Arrow representation of vectors] An introduction to geometric algebra.

August 2, 2020 Geometric Algebra for Electrical Engineers No comments

This is a continuation of:

# Vectors.

Cast yourself back in time, all the way to high school, where the first definition of vector that you would have encountered was probably very similar to the one made famous by the not very villainous Vector in Despicable Me [4].  His definition was not complete, but it is a good starting point:

### Definition: Vector. A vector is a quantity represented by an arrow with both direction and magnitude.

All the operations that make vectors useful are missing from this definition, such as

• a comparison operator,
• a rescaling operation (i.e. a scalar multiplication operation that changes the length),
• an operator that provides the length of a vector,
• multiplication or multiplication-like operations.

The concept of vector, once supplemented with the operations above, will be useful since it models many directed physical quantities that we experience daily.  These include velocity, acceleration, forces, and electric and magnetic fields.

## Vector comparison.

In fig. 1.1 (a), we have three vectors, labelled $$\Ba, \Bb, \Bc$$, all with different directions and magnitudes, and in fig. 1.1 (b), those vectors have each been translated (moved without rotation or change of length) slightly. Two vectors are considered equal if they have the same direction and magnitude. That is, two vectors are equal if one is the image of the other after translation. In these figures $$\Ba \ne \Bb, \Bb \ne \Bc, \Bc \ne \Ba$$, whereas any same colored vectors are equal.

Figure 1.1 (a): Three vectors

Figure 1.1 (b): Example translations of three vectors.

## Vector (scalar) multiplication.

We can multiply vectors by scalars by changing their lengths appropriately.

In this context a scalar is a real number (this is purposefully vague, as it will be useful to allow scalars to be complex valued later.)

Using the example vectors, some rescaled vectors include $$2 \Ba, (-1) \Bb, \pi \Bc$$, as illustrated in fig. 1.2.

Figure 1.2: Scaled vectors.

Scalar multiplication implicitly provides an algorithm for addition of vectors that have the same direction, as $$s \Bx + t \Bx = (s+t) \Bx$$ for any scalars $$s, t$$. This is illustrated in fig. 1.3 where $$2 \Ba = \Ba + \Ba$$ is formed in two equivalent forms. We see that the addition of two vectors that have the same direction requires lining up those vectors head to tail. The sum of two such vectors is the vector that can be formed from the first tail to the final head.

Figure 1.3. Twice a vector.

It turns out that this arrow daisy chaining procedure is an appropriate way of defining addition for any vectors.

### Definition: Vector addition. The sum of two vectors can be found by connecting those two vectors head to tail in either order. The sum of the two vectors is the vector that can be formed by drawing an arrow from the initial tail to the final head. This can be generalized by chaining any number of vectors and joining the initial tail to the final head.

This addition procedure is illustrated in fig. 1.4, where $$\Bs = \Ba + \Bb + \Bc$$ has been formed.

Figure 1.x: Friends pulling on your arms.

## Vector subtraction.

Since we can scale a vector by $$-1$$ and we can add vectors, it is clear how to define vector subtraction.

### Definition: Vector subtraction. The difference of vectors $$\Ba, \Bb$$ is \begin{equation*} \Ba - \Bb \equiv \Ba + ((-1)\Bb). \end{equation*}

Graphically, subtracting a vector from another requires flipping the direction of the vector to be subtracted (scaling by $$-1$$), and then adding both head to tail. This is illustrated in fig. 1.5.

Figure 1.5. Vector subtraction.

## Length and what’s to come.

It is easy to compute the length of a vector that has an arrow representation.
One simply lines a ruler of appropriate units along the vector and measures.

We actually want an algebraic way of computing length, but there is some baggage required, including

• Coordinates.
• Bases (plural of basis).
• Linear dependence and independence.
• Dot product.
• Metric.

The next part of this series will cover these topics. Our end goal is geometric algebra, which allows for many coordinate free operations, but we still have to use coordinates, both to read the literature, and in practice. Coordinates and non-orthonormal bases are also a good way to introduce non-Euclidean metrics.

# References

[4] Vector; supervillain extraordinaire (Despicable Me). A quantity represented by an arrow with direction and magnitude. Youtube. URL https://www.youtube.com/watch?v=bOIe0DIMbI8. [Online; accessed 11-July-2020].