
A less evil COBOL toy complex number library

December 29, 2023 COBOL

In a previous post ‘The evil of COBOL: everything is in global variables’, I discussed the implementation of a toy complex number library in COBOL.

That example code was a single module, with one paragraph for each function. I used a naming convention to work around the fact that COBOL functions (paragraphs) are completely braindead and have no input or output parameters, and that all such functions in a given load module have access to all the variables of the entire program.

Perhaps you'd like to call me out on my claim that COBOL doesn't have parameters, nor return values.  That's true if you consider COBOL paragraphs to be the equivalent of functions.  I've heard paragraphs described as not-really-functions, and there's some truth to that, especially since you can PERFORM a range that executes a set of paragraphs, with non-intuitive control flow transfers between the paragraphs of such a range, which is entirely unlike a function.

There is one circumstance where COBOL parameters can be found.  It is actually possible to have both input and output parameters in COBOL, but it can only be done at the program level (i.e., the equivalent of int main( int argc, char ** )). So you can write a less braindead COBOL library, with a set of meaningful input and output parameters for each function, by using CALL instead of PERFORM, and a whole set of external programs, one for each of the desired operations. With that in mind, I've reworked my COBOL complex number toy library to use this program-level organization.  This is still a toy library implementation, but it serves to illustrate the ideas.  The previous paragraph implementation can be found in the same repository, in the ../paragraphs-as-library/ directory.
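To sketch the difference in more familiar terms (an illustrative C++ analogy only, not part of the library): a COBOL paragraph behaves roughly like the first function below, with no parameters, no return value, and all state in globals, while a CALLed subprogram behaves more like the second, with explicit inputs and outputs.

#include <iostream>

// "Paragraph" style: no parameters, no return value, all state is global.
double g_a_re, g_a_im, g_b_re, g_b_im, g_result_re, g_result_im;

void multiply_paragraph()
{
   g_result_re = g_a_re * g_b_re - g_a_im * g_b_im;
   g_result_im = g_a_im * g_b_re + g_a_re * g_b_im;
}

// "CALLed program" style: explicit inputs and an explicit output.
struct complex{ double re_; double im_; };

void multiply_program( const complex & a, const complex & b, complex & result )
{
   result.re_ = a.re_ * b.re_ - a.im_ * b.im_;
   result.im_ = a.im_ * b.re_ + a.re_ * b.im_;
}

int main()
{
   g_a_re = 1; g_a_im = 2; g_b_re = 3; g_b_im = 4;
   multiply_paragraph();
   std::cout << g_result_re << " +(" << g_result_im << ") I\n";

   complex c;
   multiply_program( complex{1,2}, complex{3,4}, c );
   std::cout << c.re_ << " +(" << c.im_ << ") I\n";

   return 0;
}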

Here are some examples of the functions in this little library, and examples of calls to them.

Multiply code:

And here’s a call to it:

Notice that I’ve opted to use dynamic calls to the COBOL functions, using a copybook that lists all the possible function names:

This frees me from the constraint of having to use inscrutable 8-character function names, which will get confusing as the library grows.

Like everything in COBOL, the verbosity makes it fairly unreadable, but refactoring all paragraphs into external programs does make the calling code, and even the library functions themselves, much more readable.  It still takes 49 lines of code to initialize two complex numbers, multiply them, and display them to stdout.

Compare to the same thing in C++, which is 18 lines for a grow-your-own complex implementation with multiply:

#include <iostream>

struct complex{
   double re_;
   double im_;
};

complex mult(const complex & a, const complex & b) {
   // (a + b i)(c + d i) = a c - b d + i( b c + a d) 
   return complex{ a.re_ * b.re_ - a.im_ * b.im_,
                   a.im_ * b.re_ + a.re_ * b.im_ };
}

int main()
{
   complex a{1,2};
   complex b{3,4};
   complex c = mult(a, b);
   std::cout << "c = " << c.re_ << " +(" << c.im_ << ") I\n";

   return 0;
}

and only 11 lines if we use the standard library complex implementation:

#include <iostream>
#include <complex>

using complex = std::complex<double>;

int main() 
{  
   complex a{1,2}; 
   complex b{3,4};
   complex c = a * b;
   std::cout << "c = " << c << "\n";

   return 0;
}

Basically, we have one line for each operation: init, init, multiply, display, and all the rest is one-time fluff (the includes, main decl, return, …)

It turns out that the so-called OBJECT oriented COBOL extension to the language (circa Y2K) is basically a packaging of external-style programs into collections that are class-prefixed, just as I've done above.  This provides the capability for information hiding, and allows functions to have parameters and return values.  However, it doesn't try to rectify the fundamental failure of the COBOL language: everything has to be in a global variable.  This language extension appears to be a hack that may have been done primarily for Java integration, which is probably why nobody uses it.  You just can't take the dinosaur out of COBOL.

Sadly, it didn’t take people long to figure out that it’s incredibly dumb to require all variables to be global.  Even PL/I, which is 59 years old at the time I write this (only five years younger than COBOL), got it right.  It added parameters and return values to functions, and allows functions to have variables limited to their own scope.  PL/I probably went too far, and added lots of features that are also braindead (like the PL/I macro preprocessor), but the basic language is at least sane.  It’s interesting that COBOL never evolved.  A language like C++ may have evolved too much, and still is evolving, but the most flagrant design flaw in the COBOL language has been there since inception, despite every other language in the world figuring out that sort of stupidity should not be propagated.

Note that I work on the development of a COBOL and PL/I compilation stack.  I really like my work, which is challenging and great fun, and I work with awesome people. That doesn’t stop me from acknowledging that COBOL is a language spawned in hell by Satan. I can love my work, which provides tools that allow customers to develop, maintain, and debug COBOL code, but also have great pity and remorse for those customers, who have inherited ancient code built with an ancient language, and have no easy migration path away from that language.

C++ compiler diagnostic gone horribly wrong: error: explicit specialization in non-namespace scope

September 23, 2022 C/C++ development and debugging.

Here is a g++ error message that took me an embarrassingly long time to figure out:

In file included from /home/llvm-project/llvm/lib/IR/Constants.cpp:15:
/home/llvm-project/llvm/lib/IR/LLVMContextImpl.h:447:11: error: explicit specialization in non-namespace scope ‘struct llvm::MDNodeKeyImpl<llvm::DIBasicType>’
 template <> struct MDNodeKeyImpl<DIStringType> {
           ^

This is the code:

template <> struct MDNodeKeyImpl<DIStringType> {
  unsigned Tag;
  MDString *Name;
  Metadata *StringLength;
  Metadata *StringLengthExp;
  Metadata *StringLocationExp;
  uint64_t SizeInBits;
  uint32_t AlignInBits;
  unsigned Encoding;

This specialization isn’t materially different than the one that preceded it:

template <> struct MDNodeKeyImpl<DIBasicType> {
  unsigned Tag;
  MDString *Name;
  MDString *PictureString;
  uint64_t SizeInBits;
  uint32_t AlignInBits;
  unsigned Encoding;
  unsigned Flags;
  Optional<DIBasicType::DecimalInfo> DecimalAttrInfo;

  MDNodeKeyImpl(unsigned Tag, MDString *Name, MDString *PictureString,
               uint64_t SizeInBits, uint32_t AlignInBits, unsigned Encoding,
                unsigned Flags,
                Optional<DIBasicType::DecimalInfo> DecimalAttrInfo)
      : Tag(Tag), Name(Name), PictureString(PictureString),
        SizeInBits(SizeInBits), AlignInBits(AlignInBits), Encoding(Encoding),
        Flags(Flags), DecimalAttrInfo(DecimalAttrInfo) {}
  MDNodeKeyImpl(const DIBasicType *N)
      : Tag(N->getTag()), Name(N->getRawName()), PictureString(N->getRawPictureString()), SizeInBits(N->getSizeInBits()),
        AlignInBits(N->getAlignInBits()), Encoding(N->getEncoding()),
        Flags(N->getFlags(), DecimalAttrInfo(N->getDecimalInfo()) {}

  bool isKeyOf(const DIBasicType *RHS) const {
    return Tag == RHS->getTag() && Name == RHS->getRawName() &&
           PictureString == RHS->getRawPictureString() &&
           SizeInBits == RHS->getSizeInBits() &&
           AlignInBits == RHS->getAlignInBits() &&
           Encoding == RHS->getEncoding() && Flags == RHS->getFlags() &&
           DecimalAttrInfo == RHS->getDecimalInfo();
  }

  unsigned getHashValue() const {
    return hash_combine(Tag, Name, SizeInBits, AlignInBits, Encoding);
  }
};

However, there is an error hiding above it on this line:

        Flags(N->getFlags(), DecimalAttrInfo(N->getDecimalInfo()) {}

i.e. a single missing closing parenthesis in the initializer for the Flags member, a consequence of a cut and paste during a rebase that clobbered that one character while adding a comma after it.

It turns out that the compiler was giving me a hint that something was wrong before this in the message:

error: explicit specialization in non-namespace scope

as it states that the scope is:

‘struct llvm::MDNodeKeyImpl<llvm::DIBasicType>’

which is the previous class definition. Inspection of the code made me think that the scope was ‘namespace llvm {…}’, and I’d gone looking for a rebase error that would have incorrectly terminated that llvm namespace scope. This is a classic example of not paying enough attention to what is in front of you, and going off looking based on hunches instead. I didn’t understand the compiler message, but in retrospect, non-namespace scope meant that the parser still thought it was inside the previous struct, i.e. that something in that scope was incomplete. The compiler wasn’t smart enough to tell me that the previous specialization hadn’t been properly terminated due to the missing parenthesis, but it did tell me that something was wrong in that previous specialization (which was explicitly named), and I didn’t look at that because of my “what the hell does that mean” reaction to the compilation error message.

In this case, I was building on RHEL 8.3, which uses an ancient GCC toolchain. I wonder if newer versions of g++ fare better (i.e., a message like “possibly unterminated parenthesis on line …” would have been much nicer)? I wasn’t able to try with clang++, as I was building llvm+clang+lldb (V14) and had uninstalled all of the llvm-related toolchain to avoid interference.
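For reference, here’s a stripped-down sketch (hypothetical names, not the llvm code) of the same shape of problem. The exact error cascade depends on the g++ version, but the idea is that the unbalanced parenthesis in the first specialization’s constructor leaves the parser thinking it is still inside that class, so the next template <> is seen at class scope rather than namespace scope:

namespace demo {

struct Source { int flags() const { return 0; } };

template <typename T> struct Key;

template <> struct Key<int> {
  int Flags;
  int Extra;

  // Deliberate bug: the ')' closing Flags(... is missing, mirroring the
  // clobbered character described above, so this constructor never ends.
  Key(const Source &s) : Flags(s.flags(), Extra(s.flags()) {}
};

// With the parser still inside Key<int>, this line draws a diagnostic along
// the lines of:
//   error: explicit specialization in non-namespace scope 'struct demo::Key<int>'
template <> struct Key<long> {
  long Flags;
};

} // namespace demo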

The C compiler is too forgiving! sizeof(variable_name+1) allowed?

April 28, 2022 C/C++ development and debugging.

I carelessly passed:

sizeof(s.st_size+1)

to an allocator call, instead of:

s.st_size+1

and corrupted memory nicely.

What the hell would sizeof(variable+1) even mean, and why on earth would the compiler think that is anything close to valid? Both gcc and clang, each with -Wall, are completely quiet about this error!
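To spell out why the compiler is within its rights to stay quiet: sizeof accepts any expression and yields the size of that expression’s type, so sizeof(s.st_size + 1) is just sizeof(off_t), typically 8, no matter how big the file is. A minimal sketch of the failure (hypothetical file name and allocator use):

#include <sys/stat.h>
#include <cstdio>
#include <cstdlib>

int main()
{
   struct stat s {};

   if ( stat( "some_big_file.dat", &s ) != 0 )
      return 1;

   // Bug: this allocates sizeof(off_t) bytes (typically 8), not st_size+1 bytes.
   char * buf = (char *)malloc( sizeof( s.st_size + 1 ) );
   // Intended: char * buf = (char *)malloc( s.st_size + 1 );

   printf( "allocated %zu bytes for a %lld byte file\n",
           sizeof( s.st_size + 1 ), (long long)s.st_size );

   // Reading the whole file into buf now scribbles far past the 8 allocated
   // bytes, corrupting the heap exactly as described above.

   free( buf );
   return 0;
}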

Debugging a C coding error from an XPLINK assembly listing.

March 12, 2021 C/C++ development and debugging, Mainframe

There are at least two\({}^1\) z/OS C calling conventions, the traditional “LE” OSLINK calling convention, and the newer\({}^2\) XPLINK convention.  In the LE calling convention, parameters aren’t passed in registers, but in an array pointed to by R1.  Here’s an example of an OSLINK call to strtof():

*  float strtof(const char *nptr, char **endptr);
LA       r0,ep(,r13,408)
LA       r2,buf(,r13,280)
LA       r4,#wtemp_1(,r13,416)
L        r15,=V(STRTOF)(,r3,4)
LA       r1,#MX_TEMP3(,r13,224)
ST       r4,#MX_TEMP3(,r13,224)
ST       r2,#MX_TEMP3(,r13,228)
ST       r0,#MX_TEMP3(,r13,232)
BASR     r14,r15
LD       f0,#wtemp_1(,r13,416)

R1 points to r13 + 224 (a location on the stack). If the original call was:

float f = strtof( mystring, &err );

The compiler has internally translated it into something of the form:

STRTOF( &f, mystring, &err );

where all of {&f, mystring, &err} are stuffed into the memory starting at the 224(R13) location. Afterwards the value has to be loaded from memory into a floating point register (F0) so that it can be used.  Compare this to a Linux strtof() call:

* char * e = 0;
* float x = strtof( "1.0", &e );
  400b74:       mov    $0x400ef8,%edi       ; first param is address of "1.0"
  400b79:       movq   $0x0,0x8(%rsp)       ; e = 0;
  400b82:       lea    0x8(%rsp),%rsi       ; second param is &e
  400b87:       callq  400810 <strtof@plt>  ; call the function, returning a value in %xmm0

Here the input parameters are passed in RDI and RSI, and the output is returned in XMM0. Nice and simple. Since XPLINK was designed for C code, we expect it to be more sensible. Let’s see what an XPLINK call looks like. Here’s a call to fmodf:

*      float r = fmodf( 10.0f, 3.0f );
            LD       f0,+CONSTANT_AREA(,r9,184)
            LD       f2,+CONSTANT_AREA(,r9,192)
            L        r7,#Save_ADA_Ptr_9(,r4,2052)
            L        r6,=A(__fmodf)(,r7,76)
            L        r5,=A(__fmodf)(,r7,72)
            BASR     r7,r6
            NOP      9
            LDR      f2,f0
            STE      f2,r(,r4,2144)
*
*      printf( "fmodf: %g\n", (double)r );

There are some curious details that would have to be explored to understand the code above (why f0, f2, and not f0, f1?), however, the short story is that all the input and output values are in (floating point) registers.

The mystery that led me to looking at this was a malfunctioning call to strtof:

*      float x = strtof( "1.0q", &e );
            LA       r2,e(,r4,2144)
            L        r7,#Save_ADA_Ptr_12(,r4,2052)
            L        r6,=A(strtof)(,r7,4)
            L        r5,=A(strtof)(,r7,0)
            LA       r1,+CONSTANT_AREA(,r9,20)
            BASR     r7,r6
            NOP      17
            LR       r0,r3
            CEFR     f2,r0
            STE      f2,x(,r4,2148)
*
*      printf( "strtof: v: %g\n", x );

The CEFR instruction converts an integer to a (hfp32) floating point representation, so we appear to have strtof returning its value in R3, which is an integer register. That then gets copied into R0, and finally into F2 (and after that into a stack spill location before the printf call). I scratched my head about this code for quite a while, trying to figure out if the compiler had some mysterious way to make this work that I wasn’t figuring out. Eventually, I clued in. I’m so used to using a C++ compiler that I forgot about the old-style implicit int return for an unprototyped function. But I had included <stdlib.h> in this code, so strtof should have been prototyped? However, the Language Runtime reference specifies that on z/OS you need an additional define to have strtof visible:

#define _ISOC99_SOURCE
#include <stdlib.h>

Without the additional define, the call to strtof() is as if it was prototyped as:

int strtof( const char *, char ** );

My expectation is that with such a correction, the call to strtof() should return its value in f0, just like fmodf() does. The result should also not be garbage!
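For completeness, here’s a sketch of the corrected source (illustrative only; the feature test macro has to appear before the include so that <stdlib.h> declares strtof with its float return type):

/* Make strtof visible on z/OS, per the runtime reference. */
#define _ISOC99_SOURCE
#include <stdlib.h>
#include <stdio.h>

int main( void )
{
   char * e = 0;
   float x = strtof( "1.0q", &e );

   /* With a real prototype in scope, the float return should come back in f0
      (as with fmodf), instead of being misread out of an integer register. */
   printf( "strtof: v: %g\n", (double)x );

   return 0;
}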

 

Footnotes:

  1.  There is also a “metal” compiler and probably a different calling convention to go with that.  I don’t know how metal differs from XPLINK.
  2. Newer in the lifetime of the mainframe means circa 2001, which is bleeding edge given the speed at which mainframe development moves.

Playing with c++11 and posix regular expression libraries

July 24, 2016 C/C++ development and debugging.

I was curious how the c++11 std::regex interface compared to the C posix regular expression library. The c++11 interfaces are almost as easy to use as perl. Suppose we have some space-separated fields that we wish to manipulate, swapping the order of two fields and showing both the original and the switched version:

my @strings = ( "hi bye", "hello world", "why now", "one two" ) ;

foreach ( @strings )
{
   s/(\S+)\s+(\S+)/'$&' -> '$2 $1'/ ;

   print "$_\n" ;
}

The C++ equivalent is

   const char * strings[] { "hi bye", "hello world", "why now", "one two" } ;

   std::regex re( R"((\S+)\s+(\S+))" ) ;

   for ( auto s : strings )
   {
      std::cout << regex_replace( s, re, "'$&' -> '$2 $1'\n" )  ;
   }

We have one additional step with the C++ code: compiling the regular expression. Precompilation of perl regular expressions is also possible, but that is usually just a performance optimization.

The posix equivalent requires precompilation too

void posixre_error( regex_t * pRe, int rc )
{
   char buf[ 128 ] ;

   regerror( rc, pRe, buf, sizeof(buf) ) ;

   fprintf( stderr, "regerror: %s\n", buf ) ;
   exit( 1 ) ;
}

void posixre_compile( regex_t * pRe, const char * expression )
{
   int rc = regcomp( pRe, expression, REG_EXTENDED ) ;
   if ( rc )
   { 
      posixre_error( pRe, rc ) ;
   }
}

but the transform requires more work:

void posixre_transform( regex_t * pRe, const char * input )
{
   constexpr size_t N{3} ;
   regmatch_t m[N] {} ;

   int rc = regexec( pRe, input, N, m, 0 ) ;

   if ( rc && (rc != REG_NOMATCH) )
   {
      posixre_error( pRe, rc ) ;
   }

   if ( !rc )
   { 
      printf( "'%s' -> ", input ) ;
      int len ;
      len = m[2].rm_eo - m[2].rm_so ; printf( "'%.*s ", len, &input[ m[2].rm_so ] ) ;
      len = m[1].rm_eo - m[1].rm_so ; printf( "%.*s'\n", len, &input[ m[1].rm_so ] ) ;
   }
}

To get at the capture expressions we have to pass an array of regmatch_t’s. The first element of that array is the entire match expression, and we get the captures after that. The awkward thing to deal with is that each regmatch_t is a structure containing the start and end offsets within the string.

If we want more granular info from the c++ matcher, it can also provide an array of capture info. We can also get info about whether or not the match worked, something we can do easily in perl:

my @strings = ( "hi bye", "helloworld", "why now", "onetwo" ) ;

foreach ( @strings )
{
   if ( s/(\S+)\s+(\S+)/$2 $1/ )
   {
      print "$_\n" ;
   }
}  

This only prints the transformed line if the match succeeded. To do the same in C++ we can use regex_match:

const char * pattern = R"((\S+)\s+(\S+))" ;

std::regex re( pattern ) ;

for ( auto s : strings )
{ 
   std::cmatch m ;

   if ( regex_match( s, m, re ) )
   { 
      std::cout << m[2] << ' ' << m[1] << '\n' ;
   }
}

Note that we don’t have to mess around with offsets as was required with the Posix C interface, and also don’t have to worry about the size of the capture match array, since that is handled under the covers. It’s not too hard to wrap the posix C APIs in a C++ wrapper that makes them about as easy to use as the C++ regex code, but there’s little point unless you are constrained to pre-C++11 code and can also live with a Unix-only restriction. There are also portability issues with the posix APIs. For example, perl-style regular expressions like:

   R"((\S+)(\s+)(\S+))" ) ;

work fine with the Linux regex API, but that appears to be an exception. To make code using that regex work on Mac, I had to use the strict posix syntax:

   R"(([^[:space:]]+)([[:space:]]+)([^[:space:]]+))"

Actually using the Posix C interface, with a portability constraint that avoids the Linux regex extensions, would be horrendous.
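As a rough illustration of the wrapper idea mentioned above, here’s a minimal RAII sketch over the posix API (the names and error handling are mine, and it sticks to the strict posix character-class syntax for portability):

#include <regex.h>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal RAII wrapper over the posix regex API (a sketch, not a full library).
class posix_regex
{
   regex_t re_{} ;

public:
   explicit posix_regex( const std::string & pattern )
   {
      int rc = regcomp( &re_, pattern.c_str(), REG_EXTENDED ) ;
      if ( rc )
      {
         char buf[ 128 ] ;
         regerror( rc, &re_, buf, sizeof(buf) ) ;
         throw std::runtime_error( buf ) ;
      }
   }

   ~posix_regex() { regfree( &re_ ) ; }

   posix_regex( const posix_regex & ) = delete ;
   posix_regex & operator=( const posix_regex & ) = delete ;

   // Returns the whole match followed by the captures, or an empty vector
   // if the pattern did not match.
   std::vector<std::string> match( const std::string & input, size_t ncaptures ) const
   {
      std::vector<regmatch_t> m( ncaptures + 1 ) ;
      std::vector<std::string> captures ;

      int rc = regexec( &re_, input.c_str(), m.size(), m.data(), 0 ) ;
      if ( rc == 0 )
      {
         for ( const auto & r : m )
         {
            captures.push_back( input.substr( r.rm_so, r.rm_eo - r.rm_so ) ) ;
         }
      }

      return captures ;
   }
} ;

int main()
{
   posix_regex re( "([^[:space:]]+)[[:space:]]+([^[:space:]]+)" ) ;

   for ( const char * s : { "hi bye", "helloworld", "one two" } )
   {
      auto m = re.match( s, 2 ) ;

      if ( m.size() == 3 )
      {
         std::cout << m[2] << ' ' << m[1] << '\n' ;
      }
   }

   return 0 ;
}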