c++11

Playing with c++11 and posix regular expression libraries

July 24, 2016 C/C++ development and debugging. No comments , , , , , , , , ,

I was curious how the c++11 std::regex interface compared to the C posix regular expression library. The c++11 interfaces are almost as easy to use as perl. Suppose we have some space separated fields that we wish to manipulate, showing an order switch and the original:

my @strings = ( "hi bye", "hello world", "why now", "one two" ) ;

foreach ( @strings )
{
   s/(\S+)\s+(\S+)/'$&' -> '$2 $1'/ ;

   print "$_\n" ;
}

The C++ equivalent is

   const char * strings[] { "hi bye", "hello world", "why now", "one two" } ;

   std::regex re( R"((\S+)\s+(\S+))" ) ;

   for ( auto s : strings )
   {
      std::cout << regex_replace( s, re, "'$&' -> '$2 $1'\n" )  ;
   }

We have one additional step with the C++ code, compiling the regular expression. Precompilation of perl regular expressions is also possible, but that is usually just as performance optimization.

The posix equivalent requires precompilation too

void posixre_error( regex_t * pRe, int rc )
{
   char buf[ 128 ] ;

   regerror( rc, pRe, buf, sizeof(buf) ) ;

   fprintf( stderr, "regerror: %s\n", buf ) ;
   exit( 1 ) ;
}

void posixre_compile( regex_t * pRe, const char * expression )
{
   int rc = regcomp( pRe, expression, REG_EXTENDED ) ;
   if ( rc )
   { 
      posixre_error( pRe, rc ) ;
   }
}

but the transform requires more work:

void posixre_transform( regex_t * pRe, const char * input )
{
   constexpr size_t N{3} ;
   regmatch_t m[N] {} ;

   int rc = regexec( pRe, input, N, m, 0 ) ;

   if ( rc && (rc != REG_NOMATCH) )
   {
      posixre_error( pRe, rc ) ;
   }

   if ( !rc )
   { 
      printf( "'%s' -> ", input ) ;
      int len ;
      len = m[2].rm_eo - m[2].rm_so ; printf( "'%.*s ", len, &input[ m[2].rm_so ] ) ;
      len = m[1].rm_eo - m[1].rm_so ; printf( "%.*s'\n", len, &input[ m[1].rm_so ] ) ;
   }
}

To get at the capture expressions we have to pass an array of regmatch_t’s. The first element of that array is the entire match expression, and then we get the captures after that. The awkward thing to deal with is that the regmatch_t is a structure containing the start end end offset within the string.

If we want more granular info from the c++ matcher, it can also provide an array of capture info. We can also get info about whether or not the match worked, something we can do in perl easily

my @strings = ( "hi bye", "helloworld", "why now", "onetwo" ) ;

foreach ( @strings )
{
   if ( s/(\S+)\s+(\S+)/$2 $1/ )
   {
      print "$_\n" ;
   }
}  

This only prints the transformed line if there was a match success. To do this in C++ we can use regex_match

const char * pattern = R"((\S+)\s+(\S+))" ;

std::regex re( pattern ) ;

for ( auto s : strings )
{ 
   std::cmatch m ;

   if ( regex_match( s, m, re ) )
   { 
      std::cout << m[2] << ' ' << m[1] << '\n' ;
   }
}

Note that we don’t have to mess around with offsets as was required with the Posix C interface, and also don’t have to worry about the size of the capture match array, since that is handled under the covers. It’s not too hard to do wrap the posix C APIs in a C++ wrapper that makes it about as easy to use as the C++ regex code, but unless you are constrained to using pre-C++11 code and can also live with a Unix only restriction. There are also portability issues with the posix APIs. For example, the perl-style regular expressions like:

   R"((\S+)(\s+)(\S+))" ) ;

work fine with the Linux regex API, but that appears to be an exception. To make code using that regex work on Mac, I had to use strict posix syntax

   R"(([^[:space:]]+)([[:space:]]+)([^[:space:]]+))"

Actually using the Posix C interface, with a portability constraint that avoids the Linux regex extensions, would be horrendous.

some c++11 standard library notes

July 9, 2016 C/C++ development and debugging. No comments , , , , ,

Some notes on Chapter 31, 32 (standard library, STL) of Stroustrup’s “The C++ Programming Language, 4th edition”.

Emplace

I’d never heard the word emplace before, but it turns out that it isn’t a word made up for c++, but is also a dictionary word, meaning to “put into place or position”.

c++11 defines some emplace functions. Here’s an example for vector

#include <vector>
#include <iostream>

int main()
{
   using pair = std::pair<int, int> ;
   using vector = std::vector< pair > ;

   vector v ;

   pair p{ 1, 2 } ;
   v.push_back( p ) ;
   v.push_back( {2, 3} ) ;
   v.emplace_back( 3, 4 ) ;

   for ( auto e : v )
   {
      std::cout << e.first << ", " << e.second << '\n' ;
   }

   return 0 ;
}

The emplace_back is like the push_back function, but does not require that a constructed object be created first, either explicitly as in the object p above, or implictly as done with the {2, 3} pair initializer list.

multimap

I’d written some perl code the other day when I wanted a hash that had multiple entries per key. Since my hashed elememts were simple, I just strung them together as comma separated entries (I could have also used a hash of array references). It looks like c++11 builds exactly the construct that I wanted into STL, and has both a multimap and unordered_multimap. Here’s an example of the latter

#include <unordered_map>
#include <string>
#include <iostream>

int main()
{
   std::unordered_multimap< int, std::string > m ;

   m.emplace( 3, "hi" ) ;
   m.emplace( 3, "bye" ) ;
   m.emplace( 4, "wow" ) ;

   for ( auto & v : m )
   {
      std::cout << v.first << ": " << v.second << '\n' ;
   }
  
   for ( auto f{ m.find(3) } ; f != m.end() ; ++f )
   {
      std::cout << "find: " << f->first << ": " << f->second << '\n' ;
   }
   
   return 0 ;
} 

Running this gives me

$ ./a.out 
4: wow
3: hi
3: bye
find: 3: hi
find: 3: bye

Observe how nice auto is here. I don’t have to care what the typename for the unordered_multimap find result is. According to gdb that type is:

(gdb) whatis f
type = std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<int, std::__1::basic_string<char> >, void*>*> >

Yikes!

STL

The STL chapter outlines lots of different algorithms. One new powerful feature in c++11 is that the Lambdas can be used instead of predicate function objects, which is so much cleaner. I used that capability in a scientific computing programming assignment earlier this year with partial_sort.

The find_if_not algorthim caught my eye, because I just manually coded exactly that sort of loop translating intel assembly that used ‘REPL SCASB’ instructions, and that code was precisely of this find_if_not form. The c++ equivalent of the assembly was roughly of the following form:

int scan3( const std::string & s, char v )
{
   auto p = s.begin() ;
   for ( ; p != s.end() ; p++ )
   {
      if ( *p != v )
      {
         break ; 
      }
   }

   if ( p == s.end() )
   {
      return 0 ;
   }
   else
   {
      std::cout << "diff: " << p - s.begin() << '\n' ;

      return ( v > *p ) ? 1 : -1 ;
   }

Range for can also be used for this loop, but it is only slightly clearer:

int scan2( const std::string & s, char v )
{
   auto p = s.begin() ;
   for ( auto c : s )
   {
      if ( c != v )
      {
         break ;
      }

      p++ ;
   }

   if ( p == s.end() )
   { 
      return 0 ;
   }
   else
   { 
      std::cout << "diff: " << p - s.begin() << '\n' ;

      return ( v > *p ) ? 1 : -1 ;
   }
}

An STL version of this loop that uses a lambda predicate is

int scan( const std::string & s, char v )
{
   auto i = find_if_not( s.begin(),
                         s.end(),
                         [ v ]( char c ){ return c == v ; }
                       ) ;

   if ( i == s.end() )
   { 
      return 0 ;
   }
   else
   { 
      std::cout << "diff: " << i - s.begin() << '\n' ;

      return ( v > *i ) ? 1 : -1 ;
   }
}

I don’t really think that this is any more clear than explicit for loop versions. All give the same results when tried:

int main()
{
   std::vector< std::function< int( const std::string &, char ) > > v { scan, scan2, scan3 } ;

   for ( auto f : v )
   { 
      int r0 = f( "nnnnn", 'n' ) ;
      int rp = f( "nnnnnmmm", 'n' ) ;
      int rn = f( "nnnnnpnn", 'n' ) ;

      std::cout << r0 << '\n' ;
      std::cout << rp << '\n' ;
      std::cout << rn << '\n' ;
   }

   return 0 ;
}

The compiler does almost the same for all three implementations. With the cout’s removed, and compiling with optimization, the respective instruction counts are:

(gdb) p 0xee3-0xe70
$1 = 115
(gdb) p 0xf4c-0xef0
$2 = 92
(gdb) p 0xfc3-0xf50
$3 = 115

The listings for the STL and the C style for loop are almost the same. The Apple xcode 7 compiler seems to produce slightly more compact code for the range-for version of this function for reasons that are not obvious to me.

c++11 virtual function language changes.

July 1, 2016 C/C++ development and debugging. No comments , , , , , , ,

Chapter 20 of Stroustrup’s book covers a few more new (to me) c++11 features:

  1. override
  2. final
  3. use of using statements for access control.
  4. pointer to member (for data and member functions)

override

The override keyword is really just to make it clear when you are providing a virtual function override.  Because the use of virtual at an override point is redundant, people have used that to explicitly show that the intent is to show the function overrides a base class function. However, if the have the interface erroneously different in the second specification, the use of virtual there means that you are defining a new virtual function.  Here’s a made up example, where the integer type of a virtual function was changed “accidentally” when “overriding” a base class virtual function:

#include <stdio.h>

struct x
{
   virtual void foo( int v ) ;
} ;

struct y : public x
{
   virtual void foo( long v ) ;
} ;

void x::foo( int v ) { printf( "x::foo:%d\n", v ) ; }
void y::foo( long v ) { printf( "y::foo:%ld\n", v ) ; }

Now in c++11 you can be explicit that you intention is to override a base class virtual. Replace the use of the redundant virtual with the override keyword, and the compiler can now tell you if you get things mixed up:

struct x
{
   virtual void foo( int v ) ;
} ;

struct y : public x
{
   void foo( long v ) override ;
} ;

void x::foo( int v )
{
   printf( "x::foo:%d\n", v ) ;
}

void y::foo( long v )
{
   printf( "y::foo:%ld\n", v ) ;
}

This gives a nice compiler message informing you about the error:

$ c++ -std=c++11 -O2 -MMD   -c -o d.o d.cc
d.cc:10:23: error: non-virtual member function marked 'override' hides virtual member function
   void foo( long v ) override ;
                      ^
d.cc:5:17: note: hidden overloaded virtual function 'x::foo' declared here: type mismatch at 1st parameter ('int' vs 'long')
   virtual void foo( int v ) ;
                ^

final

This is a second virtual function modifier designed to cut the performance cost of using virtual functions in some situations. My experimentation with this feature shows the compilers still have more work to do optimizing away the vtable calls. I introduced a square-matrix class that had a single range virtual range checking function:

   void throwRangeError( const indexType i, const indexType j ) const
   { 
      throw rangeError{ i, j, size } ;
   }

   /**
      Introduce a virtual function that allows user selection of optional range error checking.
    */
   virtual void handleRangeError( const indexType i, const indexType j ) const
   { 
      throwRangeError( i, j ) ;
   }

   bool areIndexesOutOfRange( const indexType i, const indexType j ) const
   { 
      if ( (0 == i) or (0 == j) or (i > size) or (j > size) )
      { 
         return true ;
      }

      return false ;
   }

My intent was that a derived class could provide a no-op specialization of handleRangeError:

/**
   Explicitly unchecked matrix element access
 */
class uncheckedMatrix : public matrix
{
public:
   // inherit constructors:
   using matrix::matrix ;

   void handleRangeError( const indexType i, const indexType j ) const final
   {
   }
} ;

This derived class no longer has any virtual functions. Also note that it uses ‘using’ statements to explicitly inherit the base class constructors, which is not a default action (and recommended by Stroustrup only for classes like this that do not add any data members).

The compiler didn’t do too well with this specialization, as calls to the element access operator still took a vtable hit. Here’s some code that when passed a 3×3 matrix object includes out of range accesses:

void outofbounds( const matrix & m, const char * s )
{
   printf( "%s: %g\n", s, m(4,2) ) ;
}

void outofbounds( const checkedMatrix & m, const char * s )
{
   printf( "%s: %g\n", s, m(4,2) ) ;
}

void outofbounds( const uncheckedMatrix & m, const char * s ) noexcept
{
   printf( "%s: %g\n", s, m(4,2) ) ;
}

Here’s the code for the first (base class) matrix class that has virtual functions, but no final overrides:

0000000000000000 <outofbounds(matrix const&, char const*)>:
   0: push   %rbp
   1: mov    %rsp,%rbp
   4: push   %r14
   6: push   %rbx
   7: mov    %rsi,%r14
   a: mov    %rdi,%rbx
   d: mov    0x20(%rbx),%rax
  11: cmp    $0x3,%rax
  15: ja     2d <outofbounds(matrix const&, char const*)+0x2d>
  17: mov    (%rbx),%rax
  1a: mov    $0x4,%esi
  1f: mov    $0x2,%edx
  24: mov    %rbx,%rdi
  27: callq  *(%rax)
  29: mov    0x20(%rbx),%rax
  2d: lea    (%rax,%rax,2),%rax
  31: mov    0x8(%rbx),%rcx
  35: movsd  0x8(%rcx,%rax,8),%xmm0
  3b: lea    0x149(%rip),%rdi        # 18b <__clang_call_terminate+0xb>
         3e: DISP32  .cstring-0x18b
  42: mov    $0x1,%al
  44: mov    %r14,%rsi
  47: pop    %rbx
  48: pop    %r14
  4a: pop    %rbp
  4b: jmpq   50 <outofbounds(checkedMatrix const&, char const*)>
         4c: BRANCH32   printf

The callq instruction is the vtable call. Because this function called through the base class object, and could represent a derived class object, such a call is required. Now look at the code for the uncheckedMatrix class where the handleRangeError() had a no-op final override:

00000000000000a0 <outofbounds(uncheckedMatrix const&, char const*)>:
  a0: push   %rbp
  a1: mov    %rsp,%rbp
  a4: push   %r14
  a6: push   %rbx
  a7: mov    %rsi,%r14
  aa: mov    %rdi,%rbx
  ad: mov    0x20(%rbx),%rax
  b1: cmp    $0x3,%rax
  b5: ja     d0 <outofbounds(uncheckedMatrix const&, char const*)+0x30>
  b7: mov    (%rbx),%rax
  ba: mov    (%rax),%rax
  bd: mov    $0x4,%esi
  c2: mov    $0x2,%edx
  c7: mov    %rbx,%rdi
  ca: callq  *%rax
  cc: mov    0x20(%rbx),%rax
  d0: lea    (%rax,%rax,2),%rax
...

We still have an unnecessary vtable call. This must be a call to handleRangeError(), but that has a final override, and could conceivably be inlined. Some experimentation shows that it is possible to get the desired behaviour (Apple LLVM version 7.3.0 (clang-703.0.31)), but only when the final call is a leaf function. Explicit override of the base class element access operator to omit the check-and-throw logic

/**
   Explicitly unchecked matrix element access
 */
class uncheckedMatrix2 : public matrix
{
public:
   // inherit constructors:
   using matrix::matrix ;

   T operator()( const indexType i, const indexType j ) const
   { 
      return access( i, j ) ;
   }
} ;

has much less horrible code

0000000000000100 <outofbounds(uncheckedMatrix2 const&, char const*)>:
 100: push   %rbp
 101: mov    %rsp,%rbp
 104: mov    0x8(%rdi),%rax
 108: mov    0x20(%rdi),%rcx
 10c: lea    (%rcx,%rcx,2),%rcx
 110: movsd  0x8(%rax,%rcx,8),%xmm0
 116: lea    0x6e(%rip),%rdi        # 18b <__clang_call_terminate+0xb>
         119: DISP32 .cstring-0x18b
 11d: mov    $0x1,%al
 11f: pop    %rbp
 120: jmpq   125 <outofbounds(uncheckedMatrix2 const&, char const*)+0x25>
         121: BRANCH32  printf
 125: data16 nopw %cs:0x0(%rax,%rax,1)

Now we don’t have any of the vtable related epilog and prologue code, nor the indirection required to make such a call. This code isn’t pretty, but isn’t actually that much worse than raw pointer or plain vector access:

void outofbounds( const std::vector<double> m, const char * s ) noexcept
{
   printf( "%s: %g\n", s, m[ 4*3+2-1 ] ) ;
}

void outofbounds( const double * m, const char * s ) noexcept
{
   printf( "%s: %g\n", s, m[ 4*3+2-1 ] ) ;
}

The first generates code like the following:

0000000000000130 <outofbounds(std::__1::vector<double, std::__1::allocator<double> >, char const*)>:
 130: push   %rbp
 131: mov    %rsp,%rbp 
 134: mov    (%rdi),%rax  
 137: movsd  0x68(%rax),%xmm0
 13c: lea    0x48(%rip),%rdi        # 18b <__clang_call_terminate+0xb>
         13f: DISP32 .cstring-0x18b
 143: mov    $0x1,%al
 145: pop    %rbp
 146: jmpq   14b <outofbounds(std::__1::vector<double, std::__1::allocator<double> >, char const*)+0x1b>
         147: BRANCH32  printf
 14b: nopl   0x0(%rax,%rax,1)

Using vector instead of raw array access imposes only a single instruction dereference penalty:

0000000000000150 <outofbounds(double const*, char const*)>:
 150: push   %rbp
 151: mov    %rsp,%rbp
 154: movsd  0x68(%rdi),%xmm0
 159: lea    0x2b(%rip),%rdi        # 18b <__clang_call_terminate+0xb>
         15c: DISP32 .cstring-0x18b
 160: mov    $0x1,%al
 162: pop    %rbp
 163: jmpq   168 <GCC_except_table2>
         164: BRANCH32  printf

With the final override in a leaf function, or a similar explicit hiding of the base class function, we add one additional instruction overhead (one additional load).

pointer to member

This is a somewhat obscure feature. I don’t think that it is new to c++11, but I’ve never seen it used in 20 years. The only thing interesting about it is that the pointer to member objects apparently are entirely offset based, so could be used in shared memory interprocess configurations (where virtual functions cannot!)

Example of writing a class that implements c++11 range based for helpers

June 28, 2016 C/C++ development and debugging. No comments , ,

If a class provides begin and end functions returning iterator objects, and that iterator has a != operator, then the class can be used in a range based for. Here’s an example that allows for iterating over all the bits in an integer. For example, suppose that

0b10101010

is a representation of the set:

128, 32, 8, 2

or

1<<7, 1<<5, 1<<3, 1<<1

We can iterate over the set with a set of bit shifts, and use the following setup to do so

class bititer
{
   unsigned bset ;
   int cur{} ;
 
public:

   bititer( const unsigned b )
      : bset{ b }
   {
   }

   bititer & operator++()
   {
      bset >>= 1 ;
      cur++ ;

      return *this ;
   }

   unsigned operator*()
   {
      unsigned v{} ;

      if ( bset & 1 )
      {
         v = ( 1 << cur ) ;
      } 

      return v ;
   }

   bool operator !=( const bititer & b )
   {
      return ( bset != b.bset ) ;
   }
} ;

Iteration can now be done once a container adapter that provides the begin and end functions is implemented:

struct bitset
{
   unsigned bits ;

   bititer begin()
   {
      return bititer{ bits } ;
   }

   bititer end()
   {
      return bititer{ 0 } ;
   }
} ;

int main()
{
   for ( auto v : bitset{ 0b10101010 } )
   {
      std::cout << v << "\n" ;
   }

   return 0 ;
}

Note that the 0b10101010 syntax is from c++14, not c++11.

Stroustrup reading notes: delagating constructors, default, delete, move, literals

June 26, 2016 C/C++ development and debugging. No comments , , , , , , , ,

Here’s more notes from reading Stroustrup’s “The C++ Programming Language, 4th edition”

Alternate construction methods

I’d seen the new inline member initialization syntax that can be used to avoid (or simplify) explicit constructors. For example, instead of

struct physical
{
   double  c      ;  ///< wave speed
   double  tau    ;  ///< damping time
   double  x1     ;  ///< left most x value
   double  x2     ;  ///< right most x value

   /**
     set physical parameters to some defaults
    */
   physical() ;
} ;

physical::physical() :
   c{ 1.0 },
   tau{ 20.0 },
   x1{ -26.0 },
   x2{ +26.0 }
{
}

You can do

struct physical
{
   double c{ 1.0 }    ;  ///< wave speed
   double tau{ 20.0 } ;  ///< damping time
   double x1{ -26.0 } ;  ///< left most x value
   double x2{ +26.0 } ;  ///< right most x value
} ;

Much less code to write, and you can keep things all in one place. I wondered if this could be combined with constexpr, but the only way I could get that to work was to use static members, which also have to have an explicit definition (at least on Mac) to avoid a link error:

struct p2
{  
   static constexpr double x2{ +26.0 } ;  ///< right most x value
} ;
constexpr double p2::x2 ;

int main()
{  
   p2 p ;

   return p.x2 ;
}

But that is a digression. What I wanted to mention is that, while member initialization is cool, there’s more in the C++11 constructor simplification toolbox. We can write a constructor that builds on the member constructors (if any), but we can also make constructor specialations just call other constructors (called a delegating constructor), like so

struct physical
{
   double c{ 1.0 }    ;  ///< wave speed
   double tau{ 20.0 } ;  ///< damping time
   double x1{ -26.0 } ;  ///< left most x value
   double x2{ +26.0 } ;  ///< right most x value

   physical( const double cv ) : c{cv} {}
   physical( const double x1v, const double x2v ) : x1{x1v}, x2{x2v} {}

   physical( const double cv, const int m ) : physical{cv} { c *= m ; } ;
} ;

Stroustrup points out that the object is considered initialized by the time the delegating constructor is called. So if that throws, we shouldn’t get to the body of the constructor function

#include <iostream>

struct physical
{
   double c{ 1.0 }    ;  ///< wave speed

   physical( const double cv ) { throw 3 ; }

   physical( const double cv, const int m ) : physical{cv} { std::cout << "won't get here\n" ; }
} ;

int main()
try
{
   physical p{5} ;

   return 0 ;
}
catch (...)
{
   return 1 ;
}

default functions

If we define a structure with an explicit constructor with parameters, then unless explicit action is taken, this means that we no longer get a default constructor. Example:

#include <string>

struct F
{
   std::string s{} ;
   
   F( int n ) : s( n, 'a' ) {}
} ;

F x ;

This results in errors because the default constructor has been deleted by defining an explicit constructor

$ c++ -o d -std=c++11 d.cc
d.cc:10:3: error: no matching constructor for initialization of 'F'
F x ;
  ^
d.cc:7:4: note: candidate constructor not viable: requires single argument 'n', but no arguments were provided
   F( int n ) : s( n, 'a' ) {}
   ^
d.cc:3:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
struct F
       ^
d.cc:3:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
1 error generated.

We can get back the default constructor, without having to write it out explictly, by just doing:

#include <string>

struct F
{
   std::string s{} ;
   
   F( int n ) : s( n, 'a' ) {}

   F() = default ;
} ;

F x ;

It wouldn’t be a big deal to define an explicit default constructor above, just

    F() : s{} {}

but for a more complex class, being able to let the compiler do the work is nicer. Using = default also
means that the redundancy of specifying a member initializer and also having to specify the same initializer
in the default constructor member list is not required, which is nicer.

Note that like ‘= default’, you can use ‘= delete’ to tell the compiler not to generate any default for the member (or template specialization, …) if it would have if left unrestricted. This is similar to the trick of making destructors private:

class foo
{
   ~foo() ;
public:
// ...
} ;

Instead in c++11, you can write

class foo
{
public:
   ~foo() = delete ;
// ...
} ;

so instead of the compiler telling you there is unsufficent access to call the destructor, it should be able to tell you that an attempt to use a destructor for a class that has not defined one has been attempted. Note that this can be an explicitly deleted destructor, or one implicitly deleted (see below).

move operations

Back in university I once wrote a matrix class that I was proud of. It was reference counted to avoid really expensive assignment and copy construction operations, which were particularily bad for any binary operation that returned a new value

template <class T>
matrix<T> operator + ( const matrix<T> & a, const matrix<T> & b ) ;

C implementations of an addition operation (like the blas functions), wouldn’t do anything this dumb. Instead they use an interface like

template <class T>
void matrixadd( matrix<T> & r, const matrix<T> & a, const matrix<T> & b ) ;

This doesn’t have the syntactic sugar, but the performance won’t suck as it would if reference counting wasn’t used. I recall having a lot of trouble getting the reference counting just right, and had to instrument all my copy constructors, assignment operators and destructors with trace logging to get it all right. Right also depended on the compiler that was being used! I’ve still got a copy of that code kicking around somewhere, but it can stay where it is out of sight since move operations obsolete it all.

With move constructor and assignment operators, I was suprised to see them not kick in. These were the move operations

/// A simple square matrix skeleton, with instrumented copy, move, construction and destruction operators
class matrix
{
   using T = int ;                  ///< allow for easy future templatization.

   size_t            m_rows ;       ///< number of rows for the matrix.  May be zero.
   size_t            m_columns ;    ///< number of columns for the matrix.  May be zero.
   std::vector<T>    m_elem ;       ///< backing store for the matrix elements, stored in row major format.

public:

   /// move constructor to create 
   matrix( matrix && m )
      : m_rows{ m.m_rows }
      , m_columns{ m.m_columns }
      , m_elem{ std::move(m.m_elem) }
   {  
      m.m_rows = 0 ;
      m.m_columns = 0 ;
      //std::cout << "move construction: " << &m << " to " << this << " ; dimensions: (rows, columns, size) = ( " << rows() << ", " << columns() << ", " << m_elem.size() << " )\n" ;
   }

   /// move assignment operator.
   matrix & operator = ( matrix && m )
   {  
      //std::cout << "move operator=(): " << this << '\n' ;

      std::swap( m_columns, m.m_columns ) ;
      std::swap( m_rows, m.m_rows ) ;
      std::swap( m_elem, m.m_elem ) ;

      return *this ;
   }

   /// Create (dense) square matrix with the specified diagonal elements.
   matrix( const std::initializer_list<T> & diagonals )

//...
} ;

With the following code driving this

matrix f() ;
   
int m1()
{ 
   matrix x1 = f() ; 
   matrix x2 { f() } ;
      
   return x1.rows() + x2.rows() ;
}     

I was suprised to see none of my instrumentation showing for the move operations. That appears to be because the compiler is doing return value optimization, and constructing these in place in the stack storage locations of &x1, and &x2.

To get actual move construction, I have to explicitly ask for move, as in

matrix mg( {4, 5, 6} ) ;

int m0()
{
   matrix x2 { std::move( mg ) } ;

   return x2.rows() ;
}

and to get move assignment I could assign into a variable passed by reference, like

void g( matrix & m )
{
   m = matrix( {1,2,3} ) ;   
}

This resulted in a stack allocation for the diagonal matrix construction, then a move from that. For this assignment, the compiler did not have to be instructed to use a move operation (and the function was coded explicitly to prevent return value optimization from kicking in).

Note that if any of a copy, move, or destructor is defined for the class, a standards compliant compiler is supposed to also not generate any default copy, move or destructor for the class (i.e. having any such function, means that all the others are =delete unless explicitly defined).

Strange operator overload options

In a table of overloadable operators I see two weird ones:

  • ,
  • ->*

I’d never have imagined that there would be a valid reason to overload the comma operator, which I’ve only seen used in old style C macros that predated C99’s inline support. For example you could do

#define foo(x)    (f(x), g(x))

which might be equivalent to, say,

static inline int foo( int x )
{
   f( x ) ;

   return g(x) ;
}

However, sure enough a comma overloaded function is possible:

struct foo
{
   int m ;

   foo( int v = {} ) : m{v} {}

   int blah( ) const
   {
      return m + 3 ;
   }

   int operator,(const foo & f) 
   {
      return blah() + f.blah() ;
   }
} ;

int main()
{
   foo f ;
   foo g{ 7 } ;

   return f, g ;
}

This results in 7 + 0 + 3 + 3 = 13 as a return code. I don’t have any intention of exploiting this overloadable operator in any real code that I am going to write.

What is the ->* operation that can also be overloaded.

User defined literals

C++11 allows for user defined literal suffixes for constant creation, so that you could write something like

length v = 1.0009_m + 3_dm + 5.0_cm + 7_mm ;

User defined literals must begin with underscore. The system defined literals (such as the complex i, and the chrono ns from c++14) do have this underscore restriction. This is opposite to the user requirement that states no non-system code should define underscore or double-underscore prefixed symbols. I found getting the syntax right for such literals was a bit finicky. The constructor has to be constexpr, and you have to explicitly use long double or unsigned long long types in the operator parameters, as in

struct length
{
   double len {} ;

   constexpr length( double v ) : len{ v } {}
} ;

inline length operator + ( const length a, const length b )
{
   return length( a.len + b.len ) ;
}

constexpr length operator "" _m( long double v )
{
   return length{ static_cast<double>(v) } ;
}

constexpr length operator "" _dm( long double v )
{
   return length{ static_cast<double>(v/10.0) } ;
}

constexpr length operator "" _cm( long double v )
{
   return length{ static_cast<double>(v/100.0) } ;
}

constexpr length operator "" _mm( long double v )
{
   return length{ static_cast<double>(v/1000.0) } ;
}

constexpr length operator "" _m( unsigned long long v )
{
   return length{ static_cast<double>(v) } ;
}

constexpr length operator "" _dm( unsigned long long v )
{
   return length{ static_cast<double>(v/10.0) } ;
}

constexpr length operator "" _cm( unsigned long long v )
{
   return length{ static_cast<double>(v/100.0) } ;
}

constexpr length operator "" _mm( unsigned long long v )
{
   return length{ static_cast<double>(v/1000.0) } ;
}

string literals

It’s mentioned in the book that one can use an s suffix for string literals so that they have std::string type. However, what isn’t stated is that this requires both c++14 and the use of the std::literals namespace. The following illustrates how this feature can be used

#include <string>
#include <iostream>

static_assert( __cplusplus >= 201402L, "require c++14 for string literal suffix" ) ;

using namespace std::literals ;

int main()
{
   std::string hi { "hi\n" } ;
   hi += "there"s + "\n" ;

   std::cout << hi ;

   return 0 ;
}

Note that without the literal s suffix in the string concatonation, as in

   hi += "there" + "\n" ;

This produces an error:

$ make
c++ -o d -std=c++14 d.cc
d.cc:11:18: error: invalid operands to binary expression ('const char *' and 'const char *')
   hi += "there" + "\n" ;
         ~~~~~~~ ^ ~~~~
1 error generated.
make: *** [d] Error 1

The language isn’t designed to know to promote the right hand side elements to std::string just because they are being assigned to such a type. The use of either the string literal suffix, or an explicit conversion is required, as in

hi += std::string{"there"} + "\n" ;

More C++11 notes from reading Stroustrup: nothrow, try, inline & unnamed namespace, initialized new

June 16, 2016 C/C++ development and debugging. No comments , , , , , , , , , , , , , , ,

Here’s more notes from reading Stroustrup’s “The C++ Programming Language, 4th edition”

throw() as noexcept equivalent

throw() without any exception types can be used as an equivalent to the new noexcept keyword. Stroustrup also mentions that explicit throw() clauses

void foo() throw( e1, e2 ) ;

haven’t worked out well in practise, and is deprecated.

try scopes as function body

It turns out that try clauses can be used as function bodies, as in

void foo( void )
try {
}
catch ( ... )
{
}

This can also be done for constructor and destructor bodies as in

X::X( T1 v, T2 w )
try{
 : f1( v )
 , f2( w )
}
catch ( ... )
{
}

so that a throw in the class field member construction can also be caught.

Inline (default) namespace

There is a mechanism for namespace versioning. Suppose that you want a new V2 namespace to be the default, you can do:

namespace myproject
{
   inline namespace V2
   {
      struct X { 
         int x ;
         int y ;
      } ;
      void foo( const X & ) ;
   } 

   namespace V1
   {
      struct X { 
         int x ;
      } ;

      void foo( const X & ) ;
   } 
} 

Existing callers of the library that are using V1 interfaces can continue to work unmodified, but new callers will use the V2::X and V2::foo interfaces, and the library can provide both interfaces, one for compatibility and another for new code:

void myproject::V2::foo( const myproject::V2::X & )
{
   // ...
}

void myproject::V1::foo( const myproject::V1::X & )
{
   // ...
}

Unnamed namespaces.

I’d once seen unnamed namespaces as a modern C++ (more general) replacement for static functions. To see if such namespace functions are optimized away in the same fashion as a static function, I tried

#include <stdio.h>

namespace
{
   void foo()
   {
      printf( "ns:foo\n" ) ;
   }
}

int main() 
{
   foo() ;

   return 0 ;
}

This example uses printf and not std::cout because I wanted to look at the assembly listing and cout’s listing, at least on a mac, was completely abysmal. foo() was optimized away, but that’s a lot easier to see in the C printf listing:

$ make
c++ -o n -std=c++11 -O2 n.cc

$ otool -tV n | less
n:
(__TEXT,__text) section
_main:
0000000100000f70        pushq   %rbp
0000000100000f71        movq    %rsp, %rbp
0000000100000f74        leaq    0x2b(%rip), %rdi        ## literal pool for: "ns:foo"
0000000100000f7b        callq   0x100000f84             ## symbol stub for: _puts
0000000100000f80        xorl    %eax, %eax
0000000100000f82        popq    %rbp
0000000100000f83        retq

at_quick_exit

There’s now also a mechanism to exit and avoid global destructors and atexit routines from being evaluated. Here’s an example

#include <cstdlib>
#include <iostream>

extern "C"
void normalexit()
{
   std::cout << "normalexit\n" ;
}

extern "C"
void quickCexit()
{
   std::cout << "quickCexit\n" ;
}

void quickCPPexit()
{
   std::cout << "quickCPPexit\n" ;
}

class X
{
public:
   ~X()
   {
      std::cout << "X::~X()\n" ;
   }
} x ;

int main( int argc, char ** argv )
{
   atexit( normalexit ) ;
   std::at_quick_exit( quickCexit ) ;
   std::at_quick_exit( quickCPPexit ) ;

   if ( argc == 1 )
   {
      std::quick_exit( 3 ) ;
   }

when run without arguments (argc == 1), we get

$ ./at
quickCPPexit
quickCexit

whereas if the normal exit processing is allowed to complete we see global destructors and regular atexit calls

$ ./at 1
normalexit
X::~X()

Observe, unlike atexit, which can only (portably) take extern “C” defined functions, at_quick_exit can take functions with both C and C++ linkage.

Enum default

It was not obvious to me what the default value for an enum class (or enum) should be (the first value, an invalid value, zero, …)? It turns out that the default is zero, as printed by the following fragment

#include <iostream>

enum class x { v = 1, w } ;
enum y { vv = 1, ww } ;

int main()
{
   x e1 = {} ;
   y e2 = {} ;
   std::cout << (int)e1 << '\n' ;
   std::cout << e2 << '\n' ;

   return 0 ;
}

Note that an explicit cast is required for enum class values, but not for enum, which are by default, int convertible.

default initialization with new

The uniform initializer syntax can also be used with new calls. Here’s an example with uninitialized and default initialized double allocations

#include <stdio.h>

int main()
{
   double * d1 = new double ;
   double * d2 = new double{} ;

   printf( "%g %g\n", *d1, *d2 ) ;

   return 0 ;
}

Observe that we get nice garbage values for *d1, but *d2 is always 0.0:

$ ./d
-1.49167e-154 0
$ ./d
0 0
$ ./d
1.72723e-77 0
$ ./d
-2.68156e+154 0

initializer_list

I remember really wanting a feature like this eons ago when I first wrote a matrix template class in 1st year. Here’s a sample of how it could be used

#include <iostream>
#include <vector>
#include <string>

template <unsigned r, unsigned c>
class m
{
    std::vector<double> mat ;

public:
    class bad_init {} ;
    
    m() : mat(r*c) {}

    m( std::initializer_list<double> i ) : mat( r * c ) 
    {
        if ( i.size() > ( r * c ) )
        {
            throw bad_init() ;
        }

        int p{} ;
        for ( auto v : i )
        {
            mat[ p++ ] = v ;
        }
    }

    void dump( const std::string & n ) const
    {
        const char * sep = ": " ;
        std::cout << n ;

        for ( auto v : mat )
        {
            std::cout << sep << v ;
            sep = ", " ;
        }

        std::cout << '\n' ;
    }
} ;

int main()
{
    m< 3, 2 > v1 ;
    m< 3, 2 > v2{ 0., 1., 2., 3., 4. } ;

    v1.dump( "v1" ) ;
    v2.dump( "v2" ) ;

    m< 3, 2 > v3{ 0., 1., 2., 3., 4., 5., 6., 7. } ;

    return 0 ;
}

This produces the two dumps and the expected std::terminate call for the wrong (too many) parameters on the third construction attempt

$ ./i
v1: 0, 0, 0, 0, 0, 0
v2: 0, 1, 2, 3, 4, 0
libc++abi.dylib: terminating with uncaught exception of type m<3u, 2u>::bad_init
Abort trap: 6

Notes for Stroustrup’s “The C++ Programming Language, 4th Ed.”: nothrow new, noexcept, noreturn, static cons, initializer_list

June 8, 2016 C/C++ development and debugging. No comments , , , , , , , ,

I recently purchased Stroustrup’s C++11 book [1], after borrowing it a number of times from the Markham public library (it’s very popular, and only offered for short term loan) . Here are some notes of some bits and pieces that were new to me for this round of reading.

nothrow new

In DB2 we used to have to compile with -fcheck-new or similar, because we had lots of code that predated new throwing on error (c++98). There is a form of new that explicitly doesn’t throw:

void * operator new( size_t sz, const nothrow_t &) noexcept ;

I don’t know if this was introduced in c++11. If this was a c++98 addition, then it should be used in almost all the codebases new calls. When I left DB2 there were still some platform compilers (i.e. AIX xlC which doesn’t use the clang front end like linuxppcle64 xlC) that were not c++11 capable, so if this explicit nothrow isn’t c++98, it probably can’t be used.

Unnamed function parameters

It is common to see function prototypes without named parameters, such as

void foo( int, int ) ;

I did not realize that is also possible in the function definition, as in code like the following where a parameter has been dropped or left as a placeholder for future use

void foo( int x, int )
{
   printf( "%d\n", x ) ;
}

Not naming the parameter is probably a good way to get rid of unused parameter warnings.

This is very likely not a c++11 addition. I just didn’t realize the language allowed for it, and had never seen it done.

No return attribute

Looks like __attribute__ extensions are being baked right into the language, as in

[[noreturn]] void exit( int ) ;

I wonder if this is also in the plan for C?

Thread safe static constructors

C++11 explicitly requires static variable constructors are initialized using a “call-once” mechanism

class x
{
public: 
   x() ;
} ;

void foo( void )
{
   static x v() ;
}

Here there is no data race if foo() is executed concurrently in a number of threads. I remember seeing DB2 code that did this (and opening a defect to have it “fixed”), since I had no idea if it would work. We didn’t (and couldn’t yet) use -std=c++11, so it’s anybody’s guess what that does without that option and on older pre c++11 compilers.

Implied type initializer lists.

In a previous post I mentioned the c++11 uniform initialization syntax, but the basic idea is that is instead of

int x(1) ;
int y(0) ;

or

int x = 1 ;
int y = 0 ;

c++11 now allows

int x{1} ;
int y{} ;

Here the variables are initialized with values 1, and 0 (the default). The motivation for this was to provide an initializer syntax that could be used with container classes. Here’s another variation on the initializer list initialization

int x = int{} ;
int y = int{3} ;

which can be reduced to

int x = {} ;
int y = {3} ;

where the types of the lists are implied. I don’t see much value add to use this equals-list syntax in the examples above. Where this might be useful is in templated code to provide defaults

template <typename T>
void foo( T x, T v = {} ) ;

Runtime values for default arguments.

I don’t know if this is new to C++11, but the book points out that default arguments can be runtime determined values. Initially, my thought on this was that it is good that is not well known, since it would be confusing. I did however, come up with a scenerio where this could be useful. I wrote some code like the following the other day

extern bool g ;

inline int foo( )
{
   int res = 0 ;

   if ( g )
   {
      // first option
   }
   else
   {
      // second option
   }

   return res ;
}

The global g was precomputed at the shared library startup point (effectively const without being marked so). My unit test of this code modified the value of g, which was a hack and I admit ugly. It looked like

BOOST_AUTO_TEST_CASE( basicTest )
{
   for ( auto b : {false, true} )
   {
      g = b ;

      int res = foo() ;

      BOOST_REQUIRE( res >= 0 ) ;
   }
}

This has a side effect of potentially changing the global. A different way to do this would have been

extern bool g ;

inline int foo( bool internalOverrideOfGlobalForTesting = g )
{
   int res = 0 ;

   if ( internalOverrideOfGlobalForTesting )
   {
      // first option
   }
   else
   {
      // second option
   }

   return res ;
}

The test code could then be rewritten as

BOOST_AUTO_TEST_CASE( basicTest )
{
   for ( auto b : {false, true} )
   {
      int res = foo( b ) ;

      BOOST_REQUIRE( res >= 0 ) ;
   }
}

This doesn’t touch the global (an internal value), but still would have allowed for testing both codepaths. The fact that this “feature” exists may not actually be in this case, since my interface was a C interface. Does a

noexcept

Functions that intend to provide a C interface can use the noexcept keyword. That allows the compiler to enforce the fact that such functions should provide a firewall that doesn’t let any exceptions through. Example:

// foo.h
#if defined __cplusplus
   #define EXTERNC extern "C"
   #define NOEXCEPT noexcept
#else
   #define EXTERNC
   #define NOEXCEPT
#endif

EXTERNC void foo(void) NOEXCEPT ;

// foo.cc
#include "foo.h"
int foo( void ) NOEXCEPT
{
   int rc = 0 ;
   try {
      //
   }
   catch ( ... )
   {
      // handle error
      rc = 1 ;
   }

   return rc ;
}

If foo does not catch all exceptions, then the use of noexcept will drive std::terminate(), like a throw from a destructor does on some platforms.

References

[1] Bjarne Stroustrup. The C++ Programming Language, 4th Edition. Addison-Wesley, 2014.

Integer square root

June 4, 2016 C/C++ development and debugging. No comments , , ,

Screen Shot 2016-06-04 at 10.39.47 PM

[Click here for a PDF of this post with nicer formatting]

In [1] is a rather mysterious looking constant expression formula for an integer square root. This is a function that returns the smallest integer for which the square is less than the value to take the root of. Check out the black magic he used

// Stroustrup 10.4:  constexpr capable integer square root function
constexpr int isqrt_helper( int sq, int d, int n )
{
	return sq <= n ? isqrt_helper( sq + d, d + 2, n ) : d ;
}

constexpr int isqrt( int n )
{
	return isqrt_helper( 1, 3, n )/2 - 1 ;
}

The point of this construction was really to illustrate that it allows complex expressions to be used as compile time constants. I wonder at what point various compilers will give up trying to evaluate such expressions?

Let’s take this apart a bit.

Consider the first few values of \( n > 0 \).

  • \( n = 0 \). Here we have a call to \( \textrm{isqrt_helper}( 1, 3, 0 ) \) so the \( 1 \le 0 \) predicate is false, and the return value is just \( 3 \).

    For that value we have (using integer arithmetic):

    \begin{equation}\label{eqn:isqrt:20}
    \frac{3}{2} – 1 = 0,
    \end{equation}

    as desired.

  • \( n = 1 \). Here we have a call to \( \textrm{isqrt_helper}( 1, 3, 1 ) \) so the \( 1 \le 1 \) predicate is true, resulting in a second call \( \textrm{isqrt_helper}( 4, 5, 1 ) \). For that call the \( 4 \le 1 \) predicate is false, resulting in a return value of \( 5 \).

    This time we have a final result of

    \begin{equation}\label{eqn:isqrt:40}
    \frac{5}{2} – 1 = 1,
    \end{equation}

    as desired again. The result will be the same for any value \( n \in [1,3] \).

  • \( n = 4 \). We will end up with a call to \( \textrm{isqrt_helper}( 4, 5, 4 ) \) for which the \( 4 \le 4 \) predicate is true, resulting in a followup call of \( \textrm{isqrt_helper}( 9, 7, 4 ) \). For that call the \( 9 \le 4 \) predicate is false, resulting in a return value of \( 7 \).

    This time we have a final result of

    \begin{equation}\label{eqn:isqrt:45}
    \frac{7}{2} – 1 = 2,
    \end{equation}

    as expected. We get the same result for any value \( n \in [4,8] \).

Recurrence relations

The rough pattern of the magic involved can be seen. We have a sequence of calls

  • \( \textrm{isqrt_helper}( 1, 3, n ) \),
  • \( \textrm{isqrt_helper}( 4, 5, n ) \),
  • \( \textrm{isqrt_helper}( 9, 7, n ) \),
  • \( \textrm{isqrt_helper}( 16, 9, n ) \),

which terminates at the point where the first (square) parameter exceeds that value that we are taking the root of. Let the parameters of the sequence of calls be \( s_k \), and \( d_k \), so that with \( s_0 = 1, d_0 = 3 \) the \( k \in [0,…] \) call to the helper function is \( q_k = \textrm{isqrt_helper}( s_k, d_k, n ) \).

The sequence for the second parameter, the eventual return value, can be summarized compactly as \( d_k = 3 + 2 k \). It is not entirely obvious how we end up with a square for the values \( s_k = s_{k-1} + d_{k-1} \), but this follows by summation. For \( k > 1 \) that is

\begin{equation}\label{eqn:isqrt:60}
\begin{aligned}
s_k
&= s_{k-1} + d_{k-1} \\
&= s_0 + d_0 + d_1 + d_{k-1} \\
&= s_0 + \sum_{m=0}^{k-1} d_m \\
&= s_0 + \sum_{m=0}^{k-1} (3 + 2 m ) \\
&= s_0 + \sum_{m=1}^{k} (3 + 2 (m-1) ) \\
&= s_0 + \sum_{m=1}^{k} (1 + 2 m ) \\
&= 1 + k + 2 \sum_{m=1}^{k} m \\
&= 1 + k + 2 \frac{k(k+1)}{2} \\
&= k^2 + 2 k + 1 \\
&= (k+1)^2.
\end{aligned}
\end{equation}

This clearly holds for the boundary cases \( k = 0,1 \) as well. This allows the helper function action to be summarized more compactly

\begin{equation}\label{eqn:isqrt:80}
\textrm{isqrt_helper}(1, 3, n) = 3 + 2 k,
\end{equation}

where \( k \) is the smallest integer such that \( (k+1)^2 > n \). After integer scaling the final result is

\begin{equation}\label{eqn:isqrt:100}
(3 + 2 k)/2 -1 = k.
\end{equation}

This little beastie makes sense after deconstruction, but it was very Jackson like to toss this into the book without comment or explanation.

As pointed out by Pramod Gupta, there’s a spooky appearance of collaboration between Stroustrup and Jackson’s publishers, not entirely limited to the book covers.

References

[1] Bjarne Stroustrup. The C++ Programming Language, 4th Edition. Addison-Wesley, 2014.

First build break at the new job: C++ uniform initialization

May 12, 2016 C/C++ development and debugging. No comments , , , , ,

Development builds at LZ are done with clang-3.8, but there is an alternate nightly build done with the older RHEL7 GCC-4.8.3 compiler (gcc is up to 6.1 now, so the RHEL7 default is truly _ancient_). This bit of code didn’t compile with gcc:

   template <typename mutex_type>
   class shared_lock
   {  
      mutex_type &      m_mutex ;

   public:

      /** construct and acquire the mutex in shared mode */
      explicit shared_lock( mutex_type & mutex )
         : m_mutex{ mutex }
      {  

The error is:

error: invalid initialization of non-const reference of type ‘lz::shared_mutex&’ from an rvalue of type ‘<brace-enclosed initializer list>’

This seems like a compiler bug to me, one that I’d seen when doing my scinet scientific computing course, which mandated the use of at least -std=c++11. In the scinet assignments, I fixed all such issues by using -std=c++14, which worked fine, but I was using gcc-5.3 for those assignments.

It appears that this is a compiler bug, and not just an issue with the c++11 language specification, as I initially thought while doing my scinet assignments. If I rebuild this code with g++-6.1, explicitly specifying -std=c++11 (GCC 6.1 defaults to c++14), then the issue goes away, so specification of -std=c++14 is not required to allow uniform initialization to work in this situation.

Because of being forced to use the older compiler, it looks like I have to fix this by using pre-c++11 syntax:

      explicit shared_lock( mutex_type & mutex )
         : m_mutex( mutex )

My conclusion is that gcc-4.8.3 is not truly up to the job of building c++11 compliant code. I’ll have to be more careful with the language features that I use in the future.

Notes on C++11 and C++14 from scientific computing for physicists

May 1, 2016 C/C++ development and debugging. No comments , , , , , , , , , , , , , , , , , , , , , , , ,

I recently wrapped up all the programming assignments for PHY1610, Scientific Computing for Physicists

In all the assignments, we were required to compile with either

-std=c++11

or

-std=c++14

It’s possible to use those options and still program using the older C++98 syntax, but I also used this as an opportunity to learn some new style C++.

With the cavaet that we were provided with boilerplate code for a number of assignments, there was a non-trivial amount of code written for this course:

$ cloc `cat f` 2>&1 | tee o
     186 text files.
     177 unique files.                                          
       4 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.88 s (197.6 files/s, 16868.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            111           1710           1159           7317
C/C++ Header                    62            819           1525           2237
-------------------------------------------------------------------------------
SUM:                           173           2529           2684           9554
-------------------------------------------------------------------------------

A lot of this code involved calling into external libraries (fftw3, cblas, lapack, gsl, netcdf, MPI, silo, boost exceptions, boost unittest, …) and was pretty fun to write.

Looking through my submissions, here are some of the newer language features that ended up in my code. Keep in mind that new for me is relative to the C++ language features that I was able to use in DB2 code, which is restricted by the features made available by the very oldest compiler we were using accross all platform offerings.

Using statements

I had only seen using statements for namespace selection, as in

using namespace std ;

This is, however, a more general construct, and also allows for what is effectively a scope limited typedef with a more natural syntax. Example:

using carray = rarray<std::complex<double>, 1> ;

Compare this to

typedef rarray<std::complex<double>, 1> carray ;

With the using syntax, the beginner programmer’s issue of remembering the order for the type,typename pair in a typedef statement is obliterated.

I got quite used to using using by the end of the course.

Testing language levels

The following macros were helpful when experimenting with different language levels:

#if defined __cplusplus && (__cplusplus >= 201103L)
   #define HAVE_CPLUSPLUS_11
#endif

#if defined __cplusplus && (__cplusplus >= 201402L)
   #define HAVE_CPLUSPLUS_14
#endif

enum class

C++11 introduces an ‘enum class’, different from an enum. For example, instead of writing:

/**
   interval and derivative solver methods supplied by gsl
 */
enum solver
{
   bisection,
   falsepos,
   brent,
   newton,
   secant,
   steffenson
} ;

you would write:

/**
   interval and derivative solver methods supplied by gsl
 */
enum class solver
{
   bisection,
   falsepos,
   brent,
   newton,
   secant,
   steffenson
} ;

The benefit of this compared to the non-class enum is that the enumeration names are not in the global scope. You would write

void foo( const solver s ) 
{
   if ( s == solver::falsepos )
}

not

void foo( const solver s ) 
{
   if ( s == falsepos )
}

This nicely avoids namespace clashes.

That is not the only benefit to C++11 enums. C++11 enums can also be forward referenced, provided the storage class of the enum is also specified.

If you have ever worked on code that is massively coupled and interdependent (such as DB2), you have seen places where piles of headers have to get dragged in for enum bodies, because it is not possible to forward reference an enum portably. This is a very nice feature!

A simple example of a forward declared C++11 enum is:

enum solver : int ;
void foo( const solver s ) ;

enum solver : int
{
  x = 0, y = 1
} ;

Or, using the non-global enum class syntax:

enum class what : int ;
void foo( const what s ) ;

enum class what : int
{
  x = 0, y = 1
} ;

I didn’t actually use enum classes for enum forward referencing in my phy1610 assignments, because they were too simple to require that.

There is huge potential for using enums with storage classes in DB2 code. I expect that is also true for many other huge scale C++ codebases. The fact that this feature does not have appear to be tied to a requirement to also use ‘enum class’ is very nice for transforming legacy code. I left IBM before the day of seeing the use of compilers that allowed that on all platforms, but can imagine there will be some huge potential build time savings once C++11 compilers are uniformly available for DB2 code (and the code is ported to compile with C++11 enabled on all platforms).

As a side note, the storage class qualification, even if not being used for forward referencing is quite nice. I used it for return codes from main, which have to fit within one byte (i.e. within the waitpid waitstatus byte). For example:

enum class RETURNCODES : unsigned char
{
    SUCCESS       ///< exit code for successful exectution
   ,HELP          ///< exit code when -help (or bad option is supplied)
   ,PARSE_ERROR   ///< exit code if there's a parse error */
   ,EXCEPTION     ///< exit code if there's an unexpected exception thrown */
} ;

Uniform initialization

A new initialization paradigm is available in C++11. Instead of using constructor syntax for initialization, as in

/**
   Input parameters for gsl solver iteration.
 */
struct iterationParameters
{
   const Uint     m_max_iter ;  ///< Maximum number of iterations before giving up.
   const double   m_abserr ;    ///< the absolute error criteria for convergence.
   const double   m_relerr ;    ///< the relative error criteria for convergence.
   const bool     m_verbose ;   ///< verbose output

   iterationParameters( const Uint     max_iter,
                        const double   abserr,
                        const double   relerr,
                        const bool     verbose ) :
         m_max_iter(max_iter),
         m_abserr(abserr),
         m_relerr(relerr),
         m_verbose(verbose)
   {
   }
} ;

one could write

/**
   Input parameters for gsl solver iteration.
 */
struct iterationParameters
{
   const Uint     m_max_iter ;  ///< Maximum number of iterations before giving up.
   const double   m_abserr ;    ///< the absolute error criteria for convergence.
   const double   m_relerr ;    ///< the relative error criteria for convergence.
   const bool     m_verbose ;   ///< verbose output

   iterationParameters( const Uint     max_iter,
                        const double   abserr,
                        const double   relerr,
                        const bool     verbose ) :
         m_max_iter{max_iter},
         m_abserr{abserr},
         m_relerr{relerr},
         m_verbose{verbose}
   {
   }
} ;

This is a little foreign looking and it is easy to wonder what the advantage is. One of the advantages is that this syntax can be used for container initialization. For example, instead of

std::vector<int> v ;
v.push_back( 1 ) ;
v.push_back( 2 ) ;
v.push_back( 3 ) ;

you can just do

std::vector<int> v{ 1, 2, 3 } ;

This is called uniform initialization, since this mechanism was extended to basic types as well. For example, instead of initializing an array with an assignment operator, as in

   constexpr struct option long_options[] = {
     { "help",   0, NULL, 'h' },
     { "number", 1, NULL, 'n' },
     { "lower",  1, NULL, 'l' },
     { "upper",  1, NULL, 'u' },
     { NULL,     0, NULL, 0   }
   } ;

you can write

   constexpr struct option long_options[]{
     { "help",   0, NULL, 'h' },
     { "number", 1, NULL, 'n' },
     { "lower",  1, NULL, 'l' },
     { "upper",  1, NULL, 'u' },
     { NULL,     0, NULL, 0   }
   } ;

Instead of just providing a special mechanism to initialize container class objects, the language was extended to provide a new initialization syntax that could be used to initialize contain those objects and all others.

However, this is not just a different syntax for initialization, because there the types have to match strictly. For example this init of a couple stack variables will not compile

   int more{3} ;
   float x1{-2.0} ;
   size_t size{meta.numThreads*20} ;

What is required is one of

   float x1{-2.0f} ;

   // or

   double x1{-2.0} ;

Additionally, suppose that meta.numThreads has int type. Such a uniform initialization attempt will not compile, since the product is not of type size_t. That line can be written as:

   size_t size{(size_t)meta.numThreads*20} ;

   // or:
   size_t size = meta.numThreads*20 ;

I found uniform initialization hard on the eyes because it looked so foreign, but did eventually get used to it, with one exception. It seems to me that a longer initialization expression like the following is harder to read

double x{ midpoint( x1, x1 + intervalWidth ) } ;

than

double x = midpoint( x1, x1 + intervalWidth ) ;

There were also cases with -std=c++11 where uniform init and auto variables (see below) did not interact well, producing errors later when my auto-uniform-init’ed variables got interpreted as initializer lists instead of the types I desired. All such errors seemed to go away with -std=c++14, which seemed to generally provide a more stable language environment.

New string to integer functions

The c++11 standard library has new string to integer functions
http://en.cppreference.com/w/cpp/string/basic_string/stoul
which are more convenient than the strtoul functions. These throw exceptions on error, but still allow the
collection of errno and error position if you want them.

using Uint = std::uintptr_t ;

/**
   Register sized signed integer type for loop counters and so forth.
 */
using Sint = std::intptr_t ;

/**
   wrapper for stoul to match the type of Uint above.
 */
#if defined _WIN64
   #define strToUint std::stoull
#else
   #define strToUint std::stoul
#endif

There are other similar functions like std::stod, for string to double conversion. There were also opposite convertors, such as to_string, for converting integer types to strings. For example:

const std::string filename{ fileBaseName + "_" + std::to_string( rank ) + ".out" } ;

Static assertions.

DB2 had a static assertion implementation (OSS_CTASSERT, or sqlzStaticAssert?) but there is now one in the standard. Here’s an example using the Uint “typedef” above:

/**
   Force a compilation error if size assumptions are invalid.
 */
inline void strToUintAssumptions()
{
#if defined _WIN64
   static_assert( sizeof(Uint) == sizeof(unsigned long long), "bad assumptions about sizeof uintptr_t, long long" ) ;
#else
   static_assert( sizeof(Uint) == sizeof(unsigned long), "bad assumptions about sizeof uintptr_t, long" ) ;
#endif
}

The advantage of static_assert over a typedef (variable sized array) implementation like DB2 HAD is that compilers likely produce a better error message when it fails (instead of something unintuitive like “reference of array location at offset -1 is invalid”).

Boost exceptions.

While not part of c++11, the boost exception classes were available for my assignments. These are pretty easy to use. As setup you define some helper classes, which really just provide a name for the exception, and a name to identify any of the data that you’d like to throw along with the underlying exception. This could look like the following for example:

#include <boost/exception/exception.hpp>
#include <boost/exception/info.hpp>

struct error : virtual std::exception, virtual boost::exception { } ;
struct regex_match_error : virtual error { } ;

struct tag_match_input ;
typedef boost::error_info<tag_match_input,std::string> match_info ;

struct tag_match_re ;
typedef boost::error_info<tag_match_re,std::string> re_info ;

struct tag_intdata ;
typedef boost::error_info<tag_intdata,long> intdata_info ;

Such classes would be best in a namespace since they are generic, but I didn’t bother for all these assignments.

I used the boost exceptions for a couple things. One of which, of course, was throwing exceptions, but the other was as an assert-with-data backend:

#define ASSERT_NO_ERROR (static_cast<void>(0))
#ifdef NDEBUG
   #define ASSERT_DATA_INT( expr, v1 )          ASSERT_NO_ERROR
   #define ASSERT_DATA_INT_INT( expr, v1, v2 )  ASSERT_NO_ERROR
#else
   #define ASSERT_DATA_INT( expr, v1 )          \
      ( (expr)                                  \
      ? ASSERT_NO_ERROR                         \
      : BOOST_THROW_EXCEPTION(                  \
            assert_error()                      \
               << intdata_info( v1 ) ) )
//...
#endif

This allowed me to assert with data as in

ASSERT_DATA_INT( sz > 0, sz ) ;
ASSERT_DATA_INT_INT( taskNumber < numTasks, taskNumber, numTasks ) ;

This way I get not just the abort from the assert, but also the underlying reason, and can dump those to the console with no additional effort than catching any other boost exception:

//...
#include <boost/exception/diagnostic_information.hpp>

int main( int argc, char ** argv )
{
   try {
      auto expected{7} ;

      ASSERT_DATA_INT_INT( argc == expected, argc, expected ) ;
   }
   catch ( boost::exception & e )
   {
      auto s { boost::diagnostic_information( e ) } ;
      std::cout << s << std::endl ;
      // ...

This generates something like:

$ ./bassert
bassert.cc(11): Throw in function int main(int, char**)
Dynamic exception type: boost::exception_detail::clone_impl<assert_error>
std::exception::what: std::exception
[tag_intdata*] = 1
[tag_intdata2*] = 7

I wonder how efficient constructing such an exception object is? When pre-processed the assertion above expands to

      ( (argc == expected) ? (static_cast<void>(0)) :
     ::boost::exception_detail::throw_exception_(
     assert_error() << intdata_info( argc ) << intdata2_info( expected )
     ,__PRETTY_FUNCTION__,"bassert.cc",11)
     ) ;

Stepping through this in the debugger I see some interesting stuff, but it included heap (i.e. new) allocations. This means that this sort of Boost exception may malfunction very badly in out of memory conditions where it is conceivable that one would want to throw an exception.

The runtime cost can’t be that inexpensive either (when the assert is triggered). I see four function calls even before the throw is processed:

assert_error const& boost::exception_detail::set_info(assert_error const&, boost::error_info const&)-0x4
assert_error const& boost::exception_detail::set_info(assert_error const&, boost::error_info const&)-0x4
assert_error::assert_error(assert_error const&)-0x4
void boost::throw_exception(assert_error const&)-0x4

and the total instruction count goes up to ~140 from 4 for the NDEBUG case (with optimization). Only 5 instructions get executed in the happy codepath. This is what we want in exception handling code: very cheap when it’s not triggered, with all the expense moved to the unhappy codepath.

The negative side effect of this sort of error handling looks like a lot of instruction cache bloat.

Boost test

The boost test library is also not a C++11 feature, but new for me, and learned in this course. Here’s a fragment of how it is used

#define BOOST_TEST_MAIN
#define BOOST_TEST_MODULE test

#define BOOST_TEST_DYN_LINK

#include <boost/test/unit_test.hpp>
#include <vector>

BOOST_AUTO_TEST_CASE( testExample )
{
   std::vector<int> v(3) ;

   BOOST_REQUIRE( 3 == v.size() ) ;
   BOOST_REQUIRE_MESSAGE( 3 == v.size(), "size: " + std::to_string( v.size() ) ) ;
}

A boost test after being run looks like:

$ ./test --report_level=detailed --log_level=all
Running 1 test case...
Entering test module "test"
test.cc:9: Entering test case "testExample"
test.cc:13: info: check 3 == v.size() has passed
test.cc:14: info: check 'size: 3' has passed
test.cc:9: Leaving test case "testExample"; testing time: 87us
Leaving test module "test"; testing time: 103us

Test module "test" has passed with:
  1 test case out of 1 passed
  2 assertions out of 2 passed

  Test case "testExample" has passed with:
    2 assertions out of 2 passed

Range for and auto type

The range for is much like perl’s foreach. For example, in perl you could write

my @a = ( 1, 2, 3 ) ;
foreach my $v ( @a )
{
   foo( $v ) ;
}

An equivalent C++ loop like this can be as simple as

std::vector<int> a{1, 2, 3 } ;
for ( auto v : a )
{
   foo( v ) ;
}

You can also declare the list of items to iterate over inline, as in

using iocfg = iohandler::cfg ;
for ( auto c : { iocfg::graphics, iocfg::ascii, iocfg::netcdf, iocfg::noop } )
{
   // ...
}

Observe that, just like perl, C++ no longer requires any explicit type for the loop variable, as it is deduced when auto is specified. It is still strongly typed, but you can write code that doesn’t explicitly depend on that type. I see lots of benefits to this, as you can have additional freedom to change type definitions and not have to adjust everything that uses it.

I can imagine that it could potentially get confusing if all variables in a function get declared auto, but did not find that to be the case for any of the code I produced in these assignments.

One gotcha with auto that I did hit was that care is required in computed expressions. I’d used auto in one case and the result got stored as a large unsigned value, instead of signed as desired (i.e. negative values got stored in unsigned auto variables). In that case I used an explicit type. Extensive use of auto may end up requiring more unit and other test if the types picked are not those that are desired.

std::chrono (ticks.h)

This is a nice portability layer for fine grain time measurements, allowing you to avoid platform specific functions like gettimeofday, and also avoid any composition of the seconds/subseconds data that many such interfaces provide.

Here’s a fragment of a class that allows interval time measurements and subsequent conversion:

class ticks
{
   using clock      = std::chrono::high_resolution_clock ;

   clock::time_point m_sample ;
public:

   static inline ticks sample()
   {
      ticks t ;
      t.m_sample = clock::now() ;

      return t ;
   }

   using duration   = decltype( m_sample - m_sample ) ;

   friend duration operator -( const ticks & a, const ticks & b ) ;
} ;

inline ticks::duration operator -( const ticks & a, const ticks & b )
{
   return a.m_sample - b.m_sample ;
}

inline auto durationToMicroseconds( const ticks::duration & diff )
{
   return std::chrono::duration_cast<std::chrono::microseconds>( diff ).count() ;
}

Note that the last function is using c++14 return type deduction. That does not work without coersion
in c++11, requiring:

inline auto durationToMicroseconds( const ticks::duration & diff )
-> decltype(std::chrono::duration_cast<std::chrono::microseconds>( diff ).count())
{
   return std::chrono::duration_cast<std::chrono::microseconds>( diff ).count() ;
}

which is very ugly.

Random numbers

/**
   A random number generator that produces integer uniformly
   distributed in the interval:

   [a, a + delta N]

   with separation delta between values returned.
 */
template <int a, int delta, int N>
class RandomIntegers
{
   std::random_device                        m_rd ;A
   //std::default_random_engine                m_engine ;
   std::mt19937                              m_engine ;
   std::uniform_int_distribution<unsigned>   m_uniform ;

public:
   /** constuct a uniform random number generator for the specified range */
   RandomIntegers( )
      : m_rd()
      , m_engine( m_rd() )
      , m_uniform( 0, N )
   {
      static_assert( N > 0, "Integer N > 0 expected" ) ;
      static_assert( delta > 0, "Integer delta > 0 expected" ) ;
   }

   /**
      return a uniform random number sample from {a, a + delta, ..., a + delta N}
    */
   int sample()
   {
      auto p = m_uniform( m_engine ) ;

      return a + p * delta ;
   }
} ;

constexpr

Instead of using #defines, one can use completely typed declarations, but still constant using the constexpr keyword. An example

constexpr size_t N{3} ;
std::tuple<int, N> t ;

nullptr

The days of not knowing what header defines NULL and dealing with conflicting definitions are over. Instead of using NULL, we now have a builtin language construct nullptr available.

Lambdas and sort

Custom sorting is really simple in c++ now. Here’s an example of a partial sort (sorting the top N elements, and leaving the rest unspecified). The sort function no longer has to be a function call, and can be specified inline

auto second_greater = [](auto & left, auto & right) { return left.second > right.second ; } ;
std::partial_sort( cvec.begin(),
                   cvec.begin() + N,
                   cvec.end(),
                   second_greater ) ;

The “inline” sort function here is using c++14 lambda syntax. For c++11, the parameter types can’t be auto, so something such as the following might be required

auto second_greater = [](const results_pair & left, const results_pair & right) { return left.second > right.second ; } ;

Useful standard helper methods

The standard library has lots of useful utility functions. I’m sure I only scratched the surface discovering some of those. Some I used were:

std::swap( m_sz, other.m_sz ) ;
std::fill( m_storage.begin(), m_storage.end(), v ) ;
std::copy( b.m_storage.begin(), b.m_storage.end(), m_storage.begin() ) ;
r.first  = std::max( l, m_myFirstGlobalElementIndex ) ;
r.second = std::min( u, m_myLastGlobalElementIndex ) ;

I also liked the copysign function, allowing easy access to the sign bit of a float or double without messing around with extracting the bit, or explicit predicates:

inline double signof( const double v )
{
   return std::copysign( 1.0, v ) ;
}

Mean and standard deviation were also really easy to calculate. Here’s an example that used a lambda function to calculate the difference from the mean to get at the squared difference from the mean:


      m_sum = std::accumulate( v.begin(), v.end(), 0.0 ) ;
      m_mean = m_sum / v.size() ;
      double mean = m_mean ; // for lambda capture

      std::vector<double> diff( v.size() ) ;

      std::transform( v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; } ) ;

      m_sq_sum = std::inner_product( diff.begin(), diff.end(), diff.begin(), 0.0 ) ;

decltype

Attempting to mix auto with g++’s ‘-Wall -Werror’ causes some trouble. For example, this doesn’t work

void foo ( const size_t size )
{
   for ( auto i{0} ; i < size ; i++ )
   {
      // ...
   }
}

This doesn’t compile since the i < size portion generates sign vs unsigned comparison warnings. There are a few ways to fix this.

   // specify the type explicitly:
   for ( size_t i{0} ; i < size ; i++ )

   // let the compiler use the type of the size variable:
   for ( decltype(size) i{0} ; i < size ; i++ )

The decltype method is probably of more use in template code. For non-template code, I found that explicitly specifying the type was more readable.

std::valarray (myrarray.h)

The standard library has a vectored array construct, but I was disappointed with the quality of the generated code that I observed. It also turned out to be faster not to use it. For example:

void SineCosineVecOps( std::valarray<float> & s, std::valarray<float> & c, const std::valarray<float> & v )
{
   s = std::sin( v ) ;
   c = std::cos( v ) ;
}

void SineCosineManOps( std::valarray<float> & s, std::valarray<float> & c, const std::valarray<float> & v )
{
   for ( Uint i{0} ; i < ASIZE ; i++ )
   {  
      float theta = v[i] ;

      s[i] = std::sin( theta ) ;
      c[i] = std::cos( theta ) ;
   }
}

when run on a 300 element array executed close to 1.5x slower using the valarray vector assignment operation, and had close to 3x times the instructions (with optimization)!

Perhaps other compilers do better with valarray. g++ 5.3 is certainly not worth using with that container type.