
Building my new “garage”

November 12, 2020 Home renos

I managed to sneak in a day off work (split over two days), and built a space for all the tools that I used to keep in my double-car garage. We’ve been in the new downtown house for a year now, and had cleared out most of the old house except for the garage. You can accumulate a lot of stuff in 20 years of home ownership, and moving from a house with a double-car garage to a house with no garage was quite a challenge. After many panic-demic induced delays, we finally finished the renos on the old house and sold it. I’m really enjoying the new neighbourhood, where I can walk to just about everything I need, but there are a few things that I miss from the old house:

  1. The garage!
  2. Parking spaces (6 not including the garage — I won’t miss shovelling that driveway!)
  3. The pool.
  4. The hot tub.

However, number 1, the garage, has been the most challenging. We’ve had stuff from the garage all over the house, in the sheds in the back yard, and a whole lot of it on the back deck under a tarp. We replaced our washer and dryer with a stacking unit to maximize the space, and I’ve now built some heavy-duty shelves next to it for all the tools and toolboxes.

I’ve drilled three rows of holes, each 2″ apart, so that I can adjust the height of the shelves. I’ve fixed the middle and the top shelf in place for stability. I also tacked in the bottom shelf with a couple of screws, and should add some sort of fixed back brace, or a bottom piece, so that the side supports cannot spread. That will have to wait, since I’m out of wood (I had to scrounge a bit, and my topmost adjustable shelf is not big enough, so that one is temporary too).

We may redo the plumbing on the other side of the washer/dryer too. We have several long hose runs, one of which leaked at one point because of a degraded hose washer. It would be better to put one of those tidy washer/dryer plumbing boxes right in the wall next to the machine, instead of the current leak-waiting-to-happen setup. That would let us eliminate all the too-long hoses, and give us a chance to fully optimize the long laundry closet for storage. That closet and the storage unit opposite it are the closest that we will get to a “garage” in the new house.

In the 20-year accumulation of stuff, I have a whole lot of tools that actually need to go. Some of these were dad’s, and I didn’t have the heart to toss them, but it would be better to find them homes with people who will actively use them. At the bare minimum, some of these excess tools should go to people who actually have the storage space to be hoarders, something that we can no longer do. Now that I have things arrayed in an accessible fashion, it’s time for the big sort, and then the purge after the sort.

Notes on C++11 and C++14 from scientific computing for physicists

May 1, 2016 C/C++ development and debugging.

I recently wrapped up all the programming assignments for PHY1610, Scientific Computing for Physicists.

In all the assignments, we were required to compile with either

-std=c++11

or

-std=c++14

It’s possible to use those options and still program using the older C++98 syntax, but I also used this as an opportunity to learn some new style C++.

With the caveat that we were provided with boilerplate code for a number of assignments, there was a non-trivial amount of code written for this course:

$ cloc `cat f` 2>&1 | tee o
     186 text files.
     177 unique files.                                          
       4 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.88 s (197.6 files/s, 16868.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            111           1710           1159           7317
C/C++ Header                    62            819           1525           2237
-------------------------------------------------------------------------------
SUM:                           173           2529           2684           9554
-------------------------------------------------------------------------------

A lot of this code involved calling into external libraries (fftw3, cblas, lapack, gsl, netcdf, MPI, silo, boost exceptions, boost unittest, …) and was pretty fun to write.

Looking through my submissions, here are some of the newer language features that ended up in my code. Keep in mind that “new” for me is relative to the C++ language features that I was able to use in DB2 code, which were restricted to what the very oldest compiler we were using across all platform offerings supported.

Using statements

I had only seen using statements for namespace selection, as in

using namespace std ;

This is, however, a more general construct, and also allows for what is effectively a typedef with a more natural syntax (and, unlike typedef, it can also be templated). Example:

using carray = rarray<std::complex<double>, 1> ;

Compare this to

typedef rarray<std::complex<double>, 1> carray ;

With the using syntax, the beginner programmer’s problem of remembering the order of the type and the new name in a typedef statement goes away.

I got quite used to using using by the end of the course.
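To make the comparison concrete, here is a small compilable sketch. It substitutes std::vector for rarray (an assumption, since rarray isn’t part of the standard library), checks that the typedef and using spellings name exactly the same type, and shows an alias template, which typedef cannot express.

```cpp
#include <complex>
#include <type_traits>
#include <vector>

// std::vector stands in for the course's rarray type (assumption), so that the
// sketch compiles without that library.
typedef std::vector<std::complex<double>> carray_typedef ;
using carray = std::vector<std::complex<double>> ;

// Both spellings introduce the same type, just with the parts in a different order.
static_assert( std::is_same<carray, carray_typedef>::value,
               "typedef and using should produce the same type" ) ;

// Unlike typedef, a using declaration can be templated (an alias template).
template <typename T>
using vec = std::vector<T> ;
```

With the alias template, vec<int> and vec<double> are just std::vector<int> and std::vector<double>.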

Testing language levels

The following macros were helpful when experimenting with different language levels:

#if defined __cplusplus && (__cplusplus >= 201103L)
   #define HAVE_CPLUSPLUS_11
#endif

#if defined __cplusplus && (__cplusplus >= 201402L)
   #define HAVE_CPLUSPLUS_14
#endif

enum class

C++11 introduces an ‘enum class’ (a scoped enumeration), which differs from a plain enum. For example, instead of writing:

/**
   interval and derivative solver methods supplied by gsl
 */
enum solver
{
   bisection,
   falsepos,
   brent,
   newton,
   secant,
   steffenson
} ;

you would write:

/**
   interval and derivative solver methods supplied by gsl
 */
enum class solver
{
   bisection,
   falsepos,
   brent,
   newton,
   secant,
   steffenson
} ;

The benefit of this compared to the non-class enum is that the enumerator names are not injected into the enclosing (often global) scope. You would write

void foo( const solver s )
{
   if ( s == solver::falsepos )
   {
      // ...
   }
}

not

void foo( const solver s )
{
   if ( s == falsepos )
   {
      // ...
   }
}

This nicely avoids name clashes.
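Here is a small compilable sketch of the scoping rules, reusing the solver enumerators from above as an enum class. The solverIndex helper is hypothetical; it illustrates a second safety property of enum class: the enumerators don’t convert implicitly to int, so an explicit cast is required.

```cpp
#include <type_traits>

enum class solver { bisection, falsepos, brent, newton, secant, steffenson } ;

// "int i = solver::brent ;" would not compile: scoped enumerators have no
// implicit conversion to int, so the conversion has to be spelled out.
inline int solverIndex( const solver s )
{
   return static_cast<int>( s ) ;
}

// With no explicit underlying type, an enum class uses int.
static_assert( std::is_same<std::underlying_type<solver>::type, int>::value,
               "enum class defaults to an int underlying type" ) ;
```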

That is not the only benefit to C++11 enums. C++11 enums can also be forward declared, provided the underlying type of the enum (the : int in the examples below) is also specified. For an enum class the underlying type defaults to int, so enum class what ; is also a valid forward declaration.

If you have ever worked on code that is massively coupled and interdependent (such as DB2), you have seen places where piles of headers have to get dragged in for enum bodies, because before C++11 it was not possible to forward declare an enum portably. This is a very nice feature!

A simple example of a forward declared C++11 enum is:

enum solver : int ;
void foo( const solver s ) ;

enum solver : int
{
  x = 0, y = 1
} ;

Or, using the non-global enum class syntax:

enum class what : int ;
void foo( const what s ) ;

enum class what : int
{
  x = 0, y = 1
} ;

I didn’t actually use enum classes for enum forward declarations in my PHY1610 assignments, because they were too simple to require that.

There is huge potential for using enums with an explicitly specified underlying type in DB2 code. I expect that is also true for many other huge C++ codebases. The fact that this feature does not appear to be tied to a requirement to also use ‘enum class’ is very nice for transforming legacy code. I left IBM before compilers that allowed this were in use on all platforms, but I can imagine there will be some huge build time savings once C++11 compilers are uniformly available for DB2 code (and the code is ported to compile with C++11 enabled on all platforms).
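Here is a sketch of that decoupling pattern, collapsed into one file so that it compiles on its own. The iomode enum and modeCost function are made up for illustration. In real code the opaque declaration would sit in a lightweight header, and only the code that needs the enumerators would include the full definition.

```cpp
// --- would normally live in a lightweight header -----------------------------
// Opaque declaration: enough to pass the enum around by value, without the
// enumerator list (or whatever headers that list would drag in).
enum class iomode : unsigned char ;

int modeCost( iomode m ) ;

// --- would normally live in the full header and a .cc file -------------------
// The definition must repeat the same underlying type.
enum class iomode : unsigned char
{
   ascii, netcdf, graphics
} ;

int modeCost( iomode m )
{
   switch ( m )
   {
      case iomode::ascii:    return 1 ;
      case iomode::netcdf:   return 2 ;
      case iomode::graphics: return 3 ;
   }
   return 0 ; // not reached for valid enumerators
}
```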

As a side note, specifying the underlying type is quite nice even when it isn’t being used for forward declarations. I used it for return codes from main, which have to fit within one byte (i.e. within the exit status byte reported by waitpid). For example:

enum class RETURNCODES : unsigned char
{
    SUCCESS       ///< exit code for successful execution
   ,HELP          ///< exit code when -help (or a bad option) is supplied
   ,PARSE_ERROR   ///< exit code if there's a parse error
   ,EXCEPTION     ///< exit code if there's an unexpected exception thrown
} ;
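Since an enum class doesn’t convert implicitly to int, returning one of these codes from main needs an explicit cast. A minimal sketch, where toExitCode is a hypothetical helper rather than part of the original code:

```cpp
enum class RETURNCODES : unsigned char
{
    SUCCESS
   ,HELP
   ,PARSE_ERROR
   ,EXCEPTION
} ;

// The unsigned char underlying type keeps every code within one byte.
static_assert( sizeof( RETURNCODES ) == 1, "return codes should fit in one byte" ) ;

// An enum class needs an explicit conversion before it can be returned from main.
constexpr int toExitCode( const RETURNCODES rc )
{
   return static_cast<int>( rc ) ;
}

// Hypothetical usage inside main:
//    if ( parseFailed ) return toExitCode( RETURNCODES::PARSE_ERROR ) ;
```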

Uniform initialization

A new initialization paradigm is available in C++11. Instead of using constructor syntax for initialization, as in

/**
   Input parameters for gsl solver iteration.
 */
struct iterationParameters
{
   const Uint     m_max_iter ;  ///< Maximum number of iterations before giving up.
   const double   m_abserr ;    ///< the absolute error criteria for convergence.
   const double   m_relerr ;    ///< the relative error criteria for convergence.
   const bool     m_verbose ;   ///< verbose output

   iterationParameters( const Uint     max_iter,
                        const double   abserr,
                        const double   relerr,
                        const bool     verbose ) :
         m_max_iter(max_iter),
         m_abserr(abserr),
         m_relerr(relerr),
         m_verbose(verbose)
   {
   }
} ;

one could write

/**
   Input parameters for gsl solver iteration.
 */
struct iterationParameters
{
   const Uint     m_max_iter ;  ///< Maximum number of iterations before giving up.
   const double   m_abserr ;    ///< the absolute error criteria for convergence.
   const double   m_relerr ;    ///< the relative error criteria for convergence.
   const bool     m_verbose ;   ///< verbose output

   iterationParameters( const Uint     max_iter,
                        const double   abserr,
                        const double   relerr,
                        const bool     verbose ) :
         m_max_iter{max_iter},
         m_abserr{abserr},
         m_relerr{relerr},
         m_verbose{verbose}
   {
   }
} ;

This is a little foreign looking and it is easy to wonder what the advantage is. One of the advantages is that this syntax can be used for container initialization. For example, instead of

std::vector<int> v ;
v.push_back( 1 ) ;
v.push_back( 2 ) ;
v.push_back( 3 ) ;

you can just do

std::vector<int> v{ 1, 2, 3 } ;

This is called uniform initialization, since this mechanism was extended to basic types and aggregates as well. For example, instead of initializing an array with an = sign, as in

   constexpr struct option long_options[] = {
     { "help",   0, NULL, 'h' },
     { "number", 1, NULL, 'n' },
     { "lower",  1, NULL, 'l' },
     { "upper",  1, NULL, 'u' },
     { NULL,     0, NULL, 0   }
   } ;

you can write

   constexpr struct option long_options[]{
     { "help",   0, NULL, 'h' },
     { "number", 1, NULL, 'n' },
     { "lower",  1, NULL, 'l' },
     { "upper",  1, NULL, 'u' },
     { NULL,     0, NULL, 0   }
   } ;

Instead of just providing a special mechanism to initialize container class objects, the language was extended with a new initialization syntax that can be used to initialize those objects and all others.

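However, this is not just a different syntax for initialization, because brace initialization rejects narrowing conversions, so the types have to match much more strictly. Consider this init of a few stack variables:

   int more{3} ;
   float x1{-2.0} ;
   size_t size{meta.numThreads*20} ;

The first line is fine. The second converts a double to a float, which is narrowing in general, but the standard allows it here because -2.0 is a constant whose value fits in a float. A float initialized from a double that isn’t a compile time constant would be rejected, so the safest habit is to make the types match, using one of

   float x1{-2.0f} ;

   // or

   double x1{-2.0} ;

The third line is the real problem. Suppose that meta.numThreads has int type. Then the product is a non-constant int, and converting it to size_t is a narrowing conversion, which g++ diagnoses (fatally, with -Werror). That line can be written as:

   size_t size{(size_t)meta.numThreads*20} ;

   // or:
   size_t size = meta.numThreads*20 ;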

I found uniform initialization hard on the eyes because it looked so foreign, but I did eventually get used to it, with one exception. It seems to me that a longer initialization expression like the following is harder to read

double x{ midpoint( x1, x1 + intervalWidth ) } ;

than

double x = midpoint( x1, x1 + intervalWidth ) ;

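There were also cases with -std=c++11 where uniform init and auto variables (see below) did not interact well, producing errors later when my auto-uniform-init’ed variables got interpreted as initializer lists instead of the types I desired. That matches the original C++11 rule, under which auto x{7} deduces std::initializer_list<int> rather than int. The rule was later changed (proposal N3922, adopted for C++17) so that auto x{7} deduces int, and compilers may apply the new rule in earlier language modes as well. All such errors seemed to go away with -std=c++14, which seemed to generally provide a more stable language environment.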
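To see the initializer list interactions concretely, here is a small sketch. It deliberately relies only on behaviour that is the same under every standard: auto with = { ... } always deduces std::initializer_list, and braces prefer a container’s initializer_list constructor, so std::vector<int>(3) and std::vector<int>{3} are different vectors. The helper names are made up.

```cpp
#include <initializer_list>
#include <type_traits>
#include <vector>

// With "= { ... }", auto always deduces std::initializer_list, in every standard.
auto listy = { 7 } ;
static_assert( std::is_same<decltype( listy ), std::initializer_list<int>>::value,
               "auto x = { 7 } is an initializer_list, not an int" ) ;

// Parentheses and braces can select different constructors: braces prefer the
// std::initializer_list constructor whenever it is viable.
inline std::vector<int> threeZeros() { return std::vector<int>( 3 ) ; } // { 0, 0, 0 }
inline std::vector<int> justThree()  { return std::vector<int>{ 3 } ; } // { 3 }
```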

New string to integer functions

The C++11 standard library has new string to integer functions
http://en.cppreference.com/w/cpp/string/basic_string/stoul
which are more convenient than the strtoul functions. These throw exceptions (std::invalid_argument or std::out_of_range) on error, but still allow the collection of the error position, through an optional size_t * argument, if you want it.

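#include <cstdint>
#include <string>

/**
   Register sized unsigned integer type.
 */
using Uint = std::uintptr_t ;

/**
   Register sized signed integer type for loop counters and so forth.
 */
using Sint = std::intptr_t ;

/**
   wrapper for stoul to match the type of Uint above
   (long is only 32 bits on 64-bit Windows, so stoull is needed there).
 */
#if defined _WIN64
   #define strToUint std::stoull
#else
   #define strToUint std::stoul
#endif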
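As a sketch of how the exceptions and the position argument fit together, here is a hypothetical parseUnsigned helper (not from the course code) built on std::stoull that rejects trailing junk. One caveat: like strtoull, std::stoull accepts a leading minus sign and wraps the result, so "-1" parses to the largest unsigned long long value without throwing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse s as a base 10 unsigned integer. Returns false on bad input, overflow,
// or trailing characters that aren't part of the number.
inline bool parseUnsigned( const std::string & s, unsigned long long & value )
{
   try
   {
      std::size_t pos{0} ;
      value = std::stoull( s, &pos, 10 ) ; // pos: index of first unparsed char
      return pos == s.size() ;             // reject input like "12abc"
   }
   catch ( const std::invalid_argument & ) // no digits at all
   {
      return false ;
   }
   catch ( const std::out_of_range & )     // doesn't fit in unsigned long long
   {
      return false ;
   }
}
```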

There are other similar functions like std::stod, for string to double conversion. There are also converters in the opposite direction, such as std::to_string, for converting numeric types to strings. For example:

const std::string filename{ fileBaseName + "_" + std::to_string( rank ) + ".out" } ;

Static assertions.

DB2 had a static assertion implementation (OSS_CTASSERT, or sqlzStaticAssert?), but there is now one in the standard. Here’s an example using the Uint “typedef” above:

/**
   Force a compilation error if size assumptions are invalid.
 */
inline void strToUintAssumptions()
{
#if defined _WIN64
   static_assert( sizeof(Uint) == sizeof(unsigned long long), "bad assumptions about sizeof uintptr_t, long long" ) ;
#else
   static_assert( sizeof(Uint) == sizeof(unsigned long), "bad assumptions about sizeof uintptr_t, long" ) ;
#endif
}

The advantage of static_assert over a typedef (negative size array) implementation like the one DB2 had is that the compiler reports your message when it fails, instead of something unintuitive like “reference of array location at offset -1 is invalid”.
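For comparison, the typedef trick that static_assert replaces looks something like the following. The macro name is made up; DB2’s actual macros aren’t reproduced here.

```cpp
// Pre-C++11 compile time assertion (hypothetical macro). When cond is false the
// array size becomes -1, which the compiler rejects, but the resulting error
// message says nothing about the condition that failed.
#define OLD_STYLE_CTASSERT( cond, name ) typedef char name[ (cond) ? 1 : -1 ]

OLD_STYLE_CTASSERT( sizeof(long) >= sizeof(int), long_at_least_as_big_as_int ) ;

// The C++11 equivalent. On failure the compiler reports the message text.
static_assert( sizeof(long) >= sizeof(int), "long should be at least as big as int" ) ;
```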

Boost exceptions.

While not part of C++11, the Boost exception classes were available for my assignments. These are pretty easy to use. As setup, you define some helper classes, which really just provide a name for the exception, and a tag name to identify each piece of data that you’d like to throw along with the underlying exception. This could look like the following, for example:

#include <boost/exception/exception.hpp>
#include <boost/exception/info.hpp>
#include <exception>
#include <string>

struct error : virtual std::exception, virtual boost::exception { } ;
struct regex_match_error : virtual error { } ;
struct assert_error : virtual error { } ;

struct tag_match_input ;
typedef boost::error_info<tag_match_input,std::string> match_info ;

struct tag_match_re ;
typedef boost::error_info<tag_match_re,std::string> re_info ;

struct tag_intdata ;
typedef boost::error_info<tag_intdata,long> intdata_info ;

struct tag_intdata2 ;
typedef boost::error_info<tag_intdata2,long> intdata2_info ;

Such classes would be best in a namespace since they are generic, but I didn’t bother for all these assignments.

I used the boost exceptions for a couple things. One of which, of course, was throwing exceptions, but the other was as an assert-with-data backend:

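#include <boost/throw_exception.hpp>

#define ASSERT_NO_ERROR (static_cast<void>(0))
#ifdef NDEBUG
   #define ASSERT_DATA_INT( expr, v1 )          ASSERT_NO_ERROR
   #define ASSERT_DATA_INT_INT( expr, v1, v2 )  ASSERT_NO_ERROR
#else
   #define ASSERT_DATA_INT( expr, v1 )          \
      ( (expr)                                  \
      ? ASSERT_NO_ERROR                         \
      : BOOST_THROW_EXCEPTION(                  \
            assert_error()                      \
               << intdata_info( v1 ) ) )
   #define ASSERT_DATA_INT_INT( expr, v1, v2 )  \
      ( (expr)                                  \
      ? ASSERT_NO_ERROR                         \
      : BOOST_THROW_EXCEPTION(                  \
            assert_error()                      \
               << intdata_info( v1 )            \
               << intdata2_info( v2 ) ) )
#endif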

This allowed me to assert with data as in

ASSERT_DATA_INT( sz > 0, sz ) ;
ASSERT_DATA_INT_INT( taskNumber < numTasks, taskNumber, numTasks ) ;

This way I get not just the abort from the assert, but also the underlying reason, and can dump those to the console with no more effort than catching any other Boost exception:

//...
#include <boost/exception/diagnostic_information.hpp>
#include <iostream>

int main( int argc, char ** argv )
{
   try {
      auto expected{7} ;

      ASSERT_DATA_INT_INT( argc == expected, argc, expected ) ;
   }
   catch ( boost::exception & e )
   {
      auto s { boost::diagnostic_information( e ) } ;
      std::cout << s << std::endl ;
      // ...
      return 1 ;
   }

   return 0 ;
}

This generates something like:

$ ./bassert
bassert.cc(11): Throw in function int main(int, char**)
Dynamic exception type: boost::exception_detail::clone_impl<assert_error>
std::exception::what: std::exception
[tag_intdata*] = 1
[tag_intdata2*] = 7

I wondered how efficient constructing such an exception object is. When preprocessed, the assertion above expands to

      ( (argc == expected) ? (static_cast<void>(0)) :
     ::boost::exception_detail::throw_exception_(
     assert_error() << intdata_info( argc ) << intdata2_info( expected )
     ,__PRETTY_FUNCTION__,"bassert.cc",11)
     ) ;

Stepping through this in the debugger, I saw some interesting stuff, including heap (i.e. new) allocations. This means that this sort of Boost exception may malfunction very badly in out of memory conditions, which is exactly where it is conceivable that one would want to throw an exception.

The runtime cost isn’t negligible either (when the assert is triggered). I see four function calls even before the throw is processed:

assert_error const& boost::exception_detail::set_info(assert_error const&, boost::error_info const&)-0x4
assert_error const& boost::exception_detail::set_info(assert_error const&, boost::error_info const&)-0x4
assert_error::assert_error(assert_error const&)-0x4
void boost::throw_exception(assert_error const&)-0x4

and the total instruction count for the assertion goes up to ~140, compared to 4 for the NDEBUG case (with optimization). However, only 5 instructions get executed in the happy codepath. This is what we want in exception handling code: very cheap when it’s not triggered, with all the expense moved to the unhappy codepath.

The negative side effect of this sort of error handling appears to be a lot of instruction cache bloat.
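If Boost isn’t available, the assert-with-data idea can be approximated with only the standard library. This is a different, simpler technique than the Boost mechanism above: the data is formatted into the what() string instead of being attached as typed error_info objects. The names assert_data_error, assertDataFail and STD_ASSERT_DATA_INT_INT are made up for this sketch, and like the Boost version it still allocates on the failure path.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Exception type for failed data assertions (name made up for this sketch).
class assert_data_error : public std::logic_error
{
public:
   using std::logic_error::logic_error ;
} ;

// Out of line failure path: format the expression, location and values into
// the exception message, then throw.
[[noreturn]] inline void assertDataFail( const char * expr, const char * file,
                                         int line, long v1, long v2 )
{
   std::ostringstream os ;
   os << file << "(" << line << "): assertion failed: " << expr
      << " [v1 = " << v1 << ", v2 = " << v2 << "]" ;
   throw assert_data_error( os.str() ) ;
}

#ifdef NDEBUG
   #define STD_ASSERT_DATA_INT_INT( expr, v1, v2 ) (static_cast<void>(0))
#else
   #define STD_ASSERT_DATA_INT_INT( expr, v1, v2 )                 \
      ( (expr)                                                     \
      ? static_cast<void>(0)                                       \
      : assertDataFail( #expr, __FILE__, __LINE__, (v1), (v2) ) )
#endif
```

As with the Boost version, the happy path is just the test of expr, and all the formatting work happens only when the assertion fails.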

Boost test

The Boost test library is also not a C++11 feature, but it was new for me, and something I learned in this course. Here’s a fragment of how it is used:

#define BOOST_TEST_MAIN
#define BOOST_TEST_MODULE test

#define BOOST_TEST_DYN_LINK

#include <boost/test/unit_test.hpp>
#include <vector>

BOOST_AUTO_TEST_CASE( testExample )
{
   std::vector<int> v(3) ;

   BOOST_REQUIRE( 3 == v.size() ) ;
   BOOST_REQUIRE_MESSAGE( 3 == v.size(), "size: " + std::to_string( v.size() ) ) ;
}

Running the test with detailed reporting looks like:

$ ./test --report_level=detailed --log_level=all
Running 1 test case...
Entering test module "test"
test.cc:9: Entering test case "testExample"
test.cc:13: info: check 3 == v.size() has passed
test.cc:14: info: check 'size: 3' has passed
test.cc:9: Leaving test case "testExample"; testing time: 87us
Leaving test module "test"; testing time: 103us

Test module "test" has passed with:
  1 test case out of 1 passed
  2 assertions out of 2 passed

  Test case "testExample" has passed with:
    2 assertions out of 2 passed

Range for and auto type

The range based for loop is much like Perl’s foreach. For example, in Perl you could write

my @a = ( 1, 2, 3 ) ;
foreach my $v ( @a )
{
   foo( $v ) ;
}

An equivalent C++ loop like this can be as simple as

std::vector<int> a{1, 2, 3 } ;
for ( auto v : a )
{
   foo( v ) ;
}

You can also declare the list of items to iterate over inline, as in

using iocfg = iohandler::cfg ;
for ( auto c : { iocfg::graphics, iocfg::ascii, iocfg::netcdf, iocfg::noop } )
{
   // ...
}

Observe that, just like perl, C++ no longer requires any explicit type for the loop variable, as it is deduced when auto is specified. It is still strongly typed, but you can write code that doesn’t explicitly depend on that type. I see lots of benefits to this, as you can have additional freedom to change type definitions and not have to adjust everything that uses it.
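One detail worth knowing with range for: plain auto gives you a copy of each element, while auto & binds to the element itself. A small sketch with hypothetical helper names:

```cpp
#include <vector>

// auto & binds a reference to each element, so writes go back into the vector.
inline void doubleInPlace( std::vector<int> & a )
{
   for ( auto & v : a )
   {
      v *= 2 ;
   }
}

// Plain auto works on a copy of each element (fine for cheap types like int).
inline int sumOfCopies( const std::vector<int> & a )
{
   int sum{0} ;
   for ( auto v : a )
   {
      sum += v ;
   }
   return sum ;
}
```

For large element types, const auto & avoids the copy while still preventing modification.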

I can imagine that it could get confusing if all variables in a function were declared auto, but I did not find that to be the case for any of the code I produced in these assignments.

One gotcha with auto that I did hit was that care is required in computed expressions. I’d used auto in one case and the result got stored as a large unsigned value, instead of signed as desired (i.e. negative values got stored in unsigned auto variables). In that case I used an explicit type. Extensive use of auto may end up requiring more unit and other testing, to catch cases where the deduced types are not the ones that are desired.
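The original expression isn’t shown, so here is a hypothetical one that reproduces the same effect: auto picks up the unsigned type of size(), and a result that should be negative wraps around instead.

```cpp
#include <cstddef>
#include <vector>

// v.size() is an unsigned std::size_t, so the subtraction is done in unsigned
// arithmetic and wraps to a huge value when v has fewer than two elements.
inline std::size_t slotsLeftAuto( const std::vector<int> & v )
{
   auto left = v.size() - 2 ; // deduced as std::size_t
   return left ;
}

// Converting to a signed type before subtracting keeps the negative result.
inline long slotsLeftSigned( const std::vector<int> & v )
{
   const long left = static_cast<long>( v.size() ) - 2 ;
   return left ;
}
```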

std::chrono (ticks.h)

This is a nice portability layer for fine grained time measurements, allowing you to avoid platform specific functions like gettimeofday, and also to avoid having to combine the separate seconds and subseconds fields that many such interfaces return.

Here’s a fragment of a class that allows interval time measurements and subsequent conversion:

#include <chrono>

class ticks
{
   using clock      = std::chrono::high_resolution_clock ;

   clock::time_point m_sample ;
public:

   static inline ticks sample()
   {
      ticks t ;
      t.m_sample = clock::now() ;

      return t ;
   }

   using duration   = decltype( m_sample - m_sample ) ;

   friend duration operator -( const ticks & a, const ticks & b ) ;
} ;

inline ticks::duration operator -( const ticks & a, const ticks & b )
{
   return a.m_sample - b.m_sample ;
}

inline auto durationToMicroseconds( const ticks::duration & diff )
{
   return std::chrono::duration_cast<std::chrono::microseconds>( diff ).count() ;
}

Note that the last function is using C++14 return type deduction. That does not work in C++11, which requires a trailing return type:

inline auto durationToMicroseconds( const ticks::duration & diff )
-> decltype(std::chrono::duration_cast<std::chrono::microseconds>( diff ).count())
{
   return std::chrono::duration_cast<std::chrono::microseconds>( diff ).count() ;
}

which is very ugly.
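A hedged variant for simple interval timing: the standard allows high_resolution_clock to be an alias for system_clock, which isn’t guaranteed to be monotonic, so this sketch uses std::chrono::steady_clock instead. The timeMicroseconds helper is hypothetical, not part of ticks.h.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Run f once and return the elapsed time in microseconds. steady_clock is
// monotonic, so the difference can't go negative if the system clock is
// adjusted during the measurement.
template <typename F>
std::int64_t timeMicroseconds( F && f )
{
   using clock = std::chrono::steady_clock ;

   const auto start = clock::now() ;
   std::forward<F>( f )() ;
   const auto stop = clock::now() ;

   return std::chrono::duration_cast<std::chrono::microseconds>( stop - start ).count() ;
}
```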

Random numbers

The <random> header provides random number engines, such as the Mersenne Twister std::mt19937, and distributions that map engine output onto a desired range. Here’s a class that combines them:

#include <random>

/**
   A random number generator that produces integers uniformly
   distributed over the set:

   {a, a + delta, ..., a + delta N}

   i.e. values in [a, a + delta N] with separation delta between them.
 */
template <int a, int delta, int N>
class RandomIntegers
{
   std::random_device                        m_rd ;
   //std::default_random_engine                m_engine ;
   std::mt19937                              m_engine ;
   std::uniform_int_distribution<int>        m_uniform ;

public:
   /** construct a uniform random number generator for the specified range */
   RandomIntegers( )
      : m_rd()
      , m_engine( m_rd() )
      , m_uniform( 0, N )
   {
      static_assert( N > 0, "Integer N > 0 expected" ) ;
      static_assert( delta > 0, "Integer delta > 0 expected" ) ;
   }

   /**
      return a uniform random number sample from {a, a + delta, ..., a + delta N}
    */
   int sample()
   {
      // int (not unsigned) so that a + p * delta stays in signed arithmetic
      // when a is negative.
      auto p = m_uniform( m_engine ) ;

      return a + p * delta ;
   }
} ;
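For debugging, it can help to make runs reproducible by seeding the engine with a fixed value rather than from std::random_device. The sampleGrid function below is a hypothetical variant of RandomIntegers that takes its parameters at run time along with a caller supplied seed. The mt19937 output sequence is fully specified by the standard, but the distribution algorithms are not, so the same seed can give different samples with a different standard library implementation.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n samples from {a, a + delta, ..., a + delta N} using a fixed seed, so a
// run can be repeated exactly with the same standard library.
inline std::vector<int> sampleGrid( unsigned seed, int a, int delta, int N, int n )
{
   std::mt19937 engine( seed ) ;                       // fixed seed, not random_device
   std::uniform_int_distribution<int> uniform( 0, N ) ; // inclusive on both ends

   std::vector<int> out ;
   out.reserve( static_cast<std::size_t>( n ) ) ;
   for ( int i{0} ; i < n ; i++ )
   {
      out.push_back( a + uniform( engine ) * delta ) ;
   }
   return out ;
}
```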

constexpr

Instead of using #defines, one can use fully typed declarations that are still compile time constants, using the constexpr keyword. For example:

constexpr size_t N{3} ;
std::array<int, N> a ;
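constexpr also applies to functions, which the compiler can evaluate wherever a constant is required. A short sketch; the names counts, squared and grid are illustrative only.

```cpp
#include <array>
#include <cstddef>

// A typed compile time constant in place of "#define N 3".
constexpr std::size_t N{ 3 } ;
std::array<int, N> counts{} ;

// A single return statement keeps this valid C++11 (C++14 relaxed that rule).
constexpr std::size_t squared( std::size_t n ) { return n * n ; }

static_assert( squared( N ) == 9, "computed at compile time" ) ;
std::array<double, squared( N )> grid{} ;
```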

nullptr

The days of not knowing what header defines NULL and dealing with conflicting definitions are over. Instead of using NULL, we now have a built-in language keyword, nullptr.
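Besides the header confusion, nullptr also makes overload resolution predictable. In this sketch the two describe overloads are hypothetical. describe(NULL) could pick the int overload or be ambiguous, depending on how the implementation defines NULL, while nullptr has its own type, std::nullptr_t, which converts to any pointer type but not to int.

```cpp
#include <string>

// Two overloads: one takes an integer, one a pointer.
inline std::string describe( int )          { return "int" ; }
inline std::string describe( const char * ) { return "pointer" ; }

// nullptr can only convert to a pointer, so this always picks the pointer overload.
inline std::string describeNull() { return describe( nullptr ) ; }
```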

Lambdas and sort

Custom sorting is really simple in C++ now. Here’s an example of a partial sort (sorting the top N elements, and leaving the order of the rest unspecified). The comparison no longer has to be a separately defined function, and can be specified inline:

auto second_greater = [](auto & left, auto & right) { return left.second > right.second ; } ;
std::partial_sort( cvec.begin(),
                   cvec.begin() + N,
                   cvec.end(),
                   second_greater ) ;

The inline comparison function here uses C++14 generic lambda syntax (auto parameters). In C++11, lambda parameter types can’t be auto, so something such as the following might be required:

auto second_greater = [](const results_pair & left, const results_pair & right) { return left.second > right.second ; } ;
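Here is a self-contained version of the partial sort using the C++11 style comparison. results_pair is assumed here to be a std::pair<std::string, int> of names and counts, since its actual definition isn’t shown, and topN is a hypothetical wrapper.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed definition: (name, count) pairs, ranked by count.
using results_pair = std::pair<std::string, int> ;

// Put the N entries with the largest .second at the front, in descending
// order. The order of the remaining entries is unspecified.
inline void topN( std::vector<results_pair> & cvec, std::size_t N )
{
   N = std::min( N, cvec.size() ) ; // partial_sort needs middle <= end

   auto second_greater = []( const results_pair & left, const results_pair & right ) {
      return left.second > right.second ;
   } ;

   std::partial_sort( cvec.begin(),
                      cvec.begin() + static_cast<std::ptrdiff_t>( N ),
                      cvec.end(),
                      second_greater ) ;
}
```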

Useful standard helper methods

The standard library has lots of useful utility functions. I’m sure I only scratched the surface. Some that I used were:

std::swap( m_sz, other.m_sz ) ;
std::fill( m_storage.begin(), m_storage.end(), v ) ;
std::copy( b.m_storage.begin(), b.m_storage.end(), m_storage.begin() ) ;
r.first  = std::max( l, m_myFirstGlobalElementIndex ) ;
r.second = std::min( u, m_myLastGlobalElementIndex ) ;

I also liked the copysign function, allowing easy access to the sign bit of a float or double without messing around with extracting the bit, or explicit predicates:

inline double signof( const double v )
{
   return std::copysign( 1.0, v ) ;
}
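One subtlety in favour of copysign: it looks at the sign bit itself, so unlike a v < 0 predicate it reports -0.0 as negative. The signofByCompare function below exists only for comparison.

```cpp
#include <cmath>

inline double signof( const double v )
{
   return std::copysign( 1.0, v ) ; // uses the sign bit, so -0.0 gives -1.0
}

// Predicate based version: -0.0 < 0.0 is false, so -0.0 is treated as positive.
inline double signofByCompare( const double v )
{
   return ( v < 0.0 ) ? -1.0 : 1.0 ;
}
```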

Mean and standard deviation were also really easy to calculate. Here’s an example that used a lambda function to calculate the difference from the mean to get at the squared difference from the mean:


      m_sum = std::accumulate( v.begin(), v.end(), 0.0 ) ;
      m_mean = m_sum / v.size() ;
      double mean = m_mean ; // for lambda capture

      std::vector<double> diff( v.size() ) ;

      std::transform( v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; } ) ;

      m_sq_sum = std::inner_product( diff.begin(), diff.end(), diff.begin(), 0.0 ) ;
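Here is the fragment completed into a standalone function. The final step is the square root of the mean squared difference. This sketch computes the population standard deviation (dividing by n). For the sample estimate, divide by n - 1 instead. MeanStd and meanAndStddev are hypothetical names, and v is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

struct MeanStd
{
   double mean ;
   double stddev ;
} ;

inline MeanStd meanAndStddev( const std::vector<double> & v )
{
   const double sum = std::accumulate( v.begin(), v.end(), 0.0 ) ;
   const double mean = sum / v.size() ;

   // differences from the mean, via a lambda that captures mean by value
   std::vector<double> diff( v.size() ) ;
   std::transform( v.begin(), v.end(), diff.begin(),
                   [mean]( double x ) { return x - mean ; } ) ;

   // sum of squared differences
   const double sq_sum = std::inner_product( diff.begin(), diff.end(), diff.begin(), 0.0 ) ;

   return { mean, std::sqrt( sq_sum / v.size() ) } ;
}
```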

decltype

Attempting to mix auto with g++’s ‘-Wall -Werror’ causes some trouble. For example, this doesn’t work

void foo ( const size_t size )
{
   for ( auto i{0} ; i < size ; i++ )
   {
      // ...
   }
}

This doesn’t compile, since the i < size portion generates signed vs. unsigned comparison warnings, which -Werror turns into errors. There are a few ways to fix this:

   // specify the type explicitly:
   for ( size_t i{0} ; i < size ; i++ )

   // let the compiler use the type of the size variable:
   for ( decltype(size) i{0} ; i < size ; i++ )

The decltype method is probably of more use in template code. For non-template code, I found that explicitly specifying the type was more readable.
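As an example of where decltype earns its keep: in a template the container’s size type isn’t known up front, so decltype(c.size()) picks an index type that matches it exactly and avoids the signed/unsigned comparison warning. countNonZero is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Works for any container with size() and operator[] (e.g. std::vector).
template <typename Container>
auto countNonZero( const Container & c ) -> decltype( c.size() )
{
   decltype( c.size() ) count{0} ;
   for ( decltype( c.size() ) i{0} ; i < c.size() ; i++ )
   {
      if ( c[i] != 0 )
      {
         count++ ;
      }
   }
   return count ;
}
```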

std::valarray (myrarray.h)

The standard library has an array type with element-wise vector operations, std::valarray, but I was disappointed with the quality of the generated code that I observed. It also turned out to be faster not to use it. For example:

void SineCosineVecOps( std::valarray<float> & s, std::valarray<float> & c, const std::valarray<float> & v )
{
   s = std::sin( v ) ;
   c = std::cos( v ) ;
}

void SineCosineManOps( std::valarray<float> & s, std::valarray<float> & c, const std::valarray<float> & v )
{
   for ( Uint i{0} ; i < ASIZE ; i++ )
   {  
      float theta = v[i] ;

      s[i] = std::sin( theta ) ;
      c[i] = std::cos( theta ) ;
   }
}

When run on a 300 element array, the version using the valarray vector assignment operations executed close to 1.5x slower, and had close to 3x the instructions (with optimization)!

Perhaps other compilers do better with valarray, but with g++ 5.3 that container type certainly didn’t seem worth using.

Some Unix command line one liners

July 24, 2014 perl and general scripting hackery

Here are a couple of one-liner shell commands that I collected over the last couple of months, whenever it occurred to me to record them. Each seemed somewhat notable at the time.

Nested “backquotes”

I often have to run commands where it is convenient to have the parameters of the commands in a file.  A simple example is to edit all the files in a list of files, say:

vim `cat c`
# or:
vim $(cat c)

A useful variation of this is to do the same using the output of a command that also takes its input from a file. Here’s one to edit all the “ancestor” files in the version control system, assuming a command vcsancestor that produces such filenames

vim `vcsancestor $(cat c)`
# or
vim $(vcsancestor $(cat c))

Observe how two different methods of embedding shell commands can be combined into one command. In the past I often used for loops for something like this, say:

for i in `cat c` ; do vcsancestor $i ; done > f
vim `cat f`

(because backquotes can’t be nested without escaping the inner ones). It only recently occurred to me that this isn’t a limitation if $() style command substitution is used.

Batching commands with xargs

When working in a version control system, it’s often useful to do a batch checkout of all the files that have compilation errors.  Suppose that you made changes that produced the following compilation error output:

$ cat compile.errors
"satauth.C", line 978.30: 1540-0274 (S) The name lookup for "sqlorest" did not find a declaration.
"scrutil.C", line 142.52: 1540-0274 (S) The name lookup for "SQLNLS_SAME_STRING" did not find a declaration.
"testdrv.C", line 1146.16: 1540-0274 (S) The name lookup for "SQLO_OK" did not find a declaration.
"testdrv.C", line 183.15: 1540-0274 (S) The name lookup for "SQLO_OK" did not find a declaration.

Here’s a one liner to checkout all the files in this list of compilation errors (this is AIX xlC error output):

cut -f2 -d'"' compile.errors | sort -u | xargs cleartool checkout -nc

The cut command selects just the first double-quote delimited field (the filename), duplicates are removed with sort -u, and finally xargs runs cleartool checkout with the resulting filenames as arguments, batching them into as few invocations as the command line length limit allows.

Looking for a subset of information delimited by markers on separate lines

grep works nicely for matching patterns that are constrained to a single line. If you are using GNU grep, you can use the -A and -B options to print the lines after and before the pattern of interest. As an example, in our stacktrace files (a post mortem crash dump format), we have output that includes:

<StackTrace>
-----FUNC-ADDR---- ------FUNCTION + OFFSET------
0x00002AAAC74EF263 ossDumpStackTraceInternal(unsigned long, OSSTrapFile&, int, siginfo*, void*, unsigned long, unsigned long) + 0x06e3
0x00002AAAC74EFE89 ossDumpStackTraceV98 + 0x007f
0x00002AAAC74E5C5F OSSTrapFile::dumpEx(unsigned long, int, siginfo*, void*, unsigned long) + 0x04db
0x00002AAABA6EB313 sqlo_trce + 0x0a6f
0x00002AAABA9C52B5 sqloDumpDiagInfoHandler + 0x047b
0x00002AAAAABD5E00 address: 0x00002AAAAABD5E00 ; dladdress: 0x00002AAAAABC8000 ; offset in lib: 0x000000000000DE00 ;
0x00002AAAAABD30A5 pthread_kill + 0x0035
0x00002AAAB5D828DF ossPthreadKill(unsigned long, unsigned int) + 0x0053
0x00002AAABA9C6CA1 sqloDumpEDU + 0x0091
0x00002AAABED7A853 sqlzerdm + 0x149b
0x00002AAAB5D7D745 sqle_remap_errors(int, sqlca*, sqeAgent*) + 0x01c9
0x00002AAAB5DE8717 sqeApplication::AppStopUsing(sqeAgent*, unsigned char, sqlca*) + 0x10b1
0x00002AAAB5D46FF5 address: 0x00002AAAB5D46FF5 ; dladdress: 0x00002AAAAACE1000 ; offset in lib: 0x000000000B065FF5 ;
0x00002AAAB5D4073F address: 0x00002AAAB5D4073F ; dladdress: 0x00002AAAAACE1000 ; offset in lib: 0x000000000B05F73F ;
0x00002AAAB5D44F35 sqleIndCoordProcessRequest(sqeAgent*) + 0x3959
0x00002AAAB5DA8E55 sqeAgent::RunEDU() + 0x061b
0x00002AAABEDAC2C7 sqzEDUObj::EDUDriver() + 0x035d
0x00002AAABEDABBD7 sqlzRunEDU(char*, unsigned int) + 0x0053
0x00002AAABA9BFC62 sqloEDUEntry + 0x1460
0x00002AAAAABCE2A3 address: 0x00002AAAAABCE2A3 ; dladdress: 0x00002AAAAABC8000 ; offset in lib: 0x00000000000062A3 ;
0x00002AAAC7F376DD __clone + 0x006d
</StackTrace>

Here’s a one-liner to grab just the portions of these files within the delimiters (with some other filtering that isn’t terribly interesting to describe):

for i in *stack* ; do grep -A40 '<StackTrace' "$i" | grep -v '(/' | grep -B40 '/StackTrace' | c++filt ; done | less

Unix to Windows path separator switching

Suppose we have some unix filenames

$ head -5 f
/vbs/bin/AEDefines.pm
/vbs/bin/AEMacro.pm
/vbs/bin/bld_shared_lib_Darwin
/vbs/bin/chglibpaths
/vbs/bin/chglibpaths_Darwin

and want the Windows paths for the same

$ head -5 f | tr / '\\'
\vbs\bin\AEDefines.pm
\vbs\bin\AEMacro.pm
\vbs\bin\bld_shared_lib_Darwin
\vbs\bin\chglibpaths
\vbs\bin\chglibpaths_Darwin

The tr command above looks a bit like ASCII barf, and will translate forward slashes to backslashes (perhaps for input that’s a list of files).

I didn’t understand why the backslash had to be both single quoted and escaped, but Darin explained it for me:

Quotes allow the backslashes to go through the shell to tr.  And tr has its own backslash escape mechanism (so you can do things like transform \n into \r or something – where you’d then specify ‘\n’ or just \\n and ‘\r’ or \\r).

Vim: replace search results with contents from a file

Probably related to merging conflicting changes, I wanted to completely replace the implementation of a particular function:

void foo() {
 ...
}

This was an easy one-liner for that replacement, deleting the implementation of foo and replacing it with the one found in the file ‘foo’. With the cursor on the first line of the function, run:

:,/^}/ !cat foo

file:line: delimited output for a single file

The grep -n command is very handy for producing file:line:content delimited output.  In particular, you can iterate over such output with vim -q.  When you want to do this for a single file, grep -n doesn’t include the filename, defeating a subsequent vim -q (since vim then doesn’t know what file to open).  Here’s an example

$ cat my_file_to_search
blah patternOfInterest hi
foo goo
patternOfInterest bye
blah patternOfInterest hi
blah patternOfInterest hi
foo goo
patternOfInterest bye
foo goo
patternOfInterest bye

$ grep -n patternOfInterest my_file_to_search | tee v
1:blah patternOfInterest hi
3:patternOfInterest bye
4:blah patternOfInterest hi
5:blah patternOfInterest hi
7:patternOfInterest bye
9:patternOfInterest bye

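To get vim -q’able output, just include a second non-existent dummy file in the search. The complaint about the missing file goes to stderr, so it doesn’t end up in v. (With GNU grep, the -H option also forces the filename prefix.)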

grep -n patternOfInterest my_file_to_search a_file_that_doesnt_exist | tee v
vim -q v

I usually use a very short filename for the “does not exist” file, say, .u (which presumes that I don’t also create little hidden .u files in my day-to-day work).