C/C++ development and debugging.

A weird way to invoke the compiler

November 18, 2016 C/C++ development and debugging.

Did you know that the compiler can read its source from stdin?  Here’s an example:

// m.c
#include <stdio.h>

int main()
{   
    int x = 3;
    printf("%d\n", x);
    return 0;
}

You have to specify the language for the code explicitly, since it can’t be inferred from a filename when the source is coming from stdin:

$ cat m.c | clang -g -x c - -o f
$ ./f
3

This fact came up in conversation the other day. The result is something that is completely undebuggable, but you can do it! I’m curious if there’s actually a use case for this?

Another Linux shared library trace facility

October 27, 2016 C/C++ development and debugging.

I previously blogged about a way to force ltrace to show some shared library trace records that didn’t show up by default.

Where that fails to be useful is when you don’t have a guess about which shared library the code in question lives in. I just stumbled on the latrace command, which uses a Linux dynamic loader audit facility to give a complete trace of all the function-name/library-name pairs that are executed!

Here’s an example invocation:

latrace \
clang xx.c -c 2>&1 | c++filt

with output like:

...
 9022     std::operator&(std::memory_order, std::__memory_order_modifier) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMAnalysis.so]
 9022     std::operator&(std::memory_order, std::__memory_order_modifier) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMAnalysis.so]
 9022     strlen [/lib64/libc.so.6]
 9022     strlen [/lib64/libc.so.6]
 9022     strlen [/lib64/libc.so.6]
 9022     llvm::cl::basic_parser::basic_parser(llvm::cl::Option&) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMSupport.so]
 9022     strlen [/lib64/libc.so.6]
 9022     llvm::cl::Option::setArgStr(llvm::StringRef) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMSupport.so]
 9022     strlen [/lib64/libc.so.6]
 9022     std::pair::__type, std::__decay_and_strip::__type> std::make_pair(void const**&&, bool&&) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMX86CodeGen.so]
 9022       void const**&& std::forward(std::remove_reference::type&) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMX86CodeGen.so]
 9022       bool&& std::forward(std::remove_reference::type&) []
 9022       void const**&& std::forward(std::remove_reference::type&) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMX86CodeGen.so]
 9022       bool&& std::forward(std::remove_reference::type&) []
 9022     void const**&& std::forward(std::remove_reference::type&) [/home/pjoot/clang/be.5e0ac1f.lz31/bin/../lib/libLLVMX86CodeGen.so]
...

With this latrace command, we get all the shared library function call names and their corresponding shared library names. Using that info we can dig into a specific shared library with ltrace or the debugger, once a point of interest is determined.

using ltrace to dig into shared libraries

October 19, 2016 C/C++ development and debugging., clang/llvm

I was trying to find where the clang compiler is writing out constant global data values, and didn’t manage to find it by code inspection. If I run ltrace (also tracing system calls), I see the point where the ELF object is written out:

std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
std::string::compare(std::string const&) const(0x1e32e60, 0x7ffc8983a190, 7, 254) = 0xfffffffb
std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
write@SYS(4, "\177ELF\002\001\001", 848)         = 848
lseek@SYS(4, 40, 0)                              = 40
write@SYS(4, "\220\001", 8)                      = 8
lseek@SYS(4, 848, 0)                             = 848
lseek@SYS(4, 60, 0)                              = 60
write@SYS(4, "\a", 2)                            = 2
lseek@SYS(4, 848, 0)                             = 848
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()(0x1e2a2e0, 0x1e2a2e8, 0x1e27978, 0x1e27978) = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bb58, 0x7ffc8983bad8, 8) = 0
close@SYS(4)                                     = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bad8, 0, 8)      = 0

This is from running:

ltrace -S --demangle \
   ...

The -S is to display syscalls as well as library calls. To my surprise, this shows calls into libstdc++, but I’m not seeing much from clang itself, just:

clang::DiagnosticsEngine::DiagnosticsEngine
clang::driver::ToolChain::getTargetAndModeFromProgramName
llvm::cl::ExpandResponseFiles
llvm::EnablePrettyStackTrace
llvm::errs
llvm::install_fatal_error_handler
llvm::llvm_shutdown
llvm::PrettyStackTraceEntry::PrettyStackTraceEntry
llvm::PrettyStackTraceEntry::~PrettyStackTraceEntry
llvm::raw_ostream::preferred_buffer_size
llvm::raw_svector_ostream::write_impl
llvm::remove_fatal_error_handler
llvm::StringMapImpl::LookupBucketFor
llvm::StringMapImpl::RehashTable
llvm::sys::PrintStackTraceOnErrorSignal
llvm::sys::Process::FixupStandardFileDescriptors
llvm::sys::Process::GetArgumentVector
llvm::TimerGroup::printAll

There’s got to be a heck of a lot more that the compiler is doing!? It turns out that ltrace doesn’t seem to trace all the function calls that land in shared libraries (I’m using a shared library + split dwarf build of clang). The default output was a bit deceptive since I saw some shared lib calls, in particular there were std::… calls (from libstdc++.so) in the ltrace output. My conclusion is that the tool is lying by default.

This can be confirmed by explicitly asking to see the functions from a specific shared lib. For example, if I call ltrace as:

$ ltrace -S --demangle -e @libLLVMX86CodeGen.so \
/clang/be.b226a0a/bin/clang-3.9 \
-cc1 \
-triple \
x86_64-unknown-linux-gnu \
...

Now I get ~68K calls to libLLVMX86CodeGen.so functions that didn’t show up in the default ltrace output! The ltrace tool won’t show me these by default (although the man page seems to suggest that it should), but if I narrow down what I’m looking through to a single shared lib, at least I can now examine the function calls in that shared lib.

On the SONAME

Note that the @lib….so name has to match the SONAME.  For example, if the shared libraries on disk were:

libLLVMX86CodeGen.so -> libLLVMX86CodeGen.so.3
libLLVMX86CodeGen.so.3 -> libLLVMX86CodeGen.so.3.9
libLLVMX86CodeGen.so.3.9 -> libLLVMX86CodeGen.so.3.9.0

then

$ objdump -x libLLVMX86CodeGen.so | grep SONAME

would give you the name to use.  This becomes relevant in clang 4.0 where the SONAME ends up with .so.4 instead of just .so (when building clang with shared libs instead of archive libs).

brace matching in vim, regardless of how it is formatted?

August 31, 2016 C/C++ development and debugging.

DB2 functions were usually formatted with the opening brace on its own line, in the first column, like so:

size_t table_count( T * table )
{ 
   size_t count = 0 ;
   ....
} 

For such code, typing [[ in vim from anywhere in the function text takes you to the beginning of the function. It has always annoyed me that this key sequence doesn’t work for functions formatted without the leading { in the first column, such as

size_t table_count( T * table ) { 
    size_t count = 0 ;
    ....
} 

Having my handy [[ command sequence take me to the first line of the file is pretty annoying, enough that I looked up the way to do what I want. A key sequence that does part of this job is:

[{

This takes you to the unmatched { that opens the current scope, and you can use % to jump from there to the matching closing brace. You can repeat [{ as many times as necessary, until you reach the outermost scope.

Is there a better way to go directly to the outermost scope, regardless of how the function happens to be formatted?

Playing with c++11 and posix regular expression libraries

July 24, 2016 C/C++ development and debugging.

I was curious how the c++11 std::regex interface compares to the C posix regular expression library. The c++11 interfaces are almost as easy to use as perl. Suppose we have some space separated fields that we wish to manipulate, printing both the original and a version with the order of the fields switched:

my @strings = ( "hi bye", "hello world", "why now", "one two" ) ;

foreach ( @strings )
{
   s/(\S+)\s+(\S+)/'$&' -> '$2 $1'/ ;

   print "$_\n" ;
}

The C++ equivalent is

   const char * strings[] { "hi bye", "hello world", "why now", "one two" } ;

   std::regex re( R"((\S+)\s+(\S+))" ) ;

   for ( auto s : strings )
   {
      std::cout << regex_replace( s, re, "'$&' -> '$2 $1'\n" )  ;
   }

We have one additional step with the C++ code, compiling the regular expression. Precompilation of perl regular expressions is also possible, but that is usually just a performance optimization.

The posix equivalent requires precompilation too

void posixre_error( regex_t * pRe, int rc )
{
   char buf[ 128 ] ;

   regerror( rc, pRe, buf, sizeof(buf) ) ;

   fprintf( stderr, "regerror: %s\n", buf ) ;
   exit( 1 ) ;
}

void posixre_compile( regex_t * pRe, const char * expression )
{
   int rc = regcomp( pRe, expression, REG_EXTENDED ) ;
   if ( rc )
   { 
      posixre_error( pRe, rc ) ;
   }
}

but the transform requires more work:

void posixre_transform( regex_t * pRe, const char * input )
{
   constexpr size_t N{3} ;
   regmatch_t m[N] {} ;

   int rc = regexec( pRe, input, N, m, 0 ) ;

   if ( rc && (rc != REG_NOMATCH) )
   {
      posixre_error( pRe, rc ) ;
   }

   if ( !rc )
   { 
      printf( "'%s' -> ", input ) ;
      int len ;
      len = m[2].rm_eo - m[2].rm_so ; printf( "'%.*s ", len, &input[ m[2].rm_so ] ) ;
      len = m[1].rm_eo - m[1].rm_so ; printf( "%.*s'\n", len, &input[ m[1].rm_so ] ) ;
   }
}

To get at the capture expressions we have to pass an array of regmatch_t’s. The first element of that array is the entire match expression, and then we get the captures after that. The awkward thing to deal with is that the regmatch_t is a structure containing the start and end offsets within the string.

If we want more granular info from the c++ matcher, it can also provide an array of capture info. We can also get info about whether or not the match worked, something we can do in perl easily

my @strings = ( "hi bye", "helloworld", "why now", "onetwo" ) ;

foreach ( @strings )
{
   if ( s/(\S+)\s+(\S+)/$2 $1/ )
   {
      print "$_\n" ;
   }
}  

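This only prints the transformed line if there was a match success. To do this in C++ we can use regex_match (which requires the whole string to match the pattern, whereas regex_search, like the perl match, accepts a match anywhere in the string; for these strings the two agree)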

const char * pattern = R"((\S+)\s+(\S+))" ;

std::regex re( pattern ) ;

for ( auto s : strings )
{ 
   std::cmatch m ;

   if ( regex_match( s, m, re ) )
   { 
      std::cout << m[2] << ' ' << m[1] << '\n' ;
   }
}

Note that we don’t have to mess around with offsets as was required with the Posix C interface, and also don’t have to worry about the size of the capture match array, since that is handled under the covers. It’s not too hard to wrap the posix C APIs in a C++ wrapper that makes them about as easy to use as the C++ regex code, but there is little reason to do so unless you are constrained to using pre-C++11 code and can also live with a Unix only restriction. There are also portability issues with the posix APIs. For example, perl-style regular expressions like:

   R"((\S+)(\s+)(\S+))"

work fine with the Linux regex API, but that appears to be an exception. To make code using that regex work on Mac, I had to use strict posix syntax

   R"(([^[:space:]]+)([[:space:]]+)([^[:space:]]+))"

Actually using the Posix C interface, with a portability constraint that avoids the Linux regex extensions, would be horrendous.