clang/llvm

using ltrace to dig into shared libraries

October 19, 2016 C/C++ development and debugging., clang/llvm 1 comment , ,

I was trying to find where the clang compiler is writing out constant global data values, and didn’t manage to find it by code inspection. If I run ltrace (also tracing system calls), I see the point where the ELF object is written out:

std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
std::string::compare(std::string const&) const(0x1e32e60, 0x7ffc8983a190, 7, 254) = 0xfffffffb
std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
write@SYS(4, "\177ELF\002\001\001", 848)         = 848
lseek@SYS(4, 40, 0)                              = 40
write@SYS(4, "\220\001", 8)                      = 8
lseek@SYS(4, 848, 0)                             = 848
lseek@SYS(4, 60, 0)                              = 60
write@SYS(4, "\a", 2)                            = 2
lseek@SYS(4, 848, 0)                             = 848
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()(0x1e2a2e0, 0x1e2a2e8, 0x1e27978, 0x1e27978) = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bb58, 0x7ffc8983bad8, 8) = 0
close@SYS(4)                                     = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bad8, 0, 8)      = 0

This is from running:

ltrace -S --demangle \
   ...

The -S is to display syscalls as well as library calls. To my suprise, this seems to show calls to libstc++ library calls, but I’m not seeing much from clang itself, just:

clang::DiagnosticsEngine::DiagnosticsEngine
clang::driver::ToolChain::getTargetAndModeFromProgramName
llvm::cl::ExpandResponseFiles
llvm::EnablePrettyStackTrace
llvm::errs
llvm::install_fatal_error_handler
llvm::llvm_shutdown
llvm::PrettyStackTraceEntry::PrettyStackTraceEntry
llvm::PrettyStackTraceEntry::~PrettyStackTraceEntry
llvm::raw_ostream::preferred_buffer_size
llvm::raw_svector_ostream::write_impl
llvm::remove_fatal_error_handler
llvm::StringMapImpl::LookupBucketFor
llvm::StringMapImpl::RehashTable
llvm::sys::PrintStackTraceOnErrorSignal
llvm::sys::Process::FixupStandardFileDescriptors
llvm::sys::Process::GetArgumentVector
llvm::TimerGroup::printAll

There’s got to be a heck of a lot more that the compiler is doing!? It turns out that ltrace doesn’t seem to trace out all the library function calls that lie in shared libraries (I’m using a shared library + split dwarf build of clang). The default output was a bit deceptive since I saw some shared lib calls, in particular the there were std::… calls (from libstc++.so) in the ltrace output. My conclusion seems to be that the tool is lying by default.

This can be confirmed by explicitly asking to see the functions from a specific shared lib. For example, if I call ltrace as:

ltrace -S --demangle -e @libLLVMX86CodeGen.so \
/clang/be.b226a0a/bin/clang-3.9 \
-cc1 \
-triple \
x86_64-unknown-linux-gnu \
...

Now I get ~68K calls to libLLVMX86CodeGen.so functions that didn’t show up in the default ltrace output! The ltrace tool won’t show me these by default (although the man page seems to suggest that it should), but if I narrow down what I’m looking through to a single shared lib, at least I can now examine the function calls in that shared lib.

How to invoke the 2nd pass of the clang compiler manually

October 3, 2016 clang/llvm No comments , , , , ,

Because the clang front end reexecs itself, breakpoints on the interesting parts of the clang front end don’t get hit by default. Here’s an example

$ cat g2
b llvm::Module::setDataLayout
b BackendConsumer::BackendConsumer
b llvm::TargetMachine::TargetMachine
b llvm::TargetMachine::createDataLayout
run -mbig-endian -m64 -c bytes.c -emit-llvm -o big.bc

$ gdb `which clang`
GNU gdb (GDB) Red Hat Enterprise Linux 7.9.1-19.lz.el7
...
(gdb) source g2
Breakpoint 1 at 0x2c04c3d: llvm::Module::setDataLayout. (2 locations)
Breakpoint 2 at 0x3d08870: file /source/llvm/lib/Target/TargetMachine.cpp, line 47.
Breakpoint 3 at 0x33108ca: file /source/llvm/include/llvm/Target/TargetMachine.h, line 133.
...
Detaching after vfork from child process 15795.
[Inferior 1 (process 15789) exited normally]

(The debugger finishes and exits, hitting none of the breakpoints)

One way to deal with this is to set the fork mode to child:

(gdb) set follow-fork-mode child

An alternate way of dealing with this is to use strace to collect the command line that clang invokes itself with. For example:

$ strace -f -s 1024 -v clang -mbig-endian -m64 big.bc -c 2>&1 | grep exec | tail -2 | head -1

This provides the command line options for the self invocation of clang

[pid  4650] execve("/usr/local/bin/clang-3.9", ["/usr/local/bin/clang-3.9", "-cc1", "-triple", "aarch64_be-unknown-linux-gnu", "-emit-obj", "-mrelax-all", "-disable-free", "-main-file-name", "big.bc", "-mrelocation-model", "static", "-mthread-model", "posix", "-mdisable-fp-elim", "-fmath-errno", "-masm-verbose", "-mconstructor-aliases", "-fuse-init-array", "-target-cpu", "generic", "-target-feature", "+neon", "-target-abi", "aapcs", "-dwarf-column-info", "-debugger-tuning=gdb", "-coverage-file", "/workspace/pass/run/big.bc", "-resource-dir", "/usr/local/bin/../lib/clang/3.9.0", "-fdebug-compilation-dir", "/workspace/pass/run", "-ferror-limit", "19", "-fmessage-length", "0", "-fallow-half-arguments-and-returns", "-fno-signed-char", "-fobjc-runtime=gcc", "-fdiagnostics-show-option", "-o", "big.o", "-x", "ir", "big.bc"],

With a bit of vim tweaking you can turn this into a command line that can be executed (or debugged) directly

/usr/local/bin/clang-3.9 -cc1 -triple aarch64_be-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name big.bc -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -dwarf-column-info -debugger-tuning=gdb -coverage-file /workspace/pass/run/big.bc -resource-dir /usr/local/bin/../lib/clang/3.9.0 -fdebug-compilation-dir /workspace/pass/run -ferror-limit 19 -fmessage-length 0 -fallow-half-arguments-and-returns -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -o big.o -x ir big.bc

Note that doing this also provides a mechanism to change the compiler triple manually, which is something that I wondered how to do (since clang documents -triple as an option, but seems to ignore it). For example, I’m able to able to change -triple aarch64_be to aarch64 and get little endian object code from bytecode prepared with -mbig-endian.

speeding up clang debug and builds

October 2, 2016 clang/llvm No comments , , , , , , ,

I found the default static library configuration of clang slow to rebuild, so I started building it with in shared mode. That loaded pretty slow in gdb, so I went looking for how to enable split dwarf, and found a nice little presentation on how to speed up clang builds.

There’s a followup blog post with some speed up conclusions.

A failure of that blog post is actually listing the cmake commands required to build with all these tricks. Using all these tricks listed there, I’m now trying the following:

mkdir -p ~/freeware
cd ~/freeware

git clone git://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
./configure --prefix=$HOME/local/binutils.gold --enable-gold=default
make 
make install

cd ..
git clone git://github.com/ninja-build/ninja.git 
cd ninja
./configure.py --bootstrap
mkdir -p ~/local/ninja/bin/
cp ninja ~/local/ninja/bin/

With ninja in my PATH, I can now build clang with:

CC=clang CXX=clang++ \
cmake -G Ninja \
../llvm \
-DLLVM_USE_SPLIT_DWARF=TRUE \
-DLLVM_ENABLE_ASSERTIONS=TRUE \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$HOME/clang39.be \
-DCMAKE_SHARED_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index' \
-DCMAKE_EXE_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index' \
-DBUILD_SHARED_LIBS=true \
-DLLVM_TARGETS_TO_BUILD=X86 \
2>&1 | tee o

ninja

ninja install

This does build way faster, both for full builds and incremental builds.

Build tree size

Dynamic libraries: 4.4 Gb. Static libraries: 19.8Gb.

Installed size

Dynamic libraries: 0.7 Gb. Static libraries: 14.7Gb.

Results: full build time.

Static libraries, non-ninja, all backends:

real    51m6.494s
user    160m47.027s
sys     8m49.429s

Dynamic libraries, ninja, split dwarf, x86 backend only:

real    26m19.360s
user    86m11.477s
sys     3m14.478s

Results: incremental build. touch lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp.

Static libraries, non-ninja, all backends:

real    2m17.709s
user    6m8.648s
sys     0m28.594s

Dynamic libraries, ninja, split dwarf, x86 backend only:

real    0m3.245s
user    0m6.104s
sys     0m0.802s

make install times

make:

real    2m6.353s
user    0m7.827s
sys     0m15.316s

ninja:

real    0m2.138s
user    0m0.420s
sys     0m0.831s

The time for rerunning a sharedlib-config ‘ninja install’ is even faster!

Results: time for gdb, b main, run, quit

Static libraries:

real    0m45.904s
user    0m32.376s
sys     0m1.787s

Dynamic libraries, with split dwarf:

real    0m44.440s
user    0m37.096s
sys     0m1.067s

This one isn’t what I would have expected. The initial gdb load time for the split-dwarf exe is almost instantaneous, however it still takes a long time to break in main and continue to that point. I guess that we are taking the hit for a lot of symbol lookup at that point, so it comes out as a wash.

Thinking about this, I noticed that the clang make system doesn’t seem to add ‘-Wl,-gdb-index’ to the link step along with the addition of -gsplit-dwarf to the compilation command line. I thought that was required to get all the deferred symbol table lookup?

Attempting to do so, I found that the insertion of an alternate linker in my PATH wasn’t enough to get clang to use it. Adding –Wl,–gdb-index into the link flags caused complaints from /usr/bin/ld! The cmake magic required was:

-DCMAKE_SHARED_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index' \
-DCMAKE_EXE_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index' \

This is in the first cmake invocation flags above, but wasn’t used for my initial 45s gdb+clang time measurements. With –gdb-index, the time for the gdb b-main, run, quit sequence is now reduced to:

real    0m10.268s
user    0m3.623s
sys     0m0.429s

A 4x reduction, which is quite nice!