clang/llvm

Tagged V3 of my toy compiler (playing with the MLIR -> LLVM-IR toolchain)

June 3, 2025 clang/llvm

I’ve added a number of elements to the language and compiler:

  • comparison operators (<, <=, EQ, NE) yielding BOOL values. These work for any combination of floating-point and integer types (including BOOL).

  • integer bitwise operators (OR, AND, XOR). These only work for integer types (including BOOL).

  • a NOT operator, yielding BOOL.

  • Array + string declaration and lowering support, including debug instrumentation, and print support for string variables.
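The operand rules in the list above can be summarized as a small type-checking function. This is a minimal sketch in plain C++, independent of the compiler's real implementation: Kind, BinOp and binaryResultKind are hypothetical names, and the result type chosen for a bitwise op (the wider of the two integer operands) is an assumption, since the post doesn't say.

```cpp
#include <cassert>
#include <optional>

// Hypothetical scalar kinds for the toy language, ordered from narrowest to widest.
enum class Kind { Bool, Int8, Int16, Int32, Int64, Float32, Float64 };

enum class BinOp { Less, LessEq, Eq, NotEq, BitOr, BitAnd, BitXor };

// BOOL is treated as an integer type for the bitwise operators.
bool isInteger(Kind k) { return k != Kind::Float32 && k != Kind::Float64; }

// Result kind of `lhs op rhs`, or std::nullopt if the operand kinds are rejected.
std::optional<Kind> binaryResultKind(BinOp op, Kind lhs, Kind rhs) {
    switch (op) {
    case BinOp::Less:
    case BinOp::LessEq:
    case BinOp::Eq:
    case BinOp::NotEq:
        // Comparisons accept any int/float mix and always yield BOOL.
        return Kind::Bool;
    case BinOp::BitOr:
    case BinOp::BitAnd:
    case BinOp::BitXor:
        if (!isInteger(lhs) || !isInteger(rhs)) {
            return std::nullopt; // bitwise ops reject floating-point operands
        }
        // Assumption: widen to the larger of the two integer kinds.
        return lhs > rhs ? lhs : rhs;
    }
    return std::nullopt;
}
```

For example, an INT32 compared against a FLOAT64 yields BOOL, while an INT8 AND FLOAT64 is rejected.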

This version also fixes a few specific issues:

  • Fixed -g/-OX propagation to lowering. If -g is not specified, debug info (DI) is no longer generated.
  • Show the optimized .ll with --emit-llvm instead of the just-lowered .ll (unless the assembly printer isn't being invoked, since that is where the LLVM-IR optimization passes are registered).
  • Reorganize the grammar so that all the simple lexer tokens are last.  Rename a bunch of the tokens, introducing some consistency.
  • calculator.td: introduce IntOrFloat constraint type, replacing AnyType usage; array decl support, and string support.
  • driver: writeLL helper function, pass -g to lowering if set.
  • parser: handle large integer constants properly, array decl support, and string support.
  • simplest.cpp: This MWE is updated to include a global variable and global variable access.
  • parser: implicit exit: use the last saved location instead of the module start. This means the line numbers no longer jump around at the very end of the program (i.e., at the implicit return/exit).

I started with the comparison operators, thinking that I’d add if statement support and loops, but got sidetracked. In particular, I generated a number of really large test programs, and without some way to print a string message, it was hard to figure out where an error was occurring. This led to implementing PRINT string-variable support as an interesting feature first.

As a side effect of adding STRING support, I’ve also got declaration support for arbitrary fixed size arrays for any type.  I haven’t implemented array access yet, but that probably won’t be too hard.

Here’s an example of a program that uses STRING variables:

STRING t[2];
STRING u[3];
STRING s[2];
INT8 s2;
s = "hi";
PRINT s;
t = "hi";
PRINT t;
u = "hi";
PRINT u;
u = "bye";
PRINT u;

This is the MLIR for the program:

module {
  toy.program {
    "toy.declare"() <{name = "t", size = 2 : i64, type = i8}> : () -> ()
    "toy.declare"() <{name = "u", size = 3 : i64, type = i8}> : () -> ()
    "toy.declare"() <{name = "s", size = 2 : i64, type = i8}> : () -> ()
    "toy.declare"() <{name = "s2", type = i8}> : () -> ()
    toy.string_assign "s" = "hi"
    %0 = toy.load "s" : !llvm.ptr
    toy.print %0 : !llvm.ptr
    toy.string_assign "t" = "hi"
    %1 = toy.load "t" : !llvm.ptr
    toy.print %1 : !llvm.ptr
    toy.string_assign "u" = "hi"
    %2 = toy.load "u" : !llvm.ptr
    toy.print %2 : !llvm.ptr
    toy.string_assign "u" = "bye"
    %3 = toy.load "u" : !llvm.ptr
    toy.print %3 : !llvm.ptr
    toy.exit
  }
}

It’s a bit clunky, because I cheated and didn’t try to implement PRINT support of string literals directly. I thought that since I already had variable support (which emits llvm.alloca), I could trivially change that alloca from a scalar to an array.

I think that this did turn out to be a relatively easy way to do it, but this little item did take much more effort than I expected.

The DeclareOp builder is fairly straightforward:

builder.create<toy::DeclareOp>( loc, builder.getStringAttr( varName ), mlir::TypeAttr::get( ty ), nullptr ); // for scalars
...
auto sizeAttr = builder.getI64IntegerAttr( arraySize );
builder.create<toy::DeclareOp>( loc, builder.getStringAttr( varName ), mlir::TypeAttr::get( ty ), sizeAttr ); // for arrays

This matches the Optional size now added to DeclareOp for arrays:

def Toy_DeclareOp : Op<Toy_Dialect, "declare"> {
  let summary = "Declare a variable or array, specifying its name, type (integer or float), and optional size.";
  let arguments = (ins StrAttr:$name, TypeAttr:$type, OptionalAttr<I64Attr>:$size);
  let results = (outs);
}

There’s a new AssignStringOp complementing AssignOp:

def Toy_AssignOp : Op<Toy_Dialect, "assign"> {
  let summary = "Assign a (non-string) value to a variable associated with a declaration";
  let arguments = (ins StrAttr:$name, AnyType:$value);
  let results = (outs);

  // toy.assign "x", %0 : i32
  let assemblyFormat = "$name `,` $value `:` type($value) attr-dict";
}

def Toy_AssignStringOp : Op<Toy_Dialect, "string_assign"> {
  let summary = "Assign a string literal to an i8 array variable";
  let arguments = (ins StrAttr:$name, Builtin_StringAttr:$value);
  let results = (outs);
  let assemblyFormat = "$name `=` $value attr-dict";
}

I also feel this is a kludge. I probably really want a string literal type like flang’s. Here’s a Fortran hello world:

program hello
  print *, "Hello, world!"
end program hello

and selected parts of the flang FIR dialect MLIR for it:

    %4 = fir.declare %3 typeparams %c13 {fortran_attrs = #fir.var_attrs<parameter>, uniq_name = "_QQclX48656C6C6F2C20776F726C6421"} : (!fir.ref<!fir.char<1,13>>, index) -> !fir.ref<!fir.char<1,13>>
    %5 = fir.convert %4 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %6 = fir.convert %c13 : (index) -> i64
    %7 = fir.call @_FortranAioOutputAscii(%2, %5, %6) fastmath<contract> : (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
  ...

  fir.global linkonce @_QQclX48656C6C6F2C20776F726C6421 constant : !fir.char<1,13> {
    %0 = fir.string_lit "Hello, world!"(13) : !fir.char<1,13>
    fir.has_value %0 : !fir.char<1,13>
  }

I had to specialize the LoadOp builder too so that it didn’t create a scalar load. That code looks like:

mlir::Type varType;
mlir::Type elemType = declareOp.getTypeAttr().getValue();
        
if ( declareOp.getSizeAttr() )    // Check if size attribute exists
{       
    // Array: load a generic pointer 
    varType = mlir::LLVM::LLVMPointerType::get( builder.getContext(), /*addressSpace=*/0 );
}       
else    
{       
    // Scalar: load the value
    varType = elemType;
}       
        
auto value = builder.create<toy::LoadOp>( loc, varType, builder.getStringAttr( varName ) );

Lowering didn’t require too much. I needed a print function object:

auto ptrType = LLVM::LLVMPointerType::get( ctx );
auto printFuncStringType = LLVM::LLVMFunctionType::get( LLVM::LLVMVoidType::get( ctx ),
                                                        { pr_builder.getI64Type(), ptrType }, false );
pr_printFuncString = pr_builder.create<LLVM::LLVMFuncOp>( pr_module.getLoc(), "__toy_print_string",
                                                          printFuncStringType, LLVM::Linkage::External );

With LoadOp now possibly returning a pointer, its lowering hands back the alloca address for arrays and emits a scalar load otherwise:

if ( mlir::isa<mlir::LLVM::LLVMPointerType>( loadOp.getResult().getType() ) )
{           
    // Return the allocated pointer
    LLVM_DEBUG( llvm::dbgs() << "Loading array address: " << allocaOp.getResult() << '\n' );
    rewriter.replaceOp( op, allocaOp.getResult() );
}           
else        
{           
    // Scalar load
    auto load = rewriter.create<LLVM::LoadOp>( loc, elemType, allocaOp );
    LLVM_DEBUG( llvm::dbgs() << "new load op: " << load << '\n' );
    rewriter.replaceOp( op, load.getResult() );
}           

The string_assign lowering basically just generates a memcpy from a global:

Type elemType = allocaOp.getElemType();
int64_t numElems = 0;
if ( auto constOp = allocaOp.getArraySize().getDefiningOp<LLVM::ConstantOp>() )
{
    // Guard the cast: only an integer constant gives us a usable element count.
    if ( auto intAttr = mlir::dyn_cast<IntegerAttr>( constOp.getValue() ) )
    {
        numElems = intAttr.getInt();
    }
}
LLVM_DEBUG( llvm::dbgs() << "numElems: " << numElems << '\n' );
LLVM_DEBUG( llvm::dbgs() << "elemType: " << elemType << '\n' );

if ( !mlir::isa<mlir::IntegerType>( elemType ) || elemType.getIntOrFloatBitWidth() != 8 )
{           
    return rewriter.notifyMatchFailure( assignOp, "string assignment requires i8 array" );
}           
if ( numElems == 0 )
{           
    return rewriter.notifyMatchFailure( assignOp, "invalid array size" );
}           

size_t strLen = value.size();
size_t copySize = std::min( strLen + 1, static_cast<size_t>( numElems ) );
if ( strLen > static_cast<size_t>( numElems ) )
{           
    return rewriter.notifyMatchFailure( assignOp, "string too large for array" );
}           

mlir::LLVM::GlobalOp globalOp = lState.lookupOrInsertGlobalOp( rewriter, value, loc, copySize, strLen );

auto globalPtr = rewriter.create<LLVM::AddressOfOp>( loc, globalOp ); 

auto destPtr = allocaOp.getResult();

auto sizeConst = 
    rewriter.create<LLVM::ConstantOp>( loc, rewriter.getI64Type(), rewriter.getI64IntegerAttr( copySize ) );

rewriter.create<LLVM::MemcpyOp>( loc, destPtr, globalPtr, sizeConst, rewriter.getBoolAttr( false ) );

rewriter.eraseOp( op );
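The size checks above boil down to a small rule that can be modeled in plain C++ with no MLIR involved. In this sketch (stringCopySize and assignString are hypothetical names), a std::vector<char> stands in for the i8 alloca and std::memcpy for llvm.memcpy. Note the consequence for s[2] = "hi": the copy is 2 bytes and no NUL terminator is stored, which is why the print call needs an explicit length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Mirrors the lowering's checks: reject an empty array or a string longer than the
// array; otherwise copy the characters plus the NUL terminator if there is room.
std::optional<std::size_t> stringCopySize(std::size_t strLen, std::size_t numElems) {
    if (numElems == 0 || strLen > numElems) {
        return std::nullopt;
    }
    return std::min(strLen + 1, numElems);
}

// Stand-in for the emitted memcpy: copies copySize bytes of the literal into a
// fixed-size buffer. Returns false where the lowering would report a match failure.
bool assignString(std::vector<char>& array, const std::string& literal) {
    auto copySize = stringCopySize(literal.size(), array.size());
    if (!copySize) {
        return false;
    }
    // c_str() supplies literal.size() + 1 bytes, so copySize never reads past the end.
    std::memcpy(array.data(), literal.c_str(), *copySize);
    return true;
}
```

With this rule, "hi" into u[3] copies 3 bytes (including the NUL), "bye" into u[3] copies exactly 3 bytes with no NUL, and "bye" into s[2] is rejected.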

I used globals like the ones clang emits in its LLVM-IR. For example, here’s a small C program that copies a string literal into a local buffer:

#include <string.h>

int main()
{
    const char* s = "hi there";
    char buf[100];
    memcpy( buf, s, strlen( s ) + 1 );

    return strlen( buf );
}

for which clang’s unoptimized (-O0) LLVM-IR looks like:

@.str = private unnamed_addr constant [9 x i8] c"hi there\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca ptr, align 8
  %3 = alloca [100 x i8], align 16
  store i32 0, ptr %1, align 4
  store ptr @.str, ptr %2, align 8
  %4 = getelementptr inbounds [100 x i8], ptr %3, i64 0, i64 0
  %5 = load ptr, ptr %2, align 8
  %6 = load ptr, ptr %2, align 8
  %7 = call i64 @strlen(ptr noundef %6) #3
  %8 = add i64 %7, 1
  call void @llvm.memcpy.p0.p0.i64(ptr align 16 %4, ptr align 1 %5, i64 %8, i1 false)
  %9 = getelementptr inbounds [100 x i8], ptr %3, i64 0, i64 0
  %10 = call i64 @strlen(ptr noundef %9) #3
  %11 = trunc i64 %10 to i32
  ret i32 %11
}

My lowered LLVM-IR for the program is similar:

@str_1 = private constant [3 x i8] c"bye"
@str_0 = private constant [2 x i8] c"hi"

declare void @__toy_print_f64(double)

declare void @__toy_print_i64(i64)

declare void @__toy_print_string(i64, ptr)

define i32 @main() {
  %1 = alloca i8, i64 2, align 1
  %2 = alloca i8, i64 3, align 1
  %3 = alloca i8, i64 2, align 1
  %4 = alloca i8, i64 1, align 1
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %3, ptr align 1 @str_0, i64 2, i1 false)
  call void @__toy_print_string(i64 2, ptr %3)
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %1, ptr align 1 @str_0, i64 2, i1 false)
  call void @__toy_print_string(i64 2, ptr %1)
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %2, ptr align 1 @str_0, i64 3, i1 false)
  call void @__toy_print_string(i64 3, ptr %2)
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %2, ptr align 1 @str_1, i64 3, i1 false)
  call void @__toy_print_string(i64 3, ptr %2)
  ret i32 0
}
...
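The runtime half of __toy_print_string isn’t shown in this post. The IR above passes an explicit byte count (i64 2 for s, which holds "hi" with no room for a NUL), so a runtime with this signature has to honor the length rather than rely on a terminator. Here is a minimal C++ sketch under stated assumptions: it assumes the function prints the string followed by a newline, toyStringToStd is a hypothetical helper, and this is not the toy compiler’s actual runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Collects at most `len` bytes of `str`, stopping early at a NUL. A variable such
// as s[2] = "hi" holds no terminator, so the length is the only reliable bound.
std::string toyStringToStd(std::int64_t len, const char* str) {
    std::string out;
    for (std::int64_t i = 0; i < len && str[i] != '\0'; ++i) {
        out.push_back(str[i]);
    }
    return out;
}

// Hypothetical C-linkage definition matching the IR declaration
// `declare void @__toy_print_string(i64, ptr)`. The newline is an assumption.
extern "C" void __toy_print_string(std::int64_t len, const char* str) {
    std::string s = toyStringToStd(len, str);
    std::printf("%s\n", s.c_str());
}
```

Stopping at an embedded NUL keeps u[3] = "hi" (bytes h, i, NUL) printing as "hi" rather than emitting the terminator.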

I managed my string literals with a simple hash table, so a repeated literal reuses its existing global instead of creating a new one:

mlir::LLVM::GlobalOp lookupOrInsertGlobalOp( ConversionPatternRewriter& rewriter, mlir::StringAttr& stringLit,
                                             mlir::Location loc, size_t copySize, size_t strLen )
{       
    mlir::LLVM::GlobalOp globalOp;
    auto it = pr_stringLiterals.find( stringLit.str() );
    if ( it != pr_stringLiterals.end() )
    {       
        globalOp = it->second;
        LLVM_DEBUG( llvm::dbgs() << "Reusing global: " << globalOp.getSymName() << '\n' ); 
    }       
    else    
    {       
        auto savedIP = rewriter.saveInsertionPoint();
        rewriter.setInsertionPointToStart( pr_module.getBody() );

        auto i8Type = rewriter.getI8Type();
        auto arrayType = mlir::LLVM::LLVMArrayType::get( i8Type, copySize );

        SmallVector<char> stringData( stringLit.begin(), stringLit.end() );
        if ( copySize > strLen )
        {       
            stringData.push_back( '\0' ); 
        }       
        auto denseAttr = DenseElementsAttr::get( RankedTensorType::get( { static_cast<int64_t>( copySize ) }, i8Type ),
                                                 ArrayRef<char>( stringData ) );

        std::string globalName = "str_" + std::to_string( pr_stringLiterals.size() );
        globalOp =
            rewriter.create<LLVM::GlobalOp>( loc, arrayType, true, LLVM::Linkage::Private, globalName, denseAttr );
        globalOp->setAttr( "unnamed_addr", rewriter.getUnitAttr() );

        pr_stringLiterals[stringLit.str()] = globalOp;
        LLVM_DEBUG( llvm::dbgs() << "Created global: " << globalName << '\n' );

        rewriter.restoreInsertionPoint( savedIP );
    }           

    return globalOp;
}         

Without the insertion point swaperoo, this GlobalOp creation doesn’t work, as we need to be at the ModuleOp level where the symbol table lives.
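One thing the LLVM-IR listing above appears to show: the table is keyed only on the string text, but the global’s length comes from the copySize of its first use. @str_0 was created as [2 x i8] for s[2], and the later assignment to u[3] reuses it with an i64 3 memcpy, which reads one byte past the end of the global. The plain C++ sketch below (StringGlobal and StringLiteralTable are hypothetical names, not the compiler’s code) sidesteps that by always storing the NUL-terminated bytes, so each global is strLen + 1 bytes and every copySize the lowering can produce fits inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LLVM::GlobalOp: a symbol name plus its bytes.
struct StringGlobal {
    std::string name;
    std::vector<char> bytes;
};

class StringLiteralTable {
public:
    // Returns the global for `literal`, creating "str_N" on first use. The stored
    // bytes always include the NUL terminator, independent of any one use's copy size.
    const StringGlobal& lookupOrInsert(const std::string& literal) {
        auto it = globals_.find(literal);
        if (it != globals_.end()) {
            return it->second; // reuse the existing global
        }
        StringGlobal g{"str_" + std::to_string(globals_.size()),
                       std::vector<char>(literal.c_str(), literal.c_str() + literal.size() + 1)};
        return globals_.emplace(literal, std::move(g)).first->second;
    }

    std::size_t size() const { return globals_.size(); }

private:
    // References to unordered_map elements stay valid across rehashing.
    std::unordered_map<std::string, StringGlobal> globals_;
};
```

In the MLIR version, the same idea would mean creating the array type with strLen + 1 elements rather than copySize.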

… anyways, it looks like I’m droning on.  There’s been lots of stuff to get this far, but there are still many many things to do before what I’ve got even qualifies as a basic programming language (if statements, loops, functions, array assignments, types, …)

LLVM IR Null pointer constants and function pointers. A wild goose chase after a bad assumption.

March 30, 2017 clang/llvm

With ELLCC, you can easily check out the LLVM IR for code like:

typedef void ( *f )( void );
void foo( void );

f bar() {
    return (f)foo;
}

That code is:

define nonnull void ()* @bar() local_unnamed_addr {
  ret void ()* @foo
}

declare void @foo()

I was trying to use @foo in a “struct” object, and was getting an error attempting this:

llvm/lib/IR/Constants.cpp:879:llvm::ConstantAggregate::ConstantAggregate(
 llvm::CompositeType*, llvm::Value::ValueTy, llvm::ArrayRef<llvm::Constant*>):
 Assertion `V[I]->getType() == T->getTypeAtIndex(I) &&
 "Initializer for composite element doesn't match!"' failed.

After adding:

fooFunc->dump();

which printed the whole function body of foo(), I thought that was where the error was coming from, and that I needed some other method to obtain just “@foo”, a global variable reference to the function, and not the function body itself.

The actual story is much simpler. Here’s the LLVM code to generate the IR for a foo() with this interface:

//------------
// void foo(){ }
//
auto vt = m_builder.getVoidTy();
auto voidFuncVoidType = FunctionType::get( vt, false /* varargs */ );

Function *fooFunc = Function::Create(
    voidFuncVoidType, Function::InternalLinkage, "foo",
    m_module );
BasicBlock *fooBB =
    BasicBlock::Create( m_context, "", fooFunc );
m_builder.SetInsertPoint( fooBB );
m_builder.CreateRetVoid();

My clue that the error was something else was that I was able to build a function that returns a foo function pointer:

//------------
// void(*)() bar() { return foo ; }
//
auto fpRetFuncType = FunctionType::get( voidFuncVoidType->getPointerTo(), false /* varargs */ );

Function *barFunc = Function::Create(
    fpRetFuncType, Function::ExternalLinkage, "bar",
    m_module );
BasicBlock *barBB =
    BasicBlock::Create( m_context, "", barFunc );
m_builder.SetInsertPoint( barBB );
m_builder.CreateRet( fooFunc );

The module at this point looks like:

define internal void @foo() {
   ret void
}

define void ()* @bar() {
   ret void ()* @foo
}

So why can I use fooFunc in a return statement, but don’t appear to be able to use it in a structure object? Here’s the code that created that structure type:

//------------
//
// struct { int, void (*)(), char * }
auto i8t = m_builder.getInt8Ty();
auto i32t = m_builder.getInt32Ty();
std::vector<Type *> consStructMembers{
    i32t, voidFuncVoidType->getPointerTo(), i8t->getPointerTo()};
auto consStructType =
    StructType::create( m_context, consStructMembers, "" );

and my attempt to populate an object of this type:

//
// %struct { int, void (*)(), char * } = { 65535, foo, null };
//
auto consPriority = ConstantInt::get( i32t, 65535 );
auto consDataZero = ConstantInt::get( i8t->getPointerTo(), 0 );

std::vector<Constant *> v{consPriority, fooFunc, consDataZero};
Constant *g = ConstantStruct::get( consStructType, v );

The actual error was in the third struct member initialization, and had nothing to do with the function pointer value. In retrospect, this makes sense since llvm::Function is derived from llvm::Constant, so there shouldn’t logically be a mismatch there.

What actually fixed the error was simply:

auto consDataZero = ConstantPointerNull::get( i8t->getPointerTo() );

It appears that the numeric zero value isn’t the same thing as an LLVM ‘null’. With that corrected, my variable declaration is:

%"type 0x10ea0c0" { i32 65535, void ()* @foo, i8* null }

… so I should now be able to proceed with the actual task at hand.

using ltrace to dig into shared libraries

October 19, 2016 C/C++ development and debugging, clang/llvm

I was trying to find where the clang compiler is writing out constant global data values, and didn’t manage to find it by code inspection. If I run ltrace (also tracing system calls), I see the point where the ELF object is written out:

std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
std::string::compare(std::string const&) const(0x1e32e60, 0x7ffc8983a190, 7, 254) = 0xfffffffb
std::string::compare(std::string const&) const(0x7ffc8983a190, 0x1e32e60, 7, 254) = 5
write@SYS(4, "\177ELF\002\001\001", 848)         = 848
lseek@SYS(4, 40, 0)                              = 40
write@SYS(4, "\220\001", 8)                      = 8
lseek@SYS(4, 848, 0)                             = 848
lseek@SYS(4, 60, 0)                              = 60
write@SYS(4, "\a", 2)                            = 2
lseek@SYS(4, 848, 0)                             = 848
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()(0x1e2a2e0, 0x1e2a2e8, 0x1e27978, 0x1e27978) = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bb58, 0x7ffc8983bad8, 8) = 0
close@SYS(4)                                     = 0
rt_sigprocmask@SYS(2, 0x7ffc8983bad8, 0, 8)      = 0

This is from running:

ltrace -S --demangle \
   ...

The -S is to display syscalls as well as library calls. To my surprise, this seems to show libstdc++ library calls, but I’m not seeing much from clang itself, just:

clang::DiagnosticsEngine::DiagnosticsEngine
clang::driver::ToolChain::getTargetAndModeFromProgramName
llvm::cl::ExpandResponseFiles
llvm::EnablePrettyStackTrace
llvm::errs
llvm::install_fatal_error_handler
llvm::llvm_shutdown
llvm::PrettyStackTraceEntry::PrettyStackTraceEntry
llvm::PrettyStackTraceEntry::~PrettyStackTraceEntry
llvm::raw_ostream::preferred_buffer_size
llvm::raw_svector_ostream::write_impl
llvm::remove_fatal_error_handler
llvm::StringMapImpl::LookupBucketFor
llvm::StringMapImpl::RehashTable
llvm::sys::PrintStackTraceOnErrorSignal
llvm::sys::Process::FixupStandardFileDescriptors
llvm::sys::Process::GetArgumentVector
llvm::TimerGroup::printAll

There’s got to be a heck of a lot more that the compiler is doing!? It turns out that ltrace doesn’t seem to trace all the library function calls that lie in shared libraries (I’m using a shared library + split dwarf build of clang). The default output was a bit deceptive since I saw some shared lib calls; in particular, there were std::… calls (from libstdc++.so) in the ltrace output. My conclusion is that the tool is lying by default.

This can be confirmed by explicitly asking to see the functions from a specific shared lib. For example, if I call ltrace as:

$ ltrace -S --demangle -e @libLLVMX86CodeGen.so \
/clang/be.b226a0a/bin/clang-3.9 \
-cc1 \
-triple \
x86_64-unknown-linux-gnu \
...

Now I get ~68K calls to libLLVMX86CodeGen.so functions that didn’t show up in the default ltrace output! The ltrace tool won’t show me these by default (although the man page seems to suggest that it should), but if I narrow down what I’m looking through to a single shared lib, at least I can now examine the function calls in that shared lib.

On the SONAME

Note that the @lib….so name has to match the SONAME.  For example if the shared libraries on disk were:

libLLVMX86CodeGen.so -> libLLVMX86CodeGen.so.3
libLLVMX86CodeGen.so.3 -> libLLVMX86CodeGen.so.3.9
libLLVMX86CodeGen.so.3.9 -> libLLVMX86CodeGen.so.3.9.0

$ objdump -x libLLVMX86CodeGen.so | grep SONAME

would give you the name to use.  This becomes relevant in clang 4.0 where the SONAME ends up with .so.4 instead of just .so (when building clang with shared libs instead of archive libs).

How to invoke the 2nd pass of the clang compiler manually

October 3, 2016 clang/llvm

Because the clang front end re-execs itself, breakpoints on the interesting parts of the clang front end don’t get hit by default. Here’s an example:

$ cat g2
b llvm::Module::setDataLayout
b BackendConsumer::BackendConsumer
b llvm::TargetMachine::TargetMachine
b llvm::TargetMachine::createDataLayout
run -mbig-endian -m64 -c bytes.c -emit-llvm -o big.bc

$ gdb `which clang`
GNU gdb (GDB) Red Hat Enterprise Linux 7.9.1-19.lz.el7
...
(gdb) source g2
Breakpoint 1 at 0x2c04c3d: llvm::Module::setDataLayout. (2 locations)
Breakpoint 2 at 0x3d08870: file /source/llvm/lib/Target/TargetMachine.cpp, line 47.
Breakpoint 3 at 0x33108ca: file /source/llvm/include/llvm/Target/TargetMachine.h, line 133.
...
Detaching after vfork from child process 15795.
[Inferior 1 (process 15789) exited normally]

(The debugger runs to completion and exits without hitting any of the breakpoints.)

One way to deal with this is to set the fork mode to child:

(gdb) set follow-fork-mode child

An alternate way of dealing with this is to use strace to collect the command line that clang invokes itself with. For example:

$ strace -f -s 1024 -v clang -mbig-endian -m64 big.bc -c 2>&1 | grep exec | tail -2 | head -1

This provides the command line options for the self invocation of clang

[pid  4650] execve("/usr/local/bin/clang-3.9", ["/usr/local/bin/clang-3.9", "-cc1", "-triple", "aarch64_be-unknown-linux-gnu", "-emit-obj", "-mrelax-all", "-disable-free", "-main-file-name", "big.bc", "-mrelocation-model", "static", "-mthread-model", "posix", "-mdisable-fp-elim", "-fmath-errno", "-masm-verbose", "-mconstructor-aliases", "-fuse-init-array", "-target-cpu", "generic", "-target-feature", "+neon", "-target-abi", "aapcs", "-dwarf-column-info", "-debugger-tuning=gdb", "-coverage-file", "/workspace/pass/run/big.bc", "-resource-dir", "/usr/local/bin/../lib/clang/3.9.0", "-fdebug-compilation-dir", "/workspace/pass/run", "-ferror-limit", "19", "-fmessage-length", "0", "-fallow-half-arguments-and-returns", "-fno-signed-char", "-fobjc-runtime=gcc", "-fdiagnostics-show-option", "-o", "big.o", "-x", "ir", "big.bc"],

With a bit of vim tweaking you can turn this into a command line that can be executed (or debugged) directly:

/usr/local/bin/clang-3.9 -cc1 -triple aarch64_be-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name big.bc -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -dwarf-column-info -debugger-tuning=gdb -coverage-file /workspace/pass/run/big.bc -resource-dir /usr/local/bin/../lib/clang/3.9.0 -fdebug-compilation-dir /workspace/pass/run -ferror-limit 19 -fmessage-length 0 -fallow-half-arguments-and-returns -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -o big.o -x ir big.bc

Note that doing this also provides a mechanism to change the compiler triple manually, which is something that I wondered how to do (since clang documents -triple as an option, but seems to ignore it). For example, I’m able to change -triple aarch64_be to aarch64 and get little-endian object code from bitcode prepared with -mbig-endian.
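As an alternative to the vim tweaking described above, a small C++ helper can do the join. This is a sketch under stated assumptions: the execve(...) argument array has already been split into a vector of strings, the output targets a POSIX shell, and toCommandLine and shellQuote are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Quotes one argument for a POSIX shell: arguments made only of "safe" characters
// pass through unchanged; anything else is wrapped in single quotes, with each
// embedded ' rewritten as '\'' (close quote, escaped quote, reopen quote).
std::string shellQuote(const std::string& arg) {
    const std::string safe =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_./=+,:@%";
    if (!arg.empty() && arg.find_first_not_of(safe) == std::string::npos) {
        return arg;
    }
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Joins an execve() argument vector into a single shell command line.
std::string toCommandLine(const std::vector<std::string>& argv) {
    std::string line;
    for (const auto& a : argv) {
        if (!line.empty()) {
            line += ' ';
        }
        line += shellQuote(a);
    }
    return line;
}
```

Quoting only when needed keeps the typical -cc1 command line (all paths and flags made of safe characters) identical to the hand-edited version above.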

speeding up clang debug and builds

October 2, 2016 clang/llvm

I found the default static library configuration of clang slow to rebuild, so I started building it in shared mode. That loaded pretty slowly in gdb, so I went looking for how to enable split dwarf, and found a nice little presentation on how to speed up clang builds.

There’s a followup blog post with some speed up conclusions.

One shortcoming of that blog post is that it doesn’t actually list the cmake commands required to build with all these tricks. Using all the tricks listed there, I’m now trying the following:

mkdir -p ~/freeware
cd ~/freeware

git clone git://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
./configure --prefix=$HOME/local/binutils.gold --enable-gold=default
make 
make install

cd ..
git clone https://github.com/ninja-build/ninja.git
cd ninja
./configure.py --bootstrap
mkdir -p ~/local/ninja/bin/
cp ninja ~/local/ninja/bin/

With ninja in my PATH, I can now build clang with:

CC=clang CXX=clang++ \
cmake -G Ninja \
../llvm \
-DLLVM_USE_SPLIT_DWARF=TRUE \
-DLLVM_ENABLE_ASSERTIONS=TRUE \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$HOME/clang39.be \
-DCMAKE_SHARED_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index" \
-DCMAKE_EXE_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index" \
-DBUILD_SHARED_LIBS=true \
-DLLVM_TARGETS_TO_BUILD=X86 \
2>&1 | tee o

ninja

ninja install

This does build way faster, both for full builds and incremental builds.

Build tree size

Dynamic libraries: 4.4 GB. Static libraries: 19.8 GB.

Installed size

Dynamic libraries: 0.7 GB. Static libraries: 14.7 GB.

Results: full build time.

Static libraries, non-ninja, all backends:

real    51m6.494s
user    160m47.027s
sys     8m49.429s

Dynamic libraries, ninja, split dwarf, x86 backend only:

real    26m19.360s
user    86m11.477s
sys     3m14.478s

Results: incremental build. touch lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp.

Static libraries, non-ninja, all backends:

real    2m17.709s
user    6m8.648s
sys     0m28.594s

Dynamic libraries, ninja, split dwarf, x86 backend only:

real    0m3.245s
user    0m6.104s
sys     0m0.802s

make install times

make:

real    2m6.353s
user    0m7.827s
sys     0m15.316s

ninja:

real    0m2.138s
user    0m0.420s
sys     0m0.831s

The time for rerunning a sharedlib-config ‘ninja install’ is even faster!

Results: time for gdb, b main, run, quit

Static libraries:

real    0m45.904s
user    0m32.376s
sys     0m1.787s

Dynamic libraries, with split dwarf:

real    0m44.440s
user    0m37.096s
sys     0m1.067s

This one isn’t what I would have expected. The initial gdb load time for the split-dwarf exe is almost instantaneous, however it still takes a long time to break in main and continue to that point. I guess that we are taking the hit for a lot of symbol lookup at that point, so it comes out as a wash.

Thinking about this, I noticed that the clang build system doesn’t seem to add -Wl,--gdb-index to the link step along with the addition of -gsplit-dwarf to the compilation command line. I thought that was required to get all the deferred symbol table lookup?

Attempting to do so, I found that putting an alternate linker in my PATH wasn’t enough to get clang to use it, and adding -Wl,--gdb-index to the link flags caused complaints from /usr/bin/ld! The cmake magic required was:

-DCMAKE_SHARED_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index" \
-DCMAKE_EXE_LINKER_FLAGS="-B$HOME/local/binutils.gold/bin -Wl,--gdb-index" \

This is in the first cmake invocation flags above, but wasn’t used for my initial 45s gdb+clang time measurements. With --gdb-index, the time for the gdb b-main, run, quit sequence is now reduced to:

real    0m10.268s
user    0m3.623s
sys     0m0.429s

A 4x reduction, which is quite nice!
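Turning the wall-clock (real) numbers above into ratios gives roughly: full build 1.9x (51m6.494s to 26m19.360s), incremental build 42x (2m17.709s to 0m3.245s), install 59x (2m6.353s to 0m2.138s), and gdb startup 4.3x (0m44.440s to 0m10.268s). Note that the build comparisons change several things at once (static vs. shared libraries, make vs. ninja, all backends vs. X86 only), so they don’t isolate any single trick. A small C++ sketch for the arithmetic, assuming times in bash’s `time` format; parseTimeSeconds and speedup are hypothetical helpers:

```cpp
#include <cassert>
#include <string>

// Parses a bash `time` value such as "51m6.494s" into seconds.
double parseTimeSeconds(const std::string& t) {
    auto m = t.find('m');
    double minutes = std::stod(t.substr(0, m));
    // The seconds field runs from just after 'm' up to (not including) the trailing 's'.
    double seconds = std::stod(t.substr(m + 1, t.size() - m - 2));
    return minutes * 60.0 + seconds;
}

// How many times faster `after` is than `before`.
double speedup(const std::string& before, const std::string& after) {
    return parseTimeSeconds(before) / parseTimeSeconds(after);
}
```

For example, speedup("0m44.440s", "0m10.268s") comes out at about 4.33, consistent with the 4x figure above.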