C/C++ development and debugging.

Line stepping through MLIR with a debugger!

February 10, 2026 C/C++ development and debugging.

gdb session

I’ve added an alternate input source for the silly compiler.  As well as the .silly files that it previously accepted, it now also accepts .mlir (silly-dialect) files as input.

This means that if there’s an experimental language feature that requires new style MLIR, but I don’t want to figure out how to push that all the way through grammar -> parser -> builder -> lowering all at once, I might be able to at least understand the required MLIR patterns by manually modifying existing MLIR (generated with ‘silly --emit-mlir’).

For example, I don’t have BREAK support for FOR loops. I can do something simple:

INT64 v;

FOR (INT64 myLoopVar : (1, 5))
{
    PRINT myLoopVar;
    v = myLoopVar + 1;
};

PRINT "after loop: ", v;

The MLIR for this (with location info stripped out), looks like:

fedoravm:/home/peeter/toycalculator/tests/endtoend/for> silly-opt --pretty -s out/for_simplest.mlir 
module {
  func.func @main() -> i32 {
    %c0_i32 = arith.constant 0 : i32
    %c5_i64 = arith.constant 5 : i64
    %c1_i64 = arith.constant 1 : i64
    "silly.scope"() ({
      %0 = "silly.declare"() <{sym_name = "v"}> : () -> !silly.var
      scf.for %arg0 = %c1_i64 to %c5_i64 step %c1_i64  : i64 {
        "silly.print"(%c0_i32, %arg0) : (i32, i64) -> ()
        %3 = "silly.add"(%arg0, %c1_i64) : (i64, i64) -> i64
        silly.assign %0 :  = %3 : i64
      }
      %1 = "silly.string_literal"() <{value = "after loop: "}> : () -> !llvm.ptr
      %2 = silly.load %0 :  : i64
      "silly.print"(%c0_i32, %1, %2) : (i32, !llvm.ptr, i64) -> ()
      "silly.return"(%c0_i32) : (i32) -> ()
    }) : () -> ()
    "silly.yield"() : () -> ()
  }
}

If I want to add a BREAK into the mix (which I don’t support in any of grammar or parser or builder right now), something like:

INT64 v; 
FOR (INT64 i : (1, 5)) {
    PRINT i; 
    v = i + 1; 
    IF (i == 3) { BREAK; }; 
};
PRINT "after loop: ", v; 

Then it can be done by replacing the scf.for with scf.while, and putting in additional termination condition logic. Example:

module {
  func.func @main() -> i32 {
    %c0_i32 = arith.constant 0 : i32
    %c1_i64 = arith.constant 1 : i64
    %c3_i64 = arith.constant 3 : i64
    %c5_i64 = arith.constant 5 : i64
    %true = arith.constant true
    %false = arith.constant false

    "silly.scope"() ({
      %0 = "silly.declare"() <{sym_name = "v"}> : () -> !silly.var

      scf.while (%i = %c1_i64, %broke = %false) : (i64, i1) -> (i64, i1) {
        %not_broke = arith.xori %broke, %true : i1
        %in_range = arith.cmpi slt, %i, %c5_i64 : i64
        %continue = arith.andi %in_range, %not_broke : i1
        scf.condition(%continue) %i, %broke : i64, i1
      } do {
      ^bb0(%loop_var: i64, %break_flag: i1):
        "silly.print"(%c0_i32, %loop_var) : (i32, i64) -> ()
        %2 = "silly.add"(%loop_var, %c1_i64) : (i64, i64) -> i64
        silly.assign %0 :  = %2 : i64

        %is_three = arith.cmpi eq, %loop_var, %c3_i64 : i64
        %should_break = arith.ori %break_flag, %is_three : i1

        %next = arith.addi %loop_var, %c1_i64 : i64
        scf.yield %next, %should_break : i64, i1
      }

      %lit = "silly.string_literal"() <{value = "after loop: "}> : () -> !llvm.ptr
      %p = silly.load %0 :  : i64
      "silly.print"(%c0_i32, %lit, %p) : (i32, !llvm.ptr, i64) -> ()

      "silly.return"(%c0_i32) : (i32) -> ()
    }) : () -> ()
    "silly.yield"() : () -> ()
  }
}
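As a reading aid, here is a rough C++ rendering of the control flow that the scf.while version encodes. It is a sketch, not output from the silly compiler: the struct LoopResult and the function run_for_with_break are hypothetical names, and the "print" is modelled by recording values in a vector. The loop bound (i < 5) is copied from the arith.cmpi slt in the listing above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container: every value that would be printed, plus the final v.
struct LoopResult {
    std::vector<std::int64_t> printed;
    std::int64_t v = 0;
};

// Mirrors the scf.while above: two loop-carried values (i, broke),
// a "before" region that decides whether to continue, and a body
// that yields the next pair.
LoopResult run_for_with_break() {
    LoopResult r;
    std::int64_t i = 1;  // %i, initialized from %c1_i64
    bool broke = false;  // %broke, initialized from %false

    // "before" region: continue = (i < 5) && !broke
    while (i < 5 && !broke) {
        r.printed.push_back(i);     // silly.print
        r.v = i + 1;                // silly.add + silly.assign
        broke = broke || (i == 3);  // arith.ori %break_flag, %is_three
        i = i + 1;                  // arith.addi, yielded as the next %i
    }
    return r;
}
```

The break flag only takes effect at the next evaluation of the condition, which matches BREAK semantics here only because the BREAK is the last statement in the loop body.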

Now, here’s where things get cool.  I noticed something curious when I looked at the .mlir dump from the MLIR parser (which I dumped to verify I was getting the expected round trip output before lowering). The MLIR parser, given only MLIR source, and no other location tagging, goes off and tags everything with location info for the MLIR source itself.  Example:

#loc15 = loc("forbreak.mlsilly":27:12)
#loc16 = loc("forbreak.mlsilly":27:28)
module {
  func.func @main() -> i32 {
    %c0_i32 = arith.constant 0 : i32 loc(#loc2)
    %c1_i64 = arith.constant 1 : i64 loc(#loc3)
    %c3_i64 = arith.constant 3 : i64 loc(#loc4)
    %c5_i64 = arith.constant 5 : i64 loc(#loc5)
    %true = arith.constant true loc(#loc6)
    %false = arith.constant false loc(#loc7)
    "silly.scope"() ({
      %0 = "silly.declare"() <{sym_name = "v"}> : () -> !silly.var loc(#loc9)
      %1:2 = scf.while (%arg0 = %c1_i64, %arg1 = %false) : (i64, i1) -> (i64, i1) {
        %4 = arith.xori %arg1, %true : i1 loc(#loc11)
        %5 = arith.cmpi slt, %arg0, %c5_i64 : i64 loc(#loc12)
        %6 = arith.andi %5, %4 : i1 loc(#loc13)
        scf.condition(%6) %arg0, %arg1 : i64, i1 loc(#loc14)
      } do {
      ^bb0(%arg0: i64 loc("forbreak.mlsilly":27:12), %arg1: i1 loc("forbreak.mlsilly":27:28)):
        "silly.print"(%c0_i32, %arg0) : (i32, i64) -> () loc(#loc17)
        %4 = "silly.add"(%arg0, %c1_i64) : (i64, i64) -> i64 loc(#loc18)
        silly.assign %0 :  = %4 : i64 loc(#loc19)
        %5 = arith.cmpi eq, %arg0, %c3_i64 : i64 loc(#loc20)
        %6 = arith.ori %arg1, %5 : i1 loc(#loc21)
        %7 = arith.addi %arg0, %c1_i64 : i64 loc(#loc22)
        scf.yield %7, %6 : i64, i1 loc(#loc23)
      } loc(#loc10)
      %2 = "silly.string_literal"() <{value = "after loop: "}> : () -> !llvm.ptr loc(#loc24)
      %3 = silly.load %0 :  : i64 loc(#loc25)
      "silly.print"(%c0_i32, %2, %3) : (i32, !llvm.ptr, i64) -> () loc(#loc26)
      "silly.return"(%c0_i32) : (i32) -> () loc(#loc27)
    }) : () -> () loc(#loc8)
    "silly.yield"() : () -> () loc(#loc28)
  } loc(#loc1)
} loc(#loc)
#loc = loc("forbreak.mlsilly":9:1)
#loc1 = loc("forbreak.mlsilly":10:3)
#loc2 = loc("forbreak.mlsilly":11:15)
#loc3 = loc("forbreak.mlsilly":12:15)
#loc4 = loc("forbreak.mlsilly":13:15)
#loc5 = loc("forbreak.mlsilly":14:15)
#loc6 = loc("forbreak.mlsilly":15:13)
#loc7 = loc("forbreak.mlsilly":16:14)
...
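Each of those location aliases is just a (file, line, column) triple, which is the information a debug line table needs. The following sketch pulls that triple out of the textual form. It is only an illustration of the data involved: SourceLoc and parse_loc are hypothetical names, and a real compiler would presumably read the location objects that the MLIR parser attaches to each op rather than re-parsing text.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical holder for one file:line:col location.
struct SourceLoc {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;
};

// Extract file/line/column from text such as
//   #loc15 = loc("forbreak.mlsilly":27:12)
// Returns std::nullopt if the text contains no such location.
std::optional<SourceLoc> parse_loc(const std::string &text) {
    static const std::regex pattern(R"re(loc\("([^"]+)":(\d+):(\d+)\))re");
    std::smatch m;
    if (!std::regex_search(text, m, pattern))
        return std::nullopt;
    return SourceLoc{m[1].str(),
                     static_cast<unsigned>(std::stoul(m[2].str())),
                     static_cast<unsigned>(std::stoul(m[3].str()))};
}
```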

My compiler then turns that location info into DWARF DI, just as it does for a regular .silly source file, so I can actually line step through the MLIR itself with any debugger! Here’s an example session:

Breakpoint 1, main () at forbreak.mlsilly:25
25              scf.condition(%continue) %i, %broke : i64, i1
(gdb) l
20            
21            scf.while (%i = %c1_i64, %broke = %false) : (i64, i1) -> (i64, i1) {
22              %not_broke = arith.xori %broke, %true : i1
23              %in_range = arith.cmpi slt, %i, %c5_i64 : i64
24              %continue = arith.andi %in_range, %not_broke : i1
25              scf.condition(%continue) %i, %broke : i64, i1
26            } do {
27            ^bb0(%loop_var: i64, %break_flag: i1):
28              "silly.print"(%c0_i32, %loop_var) : (i32, i64) -> ()
29              %2 = "silly.add"(%loop_var, %c1_i64) : (i64, i64) -> i64
(gdb) l
30              silly.assign %0 :  = %2 : i64
31              
32              %is_three = arith.cmpi eq, %loop_var, %c3_i64 : i64
33              %should_break = arith.ori %break_flag, %is_three : i1
34              
35              %next = arith.addi %loop_var, %c1_i64 : i64
36              scf.yield %next, %should_break : i64, i1
37            }
38
39            %lit = "silly.string_literal"() <{value = "after loop: "}> : () -> !llvm.ptr
(gdb) b 32
Breakpoint 2 at 0x40076c: file forbreak.mlsilly, line 32.
(gdb) c
Continuing.
1

Breakpoint 2, main () at forbreak.mlsilly:32
32              %is_three = arith.cmpi eq, %loop_var, %c3_i64 : i64
(gdb) disassemble
Dump of assembler code for function main:
   0x000000000040072c <+0>:     sub     sp, sp, #0x60
   0x0000000000400730 <+4>:     stp     x30, x21, [sp, #64]
   0x0000000000400734 <+8>:     stp     x20, x19, [sp, #80]
   0x0000000000400738 <+12>:    mov     w19, wzr
   0x000000000040073c <+16>:    mov     w20, #0x1                       // #1
   0x0000000000400740 <+20>:    mov     w21, #0x1                       // #1
   0x0000000000400744 <+24>:    str     xzr, [sp, #8]
   0x0000000000400748 <+28>:    cmp     x21, #0x4
   0x000000000040074c <+32>:    b.gt    0x400784 
   0x0000000000400750 <+36>:    tbnz    w19, #0, 0x400784 
   0x0000000000400754 <+40>:    add     x1, sp, #0x10
   0x0000000000400758 <+44>:    mov     w0, #0x1                        // #1
   0x000000000040075c <+48>:    stp     x21, xzr, [sp, #24]
   0x0000000000400760 <+52>:    str     x20, [sp, #16]
   0x0000000000400764 <+56>:    bl      0x4005b0 <__silly_print@plt>
   0x0000000000400768 <+60>:    add     x21, x21, #0x1
=> 0x000000000040076c <+64>:    cmp     x21, #0x4
   0x0000000000400770 <+68>:    str     x21, [sp, #8]
   0x0000000000400774 <+72>:    cset    w8, eq  // eq = none
   0x0000000000400778 <+76>:    orr     w19, w19, w8
   0x000000000040077c <+80>:    cmp     x21, #0x4
   0x0000000000400780 <+84>:    b.le    0x400750 
   0x0000000000400784 <+88>:    mov     x8, #0x3                        // #3
   0x0000000000400788 <+92>:    ldr     x9, [sp, #8]
   0x000000000040078c <+96>:    mov     w10, #0xc                       // #12
   0x0000000000400790 <+100>:   movk    x8, #0x1, lsl #32
   0x0000000000400794 <+104>:   add     x1, sp, #0x10
   0x0000000000400798 <+108>:   mov     w0, #0x2                        // #2
   0x000000000040079c <+112>:   stp     x8, x10, [sp, #16]
   0x00000000004007a0 <+116>:   adrp    x8, 0x400000
   0x00000000004007a4 <+120>:   add     x8, x8, #0x7f8
   0x00000000004007a8 <+124>:   stp     x9, xzr, [sp, #48]
   0x00000000004007ac <+128>:   mov     w9, #0x1                        // #1
   0x00000000004007b0 <+132>:   stp     x8, x9, [sp, #32]
   0x00000000004007b4 <+136>:   bl      0x4005b0 <__silly_print@plt>
   0x00000000004007b8 <+140>:   ldp     x20, x19, [sp, #80]
   0x00000000004007bc <+144>:   mov     w0, wzr
   0x00000000004007c0 <+148>:   ldp     x30, x21, [sp, #64]
   0x00000000004007c4 <+152>:   add     sp, sp, #0x60
   0x00000000004007c8 <+156>:   ret
End of assembler dump.



(gdb) c
Continuing.
2

Breakpoint 2, main () at forbreak.mlsilly:32
32              %is_three = arith.cmpi eq, %loop_var, %c3_i64 : i64
(gdb) p v
$2 = 2

Having built a compiler for an arbitrary language, and having implemented DWARF instrumentation for that language, I get line support for stepping through the MLIR itself, if I want it.

I can imagine a scenario where I’ve screwed up the MLIR ops generation in the builder. This lets me set a breakpoint right at the MLIR line in question, and poke around at the disassembly for that point in the code, and see what’s going on. What a cool compiler debugging tool!

MLIR toy compiler V5 tagged. Array element assignment/access is implemented.

December 23, 2025 C/C++ development and debugging., clang/llvm

Screenshot

The language and compiler now support functions, calls, parameters, returns, basic conditional blocks, scalar and array declarations, binary and unary operations, arithmetic and boolean operators, and a print statement.

See the Changelog for full details of all the changes since V4.  The IF/ELSE work was described recently, but the ARRAY element work is new.

Array element lvalues and rvalues were both implemented.  This required grammar, builder, and lowering changes.

The grammar now has optional array element indexes for many elements.  Examples:

returnStatement
  : RETURN_TOKEN (literal | scalarOrArrayElement)?
  ;

print
  : PRINT_TOKEN (scalarOrArrayElement | STRING_PATTERN)
  ;

assignment
  : scalarOrArrayElement EQUALS_TOKEN rhs
  ;

rhs
  : literal
  | unaryOperator? scalarOrArrayElement
  | binaryElement binaryOperator binaryElement
  | call
  ;

binaryElement
  : numericLiteral
  | unaryOperator? scalarOrArrayElement
  ;

booleanElement
  : booleanLiteral | scalarOrArrayElement
  ;

scalarOrArrayElement
  : IDENTIFIER (indexExpression)?
  ;

indexExpression
  : ARRAY_START_TOKEN (IDENTIFIER | INTEGER_PATTERN) ARRAY_END_TOKEN
  ;
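To make the scalarOrArrayElement and indexExpression rules concrete, here is a small hand-written recognizer for just those two rules. It is a sketch under assumptions, not the generated parser: ElementRef, take_while and parse_element are hypothetical names, the array tokens are assumed to be '[' and ']' (as in t[3]), IDENTIFIER is assumed to be letters, digits and underscores not starting with a digit, INTEGER_PATTERN is assumed to be unsigned digits, and whitespace is not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct ElementRef {
    std::string name;                  // IDENTIFIER
    std::optional<std::string> index;  // IDENTIFIER or INTEGER_PATTERN, if present
};

// Consumes characters accepted by `accept`, starting at `pos`.
template <typename Pred>
static std::string take_while(const std::string &s, std::size_t &pos, Pred accept) {
    std::size_t start = pos;
    while (pos < s.size() && accept(static_cast<unsigned char>(s[pos])))
        ++pos;
    return s.substr(start, pos - start);
}

// scalarOrArrayElement : IDENTIFIER (indexExpression)?
// indexExpression      : '[' (IDENTIFIER | INTEGER_PATTERN) ']'
std::optional<ElementRef> parse_element(const std::string &s) {
    auto is_ident = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    auto is_digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    std::size_t pos = 0;

    if (s.empty() || is_digit(static_cast<unsigned char>(s[0])))
        return std::nullopt;  // an identifier may not start with a digit
    ElementRef ref;
    ref.name = take_while(s, pos, is_ident);
    if (ref.name.empty())
        return std::nullopt;
    if (pos == s.size())
        return ref;  // plain scalar, no index

    if (s[pos] != '[')
        return std::nullopt;
    ++pos;
    std::string index = take_while(s, pos, is_ident);
    if (index.empty() || pos + 1 != s.size() || s[pos] != ']')
        return std::nullopt;
    // An index that starts with a digit must be all digits.
    if (is_digit(static_cast<unsigned char>(index[0])) &&
        !std::all_of(index.begin(), index.end(), is_digit))
        return std::nullopt;
    ref.index = index;
    return ref;
}
```

With these assumptions, "x" and "t[3]" and "t[i]" are accepted, while "t[" and "t[3x]" are rejected.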

Most of these scalarOrArrayElement used to be just LITERAL. My MLIR AssignOp and LoadOp’s are now generalized to include optional indexes:

def Toy_AssignOp : Op<Toy_Dialect, "assign"> {
  let summary = "Assign a value to a variable (scalar or array element).";

  let description = [{
    Assigns `value` to the variable referenced by `var_name`.
    If `index` is present, the assignment targets the array element at that index.
    The target variable must have been declared with a matching `toy.declare`.
  }];

  let arguments = (ins
    SymbolRefAttr:$var_name,               // @t
    Optional<Index>:$index,                // optional SSA value of index type (dynamic or none)
    AnyType:$value                         // the value being assigned
  );

  let results = (outs);

  let assemblyFormat =
    "$var_name (`[` $index^ `]`)? `=` $value `:` type($value) attr-dict";
}

def Toy_LoadOp : Op<Toy_Dialect, "load"> {
  let summary = "Load a variable (scalar or array element) by symbol reference.";
  let arguments = (ins
    SymbolRefAttr:$var_name,               // @t
    Optional<Index>:$index                 // optional SSA value of index type (dynamic or none)
  );

  let results = (outs AnyType:$result);

  let assemblyFormat =
    "$var_name (`[` $index^ `]`)? `:` type($result) attr-dict";
}
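The optional index operand can be pictured with std::optional. The class below is a toy storage model, not the dialect implementation: Variables, declare, assign and load are hypothetical C++ names that only echo the op names, every variable is stored as a vector (a scalar has size 1), and the bounds checking via at() is my addition for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy storage model: every declared variable is a vector; a scalar has size 1.
class Variables {
public:
    void declare(const std::string &name, std::size_t size = 1) {
        storage_[name] = std::vector<std::int64_t>(size, 0);
    }

    // With no index, the target is the scalar (element 0).
    void assign(const std::string &name, std::int64_t value,
                std::optional<std::size_t> index = std::nullopt) {
        slot(name, index) = value;
    }

    std::int64_t load(const std::string &name,
                      std::optional<std::size_t> index = std::nullopt) {
        return slot(name, index);
    }

private:
    std::int64_t &slot(const std::string &name, std::optional<std::size_t> index) {
        auto &cells = storage_.at(name);     // throws if the name was never declared
        return cells.at(index.value_or(0));  // throws if the index is out of range
    }

    std::map<std::string, std::vector<std::int64_t>> storage_;
};
```

The same assign and load entry points serve both the scalar and the array element case, which is the point of making the index optional rather than adding separate ops.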

Here is a simple example program that has an array declaration, an array element assignment, an array element access, and a print statement:

        INT32 t[7];
        INT32 x;
        t[3] = 42;
        x = t[3];
        PRINT x;

Here is the MLIR listing for this program, illustrating a couple of the optional index inputs:

        module {
          func.func @main() -> i32 {
            "toy.scope"() ({
              "toy.declare"() <{size = 7 : i64, type = i32}> {sym_name = "t"} : () -> ()
              "toy.declare"() <{type = i32}> {sym_name = "x"} : () -> ()
              %c3_i64 = arith.constant 3 : i64
              %c42_i64 = arith.constant 42 : i64
              %0 = arith.index_cast %c3_i64 : i64 to index
              toy.assign @t[%0] = %c42_i64 : i64
              %c3_i64_0 = arith.constant 3 : i64
              %1 = arith.index_cast %c3_i64_0 : i64 to index
    >>        %2 = toy.load @t[%1] : i32
              toy.assign @x = %2 : i32
              %3 = toy.load @x : i32
              toy.print %3 : i32
              %c0_i32 = arith.constant 0 : i32
              "toy.return"(%c0_i32) : (i32) -> ()
            }) : () -> ()
            "toy.yield"() : () -> ()
          }
        }

PRINT and EXIT also now support array elements, but that isn’t in this bit of sample code.

Here is an example lowering to LLVM LL:

        define i32 @main() !dbg !4 {
          %1 = alloca i32, i64 7, align 4, !dbg !8
            #dbg_declare(ptr %1, !9, !DIExpression(), !8)
          %2 = alloca i32, i64 1, align 4, !dbg !14
            #dbg_declare(ptr %2, !15, !DIExpression(), !14)
          %3 = getelementptr i32, ptr %1, i64 3, !dbg !16
          store i32 42, ptr %3, align 4, !dbg !16
    >>    %4 = getelementptr i32, ptr %1, i64 3, !dbg !17
    >>    %5 = load i32, ptr %4, align 4, !dbg !17
          store i32 %5, ptr %2, align 4, !dbg !17
          %6 = load i32, ptr %2, align 4, !dbg !18
          %7 = sext i32 %6 to i64, !dbg !18
          call void @__toy_print_i64(i64 %7), !dbg !18
          ret i32 0, !dbg !18
        }

(with the GEP and associated load for the array access highlighted.)
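For readers less used to LLVM IR, the GEP is pointer arithmetic on the array base. A hand-written C++ counterpart of the listing might look like the following. This is only an illustration: array_element_roundtrip is a hypothetical name, and unlike the allocas in the IR, the variables here are zero-initialized.

```cpp
#include <cstdint>

// Hand-written C++ counterpart of the LLVM IR above (illustrative only).
std::int64_t array_element_roundtrip() {
    std::int32_t t[7] = {};  // %1 = alloca i32, i64 7 (zeroed here, unlike alloca)
    std::int32_t x = 0;      // %2 = alloca i32, i64 1

    std::int32_t *elem = t + 3;  // getelementptr i32, ptr %1, i64 3
    *elem = 42;                  // store i32 42, ptr %3

    x = *(t + 3);  // second GEP, then load i32, then store into x

    // sext i32 to i64: the widened value that is handed to the print call.
    return static_cast<std::int64_t>(x);
}
```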

Even without optimization enabled, the assembly listing is pretty good:

        0000000000000000 <main>:
           0: sub    $0x28,%rsp
           4: movl   $0x2a,0x18(%rsp)
           c: movl   $0x2a,0x8(%rsp)
          14: mov    $0x2a,%edi
          19: call   1e
                1a: R_X86_64_PLT32 __toy_print_i64-0x4
          1e: xor    %eax,%eax
          20: add    $0x28,%rsp
          24: ret

With optimization, everything is in registers, looking even nicer:

        0000000000000000 <main>:
           0: push   %rax
           1: mov    $0x2a,%edi
           6: call   b
                7: R_X86_64_PLT32 __toy_print_i64-0x4
           b: xor    %eax,%eax
           d: pop    %rcx
           e: ret

Attempting to learn to use VSCode: some keymappings and notes.

December 12, 2025 C/C++ development and debugging.

I’ve been using vim and the terminal for ~30 years, and am working now in a VSCode shop.  I am probably the only holdout, using terminal tools (tmux, vim, cscope, ctags, perl, …).  I am a keyboard guy, and am generally hopeless in a UI of any sort, and don’t find them particularly intuitive.

I keep trying to make the VSCode switch, but then get frustrated, as I want to do something that I can do trivially outside of the UI and I have no idea how to do it in the UI.  I’ll then switch to terminal for something “quick”, and end up just staying there for the rest of the day.

I know that there are good reasons to use VSCode.  In particular, the AI helper tools are near magical at filling in comments and even code.  How the hell did it read my mind this time is often the feeling that I have when I am using it.

Here are some examples of things that I can do really easily outside of the VSCode environment:

  • Switching tabs (open files in the UI) with just keystrokes.  I do this in tmux with F7, F8 keymappings.  I use tmux aliases to put names on all my shell sessions, so I can see at a glance what I am doing in any of them (example: I just type ‘tnd’ and my tmux window is then labelled with the last two components of my current directory.)
  • Open a file.  Clicking through a UI directory hierarchy is so slow.  I have CDPATH set so that I can get right to my src or tests directory in the terminal.
  • build the code.  Typing ninja from my tmux “build” directory is so easy (and I have scripts that rerun cmake, clean and recreate the build directory).
  • Run an ad-hoc filter on a selected range of lines in the code (either visually selected, or with a vim search expression, like “:,/^  }/!foo”).  If I install the vim extension in VSCode to use comfortable key bindings, then even a search like that doesn’t work.
  • I can’t search for /^  }/ (brace with two spaces of indentation), since the VSCode vim extension insists on ignoring multiple spaces in a search expression like that.
  • Iterate quickly over compilation errors.  In the terminal I just run ‘vim -q o’, assuming that I’ve run ‘ninja 2>&1 | tee o’
  • Launch a debugger.  I can put my breakpoints in a .gdbinit file (either ~/.gdbinit or a local directory one), and then just run.  How to do the same in the UI is not obvious, and certainly not easy.  I have done it, and when you can figure out how to do it, it’s definitely nice to have more than a peephole view of the code (especially since gdb’s TUI mode is flaky.)

It’s my goal to at least understand how to do some of these tasks in VSCode.  I’m going to come back to this blog post and gradually fill it in with the tricks that I’ve figured out, assuming I do, so that I can accomplish the goals above and more.

My environment

I am using a PC keyboard.  It’s an ancient cheap logitech keyboard (I had two of these, both about 9 years old, both in the same sad but impressively worn state).  Those keyboards have nice pressable keys, not like the mac laptop.  The mac laptop keyboard is for well dressed people browsing the web in Starbucks, not for people in the trenches.  I use the Karabiner app to map my Alt key to Command so that the effective “command” key is always in the same place.  For that reason, some of these key mappings may not be the ones that anybody else would want.

Claude suggests that these are the meanings of the keyboard symbols in VSCode:

And suggests that for me the Alt/Option is my “physical command key” (i.e.: Alt.). I have yet to find a keybinding that I want to use with that to verify that my Karabiner settings don’t do something strange.

How to do stuff (a start):

  • Toggle to the terminal, or start a new one: ctrl-` (at least with my PC keyboard).
    Alternative for create terminal: command-shift-p (command palette) -> open new terminal
  • Search for a file to edit: command-p (Alt-p on my PC keyboard.)
    VSCode help shows this as “Go to File”, but with an apparent capital P.  Somewhat confusingly, the VSCode help shows all the key binding characters in upper case, even though command-p and command-P (shift p) mean different things.
  • Open keyboard shortcuts: command-k, (let go), command-s; or:
    command-shift-p (command-P) -> Keyboard shortcuts (Alt-shift-p on my PC keyboard)
  • Toggle between editor windows: ctrl-tab
  • Move to editor window N: ctrl-N (for example: ctrl-1, ctrl-2, …)
    Note that command-2 opens a split (to the right), much different than what ctrl-2 does (command-1, command-3 don’t seem to be bound)
  • Search for a pattern with multiple spaces (with vim extension installed).  Example:
    /^\s\s}
    Searching with:
    /^  }
    (start of line, two spaces, end brace), does not work, as VSCode or the vim extension seems to aggregate multiple spaces into one.
  • Maximize a terminal, or switch back to split terminal/edit view: I ended up adding a ‘command-m’ keybinding for “Toggle Maximized Panel” to do that.  With that done, I can cycle between full screen terminal and split screen editor/terminal.
  • Maximize an editor window, or switch back to split edit/terminal: ctrl-j
    This might better be described as: Hide/show the panel (terminal area), giving the editor more space when the panel is hidden.
  • Close a window: command-w (Alt-w on the PC keyboard)
  • Strip trailing whitespace: command-k, let-go, command-x
    I see this in the ‘Keyboard Shortcuts’ mappings, but am unlikely to remember it, and will probably revert to using
    :%s/ *$//
    or an external filter (that’s how I used to do it.)
  • Build command: command-shift-b (command-B)
    I did have a bunch of .vscode json overrides that had different build targets, but something has removed those from my tree, so as is, it’s not clear to me what exactly this does.  cmake options come up.  I’ll probably just invoke ninja from the terminal (with rm -rf build ; cmake … when I want it.)
  • Tasks shortcut: ctrl-shift-y (ctrl-Y)
    This was a recommended key binding from one of our vscode gurus, and I’ve used it.  But it’s annoying that my .vscode/tasks.json was removed by something, so this now does nothing interesting (although that’s okay, since I can now switch to the terminal with a couple keystrokes.)
  • Shell callouts.  It is my recollection that I was unable to run shell callouts.  Example:
    :,/^}/!grep foo
    but after setting the shell command in the vim extension settings to /bin/bash, this now works.  It’s awkward though, and runs the shell commands locally, not on the remote environment, so I can’t run something like clang-format, which I don’t have installed (currently) on my mac, but only on the remote.  I suppose that I could have a shell command ssh to the remote, but that’s pretty awkward (and would be slow.)  The work around for clang-format will probably just be to run ‘clang-format -i’ in the terminal (which can have unfortunate side effects when applied to the whole file.)
  • Debug: create a debugger launch configuration stanza in .vscode/launch.json, like so:

    {   
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Debug foo",
                "type": "cppdbg",
                "request": "launch",
                "program": "${workspaceFolder}/build/foo",
                "args": [],
                "cwd": "${workspaceFolder}/build",
                "MIMode": "gdb",
                "miDebuggerPath": "/usr/bin/gdb",
                "setupCommands": [
                    { "description": "Set initial breakpoint", "text": "-break-insert debugger_test", "ignoreFailures": true }
                ],  
                "preLaunchTask": "build"
            }]} 
    

    Then set a breakpoint in the source that you want to stop in, click the bug symbol on the LHS:

    Bug icon

    select that new debug configuration, and away you go. This brings up a debugger console, but it’s a bit of a pain to use, since it’s in MI mode, so for example, instead of ‘n’, you have to type ‘-exec next’. The vscode key mappings to avoid that extra typing (according to the Go menu) are:

    • n: F10
    • s: F11 (now cmd-F11)
    • finish: shift F11
    • c: F5

    That step-in F11 action didn’t work for me, as macOS intercepts it (i.e.: “Show desktop”, a function that doesn’t seem terribly useful, as I don’t have anything on my desktop.)  I’ve changed that “Debug: Step Into” keybinding to a command-F11, and changed “Debug: Step Into Target” (which used command-F11) to ctrl-F11.  I’m not sure if I’ll end up using that ctrl-F11, or just setting breakpoints when the step into candidate has multiple options.
  • MacOS required keyboard configuration!  Typing spaces fast in vscode results in rogue period insertions.  Every time I would try vscode again, channelling a diet and exercise “and this time I mean it” vibe, I’d hit this rogue period issue and go back to terminal in frustration.

    Watercooler talk in the office suggested that this is apparently a MacOS feature (but doesn’t affect my usual terminal+ssh+vim workflow).  Chat recommended the following keyboard configuration setting adjustments to fix it (testing that now):

    Fixing EVIL MacOS keyboard settings that cripple vscode.


mixed results with more C++ module experimentation

November 22, 2025 C/C++ development and debugging.

Here is some followup from my earlier attempt to use bleeding edge C++ module support.

First experiment: do I need to import all of the std library?

I tried this:

import iostream;

int main() {

  std::cout << "hello world\n";

  return 0;
}

I didn’t know if gcm.cache/std.gcm (already built from my previous experiment) would supply that export, but I get:

g++ -std=c++23 -fmodules   -c -o broken.o broken.cc
In module imported at broken.cc:1:1:
iostream: error: failed to read compiled module: No such file or directory
iostream: note: compiled module file is ‘gcm.cache/iostream.gcm’
iostream: note: imports must be built before being imported
iostream: fatal error: returning to the gate for a mechanical issue
compilation terminated.
make: *** [<builtin>: broken.o] Error 1

so it appears the answer is no. Also, /usr/include/c++/15/bits/ only appears to have a std.cc, and no iostream.cc:

> find  /usr/include/c++/15/bits/ -name "*.cc"
/usr/include/c++/15/bits/std.cc
/usr/include/c++/15/bits/std.compat.cc

so it appears, for the time being, g++-15 is all or nothing with respect to std imports. However, when using precompiled headers, you usually want a big pre-generated pch that has just about everything, and this is similar, so maybe that’s not so bad (other than namespace pollution.)

Second experiment. Adding a non-std import/export.

I moved a variant of Stroustrup’s collect_lines function into a separate module, like so:

// stuff.cc
export module stuff;
import std;

namespace stuff {

void helper() { std::cout << "call to a private function\n"; }

export
std::vector<std::string> collect_lines(std::istream &is) {

  helper();

  std::unordered_set<std::string> s;
  for (std::string line; std::getline(is, line);)
    s.insert(line);

  //return std::vector<std::string>(s.begin(), s.end());
  return std::vector{std::from_range, s};
}
} // namespace stuff

It turns out that I needed the export keyword on ‘module stuff’, as well as for any function that I wanted to export. Without that I get:

> make 
g++ -std=c++23 -fmodules -c /usr/include/c++/15/bits/std.cc
g++ -std=c++23 -fmodules   -c -o stuff.o stuff.cc
g++ -std=c++23 -fmodules   -c -o try.o try.cc
try.cc: In function ‘int main()’:
try.cc:8:12: error: ‘stuff’ has not been declared
    8 |   auto v = stuff::collect_lines(std::cin);
      |            ^~~~~
make: *** [<builtin>: try.o] Error 1

The compile error is not very good. It doesn’t complain that collect_lines is not exported, but instead complains that stuff, the namespace itself, is not declared.

I can export the namespace, which is the naive resolution to the compiler diagnostic presented, for example:

export module stuff;
import std;

export namespace stuff {

void helper() { std::cout << "call to a private function\n"; }

//export
std::vector<std::string> collect_lines(std::istream &is) {

  helper();

  std::unordered_set<std::string> s;
  for (std::string line; std::getline(is, line);)
    s.insert(line);

  //return std::vector<std::string>(s.begin(), s.end());
  return std::vector{std::from_range, s};
}
} // namespace stuff

However, that means that the calling code can now call stuff::helper, which was not my intent.

There also does not appear to be any good way to enumerate exports available in the gcm.cache. nm output for the symbol is not any different with or without the export keyword:

> nm stuff.o | grep collect_lines | c++filt
0000000000000028 T stuff::collect_lines@stuff[abi:cxx11](std::basic_istream<char, std::char_traits<char> >&)

This is a critically important tooling failure if modules are going to be used in production. Anybody who has programmed with Windows DLLs or AIX shared objects, or Linux shared objects with symbol versioning, knows about the resulting hellish nature of the linker error chase, when an export is missed from such an enumeration. Hopefully, there’s some external tool that can enumerate gcm.cache exports. Both grok and chatgpt were unsuccessful advising about tools for this sort of task. The best answer was chatgpt’s recommendation for -fdump-lang-module:

> g++ -std=c++23 -fmodules -save-temps -fdump-lang-module   -c -o stuff.o stuff.cc 
fedoravm:/home/peeter/physicsplay/programming/module> ls
broken.cc  gcm.cache  makefile  makefile.clang  std.o  stuff.cc  stuff.cc.002l.module  stuff.ii  stuff.o  stuff.s  try.cc

but that *.module output doesn’t have anything that obviously distinguishes exported vs. non-exported symbols:

> grep -2e stuff::helper -e stuff::collect_lines *.module
Wrote section:28 named-by:'::std::vector<::std::__cxx11::basic_string@std:1<char,::std::char_traits@std:1,::std::allocator@std:1>,::std::allocator<::std::__cxx11::basic_string@std:1<char,::std::char_traits@std:1,::std::allocator@std:1>>>'
Writing section:29 2 depsets
 Depset:0 decl entity:403 function_decl:'::stuff::collect_lines'
 Wrote declaration entity:403 function_decl:'::stuff::collect_lines'
 Depset:1 binding namespace_decl:'::stuff::collect_lines'
Wrote section:29 named-by:'::stuff::collect_lines'
Writing section:30 2 depsets
 Depset:0 decl entity:404 function_decl:'::stuff::helper'
 Wrote declaration entity:404 function_decl:'::stuff::helper'
 Depset:1 binding namespace_decl:'::stuff::helper'
Wrote section:30 named-by:'::stuff::helper'
Writing section:31 4 depsets
 Depset:0 specialization entity:405 type_decl:'::std::__replace_first_arg<::std::allocator<::std::__detail::_Hash_node<::std::__cxx11::basic_string@std:1<char,::std::char_traits@std:1,::std::allocator@std:1>,0x1>>,::std::__cxx11::basic_string@std:1<char,::std::char_traits@std:1,::std::allocator@std:1>>'
--
Writing binding table
 Bindings '::std::operator==' section:8
 Bindings '::stuff::collect_lines' section:29
 Bindings '::stuff::helper' section:30
 Bindings '::std::swap' section:35
Writing pending-entities

Chatgpt summarizes this as follows:

“This is confirmed by overwhelming evidence:

  • GCC bug 113590
  • GCC mailing list discussion July 2024
  • Confirmation from module implementers: “GCC BMIs do not currently record export flags.”

This is intentional (for now): GCC’s binary module interface tracks reachable declarations, not exported ones.”

Trying clang

After considerable experimentation, and both grok and chatgpt help, I was finally able to get a working compile and link sequence using the clang toolchain:

fedoravm:/home/peeter/physicsplay/programming/module> make -f *.clang clean
rm -f *.o *.pcm try
fedoravm:/home/peeter/physicsplay/programming/module> make -f *.clang 
clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -Wno-reserved-module-identifier --precompile /usr/share/libc++/v1/std.cppm -o std.pcm
clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -fmodule-file=std=std.pcm --precompile stuff.cppm -o stuff.pcm
clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -fmodule-file=std=std.pcm -fmodule-file=stuff=stuff.pcm -c try.cc -o try.o
clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -fmodule-file=std=std.pcm -c stuff.cc -o stuff.o
clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -fmodule-file=std=std.pcm -fmodule-file=stuff=stuff.pcm try.o stuff.o -o try

Unlike g++, I have to build both the module and the object code for stuff.cc (and facilitated that with a stuff.cppm -> stuff.cc symlink), but also unlike g++, I didn’t need a std.o (for reasons that I don’t understand.)

Dumping the clang-AST appears to be the closest that we can get to enumerating exports. Example:

> clang++ -std=c++23 -stdlib=libc++ -Wall -Wextra -fmodule-file=std=std.pcm  -Xclang -ast-dump -fsyntax-only stuff.cc | less -R

This shows output like:

Screenshot

This is not terribly user friendly, and not something that a typical clang front end user would attempt to do.

This hints that the “way” to dump exports would be to write a clang-AST visitor that dumps all the ExportDecl’s that are encountered (or a complex grep script that attempts to mine the -ast-dump output).
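As a rough idea of what the "mine the -ast-dump output" option could look like, here is a small text filter written in C++. It is a sketch resting on several assumptions that I have not verified against a particular clang version: that the dump is plain text (no color escapes), that nesting is drawn with the characters |, `, - and space ahead of each node kind, and that a FunctionDecl line ends with the function name followed by its quoted type. The function name exported_functions is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the names of FunctionDecl nodes nested (at any depth) under an ExportDecl.
std::vector<std::string> exported_functions(std::istream &dump) {
    std::vector<std::string> names;
    bool in_export = false;
    std::size_t export_col = 0;

    for (std::string line; std::getline(dump, line);) {
        // The node kind starts after the tree-drawing characters, so its
        // column reflects the nesting depth.
        std::size_t col = line.find_first_not_of("|`- ");
        if (col == std::string::npos)
            continue;

        if (in_export && col <= export_col)
            in_export = false;  // we have left the ExportDecl subtree

        if (line.compare(col, 10, "ExportDecl") == 0) {
            in_export = true;
            export_col = col;
        } else if (in_export && line.compare(col, 12, "FunctionDecl") == 0) {
            // Assumed layout: "... <name> '<type>'", so the name is the
            // last whitespace-separated token before the first quote.
            std::size_t quote = line.find('\'', col);
            if (quote == std::string::npos)
                continue;
            std::istringstream head(line.substr(col, quote - col));
            std::string token, last;
            while (head >> token)
                last = token;
            if (!last.empty())
                names.push_back(last);
        }
    }
    return names;
}
```

A filter like this is fragile by construction, since it depends on a dump layout that is not a stable interface, so an AST visitor over ExportDecl nodes is probably the more robust route.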

C++ sample code with modules!

November 21, 2025 C/C++ development and debugging.

Screenshot

A coworker shared the Stroustrup paper titled “21st Century C++”. I was reading a PDF version, but a search turns up an online version too.

This paper included use of C++ with modules. I’ve had my eyes on those since working on DB2, which suffered from include file hell (DB2’s include file hierarchy was a fully connected graph). However, until today, I didn’t realize that there were non-experimental compilers that included module support.

Here’s a sample program that uses modules (Stroustrup’s, with a main added):

import std;

using namespace std;

vector<string> collect_lines(istream &is) {
  unordered_set<string> s;
  for (string line; getline(is, line);)
    s.insert(line);

  return vector{from_range, s};
}

int main() {
  auto v = collect_lines(cin);
  for (const auto &i : v) {
    cout << format("{}\n", i);
  }

  return 0;
}

A first attempt to compile this, even with -std=c++23 bombs:

fedoravm:/home/peeter/physicsplay/programming/module> g++ -std=c++23 -o try try.cc 2>&1 | head -5
try.cc:1:1: error: ‘import’ does not name a type
    1 | import std;
      | ^~~~~~
try.cc:1:1: note: C++20 ‘import’ only available with ‘-fmodules’, which is not yet enabled with ‘-std=c++20’
try.cc:5:8: error: ‘string’ was not declared in this scope

but we get a hint about what is needed (-fmodules). However, that’s not enough by itself:

fedoravm:/home/peeter/physicsplay/programming/module> g++ -std=c++23 -fmodules -o try try.cc 2>&1 | head -5
In module imported at try.cc:1:1:
std: error: failed to read compiled module: No such file or directory
std: note: compiled module file is ‘gcm.cache/std.gcm’
std: note: imports must be built before being imported
std: fatal error: returning to the gate for a mechanical issue

Here’s the magic sequence that we need, which includes a build of the C++ std export too:

g++ -std=c++23 -fmodules -c /usr/include/c++/15/bits/std.cc
g++ -std=c++23 -fmodules   -c -o try.o try.cc
g++ -std=c++23 -fmodules -o try std.o try.o  

On this VM, I have g++-15 installed, which is sufficient to build and run this little program, modules and all.