I’ve tagged V4 for my toy language and MLIR-based compiler.
See the Changelog for the gory details (or the commit history). There are three specific new features, relative to the V3 tag:
- Adds support (grammar, builder, lowering) for function declarations and function calls. Much of the work for this was done in the branch use_mlir_funcop_with_scopeop, later squashed and merged as one big commit. Here’s an example:
Here is the MLIR for this program:
    module {
      func.func private @foo() {
        "toy.scope"() ({
          "toy.declare"() <{type = i16}> {sym_name = "v"} : () -> ()
          %c3_i64 = arith.constant 3 : i64
          "toy.assign"(%c3_i64) <{var_name = @v}> : (i64) -> ()
          %0 = "toy.string_literal"() <{value = "In foo"}> : () -> !llvm.ptr
          toy.print %0 : !llvm.ptr
          %1 = "toy.load"() <{var_name = @v}> : () -> i16
          %c42_i64 = arith.constant 42 : i64
          %2 = arith.trunci %c42_i64 : i64 to i32
          "toy.call"(%1, %2) <{callee = @bar}> : (i16, i32) -> ()
          %3 = "toy.string_literal"() <{value = "Called bar"}> : () -> !llvm.ptr
          toy.print %3 : !llvm.ptr
          "toy.return"() : () -> ()
        }) : () -> ()
        "toy.yield"() : () -> ()
      }
      func.func private @bar(%arg0: i16, %arg1: i32) {
        "toy.scope"() ({
          "toy.declare"() <{param_number = 0 : i64, parameter, type = i16}> {sym_name = "w"} : () -> ()
          "toy.declare"() <{param_number = 1 : i64, parameter, type = i32}> {sym_name = "z"} : () -> ()
          %0 = "toy.string_literal"() <{value = "In bar"}> : () -> !llvm.ptr
          toy.print %0 : !llvm.ptr
          %1 = "toy.load"() <{var_name = @w}> : () -> i16
          toy.print %1 : i16
          %2 = "toy.load"() <{var_name = @z}> : () -> i32
          toy.print %2 : i32
          "toy.return"() : () -> ()
        }) : () -> ()
        "toy.yield"() : () -> ()
      }
      func.func @main() -> i32 {
        "toy.scope"() ({
          %c0_i32 = arith.constant 0 : i32
          %0 = "toy.string_literal"() <{value = "In main"}> : () -> !llvm.ptr
          toy.print %0 : !llvm.ptr
          "toy.call"() <{callee = @foo}> : () -> ()
          %1 = "toy.string_literal"() <{value = "Back in main"}> : () -> !llvm.ptr
          toy.print %1 : !llvm.ptr
          "toy.return"(%c0_i32) : (i32) -> ()
        }) : () -> ()
        "toy.yield"() : () -> ()
      }
    }
Here’s a sample program with an assigned CALL value:
The MLIR for this one looks like:
    module {
      func.func private @bar(%arg0: i16) {
        "toy.scope"() ({
          "toy.declare"() <{param_number = 0 : i64, parameter, type = i16}> {sym_name = "w"} : () -> ()
          %0 = "toy.load"() <{var_name = @w}> : () -> i16
          toy.print %0 : i16
          "toy.return"() : () -> ()
        }) : () -> ()
        "toy.yield"() : () -> ()
      }
      func.func @main() -> i32 {
        "toy.scope"() ({
          %c0_i32 = arith.constant 0 : i32
          %0 = "toy.string_literal"() <{value = "In main"}> : () -> !llvm.ptr
          toy.print %0 : !llvm.ptr
          %c3_i64 = arith.constant 3 : i64
          %1 = arith.trunci %c3_i64 : i64 to i16
          "toy.call"(%1) <{callee = @bar}> : (i16) -> ()
          %2 = "toy.string_literal"() <{value = "Back in main"}> : () -> !llvm.ptr
          toy.print %2 : !llvm.ptr
          "toy.return"(%c0_i32) : (i32) -> ()
        }) : () -> ()
        "toy.yield"() : () -> ()
      }
    }
I’ve implemented a two-stage lowering, where the toy.scope, toy.yield, toy.call, and toy.return ops are stripped out, leaving just the func and llvm dialects. Code from that stage of the lowering is cleaner looking:
    llvm.mlir.global private constant @str_1(dense<[66, 97, 99, 107, 32, 105, 110, 32, 109, 97, 105, 110]> : tensor<12xi8>) {addr_space = 0 : i32} : !llvm.array<12 x i8>
    func.func private @__toy_print_string(i64, !llvm.ptr)
    llvm.mlir.global private constant @str_0(dense<[73, 110, 32, 109, 97, 105, 110]> : tensor<7xi8>) {addr_space = 0 : i32} : !llvm.array<7 x i8>
    func.func private @__toy_print_i64(i64)
    func.func private @bar(%arg0: i16) {
      %0 = llvm.mlir.constant(1 : i64) : i64
      %1 = llvm.alloca %0 x i16 {alignment = 2 : i64, bindc_name = "w.addr"} : (i64) -> !llvm.ptr
      llvm.store %arg0, %1 : i16, !llvm.ptr
      %2 = llvm.load %1 : !llvm.ptr -> i16
      %3 = llvm.sext %2 : i16 to i64
      call @__toy_print_i64(%3) : (i64) -> ()
      return
    }
    func.func @main() -> i32 {
      %0 = llvm.mlir.constant(0 : i32) : i32
      %1 = llvm.mlir.addressof @str_0 : !llvm.ptr
      %2 = llvm.mlir.constant(7 : i64) : i64
      call @__toy_print_string(%2, %1) : (i64, !llvm.ptr) -> ()
      %3 = llvm.mlir.constant(3 : i64) : i64
      %4 = llvm.mlir.constant(3 : i16) : i16
      call @bar(%4) : (i16) -> ()
      %5 = llvm.mlir.addressof @str_1 : !llvm.ptr
      %6 = llvm.mlir.constant(12 : i64) : i64
      call @__toy_print_string(%6, %5) : (i64, !llvm.ptr) -> ()
      return %0 : i32
    }
There are some dead constants left there (%3 in main), seemingly due to type conversion, but they get stripped out nicely by the time we get to LLVM-IR:
    @str_1 = private constant [12 x i8] c"Back in main"
    @str_0 = private constant [7 x i8] c"In main"
    declare void @__toy_print_string(i64, ptr)
    declare void @__toy_print_i64(i64)
    define void @bar(i16 %0) {
      %2 = alloca i16, i64 1, align 2
      store i16 %0, ptr %2, align 2
      %3 = load i16, ptr %2, align 2
      %4 = sext i16 %3 to i64
      call void @__toy_print_i64(i64 %4)
      ret void
    }
    define i32 @main() {
      call void @__toy_print_string(i64 7, ptr @str_0)
      call void @bar(i16 3)
      call void @__toy_print_string(i64 12, ptr @str_1)
      ret i32 0
    }
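The two `declare` lines in that LLVM-IR imply a small runtime library that the generated code links against. As a rough illustration only, here is a minimal C++ sketch of what such a runtime could look like. The parameter types are inferred from the declarations above; the helper names `toy_format_string` and `toy_format_i64`, and the trailing newline, are assumptions made for this sketch rather than anything taken from the project.

```cpp
// Hypothetical runtime sketch. Signatures are inferred from the `declare`
// lines in the LLVM-IR; the helpers and the trailing newline are assumptions.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Build the output text for a length-delimited string. The string globals
// in the IR ([7 x i8], [12 x i8]) have no NUL terminator, so the explicit
// length argument is the only way to know where the text ends.
std::string toy_format_string(std::int64_t length, const char *data) {
    return std::string(data, static_cast<std::size_t>(length)) + "\n";
}

// Build the output text for an integer. Narrower values (such as the i16
// in @bar) are sign-extended to i64 by the caller before the call.
std::string toy_format_i64(std::int64_t value) {
    return std::to_string(value) + "\n";
}

// C linkage so the symbol names match the unmangled names in the IR.
extern "C" void __toy_print_string(std::int64_t length, const char *data) {
    const std::string text = toy_format_string(length, data);
    std::fwrite(text.data(), 1, text.size(), stdout);
}

extern "C" void __toy_print_i64(std::int64_t value) {
    const std::string text = toy_format_i64(value);
    std::fwrite(text.data(), 1, text.size(), stdout);
}
```

Passing the length explicitly is what lets the compiler emit the string constants as plain byte arrays, and having a single i64 print entry point is consistent with the `sext` from i16 to i64 that appears in `@bar`.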
- Generalize NegOp lowering to support all types, not just f64.
- Allow PRINT of string literals, avoiding the requirement for variables. Example:
The next obvious thing to do for the language/compiler would be to implement conditionals (IF/ELIF/ELSE) and loops. I think that there are MLIR dialects to facilitate both (like the affine dialect for loops).
However, having now finished this function support feature (which I’ve been working on for quite a while), I’m going to take a break from this project. Even though I’ve only been working on this toy compiler project in my spare time, it periodically invades my thoughts. With all that I have to learn for my new job, I’d rather have one less extra thing to think about, so that I don’t feel pulled in too many directions at once.