Stream: beginners

Topic: How to check LLVM optimizations?


Qqwy / Marten (Jul 17 2022 at 11:55):

I want to experiment a little with how the LLVM IR generated by the Roc compiler is optimized further by LLVM (e.g. to see whether different ways of writing the same program are treated differently by the compiler passes).

How to set this up?

Folkert de Vries (Jul 17 2022 at 12:00):

you can print the llvm to a file earlier in the process

Qqwy / Marten (Jul 17 2022 at 12:07):

Is the LLVM IR that is exported using --debug usable? Or is that a bad idea?

Folkert de Vries (Jul 17 2022 at 12:40):

yes it's fine to use
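
Once the IR is in a file, the standard LLVM tools `opt` (runs optimization passes on IR) and `llc` (lowers IR to assembly) can be pointed at it. The following is only a rough sketch of scripting that: the file name `app.ll` is hypothetical, the thread does not say where `--debug` writes its output, and the installed LLVM tools are assumed to be new enough to read the IR that Roc emits.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(ir_path, opt_level="-O3"):
    """Return the opt and llc command lines for one .ll file (nothing is run)."""
    src = Path(ir_path)
    optimized = src.with_suffix(".opt.ll")  # e.g. app.ll -> app.opt.ll
    assembly = src.with_suffix(".s")        # e.g. app.ll -> app.s
    return [
        # opt applies the optimization pipeline; -S keeps the output as textual IR.
        ["opt", opt_level, "-S", str(src), "-o", str(optimized)],
        # llc turns the optimized IR into assembly, here in Intel syntax.
        ["llc", opt_level, "--x86-asm-syntax=intel", str(optimized), "-o", str(assembly)],
    ]


def run_pipeline(ir_path):
    """Run opt, then llc, stopping with an error if a tool is missing or fails."""
    for cmd in build_commands(ir_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input file; replace with the IR file you actually exported.
    for cmd in build_commands("app.ll"):
        print(" ".join(cmd))
```

Comparing `app.ll` with `app.opt.ll`, or reading `app.s`, then shows what the passes did to a given way of writing the program.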

Qqwy / Marten (Jul 17 2022 at 13:09):

Fun! LLVM is able to compile the following down (on the hello-world platform in this case):

main =
  outcome = example (Ok 123456789) (Err Nothing)
  when outcome is
    Ok val ->
      valStr = Num.toStr val
      "The result of the addition was successful and is \(valStr)"
    Err _ ->
      "Calculating the result of addition has failed\n"

example : Result I64 [Nothing], Result I64 [Nothing] -> Result I64 [Nothing]
example = \lhs, rhs ->
  Ok (\x -> \y -> x + y)
  |> apply lhs
  |> apply rhs

apply = \funRes, valRes ->
  when funRes is
    Err e -> Err e
    Ok fun ->
      when valRes is
        Err e -> Err e
        Ok val -> Ok (fun val)

to

roc__mainForHost_1_exposed_generic:     # @roc__mainForHost_1_exposed_generic
        lea     rax, [rip + .L_str_literal_13338901839589407859+8]
        mov     qword ptr [rdi], rax
        vmovaps xmm0, xmmword ptr [rip + .LCPI1_0] # xmm0 = [46,46]
        vmovups xmmword ptr [rdi + 8], xmm0
        ret
roc__mainForHost_1_exposed:             # @roc__mainForHost_1_exposed
        mov     rax, rdi
        mov     qword ptr [rdi + 16], 46
        ret
roc__mainForHost_size:                  # @roc__mainForHost_size
        mov     eax, 24
        ret
.L_str_literal_13338901839589407859:
        .ascii  "\000\000\000\000\000\000\000\000Calculating the result of addition has failed\n"

i.e. all function calls are optimized away. LLVM sees that we pass an Err Nothing and so the only thing that is left in assembly is copying the string literal to pass back to the hello-world platform.
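
The constants in the listing can be checked by hand. The sketch below only redoes the arithmetic in Python: 46 is the byte length of the error string, and the `+8` in the `lea` skips the eight zero bytes in front of it. Reading those eight bytes as a header slot, and the 24 returned by `roc__mainForHost_size` as three 8-byte fields (pointer, length, capacity), is an assumption that fits the listing but is not stated in the thread.

```python
# The literal exactly as it appears in the .ascii directive:
# eight zero bytes followed by the message text.
literal = b"\x00" * 8 + b"Calculating the result of addition has failed\n"

HEADER_BYTES = 8  # assumption: the zero bytes are a header in front of the text
payload = literal[HEADER_BYTES:]  # what `.L_str_literal_... + 8` points at

payload_length = len(payload)
print(payload_length)  # 46, the constant stored at [rdi + 8] and [rdi + 16]

# Assumption: the returned value consists of three 8-byte fields,
# which would account for roc__mainForHost_size returning 24.
FIELD_BYTES = 8
print(3 * FIELD_BYTES)  # 24
```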

Qqwy / Marten (Jul 17 2022 at 13:11):

For some reason, the resulting assembly also includes a definition of __muloti4 (software support for "multiplying two 128-bit signed integers with overflow checking"), although to my knowledge 128-bit numbers are not used anywhere in the code.

Qqwy / Marten (Jul 17 2022 at 13:11):

cf. Compiler Explorer

Qqwy / Marten (Jul 17 2022 at 13:13):

Not that big of a deal of course, since when linking it together with the platform code, superfluous symbols will disappear from the resulting binary.

Folkert de Vries (Jul 17 2022 at 13:14):

we link that in explicitly for reasons I have forgotten

Folkert de Vries (Jul 17 2022 at 13:14):

but it is required

Qqwy / Marten (Jul 17 2022 at 13:19):

Curious!

Qqwy / Marten (Jul 17 2022 at 13:24):

If you replace the line

outcome = example (Ok 123456789) (Err Nothing)

with e.g.

outcome = example (Ok 123456789) (Ok 42)

then the assembly is more involved.
Maybe that is allocation + refcounting ceremony, because the resulting string is too large to allocate on the stack(?) (cf. Compiler Explorer, go to the definition of _mainForHost_67abdd721024f0ff4eb3f4c2fc13bc5bad42db7851d456d88d203d15aaa450)

But the more important point is that there is no difference in resulting assembly between

outcome = example (Ok 123456789) (Ok 42)

and

outcome = Ok (123456789 + 42)

The whole calculation is inlined and constant-folded away! :happy:
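
To see why the compiler is entitled to do this, it can help to model the program outside Roc. The sketch below is a plain Python rendering of `apply` and `example` (the `Ok` and `Err` classes are stand-ins written for this illustration, not Roc's actual representation). It only shows that both expressions denote the same value; it says nothing about how LLVM arrives there. Note one difference: Python integers do not overflow, whereas the Roc code uses I64.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ok:
    value: Any


@dataclass(frozen=True)
class Err:
    error: Any


def apply(fun_res, val_res):
    """Mirror of the Roc `apply`: the first Err encountered wins."""
    if isinstance(fun_res, Err):
        return fun_res
    if isinstance(val_res, Err):
        return val_res
    return Ok(fun_res.value(val_res.value))


def example(lhs, rhs):
    """Mirror of the Roc `example`: curried addition lifted over two results."""
    curried_add = Ok(lambda x: lambda y: x + y)
    return apply(apply(curried_add, lhs), rhs)


if __name__ == "__main__":
    # Both arguments are Ok, so the result equals the directly written sum.
    print(example(Ok(123456789), Ok(42)) == Ok(123456789 + 42))  # True
    # An Err argument short-circuits, as in the first experiment.
    print(example(Ok(123456789), Err("Nothing")))
```

Since every input is a compile-time constant, inlining `example` and `apply` leaves only branches on known tags and one addition of known numbers, which is what constant folding removes.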


Last updated: Jul 06 2025 at 12:14 UTC