IR interpreter · ideas · Zulip Chat Archive

Stream: ideas

Topic: IR interpreter

Brendan Hansknecht (Nov 10 2023 at 18:07):

Related to the Q&A, I wonder how hard it would be to make a tree walking or bitcode interpreter that runs on one of our irs.

Folkert de Vries (Nov 10 2023 at 18:08):

not that hard but you'd have to make it work with the platform

Folkert de Vries (Nov 10 2023 at 18:08):

just for the repl that would work fine but not for actual executables

Brendan Hansknecht (Nov 10 2023 at 18:13):

You could generate a shim to deal with that, but it wouldn't be pretty. Basically the interpreter would be a c library. When calling roc someapp.roc, it would figure out all the symbols it needs to expose and build a shared library that essentially exposes those symbols, converts to a generic format, and then call the underlying interpreter generic execution symbol.

Brendan Hansknecht (Nov 10 2023 at 18:15):

So not pretty, but probably not to hard to do. Though you would definitely want to manually build that shared library to not waste 100s of milliseconds just on linking before you can even start running your interpreter.

Ayaz Hafiz (Nov 10 2023 at 18:59):

I really want to build such an interpreter. At least we could reuse it for constant eval

Brendan Hansknecht (Nov 10 2023 at 19:02):

I really want to build such an interpreter.

For any specific reason?

Ayaz Hafiz (Nov 10 2023 at 19:18):

For constant eval. Also easy way to validate correctness of the IR compiler

Anton (Nov 10 2023 at 19:22):

Also seems useful for debugging (for roc users).

Richard Feldman (Nov 10 2023 at 19:57):

I'm open to being wrong about this, but I actually think we'll want to use the dev backend for constant eval

Richard Feldman (Nov 10 2023 at 19:59):

in dev builds, I suspect in a lot of cases it would result in almost exclusively compiling things we're compiling anyway, plus then the execution would be faster!

Richard Feldman (Nov 10 2023 at 20:00):

and in optimized builds, once we have caching, it might be able to reuse previously built and cached dev artifacts to avoid rebuilding them (before executing the constant eval)

Ayaz Hafiz (Nov 10 2023 at 20:08):

I'm interested how the mechanics of that would work. The useful thing about an interpreter at the IR level is you can partially evaluate anything (including partial evaluation inside a function) and get out the same IR that you can then generate to any machine code. How would that work with the dev backend? My intuition is it would only be able to evaluate top-level values, and only generate platform-specific values, unless you also have some read-back mechanism.

Richard Feldman (Nov 10 2023 at 20:19):

oh yeah that was my intention, just evaluate top-level values

Richard Feldman (Nov 10 2023 at 20:19):

(or values that can be hoisted to top level because they don't depend on anything else)

Richard Feldman (Nov 10 2023 at 20:20):

wouldn't partial evaluation that could be done by an interpreter need to be kicked off from a top level value anyway? :thinking:

Richard Feldman (Nov 10 2023 at 20:21):

interpreting as opposed to like eta reduction

Notification Bot (Nov 10 2023 at 20:22):

17 messages were moved here from #show and tell > RustNL compiler performance talk by Richard Feldman.

Ayaz Hafiz (Nov 10 2023 at 20:24):

If you have a function

foo = \x ->
  heavyCompileTimeComputation = \{} -> ... + x
  heavyCompileTimeComputation {}

you could partially evaluate "heavyCompileTimeComputation" with an IR-level interpreter, but I don't think you could with a dev backend

Ayaz Hafiz (Nov 10 2023 at 20:24):

You also need to implement readback from the dev backend if you want the computed values to be architecture-independent

Richard Feldman (Nov 10 2023 at 20:25):

as for the mechanics, I was thinking that the goal would be to turn all top-level values into bytes in the readonly section - so basically:

compile everything using dev backend, generate thunks for all the top-level values (or hoisted ones) that we can subsequently call
one at a time, call them using a bump allocator for roc_alloc. They can be called concurrently.
after each call, we now have an in-memory representation of the compiled value. mark those as "to be copied into the readonly section"

Richard Feldman (Nov 10 2023 at 20:25):

I was thinking they'd be target-specific

Richard Feldman (Nov 10 2023 at 20:28):

hm that's interesting - I wonder about the tradeoffs around total compile time

Richard Feldman (Nov 10 2023 at 20:28):

the main thing I'm thinking about here is dev build times :big_smile:

Richard Feldman (Nov 10 2023 at 20:28):

but maybe if we had an interpreter that could do partial evaluation during --optimize builds that would be the best of both worlds?

Ayaz Hafiz (Nov 10 2023 at 20:31):

I mean in dev builds do you want partial evaluation at all?

Ayaz Hafiz (Nov 10 2023 at 20:31):

Presumably it would be faster except in the case where you have a hot loop of expensive computation to just execute the code as-is

Richard Feldman (Nov 10 2023 at 20:40):

I don't think we want partial evaluation in dev builds, but there's a "consistency between dev and optimized builds" issue if we don't evaluate top-level values at compile time in both cases

Richard Feldman (Nov 10 2023 at 20:40):

e.g. if there's a crash or dbg in them, when do you see it (or do you see it at all) etc.

Richard Feldman (Nov 10 2023 at 20:41):

also potentially in the future there might be value to being able to expose top-level static values directly to hosts instead of putting them behind a thunk (not sure, but it's a consideration)

Ayaz Hafiz (Nov 11 2023 at 00:02):

but there's a "consistency between dev and optimized builds" issue if we don't evaluate top-level values at compile time in both cases

I worry about supporting consistency of this form in the first place - this means if we ever want to change the scope of what can and cannot be evaluated at compile time, this is a problem, right?
I think one reasonable alternative is to not evaluate anything with a dbg/crash in it at compile time, or make it an error to have that in a toplevel value.

Richard Feldman (Nov 11 2023 at 11:49):

potentially, although I know having crash in toplevel values is useful - e.g. in Elm this would come up when people wanted to compile regular expressions as toplevel values and then immediately get rid of the Result from parsing the regex string, so if it failed you'd know about it right away when the program started up, and otherwise you could use it without the Result across the whole program

Richard Feldman (Nov 11 2023 at 11:50):

so I guess any top level value that would involve parsing something would want crash

Richard Feldman (Nov 11 2023 at 11:51):

and disallowing dbg could make it hard to debug nontrivial function calls in the top level values

Brian Carroll (Nov 11 2023 at 12:07):

one reasonable alternative is to not evaluate anything with a dbg/crash in it at compile time

This part of Ayaz's suggestion doesn't cause a mismatch between dev and release though.
And it's compatible with what Richard is describing from Elm.
The main difference is that when Elm generates constant values in JS, they are dynamically evaluated once, when the JS engine reads the elm.js script.
To get that behaviour, we could generate an init_constants function that evaluates all the constant thunks on starting the program. I'm not sure how it would get called from the host though.

Brian Carroll (Nov 11 2023 at 12:24):

I suppose we could insert some code in mainForHost to call the init thunks.

Brian Carroll (Nov 11 2023 at 12:29):

But the current behaviour of crashing on first usage probably fits better with other things in Roc (like running with type errors as long as you don't hit that path)

Ayaz Hafiz (Nov 11 2023 at 15:34):

In any case, that’s probably a separate concern from the method of evaluation itself. I’m still pretty concerned about the dev backend’s architecture-specific output (doesn’t that mean cross compilation is limited?) and only being to work on the top level, so no partial constant folding etc

Agus Zubiaga (Nov 11 2023 at 15:47):

Couldn’t constant expressions in functions be lifted to the top so that they can be evaluated independently (with the dev backend)? Sorry if this is a dumb question, I’m just curious :)

Richard Feldman (Nov 12 2023 at 04:19):

@Agus - I'm assuming so, yes :big_smile:

Richard Feldman (Nov 12 2023 at 04:20):

Ayaz Hafiz said:

I’m still pretty concerned about the dev backend’s architecture-specific output (doesn’t that mean cross compilation is limited?)

I'm not sure I follow this part - do you mean that, for example, something might crash when building due to OOM on a 32-bit target but not on a 64-bit target? Or something else?

Richard Feldman (Nov 12 2023 at 04:24):

as an aside, I didn't say it explicitly earlier, but one of the things I like about the dev backend for top-level constant evaluation idea is that it essentially adds nothing to the build+run cost:

if we build and run them at compile time, then there's zero cost at runtime
if we build them at compile time and then run them as soon as the program starts (which isn't really a thing in a Roc application, since the host is in charge of when the compiled Roc code gets called, but let's pretend it was a thing) then we pay the same cost immediately on program start that we saved from not doing it during the build

so in terms of dev builds, where you basically always build and then immediately run, I think the cost of evaluating them at compile-time ends up being essentially zero. Really it's the difference between however we end up storing them (after evaluating them) at build time versus at runtime, but if anything that probably makes the build-time version faster because it always gets to bump allocate the entire evaluation, whereas the runtime version only can in some situations (depending on the platform)

Richard Feldman (Nov 12 2023 at 04:25):

in contrast, I'd expect using an interpreter at build time to slow down dev builds by some amount. Probably for most constants it would be trivial, but I can imagine people trying to do some ambitious things at build time using this feature, especially in conjunction with being able to bring in files as Str or List U8 constants.

Richard Feldman (Nov 12 2023 at 04:27):

but like I said, I think using an interpreter in --optimize for partial constant folding seems reasonable - and in fact we could potentially do both in --optimize builds if that's faster: use the dev backend for evaluating all the top-level constants (none of which will be partial) and then afterwards doing a pass with the interpreter do all the remaining (potentially partial) things

Brian Carroll (Nov 12 2023 at 08:06):

Richard Feldman said:

Ayaz Hafiz said:

I’m still pretty concerned about the dev backend’s architecture-specific output (doesn’t that mean cross compilation is limited?)

I'm not sure I follow this part - do you mean that, for example, something might crash when building due to OOM on a 32-bit target but not on a 64-bit target? Or something else?

Layouts can be different between targets. Alignment of 128 bit ints, for example, which could affect field order. Or pointer size 64 vs 32. So if you evaluate constants in the dev backend, you will get bytes that are appropriate for the machine you compiled on. But maybe not on the final target machine.

Brian Carroll (Nov 12 2023 at 08:09):

But I suppose you could do what we do in the REPL to turn it back into an AST value. Then turn that into IR and swap _that_ into the target program.

Richard Feldman (Nov 13 2023 at 02:24):

Brian Carroll said:

if you evaluate constants in the dev backend, you will get bytes that are appropriate for the machine you compiled on. But maybe not on the final target machine.

ahh that makes sense! :thumbs_up:

I think that wouldn't be a problem in practice because the steps would be:

run it with a bump allocator to get the bytes in memory
copy the bytes in memory over to the readonly section, which might have a different layout than the source section

even when compiling for the same target as the current machine, the second step couldn't be as simple as one big memcpy because the pointer locations would need to change—so there would always need to be some transformations during the copying.

Richard Feldman (Nov 13 2023 at 02:25):

but I think the key thing is that all the same information is there regardless of target architecture, if that makes sense - like, all the numbers should be the same, all the strings should be the same, all the records and tag unions should have the same number of fields/variants, etc. (even if they have different orderings on the target)

Richard Feldman (Nov 13 2023 at 02:27):

so if the layouts happen to be identical, then more things can be memcpy'd over, but otherwise there might need to be some reordering (e.g. copy from offset 0-15 in the source address into offset 16-31 in the destination address because the field ordering is different on this target) and also pointer size differences, but I think that's it!

Ayaz Hafiz (Nov 13 2023 at 03:01):

That kind of readback is doable, but to me it feels really complicated - now you need a separate pass in the compiler that's similar to what the REPL/glue do, for converting the bytes to the target (in my mind this is more complicated than an interpreter given our IR is very small, but I appreciate this could be a wrong opinion). I guess I'm just not convinced that the complexity, and the limitations of the only-toplevel-evaluator are better than having an IR-level interpreter, given that most of the time the expressions/partial programs to evaluate will be pretty small in size (of course there are exceptions), and the performance is likely not to be too differential.

Ayaz Hafiz (Nov 13 2023 at 03:05):

Another consideration is that to perform any meaningful evaluation over the IR (including partial evaluation) we would need to have a pass that performs analysis over the constant subset (i.e. similar to CSE)

Brian Carroll (Nov 13 2023 at 09:24):

there might need to be some reordering ... and also pointer size differences, but I think that's it!

Not quite. For some tag unions, we store the tag ID in the lower bits of the pointer, if the number of tags is less than the pointer width in bytes. So if you have a 5-tag union that you're cross compiling from x86_64 to Wasm, that complicates things a bit.

Brian Carroll (Nov 13 2023 at 09:47):

We are all probably thinking of different implementations

Brian Carroll (Nov 13 2023 at 09:47):

The implementation I imagined earlier was this:

Take the expression and generate code using dev backend for the machine you're compiling on
Run the code
Pass the bytes through the REPL logic to recover an AST representation of it
Re-compile that for the real target

Brian Carroll (Nov 13 2023 at 09:47):

But I think Richard is describing something like this:

Take the expression and generate code using dev backend for the machine you're compiling on
Run the code to get bytes for the value
Traverse the bytes using the known Layout, transforming them for the final target
- At each node of the Layout, call methods on it such as layout.stack_size(target_info), layout.alignment_bytes(target_info), layout.stores_tag_id_in_pointer_bits(target_info).
- Compare those values for both targets, and manipulate the array of bytes accordingly
- In a value that has internal pointers, allocate space in the data section to get a valid address for the final target (how is this possible? Are we doing all this from the back end? I guess we need a relocation.)
Insert the transformed bytes into... the IR? The final binary? What stage of the compiler is actually driving this?

Brian Carroll (Nov 13 2023 at 09:58):

And I think the IR interpreter would work like this

Insert a constant folding pass, either after type checking or after mono IR generation, I'm not sure
Traverse the IR, partially evaluating whatever you can, and producing a new IR
Have rules for partially evaluating each expression
Have implementations of all the low-levels so that you can reduce 2+2 to 4 etc.

Brian Carroll (Nov 13 2023 at 09:59):

Note that the dev backend and the interpreter both have to traverse the IR. But after that pass, the "interpreter based" system is finished and can immediately start generating the final target. Whereas in the "dev backend based" solution we have to do a bunch of extra steps - executing, transforming bytes, etc.

Richard Feldman (Nov 13 2023 at 12:28):

that's true if the top level expression is trivial, but it could be very involved - e.g. it might call functions, those might conditionally call other functions, etc.

so maybe another way to express my concern is that if people start doing more involved things with constant evaluation, if we're executing all of that with the dev backend then it's going at dev backend speeds, and if interpreted, it's going at slower speeds

Richard Feldman (Nov 13 2023 at 12:29):

so it could be fine (or even faster) to interpret when the top level expression is not doing much, but maybe not if it's doing a lot

Richard Feldman (Nov 13 2023 at 12:30):

I suppose one thing we could try is interpreting by default and then if in the future people run into perf problems in practice, consider dev backend as a way to improve that

Brian Carroll (Nov 13 2023 at 12:55):

if we're executing all of that with the dev backend then it's going at dev backend speeds, and if interpreted, it's going at slower speeds

I don't think that's right.
I know we associate "dev backend" with "fast" and we associate "interpreter" with "slow".
But I think it's a false comparison.
It is comparing "fast" compile time to "slow" execution time.
But those times are actually about the same.

The dev backend traverses the IR, accumulates some stuff, and then serializes that stuff. And we are saying that's fast.
The IR interpreter just needs to traverse the IR and accumlate some stuff. That's a similar operation to the dev backend. But now just because it's an "interpreter", we are saying it's slow.

Brian Carroll (Nov 13 2023 at 12:56):

Especially when after running the dev backend you are still not finished, you still have to execute it and transform the bytes.

Brian Carroll (Nov 13 2023 at 12:59):

Hmm... although I just realised that in the case of an iteration or recursion that might not be true.

Brian Carroll (Nov 13 2023 at 13:00):

The IR interpreter would have to actually do the iteration but the code generator wouldn't.

Brian Carroll (Nov 13 2023 at 13:07):

so it could be fine (or even faster) to interpret when the top level expression is not doing much, but maybe not if it's doing a lot

OK yes I see this point now, it could go either way.

Brian Carroll (Nov 13 2023 at 13:10):

Maybe the IR interpreter should bail out of recursions/iterations after some limit. Or bail out of function calls after some stack depth limit.

Brian Carroll (Nov 13 2023 at 13:12):

Though maybe that makes it hard for a user to know what to expect.

Richard Feldman (Nov 13 2023 at 13:35):

yeah some systems do that, but I don't like the idea of doing that - it means you can take working code, extract it to the top level, and now it no longer builds

Richard Feldman (Nov 13 2023 at 13:37):

in general I don't think top level evaluations should have any restrictions beyond what they'd have if they were evaluated anywhere else

Richard Feldman (Nov 13 2023 at 13:38):

an implication of this is that they can hang your build if there's an infinite loop in them, but I think that's worth accepting as a downside

Brian Carroll (Dec 03 2023 at 12:31):

Which IR would we run this interpreter on? Canonical or monomorphized?

Brian Carroll (Dec 03 2023 at 12:53):

I suppose the canonical one should contain less data, so it would be quicker to traverse.
Better to evaluate each def once generically rather than redoing it for each specialization.
So we transform a complicated generic expression into a simpler generic expression, and then generate specializations of that simplified version.

Ayaz Hafiz (Dec 03 2023 at 15:26):

Probably worth at least spiking both and profiling? The nice thing from the IR one is you are like to be able to evaluate deeper, inline more, etc since you may have non-generic specialization calls that aren’t visible before type specialization (eg ability members). And you can trivially write a stack based interpreter instead of a tree one.

Brian Carroll (Dec 03 2023 at 15:49):

The nice thing from the IR one

again, which IR? We are comparing two IRs! :laughing:

Brian Carroll (Dec 03 2023 at 15:50):

Oh OK you mean mono IR

Brian Carroll (Dec 03 2023 at 15:54):

Maybe nobody refers to can as an IR. I thought we did but I haven't really worked that close to the front end very much so I don't know.
I suppose AST is probably a better description of can, and IR usually implies something more linearised.

Brian Carroll (Dec 03 2023 at 15:56):

So this is probably just me still learning compiler terminology 2 years in!

Ayaz Hafiz (Dec 03 2023 at 16:11):

Sorry, yes, the mono IR

Richard Feldman (Dec 21 2023 at 02:54):

cross-posting since it's relevant to this discussion:

Richard Feldman said:

I wrote up a proposal for this!

https://docs.google.com/document/d/1hp12UnSdS0cIIaUjLhp2NzmzUewwBxp2Mtwk3CNtnI4/edit?usp=sharing

(let's discuss that proposal on the other thread, since this one is about IR interpreting and that proposal proposes using a different design!)

Last updated: Jul 23 2026 at 13:15 UTC