I'd like to start a conversation around what we can do to hedge for a good separate and incremental compilation experience. Currently, Roc operates as a whole-program compiler, and this breaks separate module boundaries due to guaranteed call specialization and guaranteed defunctionalization.
I was talking to someone who works on OCaml and they noted that indeed you want whole-program analysis, but during development separate (and incremental) compilation is key. So one proposal is - during development builds, defunctionalize only within a module (and, perhaps extend this to specialize only within a module). Any types that escape, or are used by a module, would be boxed if they were functionalized. This can also be extended to specialization, where generic values that escape a module are compiled to be runtime-polymorphic.
that seems fine to me conceptually, although I definitely wonder about implementation complexity :sweat_smile:
that said, I also wonder about implementation complexity of alternatives
I think specialization is the general problem for incremental compiles
like if we didn't specialize at all, for example, then we could compile a given module all the way down to a binary representation of everything
This is why rust splits out frontend and codegen, right? frontend gets enough info that other modulescrates can start specializing.
might be one reason!
Defunctionalization is a form of specialization for us, but it's different enough that it think it warrants logically separating
the big to avoid is not having to re-enter an upstream module that you've already compiled as a dependency. That currently is not guaranteed, because a downstream depender can arbitrarily issue a new specialization (including injecting a new function, forcing a fresh defunctionalization) of an upstream module
so imagining no specialization, after compiling a given module for the first time, we could store on disk:
roc check too, not just for full builds)right exactly
one challenge I'm not sure how to solve with the "box things sometimes" approach: closures can be sent to platforms, and the host expects them to be defunctionalized regardless of where they originated
so if we had some boundary across which we don't defunctionalize, a closure could get passed from one module to another and then eventually to the host, where its having been boxed would break the host
Why would it break the host? The host can't do anything with closure other than call it right? Every boxed closure would have the same representation as seen by the host, as long as you distinguish between development and release builds. Even if there were additional constraints, glue could paper over them
the host assumes the captured data is unboxed
actually - @Folkert de Vries you might remember which way we ended up going on this - I think maybe we just give the host a pointer
like the host has to call a function which returns a pointer to the captured data
Glue can paper over that right. whether the data is captured or not matters only for calling the closure, the code for which we can generate
well what we do now is get the size of the closure, and then provide a buffer for the function to be written into
oh perfect
so whether the buffer is boxed or unboxed doesn't matter, right?
How useful would it be to cache each module by specialization? If the module is just a dependency, that caching would likely rarely change. When it does change, you just have to compile the module for one new specialization. When editing a specific module, of course you would have to pay to update all specializations, but I assume that would greatly lower the cost.
the host can't tell
Ayaz Hafiz said:
Glue can paper over that right. whether the data is captured or not matters only for calling the closure, the code for which we can generate
technically glue can't - both glue and the host don't (and I think realistically can't) know about dev vs release distinction
but the host does call code from the application
and that can be different on dev vs release
and this is one of the cases where that seems to be true!
yeah the captures are just a bunch of bytes (stored in a RocList at the moment)
(btw I gotta go for a bit, back later!)
yes, the code we generate in the application is what I mean
We could always force glue to "just" call a function generated in the application for the closure application, whether or not that's something that is done today
that is what we do today
but the platform has to hold on to the captured bytes for a little bit
how it gets those bytes, and what is in there exactly, does not matter
you will however need to make sure the platform works when running in both debug and release mode
and likely that means calling different functions
I think that's where I might be losing you. I don't see why we need to call different functions, as long as both the "get captures" and "call the closure" wrapper are generated by Roc
I think the signature of "get captures" is different between the two cases. In debug, the data is already boxed, so that function can return a pointer. In release mode, the platform makes the allocation and the roc code writes bytes into this allocation
right - but that is okay even in the debug mode right? like, it might be seen as a deoptimization, but the Roc code could, instead of returning a pointer to the boxed data, write the pointer into the buffer given to it by the platform (or even copy the captures into that buffer). And then the Roc code generated to call the closure could read from the buffer appropriately again. I realize it's a bit wasteful but my claim is we can preserve the same interface, and if it's a debug build, it doesn't need to be optimized
hmm, yes that should work
#beginners > Compile time performance is another motivation for this. I have some ideas on how to deal with the problems presented there in the near term, but we will always have a fundamental problem of needing to pay a linear cost to check the recursiveness of large, structural types (including lambda sets), and caching is not one-shot because we must invalidate after unification.
Ayaz Hafiz said:
we will always have a fundamental problem of needing to pay a linear cost to check the recursiveness of large, structural types (including lambda sets), and caching is not one-shot because we must invalidate after unification.
I wonder if there's a way to use type annotations to help with that :thinking:
it can help to an extent, but unless we switch to a nominal type system I don't think it's a foolproof suggestion
oh sure, I just wonder to what extent
like let's say someone has annotated all the types that use recursion (including lambda sets)
how close would that get us to nominal types?
in terms of perf
We would be at it, but that supposes that we do not actually need to do inference of types. if we need to do inference, the problem resurges immediately. maybe you could do something clever with bidirectional or tridirectional type checking, but such a system is not immediately clear to me
I don't think we can a-priori use that for lambda set inference, without forcing lambda sets to be expressible in the user syntax, which I don't believe we'd like to do.
yeah definitely not
to be honest, I expect that in practice people will approximately always be annotating large recursive types
the main situation where in general I'd expect types to be un-annotated are:
Task (for error types and/or effect types specifically) - so, ordinary tag union unificationput another way, if getting the best performance for large recursive types involved optimizing for the case where they're annotated, I would expect people to very rarely notice :big_smile:
that may be true, but we cannot give up inference, as long as we want to support it, even if there are some annotations. We must perform inference and check those against annotations (in fact, this is why annotations cause us to do more work - we must infer, then check against the annotation). If we only check and try to avoid inference, our type system breaks down (I can link some issues from the past).
Even if we were able to use annotations for structural types to lower the type-checking cost, that wouldn't address the problem for lambda sets, which is the specific problem I linked to that Agus observed.
hmm, yeah that's fair
lambda sets are exactly the large, recursive types you might observe, but they can never be annotated in the user syntax
in fact, they are likely to be far larger than any explicitly-defined recursive type written out by a user
since in general, they would consist of a superset of the recursive structural type they appear under
is the issue there that the lambda set itself is really big?
e.g. a parser that returns a continuation to another parser has a lambda set that captures the type of the entire parser type structure that the user has written out
or that it's big enough that inference around it gets slowed down
inference gets slow
we have to check for recursion
which demands an occurs check since the type system is structural
so when we're checking for recursion in the lambda set, that's recursion on the data structures the lambda it captures? or recursion in the functions themselves? or both?
or something else?
it's recursion of the lambda set itself
just like how tag unions can be recursive, so can lambda sets
hm, how would that manifest in practice?
like what would cause a lambda set to be recursive
That is to say, you can have a function that "captures itself", and that must be handled
oh it happens quite often
task continuations are one example
gotcha
so a function that recurses would be one way for that to happen, but also it could just (for some reason) return itself
or even theoretically use itself in a calculation, by passing itself to another function and then not returning that answer
here is an example/explainer comment I have in the solve code
Lambda sets may be recursive. For example, consider the annotated program
```text
XEffect : A -> B
after : ({} -> XEffect) -> XEffect
after =
\cont ->
f = \A -[`f (typeof cont)]-> when cont {} is A -> B
f
nestForever : {} -> XEffect
nestForever = \{} -[`nestForever]-> after nestForever
^^^^^^^^^^^ {} -[`nestForever]-> A -[`f ({} -[`nestForever]-> A -[`f ...]-> B)]-> B
```
where [`nestForever] and [`f ...] refer to the lambda sets of their respective arrows. `f`
captures `cont`. The usage of `after` in `nestForever` means that `nestForever` has type
``nestForever : {} -[`nestForever]-> A -[`f (typeof cont)]-> B``. But also, `after` is called
with ``nestForever`, which means in this case `typeof cont = typeof nestForever``. So we see
that ``nestForever : {} -[`nestForever]-> A -[`f (typeof nestForever)]-> B``, and the lambda
set ``[`f (typeof nestForever)]`` is recursive.
However, we don't know if a lambda set is recursive or not until type inference.
this is a simplified example of what Effect.after does today
I assume I'm missing something, but could something like this work?
foo references itself; function bar does not") during canonicalization, and then when making that lambda set in the first place say "yes this is recursive" or "no this is not recursive"or do we need to know more specific info than just yes/no it's recursive
you need to know more
booo
because, that doesn't tell you where the point of recursion happens - which is the real key
:thinking: could canonicalization help with that?
if we only care to know whether something is recursive or not, it wouldn't really matter. but we must know where the recursion occurs
if we wrote down more info
the other thing is dealing with generic functions, which I don't see how that helps with
Maybe, Im not sure I see how canonicalization could though
like, it doesn't even fully work for resolving recursive type aliases during canonicalization
we still have to fix-up some of them during inference
I think lambda sets would only be worse than the story for recursive type aliases, since we don't even have annotations for them (and won't until inference)
I might be missing something though
yeah I was thinking something like in addition to writing down which functions reference which others, we also write down some context on where those references happened
yeah.. well now we are quite close to just doing unification anyway :)
true haha
to be clear I think we can deal with the problem Agus saw to an extent by tracking what variables we already did an occurs check for during a unification, and not revisiting those. It's a bit naive because we have to invalidate the list on every unification, but I suspect that given how tall that flamegraph was, it will help
but I dont think that's a full solution because of the fundamental need we have with structural types
yep, makes sense
but that approach sounds solid as a first step!
Last updated: Jun 16 2026 at 16:19 UTC