As Roc is a functional language, tail call optimization is very important to avoid stack overflows. Having written a bunch of Scala over the years, I really appreciate the @tailrec annotation you can add to any function in Scala. It enforces that the function is written in a form that will become tail call optimized. If the function is not, then it's a compilation error. This is a nice way to think of optimizations I feel: if an optimization is crucial to your code, you can enforce that the code has that optimization. This approach is so nice that we are emulating it in Elm with elm-review.
In many other languages, whether any given optimization is applied or not can feel like luck. I got the same feeling when watching the presentation on opportunistic in-place mutation. This optimization only applies when there are not further references to the item that is being mutated, making it really easy to break the optimization by taking a new reference to the old item. That's really easy to do by accident. For this optimization, one could imagine an annotation ala @takesLastReference or @consumes or some such, that enforces this behavior on the caller.
Another case where this would be extremely helpful I think would be with vectorization (converting code to using SIMD instructions). LLVM is fairly good at doing auto-vectorization, but it's tricky to figure out if your code is actually being vectorized or not. Something like an @isVectorized annotation that guarantees vectorization would be very useful, as the only alternative to having the compiler discover that code should be vectorized, is to write SIMD code by hand.
These are all varying difficulty to implement of course, and I have no idea if there's something like an annotation mechanism that might fit this usecase in Roc. But for achieving high performance with low and delightful effort, this fits perfectly I feel.
it's something we think about, but as you say it can be hard to implement.
Certainly tall-call elimination is feasible, we'd just have to find a way to express that syntactically. For the in-place mutation, we have two mechanisms: static and dynamic. We've thought about adding expect-unique myList assertions, but those will only do something at runtime, where you may not want these extra checks in production builds (and the optimization might not fire in debug builds)
vectorization is harder still, because how do we retrieve from llvm that it indeed vectorized some code as expected
there are cases where the vectorization that is picked is not optimal
Since Roc wants to be deeply integrated with the Roc Editor, we could have a plugin that informs what transformations the compiler applies and guide the user. That way, we move this feature away from a keyword.
expect-unique sounds like a nice solution. Though it seems comparable to a bounds check in terms of performance, it's really more of an assertion. And those can often unlock other LLVM optimizations. In this case expect-unique is an assertion that the reference count equals 1, which is information LLVM can potentially use to optimize the rest of the code further. So I would expect such a check to be quite cheap in practice.
For tail calls, maybe there could be a function ala expect-constant-stack that could take a function as input and fail to compile if TCO cannot be applied. Whether Roc has the means to support such a function today is beyond me though :smiley:
And integrating with the editor is nice, though I don't think it's sufficient by itself. But having them for quick feedback still makes good sense.
yeah an editor only shows one page of code, so I think global asserts are useful
we already explicitly branch on RC == 1 to get optimizations (much more than llvm can provide) and that branch is short-circuited if we can statically prove that the RC will be one at a certain point in the code
As long as expect-unique comes before the existing check on the reference count, I would expect LLVM to then remove the second check altogether, making expect-unique zero cost. I don't have enough expertise with LLVM to be 100% certain though.
It would still be nice to know that it fails statically at least, but combined with information in the editor, it might be good enough :blush:
that is sadly not how llvm works: the refcount is stored on the heap, and llvm is very conservative about optimization there
things are improving as rust gets more mature
we should look into that at some point
Ah yes
To backtrack a bit, it also seems like there are two different cases. For expect-unique, we want to validate the input of the function, whereas to check for TCO compatibility, we would need to validate the "shape" of the function. But for the in-place mutation, wouldn't we also require checking the shape of the function to get the optimization to be most effective?
That is, you can probably optimize any function with in-place mutation more or less, but I assume you're only going to get its benefits if you're updating an existing record, rather than, say, creating a new record or tuple on every iteration of your function. I don't know if there's anything general that can be done to make sure a function has an optimal "shape" like this though.
For vectorisation, it could also be that, instead of relying on auto-vectorisation, that if your code is validated to live up to a given shape, then the Roc compiler can insert SIMD instructions to improve performance. So if your code lives up to the expect-avx-512 requirements, then the Roc compiler will supply LLVM with AVX 512 IR. But that's also more work for the compiler authors of course, so I don't know if that would be worthwhile.
At a very general level, maybe you just want to be able to check that your function compiles to some simplified AST, where you can observe that optimizations have happened. So observe that your tail call has become a loop for example.
I think the best you can do with llvm is to use array types/functions instead of just doing stuff with pointers
and then llvm has additional vectorization guarantees, or at least opportunities
but I know that the zig compiler uses arrays heavily and regularly finds bugs in how llvm handles them
in general, it's been fun to see rust and zig stress-test llvm in ways that clang just can't
Yeah, I'm also curious what kind of findings Roc will expose in LLVM over time :big_smile:
Just a note, expect-unique isn't enough, and often times isn't that useful. It definitely would be helpful, but it is very limited.
After messing with a lot of roc code and thinking about this a good bit, i think a much much better solution involves some way to integrate with a profiling tool. For example, just like perf can be used to give a heat profile of every line of code, it should be possible to track calls to roc_alloc and roc_realloc to create a heat map of allocations. Then we can point users to specific functions and lines of code that are leading to the copies. Of course map through transforms will be a bit complex, but i think a tool of this nature would be leaps and bounds more useful than expect unique.
here's an idea on this topic: what if we had a way to say "between when this value was defined and right now, I expect it will never have its reference count incremented"
for example, let's say the syntax was (I'm just making this up, it's not important) foo!
so then if I have a function like \foo, bar -> ... and then at some point deep in the function I write someOtherFn foo! bar
then what that's saying is "give me compile error if there is any possible code path between where I defined foo as an argument and where I wrote foo! in which foo's reference count could be incremented"
so I'm not guaranteeing uniqueness but I am guaranteeing that if I received a unique foo it will still be unique when I reference it as foo!
in other words it's not that foo is "unique" but rather that its uniqueness is preserved between when it was declared and a particular usage site
I wonder how useful that would be (assuming it were a reasonable thing to implement in the compiler, which seems plausible) :thinking:
between when this value was defined and right now, I expect it will never have its reference count incremented
Seems like it could be potentially interesting to just say that between the introduction of the value and the foo! use, there is no net change in reference count. I think that enables the same optimizations?
That definitely sounds more useful than expect-unique
Though, not exactly sure how you track a variable though all code paths. Likely, it's name will change because you would pass the value through a pipeline of some sort and want it to be modified in place if it is unique and copied only once if not unique
right, it wouldn't do that; the question is how useful that would be in practice
I'm not really sure, tbh!
I guess you could look at cases where you've wanted this sort of thing and see how useful it would have been if it existed?
Would it work to just issue warnings any time the compiler knows it will have to clone a possibly-large value (e.g. a collection) - and have a way to suppress those warnings?
But every List is "possibly large". So now the compiler shouts at you whenever you use a list!
I was thinking about this more and I don't think stopping all refcounts will be generally useful. I think a lot of the time refcount changes are totally fine, but copies are the problem. For example:
x = loadGiantConfig
y = doThing1 x
# do other stuff with y
# Ope, time to use x again.
# This is the last use, so putting `x!` to make sure no one accidentally copies `x`
doThing2 x!
In the case above we would get an error. The refcount of x is incremented before and decremented after doThing1. If doThing1 and doThing2 are written correctly, x will never be copied, but it is guaranteed to have it's refcount change.
I guess technically we want to stop all refcount checks (places that could copy x) that are using a version of x with a changed refcount. That is to say, x's refcount can change as long as it is not used by a function that would add an additional copy if x has a non-unique refcount.
In the example above, doThing1 would only be allowed to use x in ways that never check the refcount. doThing2, on the other hand, should be allowed to use x in a way that checks the refcount. This is because it is consuming x without any sort of refcount changes.
As I am writting this, I think I am just describing an implicit form of something close to the rust borrowing system, just through refcounts.
Also, I think x! syntax would work best if it was an annotation on a standalone line. In the example above, does x! apply to doThing2 or does it apply up to doThing2. What would I write if I want to ensure that doThing2 does not change the refcount of x?
But every List is "possibly large". So now the compiler shouts at you whenever you use a list!
I was thinking we'd exempt any locations where the compiler was either able to see that it doesn't need a copy - either because the code (recursively) doesn't call anything that would want to be able to do an in-place update - or because there is an in-place-update, and morphic(?) can prove that the refcount is guaranteed to be 1 there.
I was thinking we'd exempt any locations where the compiler was either able to see that it doesn't need a copy
I think this would be much more limited than you expect (at least with the current compiler).
It also would likely confuse end users who just want to use roc as a delightful language and maybe don't want to figure out the hacks to avoid a few additional copies.
I definitely think any system we add for performance optimization should either be:
roc build --complain-about-copying .. - for warnings on potentially copies, though I think it would be way too noisy)roc memgraph-run ... - to get a chart of memory usage by line)I would assume most people don't want to write string processing functions in Roc like this to hopefully avoid copies of data and duplicate work: https://github.com/roc-lang/roc/blob/a07f3226b61da9a71e5e7621bf81327f67b82bf1/crates/glue/src/RustGlue.roc#L392
I was messing around, and at least at the function level we could aggregate allocations currently. Would need better debug info to aggregate by line, which would be more useful. Also, lambdas having no name is quite inconvenient without line numbers. That said, the base idea definitely could work and even with just this level of names could potentially be mapped back to the original file by looking at the exact call chain. This of course would be aggregated in a frame graph based on the allocation size:
New alloc of size: 120
roc.builtins.utils.allocate_withrefcount
New alloc of size: 840
List.withCapacity
List.repeat
Context.66
File.15
Task.42
New alloc of size: 24
roc.fxgetFileBytes
roc.fx_getFileBytes_fastccwrapper
Effect.loop_inner
New alloc of size: 120
List.replaceUnsafe
List.replace
List.set
Context.74
Task.42
Effect.loop_inner
New alloc of size: 24
roc.fxgetFileBytes
roc.fx_getFileBytes_fastccwrapper
Effect.loop_inner
Just did the aggregation for false interpreter:
Top 5 allocating functions from nqueens in false
I can actually pinpoint the exact line of code from most of these stack traces:
Also, still insane how much data is allocated by the false interpreter. The top allocation chain allocated 26 megabytes when doing a super simple nqueens problem.
Do we have enough information when we create the name of a lambda that we could instead name them something like #Userapp_lambda_on_line_x instead of just naming them with an incrementing number?
Not right now, but it should be straightforward to start keeping debug info alongside the data structures mono produces from can
However if you’re willing to hack the compiler @Brendan Hansknecht you can use can/src/debug/print.rs to print the AST for each module you’re interested in, and the printed representation will include what lambdas are named what
Ok. I'll probably take a look at that
Apparently I got the format wrong and uploaded a slightly off svg. Here is a corrected version: false-nqueens-allocations.svg
Also as a png, cause I think zulip will directly display that: false-nqueens-allocations.png
Last updated: Jun 16 2026 at 16:19 UTC