Enforced optimizations · ideas · Zulip Chat Archive

As Roc is a functional language, tail call optimization is very important to avoid stack overflows. Having written a bunch of Scala over the years, I really appreciate the @tailrec annotation you can add to any function in Scala. It enforces that the function is written in a form that will become tail call optimized. If the function is not, then it's a compilation error. This is a nice way to think of optimizations I feel: if an optimization is crucial to your code, you can enforce that the code has that optimization. This approach is so nice that we are emulating it in Elm with elm-review.

In many other languages, whether any given optimization is applied or not can feel like luck. I got the same feeling when watching the presentation on opportunistic in-place mutation. This optimization only applies when there are not further references to the item that is being mutated, making it really easy to break the optimization by taking a new reference to the old item. That's really easy to do by accident. For this optimization, one could imagine an annotation ala @takesLastReference or @consumes or some such, that enforces this behavior on the caller.

Another case where this would be extremely helpful I think would be with vectorization (converting code to using SIMD instructions). LLVM is fairly good at doing auto-vectorization, but it's tricky to figure out if your code is actually being vectorized or not. Something like an @isVectorized annotation that guarantees vectorization would be very useful, as the only alternative to having the compiler discover that code should be vectorized, is to write SIMD code by hand.

These are all varying difficulty to implement of course, and I have no idea if there's something like an annotation mechanism that might fit this usecase in Roc. But for achieving high performance with low and delightful effort, this fits perfectly I feel.

Folkert de Vries (Oct 24 2022 at 09:12):

Certainly tall-call elimination is feasible, we'd just have to find a way to express that syntactically. For the in-place mutation, we have two mechanisms: static and dynamic. We've thought about adding expect-unique myList assertions, but those will only do something at runtime, where you may not want these extra checks in production builds (and the optimization might not fire in debug builds)

Folkert de Vries (Oct 24 2022 at 09:13):

vectorization is harder still, because how do we retrieve from llvm that it indeed vectorized some code as expected

Folkert de Vries (Oct 24 2022 at 09:13):

Chris Duncan (Oct 24 2022 at 09:17):

Since Roc wants to be deeply integrated with the Roc Editor, we could have a plugin that informs what transformations the compiler applies and guide the user. That way, we move this feature away from a keyword.

Kasper Møller Andersen (Oct 24 2022 at 09:32):

expect-unique sounds like a nice solution. Though it seems comparable to a bounds check in terms of performance, it's really more of an assertion. And those can often unlock other LLVM optimizations. In this case expect-unique is an assertion that the reference count equals 1, which is information LLVM can potentially use to optimize the rest of the code further. So I would expect such a check to be quite cheap in practice.

For tail calls, maybe there could be a function ala expect-constant-stack that could take a function as input and fail to compile if TCO cannot be applied. Whether Roc has the means to support such a function today is beyond me though :smiley:

And integrating with the editor is nice, though I don't think it's sufficient by itself. But having them for quick feedback still makes good sense.

Folkert de Vries (Oct 24 2022 at 09:41):

yeah an editor only shows one page of code, so I think global asserts are useful

Folkert de Vries (Oct 24 2022 at 09:42):

we already explicitly branch on RC == 1 to get optimizations (much more than llvm can provide) and that branch is short-circuited if we can statically prove that the RC will be one at a certain point in the code

Kasper Møller Andersen (Oct 24 2022 at 11:28):

As long as expect-unique comes before the existing check on the reference count, I would expect LLVM to then remove the second check altogether, making expect-unique zero cost. I don't have enough expertise with LLVM to be 100% certain though.

It would still be nice to know that it fails statically at least, but combined with information in the editor, it might be good enough :blush:

Folkert de Vries (Oct 24 2022 at 11:40):

that is sadly not how llvm works: the refcount is stored on the heap, and llvm is very conservative about optimization there

Folkert de Vries (Oct 24 2022 at 11:40):

Kasper Møller Andersen (Oct 24 2022 at 11:58):

Kasper Møller Andersen (Oct 24 2022 at 12:09):

To backtrack a bit, it also seems like there are two different cases. For expect-unique, we want to validate the input of the function, whereas to check for TCO compatibility, we would need to validate the "shape" of the function. But for the in-place mutation, wouldn't we also require checking the shape of the function to get the optimization to be most effective?

That is, you can probably optimize any function with in-place mutation more or less, but I assume you're only going to get its benefits if you're updating an existing record, rather than, say, creating a new record or tuple on every iteration of your function. I don't know if there's anything general that can be done to make sure a function has an optimal "shape" like this though.

For vectorisation, it could also be that, instead of relying on auto-vectorisation, that if your code is validated to live up to a given shape, then the Roc compiler can insert SIMD instructions to improve performance. So if your code lives up to the expect-avx-512 requirements, then the Roc compiler will supply LLVM with AVX 512 IR. But that's also more work for the compiler authors of course, so I don't know if that would be worthwhile.

Kasper Møller Andersen (Oct 24 2022 at 12:11):

At a very general level, maybe you just want to be able to check that your function compiles to some simplified AST, where you can observe that optimizations have happened. So observe that your tail call has become a loop for example.

Folkert de Vries (Oct 24 2022 at 12:59):

I think the best you can do with llvm is to use array types/functions instead of just doing stuff with pointers

Folkert de Vries (Oct 24 2022 at 13:00):

and then llvm has additional vectorization guarantees, or at least opportunities

Folkert de Vries (Oct 24 2022 at 13:00):

but I know that the zig compiler uses arrays heavily and regularly finds bugs in how llvm handles them

Folkert de Vries (Oct 24 2022 at 13:01):

in general, it's been fun to see rust and zig stress-test llvm in ways that clang just can't

Kasper Møller Andersen (Oct 24 2022 at 13:42):

Yeah, I'm also curious what kind of findings Roc will expose in LLVM over time :big_smile:

Brendan Hansknecht (Oct 24 2022 at 14:15):

Just a note, expect-unique isn't enough, and often times isn't that useful. It definitely would be helpful, but it is very limited.

After messing with a lot of roc code and thinking about this a good bit, i think a much much better solution involves some way to integrate with a profiling tool. For example, just like perf can be used to give a heat profile of every line of code, it should be possible to track calls to roc_alloc and roc_realloc to create a heat map of allocations. Then we can point users to specific functions and lines of code that are leading to the copies. Of course map through transforms will be a bit complex, but i think a tool of this nature would be leaps and bounds more useful than expect unique.

Richard Feldman (Feb 28 2023 at 21:19):

here's an idea on this topic: what if we had a way to say "between when this value was defined and right now, I expect it will never have its reference count incremented"

Richard Feldman (Feb 28 2023 at 21:19):

for example, let's say the syntax was (I'm just making this up, it's not important) foo!

Richard Feldman (Feb 28 2023 at 21:19):

so then if I have a function like \foo, bar -> ... and then at some point deep in the function I write someOtherFn foo! bar

Richard Feldman (Feb 28 2023 at 21:20):

then what that's saying is "give me compile error if there is any possible code path between where I defined foo as an argument and where I wrote foo! in which foo's reference count could be incremented"

Richard Feldman (Feb 28 2023 at 21:20):

so I'm not guaranteeing uniqueness but I am guaranteeing that if I received a unique foo it will still be unique when I reference it as foo!

Richard Feldman (Feb 28 2023 at 21:20):

in other words it's not that foo is "unique" but rather that its uniqueness is preserved between when it was declared and a particular usage site

Richard Feldman (Feb 28 2023 at 21:21):

I wonder how useful that would be (assuming it were a reasonable thing to implement in the compiler, which seems plausible) :thinking:

Joshua Warner (Feb 28 2023 at 21:24):

Seems like it could be potentially interesting to just say that between the introduction of the value and the foo! use, there is no net change in reference count. I think that enables the same optimizations?

Brendan Hansknecht (Feb 28 2023 at 21:41):

Brendan Hansknecht (Feb 28 2023 at 21:43):

Though, not exactly sure how you track a variable though all code paths. Likely, it's name will change because you would pass the value through a pipeline of some sort and want it to be modified in place if it is unique and copied only once if not unique

Richard Feldman (Feb 28 2023 at 21:57):

right, it wouldn't do that; the question is how useful that would be in practice

Richard Feldman (Feb 28 2023 at 21:58):

I guess you could look at cases where you've wanted this sort of thing and see how useful it would have been if it existed?

Joshua Warner (Feb 28 2023 at 22:16):

Would it work to just issue warnings any time the compiler knows it will have to clone a possibly-large value (e.g. a collection) - and have a way to suppress those warnings?

Brian Carroll (Mar 01 2023 at 07:44):

But every List is "possibly large". So now the compiler shouts at you whenever you use a list!

Brendan Hansknecht (Mar 01 2023 at 16:19):

I was thinking about this more and I don't think stopping all refcounts will be generally useful. I think a lot of the time refcount changes are totally fine, but copies are the problem. For example:

x = loadGiantConfig

y = doThing1 x

# do other stuff with y

# Ope, time to use x again.
# This is the last use, so putting `x!` to make sure no one accidentally copies `x`
doThing2 x!

In the case above we would get an error. The refcount of x is incremented before and decremented after doThing1. If doThing1 and doThing2 are written correctly, x will never be copied, but it is guaranteed to have it's refcount change.

Brendan Hansknecht (Mar 01 2023 at 16:21):

I guess technically we want to stop all refcount checks (places that could copy x) that are using a version of x with a changed refcount. That is to say, x's refcount can change as long as it is not used by a function that would add an additional copy if x has a non-unique refcount.

Brendan Hansknecht (Mar 01 2023 at 16:23):

In the example above, doThing1 would only be allowed to use x in ways that never check the refcount. doThing2, on the other hand, should be allowed to use x in a way that checks the refcount. This is because it is consuming x without any sort of refcount changes.

Brendan Hansknecht (Mar 01 2023 at 16:24):

As I am writting this, I think I am just describing an implicit form of something close to the rust borrowing system, just through refcounts.

Brendan Hansknecht (Mar 01 2023 at 16:28):

Also, I think x! syntax would work best if it was an annotation on a standalone line. In the example above, does x! apply to doThing2 or does it apply up to doThing2. What would I write if I want to ensure that doThing2 does not change the refcount of x?

Joshua Warner (Mar 01 2023 at 16:33):

I was thinking we'd exempt any locations where the compiler was either able to see that it doesn't need a copy - either because the code (recursively) doesn't call anything that would want to be able to do an in-place update - or because there is an in-place-update, and morphic(?) can prove that the refcount is guaranteed to be 1 there.

Brendan Hansknecht (Mar 01 2023 at 16:38):

I think this would be much more limited than you expect (at least with the current compiler).

It also would likely confuse end users who just want to use roc as a delightful language and maybe don't want to figure out the hacks to avoid a few additional copies.

Brendan Hansknecht (Mar 01 2023 at 16:44):

I definitely think any system we add for performance optimization should either be:

Brendan Hansknecht (Mar 01 2023 at 16:46):

Brendan Hansknecht (Mar 02 2023 at 06:56):

I was messing around, and at least at the function level we could aggregate allocations currently. Would need better debug info to aggregate by line, which would be more useful. Also, lambdas having no name is quite inconvenient without line numbers. That said, the base idea definitely could work and even with just this level of names could potentially be mapped back to the original file by looking at the exact call chain. This of course would be aggregated in a frame graph based on the allocation size:

New alloc of size: 120
    roc.builtins.utils.allocate_withrefcount

New alloc of size: 840
    List.withCapacity
    List.repeat
    Context.66
    File.15
    Task.42

New alloc of size: 24
    roc.fxgetFileBytes
    roc.fx_getFileBytes_fastccwrapper
    Effect.loop_inner

New alloc of size: 120
    List.replaceUnsafe
    List.replace
    List.set
    Context.74
    Task.42
    Effect.loop_inner

New alloc of size: 24
    roc.fxgetFileBytes
    roc.fx_getFileBytes_fastccwrapper
    Effect.loop_inner

Brendan Hansknecht (Mar 02 2023 at 07:38):

Top 5 allocating functions from nqueens in false

Allocated 26079664 bytes at:
    List.replaceUnsafe
    List.replace
    List.set
    Context.74
    Task.42
    Effect.loop_inner

Allocated 3783576 bytes at:
    List.reserve
    List.append
    Context.pushStack
    app.89
    Task.34
    Effect.loop_inner

Allocated 2296422 bytes at:
    List.reserve
    List.append
    app.120
    Task.34
    Effect.loop_inner

Allocated 2075616 bytes at:
    List.reserve
    List.append
    Context.pushStack
    app.stepExecCtx
    app.67
    Task.34
    Effect.loop_inner

Allocated 1993272 bytes at:
    List.dropAt
    Context.popStack
    app.popNumber
    app.binaryOp
    app.stepExecCtx
    app.67
    Task.34
    Effect.loop_inner

Brendan Hansknecht (Mar 02 2023 at 07:39):

Also, still insane how much data is allocated by the false interpreter. The top allocation chain allocated 26 megabytes when doing a super simple nqueens problem.

Brendan Hansknecht (Mar 02 2023 at 08:01):

Brendan Hansknecht (Mar 02 2023 at 08:07):

Do we have enough information when we create the name of a lambda that we could instead name them something like #Userapp_lambda_on_line_x instead of just naming them with an incrementing number?

Ayaz Hafiz (Mar 02 2023 at 13:50):

Not right now, but it should be straightforward to start keeping debug info alongside the data structures mono produces from can

Ayaz Hafiz (Mar 02 2023 at 13:51):

However if you’re willing to hack the compiler @Brendan Hansknecht you can use can/src/debug/print.rs to print the AST for each module you’re interested in, and the printed representation will include what lambdas are named what

Brendan Hansknecht (Mar 02 2023 at 14:58):

Brendan Hansknecht (Mar 02 2023 at 15:36):

Apparently I got the format wrong and uploaded a slightly off svg. Here is a corrected version: false-nqueens-allocations.svg

Stream: ideas

Topic: Enforced optimizations

Kasper Møller Andersen (Oct 24 2022 at 09:06):

Folkert de Vries (Oct 24 2022 at 09:12):

Folkert de Vries (Oct 24 2022 at 09:13):

Folkert de Vries (Oct 24 2022 at 09:13):

Chris Duncan (Oct 24 2022 at 09:17):

Kasper Møller Andersen (Oct 24 2022 at 09:32):

Folkert de Vries (Oct 24 2022 at 09:41):

Folkert de Vries (Oct 24 2022 at 09:42):

Kasper Møller Andersen (Oct 24 2022 at 11:28):

Folkert de Vries (Oct 24 2022 at 11:40):

Folkert de Vries (Oct 24 2022 at 11:40):

Folkert de Vries (Oct 24 2022 at 11:40):

Kasper Møller Andersen (Oct 24 2022 at 11:58):

Kasper Møller Andersen (Oct 24 2022 at 12:09):

Kasper Møller Andersen (Oct 24 2022 at 12:11):

Folkert de Vries (Oct 24 2022 at 12:59):

Folkert de Vries (Oct 24 2022 at 13:00):

Folkert de Vries (Oct 24 2022 at 13:00):

Folkert de Vries (Oct 24 2022 at 13:01):

Kasper Møller Andersen (Oct 24 2022 at 13:42):

Brendan Hansknecht (Oct 24 2022 at 14:15):

Richard Feldman (Feb 28 2023 at 21:19):

Richard Feldman (Feb 28 2023 at 21:19):

Richard Feldman (Feb 28 2023 at 21:19):

Richard Feldman (Feb 28 2023 at 21:20):

Richard Feldman (Feb 28 2023 at 21:20):

Richard Feldman (Feb 28 2023 at 21:20):

Richard Feldman (Feb 28 2023 at 21:21):

Joshua Warner (Feb 28 2023 at 21:24):

Brendan Hansknecht (Feb 28 2023 at 21:41):

Brendan Hansknecht (Feb 28 2023 at 21:43):

Richard Feldman (Feb 28 2023 at 21:57):

Richard Feldman (Feb 28 2023 at 21:58):

Richard Feldman (Feb 28 2023 at 21:58):

Joshua Warner (Feb 28 2023 at 22:16):

Brian Carroll (Mar 01 2023 at 07:44):

Brendan Hansknecht (Mar 01 2023 at 16:19):

Brendan Hansknecht (Mar 01 2023 at 16:21):

Brendan Hansknecht (Mar 01 2023 at 16:23):

Brendan Hansknecht (Mar 01 2023 at 16:24):

Brendan Hansknecht (Mar 01 2023 at 16:28):

Joshua Warner (Mar 01 2023 at 16:33):

Brendan Hansknecht (Mar 01 2023 at 16:38):

Brendan Hansknecht (Mar 01 2023 at 16:44):

Brendan Hansknecht (Mar 01 2023 at 16:46):

Brendan Hansknecht (Mar 02 2023 at 06:56):

Brendan Hansknecht (Mar 02 2023 at 07:38):

Brendan Hansknecht (Mar 02 2023 at 07:39):

Brendan Hansknecht (Mar 02 2023 at 08:01):

Brendan Hansknecht (Mar 02 2023 at 08:07):

Ayaz Hafiz (Mar 02 2023 at 13:50):

Ayaz Hafiz (Mar 02 2023 at 13:51):

Brendan Hansknecht (Mar 02 2023 at 14:58):

Brendan Hansknecht (Mar 02 2023 at 15:36):

Brendan Hansknecht (Mar 02 2023 at 15:37):