Stream: beginners

Topic: Understanding performance


Anne Archibald (Jan 28 2024 at 15:09):

I'm an absolute beginner at roc, so I thought I'd try a CS 101 exercise: naive functional mergesort. It is Slow. How should I go about figuring out why?

mergesort = \list ->
    n = (List.len list) // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (mergesort r.before) (mergesort r.others)

mergeSorted = \left, right ->
    when left is
        [] -> right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted leftRest right |> List.prepend leftFirst
                    else
                        mergeSorted left rightRest |> List.prepend rightFirst

I know roc lists are actually contiguous arrays, so prepend, split, dropFirst, and maybe even destructuring may be expensive. But this is the obvious way to write the algorithm, so how should I find my way from here to something efficient?

(Yes, I know there's a sort function, and I know that it's slow for a reason that may have to do with list operation problems, and I know there's a problem with lists where I can't even make one that's 100k elements long without a segfault. And maybe this algorithm will be efficient when the list problems are fixed. But how can I find out? Does my code permit tail-call elimination, opportunistic mutation, copy-free slicing?)

Full code attached; I'm sure the style is appalling.
main.roc

Brendan Hansknecht (Jan 28 2024 at 16:56):

So generally speaking, for measuring the performance of Roc programs you want to build with --optimize and --profiling. Then use some sort of sampling profiler that can read the stack (perf, Instruments, DTrace). I tend to find perf with --call-graph dwarf works the best, but the others are functional. I gave a talk about this at the last meetup; it was the first talk.
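A sketch of that workflow on Linux, assuming the program is the attached main.roc (the exact output binary name may differ depending on your setup):

```shell
# Build with optimizations plus the debug info profilers need
roc build --optimize --profiling main.roc

# Sample the optimized binary with DWARF-based stack unwinding
perf record --call-graph dwarf ./main

# Inspect where the time went (look for memmove, realloc, etc.)
perf report
```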

Brendan Hansknecht (Jan 28 2024 at 16:57):

Looking at the profile, about 90% of the time is spent in memmove

Brendan Hansknecht (Jan 28 2024 at 16:57):

This is due to prepending instead of appending to the arrays in mergeSorted.

Brendan Hansknecht (Jan 28 2024 at 16:58):

That is easy to fix by working from the last element instead of the first:

mergeSorted = \left, right ->
    when left is
        [] -> right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted leftRest right |> List.append leftLast
                    else
                        mergeSorted left rightRest |> List.append rightLast

Brendan Hansknecht (Jan 28 2024 at 17:10):

After that, the function still isn't tail recursive, which actually makes tracking down the real issue and getting a perf graph quite hard. Luckily, that is easy to fix as well:

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat accum left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest

Brendan Hansknecht (Jan 28 2024 at 17:13):

With this version, a solid chunk of time is still spent in realloc. We can add a List.withCapacity call at the beginning to help alleviate that:

mergesort = \list ->
    len = List.len list
    n = len // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (List.withCapacity len) (mergesort r.before) (mergesort r.others)

Brendan Hansknecht (Jan 28 2024 at 17:19):

At this point, the next biggest obvious thing to fix is some refcounting stuff, but that is compiler work and not something to be done in userland.

Brendan Hansknecht (Jan 28 2024 at 17:20):

Of course, even better than any of this would be to do all of the work in place with indices, but that is essentially a full rewrite of the algorithm.
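One step in that direction, sketched here with a hypothetical mergeByIndex helper (the name and structure are illustrative, not from the thread), is to walk both inputs by position into a preallocated buffer instead of destructuring them. This is still not fully in place, but it avoids per-call slicing entirely:

```roc
# Hypothetical sketch: merge two sorted lists by walking indices
# into a preallocated accumulator. The inner helper is tail recursive.
mergeByIndex = \left, right ->
    helper = \accum, i, j ->
        when (List.get left i, List.get right j) is
            (Ok l, Ok r) ->
                if l <= r then
                    helper (List.append accum l) (i + 1) j
                else
                    helper (List.append accum r) i (j + 1)

            # One side is exhausted: drain the other by index
            (Ok l, Err _) -> helper (List.append accum l) (i + 1) j
            (Err _, Ok r) -> helper (List.append accum r) i (j + 1)
            (Err _, Err _) -> accum

    helper (List.withCapacity (List.len left + List.len right)) 0 0
```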

Anne Archibald (Jan 29 2024 at 21:18):

Thanks! I ended up with:

mergesort = \list ->
    n = List.len list
    r = List.split list (n//2)
    when n is
        0 -> list
        1 -> list
        _ -> mergeSorted (List.withCapacity n) (mergesort r.before) (mergesort r.others)

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat right (List.reverse accum)
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat left (List.reverse accum)
                [.. as rightRest, rightLast] ->
                    if leftLast > rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest

Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)

Appending versus prepending I guess is just something you have to know, and I suppose one should just trust the compiler to manage opportunistic mutation?

Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?

I guess one can always profile, but I'm hoping to develop some mental rules/idioms for efficient style.

Brendan Hansknecht (Jan 29 2024 at 21:30):

(Particularly if it's modulo cons...)

IIUC, which I very well may not, modulo cons only applies to tags in roc, so it wouldn't apply to this case. Roc's lists are flat.

and I suppose one should just trust the compiler to manage opportunistic mutation?

Trust but verify is my general opinion.

Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?

There shouldn't be for the most part. Though a rest at the beginning may be faster in certain mutating cases, like if you take the rest and then append to it.

List.reverse accum

That will hurt perf. Probably should look into how this can be avoided.

Brendan Hansknecht (Jan 29 2024 at 21:46):

So we probably actually want this:

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> List.concat accum left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted (List.append accum leftFirst) leftRest right
                    else
                        mergeSorted (List.append accum rightFirst) left rightRest

Brendan Hansknecht (Jan 29 2024 at 21:48):

Now that we have an accumulator that we are appending to, it should be fast and always mutate in place.

Kiryl Dziamura (Jan 29 2024 at 22:45):

Btw, regarding List.reverse: recently I had to use it only for a backward walk. I haven't checked, but my gut feeling suggests it should be possible to optimize the iteration based on the refcount (like opportunistic zero-mutation). Shouldn't it be?

Yes, there are walkBackwards flavours, but the set doesn't seem to be complete:

walk - walkBackwards
walkUntil - walkBackwardsUntil

walkWithIndex
walkWithIndexUntil

walkFrom
walkFromUntil

Brendan Hansknecht (Jan 29 2024 at 23:06):

Yeah, we should probably add the backwards versions of all of those functions. In the future, loop fusion on list functions may be able to make them equivalent, but that is thinking very long term. Filling out these functions is a better solution today. Contributions are welcome to add any or all of these.

Eli Dowling (Jan 30 2024 at 00:31):

Anne Archibald said:

Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)

I actually really like this idea. I just tried to implement it and hit some issues, but got it kind of working:
image.png
image.png

Eli Dowling (Jan 30 2024 at 04:01):

@Anne Archibald I've made a new topic for this editor integration and have it working in a branch if you'd like to try it :)
https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/Editor.20indicate.20if.20a.20function.20is.20tail.20recursive

Anne Archibald (Feb 02 2024 at 18:47):

Brendan Hansknecht said:

Now that we have an accumulator that we are appending to, that should be fast and always do inplace mutation.

I can't speak for the mutation, but it seems to me that the list destructuring [a, .. as rest] has to effectively do a dropFirst to construct rest. That would seem to require copying all but one of the elements to a new list. Or is there an efficient way to do this list destructuring operation?

Brendan Hansknecht (Feb 02 2024 at 19:27):

We have seamless slices, so you will just get a slice referencing the original list, minus the first element.
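For instance, a small sanity check of the pattern's behavior (this shows the values, not the zero-copy property itself, which you would have to confirm with a profiler):

```roc
# `rest` is a seamless slice into the original list's storage;
# no elements are copied when the pattern matches.
expect
    when [1, 2, 3, 4] is
        [_, .. as rest] -> rest == [2, 3, 4]
        _ -> Bool.false
```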


Last updated: Jul 05 2025 at 12:14 UTC