Stream: beginners

Topic: Understanding performance


Anne Archibald (Jan 28 2024 at 15:09):

I'm an absolute beginner at roc, so I thought I'd try a CS 101 exercise: naive functional mergesort. It is Slow. How should I go about figuring out why?

mergesort = \list ->
    n = (List.len list) // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (mergesort r.before) (mergesort r.others)

mergeSorted = \left, right ->
    when left is
        [] -> right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted leftRest right |> List.prepend leftFirst
                    else
                        mergeSorted left rightRest |> List.prepend rightFirst

I know roc lists are actually contiguous arrays, so prepend, split, dropFirst, and maybe even destructuring may be expensive. But this is the obvious way to write the algorithm, so how should I find my way from here to something efficient?

(Yes, I know there's a sort function, and I know that it's slow for a reason that may have to do with list operation problems, and I know there's a problem with lists where I can't even make one that's 100k elements long without a segfault. And maybe this algorithm will be efficient when the list problems are fixed. But how can I find out? Does my code permit tail-call elimination, opportunistic mutation, copy-free slicing?)

Full code attached; I'm sure the style is appalling.
main.roc

Brendan Hansknecht (Jan 28 2024 at 16:56):

So generally speaking, for measuring the performance of Roc programs you want to build with --optimize and --profiling. Then use some sort of sampling profiler that can read the stack (perf, Instruments, DTrace). I tend to find perf with --call-graph dwarf works the best, but the others are functional. I gave a talk about this at the last meetup; it was the first talk.
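A sketch of that workflow on Linux, assuming the program is the attached main.roc (the exact output binary name may differ depending on your setup):

```shell
# Build with optimizations plus the debug info profilers need
roc build --optimize --profiling main.roc

# Sample the optimized binary with DWARF-based stack unwinding
perf record --call-graph dwarf ./main

# Inspect where the time went (look for memmove, realloc, etc.)
perf report
```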

Brendan Hansknecht (Jan 28 2024 at 16:57):

Looking at the profile, about 90% of the time is spent in memmove

Brendan Hansknecht (Jan 28 2024 at 16:57):

This is due to prepending instead of appending to the arrays in mergeSorted.

Brendan Hansknecht (Jan 28 2024 at 16:58):

That is easy to fix by working from the last element instead of the first:

mergeSorted = \left, right ->
    when left is
        [] -> right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted leftRest right |> List.append leftLast
                    else
                        mergeSorted left rightRest |> List.append rightLast

Brendan Hansknecht (Jan 28 2024 at 17:10):

After that, the function still isn't tail recursive, which actually makes tracking down the real issue and getting a perf graph quite hard. Luckily, that is easy to fix as well:

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat accum left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest

Brendan Hansknecht (Jan 28 2024 at 17:13):

With this version, a solid chunk of time is still spent in realloc. We can add a List.withCapacity call at the beginning to help alleviate that:

mergesort = \list ->
    len = List.len list
    n = len // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (List.withCapacity len) (mergesort r.before) (mergesort r.others)

Brendan Hansknecht (Jan 28 2024 at 17:19):

At this point, the next biggest obvious thing to fix is some refcounting stuff, but that is compiler work and not something to be done in userland.

Brendan Hansknecht (Jan 28 2024 at 17:20):

Of course, even better than any of this would be to do all of the work in place with indices, but that is essentially a full rewrite of the algorithm.
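One step in that direction, sketched here with a hypothetical mergeByIndex helper (the name and structure are illustrative, not from the thread), is to walk both inputs by position into a preallocated buffer instead of destructuring them. This is still not fully in place, but it avoids per-call slicing entirely:

```roc
# Hypothetical sketch: merge two sorted lists by walking indices
# into a preallocated accumulator. The inner helper is tail recursive.
mergeByIndex = \left, right ->
    helper = \accum, i, j ->
        when (List.get left i, List.get right j) is
            (Ok l, Ok r) ->
                if l <= r then
                    helper (List.append accum l) (i + 1) j
                else
                    helper (List.append accum r) i (j + 1)

            # One side is exhausted: drain the other by index
            (Ok l, Err _) -> helper (List.append accum l) (i + 1) j
            (Err _, Ok r) -> helper (List.append accum r) i (j + 1)
            (Err _, Err _) -> accum

    helper (List.withCapacity (List.len left + List.len right)) 0 0
```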

Anne Archibald (Jan 29 2024 at 21:18):

Thanks! I ended up with:

mergesort = \list ->
    n = List.len list
    r = List.split list (n//2)
    when n is
        0 -> list
        1 -> list
        _ -> mergeSorted (List.withCapacity n) (mergesort r.before) (mergesort r.others)

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat right (List.reverse accum)
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat left (List.reverse accum)
                [.. as rightRest, rightLast] ->
                    if leftLast > rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest

Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)

Appending versus prepending I guess is just something you have to know, and I suppose one should just trust the compiler to manage opportunistic mutation?

Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?

I guess one can always profile, but I'm hoping to develop some mental rules/idioms for efficient style.

Brendan Hansknecht (Jan 29 2024 at 21:30):

(Particularly if it's modulo cons...)

IIUC, which I very well may not, modulo cons only applies to tags in roc, so it wouldn't apply to this case. Roc's lists are flat.

and I suppose one should just trust the compiler to manage opportunistic mutation?

Trust but verify is my general opinion.

Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?

There shouldn't be for the most part. Though a rest at the beginning may be faster in certain mutating cases, like if you take the rest and then append to it.

List.reverse accum

That will hurt perf. Probably should look into how this can be avoided.

Brendan Hansknecht (Jan 29 2024 at 21:46):

So we probably actually want this:

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> List.concat accum left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted (List.append accum leftFirst) leftRest right
                    else
                        mergeSorted (List.append accum rightFirst) left rightRest

Brendan Hansknecht (Jan 29 2024 at 21:48):

Now that we have an accumulator that we are appending to, it should be fast and always mutate in place.

Kiryl Dziamura (Jan 29 2024 at 22:45):

Btw, regarding List.reverse: recently I had to use it only for a backward walk. I haven't checked, but my gut feeling suggests it should be possible to optimize the iteration based on the refcount (like opportunistic zero-mutation). Shouldn't it be?

Yes, there are walkBackwards flavours, but the set doesn't seem to be complete:

walk - walkBackwards
walkUntil - walkBackwardsUntil

walkWithIndex
walkWithIndexUntil

walkFrom
walkFromUntil

Brendan Hansknecht (Jan 29 2024 at 23:06):

Yeah, we should probably add the backwards versions of all of those functions. In the future, loop fusion on list functions may be able to make them equivalent, but that is thinking very long term. Filling out these functions is a better solution today. Contributions are welcome to add any or all of these.

Eli Dowling (Jan 30 2024 at 00:31):

Anne Archibald said:

Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)

I actually really like this idea. I just tried to implement it and hit some issues, but got it kind of working:
image.png
image.png

Eli Dowling (Jan 30 2024 at 04:01):

@Anne Archibald I've made a new topic for this editor integration and have it working in a branch if you'd like to try it :)
https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/Editor.20indicate.20if.20a.20function.20is.20tail.20recursive

Anne Archibald (Feb 02 2024 at 18:47):

Brendan Hansknecht said:

Now that we have an accumulator that we are appending to, that should be fast and always do inplace mutation.

I can't speak for the mutation, but it seems to me that the list destructuring [a, .. as rest] has to effectively do a dropFirst to construct rest. That would seem to require copying all but one of the elements to a new list. Or is there an efficient way to do this list destructuring operation?

Brendan Hansknecht (Feb 02 2024 at 19:27):

We have seamless slices, so you will just get a slice referencing the original list, minus the first element.
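For instance, a small sanity check of the pattern's behavior (this shows the values, not the zero-copy property itself, which you would have to confirm with a profiler):

```roc
# `rest` is a seamless slice into the original list's storage;
# no elements are copied when the pattern matches.
expect
    when [1, 2, 3, 4] is
        [_, .. as rest] -> rest == [2, 3, 4]
        _ -> Bool.false
```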


Last updated: Jul 05 2025 at 12:14 UTC