I'm an absolute beginner at roc, so I thought I'd try a CS 101 exercise: naive functional mergesort. It is Slow. How should I go about figuring out why?
```roc
mergesort = \list ->
    n = (List.len list) // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (mergesort r.before) (mergesort r.others)

mergeSorted = \left, right ->
    when left is
        [] -> right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted leftRest right |> List.prepend leftFirst
                    else
                        mergeSorted left rightRest |> List.prepend rightFirst
```
I know Roc lists are actually contiguous arrays, so `prepend`, `split`, `dropFirst`, and maybe even destructuring may be expensive. But this is the obvious way to write the algorithm, so how should I find my way from here to something efficient?
(Yes, I know there's a sort function, and I know that it's slow for a reason that may have to do with list operation problems, and I know there's a problem with lists where I can't even make one that's 100k elements long without a segfault. And maybe this algorithm will be efficient when the list problems are fixed. But how can I find out? Does my code permit tail-call elimination, opportunistic mutation, copy-free slicing?)
Full code attached; I'm sure the style is appalling.
main.roc
So generally speaking, for measuring the performance of Roc, you want to build with `--optimize` and `--profiling`. Then use some sort of sampling profiler that can read the stack (perf, Instruments, dtrace). I tend to find `perf` with `--call-graph dwarf` works best, but the others are functional. I gave a talk about this at the last meetup; it was the first talk.
Looking at the profile, about 90% of the time is spent in `memmove`. This is due to prepending instead of appending to the arrays in `mergeSorted`. That is easy to fix by swapping to the last element instead of the first:
```roc
mergeSorted = \left, right ->
    when left is
        [] -> right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted leftRest right |> List.append leftLast
                    else
                        mergeSorted left rightRest |> List.append rightLast
```
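For intuition, here's the same contrast sketched in Python, whose lists are also contiguous arrays. This is only an illustration, not Roc: `insert(0, x)` is the prepend that forces the `memmove`-style shift of every existing element, while `append` is amortized constant time.

```python
def merge_prepend(left, right):
    """Merge two ascending lists by repeatedly prepending the largest
    remaining element. Each insert(0, x) shifts every element already
    in `out` -- this is the memmove the profile is full of."""
    out = []
    l, r = list(left), list(right)
    while l and r:
        if l[-1] >= r[-1]:
            out.insert(0, l.pop())   # O(len(out)) shift every time
        else:
            out.insert(0, r.pop())
    return l + r + out               # the non-empty leftover is smallest

def merge_append(left, right):
    """Same merge, consuming from the end but appending: no shifting.
    The accumulator comes out largest-first, so reverse once at the end."""
    out = []
    l, r = list(left), list(right)
    while l and r:
        if l[-1] >= r[-1]:
            out.append(l.pop())      # amortized O(1)
        else:
            out.append(r.pop())
    out.reverse()
    return l + r + out

print(merge_append([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```

Both produce the same result; only the constant-factor behavior differs.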
After that, the function isn't tail recursive, which actually makes tracking down the real issue and getting a perf graph quite hard. Luckily, that is easy to fix as well:
```roc
mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat accum left
                [.. as rightRest, rightLast] ->
                    if leftLast >= rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest
```
With this version, there is still a solid chunk of time spent in `realloc`. We can just add a `List.withCapacity` at the beginning to help alleviate that:
```roc
mergesort = \list ->
    len = List.len list
    n = len // 2
    r = List.split list n
    when n is
        0 -> list
        _ -> mergeSorted (List.withCapacity len) (mergesort r.before) (mergesort r.others)
```
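As a rough illustration of what `List.withCapacity` avoids (again in Python, whose lists also grow geometrically): appending one element at a time bumps the capacity many times, and each bump is a potential `realloc`, while preallocating the final size pays that cost once up front.

```python
import sys

xs = []
distinct_sizes = set()
for i in range(1000):
    xs.append(i)
    # getsizeof reflects the list's current allocated capacity, so each
    # new value observed here corresponds to a capacity bump (a realloc).
    distinct_sizes.add(sys.getsizeof(xs))

print(len(distinct_sizes))  # a few dozen reallocations for 1000 appends
```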
At this point, the next biggest obvious thing to fix is some refcounting stuff, but that is compiler work and not something to be done in userland.
Of course, even better than any of this would be to do all of the work in place with indices, but that is essentially a full rewrite of the algorithm.
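For reference, an index-based rewrite might look like the following Python sketch (hypothetical names, not Roc): sort within index ranges of one array, using a single scratch buffer allocated once instead of fresh lists at every level of recursion.

```python
def mergesort_inplace(a):
    """Sort list `a` in place using index ranges and one scratch buffer."""
    scratch = a[:]                     # one allocation for the whole sort
    _sort(a, scratch, 0, len(a))
    return a

def _sort(a, scratch, lo, hi):
    """Sort a[lo:hi) recursively; merge halves through `scratch`."""
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    _sort(a, scratch, lo, mid)
    _sort(a, scratch, mid, hi)
    # Merge the two sorted halves a[lo:mid) and a[mid:hi) into scratch,
    # then copy the merged range back. No per-level list allocation.
    i, j = lo, mid
    for k in range(lo, hi):
        if i < mid and (j >= hi or a[i] <= a[j]):
            scratch[k] = a[i]
            i += 1
        else:
            scratch[k] = a[j]
            j += 1
    a[lo:hi] = scratch[lo:hi]
```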
Thanks! I ended up with:
```roc
mergesort = \list ->
    n = List.len list
    r = List.split list (n // 2)
    when n is
        0 -> list
        1 -> list
        _ -> mergeSorted (List.withCapacity n) (mergesort r.before) (mergesort r.others)

mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat right (List.reverse accum)
        [.. as leftRest, leftLast] ->
            when right is
                [] -> List.concat left (List.reverse accum)
                [.. as rightRest, rightLast] ->
                    if leftLast > rightLast then
                        mergeSorted (List.append accum leftLast) leftRest right
                    else
                        mergeSorted (List.append accum rightLast) left rightRest
```
Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)
Appending versus prepending I guess is just something you have to know, and I suppose one should just trust the compiler to manage opportunistic mutation?
Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?
I guess one can always profile, but I'm hoping to develop some mental rules/idioms for efficient style.
> (Particularly if it's modulo cons...)
IIUC (which I very well may not), modulo cons only applies to tags in Roc, so it wouldn't apply to this case. Roc's lists are flat.
> and I suppose one should just trust the compiler to manage opportunistic mutation?
Trust but verify is my general opinion.
> Are there differences in efficiency for different destructuring patterns (a "rest" that's the beginning versus the end of the list, for example)?

There shouldn't be for the most part, though rest at the beginning may be faster in certain mutating cases, like if you took the rest and then appended to it.
> `List.reverse accum`

That will hurt perf. Probably should look into how this can be avoided. So you probably actually want this:
```roc
mergeSorted = \accum, left, right ->
    when left is
        [] -> List.concat accum right
        [leftFirst, .. as leftRest] ->
            when right is
                [] -> List.concat accum left
                [rightFirst, .. as rightRest] ->
                    if leftFirst <= rightFirst then
                        mergeSorted (List.append accum leftFirst) leftRest right
                    else
                        mergeSorted (List.append accum rightFirst) left rightRest
```
Now that we have an accumulator that we are appending to, this should be fast and always mutate in place.
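As a sketch of why the tail-recursive shape matters: a call in tail position just rebinds the parameters, so the compiler can turn it into a loop. In Python (which does no tail-call elimination) the loop form of this merge looks like:

```python
def merge_sorted(left, right):
    """Iterative form of the tail-recursive merge: consume from the front
    and append the smaller head, so the accumulator is built in order and
    needs no final reverse."""
    accum = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            accum.append(left[i])
            i += 1
        else:
            accum.append(right[j])
            j += 1
    # Exactly one leftover slice is non-empty, and all its elements are
    # >= everything in accum, so a plain concat finishes the merge.
    return accum + left[i:] + right[j:]

print(merge_sorted([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```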
Btw, regarding `List.reverse`: recently I had to use it only for a backward walk. I haven't checked, but my gut feeling suggests it should be possible to optimize the iteration based on the refcount (like, opportunistic zero-mutation). Isn't it?
Yes, there are `walkBackwards` flavours, but the set doesn't seem to be complete:

- walk - walkBackwards
- walkUntil - walkBackwardsUntil
- walkWithIndex
- walkWithIndexUntil
- walkFrom
- walkFromUntil
Yeah, we should probably add the backwards versions of all of those functions. In the future, loop fusion on list functions may be able to make them equivalent, but that is thinking very long term. Filling out these functions is a better solution today. Contributions welcome to add any/all of these.
Anne Archibald said:
> Is tail recursion easy enough to detect that the language server could highlight tail calls differently? (Particularly if it's modulo cons...)
I actually really like this idea. I just tried to implement it and hit some issues, but got it kind of working:
image.png
@Anne Archibald I've made a new topic for this editor integration and have it working in a branch if you'd like to try it :)
https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/Editor.20indicate.20if.20a.20function.20is.20tail.20recursive
Brendan Hansknecht said:
> Now that we have an accumulator that we are appending to, that should be fast and always do inplace mutation.
I can't speak for the mutation, but it seems to me that the list destructuring `[a, .. as rest]` has to effectively do a `dropFirst` to construct `rest`. That would seem to require copying all but one of the elements to a new list. Or is there an efficient way to do this list destructuring operation?
We have seamless slices. So you will just get a slice referencing the original list, minus the first element: no copy at all.
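A rough analogy in Python: slicing a `memoryview` produces a view that references the original buffer instead of copying its elements, which is similar in spirit to how a seamless slice shares the original list's storage.

```python
data = bytearray(b"abcdef")
rest = memoryview(data)[1:]   # "drop the first element": no copy made
data[1] = ord("X")            # mutate the original buffer

# The slice sees the mutation because it shares the same storage.
print(bytes(rest))            # b'Xcdef'
```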
Last updated: Jul 05 2025 at 12:14 UTC