Hi. I could not find the algorithmic complexity of List operations like append, prepend, concat etc mentioned in the docs. I tried going through the code but I was unable to find the implementations. Can someone tell me what they are?
It'll be great if it is mentioned in the docs. Specifically, I'm wondering if List prepend and append are O(1) or O(n).
After some digging, I found the implementation at https://github.com/roc-lang/roc/blob/main/crates/compiler/builtins/bitcode/src/list.zig. IIUC, append and prepend at O(n) and concat is O(m+n). Please correct me if I'm wrong.
List is a flat array
So same as a c++ vector or java arraylist
Append 1
Prepend n
Concat of m and n list is n
The big caveat is uniqueness. Due to immutability requirements, can lead to O(n) coping
Get is 1
slicing/sublist is 1
I haven't written any Zig, but I get why you would think that, glancing at the code you linked.
pub fn listAppendUnsafe(
list: RocList,
element: Opaque,
element_width: usize,
) callconv(.C) RocList {
const old_length = list.len();
var output = list;
output.length += 1;
if (output.bytes) |bytes| {
if (element) |source| {
const target = bytes + old_length * element_width;
@memcpy(target[0..element_width], source[0..element_width]);
}
}
return output;
}
IIUC, the memcpy only copies 1 element-width worth of bytes into the list which has reserved space for it, not a whole list, even though I associate memcpy with copying big regions.
Yep
Though we actually should change this interface some for more perf....but that is a micro benchmark detail.
Thanks. Is it possible to add this information to the docs?
I think it would be useful if the docs for the builtins would have information on Lists being represented as arrays that resize when needed, not linked lists. (This would imply having an O(1) append and O(n) prepend) . This would be useful, to someone coming from another functional language, since they likely assume the implementation to be a linked list, making their code have a lot of prepending. It could go to the begining of the docs, similarly how Str has a section on utf8 there.
Is there a plan for a linked list data structure?
Nope
Though it is easy to make with tags
Generally linked lists are quite bad for performance even when they win on time complexity.
They are exceptional pessimistic on modern hardware unlike flat arrays
Isn't O(n) prepend too slow for some cases?
For sure. Though often the right answer isn't a linked list but some alternative solution built on flat arrays. Not saying never use linked lists, but often time with small to medium size lists, arrays are still way faster even with the O(n) prepend
But highly depends on the exact use case
Is Roc List boxed or unboxed?
The docs do not mention anything
Interestingly, in OCaml, list prepend is O(1) and append is O(n).
Data in a list is stored on the heap.
The values of a list are Unboxed, but yeah, it is on the heap. They are like Rust / C++ vectors. As I understand, anything that can be not boxed isn't. So tuples, structs, closures, numbers. They are not boxed, just nicely sitting next to eachother in a list
Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.
Roc is a contiguous block of memory and has to move all elements to prepend.
That'll be another good thing to mention in the docs.
Even tag unions are on the stack, unboxed if they are not recursive (Like not defined as a Linked list such as LinkedList a : [End, Node a]
Ups, not that
I think this would be the correct type alias LinkedList a : [Nil, Cons a (LinkedList a)]
Brendan Hansknecht said:
Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.
Not an Ocaml-specific question, and also apologies if it may be going a tiny bit off-topic:
Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.
That said, modern hardware really hates linked lists. So they generally are a bad idea unless you have an extreme or required use case.
In absence of linked lists in Roc, what is the right way to write a recursive function that goes over a sequence of items and produces a sequence of zero or more items for each original item, and finally returns a flattened sequence containing all output items?
With linked list, this would be O(n). With Roc lists, this would be O(n²).
that's true in the worst case, yeah
the bet is that the Roc version will also be faster in practice
in other words, that it will have worse worst-case asymptotic performance but better runtime performance in practice, because reallocations won't be that common, and the cost of things like extra allocations and pointer chasing in the linked list will be more impactful in practice
I wouldn't use recursion for that btw, I'd do it like this:
flatMap : List a, (a -> List a) -> List a
flatMap = \items, fn ->
List.walk items [] \answer, item ->
List.concat answer (fn item)
I call it a bet because we can't really know until we've seen it in use in non-contrived programs :big_smile:
so far the bet has been working out well I'd say!
just to clarify, linked lists would be O(n) but :point_up: this implementation (or a recursive one) would only be O(n²) in the worst case where every List
that gets appended is so big that it requires a reallocation
because the answer
list will be mutated in-place
in contrast, if the lists are small enough in practice that reallocations are rarely needed, then it's more like O(n + m) where n is the number of input elements and m is the number of elements in the List
returned in the given function
(and the same would be true of the linked list version)
also worth noting that you could reduce the chances of reallocation by starting with a List.withCapacity
instead of []
Yeah using withCapacity, that is O(n) with no reallocations or extra data movement
A linked list is O(n), but also requires n separate allocations. Calling malloc n times is essentially always bad.
Brendan Hansknecht said:
Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.
Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.
I was referring to enabling append and prepend to both be O(1). This is in the context of an arbitrary linked-list implementation (of a real-world programming language). I'd like to emphasise that I'm not proposing that Roc's List
s become linked lists.
Wondered the same today while trying to build an algorithm with lists.
Not only big O time complexity is important to know, but also reallocations, and when they happen (e.g. does it happen when doing a sublist on a value that goes out of scope in a tail-end-recursive function?)
I tried to write a simple dirty CSV parser (not using roc-parser), and it's difficult to know what route to take to optimize without more information.
Karakatiza said:
it's difficult to know what route to take to optimize without more information.
I have the exact same problem.
Yeah, we definitely want to build some tooling around this (and more docs), but it is still early days. We also have a number of optimizations that are not in yet that create very sharp edges.
Currently the best way to learn is to chat here and share code.
That and profile things
Hristo said:
Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.
My reply wasn't clear enough either it seems (note, I added some extra context to the quotes in parens). Only the first sentence is about doubly linked lists:
Really (a pointer to the end of a linked list list) is only common with doubly linked lists where you can traverse in reverse.
The following sentences are about the common implementations of singly linked lists in many programming languages (especially ml related ones):
Just having the end is fine, but then someone will want "end - 1" or "end - 2". So it is only a partial patch (to add a pointer to the end element). Generally better to just not store the end and educate users on how to properly use linked lists (append heavy algorithms are an anti pattern. Especially because they make it seem like patterning matching the end or otherwise accessing other tail elements should be fast, but only the very last element would be special). (Education should be on) what kinds of algorithms to avoid with them (most append heavy algorithms on linked lists should be reversed to be prepend heavy or have other tricks to avoid the costs).
Last updated: Jul 05 2025 at 12:14 UTC