Algorithmic complexity of List operations · beginners

Stream: beginners

Topic: Algorithmic complexity of List operations

Abhinav Sarkar (May 12 2024 at 06:42):

Hi. I could not find the algorithmic complexity of List operations like append, prepend, concat etc mentioned in the docs. I tried going through the code but I was unable to find the implementations. Can someone tell me what they are?

Abhinav Sarkar (May 12 2024 at 06:43):

It'll be great if it is mentioned in the docs. Specifically, I'm wondering if List prepend and append are O(1) or O(n).

Abhinav Sarkar (May 12 2024 at 07:06):

After some digging, I found the implementation at https://github.com/roc-lang/roc/blob/main/crates/compiler/builtins/bitcode/src/list.zig. IIUC, append and prepend at O(n) and concat is O(m+n). Please correct me if I'm wrong.

Brendan Hansknecht (May 12 2024 at 10:30):

List is a flat array

Brendan Hansknecht (May 12 2024 at 10:30):

So same as a c++ vector or java arraylist

Brendan Hansknecht (May 12 2024 at 10:32):

Append 1
Prepend n
Concat of m and n list is n

Brendan Hansknecht (May 12 2024 at 10:33):

The big caveat is uniqueness. Due to immutability requirements, can lead to O(n) coping

Brendan Hansknecht (May 12 2024 at 10:33):

Get is 1
slicing/sublist is 1

Norbert Hajagos (May 12 2024 at 10:38):

I haven't written any Zig, but I get why you would think that, glancing at the code you linked.

pub fn listAppendUnsafe(
    list: RocList,
    element: Opaque,
    element_width: usize,
) callconv(.C) RocList {
    const old_length = list.len();
    var output = list;
    output.length += 1;

    if (output.bytes) |bytes| {
        if (element) |source| {
            const target = bytes + old_length * element_width;
            @memcpy(target[0..element_width], source[0..element_width]);
        }
    }

    return output;
}

IIUC, the memcpy only copies 1 element-width worth of bytes into the list which has reserved space for it, not a whole list, even though I associate memcpy with copying big regions.

Brendan Hansknecht (May 12 2024 at 10:41):

Yep

Brendan Hansknecht (May 12 2024 at 10:42):

Though we actually should change this interface some for more perf....but that is a micro benchmark detail.

Abhinav Sarkar (May 12 2024 at 11:15):

Thanks. Is it possible to add this information to the docs?

Norbert Hajagos (May 12 2024 at 14:49):

I think it would be useful if the docs for the builtins would have information on Lists being represented as arrays that resize when needed, not linked lists. (This would imply having an O(1) append and O(n) prepend) . This would be useful, to someone coming from another functional language, since they likely assume the implementation to be a linked list, making their code have a lot of prepending. It could go to the begining of the docs, similarly how Str has a section on utf8 there.

Abhinav Sarkar (May 12 2024 at 14:51):

Is there a plan for a linked list data structure?

Brendan Hansknecht (May 12 2024 at 14:56):

Nope

Brendan Hansknecht (May 12 2024 at 14:56):

Though it is easy to make with tags

Brendan Hansknecht (May 12 2024 at 14:57):

Generally linked lists are quite bad for performance even when they win on time complexity.

Brendan Hansknecht (May 12 2024 at 14:57):

They are exceptional pessimistic on modern hardware unlike flat arrays

Abhinav Sarkar (May 12 2024 at 14:57):

Isn't O(n) prepend too slow for some cases?

Brendan Hansknecht (May 12 2024 at 14:59):

For sure. Though often the right answer isn't a linked list but some alternative solution built on flat arrays. Not saying never use linked lists, but often time with small to medium size lists, arrays are still way faster even with the O(n) prepend

Brendan Hansknecht (May 12 2024 at 14:59):

But highly depends on the exact use case

Abhinav Sarkar (May 12 2024 at 15:00):

Is Roc List boxed or unboxed?

Abhinav Sarkar (May 12 2024 at 15:00):

The docs do not mention anything

Abhinav Sarkar (May 12 2024 at 15:05):

Interestingly, in OCaml, list prepend is O(1) and append is O(n).

Brendan Hansknecht (May 12 2024 at 15:05):

Data in a list is stored on the heap.

Norbert Hajagos (May 12 2024 at 15:06):

The values of a list are Unboxed, but yeah, it is on the heap. They are like Rust / C++ vectors. As I understand, anything that can be not boxed isn't. So tuples, structs, closures, numbers. They are not boxed, just nicely sitting next to eachother in a list

Brendan Hansknecht (May 12 2024 at 15:07):

Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.

Roc is a contiguous block of memory and has to move all elements to prepend.

Abhinav Sarkar (May 12 2024 at 15:07):

That'll be another good thing to mention in the docs.

Norbert Hajagos (May 12 2024 at 15:09):

Even tag unions are on the stack, unboxed if they are not recursive (Like not defined as a Linked list such as LinkedList a : [End, Node a]

Norbert Hajagos (May 12 2024 at 15:09):

Ups, not that

Norbert Hajagos (May 12 2024 at 15:11):

I think this would be the correct type alias LinkedList a : [Nil, Cons a (LinkedList a)]

Hristo (May 12 2024 at 15:14):

Brendan Hansknecht said:

Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.

Not an Ocaml-specific question, and also apologies if it may be going a tiny bit off-topic:

Is there a good reason (besides memory savings, however tiny they might be in modern times) for not additionally storing a pointer to the end of the list in such implementations?

Brendan Hansknecht (May 12 2024 at 15:16):

Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.

Brendan Hansknecht (May 12 2024 at 15:17):

That said, modern hardware really hates linked lists. So they generally are a bad idea unless you have an extreme or required use case.

Abhinav Sarkar (May 12 2024 at 15:22):

In absence of linked lists in Roc, what is the right way to write a recursive function that goes over a sequence of items and produces a sequence of zero or more items for each original item, and finally returns a flattened sequence containing all output items?

Abhinav Sarkar (May 12 2024 at 15:23):

With linked list, this would be O(n). With Roc lists, this would be O(n²).

Richard Feldman (May 12 2024 at 15:29):

that's true in the worst case, yeah

Richard Feldman (May 12 2024 at 15:29):

the bet is that the Roc version will also be faster in practice

Richard Feldman (May 12 2024 at 15:30):

in other words, that it will have worse worst-case asymptotic performance but better runtime performance in practice, because reallocations won't be that common, and the cost of things like extra allocations and pointer chasing in the linked list will be more impactful in practice

Richard Feldman (May 12 2024 at 15:30):

I wouldn't use recursion for that btw, I'd do it like this:

flatMap : List a, (a -> List a) -> List a
flatMap = \items, fn ->
    List.walk items [] \answer, item ->
        List.concat answer (fn item)

Richard Feldman (May 12 2024 at 15:31):

I call it a bet because we can't really know until we've seen it in use in non-contrived programs :big_smile:

Richard Feldman (May 12 2024 at 15:32):

so far the bet has been working out well I'd say!

Richard Feldman (May 12 2024 at 15:41):

just to clarify, linked lists would be O(n) but :point_up: this implementation (or a recursive one) would only be O(n²) in the worst case where every List that gets appended is so big that it requires a reallocation

Richard Feldman (May 12 2024 at 15:41):

because the answer list will be mutated in-place

Richard Feldman (May 12 2024 at 15:45):

in contrast, if the lists are small enough in practice that reallocations are rarely needed, then it's more like O(n + m) where n is the number of input elements and m is the number of elements in the List returned in the given function

Richard Feldman (May 12 2024 at 15:46):

(and the same would be true of the linked list version)

Richard Feldman (May 12 2024 at 15:50):

also worth noting that you could reduce the chances of reallocation by starting with a List.withCapacity instead of []

Brendan Hansknecht (May 12 2024 at 17:00):

Yeah using withCapacity, that is O(n) with no reallocations or extra data movement

Brendan Hansknecht (May 12 2024 at 17:06):

A linked list is O(n), but also requires n separate allocations. Calling malloc n times is essentially always bad.

Hristo (May 12 2024 at 19:55):

Brendan Hansknecht said:

Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.

Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.

I was referring to enabling append and prepend to both be O(1). This is in the context of an arbitrary linked-list implementation (of a real-world programming language). I'd like to emphasise that I'm not proposing that Roc's Lists become linked lists.

Karakatiza (May 13 2024 at 03:17):

Wondered the same today while trying to build an algorithm with lists.
Not only big O time complexity is important to know, but also reallocations, and when they happen (e.g. does it happen when doing a sublist on a value that goes out of scope in a tail-end-recursive function?)

I tried to write a simple dirty CSV parser (not using roc-parser), and it's difficult to know what route to take to optimize without more information.

Abhinav Sarkar (May 13 2024 at 03:23):

Karakatiza said:

it's difficult to know what route to take to optimize without more information.

I have the exact same problem.

Brendan Hansknecht (May 13 2024 at 05:57):

Yeah, we definitely want to build some tooling around this (and more docs), but it is still early days. We also have a number of optimizations that are not in yet that create very sharp edges.

Brendan Hansknecht (May 13 2024 at 05:58):

Currently the best way to learn is to chat here and share code.

Brendan Hansknecht (May 13 2024 at 05:58):

That and profile things

Brendan Hansknecht (May 13 2024 at 06:05):

Hristo said:

Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.

My reply wasn't clear enough either it seems (note, I added some extra context to the quotes in parens). Only the first sentence is about doubly linked lists:

Really (a pointer to the end of a linked list list) is only common with doubly linked lists where you can traverse in reverse.

The following sentences are about the common implementations of singly linked lists in many programming languages (especially ml related ones):

Just having the end is fine, but then someone will want "end - 1" or "end - 2". So it is only a partial patch (to add a pointer to the end element). Generally better to just not store the end and educate users on how to properly use linked lists (append heavy algorithms are an anti pattern. Especially because they make it seem like patterning matching the end or otherwise accessing other tail elements should be fast, but only the very last element would be special). (Education should be on) what kinds of algorithms to avoid with them (most append heavy algorithms on linked lists should be reversed to be prepend heavy or have other tricks to avoid the costs).

Last updated: Feb 20 2026 at 12:27 UTC