Stream: beginners

Topic: Algorithmic complexity of List operations


view this post on Zulip Abhinav Sarkar (May 12 2024 at 06:42):

Hi. I could not find the algorithmic complexity of List operations like append, prepend, concat etc mentioned in the docs. I tried going through the code but I was unable to find the implementations. Can someone tell me what they are?

view this post on Zulip Abhinav Sarkar (May 12 2024 at 06:43):

It'll be great if it is mentioned in the docs. Specifically, I'm wondering if List prepend and append are O(1) or O(n).

view this post on Zulip Abhinav Sarkar (May 12 2024 at 07:06):

After some digging, I found the implementation at https://github.com/roc-lang/roc/blob/main/crates/compiler/builtins/bitcode/src/list.zig. IIUC, append and prepend at O(n) and concat is O(m+n). Please correct me if I'm wrong.

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:30):

List is a flat array

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:30):

So same as a c++ vector or java arraylist

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:32):

Append 1
Prepend n
Concat of m and n list is n

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:33):

The big caveat is uniqueness. Due to immutability requirements, can lead to O(n) coping

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:33):

Get is 1
slicing/sublist is 1

view this post on Zulip Norbert Hajagos (May 12 2024 at 10:38):

I haven't written any Zig, but I get why you would think that, glancing at the code you linked.

pub fn listAppendUnsafe(
    list: RocList,
    element: Opaque,
    element_width: usize,
) callconv(.C) RocList {
    const old_length = list.len();
    var output = list;
    output.length += 1;

    if (output.bytes) |bytes| {
        if (element) |source| {
            const target = bytes + old_length * element_width;
            @memcpy(target[0..element_width], source[0..element_width]);
        }
    }

    return output;
}

IIUC, the memcpy only copies 1 element-width worth of bytes into the list which has reserved space for it, not a whole list, even though I associate memcpy with copying big regions.

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:41):

Yep

view this post on Zulip Brendan Hansknecht (May 12 2024 at 10:42):

Though we actually should change this interface some for more perf....but that is a micro benchmark detail.

view this post on Zulip Abhinav Sarkar (May 12 2024 at 11:15):

Thanks. Is it possible to add this information to the docs?

view this post on Zulip Norbert Hajagos (May 12 2024 at 14:49):

I think it would be useful if the docs for the builtins would have information on Lists being represented as arrays that resize when needed, not linked lists. (This would imply having an O(1) append and O(n) prepend) . This would be useful, to someone coming from another functional language, since they likely assume the implementation to be a linked list, making their code have a lot of prepending. It could go to the begining of the docs, similarly how Str has a section on utf8 there.

view this post on Zulip Abhinav Sarkar (May 12 2024 at 14:51):

Is there a plan for a linked list data structure?

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:56):

Nope

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:56):

Though it is easy to make with tags

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:57):

Generally linked lists are quite bad for performance even when they win on time complexity.

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:57):

They are exceptional pessimistic on modern hardware unlike flat arrays

view this post on Zulip Abhinav Sarkar (May 12 2024 at 14:57):

Isn't O(n) prepend too slow for some cases?

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:59):

For sure. Though often the right answer isn't a linked list but some alternative solution built on flat arrays. Not saying never use linked lists, but often time with small to medium size lists, arrays are still way faster even with the O(n) prepend

view this post on Zulip Brendan Hansknecht (May 12 2024 at 14:59):

But highly depends on the exact use case

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:00):

Is Roc List boxed or unboxed?

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:00):

The docs do not mention anything

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:05):

Interestingly, in OCaml, list prepend is O(1) and append is O(n).

view this post on Zulip Brendan Hansknecht (May 12 2024 at 15:05):

Data in a list is stored on the heap.

view this post on Zulip Norbert Hajagos (May 12 2024 at 15:06):

The values of a list are Unboxed, but yeah, it is on the heap. They are like Rust / C++ vectors. As I understand, anything that can be not boxed isn't. So tuples, structs, closures, numbers. They are not boxed, just nicely sitting next to eachother in a list

view this post on Zulip Brendan Hansknecht (May 12 2024 at 15:07):

Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.

Roc is a contiguous block of memory and has to move all elements to prepend.

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:07):

That'll be another good thing to mention in the docs.

view this post on Zulip Norbert Hajagos (May 12 2024 at 15:09):

Even tag unions are on the stack, unboxed if they are not recursive (Like not defined as a Linked list such as LinkedList a : [End, Node a]

view this post on Zulip Norbert Hajagos (May 12 2024 at 15:09):

Ups, not that

view this post on Zulip Norbert Hajagos (May 12 2024 at 15:11):

I think this would be the correct type alias LinkedList a : [Nil, Cons a (LinkedList a)]

view this post on Zulip Hristo (May 12 2024 at 15:14):

Brendan Hansknecht said:

Ocaml uses a linked list so it has to scan the entire pointer chain across memory to reach the last element and append.

Not an Ocaml-specific question, and also apologies if it may be going a tiny bit off-topic:

view this post on Zulip Brendan Hansknecht (May 12 2024 at 15:16):

Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.

view this post on Zulip Brendan Hansknecht (May 12 2024 at 15:17):

That said, modern hardware really hates linked lists. So they generally are a bad idea unless you have an extreme or required use case.

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:22):

In absence of linked lists in Roc, what is the right way to write a recursive function that goes over a sequence of items and produces a sequence of zero or more items for each original item, and finally returns a flattened sequence containing all output items?

view this post on Zulip Abhinav Sarkar (May 12 2024 at 15:23):

With linked list, this would be O(n). With Roc lists, this would be O(n²).

view this post on Zulip Richard Feldman (May 12 2024 at 15:29):

that's true in the worst case, yeah

view this post on Zulip Richard Feldman (May 12 2024 at 15:29):

the bet is that the Roc version will also be faster in practice

view this post on Zulip Richard Feldman (May 12 2024 at 15:30):

in other words, that it will have worse worst-case asymptotic performance but better runtime performance in practice, because reallocations won't be that common, and the cost of things like extra allocations and pointer chasing in the linked list will be more impactful in practice

view this post on Zulip Richard Feldman (May 12 2024 at 15:30):

I wouldn't use recursion for that btw, I'd do it like this:

flatMap : List a, (a -> List a) -> List a
flatMap = \items, fn ->
    List.walk items [] \answer, item ->
        List.concat answer (fn item)

view this post on Zulip Richard Feldman (May 12 2024 at 15:31):

I call it a bet because we can't really know until we've seen it in use in non-contrived programs :big_smile:

view this post on Zulip Richard Feldman (May 12 2024 at 15:32):

so far the bet has been working out well I'd say!

view this post on Zulip Richard Feldman (May 12 2024 at 15:41):

just to clarify, linked lists would be O(n) but :point_up: this implementation (or a recursive one) would only be O(n²) in the worst case where every List that gets appended is so big that it requires a reallocation

view this post on Zulip Richard Feldman (May 12 2024 at 15:41):

because the answer list will be mutated in-place

view this post on Zulip Richard Feldman (May 12 2024 at 15:45):

in contrast, if the lists are small enough in practice that reallocations are rarely needed, then it's more like O(n + m) where n is the number of input elements and m is the number of elements in the List returned in the given function

view this post on Zulip Richard Feldman (May 12 2024 at 15:46):

(and the same would be true of the linked list version)

view this post on Zulip Richard Feldman (May 12 2024 at 15:50):

also worth noting that you could reduce the chances of reallocation by starting with a List.withCapacity instead of []

view this post on Zulip Brendan Hansknecht (May 12 2024 at 17:00):

Yeah using withCapacity, that is O(n) with no reallocations or extra data movement

view this post on Zulip Brendan Hansknecht (May 12 2024 at 17:06):

A linked list is O(n), but also requires n separate allocations. Calling malloc n times is essentially always bad.

view this post on Zulip Hristo (May 12 2024 at 19:55):

Brendan Hansknecht said:

Really is only common with doubly linked lists where you can traverse in reverse. Just having the end is fine, but then someone will want end - 1 or end - 2. So it is only a partial patch. Generally better to just not store the end and educate users on how to properly use linked lists. Like what kinds of algorithm to avoid with them.

Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.

I was referring to enabling append and prepend to both be O(1). This is in the context of an arbitrary linked-list implementation (of a real-world programming language). I'd like to emphasise that I'm not proposing that Roc's Lists become linked lists.

view this post on Zulip Karakatiza (May 13 2024 at 03:17):

Wondered the same today while trying to build an algorithm with lists.
Not only big O time complexity is important to know, but also reallocations, and when they happen (e.g. does it happen when doing a sublist on a value that goes out of scope in a tail-end-recursive function?)

I tried to write a simple dirty CSV parser (not using roc-parser), and it's difficult to know what route to take to optimize without more information.

view this post on Zulip Abhinav Sarkar (May 13 2024 at 03:23):

Karakatiza said:

it's difficult to know what route to take to optimize without more information.

I have the exact same problem.

view this post on Zulip Brendan Hansknecht (May 13 2024 at 05:57):

Yeah, we definitely want to build some tooling around this (and more docs), but it is still early days. We also have a number of optimizations that are not in yet that create very sharp edges.

view this post on Zulip Brendan Hansknecht (May 13 2024 at 05:58):

Currently the best way to learn is to chat here and share code.

view this post on Zulip Brendan Hansknecht (May 13 2024 at 05:58):

That and profile things

view this post on Zulip Brendan Hansknecht (May 13 2024 at 06:05):

Hristo said:

Apologies for not having been clear enough. I should've specified that I wasn't referring to doubly-linked lists.

My reply wasn't clear enough either it seems (note, I added some extra context to the quotes in parens). Only the first sentence is about doubly linked lists:

Really (a pointer to the end of a linked list list) is only common with doubly linked lists where you can traverse in reverse.

The following sentences are about the common implementations of singly linked lists in many programming languages (especially ml related ones):

Just having the end is fine, but then someone will want "end - 1" or "end - 2". So it is only a partial patch (to add a pointer to the end element). Generally better to just not store the end and educate users on how to properly use linked lists (append heavy algorithms are an anti pattern. Especially because they make it seem like patterning matching the end or otherwise accessing other tail elements should be fast, but only the very last element would be special). (Education should be on) what kinds of algorithms to avoid with them (most append heavy algorithms on linked lists should be reversed to be prepend heavy or have other tricks to avoid the costs).


Last updated: Jul 05 2025 at 12:14 UTC