Are the collection types (List, Set, Dict) immutable? · beginners

Stream: beginners

Topic: Are the collection types (List, Set, Dict) immutable?

Dimitrios Piliouras (Jun 01 2024 at 19:16):

Or where could I find more info about how they work/perform?

Sven van Caem (Jun 01 2024 at 20:17):

I believe the important thing to know is that Roc will by default clone a collection when it is modified, unless the reference count is exactly 1, in which case it will mutate a collection in place. The Roc team is making the bet that the latter case will occur more often than the former. With this in mind, the List type (as I understand it) works just like a C++ vector/Rust Vec does, in that it allocates its elements contiguously on the heap instead of allocating each element separately in a linked list. The Dict and Set types are both index maps under the hood, backed by a List.

Brendan Hansknecht (Jun 01 2024 at 21:17):

That is a very good summary

Dimitrios Piliouras (Jun 01 2024 at 21:48):

nice one thanks - just to clarify, when you say 'clone', you mean a full/deep copy? no fancy structural-sharing or anything, right?

Brendan Hansknecht (Jun 01 2024 at 21:51):

yep, deep copy.

Brendan Hansknecht (Jun 01 2024 at 21:51):

In practice when code is written well, this will almost never happen. That said, today, there are more sharp edges than we would like.

Dimitrios Piliouras (Jun 01 2024 at 22:04):

this may be slightly off-topic, but if a web-server platform which depends on tokio can exist, I am wondering, can an immutable platform exist making use of something like this ?

Dimitrios Piliouras (Jun 01 2024 at 22:04):

ah nevermind, that's a stupid question...the platform is for side-effects

Brendan Hansknecht (Jun 01 2024 at 22:05):

One could build an immutable data structure library in roc using tags (many immutable data structures are really just forms of trees)

Brendan Hansknecht (Jun 01 2024 at 22:06):

So while roc doesn't have any of these today, and the standard library likely never will. Should be 100% supportable in userland

Brendan Hansknecht (Jun 01 2024 at 22:09):

Roc is intentionally testing a hypothesis that traditional mutable data structures can be performant in an immutable language due to most data being unique if it might be modified. If the data is shared, it is likely to not be modified.

This hypothesis is kinda tested by the rust borrow checker. Which is why we have reasonable faith that this should work out in practice (though maybe require some code rewriting to avoid accidental sharing of something that might be mutated)

Brendan Hansknecht (Jun 01 2024 at 22:10):

On top of that, many immutable data structures end up having a lot of costs that add up over time due to the large amount of extra pointer chancing and fragmentation. So a few dense copies might actually be faster than using an immutable data structure.

Brendan Hansknecht (Jun 01 2024 at 22:12):

Of course, there is a line here where at some point the number of copies ends up costing more than the cost of having an immutable data structure. Generally well written code can avoid essentially all copies. When that happens, we get significantly more perf than an immutable data structure could ever hope to accomplish.

Brendan Hansknecht (Jun 01 2024 at 22:12):

Of course roc is young and we still have many more sharp edges. For example, we currently have some known refcounting cases that totally destroy performance.

Dimitrios Piliouras (Jun 01 2024 at 22:15):

sure, what you say is true, programs built around immutable data-structures aim mainly for correctness, rather than performance, however when you throw multi-threading into the mix the scales starts to shift.

Dimitrios Piliouras (Jun 01 2024 at 22:15):

it actually more costly to copy or even lock/unlock a mutable data-structure across many threads

Dimitrios Piliouras (Jun 01 2024 at 22:16):

anyway, I get your point - i don't want to side-track this thread

Brendan Hansknecht (Jun 01 2024 at 22:18):

No worries side tracking is great for discussion. And yeah, multithreading can get a lot more complex. Currently Roc has not really touched that world. Anything that exist today and is multithread (like basic webserver) basically spawn N unique roc threads with no way to share or communicate.

Dimitrios Piliouras (Jun 01 2024 at 22:18):

oh i see

Dimitrios Piliouras (Jun 01 2024 at 22:21):

well if you like discussion, I personally am 100% sold on immutability...of course, i do understand the performance cost, but I value the benefits so much more...and by the way, opportunistic mutation, can still be available.

Dimitrios Piliouras (Jun 01 2024 at 22:22):

in fact with immutable data the opportunity for opportunistic mutation is much greater :)

Brendan Hansknecht (Jun 01 2024 at 22:26):

I'm curious the use case for a shared across threads immutable data structures. As in, if it is truly immutable data, then it really doesn't matter how it is stored. It is read only. If it needs global mutation that will be visible to all threads, I don't think immutable data structures allow for that. I guess you could modify then data structure in one thread and the message the updated data structure to all other threads? (that sounds like a big synchronization problem)

Generally speaking, I don't see many use cases where you have a global structure that every thread will have their own modified version of.

Dimitrios Piliouras (Jun 01 2024 at 22:29):

immutable data is and stays immutable forever - 'changing' them produces a new value, so it doesn't matter which function or thread you're in. You received an immutable vector as a param? You can be confident that it's not going to change under your feet...

Brendan Hansknecht (Jun 01 2024 at 22:30):

Yes, they are additive.

Dimitrios Piliouras (Jun 01 2024 at 22:30):

now, if the opportunistic mutation bit confuses you, have a look at this

Dimitrios Piliouras (Jun 01 2024 at 22:30):

basically, you can create a 'transient' version of them in constant time, mutate it to death, and 'commit' it

Brendan Hansknecht (Jun 01 2024 at 22:31):

I guess the general update structure would be that you are passed in the most up to date version of the immutable data structure. Then you can return a list of modifications and they will be synchronously applied to the data structure such then other threads can later see the change?

Dimitrios Piliouras (Jun 01 2024 at 22:31):

no no no

Dimitrios Piliouras (Jun 01 2024 at 22:32):

there are no updates that other threads can see

Dimitrios Piliouras (Jun 01 2024 at 22:32):

in fact there is no update-in-place

Dimitrios Piliouras (Jun 01 2024 at 22:32):

i see what confuses you

Brendan Hansknecht (Jun 01 2024 at 22:32):

I know that none of it is inplace. Just trying to understand how this fixes shared information across threads

Brendan Hansknecht (Jun 01 2024 at 22:33):

Generally speaking, there aren't many problems where I want every thread to have their own unique view of data. Generally at some point, I want the global view to be updated

Dimitrios Piliouras (Jun 01 2024 at 22:33):

sharing information across threads is done by modelling time

Dimitrios Piliouras (Jun 01 2024 at 22:34):

and time is best modelled by immutable data

Dimitrios Piliouras (Jun 01 2024 at 22:34):

so that you can take stable snapshots

Dimitrios Piliouras (Jun 01 2024 at 22:35):

this is the most concise sentence i can give you. For more information if that topic intrigues you, you really want to look at clojure's reference types

Dimitrios Piliouras (Jun 01 2024 at 22:35):

these are constructs that model time

Brendan Hansknecht (Jun 01 2024 at 22:36):

Can we get concrete? What is a simple example of a threaded use case that benefits from immutable data structures?

Dimitrios Piliouras (Jun 01 2024 at 22:36):

there are 4 of them with different semantics - e.g. synchronous, asynchronous, transactional etc

Dimitrios Piliouras (Jun 01 2024 at 22:39):

the benefit is that you don't mutate the data-structure - you mutate the 'box' around it, and that is managed and comes with specific semantics and guarrantees that it couldn't make if the data was mutable

Dimitrios Piliouras (Jun 01 2024 at 22:40):

this may not sound like much but it's actually profound, because as I said, it lets you model time, which is super important in a multi-threads situation

Dimitrios Piliouras (Jun 01 2024 at 22:41):

there is no time in a pure function but there is time in real programs

Brendan Hansknecht (Jun 01 2024 at 22:44):

Ah, looking at clojure.core/ref in closure and clojure.core/dosync is making it clearer how closure decides to model things.

If I had a shared cache that was an immutable data structure represented as a ref. To read from the cache would be to deref and just directly use the data structure. To update the cache would be to additively change the data structure (because it is an immutable data structure that does not change inplace), then update all pointers to the data structure to the pointer to the new version. This can be done in an atomic commit to avoid other forms of thread synchronization.

Dimitrios Piliouras (Jun 01 2024 at 22:44):

let me give you the simplest possible example - i will presume total unfamiliarity with clojure

Dimitrios Piliouras (Jun 01 2024 at 22:44):

exactly! you picked the hardest one though which is rarely used

Dimitrios Piliouras (Jun 01 2024 at 22:45):

the simplest one is the atom, which the standard way of achieving mutation

Dimitrios Piliouras (Jun 01 2024 at 22:45):

atom comes with two guarrantees - atomicity + synchronicity (i.e. blocking)

Brendan Hansknecht (Jun 01 2024 at 22:46):

Are atoms limited to basic values like numbers or do they allow for more complex types?

Dimitrios Piliouras (Jun 01 2024 at 22:46):

anything immutable goes

Brendan Hansknecht (Jun 01 2024 at 22:46):

interesting

Dimitrios Piliouras (Jun 01 2024 at 22:46):

hasmaps, vectors etc

Dimitrios Piliouras (Jun 01 2024 at 22:46):

deeply nested whatever

Dimitrios Piliouras (Jun 01 2024 at 22:46):

they cost nothing to comapre

Dimitrios Piliouras (Jun 01 2024 at 22:47):

in clojure, you really have to go out of your way to mutate something w/o some sort of synchronisation

Dimitrios Piliouras (Jun 01 2024 at 22:47):

there is a way, but is more like the Unsafe in java

Dimitrios Piliouras (Jun 01 2024 at 22:48):

highly discouraged

Brendan Hansknecht (Jun 01 2024 at 22:48):

Yeah, interesting that the closure swap for atoms takes a function.

Dimitrios Piliouras (Jun 01 2024 at 22:49):

but guess what? very rarely you end up needing these types

Dimitrios Piliouras (Jun 01 2024 at 22:49):

what else could it take? a function of the current value

Brendan Hansknecht (Jun 01 2024 at 22:49):

But I guess makes sense cause it can linearize calls for synchronoziation and avoid race conditions.

Dimitrios Piliouras (Jun 01 2024 at 22:50):

exactly

Brendan Hansknecht (Jun 01 2024 at 22:50):

I think most of this could be represented in roc with a immutable data structure library built on tags. The only special part is that the atom/ref functions would be required to be implemented in the platform.

Brendan Hansknecht (Jun 01 2024 at 22:51):

So I think roc could have both.

Dimitrios Piliouras (Jun 01 2024 at 22:51):

interesting

Dimitrios Piliouras (Jun 01 2024 at 22:53):

just to clarify what i meant by modelling time

Dimitrios Piliouras (Jun 01 2024 at 22:53):

the future is a function of the past, right?

Dimitrios Piliouras (Jun 01 2024 at 22:54):

so you will notice that all reference types in clojure have some sort of update function associated with them, but they all look the same same in terms of function-signature

Dimitrios Piliouras (Jun 01 2024 at 22:54):

they all take a function which accepts the current value (the past), and will produce the next value (the future)

Dimitrios Piliouras (Jun 01 2024 at 22:56):

they all have different semantics with respect to their operation, but they all obey the epochal time model i mentioned above

Dimitrios Piliouras (Jun 01 2024 at 22:56):

once you have a time model, multi-threading is a breeze

Dimitrios Piliouras (Jun 01 2024 at 22:58):

hope that makes sense

Dimitrios Piliouras (Jun 01 2024 at 23:02):

by the way, the stuff i said about transients earlier, don't apply to multi-threading at all! Mutating a transient value should only ever be done in one thread and should almost always be scoped - i.e. not leak it for further mutation upstream, although exceptions can exist.

Brendan Hansknecht (Jun 01 2024 at 23:06):

Yeah totally does.

Looking some at the underlying implementations, two main costs come to mind:

lots of pointers and potentially scatter memory that mean more cache misses (not friendly to the cpu, but smarter immutable data structures do work to be at least slightly denser).
built on top of compare and swap atomics. In the bad case of atomic swap, the update function could be stuck running on repeat wasting a ton of extra cpu. If for example, 4 threads all try to swap the atomic at the same time with an expensive swap function, you will get a ton of wasted cpu. All 4 threads run the expensive function. only 1 succeeds and updates the underlying atomic references. Then the 3 remaining threads repeat. Then the remaining 2. Then the single final thread. So in the really pessimistic case instead of doing 4 updates worth of work, you have done 4 + 3 + 2 +1 = 10 updates worth of work.

Dimitrios Piliouras (Jun 01 2024 at 23:09):

yes indeed lock contention is always the problem isn't it? :) that's exactly why modelling time in a pluggable fashion is so important

Dimitrios Piliouras (Jun 01 2024 at 23:10):

compare the difference between atom VS agent - different approaches: cas VS queueing

Brendan Hansknecht (Jun 01 2024 at 23:11):

This isn't technically lock contention, but yeah, just noting the edge cases.

Brendan Hansknecht (Jun 01 2024 at 23:11):

and general tradeoffs

Dimitrios Piliouras (Jun 01 2024 at 23:11):

for sure - anything 'managed' comes with costs

Dimitrios Piliouras (Jun 01 2024 at 23:12):

but i've personally come to conclude that 'data' and 'time' and pretty important aspects of programs

Brendan Hansknecht (Jun 01 2024 at 23:12):

Yep. I think roc's current setup is likely to match agent's better. We already live in what feels like a fully async world.

Brendan Hansknecht (Jun 01 2024 at 23:14):

For example in a webserver, if you want a global cache. That may work completely outside of roc only in the platform. Can be implemented in anyway the platform feels is best. Async with queuing, read write mutex, etc.

Dimitrios Piliouras (Jun 01 2024 at 23:14):

cool

Brendan Hansknecht (Jun 01 2024 at 23:15):

So I think for a solid set of problems, the platform can define this api in a way that the roc author doesn't have to think how it is implemented. The platform author decides the transaction model.

Brendan Hansknecht (Jun 01 2024 at 23:15):

and underlying way of implementation.

Brendan Hansknecht (Jun 01 2024 at 23:15):

Definitely means that there would be less consistency from platform to platform though. So will be really interesting to see how this matures as roc grows.

Last updated: Jul 26 2025 at 12:14 UTC