Or where could I find more info about how they work/perform?
I believe the important thing to know is that Roc will by default clone a collection when it is modified, unless the reference count is exactly 1, in which case it will mutate a collection in place. The Roc team is making the bet that the latter case will occur more often than the former. With this in mind, the List type (as I understand it) works just like a C++ vector/Rust Vec does, in that it allocates its elements contiguously on the heap instead of allocating each element separately in a linked list. The Dict and Set types are both index maps under the hood, backed by a List.
That is a very good summary
nice one thanks - just to clarify, when you say 'clone', you mean a full/deep copy? no fancy structural-sharing or anything, right?
yep, deep copy.
In practice when code is written well, this will almost never happen. That said, today, there are more sharp edges than we would like.
this may be slightly off-topic, but if a web-server platform which depends on tokio
can exist, I am wondering, can an immutable platform exist making use of something like this ?
ah nevermind, that's a stupid question...the platform is for side-effects
One could build an immutable data structure library in roc using tags (many immutable data structures are really just forms of trees)
So while roc doesn't have any of these today, and the standard library likely never will. Should be 100% supportable in userland
Roc is intentionally testing a hypothesis that traditional mutable data structures can be performant in an immutable language due to most data being unique if it might be modified. If the data is shared, it is likely to not be modified.
This hypothesis is kinda tested by the rust borrow checker. Which is why we have reasonable faith that this should work out in practice (though maybe require some code rewriting to avoid accidental sharing of something that might be mutated)
On top of that, many immutable data structures end up having a lot of costs that add up over time due to the large amount of extra pointer chancing and fragmentation. So a few dense copies might actually be faster than using an immutable data structure.
Of course, there is a line here where at some point the number of copies ends up costing more than the cost of having an immutable data structure. Generally well written code can avoid essentially all copies. When that happens, we get significantly more perf than an immutable data structure could ever hope to accomplish.
Of course roc is young and we still have many more sharp edges. For example, we currently have some known refcounting cases that totally destroy performance.
sure, what you say is true, programs built around immutable data-structures aim mainly for correctness, rather than performance, however when you throw multi-threading into the mix the scales starts to shift.
it actually more costly to copy or even lock/unlock a mutable data-structure across many threads
anyway, I get your point - i don't want to side-track this thread
No worries side tracking is great for discussion. And yeah, multithreading can get a lot more complex. Currently Roc has not really touched that world. Anything that exist today and is multithread (like basic webserver) basically spawn N unique roc threads with no way to share or communicate.
oh i see
well if you like discussion, I personally am 100% sold on immutability...of course, i do understand the performance cost, but I value the benefits so much more...and by the way, opportunistic mutation, can still be available.
in fact with immutable data the opportunity for opportunistic mutation is much greater :)
I'm curious the use case for a shared across threads immutable data structures. As in, if it is truly immutable data, then it really doesn't matter how it is stored. It is read only. If it needs global mutation that will be visible to all threads, I don't think immutable data structures allow for that. I guess you could modify then data structure in one thread and the message the updated data structure to all other threads? (that sounds like a big synchronization problem)
Generally speaking, I don't see many use cases where you have a global structure that every thread will have their own modified version of.
immutable data is and stays immutable forever - 'changing' them produces a new value, so it doesn't matter which function or thread you're in. You received an immutable vector as a param? You can be confident that it's not going to change under your feet...
Yes, they are additive.
now, if the opportunistic mutation bit confuses you, have a look at this
basically, you can create a 'transient' version of them in constant time, mutate it to death, and 'commit' it
I guess the general update structure would be that you are passed in the most up to date version of the immutable data structure. Then you can return a list of modifications and they will be synchronously applied to the data structure such then other threads can later see the change?
no no no
there are no updates that other threads can see
in fact there is no update-in-place
i see what confuses you
I know that none of it is inplace. Just trying to understand how this fixes shared information across threads
Generally speaking, there aren't many problems where I want every thread to have their own unique view of data. Generally at some point, I want the global view to be updated
sharing information across threads is done by modelling time
and time is best modelled by immutable data
so that you can take stable snapshots
this is the most concise sentence i can give you. For more information if that topic intrigues you, you really want to look at clojure's reference types
these are constructs that model time
Can we get concrete? What is a simple example of a threaded use case that benefits from immutable data structures?
there are 4 of them with different semantics - e.g. synchronous, asynchronous, transactional etc
the benefit is that you don't mutate the data-structure - you mutate the 'box' around it, and that is managed and comes with specific semantics and guarrantees that it couldn't make if the data was mutable
this may not sound like much but it's actually profound, because as I said, it lets you model time, which is super important in a multi-threads situation
there is no time in a pure function but there is time in real programs
Ah, looking at clojure.core/ref
in closure and clojure.core/dosync
is making it clearer how closure decides to model things.
If I had a shared cache that was an immutable data structure represented as a ref. To read from the cache would be to deref and just directly use the data structure. To update the cache would be to additively change the data structure (because it is an immutable data structure that does not change inplace), then update all pointers to the data structure to the pointer to the new version. This can be done in an atomic commit to avoid other forms of thread synchronization.
let me give you the simplest possible example - i will presume total unfamiliarity with clojure
exactly! you picked the hardest one though which is rarely used
the simplest one is the atom
, which the standard way of achieving mutation
atom comes with two guarrantees - atomicity + synchronicity (i.e. blocking)
Are atoms limited to basic values like numbers or do they allow for more complex types?
anything immutable goes
ok
interesting
hasmaps, vectors etc
deeply nested whatever
they cost nothing to comapre
in clojure, you really have to go out of your way to mutate something w/o some sort of synchronisation
there is a way, but is more like the Unsafe
in java
highly discouraged
Yeah, interesting that the closure swap for atoms takes a function.
but guess what? very rarely you end up needing these types
what else could it take? a function of the current value
But I guess makes sense cause it can linearize calls for synchronoziation and avoid race conditions.
exactly
I think most of this could be represented in roc with a immutable data structure library built on tags. The only special part is that the atom/ref functions would be required to be implemented in the platform.
So I think roc could have both.
interesting
just to clarify what i meant by modelling time
the future is a function of the past, right?
so you will notice that all reference types in clojure have some sort of update function associated with them, but they all look the same same in terms of function-signature
they all take a function which accepts the current value (the past), and will produce the next value (the future)
they all have different semantics with respect to their operation, but they all obey the epochal time model i mentioned above
once you have a time model, multi-threading is a breeze
hope that makes sense
by the way, the stuff i said about transients earlier, don't apply to multi-threading at all! Mutating a transient value should only ever be done in one thread and should almost always be scoped - i.e. not leak it for further mutation upstream, although exceptions can exist.
Yeah totally does.
Looking some at the underlying implementations, two main costs come to mind:
yes indeed lock contention is always the problem isn't it? :) that's exactly why modelling time in a pluggable fashion is so important
compare the difference between atom
VS agent
- different approaches: cas VS queueing
This isn't technically lock contention, but yeah, just noting the edge cases.
and general tradeoffs
for sure - anything 'managed' comes with costs
but i've personally come to conclude that 'data' and 'time' and pretty important aspects of programs
Yep. I think roc's current setup is likely to match agent's better. We already live in what feels like a fully async world.
For example in a webserver, if you want a global cache. That may work completely outside of roc only in the platform. Can be implemented in anyway the platform feels is best. Async with queuing, read write mutex, etc.
cool
So I think for a solid set of problems, the platform can define this api in a way that the roc author doesn't have to think how it is implemented. The platform author decides the transaction model.
and underlying way of implementation.
Definitely means that there would be less consistency from platform to platform though. So will be really interesting to see how this matures as roc grows.
Last updated: Jul 06 2025 at 12:14 UTC