Stream: compiler development

Topic: simplify refcounting


Brendan Hansknecht (Jan 01 2025 at 01:17):

I am gonna simplify the refcounting in Roc. Initially, we made the max negative value be the lowest refcount. That is unnecessary and annoying for platform implementations.

Instead, I plan to switch things over to the more intuitive implementation: 0 is constant, 1 through n are refcounts 1 through n, and if the highest bit is set, it means that refcounting needs to be atomic.

As we realized a while back, with a 64-bit refcount, it is impossible to roll over the refcount counter. Each reference requires at least 64 bits of memory, so the machine is guaranteed to OOM before it can run out of refcounts and roll the counter over.

Brendan Hansknecht (Jan 01 2025 at 01:17):

So instead of -max_int to -1, it is simply 1 to max_int.
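
As a rough illustration of the proposed encoding, here is a sketch in Rust (the constant and helper names are made up for this example; it is not the actual compiler code):

const ATOMIC_BIT: u64 = 1 << 63;

// 0            => constant: opted out of refcounting, never deallocated
// 1..=i64::MAX => live refcount, updated non-atomically
// high bit set => live refcount in the low 63 bits, updated atomically
fn describe_refcount(rc: u64) -> &'static str {
    if rc == 0 {
        "constant"
    } else if rc & ATOMIC_BIT != 0 {
        "atomic"
    } else {
        "normal"
    }
}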

Brendan Hansknecht (Jan 01 2025 at 01:18):

I think this will just make some future refcount-related changes easier.

Brendan Hansknecht (Jan 01 2025 at 01:18):

This will be a platform-breaking change, but it should be pretty minimal to fix.

Richard Feldman (Jan 01 2025 at 01:26):

makes sense! :+1:

Brendan Hansknecht (Jan 01 2025 at 01:28):

As an extra note, I am still leaving everything atomic off. Though hopefully it can eventually be made configurable in the platform. It might be interesting to test with the basic-webserver Model eventually.

Brendan Hansknecht (Jan 02 2025 at 05:38):

I continue to be amazed by how nice the M1 Mac's cache hierarchy and CPU are. For the nqueens benchmark (which basically spends all its time allocating, refcounting, and freeing), here are the stats:

For the Apple M1 (normal refcounting as the baseline, so 1.00x):
branch check for atomic, followed by normal refcount: 1.07x
always atomic refcount: 1.27x

vs. on my x86 Linux machine (baseline still 1.00x):
branch check for atomic, followed by normal refcount: 1.15x
always atomic refcount: 2.67x :poop:

There is no contention; this is a single-threaded app. Yet always using atomic refcounts balloons the runtime to 2.67x.

Also, in raw execution speed, the M1 machine's baseline runs just under 2x faster than the x86 Linux baseline. The Linux machine, albeit a few years old, is a reasonable gaming machine with an i7.

Oskar Hahn (Jan 02 2025 at 12:47):

Is using 0 as constant safe? If you have data with a refcount of 1 and it gets decremented, it becomes 0. It seems error-prone that 0 is used both for "never deallocate" and for "should be deallocated".

Oskar Hahn (Jan 02 2025 at 12:50):

For platform development, it would be nice if Roc could export functions to set the refcount.

Oskar Hahn (Jan 02 2025 at 12:52):

It would also be nice if, when the refcount is set to "constant", Roc would not write to the data or to the refcount. The effect would be that you could call a Roc function with constant data in parallel without the need to use atomic refcounts.

Brendan Hansknecht (Jan 02 2025 at 13:51):

I think 0 is reasonable. All Roc refcounting checks whether the old value is 1, rather than whether the new value is 0. That said, if you subtract first and then check whether the new value is 0, that still tells you that you can deallocate.

Because 0 refcounts are often in read-only memory, you always have to early-exit on 0 before doing anything else.

Also, for atomics, you only get the option to check the old value, not the new value.
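
To make the old-value check concrete, here is a minimal decrement sketch assuming the encoding above (the names and the exact atomic ordering are assumptions, not the real builtin):

use std::sync::atomic::{AtomicU64, Ordering};

const ATOMIC_BIT: u64 = 1 << 63;

// Returns true when the caller should deallocate the allocation.
unsafe fn decref(rc_ptr: *mut u64) -> bool {
    let rc = unsafe { *rc_ptr };
    if rc == 0 {
        // Constant data may live in read-only memory: early-exit, never write.
        return false;
    }
    if rc & ATOMIC_BIT != 0 {
        // fetch_sub hands back the OLD value, which is why the convention is
        // "was the old count 1?" rather than "is the new count 0?".
        let old = unsafe { AtomicU64::from_ptr(rc_ptr) }.fetch_sub(1, Ordering::Release);
        return (old & !ATOMIC_BIT) == 1;
    }
    // Plain non-atomic path, also phrased as an old-value check.
    unsafe { *rc_ptr = rc - 1 };
    rc == 1
}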

Brendan Hansknecht (Jan 02 2025 at 13:52):

Yeah, I think Roc should export refcounting functions.

Brendan Hansknecht (Jan 02 2025 at 13:56):

And yeah, the constant refcount should also automatically propagate like the atomic refcount. That way, a box set to a constant refcount will automatically propagate it to the individual elements in the box on load.

Brendan Hansknecht (Jan 02 2025 at 13:58):

Note: as long as all refcounts are set to constant correctly, you can already share data across multiple threads in roc without atomics

Oskar Hahn (Jan 02 2025 at 14:11):

Brendan Hansknecht said:

Note: as long as all refcounts are set to constant correctly, you can already share data across multiple threads in roc without atomics

Is it enough to set the refcount of the outermost value to constant, or does every piece of referenced data need a constant refcount to be thread-safe?

For example, say you have a type like List Str, for example ["not_a_short_str1"], and the app code does something like this:

main : List Str -> U64
main = \list ->
  list
  |> List.map Str.trim
  |> List.len

Would it be enough to set the refcount of the list to constant, or would every element in the list need a refcount of 0, so that Roc can be called in parallel without atomic refcounts?

I think that all elements need a refcount of 0 for the call to be thread-safe. If this is true, would it be possible for Roc to export a function like set_refcount_with_references(data, n) that sets the refcount of all elements in data to n? This would be especially useful if the platform uses a generic Model and therefore does not know what type the data is.

Brendan Hansknecht (Jan 02 2025 at 14:31):

I want to make it propagate lazily on load.

Brendan Hansknecht (Jan 02 2025 at 14:31):

So the outer list is set to constant

Brendan Hansknecht (Jan 02 2025 at 14:31):

On List.get, the loaded data will get set to constant.

Brendan Hansknecht (Jan 02 2025 at 14:32):

Same with loading data from a model via Box.unbox

Brendan Hansknecht (Jan 02 2025 at 14:32):

This way it is cheap to convert data to constant or atomic.

Oskar Hahn (Jan 02 2025 at 14:56):

Will this be thread-safe? If the function is called many times at the same time, Roc would "set" the same memory to 0 many times, which does not sound safe, or would need atomics.

Brendan Hansknecht (Jan 02 2025 at 15:07):

What is unsafe about that? They are all setting it to the same value of 0, so all threads will opt out of refcounting the data and never deallocate it. Even if the location is written to 10 times, the value each thread sees is still 0.

Brendan Hansknecht (Jan 02 2025 at 15:11):

I guess if a platform mixes constants with atomics, we do have to be a bit more careful, but most likely that would be a platform bug. Anything set to constant should be thread-local, and anything set to atomic should not be constant.

Brendan Hansknecht (Jan 02 2025 at 15:12):

If we allowed promoting an atomic that is already shared between multiple threads to constant, it would mean we would need even slower atomic refcounting.

Oskar Hahn (Jan 02 2025 at 15:14):

You are the expert :smiley: I thought that, for some reason that I don't understand, it is not thread-safe to read and write memory at the same time, even if you are writing the same data. If two threads write a value to the same memory at the same time, it is undefined what the result will be. But I don't know the source of this claim.

Brendan Hansknecht (Jan 02 2025 at 15:16):

The big concern, generally speaking, is that another reading thread may not see the update immediately. That is why each thread has to write 0 until eventually all threads see the data as 0.

Oskar Hahn (Jan 02 2025 at 15:19):

What do you mean by a constant value having to be thread-local?

I want to use a constant value across many threads. Will this be possible? The platform/host will make sure that all of the parallel calls are only for reading. If a function is called that wants to write the data, it first waits (with an RWLock) until all reading is done, then sets the refcount of the data to 1, and when Roc is finished, sets the refcount back to 0.

Oskar Hahn (Jan 02 2025 at 15:23):

Your proposal is that 0 is constant, and when the highest bit is set to 1, it means atomic.

This is the same as saying that positive numbers are non-atomic and negative numbers are atomic. With this explanation it is clear that a constant value can never be atomic, since there is no negative zero in two's complement.

Brendan Hansknecht (Jan 02 2025 at 15:30):

I need to think about the edge cases more. I was initially only thinking of propagating the atomic flag (which is always safe). It would also be nice to propagate constants (just set a list to constant, and its data will be set to constant on load). Propagating constants is only safe if the constant data contains no atomic data (i.e. it only contains thread-local data). Once set to constant, the data can be shared with any number of threads, but converting a list of atomic strings into a list of constant strings is not safe.

Brendan Hansknecht (Jan 02 2025 at 15:30):

Oskar Hahn said:

I want to use a constant value across many threads. Will this be possible? The platform/host will make sure that all of the parallel calls are only for reading. If a function is called that wants to write the data, it first waits (with an RWLock) until all reading is done, then sets the refcount of the data to 1, and when Roc is finished, sets the refcount back to 0.

In this case, I would just use atomic refcounting

Brendan Hansknecht (Jan 02 2025 at 15:30):

with the RWLock still

Brendan Hansknecht (Jan 02 2025 at 15:32):

I guess, for safety when converting a list of atomic strings to a list of constants, I could avoid propagating the constant refcount unless the atomic refcount is unique. That would be thread-safe without pessimizing atomic refcounts elsewhere.
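
As a sketch of what that lazy propagation rule could look like on load (all names here are hypothetical, and the real logic would live in the compiler's lowering of List.get / Box.unbox, not in a standalone function like this):

const ATOMIC_BIT: u64 = 1 << 63;

// Called when an element is loaded out of a container, so flags spread lazily.
unsafe fn propagate_flags_on_load(container_rc: u64, elem_rc_ptr: *mut u64) {
    let elem_rc = unsafe { *elem_rc_ptr };
    if elem_rc == 0 {
        // The element is already constant: nothing to do.
        return;
    }
    if container_rc == 0 {
        // Constant container: freeze the element too, but only when that cannot
        // race with another thread, i.e. the element is non-atomic (thread-local)
        // or atomic-but-unique.
        if (elem_rc & ATOMIC_BIT) == 0 || elem_rc == (ATOMIC_BIT | 1) {
            unsafe { *elem_rc_ptr = 0 };
        }
    } else if (container_rc & ATOMIC_BIT) != 0 && (elem_rc & ATOMIC_BIT) == 0 {
        // Atomic container: make sure the element is refcounted atomically as well.
        unsafe { *elem_rc_ptr = elem_rc | ATOMIC_BIT };
    }
}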

