The kingfisher platform exports this function to the host: `Request, Box Model -> Response`. It takes a `Model` but does not return a `Model`.
This call reduces the refcount of `Model` by one. Since a refcount of 0 would mean that the value gets deallocated, the host has to set the refcount of `Model` to 2 before calling roc.
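To make the workaround concrete, here is roughly what I do on the host side (a simplified Rust sketch; I am assuming the refcount is the `usize` stored directly in front of the data the `Box` points to, and `set_refcount` is just an illustrative helper, not part of any platform API):

```rust
/// Illustrative helper: overwrite the refcount word of a Roc heap value.
/// Assumption: the refcount lives in the usize immediately preceding the
/// data that the Box pointer points to.
unsafe fn set_refcount(boxed_model: *mut u8, value: usize) {
    let refcount_ptr = (boxed_model as *mut usize).sub(1);
    refcount_ptr.write(value);
}
```

The host does this before each call into roc, so the decrement done by roc never brings the count down to the point where the value is freed.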
When this is done in parallel, a segmentation fault happens, probably because the host and roc try to manipulate the same memory at the same time.
A workaround is to set the refcount to a very high value any time `Model` gets manipulated. But this would mean the platform could only support that many read requests.
I thought I read somewhere that when the refcount is set to a magic number, roc treats it as infinity and does not manipulate it. But I don't know where I read this or what that magic number is.
Is there a magic refcount number that tells roc not to manipulate the refcount?
If not, is there another way to solve this?
@Brendan Hansknecht has done some atomic refcounting work
When this is done in parallel, a segmentation fault happens, probably because the host and roc try to manipulate the same memory at the same time.
Yeah, our refcounting is not thread-safe currently. Also, simply updating the box refcount may not be enough. It might recurse and update the refcounts of things inside the box, but I'm not sure.
Is there a magic refcount number that tells roc not to manipulate the refcount?
Yes, I believe it is `-1` as `usize`, but I would need to double-check.
Thank you. The confirmation that there is a magic number helped me to find it.
After playing around with different numbers, I think the magic number is `0`.
As far as I know, the highest bit of a refcount has to be set to 1.
When I set the value to 2^63 + 1 (`(1 << 63) + 1`; the highest and the lowest bit are 1, all other bits are 0), the refcount gets reduced by one (from 9223372036854775809 to 9223372036854775808). If I call the function again without modifying the new refcount, it changes to a random number, probably because the memory is now invalid. If I call the function a third time without modifying the refcount, I get a segmentation violation, since `Model` was deallocated.
When I set the refcount to `1` (the lowest bit is 1, all other bits are zero, even the highest bit) and I call the function, the refcount changes to `0`. If I call the function again, the value stays at `0`, and `Model` does not get deallocated. So setting the refcount to `0` does what I want.
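In case someone wants to reproduce this, the probes can be as simple as the following (same layout assumption as in the sketch above; both helpers are only illustrative):

```rust
/// Read the refcount word back after a call to see whether roc touched it.
/// Assumption: the refcount is the usize directly before the boxed data.
unsafe fn read_refcount(boxed_model: *const u8) -> usize {
    (boxed_model as *const usize).sub(1).read()
}

/// Pin the refcount to 0; in my tests roc then neither decrements the count
/// further nor deallocates the value.
unsafe fn pin_refcount(boxed_model: *mut u8) {
    (boxed_model as *mut usize).sub(1).write(0);
}
```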
Awesome
I guess I forgot the number
we've talked about this in various places; I wonder if the time has come to figure out how we want to do this
My current vote is roughly: use a bit to decide whether or not to do atomic refcounting (this feature is off by default and must be enabled by a compile-time or platform flag). Only do atomic refcounting if the platform sets that bit, because the only way for data to be shared between threads is through the platform. When sharing, it can set the bit.
I like that design!
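To picture the proposal (purely illustrative; this is not how roc works today, and the bit position and helper name are made up):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical: one bit of the refcount word marks "may be shared across
// threads". The platform sets it before handing data to another thread.
const SHARED_BIT: usize = 1 << 62;

unsafe fn increment_refcount(refcount_ptr: *mut usize) {
    if *refcount_ptr & SHARED_BIT != 0 {
        // Shared across threads: atomic read-modify-write.
        (*(refcount_ptr as *const AtomicUsize)).fetch_add(1, Ordering::Relaxed);
    } else {
        // Not shared: the plain, cheaper non-atomic update roc does today.
        *refcount_ptr += 1;
    }
}
```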
I have another bug when calling roc in parallel. I don't understand what is going on or whether it is related to refcounting. Maybe you have an idea what the cause is or how I could debug it?
First, the host calls a roc function that returns a `Box Model`. `Model` has type `List Str`. If `Model` is `Str`, everything works.
Then, the host calls the function from above (`Request, Box Model -> Response`). If the function is called sequentially, there is no problem. But if it is called many times in parallel (around 10,000 calls "at the same time"), the memory gets corrupted. For example, if `Model` was `["hello"]`, then after the run it is something like `[ �GM��]`.
Could it be that the `Str` inside the `List` is refcounted and roc manipulates that refcount?
I tried to debug this by looking at the memory that gets allocated with `roc_alloc`.
After calling the function that returns the `Model`, roc has allocated the following values:
0x5b3d0ea5c930: [0 0 0 0 0 0 0 128 8 202 165 14 61 91 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
0x5b3d0ea5ca00: [0 0 0 0 0 0 0 128 87 111 114 108 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 133]
The first 8 bytes of each value are the refcount. When you know how the roc types are laid out in memory, you can see that the second value is a short string with the ASCII values for "World". From the two 1s in the first value, you can see that it could be a list with one element and a capacity of 1.
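To spell out how I read the first dump (assuming roc's usual `List` layout of pointer/length/capacity behind the refcount, and the small-string encoding where the high bit of the final byte marks an inline `Str`), here is a small standalone decode:

```rust
// Decode of the dump at 0x5b3d0ea5c930 / 0x5b3d0ea5ca00 shown above.
fn main() {
    let bytes_at_c930: [u8; 32] = [
        0, 0, 0, 0, 0, 0, 0, 128,      // refcount (high bit set)
        8, 202, 165, 14, 61, 91, 0, 0, // element pointer, little endian
        1, 0, 0, 0, 0, 0, 0, 0,        // length = 1
        1, 0, 0, 0, 0, 0, 0, 0,        // capacity = 1
    ];
    let elem_ptr = u64::from_le_bytes(bytes_at_c930[8..16].try_into().unwrap());
    // Prints 0x5b3d0ea5ca08, i.e. 0x5b3d0ea5ca00 + 8: it points right past the
    // refcount of the second allocation, where the single Str element lives.
    println!("element pointer = {elem_ptr:#x}");

    let bytes_at_ca00: [u8; 32] = [
        0, 0, 0, 0, 0, 0, 0, 128,        // refcount of the element storage
        87, 111, 114, 108, 100, 0, 0, 0, // "World" plus padding...
        0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 133,        // 133 = 0x85 = small-string flag | length 5
    ];
    let str_elem = &bytes_at_ca00[8..32];     // the 24-byte inline Str element
    let len = (str_elem[23] & 0x7f) as usize; // low bits of the last byte
    let text = std::str::from_utf8(&str_elem[..len]).unwrap();
    println!("inline Str = {text:?}");        // prints "World"
}
```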
After calling the second function, the same two values look like this:
0x5b3d0ea5c930: [254 255 255 255 255 255 255 255 8 202 165 14 61 91 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
0x5b3d0ea5ca00: [0 0 0 0 0 0 0 128 87 111 114 108 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 133]
So the values are the same, but the refcount of the list has changed to a very big number. When the function is called again, the refcount of the list gets reduced by one.
Should roc change the refcount of the list? If so, it should probably not be such a high number but something like `0`, `1`, or the magic infinity. Is this a bug in roc?
By the way: it would be nice if `roc_alloc` had a debugging argument. For example, a pointer to a struct that contains the line number of the roc code that triggered the `alloc` call and a type ID of the value that gets allocated. In optimized builds, the argument could be a null pointer.
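Roughly what I am imagining (the first signature is the one hosts implement today; `RocAllocDebugInfo` and `roc_alloc_debug` are purely hypothetical, just to illustrate the suggestion):

```rust
use std::alloc::{alloc, Layout};
use std::ffi::c_void;

// What a host implements today (simplified sketch).
#[no_mangle]
pub unsafe extern "C" fn roc_alloc(size: usize, alignment: u32) -> *mut c_void {
    let layout = Layout::from_size_align(size, alignment as usize).unwrap();
    alloc(layout) as *mut c_void
}

// Hypothetical debug info and extended entry point -- not an existing roc API,
// only a way to picture the suggestion. In optimized builds the compiler would
// pass a null pointer here.
#[repr(C)]
pub struct RocAllocDebugInfo {
    pub line: u32,    // line of the roc code that triggered the allocation
    pub type_id: u32, // type ID of the value being allocated
}

#[no_mangle]
pub unsafe extern "C" fn roc_alloc_debug(
    size: usize,
    alignment: u32,
    debug: *const RocAllocDebugInfo,
) -> *mut c_void {
    if !debug.is_null() {
        eprintln!(
            "roc_alloc: {size} bytes (align {alignment}), line {}, type {}",
            (*debug).line,
            (*debug).type_id
        );
    }
    roc_alloc(size, alignment)
}
```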
Could it be that the `Str` inside the `List` is refcounted and roc manipulates that refcount?
Yes, exactly that
We need to remove this kind of recursive refcounting
I am planning to fix it for list (mostly done minus final tests)
We probably also need to fix it separately for recursive tags and `Box`.
Oh also, it is probably freeing the list, not the string
Cause the string is small and would not get freed
Brendan Hansknecht said:
I am planning to fix it for list (mostly done minus final tests)
This sounds great. Is it the `list-size-on-heap` branch? I tried it and it fails with `munmap_chunk(): invalid pointer`.
It is.
And yeah, unsurprising. Though more of the code is hopefully done, it definitely has bugs