I'm trying my hand at writing a custom RocStr implementation and I'm wondering about a few things
Small strings
If I understand correctly, when a string is <= 15 bytes, the first 15 bytes contain the string data, the last byte contains the metadata and the "small string flag" is the most significant bit of the metadata (= the last byte).
However, the implementation of RocStr.is_small_str takes the length (= the last 8 bytes) and checks its most significant bit (through a usize -> isize conversion), which should be part of the string, not the metadata. The method seems to work though, what am I missing?
Not dropping RocStr in Rust
The RocStr in roc_std implements the Drop trait, which frees the memory when the RocStr goes out of scope in Rust and if the ref count is 1. This leads to the use of core::mem::forget in host code.
I'm trying out a version without Drop, which does nothing with ref count / memory - unless you want to store the value. In this case, there's MyRocStr.incr_ref_count. Here's the implementation:
https://github.com/erwinkn/roc-platform-experiments/blob/main/platform/src/lib.rs#L84-L97
The host also stores all the strings printed by the Roc application until the end of the program and prints them again. If I don't increment the ref count, I see some garbage data, so the memory has been freed. If I increment the ref count, the string is kept alive, so the idea seems to be working.
Are there problems with this approach? I'm still not sure I understand why RocStr in roc_std has a Drop implementation, so I may be missing something
If I wasn't in an airport, I would try and help. I'll try to look over this tomorrow if someone hasn't already figured it out.
Also, implementation uses the last byte cause little endianness.
So when you flip the byte order, the flag bit is in the same location as the sign bit of an isize
I think the endianness is what I was missing, I was also confused by that code
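To make the endianness point concrete, here is a minimal sketch. It assumes the 16-byte layout described at the top of this thread (15 data bytes, then one metadata byte whose top bit is the flag); the names make_small, length_field, is_small_by_sign and is_small_by_mask are made up for illustration and are not the roc_std API. It reads the last 8 bytes as a little-endian integer, so the last byte in memory becomes the most significant byte, and its top bit becomes the sign bit.

```rust
/// Flag stored in the most significant bit of the last byte (assumed layout).
const SMALL_FLAG: u8 = 0b1000_0000;

/// Build a hypothetical 16-byte small string: data first, metadata byte last.
fn make_small(s: &str) -> [u8; 16] {
    assert!(s.len() <= 15);
    let mut bytes = [0u8; 16];
    bytes[..s.len()].copy_from_slice(s.as_bytes());
    bytes[15] = SMALL_FLAG | s.len() as u8;
    bytes
}

/// Read the last 8 bytes as a little-endian 64-bit "length" field.
/// On little-endian, byte 15 in memory is the most significant byte.
fn length_field(bytes: &[u8; 16]) -> u64 {
    let mut tail = [0u8; 8];
    tail.copy_from_slice(&bytes[8..16]);
    u64::from_le_bytes(tail)
}

/// Sign check: the value is negative as i64 exactly when the top bit is set.
fn is_small_by_sign(bytes: &[u8; 16]) -> bool {
    (length_field(bytes) as i64) < 0
}

/// Equivalent check that masks the last byte directly.
fn is_small_by_mask(bytes: &[u8; 16]) -> bool {
    (bytes[15] & SMALL_FLAG) != 0
}

fn main() {
    let small = make_small("hello");
    assert!(is_small_by_sign(&small));
    assert!(is_small_by_mask(&small));

    // A "big" string stores a plain length here, so the top bit stays clear.
    let mut big = [0u8; 16];
    big[8..16].copy_from_slice(&1000u64.to_le_bytes());
    assert!(!is_small_by_sign(&big));
    assert!(!is_small_by_mask(&big));
}
```

Both checks agree because the flag bit of the metadata byte and the sign bit of the length field are the same physical bit on a little-endian target.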
the Drop implementation is needed because calling roc's main function could return a string to the host. The host then needs to clean up that string, otherwise we would leak memory. RocStr owns the allocation
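A toy sketch of that ownership argument, under stated assumptions: ToyOwnedStr and released_after_scope are hypothetical names, and a boolean flag stands in for the real allocation, so this is not how roc_std is implemented. It only shows that a Drop impl cleans up when the handle leaves scope, and that mem::forget opts out of that cleanup.

```rust
use std::cell::Cell;
use std::mem;

/// Toy stand-in for an owning string handle (not the real roc_std type).
/// `freed` records whether the pretend allocation was released.
struct ToyOwnedStr<'a> {
    freed: &'a Cell<bool>,
}

impl Drop for ToyOwnedStr<'_> {
    fn drop(&mut self) {
        // A real owning type would decrement the refcount here and
        // deallocate once nothing else refers to the string.
        self.freed.set(true);
    }
}

/// Returns true if the pretend allocation was released once the handle
/// left scope, false if it was leaked.
fn released_after_scope(forget_it: bool) -> bool {
    let freed = Cell::new(false);
    {
        let s = ToyOwnedStr { freed: &freed };
        if forget_it {
            // Skips Drop: the host hands cleanup back to someone else.
            mem::forget(s);
        }
        // Otherwise `s` is dropped here, at the end of the scope.
    }
    freed.get()
}

fn main() {
    assert!(released_after_scope(false)); // Drop ran, memory released
    assert!(!released_after_scope(true)); // forgotten, nothing released
}
```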
(I wonder if for rust we should instead call it RocString, and we could then make a RocStr type that borrows its contents (so it's used as &RocStr in practice)
Catching up a bit late, sorry!
Brendan Hansknecht said:
Also, implementation uses the last byte cause little endianness.
Wow that's a really smart trick! I guess this is for performance reasons vs using a XOR to retrieve the flag bit, if it were somewhere else?
Folkert de Vries said:
(I wonder if for rust we should instead call it RocString, and we could then make a RocStr type that borrows its contents (so it's used as &RocStr in practice)
One thing I've done in Rust is to add a phantom marker on RocStr to give it a lifetime, and add a method that increases refcount and returns an owned type without lifetime to store it.
You can still store the RocStr without increasing ref count by explicitly giving it a 'static lifetime, but at this point you're saying "trust me, I got this", so I think it's fine
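Roughly, the phantom-marker idea could look like the sketch below. Everything here is an assumption made for illustration: ToyAlloc, BorrowedStr, OwnedStr and to_owned_handle are invented names, and the refcount is a plain Cell rather than Roc's real refcount layout. The borrowed view carries a lifetime through PhantomData and never touches the refcount; storing the string means explicitly asking for an owned handle, which bumps the count.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Toy stand-in for a Roc allocation: a refcount next to the data.
struct ToyAlloc {
    refcount: Cell<usize>,
    text: String,
}

/// Borrowed view. The raw pointer carries no lifetime by itself,
/// so the phantom marker ties the view to 'a.
struct BorrowedStr<'a> {
    ptr: *const ToyAlloc,
    _marker: PhantomData<&'a ToyAlloc>,
}

/// Owned handle without a lifetime. In this toy it is only sound to use
/// while the ToyAlloc is still alive; a real version would own heap memory.
struct OwnedStr {
    ptr: *const ToyAlloc,
}

impl<'a> BorrowedStr<'a> {
    fn new(alloc: &'a ToyAlloc) -> Self {
        BorrowedStr { ptr: alloc, _marker: PhantomData }
    }

    fn as_str(&self) -> &'a str {
        // Safe in this toy: the pointer was created from a reference valid for 'a.
        let alloc: &'a ToyAlloc = unsafe { &*self.ptr };
        alloc.text.as_str()
    }

    /// Increment the refcount and hand back a handle that can be stored.
    fn to_owned_handle(&self) -> OwnedStr {
        let alloc: &ToyAlloc = unsafe { &*self.ptr };
        alloc.refcount.set(alloc.refcount.get() + 1);
        OwnedStr { ptr: self.ptr }
    }
}

impl OwnedStr {
    fn refcount(&self) -> usize {
        // Toy assumption: the allocation outlives this handle.
        let alloc: &ToyAlloc = unsafe { &*self.ptr };
        alloc.refcount.get()
    }
}

fn main() {
    let alloc = ToyAlloc { refcount: Cell::new(1), text: String::from("hi") };
    let borrowed = BorrowedStr::new(&alloc);
    assert_eq!(borrowed.as_str(), "hi");
    assert_eq!(alloc.refcount.get(), 1); // borrowing leaves the count alone

    let owned = borrowed.to_owned_handle();
    assert_eq!(owned.refcount(), 2); // storing costs one increment
}
```

With this shape, the compiler rejects attempts to stash a BorrowedStr somewhere that outlives the call, which is the "you have to ask for ownership" behavior described above.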
The point of not owning the RocStr that's passed as argument to a function exposed to Roc code vs owning a RocString returned from application code applies to any language though, right?
Erwin Kuhn said:
Brendan Hansknecht said:
Also, implementation uses the last byte cause little endianness.
Wow that's a really smart trick! I guess this is for performance reasons vs using a XOR to retrieve the flag bit, if it were somewhere else?
I mean, an optimizing compiler like LLVM would probably make them equivalent, but it theoretically saves an instruction or two. Instead of an XOR and a compare to a value, it is just a direct compare to a value.
Last updated: Jul 06 2025 at 12:14 UTC