Stream: beginners

Topic: RocStr details


Erwin Kuhn (Dec 17 2021 at 17:25):

I'm trying my hand at writing a custom RocStr implementation and I'm wondering about a few things

Small strings

If I understand correctly, when a string is <= 15 bytes, the first 15 bytes contain the string data, the last byte contains the metadata and the "small string flag" is the most significant bit of the metadata (= the last byte).

However the implementation of RocStr.is_small_str takes the length (= the last 8 bytes) and checks its most significant bit (through a usize -> isize conversion), which should be part of the string, not the metadata. The method seems to work though, what am I missing?

Not dropping RocStr in Rust

The RocStr in roc_std implements the Drop trait, which frees the memory when the RocStr goes out of scope in Rust, provided the ref count is 1. This leads to the use of core::mem::forget in host code.

I'm trying out a version without Drop, which does nothing with the ref count or the memory unless you want to store the value. In that case, there's MyRocStr.incr_ref_count. Here's the implementation:
https://github.com/erwinkn/roc-platform-experiments/blob/main/platform/src/lib.rs#L84-L97

The host also stores all the strings printed by the Roc application until the end of the program and prints them again. If I don't increment the ref count, I see some garbage data, so the memory has been freed. If I increment the ref count, the string is kept alive, so the idea seems to be working.

Are there problems with this approach? I'm still not sure I understand why RocStr in roc_std has a Drop implementation, so I may be missing something

Brendan Hansknecht (Dec 17 2021 at 18:50):

If I weren't in an airport, I would try to help. I'll try to look this over tomorrow if no one has figured it out by then.
Also, the implementation uses the last byte because of little-endianness.

Brendan Hansknecht (Dec 17 2021 at 18:54):

So when you flip the byte order, the flag bit is in the same location as the sign bit of an isize
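
To make the byte-order point concrete, here is a minimal sketch of the trick. It assumes a 16-byte string on a 64-bit little-endian target, and it assumes the small-string length sits in the low 7 bits of the last byte. The names `make_small`, `length_field`, `is_small_str` and `small_len` are illustrative and are not the roc_std API.

```rust
/// Most significant bit of the metadata byte (the last of the 16 bytes).
const SMALL_FLAG: u8 = 0b1000_0000;

/// Pack up to 15 bytes of string data plus one metadata byte.
/// Assumption: the metadata byte is the flag bit OR'd with the length.
fn make_small(s: &str) -> Option<[u8; 16]> {
    let data = s.as_bytes();
    if data.len() > 15 {
        return None; // too long for the inline representation
    }
    let mut raw = [0u8; 16];
    raw[..data.len()].copy_from_slice(data);
    raw[15] = SMALL_FLAG | data.len() as u8;
    Some(raw)
}

/// Interpret the last 8 bytes as a little-endian length field.
/// In little-endian order, byte 15 is the most significant byte.
fn length_field(raw: &[u8; 16]) -> u64 {
    let mut tail = [0u8; 8];
    tail.copy_from_slice(&raw[8..]);
    u64::from_le_bytes(tail)
}

/// The flag bit of byte 15 lands on the sign bit of the signed length,
/// so "is small" becomes a plain "is negative" test.
fn is_small_str(raw: &[u8; 16]) -> bool {
    (length_field(raw) as i64) < 0
}

/// Recover the small-string length by masking out the flag bit.
fn small_len(raw: &[u8; 16]) -> usize {
    (raw[15] & !SMALL_FLAG) as usize
}
```

Bytes 8 through 14 may hold string data, but they only occupy the lower bits of the length field, so they never affect the sign.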

Folkert de Vries (Dec 17 2021 at 20:33):

I think the endianness is what I was missing, I was also confused by that code

Folkert de Vries (Dec 17 2021 at 20:34):

the Drop implementation is needed because calling roc's main function could return a string to the host. The host then needs to clean up that string, otherwise we would leak memory. RocStr owns the allocation

Folkert de Vries (Dec 17 2021 at 20:35):

(I wonder if for Rust we should instead call it RocString, and we could then make a RocStr type that borrows its contents (so it's used as &RocStr in practice))

Erwin Kuhn (Dec 24 2021 at 10:51):

Catching up a bit late, sorry!

Erwin Kuhn (Dec 24 2021 at 10:53):

Brendan Hansknecht said:

Also, the implementation uses the last byte because of little-endianness.

Wow that's a really smart trick! I guess this is for performance reasons vs using an XOR to retrieve the flag bit, if it were somewhere else?

Erwin Kuhn (Dec 24 2021 at 10:55):

Folkert de Vries said:

(I wonder if for Rust we should instead call it RocString, and we could then make a RocStr type that borrows its contents (so it's used as &RocStr in practice))

One thing I've done in Rust is to add a phantom marker on RocStr to give it a lifetime, and add a method that increases refcount and returns an owned type without lifetime to store it.

You can still store the RocStr without increasing ref count by explicitly giving it a 'static lifetime, but at this point you're saying "trust me, I got this", so I think it's fine
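
A rough sketch of that phantom-lifetime idea follows. It uses `std::rc::Rc<String>` as a stand-in for a Roc-managed, ref-counted allocation, which is purely an assumption to keep the example runnable. `BorrowedStr` and `to_stored` are hypothetical names and are not taken from roc_std or the linked repository.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Borrowed view of a ref-counted string. It holds only a raw pointer;
/// the phantom marker ties it to lifetime 'a so it cannot be stored
/// past the borrow without going through `to_stored`.
#[derive(Clone, Copy)]
struct BorrowedStr<'a> {
    ptr: *const String,
    _marker: PhantomData<&'a String>,
}

impl<'a> BorrowedStr<'a> {
    /// Create a view without touching the ref count.
    fn new(source: &'a Rc<String>) -> Self {
        BorrowedStr {
            ptr: Rc::as_ptr(source),
            _marker: PhantomData,
        }
    }

    /// Read the contents for as long as the borrow is valid.
    fn as_str(&self) -> &'a str {
        // Safety: 'a guarantees the source Rc is still alive.
        let s: &'a String = unsafe { &*self.ptr };
        s.as_str()
    }

    /// Bump the ref count and return an owned handle with no lifetime,
    /// which is safe to keep after the borrow ends.
    fn to_stored(&self) -> Rc<String> {
        // Safety: the pointer refers to a live Rc allocation, and the
        // increment accounts for the new handle created by from_raw.
        unsafe {
            Rc::increment_strong_count(self.ptr);
            Rc::from_raw(self.ptr)
        }
    }
}
```

With this shape, a host function can read through the view for free, and only pays for a ref count increment when it wants to keep the string, for example `let kept = view.to_stored();`.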

Erwin Kuhn (Dec 24 2021 at 10:56):

The distinction between not owning a RocStr that's passed as an argument to a host function exposed to Roc code and owning a RocString returned from application code applies to any host language though, right?

Brendan Hansknecht (Dec 24 2021 at 22:30):

Erwin Kuhn said:

Brendan Hansknecht said:

Also, the implementation uses the last byte because of little-endianness.

Wow that's a really smart trick! I guess this is for performance reasons vs using an XOR to retrieve the flag bit, if it were somewhere else?

I mean, an optimizing compiler like LLVM would probably make them equivalent, but it theoretically saves an instruction or two. Instead of an XOR followed by a comparison against a value, it is just a direct comparison against a value.
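
The two ways of reading the flag can be put side by side in a small sketch. The explicit variant below uses an AND mask rather than the XOR mentioned above, which is an assumption about what the alternative would look like; both function names are made up for illustration.

```rust
/// Sign-based check: reinterpret the length as signed and compare to 0.
fn flag_via_sign(len: usize) -> bool {
    (len as isize) < 0
}

/// Explicit check: isolate the top bit with a mask, then compare.
fn flag_via_mask(len: usize) -> bool {
    const TOP_BIT: usize = 1 << (usize::BITS - 1);
    len & TOP_BIT != 0
}
```

Both functions return the same answer for every input, so any difference is limited to the instructions the compiler happens to emit.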


Last updated: Jul 06 2025 at 12:14 UTC