Stream: contributing

Topic: can we write RocStr as a union instead


view this post on Zulip Pei Yang Ching (Dec 03 2023 at 13:20):

the RocStr in every platform example (e.g. examples/python-interop/demo.c) has the following definition:

struct RocStr {
  uint8_t *bytes;
  size_t len;
  size_t capacity;
};

this made the small string optimization stuff in the init function a bit confusing to read. would it be possible to write it like this instead:

struct SmallStr {
  uint8_t bytes[sizeof(struct RocBytes) - 1];
  uint8_t len; // maybe this should be size_t? idk
};

union RocStr {
  struct RocBytes bytes;
  struct SmallStr smallStr;
};

I'm not that familiar with C and unions, so I'm not even sure this would work, but I just wanted to ask if anyone knows whether it's doable

also: where do these definitions come from? (RocStr, RocBytes etc.)

view this post on Zulip Brian Carroll (Dec 03 2023 at 14:36):

You're right that this would be a more accurate reflection of how RocStr works!

Out of 30 host.c files, 3 of them define RocStr. But all they're doing is finding the length and the pointer to the characters. You could refactor that to use the more detailed type but it wouldn't be a huge benefit for such a tiny amount of code. I guess the people who wrote it just didn't bother. But if you want to improve it, go for it!
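As a rough illustration of the refactor discussed above, here is a minimal sketch of a union-based RocStr that finds the length and the pointer to the characters. It assumes that struct RocBytes has the same three-field layout as the RocStr struct quoted earlier, that the target is little-endian, and that the top bit of the last byte marks a small string. The field names big and small and the helpers roc_str_is_small, roc_str_len and roc_str_chars are hypothetical, not taken from the repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumed to match the three-word struct used in the host examples.
struct RocBytes {
  uint8_t *bytes;
  size_t len;
  size_t capacity;
};

// Inline representation: every byte except the last one holds characters.
struct SmallStr {
  uint8_t bytes[sizeof(struct RocBytes) - 1];
  uint8_t len; // top bit: small-string flag, low 7 bits: length
};

union RocStr {
  struct RocBytes big;    // hypothetical member name
  struct SmallStr small;  // hypothetical member name
};

static_assert(sizeof(union RocStr) == sizeof(struct RocBytes),
              "both views must cover the same bytes");

// True when the string is stored inline (little-endian assumption).
static bool roc_str_is_small(const union RocStr *str) {
  return (str->small.len & 0x80) != 0;
}

// Length in bytes for either representation.
static size_t roc_str_len(const union RocStr *str) {
  if (roc_str_is_small(str)) {
    return str->small.len & 0x7F;
  }
  // Clear the top bit, which this thread describes as a flag bit
  // rather than part of the length.
  return str->big.len & (SIZE_MAX >> 1);
}

// Pointer to the first character for either representation.
static const uint8_t *roc_str_chars(const union RocStr *str) {
  return roc_str_is_small(str) ? str->small.bytes : str->big.bytes;
}
```

A host would call roc_str_len and roc_str_chars on the value it receives instead of reading the fields directly, so the small-string check lives in one place.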

These structures have to be defined separately in each of the languages we use.
The Roc compiler understands them internally.
The Rust definitions are in crates/roc_std
There is a Zig definition in crates/compiler/builtins/bitcode/src

Rust is the best-supported host language at the moment. We have the roc_std crate and also automatic glue code generation using roc glue and crates/glue/src/RustGlue.roc. So that's what "full support" looks like for a host language.

We would like to have the same level of support for other host languages as we do for Rust. I've started working on C++ here. Nobody is looking at C. It's quite hard once you get past RocStr because there are no generics.

Also, we would like to move away from copying and pasting the same code over and over in host examples, and centralise it instead. That has been discussed somewhere in another Zulip thread recently. Probably under "compiler development".

view this post on Zulip Pei Yang Ching (Dec 04 2023 at 20:32):

thanks for the detailed answer! I'll give it a go when I have time. reading that the first time around without knowing how small strings work made me question my eyes haha

view this post on Zulip Brendan Hansknecht (Dec 04 2023 at 20:35):

Note, it is technically a union of 3 different representations: a small string, a big string, and a seamless slice.

As such, both the highest bit of length and of capacity can be part of what determines the union values. So it isn't exactly nice to represent as a union either, due to those bits being mixed in.

view this post on Zulip Brendan Hansknecht (Dec 04 2023 at 20:35):

You'd really need 63-bit numbers in some of the union cases to make them accurate. That or, of course, mask them on all uses.
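The masking option could look roughly like the sketch below. The macro and function names are hypothetical, and the sketch assumes the flag always sits in the most significant bit of a size_t field, as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Most significant bit of a size_t (bit 63 on a 64-bit target).
#define ROC_FLAG_BIT (~(SIZE_MAX >> 1))

// Returns the flag stored in the top bit of a length or capacity field.
static bool roc_has_flag(size_t field) {
  return (field & ROC_FLAG_BIT) != 0;
}

// Returns the remaining value with the flag bit cleared.
static size_t roc_strip_flag(size_t field) {
  return field & ~ROC_FLAG_BIT;
}

// Packs a value and a flag back into a single field.
static size_t roc_with_flag(size_t value, bool flag) {
  assert(!roc_has_flag(value)); // the value must fit below the flag bit
  return flag ? (value | ROC_FLAG_BIT) : value;
}
```

C bit-fields such as `size_t len : 63` might look like an alternative, but the order in which bit-fields are allocated is implementation-defined, so explicit masks are the more portable choice.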

view this post on Zulip Brendan Hansknecht (Dec 04 2023 at 20:36):

Given this, I think that is why we just use a struct-based layout

view this post on Zulip Brendan Hansknecht (Dec 04 2023 at 20:43):

So, as packed bits on a 64-bit target, the 3 layouts are:

SmallStr : {bytes: [23; u8], constOne: u1, len: u7}

BigStr: {bytes: *u8, len: u64, cap: u64}
# Note, len and cap always have the highest bit as zero in this case

SeamlessSlice: {bytes: *u8, constOne: u1, len: u63, constZero: u1, refCountPtr: upper63BitsOfPointer}
# To get the real refcount pointer, just left shift refCountPtr by 1.
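The three layouts above can be told apart with the struct-based definition by checking the flag bits in order. The following is only a sketch: the enum, macro and function names are hypothetical, and it assumes a little-endian target so that the constOne bit of a small string lands in the top bit of the third word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct RocStr {
  uint8_t *bytes;
  size_t len;
  size_t capacity;
};

enum RocStrKind { ROC_SMALL_STR, ROC_BIG_STR, ROC_SEAMLESS_SLICE };

// Most significant bit of a size_t (bit 63 on a 64-bit target).
#define ROC_TOP_BIT (~(SIZE_MAX >> 1))

static enum RocStrKind roc_str_kind(const struct RocStr *str) {
  // Check the third word first: in a small string the middle word is
  // character data, so its top bit carries no meaning there.
  if (str->capacity & ROC_TOP_BIT) {
    return ROC_SMALL_STR;
  }
  // Top bit of len set and top bit of the third word clear.
  if (str->len & ROC_TOP_BIT) {
    return ROC_SEAMLESS_SLICE;
  }
  // Both top bits are zero.
  return ROC_BIG_STR;
}

// Recovers the refcount pointer of a seamless slice as an integer by
// shifting the stored upper 63 bits left by 1.
static uintptr_t roc_slice_refcount_ptr(const struct RocStr *str) {
  assert(roc_str_kind(str) == ROC_SEAMLESS_SLICE);
  return (uintptr_t)str->capacity << 1;
}
```

The result is returned as a uintptr_t rather than a typed pointer because the thread does not say what the refcount pointer points to.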

Last updated: Jul 26 2025 at 12:14 UTC