The RocStr in every platform example (e.g. examples/python-interop/demo.c) has the following definition:
struct RocStr {
    uint8_t *bytes;
    size_t len;
    size_t capacity;
};
This made the small string optimization stuff in the init function a bit confusing to read. Would it be possible to write it like this instead:
struct SmallStr {
    uint8_t bytes[sizeof(struct RocBytes) - 1];
    uint8_t len; // maybe this should be size_t? idk
};
union RocStr {
    struct RocBytes bytes;
    struct SmallStr smallStr;
};
I'm not that familiar with C and unions, so I'm not even sure this would work, but I just wanted to ask if anyone knows whether it's doable.
Also: where do these definitions come from? (RocStr, RocBytes, etc.)
You're right that this would be a more accurate reflection of how RocStr works!
Out of 30 host.c files, 3 of them define RocStr. But all they're doing is finding the length and the pointer to the characters. You could refactor that to use the more detailed type but it wouldn't be a huge benefit for such a tiny amount of code. I guess the people who wrote it just didn't bother. But if you want to improve it, go for it!
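For what it's worth, here is a minimal sketch suggesting the union from the question is doable in plain C. It is not code from the Roc repo. RocBytes is never shown in this thread, so it is assumed to be the three-word struct the examples currently call RocStr, and small_str_from is a hypothetical helper added only for illustration. The top bit of the last byte is used as the small-string flag, following the packed layout given further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: RocBytes is the three-word struct that the host examples
   currently name RocStr. The thread does not show its definition. */
struct RocBytes {
    uint8_t *bytes;
    size_t len;
    size_t capacity;
};

struct SmallStr {
    uint8_t bytes[sizeof(struct RocBytes) - 1];
    uint8_t len; /* a single byte, so SmallStr is exactly as big as RocBytes */
};

union RocStr {
    struct RocBytes bytes;
    struct SmallStr smallStr;
};

/* Compile-time check (C11) that the overlay does not change the size. */
_Static_assert(sizeof(union RocStr) == sizeof(struct RocBytes),
               "SmallStr must overlay RocBytes exactly");

/* Hypothetical helper: store a short string inline in the union itself. */
static union RocStr small_str_from(const char *text) {
    size_t n = strlen(text);
    union RocStr s;

    assert(n < sizeof(struct SmallStr)); /* must fit in the inline bytes */
    memset(&s, 0, sizeof s);
    memcpy(s.smallStr.bytes, text, n);
    /* Top bit set marks a small string, low 7 bits hold the length. */
    s.smallStr.len = (uint8_t)(0x80 | n);
    return s;
}
```

Keeping len as uint8_t rather than size_t is what makes the two union members the same size. With size_t the small variant would no longer line up with the last byte of capacity.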
These structures have to be defined separately in each of the languages we use.
The Roc compiler understands them internally.
The Rust definitions are in crates/roc_std
There is a Zig definition in crates/compiler/builtins/bitcode/src
Rust is the best-supported host language at the moment. We have the roc_std crate and also automatic glue code generation using roc glue and crates/glue/src/RustGlue.roc. So that's what "full support" looks like for a host language.
We would like to have the same level of support for other host languages as we do for Rust. I've started working on C++ here. Nobody is looking at C. It's quite hard once you get past RocStr because there are no generics.
Also, we would like to move away from copying and pasting the same code over and over in host examples, and centralise it instead. That has been discussed somewhere in another Zulip thread recently. Probably under "compiler development".
Thanks for the detailed answer! I'll give it a go when I have time. Reading that the first time around without knowing how small strings work made me question my eyes haha
Note, it is technically a union of 3 different representations (small string, big string, and seamless slice).
As such, the highest bit of both length and capacity can be part of what determines which union variant is active. So it isn't exactly nice to represent as a union either, because those tag bits are mixed into the fields.
You would really need 63-bit numbers in some of the union cases to make them accurate. That, or of course mask them on every use.
Given this, I think that is why we just use a struct-based layout.
So in packed bits on a 64-bit target, the 3 layouts are:
SmallStr : {bytes: [23; u8], constOne: u1, len: u7}
BigStr: {bytes: *u8, len: u64, cap: u64}
# Note, len and cap always have highest bit as zero in this case
SeamlessSlice: {bytes: *u8, constOne: u1, len: u63, constZero: u1, refCountPtr: upper63BitsOfPointer}
# To get real refcount pointer just left shift the refCountPtr by 1.
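To make the masking concrete, here is a rough sketch of how a C host could tell the three layouts apart while keeping the plain struct. It is based only on the layouts listed above and assumes a 64-bit little-endian target, where the last byte of the struct is the most significant byte of capacity. The names StrKind, str_kind, str_len, str_bytes, slice_refcount_ptr and make_small are all made up for illustration and are not from the Roc repo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct RocStr {
    uint8_t *bytes;
    size_t len;
    size_t capacity;
};

/* Highest bit of a machine word: the tag bit mixed into len and capacity. */
#define ROC_HIGH_BIT ((size_t)1 << (sizeof(size_t) * 8 - 1))

enum StrKind { STR_SMALL, STR_BIG, STR_SEAMLESS_SLICE };

static enum StrKind str_kind(const struct RocStr *s) {
    const uint8_t *raw = (const uint8_t *)s;
    /* Small string: top bit of the very last byte is one. */
    if (raw[sizeof *s - 1] & 0x80) return STR_SMALL;
    /* Seamless slice: top bit of len is one. */
    if (s->len & ROC_HIGH_BIT) return STR_SEAMLESS_SLICE;
    /* Big string: both top bits are zero. */
    return STR_BIG;
}

static size_t str_len(const struct RocStr *s) {
    if (str_kind(s) == STR_SMALL)
        return ((const uint8_t *)s)[sizeof *s - 1] & 0x7F; /* low 7 bits */
    return s->len & ~ROC_HIGH_BIT; /* mask the tag bit off the length */
}

static const uint8_t *str_bytes(const struct RocStr *s) {
    /* Small strings keep their characters inside the struct itself. */
    return str_kind(s) == STR_SMALL ? (const uint8_t *)s : s->bytes;
}

/* Seamless slice only: the capacity word holds the refcount pointer
   shifted right by one, so shift it left by one to recover it. */
static void *slice_refcount_ptr(const struct RocStr *s) {
    assert(str_kind(s) == STR_SEAMLESS_SLICE);
    return (void *)(uintptr_t)(s->capacity << 1);
}

/* Illustration-only constructor for a small string. */
static struct RocStr make_small(const char *text) {
    struct RocStr s;
    size_t n = strlen(text);

    assert(n < sizeof s); /* at most sizeof(struct RocStr) - 1 bytes */
    memset(&s, 0, sizeof s);
    memcpy(&s, text, n);
    ((uint8_t *)&s)[sizeof s - 1] = (uint8_t)(0x80 | n);
    return s;
}
```

The host examples only need the length and the character pointer, so in practice str_len and str_bytes cover what they do today. The masking stays in one place instead of being repeated at every use.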
Last updated: Jul 26 2025 at 12:14 UTC