I'm ready for a main quest compiler task (on Monday), what's available?
builtins! :smiley:
any interest in fleshing those out?
We can add those piecemeal right? Like add a new builtin fn, add some tests etc and that's a PR
yep! Once this PR for lists lands
Richard Feldman said:
builtins! :smiley:
Perfect :)
I'm interested in working on the string builtins, but it seemed like the low_level_interp tests weren't running when I messed around with them. Are those in a ready-to-get-picked-up state?
I think @Anton recently fixed that!
No, the low_level_interp_test.zig is an exception to the new CI test check. I was not able to debug it to get it working again on the first try, but I can focus on that today.
Feel free to go ahead with string builtins without adding a test in low_level_interp_test.zig. Please share the specific builtin you want to add here before starting to avoid double work @Dan G Knutson @Edwin Santos.
I will do Str.concat
oh ok I'll take care of that test then
I have a bunch of changes on my current branch that will affect it
Richard Feldman said:
oh ok I'll take care of that test then
@Richard Feldman I already fixed some Str.is_empty test failures when you sent this and I thought I was close to finishing the whole thing so I continued :p Several hours later all low_level_interp_tests pass now. I'm not sure about the branches I added to src/eval/interpreter.zig though (PR#8378). Anyway, I basically just put it for inspiration, feel free to do with it what you like :)
Essentially any methods in the String documentation which aren't part of RocStr in builtins/str.zig are up for grabs correct? If so I'll start on Str.repeat then!
Yes, go ahead :)
Reading some of the messages about trying to get the compiler ready before AoC.. I just can't sit on the sidelines! I was eying with strCaselessAsciiEquals, strWithAsciiLowercased, strWithAsciiUppercased, because I've made them for the rust compiler, so I can probably figure them out for the new one, though I haven't really touched it yet. Is following your footsteps on #8380 a good way @Anton?
Hey y'all, an implementation for Repeat is already in builtins/str.zig, does this mean that integration into build/roc/Builtin.roc and testing is all that's left to complete adding this builtin? I'll finish that stuff up if so.
I'll also start on Str.with_prefix since that doesn't have an implementation yet.
Thanks for helping out @Norbert Hajagos, yes #8380 should be a good guide. A bunch of tests will not work but just filling things out and making a draft PR is already helpful. Once we get a nice fix for low_level_interp_test in place we can fix up all the builtin PRs.
an implementation for Repeat is already in builtins/str.zig
So builtins/str.zig is for use in compiled Roc binaries (so not used by the interpreter). I did notice repeatC does not have a unit test in that file yet, can you add one @Edwin Santos?
So builtins/str.zig is for use in compiler Roc binaries.
See #8380 for which files you need to later to provide the implementation for the interpreter.
Some builtins can be implemented in Roc (like fold in src/build/roc/Builtin.roc) whereas others should be implemented in interpreter.zig. How should this decision be made @Richard Feldman?
Like I said, new to this compiler. Is there a reason we have 2 implementation for the builtins, one for the interpreter, one fore the compiler? Skimming your PR, it seems the compiled version does extra uniqueness checks to reuse one of the strings if possible for performace reasons, and the interpreted version does not. Is there a reason the interpreted doesn't import the builtins from the compiled version and wraps them with some interpreter-specific work, like allocating the result on the stack?
Good question! I actually don't know, can one of you provide the answer @Richard Feldman @Jared Ramirez @Luke Boswell?
I am checking the src right now. It seems some functions are actually reused, like builtins.list.listGetUnsafe, so I think you can safely reuse your more performant implementation as well.
Yes, looking more thoroughly there are a bunch of cases like this. builtins.list.listIsEmpty, builtins.list.listLen, builtins.list.listConcat, etc...
Didn't mean to turn this into a code review, I'm grateful that you've made such an atomic change I can base my work on :sweat_smile:
Yeah, you are correct :check: I implemented things in a different order and forgot to update my original implementation.
I'll fix it
I know why I did not call builtins.str.strConcat, I removed the call to builtins.list.listConcat in my hacky fix branch fix-low-level-interp-failures to make it work.
Let's pause the addition of builtins until @Richard Feldman can get a proper fix for src/eval/test/low_level_interp_test.zig in place.
I'm planning to work on builtins after work hours today. Is this a blocker because tests can't pass, or something more fundamental?
Adding implementations to the interpreter using the existing ones as a guide seems counterproductive given that the existing ones do not work.
I see. Will just poke the compiler here and there to see how things work.
Hi, just wanted to update that I'm currently working on implementations/tests for:
Str.repeat
Str.with_suffix
Str.drop_suffix
Str.drop_prefix
Will just tinker with other stuff while waiting for the low level interpreter tests to be unblocked.
2 Questions:
I've noticed that for a lot of the string functions, Str.decref is called on certain paths but not on others, and .incref is seldom called (in str.zig, just in strSplitOnHelp). Could someone explain a bit what the procedure is for handling the refcount? I understand not needing to alter it when you're returning newly allocated strings for example, but some which return slices like the trim methods don't seem to be explicitly increasing the refcount which i'm a lil confused about.
2nd question, is it safe if you uniquely hold a string to shift it's bytes pointer and decrease it's capacity to shorten the string's "window" from the front? strTrimStart trims the front by creating a new RocStr with a shifted bytes pointer (but the capacity_or_alloc_ptr is the original one). Shifting the pointer and decreasing the capacity accordingly if the string is unique and not a slice and returning the same input string seems like a viable route on my naive understanding. Not ultra familiar with low-level so not sure if that's dangerous tho :sweat_smile:
I'm not familiar with low level as well, but from what I understand, as long as the refcount is 1, you can do whatever you want with it. So yeah, I think you got q2 nailed down. I need to run and I don't have an immadiate answer to q1.
Yeah, I assume that my idea from q2 from shifting the bytes pointer + changing the capacity when RocStr is Unique + not a slice is probably not that much faster than allocating a RocStr with those couple pointers but if this is sound/not dangerous it'd be good to throw in. Mostly wasn't sure if this would actually set off a grenade or something haha.
Also on q1 when someone has a chance to give a definitive answer, are increments happening before calling these functions and decrements happening inside the functions (under the right circumstances)? I'm thinking of something like strTrim where if the entire string would be trimmed (in the first couple lines), an empty string is returned and the input string is decref'd. This makes sense to me if the string's refcount is incremented prior to it being passed to the function (the function doesn't increment the refcount beforehand), but if not then I'm a little lost why the function would decrement the refcount on that path.
Increments before being passed into functions or auto-refcounting like a Rust RC pointer was my hypothesis on why there's so many decrefs without increfs but wanted to check if this is on the right track.
1 thing I'm not sure about is whether that would lead to problems when we are freeing the str. I don't have time today, sorry for just dropping in vague advices. Follow the decref method in RocStr and see if it would cause problems that the bytes pointer points to further ahead than the original allocation.
I should be able to give answers, just don't have time right now. Will try to loop back in a few hours
Thanks! No rush, was just asking questions on my lunch break haha.
I've noticed that for a lot of the string functions, Str.decref is called on certain paths but not on others, and .incref is seldom called (in str.zig, just in strSplitOnHelp).
In roc, each builtin low level function specifies whether or not it takes the input via ownership or via reference (only implemented in the old compiler currently).
Generally speaking, if there is any chance of mutation, they pick ownership. The roc compiler we automatically generate incref calls before calling the function if needed. This enables the compiler to automatically optimize away incref calls if they aren't needed.
Instead of:
x = "some long string with refcount"
y = call_builtin(x) # builtin internally calls incref
decref x
we generate:
x = "some long string with refcount"
# Already unique ownership, no need to call incref here
y = call_builtin(x)
# Builtin took ownership (and maybe called decref), no need to call decref here
Could someone explain a bit what the procedure is for handling the refcount?
I am not sure how things are wired into the interpreter, currently, but likely, it should follow a similar partern. Explicitly have a list of functions that take the RocStr by ownership and for that list call incref before calling the builtin low level function. This will avoid needing two versions of the builtins (one for the interpreter and one for linking to compiled applications)
but some which return slices like the trim methods don't seem to be explicitly increasing the refcount which i'm a lil confused about.
Yeah, given trim expects to mutate it takes by ownership. Then it trims and returns a slice (which just passes on the ownership).
2nd question, is it safe if you uniquely hold a string to shift it's bytes pointer and decrease it's capacity to shorten the string's "window" from the front?
No. Cause when you go to free the string you would free the wrong bytes.
strTrimStart trims the front by creating a new RocStr with a shifted bytes pointer (but the capacity_or_alloc_ptr is the original one).
If I understand your comment correctly, that is not actually a new RocStr. That is a seamless slice. Essentially, it is a reference to the original string that still knows about the original allocation so it can free it.
The other option would be to keep it as a regular string but memcpy the entire string over to remove the trimmed bytes.
That said, I didn't double check the code here, so I may be missing something with how it is setup currently.
Yeah, I'm pretty sure from a quick skim of the code that StrTrimStart will never allocate. It will return a shifted small string or a seamless slice. Technically if the string is unique, we could shift it instead of creating a seamless slice, but that is not clearly worth it.
Also on q1 when someone has a chance to give a definitive answer, are increments happening before calling these functions and decrements happening inside the functions (under the right circumstances)?
Yep, exactly. Not actually sure the state of this in the interpreter though. But we need to add it if it is not currently working.
Hopefully that gives some base answers.
Feel free to ask more questions or for clarifications.
Have we got these builtins wired into the interpreter yet? It's been a few weeks since I checked, but I thought that was what Richard is currently working on
I'm not sure the exact state. I know richard was working on getting the infrastructure in to add builtins. Then he want other folks to add more builtins.
Ahk, cool. I can see we are using these builtins already in some places.
All the action is in callLowLevelBuiltin in src/eval/interpreter.zig
@Brendan Hansknecht Thanks so much for the explanation, this clarifies a lot. Also currently it looks like most of the standard lib str functions are taking RocStr by value instead of by reference. Also yeah I was thinking that changing the pointer to the original allocation might end up mucking something up when memory is being freed, I guess one way to put it is that there always needs to be a pointer to the beginning of the original allocation to ensure it's eventually correctly deallocated. Super helpful!
Luke Boswell said:
Have we got these builtins wired into the interpreter yet? It's been a few weeks since I checked, but I thought that was what Richard is currently working on
Currently I think we're waiting on some failing tests for low_level_interp to be fixed, we're starting to work out the actual functions in builtins/ but integrating them with the interpreter is the main blocker before we can PR additions to the builtins.
One more quick question, if you have a function returning a substring from a RocStr (any of them like drop_prefix, trim, etc), do y'all think it'd be more optimal to return a smallstr instead of a seamless slice if the string would fit into a smallstr? Currently the trim functions for example are mostly returning slices but wasn't sure if it might be a little better to try to pack them into smallstr's if possible since the resulting RocStr would be fully on stack. Maybe the memcopy is such that this wouldn't make a huge difference tho.
I think if it fits into a small string then it should be that. No extra pointer chasing when accessing it. I assume the time saved to access those strings possibly multiple times outweighs the cost of creating them.
It also unlocks further optimizations later, . Example:
The program can call my_small_str.with_acsii_bytes_uppercased() and when that built-in sees a small string, it will uppercase the bytes and return a small string. When it sees a non-small string, it will not assume that it could return a small string after uppercasing, since that function does not reduce the size of the string.
RocStr by value instead of by reference
So this is where wording is a bit confusing cause there is value/reference as in RocStr/&RocStr. and there is owned/reference as in calling incref before calling a function or just calling the function.
In general RocStr should always be passed by value as RocStr instead of &RocStr or *RocStr.
When it comes to owned/reference, that is what depends on the specific function. If the function is read only. (like Str.len), it should be by reference. If the function wants to mutate (or return a referencing subslice), it should be owned.
I guess one way to put it is that there always needs to be a pointer to the beginning of the original allocation to ensure it's eventually correctly deallocated
Exactly
if you have a function returning a substring from a RocStr (any of them like drop_prefix, trim, etc), do y'all think it'd be more optimal to return a smallstr instead of a seamless slice if the string would fit into a smallstr?
The answer is sadly, "it depends" in my testing.
A lot of times, the substring is used in a way that the copy is not worth it. It is short lifetime and the main string is still in cache. From my limited testing, this is most common. So the copy is not worth it.
The longer the slice stays around the more it is worthwhile to copy. Copying means you might be able to free the original string (reduced memory overhead); you don't have to worry about things falling out of cache; and you avoid a pointer round trip.
Especially with the functions that return many strings (like splitting by spaces), I saw returning slices as a really large gain in genereal. For the trim and drop functions... I'm not really sure what is best. This is where you would really prefer to have tons of real world cases running in CI on a lot of different hardware to measure.
my_small_str.with_acsii_bytes_uppercased()
If it sees a slice. It should first make the slice unique. Which will turn it into a small string. Then it will uppercase. So it really is just a lazy copy instead of a copy right at the beginning. That said, in a perfect world, it would see the slice and then generate a new small string that uppercases as it copies and would perform one less iteration overall. So I think (ignoring a cache miss), the slice has more opportunities to be performant than creating a small string first.
Yeah, looks like that function technically has an extra iteration right now that we could avoid. It first copies then it uppercases. It could do both at the same time.
Where'd we be without you Brendan? :smile: Always a pleasure to read your performance insights.
I'm glad I can at least answer questions even if not contributing to roc significantly in code.
ok I just got low_level_interp fixed finally - that took way longer than anticipated! :sweat_smile:
I believe adding more builtins should now be unblocked but lmk if you hit anything!
I want to try and get my zig template online... that should be nice an minimal
Is that the zig platform template?
Yup, that's the one
I've been getting distracted just trying out the new compiler with that fx test platform... so cool
Str.concat is done
nice, that's a very nice template for other things people might want to add :smiley:
I've got a version of Str.trim on a branch following that formula. Could I get permissions to push a branch? I had them on a previous OS/gpg key, but haven't contributed in a while. I'm 'giesch' on github.
Hey @Dan G Knutson great to see you again -- hyped to update our graphics / gaming platform soon :smiley:
Hi, is this a good minimal PR to look at to learn how to add a builtin? https://github.com/roc-lang/roc/pull/8436
Any documentation elsewhere? I might look into adding builtins for integers if I get the time.
Yes I think so
Hi y'all, in a bit I'm going to setup a PR for Str.drop_prefix, drop_suffix, repeat, and with_prefix. Could someone give it a review once it's up? On drop_prefix specifically I'm getting an issue with the refcount I think that's causing a segfault and Not sure what to do. I structured drop_prefix like the Trim functions so I'm having a hard time debugging what could be going wrong.
Will put up a PR soon!
I've got a draft PR up with those https://github.com/roc-lang/roc/pull/8440
ahk haven't pushed all of those yet
Should I keep going with my implementation or just leave it to you?
Well Claude and I will probably be done in like 5 mins -- would you be happy to give it a review?
I'd be happy to give it a review, I don't know if I have permissions to approve PRs into main tho.
That's ok, just leave a comment if you seen anything strange or if you think it looks good
Sure!
I think that PR is good for a review now, I'm going to start on the other Str builtins in another PR now
Other than string builtins are there any other builtins or core compiler functions to build out before December 1st?
Yeah there's a lot, like Set Dict lots of List ones
We had a loose coordination thing going for a while where we all posted in the #contributing > Worklog (Draft PRs and coordination) channel before starting work, and make a draft PR
But for a little while now it's only really been a smaller handful of people contributing so that practice dropped off a little
My approach is looking at the old docs https://www.roc-lang.org/builtins and then diffing against the src/build/roc/Builtin.roc implementations.
Will set and dict not just stay in pure roc (at least for now, built on top of list).
I guess so?? I haven't looked at them yet
There's a body of work there I guess to translate that implementation over then?
Yeah. Makes sense
From e.g. crates/compiler/builtins/roc/Dict.roc and crates/compiler/builtins/roc/Set.roc
I guess it does require a few different pieces. List has to have enough buiiltins complete. Hash needs to be implemented (which may require compiler magic). Eq has to be implemented as well if it doesn't already auto derive.
So might be a few steps, but core work will be porting otherwise.
I think @Richard Feldman is fixing Eq as we speak
yep!
I haven't done anything with hash or sort
If Eq works, I assume those should be similar to implement at least.
Hey I was trying to run a little fibonacci function and got a crash.
Looks like some kind of stack overflow exception is thrown for fib(N) where N > 5
fib = |n|
if n <= 1 {
n
} else {
fib(n - 1) + fib(n - 2)
}
Is it alright if I look into this further. I don't want to step on any toes if someone is already working on stuff related to this. I can also open an issue.
Is it alright if I look into this further.
Go ahead :)
I can also open an issue.
Yes, that's handy for tracking. I recommend starting with getting a backtrace with gdb, that will show you the function call chain that triggered the segfault.
We dont have tail call optimisation yet I think the plan is to implement trmc again
There might be some hardcoded limit in the interpreter that's like way off
List builtins are open if anyone's interested - and also a ton of them can be implemented without needing to write any Zig logic (other than maybe some boilerplate you can copy/paste from other example PRs)
Is the idea to implement all the missing function in https://www.roc-lang.org/builtins/alpha4/List/? I think I could take some of them but I am unsure which ones to implement, including the ones that require Zig
Yeah basically -- there are some differences like Numbers have changed significantly
I think with static dispatch the way we do encoding and decoding also changes a lot
Ah, for List I think it's pretty much 1-1
Also, from one recent video I saw on roc, I guess walk and what not is not necessary? and will be replaced by just "for loops"?
Will try to spend most of tomorrow and friday implementing some
yeah we already have fold and will get fold_rev (aka foldr) and that's it
Hey y'all, I'm trying to implement List.any and with the following
any : List(a), (a -> Bool) -> Bool any = |list, predicate| { for item in list { if predicate(item) { return True } } False }
But am getting the following compiler error:
This if has no else branch, but it's being used as an expression (assigned to a variable, passed to a function, etc.).
You can only use if without else when it's a statement. When if is used as an expression that evaluates to a value, else is required because otherwise there wouldn't always be a value available.
Either add an else branch, or use this if as a standalone statement.
Are early returns not supposed to be used this way? I tried to look for other examples of early returns in .roc files in the code base but i don't think there were any. I had a working version of List.any using fold but thought this would be preferable if early return is working.
that code ought to work
looks like a compiler bug! I'll dig into this tomorrow
Sounds good! Thought something might be awry.
@Edwin Santos https://github.com/roc-lang/roc/pull/8468 fixes this
Dig a bit on the code, are the built-ins only implemented for the "interpreter" backend at the moment? For example, If I want to expose, List.with_capacity I will need to make sure it is implemented in src/builtins/list.zig (it is), then it is added to roc/Builtin.roc, added it to the low_level_map on the build/builtin_compiler, and then do its implementation in src/eval/interpreter.zig?
Yeah that sounds right
We aren't lowering to machine code yet
yeah we explicitly decided to wait until 2026 to do that, since there was no chance we'd have anything working for Advent of Code if machine code were in scope :smile:
How would you test that List.with_capacity does the right thing in an integration test. is there a way to get the capacity of a list, or should I have a private builtin like list_get_capacity_unsafe to be able to test?
Good question! Personally, I would verify that the capacity is correct in a unit test and in the integration test I would just check that you get back an empty list.
The best way to test it (in my opinion) would be with an instrumented roc alloc. Since we have to pass it into roc anyway, we could modify it to validate the allocation.
Not sure if our tests are setup for that though. Would have to be a special test, not just something run with roc test.
But yeah, a zig level unit test is probably best otherwise.
Hey, taking a look at implementing List.append in Zig, is there a preference to base it off ListAppend or PushInPlace from list.zig? Looks like it could be either.
image.png
Will try to tackle List.reserve next
@Edwin Santos I would guess ListAppend if that's in list.zig
@Andres Villegas @Edwin Santos can you base your changes on https://github.com/roc-lang/roc/pull/8477 if it hasn't merged yet? I'm gonna try to merge it ASAP because it's a big change that's very prone to merge conflicts :smile:
Last updated: Nov 28 2025 at 12:16 UTC