I have been running the Roc Clock as an actual clock these last few days. The problem is that it consistently seg faults after running for about 8 hours. I’m guessing this is going to be a hard issue to debug 😅
Seg Faults
Here’s the code for it if that’s useful
https://github.com/Billzabob/roc-clock/blob/main/examples/clock.roc
Valgrind might be able to shed more light on the cause, can you execute:
valgrind --leak-check=full --track-origins=yes ./examples/clock
Letting the clock run a lot faster could also trigger the segfault much sooner...
Ya that's a good point. I'm not home right now though, so I'll just let it run it's course today. But if I need to redo this I'll just disconnect the PI from the clock hardware and just run it without all the sleeps.
Is this useful at all?
Valgrind
haha 12 GB allocated over the course of the application.
Looks to be str.graphemes that is leaking memory, so that is useful to know.
Guess I just need to download some more RAM then
The question is why isn't the memory being freed. Could just be a bug in str.graphemes or could be some other bug in the application that is keeping around the data generated from str.graphemes. I'll try to take a look and see what's going on.
I have a fuzz test for Str.graphemes (that has never run on arm), but at least on x86_64, it seems to pass consistently. Not definitive of anything, but suggests that the bug if probably not just directly in Str.graphemes.
Looking at the code, I bet the bug is in Task.loop, or more specifically Effect.loop. I bet that it is not correctly decrementing the refcount of the looped over data between iterations.
I am gonna try to make a small repro and see.
Hmmm, unless this issue is arm specific, my guess seems wrong. Somehow numbers is being shared around in a way such that roc never manages to decrement it's refcount to zero, so it is never freed. Maybe another function is missing a decrement.
As a simple validity check, can you do two tests for me on you pi:
numbers = Str.graphemes n
-> numbers = lastNumbers
. I am pretty sure that this shouldn't crash anytime soon/shouldn't accumulate a ton of memory.
numbers = Str.graphemes n
-> numbers = [ "8", "8", "8", "8" ]
. If I am correct, this should still lead to the memory leak and then crash.
It seems like your hypothesis is correct! #2 crashed after a short while and #1 has been running for a while without any issues
Ok. Good to know
Next i guess i should look at the ir with recounting and see anything missing, otherwise, need to dig into other functions that numbers is passed to. Some built-in likely is simply not correctly handling refcounts.
Ok. I think I found the issue. Looks like List.map2
and List.map3
have memory leak.
minimal repro: #5079
So I think I was wrong, looks like 1 and 2 above both eventually segfaulted. Does that make sense for the List.mapN
thing?
I think there may be more than one memory leak, just at different speeds. does 1 take longer to segfault?
Yes seemed like the first one took much longer to seg fault
Trying to familiarize myself with the codebase. Am I looking in the right area here? I'm guessing Dec
and IncN
refer to the reference count?
Yep and yep
In this case my guess is something weird is happening with the 2nd to nth list that cause roc to not decrement their refcount after the function. We have a whole system around wether variables are passed to built-ins as owned or borrowed. I am guessing that is the root cause of the bug.
Though i don't really understand the fundamentals of how that system works/the underlying details.
Could also be a bug local to those functions directly, but at first glance (and a little messing around), i didn't think so.
Ownership definition defined here: https://github.com/roc-lang/roc/blob/main/crates/compiler/mono/src/borrow.rs#L1009
is there a way we could rewrite those functions to be in pure Roc?
(using unsafe low-level List
-module-only primitives)
I'd like that to happen, but currently it would have worse performance
maybe it's worth it for these List.mapN
functions though
for List.map
itself I remember the performance is significantly worse. But these days my benchmarking harness segfauls and I cannot figure out why that is
eh let's not degrade perf then
I think I found the bug. Will push a pr soon.
essentially we were only decrementing the refcount of the last list in List.mapN
functions.
Entire change is to delete 2 dollar signs in a macro: https://github.com/roc-lang/roc/pull/5085/files
@Nick Hallstrom when you get the chance, can you run your clock with this change and valgrind again. I assume there is more than 1 memory leak and want to try and track down what else is leaking memory. Another screenshot like you sent above would be useful in debugging this.
Got the next one for you
Seg Fault 2
So still another leak with Str.graphemes
it looks like?
Just wondering, is this a --optimized
build?
Also, Str.graphemes
is not leaking anymore. It has one block of size 104 bytes. Which is the exact size of a list with 4 small strings in it. It's just that the program crashed, so valgrind noticed the memory is still in use when the program ended.
Hard to say about the other data. I mean, the 32 bytes in 4 blocks that was definitely lost is obviously a leak. I would guess the 1564 bytes in 46 blocks is also leaked, but would need to look at your program for other expected allocations to be sure.
I think the stack overflow is a separate bug/issue. Maybe a problem with tail recursion or need for the --optimized
flag. Given only, 10KB are in use at exit, that shouldn't be enough pressure to end up limiting the stack size.
Ya, if I didn’t have the —optimize
flag, it would seg fault during compilation on the Pi. You were the one that found that workaround I believe.
Ah, I guess I already forgot the exact contraints
I'll take another look and see if I have any ideas around the stack overflow
Ok. I think I found the issue. We have a bug in our generation of Effect.loop
. It is not generating a tail recursive function. As such, after some giant number of runs, we get a stack overflow
Not sure if this is specific to your example for some reason or a general issue. Trying to make a minimal repro now.
Yeah, so far having a hard time generating a repro. I assume that somehow it is related to inlining and llvm.
I think you have a pretty long chain of tasks with Task.traverse
on multiple lists
Yeah, the generated llvm ir from clock.roc
for Effect.loop
does not have tail call optimization
No idea why not so far.
#5086 is a minimal repro. Not really sure what is going on though. Some sort of bad reaction of Task.loop
being after other Tasks.
@Nick Hallstrom if you can make your program work without init
and startup
, that will workaround the issue.
Something like this:
main =
Task.loop {initialized: Bool.false, lastNumbers: ["8", "8", "8", "8"]} run
...
run = \{initialized, lastNumbers} ->
if !initialized then
_ <- await init
_ <- await startup
Task.succeed (Step {initialized: Bool.true, lastNumbers})
else
_ <- await (Task.sleep 1000)
n <- await Time.getTime
numbers = Str.graphemes n
_ <- await (setClock lastNumbers numbers)
Task.succeed (Step {initialized, lastNumbers: numbers})
Looks to fix the other memory leak as well
Oh awesome! I’ll give that a try in the morning. Thanks again Brendan
Just an update on this. It did indeed fix the issue. It’s been running for a day with all the sleeps taken out and is still going strong 💪
yoooo, awesome!
thanks for helping valgrind these bugs to the surface - now they can be fixed for everyone! :smiley:
@Nick Hallstrom I believe https://github.com/roc-lang/roc/pull/5095 will solve the stack overflow you ran into above, without needing to workaround
Woohoo :tada: I’ll give it a try in the morning. Thanks so much 🙏
Got to this a bit late, but your fix seems to be working! Thanks a bunch Ayaz!
Last updated: Jul 05 2025 at 12:14 UTC