Segfault with long running program · beginners

I have been running the Roc Clock as an actual clock these last few days. The problem is that it consistently seg faults after running for about 8 hours. I’m guessing this is going to be a hard issue to debug 😅
Seg Faults

Nick Hallstrom (Feb 27 2023 at 10:26):

Anton (Feb 27 2023 at 10:46):

valgrind --leak-check=full --track-origins=yes ./examples/clock

Anton (Feb 27 2023 at 17:58):

Letting the clock run a lot faster could also trigger the segfault much sooner...

Nick Hallstrom (Feb 27 2023 at 18:13):

Ya that's a good point. I'm not home right now though, so I'll just let it run it's course today. But if I need to redo this I'll just disconnect the PI from the clock hardware and just run it without all the sleeps.

Nick Hallstrom (Feb 28 2023 at 04:04):

Brendan Hansknecht (Feb 28 2023 at 04:30):

Brendan Hansknecht (Feb 28 2023 at 04:32):

Nick Hallstrom (Feb 28 2023 at 04:33):

Brendan Hansknecht (Feb 28 2023 at 04:36):

The question is why isn't the memory being freed. Could just be a bug in str.graphemes or could be some other bug in the application that is keeping around the data generated from str.graphemes. I'll try to take a look and see what's going on.

Brendan Hansknecht (Feb 28 2023 at 04:39):

I have a fuzz test for Str.graphemes (that has never run on arm), but at least on x86_64, it seems to pass consistently. Not definitive of anything, but suggests that the bug if probably not just directly in Str.graphemes.

Brendan Hansknecht (Feb 28 2023 at 04:44):

Looking at the code, I bet the bug is in Task.loop, or more specifically Effect.loop. I bet that it is not correctly decrementing the refcount of the looped over data between iterations.

Brendan Hansknecht (Feb 28 2023 at 04:44):

Brendan Hansknecht (Feb 28 2023 at 05:05):

Hmmm, unless this issue is arm specific, my guess seems wrong. Somehow numbers is being shared around in a way such that roc never manages to decrement it's refcount to zero, so it is never freed. Maybe another function is missing a decrement.

Brendan Hansknecht (Feb 28 2023 at 05:09):

Nick Hallstrom (Feb 28 2023 at 20:11):

It seems like your hypothesis is correct! #2 crashed after a short while and #1 has been running for a while without any issues

Brendan Hansknecht (Feb 28 2023 at 20:48):

Brendan Hansknecht (Feb 28 2023 at 21:15):

Next i guess i should look at the ir with recounting and see anything missing, otherwise, need to dig into other functions that numbers is passed to. Some built-in likely is simply not correctly handling refcounts.

Brendan Hansknecht (Feb 28 2023 at 22:09):

Ok. I think I found the issue. Looks like List.map2 and List.map3 have memory leak.

Brendan Hansknecht (Feb 28 2023 at 22:12):

Nick Hallstrom (Feb 28 2023 at 22:29):

So I think I was wrong, looks like 1 and 2 above both eventually segfaulted. Does that make sense for the List.mapN thing?

Brendan Hansknecht (Feb 28 2023 at 22:31):

I think there may be more than one memory leak, just at different speeds. does 1 take longer to segfault?

Nick Hallstrom (Feb 28 2023 at 23:36):

Nick Hallstrom (Mar 01 2023 at 04:42):

Trying to familiarize myself with the codebase. Am I looking in the right area here? I'm guessing Dec and IncN refer to the reference count?

Brendan Hansknecht (Mar 01 2023 at 06:09):

Brendan Hansknecht (Mar 01 2023 at 06:10):

In this case my guess is something weird is happening with the 2nd to nth list that cause roc to not decrement their refcount after the function. We have a whole system around wether variables are passed to built-ins as owned or borrowed. I am guessing that is the root cause of the bug.

Brendan Hansknecht (Mar 01 2023 at 06:11):

Though i don't really understand the fundamentals of how that system works/the underlying details.

Brendan Hansknecht (Mar 01 2023 at 06:12):

Could also be a bug local to those functions directly, but at first glance (and a little messing around), i didn't think so.

Brendan Hansknecht (Mar 01 2023 at 06:16):

Richard Feldman (Mar 01 2023 at 12:20):

Richard Feldman (Mar 01 2023 at 12:21):

Folkert de Vries (Mar 01 2023 at 12:36):

Folkert de Vries (Mar 01 2023 at 12:37):

for List.map itself I remember the performance is significantly worse. But these days my benchmarking harness segfauls and I cannot figure out why that is

Richard Feldman (Mar 01 2023 at 14:05):

Brendan Hansknecht (Mar 02 2023 at 05:55):

Brendan Hansknecht (Mar 02 2023 at 05:58):

essentially we were only decrementing the refcount of the last list in List.mapN functions.

Brendan Hansknecht (Mar 02 2023 at 06:00):

Brendan Hansknecht (Mar 02 2023 at 06:09):

@Nick Hallstrom when you get the chance, can you run your clock with this change and valgrind again. I assume there is more than 1 memory leak and want to try and track down what else is leaking memory. Another screenshot like you sent above would be useful in debugging this.

Nick Hallstrom (Mar 02 2023 at 20:17):

Brendan Hansknecht (Mar 02 2023 at 22:17):

Brendan Hansknecht (Mar 02 2023 at 22:22):

Also, Str.graphemes is not leaking anymore. It has one block of size 104 bytes. Which is the exact size of a list with 4 small strings in it. It's just that the program crashed, so valgrind noticed the memory is still in use when the program ended.

Hard to say about the other data. I mean, the 32 bytes in 4 blocks that was definitely lost is obviously a leak. I would guess the 1564 bytes in 46 blocks is also leaked, but would need to look at your program for other expected allocations to be sure.

Brendan Hansknecht (Mar 02 2023 at 22:24):

I think the stack overflow is a separate bug/issue. Maybe a problem with tail recursion or need for the --optimized flag. Given only, 10KB are in use at exit, that shouldn't be enough pressure to end up limiting the stack size.

Nick Hallstrom (Mar 02 2023 at 23:54):

Ya, if I didn’t have the —optimize flag, it would seg fault during compilation on the Pi. You were the one that found that workaround I believe.

Brendan Hansknecht (Mar 03 2023 at 00:11):

Brendan Hansknecht (Mar 03 2023 at 00:12):

Brendan Hansknecht (Mar 03 2023 at 01:00):

Ok. I think I found the issue. We have a bug in our generation of Effect.loop. It is not generating a tail recursive function. As such, after some giant number of runs, we get a stack overflow

Not sure if this is specific to your example for some reason or a general issue. Trying to make a minimal repro now.

Brendan Hansknecht (Mar 03 2023 at 01:11):

Yeah, so far having a hard time generating a repro. I assume that somehow it is related to inlining and llvm.

Brendan Hansknecht (Mar 03 2023 at 01:12):

I think you have a pretty long chain of tasks with Task.traverse on multiple lists

Brendan Hansknecht (Mar 03 2023 at 01:46):

Yeah, the generated llvm ir from clock.roc for Effect.loop does not have tail call optimization

Brendan Hansknecht (Mar 03 2023 at 01:46):

Brendan Hansknecht (Mar 03 2023 at 02:03):

#5086 is a minimal repro. Not really sure what is going on though. Some sort of bad reaction of Task.loop being after other Tasks.

Brendan Hansknecht (Mar 03 2023 at 02:04):

@Nick Hallstrom if you can make your program work without init and startup, that will workaround the issue.

Brendan Hansknecht (Mar 03 2023 at 02:11):

main =
    Task.loop {initialized: Bool.false, lastNumbers: ["8", "8", "8", "8"]} run

...

run = \{initialized, lastNumbers} ->
    if !initialized then
        _ <- await init
        _ <- await startup
        Task.succeed (Step {initialized: Bool.true, lastNumbers})
    else
        _ <- await (Task.sleep 1000)
        n <- await Time.getTime
        numbers = Str.graphemes n
        _ <- await (setClock lastNumbers numbers)
        Task.succeed (Step {initialized, lastNumbers: numbers})

Brendan Hansknecht (Mar 03 2023 at 02:14):

Nick Hallstrom (Mar 03 2023 at 03:25):

Nick Hallstrom (Mar 04 2023 at 00:59):

Just an update on this. It did indeed fix the issue. It’s been running for a day with all the sleeps taken out and is still going strong 💪

Richard Feldman (Mar 04 2023 at 01:28):

Richard Feldman (Mar 04 2023 at 01:29):

thanks for helping valgrind these bugs to the surface - now they can be fixed for everyone! :smiley:

Stream: beginners

Topic: Segfault with long running program

Nick Hallstrom (Feb 27 2023 at 10:25):

Nick Hallstrom (Feb 27 2023 at 10:26):

Anton (Feb 27 2023 at 10:46):

Anton (Feb 27 2023 at 17:58):

Nick Hallstrom (Feb 27 2023 at 18:13):

Nick Hallstrom (Feb 28 2023 at 04:04):

Brendan Hansknecht (Feb 28 2023 at 04:30):

Brendan Hansknecht (Feb 28 2023 at 04:32):

Nick Hallstrom (Feb 28 2023 at 04:33):

Brendan Hansknecht (Feb 28 2023 at 04:36):

Brendan Hansknecht (Feb 28 2023 at 04:39):

Brendan Hansknecht (Feb 28 2023 at 04:44):

Brendan Hansknecht (Feb 28 2023 at 04:44):

Brendan Hansknecht (Feb 28 2023 at 05:05):

Brendan Hansknecht (Feb 28 2023 at 05:09):

Nick Hallstrom (Feb 28 2023 at 20:11):

Brendan Hansknecht (Feb 28 2023 at 20:48):

Brendan Hansknecht (Feb 28 2023 at 21:15):

Brendan Hansknecht (Feb 28 2023 at 22:09):

Brendan Hansknecht (Feb 28 2023 at 22:12):

Nick Hallstrom (Feb 28 2023 at 22:29):

Brendan Hansknecht (Feb 28 2023 at 22:31):

Nick Hallstrom (Feb 28 2023 at 23:36):

Nick Hallstrom (Mar 01 2023 at 04:42):

Brendan Hansknecht (Mar 01 2023 at 06:09):

Brendan Hansknecht (Mar 01 2023 at 06:10):

Brendan Hansknecht (Mar 01 2023 at 06:11):

Brendan Hansknecht (Mar 01 2023 at 06:12):

Brendan Hansknecht (Mar 01 2023 at 06:16):

Richard Feldman (Mar 01 2023 at 12:20):

Richard Feldman (Mar 01 2023 at 12:21):

Folkert de Vries (Mar 01 2023 at 12:36):

Folkert de Vries (Mar 01 2023 at 12:36):

Folkert de Vries (Mar 01 2023 at 12:37):

Richard Feldman (Mar 01 2023 at 14:05):

Brendan Hansknecht (Mar 02 2023 at 05:55):

Brendan Hansknecht (Mar 02 2023 at 05:58):

Brendan Hansknecht (Mar 02 2023 at 06:00):

Brendan Hansknecht (Mar 02 2023 at 06:09):

Nick Hallstrom (Mar 02 2023 at 20:17):

Nick Hallstrom (Mar 02 2023 at 20:17):

Brendan Hansknecht (Mar 02 2023 at 22:17):

Brendan Hansknecht (Mar 02 2023 at 22:22):

Brendan Hansknecht (Mar 02 2023 at 22:24):

Nick Hallstrom (Mar 02 2023 at 23:54):

Brendan Hansknecht (Mar 03 2023 at 00:11):

Brendan Hansknecht (Mar 03 2023 at 00:12):

Brendan Hansknecht (Mar 03 2023 at 01:00):

Brendan Hansknecht (Mar 03 2023 at 01:11):

Brendan Hansknecht (Mar 03 2023 at 01:12):

Brendan Hansknecht (Mar 03 2023 at 01:46):

Brendan Hansknecht (Mar 03 2023 at 01:46):

Brendan Hansknecht (Mar 03 2023 at 02:03):

Brendan Hansknecht (Mar 03 2023 at 02:04):

Brendan Hansknecht (Mar 03 2023 at 02:11):

Brendan Hansknecht (Mar 03 2023 at 02:14):

Nick Hallstrom (Mar 03 2023 at 03:25):

Nick Hallstrom (Mar 04 2023 at 00:59):

Richard Feldman (Mar 04 2023 at 01:28):

Richard Feldman (Mar 04 2023 at 01:29):

Ayaz Hafiz (Mar 06 2023 at 04:24):

Nick Hallstrom (Mar 06 2023 at 04:40):

Nick Hallstrom (Mar 08 2023 at 05:31):