Stream: compiler development

Topic: strange stack overflow


view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 07:43):

So I have a benchmark program that stack overflows and is really confusing me. I think it is an llvm codegen bug, but I am not fully sure.

I don't have a minimized example yet, but this example is not that big: https://github.com/bhansconnect/roc-dict-bench/blob/main/random-find-u64.roc

Run as:

roc build --optimize --profiling random-find-u64.roc --linker=legacy
./random-find-u64 500000 1000

It will segfault. If run with valgrind, it will comment that the segfault was due to accessing byte off the end of the stack due to a stack overflow.

If run with a debugger, the stack trace is never deep. So we aren't generating a giant chain of calls leading to a stack overflow. That said, if you every few seconds interupt the app with ctrl + c then print the stack pointer p $sp or p $rsp depending on arch. With each print, the stack pointer will get lower and lower. So the stack keeps growing.

My guess is that somehow our joinpoint loop is leaking stack. So each iteration the stack is a bit larger even though we are in a tail recursive function taht shouldn't grow the stack at all.

I think there is a good chance that this is related to the bug mentioned here: https://roc.zulipchat.com/#narrow/stream/358903-Advent-of-Code/topic/2023.20Day.202/near/406983404

Somehow our joinpoints aren't clean in relation to variables being loading and stored to the stack. This can lead to mutation or stack overflow despite having tail recursion and joinpoints.

That is my guess, but I am having quite a hard time debugging or verifying this. Any help would be greatly appreciated.

view this post on Zulip Folkert de Vries (Dec 10 2023 at 10:48):

Ah, some alloca that is not put in the initial basic block?

view this post on Zulip Folkert de Vries (Dec 10 2023 at 10:48):

We've had that before

view this post on Zulip Folkert de Vries (Dec 10 2023 at 13:32):

yeah, I suspect this is the problem

then_block14:                                     ; preds = %Num_add_30b4e3263e25f386953934b64a917c5af1369696c8155e3052c6ba1bb8ddcba7.exit57
  %struct_alloca = alloca { { i64, i64, i64, i64 }, i32 }, align 8, !dbg !753
  store { { i64, i64, i64, i64 }, i32 } %insert_record_field16, ptr %struct_alloca, align 8, !dbg !753
  br label %joinpointcont10, !dbg !753

view this post on Zulip Folkert de Vries (Dec 10 2023 at 13:39):

lol

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define void @roc__mainForHost_1_exposed_generic(ptr nocapture readnone %0) local_unnamed_addr #10 !dbg !105 {
entry:
  ret void, !dbg !110
}

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define void @roc__mainForHost_1_exposed() local_unnamed_addr #10 !dbg !112 {
entry:
  ret void, !dbg !113
}

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define i64 @roc__mainForHost_1_exposed_size() local_unnamed_addr #10 !dbg !115 {
entry:
  ret i64 0, !dbg !116
}

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 16:18):

Ah, when we have join's we have to also reuse allocas to avoid eating stack space.

view this post on Zulip Folkert de Vries (Dec 10 2023 at 16:59):

yeah, but if you do that here you get a very odd program with just tons of allocas

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 17:12):

Why would you have more allocas? I'm just saying if the join point arg is already an alloca, reuse the alloca. So track when creating a join point generate all allocas right before the join point instead of in each branch jumping into the alloca

view this post on Zulip Folkert de Vries (Dec 10 2023 at 19:56):

I meant this as an observation. e.g. for your Bug.roc program from that example, I get

  %incrementing_element_loop_load.i300 = alloca %str.RocStr, align 8, !dbg !159
  %incrementing_element_loop_load.i277 = alloca %str.RocStr, align 8, !dbg !189
  %incrementing_element_loop_load.i254 = alloca %str.RocStr, align 8, !dbg !193
  %incrementing_element_loop_load.i231 = alloca %str.RocStr, align 8, !dbg !197
  %incrementing_element_loop_load.i213 = alloca %str.RocStr, align 8, !dbg !199
  %str_alloca.i.i18.i.i.i = alloca %str.RocStr, align 8
  %result_value.i9.i.i.i = alloca %str.RocStr, align 8
  %const_str_store2.i.i.i = alloca %str.RocStr, align 8
  %result_value1.i.i.i = alloca %str.RocStr, align 8
  %result_value.i.i9.i = alloca %str.RocStr, align 8
  %const_str_store.i.i.i = alloca %str.RocStr, align 8
  %result_value1.i211 = alloca %str.RocStr, align 8, !dbg !249
  %result_value.i212 = alloca %str.RocStr, align 8, !dbg !249
  %const_str_store.i = alloca %str.RocStr, align 8, !dbg !249
  %struct_field.i = alloca %str.RocStr, align 8, !dbg !249
  %incrementing_element_loop_load.i188 = alloca %str.RocStr, align 8, !dbg !253
  %incrementing_element_loop_load.i = alloca %str.RocStr, align 8, !dbg !258
  %element_to_pass_as_opaque.i.i = alloca %str.RocStr, align 8, !dbg !260
  %str_alloca.i.i19.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %str_alloca.i.i8.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value.i9.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %str_alloca.i.i.i.i.i.i.i.i.i.i.i.i.i154 = alloca %str.RocStr, align 8
  %result_value.i.i.i.i.i.i.i.i.i.i.i.i155 = alloca %str.RocStr, align 8
  %const_str_store2.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value1.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value.i.i.i.i.i.i.i.i.i.i.i156 = alloca %str.RocStr, align 8
  %const_str_store.i.i.i.i.i.i.i.i.i.i.i157 = alloca %str.RocStr, align 8
  %result_value.i.i.i.i.i.i.i.i.i162 = alloca %str.RocStr, align 8, !dbg !267
  %const_str_store.i.i.i.i.i.i.i.i.i163 = alloca %str.RocStr, align 8, !dbg !267
  %struct_field.i.i.i.i.i.i.i.i.i164 = alloca %str.RocStr, align 8, !dbg !267
  %result_value.i.i.i.i.i.i.i.i166 = alloca %str.RocStr, align 8, !dbg !271
  %str_alloca.i.i.i.i.i.i.i167 = alloca %str.RocStr, align 8
  %const_str_store2.i.i.i.i.i170 = alloca %str.RocStr, align 8
  %result_value1.i.i.i.i.i171 = alloca %str.RocStr, align 8
  %const_str_store.i.i.i.i.i173 = alloca %str.RocStr, align 8
  %struct_field1.i.i.i = alloca %str.RocStr, align 8, !dbg !272
  %result_value.i3.i.i177 = alloca %str.RocStr, align 8, !dbg !272
  %struct_field.i.i.i178 = alloca %str.RocStr, align 8, !dbg !272
  %result_value1.i.i181 = alloca %str.RocStr, align 8
  %result_value.i.i182 = alloca { %str.RocStr, %str.RocStr }, align 8
  %list_alloca.i133 = alloca %list.RocList, align 8
  %list_alloca.i = alloca %list.RocList, align 8
  %str_alloca.i.i10.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !273
  %str_alloca.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %const_str_store4.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !274
  %result_value3.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !274
  %const_str_store.i.i.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !274
  %str_alloca.i.i20.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value.i10.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %const_str_store3.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !275
  %result_value2.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !275
  %result_value.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !275
  %const_str_store.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !275
  %load_element.i.i.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !275
  %result_value.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !279
  %const_str_store.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !279
  %struct_field.i.i.i.i.i.i.i.i.i = alloca %str.RocStr, align 8, !dbg !279
  %result_value.i.i.i.i.i.i.i.i = alloca { [0 x i64], [3 x i64], i8, [7 x i8] }, align 8, !dbg !280
  %str_alloca.i.i.i.i.i.i.i = alloca %str.RocStr, align 8
  %const_str_store2.i.i.i.i.i = alloca %str.RocStr, align 8
  %result_value1.i.i.i.i.i = alloca %str.RocStr, align 8
  %const_str_store.i.i.i.i.i = alloca %str.RocStr, align 8
  %struct_field.i.i.i = alloca %str.RocStr, align 8, !dbg !281
  %result_value1.i.i = alloca %str.RocStr, align 8
  %result_value21 = alloca %str.RocStr, align 8, !dbg !282
  %result_value15 = alloca %str.RocStr, align 8, !dbg !282
  %const_str_store = alloca %str.RocStr, align 8, !dbg !282
  %result_value14 = alloca %str.RocStr, align 8, !dbg !282
  %result_value8 = alloca %str.RocStr, align 8, !dbg !282
  %result_value = alloca %str.RocStr, align 8, !dbg !282

view this post on Zulip Folkert de Vries (Dec 10 2023 at 19:57):

we should not need this number of allocas for that example, really

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 19:57):

Ah

view this post on Zulip Folkert de Vries (Dec 10 2023 at 20:11):

this problem is much worse with dbgs

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 20:16):

Can we tell llvm to free allocas? Like make it so that at the jump we clear all allocas and only keep more global things?

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 20:17):

Cause even if we make too many allocas, they should be cheap if we don't stack overflow

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 20:17):

Feels like a lifetime issue of locals living too long

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 20:18):

Aside, have you seen: https://github.com/roc-lang/roc/issues/6139

It feels related. Allocas and bad lifetime assumptions leading to overwriting data

view this post on Zulip Folkert de Vries (Dec 10 2023 at 21:35):

yes these all seem related

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 22:03):

So, may not be the right solution in this case, but llvm.stacksave and llvm.stackrestore may be what we need.

The idea would be:

  1. All joinpoint args are stored either by value or in allocas that are defined before the joinpoint.
  2. At the start of the joined code, we llvm.stacksave
  3. We use allocas in the function and locally define as normal.
  4. All final values for the joinpoint are copied into the allocas defined above the joinpoint
  5. We llvm.stackrestore.
  6. We jump to the joinpoint.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 22:05):

stack save and restore just save and restore the stack pointer. They don't do anything fancy.

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:26):

well if you do the copying anyway, you can just define all of the allocas in the entry block

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:27):

there is no reason to define them locally. But in the example case we'd need an extra alloca to keep both the old and new value alive till the jump

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:27):

Just define the allocas for the join point args in the entry. Not all allocas

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:28):

why not?

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:29):

if you define any allocas in the loop you grow the stack. There is no need to define them at the use site, you can have them predefined in the entry block

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:29):

Nested joinpoints and functions that aren't inlined yet.

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:30):

Wouldn't that mess up the allocas that need to be defined in the entry block to avoid stack growth?

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:30):

I don't think nested join points are a problem if you define in the entry block. I believe llvm is smart about floating allocas when inlining

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:30):

Cause theoretically in the nested case all need to defined before the first joinpoint

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:31):

Oh, llvm will fix. Cool

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:31):

that has been my experience if the allocas are in the entry point

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:31):

that is nicer than doing stack save and restore

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:31):

we'll need to validate that properly now though. Certainly if you define the alloca in a weird place LLVM will not float it. So in that case inlining can cause stack overflows

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:32):

hmm actually I may have run into that in rust

view this post on Zulip Folkert de Vries (Dec 10 2023 at 23:32):

yeah I'd rather not mess with the stack pointer manually

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:32):

Fair

view this post on Zulip Brendan Hansknecht (Dec 10 2023 at 23:33):

Ok. So I guess goal one would be hoist alloca out of joinpoints and ensure uniqueness of joinpoint args so they don't get accidentally overwritten?

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 01:34):

@Folkert de Vries just curious, is this something you will have time to look at and hopefully fix?

view this post on Zulip Folkert de Vries (Dec 11 2023 at 08:16):

yeah, I've done some investigation but no clear solution presented itself so far. I think it's a bunch of issues combining into one

view this post on Zulip Folkert de Vries (Dec 11 2023 at 08:17):

we also just have some allocas that happen in the non-entry block which should be lifted to the entry block


Last updated: Jul 06 2025 at 12:14 UTC