Hello everyone :)
I've written a BF interpreter in Roc that works (in that it runs scripts correctly), but it's incredibly slow. I picked the first implementation that I could find online (one written in Go) and benchmarked it against mine; the Go one was about 9000 times faster on a benchmark script. :sweat_smile:
Here's a link to the repo: https://github.com/Hasnep/brainroc; the interpreter is in src/Interpreter.roc. I'm pretty new to FP, so I'm sure my algorithm is very inefficient. It's very recursive, which might be causing some problems; I tried rewriting it using List.walk instead of recursion, but I struggled with how to handle loops.
Any suggestions to optimise or just improve overall code quality would be very helpful, thanks!
A bit tangential (because it's not about FP but a Rust implementation of a BF interpreter/compiler), but maybe it can give some insights. It's a brilliant series of articles (adapted from C++, but still worth reading):
https://rodrigodd.github.io/
Thanks @Kiryl Dziamura, I'm reading the first post now and it's really cleanly explained, I'm glad to see some of the same ideas I've had appearing there. That'll be really useful when I get round to some more fine-grained optimisations, but I think I still have some more fundamental problems with my code at the moment that I want to tackle first :sweat_smile:
My guess is that this function https://github.com/Hasnep/brainroc/blob/f54d9ee71332e36ea4342abb2054bc80fb0aa484/src/Interpreter.roc#L56-L77 is not tail recursive, so the Roc compiler cannot apply the optimization. But I haven't checked.
it should be because we desugar |>
but I'd need to check it too. Another potential problem could be if the input program list is not unique (and hence cloned a bunch)
but would the desugared call bubble up above the when?
after desugaring the |>, the recursive call is in tail position, right?
ah, yes, my bad
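A minimal sketch of the point, with hypothetical names (step, loop, and the record fields are not from the repo): |> is pure syntax sugar, so piping into a recursive call leaves that call in tail position.

# Hypothetical example: `state |> step |> loop` desugars to `loop (step state)`,
# so the recursive call to `loop` is the last expression evaluated, i.e. it is
# in tail position and can be compiled to a jump.
step = \state -> { state & count: state.count + 1, done: state.count + 1 >= 10 }

loop = \state ->
    if state.done then
        state
    else
        state |> step |> loop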
Do you have a link to the original source as well?
Also, when I get the chance, I have some ideas, but want to verify them first.
Logan Lowder has marked this topic as resolved.
Logan Lowder has marked this topic as unresolved.
Whoops, misclick, sorry
Here's the script I used to benchmark the interpreters, run from the root of the brainroc repo:
# Download benchmark script
curl -C - https://raw.githubusercontent.com/fabianishere/brainfuck/master/examples/bench/bench-1.bf --output examples/bench/bench-1.bf --create-dirs
# Build Go interpreter
git clone https://github.com/kgabis/brainfuck-go.git /tmp/bf-go
pushd /tmp/bf-go
go build bf.go
popd
cp /tmp/bf-go/bf ./bf-go
# Build Roc interpreter
roc build --optimize src/main.roc
# Benchmark the two interpreters
SCRIPT=examples/bench/bench-1.bf
hyperfine "./bf-go $SCRIPT" "./src/brainroc $SCRIPT"
And this is the output I get:
Benchmark #1: ./bf-go examples/bench/bench-1.bf
Time (mean ± σ): 1.2 ms ± 0.2 ms [User: 1.1 ms, System: 0.4 ms]
Range (min … max): 0.8 ms … 2.1 ms 1378 runs
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
Benchmark #2: ./src/brainroc examples/bench/bench-1.bf
Time (mean ± σ): 11.128 s ± 0.040 s [User: 11.123 s, System: 0.002 s]
Range (min … max): 11.073 s … 11.182 s 10 runs
Summary
'./bf-go examples/bench/bench-1.bf' ran
9343.95 ± 1669.68 times faster than './src/brainroc examples/bench/bench-1.bf'
So fundamentally, the app is spending all its time in memcpy.
So definitely a uniqueness issue. There is something you would hope never gets copied, but it is getting copied (probably every iteration of run)
Haven't figured out the root cause of the copying yet.
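To make the uniqueness point concrete, here is a hedged sketch (not the interpreter's actual code; incrementCell is a made-up name): Roc only updates a list in place when its refcount is 1, so any extra live reference to the data list at the moment it is updated turns every cell write into a full clone, which is exactly the kind of thing that shows up as memcpy in a profile.

# If `data` is unique (refcount 1), List.update mutates the list in place.
# If anything else still holds a reference to `data`, the whole list has to be
# copied first, so a hot loop of cell updates degenerates into memcpy calls.
incrementCell = \data, dataPtr ->
    List.update data dataPtr (\cell -> Num.addWrap cell 1)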
Thank you @Brendan Hansknecht, that's a very useful place for me to start finding some performance improvements
Hopefully it helps. I tried a few things I thought would help this morning and got quite confused when none of them worked.
Won't have time this weekend to look at it more
I'm looking into this a bit now. Bunch of things going on. First of all, combining all of the state into one record does not play well with our RC mechanism.
inlining the helpers and splitting out some fields brings me down to ~ 4s. Still terrible, but a bit better
record updates (when some fields are reference-counted) are still a problem
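A small sketch of the two shapes being compared, with illustrative names only: with all the state in one record, a record update touches the reference-counted list fields even when only the program counter changes, whereas passing the hot fields as separate arguments lets the untouched lists flow through unchanged.

# One-record shape: per the observation above, this kind of update can emit
# inc/dec on the refcounted fields (the lists) even though only `pc` changed.
stepRecord = \state ->
    { state & pc: state.pc + 1 }

# Split shape: the lists are separate arguments, so a recursive call just
# passes them along without rebuilding a containing record.
runSplit = \pc, instructions, dataPtr, data, output ->
    if pc < List.len instructions then
        runSplit (pc + 1) instructions dataPtr data output
    else
        output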
by working around some increments/decrements with hacks (exposing unsafe functions) I can get it down to 1s ish. Still terrible though
@Brendan Hansknecht how did you figure out that the majority of the time is spent in memcpy? I still suspect that that is going on, but I don't see it in my perf output, and also we're not clearly cloning very long lists (using the zig code we have for that, anyway)
So I am not 100% sure I was correct, because in a release build with the legacy linker the stack got cut off too early for me to be fully sure. With a debug build and the legacy linker, the profile seemed similar but the stack would go deeper. In that case, almost all the time at the deepest stack level was in the new Facebook memcpy assembly code.
So I'm not completely convinced or sure, but it seemed reasonable. That said, given that none of my optimization tests really worked, I am definitely confused and unsure.
ah, I guess I'm working with a platform without debug info, because I see none of that
I think I had to update the platform to get it to compile on my Linux machine.
Also, I had to change the app not to use the args module because it wasn't working on my Linux machine for some reason
I made this change https://github.com/roc-lang/roc/pull/5667. I don't think it really helps here, but I noticed this missed opportunity while looking at the IR
well I mean it brings down memory consumption, but for the runtime it's not a huge improvement
Wow, thanks for looking into this @Folkert de Vries and @Brendan Hansknecht :) I'm not able to contribute directly to the compiler much, so I'm glad my bad code can help improve it indirectly :sweat_smile:
I also had trouble with the argument parsing on Linux, I think the legacy linker messes up the commandline arguments passed to the program so that the binary isn't the first argument.
I started re-writing the interpreter to try to not pass around such a big struct all the time, if I get it working I'll update this thread with the benchmark
Hannes said:
I also had trouble with the argument parsing on Linux, I think the legacy linker messes up the commandline arguments passed to the program so that the binary isn't the first argument.
This is a known issue, but I haven't been able to investigate it deeply yet
See https://github.com/roc-lang/basic-cli/issues/82
I am taking a look at this, and one thing that is significantly different between the two interpreters is that the Go version does not use a recursive datatype for representing the AST nodes, whereas the Roc version does (https://github.com/Hasnep/brainroc/blob/f54d9ee71332e36ea4342abb2054bc80fb0aa484/src/Parser.roc#L3-L10). I suspect that if you can eliminate this, the runtime will drop significantly, since the recursive structure forces a ton of allocations (even with Folkert's change).
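To illustrate the difference (a hedged sketch; neither type below is copied from the repos): a recursive node type makes every loop own a nested list of child nodes, while the Go-style flat encoding keeps one contiguous list of operator/operand pairs in which loop instructions carry a jump target as their operand.

# Recursive shape, roughly like the Roc parser's output: each Loop owns
# another List of nodes, so building and dropping the AST means many small
# refcounted allocations.
Node : [Add, Sub, Left, Right, Output, Input, Loop (List Node)]

# Flat shape, roughly like the Go interpreter: one list of instructions,
# where JumpIfZero / JumpIfNonZero store the index to jump to in `operand`.
Instruction : { operator : [Add, Sub, Left, Right, Output, Input, JumpIfZero, JumpIfNonZero], operand : Nat }
Program : List Instruction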
posting some other observations in #compiler development (public) > bf interpreter analysis
Here is a more direct port of the Go code: https://gist.github.com/folkertdev/ef1970afe9f00048f163f4d71d5f03f7. However, it still takes 4 seconds on my machine.
replacing the lookup with a version of get that does not perform inc/dec halves the execution time
if pc < List.len instructions then
    { operator, operand } = List.getUnsafe instructions pc
replacing the other List.get operations with their unsafe counterparts halves execution time again.
hm, but go does bounds checks too, right? :thinking:
sure, but not inc/dec
got it down to 400ms with more unsafe
this might be a good place to try an idea I once had that seems silly but maybe might help: List.getWrap : List a, Nat -> Result a [ListWasEmpty] - same as List.get, except:
this does not help
the hypothesis is that maybe the length = 0 branch is easier for LLVM to optimize away in some scenarios than the comparison to a variable number
oh you already tried something similar?
the problem is refcount operations, not the bounds check
oh I was responding to this part:
Folkert de Vries said:
replacing the other List.get operations with their unsafe counterparts halves execution time again.
I assumed that was bounds check, but do the unsafe ones also not do RC?
yes, that is the crucial difference. getUnsafe does not do any inc/dec operations on the list
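For context, a hedged sketch of the difference being described (List.getUnsafe is an internal function that was only reachable here via the hacks mentioned above, so this is illustrative rather than normal app code):

# Safe read: returns a Result and, per the discussion above, also emits
# refcount inc/dec operations on the list in this hot path.
readCell = \data, dataPtr ->
    when List.get data dataPtr is
        Ok cell -> cell
        Err _ -> 0

# Unchecked read: the bounds check is written out once by hand, and the read
# itself skips both the Result wrapping and the inc/dec.
readCellUnsafe = \data, dataPtr ->
    if dataPtr < List.len data then
        List.getUnsafe data dataPtr
    else
        0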
I made a second version that uses tail calls instead of a switch https://gist.github.com/folkertdev/3212e3ed7ac124d4b6e9db920cf05099. Sadly it runs into this error in mono
thread '<unnamed>' panicked at 'function symbol `#UserApp.incrementPointer` not in set LambdaSet { set: [], args: [InLayout(U64), InLayout(48), InLayout(U64), InLayout(LIST_U8), InLayout(U64), InLayout(LIST_U8)], ret: InLayout(LIST_U8), representation: InLayout(VOID), full_layout: InLayout(62) }', crates/compiler/mono/src/layout.rs:1576:9
(cc @Ayaz Hafiz )
hmm, this does not work anyway because llvm tail calls are lies
what is extremely disappointing is that this LLVM
define internal fastcc %list.RocList @"#UserApp_incrementPointer_3bbacd33228bca14fe5573efe7278cde33c78fe9028ba98810cff368dece"(i64 %pc, %list.RocList %instructions, i64 %dataPtr, %list.RocList %data, i64 %"108", %list.RocList %output) {
entry:
%call = tail call fastcc i64 @Num_addWrap_c852b6d75d2364d70d094699f8a9cda9129d5310ed82ea45564f47a9(i64 %pc, i64 1)
%call1 = tail call fastcc i64 @Num_addWrap_c852b6d75d2364d70d094699f8a9cda9129d5310ed82ea45564f47a9(i64 %dataPtr, i64 1)
%call2 = tail call fastcc %list.RocList @"#UserApp_next_4afec3b1b615e34b46f852dc4576722a03d82d96cc27deb38d7b350ecaf31"(i64 %call, %list.RocList %instructions, i64 %call1, %list.RocList %data, %list.RocList %output)
ret %list.RocList %call2
}
does not result in an actual tail call
000000000011f110 <#UserApp_incrementPointer_3bbacd33228bca14fe5573efe7278cde33c78fe9028ba98810cff368dece>:
11f110: 55 push %rbp
11f111: 41 57 push %r15
...
11f16a: ff 74 24 68 pushq 0x68(%rsp)
11f16e: e8 8d 0b 00 00 callq 11fd00 <#UserApp_next_4afec3b1b615e34b46f852dc4576722a03d82d96cc27deb38d7b350ecaf31>
11f173: 48 83 c4 38 add $0x38,%rsp
11f177: 5b pop %rbx
11f178: 41 5c pop %r12
11f17a: 41 5d pop %r13
11f17c: 41 5e pop %r14
11f17e: 41 5f pop %r15
11f180: 5d pop %rbp
11f181: c3 retq
11f182: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
11f189: 00 00 00
11f18c: 0f 1f 40 00 nopl 0x0(%rax)
In clang you can force tail calls with musttail. We may need something equivalent to force LLVM to apply the optimization here.
LLVM has the musttail attribute, but in mono we can't really know whether applying it is correct.
Oh, we're doing recursive list refcount updates to all of the nested AST nodes inside of blocks?
in the original code, yes. Not any more in the version that I'm talking about (which without cheats still takes 4 seconds)
wow with optimizations LLVM messes this up completely. it does a bunch of inlining, but cannot turn it into a loop, instead just creating a recursive function that blows the stack very quickly
@Folkert de Vries What happens if you try to make the tail call extremely obvious to LLVM by making the instruction functions State -> State functions (where State == (Nat, List Instruction, Nat, List U8, List U8)) and turning next into something along the lines of

run : State -> List U8
run = \(pc, instructions, dataPtr, data, output) ->
    if pc < List.len instructions then
        nextInstruction = List.getUnsafe instructions pc
        f = lookup nextInstruction.operator
        next_state = f pc instructions dataPtr data nextInstruction.operand output
        run next_state
    else
        output

?
combining the state into one structure is generally not better
Here's the IR for Folkert's version
For the go version
Optimizations Go performs that we don't:
./bf.go:51:17: input does not escape
./bf.go:53:19: make([]uint16, 0) does not escape
./bf.go:73:27: &errors.errorString{...} escapes to heap
./bf.go:85:25: &errors.errorString{...} escapes to heap
./bf.go:90:17: program does not escape
./bf.go:91:14: make([]int16, data_size) escapes to heap
./bf.go:93:27: new(bufio.Reader) does not escape
./bf.go:93:27: make([]byte, bufio.size) escapes to heap
i.e. it is able to allocate some structures on the stack?
yeah. For program and input, if we had that, presumably we could eliminate all RC
also they definitely hoist the bounds check
well, you could eliminate RC, but you'd also need to know that the data always stays unique
which morphic might be able to tell us statically sometimes, but in general the refcount is also crucial for us to allow in-place updates
if it doesn't escape in the function we don't need to bump refcounts on each loop
we do. What if each loop iteration appends the value to some list
(I know it's not needed in this example, but in general RC is not just about memory safety in our case)
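A sketch of the scenario being described, with made-up names: if an iteration stashes the current data list somewhere, for example by appending it to a history, the list is no longer unique, and the refcount is what tells the next update that it has to clone rather than mutate in place.

# `history` keeps a live reference to `data`, so the List.update below sees a
# refcount greater than 1 and must copy `data` before writing. Dropping the
# refcount operations here would silently corrupt `history`, which is why
# escape analysis alone is not enough to remove them.
stepWithHistory = \data, history ->
    newHistory = List.append history data
    newData = List.update data 0 (\cell -> Num.addWrap cell 1)
    (newData, newHistory)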
likewise, to hoist the bounds check, we'd need to know that the list length is a constant
well, that is what i'm saying about escape analysis
my assumption was that go performs escape analysis on a function level
you'd need to do it on the scope level here, and even then you'd need to check that there is no mutation I think?
well in order to see it on a function level, we must also see it on a per-scope level?
anyway, maybe this is a red herring if we are still slower after removing refcounts
yeah, I got that version down to ~400ms. In the original benchmarks go was at 1.2ms. (I did not run that locally though)
Does anyone have an idea for how the musttail attribute is actually added? https://llvm.org/docs/LangRef.html#id328. There is a special function in the LLVM C API to add a normal tail call, but musttail does not really seem to be supported there.
there is also the tailcc calling convention, which may or may not be relevant
lol wth, so there is tailcc, which will perform tail calls when a call is in tail position. But LLVM "optimizes" the code in such a way that my syntactic tail calls are no longer tail calls
and this, I think, is why I need to use musttail, but there does not seem to be a way to do it ...
hold up
are we sure this Go BF interpreter is correct?
when I run it with @Hannes 's original sample program, https://raw.githubusercontent.com/fabianishere/brainfuck/master/examples/bench/bench-1.bf, it seems to spit out incorrect values (the null byte and then byte 13, rather than OK as a correct interpreter would)
when I try it with the hello world program
>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<+
+.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-
]<+.
inlined into both programs, here are the results:
❯ hyperfine ./brainfuck-go/bf ./brainroc
Benchmark 1: ./brainfuck-go/bf
Time (mean ± σ): 7.1 ms ± 0.9 ms [User: 2.9 ms, System: 2.0 ms]
Range (min … max): 5.6 ms … 9.8 ms 253 runs
Benchmark 2: ./brainroc
Time (mean ± σ): 4.3 ms ± 0.7 ms [User: 2.4 ms, System: 1.0 ms]
Range (min … max): 3.3 ms … 8.1 ms 293 runs
Wow @Ayaz Hafiz you're right! I tried the Go interpreter with the hello world program and some others like Sierpinski's triangle and it worked fine for all of them, but when I ran that benchmark script hyperfine suppressed the output, so I didn't see that it was producing incorrect output!
Here are some benchmarks where the two interpreters do actually produce the same output.
Hello world:
COMMAND │ MEAN [MS] │ MIN [MS] │ MAX [MS] │ RELATIVE
───────────────────────────────────┼───────────┼──────────┼──────────┼──────────────
./bf-go examples/hello.bf │ 1.2 ± 0.1 │ 0.8 │ 1.8 │ 1.00
./src/brainroc examples/hello.bf │ 1.3 ± 0.1 │ 1.2 │ 2.5 │ 1.12 ± 0.15
Sierpinski's triangle:
COMMAND │ MEAN [MS] │ MIN [MS] │ MAX [MS] │ RELATIVE
─────────────────────────────────┼───────────┼──────────┼──────────┼──────────────
./bf-go triangle.bf │ 3.0 ± 0.3 │ 2.7 │ 4.7 │ 1.00
./src/brainroc triangle.bf │ 9.1 ± 0.3 │ 8.7 │ 10.2 │ 3.07 ± 0.29
Benchmark 2:
COMMAND │ MEAN [S] │ MIN [S] │ MAX [S] │ RELATIVE
────────────────────────────┼────────────────┼─────────┼─────────┼───────────────
./bf-go bench-2.bf │ 1.364 ± 0.026 │ 1.348 │ 1.413 │ 1.00
./src/brainroc bench-2.bf │ 54.819 ± 0.135 │ 54.579 │ 55.057 │ 40.18 ± 0.76
You may want to compare to another interpreter that works in all cases. Even if these produce the same results, the roc version may be doing more work/tracking more things. So the go version may have a structural advantage that leads to much faster execution. If it was handling whatever case it is missing that fails bench 1, it might slow down all other cases (not sure of this, but it is a pretty common benchmarking issue)
Good point, I picked this Go implementation because it's the first straightforward implementation I found which is more comparable to mine than some of the bigger ones. I'll try a few and see how they compare, making sure to check the output is correct this time! :sweat_smile:
alright, interesting. Here is my time for my "vanilla" interpreter for the bench-2 benchmark
Executed in 16.75 secs fish external
usr time 16.75 secs 598.00 micros 16.75 secs
sys time 0.00 secs 51.00 micros 0.00 secs
and the time with unsafe cheats
Executed in 1.27 secs fish external
usr time 1271.84 millis 571.00 micros 1271.27 millis
sys time 0.07 millis 66.00 micros 0.00 millis
so that would mean that without the overhead of the list operations, we're in the same order of magnitude at least
I wrote my own version of a BF interpreter.
On one hand, to dust off my own Roc skills, since it's been a while since I programmed in Roc.
On the other, because I've become a liiitle bit nerdsniped by the issue we are facing :sweat_smile:
https://github.com/Qqwy/roc-bf_example
This implementation splits the interpreter into two parts:
- runOne, which performs a single instruction's interpreter State -> State transformation.
- run, which wraps it and performs the desired recursion to run a program to completion.
(And besides this, it is very similar to Folkert's vanilla implementation, with no unsafe cheats.)
It seemed like a good idea to be able to look at these two parts in isolation.
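The split described above looks roughly like this (a sketch only; State, isFinished, and runOne's body are placeholders, not the code from the linked repo):

# run drives the interpreter to completion by repeatedly applying runOne,
# which performs a single instruction's State -> State transformation.
run : State -> State
run = \state ->
    if isFinished state then
        state
    else
        run (runOne state)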
However... it results in a program that segfaults on larger BF programs (?!)
I've tried adding dbg statements, so the intermediate state after each executed instruction can be seen.
Oddly, running roc dev stops the program after 249 dbg lines are printed.
I'm very confused by this. It seems like the crash during dbg happens much earlier than the segfault if you don't run in dev mode.
How do I continue debugging something like this?
Could it be the case that here, too, something that is a tail call in the source does not actually result in a tail call in the binary?
I have definitely seen the dbg issue where, after some number of dbgs, it crashes
Tends to vary a bit by system, but no idea why it happens. Dbg is over shared memory with the compiler, so we probably have a bug there
Brendan Hansknecht said:
I have definitely seen the dbg issue where, after some number of dbgs, it crashes
I'm running it in lldb now. Seems like a call to roc dev ./brainroc ./examples/some_program.bf just exits with status code 0 after 249 iterations :confused:
Running the program compiled using roc build in lldb shows that the segfault happens inside malloc, so it might be an out-of-memory issue maybe? :thinking:
Output from lldb:
For bottles.bt:
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
frame #0: 0x000000010000d6f4 brainroc`decrement_refcounted_ptr_8
brainroc`decrement_refcounted_ptr_8:
-> 0x10000d6f4 <+0>: stp x29, x30, [sp, #-0x10]!
0x10000d6f8 <+4>: ldr x8, [x0]
0x10000d6fc <+8>: cbz x8, 0x10000d71c ; <+40>
0x10000d700 <+12>: mov x10, #-0x8000000000000000
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
* frame #0: 0x000000010000d6f4 brainroc`decrement_refcounted_ptr_8
frame #1: 0x000000010000d8a0 brainroc`___lldb_unnamed_symbol333506 + 40
frame #2: 0x000000010000a638 brainroc`List_get_53eef38977ca9e3af29e8b6fc9f50f557be9bbd173abd2118eb5488f19fb2 + 100
frame #3: 0x00000001000095b4 brainroc`List_update_82d12b6b12b8099eecd17c66fbb6ef41a8137f8c1ef74478e7bee978be8f9 + 64
frame #4: 0x000000010000a6e8 brainroc`#UserApp_dec_ca98df76e744faeef21fb76918295997c9c1b552a2e623d6fc162a11de8fae + 84
frame #5: 0x0000000100007144 brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 240
frame #6: 0x0000000100009f48 brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
frame #7: 0x0000000100003744 brainroc`#UserApp_36_789661f33c6ea1791479ecf1f52dd93e21b779364a5197d9de3459113903b9c + 68
frame #8: 0x0000000100004380 brainroc`Task_39_5cbbff1635f59ae21a02af6cfe0157283a05fb77d9b6ef4377a9133a78ffbe5 + 164
frame #9: 0x00000001000043d8 brainroc`Effect_effect_after_inner_935229af12e1c6ee752a73f7b73add5a7c7a22cfba9e577e778e240ed627a + 72
frame #10: 0x000000010000b944 brainroc`Effect_effect_after_inner_1b6b9e2f2c8025d6941d1d79426973c1ba899598ef8eecc9bea3f5f3657b4477 + 84
frame #11: 0x000000010000e2e8 brainroc`roc__mainForHost_0_caller + 12
frame #12: 0x0000000100034d60 brainroc`host::call_the_closure::h2bc8f279b028add5 + 160
frame #13: 0x0000000100034c9c brainroc`rust_main + 132
frame #14: 0x000000010000f4d4 brainroc`main + 20
frame #15: 0x00000001a79fff28 dyld`start + 2236
For sierpinski.bt, bench1.bt, bench2.bt:
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
frame #0: 0x00000001a7b946b8 libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc + 4
libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc:
-> 0x1a7b946b8 <+4>: stp x20, x19, [sp, #-0x20]!
0x1a7b946bc <+8>: stp x29, x30, [sp, #0x10]
0x1a7b946c0 <+12>: add x29, sp, #0x10
0x1a7b946c4 <+16>: add x8, x1, #0xf
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
* frame #0: 0x00000001a7b946b8 libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc + 4
frame #1: 0x0000000100034134 brainroc`roc_alloc + 24
frame #2: 0x0000000100009c40 brainroc`List_replaceUnsafe_8256d790c99390129cd6628d4d43bc44f55cfb83af722d8248666b192be24d65 + 176
frame #3: 0x0000000100003584 brainroc`List_update_9aea80b84c2f1cf95fb346d143d7e8d50c7bf2b25952d5f8f5db3c06a669b1a + 104
frame #4: 0x0000000100009a84 brainroc`#UserApp_inc_ab6bf482529beb271842e34a3922217532b2f57565d72aea510e9edbdcef6 + 84
frame #5: 0x0000000100007158 brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 260
frame #6: 0x0000000100009f48 brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
frame #7: 0x0000000100003744 brainroc`#UserApp_36_789661f33c6ea1791479ecf1f52dd93e21b779364a5197d9de3459113903b9c + 68
frame #8: 0x0000000100004380 brainroc`Task_39_5cbbff1635f59ae21a02af6cfe0157283a05fb77d9b6ef4377a9133a78ffbe5 + 164
frame #9: 0x00000001000043d8 brainroc`Effect_effect_after_inner_935229af12e1c6ee752a73f7b73add5a7c7a22cfba9e577e778e240ed627a + 72
frame #10: 0x000000010000b944 brainroc`Effect_effect_after_inner_1b6b9e2f2c8025d6941d1d79426973c1ba899598ef8eecc9bea3f5f3657b4477 + 84
frame #11: 0x000000010000e2e8 brainroc`roc__mainForHost_0_caller + 12
frame #12: 0x0000000100034d60 brainroc`host::call_the_closure::h2bc8f279b028add5 + 160
frame #13: 0x0000000100034c9c brainroc`rust_main + 132
frame #14: 0x000000010000f4d4 brainroc`main + 20
frame #15: 0x00000001a79fff28 dyld`start + 2236
(helloworld.bt and print0to999.bt run fine).
in the platform, you can make roc_alloc print e.g. how many bytes it wants to allocate. That might be helpful
is the problem here the dbg, btw, or does this always happen?
also something I've found often helpful is to put lldb backtraces and/or .ll bytecode into gpt4 and say what symptom I'm seeing and ask if it sees anything - I've had good success with that in the past!
Folkert de Vries said:
is the problem here the dbg, btw, or does this always happen?
This is the output you get after running a program compiled using a normal roc build, which IIRC strips all dbg calls.
(To be precise, I run cargo run -- build --debug ../bf/main.roc; the roc directory containing the roc repo and the bf directory containing the bf interpreter project are sibling directories.)
Folkert de Vries said:
in the platform, you can make roc_alloc print e.g. how many bytes it wants to allocate. That might be helpful
It looks like this:
# ... many more similar lines here
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 930...
Ptr: 0x102904440
Deallocating 0x102904f70
Allocating 1008...
Ptr: 0x102904c20
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904c20
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Process 41816 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fd0)
frame #0: 0x00000001008c1f8c brainroc`_$LT$std..io..stdio..StdoutLock$u20$as$u20$std..io..Write$GT$::write_all::hd86c4a41d5107153 at stdio.rs:735 [opt]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fd0)
* frame #0: 0x00000001008c1f8c brainroc`_$LT$std..io..stdio..StdoutLock$u20$as$u20$std..io..Write$GT$::write_all::hd86c4a41d5107153 at stdio.rs:735 [opt]
frame #1: 0x00000001008c4af8 brainroc`_$LT$std..io..Write..write_fmt..Adapter$LT$T$GT$$u20$as$u20$core..fmt..Write$GT$::write_str::h968df0371a4b9768 at mod.rs:1671:23 [opt]
frame #2: 0x00000001009197a0 brainroc`core::fmt::Formatter::pad_integral::hb214d8a54e5b3e5f at mod.rs:0 [opt]
frame #3: 0x000000010091bb00 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d [inlined] core::fmt::num::GenericRadix::fmt_int::hd198a0fe668a9d31 at num.rs:110:9 [opt]
frame #4: 0x000000010091ba98 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d [inlined] core::fmt::num::_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$usize$GT$::fmt::h338abe56eabf0dc0 at num.rs:155:17 [opt]
frame #5: 0x000000010091ba98 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d at mod.rs:2503:15 [opt]
frame #6: 0x0000000100822a88 brainroc`_$LT$$BP$const$u20$T$u20$as$u20$core..fmt..Pointer$GT$::fmt::h88e433855bb7f807(self=0x000000016f6041b8, f=0x000000016f604200) at mod.rs:2475:9
frame #7: 0x0000000100822a48 brainroc`_$LT$$BP$mut$u20$T$u20$as$u20$core..fmt..Pointer$GT$::fmt::heda03ebd385f5e29(self=0x000000016f6043f0, f=0x000000016f604200) at mod.rs:2514:9
frame #8: 0x0000000100822a0c brainroc`_$LT$$BP$mut$u20$T$u20$as$u20$core..fmt..Debug$GT$::fmt::hc2e87981d6756b50(self=0x000000016f6043f0, f=0x000000016f604200) at mod.rs:2543:9
frame #9: 0x0000000100919434 brainroc`core::fmt::write::hd33da007f7a27e39 at mod.rs:1208:17 [opt]
frame #10: 0x00000001008c1ab8 brainroc`_$LT$$RF$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h881b8d60b2402de3 [inlined] std::io::Write::write_fmt::hc2f28f6e637fac24 at mod.rs:1682:15 [opt]
frame #11: 0x00000001008c1a90 brainroc`_$LT$$RF$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h881b8d60b2402de3 at stdio.rs:716:9 [opt]
frame #12: 0x00000001008c38dc brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 [inlined] _$LT$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h687f60646da13c0c at stdio.rs:690:9 [opt]
frame #13: 0x00000001008c38b8 brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 [inlined] std::io::stdio::print_to::h68985b56676cff12 at stdio.rs:1008:21 [opt]
frame #14: 0x00000001008c387c brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 at stdio.rs:1075:5 [opt]
frame #15: 0x00000001000265d8 brainroc`roc_alloc(size=1008, _alignment=8) at lib.rs:43:5
frame #16: 0x000000010000ae14 brainroc`List_replaceUnsafe_dbbb614026929029a924a622e5a645206e5e1277bd8c25cb7b78527df1a8c + 176
frame #17: 0x000000010000a780 brainroc`List_update_b7f585fd6f82cd9d725cedb302bc1875b5e7bb83a5ac59033a7d4152c1447 + 104
frame #18: 0x000000010000b874 brainroc`#UserApp_dec_ca98df76e744faeef21fb76918295997c9c1b552a2e623d6fc162a11de8fae + 84
frame #19: 0x000000010000830c brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 240
frame #20: 0x000000010000b11c brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
frame #21: 0x00000001000047c4 brainroc`#UserApp_36_66a53c5bb7482a7975e058b703753a62aee74dcb3ed8c2fbfc227399ea738e + 68
frame #22: 0x0000000100005400 brainroc`Task_39_f2eb2e65858ef9a081c444e7b9b2cef1ed51b5a1e38027833034b9f057aa3131 + 164
frame #23: 0x0000000100007e74 brainroc`Effect_effect_after_inner_3bfae27b50cc70419dec89ef8da341b1287d7bb7b3c4bb2481ba28b17a8ec4 + 72
frame #24: 0x000000010000cd98 brainroc`Effect_effect_after_inner_4f1fdebc72623b7987dfbf57d7b81537b885c9e4ce317a63dbf262856adf + 84
frame #25: 0x000000010000f3d4 brainroc`roc__mainForHost_0_caller + 12
frame #26: 0x0000000100027198 brainroc`host::call_the_closure::h337ad65c516ada9b(closure_data_ptr="A BF interpreter.") at lib.rs:234:5
frame #27: 0x0000000100027110 brainroc`rust_main at lib.rs:223:9
frame #28: 0x0000000100010554 brainroc`main + 20
frame #29: 0x00000001a79fff28 dyld`start + 2236
(lldb)
The allocations are all as large as dataSize (+ 8 for the refcount, I assume). So I think the program is not changing the data list in place.
However, what is strange is that it looks to me like these allocations are all short-lived. And the error we get is not an OOM error but a 'BAD ACCESS' error
Counting how many allocations are alive, by keeping track in a local static variable, shows that there are only 4 allocs alive during the main inner loop (and at the time of the segfault).
yeah, I'm not sure how alloc could cause a bad access
so I wonder if maybe the real cause is something else
the tool to find the actual issue is valgrind, which does not run on mac sadly
(how do mac developers get anything done???)
you can try using macOS's debug malloc to tell you if it's actually a bad access:
DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib