Stream: beginners

Topic: Help optimising my BF interpreter


view this post on Zulip Hannes (Jul 13 2023 at 08:17):

Hello everyone :)

I've written a BF interpreter in Roc that works (in that it runs scripts correctly), but it's incredibly slow. I picked the first implemtation that I could find online (one that's written in Go) and benchmarked it against mine, it was 9000 times faster on a benchmark script. :sweat_smile:

Here's a link to the repo: https://github.com/Hasnep/brainroc the interpreter is in src/Interpreter.roc. I'm pretty new to FP, so I'm sure my algorithm is very inefficient. It's very recursive which might be causing some problems, I tried rewriting it using List.walk insread of recursion, but I struggled with how to handle loops.

Any suggestions to optimise or just improve overall code quality would be very helpful, thanks!

view this post on Zulip Kiryl Dziamura (Jul 13 2023 at 09:51):

A bit tangential (because it's not about FP but Rust implementation of BF interpreter/compiler), but maybe it can give some insights. Brilliant series of articles (adapted from C++ but anyway)
https://rodrigodd.github.io/

view this post on Zulip Hannes (Jul 13 2023 at 10:13):

Thanks @Kiryl Dziamura, I'm reading the first post now and it's really cleanly explained, I'm glad to see some of the same ideas I've had appearing there. That'll be really useful when I get round to some more fine grained optimisations, but I think I still have some more fundamental problems with my code at the moment that I want to tackle first :sweat_smile:

view this post on Zulip Kiryl Dziamura (Jul 13 2023 at 10:23):

My guess is that this function https://github.com/Hasnep/brainroc/blob/f54d9ee71332e36ea4342abb2054bc80fb0aa484/src/Interpreter.roc#L56-L77

is not tail recursive so roc compiler cannot apply the optimization. But I didn't check it

view this post on Zulip Folkert de Vries (Jul 13 2023 at 10:24):

it should be because we desugar |>

view this post on Zulip Folkert de Vries (Jul 13 2023 at 10:25):

but I'd need to check it too. Another potential problem could be if the input program list is not unique (and hence cloned a bunch)

view this post on Zulip Kiryl Dziamura (Jul 13 2023 at 10:26):

but would desugared call bubble up above the when?

view this post on Zulip Folkert de Vries (Jul 13 2023 at 10:26):

after desugaring the |>, the recursive call is in tail position, right?

view this post on Zulip Kiryl Dziamura (Jul 13 2023 at 10:28):

ah, yes, my bad

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 14:43):

Do you have a link to the original source as well?

Also, when I get the chance, I have some ideas, but want to verify them first.

view this post on Zulip Notification Bot (Jul 13 2023 at 19:39):

Logan Lowder has marked this topic as resolved.

view this post on Zulip Notification Bot (Jul 13 2023 at 19:39):

Logan Lowder has marked this topic as unresolved.

view this post on Zulip Logan Lowder (Jul 13 2023 at 19:40):

Whoops miss-click, sorry

view this post on Zulip Hannes (Jul 14 2023 at 01:29):

Here's the script I used to benchmark the interpreters, run from the root of the brainroc repo:

# Download benchmark script
curl -C - https://raw.githubusercontent.com/fabianishere/brainfuck/master/examples/bench/bench-1.bf --output examples/bench/bench-1.bf --create-dirs

# Build Go interpreter
git clone https://github.com/kgabis/brainfuck-go.git /tmp/bf-go
pushd /tmp/bf-go
go build bf.go
popd
cp /tmp/bf-go/bf ./bf-go

# Build Roc interpreter
roc build --optimize src/main.roc

# Benchmark the two interpreters
SCRIPT=examples/bench/bench-1.bf hyperfine "./bf-go $SCRIPT" "./src/brainroc $SCRIPT"

And this is the output I get:

Benchmark #1: ./bf-go examples/bench/bench-1.bf
  Time (mean ± σ):       1.2 ms ±   0.2 ms    [User: 1.1 ms, System: 0.4 ms]
  Range (min … max):     0.8 ms …   2.1 ms    1378 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Benchmark #2: ./src/brainroc examples/bench/bench-1.bf
  Time (mean ± σ):     11.128 s ±  0.040 s    [User: 11.123 s, System: 0.002 s]
  Range (min … max):   11.073 s … 11.182 s    10 runs

Summary
  './bf-go examples/bench/bench-1.bf' ran
 9343.95 ± 1669.68 times faster than './src/brainroc examples/bench/bench-1.bf'

view this post on Zulip Brendan Hansknecht (Jul 14 2023 at 17:46):

So fundamentally, the app is spending all its time in memcpy.

view this post on Zulip Brendan Hansknecht (Jul 14 2023 at 17:46):

So definitely a uniqueness issue. There is something you would hope never gets copied, but it is getting copied (probably every iteration of run)

view this post on Zulip Brendan Hansknecht (Jul 14 2023 at 17:48):

Haven't figured out the root cause of the coping yet.

view this post on Zulip Hannes (Jul 15 2023 at 02:09):

Thank you @Brendan Hansknecht, that's a very useful place for me to start finding some performance improvements

view this post on Zulip Brendan Hansknecht (Jul 15 2023 at 02:32):

Hopefully it helps. I tried a few things I thought would help this morning and got quite confused when none of them worked.

view this post on Zulip Brendan Hansknecht (Jul 15 2023 at 02:33):

Won't have time this weekend to look at it more

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:13):

I'm looking into this a bit now. Bunch of things going on. First of all, combining all of the state into one record does not play well with our RC mechanism.

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:14):

inlining the helpers and splitting out some fields brings me down to ~ 4s. Still terrible, but a bit better

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:16):

record updates (when some fields are reference-counted) are still a problem

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:17):

by working around some increments/decrements with hacks (exposing unsafe functions) I can get it down to 1s ish. Still terrible though

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:45):

@Brendan Hansknecht how did you figure out that the majority of the time is spent in memcpy? I still suspect that that is going on, but I don't see it in my perf output, and also we're not clearly cloning very long lists (using the zig code we have for that, anyway)

view this post on Zulip Brendan Hansknecht (Jul 16 2023 at 15:49):

So I am not 100% sure I was correct cause in release build with the legacy link the stack got cut off too early for me to be fully sure. With debug build and legacy linker, profile seemed similar but the stack would go deeper. In that case, almost all time at the deepest stack level was in the new Facebook memcpy assembly code.

So not complete convinced or sure but seemed reasonable. That said, given none of my optimization tests really worked, I am definitely confused and unsure.

view this post on Zulip Folkert de Vries (Jul 16 2023 at 15:49):

ah, I guess I'm working with a platform without debug info, because I see none of that

view this post on Zulip Brendan Hansknecht (Jul 16 2023 at 15:52):

I think I had to update the platform to get it to compile on my Linux machine.

view this post on Zulip Brendan Hansknecht (Jul 16 2023 at 15:53):

Also, had to change the app not to use the args module cause it wasn't working on my Linux machine for some reason

view this post on Zulip Folkert de Vries (Jul 16 2023 at 18:10):

I made this change https://github.com/roc-lang/roc/pull/5667. I don't think it really helps here, but I noticed this missed opportunity while looking at the IR

view this post on Zulip Folkert de Vries (Jul 16 2023 at 18:10):

well I mean it brings down memory consumption, but for the runtime it's not a huge improvement

view this post on Zulip Hannes (Jul 17 2023 at 01:36):

Wow, thanks for looking into this @Folkert de Vries and @Brendan Hansknecht :) I'm not able to contribute directly to the compiler much, so I'm glad my bad code can help improve it indirectly :sweat_smile:

view this post on Zulip Hannes (Jul 17 2023 at 01:36):

I also had trouble with the argument parsing on Linux, I think the legacy linker messes up the commandline arguments passed to the program so that the binary isn't the first argument.

view this post on Zulip Hannes (Jul 17 2023 at 01:38):

I started re-writing the interpreter to try to not pass around such a big struct all the time, if I get it working I'll update this thread with the benchmark

view this post on Zulip Anton (Jul 17 2023 at 08:58):

Hannes said:

I also had trouble with the argument parsing on Linux, I think the legacy linker messes up the commandline arguments passed to the program so that the binary isn't the first argument.

This is a known issue, but I haven't been able to investigate it deeply yet

view this post on Zulip Anton (Jul 17 2023 at 08:58):

See https://github.com/roc-lang/basic-cli/issues/82

view this post on Zulip Ayaz Hafiz (Jul 18 2023 at 19:58):

I am taking a look at this, and one thing that is significantly different between the two interpreters is that the go version does not use a recursive datatype for representing the ast nodes, whereas the Roc version does (https://github.com/Hasnep/brainroc/blob/f54d9ee71332e36ea4342abb2054bc80fb0aa484/src/Parser.roc#L3-L10). I suspect if you can eliminate this, the runtime will drop significantly, since the recursive structure forces a ton of allocations (even with Folkert's change)

posting some other observations in #compiler development (public) > bf interpreter analysis

view this post on Zulip Folkert de Vries (Jul 19 2023 at 12:05):

here is a more direct port of the go code https://gist.github.com/folkertdev/ef1970afe9f00048f163f4d71d5f03f7. it however still takes 4 seconds on my machine

view this post on Zulip Folkert de Vries (Jul 19 2023 at 12:16):

replacing the lookup with a version of get that does not perform inc/dec halves the execution time

    if pc < List.len instructions then
        { operator, operand } = List.getUnsafe instructions pc

view this post on Zulip Folkert de Vries (Jul 19 2023 at 12:19):

replacing the other List.get operations with their unsafe counterparts halves execution time again.

view this post on Zulip Richard Feldman (Jul 19 2023 at 13:14):

hm, but go does bounds checks too, right? :thinking:

view this post on Zulip Folkert de Vries (Jul 19 2023 at 13:19):

sure, but not inc/dec

view this post on Zulip Folkert de Vries (Jul 19 2023 at 13:19):

got it down to 400ms with more unsafe

view this post on Zulip Richard Feldman (Jul 19 2023 at 13:22):

this might be a good place to try an idea I once had that seems silly but maybe might help: List.getWrap : List a, Nat -> Result a [ListWasEmpty] - same as List.get except:

view this post on Zulip Folkert de Vries (Jul 19 2023 at 13:23):

this does not help

view this post on Zulip Richard Feldman (Jul 19 2023 at 13:23):

the hypothesis is that maybe the length = 0 branch is easier for llvm to optimize away in some scenarios than the comparson to a variable #

view this post on Zulip Richard Feldman (Jul 19 2023 at 13:24):

oh you already tried something similar?

view this post on Zulip Folkert de Vries (Jul 19 2023 at 15:49):

the problem is refcount operations, not the bounds check

view this post on Zulip Richard Feldman (Jul 19 2023 at 16:26):

oh I was responding to this part:

Folkert de Vries said:

replacing the other List.get operations with their unsafe counterparts halves execution time again.

view this post on Zulip Richard Feldman (Jul 19 2023 at 16:27):

I assumed that was bounds check, but do the unsafe ones also not do RC?

view this post on Zulip Folkert de Vries (Jul 19 2023 at 16:43):

yes that is the crucial difference. getUnsafe does not do any inc/dec operations on the list

view this post on Zulip Folkert de Vries (Jul 19 2023 at 16:44):

I made a second version that uses tail calls instead of a switch https://gist.github.com/folkertdev/3212e3ed7ac124d4b6e9db920cf05099. Sadly it runs into this error in mono

thread '<unnamed>' panicked at 'function symbol `#UserApp.incrementPointer` not in set LambdaSet { set: [], args: [InLayout(U64), InLayout(48), InLayout(U64), InLayout(LIST_U8), InLayout(U64), InLayout(LIST_U8)], ret: InLayout(LIST_U8), representation: InLayout(VOID), full_layout: InLayout(62) }', crates/compiler/mono/src/layout.rs:1576:9

(cc @Ayaz Hafiz )

view this post on Zulip Folkert de Vries (Jul 19 2023 at 18:40):

hmm, this does not work anyway because llvm tail calls are lies

view this post on Zulip Folkert de Vries (Jul 19 2023 at 18:44):

what is extremely disappointing is that this LLVM

define internal fastcc %list.RocList @"#UserApp_incrementPointer_3bbacd33228bca14fe5573efe7278cde33c78fe9028ba98810cff368dece"(i64 %pc, %list.RocList %instructions, i64 %dataPtr, %list.RocList %data, i64 %"108", %list.RocList %output) {
entry:
  %call = tail call fastcc i64 @Num_addWrap_c852b6d75d2364d70d094699f8a9cda9129d5310ed82ea45564f47a9(i64 %pc, i64 1)
  %call1 = tail call fastcc i64 @Num_addWrap_c852b6d75d2364d70d094699f8a9cda9129d5310ed82ea45564f47a9(i64 %dataPtr, i64 1)
  %call2 = tail call fastcc %list.RocList @"#UserApp_next_4afec3b1b615e34b46f852dc4576722a03d82d96cc27deb38d7b350ecaf31"(i64 %call, %list.RocList %instructions, i64 %call1, %list.RocList %data, %list.RocList %output)
  ret %list.RocList %call2
}

does not result in an actual tail call

000000000011f110 <#UserApp_incrementPointer_3bbacd33228bca14fe5573efe7278cde33c78fe9028ba98810cff368dece>:
  11f110:   55                      push   %rbp
  11f111:   41 57                   push   %r15
  ...
  11f16a:   ff 74 24 68             pushq  0x68(%rsp)
  11f16e:   e8 8d 0b 00 00          callq  11fd00 <#UserApp_next_4afec3b1b615e34b46f852dc4576722a03d82d96cc27deb38d7b350ecaf31>
  11f173:   48 83 c4 38             add    $0x38,%rsp
  11f177:   5b                      pop    %rbx
  11f178:   41 5c                   pop    %r12
  11f17a:   41 5d                   pop    %r13
  11f17c:   41 5e                   pop    %r14
  11f17e:   41 5f                   pop    %r15
  11f180:   5d                      pop    %rbp
  11f181:   c3                      retq
  11f182:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  11f189:   00 00 00
  11f18c:   0f 1f 40 00             nopl   0x0(%rax)

In clang you can force tail calls with musttail. We may need something equivalent to force LLVM to apply the optimization here

view this post on Zulip Folkert de Vries (Jul 19 2023 at 18:45):

llvm has the musttail attribute, but in mono we can't really know whether applying it is correct.

view this post on Zulip Brendan Hansknecht (Jul 19 2023 at 18:46):

Oh, we're doing recursive list refcount updates to all of the nested AST notes inside of blocks?

view this post on Zulip Folkert de Vries (Jul 19 2023 at 18:48):

in the original code, yes. Not any more in the version that I'm talking about (which without cheats still takes 4 seconds)

view this post on Zulip Folkert de Vries (Jul 19 2023 at 19:00):

wow with optimizations LLVM messes this up completely. it does a bunch of inlining, but cannot turn it into a loop, instead just creating a recursive function that blows the stack very quickly

view this post on Zulip Qqwy / Marten (Jul 19 2023 at 20:17):

@Folkert de Vries What happens if you try to make the tail call extremely obvious to LLVM by making the instruction functions State -> State functions (where State == (Nat, List Instruction, Nat, List U8, List U8)) and turning next into something along the lines of

run : State -> List U8
run = \(pc, instructions, dataPtr, data, output) ->
    if pc < List.len instructions then
        nextInstruction = List.getUnsafe instructions pc
        f = lookup nextInstruction.operator
        next_state = f pc instructions dataPtr data nextInstruction.operand output
        run next_state
    else
        output

?

view this post on Zulip Folkert de Vries (Jul 19 2023 at 20:19):

combining the state into one structure is generally not better

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:05):

Here's the IR for Folkert's version
For the go version

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:14):

optimizations go performs that we don't

./bf.go:51:17: input does not escape
./bf.go:53:19: make([]uint16, 0) does not escape
./bf.go:73:27: &errors.errorString{...} escapes to heap
./bf.go:85:25: &errors.errorString{...} escapes to heap
./bf.go:90:17: program does not escape
./bf.go:91:14: make([]int16, data_size) escapes to heap
./bf.go:93:27: new(bufio.Reader) does not escape
./bf.go:93:27: make([]byte, bufio.size) escapes to heap

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:16):

i.e. it is able to allocate some structures on the stack?

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:20):

yeah. For program and input, if we had that, presumably we could eliminate all RC

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:20):

also they definitely hoist the bounds check

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:22):

well you could eliminate RC, but you'd also need to know that the data always says unique

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:23):

which morphic might be able to tell us statically sometimes, but in general the refcount is also crucial for us to allow in-place updates

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:25):

if it doesn't escape in the function we don't need to bump refcounts on each loop

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:25):

we do. What if each loop iteration appends the value to some list

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:26):

(I know it's not needed in this example, but in general RC is not just about memory safety in our case)

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:28):

likewise, to hoist the bounds check, we'd need to know that the list length is a constant

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:28):

well, that is what i'm saying about escape analysis

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:28):

my assumption was that go performs escape analysis on a function level

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:29):

you'd need to do it on the scope level here, and even then you'd need to check that there is no mutation I think?

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:44):

well in order to see it on a function level, we must also see it on a per-scope level?

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 14:45):

anyway maybe this is a red herring if we are still lower after removing refcounts

view this post on Zulip Folkert de Vries (Jul 20 2023 at 14:59):

yeah, I got that version down to ~400ms. In the original benchmarks go was at 1.2ms. (I did not run that locally though)

view this post on Zulip Folkert de Vries (Jul 20 2023 at 15:45):

does anyone have an idea for how the musttail attribute is actually added? https://llvm.org/docs/LangRef.html#id328. there is a special fucntion in the LLVM C api to add a normal tail call, but musttail does not really seem to be supported there

view this post on Zulip Folkert de Vries (Jul 20 2023 at 15:46):

there is also the tailcc calling convention, which may or may not be relevant

view this post on Zulip Folkert de Vries (Jul 20 2023 at 18:02):

lol wth, so there is tailcc which will perform tail calls when a call is in tail position. But llvm "optimizes" the code in such a way that my syntactic tail calls are no longer tail calls

view this post on Zulip Folkert de Vries (Jul 20 2023 at 18:02):

and this, I think, is why I need to use musttail, but there does not seem to be a way to do it ...

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 23:22):

hold up

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 23:22):

are we sure this go bf interpreter is correct?

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 23:23):

when I run it with @Hannes 's original sample program, https://raw.githubusercontent.com/fabianishere/brainfuck/master/examples/bench/bench-1.bf, it seems to spit out incorrect values (the null byte and then byte 13, rather than OK as a correct interpreter would)

view this post on Zulip Ayaz Hafiz (Jul 20 2023 at 23:24):

when I try it with the hello world program

>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<+
+.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-
]<+.

inlined into both programs here are the results

❯ hyperfine ./brainfuck-go/bf ./brainroc
Benchmark 1: ./brainfuck-go/bf
  Time (mean ± σ):       7.1 ms ±   0.9 ms    [User: 2.9 ms, System: 2.0 ms]
  Range (min … max):     5.6 ms …   9.8 ms    253 runs

Benchmark 2: ./brainroc
  Time (mean ± σ):       4.3 ms ±   0.7 ms    [User: 2.4 ms, System: 1.0 ms]
  Range (min … max):     3.3 ms …   8.1 ms    293 runs

view this post on Zulip Hannes (Jul 21 2023 at 02:00):

Wow @Ayaz Hafiz you're right! I tried the Go interpreter with the hello world program and some others like serpinski's triangle and it worked fine for all of them, but when I ran that benchmark script hyperfine supressed the output, so I didn't see that it was producing incorrect output!

view this post on Zulip Hannes (Jul 21 2023 at 02:48):

Here's some benchmarks that do actually produce the same output

Hello world:

              COMMAND              │ MEAN [MS] │ MIN [MS] │ MAX [MS] │  RELATIVE
───────────────────────────────────┼───────────┼──────────┼──────────┼──────────────
  ./bf-go examples/hello.bf        │ 1.2 ± 0.1 │      0.8 │      1.8 │        1.00
  ./src/brainroc examples/hello.bf │ 1.3 ± 0.1 │      1.2 │      2.5 │ 1.12 ± 0.15

Serpinski's triangle:

              COMMAND            │ MEAN [MS] │ MIN [MS] │ MAX [MS] │  RELATIVE
─────────────────────────────────┼───────────┼──────────┼──────────┼──────────────
  ./bf-go        triangle.bf     │ 3.0 ± 0.3 │      2.7 │      4.7 │        1.00
  ./src/brainroc triangle.bf     │ 9.1 ± 0.3 │      8.7 │     10.2 │ 3.07 ± 0.29

Benchmark 2:

              COMMAND        │    MEAN [S]    │ MIN [S] │ MAX [S] │   RELATIVE
────────────────────────────┼────────────────┼─────────┼─────────┼───────────────
  ./bf-go        bench-2.bf │ 1.364 ± 0.026  │   1.348 │   1.413 │         1.00
  ./src/brainroc bench-2.bf │ 54.819 ± 0.135 │  54.579 │  55.057 │ 40.18 ± 0.76

view this post on Zulip Brendan Hansknecht (Jul 21 2023 at 03:15):

You may want to compare to another interpreter that works in all cases. Even if these produce the same results, the roc version may be doing more work/tracking more things. So the go version may have a structural advantage that leads to much faster execution. If it was handling whatever case it is missing that fails bench 1, it might slow down all other cases (not sure of this, but it is a pretty common benchmarking issue)

view this post on Zulip Hannes (Jul 21 2023 at 05:45):

Good point, I picked this Go implementation because it's the first straightforward implementation I found which is more comparable to mine than some of the bigger ones. I'll try a few and see how they compare, making sure to check the output is correct this time! :sweat_smile:

view this post on Zulip Folkert de Vries (Jul 21 2023 at 08:04):

alright, interesting. Here is my time for my "vanilla" interpreter for the bench-2 benchmark

Executed in   16.75 secs   fish           external 
   usr time   16.75 secs  598.00 micros   16.75 secs 
   sys time    0.00 secs   51.00 micros    0.00 secs

and the time with unsafe cheats

Executed in    1.27 secs   fish           external 
   usr time  1271.84 millis  571.00 micros  1271.27 millis 
   sys time    0.07 millis   66.00 micros    0.00 millis

so that would mean that without the overhead of the list operations, we're in the same order of magnitude at least

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:19):

I wrote my own version of a BF interpreter.
On one hand to dust off my own Roc skills since it's been a while I programmed in Roc.
On the other because I've become a liiitle bit nerdsniped by the issue we are facing :sweat_smile:

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:19):

https://github.com/Qqwy/roc-bf_example

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:19):

This implementation splits up the interpreter in

(And besides this, it is very similar to Folkert's vanilla implementation, with no unsafe cheats)

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:19):

It seemed like a good idea to be able to look at these two parts in isolation.
However... it results in a program that segfaults on larger BF programs (?!)

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:20):

I've tried adding dbg statements, so the intermediate state after each executed instruction can be seen.
Oddly, running roc dev stops the program after 249 dbg lines are printed.

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:20):

I'm very confused by this. It seems like the crash during dbg happens much earlier than the segfault if you don't run in dev mode.

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:20):

How to continue debugging something like this?
Could it be the case that here too that something that is a tailcall in the source does not actually result in a tailcall in the binary
?

view this post on Zulip Brendan Hansknecht (Jul 22 2023 at 14:39):

I have definitely seen the dbg issues where after x dbgs, crashes

view this post on Zulip Brendan Hansknecht (Jul 22 2023 at 14:40):

Tends to vary a bit by system, but no idea why it happens. Dbg is over shared memory with the compiler, so we probably have a bug there

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:50):

Brendan Hansknecht said:

I have definitely seen the dbg issues where after x dbgs, crashes

I'm running it in lldb now. Seems like a call to roc dev ./brainroc ./examples/some_program.bf just exits with status code 0 after 249 iterations :confused:

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 14:51):

Running the program compiled using roc build in lldb shows that the segfault happens inside malloc so it might be an out-of-memory issue maybe? :thinking:

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 15:25):

Output from lldb:

For bottles.bt:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
    frame #0: 0x000000010000d6f4 brainroc`decrement_refcounted_ptr_8
brainroc`decrement_refcounted_ptr_8:
->  0x10000d6f4 <+0>:  stp    x29, x30, [sp, #-0x10]!
    0x10000d6f8 <+4>:  ldr    x8, [x0]
    0x10000d6fc <+8>:  cbz    x8, 0x10000d71c           ; <+40>
    0x10000d700 <+12>: mov    x10, #-0x8000000000000000
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
  * frame #0: 0x000000010000d6f4 brainroc`decrement_refcounted_ptr_8
    frame #1: 0x000000010000d8a0 brainroc`___lldb_unnamed_symbol333506 + 40
    frame #2: 0x000000010000a638 brainroc`List_get_53eef38977ca9e3af29e8b6fc9f50f557be9bbd173abd2118eb5488f19fb2 + 100
    frame #3: 0x00000001000095b4 brainroc`List_update_82d12b6b12b8099eecd17c66fbb6ef41a8137f8c1ef74478e7bee978be8f9 + 64
    frame #4: 0x000000010000a6e8 brainroc`#UserApp_dec_ca98df76e744faeef21fb76918295997c9c1b552a2e623d6fc162a11de8fae + 84
    frame #5: 0x0000000100007144 brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 240
    frame #6: 0x0000000100009f48 brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
    frame #7: 0x0000000100003744 brainroc`#UserApp_36_789661f33c6ea1791479ecf1f52dd93e21b779364a5197d9de3459113903b9c + 68
    frame #8: 0x0000000100004380 brainroc`Task_39_5cbbff1635f59ae21a02af6cfe0157283a05fb77d9b6ef4377a9133a78ffbe5 + 164
    frame #9: 0x00000001000043d8 brainroc`Effect_effect_after_inner_935229af12e1c6ee752a73f7b73add5a7c7a22cfba9e577e778e240ed627a + 72
    frame #10: 0x000000010000b944 brainroc`Effect_effect_after_inner_1b6b9e2f2c8025d6941d1d79426973c1ba899598ef8eecc9bea3f5f3657b4477 + 84
    frame #11: 0x000000010000e2e8 brainroc`roc__mainForHost_0_caller + 12
    frame #12: 0x0000000100034d60 brainroc`host::call_the_closure::h2bc8f279b028add5 + 160
    frame #13: 0x0000000100034c9c brainroc`rust_main + 132
    frame #14: 0x000000010000f4d4 brainroc`main + 20
    frame #15: 0x00000001a79fff28 dyld`start + 2236

For sierpinski.bt, bench1.bt, bench2.bt

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
    frame #0: 0x00000001a7b946b8 libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc + 4
libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc:
->  0x1a7b946b8 <+4>:  stp    x20, x19, [sp, #-0x20]!
    0x1a7b946bc <+8>:  stp    x29, x30, [sp, #0x10]
    0x1a7b946c0 <+12>: add    x29, sp, #0x10
    0x1a7b946c4 <+16>: add    x8, x1, #0xf
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603ff0)
  * frame #0: 0x00000001a7b946b8 libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc + 4
    frame #1: 0x0000000100034134 brainroc`roc_alloc + 24
    frame #2: 0x0000000100009c40 brainroc`List_replaceUnsafe_8256d790c99390129cd6628d4d43bc44f55cfb83af722d8248666b192be24d65 + 176
    frame #3: 0x0000000100003584 brainroc`List_update_9aea80b84c2f1cf95fb346d143d7e8d50c7bf2b25952d5f8f5db3c06a669b1a + 104
    frame #4: 0x0000000100009a84 brainroc`#UserApp_inc_ab6bf482529beb271842e34a3922217532b2f57565d72aea510e9edbdcef6 + 84
    frame #5: 0x0000000100007158 brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 260
    frame #6: 0x0000000100009f48 brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
    frame #7: 0x0000000100003744 brainroc`#UserApp_36_789661f33c6ea1791479ecf1f52dd93e21b779364a5197d9de3459113903b9c + 68
    frame #8: 0x0000000100004380 brainroc`Task_39_5cbbff1635f59ae21a02af6cfe0157283a05fb77d9b6ef4377a9133a78ffbe5 + 164
    frame #9: 0x00000001000043d8 brainroc`Effect_effect_after_inner_935229af12e1c6ee752a73f7b73add5a7c7a22cfba9e577e778e240ed627a + 72
    frame #10: 0x000000010000b944 brainroc`Effect_effect_after_inner_1b6b9e2f2c8025d6941d1d79426973c1ba899598ef8eecc9bea3f5f3657b4477 + 84
    frame #11: 0x000000010000e2e8 brainroc`roc__mainForHost_0_caller + 12
    frame #12: 0x0000000100034d60 brainroc`host::call_the_closure::h2bc8f279b028add5 + 160
    frame #13: 0x0000000100034c9c brainroc`rust_main + 132
    frame #14: 0x000000010000f4d4 brainroc`main + 20
    frame #15: 0x00000001a79fff28 dyld`start + 2236

(helloworld.bt and print0to999.bt run fine).

view this post on Zulip Folkert de Vries (Jul 22 2023 at 15:29):

in the platform, you can make roc_alloc print e.g. how many bytes it wants to allocate. That might be helpful

view this post on Zulip Folkert de Vries (Jul 22 2023 at 15:30):

the problem here is the dbg btw, or does this always happen?

view this post on Zulip Richard Feldman (Jul 22 2023 at 15:46):

also something I've found often helpful is to put lldb backtraces and/or .ll bytecode into gpt4 and say what symptom I'm seeing and ask if it sees anything - I've had good success with that in the past!

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 15:48):

Folkert de Vries said:

the problem here is the dbg btw, or does this always happen?

This is the output you get after running a program compiled using a normal roc build which IIRC strips all dbg calls.
(To be precise, I run cargo run -- build --debug ../bf/main.roc; the roc directory containing the roc repo and the bf directory containing the bf interpreter project are sibling directories)

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 16:21):

Folkert de Vries said:

in the platform, you can make roc_alloc print e.g. how many bytes it wants to allocate. That might be helpful

It looks like this:

# ... many more similar lines here
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 1008...
Ptr: 0x102904440
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x102904830
Deallocating 0x102904440
Allocating 930...
Ptr: 0x102904440
Deallocating 0x102904f70
Allocating 1008...
Ptr: 0x102904c20
Deallocating 0x102904830
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904c20
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Ptr: 0x1029047f0
Deallocating 0x102904be0
Allocating 1008...
Ptr: 0x102904be0
Deallocating 0x1029047f0
Allocating 1008...
Process 41816 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fd0)
    frame #0: 0x00000001008c1f8c brainroc`_$LT$std..io..stdio..StdoutLock$u20$as$u20$std..io..Write$GT$::write_all::hd86c4a41d5107153 at stdio.rs:735 [opt]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fd0)
  * frame #0: 0x00000001008c1f8c brainroc`_$LT$std..io..stdio..StdoutLock$u20$as$u20$std..io..Write$GT$::write_all::hd86c4a41d5107153 at stdio.rs:735 [opt]
    frame #1: 0x00000001008c4af8 brainroc`_$LT$std..io..Write..write_fmt..Adapter$LT$T$GT$$u20$as$u20$core..fmt..Write$GT$::write_str::h968df0371a4b9768 at mod.rs:1671:23 [opt]
    frame #2: 0x00000001009197a0 brainroc`core::fmt::Formatter::pad_integral::hb214d8a54e5b3e5f at mod.rs:0 [opt]
    frame #3: 0x000000010091bb00 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d [inlined] core::fmt::num::GenericRadix::fmt_int::hd198a0fe668a9d31 at num.rs:110:9 [opt]
    frame #4: 0x000000010091ba98 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d [inlined] core::fmt::num::_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$usize$GT$::fmt::h338abe56eabf0dc0 at num.rs:155:17 [opt]
    frame #5: 0x000000010091ba98 brainroc`core::fmt::pointer_fmt_inner::hb0b335e6a5beaa1d at mod.rs:2503:15 [opt]
    frame #6: 0x0000000100822a88 brainroc`_$LT$$BP$const$u20$T$u20$as$u20$core..fmt..Pointer$GT$::fmt::h88e433855bb7f807(self=0x000000016f6041b8, f=0x000000016f604200) at mod.rs:2475:9
    frame #7: 0x0000000100822a48 brainroc`_$LT$$BP$mut$u20$T$u20$as$u20$core..fmt..Pointer$GT$::fmt::heda03ebd385f5e29(self=0x000000016f6043f0, f=0x000000016f604200) at mod.rs:2514:9
    frame #8: 0x0000000100822a0c brainroc`_$LT$$BP$mut$u20$T$u20$as$u20$core..fmt..Debug$GT$::fmt::hc2e87981d6756b50(self=0x000000016f6043f0, f=0x000000016f604200) at mod.rs:2543:9
    frame #9: 0x0000000100919434 brainroc`core::fmt::write::hd33da007f7a27e39 at mod.rs:1208:17 [opt]
    frame #10: 0x00000001008c1ab8 brainroc`_$LT$$RF$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h881b8d60b2402de3 [inlined] std::io::Write::write_fmt::hc2f28f6e637fac24 at mod.rs:1682:15 [opt]
    frame #11: 0x00000001008c1a90 brainroc`_$LT$$RF$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h881b8d60b2402de3 at stdio.rs:716:9 [opt]
    frame #12: 0x00000001008c38dc brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 [inlined] _$LT$std..io..stdio..Stdout$u20$as$u20$std..io..Write$GT$::write_fmt::h687f60646da13c0c at stdio.rs:690:9 [opt]
    frame #13: 0x00000001008c38b8 brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 [inlined] std::io::stdio::print_to::h68985b56676cff12 at stdio.rs:1008:21 [opt]
    frame #14: 0x00000001008c387c brainroc`std::io::stdio::_print::h8769d5b8d07b47a7 at stdio.rs:1075:5 [opt]
    frame #15: 0x00000001000265d8 brainroc`roc_alloc(size=1008, _alignment=8) at lib.rs:43:5
    frame #16: 0x000000010000ae14 brainroc`List_replaceUnsafe_dbbb614026929029a924a622e5a645206e5e1277bd8c25cb7b78527df1a8c + 176
    frame #17: 0x000000010000a780 brainroc`List_update_b7f585fd6f82cd9d725cedb302bc1875b5e7bb83a5ac59033a7d4152c1447 + 104
    frame #18: 0x000000010000b874 brainroc`#UserApp_dec_ca98df76e744faeef21fb76918295997c9c1b552a2e623d6fc162a11de8fae + 84
    frame #19: 0x000000010000830c brainroc`#UserApp_runOne_5250a04b6d2518751b896097c7387fb10ff97b95f3e08a2795e163ade21ee + 240
    frame #20: 0x000000010000b11c brainroc`#UserApp_run_618af7379dc962a8e685497132e67d82737289278bf8d7fadc9a49f2d03ce7 + 44
    frame #21: 0x00000001000047c4 brainroc`#UserApp_36_66a53c5bb7482a7975e058b703753a62aee74dcb3ed8c2fbfc227399ea738e + 68
    frame #22: 0x0000000100005400 brainroc`Task_39_f2eb2e65858ef9a081c444e7b9b2cef1ed51b5a1e38027833034b9f057aa3131 + 164
    frame #23: 0x0000000100007e74 brainroc`Effect_effect_after_inner_3bfae27b50cc70419dec89ef8da341b1287d7bb7b3c4bb2481ba28b17a8ec4 + 72
    frame #24: 0x000000010000cd98 brainroc`Effect_effect_after_inner_4f1fdebc72623b7987dfbf57d7b81537b885c9e4ce317a63dbf262856adf + 84
    frame #25: 0x000000010000f3d4 brainroc`roc__mainForHost_0_caller + 12
    frame #26: 0x0000000100027198 brainroc`host::call_the_closure::h337ad65c516ada9b(closure_data_ptr="A BF interpreter.") at lib.rs:234:5
    frame #27: 0x0000000100027110 brainroc`rust_main at lib.rs:223:9
    frame #28: 0x0000000100010554 brainroc`main + 20
    frame #29: 0x00000001a79fff28 dyld`start + 2236
(lldb)

The size of the allocations is all as large as the dataSize (+ 8 for the refcount I assume). So I think the program is not changing the data list in place.
However, what is strange is that it looks to me like these allocations are all short-lived. And the error we get is not an OOM error but a 'BAD ACCESS' error

view this post on Zulip Qqwy / Marten (Jul 22 2023 at 16:24):

Counting how many allocations are alive by keeping track in a local static variable, shows that there are only 4 allocs alive during the main inner loop (and during the time of the segfault)

view this post on Zulip Folkert de Vries (Jul 22 2023 at 16:26):

yeah, I'm not sure how alloc could cause a bad access

view this post on Zulip Folkert de Vries (Jul 22 2023 at 16:26):

so I wonder if maybe the real cause is something else

view this post on Zulip Folkert de Vries (Jul 22 2023 at 16:26):

the tool to find the actual issue is valgrind, which does not run on mac sadly

view this post on Zulip Folkert de Vries (Jul 22 2023 at 16:27):

(how do mac developers get anything done???)

view this post on Zulip Ayaz Hafiz (Jul 22 2023 at 18:40):

you can try using macos's debug malloc to tell you if it's actually a badaccess

view this post on Zulip Ayaz Hafiz (Jul 22 2023 at 18:40):

DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib


Last updated: Jul 06 2025 at 12:14 UTC