Profiling an AoC app · beginners

I feel like I'm missing something. Does anyone see anything obviously wrong with this code? (Forget about the parsing bit, it is immaterial in performance)

app "AoC"
    packages {
        pf: "https://github.com/roc-lang/basic-cli/releases/download/0.7.0/bkGby8jb0tmZYsy2hg1E_B2QrCgcSTxdUlHtETwm5m4.tar.br",
        parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.2.0/dJQSsSmorujhiPNIvJKlQoI92RFIG_JQwUfIxZsCSwE.tar.br",
    }
    imports [
        pf.Stdout,
        pf.Task.{ Task },
        parser.Core.{
            Parser,
            const,
            keep,
            skip,
            oneOf,
            many,
            sepBy1,
        },
        parser.String.{
            string,
            parseStr,
            codeunitSatisfies,
        },
        "day-8-input.txt" as input : Str,
    ]
    provides [main] to pf

main : Task {} *
main =
    Stdout.line part1
    |> Task.await \_ -> Stdout.line ""

part1 : Str
part1 =
    input
    |> Str.trim
    |> parseMap
    |> stepsAAAToZZZ
    |> Num.toStr

Direction : [Left, Right]
Node : { left : Str, right : Str }
Network : Dict Str Node
Map : { directions : List Direction, network : Network }

findNode : Network, Str -> Node
findNode = \network, node ->
    when Dict.get network node is
        Ok n -> n
        Err _ -> crash "bad input: node \(node) not found"

stepsAAAToZZZ : Map -> Nat
stepsAAAToZZZ = \{ directions, network } ->
    go = \steps, ds, d, cur ->
        if cur == "ZZZ" then
            steps
        else
            when List.get ds (Num.rem d (List.len ds)) is
                Ok Left -> go (steps + 1) directions (d + 1) (findNode network cur).left
                Ok Right -> go (steps + 1) directions (d + 1) (findNode network cur).right
                Err _ -> crash "bad input"

    go 0 directions 0 "AAA"

parseMap : Str -> Map
parseMap = \str ->
    pDirection : Parser _ Direction
    pDirection = oneOf [
        const Left |> skip (string "L"),
        const Right |> skip (string "R"),
    ]

    pDirections = many pDirection

    pLabel : Parser _ Str
    pLabel =
        const \c ->
            when Str.fromUtf8 c is
                Ok s -> s
                Err _ -> crash "bad input: non-utf8 character"
        |> keep (many (codeunitSatisfies \b -> b >= 'A' && b <= 'Z'))

    pNode : Parser _ (Str, Node)
    pNode =
        const (\node -> \left -> \right -> (node, { left, right }))
        |> keep pLabel
        |> skip (string " = (")
        |> keep pLabel
        |> skip (string ", ")
        |> keep pLabel
        |> skip (string ")")

    pNetwork : Parser _ Network
    pNetwork =
        const Dict.fromList
        |> keep (sepBy1 pNode (string "\n"))

    pMap : Parser _ Map
    pMap =
        const (\directions -> \network -> { directions, network })
        |> keep pDirections
        |> skip (string "\n\n")
        |> keep pNetwork

    when parseStr pMap str is
        Ok g -> g
        Err (ParsingFailure e) | Err (ParsingIncomplete e) -> crash "bad input: '\(e)'"

If I run this as-is, without opt or #6216, it takes ~9 seconds. stepsAAAToZZZ#go makes 22K iterations.

If I compile with opt and #6216 it's at 860ms. But this is still paltry compared to a simple Node.js version:

❯ hyperfine --warmup 2 ./day8 'node day8.js'
Benchmark 1: ./day8
  Time (mean ± σ):     867.2 ms ±  17.8 ms    [User: 856.2 ms, System: 5.4 ms]
  Range (min … max):   848.4 ms … 895.3 ms    10 runs

Benchmark 2: node day8.js
  Time (mean ± σ):      35.0 ms ±   1.5 ms    [User: 29.7 ms, System: 5.3 ms]
  Range (min … max):    31.5 ms …  38.6 ms    74 runs

Summary
  'node day8.js' ran
   24.77 ± 1.18 times faster than './day8'

const fs = require('fs');
const path = require('path');

const input = fs.readFileSync(path.resolve(__dirname, './day-8-input.txt'), 'utf8');

const dir = {left: 0, right: 1};

function parseInput(input) {
  const [directions, nodes] = input.trim().split('\n\n');
  const dirs = directions.split('').map(d => d === 'L' ? dir.left : dir.right);
  const network = {};
  const parsedNodes = nodes.split('\n').forEach(n => {
    const [name, lrs] = n.split(' = (');
    const left = lrs.substring(0, 3);
    const right = lrs.substring(3 + ', '.length, 3 + ', '.length + 3);
    network[name] = {left, right};
  });
  return {dirs, network};
}

const {dirs, network} = parseInput(input);

function stepsAAAToZZZ(dirs, network) {
  let steps = 0;
  let cur = "AAA";
  while (cur !== "ZZZ") {
    const {left, right} = network[cur];
    const dir = dirs[steps % dirs.length];
    cur = dir === 0 ? left : right;
    steps++;
  }
  return steps;
}

console.log(stepsAAAToZZZ(dirs, network));

I ran a profile and all the time is spent in Dict and doing str refcounts. But 800ms for 22K iterations of that loop feels insane to me. The input file has ~700 nodes. So in the absolute worst case, where each node must be scanned, this Roc program amortizes to 800ms/22K/700~=5ns per scan. While that might be reasonable, it seems super unlikely in practice.

Brendan Hansknecht (Dec 09 2023 at 02:59):

Brendan Hansknecht (Dec 09 2023 at 03:00):

Cause we don't have any sort of smarter refcounting or borrowing, each call to dict.get will increment the refcount and then decrement it.

Brendan Hansknecht (Dec 09 2023 at 03:00):

So it will spend all the time traversing all elements of the dict incrementing and decrementing refcounts

Brendan Hansknecht (Dec 09 2023 at 03:00):

Brendan Hansknecht (Dec 09 2023 at 03:04):

If the strings are small and you write a when to convert them to a static sized tuple, it would run way way faster

Brendan Hansknecht (Dec 09 2023 at 03:05):

Ayaz Hafiz (Dec 09 2023 at 03:12):

Brendan Hansknecht (Dec 09 2023 at 03:15):

Did you make a get method? If so it took ownership of the list, requiring a recursive increment when called and a recursive decrement in the function when the list is dropped. Has to be manually inlined to avoid this

Brendan Hansknecht (Dec 09 2023 at 03:16):

It's a huge perf problem for anything wrapping list.getin another function when the list contains recounted types

Brendan Hansknecht (Dec 09 2023 at 03:25):

Also, before we fix add the borrowing, probably fixing this recounting worst case it's more important. I think I'll work on that next. Shouldn't be too hard

Ryan Bates (Dec 09 2023 at 04:55):

Brendan Hansknecht (Dec 09 2023 at 05:21):

Just to clearly show how fast this should be. Just took Ayaz's code and changed from a Str to a (U8, U8, U8):

New version of the code

app "AoC"
    packages {
        pf: "https://github.com/roc-lang/basic-cli/releases/download/0.7.0/bkGby8jb0tmZYsy2hg1E_B2QrCgcSTxdUlHtETwm5m4.tar.br",
        parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.2.0/dJQSsSmorujhiPNIvJKlQoI92RFIG_JQwUfIxZsCSwE.tar.br",
    }
    imports [
        pf.Stdout,
        pf.Task.{ Task },
        parser.Core.{
            Parser,
            const,
            keep,
            skip,
            oneOf,
            many,
            sepBy1,
        },
        parser.String.{
            string,
            parseStr,
            codeunitSatisfies,
        },
        "day-8-input.txt" as input : Str,
    ]
    provides [main] to pf

main : Task {} *
main =
    Stdout.line part1
    |> Task.await \_ -> Stdout.line ""

part1 : Str
part1 =
    input
    |> Str.trim
    |> parseMap
    |> stepsAAAToZZZ
    |> Num.toStr

Direction : [Left, Right]
Node : { left : (U8, U8, U8), right : (U8, U8, U8)}
Network : Dict (U8, U8, U8) Node
Map : { directions : List Direction, network : Network }

findNode : Network, (U8, U8, U8)-> Node
findNode = \network, node ->
    when Dict.get network node is
        Ok n -> n
        Err _ -> crash "bad input: node \(node |> Inspect.toStr) not found"

stepsAAAToZZZ : Map -> Nat
stepsAAAToZZZ = \{ directions, network } ->
    go = \steps, ds, d, cur ->
        if cur == ('Z', 'Z', 'Z') then
            steps
        else
            when List.get ds (Num.rem d (List.len ds)) is
                Ok Left -> go (steps + 1) directions (d + 1) (findNode network cur).left
                Ok Right -> go (steps + 1) directions (d + 1) (findNode network cur).right
                Err _ -> crash "bad input"

    go 0 directions 0 ('A', 'A', 'A')

parseMap : Str -> Map
parseMap = \str ->
    pDirection : Parser _ Direction
    pDirection = oneOf [
        const Left |> skip (string "L"),
        const Right |> skip (string "R"),
    ]

    pDirections = many pDirection

    pLabel : Parser _ Str
    pLabel =
        const \c ->
            when Str.fromUtf8 c is
                Ok s -> s
                Err _ -> crash "bad input: non-utf8 character"
        |> keep (many (codeunitSatisfies \b -> b >= 'A' && b <= 'Z'))

    pNode : Parser _ ((U8, U8, U8), Node)
    pNode =
        const (\node -> \left -> \right ->
            when (Str.toUtf8 node, Str.toUtf8 left, Str.toUtf8 right) is
                ([n1, n2, n3], [l1, l2, l3], [r1, r2, r3]) ->
                    ((n1,n2,n3), { left: (l1,l2,l3), right: (r1,r2,r3) })
                _ ->
                    crash "bad input: \(node)-\(left)-\(right)"
            )
        |> keep pLabel
        |> skip (string " = (")
        |> keep pLabel
        |> skip (string ", ")
        |> keep pLabel
        |> skip (string ")")

    pNetwork : Parser _ Network
    pNetwork =
        const Dict.fromList
        |> keep (sepBy1 pNode (string "\n"))

    pMap : Parser _ Map
    pMap =
        const (\directions -> \network -> { directions, network })
        |> keep pDirections
        |> skip (string "\n\n")
        |> keep pNetwork

    when parseStr pMap str is
        Ok g -> g
        Err (ParsingFailure e) | Err (ParsingIncomplete e) -> crash "bad input: '\(e)'"

Brendan Hansknecht (Dec 09 2023 at 05:22):

Once we fix the recursive list refcounting, these two should run at essentially the same speed

Luke Boswell (Dec 09 2023 at 05:40):

Nice work Brendan, this is really cool so see how performance can be impacted like this, and that there are solutions.

Brendan Hansknecht (Dec 09 2023 at 05:43):

Declan Joseph Maguire (Dec 09 2023 at 05:44):

I have no contributions to this little optimisation saga, I just want to say that I'm loving it. There's something hypnotic about watching something get optimised.

Brendan Hansknecht (Dec 09 2023 at 05:51):

Hmm, though on the heap I have to think about capacity vs size. Cause ideally we want to know the full capacity for in the future when roc_dealloc is passed the full allocation size. But for recount purposes, we want to know the full size such that the slice can just decrease the refcounts of real elements, not old junk elements. I'll have to think about this a bit more, but something along those lines should free us from this problem (though will keep some refcounted objects alive for longer)

Brendan Hansknecht (Dec 09 2023 at 07:58):

@Folkert de Vries do my comments here sound roughly right to you? Anything I am missing? I know you have thought about this recursive refcounting issue before.

Folkert de Vries (Dec 09 2023 at 13:17):

well fundamentally this is a simple change. But there are many places now where the assumption is made that an inc/dec on a list is recursive, and we have to track them all down and fix them

Brendan Hansknecht (Dec 09 2023 at 16:52):

For sure, there will be memory leaks and double frees for a while as I do this. Hopefully due to tests and valgrind I'll be able to dig through those

Ayaz Hafiz (Dec 09 2023 at 18:17):

Im not sure that refcounting is enough to explain the problem here. In a trace I ran, refcounts of the inner strings as 70ms/300ms of the profile. So I'm not sure that automatically translates to a 100x improvement

Ayaz Hafiz (Dec 09 2023 at 18:18):

Richard Feldman (Dec 09 2023 at 18:27):

I'd try running it with refcounts disabled (like they're a no-op) and see how that compares

Brendan Hansknecht (Dec 09 2023 at 18:30):

Does koka have the recursive refcount issue where if you pass a List Str to a function, it will end up recursively incrementing the refcount of all strings in the list? Or in this case Dict Str {Str, Str} so 3x as many strings to increment. Then at the end of the function it will recursively decrement the refcount of all the strings. (in this case, technically it is just traversing all the strings and realizing they are small then moving to the next one.)

You need full debug info from dwarf and perf using it to see the inlined functions. Almost all of the refcounted functions are inlined

Brendan Hansknecht (Dec 09 2023 at 18:38):

Flamegraph of my faster version with U8s Purple is still refcounting. Essentially all of the time is spent in the parser:
Screenshot-2023-12-09-at-10.37.10AM.png

Brendan Hansknecht (Dec 09 2023 at 18:39):

Musab Nazir (May 22 2024 at 15:22):

Is this still a WIP issue? I ran into a problem where my roc code was running almost a 1000x slower than similar go code. Playing around with it a bit more I realized most of this overhead is coming from using Str key values for my Dict. Using a num type puts me back in go ballpark numbers

Brendan Hansknecht (May 23 2024 at 10:54):

Brendan Hansknecht (May 23 2024 at 10:55):

That said, may be slow to finish. I think most remaining failing tests are segfaults. (Probably also have a number of memory leaks)

Stream: beginners

Topic: Profiling an AoC app

Ayaz Hafiz (Dec 09 2023 at 02:49):

Spoiler: this contains an answer to part 1 of day 8 of AoC 2023.

Brendan Hansknecht (Dec 09 2023 at 02:59):

Brendan Hansknecht (Dec 09 2023 at 03:00):

Brendan Hansknecht (Dec 09 2023 at 03:00):

Brendan Hansknecht (Dec 09 2023 at 03:00):

Brendan Hansknecht (Dec 09 2023 at 03:04):

Brendan Hansknecht (Dec 09 2023 at 03:05):

Ayaz Hafiz (Dec 09 2023 at 03:12):

Brendan Hansknecht (Dec 09 2023 at 03:15):

Brendan Hansknecht (Dec 09 2023 at 03:16):

Brendan Hansknecht (Dec 09 2023 at 03:25):

Ryan Bates (Dec 09 2023 at 04:55):

Brendan Hansknecht (Dec 09 2023 at 05:21):

Brendan Hansknecht (Dec 09 2023 at 05:22):

Luke Boswell (Dec 09 2023 at 05:40):

Brendan Hansknecht (Dec 09 2023 at 05:43):

Declan Joseph Maguire (Dec 09 2023 at 05:44):

Brendan Hansknecht (Dec 09 2023 at 05:51):

Brendan Hansknecht (Dec 09 2023 at 07:58):

Folkert de Vries (Dec 09 2023 at 13:17):

Brendan Hansknecht (Dec 09 2023 at 16:52):

Ayaz Hafiz (Dec 09 2023 at 18:17):

Ayaz Hafiz (Dec 09 2023 at 18:18):

Richard Feldman (Dec 09 2023 at 18:27):

Richard Feldman (Dec 09 2023 at 18:27):

Brendan Hansknecht (Dec 09 2023 at 18:30):

Brendan Hansknecht (Dec 09 2023 at 18:38):

Brendan Hansknecht (Dec 09 2023 at 18:39):

Musab Nazir (May 22 2024 at 15:22):

Brendan Hansknecht (May 23 2024 at 10:54):

Brendan Hansknecht (May 23 2024 at 10:55):