Stream: show and tell

Topic: roc-base64


view this post on Zulip Artur Domurad (Jul 09 2024 at 20:30):

I needed base64 decoding for my r2e (e2e framework) to save base64-encoded PDFs and screenshots from the browser to disk.

I could not find any base64 lib for roc, so I did a basic implementation:
https://github.com/adomurad/roc-base64

The implementation is quite slow, but it seems to work, and it is enough for me for now.

BUT I have noticed that it is really slow when running with an unoptimized build.

e.g. decoding a PNG file (254 KB of base64 text on disk / 191 KB after decoding on disk)

when running with a binary from roc build decode.roc:

➜ time ./decode
./decode 5.17s user 0.00s system 99% cpu 5.172 total

when running with a binary from roc build decode.roc --optimize:

➜ time ./decode
./decode  0.01s user 0.00s system 98% cpu 0.011 total

Here is a "one file repro" if someone wants to look into this:
https://github.com/adomurad/roc-base64/blob/main/temp/decode.roc

I'm using a big when .. is expression to look up indexes for the 64 characters.
When I tried to use a Dict, it went even slower.
unoptimized:

./decodeDict  18.95s user 1.67s system 99% cpu 20.626 total

optimized:

./decodeDict  1.15s user 1.46s system 99% cpu 2.606 total

And here is the implementation with a Dict:
https://github.com/adomurad/roc-base64/blob/main/temp/decodeDict.roc

If someone has a faster implementation - please share :pray:
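For context, the when .. is lookup mentioned above maps each base64 character to its 6-bit index. A condensed sketch of that approach (hypothetical, abridged; the real table in decode.roc has one branch per alphabet character):

```roc
decodeChar : U8 -> U32
decodeChar = \char ->
    when char is
        'A' -> 0
        'B' -> 1
        # ... one branch for each of the 64 alphabet characters ...
        '+' -> 62
        '/' -> 63
        _ -> 0 # padding ('=') and invalid bytes fall through
```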

view this post on Zulip Kiryl Dziamura (Jul 09 2024 at 20:42):

There’s an implementation in roc’s benchmarks. Not sure how good it is tho :big_smile:

https://github.com/roc-lang/roc/tree/main/crates/cli/tests/benchmarks/Base64

view this post on Zulip Artur Domurad (Jul 09 2024 at 20:53):

Thanks :)
I will try it tomorrow.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:57):

Dict currently has a perf issue with refcounted data (like strings or nested lists). I have most of a fix, but have been blocked on getting it to work with the dev backends. So that is probably why it is so much slower.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:57):

As for when .. is, I would guess that an unoptimized build might be generating a giant if/else chain.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:58):

Definitely something I can look into later to verify

view this post on Zulip Agus Zubiaga (Jul 10 2024 at 14:45):

I haven't tested it but I imagine doing some math would be faster (at least in a dev build):

decodeChar = \char ->
    if char >= 'a' then
        char - 'a' + 26
    else if char >= 'A' then
        char - 'A'
    else if char >= '0' then
        char - '0' + 52
    else if char == '+' then
        62
    else if char == '/' then
        63
    else
        0 # ignore

view this post on Zulip Agus Zubiaga (Jul 10 2024 at 14:57):

(This assumes it will only get valid chars, you can add more guards if that isn't the case)
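A guarded variant could return a Result instead of silently mapping invalid bytes to 0. A sketch (not from the original thread; decodeCharChecked and the error tag are hypothetical names):

```roc
decodeCharChecked : U8 -> Result U32 [InvalidBase64Char U8]
decodeCharChecked = \char ->
    if char >= 'a' && char <= 'z' then
        Ok (Num.toU32 (char - 'a' + 26))
    else if char >= 'A' && char <= 'Z' then
        Ok (Num.toU32 (char - 'A'))
    else if char >= '0' && char <= '9' then
        Ok (Num.toU32 (char - '0' + 52))
    else if char == '+' then
        Ok 62
    else if char == '/' then
        Ok 63
    else
        Err (InvalidBase64Char char)
```

Note that the ordering of the range checks in the unguarded version above works because of ASCII values ('a' is 97, 'A' is 65, '0' is 48), so the lowercase check must come first.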

view this post on Zulip Artur Domurad (Jul 10 2024 at 15:53):

I have tried that yesterday, and using a dev build it is:

./decode2  4.05s user 0.00s system 99% cpu 4.054 total

so ~1s faster than the when .. is expression.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 16:42):

Do you have files/browser.base64.txt? I'm curious to run the same file and look at the flamegraph.

Also, your basic-cli URL doesn't seem to exist anymore (I'm guessing this is due to depending on the 0.12.0 prerelease)

view this post on Zulip Artur Domurad (Jul 10 2024 at 17:04):

Here is the file:
browser.base64.txt
It is a screenshot of roc-lang.org :grinning_face_with_smiling_eyes:

And I did notice today my mistake of depending on a prerelease, as my examples stopped working.

view this post on Zulip Anton (Jul 10 2024 at 17:26):

I did notice today my mistake of depending on a prerelease as my examples stopped working.

I added an extra visible warning to the pre-release notes to prevent this in the future

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 17:28):

Interesting. The problem is 100% List.chunksOf.

This should fix debug perf:

decodeList = \list, state ->
    when list is
        [a, b, c, d, .. as rest] ->
            b1 = (0u32 |> Num.shiftLeftBy 6) + (decodeChar a)
            b2 = (b1 |> Num.shiftLeftBy 6) + (decodeChar b)
            b3 = (b2 |> Num.shiftLeftBy 6) + (decodeChar c)
            b4 = (b3 |> Num.shiftLeftBy 6) + (decodeChar d)

            c1 = b4 |> Num.shiftRightBy 16 |> Num.bitwiseAnd 0xff |> Num.toU8
            c2 = b4 |> Num.shiftRightBy 8 |> Num.bitwiseAnd 0xff |> Num.toU8
            c3 = b4 |> Num.shiftRightBy 0 |> Num.bitwiseAnd 0xff |> Num.toU8

            next = List.concat state [c1, c2, c3]
            decodeList rest next

        [] ->
            state

        _ -> crash "Base64.decode: this error should not be possible"

decode : List U8 -> List U8
decode = \bytes ->
    # pad missing "="
    list = bytes |> padMissingChars

    # calc final padding chars
    paddingLen = calculatePadding list

    # Note: Totally would be best to use `List.withCapacity` here.
    decodedList = decodeList list []

    decodedList |> List.dropLast paddingLen
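Following the note in the code above, a sketch of the List.withCapacity change (assuming the same padMissingChars, calculatePadding, and decodeList helpers): every 4 base64 characters decode to 3 bytes, so the output size is known up front and the accumulator list never needs to reallocate.

```roc
decode : List U8 -> List U8
decode = \bytes ->
    list = bytes |> padMissingChars
    paddingLen = calculatePadding list

    # Pre-allocate: every 4 base64 chars decode to 3 bytes.
    capacity = List.len list // 4 * 3
    decodedList = decodeList list (List.withCapacity capacity)

    decodedList |> List.dropLast paddingLen
```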

view this post on Zulip Artur Domurad (Jul 10 2024 at 17:38):

wow, it is almost as fast as the optimized version now.
Thanks!

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 17:44):

Yeah, definitely something we should debug more. I definitely don't think the difference should be that stark. Oh, though maybe the chunked list of lists is hitting the refcounting perf bug. Not sure though.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 18:09):

Yeah, 100% the refcounting issue. Perf is also fixed by the list-size-on-heap branch if I can ever manage to land it.

Summary
  ./ref-fix-opt ran
    1.04 ± 0.08 times faster than ./main-opt
    1.07 ± 0.28 times faster than ./main-no-chunksOf-opt
    1.68 ± 0.28 times faster than ./main-no-chunksOf
    2.25 ± 0.14 times faster than ./ref-fix
 1861.04 ± 122.30 times faster than ./main

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 18:32):

Also, just to clarify, I would expect avoiding chunksOf to generally be the better design even after the perf bug is fixed. This is seen when benchmarking at a really high sample size (2000 executions in this case) to get a more accurate difference in perf:

Summary
  ./main-no-chunksOf-opt ran
    1.15 ± 0.07 times faster than ./ref-fix-opt
    1.18 ± 0.07 times faster than ./main-opt
    1.84 ± 0.08 times faster than ./main-no-chunksOf
    2.62 ± 0.11 times faster than ./ref-fix

Last updated: Jul 06 2025 at 12:14 UTC