I needed base64 decoding for my r2e (e2e framework) to save base64-encoded PDFs and screenshots from the browser to disk.
I could not find any base64 lib for roc, so I did a basic implementation:
https://github.com/adomurad/roc-base64
The implementation is quite slow - but it seems to work - and it is enough for me for now.
BUT I have noticed that it is really slow when running with an unoptimized build.
e.g. decoding a png file (254KB of base64 text on disk / 191KB after decoding on disk)
When running with a binary from roc build decode.roc:
➜ time ./decode
./decode 5.17s user 0.00s system 99% cpu 5.172 total
When running with a binary from roc build decode.roc --optimize:
➜ time ./decode
./decode 0.01s user 0.00s system 98% cpu 0.011 total
Here is a "one file repro" if someone wants to look into this:
https://github.com/adomurad/roc-base64/blob/main/temp/decode.roc
I'm using a big when .. is expression to look up indexes for the 64 characters.
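For reference, the lookup has roughly this shape (an abridged sketch, not the exact code from the repro):

decodeCharWhen : U8 -> U8
decodeCharWhen = \char ->
    when char is
        'A' -> 0
        'B' -> 1
        'C' -> 2
        # ... one branch per remaining alphabet character ...
        '+' -> 62
        '/' -> 63
        _ -> 0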
When I tried to use a Dict it went even slower.
unoptimized:
./decodeDict 18.95s user 1.67s system 99% cpu 20.626 total
optimized:
./decodeDict 1.15s user 1.46s system 99% cpu 2.606 total
And here is the implementation with a Dict:
https://github.com/adomurad/roc-base64/blob/main/temp/decodeDict.roc
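For comparison, a Dict version can be built from the standard Dict API along these lines (a sketch, not necessarily identical to decodeDict.roc):

charTable : Dict U8 U8
charTable =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    |> Str.toUtf8
    |> List.mapWithIndex \char, index -> (char, Num.toU8 index)
    |> Dict.fromList

decodeCharDict : U8 -> U8
decodeCharDict = \char ->
    # invalid characters fall back to 0, matching the when .. is version
    Dict.get charTable char |> Result.withDefault 0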
If someone has a faster implementation - please share :pray:
There’s an implementation in roc’s benchmarks. Not sure how good it is tho :big_smile:
https://github.com/roc-lang/roc/tree/main/crates/cli/tests/benchmarks/Base64
Thanks :)
I will try it tomorrow.
Dict has a perf issue with refcounted data (like strings or nested lists) currently. I have most of a fix, but have been blocked on getting it to work with the dev backends. So that is probably why it is so much slower.
As for when .. is, I would guess that an unoptimized build might be generating a giant if/else chain. Definitely something I can look into later to verify.
I haven't tested it but I imagine doing some math would be faster (at least in a dev build):
decodeChar = \char ->
    if char >= 'a' then
        # 'a'..'z' -> 26..51
        char - 'a' + 26
    else if char >= 'A' then
        # 'A'..'Z' -> 0..25
        char - 'A'
    else if char >= '0' then
        # '0'..'9' -> 52..61
        char - '0' + 52
    else if char == '+' then
        62
    else if char == '/' then
        63
    else
        0 # ignore
(This assumes it will only get valid chars, you can add more guards if that isn't the case)
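For cases where the input isn't trusted, a guarded variant could return a Result instead; this is a sketch, and the InvalidChar tag name is made up here:

decodeCharChecked : U8 -> Result U8 [InvalidChar U8]
decodeCharChecked = \char ->
    if char >= 'a' && char <= 'z' then
        Ok (char - 'a' + 26)
    else if char >= 'A' && char <= 'Z' then
        Ok (char - 'A')
    else if char >= '0' && char <= '9' then
        Ok (char - '0' + 52)
    else if char == '+' then
        Ok 62
    else if char == '/' then
        Ok 63
    else
        Err (InvalidChar char)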
I tried that yesterday, and using a dev build it is:
./decode2 4.05s user 0.00s system 99% cpu 4.054 total
so ~1s faster than the when .. is expression.
Do you have files/browser.base64.txt? I'm curious to run the same file and look at the flamegraph.
Also, your basic-cli URL doesn't seem to exist anymore (I'm guessing this is due to depending on the 0.12.0 prerelease).
Here is the file:
browser.base64.txt
It is a screenshot of roc-lang.org :grinning_face_with_smiling_eyes:
And I did notice today my mistake of depending on a prerelease, as my examples stopped working.
I added an extra visible warning to the pre-release notes to prevent this in the future
Interesting. The problem is 100% List.chunksOf.
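For context, the chunked path presumably looks something like the sketch below (assuming the decodeChar above). List.chunksOf materializes a separate list per 4-byte chunk, and in a dev build every one of those lists pays refcounting overhead:

decodeChunked : List U8 -> List U8
decodeChunked = \list ->
    list
    |> List.chunksOf 4
    |> List.joinMap \chunk ->
        when chunk is
            [a, b, c, d] ->
                # pack four 6-bit values into a 24-bit number
                n1 = Num.toU32 (decodeChar a) |> Num.shiftLeftBy 18
                n2 = Num.toU32 (decodeChar b) |> Num.shiftLeftBy 12
                n3 = Num.toU32 (decodeChar c) |> Num.shiftLeftBy 6
                n4 = Num.toU32 (decodeChar d)
                n = n1 + n2 + n3 + n4
                # unpack into three output bytes
                c1 = n |> Num.shiftRightBy 16 |> Num.bitwiseAnd 0xff |> Num.toU8
                c2 = n |> Num.shiftRightBy 8 |> Num.bitwiseAnd 0xff |> Num.toU8
                c3 = n |> Num.bitwiseAnd 0xff |> Num.toU8
                [c1, c2, c3]

            _ -> []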
This should fix debug perf:
decodeList = \list, state ->
    when list is
        [a, b, c, d, .. as rest] ->
            # accumulate four 6-bit values into one 24-bit number
            b1 = (0u32 |> Num.shiftLeftBy 6) + Num.toU32 (decodeChar a)
            b2 = (b1 |> Num.shiftLeftBy 6) + Num.toU32 (decodeChar b)
            b3 = (b2 |> Num.shiftLeftBy 6) + Num.toU32 (decodeChar c)
            b4 = (b3 |> Num.shiftLeftBy 6) + Num.toU32 (decodeChar d)
            # split the 24 bits back into three output bytes
            c1 = b4 |> Num.shiftRightBy 16 |> Num.bitwiseAnd 0xff |> Num.toU8
            c2 = b4 |> Num.shiftRightBy 8 |> Num.bitwiseAnd 0xff |> Num.toU8
            c3 = b4 |> Num.shiftRightBy 0 |> Num.bitwiseAnd 0xff |> Num.toU8
            next = List.concat state [c1, c2, c3]
            decodeList rest next

        [] ->
            state

        _ -> crash "Base64.decode: this error should not be possible"
decode : List U8 -> List U8
decode = \bytes ->
    # pad missing "="
    list = bytes |> padMissingChars
    # calc final padding chars
    paddingLen = calculatePadding list
    # Note: Totally would be best to use `List.withCapacity` here.
    decodedList = decodeList list []
    decodedList |> List.dropLast paddingLen
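Following up on the List.withCapacity note in the code above: every 4 input chars produce 3 output bytes, so the output size is known up front. A sketch (the decodeWithCapacity name is made up; it reuses padMissingChars and calculatePadding from the repro):

decodeWithCapacity : List U8 -> List U8
decodeWithCapacity = \bytes ->
    list = bytes |> padMissingChars
    paddingLen = calculatePadding list
    # pre-size the output so List.concat doesn't reallocate repeatedly
    capacity = (List.len list // 4) * 3
    decodedList = decodeList list (List.withCapacity capacity)
    decodedList |> List.dropLast paddingLen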
wow, it is almost as fast as the optimized version now.
Thanks!
Yeah, definitely something we should debug more. I don't think the difference should be that stark. Oh, though maybe the chunks list of lists is hitting the refcounting perf bug. Not sure though.
Yeah, 100% the refcounting issue. Perf is also fixed by the list-size-on-heap branch if I can ever manage to land it.
Summary
./ref-fix-opt ran
1.04 ± 0.08 times faster than ./main-opt
1.07 ± 0.28 times faster than ./main-no-chunksOf-opt
1.68 ± 0.28 times faster than ./main-no-chunksOf
2.25 ± 0.14 times faster than ./ref-fix
1861.04 ± 122.30 times faster than ./main
Also, just to clarify, I would expect avoiding chunksOf to generally be the better design even after the perf bug is fixed. This is seen when benchmarking at a really high sample size (2000 executions in this case) to get a more accurate difference in perf:
Summary
./main-no-chunksOf-opt ran
1.15 ± 0.07 times faster than ./ref-fix-opt
1.18 ± 0.07 times faster than ./main-opt
1.84 ± 0.08 times faster than ./main-no-chunksOf
2.62 ± 0.11 times faster than ./ref-fix