Here is my partial solution for Day 7. I've got to the point where it parses the input and passes all my tests. I shouldn't have too much trouble implementing the next bit, but I would like to make my parsing more reliable. At the moment I have to manually append a newline character to the end of the input to make it work properly. I've had a lot of trouble trying to figure out how to skip and ignore something. Basically I want to consume/chomp a newline character if it's there, but if it's not then be happy. I could maybe write something using `buildPrimitiveParser`, but I'm not sure how to get it to play nicely with the applicative API. I would really appreciate any assistance here. I'm definitely missing a trick on how `const` and `apply` work together here, I think.
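One way to get the "chomp it if it's there" behaviour is a small `optional` combinator. This is only a sketch: `Parser`, `parsePartial`, `buildPrimitiveParser`, and `codeunit` here stand in for whatever your parser module actually exposes, so the names and signatures are assumptions to adapt to your code.

```roc
# Sketch only -- assumes parsePartial returns Ok { val, input } on success.
optional : Parser input a -> Parser input (Result a [NotFound])
optional = \parser ->
    buildPrimitiveParser \input ->
        when parsePartial parser input is
            Ok { val, input: rest } -> Ok { val: Ok val, input: rest }
            # If the inner parser fails, succeed anyway without consuming input
            Err _ -> Ok { val: Err NotFound, input }

# Chomp a trailing newline if present, otherwise succeed and consume nothing
maybeNewline = optional (codeunit '\n')
```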
You might consider, instead of saying that a line of input is valid if it ends in a newline, saying that your lines of input are valid if they are separated by newlines. For example: https://gist.github.com/ayazhafiz/f06a714f77a5736545c3b63fa942edf9#file-parser-roc-L152
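The "separated by newlines" idea can be expressed as a `sepBy`-style combinator built from `const` and `apply`. This is a sketch, not your module's actual API: `many`, `codeunit`, and the exact `Parser` type are assumptions.

```roc
# Sketch: items are *separated* by the separator, so no trailing
# separator is required at the end of the input.
sepBy : Parser input a, Parser input sep -> Parser input (List a)
sepBy = \item, separator ->
    # Parse a separator, throw its value away, keep the following item
    separatorThenItem =
        const (\_ -> \x -> x)
        |> apply separator
        |> apply item

    # One leading item, then zero or more (separator, item) pairs
    const (\first -> \rest -> List.prepend rest first)
    |> apply item
    |> apply (many separatorThenItem)

# Usage: parse the puzzle input as lines, with no trailing newline needed
# inputLines = sepBy lineParser (codeunit '\n')
```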
Here's my Day 7. I feel like I should memoize the calculation but computers are fast.
Here is my completed Day 7. That was more of a challenge. I had quite a few crashes and things to work around, and then plenty of minor logic bugs along the way. I also refactored along the way to significantly speed up the calculation so now it runs almost instantly. Pretty happy with the parsers and combinators too :big_smile:
I know it's probably not a fair comparison right now and the implementations are very different, but a bit of fun anyway... I've been comparing the speed of my Day 7 with mates who wrote it in Rust. My Roc implementation is sitting at 33.7ms, versus Rust's 2.2ms, to run the whole puzzle including reading the file, so about 15x slower at the moment. I'm pretty impressed with this! I imagine there are a lot of things that can be optimised yet in basic-cli. :smiley:
I would love to see some flamegraphs of the 2 apps to know where the time is being spent.
My Roc implementation is sitting at 33.7ms, versus Rust's 2.2ms, to run the whole puzzle including reading the file, so about 15x slower at the moment
Is this with both compilers optimizing the code?
Yes, I think so. It's the release version of Rust, and Roc using `roc build --optimize`.
Running it on my linux machine is a little slower
```
$ hyperfine --warmup 3 './app-aoc-2022-day-7'
Benchmark 1: ./app-aoc-2022-day-7
  Time (mean ± σ):     48.3 ms ±  0.2 ms    [User: 47.9 ms, System: 0.4 ms]
```
I'll try and figure out how to get a flame graph
You have a ton of allocations, so that is probably the main cost of your code in Roc.
Is that mostly from the parsing logic do you think?
Not really sure. My gut feeling is that it is related to the dicts of lists of strings, but it could be the parser. Maybe I should compare to a different day that just parses and does simple math
Oh, parser is definitely part of it. `List.sublist` called by the parser is pretty heavy.
Since we don't have seamless slices yet, all of those are allocations
My first attempt... how can I improve this?
Use the legacy linker, and maybe an unoptimized build, if you want the best flamegraph. Though unoptimized is not necessarily representative of the final graph.
I've been thinking the parser could just pass indexes around. Maybe that would be a lot faster?
With seamless slices, it should hopefully be approximately as fast as passing around indices, but yeah, that would be faster than calling `sublist` currently.
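Passing indices around could look something like the state record below. This is a sketch of the general technique, not basic-cli's or any existing parser package's actual API:

```roc
# Parser state that carries the original bytes plus an offset,
# instead of materialising a new sub-list on every step.
State : { bytes : List U8, offset : Nat }

# Peek at the current byte without allocating
peek : State -> Result U8 [OutOfBounds]
peek = \{ bytes, offset } -> List.get bytes offset

# Advance by n bytes -- just arithmetic, no List.sublist, no allocation
advance : State, Nat -> State
advance = \state, n -> { state & offset: state.offset + n }
```

The original `bytes` list is shared by reference counting, so every parser step is just offset arithmetic until you actually need to copy a value out.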
Also, `Dict.insert` is using a lot of the time in subfunctions. Not completely sure why. How big are these dictionaries? I think pretty small, so my guess is that the hashing is actually costing a solid amount, but that is just a guess.
Does the legacy linker work on linux atm?
```
$ roc build day7.roc --linker legacy

I was expecting this file to exist:

    /home/luke/.cache/roc/packages/github.com/roc-lang/basic-cli/releases/download/0.1.2/3bKbbmgtIfOyC6FviJ9o8F8xqKutmXgjCJx3bMfVTSo/linux-x86_64.o

However, it was not there!

If you have the platform's source code locally, you may be able to generate it by re-running this command with --prebuilt-platform=false
```
Oh, it doesn't work with the url published platform
Optimized flamegraph: flamegraph.svg
Unoptimized flamegraph: flamegraph.svg
Allocation count: 44060
haha... That's crazy.
lol, I think I need to figure out how to write Roc without all the allocations. But tbh, I'm not exactly sure where the line is: what makes an allocation and what doesn't? I feel like it is every time you create a new thing in memory. So if I can change my code to re-use the same structure, i.e. a record that gets passed around, then that would reduce the allocations?
Though most of the time malloc is giving us the same pointer over and over again. We allocate it and free it and then allocate it again...
The address 0x5644723eb9b0 got allocated 21633 times for me. Most of the time just with about 10 to 20 bytes of data.
Given the varying small sizes, I would assume most of the allocations are `Str` or `List U8`, but I am not sure... just a guess.
I should double check how I wrote dict to make sure it doesn't allocate unnecessarily when taking a key that is on the heap.
To clarify, is Issue #1361 what you mean by seamless slices? I guess this feature is still in the works?
So your breakdown seems to be about:
- Just parsing: 12ms, 20541 allocations
- Building directory list: 22ms, 18229 allocations
- Rest: 10ms, 5290 allocations
Yeah, that feature. It is decently well thought through at this point; someone just has to actually implement it and wire it through the compiler.
Looks really amazing for parsers! :drooling:
Now just imagine an essentially zero-allocation parser, because the lists and strings are all seamless slices just referencing the original strings/lists, allocating at most once for each final element going into your data structure.
Anyway, I definitely think I need to dig into your build-directory-listing function and what it calls, to figure out whether dicts have an issue with cloning keys and what the cost actually is here.
One immediate thought on general performance: can you reverse your cwd list? Append to it and drop last. Prepend and drop first are expensive.
Probably would be better to just reverse at the very end if needed.
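The suggestion amounts to keeping the path in append order. A sketch, assuming `cwd : List Str` and a `List.dropLast` that removes the final element (hypothetical helper names):

```roc
# Entering a directory: append to the end, which is cheap (amortised O(1))
pushDir : List Str, Str -> List Str
pushDir = \cwd, name -> List.append cwd name

# `cd ..`: drop the last element, also cheap. By contrast, List.dropFirst
# on a reversed-order list has to shift every remaining element.
popDir : List Str -> List Str
popDir = \cwd -> List.dropLast cwd
```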
Just gave it a shot, made no difference.
I definitely need to look into dictionary and see what is going on. I think there may be a referencing issue there.
Past that, it should help performance to change the key from `List Str` to `Str` for now. Just use a helper like: `makeKey = \path -> Str.joinWith path "/"`
Oh yeah, almost certainly a referencing bug with dictionary keys that leads to copies. That, or a referencing bug with hashing lists of reference-counted types that leads to copies. Changing the key type to `Str` just for `buildDirectoryListing` and its sub-functions cuts out ~15000 allocations.
Random aside: I have no idea why, but the program as a whole, even after resetting all of the code, is running about 2x slower for me. My computer must be throttling or doing something weird, but so far I can't figure out what.
Ok, I think I found the root issue, at least for the dictionary part. Small string hashing has terrible performance (it always allocates). All of your strings are small strings, so you have `List SmallStr`, which is probably the worst case for hashing in the compiler currently. That is why joining the strings is faster: it costs one allocation, but leads to way better hashing.
I think we may need a special hack for small string hashing if we want good performance here.
Anyway, new summary: with `Str` keys to the `Dict` for `buildDirectoryListing`, the time of that function goes from 22ms to about 3ms.
It's currently a `List Str`. What do you mean by `Str` keys?
newKey = Str.joinWith oldKey "/"
Luke Boswell said:
lol, I think I need to figure out how to write Roc without all the allocations. But tbh, I'm not exactly sure where the line is: what makes an allocation and what doesn't? I feel like it is every time you create a new thing in memory. So if I can change my code to re-use the same structure, i.e. a record that gets passed around, then that would reduce the allocations?
It's whenever you create a new thing whose type has variable size
For example, every value of `U64` is always the same size, but different values of `List U64` could be different sizes. That means `U64` does not allocate but `List U64` does.
The only 3 things in Roc that allocate are `List`, `Str`, and recursive tag unions (for example tree structures or linked lists).
Numbers, records, and non-recursive tags don't allocate.
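A few concrete examples of that rule (a sketch; the annotations are just for illustration):

```roc
x : U64
x = 42                        # fixed size: no heap allocation

point : { x : F64, y : F64 }
point = { x: 1.0, y: 2.0 }    # record of fixed-size fields: no allocation

nums : List U64
nums = [1, 2, 3]              # variable size: heap-allocated

# Recursive tag union: each Node is heap-allocated, like a tree/linked list
Tree : [Leaf, Node Tree Tree]
```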
The one exception to the rule is that we have a performance trick where we are able to skip allocation for "small" `Str` (smaller than 3 `Nat`s). In general this makes things faster, but it means that low-level code in the standard library needs special cases for small strings. And Brendan found out that the hashing code for this case isn't as fast as it could be.
I think I'll take a break for today, since I ran into 3 probable compiler bugs in a row :sweat_smile:
Man that was hard!
Catching up still :grinning: day7
Had to borrow some of your code @Luke Boswell (gave you a shout out in the comments) :grinning:
Thanks, your solution looks nice! :ok:
Last updated: Jul 06 2025 at 12:14 UTC