Stream: advent of code

Topic: Day 7


view this post on Zulip Luke Boswell (Dec 07 2022 at 11:27):

Here is my partial solution for Day 7. I've got to the point where it parses the input, and passes all my tests. I shouldn't have too much trouble implementing the next bit, but I would like to make my. parsing more reliable. At the moment I have to manually append a newline character to the end of the input to make it work properly. I've really had a lot of trouble trying to figure out how to skip and ingore something. Basically I want to consume/chomp a newline character if it's there, but if it's not then be happy. I could write something using buildPrimitiveParser maybe, but I'm not sure how to get it to play nicely with the applicative API. Would really appreciate any assistance here. I'm definitely missing a trick on how const and apply work together here I think.

view this post on Zulip Ayaz Hafiz (Dec 07 2022 at 13:46):

You may consider instead of saying that a line of input is valid if it ends in a newline, that your lines of input are valid if they are separated by newlines. for example: https://gist.github.com/ayazhafiz/f06a714f77a5736545c3b63fa942edf9#file-parser-roc-L152

view this post on Zulip Shritesh Bhattarai (Dec 07 2022 at 14:29):

Here's my Day 7. I feel like I should memoize the calculation but computers are fast.

view this post on Zulip Luke Boswell (Dec 08 2022 at 22:57):

Here is my completed Day 7. That was more of a challenge. I had quite a few crashes and things to work around, and then plenty of minor logic bugs along the way. I also refactored along the way to significantly speed up the calculation so now it runs almost instantly. Pretty happy with the parsers and combinators too :big_smile:

view this post on Zulip Luke Boswell (Dec 08 2022 at 23:44):

I know it's probably not a fair comparison right now and the implementations are very different, but a bit of fun anyway... but I've been comparing the speed of my Day 7 with mates who wrote it in Rust. My Roc implementation is sitting at 33.7ms, verse Rust's 2.2ms to run the whole puzzle including reading the file so about 15x slower at the moment. I'm pretty impressed with this! I imagine there is a lot of things that can be optimised yet in basic-cli. :smiley:

view this post on Zulip Brendan Hansknecht (Dec 08 2022 at 23:46):

I would love to see some flamegraphs of the 2 apps to know where the time is being spent.

view this post on Zulip Ayaz Hafiz (Dec 08 2022 at 23:55):

My Roc implementation is sitting at 33.7ms, verse Rust's 2.2ms to run the whole puzzle including reading the file so about 15x slower at the moment

Is this with both compilers optimizing the code?

view this post on Zulip Luke Boswell (Dec 08 2022 at 23:57):

Yes I think so. Its the release version of Rust, and roc then using roc build --optimize

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:06):

Running it on my linux machine is a little slower

$ hyperfine --warmup 3 './app-aoc-2022-day-7'
Benchmark 1: ./app-aoc-2022-day-7
  Time (mean ± σ):      48.3 ms ±   0.2 ms    [User: 47.9 ms, System: 0.4 ms]

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:06):

I'll try and figure out how to get a flame graph

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:26):

You have a ton of allocations. So that probably is the main cost of your code in Roc

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:35):

Is that mostly from the parsing logic do you think?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:38):

Not really sure. My gut feeling is that it is related to the dicts of lists of strings, but it could be the parser. Maybe I should compare to a different day that just parses and does simple math

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:39):

Oh, parser is definitely part of it. List.sublist called by the parser is pretty heavy.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:40):

Since we don't have seamless slices yet, all of those are allocations

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:41):

kernal.svg

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:41):

My first attempt... how can I improve this?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:42):

use the legacy linker and maybe an unoptimized build if you want the best flamegraph. Though unoptimized is not necessarily representive of the final graph

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:42):

I've been thinking the parser could just pass indexes around. Maybe that would be a lot faster?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:43):

With seamless slices, it should hopefullly be approximately as fast as passing around indices, but yeah, that would be faster than calling sublist currently.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:44):

Also, Dict.insert is using a lot of the time in subfunctions. Not completely sure why. How big are these dictionaries? I think pretty small, so my guess is that the hashing is actually costing a solid amount. but that is just a guess.

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:44):

Does the legacy linker work on linux atm?

$ roc build day7.roc --linker legacy

I was expecting this file to exist:

    /home/luke/.cache/roc/packages/github.com/roc-lang/basic-cli/releases/download/0.1.2/3bKbbmgtIfOyC6FviJ9o8F8xqKutmXgjCJx3bMfVTSo/linux-x86_64.o

However, it was not there!

If you have the platform's source code locally, you may be able to generate it by re-running this command with --prebuilt-platform=false

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:48):

Oh, it doesn't work with the url published platform

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:49):

Optimized flamegraph: flamegraph.svg

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:50):

Unoptimized flamegraph: flamegraph.svg

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:52):

Allocation count: 44060

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:52):

haha...That's crazy.

view this post on Zulip Luke Boswell (Dec 09 2022 at 00:54):

lol, I think I need to figure out how to write Roc without all the allocations. But tbh, I'm not exactly sure where the line is, what makes an allocation and what doesn't? I feel like it is everytime you create a new thing in memory. So if I can change my code to re-use the same structure, i.e. a record that gets passed around then that would reduce the allocations?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:55):

Though most of the time malloc is giving us the same pointer over an over again. We allocate it and free it and then allocate it again...

The address 0x5644723eb9b0 got allocated 21633 times for me. Most of the time just with about 10 to 20 bytes of data.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:56):

given the varying small sizes, I would assume most of the allocations are Str or List U8, but I am not sure...just a guess.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 00:58):

I should double check how I wrote dict to make sure it doesn't allocate unnecessarily when taking a key that is on the heap.

view this post on Zulip Luke Boswell (Dec 09 2022 at 01:05):

To clarify this Issue#1361 is what you mean by seamless slices? I guess this feature is still in the works?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:08):

So your breakdown seems to be about:

Just parsing:
12ms
20541 allocations

Building directory list:
22ms
18229 allocations

Rest:
10ms
5290 allocations

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:08):

Yeah, that feature. It is decently well thought through at this point, just someone has to actually implement it and wire it through the compiler.

view this post on Zulip Luke Boswell (Dec 09 2022 at 01:11):

Looks really amazing for parsers! :drooling:

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:12):

Now just imagine an essentially zero allocation parser because the lists and strings all are seamless slices and just referencing the original strings/lists. Allocating at most once for each final element going into your datastructure

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:14):

Anyway, i definitely think I need to dig into your build directory listing function and what it calls to figure out if dicts have an issue with cloning keys and what the cost actually is here.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:17):

One immediate thought on general performance, can you reverse your cwd list.? Append to it and drop last. Prepend and drop first are expensive.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:18):

Probably would be better to just reverse at the very end if needed.

view this post on Zulip Luke Boswell (Dec 09 2022 at 01:24):

Just gave it a shot, made no difference.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 01:57):

I definitely need to look into dictionary and see what is going on. I think there may be a referencing issue there.

Past that, it should help performance to change the key from List Str to Str for now
Just use a helper like: makeKey = \path -> Str.joinWith path "/"

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:01):

Oh yeah, almost certainly a referencing bug with dictionary keys that leads to copies. That or a referencing bug with hashing lists of reference counted types that leads to copies. Changing the key type to Str just for buildDirectoryListing and it's sub functions cuts out ~15000 allocations.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:04):

Random aside, I have no idea why, but the program as a whole even after reseting all of the code is running about 2x slower for me. My computer must be throttling or doing something weird, but so far I can't figure out what.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:35):

Ok. I think I found the root issue at least for the dictionary part. Small String hashing has terrible performance (it always allocates). All of your strings are small strings. So you haveList SmallStr, which is probably the worst case for hashing in the compiler currently. That is why joining the strings is faster. It costs one allocation, but leads to way better hashing.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:36):

I think we may need a special hack for small string hashing if we want good performance here.

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:40):

Anyway new summary: With Str keys to the Dict for the buildDirectoryListing the time of that function goes from 22ms to about 3ms

view this post on Zulip Luke Boswell (Dec 09 2022 at 02:43):

It's currently a List Str what do you mean by Str keys?

view this post on Zulip Brendan Hansknecht (Dec 09 2022 at 02:46):

newKey = Str.joinWith oldKey "/"

view this post on Zulip Brian Carroll (Dec 09 2022 at 08:49):

Luke Boswell said:

lol, I think I need to figure out how to write Roc without all the allocations. But tbh, I'm not exactly sure where the line is, what makes an allocation and what doesn't? I feel like it is everytime you create a new thing in memory. So if I can change my code to re-use the same structure, i.e. a record that gets passed around then that would reduce the allocations?

It's whenever you create a new thing whose type has variable size
For example every value of U64 is always the same size. But different values of List U64 could be different sizes. That means U64 does not allocate but List U64 does.
The only 3 things in Roc that allocate are List, Str, and recursive tag unions (for example tree structures or linked lists).

view this post on Zulip Brian Carroll (Dec 09 2022 at 08:49):

Numbers, records, and non-recursive tags don't allocate.

view this post on Zulip Brian Carroll (Dec 09 2022 at 08:58):

The one exception to the rule is that we have a performance trick where we are able to skip allocation for "small" Str (smaller than 3 Nats). In general this makes things faster, but it means that low-level code in the standard library needs special cases for small strings. And Brendan found out that the hashing code for this case isn't as fast as it could be.

view this post on Zulip Gabriel Pickl (Dec 09 2022 at 22:10):

think i'll take a break for today since I ran into 3 probable compiler bugs in a row :sweat_smile:

view this post on Zulip Logan Lowder (Dec 10 2022 at 23:35):

Man that was hard!
Catching up still :grinning: day7

view this post on Zulip Logan Lowder (Dec 10 2022 at 23:36):

Had to borrow some of your code @Luke Boswell
(gave you a shout out in the comments) :grinning:

view this post on Zulip Luke Boswell (Dec 11 2022 at 00:57):

Thanks, your solution looks nice! :ok:


Last updated: Jul 06 2025 at 12:14 UTC