Learning questions with 1brc: no args with legacy linker? · beginners

Hi! I'm new here, and I'm learning Roc by rewriting some coding puzzles I've done previously in Haskell in Roc. I'm currently doing the One Billion Row Challenge (which is basically just parsing a list of cities with temperatures and returning the min/mean/max temperature for each city).

app [main] {
    pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.10.0/vNe6s9hWzoTZtFmNkvEICPErI9ptji_ySjicO6CkucY.tar.br",
}

import pf.Stdout
import pf.File
import pf.Path
import pf.Task
import pf.Arg

main =
    finalTask =
        pathArg = readFirstArgT!
        readFileToStr (Path.fromStr pathArg)

    finalResult <- Task.attempt finalTask

    when finalResult is
        Err ZeroArgsGiven ->
            Task.err (Exit 1 "Error ZeroArgsGiven")

        Err (ReadFileErr errMsg) ->
            Task.err (Exit 1 "Error ReadFileErr:\n$(errMsg)")

        Ok fileContentStr ->
            fileContentStr |> parse |> Stdout.line


readFirstArgT =
    args = Arg.list!
    dbg args
    List.get args 1 |> Result.mapErr (\_ -> ZeroArgsGiven) |> Task.fromResult


readFileToStr = \path ->
    path
    |> File.readUtf8
    |> Task.mapErr \_ -> ReadFileErr "Failed to read file: $(Path.display path)"


parse = \contents ->
    citiesList = Str.split contents "\n"

    results = List.walk citiesList (Dict.empty {}) \state, elem ->
        cityWithTemp = Str.split elem ";"
        city =
            when List.get cityWithTemp 0 is
                Ok v -> v
                Err _ -> crash "err"

        temp =
            when List.get cityWithTemp 1 is
                Ok v -> when Str.toF64 v is
                            Ok w -> w
                            Err _ -> crash "err"
                Err _ -> crash "err"

        Dict.update state city \value ->
            when value is
                Missing ->
                    Present { min: temp, mean: temp, max: temp }
                Present { min: oldMin, mean: oldMean, max: oldMax } ->
                    Present
                        {
                            min: Num.min temp oldMin,
                            mean: (temp + oldMean) / 2,
                            max: Num.max temp oldMax,
                        }

    results
    |> Dict.toList
    |> List.map \(city, {min, mean, max}) ->
        "$(city): $(Num.toStr min), $(Num.toStr mean), $(Num.toStr max)"
    |> Str.joinWith "\n"

roc dev tells me "The surgical linker currently has issue #3609 and would fail linking your app. Please use --linker=legacy to avoid the issue for now." I'm not sure which change made it so the surgical linker wouldn't work anymore, but I think it was inbetween when I realised I can't Stdout.line a Dict and have to manually create the string to print. When I run it with the legacy linker (roc dev --linker=legacy -- measurements.txt), it always results in ZeroArgsGiven, and dbg args gives [ ]. If I remember correctly, this wasn't so before I introduced whatever caused the problem with the linker. Could this be, or is it a coincidence? If so, what mistake am I making in my code?

I also wanted to know if there is a way to print, e.g., dicts. I'm used to elements having show in Haskell, so I can do print on basically any structure. Having to do that last results |> Dict.toList |> List.map ... |> Str.joinWith "\n" bit is a bit tedious tbh.

Then, I'm still working on my understanding of error handling, but I'm pretty sure this is sub-optimal:

when List.get cityWithTemp 1 is
                Ok v -> when Str.toF64 v is
                            Ok w -> w
                            Err _ -> crash "err"
                Err _ -> crash "err"

Brendan Hansknecht (Jun 14 2024 at 21:09):

Paul Joubert (Jun 14 2024 at 21:11):

Anton (Jun 15 2024 at 12:59):

I'll do a new basic-cli release soon that should improve the args parsing.
If you want you can use this to simplify the file reading a lot.

Paul Joubert (Jun 15 2024 at 13:30):

I'm already "watching" the basic-cli repo :) looking forward to it <3
About the import file syntax, does that compile the file into the binary? Because the one billion row file is 14GB haha. Also, I think I'd prefer to keep the file dynamic, so I can benchmark performance on different sizes without recompiling.

Paul Joubert (Jun 15 2024 at 13:40):

I've rerun the CLI args example as is, and it doesn't see the args when using the legacy linker, so it seems the problem is with using the legacy linker. Legacy linker means using shared system libraries and such, right? In which case maybe the problem is because I'm running NixOS. Does anyone know what in my code above prevents the surgical linker from working? I didn't see anything on the issue that seemed relevant.

Anton (Jun 15 2024 at 13:40):

Anton (Jun 15 2024 at 13:41):

Anton (Jun 15 2024 at 13:47):

$ roc build yourApp.roc --optimize

Paul Joubert (Jun 15 2024 at 13:49):

Oh yeah, I've been intending to do that but haven't gotten it to build yet in the first place haha :smiling: but thanks tho

Paul Joubert (Jun 15 2024 at 13:50):

I've been following the meta conversation about Roc development, all Richard's talks and podcasts and such, so I know a lot about the way things work, I just haven't actually sat down and written code until now hahaha

Anton (Jun 15 2024 at 14:00):

Paul Joubert (Jun 15 2024 at 14:00):

Anton (Jun 15 2024 at 14:15):

The Str.toF64 is what requires the legacy linker, Str.toU64 does work (if I change cities.csv to natural numbers).

Paul Joubert (Jun 15 2024 at 14:20):

Anton (Jun 15 2024 at 14:20):

Anton (Jun 15 2024 at 14:21):

Anton (Jun 15 2024 at 14:29):

No issue, it doesn't happen every time when using Str.toDec (in other code), I'll minimize and make an issue.

Musab Nazir (Jun 15 2024 at 16:31):

These were some of my numbers for a super naive implementation that stopped after parsing and creating the dict

1M row file
roc reference: 103271ms
gleam reference: 2251ms
gleam (arrays): 2901ms
go reference: 140ms

Anton (Jun 15 2024 at 16:47):

Anton (Jun 15 2024 at 16:59):

Anton (Jun 15 2024 at 18:08):

I've investigated the empty args issue further and also found a temporary workaround :)

Luke Boswell (Jun 15 2024 at 22:34):

Luke Boswell (Jun 15 2024 at 22:35):

Paul Joubert (Jun 16 2024 at 15:10):

app [main] {
    pf: platform "./basic-cli/platform/main.roc",
}

import pf.Stdout
import pf.File
import pf.Task
import pf.Arg

main =
    finalTask =
        pathArg = readFirstArgT!
        readFileToStr pathArg

    finalResult <- finalTask |> Task.attempt

    when finalResult is
        Err ZeroArgsGiven ->
            Task.err (Exit 1 "Error ZeroArgsGiven")
        Err (ReadFileErr errMsg) ->
            Task.err (Exit 1 "Error ReadFileErr:\n$(errMsg)")
        Ok fileContentStr ->
            fileContentStr |> parse |> Inspect.toStr |> Stdout.line!
        _ ->
            Task.err (Exit 1 "Unknown error")

readFirstArgT =
    args = Arg.list!
    List.get args 1 |> Result.mapErr (\_ -> ZeroArgsGiven) |> Task.fromResult

readFileToStr = \path ->
    path
    |> File.readUtf8
    |> Task.mapErr \_ -> ReadFileErr "Failed to read file: $(path)"

parse = \contents ->
    citiesList = Str.split contents "\n"

    List.walk citiesList (Dict.empty {}) \state, elem ->
        cityWithTemp = Str.split elem ";"
        [city, t] = cityWithTemp
        temp = Result.withDefault (Str.toF32 t) 0

        Dict.update state city \value ->
            when value is
                Missing -> Present { min: temp, mean: temp, max: temp }
                Present { min: oldMin, mean: oldMean, max: oldMax } ->
                    Present {
                        min: Num.min temp oldMin,
                        mean: (temp + oldMean) / 2,
                        max: Num.max temp oldMax,
                    }

hyperfine --warmup 1 -L num 1_000,10_000,100_000,1_000_000 "./main measurements_{num}.txt"

Benchmark 1: ./main ../1brc-hs/measurements_1_000.txt
  Time (mean ± σ):       6.9 ms ±   0.3 ms    [User: 5.8 ms, System: 1.1 ms]
  Range (min … max):     6.2 ms …   8.4 ms    261 runs

Benchmark 2: ./main ../1brc-hs/measurements_10_000.txt
  Time (mean ± σ):      67.6 ms ±   1.2 ms    [User: 65.2 ms, System: 2.0 ms]
  Range (min … max):    65.4 ms …  71.3 ms    42 runs

Benchmark 3: ./main ../1brc-hs/measurements_100_000.txt
  Time (mean ± σ):     661.0 ms ±   5.8 ms    [User: 654.7 ms, System: 3.9 ms]
  Range (min … max):   653.8 ms … 671.7 ms    10 runs

Benchmark 4: ./main ../1brc-hs/measurements_1_000_000.txt
  Time (mean ± σ):      6.557 s ±  0.041 s    [User: 6.510 s, System: 0.025 s]
  Range (min … max):    6.501 s …  6.621 s    10 runs

{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}

module Main where

import System.Environment (getArgs)
import Data.Foldable (foldl')
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type Results = M.Map T.Text (Float, Float, Float)

main :: IO ()
main = do
  [path] <- getArgs
  txt <- readFile path
  print $ M.toList . foldl' addCity M.empty . T.lines . T.pack $ txt

addCity :: Results -> T.Text -> Results
addCity results line =
  let (city, temp) = parseLine line
  in M.insertWith f city (temp, temp, temp) results
      where f (temp, _, _) (oldMin, oldMean, oldMax) =
              (min temp oldMin, (temp + oldMean) / 2, max temp oldMax)

parseLine :: T.Text -> (T.Text, Float)
parseLine line =
  (x, read (T.unpack y) :: Float)
    where [x, y] = T.splitOn ";" line

|       rows | runs | result (ms) |       ± |
|-----------:|-----:|------------:|--------:|
|      1 000 |  164 |        13.4 |     0.9 |
|     10 000 |   50 |        52.9 |     3.8 |
|    100 000 |   10 |       386.2 |    10.7 |
|  1 000 000 |   10 |     4 107.0 |    28.0 |
| 10 000 000 |   10 |    47 138.0 | 1 736.0 |

For some reason, the Haskell is faster from 100 000 rows. Since Roc starts out faster, maybe this is a file reading thing? Not sure. (I did compile with --optimize).

I do like the Haskell for being more concise, but then again it doesn't do any error handling.

Brendan Hansknecht (Jun 16 2024 at 17:01):

Just a note, we have a bug that effects dict containing strings and would be killing your perf

Brendan Hansknecht (Jun 16 2024 at 17:58):

Also, I'm really surprised that this works. It is probably a bug: [city, t] = cityWithTemp. Cause there is no guarantee the list is length 2.

Brendan Hansknecht (Jun 16 2024 at 18:49):

So, I wrote a terrible function to instead of using strings, use tuples of 32 U8s. The longest string was 26 bytes, so everything fits in these tuples, but they are way less efficient for hashing than a string. This should actually be a slower solution, but it work around a current refcounting bug.

|       rows | runs | result (ms) |       ± |
|-----------:|-----:|------------:|--------:|
|      1 000 |  294 |         6.8 |     0.4 |
|     10 000 |   55 |        54.9 |     6.5 |
|    100 000 |   10 |       531.8 |    72.8 |
|  1 000 000 |   10 |     5 434.0 |   737.0 |
| 10 000 000 |   10 |    51 639.0 | 6 809.0 |

|       rows | runs | result (ms) |       ± |
|-----------:|-----:|------------:|--------:|
|      1 000 |  248 |         4.1 |     0.4 |
|     10 000 |  300 |         6.3 |     0.2 |
|    100 000 |   96 |        27.6 |     2.7 |
|  1 000 000 |   12 |       233.2 |     1.1 |
| 10 000 000 |   10 |     2 291.0 |    15.0 |

The Dict Str version should actually be faster than this second version with the tuples, but we have a refcounting bug to finish fixing. So the roc version should be about 20x faster for large inputs.

Paul Joubert (Jun 16 2024 at 20:15):

Oh huh, I think I wrote that with the intention of afterwards chasing down the type errors to do error handling, and then when the checker didn't complain I just forgot about it!

Paul Joubert (Jun 16 2024 at 20:17):

Paul Joubert (Jun 16 2024 at 20:26):

It's so exciting to write functional/high-level code — and a pretty naive implementation also — that is just really fast!!

Paul Joubert (Jun 16 2024 at 20:54):

Seoul;8.7
Minneapolis;3.5
El Paso;22.0
Helsinki;9.1
Foo
Reggane;15.4
Gaborone;25.3
Benghazi;31.6
Bridgetown;38.0
Algiers;26.8
Chihuahua;12.9

{
  "Seoul": {max: 8.699999809265137, mean: 8.699999809265137, min: 8.699999809265137 },
  "Minneapolis": {max: 3.5, mean: 3.5, min: 3.5 },
  "El Paso": {max: 22, mean: 22, min: 22 },
  "Helsinki": {max: 9.100000381469727, mean: 9.100000381469727, min: 9.100000381469727 },
  "Foo": {max: 9.100000381469727, mean: 9.100000381469727, min: 9.100000381469727 },
  "Reggane": {max: 15.399999618530273, mean: 15.399999618530273, min: 15.399999618530273 },
  "Gaborone": {max: 25.299999237060547, mean: 25.299999237060547, min: 25.299999237060547 },
  "Benghazi": {max: 31.600000381469727, mean: 31.600000381469727, min: 31.600000381469727 },
  "Bridgetown": {max: 38, mean: 38, min: 38 },
  "Algiers": {max: 26.799999237060547, mean: 26.799999237060547, min: 26.799999237060547 },
  "Chihuahua": {max: 12.899999618530273, mean: 12.899999618530273, min: 12.899999618530273 },
  "": {max: 12.899999618530273, mean: 12.899999618530273, min: 12.899999618530273 }
}

I also noticed that it added a blank string at the bottom, which is because splitting the input on "\n" results in a blank string at the end (or rather, the file string ends on a \n):

» Str.split "Seoul;8.7\nMinneapolis;3.5\n" "\n"
["Seoul;8.7", "Minneapolis;3.5", ""] : List Str

Anyway, when I insert dbg cityWithTemp above [city, t] = cityWithTemp, compilation fails with

This looks like an operator, but it's not one I recognize!

45│          dbg cityWithTemp
46│          [city, t] = cityWithTemp
                       ^

I have no specific suggestion for this operator, see TODO for the full
list of operators in Roc.

I remember trying [city, t] = Str.split elem ";", which, trying again, results in the same "looks like an operator" error. I originally thought this was maybe because split can fail, but I realise now it can't, so it must be something else.

 ...
    List.walk citiesList (Dict.empty {}) \state, elem ->
        cityWithTemp = Str.split elem ";"
        [city, t] = cityWithTemp
        dbg cityWithTemp
        dbg t
        temp = Result.withDefault (Str.toF32 t) 0
        dbg temp

        Dict.update state city \value -> ...

...
[main.roc:46] cityWithTemp = ["Helsinki", "9.1"]
[main.roc:47] t = "9.1"
[main.roc:49] temp = 9.100000381469727

[main.roc:46] cityWithTemp = ["Foo"]
[main.roc:47] t = "9.1"
[main.roc:49] temp = 9.100000381469727

[main.roc:46] cityWithTemp = ["Reggane", "15.4"]
[main.roc:47] t = "15.4"
[main.roc:49] temp = 15.399999618530273
...

Brendan Hansknecht (Jun 16 2024 at 21:24):

Brendan Hansknecht (Jun 16 2024 at 21:25):

I think it should be faster and give you a record that can be matched inline like that. As I mentioned earlier, I don't think[city, t] = ... is actually expected to work in general. So it may confuse the compiler.

Brendan Hansknecht (Jun 16 2024 at 21:29):

Also, the value carry over is definitely a bug. If you could file an issue on GitHub, that would be great.

Paul Joubert (Jun 17 2024 at 13:40):

I tried this (not sure if the nested whens is best way to express this :thinking: )

parse = \contents ->
    citiesList = contents |> Str.split "\n"

    List.walk citiesList (Dict.empty {}) \state, elem ->
        when Str.splitFirst elem ";" is
            Ok {before: city, after} ->
                temp = after |> Str.toF32 |> Result.withDefault 0

                Dict.update state city \value ->
                    when value is
                        Missing -> Present { min: temp, mean: temp, max: temp }
                        Present { min: oldMin, mean: oldMean, max: oldMax } ->
                            Present {
                                min: Num.min temp oldMin,
                                mean: (temp + oldMean) / 2,
                                max: Num.max temp oldMax,
                            }

            Err NotFound -> state

and it's actually a tiny bit slower. But by like 1ms in the 1000 line case and 200ms in the 1m line case, so could be insignificant.

Brendan Hansknecht (Jun 17 2024 at 15:21):

This intuitively makes some sense to me. I'm pretty sure that [city, t] = ... has no error handling. So it runs faster due to having a bug that is hit if the list isn't exactly (at least?) 2 elements.

Also, does it help if you change the error case to:
Err NotFound -> crash "impossible"

Paul Joubert (Jun 17 2024 at 15:27):

Paul Joubert (Jun 17 2024 at 15:49):

Problem is, the error case is reached because of the newline at the end of the file and Str.splitadds blank strings, e.g., Str.split "foo" "o" => ["f", "", ""].
I think this is expected behaviour since Haskell's splitOn also does this. I was actually looking for an equivalent of lines.

Paul Joubert (Jun 17 2024 at 15:55):

Musab Nazir (Jun 17 2024 at 16:06):

An interesting observation:
You guys were getting 5-6s for the 1M files and even though my implementation (Dict Str) was virtually identical my M1 mac would finish consistently around 100-110s. After looking into it a lot, it came down to my input generation script. The one I was using, would create the correct number of rows, but it had waaay more unique cities. When I swapped the script to one linked on the official 1brc site I started getting ~5s. This input script only has 413 unique cities whereas my old one could go up to 44k which exacerbates the Dict issue I think

Paul Joubert (Jun 17 2024 at 16:07):

Trimming the newline at the end before splitting, and using crash on the nofound error:

Benchmark 1: ./main ../1brc-hs/measurements_1_000.txt
  Time (mean ± σ):       7.1 ms ±   0.3 ms    [User: 5.7 ms, System: 1.3 ms]
  Range (min … max):     6.4 ms …   8.4 ms    267 runs

Benchmark 2: ./main ../1brc-hs/measurements_10_000.txt
  Time (mean ± σ):      69.2 ms ±   0.7 ms    [User: 67.0 ms, System: 1.7 ms]
  Range (min … max):    68.0 ms …  71.8 ms    41 runs

Benchmark 3: ./main ../1brc-hs/measurements_100_000.txt
  Time (mean ± σ):     677.7 ms ±   4.1 ms    [User: 670.1 ms, System: 5.0 ms]
  Range (min … max):   672.5 ms … 686.2 ms    10 runs

Benchmark 4: ./main ../1brc-hs/measurements_1_000_000.txt
  Time (mean ± σ):      6.839 s ±  0.060 s    [User: 6.777 s, System: 0.029 s]
  Range (min … max):    6.755 s …  6.920 s    10 runs

Paul Joubert (Jun 17 2024 at 16:09):

@Musab Nazir oh interesting, and yeah, my test files were generated by the "official" python script.

Brendan Hansknecht (Jun 17 2024 at 18:53):

Basically, it is recursively updating all of the refcounts of the stored string keys

Brendan Hansknecht (Jul 18 2024 at 06:19):

It's only single threaded, so obviously the performance can't be that good. I'm sure it also has tons of room for improvement (probably a lot within roc itself).

Assuming perfect parallelism over the 8 cores the test uses, we would expect something in the range of:
M1: 18s
Linux: 34s

For context, good results for the expanded key version of the test start at about 20s and go down as low as 3s. That said, good results also do a lot more tricks than we do (like swar for finding the semicolons).

Note: I used the python generator, which actually leads to slower results cause it uses 8,921 weather station names instead of just the 413 of the java generator.

Brendan Hansknecht (Jul 18 2024 at 06:20):

Brendan Hansknecht (Jul 18 2024 at 06:33):

To get a bit more of a direct comparison. This is the timing of r4 on my machines with my same input data:
M1: 66s
Linux: 73s

Brendan Hansknecht (Jul 18 2024 at 06:35):

Perf of the roc code is split as such:
25% list.splitFirst
25% parsing the rest of row (probably the pattern match)
50% Dict.update

Richard Feldman (Jul 18 2024 at 12:12):

Brendan Hansknecht (Jul 18 2024 at 15:34):

Brendan Hansknecht (Jul 18 2024 at 15:41):

Aside from using an mmap instead of buffered io, the two implementations are almost identical algorithmically. With the caveat that I use pattern matching while go uses manual but roughly equivalent code.

Brendan Hansknecht (Jul 18 2024 at 16:00):

Also, at a minimum for split, it is doing a ton of redundant checks by calling List.sublist twice. But I would guess that iterating in roc with refcount, tags, and list checks probably costs a solid amount then a quick loop in zig.

Richard Feldman (Jul 18 2024 at 16:29):

oh interesting! do you think it would be viable to move more of that to zig to avoid redundant checks?

Brendan Hansknecht (Jul 18 2024 at 17:06):

Also, in this specific case, a bytes specific version could use simd or swar. Which would be even faster, but I don't think the go version does that.

Richard Feldman (Jul 18 2024 at 18:13):

Richard Feldman (Jul 18 2024 at 18:14):

and having builtins just automatically run faster for common cases like that seems great

Richard Feldman (Jul 18 2024 at 18:14):

Brendan Hansknecht (Jul 18 2024 at 20:20):

Yeah, one note related to that, would also be great if we allowed for special ability specialization for lists of bytes. For example, hashing has addBytes, but you have to wrap a List U8 in an opaque type to use it. So List U8 hashing is slow by default.

Brendan Hansknecht (Jul 18 2024 at 20:49):

Yeah, doubled checked, List U8 hashing is on the slow path by default. It hashes each element individually.

Richard Feldman (Jul 18 2024 at 20:55):

Richard Feldman (Jul 18 2024 at 20:56):

Brendan Hansknecht (Jul 18 2024 at 20:57):

yeah, and with equality checks, we can use treat any list of types without pointers (or opaques) as a big list of bytes and just blit over them instead of caring about padding.

Brendan Hansknecht (Jul 18 2024 at 20:59):

Actually, i guess we could do that with hashing as well...maybe....depends if we want hashing a b and c to be equivalent to hashing [a,b,c]. Where a/b/c are variables of some random type alias.

Brendan Hansknecht (Jul 18 2024 at 21:00):

Oh wait, list hashing is already different than just hashing the elements...so yeah, any type that can be seen as a bag of bits (memcpy safe in the llvm backend) could use way faster list hashing.

Brendan Hansknecht (Jul 18 2024 at 21:00):

That said, I do think there is a problem here with ordering of passes. Abilities are generated when we don't know concrete types. These optimizations require knowing the concrete final type.

Richard Feldman (Jul 18 2024 at 21:04):

Richard Feldman (Jul 18 2024 at 21:05):

I'd be surprised if there wasn't some way to get that to work without restructuring things, although @Ayaz Hafiz would know better

Brendan Hansknecht (Jul 18 2024 at 21:09):

Oh, I bet we could just make hashList a low level. So the ability dispatches to hashList. Then the low level should know the concrete type and be able to dispatch farther.

Brendan Hansknecht (Jul 18 2024 at 21:10):

Brendan Hansknecht (Jul 18 2024 at 21:14):

Also, as much as I would love to keep dict in roc (and think it should long term be possible to make efficient in roc), we probably could get some large perf gains by moving dict to zig. (oh, and making sure we enable borrow inference for dict and set the same as list and str).

That said, it would be a pretty large project to move dict and may not be worth it for now.

Brendan Hansknecht (Jul 18 2024 at 21:32):

Do we have a location in the compiler pipeline after/during specialization that we can pick a lowlevel we generate? Thinking about trying to specialize before we get to the backends instead of needing to deal with the specialization in each of the backends. So backends would still just dispatch a function call to zig without any special wiring. I'm assuming this would just be a pass on mono ir. I assume there is already a location in the compiler to add something like this.

Richard Feldman (Jul 18 2024 at 22:07):

Richard Feldman (Jul 18 2024 at 22:08):

so if you search for pattern matches on lowlevels you might find it (I'm on mobile right now)

Brendan Hansknecht (Jul 19 2024 at 01:00):

Also, ran this through the poop and definitely some heavy memory issues (I need to test with a large bufreader instead of an mmap). That may be a large part of the perf, but memory issues might be from something else (like mmap loading pages may be label as a cache miss but not actually be that expensive, not sure):

$ poop -d 1 "./r4 data/measurements_1_000_000_000.txt" "./1brc data/measurements_1_000_000_000.txt"
Benchmark 1 (3 runs): ./r4 data/measurements_1_000_000_000.txt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          87.3s  ± 3.30s     83.5s  … 89.5s           0 ( 0%)        0%
  peak_rss           4.15MB ±  227KB    3.89MB … 4.28MB          0 ( 0%)        0%
  cpu_cycles          276G  ±  958M      275G  …  277G           0 ( 0%)        0%
  instructions        471G  ±  175M      471G  …  471G           0 ( 0%)        0%
  cache_references   16.3G  ±  265M     16.0G  … 16.6G           0 ( 0%)        0%
  cache_misses       1.06M  ±  224K      870K  … 1.31M           0 ( 0%)        0%
  branch_misses      2.75G  ± 21.6M     2.74G  … 2.77G           0 ( 0%)        0%
Benchmark 2 (3 runs): ./1brc data/measurements_1_000_000_000.txt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           169s  ± 10.1s      159s  …  180s           0 ( 0%)        💩+ 94.1% ± 19.6%
  peak_rss           16.0GB ± 74.5KB    16.0GB … 16.0GB          0 ( 0%)        💩+385274.2% ±  9.2%
  cpu_cycles          640G  ± 2.13G      638G  …  642G           0 ( 0%)        💩+131.9% ±  1.4%
  instructions       1.40T  ± 1.59G     1.40T  … 1.40T           0 ( 0%)        💩+197.5% ±  0.5%
  cache_references   10.9G  ±  133M     10.8G  … 11.0G           0 ( 0%)        ⚡- 33.0% ±  2.9%
  cache_misses        462M  ± 1.43M      460M  …  463M           0 ( 0%)        💩+43454.8% ± 218.7%
  branch_misses      2.95G  ± 30.2M     2.92G  … 2.97G           0 ( 0%)        💩+  7.4% ±  2.2%

Brendan Hansknecht (Jul 19 2024 at 01:01):

Also, totally possible all the cache misses are from our dictionary being in two different allocation along with the memory pressure from the giant mmap.

Brendan Hansknecht (Jul 19 2024 at 01:02):

Also, note, these times are way faster than the times above due to not clearing the page cache between runs. So the file was already in ram more or less.

Brendan Hansknecht (Jul 19 2024 at 06:39):

So with a better version of readline, perf is even worse than with the mmap. This version of readline actually accumulates to a RocList directly instead of a Vec (which helped a lot, but not enough):

Benchmark 1 (3 runs): ./r4 data/measurements_1_000_000_000.txt
  wall_time          77.6s  ± 4.87s     74.5s  … 83.2s           0 ( 0%)        0%
Benchmark 2 (3 runs): ./1brc-mmap data/measurements_1_000_000_000.txt
  wall_time           177s  ± 7.44s      168s  …  182s           0 ( 0%)        💩+127.7% ± 18.4%
Benchmark 3 (3 runs): ./1brc-readlines data/measurements_1_000_000_000.txt
  wall_time           255s  ± 2.91s      252s  …  258s           0 ( 0%)        💩+228.0% ± 11.7%

Stream: beginners

Topic: Learning questions with 1brc: no args with legacy linker?

Paul Joubert (Jun 14 2024 at 20:15):

Brendan Hansknecht (Jun 14 2024 at 21:09):

Brendan Hansknecht (Jun 14 2024 at 21:09):

Paul Joubert (Jun 14 2024 at 21:11):

Anton (Jun 15 2024 at 12:59):

Paul Joubert (Jun 15 2024 at 13:30):

Paul Joubert (Jun 15 2024 at 13:40):

Anton (Jun 15 2024 at 13:40):

Anton (Jun 15 2024 at 13:41):

Anton (Jun 15 2024 at 13:47):

Paul Joubert (Jun 15 2024 at 13:49):

Paul Joubert (Jun 15 2024 at 13:50):

Anton (Jun 15 2024 at 14:00):

Paul Joubert (Jun 15 2024 at 14:00):

Anton (Jun 15 2024 at 14:15):

Paul Joubert (Jun 15 2024 at 14:20):

Anton (Jun 15 2024 at 14:20):

Anton (Jun 15 2024 at 14:21):

Anton (Jun 15 2024 at 14:29):

Musab Nazir (Jun 15 2024 at 16:31):

Anton (Jun 15 2024 at 16:47):

Anton (Jun 15 2024 at 16:59):

Anton (Jun 15 2024 at 18:08):

Luke Boswell (Jun 15 2024 at 22:34):

Luke Boswell (Jun 15 2024 at 22:35):

Paul Joubert (Jun 16 2024 at 15:10):

Brendan Hansknecht (Jun 16 2024 at 17:01):

Brendan Hansknecht (Jun 16 2024 at 17:58):

Brendan Hansknecht (Jun 16 2024 at 18:49):

Paul Joubert (Jun 16 2024 at 20:15):

Paul Joubert (Jun 16 2024 at 20:17):

Paul Joubert (Jun 16 2024 at 20:26):

Paul Joubert (Jun 16 2024 at 20:54):

Brendan Hansknecht (Jun 16 2024 at 21:24):

Brendan Hansknecht (Jun 16 2024 at 21:25):

Brendan Hansknecht (Jun 16 2024 at 21:29):

Paul Joubert (Jun 17 2024 at 13:40):

Brendan Hansknecht (Jun 17 2024 at 15:21):

Paul Joubert (Jun 17 2024 at 15:27):

Paul Joubert (Jun 17 2024 at 15:49):

Paul Joubert (Jun 17 2024 at 15:55):

Musab Nazir (Jun 17 2024 at 16:06):

Paul Joubert (Jun 17 2024 at 16:07):

Paul Joubert (Jun 17 2024 at 16:09):

Brendan Hansknecht (Jun 17 2024 at 18:53):

Brendan Hansknecht (Jun 17 2024 at 18:53):

Brendan Hansknecht (Jul 18 2024 at 06:19):

Brendan Hansknecht (Jul 18 2024 at 06:20):

Brendan Hansknecht (Jul 18 2024 at 06:33):

Brendan Hansknecht (Jul 18 2024 at 06:35):

Richard Feldman (Jul 18 2024 at 12:12):

Richard Feldman (Jul 18 2024 at 12:12):

Brendan Hansknecht (Jul 18 2024 at 15:34):

Brendan Hansknecht (Jul 18 2024 at 15:41):

Brendan Hansknecht (Jul 18 2024 at 16:00):

Richard Feldman (Jul 18 2024 at 16:29):

Brendan Hansknecht (Jul 18 2024 at 17:06):

Richard Feldman (Jul 18 2024 at 18:13):

Richard Feldman (Jul 18 2024 at 18:14):

Richard Feldman (Jul 18 2024 at 18:14):

Richard Feldman (Jul 18 2024 at 18:14):

Brendan Hansknecht (Jul 18 2024 at 20:20):

Brendan Hansknecht (Jul 18 2024 at 20:49):

Richard Feldman (Jul 18 2024 at 20:55):

Richard Feldman (Jul 18 2024 at 20:56):

Brendan Hansknecht (Jul 18 2024 at 20:57):

Brendan Hansknecht (Jul 18 2024 at 20:59):

Brendan Hansknecht (Jul 18 2024 at 21:00):

Brendan Hansknecht (Jul 18 2024 at 21:00):

Richard Feldman (Jul 18 2024 at 21:04):

Richard Feldman (Jul 18 2024 at 21:05):

Brendan Hansknecht (Jul 18 2024 at 21:09):

Brendan Hansknecht (Jul 18 2024 at 21:10):

Brendan Hansknecht (Jul 18 2024 at 21:14):

Brendan Hansknecht (Jul 18 2024 at 21:32):

Richard Feldman (Jul 18 2024 at 22:07):

Richard Feldman (Jul 18 2024 at 22:08):

Richard Feldman (Jul 18 2024 at 22:08):