generating a big Roc program for benchmarking · contributing

Richard Feldman (Aug 23 2022 at 12:15):

it would be helpful to have a bigger code base (e.g. 100K or 1M LoC), but of course nobody has written any real Roc programs that big yet!

Richard Feldman (Aug 23 2022 at 12:15):

still, we could get some really useful information if we had an actual code base with line counts like that - even if it was generated by a script

Richard Feldman (Aug 23 2022 at 12:16):

e.g. having a script that randomly generates a valid Roc interface module with a random name, a handful of functions in it, randomly imports a few of the other modules, etc.
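A minimal sketch of such a script in Python, in case it helps make the idea concrete. It assumes the `interface ... exposes ... imports` module header Roc used at the time, and every name in it (`random_module_name`, `render_module`, the `f0`..`fN` function names) is made up for illustration. The output has not been checked against the compiler.

```python
import random
import string


def random_module_name(rng, length=8):
    """Return a random capitalized identifier to use as a module name."""
    first = rng.choice(string.ascii_uppercase)
    rest = "".join(rng.choice(string.ascii_lowercase) for _ in range(length - 1))
    return first + rest


def roc_list(items):
    """Format a list of identifiers as a bracketed, comma-separated list."""
    return "[ " + ", ".join(items) + " ]" if items else "[]"


def render_module(rng, name, available, n_funcs=5, max_imports=3):
    """Render the source text of one interface module.

    `available` lists modules generated earlier that this one may import.
    Each of them is assumed to expose a function named f0.
    """
    imports = rng.sample(available, k=min(max_imports, len(available)))
    func_names = [f"f{i}" for i in range(n_funcs)]
    lines = [
        f"interface {name}",
        f"    exposes {roc_list(func_names)}",
        f"    imports {roc_list(imports)}",
        "",
    ]
    for fn in func_names:
        if imports:
            # Call into a randomly chosen imported module.
            body = f"{rng.choice(imports)}.f0 (n + {rng.randint(1, 99)})"
        else:
            # No imports available yet: plain arithmetic on the argument.
            body = f"n * {rng.randint(2, 9)} + {rng.randint(1, 99)}"
        lines += [f"{fn} = \\n ->", f"    {body}", ""]
    return "\n".join(lines)
```

Calling `render_module` in a loop, appending each new name to `available` after rendering, would give every module only earlier modules to import, so no import cycles can appear. Random names can collide, so a real script would need to deduplicate them.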

Richard Feldman (Aug 23 2022 at 12:17):

that way, we could run the compiler on it and not only see how long it takes, but also see where specific bottlenecks are in the compilation pipeline etc.
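A rough sketch of the "how long it takes" half, assuming Python and a compiler invocation passed in as an argument list. `time_command` is a hypothetical helper, and it only measures end-to-end wall-clock time; finding bottlenecks inside the pipeline would still need a profiler or the compiler's own timing output.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of argv strings) `runs` times.

    Returns the wall-clock duration of each run in seconds. Raises
    subprocess.CalledProcessError if the command exits with a nonzero status.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_command(["roc", "check", "generated/Main.roc"])` would time type-checking of a generated entry point, where `generated/Main.roc` is a placeholder path.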

Richard Feldman (Aug 23 2022 at 12:18):

Brian Hicks (Aug 23 2022 at 13:41):

are you thinking something similar to a fuzzer/shrinker or just a random generator?

Brendan Hansknecht (Aug 23 2022 at 13:56):

I wrote a super basic one a long while ago. I remember that, back then, the compiler would just hang after some number of files... not sure what the state is now.

Richard Feldman (Aug 23 2022 at 14:33):

just a random generator - like something that makes valid .roc files which reference each other

Richard Feldman (Aug 23 2022 at 14:34):

the goal wouldn't be fuzzing because the point wouldn't be to identify edge cases (e.g. no need for shrinking)

Richard Feldman (Aug 23 2022 at 14:34):

the goal would just be raw lines of code in some generally realistic structure (e.g. files of nontrivial length that import various other files of nontrivial length)

Richard Feldman (Aug 23 2022 at 14:35):

for the purpose of seeing how well the compiler can handle it, identifying performance bottlenecks, etc.
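One way to plan that structure, sketched in Python under assumed numbers: pick a target line count, divide by a fixed module length, and let each module import only modules with a smaller index so the import graph stays acyclic. `plan_modules`, the `Gen{i}` naming, and the defaults of 200 lines and up to 4 imports per module are all illustrative choices, not anything agreed in this thread.

```python
import random


def plan_modules(target_loc, lines_per_module=200, max_imports=4, seed=0):
    """Return {module_name: [imported module names]} sized for `target_loc`.

    Module i may only import modules with a smaller index, so the import
    graph is acyclic by construction.
    """
    rng = random.Random(seed)
    count = -(-target_loc // lines_per_module)  # ceiling division
    names = [f"Gen{i}" for i in range(count)]
    plan = {}
    for i, name in enumerate(names):
        # The first module has nothing to import; later ones import 1..max_imports.
        k = min(i, rng.randint(1, max_imports))
        plan[name] = rng.sample(names[:i], k=k)
    return plan
```

With these defaults, `plan_modules(100_000)` yields 500 modules and `plan_modules(1_000_000)` yields 5,000. The fixed seed keeps the generated code base reproducible between benchmark runs.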

Brendan Hansknecht (Aug 23 2022 at 14:44):

My version just used jinja2 templates and, in this case, made a binary expansion of files: each module imports a left and a right module.

interface {{ name }}
    exposes [ a, b ]
    imports [ {{ left }}, {{ right }} ]

a = \n ->
    {% if flip1 %}{{ op1 }} ({{ left }}.a n) ({{ right }}.a {{ rand }}){% else %}{{ op1 }} ({{ left }}.a {{ rand }}) ({{ right }}.a n){% endif %}

b = \n ->
    {% if flip2 %}{{ op2 }} ({{ left }}.b n) ({{ right }}.b {{ rand }}){% else %}{{ op2 }} ({{ left }}.b {{ rand }}) ({{ right }}.b n){% endif %}

Brendan Hansknecht (Aug 23 2022 at 14:44):

Of course you could do something way more interesting. This just used a few possible functions from Num and some randomness to build out the tree.
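For anyone who wants to reproduce the binary expansion without the original script, here is a hedged sketch in Python. It uses plain f-strings rather than jinja2 to stay dependency-free, and it is not the script described above. The `M{i}` heap-style naming, the choice of `Num.add`, `Num.sub` and `Num.mul`, and the way leaf modules are defined are all assumptions.

```python
import random

OPS = ["Num.add", "Num.sub", "Num.mul"]  # assumed choice of Num functions


def _branch_def(rng, fn, left, right):
    """One definition in the shape of the template: combine both children's `fn`."""
    op, rand = rng.choice(OPS), rng.randint(1, 100)
    if rng.random() < 0.5:  # plays the role of flip1 / flip2
        expr = f"{op} ({left}.{fn} n) ({right}.{fn} {rand})"
    else:
        expr = f"{op} ({left}.{fn} {rand}) ({right}.{fn} n)"
    return f"{fn} = \\n ->\n    {expr}\n"


def _leaf_def(rng, fn):
    """Assumption: leaves import nothing and apply one Num function directly."""
    return f"{fn} = \\n ->\n    {rng.choice(OPS)} n {rng.randint(1, 100)}\n"


def binary_tree_modules(depth, seed=0):
    """Return {module_name: source} for a complete binary tree of modules.

    The node at heap index i is named M{i}; its children are M{2i+1} and M{2i+2}.
    """
    rng = random.Random(seed)
    first_leaf = 2 ** depth - 1
    modules = {}
    for i in range(2 ** (depth + 1) - 1):
        name = f"M{i}"
        if i < first_leaf:
            left, right = f"M{2 * i + 1}", f"M{2 * i + 2}"
            imports = f"[ {left}, {right} ]"
            defs = [_branch_def(rng, fn, left, right) for fn in ("a", "b")]
        else:
            imports = "[]"
            defs = [_leaf_def(rng, fn) for fn in ("a", "b")]
        header = f"interface {name}\n    exposes [ a, b ]\n    imports {imports}\n"
        modules[name] = header + "\n" + "\n".join(defs)
    return modules
```

A tree of depth `d` has `2 ** (d + 1) - 1` modules, so `binary_tree_modules(9)` produces 1,023 files' worth of source, each of which could be written out as `M{i}.roc`.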

Brendan Hansknecht (Aug 23 2022 at 14:46):

Generating functions that are interesting to the compiler is of course a much larger challenge than focusing on something specific like the number of files and module depth.

Richard Feldman (Aug 23 2022 at 14:47):

right, the idea here would be to keep the scope minimal so it can be a quick project :big_smile:

Ayaz Hafiz (Aug 23 2022 at 14:56):

If this is done, we'd want to be careful about the shape of the program that's generated. For example, I don't think we'd want to have any deeply nested closures - that is a known bottleneck that will dominate any perf trace, and how those constructs are treated in the compiler is likely to change soon.

Brendan Hansknecht (Aug 23 2022 at 15:05):

Depending on how complex the generation needs to be, I may be able to modify what I have to support this use case. I think what I had here was specifically trying to also support the dev backend, which is why it was so restricted.
