Stream: contributing

Topic: generating a big Roc program for benchmarking


view this post on Zulip Richard Feldman (Aug 23 2022 at 12:15):

right now we only have small Roc programs on which to try out the compiler

view this post on Zulip Richard Feldman (Aug 23 2022 at 12:15):

it would be helpful to have a bigger code base (e.g. 100K or 1M LoC), but of course nobody has written any real Roc programs that big yet!

view this post on Zulip Richard Feldman (Aug 23 2022 at 12:15):

still, we could get some really useful information if we had an actual code base with line counts like that - even if it was generated by a script

view this post on Zulip Richard Feldman (Aug 23 2022 at 12:16):

e.g. having a script that randomly generates a valid Roc interface module with a random name, a handful of functions in it, randomly imports a few of the other modules, etc.

view this post on Zulip Richard Feldman (Aug 23 2022 at 12:17):

that way, we could run the compiler on it and not only see how long it takes, but also see where specific bottlenecks are in the compilation pipeline etc.
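
A rough sketch of the "see how long it takes" part, assuming wall-clock time of a compiler invocation is the number of interest. The helper `time_command` is hypothetical and takes the command as a parameter, so it does not assume any particular compiler flags. It only gives an end-to-end figure; finding bottlenecks inside the pipeline would still need a profiler.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of argv strings) `runs` times.

    Returns the wall-clock duration of each run in seconds.
    Raises subprocess.CalledProcessError if the command fails.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations
```

Usage would look something like `min(time_command(["roc", "check", "Main.roc"]))`, where the file name `Main.roc` and the choice of `roc check` are assumptions for illustration.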

view this post on Zulip Richard Feldman (Aug 23 2022 at 12:18):

anyone interested in generating something like that?

view this post on Zulip Brian Hicks (Aug 23 2022 at 13:41):

are you thinking something similar to a fuzzer/shrinker or just a random generator?

view this post on Zulip Brendan Hansknecht (Aug 23 2022 at 13:56):

I wrote a super basic one a long while ago. I remember that back then, after some number of files, the compiler would just hang... not sure what the state is now.

view this post on Zulip Richard Feldman (Aug 23 2022 at 14:33):

just a random generator - like something that makes valid .roc files which reference each other

view this post on Zulip Richard Feldman (Aug 23 2022 at 14:34):

the goal wouldn't be fuzzing because the point wouldn't be to identify edge cases (e.g. no need for shrinking)

view this post on Zulip Richard Feldman (Aug 23 2022 at 14:34):

the goal would just be raw lines of code in some generally realistic structure (e.g. files of nontrivial length that import various other files of nontrivial length)
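
A minimal sketch of such a generator, under several assumptions: the module names (`Gen0`, `Gen1`, ...), the function names (`f0`, `f1`, ...), and the choice of `Num.add`, `Num.sub`, and `Num.mul` as operations are all made up for illustration, and the `interface ... exposes ... imports` header follows the 2022 syntax used elsewhere in this thread. Each module may only import lower-numbered modules, so the import graph stays acyclic, and function bodies are kept flat.

```python
import random

OPS = ["Num.add", "Num.sub", "Num.mul"]


def module_name(index):
    return f"Gen{index}"


def render_module(index, rng, max_imports=3, num_funcs=5):
    """Return (name, source) for one generated interface module."""
    name = module_name(index)
    # Only lower-numbered modules are candidates, so imports cannot form a cycle.
    candidates = [module_name(i) for i in range(index)]
    count = min(len(candidates), rng.randint(0, max_imports))
    imports = rng.sample(candidates, count)
    # Every module exposes the same function names, so any import has them.
    funcs = [f"f{i}" for i in range(num_funcs)]
    lines = [
        f"interface {name}",
        f"    exposes [{', '.join(funcs)}]",
        f"    imports [{', '.join(imports)}]",
        "",
    ]
    for func in funcs:
        op = rng.choice(OPS)
        const = rng.randint(1, 100)
        if imports:
            dep = rng.choice(imports)
            body = f"{op} ({dep}.{func} n) {const}"
        else:
            body = f"{op} n {const}"
        lines += [f"{func} = \\n ->", f"    {body}", ""]
    return name, "\n".join(lines)


def generate(num_modules, seed=0):
    """Return a dict mapping module name to Roc source text."""
    rng = random.Random(seed)
    return dict(render_module(i, rng) for i in range(num_modules))
```

Raising `num_funcs` makes each file longer and raising `num_modules` adds files, so the two together control the total line count. Writing each entry to `<name>.roc` is left out here.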

view this post on Zulip Richard Feldman (Aug 23 2022 at 14:35):

for the purpose of seeing how well the compiler can handle it, identifying performance bottlenecks, etc.

view this post on Zulip Brendan Hansknecht (Aug 23 2022 at 14:44):

My version just used jinja2 templates and in this case made a binary expansion of files.

interface {{ name }}
    exposes [ a, b ]
    imports [ {{ left }}, {{ right }}]

a = \n ->
    {% if flip1 %}{{ op1 }} ({{ left }}.a n) ({{ right }}.a {{ rand }}){% else %}{{ op1 }} ({{ left }}.a {{ rand }}) ({{ right }}.a n){% endif %}

b = \n ->
    {% if flip2 %}{{ op2 }} ({{ left }}.b n) ({{ right }}.b {{ rand }}){% else %}{{ op2 }} ({{ left }}.b {{ rand }}) ({{ right }}.b n){% endif %}

view this post on Zulip Brendan Hansknecht (Aug 23 2022 at 14:44):

Of course you could do something way more interesting. This just used a few possible functions from Num and some randomness to build out the tree.
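
One way the binary expansion might be driven, sketched with plain Python f-strings standing in for jinja2 so the example has no dependencies. This is not the original script: the module names (`M0`, `M1`, ...), the heap-style numbering of children, the operation list, and the leaf modules with empty imports are all assumptions. The template above always imports `left` and `right`, so the bottom of the tree needs some separate treatment like the leaf case here.

```python
import random

OPS = ["Num.add", "Num.sub", "Num.mul"]


def render_node(name, left, right, rng):
    """Inner module: mirrors the template, combining both children."""
    lines = [
        f"interface {name}",
        "    exposes [ a, b ]",
        f"    imports [ {left}, {right} ]",
        "",
    ]
    for func in ("a", "b"):
        op = rng.choice(OPS)
        rand = rng.randint(1, 100)
        # The coin flip decides which child receives the argument n.
        if rng.random() < 0.5:
            body = f"{op} ({left}.{func} n) ({right}.{func} {rand})"
        else:
            body = f"{op} ({left}.{func} {rand}) ({right}.{func} n)"
        lines += [f"{func} = \\n ->", f"    {body}", ""]
    return "\n".join(lines)


def render_leaf(name, rng):
    """Leaf module: no imports, functions apply one Num operation."""
    lines = [f"interface {name}", "    exposes [ a, b ]", "    imports []", ""]
    for func in ("a", "b"):
        op = rng.choice(OPS)
        rand = rng.randint(1, 100)
        lines += [f"{func} = \\n ->", f"    {op} n {rand}", ""]
    return "\n".join(lines)


def generate_tree(depth, seed=0):
    """Return {module name: source} for a full binary tree of modules."""
    rng = random.Random(seed)
    total = 2 ** (depth + 1) - 1
    first_leaf = 2 ** depth - 1
    modules = {}
    for i in range(total):
        name = f"M{i}"
        if i >= first_leaf:
            modules[name] = render_leaf(name, rng)
        else:
            modules[name] = render_node(name, f"M{2 * i + 1}", f"M{2 * i + 2}", rng)
    return modules
```

With this numbering, `depth` controls module depth directly and the file count is `2 ** (depth + 1) - 1`, so the two cannot be varied independently.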

view this post on Zulip Brendan Hansknecht (Aug 23 2022 at 14:46):

Making functions that are interesting to the compiler is of course a much larger challenge than focusing on something specific like number of files and module depth.

view this post on Zulip Richard Feldman (Aug 23 2022 at 14:47):

right, the idea here would be to keep the scope minimal so it can be a quick project :big_smile:

view this post on Zulip Ayaz Hafiz (Aug 23 2022 at 14:56):

If this is done, we'd want to be careful about the shape of the program that's generated. For example, I don't think we'd want to have any deeply nested closures - that is a known bottleneck that will dominate any perf trace, and how those constructs are treated in the compiler is likely to change soon.

view this post on Zulip Brendan Hansknecht (Aug 23 2022 at 15:05):

Depending on how complex the generation needs to be, I may be able to modify what I have to support this use case. I think what I had here was specifically trying to also support the dev backend, which is why it was so restricted.


Last updated: Jul 06 2025 at 12:14 UTC