right now we only have small Roc programs on which to try out the compiler
it would be helpful to have a bigger code base (e.g. 100K or 1M LoC), but of course nobody has written any real Roc programs that big yet!
still, we could get some really useful information if we had an actual code base with line counts like that - even if it was generated by a script
e.g. having a script that randomly generates a valid Roc interface module with a random name, a handful of functions in it, random imports of a few of the other modules, etc.
that way, we could run the compiler on it and not only see how long it takes, but also see where specific bottlenecks are in the compilation pipeline etc.
anyone interested in generating something like that?
are you thinking something similar to a fuzzer/shrinker or just a random generator?
I wrote a super basic one a long while ago. I remember that after some number of files the compiler would just hang back then... not sure what the state is now.
just a random generator - like something that makes valid .roc files which reference each other
the goal wouldn't be fuzzing because the point wouldn't be to identify edge cases (e.g. no need for shrinking)
the goal would just be raw lines of code in some generally realistic structure (e.g. files of nontrivial length that import various other files of nontrivial length)
for the purpose of seeing how well the compiler can handle it, identifying performance bottlenecks, etc.
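A rough sketch of what such a generator could look like, in Python with only the standard library. Everything here is an assumption for illustration: the function names (`generate_modules`, `write_modules`), the `f0`, `f1`, ... naming scheme, the choice of `Num.add`/`Num.sub`/`Num.mul`, and the `interface` header layout, which is copied from the template further down in this thread and has not been checked against a current compiler. Modules only import modules generated before them, so the import graph has no cycles.

```python
import random
from pathlib import Path

OPS = ["Num.add", "Num.sub", "Num.mul"]  # assumed two-argument Num functions


def roc_list(items):
    """Format a Roc list literal the way the thread's template does."""
    return "[ " + ", ".join(items) + " ]" if items else "[]"


def random_module_name(rng, taken):
    """Pick an unused capitalized module name such as 'Mqzkdab'."""
    while True:
        name = "M" + "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(6))
        if name not in taken:
            return name


def generate_modules(count, seed=0, max_imports=3, max_funcs=5):
    """Return {module_name: roc_source}; each module only imports earlier ones."""
    rng = random.Random(seed)
    exposed = {}  # module name -> function names it exposes
    sources = {}
    for _ in range(count):
        name = random_module_name(rng, exposed)
        earlier = list(exposed)
        candidates = rng.sample(earlier, min(len(earlier), rng.randint(0, max_imports)))
        funcs = [f"f{i}" for i in range(rng.randint(1, max_funcs))]
        used, defs = [], []
        for func in funcs:
            op, const = rng.choice(OPS), rng.randint(1, 100)
            if candidates:
                dep = rng.choice(candidates)
                if dep not in used:
                    used.append(dep)  # only list imports that are actually called
                body = f"{op} ({dep}.{rng.choice(exposed[dep])} n) {const}"
            else:
                body = f"{op} n {const}"
            defs += [f"{func} = \\n ->", f"    {body}", ""]
        header = [
            f"interface {name}",
            f"    exposes {roc_list(funcs)}",
            f"    imports {roc_list(used)}",
            "",
        ]
        exposed[name] = funcs
        sources[name] = "\n".join(header + defs)
    return sources


def write_modules(sources, out_dir):
    """Write each module to <out_dir>/<ModuleName>.roc."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in sources.items():
        (out / f"{name}.roc").write_text(text)
```

Something like `write_modules(generate_modules(10_000), "generated")` would then produce the files. The bodies are deliberately flat one-liners; getting to realistic file lengths would mean raising `max_funcs` or emitting longer bodies.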
My version just used jinja2 templates and in this case made a binary expansion of files.
interface {{ name }}
    exposes [ a, b ]
    imports [ {{ left }}, {{ right }} ]

a = \n ->
    {% if flip1 %}{{ op1 }} ({{ left }}.a n) ({{ right }}.a {{ rand }}){% else %}{{ op1 }} ({{ left }}.a {{ rand }}) ({{ right }}.a n){% endif %}

b = \n ->
    {% if flip2 %}{{ op2 }} ({{ left }}.b n) ({{ right }}.b {{ rand }}){% else %}{{ op2 }} ({{ left }}.b {{ rand }}) ({{ right }}.b n){% endif %}
Of course you could do something way more interesting. This just used a few possible functions from Num and some randomness to build out the tree.
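For anyone wanting to reproduce the binary expansion without jinja2, here is a minimal sketch that renders the same shape with plain Python string formatting. It is not the original script: the `Node<i>` naming, the heap-style child indexing (`2*i + 1`, `2*i + 2`), the op list, and especially the leaf modules (which import nothing) are assumptions, since the thread does not show how the original terminated the tree.

```python
import random

OPS = ["Num.add", "Num.sub", "Num.mul"]  # assumption: the original op list is not shown


def module_name(index):
    return f"Node{index}"


def render_inner(name, left, right, rng):
    """Mirror the template: each function calls the same function in both children."""
    lines = [
        f"interface {name}",
        "    exposes [ a, b ]",
        f"    imports [ {left}, {right} ]",
        "",
    ]
    for func in ("a", "b"):
        op, rand = rng.choice(OPS), rng.randint(1, 100)
        if rng.random() < 0.5:  # plays the role of flip1 / flip2
            body = f"{op} ({left}.{func} n) ({right}.{func} {rand})"
        else:
            body = f"{op} ({left}.{func} {rand}) ({right}.{func} n)"
        lines += [f"{func} = \\n ->", f"    {body}", ""]
    return "\n".join(lines)


def render_leaf(name, rng):
    """Leaf modules import nothing (assumed; needed to end the tree)."""
    lines = [f"interface {name}", "    exposes [ a, b ]", "    imports []", ""]
    for func in ("a", "b"):
        lines += [f"{func} = \\n ->", f"    {rng.choice(OPS)} n {rng.randint(1, 100)}", ""]
    return "\n".join(lines)


def binary_expansion(depth, seed=0):
    """Return {module_name: source} for a complete binary tree of modules."""
    rng = random.Random(seed)
    total = 2 ** (depth + 1) - 1   # number of modules in the tree
    first_leaf = 2 ** depth - 1    # indices from here on are leaves
    sources = {}
    for i in range(total):
        name = module_name(i)
        if i >= first_leaf:
            sources[name] = render_leaf(name, rng)
        else:
            left, right = module_name(2 * i + 1), module_name(2 * i + 2)
            sources[name] = render_inner(name, left, right, rng)
    return sources
```

The file count doubles with each level, so `depth` is the only knob for module depth and number of files here: `binary_expansion(10)` gives 2047 modules.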
Generating functions that are interesting to the compiler is of course a much larger challenge than focusing on something specific like number of files and module depth.
right, the idea here would be to keep the scope minimal so it can be a quick project :big_smile:
If this is done, we'd want to be careful about the shape of the program that's generated. For example, I don't think we'd want to have any deeply nested closures - that is a known bottleneck that will dominate any perf trace, and how those constructs are treated in the compiler is likely to change soon.
Depending on how complex the generation needs to be, I may be able to modify what I have to support this use case. I think what I had here was specifically trying to also support the dev backend, which is why it was so restricted.
Last updated: Jul 06 2025 at 12:14 UTC