Hi everyone,
I’ve been following the development of the language over the past few years and recently noticed in some comments that a few of you are using AI tools (like Claude) during the development process. I’d love to hear your thoughts and experiences on this.
What do you see as the main advantages, and what limitations or risks have you encountered (design choices, consistency, long-term maintainability, etc.)?
I’m genuinely curious to learn from your perspectives. Thanks in advance for sharing — and Merry Christmas :holiday_tree:!
(Feel free to move this discussion out of “compiler development” if this isn’t the right place.)
since Sonnet 4.5 came out, and even more so with Opus 4.5, my workflow has pretty much changed to "write English, review Zig code"
I'm a lot faster that way than before (despite 30 years of writing code by hand!), plus it's much more conducive to making progress in tiny chunks of time
like right now I'm on the way to the gym and I have 5ish agents running on my laptop on different branches, and I can review what they did when I get back
I really struggled with lack of continuous hours prior to this workflow existing
since having a kid, and since Roc is not my day job, I just would not get to "work on Roc for several hours in a row" except for maybe a handful of times per year
that's not really a problem anymore because I don't need to load as many code-level details into my brain to make progress now
so I can't speak for others, but for me it's become totally transformative in a really positive way! :smiley:
Thanks for sharing!
I’m curious: do you find that the AI is able to maintain a sufficiently high-level, global understanding of the project on its own, or do you usually have to guide it quite explicitly (e.g. which parts of the codebase to touch, which files to modify, architectural constraints, etc.)?
In other words, how much of the “big picture” can you delegate to the AI today, and where do you still need to step in?
none of the current tools are good at big picture in my experience
I can't just be like "hey Claude, draw the owl" - that will end in disaster
however for small bugfixes I actually can be like "hey Claude, here's a boilerplate script you can follow to reproduce the bug in a test, then track down the root cause, apply a fix, and push a draft PR for me to review"
sometimes that's enough bc there's no "big picture understanding" necessary - just surrounding context clues like the bug report and Claude being able to poke around the codebase is enough to figure out how to repro it and then find a fix (again, for small bugs!)
so usually the diffs for these PRs are very small and easy to review. I'd guess more than 50% don't need changes, but a common failure mode is "Claude made the test pass using a bandaid fix that doesn't actually address the root cause"
and when I spot that in review it's often enough for me to go back and be like "hey this looks like a bandaid, try digging deeper to find the actual root cause"
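To make the "reproduce the bug in a test first" step concrete, here's a made-up Zig sketch (the function and the bug are invented for illustration, not from the Roc codebase): the agent starts with a failing test that pins down the reported behavior, and that test becomes the feedback loop that distinguishes a real fix from a bandaid.

```zig
const std = @import("std");

// Invented example: trims leading spaces only. The pretend bug
// (now fixed) was that it also stripped leading tabs.
fn trimLeadingSpaces(s: []const u8) []const u8 {
    var i: usize = 0;
    while (i < s.len and s[i] == ' ') : (i += 1) {}
    return s[i..];
}

// The repro test comes first: it fails against the buggy version,
// giving the agent a concrete signal to debug against, and passes
// once the root cause is fixed (rather than special-casing the input).
test "repro: leading tabs must be preserved" {
    try std.testing.expectEqualStrings("\thello", trimLeadingSpaces("  \thello"));
}
```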
for bigger projects they need to be chopped up into smaller pieces, can't just be like "draw the owl of this whole feature" and there is usually way more iterating back and forth on my reviewing it, finding something unacceptable, describing the necessary revisions, etc
but again for me personally it's huge to be able to have each of those interactions be on an arbitrary schedule. I don't need to sit down and spend 10 minutes reorienting myself in the code to be able to make progress like I do when writing code by hand
this thread made me try Opus and damn it's a nice model lol
I also find it really valuable, I can contribute while working around other meetings and different things. I tend to spend a fair bit of time in "plan mode" until I'm convinced Claude has identified a good root cause and understands the issue, or the scope of the next step isn't too large.
AI tools take tons of context to work well currently, but they can do a lot.
One of my coworkers who is an extreme AI power user describes current AI as "a neurodivergent intern who is exceptionally passionate and types really really fast". I think that is a relatively fair way to describe it today. If you can give enough details, guidance, and measurable tasks, it can do great things. But it also can generate gigantic messes, and unleashing that enthusiasm without bounds is a catastrophic mistake.
I find, for much of my work, it requires very detailed prompting. Lots of asking targeted questions to get the AI to think about the problem in the right way, and a lot of planning mode with thinking on and the "ultrathink" keyword. (This is for Claude Opus 4.5)
It works rather well, but still constantly falls short, especially if I am not careful enough
Note, this is mostly from my general work, not Roc work specifically
Thank you for your answers, I find this workflow really impressive.
One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.
Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?
I think the work that my coworker does (infrastructure and core abstractions for high performance and maintainable GPU kernels) qualifies as requiring "deep optimization, careful design trade-offs, and long-term refinement".
And I think it can...but you have to be way more careful of it currently.
A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then when it gets to implementation, it has that design discussion as reference.
That said, I currently find it better to explore design separate from code. Use the AI to educate myself to inform better design rather than let it create the design.
Brendan Hansknecht said:
And I think it can...but you have to be way more careful of it currently.
A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then when it gets to implementation, it has that design discussion as reference.
That part
Brendan Hansknecht said:
I currently find it better to explore design separate from code. Use the AI to educate myself to inform better design rather than let it create the design.
yeah, same here!
Yes, relevant context :smile:
Ghislain said:
One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.
Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?
so far I think the only way LLMs have contributed to Roc's design is in researching what other languages do - e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research], how do various different programming languages do _____?" than it used to be to try to find exactly the right part of the documentation, especially if I don't know what I'm searching for. e.g. one of my searches was "what names do different languages use for flat_map?" - before, if I searched "what does [specific language] call flat_map?" and the language used a different name, I had to hope someone had asked a StackOverflow question like that, or else I'd be running different searches guessing different names :smile:
but at least in Roc's case I haven't done any like "hey [model] help me solve this design problem" stuff - it's really mostly been for research, debugging, or writing/editing code
On a more philosophical note — and as a relatively younger developer — this is something I find myself thinking about more and more as AI plays such a large role in implementation and even design.
I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
I ask myself this question a lot. like I can potentially see a future where we just write everything in assembly via agent
I guess writing in a higher-level language like C/Zig would already be sufficient, given the level of optimizations modern compilers are able to perform (thank you Advent of Compiler :smile:).
Richard Feldman said:
e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research] how do various different programming languages do _____?"
I'm curious what you use Gemini for, since from context, it looks like you use Claude for code. Is Gemini better for you regarding research?
at least right at this moment it's force of habit. There was some point where Gemini 2.5 Pro was giving me better answers than Claude 4 was and I haven't done a revised comparison since then :smile:
Ghislain said:
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
It honestly never occurred to me to think about it in these terms.
Like I use Rust at work and Zig on Roc, I use the same models on both, and the languages feel very noticeably different. I assume the same would be true of like Go or TypeScript too, so I don't see why Roc would be any different in that regard! :smile:
like I'm still spending a lot of time reading and revising code, and build times are a major contributor to how long it takes LLMs to get to a point where code is reviewable.
Ghislain said:
I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.
I think this is still where the most value is. AI may be fast, but understanding things deeply enough to enable it to build robust and extensible solutions is an important skill. Also, for many areas of significant depth, AI falls quite flat. So being someone who wants to dive deep and really grok things, you are in the best place to take advantage of AI.
I think the people who are getting hurt the most by AI right now are the people who are outsourcing too much to AI. As a result, they are no longer learning and growing. Instead they become stagnant and dependent. People who use AI to learn, and who hand it only a subset of tasks, are able to focus on the most important work, grow, and generally accelerate their work.
People ... are able to focus on the most important work, grow, and generally accelerate their work.
I agree with this sentiment.
Ghislain said:
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
I think there is still demand for a programming language that:
It's actually an exciting opportunity to be able to make a programming language for this new age because we can design it with the capabilities of modern LLMs in mind. Making big changes like that seems a lot harder for established programming languages.
yeah another way to think about it: if language doesn't matter anymore, what's the one language all AI-programming enthusiasts have converged on using for all greenfield programming tasks?
(answer: whichever one they feel happiest using or which their job demands, just like always)
so I don't think the tradeoffs about languages have changed at a fundamental level, it's more that some tradeoffs matter more than others
for example, it's wild to me that about a year ago we were discussing how static dispatch enabling . autocomplete in IDEs was a big selling point, and today I do manual editing so infrequently I basically value that selling point at zero :sweat_smile:
and it feels like the incentives around automated tests have changed a lot when doing a ton with agents; the tests are way cheaper to write, and have added value because they not only catch regressions, they give agents a feedback mechanism to self-correct without programmer review.
so in that world, like Anton noted, I think the "tests can be auto-skipped when they're all pure functions and the code didn't change" idea is a way bigger deal, because bigger test suites (currently) unavoidably slow the agents down, and yet are important for feedback loop reasons.
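As a rough sketch of that auto-skip idea (a minimal illustration, not how Roc's test runner actually works; the file path, cache value, and the 0.14-era `readFileAlloc` std API are assumptions):

```zig
const std = @import("std");

// If a test suite only exercises pure functions, its results depend only
// on the source it covers, so a runner can hash that source and skip the
// suite whenever the hash matches the last recorded run.
fn pureTestsNeedRerun(allocator: std.mem.Allocator, src_path: []const u8, cached_hash: u64) !bool {
    const bytes = try std.fs.cwd().readFileAlloc(allocator, src_path, 16 * 1024 * 1024);
    defer allocator.free(bytes);
    return std.hash.Wyhash.hash(0, bytes) != cached_hash;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Hypothetical path and cached hash; a real runner would persist
    // the hash to disk after each run.
    if (try pureTestsNeedRerun(allocator, "src/pure_module.zig", 0)) {
        std.debug.print("source changed; running pure tests\n", .{});
    } else {
        std.debug.print("source unchanged; skipping pure tests\n", .{});
    }
}
```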
another example: the "what color is your function?" tradeoff of effectfulness needing to propagate all the way through a call chain used to be a big downside, because you'd need to do that propagation yourself. Now it's trivial; I'll just notice in the review that the agent did it.
and the upside of names ending in ! making it more obvious where effects are happening is more valuable now, because I'm spending proportionately more of my time reading diffs than before
An interesting benchmark for programming languages would be: how fast can Opus 4.5 build a specific set of diverse applications in language X, what's the perf of each application, and how many tests of a very thorough hidden test suite does the app pass?
Other interesting metrics: total number of errors and warnings encountered, number of times the application was run during development, time spent debugging
Would be interesting if you added extra constraints to see how it copes. For example, it might start with naive Python. You should also be able to ask it to optimize the code to reach a certain performance... Might then use NumPy in Python for example
Brendan Hansknecht said:
Might then use NumPy in Python for example
yesterday I learned NumPy is implemented in Fortran
Hmm, only a tiny bit seems to be implemented in Fortran based on the "Languages" breakdown on https://github.com/numpy/numpy
I think I learned it from an LLM so yeah that checks out lol
Fortran can optimize better than C for this kind of code due to assuming that pointers won't alias by default.
Not sure how relevant it is in the modern day, but it was part of the reason for Fortran's early growth: it was better for high-performance scientific code and would run faster.
But that would be part of my guess for the origin of the claim here
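In Zig terms (an illustrative sketch, not NumPy code): `noalias` opts into the no-overlap guarantee that Fortran makes by default for its arguments, and that C only gets via `restrict`.

```zig
const std = @import("std");

// `noalias` promises the compiler that `dst` and `src` never overlap,
// so it can vectorize the loop without re-reading `src` after each
// store to `dst`. Fortran assumes this by default, which is the classic
// reason it could out-optimize naive C on numeric kernels.
fn axpy(noalias dst: [*]f32, noalias src: [*]const f32, n: usize, a: f32) void {
    var i: usize = 0;
    while (i < n) : (i += 1) {
        dst[i] += a * src[i];
    }
}

test "axpy with non-overlapping buffers" {
    var x = [_]f32{ 1, 2, 3 };
    const y = [_]f32{ 10, 20, 30 };
    axpy(&x, &y, x.len, 2.0);
    try std.testing.expectEqual(@as(f32, 21), x[0]); // 1 + 2 * 10
}
```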
If you click on Fortran in that repo to see the files and scroll through them, they are all just documentation and test data for a thing called "f2py", which seems to be a tool for integrating user-defined Fortran code into Python apps. But it doesn't seem like NumPy itself actually implements any algorithms in Fortran.
What do you think about Simon's take?
If you’re introducing a new protocol or even a new programming language to the world in 2026 I strongly recommend including a language-agnostic conformance suite as part of your project.
I don't understand what he's saying there :sweat_smile:
maybe he blogged about it in more detail somewhere else?
Isn't that basically what we are developing in our suite of snapshot tests? We are building up a collection of examples which demonstrate the syntax and semantics
that's certainly one thing he might mean :smile:
I think it is the snapshot tests but also taken a level further. You need to make sure not only that you generate correct parsing/errors and whatnot, but also that you execute correctly. That is fundamental for programming language correctness. (I don't think snapshot tests run the interpreter, but maybe they do)
I think the theory being proposed here is that eventually I should be able to write a language implementation in Python that covers all the errors and the full interpreter setup. Build a conformance test suite that tests everything. Then ask AI to build me a fully working interpreter in Zig by implementing the tests one by one with a little extra guidance. After that, I should theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter when used.
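A minimal sketch of that differential loop (the case file, the Python reference script, and the binary name are all hypothetical; assumes the `std.process.Child.run` API from recent Zig versions):

```zig
const std = @import("std");

// Run one command and capture its stdout.
fn runCapture(allocator: std.mem.Allocator, argv: []const []const u8) ![]u8 {
    const result = try std.process.Child.run(.{ .allocator = allocator, .argv = argv });
    allocator.free(result.stderr);
    return result.stdout;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // One conformance case, run through the Python reference
    // implementation and the Zig implementation under test.
    const case = "cases/hello.roc";
    const expected = try runCapture(allocator, &.{ "python3", "reference.py", case });
    defer allocator.free(expected);
    const actual = try runCapture(allocator, &.{ "./zig-interpreter", case });
    defer allocator.free(actual);

    if (!std.mem.eql(u8, expected, actual)) {
        std.debug.print("MISMATCH on {s}\n", .{case});
        return error.ConformanceMismatch;
    }
    std.debug.print("ok: {s}\n", .{case});
}
```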
I don't think snapshot tests run the interpreter
Yeah they do in the "REPL" snapshots :smiley:
theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter when used.
I think this is the approach (or something similar) we're currently using to build out the LLVM backend for Roc.
I am not sure how complete the LLVM backend is already, but from skim-reading some of the PRs it looks like Richard has been using our tests with both the interpreter and the LLVM backend and confirming they evaluate to the same thing, with this being one of the key methods of feedback for Claude and friends to work with as a source of truth.
See https://github.com/roc-lang/roc/pull/8810
Ah yeah, then I think we have the core of what they talk about here.