Stream: compiler development

Topic: Thoughts on Using AI in the Language Development Process?


view this post on Zulip Ghislain (Dec 24 2025 at 17:10):

Hi everyone,

I’ve been following the development of the language over the past few years and recently noticed in some comments that a few of you are using AI tools (like Claude) during the development process. I’d love to hear your thoughts and experiences on this.

What do you see as the main advantages, and what limitations or risks have you encountered (design choices, consistency, long-term maintainability, etc.)?

I’m genuinely curious to learn from your perspectives. Thanks in advance for sharing — and Merry Christmas :holiday_tree:!

(Feel free to move this discussion out of “compiler development” if this isn’t the right place.)

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:31):

since Sonnet 4.5 came out, and even more so with Opus 4.5, my workflow has pretty much changed to "write English, review Zig code"

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:32):

I'm a lot faster that way than before (despite 30 years of handwriting code!) plus it's much more conducive to making progress in tiny chunks of time

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:32):

like right now I'm on the way to the gym and I have 5ish agents running on my laptop on different branches, and I can review what they did when I get back

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:33):

I really struggled with lack of continuous hours prior to this workflow existing

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:33):

since having a kid, and since Roc is not my day job, I just wouldn't get "work on Roc for several hours in a row" except maybe a handful of times per year

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:34):

that's not really a problem anymore because I don't need to load as many code-level details into my brain to make progress now

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:36):

so I can't speak for others, but for me it's become totally transformative in a really positive way! :smiley:

view this post on Zulip Ghislain (Dec 24 2025 at 17:39):

Thanks for sharing!

I’m curious: do you find that the AI is able to maintain a sufficiently high-level, global understanding of the project on its own, or do you usually have to guide it quite explicitly (e.g. which parts of the codebase to touch, which files to modify, architectural constraints, etc.)?

In other words, how much of the “big picture” can you delegate to the AI today, and where do you still need to step in?

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:43):

none of the current tools are good at big picture in my experience

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:44):

I can't just be like "hey Claude, draw the owl" - that will end in disaster

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:45):

however for small bugfixes I actually can be like "hey Claude, here's a boilerplate script you can follow to reproduce the bug in a test, then track down the root cause, and apply a fix and then push a draft PR for me to review"

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:47):

sometimes that's enough because there's no "big picture understanding" necessary - just surrounding context clues like the bug report and Claude being able to poke around the codebase is enough to figure out how to repro it and then find a fix (again, for small bugs!)

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:48):

so usually the diffs for these PRs are very small and easy to review. I'd guess more than 50% don't need changes, but a common failure mode is "Claude made the test pass using a bandaid fix that doesn't actually address the root cause"

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:49):

and when I spot that in review it's often enough for me to go back and be like "hey this looks like a bandaid, try digging deeper to find the actual root cause"

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:51):

for bigger projects they need to be chopped up into smaller pieces, can't just be like "draw the owl of this whole feature" - and there is usually way more iterating back and forth: me reviewing it, finding something unacceptable, describing the necessary revisions, etc

view this post on Zulip Richard Feldman (Dec 24 2025 at 17:52):

but again for me personally it's huge to be able to have each of those interactions be on an arbitrary schedule. I don't need to sit down and spend 10 minutes reorienting myself in the code to be able to make progress like I do when writing code by hand

view this post on Zulip nandi (Dec 24 2025 at 18:55):

this thread made me try Opus and damn, it's a nice model lol

view this post on Zulip Luke Boswell (Dec 24 2025 at 19:37):

I also find it really valuable, I can contribute while working around other meetings and different things. I tend to spend a fair bit of time in "plan mode" until I'm convinced Claude has identified a good root cause and understands the issue, or the scope of the next step isn't too large.

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 00:25):

AI tools take tons of context to work well currently, but they can do a lot.

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 00:27):

One of my coworkers, an extreme AI power user, describes current AI as "a neurodivergent intern who is exceptionally passionate and types really really fast". I think that is a relatively fair way to describe it today. If you can give it enough details, guidance, and measurable tasks, it can do great things. But it can also generate gigantic messes, and unleashing that enthusiasm without bounds is a catastrophic mistake.

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 00:29):

I find that for much of my work, it requires very detailed prompting: lots of asking targeted questions to get the AI to think about the problem in the right way, and a lot of planning mode with thinking on and the "ultrathink" keyword. (This is for Claude Opus 4.5)

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 00:29):

It works rather well, but still constantly falls short especially if I am not careful enough

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 00:29):

Note, this is mostly from my general work, not so much Roc work

view this post on Zulip Ghislain (Dec 25 2025 at 01:17):

Thank you for your answers, I find this workflow really impressive.

One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.

Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 01:20):

I think the work that my coworker does (infrastructure and core abstractions for high-performance, maintainable GPU kernels) qualifies as requiring "deep optimization, careful design trade-offs, and long-term refinement".

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 01:22):

And I think it can... but you have to be way more careful with it currently.

A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then, when it gets to implementation, it has that design discussion as reference.

view this post on Zulip Brendan Hansknecht (Dec 25 2025 at 01:23):

That said, I currently find it better to explore design separately from code. Use the AI to educate myself to inform better design rather than letting it create the design.

view this post on Zulip nandi (Dec 25 2025 at 01:24):

Brendan Hansknecht said:

And I think it can... but you have to be way more careful with it currently.

A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then, when it gets to implementation, it has that design discussion as reference.

That part

view this post on Zulip Richard Feldman (Dec 25 2025 at 02:00):

Brendan Hansknecht said:

I currently find it better to explore design separately from code. Use the AI to educate myself to inform better design rather than letting it create the design.

yeah, same here!

view this post on Zulip nandi (Dec 26 2025 at 18:54):

image.png

view this post on Zulip Anton (Dec 26 2025 at 18:55):

Yes, relevant context :smile:

view this post on Zulip Richard Feldman (Dec 26 2025 at 22:01):

Ghislain said:

One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.

Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?

so far I think the only way LLMs have contributed to Roc's design is in researching what other languages do - e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research], how do various different programming languages do _____?" than it used to be to try to find exactly the right part of the documentation, especially if I don't know what I'm searching for. E.g. one of my searches was "what names do different languages use for flat_map?" - if I instead searched "what does [specific language] call flat_map?" and the language uses a different name, I had to hope someone had asked a StackOverflow question like that, or else I'd be running different searches guessing different names :smile:
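
(Editor's aside: the naming spread described here is real. A minimal Python sketch of the operation itself, with a few of the names other languages use for it noted in comments:)

```python
from itertools import chain

def flat_map(f, xs):
    """Apply f to each element of xs, then flatten one level of nesting."""
    return list(chain.from_iterable(map(f, xs)))

# The same operation goes by different names across languages:
#   Rust:    Iterator::flat_map
#   Haskell: concatMap
#   JS:      Array.prototype.flatMap
#   C#:      Enumerable.SelectMany

print(flat_map(lambda x: [x, x * 10], [1, 2, 3]))
# → [1, 10, 2, 20, 3, 30]
```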

view this post on Zulip Richard Feldman (Dec 26 2025 at 22:02):

but at least in Roc's case I haven't done any like "hey [model] help me solve this design problem" stuff - it's really mostly been for research, debugging, or writing/editing code

view this post on Zulip Ghislain (Dec 26 2025 at 22:31):

On a more philosophical note — and as a relatively younger developer — this is something I find myself thinking about more and more as AI plays such a large role in implementation and even design.

view this post on Zulip Ghislain (Dec 26 2025 at 22:32):

I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.

view this post on Zulip Ghislain (Dec 26 2025 at 22:32):

How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?

view this post on Zulip nandi (Dec 26 2025 at 22:35):

I ask myself this question a lot. like I can potentially see a future where we just write everything in assembly via agent

view this post on Zulip Ghislain (Dec 26 2025 at 22:40):

I guess writing in a higher-level language like C/Zig would already be sufficient, given the level of optimizations modern compilers are able to perform (thank you Advent of Compiler :smile:).

view this post on Zulip nandi (Dec 26 2025 at 22:41):

Richard Feldman said:

e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research] how do various different programming languages do _____?"

I'm curious what you use Gemini for, since from context it looks like you use Claude for code. Is Gemini better for you for research?

view this post on Zulip Richard Feldman (Dec 26 2025 at 22:54):

at least right at this moment it's force of habit. There was some point where Gemini 2.5 Pro was giving me better answers than Claude 4 was and I haven't done a revised comparison since then :smile:

view this post on Zulip Richard Feldman (Dec 26 2025 at 22:59):

Ghislain said:

How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?

It honestly never occurred to me to think about it in these terms.

Like I use Rust at work and Zig on Roc, I use the same models on both, and the languages feel very noticeably different. I assume the same would be true of like Go or TypeScript too, so I don't see why Roc would be any different in that regard! :smile:

view this post on Zulip Richard Feldman (Dec 26 2025 at 23:01):

like I'm still spending a lot of time reading and revising code, and build times are a major contributor to how long it takes LLMs to get to a point where code is reviewable.

view this post on Zulip Brendan Hansknecht (Dec 27 2025 at 06:54):

Ghislain said:

I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.

I think this is still where the most value is. AI may be fast, but having the depth of understanding to enable it to build robust and extensible solutions is an important skill. Also, in many areas of significant depth, AI falls quite flat. So as someone who wants to dive deep and really grok things, you are in the best place to take advantage of AI.

view this post on Zulip Brendan Hansknecht (Dec 27 2025 at 06:56):

I think the people who are getting hurt the most by AI right now are the people who are outsourcing too much to AI. As a result, they are no longer learning and growing. Instead they become stagnant and dependent. People who use AI to learn and deal with only a subset of tasks are able to focus on the most important work, grow, and generally accelerate their work.

view this post on Zulip Luke Boswell (Dec 27 2025 at 10:13):

People ... are able to focus on the most important work, grow, and generally accelerate their work.

I agree with this sentiment.

view this post on Zulip Anton (Dec 29 2025 at 10:28):

Ghislain said:

How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?

I think there is still demand for a programming language that:

It's actually an exciting opportunity to be able to make a programming language for this new age because we can design it with the capabilities of modern LLMs in mind. Making big changes like that seems a lot harder for established programming languages.

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:12):

yeah another way to think about it: if language doesn't matter anymore, what's the one language all AI-programming enthusiasts have converged on using for all greenfield programming tasks?

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:13):

(answer: whichever one they feel happiest using or which their job demands, just like always)

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:14):

so I don't think the tradeoffs about languages have changed at a fundamental level, it's more that some tradeoffs matter more than others

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:15):

for example, it's wild to me that about a year ago we were discussing how static dispatch enabling . autocomplete in IDEs was a big selling point, and today I do manual editing so infrequently I basically value that selling point at zero :sweat_smile:

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:18):

and it feels like the incentives around automated tests have changed a lot when doing a ton with agents; the tests are way cheaper to write, and have added value because they not only catch regressions, they give agents a feedback mechanism to self-correct without programmer review.

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:19):

so in that world, like Anton noted, I think "tests can be auto-skipped when they're all pure functions and the code didn't change" is a way bigger deal, because bigger test suites (currently) unavoidably slow the agents down, and yet are important for feedback-loop reasons.
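
(Editor's aside: to make the auto-skipping idea concrete, here is a minimal Python sketch. This is an illustration, not Roc's actual mechanism; a real implementation would hash everything a test transitively depends on, not just the test's own source. The idea: for a pure test, unchanged source means an unchanged result, so the cached outcome can be reused.)

```python
import hashlib
import inspect

# Cache mapping a hash of a pure test's source to its last result.
_result_cache = {}

def run_pure_test(test_fn):
    # Hash the test's own source; a real system would hash its whole
    # dependency closure so changes to called code also invalidate it.
    key = hashlib.sha256(inspect.getsource(test_fn).encode()).hexdigest()
    if key in _result_cache:
        return _result_cache[key], "skipped (cached)"
    result = test_fn()
    _result_cache[key] = result
    return result, "ran"

def test_add():
    return (1 + 2) == 3

print(run_pure_test(test_add))  # runs the test the first time
print(run_pure_test(test_add))  # served from the cache the second time
```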

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:21):

another example: the "what color is your function?" tradeoff of effectfulness needing to propagate all the way through a call chain - used to be a big downside because you'd need to do that propagation yourself. Now it's trivial; I'll just notice in the review that the agent did it.

view this post on Zulip Richard Feldman (Dec 31 2025 at 16:22):

and the upside of names ending in ! making it more obvious where effects are happening is more valuable than before, because I'm spending proportionately more of my time reading diffs compared to before

view this post on Zulip Anton (Jan 01 2026 at 10:37):

An interesting benchmark for programming languages would be: how fast can Opus 4.5 build a specific set of diverse applications with language X, what's the performance of each application, and how many tests does the app pass from a very thorough hidden test suite?

view this post on Zulip Anton (Jan 01 2026 at 10:43):

Other interesting metrics: total number of errors and warnings encountered, number of times the application was run during development, time spent debugging

view this post on Zulip Brendan Hansknecht (Jan 01 2026 at 16:05):

Would be interesting if you added extra constraints to see how it copes. For example, it might start with naive Python. You should also be able to ask it to optimize the code to reach a certain performance... it might then use NumPy in Python, for example

view this post on Zulip nandi (Jan 01 2026 at 18:11):

Brendan Hansknecht said:

Might then use NumPy in Python for example

yesterday I learned NumPy is implemented in Fortran

view this post on Zulip Anton (Jan 01 2026 at 18:14):

Hmm, only a tiny bit seems to be implemented in Fortran based on the "Languages" breakdown on https://github.com/numpy/numpy

view this post on Zulip nandi (Jan 01 2026 at 18:16):

I think I learned it from an LLM so yeah that checks out lol

view this post on Zulip Brendan Hansknecht (Jan 01 2026 at 18:42):

Fortran can optimize better than C for this kind of code because it assumes pointers won't alias by default.

Not sure how relevant that is in the modern day, but it was part of the reason for Fortran's early growth: it was better for high-performance scientific code and would run faster.

view this post on Zulip Brendan Hansknecht (Jan 01 2026 at 18:42):

But that would be part of my guess for the origin here

view this post on Zulip Brian Carroll (Jan 03 2026 at 20:34):

If you click on Fortran in that repo to see the files and scroll through them, they are all just documentation and test data for a thing called "f2py", which seems to be a tool for integrating user-defined Fortran code into Python apps. But it doesn't seem like numpy itself actually implements any algorithms in Fortran.

view this post on Zulip Ghislain (Jan 04 2026 at 00:26):

What do you think about Simon’s take?

If you’re introducing a new protocol or even a new programming language to the world in 2026 I strongly recommend including a language-agnostic conformance suite as part of your project.

view this post on Zulip Richard Feldman (Jan 04 2026 at 00:58):

I don't understand what he's saying there :sweat_smile:

view this post on Zulip Richard Feldman (Jan 04 2026 at 00:58):

maybe he blogged about it in more detail somewhere else?

view this post on Zulip Luke Boswell (Jan 04 2026 at 01:11):

Isn't that basically what we are developing in our suite of snapshot tests? We are building up a collection of examples which demonstrate the syntax and semantics.

view this post on Zulip Richard Feldman (Jan 04 2026 at 01:58):

that's certainly one thing he might mean :smile:

view this post on Zulip Brendan Hansknecht (Jan 04 2026 at 03:28):

I think it is the snapshot tests, but also taken a level further. You need to make sure not only that you generate correct parsing/errors and whatnot, but also that you execute correctly. That is fundamental for programming language correctness. (I don't think snapshot tests run the interpreter, but maybe they do)

I think the theory being proposed here is that eventually I should be able to write a language implementation in Python that covers all the errors and the full interpreter setup. Build a conformance test suite that tests everything. Then ask AI to build me a fully working interpreter in Zig by implementing the tests one by one, with a little extra guidance. After that, I should theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter.
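
(Editor's aside: the conformance-suite idea can be sketched in a few lines of Python. This is an illustration, not Roc's actual setup: the suite is plain data, so any implementation in any language can be run against it.)

```python
# A language-agnostic conformance suite: source programs paired with
# expected observable output. A real suite would also cover error
# messages, effects, and edge cases, not just happy-path arithmetic.
CONFORMANCE_SUITE = [
    ("1 + 2", "3"),
    ("2 * 3 + 4", "10"),
    ("(2 + 3) * 4", "20"),
]

def reference_eval(source: str) -> str:
    # Stand-in "reference interpreter": Python's own eval on arithmetic.
    return str(eval(source))

def run_suite(evaluate):
    """Return the cases where the candidate implementation disagrees."""
    return [(src, expected, evaluate(src))
            for src, expected in CONFORMANCE_SUITE
            if evaluate(src) != expected]

print(run_suite(reference_eval))  # → [] means the implementation conforms
```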

view this post on Zulip Luke Boswell (Jan 04 2026 at 11:46):

I don't think snapshot tests run the interpreter

Yeah they do in the "REPL" snapshots :smiley:

view this post on Zulip Luke Boswell (Jan 04 2026 at 12:02):

theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter when used.

I think this is the approach (or something similar) we're currently using building out the LLVM backend for Roc.

I am not sure how complete the LLVM backend is already, but from skim-reading some of the PRs it looks like Richard has been using our tests with both the interpreter and the LLVM backend and confirming they evaluate to the same thing -- this being one of the key methods of feedback for Claude and friends to work with as a source of truth.
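
(Editor's aside: this interpreter-vs-backend cross-checking is classic differential testing. A minimal Python sketch of its shape; the two evaluators here are stand-ins, not Roc's interpreter or LLVM backend. The interpreter acts as the oracle, so no hand-written expected outputs are needed.)

```python
def interpreter_eval(src: str) -> str:
    return str(eval(src))  # stand-in for a tree-walking interpreter

def backend_eval(src: str) -> str:
    # Stand-in for a "compiled" path: compile to a code object, then run.
    return str(eval(compile(src, "<src>", "eval")))

def disagreements(programs):
    """Return the programs on which the two implementations differ."""
    return [p for p in programs
            if interpreter_eval(p) != backend_eval(p)]

print(disagreements(["1 + 1", "6 * 7", "10 - 3"]))  # → [] when they agree
```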

view this post on Zulip Luke Boswell (Jan 04 2026 at 12:04):

See https://github.com/roc-lang/roc/pull/8810

view this post on Zulip Brendan Hansknecht (Jan 04 2026 at 21:04):

Ah yeah, then I think we have the core of what they talk about here.


Last updated: Jan 12 2026 at 12:19 UTC