Hi everyone,
I’ve been following the development of the language over the past few years and recently noticed in some comments that a few of you are using AI tools (like Claude) during the development process. I’d love to hear your thoughts and experiences on this.
What do you see as the main advantages, and what limitations or risks have you encountered (design choices, consistency, long-term maintainability, etc.)?
I’m genuinely curious to learn from your perspectives. Thanks in advance for sharing — and Merry Christmas :holiday_tree:!
(Feel free to move this discussion out of “compiler development” if this isn’t the right place.)
since Sonnet 4.5 came out, and even more so with Opus 4.5, my workflow has pretty much changed to "write English, review Zig code"
I'm a lot faster that way than before (despite 30 years of writing code by hand!), plus it's much more conducive to making progress in tiny chunks of time
like right now I'm on the way to the gym and I have 5ish agents running on my laptop on different branches, and I can review what they did when I get back
I really struggled with lack of continuous hours prior to this workflow existing
since having a kid, and since Roc is not my day job, I just would not get to "work on Roc for several hours in a row" except for maybe a handful of times per year
that's not really a problem anymore because I don't need to load as many code-level details into my brain to make progress now
so I can't speak for others, but for me it's become totally transformative in a really positive way! :smiley:
Thanks for sharing!
I’m curious: do you find that the AI is able to maintain a sufficiently high-level, global understanding of the project on its own, or do you usually have to guide it quite explicitly (e.g. which parts of the codebase to touch, which files to modify, architectural constraints, etc.)?
In other words, how much of the “big picture” can you delegate to the AI today, and where do you still need to step in?
none of the current tools are good at big picture in my experience
I can't just be like "hey Claude, draw the owl" - that will end in disaster
however for small bugfixes I actually can be like "hey Claude, here's a boilerplate script you can follow to reproduce the bug in a test, then track down the root cause, apply a fix, and push a draft PR for me to review"
sometimes that's enough bc there's no "big picture understanding" necessary - just surrounding context clues like the bug report and Claude being able to poke around the codebase is enough to figure out how to repro it and then find a fix (again, for small bugs!)
so usually the diffs for these PRs are very small and easy to review. I'd guess more than 50% don't need changes, but a common failure mode is "Claude made the test pass using a bandaid fix that doesn't actually address the root cause"
and when I spot that in review it's often enough for me to go back and be like "hey this looks like a bandaid, try digging deeper to find the actual root cause"
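To make the "reproduce the bug in a test first" step concrete, here's a made-up Zig sketch (the function and the bug are invented for illustration, not from the Roc codebase): the agent starts with a failing test that pins down the reported behavior, and that test becomes the feedback loop that distinguishes a real fix from a bandaid.

```zig
const std = @import("std");

// Invented example: trims leading spaces only. The pretend bug
// (now fixed) was that it also stripped leading tabs.
fn trimLeadingSpaces(s: []const u8) []const u8 {
    var i: usize = 0;
    while (i < s.len and s[i] == ' ') : (i += 1) {}
    return s[i..];
}

// The repro test comes first: it fails against the buggy version,
// giving the agent a concrete signal to debug against, and passes
// once the root cause is fixed (rather than special-casing the input).
test "repro: leading tabs must be preserved" {
    try std.testing.expectEqualStrings("\thello", trimLeadingSpaces("  \thello"));
}
```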
for bigger projects they need to be chopped up into smaller pieces, can't just be like "draw the owl of this whole feature" and there is usually way more iterating back and forth on my reviewing it, finding something unacceptable, describing the necessary revisions, etc
but again for me personally it's huge to be able to have each of those interactions be on an arbitrary schedule. I don't need to sit down and spend 10 minutes reorienting myself in the code to be able to make progress like I do when writing code by hand
this thread made me try Opus and damn it's a nice model lol
I also find it really valuable, I can contribute while working around other meetings and different things. I tend to spend a fair bit of time in "plan mode" until I'm convinced Claude has identified a good root cause and understands the issue, or the scope of the next step isn't too large.
AI tools take tons of context to work well currently, but they can do a lot.
One of my coworkers who is an extreme AI power user describes current AI as "a neurodivergent intern who is exceptionally passionate and types really really fast". I think that is a relatively fair way to describe it today. If you can give enough details, guidance, and measurable tasks, it can do great things. But it also can generate gigantic messes, and unleashing that enthusiasm without bounds is a catastrophic mistake.
I find, for much of my work, it requires very detailed prompting. Lots of asking targeted questions to get the AI to think about the problem in the right way, and a lot of planning mode with thinking on and the "ultrathink" keyword. (This is for Claude Opus 4.5)
It works rather well, but still constantly falls short, especially if I am not careful enough
Note, this is mostly from my general work, not Roc work specifically
Thank you for your answers, I find this workflow really impressive.
One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.
Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?
I think the work that my coworker does (infrastructure and core abstractions for high performance and maintainable GPU kernels) qualifies as requiring "deep optimization, careful design trade-offs, and long-term refinement".
And I think it can...but you have to be way more careful of it currently.
A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then when it gets to implementation, it has that design discussion as reference.
That said, I currently find it better to explore design separate from code. Use the AI to educate myself to inform better design rather than let it create the design.
Brendan Hansknecht said:
And I think it can...but you have to be way more careful of it currently.
A huge part of it is not giving basic answers to the AI, but instead asking it questions that cause it to think through the design. Then when it gets to implementation, it has that design discussion as reference.
That part
Brendan Hansknecht said:
I currently find it better to explore design separate from code. Use the AI to educate myself to inform better design rather than let it create the design.
yeah, same here!
Yes, relevant context :smile:
Ghislain said:
One concern I still have, though, is how well this approach holds up for building a programming language, where a lot of value comes from deep optimization, careful design trade-offs, and long-term refinement.
Do you feel the AI can meaningfully contribute at that level, or does it mostly help with implementation once those decisions are already very well defined?
so far I think the only way LLMs have contributed to Roc's design is in researching what other languages do - e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research], how do various different programming languages do _____?" than it used to be to try to find exactly the right part of the documentation, especially if I don't know what I'm searching for. e.g. one of my searches was "what names do different languages use for flat_map?" - before, if I searched "what does [specific language] call flat_map?" and the language used a different name, I had to hope someone had asked a StackOverflow question like that, or else I'd be running different searches guessing different names :smile:
but at least in Roc's case I haven't done any like "hey [model] help me solve this design problem" stuff - it's really mostly been for research, debugging, or writing/editing code
On a more philosophical note — and as a relatively younger developer — this is something I find myself thinking about more and more as AI plays such a large role in implementation and even design.
I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
I ask myself this question a lot. like I can potentially see a future where we just write everything in assembly via agent
I guess writing in a higher-level language like C/Zig would already be sufficient, given the level of optimizations modern compilers are able to perform (thank you Advent of Compiler :smile:).
Richard Feldman said:
e.g. it's way faster now to say "hey Gemini [which is the model I usually use for research] how do various different programming languages do _____?"
I'm curious what you use Gemini for, since from context, it looks like you use Claude for code. Is Gemini better for you regarding research?
at least right at this moment it's force of habit. There was some point where Gemini 2.5 Pro was giving me better answers than Claude 4 was and I haven't done a revised comparison since then :smile:
Ghislain said:
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
It honestly never occurred to me to think about it in these terms.
Like I use Rust at work and Zig on Roc, I use the same models on both, and the languages feel very noticeably different. I assume the same would be true of like Go or TypeScript too, so I don't see why Roc would be any different in that regard! :smile:
like I'm still spending a lot of time reading and revising code, and build times are a major contributor to how long it takes LLMs to get to a point where code is reviewable.
Ghislain said:
I’ll admit there’s also a bit of personal tension for me: seeing AI not only write code much faster, but now also reason and design at such speed can feel slightly discouraging, especially when my own motivation has always been to deeply understand things in order to do them as well as possible.
I think this is still where the most value is. AI may be fast, but understanding things deeply enough to enable it to build robust and extensible solutions is an important skill. Also, for many areas of significant depth, AI falls quite flat. So being someone who wants to dive deep and really grok things, you are in the best place to take advantage of AI.
I think the people who are getting hurt the most by AI right now are the people who are outsourcing too much to AI. As a result, they are no longer learning and growing. Instead they become stagnant and dependent. People who use AI to learn, and who hand it only a subset of tasks, are able to focus on the most important work, grow, and generally accelerate their work.
People ... are able to focus on the most important work, grow, and generally accelerate their work.
I agree with this sentiment.
Ghislain said:
How do you still find personal meaning or interest in the project itself — especially a programming language, in a context where we may be expected to use them less and less — and where does that meaning come from today?
I think there is still demand for a programming language that:
It's actually an exciting opportunity to be able to make a programming language for this new age because we can design it with the capabilities of modern LLMs in mind. Making big changes like that seems a lot harder for established programming languages.
yeah another way to think about it: if language doesn't matter anymore, what's the one language all AI-programming enthusiasts have converged on using for all greenfield programming tasks?
(answer: whichever one they feel happiest using or which their job demands, just like always)
so I don't think the tradeoffs about languages have changed at a fundamental level, it's more that some tradeoffs matter more than others
for example, it's wild to me that about a year ago we were discussing how static dispatch enabling . autocomplete in IDEs was a big selling point, and today I do manual editing so infrequently I basically value that selling point at zero :sweat_smile:
and it feels like the incentives around automated tests have changed a lot when doing a ton with agents; the tests are way cheaper to write, and have added value because they not only catch regressions, they give agents a feedback mechanism to self-correct without programmer review.
so in that world, like Anton noted, I think the "tests can be auto-skipped when they're all pure functions and the code didn't change" idea is a way bigger deal, because bigger test suites (currently) unavoidably slow the agents down, and yet are important for feedback loop reasons.
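As a rough sketch of that auto-skip idea (a minimal illustration, not how Roc's test runner actually works; the file path, cache value, and the 0.14-era `readFileAlloc` std API are assumptions):

```zig
const std = @import("std");

// If a test suite only exercises pure functions, its results depend only
// on the source it covers, so a runner can hash that source and skip the
// suite whenever the hash matches the last recorded run.
fn pureTestsNeedRerun(allocator: std.mem.Allocator, src_path: []const u8, cached_hash: u64) !bool {
    const bytes = try std.fs.cwd().readFileAlloc(allocator, src_path, 16 * 1024 * 1024);
    defer allocator.free(bytes);
    return std.hash.Wyhash.hash(0, bytes) != cached_hash;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Hypothetical path and cached hash; a real runner would persist
    // the hash to disk after each run.
    if (try pureTestsNeedRerun(allocator, "src/pure_module.zig", 0)) {
        std.debug.print("source changed; running pure tests\n", .{});
    } else {
        std.debug.print("source unchanged; skipping pure tests\n", .{});
    }
}
```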
another example: the "what color is your function?" tradeoff of effectfulness needing to propagate all the way through a call chain used to be a big downside, because you'd need to do that propagation yourself. Now it's trivial; I'll just notice in the review that the agent did it.
and the upside of names ending in ! making it more obvious where effects are happening is more valuable now, because I'm spending proportionately more of my time reading diffs than before
An interesting benchmark for programming languages would be: how fast can Opus 4.5 build a specific set of diverse applications in language X, what's the perf of each application, and how many tests of a very thorough hidden test suite does the app pass?
Other interesting metrics: total number of errors and warnings encountered, number of times the application was run during development, time spent debugging
Would be interesting if you added extra constraints to see how it copes. For example, it might start with naive Python. You should also be able to ask it to optimize the code to reach a certain performance... Might then use NumPy in Python for example
Brendan Hansknecht said:
Might then use NumPy in Python for example
yesterday I learned NumPy is implemented in Fortran
Hmm, only a tiny bit seems to be implemented in Fortran based on the "Languages" breakdown on https://github.com/numpy/numpy
I think I learned it from an LLM so yeah that checks out lol
Fortran can optimize better than C for this kind of code due to assuming that pointers won't alias by default.
Not sure how relevant it is in the modern day, but it was part of the reason for Fortran's early growth: it was better for high-performance scientific code and would run faster.
But that would be part of my guess for the origin of the claim here
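In Zig terms (an illustrative sketch, not NumPy code): `noalias` opts into the no-overlap guarantee that Fortran makes by default for its arguments, and that C only gets via `restrict`.

```zig
const std = @import("std");

// `noalias` promises the compiler that `dst` and `src` never overlap,
// so it can vectorize the loop without re-reading `src` after each
// store to `dst`. Fortran assumes this by default, which is the classic
// reason it could out-optimize naive C on numeric kernels.
fn axpy(noalias dst: [*]f32, noalias src: [*]const f32, n: usize, a: f32) void {
    var i: usize = 0;
    while (i < n) : (i += 1) {
        dst[i] += a * src[i];
    }
}

test "axpy with non-overlapping buffers" {
    var x = [_]f32{ 1, 2, 3 };
    const y = [_]f32{ 10, 20, 30 };
    axpy(&x, &y, x.len, 2.0);
    try std.testing.expectEqual(@as(f32, 21), x[0]); // 1 + 2 * 10
}
```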
If you click on Fortran in that repo to see the files and scroll through them, they are all just documentation and test data for a thing called "f2py", which seems to be a tool for integrating user-defined Fortran code into Python apps. But it doesn't seem like NumPy itself actually implements any algorithms in Fortran.
What do you think about Simon's take?
If you’re introducing a new protocol or even a new programming language to the world in 2026 I strongly recommend including a language-agnostic conformance suite as part of your project.
I don't understand what he's saying there :sweat_smile:
maybe he blogged about it in more detail somewhere else?
Isn't that basically what we are developing in our suite of snapshot tests? We are building up a collection of examples which demonstrate the syntax and semantics
that's certainly one thing he might mean :smile:
I think it is the snapshot tests but also taken a level further. You need to make sure not only that you generate correct parsing/errors and whatnot, but also that you execute correctly. That is fundamental for programming language correctness. (I don't think snapshot tests run the interpreter, but maybe they do)
I think the theory being proposed here is that eventually I should be able to write a language implementation in Python that covers all the errors and the full interpreter setup. Build a conformance test suite that tests everything. Then ask AI to build me a fully working interpreter in Zig by implementing the tests one by one with a little extra guidance. After that, I should theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter when used.
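A minimal sketch of that differential loop (the case file, the Python reference script, and the binary name are all hypothetical; assumes the `std.process.Child.run` API from recent Zig versions):

```zig
const std = @import("std");

// Run one command and capture its stdout.
fn runCapture(allocator: std.mem.Allocator, argv: []const []const u8) ![]u8 {
    const result = try std.process.Child.run(.{ .allocator = allocator, .argv = argv });
    allocator.free(result.stderr);
    return result.stdout;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // One conformance case, run through the Python reference
    // implementation and the Zig implementation under test.
    const case = "cases/hello.roc";
    const expected = try runCapture(allocator, &.{ "python3", "reference.py", case });
    defer allocator.free(expected);
    const actual = try runCapture(allocator, &.{ "./zig-interpreter", case });
    defer allocator.free(actual);

    if (!std.mem.eql(u8, expected, actual)) {
        std.debug.print("MISMATCH on {s}\n", .{case});
        return error.ConformanceMismatch;
    }
    std.debug.print("ok: {s}\n", .{case});
}
```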
I don't think snapshot tests run the interpreter
Yeah they do in the "REPL" snapshots :smiley:
theoretically even be able to ask for an LLVM backend and just need to ask the AI to have it generate the exact same output as the interpreter when used.
I think this is the approach (or something similar) we're currently using to build out the LLVM backend for Roc.
I am not sure how complete the LLVM backend is already, but from skim-reading some of the PRs it looks like Richard has been using our tests with both the interpreter and the LLVM backend and confirming they evaluate to the same thing, with this being one of the key methods of feedback for Claude and friends to work with as a source of truth.
See https://github.com/roc-lang/roc/pull/8810
Ah yeah, then I think we have the core of what they talk about here.