Stream: compiler development

Topic: zig compiler - integrate with tooling


view this post on Zulip Eli Dowling (Feb 04 2025 at 08:17):

I think this rewrite is a great time to also think a little about how the roc compiler can integrate well with external tooling and the language server:
Things like:

I just hadn't seen any discussion of it and wanted to try to make sure it's not forgotten :)

view this post on Zulip Sam Mohr (Feb 04 2025 at 08:31):

I think Richard isn't a fan of tying an external protocol to our compiler:

Richard Feldman said:

zooming out for a sec, I'm trying to avoid coupling the public roc API to other protocols, so that we don't get in a situation where people are saying "hey please update the language and do a release ASAP because we're blocked and there's no way for us to unblock ourselves short of a language release."

examples of this include:

with that in mind, I wonder if there's a way we could keep the relevant logic in the roc binary but expose it in a way that lsp (and others) can access functionality with it in a way that can be upgraded independently from roc itself - like for example what we do with roc glue and giving it a .roc script that describes what to do

view this post on Zulip Sam Mohr (Feb 04 2025 at 08:32):

But since we're aiming for a single binary that does everything, it'd be really nice to have the same experience as with gleam lsp

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:37):

Well that's why I was hoping that the roc binary itself could expose very low level primitives, symbols, types, ast etc.
That way improvements to the language server can occur separately to the language.
It seems like that decouples roc from lsp much more, right?

view this post on Zulip Sam Mohr (Feb 04 2025 at 08:38):

Yeah, I'd agree

view this post on Zulip Sam Mohr (Feb 04 2025 at 08:38):

Sorry, I think I read your message too quickly

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:39):

I think the plan is for each IR to be nicely serialisable for caching and snapshotting, and also for external tools to integrate with.

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:39):

I'm not exactly sure how that will look or work. But it's definitely something I'm keen to explore and understand more to help ensure we don't miss it.

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:41):

Also I think a language server written in roc would be a really great, "here is roc doing real things that are non trivial" example.
language servers do:

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:42):

oh, interesting... writing the language server in roc. How would that work? as a plugin?

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:44):

I'm really interested in this idea, it sounds great.

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:45):

Would that need to be a custom platform?

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:48):

We can look at how other language servers work.
I have a few ideas, but one I quite like is having the compiler also be a roc platform.
That way you guarantee the language server will always work even if you have a different version of the compiler, but it can still release independently of the compiler.

The compiler could also expose apis over jsonrpc or the like and the language server could communicate that way.

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:50):

I think choosing either one should allow us to switch to the other with very little hassle.
Eventually we should definitely expose both, having your compiler be able to be run as a service lets people build interesting tools with it, but also I think the platform approach would be better for the language server and making it super simple for people to build other tools in roc.

view this post on Zulip Luke Boswell (Feb 04 2025 at 08:51):

When you say the roc compiler, are you thinking the roc cli binary/executable?

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:53):

Yup. You could start it with a special flag and it goes into "rpc mode" where you can send messages back and forth and get stuff out.
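
A minimal sketch of what such an "rpc mode" exchange might look like, assuming JSON-RPC 2.0 with one message per line. The `symbols` method and its result shape are hypothetical and not an existing roc API; Python is used purely for illustration.

```python
import json

# Hypothetical handler: a real compiler would answer from its in-memory IR.
def handle_symbols(params):
    # Pretend every module exposes a single symbol named "main".
    return [{"module": params["module"], "name": "main"}]

# Method names are assumptions made for this sketch.
HANDLERS = {"symbols": handle_symbols}

def handle_message(line: str) -> str:
    """Turn one JSON-RPC 2.0 request line into one response line."""
    request = json.loads(line)
    handler = HANDLERS.get(request.get("method"))
    if handler is None:
        # -32601 is the JSON-RPC 2.0 error code for "Method not found".
        response = {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    else:
        response = {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": handler(request.get("params", {})),
        }
    return json.dumps(response)
```

A language server (or any other tool) would write request lines to the compiler's stdin and read response lines back, so it could be upgraded without touching the compiler as long as the method set stays stable.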

view this post on Zulip Eli Dowling (Feb 04 2025 at 08:54):

In the short term, I think a platform for roc would be the best choice though. Again, it's a good way to show off our cool features, with a real-world use case

view this post on Zulip Richard Feldman (Feb 04 2025 at 14:39):

yeah it could work like glue

view this post on Zulip Richard Feldman (Feb 04 2025 at 14:40):

where instead of gleam lsp you run something like roc tooling lsp.roc

view this post on Zulip Richard Feldman (Feb 04 2025 at 14:42):

and then lsp.roc works like rust-glue.roc in that its platform is provided by the compiler, and lsp.roc plays the role of translating direct function calls to/from the compiler into the language server protocol

view this post on Zulip Eli Dowling (Feb 04 2025 at 14:47):

I think that's a good solution for now.

In the longer term it may be nice to be able to release changes to the language server without updating roc as a whole (not sure how often you plan to do releases).
If so I was thinking we could have a platform that produces its own standalone binary and uses the compiler as a library. That way the language server can be bumped separately from the compiler because it bundles its own copy of the compiler.

view this post on Zulip Eli Dowling (Feb 04 2025 at 14:50):

I had a chat to @Luke Boswell earlier today about this.
I think it would be great if we could make the "roc tooling" platform be more than just a language server.
If we try to keep everything pretty general purpose on the zig side, getting ast, different IR, types, symbols etc, then we could easily create a base for other analysis tools, like linters, or codegen etc. All written in roc

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:10):

yeah I do think that type of thing seems reasonable

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:13):

I like the "linting" philosophy where it's not so much about enforcing arbitrary stylistic preferences as it is about project-specific invariants

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:15):

e.g. "we have decided to move away from doing things in this way that we used to, and the tool's job is to fail the build if that way is used in any new code, with exceptions carved out for old code"
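
A minimal sketch of that kind of project-specific check, assuming the tooling can already report where a discouraged pattern is used. The `Usage` record and the grandfathered-file set are hypothetical names for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Usage:
    file: str  # file where the discouraged pattern was found
    line: int  # line number of the usage

def check_invariant(usages, grandfathered_files):
    """Return the usages that should fail the build.

    Usages in grandfathered (old) files are tolerated; anything else is
    new code using the pattern the project decided to move away from.
    """
    return [u for u in usages if u.file not in grandfathered_files]

# Example: one old file is exempt, one new file is not.
violations = check_invariant(
    [Usage("Old.roc", 10), Usage("New.roc", 3)],
    grandfathered_files={"Old.roc"},
)
```

A build step would then fail whenever `violations` is non-empty, and the grandfathered set shrinks as old code is migrated.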

view this post on Zulip Eli Dowling (Feb 04 2025 at 15:17):

I think one day it would be really cool if we could do things like create linting rules using the refcount IR.
That way we could create a lint that says "hey this platform expects to be able to reuse this buffer, and you're storing an extra reference to it here"

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:21):

I think it's reasonable to do that on the application side but not the platform side

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:21):

like "I always want to be reusing this for perf, and if I ever stop doing that here I want to know about it"

view this post on Zulip Richard Feldman (Feb 04 2025 at 15:22):

as opposed to "you must only give me one that can be reused or else your build will fail" - at which point we've added a janky version of Rust's ownership types to Roc :big_smile:

view this post on Zulip Eli Dowling (Feb 04 2025 at 15:30):

oh no, I more just meant:
If we build a good compiler platform that allows plugins for things like linting, and we expose the right stuff, we could enable a platform author to add little suggestions like that. Not as an error necessarily but as a hint.
You know: "if anyone calls this function, and the variable it's assigned to has multiple references, show a little suggestion that says 'sure you want to do that mate?'"

I would definitely write one for myself that does: "if I have a variable called buff and it has multiple references, warn me"
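
A rough sketch of that personal lint, assuming the compiler could hand a tool a list of variables with their reference counts. `VarInfo` and `ref_count` are hypothetical; no such data is exposed today.

```python
from typing import NamedTuple

class VarInfo(NamedTuple):
    name: str
    ref_count: int  # hypothetical: number of live references reported by the compiler

def buffer_hints(variables, watched_name="buff"):
    """Yield a hint (not an error) for each watched variable with more than one reference."""
    for var in variables:
        if var.name == watched_name and var.ref_count > 1:
            yield (
                f"hint: `{var.name}` has {var.ref_count} references; "
                "it may not be reused in place"
            )
```

Because the result is only a list of hints, the build never fails on it, which matches the "suggestion, not an error" intent.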

view this post on Zulip Brendan Hansknecht (Feb 04 2025 at 16:43):

Richard Feldman said:

and then lsp.roc works like rust-glue.roc in that its platform is provided by the compiler, and lsp.roc plays the role of translating direct function calls to/from the compiler into the language server protocol

This really doesn't make sense to me. Unlike glue, the language server is a bespoke single application.

view this post on Zulip Brendan Hansknecht (Feb 04 2025 at 16:46):

Luke Boswell said:

I think the plan is for each IR to be nicely serialisable for caching and snapshotting, and also for external tools to integrate with.

I think we should be really careful of this. The more coupling we expose, the less flexible and changeable roc becomes. This sounds like a trivial way to hit Hyrum's Law really hard.

Not saying we shouldn't do it, but if we do it, we probably should pick one very explicit cutting point that we think is unlikely to change.

view this post on Zulip Brendan Hansknecht (Feb 04 2025 at 16:46):

That said, I think both zig and Odin expose parts of their compiler in their standard library, so maybe it isn't too bad.

I guess a lot of this depends on good versioning guarantees.

view this post on Zulip Brendan Hansknecht (Feb 04 2025 at 16:48):

Also, this all may fall nicely into a libroc workflow where you just use the compiler as a library instead of as an executable

view this post on Zulip Eli Dowling (Feb 04 2025 at 16:48):

I am strongly with Brendan on this. I would be leery of exposing the full IR. I would like to transform it slightly to include only the info we see external services wanting: refcounts, symbol locations, type info, etc. Nothing weird and internal if we can avoid it.

Hopefully that will keep us from having the external API change much.

view this post on Zulip Luke Boswell (Feb 04 2025 at 19:20):

As a starting point I was thinking of making libroc essentially expose the roc check functionality, and it returns the ResolvedIR (prev Can) or maybe as far as the last IR before code gen ... and making a roc platform that provides that to a roc app along with Problems.

I figure this would be all that's required for making an LSP, or something like checkmate, or our playground.

view this post on Zulip jan kili (Feb 10 2025 at 21:33):

Is this topic a superset of (something we talked about several months ago) wanting to be able to convert raw Roc source code files to/from a serialization format like JSON or YAML etc, powered by something like a first-party JSON Schema? Should I start a new topic?

view this post on Zulip Luke Boswell (Feb 10 2025 at 22:05):

I think the general idea is to serialise the IR's to an S-expression format, which should then be easy to parse and work with -- I'm not sure about a schema, though I guess once the IR has firmed up that might help standardise it.

view this post on Zulip Anthony Bullard (Feb 10 2025 at 22:07):

yes, each IR would have its own sexpr representation

view this post on Zulip Anthony Bullard (Feb 10 2025 at 22:07):

Which would be very simple

view this post on Zulip Brendan Hansknecht (Feb 10 2025 at 22:28):

I think the general idea is to serialise the IR's to an S-expression format

Is that for this tooling as well? This tooling wouldn't be using a serialized text at all. It should be directly using some sort of roc tag union representation of the IR.

view this post on Zulip Eli Dowling (Feb 10 2025 at 22:31):

Well making it a format able to be sent outside of zig is pretty essential if we want to build tools for roc in roc.
So I'd call it tangentially relevant.

view this post on Zulip Brendan Hansknecht (Feb 10 2025 at 23:28):

For sure, I was thinking of text representation vs tag union representation, which may be two very different shapes

view this post on Zulip jan kili (Feb 11 2025 at 00:10):

Anthony Bullard said:

Which would be very simple

Sweet! So could I write a Roc library that helps you "read/write Roc code" by mapping+translating typed values (likely mostly Strs, but maybe lots of tags) from/to the raw contents of main.roc.ir_step_5.sexpr.idk files that the compiler writes beforehand / reads later? (Maybe in real time if a platform called the compiler in a certain way?)

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:13):

I'd like to explore the idea of roc glue -- potentially even becoming something more like roc gen and it could potentially access any or all of the IR's and then we could write plugins (roc scripts) that do things with roc source code really easily. The primary usecase is for things like tooling (e.g. checkmate).

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:14):

If we use a Str and parse the S-expressions on the roc side... it will be much easier than trying to maintain a binding to the roc types for all the IR's.

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:17):

So I imagine a roc package that parses the IR Str and gives us an AST using roc tag unions etc.
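
A minimal sketch of the parsing half of that idea: turning an S-expression string into a nested tree. It is written in Python for illustration (the proposal is for a roc package), and the IR shape in the example is made up; the real IRs and their sexpr layout are not defined in this thread. String literals containing spaces or parentheses are not handled.

```python
def tokenize(text):
    # Pad parentheses with spaces so str.split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one S-expression from the front of `tokens` (consumed in place)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens and tokens[0] != ")":
            node.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # drop the matching ")"
        return node
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    return token  # atoms stay as plain strings

def parse_ir(text):
    """Parse a whole string that must contain exactly one S-expression."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing input after expression")
    return tree

# Hypothetical IR text, invented for this example.
tree = parse_ir("(def main (call add (int 1) (int 2)))")
```

A typed layer (tag unions on the roc side) would then be built on top of the generic tree, which is where most of the keep-in-sync burden would sit.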

view this post on Zulip Sam Mohr (Feb 11 2025 at 00:18):

How would we make sure that it always stays in sync with the Zig equivalent? Seems easier to keep this in a Zig library

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:18):

Then I imagine having a few simple effects available like Stdout.line!, File.write! or Http.send! to do stuff with this.

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:19):

Sam Mohr said:

How would we make sure that it always stays in sync with the Zig equivalent? Seems easier to keep this in a Zig library

I imagine we could fuzz it somehow... we will be using this for glue generation anyway [in my hypothesis here] -- which we would want to be reliable.

view this post on Zulip jan kili (Feb 11 2025 at 00:20):

Are the alternatives (b) not having this functionality and (c) having third parties write plugins in Zig?

view this post on Zulip Sam Mohr (Feb 11 2025 at 00:22):

Sounds about right, with more leaning towards (c)

view this post on Zulip Sam Mohr (Feb 11 2025 at 00:22):

Or writing bindings to Zig, which seems janky

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:23):

Note that if we expose things via zig, e.g. (c), we wouldn't want to use the builtins... as they're really our internals and not a very nice abstraction for working with. So if we do expect people to work with zig, it's the same issue where we have a separate thing (zig library) that needs to be maintained and kept in sync.

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:24):

If we can standardise on a simple protocol (S-expressions) instead of a library (one blessed language) then it will be much easier for tooling in any language.

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:27):

But maybe I'm wrong here... we will need zig code to serialise and deserialise the IR's anyway, and a parser in Roc would be duplicating this effort.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:27):

and it could potentially access any or all of the IR's and then we could write plugins

Please no

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:27):

Why not?

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:27):

The more exposed our internals are, the more locked down they are

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:28):

I think we should expose a single cutting point with its own transformed IR and nothing else

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:28):

Huge hit by Hyrum's Law

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:28):

Ahk... well I guess maybe we just expose the IR after type checking, and then the IR after the full build (which includes refcounting etc)??

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:29):

well I guess maybe we just expose the IR after type checking

Yeah, something around here is the one point I think we should expose.

....

refcounting is technically an internal detail, why do we want to expose it?

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:29):

We need to expose something for generating glue anyway... my hypothesis is that we could also expose something that tooling like checkmate can use too

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:30):

Oh, it was the LSP that wanted to know about tail-call or other optimisations.

I was wondering if we could even write our LSP as a roc plugin?

Mostly spit-balling here... these aren't really thought through ideas. I just feel like we could use roc scripts to simplify a lot of things for our own tooling.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:32):

It's possible. If we do so, I think we just need to make sure to pick a limited set of cutting points with really clear APIs.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:33):

Potentially even separating them completely from the IR so the IR can change separately from the api (probably required anyway essentially to translate from zig to roc)

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:34):

JanCVanB said:

Are the alternatives (b) not having this functionality and (c) having third parties write plugins in Zig?

d. standard C API shared libraries

view this post on Zulip Richard Feldman (Feb 11 2025 at 00:37):

also, roc glue does not support effects by design, because it means you know it is always 100% harmless to run anyone's glue script

view this post on Zulip Richard Feldman (Feb 11 2025 at 00:37):

all it's going to do is to spit out files into the directory you requested, because that's all it knows how to do :big_smile:

view this post on Zulip Luke Boswell (Feb 11 2025 at 00:38):

My gut feeling here is that we should kick the tooling can down the road a little further -- and option (d) aka libroc is probably the best option, but there's a lot of design work around the interpreter etc that would be good to understand first.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:49):

that is actually option (e) technically. For (d) I meant that the compiler could run liblsp.so instead of making a roc interface for the lsp.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:49):

but yeah, I think this part of tooling, we should wait on.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:49):

First we should figure out the best abstraction for glue (in the new compiler) and learn from wiring that up happily

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:50):

Then we should revisit exposing things specifically thinking of the LSP use case

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:50):

Finally we should think if that can be expanded to more general use cases.

view this post on Zulip Brendan Hansknecht (Feb 11 2025 at 00:50):

That is at least roughly how I would push on it

view this post on Zulip Lucas Rosa (Feb 12 2025 at 04:16):

honestly, I've been more tempted to think that compilers should be built lsp-first and out. but maybe that's crazy. not saying we should, I don't think there's enough precedent to justify it. but I get this feeling that at this point a language server ends up inevitable and having a good one makes all the difference in terms of adoption. considering how most people actually spend more time interacting with the LSP than the cli, it kinda makes sense to focus on it first and with highest priority. the cli pass would largely end up being for CI most of the time.

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:27):

to be honest, I'm not sure how different we'll want it to be

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:32):

here's a sketch of how it could look:

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:33):

at that point we have all the parsing, canonical IR, and type info in memory and up to date...so we can expose an interface to ask questions about that

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:34):

maybe I'm missing something, but I don't really understand what specific advantages a fundamentally "query-based" compiler architecture would have over that :sweat_smile:

view this post on Zulip Joshua Warner (Feb 12 2025 at 04:35):

What you're describing is a manually-orchestrated query-based compiler architecture

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:38):

I guess? The relevant part to me is that it sounds like it has about 98% code reuse with the batch compiler :big_smile:

view this post on Zulip Joshua Warner (Feb 12 2025 at 04:38):

Also - in the context where you need to potentially do type inference globally, the more traditional query-based architecture doesn't really do much for you. That's most convenient where you have very clear cut-points in the graph (e.g. function boundaries)

view this post on Zulip Joshua Warner (Feb 12 2025 at 04:39):

(side note: I would like to explore automatically carving out such boundaries based on where type annotations occur in the source code)

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:39):

like when I watched Anders talking about C#'s Roslyn compiler architecture years ago, or when I looked at salsa in Rust, they seem wildly different architecturally

view this post on Zulip Joshua Warner (Feb 12 2025 at 04:41):

If you squint hard enough, it's the same

view this post on Zulip Richard Feldman (Feb 12 2025 at 04:41):

the thing is, my assumption is that rerunning type checking on an individual module will be so fast it won't matter, as long as we don't have to redo any work in other modules to do it - which is already the case unless you're doing something like a rename of an exposed thing

view this post on Zulip Joshua Warner (Feb 12 2025 at 04:42):

Agree that we should make the batch case super duper fast as a first priority, and only fall back to other strategies when that clearly _can't_ keep up.

view this post on Zulip Lucas Rosa (Feb 12 2025 at 19:28):

@Richard Feldman yea makes sense, it's probably not much different, especially if things are just fast enough

view this post on Zulip Lucas Rosa (Feb 12 2025 at 19:29):

just vague musings I've had lately on my part

view this post on Zulip Richard Feldman (Feb 12 2025 at 19:40):

yeah like if we can single-threaded roc check a file in under 1 ms per 2K lines of code, then as long as we have uninterrupted access to a CPU core, we can do that at 120fps for all but the top 1% of humongous individual files
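
The arithmetic behind that estimate, as a quick sketch. It assumes the hypothetical speed quoted above (1 ms per 2K lines, single-threaded) and that the whole frame budget is available for checking.

```python
CHECK_MS_PER_2K_LINES = 1.0  # assumed `roc check` speed from the message above
TARGET_FPS = 120

# Time available per frame at the target rate, in milliseconds.
frame_budget_ms = 1000 / TARGET_FPS

# Largest file (in lines) that can be fully re-checked within one frame.
max_lines_per_frame = int(frame_budget_ms / CHECK_MS_PER_2K_LINES * 2000)
```

At 120fps the budget is about 8.33 ms per frame, so under these assumptions a single file of up to roughly 16.6K lines could be re-checked every frame.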

view this post on Zulip Brendan Hansknecht (Feb 12 2025 at 23:42):

I guess it depends on the speed of cross file dependency resolution with cached can IR. That likely will get slower in large projects


Last updated: Jul 06 2025 at 12:14 UTC