zig compiler - LSP · contributing

Stream: contributing

Topic: zig compiler - LSP

Luke Boswell (Jul 08 2025 at 03:48):

I'm wondering if anyone is interested in working on the LSP for our new zig compiler?

Luke Boswell (Jul 08 2025 at 03:50):

I'm thinking we have enough "roc check" functionality that this could be useful to have when working on .roc files using the new 0.1 syntax.

Kiryl Dziamura (Jul 08 2025 at 03:55):

I guess @Anthony Bullard wanted to work on it. Otherwise I can start next week (on vacation rn)

Anthony Bullard (Jul 10 2025 at 15:44):

Yeah I told Kiryl to go ahead if he is interested since this "vacation" I'm on is busier than my normal working days :rofl:

Isaac Van Doren (Jul 11 2025 at 04:17):

We talked about exposing generic information that can be used to build tooling such as a language server in this thread https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/Bundle.20the.20language.20server.20in.20the.20CLI/near/523017150

I think it would be valuable to do some design work about how this should look before implementing it

Luke Boswell (Jul 11 2025 at 04:55):

Ah yeah, that's a fun topic.

I love the simplicity of just making a roc tooling platform and giving it a list of all the modules in the app/package/platform in a big text S-expression format.

It's probably not much more work in the long run to make a nice Roc API that models things properly and would be much nicer to work with.

Luke Boswell (Jul 11 2025 at 04:58):

If we're not doing just a normal LSP but wanting to pursue this roc tooling approach instead, then we probably want to wait a little longer before we start implementing it. Designing the API and making an example app etc is very doable now though.

Kiryl Dziamura (Jul 11 2025 at 12:54):

Can it be done in parallel? I like the roc tooling idea, but there are a couple of reasons why I think it makes sense to start working on roc lsp straight away:

It will show us what we need from the API and help us find the right ergonomics in practice
The LSP won't be implemented cleanly at first and would likely require refactoring anyway (make it work -> make it right)
It should be fairly easy to refactor the lsp to use an API. roc tooling can be designed along the way.

I'll start drafting things as soon as possible

Brendan Hansknecht (Jul 11 2025 at 15:28):

The lsp is a separate executable from the compiler, right? There are no roc lsp commands, right?

It's just that currently it would call roc check via subcommands, and in the future it would call roc tooling?

Kiryl Dziamura (Jul 11 2025 at 16:11):

Hm. How about having it as part of the roc cli for a while and when tooling is ready - move it somewhere else? Relying on rendered diagnostic messages seems to be not great

Brendan Hansknecht (Jul 11 2025 at 16:21):

Can we make it a separate executable that depends on our internal zig source (like snapshot.zig is)?

At least if we want it eventually decoupled, I think it is best to avoid ever putting it in the same executable.

... But I'm not actually sure if our long term plans here have shifted and what the current state is.

Kiryl Dziamura (Jul 11 2025 at 16:21):

Ah, sure. Good point!

Richard Feldman (Jul 11 2025 at 16:57):

Kiryl Dziamura said:

Hm. How about having it as part of the roc cli for a while and when tooling is ready - move it somewhere else? Relying on rendered diagnostic messages seems to be not great

my concern is that people will set up their editor configs to use it that way and then we'll never be able to get rid of it without breaking things for a bunch of people

Kiryl Dziamura (Jul 11 2025 at 17:45):

I see. So roc tooling as a poc might be whatever api the lsp needs, which can then be refined and stabilized? Or do you suggest having lsp completely decoupled for now? I'm for the second option

Richard Feldman (Jul 11 2025 at 18:01):

I'm saying never have a roc lsp, but rather have a different thing that is sufficiently powerful that you can give it a roc script and that's all you need to stand up a language server

Richard Feldman (Jul 11 2025 at 18:03):

I definitely think we should expose enough info that people can build a fast and featureful language server just by providing roc with a single .roc file

Richard Feldman (Jul 11 2025 at 18:03):

and without roc having any formal knowledge of the language server protocol

Anthony Bullard (Jul 11 2025 at 18:54):

You mean the compiler doesn't have a lsp command, but it is instead a separate binary, correct? With no coupling between the compiler and language server?

Kiryl Dziamura (Jul 11 2025 at 18:56):

I know one language that fits the task very well :smile:
I wonder how good it would be to use roc tooling as a platform

Kiryl Dziamura (Jul 11 2025 at 19:00):

@Richard Feldman yes, I understand the concept of roc tooling. My question is rather "should we start working on the roc tooling or on lsp poc to understand what api the roc tooling expects to provide?"

Luke Boswell (Jul 11 2025 at 19:03):

Loving this direction, here's a brief summary of what I am thinking...

So my understanding is the goal is for the roc cli to support plugin scripts that can take different representations of a roc module/app/platform/package and use this for different purposes. Primarily these fall into the category of developer tooling. Some use cases I can think of: code gen, linters, debug tools, LSP, etc.

Example usage:

roc tool <plugin.roc> <app.roc> : roc tool lsp.roc hello-world.roc

The app.roc could be any roc code, a standalone module, an app main.roc, package or platform etc.

The plugin.roc is a normal roc app -- the platform it uses is provided by the roc cli. It has an API that provides information about the app.roc that was provided, and enables various IO. The roc cli is the host which loads the plugin script, and provides the information for it to do its thing.

We discussed different ideas about what this API would actually look like -- the approach I think is simplest and most flexible (at least to start with) is to provide a Str that is simply the CanIR s-expression (exactly the same as is used in the snapshots).

The way I am thinking we approach this is as follows:

Add a flag to roc check --dump-cir that simply prints the CanIR s-expression to STDOUT
Write a Roc package (using current roc) that parses the CanIR s-expression
Write an LSP using basic-cli/basic-webserver that calls ./zig-out/bin/roc check --dump-cir ...

This will give us an end to end capability quickly, and enable or unlock a lot of other dev tooling.
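As a rough illustration of the second step (parsing a dumped s-expression), here is a minimal sketch in Python rather than Roc. The `--dump-cir` flag is only a proposal in this thread, and the node names in `sample` (`module`, `def`, `name`, `type`) are invented for the example; the real CanIR layout may look quite different.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split s-expression text into parens, quoted strings, and bare atoms."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            # Quoted string: scan to the closing quote, skipping escaped chars.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            tokens.append(text[i : j + 1])
            i = j + 1
        else:
            # Bare atom: runs until whitespace, a paren, or a quote.
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '()"':
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def parse(tokens: List[str]) -> SExpr:
    """Parse one expression from the front of the token list (consumed in place)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        items: List[SExpr] = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing ')'")
        tokens.pop(0)  # drop the closing paren
        return items
    if tok == ")":
        raise ValueError("unexpected ')'")
    return tok


def find_nodes(tree: SExpr, tag: str) -> List[SExpr]:
    """Collect every list node whose first element equals `tag`."""
    found: List[SExpr] = []
    if isinstance(tree, list):
        if tree and tree[0] == tag:
            found.append(tree)
        for child in tree:
            found.extend(find_nodes(child, tag))
    return found


# Hypothetical dump; the real CanIR node names and layout may differ.
sample = '(module (def (name "main") (type "Str")) (def (name "helper") (type "U64")))'
tree = parse(tokenize(sample))
names = [d[1][1].strip('"') for d in find_nodes(tree, "def")]
print(names)
```

A tool built this way would get its input by capturing the compiler's STDOUT, which is the coupling to rendered output that later messages in this thread question.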

At some point we will want to transition to roc tool and have a plugin, but I think this should be deferred until after we build roc glue and really establish the conventions for calling into roc and platform development.

While there is some translation from current roc syntax to 0.1 -- I don't expect this to be a major issue in future.

Interested to know what people think of this approach?

Kiryl Dziamura (Jul 11 2025 at 19:13):

My take on it is that I don't know at which level roc tooling should provide info and how because there are no consumers yet. Like, we can do the can ir sexpr and then it turns out the consumer needs smth else. I mean, I love the idea of roc tooling but just don't know the real world requirements. Rephrasing, roc tooling is a server and tools are clients. But I don't know how clients want to be served because there are no clients yet.

I'll read some discussions on it

Luke Boswell (Jul 11 2025 at 19:15):

We could also easily add a roc check --dump-ast if there are tools we want to build that use the AST instead.

Luke Boswell (Jul 11 2025 at 19:16):

I think we just start with something really basic. I don't understand LSP very well, but I imagine there is something like, jump to definition or what type is this identifier etc

Luke Boswell (Jul 11 2025 at 19:16):

Maybe @Eli Dowling might have some advice?

Kiryl Dziamura (Jul 11 2025 at 19:29):

I imagine roc tooling as a DB, where a roc file is a representation of the database. And you can send queries to it
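To make the "database you can query" picture concrete, here is a toy sketch in Python. Everything in it (`Symbol`, `ToolingDb`, the query names) is hypothetical and only illustrates the kind of questions mentioned above, such as "what type is this identifier" and "jump to definition"; it is not a proposed API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Symbol:
    name: str
    module: str
    type_str: str
    def_line: int


class ToolingDb:
    """Toy in-memory index that answers editor-style queries by symbol name."""

    def __init__(self, symbols: List[Symbol]) -> None:
        self._by_name: Dict[str, Symbol] = {s.name: s for s in symbols}

    def type_of(self, name: str) -> Optional[str]:
        """Type of the named symbol, or None if it is unknown."""
        sym = self._by_name.get(name)
        return sym.type_str if sym else None

    def definition_of(self, name: str) -> Optional[Tuple[str, int]]:
        """(module, line) where the symbol is defined, or None if unknown."""
        sym = self._by_name.get(name)
        return (sym.module, sym.def_line) if sym else None

    def symbols_in(self, module: str) -> List[str]:
        """Sorted names of all symbols defined in the given module."""
        return sorted(s.name for s in self._by_name.values() if s.module == module)


# Invented example data.
db = ToolingDb([
    Symbol("main", "app.roc", "Str", 3),
    Symbol("helper", "app.roc", "U64 -> U64", 10),
])
print(db.type_of("helper"), db.definition_of("main"))
```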

Richard Feldman (Jul 11 2025 at 19:47):

@Luke Boswell I like that idea!

Richard Feldman (Jul 11 2025 at 19:48):

I'd maybe call it --deprecated-dump-ir just to set expectations that we will replace it in the future with something else :laughing:

Kiryl Dziamura (Jul 11 2025 at 19:50):

so I will start with the flag implementation? so then it's possible to pipe its output to a separate simple implementation of lsp, right? for starters

Richard Feldman (Jul 11 2025 at 20:06):

hmm, I'm kinda second-guessing how useful that would be now that I'm thinking about it more

Richard Feldman (Jul 11 2025 at 20:08):

I think if we did the dump to a file thing, we would end up starting over from scratch when we start on the real thing bc so little of it would be useful

Kiryl Dziamura (Jul 11 2025 at 20:11):

that's why I suggest implementing a simple lsp using roc zig as library and iteratively get to the tooling middleware

Richard Feldman (Jul 11 2025 at 20:24):

I guess we could call it --deprecated-lsp haha

Isaac Van Doren (Jul 11 2025 at 21:54):

Do we need to implement a language server before getting to tooling/plugin system?

Luke Boswell (Jul 11 2025 at 22:03):

Richard Feldman said:

I think if we did the dump to a file thing, we would end up starting over from scratch when we start on the real thing bc so little of it would be useful

Maybe we should discuss this in another thread.

Why shouldn't we use the s-expression as a Str? How big of an issue would that be?

Luke Boswell (Jul 11 2025 at 22:06):

Previous thoughts https://roc.zulipchat.com/#narrow/channel/304641-ideas/topic/Bundle.20the.20language.20server.20in.20the.20CLI/near/523176618

Richard Feldman (Jul 11 2025 at 22:23):

I haven't written this up anywhere, but my general thought is that the way I think we should try doing a language server is:

we load up all our ModuleEnvs in the usual way
we build our graph of dependencies between those modules
within each of those modules, we also know what their exports are
we keep a process running with all of this in memory
when we get an individual change to a particular file, we diff its source bytes to figure out which top-level decls changed, and redo parsing on them to get new asts for them
then we replace the old canonical decls with lookups to the new ones, so that all the existing references to the old ones get automatically updated in-place
then we use our old canonical dependency graph of top-level decls to see which top-level decls need to have their types re-checked if necessary (in many cases we can quickly determine that the changed top-level decl(s) could not possibly have changed their types, so this is unnecessary)
now we finish all necessary type-checking within the module
now we see if any exposed things changed types; if so, other modules which depend on this one will need to be rebuilt in a similar incremental way
finally we're all synced up and back to normal, except some of our node stores have grown
to prevent this from growing memory usage indefinitely, we have a background thread with low priority which copies over module envs into more compact representations by rebuilding them from AST onward. If this finishes and the original module env wasn't modified in the meantime, then we atomically swap it in and deinit the old one
in this way, we have cheap garbage collection of modules, they can get updated incrementally very quickly, and we always have all the info necessary in memory to answer questions about types, references, etc.
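The "which top-level decls need re-checking" step in the list above can be sketched as a walk over the reversed dependency graph. This is a conservative Python illustration with invented names (`depends_on`, `decls_to_recheck`); it does not implement the shortcut of skipping decls whose types provably did not change, and it is not taken from the compiler.

```python
from collections import deque
from typing import Dict, Set


def decls_to_recheck(depends_on: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Return the changed decls plus everything that transitively depends on them."""
    # Invert the graph: for each decl, record which decls use it.
    used_by: Dict[str, Set[str]] = {}
    for decl, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(decl)

    # Breadth-first walk from the changed decls along "used by" edges.
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


# Invented example: main uses render and parse; render uses format.
graph = {
    "main": {"render", "parse"},
    "render": {"format"},
    "parse": set(),
    "format": set(),
}
print(sorted(decls_to_recheck(graph, {"format"})))
```

The same walk would apply one level up, between modules, when an exposed decl changes its type.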

Richard Feldman (Jul 11 2025 at 22:24):

I think that's how we get the best LSP perf and memory usage, and that has essentially nothing in common with reading SExprs from disk and then doing different things with them in memory :sweat_smile:

Joshua Warner (Jul 11 2025 at 22:27):

Don’t try to diff bytes prior to parsing. That’s basically a buggy implementation of an incremental parser. Either build a proper incremental parser or just always reparse the whole file

Richard Feldman (Jul 11 2025 at 22:36):

I mean using regions

Richard Feldman (Jul 11 2025 at 22:36):

like the language server says "here is what was modified" and we use our regions to tell what was modified

Richard Feldman (Jul 11 2025 at 22:36):

maybe "diff" is the wrong word for that haha
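A minimal sketch of the region-based idea, in Python: given the byte range the editor reports as modified and the stored regions of top-level decls, find the decls that were touched. `Region` and `touched_decls` are hypothetical names, and the boundary handling (an insertion at a decl's edge counts as touching it) is an assumption chosen to stay on the safe side.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset


def touched_decls(decls: List[Region], edit_start: int, edit_end: int) -> List[str]:
    """Names of decls whose region intersects the edited range [edit_start, edit_end)."""
    hits: List[str] = []
    for d in decls:
        if edit_start == edit_end:
            # Pure insertion: it touches a decl if it lands inside or at its edge.
            if d.start <= edit_start <= d.end:
                hits.append(d.name)
        elif edit_start < d.end and d.start < edit_end:
            # Non-empty edit: standard half-open interval overlap test.
            hits.append(d.name)
    return hits


# Invented regions for three top-level decls.
decls = [Region("a", 0, 10), Region("b", 12, 30), Region("c", 32, 50)]
print(touched_decls(decls, 14, 18))
```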

Anthony Bullard (Jul 11 2025 at 22:40):

This doesn't sound like a plugin at all. This just sounds like a separate binary that is still coupled to the check libraries, just like snapshot. Which i think is fine but above it sounded like there was a desire for something with less coupling

Anthony Bullard (Jul 11 2025 at 22:41):

(Even if i think that's not easy to do if you also want top performance from the language server)

Richard Feldman (Jul 11 2025 at 22:41):

I think this should be in the roc binary

Richard Feldman (Jul 11 2025 at 22:42):

everything I just described is still useful for something like roc check --watch

Anthony Bullard (Jul 11 2025 at 22:42):

Richard Feldman said:

I'm saying never have a roc lsp, but rather have a different thing that is sufficiently powerful that you can give it a roc script and that's all you need to stand up a language server

Then what was meant by this?

Richard Feldman (Jul 11 2025 at 22:42):

sorry, "different thing" as in roc tool

Anthony Bullard (Jul 11 2025 at 22:43):

So tool is a sub command of the compiler binary that does the above in service of multiple tools that are implemented in Roc?

Richard Feldman (Jul 11 2025 at 22:43):

yeah

Richard Feldman (Jul 11 2025 at 22:43):

so like you could use it to implement a codemod tool as well, for example

Richard Feldman (Jul 11 2025 at 22:43):

or a language server

Richard Feldman (Jul 11 2025 at 22:43):

or a dedicated plugin for a specific editor that's more ambitious than just language server

Anthony Bullard (Jul 11 2025 at 22:44):

Ok. interesting

Richard Feldman (Jul 11 2025 at 22:44):

the specific thing I don't want to do is to couple roc to the language server protocol

Anthony Bullard (Jul 11 2025 at 22:44):

Is there prior art, or is this just an experiment?

Anthony Bullard (Jul 11 2025 at 22:45):

Richard Feldman said:

the specific thing I don't want to do is to couple roc to the language server protocol

i understand this, even with LSP having such deep momentum behind it

Richard Feldman (Jul 11 2025 at 22:46):

there's lots of prior art for doing it in batch builds (e.g. code mod tools, languages like Nim have lots of static introspection features) and there's also lots of prior art for doing dynamic runtime introspection (e.g. Smalltalk) but I'm not aware of prior art combining the two - a static analysis system that efficiently keeps itself up-to-date with changes at runtime

Anthony Bullard (Jul 11 2025 at 22:46):

One down side is it means that we have to have at least a working interpreter for Roc v0.1 implemented to start on implementation of this

Richard Feldman (Jul 11 2025 at 22:47):

yeah that's why I'd be ok with doing a temporary like roc deprecated-lsp or something

Richard Feldman (Jul 11 2025 at 22:47):

just really clearly communicate that it's a stopgap that will be going away, to be replaced by something else that you can use to achieve the same goal

Anthony Bullard (Jul 11 2025 at 22:48):

but that temporary thing should just be a zig library that's exposed in the roc binary just like check, test, build, etc

Richard Feldman (Jul 11 2025 at 22:49):

but I do think we should aim for an architecture along these lines:

Richard Feldman said:

I haven't written this up anywhere, but my general thought is that the way I think we should try doing a language server is:

Richard Feldman (Jul 11 2025 at 22:49):

Anthony Bullard said:

but that temporary thing should just be a zig library that's exposed in the roc binary just like check, test, build, etc

I think just have it be part of the normal roc CLI

Anthony Bullard (Jul 11 2025 at 22:49):

Ok, i think that temporary thing's architecture is what is being decided here

Richard Feldman (Jul 11 2025 at 22:49):

like we can of course build something simpler, but it feels kinda wasteful

Anthony Bullard (Jul 11 2025 at 22:49):

I think roc tool is awesome for once we have a working stable interpreter

Richard Feldman (Jul 11 2025 at 22:49):

we already have the module envs in place

Anthony Bullard (Jul 11 2025 at 22:50):

Yep

Richard Feldman (Jul 11 2025 at 22:50):

loading files is WIP

Anthony Bullard (Jul 11 2025 at 22:50):

And our check pipeline is so fast it'll be great perf

Anthony Bullard (Jul 11 2025 at 22:50):

I hope we can make the tool interface still very performant as well

Richard Feldman (Jul 11 2025 at 22:53):

yeah I haven't super dug into it but I think we'll be good thanks to the whole Idx thing

Richard Feldman (Jul 11 2025 at 22:54):

supposedly (Folkert and Andrew independently confirmed this) you can turn a SoA thing into a normal tagged union for ergonomics and llvm will optimize away the conversion
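For readers unfamiliar with the pattern, here is a small Python sketch of the shape being described: nodes stored as parallel arrays and addressed by an integer index, with a `get` that converts one entry into an ordinary tagged-union-style value. All names (`ExprStore`, `IntLit`, `Add`) are invented, and Python says nothing about whether LLVM removes the conversion; the sketch only shows the ergonomics.

```python
from dataclasses import dataclass
from typing import List, Union

TAG_INT = 0
TAG_ADD = 1


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class Add:
    lhs: int  # index of the left operand in the store
    rhs: int  # index of the right operand in the store


Expr = Union[IntLit, Add]


class ExprStore:
    """Struct-of-arrays storage: one list per field, nodes addressed by index."""

    def __init__(self) -> None:
        self.tags: List[int] = []
        self.data_a: List[int] = []
        self.data_b: List[int] = []

    def _push(self, tag: int, a: int, b: int) -> int:
        self.tags.append(tag)
        self.data_a.append(a)
        self.data_b.append(b)
        return len(self.tags) - 1

    def add_int(self, value: int) -> int:
        return self._push(TAG_INT, value, 0)

    def add_add(self, lhs: int, rhs: int) -> int:
        return self._push(TAG_ADD, lhs, rhs)

    def get(self, idx: int) -> Expr:
        """Convert one SoA entry into a tagged-union-style value."""
        tag = self.tags[idx]
        if tag == TAG_INT:
            return IntLit(self.data_a[idx])
        if tag == TAG_ADD:
            return Add(self.data_a[idx], self.data_b[idx])
        raise ValueError(f"unknown tag {tag}")


def evaluate(store: ExprStore, idx: int) -> int:
    """Consumer code works with the union view, not the raw arrays."""
    node = store.get(idx)
    if isinstance(node, IntLit):
        return node.value
    return evaluate(store, node.lhs) + evaluate(store, node.rhs)


store = ExprStore()
two = store.add_int(2)
three = store.add_int(3)
total = store.add_add(two, three)
print(evaluate(store, total))
```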

Richard Feldman (Jul 11 2025 at 22:54):

so we should be able to offer nice ergonomics as well as great perf...in theory at least!

Anthony Bullard (Jul 11 2025 at 22:56):

i can't wait to have some actual down days to jump back into roc development

Richard Feldman (Jul 11 2025 at 23:01):

if it's still gonna be hectic for awhile, I'd be happy to get your PR across the finish line

Richard Feldman (Jul 11 2025 at 23:02):

we've been avoiding touching parser stuff and it would be nice to unblock if you're gonna be bandwidth-constrained awhile longer! :smile:

Richard Feldman (Jul 11 2025 at 23:02):

there's plenty of other stuff in the pipe :laughing:

Joshua Warner (Jul 11 2025 at 23:03):

FWIW I think an actual incremental parser is not that difficult to do (and make "clearly correct")

Joshua Warner (Jul 12 2025 at 00:03):

Basically:

The parser grows a new field: known_edit_range: (offset, old_len, new_len)
We introduce a new internal ast node type: node_reference, which has an original_id and a "shift" (amount by which we have to adjust ranges to be correct)
Most parsing functions get a new optional arg for the "possible old node"
Those functions will have a small preamble where they check if there's a trivial match against the node passed, and if so return a node_reference pointing to that. We can determine a "trivial" match by looking at the current pos, the known_edit_range, and the node's offset.
Otherwise, they also need to fish out what would be the corresponding field in the "possible old node" to pass that to child parsers
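The "trivial match" preamble described above can be sketched in Python as follows. The field names `known_edit_range` pieces (`offset`, `old_len`, `new_len`), `original_id`, and `shift` come from the message; the rest (`OldNode`, `try_reuse`, `resolve_range`) is hypothetical, and the strict comparisons at the edit boundary are an assumption made so that a node adjacent to the edit is always reparsed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edit:
    offset: int   # where the edit starts in the old source
    old_len: int  # number of bytes removed
    new_len: int  # number of bytes inserted


@dataclass(frozen=True)
class OldNode:
    node_id: int
    start: int  # inclusive, in old-source coordinates
    end: int    # exclusive, in old-source coordinates


@dataclass(frozen=True)
class NodeReference:
    original_id: int
    shift: int  # amount to add to old ranges to get new-source ranges


def try_reuse(node: OldNode, edit: Edit, pos: int) -> Optional[NodeReference]:
    """Return a reference to the old node if it trivially matches at `pos`.

    `pos` is the parser's current position in the NEW source.
    """
    if node.end < edit.offset:
        # Entirely before the edit: positions are unchanged.
        shift = 0
    elif node.start > edit.offset + edit.old_len:
        # Entirely after the edit: positions move by the size difference.
        shift = edit.new_len - edit.old_len
    else:
        # Overlaps or touches the edit: must be reparsed.
        return None
    if pos != node.start + shift:
        return None
    return NodeReference(node.node_id, shift)


def resolve_range(node: OldNode, ref: NodeReference) -> Tuple[int, int]:
    """Apply the stored shift when looking up the node's range in the new source."""
    return (node.start + ref.shift, node.end + ref.shift)


# Invented example: 5 bytes at offset 20 were replaced by 8 bytes.
edit = Edit(offset=20, old_len=5, new_len=8)
after = OldNode(node_id=3, start=40, end=60)
ref = try_reuse(after, edit, pos=43)
print(ref, resolve_range(after, ref))
```

A function that gets `None` back would parse normally and, as the last step above says, pass the corresponding child of the old node down to its child parsers.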

Joshua Warner (Jul 12 2025 at 00:06):

Also, the ast node id type used outside the parser will need to grow an extra field to store the expected "shift", which is applied when looking up any ranges.

Joshua Warner (Jul 12 2025 at 00:07):

We can choose to modify as many or as few functions as we want - so if we do this at the granularity of defs, we automatically get a parser that can skip re-parsing matching defs. Or we can make every function do this, and get very fine-grained re-use.

Joshua Warner (Jul 12 2025 at 00:07):

(obviously the tokenizer needs to be able to produce a diff as well)

Joshua Warner (Jul 12 2025 at 00:08):

That much is somewhat non-trivial, especially with string interpolation, but very doable.

Joshua Warner (Jul 12 2025 at 00:09):

Of course, maybe this is roughly what you mean by "we use our regions to tell what was modified"

Luke Boswell (Jul 12 2025 at 01:19):

Richard Feldman said:

I haven't written this up anywhere, but my general thought is that the way I think we should try doing a language server is:

Ok, thank you for summarising this... clears some things up.

Luke Boswell (Jul 12 2025 at 01:21):

Based on this, I'd say we just wait a little and build the tool version of an LSP. There's no rush and we're probably not too far away. :smiley:

Anthony Bullard (Jul 12 2025 at 02:03):

That tooling api is going to be important to design well, since it's now an API contract for the compiler (obv more important post v1)

Richard Feldman (Jul 12 2025 at 02:12):

100%!

Luke Boswell (Jul 12 2025 at 03:42):

There's nothing stopping us from designing that now.. or sketching out preliminary concepts.

Coming back to the --deprecated-dump-can-ir idea... we could use that as a bit of a hack to simulate the API and do some experiments with real 0.1 programs. Acknowledging that we will be throwing most of this work away, the goal wouldn't be to make a super comprehensive implementation, just to hack enough together to explore the tooling API design space.

Having a more mature API design we are aiming for earlier would be quite helpful.

Luke Boswell (Jul 12 2025 at 03:43):

It's also something cool people can work on if they're just wanting to learn and write Roc and not so keen on Zig or Rust things -- but still is really helpful working towards nice tooling for the 0.1 implementation

Last updated: Aug 17 2025 at 12:14 UTC