I think this rewrite is a great time to also think a little about how the roc compiler can integrate well with external tooling and the language server:
Things like:
I just hadn't seen any discussion of it and wanted to try to make sure it's not forgotten :)
I think Richard isn't a fan of tying an external protocol to our compiler:
Richard Feldman said:
zooming out for a sec, I'm trying to avoid coupling the public roc API to other protocols, so that we don't get in a situation where people are saying "hey please update the language and do a release ASAP because we're blocked and there's no way for us to unblock ourselves short of a language release." examples of this include:
- not wanting to offer generating llvm IR because that couples roc to llvm updates
- not wanting to have Unicode segmentation in builtins because that couples roc to Unicode updates
- not wanting to couple roc to lsp for the same reason.
with that in mind, I wonder if there's a way we could keep the relevant logic in the roc binary but expose it in a way that lsp (and others) can access functionality with it in a way that can be upgraded independently from roc itself - like for example what we do with roc glue and giving it a .roc script that describes what to do
But since we're aiming for a single binary that does everything, it'd be really nice to have the same experience as with gleam lsp
Well that's why I was hoping that the roc binary itself could expose very low level primitives, symbols, types, ast etc.
That way improvements to the language server can occur separately to the language.
It seems like that decouples roc from lsp much more, right?
Yeah, I'd agree
Sorry, I think I read your message too quickly
I think the plan is for each IR to be nicely serialisable for caching and snapshotting, and also for external tools to integrate with.
I'm not exactly sure how that will look or work. But definitely something I'm keen to explore and understand more to help ensure we don't miss it.
Also I think a language server written in roc would be a really great "here is roc doing real things that are non-trivial" example.
language servers do:
oh, interesting... writing the language server in roc. How would that work? as a plugin?
I'm really interested in this idea, it sounds great.
Would that need to be a custom platform?
We can look at how other language servers work.
I have a few ideas, but one I quite like is having the compiler also be a roc platform.
That way you guarantee the language server will always work even if you have a different version of the compiler, but it can still release independently of the compiler.
The compiler could also expose apis over jsonrpc or the like and the language server could communicate that way.
I think choosing either one should allow us to switch to the other with very little hassle.
Eventually we should definitely expose both: having your compiler be able to run as a service lets people build interesting tools with it, but I also think the platform approach would be better for the language server and for making it super simple for people to build other tools in roc.
When you say the roc compiler, are you thinking the roc cli binary/executable?
Yup. You could start it with a special flag and it goes into "rpc mode" where you can send messages back and forth and get stuff out.
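(Purely for illustration -- the flag and method names here are invented, nothing like this exists yet -- an exchange in that rpc mode might look something like:)

```
$ roc --rpc
→ {"jsonrpc": "2.0", "id": 1, "method": "typeAt",
   "params": {"file": "main.roc", "line": 12, "col": 8}}
← {"jsonrpc": "2.0", "id": 1, "result": {"type": "List Str"}}
```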
In the short term, I think a platform for roc would be the best choice though. Again, it's a good way to show off our cool features, with a real world use case
yeah it could work like glue, where instead of gleam lsp you run something like roc tooling lsp.roc, and then lsp.roc works like rust-glue.roc in that its platform is provided by the compiler, and lsp.roc plays the role of translating direct function calls to/from the compiler into the language server protocol
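(A minimal sketch of what that lsp.roc might look like, assuming a hypothetical compiler-provided platform; every module, type, and function name below is invented for illustration:)

```roc
# lsp.roc -- its platform would be provided by the roc binary itself,
# the same way rust-glue.roc gets its platform from roc glue.
app [handle_request!] { compiler: platform "…" }

# Invented module: queries over the compiler's in-memory state.
import compiler.Analysis

# The compiler calls this once per incoming LSP message, and sends
# whatever JSON we return back to the editor.
handle_request! : Str => Result Str [Unsupported Str]
handle_request! = |request_json|
    ...
```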
I think that's a good solution for now.
In the longer term it may be nice to be able to release changes to the language server without updating roc as a whole (not sure how often you plan to do releases).
If so I was thinking we could have a platform that produces its own standalone binary and uses the compiler as a library. That way the language server can be bumped separately from the compiler, because it bundles its own copy of the compiler.
I had a chat with @Luke Boswell earlier today about this.
I think it would be great if we could make the "roc tooling" platform be more than just a language server.
If we try to keep everything pretty general purpose on the zig side -- getting the ast, different IRs, types, symbols etc -- then we could easily create a base for other analysis tools, like linters or codegen. All written in roc
yeah I do think that type of thing seems reasonable
I like the "linting" philosophy where it's not so much about enforcing arbitrary stylistic preferences as it is about project-specific invariants
e.g. "we have decided to move away from doing things in this way that we used to, and the tool's job is to fail the build if that way is used in any new code, with exceptions carved out for old code"
I think one day it would be really cool if we could do things like create linting rules using the refcount IR.
That way we could create a lint that says "hey this platform expects to be able to reuse this buffer, and you're storing an extra reference to it here"
I think it's reasonable to do that on the application side but not the platform side
like "I always want to be reusing this for perf, and if I ever stop doing that here I want to know about it"
as opposed to "you must only give me one that can be reused or else your build will fail" - at which point we've added a janky version of Rust's ownership types to Roc :big_smile:
oh no, I more just meant:
If we build a good compiler platform that allows plugins for things like linting, and we expose the right stuff, we could enable a platform author to add little suggestions like that. Not as an error necessarily but as a hint.
You know: "if anyone calls this function, and the variable it's assigned to has multiple references, show a little suggestion that says 'sure you want to do that mate?'"
I would definitely write one for myself that does: "if i have a variable called buff and it has multiple references, warn me"
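(Sketching what that personal lint could look like as a roc script, assuming a hypothetical platform that hands plugins refcount-IR info; Binding and its fields are invented:)

```roc
# Hypothetical lint plugin over refcount-IR data provided by the platform.
Binding : { name : Str, ref_count : U64, line : U32 }

# Flag every binding named "buff" that ends up with more than one live
# reference, since that defeats reusing its buffer in place.
warnings : List Binding -> List Str
warnings = |bindings|
    bindings
    |> List.keep_if(|b| b.name == "buff")
    |> List.keep_if(|b| b.ref_count > 1)
    |> List.map(|b| "line ${Num.to_str(b.line)}: buff has multiple references, so its buffer can't be reused in place")
```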
Richard Feldman said:
and then lsp.roc works like rust-glue.roc in that its platform is provided by the compiler, and lsp.roc plays the role of translating direct function calls to/from the compiler into the language server protocol
This really doesn't make sense to me. Unlike glue, the language server is a bespoke single application.
Luke Boswell said:
I think the plan is for each IR to be nicely serialisable for caching and snapshotting, and also for external tools to integrate with.
I think we should be really careful of this. The more coupling we expose, the less flexible and changeable roc becomes. This sounds like a trivial way to hit Hyrum's law really hard.
Not saying we shouldn't do it, but if we do it, we probably should pick one very explicit cutting point that we think is unlikely to change.
That said, I think both zig and Odin expose parts of their compiler in their standard library, so maybe it isn't too bad.
I guess a lot of this depends on good versioning guarantees.
Also, this all may fall nicely into a libroc workflow where you just use the compiler as a library instead of as an executable
I am strongly with Brendan on this. I would be leery of exposing the full IR. I would like to transform it slightly to include only the info we see external services wanting: refcounts, symbol locations, type info, etc. Nothing weird and internal if we can avoid it.
Hopefully that will keep us from having the external API change much.
As a starting point I was thinking of making libroc essentially expose the roc check functionality, and it returns the ResolvedIR (prev Can) or maybe as far as the last IR before code gen ... and making a roc platform that provides that to a roc app along with Problems.
I figure this would be all that's required for making an LSP, or something like checkmate, or our playground.
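(Roughly, the shape I have in mind -- all of these names are placeholders for whatever roc check really produces:)

```roc
# Placeholders standing in for the real compiler data structures:
ResolvedIR : {}
Problem : Str

CheckResult : {
    ir : ResolvedIR,
    problems : List Problem,
}

# Effect provided by the platform: run roc check on one file path.
check! : Str => Result CheckResult [FileNotFound Str]
```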
Is this topic a superset of (something we talked about several months ago) wanting to be able to convert raw Roc source code files to/from a serialization format like JSON or YAML etc, powered by something like a first-party JSON Schema? Should I start a new topic?
I think the general idea is to serialise the IRs to an S-expression format, which should then be easy to parse and work with -- I'm not sure about a schema, though I guess once the IR has firmed up that might help standardise it.
yes, each IR would have its own sexpr representation
Which would be very simple
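(As a toy example only -- the real shape is nowhere near decided -- the canonical IR for x = 1 + 2 might serialize along these lines:)

```
(def
  (pattern (ident "x"))
  (expr (call (builtin "Num.add")
          (int 1)
          (int 2))))
```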
I think the general idea is to serialise the IRs to an S-expression format
Is that for this tooling as well? This tooling wouldn't be using a serialized text at all. It should be directly using some sort of roc tag union representation of the IR.
Well making it a format able to be sent outside of zig is pretty essential if we want to build tools for roc in roc.
So I'd call it tangentially relevant.
For sure, I was thinking of text representation vs tag union representation, which may be two very different shapes
Anthony Bullard said:
Which would be very simple
Sweet! So could I write a Roc library that helps you "read/write Roc code" by mapping+translating typed values (likely mostly Strs, but maybe lots of tags) from/to the raw contents of main.roc.ir_step_5.sexpr.idk files that the compiler writes beforehand / reads later? (Maybe in real time if a platform called the compiler in a certain way?)
I'd like to explore the idea of roc glue -- potentially even becoming something more like roc gen -- and it could potentially access any or all of the IRs and then we could write plugins (roc scripts) that do things with roc source code really easily. The primary use case is for things like tooling (e.g. checkmate).
If we use a Str and parse the S-expressions on the roc side... it will be much easier than trying to maintain a binding to the roc types for all the IRs. So I imagine a roc package that parses the IR Str and gives us an AST using roc tag unions etc.
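(i.e. a recursive tag union plus a parser, something like the following -- names invented:)

```roc
# Hypothetical shape for a roc package that parses IR S-expressions.
SExpr : [
    Atom Str,
    Node Str (List SExpr),
]

# e.g. "(int 1)" would parse to Node("int", [Atom("1")])
parse : Str -> Result SExpr [UnexpectedToken Str, UnexpectedEnd]
```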
How would we make sure that it always stays in sync with the Zig equivalent? Seems easier to keep this in a Zig library
Then I imagine having a few simple effects available like Stdout.line!, File.write! or Http.send! to do stuff with this.
Sam Mohr said:
How would we make sure that it always stays in sync with the Zig equivalent? Seems easier to keep this in a Zig library
I imagine we could fuzz it somehow... we will be using this for glue generation anyway [in my hypothesis here] -- which we would want to be reliable.
Are the alternatives (b) not having this functionality and (c) having third parties write plugins in Zig?
Sounds about right, with more leaning towards (c)
Or writing bindings to Zig, which seems janky
Note that if we expose things via zig, e.g. (c), we wouldn't want to use the builtins... as they're really our internals and not a very nice abstraction for working with. So if we do expect people to work with zig, it's the same issue where we have a separate thing (zig library) that needs to be maintained and kept in sync.
If we can standardise on a simple protocol (S-expressions) instead of a library (one blessed language) then it will be much easier for tooling in any language.
But maybe I'm wrong here... we will need zig code to serialise and deserialise the IR's anyway, and a parser in Roc would be duplicating this effort.
and it could potentially access any or all of the IRs and then we could write plugins
Please no
Why not?
The more exposed our internals are, the more locked down they are
I think we should expose a single cutting point with its own transformed IR and nothing else
Huge hit by Hyrum's law
Ahk... well I guess maybe we just expose the IR after type checking, and then the IR after the full build (which includes refcounting etc)??
well I guess maybe we just expose the IR after type checking
Yeah, something around here is the one point I think we should expose.
....
refcounting is technically an internal detail, why do we want to expose it?
We need to expose something for generating glue anyway... my hypothesis is that we could also expose something that tooling like checkmate can use too
Oh, it was the LSP that wanted to know about tail-call or other optimisations.
I was wondering if we could even write our LSP as a roc plugin?
Mostly spit-balling here... these aren't really thought through ideas. I just feel like we could use roc scripts to simplify a lot of things for our own tooling.
It's possible. If we do so, I think we just need to make sure to pick a limited set of cutting points with really clear APIs.
Potentially even separating them completely from the IR so the IR can change separately from the api (probably required anyway essentially to translate from zig to roc)
JanCVanB said:
Are the alternatives (b) not having this functionality and (c) having third parties write plugins in Zig?
d. standard C API shared libraries
also, roc glue does not support effects by design, because it means you know it is always 100% harmless to run anyone's glue script
all it's going to do is to spit out files into the directory you requested, because that's all it knows how to do :big_smile:
My gut feeling here is that we should kick the tooling can down the road a little further -- and option (d) aka libroc is probably the best option, but there's a lot of design work around the interpreter etc that would be good to understand first.
that is actually option (e) technically. For (d) I meant that the compiler could run liblsp.so instead of making a roc interface for the lsp.
but yeah, I think this part of tooling, we should wait on.
First we should figure out the best abstraction for glue (in the new compiler) and learn from wiring that up happily
Then we should revisit exposing things specifically thinking of the LSP use case
Finally we should think if that can be expanded to more general use cases.
That is at least roughly how I would push on it
honestly, I've been more tempted to think that compilers should be built lsp-first and out. but maybe that's crazy. not saying we should, I don't think there's enough precedent to justify it. but I get this feeling that at this point a language server ends up inevitable and having a good one makes all the difference in terms of adoption. considering how most people actually spend more time interacting with the LSP than the cli, it kinda makes sense to focus on it first and with highest priority. the cli pass would largely end up being for CI most of the time.
to be honest, I'm not sure how different we'll want it to be
here's a sketch of how it could look:
- we run the roc check phases of the compiler (which is all that language servers care about)
- we're already compiling each module separately, with its data structures stored in its own arena, so we can serialize them to/from disk easily
- at that point we have all the parsing, canonical IR, and type info in memory and up to date...
- so we can expose an interface to ask questions about that
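(in other words, once check results are resident, the "questions" could be as direct as the following -- all names invented:)

```roc
# Hypothetical queries over in-memory roc check results; positions are
# byte offsets into the source file for simplicity.
type_at : { module : Str, offset : U32 } -> Result Str [NoExprHere]
def_site : { module : Str, offset : U32 } -> Result { module : Str, offset : U32 } [NotASymbol]
completions_at : { module : Str, offset : U32 } -> List Str
```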
maybe I'm missing something, but I don't really understand what specific advantages a fundamentally "query-based" compiler architecture would have over that :sweat_smile:
What you're describing is a manually-orchestrated query-based compiler architecture
I guess? The relevant part to me is that it sounds like it has about 98% code reuse with the batch compiler :big_smile:
Also - in the context where you need to potentially do type inference globally, the more traditional query-based architecture doesn't really do much for you. That's most convenient where you have very clear cut-points in the graph (e.g. function boundaries)
(side note: I would like to explore automatically carving out such boundaries based on where type annotations occur in the source code)
like when I watched Anders talking about C#'s Roslyn compiler architecture years ago, or when I looked at salsa in Rust, they seem wildly different architecturally
If you squint hard enough, it's the same
the thing is, my assumption is that rerunning type checking on an individual module will be so fast it won't matter, as long as we don't have to redo any work in other modules to do it - which is already the case unless you're doing something like a rename of an exposed thing
Agree that we should make the batch case super duper fast as a first priority, and only fall back to other strategies when that clearly _can't_ keep up.
@Richard Feldman yea makes sense, it's probably not much different, especially if things are just fast enough
just vague musings I've had lately on my part
yeah like if we can single-threaded roc check a file in under 1 ms per 2K lines of code, then as long as we have uninterrupted access to a CPU core, we can do that at 120fps for all but the top 1% of humongous individual files
I guess it depends on the speed of cross file dependency resolution with cached can IR. That likely will get slower in large projects