Auto-annotation tool · ideas · Zulip Chat Archive

Stream: ideas

Topic: Auto-annotation tool

Aurélien Geron (Aug 29 2024 at 06:22):

I'd like to be able to run something like:

roc annotate -u main.roc

And it would automatically add type annotations to all the top-level functions in main.roc. The -u stands for update. If you omit it, the command just prints the annotations:

roc annotate main.roc

square : I32, I32 -> U64
message : Str
parse : Str -> Result I64 [InvalidFormat]

Aurélien Geron (Aug 29 2024 at 06:26):

My motivation right now is that I have over 30 exercises in Exercism, and very few of them are annotated. It's probably a good idea to annotate them, and I wish I could automate that.

Brendan Hansknecht (Aug 29 2024 at 06:42):

That sounds wonderful. Though I do think it will have some minor inconveniences with generality.

Brendan Hansknecht (Aug 29 2024 at 06:42):

As in, the concrete type may be I64 but the inferred type is Num a, so there is a type decision to make.

Brendan Hansknecht (Aug 29 2024 at 06:43):

Maybe it could be a flag on the formatter instead of a new sub command?

Aurélien Geron (Aug 29 2024 at 06:44):

Oh yes, these are great points.

Aurélien Geron (Aug 29 2024 at 06:45):

Something like:

roc format --add-annotations --annotation-types=specific main.roc

Kilian Vounckx (Aug 29 2024 at 06:47):

Couldn't this also be a nice LSP action?

Sam Mohr (Aug 29 2024 at 07:44):

Kilian Vounckx said:

> Couldn't this also be a nice LSP action?

I always assumed this would be how we'd do it

Sam Mohr (Aug 29 2024 at 07:45):

And then we just add an action at the top of the module that does it for a whole file

Aurélien Geron (Aug 29 2024 at 08:21):

That's good too, and probably easier to implement since there are already annotation suggestions in the LSP! It would not be as easy to integrate into a script though (e.g., to check or add annotations in a GitHub action, you would need to spin up a language server).
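As a rough illustration of the scripting use case mentioned above, a CI check for missing annotations could be approximated without a language server. The sketch below is plain text matching, not part of any Roc tooling: the function name `unannotated_defs` is hypothetical, and it assumes a top-level def is a lowercase identifier at column 0 followed by `=`, with its annotation being the same identifier at column 0 followed by `:`.

```python
import re

# Heuristic only: a top-level definition is assumed to be a lowercase
# identifier at column 0 followed by `=`, and its annotation the same
# identifier at column 0 followed by `:`. Real Roc syntax has more cases.
DEF_RE = re.compile(r"^([a-z][A-Za-z0-9_]*)\s*=(?!=)")
ANN_RE = re.compile(r"^([a-z][A-Za-z0-9_]*)\s*:(?!=)")


def unannotated_defs(source: str) -> list[str]:
    """Return names of top-level defs that have no annotation line before them."""
    annotated = set()
    missing = []
    for line in source.splitlines():
        ann = ANN_RE.match(line)
        if ann:
            annotated.add(ann.group(1))
            continue
        definition = DEF_RE.match(line)
        if definition and definition.group(1) not in annotated:
            missing.append(definition.group(1))
    return missing
```

A script in a GitHub action could call this on each file and fail the job when the returned list is non-empty. It can only report missing annotations; producing the annotations themselves still needs the compiler's type inference.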

Sam Mohr (Aug 29 2024 at 08:24):

Correct, it would not be as easy to use in a scripting environment, but a scriptable version would probably be more effort to implement? I think both are useful.

Sam Mohr (Aug 29 2024 at 08:26):

Let's treat this as three features?

1. Add signature to a top-level value as an LSP action
2. Add signatures to all top-level defs in a file as an LSP action
3. Add a flag to roc format that adds signatures to all top-level defs

Aurélien Geron (Aug 29 2024 at 08:35):

Sounds perfect!

Sky Rose (Aug 30 2024 at 02:33):

For 3, is roc format really the right place for this (as opposed to a new roc annotate)?

- It's more than just formatting, it's adding new source code. Does the formatter do anything similar yet?
- The formatter should have 0 config, and this would now be a CLI argument. Is this enough like config that it breaks that goal?

Sam Mohr (Aug 30 2024 at 02:35):

You're right, this feels a little bit like formatter config. To avoid polluting the roc <subcommand> namespace, do you think that roc format annotate is sufficiently clear?

Sky Rose (Aug 30 2024 at 02:37):

(no opinion, I'll defer to others)

Joshua Warner (Aug 30 2024 at 02:40):

It seems reasonable on the surface to make the formatter add type annotations to all _exported_ functions (in a library), at least

Joshua Warner (Aug 30 2024 at 02:41):

I do imagine it being nice to not add type annotations to everything tho, since it could be handy to not have to update so many annotations during a refactor or something...

Sam Mohr (Aug 30 2024 at 02:42):

I don't think so, unfortunately, for three reasons:

1. Do you annotate aliases or their underlying values?
2. What are type variables named? Just defaulting to a, b, is a good bit worse than useful names.
3. If the developer is halfway through writing a function and saves their program, do we now put a half-baked type signature? What happens when their finished function disagrees with that one?
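To make the type-variable naming concern concrete, here is a minimal sketch of what "just defaulting to a, b" could look like. The `?0`, `?1` placeholders for unnamed inference variables and the function `name_type_vars` are hypothetical, chosen for illustration; they are not how the Roc compiler represents or prints types.

```python
import re
from itertools import count
from string import ascii_lowercase


def default_var_names():
    """Yield a, b, ..., z, then a1, b1, ... as fallback type variable names."""
    for round_ in count():
        for letter in ascii_lowercase:
            yield letter if round_ == 0 else f"{letter}{round_}"


def name_type_vars(signature: str) -> str:
    """Replace each placeholder like ?0 with the next unused default name.

    The same placeholder always maps to the same name, in order of first
    appearance from left to right.
    """
    names = {}
    fresh = default_var_names()

    def replace(match):
        var_id = match.group(0)
        if var_id not in names:
            names[var_id] = next(fresh)
        return names[var_id]

    return re.sub(r"\?\d+", replace, signature)
```

For example, `name_type_vars("List ?0, (?0 -> ?1) -> List ?1")` gives `List a, (a -> b) -> List b`. The result is valid but carries no meaning, whereas a human might have written names such as `elem` and `out`.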

Joshua Warner (Aug 30 2024 at 02:45):

:thinking: Maybe tho, there could be a refactoring tool that would try to identify a set of top-level annotations that, if removed and re-inferred, would cause all the type errors to go away

Sam Mohr (Aug 30 2024 at 02:46):

What if someone put a type signature to have the compiler help them get the annotated function to that type?

Sam Mohr (Aug 30 2024 at 02:47):

But I definitely think if this is possible, it would pretty much just be a net positive to ensure all top-level defs are type-annotated

Joshua Warner (Aug 30 2024 at 02:49):

Anyway, I'm now convinced that if nothing else this is solidly in the Hard Problem category, so likely not something we want to try to have the formatter just do, without a lot of careful thought

Joshua Warner (Aug 30 2024 at 02:50):

> What are type variables named? Just defaulting to a, b, is a good bit worse than useful names.

Would be fun to ship a small (like, really small) language model for something like this :stuck_out_tongue:

Joshua Warner (Aug 30 2024 at 02:51):

Something vaguely along these lines: https://www.microsoft.com/en-us/research/publication/flame-a-small-language-model-for-spreadsheet-formulas/

Joshua Warner (Aug 30 2024 at 02:51):

I actually want to experiment with something like that for doing a better error-tolerant parser

Joshua Warner (Aug 30 2024 at 02:52):

(tangent alert!)

Anton (Aug 30 2024 at 08:31):

> I actually want to experiment with something like that for doing a better error-tolerant parser

That is very interesting, I have argued for an approach like that for error messages before, but it also makes a lot of sense for error-tolerant parsing.

Anton (Aug 30 2024 at 08:41):

Generating training data would be easy; you can take correct programs and mess them up a little.
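A minimal sketch of that idea, assuming character-level corruption is good enough as a starting point: take a correct source string, apply one small random edit, and keep the (broken, correct) pair. The function names `corrupt` and `training_pairs` are made up for illustration, and a real pipeline would probably corrupt at the token level instead.

```python
import random


def corrupt(source: str, rng: random.Random) -> str:
    """Return `source` with one small random character-level edit applied.

    A swap of two identical characters, or a swap at the last position,
    leaves the text unchanged, so callers may want to filter such pairs.
    """
    if not source:
        return source
    i = rng.randrange(len(source))
    edit = rng.choice(["delete", "duplicate", "swap"])
    if edit == "delete":
        return source[:i] + source[i + 1:]
    if edit == "duplicate":
        return source[:i] + source[i] + source[i:]
    # Swap the character at i with its right neighbour, if it has one.
    j = min(i + 1, len(source) - 1)
    chars = list(source)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def training_pairs(source: str, n: int, seed: int = 0):
    """Build n (broken, correct) pairs from one correct program."""
    rng = random.Random(seed)
    return [(corrupt(source, rng), source) for _ in range(n)]
```

Seeding the generator keeps the dataset reproducible across runs.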

Anton (Aug 30 2024 at 08:42):

Although a big risk would be that the same code could be parsed into a different correct program when you release a next version of your language model.

Last updated: Jul 23 2026 at 13:15 UTC