So I found some time today to work on Module Params. I'm starting with the parser changes for the inline import syntax described here.
import Json
import Unicode as Uc
import [Req, Resp] from InternalHttp
import [CodePoint, first] from CodePoint as Cp
import foo : Str from "some-file.txt"
I think it makes sense to treat these as defs because they can appear in the top-level, but also nested in an expression/expect.
My first question is whether it would be appropriate to add a variant to ValueDef
for imports? At first I thought imports aren't "values", but we seem to have Expect
and Dbg
in that union.
If not, I suppose we can change SingleDef.type_or_value
to something like type_or_value_or_import
:smile:
I think they structurally are values if we truly allow them anywhere (though I could see an argument for only allowing them at the top of a block). They are kinda a pattern matching let statement, just over modules.
yeah I think they go in the same category as dbg
and expect
, and should parse the same way those do
I think of them more like statements or declarations than values
Agreed, maybe consider renaming ValueDef if it's appropriate
Cool. So basically, ValueDef
are any “statements” that are not type definitions.
I’m struggling to come up with a name that captures that exactly. I’ll keep it for now and maybe it’ll become apparent as we implement more things in the proposal.
inline imports are convenient I guess but do mean that we need to also support that in load. it's no longer true that after reading the header you know all dependencies, you now need to at least parse the whole file
yep, for sure
although I think that will actually simplify things a bit, and I don't think the performance gains turned out to be significant after all :big_smile:
If we really wanted to, I suppose we could parse the ones between the header and the first non-import def (common case), and start processing those before parsing the rest of the file
but that'd probably complicate things a bit
another possibility would be to have the parser take an "import listener" which it would call every time it parsed an import, so it would immediately enqueue other modules as soon as it encountered them
but really I think the best answer is to make the parser so fast there'd be no noticeable difference between the more complicated strategies and just parsing the whole file and reporting all the imports at the end :big_smile:
yeah I'm not worried about performance, but it is some extra complexity
What is Malformed
for exactly? In the new proposal, module names can't have dots. If we encounter some during parsing should that be treated as a failure or as malformed?
Hm, it looks like we use that for generating runtime errors in codegen. I wonder if we should still try to run if you have a bad import.
I think keeping it is Malformed is good if it's not a lot of effort. It also means type checking can proceed with import errors
the general rule is "always proceed"
my general thinking on this is that an advantage interpreted languages (usually) have over statically compiled languages is that you can always run them no matter what problems there are, meaning you can always attempt to run the program no matter what
so the idea is to try to get that benefit by reporting errors at compile time but still proceeding to run the program if that was requested (e.g. via roc run
) and then generating a crash once we encounter it in practice
so for example, if an invalid import
is located inside a function, it might never come up if you're running a program that never hits that code path
That makes sense, thanks!
That function could even run fine until the point it tries to reference an (incorrectly) imported symbol
How do we want to format multi-line imports? Something like this?
import [
map,
int,
] from Json as J
This feels too whitespace heavy
import
[
map,
int,
]
from Json
as J
Yeah, the first one looks better
Also I think we should only allow newlines and comments inside the exposed members part ([ ]
)
as opposed to...? :thinking:
I mean something like this shouldn't be allowed, right?
import
# TODO: Use new module
DeprecatedJson
(edited: Removed already disallowed dot)
Why not?
Well, it could. I just haven’t seen that before.
Oh wow, Elm allows this
If you can have comments between any of the keywords, we probably need a few different format styles
import Json as J
import
# comment
Json as J
import
# comment
Json
# comment
as J
import [
map,
int
] from Json as J
import
# comment
[
map,
int
]
# comment
from
# comment
Json
# comment
as
# comment
J
I feel like comments should be supported anywhere. A bias, but to me it would feel weird if adding a comment somewhere causes a parse error.
The formatting probably doesn't need to be perfect I think
Yeah, that's a good point
I found is_collection_multiline
under annotation.rs
which we seem to use in other modules for non-annotation purposes. Shouldn't this live in collection.rs
?
I need it in def.rs
but I wasn't sure whether we consider ok to use it there
moving it makes sense to me! :thumbs_up:
there's a good chance it existed before we'd created collection.rs
The parser and formatter for inline imports is mostly done! I started looking into canonicalization now, and realized that introducing inline imports before "Module and privacy changes" is probably a waste of effort since the module system changes significantly, and the new imports are just not compatible with the current system even at the syntax level (e.g. packages shorthand).
I think it probably makes sense to do this wholesale. Let me know what you think.
I would still break it into small reviewable PRs but not to main
.
makes sense to me! :thumbs_up:
Should we sort imports alphabetically like elm format does?
Sorting whole import defs might feel weird, but it might make sense to sort names inside exposing
I think so, yeah!
also capitalized things (so, types and abilities) before non-capitalized things
Cool! And do you agree we shouldn't reorder whole import defs or do you think we should try it?
hm, what would be an example of that?
import B
import A
would be formatted to:
import A
import B
oh interesting...yeah I think let's actually try it
if they're all in a group, at least
like one after another
Gotcha! Yeah, I like that elm format does this. I thought it would not be as useful when imports aren't in a centralized place, but that might still be the case for a lot of modules.
I like it that elm-format
does this too, and I'm glad I wrote it into the original elm-format
spec document! :laughing:
Yeah it is nice to respect line break grouping so I can group certain imports together when I want to but still get sorting within those groups
Re this: https://github.com/roc-lang/roc/issues/6172
Instead of using a special module name parser, I’m leaning towards just using the regular identifier parser and treating this case as malformed. So it wouldn’t fail as a syntax error and we can provide a helpful error message explaining how module names must be capitalized.
Is there a particular design behind only allowing valid ident names for module file and folder names? It came up for me when I wanted to put my AoC into a 2023 folder and 01 subfolder, but they aren't valid idents so that was a parsing error.
well for referring to them in a qualified way
e.g. Foo.Bar.baz
To clarify, that’s not a thing anymore in the new proposal, right?
hm true
well, Bar.baz
still is
so like if you loaded from a module named x-y-z.roc
then referring to that in a qualified way would be x-y-z.baz
which definitely means something else :big_smile:
Is there a bit of a convention here, apps/packages/platform mains are lowercase and other modules are uppercase?
Totally. I suppose we could support that by requiring an alias, but honestly I haven’t had the need to do something like this in Elm.
yeah same
Luke Boswell said:
Is there a bit of a convention here, apps/packages/platform mains are lowercase and other modules are uppercase?
yeah exactly - partly because those can't be imported as modules anyway
so they can be named anything
I was looking at this issue related to imports and realized that I don't know how it's supposed to work.
So if you have an app
or package
you can specify dependency packages, import modules from them, and you can also import interface
modules that use those packages. If you run roc check
(or test) against the app
file, everything works as expected.
However, you can also run roc check
with an interface
file directly, and it works fine as long as you don't import a package. This makes sense because since there's no app
, there's no way it can know what package the shorthand refers to. I imagine the LSP would also have this problem as it tries to check or autocomplete a module.
Is there a plan to resolve the app somehow? Maybe there's a convention for where to find it, or a CLI flag that can be specified?
yeah it could be an app, package, or platform too
certainly we already have a convention of that being main.roc, but of course it doesn't have to be
I think the cli flag option makes sense
LSP could do things like trying main.roc first, and if that's not there, try all the .roc files in the folder (it's probably going to end up reading and parsing them all anyway, so once we have caching that should be cheap) and then remembering which one it concluded was the root module until that file is deleted from the file system
You'd probably need a config option for the LSP too, in case you have multiple apps.
Do you think the CLI could also default to main.roc
or should we fail and require the user to specify when checking a module?
yeah I think multiple apps should be very rare unless they're small scripts that probably don't have interfaces, but of course it could happen :+1:
I think default to main.roc
Yeah, that seems convenient. I suppose it would recursively try to find it from the module's directory and back. I believe this is how node finds node_modules
, for example.
Maybe just looking in the current directory is more sensible, though
oh yeah I forget where we ended up on the interface name appearing at the start of the module or not
if it's in the source file, then we know what directory to look in
but if not, then yeah we'd have to recurse
Well, what we need to find is the app/package file and I don't think the module would refer to that anyway.
I might have misunderstood what you meant
like if I have interface Foo.Bar.Baz
then I know to look up exactly 2 directories (Foo/Bar/
) for the root module
but if that isn't declared at the top of the file, then the root module could be any number of directories up
Ah, I didn't think of that! Yeah, at least in the google doc, it looks like we won't have that anymore.
It doesn't look like that changed from Zulip discussions. At least, I haven't found that.
Personally, I like the idea of using the file name as the source of truth. Having to keep those in sync can be a little annoying in Elm.
I'm now torn, though. It does seem convenient for this use case. I don't care that much about the implementation, recursing would probably be fine, but I think for the user, it might be more intuitive to find the app/package this way.
besides convenience, another theoretical benefit of not having the path there is that multiple root modules in different directories can import the same interface
I've never actually wanted to do this in practice though :sweat_smile:
I just realized that we don’t need to recurse to find the app/package main since there won’t be module folders anymore :big_smile:
oh right! :laughing:
Do we only want to parse record literals for module params or can they be any expression?
In other words, should this be allowed?
params = { echo }
import A params
I guess not, but just to be sure :smile:
only record literals I think
(we can always relax that in the future if there's some specific reason it seems like a good idea)
Sounds good. I thought there might be some use cases where passing a tag could be nice, but I couldn't come up with any obvious ones :D
for testing I can imagine theoretically wanting to pass a record in, but supporting that is more than just a parsing concern
doesn't seem worth assuming it's a good idea :stuck_out_tongue:
I'm back to this now that I have some free time :smiley:
I'm running into a parsing issue I don't have an obvious solution for. Take this example:
users : List { name: Str, age: U8 }
users =
import "users.json" as data : Str
data |> parseJson |> Result.withDefault []
When it's parsing the annotation in data : Str
, it will continue to the next line and parse the intended data
expr identifier as a type variable for Str
, as if it were ... as data : Str data
.
It later fails when it finds the |>
because that doesn't make sense as part of type annotation or as the start of the expression that follows.
Initially, I think that wouldn't happen because I use increment_min_indent
after I find the import
keyword. However, it looks like module::typed_ident
resets it.
Any ideas on how to handle this?
@Joshua Warner may have an idea
We only support Str
and List U8
for ingested files, so I could cheat here, but that seems weird
I'm not sure typed_ident
needs to reset min ident. At first, I thought that was because we want this to parse:
User : {
age: U8,
}
However, given that typed_ident
itself doesn't increment_min_ident
after the name (User
), that would still parse I think.
I think the fault here lies with type_annotation::expression
not enforcing that the second (and later) terms in a type expression must be indented relative to the line the expression starts on
Actually... the fix is fairly simple. This and!
should be an indented_seq!
instead: https://github.com/roc-lang/roc/blob/5d98783cf2767e6d1c96b80ffeca8a1ff3e84637/crates/compiler/parse/src/type_annotation.rs#L389
Oh, cool. I'll try that.
FWIW all tests seem to pass if I remove reset_min_indent
from typed_ident
, and my example parses correctly
Do you think we still need it?
Yeah, I think it's arguable whether that reset_min_indent belongs there
On the one hand, it does make the parser a little more error tolerant if someone accidentally put a newline right after the : and forgot to indent. On the other hand, that someone might equally have forgotten to put a type after the :, and by continuing to parse at the same indent level, we're just going to be accidentally interpreting part of the next line as a type.
Yeah, exactly what I was thinking.
It looks like unlike and!
, indented_seq!
discards the output of $p1
, so it doesn't quite work.
It works more like skip_first!
, which makes sense the other cases use keywords in $p1
We probably want a variant of indented_seq!
that doesn't drop the first argument
actually, we should probably just rename the current one to indented_seq_skip_first
or something, and make a new one that returns both args
It seems to work fine with this change! Thank you :raised_hands:
Joshua Warner said:
On the one hand, it does make the parser a little more error tolerant if someone accidentally put a newline right after the : and forgot to indent. On the other hand, that someone might equally have forgotten to put a type after the :, and by continuing to parse at the same indent level, we're just going to be accidentally interpreting part of the next line as a type.
I'm curious what @Richard Feldman thinks about this, but that's not a blocker.
I'm working on updating load
to work with inline imports. As we've discussed before, now that imports could appear anywhere, we don't know the dependencies until we have actually parsed the whole file (not just the header).
I'm not sure what's the best way to find these after parsing, though. They are available in the AST, but walking through all the expressions to find them seems wasteful, or is that ok? Another option, is to add a Vec
to the parser's state to which we would insert imported module names as we find them. Does that sound like a good idea?
I would suggest walking the AST first and then if it turns out having it in the parser state gets rid of a bottleneck doing that later.
So while building headers in load
, we have a scope_size
variable that gets incremented whenever we find an import and is later used to allocate hashmap.
That all makes sense, but I think what we are incrementing it by is wrong: https://github.com/roc-lang/roc/blob/4569770c82c48b297d6552405023ee16f271cd86/crates/compiler/load_internal/src/file.rs#L4312
I think that should be += exposed.len()
, not += num_exposes
since num_exposes
is actually the number of exposed names from the current module, not those from the import.
I think you're right!
I just realized we can now do this without a special REPL command: https://github.com/roc-lang/roc/issues/609 :smile:
ha, good point!
I'm rewriting unused import warnings because, with inline imports, they now need to happen in can
instead of load
as they're limited to certain scopes.
It looks like currently we don't differentiate between qualified and unqualified symbol references, which means that we don't report exposed symbols from imports, even if they're only used qualified.
app "example"
packages { pf: ".." }
imports [pf.Stdout.{ line }]
# ^^^^ this is unused
provides [main] to pf
main =
Stdout.line "Hello world!"
# ^^^^ symbol used here but qualified
This is a bug, right?
I found tests with these unused exposed names (see Dep2.{ two }
), but I guess those are just leftovers
yeah that's a bug!
I don't think I even considered this scenario when I was writing the unused warnings in like 2019 :sweat_smile:
Makes sense. I'll fix it! :smiley:
thank you, appreciate it! :heart:
I think we need to tweak the copy a bit now that imports can appear inside defs:
unusedModule =
import Dep
"I'm not using Dep"
As I have it now, this would be reported as:
── UNUSED IMPORT ───────────────────────────────────── tmp/unused_imports/Main ─
Nothing from Dep is used in this module.
10│ import Dep
^^^^^^^^^^
Since Dep isn't used, you don't need to import it
Should we use "scope" instead of "module" if it isn't top-level? Is that too technical?
I'd just re-word it to "Dep2 is imported but not used."
Ah cool, that's easy :)
It looks like currently you can import a builtin explicitly and we'll just ignore it. Should we make that a warning? It could be an opportunity to explain how builtins work :smile:
if we do, we might need special handling for the builtins themselves; they actually need to import each other explicitly
Yeah, but that's as easy as .is_builtin()
This is what I'm thinking:
── BUILTIN IMPORT ───────────────────────────────────── tmp/unused_builtins/Main ─
The Str builtin was imported explicitly.
10│ import Str
^^^^^^^^^^
Builtins are imported automatically into every Roc module, so you can remove this line.
You can learn more about builtin modules at <https://www.roc-lang.org/tutorial#builtin-modules>
I don't know, maybe it's not that useful
I think it's reasonable!
in general we tend to warn for things that are definitely unnecessary
I'd say only do it if you feel like it, but from a design perspective it seems like a good idea to me :big_smile:
I've been working on something for my Action-State experiment and stopped to think about this proposal and what it might look like. I'm just wondering what the following might look like in a module-params future?? Note the modules Action
, GUI
, and Core
come from the platform.
interface Counter
exposes [
Counter,
init,
render,
]
imports [ray.Action.{ Action }, ray.Core.{ Color }, ray.GUI.{ Elem }]
Counter := I64
init : I64 -> Counter
init = @Counter
render : Counter, Color -> Elem Counter
render = \@Counter state, color ->
GUI.col [
GUI.button { text: "+", onPress: \@Counter prev -> Action.update (@Counter (prev + 1)) },
GUI.text { label: "Clicked $(Num.toStr state) times", color },
GUI.button { text: "-", onPress: \@Counter prev -> Action.update (@Counter (prev - 1)) },
]
module {Elem,Color,button,text,col,update} -> [Counter, init, render]
Counter := I64
init : I64 -> Counter
init = @Counter
render : Counter, Color -> Elem Counter
render = \@Counter state, color ->
col [
button { text: "+", onPress: \@Counter prev -> update (@Counter (prev + 1)) },
text { label: "Clicked $(Num.toStr state) times", color },
button { text: "-", onPress: \@Counter prev -> update (@Counter (prev - 1)) },
]
I'm not sure it makes sense for most of those pieces to be generic as module params. Module params are really just for dependency injection.
So I would guess most of those inputs would be imported directly. Though I guess if you want your counter to be generic over gui/tui/etc you might do something like that.
Oh, so can we still import modules from the platform?
Like import ray.GUI {col, button, text}
etc?
I think it will be import ray.Gui exposing {col, button, text}
though not exactly sure the final syntax
Yeah, your local modules can import things from the platform, packages cannot
Brendan Hansknecht said:
I think it will be
import ray.Gui exposing {col, button, text}
That’s correct, except we are using square brackets :smile:
import ray.Gui exposing [col, button, text]
I just got ingested files working for the new imports. I had to move some code from load/file to can/def which depended on the module's path, and I ended up having to thread that through a ton of can functions: diff
Should I add that to can's Env
instead? I'm not sure how we decide what goes in there
I have an automatic fix for the import syntax working:
CleanShot-2024-01-27-at-15.42.05.mp4
I think this will help with the transition, given every single Roc file will be affected by it.
yoooooooo
that's super nice!!!
great stuff, @Agus Zubiaga!
I'm now working on the new headers keywords. I think we can also autofix a lot of that. Not everything though, since some of the semantics changed.
what do you think of trying out having them be records instead of keywords?
Interesting. I'll reply in that thread.
Hey all! @Luke Boswell asked me how this is going and I thought I would post an update publicly too.
I haven’t had a lot of time outside of work lately, so I have been moving slowly. I should start getting more free time this week.
Last time I touched it, I had the new inline imports working with all its features and the formatter would automatically upgrade header imports to this new syntax.
I also started working on the syntax changes to the headers themselves, and formatter upgrade mechanism for those too. I’m hoping I’ll make some good progress on that this weekend.
While the parser changes are easy, updating the thousands of mini roc files across tests and examples takes a significant part of the time. I can have the formatter do this automatically for some, but not for any of the test_syntax
snapshot tests because we want the originals to be unformatted, so we can test how the formatter affects them.
This is often creates conflicts with changes on main
too.
After this is all done, I’ll work on the privacy changes listed in the proposal and fixing some of the bugs in the module system in general.
The actual work on implementing module params will likely start sometime in March.
ooh, if there's something that requires lots of mechanical stuff with not much thought, I'd love to volunteer to help! :smiley:
that fits very well with the "Dad has 15 minutes free and can't realistically do anything that requires loading lots of context, but could knock out a bunch of mechanical work in that time" situation I very often find myself in these days :laughing:
Haha nice, good to know! Let’s see how these days go and maybe I’ll push parser changes without necessary fixing everything
cool, lmk anytime! And welcome back, exciting to hear about your plans on this! :smiley:
Ok, the interface
to module
syntax change is done. The old header is still supported and roc format
updates it automatically like with imports
.
sweeeeeeeet!!!
Now working on the new app
header with the syntax discussed here
CleanShot-2024-02-25-at-16.03.24.gif
Got the new app headers working today
I fixed most of the tests, but I stumbled into something weird:
app "quicksort" provides [swap, partition, partitionHelp, quicksort] to "./platform"
That app
fixture uses a platform that's not specified in packages
-which apparently is something you can do-, but the platform doesn't exist, yet the app is loaded fine
I'm guessing it was one of the first, and is now a bit of a legacy thing. But I don't know.
Yeah, it looks like the platform is ignored if you use that syntax. In the new app
header, there's no provides
keyword, so you have to specify the platform in the packages.
I created a fake platform that'd take those functions in the fixture, but it looks like the helpers in test_load
break if a platform is actually loaded :upside_down:
hm, that test seems redundant with https://github.com/roc-lang/roc/blob/f55df4bf6b99da1064abd98485516ee30ec739c1/crates/cli_testing_examples/algorithms/quicksort.roc
maybe we can just delete it?
we'll certainly get a test failure if that quicksort example breaks, and I'm not sure why 2 failures would be more helpful than 1 :big_smile:
Ah, good point. Thanks!
this is looking AMAZING btw!!! :heart_eyes:
super exciting to see!
Yeah, I think it looks so much cleaner too.
I just realized that there's a special way to provide types to the platform in app
headers:
app "test"
packages { pf: "./platform" }
provides [quicksort] { Flags, Model } to pf
# ^^^^^^^^^^^^^^^^
How is this used in practice?
I think this wouldn't work with the new syntax, and I wonder if we are using this or if it's legacy. I hadn't seen it before.
Maybe they can be provided amongst the values app [quicksort, Flags, Model]
like module
exposed names
But I still don't really get what these are for, couldn't the provided functions just have type vars?
yeah it's a workaround
we shouldn't need it anymore once we can pass closures to hosts
(or rather, once we can correctly generate glue for doing that)
Ah, I see. So we should keep it for now, right?
yeah I think some applications probably rely on it right now
I'd like to drop it eventually though
What syntax should we use?
maybe we can just accept uppercase identifiers in the square brackets for what the app provides to the platform :thinking:
Sounds good, I'll do that.
Technically, we already allow uppercase idents in the square brackets, because it just uses exposes_entry
: source
I'll deal with it in load
nice, sounds good!
Pasting some new errors here in case people have feedback:
── UNSPECIFIED PLATFORM in tmp/no_platform_specified/Test.roc ──────────────────
This app does not specify a platform:
1│> app [main] {
2│> json: "../json/main.roc"
3│> }
Every Roc app must select a package to use as its platform, like this:
app [main] {
pf: platform "…path or URL to platform…"
}
Tip: See an example in the tutorial: <https://www.roc-lang.org/tutorial#building-an-application>
── MULTIPLE PLATFORMS in tmp/multiple_platforms_specified/Test.roc ─────────────
This app specifies multiple packages as `platform`:
1│> app [main] {
2│> cli: platform "../cli/main.roc",
3│> web: platform "../web/main.roc",
4│> }
Roc apps must specify exactly one platform.
Maybe there’s something more insightful to say about the “multiple platforms” one
We should have a FAQ entry for that. I can write one and we can link to it!
Nice, that’d be great!
New package
headers are done!
I'm going to start working on the "Module and package privacy changes" section now
I still have to update the platform
header, but I'll do that one later since that's not just a syntax change.
The proposal indicates that it should be possible for nested packages to inherit packages from its parent:
package [ParserCore, ParserCsv, ParserStr]
packages [
JsonDecode from "https://…/json/…",
CodePt, Segment from "https://…/unicode/…",
Foo, Bar as Blah, Baz from ..
]
However, we later decided we need to keep the package shorthands, so what syntax should be use to inherit packages?
Maybe something like this:
package [ParserCore, ParserCsv, ParserStr] {
json: "https://…/json/…",
unicode: "https://…/unicode/…",
foo: inherit,
bar: inherit,
}
This is similar to the platform
indicator in the app
header
Another option:
package [ParserCore, ParserCsv, ParserStr] {
json: "https://…/json/…",
unicode: "https://…/unicode/…",
foo,
bar,
}
Or foo: ..
I like the second option, just foo,
bar,
In Elm you can do this:
import Maybe
import Maybe.Extra as Maybe
I used to like this at first because it felt like extending the module, but after a few years using the language, I stopped doing this because it makes it hard to figure out which of the modules a function is coming from.
Do we want Roc to support this or should we make it an error?
This is what I'm thinking:
── IMPORT ALIAS CONFLICT in tmp/duplicate_alias/Main.roc ───────────────────────
MaybeExtra was imported as Maybe
4│ import MaybeExtra as Maybe
^^^^^
but Maybe is also the name of another module:
3│ import Maybe
Using the same name for both can make it hard to tell which one you're referring to.
Make sure each import has a unique alias or none at all.
Similarly we wouldn't allow the same alias to be used for two modules:
── IMPORT ALIAS CONFLICT in tmp/duplicate_alias/Main.roc ───────────────────────
One was imported as D:
3│ import One as D
^
but the same alias was also used for Two:
4│ import Two as D
^
Each import should have a unique alias or none at all.
yeah totally agree
Now that I'm pretty close to finish the rest of the module system changes, I want to start thinking about ways to implement the actual module params
In what stage of the compiler should we deal with them? Is it something we can desugar at canonicalization?
The simplest approach to me would be to desguar by currying all the functions of the module with the params
I wonder what are the implications of that to performance and such
I also thought we could generate globals that we set basically where the import is. However, I think that would require generating a module per import which might result in big binaries.
I think it has to be arguments for inline imports
especially considering you could theoretically import the same module multiple times with as
:big_smile:
and have them be in scope at the same time
Yeah, exactly. That's why I think the globals approach would require emitting the same module multiple times.
Arguments should be fine, right?
I think so yeah
so then the next question is what specifically goes in those arguments
Maybe instead of actually currying the function, we extend its arguments with the module params record
yeah that's what I was thinking
and constants would become functions
so every function gets exactly 1 extra argument, and it's always a pointer to a record that lives on the stack
I guess it's not worth it to make each field of the record an argument, right? That should be optimized already
yeah I think it'll be better to pass it as a pointer
so the next question is when to build up that record
because if I'm calling 3 functions in a different module (the same module) in a row, I don't want to have to assemble module params records for that module 3 times
because they'll be the same each time! I'd rather just assemble it once and pass it in 3 times
If the import is inside a function, I guess we would create it whereever the import is. But it's trickier for top-level imports.
yeah so my thinking there is that we can essentially keep track in each scope like an VecMap<ModuleId, ModuleParams>
Yeah, or VecMap<ModuleName, ModuleParams>
which we populate once we've created the module params for a given module
haha exactly
and then yeah, we can just go grab those (or maybe just a Symbol
for them) if we already have them
I think it'd be ModuleName
instead of ModuleId
because you can have the same ModuleId
multiple times with different aliases
ah true!
I suspect we'd want to do this during monomorphization
because at that point we're traversing all the canonical nodes and then translating them into a different function format
and also transforming all the call sites as well
so right when we're doing both of those, we can add the extra param to the function definitions and the extra arg at the call sites
I think this would also be an excellent time to add another parameter (and another argument at the call sites) for Allocators
because we want to change it so that hosts pass in a pointer to an Allocator struct instead of having roc_alloc
, roc_dealloc
, etc. be top-level symbols
oh, I see
we can actually populate those and wire them up separately, but it seems like a good time to just go ahead and add 2 extra pointer arguments instead of 1
while in the process of doing exactly that to all the functions :big_smile:
I think we can also just uncritically add them, because I believe one of LLVM's optimizations is unused argument elimination
Yeah, it's like allocators are module params or I guess app params :smiley:
so if we pass them in and then don't use them, they should just get eliminated anyway
Ah, I see. I was also thinking whether we wanted to detect whether a param was used in a def to decide whether we should add the args.
Cool
so then I think the only other thing is needing to destructure the records at the top of each function
because they'll be referred to in canonicalization as like foo
rather than record.foo
I guess destructuring is dealt with earlier than mono, right?
so we need foo = record.foo
somewhere in there
I think we could do it in mono too
a nice thing about doing it in mono is that then roc check
doesn't pay for it :big_smile:
Yeah, we still have to type check them, so it's definitely going to involve some work in can for that
for sure :thumbs_up:
yeah I just mean the part about pulling them out of the record in the arg - that's just a code gen detail we need to take care of
I'm curious what @Brendan Hansknecht and @Folkert de Vries think about this in general!
Ah, I see
So if we have a top-level import. When is the record constructed? Is it like constants?
There's no obvious time to do it, right?
well the caller constructs it
By that you mean the expression that references a module function?
Because in that case, the record would be constructed every time you do that, right?
yeah so like if I've imported module Foo
and given it { blah }
and I call Foo.bar
, then I have to construct the { blah }
function before I can call Foo.bar
, because Foo.bar
will now be expecting a pointer to a { blah }
record as its last argument
and that pointer is going to be on my stack
or rather I'm passing it a pointer to a record that's on my stack
and Foo.bar
knows that it accepts a { blah }
record specifically because that's what's declared at the top of teh Foo
module
Agus Zubiaga said:
Because in that case, the record would be constructed every time you do that, right?
the VecMap<ModuleName, ModuleParams>
thing from earlier can prevent constructing it multiple times
like we cache it in scope so that if we already have one constructed in scope, we just use that instead of constructing another one
Right, exactly. But only inside of a function, right?
oh you mean like what if I have foo = if ...
at the top level, and it's not defined as a function?
as in, foo
itself is not a function
No, I mean from the caller side. If I have something like:
module [...]
import Foo { blah }
bar =
Foo.bar + Foo.bar
baz =
Foo.bar
I can cache the construction of { blah }
inside bar
, but when I call baz
I would have to construct it again.
ahh I see
yeah I agree, and I think that's fine
Unless we constructed {blah}
before anything in the program runs
in the future we can do compile-time evaluation of constants to eliminate that
yeah exactly
but in the meantime top-level non-functions always re-evaluate everything at runtime and I think that's fine (for now)
oh I guess we'll need to add the extra pointer arguments to those generated top-level thunks too!
since that's how we compile them today (top-level "constants" get compiled into 0-argument thunks during mono)
Right, exactly. That what I meant by "it's like constants" earlier :smiley:
So all defs in a module with params get a new argument
Well, I guess it's just the top-level ones. Because the nested ones already have it available from the parent.
yep!
well, 2 new arguments in anticipation of allocators, but yeah :big_smile:
right
Yeah, all this sounds fine. Given this is for "slow" io anyway, I think it is better to just keep it simple as a pointer to struct of function pointers instead of thinking about any form of monomorphizing it away.
I have minor questions about what llvm would optimize best in certain cases, but nothing I have meanful data to decide between (arg order for example)
I think my biggest question that is hard to know the answer to is:
1 pointer with allocators and imported function (will need to be reconstructed more often, though it is just cheap copying of a few function pointers)
Or, 2 pointers, one for each. Allocators never need to be reconstructed then. But you are forever pinning two registers to hold the pointers.
Third option:
Use 1 pointer, but nest an extra level. So the module params hold the allocators or vice versa. The allocators would still be stored on the stack, just an extra pointer jump to access them.
Very solid chance that this is the best option. Allocation is a relatively rare and slow event. An extra indirection contained on the stack should be safe. Freeing that register on x86 and in all function calls is probably worth it. When creating a new module param set, just copy a single pointer over.
yeah if we did that option it would have to be allocators hold the module params
because the allocator struct has to be created in the host, potentially in the host's stack
if you think that's most likely the best option, then maybe we should just plan to add the 1 param for module params
and then later we can modify the contents of that pointer to do allocator things
It's definitely a gut feeling especially given the extra arg shuffling that might be needed otherwise.
Also, I think it has to be module param holds the allocator (as t least if we want life to be simple).
Allocator will never change (in the scope of a single call to roc). Module param will change potentially on every function call. If we update the the module params, it would update globally if we are updating the module params within the global allocator allocated on the host. As such whenever returning from a function we would have to revert that.
If module params hold the allocator, whenever we create a new module param struct, we just pass an extra pointer in for the allocator. Grab it from the old module param struct.
Just need a special helper for each entry from host to roc or to make the host pass in a **allocator
for that to all work.
good point! :thumbs_up:
I'm trying to fix the REPL in my branch, but I'm confused about where its platform is
I can see we prepend this to the expression: https://github.com/roc-lang/roc/blob/c74685f74aa4aee259b3b30f2792d25981a48dee/crates/repl_eval/src/gen.rs#L175
but where is this ./platform
?
I think we just load it as a shared library, right. So it is more ingrained into the compiler with some sort of libloading call
Right, but don't we still need a platform roc file?
This fails in my branch because it can't find ./platform
, but somehow it doesn't in main
Not necessarily. It may be more adhoc like all of the gen tests.
But I don't recall for sure. Didn't write the code but definitely dug into it before.
Yeah, I fixed the gen tests by making a real platform roc file: https://github.com/roc-lang/roc/pull/6645/commits/d3efc751e2e93d4b5d739905702678a0dfb3fe4c
I could do the same with the REPL I suppose, but I guess I'd like to understand what this mythical ./platform
path does :smiley:
Was it exploiting a bug in load?
Probably related to LoadStart::from_str
Custom path that I think is only used by the repl
On phone, really hard to dig in
Yeah, no worries
I think we are basically ignoring the platform path when it's specified in provides
rather than in packages
It's not in packages
here, so it's never loaded: https://github.com/roc-lang/roc/blob/c74685f74aa4aee259b3b30f2792d25981a48dee/crates/compiler/load_internal/src/file.rs#L4054-L4064
So I think it's basically exploiting this in the REPL and tests to generate the shared lib without specifying a real platform
In the new app headers, we don't have this "provides to NewPackage
" syntax
Maybe I should allow app [main] { }
for the REPL and tests
Last updated: Jul 06 2025 at 12:14 UTC