module params · compiler development

So I found some time today to work on Module Params. I'm starting with the parser changes for the inline import syntax described here.

import Json
import Unicode as Uc
import [Req, Resp] from InternalHttp
import [CodePoint, first] from CodePoint as Cp
import foo : Str from "some-file.txt"

I think it makes sense to treat these as defs because they can appear in the top-level, but also nested in an expression/expect.

My first question is whether it would be appropriate to add a variant to ValueDef for imports? At first I thought imports aren't "values", but we seem to have Expect and Dbg in that union.

If not, I suppose we can change SingleDef.type_or_value to something like type_or_value_or_import :smile:

Brendan Hansknecht (Nov 25 2023 at 15:40):

I think they structurally are values if we truly allow them anywhere (though I could see an argument for only allowing them at the top of a block). They are kinda a pattern matching let statement, just over modules.

Richard Feldman (Nov 25 2023 at 16:04):

yeah I think they go in the same category as dbg and expect, and should parse the same way those do

Richard Feldman (Nov 25 2023 at 16:05):

Ayaz Hafiz (Nov 25 2023 at 16:17):

Agus Zubiaga (Nov 25 2023 at 17:21):

Cool. So basically, ValueDef are any “statements” that are not type definitions.

Agus Zubiaga (Nov 25 2023 at 17:31):

I’m struggling to come up with a name that captures that exactly. I’ll keep it for now and maybe it’ll become apparent as we implement more things in the proposal.

Folkert de Vries (Nov 25 2023 at 21:40):

inline imports are convenient I guess but do mean that we need to also support that in load. it's no longer true that after reading the header you know all dependencies, you now need to at least parse the whole file

Richard Feldman (Nov 25 2023 at 21:41):

although I think that will actually simplify things a bit, and I don't think the performance gains turned out to be significant after all :big_smile:

Agus Zubiaga (Nov 25 2023 at 21:46):

If we really wanted to, I suppose we could parse the ones between the header and the first non-import def (common case), and start processing those before parsing the rest of the file

Agus Zubiaga (Nov 25 2023 at 21:47):

Richard Feldman (Nov 25 2023 at 21:53):

another possibility would be to have the parser take an "import listener" which it would call every time it parsed an import, so it would immediately enqueue other modules as soon as it encountered them

Richard Feldman (Nov 25 2023 at 21:54):

but really I think the best answer is to make the parser so fast there'd be no noticeable difference between the more complicated strategies and just parsing the whole file and reporting all the imports at the end :big_smile:

Folkert de Vries (Nov 25 2023 at 22:03):

Agus Zubiaga (Nov 26 2023 at 23:41):

What is Malformed for exactly? In the new proposal, module names can't have dots. If we encounter some during parsing should that be treated as a failure or as malformed?

Agus Zubiaga (Nov 26 2023 at 23:45):

Hm, it looks like we use that for generating runtime errors in codegen. I wonder if we should still try to run if you have a bad import.

Ayaz Hafiz (Nov 27 2023 at 00:20):

I think keeping it is Malformed is good if it's not a lot of effort. It also means type checking can proceed with import errors

Richard Feldman (Nov 27 2023 at 00:32):

Richard Feldman (Nov 27 2023 at 00:34):

my general thinking on this is that an advantage interpreted languages (usually) have over statically compiled languages is that you can always run them no matter what problems there are, meaning you can always attempt to run the program no matter what

Richard Feldman (Nov 27 2023 at 00:34):

so the idea is to try to get that benefit by reporting errors at compile time but still proceeding to run the program if that was requested (e.g. via roc run) and then generating a crash once we encounter it in practice

Richard Feldman (Nov 27 2023 at 00:35):

so for example, if an invalid import is located inside a function, it might never come up if you're running a program that never hits that code path

Agus Zubiaga (Nov 27 2023 at 01:28):

Agus Zubiaga (Nov 27 2023 at 01:30):

That function could even run fine until the point it tries to reference an (incorrectly) imported symbol

Agus Zubiaga (Nov 29 2023 at 13:18):

import [
     map,
     int,
] from Json as J

Agus Zubiaga (Nov 29 2023 at 13:19):

import
  [
     map,
     int,
  ]
  from Json
  as J

Anton (Nov 29 2023 at 13:19):

Agus Zubiaga (Nov 29 2023 at 13:29):

Also I think we should only allow newlines and comments inside the exposed members part ([ ])

Richard Feldman (Nov 29 2023 at 13:38):

Agus Zubiaga (Nov 29 2023 at 13:39):

import
   # TODO: Use new module
   DeprecatedJson

Agus Zubiaga (Nov 29 2023 at 13:40):

Ayaz Hafiz (Nov 29 2023 at 14:16):

Agus Zubiaga (Nov 29 2023 at 15:37):

Agus Zubiaga (Nov 29 2023 at 15:38):

Agus Zubiaga (Nov 29 2023 at 15:46):

If you can have comments between any of the keywords, we probably need a few different format styles

Agus Zubiaga (Nov 29 2023 at 15:50):

import Json as J
import
  # comment
  Json as J

import
  # comment
  Json
  # comment
  as J

import [
  map,
  int
] from Json as J

import
  # comment
  [
    map,
    int
  ]
  # comment
  from
    # comment
    Json
  # comment
  as
    # comment
    J

Ayaz Hafiz (Nov 29 2023 at 15:56):

I feel like comments should be supported anywhere. A bias, but to me it would feel weird if adding a comment somewhere causes a parse error.

Ayaz Hafiz (Nov 29 2023 at 15:56):

Agus Zubiaga (Nov 29 2023 at 15:57):

Agus Zubiaga (Nov 29 2023 at 21:09):

I found is_collection_multiline under annotation.rs which we seem to use in other modules for non-annotation purposes. Shouldn't this live in collection.rs?

Agus Zubiaga (Nov 29 2023 at 21:10):

Richard Feldman (Nov 30 2023 at 01:35):

Richard Feldman (Nov 30 2023 at 01:36):

Agus Zubiaga (Dec 01 2023 at 13:55):

The parser and formatter for inline imports is mostly done! I started looking into canonicalization now, and realized that introducing inline imports before "Module and privacy changes" is probably a waste of effort since the module system changes significantly, and the new imports are just not compatible with the current system even at the syntax level (e.g. packages shorthand).

I think it probably makes sense to do this wholesale. Let me know what you think.

Agus Zubiaga (Dec 01 2023 at 13:56):

Richard Feldman (Dec 01 2023 at 14:16):

Agus Zubiaga (Dec 03 2023 at 01:33):

Agus Zubiaga (Dec 03 2023 at 01:35):

Sorting whole import defs might feel weird, but it might make sense to sort names inside exposing

Richard Feldman (Dec 03 2023 at 01:42):

Richard Feldman (Dec 03 2023 at 01:43):

Agus Zubiaga (Dec 03 2023 at 01:44):

Cool! And do you agree we shouldn't reorder whole import defs or do you think we should try it?

Richard Feldman (Dec 03 2023 at 01:45):

Agus Zubiaga (Dec 03 2023 at 01:45):

import B
import A

import A
import B

Richard Feldman (Dec 03 2023 at 01:46):

Agus Zubiaga (Dec 03 2023 at 01:47):

Gotcha! Yeah, I like that elm format does this. I thought it would not be as useful when imports aren't in a centralized place, but that might still be the case for a lot of modules.

Richard Feldman (Dec 03 2023 at 02:51):

I like it that elm-format does this too, and I'm glad I wrote it into the original elm-format spec document! :laughing:

Pearce Keesling (Dec 03 2023 at 04:03):

Yeah it is nice to respect line break grouping so I can group certain imports together when I want to but still get sorting within those groups

Agus Zubiaga (Dec 04 2023 at 00:24):

Instead of using a special module name parser, I’m leaning towards just using the regular identifier parser and treating this case as malformed. So it wouldn’t fail as a syntax error and we can provide a helpful error message explaining how module names must be capitalized.

Luke Boswell (Dec 04 2023 at 00:27):

Is there a particular design behind only allowing valid ident names for module file and folder names? It came up for me when I wanted to put my AoC into a 2023 folder and 01 subfolder, but they aren't valid idents so that was a parsing error.

Richard Feldman (Dec 04 2023 at 00:28):

Agus Zubiaga (Dec 04 2023 at 00:31):

Richard Feldman (Dec 04 2023 at 00:36):

so like if you loaded from a module named x-y-z.roc then referring to that in a qualified way would be x-y-z.baz which definitely means something else :big_smile:

Luke Boswell (Dec 04 2023 at 00:42):

Is there a bit of a convention here, apps/packages/platform mains are lowercase and other modules are uppercase?

Agus Zubiaga (Dec 04 2023 at 00:42):

Totally. I suppose we could support that by requiring an alias, but honestly I haven’t had the need to do something like this in Elm.

Richard Feldman (Dec 04 2023 at 00:42):

Richard Feldman (Dec 04 2023 at 00:43):

Agus Zubiaga (Dec 06 2023 at 12:25):

I was looking at this issue related to imports and realized that I don't know how it's supposed to work.

So if you have an app or package you can specify dependency packages, import modules from them, and you can also import interface modules that use those packages. If you run roc check (or test) against the app file, everything works as expected.

However, you can also run roc check with an interface file directly, and it works fine as long as you don't import a package. This makes sense because since there's no app, there's no way it can know what package the shorthand refers to. I imagine the LSP would also have this problem as it tries to check or autocomplete a module.

Is there a plan to resolve the app somehow? Maybe there's a convention for where to find it, or a CLI flag that can be specified?

Richard Feldman (Dec 06 2023 at 12:28):

certainly we already have a convention of that being main.roc, but of course it doesn't have to be

Richard Feldman (Dec 06 2023 at 12:29):

Richard Feldman (Dec 06 2023 at 12:30):

LSP could do things like trying main.roc first, and if that's not there, try all the .roc files in the folder (it's probably going to end up reading and parsing them all anyway, so once we have caching that should be cheap) and then remembering which one it concluded was the root module until that file is deleted from the file system

Agus Zubiaga (Dec 06 2023 at 12:31):

You'd probably need a config option for the LSP too, in case you have multiple apps.

Agus Zubiaga (Dec 06 2023 at 12:32):

Do you think the CLI could also default to main.roc or should we fail and require the user to specify when checking a module?

Richard Feldman (Dec 06 2023 at 12:33):

yeah I think multiple apps should be very rare unless they're small scripts that probably don't have interfaces, but of course it could happen :+1:

Richard Feldman (Dec 06 2023 at 12:34):

Agus Zubiaga (Dec 06 2023 at 12:35):

Yeah, that seems convenient. I suppose it would recursively try to find it from the module's directory and back. I believe this is how node finds node_modules, for example.

Agus Zubiaga (Dec 06 2023 at 12:37):

Richard Feldman (Dec 06 2023 at 12:44):

oh yeah I forget where we ended up on the interface name appearing at the start of the module or not

Richard Feldman (Dec 06 2023 at 12:44):

Agus Zubiaga (Dec 06 2023 at 12:45):

Well, what we need to find is the app/package file and I don't think the module would refer to that anyway.

Agus Zubiaga (Dec 06 2023 at 12:46):

Richard Feldman (Dec 06 2023 at 12:50):

like if I have interface Foo.Bar.Baz then I know to look up exactly 2 directories (Foo/Bar/) for the root module

Richard Feldman (Dec 06 2023 at 12:50):

but if that isn't declared at the top of the file, then the root module could be any number of directories up

Agus Zubiaga (Dec 06 2023 at 12:51):

Ah, I didn't think of that! Yeah, at least in the google doc, it looks like we won't have that anymore.

Agus Zubiaga (Dec 06 2023 at 12:54):

It doesn't look like that changed from Zulip discussions. At least, I haven't found that.

Agus Zubiaga (Dec 06 2023 at 12:57):

Personally, I like the idea of using the file name as the source of truth. Having to keep those in sync can be a little annoying in Elm.

Agus Zubiaga (Dec 06 2023 at 13:02):

I'm now torn, though. It does seem convenient for this use case. I don't care that much about the implementation, recursing would probably be fine, but I think for the user, it might be more intuitive to find the app/package this way.

Richard Feldman (Dec 06 2023 at 13:13):

besides convenience, another theoretical benefit of not having the path there is that multiple root modules in different directories can import the same interface

Richard Feldman (Dec 06 2023 at 13:26):

Agus Zubiaga (Dec 10 2023 at 04:04):

I just realized that we don’t need to recurse to find the app/package main since there won’t be module folders anymore :big_smile:

Richard Feldman (Dec 10 2023 at 04:08):

Agus Zubiaga (Dec 10 2023 at 04:26):

Do we only want to parse record literals for module params or can they be any expression?

params = { echo }

import A params

Agus Zubiaga (Dec 10 2023 at 04:27):

Richard Feldman (Dec 10 2023 at 04:31):

(we can always relax that in the future if there's some specific reason it seems like a good idea)

Agus Zubiaga (Dec 10 2023 at 04:32):

Sounds good. I thought there might be some use cases where passing a tag could be nice, but I couldn't come up with any obvious ones :D

Richard Feldman (Dec 10 2023 at 04:36):

for testing I can imagine theoretically wanting to pass a record in, but supporting that is more than just a parsing concern

Richard Feldman (Dec 10 2023 at 04:36):

Agus Zubiaga (Dec 26 2023 at 18:18):

I'm running into a parsing issue I don't have an obvious solution for. Take this example:

users : List { name: Str, age: U8 }
users =
    import "users.json" as data : Str
    data |> parseJson |> Result.withDefault []

When it's parsing the annotation in data : Str, it will continue to the next line and parse the intended data expr identifier as a type variable for Str, as if it were ... as data : Str data.

It later fails when it finds the |> because that doesn't make sense as part of type annotation or as the start of the expression that follows.

Initially, I think that wouldn't happen because I use increment_min_indent after I find the import keyword. However, it looks like module::typed_ident resets it.

Anton (Dec 26 2023 at 18:24):

Agus Zubiaga (Dec 26 2023 at 18:28):

We only support Str and List U8 for ingested files, so I could cheat here, but that seems weird

Agus Zubiaga (Dec 26 2023 at 18:36):

I'm not sure typed_ident needs to reset min ident. At first, I thought that was because we want this to parse:

User : {
  age: U8,
}

However, given that typed_ident itself doesn't increment_min_ident after the name (User), that would still parse I think.

Joshua Warner (Dec 26 2023 at 18:50):

I think the fault here lies with type_annotation::expression not enforcing that the second (and later) terms in a type expression must be indented relative to the line the expression starts on

Joshua Warner (Dec 26 2023 at 18:53):

Agus Zubiaga (Dec 26 2023 at 18:55):

FWIW all tests seem to pass if I remove reset_min_indent from typed_ident, and my example parses correctly

Agus Zubiaga (Dec 26 2023 at 18:55):

Joshua Warner (Dec 26 2023 at 18:55):

Joshua Warner (Dec 26 2023 at 18:57):

On the one hand, it does make the parser a little more error tolerant if someone accidentally put a newline right after the : and forgot to indent. On the other hand, that someone might equally have forgotten to put a type after the :, and by continuing to parse at the same indent level, we're just going to be accidentally interpreting part of the next line as a type.

Agus Zubiaga (Dec 26 2023 at 19:03):

Agus Zubiaga (Dec 26 2023 at 19:05):

It works more like skip_first!, which makes sense the other cases use keywords in $p1

Joshua Warner (Dec 26 2023 at 19:06):

We probably want a variant of indented_seq! that doesn't drop the first argument

Joshua Warner (Dec 26 2023 at 19:07):

actually, we should probably just rename the current one to indented_seq_skip_first or something, and make a new one that returns both args

Agus Zubiaga (Dec 26 2023 at 19:11):

Agus Zubiaga (Dec 26 2023 at 19:13):

Agus Zubiaga (Dec 27 2023 at 21:23):

I'm working on updating load to work with inline imports. As we've discussed before, now that imports could appear anywhere, we don't know the dependencies until we have actually parsed the whole file (not just the header).

I'm not sure what's the best way to find these after parsing, though. They are available in the AST, but walking through all the expressions to find them seems wasteful, or is that ok? Another option, is to add a Vec to the parser's state to which we would insert imported module names as we find them. Does that sound like a good idea?

Ayaz Hafiz (Dec 27 2023 at 21:31):

I would suggest walking the AST first and then if it turns out having it in the parser state gets rid of a bottleneck doing that later.

Agus Zubiaga (Dec 28 2023 at 19:59):

So while building headers in load, we have a scope_size variable that gets incremented whenever we find an import and is later used to allocate hashmap.

I think that should be += exposed.len(), not += num_exposes since num_exposes is actually the number of exposed names from the current module, not those from the import.

Richard Feldman (Dec 28 2023 at 20:12):

Agus Zubiaga (Dec 29 2023 at 18:16):

Richard Feldman (Dec 29 2023 at 18:17):

Agus Zubiaga (Jan 11 2024 at 01:35):

I'm rewriting unused import warnings because, with inline imports, they now need to happen in can instead of load as they're limited to certain scopes.

It looks like currently we don't differentiate between qualified and unqualified symbol references, which means that we don't report exposed symbols from imports, even if they're only used qualified.

app "example"
    packages { pf: ".." }
    imports [pf.Stdout.{ line }]
                       # ^^^^ this is unused
    provides [main] to pf

main =
    Stdout.line "Hello world!"
    #      ^^^^ symbol used here but qualified

Agus Zubiaga (Jan 11 2024 at 01:59):

I found tests with these unused exposed names (see Dep2.{ two }), but I guess those are just leftovers

Richard Feldman (Jan 11 2024 at 02:01):

I don't think I even considered this scenario when I was writing the unused warnings in like 2019 :sweat_smile:

Agus Zubiaga (Jan 11 2024 at 02:07):

Richard Feldman (Jan 11 2024 at 02:08):

Agus Zubiaga (Jan 11 2024 at 02:12):

I think we need to tweak the copy a bit now that imports can appear inside defs:

unusedModule =
    import Dep
    "I'm not using Dep"

── UNUSED IMPORT ───────────────────────────────────── tmp/unused_imports/Main ─

Nothing from Dep is used in this module.

10│      import Dep
         ^^^^^^^^^^

Since Dep isn't used, you don't need to import it

Should we use "scope" instead of "module" if it isn't top-level? Is that too technical?

Richard Feldman (Jan 11 2024 at 02:13):

Agus Zubiaga (Jan 11 2024 at 02:14):

Agus Zubiaga (Jan 11 2024 at 22:14):

It looks like currently you can import a builtin explicitly and we'll just ignore it. Should we make that a warning? It could be an opportunity to explain how builtins work :smile:

Richard Feldman (Jan 11 2024 at 22:18):

if we do, we might need special handling for the builtins themselves; they actually need to import each other explicitly

Agus Zubiaga (Jan 11 2024 at 22:19):

Agus Zubiaga (Jan 11 2024 at 22:22):

── BUILTIN IMPORT ───────────────────────────────────── tmp/unused_builtins/Main ─

The Str builtin was imported explicitly.

10│      import Str
         ^^^^^^^^^^

Builtins are imported automatically into every Roc module, so you can remove this line.

You can learn more about builtin modules at <https://www.roc-lang.org/tutorial#builtin-modules>

Agus Zubiaga (Jan 11 2024 at 22:24):

Richard Feldman (Jan 12 2024 at 01:49):

I'd say only do it if you feel like it, but from a design perspective it seems like a good idea to me :big_smile:

Luke Boswell (Jan 19 2024 at 02:14):

I've been working on something for my Action-State experiment and stopped to think about this proposal and what it might look like. I'm just wondering what the following might look like in a module-params future?? Note the modules Action, GUI, and Core come from the platform.

Current

interface Counter
    exposes [
        Counter,
        init,
        render,
    ]
    imports [ray.Action.{ Action }, ray.Core.{ Color }, ray.GUI.{ Elem }]

Counter := I64

init : I64 -> Counter
init = @Counter

render : Counter, Color -> Elem Counter
render = \@Counter state, color ->
    GUI.col [
        GUI.button { text: "+", onPress: \@Counter prev -> Action.update (@Counter (prev + 1)) },
        GUI.text { label: "Clicked $(Num.toStr state) times", color },
        GUI.button { text: "-", onPress: \@Counter prev -> Action.update (@Counter (prev - 1)) },
    ]

Future ??

module {Elem,Color,button,text,col,update} -> [Counter, init, render]

Counter := I64

init : I64 -> Counter
init = @Counter

render : Counter, Color -> Elem Counter
render = \@Counter state, color ->
    col [
        button { text: "+", onPress: \@Counter prev -> update (@Counter (prev + 1)) },
        text { label: "Clicked $(Num.toStr state) times", color },
        button { text: "-", onPress: \@Counter prev -> update (@Counter (prev - 1)) },
    ]

Brendan Hansknecht (Jan 19 2024 at 02:41):

I'm not sure it makes sense for most of those pieces to be generic as module params. Module params are really just for dependency injection.

So I would guess most of those inputs would be imported directly. Though I guess if you want your counter to be generic over gui/tui/etc you might do something like that.

Luke Boswell (Jan 19 2024 at 02:52):

Brendan Hansknecht (Jan 19 2024 at 03:01):

Brendan Hansknecht (Jan 19 2024 at 03:02):

Agus Zubiaga (Jan 19 2024 at 11:13):

Agus Zubiaga (Jan 19 2024 at 11:15):

import ray.Gui exposing [col, button, text]

Agus Zubiaga (Jan 20 2024 at 11:50):

I just got ingested files working for the new imports. I had to move some code from load/file to can/def which depended on the module's path, and I ended up having to thread that through a ton of can functions: diff

Should I add that to can's Env instead? I'm not sure how we decide what goes in there

Agus Zubiaga (Jan 27 2024 at 18:46):

I have an automatic fix for the import syntax working:
CleanShot-2024-01-27-at-15.42.05.mp4
I think this will help with the transition, given every single Roc file will be affected by it.

Richard Feldman (Jan 27 2024 at 18:49):

Agus Zubiaga (Jan 27 2024 at 18:55):

I'm now working on the new headers keywords. I think we can also autofix a lot of that. Not everything though, since some of the semantics changed.

Richard Feldman (Jan 27 2024 at 18:59):

Richard Feldman (Jan 27 2024 at 19:00):

Agus Zubiaga (Jan 27 2024 at 20:08):

Agus Zubiaga (Feb 17 2024 at 18:20):

Hey all! @Luke Boswell asked me how this is going and I thought I would post an update publicly too.

I haven’t had a lot of time outside of work lately, so I have been moving slowly. I should start getting more free time this week.

Last time I touched it, I had the new inline imports working with all its features and the formatter would automatically upgrade header imports to this new syntax.

I also started working on the syntax changes to the headers themselves, and formatter upgrade mechanism for those too. I’m hoping I’ll make some good progress on that this weekend.

While the parser changes are easy, updating the thousands of mini roc files across tests and examples takes a significant part of the time. I can have the formatter do this automatically for some, but not for any of the test_syntax snapshot tests because we want the originals to be unformatted, so we can test how the formatter affects them.
This is often creates conflicts with changes on main too.

After this is all done, I’ll work on the privacy changes listed in the proposal and fixing some of the bugs in the module system in general.

The actual work on implementing module params will likely start sometime in March.

Richard Feldman (Feb 17 2024 at 18:22):

ooh, if there's something that requires lots of mechanical stuff with not much thought, I'd love to volunteer to help! :smiley:

Richard Feldman (Feb 17 2024 at 18:23):

that fits very well with the "Dad has 15 minutes free and can't realistically do anything that requires loading lots of context, but could knock out a bunch of mechanical work in that time" situation I very often find myself in these days :laughing:

Agus Zubiaga (Feb 17 2024 at 18:38):

Haha nice, good to know! Let’s see how these days go and maybe I’ll push parser changes without necessary fixing everything

Richard Feldman (Feb 17 2024 at 18:45):

cool, lmk anytime! And welcome back, exciting to hear about your plans on this! :smiley:

Agus Zubiaga (Feb 18 2024 at 22:24):

Ok, the interface to module syntax change is done. The old header is still supported and roc format updates it automatically like with imports.

Richard Feldman (Feb 18 2024 at 22:24):

Agus Zubiaga (Feb 18 2024 at 22:25):

Agus Zubiaga (Feb 25 2024 at 18:55):

Agus Zubiaga (Feb 25 2024 at 18:57):

app "quicksort" provides [swap, partition, partitionHelp, quicksort] to "./platform"

Agus Zubiaga (Feb 25 2024 at 18:59):

That app fixture uses a platform that's not specified in packages -which apparently is something you can do-, but the platform doesn't exist, yet the app is loaded fine

Luke Boswell (Feb 25 2024 at 19:01):

I'm guessing it was one of the first, and is now a bit of a legacy thing. But I don't know.

Agus Zubiaga (Feb 25 2024 at 19:04):

Yeah, it looks like the platform is ignored if you use that syntax. In the new app header, there's no provides keyword, so you have to specify the platform in the packages.

Agus Zubiaga (Feb 25 2024 at 19:06):

I created a fake platform that'd take those functions in the fixture, but it looks like the helpers in test_load break if a platform is actually loaded :upside_down:

Richard Feldman (Feb 25 2024 at 19:21):

we'll certainly get a test failure if that quicksort example breaks, and I'm not sure why 2 failures would be more helpful than 1 :big_smile:

Agus Zubiaga (Feb 25 2024 at 19:30):

Richard Feldman (Feb 25 2024 at 19:36):

Luke Boswell (Feb 25 2024 at 19:48):

Agus Zubiaga (Mar 02 2024 at 19:01):

I just realized that there's a special way to provide types to the platform in app headers:

app "test"
    packages { pf: "./platform" }
    provides [quicksort] { Flags, Model } to pf
#                        ^^^^^^^^^^^^^^^^

Agus Zubiaga (Mar 02 2024 at 19:02):

I think this wouldn't work with the new syntax, and I wonder if we are using this or if it's legacy. I hadn't seen it before.

Agus Zubiaga (Mar 02 2024 at 19:06):

Maybe they can be provided amongst the values app [quicksort, Flags, Model] like module exposed names

Agus Zubiaga (Mar 02 2024 at 19:07):

But I still don't really get what these are for, couldn't the provided functions just have type vars?

Richard Feldman (Mar 02 2024 at 19:29):

Agus Zubiaga (Mar 02 2024 at 19:47):

Richard Feldman (Mar 02 2024 at 19:50):

Agus Zubiaga (Mar 02 2024 at 19:53):

Richard Feldman (Mar 02 2024 at 20:46):

maybe we can just accept uppercase identifiers in the square brackets for what the app provides to the platform :thinking:

Agus Zubiaga (Mar 02 2024 at 21:36):

Agus Zubiaga (Mar 02 2024 at 21:37):

Technically, we already allow uppercase idents in the square brackets, because it just uses exposes_entry: source

Agus Zubiaga (Mar 02 2024 at 21:38):

Richard Feldman (Mar 02 2024 at 22:06):

Agus Zubiaga (Mar 02 2024 at 23:10):

   ── UNSPECIFIED PLATFORM in tmp/no_platform_specified/Test.roc ──────────────────

    This app does not specify a platform:

    1│>  app [main] {
    2│>      json: "../json/main.roc"
    3│>  }

    Every Roc app must select a package to use as its platform, like this:

        app [main] {
            pf: platform "…path or URL to platform…"
        }

    Tip: See an example in the tutorial: <https://www.roc-lang.org/tutorial#building-an-application>

    ── MULTIPLE PLATFORMS in tmp/multiple_platforms_specified/Test.roc ─────────────

    This app specifies multiple packages as `platform`:

    1│>  app [main] {
    2│>      cli: platform "../cli/main.roc",
    3│>      web: platform "../web/main.roc",
    4│>  }

    Roc apps must specify exactly one platform.

Agus Zubiaga (Mar 02 2024 at 23:15):

Maybe there’s something more insightful to say about the “multiple platforms” one

Richard Feldman (Mar 02 2024 at 23:44):

Agus Zubiaga (Mar 02 2024 at 23:46):

Agus Zubiaga (Mar 04 2024 at 23:04):

Agus Zubiaga (Mar 04 2024 at 23:07):

Agus Zubiaga (Mar 04 2024 at 23:08):

I still have to update the platform header, but I'll do that one later since that's not just a syntax change.

Agus Zubiaga (Mar 04 2024 at 23:20):

The proposal indicates that it should be possible for nested packages to inherit packages from its parent:

package [ParserCore, ParserCsv, ParserStr]
        packages [
            JsonDecode from "https://…/json/…",
            CodePt, Segment from "https://…/unicode/…",
            Foo, Bar as Blah, Baz from ..
        ]

However, we later decided we need to keep the package shorthands, so what syntax should be use to inherit packages?

Agus Zubiaga (Mar 04 2024 at 23:21):

package [ParserCore, ParserCsv, ParserStr] {
    json: "https://…/json/…",
    unicode: "https://…/unicode/…",
    foo: inherit,
    bar: inherit,
}

Agus Zubiaga (Mar 04 2024 at 23:22):

Agus Zubiaga (Mar 04 2024 at 23:23):

package [ParserCore, ParserCsv, ParserStr] {
    json: "https://…/json/…",
    unicode: "https://…/unicode/…",
    foo,
    bar,
}

Agus Zubiaga (Mar 04 2024 at 23:24):

Richard Feldman (Mar 05 2024 at 02:26):

Agus Zubiaga (Mar 24 2024 at 00:43):

import Maybe
import Maybe.Extra as Maybe

I used to like this at first because it felt like extending the module, but after a few years using the language, I stopped doing this because it makes it hard to figure out which of the modules a function is coming from.

Agus Zubiaga (Mar 24 2024 at 00:48):

── IMPORT ALIAS CONFLICT in tmp/duplicate_alias/Main.roc ───────────────────────

MaybeExtra was imported as Maybe

4│  import MaybeExtra as Maybe
                         ^^^^^

but Maybe is also the name of another module:

3│  import Maybe

Using the same name for both can make it hard to tell which one you're referring to.

Make sure each import has a unique alias or none at all.

Agus Zubiaga (Mar 24 2024 at 00:50):

── IMPORT ALIAS CONFLICT in tmp/duplicate_alias/Main.roc ───────────────────────

One was imported as D:

3│  import One as D
                  ^

but the same alias was also used for Two:

4│  import Two as D
                  ^

Each import should have a unique alias or none at all.

Richard Feldman (Mar 24 2024 at 11:51):

Agus Zubiaga (Apr 12 2024 at 17:44):

Now that I'm pretty close to finish the rest of the module system changes, I want to start thinking about ways to implement the actual module params

Agus Zubiaga (Apr 12 2024 at 17:44):

In what stage of the compiler should we deal with them? Is it something we can desugar at canonicalization?

Agus Zubiaga (Apr 12 2024 at 17:45):

The simplest approach to me would be to desguar by currying all the functions of the module with the params

Agus Zubiaga (Apr 12 2024 at 17:45):

Agus Zubiaga (Apr 12 2024 at 17:49):

I also thought we could generate globals that we set basically where the import is. However, I think that would require generating a module per import which might result in big binaries.

Richard Feldman (Apr 12 2024 at 17:49):

Richard Feldman (Apr 12 2024 at 17:50):

especially considering you could theoretically import the same module multiple times with as :big_smile:

Richard Feldman (Apr 12 2024 at 17:50):

Agus Zubiaga (Apr 12 2024 at 17:51):

Yeah, exactly. That's why I think the globals approach would require emitting the same module multiple times.

Agus Zubiaga (Apr 12 2024 at 17:51):

Richard Feldman (Apr 12 2024 at 17:51):

Richard Feldman (Apr 12 2024 at 17:52):

Agus Zubiaga (Apr 12 2024 at 17:52):

Maybe instead of actually currying the function, we extend its arguments with the module params record

Richard Feldman (Apr 12 2024 at 17:52):

Agus Zubiaga (Apr 12 2024 at 17:52):

Richard Feldman (Apr 12 2024 at 17:53):

so every function gets exactly 1 extra argument, and it's always a pointer to a record that lives on the stack

Agus Zubiaga (Apr 12 2024 at 17:54):

I guess it's not worth it to make each field of the record an argument, right? That should be optimized already

Richard Feldman (Apr 12 2024 at 17:54):

Richard Feldman (Apr 12 2024 at 17:55):

because if I'm calling 3 functions in a different module (the same module) in a row, I don't want to have to assemble module params records for that module 3 times

Richard Feldman (Apr 12 2024 at 17:55):

because they'll be the same each time! I'd rather just assemble it once and pass it in 3 times

Agus Zubiaga (Apr 12 2024 at 17:56):

If the import is inside a function, I guess we would create it whereever the import is. But it's trickier for top-level imports.

Richard Feldman (Apr 12 2024 at 17:56):

yeah so my thinking there is that we can essentially keep track in each scope like an VecMap<ModuleId, ModuleParams>

Agus Zubiaga (Apr 12 2024 at 17:56):

Richard Feldman (Apr 12 2024 at 17:57):

Agus Zubiaga (Apr 12 2024 at 17:57):

Richard Feldman (Apr 12 2024 at 17:57):

and then yeah, we can just go grab those (or maybe just a Symbol for them) if we already have them

Agus Zubiaga (Apr 12 2024 at 17:57):

I think it'd be ModuleName instead of ModuleId because you can have the same ModuleId multiple times with different aliases

Richard Feldman (Apr 12 2024 at 17:57):

Richard Feldman (Apr 12 2024 at 17:58):

because at that point we're traversing all the canonical nodes and then translating them into a different function format

Richard Feldman (Apr 12 2024 at 17:58):

so right when we're doing both of those, we can add the extra param to the function definitions and the extra arg at the call sites

Richard Feldman (Apr 12 2024 at 17:59):

I think this would also be an excellent time to add another parameter (and another argument at the call sites) for Allocators

Richard Feldman (Apr 12 2024 at 17:59):

because we want to change it so that hosts pass in a pointer to an Allocator struct instead of having roc_alloc, roc_dealloc, etc. be top-level symbols

Agus Zubiaga (Apr 12 2024 at 17:59):

Richard Feldman (Apr 12 2024 at 18:00):

we can actually populate those and wire them up separately, but it seems like a good time to just go ahead and add 2 extra pointer arguments instead of 1

Richard Feldman (Apr 12 2024 at 18:00):

I think we can also just uncritically add them, because I believe one of LLVM's optimizations is unused argument elimination

Agus Zubiaga (Apr 12 2024 at 18:00):

Richard Feldman (Apr 12 2024 at 18:00):

so if we pass them in and then don't use them, they should just get eliminated anyway

Agus Zubiaga (Apr 12 2024 at 18:02):

Ah, I see. I was also thinking whether we wanted to detect whether a param was used in a def to decide whether we should add the args.

Agus Zubiaga (Apr 12 2024 at 18:02):

Richard Feldman (Apr 12 2024 at 18:03):

so then I think the only other thing is needing to destructure the records at the top of each function

Richard Feldman (Apr 12 2024 at 18:03):

because they'll be referred to in canonicalization as like foo rather than record.foo

Agus Zubiaga (Apr 12 2024 at 18:03):

Richard Feldman (Apr 12 2024 at 18:03):

Richard Feldman (Apr 12 2024 at 18:04):

a nice thing about doing it in mono is that then roc check doesn't pay for it :big_smile:

Agus Zubiaga (Apr 12 2024 at 18:05):

Yeah, we still have to type check them, so it's definitely going to involve some work in can for that

Richard Feldman (Apr 12 2024 at 18:07):

yeah I just mean the part about pulling them out of the record in the arg - that's just a code gen detail we need to take care of

Richard Feldman (Apr 12 2024 at 18:08):

I'm curious what @Brendan Hansknecht and @Folkert de Vries think about this in general!

Agus Zubiaga (Apr 12 2024 at 18:08):

Agus Zubiaga (Apr 12 2024 at 18:09):

So if we have a top-level import. When is the record constructed? Is it like constants?

Agus Zubiaga (Apr 12 2024 at 18:10):

Richard Feldman (Apr 12 2024 at 18:11):

Agus Zubiaga (Apr 12 2024 at 18:12):

Agus Zubiaga (Apr 12 2024 at 18:13):

Because in that case, the record would be constructed every time you do that, right?

Richard Feldman (Apr 12 2024 at 18:13):

yeah so like if I've imported module Foo and given it { blah } and I call Foo.bar, then I have to construct the { blah } function before I can call Foo.bar, because Foo.bar will now be expecting a pointer to a { blah } record as its last argument

Richard Feldman (Apr 12 2024 at 18:13):

Richard Feldman (Apr 12 2024 at 18:14):

and Foo.bar knows that it accepts a { blah } record specifically because that's what's declared at the top of teh Foo module

Richard Feldman (Apr 12 2024 at 18:14):

the VecMap<ModuleName, ModuleParams> thing from earlier can prevent constructing it multiple times

Richard Feldman (Apr 12 2024 at 18:14):

like we cache it in scope so that if we already have one constructed in scope, we just use that instead of constructing another one

Agus Zubiaga (Apr 12 2024 at 18:15):

Richard Feldman (Apr 12 2024 at 18:15):

oh you mean like what if I have foo = if ... at the top level, and it's not defined as a function?

Richard Feldman (Apr 12 2024 at 18:15):

Agus Zubiaga (Apr 12 2024 at 18:17):

module [...]

import Foo { blah }

bar =
     Foo.bar + Foo.bar

baz =
    Foo.bar

I can cache the construction of { blah } inside bar, but when I call baz I would have to construct it again.

Richard Feldman (Apr 12 2024 at 18:18):

Agus Zubiaga (Apr 12 2024 at 18:18):

Richard Feldman (Apr 12 2024 at 18:18):

Richard Feldman (Apr 12 2024 at 18:19):

but in the meantime top-level non-functions always re-evaluate everything at runtime and I think that's fine (for now)

Richard Feldman (Apr 12 2024 at 18:19):

oh I guess we'll need to add the extra pointer arguments to those generated top-level thunks too!

Richard Feldman (Apr 12 2024 at 18:19):

since that's how we compile them today (top-level "constants" get compiled into 0-argument thunks during mono)

Agus Zubiaga (Apr 12 2024 at 18:20):

Agus Zubiaga (Apr 12 2024 at 18:21):

Well, I guess it's just the top-level ones. Because the nested ones already have it available from the parent.

Richard Feldman (Apr 12 2024 at 18:21):

Agus Zubiaga (Apr 12 2024 at 18:22):

Brendan Hansknecht (Apr 12 2024 at 18:34):

Yeah, all this sounds fine. Given this is for "slow" io anyway, I think it is better to just keep it simple as a pointer to struct of function pointers instead of thinking about any form of monomorphizing it away.

I have minor questions about what llvm would optimize best in certain cases, but nothing I have meanful data to decide between (arg order for example)

I think my biggest question that is hard to know the answer to is:
1 pointer with allocators and imported function (will need to be reconstructed more often, though it is just cheap copying of a few function pointers)
Or, 2 pointers, one for each. Allocators never need to be reconstructed then. But you are forever pinning two registers to hold the pointers.

Brendan Hansknecht (Apr 12 2024 at 18:44):

Third option:
Use 1 pointer, but nest an extra level. So the module params hold the allocators or vice versa. The allocators would still be stored on the stack, just an extra pointer jump to access them.

Very solid chance that this is the best option. Allocation is a relatively rare and slow event. An extra indirection contained on the stack should be safe. Freeing that register on x86 and in all function calls is probably worth it. When creating a new module param set, just copy a single pointer over.

Richard Feldman (Apr 12 2024 at 18:48):

yeah if we did that option it would have to be allocators hold the module params

Richard Feldman (Apr 12 2024 at 18:48):

because the allocator struct has to be created in the host, potentially in the host's stack

Richard Feldman (Apr 12 2024 at 18:49):

if you think that's most likely the best option, then maybe we should just plan to add the 1 param for module params

Richard Feldman (Apr 12 2024 at 18:50):

and then later we can modify the contents of that pointer to do allocator things

Brendan Hansknecht (Apr 12 2024 at 18:51):

It's definitely a gut feeling especially given the extra arg shuffling that might be needed otherwise.

Brendan Hansknecht (Apr 12 2024 at 18:54):

Also, I think it has to be module param holds the allocator (as t least if we want life to be simple).

Allocator will never change (in the scope of a single call to roc). Module param will change potentially on every function call. If we update the the module params, it would update globally if we are updating the module params within the global allocator allocated on the host. As such whenever returning from a function we would have to revert that.

If module params hold the allocator, whenever we create a new module param struct, we just pass an extra pointer in for the allocator. Grab it from the old module param struct.

Brendan Hansknecht (Apr 12 2024 at 18:55):

Just need a special helper for each entry from host to roc or to make the host pass in a **allocator for that to all work.

Richard Feldman (Apr 12 2024 at 19:48):

Agus Zubiaga (Apr 17 2024 at 01:49):

I'm trying to fix the REPL in my branch, but I'm confused about where its platform is

Agus Zubiaga (Apr 17 2024 at 01:49):

Agus Zubiaga (Apr 17 2024 at 01:50):

Brendan Hansknecht (Apr 17 2024 at 01:50):

I think we just load it as a shared library, right. So it is more ingrained into the compiler with some sort of libloading call

Agus Zubiaga (Apr 17 2024 at 01:50):

Agus Zubiaga (Apr 17 2024 at 01:51):

This fails in my branch because it can't find ./platform, but somehow it doesn't in main