Stream: compiler development

Topic: zig compiler - function solve


view this post on Zulip Jared Ramirez (Apr 03 2025 at 00:52):

I'm working on writing some IRs by hand so I can start to work on the function_solve stage. The IR I'm writing as input for function_solve is the function_lift IR. I'm starting off pretty simple, creating a simple identity function

fn : U8 -> U8
fn = |x| x

here's what I have so far

a couple things:

  1. this is the right way to be going about this, right?
  2. does the IR so far look correct?

Assuming so, I'll continue on towards creating an IR that matches the compiling lambda sets doc, as the first test for implementing function_solve

view this post on Zulip Jared Ramirez (Apr 13 2025 at 23:50):

Worked a little more on this and I have a slightly more complex function that creates function lift IR for:

fn : (Arg -> Ret) -> Arg -> Ret
fn = |f, x| f x

(where the caller provides Arg and Ret)

Zig code is here.

While doing this, I also realized there's no appendSlice for MultiList, and there's not way to get a slice range of a MultiList . This was a problem because in the function lift IR we hold all Expr.Typed in the root of the IR, then reference slices into the MultiList up and down the tree. So to write the IR for the above function, I ended up needing to write appendSliceand sliceFrom functions for MulitList! I'm still def zig noob, so I would love some feedback on if this there's another way to do this, or if this function has footguns in it. Here's the commit with the funcs and here's the place it's used.

view this post on Zulip Jared Ramirez (Apr 14 2025 at 00:02):

(Also, if there’s something different that would be better to be working on at this stage in development, please let me know!)

view this post on Zulip Anton (Apr 15 2025 at 09:19):

It seems like there is low bandwidth for the people with the necessary expertise to assist @Jared Ramirez, feel free to continue to the best of your ability. We appreciate your contribution :heart:

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:50):

Sorry to never reply. Here are my thoughts:

  1. I don't think we should add sliceFrom. We should just do x.slice()[from..] and use the standard slicing mechanisms
  2. For appendSlice, I would write it as:
        pub fn append_slice(self: *SafeMultiList(T), gpa: Allocator, elems: []const T) void {
            if (elems.len == 0) {
                return;
            }
            self.items.ensureUnusedCapacity(gpa, elems.len) catch |err| exitOnOom(err);
            for (elems) |elem| {
                self.items.appendAssumeCapacity(elem);
            }
        }

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:54):

Oh, I see, for safe multilist, we need to return a Slice cause it would normally return an Idx for append. This is actually more complex

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:56):

I don't think it is safe to return and use a Slice here. A Slice is a reference to the underlying data. What you need is a special kinda of range or something of that nature. Similar to Idx, you probably need a new data structure that is simply {start: Idx, end: Idx}.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:57):

Anything that gets stored in IR needs to be Idx and not real pointers.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:57):

So you probably need to make a new subtype of multilist similar to Idx. I would probably call it a Range and return that from the function.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 15:58):

I feel like @Anthony Bullard had to already do something similar in the parser.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:00):

Oh, it looks like we messed this up on SafeList. We'll have to correct that.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:03):

Anyway, so a more corrected function, I think it should be something like:

        pub fn append_slice(self: *SafeMultiList(T), gpa: Allocator, elems: []const T) Range {
            if (elems.len == 0) {
                return Range.empty;
            }
            const start = self.len()
            self.items.ensureUnusedCapacity(gpa, elems.len) catch |err| exitOnOom(err);
            for (elems) |elem| {
                self.items.appendAssumeCapacity(elem);
            }
            const end = self.len()
            return .{ .start = @enumFromInt(@as(u32, @intCast(start))), .end = @enumFromInt(@as(u32, @intCast(end))) };
        }

view this post on Zulip Jared Ramirez (Apr 15 2025 at 16:04):

Heard & agree on returning Idxs here!

Then (from my likely incorrect understanding), returning a slice here seems like it should be safe because the field typed_exprs (which is theSafeMultiList) lives in the root function lift IR, so this Slice should live as long as the IR lives?

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:04):

IR can be serialized and restored.

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:05):

It is also just a waste of space to store 24 bytes (ptr, usize, usize) for every slice instead of only 8 bytes (Idx, Idx).

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:07):

Then we would have a function like get, but getRange instead and it would return the slice (which would be short lived for when you need to access the data).

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:10):

Oh, also, beyond serilization and restoration, if the list ever grows, it will reallocate, which will likely move the pointer and break all slices.

view this post on Zulip Jared Ramirez (Apr 15 2025 at 16:13):

That's fine with me, I was just going off of how the IR is currently setup up.

My understanding (which again could be wrong), was that the IR has base fields of SafeLists and SafeMultiLists for many things (ie types, typed_exprs, etc):

// src/build/lift_functions/IR.zig
exposed_values: std.AutoHashMap(Ident.Idx, Expr.Idx),
exposed_functions: std.AutoHashMap(Ident.Idx, Function),
types: Type.List,
exprs: Expr.List,
typed_exprs: Expr.Typed.List,
patterns: Pattern.List,
typed_patterns: Pattern.Typed.List,
typed_idents: TypedIdent.List,
when_branches: WhenBranch.List,

Then other places in the IR reference Slices into these arrays. In particular, the Expr type has:

pub const Expr = union(enum) {
  ...
    call: struct {
        fn_type: Type.Idx,
        fn_expr: Expr.Idx,
        args: Expr.Typed.Slice,
    },
}

So I guess first: Is this understanding correct?

The reason I got here at all was I was trying to insert a call expr, but it needed a Expr.Typed.Slice (ie a Slice of a SafeMultiList).

So then you're saying that expr (and other places in this IR) should not use slices and should instead use a new Range type?

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:15):

So I guess first: Is this understanding correct?

Yep. That is the goal

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:15):

So then you're saying that expr (and other places in this IR) should not use slices and should instead use a new Range type?

Yep

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:17):

Also Range is my default name, but it could also be called an IdxSlice or something else.

view this post on Zulip Jared Ramirez (Apr 15 2025 at 16:24):

Okay cool! All makes sense to me, thank you for walking me through that :smile: I didn't consider that serde & reallocation would break the Slices! And I was also treating the IR types as somewhat sacred and didn't think I should change it lol

So I think then I will update this IR (and maybe others?) to use Range/IdxSlice in one MR, then rebase my changes ontop of that and continue!

view this post on Zulip Brendan Hansknecht (Apr 15 2025 at 16:25):

Sounds great

view this post on Zulip Richard Feldman (Apr 15 2025 at 16:56):

I really want to get serialization going, with tests...gonna work on that this week

view this post on Zulip Jared Ramirez (Apr 17 2025 at 01:28):

In the func_lift IR, I'm a little confused as to where to get type information for exposed_values and exposed_functions. The IR looks like:

env: *base.ModuleEnv,
exposed_values: std.AutoHashMap(Ident.Idx, Expr.Idx),  // HERE
exposed_functions: std.AutoHashMap(Ident.Idx, Function),  // HERE
types: Type.List,
exprs: Expr.List,
typed_exprs: Expr.Typed.List,
patterns: Pattern.List,
typed_patterns: Pattern.Typed.List,
typed_idents: TypedIdent.List,
when_branches: WhenBranch.List,

It seems like something here has to be changed to store this data?

Some thoughts:

view this post on Zulip Brendan Hansknecht (Apr 17 2025 at 05:38):

Likely it would be worth looking at previous IRs in order to piece this picture together

view this post on Zulip Brendan Hansknecht (Apr 17 2025 at 05:39):

Like walk down the compilation flow to see what makes most sense to have/generate

view this post on Zulip Brendan Hansknecht (Apr 17 2025 at 05:39):

I wish I had a more concrete answer, but I only have guesses

view this post on Zulip Luke Boswell (Apr 17 2025 at 06:29):

I think type information will come from the type Store in src/types/type.zig previously known as Subs... this is filled out by type checking, but hasn't been implemented yet. See https://github.com/roc-lang/roc/pull/7634 for more information.

view this post on Zulip Luke Boswell (Apr 17 2025 at 06:31):

My understanding is that will live in the ModuleEnv and be used throughout the life of the compilation.

view this post on Zulip Jared Ramirez (Apr 20 2025 at 22:30):

Okay got it, so by the time that types have been solved for, that Store will be the source of truth for all type information.

So given that, it seems like:

  1. Any IR with a field like types: Type.List should be removed
  2. ModuleEnv should have a field like types: Type.Store

Given that type solving hasn't been implemented yet, how do y'all think I should proceed?

Do you think it makes sense to pull just the types from that PR (given that the definition of Store in there is different than the defintion on main) sans implementation, then make the changes to the IRs/ModuleEnv mentioned above?

view this post on Zulip Wizard ish (Apr 20 2025 at 22:31):

also note https://github.com/roc-lang/roc/pull/7738

view this post on Zulip Wizard ish (Apr 20 2025 at 22:35):

also i dont know why that the local idx is essientally just a larger and more specific versions of the global idx

view this post on Zulip Jared Ramirez (Apr 20 2025 at 22:36):

Yeah, I saw that PR too!

I guess the question is, is the interface of Store in either PR stable enough to be pulled out and defined on main now, as to unblock the ability to represent IRs for stages after type solving?

view this post on Zulip Wizard ish (Apr 20 2025 at 22:39):

the original PR is stable enough, but it dosen't implement much, and mine (the fork), is very much still being worked on. Certain of the features do work, so if it's just adding to or removing from the store, i can quickly just remove all the things that aren't working

view this post on Zulip Luke Boswell (Apr 21 2025 at 10:24):

It would be really nice to get a basic can/type checking thing operational (even if it's just strings) end to end.

I feel like I don't know enough about compilers to implement the very first things here, but happy to chip away and add features once there is a clear design direction or example I can follow.

We might need someone like @Richard Feldman or @Lucas Rosa to help... get us something minimal working.

view this post on Zulip Wizard ish (Apr 21 2025 at 11:16):

Okay then by I’ll see if I can get it in some operational state by the end of the day

view this post on Zulip Wizard ish (Apr 21 2025 at 20:14):

ok it isn't working, but the interface does exist


Last updated: Jul 06 2025 at 12:14 UTC