Stream: compiler development

Topic: s-expression format


Luke Boswell (Jun 24 2025 at 10:22):

I've been down a rabbit hole on the API for our s-expressions... this is what I have so far.

(can_ir
    (d_let :idx=#87
        (p_assign (8:1-8:6) :ident="main!" :idx=#73)
        (e_lambda (8:9-12:2) :idx=#86
            (args
                (p_underscore (8:10-8:11) :idx=#74))
            (e_block (8:13-12:2)
                (s_let (9:2-9:17)
                    (p_assign (9:2-9:7) :ident="world" :idx=#75)
                    (e_string (9:10-9:17) :idx=#77
                        (e_literal (9:11-9:16) :string="World")))
                (e_call (11:2-11:31)
                    (e_lookup_external
                        (ext_decl (11:2-11:14) :qualified="pf.Stdout.line!" :module="pf.Stdout" :local="line!" :kind="value" :type_var=#79))
                    (e_string (11:15-11:30)
                        (e_literal (11:16-11:29) :string="Hello, world!"))))))
    (s_import (6:1-6:17) :module="pf.Stdout" :idx=#72
        (exposes)))

Luke Boswell (Jun 24 2025 at 10:23):

I'm trying to make it easier to read and clearer by adding :name=value attributes to nodes instead of relying on heavy nesting.

Luke Boswell (Jun 24 2025 at 10:24):

Doing this simplifies the newline rule: basically one newline per child.

Anthony Bullard (Jun 24 2025 at 10:27):

why is :ident="main!" better than (ident "main!")?

Anthony Bullard (Jun 24 2025 at 10:32):

It would seem that the newline rule should be a newline for each child node, and all attribute nodes are first and on the same line as the tag for the sexpr

(tag (line:col-line:col) (attr1 val) (attr2 val) ...
    (child_1_tag (line:col-line:col) (attr1 val) ...
    (child_2_tag) ...
    ...)
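Anthony's layout rule can be sketched as a tiny printer. This is illustrative Python only: the compiler's actual printer is written in Zig, and the Node type, the render function and the (line:col-line:col) region spelling used here are assumptions taken from the sketch above, not the real API. The tag, region and every (name value) attribute share the first line, and each child node starts a new, further-indented line.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Node:
    """Hypothetical IR node used only for this illustration."""
    tag: str
    region: Optional[Tuple[int, int, int, int]] = None  # (start_line, start_col, end_line, end_col)
    attrs: List[Tuple[str, Union[str, int]]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def render(node: Node, indent: int = 0) -> str:
    """Tag, region and attributes share the first line; each child starts a new line."""
    parts = [node.tag]
    if node.region is not None:
        sl, sc, el, ec = node.region
        parts.append(f"({sl}:{sc}-{el}:{ec})")
    for name, value in node.attrs:
        # json.dumps quotes and escapes strings; integers are printed bare.
        parts.append(f"({name} {json.dumps(value)})")
    head = "(" + " ".join(parts)
    child_pad = "    " * (indent + 1)
    body = "".join("\n" + child_pad + render(child, indent + 1) for child in node.children)
    return head + body + ")"


imp = Node("s_import", (6, 1, 6, 17), [("module", "pf.Stdout"), ("idx", 72)], [Node("exposes")])
print(render(Node("can_ir", children=[imp])))
# (can_ir
#     (s_import (6:1-6:17) (module "pf.Stdout") (idx 72)
#         (exposes)))
```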

Luke Boswell (Jun 24 2025 at 10:33):

> It would seem that the newline rule should be a newline for each child node, and all attribute nodes are first and on the same line as the tag for the sexpr

This was the key change... I'll give the formatting a go with parens again

Anthony Bullard (Jun 24 2025 at 10:33):

I would probably go full sexpr and have no custom microformats to parse

(tag (line col line col) (attr1 val) (attr2 val) ...
    (child_1_tag (line col line col) (attr1 val) ...
    (child_2_tag) ...
    ...)

Luke Boswell (Jun 24 2025 at 10:34):

(can_ir
    (d_let (idx #87)
        (p_assign (8:1-8:6) (ident "main!") (idx #73))
        (e_lambda (8:9-12:2) (idx #86)
            (args
                (p_underscore (8:10-8:11) (idx #74)))
            (e_block (8:13-12:2)
                (s_let (9:2-9:17)
                    (p_assign (9:2-9:7) (ident "world") (idx #75))
                    (e_string (9:10-9:17) (idx #77)
                        (e_literal (9:11-9:16) (string "World"))))
                (e_call (11:2-11:31)
                    (e_lookup_external
                        (ext_decl (11:2-11:14) (qualified "pf.Stdout.line!") (module "pf.Stdout") (local "line!") (kind "value") (type_var #79)))
                    (e_string (11:15-11:30)
                        (e_literal (11:16-11:29) (string "Hello, world!")))))))
    (s_import (6:1-6:17) (module "pf.Stdout") (idx #72)
        (exposes)))

Anthony Bullard (Jun 24 2025 at 10:34):

ok, i don't think the # is necessary

Luke Boswell (Jun 24 2025 at 10:35):

(can_ir
    (d_let (id 87)
        (p_assign (8 1 8 6) (ident "main!") (id 73))
        (e_lambda (8 9 12 2) (id 86)
            (args
                (p_underscore (8 10 8 11) (id 74)))
            (e_block (8 13 12 2)
                (s_let (9 2 9 17)
                    (p_assign (9 2 9 7) (ident "world") (id 75))
                    (e_string (9 10 9 17) (id 77)
                        (e_literal (9 11 9 16) (string "World"))))
                (e_call (11 2 11 31)
                    (e_lookup_external
                        (ext_decl (11 2 11 14) (qualified "pf.Stdout.line!") (module "pf.Stdout") (local "line!") (kind "value") (type_var 79)))
                    (e_string (11 15 11 30)
                        (e_literal (11 16 11 29) (string "Hello, world!")))))))
    (s_import (6 1 6 17) (module "pf.Stdout") (id 72)
        (exposes)))

Notification Bot (Jun 24 2025 at 10:35):

10 messages were moved here from #compiler development > casual conversation by Luke Boswell.

Anthony Bullard (Jun 24 2025 at 10:37):

Maybe tag the Pos node and put the line/col pairs in ()s? Like (pos (6 1) (6 17))?

Luke Boswell (Jun 24 2025 at 10:37):

It's an improvement over the current format, I think: https://github.com/roc-lang/roc/blob/main/src/snapshots/hello_world_with_block.md#canonicalize

Anthony Bullard (Jun 24 2025 at 10:37):

Definitely

Anthony Bullard (Jun 24 2025 at 10:39):

you want to do the least amount of parsing possible, while still keeping it readable to someone unfamiliar with the specific format (but who has a basic understanding of s-exprs)
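To make the "least parsing possible" point concrete: once every attribute is written as a plain (name value) list, a generic reader of a few dozen lines can load the whole tree. The sketch below is hypothetical Python, not part of the Roc codebase; it only understands parentheses, double-quoted strings, integers and bare symbols, and it leaves string escapes undecoded.

```python
import re

# One token: "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+)')
INTEGER = re.compile(r"-?[0-9]+")


class Symbol(str):
    """A bare atom such as p_assign, kept distinct from quoted strings."""


def tokenize(text):
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1  # skip whitespace between tokens
        if pos == len(text):
            return
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        yield m.groups()


def read(text):
    """Parse one s-expression into nested Python lists."""
    stack = [[]]
    for open_paren, close_paren, string, atom in tokenize(text):
        if open_paren:
            stack.append([])
        elif close_paren:
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            stack[-1].append(string)  # escapes are left undecoded in this sketch
        elif INTEGER.fullmatch(atom):
            stack[-1].append(int(atom))
        else:
            stack[-1].append(Symbol(atom))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise SyntaxError("expected exactly one complete expression")
    return stack[0][0]


print(read('(p_assign (8 1 8 6) (ident "main!") (id 73))'))
# ['p_assign', [8, 1, 8, 6], ['ident', 'main!'], ['id', 73]]
```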

Luke Boswell (Jun 24 2025 at 10:39):

I'm thinking something custom for Regions is unavoidable

Anthony Bullard (Jun 24 2025 at 10:39):

are we saving the ids so that we can recover the IR from the Sexpr representation?

Luke Boswell (Jun 24 2025 at 10:40):

I got a bit carried away there. Only some nodes (patterns I think?) need to show them.

Luke Boswell (Jun 24 2025 at 10:40):

Other nodes reference them by NodeIdx. Also, TypeVars are NodeIdx values.

Luke Boswell (Jun 24 2025 at 10:41):

I figure any node that is (or could be) referenced by another node should probably show its id

Anthony Bullard (Jun 24 2025 at 10:41):

I'm just trying to understand if that's useful for debugging

Luke Boswell (Jun 24 2025 at 10:41):

I don't know though... just experimenting with it

Luke Boswell (Jun 24 2025 at 10:42):

I guess we use the Region info to map back to the source, and the type information will reference a pattern in a declaration maybe.

Anthony Bullard (Jun 24 2025 at 10:42):

OK, as far as regions go, you can just make each one a standard s-expr symbol if you start it with @

Luke Boswell (Jun 24 2025 at 10:43):

How?

Anthony Bullard (Jun 24 2025 at 10:43):

@6!1-6!17 is a valid S-expr symbol

Anthony Bullard (Jun 24 2025 at 10:44):

And that can just be an atom

Anthony Bullard (Jun 24 2025 at 10:45):

I think @ is helpful for understanding that it's a region (or related to where it's "at")

Luke Boswell (Jun 24 2025 at 10:45):

(can_ir
    (d_let (id 87)
        (p_assign @8!1-8!6 (ident "main!") (id 73))
        (e_lambda @8!9-12!2 (id 86)
            (args
                (p_underscore @8!10-8!11 (id 74)))
            (e_block @8!13-12!2
                (s_let @9!2-9!17
                    (p_assign @9!2-9!7 (ident "world") (id 75))
                    (e_string @9!10-9!17 (id 77)
                        (e_literal @9!11-9!16 (string "World"))))
                (e_call @11!2-11!31
                    (e_lookup_external
                        (ext_decl @11!2-11!14 (qualified "pf.Stdout.line!") (module "pf.Stdout") (local "line!") (kind "value") (type_var 79)))
                    (e_string @11!15-11!30
                        (e_literal @11!16-11!29 (string "Hello, world!")))))))
    (s_import @6!1-6!17 (module "pf.Stdout") (id 72)
        (exposes)))

Anthony Bullard (Jun 24 2025 at 10:45):

But unfortunately : is not usually valid there

Anthony Bullard (Jun 24 2025 at 10:46):

Hmm. I'm surprised by the syntax highlighting

Anthony Bullard (Jun 24 2025 at 10:46):

Maybe (@ (8 1) (8 16)) is better?

Luke Boswell (Jun 24 2025 at 10:47):

Yeah I'm trying to find something nice

Luke Boswell (Jun 24 2025 at 10:47):

What about this?

(p_assign @8-1-8-6 (ident "main!") (id 73))

Luke Boswell (Jun 24 2025 at 10:51):

I think that's my favourite so far.

Anthony Bullard (Jun 24 2025 at 10:52):

That looks fine to me

(p_assign @8-1-8-6 (ident "main!") (id 73))
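A small sketch of why the @line-col-line-col spelling is convenient: the region is a single atom with no spaces and no colons (which Anthony noted are not usually valid in a symbol), so a generic s-expression reader keeps it whole, and recovering the four numbers is one split on "-". The Region type and both helper functions are hypothetical Python, not the compiler's Zig code.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical start/end line and column, as in @8-1-8-6."""
    start_line: int
    start_col: int
    end_line: int
    end_col: int


def format_region(r: Region) -> str:
    # One atom: no spaces or colons, so the whole thing reads as a single symbol.
    return f"@{r.start_line}-{r.start_col}-{r.end_line}-{r.end_col}"


def parse_region(atom: str) -> Region:
    if not atom.startswith("@"):
        raise ValueError(f"not a region atom: {atom!r}")
    parts = atom[1:].split("-")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected four numbers in {atom!r}")
    return Region(*(int(p) for p in parts))


print(format_region(Region(8, 1, 8, 6)))  # @8-1-8-6
print(parse_region("@11-2-11-31"))  # Region(start_line=11, start_col=2, end_line=11, end_col=31)
```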

Anthony Bullard (Jun 24 2025 at 10:52):

Just the smallest of comments here, but canonically, sexpr symbols should be in kebab-case, not snake_case

Anthony Bullard (Jun 24 2025 at 10:53):

but if we are using the string value of a Zig tag directly, it might not be worth a conversion
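If the printer does use the Zig tag names directly, the conversion is just a character swap, and it can be undone (for example, to grep the Zig source for a name seen in a snapshot) as long as the original tag names never contain hyphens. The helper names below are hypothetical Python for illustration only.

```python
def to_kebab(zig_tag: str) -> str:
    """Printed symbol for a snake_case tag name, e.g. e_lookup_external -> e-lookup-external."""
    return zig_tag.replace("_", "-")


def to_snake(symbol: str) -> str:
    """Inverse mapping, e.g. to grep the Zig source for a symbol seen in a snapshot."""
    return symbol.replace("-", "_")


for tag in ["e_lookup_external", "p_assign", "can_ir"]:
    print(tag, "->", to_kebab(tag))
```

Whether that extra mapping step is worth it is the greppability trade-off raised at the end of the thread.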

Luke Boswell (Jun 24 2025 at 10:56):

I might do a pass through each s-expr node and change that as I go.

Anthony Bullard (Jun 24 2025 at 11:02):

up to you, just calling it out

Anthony Bullard (Jun 24 2025 at 11:02):

you've been doing a great job with these and the snapshots!

Luke Boswell (Jun 24 2025 at 11:10):

Here's an updated PARSE section

(app @1-1-3-57
    (provides @2-3-2-10
        (exposed-lower-ident (text "main!")))
    (record-field @3-28-3-55 (name "pf")
        (e-string @3-41-3-54
            (e-string-part @3-42-3-53 (raw "../main.roc"))))
    (packages @3-2-3-57
        (record-field @3-4-3-27 (name "somePkg")
            (e-string @3-13-3-26
                (e-string-part @3-14-3-25 (raw "../main.roc"))))
        (record-field @3-28-3-55 (name "pf")
            (e-string @3-41-3-54
                (e-string-part @3-42-3-53 (raw "../main.roc"))))))

Luke Boswell (Jun 24 2025 at 11:11):

Kebab-case makes it more searchable with Ctrl-F

Anthony Bullard (Jun 24 2025 at 11:13):

I think this looks great

Luke Boswell (Jun 24 2025 at 11:23):

And this is the app we started with, Can section

(can-ir
    (d-let (id 87)
        (p-assign @8-1-8-6 (ident "main!") (id 73))
        (e-lambda @8-9-12-2 (id 86)
            (args
                (p-underscore @8-10-8-11 (id 74)))
            (e-block @8-13-12-2
                (s-let @9-2-9-17
                    (p-assign @9-2-9-7 (ident "world") (id 75))
                    (e-string @9-10-9-17 (id 77)
                        (e-literal @9-11-9-16 (string "World"))))
                (e-call @11-2-11-31
                    (e-lookup-external
                        (ext-decl @11-2-11-14 (qualified "pf.Stdout.line!") (module "pf.Stdout") (local "line!") (kind "value") (type-var 79)))
                    (e-string @11-15-11-30
                        (e-literal @11-16-11-29 (string "Hello, world!")))))))
    (s-import @6-1-6-17 (module "pf.Stdout") (id 72)
        (exposes)))

Anthony Bullard (Jun 24 2025 at 11:26):

Beautiful

Luke Boswell (Jun 24 2025 at 11:36):

Random side question... does anyone know how to rename a file so git is happy? I changed the casing on this one, but I'm not sure git has picked it up.

[attached image: Screenshot 2025-06-24 at 21.35.33.png]

Kiryl Dziamura (Jun 24 2025 at 11:38):

maybe this would help
https://adamj.eu/tech/2022/12/09/git-change-case-of-filenames/

tldr: git mv

Luke Boswell (Jun 24 2025 at 11:40):

Thank you.

Joshua Warner (Jun 24 2025 at 15:29):

FWIW I think greppability of the s-expr names (i.e. exact match between what a thing is called in the s-expr and what it's called in the zig source) is really useful for ramping up in an unfamiliar codebase. Agree it's a bit more readable with kebab-case, but that does give me a bit of pause...


Last updated: Jul 06 2025 at 12:14 UTC