Stream: compiler development

Topic: no_std Path for a Good First Issue


view this post on Zulip Sam Mohr (Dec 23 2024 at 23:21):

While working on the canonicalization rewrite, I've been trying to figure out how best to handle file paths. We want to make roc_can no_std to enforce that everything is arena-allocated for caching and performance. I started by converting all Paths to &'a [u8] but with the recent OsArg discussion, it seems like a recipe for Windows-related bugs.

Unfortunately, I'm not seeing a way to use an existing impl for Path: I found no issue to make it available no_std in the Rust repo, and no crates in the ecosystem at first glance.

It seems like the solution is to add a new arena-friendly Path in crates/compiler/collections. It's pretty self-contained. Does anyone disagree that it would be a good beginner issue? It's tricky to get right, but they should be able to copy a good bit of it from the stdlib.

view this post on Zulip Ayaz Hafiz (Dec 23 2024 at 23:27):

I'm going to sound like a passenger princess saying this given that I've done almost zero contribution recently, so feel free to ignore this, but I would strongly suggest doing the simplest thing (using stdlib paths) first and then getting rid of allocations etc. in subsequent passes once things are stable. One of the biggest issues IMO from an engineering perspective (and in no small part my fault) is that there are layers of abstraction that make the implementation hard to read and debug. I think this is a good beginner issue, but having bugs here will be very painful.

view this post on Zulip Sam Mohr (Dec 24 2024 at 01:02):

Yeah, that's fair. Okay, I can stick with Path for now and do my best to avoid allocations. It shouldn't be a huge pain to enable #![no_std] later

view this post on Zulip Sam Mohr (Dec 24 2024 at 01:02):

Though it would be nice to have it from the start to keep anyone else from doing that

view this post on Zulip Joshua Warner (Dec 24 2024 at 14:52):

FWIW I'm firmly on board with keeping Paths out of can - and in fact, IMO they should be as far away from the core of the compiler as possible. This way, we can avoid platform-specific behavior, e.g. I want to prevent Windows-specific parsing (or can) problems or behavior differences.

view this post on Zulip Joshua Warner (Dec 24 2024 at 14:52):

That said, I'm not sure &[u8] is the right thing to use in its place

view this post on Zulip Joshua Warner (Dec 24 2024 at 14:54):

A much better option (IMO) is to just use &str. My bet is that we will _never_ encounter a problem that will be fixed by having the slightly more faithful Path types.

view this post on Zulip Sam Mohr (Dec 24 2024 at 14:56):

We need some way to maintain a relationship between a file path and its module path, e.g. Path.To.Module and src/Path/To/Module.roc
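
As a rough illustration of that relationship, here is a minimal sketch of mapping a dotted module name to a file path string. The helper name `module_file_path`, the `root` parameter, the `/` separator, and the `.roc` suffix handling are assumptions made for the example, not code from the Roc compiler.

```rust
/// Hypothetical helper (not from the Roc codebase): builds a path string like
/// "src/Path/To/Module.roc" from a root directory and a dotted module name
/// like "Path.To.Module". Assumes '/' as the separator.
pub fn module_file_path(root: &str, module: &str) -> String {
    // root + one '/' per segment + segment text + ".roc"
    let mut path = String::with_capacity(root.len() + module.len() + 5);
    path.push_str(root);
    for segment in module.split('.') {
        path.push('/');
        path.push_str(segment);
    }
    path.push_str(".roc");
    path
}
```

With these assumptions, `module_file_path("src", "Path.To.Module")` produces `src/Path/To/Module.roc`. Working on plain strings like this sidesteps platform-specific `Path` behavior, at the cost of having to pick a separator convention up front.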

view this post on Zulip Sam Mohr (Dec 24 2024 at 14:57):

&str can work for now

view this post on Zulip Joshua Warner (Dec 24 2024 at 14:58):

Do you happen to know what that's used for, after can?

view this post on Zulip Sam Mohr (Dec 24 2024 at 15:01):

I don't think it's really used after can, besides error reporting

view this post on Zulip Sam Mohr (Dec 24 2024 at 15:01):

Can't check right now, helping my mom with annual computer stuff at Xmas :grinning_face_with_smiling_eyes:

view this post on Zulip Sam Mohr (Dec 24 2024 at 15:06):

I guess an optimization we could do is to save module names in a double format at load time

view this post on Zulip Sam Mohr (Dec 24 2024 at 15:10):

AKA

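pub struct ModuleName<'a> {
    // File path of the module, e.g. src/Path/To/Module.roc
    pub filename: &'a str,
    // Module path segments, e.g. Path, To, Module
    pub segments: &'a [&'a str],
}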

view this post on Zulip Sam Mohr (Dec 24 2024 at 15:12):

And then we don't have to worry about how to convert
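
A minimal sketch of how such a double format might be filled in once at load time, so later stages never convert between the two forms. The helper `split_segments`, the `root` parameter, and the `/` separator are hypothetical. The sketch uses a `Vec` where the real compiler would presumably use an arena.

```rust
/// Mirrors the struct from the message above.
#[derive(Debug, PartialEq)]
pub struct ModuleName<'a> {
    pub filename: &'a str,
    pub segments: &'a [&'a str],
}

/// Hypothetical helper: splits "src/Path/To/Module.roc" (relative to the
/// root "src") into ["Path", "To", "Module"]. The segments borrow from
/// `filename`, so no string data is copied. Returns None if the file is
/// outside `root`, lacks the ".roc" suffix, or has an empty segment.
pub fn split_segments<'a>(root: &str, filename: &'a str) -> Option<Vec<&'a str>> {
    let relative = filename.strip_prefix(root)?.strip_prefix('/')?;
    let stem = relative.strip_suffix(".roc")?;
    let segments: Vec<&str> = stem.split('/').collect();
    if segments.iter().any(|segment| segment.is_empty()) {
        return None;
    }
    Some(segments)
}
```

A caller would keep the returned segments alive (in an arena, or in a `Vec` in this sketch) and construct `ModuleName { filename, segments: &segments }` from them. Joining the segments with `.` then gives back the module path.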

view this post on Zulip Brendan Hansknecht (Dec 24 2024 at 20:06):

Why would we ever be allocating new paths in can? Feels pretty unimportant to worry about their allocations and add complexity around them

view this post on Zulip Brendan Hansknecht (Dec 24 2024 at 20:06):

Also, for a module name, I feel like we could force valid unicode. It's our language after all.

view this post on Zulip Brendan Hansknecht (Dec 24 2024 at 20:07):

So if we really wanted, we could just convert all module names to valid UTF-8 strings

view this post on Zulip Richard Feldman (Dec 24 2024 at 21:13):

yeah filenames can be invalid UTF-8 in the OS, but Roc source files have to be valid UTF-8, so any module would need a valid UTF-8 filename to be usable

view this post on Zulip Richard Feldman (Dec 24 2024 at 21:13):

although I guess theoretically the rest of the path could be invalid

view this post on Zulip Richard Feldman (Dec 24 2024 at 21:14):

but then as soon as we're printing out the paths for display, we have invalid Unicode so we'd be displaying the Unicode replacement character, which could be confusing
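
To make the two cases concrete, here is a small sketch using only standard library calls on raw bytes. The helper names `module_filename` and `display_path` are hypothetical and not part of the Roc compiler.

```rust
use std::borrow::Cow;

/// Hypothetical helper: accepts a module file name only if its bytes are
/// valid UTF-8, returning a borrowed &str without allocating.
pub fn module_filename(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}

/// Hypothetical helper for display: invalid byte sequences are replaced
/// with U+FFFD (the Unicode replacement character). Allocates only when a
/// replacement is actually needed.
pub fn display_path(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}
```

For example, the bytes `b"Mod\xFFule.roc"` are rejected by `module_filename`, while `display_path(b"bad\xFFdir/Module.roc")` yields `bad\u{FFFD}dir/Module.roc`, which is the potentially confusing output described above.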

view this post on Zulip Richard Feldman (Dec 24 2024 at 21:15):

all that said, it also feels pretty free to just leave them as std Paths for now


Last updated: Jul 06 2025 at 12:14 UTC