While working on the canonicalization rewrite, I've been trying to figure out how best to handle file paths. We want to make roc_can no_std to enforce that everything is arena-allocated for caching and performance. I started by converting all Paths to &'a [u8], but with the recent OsArg discussion, it seems like a recipe for Windows-related bugs.
Unfortunately, I'm not seeing a way to use an existing impl for Path: I found no issue to make it available no_std in the Rust repo, and no crates in the ecosystem at first glance.
It seems like the solution is to add a new arena-friendly Path in crates/compiler/collections. It's pretty self-contained. Does anyone disagree that it would be a good beginner issue? It's tricky to get right, but they should be able to copy a good bit of it from the stdlib.
I'm going to sound like a passenger princess saying this, given that I've done almost zero contribution recently, so feel free to ignore this, but I would strongly suggest doing the simplest thing (using stdlib paths) first and then getting rid of allocations etc. in subsequent passes once things are stable. One of the biggest issues IMO from an engineering perspective (and in no small part my fault) is that there are layers of abstraction that make the implementation hard to read and debug. I think this is a good beginner issue, but having bugs here will be very painful.
Yeah, that's fair. Okay, I can stick with Path for now and do my best to avoid allocations. It shouldn't be a huge pain to enable #![no_std] later
Though it would be nice to have it from the start to keep anyone else from doing that
FWIW I'm firmly on board with keeping Paths out of can - and in fact, IMO it should be as far away from the core of the compiler as possible. This way, we can avoid platform-specific behavior. e.g. I want to prevent Windows-specific parsing (or can) problems or behavior differences.
That said, I'm not sure &[u8] is the right thing to use in its place.
A much better option (IMO) is to just use &str. My bet is that we will _never_ encounter a problem that will be fixed by having the slightly more faithful Path types.
We need some way to maintain a relationship between a file path and its module path, e.g. Path.To.Module and src/Path/To/Module.roc
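To make that relationship concrete, here is a minimal sketch of the mapping in both directions. The function names, the src root, and the / separator are assumptions for illustration, not what the compiler actually does, and it allocates a String purely for readability.

```rust
/// Hypothetical helper: maps a dotted module name such as `Path.To.Module`
/// to a relative source path such as `src/Path/To/Module.roc`.
/// The `src` root and the `/` separator are assumptions for this sketch.
pub fn module_to_rel_path(module: &str) -> String {
    let mut out = String::from("src");
    for segment in module.split('.') {
        out.push('/');
        out.push_str(segment);
    }
    out.push_str(".roc");
    out
}

/// Hypothetical inverse: recovers the dotted module name from a relative
/// path. Returns `None` when the path is not under `src/` or does not end
/// in `.roc`.
pub fn rel_path_to_module(path: &str) -> Option<String> {
    let inner = path.strip_prefix("src/")?.strip_suffix(".roc")?;
    Some(inner.replace('/', "."))
}
```

Because both sides are plain &str here, nothing in the conversion depends on the host platform's path rules.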
&str can work for now
Do you happen to know what that's used for, after can?
I don't think it's really used after can, besides error reporting
Can't check right now, helping my mom with annual computer stuff at Xmas :grinning_face_with_smiling_eyes:
I guess an optimization we could do is to save module names in a double format at load time
AKA
pub struct ModuleName<'a> {
    pub filename: &'a str,
    pub segments: &'a [&'a str],
}
And then we don't have to worry about how to convert
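A rough sketch of how that could look at load time, assuming the segments are split once and then only borrowed. The split_segments helper, the constructor, and last_segment are hypothetical, and the Vec here just stands in for whatever arena would own the slice in practice.

```rust
/// The "double format" from above: the filename and the pre-split module
/// segments are stored side by side, so nothing needs converting later.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModuleName<'a> {
    pub filename: &'a str,
    pub segments: &'a [&'a str],
}

/// Hypothetical helper: splits a dotted module name into segments that
/// borrow from the input. A real implementation would presumably put the
/// slice in an arena; a Vec stands in for that here.
pub fn split_segments(dotted: &str) -> Vec<&str> {
    dotted.split('.').collect()
}

impl<'a> ModuleName<'a> {
    /// Pairs a filename with its already-split segments (no allocation).
    pub fn new(filename: &'a str, segments: &'a [&'a str]) -> Self {
        ModuleName { filename, segments }
    }

    /// The final segment, e.g. `Module` for `Path.To.Module`.
    pub fn last_segment(&self) -> Option<&'a str> {
        self.segments.last().copied()
    }
}
```

Since the struct only holds borrowed data, it is Copy and can be passed around freely without touching an allocator.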
Why would we ever be allocating new paths in can? Feels pretty unimportant to worry about their allocations and add complexity around them
Also, for a module name, I feel like we could force valid unicode. It's our language after all.
So if we really wanted, we could just convert all module names to valid utf-8 strings
yeah filenames can be invalid UTF-8 in the OS, but Roc source files have to be valid UTF-8, so any module would need a valid UTF-8 filename to be usable
although I guess theoretically the rest of the path could be invalid
but then as soon as we're printing out the paths for display, we have invalid Unicode so we'd be displaying the Unicode replacement character, which could be confusing
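For illustration, a small sketch of both behaviors using only std: strict validation for module filenames, and lossy conversion for display. The function names are made up for this example, and raw bytes stand in for whatever the OS hands back.

```rust
use std::borrow::Cow;

/// Hypothetical check for a module filename: accepts the bytes only if
/// they are valid UTF-8, otherwise returns `None`.
pub fn validate_module_filename(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}

/// Hypothetical display helper: converts arbitrary path bytes to text for
/// an error message. Invalid sequences become U+FFFD, the Unicode
/// replacement character, which is the confusing output mentioned above.
pub fn display_path_bytes(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}
```

Passing a path with a stray 0xFF byte through display_path_bytes yields text containing U+FFFD, while validate_module_filename rejects the same bytes outright.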
all that said, it also feels pretty free to just leave them as std Paths for now
Last updated: Jul 06 2025 at 12:14 UTC