:sob:
error: implementation of `FnOnce` is not general enough
--> crates/compiler/gen_llvm/src/llvm/build.rs:5735:5
|
...
|
= note: closure with signature `fn(&'2 [BasicMetadataValueEnum<'_>]) -> CallSiteValue
= note: ...but it actually implements `FnOnce<(&'2 [BasicMetadataValueEnum<'_>],)>`,
can't you wrap it in another closure to fix this?
also where did the return type go on the bottom line?
i don't know.. it's the classic thing where you have a parameter like
build_foo : FnOnce(Something<'ctx>) -> Other<'ctx>,
and feed it
|something| env.builder.build_whatever(something)
and the inference breaks down over higher-rank lifetimes so you just need to explicitly type it as
|something: Something<'ctx>| env.builder.build_whatever(something)
the fix is simple but it's unfortunate the error message is poor
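The shape of the workaround can be sketched like this (hypothetical names, not the actual compiler code): a parameter with a higher-rank `FnOnce` bound over a lifetime, fed a closure whose parameter type is written out explicitly.

```rust
// Minimal sketch of the workaround: annotating the closure's parameter
// type helps inference when the bound is higher-rank over a lifetime.
fn call_with_slice<F>(f: F) -> usize
where
    F: for<'a> FnOnce(&'a [u8]) -> usize,
{
    f(&[1, 2, 3])
}

fn main() {
    // In this simple case both the annotated and unannotated closure
    // compile; in more involved code only the annotated form satisfies
    // the higher-rank bound.
    let n = call_with_slice(|xs: &[u8]| xs.len());
    println!("{n}");
}
```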
"ægraphs: Acyclic E-graphs for Efficient Optimization in a Production Compiler" https://vimeo.com/843540328
fyi we just upgraded main to rust 1.67. upgrading should be seamless, but you may see some longer build times
also a cargo clean may save ~100gb on your system
function erasure is ready for review: https://github.com/roc-lang/roc/pull/5576
Random question, is the intent to use this instead of lambda sets? I.e. this helps remove a complex area of the compiler that was responsible for bugs? Just not entirely understanding why here. I don't need to know, but it looks cool and I'm interested.
it's an alternative to compiling functions to lambda sets, but it won't get rid of them. We need type erasure (for values and functions) in order to support Map2, so it will at least be used for that. But we could also eventually use it for dev builds, since type-erased functions are faster to compile, or for roc check, since lambda sets are not important for type checking.
also they're more amenable to on-disk caching!
(for dev builds)
https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir_transform/src/large_enums.rs
would be cool to consider this. we do this already in the llvm backend but it could be generalized
https://github.com/wolfpld/tracy/ h/t @Andrew Kelley
also the goated https://github.com/janestreet/magic-trace
@Richard Feldman & Josh (unsure of your last name, sorry), since we briefly talked about how much logic was in file.rs I thought you might get a kick out of how many lines this file is in zig: https://github.com/ziglang/zig/blob/master/src/Sema.zig
@Joshua Warner
that's awesome :joy:
when people complain about this, I usually say something to the effect of, "they're good lines, Brent"
(reference to https://knowyourmeme.com/memes/theyre-good-dogs-brent)
That GitHub link makes the mobile app crash so I am gonna guess it's quite large :sweat_smile:
you should see https://github.com/microsoft/TypeScript/blob/main/src/compiler/checker.ts
:laughter_tears:
Makes me long for FORTH a little. For example, the infamous IOCCC submission buzzard.2's ~60 lines of C source code is essentially all you need to bootstrap a language runtime.
it was inevitable of course: I have submitted a patch to llvm https://reviews.llvm.org/D155944
no idea what the status of the C api is actually, and what the chances are of that being accepted, but long term we will absolutely need guaranteed tail calls
https://twitter.com/lemire/status/1683560952027815936
Wow.... That took way way way longer than it should have to happen
More power to the compiler, probably at the cost of more hardware register contention. But compilers are almost certainly good enough and code complex enough that this will be a gain.
that they're all caller saved is super sweet. hopefully will make compilers go faster too, less time for the register allocator to run
some cool ideas here about data type representation https://inria.hal.science/hal-04165615/document needs some iteration I think to make it really usable, but it's a good start
in recent rust versions, you can use OnceLock
instead of lazy_static!
use std::sync::OnceLock;

// globally cache the temporary directory
static TEMP_DIR: OnceLock<tempfile::TempDir> = OnceLock::new();

let temp_dir = TEMP_DIR.get_or_init(|| tempfile::tempdir().unwrap());
it's not always the most ergonomic approach (and LazyLock, the stdlib name for this thing, is still an unstable feature, so it may make it in at some point), but in many cases this works quite well and does not require any dependencies
(from what I can read, the runtime cost is the same)
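A fully self-contained sketch of the pattern, without the tempfile dependency, to show that the initializer runs only on the first get_or_init call:

```rust
use std::sync::OnceLock;

// Cached global value; the closure passed to get_or_init runs at most once.
static GREETING: OnceLock<String> = OnceLock::new();

fn greeting() -> &'static str {
    GREETING.get_or_init(|| String::from("hello"))
}

fn main() {
    assert_eq!(greeting(), "hello");
    // second call returns the cached value; the initializer does not re-run
    assert_eq!(greeting(), "hello");
}
```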
someone is picking up the rust custom allocators work https://shift.click/blog/allocator-trait-talk/
not holding my breath for quick progress there, but it's good that someone is working on it
does anyone know if there is a technical reason for allocating from high to low addresses? It appears quite common, but I'd like to know if that is just a custom or that there are good technical reasons for it
do you mean on the stack or the heap? on the heap maybe just because of memory regions?
I assume it's because the stack allocates the other way
I mean heap-like things. E.g. an arena allocator
like the stack and the heap can't both use the same strategy, unless one of them is set up to start right where the other one ends, but to do that you'd need to know the exact size of the stack
which I think maybe requires a syscall or something, and/or may not always be supported on all OSes?
well yes when you have some other region growing from the other side you need to make a choice
but what if you didn't? is there still a reason to prefer starting at the end and growing down?
None that I can think of.
must just be convenience then
as in, the thing I could copy-paste went high-to-low, so that is just what it does now
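For reference, a downward-growing bump allocator (the pattern being discussed) might look like this minimal sketch over a fixed buffer; one small convenience of bumping downward is that rounding an address down to a power-of-two alignment is a single mask. Names here are hypothetical, not any real allocator's API.

```rust
// Minimal high-to-low bump allocator over a fixed buffer.
struct DownBump {
    buf: Vec<u8>,
    top: usize, // next free byte; grows downward from buf.len()
}

impl DownBump {
    fn new(size: usize) -> Self {
        DownBump { buf: vec![0; size], top: size }
    }

    // Returns the offset of `size` bytes aligned to `align` (a power of
    // two), or None when the buffer is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        // Bump down, then round down to the alignment with a single mask.
        let new_top = self.top.checked_sub(size)? & !(align - 1);
        self.top = new_top;
        Some(new_top)
    }
}

fn main() {
    let mut a = DownBump::new(64);
    // 64 - 10 = 54, rounded down to a multiple of 8 -> 48
    assert_eq!(a.alloc(10, 8), Some(48));
}
```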
looks like making valgrind work on Apple Silicon is...really difficult! :astonished:
https://github.com/LouisBrunner/valgrind-macos/issues/56#issuecomment-1651811069
good talk about performance of the Carbon compiler
https://www.youtube.com/watch?v=ZI198eFghJk
modeling the semantics (including checking) as an IR sounds very interesting
really want to read the source
How does Roc's reference counting deal with cycles? I am wondering, since I am fascinated with ORC, the way Nim did it. Does Roc do it at all at the moment? Thanks for your answers :)
Roc does not deal with cycles, because cycles are impossible to introduce in an immutable language like Roc, since to create a cycle you must create then mutate a value.
small correction: immutable and strict language. Laziness in haskell can also create cycles
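The "create then mutate" point can be illustrated in Rust, where interior mutability is exactly what makes a cycle constructible (an illustrative sketch, not anything from the Roc codebase):

```rust
// Building a reference cycle requires mutating a value after it is
// created - which pure, strict, immutable values don't allow.
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
    // This mutation is what closes the cycle a -> b -> a.
    // (The cycle is intentionally leaked here; plain refcounting
    // never frees it.)
    *a.next.borrow_mut() = Some(b.clone());
    assert_eq!(Rc::strong_count(&a), 2);
}
```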
that first operator in Carbon's source code :eyes:
Screen-Shot-2023-08-23-at-9.24.24-AM.png
this is from https://youtu.be/ZI198eFghJk?t=2830
That is just normal c++, right?
Shift and assign
oh well that's no fun :stuck_out_tongue:
I think the last time I wrote a whole C++ program was over 20 years ago :sweat_smile:
We realize that we're getting older when we look at our children... and at our old code :wink:
Yesterday I updated an Elm file that hadn't been touched for over 7 years!
I wish I still had that RPG I wrote in c++ I wrote all those years ago...I'm sure I could spot some undefined behavior in it today :laughing:
very interesting!
https://arxiv.org/pdf/2107.01250.pdf
this is a cool tip if we ever want to start testing big-endian targets in CI: https://twitter.com/burntsushi5/status/1695483429997945092
(I'm fine with not officially supporting them as targets yet)
I'll just leave this here :octopus:
PS C:\Users\bosyl\Documents\GitHub\roc> .\target\release\roc.exe run ..\basic-cli\examples\hello-world.roc
🔨 Rebuilding platform...
warning: ignoring debug info with an invalid version (0) in app
Hello, World!
First time I've been able to get that working.
I packaged basic-cli
for Windows and uploaded the binary in the Github release. Not sure if this is a bad idea to share like this...
PS C:\Users\bosyl\Documents\GitHub\basic-cli> ..\roc\target\release\roc.exe run .\examples\hello-world.roc
Downloading https://github.com/lukewilliamboswell/basic-cli/releases/download/0.5.0/Dt_L3PF3VAAxtLzxXZ-g9nBp0Gzo6tbTevSep1SdDeQ.tar.br
into C:\Users\bosyl\AppData\Roaming\Roc\packages
warning: ignoring debug info with an invalid version (0) in app
Hello, World!
app "hello-world"
    packages { pf: "https://github.com/lukewilliamboswell/basic-cli/releases/download/0.5.0/Dt_L3PF3VAAxtLzxXZ-g9nBp0Gzo6tbTevSep1SdDeQ.tar.br" }
    imports [pf.Stdout, pf.Task.{ Task }]
    provides [main] to pf

main : Task {} I32
main =
    Stdout.line "Hello, World!"
There are still a lot of issues here, other examples I've tried will build and run but not print anything... but at least this is a start. :smiley:
I don't think the exe is code signed, so that can be a hurdle with SmartScreen :thinking:
Ahh Windows :smile:
image.png
What I did is I used the Windows Sandbox to get a clean windows version, downloaded the Exe. Went to the properties and clicked Unblock (due to it being downloaded from the internet) and then just double clicked the exe file.
image.png
Running it in powershell also yields no output
@Luke Boswell let me know if you need help with testing Windows stuff. I have Windows 10 on one machine and Windows 11 on another.
I'm surprised the cli doesn't even print anything. :thinking:
this is wild - I can't believe that nibble mask works, those numbers look so random! But it totally does - I tried it out.
https://lemire.me/blog/2017/07/10/pruning-spaces-faster-on-arm-processors-with-vector-table-lookups/
Linus Torvalds on mmap vs buffered reading in Linux: https://lkml.iu.edu/hypermail/linux/kernel/0802.0/1496.html
I'm confused whether this is specific to the linux kernel, or more general for "programs running on linux"
also interesting: https://stackoverflow.com/a/260188
I watched a few YouTube tutorials on interactive rebasing, and cherry-picking and I have to admit... I wish I had done that sooner! So simple, and would have saved me a lot of time. Learnt a few nice tricks along the way. :thumbs_up: :sweat_smile:
This is something that when I started using Fork (https://git-fork.com/) just became so much clearer to me. Unfortunately doesn't have a linux port, only Windows and Mac
I'm just wondering, should the zig tests pass on MacOS? I get the below errors, which seem strange to me: thread 11673296 panic: incorrect alignment
Aarch64ZigTestError
gist with errors and correct alignment
that is weird. strings have a refcount, so the alignment should always be 8?
https://dl.acm.org/doi/pdf/10.1145/3243176.3243195 has some wild RC numbers:
How many do we use?
Same number of bits as a pointer. So 64 or 32.
We increase the alignment on many heap data structures because of the refcount. Like Str bytes for example.
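A hedged sketch of the layout being described (not Roc's actual code): the refcount is one pointer-sized word stored just before the element data, so the data offset within the allocation is the larger of the element alignment and the pointer size.

```rust
// Offset of the element data within a refcounted heap allocation,
// assuming a pointer-sized refcount word stored immediately before it.
fn data_offset(elem_align: usize) -> usize {
    elem_align.max(std::mem::size_of::<usize>())
}

fn main() {
    // On a 64-bit target: Str bytes (align 1) still get an 8-byte prefix,
    // while 16-byte-aligned elements (e.g. i128/Dec) get a 16-byte prefix.
    assert_eq!(data_offset(1), 8);
    assert_eq!(data_offset(16), 16);
}
```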
The github web vscode editor (press '.' key on a repo's page) works well now. In the past I always had issues with commit signing but that seems to have been fixed :) Handy for when you quickly want to make small changes to multiple files.
Woah....never knew that was a feature
https://buttondown.email/hillelwayne/archive/github-has-too-many-hidden-features/
Dec is now the default type for fractional values. This may cause breakage in some cases. Also I suspect there are still a bunch of missing functions for Dec that are used in practice. The quick fix is to explicitly make literals of the f64 type, like 3.14f64
@Folkert de Vries If you aren't looking into Dec sin/cos impl, I may look into it. Sounds like an interesting side project. I have always wondered how they are implemented in software.
I'm not working on that, and don't plan to
yessss that's awesome!
very interesting reference counting technique, just published in 2021! https://dl.acm.org/doi/10.1145/3453483.3454060
direct link to paper: https://dl.acm.org/doi/pdf/10.1145/3453483.3454060
progress on aarch64 dev backend (elf only though)
this is gen_num
Summary [ 64.943s] 144 tests run: 19 passed, 125 failed, 709 skipped
now just needs a lot more instructions, but the scaffolding works
also the rust library we use has very limited options for macho relocations. it's not clear whether that is fundamental or whether they've just not implemented it yet
also the dev backend repl just got merged, so on x86 linux, the repl should now be much faster!
@Luke Boswell and I got it to work on macos too now, and fixed some other issues. Now there is some bug with branches that went unnoticed on x86 and then we'll have almost all of gen_num working already. (still some float stuff and 128-bit stuff to do)
yoooooooooo
Amazing work, you two!
I just discovered Agner Fog's optimization resources page (which is incredible btw) has a comprehensive explanation of calling conventions on all the major OSes :exploding_head:
That page continues to gain content. Yeah, it is pretty great
I've generated an object file from a test using e.g. ROC_DEV_WRITE_OBJ=1 cargo nextest-gen-dev add_checked_dec, and then built this into an executable using zig build-exe /var/folders/48/39th9k0n0wdcj18k3yhm_g5c0000gn/T/app.o crates/compiler/builtins/bitcode/builtins-aarch64.ll something.zig. Note that ROC_DEV_WRITE_OBJ flags roc to write out the object file and print the location of the file.
But when I step through the file from test_main, it seems to never get into #UserApp_main_482528569654279254. Instead it branches off at 0x100000804 <+52>: b.ne 0x100000850 and then returns successfully without calling the actual test code I'm interested in, I think.
(lldb) target create "something"
Current executable set to '/Users/luke/Documents/GitHub/roc/something' (arm64).
(lldb) b test_main
Breakpoint 1: where = something`test_main, address = 0x00000001000007d0
(lldb) r
Process 12932 launched: '/Users/luke/Documents/GitHub/roc/something' (arm64)
Process 12932 stopped
* thread #1, stop reason = breakpoint 1.1
frame #0: 0x00000001000007d0 something`test_main
something`test_main:
-> 0x1000007d0 <+0>: sub sp, sp, #0x60 ; =0x60
0x1000007d4 <+4>: str x30, [sp, #0x58]
0x1000007d8 <+8>: str x29, [sp, #0x50]
0x1000007dc <+12>: add x29, sp, #0x50 ; =0x50
(lldb) disassemble --frame
something`test_main:
-> 0x1000007d0 <+0>: sub sp, sp, #0x60 ; =0x60
0x1000007d4 <+4>: str x30, [sp, #0x58]
0x1000007d8 <+8>: str x29, [sp, #0x50]
0x1000007dc <+12>: add x29, sp, #0x50 ; =0x50
0x1000007e0 <+16>: ldr x17, [x17]
0x1000007e4 <+20>: stur x8, [x29, #-0x8]
0x1000007e8 <+24>: stur x17, [x29, #-0x10]
0x1000007ec <+28>: ldur x0, [x29, #-0x10]
0x1000007f0 <+32>: bl 0x100000770 ; roc_setjmp
0x1000007f4 <+36>: stur x0, [x29, #-0x20]
0x1000007f8 <+40>: stur x1, [x29, #-0x18]
0x1000007fc <+44>: ldur x17, [x29, #-0x20]
0x100000800 <+48>: cmp x17, #0x0 ; =0x0
0x100000804 <+52>: b.ne 0x100000850 ; <+128>
0x100000808 <+56>: stur x17, [x29, #-0x10]
0x10000080c <+60>: bl 0x100000920 ; #UserApp_main_482528569654279254
I suspect it might be because I have the type of RocCallResult.value wrong?
const std = @import("std");

const RocCallResult = extern struct {
    tag: u64,
    error_msg: u64,
    value: bool,
};

extern fn test_main() callconv(.C) RocCallResult;

pub fn main() u8 {
    const value = test_main();
    std.debug.print("done {}\n", .{value.value});
    return 0;
}
This is the test that I am trying to investigate
#[test]
#[cfg(any(feature = "gen-llvm", feature = "gen-wasm", feature = "gen-dev"))]
fn add_checked_dec() {
    assert_evals_to!(
        indoc!(
            r#"
            Num.addChecked 2.0dec 4.0dec == Ok 6.0dec
            "#
        ),
        true,
        bool
    );
}
I'm pretty convinced this is my set-up and nothing to do with the test failing, tried the same method on a good test and had the same issue. :sad:
on aarch at the moment we don't use all of the rocresult stuff. instead I call/break on extern fn roc__main_1_exposed() callconv(.C) u128;
the issue here is something with how the return value is passed. it is too big to fit into registers, and we allocate stack space wrong: it overwrites the stored frame pointer and link register. that's how far I got last night
on the pi at least I've got all of gen_num now https://github.com/roc-lang/roc/pull/5824
weird thing: if you use the github UI to rebase a PR then the commits are unverified
but if you e.g. adjust the readme from the ui then the commit is verified (at least it used to be)
Yeah, I noticed that. No idea why. Can rebase a second time locally and that should verify the commits, but that kinda defeats the purpose of the GitHub UI button.
@Richard Feldman and @Anton re Windows, I think there are still a couple of issues which are causing most of the test failures. I suspect the issue is in strings and lists. Progress is a little slow because the tests run slowly, due to issues with caching on windows.
About the "issues with caching", do you mean locally or on CI?
Locally, when you run tests on windows it looks like zig has to rebuild the builtins and link the app every time
we know the cause of these issues right?
For a test which returns RocResult::ok(RocDec::from(6)), what would the equivalent zig type be? I'm having trouble tracking down something related in our builtins. I.e.
const RocCallResult = extern struct {
    tag: u64,
    error_msg: u64,
    value: RocResultOrSomething??,
};
I think this has done the trick
const RocCallResult = extern struct {
    tag: u64,
    error_msg: u64,
    value: RocResult,
};

const RocDec = extern struct {
    num: i128,

    pub fn addWithOverflow(self: RocDec, other: RocDec) WithOverflow(RocDec) {
        var answer: i128 = undefined;
        const overflowed = @addWithOverflow(i128, self.num, other.num, &answer);
        return .{ .value = RocDec{ .num = answer }, .has_overflowed = overflowed };
    }
};

const RocResultTag = enum(u8) {
    RocErr = 0,
    RocOk = 1,
};

const RocResultPayload = extern union {
    ok: RocDec,
    err: u8,
};

const RocResult = extern struct {
    payload: RocResultPayload,
    tag: RocResultTag,
};
glad you figured something out.
I have encountered a load-bearing dbg! in my Rust code - the program does something differently depending on whether I do a dbg!(list.len()) :sweat_smile:
Probably means you are forcing the optimizer to not remove code
relatedly: anyone who has valgrind set up want to pair on a basic-webserver bug sometime? :big_smile:
I'm also seeing fun stuff like if I remove an if Bool.false then, things work differently, so some UB somewhere seems likely to blame
Rust achievement unlocked:
= note: import resolution is stuck, try simplifying macro imports
This was during the upgrade to rust 1.72. I'm done for today, if anyone else wants to take a stab at fixing the import resolution error, be my guest :)
https://github.com/roc-lang/roc/pull/5856
Important sidenote: this error pops up during cargo test --release --no-run
Just a note, clippy seems to be messing up the readability of a lot of the indoc tests.
I don't know how to fix the import resolution so I'm going to upgrade to rust 1.71.1 first and not let clippy autofix things :p
this could be really useful when we get to writing a debugger for roc! https://www.timdbg.com/posts/writing-a-debugger-from-scratch-part-1/
til https://en.wikipedia.org/wiki/X86_debug_register
Richard Feldman said:
this could be really useful when we get to writing a debugger for roc! https://www.timdbg.com/posts/writing-a-debugger-from-scratch-part-1/
I am keeping my eyes peeled waiting for when it is time for writing the debugger.
@Richard Feldman when will that be?
no concrete plans yet, but there's nothing blocking it as far as I know! Is it something you're interested in working on?
Yes, but I don't have the technical know-how to take on that task yet; it would take me some time to level up first.
I think we would first need good debug info generation before we would be interested in our own debugger
Like our code should work well with a regular debugger first
I'm doing a Software Unscripted episode with Matt Godbolt and he said "we should absolutely get Roc on Godbolt"
also he tried the nightly on a fresh Ubuntu install and when he put "hello world" into the repl and got a panic about missing a dynamic library
Hmm, we do test every nightly's repl on ubuntu, but this is github's workflow runner ubuntu so its dependencies are different. I happen to have a clean ubuntu vm so I'll try to reproduce.
I was able to reproduce it, I'll make an issue for the error and another one to test nightlies on all kinds of docker containers to mimic fresh installs.
awesome! I relayed this to him (we just finished recording) and he asked if there's something he can quickly apt-get install to fix it :big_smile:
I'll look at it right now
The amount of time that would be saved if the "No such file or directory" actually told you what file or dir it was looking for... :p
one of my coworkers who has been helping me with getting Roc incorporated into certain parts of the build just started trying it out, and DM'd me a bunch of questions for things he ran into as a beginner, followed by:
Richard, what have you done, now I just want to Roc(k) hahaha
we made a good thing! :heart:
Richard Feldman said:
we made a good thing! :heart:
that people love
awesome! I relayed this to him (we just finished recording) and he asked if there's something he can quickly apt-get install to fix it :big_smile:
I'm not sure what part of this script fixes the issue but it does :p I'll look into a less bloated fix later.
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 13
sudo apt install llvm-13 does not work btw
it might be missing certain features? what features are enabled for their official builds seems very random
why is cargo slow, you ask? well maybe because they just casually perform many small allocations in a recent PR?
https://github.com/rust-lang/cargo/pull/12751/files
it's just really weird to read that code?! it's so easy to remove most of those allocations
Anton said:
awesome! I relayed this to him (we just finished recording) and he asked if there's something he can quickly apt-get install to fix it :big_smile:
I'm not sure what part of this script fixes the issue but it does :p I'll look into a less bloated fix later.
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 13
sudo apt install llvm-13 does not work btw
Thanks! I'm surprised llvm isn't statically compiled into the roc binary though :face_with_raised_eyebrow:
also surprising: I believe he's on x64, and ran into this in the repl...but the latest Linux x64 nightly should use the dev backend in the repl, not LLVM! :thinking:
Doesn't matter the backend? Probably still loads the same shared libraries
Though maybe somehow it failed to load the dev-backend-generated shared library for the repl.
Also, yeah, llvm can't be dynamic. Otherwise we would see way way more versioning issues, right?
I would certainly think so! :sweat_smile:
Yeah, I think it's something the llvmsh script installs, not llvm itself
Though maybe somehow it failed to load the dev-backend-generated shared library for the repl.
Yes indeed
This is interesting: https://github.com/simd-everywhere/simde
Nice :)
Yeah, I think it's something the llvmsh script installs, not llvm itself
Alright, found it, ld was missing, this can be fixed on Ubuntu with:
sudo apt install binutils
I've added this to the linux getting started as well:
https://github.com/roc-lang/roc/pull/5872
Does anyone know if it should be possible to import URL packages in the platform main file just like an app? I haven't been able to get it working. I'm getting errors like
thread 'main' panicked at '[Qualified("json", json.Core)] not in {} ', crates/compiler/load_internal/src/file.rs:2211:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
it should be, but I remember there's some bug with it from way back :sweat_smile:
might need to vendor it for now as a workaround
https://twitter.com/VictorTaelin/status/1710766199288570079 I saw the discussion about compiling Elm and making fast graphics apps and thought maybe this could be relevant to Roc.
Fascinating. It would be cool to have Victor Taelin on Software Unscripted @Richard Feldman
What does the alignment mean in this function on a RocList: pub fn decref(self: RocList, alignment: u32) void?
Alignment is a memory-layout concept: it constrains which addresses a value may start at. For aggregate data, it tends to be the alignment of the largest thing in the aggregate.
This ended up being a good mini talk on why llvm is often a pain: https://youtu.be/g1qF9LZOoFE?si=7tifU78yyT1K2ZZy
yes fun talk!
wow, I had largely given up hope that this would ever land! :sweat_smile:
https://reviews.llvm.org/D86310
Wow yeah, nice
Trying to compile from main and I think there may be something up with the nix configuration. I haven't seen this before.
192-168-1-105:roc luke$ nix develop
192-168-1-105:roc luke$ cargo build --release --locked
Compiling roc_repl_wasm v0.0.1 (/Users/luke/Documents/GitHub/roc/crates/repl_wasm)
error: failed to run custom build command for `roc_repl_wasm v0.0.1 (/Users/luke/Documents/GitHub/roc/crates/repl_wasm)`
Caused by:
process didn't exit successfully: `/Users/luke/Documents/GitHub/roc/target/release/build/roc_repl_wasm-ba13b2d4af2101b5/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=src/repl_platform.c
--- stderr
thread 'main' panicked at 'Output {
status: ExitStatus(
unix_wait_status(
256,
),
),
stdout: "",
stderr: "wasm-ld: error: Unknown attribute kind (86) (Producer: 'LLVM16.0.6' Reader: 'LLVM 13.0.1')\n",
}', crates/repl_wasm/build.rs:48:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Looks like you are still pulling in new zig from when you used the updated llvm branch?
Or something newer in your stack
That or a cached file that was built when you were on an updated branch....something along these lines
Thank you @Brendan Hansknecht, cargo clean did the job!
from https://nibblestew.blogspot.com/2023/10/the-road-to-hell-is-paved-with-good.html
module;
#include<evil.h>
export module I_AM_A_DEFINE_GOOD_LUCK_FINDING_OUT_WHERE_I_COME_FROM;
sometimes I really wonder what happens behind the C++ committee doors
Really interesting presentation from Modular/Mojo developers: https://www.modular.com/blog/mojo-llvm-2023
Clever how the surface syntax almost immediately desugars to an MLIR. The slides on JIT architecture are super interesting too, especially the idea regarding shipping packages with bytecode for eval
I thought their idea of taking control of more llvm passes and then running llvm in parallel on a per-function level is quite intriguing
How to work around the beast of llvm
apparently part of how bun is so fast! https://github.com/simdutf/simdutf
and nodejs!
ooo :eyes: https://twitter.com/__protected/status/1715693892933153144
at the end of that thread are links to papers, including: https://2023.splashcon.org/details/splash-2023-oopsla/47/Getting-into-the-Flow-Towards-Better-Type-Error-Messages-for-Constraint-Based-Type-I
Is there anyone online who has an apple silicon Mac and is able to test something for me? I have a graphical "hello world" in this gist and am interested to know if it works for others or if there are issues. It should be a prebuilt package, so it should just work; you'd only need the latest zig to re-build the platform. Next step is to figure out what I need to cross-compile to various platforms so I can pre-build those object/archive files too.
Oh, it should spit out a zigimg.png file in the local directory where you run it from using roc run test.roc
I can try it in 15 min
"just worked" for me!
https://dl.acm.org/doi/pdf/10.1145/3607858
that trick to also store the color of an RBTree in the pointer (because you only need one bit for the node type) is cool
you want very flexible niches on the one hand, on the other hand we need to generate code (and maybe also debuginfo) for that and that is hard
Yeah I agree. The codegen is painful, both to implement and debug
I feel like there must be some fundamental theorem that shows optimal-niche-finding is np-complete. it feels the same as instruction selection.
Maybe useful to us at some point:
https://www.npopov.com/2023/10/22/How-to-reduce-LLVM-crashes.html
I've got another build of basic-graphics; gist of the example here. I think I have managed to include in the bundle a build for the following targets:
Just wondering if anyone could test this for me? and let me know if you have any issues.
I would like to clean up the API a fair bit, but hoping I have all the parts together and working now.
EDIT: It doesn't work on Linux, I need to do more work to make that just work.
I think we'll check the language server in. Please review it or merge here: https://github.com/roc-lang/roc/pull/5937
Also, the first CI job appears done in the details page, but the status check didn’t update. @Anton any ideas?
Needed some tweaking to get the tests running on my current macos machine; fixed here: https://github.com/roc-lang/roc/pull/5938
If folks have thoughts on whether we should go for the xcode-select --print-path approach instead, that'd be useful.
lots of sweet stuff in here, but at the end he notes that you can make 32 bit pointers that Just Work on a 64-bit system, which immediately makes sense, but which had never actually occurred to me! https://www.youtube.com/watch?v=H8THRznXxpQ
Oh I watched that video a few weeks ago but never got as far as that 32-bit trick! Will rewatch it. I've been taking his course since the start, enjoying it.
Wow yeah so memory addresses are virtual and you can force them to be under 2^32 for your process as long as you don't actually use more than 4GB. Crazy.
So you could shrink all pointers in your data structures from 64 to 32 and fit more stuff in cache.
I never heard of this before and suddenly I think it's weird that more people don't do it!
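A related, more portable way to get the same cache win without relying on a small virtual address space is to store 32-bit indices into an arena instead of full 64-bit pointers. A minimal sketch (hypothetical names, not any real library's API):

```rust
// Arena handing out 4-byte indices instead of 8-byte pointers, so
// index-bearing data structures take half the space in cache.
struct Arena<T> {
    items: Vec<T>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Idx(u32); // half the size of a 64-bit pointer

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn push(&mut self, item: T) -> Idx {
        let idx = Idx(self.items.len() as u32);
        self.items.push(item);
        idx
    }

    fn get(&self, idx: Idx) -> &T {
        &self.items[idx.0 as usize]
    }
}

fn main() {
    let mut arena = Arena::new();
    let i = arena.push("hello");
    assert_eq!(*arena.get(i), "hello");
    // the "pointer" is 4 bytes, not 8
    assert_eq!(std::mem::size_of::<Idx>(), 4);
}
```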
Do you guys know that rust has no small string optimization at all? It also can't ever have small string optimization due to how it defines its apis. Vec guarantees that it is always heap allocated. And String guarantees it can always seamlessly convert to a Vec without copying any data. As such, String must also be heap allocated.
I guess the main argument is: &str is a huge win, so SSO is often not actually used/needed.
Makes me wonder how our perf would be affected if we dropped SSO and instead only had seamless slices. That said, &str should be strictly higher perf than seamless slices, so maybe that wouldn't map to Roc in the same way.
Cause &str is in the type system and avoids runtime checks. It also doesn't have to deal with any sort of refcount.
Really old and long threads on this:
The https://github.com/roc-lang/roc/blob/main/getting_started/macos_apple_silicon.md states that we do not yet support MacOS 13, is this still true?
Oh, that is outdated now that we updated zig
I'll fix that
Semi-big development update: we're going to halt development on lambda sets for now, in favor of getting boxed closures to work across the board, so that we can unblock development on effect interpreters.
this means closures will be heap-allocated, which unfortunately means worse runtime performance. However, this is actually something we want to try to speed up dev builds in the long term anyway (that is, heap allocating closures), so switching to having them all work that way will give us more data points about the runtime cost in practice as well as some idea of the effect on compile times
I bet that was a tough call, sounds wise to move things forward.
Does the boxed closure contain a function pointer or a tag union or both?
just function pointer
maybe we could keep the current functionality behind a flag like --experimental-closures or something, to make it easier to compare perf impact and compile times with and without
Ok, where's the captured data stored?
Is it a box containing a structure of function pointer and data?
yeah exactly
I hope llvm can still inline them ok.
Actually, it’s a bit more efficient than that. instead of boxing the pointer, it’s stored as a fat pointer. A boxed closure is a three-word record - one for the pointer, one for the closure data (or NULL), one for the ref counter function (or NULL). I have more details in https://www.notion.so/rwx/Type-Erasure-a3ed13ef1305422eba00dbda026e52b3?pvs=4
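As a hedged sketch (hypothetical names, not the compiler's actual definitions), the three-word record described above could be modeled like this:

```rust
// Three-word erased closure record: function pointer, captured data
// (or null), and a refcount function for the captures (or null).
#[repr(C)]
struct ErasedClosure {
    call: *const (),       // the function pointer
    data: *mut u8,         // boxed captures, or null when nothing is captured
    refcounter: *const (), // refcount function for the captures, or null
}

fn main() {
    // Exactly three pointer-sized words, as described.
    assert_eq!(
        std::mem::size_of::<ErasedClosure>(),
        3 * std::mem::size_of::<usize>()
    );
}
```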
Also, just curious, why this over still keeping our static dispatch with a switch statement, but with just boxing the data? I thought mostly the data was the issue, not the function pointers.
We still end up needing the effective equivalent of lambda sets in that case
ah. ok.
Also, for the refcounter function, is it a function that could be called by the host to increment the refcount of all closure captures such that the host can call the same closure multiple times?
I think it could be, but I haven't fully thought that through. The host would need to manage the lifetime though
If you have any friends working on type theories or PL PhDs, would appreciate if you share this with them: https://github.com/roc-lang/roc/issues/5969
https://abseil.io/docs/cpp/atomic_danger#performance-considerations very interesting
just use mutexes I guess. that should keep things simple in nea
https://blog.rust-lang.org/2023/11/09/parallel-rustc.html
we should try and see if this helps our project
this is actually really surprising to me, especially with the heavy multi-pass LLVM backend. i guess part of it might be explained by Rust trait selection/method selection being pretty involved. But it’s something we should be mindful of: right now all inference in our compiler is single pass, intentionally, and making it multi pass could have effects like this
It just sounds like they manage to run more llvm threads at the same time cause other stuff finishes sooner, right?
So that is why llvm is faster
i mean in principle the unit of parallelism between the front end and backend should be the same though
Pretty sure this is why it is faster: 8 threads instead of 1 generating llvm ir at the boundary of frontend and backend
Eight of the LLVM threads start at the same time. This is because the eight "rustc" threads create the LLVM IR for eight codegen units in parallel. (For seven of those threads that is the only work they do in the back-end.) After that, the staircase effect returns because only one "rustc" thread does LLVM IR generation while seven or more LLVM threads are active. If the number of threads used by the front-end was changed to 16 the staircase shape would disappear entirely, though in this case the final execution time would barely change.
https://utcc.utoronto.ca/~cks/space/blog/programming/GoModulesAndDomainExpiry
Let me be clear that this is a hard problem in general and no one has a good answer to it
or at least no one had a good answer to it... :wink:
(to be fair, we also don't have a solution for URL-based packages to the problem of "the domain expired or got taken over and now I don't have a good way to notify people about where the new package lives")
but we do have a good answer to the security problems, and not just "any new owner of your package's URL has limited scope for being malicious" or "in theory they can't publish a new version (with malicious code) and have it automatically picked up by current users, because existing users will stick with the current version until they specifically update (new users of the package are not so lucky)"
very interesting issue with the derived eq implementation here
https://github.com/rust-lang/rust/issues/117800
in this case changing the codegen gave a 10% speedup
https://github.com/bevyengine/bevy/pull/10519
really interesting real-world comparison of monomorphization and dictionary passing for parametric polymorphism
https://planetscale.com/blog/generics-can-make-your-go-code-slower
Is this glue issue an easy fix? @Brendan Hansknecht do you know?
Screenshot-2023-11-15-at-12.43.29.png
My workaround is to delete the broken parts, but it's really fickle and is breaking CI for the webserver. I haven't figured out a good way to resolve it for all the architectures
Probably easy to fix
I'm generating using roc glue ../roc/crates/glue/src/RustGlue.roc platform/glue2 platform/main-command-glue.roc
https://github.com/roc-lang/basic-webserver/pull/3
I'm not seeing the error. How do I trigger it?
I guess it's only happening on CI machines then?? https://github.com/roc-lang/basic-webserver/actions/runs/6871760208/job/18689150356
that
If you regenerate the glue using this command roc glue ../roc/crates/glue/src/RustGlue.roc platform/glue2 platform/main-command-glue.roc
then in glue2 folder rust analyser should complain
that's an interesting discrepancy between aarch64 and x86_64
cool. I'll fix that
Also, what is the difference between glue and command_glue? command_glue doesn't look like it is fully generated with regular glue commands?
looks modified
Yeah, so that is another workaround. If you run glue, it doesn't generate any of the types for the Command module.
ah, yeah, fun
I think it is because the mainForHost is mainForHost : Request -> Task Response [] and neither Request nor Response include anything to do with the Command module
It feels like it is the same issue as https://github.com/roc-lang/roc/issues/5477 just manifested in a different form. That is, not all the types are available or something
can you try with the rocresult-traits branch that I just pushed?
So far looks good. Just pushed to CI
That worked :tada: thank you
Morphic reference implementation source code: https://zenodo.org/records/7712285 (a 6.6gb download because it's a docker image)
This repository contains a compressed docker image file, containing the artifact for the PLDI 2023 paper "Better Defunctionalization through Lambda Set Specialization." To use this artifact, first decompress the file (using tar or an archiving program like 7zip), and then use docker load to load up the decompressed docker image. Information for reproducing the results from our PLDI paper is available in README.md files in the 'morphic/' and 'LSSIsabelle/' directories inside the Docker image. Note! the Docker archive must be decompressed and then loaded with 'docker load' (not 'docker import', as our archive does not use squashed layers).
Loading this tarball into docker is heavy
Anybody else able to load it? I'm getting open /var/lib/docker/tmp/docker-import-143820044/repositories: no such file or directory
Oh I didn't gunzip
What tooling do we need to compile a platform? Specifically if someone new was wanting to work with basic-webserver do they only need rust's cargo and a roc nightly?
That sounds correct
having lots of fun with my web server that never allocates
memory allocation of 97165916604719789682: bytes failed
Today I learned you can also use > to quote on zulip, just like on github.
No need for the triple backticks with quote
Are there any plans to roll roc_fn into glue? I found that a really useful tool for basic-webserver. Would it be a good idea to copy it across into basic-cli too?
Oh, was that written manually and not generated by glue? I thought that was part of folkerts new glue work.
that is the plan yes
I found the solution to #6088 using the cursor.so editor, I described the problem, told it where to start looking and it found the bug :)
Definitely an easy bug but I'm still impressed.
@Folkert de Vries and I have been investigating issues with slow tests on Windows and have posted a question on the Zig discord. Basically we have identified that linking a dynamic library using zig build-lib is much slower than we expected -- 1,793ms vs 81ms.
issue https://github.com/ziglang/zig/issues/18123
ooh, virus scanner sounds interesting
yeah maybe try playing around with that
but I'm sceptical, we clearly see zig rebuilding that mingw stuff
if it helps, Andrew noted that:
it's expected for the first time linking a dynamic lib against libc on windows to take a while to build the assets
the cache namespace is determined by the zig version, target, and some other CLI flags
Llvm is so fun... :crying_cat:
https://twitter.com/DrawsMiguel/status/1729021572395286744?t=Jv1_8zERJt6Nb0qwp-_oGg&s=19
Also, good read: https://muxup.com/2023q4/storing-data-in-pointers
another fun read, on simd tricks https://mcyoung.xyz/2023/11/27/simd-base64/
it's long but good, and a cool look at the state of std::simd
though I tried it on some other code and ... I'm confused and certainly from a code size standpoint can do better myself
How do we go from a region to a line number?
Oh, I found LineInfo, I think I figured this out.
The backlog of things I want to work on keeps growing. I think I'm gonna need a bigger whiteboard to track this:
tasks.jpg
ME now: Why are my roc apps ooming???
ME an hour ago: disabled refcounting in the roc compiler for testing something.
Does anyone know what dynhost.pdb is, and where I might find it? I have generated a binary for Windows using --profiling and am running it in the debugger. Without using profiling I haven't seen this file requested before. I can still disassemble and step through the binary without it, but I'm guessing this has more information related to the source.
it's where the debug info lives
Is there a way to generate it?
we don't have a way to do it I think. it should be the same one as for the host?
in other words, when the host gets compiled, it should spit out this file for itself too (zig/rust/etc. should). maybe you can just rename it and use that?
Just saw a link to the inko programming language. Interesting idea on memory management. All values have one owner. When the owner goes out of scope it is dropped. Default is move semantics when assigning to a new variable and such (to keep all values having exactly one owner). You can make as many references to a value as you want (mutable and immutable). If an object is freed and references are used later (maybe if they exist at all), it will crash.
So doubling down on everything has exactly one owner like you usually get in rust via ownership and borrowing, but without complex tracking. No need to rc or GC. Some minor tracking to crash at runtime for use after free.
Trying to be like rust but a lot less complex with faster compile times; for that tradeoff, they lose some compile time guarantees and add runtime crashes. And then also a lot less runtime overhead than languages with rc or GC... not a tradeoff I would make, but interesting to see
I met its author at the rust meetup here, but we did not have a lot of time to discuss details. Would be interesting sometime
(they are funded by nlnet, which also funds my nea work)
Today I learned about Rust's sanitizer. If miri helped us find bugs, this may be useful as well.
miri can't really handle programs that aren't written to be run with miri; it is very limited
e.g. as soon as you do some extern fn thing it just gives up
so an actual sanitizer is probably much more useful for us
is that a thing in Rust? :thinking:
Well, yes, I linked to the sanitizer earlier :sweat_smile:. But perhaps I misunderstand your question.
ohh I thought based on the miri comment that the link used miri too!
I think we should add a "Developing Platforms" guide for the website, and include discussion or summary about the current state of things.
After our plugin meeting earlier this morning, I had the thought that it would be helpful to communicate some of the known issues (like glue being in-development) and what the plan/vision is. I can draft something if this would be helpful.
In future we can update this guide when platform development is more mature.
sounds good! I think there may be other informational things floating around - maybe ask in channel?
and yeah I'll publish it this weekend probably
Richard Feldman said:
and yeah I'll publish it this weekend probably
The Strings language reference :smiley:
yeah, that one :big_smile:
The Strings language reference
The examples repo has sort of become a place for both examples and more in-depth explainers as well, should we just put it there? It's useful for users to have just one place they need to go to to find the answers to their questions.
interesting, although I think the language reference should really go on roc-lang.org rather than in a repo
The examples are on roc-lang.org at www.roc-lang.org/examples :)
Perhaps we can find a better name instead of "examples", but I don't think we should further split up our "knowledge libraries", we already have the tutorial, docs, examples, faq, website pages and everything spread out on zulip and github.
maybe we could just add lang ref entries in the same sidebar as the builtin modules
and maybe this string reference could go at the top of the Str module
I want to have separate articles on things like how conditionals and pattern matching and such work, which don't fit into any particular builtin module, so I think we need more than just those - but maybe they can be located in the same place
and maybe this string reference could go at the top of the Str module
Yeah, that's a good spot for it, I do think we should then do something like add a ToC at the top, or collapse the headers by default. Scrolling to search for information is annoying.
I want to have separate articles on things like how conditionals and pattern matching and such work, which don't fit into any particular builtin module, so I think we need more than just those - but maybe they can be located in the same place
One option is to give these articles their own space and, to combat fragmentation, create a powerful search page on the website that can search through faq, examples repo, articles, tutorial, zulip and github. The one-stop page for Roc knowledge :)
I think it makes sense to have at least these things separated:
as a reader, I have different motivations for wanting to view each of those, and I would prefer to have them separate
I think if we have some documentation that only exists in examples, that's a bug :big_smile:
I can see arguments for and against having all the language docs (both builtin modules and also the rest of the language reference) in the same place, but I like the idea of trying out having both in one place
as a reader, I have different motivations for wanting to view each of those
I agreed with this at first but then I thought I could definitely see a user searching through all three (tutorial, examples, docs) to learn about a specific concept in Roc.
I like the idea of trying out having both in one place
It does seem like that would require substantial remodeling of roc-lang.org/builtins to make it well integrated.
Finally got around to the new rob pike talk about what go got right and wrong. This stuck out a ton: https://youtu.be/yE5Tpp2BSGw?si=aSKCgRkpGb-Ugk9c&t=2166
Huge change that Rob really wishes that they had done: default int type should be an arbitrary sized int type. Of course, you can still have sized integer types with that.
interesting - the first benefit he mentioned was "security" - which I assume is because Go has wrapping overflow by default
I don't really understand what he's proposing :sweat_smile:
I guess something along the lines of "when doing any arithmetic operation, on overflow upcast to arbitrary int and try again" but Go is statically typed - so what happens to the integer after that?
he says "you just don't think about integer overflow anymore"
like what happens if I have a struct with a u8 in it, I take it out and multiply it by a gazillion and then try to put it back?
seems like just kicking the overflow can down the road? :face_with_raised_eyebrow:
He is just saying that the default int type would be an arbitrary precision int that never overflows
Probably would use 1 bit to signify if it is a pointer to the real int or just an int stored locally on the stack.
You would still get overflow if you use the uint8 type.
His point is just about changing the default to something safer and without overflow
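A minimal sketch of that 1-bit tagging scheme (encodings and names are illustrative, not any particular runtime's): the low bit of the word says whether it holds a small int directly or would point to a heap-allocated bignum.

```rust
// Low bit set = small int stored inline; low bit clear = the word is
// (would be) a pointer to a heap bignum. Pointers are aligned, so their
// low bit is naturally 0.
#[derive(Debug, PartialEq)]
enum Decoded {
    Small(i64),
    BigIntPointer(u64), // in a real runtime: address of the bignum
}

// shift left one bit and set the tag bit
fn encode_small(n: i64) -> u64 {
    ((n as u64) << 1) | 1
}

fn decode(word: u64) -> Decoded {
    if word & 1 == 1 {
        Decoded::Small((word as i64) >> 1) // arithmetic shift restores the sign
    } else {
        Decoded::BigIntPointer(word)
    }
}
```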
Just looking at some things for roc docs and I'm wondering what we should use for the platform name? Currently we could use the platform "name" ... but under the module params proposal that changes to the following, which no longer includes a string in the module header.
platform package [Stdout, Stdin, echo, read]
requires [main]
provides [mainForHost] to "prebuilt-hosts/"
packages [
Foo, Bar, Baz from "https://…",
Something as Smt from "https://…",
]
hosts [
echo : Str -> Task {} [],
read : Task Str [],
]
The root path is usually something like platform/main.roc which isn't very descriptive
I kinda wonder if we should have an overall name on the page at all
as opposed to just a list of modules, and possibly some customizable introductory text (e.g. the doc comment for the package module)
I don't mind if we don't, I guess the main thing I was wanting was to be able to include doc comments in the platform module, which don't currently generate
I noticed there was an issue for displaying platform name in docs, I went down a rabbit hole looking into making a PR for that.
Finally got around to the new rob pike talk about what go got right and wrong.
My key takeaway so far is that Roc needs a fun cartoon mascot :big_smile:
https://www.youtube.com/watch?v=PgUsmO0YyQc&list=PLCiAikFFaMJrgFrWRKn0-1EI3gVZLQJtJ
Eisenberg is so good at explaining the modal extension
Isaac Van Doren said:
Finally got around to the new rob pike talk about what go got right and wrong.
My key takeaway so far is that Roc needs a fun cartoon mascot :big_smile:
DALL-E thinks it should be a total bad boy:
roc-mascot.png
Rocco the rockin’ Roc mascot :guitar::big_smile:
this rules hahahaha
or Rocky? After all, Roc was created in Philadelphia
Rocky is good too
There’s always the option to make the mascot a bird made of rocks
I'm looking for help to find a name for a concept... most roc platforms will generate an executable, so roc build on your application and you have a program you can execute.
Some platforms are different and they are for making a plugin, so roc build produces a library instead which will ultimately be loaded and used by another program.
Is this a fair distinction between two use cases? are there others? What would be good names for these? Executable Platforms and Embedding Platforms?
I'm trying to research what I can to collect information for a guide on platform development
roc build basic-cli produces an executable
roc build basic-webserver produces an executable
roc build --lib roc-plugin-example produces a dynamic library
roc build --no-link roc-fuzz produces a static library
roc build --no-link roc-wasm4 produces a static library
Concept-wise I would not use the word "plugin" to describe a non-application artifact.
This is a new one, I added an extra Task and it fixed an error in alias analysis. https://github.com/lukewilliamboswell/roc-wasm4/blob/8eb900bddbe2cd49c12b0dab8227eee09e3e1d7f/examples/rocci-bird.roc#L266-L268
In this specific case, N Tasks is fine, N + 1 is broken, but N + 2 is fine again.
Nice to have a workaround for this one :)
I'm taking some time off and will be back on the 26th of January :wave:
I'll check my mentions and direct messages on zulip once a day for urgent stuff.
Enjoy yourself!
enjoy! :smiley:
If I read this correctly:
Error, expected type "nothing", found type "complicated nothing"
expected type '()', found type 'union { ((),), ((),) }'
Hello, I'm checking in on Roc progress and I see it's able to build the compiler on Windows now which is awesome. Is the roc compiler itself able to work on Windows yet?
@Luke Boswell would know a lot better, but it looks like we are getting close to enabling all the codegen tests: #6408
I've been using roc on Windows. All the LLVM tests pass now as of last night. :grinning: There is at least one bug with the surgical linker on Windows so you have to use --linker=legacy
. Most of the current platforms have been developed without support for Windows, so there is a bit of work to upgrade them and test them out. The roc-wasm4 platform works really well though, and I've not had any issues with my zig platforms. I think we might want to track down and fix the surgical linker before we make a release?? though it hasn't been discussed yet.
Am I using this arg right?
d:\External\Roc>target\debug\deps\roc build --linker=legacy Roctris\Roctris.roc
thread 'main' has overflowed its stack
Yes that looks right. What platform is roctris using? I've had similar issues using basic-cli
It's a custom CLI platform from before the CLI example was a thing. Yea I get the same error when typing any expression into the repl
Can try a release build. Otherwise, getting rust to dump a back trace may help.
Definitely something fishy going on. Here I'm telling it to build an empty file:
d:\External\Roc>target\debug\deps\roc Roctris\Roctris.roc
←[36m── MISSING HEADER in Roctris\Roctris.roc ───────────────────────────────────────←[0m
I am expecting a header, but got stuck here:
←[36m1←[0m←[36m│←[0m
←[31m^←[0m
I am expecting a module keyword next, one of ←[32minterface←[0m, ←[32mapp←[0m, ←[32mpackage←[0m
or ←[32mplatform←[0m.
Different error when building with release:
d:\External\Roc>target\release\deps\roc build Roctris\Roctris.roc
🔨 Rebuilding platform...
An internal compiler expectation was broken.
This is definitely a compiler bug.
Please file an issue here: https://github.com/roc-lang/roc/issues/new/choose
thread '<unnamed>' panicked at 'Error:
Failed to rebuild src/main.rs:
The executed command was:
rustup run nightly-2023-05-28 cargo build --bin host
stderr of that command:
error: toolchain 'nightly-2023-05-28-x86_64-pc-windows-msvc' is not installed
', crates\compiler\build\src\link.rs:1414:21
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library\std\src\panicking.rs:593
1: core::panicking::panic_fmt
at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library\core\src\panicking.rs:67
2: roc_build::link::preprocess_host_wasm32
3: roc_build::link::rebuild_host
4: roc_build::program::build_file
5: <anstyle::style::StyleDisplay as core::fmt::Display>::fmt
6: <anstyle::style::StyleDisplay as core::fmt::Display>::fmt
7: std::sys::windows::thread::impl$0::new::thread_start
First issue looks to be just bad color codes for the terminal
Second looks to be a real error. I think windows requires nightly currently for some rust stuff
If you can share your platform impl and that app I could have a look?
some cool debugger tricks here https://www.youtube.com/watch?v=PorfLSr3DDI&list=WL&index=1
is there an up-to-date example C platform?
examples/platform-switching/c-platform/host.c should be
Have we ever discussed having fuzzing within roc, maybe using something like expect? How might something like this work in the future?
I know it was discussed at some point, but no idea where currently. We also discussed property based testing some.
Probably a nice property based testing framework will be more useful to most users. Given roc is a safe language fuzzing should be less needed except for low level internals testing. Though both are useful and can be used together to some extent.
@John Murray did you make any progress with this line of effort?
Hmm...I think I could modify the roc fuzz platform to make a nice property based testing setup with multiple tests.
Of course baking it into roc would be best. Cause then you don't need a platform at all.
Luke Boswell said:
John Murray did you make any progress with this line of effort?
Not much, ran into some issues with having the platform call arbitrary roc closures.
I think integration into roc would be the best approach imo
One thing that is a bit sad is that fuzzing tools are not easier to setup and use. In a perfect world, all property testing tools would be based off of a fuzzing engine. Cause fuzzing uses coverage in the actual source code to explore inputs better.
Like property tests will never generate a proper gzip header (without explicit user guidance), but a fuzzer will.
That said, property testing is faster and generally doesn't store a bunch of data to disk. Fuzzing has to build a corpus to run well.
I thought this was really cool https://ziglang.org/news/announcing-donor-bounties/
huh, interesting idea!
Feels off to me. Sure, they spend the donations on paying contributors, but the money isn't going to the specific person who fixes the bounty. That person may not be on Zig's payroll at all.
So it's kinda like: if you fix this bug, you also make us money.
It's not about bugs, it's about features. I see it like this: if I want to spend some of my free time to implement something for a cool project, then I might as well make the project some money by implementing a bounty. If I spend my time that means I like the project, therefore I would be happy if the project would have more funding (even if I don't get any of it).
I totally get that.
Maybe it is more of a problem with how they worded/presented it. By talking about how the money is going towards paying contributors, it makes it sound like people who fix the issue or add the new feature will get some part of the value. They are contributors after all. This is only true if that person happens to be on Zig's payroll
So I would label it as misleading at best. Not any sort of mal intent, but as a company donating, I think it would be easy to miss that point.
I'm interested in how this turns out. How the whole zig community is reacting to this and if there even will be a significant amount of bounties.
For sure
Hi, I have been conversing with @GordonBGood on the Elm Discourse and got onto the subject of how Roc sometimes optimizes away copying of immutable values by directly modifying structures on the heap. I have also been reading the papers on how this is implemented in Lean4 which seems to have quite a sophisticated implementation of it.
I was hoping to compare with the code for Roc.
Grepping the source I found this: https://github.com/roc-lang/roc/blob/main/crates/vendor/morphic_lib/src/api.rs#L176
Which comments on it, but isn't the implementation.
Sorry, I am not a Rust programmer so not sure how to find my way around... Can anyone point me to where the implementation is? Hoping there are maybe some comments I can read too to understand how it works, or failing that any docs or other sources I can consult (like a GitHub Issue or Pull Request)?
Thanks - first post here. I thought maybe I should post this in #beginners since I cannot even really claim that level of Roc ability!
I have only a limited knowledge but I'll share what I can!
This is not really in just one place. There are some aspects in the compiler and some in the standard library built-in functions.
The built-ins are easier to follow. Search for isUnique in the Zig code in this directory:
https://github.com/roc-lang/roc/tree/main/crates%2Fcompiler%2Fbuiltins%2Fbitcode%2Fsrc
(And if you're wondering how/why we use both Zig and Rust, look in the FAQ!)
One of the compiler features is reuse of allocations. Search for the Reuse variant of the monomorphic IR in crates/compiler/mono
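The uniqueness check is the same idea Rust's `Rc::make_mut` implements: mutate in place when the refcount says you are the sole owner, otherwise copy first. A hedged sketch (`set_element` is a made-up name; Roc's builtins do this in Zig on their own refcounted lists):

```rust
use std::rc::Rc;

// Opportunistic in-place mutation: when the refcount is 1 we own the
// data uniquely and can mutate it directly; when it is shared,
// Rc::make_mut clones first so other holders are unaffected.
// Returns whether the mutation happened in place.
fn set_element(list: &mut Rc<Vec<i64>>, index: usize, value: i64) -> bool {
    let was_unique = Rc::strong_count(list) == 1; // the "isUnique" check
    Rc::make_mut(list)[index] = value; // clones only when shared
    was_unique
}
```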
https://github.com/WebAssembly/WASI/tree/main/preview2#wasi-preview-2
Looks like WASI Preview 2 just launched. This will be fun to play with. :laughing:
@Rupert Smith our implementation is now based on the one in Koka, but before that we basically implemented the Lean4 strategy. The current approach is better for algebraic data types, but has some nasty edge cases for our built-in lists and strings. So we'll need to move to more of a hybrid of the two systems.
the core of our logic is in crates/compiler/mono/src/inc_dec.rs
and some related modules in the same folder like reset_reuse.rs.
this is all based on a bunch of papers that I think are good for getting the ideas (going from the paper to an implementation is really hard though, so that's why it is very helpful that the lean and koka sources are available).
https://arxiv.org/pdf/1908.05647.pdf
https://www.microsoft.com/en-us/research/uploads/prod/2020/11/perceus-tr-v1.pdf
https://www.microsoft.com/en-us/research/uploads/prod/2021/11/flreuse-tr.pdf
https://www.microsoft.com/en-us/research/uploads/prod/2023/05/fbip.pdf
are the papers in chronological order
(we like these papers, have read them already, and are always happy to talk about them)
Thanks for the help - now I got some more papers to read!
Interesting that all these papers are in the last 3 years - I might have thought this to have been a significant research area in FP in the 90s, when computers were slower and had way less memory. I guess we didn't have the compiler infrastructure (LLVM etc) that we do now, and also a lot of research focussed on the theory side. MLton was highly optimised and never did this? But then ML did have mutation so I guess not needed.
The authors of the "counting immutable beans" paper suggest that their work is only scratching the surface of what ought to be possible.
Although, looking at the bibliographies I see work by Hoffman in 2000 referenced and lots of other stuff around that time. I guess it's just a topic that has come back for another round of consideration.
it's an old idea, but does not work in most languages because it is so easy to create RC cycles: in languages with mutation it is trivial (java, ocaml) and in haskell laziness relies on cyclic values. So in most PL traditions RC just did not make sense
until we got some pure, strict languages
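The "mutation makes cycles trivial" point is easy to demonstrate even in Rust, where `Rc` plus interior mutability leaks exactly this way:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// With interior mutability it is trivial to form a cycle, so plain
// reference counting can never free either node: the counts bottom
// out at 1, not 0.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Build two nodes that point at each other and return one of them.
fn make_cycle() -> Rc<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b);
    a
}
```

In a pure, strict language there is no way to write the `*a.next.borrow_mut() = ...` step, so cycles can't be constructed and plain RC is sound.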
@Brendan Hansknecht raised a good point in https://github.com/roc-lang/basic-webserver/issues/23
Specifically the part about
We need to implement the Body and Buf trait on RocList type. That will enable us to hand the roc list off to hyper with hyper reading directly from the list. Again, no need to copy data or reallocate. The implementation should be pretty simple and the same as would be done for a vector of bytes in rust.
I was thinking maybe we should add this to RustGlue.roc for the benefit of all. Brendan pointed out that Buf and Body are hyper specific, but maybe there is a generic rust trait that would be more suitable.
Looks like we get Buf through Cursor. To get Cursor just requires AsRef<[u8]> which we should be able to implement in general.
So that is at least part of the story with just a builtin trait
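A sketch of that idea, with a `Vec`-backed stand-in for the refcounted RocList (the real glue would implement `AsRef<[u8]>` on the actual RocList type so no bytes are copied):

```rust
use std::io::{Cursor, Read};

// Hypothetical stand-in for RocList<u8>; the point is only that
// implementing AsRef<[u8]> is enough to get std::io::Cursor (and,
// through Cursor, the Buf impl hyper wants) without copying the bytes.
struct RocListBytes(Vec<u8>);

impl AsRef<[u8]> for RocListBytes {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

// Cursor::new accepts any T: AsRef<[u8]>, so this just works.
fn read_all(list: RocListBytes) -> Vec<u8> {
    let mut cursor = Cursor::new(list);
    let mut out = Vec::new();
    cursor.read_to_end(&mut out).unwrap();
    out
}
```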
I can add an issue for this if we want to update RustGlue to generate this
I think I understand the idea enough to write up an issue
Huh, apparently inline expectations always run even if they aren't checked: https://github.com/roc-lang/roc/issues/6456
they are also not implemented in the repl/dev backend. bit of a shame
can someone on aarch64 figure out what LLVM is actually complaining about here
https://github.com/roc-lang/roc/actions/runs/7711240861/job/21016231307?pr=6463
it works fine on all the other targets, so it's odd
I can run it locally, would it be helpful if I provide the IR file? or I can dig into the problem on my own but it will be very inefficient I think
anyway, let me check, maybe I can do smth about it. at least I already reproduced the problem. it's clear that it's an inconsistency between the expected return type and the actual one
it helped:
pub fn powC(arg1: RocDec, arg2: RocDec) callconv(.C) i128 {
return @call(.always_inline, RocDec.pow, .{ arg1, arg2 }).num;
}
should I create a pr?
let me check how that affects x86 real quick
that works, I'll just amend my commit and force push that. thanks for looking!
how do we validate that this fix is correct? https://github.com/roc-lang/roc/pull/6476 it adjusts some CSS, I'm assuming it's correct but actually have no idea
I'll take a look at it tonight
What is our recommendation for one off error types?
In this specific case, I am implementing TryFrom to go from a target_lexicon::Triple to a roc_target::Target. target_lexicon has tons of triples we don't support. So the error type would literally just be a single element enum if I create a full error type. Just UnsupportedTriple.
I guess since there is only one possible failure, I could also just implement it with Err(())
If you make a single variant enum then it's equivalent to () but at least there's an informative name in the source code. And it's easy to extend later if needed.
I think that's our recommendation at work for Elm and Haskell code.
True. Wish I had roc tag unions in rust.
Just feels kinda strange to make a one off super specific error enum. But I guess it doesn't hurt anything that it exists.
pub enum TargetFromTripleError {
TripleUnsupported,
}
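A minimal sketch of what that could look like — note that `Triple` and `Target` here are hypothetical stand-ins, not the real target_lexicon/roc_target types; only the single-variant error shape matches the discussion:

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins for target_lexicon::Triple / roc_target::Target.
#[derive(Debug)]
pub enum Triple {
    X86_64Linux,
    Wasm32Unknown,
}

#[derive(Debug, PartialEq)]
pub enum Target {
    LinuxX64,
}

// The single-variant error enum: equivalent to (), but with an
// informative name, and easy to extend later if needed.
#[derive(Debug, PartialEq)]
pub enum TargetFromTripleError {
    TripleUnsupported,
}

impl TryFrom<Triple> for Target {
    type Error = TargetFromTripleError;

    fn try_from(triple: Triple) -> Result<Self, Self::Error> {
        match triple {
            Triple::X86_64Linux => Ok(Target::LinuxX64),
            // Every triple the source crate knows about but we don't support:
            _ => Err(TargetFromTripleError::TripleUnsupported),
        }
    }
}
```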
SIMD is wild: decoding a bunch of UTF-8 bytes faster than memcpy, and that's only using 128-bit SIMD
https://twitter.com/mitchellh/status/1754645531312435584
I remember chatting with someone who is working on future CPUs and also in-memory mini-CPU-like devices. He made a big complaint about memcpys being horridly slow compared to how fast they could be.
I heard there was some plan to do dedicated CPU instructions for memcpy in some mainstream chips
like intel or arm or something
Intel has rep mov
With the right CPU and microcode extensions, I think it tends to be a reasonable speed. That said, on a number of CPUs it is worse than the generic SIMD versions
I would assume arm has an equivalent
I think RISC-V has a generic "use the largest SIMD you have, then let me deal with the little bit of leftover bytes" type of vector instruction.
The CPUID feature "ERMS", "Enhanced REP MOVSB", means you should use rep movsb for any memory copy that's at least 128 bytes. The CPUID feature "FSRM", "Fast Short REP MOVSB", implies ERMS and additionally means that you should use rep movsb for any memory copy, even if it's shorter than 128 bytes.
This is starting with ice lake where rep mov should be pretty competitive but still isn't always the best on Intel.
From a Stack Overflow post, they suggest that a fully optimized and prefetched SSE2 memcpy can still be ~25% faster for large memory copies. That said, it is decently old, and theoretically rep mov should be even better now. That said, SSE will be consistent across CPUs and rep mov will not, so I think a lot of implementations defensively use SSE.
Digging around a bit more: depending on the exact CPU, small memcpys (which is most memcpys) that use rep mov could have anywhere from 7 to 50ish cycles of startup latency. For something that is under, let's say, 64 bytes, it will generally still be way faster to just use 8-byte-wide mov operations that target registers or similar.
This is also why the advice is to avoid memcpys if you know the size (and it is small). The cost of looping for something like this can be pretty darn heavy.
branch mispredictions strike again
Not just that, but like the actual number of extra instructions between the memory movement instructions. Each cycle might load 8 instructions. Those 8 instructions might be check, conditional jump, load, store, inc counter, jump to top, 2 unused after loop instructions.
Instead it could be 4 load and 4 store operations for the entire copy if the move is 32 bytes and copied with registers.
Loop unrolling of course helps some with this, but then you hit the branch misprediction issues more due to the loop running so few times.
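As a small illustration of the known-size case in Rust (a sketch — what the optimizer actually emits depends on target and flags):

```rust
// For a compile-time-known small size, the compiler can lower this to a
// handful of register-width loads and stores: no memcpy call, no loop,
// no branch to mispredict.
fn copy32(src: &[u8; 32]) -> [u8; 32] {
    *src
}

// Contrast: a slice copy of runtime-known length generally goes through
// memcpy (or an equivalent loop).
fn copy_n(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}

fn main() {
    let src = [7u8; 32];
    assert_eq!(copy32(&src), src);

    let mut buf = [0u8; 16];
    copy_n(&mut buf, &[1u8; 16]);
    assert_eq!(buf, [1u8; 16]);
}
```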
:star: Achievement unlocked
Panicked during a panic. Aborting
Welcome to the club :clap:
https://www.bazhenov.me/posts/2024-02-performance-roulette/
Related to trying to deal with stuff like this: https://github.com/llvm/llvm-project/blob/main/bolt/README.md
So they updated one of the examples in the Mojo vs Rust blog and this is actually a really useful footgun in rust to know of:
In Rust, this is not a tail recursive function:
fn recursive(x: usize){
if x == 0 {
return;
}
let mut stuff = Vec::with_capacity(x);
for i in 0..x {
stuff.push(i);
}
recursive(x - 1)
}
https://rust.godbolt.org/z/7q3os1fsq
Another reason why rust is not really a true functional language (despite having many functional features). If you do a lot of true functional programming patterns you may hit cases like this that blow up.
Essentially, the drop function for the Vec
is run at the end of the scope. This means that it runs after the recursive call.
As such, the function can't have TCO.
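One way to work around it (a sketch — Rust still doesn't guarantee TCO; explicitly dropping just puts the call back in tail position so LLVM is able to apply the optimization):

```rust
fn recursive(x: usize) {
    if x == 0 {
        return;
    }
    let mut stuff = Vec::with_capacity(x);
    for i in 0..x {
        stuff.push(i);
    }
    // Run the Vec's destructor *before* the recursive call instead of at
    // end of scope, so the call really is the last thing in the function.
    drop(stuff);
    recursive(x - 1)
}

fn main() {
    recursive(1_000);
}
```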
Interesting :)
wow, I never thought about that!
https://zed.dev/blog/we-have-to-start-over really resonates with me!
I'd like to try out the llvm interpreter lli to help me debug something. Does anybody here have experience with it?
interesting comment about crate features and CLI parsing edge cases from the author of clap
https://lobste.rs/s/nqootu/sudo_rs_dependencies_when_less_is_better#c_zzvxru
What does expect-fx do exactly?
When implemented, it will allow you to mock effects so that you can run an expect on an effectful function
Thanks Brendan :)
slight correction: the idea is for that one to run the real effects
for integration tests and such
So regular expect will be extended to mocking effects?
https://docs.google.com/document/d/110MwQi7Dpo1Y69ECFXyyvDWzF4OYv1BLojIm08qDTvg/edit?usp=sharing under "Simulation Tests"
I think we need a separate keyword for that
the proposal uses expect-sim
Ah yeah
I'm seeing this error thread 'test::lookup_clone_result' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x16dc05d74', crates/repl_expect/src/app.rs:57:45 on this PR https://github.com/roc-lang/roc/pull/6586, but I'm pretty sure it is unrelated (when I run cargo test locally on my mac; I've pushed to CI to see if that is just my setup)
Also, I wasn't sure if I should have committed all the glue/tests/fixtures/. Would appreciate if someone could confirm if that is the correct thing to do when they are changed.
Is anyone familar with glue/tests/fixtures
able to comment on https://github.com/roc-lang/roc/pull/6586 please?
There were a bunch of files generated and I'm not sure if they should be committed. This is the header from one of them
# ⚠️ READ THIS BEFORE MODIFYING THIS FILE! ⚠️
#
# This file is a fixture template. If the file you're looking at is
# in the fixture-templates/ directory, then you're all set - go ahead
# and modify it, and it will modify all the fixture tests.
#
# If this file is in the fixtures/ directory, on the other hand, then
# it is gitignored and will be overwritten the next time tests run.
I'm guessing I should remove these from the PR and update our gitignore so these aren't included
Yeah, Don't commit them.
Luke Boswell said:
I'm seeing this error
thread 'test::lookup_clone_result' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x16dc05d74', crates/repl_expect/src/app.rs:57:45
. on this PR https://github.com/roc-lang/roc/pull/6586 but I'm pretty sure it is unrelated. (when I run cargo test locally on my mac, I've pushed to CI to see if that is just my setup)
I'm seeing this on my M1 mac too, but haven't seen it hit on CI.
ah, looks like its logged here: https://github.com/roc-lang/roc/issues/6100 (it passes when running with --release
(oof))
https://kobzol.github.io/rust/rustc/2024/03/15/rustc-what-takes-so-long.html
slightly more detailed benchmarks of what part of compilation takes so long in rust
looks familiar! :big_smile:
share_5249400518627343545.png
I remain very confused why nobody on the Rust team is even talking about doing a direct to machine code backend
that and linking are where all the compilation time goes!
Most surprising part of that article to me was how much work the rust frontend must do in release builds.
I expected the release and dev full compilation builds to be more starkly different due to llvm eating way way more time.
Instead we have llvm going from ~70% to ~80%
I guess to be fair, he said it was using the default 16 different units. So llvm could be running on up to 16 threads. So if it was single threaded it would be a lot slower.
yeah that makes sense :thumbs_up:
Found a good article on the backend code gen units. Apparently incremental builds set code gen units to 256. So really break up crates in hopes of making compilation able to cache more work: https://nnethercote.github.io/2023/07/11/back-end-parallelism-in-the-rust-compiler.html
Interesting
yeah that's what @matklad was telling me - more, smaller crates improves caching
https://twitter.com/mitchellh/status/1769143787862049013
I don't think LLVM automatically bit-packs booleans (let alone tag unions) does it?
no you'd have to do that yourself
there is repr(packed)
which might do some amount of packing ?
I thought that just meant no alignment padding
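Right — a quick sketch of the difference: repr(packed) only removes alignment padding; it never bit-packs fields:

```rust
use std::mem::size_of;

#[repr(C)]
struct Normal {
    a: u8,
    b: u32, // 3 bytes of padding inserted before this so it's 4-aligned
}

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32, // no padding; b may be misaligned
}

fn main() {
    assert_eq!(size_of::<Normal>(), 8); // 1 + 3 padding + 4
    assert_eq!(size_of::<Packed>(), 5); // 1 + 4, padding removed
    // Bools still take a full byte each either way -- no bit-packing:
    assert_eq!(size_of::<[bool; 8]>(), 8);
}
```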
really interesting read about debuggers and libc! https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
Richard Feldman said:
yeah that's what matklad was telling me - more, smaller crates improves caching
That's kinda surprising. If every crate is split into 256 pieces, you would think that small crate size wouldn't matter. At least not much.
well in the sense of like - as opposed to having big crates with lots of modules
because things are cached at the level of crates and not modules, so more crates means a higher % of the build that's cached
Sure, but assuming they are actually able to split into fine grain enough pieces (with incremental compilation having a goal of 256 pieces per crate), you would think that most things are split well enough to have decent incremental compilation. Cause even if you have 1024 functions in your crate, that is only compiling 4 functions per code gen unit. So a small change only has to recompile 4 functions (and things that depend on those 4 functions). I guess any crate 256 functions or less means function level caching and dependency graph.
I wonder how many functions the average crate actually is.
Good rant: danluu.com/slow-device/
Only skimmed it so far. I'll have to give it a more detailed read later.
[...] a lot of forums are now inaccessible to people who don't have enough wealth to buy a device with effectively infinite CPU.
Interesting post, this is not something I've deeply thought about before.
I think this proposed forum design should do very well on those benchmarks! :smiley:
Wow, very interesting read.
I think about this every time I use a language server :sweat_smile:. Like, this should be able to run on the worst device imaginable. It's actually one of the reasons I started working on developer tooling, seeing that many folks would find some of these tools impossible to use because of their "everyone has a fast PC" design.
On that, I can confidently say rocls runs comfortably on my laptop in its power saving mode with the CPU limited to 800mhz, but we can definitely improve caching to make that even better.
https://davidlattimore.github.io/posts/2024/03/18/wild-linker-march-update.html cool rust linker stuff
Finally working on roc again. Continuing my changes to how we handle targets. I am now super deep in letting the types guide me. Very much type error driven development. Kinda satisfying, but also such a gigantic footprint.
Also, I really want to remove target from type checking only flows. It shouldn't be needed, but ends up being required cause uses farther down the compiler need it. Might reorchestrate that. Would require passing it in later instead of storing it in a bunch of types. May not be worth it, but want to look into it more.
that makes sense!
yeah that's what @matklad was telling me - more, smaller crates improves caching
Not exactly: this is not as much about the _size_ of the crate as about the topology of the crate graph. If you have one thousand tiny crates which are linearly dependent on each other, this is more or less the same as one giant crate.
On the other hand, if you have a sort-of star topology with 1 central crate and 10 mutually-independent supporting crates, then that's great for incremental.
Topology > Size
sure, I meant smaller just in the sense of breaking up a larger crate to be more granular necessarily means making it smaller
I think we should make --linker=legacy the default option for Windows for now, as we know we have a surgical linker bug that segfaults. @Folkert de Vries what do you think?
well we never really investigated that bug right? on the other hand, using the legacy linker as the default might be pragmatic for now
I'm not opposed to it, anyway
Cranelift Rust frontend putting up some impressive improvements: https://www.williballenthin.com/post/rust-compilation-time/
Good read on bash pipes, benchmarking, and zig: https://mtlynch.io/zig-extraneous-build/
Shows a very easy benchmarking mistake to make with pipes.
It turns out that all commands in a bash pipeline start at the same time.
Weird...
yeah this is fun
$ time $(sleep 1 | sleep 1)
real 0m1.020s
user 0m0.002s
sys 0m0.006s
I'm surprised more people don't know this. A pipe can only store like 4k characters by default. The inputting app will get stuck on IO if the next one isn't processing fast enough.
Also, you want them all to start at the same time so that it runs faster as a whole without eating all your memory buffering.
This is also why you can pipe through a program like tee and see live output.
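The concurrent start is easy to observe from Rust too. A sketch (assumes a Unix `sleep` binary on PATH):

```rust
use std::process::{Command, Stdio};
use std::time::Instant;

fn main() {
    let start = Instant::now();

    // Equivalent of `sleep 1 | sleep 1`: both children are spawned
    // immediately, so they sleep concurrently.
    let mut first = Command::new("sleep")
        .arg("1")
        .stdout(Stdio::piped())
        .spawn()
        .expect("spawn first sleep");
    let mut second = Command::new("sleep")
        .arg("1")
        .stdin(Stdio::from(first.stdout.take().expect("piped stdout")))
        .spawn()
        .expect("spawn second sleep");

    first.wait().expect("first exits");
    second.wait().expect("second exits");

    // The whole "pipeline" takes ~1 second of wall time, not 2.
    assert!(start.elapsed().as_secs_f64() < 1.9);
}
```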
this is sweet! https://coredumped.dev/2024/03/25/bump-allocation-up-or-down/
Not sure if anyone here has read more details of the new apple hardware vulnerability, but I hate how nice things that make CPUs faster are almost always the source of security vulnerabilities: https://gofetch.fail/
In this case, this vulnerability was fully explored on apple hardware, but probably also affects the newest intel hardware.
It probably will also affect other new hardware if they don't make changes/fixes. Apple was just ahead of the game on adding data dependent memory prefetchers to their cpus (part of the reason why the apple M chips are so fast)
Basically when they load a cache line, they check it for things that look like pointers. If they find one, they prefetch it. Huge gain when loading a linked list for example.
That said, I don't know how much this vulnerability matters. It has to be exploited by a process running locally on your machine. If an attacker has a process running locally on your machine, you are probably already screwed.
https://blog.rust-lang.org/inside-rust/2024/03/26/this-development-cycle-in-cargo-1.78.html has some cool ideas for testing the styling of compiler output with svgs
whoooa, apparently Windows APIs are going UTF-8 by default! :astonished:
Brendan Hansknecht said:
Not sure if anyone here has read more details of the new apple hardware vulnerability, but I hate how nice things that make CPUs faster are almost always the source of security vulnerabilities: https://gofetch.fail/
Wow that exploit is fascinating: https://youtu.be/klhDbLV4Los?si=5Gc8RZf1B98aGV6o
I've been wanting a thing to help me print debug parser issues. I've been using snapshot tests which are great, but don't quite seem to be enough.
So I started threading a boolean through the parser functions and didn't really stop to think if that was a good idea. It started as a helper so I could pass it in from the top of a module, and then I could print debug info for just that module. Just wondering if this would be useful to keep around, or if I shouldn't include it in a PR -- there are a lot of parser functions, and when you touch one it kind of explodes to all of them.
I figure something like this might be optimised out of any release build if it is a constant false value and there is nothing using it.
For example, I can do something like the following, and it will only print out for the modules where I have passed in true:
if print_debug {
dbg!(&result);
}
Probably a bad idea, but it was the only way I could think to do this.
A constant would be better. Optimizing away a parameter that is passed deep down the stack can happen but is unlikely.
Id recommend an env var or something
Check out roc_debug_flags and their uses elsewhere in the codebase
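A minimal sketch of the env-var approach (the real roc_debug_flags macros differ, and ROC_PRINT_PARSE_DEBUG is a made-up variable name here):

```rust
// Print a dbg! only when a debug env var is set, so release behavior is
// untouched and no boolean needs threading through every parser function.
macro_rules! dbg_if_env {
    ($var:literal, $value:expr) => {
        if std::env::var_os($var).is_some() {
            dbg!($value);
        }
    };
}

fn main() {
    let result = vec![1, 2, 3];
    // Only prints when e.g. ROC_PRINT_PARSE_DEBUG=1 is set:
    dbg_if_env!("ROC_PRINT_PARSE_DEBUG", &result);
}
```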
I just learned about RUSTFLAGS="-A warnings" cargo check
and I am SO HAPPY
it's cargo check
but where it doesn't print warnings, only errors
achievement unlocked: new Rust compiler error I've never seen before :laughing:
Screenshot-2024-03-31-at-12.41.37AM.png
Is it just me or am I looking at the same line twice:
-## This means the #U64 this function returns can always be safely converted to #I64 or #I32, depending on the target.
+## This means the #U64 this function returns can always be safely converted to #I64 or #I32, depending on the target.
yeah? unless there is some whitespace difference?
Ah, extra space at the end of the line.
I'm trying to run the fuzzer for the parser and a bit stuck with rust; does anyone have any pointers?
10:43:02 ~/Documents/GitHub/roc/crates/compiler/test_syntax/fuzz optional-unit-assign $ cargo +nightly fuzz run -j2 fuzz_expr -- -dict=../parse/fuzz/dict.txt
error: failed to parse manifest at `/Users/luke/Documents/GitHub/roc/crates/compiler/test_syntax/fuzz/Cargo.toml`
Caused by:
error inheriting `version` from workspace root manifest's `workspace.package.version`
Caused by:
`workspace.package.version` was not defined
Figured it out, and found my first bug :smiley: with the fuzzer.
Fuzzers are good at finding the same bug in many ways, so be a bit careful with that.
Does the surgical linker require position independent executables(pie) as input?
I would need to think about this more. That just means the executable can be loaded to anywhere in memory. That shouldn't strictly matter, but it probably does. My default thought is definitely yes.
I'm finally starting to look at the big refcounting change for lists again, where we need to pass the element dec functions into list functions so that when a list is freed, all of the elements are decremented, instead of doing recursive increfs and decrefs that kill performance everywhere.
I am so glad that LLVM IR is strongly typed. It lists out all of the function signature mismatches for me. That said, I still have hundreds of tests to fix... will be a journey.
PASS [ 0.665s] test_gen::test_gen gen_list::basic_int_list_len
First heart beat. Got a super basic test passing. Now just to wire everything else up....
~70 passing gen_list tests and ~130 failing... what is that, 35% passing? Not bad for most functions not being wired up yet.
yooooooo I am SO HYPE for this change!!!
Will probably be slow progress with mem leaks and segfaults, but pushing it forward
This is actually going surprisingly fast (though I have only touched the llvm backend so far).
gen_list: 208 tests run: 183 passed, 25 failed
All failures are now segfaults. Most are probably from #compiler development > Host Refcounting and rust not knowing how to free refcounted lists.
I'm just wondering what is going on with CI for this PR https://github.com/roc-lang/roc/pull/6587
It needed a new release of basic-cli, the bundle is made now, so I expect we'll be able to merge it today
basic-cli also typically uses nightly Roc, but in this case we needed to use the branch, so I had to make some changes for that.
Ahk, thank you. I wasn't sure. I thought it might be stuck or something.
Does anyone know what could be causing this error in CI?
error: cvt doesn't compile for this platform yet
--> /Users/m1ci/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cvt-0.1.2/src/lib.rs:22:9
|
22 | compile_error!("cvt doesn't compile for this platform yet");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: could not compile `cvt` (lib) due to previous error
Or more specifically, how do we resolve something like this?
It seems like something that shouldn't have passed CI the first time, and not a flake.
I think it's unrelated to the parser/can changes on my chaining syntax PR.
I'm investigating #6641, and I'm curious why macros like skip_second are macros when they could be functions. I've only verified that for skip_second, but it looks like it's true for others like and and between. Wouldn't it be faster to compile a regular function? And any runtime performance loss could be avoided with #[inline(always)].
this _might_ be a remnant of the earlier parser where we wanted to use functions but that caused the rust compiler to just grind to a halt
so if using a function now just works I'd say go for it!
the original problem was that types got too large but both the rust compiler and our parser implementation have improved in the meantime
Awesome, I'll see which ones can be replaced and if it compiles faster now I'll submit a PR
I think functions are preferable if they are not slower (so if compile time stays the same, that's also fine)
yeah exactly
a lot of those used to be functions and then the rust type checker was giving "recursion limit reached" errors
the solution was to replace them with macros so that they didn't participate in type checking as functions
@Luke Boswell on what OS are you running this? Based on the relevant source it should work on macos, linux and windows...
See CI in this PR
I've investigated a bit and I can't figure out why it doesn't work on my mac.
It fails when building for wasm so the relevant source code makes sense at least. There were some changes to dependencies in that PR so I think cvt is now accidentally included in the wasm build and the cvt functionality doesn't make sense for wasm.
Ohk, so adding the error macros or test utils might be causing it?
Yeah, I expect one of them has cvt as a dependency
[package]
name = "roc_error_macros"
description = "Provides macros for consistent reporting of errors in Roc's rust code."
authors.workspace = true
edition.workspace = true
license.workspace = true
version.workspace = true
[dependencies]
[package]
name = "roc_test_utils"
description = "Utility functions used all over the code base."
authors.workspace = true
edition.workspace = true
license.workspace = true
version.workspace = true
[dependencies]
pretty_assertions.workspace = true
remove_dir_all.workspace = true
[dev-dependencies]
It may be further down the dependency tree, remove_dir_all probably does not make sense for wasm
Yeah so it's definitely remove_dir_all
[target.'cfg(not(windows))'.dependencies]
cvt = "0.1.1"
libc = "0.2"
I can probably do without that macro if that would be the easiest solution
Just compare strings normally. It's just to show pretty colors for the snapshots
I'm just using this: `use roc_test_utils::assert_multiline_str_eq;`
We could also split some test_utils off into test_utils_no_wasm, where we can put all the dir stuff.
Merged the changes from https://github.com/roc-lang/roc/pull/6643 and that fixes the issue above. :tada:
It looks like glue has regressed somehow, or at least there is a bug in module imports which prevents us from generating the glue types for roc-wasm4.
$ roc glue ../roc/crates/glue/src/ZigGlue.roc platform/glue/ platform/main-glue.roc
── MODULE NOT IMPORTED in ../roc/crates/glue/src/../platform/Types.roc ─────────
The `TypeId` module is not imported:
37│ id = TypeId.fromU64 index
^^^^^^^^^^^^^^
Did you mean to import it?
I found a workaround by just copying the opaque type in directly.
Just discovered that using dbg statements can mess with ! suffixes... needs further investigation.
Can you make an issue for that?
Can do. I'll also fix it, hopefully soon
Looking at timing of our compilation, almost all of the frontend time is spent in "Other". do we know what that is likely to be? Is it mono? Something else?
Ex:
0.036 ms Read .roc file from disk
0.045 ms Parse header
0.147 ms Parse body
1.059 ms Canonicalize
0.000 ms Constrain
1.395 ms Solve
0.058 ms Find Specializations
1.337 ms Make Specializations (Pass 0)
0.170 ms Make Specializations (Pass 1)
0.030 ms Make Specializations (Pass 2)
15.413 ms Other
make specializations is mono
might be coordination overhead, especially if there are multiple mono passes
Even our basic rocLovesZig example, where main is just a string, spends most of the time in other across essentially all modules. Though I guess it is compiling all of the standard library.
timings
Oh, I think I see.
We are measuring time spent waiting on dependencies as part of compiling a module
List: 5.2ms
Dict (which depends on list): 10.6ms with 4.6ms other
Set (which depends on dict): 11.4ms with 10.3ms other
UserApp depends on everything else but is just a string and super trivial. So it has 14.317 ms total and 13.651 ms for other. I think the 13.651 ms was waiting on all deps.
yeah that makes sense. the earlier phases cannot really be parallelized across the entire dependency graph, and even mono cannot really be, so the wall time for a single module is likely to be much higher than the actual work done for it
In the process of updating roc-json to the latest basic-cli release, I've discovered a strange compiler bug, I think. Haven't tracked down what the issue is, but I have been able to isolate it to the package/Option.roc module.
$ RUST_BACKTRACE=1 roc test package/Option.roc
thread 'main' panicked at /Users/luke/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bumpalo-3.14.0/src/lib.rs:1854:5:
out of memory
stack backtrace:
0: rust_begin_unwind
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
1: core::panicking::panic_fmt
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
2: bumpalo::oom
3: roc_repl_eval::eval::addr_to_ast
4: roc_repl_eval::eval::struct_to_ast
5: roc_repl_expect::get_values
6: roc_repl_expect::run::render_expect_failure
7: roc_cli::test
8: roc::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
It's only the last two expects in that file that have this issue.
Is there any particular reason we use a host.c in basic-cli and build that using clang, and then link using ld to get a prebuilt binary when "rebuilding platform"? I'm wondering if I could replace this with another crate to produce the static library using just cargo.
In the (near) future I'm hoping to remove this platform rebuilding from the cli, so the platform needs to produce the prebuilt binary / static library itself.
yeah context is here: https://users.rust-lang.org/t/error-when-compiling-linking-with-o-files/49635/5
basically host.c is a workaround way to get a .o executable out of Rust, because apparently there's no way to convince pure rust to compile an executable into a .o file
or at least that's the only way we were able to figure out how to do it in that forum :sweat_smile:
Ok nice, that's good to know. Though we've since added support for using .a archives, and cargo happily produces them with staticlib crates, so we could do it all with cargo now I think.
IIRC, the problem isn't producing a .a/.o; it's that the .a/.o needs to contain everything to generate the executable. I don't think cargo is willing to put a main function in a .a.
But please double check.
It is, here's an example https://github.com/lukewilliamboswell/basic-ssg/blob/8a11665efa2074b65a71a966d3ecde8d30b9ba9a/crates/host/src/lib.rs#L127
Though the issue then is using that library in a bin crate that also has a main, in which case there are duplicate symbols called main.
I'm not suggesting the way it's currently written in basic-ssg is the right way, because I think that was a mistake. But I think it's possible. So I am thinking we might be able to have another crate that depends on the core library and just adds a main fn, similar to how it's done in main.rs
I'm working on https://github.com/roc-lang/roc/issues/6414, and have removed the platform rebuilding from roc.
I would like to clarify the best approach for dealing with our internal tests.
One approach I can think of is to include a script alongside each platform called build.roc, and when we run any of the tests, we first run this script, which is responsible for rebuilding the host.
For most platforms, this is as simple as running zig build-lib -lc host.c to produce a c-archive and then running mv libhost.a macos-arm64.a to rename it for the current architecture.
This removes a significant amount of host rebuilding logic from the compiler.
Alternatively we could leave the current rebuilding host functionality, and rewire it so that it is called before running a test to ensure we have the prebuilt binaries available.
This is the WIP PR https://github.com/roc-lang/roc/pull/6696
Luke Boswell said:
One approach I can think of is to include a script alongside each platform called
build.roc
and when we run any of the tests, we first run this script which is responsible for rebuilding the host.
I think if the tests are in Rust it'll be best if we keep that logic in Rust, as like a test helper function
otherwise we get into situations like "the actual problem was that the .roc file to build the test's platform didn't run because that .roc file's platform was broken because of an in-progress change to the current compiler code base..."
No I think this is just for the cli tests where we have a platform and app.
Where we currently build the host anyway, except the logic to do that is in the compiler.
I think we could remove all of that logic from the compiler.
Yeah, this is planned, but never completed: https://github.com/roc-lang/roc/issues/6037
Ahk, thank you Brendan. I'll carry on.
I have no major thoughts on whether the build script should be in roc vs whatever language the host uses. I guess I prefer the native language if possible, so the platforms feel more familiar to people who use the host language. That said, a lot of the platforms are super tiny, and some languages like c/c++ don't have clear build tooling. Also, we can't just use a bash script due to eventually needing to support Windows. So that pushes for build.roc for simple consistency.
do you mean in tests in the roc-lang/roc repo specifically? Or for other projects?
personally I have no preference on what other projects want to use (if they want to use build.roc, great!)
Roc repo specifically
Other projects can do whatever they want
I guess I should also be more granular, come to think of it
I think it's also fine in roc-lang/roc's examples/ folder
to use build.roc
actually I guess it's probably fine to try out build.roc in the roc-lang/roc tests too and see if it's a problem in practice
Fundamentally, roc no longer builds the host. It is up to the host to decide how it will get built. Roc will essentially run as if --prebuilt-platform is always true.
we can always change to something else if it's not working out well
Also, thinking about this more, I think I actually like forcing everything to be build.roc. Then I don't need to figure out the build script/command; it's always just roc build.roc
For the roc examples specifically
For the changes I'm working on to remove the platform rebuilding... I've hit a problem now where I need glue gen to build the platforms, but to gen glue I need to be able to build platforms. It's a vicious cycle.
So I think the best approach is to make a release of the glue platform just like any other platform. We could add a URL for the glue platform in the GH releases.
TBH I've been thinking about this for other reasons anyway. Currently whenever I use glue elsewhere I have to have roc repository cloned locally to reference the platform which isn't ideal.
seems reasonable to me! :thumbs_up:
I hadn't realised how big of a change this is. I apologise in advance for how large this PR is going to be. I'm trying to keep it reasonable, but I'm also trying to clean things up as I go and leave it in a better state.
For example, I've had to touch all of the roc_cli tests and update the platform implementations
no worries!
Does file ingesting work with the new syntax?
I'll log an issue because I think it's broken
https://github.com/roc-lang/roc/issues/6710
In the space of about 15-30mins I was able to update roc-json, roc-parser, roc-ansi, and roc-random to use the new syntax. Super quick and painless upgrade. Thank you @Agus Zubiaga for putting the effort in to make that possible.
According to https://github.com/roc-lang/roc/pull/6658#issue-2255059371, file ingestion should look like import "data.json" as data : List U8
Oh neat, just missing the type annotation causes that issue then.
I'll update the issue
Ah, interesting. Wasn’t the annotation already mandatory in the old syntax? I guess we can make it optional and have it behave as if you wrote:
import "data.json" as data : _
I agree we should have a nicer error message if we don’t, though
Agus Zubiaga said:
Ah, interesting. Wasn’t the annotation already mandatory in the old syntax? I guess we can make it optional and have it behave as if you wrote:
import "data.json" as data : _
I think that's what we should do, yeah!
I like the type annotation being optional here, because type annotations are optional for all other values :big_smile:
Hm, this is not as easy as I thought. The way solve works for ingested files means that:
import "data.json" as data : _
is effectively the same as:
import "data.json" as data : List U8
Instead, I think we want data
to be inferred as Str
or List U8
from usage.
agreed!
and then give a compile error if it infers to neither of those
would you like help doing that?
Yeah, I'll probably need some because I don't have any experience in this part of the compiler.
ok cool
I'd say we should do it in 2 parts then
like make one PR to enable the functionality if you write out the annotation
and then we can do the inference separately
I'd like to use this feature in the live coding on Tuesday :grinning_face_with_smiling_eyes:
(even if it doesn't do inference yet)
Cool! I'll make a PR with the first part today.
If it's ok, I'd like to continue working on params before shifting focus to the inference part
unless you think it's gonna be really easy
it might be easy but might also have surprises haha
so leaving it for later seems fine!
Made an issue for the second part
Here's the PR for the first part: https://github.com/roc-lang/roc/pull/6717
Separately, I am also going to improve the parsing error messages for imports
@Anton do the CI machines have roc on their PATH?
I think we should make a repository called "rfcs" or something similar to keep design documents of the sort typically shared in ideas. I used to do it in Notion, and I know Richard sometimes does them in Google Docs, but for me at least writing in google docs is quite laborious. It would also be good to centralize this. Any thoughts?
yeah I totally agree! Someone talked about this at some point in the past but I don't know how much progress was actually made on it :sweat_smile:
i’m just gonna make it
Anton do the CI machines have roc on their PATH?
No, I expect that would lead to tricky bugs with unexpected different versions
Ok, well I've been converting the platforms to build using roc as the scripting language. Maybe we should use bash instead then? I think Roc is nicer for the scripting, and will be even better soon with builtin Task and we can move a bunch of duplicate logic into a package.
I don't see how bash fixes anything
We could build the compiler first and add roc to the path at the start of the CI workflow
Don't you require calling roc to preprocess platforms and whatnot?
So even if you don't do roc build.roc
, you still have a call to roc in the equivalent build.sh
script
So either way, the user has to build roc and add it to the path (or make it otherwise accessible)
We could build the compiler first and add roc to the path at the start of the CI workflow
Yeah, I think we should do that for the e2e tests.
Well the current approach is that each test will rebuild the platform when running the example. So it doesn't really use roc as a separate fork or child process or anything.
Anton said:
We could build the compiler first and add roc to the path at the start of the CI workflow
I'll have a crack at doing that.
it's really cool how many people in #beginners are looking to make their own platforms :smiley:
Yeah, I think it's kind of a novel spark that creates opportunities
really interesting article about summing floats: https://orlp.net/blog/taming-float-sums/
Oh, I have a fun tidbit with floating point summation.
Floating point sum reductions (same thing as the summation) happen in machine learning quite often. In a neural net that I was debugging, I noticed that one ml framework had significantly less accurate results for the neural net as a whole. After digging into the network, I realized that the results deviated the most when reaching a reduction. I originally assumed the framework was cutting corners in reduction in the name of performance. It wasn't. Turns out that the framework actually had the most accurate floating point reductions. So a more numerically accurate floating point reduction made the neural network as a whole noticeably less accurate. Why? The neural net was trained with a less accurate reduction and had optimized the weights for a less accurate reduction.
I wonder if we should make List.sum
do some of this
Definitely could. Though if floats are for speed, I don't think we should really care about summation accuracy. If anything, I would argue for dumb simd and more speed instead of accuracy.
yeah I was thinking the same :thumbs_up:
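For anyone curious, here's a minimal sketch (not what List.sum does today) of the kind of compensated summation that article discusses: Kahan's trick keeps a running correction term that recovers the low-order bits a naive left-to-right fold throws away.

```rust
/// Naive left-to-right fold: each add can round away small addends.
fn naive_sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// Kahan compensated summation: `c` carries the rounding error of the
/// previous add so it can be fed back into the next one.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in xs {
        let y = x - c;
        let t = sum + y;
        c = (t - sum) - y; // (t - sum) recovers what actually got added
        sum = t;
    }
    sum
}
```

Adding 1.0 a thousand times to 1e16 shows the difference: the naive fold never moves (each +1.0 rounds back down), while the compensated version lands on the exact total.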
On this commit in roc-htmx-playground
I got the following... 30 seconds to build on my M2 macbook
Code Generation
5737.190 ms Generate final IR from Mono IR
16199.271 ms Generate object
21936.461 ms Total
Finished compilation and code gen in 28207 ms
Produced a app.o file of size 999952
Finished linking in 1115 ms
0 errors and 13 warnings found in 29325 ms
And this commit is even slower and now has a runtime crash...
thread 'tokio-runtime-worker' panicked at 'The Roc app crashed with: Erroneous: Expr::Call', src/lib.rs:46:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Code Generation
13051.603 ms Generate final IR from Mono IR
35184.772 ms Generate object
48236.376 ms Total
Finished compilation and code gen in 66731 ms
Produced a app.o file of size 965568
Finished linking in 4643 ms
0 errors and 13 warnings found in 71376 ms
These aren't optimised builds... just running with $ DB_PATH=test.db roc --time src/main.roc
Trying to figure out what's causing this... I think I'm going backwards :sweat_smile:
0 errors and 6 warnings found in 179487 ms
<-- 180 seconds
Looks to be mac specific. With my (tiny) linux server
0 errors and 6 warnings found in 3671 ms
Ok, so I've now got it working normally 0 errors and 6 warnings found in 2853 ms
I've got no idea what was causing that.
I cleaned basically every cache I could find. Upgraded my MacOS version, updated all the packages I could find, and rebuilt roc and roc_ls from latest main using nix.
wow super weird
Yeah, I was a little worried for a bit. I suspect I had done something strange like built roc from a branch somewhere or there was some combination of things in caches. Glad it's back to normal.
Actually, I think it's still a problem. It's like every time I compile my app it gets a bit slower, and slower to build.
hm, what's really odd about that is that we don't currently store anything on disk during builds
well, I guess there are changes in between these builds, right?
Yes.
Any caches in particular I should experiment with?
so it's possible that it's something pathological about the source code itself
like it might not be related to the state of the computer itself
since we've seen (and fixed) pathological cases in the past
I definitely "fixed" it when I rebuilt everything, and it was fine on my linux server. I'll keep playing with it and see if I can find any correlations between things.
I am doing a lot with Tasks in basic-webserver, so it could be something related to nesting those maybe.
I think I know what the issue might be. The language server looks to be hogging all the CPU
If I turn it off I'm back down to a more normal 0 errors and 0 warnings found in 18753 ms
yikes, that's still a ton of time though!
that's for roc check
?
That's roc build. Check is sitting around 1800 ms
someone is having some more ambitious ideas about rust compiler performance, finally https://docs.google.com/document/d/1pE3UV-LUQnZyJCjcL6Kl5RBlj7VVk1Q3EXQmvn43w0I/edit
a Software Unscripted episode with matklad is one of the references
I haven't actually listened to this yet, but I was told it is good and plan to listen later. Thought it would be a good share: https://learn.microsoft.com/en-us/shows/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction
Two other interesting reads. Mojo is testing new areas related to simple ownership (or at least more automatic and less explicit)
Found a new class of bug I think Roc app crashed with: voided tag constructor is unreachable
commit
I think i have a functional (but ugly) implementation of borrow inference that seems to pass all of our tests https://github.com/roc-lang/roc/pull/6849
that needs some cleanups, but hopefully this'll fix a bunch of performance problems
I've been helping @Sam Mohr with the builtin-task changes.
We were experiencing some hard to track down bugs on basic-cli... which we suspect are from rust glue generated code in roc_app
.
So we used a PR on roc-platform-template-zig to test the implementation instead. It looks like task as builtin is good to go. :tada:
We had issues not that long ago with the glue types in basic-cli when implementing the API changes just before SYCL. I had removed a lot of them around that time.
Sam and I have been working on removing the remaining few glue types in #221 and replacing these with roc_std or hand rolled rust types. This should eliminate this as a significant variable for debugging, and hopefully even eliminate the bug and unlock builtin task.
amazing!!!
It turns out that Tasks aren't super hard to implement because there's nothing going on under the hood, really, it's just a lambda returns a result. They just happen to work very well as a contract for IO management, which we can enforce for free using opaque types!
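As a mental model of that "lambda returning a result, wrapped in an opaque type" idea (this is a hedged sketch in Rust, not the actual builtin implementation), a Task is just a deferred computation you can chain before running:

```rust
// A Task is nothing but a boxed deferred computation returning a Result.
// Keeping the inner closure private is the "opaque type" part: only the
// platform's constructors can build one, so all IO flows through them.
struct Task<T, E>(Box<dyn FnOnce() -> Result<T, E>>);

impl<T: 'static, E: 'static> Task<T, E> {
    fn succeed(value: T) -> Self {
        Task(Box::new(move || Ok(value)))
    }

    fn fail(err: E) -> Self {
        Task(Box::new(move || Err(err)))
    }

    /// Chain: run this task, then feed its value to `next`.
    fn and_then<U: 'static, F>(self, next: F) -> Task<U, E>
    where
        F: FnOnce(T) -> Task<U, E> + 'static,
    {
        Task(Box::new(move || (next((self.0)()?).0)()))
    }

    /// Only the host actually forces the computation.
    fn run(self) -> Result<T, E> {
        (self.0)()
    }
}
```

Nothing happens until `run` is called, which is exactly the contract that makes Tasks work for IO management.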
I've been upgrading platforms for Task as a builtin... and just couldn't wait to test out platform independent packages (even though we don't quite have module params yet).
They work :tada: and it's super :cool:
# /package/main.roc
package [
ReversePrint
] {}
# /package/ReversePrint.roc
module [line]
line : Str, (Str -> Task a b) -> Task a b
line = \msg, echo ->
reversed =
msg
|> Str.toUtf8
|> List.reverse
|> Str.fromUtf8
|> Result.withDefault "BAD UTF8"
echo reversed
# example application
app [main] {
pf: platform "platform/main.roc",
test: "package/main.roc",
}
import pf.Stdout
import test.ReversePrint
main = ReversePrint.line "Roc loves Zig" Stdout.line
And we get "Roc loves Zig" reversed :smiley:
$ roc test.roc
🔨 Rebuilding platform...
giZ sevol coR
Runtime: 0.027ms
That's awesome! I can't wait to rewrite roc-pg with Task as a builtin and params
This is a WIP PR to replace cli flag --prebuilt-platform
with --build-host
.
It's a bit of a different direction to effectively remove platform rebuilding from the cli for end users.
Still only a draft -- there's a few tests failing and some linux things to fix.
I'm also trying to add some documentation in the build pipeline and refactor as much as I can.
Just wondering if anyone with Nix experience can look at this and has any intuition for what is causing this to break in CI? It works for me locally https://github.com/roc-lang/basic-cli/actions/runs/9769507618/job/26969069029?pr=194
Actually -- I haven't tried running it in the shell yet. I'll do that
Ok, I can repro... now to try and learn nix
ChatGPT to the rescue :smiley:
Possibly quick question: anyone know why AnnotatedBody
s exist in the AST, instead of being separate type def + Body
? What's the intuition behind why those are paired up?
I dont know sorry.
I think @Folkert de Vries might remember? That’s a super old node :big_smile:
I think that's for making can easier, and possibly error message generation too?
so, it was convenient at the time
if there's good reasons to refactor that now, then if we can keep generating good error messages I don't think there is a reason not to do that refactor
just a note, AnnotatedBody
is the only way to define a type annotation for a variable right now
A good high-level introduction into lambda sets (roc was mentioned! )
https://www.youtube.com/watch?v=CYcf02fTE8E
woah, Mae Milano is one of the authors of the paper and Morphic. I already knew her by this brilliant talk: https://www.youtube.com/watch?v=Mc3tTRkjCvE
yes that strangeloop talk is on my list of talks to watch before giving a talk: on a second (or more) watch you start to notice some of the timing and structure that makes it so good while it treats a bunch of hard technical subject matter
Been working with a lot of retail distribution data lately, and there is one distribution center owned by UNFI called UNFI ROC (Rocklin, CA) - always makes me think of this community
Sponsorship opportunity? :thinking:
hahahaha
Maybe they could provide yohgurt for the next meetup
I will say the retail supply chain could use a lot of love in terms of software products, so totally maybe
Ironically, I'm currently staring at Roc code that attempts to improve the situation in supply chain management
Not sure it will help UNFI though
Well, if your code makes yogurt...
we should chat haha
So not that it is particularly meaningful, but I finally have actually committed to llvm. I have a single commit that adds like 10 lines to expose and mlir c API that I wanted for debugging.
How quick is their release cycle? Will you be able to use it anytime soon without building LLVM from source?
My job builds llvm from source, so should be pulled in for our weekly update. Otherwise, llvm releases like every 6 months
For a concrete date Sept 3rd is the planned next llvm release.
And they don't cut from main until July 23rd, so my PR will make it in.
Luke Boswell said:
Ironically, I'm currently staring at Roc code that attempts to improve the situation in supply chain management
The Italian guy from the Zig crew that we sat down with at the SYCL Conf had actually worked on supply chain modeling software. Maybe he can give you some directions.
Does anyone know how the benchmarks work? I'm trying to figure out how/where they get built for the run in CI
Found it. crates/cli_utils/src/bench_utils.rs
It looks like roc build --bundle
doesn't pick up and include any windows binaries even if they exist. :sad:
Found the issue, easy fix -- need to add .lib
as a recognised legacy host file type
Does anyone know how to find a rust dependency that is using Tcp? I've got basic-ssg building on Windows -- but when I go to link it there are a bunch of missing symbols that all look related to Tcp.
I suspect I might need to turn a feature off on a dependency using a feature flag.
cargo-tree
can be a good place to start
It's built in to cargo
now
I've narrowed my issue down to definitely just a missing library when linking on Windows
Does this look like an issue I should report?
Listening on <http://localhost:8000>
2024-07-12T07:21:16Z GET /dashboard
thread 'tokio-runtime-worker' panicked at 'The Roc app crashed with: Can't create record with improper layout', src/lib.rs:46:5
I know what caused the issue, at least in the roc source code.
It should have been this
baseWithBody = \content, navBar -> Generated.Pages.baseWithBody {
content,
navBarRtl: navBar,
headerRtl,
isWhiteBackground: Bool.true,
}
But instead I was doing this
baseWithBody = \content, navBar -> Generated.Pages.baseWithBody {
content,
navBar,
headerRtl,
isWhiteBackground: Bool.true,
}
Roc turns this into a runtime error and happily roc build
s and outputs the built file -- which means I don't pick this up until it panics at runtime.
But because roc returns Warnings as a non-zero exit code I've been ignoring the exit code and just checking the app was built. I probably need to revisit this strategy.
I'd love a way to say, don't roc build
if there are any errors. I think this should be the default actually.
I'd love a way to say, don't
roc build
if there are any errors. I think this should be the default actually.
I've had this thought as well and mentioned it on zulip before but I can't find the conversation anymore...
I'd say the problem is that we want both ways to be discoverable and want the user to be aware if their command ignores errors or not.
https://github.com/roc-lang/roc/issues/6637
I think after we land the rebuild host changes that are almost done, this will be the next step. There's just been a lot of preparation to get to this point and enable this ^^
I wonder if we can improve upon the name run
to make it obvious that it runs even with errors, some suggestions:
Well in that issue we have
- Note
roc run
androc dev
commands removed as redundant
I don't think we need it anymore -- if we want to be able to pipe input into just roc, or run without any arguments, we otherwise would be back in the situation of having multiple alternative ways to run a script if we keep run
subcommand.
Note
roc run
androc dev
commands removed as redundant
Oh ok, I assumed it was still there given the "run workflow" title. I'll read through the whole thing in a bit
I'd love a way to say, don't
roc build
if there are any errors. I think this should be the default actually.
Ok, so looking at the issue, this is already the plan...
intriguing!
https://github.com/nicholassm/disruptor-rs?tab=readme-ov-file
The library also supports pinning threads on cores to avoid latency induced by context switching.
whooooooa
https://github.com/mazeppa-dev/mazeppa
this is absolutely wild
The lambda normalizer also shows us how to incarnate higher-order functions into a first-order language. In Mazeppa, we cannot treat functions as values, but it does not mean that we cannot simulate them! By performing a metasystem transition, we can efficiently implement higher-order functions in a first-order language. Along with defunctionalization and closure conversion, this technique can be used for compilation of higher-order languages into efficient first-order code.
@Ayaz Hafiz @Folkert de Vries imagine if we could adopt this for --optimize
and use it to defunctionalize instead of lambda sets, then use heap-allocated closures in dev builds
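For anyone unfamiliar with the term, here's a toy illustration of defunctionalization (the general technique, not Mazeppa's or Roc's actual implementation): every closure in the program becomes a variant of a first-order data type, its captured environment becomes plain data, and "calling" it is a single `apply` dispatch.

```rust
// Two hypothetical closures over i64, defunctionalized:
//   AddN(n)  stands in for  |x| x + n   (n is the captured environment)
//   Double   stands in for  |x| x * 2   (no captures)
enum Fun {
    AddN(i64),
    Double,
}

// The one place "function application" happens in the first-order program.
fn apply(f: &Fun, x: i64) -> i64 {
    match f {
        Fun::AddN(n) => x + n,
        Fun::Double => x * 2,
    }
}

// A higher-order function like map becomes first-order: it takes the
// data-type representation of the function instead of the function itself.
fn map_fun(f: &Fun, xs: &[i64]) -> Vec<i64> {
    xs.iter().map(|&x| apply(f, x)).collect()
}
```

Lambda sets do something in this spirit per call site; the open question above is whether Mazeppa-style supercompilation can get there without whole-program analysis.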
I wonder if it's guaranteed to defunctionalize fully :thinking:
real question is whether they've solved the problems we have. e.g. is it efficient/possible to perform the analysis on a per-module basis?
we can spin a cool story about lambda sets just ignoring the practical problems we face as a production language
if it's only for optimize builds, maybe it's ok if it's whole-program only?
I really think programs just get too big for that to be a practical solution long-term
without being able to parallelize? or do incrementally with caching?
yeah here the details start to matter, and I wonder if they have run into that sort of problem with their implementation
caching etc could go a long way though, but I think that only gets investigated once performance problems come up
sure haha
but the idea of having one process that does all of these and more is pretty intriguing:
Interesting / amusing bug -- if you name an Effect tmpDir
roc won't expose it for some reason.
Hmmm, having trouble getting an effect that just returns a string to work someTmpDir : Effect Str
Like, I'm not going crazy here right? This should be possible?
# Effect.roc
someTmpDir : Effect Str
# Dir.roc
tmpDir : Task Str []_
tmpDir =
Effect.someTmpDir
|> Effect.map Ok
|> InternalTask.fromEffect
$ roc build --no-link examples/hello-world.roc
0 errors and 0 warnings found in 116 ms
while successfully building:
examples/hello-world.o
$ objdump -t examples/hello-world.o | grep "roc_fx_someDir"
$
Ok, not going insane, misspelled something
I've been seeing this bug a fair bit today. I thought I might log an issue for it as it's really strange
https://github.com/roc-lang/roc/issues/6913
Basically, expect
is failing for values which should be equal.
I love that a release includes a hash of all the files.
I just checked if I needed to update the glue platform package as it's been a while -- re-generated the bundle and it's got an identical hash. :smiley:
I'm thinking about the chain of alloca
related issues. cc: @Folkert de Vries
With our joinpoints, we guarantee that everything that lives for multiple iterations is explicitly passed to the jump expression. As such, anything that is alloca'ed in the middle of the loop created by the joinpoint should be safe to have its alloca be part of the entry block. Correct?
It's just that at the jump instruction, we have to copy out of any temporary allocas and into the allocas for the joinpoint. Otherwise, we may run into mutation bugs where the next run of the loop mutates values that became joinpoint args.
I think that is roughly correct.
If so, I think that means we can hoist all allocas period to the entry block. Then cleanup jointpoints/jumps and hopefully have fixed a number of bugs.
context #6434
So turns out I should have tried to fix this a long time ago. Turned out to be pretty easy:
#6916 is ready for review
Also, this cut a whole minute off of 1brc
in roc. llvm
really does optimize better with all allocas in the entry block.
Benchmark 1: ./1brc data/measurements_1_000_000_000.txt
Time (mean ± σ): 118.589 s ± 0.643 s [User: 116.121 s, System: 1.720 s]
Range (min … max): 118.009 s … 119.280 s 3 runs
Benchmark 2: ./1brc-old data/measurements_1_000_000_000.txt
Time (mean ± σ): 188.869 s ± 3.279 s [User: 186.282 s, System: 1.635 s]
Range (min … max): 185.188 s … 191.478 s 3 runs
Summary
./1brc data/measurements_1_000_000_000.txt ran
1.59 ± 0.03 times faster than ./1brc-old data/measurements_1_000_000_000.txt
Also, most of the flamegraph is now allocating for RocList::extend_from_slice
So platform primitives instead of roc itself.
This may not help at all, but on the refactor-host branch of basic-webserver I was able to get a working executable by using zig within the nix shell, and compiling my app using --no-link
.
zig build-exe ./target/release/libhost.a app.o -lc -lunwind -fstrip
This is a workaround for the linux musl and linking issue, that I doubt is really bothering anyone, but sharing just in case.
I suspect our linking issue with basic-cli may be related to -lunwind
which isn't included by default in our roc linker flags. I'll try and test that more later.
libunwind
looks to be a dependency of the backtrace
crate
So assuming the numbers in this repo are correct (which I expect them to be), we apparently shouldn't be using linear search anymore. We should be using monobound binary search: https://github.com/scandum/binary_search/
This is thinking about the many cases in the compiler where for small maps, we attempt to use vec map and linear searching. Hmm, though that doesn't take into account the cost of keeping the list sorted so you can binary search.
since we use VecMap
in so many places, might not be a big effort to swap its implementation for something which does that, and just see what happens
True
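For reference, here's my reading of the monobound variant from that repo as a sketch (an illustration, not a drop-in VecMap replacement): instead of tracking low/high bounds, it keeps a base index and halves a single bound, so the loop body is short and branch-predictable.

```rust
/// Monobound binary search over a sorted slice: halve `bound` each step,
/// moving `base` forward whenever the key is at or past the midpoint.
/// Returns the index of `key` if present.
fn monobound_search(arr: &[i32], key: i32) -> Option<usize> {
    if arr.is_empty() {
        return None;
    }
    let mut base = 0usize;
    let mut bound = arr.len();
    while bound > 1 {
        let half = bound / 2;
        if key >= arr[base + half] {
            base += half;
        }
        bound -= half;
    }
    // After the loop, `base` is the only candidate position left.
    if arr[base] == key { Some(base) } else { None }
}
```

The single data-dependent branch per iteration is what makes it beat both classic bisection and linear scan at surprisingly small sizes in those benchmarks.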
We just landed task as builtin...didn't we...That is probably why my code using task suddenly broke on main?
── UNRECOGNIZED NAME in ../sorting/builtin.roc ─────────────────────────────────
Nothing is named `Task` in this scope.
56│> Stdout.line!
57│> (
58│> if testSort answrlst then
59│> "List sorted correctly!"
60│> else
61│> "Failure in sorting list!!!"
62│> )
Did you mean one of these?
Hash
main
List
U8
────────────────────────────────────────────────────────────────────────────────
No, it's not on main yet or in TESTING
Ah I know what this is, you need to add exposing [Task]
@Brendan Hansknecht did you add an exposing Task suffix to your import?
Interesting. Why is that needed today but not yesterday (or a few days ago, don't have exact bisect)
It was due to fixes around !
but I did not get to the bottom of it myself
Yes, I think it was caused by PR#6868
I guess once we have task as built-in this will be fixed cause task will be automatically imported.
Yup
I ran into this exact issue yesterday and was confused (using testing releases). I don't think the docs / tutorial mentions the need to use exposing [Task] or maybe I didn't look hard enough
We only just merged that change, and I don't think anyone realised it would impact the imports in this way. The Task as builtin change is basically ready and will eliminate this entirely. We are just going through the process to make sure that is well tested before making another breaking change and new release.
When Task is a builtin there won't be any need to import it manually, it will be automatically available in every app.
Good info and visualization on how memory really should be accessed if you want to maximize throughput: https://blog.mattstuchlik.com/2024/07/21/fastest-memory-read.html
Theoretically would also be a more optimal way to do large List.map
operations assuming the mapping function is simple enough (too expensive of a mapping function and doing more complex memory loading may actually hurt a lot).
someone suggested using large pages somewhere for basically the same effect with less effort
How do large pages fix the issue? I thought the core problem was needing multiple memory streams to increase memory load throughput and keep the load buffer full.
Maybe this is a completely crazy idea... but LLVM bitcode isn't specific to an os/arch and is basically an IR. Right?
Is there any reason why a platform host couldn't be a LLVM bitcode file? So platform authors provide that one generic-llvm.bc
or something and then roc can build and link targeting anything supported by LLVM.
It wouldn't be compiled into machine code already, so it would be much slower than just linking, and probably much larger. But maybe it provides more flexibility or is a good option for some use cases?
Versioning is a problem (no guarantee that llvm 16 ir will load with llvm 17). Also wouldn't work with dev backends.
So it could work then? We could version the file, like generic_llvm_18_1_8.bc
.
Even if this only worked for slower optimised builds, it might be useful, particularly for e.g. supporting a long tail of targets.
The most immediate use case I can think of is for WASM/WASI targets.
Can't we just use wasm LD?
Avoid the error and problems of merging llvm ir directly?
3 messages were moved from this topic to #compiler development > glue generation error by Richard Feldman.
We can generate some "interesting" types:
expected type 'union { (union { ((),), ((heap_cell,),), (), ((heap_cell,), union { (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), () }), (), ((),), () },), ((heap_cell, bag<(heap_cell,)>),) }', found type 'union { (union { ((),), ((heap_cell,),), ((heap_cell,), union { (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), () }), ((),) },), ((heap_cell, bag<(heap_cell,)>),) }'
Is this the work of the alias analysis, lambda sets? I imagine this is what is happening in the "Specialise" stage of the compiler.
And thinking about Jasper's comment the other day, could this be represented in Roc style syntax, or is this completely something different?
Is this the work of the alias analysis, lambda sets?
I think it is related. Like it happens for all types that unify, lambdasets being one of them. I think this specific error may come from morphic related code? Cause I think it uses things like bag<(heap_cell,)>
. Theoretically you could represent this as a roc style tag union, but I don't think that would really be helpful.
As a note, this is generating from basic webserver with sqlite. I ported the todos example over. Using the query function twice in different contexts seems to be leading to this somehow. If I comment out either the createTodo
path or the listTodos
path, it seems to fix things. I am very much not sure what all is going on/the real issue. I think the first type has extra empty ()
values. Maybe something should be collapsing those together? Cause union { (), () }
is probably the same thing as union { () }
(assuming these are raw unions).
PR here: https://github.com/roc-lang/basic-webserver/pull/61
With repro command being roc build.roc && roc examples/todos.roc
if anyone has a chance to take a look/a guess at the issue.
@Folkert de Vries just curious if you have any thoughts or comments on the above. Specifically around alias analysis union simplification and if it is valid. Like are we missing a canonicalization step here that would lead to all the ()
in the union merging into a single ()
. I'm not sure the semantics expected here.
@Ayaz Hafiz just curious if you have any idea about the above.
() are units
there's probably a way to simplify it, but simplifying it is also likely covering up a bug
i would run the IR checker - that will point to an issue if there is one earlier on, which i suspect there is
I think it crashes before check mono ir runs. That or it passes and then gets to this failure anyway.
there should be a mono IR check pass that runs before alias analysis (morphic). this is a panic in morphic
I would check roc_debug_flags, i think its ROC_CHECK_IR_AFTER_SPECIALIZATION or something
Ok. I'll double check when I get the chance. I'm pretty sure I ran it with ROC_CHECK_MONO_IR
and it still got to the error above and crashed.
Should we enable these env vars in CI when doing cargo test
without --release
to catch any bugs that may otherwise go undetected?
ROC_VERIFY_RIGID_LET_GENERALIZED
ROC_VERIFY_OCCURS_ONE_RECURSION
ROC_CHECK_MONO_IR
That sounds like a great idea!
After reading the descriptions of the verify flags (in crates/compiler/debug_flags/src/lib.rs), I ended up enabling only ROC_CHECK_MONO_IR (PR#6976)
Oh, we do have a mono ir failure here:
check failure
The type difference is this:
[
C [
C [],
C Str,
- C ,
C Str [C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C , C ],
- C ,
C U8,
- C
],
C List Str
]
Just saw someone share this collection of "Resources for Amateur Compiler Writers". I haven't dug into them yet, but thought I'd pass it on anyway!
https://c9x.me/compile/bib/
I just tried IDA free for the first time, it is definitely the best assembly debugger I have ever used
IDA-screenshot
Wow, that's basically exactly what we want
That's a lot like gdb tui mode
TIL gdb has a TUI: https://dev.to/irby/making-gdb-easier-the-tui-interface-15l2
@Anton Have you tried Cutter?
Wow, cutter looks very clean, definitely going to try that out
The debugger is in beta and crashes easily on the binary I'm testing it with, it definitely has potential though!
I am once again reminded how fast memcpy is for giant chunks of data.
I was looking at basic webserver and the cost of always copying the request and response bodies. Cause currently it always copies the input and output data from rust vectors into roc lists and back.
Did some testing of echoing with a large payload (~100KB).
Raw rust, fully async but still manifesting the fully body into bytes (aka avoid streaming in the request body):
787482 requests in 20.10s, 72.66GB read
Roc with spawn_blocking
and copying all inputs and outputs between rust vector and roc lists:
627501 requests in 20.00s, 57.90GB read
So 2 extra copies of all bytes only costs ~20% loss in overall perf. Still a hefty loss, but I was expecting a lot more.
Note: if we also constrain rust to use spawn_blocking
it takes a ~4% perf hit:
755249 requests in 20.10s, 69.69GB read
See if you can try copying 1MB of data back and forth and see how much of penalty does it take! :)
I mean the process as a whole gets slower but the ratio is roughly the same.
raw rust without spawn_blocking
:
69261 requests in 20.08s, 63.86GB read
roc:
58514 requests in 20.08s, 53.95GB read
Roc is ~15% slower in overall perf.
What is this post?? https://github.com/roc-lang/roc/issues/7027#issuecomment-2309494585
Concerning, it looks like spam/malware
I want to "Report Content"
I hid it, you should probably report it
Their only contribution I can publicly see on GitHub is 3 hours ago: https://github.com/TerraFirmaCraft/TerraFirmaCraft/issues/2772
Reported
Terrafirmacraft...that's a throwback
Password protected to evade malware scanners, sneaky...
1023 unread messages in ideas :p
Howdy
We can summarize, but there was a good amount of useful convo
Mainly about purity inference, super cool idea, no perfect way to do it
We can summarize, but there was a good amount of useful convo
Nah, I can't break my reading everything streak :p
That's dedication
Well, we'll be happy to get your input on everything
These are extremely important discussions!
Always fun to remember just how much perf is often left on the table: 41b1ca8f-ae9e-4406-9c07-0a99fc0f35bf.jpg
I'm assuming this is from a matmul or similar op.
How do I fix a snapshot failure like this? I made a completely unrelated change (as far as I can tell) and now this is failing...
Screenshot-2024-09-04-at-21.39.20.png
looks like there's a newline at the end of the line in the source file, and maybe it got saved and your editor automatically trimmed it off?
that used to happen to me all the time, but I edited the tests in question to no longer need a trailing newline
Do I modify the error report maybe?
Why would it have a blank space on the end?
Hmm... it shouldn't have one there from what I can tell.
alloc.reflow("Tip: Learn more about builtins in the tutorial:\n\n<https://www.roc-lang.org/tutorial#builtin-modules>"),
I think the reflow can put one there based on the terminal width
although I agree that would be strange in this case haha
I'm just going to leave that for now and come back to it later... in the build-host PR / rebuild-platform branch I've managed to remove all references to basic-cli from roc (aside from the scripts for building the website) now.
I've migrated the tests we want to keep onto other test platforms.
The scale of waste is mind-boggling to me sometimes. I get that computers are fast, so a company can make a ton of money while also wasting a ton, but it just feels wrong to me.
15,000 requests/second! That's what I just got out of that $220/month EPYC 48-core hobby box from Hetzner running a basic Rails 8 scaffold MessagesController#show loading a single record from the DB with no caching of any kind. Hot diggity!
JITted multithreaded Ruby using SQLite as a local database... on a modern high-end bare metal server with 48 cores... only getting 15k requests per second.
Basic webserver still has wasted copies and is missing the effect interpreter to enable full async. Yet it can do 120k requests per second on 4 cores of my m1 Mac.
So 30k per core vs 300 per core.
Obviously an apples to oranges comparison to some extent. But the rough scale is real.
Related aside: shouldn't Ruby have async and await that turns this into just waiting on the network card? Shouldn't this still be able to saturate the network card for something so simple? Is Ruby with Rails all blocking IO?
My company primarily runs a Ruby on Rails monolith, tell me about it... :smiling_face_with_tear:
https://www.wjwh.eu/posts/2020-12-28-ruby-fiber-scheduler-c-extension.html
It seems like Ruby 3.0 onwards does async IO automatically, but we run Ruby 2.7 because any company using something so outdated as Rails will avoid the cost of upgrading things whenever possible
Hahaha, this is deeply relatable. I just started on a codebase with a few hundred thousand lines of nightmarish PHP.
We have pages that take over 10 seconds to load because they query all 250k users on the database to display a stat.
I can promise you, nobody has any idea how many requests per second we manage.... But I can also promise it ain't good :sweat_smile:
Does the LLVM IR
define void @roc__mainForHost_1_exposed_generic(ptr %0) !dbg !21
look right for a platform with this API?
mainForHost : Str -> Str
That would take a RocStr and mutate it right?
No, I think we would take two pointers for that
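A hedged sketch of what "two pointers" could look like on the host side. The shim struct and the exact symbol/ABI here are illustrative assumptions, not Roc's actual definitions:

```rust
// Hypothetical host-side view of `mainForHost : Str -> Str`.
// The result is written through an out-pointer and the argument is
// passed by pointer; the input string is not mutated in place.
#[repr(C)]
struct RocStrShim {
    bytes: *const u8,
    len: usize,
    capacity: usize,
}

extern "C" {
    // Illustrative only; the real signature comes from the generated IR.
    fn roc__mainForHost_1_exposed_generic(ret: *mut RocStrShim, arg: *const RocStrShim);
}
```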
How does try/? work? Is this a correct desugar?
y = try x
y + 1
when x is
Error e -> Error e
Ok y -> y + 1
or does it wrap the continuation in the Ok constructor?
We're going with return
when x is
Err e -> return Err e
Ok y ->
y + 1
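That desugar is analogous to Rust's ? operator, for anyone mapping it onto familiar ground:

```rust
use std::num::ParseIntError;

// `y = try x` behaves like Rust's `?`: unwrap the Ok,
// or return the Err from the enclosing function immediately.
fn add_one(s: &str) -> Result<i64, ParseIntError> {
    let y = s.parse::<i64>()?; // ~ `when x is Err e -> return Err e; Ok y -> ...`
    Ok(y + 1)
}
```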
kk thanks
This works with the continuation monad well
Returning early from the continuation will return to the parent function's return value immediately, since the parent function doesn't do anything after calling the continuation
It all just fits together
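A small sketch (not Roc's actual lowering) of why that works: if the parent's last action is calling the continuation, an early return inside the continuation is indistinguishable from the parent returning early.

```rust
// The parent tail-calls the continuation, so no code runs after it.
fn with_value<R>(x: i64, k: impl FnOnce(i64) -> R) -> R {
    k(x) // tail position: the parent has nothing left to do
}

fn example(input: Result<i64, String>) -> Result<i64, String> {
    with_value(1, |one| {
        let y = match input {
            Ok(y) => y,
            // This returns from the closure, but since `example` just
            // returns `with_value`'s result, it is effectively an early
            // return from `example` as well.
            Err(e) => return Err(e),
        };
        Ok(y + one)
    })
}
```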
sorry I'm not sure I follow that piece
wouldn't the early return immediately pass the result value to the next continuation in the chain? if you return to the parent function wouldn't that break the sequence of continuations?
Is zstd a dependency for using a Roc nightly on Intel macOS? I'm trying to set up a GH runner and it's given me an error like
dyld[4651]: Library not loaded: /usr/local/opt/zstd/lib/libzstd.1.dylib
Referenced from: <4E33E9A3-ECE7-3A94-90F3-D55423A81AF9> /Users/runner/work/roc-ray/roc-ray/roc_nightly/roc
Reason: tried: '/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/usr/local/lib/libzstd.1.dylib' (no such file), '/usr/lib/libzstd.1.dylib' (no such file, not in dyld cache)
/Users/runner/work/_temp/97a43c70-cd5c-446e-a653-669d45f49f87.sh: line 1: 4651 Abort trap: 6 ./roc_nightly/roc version
Trying different things to resolve it. Just noticed there's nothing about it in our Getting Started guide https://www.roc-lang.org/install/macos_x86_64
I think so: https://github.com/roc-lang/roc/pull/7008
hm... that doesn't appear to have helped
Run ./roc_nightly/roc version
dyld[2644]: Library not loaded: /usr/local/opt/zstd/lib/libzstd.1.dylib
Referenced from: <4E33E9A3-ECE7-3A94-90F3-D55423A81AF9> /Users/runner/work/roc-ray/roc-ray/roc_nightly/roc
Reason: tried: '/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/usr/local/opt/zstd/lib/libzstd.1.dylib' (no such file), '/usr/local/lib/libzstd.1.dylib' (no such file), '/usr/lib/libzstd.1.dylib' (no such file, not in dyld cache)
/Users/runner/work/_temp/e7c7fbc1-dcab-4aba-8b61-ee456c77d93c.sh: line 1: 2644 Abort trap: 6 ./roc_nightly/roc version
Yeah I goofed up... got confused between Arm64 and x64 :sweat_smile:
I think we've managed to break the --no-link build pipeline sometime recently... I used to be able to do this.
$ roc build --no-link --emit-llvm-ir examples/simple.roc
Legacy linking failed: Failed to find any legacy linking files; I need one of these three paths to exist:
examples/../platform/macos-arm64.a
examples/../platform/macos-arm64.o
examples/../platform/libhost.a
I didn't think this was the --rebuild-host PR... but that's probably the most likely candidate
Yeah. No link should turn off all the host stuff
Almost certainly the rebuild host pr
PR to fix this
https://github.com/roc-lang/roc/pull/7236
QQ: Is there any particular reason why there are two phases to canonicalization? (desugaring, and then conversion to can::Expr/etc)
I'm wondering if it'd be crazy to try to combine those...
There's a comment on this already
Oho well that sounds very fixable
I've wanted to move precedence/associativity handling into the parser anyway
If you can do that cleanly, I think that's a good improvement!
ooo, using Pratt parsing? :smiley:
Not only does it send your mind into a Möbius-shaped hamster wheel, it also handles associativity and precedence!
:zany_face:
Yeah that looks roughly right
I've written basically that ~5 times now. Every time I need to either look up how exactly it's done, and/or spend hours debugging.
Möbius-shaped hamster wheel indeed
send your mind into a Möbius-shaped hamster wheel
spend hours debugging
That does not sound like something we want :big_smile:
Well the great thing is that once it works you never have to touch it again
it's basically the gold standard for how to efficiently resolve precedence without needing a separate pass later, and lots of compilers use it
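For anyone curious, here's a minimal Pratt-style sketch (single-digit operands, two precedence levels; purely illustrative, nothing like Roc's actual parser):

```rust
// Minimal Pratt-style expression parser over single-digit operands.
// Each operator gets left/right binding powers; higher binds tighter.
fn binding_power(op: u8) -> Option<(u8, u8)> {
    match op {
        b'+' | b'-' => Some((1, 2)),
        b'*' | b'/' => Some((3, 4)),
        _ => None,
    }
}

// Parse an expression whose operators must bind at least as tightly as `min_bp`.
fn expr(src: &[u8], pos: &mut usize, min_bp: u8) -> i64 {
    let mut lhs = i64::from(src[*pos] - b'0'); // single-digit operand
    *pos += 1;
    while *pos < src.len() {
        let op = src[*pos];
        let (left_bp, right_bp) = match binding_power(op) {
            Some(bp) => bp,
            None => break,
        };
        if left_bp < min_bp {
            break; // this operator belongs to an outer (looser) context
        }
        *pos += 1;
        let rhs = expr(src, pos, right_bp); // recurse with a tighter requirement
        lhs = match op {
            b'+' => lhs + rhs,
            b'-' => lhs - rhs,
            b'*' => lhs * rhs,
            _ => lhs / rhs,
        };
    }
    lhs
}
```

Asymmetric left/right binding powers are what make the operators left-associative here; swapping them would flip associativity.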
Or .... Just have simple precedence that is easy to parse like uiua.
Not a real suggestion if you want your language to read normally
But it does point to how bad binary ops and precedence are
In my own language I was working on, I didn't have binary/unary ops and let the user decide the precedence through function calls. Was that principle or laziness? I'll let you make the call
fiiiiiiinally
Screenshot 2024-11-30 at 9.12.36 PM.png
We have DOCS!!!!
I need a bunch of cleanup before it's ready for a PR, but it actually works end to end now
(for those who haven't been following along, this resolves the extremely longstanding issue where if you expose a type alias of a type from another module - such as Http.Request : InternalHttp.Request in this example - it would render as nothing, which is why the Http docs today are not very helpful :sweat_smile:)
Great! :star_struck:
Hey, would anyone a little more familiar with the inner workings of the compiler be able to suggest how I could get types for the tag completions I just added to the language server?
Currently I get the tag names from the subs tag_names field
Initially, as an experiment, I just tried to brute force it by iterating every variable entry in the current module's subs looking for Tags
But that approach didn't seem to find many of the tags that exist in tag_names
subs.variables
.iter()
.flat_map(|var1| {
let res = subs2.get(var1.clone()).content.clone();
match res {
Content::Structure(structure) => match structure {
FlatType::FunctionOrTagUnion(names, _, ext) => {
let res = subs.get_subs_slice(names).iter().filter_map(|label| {
if label.as_ident_str().starts_with(prefix) {
// let type_str = SubsFmtFlatType()
Some(CompletionItem {
label: label.as_ident_str().to_string(),
kind: Some(CompletionItemKind::ENUM),
documentation: Some(lsp_types::Documentation::String(
format_var_type(
ext.var(),
&mut subs2,
module_id,
interns,
),
)),
..Default::default()
})
} else {
None
}
});
res.collect()
}
FlatType::TagUnion(name, ext)
| FlatType::RecursiveTagUnion(_, name, ext) => {
let res = name.iter_from_subs(&subs).filter_map(|(label, _var)| {
if label.as_ident_str().starts_with(prefix) {
// let type_str = SubsFmtFlatType()
Some(CompletionItem {
label: label.as_ident_str().to_string(),
kind: Some(CompletionItemKind::ENUM),
documentation: Some(lsp_types::Documentation::String(
format_var_type(
ext.var(),
&mut subs2,
module_id,
interns,
),
)),
..Default::default()
})
} else {
None
}
});
res.collect()
}
_ => vec![],
},
_ => vec![],
}
})
.collect::<Vec<_>>() //we have to collect so that we can release the lock
How can I go from tag_names to types? Or maybe, where in the compiler should I add some way of saving the tag unions that exist so I can use them later in the completion?
If you see this error
Please file an issue here: <https://github.com/roc-lang/roc/issues/new/choose>
Invalid decimal for float literal = 1e10. This should be a type error!
Location: crates/compiler/mono/src/ir/literal.rs:115:25
Is this in dev-backend... or a Can related issue?
in between the two, in mono
RocDec from_str failed; it doesn't understand e, I guess
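The failure mode is easy to reproduce in isolation: a float parser accepts scientific notation, while a simplified fixed-point decimal parser (a hypothetical toy, not RocDec's actual implementation) rejects the e:

```rust
// Hypothetical simplified decimal parser: digits, optional '.', digits.
// `1e10` has no fixed-point form here, so it fails where f64 parsing succeeds.
fn parse_fixed_point(s: &str) -> Option<(i64, u64)> {
    let (int_part, frac_part) = s.split_once('.').unwrap_or((s, "0"));
    Some((int_part.parse().ok()?, frac_part.parse().ok()?))
}
```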
Apparently my new version of the false interpreter found a bug in drop specialization.
Failing in a debug assert
I was hoping to just update and land this change quickly, not to round about debug a bunch of things due to the update (I guess it at least proves the value of having false interpreter as a test case)
@J.Teeuwissen any chance you can look at the failure here?
It is a debug assert being hit in drop specialization. Can be reproed on the fix-false branch on any target with cargo test -p roc_cli -- false_
Brendan Hansknecht said:
J.Teeuwissen any chance you can look at the failure here?
It is a debug assert being hit in drop specialization. Can be reproed on the fix-false branch on any target with
cargo test -p roc_cli -- false_
Opened pr that should fix this specific issue: https://github.com/roc-lang/roc/pull/7376
I haven't touched Roc in a while, i suggest checking/testing the changes thoroughly ;)
Does this error message mean anything to anyone? for context, it's a CI failure in basic-cli after upgrading hyper and removing "ring" and replacing with "aws-lc-sys" crate. Only failing in macOS-13 -- passes everywhere else.
cargo:warning=In file included from /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-lc-sys-0.24.0/aws-lc/crypto/fipsmodule/bcm.c:150:
cargo:warning=In file included from /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-lc-sys-0.24.0/aws-lc/crypto/fipsmodule/rand/urandom.c:69:
cargo:warning=/nix/store/hmchx96ysd0h19p7kx0w12pljkbp30j3-Libsystem-1238.60.2/include/CommonCrypto/CommonRandom.h:35:9: error: unknown type name 'CCCryptorStatus'
cargo:warning=typedef CCCryptorStatus CCRNGStatus;
cargo:warning= ^
cargo:warning=In file included from /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-lc-sys-0.24.0/aws-lc/crypto/fipsmodule/bcm.c:150:
cargo:warning=/Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aws-lc-sys-0.24.0/aws-lc/crypto/fipsmodule/rand/urandom.c:394:42: error: use of undeclared identifier 'kCCSuccess'
cargo:warning= if (CCRandomGenerateBytes(out, len) == kCCSuccess) {
cargo:warning= ^
cargo:warning=2 errors generated.
I tried adding CoreFoundation to the nix flake... but that didn't help
darwinInputs = with pkgs;
lib.optionals stdenv.isDarwin
(with pkgs.darwin.apple_sdk.frameworks; [
Security
CoreFoundation
]);
It's part of libsystem...that feels relevant...but not really sure
Also, looks to be part of libc in rust? https://concoct-rs.github.io/viewbuilder/libc/type.CCRNGStatus.html
37 messages were moved from this topic to #compiler development > bug: Outstanding references to the derived module by Luke Boswell.
Do we have a syntax for extending a union twice:
I have
Tag1: [A]
Tag2: [B]
I want the sum of those two and C
So:
Tag3: [C]Tag1,Tag2
But that is invalid syntax. Do we have a way to do this?
Maybe ([C]Tag1)Tag2
?
Looks like that probably should work but we parse it wrong:
── INVALID_EXTENSION_TYPE in examples/../platform/Sqlite.roc ───────────────────
This record extension type is invalid:
50│ Tag3 : ([C]Tag1)Tag2
^^^^
Note: A record extension variable can only contain a type variable or
another record.
── INVALID_EXTENSION_TYPE in examples/../platform/Sqlite.roc ───────────────────
This tag union extension type is invalid:
50│ Tag3 : ([C]Tag1)Tag2
^^^^
Note: A tag union extension variable can only contain a type variable
or another tag union.
Note that the first error is a record extension
I thought I saw someone proposing a spread type syntax for this?
There is. Just needs to be implemented.
Was wondering if something works today.
Is anyone interested to review @Joshua Warner's PR https://github.com/roc-lang/roc/pull/7431
This is a fairly significant refactor, where we introduce a strongly-normalizing intermediate representation from which we actually generate the output in the formatter.
I think his idea is good, and any potential (unforeseen negative) impact is limited to the formatter.
Oh…..
:melting_face:
I feel like half of my PNC change is going up in smoke :laughing:
And this is what I was working on right now
But more seriously, I wish this design could work well with some sort of line width constraints (if it does and I missed it somehow, I apologize). Richard and I were talking about that for better formatting of docs code blocks than what we have today
I think I just need to talk to Joshua about this
I wish this design could work well with some sort of line width constraints
Yep, this has been on my mind
When I've floated that previously (and seen it discussed separately), I think Richard has been somewhat against it
There's probably a happy medium where there aren't dramatic changes from how things operate today, but line-length constraints / adjustments really only operate on the fringes / fix up some of the edge cases
I feel like half of my PNC change is going up in smoke
Oh noes! I definitely didn't intend to interfere.
I think we'd want line length constraints that can be applied to type signatures for docs generation, but otherwise are ignored
yeah exactly :point_up:
in other words, have the line width be configurable, and have roc format set it to infinity, but set it to different numbers when generating docs
Because line length constraints in code just tend to force awkward breaking up of lines that need to be longer than the arbitrary limit
FWIW for normal formatting, I've been thinking of experimenting with soft line _width_ constraints rather than line length constraints - i.e. not counting indentation against you
That avoids some kinds of awkward breaking when code is at a high indent level
That also happens to be somewhat easier to implement
Thinking about the docs use-case a bit more, a few things come to mind:
The "simple" solution of just having everything authored to fit a reasonably narrow viewport (i.e. manually insert newlines to force multi-line representations in the formatter) feels not _terrible_.
The big thing that tends to blow line length limits accidentally in roc code would be comments, and I don't know if there's an easy solution for that.
Currently, the type signatures in docs have a max width about 600px wide even on really wide screens. So type signatures can get narrower than that, but are pretty much the same size on desktop and most mobile views
Why are type signatures special here?
Because that's the thing we display in docs
Besides code examples, which I wouldn't expect to get special formatting
I'd just have those have horizontal-scroll: auto
On formatting comments: since doc comments are markdown, any group of lines that doesn't have an empty line between its lines can be treated as a giant, single line and wrapped based on the word lengths. But I think normal comments just need to be left as-is
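The rewrapping idea for doc comments could look something like this greedy word-wrap sketch (an illustrative toy, not the docs generator):

```rust
// Join a markdown paragraph's lines into one logical line (via
// split_whitespace), then greedily wrap on word boundaries at a
// target width. Words longer than `width` get their own line.
fn rewrap(paragraph: &str, width: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for word in paragraph.split_whitespace() {
        if !current.is_empty() && current.len() + 1 + word.len() > width {
            lines.push(std::mem::take(&mut current)); // start a fresh line
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```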
And why not just do line wrapping in the browser?
I mean, just let the browser do that
That seems not terrible?
It may look a little weird if there's a very long type decl, but it'll copy-paste just fine
For function types, we do let the browser do line wrapping after forcing multilining on long types
I think it would be less readable if we didn't force multilining
A good way to see this is to compare List.walk in the search results vs. its main section in the List module
Or maybe List.walkFrom:
walkFrom :
List elem,
U64,
state,
(state, elem -> state)
-> state
vs.
walkFrom : List elem, U64, state, (state, elem -> state)
-> state
Or on a small width, List.walkFromUntil:
List.walkFromUntil : List elem, U64, state,
(state, elem -> [ Continue state, Break
state ]) -> state
It's fine, but I think we're better off with proper multilining
But don’t those same reasons also apply to the source code?
Why is the developer ok with that super long type in the source but not ok when it’s in docs?
Because the developer can decide to break up a long def with a newline. Just putting a newline in the middle will make Roc multiline it for you
But you can also decide to leave it more legible as a single line
This lets people choose for themselves in wide display environments how to display their signatures
But the docs online are always narrow
So keeping them in a single line and autowrapping leads to potentially many wrapped List.walkFromUntils
I personally don't feel that strongly about this, I don't think it matters that much
I think I agree with your logic there, but to me it more strongly indicates the formatting should adapt to the viewport in all situations where that’s possible.
E.g. in the editor with LSP, it would ideally be possible to do a "view-only" reformat or something, where the on-disk version has longer lines
Or maybe this is done at ‘git checkout’ time: code is formatted to your preference when checked out, and formatted back to some generic standard when committed. Pretty sure hooks for that sort of thing do already exist.
Joshua Warner said:
I think I agree with your logic there, but to me it more strongly indicates the formatting should adapt to the viewport in all situations where that’s possible.
I understand in theory, but having spent years with formatters that do and don't enforce line width, I've found that the line width enforcement is frequently annoying and almost never helpful.
for example, sometimes I have two lines that are doing things that are very similar, and will be easier to understand if they are side by side and formatted the same way - so the similarities between them are visually obvious - but one happens to call functions with slightly longer names, causing the formatter to make it multiline.
I've wasted a bunch of time trying to coax line-width-enforcing formatters to not make my code harder to read, and I've wasted zero time doing that with formatters that don't enforce it
the difference is that when there is a hard width cap, like in types in docs, if there's going to be wrapping anyway, it's better if we can apply a wrapping algorithm that looks nicer than the browser default of just word wrapping without taking into account indentation, whether an entire type should now render as multiline, etc :big_smile:
I think we're mostly agreeing :)
The same view-width logic applies to my editor (I run it full screen and I'm not reducing the font size, so it's not getting any wider). I would love my editor to have nicer line wrapping behavior that's guided by a language server. Or alternatively, an easy git config to format my code to some line limit on checkout (supposing it's somehow perfectly invertible).
The practical difference there is in difficulty. Docs (in theory) have one width across all users. Editor config is often unique-ish for each user.
I would add tho that many times I've come across a docs site that looked silly because of how much it restricted the content width, and I've gone into the browser inspector and removed that width constraint while I read.
(which is to say, I would push back a bit on the notion that 600px is great for everyone)
Here's an idea: ship the formatter as a wasm binary in the browser that will dynamically re-format lines to match whatever the real display width is :)
oh I was thinking we'd pick a few different breakpoints and render each of them
then CSS could hide all but one of them depending on browser width
Interesting!
That's definitely lighter-weight, in terms of what we ship to the browser, so probably better.
Not as much fun tho :)
(I like it!)
Deliciously constrained
Ok, anyway, with that in mind: yes, I think this "Node" refactor for the formatter that I've been working on could be updated to (optionally) do line-length limits
That's not high on my priority list right now, but happy to provide some guidance if someone else wanted to tackle that
that's awesome! :smiley:
I'm working on migrating my blog to the Roc-based static site generator I'm working on. I was doing some styling and trying to figure out the best approach to display code snippets on mobile.
Rewriting the code snippets to have very short line widths is one approach. The thing is, while that means you don't have to scroll horizontally anymore, I think it creates a different flavor of poor readability where even simple expressions need to be broken across multiple lines.
One approach I'm wondering if others have tried before (I haven't run into this myself): suppose code snippets on mobile are replaced with an icon that, when pressed, opens the snippet full screen in landscape mode. The user presses the code example, flips their phone, then only needs to scroll vertically.
In a brief experiment, I get a line width of about 65 characters with a reasonable font size that way. I believe 50-70 characters, or about 10 words per line, is considered about ideal for readability of regular text, so if main-body width is around 50-70 characters on a big screen, then 65-character-wide examples will look great on the big screen too, at least typographically speaking.
Plus, if we can have a simple rule like "code examples should have a max line width of 65", and the result is that the code is readable _and_ looks the same across different screen sizes, that creates a pretty nice experience writing code examples, because what-you-see-is-what-you-get.
Interested to know if anyone has any ideas what could be causing this bug?
https://github.com/roc-lang/roc/issues/7461
I've been trying to isolate it and make a minimal repro. Been hacking code out and deleting things.
It's not related to the platform at all... this is just the package and compiler
I found even more code to cull out... it's pretty small now
Down to 1 file :tada:
I assume this line was meant to be removed before merging a PR: https://github.com/roc-lang/roc/blame/main/crates/compiler/fmt/src/def.rs#L434
cc: @Anthony Bullard
Was really confused when I formatted a file and it became:
WTF???
WTF???
app [main] { pf: platform "platform/main.roc" }
...
Hah, I had that too :)
Yes, I have a PR to remove
I'm so sorry
I thought someone would catch it in code review
Because I obviously missed it
https://github.com/roc-lang/roc/pull/7464
The Ubuntu fuzzer is my enemy
We probably should just force a merge and worry about the fuzzing issue separately
Clearly removing the println is correct
For some reason I keep hitting it
If you have the perms to force merge, feel free
cc: @Joshua Warner Another fuzzing failure: https://github.com/roc-lang/roc/actions/runs/12610505189/job/35147114059
Not sure if it is a new regression that slipped in or something older that just took a while for the fuzzer to happen upon.
I'd like to get the crash report and debug myself
Since it may have come from the PNC change
Also, did we mean to always allow this kind of syntax?
{
some_field?
"""
""",
}
To destructure an optional?
I don't see why that wouldn't work as silly as it looks
Ok, then I have to figure out why my change to ?? doesn't like
{
some_field??
"""
""",
}
Might be PNC. failing input: ((@Y)@Y)((@Y)@Y)
Formats to (@Y) @Y((@Y) @Y), which is a different AST
So yeah, probably a new failure case
@Brendan Hansknecht You mean for the new fuzzer bug?
This is hard to parse for my human eyes
I would think this should be a PNC apply with the result of the whitespace application of @Y @Y as the func, and the result of the whitespace application of @Y @Y as the first arg, so just (@Y @Y)(@Y @Y)
Or after migration to PNC, (@Y(@Y))(@Y(@Y))
Minimization is ((Y)Y)()
(Y Y)(), or migrated (Y(Y))(), should be the formatted output
Just a formatting error, parses correctly. I’ll put the fix in with my current PR
I have a fix for this. It was super simple
https://github.com/roc-lang/roc/pull/7467
Can someone try repro this bug for me?
https://github.com/roc-lang/roc/issues/7461#issuecomment-2571475674
Just copy that file and
$ cargo run -- test bug.roc
Sam hasn't been able to repro
I’ll try in a few
@Luke Boswell Repro'd on my m1 mac:
❯ RUST_BACKTRACE=1 ../roc/target/debug/roc test bug.roc
thread 'main' panicked at crates/compiler/mono/src/reset_reuse.rs:1244:42:
Expected symbol to have a layout. It should have been inserted in the environment already.
stack backtrace:
0: rust_begin_unwind
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
1: core::panicking::panic_fmt
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
2: core::panicking::panic_display
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:196:5
3: core::panicking::panic_str
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:171:5
4: core::option::expect_failed
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/option.rs:1988:5
5: expect<&roc_mono::reset_reuse::LayoutOption>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/option.rs:894:21
6: get_symbol_layout
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:1244:9
7: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:444:41
8: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
9: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:213:36
10: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
11: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:757:39
12: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
13: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:757:39
14: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
15: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
16: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:213:36
17: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:516:36
18: insert_reset_reuse_operations_stmt
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:213:36
19: insert_reset_reuse_operations_proc
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:82:20
20: insert_reset_reuse_operations
at /Users/anthonybullard/Development/roc/crates/compiler/mono/src/reset_reuse.rs:44:24
21: update
at /Users/anthonybullard/Development/roc/crates/compiler/load_internal/src/file.rs:2921:21
22: state_thread_step
at /Users/anthonybullard/Development/roc/crates/compiler/load_internal/src/file.rs:1765:25
23: {closure#1}
at /Users/anthonybullard/Development/roc/crates/compiler/load_internal/src/file.rs:2104:23
24: {closure#0}<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>
at /Users/anthonybullard/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-utils-0.8.16/src/thread.rs:163:65
25: call_once<core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>, crossbeam_utils::thread::scope::{closure_env#0}<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panic/unwind_safe.rs:272:9
26: do_call<core::panic::unwind_safe::AssertUnwindSafe<crossbeam_utils::thread::scope::{closure_env#0}<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>>, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:554:40
27: ___rust_try
28: try<core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>, core::panic::unwind_safe::AssertUnwindSafe<crossbeam_utils::thread::scope::{closure_env#0}<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>>>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:518:19
29: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<crossbeam_utils::thread::scope::{closure_env#0}<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>>, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panic.rs:142:14
30: scope<roc_load_internal::file::load_multi_threaded::{closure_env#1}, core::result::Result<roc_load_internal::file::LoadResult, roc_load_internal::file::LoadingProblem>>
at /Users/anthonybullard/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-utils-0.8.16/src/thread.rs:163:18
31: load_multi_threaded
at /Users/anthonybullard/Development/roc/crates/compiler/load_internal/src/file.rs:2040:29
32: load
at /Users/anthonybullard/Development/roc/crates/compiler/load_internal/src/file.rs:1550:35
33: load
at /Users/anthonybullard/Development/roc/crates/compiler/load/src/lib.rs:39:5
34: load_and_monomorphize
at /Users/anthonybullard/Development/roc/crates/compiler/load/src/lib.rs:143:11
35: test
at /Users/anthonybullard/Development/roc/crates/cli/src/lib.rs:586:27
36: main
at /Users/anthonybullard/Development/roc/crates/cli/src/main.rs:83:17
37: call_once<fn() -> core::result::Result<(), std::io::error::Error>, ()>
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Ok, good to know... maybe it's only on Mac somehow. @Sam Mohr must have been using a Linux machine
Maybe add a format! in that expect and see what the symbol is
That's not getting a layout inserted
<-- Does not know anything about mono
Yeah, I've tried that. It doesn't give me much more info. I'm not sure how to get any more detail than just a symbol number
Are the symbols interned?
Looks like it, there's a MutSet of them
Adding a format! in there, I can get:
thread 'main' panicked at crates/compiler/mono/src/reset_reuse.rs:1244:42:
Expected symbol `bug.60` to have a layout. It should have been inserted in the environment already.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Maybe print that out as well
Actually that's not useful
This part of the compiler is harder for me to follow... it's like peering into the dark
You could instrument all of the calls to Symbol::new
and see what kind of thing goes in, and the symbol that comes out
But this looks like something for Brendan to chew on much more effectively
It's so easy for it to go away too
Yep, I was on Linux
I want to force merge: https://github.com/roc-lang/roc/pull/7466
It just updates the benchmarks to PI, PNC, and snake_case.
It is failing for 2 reasons:
any concerns?
Sounds good to me :+1:
merged! :boom:
I've got a couple of hours free... I'm going to poke at PRs and CI and try to land a bunch of things.
I'd like to finish the examples/ folder cleanup, and then start on removing Task from the builtins and tests
You are hot on the trigger Luke!
I'll have to land these fuzzer fixes on a new PR :-P
It's the lesser of two evils; we're only going to give ourselves grief trying to keep Task around. The sooner we transition things across, the sooner we can eliminate a lot of known bugs and rough edges.
Love it, it's probably for the best
This part has become larger than I expected
I have a roughly 400-file change updating almost all idents to snake_case idents
I have a couple tests left to fix, but we can actually maybe update everything at once
I don't know if it's worth breaking up into multiple PRs
Seems difficult to avoid breaking everything
I guess we could do one builtin module at a time?
Once I get tests passing, I'll see what I can do to break this up
I don't follow the logic of breaking it up. Is that to make it easier to roll back?
I'd rather not break it up, I just read the lack of response as people being unhappy with a big change.
I was getting in my head.
I'll keep it as is
Pull off the band-aid!
Okay, go team
Then people can just migrate!
Some handsome feller added a tool to make it easy
I think he looks more like Quasimodo, but agree to disagree
@Sam Mohr to clarify, this is basically upgrading all the Builtins to snake_case, right? Not doing the sneaky Can case change thing in the background?
Nope, just me manually updating every camelCase ident I found to snake_case
Some notes: mainForHost to main_for_host
But it does include all the builtins
Yes
That's why it's 400 gosh darn files
It's a bandaid that covers your whole body
I'm all in favor of this as one big bandaid. I want to start migrating my packages to snake case, but hesitant to do so until I can migrate everything all at once.
Seems like ? is broken with the new PNC change:
module []
println! : Str => Result {} _
print_something! = \{} ->
println!("Hello, world!")?
Ok({})
❯ roc check test.roc
── UNKNOWN OPERATOR in test.roc ────────────────────────────────────────────────
This looks like an operator, but it's not one I recognize!
6│ println!("Hello, world!")?
^
I have no specific suggestion for this operator, see
https://www.roc-lang.org/tutorial#operator-desugaring-table for the
full list of operators in Roc.
So you are saying it just doesn't work with PNC? Or doesn't work at all?
PNC
It works with spaces
This makes sense. The parsing logic for question mark suffixes needs to be moved to the same spot where we're handling PNC args
I’ll put that in with my refactor I’m working on right now
Moving PncApply to its own expr node
Just upgrading basic-cli... there is only one external dependency on another package: roc-json, in one example.
I think we should remove that... well actually move it to the roc-json repo.
This will make future breaking changes easier.
I agree with this. We can just have an example using roc-json
on the website or in examples
Seemingly, the last CI test failure in the snake_case change is that we are failing to generate the docs for basic-cli to host on the website.
In basic-webserver, we generate and host them in the basic-webserver repo
But presumably for outdated reasons, we generate the docs for basic-cli in the compiler repo and host them with the main website
Luke made a good point that this may be something we want because it lets us host multiple versions of the docs
For an old and a new version of basic-cli
But I think we should just generate the docs for basic-cli in its own repo and host them there, as we do for basic-webserver. Any objections?
I think we need to figure out a better solution long term, but that works for now
Yes, to confirm, basic-webserver only hosts docs for main, but basic-cli hosts docs for a few versions
I wonder if we could just commit the html files into the repo under a folder structure...
www/
- 0.16.0/
- 0.17.0/
I'm not sure if this is compatible with GH pages...
I think it is compatible
Then we don't need all the CI magic (and complications) in all the different repos
It's a bit handraulic when making a release, but brain-dead simple and easy to manage.
We'd want some script to make that process easier
Do we want to block updating Roc by changing this process, or are we okay with just stealing the basic-webserver code for now and doing that later?
Can we test the idea in something else like weaver or roc-json?
If it's reliable to host the static site versions this way, I'd lean towards removing the build stuff from roc's CI and just point at the basic-cli repo docs like webserver
https://github.com/roc-lang/basic-cli/pull/306
I use a similar workflow in Weaver
Yeah, I just tried that... even making a test release in basic-cli, but we can't without a new testing release of roc
https://github.com/smores56/weaver/blob/main/.github/workflows/generate-docs.yaml
So my idea above was just to do it manually (locally), and we can side-step all the CI madness when making/testing breaking changes like this
Here's where I tried the same thing... pushed it into the snake_case PR
Give me 5, I'll try it in roc-json -- I might need to read up on GH Pages
Okay, sounds good!
slight bump -- apparently the latest roc nightly isn't passing all tests in roc-json https://github.com/lukewilliamboswell/roc-json/actions/runs/12661819172/job/35285690202?pr=43 :sad:
We must have fixed something recently
Does this test look like the actual is correct now?
── EXPECT FAILED in package/Json.roc ───────────────────────────────────────────
This expectation failed:
657│> # Test decode of F32
658│> expect
659│> actual : DecodeResult F32
660│> actual = Str.toUtf8 "12.34e-5" |> Decode.fromBytesPartial utf8
661│> numStr = actual.result |> Result.map Num.toStr
662│>
663│> Result.withDefault numStr "" == "0.00012339999375399202"
When it failed, these variables had these values:
actual : DecodeResult F32
actual = { rest: [], result: Ok 0.0001234 }
numStr : Result Str DecodeError
numStr = Ok "0.0001234"
1 failed and 111 passed in 1252 ms.
It looks like we've lost a lot of precision maybe when converting to Str?
Or somehow storing the F32 so it encodes/decodes correctly
@Brendan Hansknecht do you have any thoughts on this?
Oh, this is just the zig update
F32 now prints with the minimal digits necessary to be the correct number
Interestingly, the F64 one is still fine
# Test decode of F64
expect
actual : DecodeResult F64
actual = Str.toUtf8 "12.34e-5" |> Decode.fromBytesPartial utf8
numStr = actual.result |> Result.map Num.toStr
Result.withDefault numStr "" == "0.0001234"
Before it was printing tons of unnecessary/incorrect precision
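To make the new behavior concrete, here's the same value in Rust, whose float formatter also uses shortest round-trip printing (illustration only, not the builtin's code):

```rust
fn main() {
    let x: f32 = 12.34e-5;

    // Shortest round-trip formatting: print the fewest digits that
    // still parse back to exactly this f32 bit pattern.
    assert_eq!(format!("{}", x), "0.0001234");

    // The stored value is *not* exactly 0.0001234; widening to f64
    // exposes the extra noise digits the old formatter was printing.
    assert!(f64::from(x) != 0.0001234_f64);

    println!("ok");
}
```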
A message was moved from this topic to #ideas > casual conversation by Luke Boswell.
Do we want to block updating Roc by changing this process, or are we okay with just stealing the basic-webserver code for now and doing that later?
Can we keep the docs for 0.18.0, 0.17.0 etc. available? They don't need to be rebuilt every time; they use a docs.tar.gz that's in the release assets, so it should not create issues with breaking changes in Roc
I pinged you but you haven't seen it yet @Anton https://github.com/roc-lang/basic-cli/pull/307
That PR includes the docs in the repo
docs/
- 0.17.0/
- 0.18.0/
Ok, I'm going to merge that, configure and deploy the Pages site, and update the README
Ok, docs live at
https://roc-lang.github.io/basic-cli/
https://roc-lang.github.io/basic-cli/0.17.0/
https://roc-lang.github.io/basic-cli/0.18.0/
I'll check if I can forward the old links with netlify
This might just be a WIP thing... @Anthony Bullard
I am updating basic-webserver and used --migrate and noticed this
# default --migrate gives, but this looked a little confusing
try(Stdout.line!, "$(datetime) $(Inspect.to_str(req.method)) $(req.uri)")
# works, but has the dreaded interrobang
Stdout.line!?("$(datetime) $(Inspect.to_str(req.method)) $(req.uri)")
# what I expected, but doesn't compile right now
Stdout.line!("$(datetime) $(Inspect.to_str(req.method)) $(req.uri)")?
I might fix it accidentally in the Task removal, we'll see once I finish updating the code
But yep, that's broken for me
My PR that's open right now fixes this
Ahk... I might just wait for that to land in main, it looks virtually done
Yeah, some guy is giving me a hard time in the review, but it'll get merged soon
Just tell me who, I'll beat him up for you
He looks like a Smore
And likes name-based puns
We need a smore or marshmallow emoji
Just put this in the custom emojis
Is that gen-llvm failure genuine?
Run cargo nextest-gen-llvm --release --no-fail-fast --locked -E "package(test_gen) - test(gen_str::str_append_scalar)"
error: failed to run `rustc` to learn about target-specific information
Caused by:
process didn't exit successfully: `/Users/username1/.cargo/bin/sccache /Users/username1/.rustup/toolchains/1.77.2-x86_64-apple-darwin/bin/rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=split-debuginfo --print=crate-name --print=cfg` (exit status: 2)
--- stderr
sccache: error: Timed out waiting for server startup
error: command `/Users/username1/.rustup/toolchains/1.77.2-x86_64-apple-darwin/bin/cargo test --no-run --message-format json-render-diagnostics --package test_gen --release --locked` exited with code 101
Error: Process completed with exit code 101.
It's passed everything else except the fuzzer... so looks good to merge
What is this on @Luke Boswell ?
https://github.com/roc-lang/roc/actions/runs/12679852096/job/35340490822?pr=7480
Your PR "Move PNC apply to separate Expr/Pattern variant"
I don't know how to parse that error?
Does that seem like something likely from my PR if all the other tests pass?
It looks odd to me also
FWIW I took a brief look at your replies on that PR @Anthony Bullard. Made sense to me. I can re-review after work today, but no concerns with going ahead and merging if you want to maintain momentum
I think we can either ignore it or restart the run... I lean towards merging now, and we can follow up if needed
I can get started on the platform migrations then
There ya go
OW!
Forgive me, I'm merging it
And restarting the refcount PR in CI so we can see that's still ready to go (and also confirm the PNC change was a non-issue)
What would you say is better for where we're currently at?
# THIS?
cwd =
Env.cwd!({})
|> Result.map_err(\CwdUnavailable -> Exit(1, "Unable to read current working directory"))
|> try
# OR THIS?
cwd =
Result.map_err(
Env.cwd!({}),
\CwdUnavailable -> Exit(1, "Unable to read current working directory")
)?
# OR EVEN??
cwd =
Env.cwd!({})
|> Result.map_err?(\CwdUnavailable -> Exit(1, "Unable to read current working directory"))
I'm leaning towards the second because it feels closest to the PNC vision despite not having static dispatch to chain it
The second is the syntax I used for the recent experiment in #bugs > Compiler panic for naming mismatch between type def and use
The first is also nice, in that the chain is clearer.
Though it's not quite the point of your example, this case is specifically handled by binop ?:
cwd =
Env.cwd!() ? |CwdUnavailable|
Exit(1, "Unable to read current working directory")
I think it's relevant here because the problematic cases are function calls with only a couple args in my eyes
I think this helps with that
Minor thing... but I don't think --migrate catches idents in import with as
e.g.
import "todos.html" as todoHtml : List U8
It doesn't for me either
Is that not a pattern?
It's probably something special, then
Just upgrading the basic-cli snake_case builtins PR to PNC now
Can you run it on basic-webserver's build.roc while you're at it?
I have tested basic-webserver locally and it works well.
main! : _ => Result {} _
main! = \args ->
parsed_args =
Cli.parse_or_display_message(cli_parser, args, Arg.to_os_raw)
|> try Result.on_err! \message -> Err (Exit 1 message)
That try statement is not PNC'ed
I haven't been removing all try's. It's just messy for longer chains.
I guess here we could... my heuristic has been -- if it's going to be a chain in SD then leave it for now
Fine by me, but we might need to make another pass when whitespace calling is removed, since I think that'll happen before static dispatch is ready
Though I may be wrong on that
I expect that's the case because static dispatch is hard to implement, and \args -> going to |args| is easy to implement
I guess we can just cut a few testing releases and keep these breaking platform PRs in sync, and do an actual release once we get |args| and ${interpolation} etc.
It's easier to update them as we go, but does look a bit strange being in this intermediate state -- so we probably don't want to do a full release with what we currently have
I don't have a strong opinion, do whatever is easy
We don't even need testing releases with the nix CI's, we just land the breaking change in roc main and we're good to go
Even without static dispatch... these examples are looking really nice!
So much talking on Zulip lately. Have to skim tons of things cause I don't have the time to keep up.
same
A good sign, even if it has its drawbacks
yeah, I'm a big fan :grinning_face_with_smiling_eyes:
26 messages were moved from this topic to #compiler development > checking doc comments by Luke Boswell.
I'm presuming that foo_! is how we'd represent a re-assignable effectful closure. Or would it be foo!_?
foo!_
I'm okay with that
Or just don't have reassignable effectful closures....who needs that
Generally I'd agree, but you never know
I don't actually feel that strongly against it. I could see building up a lazy computation with a for loop
14 messages were moved from this topic to #compiler development > roc-json compiler bug by Luke Boswell.
What should a IgnoredValue field desugar / canonicalize to? e.g.
{ _name: 123 }
I ask because I have a test that triggers this panic: https://github.com/roc-lang/roc/blob/10ea93e838d290beda1da5ca2bf3c9c1fb53e6e0/crates/compiler/can/src/expr.rs#L2109
But it looks to me like desugaring doesn't remove IgnoredValue fields at all: https://github.com/roc-lang/roc/blob/10ea93e838d290beda1da5ca2bf3c9c1fb53e6e0/crates/compiler/can/src/desugar.rs#L1272
Those are used for record builders
Ideally, an IgnoredValue field would be a runtime error outside of a record builder
So we should probably emit one at that site
That you linked
Regarding PNC... my muscle memory definitely hasn't caught up yet. I'm really glad we have the --migrate flag, because I'm finding I write a lot of code before I realise, and then it's easy to just fix it up.
I assume at some point it will be natural to add the ( and ), but for now it's taking mental effort to remember.
I'm WFH today, so should have time between things to poke at PR's etc. Let me know if you need anything.
Luke Boswell said:
I'm WFH today, so should have time between things to poke at PR's etc. Let me know if you need anything.
Got anything with Chocolate?
I'm still waiting for those Red Bulls
Not offering emotional support today, sorry @Sam Mohr
Did we accidentally or deliberately remove the todos.roc example from basic-cli? ... (I'm probably the guilty one)... I can't remember
Oh nvm, I think it's just that @Brendan Hansknecht added todos.db for the sqlite example...
All good
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.64`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.data2`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.orientation_str`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.56`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.54`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.name_str`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.64`
[crates/compiler/mono/src/reset_reuse.rs:1244:9] symbol = `temp.58`
Are the numbered symbols like temp.58 for variables that we make ourselves like during desugaring or something like that?
I think so; that, or also in Can, I assume.
I think they are also for all intermediates when we breakdown expressions, but would need to double check that.
Sometimes yes
They can get generated in mono
Basically anywhere an ident would be used but we don't have a user-defined one
We generate a new one with a numeric name since that isn't syntactically valid
If we always converted if statements to when bool is 1 -> ...; _ -> ..., there wouldn't be a perf drop, would there?
I'd expect it to always be the same perf or better
That should help us simplify our IR if we can get away with it
We already do this in mono/src/ir.rs, so I'm gonna roll with it
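For anyone skimming, the transformation in question is just this shape (then_branch/else_branch are placeholders, and in mono booleans are already integers):

```
# an `if` ...
if cond then
    then_branch
else
    else_branch

# ... becomes a two-branch `when` over the underlying integer:
when cond is
    1 -> then_branch
    _ -> else_branch
```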
should be the same perf, although we can give nicer error messages for type mismatches if we know it was an if compared to a when
Fair
and that relies on constraint gen knowing it was an if compared to a when, and constraint gen (currently) does a separate pass over the canonical IR
I'm thinking about post-typechecking, on the build/ side of things
oh that's fine then
Yeah, great
incidentally, I think we can potentially do it differently someday (where we generate constraints at the same time as making the canonical IR) but it's tricky because order of constraints matters, and we need them to be sorted by dependencies (e.g. first constrain functions whose types don't depend on other functions, etc.) - right now, doing the constraint pass after canonicalization (and sorting the canonical IR) takes care of that, but if we wanted to do (constrain + canonicalize) at the same time, then we'd need to have the ability to sort the constraints separately after the fact
since we don't know the dependency order until after canonicalization has completely finished
so I don't think now is the right time to do that since it might break things if we get it wrong :big_smile:
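To make "sorted by dependencies" concrete: it's a topological sort over the def dependency graph. A toy Rust sketch with invented names (not the actual compiler code), assuming no recursive groups:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy sketch: order defs so each one is constrained only after
// everything its type depends on. Assumes no cycles; a real compiler
// handles mutually recursive groups specially.
fn dependency_order<'a>(deps: &BTreeMap<&'a str, BTreeSet<&'a str>>) -> Vec<&'a str> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();
    while order.len() < deps.len() {
        for (name, needs) in deps {
            // ready once all of its dependencies have been emitted
            if !done.contains(name) && needs.iter().all(|d| done.contains(d)) {
                done.insert(*name);
                order.push(*name);
            }
        }
    }
    order
}

fn main() {
    // `main` uses `helper`, which uses `leaf`; `leaf` depends on nothing.
    let deps = BTreeMap::from([
        ("main", BTreeSet::from(["helper"])),
        ("helper", BTreeSet::from(["leaf"])),
        ("leaf", BTreeSet::new()),
    ]);
    assert_eq!(dependency_order(&deps), vec!["leaf", "helper", "main"]);
    println!("ok");
}
```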
It seems like the lesson of the moment is to do everything in separate passes for greater likelihood of correctness
yeah
and then in the future we can experiment with incrementally combining things to see if they make that one step faster while maintaining correctness
but yeah, correctness is the name of the game!
Sam Mohr said:
It seems like the lesson of the moment is to do everything in separate passes for greater likelihood of correctness
This is also echoed by nanopass
are there any docs/papers on how tags are inferred? it seems like they break the fundamental unification rule of not allowing a type variable to be solved to a type containing itself.
@Ayaz Hafiz
@emma the topic in literature is "polymorphic variants", roc takes a combination of approaches
Can you elaborate on the unification rule you're describing? I'm not sure I follow the problem fully
Anton said:
Sam Mohr said:
It seems like the lesson of the moment is to do everything in separate passes for greater likelihood of correctness
This is also echoed by nanopass
This makes some sense, but generally nanopass compilers are measurably very slow. I think it is better to make reasonably sized passes than nanopasses. But perf depends on a lot of things, so leaning toward the simplest first makes the most sense.
For a long time I've wanted to try to "nanopass, but use some compiler magic to merge passes and make it fast"
Seems like the auto-annotate type signatures PR is ready to merge. Anyone against doing that now? I don't think it will interfere with the release process we're working on right now
Let's hold that for 24 hours to be safe
Frank Pfenning and co with yet another banger https://blog.sigplan.org/2025/01/29/parametric-subtyping-for-structural-parametric-polymorphism/
Where do you find these papers? Conferences?
used to track but too lazy now. found it on a forum
Which forum?
I think I could replace some doomscrolling with this kind of info
image.png
ooph
We’ve been talking
i love yappers ngl
It’s almost like deciding to rewrite the compiler sparks a lot of convo
crazy thought
My plan is to hang in there as long as possible... and hopefully I'll learn the dark arts of roc's compiler internals by osmosis -- reading all the discussion and seeing the PR's roll in.
Man, I'm glad we have the rest of the team for the low-level stuff. I never had a good introduction to it and it still feels like black magic. I look forward to reading lots of PRs that might educate me!
Me too buddy, me too
3489a045-b487-43da-b4c6-48ab2b961b0b.png
lol I was caught up on all threads yesterday, this has gotta be some kind of record in here :clap::saluting_face::popcorn:
Wait.... #compiler development is a different color for different people....that bugs me.
It's red for me, green for Jan, looks to be lavender for ayaz.
It’s greyish brown for me
light purple
Same color as #beginners but slightly brighter
Ayaz Hafiz said:
emma the topic in literature is "polymorphic variants", roc takes a combination of approaches
@Ayaz Hafiz apologies for the very late reply - I'm aware of the term "polymorphic variants", but I haven't found any (simple) descriptions of how it gets implemented on top of typical HM systems - it seems to be hard to fit into the usual unification rules. Do you know if there are any good explanations of how Roc or other languages implement it?
there are a ton of papers from Leijen etc. but I don't know how interpretable they might be. Daan Leijen's "Extensible records" paper is the simplest approach I know of, and is similar to the approach Elm/Roc use. I'll give a short description of the mechanism in Roc; feel free to ask for elaboration if it's confusing or incomplete. The approach for records is to bind a type variable to a record that unifies freely; for example, the call (fun r -> r.x) {x: 1, y: 1} types the function as {x: a}b -> a (where b is free) and the record as {x: int, y: int}c (where c is free); {x: a}b ~ {x: int, y: int}c yields b = {y: int}d, c = {}d (where d is fresh).
The approach for variants is the dual. That is, an expression `A 1 = `B "foo" types the LHS and RHS as [A int]a and [B str]b respectively; the constraint [A int]a ~ [B str]b then solves a = [B str]c, b = [A int]c, and both the LHS and RHS have type [A int, B str]c. However, a function [A, B]a -> {} is materially different than [A, B] -> {} - if you have a pattern match over the input on the former, you always need a catch-all branch, whereas a pattern match on the latter only needs branches for A and B. To handle this, Roc selectively removes the remaining unbound variable (a in [A, B]a above) depending on the usage pattern of the variable, for example if it is used in a pattern match without a catch-all branch.
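Here's a toy Rust sketch of that record-unification step, with invented names (not the actual Roc solver), just to make the b = {y: int}d, c = {}d bookkeeping concrete:

```rust
use std::collections::BTreeMap;

// Toy types: a record is a set of known fields plus an open row variable.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    Var(u32),                          // an unsolved type variable
    Record(BTreeMap<String, Ty>, u32), // known fields + row variable
}

struct Solver {
    next_var: u32,
    subst: BTreeMap<u32, Ty>, // solved variables
}

impl Solver {
    fn fresh(&mut self) -> u32 {
        let v = self.next_var;
        self.next_var += 1;
        v
    }

    fn unify(&mut self, a: Ty, b: Ty) {
        match (a, b) {
            (Ty::Var(v), t) | (t, Ty::Var(v)) => {
                self.subst.insert(v, t);
            }
            (Ty::Int, Ty::Int) => {}
            (Ty::Record(f1, r1), Ty::Record(f2, r2)) => self.unify_records(f1, r1, f2, r2),
            (a, b) => panic!("type mismatch: {:?} vs {:?}", a, b),
        }
    }

    // {x: a}b ~ {x: int, y: int}c  =>  b = {y: int}d, c = {}d  (d fresh)
    fn unify_records(
        &mut self,
        f1: BTreeMap<String, Ty>,
        row1: u32,
        f2: BTreeMap<String, Ty>,
        row2: u32,
    ) {
        let mut only1 = BTreeMap::new(); // fields present only on side 1
        let mut only2 = f2;              // shrinks to fields only on side 2
        for (name, t1) in f1 {
            match only2.remove(&name) {
                Some(t2) => self.unify(t1, t2), // shared field: unify pointwise
                None => {
                    only1.insert(name, t1);
                }
            }
        }
        // one fresh tail `d` shared by both rows
        let d = self.fresh();
        self.subst.insert(row1, Ty::Record(only2, d)); // b absorbs side 2's extras
        self.subst.insert(row2, Ty::Record(only1, d)); // c absorbs side 1's extras
    }
}

fn main() {
    // vars: 0 = a, 1 = b (row), 2 = c (row); fresh vars start at 3
    let mut s = Solver { next_var: 3, subst: BTreeMap::new() };

    let lhs = BTreeMap::from([("x".to_string(), Ty::Var(0))]);
    let rhs = BTreeMap::from([("x".to_string(), Ty::Int), ("y".to_string(), Ty::Int)]);

    s.unify_records(lhs, 1, rhs, 2);

    assert_eq!(s.subst[&0], Ty::Int); // a = int
    assert_eq!(
        s.subst[&1],
        Ty::Record(BTreeMap::from([("y".to_string(), Ty::Int)]), 3) // b = {y: int}d
    );
    assert_eq!(s.subst[&2], Ty::Record(BTreeMap::new(), 3)); // c = {}d
    println!("ok");
}
```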
@emma Poly. variant implementation in Fir may be useful: https://github.com/fir-lang/fir it's basically "A Polymorphic Type System for Extensible Records and Variants" except it doesn't have "absent" constraints (yet, at least). I use them for error values and exceptions: https://osa1.net/posts/2025-01-18-fir-error-handling.html
We also had a discussion here with @Ayaz Hafiz on variants and some of the counter intuitive type checking with rows, that may also be helpful. (gotta run now so can't find the thread, sorry)
Ok yall this might just be sleep deprivation talking but what do yall think about a C++ concepts / rewriting rules feature in Roc
That is to say, we have a way to say “if this expression is well formed, then for some other expression we can derive its meaning”
I'm not understanding. Could you give an example?
So, like, basically auto functors
So, if we have a function f on x to y, we can apply f to Maybe x without specifying that it is a map
Basically, allowing code to be fully modular (a more complex structure can always be used in place of a simpler one)
So this is a combo of Ocaml's module functors and Zig's "everything is a struct", right?
Essentially, yes
(From what I know of these)
I think we've so far been successful in reducing the complexity of features in Roc to basically flavors of functions and values (values being sum or product types) with some light desugaring thrown on top. This comes with the recent push to remove proper typeclasses in the form of abilities and module params, since the simpler Roc without them can still be as productive as necessary
I think adding a feature like that would be cool and I can definitely think of places we could get use out of it
But I'm not sure how much we can do with that which we can't do today
And it would now be in my eyes the most complex feature in Roc
So that's a pretty high bar to hit, unless we can justify its addition
So I'd love to see some patterns we think this could help with in Roc apps
Hmmmm yeah a simple type system is definitely better
I didn't expect zig to be so similar to Roc. If zig did not exist and we made a low-level language starting from our Roc experience, we could have ended up very close to zig :)
:exploding_head: that's delightful to hear, now I may actually try learning it!
Anton said:
I didn't expect zig to be so similar to Roc. If zig did not exist and we made a low-level language starting from our Roc experience, we could have ended up very close to zig :)
What makes you say this? Just curious. I wouldn't consider them very similar, but I guess it depends on which aspect you're comparing.
error union, try, switch, compiled, likes to go fast, type inference, structs are similar to records, no shadowing, generics, snake_case variables, PascalCase types, likes to keep it simple
saw the switching to zig thing, very cool, very very cool. Also nice that it's happening right in the repo next to the existing code. I'll keep an eye out, been needing an excuse to write some zig. I come to you now at the turn of the tide.
good to see you Lucas :)
nice to see you too Anton. My Aiken side quest was fruitful and it got the niche adoption it needed to last haha. New gig is compiler stuff for this language called compact that compiles to zero knowledge circuits.
zero knowledge circuits
Can you define this? Or link to it?
@Brendan Hansknecht this kind of thing
https://eprint.iacr.org/2019/953.pdf
https://coingeek.com/how-plonk-works-part-1/
Interesting, I've forgotten the details of how zero knowledge proofs work but I do remember that I thought it was cool :big_smile:
I just locally added a .ignore file that ignores the crates directory. So nice to have helix only seeing the new compiler and limited files.
Yeah, I'm happy only opening the src/ directory in Neovim
I keep editing build and ci files which are outside of src
Seems like a lot of people are excited to contribute because of the accessibility of a rewrite! You love to see it
Is there even a syntax planned to allow making custom tags for custom unions usable within the same module, in the way that you can import exposing [Result.[Ok, Err]]?
Trying to figure out if/how canonicalization scope should handle that
I think the answer is no right now
@Joshua Warner do you have a plan for how the compiler post-parsing will know the "region" for some entity?
Will it be a pair of token IDs, a start and an end index into the source file, or a line and a column?
Or is that info not expected to last past parsing/maybe formatting in this compiler
I’ve gone back and forth on that
I would say probably not separate line/col numbers
The simplest of course is probably a pair of byte offsets
The possibly compelling alternative would be to use a node id for a parser node - that's a single u32 that can be mapped back to a precise source range when needed, and is smaller than the u32 pair needed for a byte range
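To make the size trade-off concrete, here's a toy Rust sketch (invented names, not the actual compiler types):

```rust
// Option A: store the byte range directly -- 8 bytes per region.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ByteRange {
    start: u32,
    end: u32,
}

// Option B: store only a parse-node id -- 4 bytes per region; the byte
// range is recovered on demand from a side table owned by the parser.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(u32);

struct ParseTree {
    ranges: Vec<ByteRange>, // indexed by NodeId
}

impl ParseTree {
    fn region(&self, id: NodeId) -> ByteRange {
        self.ranges[id.0 as usize]
    }
}

fn main() {
    let tree = ParseTree {
        ranges: vec![ByteRange { start: 10, end: 42 }],
    };
    // the node id round-trips back to its precise source range
    assert_eq!(tree.region(NodeId(0)), ByteRange { start: 10, end: 42 });
    assert_eq!(std::mem::size_of::<NodeId>(), 4);
    assert_eq!(std::mem::size_of::<ByteRange>(), 8);
    println!("ok");
}
```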
When someone runs roc check Module.roc and Module.roc is not the root of its package, or at least not the only file in the package, which set of diagnostics do we want to show them?
It seems pretty easy to do the last one, and I'm not sure if someone would want the other cases, though maybe I'd understand the second one. It also shouldn't be that hard to implement.
Also, we currently allow setting an alternative package main file using a query fragment in the package download URL, and I don't think there's a need for that anymore. Anyone opposed to me removing that?
I think it's one of the first two
one way to think about this is "if I want a particular set of outputs, is there any possible way to convince the compiler to give them to me?"
Yep
there's certainly already a way to say "give me all the errors from the package"
namely, give roc check the package's root module directly
so then the question is, if I say "I want roc check to check just this one file" - do I also want it to give me the errors from its dependencies, or not?
The decision between the first two in my mind is "if we give only the errors for a single module even if its deps are wrong, then is that correct?"
yeah
I can see arguments either way :thinking:
like "I know about those, I'm not working on them right now" vs. "I want to know if this thing completely works"
Let's say my module imports something that's malformed. We won't report two errors, so it'll look like my usage is correct even if the thing I imported is broken
Doing just the current file and nothing else should be easier to implement actually
Because we'd have to analyze the dep tree for "this module and its deps"
We can't naively use the toposorted module list because it's sorted over all modules
I'll ignore implementation for now
Yeah, not sure
I'll default to just the current module for now to avoid bikeshedding
And we can do the option of "my module and its deps" later since it's more work but not obviously the right option
The big thing for me is that I think it should fully check the single module if possible
This means it must know all of the types of the imports
I don't think it needs to recursively check all dependencies even things that aren't used.
Yes, I think we should do typechecking and all for at least the module and its deps. I'm just suggesting that we only report the errors for that module
yeah the work for sure has to be done regardless
I think it could be confusing if we don't show errors from other modules
out of curiosity, what is the theoretical / computational model that Roc's type system uses?
There's some notes here https://github.com/roc-lang/roc/blob/c72993c7519188698937bcc359d15e5b032ebe78/crates/compiler/solve/src/solve.rs#L44
My understanding is that it's very similar to HM
Also see https://github.com/roc-lang/rfcs/blob/ayaz/compile-with-lambda-sets/0102-compiling-lambda-sets.md
This is a new architecture that @Ayaz Hafiz laid out to solve our issues with compiling Lambda Sets
That RFCs repository has a lot of gold in there
My first pass at implementing the coordinate logic has just been merged in (PR), meaning that the main thrust of the work to typecheck or build a module (and its deps) is put together, outside of running the code with the interpreter/LLVM. I'm sure there are some bugs in the impl, but we'll want to get something basic working for canonicalization and import resolution to work said bugs out, so those stages are my next project.
As a side benefit, I was able to remove most of the refAllDeclsRecursive calls from our testing entrypoint.
So... argument order. I tend to write functions with "stuff to focus on" near the front of the arg list, and "needed context" near the end of the arg list. This is the Roc way to do it, and usually the Rust way as well. But in Zig, the allocator seems to usually get passed first (or second after the method target). What's the right pattern here?
Yeah, zig's default is allocator first
I would lean towards following that convention
So I would say env/allocator first (and I assume env stores the allocator so you never need both)
But ultimately it is pretty minor, it is more important we aim to be consistent than which choice we pick
But just the allocator? A lot of our functions in the Rust-based compiler use env and arena. Would it be better to do (gpa, important, other, env) or (gpa, env, important, other)?
Yeah, I was gonna vote we put the gpa in the env to make it less of a pain to use unmanaged lists, and then convert the Safe(Multi)List types to unmanaged
I personally would lean to env, gpa, important, other. It feels most conventionally correct.
But yeah, I would put gpa in the env
Then you only need to pass around local arenas (which I think will be relatively rare)
Okay, sounds good
On the above discussion, here's a PR for cleanup: https://github.com/roc-lang/roc/pull/7657
- Prefer passing the std.mem.Allocator as the first argument to functions
- Put the gpa: std.mem.Allocator in base.ModuleEnv and prefer that for passing around the Allocator
  - the ModuleEnv should be used enough to usually be in the L1 cache, meaning the double indirection of ir.env.gpa shouldn't be expensive
- Prefer ArrayListUnmanaged over ArrayList (same for all list-backed collections) since it avoids storing a pointer to the Allocator in every collection
  - Not enforced for testing collections/buffers
- Use Andrew Kelley's suggestion for catching OOM errors by capturing the Allocator.Error instead of discarding it, this should prevent us from accidentally over/under-catching errors
@Brendan Hansknecht please take a look if you get a chance
It's extremely mechanical as PRs come, but it's 32 files, so no pressure if you don't have time
@Brendan Hansknecht what's the strategy for avoiding for (items) |item| item.deinit() deeper in our data types? Making an arena just for that collection, or making a top-level arena that gets passed around, or something else?
Depends on the specific use case. One option is definitely to just use an arena for the items if they never grow and that is reasonable. Another option would be to flatten things out more. That way instead of a list of lists, you have a list of slices and then a second list of data.
I think flattening is what would make most sense for exposed_idents. Cause exposed_idents is a static slice of data essentially. So instead of nesting, you would have a larger collections.SafeList(Ident.Idx). Then you would have each ModuleImport include exposed_idents as a Slice into the list.
slice being two u32s, not a zig slice
Thinking about that latest video from Andrew, that makes sense
Yeah, it is exactly that pattern
Do we want to support two imports of the same external module in a single module? Feels like yes or no, it doesn't matter much. It feels like there's not much of a reason to want that except for only exposing some ident in a more narrow scope. At least at the top level, I don't see it being much besides a fragmentation in imports that makes reading harder. Could we maybe put warnings on subsequent imports at the top level?
That's my current plan until we decide otherwise
I'm being reminded by the parsed AST that we can do import Foo as AliasedFoo, and that might be useful to allow two of.
I'll forgo the warning for now to trim complexity and therefore implementation time
yeah I agree with that plan!
I think it makes sense to treat imports like declarations: shadowing is a warning
(and multiples is fine as long as they're in non overlapping scopes)
Sam Mohr said:
On the above discussion, here's a PR for cleanup: https://github.com/roc-lang/roc/pull/7657
- Prefer passing the std.mem.Allocator as the first argument to functions
- Put the gpa: std.mem.Allocator in base.ModuleEnv and prefer that for passing around the Allocator
  - the ModuleEnv should be used enough to usually be in the L1 cache, meaning the double indirection of ir.env.gpa shouldn't be expensive
- Prefer ArrayListUnmanaged over ArrayList (same for all list-backed collections) since it avoids storing a pointer to the Allocator in every collection
  - Not enforced for testing collections/buffers
- Use Andrew Kelley's suggestion for catching OOM errors by capturing the Allocator.Error instead of discarding it, this should prevent us from accidentally over/under-catching errors
Brendan Hansknecht please take a look if you get a chance
I wish I had built my change on top of this....
Have a bad merge/rebase now?
It's not gonna be too bad, just tedious. I'll probably just checkout my copy of all of these and fix the errors by hand
I touched most of the append calls
And removed a ton of them
Make sure to check my comment on your PR
That solution really doesn't work
Those items aren't in scratch anymore
I could get rid of the iterator completely honestly, and would do that over returning a slice over the extra data
That's what I wrote, right?
The *Iter functions just create an Iter around an ArrayList of the extra_data
And you want it to be typed ids
So you can make that happen by making the desired slice into the UnmanagedArrayList and calling @ptrCast to change the []const u32 into []const StatementIdx
That would look like this in usage:
var i: usize = @intCast(t.annos.span.start);
const end = t.annos.span.start + t.annos.span.len;
while (i < end) {
    const ann = TypeAnnoIdx{ .id = fmt.ast.store.extra_data.items[i] };
    fmt.formatTypeAnno(ann);
    if (i < end - 1) {
        fmt.pushAll(", ");
    }
    i += 1;
}
over the current
var anno_iter = fmt.ast.store.typeAnnoIter(t.annos);
var i: usize = 0;
while (anno_iter.next()) |an| {
    fmt.formatTypeAnno(an);
    if (i < (t.annos.span.len - 1)) {
        fmt.pushAll(", ");
    }
    i += 1;
}
I literally made the change in my editor for expressions and everything compiled
I'm sorry, but I'm not sure why those are desirable over
for (fmt.ast.store.typeAnnoSlice(t.annos), 0..) |an, i| {
    fmt.formatTypeAnno(an);
    if (i < (t.annos.span.len - 1)) {
        fmt.pushAll(", ");
    }
}
Sorry, I misread the code you wrote. I was literally trying to avoid slices here at all
I'm not sure why you'd want to, the Zig compiler works with slices really well
Well the first option doesn't create a struct of any sort if that's what we are optimizing for
I'm optimizing for for (items) |item| working well in Zig
And it's more terse than the other options
If you have reason to believe that is truly faster, then I will do it
I think it's more terse and at least as performant, if not maybe faster
The iterator needs to do math and casting every time between number types
A slice doesn't seem to have those problems
Ok, I just wish that kind of feedback could have come on the first PR - whose purpose was to get such feedback so I could apply something everyone was on-board with before doing it everywhere.
But luckily this is largely a mechanical change - maybe I could even make it comptime and reduce the number of unique functions I have to maintain here
I'll do the change for you if you want
I'm happy to
No worries
I'm already doing a rebase
I can get rid of two struct defs and hopefully 7-8 functions
And about 8 type aliases
It'll cut about 60 LOC
yeet
The problem I had with slices won't affect me here
I didn't think about that before and was worried about slices in general - but that's only an issue when we are appending to them (and they can resize and invalidate pointers)
makes sense
I'll have that change (and merge conflicts addressed) in the morning
Yeah, not going to try to comptime it. It really won't save much code since Idxs and DataSpans aren't generic. I'd have to write a complicated comptime block that seems too clever by half
I'll just replace the Iter functions with Slice functions and remove the Iterator and IdIterator structs and the type aliases
Ok, this is done
Do y'all already have a plan for how to do type inference w/ multiple static dispatch? (the planned replacement for abilities). I looked into how koka does it, and it leaves a lot to be desired.
:thinking: what's multiple static dispatch?
I may have misinterpreted this line
image.png
I figured it meant koka-style "there are multiple functions in scope with the same name, and we use type inference to determine which one to use". It's a way of doing ability-like or typeclass-like behavior
If we see x.foo(a, b, c), we query the type of x. We then load the module that type is from and call ModuleOf(x).foo(x, a, b, c)
x must get specialized to a concrete type for this dispatch to work.
@Brendan Hansknecht we actually plan to allow multiple custom types per module, though a single one per module is preferred
Sure, that is unrelated to my comment though.
So it's a module lookup paired with a type lookup
Type doesn't really matter (past finding the module) cause we don't have namespaces within modules. So the module + function name is a unique identifier. Of course can still get a type mismatch if you use the wrong custom type.
yep
yeah so "if we see x, we query the type of x", if implemented in the koka way, really restricts things in an annoying way. For example, in koka, if you have
fun x(y) {
y.z() + 2
}
if there are two z functions available, one returning an int and one returning a string, you would think that this should be inferrable because the return value is being used as an int. But koka chokes on this, requiring you to add an annotation or disambiguate the z with a prefix. imo this forfeits one of the key benefits of hindley-milner-style type inference, namely that the "order" of expressions doesn't matter, all information is essentially considered "at once".
I've been thinking about how to implement something like this for my own language, and I think the route I'll take is: if you see a z that is ambiguous, just give it a type variable, and let inference continue. At the end of inference, find all the type variables that were created in this manner, and then use the type information that resulted from the inference to disambiguate.
We'd infer the type of that function as
x : y -> Num a where y.z() -> Num a
x = |y| y.z() + 2
In that, y.z() is inferred as y where y.z() -> b, and then b is constrained to some number type Num a because of the addition
ok and if there's later in that function something that makes the type of y concrete, you then do the module lookup to satisfy the "where" bound?
Yes
solid
does where y.z() -> Num a mean where ModuleOf(y).z(y) -> Num a?
Basically
yeah that's a neat solution
It means "in the module where the custom type y is defined, there needs to be a function z that returns a type that unifies with Num a, and the function z needs to take a y as the first and only arg"
We discussed what you're calling multiple dispatch a little bit and it got ruled out for ambiguity reasons
custom type means "nominal type", as opposed to structural, I imagine?
Yep
this does mean you would be able to have multi-type traits, but that seems like a reasonable sacrifice
Also, you can explicitly do module(y).foo(...). This is needed for things like decode where the type is not an arg.
In this case y is a type variable.
Not a regular variable
Is anyone doing some collaborative coding, like pairing or ensemble programming while working on the compiler? I had tried to do some work on the rust compiler, but never quite found a foothold in the codebase. It seems like now might be a good time to get back into it, since things are smaller and a little more well-thought-out
There definitely has been some, though less so recently (just the ebb and flow as folks get busy). That said, I'm sure we can make more time for it.
This weekend I'll do a session or two!
Ok, cool. Thanks for the info! I'm coming back from vacation on Saturday, so I'll check them
I'm in the middle of a move interstate, so my availability for roc has dropped off a little this week. I'm definitely keen for more collaborative coding. @Trevor Settles happy to spend time with you or anyone else interested, and share what I know about things.
Now is a great time to get involved with the new zig compiler work. There are lots of parts that are basically just stubbed out with a TODO or placeholder waiting for someone to give it some love.
If there isn't a draft PR then you can probably assume it's not being looked at or worked on by someone else.
I think for anyone who wants to help -- getting involved by reviewing PR's and leaving comments, contributing ideas to discussions, and reading the code/making PR's are all really helpful.
And if you want specific feature work, we can carve out a specific chunk for you, big or small
Hello everyone, my name is Reed Harston.
I just came here from the Zig Showtime interview about the compiler rewrite.
Roc looks interesting, and I’d like to try it, but I don’t have any projects right now that seem like a good fit. But as soon as Richard said Zig people are welcome to come pitch in without even using Roc I paused the video and came right here. :big_smile:
I won’t have time right now to really dig into development, but I’d love to help out so I was thinking about jumping in and commenting on PRs and the like, and it turns out that just two messages ago Luke said that would be helpful, so I plan on starting there.
Glad to be here and I look forward to getting to know you.
Welcome to the club! We'd love to get any help on PRs or anything
Awesome @Reed Harston, any feedback on PRs is very welcome. I'm a zig noob and have been having a blast learning as I go. It's a different vibe from Rust, but so far it's been really great. I'm super excited for the builtin fuzzing features.
Sam Mohr said:
This weekend I'll do a session or two!
Anyone interested in an ad-hoc contributor meeting this weekend? Chat current status. Chat what could use support. Chat what is needed to unblock various work. Also, I'll show off tracy for profiling.
I'd love to if the timing works out!
https://www.when2meet.com/?29672150-cuT6d
Would love to join but I’ll be out of town
Ping for a few folks to see if they have interest in joining/availability to add: @Luke Boswell @Sam Mohr @Trevor Settles @Anthony Bullard @Joshua Warner
Would like to join if possible, but very hard to know ahead of time if any particular time slot works
Sorry, I'd love to but not available at all this weekend. In the middle of moving house (like the actual moving part).
lets do cc: @Anton @Richard Feldman @Sam Mohr
Should we do a Google Meet or something?
https://meet.jit.si/moderated/672e6cc127f67c355ac33a6c800ab60136fe27de5db001bed534ad2d2e3ee02d
There in a minute!
are lurkers allowed?
sure!
Sorry y'all! I've been super heads down the past couple of weeks with a major work project and family stuff
But :hi: I'll try to catch up sometime soon
No need to apologize :)
Ugh, using bash for anything mildly complex is such a disaster
Yep. That’s what Python is for. :big_smile:
Or Swift even. It does a good job at simple single file scripts.
Roc can do it too :) but I was just modifying an existing bash script that I thought was going to be a quick fix :p
Oh, yeah. Nothing ever seems to be quick in a bash script.
Ugh, using bash for anything mildly complex is such a disaster
Lol; at my last company we had a script called deploy.sh (the name of the entrypoint, but there were probably 100+ files involved). It was IIRC 25k+ lines.
The only place I’ve worked where bash was reasonable was Google. They have so much tooling around it it’s pretty nice
But usually, if you are making an array, you should probably use a real PL
100% agree
I was very impressed at how functional and usable that script was, despite all the problems
Now that I have used python a lot more, I feel less and less confident that anyone should use it. Like bash, convenient in the small. But to go to medium, you really need type checking and python type checking is slow and pretty buggy. Yes it has tons of libraries and scales farther than bash, but it kinda really sucks except in super interactive notebook style flows. And even in that case you often get stuck when you realize that production requires leaving the interactive flow and then you want fast type checking and what not.
So I guess I put python minorly ahead of bash, but a lot less far ahead than one would hope.
yeah i literally just had a discussion about how python is essentially the C(++) of scripting languages, it just had things piled on top of it until it barely works anymore
And at the same time, this is part of the reason it is so successful
Yeah, it may be the C++ of scripting languages but it is the C++ of scripting languages :grinning_face_with_smiling_eyes:
Yep
I like it. I’d never seen that before.
The funny thing is, at work the two languages we use are C++ and Python. Just those two.
Almost everything at my work is c++ or python. Though we have a ton of mojo too.
Ooh, first time I’ve met someone that has used Mojo. And professionally! What have you thought of it so far? Live up to its promises?
I know it’s designed for accelerating the AI work being done in Python, but I wonder if anybody has started using it for anything else they’re doing in Python but want to be faster.
Ooh, first time I’ve met someone that has used Mojo.
Brendan works there :)
I guess I need to catch up on the community. :sweat_smile:
Actually, I think I did know that. :upside_down:
this is the SIMD-powered lexing strategy (based on simdjson) that I want to try out someday for Roc: :grinning_face_with_smiling_eyes:
https://lobste.rs/s/2ydd6d/deus_lex_machina_releasing_new
except without the AVX-512 part :sweat_smile:
Will be very cool to see
And tinker with
honestly python is just so... python. The moment I hear that a language has "whitespace sensitive syntax", I can't deal with it, even something like Nim
except for haskell
because it has (that might be overstating it given that it never works) a way to write syntax in a way that is whitespace agnostic
wait oh no, it has RValues and LValues? C++ flashbacks
although frankly i should learn nim
I think whitespace sensitive syntax is totally fine
I think tabs verse spaces leads to some issues, but it is generally more readable
I’ve had multiple issues at work where we release a Python script to production and then a certain flow is broken because of a white space issue and our testing didn’t catch it because we didn’t hit that flow or something.
Yes, that flagged an area to improve our process, and we did, but it still is frustrating that it was possible at all for that to get out into production because to the human eye everything looked right, but only because the tabs were the right width to match the spaces on the other lines.
(There were two files that had a nearly identical section of code and when one was edited that section was copied to the other file. But one file used tabs and the other used spaces. Why? Ask someone from 5 years ago. I wish I knew. When I pasted it over everything looked right, and I happened to test the file that was right, and not the one that was wrong. Yes, user error. I haven’t done that again!)
Reed Harston said:
I’ve had multiple issues at work where we release a Python script to production and then a certain flow is broken because of a white space issue and our testing didn’t catch it because we didn’t hit that flow or something.
Yes, that flagged an area to improve our process, and we did, but it still is frustrating that it was possible at all for that to get out into production because to the human eye everything looked right, but only because the tabs were the right width to match the spaces on the other lines.
That should be easy to catch with any linter or type checker for python.
Also, not saying it is a good state, but definitely manageable.
But this is where I definitely prefer mojo (though it is young and needs other support) but it can be compiled and fully type checked and etc.
Part of our process improvement for Python includes linters run in Jenkins when we push changes.
We didn’t have anything before because all our Python is written by C++ firmware developers that write their code like it is C with classes. So you can imagine the state of our Python. :sweat_smile:
Anton said:
Ooh, first time I’ve met someone that has used Mojo.
Brendan works there :)
@Brendan Hansknecht curious what part of Mojo you work on. Mojo is my #1 upcoming language of interest. Hugely impressed with what it can do already. You guys have been doing magic (:wink:).
I don't work on mojo at all, just work with it some. I initially worked mostly on the graph compiler (compiles ai models down to kernels). Now I work at the max framework level helping build out the tools we use to write models. Compilers are all c++. Frameworks are all python. Kernels are all mojo. I interact with everything to some extent, but mostly python lately.
Looking at @Jared Ramirez's PR I can see a few comments like this... which I assume is because we have that lint to ensure all top-level decls have a comment.
Would it be possible to not require that where it's just re-exporting something from another file? I think Zig's language server / editor integration picks up the referenced comments and just appends the additional comment.
/// Type Desc
pub const Desc = types.Descriptor;
/// Type Rank
pub const Rank = types.Rank;
/// Type Mark
pub const Mark = types.Mark;
Screenshot 2025-05-13 at 18.23.47.png
I do like the way Jared has done it here.
I'll see if I can easily add an exception for re-exports
I do like the way Jared has done it here.
It's not clear to me what generalized means though
All times I’ve seen an attempt to enforce doc comments via tooling, it has devolved in exactly this way
I’m very skeptical of the cost/benefit tradeoff here
In the old compiler we had nothing to encourage comments, and so we had very few comments. In the new compiler we force comments on pub things and we have decent comments in a lot of places and some redundant ones. I feel like it's already working, how do you see this devolving?
and so we had very few comments
And a lot of cases where I felt "I have no idea what this thing is".
In my experience, things start off great, and then the percentage of low-value comments creeps up and up until they’re the vast majority of comments
Hmm...
Sounds plausible.
I still think this test/trial is well worth it given the experience of the rust compiler.
FWIW, there were very few comments in the Rust unify/subs code, and there were many places where I think they would have been helpful.
I can see how we'll likely end up with many self-describing comments like above with the pub enforcement, but I would guess the total number of good comments will be higher than without
Encouraging good comments is IMO much more about carrot than stick
What's a good carrot in this case?
Being really appreciative of someone going out of their way to document things well, for one
Gentle prodding on the code review is good, to try to encourage that. But imo it’s really important that it’s actually a person making that code review comment rather than a bot. It’s much more likely people will respect and listen to a person.
It’s much more likely people will respect and listen to a person.
That is true but it's also hard to review consistently with different people.
One alternative approach that comes to mind is a small checklist that shows up when you make a PR in github:
A gentle reminder:
- [ ] I added comments where needed
Below this one we could provide a collapsible list with all pub functions that you added but did not provide a comment for. So they have an easy overview without a hard reject mechanism.
Luke Boswell said:
Would it be possible to not require that where it's just re-exporting something from another file?
/// Type Desc
pub const Desc = types.Descriptor;
/// Type Rank
pub const Rank = types.Rank;
/// Type Mark
pub const Mark = types.Mark;
Done :) PR#7790
Oh no :p
zig_dependency_graph.png
This actually looks relatively one-directional and decent... though base is definitely a bit sketch as a super aggregator
How to make llvm compile at a reasonable speed (from one of my coworkers who works on mojo): https://youtu.be/6Ro6XTHAffY?si=iBPwdfjcA8hKqUQG
I put up a slightly improved version of the import graph on https://anton-4.github.io/roc-compiler-vis/zig_dependency_graph.png
It's updated once a day based on the roc repo :)
is this done by generating Mermaid from the dependency graph?
I use graphviz based on grepped imports: https://github.com/Anton-4/roc-compiler-vis/blob/main/.github/workflows/publish_every_day.yml
Should we upgrade our rust version?
I was trying to run typos (like in CI) and we're running an older version of rustc with our rust-toolchain.toml
$ cargo install typos-cli --version 1.32.0
Updating crates.io index
error: cannot install package `typos-cli 1.32.0`, it requires rustc 1.80 or newer, while the currently active rustc version is 1.77.2
`typos-cli 1.28.2` supports rustc 1.75
I say go for it
Should be pretty minimal
I'm happy to poke at it. I'll wait for @Anton who may have ideas. I think we're just following the upgrade guide and updating the toml file.
In CI I remove rust-toolchain.toml before I install typos, that makes it work. Upgrading rust can come with some serious clippy work but with today's AI assistance that may go smooth, you're welcome to give it a try.
You can also install typos outside nix (and outside the roc repo) locally, that may be the easiest way.
It's nice that by encouraging code to be pure it is also easier and faster to test it
I remember that my professor docked me a ton of points on the exam for my first programming (java) college course. Not because my solution was incorrect, but because I didn't put stuff in classes. Ironically, now I am working on a programming language without any classes, full circle, haha :p
He did also end up teaching me functional programming in a different course, so that's an acceptable redemption arc :)
@Jared Ramirez I noticed your explanation about the occurs check and not using Mark, I thought that looked good. You're switching back to use Mark, why is that? I'm just curious.
Yeah, I kinda went back and forth on it. I think using Marks will have slightly better performance (though it may be negligible), and I figured if it was done that way in the rust compiler, there was probably a reason.
I was also gonna see what Richard thought when he reviews it! I kept the “convert to use Marks” change isolated in a commit, which can easily be dropped if the original way sans Marks is preferred
I have an idea, which I'm not sure has precedent - what if we counted the number of transitive unifications we'd done for a given type (e.g. we visit a canonical IR node and kick off a unification of it and some other type; all the unifications resulting from that original, before we move on to the next canonical IR node, would be "transitive" here) and if that count is under some threshold, then all occurs checks are no-ops
so the idea there would be that the occurs checks are there to catch infinite types, which are a rare but very serious error case
rare bc it's a very uncommon mistake to make, and serious bc it hangs the compiler by default (unless there's an occurs check to detect that we're stuck in an infinite loop)
so the idea would be that if we're stuck in an infinite loop, our iteration count will exceed the threshold, and if we're not stuck in an infinite loop, then it very rarely will, and so it's much cheaper to skip the checks
this would have a perf cost in the case where we do actually have an infinite type, but I think that's totally fine bc it comes up so rarely in practice
(plus the perf cost would probably only be paid for that one type in the whole build, so likely still wouldn't be noticeable)
thoughts?
So like we have a counter and every time we recurse in unify, we increment it, then if we exceed the threshold we do a full occurs check?
I think we’d have to reset/decrement the count for each recursive branch (eg if count = 1 at the start of a function, we recurse to unify arg 1, we’d have to reset the count back to 1 before unifying arg 2). This is how tracking seen variables in a regular occurs check works though, so I think that should be fine
Thinking more, I wonder if we couldn’t combine unify and occurs check. We could track seen variables in unification and check on each recursive call if we’ve already seen this var
yeah, my thinking was just that we could avoid checking the variables at all in the very common case where there aren't any infinite types
since incrementing a counter and checking if it's over a threshold is so cheap it's basically free
One thing, to double check. Can we make sure that this checker allows infinite recursion through indirection? Also through non tag types. And multiple aliases
The old version had really annoying edge cases
For example, a list or a box should break infinite recursion and be valid:
Node a: { data: a, children: List(Node(a)) }
And by multiple type alias recursion, I mean things like:
Wrapper a : { data: a, next: Inner(a) }
Inner a : [ Next: Wrapper(a), None ]
yeah definitely need to allow those cases!
Yeah, just want to make sure they get thought about early in case it matters to design
I know this is a common beginner trip-up currently
@Jared Ramirez would be some good cases to add tests for! :smiley:
Interesting approach to compiler backend construction: https://arxiv.org/pdf/2505.22610
Are we planning on limiting Roc source to ASCII?
Can I have idents or fields in records that are using a wider set of characters?
For context I was looking at Richard's WIP PR
// Then sort by name (ascending)
const lhs_str = ctx.env.idents.getText(lhs.name);
const rhs_str = ctx.env.idents.getText(rhs.name);
return std.mem.order(u8, lhs_str, rhs_str) == .lt;
But I guess this wouldn't mind what encoding is used... it's just using the raw bytes.
Idents are currently ascii-only in the new compiler, and I think the intent is to keep it that way unless/until there’s a strong signal otherwise
I see the implementation in tokenize
pub fn chompIdentGeneral(self: *Cursor) bool {
var valid = true;
while (self.pos < self.buf.len) {
const c = self.buf[self.pos];
if ((c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or (c >= '0' and c <= '9') or c == '_' or c == '!') {
self.pos += 1;
} else if (c >= 0x80) {
valid = false;
self.pos += 1;
} else {
break;
}
}
return valid;
}
yeah I think it's reasonable to consider non-ASCII in the future, but even assuming we want to support it, it's enough of a project that I don't think we need it for 0.1.0
My roc-math package has constants like π and function arguments like θ, but I think I'm the only one who's used them before :sweat_smile:
I've been merging changes and generally going ham on our WIP Can PR https://github.com/roc-lang/roc/pull/7806.
It's back to compiling and tests passing now... but I've definitely missed a heap of things. I can already see things that I've missed between my "fixes" and merge conflicts.
I'm going to take my time going back through all changes and clean it up so we can merge it hopefully this weekend sometime.
Random tip: merging multiple Stdout.line
into one can drastically reduce your build times! So like this:
Stdout.line!(
"""
Testing Tcp module functions...
Note: These tests require a TCP server running on localhost:8085
You can start one with: ncat -e `which cat` -l 8085
"""
)?
Instead of
Stdout.line!("Testing Tcp module functions...")?
Stdout.line!("Note: These tests require a TCP server running on localhost:8085")?
Stdout.line!("You can start one with: ncat -e `which cat` -l 8085")?
Stdout.line!("")?
Doing this for all subsequent Stdout.line
in tests/tcp.roc reduced build time from 9315 ms to 728 ms!
Splitting lots of ?
calls over multiple functions also helps a lot
Makes sense
Way less closures for llvm to deal with
Or hmm....actually with the old tasks it makes sense.....surprising now that it makes much of a difference
Would be great to see a profile of the compiler before and after
Also
from 9315 ms to 728 ms!
Feels like a bug or file system caching or something else crazy....bug is my biggest guess.
just too high of a delta
so I'm curious
I'll try to profile later today :)
Hmm, addr2line is being used after recording the flamegraph and it's taking its time... it's also single threaded :graveyard:
I've never had that before, could be caused by our perf issue :thinking:
addr2line generally isn't fast, but it taking forever could mean essentially infinite recursion
It's still going :p I'm just going to make a simpler reproducer on Monday
I've been putting together as much as I can for this Can PR but I have some architectural questions.
The type store currently holds a reference to the ModuleEnv, however I feel like it should live in the ModuleEnv so we can make fresh variables in Can, and later solve them.
Where should the type store live?
I have gone with Can IR owning the type store. It seems to be working ok for now.
Type store in module env makes sense! Probably better than specifically can IR
But not sure it matters too much
yeah can should not be making fresh type variables in this design
rather, we should reuse can idx to be var, and initialize the types store with a capacity of at least the highest can idx
Richard Feldman said:
rather, we should reuse can idx to be var, and initialize the types store with a capacity of at least the highest can idx
this was exactly what i was expecting
I've been going ham on the Can PR @Anthony Bullard
It's probably past the point where I should stop and get some feedback. Maybe even worth merging now (it's passed all the CI checks)??
I feel like most of my effort has been on the fringes and infrastructure like SExpr stuff that I understand, and I haven't really touched core Can implementation things, aside from copying Anthony's lead I think.
I've got another day tomorrow I can probably spend hacking on this, so if anyone is able to skim through and give me any pointers that would be appreciated.
i will take a look today Luke
sorry i've been so busy
Thank you :grinning:
I started refactoring Can a little now that I'm getting my head around what is going on there. It's such a massive thing I feel like some module structure for the API will help in the long run.
I've been trying different things out, and I'm basically just rolling with what I think is nicest. I figure if there are any concerns I'm happy to change it back etc.
One change is instead of calling it IR
I started calling it CIR
short for Can IR. It just helps a lot to qualify things, particularly when we are mixing both Parser IR and Can IR in some places.
I'm really tempted to refactor and update Parser in a similar way, move Node and NodeStore out into separate files and rename to PIR
. But I can wait until I get some feedback on these Can changes.
Richard Feldman said:
rather, we should reuse can idx to be var, and initialize the types store with a capacity of at least the highest can idx
I think we might need to initialize some fresh vars as well – for Can nodes that need more than 1 var. Like a tag
needs the tag union var & an extensibility var, and we can't use the Can Idx for both. Though very possible I'm missing something
Sam Mohr said:
Type store in module env makes sense! Probably better than specifically can IR
I just realised I misread this comment when you wrote it Sam, and understood it to mean the opposite. :sweat_smile:
Thinking about this more, I agree it needs to live in the ModuleEnv as that is where things should be that outlive any particular compiler stage IR.
Hatching plans with @Jared Ramirez to move it, but needs some coordination to update the type_store a little as it currently holds a pointer to ModuleEnv.
Jared Ramirez said:
Richard Feldman said:
rather, we should reuse can idx to be var, and initialize the types store with a capacity of at least the highest can idx
I think we might need to initialize some fresh vars as well – for Can nodes that need more than 1 var. Like a
tag
needs the tag union var & an extensibility var, and we can't use the Can Idx for both. Though very possible I'm missing something
ah, so my thinking there was that we just make empty can slots
in other words, let's say we have a can node that needs 3 vars - instead of that can node getting 1 slot and then we make 2 extra vars for it, we just give it 3 can slots
it might sound like it's wasting memory, but my thinking is that the alternative is to spend the same amount of memory writing down which type vars they need
and the advantage of this is that if you zero out the empty can slots, it becomes bidirectional; not only can you go from any given Can Idx to a type var, you can also take any given type var and get back to the Can Idx where it came from
which in turn means (I think) that we could avoid propagating region info through to type checking, and still get error messages that map back to source
because you can go from the type var that had the mismatch back to the original source via the Can IR node
(that might not turn out to work in all cases though, I'm not sure yet haha)
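The slot scheme described above might look roughly like this (a Rust sketch with made-up names, not the actual CIR): a node that needs extra type vars claims extra empty slots, so node index and type var index stay the identity mapping in both directions, and zeroed-out slots let you walk backwards from any var to its owning node.

```rust
// A slot is either a real Can node or an empty placeholder that exists
// purely to back one additional type variable for the preceding node.
#[derive(PartialEq)]
enum Slot {
    Node,
    Extra,
}

struct CanIr {
    slots: Vec<Slot>,
}

impl CanIr {
    // Push a node that needs 1 + extra_vars type variables, reserving one
    // slot per variable. The returned index doubles as the node's type var.
    fn push_node(&mut self, extra_vars: usize) -> usize {
        let idx = self.slots.len();
        self.slots.push(Slot::Node);
        for _ in 0..extra_vars {
            self.slots.push(Slot::Extra);
        }
        idx
    }

    // Go from any type var back to the Can node that owns it by walking
    // back over Extra slots until we hit the owning Node slot.
    fn owner_of(&self, type_var: usize) -> usize {
        let mut i = type_var;
        while self.slots[i] == Slot::Extra {
            i -= 1;
        }
        i
    }
}
```

So a tag needing a tag-union var plus an extensibility var would claim two consecutive slots, and a type mismatch on either var can still be traced back to the tag's Can index.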
aside -- should the Can IR nodes have region info? only some do at the moment, and it was something I was looking at yesterday. I've threaded things through for the sexpr to display that for those nodes that do.
Everything all the way down the stack should have region info
Important for debug info with llvm
another aside -- what about desugared nodes? do they inherit region info
Also, re desugaring... should we have, similar to Rust, a single pass that runs before Can, or should we be doing that desugaring as we go through the Parse IR
Luke Boswell said:
another aside -- what about desugared nodes? do they inherit region info
Yes
Luke Boswell said:
Also, re desugaring... should we have, similar to Rust, a single pass that runs before Can, or should we be doing that desugaring as we go through the Parse IR
I think this was discussed a bit, and the decision was likely we should just start with a new pass just for desugaring, but at the same time, if it is trivial to do on a different IR, that is also fine.
I want to skip desugaring for now
as in, don't desugar anything - just keep it around as a first-class thing for later passes to deal with separately
How do we handle string interpolation then?
Here is my naive attempt at that in Can...
description=Simple string interpolation
type=expr
~~~SOURCE
"Hello ${name}!"
~~~PROBLEMS
NIL
~~~TOKENS
StringStart(1:1-1:2),StringPart(1:2-1:8),OpenStringInterpolation(1:8-1:10),LowerIdent(1:10-1:14),CloseStringInterpolation(1:14-1:15),StringPart(1:15-1:16),StringEnd(1:16-1:17),EndOfFile(1:17-1:17),
~~~PARSE
(string (1:1-1:17)
(string_part (1:2-1:8) "Hello ")
(ident (1:10-1:14) "" "name")
(string_part (1:15-1:16) "!"))
~~~FORMATTED
NO CHANGE
~~~CANONICALIZE
(call
(lookup (ident "Str.concat"))
(call
(lookup (ident "Str.concat"))
(str "Hello ")
(lookup (ident "name")))
(str "!"))
~~~END
Basically, de-sugared the interpolation into multiple Str.concat
calls (that I imagine exist).
we just make an "interpolated string" can IR node type
one reason for doing that is so that if there's an error message later (e.g. a type mismatch) we can do better than referring to Str.concat calls that aren't in the code
bc we'll still know it was interpolation, so we can just put that directly in the error message
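A minimal sketch of what a first-class interpolation node could look like (illustrative Rust, not the actual CIR definition): the segments keep the literal parts and interpolated expressions separate, so a later type mismatch can be reported as "in this string interpolation" instead of pointing at Str.concat calls that aren't in the source.

```rust
#[allow(dead_code)]
enum Expr {
    Str(String),
    Lookup(String),
    // each segment is either a literal part or an interpolated expression
    StrInterpolation(Vec<Segment>),
}

#[allow(dead_code)]
enum Segment {
    Literal(String),
    Interpolated(Expr),
}

// e.g. "Hello ${name}!" would canonicalize to something like:
fn example() -> Expr {
    Expr::StrInterpolation(vec![
        Segment::Literal("Hello ".to_string()),
        Segment::Interpolated(Expr::Lookup("name".to_string())),
        Segment::Literal("!".to_string()),
    ])
}
```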
In the same vein as the above ^^ should we make a node for Binop instead of desugaring to a call to Num.add
etc
Yep
Or maybe a node per binop type in the end cause tags are cheap
@Joshua Warner @Anthony Bullard -- any objections to me refactoring the Node
and NodeStore
out of the parser IR and into their own files (in the src/check/parse
directory).
I think it helps clarify the abstraction a little more, basically the Store is a lower level abstraction for good memory/cache efficiency etc, and the IR is the higher level representation of the AST with helpers for parsing.
Also, we went back and forth a little over what to call the IR, but settled on IR instead of AST (I think because it's just another IR in the overall compiler pipeline)...
Does anyone have any objections to using PIR
for Parser IR?
Maybe it's a little OCD or something, but I like fully qualifying the types and having a short and clear distinction between the parser and can types is helpful.
Err, why not just AST? That’s the much more standard name for this.
Calling everything an IR does make things sound uniform, but that also heavily dilutes the meaning of that term.
I don't mind using AST
It's literally our concrete syntax right?
No, it is actually abstract(er)! (Than the old “ast” was)
In the process of refactoring the NodeStore
out into its own file... I'd like to ziggify the types a little using nesting... so instead of PatternRecordFieldIdx
we have Pattern.RecordField.Idx
, and getPatternRecordField
becomes Pattern.RecordField.get
etc
I know what I'm proposing is a little like moving the deckchairs on the titanic, and kind of pointless ... but I'm really enjoying these features of Zig and I think it helps to establish the patterns we want to use elsewhere.
Not sure I fully understand how Pattern.RecordField.get would work?
I think I need to see that in context
Is there a particular reason we are using a struct instead of an enum for index types in the parser?
e.g. pub const BodyIdx = struct { id: u32 };
instead of pub const BodyIdx = enum(u32) { _ };
We use both in different places. From what I can tell it seems the enum approach is preferable, but I'm not sure if there was a specific reason we did this. I assume it was just the first thing we thought of at the time and it would be ok to use the enum.
i don't personally think it matters
the size of it is what matters and the ergonomics
Ok, thank you. As far as I can tell this is about the only difference.
return .{ .crash = .{
// using the struct approach
.expr = .{ .id = node.data.lhs },
.region = node.region,
} };
return .{ .crash = .{
// using the enum approach
.expr = @enumFromInt(node.data.lhs),
.region = node.region,
} };
yep
i prefer the former, but don't care that much
I've been making these changes, and have found many places where one would work better, and other places where the other works better. :shrug:
I don't feel particularly strongly about it either, but I'm a far way through making all the Idx's use the enum approach. I'll keep going and see how it looks.
@Jared Ramirez mentioned he is ready to wire up the type vars properly in Can.
I think we should merge my gigantic Can PR that I've been working against so he has a more mature structure to work with.
I've been in a move fast and break things / refactor mode so there's a lot of improvements there, but all low technical risk as I'm mostly fleshing out the new Can stage and wiring everything up so we have snapshots with diagnostic errors and region info etc (along with some very primitive Can analysis).
Brief update on progress with Can.
Improved the snapshots reporting so they should now display all the problems, and print out useful information. Haven't got region info, but for now just seeing the source is useful to understand where the issue is.
~~~SOURCE
app [main!] { pf: platform "../basic-cli/platform.roc" }
import pf.Stdout
main! = |_| Stdout.line!("Hello, world!")
~~~PROBLEMS
CANONICALIZE: ident_not_in_scope "Stdout.line!"
CANONICALIZE: can_lambda_not_implemented "|_| Stdout.line!(..."
For hello world we are almost there.. just need to figure out the final details on lambdas. I'm not sure if we should have a lambda Expr
variant in the Can IR or if there is some analysis we should be doing here to make it something else. We could have just missed that variant when translating from rust.
~~~CANONICALIZE
(can_ir
(top_level_defs
(def
"let"
(pattern (5:1-5:6)
(assign (5:1-5:6) (ident "main!")))
(expr (5:9-5:42) (runtime_error (5:9-5:42) "can_lambda_not_implemented"))
"#0")))
All the Can nodes have regions information now, and they display correctly in our SExpr snapshots.
I've implemented most of the Expr
, Pattern
, and Statment
handling in the NodeStore
now, there's a few nodes here and there that need to store things in extra_data
but I figure we will do that as we get to them. It should be much easier now to implement the analysis logic without thinking too much about the low-level store.
We've discussed it before previously I think... but should we consider having a constants file somewhere where we keep the magic hardcoded numbers that configure the compiler?
I was looking at @Jared Ramirez PR's for unify and the max_depth_before_occurs
variable made me think of it.
@Brendan Hansknecht can you remember talking about this
?
yes, I was thinking this too
I like the name limits.zig
and putting it at the top level dir so everything can access it (and it shouldn't import anything)
When debugging the rust compiler, note that you need to comment out this line to see local vars with lldb or gdb
I'm aware the implementation of Scope is not quite correct. I discussed with Richard the intended design to accommodate var
and shadowing and I intend on revisiting that next.
Heads up that I’m starting work on basic type solving, now that we have basic type variable generation in Can. Just posting for viz!
I'm experimenting with some additions to my .rules file. I thought I'd make a PR to share. I might give it a week or so and see if I feel like tweaking it.
https://github.com/roc-lang/roc/pull/7845
So far it's been good. I find the agents are much quicker at finding the relevant information for a task.
Cool stuff, can you try mentioning the Glossary.md file? It has a lot of links and Roc compiler specific terms too, I would expect it to be helpful.
The Glossary is a very useful resource, and it's getting better all the time :smiley:
Are type declarations only valid at the top-level? (I'm reasonably sure they are valid everywhere) but I figured I'd clarify. I'm going to leave the type_decl in statement positions as a TODO for now.
they're valid everywhere, yeah
type decls are valid everywhere??
i've never seen that before outside the top level
Define a type decl?
Do we mean:
1
SomeType := [InnerType]
2
SomeAlias : [InnerType]
3
x : SomeType
x = InnerType
1 and 2
I could see 2 being valid anywhere....though a bit strange.... 1 sounds pretty crazy given we anchor static dispatch to the module and it would just generally be odd to define a named type within a scope in roc.
Why do we want the everywhere? I feel like that only makes sense if we allow nested modules or other more intense scoping of some form.
it's mostly to avoid having an arbitrary restriction in the language
like yeah you can just use them wherever, no special rules
Fair enough
I can't wait to see the use of an opaque type that is defined in a function and returned.
i don't even know what that could do
it would not be very useful :joy:
seems like a footgun
I think it literally could only be used if it also was returned with functions that can use the type
wouldn't that mean that the type would have to live outside the scope of the function ?
Not sure, but it does sound complicated to reason about at a minimum
heh, these are fair points
maybe we should just disallow it for 0.1.0 and revisit later
Yeah, based on trying to implement it, my vote is to restrict to top level for now. It seems strange having type decls in Scope
Type decls inside a function could be useful for making sure you understand the type of some intermediate variable - e.g. like a type assertion
for that purpose, you'd need to use 1+3 or 2+3 (i.e. together with a value defined)
3 is fine, it's 1 and 2 that are a problem, and more 1 than 2
is there a thread with the current work plan or something like that for coordination? I'm trying to get into the Zig rewrite but don't understand the current goals and workflow. I just don't know where to start. I see src/README.md
, it implies loads of work to be done, but nothing really specific. maybe there are some shortcuts to avoid redundant work?
e.g. I see records canonicalization isn't implemented yet. I assume there's a way to adapt the rust implementation, but not sure if there's an agreement on best practices. in short, what's the rule of thumb for the rewrite?
I would say the rule of thumb is to coordinate via zulip, let people know what you are up to by making Draft PRs etc.
I've been focussing on implementing more of Can along with the s-expressions for debugging and making nice error reports.
Anthony is currently looking at the Builtins I think.
Jared has been hooking the first parts of type unification.
Richard has been laying the foundation for runtime representations so we can evaluate things using the interpreter.
I was thinking about looking at either how do we fuzz Can or start using the cache for single modules, next.
There's a mountain of unimplemented or hacked together things though in Can. Records haven't been touched. When (now match
) hasn't been touched. There's many error reports that we can bring across from the rust implementation.
I've been picking a snapshot that represents a real bit of roc code and seeing what I can do to get it working end to end, and letting that take me on a yak shaving adventure.
I wouldn't recommend doing anything in the build (or later) stages yet. I think the immediate goal is getting a basic interpreter going that we have high confidence is working correctly.
I've been noodling around with adding a secondary output file for snapshot tests - an html file with a more interactive version of the snapshot, with things like hovering over tokens or ranges in the snapshot output causing the corresponding source range in the input to be highlighted (and vice versa). My thinking is that we'd always generate these .html files next to the existing snapshots but have them git-ignored to avoid bloating the repo. We can also start to integrate some of the debugging/visualization tools for various compiler passes into this html output. Thoughts?
Love it
One of the s-expression changes I've been thinking about, is should we be displaying the node_idx on each node? We have some nodes that reference other nodes (for example a pattern for a declaration). I could imagine hovering over a reference like that, and it highlights the s-expression node being referenced.
Oooh, yeah!
For this reason I've been thinking of making all node_idx's use a common formatting like #12
(I think this is similar to what we had in the rust debug IR formats)
Initial version of that, only adding this new fanciness for the TOKENS section: https://github.com/roc-lang/roc/pull/7864
We have flags on the snapshot tool like --verbose
... we could include others like --html
or --watch
.
Maybe that loads a single web app that then has all the snapshots
The renderToHtml for Reports renders using the standardised css classes we want
The s-expressions could have a renderToHtml
helper too instead of the plain text version
Anyway, there's a lot of different directions we could go with this. Great fun.
I'm going to work on binop tokens -> CIR mapping if it's ok
sounds good! We decided to have binops represented directly in CIR rather than desugaring them to function calls
the idea is that it will simplify having error messages trace back to what was originally in the source code
nice! that's exactly what I was about to ask
(as opposed to converting them to a Call node which then has to record "was I called via a binop, and if so which one?" which has the same information but with more redundancy and memory usage)
what's the key difference between ast and cir then?
one critical difference is that lookups have all been resolved to other CIR indices instead of using strings
another important one is that we've organized everything so that type variables map 1-to-1 with CIR indices, whereas they certainly do not map 1-to-1 with AST nodes :smile:
oh yeah we also have translated =
into either declare or reassign, depending on whether there was a previous declaration under the same name made with var
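A toy sketch of that `=` translation (illustrative Rust with made-up names; the real Can would also report shadowing and other errors rather than silently re-declaring):

```rust
use std::collections::HashMap;

#[allow(dead_code)]
enum Stmt {
    Declare(String),
    Reassign(String),
}

// `scope` maps a name to whether it was declared with `var`. An `=` becomes
// a reassignment only when the name was previously declared with `var`;
// otherwise it is a fresh declaration.
fn canonicalize_assign(
    scope: &mut HashMap<String, bool /* is_var */>,
    name: &str,
    is_var: bool,
) -> Stmt {
    match scope.get(name) {
        Some(true) => Stmt::Reassign(name.to_string()),
        _ => {
            scope.insert(name.to_string(), is_var);
            Stmt::Declare(name.to_string())
        }
    }
}
```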
Is this valid in 0.1 syntax?
checkNumber = |num| {
if num < 0 then
"negative"
else if num == 0 then
"zero"
else if num > 100 then
"large"
else
"positive"
}
Figured it out ...
module [checkNumber]
checkNumber = |num| {
if num < 0 {
"negative"
} else if num == 0 {
"zero"
} else if num > 100 {
"large"
} else {
"positive"
}
}
i think we need to reevaluate this syntax a bit
not when using blocks, but when using expressions after any condition
I think it should be fine to omit the brackets there
like this should work
if num < 0
"negative"
else if num == 0
"zero"
else if num > 100
"large"
else
"positive"
there's a separate question of whether the formatter should insert brackets
but I think it simplifies the rules of the language to not require brackets there
and I think it would actually be nice to be able to do like
x = if things.is_empty() 0 else 1
Yeah, I'd like to avoid brackets when possible
I think in a lot of cases the squirly-less syntax is unreadable, especially without newline / indent
Onelining is not a must for me, they do easily become hard to read
I find myself wanting to put parens around the condition if I don't use squirlies
And it's on oneline
same here, except if there's already a closing paren from a function call
e.g.
x = if things.is_empty() 0 else 1
vs.
x = if (condition) 0 else 1
vs.
x = if condition 0 else 1
first 2 look fine to me, and I don't think this is necessary:
x = if (things.is_empty()) 0 else 1
as an aside, something I've wondered about is whether we should use curly braces like we use trailing commas - as an indicator of whether the formatter should use newlines
e.g.
fn = |arg| arg + 1
fn = |arg| {
arg + 1
}
if there are curlies we always do multiline
x = if things.is_empty() 0 else 1
x = if things.is_empty() {
0
} else {
1
}
Seems reasonable
I think blocks do always use multiline
in the formatter in the zig compiler
cool, TIL!
I _think_, it's been a month or more since I worked on it
let me check that out
Oh, if the block is retained it's multiline, but it's usually elided in the formatter if not necessary
I think that would be easy to change. If I see a useless block that is single-line, keep it (it will naturally be multiline)
yeah I like that idea!
it would be really cool if we could get to a point where the author has control over single-line vs multiline, without needing the formatter to be aware of newlines
using a combination of blocks vs no blocks and trailing comma vs no trailing comma, etc.
I think we are already partly there
:thinking: what's remaining?
assuming we adopted those rules for trailing commas and blocks
just removing all checking for newlines
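The "syntax drives layout" idea above can be sketched like this (hypothetical Rust, not the real formatter's API): the formatter never inspects newlines, it only looks at whether the author wrote a block, and braces force multiline output.

```rust
// A toy `if` expression: the author either wrote braces or didn't.
struct If {
    cond: String,
    then_branch: String,
    else_branch: String,
    has_braces: bool,
}

// Braces always format multiline; no braces always formats on one line.
fn format_if(e: &If) -> String {
    if e.has_braces {
        format!(
            "if {} {{\n    {}\n}} else {{\n    {}\n}}",
            e.cond, e.then_branch, e.else_branch
        )
    } else {
        format!("if {} {} else {}", e.cond, e.then_branch, e.else_branch)
    }
}
```

The same rule could apply to trailing commas for call arguments and collections: presence of the trailing comma opts the sequence into one-element-per-line output.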
I'm having a crack at Can for Records now.
I'm working on de-structuring record patterns now. I've merged main in and just adding to my Records PR. Also addressing the review feedback. Thank you @Jared Ramirez and @Anthony Bullard
What do you think about format @1.5-2.3
for regions in sexprs?
(e-tuple @1.1-17.2
(e-binop @2.5-2.11 (op "+")
(e-int @2.5-2.6 (raw "4"))
(e-int @2.9-2.10 (raw "2")))
Right now:
(e-tuple @1-1-17-2
(e-binop @2-5-2-11 (op "+")
(e-int @2-5-2-6 (raw "4"))
(e-int @2-9-2-10 (raw "2")))
Does anyone have something I can help with? Open for coding any area that is zig! Last time I made the txt -> markdown conversion for our snapshots. I'm down to do something one step harder, or similar. Will be back later to read zulip.
Random side-note: git does terrrrrrribly with rebasing (or merging, I think) changes when one of the sides does a case-only rename (e.g. sexpr.zig->SExpr.zig), and you're operating on a case insensitive filesystem (e.g. mac).
Like... it seems to not be able to do it on its own
error: Your local changes to the following files would be overwritten by merge:
src/base/SExpr.zig
Please commit your changes or stash them before you merge.
Aborting
hint: Could not execute the todo command
hint:
hint: squash bb9df942054bfb94f2a3d154e6a549ea510edfec Switch to two-column layout in snapshot html and jazz up parse output
hint:
hint: It has been rescheduled; To edit the command before continuing, please
hint: edit the todo list first:
hint:
hint: git rebase --edit-todo
hint: git rebase --continue
(note: I made no 'local changes' here - this was git rebase stepping on its own toes)
Sigh. Going to have to stitch this back together manually...
... which wasn't as bad as it could have been
But, my goodness, I wouldn't have expected git to be this terrible
(thanks for listening)
https://github.com/roc-lang/roc/pull/7889
Screen Recording 2025-06-27 at 21.53.50.gif
Now doing highlighting of tokens when mousing over the parse tree
Already caught a few cases that look kinda fishy in terms of the token range we choose for various ast nodes
Also added click-to-scroll to the token view
And made things two column so that the click-to-scroll actually makes sense
Side note: @Kiryl Dziamura I think I'm starting to agree with your suggestion to not have the js and css embedded as zig strings; I'll fix that soon.
Joshua Warner said:
Already caught a few cases that look kinda fishy in terms of the token range we choose for various ast nodes
Yeah, I've picked up on that in a few places when going through a snapshot and manually verifying the regions... but haven't circled back to investigate yet.
Norbert Hajagos said:
Does anyone have something I can help with? Open for coding any area that is zig! Last time I made the txt -> markdown conversion for our snapshots. I'm down to do something one step harder, or similar. Will be back later to read zulip.
If no one else has an answer, can also fix fuzzer bugs, but I bet someone has solid work that could use a hand
Okay, I'll go with fixing fuzz errors. Will start with
zig build repro-tokenize -- -b XzD/ -v
which is this input:
_0�
I'm thinking we rename When
to Match
in Can to avoid confusion in future... if we agree I'll save it for a later PR that just does that one change.
10 messages were moved from this topic to #compiler development > single quote parsing by Luke Boswell.
Looking into fancy-ing up the html snapshot generation for canonicalization now, and one of the slight snags I've run into is that can regions are defined in terms of byte offsets rather than token indices - was that an intentional choice? Thoughts on just using token indices?
Tokens are a bit easier to work out highlighting for in the markup (versus arbitrary byte ranges)
I don't think it matters much for the internal representation. We can still convert that back to line/col pairs right?
Yep
Is it still easy to slice the original source to get what we need?
You need to keep around the list of token offsets (or be okay with recomputing it if needed)
That is a slight disadvantage
We're keeping around a list of line_start offsets already
I think Richard has ideas to pull Region information out of the IR's entirely
I was talking with him the other day about a hypothesis that the AST and CIR nodes can map 1-1 and therefore we could have region information in a side array and just use the AST/CIR node index to get the region information back out.
That feels like a relevant design consideration.
Also we need the line/col information for s-expressions and reporting only -- so it would be ok to calculate those as needed. I'm assuming we wouldn't need to re-read the file, but using information like line_starts etc.
Removing region from IR's would have the advantage of not polluting our IR and keeping the nodes much smaller, hopefully reducing cache misses etc.
FWIW I'd actually like to get line/col out of sexpr's
Maybe I should just bite the bullet (byte the bullet?) and implement arbitrary byte range selection in the source view
Not super clear how tho
Joshua Warner said:
FWIW I'd actually like to get line/col out of sexpr's
Why? I find it really easy to manually validate using the line/col information my editor shows.
Gotta start using the fancy html version :stuck_out_tongue_wink:
Anyway, snark aside, I want to have the conversion to line:col happen in one place (probably at the point we're rendering the sexpr), rather than at 100 different callsites in the ast/can/etc.
I think it is already like that -- only calculated when generating the s-expressions
So the in-memory sexpr will always be either token indices or byte offsets
It's calculated when _generating_ the sexpr. I want to move it to being calculated when _rendering_ the sexpr
The image file of the new compiler dependency graph is now too large to be hosted on github :(
I'll see if webp fits
I know mermaid charts work well on gh, maybe it will fit well
https://github.com/mermaid-js/mermaid
scroll the readme down to see an example of md embedded diagrams
webp worked :)
where I can see the graph btw?
https://github.com/Anton-4/roc-compiler-vis
Is this valid syntax -- can we use as
in any pattern?
match shape {
Rectangle({ width, height }) as rect => ..
}
We have f64_literal
in CIR.Pattern... what is this used for, we aren't matching on an exact float literal are we? we don't have eq
for floats.
Luke Boswell said:
Is this valid syntax -- can we use
as
in any pattern?match shape { Rectangle({ width, height }) as rect => .. }
ideally yes
Are we keeping the default/optional value thing? { name, age ? 0 } => ...
-- I understand this is being removed as it was confusing
nope that's gone
static dispatch is replacing it
I noticed build.zig.zon
requires zig 0.14
, but nix installs 0.13
. 0.14
seems to be required only for the fuzzer
looks like nixpkgs.url
is outdated
Nix setup is still for the rust compiler
@Anton -- could we look at making the default nix for our zig compiler, and use a flag or something to get the devshell for the rust compiler?
Do we need nix? Isn't our only explicit dependency of that sort zig?
fwiw i do not use the devshell and just do nix shell 'github:mitchellh/zig-overlay#"0.14.0"'
That might be nice to add to our BUILDING_FROM_SOURCE.md
or CONTRIBUTING.md
docs.
Also, I have no real issue with a nix shell, just for now:
Aside, now that we have check working would be good to instrument it all with Tracy and look at some more detailed traces. See how perf looks and such.
Brendan Hansknecht said:
Aside, now that we have check working would be good to instrument it all with Tracy and look at some more detailed traces. See how perf looks and such.
Would love to see that
General question, what should the limit be for numbers of errors printed by something like roc check?
It's not something we've talked about... Maybe in the hundreds? Is it different in the TTY or LSP use case?
My million line of code file, which is just repeated code (and thus has tons of repeated declaration errors), essentially just prints forever.
That's less than ideal.
Made me realize that maybe it isn't the right UX. Also, due to the extract-lines perf issue, it hangs for a while before it finally starts printing.
Is aggregation an option? How many distinct diagnostic messages would we have? It might be a message and a list of file:line:column, grouped by the same kind of error
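A minimal sketch of that aggregation idea (hypothetical types and names, not the actual compiler's diagnostics API): group diagnostics by kind, print one header per kind plus a capped list of file:line:column locations.

```rust
use std::collections::BTreeMap;

// Hypothetical diagnostic record; the real compiler's representation differs.
struct Diagnostic {
    kind: &'static str,
    file: &'static str,
    line: u32,
    col: u32,
}

// Render diagnostics grouped by kind, capping the locations shown per group.
fn render(diags: &[Diagnostic], max_per_kind: usize) -> String {
    let mut groups: BTreeMap<&str, Vec<String>> = BTreeMap::new();
    for d in diags {
        groups
            .entry(d.kind)
            .or_default()
            .push(format!("{}:{}:{}", d.file, d.line, d.col));
    }
    let mut out = String::new();
    for (kind, locs) in &groups {
        out.push_str(&format!("{} ({} occurrences)\n", kind, locs.len()));
        for loc in locs.iter().take(max_per_kind) {
            out.push_str(&format!("  {}\n", loc));
        }
        if locs.len() > max_per_kind {
            out.push_str(&format!("  ...and {} more\n", locs.len() - max_per_kind));
        }
    }
    out
}
```

With a cap like this, even a file that produces hundreds of thousands of repeated errors would print one short summary per error kind.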
Brendan Hansknecht said:
Also, I have no real issue with a nix shell, just for now:
- I want to make sure rust still works and is easy to use
- I like the friction we have for dependencies. Nix lowers that friction some. If possible, I want us to remain zig only for core compiler flows.
For the future it's great to be able to jump between commits and immediately get the correct zig version, so a nix flake would be nice. To keep friction for adding dependencies, we can set it up with github code owners so that changes to flake.nix require approval from a specific person. What do you think @Brendan Hansknecht?
Kiryl Dziamura said:
Is aggregation an option? How many distinct diagnostic messages would we have? It might be a message and a list of file:line:column, grouped by the same kind of error
Quite possibly. Though in this case, it generated 3.6 million lines of output. Even aggregated I think we want to cut that off.
...and 3.6M more errors. Use --flag to output to a file?
Anton said:
For the future it's great to be able to jump between commits and immediately get the correct zig version, so a nix flake would be nice. To keep friction for adding dependencies, we can set it up with github code owners so that changes to flake.nix require approval from a specific person. What do you think Brendan Hansknecht?
That's fair for auto updating the zig version. And I think as long as we have the right culture it shouldn't matter too much, but I think it could be reasonable to set code owners for build.zig and flake.nix
For my million line of code file. We spend:
0.4s -> tokenize + parse
9.8s -> canonicalize
56s -> generating diagnostic from can (this is after my fix that made this part way way faster)
0.6s -> checkDefs
1.5s -> generating diagnostics from check defs.
haha.... generating diagnostics is definitely really slow right now. Though I guess that isn't much of a problem if we limit the number of diagnostics.
We generate ~395,000 diagnostics
I'm sure we can make diagnostics faster haha
and canonicalize too
obviously canonicalization should not be taking 20x as long as type checking, since it has astronomically less work to do :joy:
but this is really cool to know! any chance we could automate this so we can see the number change over time as we make changes? :smiley:
I can quickly set up something basic with the raspberry pi 4 CI server, to minimize power consumption long term. Can you share the file somewhere @Brendan Hansknecht?
Yeah, I can later. At a 4th of July parade now
Also, we may want to switch it out for something more representative. I don't think many users will run files that generate nearly 400k diagnostics
Agreed :)
Brendan Hansknecht said:
Yeah, I can later. At a 4th of July parade now
Enjoy :fireworks:
Brendan Hansknecht said:
Also, we may want to switch it out for something more representative. I don't think many users will run files that generate nearly 400k diagnostics
still, it's a fun one to have in the mix for exactly that reason! if everything is broken, we don't want the compiler to lock up your machine :joy:
For sure. For example, streaming diagnostics here would at least show the user that something is happening instead of simply having a crazy long delay.
Oh, and we use way too much memory collecting all the diagnostics. I thought this used about a gig making diagnostic ASTs (which are old-school ASTs that allocate per node), but it's more like 400MB for diagnostics.
So I have two giant files that I have used. One based on a now quite old copy of our syntax grab bag and one based on List.roc.
New-List.roc
new.roc
My numbers above were from new.roc, which is from the syntax grab bag
Also, I guess we should add some form of --time flag that counts time in each stage of the compiler and reports on it.
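A sketch of what that per-stage timing could look like (hypothetical helper, stand-in stage names): wrap each stage in a timer and collect (stage name, duration) pairs to report at the end.

```rust
use std::time::{Duration, Instant};

// Run one compiler stage through a timer, recording its name and elapsed time.
fn timed<T>(
    name: &'static str,
    report: &mut Vec<(&'static str, Duration)>,
    stage: impl FnOnce() -> T,
) -> T {
    let start = Instant::now();
    let result = stage();
    report.push((name, start.elapsed()));
    result
}

fn main() {
    let mut report = Vec::new();
    // stand-ins for real stages like tokenize+parse, canonicalize, checkDefs
    let tokens = timed("tokenize + parse", &mut report, || vec![1, 2, 3]);
    let _defs = timed("canonicalize", &mut report, || tokens.len());
    for (name, elapsed) in &report {
        println!("{:>20}: {:?}", name, elapsed);
    }
}
```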
For the list file (which if I recall is 1 million lines of code, but 1.6 million lines including comments), the breakdown is:
0.9s -> tokenize + parse
75s -> canonicalize
60s -> generating diagnostic from can (this is after my fix that made this part way way faster)
2.7s -> checkDefs
0.01s -> generating diagnostics from check defs.
900MB used to parse
400MB added by can
250MB added by diagnostics
Of note, input source is 53MB. So roughly a 25:1 ratio from memory usage after can to original source memory size.
added by can includes types I guess?
I assume it is everything in CIR?
Oh boy... we added in usingnamespace at some point
That breaks incremental compilation
Breaking changes in zig writers
https://github.com/ziglang/zig/pull/24329
0.15.0 will be fun to migrate to
Hmm....zls doesn't seem to work for me on macos anymore. I don't know what went wrong, but it always hits:
/Users/bren077s/vendor/zig-0.14.0/lib/std/posix.zig:4533:22: 0x102497013 in kevent (build)
.BADF => unreachable, // Always a race condition.
This theoretically comes from the --watch flag, but I am not setting the --watch flag in zls.
Does zls work for others on macos?
It's working for me, but I built from source a long time ago and haven't updated it since. I'm not sure if the 0.14.0 branch in zls repo gets updates
Yeah, I haven't built it from source in a long time and I thought it was working last time I was working on roc, but I guess not anymore.
What is your zls config?
Also, I guess to clarify. I think the main issue is with diagnostics. I'm not getting build errors in roc currently.
I can still go to definition and whatnot.
I just noticed this and thought I'd double check
**UNEXPECTED TOKEN IN EXPRESSION**
The token **crash "** is not expected in an expression.
crash is supported as both a statement and as an expression, right?
ideally should be! :+1:
Last updated: Jul 06 2025 at 12:14 UTC