The stream "show and tell" does not really fit. This topic is more "show and ask for help" :wink:
I would really like to call Roc from Go, but I have no experience with C, the C ABI, cgo, or anything related. I finally succeeded in calling the "platform-switching" example with a very simple Go platform.
https://github.com/roc-lang/roc/compare/main...ostcar:go-platform
There is some unpleasantness: when I call `roc build`, I get this error:
```
🔨 Rebuilding platform...
An internal compiler expectation was broken.
This is definitely a compiler bug.
Please file an issue here: https://github.com/roc-lang/roc/issues/new/choose
thread '<unnamed>' panicked at 'failed to open file "go-platform/dynhost": No such file or directory (os error 2)', crates/linker/src/lib.rs:590:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Failed to (re)build platform.: Any { .. }', crates/compiler/build/src/program.rs:976:46
```
This can be solved by calling `roc build --no-link`, which creates a `main.o` file. You have to move it inside the `go-platform` folder; I was not able to build the Go code with the `main.go` file from another location. But after this, it is possible to call `go run main.go` to run the go-platform, or `go build` to create an executable.
Would it be possible to build the go-platform with `roc build`?
I did not write any "`roc_alloc` and friends" code. I think no allocations are needed for this simple example, since it only contains a constant string. But it still feels like a big step (with many even bigger steps to come).
Should I create a PR for the platform-switching-example? It is not much, but maybe a starting point for something more.
Oskar Hahn said:
> Would it be possible to build the go-platform with `roc build`?
The plan is to rip all that special code out anyway. We want platforms to control their own build and to tell roc what they need it to generate.
Roc may deal with the final linking (eventually only surgically), but we don't want it to know about and call every toolchain under the sun.
So you could add code for go into roc, but it would be short lived.
On a more general note, Roc and Go may not really be a good match. Go really dislikes interacting with C; it is horribly slow.
> Should I create a PR for the platform-switching example?
The examples repo is a better fit; we've been wanting to move most "other language" examples there.
I created a PR for the example repo: https://github.com/roc-lang/examples/pull/152
I have no experience with cgo, so I can't tell how slow "slow" is. But this article comes to the conclusion that the overhead of a cgo call is similar to two mutex operations. I think this is okay for many use cases.
> My similar benchmarks are 17x faster than what Cockroach Labs saw in 2015
haha. Glad it got better
Imagine if instead it was just as slow as 35 mutex operations :face_palm:
Probably would matter for platform design. Need to make sure roc is doing enough work that it is worth calling back and forth.
For a task heavy workflow, that could be super expensive.
It would also be expensive when there are a lot of calls to `roc_alloc`, `roc_realloc`, and `roc_dealloc`. But it would still be faster than most IO calls.
true....yeah. not great with roc's base design
It would be nice if Roc would allocate bigger chunks at once. I guess that would also be good for other platforms.
That is a decision we leave to the platform. It can group and chunk allocations as it wants, for example using an arena; Roc is just a consumer of what the platform picks.
If Roc did its own thing, it would likely defeat some of the optimizations that the platform is doing.
I took the false interpreter example. It has a long enough runtime, with multiple allocations and tasks, that I thought it would be reasonable for measuring the cost of roughly 40ns of delay on each effect and allocation function.
I can't use an actual sleep function, because it is too slow for a 40ns delay.
This busy loop seems to take roughly 40ns on my machine and not get optimized away (generally it errs on the faster side in my testing):
```rust
static mut I: i64 = 0;

#[inline(never)]
fn cgo_cost() {
    unsafe {
        I = 0;
        while I < 40 {
            I = std::hint::black_box(I) + 1;
        }
    }
}
```
I used the nqueens example because it takes about a second to run.
With the added delay, it takes 12% longer to execute.
So, a hefty but definitely manageable perf cost. Also, other applications with better allocation patterns will likely have less of a perf loss.
This is an interesting comparison. But to get better numbers, I converted the false interpreter to use a Go platform. This was a fun exercise: https://github.com/ostcar/roc-examples/tree/go-false/examples/false-interpreter-go
To run it, I called:

```
roc build --no-link False.roc
go build platform/main.go
time (echo "9\n" | ./main examples/queens.false)
```
For the Go platform, it returns:

```
real 0m26,863s
user 0m27,679s
sys 0m1,165s
```

For the original Rust platform, it returns:

```
real 0m26.401s
user 0m26.252s
sys 0m0.004s
```
So the real time is about the same (I ran it multiple times; sometimes Go was faster, sometimes Rust was faster). There is a relevant difference in the sys time, but this seems to be insignificant on multi-core CPUs when there is an idle core.
Oh wow. Awesome :+1:
Out of curiosity, can you run something like this to get a more accurate time comparison:

```
hyperfine -w 5 -r 20 -L v rust,go "/tmp/false-{v} examples/cli/false-interpreter/examples/queens.false <<< 9"
```
The two executables would be saved as `/tmp/false-rust` and `/tmp/false-go`, and it would be run from the root of the roc repo.
> roc build --no-link False.roc

This misses `--optimize`.
On my M1 machine, the Go version was crashing (unsurprising; I think false hits some Roc bugs currently, and the stricter memory protection can notice that).
For my x86 Linux machine, these are the timings that I see with hyperfine and `--optimize`:
```
Summary
  '/tmp/false-rust examples/cli/false-interpreter/examples/queens.false <<< 9' ran
    1.10 ± 0.25 times faster than '/tmp/false-go examples/cli/false-interpreter/examples/queens.false <<< 9'
```
10% perf loss ± 25%. The Go version has a crazy high standard deviation:
rust stdev ± 0.038 s
go stdev ± 0.729 s
So, I wanted a bit cleaner testing. I removed reading from stdin: it is for a single character, noisy, and requires a shell. Instead I just hardcoded `getChar` to return 9 in both Rust and Go.
I then noticed that Go was missing buffered file reading, so I hacked that in:
diff
I also closed everything else on that PC and set the CPU to performance mode to make sure underclocking wasn't happening.
With much less noise, here are the full hyperfine results:
hyperfine results
For fun, I also ran it with a Zig performance tool by Andrew Kelley that adds more info:
poop results
This second tool was run for longer, so I will use its results. Go is 26% ± 4% slower than Rust. This does not show how much of the time is used by cgo, though. Luckily, we have `perf` for that.
go-flamegraph.svg
rust-flamegraph.svg
Looking at the Go flamegraph, it looks like 10.5% of the time is spent in `runtime.cgocallback.abi0`, plus another 2.5% for the cgo malloc calls, for 13% total. Free didn't measure any overhead.
Of that time, 1.5% is spent in the actual malloc implementation.
This would mean a total overhead of 11.5% for using cgo with this program. The other 14.5% of the perf loss looks to be coming from Go runtime stuff and general setup.
this is such a sweet analysis, love it! :heart_eyes:
Wow. Very interesting. I think for a simple webserver it is fast enough.
Oh yeah, for sure. As a note, false is probably a worst case (or at least it used to be; not sure how bad it is now). It allocates like crazy.
So that will be tons of calls and overhead. Most webservers and the like will be IO bound. Also, they will hopefully allocate much, much less and run a limited number of tasks.
So this isn't a "don't use cgo". It was mostly me being curious, because I used to work with Go on Chrome OS and had always heard that it was super slow, so I wanted to test it.
Thanks for proving out go platforms!
Last updated: Jul 06 2025 at 12:14 UTC