Go Platform · show and tell · Zulip Chat Archive

The stream "show and tell" does not really fit. This topic is more "show and ask for help" :wink:

I would really like to call roc from go. But I have no experience with C, the C ABI, cgo or anything related. Finally, I succeeded in calling the "platform-switching" example with a very simple go-platform.

🔨 Rebuilding platform...
An internal compiler expectation was broken.
This is definitely a compiler bug.
Please file an issue here: https://github.com/roc-lang/roc/issues/new/choose
thread '<unnamed>' panicked at 'failed to open file "go-platform/dynhost": No such file or directory (os error 2)', crates/linker/src/lib.rs:590:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Failed to (re)build platform.: Any { .. }', crates/compiler/build/src/program.rs:976:46

This can be solved by calling roc build --no-link. This creates a main.o file. You have to move it inside the go-platform folder. I was not able to build go with the main.go file from another location.

But after this, it is possible to call go run main.go to run the go-platform, or go build to create an executable.

I did not write any "roc_alloc and friends"-code. I think no allocations are needed for this simple example, since it only contains a constant string. But it still feels like a big step (with many even bigger steps to come).

Should I create a PR for the platform-switching-example? It is not much, but maybe a starting point for something more.

Brendan Hansknecht (Jan 07 2024 at 15:39):

The plan is to rip all that special code out anyway. We want platforms to control their own build and to tell roc what they need it to generate.

Brendan Hansknecht (Jan 07 2024 at 15:40):

Roc may deal with the final linking (eventually only surgically), but we don't want it to know about and call every toolchain under the sun.

Brendan Hansknecht (Jan 07 2024 at 15:41):

Brendan Hansknecht (Jan 07 2024 at 15:42):

On a more general note, roc and go may not really be a good match. Go really dislikes interacting with C. It is horribly slow.

Anton (Jan 08 2024 at 09:55):

The examples repo is a better fit, we've been wanting to move most "other language" examples there.

Oskar Hahn (Jan 13 2024 at 18:44):

I have no experience with cgo, so I cannot tell how slow "slow" is. But this article comes to the conclusion that the overhead of cgo is similar to two mutex operations. I think this is ok for many use cases.

Brendan Hansknecht (Jan 13 2024 at 19:04):

Brendan Hansknecht (Jan 13 2024 at 19:05):

Brendan Hansknecht (Jan 13 2024 at 19:06):

Probably would matter for platform design. Need to make sure roc is doing enough work that it is worth calling back and forth.

Oskar Hahn (Jan 13 2024 at 19:08):

It would also be expensive when there are a lot of calls to roc_alloc, roc_realloc and roc_dealloc. But it would still be faster than most IO calls.

Brendan Hansknecht (Jan 13 2024 at 19:10):

Oskar Hahn (Jan 13 2024 at 19:11):

It would be nice if roc allocated bigger chunks at once. I guess that would also be good for other platforms.

Brendan Hansknecht (Jan 13 2024 at 19:12):

That is a decision we leave to the platform. They can group and chunk allocations as they want, for example using an arena. Roc is just a consumer of what the platform picks.

Brendan Hansknecht (Jan 13 2024 at 19:14):

If roc did its own thing, it would likely defeat some of the optimizations that the platform is doing.
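To make the arena idea concrete, here is a minimal sketch of a bump allocator in plain Go. It is not code from any roc platform: the `Arena` type and its methods are hypothetical names, it does no cgo, and alignment is computed relative to the start of the buffer rather than the absolute address. It only shows how a platform could serve many small requests out of one big chunk and free them all at once.

```go
package main

import "fmt"

// Arena is a hypothetical bump allocator: it reserves one large chunk up
// front and hands out pieces of it, so many small requests cost a single
// real allocation.
type Arena struct {
	buf    []byte
	offset int
}

// NewArena reserves size bytes in one allocation.
func NewArena(size int) *Arena {
	return &Arena{buf: make([]byte, size)}
}

// Alloc returns n bytes whose offset in the chunk is a multiple of align
// (assumed to be a power of two), or nil if the arena is exhausted.
func (a *Arena) Alloc(n, align int) []byte {
	start := (a.offset + align - 1) &^ (align - 1) // round up to alignment
	if n < 0 || start+n > len(a.buf) {
		return nil
	}
	a.offset = start + n
	// Cap the slice so a caller cannot grow into a neighbour's bytes.
	return a.buf[start : start+n : start+n]
}

// Reset releases every allocation at once by rewinding the offset.
func (a *Arena) Reset() { a.offset = 0 }

// Used reports how many bytes are consumed, padding included.
func (a *Arena) Used() int { return a.offset }

func main() {
	arena := NewArena(64)
	first := arena.Alloc(3, 8)
	second := arena.Alloc(8, 8)
	fmt.Println(len(first), len(second), arena.Used())
	arena.Reset()
	fmt.Println(arena.Used())
}
```

A real platform would hand out raw pointers from memory that the go garbage collector does not move or free, which this sketch does not attempt.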

Brendan Hansknecht (Jan 13 2024 at 20:15):

I took the false interpreter example. It has a large enough run time, with multiple allocations and tasks, so I thought it might be reasonable to measure the cost of a roughly 40ns delay on each effect and allocation function.

I can't use an actual sleep function cause it is too slow for 40ns delay.
This seems to take roughly 40ns on my machine and not optimize away (generally it errs on the faster side in my testing):
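A delay that short is usually approximated with a busy loop whose result is written somewhere the compiler has to keep. The following is a minimal Go sketch of that technique, not the snippet used for the measurements in this thread. The names `sink`, `spin` and `measure` and the iteration count are assumptions, and the count would need tuning per machine to land near 40ns.

```go
package main

import (
	"fmt"
	"time"
)

// sink is a package-level variable, so the result of the loop is observable
// and the compiler cannot drop the loop as dead code.
var sink uint64

// spin burns CPU time with iters rounds of cheap integer arithmetic.
// Each round depends on the previous one, so the work stays sequential.
func spin(iters int) {
	x := sink
	for i := 0; i < iters; i++ {
		x = x*6364136223846793005 + 1442695040888963407
	}
	sink = x
}

// measure returns the average wall-clock time of one spin(iters) call,
// taken over calls invocations.
func measure(iters, calls int) time.Duration {
	start := time.Now()
	for i := 0; i < calls; i++ {
		spin(iters)
	}
	return time.Since(start) / time.Duration(calls)
}

func main() {
	// 40 iterations is only a starting guess; adjust until the printed
	// value is close to the delay you want to simulate.
	fmt.Println("per call:", measure(40, 1_000_000))
}
```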

Used the nqueens example cause it takes about a second to run.
With the added delay, it takes 12% longer to execute.

So a hefty but definitely manageable perf cost. Also, other applications with better allocation patterns likely will have less of a perf loss.

Oskar Hahn (Jan 14 2024 at 12:57):

So the real time is about the same (I ran it multiple times. Sometimes go was faster, sometimes rust was faster). There is a relevant difference in the sys-time, but this seems to be insignificant on multi core CPUs, when there is an idle core.

Brendan Hansknecht (Jan 14 2024 at 15:47):

Brendan Hansknecht (Jan 14 2024 at 15:57):

Out of curiosity, can you run something like this to get more accurate time comparisons:

The two executables would be saved as /tmp/false-rust and /tmp/false-go. And it would be run from the root of the roc repo.

Brendan Hansknecht (Jan 14 2024 at 15:58):

Brendan Hansknecht (Jan 14 2024 at 16:17):

On an M1 machine, the go version was crashing (unsurprising, I think false hits some roc bugs currently and the stricter memory protection can notice that)

For my x86 linux machine, these are the timings that I see with hyperfine and --optimize:

Brendan Hansknecht (Jan 14 2024 at 16:19):

10% perf loss +- 25%. Go version has a crazy high standard deviation.
rust stdev ± 0.038 s
go stdev ± 0.729 s

Brendan Hansknecht (Jan 14 2024 at 18:00):

So. Wanted a bit cleaner testing. So I removed reading from stdin, since it is for a single character, noisy, and requires a shell. Instead just hardcode getChar to return 9 in both rust and go.

diff

diff --git a/examples/false-interpreter-go/platform/main.go b/examples/false-interpreter-go/platform/main.go
index ac2d5c9..7080f5f 100644
--- a/examples/false-interpreter-go/platform/main.go
+++ b/examples/false-interpreter-go/platform/main.go
@@ -63,26 +63,43 @@ func rocStrRead(rocStr C.struct_RocStr) string {
    return unsafe.String(ptr, len)
 }

+
+// I tried to do this proper with a memory pinner and what not.
+// Couldn't get it to work.
+// Global it is. This makes sure go doesn't free stuff too early.
+var rc *readerCloser;
+
+type readerCloser struct {
+    reader io.Reader
+    closer io.Closer
+}
+
 //export roc_fx_openFile
 func roc_fx_openFile(name *C.struct_RocStr) uintptr {
    file, err := os.Open(rocStrRead(*name))
    if err != nil {
        panic(fmt.Sprintf("can not open file: %w", err))
    }
-   return uintptr(unsafe.Pointer(file))
+   r := bufio.NewReader(file)
+   rc = new(readerCloser)
+   rc.reader = r
+   rc.closer = file
+
+   return uintptr(unsafe.Pointer(rc))
 }

 //export roc_fx_closeFile
 func roc_fx_closeFile(filePtr unsafe.Pointer) {
-   file := (*os.File)(filePtr)
-   file.Close()
+   file := (*readerCloser)(filePtr)
+   file.closer.Close()
+   rc = nil
 }

 //export roc_fx_getFileBytes
 func roc_fx_getFileBytes(output *C.struct_RocStr, filePtr unsafe.Pointer) {
-   file := (*os.File)(filePtr)
+   file := (*readerCloser)(filePtr)
    buf := make([]byte, 0x10) // This is intentionally small to ensure correct implementation
-   count, err := file.Read(buf)
+   count, err := file.reader.Read(buf)
    if err != nil && err != io.EOF {
        panic(fmt.Sprintf("can not read from file: %v", err))
    }
@@ -92,9 +109,7 @@ func roc_fx_getFileBytes(output *C.struct_RocStr, filePtr unsafe.Pointer) {

 //export roc_fx_getChar
 func roc_fx_getChar() C.char {
-   reader := bufio.NewReader(os.Stdin)
-   text, _ := reader.ReadString('\n')
-   return C.char(text[0])
+   return C.char('9')
 }

 //export roc_fx_putLine

Also closed everything else on that PC and set the cpu to performance mode to make sure underclocking wasn't happening.

This second tool was run for longer, so I will use its results. Go is 26% +/- 4% slower than rust. This does not show how much of the time is used by cgo though. Luckily, we have perf for that.

Looking at the go flamegraph, it looks like 10.5% of the time is spent in runtime.cgocallback.abi0. Plus another 2.5% for the cgo malloc calls for 13%. Free didn't measure any overhead.

This would mean a total overhead of 11.5% for using cgo with this program. The other 14.5% of perf loss looks to be coming from go runtime stuff and general setup.

Richard Feldman (Jan 14 2024 at 18:09):

Oskar Hahn (Jan 14 2024 at 21:28):

Brendan Hansknecht (Jan 14 2024 at 22:17):

Oh yeah, for sure. As a note, false is probably a worst case (or at least used to be; not sure how bad it is now). It allocates like crazy.

So that will be tons of calls and overhead. Most webservers and whatnot will be IO bound. Also they will hopefully allocate much, much less and have a limited number of tasks to run.

So this isn't a "don't use cgo". It was mostly me being curious, cause I used to work with go in chrome os and had always heard that cgo was super slow. So I wanted to test.

Stream: show and tell

Topic: Go Platform

Oskar Hahn (Jan 07 2024 at 12:29):