The upgrade to Rust 1.67 somehow breaks the Wasm REPL. We get an "invalid instruction byte" error (from the OpCode type). I don't see why incorrect Wasm would be produced now.
I ran wasm-objdump on the compiler wasm blob (i.e. the WebAssembly that the Rust compiler generates from our code base) and it does not report any problems. I'd assume it would error out when it hits some invalid instruction.
the basic_1_field_f64_record
test fails with the following backtrace / invalid instruction byte
706d29 GETLOCAL 0 [I32(2ac7f0)]
706d2b I32CONST 0 [I32(2ac7f0), I32(0)]
706d2d GETLOCAL 2 [I32(2ac7f0), I32(0), I32(c)]
706d2f CALL 5420 args=[I32(2ac7f0), I32(0), I32(c)] return_type=Some(I32)
708df0 BLOCK 64 []
708df2 GETLOCAL 2 [I32(c)]
708df4 I32CONST 33 [I32(c), I32(21)]
708df6 I32LTU [I32(1)]
thread 'tests::basic_1_field_f64_record' panicked at 'invalid instruction byte 0xfc', crates/wasm_module/src/opcodes.rs:373:21
This is confusing because when I dump the wasm that is executed to a file, the program at those memory locations is very different
706d2f: 10 ac aa 80 80 00 | call 5420 <memset>
708df0: 41 01 | i32.const 1
708df2: 6a | i32.add
708df3: 10 a7 aa 80 80 00 | call 5415
708df9: 1a | drop
708dfa: 41 00 | i32.const 0
708dfc: 21 03 | local.set 3
708deb func[5420] <memset>:
708dec: 03 7f | local[0..2] type=i32
708dee: 01 7e | local[3] type=i64
708df0: 02 40 | block
708df2: 20 02 | local.get 2
708df4: 41 21 | i32.const 33
708df6: 49 | i32.lt_u
So something gets messed up in our Wasm codegen somewhere, and for some reason a Rust compiler upgrade surfaces the problem? Is there some unsoundness, or maybe the Wasm blob that rustc generates has a subtly different structure now?
I'll continue investigating (I hope a diff between what main generates and what the 1.67 branch generates proves helpful)
(cc @Brian Carroll)
Sounds to me like a problem in the interpreter
I assume this is using our own interpreter, right? The dump looks like the one I wrote.
So internally the program counter is an index into the code section of the binary, not an index into the full Wasm binary. When we print out an error dump we adjust it to display it relative to the full binary, in an attempt to match wasm-objdump
. There have been bugs there before and there might be some remaining.
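For concreteness, the adjustment is roughly this (a tiny sketch with made-up names, not the real code):

// Made-up names: internally the program counter indexes into the code
// section's bytes; for the error dump we add the section's offset within
// the full .wasm file so the addresses line up with wasm-objdump.
fn display_address(code_section_offset: usize, program_counter: usize) -> usize {
    code_section_offset + program_counter
}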
So let's assume we can't rely on the code addresses in the error dump. I would search the dump for the instruction CALL 5420
and find a place where the surrounding code matches the dump.
Also might be worth running it in some 3rd party Wasm interpreter if you have one installed.
I'd assume it would error out when it hits some invalid instruction.
Yes that's right
It's possible that the interpreter is miscalculating the number of bytes in some instruction and therefore losing its place in the instruction stream.
And maybe the previous Rust version didn't use that instruction for some reason.
Is there some easy way to export the program so it is runnable with an external interpreter? I'm using the WasmModule serialize right now, but it looks like that is not quite a complete program.
> wasmtime run --invoke entrypoint_from_test full-new.wasm
Error: failed to run main module `full-new.wasm`
Caused by:
0: failed to instantiate "full-new.wasm"
1: unknown import: `env::test_run_app` has not been defined
Based on the code, JavaScript normally provides this function.
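One way around it might be to embed wasmtime and stub the import out ourselves. A rough, untested sketch; the test_run_app signature and the entrypoint's result type are guesses here:

// Rough, untested sketch using the wasmtime crate; the test_run_app
// signature and the entrypoint's result type are assumptions.
use wasmtime::{Engine, Linker, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "full-new.wasm")?;
    let mut store = Store::new(&engine, ());

    let mut linker = Linker::new(&engine);
    // Dummy stand-in for the function that JavaScript normally provides.
    linker.func_wrap("env", "test_run_app", |_: i32, _: i32| -> i32 { 0 })?;

    let instance = linker.instantiate(&mut store, &module)?;
    let entry = instance.get_typed_func::<(), i32>(&mut store, "entrypoint_from_test")?;
    println!("result: {}", entry.call(&mut store, ())?);
    Ok(())
}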
things seem correct up to the call
706d29 GETLOCAL 0 [I32(2ac7f0)]
706d2b I32CONST 0 [I32(2ac7f0), I32(0)]
706d2d GETLOCAL 2 [I32(2ac7f0), I32(0), I32(c)]
706d2f CALL 5420 args=[I32(2ac7f0), I32(0), I32(c)] return_type=Some(I32)
seems to match (also the offset)
706d42: 20 00 | local.get 0
706d44: 41 00 | i32.const 0
706d46: 20 02 | local.get 2
706d48: 10 ac aa 80 80 00 | call 5420
so our interpreter seems to think that function[5420] is in a different place than it actually is, right? because in practice it is here
708e04 func[5420]:
708e05: 03 7f | local[0..2] type=i32
708e07: 01 7e | local[3] type=i64
708e09: 02 40 | block
but our interpreter thinks it is here
708df0 BLOCK 64 []
708df2 GETLOCAL 2 [I32(c)]
708df4 I32CONST 33 [I32(c), I32(21)]
708df6 I32LTU [I32(1)]
it is off by 20 bytes. Could it be relevant that that is how many bytes of locals we have here (3 * 4 + 8)?
I'm pretty sure this is where the program counter is set to an incorrect value:
self.program_counter = module.code.function_offsets[internal_fn_index] as usize;
the offset this gives (corrected by the section offset) is somewhere in the middle of a function
Alright, found the problem. There are some (new?) Wasm instructions that our interpreter does not have yet
specifically memory.fill
https://webassembly.github.io/spec/core/binary/instructions.html#memory-instructions
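(0xFC is the prefix byte for that "miscellaneous" instruction group, which matches the invalid instruction byte 0xfc in the panic.) Roughly what handling it involves; placeholder names, not our real interpreter code:

// Sketch with placeholder names. memory.fill is encoded as the prefix byte
// 0xFC, a LEB128 sub-opcode (11), and a memory index (0 for now). It pops a
// size, a fill value and a destination address, and fills that many bytes
// of linear memory.
fn execute_fc_prefixed(
    code: &[u8],
    pc: &mut usize, // assumed to already point past the 0xFC byte
    stack: &mut Vec<i32>,
    memory: &mut [u8],
) -> Result<(), String> {
    match read_u32_leb(code, pc) {
        11 => {
            // memory.fill
            let _mem_idx = read_u32_leb(code, pc);
            let n = stack.pop().unwrap() as usize;
            let value = stack.pop().unwrap() as u8;
            let dest = stack.pop().unwrap() as usize;
            memory[dest..dest + n].fill(value);
            Ok(())
        }
        other => Err(format!("unsupported 0xFC sub-opcode {other}")),
    }
}

// Minimal LEB128 u32 decoder, just enough for the sketch.
fn read_u32_leb(code: &[u8], pc: &mut usize) -> u32 {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = code[*pc];
        *pc += 1;
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}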
it is off by 20 bytes. Could it be relevant that that is how many bytes of locals we have here (3 * 4 + 8)?
Yes that could be relevant for sure. It depends specifically what you mean by "our interpreter thinks it is here"! Expected behaviour is that it reads the local variable declarations, allocates space for them, then resumes executing instructions at the next byte. So if you are just dumping the program counter it sounds like normal behaviour.
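Roughly what I mean, as a sketch with made-up names (the local-group counts are read as single bytes here for brevity; they are really LEB128 u32s):

// The function body starts with a vector of local groups, e.g. `03 7f` =
// three i32 locals, `01 7e` = one i64 local. The interpreter consumes those
// bytes before it reaches the first instruction, so a dumped program counter
// can legitimately sit a few bytes past the function's declared start.
fn read_local_declarations(code: &[u8], pc: &mut usize, locals: &mut Vec<u64>) {
    let group_count = code[*pc] as usize;
    *pc += 1;
    for _ in 0..group_count {
        let count = code[*pc] as usize;
        let _value_type = code[*pc + 1]; // 0x7F = i32, 0x7E = i64, ...
        *pc += 2;
        // locals start zero-initialised regardless of type
        locals.extend(std::iter::repeat(0).take(count));
    }
    // `pc` now points at the first real instruction of the body
}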
Yeah I just implemented the Wasm MVP, didn't add any later features.
It might be possible to tell rustc
what version of Wasm you're targeting
I've never tried that or looked it up or anything, just saying it might be possible :shrug:
memory.fill
sounds like it might be easy to implement though!
Though of course we could then immediately run into another unsupported instruction!
I got things working now
needed some new wasi stubs too for env variables
PR: https://github.com/roc-lang/roc/pull/5651. Quite the debugging journey. It turns out the original error was correct; it was just an instruction that we did not implement. Also, there should be a panic in such cases, but we just get UB. That is unfortunate, because the safe alternative is _much_ slower. So, just good to know for next time, I guess; I don't really want to make things slower.
Hmm the main match
produces a Result
with an error enum and eventually panics. We could try that. Or is that already what you did?
well, we do this
impl From<u8> for OpCode {
    fn from(x: u8) -> Self {
        unsafe { std::mem::transmute(x) }
    }
}
and putting in an invalid value here (for which there is no OpCode
variant) is just immediately UB
When you wrote it, you did some testing and it did not cause issues immediately, and this branch here would catch some errors:
// Catch-all in case of an invalid cast from u8 to OpCode while parsing binary
// (rustc keeps this code, I verified in Compiler Explorer)
#[allow(unreachable_patterns)]
_ => return Err(format!("Unknown Wasm instruction 0x{:02x}", op as u8)),
but that function is not in all paths. I think the Rust compiler is now stricter about that conversion than when the code was first written, which means that an invalid conversion by that unsafe code now produces code that (on my system) just crashes (it hits an illegal instruction, but it's UB, so it could do anything)
But any other conversion mechanism (certainly a match, which I tried today; I believe we also tried a lookup table at the time?) is much, much slower than this unsafe cast.
And because we control the input for our interpreter, I think it is OK to cut this corner. But for the future: when new instructions are added, our Wasm interpreter will break.
Oh yes I remember more clearly now, thanks :+1:
Hmm I don't _think_ we tried defining a static array of Option<OpCode>
and looking it up.
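Something like this, roughly; the variants and byte values here are illustrative, not our real OpCode definition:

#[derive(Clone, Copy, Debug)]
#[repr(u8)]
enum OpCode {
    Unreachable = 0x00,
    Nop = 0x01,
    Block = 0x02,
    // ...
}

// One 256-entry table built at compile time; an invalid byte maps to None
// instead of being transmuted into a nonexistent variant (which is UB).
static OPCODE_TABLE: [Option<OpCode>; 256] = build_table();

const fn build_table() -> [Option<OpCode>; 256] {
    let mut table = [None; 256];
    table[0x00] = Some(OpCode::Unreachable);
    table[0x01] = Some(OpCode::Nop);
    table[0x02] = Some(OpCode::Block);
    // ... fill in the rest of the supported opcodes
    table
}

fn decode(byte: u8) -> Result<OpCode, String> {
    OPCODE_TABLE[byte as usize]
        .ok_or_else(|| format!("Unknown Wasm instruction 0x{byte:02x}"))
}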
we can test that separately, it doesn't really matter for the current fix, which is the final blocker for the 1.67 upgrade.
so a review on that PR would be nice
I tried out this code, which looks pretty good https://godbolt.org/z/Mra3d56YY
I took a wasm binary (the roc compiler blob) and filtered out all invalid instruction bytes. Then I have two loops summing the instruction byte values. Here are the runtimes
[benches/conversion.rs:14] input.len() = 8084815
unsafe cast time: [10.461 ms 10.479 ms 10.499 ms]
change: [-0.0001% +0.2507% +0.4915%] (p = 0.05 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
lookup table time: [12.507 ms 12.524 ms 12.543 ms]
change: [-0.1140% +0.0861% +0.2867%] (p = 0.42 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
[benches/conversion.rs:43] x = Some(
386028267,
)
[benches/conversion.rs:43] y = Some(
386028267,
)
So, it's slower, but not that much. Not sure what to do with that, but now we have some actual numbers.
I guess the byte-to-opcode conversion really isn't the bottleneck, given that we can do it 8 million times in ~10ms. That becoming 12ms is not a big slowdown, and other parts of the interpreter loop are much slower than that.
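For reference, the benchmark was shaped roughly like this (criterion; not the exact benches/conversion.rs code — the input path is made up, and OpCode / OPCODE_TABLE stand in for the real definitions, like the lookup-table sketch above):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_conversion(c: &mut Criterion) {
    // the compiler wasm blob, pre-filtered down to valid opcode bytes
    let input: Vec<u8> = std::fs::read("filtered_opcodes.bin").unwrap();

    c.bench_function("unsafe cast", |b| {
        b.iter(|| {
            let mut sum = 0u64;
            for &byte in black_box(&input) {
                let op: OpCode = byte.into(); // the transmute-based From impl
                sum += op as u64;
            }
            sum
        })
    });

    c.bench_function("lookup table", |b| {
        b.iter(|| {
            let mut sum = 0u64;
            for &byte in black_box(&input) {
                sum += OPCODE_TABLE[byte as usize].unwrap() as u64;
            }
            sum
        })
    });
}

criterion_group!(benches, bench_conversion);
criterion_main!(benches);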
OK so the change gives us better errors for unsupported instructions and doesn't have much of a performance penalty. Sounds great.
hmm, do we accept this? this is in debug mode, so I guess it's fine? 30 seconds delta
Executed in 415.85 secs fish external
usr time 48.30 mins 797.39 millis 48.29 mins
sys time 0.13 mins 154.94 millis 0.13 mins
________________________________________________________
Executed in 384.09 secs fish external
usr time 46.50 mins 819.00 micros 46.50 mins
sys time 0.08 mins 0.00 micros 0.08 mins
So this is still the test that sums up instruction bytes, right? Hard to make a call based on that alone. I was assuming that we'd see little difference in some of our real use cases like the tests for the Wasm REPL.
ah, no, this is the wasm repl test run on my machine
but in debug mode
OK, when we did analysis on this kind of thing before, we used release so that it would be representative of CI
oh, wait it is release actually
those tests are really slow then