Stream: compiler development

Topic: rust 1.67 upgrade


view this post on Zulip Folkert de Vries (Jul 07 2023 at 19:00):

The upgrade to rust 1.67 somehow breaks the wasm repl. We get an invalid instruction byte (the OpCode type). I don't see why incorrect wasm would be produced now.

I ran wasm-objdump on the compiler wasm blob (so the webassembly that the rust compiler generates from our code base) and it does not report any problems. I'd assume it would error out when it hits some invalid instruction.

the basic_1_field_f64_record test fails with the following backtrace / invalid instruction byte

706d29 GETLOCAL 0        [I32(2ac7f0)]
706d2b I32CONST 0        [I32(2ac7f0), I32(0)]
706d2d GETLOCAL 2        [I32(2ac7f0), I32(0), I32(c)]

706d2f CALL 5420          args=[I32(2ac7f0), I32(0), I32(c)] return_type=Some(I32)

708df0 BLOCK 64          []
708df2 GETLOCAL 2        [I32(c)]
708df4 I32CONST 33       [I32(c), I32(21)]
708df6 I32LTU            [I32(1)]
thread 'tests::basic_1_field_f64_record' panicked at 'invalid instruction byte 0xfc', crates/wasm_module/src/opcodes.rs:373:21

This is confusing because when I dump the wasm that is executed to a file, the program at those memory locations is very different

 706d2f: 10 ac aa 80 80 00          |   call 5420 <memset>

 708df0: 41 01                      |   i32.const 1
 708df2: 6a                         |   i32.add
 708df3: 10 a7 aa 80 80 00          |   call 5415
 708df9: 1a                         |   drop
 708dfa: 41 00                      |   i32.const 0
 708dfc: 21 03                      |   local.set 3

708deb func[5420] <memset>:
 708dec: 03 7f                      | local[0..2] type=i32
 708dee: 01 7e                      | local[3] type=i64
 708df0: 02 40                      | block
 708df2: 20 02                      |   local.get 2
 708df4: 41 21                      |   i32.const 33
 708df6: 49                         |   i32.lt_u

so something gets messed up in our wasm codegen somewhere? and for some reason a rust compiler upgrade surfaces the problem. Is there some unsoundness? or maybe the wasm blob that rust generates has a subtly different structure now?

I'll continue investigating (I hope a diff between what main generates and what the 1.67 branch generates proves helpful)

(cc @Brian Carroll )

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:07):

Sounds to me like a problem in the interpreter

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:08):

I assume this is using our own interpreter, right? The dump looks like the one I wrote.

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:11):

So internally the program counter is an index into the code section of the binary, not an index into the full Wasm binary. When we print out an error dump we adjust it to display it relative to the full binary, in an attempt to match wasm-objdump. There have been bugs there before and there might be some remaining.

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:13):

So let's assume we can't rely on the code addresses in the error dump. I would search the dump for the instruction CALL 5420 and find a place where the surrounding code matches the dump.

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:14):

Also might be worth running it in some 3rd party Wasm interpreter if you have one installed.

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:14):

I'd assume it would error out when it hits some invalid instruction.

Yes that's right

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:17):

It's possible that the interpreter is miscalculating the number of bytes in some instruction and therefore losing its place in the instruction stream.

view this post on Zulip Brian Carroll (Jul 08 2023 at 07:18):

And maybe the previous Rust version didn't use that instruction for some reason.

view this post on Zulip Folkert de Vries (Jul 08 2023 at 11:04):

is there some easy way to export the program so it is runnable with an external interpreter? I'm using the WasmModule serialize right now but that is not quite a complete program it looks like

> wasmtime run --invoke entrypoint_from_test full-new.wasm
Error: failed to run main module `full-new.wasm`

Caused by:
    0: failed to instantiate "full-new.wasm"
    1: unknown import: `env::test_run_app` has not been defined

based on the code normally javascript provides this function

view this post on Zulip Folkert de Vries (Jul 08 2023 at 11:41):

things seem correct up to the call

706d29 GETLOCAL 0        [I32(2ac7f0)]
706d2b I32CONST 0        [I32(2ac7f0), I32(0)]
706d2d GETLOCAL 2        [I32(2ac7f0), I32(0), I32(c)]

706d2f CALL 5420          args=[I32(2ac7f0), I32(0), I32(c)] return_type=Some(I32)

seems to match (also the offset)

 706d42: 20 00                      |   local.get 0
 706d44: 41 00                      |   i32.const 0
 706d46: 20 02                      |   local.get 2
 706d48: 10 ac aa 80 80 00          |   call 5420

so our interpreter seems to think that function[5420] is in a different place than it actually is, right? because in practice it is here

708e04 func[5420]:
 708e05: 03 7f                      | local[0..2] type=i32
 708e07: 01 7e                      | local[3] type=i64
 708e09: 02 40                      | block

but our interpreter thinks it is here

708df0 BLOCK 64          []
708df2 GETLOCAL 2        [I32(c)]
708df4 I32CONST 33       [I32(c), I32(21)]
708df6 I32LTU            [I32(1)]

it is off by 20 bytes. Could it be relevant that that is how many bytes of locals we have here (3 * 4 + 8)?

view this post on Zulip Folkert de Vries (Jul 08 2023 at 11:56):

I'm pretty sure it is this is where the program counter is set to an incorrect value

            self.program_counter = module.code.function_offsets[internal_fn_index] as usize;

the offset this gives (corrected by the section offset) is somewhere in the middle of a function

view this post on Zulip Folkert de Vries (Jul 08 2023 at 15:55):

allright, found the problem. There are some (new?) wasm instructions that our interpreter does not have yet

view this post on Zulip Folkert de Vries (Jul 08 2023 at 15:55):

specifically memory.fill https://webassembly.github.io/spec/core/binary/instructions.html#memory-instructions

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:09):

it is off by 20 bytes. Could it be relevant that that is how many bytes of locals we have here (3 * 4 + 8)?

Yes that could be relevant for sure. It depends specifically what you mean by "our interpreter thinks it is here"! Expected behaviour is that it reads the local variable declarations, allocates space for them, then resumes executing instructions at the next byte. So if you are just dumping the program counter it sounds like normal behaviour.

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:10):

Yeah I just implemented the Wasm MVP, didn't add any later features.

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:10):

It might be possible to tell rustc what version of Wasm you're targeting

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:11):

I've never tried that or looked it up or anything, just saying it might be possible :shrug:

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:13):

memory.fill sounds like it might be easy to implement though!

view this post on Zulip Brian Carroll (Jul 08 2023 at 17:13):

Though of course we could then immediately run into another unsupported instruction!

view this post on Zulip Folkert de Vries (Jul 08 2023 at 17:26):

I got things working now

view this post on Zulip Folkert de Vries (Jul 08 2023 at 17:26):

needed some new wasi stubs too for env variables

view this post on Zulip Folkert de Vries (Jul 08 2023 at 18:41):

PR https://github.com/roc-lang/roc/pull/5651. quite the debugging journey. turns out that the original error was correct; it was just an instruction that we did not implement. Also, there should be a panic in such cases, but we just get UB. That is unfortunate, because the safe alternative is _much_ slower. So, just good to know I guess for next time, I don't want to make things slower really

view this post on Zulip Brian Carroll (Jul 08 2023 at 19:58):

Hmm the main match produces a Result with an error enum and eventually panics. We could try that. Or is that already what you did?

view this post on Zulip Folkert de Vries (Jul 08 2023 at 20:00):

well, we do this

impl From<u8> for OpCode {
    fn from(x: u8) -> Self {
        unsafe { std::mem::transmute(x) }
    }
}

and putting in an invalid value here (for which there is no OpCode variant) is just immediately UB

view this post on Zulip Folkert de Vries (Jul 08 2023 at 20:03):

when you wrote it, you did some testing and it did not cause issues immediately and this branch here would catch some errors

        // Catch-all in case of an invalid cast from u8 to OpCode while parsing binary
        // (rustc keeps this code, I verified in Compiler Explorer)
        #[allow(unreachable_patterns)]
        _ => return Err(format!("Unknown Wasm instruction 0x{:02x}", op as u8)),

but that function is not in all paths. I think the rust compiler is now stricter in the conversion than when the code was first written, which means that an invalid conversion by that unsafe code causes some compilation that (on my system) will just crash (it hits an illegal instruction, but it's UB so it could just do whatever)

view this post on Zulip Folkert de Vries (Jul 08 2023 at 20:04):

but, any other conversion mechanism (certainly a match, i tried that today, I believe we also tried a lookup table at the time?) are much, much slower than this unsafe cast

view this post on Zulip Folkert de Vries (Jul 08 2023 at 20:05):

and because we control the input for our interpreter, I think it is OK to cut this corner. but for the future: when new instructions are added, our wasm interpreter will break

view this post on Zulip Brian Carroll (Jul 09 2023 at 12:47):

Oh yes I remember more clearly now, thanks :+1:

view this post on Zulip Brian Carroll (Jul 09 2023 at 12:50):

Hmm I don't _think_ we tried defining a static array of Option<OpCode> and looking it up.

view this post on Zulip Folkert de Vries (Jul 09 2023 at 13:12):

we can test that separately, it doesn't really matter for the current fix, which is the final blocker for the 1.67 upgrade.

view this post on Zulip Folkert de Vries (Jul 09 2023 at 17:59):

so a review on that PR would be nice

I tried out this code, which looks pretty good https://godbolt.org/z/Mra3d56YY

I took a wasm binary (the roc compiler blob) and filtered out all invalid instruction bytes. Then I have two loops summing the instruction byte values. Here are the runtimes

[benches/conversion.rs:14] input.len() = 8084815
unsafe cast             time:   [10.461 ms 10.479 ms 10.499 ms]
                        change: [-0.0001% +0.2507% +0.4915%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

lookup table            time:   [12.507 ms 12.524 ms 12.543 ms]
                        change: [-0.1140% +0.0861% +0.2867%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

[benches/conversion.rs:43] x = Some(
    386028267,
)
[benches/conversion.rs:43] y = Some(
    386028267,
)

so, it's slower, but not that much. Not sure what to do with that, but now we have some actual numbers

view this post on Zulip Folkert de Vries (Jul 09 2023 at 18:04):

I guess really that the conversion of byte opcode is not the bottleneck, given that we can do it 8 million times in ~10ms. that becoming 12ms is not really a big slowdown, and other parts of the interpreter loop are much slower than that

view this post on Zulip Brian Carroll (Jul 10 2023 at 12:41):

OK so the change gives us better errors for unsupported instructions and doesn't have much of a performance penalty. Sounds great.

view this post on Zulip Folkert de Vries (Jul 10 2023 at 17:05):

hmm, do we accept this? this is in debug mode, so I guess it's fine? 30 seconds delta

Executed in  415.85 secs   fish           external
   usr time   48.30 mins  797.39 millis   48.29 mins
   sys time    0.13 mins  154.94 millis    0.13 mins
________________________________________________________
Executed in  384.09 secs   fish           external
   usr time   46.50 mins  819.00 micros   46.50 mins
   sys time    0.08 mins    0.00 micros    0.08 mins

view this post on Zulip Brian Carroll (Jul 10 2023 at 17:16):

So this is still the test that sums up instruction bytes, right? Hard to make a call based on that alone. I was assuming that we'd see little difference in some of our real use cases like the tests for the Wasm REPL.

view this post on Zulip Folkert de Vries (Jul 10 2023 at 17:18):

ah, no, this the wasm repl test run on my machine

view this post on Zulip Folkert de Vries (Jul 10 2023 at 17:18):

but in debug mode

view this post on Zulip Brian Carroll (Jul 10 2023 at 17:18):

OK when we did analysis on this kind of thing before we used release so that it would be representative of CI

view this post on Zulip Folkert de Vries (Jul 10 2023 at 17:19):

oh, wait it is release actually

view this post on Zulip Folkert de Vries (Jul 10 2023 at 17:19):

those tests are really slow then


Last updated: Jul 06 2025 at 12:14 UTC