Off by one error running with 'optimize' · bugs

Stream: bugs

Topic: Off by one error running with 'optimize'

Alex Nuttall (Dec 12 2024 at 22:06):

I have a couple of advent of code solutions which have problems running with roc run --optimize.

Day 10 returns 1 less than the correct answer about 1/3 of the time. I can only hit it with the full input, not the example. The non-optimised build is always correct.

Day 12 hangs in optimised mode. This also happens with the example data.

Using the latest nightly release

Brendan Hansknecht (Dec 12 2024 at 22:25):

Would be great if you can make any sort of more minimal repros, but this is nice too. Must mean we have incorrectly defined behaviour that LLVM is optimizing away. That or something with morphic and uniqueness being off.

Alex Nuttall (Dec 13 2024 at 01:36):

There is no difference in the IR for day 10 between the times the bug occurs and the times it doesn't

I did cut it down a fair bit: https://github.com/alexnuttall/aoc2024-roc/blob/main/repro/10/main.roc

Brendan Hansknecht (Dec 13 2024 at 01:41):

> There is no difference in the IR for day 10 between the times the bug occurs and the times it doesn't

There is no difference between the optimized and non-optimized llvm ir?

Brendan Hansknecht (Dec 13 2024 at 01:48):

I assume the optimized memory is accessing memory it shouldn't and making a decision based on that. 1/3 of the time, it gets lucky

Brendan Hansknecht (Dec 13 2024 at 04:23):

woah, the same executable is random (not even recompilations)...that's really cool. I've never seen that before in roc... too bad it's a bug.

Brendan Hansknecht (Dec 13 2024 at 04:32):

Looks to be a dictionary bug.

Brendan Hansknecht (Dec 13 2024 at 04:32):

In optimized build we are failing to find an item in the dict

Brendan Hansknecht (Dec 13 2024 at 04:33):

Not sure what the root bug is though

Brendan Hansknecht (Dec 13 2024 at 04:38):

~~Interesting... debug printing the hash fixes the bug. So some sort of optimization right around hashing is breaking things.~~

Brendan Hansknecht (Dec 13 2024 at 04:42):

woah...there seem to be bad dictionary seeds that break the dictionary in roc.......

Brendan Hansknecht (Dec 13 2024 at 04:42):

Though maybe it is just seeds that reveal a bug in the dict or hash

Brendan Hansknecht (Dec 13 2024 at 04:44):

still hashing to the same value, so probably revealing a bug in dict

Brendan Hansknecht (Dec 13 2024 at 05:19):

Doesn't seem like a dict bug either per se. I think this is a refcounting or morphic uniqueness bug. I think that we are mutating part of the dictionary in place despite it being needed later

Brendan Hansknecht (Dec 13 2024 at 05:19):

seems to happen when Dict.keepIf is called

Brendan Hansknecht (Dec 13 2024 at 05:21):

The dictionary metadata is incorrectly mutated in place.

Brendan Hansknecht (Dec 13 2024 at 05:31):

Ok, this is definitely a morphic bug (or I guess bug with the info we give morphic). It is using replace_in_place incorrectly. If I force always using replace, the bug goes away

Brendan Hansknecht (Dec 13 2024 at 05:33):

This is the same issue being hit by #beginners > forcing a new string allocation/copy

Brendan Hansknecht (Dec 13 2024 at 05:33):

I wonder if this is a bug in borrow inference

Brendan Hansknecht (Dec 13 2024 at 05:35):

cc @Folkert de Vries in case you have any ideas to help with debugging.

Brendan Hansknecht (Dec 13 2024 at 05:42):

Also looks to be the root cause of an older dict bug: #6936

Brendan Hansknecht (Dec 13 2024 at 05:42):

I know a workaround, but hopefully we can root cause and fix

Brendan Hansknecht (Dec 13 2024 at 05:57):

Actually, I guess everything is owned here. So wouldn't be a bug in borrow inference. Would just be a bug in morphic or what we feed it. I wonder if there is some sort of issue with uniqueness calculation and structures. Cause the dict should always be owned.

Brendan Hansknecht (Dec 13 2024 at 06:03):

I guess I need to dive into how morphic decides it can call the inplace version of a function cause it is getting that wrong.

Brendan Hansknecht (Dec 13 2024 at 06:27):

Ok, I think I see something interesting in mono related to this.

The Dict-related functions assume ownership of the dict (cause all functions assume ownership of their inputs except for raw lists and strings, which can be borrowed). All functions within dict use the parts of it in a linear fashion. This means if you pass in a unique dict, it is correct to mutate in place. As such, the List.set calls within the dict module are set to be in place. This is all correct assuming you always pass a unique dict into the APIs. We do not pass a unique dict into the APIs. Before calling the dict function we run `inc #UserApp.map;`. This does not give us a unique dict. It explicitly gives us a dict with 2 references. I think we have a bug with owned function arguments + in-place mutations.

Specifically, owned != unique. Owned seems to mean that the function has its own reference to the passed-in data. So when morphic uses owned to mean unique, this leads to bugs.

I'm not sure the correct way to go about fixing this, but it feels like a pretty fundamental mis-mapping between our stack and morphic. If I am correct, all uses of InPlace today are probably not trustworthy, cause they might get passed something with a refcount > 1 due to owned function args not meaning unique.

I'm hoping that only something minor is off here (and I just don't understand how the pieces fit together), but definitely worried it is a bigger issue.
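
The owned-versus-unique distinction described above can be sketched outside of Roc. The following is a minimal Rust model, not Roc's or morphic's actual implementation: `Rc` stands in for a refcounted Roc list, cloning the `Rc` stands in for the `inc` before the call, and `set_checked`, `is_unique`, and `demo` are hypothetical names chosen for illustration.

```rust
use std::rc::Rc;

/// Refcount-checking update (hypothetical helper): mutates in place only
/// when `list` is the sole reference. If the data is shared,
/// `Rc::make_mut` clones it first, so other holders never see the write.
fn set_checked(mut list: Rc<Vec<i64>>, i: usize, value: i64) -> Rc<Vec<i64>> {
    Rc::make_mut(&mut list)[i] = value;
    list
}

/// An owned handle is unique only if its refcount is exactly 1.
fn is_unique(list: &Rc<Vec<i64>>) -> bool {
    Rc::strong_count(list) == 1
}

fn demo() -> (Vec<i64>, Vec<i64>, bool) {
    let original = Rc::new(vec![123]);
    // Models `inc` before the call: the callee receives its own reference
    // (it is "owned"), but the refcount is now 2, so it is not unique.
    let passed = Rc::clone(&original);
    let unique_at_call = is_unique(&passed);
    let updated = set_checked(passed, 0, 456);
    ((*original).clone(), (*updated).clone(), unique_at_call)
}

fn main() {
    let (original, updated, unique_at_call) = demo();
    println!("{:?} {:?} unique_at_call={}", original, updated, unique_at_call);
}
```

In this model the caller's list keeps its old contents because the update checks the refcount. An update that skipped the check and always wrote in place would also change the caller's copy, which is the symptom reported in this thread.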

Brendan Hansknecht (Dec 13 2024 at 07:03):

I found a really solid minimal repro. It seems that specifically recursion confuses morphic here:

app [main] {
    pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.17.0/lZFLstMUCUvd5bjnnpYromZJXkQUrdhbva4xdBInicE.tar.br",
}

import pf.Stdout

main =
    list = [123]
    x = updateListBroken list 0 |> List.len
    # x = updateListOk list 0 |> List.len
    Stdout.line "$(Inspect.toStr list) $(Num.toStr x)"


updateListBroken = \list, i ->
    if i < List.len list then
        next = List.set list i 456
        updateListBroken next (i + 1)
    else
        list

updateListOk = \list, i ->
    if i < List.len list then
        List.set list i 789
    else
        list

This leads to inplace mutation of the list and printing out 456

EDIT: just made a minor correction to make the repro actually happen.

Brendan Hansknecht (Dec 13 2024 at 07:19):

Hoping someone with more morphic knowledge can take a look.
cc @J.Teeuwissen

J.Teeuwissen (Dec 13 2024 at 08:23):

I have little to no morphic knowledge, but what you stated on refcounting seems to be correct. Functions that (might) modify their inputs likely have them passed as owned. To see if they can be modified in place, only the reference count should have to be checked. How is inPlace currently defined?

Anton (Dec 13 2024 at 11:43):

Thanks for the deep debug @Brendan Hansknecht :heart:

Brendan Hansknecht (Dec 13 2024 at 16:10):

J.Teeuwissen said:

> I have little to no morphic knowledge, but what you stated on refcounting seems to be correct.

Sorry about the wrong ping then.... I guess your drop specialization and refcounting work made me assume you had context on morphic...

> To see if they can be modified in place, only the reference count should have to be checked. How is inPlace currently defined?

I'm really not sure how all of this works. I thought the owned and borrowed info was passed to morphic, but that doesn't actually look to be the case. All I know is that morphic does a complex whole program analysis to attempt to find statically known unique values such that they can be updated in place without any refcount checks.

Brendan Hansknecht (Dec 13 2024 at 17:14):

Output from the alias analysis debug flag. Likely holds the bug (though the bug might be completely on the morphic side if this looks correct): https://gist.github.com/bhansconnect/c1e98d5ff04c567fda5dd17ff4174a13

Brendan Hansknecht (Dec 13 2024 at 18:16):

Attempted to change the model of replace as a remove followed by an insert instead of a get followed by an update, but no luck with that.

Brendan Hansknecht (Dec 13 2024 at 18:19):

Which I guess makes sense, cause I think the bug is with joinpoint loops, not with replace itself (though a small part of replace looks wrong too)

Brendan Hansknecht (Dec 13 2024 at 19:21):

Sadly, this looks to be a general bug not specific to List.replace. I can reproduce it with List.swap as well. I assume this means it is a bug with morphic analysis for recursive code, but it still could be something we are doing wrong with passing data to/getting data from morphic. Either way, probably a harder fix.

Brendan Hansknecht (Dec 13 2024 at 19:44):

Also, if anyone else ends up looking into this, you can get a slightly more minimal repro (in terms of IR generated) by switching the code above to one of the platform-switching example platforms.

Brendan Hansknecht (Dec 13 2024 at 22:17):

cc: @Ayaz Hafiz in case you have any ideas or tips for debugging morphic.

Ayaz Hafiz (Dec 14 2024 at 02:15):

sadly morphic is not easy to debug

Ayaz Hafiz (Dec 14 2024 at 02:16):

easily the hardest out of any of our deps to debug

Ayaz Hafiz (Dec 14 2024 at 02:16):

if it was easy to pull out i would suggest removing it but it's not

Ayaz Hafiz (Dec 14 2024 at 02:17):

something needs to be done about it though because it's not maintained and it has internal bugs. It might be worth removing that and specialization inference if it makes things more stable rn.

Brendan Hansknecht (Dec 14 2024 at 02:25):

Is morphic only for the inplace update mode currently or does it do more?

Brendan Hansknecht (Dec 14 2024 at 02:25):

Also, what is specialization inference?

Ayaz Hafiz (Dec 14 2024 at 02:59):

Morphic creates separate specializations of functions based on their borrow/ownership demands

Ayaz Hafiz (Dec 14 2024 at 02:59):

so e.g. creates a separate List.map if one call site needs to borrow, and one site owns
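
The idea of specializing one function per ownership demand can be sketched in Rust. This is only an illustration of the concept, assuming a simple integer list; `map_borrowed` and `map_owned` are hypothetical names, not functions that morphic or the Roc compiler generates.

```rust
/// Borrowing specialization: the caller keeps using its list afterwards,
/// so the result must live in a fresh allocation.
fn map_borrowed(xs: &[i64], f: impl Fn(i64) -> i64) -> Vec<i64> {
    xs.iter().map(|&x| f(x)).collect()
}

/// Owning specialization: the caller gives the list away, so its buffer
/// can be overwritten and returned without allocating.
fn map_owned(mut xs: Vec<i64>, f: impl Fn(i64) -> i64) -> Vec<i64> {
    for x in xs.iter_mut() {
        *x = f(*x);
    }
    xs
}

fn main() {
    let kept = vec![1, 2, 3];
    let doubled = map_borrowed(&kept, |x| x * 2);
    // `kept` is still usable here because it was only borrowed.
    let incremented = map_owned(vec![1, 2, 3], |x| x + 1);
    println!("{:?} {:?} {:?}", kept, doubled, incremented);
}
```

Rust enforces at compile time that the owned argument is unique. Roc relies on reference counts at runtime, which is why an owned argument there can still be shared.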

Brendan Hansknecht (Dec 14 2024 at 03:06):

Ah, didn't realize that, but makes sense

Brendan Hansknecht (Dec 14 2024 at 03:08):

Also, if we just want stability for now, I could use the trivial analysis in optimized builds or just ignore update mode in place completely for now. I don't think there would be an easy way for me to just turn it off within loops for now.

Brendan Hansknecht (Dec 14 2024 at 03:08):

Then I can just log the minimal repro in an issue for now.

Brendan Hansknecht (Dec 14 2024 at 03:08):

Side question, do we have any "best" benchmark for me to see the perf cost of this?

Brendan Hansknecht (Dec 14 2024 at 03:50):

Oh, random thought as to what might be going on. I wonder if morphic expects the first iteration of the loop to call the reference count checking version of the function. Then for every loop after, it would call the inplace version.
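
One way to picture this hypothesis is the Rust sketch below. It models the idea only and makes no claim about what morphic actually does: `Rc` stands in for a refcounted list, and `update_all` and `demo_loop` are hypothetical names.

```rust
use std::rc::Rc;

/// Sets every element to `value`. Returns the updated list and how many
/// copies were made along the way.
fn update_all(mut list: Rc<Vec<i64>>, value: i64) -> (Rc<Vec<i64>>, usize) {
    let mut copies = 0;
    for i in 0..list.len() {
        if i == 0 {
            // First iteration: refcount-checking path. A shared list is
            // cloned here, which leaves `list` as the only reference.
            if Rc::strong_count(&list) > 1 {
                copies += 1;
            }
            Rc::make_mut(&mut list)[i] = value;
        } else {
            // Later iterations: the list is known to be unique because of
            // the first iteration, so writing in place is safe.
            let data = Rc::get_mut(&mut list).expect("unique after first iteration");
            data[i] = value;
        }
    }
    (list, copies)
}

fn demo_loop() -> (Vec<i64>, Vec<i64>, usize) {
    let original = Rc::new(vec![1, 2, 3]);
    let shared = Rc::clone(&original); // refcount is 2 at the call
    let (updated, copies) = update_all(shared, 0);
    ((*original).clone(), (*updated).clone(), copies)
}

fn main() {
    let (original, updated, copies) = demo_loop();
    println!("{:?} {:?} copies={}", original, updated, copies);
}
```

Under this scheme only the first iteration pays for a check and a possible copy. If the in-place path were also taken on the first iteration, the caller's list would be overwritten.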

Brendan Hansknecht (Dec 14 2024 at 03:56):

filed #7367 to track this

Brendan Hansknecht (Dec 14 2024 at 07:14):

For now, setting morphic analysis to trivial to work around this bug:

Brendan Hansknecht said:

> limit morphic to trivial solving only to avoid the inplace mutation correctness bugs: https://github.com/roc-lang/roc/pull/7370

Surprisingly, this seems to increase performance in a number of cases by a few percent. My only guess is that the accidental mutation is leading to extra looping that shouldn't be happening. All changes to perf are within 5% and most are slightly positive.

Last updated: Jul 26 2025 at 12:14 UTC