Dealing with logic errors in Roc · ideas

In the world of software, I think it's fair to divide the kinds of errors we need to deal with into two camps:

Predictable errors are where we model things with Result primarily, because we know all the error conditions ahead of time, roughly.

But unpredictable errors are the things where we don't know what's going to happen. Strictly speaking, that also includes things like segfaults, but I primarily want to view the unpredictable errors as being logic errors here. These are the cases where Roc encourages using crash today, for example if you ever reach an impossible branch.

There are a couple of important facets of logic errors as they are handled, not only in Roc, but in other languages too:

crash and expect provide this experience in Roc, because they allow developers to be quickly notified that something is wrong during development, and we don't have to bend the type system to model something that shouldn't actually be happening in the first place.

To be clear, I think there is a real need for something like crash and expect. They are important for making it easy to spot logic errors during development, because they are very loud. But if I'm developing some web service, I might be serving many REST endpoints from the same service. If one of those endpoints calls a function that happens to crash, then I don't want it to bring down my entire service. At the very least, I just want to make the single endpoint unavailable while I investigate, so the rest of the service can remain live. So even if I can't do anything useful with the logic error itself, I can still offer a degraded service that remains useful.

So my first question is: should there be a way to catch a crash, a la catch_unwind in Rust?
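For reference, here is a minimal Rust sketch of what catch_unwind looks like when it is placed at a request boundary. The Response type and the handle and serve functions are hypothetical names made up for the illustration; only std::panic::catch_unwind and AssertUnwindSafe come from Rust's standard library.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical response type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Response {
    Ok(String),
    InternalError,
}

/// Hypothetical handler: one endpoint works, the rest hit a logic error.
fn handle(path: &str) -> String {
    match path {
        "/health" => "ok".to_string(),
        // Stands in for an "impossible" branch that turned out to be reachable.
        _ => panic!("unreachable branch reached for {}", path),
    }
}

/// The boundary: a panic inside one request becomes an error response
/// instead of taking the whole service down.
pub fn serve(path: &str) -> Response {
    match panic::catch_unwind(AssertUnwindSafe(|| handle(path))) {
        Ok(body) => Response::Ok(body),
        Err(_) => Response::InternalError,
    }
}
```

Calling serve("/broken") yields Response::InternalError, and a later serve("/health") still succeeds. Note that catch_unwind only catches unwinding panics; it does nothing when the program is built with panic = "abort".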

One of my problems with catch_unwind is that it's very general, and you can really put it anywhere. But for logic errors specifically, I think there's usually a clear boundary of where you want it:

Maybe there's a way to (optionally) lint that you are remembering to call catch in only these places?

My next question though, is whether the behaviors in crash and expect are even the right primitives to begin with? Maybe there are better tools waiting to be found here?

One small evolution might be to make crash and expect parametrized over the behavior we expect them to trigger. For example, what if module params were used to pass in a "crasher" that could specialise how they would crash, and define any guards where the crash would be caught? I don't really know if this would be feasible or even sensible, but I want to float the idea at least.

Or maybe we could define something more targeted at dealing with logic errors, e.g. LogicErrors.detectedError : Severity, Str -> .... This function would allow you to report that you hit an impossible error case to the platform, and then the platform could decide what to do. For example, if you pass in a severity of Fatal, it might crash as today. Or it might decide to block access to the specific REST endpoint where this error was detected, and automatically return a 503 error on it, until it can be manually unblocked.
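To make the idea a bit more concrete, here is a rough sketch of what the platform side of such a report could look like, written in Rust as a stand-in for a host. Everything in it is an assumption for illustration: the Host type, the Degraded severity, the Action enum and the method names are invented here, and only the Fatal severity and the 503 behavior come from the description above.

```rust
use std::collections::HashSet;

/// Hypothetical severity levels; only `Fatal` is named in the proposal.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Degraded,
    Fatal,
}

/// What the host decides to do with a reported logic error.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Crash, as `crash` does today.
    Crash,
    /// Keep running, but answer 503 on the affected endpoint.
    BlockEndpoint,
}

/// Hypothetical host state: the set of endpoints that are currently blocked.
#[derive(Default)]
pub struct Host {
    blocked: HashSet<String>,
}

impl Host {
    /// Called when application code reports that it hit an "impossible" state.
    pub fn detected_error(&mut self, endpoint: &str, severity: Severity, msg: &str) -> Action {
        eprintln!("logic error on {}: {}", endpoint, msg);
        match severity {
            Severity::Fatal => Action::Crash,
            Severity::Degraded => {
                self.blocked.insert(endpoint.to_string());
                Action::BlockEndpoint
            }
        }
    }

    /// Status code the host would return for a request to `endpoint`.
    pub fn status_for(&self, endpoint: &str) -> u16 {
        if self.blocked.contains(endpoint) { 503 } else { 200 }
    }

    /// Manual unblock once the error has been investigated.
    pub fn unblock(&mut self, endpoint: &str) {
        self.blocked.remove(endpoint);
    }
}
```

In this sketch the application only reports; the decision to crash or to degrade a single endpoint stays with the host.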

Hannes (Oct 02 2024 at 12:16):

One thing to keep in mind, I'm pretty sure platforms can choose how to handle crashes, e.g. a web server can choose to let each request crash separately

Richard Feldman (Oct 02 2024 at 13:44):

Richard Feldman (Oct 02 2024 at 13:52):

I definitely think we should never add anything to the language that applications or packages could use to recover from crash

Richard Feldman (Oct 02 2024 at 13:52):

Kasper Møller Andersen (Oct 02 2024 at 14:14):

To be clear, let’s not get hung up on web servers. But you’re saying that the platform is responsible for insulating the software from crashes?

As another example, let’s say I’m building a game, and there’s a crash in my animation subsystem. If the platform is running animations isolated from the rest of the code, then it can recover by just disabling animations for example. But if the animation is just running with the rest of the game loop, can the platform do that?

Richard Feldman (Oct 02 2024 at 14:44):

Richard Feldman (Oct 02 2024 at 14:45):

"if there is a mistake in this part of the code, what's the worst that can happen? how can we insulate against that to make it less bad if it happens? what's the cost of that insulation?"

Richard Feldman (Oct 02 2024 at 14:45):

Kasper Møller Andersen (Oct 02 2024 at 17:11):

Sure, and nobody sets out to overflow the stack or run out of memory. Before I go on, I would like to better understand why you don’t want developers to write this insulation code in Roc though?

Richard Feldman (Oct 02 2024 at 17:29):

if you can recover from crash in Roc code, then it has become throw and we have added try/catch, and it will become used for recoverable error handling

Kasper Møller Andersen (Oct 06 2024 at 11:04):

Is that a problem in Rust, since it has that exact setup? It's also something that can be designed in a few different ways in my mind, to discourage such use if that's really a worry.

To better motivate why I think it makes sense, I think software in general has really awful user-facing errors. Part of that, I think, is that programming languages and frameworks make it very easy to "create" an error (crash, Err, etc.), but then leave it up to developers to make it good for users.

The reason I wanted to single out logic errors, is because Roc has effectively given the tooling to create errors, with no chance for developers to create a good recovery story for users, outside of modifying the platform directly.

It's of course alright to say that these logic errors should be rare enough that it's not something Roc developers are actually exposed to dealing with. But having a way to trigger errors without a way of creating a user friendly recovery story just makes me uneasy.

Richard Feldman (Oct 06 2024 at 11:15):

I wouldn't say Rust has that exact setup - if you look at the docs for panic recovery they say things like "this doesn't actually catch all panics, and also here are a bunch of things to be careful of if you use this with other language features"

Richard Feldman (Oct 06 2024 at 11:19):

right, but if you read "logic error" as "logic mistake" then the feature request becomes "Roc needs a way to correct arbitrary mistakes in other people's code without modifying that code"

Richard Feldman (Oct 06 2024 at 11:21):

Roc originally did not have a crash keyword; if you made a mistake that crashed the program, it was probably because you ran it out of memory or tried to do integer division by zero

Richard Feldman (Oct 06 2024 at 11:22):

Richard Feldman (Oct 06 2024 at 11:24):

so I think if there's a case to be made for a feature like this, it has to be made without reference to crash

Richard Feldman (Oct 06 2024 at 11:26):

for example, something like "if I call library code and it overflows the stack, or the heap, or goes into an infinite loop, or does integer division by zero, I want my application code to be able to defensively write code which recovers from that possibility, and spawning a new process to run that code is not a sufficient solution"

Kasper Møller Andersen (Oct 06 2024 at 11:50):

I think there's a gap between the two that doesn't need to be crossed though. I'm not interested in "fixing" the error, but it's still quite important that:

Going back to the example of a game where the animation system crashes, you probably want something like the following to happen:

Those are the kinds of scenarios I think are worth being able to deal with, which right now, I would need to fork the platform to do (as I understand it).

Richard Feldman (Oct 06 2024 at 12:19):

Richard Feldman (Oct 06 2024 at 12:20):

Richard Feldman (Oct 06 2024 at 12:21):

I think maybe another way to frame this is: for the purposes of this discussion, let's assume that crash has been removed from the language. There is no longer a keyword for that, or any equivalent of it.

Richard Feldman (Oct 06 2024 at 12:22):

the animation system did not do a crash because that doesn't exist. So what did it do that we're trying to recover from?

Kasper Møller Andersen (Oct 06 2024 at 14:21):

I guess it doesn’t actually matter. What’s important to me is the experience of recovering from something deemed unrecoverable. I would expect it to be the same in all those cases.

Brendan Hansknecht (Oct 06 2024 at 15:32):

I think it is important to remember that platforms are not set in stone. If you are working on a game, you definitely own the platform. So you can modify it to crash however you like. Even if you are working on basic-cli, you can fork it and change how panics are handled.

Brendan Hansknecht (Oct 06 2024 at 15:34):

I do agree that crash is scary. There is a reason that I push for essentially no one ever using it (especially in libraries) unless they somehow hit an unreachable state. I think the world before crash was much worse than the world we have now with crash. At some point there are going to be:

Without crash, hacks have to be used that generate terrible error messages in order to cause a crash in an unreachable state: 255u8 + 1.

Brendan Hansknecht (Oct 06 2024 at 15:37):

All this said, I think crash should be used as sparingly as possible. If crash becomes common in libraries and actually gets hit semi-often, I would hope the community as a whole would put pressure on those libraries to remove crash, or switch to different libraries. If that fails, I think that Roc has a major issue. That would be proof to me that adding crash to Roc was probably a mistake.

In other words, in my view (as one of the main people that pushed for crash existing), feeling like you need to catch crashes would mean adding crash was a mistake. It would not be a reason to add some form of catch.

Brendan Hansknecht (Oct 06 2024 at 15:48):

The general advice is to only use it at the ffi (eg rust plugins for godot) or stability boundary (eg stop web request from killing webserver). Both of these exist in the platform layer for roc.

Richard Feldman (Oct 06 2024 at 16:36):

interesting - so in other languages, do you use things like try/catch around library calls in case they stack overflow, or heap overflow?

Richard Feldman (Oct 06 2024 at 16:36):

Kasper Møller Andersen (Oct 06 2024 at 18:09):

This may also just be somewhere where my mental model is off. With what I know about platforms today, it seems like they are responsible for a whole heap of things that I don't want to take responsibility for necessarily (like building good stack traces). So my intuition around forking a platform is that it's analogous to forking the JVM because you want different crash handling in your Java code. I know the JVM is a much bigger beast than a Roc platform, but in my mind, they fill a similar space.

Kasper Møller Andersen (Oct 06 2024 at 19:03):

I think the term "stability boundary" describes well what I feel is kind of missing.

Not for those kinds of crashes, but I've definitely seen Scala and Java code with extra catches in it to catch various runtime exceptions (which are basically the Java equivalents of crash). Part of that is also just Java having poor standards around what kinds of exceptions should actually be runtime exceptions rather than checked exceptions though.

As @Brendan Hansknecht was saying, if crash becomes a common thing in libraries, it can potentially make for a poor experience of using Roc libraries. But the way to minimize the problem is to encourage people to model their code in such a way that explicit crashing is not needed. But I'm not confident that people can be relied on enough to do so. Going back to Rust, it's basically the argument for social pressure on libraries to avoid unsafe code. This has kind of worked, but there have also been times where people came under a lot of pressure to get rid of their unsafe usage (e.g. the Actix debacle).

So I guess my position is that I really want to keep crash around, but I'm also looking for tooling to help me define and maintain my stability boundaries. Even in a web server where a request causes a crash, what if that means I've written some data into the database but not completed the rest of the required work? Is all the work being done for that request being seen as one atomic transaction that will get rolled back, or do I need to do something myself in case of a crash?

Richard Feldman (Oct 06 2024 at 19:12):

just to be totally honest, I think if we get to the point where crash is overused in libraries because library authors don't want to handle errors via Result (or otherwise), my ordering of preferences would put "go back to not having crash again" as preferable to "add a way to recover from crash"

Richard Feldman (Oct 06 2024 at 19:14):

basically, there is always going to be some point where library authors can make mistakes

Richard Feldman (Oct 06 2024 at 19:14):

and I don't think we should try to have a language feature that corrects arbitrary mistakes

Richard Feldman (Oct 06 2024 at 19:14):

Richard Feldman (Oct 06 2024 at 19:15):

a mistake in third-party code is going to cause a degraded user experience no matter what

Richard Feldman (Oct 06 2024 at 19:15):

Richard Feldman (Oct 06 2024 at 19:17):

Richard Feldman (Oct 06 2024 at 19:18):

for example, you can recover from an accidental infinite loop by running that code on a different thread, timing it, and killing the thread if it runs longer than some configured timeout

Richard Feldman (Oct 06 2024 at 19:19):

this has downsides such as performance and code complexity, but it can result in a less bad user experience (an error message) than an infinite loop
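A minimal sketch of that technique in Rust, assuming a hypothetical run_with_timeout helper and WorkError type (neither is from an existing platform). One caveat: Rust's standard library has no way to kill a thread, so this version only stops waiting for the worker; actually terminating runaway code would need a separate process or cooperative cancellation.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical error type: why no result came back.
#[derive(Debug, PartialEq)]
pub enum WorkError {
    /// The work ran longer than the configured timeout.
    TimedOut,
    /// The worker thread panicked before producing a result.
    Panicked,
}

/// Run `work` on its own thread and wait at most `timeout` for the result.
pub fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Result<T, WorkError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the send fails and the result is dropped.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(WorkError::TimedOut),
        // The sender was dropped without sending, which means the worker unwound.
        Err(RecvTimeoutError::Disconnected) => Err(WorkError::Panicked),
    }
}
```

The cost mentioned above is visible even in this small version: every call pays for a thread and a channel, and the caller has to handle two extra failure modes.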

Richard Feldman (Oct 06 2024 at 19:19):

Brendan Hansknecht (Oct 06 2024 at 19:19):

I do think there is an important point brought up in terms of stability boundary.

If in basic webserver I start a transaction and then a crash happens, I believe it will completely clog up sqlite. So a user that starts a transaction and then runs 3rd party code would be left in a fundamentally broken state if a crash happens. They have no way to fix this other than to avoid calling any code that might crash after a transaction is started on a thread.

Richard Feldman (Oct 06 2024 at 19:20):

Brendan Hansknecht (Oct 06 2024 at 19:20):

Brendan Hansknecht (Oct 06 2024 at 19:21):

It just knows about statements (which would all clean up but that transaction would still be open)

Brendan Hansknecht (Oct 06 2024 at 19:21):

It also definitely wouldn't know about a transaction to something like postgres over TCP.

Brendan Hansknecht (Oct 06 2024 at 19:22):

In sqlite, one statement requests beginning the transaction. Then you run n statements in the transaction. Then you run a final statement to close the transaction.

Brendan Hansknecht (Oct 06 2024 at 19:22):

So even if you clean up all statements you may still be in the middle of a transaction.

Richard Feldman (Oct 06 2024 at 19:30):

oh you're assuming the platform isn't aware of SQLite and is instead just offering a socket primitive

Richard Feldman (Oct 06 2024 at 19:30):

Brendan Hansknecht (Oct 06 2024 at 19:30):

So for sqlite, I am just noting the state today in basic webserver. For postgres, I am trying to point out the general issue.

Richard Feldman (Oct 06 2024 at 19:31):

Richard Feldman (Oct 06 2024 at 19:32):

so an individual request handler overflows the stack and we want to close its SQLite transactions, right?

Brendan Hansknecht (Oct 06 2024 at 19:32):

I don't think that is reasonable, because in many systems stack overflows always crash the program. Like completely crash it.

So I would prefer to keep it in terms of something that might happen but would generally recover with a 500. So numeric overflow or any sort of exception otherwise.

Brendan Hansknecht (Oct 06 2024 at 19:33):

Richard Feldman (Oct 06 2024 at 19:34):

division by zero we can choose to handle in other ways, e.g. by having it return 0 like pony does

Richard Feldman (Oct 06 2024 at 19:34):

I'd specifically like to focus on stack overflow because it is unquestionably unrecoverable

Brendan Hansknecht (Oct 06 2024 at 19:35):

You want something in the grey area. How about integer overflow which we want to have crash due to all of the correctness implications

Brendan Hansknecht (Oct 06 2024 at 19:36):

Like on stack overflow, we aren't even going to 500, the entire webserver is probably going to crash.

Richard Feldman (Oct 06 2024 at 19:36):

ok so I think this actually gets to the fundamental difference in perspective here

Richard Feldman (Oct 06 2024 at 19:37):

to me, the only cases we are talking about are situations like stack overflows where if it happens, we have decided it should be game over and there's no recovering from it

Brendan Hansknecht (Oct 06 2024 at 19:38):

Stack overflow doesn't even call roc_panic though. It literally doesn't hit the path we are discussing

Richard Feldman (Oct 06 2024 at 19:38):

Richard Feldman (Oct 06 2024 at 19:39):

Brendan Hansknecht (Oct 06 2024 at 19:40):

Richard Feldman (Oct 06 2024 at 19:43):

if people are misusing crash, then to me the default best response to that is to try to educate people about what it's for.

if that doesn't work, then the next best response is to remove crash from the language so that if you're writing a library and hit a "this should never happen" situation, then your only option is to use one of the hacks we used to use (e.g. inducing a stack overflow) to address it, at which point it at least becomes obvious that you should never use that technique for error handling

Richard Feldman (Oct 06 2024 at 19:44):

the reason I think it's useful to talk about how to handle SQLite transactions in the presence of a stack overflow is that it's not a situation where either of those solutions would help

Richard Feldman (Oct 06 2024 at 19:44):

Richard Feldman (Oct 06 2024 at 19:45):

and we also couldn't fix it by changing how integer overflow or division by zero works

Brendan Hansknecht (Oct 06 2024 at 19:48):

Brendan Hansknecht (Oct 06 2024 at 19:49):

So personally, I don't care much about 1 or 2. 3 is the important problematic case.

Brendan Hansknecht (Oct 06 2024 at 19:54):

As a concrete case of what could happen in basic webserver. We can even pretend crash is gone:

Richard Feldman (Oct 06 2024 at 19:54):

well state 2 doesn't have to kill the entire root process - the host can install a signal handler, unwind, and return 500

Richard Feldman (Oct 06 2024 at 19:55):

Brendan Hansknecht (Oct 06 2024 at 19:55):

Sure, then 2 falls into 3 and it matters, but it specifically only matters if the host might recover in a potentially invalid state.

Richard Feldman (Oct 06 2024 at 19:56):

Brendan Hansknecht (Oct 06 2024 at 19:57):

Personally. I would like to pin to the concrete example. I think it is what would lead to users wanting some form of catch

Brendan Hansknecht (Oct 06 2024 at 19:57):

Or at least some form of errdefer or finally to clean up in the case of an exception.

Richard Feldman (Oct 06 2024 at 19:59):

Richard Feldman (Oct 06 2024 at 20:00):

Brendan Hansknecht (Oct 06 2024 at 20:04):

I'm just trying to point it out as a motivating example. One solution would be c++ style exceptions then enable finally to clean up.

Richard Feldman (Oct 06 2024 at 20:05):

Brendan Hansknecht (Oct 06 2024 at 20:05):

Or even could be done without c++ exceptions before calling roc_panic. Like register functions to call to clean up before calling roc_panic. I just assume a platform won't be able to deal with all of these cases, but maybe that is wrong. It at least is worth thinking about critically

Brendan Hansknecht (Oct 06 2024 at 20:06):

That would be optimal. Also, the platform may know about sqlite, but sqlite transactions are built into standard statements. So the platform may have zero control over transactions (state of current basic webserver)

Richard Feldman (Oct 06 2024 at 20:09):

ok so the way I'm approaching this is "I want to avoid having exception handling semantics in Roc applications, so how can we solve this in a way that gets to a good outcome in this scenario without doing that?"

Richard Feldman (Oct 06 2024 at 20:09):

one thing that comes to mind is that it seems like the best experience if the author of the SQLite library can solve this

Richard Feldman (Oct 06 2024 at 20:10):

Brendan Hansknecht (Oct 06 2024 at 20:10):

Richard Feldman (Oct 06 2024 at 20:13):

one idea for how that could work: have the platform expose a function that accepts a boxed "cleanup function" and returns a token value that works like a file descriptor

Richard Feldman (Oct 06 2024 at 20:14):

Brendan Hansknecht (Oct 06 2024 at 20:15):

We could even make roc_panic take a list of cleanup closures that the platform can use if they want. Add some sort of Task.finally that just adds onto the list. Not sure when exactly it would be cleared out though. But yeah. Something like that which gives platform cleanup control should work

Richard Feldman (Oct 06 2024 at 20:16):

so then the SQLite author can make sure that token has the same lifetime as the transaction normally, so when it gets deallocated it closes the transaction

Richard Feldman (Oct 06 2024 at 20:17):

and if the request handler stack overflows, the host has a list of tokens that never got resolved, and can run them all before 500ing
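A rough host-side sketch of this token idea in Rust. The CleanupRegistry and Token names and their methods are hypothetical, invented for this illustration; they are not part of any existing platform or of roc_panic.

```rust
use std::collections::BTreeMap;

/// Opaque handle handed back to the library, used like a file descriptor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token(u64);

/// Hypothetical per-request registry of cleanup functions kept by the host.
#[derive(Default)]
pub struct CleanupRegistry {
    next_id: u64,
    pending: BTreeMap<u64, Box<dyn FnOnce()>>,
}

impl CleanupRegistry {
    /// Store a boxed cleanup function and return a token for it.
    pub fn register(&mut self, cleanup: Box<dyn FnOnce()>) -> Token {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, cleanup);
        Token(id)
    }

    /// Normal path: the token is deallocated, so its cleanup runs right away.
    /// Returns false if the token was already released.
    pub fn release(&mut self, token: Token) -> bool {
        match self.pending.remove(&token.0) {
            Some(cleanup) => {
                cleanup();
                true
            }
            None => false,
        }
    }

    /// Crash path: run every cleanup that never got resolved, most recent
    /// first, before the host answers with a 500. Returns how many ran.
    pub fn run_pending(&mut self) -> usize {
        let pending = std::mem::take(&mut self.pending);
        let count = pending.len();
        for (_, cleanup) in pending.into_iter().rev() {
            cleanup();
        }
        count
    }
}
```

A SQLite library would register a closure that rolls back the transaction and release the token when the transaction ends normally; after a crash in the request handler, the host calls run_pending before responding.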

Richard Feldman (Oct 06 2024 at 20:18):

Brendan Hansknecht (Oct 06 2024 at 20:19):

Kasper Møller Andersen (Oct 06 2024 at 20:51):

Through all this, I’ve had a hard time figuring out what users are expected to do with their platforms. It feels like there’s two different ways of looking at platforms, which I find hard to reconcile:

To me, this sounds like there will be an explosion of very specific, and not necessarily well maintained, platforms a la basic-webserver-mssql-windowsserver2022, basic-webserver-postgres-ubuntu, etc.
This doesn’t feel like a healthy place to end up in though. It feels like any single platform consists of a number of choices, but an application author can’t easily compose those choices without forking a complete platform, and handling the full responsibility this entails.

I think a healthier position would be for some “base” platforms to be incredibly small and stable, to provide the raw primitives that a full platform needs (reusable Roc setup, IO, etc), and then have another layer on top where you define the interface and behavior that you want to work against. In other words, you may have a “base” ubuntu platform, which exposes the capabilities you get in Ubuntu, and then someone would build a basic-cli API on top of that. Does that make sense, or am I misunderstanding platforms here?

Brendan Hansknecht (Oct 06 2024 at 22:14):

This could be done but would probably be a terrible experience and wouldn't be worth trying to do. If we need an ubuntu platform, the platform concept has probably failed. That sounds like a generic standard library with all of the io primitives.

These probably will exist for certain power users, but not for the average user. That said, forking an existing platform and slightly modifying it may be reasonably common. Not some super fork like integrating an entire mssql library into the platform, but a tiny fork that may modify the panic handler to send requests to your logging server.

I think that long term, most platforms will be in the vein of basic-webserver in terms of complexity. They won't be trying to own the world, but they will be trying to have a robust set of primitives (for a webserver, these primitives are most likely for file io and for socket io). Once you have those two, you can do essentially everything a webserver would want to do. The postgres library can be built right in roc.

Brendan Hansknecht (Oct 06 2024 at 22:16):

The last part of what richard and I were talking about is how a platform like basic-webserver could offer the ability to run cleanup code after a panic. This would enable a postgres library to hook in and request for transactions to be rolled back on panic. It would enable more control from the end user in terms of adding a finally to any crash. This is all within a single platform.

Richard Feldman (Oct 06 2024 at 22:40):

Richard Feldman (Oct 06 2024 at 22:41):

I wouldn't expect many platforms to be operating system specific (although it's always possible) outside of maybe embedded systems

Richard Feldman (Oct 06 2024 at 22:41):

so I don't think it would be basic-webserver-postgres-ubuntu, but I could imagine a postgres-webserver platform existing

Richard Feldman (Oct 06 2024 at 23:22):

just to clarify on this part, I don't actually think this is the right way to think about it

Richard Feldman (Oct 06 2024 at 23:22):

for example, let's say I am an application author and I spawn an OS subprocess and run some logic in there

Richard Feldman (Oct 06 2024 at 23:23):

that's an example of creating a very hard boundary where anything can go wrong and I can definitely recover from it, and the platform doesn't need to be customized to do that

Richard Feldman (Oct 06 2024 at 23:28):

however, the platform is in charge of whether to offer primitives for spawning processes

Richard Feldman (Oct 06 2024 at 23:52):

maybe a bit of relevant context I should have started with: in Elm, you can't publish a package with Elm's equivalent of a crash in it

Richard Feldman (Oct 06 2024 at 23:52):

so in Elm, if you really want to publish a package that has an "unreachable" state in it, you really do have to do one of the hacks

Richard Feldman (Oct 06 2024 at 23:54):

Elm applications have a reputation for approximately never crashing in practice, and I think this is part of the reason why: aside from mistakes (e.g. of course libraries can still overflow the stack or get into an infinite loop) you really have to go out of your way to have a library crash

Richard Feldman (Oct 06 2024 at 23:54):

compare this to, for example, unwrap in Rust - which I think would be a mistake to include in Roc, precisely because it makes it so easy to introduce an unnecessary crash

Richard Feldman (Oct 06 2024 at 23:56):

the outcome Elm has seen from not having a crash equivalent is a major reason that Roc did not have crash at first, and why I was hesitant to introduce it (but I did find the argument compelling, and continue to, that if you think it's going to be unreachable, but it actually is reached in practice somehow, it's definitely best to be able to at least include some context on what happened that you thought couldn't happen)

Richard Feldman (Oct 06 2024 at 23:56):

it's also a reason that I am so strongly resistant to adding recovery mechanisms in userspace: it seems like every ecosystem that has these gets more crashes in practice than the Elm ecosystem does

Richard Feldman (Oct 06 2024 at 23:57):

and I think a contributing factor there is that culturally it's not only okay, but expected in many languages to throw for error handling

Richard Feldman (Oct 06 2024 at 23:58):

and if it doesn't end up being caught, shrug, not my problem; someone else should have handled it

Richard Feldman (Oct 06 2024 at 23:58):

whereas in Elm, there is only one way to say "it's someone else's job to handle this," which is Result (or some equivalent)

Richard Feldman (Oct 06 2024 at 23:58):

Richard Feldman (Oct 06 2024 at 23:59):

in order to get the same "applications essentially never crash in practice" outcome that Elm has gotten, and which systems with try/catch (and Rust too) are extremely far away from getting

Richard Feldman (Oct 07 2024 at 00:00):

Richard Feldman (Oct 07 2024 at 00:01):

because based on the experience of Elm vs every other language, I can't possibly see how #5 would do anything other than explode the number of crashes that actually happen to real end users

Richard Feldman (Oct 07 2024 at 00:01):

if it did anything other than that, Roc would be the first language where that turned out to be true, so I'd be really curious to see why we should expect it to be different from all the others :big_smile:

Oskar Hahn (Oct 07 2024 at 06:32):

I like that a Roc application cannot recover from a crash. But there is a counterexample. Go has panic and recover. The normal way in Go to handle errors is by returning an error value. Many people do not like the way you have to handle errors in Go. But still, (nearly) nobody is reaching for panic and recover.

I think the only time recover is used in Go is when you are writing a framework and you do not expect your users to write high quality code. So you have to expect runtime panics like division by zero, nil pointer dereference or out-of-bounds access. In the context of Roc, the framework is the platform. So only the platform should use recover. This is already possible.

Oskar Hahn (Oct 07 2024 at 06:44):

If the pressure to add recover gets too high, then a way to add it could be that you can only clean up your own mess with it. What I mean is that it can only recover crashes from your own module/application, but never above the module boundary.

So you would implement unwind, but internally check the stack trace. If the stack-trace only contains entries from builtins or the module calling recover, then it returns a Result. But if the stack contains other modules, it continues the crash.

With a recover like this, it would be impossible for library authors to expect their users to recover. But it would be possible to use it for your own code. Like

doSomeCalculation = \a, b, c ->
  # This could overflow or divide by zero
  a * b / c

exportedFunction = \a, b, c ->
  recover (doSomeCalculation a b c)
  |> Result.mapErr \_-> OverflowOrZeroDivision

Richard Feldman (Oct 07 2024 at 11:16):

that's an interesting idea, although it seems likely that with that restriction there wouldn't be any demand for it in practice anyway! :big_smile:

Kasper Møller Andersen (Oct 07 2024 at 15:01):

Okay, I wasn't sure if you were implying that the database driver had to be integrated into the platform to give a robust experience. And yeah, having a platform per OS probably doesn't make sense, but what I was really going for was just a way to have shared primitives that you don't need to take responsibility for when forking a platform. Like you could have a Rust crate that defines a lot of common Roc operations (IO, panic handling, etc.), and then it would be easy to write your own platform by just calling out to that library for anything where you just want the "standard" behavior. This way you also get all the usual benefits of version tracking, and you don't have to worry about keeping your fork up to date.

Richard Feldman (Oct 07 2024 at 15:04):

Kasper Møller Andersen (Oct 07 2024 at 15:06):

I do agree it's nice that Elm libraries tend to not crash, but it's actually also something that makes me a bit nervous. Having worked for a number of years on a big and complicated Elm codebase, I know there will be times when you just end up in these branches, and the best thing you can really do is return some default value when it happens, because Elm doesn't give you tools to do better. So I don't think people would use their own custom stack overflow in general when they have to handle such a branch, but instead, I think the most common behavior is to just eat the failure, and try to keep going anyway. Which is also something I don't think you can really measure how often it happens across the ecosystem, so that's something of a ghost that's bothering me there.

Kasper Møller Andersen (Oct 07 2024 at 15:08):

Also, fun side note: this morning I got into work and the first thing I see is someone with a PR doing handling of stack overflow exceptions :big_smile:

Kasper Møller Andersen (Oct 07 2024 at 15:14):

I'm not sure how much I can say, so I'll be light on details, but we have a runtime that is executing queries on the JVM, and those can blow the stack. And we'd rather give the queries access to as much stack memory as possible, and recover when they exceed it, rather than limit how much stack they can use in the first place and not let them use the hardware to its full potential.

Kasper Møller Andersen (Oct 07 2024 at 15:27):

Richard Feldman (Oct 07 2024 at 15:29):

I'd say something more like "runtime crashes" (errors in general are for Result!) but yeah :big_smile:

Kasper Møller Andersen (Oct 07 2024 at 15:32):

Richard Feldman (Oct 07 2024 at 15:32):

I do think graceful recovery (when possible) is beneficial to user experience, although ideally it would be accompanied by logging so someone can find out it happened!

Silently giving an incorrect answer, on the other hand (e.g. division by zero silently returning a zero and continuing as if nothing wrong had happened) is definitely bad, and I'd say worse than crashing

Kasper Møller Andersen (Oct 07 2024 at 15:40):

Yup, and those are the ones I’m worried about, because they can be really hard to spot and I think many people may not be aware enough of them. But anyway, that’s all just good arguments for why Roc does as it does today. I just wanted to be sure it’s not about stability at all costs :blush:

Richard Feldman (Oct 07 2024 at 15:48):

Richard Feldman (Oct 07 2024 at 15:49):

Kasper Møller Andersen (Oct 07 2024 at 19:24):

Likewise! It’s always nice to get to go through a subject in a thorough manner, when everyone is gracious enough to talk through rookie understandings :smiling_face:

Kasper Møller Andersen (Oct 07 2024 at 20:01):

One example of this becoming an awkward fit I think is with GraphQL. GraphQL is most often used over HTTP, but it doesn’t need to be. HTTP is just one of many potential transports you can use for GraphQL. Because GraphQL is transport agnostic, it includes its own error handling mechanism. So ideally, if you have a webserver serving a GraphQL API, the GraphQL layer is actually the stability boundary. To have that be the case in Roc, you need a platform that is specifically for serving a GraphQL API with a webserver, as opposed to using a webserver platform where you just plug in a GraphQL library.

It’s still not a huge deal, because these errors should be quite rare of course. But I do think it works as an example of having a stability boundary above the platform.

Brendan Hansknecht (Oct 07 2024 at 23:03):

I think as long as you abstract away the protocol you could have a basic-webserver type platform that is separate from a GraphQL library. The library will just accept primitives that are protocol generic to build its requests.

Brendan Hansknecht (Oct 07 2024 at 23:04):

That said, basic-webserver may not enable websockets. So you may still need to switch platforms if you want to serve graphql over websockets. But you should still be able to use a protocol agnostic graphql library either way.

Richard Feldman (Oct 07 2024 at 23:53):

I could be wrong, but my understanding is that websockets can be implemented on top of a normal TCP socket

Brendan Hansknecht (Oct 08 2024 at 00:12):

I think so, but I have never looked into it. Just trying to point out that whatever graphql is implemented on top of can be decoupled from a graphql library in roc. A platform doesn't need to support all protocols. The library doesn't need to be specialized to the protocol. Should be able to make it generic and flexible.

Kasper Møller Andersen (Oct 08 2024 at 05:03):

My point related to the topic was that the GraphQL layer is where you want to catch a crash though :blush:

Brendan Hansknecht (Oct 08 2024 at 05:08):

Oh... you want to send the graphql equivalent of an http 500 status code if there is a crash. And you want to send it over whatever protocol the platform supports (http, websocket, etc)

Kasper Møller Andersen (Oct 08 2024 at 05:19):

Precisely! A GraphQL response should encode the error in its own format. That's why GraphQL over HTTP will generally send errors with an HTTP 200 status, for example.

Brendan Hansknecht (Oct 08 2024 at 05:31):

Yeah, so to support this case in a generic form, you would at least need to be able to register what the crash response should be. Graphql could register a 200 message with the text:

{ "errors": [{ "message": "Server error" }]}

Kasper Møller Andersen (Oct 08 2024 at 05:38):

Any webserver can serve both GraphQL and REST endpoints at the same time, so the error handling would need to differentiate on which kind of endpoint was reached
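
A minimal sketch of the idea discussed above, written in Rust because the thread compares this to `catch_unwind`. Everything here is hypothetical and only for illustration: `EndpointKind`, `Response`, `crash_response`, and `handle` are made-up names, not part of any Roc platform or of basic-webserver. It assumes the host is built with unwinding panics (the Rust default), since `catch_unwind` cannot recover from aborts.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical endpoint kinds; a real router would carry more information.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EndpointKind {
    Rest,
    GraphQl,
}

/// Minimal response type for this sketch.
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// The response to send when a handler crashes, chosen per endpoint kind.
pub fn crash_response(kind: EndpointKind) -> Response {
    match kind {
        // REST signals the failure through the HTTP status code.
        EndpointKind::Rest => Response {
            status: 500,
            body: "Internal Server Error".to_string(),
        },
        // GraphQL encodes the failure in its own format and keeps HTTP 200.
        EndpointKind::GraphQl => Response {
            status: 200,
            body: r#"{ "errors": [{ "message": "Server error" }]}"#.to_string(),
        },
    }
}

/// Runs a handler and turns a panic into the crash response for its endpoint kind.
pub fn handle<F>(kind: EndpointKind, handler: F) -> Response
where
    F: FnOnce() -> Response,
{
    // AssertUnwindSafe is acceptable here because the handler's state is
    // discarded after a panic; nothing it touched is reused.
    match panic::catch_unwind(AssertUnwindSafe(handler)) {
        Ok(response) => response,
        Err(_) => crash_response(kind),
    }
}
```

Calling `handle(EndpointKind::GraphQl, || panic!("logic error"))` would yield the 200 response carrying the GraphQL error body, while the same crash behind `EndpointKind::Rest` would yield a 500. A real setup would also log the crash so someone can find out it happened.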

Brendan Hansknecht (Oct 08 2024 at 05:40):

Kasper Møller Andersen (Oct 08 2024 at 07:11):

While this error handling would work in this case, this also reflects what I was trying to get at with platforms not being composable enough. That is, if a user has an existing webserver (based on any general web server platform like basic-webserver, nea, or something else) which doesn’t offer this kind of error handling, and they now want to serve GraphQL, they’ll need to fork the platform to do so. It’s still completely reasonable for a platform author to not have this error handling in their web server, though. So it feels to me like this error handling should be application code, and not something platform authors have the only say in.

Again, not a huge deal for this particular example, but it just makes my spidey sense tingle nonetheless :blush:
