In the world of software, I think it's fair to divide the kinds of errors we need to deal with into two camps:
Predictable errors are where we model things with Result primarily, because we know all the error conditions ahead of time, roughly.
But unpredictable errors are the things where we don't know what's going to happen. Strictly speaking, that also includes things like segfaults, but I primarily want to view the unpredictable errors as being logic errors here. Cases where Roc encourages using crash today if you ever reach an impossible branch for example.
There are a couple of important facets of logic errors as they are handled, not only in Roc, but in other languages too:
crash and expect provide this experience in Roc, because they allow developers to be quickly notified that something is wrong during development, and we don't have to bend the type system to model something that shouldn't actually be happening in the first place.
However, I think there's a couple of issues that would be useful to address:
crash and expect feel very much like evolutions of what other languages have done. This makes them feel like a case of "developers want to be able to express this" rather than a holistic look at how to actually handle logic errors.
To be clear, I think there is a real need for something like crash and expect. They are important for making it easy to spot logic errors during development, because they are very loud. But if I'm developing some web service, I might be serving many REST endpoints from the same service. If one of those endpoints calls a function that happens to crash, then I don't want it to bring down my entire service. At the very least, I just want to make the single endpoint unavailable while I investigate, so the rest of the service can remain live. So even if I can't do anything useful with the logic error itself, I can still offer a degraded service that remains useful.
So my first question is: should there be a way to catch a crash, a la catch_unwind in Rust?
One of my problems with catch_unwind is that it's very general, and you can really put it anywhere. But for logic errors specifically, I think there's usually a clear boundary of where you want it:
Maybe there's a way to (optionally) lint that you are remembering to call catch in only these places?
My next question though, is whether the behaviors in crash and expect are even the right primitives to begin with? Maybe there are better tools waiting to be found here?
One small evolution might be to make crash and expect parametrized over the behavior we expect them to trigger. For example, what if module params were used to pass in a "crasher" that could specialise how they would crash and define any guards where the crash would be caught? I don't really know if this would be feasible or even sensible, but I want to float the idea at least.
Or maybe we could define something more targeted at dealing with logic errors, e.g. LogicErrors.detectedError : Severity, Str -> .... This function would allow you to report that you hit an impossible error case to the platform, and then the platform could decide what to do. For example, if you pass in a severity of Fatal, it might crash as today. Or it might decide to block access to the specific REST endpoint where this error was detected, and automatically return a 503 error on it, until it can be manually unblocked.
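To make the detectedError idea a bit more concrete, here is a minimal host-side sketch in Rust. Everything in it is an assumption made for illustration: the `Severity` variants, the `Decision` type, the `Platform` struct and the per-endpoint blocking policy are hypothetical and do not exist in Roc or in any current platform.

```rust
use std::collections::HashSet;

/// Hypothetical severity levels; only `Fatal` is named in the discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Degraded,
    Fatal,
}

/// What the (hypothetical) platform decided to do with a reported error.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    BlockEndpoint,
    Abort,
}

/// Hypothetical platform state: the set of endpoints currently blocked.
#[derive(Default)]
pub struct Platform {
    blocked: HashSet<String>,
}

impl Platform {
    /// Report that an "impossible" branch was reached while serving `endpoint`.
    /// The platform, not the application, picks the reaction.
    pub fn detected_error(&mut self, endpoint: &str, severity: Severity, msg: &str) -> Decision {
        eprintln!("logic error at {} ({:?}): {}", endpoint, severity, msg);
        match severity {
            // A real host would crash here, as `crash` does today.
            Severity::Fatal => Decision::Abort,
            Severity::Degraded => {
                self.blocked.insert(endpoint.to_string());
                Decision::BlockEndpoint
            }
        }
    }

    /// Status the host would answer with before running any handler.
    pub fn status_for(&self, endpoint: &str) -> u16 {
        if self.blocked.contains(endpoint) { 503 } else { 200 }
    }

    /// Manual unblock once the problem has been investigated.
    pub fn unblock(&mut self, endpoint: &str) {
        self.blocked.remove(endpoint);
    }
}
```

In this sketch a `Degraded` report makes `status_for` answer 503 for that one endpoint until `unblock` is called, while other endpoints keep answering normally.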
One thing to keep in mind, I'm pretty sure platforms can choose how to handle crashes, e.g. a web server can choose to let each request crash separately
yeah I think this is the answer!
I definitely think we should never add anything to the language that applications or packages could use to recover from crash
but platform authors already can, and absolutely should!
To be clear, let’s not get hung up on web servers. But you’re saying that the platform is responsible for insulating the software from crashes?
As another example, let’s say I’m building a game, and there’s a crash in my animation subsystem. If the platform is running animations isolated from the rest of the code, then it can recover by just disabling animations for example. But if the animation is just running with the rest of the game loop, can the platform do that?
I think the best way to think about this is in terms of errors, e.g. mistakes
"if there is a mistake in this part of the code, what's the worst that can happen? how can we insulate against that to make it less bad if it happens? what's the cost of that insulation?"
forget about crash
Sure, and nobody sets out to overflow the stack or run out of memory. Before I go on, I would like to better understand why you don’t want developers to write this insulation code in Roc though?
if you can recover from crash in Roc code, then it has become throw and we have added try/catch, and it will become used for recoverable error handling
Is that a problem in Rust, since it has that exact setup? It's also something that can be designed in a few different ways in my mind, to discourage such use if that's really a worry.
To better motivate why I think it makes sense, I think software in general has really awful user-facing errors. Part of that, I think, is that programming languages and frameworks make it very easy to "create" an error (crash, Err, etc.), but then leave it up to developers to make it good for users.
The reason I wanted to single out logic errors, is because Roc has effectively given the tooling to create errors, with no chance for developers to create a good recovery story for users, outside of modifying the platform directly.
It's of course alright to say that these logic errors should be rare enough that it's not something Roc developers are actually exposed to dealing with. But having a way to trigger errors without a way of creating a user friendly recovery story just makes me uneasy.
Kasper Møller Andersen said:
Is that a problem in Rust, since it has that exact setup?
I wouldn't say Rust has that exact setup - if you look at the docs for panic recovery they say things like "this doesn't actually catch all panics, and also here are a bunch of things to be careful of if you use this with other language features"
Kasper Møller Andersen said:
The reason I wanted to single out logic errors, is because Roc has effectively given the tooling to create errors, with no chance for developers to create a good recovery story for users
right, but if you read "logic error" as "logic mistake" then the feature request becomes "Roc needs a way to correct arbitrary mistakes in other people's code without modifying that code"
Roc originally did not have a crash keyword; if you made a mistake that crashed the program, it was probably because you ran it out of memory or tried to do integer division by zero
and actually we tried having division return Result, but that didn't go well
so I think if there's a case to be made for a feature like this, it has to be made without reference to crash
for example, something like "if I call library code and it overflows the stack, or the heap, or goes into an infinite loop, or does integer division by zero, I want my application code to be able to defensively write code which recovers from that possibility, and spawning a new process to run that code is not a sufficient solution"
Richard Feldman said:
right, but if you read "logic error" as "logic mistake" then the feature request becomes "Roc needs a way to correct arbitrary mistakes in other people's code without modifying that code"
I think there's a gap between the two that doesn't need to be crossed though. I'm not interested in "fixing" the error, but it's still quite important that:
Going back to the example of a game where the animation system crashes, you probably want something like the following to happen:
Those are the kinds of scenarios I think are worth being able to deal with, which right now, I would need to fork the platform to do (as I understand it).
Kasper Møller Andersen said:
Going back to the example of a game where the animation system crashes
why did it crash?
(e.g. did it run out of stack space? heap space? integer division by zero?)
I think maybe another way to frame this is: for the purposes of this discussion, let's assume that crash has been removed from the language. There is no longer a keyword for that, or any equivalent of it.
in that world, what's the specific scenario we're trying to address?
the animation system did not do a crash because that doesn't exist. So what did it do that we're trying to recover from?
I guess it doesn’t actually matter. What’s important to me is the experience of recovering from something deemed unrecoverable. I would expect it to be the same in all those cases.
with no chance for developers to create a good recovery story for users, outside of modifying the platform directly.
I think it is important to remember that platforms are not set in stone. If you are working on a game, you definitely own the platform. So you can modify it to crash however you like. Even if you are working on basic-cli, you can fork it and change how panics are handled.
I do agree that crash is scary. There is a reason that I push for essentially no one ever using it (especially in libraries) unless they somehow hit an unreachable state. I think the world before crash was much worse than the world we have now with crash. At some point there are going to be:
Without crash, hacks have to be used that generate terrible error messages in order to cause a crash in an unreachable state: 255u8 + 1.
All this said, I think crash should be used as sparingly as possible. If crash becomes common in libraries and actually gets hit semi-often, I would hope the community as a whole would put pressure on those libraries to remove crash, or would switch to different libraries. If that fails, I think that Roc has a major issue. That would be proof to me that adding crash to Roc was probably a mistake.
In other words, in my view (as one of the main people that pushed for crash existing), feeling like you need to catch crashes would mean adding crash was a mistake. It would not be a reason to add some form of catch.
As a point of comparison, catch_unwind in rust.
The general advice is to only use it at the ffi (eg rust plugins for godot) or stability boundary (eg stop web request from killing webserver). Both of these exist in the platform layer for roc.
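As a rough illustration of the "stability boundary" use of catch_unwind in a Rust host, here is a minimal sketch. The `Response` type and `handle_request` function are hypothetical names, not part of basic-webserver or any real platform. Note that `std::panic::catch_unwind` only catches unwinding panics; it does nothing when the binary is built with `panic = "abort"`, or for failures such as stack overflow that abort the process.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical response type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// Runs one request handler behind a stability boundary: a panic inside
/// the handler becomes a 500 for this request instead of taking down the
/// whole server.
pub fn handle_request<F>(handler: F) -> Response
where
    F: FnOnce() -> Response,
{
    // AssertUnwindSafe is our promise that no broken shared state is
    // observed after a panic; a real host has to actually ensure that.
    match panic::catch_unwind(AssertUnwindSafe(handler)) {
        Ok(response) => response,
        Err(_) => Response {
            status: 500,
            body: "internal error".to_string(),
        },
    }
}
```

A handler that panics yields a 500, and the next call to `handle_request` still runs normally. The boundary lives in host code, which is the layer that corresponds to the platform in Roc.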
Kasper Møller Andersen said:
I guess it doesn’t actually matter. What’s important to me is the experience of recovering from something deemed unrecoverable. I would expect it to be the same in all those cases.
interesting - so in other languages, do you use things like try/catch around library calls in case they stack overflow, or heap overflow?
I don't, but maybe others do! :big_smile:
Brendan Hansknecht said:
I think it is important to remember that platforms are not set in stone. If you are working on a game, you definitely own the platform. So you can modify it to crash however you like. Even if you are working on basic-cli, you can fork it and change how panics are handled.
This may also just be somewhere where my mental model is off. With what I know about platforms today, it seems like they are responsible for a whole heap of things that I don't necessarily want to take responsibility for (like building good stack traces). So my intuition around forking a platform is that it's analogous to forking the JVM because you want different crash handling in your Java code. I know the JVM is a much bigger beast than a Roc platform, but in my mind, they fill a similar space.
I think the term "stability boundary" describes well what I feel is kind of missing.
Richard Feldman said:
interesting - so in other languages, do you use things like try/catch around library calls in case they stack overflow, or heap overflow?
Not for those kinds of crashes, but I've definitely seen Scala and Java code with extra catches in it to catch various runtime exceptions (which are basically the Java equivalents of crash). Part of that is also just Java having poor standards around what kinds of exceptions should actually be runtime exceptions rather than checked exceptions though.
As @Brendan Hansknecht was saying, if crash becomes a common thing in libraries, it can potentially make for a poor experience of using Roc libraries. But the way to minimize the problem is to encourage people to model their code in such a way that explicit crashing is not needed. But I'm not confident that people can be relied on enough to do so. Going back to Rust, it's basically the argument for social pressure on libraries to avoid unsafe code. This has kind of worked, but also had times where people came under a lot of pressure to get rid of their unsafe usage (i.e. the Actix debacle).
The things that came to the fore there were:
unsafe being something that can impact you a lot, but you have little to no control over whether your dependency tree contains any unsafe (and whether it's been audited, etc.)
unsafe when not needed, which the Actix maintainer went against, and a lot of people were unhappy about that.
So I guess my position is that I really want to keep crash around, but I'm also looking for tooling to help me define and maintain my stability boundaries. Even in a web server where a request causes a crash, what if that means I've written some data into the database but not completed the rest of the required work? Is all the work being done for that request being seen as one atomic transaction that will get rolled back, or do I need to do something myself in case of a crash?
Kasper Møller Andersen said:
As Brendan Hansknecht was saying, if crash becomes a common thing in libraries, it can potentially make for a poor experience of using Roc libraries. But the way to minimize the problem is to encourage people to model their code in such a way that explicit crashing is not needed. But I'm not confident that people can be relied on enough to do so.
just to be totally honest, I think if we get to the point where crash is overused in libraries because library authors don't want to handle errors via Result (or otherwise), my ordering of preferences would put "go back to not having crash again" as preferable to "add a way to recover from crash"
basically, there is always going to be some point where library authors can make mistakes
and I don't think we should try to have a language feature that corrects arbitrary mistakes
in third-party code
a mistake in third-party code is going to cause a degraded user experience no matter what
maybe that's because a calculation is incorrect
for some category of mistakes, it's possible to recover from them gracefully
for example, you can recover from an accidental infinite loop by running that code on a different thread, timing it, and killing the thread if it runs longer than some configured timeout
this has downsides such as performance and code complexity, but it can result in a less bad user experience (an error message) than an infinite loop
so we could totally do that, but I don't think it's worth it
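For reference, the timeout technique described above can be sketched in Rust roughly as follows. One substitution should be stated plainly: the Rust standard library has no way to kill a thread, so this sketch abandons the worker after the timeout instead of killing it, and a runaway loop would keep consuming a core. `run_with_timeout` is a hypothetical helper name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a separate thread and waits at most `limit` for a result.
/// Returns `None` if the work did not finish in time (or panicked).
/// The worker thread is NOT killed on timeout; it is only abandoned.
pub fn run_with_timeout<T, F>(work: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails if the receiver already gave up; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}
```

This shows the cost mentioned above: an extra thread and channel per call, plus the leftover worker, in exchange for being able to report an error instead of hanging.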
I do think there is an important point brought up in terms of stability boundary.
If in basic webserver I start a transaction and then a crash happens, I believe it will completely clog up sqlite. So a user that starts a transaction and then runs 3rd party code would be left in a fundamentally broken state if a crash happens. They have no way to fix this other than to avoid calling any code that might crash after a transaction is started on a thread.
the platform can already solve that
because it knows about transactions
No it doesn't
It just knows about statements (which would all clean up but that transaction would still be open)
It also definitely wouldn't know about a transaction to something like postgres over TCP.
In sqlite, one statement requests beginning the transaction. Then you run n statements in the transaction. Then you run a final statement to close the transaction.
So even if you clean up all statements you may still be in the middle of a transaction.
oh you're assuming the platform isn't aware of SQLite and is instead just offering a socket primitive
I see
So for sqlite, I am just noting the state today in basic webserver. For postgres, I am trying to point out the general issue.
so for that issue, let's frame it in terms of a stack overflow
so an individual request handler overflows the stack and we want to close its SQLite transactions, right?
I don't think that is reasonable, because in many systems stack overflows always crash the program. Like completely crash it.
So I would prefer to keep it in terms of something that might happen but would generally recover with a 500. So numeric overflow or any sort of exception otherwise.
Or division by zero
division by zero we can choose to handle in other ways, e.g. by having it return 0 like pony does
I'd specifically like to focus on stack overflow because it is unquestionably unrecoverable
That's exactly why it isn't useful to talk about
You want something in the grey area. How about integer overflow which we want to have crash due to all of the correctness implications
Like on stack overflow, we aren't even going to 500, the entire webserver is probably going to crash.
ok so I think this actually gets to the fundamental difference in perspective here
to me, the only cases we are talking about are situations like stack overflows where if it happens, we have decided it should be game over and there's no recovering from it
Stack overflow doesn't even call roc_panic though. It literally doesn't hit the path we are discussing
I cannot overstate how important that distinction is to me in this discussion
yes, exactly!
that is exactly why I want to focus on it
I don't follow.
if people are misusing crash, then to me the default best response to that is to try to educate people about what it's for.
if that doesn't work, then the next best response is to remove crash from the language so that if you're writing a library and hit a "this should never happen" situation, then your only option is to use one of the hacks we used to use (e.g. inducing a stack overflow) to address it, at which point it at least becomes obvious that you should never use that technique for error handling
the reason I think it's useful to talk about how to handle SQLite transactions in the presence of a stack overflow is that it's not a situation where either of those solutions would help
because the problem isn't that people have misused crash
and we also couldn't fix it by changing how integer overflow or division by zero works
I think there are 3 states:
roc_panic and partially recover. If we can't do cleanup this may leave the app in a continually partially failing state.
So personally, I don't care much about 1 or 2. 3 is the important problematic case.
As a concrete case of what could happen in basic webserver. We can even pretend crash is gone:
roc_panic on overflow which cleans up all statements but doesn't know anything about the open transaction.
well state 2 doesn't have to kill the entire root process - the host can install a signal handler, unwind, and return 500
and that's true for both stack overflow and running out of heap memory
Sure, then 2 falls into 3 and it matters, but it specifically only matters if the host might recover in a potentially invalid state.
yeah exactly
Personally. I would like to pin to the concrete example. I think it is what would lead to users wanting some form of catch
Or at least some form of errdefer or finally to clean up in the case of an exception.
so how would this specific case be implemented in the compiler?
C++ exception style?
I'm just trying to point it out as a motivating example. One solution would be c++ style exceptions then enable finally to clean up.
this specific case being (just to make sure we're on the same page):
that's the scenario, right?
Or even could be done without c++ exceptions before calling roc_panic. Like register functions to call to clean up before calling roc_panic. I just assume a platform won't be able to deal with all of these cases, but maybe that is wrong. It at least is worth thinking about critically
we want the outcome to be that the request handler unwinds, responds with a 500, and the transaction is closed and nothing is leaked
That would be optimal. Also, the platform may know about sqlite, but sqlite transactions are built into standard statements. So the platform may have zero control over transactions (state of current basic webserver)
ok so the way I'm approaching this is "I want to avoid having exception handling semantics in Roc applications, so how can we solve this in a way that gets to a good outcome in this scenario without doing that?"
one thing that comes to mind is that it seems like the best experience if the author of the SQLite library can solve this
so application authors don't need to code defensively for this scenario
Yep
one idea for how that could work: have the platform expose a function that accepts a boxed "cleanup function" and returns a token value that works like a file descriptor
in terms of how the host tracks it
We could even make roc_panic take a list of cleanup closures that the platform can use if they want. Add some sort of Task.finally that just adds onto the list. Not sure when exactly it would be cleared out though. But yeah. Something like that which gives platform cleanup control should work
so then the SQLite author can make sure that token has the same lifetime as the transaction normally, so when it gets deallocated it closes the transaction
and if the request handler stack overflows, the host has a list of tokens that never got resolved, and can run them all before 500ing
I think that can be implemented today with no implementation changes needed
in the same way that file descriptors can be
Ah, I see. Yeah. I think that works.
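A minimal host-side sketch of the cleanup-token idea, written in Rust under stated assumptions: `CleanupRegistry`, `CleanupToken`, `register`, `resolve` and `run_pending` are hypothetical names, and the choice that resolving a token discards its cleanup (because the library already committed or rolled back itself) is one possible reading of the proposal, not something the discussion settles.

```rust
/// Hypothetical token handed back to application code; the host tracks it
/// much like it would track a file descriptor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CleanupToken(u64);

type Cleanup = Box<dyn FnOnce() + Send>;

/// Hypothetical per-request registry owned by the host.
#[derive(Default)]
pub struct CleanupRegistry {
    next_id: u64,
    pending: Vec<(u64, Cleanup)>,
}

impl CleanupRegistry {
    /// Called when a library opens something that needs closing,
    /// e.g. a database transaction.
    pub fn register(&mut self, cleanup: Cleanup) -> CleanupToken {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.push((id, cleanup));
        CleanupToken(id)
    }

    /// Normal path: the resource was closed properly, so the cleanup is
    /// dropped without running. Returns false for an unknown token.
    pub fn resolve(&mut self, token: CleanupToken) -> bool {
        match self.pending.iter().position(|(id, _)| *id == token.0) {
            Some(index) => {
                self.pending.remove(index);
                true
            }
            None => false,
        }
    }

    /// Crash path: run every unresolved cleanup, newest first, before the
    /// host answers with a 500. Returns how many cleanups ran.
    pub fn run_pending(&mut self) -> usize {
        let mut ran = 0;
        while let Some((_, cleanup)) = self.pending.pop() {
            cleanup();
            ran += 1;
        }
        ran
    }
}
```

With this shape, a SQLite or Postgres library could register a rollback when it begins a transaction and resolve the token on commit; application authors would not need to write any defensive code themselves.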
Through all this, I’ve had a hard time figuring out what users are expected to do with their platforms. It feels like there’s two different ways of looking at platforms, which I find hard to reconcile:
To me, this sounds like there will be an explosion of very specific, and not necessarily well maintained, platforms a la basic-webserver-mssql-windowsserver2022, basic-webserver-postgres-ubuntu, etc.
This doesn’t feel like a healthy place to end up in though. It feels like any single platform consists of a number of choices, but an application author can’t easily compose those choices without forking a complete platform, and handling the full responsibility this entails.
I think a healthier position would be for some “base” platforms to be incredibly small and stable, to provide the raw primitives that a full platform needs (reusable Roc setup, IO, etc), and then have another layer on top where you define the interface and behavior that you want to work against. In other words, you may have a “base” ubuntu platform, which exposes the capabilities you get in Ubuntu, and then someone would build a basic-cli API on top of that. Does that make sense, or am I misunderstanding platforms here?
you may have a “base” ubuntu platform
This could be done but would probably be a terrible experience and wouldn't be worth trying to do. If we need an ubuntu platform, the platform concept has probably failed. That sounds like a generic standard library with all of the io primitives.
platforms a la basic-webserver-mssql-windowsserver2022
These probably will exist for certain power users, but not for the average user. That said, forking an existing platform and slightly modifying it may be reasonably common. Not some super fork like integrating an entire mssql library into the platform, but a tiny fork that may modify the panic handler to send requests to your logging server.
I think that long term, most platforms will be in the vein of basic-webserver in terms of complexity. They won't be trying to own the world, but they will be trying to have a robust set of primitives (for a webserver, these primitives are most likely for file io and for socket io). Once you have those two, you can do essentially everything a webserver would want to do. The postgres library can be built right in roc.
The last part of what richard and I were talking about is how a platform like basic-webserver could offer the ability to run cleanup code after a panic. This would enable a postgres library to hook in and request for transactions to be rolled back on panic. It would enable more control from the end user in terms of adding a finally to any crash. This is all within a single platform.
yeah, to elaborate, here are some scenarios I imagine happening long-term:
basic-webserver and basic-cli would be examples of this.
I wouldn't expect many platforms to be operating system specific (although it's always possible) outside of maybe embedded systems
so I don't think it would be basic-webserver-postgres-ubuntu, but I could imagine a postgres-webserver platform existing
Kasper Møller Andersen said:
Platforms can also handle logic which is very application specific, like defining your stability boundaries. Users are encouraged to own their platform for this if they want it changed.
just to clarify on this part, I don't actually think this is the right way to think about it
for example, let's say I am an application author and I spawn an OS subprocess and run some logic in there
that's an example of creating a very hard boundary where anything can go wrong and I can definitely recover from it, and the platform doesn't need to be customized to do that
however, the platform is in charge of whether to offer primitives for spawning processes
maybe a bit of relevant context I should have started with: in Elm, you can't publish a package with Elm's equivalent of a crash in it
so in Elm, if you really want to publish a package that has an "unreachable" state in it, you really do have to do one of the hacks
Elm applications have a reputation for approximately never crashing in practice, and I think this is part of the reason why: aside from mistakes (e.g. of course libraries can still overflow the stack or get into an infinite loop) you really have to go out of your way to have a library crash
compare this to, for example, unwrap in Rust - which I think would be a mistake to include in Roc, precisely because it makes it so easy to introduce an unnecessary crash
the outcome Elm has seen from not having a crash equivalent is a major reason that Roc did not have crash at first, and why I was hesitant to introduce it (but I did find the argument compelling, and continue to, that if you think it's going to be unreachable, but it actually is reached in practice somehow, it's definitely best to be able to at least include some context on what happened that you thought couldn't happen)
it's also a reason that I am so strongly resistant to adding recovery mechanisms in userspace: it seems like every ecosystem that has these gets more crashes in practice than the Elm ecosystem does
and I think a contributing factor there is that culturally it's not only okay, but expected in many languages to throw for error handling
and if it doesn't end up being caught, shrug, not my problem; someone else should have handled it
whereas in Elm, there is only one way to say "it's someone else's job to handle this," which is Result (or some equivalent)
and that's the way I want it to be in Roc
in order to get the same "applications essentially never crash in practice" outcome that Elm has gotten, and which systems with try/catch (and Rust too) are extremely far away from getting
so this is also why my ordering of preferences is:
crash, so if the inconceivable happens, at least there's a helpful message to explain what happened
crash from the language
crash in applications
because based on the experience of Elm vs every other language, I can't possibly see how #5 would do anything other than explode the number of crashes that actually happen to real end users
if it did anything other than that, Roc would be the first language where that turned out to be true, so I'd be really curious to see why we should expect it to be different from all the others :big_smile:
Richard Feldman said:
because based on the experience of Elm vs every other language, I can't possibly see how #5 would do anything other than explode the number of crashes that actually happen to real end users
I like that a Roc application cannot recover from a crash. But there is a counterexample: Go has panic and recover. The normal way to handle errors in Go is by returning an error value. Many people do not like the way you have to handle errors in Go. But still, (nearly) nobody is reaching for panic and recover.
I think the only time recover is used in Go is, when you are writing a framework and you do not expect your users to write high quality code. So you have to expect runtime-panics like zero division, nil pointer dereference or out of bound checks. In a context of Roc, the framework is the platform. So only the platform should use recover. This is already possible.
If the pressure to add recover gets too high, then a way to add it could be that you can only clean up your own mess with it. What I mean is that it can only recover crashes from your own module/application, but never across the module boundary.
So you would implement unwind, but internally check the stack trace. If the stack-trace only contains entries from builtins or the module calling recover, then it returns a Result. But if the stack contains other modules, it continues the crash.
With a recover like this, it would be impossible for library authors to expect their users to recover. But it would be possible to use it for your own code. Like:
doSomeCalculation = \a, b, c ->
    # This could overflow or divide by zero
    a * b / c

exportedFunction = \a, b, c ->
    # `recover` is the hypothetical function proposed above, not an existing Roc builtin
    recover (doSomeCalculation a b c)
    |> Result.mapErr \_ -> OverflowOrZeroDivision
that's an interesting idea, although it seems likely that with that restriction there wouldn't be any demand for it in practice anyway! :big_smile:
Okay, I wasn't sure if you were implying that the database driver had to be integrated into the platform to give a robust experience. And yeah, having a platform per OS probably doesn't make sense, but what I was really going for was just a way to have shared primitives that you don't need to take responsibility for when forking a platform. Like you could have a Rust crate that defines a lot of common Roc operations (IO, panic handling, etc.), and then it would be easy to write your own platform by just calling out to that library for anything where you just want the "standard" behavior. This way you also get all the usual benefits of version tracking, and you don't have to worry about keeping your fork up to date.
ahh gotcha!
Richard Feldman said:
it's also a reason that I am so strongly resistant to adding recovery mechanisms in userspace: it seems like every ecosystem that has these gets more crashes in practice than the Elm ecosystem does
I do agree it's nice that Elm libraries tend to not crash, but it's actually also something that makes me a bit nervous. Having worked for a number of years on a big and complicated Elm codebase, I know there will be times when you just end up in these branches, and the best thing you can really do is return some default value when it happens, because Elm doesn't give you tools to do better. So I don't think people would use their own custom stack overflow in general when they have to handle such a branch, but instead, I think the most common behavior is to just eat the failure, and try to keep going anyway. Which is also something I don't think you can really measure how often it happens across the ecosystem, so that's something of a ghost that's bothering me there.
Also, fun side note: this morning I got into work and the first thing I see is someone with a PR doing handling of stack overflow exceptions :big_smile:
I'm not sure how much I can say, so I'll be light on details, but we have a runtime that is executing queries on the JVM, and those can blow the stack. And we'd rather give the queries access to as much stack memory as possible, and recover when they exceed it, rather than limit how much stack they can use in the first place and not let them use the hardware to its full potential.
But anyway, summing up the current state of affairs:
Kasper Møller Andersen said:
Roc code is not supposed to be able to deal with runtime errors
I'd say something more like "runtime crashes" (errors in general are for Result!) but yeah :big_smile:
Fair enough :stuck_out_tongue:
Kasper Møller Andersen said:
Having worked for a number of years on a big and complicated Elm codebase, I know there will be times when you just end up in these branches, and the best thing you can really do is return some default value when it happens, because Elm doesn't give you tools to do better. So I don't think people would use their own custom stack overflow in general when they have to handle such a branch, but instead, I think the most common behavior is to just eat the failure, and try to keep going anyway. Which is also something I don't think you can really measure how often it happens across the ecosystem, so that's something of a ghost that's bothering me there.
I do think graceful recovery (when possible) is beneficial to user experience, although ideally it would be accompanied by logging so someone can find out it happened!
Silently giving an incorrect answer, on the other hand (e.g. division by zero silently returning a zero and continuing as if nothing wrong had happened) is definitely bad, and I'd say worse than crashing
Yup, and those are the ones I’m worried about, because they can be really hard to spot and I think many people may not be aware enough of them. But anyway, that’s all just good arguments for why Roc does as it does today. I just wanted to be sure it’s not about stability at all costs :blush:
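To make the difference between eating a failure and surfacing it concrete, here is a minimal sketch in Rust. The function names and the error message are made up for illustration; only `checked_div` is a real standard library method. The first function quietly substitutes zero, the second forces the caller to decide what to do.

```rust
/// Eats the failure: a zero `count` silently yields 0, which looks like a valid answer.
fn average_silent(total: u32, count: u32) -> u32 {
    total.checked_div(count).unwrap_or(0)
}

/// Surfaces the failure: the caller has to handle the zero `count` case explicitly.
fn average_checked(total: u32, count: u32) -> Result<u32, String> {
    total
        .checked_div(count)
        .ok_or_else(|| "cannot average zero items".to_string())
}
```

With the first version, nothing downstream can tell a real average of 0 apart from a swallowed division by zero, which is the hard-to-spot case discussed above.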
cool, thanks for talking through all that! :smiley:
I appreciate your patience with it :heart:
Likewise! It’s always nice to get to go through a subject in a thorough manner, when everyone is gracious enough to talk through rookie understandings :smiling_face:
One example of this becoming an awkward fit I think is with GraphQL. GraphQL is most often used over HTTP, but it doesn’t need to be. HTTP is just one of many potential transports you can use for GraphQL. Because GraphQL is transport agnostic, it includes its own error handling mechanism. So ideally, if you have a webserver serving a GraphQL API, the GraphQL layer is actually the stability boundary. To have that be the case in Roc, you need a platform that is specifically for serving a GraphQL API with a webserver, as opposed to using a webserver platform where you just plug in a GraphQL library.
It’s still not a huge deal, because these errors should be quite rare of course. But I do think it works as an example of having a stability boundary above the platform.
I think as long as you abstract away the protocol you could have a basic-webserver type platform that is separate from a GraphQL library. The library will just accept protocol-generic primitives to build its requests.
That said, basic-webserver may not enable websockets. So you may still need to switch platforms if you want to serve graphql over websockets. But you should still be able to use a protocol agnostic graphql library either way.
I could be wrong, but my understanding is that websockets can be implemented on top of a normal TCP socket
I think so, but I have never looked into it. Just trying to point out that whatever graphql is implemented on top of can be decoupled from a graphql library in roc. A platform doesn't need to support all protocols. The library doesn't need to be specialized to the protocol. Should be able to make it generic and flexible.
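A rough sketch of that decoupling, written in Rust rather than Roc and using entirely hypothetical names (`Transport`, `execute`, `serve_one`, `InMemory`): the library only sees payloads going in and out, and each platform would supply its own transport.

```rust
/// Hypothetical transport abstraction: anything that can hand over a
/// request payload and take back a response payload (HTTP, websocket, ...).
trait Transport {
    fn receive(&mut self) -> Option<String>;
    fn send(&mut self, payload: String);
}

/// Stand-in for the library's query execution. Assumption: a real library
/// would parse and resolve the query instead of returning canned JSON.
fn execute(query: &str) -> String {
    if query.trim().is_empty() {
        r#"{ "errors": [{ "message": "Empty query" }]}"#.to_string()
    } else {
        r#"{ "data": {} }"#.to_string()
    }
}

/// Protocol-agnostic entry point: handles one request if there is one.
/// Returns false when the transport has nothing to deliver.
fn serve_one<T: Transport>(transport: &mut T) -> bool {
    match transport.receive() {
        Some(query) => {
            transport.send(execute(&query));
            true
        }
        None => false,
    }
}

/// In-memory transport, standing in for whatever the platform supports.
struct InMemory {
    inbox: Vec<String>,
    outbox: Vec<String>,
}

impl Transport for InMemory {
    fn receive(&mut self) -> Option<String> {
        self.inbox.pop()
    }

    fn send(&mut self, payload: String) {
        self.outbox.push(payload);
    }
}
```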
My point related to the topic was that the GraphQL layer is where you want to catch a crash though :blush:
I don't actually understand that part. Why would graphql be catching the crash?
Oh... you want to send the graphql equivalent of an http 500 status code if there is a crash. And you want to send it over whatever protocol the platform supports (http, websocket, etc)
Precisely! A GraphQL response should encode the error in its own format. That's why GraphQL over HTTP will generally send errors with an HTTP 200 status, for example.
Yeah, so to support this case in a generic form, you would at least need to be able to register what the crash response should be. Graphql could register a 200 message with the text:
{ "errors": [{ "message": "Server error" }]}
Any webserver can serve both GraphQL and REST endpoints at the same time, so the error handling would need to differentiate based on which kind of endpoint was reached
yeah, would need to be path aware.
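As a sketch of what path-aware crash responses could look like on the host side, here is a Rust version built on `std::panic::catch_unwind`. The `Response` type, the `/graphql` prefix rule, and the function names are assumptions for illustration, not how any existing platform does it.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical response type; not taken from any real platform.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Picks the crash response from the request path.
/// Assumption: GraphQL endpoints live under "/graphql".
fn crash_response(path: &str) -> Response {
    if path.starts_with("/graphql") {
        // GraphQL encodes the error in its own format, sent with a 200.
        Response {
            status: 200,
            body: r#"{ "errors": [{ "message": "Server error" }]}"#.to_string(),
        }
    } else {
        Response {
            status: 500,
            body: "Internal Server Error".to_string(),
        }
    }
}

/// Runs the handler and turns a panic into the crash response for that path,
/// so one crashing endpoint does not take down the rest of the service.
fn handle_request<F>(path: &str, handler: F) -> Response
where
    F: FnOnce() -> Response,
{
    match panic::catch_unwind(AssertUnwindSafe(handler)) {
        Ok(response) => response,
        Err(_) => crash_response(path),
    }
}
```

Calling `handle_request("/graphql", || panic!("bug"))` yields the 200 response with the `errors` body, while the same panic under another path yields a 500. Note that `catch_unwind` only catches unwinding panics, so this does not help if the binary is built with `panic = "abort"`.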
While this error handling would work in this case, this also reflects what I was trying to get at with platforms not being composable enough. That is, if a user has an existing webserver (based on any general web server platform like basic-webserver, nea, or something else) which doesn’t offer this kind of error handling, and they now want to serve GraphQL, they’ll need to fork the platform to do so. It’s still completely reasonable for a platform author to not have this error handling in their web server though. So it feels to me like this error handling should be application code, and not something only platform authors have a say in.
Again, not a huge deal for this particular example, but it just makes my spidey sense tingle nonetheless :blush:
Last updated: Jun 16 2026 at 16:19 UTC