server platform that can upgrade itself · ideas

Stream: ideas

Topic: server platform that can upgrade itself

Richard Feldman (Dec 14 2023 at 01:00):

so I was reading https://matt.sh/htmx-is-a-erlang and I thought this quote was really interesting:

[Erlang creator Joe Armstrong] always said his favorite erlang program was the universal server: a process capable of receiving a message to become another server.

I've wondered in the past what it would look like to have a Roc server platform that could:

upgrade itself at runtime
that is, the platform (including the host), the application, or both could be upgraded at runtime
...without dropping any requests in the process
and without leaking memory either, so that it could theoretically have indefinite uptime while continuing to upgrade itself

Richard Feldman (Dec 14 2023 at 01:05):

assuming you have the current basic-webserver API, where the application is nothing more than Request -> Task Response [] (and it holds no state in between requests other than maybe things like caching tcp connections behind the scenes), it seems like upgrading the host could be possible by having a super minimal "bootstrap" host which does nothing more than:

load a dynamic library that represents the actual host library
know how to recognize (somehow; hand-waving a bit here) when it has been asked to upgrade to a different dynamic library
when that happens, it loads the new host dynamic library, tells it to start accepting connections, and tells the old one to stop accepting connections
once the old one runs out of in-flight requests and is essentially idle with no remaining work to do, it returns all of its heap pages back to the OS and communicates back to the bootstrap wrapper "hey I'm all done, you can dlclose me"
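The bootstrap loop described above could be sketched roughly like this. It is a minimal Python simulation, not a real host: `HostLib`, `start_accepting`, `stop_accepting`, and `release_memory` are hypothetical stand-ins for functions a dlopen'd host library might export, and setting `unloaded` marks the point where a real bootstrap would call `dlclose`. The ordering it tries to capture is that the new host starts accepting before the old one stops, and an old host is only unloaded once its in-flight count reaches zero.

```python
from dataclasses import dataclass, field


@dataclass
class HostLib:
    """Hypothetical stand-in for a dlopen'd host library.

    A real bootstrap would call functions exported by the shared library;
    here they are plain methods so the control flow can be exercised.
    """
    version: str
    accepting: bool = False
    in_flight: int = 0
    unloaded: bool = False

    def start_accepting(self) -> None:
        self.accepting = True

    def stop_accepting(self) -> None:
        self.accepting = False

    def release_memory(self) -> None:
        # Hypothetical hook for "return all heap pages to the OS".
        pass


@dataclass
class Bootstrap:
    active: HostLib
    draining: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.active.start_accepting()

    def upgrade(self, new: HostLib) -> None:
        # Start the new host first so no moment exists with nobody accepting.
        new.start_accepting()
        old, self.active = self.active, new
        old.stop_accepting()
        self.draining.append(old)

    def reap(self) -> list:
        """Unload every draining host with no in-flight requests left."""
        done = [h for h in self.draining if h.in_flight == 0]
        for host in done:
            host.release_memory()
            host.unloaded = True  # where a real bootstrap would dlclose()
        self.draining = [h for h in self.draining if h.in_flight > 0]
        return [h.version for h in done]
```

For example, upgrading while `v1` still has two requests in flight leaves it in `draining`; `reap()` returns `[]` until those requests finish, and then returns `["v1"]`.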

Richard Feldman (Dec 14 2023 at 01:06):

presumably if the application were also loaded dynamically, a similar strategy could work there too

Richard Feldman (Dec 14 2023 at 01:07):

so at that point, the original super minimal bootstrap logic would run forever, and as long as that never needed to be upgraded (which hopefully it wouldn't need to be, given how simple its job would be) I can't think of a reason this couldn't in principle run indefinitely :thinking:

Richard Feldman (Dec 14 2023 at 01:08):

(without ever having downtime)

Richard Feldman (Dec 14 2023 at 01:19):

of course you can already do this in a different way by running multiple server processes, but I think it's interesting to think about! :big_smile:

Brendan Hansknecht (Dec 14 2023 at 02:44):

Yeah, the easy way to do this is to have a load balancer and just take down and upgrade one server at a time. The site as a whole keeps 100% uptime.
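A rolling upgrade behind a load balancer can be modeled in a few lines. This is a toy Python model assuming a hypothetical `Server` record whose `in_rotation` flag stands for "the load balancer routes traffic here"; it only demonstrates the invariant that with N servers, at least N - 1 stay in rotation at every step.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical backend as seen by the load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_upgrade(servers: list, new_version: str) -> list:
    """Upgrade servers one at a time; return how many stayed in rotation at each step."""
    in_rotation_counts = []
    for server in servers:
        server.in_rotation = False   # load balancer stops routing to this one
        in_rotation_counts.append(sum(s.in_rotation for s in servers))
        server.version = new_version  # restart it on the new build
        server.in_rotation = True    # back into rotation once healthy
    return in_rotation_counts
```

With three servers, every step reports two servers still taking traffic.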

Brendan Hansknecht (Dec 14 2023 at 02:48):

Also, the bootstrap process has to make sure no connections drop while upgrading the server. So it would have to maintain a buffer for that.

Brendan Hansknecht (Dec 14 2023 at 02:49):

Oh, NVM. I see that you want both versions running dynamically at the same time. Just need to make sure you avoid any sort of port issues (such as both versions trying to bind the same listening port)
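If each host version opened its own listener instead of sharing one socket owned by the bootstrap, the second `bind` would normally fail with `EADDRINUSE`. One common workaround on Linux (3.9+) is to set `SO_REUSEPORT` on every socket before binding, after which the kernel also spreads incoming connections across the listeners; other OSes have an option with the same name but different semantics. The helper names below are hypothetical, and this only illustrates the port conflict, not the proposed design.

```python
import socket


def open_listener(port: int, reuse_port: bool) -> socket.socket:
    """Bind and listen on 127.0.0.1:port, optionally with SO_REUSEPORT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_port:
            # Must be set on every socket sharing the port, before bind().
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("127.0.0.1", port))
        s.listen()
    except OSError:
        s.close()
        raise
    return s


def can_share_port(reuse_port: bool) -> bool:
    """Try to open two listeners on the same port; report whether it worked."""
    first = open_listener(0, reuse_port)  # port 0: the OS picks a free port
    port = first.getsockname()[1]
    try:
        second = open_listener(port, reuse_port)
    except OSError:  # typically EADDRINUSE
        return False
    else:
        second.close()
        return True
    finally:
        first.close()
```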

Brendan Hansknecht (Dec 14 2023 at 02:56):

As an aside, if Roc contains all your application logic, you probably don't need to upgrade the server base often. So a platform like this that just handles dynamically loading the Roc app would be really cool and probably easy to do (it would probably also be really nice for a quick feedback loop during server development)
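The dev-loop version of this, a platform that only reloads the app, boils down to "reload the handler when the app file changes". In this Python sketch a source file stands in for the compiled Roc app and `handle` is a hypothetical entry-point name; a real platform would instead `dlopen` a freshly built shared library. Comparing `st_mtime` is a simple heuristic: two writes within the filesystem's timestamp resolution could be missed.

```python
import os
from typing import Callable, Optional


class AppReloader:
    """Reload a request handler from a source file whenever its mtime changes.

    The Python file is a stand-in for the app; `handle` is a hypothetical
    entry point taking a request string and returning a response string.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.mtime: Optional[float] = None
        self.handler: Optional[Callable[[str], str]] = None

    def current(self) -> Callable[[str], str]:
        mtime = os.stat(self.path).st_mtime
        if mtime != self.mtime:
            # Re-execute the file in a fresh namespace; no module cache involved.
            with open(self.path) as f:
                namespace: dict = {}
                exec(compile(f.read(), self.path, "exec"), namespace)
            self.handler = namespace["handle"]
            self.mtime = mtime
        return self.handler
```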

Luke Boswell (Dec 14 2023 at 03:37):

I need this

Luke Boswell (Dec 14 2023 at 03:37):

Well... I mean to say I would really love this to exist

Richard Feldman (Dec 14 2023 at 03:41):

for local development? Production? Both?

Richard Feldman (Dec 14 2023 at 03:41):

for local development I think hot code loading is the ultimate version of this

Luke Boswell (Dec 14 2023 at 03:41):

Yeah mostly for local development.

Luke Boswell (Dec 14 2023 at 03:43):

It currently takes ~5s for roc dev to reboot my webserver. I'm on macos arm64 using llvm. If that could happen instantly on file save it would be awesome.

Hannes Nevalainen (Dec 14 2023 at 21:39):

Hot reloading of modules in erlang/elixir works pretty great when developing :)

Upgrading a whole application (release) is another story and is generally avoided in favor of blue-green deploys. It quickly gets really complex and comes with a ton of footguns when you have state involved that needs to be migrated between versions.
Hot upgrades are a kinda oversold feature of Elixir/BEAM that most people will never use. But it is amazing that it can be done.

The erlang virtual machine got so many things right IMO (and most of it 30 years ago) and I wish it had a greater influence in the languages we use today. :)
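To make the state-migration footgun concrete: a hot upgrade has to convert every piece of live in-memory state from the old version's shape to the new one (in OTP this is roughly what a `gen_server`'s `code_change/3` callback is for). The Python sketch below uses made-up field names and a hypothetical `MIGRATIONS` table; every version bump needs its own step, and a missing step means the upgrade cannot proceed.

```python
from typing import Callable

# Hypothetical migrations: each entry upgrades state from version N to N + 1.
MIGRATIONS: dict = {
    # v1 -> v2: add a new field with a default value.
    1: lambda s: {**s, "version": 2, "sessions": {}},
    # v2 -> v3: rename "hits" to "request_count".
    2: lambda s: {**{k: v for k, v in s.items() if k != "hits"},
                  "version": 3, "request_count": s["hits"]},
}


def migrate(state: dict, target: int) -> dict:
    """Apply migrations one version at a time until `target` is reached."""
    while state["version"] < target:
        step: Callable[[dict], dict] = MIGRATIONS.get(state["version"])
        if step is None:
            raise ValueError(f"no migration from v{state['version']}")
        state = step(state)  # each step returns a new dict
    return state
```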

Brendan Hansknecht (Dec 15 2023 at 03:26):

Luke Boswell said:

It currently takes ~5s for roc dev to reboot my webserver. I'm on macos arm64 using llvm. If that could happen instantly on file save it would be awesome.

How much of that time is linking? 2s?

Sky Rose (Dec 15 2023 at 03:28):

I have dreams about writing an Erlang platform that can do hot upgrades. I know Gleam skipped trying to do hot upgrades, but I hope Roc could do it with sufficiently fancy Erlang/Elixir platform code.

Luke Boswell (Dec 15 2023 at 03:43):

@Brendan Hansknecht I may have been overly dramatic... here is some actual data. I swear, though, that sometimes it runs a bit slower than this, but I can't reproduce it right now.

Luke Boswell (Dec 15 2023 at 03:45):

And for an optimized build

    Code Generation
          829.612 ms   Generate final IR from Mono IR
        10266.172 ms   Generate object

        11095.784 ms   Total

    Finished compilation and code gen in 11435 ms

    Produced a app.o file of size 429640

    Finished linking in 284 ms

Brendan Hansknecht (Dec 15 2023 at 03:57):

Looks like you really need a full dev backend that supports everything basic-webserver might do

Richard Feldman (Dec 15 2023 at 04:12):

so for the non-optimized build, that was:

82% LLVM
14% linking
4% parsing, canonicalization, type checking, monomorphization

dev backend and surgical linking: the perf benefits are for real :sweat_smile:

Richard Feldman (Dec 15 2023 at 04:12):

I really hope in 2024 we can get full coverage for dev backend and surgical linking...dropping ~96% of build time would be no joke!

Kevin Gillette (Dec 18 2023 at 00:24):

I think @Brendan Hansknecht was right. This is a pretty solved problem using a load balancer of any variety (I think they all support this)

Kevin Gillette (Dec 18 2023 at 00:28):

All you would need is the ability in the app to close the listener (stop accepting new connections) while serving out open connections (and generally closing keep-alive idle connections where applicable, since those might stay open a very long time)
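The "stop accepting, finish the open connections" part can be sketched with Python's standard `socketserver`, assuming Python 3.7+, where `ThreadingMixIn.server_close()` joins non-daemon handler threads. `SlowHandler`, `DrainableServer`, and `drain` are hypothetical names, and the sketch serves one request per connection, so closing idle keep-alive connections is not modeled.

```python
import socketserver
import threading
import time


class SlowHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: reads one line, simulates work, replies."""

    def handle(self) -> None:
        line = self.rfile.readline()
        self.server.started.set()  # signal that a request is in flight
        time.sleep(0.2)            # simulated work
        self.wfile.write(b"done:" + line)


class DrainableServer(socketserver.ThreadingTCPServer):
    daemon_threads = False  # keep handler threads joinable
    block_on_close = True   # server_close() waits for them (Python 3.7+)

    def __init__(self, addr) -> None:
        self.started = threading.Event()
        super().__init__(addr, SlowHandler)


def drain(server: DrainableServer) -> None:
    server.shutdown()      # stop the accept loop: no new connections accepted
    server.server_close()  # close the listener, then join in-flight handlers
```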

Kevin Gillette (Dec 18 2023 at 00:54):

If you need the actual app processes to be able to do a direct handoff, at least on unix systems, you'd:

fork-exec the new version, making sure the open socket is inherited by the child process, along with any means to temporarily communicate with the child.
The child would run a self-check to confirm it's ready to take over, communicate that to the parent, and then start accepting connections on the inherited socket. Alternatively, this could wait until after a certain number of requests have been processed successfully by the child.
The parent would receive that communication of readiness from the child and then stop accepting new connections, and finally halt when its open connections have closed.

The key is the inherited socket (which works with any fd). In the brief time in which both parent and child processes are accepting connections, the kernel will distribute connections to both processes seamlessly.

Using processes instead of dynload/dynunload means that:

you don't need to worry as much about avoiding memory leaks
you can upgrade the platform code as well
you can support static binaries and don't need to mess around with the avoidable complexity of hot-swapping dynamic libraries.
if anything goes catastrophically wrong with the new version, such as a segfault, you can recover (i.e. parent just continues running).
as long as the new and old versions adhere to the same "protocol," you can migrate to wholly different implementations/languages
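The fork-exec handoff described above can be sketched in POSIX-only Python, where `subprocess.Popen` with `pass_fds` performs the fork + exec and keeps the listening socket and one pipe end open in the child. The `hand_off` name and the one-word "ready" protocol over the pipe are assumptions for illustration; a real parent would also finish its own in-flight connections before exiting.

```python
import os
import socket
import subprocess
import sys

# Code run by the "new version": adopt the inherited listening socket,
# report readiness over the pipe, then serve one connection and exit.
CHILD = r"""
import os, socket, sys
listen_fd, ready_fd = int(sys.argv[1]), int(sys.argv[2])
sock = socket.socket(fileno=listen_fd)  # adopt the inherited listener
# (a real server would run its self-check here before reporting ready)
os.write(ready_fd, b"ready")
os.close(ready_fd)
conn, _ = sock.accept()
with conn:
    conn.recv(1024)
    conn.sendall(b"served by new version\n")
"""


def hand_off(listener: socket.socket) -> subprocess.Popen:
    """Start the new version on the same listening socket, then stop accepting."""
    ready_r, ready_w = os.pipe()
    fd = listener.fileno()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(fd), str(ready_w)],
        pass_fds=(fd, ready_w),  # keep exactly these fds open in the child
    )
    os.close(ready_w)  # only the child holds the write end now
    with os.fdopen(ready_r, "rb") as r:
        status = r.read(5)  # b"" (EOF) if the child dies first
    if status != b"ready":
        child.kill()
        child.wait()
        raise RuntimeError("new version failed to start; old one keeps serving")
    listener.close()  # parent stops accepting; the child now owns the socket
    return child
```

If the child dies before writing `ready`, the parent reads EOF, raises, and still holds its open listener, which is the crash-recovery property from the list above.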

Last updated: Jul 23 2026 at 13:15 UTC