Stream: compiler development

Topic: step debugging in an interpreter


view this post on Zulip Richard Feldman (Feb 15 2025 at 23:33):

one of the things that's cool about interpreters like Ruby's (which I'll use as an example because I'm the most familiar with it) is that you can do things like:

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:34):

this seems both awesome and also feasible with our planned interpreter backend, such that when you do roc foo.roc it just supports all of this right away, as long as you didn't pass --optimize to roc

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:34):

however, it's still nontrivial!

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:35):

for reasons I'd also prefer to keep out of scope of this thread, let's assume that we are interpreting canonicalized IR

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:36):

if the debugger pauses the interpreter on a particular IR node, and you change the source file on disk...when you resume the interpreter, how does it know that the source file changed? And how can it modify the upcoming instructions, including the very next line of the program, when it's in the middle of interpreting a canonical IR in memory?

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:39):

regarding how we know the source file changed, at first I thought we could just diff the current source file after you hit resume (or step, etc.) to see if it changed, but then I realized you might have edited another module - possibly one that's not even directly imported by this one, but rather indirectly imported

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:39):

in that scenario, are we really going to re-check the disk every single time we call a function from another module? That would be way too slow.

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:40):

instead, the fast way to do this would be that, whenever roc is running a program with debugging enabled, it sets up a watch on every file that gets imported, and whenever that file changes, we know it's time to rebuild the canonical IR (and potentially report new compilation errors)

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:40):

at this point I realized if we have that, then we have realtime hot code loading :sweat_smile:

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:43):

so really, I think the experience I mentioned at the start of the thread can be broken down into two projects:

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:44):

there might be a bit of coordination between the two, but I think the vast majority of each of those projects seems totally independent of the other

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:45):

e.g. the hot code loading needs to (I think) keep around the source file hashes in memory, along with the dependency graph, so it can efficiently tell which IRs need to be rebuilt in memory when source files change on disk
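
A minimal sketch of that bookkeeping, assuming a conservative rule that a changed module invalidates every module that imports it, directly or indirectly. `ModuleCache`, `add_module`, and `stale_after_change` are hypothetical names for illustration, not part of the Roc compiler.

```python
import hashlib
from collections import defaultdict, deque


def content_hash(source: bytes) -> str:
    """Hash the raw bytes of a source file."""
    return hashlib.sha256(source).hexdigest()


class ModuleCache:
    """Hypothetical in-memory record of source hashes plus the import graph."""

    def __init__(self):
        self.hashes = {}                    # module name -> last seen content hash
        self.importers = defaultdict(set)   # module name -> modules that import it

    def add_module(self, name, source, imports=()):
        self.hashes[name] = content_hash(source)
        for dep in imports:
            self.importers[dep].add(name)

    def stale_after_change(self, name, new_source):
        """Return the set of modules whose IR should be rebuilt."""
        new_hash = content_hash(new_source)
        if self.hashes.get(name) == new_hash:
            return set()                    # same bytes: nothing to rebuild
        self.hashes[name] = new_hash
        stale, queue = {name}, deque([name])
        while queue:                        # follow reverse edges to reach indirect importers
            for importer in self.importers[queue.popleft()]:
                if importer not in stale:
                    stale.add(importer)
                    queue.append(importer)
        return stale
```

With modules `util`, `lib` (imports `util`), and `main` (imports `lib`), editing `util` would mark all three as stale, while rewriting it with identical bytes would mark none.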

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:45):

I think it also needs to have a string-based translation step for "upgrading" all the interned IDs and such when a source file changes
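
One way to picture that translation step, as a sketch only: go from each old ID back to its string, then look the string up in the interner built from the new source. The `Interner` class and `translate_ids` function are hypothetical and assume IDs are plain integers handed out in order of first appearance.

```python
class Interner:
    """Hypothetical string interner: each distinct string gets a small integer ID."""

    def __init__(self):
        self.ids = {}       # string -> ID
        self.strings = []   # ID -> string

    def intern(self, text):
        if text not in self.ids:
            self.ids[text] = len(self.strings)
            self.strings.append(text)
        return self.ids[text]


def translate_ids(old, new):
    """Map each old ID to its ID in the new interner, or None if the name is gone."""
    return {old_id: new.ids.get(text) for old_id, text in enumerate(old.strings)}
```

An old ID that maps to `None` refers to a name that no longer exists in the new source, so any live runtime state still using it would need separate handling.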

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:46):

Rather than keeping around hashes in memory, I would instead (or in addition) keep around the stat - mtime, ctime, inode, size - and if none of those have changed, assume the hash hasn't changed either
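
A rough sketch of that idea using only the Python standard library: compare the stat fields first and hash only when one of them differs. `Fingerprint` and `FileState` are hypothetical names, and the choice of SHA-256 is an assumption.

```python
import hashlib
import os
from typing import NamedTuple


class Fingerprint(NamedTuple):
    mtime_ns: int
    ctime_ns: int
    inode: int
    size: int


def fingerprint(path):
    st = os.stat(path)
    return Fingerprint(st.st_mtime_ns, st.st_ctime_ns, st.st_ino, st.st_size)


def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class FileState:
    def __init__(self):
        self.entries = {}   # path -> (Fingerprint, content hash)

    def changed(self, path):
        """Return True if the file's contents differ from the last recorded state."""
        fp = fingerprint(path)
        known = self.entries.get(path)
        if known is not None and known[0] == fp:
            return False                # stat unchanged: assume the hash is unchanged too
        digest = file_hash(path)        # stat differs (or first sight): fall back to hashing
        is_changed = known is None or known[1] != digest
        self.entries[path] = (fp, digest)
        return is_changed
```

The stat comparison is the cheap path; the hash covers the case where the stat fields moved but the bytes did not (for example, a save with no edits).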

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:47):

yep, we can do both! Related: https://apenwarr.ca/log/20181113

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:47):

although that's more about rebuilds

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:47):

if we're running and have OS file watches set up, then we get events whenever things change (and if they changed, then certainly mtime changed)

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:48):

OS file watches are not completely reliable, FWIW

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:48):

"(and if they changed, then certainly mtime changed)"

This is very much not true

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:48):

oh, true

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:48):

given second resolution etc.

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:49):

Yeah, also some apps have the annoying proclivity to modify a file and then set the mtime to some earlier time (perhaps the exact same time)

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:50):

The particular such apps I know of are not text editors tho. Text editors are probably not being that sneaky.

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:50):

hopefully haha

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:50):

regardless, I haven't heard of OS file watching APIs being unreliable

view this post on Zulip Richard Feldman (Feb 15 2025 at 23:50):

do you have any links I could read to learn more?

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:55):

See https://developer.apple.com/documentation/coreservices/kfseventstreameventflaguserdropped for example

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:56):

watchman handles that sort of thing for you

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:56):

(as probably do many libraries that abstract over file events)

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:58):

That sort of thing can happen if the system is under heavy filesystem load, for instance

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:58):

OSes generally prioritize availability over delivering all file events

view this post on Zulip Joshua Warner (Feb 15 2025 at 23:59):

Switching branches in a large repo can trigger that sort of behavior quite frequently

view this post on Zulip Richard Feldman (Feb 16 2025 at 00:03):

oh I see

view this post on Zulip Richard Feldman (Feb 16 2025 at 00:03):

well that's just normal error handling though right?

view this post on Zulip Richard Feldman (Feb 16 2025 at 00:04):

like the OS still informs us that a full file scan is necessary, it's not like we just silently don't get any info

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:04):

I believe so, yes
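
Treated as ordinary error handling, that could look something like the sketch below. `WatchEvent` and its `dropped` flag are a made-up, simplified event shape; real watcher APIs differ per OS and this is not any of them.

```python
from dataclasses import dataclass


@dataclass
class WatchEvent:
    """Hypothetical event shape; real watcher APIs differ per OS."""
    path: str = ""
    dropped: bool = False   # True when the OS reports that events were lost


def paths_to_recheck(events, watched_paths):
    """Return the paths to re-stat: only the reported ones, or all of them after a drop."""
    if any(e.dropped for e in events):
        return set(watched_paths)       # events were lost: fall back to a full rescan
    return {e.path for e in events if e.path in watched_paths}
```

This only covers the case where the OS says it dropped something; it does nothing for events that go missing silently.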

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:04):

Of course the same caveats for mtime often apply to file events

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:05):

e.g. memory mapped files can delay file events (and maybe suppress them altogether?)

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:06):

That said, I would be very much not shocked to learn that there are OSes and configurations thereof that can legitimately drop file events with no notice.

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:06):

File events are generally not treated with the same level of care as other file system operations

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:08):

Anyway, popping back up a level, I'd recommend looking at how the git index works

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:08):

That's the sort of state we'd want to keep in memory

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:18):

And, popping back to the original discussion, debuggers / hot reloading is very cool! (and complicated)

view this post on Zulip Joshua Warner (Feb 16 2025 at 00:18):

I'm reminded about https://mun-lang.org/

view this post on Zulip Richard Feldman (Feb 16 2025 at 00:22):

I think there's going to be some overlap between what we'll want for caching, what we'll want for editor tooling, and what we'll want for hot code loading

view this post on Zulip Brendan Hansknecht (Feb 16 2025 at 01:16):

This sounds fun. I'm for it

view this post on Zulip Brendan Hansknecht (Feb 16 2025 at 01:16):

Also, I think general hot code reloading would be much easier than patching mid function

view this post on Zulip Brendan Hansknecht (Feb 16 2025 at 01:22):

Also, I'm more used to seeing a repl from debugger mid function than patching. I think that also is an easier problem to solve than full mid function patching

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:44):

I actually suspect it's not as hard as it sounds

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:45):

basically you do a diff and it's just very conservative

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:46):

if anything changed about the function you're in the middle of executing while the debugger is stopped, you verify that absolutely nothing changed prior to the current line where you're paused

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:48):

bc if it did, then you can no longer be confident that the state you're currently in at runtime even exists in the new source file

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:49):

whereas if the entire function up to the current place where you're paused is unchanged, then you can re-canonicalize, use source regions to find the place in the new canonical IR where you need to pick up, and go from there

view this post on Zulip Richard Feldman (Feb 16 2025 at 01:50):

and if anything changed in the function after where you paused, it's fine
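
The conservative check described above can be sketched as a plain line comparison. This is a crude stand-in that works on source lines rather than source regions in the canonical IR, and `resume_line` is a hypothetical helper.

```python
def resume_line(old_fn_lines, new_fn_lines, paused_index):
    """Return the index to resume at in the new function body, or None if unsafe.

    paused_index is the 0-based line (within the function) the debugger stopped on,
    i.e. the next line that would execute.
    """
    # Everything before the paused line must be byte-for-byte identical.
    if old_fn_lines[:paused_index] != new_fn_lines[:paused_index]:
        return None
    # The paused line itself and everything after it are free to differ.
    return paused_index
```

Returning `None` would mean the debugger cannot pick up in the new code and has to keep running the old function (or restart it).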

view this post on Zulip Joshua Warner (Feb 16 2025 at 01:56):

You also need to know that the parts of the function that have changed either haven't been run, or wouldn't have contributed to the current state of the program - i.e. you need to keep track of code coverage

view this post on Zulip Joshua Warner (Feb 16 2025 at 01:56):

(maybe that's what you meant)
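
A small sketch of the coverage-based version, assuming the interpreter records which lines of the function have already executed. It uses Python's `difflib` to find the old lines touched by the edit; `changed_old_lines` and `safe_to_patch` are hypothetical names, and the treatment of insertions is a simplification.

```python
import difflib


def changed_old_lines(old_lines, new_lines):
    """Indices of old lines that the edit replaced, deleted, or inserted code before."""
    changed = set()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(range(i1, i2))
        elif tag == "insert" and i1 < len(old_lines):
            changed.add(i1)             # new code landed just before old line i1
    return changed


def safe_to_patch(old_lines, new_lines, executed):
    """True if none of the lines touched by the edit have already run."""
    return changed_old_lines(old_lines, new_lines).isdisjoint(executed)
```

Unlike the strict prefix check, this would allow an edit to an untaken `else` branch that sits above the paused line.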

view this post on Zulip Richard Feldman (Feb 16 2025 at 02:19):

yeah a crude form of that - just source diff basically

view this post on Zulip Richard Feldman (Feb 16 2025 at 02:20):

I guess a fancier version could do an IR diff

view this post on Zulip Richard Feldman (Feb 16 2025 at 02:20):

in case you only added a comment or something

view this post on Zulip Brendan Hansknecht (Feb 16 2025 at 03:16):

"if anything changed about the function you're in the middle of executing while the debugger is stopped, you verify that absolutely nothing changed prior to the current line where you're paused"

This sounds less nice than just getting a REPL, but I guess we can support both

view this post on Zulip Richard Feldman (Feb 16 2025 at 03:29):

oh sorry, I do actually mean both :big_smile:

view this post on Zulip Richard Feldman (Feb 16 2025 at 03:29):

like in Ruby you get a repl + debugger at the same time

view this post on Zulip Richard Feldman (Feb 16 2025 at 03:29):

like you're in a repl, but then also you have access to debugging commands for stepping, resuming, etc.

view this post on Zulip Brendan Hansknecht (Feb 16 2025 at 03:30):

Yeah, so I guess all 3? repl + debugger + hot reloading (even line by line in the debugger)?

view this post on Zulip Richard Feldman (Feb 16 2025 at 03:30):

yep! :grinning_face_with_smiling_eyes:


Last updated: Jul 06 2025 at 12:14 UTC