Stream: compiler development

Topic: data structures in readonly section


view this post on Zulip Richard Feldman (Jul 13 2023 at 17:08):

let's say I have a top-level foo : List (List Str) that's a giant literal, and we want to compile that entire thing to be stored in the readonly section of the binary

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:09):

how do the internal pointers in there work? are there relocations or something?

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:09):

basically I'm trying to get a sense for:

  1. what bytes does the compiler need to emit to the binary?
  2. where in the binary do they need to be emitted?
  3. is there extra metadata that needs to get emitted somewhere else in the binary?

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:12):

Yeah, relocation for each

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:13):

Basically, the relocation tells the loader to add the segment's load offset to the pointer once the binary is loaded in memory

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:13):

Requires adding them to the dynamic relocation list
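
As a rough illustration of what such a relocation asks the loader to do, here is a minimal Rust sketch. The `Relocation` struct, the `apply_relocations` function, and the 8-byte little-endian slot size are all assumptions made for this example; real formats (ELF, Mach-O, PE) each have their own relocation records, and this is not how any particular loader is implemented.

```rust
/// Hypothetical relocation record: the 8-byte little-endian slot at
/// `slot_offset` holds an address relative to the start of the image.
struct Relocation {
    slot_offset: usize,
}

/// Conceptually what a loader does for a "relative" relocation:
/// add the address the image was loaded at to every listed slot.
fn apply_relocations(image: &mut [u8], relocs: &[Relocation], load_base: u64) {
    for reloc in relocs {
        let range = reloc.slot_offset..reloc.slot_offset + 8;

        // Read the image-relative value currently stored in the slot.
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&image[range.clone()]);

        // Turn it into an absolute address and write it back in place.
        let absolute = u64::from_le_bytes(buf).wrapping_add(load_base);
        image[range].copy_from_slice(&absolute.to_le_bytes());
    }
}
```

In this sketch, a slot that stores `8` becomes `0x1008` once the image is loaded at `0x1000`. Every internal pointer in a nested literal would need one such record.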

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:16):

Also, shouldn't matter if it is top level or not. Either way we would probably want to generate this structure.

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:16):

What is the general thought or goal?

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:17):

Side note: this is the main case that the surgical linker doesn't support because it is annoying and adds a lot more binary shifting

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:19):

I'm thinking about how canonicalization's data structures should change in order to make it cache-able

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:20):

and one of the things that occurred to me is that when we're translating (for example) strings into the final format that will end up in the binary (e.g. replacing \n in the strings with actual newline bytes), we should probably just cut to the chase and store it in the format that the actual binary wants

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:21):

so when we're reading it out of cache, we can just copy the bytes over as directly as possible

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:21):

and same for lists, nested lists, etc.
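
To make the string case concrete, here is a small Rust sketch of resolving escapes once, so the cached bytes can later be copied without further processing. The function name `unescape_to_bytes` and the handful of escapes it handles are assumptions for illustration only; the full set of escapes the compiler supports is not covered here.

```rust
/// Hypothetical helper: resolve a few backslash escapes into raw bytes.
/// Only `\n`, `\t`, `\\`, and `\"` are handled; any other escape is kept verbatim.
fn unescape_to_bytes(source: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(source.len());
    let mut chars = source.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy its UTF-8 encoding through unchanged.
            let mut utf8 = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut utf8).as_bytes());
            continue;
        }
        match chars.next() {
            Some('n') => out.push(b'\n'),
            Some('t') => out.push(b'\t'),
            Some('\\') => out.push(b'\\'),
            Some('"') => out.push(b'"'),
            Some(other) => {
                // Unknown escape: keep the backslash and the character as written.
                out.push(b'\\');
                let mut utf8 = [0u8; 4];
                out.extend_from_slice(other.encode_utf8(&mut utf8).as_bytes());
            }
            // Trailing backslash at end of input: keep it.
            None => out.push(b'\\'),
        }
    }
    out
}
```

The output of a function like this is what would be written to the cache, so a cache hit only needs a byte copy.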

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:22):

For the pointers, you won't know the correct offsets yet. So I don't think you can do much.

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:23):

ah interesting

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:23):

so the offsets have to be based on how big the final header in the binary ends up being I guess?

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:27):

I need to double check the exact anchor point, but it is an absolute value dependent on where the final pointee lands in the binary.

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:38):

One thing I'm a bit confused by with your goals: if we're using LLVM, we don't control the relocations and such. So I'm not exactly sure what level you are trying to plan and work at.

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:39):

I'm just trying to figure out what representation we want to cache

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:39):

like what bytes to put on disk, so they're maximally useful when we read them back again because the file was cached

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:42):

I see

view this post on Zulip Brendan Hansknecht (Jul 13 2023 at 17:43):

Yeah, I mean we could definitely dump relative-offset-based bytes and then patch just the relative offsets as needed when continuing the build pipeline. Would be similar to caching in object files

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:50):

that makes sense! :thumbs_up:

view this post on Zulip Richard Feldman (Jul 13 2023 at 17:51):

I guess we could cache where the relative offsets are, to make them easier to patch
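
A minimal Rust sketch of that idea follows: cache the bytes with blob-relative offsets in the pointer slots, and cache the slot positions alongside them so patching is a single pass. The `CachedBlob` type, both function names, and the `(offset, len)` table layout are invented for this example; this is not Roc's actual `List` or `Str` representation.

```rust
/// Hypothetical cache entry: raw bytes plus the position of every 8-byte slot
/// that holds an offset relative to the start of `bytes`.
struct CachedBlob {
    bytes: Vec<u8>,
    pointer_slots: Vec<usize>,
}

/// Serialize strings as a table of (offset, len) pairs followed by the string
/// bytes. The layout is invented for illustration only.
fn build_blob(strings: &[&str]) -> CachedBlob {
    let header_len = strings.len() * 16;
    let mut bytes = vec![0u8; header_len];
    let mut pointer_slots = Vec::with_capacity(strings.len());

    for (i, s) in strings.iter().enumerate() {
        let slot = i * 16;
        // The string's data starts at the current end of the blob.
        let offset = bytes.len() as u64;
        bytes[slot..slot + 8].copy_from_slice(&offset.to_le_bytes());
        bytes[slot + 8..slot + 16].copy_from_slice(&(s.len() as u64).to_le_bytes());
        bytes.extend_from_slice(s.as_bytes());
        // Remember where the offset lives so it can be patched later.
        pointer_slots.push(slot);
    }
    CachedBlob { bytes, pointer_slots }
}

/// Produce bytes ready to be placed at address `base`.
/// Only the recorded slots change; everything else is a plain copy.
fn patch_for_base(blob: &CachedBlob, base: u64) -> Vec<u8> {
    let mut out = blob.bytes.clone();
    for &slot in &blob.pointer_slots {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&out[slot..slot + 8]);
        let absolute = u64::from_le_bytes(buf).wrapping_add(base);
        out[slot..slot + 8].copy_from_slice(&absolute.to_le_bytes());
    }
    out
}
```

Keeping `pointer_slots` in the cache means the reader never has to re-walk the nested structure to find the pointers. The same list could also be used to emit relocation entries if the final addresses are left to the linker or loader.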


Last updated: Jul 06 2025 at 12:14 UTC