Thaw Implementation

Three Thaw Paths

fastboot_thaw_closure(cap, path)
    │
    ├─→ thaw_from_section()         ← ELF section: mremap, zero I/O
    │     _fastboot_section_anchor exists?
    │     header magic/version valid?
    │     mremap(MREMAP_FIXED | MREMAP_DONTUNMAP) → target_va
    │
    ├─→ thaw_from_fd(/proc/self/exe) ← EmbedFooter: mmap from binary
    │     read footer from end of file
    │     footer magic valid?
    │     mmap MAP_PRIVATE from binary file
    │
    └─→ thaw_from_fd(path)          ← standalone .snap file
          mmap MAP_PRIVATE from file
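
The diagram reads as a fallback chain. The following is a minimal sketch of that dispatch, assuming the three paths are tried top to bottom and the first success wins. The types `thaw_status` and `thaw_path`, the helper `thaw_first_available`, and the stand-in functions are hypothetical names introduced for illustration; they are not the project's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of a single thaw attempt. */
typedef enum { THAW_OK, THAW_UNAVAILABLE } thaw_status;

/* Hypothetical common signature for the three thaw paths. */
typedef thaw_status (*thaw_fn)(const char *path);

typedef struct {
    const char *name;     /* label for diagnostics */
    thaw_fn     attempt;  /* the path to try */
    const char *path;     /* file argument, or NULL if the path takes none */
} thaw_path;

/* Try each path in order; return the name of the first one that
   succeeds, or NULL if every path is unavailable. */
static const char *thaw_first_available(const thaw_path *paths, size_t n)
{
    assert(paths != NULL);
    for (size_t i = 0; i < n; i++) {
        if (paths[i].attempt(paths[i].path) == THAW_OK)
            return paths[i].name;
    }
    return NULL;
}

/* Stand-in: pretend the binary has no embedded fastboot section. */
static thaw_status section_absent(const char *path)
{
    (void)path;
    return THAW_UNAVAILABLE;
}

/* Stand-in: pretend a file holds a snapshot only if its name contains ".snap". */
static thaw_status fd_has_snapshot(const char *path)
{
    return (path != NULL && strstr(path, ".snap") != NULL)
               ? THAW_OK : THAW_UNAVAILABLE;
}

/* Wire the three paths together in the order shown in the diagram. */
static const char *demo_thaw(const char *snap_path)
{
    const thaw_path paths[] = {
        { "section",      section_absent,  NULL             },
        { "embed-footer", fd_has_snapshot, "/proc/self/exe" },
        { "standalone",   fd_has_snapshot, snap_path        },
    };
    return thaw_first_available(paths, sizeof paths / sizeof paths[0]);
}
```

With these stand-ins, `demo_thaw("heap.snap")` falls through the first two paths and reports the standalone path, while a file name without `.snap` yields NULL.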

ELF Section Embedding

The most efficient path embeds the snapshot in the binary's ELF image:

  1. Build time: embed_section.S creates a small placeholder fastboot section with "aw" (alloc, write) flags

  2. Linker script: fastboot.ld places the section after .bss in its own PT_LOAD segment, preventing .bss overlap when the section is expanded

  3. Snapshot derivation: run the binary once to freeze, producing snapshot.snap

  4. objcopy --update-section: replaces the placeholder with the real snapshot data, expanding the PT_LOAD segment

  5. Runtime: the ELF loader maps the section into the process address space. thaw_from_section then calls mremap with MREMAP_FIXED | MREMAP_DONTUNMAP (Linux also requires MREMAP_MAYMOVE alongside these flags) to move the page table entries to the target VA. This is pure kernel page table manipulation with no I/O: 8 µs for 91 MB
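
The runtime step can be illustrated in isolation. The sketch below moves pages to a fixed address with mremap, using an anonymous mapping as a stand-in for the ELF section (an assumption made so the example runs without the linker setup). The helper names `move_pages_to` and `demo_move` are hypothetical. Linux requires MREMAP_MAYMOVE whenever MREMAP_FIXED or MREMAP_DONTUNMAP is passed, and MREMAP_DONTUNMAP needs Linux 5.7 or later, so the sketch falls back to a plain move when the kernel rejects the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4  /* Linux 5.7+; older libc headers may lack it */
#endif

/* Move the pages backing [src, src+len) to the fixed address dst.
   Returns dst on success, MAP_FAILED on error. */
static void *move_pages_to(void *src, size_t len, void *dst)
{
    /* MREMAP_MAYMOVE is mandatory with both other flags. */
    void *p = mremap(src, len, len,
                     MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP, dst);
    if (p == MAP_FAILED && errno == EINVAL) {
        /* Kernel rejected MREMAP_DONTUNMAP: do a plain move instead
           (the source range is unmapped in this case). */
        p = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    }
    return p;
}

/* Returns 1 if the payload written at the source is readable at the target. */
static int demo_move(void)
{
    const size_t len = 16 * 4096;

    /* Stand-in for the loader-mapped section. */
    char *src = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    /* Reserve a free range to act as the fixed target VA. */
    char *dst = mmap(NULL, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED || dst == MAP_FAILED)
        return 0;

    memcpy(src, "frozen heap", 12);

    char *moved = move_pages_to(src, len, dst);
    if (moved == MAP_FAILED)
        return 0;
    assert(moved == dst);  /* MREMAP_FIXED guarantees the target address */

    int ok = memcmp(moved, "frozen heap", 12) == 0;
    munmap(moved, len);
    munmap(src, len);      /* harmless if the fallback already unmapped it */
    return ok;
}
```

No bytes are copied by the mremap call; the kernel only rewires page table entries, which is why the cost does not scale with the amount of data the pages hold.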

BFD Linker Requirement

The fastboot ELF section requires the BFD linker (-fuse-ld=bfd). The Gold linker aggressively strips custom sections, even when their symbols are forced in with --undefined. BFD respects section placement directives and properly handles INSERT AFTER .bss in linker scripts.

Linker Script

/* Place fastboot section after .bss in its own LOAD segment.
   This ensures objcopy --update-section can expand it without
   overlapping .bss or other data sections.
   SUBALIGN(4096) ensures internal alignment for mmap/mremap. */
SECTIONS {
  fastboot ALIGN(CONSTANT(MAXPAGESIZE)) : SUBALIGN(4096) { *(fastboot) }
} INSERT AFTER .bss;

Three-Phase Nix Build

The embedded pipeline is built in three Nix phases:

  1. nix build — compile ghc-fastboot with BFD linker, placeholder fastboot ELF section
  2. snapshot derivation — run binary once to freeze, cached
  3. bench-embedded derivation — objcopy --update-section fastboot=snapshot.snap

Fixed VA at 900GB Offset

The target VA is 0x4200000000 + 900GB + FIRST_BLOCK_OFF — 900GB into GHC's 1TB mblock_address_space. This ensures:

  • HEAP_ALLOCED(p) returns true for all frozen pointers
  • Bdescr(p) resolves to our manually-initialized descriptors
  • The address is far above the active GHC heap (which grows from the bottom), avoiding collisions
  • Same address across runs → va_delta = 0 → zero internal relocations
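
The address arithmetic can be checked with a short sketch. The constants below are assumptions: they mirror what GHC uses on 64-bit Linux as far as I know (1 MiB megablocks, 4 KiB blocks, 64-byte block descriptors, address space starting at 0x4200000000), and "900GB" is read as 900 GiB. The function names are hypothetical, and `bdescr_addr` is modelled on the RTS Bdescr macro rather than copied from this project.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (GHC, 64-bit Linux, large address space). */
#define MBLOCK_SPACE_BEGIN UINT64_C(0x4200000000)
#define MBLOCK_SPACE_SIZE  (UINT64_C(1) << 40)    /* 1 TiB reservation */
#define SNAPSHOT_OFFSET    (UINT64_C(900) << 30)  /* 900 GiB (assumed binary GB) */

#define MBLOCK_SHIFT 20                            /* 1 MiB megablocks */
#define BLOCK_SHIFT  12                            /* 4 KiB blocks */
#define BDESCR_SHIFT 6                             /* 64-byte descriptors */
#define MBLOCK_MASK  ((UINT64_C(1) << MBLOCK_SHIFT) - 1)
#define BLOCK_MASK   ((UINT64_C(1) << BLOCK_SHIFT) - 1)

/* Descriptors for all 256 blocks sit at the start of each megablock:
   256 * 64 bytes = 16 KiB. */
#define FIRST_BLOCK_OFF \
    ((UINT64_C(1) << (MBLOCK_SHIFT - BLOCK_SHIFT)) << BDESCR_SHIFT)

/* First usable byte of the megablock 900 GiB into the address space. */
static uint64_t target_va(void)
{
    /* The megablock base must be megablock-aligned. */
    assert(((MBLOCK_SPACE_BEGIN + SNAPSHOT_OFFSET) & MBLOCK_MASK) == 0);
    return MBLOCK_SPACE_BEGIN + SNAPSHOT_OFFSET + FIRST_BLOCK_OFF;
}

/* Range check in the spirit of HEAP_ALLOCED: is p inside the reservation? */
static int heap_alloced(uint64_t p)
{
    return p >= MBLOCK_SPACE_BEGIN &&
           p <  MBLOCK_SPACE_BEGIN + MBLOCK_SPACE_SIZE;
}

/* Address of the block descriptor for the block containing p. */
static uint64_t bdescr_addr(uint64_t p)
{
    return ((p & MBLOCK_MASK & ~BLOCK_MASK) >> (BLOCK_SHIFT - BDESCR_SHIFT))
         | (p & ~MBLOCK_MASK);
}
```

Under these assumptions `target_va()` evaluates to 0x12300004000, which lies inside the 1 TiB range, and its descriptor lands at 0x12300000100, inside the descriptor area at the start of the same megablock.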

Performance Breakdown

For a 91MB snapshot (1M Vector Text entries):

Phase                   Time        What happens
Header parse            4 µs        Read SnapshotHeaderV2 from mapped memory
mmap anonymous          4 µs        Reserve target VA range for bdescr area
mremap                  8 µs        Move page table entries from ELF section to target VA
Relocation              0 µs        Same binary + fixed VA: nothing to do
bdescr init             884 µs      Initialize block descriptors for 91 megablocks
Thaw total              ~900 µs
RTS hs_init + hs_exit   ~1600 µs    Irreducible runtime overhead
Process total           ~2600 µs    238x faster than 618ms cold start

Data pages are demand-paged: only pages actually accessed trigger minor page faults (~40ns each). A CLI tool that touches 100 entries out of 1M pays for ~100 pages, not ~23,000.
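
The page counts above follow from simple arithmetic, sketched here. The assumptions are 4 KiB pages, a snapshot of 91 MiB, the ~40 ns per-fault figure quoted above, and each touched entry landing on its own page. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages needed to cover `bytes`, rounding up. */
static uint64_t pages_spanned(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Total minor-fault cost in nanoseconds for a given number of touched pages. */
static uint64_t fault_cost_ns(uint64_t pages_touched, uint64_t ns_per_fault)
{
    return pages_touched * ns_per_fault;
}
```

For example, `pages_spanned(91 << 20, 4096)` gives 23,296 pages for the whole snapshot, and `fault_cost_ns(100, 40)` gives 4,000 ns, so touching 100 pages costs about 4 µs. Touching every page would cost `fault_cost_ns(23296, 40)`, roughly 0.93 ms.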