Thaw Implementation¶
Three Thaw Paths¶
fastboot_thaw_closure(cap, path)
│
├─→ thaw_from_section()           ← ELF section: mremap, zero I/O
│     _fastboot_section_anchor exists?
│     header magic/version valid?
│     mremap(MREMAP_FIXED | MREMAP_DONTUNMAP) → target_va
│
├─→ thaw_from_fd(/proc/self/exe)  ← EmbedFooter: mmap from binary
│     read footer from end of file
│     footer magic valid?
│     mmap MAP_PRIVATE from binary file
│
└─→ thaw_from_fd(path)            ← standalone .snap file
      mmap MAP_PRIVATE from file
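
The second path (footer lookup followed by a private mapping) can be illustrated with a small Python sketch. The footer layout, the magic value, and the names `read_footer` and `map_payload` are hypothetical and chosen only for illustration; the real EmbedFooter format is not described here.

```python
import mmap
import os
import struct

# Hypothetical footer layout (NOT the real EmbedFooter): an 8-byte magic,
# then the payload offset and payload size as little-endian uint64 values.
FOOTER = struct.Struct("<8sQQ")
FOOTER_MAGIC = b"FBOOTSNP"  # made-up magic value


def read_footer(fd):
    """Return (offset, size) of the embedded payload, or None if absent."""
    file_size = os.fstat(fd).st_size
    if file_size < FOOTER.size:
        return None
    # The footer occupies the last FOOTER.size bytes of the file.
    raw = os.pread(fd, FOOTER.size, file_size - FOOTER.size)
    magic, offset, size = FOOTER.unpack(raw)
    if magic != FOOTER_MAGIC or offset + size > file_size - FOOTER.size:
        return None
    return offset, size


def map_payload(fd, offset, size):
    """Map the payload copy-on-write; the offset must be page-aligned."""
    if offset % mmap.ALLOCATIONGRANULARITY != 0:
        raise ValueError("payload offset is not page-aligned")
    # MAP_PRIVATE: writes stay in this process and never reach the file.
    return mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
```

If `read_footer` returns `None`, a caller would fall through to the next path in the chain, mirroring the order shown in the diagram.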
ELF Section Embedding¶
The most efficient path embeds the snapshot in the binary's ELF image:
- Build time: embed_section.S creates a small placeholder fastboot section with "aw" (alloc, write) flags
- Linker script: fastboot.ld places the section after .bss in its own PT_LOAD segment, preventing .bss overlap when the section is expanded
- Snapshot derivation: run the binary once to freeze, producing snapshot.snap
- objcopy --update-section: replaces the placeholder with the real snapshot data, expanding the PT_LOAD segment
- Runtime: the ELF loader maps the section into the process address space. thaw_from_section calls mremap(MREMAP_FIXED | MREMAP_DONTUNMAP) to move the page table entries to the target VA (mremap(2) also requires MREMAP_MAYMOVE whenever either of these flags is set). This is pure kernel page table manipulation: 8 µs for 91 MB
BFD Linker Requirement¶
The fastboot ELF section requires the BFD linker (-fuse-ld=bfd). The Gold linker aggressively strips custom sections even with --undefined symbols. BFD respects section placement directives and properly handles INSERT AFTER .bss in linker scripts.
Linker Script¶
/* Place fastboot section after .bss in its own LOAD segment.
   This ensures objcopy --update-section can expand it without
   overlapping .bss or other data sections.
   SUBALIGN(4096) ensures internal alignment for mmap/mremap. */
SECTIONS {
  fastboot ALIGN(CONSTANT(MAXPAGESIZE)) : SUBALIGN(4096) { *(fastboot) }
} INSERT AFTER .bss;
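
The effect of ALIGN(CONSTANT(MAXPAGESIZE)) is to round the section's start address up to a page boundary, so the section does not share a page with the end of .bss. The arithmetic can be sketched as follows; the end-of-.bss address and the 4 KiB page size are made-up illustrative values, since the actual MAXPAGESIZE depends on the target and linker configuration.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


# Illustrative values only: a made-up end-of-.bss address and a 4 KiB
# page size; the real MAXPAGESIZE depends on the target and linker flags.
MAX_PAGE_SIZE = 0x1000
bss_end = 0x4C2A38

section_start = align_up(bss_end, MAX_PAGE_SIZE)
padding = section_start - bss_end

print(f"section starts at {section_start:#x} after {padding} bytes of padding")
```

An address that is already aligned is returned unchanged, so the padding is always smaller than one page.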
Three-Phase Nix Build¶
The embedded pipeline is a three-phase nix build:
- nix build: compile ghc-fastboot with BFD linker, placeholder fastboot ELF section
- snapshot derivation: run binary once to freeze, cached
- bench-embedded derivation: objcopy --update-section fastboot=snapshot.snap
Fixed VA at 900GB Offset¶
The target VA is 0x4200000000 + 900GB + FIRST_BLOCK_OFF — 900GB into GHC's 1TB mblock_address_space. This ensures:
- HEAP_ALLOCED(p) returns true for all frozen pointers
- Bdescr(p) resolves to our manually-initialized descriptors
- The address is far above the active GHC heap (which grows from the bottom), avoiding collisions
- Same address across runs → va_delta = 0 → zero internal relocations
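
The address arithmetic behind these properties can be checked with a short Python sketch. The shift constants are assumptions based on typical 64-bit GHC RTS defaults (1 MiB megablocks, 4 KiB blocks, 64-byte block descriptors), "GB" and "TB" are read as powers of two, and the helper names are illustrative rather than the project's own.

```python
# Assumed constants: typical 64-bit GHC RTS values, not read from this
# project's sources. "GB"/"TB" are interpreted as GiB/TiB.
GIB = 1 << 30
MBLOCK_SHIFT = 20          # 1 MiB megablocks
BLOCK_SHIFT = 12           # 4 KiB blocks
BDESCR_SHIFT = 6           # 64-byte block descriptors
MBLOCK_MASK = (1 << MBLOCK_SHIFT) - 1
BLOCK_MASK = (1 << BLOCK_SHIFT) - 1

ADDRESS_SPACE_BASE = 0x4200000000   # base address from the text above
ADDRESS_SPACE_SIZE = 1024 * GIB     # the 1 TB reservation
# The descriptors for every block of a megablock sit at its start, so the
# first usable block begins right after them.
FIRST_BLOCK_OFF = (1 << BDESCR_SHIFT) << (MBLOCK_SHIFT - BLOCK_SHIFT)

TARGET_VA = ADDRESS_SPACE_BASE + 900 * GIB + FIRST_BLOCK_OFF


def heap_alloced(p: int) -> bool:
    """Range check in the spirit of HEAP_ALLOCED for a contiguous heap."""
    return ADDRESS_SPACE_BASE <= p < ADDRESS_SPACE_BASE + ADDRESS_SPACE_SIZE


def bdescr(p: int) -> int:
    """Address of the block descriptor covering p (Bdescr-style arithmetic)."""
    block_bits = p & MBLOCK_MASK & ~BLOCK_MASK
    return (block_bits >> (BLOCK_SHIFT - BDESCR_SHIFT)) | (p & ~MBLOCK_MASK)


def va_delta(recorded_base: int, mapped_base: int) -> int:
    """Offset to add to every internal pointer; zero means no relocation."""
    return mapped_base - recorded_base


print(f"target VA:        {TARGET_VA:#x}")
print(f"inside heap range: {heap_alloced(TARGET_VA)}")
print(f"descriptor at:    {bdescr(TARGET_VA):#x}")
```

Because the snapshot is always mapped at the same `TARGET_VA`, `va_delta` evaluates to zero on every run, which is why the relocation phase has nothing to do.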
Performance Breakdown¶
For a 91MB snapshot (1M Vector Text entries):
| Phase | Time | What happens |
|---|---|---|
| Header parse | 4 µs | Read SnapshotHeaderV2 from mapped memory |
| mmap anonymous | 4 µs | Reserve target VA range for bdescr area |
| mremap | 8 µs | Move page table entries from ELF section to target VA |
| Relocation | 0 µs | Same binary + fixed VA: nothing to do |
| bdescr init | 884 µs | Initialize block descriptors for 91 megablocks |
| Thaw total | ~900 µs | |
| RTS hs_init + hs_exit | ~1600 µs | Irreducible runtime overhead |
| Process total | ~2600 µs | 238x faster than 618ms cold start |
Data pages are demand-paged: only pages actually accessed trigger minor page faults (~40ns each). A CLI tool that touches 100 entries out of 1M pays for ~100 pages, not ~23,000.
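
The figures above can be cross-checked with a few lines of Python. The timings are copied from the table; reading "91MB" as 91 MiB, using 4 KiB pages, and assuming each touched entry lands on its own page are assumptions made for this estimate.

```python
# Figures copied from the table above (microseconds).
THAW_PHASES_US = {
    "header parse": 4,
    "mmap anonymous": 4,
    "mremap": 8,
    "relocation": 0,
    "bdescr init": 884,
}
PROCESS_TOTAL_US = 2600      # stated process total
COLD_START_US = 618_000      # 618 ms cold start

# Assumptions: "91MB" means 91 MiB, pages are 4 KiB, and each touched
# entry lands on its own page.
PAGE_SIZE = 4096
SNAPSHOT_BYTES = 91 * 1024 * 1024
FAULT_NS = 40                # quoted minor-fault cost

thaw_total_us = sum(THAW_PHASES_US.values())
speedup = COLD_START_US / PROCESS_TOTAL_US
total_pages = SNAPSHOT_BYTES // PAGE_SIZE


def fault_cost_us(pages_touched: int) -> float:
    """Estimated demand-paging cost for the given number of touched pages."""
    return pages_touched * FAULT_NS / 1000


print(f"thaw total: {thaw_total_us} µs")
print(f"speedup: {speedup:.0f}x")
print(f"pages in snapshot: {total_pages}")
print(f"fault cost, 100 pages: {fault_cost_us(100)} µs")
print(f"fault cost, all pages: {fault_cost_us(total_pages):.0f} µs")
```

Under these assumptions, touching every page would cost roughly as much as the whole thaw phase, while touching 100 pages is negligible next to the RTS startup cost.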