GHC Runtime Internals
The Storage Manager
GHC's storage manager is a generational copying collector with these key structures:
Megablocks (1MB): the unit of allocation from the OS. Each megablock is divided into blocks (4KB), with block descriptors (bdescr) stored at the start of the megablock.
Megablock (1MB):
┌──────────────────────────────────────────────────────┐
│ bdescr[0] │ bdescr[1] │ ... │ bdescr[N] │ padding │ ← FIRST_BLOCK_OFF
├──────────────────────────────────────────────────────┤
│ Block 0 (4KB) │ Block 1 (4KB) │ ... │ Block N (4KB) │
└──────────────────────────────────────────────────────┘
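The layout arithmetic behind the diagram can be sketched in a few lines. This is a standalone model, not RTS code: the `MODEL_*` names are made up for illustration, and the 64-byte descriptor size is an assumption that holds for typical 64-bit builds but should be checked against `Block.h` for your GHC version.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the text: 1MB megablocks, 4KB blocks. */
#define MODEL_MBLOCK_SIZE ((size_t)1 << 20)
#define MODEL_BLOCK_SIZE  ((size_t)1 << 12)
/* Assumption: one block descriptor occupies 64 bytes. */
#define MODEL_BDESCR_SIZE ((size_t)64)

/* Number of 4KB slots in a megablock, counting the slots
   that end up holding the descriptor table itself. */
static size_t model_block_slots(void) {
    return MODEL_MBLOCK_SIZE / MODEL_BLOCK_SIZE;
}

/* Offset of the first usable block: the descriptor table
   (one entry per slot) rounded up to a block boundary. */
static size_t model_first_block_off(void) {
    size_t table = MODEL_BDESCR_SIZE * model_block_slots();
    return (table + MODEL_BLOCK_SIZE - 1) & ~(MODEL_BLOCK_SIZE - 1);
}

/* Blocks left over for heap data after the descriptor table. */
static size_t model_usable_blocks(void) {
    return (MODEL_MBLOCK_SIZE - model_first_block_off()) / MODEL_BLOCK_SIZE;
}
```

Under these assumptions the descriptor table takes 16KB, so the first four block slots of every megablock hold metadata and 252 blocks remain for data.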
Bdescr(p) macro: given any pointer p into the heap, computes the address of its block descriptor by masking and offsetting. This is how the GC finds metadata for any heap object — it's O(1) pointer arithmetic, no lookup table.
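As a rough illustration of the "masking and offsetting", here is a standalone model of the address computation. It works on plain integers rather than real heap pointers; the function name and the `BD_*` constants are hypothetical, and the shift of 6 assumes 64-byte descriptors.

```c
#include <assert.h>
#include <stdint.h>

#define BD_MBLOCK_SHIFT 20 /* 1MB megablocks */
#define BD_BLOCK_SHIFT  12 /* 4KB blocks */
#define BD_BDESCR_SHIFT 6  /* assumption: 64-byte descriptors */
#define BD_MBLOCK_MASK  (((uint64_t)1 << BD_MBLOCK_SHIFT) - 1)

/* Model of the descriptor lookup: find the megablock that contains p,
   find which block inside it p falls in, and index the descriptor
   table that sits at the start of that megablock. */
static uint64_t model_bdescr_addr(uint64_t p) {
    uint64_t mblock_base = p & ~BD_MBLOCK_MASK;
    uint64_t block_index = (p & BD_MBLOCK_MASK) >> BD_BLOCK_SHIFT;
    return mblock_base + (block_index << BD_BDESCR_SHIFT);
}
```

Every pointer inside the same 4KB block maps to the same descriptor address, and no memory is read during the computation. That is also why the result is only meaningful if something has initialized the descriptor at that address.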
HEAP_ALLOCED(p) macro: checks whether p falls within the GC's reserved virtual address range. GHC reserves a 1TB VA range (mblock_address_space) at startup. Any pointer within this range is a heap pointer; anything outside is static/code.
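The range check can be modelled as a simple bounds comparison. The struct and function below are illustrative stand-ins, not the RTS definitions, and the base address used in the usage note is an arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the reserved range: [begin, end). */
struct model_address_space {
    uint64_t begin;
    uint64_t end;
};

/* Model of the heap check: a pointer counts as a heap pointer
   exactly when it lies inside the reserved range. */
static bool model_heap_alloced(const struct model_address_space *as,
                               uint64_t p) {
    return p >= as->begin && p < as->end;
}
```

With a 1TB range starting at, say, `0x4200000000`, an address 900GB into the range passes the check while an address just past the end does not. The check says nothing about whether the memory at that address has valid block descriptors.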
Why External Memory Is Hard
The Bdescr(p) macro assumes that every pointer resolves to a valid block descriptor. If you mmap memory at an arbitrary address and store a Haskell closure there, Bdescr(p) returns a pointer into uninitialized memory. The next GC dereferences this garbage pointer and crashes.
The ghc-fastboot solution: mmap inside the mblock_address_space range at a fixed offset (900GB into the 1TB range), then manually initialize the bdescr entries for each megablock. This makes Bdescr(p) return valid descriptors and HEAP_ALLOCED(p) return true, so the GC treats our memory as legitimate heap.
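A minimal sketch of how the target address for such a mapping might be computed, assuming binary units (1TB = 1024GiB, 900GB = 900GiB) and a megablock-aligned reservation. The `fb_*` names are hypothetical and the function only does arithmetic; it does not call mmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FB_GIB           ((uint64_t)1 << 30)
#define FB_MBLOCK_SIZE   ((uint64_t)1 << 20)
#define FB_RESERVED_SIZE (1024 * FB_GIB) /* 1TB reservation */
#define FB_REGION_OFFSET (900 * FB_GIB)  /* fixed offset from the text */

/* Computes where a region of n_mblocks megablocks would start.
   Returns false if the reservation base is not megablock-aligned
   or the region would run past the end of the reservation. */
static bool fb_region_address(uint64_t reserve_begin, uint64_t n_mblocks,
                              uint64_t *out) {
    uint64_t room = FB_RESERVED_SIZE - FB_REGION_OFFSET;
    if (reserve_begin % FB_MBLOCK_SIZE != 0) return false;
    if (n_mblocks == 0 || n_mblocks > room / FB_MBLOCK_SIZE) return false;
    *out = reserve_begin + FB_REGION_OFFSET;
    return true;
}
```

Because the offset is a multiple of 1MB, the resulting address stays megablock-aligned whenever the reservation base is, which is what the descriptor lookup relies on.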
Block Descriptor Setup
Each megablock requires its bdescr array to be initialized:
// Head block: represents the entire megablock as a large object
bdescr *head = Bdescr(first_block);
head->start = first_block;
head->blocks = total_blocks;
head->flags = BF_LARGE | BF_PINNED; // never moved, never copied
head->gen = oldest_gen;             // no promotion

// Sub-blocks: point back to head
for (int b = 1; b < total_blocks; b++) {
    // Address of the b-th block in the group (first_block is assumed to be an StgPtr)
    StgPtr block_n = (StgPtr)((StgWord8 *)first_block + (size_t)b * BLOCK_SIZE);
    bdescr *bd = Bdescr(block_n);
    bd->start = first_block;          // same as head
    bd->blocks = 0;                   // "part of a larger group"
    bd->flags = BF_LARGE | BF_PINNED;
    bd->gen = oldest_gen;
}

// Link into GC's large object list
dbl_link_onto(head, &oldest_gen->large_objects);
BF_LARGE | BF_PINNED tells the GC: this is a large object that should never be moved or evacuated. The GC will traverse it for pointers (if needed) but won't try to copy it. oldest_gen placement means it's only examined during major GC.
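The head/sub-block pattern and the list insertion can be exercised outside the RTS with a toy model. Everything here is a stand-in: `toy_bdescr` has only the fields the setup touches, the flag bits are arbitrary distinct values rather than the ones in `Block.h`, and `toy_dbl_link_onto` is a plain push onto the front of a doubly-linked list, which is what the real helper is assumed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits, not copied from the RTS headers. */
#define TOY_BF_LARGE  0x2
#define TOY_BF_PINNED 0x4

typedef struct toy_bdescr {
    void *start;              /* start of the block group */
    uint32_t blocks;          /* group size on the head, 0 on sub-blocks */
    uint16_t flags;
    struct toy_bdescr *link;  /* next group in the list */
    struct toy_bdescr *back;  /* previous group in the list */
} toy_bdescr;

/* Push bd onto the front of a doubly-linked list. */
static void toy_dbl_link_onto(toy_bdescr *bd, toy_bdescr **list) {
    bd->link = *list;
    bd->back = NULL;
    if (*list) (*list)->back = bd;
    *list = bd;
}

/* Initialize descs[0] as the head of a group of total_blocks blocks,
   mark the remaining descriptors as members of that group, and link
   the head onto the large-object list. */
static void toy_init_group(toy_bdescr *descs, uint32_t total_blocks,
                           void *first_block, toy_bdescr **large_objects) {
    descs[0].start = first_block;
    descs[0].blocks = total_blocks;
    descs[0].flags = TOY_BF_LARGE | TOY_BF_PINNED;
    for (uint32_t b = 1; b < total_blocks; b++) {
        descs[b].start = first_block; /* same as head */
        descs[b].blocks = 0;          /* part of a larger group */
        descs[b].flags = TOY_BF_LARGE | TOY_BF_PINNED;
        descs[b].link = NULL;
        descs[b].back = NULL;
    }
    toy_dbl_link_onto(&descs[0], large_objects);
}
```

Only the head descriptor goes onto the list; the sub-block descriptors exist so that a lookup from any interior pointer still lands on a descriptor that names the group's start.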
This bdescr initialization is currently the dominant cost of thaw: 884µs for 91 megablocks (about 9.7µs per megablock), against 8µs for the mmap/mremap itself. A possible future optimization is lazy bdescr initialization via userfaultfd, which would initialize each megablock's descriptors only when the GC first encounters a pointer into it.
GC Integration: BF_LARGE | BF_PINNED
ghc-fastboot does not use compact regions. Instead, it registers memory with the GC using the BF_LARGE | BF_PINNED flag combination on manually-initialized block descriptors.
This is a pragmatic choice:
- BF_LARGE: tells the GC this is a large object. Large objects are managed by relinking their bdescr between generations, never copied. The GC scavenges them (follows pointers inside) but doesn't move them.
- BF_PINNED: tells the GC this object must never be moved or freed by the collector. Combined with BF_LARGE, this creates an immovable, persistent region that the GC is aware of but won't disturb.
- oldest_gen placement: the region is linked into oldest_gen->large_objects, so it's only examined during major GC (infrequent). The StablePtr on the root closure serves as a GC root, keeping the entire graph alive.
This combination is simpler than the compact region protocol (no StgCompactNFDataBlock headers, no compactFixupPointers calls, no block-chain management) and avoids the compact region's normal-form restriction.
Why Not Compact Regions?
Compact regions (BF_COMPACT) are opaque to the GC: it never traces inside. This is perfect for fully-evaluated data (constructors only) but prevents freezing thunks. If a thunk inside a compact region is forced and allocates new heap objects, the GC never sees the references to those objects from inside the region, so it can collect them while they are still in use.
With BF_LARGE | BF_PINNED, the GC does trace into our region during major GC. This means thunks inside the frozen data can be safely forced — the GC will find any new objects they allocate. The cost is that major GC walks our bdescrs, but since the data is in oldest_gen, this only happens during major collections.
| Aspect | Compact Regions | ghc-fastboot |
|---|---|---|
| GC flag | BF_COMPACT | BF_LARGE \| BF_PINNED |
| GC traversal | Never traces inside (opaque) | Traces normally during major GC |
| Closure types | Normal form only | All types (thunks, functions, PAPs, ...) |
| Thunk evaluation | Unsafe (GC can't see new objects) | Safe (GC scavenges the region) |
| Setup | Block chain + compact closure + fixup | Manual bdescr init + StablePtr |
| Memory layout | Compact region block chain | Contiguous mmap'd megablocks |
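The contrast in the table can be condensed into a small decision function. This is a deliberately simplified model of the behaviour described in this section, not the logic in `Evac.c`; the flag bits and all `model_*` names are placeholders chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits, not copied from the RTS headers. */
#define M_BF_LARGE   0x2u
#define M_BF_PINNED  0x4u
#define M_BF_COMPACT 0x200u

enum model_treatment {
    MODEL_COPY,               /* ordinary block: objects are copied */
    MODEL_RELINK_AND_SCAVENGE,/* large object: stays put, GC traces inside */
    MODEL_OPAQUE              /* compact region: GC never traces inside */
};

/* How the collector treats a block, per the table above. */
static enum model_treatment model_gc_treatment(unsigned flags) {
    if (flags & M_BF_COMPACT) return MODEL_OPAQUE;
    if (flags & M_BF_LARGE)   return MODEL_RELINK_AND_SCAVENGE;
    return MODEL_COPY;
}

/* Forcing a thunk is only safe where the GC will later find the
   pointers the thunk's result introduces. */
static bool model_thunks_safe(unsigned flags) {
    return model_gc_treatment(flags) != MODEL_OPAQUE;
}
```

The useful property for ghc-fastboot is the middle case: the memory is never moved, yet pointers stored into it remain visible to the collector.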
The Missing Public API
GHC has no public API for registering external memory with the GC. The machinery exists — Bdescr(), block flags, generation lists — but it's all internal to the RTS. ghc-fastboot accesses it by:
- Including RTS internal headers (rts/storage/Block.h, rts/storage/GC.h)
- Directly manipulating bdescr structs via pointer arithmetic
- Linking into oldest_gen->large_objects via dbl_link_onto
- Calling getStablePtr for GC root registration
This works because GHC statically links the RTS, making all internal symbols available. But it's fragile — dependent on RTS struct layouts that can change between GHC versions.
A proper public API would look like:
registerExternalHeap :: Ptr a -> Int -> IO ExternalHeapHandle
unregisterExternalHeap :: ExternalHeapHandle -> IO ()
This would enable not just ghc-fastboot but any project that needs to bring foreign memory into the GHC heap: shared memory IPC, memory-mapped databases, custom allocators, GPU-managed buffers.
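On the RTS side, such an API would plausibly start by validating its arguments, since block descriptors are laid out per megablock. The sketch below is entirely hypothetical: no such function exists in GHC, and the `ext_heap_*` names and the exact preconditions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_MBLOCK_SIZE ((uint64_t)1 << 20) /* 1MB megablocks */

/* Hypothetical precondition check for a registerExternalHeap call:
   the region must be non-empty, start on a megablock boundary, and
   cover a whole number of megablocks. */
static bool ext_heap_args_valid(uint64_t addr, uint64_t len) {
    if (addr == 0 || len == 0) return false;
    if (addr % EXT_MBLOCK_SIZE != 0) return false;
    if (len % EXT_MBLOCK_SIZE != 0) return false;
    return true;
}

/* Number of megablocks whose descriptors would need initializing. */
static uint64_t ext_heap_mblocks(uint64_t len) {
    return len / EXT_MBLOCK_SIZE;
}
```

A real design would also need to confirm that the region lies inside the reserved address range and decide who owns the memory after unregistration. Those questions are left open here.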
GHC RTS Source References
- rts/sm/GC.c: garbage collector, closure traversal
- rts/sm/Evac.c: evacuation, evacuate_compact for compact region handling
- rts/sm/CNF.c: compact normal forms, compactAllocateBlock, compactFixupPointers
- rts/include/rts/storage/Block.h: Bdescr macro, block flags, megablock layout
- rts/include/rts/storage/InfoTables.h: info table structure, closure type enumeration
- rts/include/rts/storage/GC.h: generation structure, oldest_gen, large_objects
- rts/PrimOps.cmm: compact region primops (reference for Cmm calling convention)