
GHC Runtime Internals

The Storage Manager

GHC's storage manager is a generational copying collector with these key structures:

Megablocks (1MB): the unit of allocation from the OS. Each megablock is divided into blocks (4KB), with block descriptors (bdescr) stored at the start of the megablock.

Megablock (1MB):
┌────────────────────────────────────────────────────────┐
│ bdescr[0] │ bdescr[1] │ ... │ bdescr[N] │ padding      │ FIRST_BLOCK_OFF
├────────────────────────────────────────────────────────┤
│ Block 0 (4KB) │ Block 1 (4KB) │ ... │ Block N (4KB)    │
└────────────────────────────────────────────────────────┘
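The sizes in the diagram follow from three constants. The sketch below assumes the values GHC typically uses on 64-bit platforms (1MB megablocks, 4KB blocks, 64-byte descriptors). The macro values are written out by hand here, and the lowercase helper functions are illustrative names, not RTS functions.

```c
#include <stddef.h>

/* Assumed values for a typical 64-bit build; the RTS defines its own
   constants in rts/storage/Block.h. */
#define MBLOCK_SIZE ((size_t)1 << 20)  /* 1MB megablock            */
#define BLOCK_SIZE  ((size_t)1 << 12)  /* 4KB block                */
#define BDESCR_SIZE ((size_t)1 << 6)   /* 64-byte block descriptor */

/* Bytes reserved at the start of a megablock for descriptors: one
   descriptor per 4KB slot, rounded up to a whole number of blocks. */
static size_t first_block_off(void) {
    size_t raw = BDESCR_SIZE * (MBLOCK_SIZE / BLOCK_SIZE);
    return (raw + BLOCK_SIZE - 1) & ~(BLOCK_SIZE - 1);
}

/* Number of usable 4KB blocks left after the descriptor area. */
static size_t blocks_per_mblock(void) {
    return (MBLOCK_SIZE - first_block_off()) / BLOCK_SIZE;
}
```

Under these assumptions the descriptor area occupies 16KB (four block-sized slots), leaving 252 usable blocks per megablock.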

Bdescr(p) macro: given any pointer p into the heap, computes the address of its block descriptor by masking and offsetting. This is how the GC finds metadata for any heap object — it's O(1) pointer arithmetic, no lookup table.
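The masking and offsetting can be sketched as a standalone function. This is a model of the arithmetic rather than the RTS macro itself: the shift values assume a 64-bit build with 64-byte descriptors, and bdescr_addr is a hypothetical name.

```c
#include <stdint.h>

/* Assumed shifts: 1MB megablocks, 4KB blocks, 64-byte descriptors. */
#define MBLOCK_SHIFT 20
#define BLOCK_SHIFT  12
#define BDESCR_SHIFT 6
#define MBLOCK_MASK  ((uintptr_t)((1u << MBLOCK_SHIFT) - 1))
#define BLOCK_MASK   ((uintptr_t)((1u << BLOCK_SHIFT) - 1))

/* Given any address inside a megablock, return the address of the
   descriptor for the 4KB block that contains it. */
static uintptr_t bdescr_addr(uintptr_t p) {
    /* Start of the enclosing megablock. */
    uintptr_t mblock_base = p & ~MBLOCK_MASK;
    /* Byte offset of the enclosing block within the megablock. */
    uintptr_t block_offset = p & MBLOCK_MASK & ~BLOCK_MASK;
    /* block_offset / 4096 is the block index; times 64 gives the
       descriptor's offset from the megablock base. */
    return mblock_base | (block_offset >> (BLOCK_SHIFT - BDESCR_SHIFT));
}
```

For example, a pointer 0x5123 bytes into a megablock lies in the block slot with index 5, so its descriptor sits 5 * 64 = 0x140 bytes from the megablock base. Every pointer within the same 4KB block maps to the same descriptor.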

HEAP_ALLOCED(p) macro: checks whether p falls within the GC's reserved virtual address range. GHC reserves a 1TB VA range (mblock_address_space) at startup. Any pointer within this range is a heap pointer; anything outside is static/code.
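The range check amounts to two comparisons. A minimal sketch, in which struct address_range and heap_alloced are hypothetical stand-ins for the RTS's mblock_address_space record and HEAP_ALLOCED macro:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the reserved range: [begin, end). */
struct address_range {
    uintptr_t begin;
    uintptr_t end;
};

/* True when p lies inside the reserved range, i.e. it would be
   treated as a heap pointer rather than static data or code. */
static bool heap_alloced(const struct address_range *r, const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= r->begin && a < r->end;
}
```

Because the test is purely address-based, any memory mapped inside the range is classified as heap, whether or not the RTS allocated it.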

Why External Memory Is Hard

The Bdescr(p) macro assumes that every pointer resolves to a valid block descriptor. If you mmap memory at an arbitrary address and store a Haskell closure there, Bdescr(p) returns a pointer into uninitialized memory. The next GC dereferences this garbage pointer and crashes.

The ghc-fastboot solution: mmap inside the mblock_address_space range at a fixed offset (900GB into the 1TB range), then manually initialize the bdescr entries for each megablock. This makes Bdescr(p) return valid descriptors and HEAP_ALLOCED(p) return true, so the GC treats our memory as legitimate heap.
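Mapping at a fixed offset inside a reserved range can be sketched with plain mmap calls. This is an illustration under stated assumptions, not ghc-fastboot's code: reserve_range stands in for the reservation the RTS makes at startup, the helper names are hypothetical, and anonymous memory is used where a real tool would map its own data.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define MBLOCK_SIZE ((size_t)1 << 20)  /* assumed 1MB megablock */

/* Round an address up to the next megablock boundary. */
static uintptr_t mblock_round_up(uintptr_t a) {
    return (a + MBLOCK_SIZE - 1) & ~(uintptr_t)(MBLOCK_SIZE - 1);
}

/* Reserve len bytes of address space without committing memory.
   Returns NULL on failure. */
static void *reserve_range(size_t len) {
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map len bytes read-write at base + offset. MAP_FIXED replaces
   whatever mapping already covers that address, so the target must
   lie inside a range the caller owns. Returns NULL on failure. */
static void *map_at_offset(void *base, size_t offset, size_t len) {
    assert(((uintptr_t)base + offset) % MBLOCK_SIZE == 0);
    void *want = (char *)base + offset;
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return got == MAP_FAILED ? NULL : got;
}
```

The alignment assertion matters because the descriptor lookup masks addresses to megablock boundaries: a mapping that starts mid-megablock would resolve to descriptors outside the mapped region.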

Block Descriptor Setup

Each megablock requires its bdescr array to be initialized:

// Head block: represents the entire megablock as a large object
bdescr *head = Bdescr(first_block);
head->start = first_block;
head->blocks = total_blocks;
head->flags = BF_LARGE | BF_PINNED;  // never moved, never copied
head->gen = oldest_gen;               // no promotion

// Sub-blocks: point back to head
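for (int b = 1; b < total_blocks; b++) {
    // Address of the b-th block in the group (BLOCK_SIZE is 4KB)
    StgPtr block_n = (StgPtr)((char *)first_block + (size_t)b * BLOCK_SIZE);
    bdescr *bd = Bdescr(block_n);
    bd->start = first_block;  // same as head
    bd->blocks = 0;           // "part of a larger group"
    bd->flags = BF_LARGE | BF_PINNED;
    bd->gen = oldest_gen;
}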

// Link into GC's large object list
dbl_link_onto(head, &oldest_gen->large_objects);

BF_LARGE | BF_PINNED tells the GC: this is a large object that should never be moved or evacuated. The GC will traverse it for pointers (if needed) but won't try to copy it. oldest_gen placement means it's only examined during major GC.
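The final linking step is a front insertion into a doubly-linked list of descriptors. The sketch below models that operation on a simplified struct. The names mini_bdescr and link_onto_front are hypothetical, and the real dbl_link_onto works on the full bdescr and may differ in detail.

```c
#include <stddef.h>

/* Simplified stand-in for bdescr: only the two list fields. */
struct mini_bdescr {
    struct mini_bdescr *link;  /* next descriptor in the list     */
    struct mini_bdescr *back;  /* previous descriptor in the list */
};

/* Push bd onto the front of the doubly-linked list at *list. */
static void link_onto_front(struct mini_bdescr *bd,
                            struct mini_bdescr **list) {
    bd->link = *list;
    bd->back = NULL;
    if (*list != NULL) {
        (*list)->back = bd;  /* old head now points back to bd */
    }
    *list = bd;
}
```

Only the head descriptor of each group is linked. The list is what lets the collector find the region when it walks a generation's large objects.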

This bdescr initialization is currently the dominant cost of thaw (884µs for 91 megablocks). The mmap/mremap itself is 8µs. Future optimization: lazy bdescr init via userfaultfd, initializing each megablock's descriptors only when the GC first encounters a pointer into it.

GC Integration: BF_LARGE | BF_PINNED

ghc-fastboot does not use compact regions. Instead, it registers memory with the GC using the BF_LARGE | BF_PINNED flag combination on manually-initialized block descriptors.

This is a pragmatic choice:

  • BF_LARGE: tells the GC this is a large object. Large objects are managed by relinking their bdescr between generations, never copied. The GC scavenges them (follows pointers inside) but doesn't move them.

  • BF_PINNED: tells the GC this object must never be moved or freed by the collector. Combined with BF_LARGE, this creates an immovable, persistent region that the GC is aware of but won't disturb.

  • oldest_gen placement: the region is linked into oldest_gen->large_objects, so it's only examined during major GC (infrequent). The StablePtr on the root closure serves as a GC root, keeping the entire graph alive.

This combination is simpler than the compact region protocol (no StgCompactNFDataBlock headers, no compactFixupPointers calls, no block-chain management) and avoids the compact region's normal-form restriction.

Why Not Compact Regions?

Compact regions (BF_COMPACT) are opaque to the GC: it never traces inside. This is perfect for fully evaluated data (constructors only) but prevents freezing thunks. If a thunk inside a compact region is forced and allocates new heap objects, the only references to those objects live inside the region, where the GC never looks. The GC therefore treats them as unreachable and collects them, leaving dangling pointers in the region.

With BF_LARGE | BF_PINNED, the GC does trace into our region during major GC. This means thunks inside the frozen data can be safely forced — the GC will find any new objects they allocate. The cost is that major GC walks our bdescrs, but since the data is in oldest_gen, this only happens during major collections.

Aspect            Compact Regions                        ghc-fastboot
----------------  -------------------------------------  ----------------------------------------
GC flag           BF_COMPACT                             BF_LARGE | BF_PINNED
GC traversal      Never traces inside (opaque)           Traces normally during major GC
Closure types     Normal form only                       All types (thunks, functions, PAPs, ...)
Thunk evaluation  Unsafe (GC can't see new objects)      Safe (GC scavenges the region)
Setup             Block chain + compact closure + fixup  Manual bdescr init + StablePtr
Memory layout     Compact region block chain             Contiguous mmap'd megablocks

The Missing Public API

GHC has no public API for registering external memory with the GC. The machinery exists — Bdescr(), block flags, generation lists — but it's all internal to the RTS. ghc-fastboot accesses it by:

  1. Including RTS internal headers (rts/storage/Block.h, rts/storage/GC.h)
  2. Directly manipulating bdescr structs via pointer arithmetic
  3. Linking into oldest_gen->large_objects via dbl_link_onto
  4. Calling getStablePtr for GC root registration

This works because GHC statically links the RTS, making all internal symbols available. But it's fragile — dependent on RTS struct layouts that can change between GHC versions.

A proper public API would look like:

registerExternalHeap :: Ptr a -> Int -> IO ExternalHeapHandle
unregisterExternalHeap :: ExternalHeapHandle -> IO ()

This would enable not just ghc-fastboot but any project that needs to bring foreign memory into the GHC heap: shared memory IPC, memory-mapped databases, custom allocators, GPU-managed buffers.

GHC RTS Source References

  • rts/sm/GC.c — garbage collector, closure traversal
  • rts/sm/Evac.c — evacuation, evacuate_compact for compact region handling
  • rts/sm/CNF.c — compact normal forms, compactAllocateBlock, compactFixupPointers
  • rts/include/rts/storage/Block.h — Bdescr macro, block flags, megablock layout
  • rts/include/rts/storage/InfoTables.h — info table structure, closure type enumeration
  • rts/include/rts/storage/GC.h — generation structure, oldest_gen, large_objects
  • rts/PrimOps.cmm — compact region primops (reference for Cmm calling convention)