GHC Compact Regions, External Heap Registration, and Runtime Limitations

Overview

This document describes limitations in GHC's runtime system that prevent Haskell from competing with either "no runtime" languages (Rust, C) or "full runtime" platforms (JVM, BEAM). It focuses on the storage manager's closed architecture, the compact region mechanism as a partial workaround, and undocumented internal APIs that enable external heap registration.

These findings emerged from the ghc-fastboot project, which implements snapshot/restore for instant Haskell program startup using compact region internals.

The Problem: GHC's Closed Storage Manager

Every pointer must be "known"

GHC's garbage collector assumes it owns all heap memory. Every heap pointer in a Haskell closure must point into a block managed by GHC's block allocator. The GC uses Bdescr(), an inline function, to look up the block descriptor (bdescr) for any pointer:

// rts/include/rts/storage/Block.h
INLINE_HEADER bdescr *Bdescr(StgPtr p) {
    return (bdescr *)(((W_)p & MBLOCK_MASK) +
                      ... offset arithmetic ...);
}

This function performs arithmetic on the pointer value itself, assuming a specific alignment and placement within GHC's megablock structure. It does no validation: if a pointer points to memory not allocated by GHC's block allocator, Bdescr() returns the address of something that is not a block descriptor, and the GC misbehaves when it reads it.
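To make the address arithmetic concrete, here is a minimal standalone sketch of the same shape of computation. The constants are assumptions (4 KiB blocks, 1 MiB megablocks, 64-byte descriptors, as on a typical 64-bit build), and `bdescr_addr` is a hypothetical name; it models the calculation on plain integers rather than reproducing the RTS source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit defaults: 4 KiB blocks, 1 MiB megablocks, 64-byte descriptors. */
#define BLOCK_SHIFT   12
#define MBLOCK_SHIFT  20
#define BDESCR_SHIFT  6

#define BLOCK_MASK   ((UINT64_C(1) << BLOCK_SHIFT) - 1)
#define MBLOCK_MASK  ((UINT64_C(1) << MBLOCK_SHIFT) - 1)

/* Map a heap address to the address of its block descriptor.
 * Descriptors sit at the start of the same megablock, one per block,
 * so the result depends only on the bits of p. Nothing is validated. */
uint64_t bdescr_addr(uint64_t p)
{
    uint64_t mblock_base  = p & ~MBLOCK_MASK;               /* start of the megablock   */
    uint64_t block_offset = p & MBLOCK_MASK & ~BLOCK_MASK;  /* block start, relative    */
    /* (block index) * (descriptor size) == block_offset >> (BLOCK_SHIFT - BDESCR_SHIFT) */
    return mblock_base | (block_offset >> (BLOCK_SHIFT - BDESCR_SHIFT));
}
```

Because the result is derived purely from the pointer's bits, an address inside an mmap'd file or a malloc'd buffer still yields a "descriptor" address; it just points at whatever happens to live there.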

What this prevents

You cannot:

- mmap a file and use its contents as Haskell heap objects
- Allocate memory with malloc and place closures there
- Share heap regions between processes via shared memory
- Restore serialized heap data into arbitrary memory locations

Any attempt to do so may appear to work until the next garbage collection, then fail. The GC encounters a pointer into the foreign memory, calls Bdescr(), gets a garbage block descriptor, and either dereferences invalid memory (a segfault) or silently corrupts the heap.

Comparison with other runtimes

Capability	JVM	BEAM	GHC
Register external memory with GC	`DirectByteBuffer`, `MemorySegment`	Binary references	Nothing public
Zero-copy mmap into heap	`MappedByteBuffer`	Persistent terms	Nothing
Off-heap objects, GC-aware	`Cleaner`, `PhantomReference`	NIFs with resources	`ForeignPtr` (raw bytes only)
Heap snapshot/restore	GraalVM Native Image, CRaC	Code hot-swap	Nothing (this project)
Custom allocator regions	`Arena` (Panama)	Custom allocators	`Compact` (limited)
Hot code swap	Class reloading, OSGi	Native, hot code loading	Not available

The JVM has long offered supported ways to coexist with foreign memory. GHC's GC design dates from the 1990s and assumes a single process that owns its entire heap; it has not been rearchitected for the use cases in the table above.

GHC Is Stuck in the Middle

GHC cannot compete on either end of the runtime spectrum:

Against "no runtime" languages (Rust, C):

- Rust/C: ~0.5ms boot, deterministic memory, no GC pauses
- GHC: ~12ms boot, GC pauses, RTS initialization overhead
- For CLI tools (like fzf), the RTS overhead makes Haskell nonviable

Against "full runtime" platforms (JVM, BEAM):

- JVM: HotSpot JIT, heap snapshots, hot code reload, pluggable GC, off-heap memory
- BEAM: hot code swap, per-process GC, fault tolerance, live introspection
- GHC: no hot swap, no external heap, no JIT, compact regions half-finished

The irony is that Haskell's type system and purity should make it the best candidate for both extremes:

- Pure functions are perfect for AOT snapshot/freeze (no hidden mutation)
- Strong types enable safe hot code swapping (type system guarantees compatibility)
- Immutable data is trivially shareable via mmap (no aliasing hazards)

But the RTS architecture prevents exploiting these properties.

Compact Regions and External Memory Registration

History

Compact regions were introduced as a way to reduce GC pause times. The implementation was initially by Edward Z. Yang and later hardened by Simon Marlow after he moved to Facebook (now Meta) to use Haskell in their data center infrastructure (Sigma/Haxl).

The primary use case at Facebook was keeping large lookup tables out of GC traversal — put them in a compact region, and the GC treats them as a single opaque object instead of tracing through millions of entries.

The serialization API in GHC.Compact.Serialized (withSerializedCompact for export, importCompact for import) was added for serializing compact regions (originally motivated by HPC and RDMA use cases), but no higher-level API for general-purpose external memory registration was ever built on top of this machinery. Compact region development has been largely stalled since its initial implementation.

How compact regions work internally

A compact region is a chain of memory blocks, each with a StgCompactNFDataBlock header:

typedef struct StgCompactNFDataBlock_ {
    struct StgCompactNFDataBlock_ *self;   // self-pointer (relocation detection)
    struct StgCompactNFData_      *owner;  // back-pointer to compact closure
    struct StgCompactNFDataBlock_ *next;   // next block in chain
} StgCompactNFDataBlock;

The first block also contains a StgCompactNFData closure — the compact region object itself:

typedef struct StgCompactNFData_ {
    StgHeader header;                       // info = stg_COMPACT_NFDATA_CLEAN_info
    StgWord totalW;                         // total words across all blocks
    StgWord autoBlockW;                     // default size for new blocks
    StgPtr hp, hpLim;                       // bump allocation pointer and limit
    StgCompactNFDataBlock *nursery;         // current allocation block
    StgCompactNFDataBlock *last;            // last block in chain
    struct hashtable *hash;                 // sharing table (usually NULL)
    StgClosure *result;                     // root of compacted data
    struct StgCompactNFData_ *link;         // used by compacting GC
} StgCompactNFData;

Memory layout of a first block (byte offsets assume a 64-bit, non-profiling RTS):

[ StgCompactNFDataBlock | StgCompactNFData | ... user closures ... ]
^                        ^                  ^
block base               block + 24 bytes   block + 24 + 80 bytes
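The offsets in the diagram can be checked with a small standalone model. The two structs below are stand-ins, not the RTS definitions: every field is widened to one 64-bit word, and the header is assumed to be a single word (non-profiling build). The names `BlockHeaderModel`, `CompactModel`, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;   /* stand-in for a 64-bit StgWord or pointer */

/* Model of StgCompactNFDataBlock: self, owner, next. */
typedef struct { word_t self, owner, next; } BlockHeaderModel;

/* Model of StgCompactNFData, one word per field. */
typedef struct {
    word_t header;              /* one-word StgHeader (assumes no profiling) */
    word_t totalW, autoBlockW;
    word_t hp, hpLim;
    word_t nursery, last;
    word_t hash, result, link;
} CompactModel;

/* Byte offset of the compact closure from the block base. */
size_t compact_closure_offset(void)
{
    return sizeof(BlockHeaderModel);
}

/* Byte offset of the first user closure from the block base. */
size_t first_user_closure_offset(void)
{
    return sizeof(BlockHeaderModel) + sizeof(CompactModel);
}
```

Under these assumptions the compact closure starts at byte 24 and user closures start at byte 104. A profiling build has a larger header, so the second offset would differ there.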

The critical flag: BF_COMPACT

The key to GC safety is a single flag on the block descriptor:

bd->flags = BF_COMPACT;   // value: 512 (0x200)

When the GC encounters a pointer and calls Bdescr(), it checks the block's flags. If BF_COMPACT is set, the GC calls evacuate_compact() instead of the normal evacuation path:

// rts/sm/Evac.c (simplified)
void evacuate(StgClosure **p) {
    StgClosure *q = *p;               // the closure being evacuated
    bdescr *bd = Bdescr((StgPtr)q);

    if (bd->flags & BF_COMPACT) {
        evacuate_compact((StgPtr)q);  // don't trace inside; just mark the region
        return;
    }

    // ... normal evacuation (copy object, update forwarding ptr) ...
}

evacuate_compact() does NOT copy the object or trace its pointer fields. It simply marks the block as BF_EVACUATED and links the entire compact region into the target generation's compact_objects list. The contents are completely opaque to the GC.
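The effect of that flag check can be illustrated with a toy model. This is a sketch only: `ToyBlock` and `toy_evacuate` are hypothetical, and "tracing" is reduced to bumping a counter. The two flag values match those given for the block descriptor flags (BF_COMPACT is 512; BF_EVACUATED is assumed to be 1).

```c
#include <assert.h>

#define BF_EVACUATED 1u     /* assumed value of the "already handled" flag */
#define BF_COMPACT   512u   /* 0x200, as stated above */

typedef struct {
    unsigned flags;
    int      objects_traced;   /* toy counter: objects the collector visited */
} ToyBlock;

/* Toy evacuation step: a compact block is marked live and skipped as a unit,
 * while an ordinary block has its objects visited one by one. */
void toy_evacuate(ToyBlock *bd, int objects_in_block)
{
    if (bd->flags & BF_COMPACT) {
        bd->flags |= BF_EVACUATED;   /* whole region kept alive, nothing traced */
        return;
    }
    bd->objects_traced += objects_in_block;   /* stands in for copy and scavenge */
}
```

The point of the model is the cost asymmetry: a compact region holding a million entries costs the collector one flag update, while the same data in ordinary blocks costs a visit per object.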

Using compact region internals for external memory registration

The compact region import path provides the mechanism needed for external heap registration. Two C functions in rts/sm/CNF.c do the heavy lifting:

// Allocate a block with BF_COMPACT flag, registered with the GC
StgCompactNFDataBlock *compactAllocateBlock(
    Capability *cap,
    StgWord size,                     // total bytes including block header
    StgCompactNFDataBlock *previous   // NULL for first block
);

// Move block from staging list to live compact objects
// (makes GC aware of it)
StgPtr compactFixupPointers(
    StgCompactNFData *str,    // compact closure in first block
    StgClosure *root          // root object within region
);

These C functions follow GHC's standard three-tier API architecture:

Tier	What	Where	Accessible from
Public C API	`allocate()`, `performGC()`, etc.	`rts/include/` headers, exported in `.so`	C, Haskell via FFI
STG Primops	`stg_compactAllocateBlockzh`, etc.	`rts/include/stg/MiscClosures.h`, exported in `.so`	Cmm, Haskell via `foreign import prim`
Internal C	`compactAllocateBlock`, etc.	`rts/sm/CNF.h` (not shipped), `visibility(hidden)`	RTS internals only

The C-level functions sit in Tier 3 alongside ~500 other internal functions (GarbageCollect, nonmovingAllocate, stmStartTransaction, etc.). This is not unusual — it is the standard GHC RTS design. The public entry points are the Cmm primops in Tier 2, which are what GHC.Compact.Serialized.importCompact uses internally.

For C code that statically links with the RTS (as ghc-fastboot does), the Tier 3 symbols are accessible via extern declarations. Hidden visibility only keeps a symbol out of a shared object's dynamic symbol table; within the static library (libHSrts.a) these functions still have ordinary global linkage. No header for them ships with an installed GHC, so callers must provide their own declarations.

The combination of compactAllocateBlock (which sets BF_COMPACT on the bdescr) and compactFixupPointers (which moves the block from g0->compact_blocks_in_import to g0->compact_objects) is the mechanism used to register memory with GHC's GC as an opaque region.

The gap is not that these functions are hidden — they follow the standard RTS visibility model. The gap is that no higher-level Haskell API exists for general-purpose external memory registration. The public GHC.Compact.Serialized.importCompact function wraps these calls, but it requires data to be formatted as a serialized compact region (with proper StgCompactNFDataBlock headers in each chunk). It does not accept arbitrary memory regions.

The compact region import protocol

To register arbitrary memory with the GC using compact region internals:

// 1. Calculate total block size (header + compact closure + user data)
StgWord total_size = sizeof(StgCompactNFDataBlock)
                   + sizeof(StgCompactNFData)
                   + user_data_bytes;

// 2. Allocate a GC-registered block
StgCompactNFDataBlock *block = compactAllocateBlock(cap, total_size, NULL);
// At this point, block has BF_COMPACT on its bdescr
// and is on g0->compact_blocks_in_import (staging list)

// 3. Set up the compact region closure
StgCompactNFData *str = (StgCompactNFData *)
    ((char *)block + sizeof(StgCompactNFDataBlock));
SET_HDR((StgClosure *)str,
        &stg_COMPACT_NFDATA_CLEAN_info, CCS_SYSTEM);
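str->hash = NULL;
str->result = root_closure;
str->link = NULL;
// The remaining fields (totalW, autoBlockW, hp, hpLim, nursery, last)
// are not shown here; they must describe the block consistently.
block->owner = str;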

// 4. Copy user data after the compact closure
void *data_area = (char *)str + sizeof(StgCompactNFData);
memcpy(data_area, user_data, user_data_bytes);

// 5. Register with GC (move from staging to live)
compactFixupPointers(str, root_closure);
// Now the GC treats this block as opaque — will never trace inside

Important size semantics: compactAllocateBlock's size parameter is the total bytes from the block base, INCLUDING the StgCompactNFDataBlock header (24 bytes on 64-bit). Omitting the header from the size calculation causes a buffer overflow — the block is allocated 24 bytes too small, and memcpy corrupts adjacent block metadata. This manifests as a segfault in the GC, often far from the actual overflow site.
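The size rule above can be captured in two small helpers. This is a sketch under the same assumptions as the text (24-byte block header, 80-byte compact closure on a 64-bit non-profiling build); `compact_total_size` and `copy_fits` are hypothetical names, not RTS functions.

```c
#include <assert.h>
#include <stddef.h>

enum {
    BLOCK_HEADER_BYTES    = 24,  /* sizeof(StgCompactNFDataBlock), 64-bit        */
    COMPACT_CLOSURE_BYTES = 80   /* sizeof(StgCompactNFData), 64-bit, no profiling */
};

/* Size to request for a first block: both headers plus the payload. */
size_t compact_total_size(size_t user_data_bytes)
{
    return BLOCK_HEADER_BYTES + COMPACT_CLOSURE_BYTES + user_data_bytes;
}

/* Returns 1 if copying user_data_bytes after the two headers stays inside
 * a block of allocated_bytes, 0 if the copy would run past the end. */
int copy_fits(size_t allocated_bytes, size_t user_data_bytes)
{
    size_t payload_offset = BLOCK_HEADER_BYTES + COMPACT_CLOSURE_BYTES;
    return allocated_bytes >= payload_offset &&
           allocated_bytes - payload_offset >= user_data_bytes;
}
```

Calling a check like `copy_fits` before the memcpy in step 4 turns the off-by-24 mistake into an immediate, local failure instead of a crash inside a later collection.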

Limitations of compact regions

Even with the import protocol above, compact regions have structural limitations:

What	Storable?	Notes
Constructors (`CONSTR`)	Yes	The primary use case
Byte arrays (`ARR_WORDS`)	Yes	Raw bytes, no pointers
Frozen arrays	Yes	`MUT_ARR_PTRS_FROZEN_*`
Functions / closures (`FUN`)	No	Contain code pointers (info tables)
Thunks (`THUNK`)	No	Contain code pointers, may trigger evaluation
Partial applications (`PAP`)	No	Contain code pointers
Mutable references (`IORef`)	No	GC needs write barrier tracking
MVars, TVars	No	Mutable, scheduler-integrated
Weak references	No	GC-managed lifetime
Foreign pointers	No	External resource

The "no closures" limitation is fundamental to the public GHC.Compact API: compactAdd rejects functions and mutable objects with a CompactionFailed exception, and it forces thunks to normal form instead of storing them. This is because compact regions were designed for normal-form data (fully evaluated, no thunks, no closures).

Extending compact regions to store closures

The ghc-fastboot project demonstrates that closures CAN be stored in compact region memory if you handle relocation yourself. The key insight:

Closures contain info table pointers: the first word of every closure points to its info table in the binary's .text section. The GHC runtime uses these pointers to determine closure type, layout, and entry code. The compact region import protocol does not adjust these pointers. It fixes up only pointers into the region itself and assumes the importing process runs the same binary, so that info table addresses are unchanged.

For closures (functions, thunks, partial applications), the info table pointers are valid as long as:

1. The binary hasn't changed since the snapshot was created
2. The binary is loaded at the same address (or ASLR relocation is applied)

In practice, GHC often links non-PIE executables by default (this depends on the platform and on how GHC was configured), so the .text section is at a fixed address. In that case, info table pointers are stable across runs of the same binary without any relocation.

For PIE executables, a single code_delta (runtime .text base minus snapshot .text base) can be applied to all info table pointers, since they all reside in the same binary.
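The PIE adjustment amounts to adding one constant to the first word of every closure. A minimal sketch follows, assuming the snapshot carries a list of word indices at which closures start; that list, and the names `compute_code_delta` and `relocate_info_pointers`, are hypothetical and not taken from ghc-fastboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Difference between where .text is now and where it was at snapshot time. */
int64_t compute_code_delta(uint64_t runtime_text_base, uint64_t snapshot_text_base)
{
    return (int64_t)(runtime_text_base - snapshot_text_base);
}

/* Add code_delta to the info pointer (first word) of each closure.
 * heap:           the restored snapshot, viewed as 64-bit words
 * closure_starts: word indices where closures begin (assumed metadata)
 * Payload words are left untouched. */
void relocate_info_pointers(uint64_t *heap,
                            const size_t *closure_starts,
                            size_t n_closures,
                            int64_t code_delta)
{
    for (size_t i = 0; i < n_closures; i++) {
        /* Unsigned addition wraps, so a negative delta works as well. */
        heap[closure_starts[i]] += (uint64_t)code_delta;
    }
}
```

A real implementation would also need to find closure boundaries by walking the heap using each closure's layout, which this sketch sidesteps by taking the start indices as input.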

What GHC Needs: A Public External Heap API

The compact region import mechanism can be repurposed for external memory registration, but this requires working at the Cmm primop level or statically linking against internal RTS symbols — both of which are fragile and unsupported. The underlying machinery is solid; what is missing is a purpose-built Haskell API for registering arbitrary memory with the GC.

Proposed API sketch

-- Register an mmap'd region with the GC as "don't trace"
registerExternalHeap :: Ptr a -> Int -> IO ExternalHeapHandle

-- Unregister when done
unregisterExternalHeap :: ExternalHeapHandle -> IO ()

-- Create a closure-safe external region (allows functions/thunks)
newExternalRegion :: Int -> IO ExternalRegion

-- Allocate within an external region
allocateInRegion :: ExternalRegion -> Int -> IO (Ptr a)

At the RTS level, this would:

1. Allocate block descriptors (bdescr) for the external memory range
2. Set BF_COMPACT (or a new BF_EXTERNAL flag) on the descriptors
3. Register the blocks with the appropriate generation
4. On GC: treat the blocks as opaque, just like compact regions

This would enable:

- Zero-copy mmap of serialized data into the Haskell heap
- Shared memory regions between Haskell processes
- Persistent data structures backed by memory-mapped files
- Fast startup via heap snapshot/restore
- Custom allocators for performance-critical data
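As a rough illustration of the register, query, and unregister life cycle behind such an API, here is a toy range registry. It deliberately swaps in a simpler technique than the one proposed above: a flat table of address ranges that a collector could consult, rather than real block descriptors. All names (`register_external_heap`, `unregister_external_heap`, `is_external`, `MAX_EXTERNAL`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTERNAL 16

typedef struct { uintptr_t start, end; int in_use; } ExternalRange;
static ExternalRange external_ranges[MAX_EXTERNAL];

/* Register [base, base + len). Returns a handle >= 0,
 * or -1 if len is zero or the table is full. */
int register_external_heap(const void *base, size_t len)
{
    if (len == 0) return -1;
    for (int i = 0; i < MAX_EXTERNAL; i++) {
        if (!external_ranges[i].in_use) {
            external_ranges[i].start  = (uintptr_t)base;
            external_ranges[i].end    = (uintptr_t)base + len;
            external_ranges[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Forget a previously registered range; invalid handles are ignored. */
void unregister_external_heap(int handle)
{
    if (handle >= 0 && handle < MAX_EXTERNAL)
        external_ranges[handle].in_use = 0;
}

/* What a collector would ask before tracing: is p inside an opaque range? */
int is_external(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (int i = 0; i < MAX_EXTERNAL; i++)
        if (external_ranges[i].in_use &&
            a >= external_ranges[i].start && a < external_ranges[i].end)
            return 1;
    return 0;
}
```

A linear scan per pointer would be far too slow for a real collector, which is why the proposal reuses block descriptors: the lookup stays constant-time address arithmetic and only the flag check changes.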

The relationship to hot code swap

Hot code swapping (loading new code at runtime, re-linking closures) faces the same fundamental issue from the other direction. Where external heap registration asks "can I bring foreign data into the GHC world?", hot code swap asks "can I change the code that GHC closures point to?"

Both require the GC to handle pointers that don't fit its standard assumptions. Both are blocked by the same closed architecture. A solution to one would likely enable the other — if the RTS can handle external memory regions, it can handle dynamically loaded code modules whose .text addresses aren't known at link time.

References

GHC RTS source: rts/sm/CNF.c (compact normal forms — block allocation, fixup)
GHC RTS source: rts/sm/Evac.c (evacuate_compact — GC handling of compact blocks)
GHC RTS source: rts/include/rts/storage/Block.h (Bdescr inline function, BF_COMPACT flag)
GHC RTS source: rts/include/rts/storage/Closures.h (StgCompactNFData, StgCompactNFDataBlock)
GHC wiki: Compact Normal Forms
Well-Typed blog: Functions in Compact Regions
Edward Z. Yang et al., "Efficient Communication and Collection with Compact Normal Forms" (ICFP 2015): original compact region design and implementation
ghc-fastboot project: practical demonstration of compact region import for snapshot/restore