GHC Compact Regions, External Heap Registration, and Runtime Limitations
Overview
This document describes limitations in GHC's runtime system that prevent Haskell from competing with either "no runtime" languages (Rust, C) or "full runtime" platforms (JVM, BEAM). It focuses on the storage manager's closed architecture, the compact region mechanism as a partial workaround, and undocumented internal APIs that enable external heap registration.
These findings emerged from the ghc-fastboot project, which implements snapshot/restore for instant Haskell program startup using compact region internals.
The Problem: GHC's Closed Storage Manager
Every pointer must be "known"
GHC's garbage collector assumes it owns all heap memory. Every pointer in a
Haskell closure must point into a block managed by GHC's block allocator. The GC
uses the Bdescr() macro to look up the block descriptor (bdescr) for any
pointer:
// rts/include/rts/storage/Block.h
INLINE_HEADER bdescr *Bdescr(StgPtr p) {
    return (bdescr *)(((W_)p & MBLOCK_MASK) +
                      ... offset arithmetic ...);
}
This macro performs arithmetic on the pointer value itself, assuming a specific
alignment and placement within GHC's megablock structure. If a pointer points
to memory not allocated by GHC's block allocator, Bdescr() returns garbage,
and the GC crashes.
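To make the "pure arithmetic" point concrete, here is a minimal standalone sketch of the address-to-descriptor calculation. It does not use the RTS headers. The function name `bdescr_addr` is hypothetical, and the constants (4 KiB blocks, 1 MiB megablocks, 64-byte descriptors) are assumptions that mirror the usual 64-bit defaults. They are not values read from a particular GHC build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit layout constants (not taken from a specific GHC build). */
#define BLOCK_SHIFT   12   /* 4 KiB blocks */
#define MBLOCK_SHIFT  20   /* 1 MiB megablocks */
#define BDESCR_SHIFT  6    /* 64-byte block descriptors */
#define BLOCK_MASK    (((uintptr_t)1 << BLOCK_SHIFT) - 1)
#define MBLOCK_MASK   (((uintptr_t)1 << MBLOCK_SHIFT) - 1)

/* Hypothetical stand-in for Bdescr(): maps any address to the address where
 * its block descriptor would live, using nothing but bit arithmetic. */
uintptr_t bdescr_addr(uintptr_t p) {
    uintptr_t mblock_base = p & ~MBLOCK_MASK;               /* start of the megablock */
    uintptr_t block_off   = p & MBLOCK_MASK & ~BLOCK_MASK;  /* block offset inside it */
    /* One descriptor per block, packed at the start of the megablock. */
    return mblock_base | (block_off >> (BLOCK_SHIFT - BDESCR_SHIFT));
}
```

Nothing in this calculation checks whether the address belongs to the block allocator. A pointer obtained from malloc or mmap yields a plausible-looking descriptor address, and whatever bytes happen to live there are then interpreted as a bdescr.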
What this prevents
You cannot:
- mmap a file and use its contents as Haskell heap objects
- Allocate memory with malloc and place closures there
- Share heap regions between processes via shared memory
- Restore serialized heap data into arbitrary memory locations
Any attempt to do so will work until the next garbage collection, then segfault.
The GC encounters a pointer into the foreign memory, calls Bdescr(), gets a
garbage block descriptor, and either dereferences invalid memory or corrupts the
heap.
Comparison with other runtimes
| Capability | JVM | BEAM | GHC |
|---|---|---|---|
| Register external memory with GC | DirectByteBuffer, MemorySegment | Binary references | Nothing public |
| Zero-copy mmap into heap | MappedByteBuffer | Persistent terms | Nothing |
| Off-heap objects, GC-aware | Cleaner, PhantomReference | NIFs with resources | ForeignPtr (raw bytes only) |
| Heap snapshot/restore | GraalVM Native Image, CRaC | Code hot-swap | Nothing (this project) |
| Custom allocator regions | Arena (Panama) | Custom allocators | Compact (limited) |
| Hot code swap | Class reloading, OSGi | Native, hot code loading | Not available |
The JVM was designed from the start to coexist with foreign memory. GHC's GC was designed in the 1990s for a single-process batch computation model and has not been rearchitected for modern use cases.
GHC Is Stuck in the Middle
GHC cannot compete on either end of the runtime spectrum:
Against "no runtime" languages (Rust, C):
- Rust/C: ~0.5ms boot, deterministic memory, no GC pauses
- GHC: ~12ms boot, GC pauses, RTS initialization overhead
- For CLI tools (like fzf), the RTS overhead makes Haskell nonviable
Against "full runtime" platforms (JVM, BEAM):
- JVM: HotSpot JIT, heap snapshots, hot code reload, pluggable GC, off-heap memory
- BEAM: hot code swap, per-process GC, fault tolerance, live introspection
- GHC: no hot swap, no external heap, no JIT, compact regions half-finished
The irony is that Haskell's type system and purity should make it the best candidate for both extremes:
- Pure functions are perfect for AOT snapshot/freeze (no hidden mutation)
- Strong types enable safe hot code swapping (type system guarantees compatibility)
- Immutable data is trivially shareable via mmap (no aliasing hazards)
But the RTS architecture prevents exploiting these properties.
Compact Regions and External Memory Registration
History
Compact regions were introduced as a way to reduce GC pause times. The implementation was initially by Edward Z. Yang and later hardened by Simon Marlow after he moved to Facebook (now Meta) to use Haskell in their data center infrastructure (Sigma/Haxl).
The primary use case at Facebook was keeping large lookup tables out of GC traversal — put them in a compact region, and the GC treats them as a single opaque object instead of tracing through millions of entries.
The importCompact/exportCompact API was added for serializing compact
regions (originally motivated by HPC and RDMA use cases), but no higher-level
API for general-purpose external memory registration was ever built on top of
this machinery. Compact region development has been largely stalled since its
initial implementation.
How compact regions work internally
A compact region is a chain of memory blocks, each with a StgCompactNFDataBlock
header:
typedef struct StgCompactNFDataBlock_ {
    struct StgCompactNFDataBlock_ *self;   // self-pointer (relocation detection)
    struct StgCompactNFData_ *owner;       // back-pointer to compact closure
    struct StgCompactNFDataBlock_ *next;   // next block in chain
} StgCompactNFDataBlock;
The first block also contains a StgCompactNFData closure — the compact region
object itself:
typedef struct StgCompactNFData_ {
    StgHeader header;                  // info = stg_COMPACT_NFDATA_CLEAN_info
    StgWord totalW;                    // total words across all blocks
    StgWord autoBlockW;                // default size for new blocks
    StgPtr hp, hpLim;                  // bump allocation pointer and limit
    StgCompactNFDataBlock *nursery;    // current allocation block
    StgCompactNFDataBlock *last;       // last block in chain
    struct hashtable *hash;            // sharing table (usually NULL)
    StgClosure *result;                // root of compacted data
    struct StgCompactNFData_ *link;    // used by compacting GC
} StgCompactNFData;
Memory layout of a first block:
[ StgCompactNFDataBlock | StgCompactNFData | ... user closures ... ]
^                       ^                  ^
block base              block + 24 bytes   block + 24 + 80 bytes
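The offsets in the layout above follow directly from the two struct definitions. The sketch below recomputes them from simplified mirrors of those structs. The `toy_` types and both helper functions are hypothetical stand-ins, and the single-word header is an assumption that holds for a non-profiled RTS, where StgHeader is just the info pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of StgCompactNFDataBlock: three pointer-sized fields. */
typedef struct toy_block_header {
    struct toy_block_header *self;
    void *owner;
    struct toy_block_header *next;
} toy_block_header;

/* Simplified mirror of StgCompactNFData: ten pointer-sized fields. */
typedef struct toy_compact_closure {
    void *info;                     /* stands in for StgHeader (assumed one word) */
    uintptr_t totalW;
    uintptr_t autoBlockW;
    uintptr_t *hp, *hpLim;
    toy_block_header *nursery;
    toy_block_header *last;
    void *hash;
    void *result;
    struct toy_compact_closure *link;
} toy_compact_closure;

/* Byte offset of the compact closure from the block base. */
size_t compact_closure_offset(void) {
    return sizeof(toy_block_header);
}

/* Byte offset of the first user closure from the block base. */
size_t first_user_closure_offset(void) {
    return sizeof(toy_block_header) + sizeof(toy_compact_closure);
}
```

On a 64-bit target these evaluate to 24 and 104 bytes, matching the diagram. A profiled RTS has a larger header, so the second offset would differ there.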
The critical flag: BF_COMPACT
The key to GC safety is a single flag on the block descriptor:
bd->flags = BF_COMPACT; // value: 512 (0x200)
When the GC encounters a pointer and calls Bdescr(), it checks the block's
flags. If BF_COMPACT is set, the GC calls evacuate_compact() instead of the
normal evacuation path:
// rts/sm/Evac.c (simplified)
void evacuate(StgClosure **p) {
    StgClosure *q = *p;                  // the closure being evacuated
    bdescr *bd = Bdescr((StgPtr)q);
    if (bd->flags & BF_COMPACT) {
        evacuate_compact((StgPtr)q);     // DON'T trace inside, just mark the region
        return;
    }
    // ... normal evacuation (copy object, update forwarding ptr) ...
}
evacuate_compact() does NOT copy the object or trace its pointer fields. It
simply marks the block as BF_EVACUATED and links the entire compact region
into the target generation's compact_objects list. The contents are completely
opaque to the GC.
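The dispatch described above can be modelled without the RTS. The following toy is only an illustration: the `toy_` names are hypothetical, BF_COMPACT uses the value quoted in the text, and the value 1 for BF_EVACUATED is an assumption. It shows the one property that matters here: a compact block is marked once per collection and its contents are never visited.

```c
#include <assert.h>

/* Flag values: TOY_BF_COMPACT as quoted in the text, TOY_BF_EVACUATED assumed. */
enum { TOY_BF_EVACUATED = 1, TOY_BF_COMPACT = 512 };

typedef struct { unsigned flags; } toy_bdescr;

typedef enum {
    TOY_COPY_OBJECT,     /* normal path: copy the closure and trace its fields */
    TOY_MARK_REGION,     /* first visit to a compact block during this GC */
    TOY_ALREADY_MARKED   /* later visits: nothing left to do */
} toy_action;

/* Decide what the collector would do with a pointer into the block `bd`. */
toy_action toy_evacuate(toy_bdescr *bd) {
    if (bd->flags & TOY_BF_COMPACT) {
        if (bd->flags & TOY_BF_EVACUATED) {
            return TOY_ALREADY_MARKED;
        }
        bd->flags |= TOY_BF_EVACUATED;   /* keep the whole region alive */
        return TOY_MARK_REGION;
    }
    return TOY_COPY_OBJECT;
}
```

However many pointers lead into the region, the cost to the collector is one flag test per pointer rather than a traversal of the data behind it.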
Using compact region internals for external memory registration
The compact region import path provides the mechanism needed for external heap
registration. Two C functions in rts/sm/CNF.c do the heavy lifting:
// Allocate a block with BF_COMPACT flag, registered with the GC
StgCompactNFDataBlock *compactAllocateBlock(
    Capability *cap,
    StgWord size,                       // total bytes including block header
    StgCompactNFDataBlock *previous     // NULL for first block
);
// Move block from staging list to live compact objects
// (makes GC aware of it)
StgPtr compactFixupPointers(
    StgCompactNFData *str,              // compact closure in first block
    StgClosure *root                    // root object within region
);
These C functions follow GHC's standard three-tier API architecture:
| Tier | What | Where | Accessible from |
|---|---|---|---|
| Public C API | allocate(), performGC(), etc. | rts/include/ headers, exported in .so | C, Haskell via FFI |
| STG Primops | stg_compactAllocateBlockzh, etc. | rts/include/stg/MiscClosures.h, exported in .so | Cmm, Haskell via foreign import prim |
| Internal C | compactAllocateBlock, etc. | rts/sm/CNF.h (not shipped), visibility(hidden) | RTS internals only |
The C-level functions sit in Tier 3 alongside ~500 other internal functions
(GarbageCollect, nonmovingAllocate, stmStartTransaction, etc.). This is
not unusual — it is the standard GHC RTS design. The public entry points are
the Cmm primops in Tier 2, which are what GHC.Compact.Serialized.importCompact
uses internally.
For C code that statically links with the RTS (as ghc-fastboot does), the Tier 3
symbols are accessible via extern declarations because they have global linkage
in the static library (libHSrts.a). No header ships with installed GHC, so
callers must provide their own declarations.
The combination of compactAllocateBlock (which sets BF_COMPACT on the
bdescr) and compactFixupPointers (which moves the block from
g0->compact_blocks_in_import to g0->compact_objects) is the mechanism used
to register memory with GHC's GC as an opaque region.
The gap is not that these functions are hidden — they follow the standard RTS
visibility model. The gap is that no higher-level Haskell API exists for
general-purpose external memory registration. The public
GHC.Compact.Serialized.importCompact function wraps these calls, but it
requires data to be formatted as a serialized compact region (with proper
StgCompactNFDataBlock headers in each chunk). It does not accept arbitrary
memory regions.
The compact region import protocol
To register arbitrary memory with the GC using compact region internals:
// 1. Calculate total block size (header + compact closure + user data)
StgWord total_size = sizeof(StgCompactNFDataBlock)
                   + sizeof(StgCompactNFData)
                   + user_data_bytes;
// 2. Allocate a GC-registered block
StgCompactNFDataBlock *block = compactAllocateBlock(cap, total_size, NULL);
// At this point, block has BF_COMPACT on its bdescr
// and is on g0->compact_blocks_in_import (staging list)
// 3. Set up the compact region closure
StgCompactNFData *str = (StgCompactNFData *)
    ((char *)block + sizeof(StgCompactNFDataBlock));
SET_HDR((StgClosure *)str,
        &stg_COMPACT_NFDATA_CLEAN_info, CCS_SYSTEM);
str->hash = NULL;
str->result = root_closure;
str->link = NULL;
block->owner = str;
// 4. Copy user data after the compact closure
void *data_area = (char *)str + sizeof(StgCompactNFData);
memcpy(data_area, user_data, user_data_bytes);
// 5. Register with GC (move from staging to live)
compactFixupPointers(str, root_closure);
// Now the GC treats this block as opaque — will never trace inside
Important size semantics: compactAllocateBlock's size parameter is the
total bytes from the block base, INCLUDING the StgCompactNFDataBlock header
(24 bytes on 64-bit). Omitting the header from the size calculation causes a
buffer overflow — the block is allocated 24 bytes too small, and memcpy
corrupts adjacent block metadata. This manifests as a segfault in the GC, often
far from the actual overflow site.
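One cheap defence against this mistake is to compute the size in a single place and check the copy against it before calling memcpy. The sketch below is a standalone illustration. The helper names are hypothetical, and the two byte counts are the 64-bit figures quoted in this document, hard-coded as assumptions rather than taken from the RTS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 64-bit sizes as quoted in the text; assumptions, not read from RTS headers. */
#define TOY_BLOCK_HEADER_BYTES    24   /* sizeof(StgCompactNFDataBlock) */
#define TOY_COMPACT_CLOSURE_BYTES 80   /* sizeof(StgCompactNFData) */

/* Size to pass to compactAllocateBlock: both headers plus the payload. */
size_t compact_total_size(size_t user_data_bytes) {
    return TOY_BLOCK_HEADER_BYTES + TOY_COMPACT_CLOSURE_BYTES + user_data_bytes;
}

/* True when a payload of user_data_bytes, written after both headers,
 * stays inside a block that was requested with requested_size bytes. */
bool copy_fits(size_t requested_size, size_t user_data_bytes) {
    return compact_total_size(user_data_bytes) <= requested_size;
}
```

For a 1000-byte payload the correct request is 1104 bytes. A request of 1080 bytes, which is what results from forgetting the block header, fails the check instead of silently overrunning the allocation.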
Limitations of compact regions
Even with the import protocol above, compact regions have structural limitations:
| What | Storable? | Notes |
|---|---|---|
| Constructors (CONSTR) | Yes | The primary use case |
| Byte arrays (ARR_WORDS) | Yes | Raw bytes, no pointers |
| Frozen arrays | Yes | MUT_ARR_PTRS_FROZEN_* |
| Functions / closures (FUN) | No | Contain code pointers (info tables) |
| Thunks (THUNK) | No | Contain code pointers, may trigger evaluation |
| Partial applications (PAP) | No | Contain code pointers |
| Mutable references (IORef) | No | GC needs write barrier tracking |
| MVars, TVars | No | Mutable, scheduler-integrated |
| Weak references | No | GC-managed lifetime |
| Foreign pointers | No | External resource |
The "no closures" limitation is fundamental to the public GHC.Compact API:
compactAdd fully evaluates the value it is given, so thunks are forced rather
than stored, and it throws a CompactionFailed exception if it encounters a
function or a mutable object. This is because compact regions were designed for
normal-form data (fully evaluated, no thunks, no closures).
Extending compact regions to store closures
The ghc-fastboot project demonstrates that closures CAN be stored in compact region memory if you handle relocation yourself. The key insight:
Closures contain info table pointers — the first word of every closure points
to its info table in the binary's .text section. The GHC runtime uses these
pointers to determine closure type, layout, and entry code. The compact region
import protocol does not adjust these pointers because it assumes all closures
are data (constructors), whose info table pointers are position-independent.
For closures (functions, thunks, partial applications), the info table pointers are valid as long as:
1. The binary hasn't changed since the snapshot was created
2. The binary is loaded at the same address (or ASLR relocation is applied)
In practice, GHC links non-PIE executables by default, so the .text section
is at a fixed address. Info table pointers are stable across runs of the same
binary without any relocation.
For PIE executables, a single code_delta (runtime .text base minus snapshot
.text base) can be applied to all info table pointers, since they all reside
in the same binary.
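The code_delta adjustment amounts to one addition per info pointer. The sketch below shows the idea on a plain array of words. It is not the ghc-fastboot implementation: the function name, the word-array view of the region, and the explicit list of info-pointer slots are all assumptions made for illustration. A real implementation has to find closure boundaries by walking the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebase every info pointer in `region` by the difference between the .text
 * base at restore time and the .text base at snapshot time.
 *
 * region     : snapshot contents viewed as an array of machine words
 * info_slots : indices of the words that hold info table pointers
 *              (hypothetical input; assumed to be known to the caller)
 */
void relocate_info_pointers(uintptr_t *region,
                            const size_t *info_slots, size_t n_slots,
                            uintptr_t snapshot_text_base,
                            uintptr_t runtime_text_base) {
    /* Unsigned subtraction wraps, so a "negative" delta also works. */
    uintptr_t code_delta = runtime_text_base - snapshot_text_base;
    for (size_t i = 0; i < n_slots; i++) {
        region[info_slots[i]] += code_delta;
    }
}
```

When the binary is non-PIE and loads at the same address on every run, code_delta is zero and the loop leaves the region unchanged.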
What GHC Needs: A Public External Heap API
The compact region import mechanism can be repurposed for external memory registration, but this requires working at the Cmm primop level or statically linking against internal RTS symbols — both of which are fragile and unsupported. The underlying machinery is solid; what is missing is a purpose-built Haskell API for registering arbitrary memory with the GC.
Proposed API sketch
-- Register an mmap'd region with the GC as "don't trace"
registerExternalHeap :: Ptr a -> Int -> IO ExternalHeapHandle
-- Unregister when done
unregisterExternalHeap :: ExternalHeapHandle -> IO ()
-- Create a closure-safe external region (allows functions/thunks)
newExternalRegion :: Int -> IO ExternalRegion
-- Allocate within an external region
allocateInRegion :: ExternalRegion -> Int -> IO (Ptr a)
At the RTS level, this would:
1. Allocate block descriptors (bdescr) for the external memory range
2. Set BF_COMPACT (or a new BF_EXTERNAL flag) on the descriptors
3. Register the blocks with the appropriate generation
4. On GC: treat the blocks as opaque, just like compact regions
This would enable:
- Zero-copy mmap of serialized data into the Haskell heap
- Shared memory regions between Haskell processes
- Persistent data structures backed by memory-mapped files
- Fast startup via heap snapshot/restore
- Custom allocators for performance-critical data
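As a rough model of the RTS-level steps listed earlier, the toy registry below tracks externally owned address ranges and answers the one question the collector would need to ask: does this pointer fall inside a registered region, and should it therefore be treated as opaque? Everything here is hypothetical. The `toy_` names, the fixed-size table, and the linear lookup are illustration only and do not correspond to any existing GHC interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 16

typedef struct { uintptr_t start, end; bool live; } toy_region;
static toy_region toy_regions[TOY_MAX_REGIONS];

/* Register [base, base + bytes). Returns a handle, or -1 on failure. */
int toy_register_external(const void *base, size_t bytes) {
    if (base == NULL || bytes == 0) return -1;
    for (int i = 0; i < TOY_MAX_REGIONS; i++) {
        if (!toy_regions[i].live) {
            toy_regions[i].start = (uintptr_t)base;
            toy_regions[i].end   = (uintptr_t)base + bytes;
            toy_regions[i].live  = true;
            return i;
        }
    }
    return -1;   /* table full */
}

/* Forget a region; pointers into it are no longer treated as opaque. */
void toy_unregister_external(int handle) {
    if (handle >= 0 && handle < TOY_MAX_REGIONS) {
        toy_regions[handle].live = false;
    }
}

/* The check a collector would make before deciding whether to trace. */
bool toy_is_opaque(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    for (int i = 0; i < TOY_MAX_REGIONS; i++) {
        if (toy_regions[i].live &&
            addr >= toy_regions[i].start && addr < toy_regions[i].end) {
            return true;
        }
    }
    return false;
}
```

A real design would want the lookup to be as cheap as the Bdescr() arithmetic, for example by giving external ranges their own block descriptors as in steps 1 and 2, rather than scanning a table on every pointer.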
The relationship to hot code swap
Hot code swapping (loading new code at runtime, re-linking closures) faces the same fundamental issue from the other direction. Where external heap registration asks "can I bring foreign data into the GHC world?", hot code swap asks "can I change the code that GHC closures point to?"
Both require the GC to handle pointers that don't fit its standard assumptions.
Both are blocked by the same closed architecture. A solution to one would likely
enable the other — if the RTS can handle external memory regions, it can handle
dynamically loaded code modules whose .text addresses aren't known at link time.
References
- GHC RTS source: rts/sm/CNF.c (compact normal forms: block allocation, fixup)
- GHC RTS source: rts/sm/Evac.c (evacuate_compact: GC handling of compact blocks)
- GHC RTS source: rts/include/rts/storage/Block.h (Bdescr, BF_COMPACT flag)
- GHC RTS source: rts/include/rts/storage/Closures.h (StgCompactNFData, StgCompactNFDataBlock)
- GHC wiki: Compact Normal Forms
- Well-Typed blog: Functions in Compact Regions
- Edward Z. Yang's thesis: initial compact region implementation
- ghc-fastboot project: practical demonstration of compact region import for snapshot/restore