10. Implementation Timeline

The runtime was developed in 21 phases. Each phase is summarized below with a reference to the section containing full details.

| Phase | Description | Section |
|-------|-------------|---------|
| 1 | Stub pthread-based runtime validating GCC GOMP ABI compatibility | §4 |
| 2 | Replace pthreads with GHC Capabilities; hybrid C shim approach | §4 |
| 3 | Lock-free optimization: atomic generation counter, sense-reversing barriers | §5 |
| 4 | Haskell FFI interop via `foreign import ccall safe` | §6.1 |
| 5 | Concurrent Haskell green threads + OpenMP parallel regions | §6.3 |
| 6 | GC interaction testing: minimal impact on OpenMP latency | §6.4 |
| 7 | Dense matrix multiply (DGEMM) workload | §9.2 |
| 8 | Head-to-head comparison with native libgomp: performance parity | §9.2 |
| 9 | Bidirectional interop: OpenMP workers call Haskell via FunPtr | §6.5 |
| 10 | Cmm primitives via `foreign import prim`: zero-overhead FFI | §7.1 |
| 11 | `inline-cmm` quasiquoter integration | §7.1 |
| 12 | Batched safe calls amortizing 68 ns FFI overhead | §7.2 |
| 13 | Parallelism crossover analysis: break-even at ~500 elements | §9.4 |
| 14 | GHC native parallelism vs OpenMP: parity with `-fllvm` | §9.5 |
| 15 | Deferred task execution with work-stealing barriers | §4.3 |
| 16 | Zero-copy FFI with pinned ByteArray: 19% inner loop speedup | §A.6 |
| 17 | Linear typed arrays for type-safe disjoint partitioning | §A.7 |
| 18 | Runtime improvements: guided scheduling, hybrid barriers, task pools | §4 |
| 19 | Shared memory demos: producer-consumer, synchronized, linear | §8 |
| 20 | Safety demos: overlap bugs, stencil ordering, linear type prevention | §8 |
| 21 | GHC spark parallelism via `parCombine` with `spark#`/`noDuplicate#` | §8 |