Demo Results

Phase 1 demonstration of three hole-driven development (HDD) skills for Claude Code. Each skill was tested RED-GREEN-REFACTOR style: a baseline run without the skill, then a run with the skill loaded.

Demo 1: Core Skill — Python TOC Generator

Task: Implement `generate_toc(markdown: str) -> str`, which must parse headings, generate a nested TOC with links, and handle duplicate headings.

Without skill (baseline)

Agent wrote everything in a single function, in a single pass. 5 tool calls. All logic inlined, no decomposition.

With skill

Agent decomposed into 4 holes, filled one at a time. 19 tool calls.

Skeleton with holes:

```python
def generate_toc(markdown: str) -> str:
    headings = _extract_headings(markdown)          # HOLE_1
    toc_entries: list[str] = []
    slug_counts: dict[str, int] = {}
    for level, text in headings:
        slug = _make_slug(text)                     # HOLE_2
        slug = _deduplicate_slug(slug, slug_counts) # HOLE_3
        toc_entries.append(_format_entry(level, text, slug))  # HOLE_4
    return "\n".join(toc_entries)
```

Each helper started as `raise NotImplementedError("HOLE: ...")`.

Fill order (most constrained first):

| Order | Hole | Rationale |
|---|---|---|
| 1st | `_format_entry` | Pure string formatting, zero ambiguity |
| 2nd | `_deduplicate_slug` | Mechanical counter logic |
| 3rd | `_extract_headings` | Regex determined by spec |
| 4th | `_make_slug` | Least constrained: multiple valid algorithms |

Metrics

| Metric | Baseline | With skill |
|---|---|---|
| Tool calls | 5 | 19 |
| Decomposition | None | 4 holes |
| Visible holes in file | Never | Yes |
| Code structure | Monolithic | 5 focused functions |

Demo 2: Compiler Loop — Haskell myFoldr

Task: Implement `myFoldr :: (a -> b -> b) -> b -> [a] -> b` using GHC's typed hole system.

The compiler loop in action

Starting point:

```haskell
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr = _
```

Cycle 1: Compile → hole type `(a -> b -> b) -> b -> [a] -> b`. Introduce pattern matching with named holes `_empty` and `_cons`.

Cycle 2: `_empty :: b` (fits: `z`), `_cons :: b` (fits: `z`, which is misleading: it type-checks but would ignore the list). Fill `_empty` with `z`.

Cycle 3: `_cons :: b`; relevant bindings show `f :: a -> b -> b`, `x :: a`, `xs :: [a]`. Decompose into `f x _rest`.

Cycle 4: `_rest :: b`; relevant bindings include `myFoldr`. Fill with `myFoldr f z xs`.

Cycle 5: Compilation succeeds.

```haskell
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr f z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)
```
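Python has no typed holes, so the following is only a transcription of the final two clauses for checking their behaviour; `my_foldr` is a hypothetical name. It also shows why the `z` fit for `_cons` was misleading: that fill would type-check, but every list would then fold to `z` and `f` would never be applied. Recursion depth limits this sketch to short lists.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def my_foldr(f: Callable[[A, B], B], z: B, xs: list[A]) -> B:
    # Clause for the _empty hole: an empty list folds to the seed z.
    if not xs:
        return z
    # Clause for the _cons hole: f x _rest, where _rest is the fold of the tail.
    x, rest = xs[0], xs[1:]
    return f(x, my_foldr(f, z, rest))
```

Folding subtraction checks the right associativity: `my_foldr(lambda x, acc: x - acc, 0, [1, 2, 3])` computes `1 - (2 - (3 - 0))`, which is 2.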

Metrics

| Metric | Value |
|---|---|
| Compile cycles | 5 |
| Holes filled | 4 (including sub-hole `_rest`) |
| Named holes used | `_empty`, `_cons`, `_rest` |
| Misleading fits avoided | 1 (`z` for recursive case) |

Demo 3: Iterative Reasoning — Python CSV Parser

Task: Implement `parse_csv(text: str) -> list[dict[str, str]]`, handling quoted fields that contain commas.

Without skill (baseline)

Agent wrote a complete 17-line implementation in one pass. 7 tool calls. No decomposition visible.

With skill

Agent decomposed into 3 holes + 1 sub-hole. 15 tool calls.

Skeleton → fill HOLE_1 → fill HOLE_2 (introduces sub-hole HOLE_2a) → fill HOLE_2a → fill HOLE_3.

Metrics

| Metric | Baseline | With skill |
|---|---|---|
| Tool calls | 7 | 15 |
| Decomposition | None | 3 holes + 1 sub-hole |
| Visible holes in file | Never | Yes |
| Contract reasoning | Implicit | Explicit per hole |

Key Findings

What the skills change

- Visible decomposition. Without the skills, agents decompose mentally but write complete code. With them, holes live in the file, so the human can watch the skeleton evolve.
- One hole at a time. Without the skills, agents batch-write. With them, agents fill exactly one hole per iteration and re-assess after each fill.
- Constraint-driven ordering. Left alone, agents pick holes in reading order. The skill redirects them to most-constrained-first, which is intended to reduce errors by deferring the most ambiguous choice (for example `_make_slug` in Demo 1) until the rest of the skeleton is in place.
- Better code structure. The HDD process produced modular code (named helpers, clear interfaces) rather than monolithic implementations; in Demo 1, five focused functions versus a single inlined one.

Critical skill refinements discovered during testing

| Issue found | Refinement added |
|---|---|
| Agent kept holes in its head and wrote final code | "Holes must be visible — write to the file" |
| Agent used bare `_` for multiple holes | "Use named holes (`_name`)" |
| Agent under-decomposed (2 holes for 4 requirements) | "Each distinct concern gets a hole" |

Architecture validation

The three-layer architecture composes cleanly:

- Core provides the philosophy (decompose, keep holes visible, fill one at a time).
- Compiler-loop adds a mechanism (compile, read diagnostics, fill).
- Iterative-reasoning adds a mechanism (reason about the contract, write markers).
- No conflicts were observed when combining core with an extending skill.
