Origin Story
These skills were extracted from the methodology used to develop the Hole Driven Development (HDD) Claude Code skills.
The HDD Experiment Journey
The HDD skill teaches agents to decompose code into typed holes and fill them iteratively. Developing it required answering a hard question: does the skill actually produce better code than baseline?
Phase 1: Compliance (24 experiments)
First, we checked whether agents follow the skill at all. We ran five tasks across Python, Haskell, and Go and checked each rule: visible holes, one-at-a-time, most-constrained-first, and stop when ambiguous.
Result: 4/5 PASS initially, 5/5 after a skill edit. Compliance confirmed.
Phase 2: Stress (24 experiments)
We reran the same tasks with competing instructions ("just write the whole thing"), time pressure, and edge cases.
Result: 5/5 PASS. The skill holds under pressure.
Phase 3: Quality (35 experiments)
The critical test. We ran five hard tasks (type inference, pipeline, 3-way merge, task scheduler, rate limiter), each with and without the skill. Three reviewer personas then scored the outputs blind, with randomized labels hiding which output came from the skill.
First result: Baseline won 4/5. The skill produced better architecture (+0.4 avg) and clarity (+0.6 avg) but dramatically worse bug scores (-2.0 avg). Each hole fill was locally correct, but cross-hole interactions had bugs.
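The blind comparison described above can be sketched roughly as follows. This is a minimal illustration, not the harness used in the experiments: the function names `blind_pair` and `unblind_deltas`, the label scheme `A`/`B`, and the shape of the `scores` dictionary are all assumptions made for the example.

```python
import random
from statistics import mean


def blind_pair(baseline_output, skill_output, rng):
    """Assign the labels A and B at random (hypothetical helper).

    Returns (labeled, key): `labeled` is what reviewers see,
    `key` maps each label back to "baseline" or "skill" and stays hidden
    until scoring is finished.
    """
    variants = [("baseline", baseline_output), ("skill", skill_output)]
    rng.shuffle(variants)
    labeled = {label: text for label, (_, text) in zip("AB", variants)}
    key = {label: name for label, (name, _) in zip("AB", variants)}
    return labeled, key


def unblind_deltas(scores, key):
    """Average (skill - baseline) per dimension across reviewer personas.

    `scores` is assumed to look like {persona: {label: {dimension: score}}}.
    """
    label_of = {name: label for label, name in key.items()}
    skill, baseline = label_of["skill"], label_of["baseline"]
    dimensions = next(iter(scores.values()))[skill].keys()
    return {
        dim: mean(s[skill][dim] - s[baseline][dim] for s in scores.values())
        for dim in dimensions
    }
```

Reviewers would only ever receive `labeled`; the `key` is applied after all scores are in, which is what lets a result such as "better architecture, worse bug score" be read per dimension rather than as a single verdict.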
The Improvement Cycle
This is where the methodology that became these skills was born:
- Root cause analysis — bugs were at hole boundaries (shared state, resource lifecycle, error paths)
- Targeted edit — added a VERIFY step checking cross-hole interactions after each fill
- Anti-overfitting check — VERIFY is structural (catches a class of bugs), not task-specific
- Re-run — HDD v2 won 5/5
The improvement was genuine: a structural step that catches cross-hole interaction bugs on any codebase, not wording tuned to pass 5 specific tests.
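To make the shape of a structural VERIFY step concrete, here is a small sketch under stated assumptions. It is not the wording or mechanism of the actual HDD skill: `fill_with_verify`, `shared_write_conflicts`, and the idea of describing each filled hole by the set of shared state it writes are hypothetical, chosen only to show a check that runs after every fill and looks across holes rather than inside one.

```python
def fill_with_verify(holes, fill, verify):
    """Fill holes one at a time and run a cross-hole check after each fill.

    `fill(name, filled)` returns the filled hole (hypothetical callback).
    `verify(name, filled)` returns a list of problems; empty means OK.
    """
    filled = {}
    for name in holes:
        filled[name] = fill(name, filled)
        problems = verify(name, filled)
        if problems:
            raise ValueError(f"cross-hole issues after filling {name!r}: {problems}")
    return filled


def shared_write_conflicts(new_name, filled):
    """Toy check: flag shared state written by both the new hole and an earlier one.

    Each filled hole is assumed to be a dict with a "writes" set of state names.
    """
    new_writes = filled[new_name]["writes"]
    return [
        (other, sorted(new_writes & spec["writes"]))
        for other, spec in filled.items()
        if other != new_name and new_writes & spec["writes"]
    ]
```

The check knows nothing about any particular task; it only asks whether two fills touch the same state. That is the sense in which a step like this is structural rather than tuned to the five test tasks.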
Key Insight
Compliance and quality are different things. An agent can follow every rule perfectly and still produce worse code. Only blind A/B comparison against baseline reveals whether a skill helps.
From Methodology to Skills
The three skills capture the patterns that emerged:
| Pattern | Skill |
|---|---|
| Randomized labels, multi-persona judging, per-dimension scoring | blind-skill-assessment |
| Baselines, phase progression, task diversity | experiment-set-design |
| Diagnose-triage-edit-rerun loop, anti-overfitting | iterative-skill-refinement |
The methodology is now reusable: any Claude Code skill can be developed, validated, and improved using these three skills.