Premise
Start with testable assumptions, not polished specs. Frame a problem/solution hypothesis, validate fast, and separate thinking (brainy LLMs) from codifying (dumb LLMs).
One‑Page Workflow
- Frame a crisp hypothesis: user, pain, suspected cause, intervention, metric, target, timeframe, falsifier.
- Example: “We believe new sellers fail to list due to fee confusion; simplifying copy will lift first‑listing completion by 20% within 14 days; invalidate if lift <10%.”
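The fields above can be held in one small record so the falsifier is pre‑registered rather than improvised after the results come in. A minimal Python sketch (the field names and thresholds mirror the example; this is not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable problem/solution bet; every field is required up front."""
    user: str
    pain: str
    suspected_cause: str
    intervention: str
    metric: str
    target_lift: float      # e.g. 0.20 for +20%
    timeframe_days: int
    falsifier_lift: float   # invalidate if observed lift falls below this

    def verdict(self, observed_lift: float) -> str:
        # Decide against pre-registered thresholds, not post-hoc judgment.
        if observed_lift >= self.target_lift:
            return "validated"
        if observed_lift < self.falsifier_lift:
            return "invalidated"
        return "partial"

h = Hypothesis("new sellers", "fee confusion", "unclear fee copy",
               "simplify copy", "first-listing completion",
               target_lift=0.20, timeframe_days=14, falsifier_lift=0.10)
print(h.verdict(0.08))  # below the falsifier threshold
```

Writing the falsifier into the record forces the “invalidate if” clause to exist before any data arrives.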
 
- Max out your human context window: brain‑dump constraints, knowns/unknowns, non‑goals, prior attempts, risks, edge cases, and success thresholds.
 
- Run rapid desk validation with Perplexity: use Perplexity and Deep Research to map prior art, contradictions, and unknowns.
- Copy the resulting reasoning.md into a GitHub Issue to preserve claims, citations, and gaps.
 
- Work assumptions with brainy LLMs: surface hidden assumptions, counter‑arguments, and the “cheapest falsification tests.”
- Decompose into minimal, measurable experiments; call out risks and invariants. 
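One way to prioritize by risk is to score each assumption’s risk per day of testing, so the riskiest, cheapest falsifications run first. A sketch with illustrative 1–5 estimates (the claims and scores are made up for the fee‑copy example):

```python
# Rank assumptions so the riskiest, cheapest-to-test ones are falsified first.
# Impact and uncertainty are rough 1-5 estimates, not measured values.
assumptions = [
    {"claim": "fee copy causes drop-off", "impact": 5, "uncertainty": 4, "test_cost_days": 2},
    {"claim": "sellers read the fee page", "impact": 4, "uncertainty": 2, "test_cost_days": 1},
    {"claim": "a calculator is needed",    "impact": 3, "uncertainty": 5, "test_cost_days": 5},
]

def priority(a: dict) -> float:
    # Risk (impact * uncertainty) per day of testing: higher goes first.
    return (a["impact"] * a["uncertainty"]) / a["test_cost_days"]

for a in sorted(assumptions, key=priority, reverse=True):
    print(f'{priority(a):5.1f}  {a["claim"]}')
```

The exact scoring formula matters less than making the ordering explicit and arguable.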
 
- Codify with dumb LLMs: turn approved decisions into specs, checklists, API stubs, test scripts, and templates.
- Keep prompts concrete and temperature low; feed in only validated inputs.
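The codifying call can be pinned down as a plain request payload; a sketch assuming an OpenAI‑style chat schema, with a placeholder model name (nothing is sent anywhere here):

```python
def codify_request(decision: str, template: str) -> dict:
    """Build a low-temperature, concrete prompt for a cheap 'dumb' model.
    Only a validated decision goes in; the model just fills the template."""
    return {
        "model": "small-cheap-model",   # placeholder, not a real model name
        "temperature": 0.0,             # deterministic codifying, no creativity
        "messages": [
            {"role": "system",
             "content": "Fill the template exactly. Do not add new claims."},
            {"role": "user",
             "content": f"Decision: {decision}\nTemplate:\n{template}"},
        ],
    }

req = codify_request("Ship simplified fee copy",
                     "PRD: goal, metric, acceptance criteria")
print(req["temperature"])  # 0.0
```

Temperature 0 and a closed system prompt keep the cheap model from inventing strategy it was never given.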
 
- Make assumptions first‑class in GitHub: one Issue per assumption, with status (untested/validated/invalidated), evidence for and against, test owner, deadline, decision, and the pasted reasoning.md.
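Such Issues can be generated from a template; a sketch that builds a payload for GitHub’s create‑issue endpoint (`POST /repos/{owner}/{repo}/issues`) without sending it (the owner, deadline, and label scheme are illustrative):

```python
def assumption_issue(claim: str, status: str, owner: str, deadline: str,
                     reasoning_md: str) -> dict:
    """Payload for GitHub's create-issue REST endpoint; built offline,
    nothing is posted here. Label names are a made-up convention."""
    assert status in {"untested", "validated", "invalidated"}
    body = (f"**Status:** {status}\n**Test owner:** {owner}\n"
            f"**Deadline:** {deadline}\n\n## Reasoning\n{reasoning_md}")
    return {"title": f"Assumption: {claim}",
            "body": body,
            "labels": ["assumption", f"status:{status}"]}

issue = assumption_issue("fee copy causes drop-off", "untested",
                         "@alice", "2024-07-01", "(pasted reasoning.md)")
```

One Issue per assumption keeps status changes, evidence, and the final decision in a single auditable thread.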
 
- Test small, decide fast: concierge tests, fake doors, and mocks over full builds. Timebox every test and update beliefs on just‑enough signal.
- Promote validated bets to spec; pivot or drop invalidated ones; log the learning.
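The promote/pivot/drop routing can be made explicit so no outcome stalls in limbo; a trivial sketch (the verdict labels and actions are illustrative wording):

```python
def next_step(verdict: str) -> str:
    # Route each timeboxed experiment's outcome to one explicit action.
    return {
        "validated":   "promote to spec",
        "partial":     "iterate: adjust the intervention, rerun the smallest test",
        "invalidated": "drop or pivot; log the learning",
    }[verdict]

print(next_step("partial"))
```

An unknown verdict raises a `KeyError` on purpose: every experiment must end in one of the three states.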
 
Tools at a Glance
- Brainy LLMs: decomposition, tradeoffs, threat modeling, experiment design, falsifiers. 
- Dumb LLMs: PRD skeletons, acceptance criteria, checklists, API contracts, boilerplate. 
- Perplexity + Deep Research: evidence trees, contradictions, market/academic scans; treat outputs as leads to verify. 
Minimal Prompts
- Hypothesis: “We believe [user] struggles with [problem] because [cause]. If we [solution], [metric] will improve by [target] within [timeframe]. Invalidate if [falsifier].” 
- Assumption tests (brainy): “List hidden assumptions. For each, propose a 1‑week, low‑cost falsification. Prioritize by risk.” 
- Counter‑evidence (Perplexity): “What credible evidence contradicts [claim]? Summarize sources, methods, limitations.” 
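The hypothesis prompt above is just a template; filling it programmatically makes every field mandatory. A sketch using the example’s values:

```python
HYPOTHESIS_PROMPT = ("We believe {user} struggles with {problem} because {cause}. "
                     "If we {solution}, {metric} will improve by {target} within "
                     "{timeframe}. Invalidate if {falsifier}.")

prompt = HYPOTHESIS_PROMPT.format(
    user="new sellers", problem="listing fees", cause="the fee copy is confusing",
    solution="simplify the fee copy", metric="first-listing completion",
    target="20%", timeframe="14 days", falsifier="lift is below 10%")
print(prompt)
```

`str.format` raises a `KeyError` if any slot, including the falsifier, is left unfilled.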
Anti‑Patterns
- Collecting only confirmatory evidence; no falsifiers. 
- Letting Deep Research become a rabbit hole; no timebox. 
- Jumping to code without a metric or success threshold. 
- Using brainy LLMs for boilerplate or dumb LLMs for strategy. 
Tiny Example
- Hypothesis: “Fee copy confusion blocks listings; simpler copy lifts completion by 20% in 14 days.” 
- Validation: Perplexity surfaces evidence that clearer copy helps but gains plateau without a fee calculator; experienced sellers are less affected.
- Test: A/B copy‑only vs copy+inline calculator. Outcome: +8% vs +19% → partial validation; iterate with cost preview. 
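The lifts in this example follow from simple rate arithmetic; a sketch with illustrative completion rates chosen to reproduce the +8% and +19% figures:

```python
def lift(control_rate: float, variant_rate: float) -> float:
    """Relative lift of a variant over control."""
    return (variant_rate - control_rate) / control_rate

# The 0.25 baseline is made up; only the resulting lifts match the example.
copy_only = lift(0.25, 0.27)         # +8%
copy_plus_calc = lift(0.25, 0.2975)  # +19%
print(copy_only >= 0.20, copy_plus_calc >= 0.20)  # neither arm hits the 20% target
```

Since +8% falls below the 10% falsifier while +19% nearly reaches the target, the split result points at the calculator, not the copy, as the active ingredient.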
Checklist
- Hypothesis with metric and falsifier 
- Context brain‑dump logged 
- Deep Research reasoning.md in GitHub 
- Assumptions with owners and tests 
- Minimal experiment defined and timeboxed 
- Spec/checklists generated after validation 
- Decision and learning archived 


