Factory Reports
Every factory run should end with a report, even if the decision is reject.
Template
# <target> — <method> — <date>
## Decision
Decision: ship | reject | retry-data | retry-training | retry-eval | park
Reason: <one paragraph>
## Target
- Target:
- Base model:
- Candidate:
- Training method:
- Artifact:
## Data
- Dataset:
- Rows:
- Heldout:
- Filters:
- Known gaps:
## Eval
| Metric | Baseline | Candidate | Delta | Pass |
|---|---:|---:|---:|---|
| Primary | | | | |
| Regression / breadth | | | | |
| Parse errors | | | | |
## Performance
| Metric | Value |
|---|---:|
| Train time | |
| Eval time | |
| Latency | |
| tok/s | |
| RAM / peak RSS | |
## Failures
- What failed:
- Likely cause:
- Data fix:
- Training fix:
- Eval fix:
## Next Action
One next action only.
Reporting Standard
Reports should be short and numeric. Avoid long narrative unless a failure is subtle.
Do not hide:
- regressions
- skipped evals
- non-determinism
- missing artifacts
- data leakage risk
- eval weakness