# SCORECARD

## Layer weights

| Layer | Weight |
| --- | ---: |
| Static validity | 10 |
| Snapshot integrity | 10 |
| Loadout hygiene | 10 |
| Skill unit correctness | 20 |
| Activation precision/recall | 15 |
| Transcript/process compliance | 15 |
| E2E task success | 15 |
| Multi-agent safety | 5 |

## Status bands

- 90-100: production-ready with evidence.
- 75-89: strong; review warnings before rollout.
- 60-74: useful but risky; fix failures before broad use.
- <60: not ready.

Hard failures override score and mark the setup blocked.
