# TEST_MATRIX

| Risk | Test |
| --- | --- |
| Invalid skill/config silently ignored | Static lint rejects invalid frontmatter/broken refs |
| Wrong setup tested | Snapshot records versions and files, with an integrity check |
| Skill works only when manually selected | Natural-prompt activation evals fail until routing works |
| Agent skips required process | Transcript-order invariant catches the shortcut |
| Generic final answer hides artifact failure | Skill unit assertions inspect files/commands/JSON |
| Unsafe action succeeds | Safety hard-fail case blocks the score |
| Loadout bloats every turn | Inventory reports token tax and dormant artifacts |
| Subagents collide | Multi-agent ownership/worktree cases fail |
| Router improves recall but causes false positives | A/B comparison measures activation precision and wrong-skill rate |
| Evidence fabricated | Report requires stored transcripts/artifacts/commands |
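
As one illustration, the transcript-order row could be checked with something like the sketch below. It assumes a transcript has already been reduced to a flat list of event names; the function name `first_order_violation` and the event names (`read_skill`, `run_tests`, `write_report`) are hypothetical and not taken from this project.

```python
from typing import Iterable, Optional, Sequence


def first_order_violation(
    events: Iterable[str], required: Sequence[str]
) -> Optional[str]:
    """Return the first required step that is missing or out of order.

    Returns None when every required step appears in the transcript in the
    given order. Other events may appear between required steps.
    """
    remaining = iter(events)
    for step in required:
        # `step in remaining` consumes the iterator up to and including the
        # first match, so each later step must occur after the earlier one.
        if step not in remaining:
            return step
    return None


# Hypothetical required process for a skill (assumed names).
REQUIRED = ["read_skill", "run_tests", "write_report"]

ok = ["read_skill", "edit_file", "run_tests", "write_report"]
shortcut = ["read_skill", "write_report"]  # skips run_tests

assert first_order_violation(ok, REQUIRED) is None
assert first_order_violation(shortcut, REQUIRED) == "run_tests"
```

Treating the required steps as an ordered subsequence, rather than an exact match, lets the agent do extra work between steps while still failing a run that skips or reorders one.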