Workflow OS | Validated

Complete Agent Skills Evaluation OS

A blueprint for evaluating an entire coding-agent setup, not just one skill: installed skills, agents, commands, hooks, MCP, routers, subagents, workflow discipline, token cost, safety, and CI evidence.

What this builds

A reusable agent workflow packet.

A validated Buildprint for evaluating complete agent+skills installations from static validity through real behavior, with skill-eval-runner as the core module but not the whole system.

Core capabilities

The useful parts the finished build should expose.

01 Setup snapshot and install parity model
02 Static lint gates for agent config files
03 Loadout and token-cost inventory
04 Skill unit/regression test harness pattern
05 Activation and routing eval pattern
06 Transcript/process invariant checks
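
As an illustration of the transcript/process invariant checks listed above, the following is a minimal sketch, not the package's actual implementation. The transcript shape (an ordered list of events with a `tool` field), the `{ before, after }` invariant format, and the function name `checkOrderInvariants` are all assumptions made for this example.

```javascript
// Hypothetical transcript shape: an ordered list of events, each with a `tool` name.
// The { before, after } invariant format is an assumption for illustration only.
function checkOrderInvariants(transcript, invariants) {
  const violations = [];
  for (const { before, after } of invariants) {
    const firstAfter = transcript.findIndex((e) => e.tool === after);
    if (firstAfter === -1) continue; // `after` never happened, so nothing to order
    const firstBefore = transcript.findIndex((e) => e.tool === before);
    // Violation if `before` is missing or first occurs later than `after`.
    if (firstBefore === -1 || firstBefore > firstAfter) {
      violations.push(`${before} must precede ${after}`);
    }
  }
  return violations;
}

// Example: the agent committed before running tests.
const transcript = [
  { tool: "read_file" },
  { tool: "git_commit" },
  { tool: "run_tests" },
];
const violations = checkOrderInvariants(transcript, [
  { before: "run_tests", after: "git_commit" },
]);
console.log(violations); // [ 'run_tests must precede git_commit' ]
```

A check of this kind fails on ordering alone, which is the evidence a per-skill output test would not capture.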

What you need

Local first, live proof explicit.

  • A target agent setup to evaluate
  • Offline fixture cases for deterministic mode
  • Optional live agent/provider credentials for live adapters
  • A safety policy for external/destructive actions
  • A list of critical workflow invariants
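
To make the offline fixture requirement concrete, here is a hedged sketch of how deterministic activation cases might be scored. The fixture shape, the skill names, and the `stubRouter` and `scoreActivation` functions are hypothetical; a live adapter would replace the stub with a call to the real agent.

```javascript
// Hypothetical fixture shape: a prompt plus the skill expected to activate
// (null means no skill should activate). Not the package's actual format.
const fixtures = [
  { prompt: "write a unit test for parseDate", expected: "testing" },
  { prompt: "open a pull request", expected: "git-workflow" },
  { prompt: "what time is it?", expected: null },
];

// Stand-in router for deterministic mode; a live adapter would call the agent.
function stubRouter(prompt) {
  if (/test/.test(prompt)) return "testing";
  if (/pull request/.test(prompt)) return "git-workflow";
  return null;
}

// Run every case through the router and collect mismatches.
function scoreActivation(cases, route) {
  const failures = cases
    .map((c) => ({ ...c, actual: route(c.prompt) }))
    .filter((c) => c.actual !== c.expected);
  return { total: cases.length, passed: cases.length - failures.length, failures };
}

const report = scoreActivation(fixtures, stubRouter);
console.log(`${report.passed}/${report.total} activation cases passed`);
```

Including a negative case (expected `null`) matters because a router that activates a skill on every prompt would otherwise pass.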

System shape

What kind of artifact this becomes.

Workflow surface

A validated Buildprint for evaluating complete agent+skills installations from static validity through real behavior, with skill-eval-runner as the core module but not the whole system.

Runtime layer

Any coding agent, JavaScript proof, CI-ready adapters

Build materials

Snapshot / Lint / Inventory / Skill tests

Proof boundary

Deep stack design / Offline proof passed

Build scope

Included, required from you, and outside the claim.

Included
  • Setup snapshot and install parity model
  • Static lint gates for agent config files
  • Loadout and token-cost inventory
  • Skill unit/regression test harness pattern

Bring yourself
  • A target agent setup to evaluate
  • Offline fixture cases for deterministic mode
  • Optional live agent/provider credentials for live adapters
  • A safety policy for external/destructive actions

Out of scope
  • Mistaking per-skill tests for full setup proof
  • Ignoring activation failures
  • Skipping transcript/order evidence
  • Measuring a drifted install
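
The "drifted install" item can be illustrated with a small parity check. This is a sketch under stated assumptions: the snapshot shape (a map from file path to content hash), the example paths and hash values, and the `diffSnapshots` name are invented for illustration and are not taken from the package.

```javascript
// Hypothetical snapshot shape: a map from installed file path to a content hash.
function diffSnapshots(expected, installed) {
  const drift = [];
  for (const [path, hash] of Object.entries(expected)) {
    if (!(path in installed)) drift.push({ path, kind: "missing" });
    else if (installed[path] !== hash) drift.push({ path, kind: "changed" });
  }
  for (const path of Object.keys(installed)) {
    if (!(path in expected)) drift.push({ path, kind: "extra" });
  }
  return drift;
}

// Placeholder paths and hashes, for illustration only.
const expected = { "skills/review/SKILL.md": "a1", "hooks/pre-commit.sh": "b2" };
const installed = { "skills/review/SKILL.md": "a9", "agents/planner.md": "c3" };
const drift = diffSnapshots(expected, installed);
console.log(drift);
```

An evaluation run could refuse to proceed while `drift` is non-empty, so that results always describe the setup that was actually snapshotted.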

Agent handoff

Start from the packet, not the UI.

agb start https://agent-buildprint.com/buildprints/complete-agent-skills-evaluation-os/package.json

Key files

The first files an agent should read.

All package files
  • ACTIVATION_EVALS.md: Buildprint package file
  • BUILDPRINT.md: compatibility bootstrap or package contract
  • checks/acceptance.md: acceptance checklist
  • CONTRACTS.md: legacy interface/data contracts, when present
  • E2E_TASK_BENCH.md: Buildprint package file
  • LOADOUT_INVENTORY.md: Buildprint package file
  • MULTI_AGENT_SAFETY.md: Buildprint package file
  • PLAN.md: legacy execution index, when present
  • proof/package-lock.json: offline proof artifact
  • proof/package.json: offline proof artifact
  • proof/src/eval-os.mjs: offline proof artifact
  • proof/test/eval-os.test.mjs: offline proof artifact
  • publication.json: machine-readable mirror
  • README.md: human overview, non-authoritative
  • SAFETY_POLICY.md: Buildprint package file
  • SCORECARD.md: Buildprint package file
  • SKILL_UNIT_EVALS.md: Buildprint package file
  • SPEC.md: legacy behavior requirements, when present
  • STATIC_LINT.md: Buildprint package file
  • TEST_MATRIX.md: legacy risk-to-test alignment, when present
  • TRANSCRIPT_PROCESS_EVALS.md: Buildprint package file
  • VALIDATION_REPORT.md: Buildprint package file
  • VALIDATION_TEMPLATE.md: legacy completion report template, when present