
TypeScript SDK

YAML remains AgentV’s canonical, portable eval format. The SDK surfaces below are for cases where you want to generate YAML-shaped definitions in code, embed eval runs inside another application, or write executable graders and prompt templates.

AgentV currently provides two npm packages for programmatic use:

  • @agentv/sdk — YAML-aligned eval authoring, custom assertions, and code graders
  • @agentv/core — programmatic evaluation API and typed configuration
# Eval authoring and assertion SDK (defineEval, graders, defineAssertion, defineCodeGrader)
npm install @agentv/sdk
# Programmatic API (evaluate, defineConfig)
npm install @agentv/core

Use the simplest surface that matches the job:

  • YAML / JSONL first for portable eval specs you want to run from the CLI, check into a repo, or share across TypeScript and Python workflows.
  • defineEval() / evalSuite() when you want a .eval.ts file that mirrors YAML concepts and lowers back to the canonical snake_case contract.
  • evaluate({ specFile }) when you want library control around an existing YAML suite.
  • Inline evaluate({ tests }) when the eval definition truly belongs inside application code. This programmatic API mirrors YAML but currently uses TypeScript field names such as expectedOutput and assert.
  • defineAssertion / defineCodeGrader when the grading logic itself must execute code.

There is no separate first-party Python authoring SDK today. Python-facing workflows should either emit canonical YAML/JSONL or implement executable graders that consume the standard snake_case wire format.

Use defineEval() from @agentv/sdk when you want TypeScript ergonomics without creating a second eval vocabulary. The helper keeps authoring in camelCase where TypeScript needs it, then lowers back to the canonical snake_case eval object contract when AgentV loads the file.

evals/greeting.eval.ts
import { defineEval, graders } from '@agentv/sdk';
export default defineEval({
  name: 'hello-suite',
  execution: {
    targets: ['mock-sdk'],
  },
  workspace: {
    hooks: {
      beforeAll: {
        command: ['echo', 'suite-start'],
      },
    },
  },
  tests: [
    {
      id: 'hello',
      input: 'Say hello',
      inputFiles: ['../fixtures/per-test-note.md'],
      expectedOutput: 'Hello from the mock target',
      assertions: [graders.contains('Hello')],
    },
  ],
});

Useful companion helpers:

  • toEvalYamlObject() returns the canonical snake_case object.
  • serializeEvalYaml() returns YAML text using the same canonical field names.

The serialized field is still named assertions, so the helper does not introduce a second YAML vocabulary.

@agentv/sdk includes a small graders catalog for common deterministic and LLM-backed grader configs. These helpers return ordinary assertions entries and serialize to the same canonical YAML you could write by hand.

import { defineEval, graders } from '@agentv/sdk';
export default defineEval({
  name: 'grader-helper-suite',
  tests: [
    {
      id: 'json-greeting',
      input: 'Return a JSON greeting.',
      assertions: [
        graders.contains('Hello', { name: 'mentions-hello' }),
        graders.exact('{"message":"Hello"}', { name: 'exact-json', minScore: 1 }),
        graders.regex(/"message"\s*:/, { name: 'message-key' }),
        graders.json({ name: 'valid-json', required: true }),
        graders.rubrics(['Greets the user'], { name: 'rubric-review' }),
        graders.llmGrader({
          name: 'llm-review',
          prompt: 'Grade whether the answer is useful.',
          target: 'grader-target',
        }),
        graders.codeGrader(['bun', 'run', 'graders/check.ts'], { name: 'scripted-check' }),
      ],
    },
  ],
});

The catalog covers contains, equals/exact, regex, is-json/json, rubrics, llm-grader, and code-grader. CamelCase SDK options such as minScore, maxSteps, and rubric scoreRanges lower to min_score, max_steps, and score_ranges when AgentV loads or serializes the suite.
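To make the lowering step concrete, here is a minimal sketch of a recursive camelCase → snake_case key conversion. It is not AgentV's implementation. lowerKeys and toSnakeCase are hypothetical names, and the rubric option shape in the example is illustrative only.

```typescript
// Hypothetical helper: converts one camelCase key to snake_case (minScore → min_score).
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);
}

// Recursively lowers object keys and walks arrays element by element.
// Primitive values (strings, numbers, booleans, null) are returned unchanged.
function lowerKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(lowerKeys);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, inner]) => [
        toSnakeCase(key),
        lowerKeys(inner),
      ]),
    );
  }
  return value;
}

// Illustrative grader options; the rubric shape here is assumed, not taken from AgentV.
const lowered = lowerKeys({
  type: 'llm-grader',
  minScore: 0.8,
  maxSteps: 3,
  rubric: { scoreRanges: [{ min: 0, max: 1 }] },
});
console.log(JSON.stringify(lowered));
// {"type":"llm-grader","min_score":0.8,"max_steps":3,"rubric":{"score_ranges":[{"min":0,"max":1}]}}
```

A blanket conversion like this would also rename keys inside user-supplied data. For that reason, a real loader more likely lowers only the fields its schema knows about.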

Use defineAssertion from @agentv/sdk to create reusable assertion types. Place them in .agentv/assertions/ — they’re auto-discovered by filename.

.agentv/assertions/word-count.ts
import { defineAssertion } from '@agentv/sdk';
export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  const pass = wordCount >= 3;
  return {
    pass,
    assertions: [{ text: `Output has ${wordCount} words`, passed: pass }],
  };
});

Return a score (0–1) instead of pass for graded evaluation:

.agentv/assertions/efficiency.ts
import { defineAssertion } from '@agentv/sdk';
export default defineAssertion(({ output, traceSummary }) => {
  const hasContent = (output ?? '').length > 0 ? 0.5 : 0;
  const isEfficient = (traceSummary?.eventCount ?? 0) <= 10 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    reasoning: 'Checks content exists and is efficient',
  };
});

If only pass is given, score is 1 (pass) or 0 (fail).
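The pass-to-score rule can be sketched as a small resolver. AssertionOutcome and effectiveScore are hypothetical names. Two behaviors are assumptions not stated in this page: an explicit numeric score takes precedence when both fields are present, and a result with neither field counts as 0.

```typescript
// Hypothetical shape of an assertion handler's return value; the real SDK type may differ.
interface AssertionOutcome {
  pass?: boolean;
  score?: number;
}

// Resolves the effective score for one assertion result.
// Assumption: an explicit score wins; otherwise pass maps to 1 and anything else to 0.
function effectiveScore(outcome: AssertionOutcome): number {
  if (typeof outcome.score === 'number') {
    return outcome.score;
  }
  return outcome.pass === true ? 1 : 0;
}

console.log(effectiveScore({ pass: true })); // 1
console.log(effectiveScore({ pass: false })); // 0
// Mirrors the efficiency example: 0.5 for content plus 0.5 for efficiency.
console.log(effectiveScore({ score: 0.5 + 0.5 })); // 1
```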

Convention-based discovery maps filename → assertion type:

.agentv/assertions/word-count.ts → type: word-count
.agentv/assertions/sentiment.ts → type: sentiment
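The filename-to-type convention can be illustrated with a short sketch. It takes a list of file paths and derives each type name by dropping the directory and the final extension. assertionTypeFromFile and buildAssertionRegistry are hypothetical helpers, not AgentV APIs. This page also does not say which extensions AgentV accepts, so the sketch does not filter by extension.

```typescript
// Derives an assertion type name from a file path: word-count.ts → word-count.
function assertionTypeFromFile(filePath: string): string {
  // Keep only the last path segment, handling both / and \ separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? filePath;
  // Drop the final extension; a leading dot (hidden file) is not treated as one.
  const dot = fileName.lastIndexOf('.');
  return dot > 0 ? fileName.slice(0, dot) : fileName;
}

// Builds a type → file lookup from a caller-supplied list of discovered files.
function buildAssertionRegistry(files: string[]): Map<string, string> {
  const registry = new Map<string, string>();
  for (const file of files) {
    registry.set(assertionTypeFromFile(file), file);
  }
  return registry;
}

const registry = buildAssertionRegistry([
  '.agentv/assertions/word-count.ts',
  '.agentv/assertions/sentiment.ts',
]);
console.log(registry.get('word-count')); // .agentv/assertions/word-count.ts
```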

Reference directly in your eval file — no command: needed:

assertions:
  - type: word-count
  - type: contains
    value: "Hello"

Use defineCodeGrader from @agentv/sdk for full control over scoring with an explicit assertions array:

import { defineCodeGrader } from '@agentv/sdk';
export default defineCodeGrader(({ output, traceSummary }) => ({
  score: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5 ? 1.0 : 0.5,
  assertions: [
    { text: 'Answer is not empty', passed: (output ?? '').length > 0 },
    { text: 'Efficient tool usage', passed: (traceSummary?.eventCount ?? 0) <= 5 },
  ],
}));

defineCodeGrader graders are referenced in YAML with type: code-grader and command: [bun, run, grader.ts]. defineAssertion uses convention-based discovery instead: place the file in .agentv/assertions/ and reference it by its filename-derived type.

For detailed patterns, input/output contracts, and language-agnostic examples, see Code Graders.

Raw grader stdin uses snake_case because it crosses a process boundary and may be consumed by Python, shell, jq, or external dashboards. The @agentv/sdk package converts that payload to idiomatic TypeScript camelCase before calling your handler.

Raw stdin field → SDK handler field
expected_output → expectedOutput
output_path → outputPath
trace_summary → traceSummary
token_usage → tokenUsage
cost_usd → costUsd
duration_ms → durationMs
workspace_path → workspacePath
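As a rough picture of the conversion the SDK performs, here is a recursive snake_case → camelCase key mapper applied to a parsed stdin payload. It is a sketch, not the SDK's code. toCamelCase and camelizeKeys are hypothetical names, and the payload values are made up. The nested event_count field is an assumption, inferred from handlers reading traceSummary.eventCount.

```typescript
// Converts one snake_case key to camelCase (expected_output → expectedOutput).
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Recursively converts object keys and walks arrays; primitive values are left unchanged.
function camelizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(camelizeKeys);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, inner]) => [
        toCamelCase(key),
        camelizeKeys(inner),
      ]),
    );
  }
  return value;
}

// Made-up raw stdin payload; event_count is an assumed nested field name.
const raw =
  '{"output":"Hello","expected_output":"Hello there!","cost_usd":0.002,"trace_summary":{"event_count":4}}';
const payload = camelizeKeys(JSON.parse(raw)) as Record<string, any>;
console.log(payload.expectedOutput); // Hello there!
console.log(payload.traceSummary.eventCount); // 4
```

A non-TypeScript grader (Python, shell, jq) would skip this step and read the snake_case names directly.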

output is already the final answer string in both formats. Transcript-aware code should read messages, trace.messages, or trace.events; answer-text graders should read output.

Use evaluate() from @agentv/core to run evaluations as a library. The most portable pattern is still to keep the suite in YAML and point specFile at it; inline tests are best when the eval is tightly coupled to application code.

import { evaluate } from '@agentv/core';
const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      expectedOutput: 'Hello there!',
      assert: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});
console.log(`${summary.passed}/${summary.total} passed`);

evaluate() auto-discovers the default target from .agentv/targets.yaml and loads credentials from .env.

Point to an existing YAML eval instead of inlining tests:

import { evaluate } from '@agentv/core';
const { results, summary } = await evaluate({
  specFile: './evals/my-eval.eval.yaml',
});

This is the recommended bridge when you want SDK control without creating a separate code-first eval surface.

Create agentv.config.ts at your project root for type-safe, validated configuration using defineConfig() from @agentv/core:

import { defineConfig } from '@agentv/core';
export default defineConfig({
  execution: {
    workers: 5,
    maxRetries: 2,
    verbose: true,
    otelFile: '.agentv/results/otel-{timestamp}.json',
  },
  output: { dir: './results' },
  limits: { maxCostUsd: 10.0 },
});

The config file is auto-discovered by the CLI from your project root and validated with Zod at startup.

AgentV’s observability surface is OpenTelemetry. For post-run workflows:

  • Use agentv eval ... --otel-file traces/eval.otlp.json to write OTLP JSON you can import into systems such as Opik.
  • Use agentv eval ... --export-otel --otel-backend <name> for live export when a built-in or local resolver exists.
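To sanity-check an --otel-file export before importing it elsewhere, you can walk the spans it contains. In the OTLP/JSON encoding, spans are nested as resourceSpans → scopeSpans → spans. This sketch assumes the file holds a single ExportTraceServiceRequest object in that encoding. AgentV's exact file layout is not specified on this page, so adjust the parsing if it differs. spanNames is a hypothetical helper, and the span names in the sample are made up.

```typescript
// Minimal slice of the OTLP/JSON trace export shape; other fields are ignored.
interface OtlpSpan {
  name: string;
  traceId?: string;
  spanId?: string;
}
interface OtlpTraceExport {
  resourceSpans?: Array<{
    scopeSpans?: Array<{ spans?: OtlpSpan[] }>;
  }>;
}

// Collects every span name, walking resourceSpans → scopeSpans → spans.
function spanNames(doc: OtlpTraceExport): string[] {
  return (doc.resourceSpans ?? []).flatMap((rs) =>
    (rs.scopeSpans ?? []).flatMap((ss) => (ss.spans ?? []).map((span) => span.name)),
  );
}

// Made-up sample; in practice, parse the contents of the file written by --otel-file.
const sample: OtlpTraceExport = JSON.parse(
  '{"resourceSpans":[{"scopeSpans":[{"spans":[{"name":"eval"},{"name":"grader"}]}]}]}',
);
console.log(spanNames(sample).join(', ')); // eval, grader
```

In Node or Bun you would read the file first (for example with fs.readFileSync(path, 'utf8')) and pass the parsed object to spanNames.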

AgentV does not currently ship a dedicated Opik authoring facade or built-in opik backend resolver. Keep the eval definition in YAML and route observability through OTLP export.

Bootstrap new assertions and eval files from the CLI:

# Create a new assertion type
agentv create assertion <name> # → .agentv/assertions/<name>.ts
# Create a new eval with test cases
agentv create eval <name> # → evals/<name>.eval.yaml + .cases.jsonl