pydantic-ai-testing

npx machina-cli add skill existential-birds/beagle/pydantic-ai-testing --openclaw

Testing PydanticAI Agents

TestModel (Deterministic Testing)

Use TestModel for tests without API calls:

import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

def test_agent_basic():
    agent = Agent('openai:gpt-4o')

    # Override with TestModel for testing
    result = agent.run_sync('Hello', model=TestModel())

    # TestModel generates deterministic output based on output_type
    assert isinstance(result.output, str)

TestModel Configuration

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

# Custom text output
model = TestModel(custom_output_text='Custom response')
result = agent.run_sync('Hello', model=model)
assert result.output == 'Custom response'

# Custom structured output (for output_type agents)
from pydantic import BaseModel

class Response(BaseModel):
    message: str
    score: int

agent = Agent('openai:gpt-4o', output_type=Response)
model = TestModel(custom_output_args={'message': 'Test', 'score': 42})
result = agent.run_sync('Hello', model=model)
assert result.output.message == 'Test'

# Seed for reproducible random output
model = TestModel(seed=42)

# Force tool calls
model = TestModel(call_tools=['my_tool', 'another_tool'])

Override Context Manager

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o', deps_type=MyDeps)  # MyDeps: your dependencies class

def test_with_override():
    mock_deps = MyDeps(db=MockDB())

    with agent.override(model=TestModel(), deps=mock_deps):
        # All runs use TestModel and mock_deps
        result = agent.run_sync('Hello')
        assert result.output

FunctionModel (Custom Logic)

For complete control over model responses:

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def custom_model(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    """Custom model that inspects messages and returns response."""
    # Access the last user message
    last_msg = messages[-1]

    # Return custom response
    return ModelResponse(parts=[TextPart('Custom response')])

agent = Agent(FunctionModel(custom_model))
result = agent.run_sync('Hello')

FunctionModel with Tool Calls

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def model_with_tools(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    # First request: call a tool
    if len(messages) == 1:
        return ModelResponse(parts=[
            ToolCallPart(
                tool_name='get_data',
                args='{"id": 123}'
            )
        ])

    # After tool response: return final result
    return ModelResponse(parts=[TextPart('Done with tool result')])

agent = Agent(FunctionModel(model_with_tools))

@agent.tool_plain
def get_data(id: int) -> str:
    return f"Data for {id}"

result = agent.run_sync('Get data')

VCR Cassettes (Recorded API Calls)

Record and replay real LLM API interactions:

import pytest

@pytest.mark.vcr
def test_with_recorded_response():
    """Uses recorded cassette from tests/cassettes/"""
    agent = Agent('openai:gpt-4o')
    result = agent.run_sync('Hello')
    assert 'hello' in result.output.lower()

# To record/update cassettes:
# uv run pytest --record-mode=rewrite tests/test_file.py

Cassette files are stored in tests/cassettes/ as YAML.
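When recording cassettes with pytest-recording (the plugin commonly behind @pytest.mark.vcr), it is worth filtering credentials out of the recorded requests so API keys never land in version control. A minimal conftest.py sketch, assuming pytest-recording's vcr_config fixture hook; the exact header names to strip are assumptions for your provider:

```python
# conftest.py -- sketch for pytest-recording; header names are assumptions
import pytest

# Headers to strip from recorded cassettes so API keys are never committed
FILTERED_HEADERS = ['authorization', 'x-api-key']

@pytest.fixture(scope='module')
def vcr_config():
    # pytest-recording passes this dict through to VCR.py as its configuration
    return {'filter_headers': FILTERED_HEADERS}
```

With this in place, recorded request headers matching the list are redacted before the cassette YAML is written.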

Inline Snapshots

Assert expected outputs with auto-updating snapshots:

from inline_snapshot import snapshot

def test_agent_output():
    result = agent.run_sync('Hello', model=TestModel())

    # First run: creates snapshot
    # Subsequent runs: asserts against it
    assert result.output == snapshot('expected output here')

# Update snapshots:
# uv run pytest --inline-snapshot=fix

Testing Tools

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

def test_tool_is_called():
    agent = Agent('openai:gpt-4o')
    tool_called = False

    @agent.tool_plain
    def my_tool(x: int) -> str:
        nonlocal tool_called
        tool_called = True
        return f"Result: {x}"

    # Force TestModel to call the tool
    result = agent.run_sync(
        'Use my_tool',
        model=TestModel(call_tools=['my_tool'])
    )

    assert tool_called

Testing with Dependencies

from dataclasses import dataclass
from unittest.mock import AsyncMock

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

@dataclass
class Deps:
    api: ApiClient  # your real API client class

def test_tool_with_deps():
    # Create mock dependency
    mock_api = AsyncMock()
    mock_api.fetch.return_value = {'data': 'test'}

    agent = Agent('openai:gpt-4o', deps_type=Deps)

    @agent.tool
    async def fetch_data(ctx: RunContext[Deps]) -> dict:
        return await ctx.deps.api.fetch()

    with agent.override(
        model=TestModel(call_tools=['fetch_data']),
        deps=Deps(api=mock_api)
    ):
        result = agent.run_sync('Fetch data')

    mock_api.fetch.assert_called_once()
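The AsyncMock piece works independently of the agent: awaiting a mocked attribute returns its configured return_value, and assert_awaited_once() verifies the call happened. A library-free illustration using only the standard library:

```python
import asyncio
from unittest.mock import AsyncMock

# Configure the mock exactly as in the agent test above
mock_api = AsyncMock()
mock_api.fetch.return_value = {'data': 'test'}

async def main() -> dict:
    # Awaiting an AsyncMock attribute yields its configured return_value
    result = await mock_api.fetch()
    mock_api.fetch.assert_awaited_once()
    return result

result = asyncio.run(main())
print(result)  # {'data': 'test'}
```

This is the same mechanism the tool exercises inside `agent.override(...)`; isolating it like this makes mock-configuration mistakes easier to debug.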

Capture Messages

Inspect all messages in a run:

from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

with capture_run_messages() as messages:
    result = agent.run_sync('Hello', model=TestModel())

# Inspect captured messages
for msg in messages:
    print(msg)

Testing Patterns Summary

Scenario                   Approach
Unit tests without API     TestModel()
Custom model logic         FunctionModel(func)
Recorded real responses    @pytest.mark.vcr
Assert output structure    inline_snapshot
Test tools are called      TestModel(call_tools=[...])
Mock dependencies          agent.override(deps=...)

pytest Configuration

Typical pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"  # For async tests

Run tests:

uv run pytest tests/test_agent.py -v
uv run pytest --inline-snapshot=fix  # Update snapshots

Source

https://github.com/existential-birds/beagle/blob/main/plugins/beagle-ai/skills/pydantic-ai-testing/SKILL.md

Overview

Pattern-driven testing of PydanticAI agents: TestModel for deterministic outputs, FunctionModel for custom logic, plus VCR cassettes and inline snapshots to validate results. This approach helps you write reliable unit tests, mock LLM responses, and record API interactions for stable test runs.

How This Skill Works

TestModel runs without API calls to produce deterministic outputs. FunctionModel enables custom response logic and tool-call flows; VCR cassettes capture and replay real API interactions via pytest.mark.vcr, while inline snapshots provide auto-updating output assertions.

When to Use It

  • Need deterministic unit tests without making external API calls.
  • Want to mock LLM responses to exercise downstream logic.
  • Need to test tool-call flows or multi-step interactions.
  • Want to record and replay real API interactions for stable tests.
  • Prefer automatic output verification with inline snapshots.

Quick Start

  1. Choose a model: TestModel for deterministic tests or FunctionModel for custom logic.
  2. Write a test that runs agent.run_sync(...) with the chosen model and asserts on the result.
  3. Run tests with pytest; enable VCR cassettes and inline snapshots as needed.

Best Practices

  • Use TestModel for pure determinism in unit tests.
  • Leverage TestModel's seed, custom_output_text, or custom_output_args to shape responses.
  • Use the Override Context Manager to swap models or dependencies during tests.
  • When custom behavior is required, implement a FunctionModel with a clear contract.
  • Combine VCR cassettes and inline snapshots to stabilize tests.

Example Use Cases

  • Deterministic test: test_agent_basic uses TestModel to ensure a predictable string output.
  • Custom deterministic text: TestModel with custom_output_text='Custom response' in a test.
  • Tool flow: FunctionModel that calls a tool on first message, then returns final result.
  • VCR recording: decorate a test with @pytest.mark.vcr to replay OpenAI calls.
  • Inline snapshots: capture and auto-update agent outputs using inline snapshots in tests.
