How do I override the model at runtime?

Pass a model to run(), e.g., await agent.run('Hello', model='anthropic:claude-sonnet-4-5'), or use the agent.override(model=...) context manager for a scoped change.

What is a FallbackModel and when should I use it?

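A FallbackModel chains multiple models to keep your agent available. If a request to the current model fails with a qualifying error (by default, provider API errors such as rate limits or server errors), it moves on to the next model in the chain. Use it when you want automatic failover to alternative providers.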

How do I stream responses and access the final result?

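Use run_stream() as an async context manager. Iterate response.stream_text(delta=True) to print text chunks as they arrive (or response.stream_output() for structured output), then read response.usage() and await response.get_output() once streaming completes.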

pydantic-ai-model-integration

npx machina-cli add skill existential-birds/beagle/pydantic-ai-model-integration --openclaw

PydanticAI Model Integration

Provider Model Strings

Format: provider:model-name

from pydantic_ai import Agent

# OpenAI
Agent('openai:gpt-4o')
Agent('openai:gpt-4o-mini')
Agent('openai:o1-preview')

# Anthropic
Agent('anthropic:claude-sonnet-4-5')
Agent('anthropic:claude-haiku-4-5')

# Google (API Key)
Agent('google-gla:gemini-2.0-flash')
Agent('google-gla:gemini-2.0-pro')

# Google (Vertex AI)
Agent('google-vertex:gemini-2.0-flash')

# Groq
Agent('groq:llama-3.3-70b-versatile')
Agent('groq:mixtral-8x7b-32768')

# Mistral
Agent('mistral:mistral-large-latest')

# Other providers
Agent('cohere:command-r-plus')
Agent('bedrock:anthropic.claude-3-sonnet')
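Because every string follows provider:model-name, the provider is everything before the first colon. The hypothetical `split_model_string` helper below (not part of PydanticAI) shows why only the first colon should be used as the separator: some Bedrock model IDs contain a colon of their own.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Hypothetical helper: split 'provider:model-name' on the FIRST colon only."""
    provider, sep, name = model.partition(':')
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model-name', got {model!r}")
    return provider, name


print(split_model_string('openai:gpt-4o'))
# ('openai', 'gpt-4o')
print(split_model_string('bedrock:anthropic.claude-3-sonnet-20240229-v1:0'))
# ('bedrock', 'anthropic.claude-3-sonnet-20240229-v1:0')
```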

Model Settings

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(
        temperature=0.7,
        max_tokens=1000,
        top_p=0.9,
        timeout=30.0,  # Request timeout in seconds
    )
)

# Per-run settings are merged over the agent defaults
result = await agent.run(
    'Generate creative text',
    model_settings=ModelSettings(temperature=1.0)
)
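PydanticAI merges per-run settings over the agent-level ones, so the run above should change only the temperature while keeping max_tokens, top_p and timeout. A minimal sketch of that merge, assuming a simple shallow merge where the later keys win; `Settings` and `merge_settings` are hypothetical stand-ins, not the library's own types or functions.

```python
from typing import Optional, TypedDict


class Settings(TypedDict, total=False):
    """Stand-in for a settings TypedDict; every key is optional."""
    temperature: float
    max_tokens: int
    top_p: float
    timeout: float


def merge_settings(base: Optional[Settings], override: Optional[Settings]) -> Optional[Settings]:
    """Hypothetical shallow merge: keys in `override` replace keys in `base`."""
    if base is None:
        return override
    if override is None:
        return base
    return {**base, **override}


agent_defaults: Settings = {'temperature': 0.7, 'max_tokens': 1000, 'top_p': 0.9, 'timeout': 30.0}
per_run: Settings = {'temperature': 1.0}
print(merge_settings(agent_defaults, per_run))
# {'temperature': 1.0, 'max_tokens': 1000, 'top_p': 0.9, 'timeout': 30.0}
```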

Fallback Models

Chain models for resilience:

from pydantic_ai.models.fallback import FallbackModel

# Try models in order until one succeeds
fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    'google-gla:gemini-2.0-flash'
)

agent = Agent(fallback)
result = await agent.run('Hello')

# Custom fallback conditions
from pydantic_ai.exceptions import ModelHTTPError

def should_fallback(error: Exception) -> bool:
    """Only fall back on rate limits or server errors."""
    # ModelHTTPError carries the provider's HTTP status code
    if isinstance(error, ModelHTTPError):
        return error.status_code in (429, 500, 502, 503)
    return False

fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    fallback_on=should_fallback
)
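The control flow of a fallback chain can be sketched without any provider SDK. In this stdlib-only sketch, plain callables stand in for models, and the hypothetical `HTTPError` and `AllModelsFailed` classes stand in for the library's exceptions. It follows the behaviour described above (try models in order, move on only when the predicate approves, otherwise re-raise) but is not PydanticAI's implementation.

```python
from typing import Callable, Sequence


class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class AllModelsFailed(Exception):
    """Hypothetical error raised once every model in the chain has failed."""

    def __init__(self, errors: list) -> None:
        super().__init__(f"{len(errors)} model(s) failed")
        self.errors = errors


def should_fallback(error: Exception) -> bool:
    return isinstance(error, HTTPError) and error.status_code in (429, 500, 502, 503)


def run_with_fallback(
    models: Sequence[Callable[[str], str]],
    prompt: str,
    fallback_on: Callable[[Exception], bool] = should_fallback,
) -> str:
    errors: list = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:
            if not fallback_on(exc):
                raise  # not a fallback condition: surface it immediately
            errors.append(exc)  # remember the failure and try the next model
    raise AllModelsFailed(errors)


def rate_limited(prompt: str) -> str:
    raise HTTPError(429)


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


print(run_with_fallback([rate_limited, healthy], 'Hello'))  # echo: Hello
```

Passing the predicate in as a parameter keeps the retry policy separate from the chain itself, which is the same separation the fallback_on argument provides.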

Streaming Responses

async def stream_response():
    async with agent.run_stream('Tell me a story') as response:
        # stream_text(delta=True) yields only the newly received text
        async for chunk in response.stream_text(delta=True):
            print(chunk, end='', flush=True)

        # Usage is available once the stream has been consumed
        print(f"\nTokens used: {response.usage().total_tokens}")
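Streamed text can be surfaced either as deltas (only the new piece) or cumulatively (everything received so far). Printing cumulative snapshots with end='' repeats text, so deltas are the right view for incremental printing. The stdlib sketch below uses a fake async stream (the hypothetical `fake_stream`) to show both views side by side.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(pieces: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model stream: yields one text delta at a time."""
    for piece in pieces:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield piece


async def collect(pieces: list[str]) -> tuple[list[str], list[str]]:
    deltas: list[str] = []
    snapshots: list[str] = []
    text_so_far = ''
    async for delta in fake_stream(pieces):
        deltas.append(delta)
        text_so_far += delta
        snapshots.append(text_so_far)  # cumulative view: whole text so far
    return deltas, snapshots


deltas, snapshots = asyncio.run(collect(['Once ', 'upon ', 'a time']))
print(deltas)     # ['Once ', 'upon ', 'a time']
print(snapshots)  # ['Once ', 'Once upon ', 'Once upon a time']
```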

Streaming with Structured Output

from pydantic import BaseModel

class Story(BaseModel):
    title: str
    content: str
    moral: str

agent = Agent('openai:gpt-4o', output_type=Story)

async def stream_story() -> Story:
    async with agent.run_stream('Write a fable') as response:
        # For structured output, stream_output yields partially validated Story objects
        async for partial in response.stream_output():
            print(partial)  # Fields fill in as more of the response arrives

        # Final validated result
        return await response.get_output()
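The split between partial snapshots while streaming and one validated object at the end can be sketched with the standard library: accumulate the raw JSON text, then parse it and check the required fields only once the stream is complete. The dataclass mirrors the Story model above, and `finalize_story` is a hypothetical helper; this is an illustration of the idea, not how PydanticAI validates output internally.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Story:
    title: str
    content: str
    moral: str


def finalize_story(chunks: list[str]) -> Story:
    """Hypothetical helper: join streamed JSON chunks, then validate once at the end."""
    data = json.loads(''.join(chunks))  # fails if the stream was cut off mid-JSON
    missing = [f.name for f in fields(Story) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Story(**{f.name: str(data[f.name]) for f in fields(Story)})


chunks = ['{"title": "The Fox", ', '"content": "A fox met a crow.", ', '"moral": "Beware flattery."}']
story = finalize_story(chunks)
print(story.title)  # The Fox
```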

Dynamic Model Selection

import os

# Environment-based selection
model = os.getenv('PYDANTIC_AI_MODEL', 'openai:gpt-4o')
agent = Agent(model)

# Runtime model override
result = await agent.run(
    'Hello',
    model='anthropic:claude-sonnet-4-5'  # Override default
)

# Context manager override
with agent.override(model='google-gla:gemini-2.0-flash'):
    result = agent.run_sync('Hello')
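Reading the model from PYDANTIC_AI_MODEL makes deployments configurable, but a typo in that variable only surfaces when the agent is built. A small guard can reject malformed values or unexpected providers up front. This is a sketch: `ALLOWED_PROVIDERS` and `model_from_env` are hypothetical names, and the allow-list is an assumption to adapt to your deployment.

```python
import os
from typing import Mapping, Optional

# Hypothetical allow-list; adjust to the providers your deployment has keys for.
ALLOWED_PROVIDERS = {'openai', 'anthropic', 'google-gla', 'google-vertex', 'groq', 'mistral'}


def model_from_env(env: Optional[Mapping[str, str]] = None, default: str = 'openai:gpt-4o') -> str:
    """Return the configured model string, rejecting unknown or malformed values."""
    env = os.environ if env is None else env
    model = env.get('PYDANTIC_AI_MODEL', default)
    provider, sep, name = model.partition(':')
    if not sep or not name or provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported PYDANTIC_AI_MODEL value: {model!r}")
    return model


print(model_from_env({}))  # openai:gpt-4o
print(model_from_env({'PYDANTIC_AI_MODEL': 'anthropic:claude-sonnet-4-5'}))  # anthropic:claude-sonnet-4-5
```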

Deferred Model Checking

Delay model validation for testing:

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

# Default: validates the model immediately (checks env vars)
agent = Agent('openai:gpt-4o')

# Deferred: Validates only on first run
agent = Agent('openai:gpt-4o', defer_model_check=True)

# Useful for testing with override
with agent.override(model=TestModel()):
    result = agent.run_sync('Test')  # No OpenAI key needed
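The difference between eager and deferred checking comes down to when the credential lookup happens. The hypothetical `LazyClient` below (plain Python, not PydanticAI) checks an environment variable either in `__init__` or on the first call, which is why a deferred object can be constructed, and then swapped out in tests, without any key present.

```python
import os
from typing import Mapping, Optional


class LazyClient:
    """Hypothetical client that validates its API key eagerly or on first use."""

    def __init__(self, env_var: str, defer_check: bool = False,
                 env: Optional[Mapping[str, str]] = None) -> None:
        self.env_var = env_var
        self.env = os.environ if env is None else env
        self._checked = False
        if not defer_check:
            self._check()  # eager: fail at construction time

    def _check(self) -> None:
        if self.env_var not in self.env:
            raise RuntimeError(f"{self.env_var} is not set")
        self._checked = True

    def run(self, prompt: str) -> str:
        if not self._checked:
            self._check()  # deferred: fail only when actually used
        return f"ok: {prompt}"


deferred = LazyClient('OPENAI_API_KEY', defer_check=True, env={})  # no error yet
print(deferred._checked)  # False
```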

Usage Tracking

result = await agent.run('Hello')

# Usage for the whole run (all model requests it made)
usage = result.usage()
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens: {usage.total_tokens}")
print(f"Total requests: {usage.requests}")
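Run-level usage is the sum of the usage of every model request in the run; a run that calls a tool makes more than one request. A dataclass sketch of that bookkeeping, where the field names mirror those printed above but the `Usage` class itself is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical usage record; total_tokens is derived, not stored."""
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def __add__(self, other: "Usage") -> "Usage":
        return Usage(
            requests=self.requests + other.requests,
            input_tokens=self.input_tokens + other.input_tokens,
            output_tokens=self.output_tokens + other.output_tokens,
        )


# Two requests in one run, e.g. a tool call followed by the final answer
per_request = [Usage(1, 120, 30), Usage(1, 180, 45)]
run_total = sum(per_request, Usage())
print(run_total.requests, run_total.total_tokens)  # 2 375
```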

Usage Limits

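from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

# Cap the number of requests and total tokens for a single run
try:
    result = await agent.run(
        'Generate content',
        usage_limits=UsageLimits(
            request_limit=5,
            total_tokens_limit=1000,
        ),
    )
except UsageLimitExceeded as exc:
    print(f"Run stopped: {exc}")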
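Conceptually, a usage limit is a check of the accumulated totals against configured caps, where an unset cap means unlimited. A stdlib sketch of that check; `Limits`, `check_limits` and `LimitExceeded` are hypothetical names, while PydanticAI's own check raises `UsageLimitExceeded`.

```python
from dataclasses import dataclass
from typing import Optional


class LimitExceeded(Exception):
    """Hypothetical error for a breached limit."""


@dataclass
class Limits:
    request_limit: Optional[int] = None
    total_tokens_limit: Optional[int] = None


def check_limits(limits: Limits, requests: int, total_tokens: int) -> None:
    """Raise if the accumulated totals exceed any configured limit (None = unlimited)."""
    if limits.request_limit is not None and requests > limits.request_limit:
        raise LimitExceeded(f"requests {requests} > {limits.request_limit}")
    if limits.total_tokens_limit is not None and total_tokens > limits.total_tokens_limit:
        raise LimitExceeded(f"total_tokens {total_tokens} > {limits.total_tokens_limit}")


limits = Limits(request_limit=5, total_tokens_limit=1000)
check_limits(limits, requests=2, total_tokens=375)  # within limits: no error
try:
    check_limits(limits, requests=3, total_tokens=1200)
except LimitExceeded as exc:
    print(exc)  # total_tokens 1200 > 1000
```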

Provider-Specific Features

OpenAI

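from pydantic_ai.models.openai import OpenAIChatModel  # formerly OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Credentials and endpoint are configured on the provider, not the model
model = OpenAIChatModel(
    'gpt-4o',
    provider=OpenAIProvider(
        api_key='your-key',  # Or set the OPENAI_API_KEY env var
        base_url='https://custom-endpoint.com',  # Proxies, OpenAI-compatible APIs
    ),
)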

Anthropic

from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

model = AnthropicModel(
    'claude-sonnet-4-5',
    provider=AnthropicProvider(api_key='your-key'),  # Or set ANTHROPIC_API_KEY
)
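Both providers accept an explicit key or fall back to an environment variable. A sketch of that precedence (explicit argument first, then the environment, otherwise a clear error); `resolve_api_key` is a hypothetical helper, not a PydanticAI function.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str], env_var: str,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicit key, then the environment variable; fail loudly otherwise."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    key = env.get(env_var)
    if not key:
        raise RuntimeError(f"pass api_key=... or set {env_var}")
    return key


print(resolve_api_key(None, 'ANTHROPIC_API_KEY', env={'ANTHROPIC_API_KEY': 'sk-test'}))  # sk-test
print(resolve_api_key('sk-explicit', 'OPENAI_API_KEY', env={}))  # sk-explicit
```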

Common Model Patterns

Use Case	Recommendation
General purpose	`openai:gpt-4o` or `anthropic:claude-sonnet-4-5`
Fast/cheap	`openai:gpt-4o-mini` or `anthropic:claude-haiku-4-5`
Long context	`anthropic:claude-sonnet-4-5` (200k) or `google-gla:gemini-2.0-flash` (1M)
Reasoning	`openai:o1-preview`
Cost-sensitive prod	`FallbackModel` with fast model first

Source

https://github.com/existential-birds/beagle/blob/main/plugins/beagle-ai/skills/pydantic-ai-model-integration/SKILL.md

Overview

Configures LLM providers, models, fallbacks, streaming, and per-run settings in PydanticAI. It helps you pick the right model, build resilience against outages, and optimize API calls for speed and cost.

How This Skill Works

Define provider:model-name strings to instantiate an Agent. Attach ModelSettings to tune temperature, max_tokens, top_p, and timeouts, and override per-run when needed. Use FallbackModel to chain alternatives for resilience, and leverage run_stream for streaming outputs, including optional structured output parsing.

When to Use It

You need to select between providers and models based on cost, latency, or feature sets.
You want resilience by automatically switching models when a response fails or is rate-limited.
You require real-time or chunked outputs via streaming responses.
You’re testing different models or running in CI with deferred model validation.
You want to track usage and tune API calls for efficiency.

Quick Start

Step 1: Install the package (pip install pydantic-ai), import Agent from pydantic_ai, and create your first agent with a provider string (e.g., 'openai:gpt-4o').
Step 2: Optionally attach ModelSettings for tuning (temperature, max_tokens, etc.) or set up a FallbackModel for resilience.
Step 3: Run your prompt with run() or stream with run_stream(), and inspect usage or final structured output.

Best Practices

Use explicit provider:model strings to avoid ambiguity.
Tune ModelSettings thoughtfully (temperature, max_tokens, timeout) to balance quality and cost.
Combine a FallbackModel with error handling for the case where every model fails (FallbackModel then raises a FallbackExceptionGroup) to improve uptime.
Test streaming with both unstructured and structured outputs to ensure compatibility.
Leverage environment-based model selection for predictable deployments.

Example Use Cases

Create an agent with a specific provider: Agent('openai:gpt-4o').
Define a 3-model fallback: FallbackModel('openai:gpt-4o', 'anthropic:claude-sonnet-4-5', 'google-gla:gemini-2.0-flash').
Stream a response: async with agent.run_stream('Tell me a story') as response, then iterate over response.stream_text(delta=True) or response.stream_output().
Override model at runtime: result = await agent.run('Hello', model='anthropic:claude-sonnet-4-5').
Defer model validation for tests: Agent('openai:gpt-4o', defer_model_check=True).
