pydantic-ai-model-integration
npx machina-cli add skill existential-birds/beagle/pydantic-ai-model-integration --openclaw
PydanticAI Model Integration
Provider Model Strings
Format: provider:model-name
from pydantic_ai import Agent
# OpenAI
Agent('openai:gpt-4o')
Agent('openai:gpt-4o-mini')
Agent('openai:o1-preview')
# Anthropic
Agent('anthropic:claude-sonnet-4-5')
Agent('anthropic:claude-haiku-4-5')
# Google (API Key)
Agent('google-gla:gemini-2.0-flash')
Agent('google-gla:gemini-2.0-pro')
# Google (Vertex AI)
Agent('google-vertex:gemini-2.0-flash')
# Groq
Agent('groq:llama-3.3-70b-versatile')
Agent('groq:mixtral-8x7b-32768')
# Mistral
Agent('mistral:mistral-large-latest')
# Other providers
Agent('cohere:command-r-plus')
Agent('bedrock:anthropic.claude-3-sonnet-20240229-v1:0')
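The provider:model-name format above can be split with a small helper, which may be useful for logging or for validating configuration before an agent is built. This is an illustrative sketch only: `split_model_string` is a hypothetical name, not part of PydanticAI. It splits on the first colon, on the assumption that a model name may itself contain colons.

```python
from typing import Tuple


def split_model_string(model: str) -> Tuple[str, str]:
    """Split 'provider:model-name' into (provider, model_name).

    Hypothetical helper for illustration; not a PydanticAI function.
    """
    # partition() splits on the first colon only, so any later colons
    # stay inside the model name.
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model-name', got {model!r}")
    return provider, name


print(split_model_string("openai:gpt-4o"))
print(split_model_string("groq:llama-3.3-70b-versatile"))
```

A string without a provider prefix, such as 'gpt-4o', raises `ValueError` here instead of being guessed at.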
Model Settings
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings
agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(
        temperature=0.7,
        max_tokens=1000,
        top_p=0.9,
        timeout=30.0,  # Request timeout in seconds
    ),
)
# Override per run (call from inside an async function)
result = await agent.run(
    'Generate creative text',
    model_settings=ModelSettings(temperature=1.0),
)
Fallback Models
Chain models for resilience:
from pydantic_ai import Agent
from pydantic_ai.exceptions import ModelHTTPError
from pydantic_ai.models.fallback import FallbackModel
# Try models in order until one succeeds
fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    'google-gla:gemini-2.0-flash',
)
agent = Agent(fallback)
result = await agent.run('Hello')
# Custom fallback conditions
def should_fallback(error: Exception) -> bool:
    """Only fall back on rate limits or server errors."""
    # ModelHTTPError is the exception that carries the HTTP status code
    if isinstance(error, ModelHTTPError):
        return error.status_code in (429, 500, 502, 503)
    return False
fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    fallback_on=should_fallback,
)
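To make the ordering and the role of a fallback predicate concrete, here is a library-free sketch of the idea: try each handler in turn, and move on only when the predicate accepts the error. It is a conceptual illustration, not how FallbackModel is implemented. `FakeHTTPError`, `run_with_fallback`, `rate_limited`, and `healthy` are hypothetical names introduced for the example.

```python
from typing import Callable, List, Sequence


class FakeHTTPError(Exception):
    """Hypothetical stand-in for a provider error that carries a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"status {status_code}")
        self.status_code = status_code


def should_fallback(error: Exception) -> bool:
    """Fall back only on rate limits or server errors."""
    return isinstance(error, FakeHTTPError) and error.status_code in (429, 500, 502, 503)


def run_with_fallback(
    handlers: Sequence[Callable[[str], str]],
    prompt: str,
    fallback_on: Callable[[Exception], bool] = should_fallback,
) -> str:
    """Try each handler in order; continue only when fallback_on(error) is true."""
    errors: List[Exception] = []
    for handler in handlers:
        try:
            return handler(prompt)
        except Exception as error:
            if not fallback_on(error):
                raise  # errors outside the fallback policy surface immediately
            errors.append(error)
    raise RuntimeError(f"all {len(errors)} handlers failed: {errors}")


def rate_limited(prompt: str) -> str:
    """Simulates a provider that is currently rate-limited."""
    raise FakeHTTPError(429)


def healthy(prompt: str) -> str:
    """Simulates a provider that answers normally."""
    return f"echo: {prompt}"


print(run_with_fallback([rate_limited, healthy], "Hello"))
```

In this sketch a 400-style error is re-raised straight away. That is the point of a narrow predicate: a malformed request would fail on every provider, so retrying elsewhere only adds latency and cost.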
Streaming Responses
async def stream_response():
    async with agent.run_stream('Tell me a story') as response:
        # Stream text as incremental chunks (delta=True yields only the new text)
        async for chunk in response.stream_text(delta=True):
            print(chunk, end='', flush=True)
        # Usage is available once the stream has been consumed
        print(f"\nTokens used: {response.usage().total_tokens}")
Streaming with Structured Output
from pydantic import BaseModel
from pydantic_ai import Agent
class Story(BaseModel):
    title: str
    content: str
    moral: str
agent = Agent('openai:gpt-4o', output_type=Story)
async with agent.run_stream('Write a fable') as response:
    # For structured output, stream_output yields partially validated objects
    async for partial in response.stream_output():
        print(partial)  # Story as parsed so far
    # Final validated result
    story = await response.get_output()
Dynamic Model Selection
import os
# Environment-based selection
model = os.getenv('PYDANTIC_AI_MODEL', 'openai:gpt-4o')
agent = Agent(model)
# Runtime model override
result = await agent.run(
    'Hello',
    model='anthropic:claude-sonnet-4-5',  # Override default
)
# Context manager override
with agent.override(model='google-gla:gemini-2.0-flash'):
    result = agent.run_sync('Hello')
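The selection order above (explicit override, then environment variable, then a default) can be gathered into one function that also rejects malformed strings early. This is a sketch under assumptions: `resolve_model` and `DEFAULT_MODEL` are hypothetical names, and passing the environment as a mapping is a design choice made here so the function is easy to test.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "openai:gpt-4o"  # same default as in the snippet above


def resolve_model(
    override: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = "PYDANTIC_AI_MODEL",
) -> str:
    """Pick a model string: explicit override, then environment, then default."""
    candidate = override or env.get(env_var) or DEFAULT_MODEL
    # Fail fast on strings that do not look like 'provider:model-name'.
    provider, sep, name = candidate.partition(":")
    if not (sep and provider and name):
        raise ValueError(f"expected 'provider:model-name', got {candidate!r}")
    return candidate


print(resolve_model(env={}))
print(resolve_model(env={"PYDANTIC_AI_MODEL": "groq:llama-3.3-70b-versatile"}))
print(resolve_model("anthropic:claude-sonnet-4-5", env={}))
```

The returned string can then be passed to Agent(...) or to the model argument of a run, as in the examples above.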
Deferred Model Checking
Delay model validation for testing:
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
# Default: validates the model immediately (checks env vars)
agent = Agent('openai:gpt-4o')
# Deferred: validates only on first run
agent = Agent('openai:gpt-4o', defer_model_check=True)
# Useful for testing with override
with agent.override(model=TestModel()):
    result = agent.run_sync('Test')  # No OpenAI key needed
Usage Tracking
result = await agent.run('Hello')
# Usage for the whole run (all requests made during the run)
usage = result.usage()
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens: {usage.total_tokens}")
print(f"Total requests: {usage.requests}")
Usage Limits
from pydantic_ai.usage import UsageLimits
# Limit token usage
result = await agent.run(
    'Generate content',
    usage_limits=UsageLimits(
        total_tokens_limit=1000,
        input_tokens_limit=500,
        output_tokens_limit=500,
    ),
)
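If application code also needs a budget spanning several runs, the token counts reported after each run can be accumulated by hand. The sketch below is one possible shape for that bookkeeping. `TokenBudget` and `BudgetExceeded` are hypothetical names, and nothing here is part of PydanticAI.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the configured budget."""


@dataclass
class TokenBudget:
    """Hypothetical cross-run token budget kept in application code."""

    total_limit: int
    used: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> int:
        """Record one run's tokens and return the remaining budget."""
        new_total = self.used + input_tokens + output_tokens
        if new_total > self.total_limit:
            # Leave self.used untouched so the caller can decide what to do.
            raise BudgetExceeded(
                f"{new_total} tokens would exceed the limit of {self.total_limit}"
            )
        self.used = new_total
        return self.total_limit - self.used


budget = TokenBudget(total_limit=1000)
print(budget.charge(input_tokens=200, output_tokens=300))  # remaining: 500
```

After a run, the values to pass in would be the input and output token counts reported for that run.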
Provider-Specific Features
OpenAI
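from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
model = OpenAIChatModel(
    'gpt-4o',
    provider=OpenAIProvider(
        api_key='your-key',  # Or use OPENAI_API_KEY env var
        base_url='https://custom-endpoint.com',  # For proxies or OpenAI-compatible endpoints
    ),
)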
Anthropic
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider
model = AnthropicModel(
    'claude-sonnet-4-5',
    provider=AnthropicProvider(api_key='your-key'),  # Or ANTHROPIC_API_KEY
)
Common Model Patterns
| Use Case | Recommendation |
|---|---|
| General purpose | openai:gpt-4o or anthropic:claude-sonnet-4-5 |
| Fast/cheap | openai:gpt-4o-mini or anthropic:claude-haiku-4-5 |
| Long context | anthropic:claude-sonnet-4-5 (200k) or google-gla:gemini-2.0-flash |
| Reasoning | openai:o1-preview |
| Cost-sensitive prod | FallbackModel with fast model first |
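The recommendations in the table can be kept as data so that a deployment picks only from providers it has credentials for. This is a sketch under assumptions: the use-case keys, `RECOMMENDED`, and `candidates_for` are hypothetical names, while the model strings are copied from the table.

```python
from typing import Dict, List, Set

# Model strings copied from the table above; the keys are illustrative labels.
RECOMMENDED: Dict[str, List[str]] = {
    "general": ["openai:gpt-4o", "anthropic:claude-sonnet-4-5"],
    "fast": ["openai:gpt-4o-mini", "anthropic:claude-haiku-4-5"],
    "long-context": ["anthropic:claude-sonnet-4-5", "google-gla:gemini-2.0-flash"],
    "reasoning": ["openai:o1-preview"],
}


def candidates_for(use_case: str, providers: Set[str]) -> List[str]:
    """Return the recommended models for a use case, limited to configured providers."""
    if use_case not in RECOMMENDED:
        raise KeyError(f"unknown use case: {use_case!r}")
    # The provider is the part of the model string before the first colon.
    return [m for m in RECOMMENDED[use_case] if m.partition(":")[0] in providers]


print(candidates_for("fast", {"anthropic"}))
print(candidates_for("general", {"openai", "anthropic"}))
```

The resulting list keeps the table's order, so it could be passed as the ordered arguments of a FallbackModel when more than one candidate remains.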
Source
https://github.com/existential-birds/beagle/blob/main/plugins/beagle-ai/skills/pydantic-ai-model-integration/SKILL.md
Overview
Configures LLM providers, models, fallbacks, streaming, and per-run settings in PydanticAI. It helps you pick the right model, build resilience against outages, and optimize API calls for speed and cost.
How This Skill Works
Define provider:model-name strings to instantiate an Agent. Attach ModelSettings to tune temperature, max_tokens, top_p, and timeouts, and override per-run when needed. Use FallbackModel to chain alternatives for resilience, and leverage run_stream for streaming outputs, including optional structured output parsing.
When to Use It
- You need to select between providers and models based on cost, latency, or feature sets.
- You want resilience by automatically switching models when a response fails or is rate-limited.
- You require real-time or chunked outputs via streaming responses.
- You’re testing different models or running in CI with deferred model validation.
- You want to track usage and tune API calls for efficiency.
Quick Start
- Step 1: Install and import Agent from pydantic_ai and create your first agent with a provider string (e.g., 'openai:gpt-4o').
- Step 2: Optionally attach ModelSettings for tuning (temperature, max_tokens, etc.) or set up a FallbackModel for resilience.
- Step 3: Run your prompt with run() or stream with run_stream(), and inspect usage or final structured output.
Best Practices
- Use explicit provider:model strings to avoid ambiguity.
- Tune ModelSettings thoughtfully (temperature, max_tokens, timeout) to balance quality and cost.
- Combine a FallbackModel with robust error handling to improve uptime.
- Test streaming with both unstructured and structured outputs to ensure compatibility.
- Leverage environment-based model selection for predictable deployments.
Example Use Cases
- Create an agent with a specific provider: Agent('openai:gpt-4o').
- Define a 3-model fallback: FallbackModel('openai:gpt-4o', 'anthropic:claude-sonnet-4-5', 'google-gla:gemini-2.0-flash').
- Stream a response: enter async with agent.run_stream('Tell me a story') as response, then iterate over response.stream_text(delta=True).
- Override model at runtime: result = await agent.run('Hello', model='anthropic:claude-sonnet-4-5').
- Defer model validation for tests: Agent('openai:gpt-4o', defer_model_check=True).