Frequently Asked Questions

How do I decide between TypedDict and Pydantic?

Use TypedDict for most state schemas to keep things lightweight; opt for Pydantic when you need runtime validation or complex nested structures.

What is the difference between Static Edges and Dynamic Routing?

Static Edges provide a fixed, build-time route for easier reasoning, while Dynamic Routing uses runtime decisions based on state and supports agent-driven navigation and flexible destinations.

When should I use InMemorySaver vs SqliteSaver vs PostgresSaver?

InMemorySaver is best for testing (lost on restart), SqliteSaver works well for development (single file, local), and PostgresSaver is suitable for production with durability and scalability needs.

langgraph-architecture

npx machina-cli add skill existential-birds/beagle/langgraph-architecture --openclaw

LangGraph Architecture Decisions

When to Use LangGraph

Use LangGraph When You Need:

Stateful conversations - Multi-turn interactions with memory
Human-in-the-loop - Approval gates, corrections, interventions
Complex control flow - Loops, branches, conditional routing
Multi-agent coordination - Multiple LLMs working together
Persistence - Resume from checkpoints, time travel debugging
Streaming - Real-time token streaming, progress updates
Reliability - Retries, error recovery, durability guarantees

Consider Alternatives When:

Scenario	Alternative	Why
Single LLM call	Direct API call	Overhead not justified
Linear pipeline	LangChain LCEL	Simpler abstraction
Stateless tool use	Function calling	No persistence needed
Simple RAG	LangChain retrievers	Built-in patterns
Batch processing	Async tasks	Different execution model

State Schema Decisions

TypedDict vs Pydantic

TypedDict	Pydantic
Lightweight, faster	Runtime validation
Dict-like access	Attribute access
No validation overhead	Type coercion
Simpler serialization	Complex nested models

Recommendation: Use TypedDict for most cases. Use Pydantic when you need validation or complex nested structures.
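
A minimal sketch of this trade-off, using only the standard library. A TypedDict is a plain dict at runtime, so a wrongly typed value passes silently. The `validate_state` helper and the `ChatState` fields are hypothetical: the helper stands in for the kind of check a Pydantic model performs on construction and is not Pydantic's API.

```python
from typing import TypedDict


class ChatState(TypedDict):
    user_id: str
    turn_count: int


# TypedDict adds no runtime checks: this bad value is accepted as-is.
unchecked = ChatState(user_id="u1", turn_count="three")


def validate_state(data: dict) -> dict:
    """Hypothetical stand-in for runtime validation (what Pydantic automates)."""
    for field_name, expected in ChatState.__annotations__.items():
        if not isinstance(data.get(field_name), expected):
            raise TypeError(f"{field_name} must be {expected.__name__}")
    return data


try:
    validate_state(unchecked)
    validation_error = None
except TypeError as exc:
    validation_error = str(exc)
```

If a bad value like this should fail at the node boundary rather than deep inside later logic, that is the case for paying the validation overhead.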

Reducer Selection

Use Case	Reducer	Example
Chat messages	`add_messages`	Handles IDs, RemoveMessage
Simple append	`operator.add`	`Annotated[list, operator.add]`
Keep latest	None (LastValue)	`field: str`
Custom merge	Lambda	`Annotated[list, lambda a, b: ...]`
Overwrite list	`Overwrite`	Bypass reducer
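
To make the reducer rows concrete, here is a pure-Python sketch of how an update could be merged into state using the reducers declared in `Annotated` metadata. It does not import LangGraph: `apply_update` and `merge_unique` are hypothetical helpers that only approximate the merge behavior described in the table.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def merge_unique(left: list, right: list) -> list:
    """Custom reducer: append items from right that are not already in left."""
    return left + [item for item in right if item not in left]


class State(TypedDict):
    log: Annotated[list, operator.add]   # simple append
    tags: Annotated[list, merge_unique]  # custom merge
    status: str                          # no reducer: keep latest value


def apply_update(state: dict, update: dict) -> dict:
    """Merge an update into state, using each field's reducer if it has one."""
    hints = get_type_hints(State, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


state = {"log": ["start"], "tags": ["a"], "status": "new"}
state = apply_update(
    state, {"log": ["step 1"], "tags": ["a", "b"], "status": "running"}
)
```

After the update, `log` has grown, `tags` has gained only the new item, and `status` has been overwritten.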

State Size Considerations

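from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages
from langgraph.store.base import BaseStore

# SMALL STATE (< 1MB) - Put in state
class State(TypedDict):
    messages: Annotated[list, add_messages]
    context: str

# LARGE DATA - Use Store (alternative definition of State)
class State(TypedDict):
    messages: Annotated[list, add_messages]
    document_ref: str  # Reference to store

def node(state, *, store: BaseStore):
    # Namespaces are tuples of strings; ("documents",) is an illustrative choice
    item = store.get(("documents",), state["document_ref"])
    doc = item.value if item is not None else None
    # Process without bloating checkpoints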
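
As a rough illustration of why references help, the sketch below compares the serialized size of a state that embeds a large document with one that holds only a key. `fake_store` is a hypothetical dict standing in for a real store, and JSON is used purely as a convenient size measure; an actual checkpointer may serialize differently.

```python
import json

# Hypothetical stand-in for a store: maps (namespace, key) to a large payload.
fake_store = {(("documents",), "doc-42"): "lorem ipsum " * 50_000}

inline_state = {
    "messages": ["hi"],
    "document": fake_store[(("documents",), "doc-42")],
}
ref_state = {"messages": ["hi"], "document_ref": "doc-42"}


def checkpoint_size(state: dict) -> int:
    """Bytes needed to serialize the state once."""
    return len(json.dumps(state).encode("utf-8"))


inline_bytes = checkpoint_size(inline_state)
ref_bytes = checkpoint_size(ref_state)
```

Because state is saved repeatedly as the graph runs, the inline cost is paid on every checkpoint, while the reference stays a few dozen bytes.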

Graph Structure Decisions

Single Graph vs Subgraphs

Single Graph when:

All nodes share the same state schema
Simple linear or branching flow
< 10 nodes

Subgraphs when:

Different state schemas needed
Reusable components across graphs
Team separation of concerns
Complex hierarchical workflows

Conditional Edges vs Command

Conditional Edges	Command
Routing based on state	Routing + state update
Separate router function	Decision in node
Clearer visualization	More flexible
Standard patterns	Dynamic destinations

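from typing import Literal
from langgraph.types import Command

# Conditional Edge - when routing is the focus
def router(state) -> Literal["a", "b"]:
    # "condition" is a placeholder for whatever state key drives the branch
    return "a" if state["condition"] else "b"
builder.add_conditional_edges("node", router)

# Command - when combining routing with updates
def node(state) -> Command[Literal["next"]]:
    return Command(goto="next", update={"step": state["step"] + 1})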
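
The difference between the two styles can be seen without LangGraph at all. In this sketch `FakeCommand` is a hypothetical dataclass standing in for `Command`, and the node names and `score` field are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class FakeCommand:
    """Hypothetical stand-in for Command: a destination plus a state update."""
    goto: str
    update: dict = field(default_factory=dict)


# Style 1: the node returns an update; a separate router picks the destination.
def review_node(state: dict) -> dict:
    return {"step": state["step"] + 1}


def review_router(state: dict) -> Literal["approve", "revise"]:
    return "approve" if state["score"] >= 0.8 else "revise"


# Style 2: one node decides where to go and what to update, together.
def review_command_node(state: dict) -> FakeCommand:
    target = "approve" if state["score"] >= 0.8 else "revise"
    return FakeCommand(goto=target, update={"step": state["step"] + 1})


state = {"step": 1, "score": 0.9}
after_node = {**state, **review_node(state)}
edge_target = review_router(after_node)
command = review_command_node(state)
```

Both styles reach the same destination with the same update. The first keeps routing visible as a separate function, while the second keeps the decision next to the logic that produced it.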

Static vs Dynamic Routing

Static Edges (add_edge):

Fixed flow known at build time
Clearer graph visualization
Easier to reason about

Dynamic Routing (add_conditional_edges, Command, Send):

Runtime decisions based on state
Agent-driven navigation
Fan-out patterns
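
A toy runner, written in plain Python rather than LangGraph, shows the distinction: static edges are a lookup table fixed before execution, while a dynamic route is a function evaluated against the state at run time. The node names and the `needs_lookup` field are hypothetical.

```python
# Static edges: fixed at build time, a plain lookup table.
static_edges = {"fetch": "summarize", "summarize": "END"}


# Dynamic routing: the destination is chosen at run time from the state.
def route_after_classify(state: dict) -> str:
    return "fetch" if state["needs_lookup"] else "summarize"


def run(state: dict) -> list:
    """Walk the toy graph and return the visited node names in order."""
    visited, current = [], "classify"
    while current != "END":
        visited.append(current)
        if current == "classify":
            current = route_after_classify(state)
        else:
            current = static_edges[current]
    return visited


path_with_lookup = run({"needs_lookup": True})
path_without_lookup = run({"needs_lookup": False})
```

The static part of the path can be read straight from the table; the dynamic part can only be known once the state is.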

Persistence Strategy

Checkpointer Selection

Checkpointer	Use Case	Characteristics
`InMemorySaver`	Testing only	Lost on restart
`SqliteSaver`	Development	Single file, local
`PostgresSaver`	Production	Scalable, concurrent
Custom	Special needs	Implement BaseCheckpointSaver
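
To show what a checkpointer is responsible for, here is a small sketch built on the standard-library `sqlite3` module. `TinyCheckpointer` is a hypothetical class, not the `SqliteSaver` API or a `BaseCheckpointSaver` implementation; it only illustrates saving and reloading state per thread.

```python
import json
import sqlite3


class TinyCheckpointer:
    """Hypothetical stand-in that stores one JSON state per (thread_id, step)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, step: int, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()

    def get_latest(self, thread_id: str):
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None


# ":memory:" vanishes when the process exits; a file path would survive restarts.
saver = TinyCheckpointer()
saver.put("thread-1", 1, {"messages": ["hi"]})
saver.put("thread-1", 2, {"messages": ["hi", "hello"]})
latest = saver.get_latest("thread-1")
missing = saver.get_latest("thread-2")
```

The in-memory versus file-backed choice here mirrors the testing versus development rows in the table; a production setup additionally needs concurrent access from multiple workers.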

Checkpointing Scope

# Full persistence (default)
graph = builder.compile(checkpointer=checkpointer)

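# Subgraph options (pass exactly one value for checkpointer)
subgraph = sub_builder.compile(checkpointer=None)     # Inherit from parent (default)
# subgraph = sub_builder.compile(checkpointer=True)   # Independent checkpointing
# subgraph = sub_builder.compile(checkpointer=False)  # No checkpointing (runs atomically)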

When to Disable Checkpointing

Short-lived subgraphs that should be atomic
Subgraphs with incompatible state schemas
Performance-critical paths without need for resume

Multi-Agent Architecture

Supervisor Pattern

Best for:

Clear hierarchy
Centralized decision making
Different agent specializations

          ┌─────────────┐
          │  Supervisor │
          └──────┬──────┘
    ┌────────┬───┴───┬────────┐
    ▼        ▼       ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Agent1│ │Agent2│ │Agent3│ │Agent4│
└──────┘ └──────┘ └──────┘ └──────┘
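
The diagram above can be sketched in plain Python as a dispatcher. The agents here are hypothetical functions and the keyword rule is a toy; in a real system the supervisor's choice would typically come from an LLM.

```python
def research_agent(task: str) -> str:
    return f"research notes for: {task}"


def coding_agent(task: str) -> str:
    return f"patch for: {task}"


AGENTS = {"research": research_agent, "code": coding_agent}


def supervisor(task: str) -> str:
    """Centralized decision: pick one specialist per task (toy keyword rule)."""
    return "code" if "bug" in task.lower() else "research"


def handle(task: str) -> dict:
    chosen = supervisor(task)
    return {"agent": chosen, "result": AGENTS[chosen](task)}


bug_result = handle("Fix the login bug")
survey_result = handle("Survey vector databases")
```

All routing knowledge lives in `supervisor`, so adding a specialist means registering it in `AGENTS` and extending one decision function.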

Peer-to-Peer Pattern

Best for:

Collaborative agents
No clear hierarchy
Flexible communication

┌──────┐     ┌──────┐
│Agent1│◄───►│Agent2│
└──┬───┘     └───┬──┘
   │             │
   ▼             ▼
┌──────┐     ┌──────┐
│Agent3│◄───►│Agent4│
└──────┘     └──────┘

Handoff Pattern

Best for:

Sequential specialization
Clear stage transitions
Different capabilities per stage

┌────────┐    ┌────────┐    ┌────────┐
│Research│───►│Planning│───►│Execute │
└────────┘    └────────┘    └────────┘
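
A plain-Python sketch of the handoff idea: each stage does its work and names its successor in the state it returns. The stage functions and the `next` field are hypothetical and do not use LangGraph.

```python
def research(state: dict) -> dict:
    return {**state, "findings": f"facts about {state['topic']}", "next": "planning"}


def planning(state: dict) -> dict:
    return {**state, "plan": f"plan from {state['findings']}", "next": "execute"}


def execute(state: dict) -> dict:
    return {**state, "output": f"ran {state['plan']}", "next": None}


STAGES = {"research": research, "planning": planning, "execute": execute}


def run_pipeline(topic: str) -> dict:
    """Run stages until one hands off to nobody, recording the order."""
    state, order = {"topic": topic, "next": "research"}, []
    while state["next"] is not None:
        stage = state["next"]
        order.append(stage)
        state = STAGES[stage](state)
    state["order"] = order
    return state


final = run_pipeline("caching")
```

Unlike the supervisor sketch, no central component decides the route: each stage owns the transition to the next.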

Streaming Strategy

Stream Mode Selection

Mode	Use Case	Data
`updates`	UI updates	Node outputs only
`values`	State inspection	Full state each step
`messages`	Chat UX	LLM tokens
`custom`	Progress/logs	Your data via StreamWriter
`debug`	Debugging	Tasks + checkpoints
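
The difference between `updates` and `values` is easiest to see side by side. The generator below is a hypothetical toy, not LangGraph's streaming implementation, and real streams may include events this sketch omits.

```python
def node_a(state: dict) -> dict:
    return {"draft": "v1"}


def node_b(state: dict) -> dict:
    return {"draft": "v2", "done": True}


NODES = [("node_a", node_a), ("node_b", node_b)]


def toy_stream(initial: dict, stream_mode: str):
    """Yield one chunk per node, shaped according to stream_mode."""
    state = dict(initial)
    for name, fn in NODES:
        update = fn(state)
        state.update(update)
        if stream_mode == "updates":
            yield {name: update}   # only what this node returned
        elif stream_mode == "values":
            yield dict(state)      # snapshot of the full state
        else:
            raise ValueError(f"unsupported mode: {stream_mode}")


updates = list(toy_stream({"topic": "x"}, "updates"))
values = list(toy_stream({"topic": "x"}, "values"))
```

`updates` chunks stay small and name the node that produced them, which suits UI patches; `values` chunks repeat the whole state, which suits inspection.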

Subgraph Streaming

# Stream from subgraphs
async for chunk in graph.astream(
    input,
    stream_mode="updates",
    subgraphs=True  # Include subgraph events
):
    namespace, data = chunk  # namespace indicates depth

Human-in-the-Loop Design

Interrupt Placement

Strategy	Use Case
`interrupt_before`	Approval before action
`interrupt_after`	Review after completion
`interrupt()` in node	Dynamic, contextual pauses

Resume Patterns

# Simple resume (same thread)
graph.invoke(None, config)

# Resume with value
graph.invoke(Command(resume="approved"), config)

# Resume specific interrupt
graph.invoke(Command(resume={interrupt_id: value}), config)

# Modify state and resume
graph.update_state(config, {"field": "new_value"})
graph.invoke(None, config)
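
As an analogy for pause-and-resume, a Python generator can surface a payload, wait, and continue with a supplied value. This is only a conceptual sketch: `approval_flow` is hypothetical, and LangGraph resumes from a persisted checkpoint rather than from a live generator frame.

```python
def approval_flow(draft: str):
    """Generator standing in for a node that pauses for human input."""
    decision = yield {"question": "Approve this draft?", "draft": draft}  # pause
    if decision == "approved":
        return f"published: {draft}"
    return f"rejected: {draft}"


flow = approval_flow("release notes")
pending = next(flow)       # run until the pause and surface the payload

outcome = None
try:
    flow.send("approved")  # resume with a value
except StopIteration as stop:
    outcome = stop.value   # the generator's return value
```

The payload shown to the human and the value sent back correspond to the interrupt payload and the `resume` value in the patterns above.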

Error Handling Strategy

Retry Configuration

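# Per-node retry (APIError and RateLimitError stand for your client library's exceptions)
from langgraph.types import RetryPolicy

RetryPolicy(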
    initial_interval=0.5,
    backoff_factor=2.0,
    max_interval=60.0,
    max_attempts=3,
    retry_on=lambda e: isinstance(e, (APIError, TimeoutError))
)

# Multiple policies (first match wins)
builder.add_node("node", fn, retry_policy=[
    RetryPolicy(retry_on=RateLimitError, max_attempts=5),
    RetryPolicy(retry_on=Exception, max_attempts=2),
])
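
To see what these parameters imply, the sketch below computes a plain exponential backoff schedule and applies first-match-wins policy selection. It is a generic illustration under assumed semantics, with hypothetical helpers `backoff_schedule` and `call_with_retry`; LangGraph's exact timing (for example, any jitter) may differ.

```python
def backoff_schedule(initial_interval, backoff_factor, max_interval, max_attempts):
    """Waits between attempts (one fewer than max_attempts), capped at max_interval."""
    return [
        min(max_interval, initial_interval * backoff_factor ** i)
        for i in range(max_attempts - 1)
    ]


def call_with_retry(fn, policies, sleep):
    """Retry fn using the first policy whose exception type matches."""
    attempts = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            attempts += 1
            policy = next(
                (p for p in policies if isinstance(exc, p["retry_on"])), None
            )
            if policy is None or attempts >= policy["max_attempts"]:
                raise
            sleep(min(60.0, 0.5 * 2.0 ** (attempts - 1)))


calls = {"count": 0}


def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient")
    return "ok"


waits = []  # record waits instead of actually sleeping
policies = [
    {"retry_on": TimeoutError, "max_attempts": 5},
    {"retry_on": Exception, "max_attempts": 2},
]
result = call_with_retry(flaky, policies, sleep=waits.append)
schedule = backoff_schedule(0.5, 2.0, 60.0, 3)
```

Ordering matters: because `TimeoutError` is listed before the catch-all `Exception` policy, the transient failure gets five attempts rather than two.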

Fallback Patterns

def node_with_fallback(state):
    try:
        return primary_operation(state)
    except PrimaryError:
        return fallback_operation(state)

# Or use conditional edges for complex fallback routing
def route_on_error(state) -> Literal["retry", "fallback", "__end__"]:
    if state.get("error") and state["attempts"] < 3:
        return "retry"
    elif state.get("error"):
        return "fallback"
    return END

Scaling Considerations

Horizontal Scaling

Use PostgresSaver for shared state
Consider LangGraph Platform for managed infrastructure
Use stores for large data outside checkpoints

Performance Optimization

Minimize state size - Use references for large data
Parallel nodes - Fan out when possible
Cache expensive operations - Use CachePolicy
Async everywhere - Use ainvoke, astream

Resource Limits

# Set recursion limit
config = {"recursion_limit": 50}
graph.invoke(input, config)

# Track remaining steps in state
from langgraph.managed import RemainingSteps

class State(TypedDict):
    remaining_steps: RemainingSteps

def check_budget(state):
    if state["remaining_steps"] < 5:
        return "wrap_up"
    return "continue"

Decision Checklist

Before implementing:

Is LangGraph the right tool? (vs simpler alternatives)
State schema defined with appropriate reducers?
Persistence strategy chosen? (dev vs prod checkpointer)
Streaming needs identified?
Human-in-the-loop points defined?
Error handling and retry strategy?
Multi-agent coordination pattern? (if applicable)
Resource limits configured?

Source

https://github.com/existential-birds/beagle/blob/main/plugins/beagle-ai/skills/langgraph-architecture/SKILL.md

Overview

Guides architectural choices for LangGraph apps, from state management and graph design to persistence and streaming. It helps you decide between LangGraph and alternatives, and outlines practical patterns for multi-agent systems and reliability.

How This Skill Works

The skill presents concrete decision criteria and patterns: when to use TypedDict vs Pydantic for state schemas, how to pick a reducer, how to structure graphs (single vs subgraphs), routing strategies (static vs dynamic), and how to select a persistence approach with checkpointers. It translates these decisions into actionable steps and code-ready guidance.

When to Use It

Stateful conversations with memory
Human-in-the-loop workflows (approvals and corrections)
Complex control flow with loops, branches, and conditional routing
Multi-agent coordination (multiple LLMs)
Need for persistence or streaming (checkpoints, resume, real-time updates)

Quick Start

Step 1: Assess whether your app requires memory, multi-agent coordination, persistence, and streaming.
Step 2: Pick a state schema (TypedDict vs Pydantic), select a reducer, and decide between a Single Graph or Subgraphs with an appropriate routing approach.
Step 3: Choose a checkpointer (InMemorySaver, SqliteSaver, or PostgresSaver) based on your environment and pass it to builder.compile(checkpointer=...).

Best Practices

Prefer TypedDict for most state schemas; use Pydantic only when you need runtime validation or complex nested structures.
Choose reducers to match the use case: add_messages for chat history, operator.add for simple appends, LastValue for keeping the latest item, lambda for custom merges, Overwrite to replace lists.
Keep state size in mind: put large data in an external store and keep only references and essential fields in state, so checkpoints stay small.
Design graph structure early: use a Single Graph for uniform schemas and simple flows; use Subgraphs for multiple schemas and reusable components.
Balance routing strategy: use Static Edges for fixed flows; switch to Dynamic Routing with conditional edges or Command for runtime decisions; prefer Command when a routing decision must also update state.

Example Use Cases

A customer-support agent with memory across turns, including human-in-the-loop approvals when necessary.
An orchestration of multiple tools and LLMs to complete a complex task via coordinated agents.
A long-running process that streams real-time progress and results to the user.
A workflow that checkpoints and resumes from saved state, keeping large data in an external store and referencing it from state.
Dynamic routing that adapts to user context and external signals using conditional edges and Commands.
