airweave-setup
npx machina-cli add skill airweave-ai/claude-plugin/airweave-setup --openclaw
Airweave Setup & Integration
Airweave is an open-source platform that makes any app searchable for AI agents. It connects to apps, productivity tools, databases, or document stores and transforms their contents into searchable knowledge bases.
Quick Start
Option 1: Airweave Cloud (Recommended)
- Sign up at https://app.airweave.ai
- Get your API key from the dashboard
- Install the SDK:
pip install airweave-sdk # Python
npm install @airweave/sdk # TypeScript
Option 2: Self-Hosted
git clone https://github.com/airweave-ai/airweave.git
cd airweave
chmod +x start.sh
./start.sh
Access the dashboard at http://localhost:8080
Core Workflow
The typical Airweave workflow follows these steps:
1. Create a Collection
A collection groups multiple data sources into a single searchable endpoint.
Python:
from airweave import AirweaveSDK
client = AirweaveSDK(
    api_key="YOUR_API_KEY",
    base_url="https://api.airweave.ai",  # or http://localhost:8001 for self-hosted
)
collection = client.collections.create(name="My Knowledge Base")
print(f"Collection ID: {collection.readable_id}")
TypeScript:
import { AirweaveSDKClient } from "@airweave/sdk";
const client = new AirweaveSDKClient({ apiKey: "YOUR_API_KEY" });
const collection = await client.collections.create({ name: "My Knowledge Base" });
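To keep the API key out of source code, the client arguments can be read from the environment. The following is a minimal sketch, not part of the SDK: the helper name `airweave_client_kwargs` is hypothetical, and it assumes the same `AIRWEAVE_API_KEY` and `AIRWEAVE_BASE_URL` variable names that the MCP configuration in this guide uses.

```python
import os


def airweave_client_kwargs(env=None):
    """Build keyword arguments for AirweaveSDK from environment variables.

    Hypothetical helper: the variable names mirror the MCP config in this guide.
    """
    env = os.environ if env is None else env
    api_key = env.get("AIRWEAVE_API_KEY")
    if not api_key:
        raise RuntimeError("AIRWEAVE_API_KEY is not set")
    # Default to the cloud endpoint; self-hosted setups override it,
    # for example with http://localhost:8001.
    base_url = env.get("AIRWEAVE_BASE_URL", "https://api.airweave.ai")
    return {"api_key": api_key, "base_url": base_url}


# Usage (requires the SDK to be installed):
# from airweave import AirweaveSDK
# client = AirweaveSDK(**airweave_client_kwargs())
```

Failing early on a missing key tends to give a clearer error than an authentication failure on the first request.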
2. Add Source Connections
Connect data sources to your collection. More than 40 sources are supported, including:
- Productivity: Notion, Google Drive/Docs/Slides, Dropbox, OneDrive, SharePoint, Box, Airtable
- Communication: Slack, Gmail, Outlook, Teams, Google Calendar
- Project Management: Jira, Linear, Asana, Trello, Monday, ClickUp, Todoist
- Development: GitHub, GitLab, Bitbucket, Confluence
- CRM & Sales: Salesforce, HubSpot, Attio, Zendesk, Pipedrive, Shopify
- Data: Stripe, PostgreSQL
See SDK-REFERENCE.md for the complete list of source short names.
Python (API Key sources like Stripe, Linear):
source = client.source_connections.create(
    name="My Stripe Connection",
    short_name="stripe",
    readable_collection_id=collection.readable_id,
    authentication={
        "credentials": {"api_key": "sk_live_your_stripe_key"}
    },
)
OAuth sources (Slack, Google, Microsoft, etc.):
Most sources use OAuth for authentication. Use the Airweave UI at https://app.airweave.ai to connect these sources—it handles the OAuth flow automatically.
3. Search Your Data
Once synced, search across all connected sources with a single query:
Python:
results = client.collections.search(
    readable_id=collection.readable_id,
    query="customer feedback about pricing",
)
for result in results.results:
    print(f"Source: {result['payload']['source_name']}")
    print(f"Content: {result['payload']['md_content'][:200]}...")
    print(f"Score: {result['score']}")
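Indexing nested keys directly raises `KeyError` when a field is missing from a result. A more defensive variant might look like the sketch below. It assumes each result is a dict shaped like the ones in the loop above (`payload.source_name`, `payload.md_content`, `score`); the helper name `summarize_result` is hypothetical.

```python
def summarize_result(result, max_chars=200):
    """Return a small, display-ready summary of one search result.

    Assumes `result` is a dict with an optional "payload" dict and "score".
    """
    payload = result.get("payload") or {}
    content = payload.get("md_content") or ""
    # Only add an ellipsis when the content was actually truncated.
    snippet = content[:max_chars] + ("..." if len(content) > max_chars else "")
    return {
        "source": payload.get("source_name", "unknown"),
        "snippet": snippet,
        "score": result.get("score"),
    }


# Usage with the search call above:
# for result in results.results:
#     print(summarize_result(result))
```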
Advanced Search Features
Airweave provides powerful search capabilities:
Search Parameters
| Parameter | Type | Description |
|---|---|---|
| query | string | Natural language search query |
| search_type | "semantic" or "hybrid" | Semantic (default) or hybrid search |
| limit | number | Max results (default: 100) |
| offset | number | Skip results for pagination |
| recency_bias | 0-1 | Prioritize recent results (0=none, 1=most recent) |
| enable_reranking | boolean | AI reranking for better relevance |
| enable_query_expansion | boolean | Expand query with variations |
| response_type | "raw" or "completion" | Raw results or AI-generated answer |
| top_k | number | Internal retrieval count before reranking |
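The constraints in the table can be checked on the client side before a request is sent. The sketch below is one possible way to do that and is not part of the SDK: `check_search_params` is a hypothetical helper, and the allowed values are taken only from the table above, so the server may accept or reject more than this.

```python
ALLOWED_SEARCH_TYPES = {"semantic", "hybrid"}
ALLOWED_RESPONSE_TYPES = {"raw", "completion"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_search_params(params):
    """Return a list of problems found in a dict of search parameters."""
    problems = []
    query = params.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")
    if "search_type" in params and params["search_type"] not in ALLOWED_SEARCH_TYPES:
        problems.append("search_type must be 'semantic' or 'hybrid'")
    if "response_type" in params and params["response_type"] not in ALLOWED_RESPONSE_TYPES:
        problems.append("response_type must be 'raw' or 'completion'")
    if "recency_bias" in params:
        bias = params["recency_bias"]
        is_number = isinstance(bias, (int, float)) and not isinstance(bias, bool)
        if not is_number or not 0 <= bias <= 1:
            problems.append("recency_bias must be a number between 0 and 1")
    for name in ("limit", "top_k"):
        if name in params and (not _is_int(params[name]) or params[name] < 1):
            problems.append(f"{name} must be a positive integer")
    if "offset" in params and (not _is_int(params["offset"]) or params["offset"] < 0):
        problems.append("offset must be a non-negative integer")
    return problems
```

An empty list means the parameters match the documented types and ranges; anything else can be logged or raised before the API call.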
Example: Advanced Search
results = client.collections.search(
    readable_id=collection.readable_id,
    query="technical documentation",
    search_type="hybrid",
    enable_query_expansion=True,
    enable_reranking=True,
    recency_bias=0.5,
    top_k=50,
    limit=20,
)
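The `limit` and `offset` parameters can be combined to page through a large result set. The generator below is a sketch of that pattern under stated assumptions: `iter_all_results` and `fetch_page` are hypothetical names, and the caller supplies `fetch_page` as a thin wrapper around the search call that returns a list of results for one page.

```python
def iter_all_results(fetch_page, page_size=20, max_results=None):
    """Yield results page by page using limit/offset pagination.

    `fetch_page(limit=..., offset=...)` must return a list (possibly empty).
    """
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            return
        for item in page:
            yield item
            yielded += 1
            if max_results is not None and yielded >= max_results:
                return
        # A short page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += len(page)


# Usage (assumes the search call returns an object with a `.results` list,
# as in the basic search example):
# def fetch_page(limit, offset):
#     return client.collections.search(
#         readable_id=collection.readable_id,
#         query="technical documentation",
#         limit=limit,
#         offset=offset,
#     ).results
# for result in iter_all_results(fetch_page, page_size=20, max_results=100):
#     ...
```

Setting `max_results` keeps a broad query from walking the entire collection.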
Example: AI-Generated Answer
answer = client.collections.search(
    readable_id=collection.readable_id,
    query="What are our customer refund policies?",
    response_type="completion",
    enable_reranking=True,
)
# Returns a synthesized answer instead of raw results
MCP Integration for AI Agents
Airweave exposes search via MCP (Model Context Protocol) for seamless AI agent integration.
Setup for Claude Desktop / Cursor
Add to your MCP configuration (for Cursor, ~/.cursor/mcp.json; Claude Desktop uses its own claude_desktop_config.json):
{
  "mcpServers": {
    "airweave-search": {
      "command": "npx",
      "args": ["airweave-mcp-search"],
      "env": {
        "AIRWEAVE_API_KEY": "your-api-key",
        "AIRWEAVE_COLLECTION": "your-collection-id",
        "AIRWEAVE_BASE_URL": "https://api.airweave.ai"
      }
    }
  }
}
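If the MCP configuration file already lists other servers, the Airweave entry should be merged in rather than replacing the whole file. The sketch below shows one way to do that in Python. The function name `add_airweave_server` is hypothetical; the entry it writes is copied from the configuration above.

```python
import json


def add_airweave_server(config, api_key, collection_id,
                        base_url="https://api.airweave.ai"):
    """Return a copy of an MCP config dict with the airweave-search entry added.

    Existing servers are preserved; the input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["airweave-search"] = {
        "command": "npx",
        "args": ["airweave-mcp-search"],
        "env": {
            "AIRWEAVE_API_KEY": api_key,
            "AIRWEAVE_COLLECTION": collection_id,
            "AIRWEAVE_BASE_URL": base_url,
        },
    }
    merged["mcpServers"] = servers
    return merged


# Usage: load the existing file, merge, and print the result for review.
# with open("mcp.json") as f:
#     current = json.load(f)
# print(json.dumps(add_airweave_server(current, "your-api-key", "your-collection-id"), indent=2))
```

Printing the merged config before writing it back makes it easy to confirm that no other server entries were lost.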
Hosted MCP Server
For cloud-based AI platforms like OpenAI Agent Builder:
- URL: https://mcp.airweave.ai
- Uses Streamable HTTP transport (MCP 2025-03-26)
See MCP-SETUP.md for detailed configuration.
Common Patterns
Pattern 1: Search with Source Filtering
from airweave import SearchRequest, Filter, FieldCondition, MatchAny

search_request = SearchRequest(
    query="project updates",
    filter=Filter(
        must=[
            FieldCondition(
                key="source_name",
                match=MatchAny(any=["Slack", "GitHub"]),
            )
        ]
    ),
)
results = client.collections.search_advanced(
    readable_id=collection.readable_id,
    search_request=search_request,
)
Pattern 2: Recent Documents First
results = client.collections.search(
    readable_id=collection.readable_id,
    query="critical bugs",
    recency_bias=0.8,  # Strongly prefer recent
    limit=10,
)
Pattern 3: High-Quality Results with Reranking
results = client.collections.search(
    readable_id=collection.readable_id,
    query="API documentation",
    enable_reranking=True,
    top_k=30,
    limit=10,
)
Troubleshooting
No Results Found
- Check that sync has completed (can take a few minutes for large sources)
- Verify the collection ID is correct
- Try a broader search query
Authentication Errors
- Verify your API key is valid
- Check that the API key has access to the collection
- For OAuth sources, the token may have expired—reconnect in the UI
Rate Limits
- The API enforces rate limits to protect the service
- Implement exponential backoff for retries
- Contact support for higher limits
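Exponential backoff can be sketched as a small wrapper around any SDK call. This is a generic illustration, not the SDK's own retry mechanism: `call_with_backoff` is a hypothetical helper, and because this guide does not name the exception the SDK raises on a rate-limit response, the caller passes the exception types to retry on.

```python
import random
import time


def call_with_backoff(func, *, retries=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff and jitter on failure.

    The delay doubles on each attempt (capped at max_delay), plus up to 50%
    random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # Out of retries: surface the last error.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage (replace Exception with the SDK's rate-limit error once known):
# results = call_with_backoff(
#     lambda: client.collections.search(
#         readable_id=collection.readable_id,
#         query="critical bugs",
#     ),
#     retry_on=(Exception,),
# )
```

Narrowing `retry_on` to rate-limit and transient network errors avoids retrying requests that can never succeed, such as those with an invalid API key.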
Additional Resources
- Documentation: https://docs.airweave.ai
- API Reference: https://api.airweave.ai/docs (Swagger)
- GitHub: https://github.com/airweave-ai/airweave
- Discord: https://discord.gg/484HY9Ehxt
For detailed SDK reference, see SDK-REFERENCE.md. For advanced search patterns, see SEARCH-PATTERNS.md.
Source
https://github.com/airweave-ai/claude-plugin/blob/main/skills/airweave-setup/SKILL.md
Overview
Airweave is an open-source platform that makes any app searchable for AI agents by turning connected data sources into a unified knowledge base. It supports cloud and self-hosted deployments and provides SDKs for Python and TypeScript to build searchable collections. This skill guides developers through installing Airweave, creating collections, connecting sources, and configuring the SDK integration in their apps.
How This Skill Works
Install the Airweave SDK (cloud via API key or self-hosted) and create a collection to group data sources. Connect sources (40+ options) to that collection, using API keys or OAuth flows managed in the UI. Use the SDK to search across all connected sources with configurable parameters (semantic or hybrid search, recency, reranking, etc.) to retrieve unified results.
When to Use It
- You want to install Airweave in your application and start an integration quickly (cloud or self-hosted).
- You need to create a single, centralized collection that groups multiple data sources into one searchable endpoint.
- You want to connect 40+ sources (Notion, Google Drive, Slack, Jira, GitHub, Stripe, etc.) to enable cross-source search.
- You aim to perform searches across all connected data with a single query using semantic or hybrid search modes.
- You plan to configure MCP servers or customize deployment (cloud signup vs. self-hosted) and set up the SDK for your app.
Quick Start
- Step 1: Decide Cloud (recommended) or Self-Hosted, then sign up at the Airweave dashboard or clone the repo if self-hosting.
- Step 2: Install the SDK: Python users run 'pip install airweave-sdk'; TypeScript users run 'npm install @airweave/sdk'.
- Step 3: Create a collection and add source connections, then start querying across connected sources.
Best Practices
- Create a dedicated collection per app or knowledge base to keep data organized and manageable.
- Prefer Airweave Cloud for fastest setup; consider self-hosting when data sovereignty is required.
- Use OAuth for source connections where available to simplify authentication and renewal.
- Regularly review and refresh source connections as data sources update or expire credentials.
- Leverage search parameters (search_type, limit, recency_bias, enable_reranking) to tune relevance and performance.
Example Use Cases
- A product team connects Notion, Google Drive, and GitHub into one collection to answer customer- and product-related questions with AI agents.
- An ops team uses OAuth-connected Slack and Gmail sources to enable AI-assisted retrieval across chats and emails.
- A developer sets up Airweave Cloud, creates a 'Knowledge Base' collection, and uses the Python SDK to run cross-source searches.
- A data-heavy organization runs a self-hosted Airweave instance to consolidate Stripe and PostgreSQL data for internal AI search.
- A project management unit links Jira, Trello, and Confluence to provide AI-powered project status and history queries.