chunking-strategy

npx machina-cli add skill giuseppe-trisciuoglio/developer-kit/chunking-strategy --openclaw

Chunking Strategy for RAG Systems

Overview

Implement optimal chunking strategies for Retrieval-Augmented Generation (RAG) systems and document processing pipelines. This skill provides a comprehensive framework for breaking large documents into smaller, semantically meaningful segments that preserve context while enabling efficient retrieval and search.

When to Use

Use this skill when building RAG systems, optimizing vector search performance, implementing document processing pipelines, handling multi-modal content, or performance-tuning existing RAG systems with poor retrieval quality.

Instructions

Choose Chunking Strategy

Select appropriate chunking strategy based on document type and use case:

  1. Fixed-Size Chunking (Level 1)

    • Use for simple documents without clear structure
    • Start with 512 tokens and 10-20% overlap
    • Adjust size based on query type: 256 for factoid, 1024 for analytical
  2. Recursive Character Chunking (Level 2)

    • Use for documents with clear structural boundaries
    • Implement hierarchical separators: paragraphs → sentences → words
    • Customize separators for document types (HTML, Markdown)
  3. Structure-Aware Chunking (Level 3)

    • Use for structured documents (Markdown, code, tables, PDFs)
    • Preserve semantic units: functions, sections, table blocks
    • Validate structure preservation post-splitting
  4. Semantic Chunking (Level 4)

    • Use for complex documents with thematic shifts
    • Implement embedding-based boundary detection
    • Configure similarity threshold (0.8) and buffer size (3-5 sentences)
  5. Advanced Methods (Level 5)

    • Use Late Chunking for long-context embedding models
    • Apply Contextual Retrieval for high-precision requirements
    • Monitor computational costs vs. retrieval improvements
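The Level 2 recursive approach above can be sketched in plain Python (an illustrative minimal version; in practice a library splitter such as LangChain's RecursiveCharacterTextSplitter handles this, and the separator list here is an assumption you would customize per document type):

```python
def recursive_chunk(text, max_len=512, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator first, recursing into oversized pieces."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for i, part in enumerate(parts):
                piece = part + (sep if i < len(parts) - 1 else "")
                if current and len(current) + len(piece) > max_len:
                    chunks.append(current.strip())
                    current = ""
                current += piece
            if current.strip():
                chunks.append(current.strip())
            # Recurse into any piece that is still too long
            return [c for chunk in chunks for c in recursive_chunk(chunk, max_len, separators)]
    # No separator matched: fall back to a hard split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

This mirrors the paragraphs → sentences → words hierarchy: each level is tried in order, and only pieces that remain too long fall through to the next, finer separator.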

Reference detailed strategy implementations in references/strategies.md.

Implement Chunking Pipeline

Follow these steps to implement effective chunking:

  1. Pre-process documents

    • Analyze document structure and content types
    • Identify multi-modal content (tables, images, code)
    • Assess information density and complexity
  2. Select strategy parameters

    • Choose chunk size based on embedding model context window
    • Set overlap percentage (10-20% for most cases)
    • Configure strategy-specific parameters
  3. Process and validate

    • Apply chosen chunking strategy
    • Validate semantic coherence of chunks
    • Test with representative documents
  4. Evaluate and iterate

    • Measure retrieval precision and recall
    • Monitor processing latency and resource usage
    • Optimize based on specific use case requirements
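Steps 1-4 can be wired together as a small pipeline. A minimal sketch, where `chunk_fn` stands in for whichever strategy was selected and the length check is a deliberately crude stand-in for real coherence validation:

```python
def chunking_pipeline(documents, chunk_fn, min_len=50):
    """Apply a chunking strategy, then drop fragments too short to stand alone."""
    chunks = []
    for doc in documents:
        for chunk in chunk_fn(doc):
            chunk = chunk.strip()
            if len(chunk) >= min_len:  # crude coherence check
                chunks.append(chunk)
    return chunks
```

In a real system the validation step would also check semantic coherence (step 3) and feed retrieval metrics back into parameter choices (step 4).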

Reference detailed implementation guidelines in references/implementation.md.

Evaluate Performance

Use these metrics to evaluate chunking effectiveness:

  • Retrieval Precision: Fraction of retrieved chunks that are relevant
  • Retrieval Recall: Fraction of relevant chunks that are retrieved
  • End-to-End Accuracy: Quality of final RAG responses
  • Processing Time: Latency impact on overall system
  • Resource Usage: Memory and computational costs
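The first two metrics can be computed per query from sets of chunk ids; a minimal sketch:

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision and recall for one query, given chunk ids."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these over a representative query set gives the aggregate numbers to track across chunking-strategy changes.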

Reference detailed evaluation framework in references/evaluation.md.

Examples

Basic Fixed-Size Chunking

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Small chunks for factoid queries (overlap ~10% of chunk_size)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=25,
    length_function=len
)

chunks = splitter.split_documents(documents)

Structure-Aware Code Chunking

def chunk_python_code(code):
    """Split Python code into semantic chunks (top-level functions and classes)."""
    import ast

    tree = ast.parse(code)
    chunks = []

    # Iterate top-level nodes only; ast.walk would also emit nested
    # definitions, duplicating methods already inside their class chunk
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(ast.get_source_segment(code, node))

    return chunks

Semantic Chunking with Embeddings

def semantic_chunk(text, similarity_threshold=0.8):
    """Chunk text at semantic boundaries.

    Assumes helpers: split_into_sentences, generate_embeddings,
    and cosine_similarity.
    """
    sentences = split_into_sentences(text)
    if not sentences:
        return []
    embeddings = generate_embeddings(sentences)

    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i - 1], embeddings[i])

        # A drop in similarity marks a topic boundary
        if similarity < similarity_threshold:
            chunks.append(" ".join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])

    chunks.append(" ".join(current_chunk))
    return chunks
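The semantic-chunking example leans on undefined helpers. Two of them can be sketched with the standard library and NumPy; `generate_embeddings` is model-specific and left out, and the sentence splitter here is deliberately naive (production code would use a proper sentence tokenizer):

```python
import re

import numpy as np


def split_into_sentences(text):
    """Naive splitter: break after terminal punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```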

Best Practices

Core Principles

  • Balance context preservation with retrieval precision
  • Maintain semantic coherence within chunks
  • Optimize for embedding model constraints
  • Preserve document structure when beneficial

Implementation Guidelines

  • Start simple with fixed-size chunking (512 tokens, 10-20% overlap)
  • Test thoroughly with representative documents
  • Monitor both accuracy metrics and computational costs
  • Iterate based on specific document characteristics

Common Pitfalls to Avoid

  • Over-chunking: Creating too many small, context-poor chunks
  • Under-chunking: Missing relevant information due to oversized chunks
  • Ignoring document structure and semantic boundaries
  • Using one-size-fits-all approach for diverse content types
  • Neglecting overlap for boundary-crossing information

Constraints and Warnings

Resource Considerations

  • Semantic and contextual methods require significant computational resources
  • Late chunking needs long-context embedding models
  • Complex strategies increase processing latency
  • Monitor memory usage for large document processing

Quality Requirements

  • Validate chunk semantic coherence post-processing
  • Test with domain-specific documents before deployment
  • Ensure chunks maintain standalone meaning where possible
  • Implement proper error handling for edge cases

References

Reference detailed documentation in the references/ folder: strategies.md, implementation.md, and evaluation.md.

Source

git clone https://github.com/giuseppe-trisciuoglio/developer-kit
Skill file: plugins/developer-kit-ai/skills/chunking-strategy/SKILL.md

Overview

Provides a practical framework to break large documents into semantically meaningful segments for embeddings and retrieval. It covers multiple chunking strategies (Fixed-Size, Recursive, Structure-Aware, Semantic, Advanced) and a pipeline to select, implement, and evaluate chunks to boost RAG performance.

How This Skill Works

Analyzes document structure, selects an appropriate chunking strategy, configures parameters like chunk size and overlap, and processes content while validating semantic coherence. It emphasizes embedding-aware boundaries, multi-modal content handling, and ongoing evaluation of retrieval metrics and resource costs.

When to Use It

  • Building RAG systems for large documents
  • Optimizing vector search performance in databases
  • Implementing document processing pipelines with multi-modal content
  • Handling structured documents (Markdown, code, tables, PDFs)
  • Performance-tuning RAG systems with suboptimal retrieval quality

Quick Start

  1. Pre-process documents to identify structure, content types, and multi-modal elements
  2. Select a chunking strategy and configure parameters (size, overlap, and strategy-specific settings)
  3. Process, validate semantic coherence, and evaluate with representative documents

Best Practices

  • Pre-process documents to identify structure, content types, and multi-modal elements
  • Choose chunk size based on the embedding model's context window and adjust overlap (commonly 10-20%)
  • Select strategy parameters aligned with document type (fixed-size, recursive, structure-aware, semantic, or advanced)
  • Validate semantic coherence of chunks and test with representative documents
  • Measure retrieval precision/recall, end-to-end accuracy, and monitor latency and resource use

Example Use Cases

  • Basic Fixed-Size Chunking: start with 512 tokens and 10-20% overlap, adjust to 256 for factoid or 1024 for analytical queries
  • Recursive Character Chunking for documents with clear structure by layering: paragraphs → sentences → words with custom separators
  • Structure-Aware Chunking for Markdown/code/PDFs to preserve functions, sections, and table blocks while validating structure preservation
  • Semantic Chunking with embedding-based boundary detection and a similarity threshold around 0.8, plus a 3–5 sentence buffer
  • Advanced Methods like Late Chunking for long-context models and Contextual Retrieval for high-precision needs
