
semantic-search

npx machina-cli add skill sagarmk/beacon-plugin/semantic-search --openclaw

Hybrid Code Search (Beacon)

This repo has a Beacon hybrid search index combining semantic embeddings, BM25 keyword matching, and identifier boosting. Beacon is enforced as the default search — grep is automatically intercepted and redirected to Beacon for queries it handles better.

How to search

node ${CLAUDE_PLUGIN_ROOT}/scripts/search.js "<query>"

Options

  • --top-k N — number of results (default: 10)
  • --threshold F — minimum score cutoff (default: 0.35)
  • --no-hybrid — disable hybrid, use pure vector search only

Multi-query batch

node ${CLAUDE_PLUGIN_ROOT}/scripts/search.js "auth flow" "session handling" "token refresh"

All queries are sent in a single HTTP round-trip, and the results are returned grouped.
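As a rough sketch of how the options and the multi-query form fit together, the helper below assembles an argument list for search.js. The flags come from the options list above; the function name, the option object, and the choice to place flags after the queries are assumptions made for illustration, not documented behavior.

```javascript
// Hypothetical helper: builds the argv for Beacon's search.js.
// Only the flag names (--top-k, --threshold, --no-hybrid) come from the docs above.
function buildSearchArgs(queries, options = {}, root = process.env.CLAUDE_PLUGIN_ROOT ?? ".") {
  if (!Array.isArray(queries) || queries.length === 0) {
    throw new Error("at least one query is required");
  }
  const { topK, threshold, hybrid = true } = options;
  // One or more queries; several queries form a multi-query batch.
  const args = [`${root}/scripts/search.js`, ...queries];
  if (topK !== undefined) args.push("--top-k", String(topK));
  if (threshold !== undefined) args.push("--threshold", String(threshold));
  if (!hybrid) args.push("--no-hybrid"); // pure vector search
  return args;
}

const batchArgs = buildSearchArgs(
  ["auth flow", "session handling", "token refresh"],
  { topK: 5, threshold: 0.4 },
  "/path/to/plugin" // placeholder root for the example
);
console.log(batchArgs.join(" "));
```

The resulting array can be handed to Node's `child_process.execFileSync("node", args, { encoding: "utf8" })`, which avoids shell quoting issues for queries that contain spaces.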

Output

JSON array of matches, each with:

  • file — file path
  • lines — line range (e.g. "45-78")
  • similarity — vector cosine similarity
  • score — final hybrid score (when hybrid enabled)
  • preview — first 300 chars of matched chunk
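To illustrate how these fields might be consumed, here is a minimal sketch that orders matches by the hybrid score and falls back to similarity when score is absent (as it would be with --no-hybrid). The sample records, the function name, and the client-side cutoff are invented for the example; only the field names come from the list above.

```javascript
// Hypothetical post-processing of Beacon's JSON output.
// `score` is only present when hybrid search is enabled, so fall back to `similarity`.
function rankMatches(matches, minScore = 0.35) {
  return matches
    .map((m) => ({ ...m, rank: m.score ?? m.similarity }))
    .filter((m) => m.rank >= minScore)
    .sort((a, b) => b.rank - a.rank);
}

// Invented sample data shaped like the documented output.
const sampleMatches = [
  { file: "src/auth/session.js", lines: "45-78", similarity: 0.62, score: 0.71, preview: "..." },
  { file: "src/auth/token.js", lines: "10-32", similarity: 0.58, score: 0.8, preview: "..." },
  { file: "src/util/log.js", lines: "1-9", similarity: 0.3, preview: "..." },
];

for (const m of rankMatches(sampleMatches, 0.5)) {
  console.log(`${m.file}:${m.lines} (${m.rank})`);
}
```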

Grep intercept behavior

Grep is denied and redirected to Beacon unless one of these conditions is met:

| Grep passes through when... | Example |
|---|---|
| Pattern has regex metacharacters | function\s+\w+ |
| Targets a specific file | path: "src/lib/db.js" |
| output_mode is "count" | Counting occurrences |
| Pattern is <= 3 characters | fs, db |
| Dotted identifier | fs.readFileSync, path.join |
| Path-like pattern (contains / or \) | src/components |
| output_mode is "content" | Viewing matching lines |
| Quoted string literal | "use strict", 'Content-Type' |
| Annotation/marker pattern | TODO, FIXME, @param, #pragma |
| URL-like pattern | http://, localhost:3000 |
| Beacon index is unhealthy | DB missing, empty, dimension mismatch |
| Intercept disabled via config | intercept.enabled: false |
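The conditions above can be approximated with a small classifier. This is an illustrative re-implementation, not Beacon's actual code: the function name, the shape of the `call` and `state` arguments, the regular expressions, and the order in which rules are checked are all assumptions. Several rules overlap (for example, a URL also contains a slash), so the first matching rule wins here.

```javascript
// Hypothetical sketch of the pass-through rules. Returns a reason string when
// grep would pass through, or null when the call would be redirected to Beacon.
const REGEX_META = /[\^$*+?()\[\]{}|]/; // "." and "\" are left to the dotted/path rules

function grepPassthroughReason(call, state = {}) {
  const { pattern = "", path = "", output_mode } = call;
  const { interceptEnabled = true, indexHealthy = true } = state;

  if (!interceptEnabled) return "intercept disabled via config";
  if (!indexHealthy) return "Beacon index is unhealthy";
  if (output_mode === "count" || output_mode === "content") {
    return `output_mode is "${output_mode}"`;
  }
  // Assumption: a path ending in a file extension targets a specific file.
  if (/\.[A-Za-z0-9]+$/.test(path)) return "targets a specific file";
  if (pattern.length <= 3) return "pattern is <= 3 characters";
  if (/^(["']).*\1$/.test(pattern)) return "quoted string literal";
  if (/^https?:\/\//.test(pattern) || /^[\w.-]+:\d+$/.test(pattern)) return "URL-like pattern";
  if (/^(TODO|FIXME)\b/.test(pattern) || /^[@#]\w+/.test(pattern)) return "annotation/marker pattern";
  if (REGEX_META.test(pattern)) return "regex metacharacters";
  if (/[\/\\]/.test(pattern)) return "path-like pattern";
  if (/^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+$/.test(pattern)) return "dotted identifier";
  return null; // natural-language style query: redirected to Beacon
}

console.log(grepPassthroughReason({ pattern: "user authentication flow" }));
console.log(grepPassthroughReason({ pattern: "fs.readFileSync" }));
```

Returning a reason rather than a boolean makes it easier to see which rule fired when a grep call is unexpectedly allowed or denied.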

To disable interception entirely, set intercept.enabled: false in .claude/beacon.json.
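For reference, the sketch below shows one way the setting could be applied to an already-parsed config object without discarding other keys. It assumes that the dotted name intercept.enabled maps to a nested `intercept` object in .claude/beacon.json; the helper name is hypothetical.

```javascript
// Hypothetical helper: returns a copy of the config with interception turned off.
// Assumes "intercept.enabled" corresponds to { "intercept": { "enabled": ... } }.
function withInterceptDisabled(config = {}) {
  return {
    ...config,
    intercept: { ...(config.intercept ?? {}), enabled: false },
  };
}

// Shape of the resulting JSON for an otherwise empty config:
console.log(JSON.stringify(withInterceptDisabled({}), null, 2));
// {
//   "intercept": {
//     "enabled": false
//   }
// }
```

In practice the object would be read from and written back to .claude/beacon.json with `fs.readFileSync` and `fs.writeFileSync`.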

Workflow

  1. Search with Beacon → get candidate files + line ranges with scores
  2. Read top 2-3 files at the indicated line ranges for full context
  3. If needed, grep within those files for specifics (imports, call sites)
  4. Answer the user with file:line citations
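Steps 1, 2, and 4 can be sketched as a small helper that takes Beacon's matches, keeps the first few distinct files, and turns each line range into a file:line citation. It assumes the matches arrive best-first and that `lines` is either "start-end" or a single line number; the function name and sample data are invented for illustration.

```javascript
// Hypothetical helper: picks the top distinct files and formats citations.
function topCitations(matches, maxFiles = 3) {
  const seen = new Set();
  const citations = [];
  for (const m of matches) {
    if (seen.has(m.file)) continue; // keep only the best chunk per file
    seen.add(m.file);
    // "45-78" -> start 45, end 78; a single number is treated as start = end.
    const [start, end = start] = String(m.lines).split("-").map(Number);
    const cite = start === end ? `${m.file}:${start}` : `${m.file}:${start}-${end}`;
    citations.push({ file: m.file, start, end, cite });
    if (citations.length === maxFiles) break;
  }
  return citations;
}

// Invented sample results, assumed to be sorted best-first.
const workflowMatches = [
  { file: "src/auth/token.js", lines: "10-32" },
  { file: "src/auth/token.js", lines: "80-95" },
  { file: "src/auth/session.js", lines: "45-78" },
  { file: "src/api/client.js", lines: "7" },
  { file: "src/util/log.js", lines: "1-9" },
];

for (const c of topCitations(workflowMatches)) {
  console.log(c.cite);
}
```

The `start` and `end` values give the range to read in step 2, and `cite` is the string to quote in the final answer.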

Source

https://github.com/sagarmk/beacon-plugin/blob/main/skills/semantic-search/SKILL.md

Overview

Beacon’s Hybrid Code Search blends semantic embeddings with BM25 keyword matching and identifier boosting to surface relevant code across files. It is enforced as the default search when the Beacon index is healthy, intercepting grep requests and routing them to Beacon.

How This Skill Works

The hybrid index combines semantic embeddings, BM25 keyword scoring, and identifier boosting to rank code results. Queries are executed via node ${CLAUDE_PLUGIN_ROOT}/scripts/search.js with options for top-k, threshold, and an optional pure vector mode (--no-hybrid). Results are returned as a JSON array where each item includes file, lines, similarity, score, and preview.

When to Use It

  • You want to find semantically similar code or usage examples across a codebase
  • You need exact keywords or identifiers to influence ranking through BM25 matching and identifier boosting
  • You want Beacon to intercept grep and handle the queries it supports
  • You want to batch multiple queries in one HTTP round trip
  • You need results presented as file path, line range, similarity, score, and a preview

Quick Start

  1. Run a query with the search script using a natural language or code term
  2. Optionally tune results with --top-k and --threshold, or disable hybrid with --no-hybrid
  3. Read the top 2-3 files at the indicated lines for full context and refine your query if needed

Best Practices

  • Start with natural language queries to leverage semantic matching
  • Include specific function names, identifiers, or paths to boost BM25 relevance
  • Tune --top-k and --threshold to balance precision and recall
  • Review the top 2-3 files and their indicated line ranges for full context
  • Use multi-query batches to fetch several queries efficiently

Example Use Cases

  • Find debounce function usage across the repo
  • Trace the auth flow involving session handling and token refresh
  • Locate getUserToken usage in client and server code
  • Search for HTTP request construction in network utilities
  • Identify path.join usage in utility modules
