ChunkHound: Fast Local-First Code Semantic Search and Code Intelligence

ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed cAST algorithm with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.

Overview

ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches. It combines semantic search, regex search, and code research, with an emphasis on local-first privacy and offline operation. It integrates with MCP clients through the Model Context Protocol (spec: https://spec.modelcontextprotocol.io/) and is intended for large monorepos, security-sensitive codebases, and air-gapped environments.

Key features

cAST Algorithm (research paper: https://arxiv.org/pdf/2506.15655) for semantic code chunking
Multi-Hop Semantic Search to discover indirect, meaningful links between code elements (details at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
Semantic search for natural-language queries and regex search that operates without API keys
Local-first design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
MCP integration enabling interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
Support for 30 languages via Tree-sitter and custom parsers, including Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, and Kotlin, plus configuration formats such as JSON, YAML, and TOML, and text documents including PDF
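
The multi-hop idea in the feature list can be illustrated with a toy example. The sketch below is not ChunkHound's implementation: the function names, the hand-written three-dimensional vectors, and the fixed hop count are all assumptions made for illustration. It starts from the nearest neighbors of the query, expands outward from each result, and then re-ranks everything against the original query.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(vector, index, k, exclude=()):
    """Names of the k indexed chunks most similar to `vector`."""
    scored = [
        (cosine(vector, vec), name)
        for name, vec in index.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def multi_hop_search(query_vec, index, k=2, hops=2):
    """Hypothetical multi-hop search: expand from results, then re-rank."""
    found = top_k(query_vec, index, k)      # hop 1: direct matches
    frontier = list(found)
    for _ in range(hops - 1):               # later hops: neighbors of results
        next_frontier = []
        for name in frontier:
            for neighbor in top_k(index[name], index, k, exclude=found):
                found.append(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
    # Re-rank every discovered chunk against the original query.
    return sorted(found, key=lambda n: cosine(query_vec, index[n]), reverse=True)


# Toy index: made-up chunk names and vectors standing in for real embeddings.
INDEX = {
    "login_handler": [1.0, 0.1, 0.0],
    "session_store": [0.7, 0.7, 0.0],
    "token_cache": [0.1, 1.0, 0.2],
    "image_resize": [0.0, 0.0, 1.0],
}
QUERY = [1.0, 0.0, 0.0]
```

With k=1, a single hop returns only login_handler, while three hops also reach session_store and then token_cache, which is only weakly similar to the query itself. A practical system would also need a stopping rule so that expansion does not drift into unrelated code.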

How it works (brief)

ChunkHound indexes code locally and uses its cAST approach to split it into semantically meaningful chunks. The index powers semantic queries and supports multi-hop search, which reveals relationships beyond direct textual matches. For deeper code research, an LLM can optionally be connected; regex search remains available without any keys.
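
To make the idea of syntax-aware chunking concrete, here is a minimal sketch of a split-then-merge pass over an abstract syntax tree. It is an illustration under stated assumptions, not ChunkHound's code: it uses Python's standard ast module rather than Tree-sitter, handles Python source only, measures chunk size in non-whitespace characters, and the names chunk_source and max_size are hypothetical.

```python
import ast


def size(text):
    """Chunk size measured in non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def chunk_source(source, max_size=200):
    """Split Python source into chunks that follow statement boundaries."""
    return _chunk_nodes(ast.parse(source).body, source, max_size)


def _chunk_nodes(nodes, source, max_size):
    chunks, current = [], ""
    for node in nodes:
        text = ast.get_source_segment(source, node, padded=True) or ""
        if size(text) > max_size:
            # Node is too large: flush what we have, then split the node.
            if current:
                chunks.append(current)
                current = ""
            children = getattr(node, "body", None)
            if isinstance(children, list) and children:
                # Simplification: the enclosing header (e.g. the `class` line)
                # and any else/finally branches are dropped when descending.
                chunks.extend(_chunk_nodes(children, source, max_size))
            else:
                chunks.append(text)  # cannot be split further
        elif size(current) + size(text) <= max_size:
            # Merge small adjacent siblings into one chunk.
            current = f"{current}\n{text}" if current else text
        else:
            chunks.append(current)
            current = text
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''
import os

def small():
    return 1

class Big:
    def a(self):
        return "a" * 10

    def b(self):
        return "b" * 10
'''

CHUNKS = chunk_source(SAMPLE, max_size=30)
```

With a budget of 30 characters, the import and the small function are merged into one chunk, while the oversized class is split into one chunk per method. Keeping chunks aligned with syntactic units is the property that distinguishes this style from fixed-length line or token windows.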

Installation and quickstart

Install the uv package manager (https://docs.astral.sh/uv/) if needed.
Install ChunkHound by running: uv tool install chunkhound
Create a .chunkhound.json file in the project root to configure the embedding and LLM providers. Embedding providers include VoyageAI, OpenAI, and local options such as Ollama; Claude Code CLI or Codex CLI can drive LLM-based research. Embedding and LLM keys are optional if you only need regex search.
Index the repository with the command chunkhound index. Full configuration and IDE integration instructions are available in the docs.
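
The configuration step can be scripted. The sketch below writes a .chunkhound.json file from Python. The key names ("embedding", "llm", "provider", "api_key") and the provider identifiers shown in the usage comment are assumptions based on the description above, not a verified schema, so check them against the ChunkHound documentation before relying on them. The helper names build_config and write_config are hypothetical.

```python
import json
from pathlib import Path


def build_config(embedding_provider, api_key=None, llm_provider=None):
    """Build a config dict; key names are assumed, not taken from the docs."""
    config = {"embedding": {"provider": embedding_provider}}
    if api_key:
        config["embedding"]["api_key"] = api_key
    if llm_provider:
        config["llm"] = {"provider": llm_provider}
    return config


def write_config(project_root, config):
    """Write the config as .chunkhound.json in the given project root."""
    path = Path(project_root) / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


# Example usage (provider identifiers are assumptions):
#   cfg = build_config("voyageai", api_key="your-voyageai-key",
#                      llm_provider="claude-code-cli")
#   write_config(".", cfg)
```

Generating the file from a script keeps API keys out of version control when the key is read from the environment instead of being hard-coded.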

Requirements and integrations

Python 3.10+
API keys for embeddings and LLMs are optional (VoyageAI recommended); regex search works without keys
Works with MCP-enabled clients and tooling for a smoother developer experience

Ideal use cases

ChunkHound is positioned for:

Large, multi-language monorepos with cross-team dependencies
Security-sensitive projects requiring local-only indexing
Offline or air-gapped development environments requiring robust search and research capabilities

Project status and resources

The repository is MIT-licensed and active; the latest listed release is v4.0.1 (Nov 12, 2025). The codebase is predominantly Python (98.0%) and has a dozen contributors. Comprehensive guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.

For full technical details, see the project repository: https://github.com/chunkhound/chunkhound