ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed cAST algorithm with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.
Overview
ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches by combining semantic search, regex search, and code research with an emphasis on local-first privacy and offline operation. It integrates with the Model Context Protocol (MCP, spec: https://spec.modelcontextprotocol.io/) and is aimed at large monorepos, security-sensitive codebases, and air-gapped environments.
Key features
- cAST Algorithm (research paper: https://arxiv.org/pdf/2506.15655) for semantic code chunking; a sketch of the idea appears after this list
- Multi-Hop Semantic Search to discover indirect, meaningful links between code elements (details at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
- Semantic search for natural-language queries and regex search that operates without API keys
- Local-first design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
- MCP integration enabling interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
- Support for 30 languages via Tree-sitter and custom parsers, including Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, and Kotlin, plus configuration formats (JSON, YAML, TOML) and document formats such as PDF
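To make the cAST bullet above concrete, here is a minimal sketch of AST-aware split-then-merge chunking: nodes that exceed a size budget are split along their children, and small adjacent pieces are merged back into chunks. It uses Python's built-in ast module purely for illustration; ChunkHound itself parses its 30 languages with Tree-sitter and custom parsers, and its actual budget, size metric, and merge policy will differ, so treat every constant and helper here as an assumption.

```python
# A minimal, illustrative take on AST-aware "split-then-merge" chunking.
# Python's built-in ast module stands in for Tree-sitter, and the budget,
# size metric, and merge policy are assumptions -- not ChunkHound's internals.
import ast

MAX_CHARS = 1200  # assumed per-chunk size budget

def chunk_source(source: str) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines()

    def node_text(node: ast.AST) -> str:
        return "\n".join(lines[node.lineno - 1 : node.end_lineno])

    def split(node: ast.AST) -> list[str]:
        text = node_text(node)
        children = [c for c in ast.iter_child_nodes(node) if hasattr(c, "lineno")]
        # Keep a node whole if it fits the budget or has no splittable children.
        if len(text) <= MAX_CHARS or not children:
            return [text]
        pieces: list[str] = []
        for child in children:
            pieces.extend(split(child))
        return pieces

    # Split oversized top-level nodes, then greedily merge small neighbours
    # so each chunk stays a syntactically coherent block.
    pieces: list[str] = []
    for node in tree.body:
        pieces.extend(split(node))
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > MAX_CHARS:
            chunks.append(current)
            current = piece
        else:
            current = f"{current}\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks
```

The appeal of chunking along AST boundaries is that each chunk remains a syntactically coherent unit (a function, class, or statement group), which tends to embed and retrieve better than fixed-size line windows.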
How it works (brief)
ChunkHound indexes code locally and constructs semantically meaningful chunks using its cAST approach. Indexed data powers semantic queries and supports multi-hop reasoning to reveal relationships beyond direct textual matches. For deeper code research, LLMs can be connected optionally; regex search remains available without any keys.
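The multi-hop behaviour can be pictured as iterated retrieval: the results of one semantic query seed the next round, so code that shares no keywords with the original question can still surface. The sketch below is a conceptual illustration only, with an in-memory index and a stand-in embed() callable; ChunkHound's actual pipeline (storage, embedding providers, ranking) is described at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search.

```python
# Conceptual sketch of multi-hop semantic search over an in-memory index.
# embed() stands in for whatever embedding provider is configured; the hop
# count, fan-out, and scoring here are illustrative, not ChunkHound's values.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def multi_hop_search(query: str, index: dict[str, np.ndarray],
                     embed, hops: int = 2, k: int = 5) -> list[str]:
    """index maps chunk text -> embedding; embed maps text -> vector."""
    frontier = [embed(query)]
    seen: dict[str, float] = {}
    for _ in range(hops):
        next_frontier = []
        for query_vec in frontier:
            ranked = sorted(index.items(),
                            key=lambda item: cosine(query_vec, item[1]),
                            reverse=True)[:k]
            for chunk, chunk_vec in ranked:
                if chunk not in seen:
                    seen[chunk] = cosine(query_vec, chunk_vec)
                    # Newly discovered chunks seed the next hop, pulling in
                    # code that is only indirectly related to the question.
                    next_frontier.append(chunk_vec)
        frontier = next_frontier
    return sorted(seen, key=seen.get, reverse=True)
```

In the real tool the index lives in a local database and the hop depth, fan-out, and re-ranking are handled by the pipeline rather than hard-coded as here.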
Installation and quickstart
- Install the uv package manager (https://docs.astral.sh/uv/) if needed.
- Install ChunkHound with uv tool install chunkhound.
- Create a .chunkhound.json in the project root configuring the embedding and LLM providers. Listed options include VoyageAI, OpenAI, or local choices such as Ollama for embeddings, and Claude Code CLI or Codex CLI for LLM-driven research; embedding/LLM keys are optional for regex-only workflows. A hypothetical example appears after this list.
- Index the repository with chunkhound index.
Full configuration and IDE integration instructions are available in the docs.
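As a concrete starting point for the .chunkhound.json step above, the snippet below writes a hypothetical configuration from Python. The key names and provider identifiers ("embedding", "voyageai", "claude-code", and so on) are assumptions chosen for illustration, not the documented schema; consult https://chunkhound.github.io for the real configuration reference, and remember the file can be skipped entirely for regex-only use.

```python
# Writes a hypothetical .chunkhound.json; every key and provider name below is
# an assumption for illustration -- consult the ChunkHound docs for the real schema.
import json

config = {
    "embedding": {
        "provider": "voyageai",          # or "openai", or "ollama" for local embeddings
        "api_key": "YOUR_VOYAGEAI_KEY",  # optional: omit for keyless/local providers
    },
    "llm": {
        "provider": "claude-code",       # or a Codex CLI-backed provider for code research
    },
}

with open(".chunkhound.json", "w", encoding="utf-8") as handle:
    json.dump(config, handle, indent=2)
```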
Requirements and integrations
- Python 3.10+
- Optional API keys for embeddings/LLMs (VoyageAI recommended) — regex search works without keys
- Works with MCP-enabled clients and tooling for a smoother developer experience
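For MCP-enabled clients, integration typically means adding a server entry to the client's configuration file. The sketch below uses the common mcpServers shape seen in clients such as Claude Desktop and Cursor; the config path, and the command and arguments for launching ChunkHound's MCP server, are assumptions here, so confirm them against the ChunkHound docs and your client's MCP documentation.

```python
# Registers ChunkHound in a hypothetical MCP client config. The "mcpServers"
# shape is a common client convention; the command/args used to start
# ChunkHound's MCP server are assumptions -- verify them against the docs.
import json
from pathlib import Path

client_config = Path("mcp_client_config.json")  # placeholder path; varies per client
entry = {
    "mcpServers": {
        "chunkhound": {
            "command": "chunkhound",  # assumed executable name installed by uv tool install
            "args": ["mcp"],          # assumed subcommand for MCP server mode
        }
    }
}
client_config.write_text(json.dumps(entry, indent=2), encoding="utf-8")
```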
Ideal use cases
ChunkHound is positioned for:
- Large, multi-language monorepos with cross-team dependencies
- Security-sensitive projects requiring local-only indexing
- Offline or air-gapped development environments requiring robust search and research capabilities
Project status and resources
The repository is MIT licensed and actively maintained: the latest listed release is v4.0.1 (Nov 12, 2025), the codebase is predominantly Python (98.0%), and roughly a dozen contributors are credited. Comprehensive guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.
For full technical details and the project repository, see the original source: https://github.com/chunkhound/chunkhound


