ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed **cAST algorithm** with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.
## Overview

ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches by offering **semantic + regex + code research** capabilities with an emphasis on local-first privacy and offline operation. It integrates with the Model Context Protocol (spec: https://spec.modelcontextprotocol.io/) and is designed for large monorepos, security-sensitive codebases, and air-gapped environments.
## Key features

- **cAST Algorithm** (research paper: https://arxiv.org/pdf/2506.15655) for semantic code chunking
- **Multi-Hop Semantic Search** to discover indirect but meaningful links between code elements (details: https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
- **Semantic search** for natural-language queries and **regex search** that works without API keys
- **Local-first** design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
- **MCP integration** for interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
- Support for **30 languages** via Tree-sitter and custom parsers, including **Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, Kotlin**, plus configuration formats such as JSON, YAML, and TOML, and document formats including PDF
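Multi-hop search can be pictured as a breadth-first walk over an embedding-similarity graph: a chunk that is not similar enough to the query may still be reached through an intermediate chunk that is. The sketch below illustrates that idea only; the embedding vectors, the `0.85` threshold, and the chunk names are all made up, and this is not ChunkHound's implementation.

```python
import math
from collections import deque

# Toy embeddings for code chunks (hypothetical values, not real model output).
embeddings = {
    "parse_config":  [0.9, 0.1, 0.0],
    "load_settings": [0.8, 0.2, 0.1],   # very similar to parse_config
    "read_yaml":     [0.6, 0.5, 0.2],   # similar to load_settings, not to parse_config
    "render_html":   [0.0, 0.1, 0.9],   # unrelated to the others
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def multi_hop(start, threshold=0.85, max_hops=2):
    """Breadth-first expansion from `start` through similar neighbors,
    surfacing indirect matches a single similarity query would miss."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for other, vec in embeddings.items():
            if other not in seen and cosine(embeddings[node], vec) >= threshold:
                seen.add(other)
                frontier.append((other, hops + 1))
    return seen - {start}
```

With these toy vectors, `read_yaml` falls below the threshold against `parse_config` directly, but is reached on the second hop through `load_settings`, while the unrelated `render_html` is never pulled in.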
## How it works (brief)

ChunkHound indexes code locally and constructs semantically meaningful chunks using its cAST approach. The indexed data powers semantic queries and multi-hop reasoning that reveals relationships beyond direct textual matches. For deeper code research, an LLM can optionally be connected; regex search remains available without any keys.
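The core intuition behind AST-based chunking is that chunk boundaries should fall on syntactic units rather than arbitrary character offsets. The sketch below demonstrates that idea with Python's stdlib `ast` module: split at top-level definitions, then greedily merge small neighbors up to a size budget. It is a minimal illustration of the general technique, not ChunkHound's actual cAST algorithm (see the paper linked above for that).

```python
import ast

def chunk_source(source: str, max_chars: int = 200) -> list[str]:
    """Illustrative AST-aware chunking: each chunk is one or more whole
    top-level definitions, merged greedily under a character budget.
    (Not ChunkHound's cAST algorithm; a toy sketch of the idea.)"""
    tree = ast.parse(source)
    lines = source.splitlines()

    # One unit per top-level node, spanning its full line range.
    units = []
    for node in tree.body:
        start = node.lineno - 1        # lineno is 1-based
        end = node.end_lineno          # inclusive, 1-based
        units.append("\n".join(lines[start:end]))

    # Greedily merge adjacent units while staying under the budget.
    chunks, current = [], ""
    for unit in units:
        candidate = current + "\n" + unit if current else unit
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because boundaries follow the syntax tree, a function body is never split mid-definition, which keeps each chunk self-contained for embedding.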
## Installation and quickstart

- Install the uv package manager (https://docs.astral.sh/uv/) if needed.
- Install ChunkHound with `uv tool install chunkhound`.
- Create a `.chunkhound.json` in the project root configuring embeddings and an LLM provider. Example providers include **VoyageAI**, **OpenAI**, or local options such as **Ollama** for embeddings, and **Claude Code CLI** or **Codex CLI** for LLM-driven research. Embedding/LLM keys are optional for regex-only workflows.
- Index the repository with `chunkhound index`.

Full configuration and IDE integration instructions are available in the docs.
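To make the configuration step concrete, a `.chunkhound.json` might look roughly like the following. The field names here are an illustrative guess, not the documented schema; consult the configuration docs at https://chunkhound.github.io for the real keys, and omit the file entirely if you only need regex search.

```json
{
  "embedding": {
    "provider": "voyageai",
    "api_key": "YOUR_VOYAGEAI_KEY"
  },
  "llm": {
    "provider": "claude-code-cli"
  }
}
```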
## Requirements and integrations

- **Python 3.10+**
- Optional API keys for embeddings/LLMs (VoyageAI recommended); regex search works without keys
- Works with MCP-enabled clients and tooling
## Ideal use cases

ChunkHound is positioned for:

- Large, multi-language monorepos with cross-team dependencies
- Security-sensitive projects requiring local-only indexing
- Offline or air-gapped development environments that still need robust search and research capabilities
## Project status and resources

The repository is MIT-licensed and actively maintained; the latest listed release is **v4.0.1 (Nov 12, 2025)**. The codebase is predominantly Python (98.0%), with contributions from about a dozen contributors. Comprehensive guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.
For full technical details and the project repository, see the original source: https://github.com/chunkhound/chunkhound
