ChunkHound: Fast Local-First Code Semantic Search and Code Intelligence

ChunkHound indexes repositories locally using a research-backed cAST algorithm to create semantic chunks, offering semantic and regex search plus multi-hop code discovery. Optional LLM integrations and MCP support enable deep, private code research for large monorepos.


TL;DR

  • Core: local-first codebase intelligence that indexes repos, combining the **cAST algorithm** (https://arxiv.org/pdf/2506.15655) with semantic + regex search and optional LLM-driven code research
  • **Multi-Hop Semantic Search** (details: https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search) to surface indirect code relationships; MCP integration (spec: https://spec.modelcontextprotocol.io/) for interoperability with Claude, VS Code, Cursor, Windsurf, and Zed
  • Privacy/offline: local-only indexing for air-gapped or security-sensitive projects; regex search operates without API keys
  • Language support: 30 languages via Tree-sitter and custom parsers, incl. Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, Kotlin, plus JSON/YAML/TOML and PDFs
  • Install/quickstart: install uv (https://docs.astral.sh/uv/), run `uv tool install chunkhound`, add `.chunkhound.json` to configure embeddings/LLM providers (VoyageAI, OpenAI, Ollama, Claude Code CLI, Codex CLI); embeddings/LLM keys optional

ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed **cAST algorithm** with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.

Overview

ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches by offering **semantic + regex + code research** capabilities with an emphasis on local-first privacy and offline operation. It integrates with the Model Context Protocol (MCP; spec at https://spec.modelcontextprotocol.io/) and is aimed at large monorepos, security-sensitive codebases, and air-gapped environments.

Key features

- **cAST Algorithm** (research paper: https://arxiv.org/pdf/2506.15655) for semantic code chunking
- **Multi-Hop Semantic Search** to discover indirect, meaningful links between code elements (details at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
- **Semantic search** for natural-language queries and **regex search** that operates without API keys
- **Local-first** design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
- **MCP integration** enabling interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
- Support for **30 languages** via Tree-sitter and custom parsers, including **Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, Kotlin**, plus configuration formats like JSON, YAML, and TOML, and text-based files including PDF
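To make the multi-hop idea from the feature list concrete, here is a minimal Python sketch under stated assumptions. It is not ChunkHound's implementation: the three-dimensional toy vectors, the chunk names, and the `multi_hop_search` function with its `k`, `hops`, and `min_score` parameters are all hypothetical. The sketch starts from the best direct match, then repeatedly looks at the nearest neighbors of what it has already found and keeps those that are still relevant to the original query.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(vec, index, k, exclude=()):
    """Return the k (score, name) pairs most similar to vec, skipping excluded names."""
    scored = [(cosine(vec, v), name) for name, v in index.items() if name not in exclude]
    scored.sort(reverse=True)
    return scored[:k]


def multi_hop_search(query_vec, index, k=1, hops=2, min_score=0.25):
    """Hypothetical multi-hop search: expand from direct hits via their neighbors."""
    # Hop 0: ordinary nearest-neighbor search against the query.
    results = {name: score for score, name in top_k(query_vec, index, k)}
    frontier = list(results)
    for _ in range(hops):
        next_frontier = []
        for name in frontier:
            # Neighbors of an already-found chunk, not of the query itself.
            for _, neighbor in top_k(index[name], index, k, exclude=results):
                # Keep a neighbor only if it is still relevant to the original query.
                score = cosine(query_vec, index[neighbor])
                if score >= min_score and neighbor not in results:
                    results[neighbor] = score
                    next_frontier.append(neighbor)
        if not next_frontier:  # nothing new was discovered, so stop early
            break
        frontier = next_frontier
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Toy embedding index (made-up chunk names and vectors, for illustration only).
INDEX = {
    "auth.login": [1.0, 0.1, 0.0],
    "auth.session": [0.7, 0.7, 0.0],
    "db.token_store": [0.3, 0.9, 0.2],
    "ui.theme": [0.0, 0.0, 1.0],
}

for name, score in multi_hop_search([1.0, 0.0, 0.0], INDEX):
    print(f"{name}: {score:.3f}")
```

In this toy index a single-shot top-1 search would return only `auth.login`; the expansion step is what reaches `db.token_store` through `auth.session`, while the unrelated `ui.theme` is never pulled in.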

How it works (brief)

ChunkHound indexes code locally and constructs semantically meaningful chunks using its cAST approach. Indexed data powers semantic queries and supports multi-hop reasoning to reveal relationships beyond direct textual matches. For deeper code research, LLMs can be connected optionally; regex search remains available without any keys.
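The chunking step can be illustrated with a short Python sketch. This is a simplified take on the split-then-merge idea behind AST-based chunking, written against Python's standard `ast` module rather than Tree-sitter, and it should not be read as ChunkHound's code. The function names, the `max_size` budget, and the choice to measure size in non-whitespace characters are assumptions made for this example.

```python
import ast


def size(text: str) -> int:
    """Chunk size in non-whitespace characters (the budget used in this sketch)."""
    return sum(1 for ch in text if not ch.isspace())


def chunk_nodes(source: str, nodes: list, max_size: int) -> list:
    """Greedily merge sibling nodes; recurse into any node that exceeds the budget."""
    chunks, current, current_size = [], [], 0

    def flush():
        nonlocal current, current_size
        if current:
            chunks.append("\n".join(current))
        current, current_size = [], 0

    for node in nodes:
        text = ast.get_source_segment(source, node) or ""
        n = size(text)
        if n > max_size:
            # Too big for one chunk: close the open chunk, then split this node
            # into its child statements. Simplification: the node's own header
            # (for example the `def` line) is not carried into the sub-chunks.
            flush()
            children = [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]
            if children:
                chunks.extend(chunk_nodes(source, children, max_size))
            else:
                chunks.append(text)  # unsplittable leaf, kept whole
        else:
            if current_size + n > max_size:
                flush()  # adding this sibling would overflow the open chunk
            current.append(text)
            current_size += n
    flush()
    return chunks


def cast_chunks(source: str, max_size: int = 200) -> list:
    """Chunk Python source along syntax boundaries instead of fixed line counts."""
    return chunk_nodes(source, ast.parse(source).body, max_size)


SAMPLE = '''import os

def small():
    return 1

def large(x):
    total = 0
    for i in range(x):
        total += i * i
    print(total)
    return total
'''

for number, chunk in enumerate(cast_chunks(SAMPLE, max_size=40), 1):
    print(f"--- chunk {number} ({size(chunk)} non-whitespace chars) ---")
    print(chunk)
```

The point of the design is that chunk boundaries always fall between whole syntax nodes, so a small function is never cut in half and neighboring small definitions are grouped together.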

Installation and quickstart

- Install the uv package manager (https://docs.astral.sh/uv/) if needed.
- Install ChunkHound with `uv tool install chunkhound`.
- Create a `.chunkhound.json` in the project root configuring embeddings and LLM provider. Example providers include **VoyageAI**, **OpenAI**, or local options like **Ollama** for embeddings, and **Claude Code CLI** or **Codex CLI** for LLM-driven research. Note that embeddings/LLM keys are optional for regex search workflows.
- Index the repository with `chunkhound index`.

Full configuration and IDE integration instructions are available in the docs.
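As a rough illustration of the configuration step, the following Python sketch generates a `.chunkhound.json` file. The key names (`embedding`, `llm`, `provider`, `api_key`) and the provider identifiers shown in the usage note are assumptions made for this example and should be checked against the ChunkHound documentation before use; the helper functions `build_config` and `write_config` are hypothetical and not part of ChunkHound.

```python
import json
from pathlib import Path
from typing import Optional


def build_config(
    embedding_provider: Optional[str] = None,
    llm_provider: Optional[str] = None,
    api_key: Optional[str] = None,
) -> dict:
    """Build a config dict. Key names are ASSUMED, not confirmed by the article."""
    config: dict = {}
    if embedding_provider:
        config["embedding"] = {"provider": embedding_provider}
        if api_key:
            config["embedding"]["api_key"] = api_key
    if llm_provider:
        config["llm"] = {"provider": llm_provider}
    # An empty dict is a valid result here: the article notes that regex
    # search does not need embedding or LLM keys.
    return config


def write_config(project_root: str, config: dict) -> Path:
    """Write the config as .chunkhound.json in the given project root."""
    path = Path(project_root) / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

A call such as `write_config(".", build_config("voyageai", "claude-code-cli", api_key="..."))` would write the file to the current directory, after which `chunkhound index` can be run as described in the steps above. Keeping the API key out of version control is advisable whichever layout the real schema uses.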

Requirements and integrations

- **Python 3.10+**
- Optional API keys for embeddings/LLMs (VoyageAI recommended); regex search works without keys
- Works with MCP-enabled clients and tooling for a smoother developer experience

Ideal use cases

ChunkHound is positioned for:

- Large, multi-language monorepos with cross-team dependencies
- Security-sensitive projects requiring local-only indexing
- Offline or air-gapped development environments requiring robust search and research capabilities

Project status and resources

The repository is MIT licensed and actively maintained; the latest listed release is **v4.0.1 (Nov 12, 2025)**. The codebase is predominantly Python (98.0%) and has a dozen contributors. Comprehensive guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.

For full technical details and the project repository, see the original source: https://github.com/chunkhound/chunkhound
