ChunkHound: Fast Local-First Code Semantic Search and Code Intelligence

ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed **cAST algorithm** with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.

Overview

ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches. It combines semantic search, regex search, and code research, with an emphasis on local-first privacy and offline operation. It integrates with the Model Context Protocol (MCP; spec at https://spec.modelcontextprotocol.io/) and is aimed at large monorepos, security-sensitive codebases, and air-gapped environments.

Key features

- cAST algorithm for semantic code chunking (research paper: https://arxiv.org/pdf/2506.15655)
- Multi-hop semantic search to discover indirect, meaningful links between code elements (details at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
- Semantic search for natural-language queries, plus regex search that operates without API keys
- Local-first design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
- MCP integration enabling interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
- Support for 30 languages via Tree-sitter and custom parsers, including Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, and Kotlin, plus configuration formats such as JSON, YAML, and TOML, and text-based files including PDF
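To make the multi-hop idea concrete, here is a minimal sketch of how following similarity links outward from the first hits can surface code that shares no words with the query. It is not ChunkHound's implementation: the `embed` function is a toy word-count stand-in for a real embedding model, and the names `multi_hop_search`, `top_k`, `k`, and `hops` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, vectors, k, exclude=()):
    """Names of the k most similar chunks with non-zero similarity."""
    scored = [
        (cosine(query_vec, vec), name)
        for name, vec in vectors.items()
        if name not in exclude
    ]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def multi_hop_search(query, chunks, k=1, hops=2):
    """Start from direct hits, then follow similarity links for `hops` rounds."""
    vectors = {name: embed(text) for name, text in chunks.items()}
    found = top_k(embed(query), vectors, k)
    frontier = list(found)
    for _ in range(hops):
        next_frontier = []
        for name in frontier:
            # Use each hit as a new query; skip chunks already collected.
            for neighbor in top_k(vectors[name], vectors, k, exclude=set(found)):
                found.append(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return found
```

With `hops=0` this reduces to an ordinary similarity search. Each extra hop lets a hit act as a query of its own, which is how a chunk about salts can be reached from a query about logins by way of a password-hashing chunk.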

How it works (brief)

ChunkHound indexes code locally and constructs semantically meaningful chunks using its cAST approach. Indexed data powers semantic queries and supports multi-hop reasoning to reveal relationships beyond direct textual matches. For deeper code research, LLMs can be connected optionally; regex search remains available without any keys.
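As a rough illustration of what syntax-aware chunking means, the sketch below merges small sibling AST nodes into one chunk and splits oversized functions or classes along their bodies. It is a simplification written under stated assumptions, not the cAST algorithm as ChunkHound implements it: it uses Python's standard `ast` module rather than Tree-sitter, measures size in raw characters, drops comments and blank lines that sit between nodes, and the names `chunk_source`, `chunk_nodes`, and `max_chars` are hypothetical.

```python
import ast


def start_line(node):
    """First source line of a node, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def node_text(lines, node):
    """Source text covered by a node (whole lines, 1-based inclusive)."""
    return "\n".join(lines[start_line(node) - 1 : node.end_lineno])


def chunk_nodes(lines, nodes, max_chars):
    """Merge small sibling nodes; split oversized defs/classes via their bodies."""
    chunks, current = [], ""
    for node in nodes:
        text = node_text(lines, node)
        splittable = isinstance(
            node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
        )
        if len(text) > max_chars and splittable:
            # Flush what we have, keep the header line(s), recurse into the body.
            if current:
                chunks.append(current)
                current = ""
            first_child = start_line(node.body[0])
            header = "\n".join(lines[start_line(node) - 1 : first_child - 1])
            if header:
                chunks.append(header)
            chunks.extend(chunk_nodes(lines, node.body, max_chars))
        elif current and len(current) + 1 + len(text) > max_chars:
            # Adding this node would exceed the budget: start a new chunk.
            chunks.append(current)
            current = text
        else:
            current = f"{current}\n{text}" if current else text
    if current:
        chunks.append(current)
    return chunks


def chunk_source(source, max_chars=200):
    """Chunk Python source along AST boundaries (illustrative only)."""
    lines = source.splitlines()
    return chunk_nodes(lines, ast.parse(source).body, max_chars)
```

Chunk boundaries always fall between statements, so a function is never cut in half unless it is larger than the budget and has to be split along its own body. A node that is oversized but not a function or class is emitted whole in this sketch.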

Installation and quickstart

- Install the uv package manager (https://docs.astral.sh/uv/) if needed.
- Install ChunkHound with `uv tool install chunkhound`.
- Create a `.chunkhound.json` in the project root that configures the embedding and LLM providers. Example providers include VoyageAI, OpenAI, or local options such as Ollama for embeddings, and Claude Code CLI or Codex CLI for LLM-driven research. Embedding and LLM keys are optional for regex search workflows.
- Index the repository with `chunkhound index`.

Full configuration and IDE integration instructions are available in the docs.
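The configuration step above could be scripted along the following lines. This is a sketch under assumptions: the key names `embedding`, `llm`, `provider`, and `api_key`, and the helper `write_chunkhound_config` itself, are illustrative and not confirmed by this document, so check them against the ChunkHound docs before relying on them.

```python
import json
from pathlib import Path


def write_chunkhound_config(project_root, embedding_provider, llm_provider, api_key=None):
    """Write a .chunkhound.json file into project_root and return its path.

    ASSUMPTION: the key names below are illustrative; verify the real
    schema in the ChunkHound documentation.
    """
    config = {
        "embedding": {"provider": embedding_provider},
        "llm": {"provider": llm_provider},
    }
    if api_key:
        # Keys are optional; regex search is described as working without them.
        config["embedding"]["api_key"] = api_key
    path = Path(project_root) / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

A call such as `write_chunkhound_config(".", "voyageai", "claude-code-cli")` would write the file into the current directory; the provider strings shown here are likewise assumptions. Reading the key from an environment variable rather than hard-coding it keeps secrets out of version control.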

Requirements and integrations

- Python 3.10+
- Optional API keys for embeddings/LLMs (VoyageAI recommended); regex search works without keys
- Works with MCP-enabled clients and tooling

Ideal use cases

ChunkHound is positioned for:

- Large, multi-language monorepos with cross-team dependencies
- Security-sensitive projects requiring local-only indexing
- Offline or air-gapped development environments requiring robust search and research capabilities

Project status and resources

The repository is MIT licensed and active: the latest listed release is v4.0.1 (Nov 12, 2025). The codebase is predominantly Python (98.0%) and has about a dozen contributors. Guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.

For full technical details and the project repository, see the original source: https://github.com/chunkhound/chunkhound
