ChunkHound: Fast Local-First Code Semantic Search and Code Intelligence

ChunkHound indexes repositories locally using a research-backed cAST algorithm to create semantic chunks, offering semantic and regex search plus multi-hop code discovery. Optional LLM integrations and MCP support enable deep, private code research for large monorepos.


TL;DR

  • Core: local-first codebase intelligence that indexes repos locally, combining the cAST algorithm (https://arxiv.org/pdf/2506.15655) with semantic + regex search and optional LLM-driven code research
  • Multi-Hop Semantic Search to surface indirect code relationships; MCP integration (https://spec.modelcontextprotocol.io/) for interoperability with Claude, VS Code, Cursor, Windsurf, Zed (details: https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
  • Privacy/offline: local-only indexing for air-gapped or security-sensitive projects; regex search operates without API keys
  • Language support: 30 languages via Tree-sitter and custom parsers, incl. Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, Kotlin, plus JSON/YAML/TOML and PDFs
  • Install/quickstart: install uv (https://docs.astral.sh/uv/), run uv tool install chunkhound, add .chunkhound.json to configure embeddings/LLM providers (VoyageAI, OpenAI, Ollama, Claude Code CLI, Codex CLI); embeddings/LLM keys optional

ChunkHound is a local-first codebase intelligence tool that indexes repositories and conducts code research without sending source code to external services. The project combines a research-backed cAST algorithm with semantic and regex search, multi-hop discovery of code relationships, and optional LLM-driven code research to surface architecture, patterns, and institutional knowledge at scale.

Overview

ChunkHound targets the gap between simple keyword search and heavyweight knowledge-graph approaches. It combines semantic search, regex search, and code research, with an emphasis on local-first privacy and offline operation. It integrates with MCP-enabled clients through the Model Context Protocol (spec: https://spec.modelcontextprotocol.io/) and is aimed at large monorepos, security-sensitive codebases, and air-gapped environments.

Key features

  • cAST Algorithm (research paper: https://arxiv.org/pdf/2506.15655) for semantic code chunking
  • Multi-Hop Semantic Search to discover indirect, meaningful links between code elements (details at https://chunkhound.github.io/under-the-hood/#multi-hop-semantic-search)
  • Semantic search for natural-language queries and regex search that operates without API keys
  • Local-first design with real-time indexing: automatic file watching, smart diffs, and seamless branch switching
  • MCP integration enabling interoperability with tools such as Claude, VS Code, Cursor, Windsurf, and Zed
  • Support for 30 languages via Tree-sitter and custom parsers, including Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Swift, Kotlin, plus configuration formats like JSON, YAML, TOML and text-based files including PDF
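
To make the multi-hop idea above concrete, here is a minimal toy sketch, not ChunkHound's implementation. It assumes each chunk already has an embedding vector and simply follows nearest neighbours outward from the best match, so that chunks only indirectly related to the query can still surface. The chunk names, vectors, and the multi_hop_search helper are all hypothetical; the real system, per the linked docs, may rank and stop differently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(vec, index, k, exclude=()):
    """Names of the k chunks most similar to vec, skipping excluded names."""
    scored = [(cosine(vec, v), name) for name, v in index.items() if name not in exclude]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

def multi_hop_search(query_vec, index, k=1, hops=2):
    """Hop 1 searches from the query; later hops search from the chunks just found."""
    found = []
    frontier = [query_vec]
    for _ in range(hops):
        next_frontier = []
        for vec in frontier:
            for name in top_k(vec, index, k, exclude=found):
                found.append(name)
                next_frontier.append(index[name])
        frontier = next_frontier
    return found

# Hypothetical toy embeddings (illustrative values only).
INDEX = {
    "login_handler": [1.0, 0.2, 0.0],
    "session_store": [0.6, 0.8, 0.0],
    "redis_client": [0.1, 0.9, 0.4],
    "css_theme": [0.0, 0.0, 1.0],
}
QUERY = [1.0, 0.0, 0.0]  # stands in for a query such as "how does login work?"

single_hop = multi_hop_search(QUERY, INDEX, k=1, hops=1)
results = multi_hop_search(QUERY, INDEX, k=1, hops=3)
print(single_hop)
print(results)
```

In this toy index a single hop only reaches login_handler, while three hops also reach session_store and redis_client, which are close to each other but not to the query itself. A production system would typically also re-score expanded results against the original query; that step is omitted here.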

How it works (brief)

ChunkHound indexes code locally and constructs semantically meaningful chunks using its cAST approach. Indexed data powers semantic queries and supports multi-hop reasoning to reveal relationships beyond direct textual matches. For deeper code research, LLMs can be connected optionally; regex search remains available without any keys.
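
The chunking step can be illustrated with a small sketch that loosely follows the split-then-merge idea described in the cAST paper: syntax-tree nodes that exceed a size budget are split into their child statements, and small neighbouring siblings are merged into one chunk. This is an assumption-laden simplification, not ChunkHound's code. It uses Python's built-in ast module instead of Tree-sitter, measures size in non-whitespace characters, and the helper names and the budget value are hypothetical.

```python
import ast

def size(text):
    """Chunk size measured as the number of non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())

def chunk_nodes(nodes, source, budget):
    """Greedily merge small sibling nodes; recurse into nodes over the budget."""
    chunks, current = [], ""
    for node in nodes:
        text = ast.get_source_segment(source, node) or ""
        if size(text) > budget:
            # Flush what has been merged so far, then split the large node.
            if current:
                chunks.append(current)
                current = ""
            children = [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]
            if children:
                # Simplification: the header line of the split node is dropped.
                chunks.extend(chunk_nodes(children, source, budget))
            else:
                # Nothing left to split, so emit an oversized chunk as-is.
                chunks.append(text)
        elif size(current) + size(text) <= budget:
            current = f"{current}\n{text}" if current else text
        else:
            chunks.append(current)
            current = text
    if current:
        chunks.append(current)
    return chunks

def chunk_source(source, budget=80):
    """Chunk a Python source string along statement boundaries."""
    return chunk_nodes(ast.parse(source).body, source, budget)

SAMPLE = '''
import os

def small():
    return 1

class Big:
    def first(self):
        return "a" * 10

    def second(self):
        return "b" * 10
'''

chunks = chunk_source(SAMPLE, budget=40)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({size(chunk)} chars) ---")
    print(chunk)
```

With a budget of 40, the import and the small function are merged into one chunk, while the Big class is too large and is split into one chunk per method. Chunks therefore follow statement boundaries rather than arbitrary line counts.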

Installation and quickstart

  • Install the uv package manager (https://docs.astral.sh/uv/) if needed.
  • Install ChunkHound with uv tool install chunkhound.
  • Create a .chunkhound.json file in the project root to configure the embedding and LLM providers. Options include VoyageAI, OpenAI, or a local provider such as Ollama for embeddings, and Claude Code CLI or Codex CLI for LLM-driven research. Embedding and LLM keys are not needed for regex-only workflows.
  • Index the repository with chunkhound index. Full configuration and IDE integration instructions are available in the docs.
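
As a rough illustration of the configuration step, the sketch below writes a .chunkhound.json file from Python. The key names ("embedding", "llm", "provider") and the provider identifiers ("voyageai", "claude-code-cli") are assumptions about the schema and should be checked against the official docs; the write_config helper is hypothetical and not part of ChunkHound.

```python
import json
import tempfile
from pathlib import Path

def write_config(project_root, embedding_provider=None, llm_provider=None):
    """Write a minimal .chunkhound.json; both providers are optional."""
    config = {}
    if embedding_provider:
        # Assumed schema: {"embedding": {"provider": ...}}
        config["embedding"] = {"provider": embedding_provider}
    if llm_provider:
        # Assumed schema: {"llm": {"provider": ...}}
        config["llm"] = {"provider": llm_provider}
    path = Path(project_root) / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path

# Demonstrate against a throwaway directory instead of a real project root.
with tempfile.TemporaryDirectory() as root:
    path = write_config(root, "voyageai", "claude-code-cli")
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(path.name)
    print(json.dumps(loaded, indent=2))
```

API keys are deliberately left out of the sketch; for regex-only use, neither provider entry should be required.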

Requirements and integrations

  • Python 3.10+
  • Optional API keys for embeddings/LLMs (VoyageAI recommended) — regex search works without keys
  • Works with MCP-enabled clients and tooling such as Claude, VS Code, Cursor, Windsurf, and Zed

Ideal use cases

ChunkHound is positioned for:

  • Large, multi-language monorepos with cross-team dependencies
  • Security-sensitive projects requiring local-only indexing
  • Offline or air-gapped development environments requiring robust search and research capabilities

Project status and resources

The repository is MIT licensed and actively maintained; the latest listed release is v4.0.1 (Nov 12, 2025). The codebase is predominantly Python (98.0%) and has about a dozen contributors. Guides, quickstarts, and architecture deep dives are hosted on the documentation site at https://chunkhound.github.io.

For full technical details and the project repository, see the original source: https://github.com/chunkhound/chunkhound
