v0.3.0 · MIT License · Python 3.10+

Feed LLMs signal,
not noise.

codectx compiles your repository into a structured context file for AI agents — ranking files by dependency graph centrality, compressing to a token budget, and emitting a document an agent can reason from immediately.

$ pip install codectx

or: uv add codectx · pipx install codectx

76% avg token reduction
9 languages supported
120k default token budget
0.3.0 latest version

The problem

When you dump a repository into an LLM context window, you get files in filesystem order — alphabetical, arbitrary, disconnected. The model sees tests before the modules they test, utility helpers before the architecture they support, config files before the code that reads them.

Naive context dumps waste the most valuable positions in the context window on noise. A large share of any codebase is boilerplate, test fixtures, and auto-generated code — none of which helps an agent understand the system.

Arbitrarily truncating at a token limit doesn't help either. You might cut off the core module and keep a lockfile.

The fix

codectx builds a dependency graph of your repository, then scores every file by its fan-in centrality — how many other files import it. Files that everything depends on rank highest.
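The fan-in ranking can be sketched in a few lines. The import graph below is a made-up example for illustration; the real scorer works over the dependency graph codectx extracts from parsed import statements:

```python
from collections import Counter

# module -> modules it imports (hypothetical example graph)
imports = {
    "cli": ["core.engine", "core.models"],
    "core.engine": ["core.models", "cache"],
    "core.models": [],
    "cache": ["core.models"],
    "tests.test_engine": ["core.engine"],
}

# Fan-in: how many distinct modules import each module.
fan_in = Counter(dep for deps in imports.values() for dep in set(deps))

# Modules everything depends on rank highest.
ranked = sorted(imports, key=lambda m: fan_in.get(m, 0), reverse=True)
```

Here `core.models` is imported by three other modules, so it tops the ranking even though nothing about its filename or size suggests it is central.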

Git commit frequency, distance from entry points, and (optionally) semantic similarity to your query combine into a composite score. The top 15% of files get full AST-derived structured summaries. The next 30% get function signatures. The rest get one-liners.
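A minimal sketch of the scoring-and-tiering idea, with assumed weights and signal values (codectx's actual weighting is not documented here):

```python
# file -> (fan_in, commit_freq, entry_proximity), each pre-normalized to [0, 1]
# All numbers below are illustrative, not measured.
scores = {
    "core/engine.py": (1.0, 0.8, 0.9),
    "cli.py": (0.4, 0.9, 1.0),
    "utils/strings.py": (0.3, 0.2, 0.3),
    "tests/conftest.py": (0.1, 0.5, 0.1),
}
WEIGHTS = (0.5, 0.3, 0.2)  # assumed relative weighting of the three signals

composite = {
    f: sum(w * s for w, s in zip(WEIGHTS, signals))
    for f, signals in scores.items()
}
ranked = sorted(composite, key=composite.get, reverse=True)

def tier(rank: int, total: int) -> str:
    """Bucket a file by its rank percentile: top 15% full, next 30% signatures."""
    pct = rank / total
    if pct < 0.15:
        return "full"
    if pct < 0.45:
        return "signatures"
    return "one-liner"

tiers = {f: tier(i, len(ranked)) for i, f in enumerate(ranked)}
```

The percentile cutoffs mirror the 15% / 30% split described above; everything below the second cutoff is summarized in one line.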

The output is not a source dump. It is a compiled document an agent can navigate from the first token — architecture first, then core modules, then periphery.

Before vs after

Naive — cat **/*.py | llm

# conftest.py — pytest fixtures
import pytest
from myapp.db import engine

# setup.py — package config
from setuptools import setup
setup(name='myapp', ...)

# __init__.py
pass

# ... 40 more files in random order
# core/engine.py buried at line 4,847
codectx — CONTEXT.md

## ARCHITECTURE
Request processing engine. 3 subsystems.

## ENTRY_POINTS
### `cli.py` (tier 1, score 0.94)
Full source — 312 lines

## CORE_MODULES
### `core/engine.py`
Purpose: Main execution engine.
Depends on: core.models, cache
Functions: process(), shutdown()

## DEPENDENCY_GRAPH · RANKED_FILES · PERIPHERY

Benchmarks — real repos

| Repository | Naive tokens | codectx tokens | Reduction |
|------------|--------------|----------------|-----------|
| fastapi    | 224k         | 78k            | 64.9%     |
| requests   | 41k          | 6k             | 84.7%     |
| typer      | 80k          | 35k            | 55.4%     |
| rich       | 354k         | 28k            | 92%       |
| httpx      | 63k          | 6k             | 89.5%     |

Pipeline

🚶 Walker: scan files, apply .gitignore + .ctxignore
🌳 Parser: extract imports & symbols via tree-sitter
🕸 Graph: build dependency graph with rustworkx
📊 Ranker: score by fan-in, git frequency, proximity
🗜 Compressor: assign tiers, enforce token budget
📄 Formatter: emit structured CONTEXT.md
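The Compressor stage can be sketched as a greedy walk over the ranked files: emit the richest representation that still fits the remaining budget, otherwise degrade to a cheaper tier. The token costs and the small budget below are assumptions for illustration, not codectx's actual accounting:

```python
# (file, score, token cost at: full / signatures / one-liner) — assumed numbers
files = [
    ("core/engine.py", 0.94, 4_000, 600, 20),
    ("cli.py", 0.91, 3_000, 500, 20),
    ("core/models.py", 0.88, 2_500, 400, 20),
]

def compress(files, budget):
    """Greedily pick the richest tier per file that fits the remaining budget."""
    out, remaining = [], budget
    for name, _score, full, sigs, brief in files:
        for tier_name, cost in (("full", full), ("signatures", sigs), ("one-liner", brief)):
            if cost <= remaining:
                out.append((name, tier_name))
                remaining -= cost
                break
    return out, remaining

# A deliberately tight budget forces the second and third files down a tier.
plan, left = compress(files, budget=6_000)
```

With the default 120k budget most top-ranked files would fit in full; the tight 6k budget here shows the degradation path instead.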