v0.3.0 · MIT License · Python 3.10+

Feed LLMs signal,
not noise.

codectx compiles your repository into a structured context file for AI agents — ranking files by dependency graph centrality, compressing to a token budget, and emitting a document an agent can reason from immediately.

$ pip install codectx

or: uv add codectx · pipx install codectx

76% avg token reduction
9 languages supported
120k default token budget
0.3.0 latest version

The problem

When you dump a repository into an LLM context window, you get files in filesystem order — alphabetical, arbitrary, disconnected. The model sees tests before the modules they test, utility helpers before the architecture they support, config files before the code that reads them.

Naive context dumps waste the most valuable positions in the context window on noise. A large share of any codebase is boilerplate, test fixtures, and auto-generated code — none of which helps an agent understand the system.

Arbitrarily truncating at a token limit doesn't help either. You might cut off the core module and keep a lockfile.

The fix

codectx builds a dependency graph of your repository, then scores every file by its fan-in centrality — how many other files import it. Files that everything depends on rank highest.
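The fan-in ranking can be sketched in a few lines. The import graph below is a made-up example for illustration; the real scorer works over the dependency graph codectx extracts from parsed import statements:

```python
from collections import Counter

# module -> modules it imports (hypothetical example graph)
imports = {
    "cli": ["core.engine", "core.models"],
    "core.engine": ["core.models", "cache"],
    "core.models": [],
    "cache": ["core.models"],
    "tests.test_engine": ["core.engine"],
}

# Fan-in: how many distinct modules import each module.
fan_in = Counter(dep for deps in imports.values() for dep in set(deps))

# Modules everything depends on rank highest.
ranked = sorted(imports, key=lambda m: fan_in.get(m, 0), reverse=True)
```

Here `core.models` is imported by three other modules, so it tops the ranking even though nothing about its filename or size suggests it is central.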

Git commit frequency, distance from entry points, and (optionally) semantic similarity to your query combine into a composite score. The top 15% of files get full AST-derived structured summaries. The next 30% get function signatures. The rest get one-liners.
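A minimal sketch of the scoring-and-tiering idea, with assumed weights and signal values (codectx's actual weighting is not documented here):

```python
# file -> (fan_in, commit_freq, entry_proximity), each pre-normalized to [0, 1]
# All numbers below are illustrative, not measured.
scores = {
    "core/engine.py": (1.0, 0.8, 0.9),
    "cli.py": (0.4, 0.9, 1.0),
    "utils/strings.py": (0.3, 0.2, 0.3),
    "tests/conftest.py": (0.1, 0.5, 0.1),
}
WEIGHTS = (0.5, 0.3, 0.2)  # assumed relative weighting of the three signals

composite = {
    f: sum(w * s for w, s in zip(WEIGHTS, signals))
    for f, signals in scores.items()
}
ranked = sorted(composite, key=composite.get, reverse=True)

def tier(rank: int, total: int) -> str:
    """Bucket a file by its rank percentile: top 15% full, next 30% signatures."""
    pct = rank / total
    if pct < 0.15:
        return "full"
    if pct < 0.45:
        return "signatures"
    return "one-liner"

tiers = {f: tier(i, len(ranked)) for i, f in enumerate(ranked)}
```

The percentile cutoffs mirror the 15% / 30% split described above; everything below the second cutoff is summarized in one line.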

The output is not a source dump. It is a compiled document an agent can navigate from the first token — architecture first, then core modules, then periphery.

Before vs after

Naive — cat **/*.py | llm

# conftest.py — pytest fixtures
import pytest
from myapp.db import engine

# setup.py — package config
from setuptools import setup
setup(name='myapp', ...)

# __init__.py
pass

# ... 40 more files in random order
# core/engine.py buried at line 4,847
codectx — CONTEXT.md

## ARCHITECTURE
Request processing engine. 3 subsystems.

## ENTRY_POINTS
### `cli.py` (tier 1, score 0.94)
Full source — 312 lines

## CORE_MODULES
### `core/engine.py`
Purpose: Main execution engine.
Depends on: core.models, cache
Functions: process(), shutdown()

## DEPENDENCY_GRAPH · RANKED_FILES · PERIPHERY

Benchmarks — real repos

| Repository | Naive tokens | codectx tokens | Reduction |
|------------|--------------|----------------|-----------|
| fastapi    | 224k         | 78k            | 64.9%     |
| requests   | 41k          | 6k             | 84.7%     |
| typer      | 80k          | 35k            | 55.4%     |
| rich       | 354k         | 28k            | 92%       |
| httpx      | 63k          | 6k             | 89.5%     |

Pipeline

🚶 Walker: scan files, apply .gitignore + .ctxignore
🌳 Parser: extract imports & symbols via tree-sitter
🕸 Graph: build dependency graph with rustworkx
📊 Ranker: score by fan-in, git frequency, proximity
🗜 Compressor: assign tiers, enforce token budget
📄 Formatter: emit structured CONTEXT.md
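The Compressor stage can be sketched as a greedy walk over the ranked files: emit the richest representation that still fits the remaining budget, otherwise degrade to a cheaper tier. The token costs and the small budget below are assumptions for illustration, not codectx's actual accounting:

```python
# (file, score, token cost at: full / signatures / one-liner) — assumed numbers
files = [
    ("core/engine.py", 0.94, 4_000, 600, 20),
    ("cli.py", 0.91, 3_000, 500, 20),
    ("core/models.py", 0.88, 2_500, 400, 20),
]

def compress(files, budget):
    """Greedily pick the richest tier per file that fits the remaining budget."""
    out, remaining = [], budget
    for name, _score, full, sigs, brief in files:
        for tier_name, cost in (("full", full), ("signatures", sigs), ("one-liner", brief)):
            if cost <= remaining:
                out.append((name, tier_name))
                remaining -= cost
                break
    return out, remaining

# A deliberately tight budget forces the second and third files down a tier.
plan, left = compress(files, budget=6_000)
```

With the default 120k budget most top-ranked files would fit in full; the tight 6k budget here shows the degradation path instead.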