Batho Documentation
Batho (Bidirectional AST Traversal & Hypergraph Orchestrator) is a deterministic, production-grade code intelligence engine that transforms raw codebases into queryable, time-aware structured hypergraphs.
What Batho Does
| Capability | Description |
|---|---|
| AST Extraction | Parse 40+ languages via tree-sitter into structured entities and relationships |
| Code Graph | Build in-memory hypergraphs with cross-file symbol resolution |
| BSG Compression | Compress code intelligence into token-budgeted formats for LLM injection |
| Time Machine | Snapshot, diff, and incrementally patch code intelligence over time |
| Arrow IPC Bundle Export | Export optimized hypergraphs and indices as Arrow IPC bundles for downstream integration |
Quick Links
- Getting Started — Install and run Batho in 30 seconds
- Whitepaper — Deep technical reference for every subsystem
- CLI Reference — Complete command documentation
- GitHub — Source code and issues
- PyPI — Install from Python Package Index
Architecture at a Glance
Architecture diagram showing Batho's data flow pipeline: source code and batho.yaml feed into the AST Extractor, which builds the InMemoryGraph; the graph is then structured into a BSGMap and exported as Arrow IPC or JSON.
Figure: Batho System Architecture - High-level data flow from source inputs through the core engine to Arrow IPC / JSON outputs.
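To make the data flow concrete, here is a minimal conceptual sketch of the same stages (extract, build graph, export) in plain Python. It is not Batho's API: it uses the standard-library `ast` module as a stand-in for tree-sitter, handles Python sources only, skips `batho.yaml` and the BSGMap stage, and the names `extract_entities`, `build_graph`, and `export_json` are hypothetical, chosen for illustration.

```python
import ast
import json


def extract_entities(path, source):
    """Parse one file; return (entities, calls). Stand-in for an AST extractor."""
    tree = ast.parse(source, filename=path)
    entities, calls = [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities.append(
                {"name": node.name, "kind": "function", "file": path, "line": node.lineno}
            )
            # Record direct calls by simple name made inside this function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.append((node.name, inner.func.id))
    return entities, calls


def build_graph(files):
    """Build an in-memory graph from {path: source} (hypothetical helper)."""
    nodes, raw_calls = {}, []
    for path in sorted(files):  # sorted so the result is deterministic
        entities, calls = extract_entities(path, files[path])
        for entity in entities:
            nodes[entity["name"]] = entity
        raw_calls.extend(calls)
    # Cross-file resolution: keep only calls whose target is a known entity.
    edges = sorted({(src, dst) for src, dst in raw_calls if dst in nodes})
    return {"nodes": nodes, "edges": edges}


def export_json(graph):
    """Serialize the graph to JSON with a stable ordering."""
    payload = {
        "nodes": [graph["nodes"][name] for name in sorted(graph["nodes"])],
        "edges": [list(edge) for edge in graph["edges"]],
    }
    return json.dumps(payload, sort_keys=True)


files = {
    "util.py": "def helper():\n    return 1\n",
    "main.py": "from util import helper\n\ndef run():\n    return helper()\n",
}
graph = build_graph(files)
print(export_json(graph))
```

In this sketch the call from `run` in `main.py` to `helper` in `util.py` becomes a single cross-file edge. A real engine would resolve symbols through imports and scopes rather than by bare name, and would represent richer relationships than simple pairs.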
Status
| Metric | Value |
|---|---|
| Supported Languages | 40+ via tree-sitter |
| Context Compression | Up to 10x for LLM injection |
| Incremental Patch Speed | 10–100x faster than full re-index |
| Automated Tests | 381 |
| Cache Hit Rate | >95% on typical PR-sized changes |
| Snapshot Retention | 90 days default, configurable |
| Max Indexed Files | 200,000 per repository |
Ready to dive in? Start with the Getting Started guide.