
Batho Documentation

Batho (Bidirectional AST Traversal & Hypergraph Orchestrator) is a deterministic, production-grade code intelligence engine that transforms raw codebases into queryable, time-aware structured hypergraphs.

What Batho Does

| Capability | Description |
|---|---|
| AST Extraction | Parse 40+ languages via tree-sitter into structured entities and relationships |
| Code Graph | Build in-memory hypergraphs with cross-file symbol resolution |
| BSG Compression | Compress code intelligence into token-budgeted formats for LLM injection |
| Time Machine | Snapshot, diff, and incrementally patch code intelligence over time |
| Arrow IPC Export | Export optimized hypergraphs and indices as Arrow IPC bundles for downstream integration |

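
The following minimal Python sketch illustrates what "token-budgeted" can mean in practice. It ranks extracted entities by relevance and greedily keeps whatever still fits a fixed LLM context budget. This is not Batho's BSG algorithm. `Entity`, `estimate_tokens`, and `pack_to_budget` are hypothetical names, and the 4-characters-per-token estimate is a rough assumption that stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical extracted code entity with a precomputed relevance score."""
    name: str
    summary: str
    score: float


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, minimum 1.
    return max(1, len(text) // 4)


def pack_to_budget(entities: list[Entity], budget: int) -> list[Entity]:
    """Greedily keep the highest-scoring entities whose summaries fit in `budget` tokens."""
    chosen: list[Entity] = []
    used = 0
    for entity in sorted(entities, key=lambda e: e.score, reverse=True):
        cost = estimate_tokens(entity.summary)
        if used + cost <= budget:  # skip anything that would overflow the budget
            chosen.append(entity)
            used += cost
    return chosen
```

Greedy selection by score is fast, but it does not always find the best result. Choosing the highest total score under a token limit is a 0/1 knapsack problem.
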
  • Getting Started — Install and run Batho in 30 seconds
  • Whitepaper — Deep technical reference for every subsystem
  • CLI Reference — Complete command documentation
  • GitHub — Source code and issues
  • PyPI — Install from Python Package Index

Architecture at a Glance

Figure: Batho System Architecture. High-level data flow: source code and batho.yaml feed the AST Extractor, which builds the InMemoryGraph. The graph is then structured into a BSGMap and exported as Arrow IPC or JSON.
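
The extract, build graph, and export pipeline can be illustrated in miniature. The sketch below uses Python's standard-library `ast` module in place of tree-sitter, so it handles only Python source. It also uses a plain nodes/edges dictionary in place of Batho's InMemoryGraph and BSGMap. `extract` and `build_graph` are hypothetical helpers, not Batho APIs. In this sketch, "cross-file resolution" means linking a call in one file to a function defined in another file by bare name.

```python
import ast
import json


def extract(path: str, source: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Parse one Python file; return {function_name: path} and (caller, callee) pairs."""
    tree = ast.parse(source, filename=path)
    definitions: dict[str, str] = {}
    calls: list[tuple[str, str]] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            definitions[node.name] = path
            # ast.walk also descends into nested functions, so their calls are
            # attributed to the enclosing function too (a simplification).
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.append((node.name, inner.func.id))
    return definitions, calls


def build_graph(files: dict[str, str]) -> dict[str, list[dict[str, str]]]:
    """Merge per-file results and resolve calls across files by bare name."""
    symbols: dict[str, str] = {}
    raw_calls: list[tuple[str, str]] = []
    for path, source in files.items():
        definitions, calls = extract(path, source)
        symbols.update(definitions)  # on a name clash, the later file wins
        raw_calls.extend(calls)
    # Keep only calls whose target is defined somewhere in the codebase;
    # builtins such as print() are dropped.
    edges = sorted({(caller, callee) for caller, callee in raw_calls if callee in symbols})
    return {
        "nodes": [{"name": name, "file": path} for name, path in sorted(symbols.items())],
        "edges": [{"from": caller, "to": callee} for caller, callee in edges],
    }


if __name__ == "__main__":
    project = {
        "app.py": "def main():\n    helper()\n    print('done')\n",
        "util.py": "def helper():\n    return 42\n",
    }
    # JSON stands in for the export stage; an Arrow IPC writer would replace this.
    print(json.dumps(build_graph(project), indent=2))
```

Resolving calls by bare name is the weakest form of symbol resolution. A real resolver would also account for imports, scopes, and qualified names.
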

Status

| Metric | Value |
|---|---|
| Supported Languages | 40+ via tree-sitter |
| Context Compression | Up to 10x for LLM injection |
| Incremental Patch Speed | 10–100x faster than a full re-index |
| Automated Tests | 381 |
| Cache Hit Rate | >95% on typical PR-sized changes |
| Snapshot Retention | 90 days by default (configurable) |
| Max Indexed Files | 200,000 per repository |
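
Incremental patching and cache hits both depend on knowing which files changed between two snapshots. One common technique is to content-hash every file. Files with unchanged hashes can reuse cached extraction results, and only added or changed files need to be re-extracted. This technique is assumed here, not taken from Batho's implementation. The helper names below are hypothetical. The hit-rate definition used here is also an illustrative assumption: the share of files in the new snapshot whose hash is unchanged.

```python
import hashlib


def snapshot(files: dict[str, str]) -> dict[str, str]:
    """Map each file path to the SHA-256 digest of its contents."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


def files_to_reindex(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Only added or changed files need fresh extraction; removed files are just dropped."""
    changes = diff(old, new)
    return sorted(changes["added"] + changes["changed"])


def cache_hit_rate(old: dict[str, str], new: dict[str, str]) -> float:
    """Fraction of files in the new snapshot whose cached results are reusable."""
    if not new:
        return 1.0
    unchanged = sum(1 for path, digest in new.items() if old.get(path) == digest)
    return unchanged / len(new)
```

Under this definition, a PR that modifies 10 of 1,000 files leaves 990 hashes unchanged, for a 99% hit rate. This shows why PR-sized changes can plausibly reach hit rates like the >95% quoted above.
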

Ready to dive in? Start with the Quick Start Guide.