2. Core Subsystems

2.1 Subsystem Inventory

Batho is composed of seven tightly integrated subsystems that together provide its code intelligence capabilities. Each subsystem has a single, well-defined responsibility and can be tested independently of the others.

| Subsystem | Module Path | Purpose | Status |
|---|---|---|---|
| AST Extraction | batho/modules/extraction/ | tree-sitter based multi-language parsing | Production |
| Code Graph | batho/modules/graph/ | In-memory hypergraph with adjacency indexing | Production |
| BSG Map & Compression | batho/modules/compression/ | Flat symbol index, priority token budgeting, and rule loaders | Production |
| Dependency Indexer | batho/modules/dependency/ | stdlib and installed virtual environment dependency indexer | Production |
| Integrity Verification | batho/modules/integrity/ | Database checker, repair engine, and report generation | Production |
| Storage Registry | batho/modules/storage/ | Arrow IPC database manager | Production |
| CLI Command Suite | batho/cli/ | Command interface parsing and subcommand orchestration | Production |

2.2 Technology Stack

Runtime Environment

| Layer | Technology | Version |
|---|---|---|
| Language Runtime | Python | 3.11+ |
| AST Parsing | tree-sitter | 0.25+ |
| Language Pack | tree-sitter-language-pack | Latest |
| Configuration | Pydantic | 2.x |

Interface Layer

| Layer | Technology | Purpose |
|---|---|---|
| CLI Framework | argparse (stdlib) | Command-line interface |
| Data Export | JSON / Pretty | Programmatic data output |

Data Layer

| Layer | Technology | Purpose |
|---|---|---|
| Registry / Cache | msgpack + Arrow IPC | Durable database storage and high-speed memory-mapping |
| Artifact Bundle | ZIP ZSTD Archive | Transport packaging format |

Development & Testing

| Layer | Technology | Version / Purpose |
|---|---|---|
| Testing | pytest + pytest-cov | 8.x / 5.x |
| Build Tool | uv | Latest |
| Linting | ruff, mypy | Code quality |

2.3 Subsystem Interactions

Subsystems communicate through well-defined interfaces:

Figure 4: Subsystem Interactions - Dependency graph showing how Batho subsystems communicate through well-defined interfaces.

2.4 Data Flow Between Subsystems

  1. Extraction Phase: CLI → Extractor → Cache + Graph
  2. Resolution Phase: Graph → SymbolIndex → Graph (cross-file resolution)
  3. Intelligence Phase: Graph → BSG → Rules → BSG (semantic tagging)
  4. Storage & Serialization: BSG → Storage Registry (Arrow IPC views)
  5. Output Phase: Storage Registry → CLI (Command output / JSON Export)
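
The five phases above can be sketched as a chain of plain functions. This is a toy model under stated assumptions, not Batho's implementation: the `Graph` class, the function names, the prefix-based tagging rules, and the input shape are all hypothetical, and JSON bytes stand in for the Arrow IPC views so the example needs only the standard library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy stand-in for the code graph; the fields are illustrative only."""
    definitions: dict[str, str] = field(default_factory=dict)  # symbol -> file
    references: list[tuple[str, str]] = field(default_factory=list)  # (file, symbol)
    edges: set[tuple[str, str]] = field(default_factory=set)  # (from file, to file)


def extract(files: dict[str, dict[str, list[str]]], cache: dict) -> Graph:
    """Phase 1: record per-file facts in both the cache and the graph."""
    graph = Graph()
    for path, facts in files.items():
        cache[path] = facts
        for name in facts["defines"]:
            graph.definitions[name] = path
        for name in facts["uses"]:
            graph.references.append((path, name))
    return graph


def resolve(graph: Graph) -> Graph:
    """Phase 2: look up each reference in the symbol index (cross-file only)."""
    for path, name in graph.references:
        target = graph.definitions.get(name)
        if target is not None and target != path:
            graph.edges.add((path, target))
    return graph


def tag(graph: Graph, rules: dict[str, str]) -> dict:
    """Phase 3: build a flat symbol map and apply prefix rules as tags."""
    symbols = {
        name: {
            "file": path,
            "tags": [label for prefix, label in rules.items() if name.startswith(prefix)],
        }
        for name, path in sorted(graph.definitions.items())
    }
    return {"symbols": symbols, "edges": sorted(graph.edges)}


def store(bsg: dict) -> bytes:
    """Phase 4: serialize the map (JSON bytes stand in for Arrow IPC here)."""
    return json.dumps(bsg, sort_keys=True).encode("utf-8")


def export(blob: bytes) -> str:
    """Phase 5: read the stored view back and emit JSON for the CLI."""
    return json.dumps(json.loads(blob), indent=2, sort_keys=True)


def run_pipeline(files: dict[str, dict[str, list[str]]], rules: dict[str, str]) -> str:
    cache: dict = {}
    graph = resolve(extract(files, cache))
    return export(store(tag(graph, rules)))


if __name__ == "__main__":
    demo_files = {
        "a.py": {"defines": ["load_config"], "uses": ["test_helper"]},
        "b.py": {"defines": ["test_helper"], "uses": []},
    }
    print(run_pipeline(demo_files, {"test_": "test"}))
```

Each phase takes the previous phase's output as its only input, which mirrors the one-directional flow in the list and lets every step be exercised in isolation.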