1. Architecture Overview
1.1 High-Level System Architecture
Batho's architecture is layered, with extraction, indexing, intelligence, and output kept in separate layers. Processing is deterministic (the same inputs yield the same results), which is what makes caching and incremental updates reliable.
Figure 2: High-Level System Architecture - Flowchart showing the layered architecture from source inputs through the core engine to output interfaces. Components include Source Inputs (Git Repository, batho.yaml, hooks.yaml), Batho Core Engine (AST Extractor, InMemoryGraph, AST Cache, SymbolIndex, IncrementalGraphUpdater), Intelligence Layer (BSGMap, BSG Rule Plugins), and Output Interfaces (Time Machine Snapshots, Web Dashboard, Artifact Bridge, batho CLI).
1.2 Data Flow Pipeline
The data flow pipeline ensures deterministic processing with built-in caching and validation:
Figure 3: Data Flow Pipeline - Sequence diagram showing the deterministic indexing process with caching and validation steps. Flow: the user triggers the batho CLI index command; the CLI discovers files, respecting gitignore; the Extractor checks the cache using mtime and a SHA-256 hash; parallel extraction parses files with tree-sitter and emits entities to InMemoryGraph; the graph resolves imports via SymbolIndex; BSGMap builds a flat symbol index; Rule Plugins apply a semantic overlay; the Snapshot Store persists the result with a UUID and timestamp; the CLI returns the snapshot ID and updates index.json.
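The cache check in the pipeline (mtime plus SHA-256) could look roughly like the sketch below. It is not Batho's implementation: `CacheEntry`, `file_sha256`, and `needs_reextraction` are hypothetical names, and using the timestamp as a fast path with the hash as the fallback is an assumption about how the two signals are combined.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical cache record; the real AST Cache schema is not shown here."""
    mtime_ns: int
    sha256: str


def file_sha256(path: str) -> str:
    """Hash file contents in chunks so large files are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reextraction(path: str, entry: Optional[CacheEntry]) -> bool:
    """Return True when the file has to be parsed again."""
    if entry is None:
        return True  # no cached entities for this file
    if os.stat(path).st_mtime_ns == entry.mtime_ns:
        return False  # assumed fast path: unchanged timestamp, trust the cache
    # Timestamp changed: compare content hashes so a bare `touch` is not a miss.
    return file_sha256(path) != entry.sha256
```

Under these assumptions, a file that was touched but not edited still counts as a cache hit, because the decision rests on content rather than on the timestamp alone.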
1.3 Component Responsibilities
Core Engine Components
| Component | Purpose | Key Features |
|---|---|---|
| AST Extractor | Multi-language parsing via tree-sitter | 40+ language support, parallel processing, mtime tracking |
| InMemoryGraph | Hypergraph storage | Lazy adjacency indexing, relationship deduplication, cross-file resolution |
| AST Cache | Persistent entity cache | SQLite-backed, SHA-256 validation, automatic invalidation |
| SymbolIndex | Cross-file symbol resolution | Two-pass resolution, unresolved target tracking |
| IncrementalGraphUpdater | Patch application | Diff-based updates, chain validation, rollback support |
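To illustrate what two-pass resolution with unresolved target tracking might mean for SymbolIndex, here is a minimal sketch. The class below is a hypothetical stand-in: its fields, the `resolve` method, and the input shape (`defines` and `imports` lists per file) are all assumptions made for the example, not Batho's API.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolIndex:
    """Illustrative stand-in only; names and fields are assumptions."""
    definitions: dict = field(default_factory=dict)  # symbol name -> defining file
    unresolved: list = field(default_factory=list)   # (importing file, symbol name)

    def resolve(self, files: dict) -> list:
        """files maps path -> {"defines": [...], "imports": [...]}.

        Returns (importing file, symbol, defining file) edges.
        """
        # Pass 1: record every definition before looking at any import.
        for path in sorted(files):
            for name in files[path]["defines"]:
                self.definitions.setdefault(name, path)

        # Pass 2: link imports to definitions; keep track of what is missing.
        edges = []
        for path in sorted(files):
            for name in files[path]["imports"]:
                target = self.definitions.get(name)
                if target is None:
                    self.unresolved.append((path, name))
                else:
                    edges.append((path, name, target))
        return edges
```

The point of the second pass is that an import can refer to a symbol defined in a file that is processed later; collecting all definitions first makes the result independent of file order. Iterating over sorted paths keeps the output deterministic.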
Intelligence Layer Components
| Component | Purpose | Key Features |
|---|---|---|
| BSGMap | Structured graph representation | Flat symbol index, priority scoring, rendering modes |
| Rule Plugins | Semantic analysis | YAML-defined rules, plugin architecture, tag-based annotation |
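Tag-based annotation driven by declarative rules can be sketched as follows. The rule schema (`match.kind`, `match.name`, `tag`) is invented for illustration and is not Batho's rule format; the rules are written as the plain dictionaries a YAML loader would produce, so the example needs only the standard library.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, shown as already-parsed YAML. The schema is an assumption.
RULES = [
    {"match": {"kind": "function", "name": "test_*"}, "tag": "test"},
    {"match": {"kind": "class", "name": "*Controller"}, "tag": "http-entrypoint"},
]


def annotate(symbols, rules):
    """Map each symbol name to the sorted tags of every rule it matches."""
    tags = {}
    for symbol in symbols:
        matched = {
            rule["tag"]
            for rule in rules
            if symbol["kind"] == rule["match"]["kind"]
            # fnmatchcase is case-sensitive on every platform.
            and fnmatchcase(symbol["name"], rule["match"]["name"])
        }
        tags[symbol["name"]] = sorted(matched)
    return tags
```

In this sketch the tags live in a separate mapping rather than on the symbols themselves, which is one way to read "semantic overlay": the extracted graph stays untouched and the annotations can be recomputed when rules change.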
1.4 Output Interfaces
| Interface | Transport | Purpose |
|---|---|---|
| CLI | Terminal | Direct control, scripting, automation |
| Dashboard | HTTP (port 8080) | Interactive exploration, visualization |
| Bridge (REST) | HTTP | Programmatic access, CI/CD integration |
| Bridge (MCP) | stdio/SSE | LLM context provisioning |
| Snapshots | JSON files | Time-travel, audit trail, backup |
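The snapshot step (persist with a UUID and timestamp, then update index.json) might be implemented along these lines. This is a sketch under assumptions: `persist_snapshot`, the one-file-per-snapshot layout, and the shape of the index.json entries are hypothetical, since the actual on-disk format is not described here.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def persist_snapshot(store: Path, graph: dict) -> str:
    """Write one snapshot file and append it to index.json; return its ID."""
    store.mkdir(parents=True, exist_ok=True)
    snapshot_id = str(uuid.uuid4())
    created_at = datetime.now(timezone.utc).isoformat()

    # Assumed layout: one JSON file per snapshot, named after its UUID.
    payload = {"id": snapshot_id, "created_at": created_at, "graph": graph}
    snapshot_path = store / f"{snapshot_id}.json"
    snapshot_path.write_text(
        json.dumps(payload, sort_keys=True, indent=2), encoding="utf-8"
    )

    # Assumed index format: a list of {id, created_at} entries, oldest first.
    index_path = store / "index.json"
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        index = []
    index.append({"id": snapshot_id, "created_at": created_at})
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return snapshot_id
```

Serializing with `sort_keys=True` keeps the file contents stable for identical graphs, which fits the deterministic design; only the ID and timestamp differ between runs.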