
1. Architecture Overview

1.1 High-Level System Architecture

Batho's architecture is layered, with clear separation between the extraction, indexing, intelligence, and output layers. Processing is deterministic, so results can be cached reliably and updated incrementally.

Figure 2: High-Level System Architecture - Flowchart showing the layered architecture from source inputs through the core engine to output interfaces. Components include Source Inputs (Git Repository, batho.yaml, hooks.yaml), Batho Core Engine (AST Extractor, InMemoryGraph, AST Cache, SymbolIndex, IncrementalGraphUpdater), Intelligence Layer (BSGMap, BSG Rule Plugins), and Output Interfaces (Time Machine Snapshots, Web Dashboard, Artifact Bridge, batho CLI).


1.2 Data Flow Pipeline

The data flow pipeline ensures deterministic processing with built-in caching and validation:

Figure 3: Data Flow Pipeline - Sequence diagram showing the deterministic indexing process with caching and validation steps. Flow: User triggers batho CLI index command, CLI discovers files respecting gitignore, Extractor checks cache using mtime and SHA-256 hash, parallel extraction parses files with tree-sitter and emits entities to InMemoryGraph, Graph resolves imports via SymbolIndex, BSGMap builds flat symbol index, Rule Plugins apply semantic overlay, Snapshot Store persists with UUID and timestamp, CLI returns snapshot ID and updates index.json.

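The cache check in the pipeline (mtime first, then SHA-256) can be sketched as follows. This is a minimal illustration, not Batho's implementation: `CacheEntry`, `file_sha256`, and `needs_extraction` are hypothetical names, and an in-memory dict stands in for the SQLite-backed AST Cache.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CacheEntry:
    mtime_ns: int  # modification time recorded at the last extraction
    sha256: str    # content hash recorded at the last extraction


def file_sha256(path: Path) -> str:
    """Hash the file in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_extraction(path: Path, cache: dict[str, CacheEntry]) -> bool:
    """Return True when the file must be re-parsed, refreshing the cache entry.

    Assumption: the cache is a plain dict keyed by path string.
    """
    key = str(path)
    mtime_ns = path.stat().st_mtime_ns
    entry = cache.get(key)
    if entry is not None and entry.mtime_ns == mtime_ns:
        return False  # cheap path: timestamp unchanged, trust the cache
    current = file_sha256(path)
    if entry is not None and entry.sha256 == current:
        entry.mtime_ns = mtime_ns  # file was touched but content is identical
        return False
    cache[key] = CacheEntry(mtime_ns, current)
    return True
```

Checking the timestamp before hashing keeps the common case (nothing changed) cheap, while the hash prevents a bare `touch` from forcing a re-parse.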

1.3 Component Responsibilities

Core Engine Components

| Component | Purpose | Key Features |
| --- | --- | --- |
| AST Extractor | Multi-language parsing via tree-sitter | 40+ language support, parallel processing, mtime tracking |
| InMemoryGraph | Hypergraph storage | Lazy adjacency indexing, relationship deduplication, cross-file resolution |
| AST Cache | Persistent entity cache | SQLite-backed, SHA-256 validation, automatic invalidation |
| SymbolIndex | Cross-file symbol resolution | Two-pass resolution, unresolved target tracking |
| IncrementalUpdater | Patch application | Diff-based updates, chain validation, rollback support |
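To illustrate what two-pass resolution with unresolved target tracking might look like, here is a small sketch. The class below only borrows the SymbolIndex name; its fields, the `resolve` method, and the input shape (`defines` / `imports` lists per file) are assumptions made for the example, not Batho's API.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolIndex:
    # symbol name -> path of the file that defines it
    definitions: dict[str, str] = field(default_factory=dict)
    # (importing file, symbol name) pairs with no known definition
    unresolved: list[tuple[str, str]] = field(default_factory=list)

    def resolve(
        self, files: dict[str, dict[str, list[str]]]
    ) -> list[tuple[str, str, str]]:
        """Return (importer, symbol, definer) edges for every resolvable import.

        Hypothetical input shape: path -> {"defines": [...], "imports": [...]}.
        """
        # Pass 1: record every definition before looking at any import,
        # so file ordering cannot cause a false "unresolved".
        for path, info in files.items():
            for name in info["defines"]:
                self.definitions.setdefault(name, path)

        # Pass 2: link each import to its definition, or track it as unresolved.
        edges = []
        for path, info in files.items():
            for name in info["imports"]:
                target = self.definitions.get(name)
                if target is None:
                    self.unresolved.append((path, name))
                else:
                    edges.append((path, name, target))
        return edges
```

With a single pass, a file that imports a symbol defined in a later file would be reported as unresolved; collecting all definitions first avoids that, and anything still unresolved afterwards (for example a third-party package) is kept for later inspection.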

Intelligence Layer Components

| Component | Purpose | Key Features |
| --- | --- | --- |
| BSGMap | Structured graph representation | Flat symbol index, priority scoring, rendering modes |
| Rule Plugins | Semantic analysis | YAML-defined rules, plugin architecture, tag-based annotation |
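Tag-based annotation driven by declarative rules could work roughly as below. The rule schema (`kind`, `name_pattern`, `tag`) and the `Symbol` type are hypothetical, chosen only to show the idea; the rules are written as Python dicts in the shape a YAML loader would produce, so the sketch needs no third-party parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    kind: str                                   # e.g. "function" or "class"
    tags: set[str] = field(default_factory=set)


# Hypothetical rules, as they might look after loading from a YAML file.
RULES = [
    {"kind": "function", "name_pattern": r"^test_", "tag": "test"},
    {"kind": "class", "name_pattern": r"Controller$", "tag": "entrypoint"},
]


def apply_rules(symbols: list[Symbol], rules: list[dict]) -> list[Symbol]:
    """Add each rule's tag to every symbol whose kind and name match it."""
    for rule in rules:
        pattern = re.compile(rule["name_pattern"])
        for sym in symbols:
            if sym.kind == rule["kind"] and pattern.search(sym.name):
                sym.tags.add(rule["tag"])
    return symbols
```

Because the rules only add tags and never rewrite the symbols themselves, the overlay can be recomputed or extended with new rules without re-running extraction.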

1.4 Output Interfaces

| Interface | Transport | Purpose |
| --- | --- | --- |
| CLI | Terminal | Direct control, scripting, automation |
| Dashboard | HTTP (port 8080) | Interactive exploration, visualization |
| Bridge (REST) | HTTP | Programmatic access, CI/CD integration |
| Bridge (MCP) | stdio/SSE | LLM context provisioning |
| Snapshots | JSON files | Time-travel, audit trail, backup |
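As a rough sketch of the snapshot interface, the function below writes a graph to a JSON file named by a UUID, stamps it with a UTC timestamp, and appends an entry to `index.json`. The file layout, the payload fields, and the `save_snapshot` name are assumptions for illustration; the source only states that snapshots are JSON files persisted with a UUID and timestamp and that `index.json` is updated.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(graph: dict, store: Path) -> str:
    """Persist one snapshot and register it in index.json; return its ID."""
    store.mkdir(parents=True, exist_ok=True)
    snapshot_id = str(uuid.uuid4())
    created_at = datetime.now(timezone.utc).isoformat()

    # sort_keys keeps the serialized graph stable for identical input.
    payload = {"id": snapshot_id, "created_at": created_at, "graph": graph}
    snapshot_path = store / f"{snapshot_id}.json"
    snapshot_path.write_text(json.dumps(payload, sort_keys=True, indent=2))

    # Assumed index layout: a JSON list of {id, created_at} entries.
    index_path = store / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append({"id": snapshot_id, "created_at": created_at})
    index_path.write_text(json.dumps(index, indent=2))
    return snapshot_id
```

Keeping a separate index means a time-travel or audit tool can list snapshots by date without opening every snapshot file.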