10. Performance & Scalability

10.1 Benchmarks

Performance metrics from production workloads:

Dimension	Strategy	Limit
Files	Parallel extraction + caching	200,000 default
Workers	CPU × 2, capped at 32	Auto-detected
File size	Configurable max (default 500KB)	Per-file
Snapshots	Deduplication + retention policy	500 default
Patches	Chain compression + retention	5,000 default

Repository Size	CPU	Memory	Disk
Small (≤10K files)	2 cores	512MB	1GB
Medium (10K-50K)	4 cores	1GB	5GB
Large (50K-200K)	8+ cores	4GB+	20GB+

The caching strategy minimizes redundant work:

Figure 20: Cache Strategy - Flowchart showing the caching logic that minimizes redundant parsing through mtime and SHA-256 validation.

Layer	Technology	TTL	Purpose
AST Cache	SQLite	90 days	Parsed entity cache
Symbol Cache	SQLite	90 days	Cross-file resolution
BSG Cache	JSON files	90 days	Rendered graphs
Snapshot Cache	JSON files	Configurable	Time-travel snapshots

# batho.yaml
indexer:
  max_workers: 8  # Default: CPU count × 2
  batch_size: 100  # Files per batch

# Limit memory usage
batho index --root . --max-memory 2G

# Enable streaming mode for large repos
batho index --root . --stream