3. Deterministic Code Graph Engine

3.1 Entity Model

The graph is built on two primitives: Entities and Relationships. This model enables efficient querying and cross-referencing across large codebases.

Entity Types

| Type | Description | Example |
| --- | --- | --- |
| FUNCTION | Standalone function | def process_data(): |
| METHOD | Class/instance method | def save(self): |
| CLASS | Class definition | class UserManager: |
| STRUCT | Struct (Rust/Go) | type Config struct |
| INTERFACE | Interface/protocol | interface Repository |
| TRAIT | Rust trait | trait Sendable |
| FIELD | Attribute/field | name: str |
| ENUM | Enumeration | enum Status |
| TYPE_ALIAS | Type alias | type ID = string |
| CONSTANT | Constant declaration | const MAX = 100 |
| MODULE | Module/package | package main |
| NAMESPACE | Namespace | namespace App |
| ENTRY_POINT | Program entry point | main() |
| EXTERNAL_SYMBOL | Strict SCIP external reference node | Reference to external packages |
| SYNTAX_GLUE | Whitespace, comments, braces, or non-semantic code segments | (For bidirectional reconstruction) |

Relationship Types

| Type | Direction | Semantics |
| --- | --- | --- |
| IMPORTS | File → Module | File imports a module |
| CALLS | Entity → Entity | Function/method invocation |
| USES | Entity → Entity | Variable/type usage |
| INHERITS | Class → Class | Inheritance |
| IMPLEMENTS | Class → Interface | Interface implementation |
| DEFINES | File → Entity | Container definition |
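
As a rough illustration of the two primitives, the entity and relationship types above could be modeled as follows. This is a minimal sketch, not Batho's actual data model: the class names `EntityType`, `RelationshipType`, `Entity`, and `Relationship` and their field names are assumptions, and only a subset of the entity types is listed.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # Subset of the entity types from the table; the rest follow the same pattern.
    FUNCTION = "FUNCTION"
    METHOD = "METHOD"
    CLASS = "CLASS"
    MODULE = "MODULE"
    SYNTAX_GLUE = "SYNTAX_GLUE"


class RelationshipType(Enum):
    IMPORTS = "IMPORTS"
    CALLS = "CALLS"
    USES = "USES"
    INHERITS = "INHERITS"
    IMPLEMENTS = "IMPLEMENTS"
    DEFINES = "DEFINES"


@dataclass(frozen=True)
class Entity:
    id: str              # e.g. "<file>::<qualified name>"
    type: EntityType
    file: str


@dataclass(frozen=True)
class Relationship:
    source: str          # ID of the originating file or entity
    target: str          # ID of the file, module, or entity pointed at
    type: RelationshipType


# Hypothetical usage: a function in one file that uses a class in another.
user = Entity("models.py::User", EntityType.CLASS, "models.py")
create_user = Entity("services.py::create_user", EntityType.FUNCTION, "services.py")
uses = Relationship(create_user.id, user.id, RelationshipType.USES)
```

Making both records frozen keeps them hashable, so they can be stored in sets and compared by value, which is convenient for deduplication.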

3.2 Graph Consistency Model

The InMemoryGraph ensures deterministic processing through lazy indexing and automatic deduplication:

Figure 5: Graph Consistency Model - Flowchart showing the lazy indexing and consistency validation process in InMemoryGraph.

Key Guarantees:

  • The index is built on the first neighbors() call.
  • The index is invalidated on every relationship mutation.
  • Duplicate relationships are silently dropped; they are detected via has_relationship().
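
The three guarantees above can be sketched in a few lines. The class below is an illustrative stand-in, not the real InMemoryGraph: the name `LazyGraph`, the `add_relationship` method, the triple-based storage, and the `index_builds` counter are all assumptions made for the example. Only `neighbors()` and `has_relationship()` are names taken from the text.

```python
from collections import defaultdict


class LazyGraph:
    """Illustrative stand-in for InMemoryGraph; not the actual implementation."""

    def __init__(self):
        self._relationships = set()   # (source, target, rel_type) triples
        self._index = None            # adjacency index, built on demand
        self.index_builds = 0         # counter kept only for demonstration

    def has_relationship(self, source, target, rel_type):
        return (source, target, rel_type) in self._relationships

    def add_relationship(self, source, target, rel_type):
        if self.has_relationship(source, target, rel_type):
            return False              # duplicate: silently ignored
        self._relationships.add((source, target, rel_type))
        self._index = None            # any mutation invalidates the index
        return True

    def neighbors(self, entity_id):
        if self._index is None:       # first call after a mutation rebuilds
            index = defaultdict(list)
            # Sorting makes the neighbor order independent of insertion order.
            for source, target, _ in sorted(self._relationships):
                index[source].append(target)
            self._index = index
            self.index_builds += 1
        return list(self._index.get(entity_id, []))


g = LazyGraph()
g.add_relationship("a", "c", "USES")
g.add_relationship("a", "b", "CALLS")
g.add_relationship("a", "b", "CALLS")   # duplicate, ignored
first = g.neighbors("a")                # builds the index
second = g.neighbors("a")               # served from the existing index
```

In this sketch, determinism comes from sorting the relationships before building the index; the real engine may achieve it differently.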

3.3 Cross-File Resolution

The SymbolIndex performs a two-pass resolution to enable cross-module references:

Figure 6: Cross-File Resolution Process - Two-pass resolution flow showing how SymbolIndex resolves imports across files.

Resolution Process:

  1. Local pass: Resolve symbols within each file's scope.
  2. Global pass: Match unresolved imports against exported symbols across the repository.
  3. Tracking: Targets that remain unresolved are tagged with an unresolved: prefix and tracked for later resolution.
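
The resolution steps above can be sketched as a small function. This is a simplified illustration, not the SymbolIndex implementation: the `resolve` function and its input shape are hypothetical, it matches symbols by bare name only, and on a name collision it simply keeps the first definition in sorted file order.

```python
def resolve(files):
    """files maps a file name to {"defines": set of names, "references": list of names}."""
    resolved, pending = {}, []

    # Pass 1 (local): bind references to symbols defined in the same file.
    for path in sorted(files):
        info = files[path]
        for name in info["references"]:
            if name in info["defines"]:
                resolved[(path, name)] = f"{path}::{name}"
            else:
                pending.append((path, name))

    # Collect the symbols each file exports, across the whole repository.
    exports = {}
    for path in sorted(files):
        for name in sorted(files[path]["defines"]):
            exports.setdefault(name, f"{path}::{name}")

    # Pass 2 (global): match leftovers against exports; tag the rest as unresolved.
    for path, name in pending:
        resolved[(path, name)] = exports.get(name, f"unresolved:{name}")
    return resolved


files = {
    "models.py": {"defines": {"User"}, "references": []},
    "services.py": {"defines": {"create_user"}, "references": ["User", "requests"]},
}
result = resolve(files)
```

Here `User` is bound to `models.py::User` in the global pass, while `requests` has no definition in the repository and is kept as `unresolved:requests`.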

3.4 Example: Cross-File Reference

Consider a Python project with two files:

models.py:

class User:
    def __init__(self, name: str):
        self.name = name

services.py:

from models import User

def create_user(name: str) -> User:
    return User(name)

Graph Representation:

{
  "entities": [
    {"id": "models.py::User", "type": "CLASS", "file": "models.py"},
    {"id": "models.py::User.__init__", "type": "METHOD", "file": "models.py"},
    {"id": "services.py::create_user", "type": "FUNCTION", "file": "services.py"}
  ],
  "relationships": [
    {"from": "services.py", "to": "models", "type": "IMPORTS"},
    {"from": "services.py::create_user", "to": "models.py::User", "type": "USES"}
  ]
}
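
To show how such a representation might be queried, the sketch below loads the JSON above and asks which entities, and which files, depend on `models.py::User`. The helper names `dependents` and `files_affected` are hypothetical and are not part of any Batho API.

```python
import json

graph = json.loads("""
{
  "entities": [
    {"id": "models.py::User", "type": "CLASS", "file": "models.py"},
    {"id": "models.py::User.__init__", "type": "METHOD", "file": "models.py"},
    {"id": "services.py::create_user", "type": "FUNCTION", "file": "services.py"}
  ],
  "relationships": [
    {"from": "services.py", "to": "models", "type": "IMPORTS"},
    {"from": "services.py::create_user", "to": "models.py::User", "type": "USES"}
  ]
}
""")


def dependents(graph, target_id):
    """Return the IDs of all sources with a relationship pointing at target_id."""
    return sorted({r["from"] for r in graph["relationships"] if r["to"] == target_id})


def files_affected(graph, target_id):
    """Map dependents back to the files that contain them."""
    file_of = {e["id"]: e["file"] for e in graph["entities"]}
    return sorted({file_of[d] for d in dependents(graph, target_id) if d in file_of})


users_of_user = dependents(graph, "models.py::User")
touched_files = files_affected(graph, "models.py::User")
```

A lookup like this is what makes cross-file impact analysis cheap: changing `User` points directly at `create_user` in services.py.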

3.5 Bidirectional Traversal & Lossless Reconstruction

Batho v1.1.0 supports lossless, bidirectional graph-to-code reconstruction, allowing a developer or LLM agent to rebuild the exact source file from the graph.

The Role of SYNTAX_GLUE

When bsg.bidirectional.enabled is true, the parser identifies not only AST elements (e.g. classes, functions) but also all intervening segments, such as whitespace, braces, skipped comments, and other non-semantic structures. These are emitted as SYNTAX_GLUE entities.
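
The idea can be illustrated with a short sketch: given the byte ranges of the semantic entities, every byte that no entity claims becomes a SYNTAX_GLUE segment, so concatenating all segments in order yields the original file. The functions `segment` and `reconstruct` and the `(kind, bytes)` tuple layout are assumptions for illustration, not Batho's parser or storage format.

```python
def segment(source: bytes, spans):
    """spans: sorted, non-overlapping (start, end, kind) byte ranges of semantic entities."""
    segments, cursor = [], 0
    for start, end, kind in spans:
        if start > cursor:                       # bytes no entity claimed
            segments.append(("SYNTAX_GLUE", source[cursor:start]))
        segments.append((kind, source[start:end]))
        cursor = end
    if cursor < len(source):                     # trailing gap, if any
        segments.append(("SYNTAX_GLUE", source[cursor:]))
    return segments


def reconstruct(segments) -> bytes:
    """Concatenate segment payloads in order to rebuild the file."""
    return b"".join(data for _, data in segments)


fn = b"def f():\n    return 1\n"
cls = b"class A:\n    pass\n"
source = b"# helpers\n\n" + fn + b"\n\n" + cls

a, b = source.index(fn), source.index(cls)
spans = [(a, a + len(fn), "FUNCTION"), (b, b + len(cls), "CLASS")]

segments = segment(source, spans)
kinds = [kind for kind, _ in segments]
rebuilt = reconstruct(segments)
```

In this example the leading comment and the blank lines between definitions end up as glue, and `rebuilt` is identical to `source`.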

Verification and Integrity

By retaining complete byte coverage, Batho can reconstruct source code files byte-for-byte. The configuration keys under bsg.bidirectional control this behavior:

  • enabled: Activates bidirectional AST traversal.
  • include_gaps: Emits SYNTAX_GLUE entities to guarantee 100% byte coverage.
  • verify_integrity: Compares the SHA-256 hash of the reconstructed file against the stored hash of the original, throwing an IntegrityError if they differ.
  • storage_view: Specifies whether the original raw content is explicitly kept in the storage view.
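
The check behind verify_integrity can be sketched with Python's standard hashlib module. The `IntegrityError` class below is a local stand-in for the error named above, and the `verify` function and its signature are hypothetical; how Batho stores the original hash is not covered here.

```python
import hashlib


class IntegrityError(Exception):
    """Local stand-in for the IntegrityError named in the text."""


def verify(reconstructed: bytes, stored_sha256: str) -> None:
    """Raise IntegrityError if the reconstructed bytes do not match the stored hash."""
    actual = hashlib.sha256(reconstructed).hexdigest()
    if actual != stored_sha256:
        raise IntegrityError(f"expected {stored_sha256}, got {actual}")


original = b"class User:\n    pass\n"
stored = hashlib.sha256(original).hexdigest()   # recorded when the file was parsed

verify(original, stored)                        # passes silently

try:
    verify(original.replace(b"\n", b"\r\n"), stored)
    tamper_detected = False
except IntegrityError:
    tamper_detected = True                      # even a line-ending change is caught
```

Because the comparison is over raw bytes, changes that are invisible in most editors, such as line endings or trailing whitespace, still fail the check.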