databuild/docs/design/build-event-log.md
Stuart Axelbrooke ea83610d35
2025-09-27 15:29:22 -07:00

Build Event Log (BEL)

Purpose: Store build events and provide efficient cross-graph coordination via a minimal, append-only event stream.

Architecture

  • Uses event sourcing / CQRS philosophy.
  • BELs are only ever written to by graph processes (e.g. CLI or service), not the jobs themselves.
  • Three-layer architecture:
    1. Storage Layer: Append-only event storage with sequential scanning
    2. Query Engine Layer: App-layer aggregation for entity queries (partition status, build summaries, etc.)
    3. Client Layer: CLI, Service, Dashboard consuming aggregated views
  • Cross-graph coordination via minimal GraphService API that supports event streaming since a given index
  • Storage backends focus on efficient append + sequential scan operations (file-based, SQLite, Postgres, Delta Lake)

Correctness Strategy

  • The access layer evaluates each event submitted for writing and returns an error if the event is not a valid next state according to the involved component's governing state diagram.
  • Events are versioned, with each version's schema stored in databuild.proto.
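
The next-state check described above can be sketched as a transition table. The partition lifecycle used here (Requested, Building, Available, Failed) and the names PartitionState, is_valid_transition, and validate_event are assumptions made for illustration; the governing state diagrams themselves are not reproduced in this document.

```rust
/// Hypothetical partition lifecycle states; the real state diagram may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionState {
    Requested,
    Building,
    Available,
    Failed,
}

/// Returns true if `next` is a legal successor of `current` in the assumed diagram.
pub fn is_valid_transition(current: PartitionState, next: PartitionState) -> bool {
    use PartitionState::*;
    matches!(
        (current, next),
        (Requested, Building) | (Building, Available) | (Building, Failed) | (Failed, Requested)
    )
}

/// Access-layer check: reject an event whose target state is not a valid next state.
pub fn validate_event(current: PartitionState, next: PartitionState) -> Result<(), String> {
    if is_valid_transition(current, next) {
        Ok(())
    } else {
        Err(format!("invalid transition: {:?} -> {:?}", current, next))
    }
}
```

Keeping the legal transitions in one table means the append path can reject an out-of-order event before it reaches storage, so readers of the log never have to handle impossible sequences.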

Storage Layer Interface

Minimal append-only interface optimized for sequential scanning:

trait BELStorage {
    fn append_event(&self, event: BuildEvent) -> Result<i64>; // returns event index
    fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> Result<EventPage>;
}

Where EventFilter is defined in databuild.proto as:

message EventFilter {
  repeated string partition_refs = 1;        // Exact partition matches
  repeated string partition_patterns = 2;    // Glob patterns like "data/users/*"
  repeated string job_labels = 3;            // Job-specific events
  repeated string job_run_ids = 4;           // Job run events
}
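
To make the interface and filter concrete, here is a minimal in-memory sketch. Several things in it are assumptions rather than facts from the design: the field layout of BuildEvent and EventPage, the BelResult alias (standing in for the Result type used in the trait above), the reading of since_idx as inclusive, and the filter semantics (an empty filter matches everything; otherwise matching any one criterion is enough). The glob matcher handles only `*`.

```rust
use std::sync::Mutex;

// Stand-ins for types that would come from databuild.proto (assumed shapes).
pub type BelResult<T> = std::result::Result<T, String>;

#[derive(Clone, Debug, Default)]
pub struct BuildEvent {
    pub partition_refs: Vec<String>,
    pub job_label: Option<String>,
    pub job_run_id: Option<String>,
}

#[derive(Clone, Debug, Default)]
pub struct EventFilter {
    pub partition_refs: Vec<String>,
    pub partition_patterns: Vec<String>,
    pub job_labels: Vec<String>,
    pub job_run_ids: Vec<String>,
}

#[derive(Debug)]
pub struct EventPage {
    pub events: Vec<(i64, BuildEvent)>, // (event index, event)
    pub next_idx: i64,                  // pass as `since_idx` on the next call
}

pub trait BELStorage {
    fn append_event(&self, event: BuildEvent) -> BelResult<i64>;
    fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> BelResult<EventPage>;
}

/// Glob match supporting only `*`, which matches any run of characters (possibly empty).
pub fn glob_match(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((prefix, rest)) => match text.strip_prefix(prefix) {
            None => false,
            Some(tail) => (0..=tail.len())
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| glob_match(rest, &tail[i..])),
        },
    }
}

/// Assumed semantics: an empty filter matches everything; otherwise any one criterion suffices.
pub fn event_matches(filter: &EventFilter, ev: &BuildEvent) -> bool {
    let unfiltered = filter.partition_refs.is_empty()
        && filter.partition_patterns.is_empty()
        && filter.job_labels.is_empty()
        && filter.job_run_ids.is_empty();
    let partition_hit = ev.partition_refs.iter().any(|r| {
        filter.partition_refs.contains(r)
            || filter.partition_patterns.iter().any(|p| glob_match(p, r))
    });
    let label_hit = ev.job_label.as_ref().map_or(false, |l| filter.job_labels.contains(l));
    let run_hit = ev.job_run_id.as_ref().map_or(false, |id| filter.job_run_ids.contains(id));
    unfiltered || partition_hit || label_hit || run_hit
}

/// In-memory backend: the Vec position doubles as the event index.
#[derive(Default)]
pub struct MemoryBEL {
    events: Mutex<Vec<BuildEvent>>,
}

impl BELStorage for MemoryBEL {
    fn append_event(&self, event: BuildEvent) -> BelResult<i64> {
        let mut log = self.events.lock().map_err(|e| e.to_string())?;
        log.push(event);
        Ok(log.len() as i64 - 1)
    }

    fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> BelResult<EventPage> {
        let log = self.events.lock().map_err(|e| e.to_string())?;
        let start = (since_idx.max(0) as usize).min(log.len());
        let mut events = Vec::new();
        let mut next_idx = start as i64;
        for (i, ev) in log.iter().enumerate().skip(start) {
            if events.len() as i64 >= limit {
                break; // page is full; resume from `next_idx`
            }
            next_idx = i as i64 + 1;
            if event_matches(&filter, ev) {
                events.push((i as i64, ev.clone()));
            }
        }
        Ok(EventPage { events, next_idx })
    }
}
```

Returning next_idx separately from the matched events lets a caller advance past non-matching events without rescanning them on the next call.
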

The data build state is then built on top of this, as a reducer over the BEL event stream.
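
As a rough illustration of the reducer idea, the sketch below folds an ordered list of events into a map of partition statuses. The Event variants, Status, BuildState, apply, and reduce are all hypothetical names chosen for this example; the real event schema lives in databuild.proto.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified event shape; real events are defined in databuild.proto.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    PartitionRequested { partition_ref: String },
    PartitionBuilding { partition_ref: String },
    PartitionAvailable { partition_ref: String },
    PartitionFailed { partition_ref: String },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Requested,
    Building,
    Available,
    Failed,
}

/// Aggregated view derived purely from the event stream.
#[derive(Debug, Default)]
pub struct BuildState {
    pub partitions: HashMap<String, Status>,
    pub last_idx: Option<i64>, // index of the last event applied
}

/// Applies one event to the state; the latest event for a partition wins.
pub fn apply(mut state: BuildState, idx: i64, event: &Event) -> BuildState {
    let (partition_ref, status) = match event {
        Event::PartitionRequested { partition_ref } => (partition_ref, Status::Requested),
        Event::PartitionBuilding { partition_ref } => (partition_ref, Status::Building),
        Event::PartitionAvailable { partition_ref } => (partition_ref, Status::Available),
        Event::PartitionFailed { partition_ref } => (partition_ref, Status::Failed),
    };
    state.partitions.insert(partition_ref.clone(), status);
    state.last_idx = Some(idx);
    state
}

/// Folds an ordered slice of (index, event) pairs into a BuildState.
pub fn reduce(events: &[(i64, Event)]) -> BuildState {
    events
        .iter()
        .fold(BuildState::default(), |state, (idx, ev)| apply(state, *idx, ev))
}
```

Because the state is a pure function of the events, it can be rebuilt from scratch at any time, or updated incrementally by applying only the events after last_idx.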

Cross-Graph Coordination

Graphs coordinate via the GraphService API for efficient event streaming:

trait GraphService {
    async fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> Result<EventPage>;
}

This enables:

  • Event-driven reactivity: Downstream graphs react within seconds of upstream partition availability
  • Efficient subscriptions: Only scan events for relevant partitions
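  • Reliable coordination: HTTP polling from a known event index avoids the event-loss issues of streaming APIs, since a consumer that misses a poll simply resumes from its last-seen index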
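
A minimal sketch of a cursor-based polling consumer follows. It uses a synchronous stand-in instead of the async GraphService trait and omits the filter to stay short. EventSource, Page, FakeUpstream, and poll_once are hypothetical names, and events are reduced to plain strings.

```rust
/// Hypothetical page shape: (event index, payload) pairs plus the next cursor.
pub struct Page {
    pub events: Vec<(i64, String)>,
    pub next_idx: i64,
}

/// Synchronous stand-in for GraphService::list_events (filter omitted for brevity).
pub trait EventSource {
    fn list_events(&self, since_idx: i64, limit: i64) -> Result<Page, String>;
}

/// In-memory upstream used only to exercise the polling loop.
pub struct FakeUpstream(pub Vec<String>);

impl EventSource for FakeUpstream {
    fn list_events(&self, since_idx: i64, limit: i64) -> Result<Page, String> {
        let start = (since_idx.max(0) as usize).min(self.0.len());
        let end = (start + limit.max(0) as usize).min(self.0.len());
        let events = (start..end).map(|i| (i as i64, self.0[i].clone())).collect();
        Ok(Page { events, next_idx: end as i64 })
    }
}

/// Drains everything available from `cursor` onward, page by page, and returns
/// the new cursor. Persisting that cursor lets a restarted consumer resume
/// without missing events.
pub fn poll_once<S: EventSource>(
    source: &S,
    cursor: i64,
    page_size: i64,
    mut handle: impl FnMut(i64, &str),
) -> Result<i64, String> {
    let mut cursor = cursor;
    loop {
        let page = source.list_events(cursor, page_size)?;
        for (idx, ev) in &page.events {
            handle(*idx, ev);
        }
        if page.next_idx == cursor {
            return Ok(cursor); // no progress: caught up with the upstream log
        }
        cursor = page.next_idx;
    }
}
```

A downstream graph would call poll_once on a timer, passing the cursor returned by the previous call; events appended upstream between polls are picked up on the next call.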