Build Event Log (BEL)

Purpose: Store build events and provide efficient cross-graph coordination via a minimal, append-only event stream.

Architecture

  • Uses event sourcing / CQRS philosophy.
  • BELs are only ever written to by graph processes (e.g. CLI or service), not the jobs themselves.
  • Three-layer architecture:
    1. Storage Layer: Append-only event storage with sequential scanning
    2. Query Engine Layer: App-layer aggregation for entity queries (partition status, build summaries, etc.)
    3. Client Layer: CLI, Service, Dashboard consuming aggregated views
  • Cross-graph coordination via a minimal GraphService API that supports streaming events since a given index
  • Storage backends focus on efficient append + sequential scan operations (file-based, SQLite, Postgres, Delta Lake)

Correctness Strategy

  • The access layer validates each event before it is written and returns an error if the event is not a valid next state according to the governing state diagram of the component involved.
  • Events are versioned, with each version's schema stored in databuild.proto.
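
As a rough illustration of the write-time check described above, the sketch below validates a proposed state against the previous one. The `PartitionState` variants, the allowed transitions, and the names `check_transition` and `InvalidTransition` are all hypothetical; the real state diagrams are not given in this document.

```rust
/// Hypothetical partition states. The governing state diagram is not
/// specified in this document, so these variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionState {
    Requested,
    Building,
    Available,
    Failed,
}

/// Error returned when an event is not a valid next state.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: Option<PartitionState>,
    pub to: PartitionState,
}

/// Checks `to` against the assumed diagram:
/// (none) -> Requested -> Building -> Available | Failed, and Failed -> Requested.
/// `from == None` means the log holds no prior event for the partition.
pub fn check_transition(
    from: Option<PartitionState>,
    to: PartitionState,
) -> Result<(), InvalidTransition> {
    use PartitionState::*;
    let allowed = matches!(
        (from, to),
        (None, Requested)
            | (Some(Requested), Building)
            | (Some(Building), Available)
            | (Some(Building), Failed)
            | (Some(Failed), Requested)
    );
    if allowed {
        Ok(())
    } else {
        Err(InvalidTransition { from, to })
    }
}
```

An access layer along these lines would call `check_transition` with the latest known state before appending, and reject the write on `Err`.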

Storage Layer Interface

Minimal append-only interface optimized for sequential scanning:

#[async_trait]
trait BELStorage {
    async fn append_event(&self, event: BuildEvent) -> Result<i64>; // returns event index
    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage>;
}

Where EventFilter is defined in databuild.proto as:

message EventFilter {
  repeated string partition_refs = 1;        // Exact partition matches
  repeated string partition_patterns = 2;    // Glob patterns like "data/users/*"
  repeated string job_labels = 3;           // Job-specific events
  repeated string task_ids = 4;             // Task run events
  repeated string build_request_ids = 5;    // Build-specific events
}
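
To make the append and scan contract concrete, here is a minimal in-memory sketch. It is synchronous and dependency-free for brevity, so it does not implement the async `BELStorage` trait itself. The field layout of `BuildEvent` and `EventPage`, the inclusive meaning of `since_idx`, and the "empty filter matches everything, otherwise any listed value matches" semantics are assumptions, and only two of the five filter fields are modeled.

```rust
/// Hypothetical, trimmed-down event; the real BuildEvent lives in databuild.proto.
#[derive(Debug, Clone, PartialEq)]
pub struct BuildEvent {
    pub build_request_id: String,
    pub partition_ref: Option<String>,
}

/// Subset of the EventFilter fields shown above.
#[derive(Debug, Default, Clone)]
pub struct EventFilter {
    pub partition_refs: Vec<String>,
    pub build_request_ids: Vec<String>,
}

impl EventFilter {
    /// Assumed semantics: an empty filter matches all events,
    /// otherwise an event matches if any listed value matches.
    fn matches(&self, e: &BuildEvent) -> bool {
        if self.partition_refs.is_empty() && self.build_request_ids.is_empty() {
            return true;
        }
        let by_partition = e
            .partition_ref
            .as_ref()
            .map_or(false, |p| self.partition_refs.contains(p));
        by_partition || self.build_request_ids.contains(&e.build_request_id)
    }
}

/// Hypothetical page shape: matching events with their indices,
/// plus the index a caller should pass as `since_idx` next time.
#[derive(Debug)]
pub struct EventPage {
    pub events: Vec<(i64, BuildEvent)>,
    pub next_idx: i64,
}

#[derive(Default)]
pub struct MemoryStorage {
    events: Vec<BuildEvent>,
}

impl MemoryStorage {
    /// Appends an event and returns its index in the log.
    pub fn append_event(&mut self, event: BuildEvent) -> i64 {
        self.events.push(event);
        (self.events.len() - 1) as i64
    }

    /// Sequentially scans events with index >= since_idx (assumed inclusive).
    pub fn list_events(&self, since_idx: i64, filter: &EventFilter) -> EventPage {
        let start = since_idx.max(0) as usize;
        let events: Vec<(i64, BuildEvent)> = self
            .events
            .iter()
            .enumerate()
            .skip(start)
            .filter(|(_, e)| filter.matches(e))
            .map(|(i, e)| (i as i64, e.clone()))
            .collect();
        EventPage { events, next_idx: self.events.len() as i64 }
    }
}
```

The same two operations map naturally onto the other backends listed earlier, for example an auto-incrementing key plus a range scan in SQLite or Postgres.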

Query Engine Interface

App-layer aggregation that scans storage layer events:

struct BELQueryEngine {
    storage: Box<dyn BELStorage>,
    partition_status_cache: Option<PartitionStatusCache>,
}

impl BELQueryEngine {
    async fn get_latest_partition_status(&self, partition_ref: &str) -> Result<Option<PartitionStatus>>;
    async fn get_active_builds_for_partition(&self, partition_ref: &str) -> Result<Vec<String>>;
    async fn get_build_request_summary(&self, build_id: &str) -> Result<BuildRequestSummary>;
    async fn list_build_requests(&self, limit: u32, offset: u32, status_filter: Option<BuildRequestStatus>) -> Result<Vec<BuildRequestSummary>>;
}
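
The aggregation behind a method such as `get_latest_partition_status` can be pictured as a fold over the event log. The sketch below is a synchronous simplification over an in-memory slice; `PartitionEvent`, the `PartitionStatus` variants, and the helper names are hypothetical stand-ins for the real types.

```rust
use std::collections::HashMap;

/// Hypothetical status values; the real enum is defined elsewhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionStatus {
    Requested,
    Building,
    Available,
    Failed,
}

/// Hypothetical event carrying a partition status change.
#[derive(Debug, Clone)]
pub struct PartitionEvent {
    pub partition_ref: String,
    pub status: PartitionStatus,
}

/// Latest status for one partition: the last matching event in log order wins.
pub fn latest_partition_status(
    events: &[PartitionEvent],
    partition_ref: &str,
) -> Option<PartitionStatus> {
    events
        .iter()
        .rev()
        .find(|e| e.partition_ref == partition_ref)
        .map(|e| e.status)
}

/// One pass over the log producing the latest status of every partition.
/// A cache like `partition_status_cache` could plausibly hold such a map.
pub fn status_index(events: &[PartitionEvent]) -> HashMap<String, PartitionStatus> {
    let mut index = HashMap::new();
    for e in events {
        // Later events overwrite earlier ones for the same partition.
        index.insert(e.partition_ref.clone(), e.status);
    }
    index
}
```

Because the log is append-only, an index built this way can be kept current by folding in only the events appended since the last scan.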

Cross-Graph Coordination

Graphs coordinate via the GraphService API for efficient event streaming:

#[async_trait]
trait GraphService {
    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage>;
}

This enables:

  • Event-driven reactivity: Downstream graphs react within seconds of upstream partition availability
  • Efficient subscriptions: Only scan events for relevant partitions
  • Reliable coordination: HTTP polling from a known index avoids the event-loss issues of streaming APIs, since a consumer can always re-request from the last index it processed
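
The polling pattern can be sketched as a cursor that a downstream graph advances only after it has handled a page of events. Everything here is illustrative: `EventSource` is a synchronous stand-in for `GraphService`, events are plain strings, `since_idx` is assumed to be inclusive, and `Poller` and `FakeUpstream` are hypothetical names.

```rust
/// Synchronous stand-in for the upstream GraphService (assumption).
pub trait EventSource {
    /// Returns events with index >= since_idx and the index to resume from.
    fn list_events(&self, since_idx: i64) -> (Vec<String>, i64);
}

/// Downstream consumer that remembers how far it has read.
pub struct Poller {
    cursor: i64,
}

impl Poller {
    pub fn new() -> Self {
        Poller { cursor: 0 }
    }

    pub fn cursor(&self) -> i64 {
        self.cursor
    }

    /// Fetches everything since the cursor, handles each event, and only
    /// then advances the cursor. Returns the number of events handled.
    pub fn poll_once<S: EventSource, F: FnMut(&str)>(
        &mut self,
        upstream: &S,
        mut handle: F,
    ) -> usize {
        let (events, next_idx) = upstream.list_events(self.cursor);
        for e in &events {
            handle(e.as_str());
        }
        self.cursor = next_idx;
        events.len()
    }
}

/// In-memory upstream used to exercise the poller.
pub struct FakeUpstream {
    pub log: Vec<String>,
}

impl EventSource for FakeUpstream {
    fn list_events(&self, since_idx: i64) -> (Vec<String>, i64) {
        let start = (since_idx.max(0) as usize).min(self.log.len());
        (self.log[start..].to_vec(), self.log.len() as i64)
    }
}
```

If a poll fails partway, the cursor is unchanged and the next poll requests the same range again, which is where the reliability advantage over a push stream comes from. Handlers should therefore tolerate seeing an event more than once.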