
Build Event Log Design

The foundation of persistence for DataBuild is the build event log, a fact table recording events related to build requests, partitions, and jobs. Each graph has exactly one build event log. Other views, potentially materialized, are derived by aggregating over it. For example, the log powers the partition liveness catalog and lets new requests delegate to partition builds that are already in progress.
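As a minimal sketch of how a derived view could aggregate over the log, assume that each event carries a build request ID, a partition reference, and a `PartitionStatus` value, and that events are replayed in append order. The `Event` type, its field names, the `liveness_view` and `delegation_target` helpers, and the choice of which statuses count as "in progress" are all hypothetical. Only the status names and numbers come from the schema in section 1.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, Optional


class PartitionStatus(IntEnum):
    # Mirrors the PartitionStatus enum from the schema.
    PARTITION_UNKNOWN = 0
    PARTITION_REQUESTED = 1
    PARTITION_SCHEDULED = 2
    PARTITION_BUILDING = 3
    PARTITION_AVAILABLE = 4
    PARTITION_FAILED = 5
    PARTITION_STALE = 6
    PARTITION_DELEGATED = 7


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event shape; the BuildEvent message is still TODO.
    build_request_id: str
    partition_ref: str
    status: PartitionStatus


# Assumption: these statuses mean a build for the partition is already under way.
IN_PROGRESS = {PartitionStatus.PARTITION_SCHEDULED, PartitionStatus.PARTITION_BUILDING}


def liveness_view(events: Iterable[Event]) -> Dict[str, Event]:
    """Fold the log (in append order) into the latest event per partition."""
    latest: Dict[str, Event] = {}
    for event in events:
        # Assumption: a DELEGATED event records a request attaching to an
        # existing build, not a change to the partition, so it is skipped.
        if event.status == PartitionStatus.PARTITION_DELEGATED:
            continue
        latest[event.partition_ref] = event  # later events win
    return latest


def delegation_target(view: Dict[str, Event], partition_ref: str) -> Optional[str]:
    """Return the build request already producing this partition, or None."""
    event = view.get(partition_ref)
    if event is not None and event.status in IN_PROGRESS:
        return event.build_request_id
    return None


log = [
    Event("req-1", "sales/2024-01-01", PartitionStatus.PARTITION_REQUESTED),
    Event("req-1", "sales/2024-01-01", PartitionStatus.PARTITION_BUILDING),
    Event("req-1", "users/2024-01-01", PartitionStatus.PARTITION_AVAILABLE),
    Event("req-2", "sales/2024-01-01", PartitionStatus.PARTITION_DELEGATED),
]
view = liveness_view(log)
print(delegation_target(view, "sales/2024-01-01"))  # req-1
print(delegation_target(view, "users/2024-01-01"))  # None
```

Because the view is a pure fold over the log, it could either be recomputed from scratch or maintained incrementally as a materialized view. In both cases the log remains the single source of truth.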

1. Schema

// Partition lifecycle states
enum PartitionStatus {
  PARTITION_UNKNOWN = 0;
  PARTITION_REQUESTED = 1;     // Partition requested but not yet scheduled
  PARTITION_SCHEDULED = 2;     // Job scheduled to produce this partition
  PARTITION_BUILDING = 3;      // Job actively building this partition
  PARTITION_AVAILABLE = 4;     // Partition successfully built and available
  PARTITION_FAILED = 5;        // Partition build failed
  PARTITION_STALE = 6;         // Partition exists but upstream dependencies changed
  PARTITION_DELEGATED = 7;     // Request delegated to existing build
}

// Job lifecycle
enum JobStatus {
  JOB_UNKNOWN = 0;  // proto3 requires the first enum value to be zero
  // TODO implement me
}

// Individual partition activity event
message BuildEvent {
  // TODO implement me
}

Build events are effectively job events, since jobs are the unit of work, but they also record progress toward building specific partitions and their downstream partitions. A build request ID identifies one literal request to the service; the service may accept a caller-provided build request ID. Most build requests are expected to involve multiple partitions, so it should be possible to see the tree structure over time: jobs succeeding, and progress toward building the requested partitions. Each job run should have its own ID so that it can be referenced later.
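To illustrate how the build request ID and job run IDs could be used together, here is a hypothetical sketch. It groups one request's events by job run and summarizes progress toward the partitions the request touched. The `BuildEvent` fields, the use of `None` as the job run ID for request-level events, and the `job_runs` and `request_progress` helpers are assumptions, since the `BuildEvent` message is still TODO. A full dependency tree would also need parent/child links between job runs, which this sketch omits.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class BuildEvent:
    # Hypothetical fields; the schema's BuildEvent message is still TODO.
    build_request_id: str       # one ID per request to the service
    job_run_id: Optional[str]   # assumption: None for request-level events
    partition_ref: str
    status: str                 # a PartitionStatus name, e.g. "PARTITION_BUILDING"


def job_runs(events: Iterable[BuildEvent], build_request_id: str) -> Dict[str, Dict[str, str]]:
    """Group one request's events as job run -> partition -> latest status."""
    runs: Dict[str, Dict[str, str]] = defaultdict(dict)
    for e in events:
        if e.build_request_id == build_request_id and e.job_run_id is not None:
            runs[e.job_run_id][e.partition_ref] = e.status
    return dict(runs)


def request_progress(events: Iterable[BuildEvent], build_request_id: str) -> Tuple[int, int]:
    """Return (available partitions, partitions touched) for one request."""
    latest: Dict[str, str] = {}
    for e in events:
        if e.build_request_id == build_request_id:
            latest[e.partition_ref] = e.status  # later events win
    done = sum(1 for s in latest.values() if s == "PARTITION_AVAILABLE")
    return done, len(latest)


events = [
    BuildEvent("req-1", None, "report/2024-01-01", "PARTITION_REQUESTED"),
    BuildEvent("req-1", "job-a", "raw/2024-01-01", "PARTITION_BUILDING"),
    BuildEvent("req-1", "job-a", "raw/2024-01-01", "PARTITION_AVAILABLE"),
    BuildEvent("req-1", "job-b", "report/2024-01-01", "PARTITION_BUILDING"),
]
print(job_runs(events, "req-1"))
# {'job-a': {'raw/2024-01-01': 'PARTITION_AVAILABLE'}, 'job-b': {'report/2024-01-01': 'PARTITION_BUILDING'}}
print(request_progress(events, "req-1"))  # (1, 2)
```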

TODO narrative

2. Persistence

TODO narrative + design, with requirements:

  • Should target Postgres, SQLite, and Delta tables
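One way to meet the multi-backend requirement is a thin storage interface that each backend implements. The following is a minimal, hypothetical sketch: `StoredEvent`, `BuildEventLog`, `SqliteBuildEventLog`, the `build_events` table, and its columns are all assumptions. Only an SQLite backend is shown, using Python's built-in `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class StoredEvent:
    # Hypothetical row shape for the build event log.
    event_id: str
    build_request_id: str
    job_run_id: Optional[str]
    partition_ref: str
    status: int          # PartitionStatus number from the schema
    timestamp_ns: int


class BuildEventLog(Protocol):
    """Hypothetical backend-agnostic interface (SQLite, Postgres, Delta)."""

    def append(self, event: StoredEvent) -> None: ...

    def events_for_request(self, build_request_id: str) -> List[StoredEvent]: ...


class SqliteBuildEventLog:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            """CREATE TABLE IF NOT EXISTS build_events (
                   event_id TEXT PRIMARY KEY,
                   build_request_id TEXT NOT NULL,
                   job_run_id TEXT,
                   partition_ref TEXT NOT NULL,
                   status INTEGER NOT NULL,
                   timestamp_ns INTEGER NOT NULL)"""
        )

    def append(self, event: StoredEvent) -> None:
        # Events are only inserted, never updated; the `with` block commits.
        with self._conn:
            self._conn.execute(
                "INSERT INTO build_events VALUES (?, ?, ?, ?, ?, ?)",
                (event.event_id, event.build_request_id, event.job_run_id,
                 event.partition_ref, event.status, event.timestamp_ns),
            )

    def events_for_request(self, build_request_id: str) -> List[StoredEvent]:
        rows = self._conn.execute(
            "SELECT event_id, build_request_id, job_run_id, partition_ref,"
            " status, timestamp_ns FROM build_events"
            " WHERE build_request_id = ? ORDER BY timestamp_ns, event_id",
            (build_request_id,),
        ).fetchall()
        return [StoredEvent(*row) for row in rows]


log: BuildEventLog = SqliteBuildEventLog()
log.append(StoredEvent("e1", "req-1", None, "raw/2024-01-01", 1, 100))     # REQUESTED
log.append(StoredEvent("e2", "req-1", "job-a", "raw/2024-01-01", 3, 200))  # BUILDING
log.append(StoredEvent("e3", "req-2", None, "raw/2024-01-02", 1, 150))     # REQUESTED
print([e.event_id for e in log.events_for_request("req-1")])  # ['e1', 'e2']
```

A Postgres backend could reuse most of this SQL. The main difference is the driver's placeholder style (for example, `%s` in psycopg). A Delta table backend would instead append batches of rows through a Delta Lake writer. The interface hides these differences from the views built on top of the log.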

3. Access Layer

TODO narrative + design