Build Event Log Design
The foundation of persistence for DataBuild is the build event log, a fact table recording events related to build requests, partitions, and jobs. Each graph has exactly one build event log, from which other views (potentially materialized) are derived and aggregated, e.g. powering the partition liveness catalog and enabling delegation to in-progress partition builds.
1. Schema
The build event log is an append-only event stream that captures all build-related activity. Each event represents a state change in a build request, partition, or job lifecycle.
// Partition lifecycle states
enum PartitionStatus {
  PARTITION_UNKNOWN = 0;
  PARTITION_REQUESTED = 1;  // Partition requested but not yet scheduled
  PARTITION_SCHEDULED = 2;  // Job scheduled to produce this partition
  PARTITION_BUILDING = 3;   // Job actively building this partition
  PARTITION_AVAILABLE = 4;  // Partition successfully built and available
  PARTITION_FAILED = 5;     // Partition build failed
  PARTITION_DELEGATED = 6;  // Request delegated to existing build
}

// Job execution lifecycle
enum JobStatus {
  JOB_UNKNOWN = 0;
  JOB_SCHEDULED = 1;  // Job scheduled for execution
  JOB_RUNNING = 2;    // Job actively executing
  JOB_COMPLETED = 3;  // Job completed successfully
  JOB_FAILED = 4;     // Job execution failed
  JOB_CANCELLED = 5;  // Job execution cancelled
}

// Build request lifecycle
enum BuildRequestStatus {
  BUILD_REQUEST_UNKNOWN = 0;
  BUILD_REQUEST_RECEIVED = 1;   // Build request received and queued
  BUILD_REQUEST_PLANNING = 2;   // Graph analysis in progress
  BUILD_REQUEST_EXECUTING = 3;  // Jobs are being executed
  BUILD_REQUEST_COMPLETED = 4;  // All requested partitions built
  BUILD_REQUEST_FAILED = 5;     // Build request failed
  BUILD_REQUEST_CANCELLED = 6;  // Build request cancelled
}

// Individual build event
message BuildEvent {
  // Event metadata
  string event_id = 1;          // UUID for this event
  int64 timestamp = 2;          // Unix timestamp (nanoseconds)
  string build_request_id = 3;  // UUID of the build request

  // Event type and payload (one of)
  oneof event_type {
    BuildRequestEvent build_request_event = 10;
    PartitionEvent partition_event = 11;
    JobEvent job_event = 12;
    DelegationEvent delegation_event = 13;
  }
}

// Build request lifecycle event
message BuildRequestEvent {
  BuildRequestStatus status = 1;
  repeated PartitionRef requested_partitions = 2;
  string message = 3;  // Optional status message
}

// Partition state change event
message PartitionEvent {
  PartitionRef partition_ref = 1;
  PartitionStatus status = 2;
  string message = 3;     // Optional status message
  string job_run_id = 4;  // UUID of job run producing this partition (if applicable)
}

// Job execution event
message JobEvent {
  string job_run_id = 1;                        // UUID for this job run
  JobLabel job_label = 2;                       // Job being executed
  repeated PartitionRef target_partitions = 3;  // Partitions this job run produces
  JobStatus status = 4;
  string message = 5;                           // Optional status message
  JobConfig config = 6;                         // Job configuration used (for SCHEDULED events)
  repeated PartitionManifest manifests = 7;     // Results (for COMPLETED events)
}

// Delegation event (when build request delegates to existing build)
message DelegationEvent {
  PartitionRef partition_ref = 1;
  string delegated_to_build_request_id = 2;  // Build request handling this partition
  string message = 3;                        // Optional message
}
Build events capture the complete lifecycle of composite build requests. A single build request can involve multiple partitions, each potentially requiring different jobs. The event stream allows reconstruction of the full state at any point in time.
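As an illustration of state reconstruction, the sketch below replays partition events into a per-partition status map. The types are simplified stand-ins for the generated protobuf types (and, unlike the real schema, carry the build request ID directly on the partition event for brevity); this is not the production code.

use std::collections::HashMap;

// Simplified stand-ins for the generated protobuf types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PartitionStatus {
    Requested,
    Scheduled,
    Building,
    Available,
    Failed,
    Delegated,
}

struct PartitionEvent {
    timestamp: i64,            // Unix timestamp (nanoseconds)
    build_request_id: String,  // carried on the BuildEvent envelope in the real schema
    partition_ref: String,
    status: PartitionStatus,
}

// Fold an event stream (assumed sorted by timestamp) into the latest status
// per partition as of `as_of`; later events overwrite earlier ones.
fn partition_state_at(events: &[PartitionEvent], as_of: i64) -> HashMap<String, PartitionStatus> {
    let mut state = HashMap::new();
    for event in events.iter().filter(|e| e.timestamp <= as_of) {
        state.insert(event.partition_ref.clone(), event.status);
    }
    state
}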
Design Principles
Staleness as Planning Concern: Staleness detection and handling occurs during the analysis/planning phase, not during execution. The analyze operation detects partitions that need rebuilding due to upstream changes and includes them in the execution graph. In-progress builds do not react to newly stale partitions - they execute their planned graph to completion.
Delegation as Unidirectional Optimization: When a build request discovers another build is already producing a needed partition, it logs a delegation event and waits for that partition to become available. The delegated-to build request remains unaware of the delegation - it simply continues building its own graph. This eliminates the need for coordination protocols between builds.
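To make the delegation rule concrete, here is a minimal sketch of the planning-time decision, reusing the simplified PartitionEvent type from the previous example: if the latest event for a partition shows another build request already working on it, this request records a DelegationEvent and waits for a later PARTITION_AVAILABLE event instead of scheduling its own job.

// Returns the build request ID to delegate to, if any. `events_for_partition`
// is assumed sorted by timestamp; the delegated-to build is never notified.
fn delegate_to(events_for_partition: &[PartitionEvent], my_build_request_id: &str) -> Option<String> {
    let latest = events_for_partition.last()?;
    let in_progress = matches!(
        latest.status,
        PartitionStatus::Requested | PartitionStatus::Scheduled | PartitionStatus::Building
    );
    if in_progress && latest.build_request_id != my_build_request_id {
        Some(latest.build_request_id.clone())  // becomes delegated_to_build_request_id
    } else {
        None
    }
}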
2. Persistence
The build event log is persisted as a core build_events table plus one table per event type (the normalized schema below). This design supports multiple storage backends while maintaining consistency.
Storage Requirements
- PostgreSQL: Primary production backend
- SQLite: Local development and testing
- Delta tables: Future extensibility for analytics workloads
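A sketch of how a persistence URI (the --build-event-log value described in section 4) might be dispatched to one of these backends. The type and function names here are hypothetical, and the Delta backend is omitted since it is future work.

// Hypothetical backend selector keyed on the URI scheme.
enum EventLogBackend {
    Stdout,
    Sqlite { path: String },
    Postgres { connection: String },
}

fn backend_from_uri(uri: &str) -> Result<EventLogBackend, String> {
    if uri == "stdout" {
        Ok(EventLogBackend::Stdout)
    } else if let Some(path) = uri.strip_prefix("sqlite://") {
        Ok(EventLogBackend::Sqlite { path: path.to_string() })
    } else if uri.starts_with("postgres://") {
        Ok(EventLogBackend::Postgres { connection: uri.to_string() })
    } else {
        Err(format!("unsupported build event log URI: {uri}"))
    }
}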
Table Schema
-- Core event metadata
CREATE TABLE build_events (
    event_id UUID PRIMARY KEY,
    timestamp BIGINT NOT NULL,
    build_request_id UUID NOT NULL,
    event_type TEXT NOT NULL  -- 'build_request', 'partition', 'job', 'delegation'
);

-- Build request lifecycle events
CREATE TABLE build_request_events (
    event_id UUID PRIMARY KEY REFERENCES build_events(event_id),
    status TEXT NOT NULL,  -- BuildRequestStatus enum
    requested_partitions TEXT[] NOT NULL,
    message TEXT
);

-- Partition lifecycle events
CREATE TABLE partition_events (
    event_id UUID PRIMARY KEY REFERENCES build_events(event_id),
    partition_ref TEXT NOT NULL,
    status TEXT NOT NULL,  -- PartitionStatus enum
    message TEXT,
    job_run_id UUID  -- NULL for non-job-related events
);

-- Job execution events
CREATE TABLE job_events (
    event_id UUID PRIMARY KEY REFERENCES build_events(event_id),
    job_run_id UUID NOT NULL,
    job_label TEXT NOT NULL,
    target_partitions TEXT[] NOT NULL,
    status TEXT NOT NULL,   -- JobStatus enum
    message TEXT,
    config_json TEXT,       -- JobConfig as JSON (for SCHEDULED events)
    manifests_json TEXT,    -- PartitionManifests as JSON (for COMPLETED events)
    start_time BIGINT,      -- Extracted from config/manifests
    end_time BIGINT         -- Extracted from config/manifests
);

-- Delegation events
CREATE TABLE delegation_events (
    event_id UUID PRIMARY KEY REFERENCES build_events(event_id),
    partition_ref TEXT NOT NULL,
    delegated_to_build_request_id UUID NOT NULL,
    message TEXT
);
-- Indexes for common query patterns
CREATE INDEX idx_build_events_build_request ON build_events(build_request_id, timestamp);
CREATE INDEX idx_build_events_timestamp ON build_events(timestamp);
CREATE INDEX idx_partition_events_partition ON partition_events(partition_ref, timestamp);
CREATE INDEX idx_partition_events_job_run ON partition_events(job_run_id, timestamp) WHERE job_run_id IS NOT NULL;
CREATE INDEX idx_job_events_job_run ON job_events(job_run_id);
CREATE INDEX idx_job_events_job_label ON job_events(job_label, timestamp);
CREATE INDEX idx_job_events_status ON job_events(status, timestamp);
CREATE INDEX idx_delegation_events_partition ON delegation_events(partition_ref, timestamp);
CREATE INDEX idx_delegation_events_delegated_to ON delegation_events(delegated_to_build_request_id, timestamp);
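With this schema, the append path for the SQLite backend might look like the sketch below (rusqlite is an assumed client library, and UUID and TEXT[] columns map to TEXT in SQLite). The core build_events row and the event-type row are written in one transaction so readers never observe a dangling event_id.

use rusqlite::{params, Connection};

// Sketch: append one partition event atomically across the two tables.
fn append_partition_event(
    conn: &mut Connection,
    event_id: &str,
    timestamp_ns: i64,
    build_request_id: &str,
    partition_ref: &str,
    status: &str,              // PartitionStatus name, e.g. "PARTITION_AVAILABLE"
    job_run_id: Option<&str>,  // NULL for non-job-related events
) -> rusqlite::Result<()> {
    let tx = conn.transaction()?;
    tx.execute(
        "INSERT INTO build_events (event_id, timestamp, build_request_id, event_type)
         VALUES (?1, ?2, ?3, 'partition')",
        params![event_id, timestamp_ns, build_request_id],
    )?;
    tx.execute(
        "INSERT INTO partition_events (event_id, partition_ref, status, message, job_run_id)
         VALUES (?1, ?2, ?3, NULL, ?4)",
        params![event_id, partition_ref, status, job_run_id],
    )?;
    tx.commit()
}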
3. Access Layer
The access layer provides a simple append/query interface for build events, leaving aggregation logic to the service layer.
Core Interface
The normalized schema enables both simple event queries and complex analytical queries:
trait BuildEventLog {
    // Append new event to the log
    async fn append_event(&self, event: BuildEvent) -> Result<(), Error>;

    // Query events by build request
    async fn get_build_request_events(
        &self,
        build_request_id: &str,
        since: Option<i64>,
    ) -> Result<Vec<BuildEvent>, Error>;

    // Query events by partition
    async fn get_partition_events(
        &self,
        partition_ref: &str,
        since: Option<i64>,
    ) -> Result<Vec<BuildEvent>, Error>;

    // Query events by job run
    async fn get_job_run_events(
        &self,
        job_run_id: &str,
    ) -> Result<Vec<BuildEvent>, Error>;

    // Query events in time range
    async fn get_events_in_range(
        &self,
        start_time: i64,
        end_time: i64,
    ) -> Result<Vec<BuildEvent>, Error>;

    // Execute raw SQL queries (for dashboard and debugging)
    async fn execute_query(&self, query: &str) -> Result<QueryResult, Error>;
}
Example Analytical Queries
The normalized schema enables dashboard queries like:
-- Job success rates by label
SELECT job_label,
COUNT(*) as total_runs,
SUM(CASE WHEN status = 'JOB_COMPLETED' THEN 1 ELSE 0 END) as successful_runs,
AVG(end_time - start_time) as avg_duration_ns
FROM job_events
WHERE status IN ('JOB_COMPLETED', 'JOB_FAILED')
GROUP BY job_label;
-- Recent partition builds
SELECT p.partition_ref, p.status, e.timestamp, j.job_label
FROM partition_events p
JOIN build_events e ON p.event_id = e.event_id
LEFT JOIN job_events j ON p.job_run_id = j.job_run_id
WHERE p.status = 'PARTITION_AVAILABLE'
ORDER BY e.timestamp DESC
LIMIT 100;
-- Build request status summary
SELECT br.status, COUNT(*) as count
FROM build_request_events br
JOIN build_events e ON br.event_id = e.event_id
WHERE e.timestamp > extract(epoch from now() - interval '24 hours') * 1000000000
GROUP BY br.status;
The service layer builds higher-level operations on top of both the simple interface and direct SQL access.
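For example, a service-layer helper might derive the current status of a build request by replaying its events through the interface above. This is a sketch only: it assumes the prost-generated BuildEvent types (with the oneof exposed as build_event::EventType) and that events are returned in timestamp order.

async fn current_build_request_status<L: BuildEventLog>(
    log: &L,
    build_request_id: &str,
) -> Result<Option<BuildRequestStatus>, Error> {
    let events = log.get_build_request_events(build_request_id, None).await?;
    // The last BuildRequestEvent in timestamp order carries the current status.
    let mut latest = None;
    for event in events {
        if let Some(build_event::EventType::BuildRequestEvent(e)) = event.event_type {
            latest = Some(e.status());
        }
    }
    Ok(latest)
}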
4. Core Build Implementation Integration
Command Line Interface
The core build implementation (analyze.rs and execute.rs) will be enhanced with build event logging capabilities through new command line arguments:
# Standard usage with build event logging
./analyze partition_ref1 partition_ref2
./execute --build-event-log sqlite:///tmp/build.db < job_graph.json
# With explicit build request ID for correlation
./analyze --build-event-log postgres://user:pass@host/db --build-request-id 12345678-1234-1234-1234-123456789012
New Command Line Arguments (a parsing sketch follows the list):
- --build-event-log <URI> - Specify persistence URI for build events (logging to stdout is implicit)
  - sqlite://path - Persist to SQLite database file
  - postgres://connection - Persist to PostgreSQL database
- --build-request-id <UUID> - Optional build request ID (auto-generated if not provided)
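The flags might be parsed as in the sketch below, which assumes the clap derive API is used in analyze.rs and execute.rs.

use clap::Parser;

// Sketch of the new arguments shared by analyze.rs and execute.rs.
#[derive(Parser, Debug)]
struct BuildEventLogArgs {
    /// Persistence URI for build events, e.g. sqlite:///tmp/build.db or
    /// postgres://user:pass@host/db; logging to stdout is implicit.
    #[arg(long = "build-event-log")]
    build_event_log: Option<String>,

    /// Build request ID used to correlate the analyze and execute phases;
    /// auto-generated if not provided.
    #[arg(long = "build-request-id")]
    build_request_id: Option<String>,
}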
Integration Points
In analyze.rs (Graph Analysis Phase):
- Build Request Lifecycle: Log BUILD_REQUEST_RECEIVED when analysis starts, BUILD_REQUEST_PLANNING during dependency resolution, and BUILD_REQUEST_COMPLETED when analysis finishes
- Staleness Detection: Query the build event log for existing PARTITION_AVAILABLE events to identify non-stale partitions that can be skipped
- Delegation Logging: Log PARTITION_DELEGATED events when skipping partitions that are already being built by another request
- Job Planning: Log PARTITION_SCHEDULED events for partitions that will be built
In execute.rs (Graph Execution Phase):
- Execution Lifecycle: Log BUILD_REQUEST_EXECUTING when execution starts
- Job Execution Events: Log JOB_SCHEDULED, JOB_RUNNING, and JOB_COMPLETED/JOB_FAILED events throughout job execution (see the sketch after this list)
- Partition Status: Log PARTITION_BUILDING when jobs start, PARTITION_AVAILABLE/PARTITION_FAILED when jobs complete
- Build Coordination: Check for concurrent builds before starting partition work to avoid duplicate effort
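The sketch below shows the intended event ordering around a single job run. It is illustrative only: it assumes the BuildEventLog trait from section 3 and two hypothetical helpers, job_event and partition_event, that wrap payloads in the BuildEvent envelope, and it elides the actual job invocation.

async fn run_job_with_events<L: BuildEventLog>(
    log: &L,
    build_request_id: &str,
    job_run_id: &str,
    target_partitions: &[String],
) -> Result<(), Error> {
    // Job starts: JOB_RUNNING plus PARTITION_BUILDING for each target partition.
    log.append_event(job_event(build_request_id, job_run_id, JobStatus::JobRunning)).await?;
    for p in target_partitions {
        log.append_event(partition_event(build_request_id, p, PartitionStatus::PartitionBuilding, Some(job_run_id))).await?;
    }

    // ... invoke the job and collect its exit status here ...
    let succeeded = true;  // placeholder for the real outcome

    // Job finishes: terminal job status plus terminal partition statuses.
    let (job_status, partition_status) = if succeeded {
        (JobStatus::JobCompleted, PartitionStatus::PartitionAvailable)
    } else {
        (JobStatus::JobFailed, PartitionStatus::PartitionFailed)
    };
    log.append_event(job_event(build_request_id, job_run_id, job_status)).await?;
    for p in target_partitions {
        log.append_event(partition_event(build_request_id, p, partition_status, Some(job_run_id))).await?;
    }
    Ok(())
}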
Non-Stale Partition Handling
The build event log enables intelligent partition skipping:
- During Analysis: Query for recent PARTITION_AVAILABLE events to identify partitions that don't need rebuilding
- Staleness Logic: Compare partition timestamps with upstream dependency timestamps to determine if rebuilding is needed (a sketch follows this list)
- Skip Documentation: Log PARTITION_DELEGATED events with references to the existing build request ID that produced the partition
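A minimal sketch of the staleness comparison: a partition can be skipped only if its latest PARTITION_AVAILABLE timestamp is at least as recent as that of every upstream dependency (all timestamps are the nanosecond values recorded in the event log).

// Returns true if the partition is fresh and can be skipped during planning.
fn is_fresh(partition_available_at: i64, upstream_available_at: &[i64]) -> bool {
    upstream_available_at
        .iter()
        .all(|&upstream| upstream <= partition_available_at)
}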
Bazel Rules Integration
The databuild_graph rule in rules.bzl will be enhanced to propagate build event logging configuration:
databuild_graph(
    name = "my_graph",
    jobs = [":job1", ":job2"],
    lookup = ":job_lookup",
    build_event_log = "sqlite:///tmp/builds.db",  # New attribute
)
Generated Targets Enhancement:
- my_graph_analyze: Receives the --build-event-log argument
- my_graph_exec: Receives the --build-event-log argument
- my_graph_build: Coordinates the build request ID across analyze/execute phases
Implementation Strategy
Phase 1: Infrastructure
- Add the BuildEventLog trait and implementations for stdout/SQLite/PostgreSQL
- Update databuild.proto with the build event schema
- Add command line argument parsing to analyze.rs and execute.rs
Phase 2: Analysis Integration
- Integrate build event logging into analyze.rs
- Implement staleness detection queries
- Add partition delegation logic
Phase 3: Execution Integration
- Integrate build event logging into execute.rs
- Add job lifecycle event logging
- Implement build coordination checks
Phase 4: Bazel Integration
- Update the databuild_graph rule with build event log support
- Add proper argument propagation and request ID correlation
- End-to-end testing with example graphs
Key Benefits
- Stdout Logging: Immediate visibility into build progress with --build-event-log stdout
- Persistent History: Database persistence enables build coordination and historical analysis
- Intelligent Skipping: Avoid rebuilding fresh partitions, significantly improving build performance
- Build Coordination: Prevent duplicate work when multiple builds target the same partitions
- Audit Trail: Complete record of all build activities for debugging and monitoring