databuild/docs/design/build-event-log.md
Stuart Axelbrooke ea83610d35
A lot of refactoring
2025-09-27 15:29:22 -07:00


# Build Event Log (BEL)
Purpose: Store build events and provide efficient cross-graph coordination via a minimal, append-only event stream.
## Architecture
- Follows the [event sourcing](https://martinfowler.com/eaaDev/EventSourcing.html) / [CQRS](https://www.wikipedia.org/wiki/cqrs) philosophy.
- BELs are only ever written to by graph processes (e.g. CLI or service), not the jobs themselves.
- **Three-layer architecture:**
    1. **Storage Layer**: Append-only event storage with sequential scanning
    2. **Query Engine Layer**: App-layer aggregation for entity queries (partition status, build summaries, etc.)
    3. **Client Layer**: CLI, Service, Dashboard consuming aggregated views
- **Cross-graph coordination** via a minimal `GraphService` API that supports streaming events since a given index
- Storage backends focus on efficient append + sequential scan operations (file-based, SQLite, Postgres, Delta Lake)
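
To make the "append + sequential scan" contract concrete, here is a minimal in-memory sketch. It is not one of the listed backends: `MemoryLog`, `scan`, and the single-field `BuildEvent` are hypothetical stand-ins, and treating `since_idx` as exclusive is an assumption rather than something this document specifies.

```rust
use std::sync::Mutex;

/// Simplified stand-in for the proto-generated `BuildEvent` (hypothetical field).
#[derive(Debug, Clone, PartialEq)]
pub struct BuildEvent {
    pub payload: String,
}

/// Hypothetical in-memory backend: a Vec whose positions are the event indices.
#[derive(Default)]
pub struct MemoryLog {
    events: Mutex<Vec<BuildEvent>>,
}

impl MemoryLog {
    /// Append an event and return its index; existing entries are never modified.
    pub fn append_event(&self, event: BuildEvent) -> i64 {
        let mut events = self.events.lock().unwrap();
        events.push(event);
        (events.len() - 1) as i64
    }

    /// Sequentially scan up to `limit` events with index greater than `since_idx`
    /// (assumption: `since_idx` is exclusive, so -1 reads from the start).
    pub fn scan(&self, since_idx: i64, limit: usize) -> Vec<(i64, BuildEvent)> {
        let events = self.events.lock().unwrap();
        events
            .iter()
            .enumerate()
            .skip((since_idx + 1).max(0) as usize)
            .take(limit)
            .map(|(i, e)| (i as i64, e.clone()))
            .collect()
    }
}
```

Because the index is just the position in the log, a reader only needs to remember the last index it saw in order to resume scanning.
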
## Correctness Strategy
- The access layer evaluates each event requested to be written, returning an error if the event is not a valid next state according to the involved component's governing state diagram.
- Events are versioned, with each version's schema stored in [`databuild.proto`](../databuild/databuild.proto).
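
The write-time check described above could look roughly like the following sketch. The `PartitionState` variants and the allowed transitions are invented for illustration; the real state diagrams and event schemas are the ones defined in `databuild.proto`.

```rust
/// Hypothetical partition lifecycle, used only to illustrate the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionState {
    Requested,
    Building,
    Available,
    Failed,
}

/// Returns true if `next` is a legal successor of `current` in this assumed diagram.
/// `None` means no event has been recorded for the partition yet.
pub fn is_valid_transition(current: Option<PartitionState>, next: PartitionState) -> bool {
    use PartitionState::*;
    matches!(
        (current, next),
        (None, Requested)
            | (Some(Requested), Building)
            | (Some(Building), Available)
            | (Some(Building), Failed)
            | (Some(Failed), Requested)
    )
}

/// Access-layer check: accept the event's state or reject the write with an error.
pub fn check_event(
    current: Option<PartitionState>,
    next: PartitionState,
) -> Result<PartitionState, String> {
    if is_valid_transition(current, next) {
        Ok(next)
    } else {
        Err(format!("invalid transition: {:?} -> {:?}", current, next))
    }
}
```

Rejecting the write before it reaches storage keeps the log free of impossible histories, so readers can replay it without defensive checks.
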
## Storage Layer Interface
Minimal append-only interface optimized for sequential scanning:
```rust
trait BELStorage {
    fn append_event(&self, event: BuildEvent) -> Result<i64>; // returns event index
    fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> Result<EventPage>;
}
```
Where `EventFilter` is defined in `databuild.proto` as:
```protobuf
message EventFilter {
  repeated string partition_refs = 1;     // Exact partition matches
  repeated string partition_patterns = 2; // Glob patterns like "data/users/*"
  repeated string job_labels = 3;         // Job-specific events
  repeated string job_run_ids = 4;        // Job run events
}
```
The data build state is then derived on top of this interface, as a reducer over the BEL event stream.
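
As a rough illustration of "state as a reducer", the sketch below folds events into a latest-status-per-partition view. The `BuildEvent` fields, the `BuildState` struct, and the status strings are hypothetical simplifications, not the actual schema.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the `BuildEvent` message (hypothetical fields).
#[derive(Debug, Clone)]
pub struct BuildEvent {
    pub partition_ref: String,
    pub status: String, // e.g. "requested", "building", "available"
}

/// Derived view: latest known status per partition.
#[derive(Debug, Default, PartialEq)]
pub struct BuildState {
    pub partition_status: HashMap<String, String>,
}

/// Reducer step: fold one event into the state. Later events override earlier ones.
pub fn apply(mut state: BuildState, event: &BuildEvent) -> BuildState {
    state
        .partition_status
        .insert(event.partition_ref.clone(), event.status.clone());
    state
}

/// Rebuild the state from scratch by replaying the log in index order.
pub fn reduce(events: &[BuildEvent]) -> BuildState {
    events.iter().fold(BuildState::default(), apply)
}
```

Since the state is a pure function of the event sequence, it can be discarded and rebuilt at any time, or updated incrementally by applying only the events after the last seen index.
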
## Cross-Graph Coordination
Graphs coordinate via the `GraphService` API for efficient event streaming:
```rust
trait GraphService {
    async fn list_events(&self, since_idx: i64, filter: EventFilter, limit: i64) -> Result<EventPage>;
}
```
This enables:
- **Event-driven reactivity**: Downstream graphs react within seconds of upstream partition availability
- **Efficient subscriptions**: Only scan events for relevant partitions
- **Reliable coordination**: HTTP polling avoids the event-loss issues of streaming APIs
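
One way a downstream graph might consume this API is a cursor-based polling step, sketched below. It is synchronous and omits `EventFilter` for brevity; `EventSource`, `InMemoryUpstream`, `IndexedEvent`, and `poll_once` are hypothetical names, and the exclusive interpretation of `since_idx` is an assumption.

```rust
/// Simplified stand-in for an event plus its log index (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct IndexedEvent {
    pub idx: i64,
    pub partition_ref: String,
}

/// Simplified stand-in for `EventPage`.
pub struct EventPage {
    pub events: Vec<IndexedEvent>,
}

/// Synchronous stand-in for the upstream `GraphService`.
pub trait EventSource {
    fn list_events(&self, since_idx: i64, limit: i64) -> EventPage;
}

/// In-memory upstream used only to exercise the polling logic.
pub struct InMemoryUpstream {
    pub log: Vec<IndexedEvent>,
}

impl EventSource for InMemoryUpstream {
    fn list_events(&self, since_idx: i64, limit: i64) -> EventPage {
        let events = self
            .log
            .iter()
            .filter(|e| e.idx > since_idx) // assumption: `since_idx` is exclusive
            .take(limit.max(0) as usize)
            .cloned()
            .collect();
        EventPage { events }
    }
}

/// One polling step: fetch a page after `cursor` and return it with the new cursor.
/// The cursor only advances past events that were actually received.
pub fn poll_once(source: &dyn EventSource, cursor: i64, limit: i64) -> (Vec<IndexedEvent>, i64) {
    let page = source.list_events(cursor, limit);
    let next_cursor = page.events.last().map(|e| e.idx).unwrap_or(cursor);
    (page.events, next_cursor)
}
```

In this sketch the consumer owns the cursor, so a failed or skipped poll simply retries from the same index; that is one reading of why polling sidesteps the event loss a dropped stream connection can cause.
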