Compare commits
108 commits
| SHA1 | Author | Date | |
|---|---|---|---|
| 6b42bdb0ef | |||
| 21633c69c3 | |||
| 8176a8261e | |||
| e221cd8502 | |||
| 421544786f | |||
| 23c3572106 | |||
| 9c6cb11713 | |||
| 17d5987517 | |||
| f353660f97 | |||
| 6cb11af642 | |||
| 368558d9d8 | |||
| d812bb51e2 | |||
| 9a072ff74d | |||
| f7c196c9b3 | |||
| 704ec0b6f3 | |||
| ce8bb92cdb | |||
| d744d2a63f | |||
| 5c720ebc62 | |||
| 4a1ff75ea9 | |||
| f531730a6b | |||
| 14a24ef6d6 | |||
| e7aac32607 | |||
| efed281a5a | |||
| 042526ea8c | |||
| 31db6a00cb | |||
| 375a15d9e9 | |||
| a7ac85917c | |||
| 0d1cac6406 | |||
| 5ac51934ea | |||
| 7ccec59364 | |||
| dfc1d19237 | |||
| 61c13cdcc0 | |||
| c556fec218 | |||
| 5a768e9270 | |||
| 26b085a84e | |||
| 6508809745 | |||
| 7846cd6b86 | |||
| b978be53f5 | |||
| 895e499cc5 | |||
| f71be8482f | |||
| 7134b5e480 | |||
| 32f35ecbd5 | |||
| a5a1be8855 | |||
| 556ccb8a4b | |||
| f14d93da7a | |||
| be2b15de5e | |||
| da23af3227 | |||
| 6f7c6b3318 | |||
| 2084fadbb6 | |||
| cf163b294d | |||
| eb44350865 | |||
| 8e8ff33ef8 | |||
| 01d50dde1b | |||
| c8e2b4fdaf | |||
| 4af41533d4 | |||
| 0c766a381b | |||
| a9b68bfa6a | |||
| a641822ead | |||
| c96d77dd7f | |||
| d8af3c8174 | |||
| 55f51125c3 | |||
| 8208af6605 | |||
| a43e9fb6ea | |||
| eadd23eb63 | |||
| d42bddac90 | |||
| 66ba40e2db | |||
| 9bdd435089 | |||
| 2cf778a07b | |||
| 5361e295e0 | |||
| 75ef722a2c | |||
| bbeceaa015 | |||
| 1bca863be1 | |||
| 3f223829bb | |||
| aa2106ad8c | |||
| 2cd2ce7f7d | |||
| 9559a410d3 | |||
| cb580f83eb | |||
| 1f4138ecc0 | |||
| eeb90d0386 | |||
| 6572d4e3bd | |||
| cfcb201285 | |||
| 7debea96a2 | |||
| fa5a5fa200 | |||
| d7fb2323d8 | |||
| ea85af4d2b | |||
| bea9616227 | |||
| 873f766aa0 | |||
| bc61d8f530 | |||
| ac567240ea | |||
| 022868b7b0 | |||
| 4e28b6048e | |||
| f388f4d86d | |||
| ea83610d35 | |||
| c07cf7cd81 | |||
| b8cfdade16 | |||
| 5484363e52 | |||
| 2be5b016eb | |||
| 9342ae6816 | |||
| 97ddb3ae28 | |||
| 526b826091 | |||
| 2edfe90fd4 | |||
| 2009ac1c12 | |||
| f7ac3c077e | |||
| c2bd4f230c | |||
| cf449529a3 | |||
| a78c6fc5fb | |||
| 8ba4820654 | |||
| bfcf7cdfd2 |
262 changed files with 16625 additions and 34138 deletions

@@ -1 +1 @@
-8.3.1
+8.4.2

579  .claude/skills/databuild-build-state-semantics/SKILL.md  (new file)
@@ -0,0 +1,579 @@
---
name: databuild-build-state-semantics
description: The core semantics of databuild's build state; conceptual boundaries, responsibilities, and interfaces; the architecture mental model and rationale; read when discussing architecture or design.
---

# Build State Semantics

To achieve databuild's goal of declarative partitioned data builds (via explicitly stated data dependencies between jobs), databuild employs a "build state" concept that, together with the orchestrator, makes up all of the data catalog and job run scheduling logic needed to produce data based on user wants.

## Core Mental Model

DataBuild's BuildState implements **event sourcing** combined with an **Entity Component System (ECS)** pattern:

- **Immutable event log**: All state changes recorded as events (WantCreateEventV1, JobRunBufferEventV1, etc.)
- **Derived mutable state**: BuildState reconstructed by replaying events through state machines
- **ECS pattern**: Entities stored in flat collections, relationships via inverted indexes (not nested objects)
- **Type-state machines**: Compile-time enforcement of valid state transitions

**Public interface**:
- Query: `get_want()`, `list_partitions()`, `get_partition()`, etc.
- Mutation: `handle_event()` processes events and transitions states
- No direct state manipulation outside event handling

**Separation of concerns**:
- **BuildState**: Maintains entity state, processes events, provides queries
- **Orchestrator**: Polls BuildState, makes scheduling decisions, emits job events
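A minimal sketch of the replay relationship (the event enum, fields, and helper index here are illustrative stand-ins, not the real generated types):

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins; the actual events/entities are richer, protobuf-backed types.
enum BuildEvent {
    WantCreateV1 { want_id: String, partitions: Vec<String> },
    JobRunBufferV1 { job_run_id: String, partition_refs: Vec<String> },
}

#[derive(Default)]
struct BuildState {
    wants_for_partition: BTreeMap<String, Vec<String>>, // partition_ref → want_ids
}

impl BuildState {
    /// Rebuild all derived state by replaying the immutable event log in order.
    fn replay(events: &[BuildEvent]) -> BuildState {
        let mut state = BuildState::default();
        for event in events {
            state.handle_event(event); // the only mutation path
        }
        state
    }

    fn handle_event(&mut self, event: &BuildEvent) {
        match event {
            BuildEvent::WantCreateV1 { want_id, partitions } => {
                for p in partitions {
                    self.wants_for_partition
                        .entry(p.clone())
                        .or_default()
                        .push(want_id.clone());
                }
            }
            BuildEvent::JobRunBufferV1 { .. } => { /* transition partitions/wants here */ }
        }
    }
}
```
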
## Compile-Time Correctness Strategy

The primary defense against bugs is making invalid states **unrepresentable** at compile time.

**Type-state pattern**: States encoded in type system, transitions consume self

```rust
// Can only call .complete() on BuildingState
impl PartitionWithState<BuildingState> {
    pub fn complete(self, job_run_id: String, timestamp: u64) -> PartitionWithState<LiveState> {
        PartitionWithState {
            partition_ref: self.partition_ref,
            state: LiveState { built_at: timestamp, built_by: job_run_id },
        }
    }
}

// Cannot call .complete() on LiveState - method doesn't exist
impl PartitionWithState<LiveState> {
    pub fn taint(self, taint_id: String, timestamp: u64) -> PartitionWithState<TaintedState> { ... }
}
```

**Benefits**:
- Invalid transitions caught at compile time: `live_partition.complete()` → compile error
- Refactoring safety: compiler guides you through state machine changes
- Self-documenting: `fn schedule(want: WantWithState<IdleState>)` encodes precondition
- Fast feedback loop: seconds (compile error) vs minutes (runtime panic) vs days (production bug)

**Runtime panics reserved for invariant violations** (bugs in BuildState implementation):
- Missing references: `partitions_by_uuid[uuid]` doesn't exist → panic with context
- Index inconsistencies: `canonical_partitions[ref]` points to invalid UUID → panic
- These should never happen in correct implementation

## Architectural Layering

Three entity types with pragmatic data flow:

```
Wants (user requests for data)
  ↓ references partition refs (Vec<PartitionRef>)
Partitions (data artifacts being built)
  ↓ building_by/built_by job_run_ids (tracking)
  ↑ wants_for_partition inverted index
JobRuns (execution processes)
```

**Direct references**:
- Wants → Partitions: wants store `partitions: Vec<PartitionRef>`
- JobRuns → Partitions: jobs store `building_partition_uuids: Vec<Uuid>`
- Partitions → JobRuns: partitions store `building_by: Vec<String>` (job_run_ids)

**Inverted index**:
- Partitions → Wants: `wants_for_partition: BTreeMap<String, Vec<String>>`
- Maps partition_ref → want_ids waiting for it
- Why not direct? Partitions keyed by UUID, but wants use partition_ref for mapping
- Efficient lookup: "which wants are waiting for partition ref X?"

**Intentional separation**:
- JobRuns don't know about Wants (jobs build partitions, agnostic to requesters)
- Wants don't know about JobRuns (users care about data availability, not execution)

## Entity State Machines

### Want States

```
New → {Idle, Building, UpstreamBuilding, Successful, Failed, UpstreamFailed, Canceled}
```

**State semantics**: "What is the current status of my requested partitions?"

- **New**: Just created, state not yet determined (ephemeral, transitions immediately)
- **Idle**: Partitions don't exist or are ready to retry (UpForRetry) → schedulable
- **Building**: Canonical partitions currently being built by jobs
- **UpstreamBuilding**: Canonical partitions waiting for upstream dependencies
- **Successful**: All canonical partitions are Live
- **Failed**: Canonical partition hard failure (shouldn't retry)
- **UpstreamFailed**: Canonical partition's upstream failed (can't succeed)
- **Canceled**: Explicitly canceled by user/system

**Key insight**: Want state reflects canonical partition state, not bound to specific partition UUIDs.

Example:
```rust
// Want created for "data/beta"
want.partitions = ["data/beta"]

// Determine state by checking canonical partition
if let Some(uuid) = canonical_partitions.get("data/beta") {
    let partition = partitions_by_uuid[uuid];
    match partition.state {
        Building => want.state = Building,
        Live => want.state = Successful,
        Failed => want.state = Failed,
        // ...
    }
} else {
    want.state = Idle // No canonical partition exists
}
```

### Partition States

```
Building → {UpstreamBuilding, UpForRetry, Live, Failed, UpstreamFailed, Tainted}
```

**State semantics**: "What is the current build status? Is this partition leasable?"

- **Building**: Job actively building, lease held (prevent concurrent builds)
- **UpstreamBuilding**: Dep miss occurred, waiting for upstreams, lease held
- **UpForRetry**: Upstreams satisfied, ready to retry, lease released
- **Live**: Successfully built (terminal)
- **Failed**: Hard failure, shouldn't retry (terminal, lease released)
- **UpstreamFailed**: Upstream deps failed, can't succeed (terminal, lease released)
- **Tainted**: Marked invalid by taint event (terminal)

**No Missing state**: Partitions only exist when jobs start building them or have completed.

**State as lease mechanism**:
- Building/UpstreamBuilding: Lease held → orchestrator will NOT schedule new jobs
- UpForRetry/Failed/UpstreamFailed: Lease released → safe to schedule (though Failed/UpstreamFailed block wants)
- Live/Tainted: Not lease states

Example lease behavior:
```
Partition uuid-1 ("data/beta"): Building
Want W1 arrives for "data/beta" → New → Building (sees canonical is Building)
Want W2 arrives for "data/beta" → New → Building (sees canonical is Building)
Orchestrator polls: both wants Building, canonical partition Building → NOT schedulable (lease held)
```

### JobRun States

```
Queued → Running → {Successful, Failed, DepMissed}
```

- **Queued**: Job buffered, not yet started
- **Running**: Process executing
- **Successful**: Completed successfully, partitions built
- **Failed**: Process failed
- **DepMissed**: Job discovered missing dependencies, created derivative wants

## Temporal Identity & References

**Problem**: How do we distinguish "the partition being built now" from "the partition built yesterday"?

**Solution**: Partition UUIDs for temporal identity, separate from user-facing refs.

### Partition UUIDs (Immutable Identity)

Each partition build attempt gets unique UUID:
```rust
fn derive_partition_uuid(job_run_id: &str, partition_ref: &str) -> Uuid {
    let mut hasher = Sha256::new();
    hasher.update(job_run_id.as_bytes());
    hasher.update(partition_ref.as_bytes());
    let hash = hasher.finalize();
    Uuid::from_slice(&hash[0..16]).unwrap()
}
```

**Properties**:
- Deterministic: Same job + ref → same UUID (enables event replay)
- Immutable: Partition(uuid-1) represents specific historical build
- Jobs reference UUIDs: "Job J built Partition uuid-1 at time T"

### Partition Refs (Canonical Names)

User-facing identifier like `"data/category=tech/date=2024-01-15"`:
- Wants reference refs: "I want data/beta to be Live"
- Canonical partitions: `canonical_partitions["data/beta"] → uuid-3`
- One canonical UUID per ref at any time

### Dual Indexing

```rust
// All partition instances (historical + current)
partitions_by_uuid: BTreeMap<Uuid, Partition>

// Current/canonical partition for each ref
canonical_partitions: BTreeMap<String, Uuid>
```

**Lifecycle example**:
```
1. Job J1 starts → uuid-1 generated for "data/beta"
2. Partition(uuid-1, "data/beta", Building) created
3. canonical_partitions["data/beta"] = uuid-1
4. Job completes → Partition(uuid-1, Live)
5. Partition tainted → Partition(uuid-1, Tainted), still canonical
6. New job J2 starts → uuid-2 generated
7. Partition(uuid-2, "data/beta", Building) created
8. canonical_partitions["data/beta"] = uuid-2 (updated)
9. Partition(uuid-1) remains in partitions_by_uuid for history
```

**Query semantics**:
- "What's the current state of data/beta?" → lookup canonical_partitions["data/beta"], then partitions_by_uuid[uuid]
- "What partition did job J build?" → job.building_partition_uuids → partitions_by_uuid[uuid]
- "What was the state at time T?" → replay events up to T, query canonical_partitions
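A small helper illustrating the first query above (the types and names are simplified stand-ins for the real entities):

```rust
use std::collections::BTreeMap;
use uuid::Uuid; // assumes the `uuid` crate, as in the UUID derivation above

struct Partition; // stand-in for the real partition entity

/// "What's the current state of <ref>?" — two map lookups, no scanning.
fn current_partition<'a>(
    partition_ref: &str,
    canonical_partitions: &BTreeMap<String, Uuid>,
    partitions_by_uuid: &'a BTreeMap<Uuid, Partition>,
) -> Option<&'a Partition> {
    let uuid = canonical_partitions.get(partition_ref)?;
    // A missing entry here would be an invariant violation (panic in the real code).
    Some(&partitions_by_uuid[uuid])
}
```
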

## BuildState Data Structure (ECS Pattern)

Flat collections, not nested objects:

```rust
pub struct BuildState {
    // Entity collections
    wants: BTreeMap<String, Want>,
    partitions_by_uuid: BTreeMap<Uuid, Partition>,
    canonical_partitions: BTreeMap<String, Uuid>,
    job_runs: BTreeMap<String, JobRun>,

    // Inverted indexes
    wants_for_partition: BTreeMap<String, Vec<String>>, // partition_ref → want_ids
    downstream_waiting: BTreeMap<String, Vec<Uuid>>,    // partition_ref → waiting_partition_uuids
}
```

**Why ECS over OOP**:
- Avoids deep object hierarchies (`Want { partitions: Vec<Partition { job_runs: Vec<JobRun> }>}`)
- Flexible querying without coupling
- Inverted indexes provide O(1) reverse lookups
- State rebuilds from events without complex object reconstruction
- Access patterns drive data structure (not inheritance)

**Inverted index example**:
```rust
// Traditional OOP (tight coupling)
partition.wants.iter().for_each(|want| transition_want(want));

// ECS with inverted index (decoupled)
if let Some(want_ids) = wants_for_partition.get(&partition_ref) {
    for want_id in want_ids {
        let want = wants.get_mut(want_id).unwrap();
        transition_want(want);
    }
}
```

## Inverted Indexes

### wants_for_partition

```rust
BTreeMap<String, Vec<String>> // partition_ref → want_ids
```

**Purpose**: Find all wants waiting for a partition ref

**Maintenance**:
- Updated on want creation: add want_id to each partition_ref in want
- NOT cleaned up on want completion (acceptable, bounded growth)
- Replaces `partition.wants: Vec<String>` that would exist in OOP

**Usage**:
```rust
// When partition transitions Building → Live
let partition_ref = &partition.partition_ref.r#ref;
if let Some(want_ids) = wants_for_partition.get(partition_ref) {
    for want_id in want_ids {
        // Check if all partitions for this want are Live
        // If yes, transition want Idle/Building → Successful
    }
}
```

### downstream_waiting

```rust
BTreeMap<String, Vec<Uuid>> // partition_ref → waiting_partition_uuids
```

**Purpose**: O(1) lookup of partitions waiting for an upstream when it completes/fails

**Maintenance**:
- Updated when partition transitions Building → UpstreamBuilding
- For each missing upstream ref, add partition UUID to `downstream_waiting[upstream_ref]`
- Cleaned up when partition transitions UpstreamBuilding → UpForRetry/UpstreamFailed
- Remove partition UUID from all `downstream_waiting` entries

**Usage**:
```rust
// When upstream partition "data/alpha" becomes Live
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
    for uuid in waiting_uuids {
        let partition = partitions_by_uuid.get_mut(uuid).unwrap();
        // Check if ALL this partition's MissingDeps are now satisfied
        if all_deps_satisfied(partition) {
            partition = partition.transition_to_up_for_retry();
        }
    }
}
```

**Why needed**: Avoids scanning all UpstreamBuilding partitions when upstreams complete.

## BuildState Responsibilities

What BuildState does:
- Maintain entity state machines (process events, transition states)
- Provide query interfaces (`get_want`, `list_partitions`, etc.)
- Maintain inverted indexes for efficient lookups
- Enforce invariants (panic on reference errors with context)
- Rebuild state from event log (replay)

What BuildState does NOT do:
- Make scheduling decisions (that's Orchestrator)
- Execute jobs (that's external processes)
- Generate UUIDs (done deterministically during event handling from job_run_id)

**Key insight**: BuildState is a pure state container. All coordination logic lives in Orchestrator.

## Want State Determination (Sensing)

When a want is created, it observes canonical partition states and transitions accordingly.

**Priority order** (first match wins):
1. If ANY canonical partition is Failed → New → Failed
2. If ANY canonical partition is UpstreamFailed → New → UpstreamFailed
3. If ALL canonical partitions exist AND are Live → New → Successful
4. If ANY canonical partition is Building → New → Building
5. If ANY canonical partition is UpstreamBuilding → New → UpstreamBuilding
6. If ANY canonical partition is UpForRetry → New → Idle (deps satisfied, ready to schedule)
7. Otherwise (partitions don't exist) → New → Idle

**Example**:
```rust
// Want W1 created for ["data/alpha", "data/beta"]
// canonical_partitions["data/alpha"] = uuid-1 (Building)
// canonical_partitions["data/beta"] = uuid-2 (Live)
// Result: W1 goes New → Building (rule 4: ANY partition Building)

// Want W2 created for ["data/gamma"]
// canonical_partitions["data/gamma"] doesn't exist
// Result: W2 goes New → Idle (rule 7: partition doesn't exist)
```

**Key insight**: Most wants go New → Idle because canonical partitions only exist when jobs are running or completed. This is correct: "nothing is building yet, ready to schedule."
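A condensed sketch of this priority order (the enums here are simplified stand-ins for the type-state structs used in the real implementation):

```rust
#[derive(Clone, Copy, PartialEq)]
enum PartitionState { Building, UpstreamBuilding, UpForRetry, Live, Failed, UpstreamFailed, Tainted }

#[derive(PartialEq, Debug)]
enum WantState { Idle, Building, UpstreamBuilding, Successful, Failed, UpstreamFailed }

/// One entry per partition ref in the want: the canonical partition's state,
/// or None if no canonical partition exists for that ref.
fn sense_new_want(canonical: &[Option<PartitionState>]) -> WantState {
    use PartitionState::*;
    let any = |s: PartitionState| canonical.iter().any(|p| *p == Some(s));
    if any(Failed) { return WantState::Failed; }                      // rule 1
    if any(UpstreamFailed) { return WantState::UpstreamFailed; }      // rule 2
    if !canonical.is_empty() && canonical.iter().all(|p| *p == Some(Live)) {
        return WantState::Successful;                                 // rule 3
    }
    if any(Building) { return WantState::Building; }                  // rule 4
    if any(UpstreamBuilding) { return WantState::UpstreamBuilding; }  // rule 5
    WantState::Idle                                                   // rules 6–7
}
```
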

## Schedulability vs Want State

**Want State**: Reflects current reality of canonical partitions
**Schedulability**: Orchestrator's decision logic for queuing jobs

**Not the same thing**:
```
Want W1: Idle → orchestrator schedules job → canonical partition becomes Building
Want W1: Idle → Building (event handling transitions it)
Want W2 arrives → sees canonical partition Building → New → Building
Orchestrator polls: both W1 and W2 are Building
Should orchestrator schedule another job? NO (lease held)
```

**Schedulability check**: A want is schedulable if canonical partition is:
- Doesn't exist (no lease), OR
- Tainted (invalid, needs rebuild), OR
- UpForRetry (lease released, deps satisfied)

**Not schedulable** if canonical partition is:
- Building (lease held, job running)
- UpstreamBuilding (lease held, waiting for deps)

**Implementation**:
```rust
fn is_schedulable(want: &Want, canonical_partitions: &BTreeMap<String, Uuid>) -> bool {
    for partition_ref in &want.partitions {
        if let Some(uuid) = canonical_partitions.get(partition_ref) {
            let partition = partitions_by_uuid[uuid];
            match partition.state {
                Building | UpstreamBuilding => return false, // Lease held
                Tainted | UpForRetry => continue, // Schedulable
                _ => continue,
            }
        }
        // Partition doesn't exist → schedulable
    }
    true
}
```

## Dependency Miss & Resolution Flow

The "dep miss" is the key mechanism for achieving multi-hop and complex data builds (traditionally solved via DAGs). When a job run fails due to missing upstream data, it generates a list of `MissingDeps`, which map the specific individual missing deps to the output partitions that needed them. This information enables databuild to create derivative wants, that will result in it scheduling jobs to build those partitions.

Complete flow when job encounters missing dependencies:

### 1. Job Reports Dep Miss
```
Job J1 building partition uuid-1 ("data/beta")
Discovers missing upstream: "data/alpha" not Live
Emits JobRunDepMissEventV1 {
    missing_deps: [
        MissingDeps {
            missing: [ PartitionRef { ref: "data/alpha" } ],
            impacted: PartitionRef { ref: "data/beta" }
        }, ...
    ], ...
}
```

### 2. Partition Transitions to UpstreamBuilding
```rust
// handle_job_run_dep_miss_event()
partition = partition.transition_building_to_upstream_building(missing_deps);
partition.state.missing_deps = ["data/alpha"];

// Update inverted index
for upstream_ref in missing_deps {
    downstream_waiting.entry(upstream_ref).or_default().push(uuid-1);
}
// downstream_waiting["data/alpha"] = [uuid-1]

// Partition remains canonical (lease still held)
// Job run transitions to DepMissed state
```

### 3. Want Transitions
```rust
// All wants waiting for "data/beta" transition Building → UpstreamBuilding
for want_id in wants_for_partition["data/beta"] {
    want = want.transition_building_to_upstream_building(derivative_want_ids);
}
```

### 4. Derivative Wants Created
```rust
// System creates derivative want for missing upstream
derivative_want = Want::new(["data/alpha"]);
// This want goes New → Idle (alpha doesn't exist) → schedulable
```

### 5. Upstream Builds Complete or Fail

**Success case**:
```rust
// Derivative want builds "data/alpha" → partition becomes Live
// Look up downstream partitions waiting for "data/alpha"
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
    for uuid in waiting_uuids {
        let partition = partitions_by_uuid.get_mut(uuid).unwrap();
        // Check if ALL missing deps now satisfied
        let all_satisfied = partition.state.missing_deps.iter().all(|dep_ref| {
            canonical_partitions.get(dep_ref)
                .and_then(|uuid| partitions_by_uuid.get(uuid))
                .map(|p| p.is_live())
                .unwrap_or(false)
        });

        if all_satisfied {
            partition = partition.transition_to_up_for_retry();
            // Transition wants: UpstreamBuilding → Idle
        }
    }
}
```

**Failure case**:
```rust
// Upstream partition "data/alpha" transitions to Failed
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
    for uuid in waiting_uuids {
        let partition = partitions_by_uuid.get_mut(uuid).unwrap();
        if matches!(partition, Partition::UpstreamBuilding(_)) {
            partition = partition.transition_to_upstream_failed();
            // Transition wants: UpstreamBuilding → UpstreamFailed
        }
    }
}
```

### 6. Want Becomes Schedulable
```rust
// Partition uuid-1 now in UpForRetry state
// Wants transition UpstreamBuilding → Idle
// Orchestrator polls, sees Idle wants with UpForRetry canonical partition → schedulable
// New job J2 queued → fresh uuid-2 generated for "data/beta"
// Partition uuid-2 created in Building state, replaces uuid-1 in canonical_partitions
// Partition uuid-1 remains in partitions_by_uuid (historical record)
```

**Key properties**:
- `downstream_waiting` enables O(1) lookup (no scanning all partitions)
- Failure propagates down dependency chain automatically
- Lease mechanism prevents concurrent retry attempts
- Historical partition instances preserved for lineage

## Orchestrator Responsibilities

The Orchestrator coordinates execution but maintains no state:

**Core loop**:
1. Poll BuildState for schedulable wants: `build_state.list_wants()` filtered by schedulability
2. Make scheduling decisions (respect leases, check resources, etc.)
3. Derive partition UUIDs for job: `derive_partition_uuid(job_run_id, partition_ref)`
4. Emit JobRunBufferEventV1 with job_run_id and partition_refs
5. BuildState processes event → creates partitions in Building state → updates canonical pointers → transitions wants
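A self-contained sketch of one tick of this loop (all types here are simplified stand-ins; the real orchestrator queries BuildState and emits the protobuf events):

```rust
struct WantView { want_id: String, partition_refs: Vec<String>, schedulable: bool }

struct JobRunBufferEventV1 { job_run_id: String, partition_refs: Vec<String> }

fn orchestrator_tick(wants: Vec<WantView>, mut emit: impl FnMut(JobRunBufferEventV1)) {
    for (i, want) in wants.into_iter().enumerate() {
        // 1–2. Respect leases: skip wants whose canonical partitions are
        //      Building/UpstreamBuilding (BuildState already exposes this).
        if !want.schedulable {
            continue;
        }
        // 3. Pick a job run id; partition UUIDs are then derived deterministically
        //    from (job_run_id, partition_ref) as shown earlier.
        let job_run_id = format!("job-run-{}-{}", want.want_id, i);
        // 4. Emit the buffer event. 5. BuildState handles it: creates Building
        //    partitions, updates canonical pointers, and transitions the want.
        emit(JobRunBufferEventV1 { job_run_id, partition_refs: want.partition_refs });
    }
}
```
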

**Does NOT**:
- Maintain its own state (always queries BuildState)
- Know about partition UUIDs before emitting event (derives deterministically)
- Track want-partition relationships (uses inverted index)

**Separation rationale**:
- BuildState: source of truth for state
- Orchestrator: coordination logic
- Clear boundary enables testing, reasoning, replay

## Design Principles & Invariants

### 1. Compile-Time Correctness First
Invalid states should be unrepresentable. Type-state pattern enforces valid transitions at compile time.

Example: Cannot call `complete()` on a partition that isn't Building.

### 2. Runtime Panics for Invariant Violations
Reference errors and index inconsistencies represent BuildState bugs, not invalid input. Panic with context.

Example: `partitions_by_uuid[uuid]` missing → panic with "Partition UUID {uuid} referenced by canonical_partitions but not in partitions_by_uuid"

### 3. ECS Over OOP
Flat collections with inverted indexes beat nested object hierarchies for flexibility and query performance.

### 4. Data Structure Follows Access Patterns
Use inverted indexes where efficient reverse lookup is needed (`wants_for_partition`, `downstream_waiting`).

### 5. Events Represent Reality
Events encode real things: job processes started, dependency misses occurred, user requests received. Not speculative.

### 6. No Backwards Compatibility Hacks
Clean breaks preferred over technical debt. Code should be honest about state.

### 7. Fail Fast with Context
Better to panic immediately with rich context than silently corrupt state or fail later mysteriously.

### 8. Type-State for Self-Documentation
Function signatures encode preconditions: `fn schedule(want: WantWithState<IdleState>)` vs `fn schedule(want: Want)`.

## Summary

BuildState is a type-safe, event-sourced state machine using ECS patterns:

- **Compile-time correctness**: Invalid states unrepresentable
- **Flat data structures**: Collections + inverted indexes, not nested objects
- **Temporal identity**: UUID-based partition instances + canonical refs
- **Lease mechanism**: State encodes schedulability (Building/UpstreamBuilding hold lease)
- **Efficient lookups**: O(1) reverse queries via inverted indexes
- **Clear separation**: BuildState maintains state, Orchestrator coordinates

The architecture prioritizes fast feedback during development (compile errors), clear semantics (explicit states), and correctness (type-safe transitions).
3  .gitignore  (vendored)
@@ -11,11 +11,14 @@ node_modules
**/node_modules
Cargo.toml
Cargo.lock
/askama.toml
databuild/databuild.rs
generated_number
target
logs/databuild/
**/logs/databuild/
**/.databuild

# DSL generated code
**/generated/
/databuild/databuild.rs

90  AGENTS.md  (new file)
@@ -0,0 +1,90 @@
# Agent Instructions

## Project Overview
DataBuild is a bazel-based data build system. Key files:
- [`DESIGN.md`](./DESIGN.md) - Overall design of databuild
- [`databuild.proto`](databuild/databuild.proto) - System interfaces
- Component designs - design docs for specific aspects or components of databuild:
  - [Core build](docs/design/core-build.md) - How the core semantics of databuild works and are implemented
  - [Build event log](docs/design/build-event-log.md) - How the build event log works and is accessed
  - [Service](docs/design/service.md) - How the databuild HTTP service and web app are designed.
  - [Glossary](docs/design/glossary.md) - Centralized description of key terms.
  - [Graph specification](docs/design/graph-specification.md) - Describes the different libraries that enable more succinct declaration of databuild applications than the core bazel-based interface.
  - [Deploy strategies](docs/design/deploy-strategies.md) - Different strategies for deploying databuild applications.
  - [Wants](docs/design/wants.md) - How triggering works in databuild applications.
  - [Why databuild?](docs/design/why-databuild.md) - Why to choose databuild instead of other better established orchestration solutions.

Please reference these for any related work, as they indicate key technical bias/direction of the project.

## Architecture Pattern

DataBuild implements **Orchestrated State Machines** - a pattern where the application core is composed of:
- **Type-safe state machines** for domain entities (Want, JobRun, Partition)
- **Dependency graphs** expressing relationships between entities
- **Orchestration logic** that coordinates state transitions based on dependencies

This architecture provides compile-time correctness, observability through event sourcing, and clean separation between entity behavior and coordination logic. See [`docs/orchestrated-state-machines.md`](docs/orchestrated-state-machines.md) for the full theory and implementation patterns.

**Key implications for development:**
- Model entities as explicit state machines with type-parameterized states
- Use consuming methods for state transitions (enforces immutability)
- Emit events to BEL for all state changes (observability)
- Centralize coordination logic in the Orchestrator (separation of concerns)
- If it has a `status` field (or similar), it should have a state machine with type safe transitions that governs it
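A tiny illustration of the consuming-transition style (the names are generic, not the actual DataBuild types):

```rust
// Generic illustration only: the transition consumes `self`, so the old state
// value can no longer be used after the call (enforced at compile time).
struct Queued;
struct Running;

struct JobRun<S> { id: String, state: S }

impl JobRun<Queued> {
    fn start(self) -> JobRun<Running> {
        JobRun { id: self.id, state: Running }
    }
}
```
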

## Tenets

- Declarative over imperative wherever possible/reasonable.
- We are building for the future, and choose to do "the right thing" rather than taking shortcuts to get unstuck. If you get stuck, pause and ask for help/input.
- Do not add "unknown" results when parses or matches fail - these should always throw.
- Compile time correctness is a super-power, and investment in it speeds up flywheel for development and user value.
- **CLI/Service Interchangeability**: Both the CLI and service must produce identical artifacts (BEL events, logs, metrics, outputs) in the same locations. Users should be able to build with one interface and query/inspect results from the other seamlessly. This principle applies to all DataBuild operations, not just builds.
- The BEL represents real things that happen: job run processes that are started or fail, requests from the user, dep misses, etc.
- We focus on highly impactful tests anchored to stable interfaces. For instance, using BEL events to create valid application states to test orchestration logic via shared scenarios. This helps us keep a high ratio from "well tested functionality" to "test brittleness".

## Build & Test
```bash
# Build all databuild components
bazel build //...

# Run databuild unit tests
bazel test //...

# Run end-to-end tests (validates CLI vs Service consistency)
./run_e2e_tests.sh

# Do not try to `bazel test //examples/basic_graph/...`, as this will not work.
```

## Project Structure
- `databuild/` - Core system (Rust/Proto)
- `examples/` - Example implementations
- `scripts/` - Build utilities

## DataBuild Job Architecture

### Job Target Structure
Each DataBuild job creates three Bazel targets:
- `job_name.exec` - Execution target (calls binary with "exec" subcommand)
- `job_name` - Main job target (pipes config output to exec input)

### Graph Configuration
```python
databuild_graph(
    name = "my_graph",
    jobs = [":job1", ":job2"], # Reference base job targets
    lookup = ":job_lookup", # Binary that routes partition refs to jobs
)
```

### Job Lookup Pattern
```python
def lookup_job_for_partition(partition_ref: str) -> str:
    if pattern.match(partition_ref):
        return "//:job_name" # Return base job target
    raise ValueError(f"No job found for: {partition_ref}")
```

## Notes / Tips
- Rust dependencies are implemented via rules_rust, so new dependencies should be added in the `MODULE.bazel` file.
- Designs/plans should rarely include code snippets, outside of specifying interfaces or very specific changes.

30  BUILD.bazel
@@ -1,24 +1,18 @@
# Python Deps
load("@rules_python//python:pip.bzl", "compile_pip_requirements")

filegroup(
    name = "jq",
    srcs = ["//databuild/runtime:jq"],
    visibility = ["//visibility:public"],
)

# Export the E2E test runner script
exports_files(["run_e2e_tests.sh"])

# End-to-End Test Runner
sh_binary(
    name = "run_e2e_tests",
    srcs = ["run_e2e_tests.sh"],
    data = [
        "//tests/end_to_end:test_utils",
    ],
    visibility = ["//visibility:public"],
)
## Export the E2E test runner script
#exports_files(["run_e2e_tests.sh"])
#
## End-to-End Test Runner
#sh_binary(
#    name = "run_e2e_tests",
#    srcs = ["run_e2e_tests.sh"],
#    data = [
#        "//tests/end_to_end:test_utils",
#    ],
#    visibility = ["//visibility:public"],
#)

# `bazel run //:requirements.update` will regenerate the requirements_txt file
compile_pip_requirements(

105  CLAUDE.md
@@ -1,105 +0,0 @@
# Agent Instructions

## Project Overview
DataBuild is a bazel-based data build system. Key files:
- [`DESIGN.md`](./DESIGN.md) - Overall design of databuild
- [`databuild.proto`](databuild/databuild.proto) - System interfaces
- Component designs - design docs for specific aspects or components of databuild:
  - [Core build](./design/core-build.md) - How the core semantics of databuild works and are implemented
  - [Build event log](./design/build-event-log.md) - How the build event log works and is accessed
  - [Service](./design/service.md) - How the databuild HTTP service and web app are designed.
  - [Glossary](./design/glossary.md) - Centralized description of key terms.
  - [Graph specification](./design/graph-specification.md) - Describes the different libraries that enable more succinct declaration of databuild applications than the core bazel-based interface.
  - [Observability](./design/observability.md) - How observability is systematically achieved throughout databuild applications.
  - [Deploy strategies](./design/deploy-strategies.md) - Different strategies for deploying databuild applications.
  - [Wants](./design/wants.md) - How triggering works in databuild applications.
  - [Why databuild?](./design/why-databuild.md) - Why to choose databuild instead of other better established orchestration solutions.

Please reference these for any related work, as they indicate key technical bias/direction of the project.

## Tenets

- Declarative over imperative wherever possible/reasonable.
- We are building for the future, and choose to do "the right thing" rather than taking shortcuts to get unstuck. If you get stuck, pause and ask for help/input.
- Do not add "unknown" results when parses or matches fail - these should always throw.
- Compile time correctness is a super-power, and investment in it speeds up flywheel for development and user value.
- **CLI/Service Interchangeability**: Both the CLI and service must produce identical artifacts (BEL events, logs, metrics, outputs) in the same locations. Users should be able to build with one interface and query/inspect results from the other seamlessly. This principle applies to all DataBuild operations, not just builds.

## Build & Test
```bash
# Build all databuild components
bazel build //...

# Run databuild unit tests
bazel test //...

# Run end-to-end tests (validates CLI vs Service consistency)
./run_e2e_tests.sh

# Do not try to `bazel test //examples/basic_graph/...`, as this will not work.
```

## Project Structure
- `databuild/` - Core system (Rust/Proto)
- `examples/` - Example implementations
- `scripts/` - Build utilities

## Key Components
- Graph analysis/execution in Rust
- Bazel rules for job orchestration
- Java/Python examples for different use cases

## DataBuild Job Architecture

### Job Target Structure
Each DataBuild job creates three Bazel targets:
- `job_name.cfg` - Configuration target (calls binary with "config" subcommand)
- `job_name.exec` - Execution target (calls binary with "exec" subcommand)
- `job_name` - Main job target (pipes config output to exec input)

### Unified Job Binary Pattern
Jobs use a single binary with subcommands:
```python
def main():
    command = sys.argv[1] # "config" or "exec"
    if command == "config":
        handle_config(sys.argv[2:]) # Output job configuration JSON
    elif command == "exec":
        handle_exec(sys.argv[2:]) # Perform actual work
```

### DataBuild Execution Flow
1. **Planning Phase**: DataBuild calls `.cfg` targets to get job configurations
2. **Execution Phase**: DataBuild calls main job targets which pipe config to exec
3. **Job Resolution**: Job lookup returns base job names (e.g., `//:job_name`), not `.cfg` variants

### Graph Configuration
```python
databuild_graph(
    name = "my_graph",
    jobs = [":job1", ":job2"], # Reference base job targets
    lookup = ":job_lookup", # Binary that routes partition refs to jobs
)
```

### Job Lookup Pattern
```python
def lookup_job_for_partition(partition_ref: str) -> str:
    if pattern.match(partition_ref):
        return "//:job_name" # Return base job target
    raise ValueError(f"No job found for: {partition_ref}")
```

### Common Pitfalls
- **Not using protobuf-defined interface**: Where structs and interfaces are defined centrally in [`databuild.proto`](./databuild/databuild.proto), those interfaces should always be used. E.g., in rust depending on them via the prost-generated structs, and in the web app via the OpenAPI-generated typescript interfaces.
- **Empty args**: Jobs with `"args": []` won't execute properly
- **Wrong target refs**: Job lookup must return base targets, not `.cfg` variants
- **Missing partition refs**: All outputs must be addressable via partition references
- **Not adding new generated files to OpenAPI outs**: Bazel hermeticity demands that we specify each output file, so when the OpenAPI code gen would create new files, we need to explicitly add them to the target's outs field.

## Notes / Tips
- Rust dependencies are implemented via rules_rust, so new dependencies should be added in the `MODULE.bazel` file.

## Documentation

We use plans / designs in the [plans](./plans/) directory to anchor most large scale efforts. We create plans that are good bets, though not necessarily exhaustive, then (and this is critical) we update them after the work is completed, or after significant progress towards completion.

1  CLAUDE.md  (symbolic link)
@@ -0,0 +1 @@
AGENTS.md

55  DESIGN.md
@@ -1,45 +1,34 @@
# DataBuild Design

DataBuild is a trivially-deployable, partition-oriented, declarative build system. Where data orchestration flows are normally imperative and implicit (do this, then do that, etc), DataBuild uses stated data dependencies to make this process declarative and explicit. DataBuild scales the declarative nature of tools like DBT to meet the needs of modern, broadly integrated data and ML organizations, who consume data from many sources and which arrive on a highly varying basis. DataBuild enables confident, bounded completeness in a world where input data is effectively never complete at any given time.
DataBuild is a trivially-deployable, partition-oriented, declarative build system. Where data orchestration flows are normally imperative and implicitly coupled (do this, then do that, etc), DataBuild uses stated data dependencies to make this process declarative and explicit. DataBuild scales the declarative nature of tools like DBT to meet the needs of modern, broadly integrated data and ML organizations, who consume data from many sources and which arrive on a highly varying basis. DataBuild enables confident, bounded completeness in a world where input data is effectively never complete at any given time.

## Philosophy

Inspired by [these requirements.](docs/design/requirements.md)

Many large-scale systems for producing data leave the complexity of true orchestration to the user - even DAG-based systems for implementing dependencies leave the system as a collection of DAGs, requiring engineers to solve the same "why doesn't this data exist?" and "how do I build this data?"

DataBuild takes inspiration from modern data orchestration and build systems to fully internalize this complexity, using the Job concept to localize all decisions of turning upstream data into output data (and making all dependencies explicit); and the Graph concept to handle composition of jobs, answering what sequence of jobs must be run to build a specific partition of data. With Jobs and Graphs, DataBuild takes complete responsibility for the data build process, allowing engineers to consider concerns only local to the jobs relevant to their feature.
DataBuild takes inspiration from modern data orchestration and build systems to fully internalize this complexity, using the Job concept to localize all decisions of turning upstream data into output data (and making all dependencies explicit); and the Graph concept to handle composition of jobs, enabling continuous data reconciliation for data platforms of all sizes. With Jobs and Graphs, DataBuild takes complete responsibility for the data build process, allowing engineers to consider concerns only local to the jobs relevant to their feature.

Graphs and jobs are defined in [bazel](https://bazel.build), allowing graphs (and their constituent jobs) to be built and deployed trivially.

## Concepts

- **Partitions** - A partition is an atomic unit of data. DataBuild's data dependencies work by using partition references (e.g. `s3://some/dataset/date=2025-06-01`) as dependency signals between jobs, allowing the construction of build graphs to produce arbitrary partitions.
- **Jobs** - Their `exec` entrypoint builds partitions from partitions, and their `config` entrypoint specifies what partitions are required to produce the requested partition(s), along with the specific config to run `exec` with to build said partitions.
- **Graphs** - Composes jobs together to achieve multi-job orchestration, using a `lookup` mechanism to resolve a requested partition to the job that can build it. Together with its constituent jobs, Graphs can fully plan the build of any set of partitions. Most interactions with a DataBuild app happen with a graph.
- **Build Event Log** - Encodes the state of the system, recording build requests, job activity, partition production, etc to enable running databuild as a deployed application.
- **Wants** - Partition wants can be registered with DataBuild, causing it to build the wanted partitions as soon as its graph-external dependencies are met.
- **Jobs** - Builds requested partitions from specific input partitions, or raising when input partitions are missing (specifying which partitions can't be built because of specific missing partitions)
- **Graphs** - Composes jobs together to achieve multi-job orchestration, using a `lookup` mechanism to resolve a requested partition to the job that can build it. Together with its constituent jobs, Graphs can fully build any set of partitions. Most interactions with a DataBuild app happen with a graph.
- **Build Event Log** - Encodes the state of the system, recording partition wants, job activity, partition production, etc to enable running databuild as a deployed application.
- **Wants** - Partition wants can be registered with DataBuild, enabling continuous data reconciliation and build of wanted partitions as soon as their graph-external dependencies are met.
- **Taints** - Taints mark a partition as invalid, indicating that readers should not use it, and that it should be rebuilt when requested or depended upon. If there is a still-active want for the tainted partition, it will be rebuilt immediately.
- **Bazel Targets** - Bazel is a fast, extensible, and hermetic build system. DataBuild uses bazel targets to describe graphs and jobs, making graphs themselves deployable application. Implementing a DataBuild app is the process of integrating your data build jobs in `databuild_job` bazel targets, and connecting them with a `databuild_graph` target.
- [**Graph Specification Strategies**](design/graph-specification.md) (coming soon) Application libraries in Python/Rust/Scala that use language features to enable ergonomic and succinct specification of jobs and graphs.
- [**Graph Definition Languages**](docs/design/graph-specification.md) Application libraries in Python/Rust/Scala that use language features to enable ergonomic and succinct specification of jobs and graphs.

### Partition / Job Assumptions and Best Practices

- **Partitions are atomic and final** - Either the data is complete or its "not there".
- **Partitions are mutually exclusive and collectively exhaustive** - Row membership to a partition should be unambiguous and consistent.
- **Jobs are idempotent** - For the same input data and parameters, the same partition is produced (functionally).

### Partition Delegation

If a partition is already up to date, or is already being built by a previous build request, a new build request will "delegate" to that build request. Instead of running the job to build said partition again, it will emit a delegation event in the build event log, explicitly pointing to the build action it is delegating to.

## Components
## Bazel Components

### Job

The `databuild_job` rule expects to reference a binary that adheres to the following expectations:

- For the `config` subcommand, it prints the JSON job config to stdout based on the requested partitions, e.g. for a binary `bazel-bin/my_binary`, it prints a valid job config when called like `bazel-bin/my_binary config my_dataset/color=red my_dataset/color=blue`.
- For the `exec` subcommand, it produces the partitions requested to the `config` subcommand when configured by the job config it produced. E.g., if `config` had produced `{..., "args": ["red", "blue"], "env": {"MY_ENV": "foo"}`, then calling `MY_ENV=foo bazel-bin/my_binary exec red blue` should produce partitions `my_dataset/color=red` and `my_dataset/color=blue`.
The `databuild_job` rule requires just a binary target that it can execute, and any relevant metadata that helps the graph call it properly. The referenced binary should accept a list of partitions that it needs to produce, and if any required partitions are missing, report which are missing and which requested partitions they prevent from being built.

Jobs are executed via a wrapper component that provides observability, error handling, and standardized communication with the graph. The wrapper captures all job output as structured logs, enabling comprehensive monitoring without requiring jobs to have network connectivity.

@@ -50,19 +39,16 @@ The `databuild_graph` rule expects two fields, `jobs`, and `lookup`:
- The `lookup` binary target should return a JSON object with keys as job labels and values as the list of partitions that each job is responsible for producing. This enables graph planning by walking backwards in the data dependency graph.
- The `jobs` list should just be a list of all jobs involved in the graph. The graph will recursively call config to resolve the full set of jobs to run.

### Build Event Log (BEL)
### [Build Event Log (BEL)](docs/design/build-event-log.md)

The BEL encodes all relevant build actions that occur, enabling concurrent builds. This includes:

- Graph events, including "build requested", "build started", "analysis started", "build failed", "build completed", etc.
- Job events, including "..."
The BEL encodes all relevant build actions that occur, enabling distributed/concurrent builds. This includes submitted wants, job events (started, succeeded, partitions missing, etc)

The BEL is similar to [event-sourced](https://martinfowler.com/eaaDev/EventSourcing.html) systems, as all application state is rendered from aggregations over the BEL. This enables the BEL to stay simple while also powering concurrent builds, the data catalog, and the DataBuild service.

### Triggers and Wants (Coming Soon)
["Wants"](./design/wants.md) are the main mechanism for continually building partitions over time. In real world scenarios, it is standard for data to arrive late, or not at all. Wants cause the databuild graph to continually attempt to build the wanted partitions until a) the partitions are live or b) the want expires, at which another script can be run. Wants are the mechanism that implements SLA checking.
### Wants and Taints
["Wants"](docs/design/wants.md) are the main mechanism for eventually built partitions. In real world scenarios, it is standard for data to arrive late, or not at all. Wants cause the databuild graph to continually attempt to build the wanted partitions while they aren't live, and enabling it to list wants who are past SLA.

You can also use cron-based triggers, which return partition refs that they want built.
Taints allow for manual/programmatic invalidation of built partitions. Partitions tainted since their last build are considered as non-existent, and will be rebuilt if any other wanted partition depends on them. This also opens the door for invalidating downstream partitions as well.

# Key Insights

@@ -70,6 +56,15 @@ You can also use cron-based triggers, which return partition refs that they want
- Orchestration decisions and application logic is innately coupled.
- "systemd for data platforms"

## What About Configuration?

Configuration is all the information that is provided to a job that isn't a) the data the job reads or b) the partitions the job is being asked to produce. This could be info like "what modeling strategy do we use for this customer" or "when did was this feed configured", etc. It has the inconvenient features of being critical for practical business value and is also difficult to fit in as data (since you often want to change and "tweak" it).

DataBuild explicitly and intentionally treats configuration as a job-internal concept: jobs are not pure functions, but it is a good idea for almost all of the implementation to be purely functional: it's recommended to calculate structured job configuration up front (along with trying to resolve the required input data), then invoking the rest of your job as a pure function over the config and data.

What about situations where data is configured by a web app, etc? Taints are a great way to invalidate partitions that are impacted by config changes, and you can create callbacks in your application to taint impacted partitions.

## Assumptions

- Job -> partition relationships are canonical, job runs are idempotent

152  MODULE.bazel
@@ -3,15 +3,21 @@ module(
version = "0.1",
|
||||
)
|
||||
|
||||
bazel_dep(name = "bazel_skylib", version = "1.8.1")
|
||||
bazel_dep(name = "platforms", version = "0.0.11")
|
||||
bazel_dep(name = "rules_shell", version = "0.4.0")
|
||||
bazel_dep(name = "bazel_skylib", version = "1.8.2")
|
||||
bazel_dep(name = "platforms", version = "1.0.0")
|
||||
bazel_dep(name = "rules_shell", version = "0.6.1")
|
||||
bazel_dep(name = "rules_oci", version = "2.2.6")
|
||||
bazel_dep(name = "aspect_bazel_lib", version = "2.14.0")
|
||||
bazel_dep(name = "rules_rust", version = "0.61.0")
|
||||
bazel_dep(name = "rules_rust", version = "0.67.0")
|
||||
bazel_dep(name = "rules_proto", version = "7.0.2")
|
||||
bazel_dep(name = "protobuf", version = "29.0", repo_name = "com_google_protobuf")
|
||||
|
||||
#rust = use_extension("@rules_rust//rust:extensions.bzl", "rust")
|
||||
#rust.toolchain(
|
||||
# edition = "2024",
|
||||
# versions = ["1.91.1"],
|
||||
#)
|
||||
|
||||
crate = use_extension("@rules_rust//crate_universe:extensions.bzl", "crate")
|
||||
crate.spec(
|
||||
features = ["derive"],
|
||||
|
|
@ -22,34 +28,6 @@ crate.spec(
|
|||
package = "serde_json",
|
||||
version = "1.0",
|
||||
)
|
||||
crate.spec(
|
||||
package = "log",
|
||||
version = "0.4",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["stderr"],
|
||||
package = "simple_logger",
|
||||
version = "4.3",
|
||||
)
|
||||
crate.spec(
|
||||
package = "crossbeam-channel",
|
||||
version = "0.5",
|
||||
)
|
||||
crate.spec(
|
||||
package = "num_cpus",
|
||||
version = "1.16",
|
||||
)
|
||||
crate.spec(
|
||||
default_features = False,
|
||||
features = [
|
||||
"macros",
|
||||
"net",
|
||||
"rt-multi-thread",
|
||||
"sync",
|
||||
],
|
||||
package = "tokio",
|
||||
version = "1.38",
|
||||
)
|
||||
crate.spec(
|
||||
package = "prost",
|
||||
version = "0.13",
|
||||
|
|
@ -66,88 +44,88 @@ crate.spec(
|
|||
package = "tempfile",
|
||||
version = "3.0",
|
||||
)
|
||||
crate.spec(
|
||||
package = "async-trait",
|
||||
version = "0.1",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["v4"],
|
||||
package = "uuid",
|
||||
version = "1.0",
|
||||
)
|
||||
crate.spec(
|
||||
package = "sha2",
|
||||
version = "0.10",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["bundled"],
|
||||
package = "rusqlite",
|
||||
version = "0.31",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["derive"],
|
||||
package = "clap",
|
||||
version = "4.0",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["json"],
|
||||
package = "axum",
|
||||
version = "0.7.2",
|
||||
)
|
||||
crate.spec(
|
||||
package = "tower",
|
||||
version = "0.4",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["cors"],
|
||||
package = "tower-http",
|
||||
version = "0.5",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["full"],
|
||||
package = "hyper",
|
||||
version = "1.0",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["axum"],
|
||||
package = "aide",
|
||||
version = "0.13.0",
|
||||
)
|
||||
crate.spec(
|
||||
features = [
|
||||
"uuid1",
|
||||
"derive",
|
||||
],
|
||||
package = "schemars",
|
||||
version = "0.8.16",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["aide"],
|
||||
package = "axum-jsonschema",
|
||||
version = "0.8.0",
|
||||
package = "regex",
|
||||
version = "1.10",
|
||||
)
|
||||
crate.spec(
|
||||
package = "thiserror",
|
||||
features = ["full"],
|
||||
package = "tokio",
|
||||
version = "1.0",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["debug-embed"],
|
||||
package = "rust-embed",
|
||||
version = "8.0",
|
||||
package = "axum",
|
||||
version = "0.7",
|
||||
)
|
||||
crate.spec(
|
||||
package = "sysinfo",
|
||||
version = "0.30",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["datafusion"],
|
||||
package = "deltalake",
|
||||
version = "0.27",
|
||||
)
|
||||
crate.spec(
|
||||
package = "parquet",
|
||||
version = "55.2",
|
||||
)
|
||||
crate.spec(
|
||||
package = "chrono",
|
||||
package = "tower",
|
||||
version = "0.4",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["trace", "cors"],
|
||||
package = "tower-http",
|
||||
version = "0.5",
|
||||
)
|
||||
crate.spec(
|
||||
package = "tracing",
|
||||
version = "0.1",
|
||||
)
|
||||
crate.spec(
|
||||
package = "tracing-subscriber",
|
||||
version = "0.3",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["derive"],
|
||||
package = "clap",
|
||||
version = "4.0",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["blocking", "json"],
|
||||
package = "reqwest",
|
||||
version = "0.11",
|
||||
)
|
||||
crate.spec(
|
||||
package = "toml",
|
||||
version = "0.8",
|
||||
)
|
||||
crate.spec(
|
||||
features = ["urlencode"],
|
||||
package = "askama",
|
||||
version = "0.14",
|
||||
)
|
||||
crate.spec(
|
||||
package = "urlencoding",
|
||||
version = "2.1",
|
||||
)
|
||||
crate.spec(
|
||||
package = "fs2",
|
||||
version = "0.4",
|
||||
)
|
||||
crate.spec(
|
||||
package = "libc",
|
||||
version = "0.2",
|
||||
)
|
||||
crate.from_specs()
|
||||
use_repo(crate, "crates")
|
||||
|
||||
|
|
|
|||
5166 MODULE.bazel.lock
File diff suppressed because one or more lines are too long
11 README.md
|
|
@ -18,14 +18,16 @@
|
|||
█████████╔╝ ██████╔═╝ ██╔╝ ████████╗ ███████╔═╝
|
||||
╚════════╝ ╚═════╝ ╚═╝ ╚═══════╝ ╚══════╝
|
||||
|
||||
- -- S Y S T E M O N L I N E -- -
|
||||
- - -- D E C L A R A T I V E -- - -
|
||||
- - -- P A R T I T I O N E D -- - -
|
||||
- - -- D A T A B U I L D S -- - -
|
||||
```
|
||||
|
||||
DataBuild is a trivially-deployable, partition-oriented, declarative data build system.
|
||||
|
||||
DataBuild is for teams at data-driven orgs who need reliable, flexible, and correct data pipelines and are tired of manually orchestrating complex dependency graphs. You define Jobs (that take input data partitions and produce output partitions), compose them into Graphs (partition dependency networks), and DataBuild handles the rest. Just ask it to build a partition, and databuild handles resolving the jobs that need to run, planning execution order, running builds concurrently, and tracking and exposing build progress. Instead of writing orchestration code that breaks when dependencies change, you focus on the data transformations while DataBuild ensures your pipelines are correct, observable, and reliable.
|
||||
|
||||
For important context, check out [DESIGN.md](./DESIGN.md), along with designs in [design/](./design/). Also, check out [`databuild.proto`](./databuild/databuild.proto) for key system interfaces. Key features:
|
||||
For important context, check out [DESIGN.md](./DESIGN.md), along with designs in [design/](docs/design/). Also, check out [`databuild.proto`](./databuild/databuild.proto) for key system interfaces. Key features:
|
||||
|
||||
- **Declarative dependencies** - Ask for data, get data. Define partition dependencies and DataBuild automatically plans what jobs to run and when.
|
||||
|
||||
|
|
@ -33,8 +35,6 @@ For important context, check out [DESIGN.md](./DESIGN.md), along with designs in
|
|||
|
||||
- **Deploy anywhere** - One binary, any platform. Bazel-based builds create hermetic applications that run locally, in containers, or in the cloud.
|
||||
|
||||
- **Concurrent by design** - Multiple teams, zero conflicts. Event-sourced coordination enables parallel builds without stepping on each other.
|
||||
|
||||
## Usage
|
||||
|
||||
### Graph Description Methods
|
||||
|
|
@ -103,3 +103,6 @@ End to end testing:
|
|||
```bash
|
||||
./run_e2e_tests.sh
|
||||
```
|
||||
|
||||
#### Test Strategy
|
||||
Where possible, we make invalid state unrepresentable via rust's type system. Where that is not possible, we prefer [property-testing](https://en.wikipedia.org/wiki/Software_testing#Property_testing), with a handful of bespoke tests to capture critical edge cases or important behaviors.
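For example, a property test over the pagination pattern used throughout the service might look like this sketch (assuming the `proptest` crate; the repository's actual property-testing dependency isn't shown in this diff):

```rust
// Sketch only: a property test in the style described above.
use proptest::prelude::*;
use std::collections::BTreeMap;

// Hypothetical helper mirroring the skip/take pagination used in the codebase.
fn paginate(map: &BTreeMap<String, u64>, page: u64, page_size: u64) -> Vec<u64> {
    map.values()
        .skip((page * page_size) as usize)
        .take(page_size as usize)
        .cloned()
        .collect()
}

proptest! {
    // Property: a page never contains more than `page_size` items.
    #[test]
    fn page_never_exceeds_page_size(
        entries in proptest::collection::btree_map(".*", any::<u64>(), 0..50),
        page in 0u64..5,
        page_size in 1u64..10,
    ) {
        prop_assert!(paginate(&entries, page, page_size).len() <= page_size as usize);
    }
}
```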
|
||||
|
|
|
|||
6 assets/logo.svg Normal file
|
|
@ -0,0 +1,6 @@
|
|||
<svg width="243" height="215" viewBox="0 0 243 215" fill="none" xmlns="http://www.w3.org/2000/svg">
|
||||
<path d="M123.5 77L149.048 121.25H97.9523L123.5 77Z" fill="#F2994A"/>
|
||||
<path d="M224.772 125.035L155.772 125.035L109.52 45.3147L86.7722 45.3147L40.2722 124.463L16.7725 124.463" stroke="#333333" stroke-width="20"/>
|
||||
<path d="M86.6196 5.18886L121.12 64.9444L75.2062 144.86L86.58 164.56L178.375 165.256L190.125 185.608" stroke="#333333" stroke-width="20"/>
|
||||
<path d="M51.966 184.847L86.4659 125.092L178.632 124.896L190.006 105.196L144.711 25.3514L156.461 5.00002" stroke="#333333" stroke-width="20"/>
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 602 B |
|
|
@ -17,108 +17,68 @@ rust_binary(
|
|||
)
|
||||
|
||||
# DataBuild library using generated prost code
|
||||
# Note: Templates are embedded inline in web/templates.rs using Askama's in_doc feature
|
||||
# because Bazel's sandbox doesn't support file-based Askama templates properly.
|
||||
rust_library(
|
||||
name = "databuild",
|
||||
srcs = [
|
||||
"event_log/mock.rs",
|
||||
"event_log/mod.rs",
|
||||
"event_log/query_engine.rs",
|
||||
"event_log/sqlite_storage.rs",
|
||||
"event_log/storage.rs",
|
||||
"event_log/writer.rs",
|
||||
"format_consistency_test.rs",
|
||||
"lib.rs",
|
||||
"log_access.rs",
|
||||
"log_collector.rs",
|
||||
"mermaid_utils.rs",
|
||||
"metric_templates.rs",
|
||||
"metrics_aggregator.rs",
|
||||
"orchestration/error.rs",
|
||||
"orchestration/events.rs",
|
||||
"orchestration/mod.rs",
|
||||
"repositories/builds/mod.rs",
|
||||
"repositories/jobs/mod.rs",
|
||||
"repositories/mod.rs",
|
||||
"repositories/partitions/mod.rs",
|
||||
"repositories/tasks/mod.rs",
|
||||
"service/handlers.rs",
|
||||
"service/mod.rs",
|
||||
"status_utils.rs",
|
||||
name = "lib",
|
||||
srcs = glob(["**/*.rs"]) + [
|
||||
":generate_databuild_rust",
|
||||
],
|
||||
compile_data = glob(["web/templates/**"]) + ["askama.toml"],
|
||||
crate_root = "lib.rs",
|
||||
edition = "2021",
|
||||
proc_macro_deps = [
|
||||
"@crates//:async-trait",
|
||||
],
|
||||
# This is required to point to the `askama.toml`, which then points to the appropriate place for templates
|
||||
rustc_env = {"CARGO_MANIFEST_DIR": "$(BINDIR)/" + package_name()},
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
"@crates//:aide",
|
||||
"@crates//:askama",
|
||||
"@crates//:axum",
|
||||
"@crates//:axum-jsonschema",
|
||||
"@crates//:chrono",
|
||||
"@crates//:log",
|
||||
"@crates//:fs2",
|
||||
"@crates//:libc",
|
||||
"@crates//:prost",
|
||||
"@crates//:prost-types",
|
||||
"@crates//:regex",
|
||||
"@crates//:reqwest",
|
||||
"@crates//:rusqlite",
|
||||
"@crates//:schemars",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:thiserror",
|
||||
"@crates//:tokio",
|
||||
"@crates//:uuid",
|
||||
],
|
||||
)
|
||||
|
||||
# OpenAPI Spec Generator binary (no dashboard dependency)
|
||||
# No need to run this manually - it will automatically generate source and it will be used in
|
||||
# the related targets (e.g. //databuild/client:extract_openapi_spec)
|
||||
rust_binary(
|
||||
name = "openapi_spec_generator",
|
||||
srcs = ["service/openapi_spec_generator.rs"],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
":databuild",
|
||||
"@crates//:log",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:tokio",
|
||||
],
|
||||
)
|
||||
|
||||
# Build Graph Service binary
|
||||
rust_binary(
|
||||
name = "build_graph_service",
|
||||
srcs = ["service/main.rs"],
|
||||
data = ["//databuild/dashboard:dist"],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
":databuild",
|
||||
"@crates//:aide",
|
||||
"@crates//:axum",
|
||||
"@crates//:axum-jsonschema",
|
||||
"@crates//:clap",
|
||||
"@crates//:hyper",
|
||||
"@crates//:log",
|
||||
"@crates//:schemars",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:simple_logger",
|
||||
"@crates//:sha2",
|
||||
"@crates//:tokio",
|
||||
"@crates//:toml",
|
||||
"@crates//:tower",
|
||||
"@crates//:tower-http",
|
||||
"@crates//:tracing",
|
||||
"@crates//:urlencoding",
|
||||
"@crates//:uuid",
|
||||
],
|
||||
)
|
||||
|
||||
# Test for orchestration module
|
||||
rust_test(
|
||||
name = "orchestration_test",
|
||||
crate = ":databuild",
|
||||
edition = "2021",
|
||||
name = "databuild_test",
|
||||
crate = ":lib",
|
||||
data = ["//databuild/test:test_job_helper"],
|
||||
env = {"RUST_BACKTRACE": "1"},
|
||||
deps = [
|
||||
"@crates//:tempfile",
|
||||
],
|
||||
)
|
||||
|
||||
# DataBuild CLI binary
|
||||
rust_binary(
|
||||
name = "databuild",
|
||||
srcs = ["cli_main.rs"],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
":lib",
|
||||
"@crates//:axum",
|
||||
"@crates//:clap",
|
||||
"@crates//:reqwest",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:tokio",
|
||||
"@crates//:tracing",
|
||||
"@crates//:tracing-subscriber",
|
||||
],
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -1,26 +0,0 @@
|
|||
|
||||
# DataBuild
|
||||
|
||||
## API
|
||||
|
||||
A sort of requirements doc for the semantics of DataBuild, enumerating the nouns and verbs they can do.
|
||||
|
||||
### Graph
|
||||
|
||||
- `analyze` - Produce the job graph required to build the requested set of partitions.
|
||||
- `build` - Analyze and then execute the produced job graph to build the requested partitions.
|
||||
- `builds`
|
||||
- `list` - List past builds.
|
||||
- `show` - Shows current status of specified build and list events. Can tail build events for a build with `--follow/-f`
|
||||
- `cancel` - Cancel specified build.
|
||||
- `partitions`
|
||||
- `list` - Lists partitions.
|
||||
- `show` - Shows current status of the specified partition.
|
||||
- `invalidate` - Marks a partition as invalid (will be rebuilt, won't be read).
|
||||
- `jobs`
|
||||
- `list` - List jobs in the graph.
|
||||
- `show` - Shows task statistics (success %, runtime, etc) and recent task results.
|
||||
- `tasks` (job runs)
|
||||
- `list` - Lists past tasks.
|
||||
- `show` - Describes current task status and lists events.
|
||||
- `cancel` - Cancels a specific task.
|
||||
2 databuild/askama.toml Normal file
|
|
@ -0,0 +1,2 @@
|
|||
[general]
|
||||
dirs = ["web/templates"]
|
||||
457 databuild/build_event_log.rs Normal file
|
|
@ -0,0 +1,457 @@
|
|||
use crate::build_state::BuildState;
|
||||
use crate::data_build_event::Event;
|
||||
use crate::util::{DatabuildError, current_timestamp};
|
||||
use crate::{
|
||||
CancelWantRequest, CancelWantResponse, CreateTaintRequest, CreateTaintResponse,
|
||||
CreateWantRequest, CreateWantResponse, DataBuildEvent, GetTaintRequest, GetTaintResponse,
|
||||
GetWantRequest, GetWantResponse, ListJobRunsRequest, ListJobRunsResponse,
|
||||
ListPartitionsRequest, ListPartitionsResponse, ListTaintsRequest, ListTaintsResponse,
|
||||
ListWantsRequest, ListWantsResponse, TaintCreateEventV1, WantCancelEventV1, WantCreateEventV1,
|
||||
};
|
||||
use prost::Message;
|
||||
use rusqlite::Connection;
|
||||
use std::fmt::Debug;
|
||||
use std::sync::{Arc, Mutex};
|
||||
use std::time::{SystemTime, UNIX_EPOCH};
|
||||
|
||||
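/// Append-only storage backend for the build event log.
///
/// Implementations wrap each `Event` in a `DataBuildEvent` (timestamp, event id,
/// payload) and return a monotonically increasing sequence value from
/// `append_event`; both backends below assign ids in insertion order.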
pub trait BELStorage: Send + Sync {
|
||||
fn append_event(&mut self, event: &Event) -> Result<u64, DatabuildError>;
|
||||
fn list_events(
|
||||
&self,
|
||||
since_idx: u64,
|
||||
limit: u64,
|
||||
) -> Result<Vec<DataBuildEvent>, DatabuildError>;
|
||||
fn get_event(&self, event_id: u64) -> Result<Option<DataBuildEvent>, DatabuildError>;
|
||||
fn latest_event_id(&self) -> Result<u64, DatabuildError>;
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MemoryBELStorage {
|
||||
pub events: Vec<DataBuildEvent>,
|
||||
}
|
||||
|
||||
impl Default for MemoryBELStorage {
|
||||
fn default() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
}
|
||||
|
||||
impl MemoryBELStorage {
|
||||
pub fn new() -> MemoryBELStorage {
|
||||
MemoryBELStorage { events: vec![] }
|
||||
}
|
||||
}
|
||||
|
||||
impl BELStorage for MemoryBELStorage {
|
||||
fn append_event(&mut self, event: &Event) -> Result<u64, DatabuildError> {
|
||||
let timestamp = current_timestamp();
|
||||
let dbe = DataBuildEvent {
|
||||
timestamp,
|
||||
event_id: self.events.len() as u64,
|
||||
event: Some(event.clone()),
|
||||
};
|
||||
self.events.push(dbe);
|
||||
Ok(self.events.len() as u64)
|
||||
}
|
||||
|
||||
fn list_events(
|
||||
&self,
|
||||
since_idx: u64,
|
||||
limit: u64,
|
||||
) -> Result<Vec<DataBuildEvent>, DatabuildError> {
|
||||
Ok(self
|
||||
.events
|
||||
.iter()
|
||||
.cloned()
|
||||
.filter(|e| e.timestamp > since_idx)
|
||||
.take(limit as usize)
|
||||
.collect())
|
||||
}
|
||||
|
||||
fn get_event(&self, event_id: u64) -> Result<Option<DataBuildEvent>, DatabuildError> {
|
||||
Ok(self.events.iter().find(|e| e.event_id == event_id).cloned())
|
||||
}
|
||||
|
||||
fn latest_event_id(&self) -> Result<u64, DatabuildError> {
|
||||
Ok(self.events.len().saturating_sub(1) as u64)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct SqliteBELStorage {
|
||||
connection: Arc<Mutex<Connection>>,
|
||||
}
|
||||
|
||||
impl SqliteBELStorage {
|
||||
pub fn create(database_url: &str) -> Result<SqliteBELStorage, DatabuildError> {
|
||||
let connection = Connection::open(database_url)?;
|
||||
|
||||
// Create the events table
|
||||
connection.execute(
|
||||
"CREATE TABLE IF NOT EXISTS events (
|
||||
event_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
timestamp INTEGER NOT NULL,
|
||||
event_data BLOB NOT NULL
|
||||
)",
|
||||
(),
|
||||
)?;
|
||||
|
||||
Ok(SqliteBELStorage {
|
||||
connection: Arc::new(Mutex::new(connection)),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl BELStorage for SqliteBELStorage {
|
||||
fn append_event(&mut self, event: &Event) -> Result<u64, DatabuildError> {
|
||||
let now = SystemTime::now();
|
||||
let duration_since_epoch = now.duration_since(UNIX_EPOCH).expect("Time went backwards");
|
||||
let timestamp = duration_since_epoch.as_nanos() as u64;
|
||||
|
||||
// Serialize the event using prost
|
||||
let dbe = DataBuildEvent {
|
||||
timestamp,
|
||||
event_id: 0, // Will be set by the database
|
||||
event: Some(event.clone()),
|
||||
};
|
||||
|
||||
let mut buf = Vec::new();
|
||||
prost::Message::encode(&dbe, &mut buf)?;
|
||||
|
||||
let connection = self
|
||||
.connection
|
||||
.lock()
|
||||
.map_err(|e| format!("Failed to acquire lock: {}", e))?;
|
||||
|
||||
connection.execute(
|
||||
"INSERT INTO events (timestamp, event_data) VALUES (?1, ?2)",
|
||||
(&timestamp, &buf),
|
||||
)?;
|
||||
|
||||
let event_id = connection.last_insert_rowid() as u64;
|
||||
Ok(event_id)
|
||||
}
|
||||
|
||||
fn list_events(
|
||||
&self,
|
||||
since_idx: u64,
|
||||
limit: u64,
|
||||
) -> Result<Vec<DataBuildEvent>, DatabuildError> {
|
||||
let connection = self
|
||||
.connection
|
||||
.lock()
|
||||
.map_err(|e| format!("Failed to acquire lock: {}", e))?;
|
||||
|
||||
let mut stmt = connection.prepare(
|
||||
"SELECT event_id, timestamp, event_data FROM events
|
||||
WHERE timestamp > ?1
|
||||
ORDER BY event_id
|
||||
LIMIT ?2",
|
||||
)?;
|
||||
|
||||
let rows = stmt.query_map([since_idx, limit], |row| {
|
||||
let event_id: u64 = row.get(0)?;
|
||||
let timestamp: u64 = row.get(1)?;
|
||||
let event_data: Vec<u8> = row.get(2)?;
|
||||
|
||||
// Deserialize the event using prost
|
||||
let mut dbe = DataBuildEvent::decode(event_data.as_slice()).map_err(|_e| {
|
||||
rusqlite::Error::InvalidColumnType(
|
||||
0,
|
||||
"event_data".to_string(),
|
||||
rusqlite::types::Type::Blob,
|
||||
)
|
||||
})?;
|
||||
|
||||
// Update the event_id from the database
|
||||
dbe.event_id = event_id;
|
||||
dbe.timestamp = timestamp;
|
||||
|
||||
let result: DataBuildEvent = dbe;
|
||||
|
||||
Ok(result)
|
||||
})?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
for row_result in rows {
|
||||
events.push(row_result?);
|
||||
}
|
||||
|
||||
Ok(events)
|
||||
}
|
||||
|
||||
fn get_event(&self, event_id: u64) -> Result<Option<DataBuildEvent>, DatabuildError> {
|
||||
let connection = self
|
||||
.connection
|
||||
.lock()
|
||||
.map_err(|e| format!("Failed to acquire lock: {}", e))?;
|
||||
|
||||
let mut stmt = connection
|
||||
.prepare("SELECT event_id, timestamp, event_data FROM events WHERE event_id = ?1")?;
|
||||
|
||||
let result = stmt.query_row([event_id], |row| {
|
||||
let event_id: u64 = row.get(0)?;
|
||||
let timestamp: u64 = row.get(1)?;
|
||||
let event_data: Vec<u8> = row.get(2)?;
|
||||
|
||||
// Deserialize the event using prost
|
||||
let mut dbe = DataBuildEvent::decode(event_data.as_slice()).map_err(|_e| {
|
||||
rusqlite::Error::InvalidColumnType(
|
||||
0,
|
||||
"event_data".to_string(),
|
||||
rusqlite::types::Type::Blob,
|
||||
)
|
||||
})?;
|
||||
|
||||
// Update the event_id from the database
|
||||
dbe.event_id = event_id;
|
||||
dbe.timestamp = timestamp;
|
||||
|
||||
Ok(dbe)
|
||||
});
|
||||
|
||||
match result {
|
||||
Ok(event) => Ok(Some(event)),
|
||||
Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
|
||||
Err(e) => Err(e.into()),
|
||||
}
|
||||
}
|
||||
|
||||
fn latest_event_id(&self) -> Result<u64, DatabuildError> {
|
||||
let connection = self
|
||||
.connection
|
||||
.lock()
|
||||
.map_err(|e| format!("Failed to acquire lock: {}", e))?;
|
||||
|
||||
let result: Result<u64, rusqlite::Error> =
|
||||
connection.query_row("SELECT MAX(event_id) FROM events", [], |row| row.get(0));
|
||||
|
||||
match result {
|
||||
Ok(id) => Ok(id),
|
||||
Err(rusqlite::Error::QueryReturnedNoRows) => Ok(0),
|
||||
Err(e) => Err(e.into()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Default)]
|
||||
pub struct BuildEventLog<S: BELStorage + Debug> {
|
||||
pub storage: S,
|
||||
pub state: BuildState,
|
||||
/// Optional event broadcaster for HTTP server mirroring
|
||||
#[cfg_attr(not(feature = "server"), allow(dead_code))]
|
||||
pub event_broadcaster: Option<tokio::sync::broadcast::Sender<Event>>,
|
||||
}
|
||||
|
||||
impl<S: BELStorage + Debug> BuildEventLog<S> {
|
||||
pub fn new(storage: S, state: BuildState) -> BuildEventLog<S> {
|
||||
BuildEventLog {
|
||||
storage,
|
||||
state,
|
||||
event_broadcaster: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn with_broadcaster(mut self, broadcaster: tokio::sync::broadcast::Sender<Event>) -> Self {
|
||||
self.event_broadcaster = Some(broadcaster);
|
||||
self
|
||||
}
|
||||
|
||||
pub fn append_event(&mut self, event: &Event) -> Result<u64, DatabuildError> {
|
||||
let events = self.state.handle_event(&event);
|
||||
let idx = self.storage.append_event(event)?;
|
||||
|
||||
// Broadcast event to HTTP server (if configured)
|
||||
if let Some(ref tx) = self.event_broadcaster {
|
||||
let _ = tx.send(event.clone());
|
||||
}
|
||||
|
||||
// Recursion here might be dangerous, but in theory the event propagation always terminates
|
||||
for event in events {
|
||||
self.append_event(&event)?;
|
||||
}
|
||||
Ok(idx)
|
||||
}
|
||||
|
||||
pub fn append_event_no_recurse(&mut self, event: &Event) -> Result<u64, DatabuildError> {
|
||||
self.state.handle_event(&event);
|
||||
let idx = self.storage.append_event(event)?;
|
||||
// Unlike append_event, cascading events returned by handle_event are not appended here (no recursion)
|
||||
Ok(idx)
|
||||
}
|
||||
|
||||
// API methods
|
||||
pub fn api_handle_list_wants(&self, req: ListWantsRequest) -> ListWantsResponse {
|
||||
self.state.list_wants(&req)
|
||||
}
|
||||
|
||||
pub fn api_handle_list_taints(&self, req: ListTaintsRequest) -> ListTaintsResponse {
|
||||
self.state.list_taints(&req)
|
||||
}
|
||||
|
||||
pub fn api_handle_list_partitions(&self, req: ListPartitionsRequest) -> ListPartitionsResponse {
|
||||
self.state.list_partitions(&req)
|
||||
}
|
||||
|
||||
pub fn api_handle_list_job_runs(&self, req: ListJobRunsRequest) -> ListJobRunsResponse {
|
||||
self.state.list_job_runs(&req)
|
||||
}
|
||||
|
||||
pub fn api_handle_want_create(
|
||||
&mut self,
|
||||
req: CreateWantRequest,
|
||||
) -> Result<CreateWantResponse, DatabuildError> {
|
||||
let ev: WantCreateEventV1 = req.into();
|
||||
self.append_event(&ev.clone().into())?;
|
||||
Ok(self.state.get_want(&ev.want_id).into())
|
||||
}
|
||||
|
||||
pub fn api_handle_want_get(&self, req: GetWantRequest) -> GetWantResponse {
|
||||
self.state.get_want(&req.want_id).into()
|
||||
}
|
||||
|
||||
pub fn api_handle_want_cancel(
|
||||
&mut self,
|
||||
req: CancelWantRequest,
|
||||
) -> Result<CancelWantResponse, DatabuildError> {
|
||||
let ev: WantCancelEventV1 = req.into();
|
||||
self.append_event(&ev.clone().into())?;
|
||||
Ok(self.state.get_want(&ev.want_id).into())
|
||||
}
|
||||
|
||||
pub fn api_handle_taint_create(
|
||||
&mut self,
|
||||
req: CreateTaintRequest,
|
||||
) -> Result<CreateTaintResponse, DatabuildError> {
|
||||
// TODO Need to do this hierarchically? A taint will impact downstream partitions also
|
||||
todo!();
|
||||
let ev: TaintCreateEventV1 = req.into();
|
||||
self.append_event(&ev.clone().into())?;
|
||||
Ok(self.state.get_taint(&ev.taint_id).into())
|
||||
}
|
||||
|
||||
pub fn api_handle_taint_get(&self, req: GetTaintRequest) -> GetTaintResponse {
|
||||
todo!()
|
||||
}
|
||||
|
||||
// Not implemented yet
|
||||
// pub fn api_handle_taint_cancel(&mut self, req: CancelWantRequest) -> CancelWantResponse {
|
||||
// todo!()
|
||||
// }
|
||||
}
|
||||
|
||||
impl Clone for BuildEventLog<MemoryBELStorage> {
|
||||
fn clone(&self) -> Self {
|
||||
Self {
|
||||
storage: self.storage.clone(),
|
||||
state: self.state.clone(),
|
||||
event_broadcaster: self.event_broadcaster.clone(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
mod sqlite_bel_storage {
|
||||
use crate::build_event_log::{BELStorage, BuildEventLog, SqliteBELStorage};
|
||||
use crate::build_state::BuildState;
|
||||
use crate::data_build_event::Event;
|
||||
use crate::util::test_scenarios::default_originating_lifetime;
|
||||
use crate::{PartitionRef, WantCreateEventV1};
|
||||
use uuid::Uuid;
|
||||
|
||||
#[test]
|
||||
fn test_sqlite_append_event() {
|
||||
let storage =
|
||||
SqliteBELStorage::create(":memory:").expect("Failed to create SQLite storage");
|
||||
let state = BuildState::default();
|
||||
let mut log = BuildEventLog {
|
||||
storage,
|
||||
state,
|
||||
event_broadcaster: None,
|
||||
};
|
||||
|
||||
let want_id = "sqlite_test_1234".to_string();
|
||||
|
||||
// Initial state - verify storage is empty
|
||||
let events = log
|
||||
.storage
|
||||
.list_events(0, 100)
|
||||
.expect("Failed to list events");
|
||||
assert_eq!(events.len(), 0);
|
||||
|
||||
// Verify want doesn't exist in state
|
||||
assert!(log.state.get_want(&want_id).is_none());
|
||||
|
||||
// Append an event
|
||||
let mut e = WantCreateEventV1::default();
|
||||
e.want_id = want_id.clone();
|
||||
e.partitions = vec![PartitionRef {
|
||||
r#ref: "sqlite_partition_1234".to_string(),
|
||||
}];
|
||||
e.lifetime = Some(default_originating_lifetime());
|
||||
let event_id = log
|
||||
.append_event(&Event::WantCreateV1(e))
|
||||
.expect("append_event failed");
|
||||
|
||||
// Verify event was stored
|
||||
assert!(event_id > 0);
|
||||
|
||||
// Verify event can be retrieved
|
||||
let events = log
|
||||
.storage
|
||||
.list_events(0, 100)
|
||||
.expect("Failed to list events");
|
||||
assert_eq!(events.len(), 1);
|
||||
|
||||
let stored_event = &events[0];
|
||||
assert_eq!(stored_event.event_id, event_id);
|
||||
assert!(stored_event.timestamp > 0);
|
||||
|
||||
// Verify the event content
|
||||
if let Some(Event::WantCreateV1(want_event)) = &stored_event.event {
|
||||
assert_eq!(want_event.want_id, want_id);
|
||||
assert_eq!(want_event.partitions.len(), 1);
|
||||
assert_eq!(want_event.partitions[0].r#ref, "sqlite_partition_1234");
|
||||
} else {
|
||||
panic!("Expected WantCreateV1 event, got {:?}", stored_event.event);
|
||||
}
|
||||
|
||||
// Verify state was updated
|
||||
assert!(
|
||||
log.state.get_want(&want_id).is_some(),
|
||||
"want_id not found in state"
|
||||
);
|
||||
assert_eq!(
|
||||
log.state
|
||||
.get_want(&want_id)
|
||||
.map(|want| want.want_id.clone())
|
||||
.expect("state.wants want_id not found"),
|
||||
want_id,
|
||||
"want_id not equal in state",
|
||||
);
|
||||
|
||||
let mut e2 = WantCreateEventV1::default();
|
||||
e2.want_id = Uuid::new_v4().into();
|
||||
e2.lifetime = Some(default_originating_lifetime());
|
||||
log.append_event(&Event::WantCreateV1(e2))
|
||||
.expect("append_event failed");
|
||||
let mut e3 = WantCreateEventV1::default();
|
||||
e3.want_id = Uuid::new_v4().into();
|
||||
e3.lifetime = Some(default_originating_lifetime());
|
||||
log.append_event(&Event::WantCreateV1(e3))
|
||||
.expect("append_event failed");
|
||||
let mut e4 = WantCreateEventV1::default();
|
||||
e4.want_id = Uuid::new_v4().into();
|
||||
e4.lifetime = Some(default_originating_lifetime());
|
||||
log.append_event(&Event::WantCreateV1(e4))
|
||||
.expect("append_event failed");
|
||||
|
||||
let events = log
|
||||
.storage
|
||||
.list_events(0, 100)
|
||||
.expect("Failed to list events");
|
||||
assert_eq!(events.len(), 4);
|
||||
}
|
||||
}
|
||||
}
|
||||
1160 databuild/build_state/event_handlers.rs Normal file
File diff suppressed because it is too large
202 databuild/build_state/mod.rs Normal file
|
|
@ -0,0 +1,202 @@
|
|||
//! Build State - the heart of databuild's orchestration system
|
||||
//!
|
||||
//! The BuildState struct tracks all application state, defines valid state transitions,
|
||||
//! and manages cross-state machine state transitions (e.g. job run success resulting
|
||||
//! in partition going from Building to Live).
|
||||
//!
|
||||
//! See docs/design/build-state-semantics.md for the full conceptual model.
|
||||
|
||||
mod event_handlers;
|
||||
mod partition_transitions;
|
||||
mod queries;
|
||||
mod schedulability;
|
||||
mod want_transitions;
|
||||
|
||||
use crate::job_run_state::JobRun;
|
||||
use crate::partition_state::Partition;
|
||||
use crate::want_state::Want;
|
||||
use crate::{PartitionRef, TaintDetail};
|
||||
use std::collections::BTreeMap;
|
||||
use uuid::Uuid;
|
||||
|
||||
// Re-export public types
|
||||
pub use schedulability::{WantSchedulability, WantUpstreamStatus, WantsSchedulability};
|
||||
|
||||
/**
|
||||
Design Notes
|
||||
|
||||
The build state struct is the heart of the service and orchestrator, adapting build events to
|
||||
higher level questions about build state. One temptation is to implement the build state as a set
|
||||
of hierarchically defined reducers, to achieve information hiding and factor system capabilities and
|
||||
state tracking simply. Unfortunately, to update state based on an event, you need a mutable borrow
|
||||
of some part of the build state (that the reducer controls, for instance), and an immutable borrow
|
||||
of the whole state for read/query purposes. The whole state needs to be available to handle state
|
||||
updates like "this is the list of currently active job runs" in response to a job run event. Put
|
||||
simply, this isn't possible without introducing some locking of the whole state and mutable state
|
||||
subset, since they would conflict (the mutable subset would have already been borrowed, so can't
|
||||
be borrowed immutably as part of the whole state borrow). You might also define a "query" phase
|
||||
in which reducers query the state based on the received event, but that just increases complexity.
|
||||
|
||||
Instead, databuild opts for an entity-component system (ECS) that just provides the whole build
|
||||
state mutably to all state update functionality, trusting that we know how to use it responsibly.
|
||||
This means no boxing or "query phase", and means we can have all state updates happen as map lookups
|
||||
and updates, which is exceptionally fast. The states of the different entities are managed by state
|
||||
machines, in a pseudo-colored-petri-net style (only pseudo because we haven't formalized it). It is
|
||||
critical that these state machines, their states, and their transitions are type-safe.
|
||||
*/
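
Since the `event_handlers.rs` diff is suppressed above, here is a standalone sketch of the whole-state handler shape these design notes describe, using simplified stand-in types rather than the real databuild definitions:

```rust
// Illustrative only: flat entity maps, a &mut self handler over the whole
// state, and cascading events returned for the event log to append next.
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
enum Event {
    WantCreated { want_id: String },
    JobRunSucceeded { job_run_id: String, partition_ref: String },
}

#[derive(Default)]
struct State {
    wants: BTreeMap<String, String>,           // want_id -> requested partition ref
    live_partitions: BTreeMap<String, String>, // partition ref -> built_by job_run_id
}

impl State {
    // Whole-state handler: free to read and write any entity map, and may
    // emit follow-up events that the event log appends after this one.
    fn handle_event(&mut self, event: &Event) -> Vec<Event> {
        match event {
            Event::WantCreated { want_id } => {
                self.wants.insert(want_id.clone(), String::new());
                vec![]
            }
            Event::JobRunSucceeded { job_run_id, partition_ref } => {
                self.live_partitions
                    .insert(partition_ref.clone(), job_run_id.clone());
                // Cross-entity transitions (e.g. satisfying wants) would emit
                // further events here.
                vec![]
            }
        }
    }
}

fn main() {
    let mut state = State::default();
    let follow_ups = state.handle_event(&Event::WantCreated { want_id: "w1".into() });
    assert!(follow_ups.is_empty());
    state.handle_event(&Event::JobRunSucceeded {
        job_run_id: "jr1".into(),
        partition_ref: "reports/date=2024-01-01".into(),
    });
    assert_eq!(state.live_partitions.len(), 1);
}
```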
|
||||
|
||||
/// Tracks all application state, defines valid state transitions, and manages cross-state machine
|
||||
/// state transitions (e.g. job run success resulting in partition going from Building to Live)
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct BuildState {
|
||||
// Core entity storage
|
||||
pub(crate) wants: BTreeMap<String, Want>,
|
||||
pub(crate) taints: BTreeMap<String, TaintDetail>,
|
||||
pub(crate) job_runs: BTreeMap<String, JobRun>,
|
||||
|
||||
// UUID-based partition indexing
|
||||
pub(crate) partitions_by_uuid: BTreeMap<Uuid, Partition>,
|
||||
pub(crate) canonical_partitions: BTreeMap<String, Uuid>, // partition ref → current UUID
|
||||
|
||||
// Inverted indexes
|
||||
pub(crate) wants_for_partition: BTreeMap<String, Vec<String>>, // partition ref → want_ids
|
||||
pub(crate) downstream_waiting: BTreeMap<String, Vec<Uuid>>, // upstream ref → partition UUIDs waiting for it
|
||||
|
||||
// Consumer index for lineage queries: input_uuid → list of (output_uuid, job_run_id)
|
||||
// Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds
|
||||
// Populated from read_deps on job success
|
||||
pub(crate) partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>,
|
||||
}
|
||||
|
||||
impl BuildState {
|
||||
/// Reconstruct BuildState from a sequence of events (for read path in web server)
|
||||
/// This allows the web server to rebuild state from BEL storage without holding a lock
|
||||
pub fn from_events(events: &[crate::DataBuildEvent]) -> Self {
|
||||
let mut state = BuildState::default();
|
||||
for event in events {
|
||||
if let Some(ref inner_event) = event.event {
|
||||
// handle_event returns Vec<Event> for cascading events, but we ignore them
|
||||
// since we're replaying from a complete event log
|
||||
state.handle_event(inner_event);
|
||||
}
|
||||
}
|
||||
state
|
||||
}
|
||||
|
||||
pub fn count_job_runs(&self) -> usize {
|
||||
self.job_runs.len()
|
||||
}
|
||||
|
||||
// ===== UUID-based partition access methods =====
|
||||
|
||||
/// Get the canonical partition for a ref (the current/active partition instance)
|
||||
pub fn get_canonical_partition(&self, partition_ref: &str) -> Option<&Partition> {
|
||||
self.canonical_partitions
|
||||
.get(partition_ref)
|
||||
.and_then(|uuid| self.partitions_by_uuid.get(uuid))
|
||||
}
|
||||
|
||||
/// Get the canonical partition UUID for a ref
|
||||
pub fn get_canonical_partition_uuid(&self, partition_ref: &str) -> Option<Uuid> {
|
||||
self.canonical_partitions.get(partition_ref).copied()
|
||||
}
|
||||
|
||||
/// Get a partition by its UUID
|
||||
pub fn get_partition_by_uuid(&self, uuid: Uuid) -> Option<&Partition> {
|
||||
self.partitions_by_uuid.get(&uuid)
|
||||
}
|
||||
|
||||
/// Take the canonical partition for a ref (removes from partitions_by_uuid for state transition)
|
||||
/// The canonical_partitions mapping is NOT removed - caller must update it if creating a new partition
|
||||
pub(crate) fn take_canonical_partition(&mut self, partition_ref: &str) -> Option<Partition> {
|
||||
self.canonical_partitions
|
||||
.get(partition_ref)
|
||||
.copied()
|
||||
.and_then(|uuid| self.partitions_by_uuid.remove(&uuid))
|
||||
}
|
||||
|
||||
/// Get want IDs for a partition ref (from inverted index)
|
||||
pub fn get_wants_for_partition(&self, partition_ref: &str) -> &[String] {
|
||||
self.wants_for_partition
|
||||
.get(partition_ref)
|
||||
.map(|v| v.as_slice())
|
||||
.unwrap_or(&[])
|
||||
}
|
||||
|
||||
/// Get consumers for a partition UUID (downstream partitions that read this one)
|
||||
/// Returns list of (output_uuid, job_run_id) tuples
|
||||
pub fn get_partition_consumers(&self, uuid: &Uuid) -> &[(Uuid, String)] {
|
||||
self.partition_consumers
|
||||
.get(uuid)
|
||||
.map(|v| v.as_slice())
|
||||
.unwrap_or(&[])
|
||||
}
|
||||
|
||||
/// Register a want in the wants_for_partition inverted index
|
||||
pub(crate) fn register_want_for_partitions(
|
||||
&mut self,
|
||||
want_id: &str,
|
||||
partition_refs: &[PartitionRef],
|
||||
) {
|
||||
for pref in partition_refs {
|
||||
let want_ids = self
|
||||
.wants_for_partition
|
||||
.entry(pref.r#ref.clone())
|
||||
.or_insert_with(Vec::new);
|
||||
if !want_ids.contains(&want_id.to_string()) {
|
||||
want_ids.push(want_id.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Update a partition in the indexes (after state transition)
|
||||
pub(crate) fn update_partition(&mut self, partition: Partition) {
|
||||
let uuid = partition.uuid();
|
||||
self.partitions_by_uuid.insert(uuid, partition);
|
||||
}
|
||||
|
||||
// Test helpers
|
||||
pub(crate) fn with_wants(self, wants: BTreeMap<String, Want>) -> Self {
|
||||
Self { wants, ..self }
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
pub(crate) fn with_partitions(
|
||||
self,
|
||||
old_partitions: BTreeMap<String, crate::PartitionDetail>,
|
||||
) -> Self {
|
||||
use crate::partition_state::PartitionWithState;
|
||||
|
||||
let mut canonical_partitions: BTreeMap<String, Uuid> = BTreeMap::new();
|
||||
let mut partitions_by_uuid: BTreeMap<Uuid, Partition> = BTreeMap::new();
|
||||
|
||||
// Convert PartitionDetail to Live partitions for testing
|
||||
for (key, detail) in old_partitions {
|
||||
let partition_ref = detail.r#ref.clone().unwrap_or_default();
|
||||
// Create a deterministic UUID for test data
|
||||
let uuid =
|
||||
crate::partition_state::derive_partition_uuid("test_job_run", &partition_ref.r#ref);
|
||||
let live_partition = Partition::Live(PartitionWithState {
|
||||
uuid,
|
||||
partition_ref,
|
||||
state: crate::partition_state::LiveState {
|
||||
built_at: 0,
|
||||
built_by: "test_job_run".to_string(),
|
||||
},
|
||||
});
|
||||
|
||||
canonical_partitions.insert(key, uuid);
|
||||
partitions_by_uuid.insert(uuid, live_partition);
|
||||
}
|
||||
|
||||
Self {
|
||||
canonical_partitions,
|
||||
partitions_by_uuid,
|
||||
..self
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) mod consts {
|
||||
pub const DEFAULT_PAGE_SIZE: u64 = 100;
|
||||
}
|
||||
345 databuild/build_state/partition_transitions.rs Normal file
|
|
@ -0,0 +1,345 @@
|
|||
//! Partition state transition logic
|
||||
//!
|
||||
//! Methods for transitioning partitions between states (Building, Live, Failed,
|
||||
//! UpstreamBuilding, UpForRetry, UpstreamFailed) and managing downstream dependencies.
|
||||
|
||||
use crate::PartitionRef;
|
||||
use crate::partition_state::{
|
||||
BuildingPartitionRef, BuildingState, FailedPartitionRef, LivePartitionRef, Partition,
|
||||
PartitionWithState,
|
||||
};
|
||||
use crate::util::current_timestamp;
|
||||
use uuid::Uuid;
|
||||
|
||||
use super::BuildState;
|
||||
|
||||
impl BuildState {
|
||||
/// Create a new partition in Building state and update indexes
|
||||
pub(crate) fn create_partition_building(
|
||||
&mut self,
|
||||
job_run_id: &str,
|
||||
partition_ref: PartitionRef,
|
||||
) -> Uuid {
|
||||
let partition =
|
||||
PartitionWithState::<BuildingState>::new(job_run_id.to_string(), partition_ref.clone());
|
||||
let uuid = partition.uuid;
|
||||
|
||||
// Update indexes
|
||||
self.partitions_by_uuid
|
||||
.insert(uuid, Partition::Building(partition));
|
||||
self.canonical_partitions
|
||||
.insert(partition_ref.r#ref.clone(), uuid);
|
||||
|
||||
tracing::info!(
|
||||
partition = %partition_ref.r#ref,
|
||||
uuid = %uuid,
|
||||
job_run_id = %job_run_id,
|
||||
"Partition: Created in Building state"
|
||||
);
|
||||
|
||||
uuid
|
||||
}
|
||||
|
||||
/// Create partitions in Building state
|
||||
/// Used when a job run starts building partitions.
|
||||
/// Note: Partitions no longer have a Missing state - they start directly as Building.
|
||||
pub(crate) fn transition_partitions_to_building(
|
||||
&mut self,
|
||||
partition_refs: &[BuildingPartitionRef],
|
||||
job_run_id: &str,
|
||||
) {
|
||||
for building_ref in partition_refs {
|
||||
if let Some(partition) = self.get_canonical_partition(&building_ref.0.r#ref).cloned() {
|
||||
// Partition already exists - this is an error unless we're retrying from UpForRetry
|
||||
match partition {
|
||||
Partition::UpForRetry(_) => {
|
||||
// Valid: UpForRetry -> Building (retry after deps satisfied)
|
||||
// Old partition stays in partitions_by_uuid as historical record
|
||||
// Create new Building partition with fresh UUID
|
||||
let uuid =
|
||||
self.create_partition_building(job_run_id, building_ref.0.clone());
|
||||
tracing::info!(
|
||||
partition = %building_ref.0.r#ref,
|
||||
job_run_id = %job_run_id,
|
||||
uuid = %uuid,
|
||||
"Partition: UpForRetry → Building (retry)"
|
||||
);
|
||||
}
|
||||
_ => {
|
||||
panic!(
|
||||
"BUG: Invalid state - partition {} cannot start building from state {:?}",
|
||||
building_ref.0.r#ref, partition
|
||||
)
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Partition doesn't exist yet - create directly in Building state
|
||||
let uuid = self.create_partition_building(job_run_id, building_ref.0.clone());
|
||||
tracing::info!(
|
||||
partition = %building_ref.0.r#ref,
|
||||
job_run_id = %job_run_id,
|
||||
uuid = %uuid,
|
||||
"Partition: (new) → Building"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition partitions from Building to Live state
|
||||
/// Used when a job run successfully completes
|
||||
pub(crate) fn transition_partitions_to_live(
|
||||
&mut self,
|
||||
partition_refs: &[LivePartitionRef],
|
||||
job_run_id: &str,
|
||||
timestamp: u64,
|
||||
) {
|
||||
for pref in partition_refs {
|
||||
let partition = self
|
||||
.take_canonical_partition(&pref.0.r#ref)
|
||||
.expect(&format!(
|
||||
"BUG: Partition {} must exist and be in Building state before completion",
|
||||
pref.0.r#ref
|
||||
));
|
||||
|
||||
// ONLY valid transition: Building -> Live
|
||||
let transitioned = match partition {
|
||||
Partition::Building(building) => {
|
||||
tracing::info!(
|
||||
partition = %pref.0.r#ref,
|
||||
job_run_id = %job_run_id,
|
||||
"Partition: Building → Live"
|
||||
);
|
||||
Partition::Live(building.complete(timestamp))
|
||||
}
|
||||
// All other states are invalid
|
||||
_ => {
|
||||
panic!(
|
||||
"BUG: Invalid state - partition {} must be Building to transition to Live, found {:?}",
|
||||
pref.0.r#ref, partition
|
||||
)
|
||||
}
|
||||
};
|
||||
self.update_partition(transitioned);
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition partitions from Building to Failed state
|
||||
/// Used when a job run fails
|
||||
pub(crate) fn transition_partitions_to_failed(
|
||||
&mut self,
|
||||
partition_refs: &[FailedPartitionRef],
|
||||
job_run_id: &str,
|
||||
timestamp: u64,
|
||||
) {
|
||||
for pref in partition_refs {
|
||||
let partition = self
|
||||
.take_canonical_partition(&pref.0.r#ref)
|
||||
.expect(&format!(
|
||||
"BUG: Partition {} must exist and be in Building state before failure",
|
||||
pref.0.r#ref
|
||||
));
|
||||
|
||||
// ONLY valid transition: Building -> Failed
|
||||
let transitioned = match partition {
|
||||
Partition::Building(building) => {
|
||||
tracing::info!(
|
||||
partition = %pref.0.r#ref,
|
||||
job_run_id = %job_run_id,
|
||||
"Partition: Building → Failed"
|
||||
);
|
||||
Partition::Failed(building.fail(timestamp))
|
||||
}
|
||||
// All other states are invalid
|
||||
_ => {
|
||||
panic!(
|
||||
"BUG: Invalid state - partition {} must be Building to transition to Failed, found {:?}",
|
||||
pref.0.r#ref, partition
|
||||
)
|
||||
}
|
||||
};
|
||||
self.update_partition(transitioned);
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition partitions from Building to UpstreamBuilding state
|
||||
/// Used when a job run encounters missing dependencies and cannot proceed.
|
||||
/// The partition waits for its upstream deps to be built before becoming UpForRetry.
|
||||
pub(crate) fn transition_partitions_to_upstream_building(
|
||||
&mut self,
|
||||
partition_refs: &[BuildingPartitionRef],
|
||||
missing_deps: Vec<PartitionRef>,
|
||||
) {
|
||||
for building_ref in partition_refs {
|
||||
let partition = self
|
||||
.take_canonical_partition(&building_ref.0.r#ref)
|
||||
.expect(&format!(
|
||||
"BUG: Partition {} must exist and be in Building state during dep_miss",
|
||||
building_ref.0.r#ref
|
||||
));
|
||||
|
||||
// Only valid transition: Building -> UpstreamBuilding
|
||||
let transitioned = match partition {
|
||||
Partition::Building(building) => {
|
||||
let partition_uuid = building.uuid;
|
||||
tracing::info!(
|
||||
partition = %building_ref.0.r#ref,
|
||||
uuid = %partition_uuid,
|
||||
missing_deps = ?missing_deps.iter().map(|p| &p.r#ref).collect::<Vec<_>>(),
|
||||
"Partition: Building → UpstreamBuilding (dep miss)"
|
||||
);
|
||||
|
||||
// Update downstream_waiting index: for each missing dep, record that this partition is waiting
|
||||
for missing_dep in &missing_deps {
|
||||
self.downstream_waiting
|
||||
.entry(missing_dep.r#ref.clone())
|
||||
.or_default()
|
||||
.push(partition_uuid);
|
||||
}
|
||||
|
||||
Partition::UpstreamBuilding(building.dep_miss(missing_deps.clone()))
|
||||
}
|
||||
// All other states are invalid
|
||||
_ => {
|
||||
panic!(
|
||||
"BUG: Invalid state - partition {} must be Building during dep_miss, found {:?}",
|
||||
building_ref.0.r#ref, partition
|
||||
)
|
||||
}
|
||||
};
|
||||
self.update_partition(transitioned);
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition partitions from UpstreamBuilding to UpForRetry when their upstream deps become Live.
|
||||
/// This should be called when partitions become Live to check if any downstream partitions can now retry.
|
||||
/// Uses the `downstream_waiting` index for O(1) lookup of affected partitions.
|
||||
pub(crate) fn unblock_downstream_partitions(
|
||||
&mut self,
|
||||
newly_live_partition_refs: &[LivePartitionRef],
|
||||
) {
|
||||
// Collect UUIDs of partitions that might be unblocked using the inverted index
|
||||
let mut uuids_to_check: Vec<Uuid> = Vec::new();
|
||||
for live_ref in newly_live_partition_refs {
|
||||
if let Some(waiting_uuids) = self.downstream_waiting.get(&live_ref.0.r#ref) {
|
||||
uuids_to_check.extend(waiting_uuids.iter().cloned());
|
||||
}
|
||||
}
|
||||
|
||||
// Deduplicate UUIDs (a partition might be waiting for multiple deps that all became live)
|
||||
uuids_to_check.sort();
|
||||
uuids_to_check.dedup();
|
||||
|
||||
for uuid in uuids_to_check {
|
||||
// Get partition by UUID - it might have been transitioned already or no longer exist
|
||||
let Some(partition) = self.partitions_by_uuid.get(&uuid).cloned() else {
|
||||
continue;
|
||||
};
|
||||
|
||||
let partition_ref = partition.partition_ref().r#ref.clone();
|
||||
|
||||
// Only process UpstreamBuilding partitions
|
||||
if let Partition::UpstreamBuilding(mut upstream_building) = partition {
|
||||
// Remove satisfied deps from missing_deps
|
||||
for live_ref in newly_live_partition_refs {
|
||||
upstream_building
|
||||
.state
|
||||
.missing_deps
|
||||
.retain(|d| d.r#ref != live_ref.0.r#ref);
|
||||
// Also remove from downstream_waiting index
|
||||
if let Some(waiting) = self.downstream_waiting.get_mut(&live_ref.0.r#ref) {
|
||||
waiting.retain(|u| *u != uuid);
|
||||
}
|
||||
}
|
||||
|
||||
let transitioned = if upstream_building.state.missing_deps.is_empty() {
|
||||
// All deps satisfied, transition to UpForRetry
|
||||
tracing::info!(
|
||||
partition = %partition_ref,
|
||||
uuid = %uuid,
|
||||
"Partition: UpstreamBuilding → UpForRetry (all upstreams satisfied)"
|
||||
);
|
||||
Partition::UpForRetry(upstream_building.upstreams_satisfied())
|
||||
} else {
|
||||
// Still waiting for more deps
|
||||
tracing::debug!(
|
||||
partition = %partition_ref,
|
||||
uuid = %uuid,
|
||||
remaining_deps = ?upstream_building.state.missing_deps.iter().map(|d| &d.r#ref).collect::<Vec<_>>(),
|
||||
"Partition remains in UpstreamBuilding (still waiting for deps)"
|
||||
);
|
||||
Partition::UpstreamBuilding(upstream_building)
|
||||
};
|
||||
|
||||
self.update_partition(transitioned);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Cascade failures to downstream partitions when their upstream dependencies fail.
|
||||
/// Transitions UpstreamBuilding → UpstreamFailed for partitions waiting on failed upstreams.
|
||||
/// Uses the `downstream_waiting` index for O(1) lookup of affected partitions.
|
||||
pub(crate) fn cascade_failures_to_downstream_partitions(
|
||||
&mut self,
|
||||
failed_partition_refs: &[FailedPartitionRef],
|
||||
) {
|
||||
// Collect UUIDs of partitions that are waiting for the failed partitions
|
||||
let mut uuids_to_fail: Vec<Uuid> = Vec::new();
|
||||
for failed_ref in failed_partition_refs {
|
||||
if let Some(waiting_uuids) = self.downstream_waiting.get(&failed_ref.0.r#ref) {
|
||||
uuids_to_fail.extend(waiting_uuids.iter().cloned());
|
||||
}
|
||||
}
|
||||
|
||||
// Deduplicate UUIDs
|
||||
uuids_to_fail.sort();
|
||||
uuids_to_fail.dedup();
|
||||
|
||||
for uuid in uuids_to_fail {
|
||||
// Get partition by UUID
|
||||
let Some(partition) = self.partitions_by_uuid.get(&uuid).cloned() else {
|
||||
continue;
|
||||
};
|
||||
|
||||
let partition_ref = partition.partition_ref().r#ref.clone();
|
||||
|
||||
// Only process UpstreamBuilding partitions
|
||||
if let Partition::UpstreamBuilding(upstream_building) = partition {
|
||||
// Collect which upstream refs failed
|
||||
let failed_upstream_refs: Vec<PartitionRef> = failed_partition_refs
|
||||
.iter()
|
||||
.filter(|f| {
|
||||
upstream_building
|
||||
.state
|
||||
.missing_deps
|
||||
.iter()
|
||||
.any(|d| d.r#ref == f.0.r#ref)
|
||||
})
|
||||
.map(|f| f.0.clone())
|
||||
.collect();
|
||||
|
||||
if !failed_upstream_refs.is_empty() {
|
||||
tracing::info!(
|
||||
partition = %partition_ref,
|
||||
uuid = %uuid,
|
||||
failed_upstreams = ?failed_upstream_refs.iter().map(|p| &p.r#ref).collect::<Vec<_>>(),
|
||||
"Partition: UpstreamBuilding → UpstreamFailed (upstream failed)"
|
||||
);
|
||||
|
||||
// Remove from downstream_waiting index for all deps
|
||||
for dep in &upstream_building.state.missing_deps {
|
||||
if let Some(waiting) = self.downstream_waiting.get_mut(&dep.r#ref) {
|
||||
waiting.retain(|u| *u != uuid);
|
||||
}
|
||||
}
|
||||
|
||||
// Transition to UpstreamFailed
|
||||
let transitioned = Partition::UpstreamFailed(
|
||||
upstream_building
|
||||
.upstream_failed(failed_upstream_refs, current_timestamp()),
|
||||
);
|
||||
self.update_partition(transitioned);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
362 databuild/build_state/queries.rs Normal file
|
|
@ -0,0 +1,362 @@
|
|||
//! Query methods for BuildState
|
||||
//!
|
||||
//! Read-only methods for accessing state (get_*, list_*) used by the API layer.
|
||||
|
||||
use crate::util::{HasRelatedIds, RelatedIds};
|
||||
use crate::{
|
||||
GetJobRunResponse, GetPartitionResponse, GetWantResponse, JobRunDetail, ListJobRunsRequest,
|
||||
ListJobRunsResponse, ListPartitionsRequest, ListPartitionsResponse, ListTaintsRequest,
|
||||
ListTaintsResponse, ListWantsRequest, ListWantsResponse, PartitionDetail, RelatedEntities,
|
||||
TaintDetail, WantDetail,
|
||||
};
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
use super::{BuildState, consts};
|
||||
|
||||
impl BuildState {
|
||||
pub fn get_want(&self, want_id: &str) -> Option<WantDetail> {
|
||||
self.wants.get(want_id).map(|w| {
|
||||
let mut detail = w.to_detail();
|
||||
// Populate job_runs and compute derivative_want_ids by traversing job runs.
|
||||
//
|
||||
// derivative_want_ids is computed at query time rather than maintained during
|
||||
// event handling. The relationship flows: Want → JobRun → (dep-miss) → EphemeralWant
|
||||
//
|
||||
// - JobRun tracks which derivative wants it spawned (on DepMissState)
|
||||
// - Want only tracks which job runs serviced it (job_run_ids)
|
||||
// - At query time, we traverse: Want's job_run_ids → each JobRun's derivative_want_ids
|
||||
//
|
||||
// This keeps event handling simple (just update the job run) and keeps JobRun
|
||||
// as the source of truth for derivative want relationships.
|
||||
for job_run_id in &detail.job_run_ids {
|
||||
if let Some(job_run) = self.job_runs.get(job_run_id) {
|
||||
let job_detail = job_run.to_detail();
|
||||
// Collect derivative want IDs
|
||||
for derivative_want_id in &job_detail.derivative_want_ids {
|
||||
if !detail.derivative_want_ids.contains(derivative_want_id) {
|
||||
detail.derivative_want_ids.push(derivative_want_id.clone());
|
||||
}
|
||||
}
|
||||
// Add full job run details
|
||||
detail.job_runs.push(job_detail);
|
||||
}
|
||||
}
|
||||
detail
|
||||
})
|
||||
}
|
||||
|
||||
pub fn get_taint(&self, taint_id: &str) -> Option<TaintDetail> {
|
||||
self.taints.get(taint_id).cloned()
|
||||
}
|
||||
|
||||
pub fn get_partition(&self, partition_id: &str) -> Option<PartitionDetail> {
|
||||
self.get_canonical_partition(partition_id)
|
||||
.map(|p| p.to_detail())
|
||||
}
|
||||
|
||||
pub fn get_job_run(&self, job_run_id: &str) -> Option<JobRunDetail> {
|
||||
self.job_runs.get(job_run_id).map(|jr| jr.to_detail())
|
||||
}
|
||||
|
||||
pub fn list_wants(&self, request: &ListWantsRequest) -> ListWantsResponse {
|
||||
let page = request.page.unwrap_or(0);
|
||||
let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
|
||||
|
||||
let start = page * page_size;
|
||||
|
||||
// Paginate first, then convert only the needed wants to WantDetail
|
||||
let data: Vec<WantDetail> = self
|
||||
.wants
|
||||
.values()
|
||||
.skip(start as usize)
|
||||
.take(page_size as usize)
|
||||
.map(|w| w.to_detail())
|
||||
.collect();
|
||||
|
||||
ListWantsResponse {
|
||||
data,
|
||||
match_count: self.wants.len() as u64,
|
||||
page,
|
||||
page_size,
|
||||
index: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn list_taints(&self, request: &ListTaintsRequest) -> ListTaintsResponse {
|
||||
let page = request.page.unwrap_or(0);
|
||||
let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
|
||||
ListTaintsResponse {
|
||||
data: list_state_items(&self.taints, page, page_size),
|
||||
match_count: self.taints.len() as u64,
|
||||
page,
|
||||
page_size,
|
||||
index: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn list_partitions(&self, request: &ListPartitionsRequest) -> ListPartitionsResponse {
|
||||
let page = request.page.unwrap_or(0);
|
||||
let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
|
||||
// Convert canonical partitions to PartitionDetail for API
|
||||
let partition_details: BTreeMap<String, PartitionDetail> = self
|
||||
.canonical_partitions
|
||||
.iter()
|
||||
.filter_map(|(k, uuid)| {
|
||||
self.partitions_by_uuid
|
||||
.get(uuid)
|
||||
.map(|p| (k.clone(), p.to_detail()))
|
||||
})
|
||||
.collect();
|
||||
ListPartitionsResponse {
|
||||
data: list_state_items(&partition_details, page, page_size),
|
||||
match_count: self.canonical_partitions.len() as u64,
|
||||
page,
|
||||
page_size,
|
||||
index: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn list_job_runs(&self, request: &ListJobRunsRequest) -> ListJobRunsResponse {
|
||||
let page = request.page.unwrap_or(0);
|
||||
let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
|
||||
|
||||
let start = page * page_size;
|
||||
let data: Vec<JobRunDetail> = self
|
||||
.job_runs
|
||||
.values()
|
||||
.skip(start as usize)
|
||||
.take(page_size as usize)
|
||||
.map(|jr| jr.to_detail())
|
||||
.collect();
|
||||
|
||||
ListJobRunsResponse {
|
||||
data,
|
||||
match_count: self.job_runs.len() as u64,
|
||||
page,
|
||||
page_size,
|
||||
index: None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn list_state_items<T: Clone>(map: &BTreeMap<String, T>, page: u64, page_size: u64) -> Vec<T> {
|
||||
// TODO when we add filtering, can we add it generically via some trait or filter object that can be provided?
|
||||
let start = page * page_size;
|
||||
|
||||
map.values()
|
||||
.skip(start as usize)
|
||||
.take(page_size as usize)
|
||||
.cloned()
|
||||
.collect()
|
||||
}

// ============================================================================
// Response builders with RelatedEntities index
// ============================================================================

impl BuildState {
    /// Resolve RelatedIds to a RelatedEntities index by looking up entities in BuildState.
    /// This is the central method for building the index from collected IDs.
    pub fn resolve_related_ids(&self, ids: &RelatedIds) -> RelatedEntities {
        let mut index = RelatedEntities::default();

        // Resolve partition refs
        for partition_ref in &ids.partition_refs {
            if !index.partitions.contains_key(partition_ref) {
                if let Some(p) = self.get_canonical_partition(partition_ref) {
                    index
                        .partitions
                        .insert(partition_ref.clone(), p.to_detail());
                }
            }
        }

        // Resolve partition UUIDs
        for uuid in &ids.partition_uuids {
            if let Some(p) = self.partitions_by_uuid.get(uuid) {
                let detail = p.to_detail();
                if let Some(ref pref) = detail.r#ref {
                    if !index.partitions.contains_key(&pref.r#ref) {
                        index.partitions.insert(pref.r#ref.clone(), detail);
                    }
                }
            }
        }

        // Resolve job run IDs
        for job_run_id in &ids.job_run_ids {
            if !index.job_runs.contains_key(job_run_id) {
                if let Some(jr) = self.job_runs.get(job_run_id) {
                    index.job_runs.insert(job_run_id.clone(), jr.to_detail());
                }
            }
        }

        // Resolve want IDs
        for want_id in &ids.want_ids {
            if !index.wants.contains_key(want_id) {
                if let Some(w) = self.wants.get(want_id) {
                    index.wants.insert(want_id.clone(), w.to_detail());
                }
            }
        }

        index
    }

    /// Get a want with its related entities (job runs, partitions)
    pub fn get_want_with_index(&self, want_id: &str) -> Option<GetWantResponse> {
        let want = self.wants.get(want_id)?;
        let want_detail = want.to_detail();
        let ids = want.related_ids();
        let index = self.resolve_related_ids(&ids);

        Some(GetWantResponse {
            data: Some(want_detail),
            index: Some(index),
        })
    }

    /// Get a partition with its related entities (builder job run, downstream consumers)
    pub fn get_partition_with_index(&self, partition_ref: &str) -> Option<GetPartitionResponse> {
        let partition = self.get_canonical_partition(partition_ref)?;
        let partition_detail = partition.to_detail();

        let mut ids = partition.related_ids();

        // Add downstream consumers from the consumer index (not stored on partition)
        let uuid = partition.uuid();
        for (output_uuid, job_run_id) in self.get_partition_consumers(&uuid) {
            if !ids.partition_uuids.contains(output_uuid) {
                ids.partition_uuids.push(*output_uuid);
            }
            if !ids.job_run_ids.contains(job_run_id) {
                ids.job_run_ids.push(job_run_id.clone());
            }
        }

        // Add wants that reference this partition (from inverted index)
        for want_id in self.get_wants_for_partition(partition_ref) {
            if !ids.want_ids.contains(want_id) {
                ids.want_ids.push(want_id.clone());
            }
        }

        let index = self.resolve_related_ids(&ids);

        Some(GetPartitionResponse {
            data: Some(partition_detail),
            index: Some(index),
        })
    }

    /// Get a job run with its related entities (read/wrote partitions, derivative wants)
    pub fn get_job_run_with_index(&self, job_run_id: &str) -> Option<GetJobRunResponse> {
        let job_run = self.job_runs.get(job_run_id)?;
        let job_run_detail = job_run.to_detail();
        let ids = job_run.related_ids();
        let index = self.resolve_related_ids(&ids);

        Some(GetJobRunResponse {
            data: Some(job_run_detail),
            index: Some(index),
        })
    }

    /// List wants with related entities index
    pub fn list_wants_with_index(&self, request: &ListWantsRequest) -> ListWantsResponse {
        let page = request.page.unwrap_or(0);
        let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
        let start = page * page_size;

        let wants: Vec<_> = self
            .wants
            .values()
            .skip(start as usize)
            .take(page_size as usize)
            .collect();

        // Collect related IDs from all wants
        let mut all_ids = RelatedIds::default();
        for want in &wants {
            all_ids.merge(want.related_ids());
        }

        let data: Vec<WantDetail> = wants.iter().map(|w| w.to_detail()).collect();
        let index = self.resolve_related_ids(&all_ids);

        ListWantsResponse {
            data,
            match_count: self.wants.len() as u64,
            page,
            page_size,
            index: Some(index),
        }
    }

    /// List partitions with related entities index
    pub fn list_partitions_with_index(
        &self,
        request: &ListPartitionsRequest,
    ) -> ListPartitionsResponse {
        let page = request.page.unwrap_or(0);
        let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
        let start = page * page_size;

        let partitions: Vec<_> = self
            .canonical_partitions
            .iter()
            .skip(start as usize)
            .take(page_size as usize)
            .filter_map(|(_, uuid)| self.partitions_by_uuid.get(uuid))
            .collect();

        // Collect related IDs from all partitions
        let mut all_ids = RelatedIds::default();
        for partition in &partitions {
            all_ids.merge(partition.related_ids());
        }

        let data: Vec<PartitionDetail> = partitions.iter().map(|p| p.to_detail()).collect();
        let index = self.resolve_related_ids(&all_ids);

        ListPartitionsResponse {
            data,
            match_count: self.canonical_partitions.len() as u64,
            page,
            page_size,
            index: Some(index),
        }
    }

    /// List job runs with related entities index
    pub fn list_job_runs_with_index(&self, request: &ListJobRunsRequest) -> ListJobRunsResponse {
        let page = request.page.unwrap_or(0);
        let page_size = request.page_size.unwrap_or(consts::DEFAULT_PAGE_SIZE);
        let start = page * page_size;

        let job_runs: Vec<_> = self
            .job_runs
            .values()
            .skip(start as usize)
            .take(page_size as usize)
            .collect();

        // Collect related IDs from all job runs
        let mut all_ids = RelatedIds::default();
        for job_run in &job_runs {
            all_ids.merge(job_run.related_ids());
        }

        let data: Vec<JobRunDetail> = job_runs.iter().map(|jr| jr.to_detail()).collect();
        let index = self.resolve_related_ids(&all_ids);

        ListJobRunsResponse {
            data,
            match_count: self.job_runs.len() as u64,
            page,
            page_size,
            index: Some(index),
        }
    }
}
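
Not part of the diff: a sketch of how a read-side caller might consume the detail-plus-index shape, assuming only the field names visible above (`data`, `index`, and the `RelatedEntities` maps).

```rust
// Hypothetical helper: count related entities without further BuildState lookups,
// since resolve_related_ids has already materialized everything the want touches.
fn summarize_want(state: &BuildState, want_id: &str) -> Option<(usize, usize)> {
    let resp = state.get_want_with_index(want_id)?;
    let index = resp.index.unwrap_or_default();
    Some((index.job_runs.len(), index.partitions.len()))
}
```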

databuild/build_state/schedulability.rs (new file, 176 lines)
@ -0,0 +1,176 @@
//! Want schedulability logic
//!
//! Types and methods for determining whether wants are schedulable based on
//! upstream partition states and target partition build status.

use crate::partition_state::{
    BuildingPartitionRef, LivePartitionRef, Partition, TaintedPartitionRef,
};
use crate::{PartitionRef, WantDetail};
use serde::{Deserialize, Serialize};

use super::BuildState;

/// The status of partitions required by a want to build (sensed from dep miss job run)
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct WantUpstreamStatus {
    pub live: Vec<LivePartitionRef>,
    pub tainted: Vec<TaintedPartitionRef>,
    /// Upstream partitions that are not ready (don't exist, or are in Building/UpstreamBuilding/UpForRetry/Failed/UpstreamFailed states)
    pub not_ready: Vec<PartitionRef>,
    /// Target partitions that are currently being built by another job
    pub building: Vec<BuildingPartitionRef>,
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct WantSchedulability {
    pub want: WantDetail,
    pub status: WantUpstreamStatus,
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct WantsSchedulability(pub Vec<WantSchedulability>);

impl WantsSchedulability {
    pub fn schedulable_wants(self) -> Vec<WantDetail> {
        self.0
            .iter()
            .filter_map(|ws| match ws.is_schedulable() {
                false => None,
                true => Some(ws.want.clone()),
            })
            .collect()
    }
}

impl WantSchedulability {
    pub fn is_schedulable(&self) -> bool {
        // Want is schedulable if:
        // - No not-ready upstream dependencies (must all be Live or Tainted)
        // - No tainted upstream dependencies
        // - No target partitions currently being built by another job
        self.status.not_ready.is_empty()
            && self.status.tainted.is_empty()
            && self.status.building.is_empty()
    }
}

impl BuildState {
    /// Wants are schedulable when their upstreams are ready and target partitions are not already building
    pub fn want_schedulability(&self, want: &WantDetail) -> WantSchedulability {
        // Check upstream partition statuses (dependencies)
        let mut live: Vec<LivePartitionRef> = Vec::new();
        let mut tainted: Vec<TaintedPartitionRef> = Vec::new();
        let mut not_ready: Vec<PartitionRef> = Vec::new(); // Partitions that don't exist or aren't Live

        for upstream_ref in &want.upstreams {
            match self.get_canonical_partition(&upstream_ref.r#ref) {
                Some(partition) => {
                    match partition {
                        Partition::Live(p) => live.push(p.get_ref()),
                        Partition::Tainted(p) => tainted.push(p.get_ref()),
                        // All other states (Building, UpstreamBuilding, UpForRetry, Failed, UpstreamFailed) mean upstream is not ready
                        _ => not_ready.push(upstream_ref.clone()),
                    }
                }
                None => {
                    // Partition doesn't exist yet - it's not ready
                    not_ready.push(upstream_ref.clone());
                }
            }
        }

        // Check target partition statuses (what this want is trying to build)
        // If any target partition is already Building, this want should wait
        let mut building: Vec<BuildingPartitionRef> = Vec::new();
        for target_ref in &want.partitions {
            if let Some(partition) = self.get_canonical_partition(&target_ref.r#ref) {
                if let Partition::Building(p) = partition {
                    building.push(p.get_ref());
                }
            }
        }

        WantSchedulability {
            want: want.clone(),
            status: WantUpstreamStatus {
                live,
                tainted,
                not_ready,
                building,
            },
        }
    }

    pub fn wants_schedulability(&self) -> WantsSchedulability {
        WantsSchedulability(
            self.wants
                .values()
                // Use type-safe is_schedulable() - only Idle wants are schedulable
                .filter(|w| w.is_schedulable())
                .map(|w| self.want_schedulability(&w.to_detail()))
                .collect(),
        )
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::want_state::{IdleState as WantIdleState, Want, WantInfo, WantWithState};
    use crate::{PartitionDetail, PartitionRef, PartitionStatus, WantStatus};
    use std::collections::BTreeMap;

    impl WantDetail {
        fn with_partitions(self, partitions: Vec<PartitionRef>) -> Self {
            Self { partitions, ..self }
        }
        fn with_upstreams(self, upstreams: Vec<PartitionRef>) -> Self {
            Self { upstreams, ..self }
        }
        fn with_status(self, status: Option<WantStatus>) -> Self {
            Self { status, ..self }
        }
    }

    impl PartitionDetail {
        fn with_status(self, status: Option<PartitionStatus>) -> Self {
            Self { status, ..self }
        }
        fn with_ref(self, r#ref: Option<PartitionRef>) -> Self {
            Self { r#ref, ..self }
        }
    }

    #[test]
    fn test_empty_wants_noop() {
        assert_eq!(BuildState::default().wants_schedulability().0.len(), 0);
    }

    // A want with satisfied upstreams (incl "none") should be schedulable
    #[test]
    fn test_simple_want_with_live_upstream_is_schedulable() {
        // Given...
        let test_partition = "test_partition";
        let state = BuildState::default()
            .with_wants(BTreeMap::from([(
                "foo".to_string(),
                Want::Idle(WantWithState {
                    want: WantInfo {
                        partitions: vec![test_partition.into()],
                        ..Default::default()
                    },
                    state: WantIdleState {},
                }),
            )]))
            .with_partitions(BTreeMap::from([(
                test_partition.to_string(),
                PartitionDetail::default().with_ref(Some(test_partition.into())),
            )]));

        // Should...
        let schedulability = state.wants_schedulability();
        let ws = schedulability.0.first().unwrap();
        assert!(ws.is_schedulable());
    }
}
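
Not part of the diff: a minimal sketch of the schedulability predicate in isolation. It assumes `WantDetail` implements `Default` (as the generated types above appear to); the empty vectors model a want whose upstreams are all Live and whose targets are not already building.

```rust
// Hypothetical sketch: an empty upstream status means nothing blocks scheduling.
let ws = WantSchedulability {
    want: WantDetail::default(),
    status: WantUpstreamStatus {
        live: vec![],      // Live upstreams never block
        tainted: vec![],   // a tainted upstream would block
        not_ready: vec![], // a missing or still-building upstream would block
        building: vec![],  // a target already being built would block
    },
};
assert!(ws.is_schedulable());
```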

databuild/build_state/want_transitions.rs (new file, 403 lines)
@ -0,0 +1,403 @@
//! Want state transition logic
//!
//! Methods for transitioning wants between states and managing dependencies
//! between wants (derivative wants from dep misses).

use crate::PartitionRef;
use crate::job_run_state::JobRun;
use crate::partition_state::{FailedPartitionRef, LivePartitionRef, Partition};
use crate::want_state::{FailedWantId, SuccessfulWantId, Want};

use super::BuildState;

impl BuildState {
    /// Handle creation of a derivative want (created due to job dep miss)
    ///
    /// When a job reports missing dependencies, it returns WantCreateV1 events for those missing partitions.
    /// Those events get appended to the BEL and eventually processed by handle_want_create().
    ///
    /// This function is called when we detect a derivative want (has source.job_triggered) and transitions
    /// the impacted wants to UpstreamBuilding state, tracking the derivative want ID as an upstream dependency.
    ///
    /// KEY INSIGHT: We must use the actual want_id from the WantCreateV1 event, not synthetic UUIDs generated
    /// during event processing. This ensures replay works correctly - the same want IDs are used both during
    /// original execution and during replay from the BEL.
    pub(crate) fn handle_derivative_want_creation(
        &mut self,
        derivative_want_id: &str,
        derivative_want_partitions: &[PartitionRef],
        source_job_run_id: &str,
    ) {
        // Look up the job run that triggered this derivative want
        // This job run must be in DepMiss state because it reported missing dependencies
        let job_run = self.job_runs.get(source_job_run_id).expect(&format!(
            "BUG: Job run {} must exist when derivative want created",
            source_job_run_id
        ));

        // Extract the missing deps from the DepMiss job run
        let missing_deps = match job_run {
            JobRun::DepMiss(dep_miss) => dep_miss.get_missing_deps(),
            _ => {
                panic!(
                    "BUG: Job run {} must be in DepMiss state when derivative want created, found {:?}",
                    source_job_run_id, job_run
                );
            }
        };

        // Find which MissingDeps entry corresponds to this derivative want
        // The derivative want was created for a specific set of missing partitions,
        // and we need to find which downstream partitions are impacted by those missing partitions
        for md in missing_deps {
            // Check if this derivative want's partitions match the missing partitions in this entry
            // We need exact match because one dep miss event can create multiple derivative wants
            let partitions_match = md.missing.iter().all(|missing_ref| {
                derivative_want_partitions
                    .iter()
                    .any(|p| p.r#ref == missing_ref.r#ref)
            }) && derivative_want_partitions.len() == md.missing.len();

            if partitions_match {
                // Now we know which partitions are impacted by this missing dependency
                let impacted_partition_refs: Vec<String> =
                    md.impacted.iter().map(|p| p.r#ref.clone()).collect();

                tracing::debug!(
                    derivative_want_id = %derivative_want_id,
                    source_job_run_id = %source_job_run_id,
                    missing_partitions = ?derivative_want_partitions.iter().map(|p| &p.r#ref).collect::<Vec<_>>(),
                    impacted_partitions = ?impacted_partition_refs,
                    "Processing derivative want creation"
                );

                // Find all wants that include these impacted partitions
                // These are the wants that need to wait for the derivative want to complete
                let mut impacted_want_ids: std::collections::HashSet<String> =
                    std::collections::HashSet::new();
                for partition_ref in &impacted_partition_refs {
                    for want_id in self.get_wants_for_partition(partition_ref) {
                        impacted_want_ids.insert(want_id.clone());
                    }
                }

                // Transition each impacted want to UpstreamBuilding, tracking this derivative want as an upstream
                for want_id in impacted_want_ids {
                    let want = self.wants.remove(&want_id).expect(&format!(
                        "BUG: Want {} must exist when processing derivative want",
                        want_id
                    ));

                    let transitioned = match want {
                        Want::Building(building) => {
                            // First dep miss for this want: Building → UpstreamBuilding
                            tracing::info!(
                                want_id = %want_id,
                                derivative_want_id = %derivative_want_id,
                                "Want: Building → UpstreamBuilding (first missing dep detected)"
                            );
                            Want::UpstreamBuilding(
                                building.detect_missing_deps(vec![derivative_want_id.to_string()]),
                            )
                        }
                        Want::UpstreamBuilding(upstream) => {
                            // Additional dep miss: UpstreamBuilding → UpstreamBuilding (add another upstream)
                            // This can happen if multiple jobs report dep misses for different upstreams
                            tracing::info!(
                                want_id = %want_id,
                                derivative_want_id = %derivative_want_id,
                                "Want: UpstreamBuilding → UpstreamBuilding (additional upstream added)"
                            );
                            Want::UpstreamBuilding(
                                upstream.add_upstreams(vec![derivative_want_id.to_string()]),
                            )
                        }
                        _ => {
                            panic!(
                                "BUG: Want {} in invalid state {:?} when processing derivative want. Should be Building or UpstreamBuilding.",
                                want_id, want
                            );
                        }
                    };

                    self.wants.insert(want_id, transitioned);
                }
            }
        }
    }

    /// Complete wants when all their partitions become Live
    /// Transitions Building → Successful, returns list of newly successful want IDs
    pub(crate) fn complete_successful_wants(
        &mut self,
        newly_live_partitions: &[LivePartitionRef],
        job_run_id: &str,
        timestamp: u64,
    ) -> Vec<SuccessfulWantId> {
        let mut newly_successful_wants: Vec<SuccessfulWantId> = Vec::new();

        for pref in newly_live_partitions {
            let want_ids: Vec<String> = self.get_wants_for_partition(&pref.0.r#ref).to_vec();

            for want_id in want_ids {
                let want = self.wants.remove(&want_id).expect(&format!(
                    "BUG: Want {} must exist when referenced by partition",
                    want_id
                ));

                let transitioned = match want {
                    Want::Building(building) => {
                        // Check if ALL partitions for this want are now Live
                        let all_partitions_live = building.want.partitions.iter().all(|p| {
                            self.get_canonical_partition(&p.r#ref)
                                .map(|partition| partition.is_live())
                                .unwrap_or(false)
                        });

                        if all_partitions_live {
                            let successful_want =
                                building.complete(job_run_id.to_string(), timestamp);
                            tracing::info!(
                                want_id = %want_id,
                                job_run_id = %job_run_id,
                                "Want: Building → Successful"
                            );
                            newly_successful_wants.push(successful_want.get_id());
                            Want::Successful(successful_want)
                        } else {
                            Want::Building(building) // Still building other partitions
                        }
                    }
                    _ => {
                        panic!(
                            "BUG: Want {} in invalid state {:?} when partition {} became Live. Should be Building.",
                            want_id, want, pref.0.r#ref
                        );
                    }
                };

                self.wants.insert(want_id.clone(), transitioned);
            }
        }

        newly_successful_wants
    }

    /// Fail wants when their partitions fail
    /// Transitions Building → Failed, and adds to already-failed wants
    /// Returns list of newly failed want IDs for downstream cascade
    pub(crate) fn fail_directly_affected_wants(
        &mut self,
        failed_partitions: &[FailedPartitionRef],
    ) -> Vec<FailedWantId> {
        let mut newly_failed_wants: Vec<FailedWantId> = Vec::new();

        for pref in failed_partitions {
            let want_ids: Vec<String> = self.get_wants_for_partition(&pref.0.r#ref).to_vec();

            for want_id in want_ids {
                let want = self.wants.remove(&want_id).expect(&format!(
                    "BUG: Want {} must exist when referenced by partition",
                    want_id
                ));

                let transitioned = match want {
                    Want::Building(building) => {
                        let failed = building
                            .fail(vec![pref.0.clone()], "Partition build failed".to_string());
                        newly_failed_wants.push(failed.get_id());
                        Want::Failed(failed)
                    }
                    // Failed → Failed: add new failed partition to existing failed state
                    Want::Failed(failed) => {
                        Want::Failed(failed.add_failed_partitions(vec![pref.clone()]))
                    }
                    _ => {
                        panic!(
                            "BUG: Want {} in invalid state {:?} when partition {} failed. Should be Building or Failed.",
                            want_id, want, pref.0.r#ref
                        );
                    }
                };

                self.wants.insert(want_id.clone(), transitioned);
            }
        }

        newly_failed_wants
    }

    /// Unblock downstream wants when their upstream dependencies succeed
    /// Transitions UpstreamBuilding → Idle (when ready) or Building (when partitions already building)
    pub(crate) fn unblock_downstream_wants(
        &mut self,
        newly_successful_wants: &[SuccessfulWantId],
        job_run_id: &str,
        timestamp: u64,
    ) {
        tracing::debug!(
            newly_successful_wants = ?newly_successful_wants
                .iter()
                .map(|w| &w.0)
                .collect::<Vec<_>>(),
            "Checking downstream wants for unblocking"
        );
        // Find downstream wants that are waiting for any of the newly successful wants
        // TODO: Consider adding upstream_want_id -> downstream_want_ids index to avoid iterating all wants
        let downstream_wants_to_check: Vec<String> = self
            .wants
            .iter()
            .filter_map(|(id, want)| {
                match want {
                    Want::UpstreamBuilding(downstream_want) => {
                        // Is this downstream want waiting for any of the newly successful wants?
                        let is_affected =
                            downstream_want.state.upstream_want_ids.iter().any(|up_id| {
                                newly_successful_wants.iter().any(|swid| &swid.0 == up_id)
                            });
                        if is_affected { Some(id.clone()) } else { None }
                    }
                    _ => None,
                }
            })
            .collect();
        tracing::debug!(
            downstream_wants_to_check = ?downstream_wants_to_check,
            "Found downstream wants affected by upstream completion"
        );

        for want_id in downstream_wants_to_check {
            let want = self
                .wants
                .remove(&want_id)
                .expect(&format!("BUG: Want {} must exist", want_id));

            let transitioned = match want {
                Want::UpstreamBuilding(downstream_want) => {
                    tracing::debug!(
                        want_id = %want_id,
                        upstreams = ?downstream_want.state.upstream_want_ids,
                        "Checking if all upstreams are satisfied"
                    );
                    // Check if ALL of this downstream want's upstream dependencies are now Successful
                    let all_upstreams_successful = downstream_want
                        .state
                        .upstream_want_ids
                        .iter()
                        .all(|up_want_id| {
                            self.wants
                                .get(up_want_id)
                                .map(|w| matches!(w, Want::Successful(_)))
                                .unwrap_or(false)
                        });
                    tracing::debug!(
                        want_id = %want_id,
                        all_upstreams_successful = %all_upstreams_successful,
                        "Upstream satisfaction check complete"
                    );

                    if all_upstreams_successful {
                        // Check if any of this want's partitions are still being built
                        // If a job dep-missed, its partitions transitioned back to Missing
                        // But other jobs might still be building other partitions for this want
                        let any_partition_building =
                            downstream_want.want.partitions.iter().any(|p| {
                                self.get_canonical_partition(&p.r#ref)
                                    .map(|partition| matches!(partition, Partition::Building(_)))
                                    .unwrap_or(false)
                            });
                        tracing::debug!(
                            want_id = %want_id,
                            any_partition_building = %any_partition_building,
                            "Partition building status check"
                        );

                        if any_partition_building {
                            // Some partitions still being built, continue in Building state
                            tracing::info!(
                                want_id = %want_id,
                                job_run_id = %job_run_id,
                                "Want: UpstreamBuilding → Building (upstreams satisfied, partitions building)"
                            );
                            Want::Building(
                                downstream_want
                                    .continue_building(job_run_id.to_string(), timestamp),
                            )
                        } else {
                            // No partitions being built, become schedulable again
                            tracing::info!(
                                want_id = %want_id,
                                "Want: UpstreamBuilding → Idle (upstreams satisfied, ready to schedule)"
                            );
                            Want::Idle(downstream_want.upstreams_satisfied())
                        }
                    } else {
                        // Upstreams not all satisfied yet, stay in UpstreamBuilding
                        tracing::debug!(
                            want_id = %want_id,
                            "Want remains in UpstreamBuilding state (upstreams not yet satisfied)"
                        );
                        Want::UpstreamBuilding(downstream_want)
                    }
                }
                _ => {
                    panic!("BUG: Want {} should be UpstreamBuilding here", want_id);
                }
            };

            self.wants.insert(want_id, transitioned);
        }
    }

    /// Cascade failures to downstream wants when their upstream dependencies fail
    /// Transitions UpstreamBuilding → UpstreamFailed
    pub(crate) fn cascade_failures_to_downstream_wants(
        &mut self,
        newly_failed_wants: &[FailedWantId],
        timestamp: u64,
    ) {
        // Find downstream wants that are waiting for any of the newly failed wants
        // TODO: Consider adding upstream_want_id -> downstream_want_ids index to avoid iterating all wants
        let downstream_wants_to_fail: Vec<String> = self
            .wants
            .iter()
            .filter_map(|(id, want)| {
                match want {
                    Want::UpstreamBuilding(downstream_want) => {
                        // Is this downstream want waiting for any of the newly failed wants?
                        let is_affected =
                            downstream_want.state.upstream_want_ids.iter().any(|up_id| {
                                newly_failed_wants.iter().any(|fwid| &fwid.0 == up_id)
                            });
                        if is_affected { Some(id.clone()) } else { None }
                    }
                    _ => None,
                }
            })
            .collect();

        for want_id in downstream_wants_to_fail {
            let want = self
                .wants
                .remove(&want_id)
                .expect(&format!("BUG: Want {} must exist", want_id));

            let transitioned = match want {
                Want::UpstreamBuilding(downstream_want) => Want::UpstreamFailed(
                    downstream_want.upstream_failed(
                        newly_failed_wants
                            .iter()
                            .map(|fwid| fwid.0.clone())
                            .collect(),
                        timestamp,
                    ),
                ),
                _ => {
                    panic!("BUG: Want {} should be UpstreamBuilding here", want_id);
                }
            };

            self.wants.insert(want_id, transitioned);
        }
    }
}
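
Not part of the diff: a sketch of the intended call order when a job run reports newly Live partitions, using the crate-internal methods above (so it is only callable from within the crate). The function name is hypothetical.

```rust
// Hypothetical orchestration step: finish directly satisfied wants first,
// then let downstream dependents re-evaluate against the new state.
fn on_partitions_live(
    state: &mut BuildState,
    live: &[LivePartitionRef],
    job_run_id: &str,
    now_ms: u64,
) {
    let successful = state.complete_successful_wants(live, job_run_id, now_ms);
    state.unblock_downstream_wants(&successful, job_run_id, now_ms);
}
```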

@ -1,27 +0,0 @@
load("@rules_rust//rust:defs.bzl", "rust_binary")
|
||||
|
||||
# DataBuild CLI wrapper using orchestrator
|
||||
rust_binary(
|
||||
name = "databuild_cli",
|
||||
srcs = [
|
||||
"main.rs",
|
||||
"error.rs",
|
||||
],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
data = [
|
||||
"//databuild/graph:analyze",
|
||||
"//databuild/graph:execute",
|
||||
],
|
||||
deps = [
|
||||
"//databuild:databuild",
|
||||
"@crates//:clap",
|
||||
"@crates//:log",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:simple_logger",
|
||||
"@crates//:thiserror",
|
||||
"@crates//:tokio",
|
||||
"@crates//:uuid",
|
||||
],
|
||||
)

@ -1,31 +0,0 @@
use crate::event_log::BuildEventLogError;
use crate::orchestration::OrchestrationError;

#[derive(Debug, thiserror::Error)]
pub enum CliError {
    #[error("Event log error: {0}")]
    EventLog(#[from] BuildEventLogError),

    #[error("Orchestration error: {0}")]
    Orchestration(#[from] OrchestrationError),

    #[error("Analysis error: {0}")]
    Analysis(String),

    #[error("Execution error: {0}")]
    Execution(String),

    #[error("Environment error: {0}")]
    Environment(String),

    #[error("Invalid arguments: {0}")]
    InvalidArguments(String),

    #[error("Database error: {0}")]
    Database(String),

    #[error("Output formatting error: {0}")]
    Output(String),
}

pub type Result<T> = std::result::Result<T, CliError>;

File diff suppressed because it is too large

databuild/cli_main.rs (new file, 539 lines)
@ -0,0 +1,539 @@
use clap::{Parser, Subcommand};
|
||||
use reqwest::blocking::Client;
|
||||
use std::path::Path;
|
||||
use std::sync::{Arc, RwLock};
|
||||
use std::time::Duration;
|
||||
use tokio::sync::{broadcast, mpsc};
|
||||
use lib::build_event_log::SqliteBELStorage;
|
||||
use lib::build_state::BuildState;
|
||||
use lib::config::DatabuildConfig;
|
||||
use lib::daemon::{self, DaemonizeResult};
|
||||
use lib::http_server::{create_router, AppState};
|
||||
use lib::orchestrator::{Orchestrator, OrchestratorConfig};
|
||||
use lib::server_lock::ServerLock;
|
||||
|
||||
#[derive(Parser)]
|
||||
#[command(name = "databuild")]
|
||||
#[command(about = "DataBuild CLI - Build system for data pipelines", long_about = None)]
|
||||
struct Cli {
|
||||
/// Path to configuration file (JSON or TOML)
|
||||
#[arg(long, default_value = "databuild.json", global = true)]
|
||||
config: String,
|
||||
|
||||
#[command(subcommand)]
|
||||
command: Commands,
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
enum Commands {
|
||||
/// Start the DataBuild HTTP server
|
||||
Serve {
|
||||
/// Port to listen on (auto-selected if not specified)
|
||||
#[arg(long)]
|
||||
port: Option<u16>,
|
||||
|
||||
/// Run as a daemon (internal flag, used by auto-start)
|
||||
#[arg(long, hide = true)]
|
||||
daemon: bool,
|
||||
},
|
||||
|
||||
/// Stop the running server
|
||||
Stop,
|
||||
|
||||
/// Show server status
|
||||
Status,
|
||||
|
||||
/// Create a new want (trigger partition builds)
|
||||
Want {
|
||||
/// Partition references to build (e.g., "data/users", "metrics/daily")
|
||||
partitions: Vec<String>,
|
||||
},
|
||||
|
||||
/// List and manage wants
|
||||
Wants {
|
||||
#[command(subcommand)]
|
||||
command: WantsCommands,
|
||||
},
|
||||
|
||||
/// List and inspect partitions
|
||||
Partitions {
|
||||
#[command(subcommand)]
|
||||
command: Option<PartitionsCommands>,
|
||||
},
|
||||
|
||||
/// List and inspect job runs
|
||||
JobRuns {
|
||||
#[command(subcommand)]
|
||||
command: Option<JobRunsCommands>,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
enum WantsCommands {
|
||||
/// List all wants
|
||||
List,
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
enum PartitionsCommands {
|
||||
/// List all partitions
|
||||
List,
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
enum JobRunsCommands {
|
||||
/// List all job runs
|
||||
List,
|
||||
}
|
||||
|
||||
/// Load config and return (config, graph_label, config_hash)
|
||||
fn load_config(config_path: &str) -> (DatabuildConfig, String) {
|
||||
let config = DatabuildConfig::from_file(config_path).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to load config from {}: {}", config_path, e);
|
||||
std::process::exit(1);
|
||||
});
|
||||
let config_hash = ServerLock::hash_config(Path::new(config_path)).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to hash config: {}", e);
|
||||
std::process::exit(1);
|
||||
});
|
||||
(config, config_hash)
|
||||
}
|
||||
|
||||
/// Ensure server is running, return the server URL
|
||||
fn ensure_server(config_path: &str) -> String {
|
||||
let (config, config_hash) = load_config(config_path);
|
||||
|
||||
match daemon::ensure_server_running(Path::new(config_path), &config.graph_label, &config_hash) {
|
||||
Ok(DaemonizeResult::Started { port }) => {
|
||||
eprintln!("Started server on port {}", port);
|
||||
format!("http://127.0.0.1:{}", port)
|
||||
}
|
||||
Ok(DaemonizeResult::AlreadyRunning { port }) => {
|
||||
format!("http://127.0.0.1:{}", port)
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to start server: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn main() {
|
||||
let cli = Cli::parse();
|
||||
|
||||
match cli.command {
|
||||
Commands::Serve { port, daemon: _ } => {
|
||||
let (config, _config_hash) = load_config(&cli.config);
|
||||
// Ensure graph directory exists (for log file, lock file, etc.)
|
||||
let _ = ServerLock::new(&config.graph_label).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to create graph directory: {}", e);
|
||||
std::process::exit(1);
|
||||
});
|
||||
let database = config.effective_bel_uri();
|
||||
let actual_port = port.unwrap_or_else(|| {
|
||||
daemon::find_available_port(3538).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to find available port: {}", e);
|
||||
std::process::exit(1);
|
||||
})
|
||||
});
|
||||
cmd_serve(actual_port, &database, &cli.config, &config);
|
||||
}
|
||||
Commands::Stop => {
|
||||
cmd_stop(&cli.config);
|
||||
}
|
||||
Commands::Status => {
|
||||
cmd_status(&cli.config);
|
||||
}
|
||||
Commands::Want { partitions } => {
|
||||
let server_url = ensure_server(&cli.config);
|
||||
cmd_want(&server_url, partitions);
|
||||
}
|
||||
Commands::Wants { command } => match command {
|
||||
WantsCommands::List => {
|
||||
let server_url = ensure_server(&cli.config);
|
||||
cmd_wants_list(&server_url);
|
||||
}
|
||||
},
|
||||
Commands::Partitions { command } => match command {
|
||||
Some(PartitionsCommands::List) | None => {
|
||||
let server_url = ensure_server(&cli.config);
|
||||
cmd_partitions_list(&server_url);
|
||||
}
|
||||
},
|
||||
Commands::JobRuns { command } => match command {
|
||||
Some(JobRunsCommands::List) | None => {
|
||||
let server_url = ensure_server(&cli.config);
|
||||
cmd_job_runs_list(&server_url);
|
||||
}
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
fn cmd_stop(config_path: &str) {
|
||||
let (config, _) = load_config(config_path);
|
||||
|
||||
match daemon::stop_server(&config.graph_label) {
|
||||
Ok(()) => {
|
||||
println!("Server stopped");
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to stop server: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn cmd_status(config_path: &str) {
|
||||
let (config, config_hash) = load_config(config_path);
|
||||
let lock = ServerLock::new(&config.graph_label).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to access lock: {}", e);
|
||||
std::process::exit(1);
|
||||
});
|
||||
|
||||
match lock.read_state() {
|
||||
Ok(Some(state)) => {
|
||||
let running = ServerLock::is_process_running(state.pid);
|
||||
let healthy = if running { daemon::health_check(state.port) } else { false };
|
||||
let config_changed = state.config_hash != config_hash;
|
||||
|
||||
println!("DataBuild Server Status");
|
||||
println!("━━━━━━━━━━━━━━━━━━━━━━━━");
|
||||
println!("Graph: {}", config.graph_label);
|
||||
println!("Status: {}", if healthy { "Running" } else if running { "Unhealthy" } else { "Stopped" });
|
||||
println!("URL: http://127.0.0.1:{}", state.port);
|
||||
println!("PID: {}", state.pid);
|
||||
println!("Database: {}", config.effective_bel_uri());
|
||||
if config_changed {
|
||||
println!();
|
||||
println!("⚠ Config has changed since server started");
|
||||
}
|
||||
}
|
||||
Ok(None) => {
|
||||
println!("DataBuild Server Status");
|
||||
println!("━━━━━━━━━━━━━━━━━━━━━━━━");
|
||||
println!("Graph: {}", config.graph_label);
|
||||
println!("Database: {}", config.effective_bel_uri());
|
||||
println!("Status: Not running");
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to read server state: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Command Implementations
|
||||
// ============================================================================
|
||||
|
||||
#[tokio::main]
|
||||
async fn cmd_serve(port: u16, database: &str, config_path: &str, config: &DatabuildConfig) {
|
||||
|
||||
// Initialize logging
|
||||
tracing_subscriber::fmt::init();
|
||||
|
||||
// Acquire and hold the server lock for the duration of the server
|
||||
let mut server_lock = ServerLock::new(&config.graph_label).unwrap_or_else(|e| {
|
||||
eprintln!("Failed to create server lock: {}", e);
|
||||
std::process::exit(1);
|
||||
});
|
||||
|
||||
// Try to acquire exclusive lock
|
||||
match server_lock.try_lock() {
|
||||
Ok(true) => {
|
||||
// Write our state
|
||||
let config_hash = ServerLock::hash_config(Path::new(config_path)).unwrap_or_default();
|
||||
let state = lib::server_lock::ServerLockState {
|
||||
pid: std::process::id(),
|
||||
port,
|
||||
started_at: ServerLock::now_millis(),
|
||||
config_hash,
|
||||
};
|
||||
if let Err(e) = server_lock.write_state(&state) {
|
||||
eprintln!("Failed to write server state: {}", e);
|
||||
}
|
||||
}
|
||||
Ok(false) => {
|
||||
// Another server is holding the lock - this shouldn't happen in daemon mode
|
||||
// but could happen if user manually runs serve while another server is running
|
||||
eprintln!("Another server is already running for graph '{}'. Use 'databuild stop' first.", config.graph_label);
|
||||
std::process::exit(1);
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to acquire server lock: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
println!("Loaded configuration from: {}", config_path);
|
||||
println!(" Graph: {}", config.graph_label);
|
||||
println!(" Jobs: {}", config.jobs.len());
|
||||
println!(" Idle timeout: {}s", config.idle_timeout_seconds);
|
||||
|
||||
let jobs = config.clone().into_job_configurations();
|
||||
let idle_timeout_secs = config.idle_timeout_seconds;
|
||||
|
||||
// Create SQLite BEL storage (shared between orchestrator and HTTP server)
|
||||
let bel_storage = Arc::new(
|
||||
SqliteBELStorage::create(database).expect("Failed to create BEL storage"),
|
||||
);
|
||||
|
||||
// Create command channel for orchestrator communication
|
||||
let (command_tx, command_rx) = mpsc::channel(100);
|
||||
|
||||
// Create event broadcast channel (orchestrator -> HTTP server)
|
||||
let (event_tx, _event_rx) = broadcast::channel(1000);
|
||||
|
||||
// Create shutdown broadcast channel
|
||||
let (shutdown_tx, _shutdown_rx) = broadcast::channel(1);
|
||||
|
||||
// Create shared mirrored build state for HTTP server
|
||||
let mirrored_state = Arc::new(RwLock::new(BuildState::default()));
|
||||
|
||||
// Spawn state-mirror task to keep HTTP server's build state in sync
|
||||
let mirror_clone = mirrored_state.clone();
|
||||
let mut mirror_rx = event_tx.subscribe();
|
||||
tokio::spawn(async move {
|
||||
while let Ok(event) = mirror_rx.recv().await {
|
||||
match mirror_clone.write() {
|
||||
Ok(mut state) => {
|
||||
state.handle_event(&event);
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("State mirror task: RwLock poisoned, cannot update state: {}", e);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Spawn orchestrator in background thread
|
||||
// Note: Orchestrator needs its own BEL storage instance for writes
|
||||
let orch_bel_storage = SqliteBELStorage::create(database).expect("Failed to create BEL storage");
|
||||
let orch_shutdown_rx = shutdown_tx.subscribe();
|
||||
let orch_handle = std::thread::spawn(move || {
|
||||
// Create orchestrator with both channels and jobs from config
|
||||
let config = OrchestratorConfig { jobs };
|
||||
let mut orchestrator = Orchestrator::new_with_channels(
|
||||
orch_bel_storage,
|
||||
config,
|
||||
command_rx,
|
||||
event_tx,
|
||||
);
|
||||
let mut shutdown_rx = orch_shutdown_rx;
|
||||
|
||||
// Run orchestrator loop
|
||||
loop {
|
||||
// Check for shutdown signal
|
||||
if shutdown_rx.try_recv().is_ok() {
|
||||
println!("Orchestrator received shutdown signal");
|
||||
break;
|
||||
}
|
||||
|
||||
if let Err(e) = orchestrator.step() {
|
||||
eprintln!("Orchestrator error: {}", e);
|
||||
}
|
||||
// Small sleep to avoid busy-waiting
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
}
|
||||
});
|
||||
|
||||
// Create app state with mirrored state, shared storage, command sender, and shutdown channel
|
||||
let state = AppState::new(mirrored_state, bel_storage, command_tx, shutdown_tx.clone());
|
||||
|
||||
// Spawn idle timeout checker task
|
||||
let idle_state = state.clone();
|
||||
let idle_shutdown_tx = shutdown_tx.clone();
|
||||
tokio::spawn(async move {
|
||||
let idle_timeout = Duration::from_secs(idle_timeout_secs);
|
||||
|
||||
loop {
|
||||
tokio::time::sleep(Duration::from_secs(60)).await;
|
||||
|
||||
let last_request = idle_state.last_request_time.load(std::sync::atomic::Ordering::Relaxed);
|
||||
let now = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_millis() as u64;
|
||||
|
||||
if now - last_request > idle_timeout.as_millis() as u64 {
|
||||
eprintln!(
|
||||
"Server idle for {}s, shutting down",
|
||||
idle_timeout.as_secs()
|
||||
);
|
||||
let _ = idle_shutdown_tx.send(());
|
||||
break;
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Create router
|
||||
let app = create_router(state);
|
||||
|
||||
// Bind to specified port
|
||||
let addr = format!("127.0.0.1:{}", port);
|
||||
let listener = tokio::net::TcpListener::bind(&addr)
|
||||
.await
|
||||
.unwrap_or_else(|_| panic!("Failed to bind to {}", addr));
|
||||
|
||||
println!("DataBuild server listening on http://{}", addr);
|
||||
println!(" GET /health");
|
||||
println!(" GET /api/wants");
|
||||
println!(" POST /api/wants");
|
||||
println!(" GET /api/wants/:id");
|
||||
println!(" GET /api/partitions");
|
||||
println!(" GET /api/job_runs");
|
||||
|
||||
// Subscribe to shutdown signal for graceful shutdown
|
||||
let mut server_shutdown_rx = shutdown_tx.subscribe();
|
||||
|
||||
// Run the server with graceful shutdown
|
||||
axum::serve(listener, app)
|
||||
.with_graceful_shutdown(async move {
|
||||
let _ = server_shutdown_rx.recv().await;
|
||||
println!("HTTP server received shutdown signal");
|
||||
})
|
||||
.await
|
||||
.expect("Server error");
|
||||
|
||||
// Wait for orchestrator to finish
|
||||
let _ = orch_handle.join();
|
||||
println!("Shutdown complete");
|
||||
}
|
||||
|
||||
fn cmd_want(server_url: &str, partitions: Vec<String>) {
|
||||
let client = Client::new();
|
||||
|
||||
// Convert partition strings to PartitionRef objects
|
||||
let partition_refs: Vec<serde_json::Value> = partitions
|
||||
.iter()
|
||||
.map(|p| serde_json::json!({"ref": p}))
|
||||
.collect();
|
||||
|
||||
// Get current timestamp (milliseconds since epoch)
|
||||
let now = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_millis() as u64;
|
||||
|
||||
let request = serde_json::json!({
|
||||
"partitions": partition_refs,
|
||||
"data_timestamp": now,
|
||||
"ttl_seconds": 3600, // 1 hour default
|
||||
"sla_seconds": 300 // 5 minutes default
|
||||
});
|
||||
|
||||
let url = format!("{}/api/wants", server_url);
|
||||
|
||||
match client.post(&url)
|
||||
.json(&request)
|
||||
.send()
|
||||
{
|
||||
Ok(response) => {
|
||||
if response.status().is_success() {
|
||||
println!("Want created successfully");
|
||||
if let Ok(body) = response.text() {
|
||||
println!("{}", body);
|
||||
}
|
||||
} else {
|
||||
eprintln!("Failed to create want: {}", response.status());
|
||||
if let Ok(body) = response.text() {
|
||||
eprintln!("{}", body);
|
||||
}
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to connect to server: {}", e);
|
||||
eprintln!("Is the server running? Try: databuild serve");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn cmd_wants_list(server_url: &str) {
|
||||
let client = Client::new();
|
||||
let url = format!("{}/api/wants", server_url);
|
||||
|
||||
match client.get(&url).send() {
|
||||
Ok(response) => {
|
||||
if response.status().is_success() {
|
||||
match response.json::<serde_json::Value>() {
|
||||
Ok(json) => {
|
||||
println!("{}", serde_json::to_string_pretty(&json).unwrap());
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to parse response: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
eprintln!("Request failed: {}", response.status());
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to connect to server: {}", e);
|
||||
eprintln!("Is the server running? Try: databuild serve");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn cmd_partitions_list(server_url: &str) {
|
||||
let client = Client::new();
|
||||
let url = format!("{}/api/partitions", server_url);
|
||||
|
||||
match client.get(&url).send() {
|
||||
Ok(response) => {
|
||||
if response.status().is_success() {
|
||||
match response.json::<serde_json::Value>() {
|
||||
Ok(json) => {
|
||||
println!("{}", serde_json::to_string_pretty(&json).unwrap());
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to parse response: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
eprintln!("Request failed: {}", response.status());
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to connect to server: {}", e);
|
||||
eprintln!("Is the server running? Try: databuild serve");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn cmd_job_runs_list(server_url: &str) {
|
||||
let client = Client::new();
|
||||
let url = format!("{}/api/job_runs", server_url);
|
||||
|
||||
match client.get(&url).send() {
|
||||
Ok(response) => {
|
||||
if response.status().is_success() {
|
||||
match response.json::<serde_json::Value>() {
|
||||
Ok(json) => {
|
||||
println!("{}", serde_json::to_string_pretty(&json).unwrap());
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to parse response: {}", e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
eprintln!("Request failed: {}", response.status());
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
eprintln!("Failed to connect to server: {}", e);
|
||||
eprintln!("Is the server running? Try: databuild serve");
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -1,192 +0,0 @@
|
|||
load("@aspect_rules_ts//ts:defs.bzl", "ts_config", "ts_project")
|
||||
|
||||
# Extract OpenAPI spec from the dedicated spec generator binary
|
||||
genrule(
|
||||
name = "extract_openapi_spec",
|
||||
srcs = [],
|
||||
outs = ["openapi.json"],
|
||||
cmd = """
|
||||
$(location //databuild:openapi_spec_generator) > $@
|
||||
""",
|
||||
tools = [
|
||||
"//databuild:openapi_spec_generator",
|
||||
],
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# TypeScript generator configuration
|
||||
filegroup(
|
||||
name = "typescript_generator_config",
|
||||
srcs = ["typescript_generator_config.json"],
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# Generate TypeScript client using OpenAPI Generator JAR
|
||||
genrule(
|
||||
name = "typescript_client",
|
||||
srcs = [
|
||||
":extract_openapi_spec",
|
||||
":typescript_generator_config",
|
||||
],
|
||||
outs = [
|
||||
"typescript_generated/src/apis/DefaultApi.ts",
|
||||
"typescript_generated/src/apis/index.ts",
|
||||
"typescript_generated/src/models/index.ts",
|
||||
"typescript_generated/src/models/ActivityApiResponse.ts",
|
||||
"typescript_generated/src/models/ActivityResponse.ts",
|
||||
"typescript_generated/src/models/AnalyzeRequest.ts",
|
||||
"typescript_generated/src/models/AnalyzeResponse.ts",
|
||||
"typescript_generated/src/models/BuildCancelPathRequest.ts",
|
||||
"typescript_generated/src/models/BuildCancelRepositoryResponse.ts",
|
||||
"typescript_generated/src/models/BuildDetailRequest.ts",
|
||||
"typescript_generated/src/models/BuildDetailResponse.ts",
|
||||
"typescript_generated/src/models/BuildEventSummary.ts",
|
||||
"typescript_generated/src/models/BuildRequest.ts",
|
||||
"typescript_generated/src/models/BuildRequestResponse.ts",
|
||||
"typescript_generated/src/models/BuildSummary.ts",
|
||||
"typescript_generated/src/models/BuildTimelineEvent.ts",
|
||||
"typescript_generated/src/models/BuildsListApiResponse.ts",
|
||||
"typescript_generated/src/models/BuildsListResponse.ts",
|
||||
"typescript_generated/src/models/CancelBuildRepositoryRequest.ts",
|
||||
"typescript_generated/src/models/InvalidatePartitionRequest.ts",
|
||||
"typescript_generated/src/models/JobDailyStats.ts",
|
||||
"typescript_generated/src/models/JobDetailRequest.ts",
|
||||
"typescript_generated/src/models/JobDetailResponse.ts",
|
||||
"typescript_generated/src/models/JobMetricsRequest.ts",
|
||||
"typescript_generated/src/models/JobMetricsResponse.ts",
|
||||
"typescript_generated/src/models/JobRunDetail.ts",
|
||||
"typescript_generated/src/models/JobSummary.ts",
|
||||
"typescript_generated/src/models/JobsListApiResponse.ts",
|
||||
"typescript_generated/src/models/JobsListResponse.ts",
|
||||
"typescript_generated/src/models/PaginationInfo.ts",
|
||||
"typescript_generated/src/models/PartitionDetailRequest.ts",
|
||||
"typescript_generated/src/models/PartitionDetailResponse.ts",
|
||||
"typescript_generated/src/models/PartitionEventsRequest.ts",
|
||||
"typescript_generated/src/models/PartitionEventsResponse.ts",
|
||||
"typescript_generated/src/models/PartitionInvalidatePathRequest.ts",
|
||||
"typescript_generated/src/models/PartitionInvalidateResponse.ts",
|
||||
"typescript_generated/src/models/PartitionRef.ts",
|
||||
"typescript_generated/src/models/PartitionStatusRequest.ts",
|
||||
"typescript_generated/src/models/PartitionStatusResponse.ts",
|
||||
"typescript_generated/src/models/PartitionSummary.ts",
|
||||
"typescript_generated/src/models/PartitionTimelineEvent.ts",
|
||||
"typescript_generated/src/models/PartitionsListApiResponse.ts",
|
||||
"typescript_generated/src/models/PartitionsListResponse.ts",
|
||||
"typescript_generated/src/models/CancelTaskRequest.ts",
|
||||
"typescript_generated/src/models/JobRunDetailResponse.ts",
|
||||
"typescript_generated/src/models/JobRunSummary.ts",
|
||||
"typescript_generated/src/models/JobRunSummary2.ts",
|
||||
"typescript_generated/src/models/JobRunTimelineEvent.ts",
|
||||
"typescript_generated/src/models/JobRunsListApiResponse.ts",
|
||||
"typescript_generated/src/models/JobRunsListResponse.ts",
|
||||
"typescript_generated/src/models/TaskCancelPathRequest.ts",
|
||||
"typescript_generated/src/models/TaskCancelResponse.ts",
|
||||
"typescript_generated/src/models/TaskDetailRequest.ts",
|
||||
"typescript_generated/src/runtime.ts",
|
||||
"typescript_generated/src/index.ts",
|
||||
],
|
||||
cmd = """
|
||||
# Download OpenAPI Generator JAR
|
||||
OPENAPI_JAR=/tmp/openapi-generator-cli.jar
|
||||
if [ ! -f $$OPENAPI_JAR ]; then
|
||||
curl -L -o $$OPENAPI_JAR https://repo1.maven.org/maven2/org/openapitools/openapi-generator-cli/7.2.0/openapi-generator-cli-7.2.0.jar
|
||||
fi
|
||||
|
||||
# Create temporary directory for generation
|
||||
TEMP_DIR=$$(mktemp -d)
|
||||
|
||||
# Generate TypeScript client to temp directory
|
||||
java -jar $$OPENAPI_JAR generate \
|
||||
-i $(location :extract_openapi_spec) \
|
||||
-g typescript-fetch \
|
||||
-c $(location :typescript_generator_config) \
|
||||
-o $$TEMP_DIR
|
||||
|
||||
# Copy generated files to expected output locations
|
||||
cp $$TEMP_DIR/src/apis/DefaultApi.ts $(location typescript_generated/src/apis/DefaultApi.ts)
|
||||
cp $$TEMP_DIR/src/apis/index.ts $(location typescript_generated/src/apis/index.ts)
|
||||
cp $$TEMP_DIR/src/models/index.ts $(location typescript_generated/src/models/index.ts)
|
||||
cp $$TEMP_DIR/src/models/ActivityApiResponse.ts $(location typescript_generated/src/models/ActivityApiResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/ActivityResponse.ts $(location typescript_generated/src/models/ActivityResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/AnalyzeRequest.ts $(location typescript_generated/src/models/AnalyzeRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/AnalyzeResponse.ts $(location typescript_generated/src/models/AnalyzeResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildCancelPathRequest.ts $(location typescript_generated/src/models/BuildCancelPathRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildCancelRepositoryResponse.ts $(location typescript_generated/src/models/BuildCancelRepositoryResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildDetailRequest.ts $(location typescript_generated/src/models/BuildDetailRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildDetailResponse.ts $(location typescript_generated/src/models/BuildDetailResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildEventSummary.ts $(location typescript_generated/src/models/BuildEventSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildRequest.ts $(location typescript_generated/src/models/BuildRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildRequestResponse.ts $(location typescript_generated/src/models/BuildRequestResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildSummary.ts $(location typescript_generated/src/models/BuildSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildTimelineEvent.ts $(location typescript_generated/src/models/BuildTimelineEvent.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildsListApiResponse.ts $(location typescript_generated/src/models/BuildsListApiResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/BuildsListResponse.ts $(location typescript_generated/src/models/BuildsListResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/CancelBuildRepositoryRequest.ts $(location typescript_generated/src/models/CancelBuildRepositoryRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/InvalidatePartitionRequest.ts $(location typescript_generated/src/models/InvalidatePartitionRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/JobDailyStats.ts $(location typescript_generated/src/models/JobDailyStats.ts)
|
||||
cp $$TEMP_DIR/src/models/JobDetailRequest.ts $(location typescript_generated/src/models/JobDetailRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/JobDetailResponse.ts $(location typescript_generated/src/models/JobDetailResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobMetricsRequest.ts $(location typescript_generated/src/models/JobMetricsRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/JobMetricsResponse.ts $(location typescript_generated/src/models/JobMetricsResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunDetail.ts $(location typescript_generated/src/models/JobRunDetail.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunSummary.ts $(location typescript_generated/src/models/JobRunSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/JobSummary.ts $(location typescript_generated/src/models/JobSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/JobsListApiResponse.ts $(location typescript_generated/src/models/JobsListApiResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobsListResponse.ts $(location typescript_generated/src/models/JobsListResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PaginationInfo.ts $(location typescript_generated/src/models/PaginationInfo.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionDetailRequest.ts $(location typescript_generated/src/models/PartitionDetailRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionDetailResponse.ts $(location typescript_generated/src/models/PartitionDetailResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionEventsRequest.ts $(location typescript_generated/src/models/PartitionEventsRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionEventsResponse.ts $(location typescript_generated/src/models/PartitionEventsResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionInvalidatePathRequest.ts $(location typescript_generated/src/models/PartitionInvalidatePathRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionInvalidateResponse.ts $(location typescript_generated/src/models/PartitionInvalidateResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionRef.ts $(location typescript_generated/src/models/PartitionRef.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionStatusRequest.ts $(location typescript_generated/src/models/PartitionStatusRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionStatusResponse.ts $(location typescript_generated/src/models/PartitionStatusResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionSummary.ts $(location typescript_generated/src/models/PartitionSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionTimelineEvent.ts $(location typescript_generated/src/models/PartitionTimelineEvent.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionsListApiResponse.ts $(location typescript_generated/src/models/PartitionsListApiResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/PartitionsListResponse.ts $(location typescript_generated/src/models/PartitionsListResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunSummary.ts $(location typescript_generated/src/models/JobRunSummary.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunTimelineEvent.ts $(location typescript_generated/src/models/JobRunTimelineEvent.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunsListApiResponse.ts $(location typescript_generated/src/models/JobRunsListApiResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunsListResponse.ts $(location typescript_generated/src/models/JobRunsListResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/CancelTaskRequest.ts $(location typescript_generated/src/models/CancelTaskRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunDetailResponse.ts $(location typescript_generated/src/models/JobRunDetailResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/JobRunSummary2.ts $(location typescript_generated/src/models/JobRunSummary2.ts)
|
||||
cp $$TEMP_DIR/src/models/TaskCancelPathRequest.ts $(location typescript_generated/src/models/TaskCancelPathRequest.ts)
|
||||
cp $$TEMP_DIR/src/models/TaskCancelResponse.ts $(location typescript_generated/src/models/TaskCancelResponse.ts)
|
||||
cp $$TEMP_DIR/src/models/TaskDetailRequest.ts $(location typescript_generated/src/models/TaskDetailRequest.ts)
|
||||
cp $$TEMP_DIR/src/runtime.ts $(location typescript_generated/src/runtime.ts)
|
||||
cp $$TEMP_DIR/src/index.ts $(location typescript_generated/src/index.ts)
|
||||
""",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# TypeScript configuration for the client
|
||||
ts_config(
|
||||
name = "ts_config",
|
||||
src = "tsconfig.json",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# Create a proper TypeScript project from the generated files
|
||||
ts_project(
|
||||
name = "typescript_lib",
|
||||
srcs = [":typescript_client"],
|
||||
allow_js = True,
|
||||
declaration = True,
|
||||
resolve_json_module = True,
|
||||
transpiler = "tsc",
|
||||
tsconfig = ":ts_config",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# Main TypeScript client target
|
||||
filegroup(
|
||||
name = "typescript",
|
||||
srcs = [
|
||||
":typescript_client",
|
||||
],
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
|
@@ -1,21 +0,0 @@
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "moduleResolution": "node",
    "allowJs": true,
    "declaration": true,
    "strict": false,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "isolatedModules": true,
    "noEmit": false
  },
  "include": ["**/*"],
  "exclude": [
    "node_modules",
    "**/*.test.ts"
  ]
}
@@ -1,14 +0,0 @@
{
  "enumPropertyNaming": "snake_case",
  "withInterfaces": true,
  "useSingleRequestParameter": true,
  "typescriptThreePlus": true,
  "npmName": "databuild-client",
  "npmVersion": "1.0.0",
  "stringEnums": true,
  "generateAliasAsModel": false,
  "modelPropertyNaming": "snake_case",
  "paramNaming": "snake_case",
  "supportsES6": true,
  "withoutRuntimeChecks": false
}
databuild/commands.rs (new file, 19 lines)
@@ -0,0 +1,19 @@
use crate::util::DatabuildError;
use crate::{CancelWantRequest, CancelWantResponse, CreateWantRequest, CreateWantResponse};
use tokio::sync::oneshot;

/// Commands that can be sent to the orchestrator via the command channel.
/// Only write operations need commands; reads go directly to BEL storage.
pub enum Command {
    /// Create a new want
    CreateWant {
        request: CreateWantRequest,
        reply: oneshot::Sender<Result<CreateWantResponse, DatabuildError>>,
    },

    /// Cancel an existing want
    CancelWant {
        request: CancelWantRequest,
        reply: oneshot::Sender<Result<CancelWantResponse, DatabuildError>>,
    },
}
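The `Command` enum above pairs each write request with a `oneshot` reply channel, so a caller gets a typed response back from the orchestrator loop. A minimal caller-side sketch, assuming the orchestrator exposes a `tokio::sync::mpsc::Sender<Command>` handle (the channel type and the helper name are illustrative, not part of this diff):

```rust
use tokio::sync::{mpsc, oneshot};

// Hypothetical helper living alongside commands.rs; assumes the orchestrator
// task holds the matching mpsc::Receiver<Command> and answers every reply.
pub async fn send_create_want(
    commands: &mpsc::Sender<Command>,
    request: CreateWantRequest,
) -> Result<CreateWantResponse, DatabuildError> {
    let (reply, rx) = oneshot::channel();
    commands
        .send(Command::CreateWant { request, reply })
        .await
        .map_err(|e| DatabuildError::from(format!("orchestrator channel closed: {}", e)))?;
    // The orchestrator sends its Result back over the oneshot; a dropped
    // sender means it shut down before answering.
    rx.await
        .map_err(|e| DatabuildError::from(format!("no reply from orchestrator: {}", e)))?
}
```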
databuild/config.rs (new file, 164 lines)
@@ -0,0 +1,164 @@
use crate::JobConfig;
|
||||
use crate::job::JobConfiguration;
|
||||
use crate::util::DatabuildError;
|
||||
use std::fs;
|
||||
use std::path::Path;
|
||||
|
||||
/// Default idle timeout in seconds (1 hour)
|
||||
pub const DEFAULT_IDLE_TIMEOUT_SECONDS: u64 = 3600;
|
||||
|
||||
/// Configuration file format for DataBuild application
|
||||
#[derive(Debug, Clone, serde::Deserialize, serde::Serialize)]
|
||||
pub struct DatabuildConfig {
|
||||
/// Unique identifier for this graph, used for `.databuild/${graph_label}/` directory
|
||||
pub graph_label: String,
|
||||
|
||||
/// Server auto-shutdown after this many seconds of inactivity (default: 3600)
|
||||
#[serde(default = "default_idle_timeout")]
|
||||
pub idle_timeout_seconds: u64,
|
||||
|
||||
/// BEL storage URI. Defaults to `.databuild/${graph_label}/bel.sqlite` if not specified.
|
||||
///
|
||||
/// Supported formats:
|
||||
/// - Relative path: `path/to/bel.sqlite` (resolved relative to config file)
|
||||
/// - Absolute path: `/var/data/bel.sqlite`
|
||||
/// - SQLite URI: `sqlite:///path/to/bel.sqlite` or `sqlite::memory:`
|
||||
/// - Future: `postgresql://user:pass@host/db`
|
||||
#[serde(default)]
|
||||
pub bel_uri: Option<String>,
|
||||
|
||||
/// List of job configurations
|
||||
#[serde(default)]
|
||||
pub jobs: Vec<JobConfig>,
|
||||
}
|
||||
|
||||
fn default_idle_timeout() -> u64 {
|
||||
DEFAULT_IDLE_TIMEOUT_SECONDS
|
||||
}
|
||||
|
||||
impl DatabuildConfig {
|
||||
/// Load configuration from a file, auto-detecting format from extension
|
||||
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, DatabuildError> {
|
||||
let path = path.as_ref();
|
||||
let contents = fs::read_to_string(path)
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to read config file: {}", e)))?;
|
||||
|
||||
// Determine format from file extension
|
||||
let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("");
|
||||
|
||||
match extension {
|
||||
"json" => Self::from_json(&contents),
|
||||
"toml" => Self::from_toml(&contents),
|
||||
_ => Err(DatabuildError::from(format!(
|
||||
"Unknown config file extension: {}. Use .json or .toml",
|
||||
extension
|
||||
))),
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse configuration from JSON string
|
||||
pub fn from_json(s: &str) -> Result<Self, DatabuildError> {
|
||||
serde_json::from_str(s)
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to parse JSON config: {}", e)))
|
||||
}
|
||||
|
||||
/// Parse configuration from TOML string
|
||||
pub fn from_toml(s: &str) -> Result<Self, DatabuildError> {
|
||||
toml::from_str(s)
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to parse TOML config: {}", e)))
|
||||
}
|
||||
|
||||
/// Convert to a list of JobConfiguration
|
||||
pub fn into_job_configurations(self) -> Vec<JobConfiguration> {
|
||||
self.jobs.into_iter().map(|jc| jc.into()).collect()
|
||||
}
|
||||
|
||||
/// Get the effective BEL URI, resolving defaults based on graph_label.
|
||||
///
|
||||
/// If `bel_uri` is not set, returns the default path `.databuild/${graph_label}/bel.sqlite`.
|
||||
/// Relative paths are not resolved here - that's the caller's responsibility.
|
||||
pub fn effective_bel_uri(&self) -> String {
|
||||
self.bel_uri
|
||||
.clone()
|
||||
.unwrap_or_else(|| format!(".databuild/{}/bel.sqlite", self.graph_label))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_parse_json_config() {
|
||||
let json = r#"
|
||||
{
|
||||
"graph_label": "test_graph",
|
||||
"jobs": [
|
||||
{
|
||||
"label": "//test:job_alpha",
|
||||
"entrypoint": "/usr/bin/python3",
|
||||
"environment": {"FOO": "bar"},
|
||||
"partition_patterns": ["data/alpha/.*"]
|
||||
}
|
||||
]
|
||||
}
|
||||
"#;
|
||||
|
||||
let config = DatabuildConfig::from_json(json).unwrap();
|
||||
assert_eq!(config.graph_label, "test_graph");
|
||||
assert_eq!(config.idle_timeout_seconds, DEFAULT_IDLE_TIMEOUT_SECONDS);
|
||||
assert_eq!(config.jobs.len(), 1);
|
||||
assert_eq!(config.jobs[0].label, "//test:job_alpha");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_json_config_with_idle_timeout() {
|
||||
let json = r#"
|
||||
{
|
||||
"graph_label": "test_graph",
|
||||
"idle_timeout_seconds": 7200,
|
||||
"jobs": []
|
||||
}
|
||||
"#;
|
||||
|
||||
let config = DatabuildConfig::from_json(json).unwrap();
|
||||
assert_eq!(config.graph_label, "test_graph");
|
||||
assert_eq!(config.idle_timeout_seconds, 7200);
|
||||
assert_eq!(config.jobs.len(), 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_effective_bel_uri() {
|
||||
// Default: derives from graph_label
|
||||
let config = DatabuildConfig::from_json(r#"{ "graph_label": "my_graph" }"#).unwrap();
|
||||
assert_eq!(config.effective_bel_uri(), ".databuild/my_graph/bel.sqlite");
|
||||
|
||||
// Custom: uses provided value
|
||||
let config = DatabuildConfig::from_json(
|
||||
r#"{ "graph_label": "my_graph", "bel_uri": "postgresql://localhost/db" }"#,
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(config.effective_bel_uri(), "postgresql://localhost/db");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_toml_config() {
|
||||
let toml = r#"
|
||||
graph_label = "test_graph"
|
||||
|
||||
[[jobs]]
|
||||
label = "//test:job_alpha"
|
||||
entrypoint = "/usr/bin/python3"
|
||||
partition_patterns = ["data/alpha/.*"]
|
||||
|
||||
[jobs.environment]
|
||||
FOO = "bar"
|
||||
"#;
|
||||
|
||||
let config = DatabuildConfig::from_toml(toml).unwrap();
|
||||
assert_eq!(config.graph_label, "test_graph");
|
||||
assert_eq!(config.idle_timeout_seconds, DEFAULT_IDLE_TIMEOUT_SECONDS);
|
||||
assert_eq!(config.jobs.len(), 1);
|
||||
assert_eq!(config.jobs[0].label, "//test:job_alpha");
|
||||
}
|
||||
}
|
||||
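A short usage sketch for the configuration API above. The `databuild.toml` filename and the `crate::config` module path are assumptions about the crate layout; only `from_file`, `effective_bel_uri`, and `into_job_configurations` from this diff are exercised:

```rust
// Sketch of a caller inside the same crate.
use crate::config::DatabuildConfig;
use crate::util::DatabuildError;

fn load_config_example() -> Result<(), DatabuildError> {
    // Format is chosen from the extension (.json or .toml).
    let config = DatabuildConfig::from_file("databuild.toml")?;

    // Defaults to `.databuild/${graph_label}/bel.sqlite` when `bel_uri` is unset.
    println!("graph {} -> BEL at {}", config.graph_label, config.effective_bel_uri());

    // Consumes the config, yielding Vec<JobConfiguration> for the orchestrator.
    let jobs = config.into_job_configurations();
    println!("{} job(s) configured", jobs.len());
    Ok(())
}
```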
databuild/daemon.rs (new file, 196 lines)
@@ -0,0 +1,196 @@
//! Server daemonization for CLI-server automation.
|
||||
//!
|
||||
//! Implements the classic double-fork pattern to create a proper Unix daemon.
|
||||
|
||||
use crate::server_lock::ServerLock;
|
||||
use crate::util::DatabuildError;
|
||||
use std::fs::OpenOptions;
|
||||
use std::path::Path;
|
||||
use std::process::{Child, Command, Stdio};
|
||||
|
||||
/// Result of attempting to start a daemonized server.
|
||||
pub enum DaemonizeResult {
|
||||
/// Server was started, here's the port.
|
||||
Started { port: u16 },
|
||||
/// Server was already running at this port.
|
||||
AlreadyRunning { port: u16 },
|
||||
}
|
||||
|
||||
/// Find an available port starting from the given port.
|
||||
pub fn find_available_port(start_port: u16) -> Result<u16, DatabuildError> {
|
||||
for port in start_port..=start_port + 100 {
|
||||
if let Ok(listener) = std::net::TcpListener::bind(format!("127.0.0.1:{}", port)) {
|
||||
drop(listener);
|
||||
return Ok(port);
|
||||
}
|
||||
}
|
||||
Err(DatabuildError::from(format!(
|
||||
"No available port found in range {}..{}",
|
||||
start_port,
|
||||
start_port + 100
|
||||
)))
|
||||
}
|
||||
|
||||
/// Check if the server at the given port is healthy.
|
||||
pub fn health_check(port: u16) -> bool {
|
||||
let url = format!("http://127.0.0.1:{}/health", port);
|
||||
match reqwest::blocking::get(&url) {
|
||||
Ok(resp) => resp.status().is_success(),
|
||||
Err(_) => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Wait for the server at the given port to become healthy.
|
||||
/// Returns Ok(()) if healthy within timeout, Err otherwise.
|
||||
pub fn wait_for_health(port: u16, timeout_ms: u64) -> Result<(), DatabuildError> {
|
||||
let start = std::time::Instant::now();
|
||||
let timeout = std::time::Duration::from_millis(timeout_ms);
|
||||
|
||||
while start.elapsed() < timeout {
|
||||
if health_check(port) {
|
||||
return Ok(());
|
||||
}
|
||||
std::thread::sleep(std::time::Duration::from_millis(100));
|
||||
}
|
||||
|
||||
Err(DatabuildError::from(format!(
|
||||
"Server did not become healthy within {}ms",
|
||||
timeout_ms
|
||||
)))
|
||||
}
|
||||
|
||||
/// Spawn the server as a daemon process.
|
||||
///
|
||||
/// This re-executes the current binary with `serve` command and special flags
|
||||
/// to indicate it should daemonize itself.
|
||||
pub fn spawn_daemon(
|
||||
config_path: &Path,
|
||||
port: u16,
|
||||
log_path: &Path,
|
||||
) -> Result<Child, DatabuildError> {
|
||||
// Get the current executable path
|
||||
let exe = std::env::current_exe()
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to get current executable: {}", e)))?;
|
||||
|
||||
// Open log file for stdout/stderr redirection
|
||||
let log_file = OpenOptions::new()
|
||||
.create(true)
|
||||
.append(true)
|
||||
.open(log_path)
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to open log file: {}", e)))?;
|
||||
|
||||
let log_file_err = log_file
|
||||
.try_clone()
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to clone log file: {}", e)))?;
|
||||
|
||||
// Spawn the daemon process
|
||||
let child = Command::new(exe)
|
||||
.arg("serve")
|
||||
.arg("--port")
|
||||
.arg(port.to_string())
|
||||
.arg("--config")
|
||||
.arg(config_path)
|
||||
.arg("--daemon")
|
||||
.stdin(Stdio::null())
|
||||
.stdout(Stdio::from(log_file))
|
||||
.stderr(Stdio::from(log_file_err))
|
||||
.spawn()
|
||||
.map_err(|e| DatabuildError::from(format!("Failed to spawn daemon: {}", e)))?;
|
||||
|
||||
Ok(child)
|
||||
}
|
||||
|
||||
/// Attempt to start the server, or connect to an existing one.
|
||||
///
|
||||
/// This is the main entry point for CLI commands that need the server running.
|
||||
pub fn ensure_server_running(
|
||||
config_path: &Path,
|
||||
graph_label: &str,
|
||||
config_hash: &str,
|
||||
) -> Result<DaemonizeResult, DatabuildError> {
|
||||
let lock = ServerLock::new(graph_label)?;
|
||||
|
||||
// First, check if there's already a running server by reading existing state
|
||||
if let Some(state) = lock.read_state()? {
|
||||
// Check if that process is still running
|
||||
if ServerLock::is_process_running(state.pid) {
|
||||
// Verify server is actually healthy
|
||||
if health_check(state.port) {
|
||||
// Check if config has changed
|
||||
if state.config_hash != config_hash {
|
||||
eprintln!(
|
||||
"Warning: Config has changed since server started.\n\
|
||||
Run 'databuild stop && databuild serve' to apply changes."
|
||||
);
|
||||
}
|
||||
return Ok(DaemonizeResult::AlreadyRunning { port: state.port });
|
||||
}
|
||||
// Process exists but not healthy - might still be starting up
|
||||
// Wait a bit and check again
|
||||
if wait_for_health(state.port, 5000).is_ok() {
|
||||
return Ok(DaemonizeResult::AlreadyRunning { port: state.port });
|
||||
}
|
||||
// Still unhealthy, will need to be stopped manually
|
||||
return Err(DatabuildError::from(format!(
|
||||
"Server at port {} appears unhealthy. Try 'databuild stop' and retry.",
|
||||
state.port
|
||||
)));
|
||||
} else {
|
||||
// Stale lock file - process is gone, clean up
|
||||
lock.remove_stale_lock()?;
|
||||
}
|
||||
}
|
||||
|
||||
// No server running - start one
|
||||
// Find an available port
|
||||
let port = find_available_port(3538)?;
|
||||
|
||||
// Spawn the daemon - it will acquire its own lock
|
||||
let log_path = lock.log_path();
|
||||
let _child = spawn_daemon(config_path, port, &log_path)?;
|
||||
|
||||
// Wait for server to become healthy (which implies it has acquired the lock)
|
||||
wait_for_health(port, 10000)?;
|
||||
|
||||
Ok(DaemonizeResult::Started { port })
|
||||
}
|
||||
|
||||
/// Stop a running server.
|
||||
pub fn stop_server(graph_label: &str) -> Result<(), DatabuildError> {
|
||||
let lock = ServerLock::new(graph_label)?;
|
||||
let state = lock
|
||||
.read_state()?
|
||||
.ok_or_else(|| DatabuildError::from("No server is running"))?;
|
||||
|
||||
// Check if process exists
|
||||
if !ServerLock::is_process_running(state.pid) {
|
||||
// Process already dead, clean up stale lock
|
||||
lock.remove_stale_lock()?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Send SIGTERM to the server
|
||||
#[cfg(unix)]
|
||||
unsafe {
|
||||
libc::kill(state.pid as i32, libc::SIGTERM);
|
||||
}
|
||||
|
||||
// Wait for process to exit (with timeout)
|
||||
let start = std::time::Instant::now();
|
||||
let timeout = std::time::Duration::from_secs(10);
|
||||
|
||||
while start.elapsed() < timeout {
|
||||
if !ServerLock::is_process_running(state.pid) {
|
||||
return Ok(());
|
||||
}
|
||||
std::thread::sleep(std::time::Duration::from_millis(100));
|
||||
}
|
||||
|
||||
// If still running after timeout, force kill
|
||||
#[cfg(unix)]
|
||||
unsafe {
|
||||
libc::kill(state.pid as i32, libc::SIGKILL);
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
|
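A sketch of how a CLI entry point might tie these pieces together. The graph label and config hash are placeholders (a real caller would presumably take them from the loaded config and a hash of its contents); only `ensure_server_running`, `DaemonizeResult`, and `stop_server` from this file are used:

```rust
use std::path::Path;

fn cli_up_and_down(config_path: &Path) -> Result<(), crate::util::DatabuildError> {
    // Placeholder values, not part of this diff.
    let graph_label = "example_graph";
    let config_hash = "0000000000000000";

    // Reuses a healthy server if one is already up, otherwise daemonizes a new one.
    match ensure_server_running(config_path, graph_label, config_hash)? {
        DaemonizeResult::Started { port } => println!("started server on 127.0.0.1:{}", port),
        DaemonizeResult::AlreadyRunning { port } => println!("reusing server on 127.0.0.1:{}", port),
    }

    // ... issue HTTP requests against the returned port ...

    // Equivalent of `databuild stop`: SIGTERM, then SIGKILL after a timeout.
    stop_server(graph_label)
}
```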
@ -1,111 +0,0 @@
|
|||
load("@aspect_rules_esbuild//esbuild:defs.bzl", "esbuild")
|
||||
load("@aspect_rules_js//js:defs.bzl", "js_test")
|
||||
load("@aspect_rules_ts//ts:defs.bzl", "ts_config", "ts_project")
|
||||
load("@databuild_npm//:defs.bzl", "npm_link_all_packages")
|
||||
|
||||
npm_link_all_packages(name = "node_modules")
|
||||
|
||||
filegroup(
|
||||
name = "dist",
|
||||
srcs = [
|
||||
# To be added once we have one
|
||||
# "favicon.svg",
|
||||
"index.html",
|
||||
":app_dist",
|
||||
":css",
|
||||
],
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
genrule(
|
||||
name = "css",
|
||||
srcs = [
|
||||
"index.css",
|
||||
"index.html",
|
||||
":node_modules/daisyui",
|
||||
":app_dist",
|
||||
],
|
||||
outs = ["dist.css"],
|
||||
cmd = """
|
||||
# Must manually copy sources, because tailwind silently ignores symlinked files:
|
||||
# https://github.com/tailwindlabs/tailwindcss/issues/13731
|
||||
WORKDIR=$$(dirname $(location index.css))
|
||||
find $$WORKDIR -type l -exec bash -c 'echo "> $${0}" && cp -fL "$${0}" "$${0}.tmp" && mv "$${0}.tmp" "$${0}"' {} \\;
|
||||
# Copy over source from built TS app so that tailwind can see the used classes
|
||||
for fpath in $(locations :app_dist); do
|
||||
cp $$fpath $$WORKDIR
|
||||
done
|
||||
# Include daisyui plugin
|
||||
cp -R $(@D)/node_modules/.aspect_rules_js/*/node_modules $$WORKDIR/node_modules
|
||||
# Run tailwind build
|
||||
$(location //tools/build_rules:tailwind) -i $(location index.css) -o $@
|
||||
""",
|
||||
tools = ["//tools/build_rules:tailwind"],
|
||||
)
|
||||
|
||||
ts_config(
|
||||
name = "ts_config_app",
|
||||
src = ":tsconfig_app.json",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# Making modules of ts projects seems to be a rats nest.
|
||||
# Hopefully we can figure this out in the future.
|
||||
ts_project(
|
||||
name = "app",
|
||||
srcs = [
|
||||
"index.ts",
|
||||
"layout.ts",
|
||||
"pages.ts",
|
||||
"services.ts",
|
||||
"types.ts",
|
||||
"utils.ts",
|
||||
# Test files
|
||||
"index.test.ts",
|
||||
"utils.test.ts",
|
||||
"transformation-tests.ts",
|
||||
],
|
||||
allow_js = True,
|
||||
resolve_json_module = True,
|
||||
transpiler = "tsc",
|
||||
tsconfig = ":ts_config_app",
|
||||
deps = [
|
||||
":node_modules/@types/mithril",
|
||||
":node_modules/@types/node",
|
||||
":node_modules/@types/ospec",
|
||||
":node_modules/mithril",
|
||||
":node_modules/ospec",
|
||||
":node_modules/whatwg-fetch",
|
||||
"//databuild/client:typescript_lib",
|
||||
],
|
||||
)
|
||||
|
||||
esbuild(
|
||||
name = "app_dist",
|
||||
srcs = [":app"],
|
||||
bazel_sandbox_plugin = True,
|
||||
entry_point = "index.js",
|
||||
# esbuild_log_level = "verbose",
|
||||
# js_log_level = "debug",
|
||||
metafile = True,
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
js_test(
|
||||
name = "app_test",
|
||||
chdir = package_name(),
|
||||
data = [":app"],
|
||||
entry_point = "index.test.js",
|
||||
)
|
||||
|
||||
# Test to verify strict TypeScript configuration catches expected failures
|
||||
sh_test(
|
||||
name = "strict_config_test",
|
||||
srcs = ["test-strict-config.sh"],
|
||||
data = [
|
||||
"test-data/strict-config-failures.ts",
|
||||
"tsconfig_app.json",
|
||||
":node_modules/@types/node",
|
||||
":node_modules/typescript",
|
||||
],
|
||||
)
|
||||
|
|
@@ -1,4 +0,0 @@

# Dashboard

A dashboard for viewing past build status, currently running builds, etc. Extremely prototyped right now.
@@ -1,127 +0,0 @@
# Dashboard Type Safety Architecture

## Overview

This document describes the type safety architecture implemented in the DataBuild dashboard to prevent runtime errors from backend API changes.

## Problem Statement

The dashboard previously experienced runtime crashes when backend API changes were deployed:

- `status.toLowerCase()` failed when status changed from string to object
- `partition.str` access failed when partition structure changed
- TypeScript compilation passed but runtime errors occurred

## Solution Architecture

### 1. Dashboard Data Contracts

We define stable TypeScript interfaces in `types.ts` that represent the data shapes the UI components expect:

```typescript
export interface DashboardBuild {
  build_request_id: string;
  status: string;                 // Always a human-readable string
  requested_partitions: string[]; // Always flat string array
  // ... other fields
}
```

### 2. Transformation Layer

The `services.ts` file contains transformation functions that convert OpenAPI-generated types to dashboard types:

```typescript
function transformBuildSummary(apiResponse: BuildSummary): DashboardBuild {
  return {
    build_request_id: apiResponse.build_request_id,
    status: apiResponse.status_name, // Extract string from API
    requested_partitions: apiResponse.requested_partitions.map(p => p.str), // Flatten objects
    // ... transform other fields
  };
}
```

### 3. Component Isolation

All UI components use only dashboard types, never raw API types:

```typescript
// GOOD: Using dashboard types
const build: DashboardBuild = await DashboardService.getBuildDetail(id);
m('div', build.status.toLowerCase()); // Safe - status is always string

// BAD: Using API types directly
const build: BuildSummary = await apiClient.getBuild(id);
m('div', build.status.toLowerCase()); // Unsafe - status might be object
```

## Benefits

1. **Compile-time Safety**: TypeScript catches type mismatches during development
2. **Runtime Protection**: Transformation functions handle API changes gracefully
3. **Clear Boundaries**: UI code is isolated from API implementation details
4. **Easier Updates**: API changes require updates only in transformation functions

## Testing Strategy

### Unit Tests

- `transformation-tests.ts`: Verify transformation functions produce correct dashboard types

### Strict TypeScript Configuration

- `exactOptionalPropertyTypes`: Ensures optional properties are handled explicitly
- `strictNullChecks`: Prevents null/undefined errors
- `noImplicitAny`: Requires explicit typing

## Maintenance Guidelines

### When Backend API Changes

1. Update the OpenAPI spec and regenerate the client
2. TypeScript compilation will fail in transformation functions if types changed
3. Update only the transformation functions to handle the new API shape
4. Run tests to verify UI components still work correctly

### Adding New Features

1. Define dashboard types in `types.ts`
2. Create transformation functions in `services.ts`
3. Use only dashboard types in components
4. Add tests for the transformation logic

## Example: Handling API Evolution

If the backend changes `status` from string to object:

```typescript
// Old API
{ status_name: "COMPLETED" }

// New API
{ status: { code: 4, name: "COMPLETED" } }

// Transformation handles both
function transformBuildSummary(apiResponse: any): DashboardBuild {
  return {
    status: apiResponse.status_name || apiResponse.status?.name || 'UNKNOWN',
    // ... other fields
  };
}
```

The UI components continue working without changes because they always receive the expected `string` type.

## Monitoring

To maintain type safety over time:

1. **Build-time Checks**: TypeScript compilation catches type errors
2. **Test Suite**: Transformation tests run on every build
3. **Code Reviews**: Ensure new code follows the pattern
4. **Documentation**: Keep this document updated with patterns

## Related Files

- `types.ts` - Dashboard type definitions
- `services.ts` - API transformation functions
- `transformation-tests.ts` - Unit tests for transformations
- `tsconfig_app.json` - Strict TypeScript configuration
@ -1,78 +0,0 @@
|
|||
@import "tailwindcss" source("./**/*.{js,html}");
|
||||
@plugin "daisyui" {
|
||||
}
|
||||
|
||||
|
||||
|
||||
@plugin "daisyui/theme" {
|
||||
name: "databuild-light";
|
||||
default: true;
|
||||
prefersdark: false;
|
||||
color-scheme: "light";
|
||||
--color-base-100: oklch(100% 0 0);
|
||||
--color-base-200: oklch(98% 0.002 247.839);
|
||||
--color-base-300: oklch(96% 0.003 264.542);
|
||||
--color-base-content: oklch(21% 0.034 264.665);
|
||||
--color-primary: oklch(37% 0.01 67.558);
|
||||
--color-primary-content: oklch(100% 0 0);
|
||||
--color-secondary: oklch(77% 0.152 181.912);
|
||||
--color-secondary-content: oklch(100% 0 0);
|
||||
--color-accent: oklch(75% 0.183 55.934);
|
||||
--color-accent-content: oklch(100% 0 0);
|
||||
--color-neutral: oklch(37% 0.01 67.558);
|
||||
--color-neutral-content: oklch(98% 0.002 247.839);
|
||||
--color-info: oklch(80% 0.105 251.813);
|
||||
--color-info-content: oklch(28% 0.091 267.935);
|
||||
--color-success: oklch(84% 0.238 128.85);
|
||||
--color-success-content: oklch(27% 0.072 132.109);
|
||||
--color-warning: oklch(85% 0.199 91.936);
|
||||
--color-warning-content: oklch(27% 0.077 45.635);
|
||||
--color-error: oklch(70% 0.191 22.216);
|
||||
--color-error-content: oklch(25% 0.092 26.042);
|
||||
--radius-selector: 0.5rem;
|
||||
--radius-field: 0.5rem;
|
||||
--radius-box: 0.5rem;
|
||||
--size-selector: 0.25rem;
|
||||
--size-field: 0.25rem;
|
||||
--border: 1px;
|
||||
--depth: 0;
|
||||
--noise: 0;
|
||||
}
|
||||
|
||||
|
||||
@plugin "daisyui/theme" {
|
||||
name: "databuild-dark";
|
||||
default: false;
|
||||
prefersdark: false;
|
||||
color-scheme: "dark";
|
||||
--color-base-100: oklch(15% 0.002 247.839);
|
||||
--color-base-200: oklch(18% 0.003 264.542);
|
||||
--color-base-300: oklch(22% 0.006 264.531);
|
||||
--color-base-content: oklch(92% 0.034 264.665);
|
||||
--color-primary: oklch(75% 0.005 56.366);
|
||||
--color-primary-content: oklch(15% 0.006 56.043);
|
||||
--color-secondary: oklch(65% 0.152 181.912);
|
||||
--color-secondary-content: oklch(15% 0 0);
|
||||
--color-accent: oklch(70% 0.183 55.934);
|
||||
--color-accent-content: oklch(15% 0 0);
|
||||
--color-neutral: oklch(25% 0.01 67.558);
|
||||
--color-neutral-content: oklch(92% 0.002 247.839);
|
||||
--color-info: oklch(65% 0.165 254.624);
|
||||
--color-info-content: oklch(15% 0.091 267.935);
|
||||
--color-success: oklch(75% 0.238 128.85);
|
||||
--color-success-content: oklch(15% 0.072 132.109);
|
||||
--color-warning: oklch(80% 0.199 91.936);
|
||||
--color-warning-content: oklch(15% 0.077 45.635);
|
||||
--color-error: oklch(65% 0.191 22.216);
|
||||
--color-error-content: oklch(15% 0.092 26.042);
|
||||
--radius-selector: 0.5rem;
|
||||
--radius-field: 0.5rem;
|
||||
--radius-box: 0.5rem;
|
||||
--size-selector: 0.25rem;
|
||||
--size-field: 0.25rem;
|
||||
--border: 1px;
|
||||
--depth: 0;
|
||||
--noise: 0;
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
<!doctype html>
|
||||
<html data-theme="databuild-light">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>DataBuild Dashboard</title>
|
||||
<link href="/static/dist.css" rel="stylesheet">
|
||||
<script src="/static/app_dist.js"></script>
|
||||
<script type="module">
|
||||
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
|
||||
window.mermaid = mermaid;
|
||||
mermaid.initialize({ startOnLoad: true });
|
||||
console.info("mermaid loaded", mermaid);
|
||||
</script>
|
||||
</head>
|
||||
<body>
|
||||
<div id="app">
|
||||
Loading...
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -1,15 +0,0 @@
|
|||
const { appName } = require('./index');
|
||||
const o = require('ospec');
|
||||
|
||||
// Import transformation tests
|
||||
require('./transformation-tests');
|
||||
|
||||
o.spec("appName", () => {
|
||||
o("should be databuild", () => {
|
||||
o(appName).equals("databuild") `Should be databuild`;
|
||||
});
|
||||
});
|
||||
|
||||
// TODO - I think we can create an ospec target that invokes these with the ospec CLI?
|
||||
// https://github.com/MithrilJS/ospec?tab=readme-ov-file#command-line-interface
|
||||
o.run();
|
||||
|
|
@ -1,76 +0,0 @@
|
|||
import m from 'mithril';
|
||||
import { Layout } from './layout';
|
||||
import {
|
||||
RecentActivity,
|
||||
BuildStatus,
|
||||
PartitionsList,
|
||||
PartitionStatus,
|
||||
JobsList,
|
||||
JobMetrics,
|
||||
GraphAnalysis
|
||||
} from './pages';
|
||||
import { decodePartitionRef } from './utils';
|
||||
import {
|
||||
TypedComponent,
|
||||
LayoutWrapperAttrs,
|
||||
RecentActivityAttrs,
|
||||
BuildStatusAttrs,
|
||||
PartitionStatusAttrs,
|
||||
PartitionsListAttrs,
|
||||
JobsListAttrs,
|
||||
JobMetricsAttrs,
|
||||
GraphAnalysisAttrs
|
||||
} from './types';
|
||||
|
||||
export const appName = "databuild";
|
||||
|
||||
// Wrapper components that include layout - now with type safety
|
||||
function createLayoutWrapper<TAttrs>(component: TypedComponent<TAttrs>): m.Component<TAttrs> {
|
||||
const wrapper: any = {
|
||||
view: (vnode: m.Vnode<TAttrs>) => m(Layout, [component.view.call(component, vnode)])
|
||||
};
|
||||
|
||||
// Only add lifecycle methods if they exist to avoid exactOptionalPropertyTypes issues
|
||||
if (component.oninit) {
|
||||
wrapper.oninit = (vnode: m.Vnode<TAttrs>) => component.oninit!.call(component, vnode);
|
||||
}
|
||||
if (component.oncreate) {
|
||||
wrapper.oncreate = (vnode: m.VnodeDOM<TAttrs>) => component.oncreate!.call(component, vnode);
|
||||
}
|
||||
if (component.onupdate) {
|
||||
wrapper.onupdate = (vnode: m.VnodeDOM<TAttrs>) => component.onupdate!.call(component, vnode);
|
||||
}
|
||||
if (component.onbeforeremove) {
|
||||
wrapper.onbeforeremove = (vnode: m.VnodeDOM<TAttrs>) => component.onbeforeremove!.call(component, vnode);
|
||||
}
|
||||
if (component.onremove) {
|
||||
wrapper.onremove = (vnode: m.VnodeDOM<TAttrs>) => component.onremove!.call(component, vnode);
|
||||
}
|
||||
if (component.onbeforeupdate) {
|
||||
wrapper.onbeforeupdate = (vnode: m.Vnode<TAttrs>, old: m.VnodeDOM<TAttrs>) => component.onbeforeupdate!.call(component, vnode, old);
|
||||
}
|
||||
|
||||
return wrapper;
|
||||
}
|
||||
|
||||
// Route definitions with type safety
|
||||
const routes = {
|
||||
'/': createLayoutWrapper<RecentActivityAttrs>(RecentActivity),
|
||||
'/builds/:id': createLayoutWrapper<BuildStatusAttrs>(BuildStatus),
|
||||
'/partitions': createLayoutWrapper<PartitionsListAttrs>(PartitionsList),
|
||||
'/partitions/:base64_ref': createLayoutWrapper<PartitionStatusAttrs>(PartitionStatus),
|
||||
'/jobs': createLayoutWrapper<JobsListAttrs>(JobsList),
|
||||
'/jobs/:label': createLayoutWrapper<JobMetricsAttrs>(JobMetrics),
|
||||
'/analyze': createLayoutWrapper<GraphAnalysisAttrs>(GraphAnalysis),
|
||||
};
|
||||
|
||||
if (typeof window !== "undefined") {
|
||||
document.addEventListener("DOMContentLoaded", () => {
|
||||
// Initialize theme from localStorage
|
||||
const savedTheme = localStorage.getItem('theme') || 'databuild-light';
|
||||
document.documentElement.setAttribute('data-theme', savedTheme);
|
||||
|
||||
// Set up routing
|
||||
m.route(document.getElementById('app') as HTMLElement, '/', routes);
|
||||
});
|
||||
}
|
||||
|
|
@ -1,52 +0,0 @@
|
|||
import m from 'mithril';
|
||||
|
||||
export const Layout = {
|
||||
view: (vnode: any) => [
|
||||
m('header.navbar.bg-base-100.shadow-lg', [
|
||||
m('div.navbar-start', [
|
||||
m('div.dropdown', [
|
||||
m('div.btn.btn-ghost.lg:hidden[tabindex="0"][role="button"]', [
|
||||
m('svg.w-5.h-5[xmlns="http://www.w3.org/2000/svg"][fill="none"][viewBox="0 0 24 24"]', [
|
||||
m('path[stroke-linecap="round"][stroke-linejoin="round"][stroke-width="2"][stroke="currentColor"][d="M4 6h16M4 12h8m-8 6h16"]'),
|
||||
]),
|
||||
]),
|
||||
m('ul.menu.menu-sm.dropdown-content.bg-base-100.rounded-box.z-1.mt-3.w-52.p-2.shadow[tabindex="0"]', [
|
||||
m('li', m(m.route.Link, { href: '/partitions' }, 'Partitions')),
|
||||
m('li', m(m.route.Link, { href: '/jobs' }, 'Jobs')),
|
||||
m('li', m(m.route.Link, { href: '/analyze' }, 'Analyze')),
|
||||
]),
|
||||
]),
|
||||
m(m.route.Link, { href: '/', class: 'btn btn-ghost text-xl' }, 'DataBuild Dashboard'),
|
||||
]),
|
||||
m('div.navbar-center.hidden.lg:flex', [
|
||||
m('ul.menu.menu-horizontal.px-1', [
|
||||
m('li', m(m.route.Link, { href: '/' }, 'Dashboard')),
|
||||
m('li', m(m.route.Link, { href: '/partitions' }, 'Partitions')),
|
||||
m('li', m(m.route.Link, { href: '/jobs' }, 'Jobs')),
|
||||
m('li', m(m.route.Link, { href: '/analyze' }, 'Analyze')),
|
||||
]),
|
||||
]),
|
||||
m('div.navbar-end', [
|
||||
m('label.swap.swap-rotate', [
|
||||
m('input.theme-controller[type="checkbox"]', {
|
||||
value: 'databuild-dark',
|
||||
onchange: (e: Event) => {
|
||||
const target = e.target as HTMLInputElement;
|
||||
const theme = target.checked ? 'databuild-dark' : 'databuild-light';
|
||||
document.documentElement.setAttribute('data-theme', theme);
|
||||
localStorage.setItem('theme', theme);
|
||||
},
|
||||
checked: localStorage.getItem('theme') === 'databuild-dark'
|
||||
}),
|
||||
m('svg.swap-off.fill-current.w-6.h-6[xmlns="http://www.w3.org/2000/svg"][viewBox="0 0 24 24"]', [
|
||||
m('path[d="M5.64,17l-.71.71a1,1,0,0,0,0,1.41,1,1,0,0,0,1.41,0l.71-.71A1,1,0,0,0,5.64,17ZM5,12a1,1,0,0,0-1-1H3a1,1,0,0,0,0,2H4A1,1,0,0,0,5,12Zm7-7a1,1,0,0,0,1-1V3a1,1,0,0,0-2,0V4A1,1,0,0,0,12,5ZM5.64,7.05a1,1,0,0,0,.7.29,1,1,0,0,0,.71-.29,1,1,0,0,0,0-1.41l-.71-.71A1,1,0,0,0,4.93,6.34Zm12,.29a1,1,0,0,0,.7-.29l.71-.71a1,1,0,1,0-1.41-1.41L17,5.64a1,1,0,0,0,0,1.41A1,1,0,0,0,17.66,7.34ZM21,11H20a1,1,0,0,0,0,2h1a1,1,0,0,0,0-2Zm-9,8a1,1,0,0,0-1,1v1a1,1,0,0,0,2,0V20A1,1,0,0,0,12,19ZM18.36,17A1,1,0,0,0,17,18.36l.71.71a1,1,0,0,0,1.41,0,1,1,0,0,0,0-1.41ZM12,6.5A5.5,5.5,0,1,0,17.5,12,5.51,5.51,0,0,0,12,6.5Zm0,9A3.5,3.5,0,1,1,15.5,12,3.5,3.5,0,0,1,12,15.5Z"]'),
|
||||
]),
|
||||
m('svg.swap-on.fill-current.w-6.h-6[xmlns="http://www.w3.org/2000/svg"][viewBox="0 0 24 24"]', [
|
||||
m('path[d="M21.64,13a1,1,0,0,0-1.05-.14,8.05,8.05,0,0,1-3.37.73A8.15,8.15,0,0,1,9.08,5.49a8.59,8.59,0,0,1,.25-2A1,1,0,0,0,8,2.36,10.14,10.14,0,1,0,22,14.05,1,1,0,0,0,21.64,13Zm-9.5,6.69A8.14,8.14,0,0,1,7.08,5.22v.27A10.15,10.15,0,0,0,17.22,15.63a9.79,9.79,0,0,0,2.1-.22A8.11,8.11,0,0,1,12.14,19.73Z"]'),
|
||||
]),
|
||||
]),
|
||||
]),
|
||||
]),
|
||||
m('main.min-h-screen.bg-base-200.pt-4', vnode.children),
|
||||
]
|
||||
};
|
||||
|
|
@@ -1,16 +0,0 @@
{
  "private": true,
  "devDependencies": {
    "typescript": "5.7.3",
    "@types/node": "^22.12.0",
    "mithril": "^2.2.7",
    "@types/mithril": "^2.2.7",
    "ospec": "^4.2.0",
    "@types/ospec": "^4.2.0",
    "whatwg-fetch": "^3.6.20",
    "daisyui": "^5.0.0-beta.6"
  },
  "pnpm": {
    "onlyBuiltDependencies": []
  }
}
File diff suppressed because it is too large
|
|
@ -1,111 +0,0 @@
|
|||
lockfileVersion: '9.0'
|
||||
settings:
|
||||
autoInstallPeers: true
|
||||
excludeLinksFromLockfile: false
|
||||
importers:
|
||||
.:
|
||||
devDependencies:
|
||||
'@types/mithril':
|
||||
specifier: ^2.2.7
|
||||
version: 2.2.7
|
||||
'@types/node':
|
||||
specifier: ^22.12.0
|
||||
version: 22.12.0
|
||||
'@types/ospec':
|
||||
specifier: ^4.2.0
|
||||
version: 4.2.0
|
||||
daisyui:
|
||||
specifier: ^5.0.0-beta.6
|
||||
version: 5.0.0-beta.6
|
||||
mithril:
|
||||
specifier: ^2.2.7
|
||||
version: 2.2.13
|
||||
ospec:
|
||||
specifier: ^4.2.0
|
||||
version: 4.2.1
|
||||
typescript:
|
||||
specifier: ^5.7.3
|
||||
version: 5.7.3
|
||||
whatwg-fetch:
|
||||
specifier: ^3.6.20
|
||||
version: 3.6.20
|
||||
packages:
|
||||
'@types/mithril@2.2.7':
|
||||
resolution: {integrity: sha512-uetxoYizBMHPELl6DSZUfO6Q/aOm+h0NUCv9bVAX2iAxfrdBSOvU9KKFl+McTtxR13F+BReYLY814pJsZvnSxg==}
|
||||
'@types/node@22.12.0':
|
||||
resolution: {integrity: sha512-Fll2FZ1riMjNmlmJOdAyY5pUbkftXslB5DgEzlIuNaiWhXd00FhWxVC/r4yV/4wBb9JfImTu+jiSvXTkJ7F/gA==}
|
||||
'@types/ospec@4.2.0':
|
||||
resolution: {integrity: sha512-QgwAtrYYstU7otBXmQ2yjUWaYMWkF48EevmG+IfYzAWk39cwsTw7ZHp7dK2XyA3eJ2v5AvbMa5ijcLewklDRDA==}
|
||||
balanced-match@1.0.2:
|
||||
resolution: {integrity: sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==}
|
||||
brace-expansion@2.0.1:
|
||||
resolution: {integrity: sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA==}
|
||||
daisyui@5.0.0-beta.6:
|
||||
resolution: {integrity: sha512-gwXHv6MApRBrvUayzg83vS6bfZ+y7/1VGLu0a8/cEAMviS4rXLCd4AndEdlVxhq+25wkAp0CZRkNQ7O4wIoFnQ==}
|
||||
fs.realpath@1.0.0:
|
||||
resolution: {integrity: sha512-OO0pH2lK6a0hZnAdau5ItzHPI6pUlvI7jMVnxUQRtw4owF2wk8lOSabtGDCTP4Ggrg2MbGnWO9X8K1t4+fGMDw==}
|
||||
glob@9.3.5:
|
||||
resolution: {integrity: sha512-e1LleDykUz2Iu+MTYdkSsuWX8lvAjAcs0Xef0lNIu0S2wOAzuTxCJtcd9S3cijlwYF18EsU3rzb8jPVobxDh9Q==}
|
||||
engines: {node: '>=16 || 14 >=14.17'}
|
||||
lru-cache@10.4.3:
|
||||
resolution: {integrity: sha512-JNAzZcXrCt42VGLuYz0zfAzDfAvJWW6AfYlDBQyDV5DClI2m5sAmK+OIO7s59XfsRsWHp02jAJrRadPRGTt6SQ==}
|
||||
minimatch@8.0.4:
|
||||
resolution: {integrity: sha512-W0Wvr9HyFXZRGIDgCicunpQ299OKXs9RgZfaukz4qAW/pJhcpUfupc9c+OObPOFueNy8VSrZgEmDtk6Kh4WzDA==}
|
||||
engines: {node: '>=16 || 14 >=14.17'}
|
||||
minipass@4.2.8:
|
||||
resolution: {integrity: sha512-fNzuVyifolSLFL4NzpF+wEF4qrgqaaKX0haXPQEdQ7NKAN+WecoKMHV09YcuL/DHxrUsYQOK3MiuDf7Ip2OXfQ==}
|
||||
engines: {node: '>=8'}
|
||||
minipass@7.1.2:
|
||||
resolution: {integrity: sha512-qOOzS1cBTWYF4BH8fVePDBOO9iptMnGUEZwNc/cMWnTV2nVLZ7VoNWEPHkYczZA0pdoA7dl6e7FL659nX9S2aw==}
|
||||
engines: {node: '>=16 || 14 >=14.17'}
|
||||
mithril@2.2.13:
|
||||
resolution: {integrity: sha512-dfWFYmRJDXAROG6B1AsQXEwhSgFZ65Am/5Xj3oJ/R1wZtrC0W20P4sIAtFQB0SZsGwV7H2MiEJiFGmlUtXF1Ww==}
|
||||
ospec@4.2.1:
|
||||
resolution: {integrity: sha512-LsJw2WMaVlFDiaIPPH+LMtsxOABjFD29XQ12ENZM+8Cwgg5BEgW65CB+SPL1PceIun+HSfdw8hkf27C8iF/XFw==}
|
||||
hasBin: true
|
||||
path-scurry@1.11.1:
|
||||
resolution: {integrity: sha512-Xa4Nw17FS9ApQFJ9umLiJS4orGjm7ZzwUrwamcGQuHSzDyth9boKDaycYdDcZDuqYATXw4HFXgaqWTctW/v1HA==}
|
||||
engines: {node: '>=16 || 14 >=14.18'}
|
||||
typescript@5.7.3:
|
||||
resolution: {integrity: sha512-84MVSjMEHP+FQRPy3pX9sTVV/INIex71s9TL2Gm5FG/WG1SqXeKyZ0k7/blY/4FdOzI12CBy1vGc4og/eus0fw==}
|
||||
engines: {node: '>=14.17'}
|
||||
hasBin: true
|
||||
undici-types@6.20.0:
|
||||
resolution: {integrity: sha512-Ny6QZ2Nju20vw1SRHe3d9jVu6gJ+4e3+MMpqu7pqE5HT6WsTSlce++GQmK5UXS8mzV8DSYHrQH+Xrf2jVcuKNg==}
|
||||
whatwg-fetch@3.6.20:
|
||||
resolution: {integrity: sha512-EqhiFU6daOA8kpjOWTL0olhVOF3i7OrFzSYiGsEMB8GcXS+RrzauAERX65xMeNWVqxA6HXH2m69Z9LaKKdisfg==}
|
||||
snapshots:
|
||||
'@types/mithril@2.2.7': {}
|
||||
'@types/node@22.12.0':
|
||||
dependencies:
|
||||
undici-types: 6.20.0
|
||||
'@types/ospec@4.2.0': {}
|
||||
balanced-match@1.0.2: {}
|
||||
brace-expansion@2.0.1:
|
||||
dependencies:
|
||||
balanced-match: 1.0.2
|
||||
daisyui@5.0.0-beta.6: {}
|
||||
fs.realpath@1.0.0: {}
|
||||
glob@9.3.5:
|
||||
dependencies:
|
||||
fs.realpath: 1.0.0
|
||||
minimatch: 8.0.4
|
||||
minipass: 4.2.8
|
||||
path-scurry: 1.11.1
|
||||
lru-cache@10.4.3: {}
|
||||
minimatch@8.0.4:
|
||||
dependencies:
|
||||
brace-expansion: 2.0.1
|
||||
minipass@4.2.8: {}
|
||||
minipass@7.1.2: {}
|
||||
mithril@2.2.13: {}
|
||||
ospec@4.2.1:
|
||||
dependencies:
|
||||
glob: 9.3.5
|
||||
path-scurry@1.11.1:
|
||||
dependencies:
|
||||
lru-cache: 10.4.3
|
||||
minipass: 7.1.2
|
||||
typescript@5.7.3: {}
|
||||
undici-types@6.20.0: {}
|
||||
whatwg-fetch@3.6.20: {}
|
||||
|
|
@@ -1,2 +0,0 @@
packages:
  - "databuild"
@ -1,492 +0,0 @@
|
|||
// Import the generated TypeScript client
|
||||
import {
|
||||
DefaultApi,
|
||||
Configuration,
|
||||
ActivityApiResponse,
|
||||
ActivityResponse,
|
||||
BuildSummary,
|
||||
BuildDetailResponse,
|
||||
PartitionSummary,
|
||||
JobsListApiResponse,
|
||||
JobMetricsResponse,
|
||||
JobSummary,
|
||||
JobRunSummary,
|
||||
JobDailyStats
|
||||
} from '../client/typescript_generated/src/index';
|
||||
|
||||
// Import our dashboard types
|
||||
import {
|
||||
DashboardActivity,
|
||||
DashboardBuild,
|
||||
DashboardPartition,
|
||||
DashboardJob,
|
||||
isDashboardActivity,
|
||||
isDashboardBuild,
|
||||
isDashboardPartition,
|
||||
isDashboardJob
|
||||
} from './types';
|
||||
|
||||
// Configure the API client
|
||||
const apiConfig = new Configuration({
|
||||
basePath: '', // Use relative paths since we're on the same host
|
||||
});
|
||||
const apiClient = new DefaultApi(apiConfig);
|
||||
|
||||
// Transformation functions: Convert API responses to dashboard types
|
||||
// These functions prevent runtime errors by ensuring consistent data shapes
|
||||
|
||||
function transformBuildSummary(apiResponse: BuildSummary): DashboardBuild {
|
||||
return {
|
||||
build_request_id: apiResponse.build_request_id,
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
requested_partitions: apiResponse.requested_partitions, // Keep as PartitionRef array
|
||||
total_jobs: apiResponse.total_jobs,
|
||||
completed_jobs: apiResponse.completed_jobs,
|
||||
failed_jobs: apiResponse.failed_jobs,
|
||||
cancelled_jobs: apiResponse.cancelled_jobs,
|
||||
requested_at: apiResponse.requested_at,
|
||||
started_at: apiResponse.started_at ?? null,
|
||||
completed_at: apiResponse.completed_at ?? null,
|
||||
duration_ms: apiResponse.duration_ms ?? null,
|
||||
cancelled: apiResponse.cancelled,
|
||||
};
|
||||
}
|
||||
|
||||
function transformBuildDetail(apiResponse: BuildDetailResponse): DashboardBuild {
|
||||
return {
|
||||
build_request_id: apiResponse.build_request_id,
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
requested_partitions: apiResponse.requested_partitions, // Keep as PartitionRef array
|
||||
total_jobs: apiResponse.total_jobs,
|
||||
completed_jobs: apiResponse.completed_jobs,
|
||||
failed_jobs: apiResponse.failed_jobs,
|
||||
cancelled_jobs: apiResponse.cancelled_jobs,
|
||||
requested_at: apiResponse.requested_at,
|
||||
started_at: apiResponse.started_at ?? null,
|
||||
completed_at: apiResponse.completed_at ?? null,
|
||||
duration_ms: apiResponse.duration_ms ?? null,
|
||||
cancelled: apiResponse.cancelled,
|
||||
};
|
||||
}
|
||||
|
||||
function transformPartitionSummary(apiResponse: PartitionSummary): DashboardPartition {
|
||||
if (!apiResponse.partition_ref) {
|
||||
throw new Error('PartitionSummary must have a valid partition_ref');
|
||||
}
|
||||
|
||||
return {
|
||||
partition_ref: apiResponse.partition_ref, // Keep as PartitionRef object
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
last_updated: apiResponse.last_updated ?? null,
|
||||
build_requests: (apiResponse as any).build_requests || [], // This field might not be in the OpenAPI spec
|
||||
};
|
||||
}
|
||||
|
||||
function transformJobSummary(apiResponse: JobSummary): DashboardJob {
|
||||
return {
|
||||
job_label: apiResponse.job_label,
|
||||
total_runs: apiResponse.total_runs,
|
||||
successful_runs: apiResponse.successful_runs,
|
||||
failed_runs: apiResponse.failed_runs,
|
||||
cancelled_runs: apiResponse.cancelled_runs,
|
||||
last_run_timestamp: apiResponse.last_run_timestamp,
|
||||
last_run_status_code: apiResponse.last_run_status_code,
|
||||
last_run_status_name: apiResponse.last_run_status_name,
|
||||
average_partitions_per_run: apiResponse.average_partitions_per_run,
|
||||
recent_builds: apiResponse.recent_builds || [], // Default for optional array field
|
||||
};
|
||||
}
|
||||
|
||||
function transformActivityResponse(apiResponse: ActivityResponse): DashboardActivity {
|
||||
return {
|
||||
active_builds_count: apiResponse.active_builds_count,
|
||||
recent_builds: apiResponse.recent_builds.map(transformBuildSummary),
|
||||
recent_partitions: apiResponse.recent_partitions.map(transformPartitionSummary),
|
||||
total_partitions_count: apiResponse.total_partitions_count,
|
||||
system_status: apiResponse.system_status,
|
||||
graph_name: apiResponse.graph_name,
|
||||
};
|
||||
}
|
||||
|
||||
// Type guards for runtime validation
|
||||
function isValidBuildDetailResponse(data: unknown): data is BuildDetailResponse {
|
||||
return typeof data === 'object' &&
|
||||
data !== null &&
|
||||
'build_request_id' in data &&
|
||||
'status_name' in data &&
|
||||
'requested_partitions' in data;
|
||||
}
|
||||
|
||||
function isValidActivityResponse(data: unknown): data is ActivityResponse {
|
||||
return typeof data === 'object' &&
|
||||
data !== null &&
|
||||
'active_builds_count' in data &&
|
||||
'recent_builds' in data &&
|
||||
'recent_partitions' in data;
|
||||
}
|
||||
|
||||
function isValidJobsListApiResponse(data: unknown): data is JobsListApiResponse {
|
||||
return typeof data === 'object' &&
|
||||
data !== null &&
|
||||
'data' in data;
|
||||
}
|
||||
|
||||
// API Service for fetching recent activity data
|
||||
export class DashboardService {
|
||||
private static instance: DashboardService;
|
||||
|
||||
static getInstance(): DashboardService {
|
||||
if (!DashboardService.instance) {
|
||||
DashboardService.instance = new DashboardService();
|
||||
}
|
||||
return DashboardService.instance;
|
||||
}
|
||||
|
||||
async getRecentActivity(): Promise<DashboardActivity> {
|
||||
try {
|
||||
// Use the new activity endpoint that aggregates all the data we need
|
||||
const activityApiResponse: ActivityApiResponse = await apiClient.apiV1ActivityGet();
|
||||
console.info('Recent activity:', activityApiResponse);
|
||||
|
||||
const activityResponse = activityApiResponse.data;
|
||||
|
||||
// Validate API response structure
|
||||
if (!isValidActivityResponse(activityResponse)) {
|
||||
throw new Error('Invalid activity response structure');
|
||||
}
|
||||
|
||||
// Transform API response to dashboard format using transformation function
|
||||
const dashboardActivity = transformActivityResponse(activityResponse);
|
||||
|
||||
// Validate transformed result
|
||||
if (!isDashboardActivity(dashboardActivity)) {
|
||||
throw new Error('Transformation produced invalid dashboard activity');
|
||||
}
|
||||
|
||||
return dashboardActivity;
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch recent activity:', error);
|
||||
|
||||
// Fall back to valid dashboard format if API call fails
|
||||
return {
|
||||
active_builds_count: 0,
|
||||
recent_builds: [],
|
||||
recent_partitions: [],
|
||||
total_partitions_count: 0,
|
||||
system_status: 'error',
|
||||
graph_name: 'Unknown Graph'
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
async getJobs(searchTerm?: string): Promise<DashboardJob[]> {
|
||||
try {
|
||||
// Build query parameters manually since the generated client may not support query params correctly
|
||||
const queryParams = new URLSearchParams();
|
||||
if (searchTerm) {
|
||||
queryParams.append('search', searchTerm);
|
||||
}
|
||||
const url = `/api/v1/jobs${queryParams.toString() ? '?' + queryParams.toString() : ''}`;
|
||||
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
||||
}
|
||||
const data: unknown = await response.json();
|
||||
|
||||
// Validate API response structure
|
||||
if (!isValidJobsListApiResponse(data)) {
|
||||
throw new Error('Invalid jobs list response structure');
|
||||
}
|
||||
|
||||
// Transform each job using our transformation function
|
||||
const dashboardJobs = data.data.jobs.map(transformJobSummary);
|
||||
|
||||
// Validate each transformed job
|
||||
for (const job of dashboardJobs) {
|
||||
if (!isDashboardJob(job)) {
|
||||
throw new Error('Transformation produced invalid dashboard job');
|
||||
}
|
||||
}
|
||||
|
||||
return dashboardJobs;
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch jobs:', error);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
async getBuildDetail(buildId: string): Promise<DashboardBuild | null> {
|
||||
try {
|
||||
const url = `/api/v1/builds/${buildId}`;
|
||||
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
if (response.status === 404) {
|
||||
return null; // Build not found
|
||||
}
|
||||
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
||||
}
|
||||
const data: unknown = await response.json();
|
||||
|
||||
// Validate API response structure
|
||||
if (!isValidBuildDetailResponse(data)) {
|
||||
throw new Error('Invalid build detail response structure');
|
||||
}
|
||||
|
||||
// Transform to dashboard format
|
||||
const dashboardBuild = transformBuildDetail(data);
|
||||
|
||||
// Validate transformed result
|
||||
if (!isDashboardBuild(dashboardBuild)) {
|
||||
throw new Error('Transformation produced invalid dashboard build');
|
||||
}
|
||||
|
||||
return dashboardBuild;
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch build detail:', error);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async getPartitionDetail(partitionRef: string): Promise<DashboardPartition | null> {
|
||||
try {
|
||||
// Encode partition ref for URL safety
|
||||
const encodedRef = btoa(partitionRef).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
|
||||
const url = `/api/v1/partitions/${encodedRef}`;
|
||||
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
if (response.status === 404) {
|
||||
return null; // Partition not found
|
||||
}
|
||||
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
||||
}
|
||||
const data: unknown = await response.json();
|
||||
|
||||
// For partition detail, we need to extract the PartitionSummary from the response
|
||||
// and transform it to dashboard format
|
||||
if (typeof data === 'object' && data !== null && 'partition_ref' in data) {
|
||||
const dashboardPartition = transformPartitionSummary(data as PartitionSummary);
|
||||
|
||||
if (!isDashboardPartition(dashboardPartition)) {
|
||||
throw new Error('Transformation produced invalid dashboard partition');
|
||||
}
|
||||
|
||||
return dashboardPartition;
|
||||
} else {
|
||||
throw new Error('Invalid partition detail response structure');
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch partition detail:', error);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async getJobMetrics(jobLabel: string): Promise<DashboardJob | null> {
|
||||
try {
|
||||
// Encode job label like partition refs for URL safety
|
||||
const encodedLabel = btoa(jobLabel).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
|
||||
const url = `/api/v1/jobs/${encodedLabel}`;
|
||||
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
if (response.status === 404) {
|
||||
return null; // Job not found
|
||||
}
|
||||
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
||||
}
|
||||
const data: unknown = await response.json();
|
||||
console.log('Job metrics response:', data);
|
||||
|
||||
// Extract job summary from metrics response and transform it
|
||||
if (typeof data === 'object' && data !== null && 'job_label' in data) {
|
||||
const dashboardJob = transformJobSummary(data as unknown as JobSummary);
|
||||
console.log('Transformed job summary:', dashboardJob);
|
||||
|
||||
if (!isDashboardJob(dashboardJob)) {
|
||||
throw new Error('Transformation produced invalid dashboard job');
|
||||
}
|
||||
|
||||
return dashboardJob;
|
||||
}
|
||||
|
||||
throw new Error('Invalid job metrics response structure');
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch job metrics:', error);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async getMermaidDiagram(buildId: string): Promise<string | null> {
|
||||
try {
|
||||
const url = `/api/v1/builds/${buildId}/mermaid`;
|
||||
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
if (response.status === 404) {
|
||||
return null; // Build not found or no job graph
|
||||
}
|
||||
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
||||
}
|
||||
|
||||
const data = await response.json();
|
||||
|
||||
// Validate response structure
|
||||
if (typeof data === 'object' && data !== null && 'mermaid' in data && typeof data.mermaid === 'string') {
|
||||
return data.mermaid;
|
||||
}
|
||||
|
||||
throw new Error('Invalid mermaid response structure');
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch mermaid diagram:', error);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Polling manager with Page Visibility API integration
|
||||
export class PollingManager {
|
||||
private intervals: Map<string, NodeJS.Timeout> = new Map();
|
||||
private isTabVisible: boolean = true;
|
||||
private visibilityChangeHandler: () => void;
|
||||
|
||||
constructor() {
|
||||
this.visibilityChangeHandler = () => {
|
||||
this.isTabVisible = !document.hidden;
|
||||
|
||||
// Pause or resume polling based on tab visibility
|
||||
if (this.isTabVisible) {
|
||||
this.resumePolling();
|
||||
} else {
|
||||
this.pausePolling();
|
||||
}
|
||||
};
|
||||
|
||||
// Set up Page Visibility API listener only in browser environment
|
||||
if (typeof document !== 'undefined') {
|
||||
document.addEventListener('visibilitychange', this.visibilityChangeHandler);
|
||||
}
|
||||
}
|
||||
|
||||
startPolling(key: string, callback: () => void, intervalMs: number): void {
|
||||
// Clear existing interval if any
|
||||
this.stopPolling(key);
|
||||
|
||||
// Only start polling if tab is visible
|
||||
if (this.isTabVisible) {
|
||||
const interval = setInterval(callback, intervalMs);
|
||||
this.intervals.set(key, interval);
|
||||
}
|
||||
}
|
||||
|
||||
stopPolling(key: string): void {
|
||||
const interval = this.intervals.get(key);
|
||||
if (interval) {
|
||||
clearInterval(interval);
|
||||
this.intervals.delete(key);
|
||||
}
|
||||
}
|
||||
|
||||
private pausePolling(): void {
|
||||
// Store current intervals but clear them
|
||||
for (const [key, interval] of this.intervals) {
|
||||
clearInterval(interval);
|
||||
}
|
||||
}
|
||||
|
||||
private resumePolling(): void {
|
||||
// This is a simplified approach - in practice you'd want to store the callback
|
||||
// and interval info to properly resume. For now, components will handle this
|
||||
// by checking visibility state when setting up polling.
|
||||
}
|
||||
|
||||
cleanup(): void {
|
||||
// Clean up all intervals
|
||||
for (const interval of this.intervals.values()) {
|
||||
clearInterval(interval);
|
||||
}
|
||||
this.intervals.clear();
|
||||
|
||||
// Remove event listener only in browser environment
|
||||
if (typeof document !== 'undefined') {
|
||||
document.removeEventListener('visibilitychange', this.visibilityChangeHandler);
|
||||
}
|
||||
}
|
||||
|
||||
isVisible(): boolean {
|
||||
return this.isTabVisible;
|
||||
}
|
||||
}
|
||||
|
||||
// Export singleton instance
|
||||
export const pollingManager = new PollingManager();
|
||||
|
||||
// Utility functions for time formatting
|
||||
export function formatTime(epochNanos: number): string {
|
||||
const date = new Date(epochNanos / 1000000);
|
||||
const now = new Date();
|
||||
const diffMs = now.getTime() - date.getTime();
|
||||
|
||||
if (diffMs < 60000) { // Less than 1 minute
|
||||
return 'just now';
|
||||
} else if (diffMs < 3600000) { // Less than 1 hour
|
||||
const minutes = Math.floor(diffMs / 60000);
|
||||
return `${minutes}m ago`;
|
||||
} else if (diffMs < 86400000) { // Less than 1 day
|
||||
const hours = Math.floor(diffMs / 3600000);
|
||||
return `${hours}h ago`;
|
||||
} else {
|
||||
return date.toLocaleDateString();
|
||||
}
|
||||
}
|
||||
|
||||
export function formatDateTime(epochNanos: number): string {
|
||||
const date = new Date(epochNanos / 1000000);
|
||||
const dateStr = date.toLocaleDateString('en-US');
|
||||
const timeStr = date.toLocaleTimeString('en-US', {
|
||||
hour: 'numeric',
|
||||
minute: '2-digit',
|
||||
second: '2-digit',
|
||||
hour12: true,
|
||||
timeZoneName: 'short'
|
||||
});
|
||||
const millisStr = date.getMilliseconds().toString().padStart(3, '0');
|
||||
|
||||
// Insert milliseconds between seconds and AM/PM: "7/12/2025, 9:03:48.264 AM EST"
|
||||
return `${dateStr}, ${timeStr.replace(/(\d{2})\s+(AM|PM)/, `$1.${millisStr} $2`)}`;
|
||||
}
|
||||
|
||||
export function formatDuration(durationNanos?: number | null): string {
|
||||
let durationMs = durationNanos ? durationNanos / 1000000 : null;
|
||||
console.warn('Formatting duration:', durationMs);
|
||||
if (!durationMs || durationMs <= 0) {
|
||||
return '—';
|
||||
}
|
||||
|
||||
if (durationMs < 1000) {
|
||||
return `${Math.round(durationMs)}ms`;
|
||||
} else if (durationMs < 60000) {
|
||||
return `${(durationMs / 1000).toFixed(1)}s`;
|
||||
} else if (durationMs < 3600000) {
|
||||
const minutes = Math.floor(durationMs / 60000);
|
||||
const seconds = Math.floor((durationMs % 60000) / 1000);
|
||||
return `${minutes}m ${seconds}s`;
|
||||
} else {
|
||||
const hours = Math.floor(durationMs / 3600000);
|
||||
const minutes = Math.floor((durationMs % 3600000) / 60000);
|
||||
return `${hours}h ${minutes}m`;
|
||||
}
|
||||
}
|
||||
|
||||
export function formatDate(epochNanos: number): string {
|
||||
const date = new Date(epochNanos / 1000000);
|
||||
return date.toLocaleDateString('en-US', {
|
||||
month: 'short',
|
||||
day: 'numeric',
|
||||
year: 'numeric'
|
||||
});
|
||||
}
|
||||
|
|
@ -1,44 +0,0 @@
|
|||
// Test file designed to fail TypeScript compilation with strict config
|
||||
// These are the exact patterns that caused runtime failures in production
|
||||
|
||||
// Test 1: Reproduce original status.toLowerCase() failure
|
||||
const mockResponseWithStatusObject = { status_code: 1, status_name: "COMPLETED" };
|
||||
|
||||
// This should cause compilation error: Property 'status' does not exist
|
||||
const test1 = mockResponseWithStatusObject.status?.toLowerCase();
|
||||
|
||||
// Test 2: Reproduce original status?.status access failure
|
||||
const test2 = mockResponseWithStatusObject.status?.status;
|
||||
|
||||
// Test 3: Optional field access without null check
|
||||
interface PartitionSummaryTest {
|
||||
last_updated?: number;
|
||||
partition_ref: string;
|
||||
}
|
||||
|
||||
const testPartition: PartitionSummaryTest = {
|
||||
partition_ref: "test-partition"
|
||||
};
|
||||
|
||||
// This should fail: accessing optional field without null check
|
||||
const timestamp = testPartition.last_updated.toString();
|
||||
|
||||
// Test 4: Exact optional property types
|
||||
interface StrictTest {
|
||||
required: string;
|
||||
optional?: string;
|
||||
}
|
||||
|
||||
// This should fail with exactOptionalPropertyTypes
|
||||
const testObj: StrictTest = {
|
||||
required: "test",
|
||||
optional: undefined // undefined not assignable to optional string
|
||||
};
|
||||
|
||||
// Test 5: Array access without undefined handling
|
||||
const testArray: string[] = ["a", "b", "c"];
|
||||
const element: string = testArray[10]; // Should include undefined in type
|
||||
|
||||
// Test 6: Null access without proper checks
|
||||
let possiblyNull: string | null = Math.random() > 0.5 ? "value" : null;
|
||||
const upperCase = possiblyNull.toUpperCase(); // Should fail with strictNullChecks
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
#!/bin/bash
|
||||
# Test script to verify strict TypeScript configuration catches expected failures
|
||||
|
||||
set -e
|
||||
|
||||
echo "Testing strict TypeScript configuration..."
|
||||
|
||||
# Find TypeScript compiler in runfiles
|
||||
if [[ -n "${RUNFILES_DIR:-}" ]]; then
|
||||
TSC="${RUNFILES_DIR}/_main/databuild/dashboard/node_modules/typescript/bin/tsc"
|
||||
else
|
||||
# Fallback for local execution
|
||||
TSC="$(find . -name tsc -type f | head -1)"
|
||||
if [[ -z "$TSC" ]]; then
|
||||
echo "ERROR: Could not find TypeScript compiler"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Get paths relative to runfiles
|
||||
if [[ -n "${RUNFILES_DIR:-}" ]]; then
|
||||
TEST_DATA_DIR="${RUNFILES_DIR}/_main/databuild/dashboard/test-data"
|
||||
TSCONFIG="${RUNFILES_DIR}/_main/databuild/dashboard/tsconfig_app.json"
|
||||
else
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
TEST_DATA_DIR="$SCRIPT_DIR/test-data"
|
||||
TSCONFIG="$SCRIPT_DIR/tsconfig_app.json"
|
||||
fi
|
||||
|
||||
# Function to test that TypeScript compilation fails with expected errors
|
||||
test_compilation_failures() {
|
||||
local test_file="$1"
|
||||
local expected_errors="$2"
|
||||
|
||||
echo "Testing compilation failures for: $test_file"
|
||||
|
||||
# Run TypeScript compilation and capture output
|
||||
if node "$TSC" --noEmit --strict --strictNullChecks --noImplicitAny --noImplicitReturns --noUncheckedIndexedAccess --exactOptionalPropertyTypes "$test_file" 2>&1; then
|
||||
echo "ERROR: Expected TypeScript compilation to fail for $test_file, but it passed"
|
||||
return 1
|
||||
fi
|
||||
|
||||
# Check that we get the expected error patterns
|
||||
local tsc_output=$(node "$TSC" --noEmit --strict --strictNullChecks --noImplicitAny --noImplicitReturns --noUncheckedIndexedAccess --exactOptionalPropertyTypes "$test_file" 2>&1 || true)
|
||||
|
||||
IFS='|' read -ra ERROR_PATTERNS <<< "$expected_errors"
|
||||
for pattern in "${ERROR_PATTERNS[@]}"; do
|
||||
if ! echo "$tsc_output" | grep -q "$pattern"; then
|
||||
echo "ERROR: Expected error pattern '$pattern' not found in TypeScript output"
|
||||
echo "Actual output:"
|
||||
echo "$tsc_output"
|
||||
return 1
|
||||
fi
|
||||
done
|
||||
|
||||
echo "✓ Compilation correctly failed with expected errors"
|
||||
}
|
||||
|
||||
# Test 1: Verify strict config catches undefined property access
|
||||
test_compilation_failures "$TEST_DATA_DIR/strict-config-failures.ts" "Property 'status' does not exist|is possibly 'undefined'|Type 'undefined' is not assignable"
|
||||
|
||||
echo "All strict TypeScript configuration tests passed!"
|
||||
echo ""
|
||||
echo "Summary of what strict config catches:"
|
||||
echo "- ✓ Undefined property access (status.toLowerCase() failures)"
|
||||
echo "- ✓ Optional field access without null checks"
|
||||
echo "- ✓ Exact optional property type mismatches"
|
||||
echo "- ✓ Array access without undefined handling"
|
||||
echo "- ✓ Null/undefined access without proper checks"
|
||||
|
|
@ -1,320 +0,0 @@
|
|||
// Phase 3.5: Unit tests for transformation functions
|
||||
// These tests verify that transformation functions prevent the observed runtime failures
|
||||
|
||||
import o from 'ospec';
|
||||
import {
|
||||
BuildSummary,
|
||||
BuildDetailResponse,
|
||||
PartitionSummary,
|
||||
JobSummary,
|
||||
ActivityResponse
|
||||
} from '../client/typescript_generated/src/index';
|
||||
|
||||
// Import types directly since we're now in the same ts_project
|
||||
import {
|
||||
DashboardActivity,
|
||||
DashboardBuild,
|
||||
DashboardPartition,
|
||||
DashboardJob,
|
||||
isDashboardActivity,
|
||||
isDashboardBuild,
|
||||
isDashboardPartition,
|
||||
isDashboardJob
|
||||
} from './types';
|
||||
|
||||
// Mock transformation functions for testing (since they're not exported from services.ts)
|
||||
function transformBuildSummary(apiResponse: BuildSummary): DashboardBuild {
|
||||
return {
|
||||
build_request_id: apiResponse.build_request_id,
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
requested_partitions: apiResponse.requested_partitions, // Keep as PartitionRef array
|
||||
total_jobs: apiResponse.total_jobs,
|
||||
completed_jobs: apiResponse.completed_jobs,
|
||||
failed_jobs: apiResponse.failed_jobs,
|
||||
cancelled_jobs: apiResponse.cancelled_jobs,
|
||||
requested_at: apiResponse.requested_at,
|
||||
started_at: apiResponse.started_at ?? null,
|
||||
completed_at: apiResponse.completed_at ?? null,
|
||||
duration_ms: apiResponse.duration_ms ?? null,
|
||||
cancelled: apiResponse.cancelled,
|
||||
};
|
||||
}
|
||||
|
||||
function transformBuildDetail(apiResponse: BuildDetailResponse): DashboardBuild {
|
||||
return {
|
||||
build_request_id: apiResponse.build_request_id,
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
requested_partitions: apiResponse.requested_partitions, // Keep as PartitionRef array
|
||||
total_jobs: apiResponse.total_jobs,
|
||||
completed_jobs: apiResponse.completed_jobs,
|
||||
failed_jobs: apiResponse.failed_jobs,
|
||||
cancelled_jobs: apiResponse.cancelled_jobs,
|
||||
requested_at: apiResponse.requested_at,
|
||||
started_at: apiResponse.started_at ?? null,
|
||||
completed_at: apiResponse.completed_at ?? null,
|
||||
duration_ms: apiResponse.duration_ms ?? null,
|
||||
cancelled: apiResponse.cancelled,
|
||||
};
|
||||
}
|
||||
|
||||
function transformPartitionSummary(apiResponse: any): DashboardPartition {
|
||||
return {
|
||||
partition_ref: apiResponse.partition_ref, // Keep as PartitionRef object
|
||||
status_code: apiResponse.status_code,
|
||||
status_name: apiResponse.status_name,
|
||||
last_updated: apiResponse.last_updated ?? null,
|
||||
build_requests: apiResponse.build_requests || [],
|
||||
};
|
||||
}
|
||||
|
||||
function transformJobSummary(apiResponse: JobSummary): DashboardJob {
|
||||
return {
|
||||
job_label: apiResponse.job_label,
|
||||
total_runs: apiResponse.total_runs,
|
||||
successful_runs: apiResponse.successful_runs,
|
||||
failed_runs: apiResponse.failed_runs,
|
||||
cancelled_runs: apiResponse.cancelled_runs,
|
||||
last_run_timestamp: apiResponse.last_run_timestamp,
|
||||
last_run_status_code: apiResponse.last_run_status_code,
|
||||
last_run_status_name: apiResponse.last_run_status_name,
|
||||
average_partitions_per_run: apiResponse.average_partitions_per_run,
|
||||
recent_builds: apiResponse.recent_builds || [],
|
||||
};
|
||||
}
|
||||
|
||||
function transformActivityResponse(apiResponse: ActivityResponse): DashboardActivity {
|
||||
return {
|
||||
active_builds_count: apiResponse.active_builds_count,
|
||||
recent_builds: apiResponse.recent_builds.map(transformBuildSummary),
|
||||
recent_partitions: apiResponse.recent_partitions.map(transformPartitionSummary),
|
||||
total_partitions_count: apiResponse.total_partitions_count,
|
||||
system_status: apiResponse.system_status,
|
||||
graph_name: apiResponse.graph_name,
|
||||
};
|
||||
}
|
||||
|
||||
// Test Data Mocks
|
||||
const mockBuildSummary: BuildSummary = {
|
||||
build_request_id: 'build-123',
|
||||
status_code: 4, // BUILD_REQUEST_COMPLETED
|
||||
status_name: 'COMPLETED',
|
||||
requested_partitions: [{ str: 'partition-1' }, { str: 'partition-2' }],
|
||||
total_jobs: 5,
|
||||
completed_jobs: 5,
|
||||
failed_jobs: 0,
|
||||
cancelled_jobs: 0,
|
||||
requested_at: 1640995200000000000, // 2022-01-01 00:00:00 UTC in nanos
|
||||
started_at: 1640995260000000000, // 2022-01-01 00:01:00 UTC in nanos
|
||||
completed_at: 1640995320000000000, // 2022-01-01 00:02:00 UTC in nanos
|
||||
duration_ms: 60000, // 1 minute
|
||||
cancelled: false
|
||||
};
|
||||
|
||||
const mockPartitionSummary: any = {
|
||||
partition_ref: { str: 'test-partition' },
|
||||
status_code: 4, // PARTITION_AVAILABLE
|
||||
status_name: 'AVAILABLE',
|
||||
last_updated: 1640995200000000000,
|
||||
builds_count: 3,
|
||||
invalidation_count: 0,
|
||||
build_requests: ['build-123', 'build-124'],
|
||||
last_successful_build: 'build-123'
|
||||
};
|
||||
|
||||
const mockJobSummary: JobSummary = {
|
||||
job_label: '//:test-job',
|
||||
total_runs: 10,
|
||||
successful_runs: 9,
|
||||
failed_runs: 1,
|
||||
cancelled_runs: 0,
|
||||
average_partitions_per_run: 2.5,
|
||||
last_run_timestamp: 1640995200000000000,
|
||||
last_run_status_code: 3, // JOB_COMPLETED
|
||||
last_run_status_name: 'COMPLETED',
|
||||
recent_builds: ['build-123', 'build-124']
|
||||
};
|
||||
|
||||
const mockActivityResponse: ActivityResponse = {
|
||||
active_builds_count: 2,
|
||||
recent_builds: [mockBuildSummary],
|
||||
recent_partitions: [mockPartitionSummary],
|
||||
total_partitions_count: 100,
|
||||
system_status: 'healthy',
|
||||
graph_name: 'test-graph'
|
||||
};
|
||||
|
||||
// Test Suite
|
||||
o.spec('Transformation Functions', () => {
|
||||
o('transformBuildSummary handles status fields correctly', () => {
|
||||
const result = transformBuildSummary(mockBuildSummary);
|
||||
|
||||
// The key fix: status_name should be a string, status_code a number
|
||||
o(typeof result.status_code).equals('number');
|
||||
o(typeof result.status_name).equals('string');
|
||||
o(result.status_name).equals('COMPLETED');
|
||||
|
||||
// This should not throw (preventing the original runtime error)
|
||||
o(() => result.status_name.toLowerCase()).notThrows('status_name.toLowerCase should work');
|
||||
});
|
||||
|
||||
o('transformBuildSummary handles null optional fields', () => {
|
||||
const buildWithNulls: BuildSummary = {
|
||||
...mockBuildSummary,
|
||||
started_at: null,
|
||||
completed_at: null,
|
||||
duration_ms: null
|
||||
};
|
||||
|
||||
const result = transformBuildSummary(buildWithNulls);
|
||||
|
||||
// Explicit null handling prevents undefined property access
|
||||
o(result.started_at).equals(null);
|
||||
o(result.completed_at).equals(null);
|
||||
o(result.duration_ms).equals(null);
|
||||
});
|
||||
|
||||
o('transformPartitionSummary preserves PartitionRef objects correctly', () => {
|
||||
const result = transformPartitionSummary(mockPartitionSummary);
|
||||
|
||||
// The key fix: partition_ref should remain as PartitionRef object
|
||||
o(typeof result.partition_ref).equals('object');
|
||||
o(result.partition_ref.str).equals('test-partition');
|
||||
|
||||
// This should not throw (preventing original runtime errors)
|
||||
o(() => result.partition_ref.str.toLowerCase()).notThrows('partition_ref.str.toLowerCase should work');
|
||||
});
|
||||
|
||||
o('transformPartitionSummary handles missing arrays safely', () => {
|
||||
const partitionWithoutArray: any = {
|
||||
...mockPartitionSummary
|
||||
};
|
||||
delete partitionWithoutArray.build_requests;
|
||||
|
||||
const result = transformPartitionSummary(partitionWithoutArray);
|
||||
|
||||
// Should default to empty array, preventing length/iteration errors
|
||||
o(Array.isArray(result.build_requests)).equals(true);
|
||||
o(result.build_requests.length).equals(0);
|
||||
});
|
||||
|
||||
o('transformJobSummary handles status fields correctly', () => {
|
||||
const result = transformJobSummary(mockJobSummary);
|
||||
|
||||
// The key fix: both status code and name should be preserved
|
||||
o(typeof result.last_run_status_code).equals('number');
|
||||
o(typeof result.last_run_status_name).equals('string');
|
||||
o(result.last_run_status_name).equals('COMPLETED');
|
||||
|
||||
// This should not throw
|
||||
o(() => result.last_run_status_name.toLowerCase()).notThrows('last_run_status_name.toLowerCase should work');
|
||||
});
|
||||
|
||||
o('transformActivityResponse maintains structure consistency', () => {
|
||||
const result = transformActivityResponse(mockActivityResponse);
|
||||
|
||||
// Should pass our type guard
|
||||
o(isDashboardActivity(result)).equals(true);
|
||||
|
||||
// All nested objects should be properly transformed
|
||||
o(result.recent_builds.length).equals(1);
|
||||
o(typeof result.recent_builds[0]?.status_name).equals('string');
|
||||
|
||||
o(result.recent_partitions.length).equals(1);
|
||||
o(typeof result.recent_partitions[0]?.partition_ref).equals('object');
|
||||
o(typeof result.recent_partitions[0]?.partition_ref.str).equals('string');
|
||||
});
|
||||
|
||||
o('transformations prevent original runtime failures', () => {
|
||||
const result = transformActivityResponse(mockActivityResponse);
|
||||
|
||||
// These are the exact patterns that caused runtime failures:
|
||||
|
||||
// 1. status_name.toLowerCase() - should not crash
|
||||
result.recent_builds.forEach((build: DashboardBuild) => {
|
||||
o(() => build.status_name.toLowerCase()).notThrows('build.status_name.toLowerCase should work');
|
||||
o(build.status_name.toLowerCase()).equals('completed');
|
||||
});
|
||||
|
||||
// 2. partition_ref.str access - should access string property
|
||||
result.recent_partitions.forEach((partition: DashboardPartition) => {
|
||||
o(typeof partition.partition_ref).equals('object');
|
||||
o(typeof partition.partition_ref.str).equals('string');
|
||||
o(() => partition.partition_ref.str.toLowerCase()).notThrows('partition.partition_ref.str.toLowerCase should work');
|
||||
});
|
||||
|
||||
// 3. Null/undefined handling - should be explicit
|
||||
result.recent_builds.forEach((build: DashboardBuild) => {
|
||||
// These fields can be null but never undefined
|
||||
o(build.started_at === null || typeof build.started_at === 'number').equals(true);
|
||||
o(build.completed_at === null || typeof build.completed_at === 'number').equals(true);
|
||||
o(build.duration_ms === null || typeof build.duration_ms === 'number').equals(true);
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
// Edge Cases and Error Conditions
|
||||
o.spec('Transformation Edge Cases', () => {
|
||||
o('handles empty arrays correctly', () => {
|
||||
const emptyActivity: ActivityResponse = {
|
||||
...mockActivityResponse,
|
||||
recent_builds: [],
|
||||
recent_partitions: []
|
||||
};
|
||||
|
||||
const result = transformActivityResponse(emptyActivity);
|
||||
|
||||
o(Array.isArray(result.recent_builds)).equals(true);
|
||||
o(result.recent_builds.length).equals(0);
|
||||
o(Array.isArray(result.recent_partitions)).equals(true);
|
||||
o(result.recent_partitions.length).equals(0);
|
||||
});
|
||||
|
||||
o('handles malformed PartitionRef gracefully', () => {
|
||||
const malformedPartition: any = {
|
||||
...mockPartitionSummary,
|
||||
partition_ref: { str: '' } // Empty string
|
||||
};
|
||||
|
||||
const result = transformPartitionSummary(malformedPartition);
|
||||
|
||||
o(typeof result.partition_ref.str).equals('string');
|
||||
o(result.partition_ref.str).equals('');
|
||||
});
|
||||
|
||||
o('transformations produce valid dashboard types', () => {
|
||||
// Test that all transformation results pass type guards
|
||||
const transformedBuild = transformBuildSummary(mockBuildSummary);
|
||||
const transformedPartition = transformPartitionSummary(mockPartitionSummary);
|
||||
const transformedJob = transformJobSummary(mockJobSummary);
|
||||
const transformedActivity = transformActivityResponse(mockActivityResponse);
|
||||
|
||||
o(isDashboardBuild(transformedBuild)).equals(true);
|
||||
o(isDashboardPartition(transformedPartition)).equals(true);
|
||||
o(isDashboardJob(transformedJob)).equals(true);
|
||||
o(isDashboardActivity(transformedActivity)).equals(true);
|
||||
});
|
||||
});
|
||||
|
||||
// Performance and Memory Tests
|
||||
o.spec('Transformation Performance', () => {
|
||||
o('transforms large datasets efficiently', () => {
|
||||
const largeActivity: ActivityResponse = {
|
||||
...mockActivityResponse,
|
||||
recent_builds: Array(1000).fill(mockBuildSummary),
|
||||
recent_partitions: Array(1000).fill(mockPartitionSummary)
|
||||
};
|
||||
|
||||
const start = Date.now();
|
||||
const result = transformActivityResponse(largeActivity);
|
||||
const duration = Date.now() - start;
|
||||
|
||||
// Should complete transformation in reasonable time
|
||||
o(duration < 1000).equals(true); // Less than 1 second
|
||||
o(result.recent_builds.length).equals(1000);
|
||||
o(result.recent_partitions.length).equals(1000);
|
||||
});
|
||||
});
|
||||
|
||||
// Export default removed - tests are run by importing this file
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
{
|
||||
"compilerOptions": {
|
||||
"target": "es2016", /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
|
||||
"lib": ["es6","dom", "es2021"], /* Specify a set of bundled library declaration files that describe the target runtime environment. */
|
||||
"module": "commonjs", /* Specify what module code is generated. */
|
||||
"rootDir": "./", /* Specify the root folder within your source files. */
|
||||
"moduleResolution": "node", /* Specify how TypeScript looks up a file from a given module specifier. */
|
||||
"resolveJsonModule": true, /* Enable importing .json files. */
|
||||
"allowJs": true, /* Allow JavaScript files to be a part of your program. Use the 'checkJS' option to get errors from these files. */
|
||||
"inlineSourceMap": true, /* Include sourcemap files inside the emitted JavaScript. */
|
||||
"esModuleInterop": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
|
||||
"forceConsistentCasingInFileNames": true, /* Ensure that casing is correct in imports. */
|
||||
"strict": true, /* Enable all strict type-checking options. */
|
||||
"noImplicitAny": true, /* Enable error reporting for expressions and declarations with an implied 'any' type. */
|
||||
"strictNullChecks": true, /* Enable error reporting for null and undefined values. */
|
||||
"noImplicitReturns": true, /* Enable error reporting for codepaths that do not explicitly return. */
|
||||
"noUncheckedIndexedAccess": true, /* Add 'undefined' to index signature results. */
|
||||
"exactOptionalPropertyTypes": true, /* Ensure optional property types are exact. */
|
||||
"skipLibCheck": true /* Skip type checking all .d.ts files. */
|
||||
}
|
||||
}
|
||||
|
|
@ -1,22 +0,0 @@
|
|||
{
|
||||
"compilerOptions": {
|
||||
"target": "es2016", /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
|
||||
"lib": ["es6","dom"], /* Specify a set of bundled library declaration files that describe the target runtime environment. */
|
||||
"module": "commonjs", /* Specify what module code is generated. */
|
||||
"rootDir": "./", /* Specify the root folder within your source files. */
|
||||
"moduleResolution": "node", /* Specify how TypeScript looks up a file from a given module specifier. */
|
||||
"baseUrl": "./", /* Specify the base directory to resolve non-relative module names. */
|
||||
"resolveJsonModule": true, /* Enable importing .json files. */
|
||||
"allowJs": true, /* Allow JavaScript files to be a part of your program. Use the 'checkJS' option to get errors from these files. */
|
||||
"inlineSourceMap": true, /* Include sourcemap files inside the emitted JavaScript. */
|
||||
"esModuleInterop": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
|
||||
"forceConsistentCasingInFileNames": true, /* Ensure that casing is correct in imports. */
|
||||
"strict": true, /* Enable all strict type-checking options. */
|
||||
"noImplicitAny": true, /* Enable error reporting for expressions and declarations with an implied 'any' type. */
|
||||
"strictNullChecks": true, /* Enable error reporting for null and undefined values. */
|
||||
"noImplicitReturns": true, /* Enable error reporting for codepaths that do not explicitly return. */
|
||||
"noUncheckedIndexedAccess": true, /* Add 'undefined' to index signature results. */
|
||||
"exactOptionalPropertyTypes": true, /* Ensure optional property types are exact. */
|
||||
"skipLibCheck": true /* Skip type checking all .d.ts files. */
|
||||
}
|
||||
}
|
||||
|
|
@ -1,287 +0,0 @@
|
|||
import m from 'mithril';
|
||||
import {
|
||||
ActivityResponse,
|
||||
ActivityApiResponse,
|
||||
BuildSummary,
|
||||
BuildDetailResponse,
|
||||
PartitionSummary,
|
||||
PartitionDetailResponse,
|
||||
PartitionEventsResponse,
|
||||
JobSummary,
|
||||
JobMetricsResponse,
|
||||
JobDailyStats,
|
||||
JobRunSummary,
|
||||
PartitionRef
|
||||
} from '../client/typescript_generated/src/index';
|
||||
|
||||
// Dashboard-optimized types - canonical frontend types independent of backend schema
|
||||
// These types prevent runtime errors by ensuring consistent data shapes throughout components
|
||||
|
||||
export interface DashboardBuild {
|
||||
build_request_id: string;
|
||||
status_code: number;
|
||||
status_name: string;
|
||||
requested_partitions: PartitionRef[];
|
||||
total_jobs: number;
|
||||
completed_jobs: number;
|
||||
failed_jobs: number;
|
||||
cancelled_jobs: number;
|
||||
requested_at: number;
|
||||
started_at: number | null;
|
||||
completed_at: number | null;
|
||||
duration_ms: number | null;
|
||||
cancelled: boolean;
|
||||
}
|
||||
|
||||
export interface DashboardPartition {
|
||||
partition_ref: PartitionRef;
|
||||
status_code: number;
|
||||
status_name: string;
|
||||
last_updated: number | null;
|
||||
build_requests: string[];
|
||||
}
|
||||
|
||||
export interface DashboardJob {
|
||||
job_label: string;
|
||||
total_runs: number;
|
||||
successful_runs: number;
|
||||
failed_runs: number;
|
||||
cancelled_runs: number;
|
||||
last_run_timestamp: number;
|
||||
last_run_status_code: number;
|
||||
last_run_status_name: string;
|
||||
average_partitions_per_run: number;
|
||||
recent_builds: string[];
|
||||
}
|
||||
|
||||
export interface DashboardActivity {
|
||||
active_builds_count: number;
|
||||
recent_builds: DashboardBuild[];
|
||||
recent_partitions: DashboardPartition[];
|
||||
total_partitions_count: number;
|
||||
system_status: string;
|
||||
graph_name: string;
|
||||
}
|
||||
|
||||
// Dashboard timeline event types for consistent UI handling
|
||||
export interface DashboardBuildTimelineEvent {
|
||||
timestamp: number;
|
||||
status_code: number;
|
||||
status_name: string;
|
||||
message: string;
|
||||
event_type: string;
|
||||
cancel_reason?: string;
|
||||
}
|
||||
|
||||
export interface DashboardPartitionTimelineEvent {
|
||||
timestamp: number;
|
||||
status_code: number;
|
||||
status_name: string;
|
||||
message: string;
|
||||
build_request_id: string;
|
||||
job_run_id?: string;
|
||||
}
|
||||
|
||||
// Generic typed component interface that extends Mithril's component
|
||||
// Uses intersection type to allow arbitrary properties while ensuring type safety for lifecycle methods
|
||||
export interface TypedComponent<TAttrs = {}> extends Record<string, any> {
|
||||
oninit?(vnode: m.Vnode<TAttrs>): void;
|
||||
oncreate?(vnode: m.VnodeDOM<TAttrs>): void;
|
||||
onupdate?(vnode: m.VnodeDOM<TAttrs>): void;
|
||||
onbeforeremove?(vnode: m.VnodeDOM<TAttrs>): Promise<any> | void;
|
||||
onremove?(vnode: m.VnodeDOM<TAttrs>): void;
|
||||
onbeforeupdate?(vnode: m.Vnode<TAttrs>, old: m.VnodeDOM<TAttrs>): boolean | void;
|
||||
view(vnode: m.Vnode<TAttrs>): m.Children;
|
||||
}
|
||||
|
||||
// Helper type for typed vnodes
|
||||
export type TypedVnode<TAttrs = {}> = m.Vnode<TAttrs>;
|
||||
export type TypedVnodeDOM<TAttrs = {}> = m.VnodeDOM<TAttrs>;
|
||||
|
||||
// Route parameter types
|
||||
export interface RouteParams {
|
||||
[key: string]: string;
|
||||
}
|
||||
|
||||
export interface BuildRouteParams extends RouteParams {
|
||||
id: string;
|
||||
}
|
||||
|
||||
export interface PartitionRouteParams extends RouteParams {
|
||||
base64_ref: string;
|
||||
}
|
||||
|
||||
export interface JobRouteParams extends RouteParams {
|
||||
label: string;
|
||||
}
|
||||
|
||||
// Component attribute interfaces that reference OpenAPI types
|
||||
|
||||
export interface RecentActivityAttrs {
|
||||
// No external attrs needed - component manages its own data loading
|
||||
}
|
||||
|
||||
export interface BuildStatusAttrs {
|
||||
id: string;
|
||||
}
|
||||
|
||||
export interface PartitionStatusAttrs {
|
||||
base64_ref: string;
|
||||
}
|
||||
|
||||
export interface PartitionsListAttrs {
|
||||
// No external attrs needed - component manages its own data loading
|
||||
}
|
||||
|
||||
export interface JobsListAttrs {
|
||||
// No external attrs needed - component manages its own data loading
|
||||
}
|
||||
|
||||
export interface JobMetricsAttrs {
|
||||
label: string;
|
||||
}
|
||||
|
||||
export interface GraphAnalysisAttrs {
|
||||
// No external attrs needed for now
|
||||
}
|
||||
|
||||
// Badge component attribute interfaces with OpenAPI type constraints
|
||||
|
||||
export interface BuildStatusBadgeAttrs {
|
||||
status: string; // This should be constrained to BuildSummary status values
|
||||
size?: 'xs' | 'sm' | 'md' | 'lg';
|
||||
class?: string;
|
||||
}
|
||||
|
||||
export interface PartitionStatusBadgeAttrs {
|
||||
status: string; // This should be constrained to PartitionSummary status values
|
||||
size?: 'xs' | 'sm' | 'md' | 'lg';
|
||||
class?: string;
|
||||
}
|
||||
|
||||
export interface EventTypeBadgeAttrs {
|
||||
eventType: string; // This should be constrained to known event types
|
||||
size?: 'xs' | 'sm' | 'md' | 'lg';
|
||||
class?: string;
|
||||
}
|
||||
|
||||
// Layout wrapper attributes
|
||||
export interface LayoutWrapperAttrs {
|
||||
// Layout wrapper will pass through attributes to wrapped component
|
||||
[key: string]: any;
|
||||
}
|
||||
|
||||
// Data types for component state (using Dashboard types for consistency)
|
||||
export interface RecentActivityData {
|
||||
data: DashboardActivity | null;
|
||||
loading: boolean;
|
||||
error: string | null;
|
||||
}
|
||||
|
||||
export interface BuildStatusData {
|
||||
data: DashboardBuild | null;
|
||||
partitionStatuses: Map<string, DashboardPartition>; // Key is partition_ref.str
|
||||
timeline: DashboardBuildTimelineEvent[];
|
||||
loading: boolean;
|
||||
error: string | null;
|
||||
buildId: string;
|
||||
}
|
||||
|
||||
export interface PartitionStatusData {
|
||||
data: DashboardPartition | null;
|
||||
timeline: DashboardPartitionTimelineEvent[];
|
||||
loading: boolean;
|
||||
error: string | null;
|
||||
partitionRef: string;
|
||||
buildHistory: DashboardBuild[];
|
||||
}
|
||||
|
||||
export interface JobsListData {
|
||||
jobs: DashboardJob[];
|
||||
searchTerm: string;
|
||||
loading: boolean;
|
||||
error: string | null;
|
||||
searchTimeout: NodeJS.Timeout | null;
|
||||
}
|
||||
|
||||
export interface JobMetricsData {
|
||||
jobLabel: string;
|
||||
job: DashboardJob | null;
|
||||
loading: boolean;
|
||||
error: string | null;
|
||||
}
|
||||
|
||||
// Utility type for creating typed components
|
||||
export type CreateTypedComponent<TAttrs> = TypedComponent<TAttrs>;
|
||||
|
||||
/*
|
||||
## Dashboard Type Transformation Rationale
|
||||
|
||||
The dashboard types provide a stable interface between the OpenAPI-generated types and UI components:
|
||||
|
||||
1. **Explicit Null Handling**: Protobuf optional fields become `T | null` instead of `T | undefined`
|
||||
to ensure consistent null checking throughout the application.
|
||||
|
||||
2. **Type Safety**: Keep protobuf structure (PartitionRef objects, status codes) to maintain
|
||||
type safety from backend to frontend. Only convert to display strings in components.
|
||||
|
||||
3. **Clear Boundaries**: Dashboard types are the contract between services and components.
|
||||
Services handle API responses, components handle presentation.
|
||||
|
||||
Key principles:
|
||||
- Preserve protobuf structure for type safety
|
||||
- Explicit null handling for optional fields
|
||||
- Convert to display strings only at the UI layer
|
||||
- Consistent types prevent runtime errors
|
||||
*/
|
||||
|
||||
// Type guards and validators for Dashboard types
|
||||
export function isDashboardActivity(data: any): data is DashboardActivity {
|
||||
return data &&
|
||||
typeof data.active_builds_count === 'number' &&
|
||||
typeof data.graph_name === 'string' &&
|
||||
Array.isArray(data.recent_builds) &&
|
||||
Array.isArray(data.recent_partitions) &&
|
||||
typeof data.system_status === 'string' &&
|
||||
typeof data.total_partitions_count === 'number';
|
||||
}
|
||||
|
||||
export function isDashboardBuild(data: any): data is DashboardBuild {
|
||||
return data &&
|
||||
typeof data.build_request_id === 'string' &&
|
||||
typeof data.status_code === 'number' &&
|
||||
typeof data.status_name === 'string' &&
|
||||
typeof data.requested_at === 'number' &&
|
||||
Array.isArray(data.requested_partitions);
|
||||
}
|
||||
|
||||
export function isDashboardPartition(data: any): data is DashboardPartition {
|
||||
return data &&
|
||||
data.partition_ref &&
|
||||
typeof data.partition_ref.str === 'string' &&
|
||||
typeof data.status_code === 'number' &&
|
||||
typeof data.status_name === 'string' &&
|
||||
(data.last_updated === null || typeof data.last_updated === 'number') &&
|
||||
Array.isArray(data.build_requests);
|
||||
}
|
||||
|
||||
export function isDashboardJob(data: any): data is DashboardJob {
|
||||
return data &&
|
||||
typeof data.job_label === 'string' &&
|
||||
typeof data.total_runs === 'number' &&
|
||||
typeof data.last_run_status_code === 'number' &&
|
||||
typeof data.last_run_status_name === 'string' &&
|
||||
Array.isArray(data.recent_builds);
|
||||
}
|
||||
|
||||
// Helper function to create type-safe Mithril components
|
||||
export function createTypedComponent<TAttrs>(
|
||||
component: TypedComponent<TAttrs>
|
||||
): m.Component<TAttrs> {
|
||||
return component as m.Component<TAttrs>;
|
||||
}
|
||||
|
||||
// Helper for type-safe route handling
|
||||
export function getTypedRouteParams<T extends RouteParams>(vnode: m.Vnode<T>): T {
|
||||
return vnode.attrs;
|
||||
}
|
||||
|
|
@ -1,52 +0,0 @@
|
|||
import o from 'ospec';
|
||||
|
||||
// Inline the utils functions for testing since we can't import from the app module in tests
|
||||
function encodePartitionRef(ref: string): string {
|
||||
return btoa(ref).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
|
||||
}
|
||||
|
||||
function decodePartitionRef(encoded: string): string {
|
||||
// Add padding if needed
|
||||
const padding = '='.repeat((4 - (encoded.length % 4)) % 4);
|
||||
const padded = encoded.replace(/-/g, '+').replace(/_/g, '/') + padding;
|
||||
return atob(padded);
|
||||
}
|
||||
|
||||
o.spec('URL Encoding Utils', () => {
|
||||
o('should encode and decode partition references correctly', () => {
|
||||
const testCases = [
|
||||
'simple/partition',
|
||||
'complex/partition/with/slashes',
|
||||
'partition+with+plus',
|
||||
'partition=with=equals',
|
||||
'partition_with_underscores',
|
||||
'partition-with-dashes',
|
||||
'partition/with/mixed+symbols=test_case-123',
|
||||
];
|
||||
|
||||
testCases.forEach(original => {
|
||||
const encoded = encodePartitionRef(original);
|
||||
const decoded = decodePartitionRef(encoded);
|
||||
|
||||
o(decoded).equals(original)(`Failed for: ${original}`);
|
||||
|
||||
// Encoded string should be URL-safe (no +, /, or = characters)
|
||||
o(encoded.includes('+')).equals(false)(`Encoded string contains +: ${encoded}`);
|
||||
o(encoded.includes('/')).equals(false)(`Encoded string contains /: ${encoded}`);
|
||||
o(encoded.includes('=')).equals(false)(`Encoded string contains =: ${encoded}`);
|
||||
});
|
||||
});
|
||||
|
||||
o('should handle empty string', () => {
|
||||
const encoded = encodePartitionRef('');
|
||||
const decoded = decodePartitionRef(encoded);
|
||||
o(decoded).equals('');
|
||||
});
|
||||
|
||||
o('should handle special characters', () => {
|
||||
const special = 'test/path?query=value&other=123#fragment';
|
||||
const encoded = encodePartitionRef(special);
|
||||
const decoded = decodePartitionRef(encoded);
|
||||
o(decoded).equals(special);
|
||||
});
|
||||
});
|
||||
|
|
@ -1,108 +0,0 @@
|
|||
// URL encoding utilities for partition references
|
||||
export function encodePartitionRef(ref: string): string {
|
||||
return btoa(ref).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
|
||||
}
|
||||
|
||||
export function decodePartitionRef(encoded: string): string {
|
||||
// Add padding if needed
|
||||
const padding = '='.repeat((4 - (encoded.length % 4)) % 4);
|
||||
const padded = encoded.replace(/-/g, '+').replace(/_/g, '/') + padding;
|
||||
return atob(padded);
|
||||
}
|
||||
|
||||
// Job label encoding utilities (same pattern as partition refs)
|
||||
export function encodeJobLabel(label: string): string {
|
||||
return btoa(label).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
|
||||
}
|
||||
|
||||
export function decodeJobLabel(encoded: string): string {
|
||||
// Add padding if needed
|
||||
const padding = '='.repeat((4 - (encoded.length % 4)) % 4);
|
||||
const padded = encoded.replace(/-/g, '+').replace(/_/g, '/') + padding;
|
||||
return atob(padded);
|
||||
}
|
||||
|
||||
import m from 'mithril';
|
||||
import {
|
||||
TypedComponent,
|
||||
BuildStatusBadgeAttrs,
|
||||
PartitionStatusBadgeAttrs,
|
||||
EventTypeBadgeAttrs,
|
||||
createTypedComponent
|
||||
} from './types';
|
||||
|
||||
// Mithril components for status badges - encapsulates both logic and presentation
|
||||
|
||||
export const BuildStatusBadge: TypedComponent<BuildStatusBadgeAttrs> = {
|
||||
view(vnode: m.Vnode<BuildStatusBadgeAttrs>) {
|
||||
const { status, size = 'sm', class: className, ...attrs } = vnode.attrs;
|
||||
const normalizedStatus = status.toLowerCase();
|
||||
|
||||
let badgeClass = 'badge-neutral';
|
||||
if (normalizedStatus.includes('completed')) {
|
||||
badgeClass = 'badge-success';
|
||||
} else if (normalizedStatus.includes('executing') || normalizedStatus.includes('planning')) {
|
||||
badgeClass = 'badge-warning';
|
||||
} else if (normalizedStatus.includes('received')) {
|
||||
badgeClass = 'badge-info';
|
||||
} else if (normalizedStatus.includes('failed') || normalizedStatus.includes('cancelled')) {
|
||||
badgeClass = 'badge-error';
|
||||
}
|
||||
|
||||
return m(`span.badge.badge-${size}.${badgeClass}`, { class: className, ...attrs }, status);
|
||||
}
|
||||
};
|
||||
|
||||
export const PartitionStatusBadge: TypedComponent<PartitionStatusBadgeAttrs> = {
|
||||
view(vnode: m.Vnode<PartitionStatusBadgeAttrs>) {
|
||||
const { status, size = 'sm', class: className, ...attrs } = vnode.attrs;
|
||||
if (!status) {
|
||||
return m(`span.badge.badge-${size}.badge-neutral`, { class: className, ...attrs }, 'Unknown');
|
||||
}
|
||||
|
||||
const normalizedStatus = status.toLowerCase();
|
||||
let badgeClass = 'badge-neutral';
|
||||
|
||||
if (normalizedStatus.includes('available')) {
|
||||
badgeClass = 'badge-success';
|
||||
} else if (normalizedStatus.includes('building') || normalizedStatus.includes('analyzed')) {
|
||||
badgeClass = 'badge-warning';
|
||||
} else if (normalizedStatus.includes('requested') || normalizedStatus.includes('delegated')) {
|
||||
badgeClass = 'badge-info';
|
||||
} else if (normalizedStatus.includes('failed')) {
|
||||
badgeClass = 'badge-error';
|
||||
}
|
||||
|
||||
return m(`span.badge.badge-${size}.${badgeClass}`, { class: className, ...attrs }, status);
|
||||
}
|
||||
};
|
||||
|
||||
export const EventTypeBadge: TypedComponent<EventTypeBadgeAttrs> = {
|
||||
view(vnode: m.Vnode<EventTypeBadgeAttrs>) {
|
||||
const { eventType, size = 'sm', class: className, ...attrs } = vnode.attrs;
|
||||
|
||||
let badgeClass = 'badge-ghost';
|
||||
let displayName = eventType;
|
||||
|
||||
switch (eventType) {
|
||||
case 'build_request':
|
||||
badgeClass = 'badge-primary';
|
||||
displayName = 'Build';
|
||||
break;
|
||||
case 'job':
|
||||
badgeClass = 'badge-secondary';
|
||||
displayName = 'Job';
|
||||
break;
|
||||
case 'partition':
|
||||
badgeClass = 'badge-accent';
|
||||
displayName = 'Partition';
|
||||
break;
|
||||
case 'delegation':
|
||||
badgeClass = 'badge-info';
|
||||
displayName = 'Delegation';
|
||||
break;
|
||||
}
|
||||
|
||||
return m(`span.badge.badge-${size}.${badgeClass}`, { class: className, ...attrs }, displayName);
|
||||
}
|
||||
};
|
||||
209
databuild/data_deps.rs
Normal file
|
|
@ -0,0 +1,209 @@
|
|||
use crate::data_build_event::Event;
|
||||
use crate::want_create_event_v1::Lifetime;
|
||||
use crate::{
|
||||
EphemeralLifetime, JobRunMissingDeps, JobRunReadDeps, MissingDeps, ReadDeps, WantCreateEventV1,
|
||||
};
|
||||
use uuid::Uuid;
|
||||
|
||||
// TODO - how do we version this?
|
||||
pub const DATABUILD_MISSING_DEPS_JSON: &str = "DATABUILD_MISSING_DEPS_JSON:";
|
||||
pub const DATABUILD_DEP_READ_JSON: &str = "DATABUILD_DEP_READ_JSON:";
|
||||
|
||||
pub enum DataDepLogLine {
|
||||
DepMiss(JobRunMissingDeps),
|
||||
DepRead(JobRunReadDeps),
|
||||
}
|
||||
|
||||
impl From<DataDepLogLine> for String {
|
||||
fn from(value: DataDepLogLine) -> Self {
|
||||
match value {
|
||||
DataDepLogLine::DepMiss(dm) => {
|
||||
format!(
|
||||
"{}{}",
|
||||
DATABUILD_MISSING_DEPS_JSON,
|
||||
serde_json::to_string(&dm).expect("json serialize")
|
||||
)
|
||||
}
|
||||
DataDepLogLine::DepRead(dr) => {
|
||||
format!(
|
||||
"{}{}",
|
||||
DATABUILD_DEP_READ_JSON,
|
||||
serde_json::to_string(&dr).expect("json serialize")
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
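// A minimal, hypothetical sketch of the job-side half of this protocol: a job
// wrapper could serialize a dep miss through the `From<DataDepLogLine> for String`
// impl above and print it to stdout for the orchestrator to pick up. The
// partition refs and function name are placeholders, not part of this module.
fn emit_dep_miss_example() {
    let line: String = DataDepLogLine::DepMiss(JobRunMissingDeps {
        version: "1".into(),
        missing_deps: vec![MissingDeps {
            impacted: vec!["output/p1".into()],
            missing: vec!["input/p1".into()],
        }],
    })
    .into();
    // The orchestrator strips the DATABUILD_MISSING_DEPS_JSON: prefix and parses the JSON payload.
    println!("{line}");
}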
|
||||
|
||||
#[derive(Default, Debug)]
|
||||
pub struct JobRunDataDepResults {
|
||||
pub reads: Vec<ReadDeps>,
|
||||
pub misses: Vec<MissingDeps>,
|
||||
}
|
||||
|
||||
impl JobRunDataDepResults {
|
||||
pub fn with(mut self, dep_log_line: DataDepLogLine) -> Self {
|
||||
match dep_log_line {
|
||||
DataDepLogLine::DepMiss(dm) => self.misses.extend(dm.missing_deps),
|
||||
DataDepLogLine::DepRead(rd) => self.reads.extend(rd.read_deps),
|
||||
}
|
||||
self
|
||||
}
|
||||
|
||||
pub fn with_lines(mut self, lines: Vec<String>) -> Self {
|
||||
lines
|
||||
.iter()
|
||||
.flat_map(|line| parse_log_line(line))
|
||||
.fold(self, |agg, it| agg.with(it))
|
||||
}
|
||||
}
|
||||
|
||||
impl Into<JobRunDataDepResults> for Vec<String> {
|
||||
fn into(self) -> JobRunDataDepResults {
|
||||
JobRunDataDepResults::default().with_lines(self)
|
||||
}
|
||||
}
|
||||
|
||||
pub fn parse_log_line(line: &str) -> Option<DataDepLogLine> {
|
||||
if let Some(message) = line_matches(line, DATABUILD_MISSING_DEPS_JSON) {
|
||||
serde_json::from_str(message)
|
||||
.ok()
|
||||
.map(|dm| DataDepLogLine::DepMiss(dm))
|
||||
} else if let Some(message) = line_matches(line, DATABUILD_DEP_READ_JSON) {
|
||||
serde_json::from_str(message)
|
||||
.ok()
|
||||
.map(|dm| DataDepLogLine::DepRead(dm))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
fn line_matches<'a>(line: &'a str, prefix: &'a str) -> Option<&'a str> {
|
||||
line.trim().strip_prefix(prefix)
|
||||
}
|
||||
|
||||
/// Create ephemeral want events from missing dependencies.
|
||||
/// Ephemeral wants are derivative wants created by the system when a job hits a dep-miss.
|
||||
/// They delegate freshness decisions to their originating want.
|
||||
pub fn missing_deps_to_want_events(missing_deps: Vec<MissingDeps>, job_run_id: &str) -> Vec<Event> {
|
||||
missing_deps
|
||||
.iter()
|
||||
.map(|md| {
|
||||
Event::WantCreateV1(WantCreateEventV1 {
|
||||
want_id: Uuid::new_v4().into(),
|
||||
partitions: md.missing.clone(),
|
||||
lifetime: Some(Lifetime::Ephemeral(EphemeralLifetime {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
})),
|
||||
comment: Some("Missing data".to_string()),
|
||||
})
|
||||
})
|
||||
.collect()
|
||||
}
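// A minimal usage sketch tying the pieces above together: given the captured
// stdout of a finished job run, accumulate its dep records with `with_lines` and
// turn every dep miss into an ephemeral want. `stdout_lines` and the function
// name are illustrative assumptions, not part of the orchestrator API.
fn wants_from_job_stdout(stdout_lines: Vec<String>, job_run_id: &str) -> Vec<Event> {
    let results = JobRunDataDepResults::default().with_lines(stdout_lines);
    missing_deps_to_want_events(results.misses, job_run_id)
}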
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_parse_missing_deps_with_1_to_1_and_1_to_n() {
|
||||
let log_line = r#"DATABUILD_MISSING_DEPS_JSON:{"version":"1","missing_deps":[{"impacted":[{"ref":"output/p1"}],"missing":[{"ref":"input/p1"}]},{"impacted":[{"ref":"output/p2"},{"ref":"output/p3"}],"missing":[{"ref":"input/p2"}]}]}"#.to_string();
|
||||
|
||||
let result = parse_log_line(&log_line);
|
||||
assert!(result.is_some());
|
||||
|
||||
let missing_deps = match result.unwrap() {
|
||||
DataDepLogLine::DepMiss(md) => md,
|
||||
_ => panic!("expected dep miss log line"),
|
||||
};
|
||||
assert_eq!(missing_deps.missing_deps.len(), 2);
|
||||
|
||||
// First entry: 1:1 (one missing input -> one impacted output)
|
||||
assert_eq!(missing_deps.missing_deps[0].impacted.len(), 1);
|
||||
assert_eq!(missing_deps.missing_deps[0].impacted[0].r#ref, "output/p1");
|
||||
assert_eq!(missing_deps.missing_deps[0].missing.len(), 1);
|
||||
assert_eq!(missing_deps.missing_deps[0].missing[0].r#ref, "input/p1");
|
||||
|
||||
// Second entry: 1:N (one missing input -> multiple impacted outputs)
|
||||
assert_eq!(missing_deps.missing_deps[1].impacted.len(), 2);
|
||||
assert_eq!(missing_deps.missing_deps[1].impacted[0].r#ref, "output/p2");
|
||||
assert_eq!(missing_deps.missing_deps[1].impacted[1].r#ref, "output/p3");
|
||||
assert_eq!(missing_deps.missing_deps[1].missing.len(), 1);
|
||||
assert_eq!(missing_deps.missing_deps[1].missing[0].r#ref, "input/p2");
|
||||
}
|
||||
|
||||
/// We can accumulate dep miss and read events
|
||||
#[test]
|
||||
fn test_accumulate_dep_parse_and_miss() {
|
||||
// Given
|
||||
let r = JobRunDataDepResults::default();
|
||||
assert_eq!(r.misses.len(), 0);
|
||||
assert_eq!(r.reads.len(), 0);
|
||||
|
||||
// When
|
||||
let r = r
|
||||
.with(DataDepLogLine::DepRead(JobRunReadDeps {
|
||||
version: "1".into(),
|
||||
read_deps: vec![ReadDeps {
|
||||
impacted: vec!["output/p1".into()],
|
||||
read: vec!["input/p1".into()],
|
||||
}],
|
||||
}))
|
||||
.with(DataDepLogLine::DepRead(JobRunReadDeps {
|
||||
version: "1".into(),
|
||||
read_deps: vec![ReadDeps {
|
||||
impacted: vec!["output/p2".into()],
|
||||
read: vec!["input/p2".into(), "input/p2".into()],
|
||||
}],
|
||||
}))
|
||||
.with(DataDepLogLine::DepMiss(JobRunMissingDeps {
|
||||
version: "1".into(),
|
||||
missing_deps: vec![MissingDeps {
|
||||
impacted: vec!["output/p3".into()],
|
||||
missing: vec!["input/p3".into()],
|
||||
}],
|
||||
}));
|
||||
}
|
||||
|
||||
/// It's acceptable to print separately for each missing dep
|
||||
#[test]
|
||||
fn test_parse_multiple_missing_deps() {
|
||||
// Given
|
||||
let r = JobRunDataDepResults::default();
|
||||
let stdout_lines: Vec<String> = vec![
|
||||
"something".into(),
|
||||
DataDepLogLine::DepRead(JobRunReadDeps {
|
||||
version: "1".into(),
|
||||
read_deps: vec![ReadDeps {
|
||||
impacted: vec!["output/p1".into()],
|
||||
read: vec!["input/p1".into()],
|
||||
}],
|
||||
})
|
||||
.into(),
|
||||
DataDepLogLine::DepRead(JobRunReadDeps {
|
||||
version: "1".into(),
|
||||
read_deps: vec![ReadDeps {
|
||||
impacted: vec!["output/p2".into()],
|
||||
read: vec!["input/p2".into()],
|
||||
}],
|
||||
})
|
||||
.into(),
|
||||
"something else".into(),
|
||||
DataDepLogLine::DepMiss(JobRunMissingDeps {
|
||||
version: "1".into(),
|
||||
missing_deps: vec![MissingDeps {
|
||||
impacted: vec!["output/p3".into()],
|
||||
missing: vec!["input/p3".into()],
|
||||
}],
|
||||
})
|
||||
.into(),
|
||||
];
|
||||
|
||||
// When
|
||||
let results = r.with_lines(stdout_lines);
|
||||
|
||||
// Should
|
||||
assert_eq!(results.misses.len(), 1);
|
||||
assert_eq!(results.reads.len(), 2);
|
||||
}
|
||||
}
|
||||
File diff suppressed because it is too large
|
|
@ -1,29 +0,0 @@
|
|||
py_library(
|
||||
name = "dsl",
|
||||
srcs = ["dsl.py"],
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
"//databuild:py_proto",
|
||||
],
|
||||
)
|
||||
|
||||
py_library(
|
||||
name = "generator_lib",
|
||||
srcs = ["generator_lib.py"],
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
":dsl",
|
||||
"//databuild:py_proto",
|
||||
],
|
||||
)
|
||||
|
||||
py_binary(
|
||||
name = "generator",
|
||||
srcs = ["generator.py"],
|
||||
data = ["dsl_job_wrapper.py"],
|
||||
main = "generator.py",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
":generator_lib",
|
||||
],
|
||||
)
|
||||
|
|
@ -1,431 +0,0 @@
|
|||
|
||||
from databuild.proto import JobConfig, PartitionRef, DataDep, DepType
|
||||
from typing import Self, Protocol, get_type_hints, get_origin, get_args
|
||||
from dataclasses import fields, is_dataclass, dataclass, field
|
||||
import re
|
||||
|
||||
|
||||
class PartitionPattern:
|
||||
_raw_pattern: str
|
||||
|
||||
@property
|
||||
def _pattern(self) -> re.Pattern:
|
||||
return re.compile(self._raw_pattern)
|
||||
|
||||
def _validate_pattern(self):
|
||||
"""Checks that both conditions are met:
|
||||
1. All fields from the PartitionFields type are present in the pattern
|
||||
2. All fields from the pattern are present in the PartitionFields type
|
||||
"""
|
||||
# TODO how do I get this to be called?
|
||||
assert is_dataclass(self), "Should be a dataclass also (for partition fields)"
|
||||
pattern_fields = set(self._pattern.groupindex.keys())
|
||||
partition_fields = {field.name for field in fields(self)}
|
||||
if pattern_fields != partition_fields:
|
||||
raise ValueError(f"Pattern fields {pattern_fields} do not match partition fields {partition_fields}")
|
||||
|
||||
@classmethod
|
||||
def deserialize(cls, raw_value: str) -> Self:
|
||||
"""Parses a partition from a string based on the defined pattern."""
|
||||
# Create a temporary instance to access the compiled pattern
|
||||
# We need to compile the pattern to match against it
|
||||
pattern = re.compile(cls._raw_pattern)
|
||||
|
||||
# Match the raw value against the pattern
|
||||
match = pattern.match(raw_value)
|
||||
if not match:
|
||||
raise ValueError(f"String '{raw_value}' does not match pattern '{cls._pattern}'")
|
||||
|
||||
# Extract the field values from the match
|
||||
field_values = match.groupdict()
|
||||
|
||||
# Create and return a new instance with the extracted values
|
||||
return cls(**field_values)
|
||||
|
||||
def serialize(self) -> str:
|
||||
"""Returns a string representation by filling in the pattern template with field values."""
|
||||
# Start with the pattern
|
||||
result = self._raw_pattern
|
||||
|
||||
# Replace each named group in the pattern with its corresponding field value
|
||||
for field in fields(self):
|
||||
# Find the named group pattern and replace it with the actual value
|
||||
# We need to replace the regex pattern with the actual value
|
||||
# Look for the pattern (?P<field_name>...) and replace with the field value
|
||||
pattern_to_replace = rf'\(\?P<{field.name}>[^)]+\)'
|
||||
actual_value = getattr(self, field.name)
|
||||
result = re.sub(pattern_to_replace, actual_value, result)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
class DataBuildJob(Protocol):
|
||||
# The types of partitions that this job produces
|
||||
output_types: list[type[PartitionPattern]]
|
||||
|
||||
def config(self, outputs: list[PartitionPattern]) -> list[JobConfig]: ...
|
||||
|
||||
def exec(self, *args: str) -> None: ...
|
||||
|
||||
|
||||
class DataBuildGraph:
|
||||
def __init__(self, label: str):
|
||||
self.label = label
|
||||
self.lookup = {}
|
||||
|
||||
def job(self, cls: type[DataBuildJob]) -> type[DataBuildJob]:
|
||||
"""Register a job with the graph."""
|
||||
for partition in cls.output_types:
|
||||
assert partition not in self.lookup, f"Partition `{partition}` already registered"
|
||||
self.lookup[partition] = cls
|
||||
return cls
|
||||
|
||||
def generate_bazel_module(self):
|
||||
"""Generates a complete databuild application, packaging up referenced jobs and this graph via bazel targets"""
|
||||
raise NotImplementedError
|
||||
|
||||
def generate_bazel_package(self, name: str, output_dir: str, deps: list = None) -> None:
|
||||
"""Generate BUILD.bazel and binaries into a generated/ subdirectory.
|
||||
|
||||
Args:
|
||||
name: Base name for the generated graph (without .generate suffix)
|
||||
output_dir: Directory to write generated files to (will create generated/ subdir)
|
||||
deps: List of Bazel dependency labels to use in generated BUILD.bazel
|
||||
"""
|
||||
import os
|
||||
import shutil
|
||||
|
||||
# Create generated/ subdirectory
|
||||
generated_dir = os.path.join(output_dir, "generated")
|
||||
os.makedirs(generated_dir, exist_ok=True)
|
||||
|
||||
# Generate BUILD.bazel with job and graph targets
|
||||
self._generate_build_bazel(generated_dir, name, deps or [])
|
||||
|
||||
# Generate individual job scripts (instead of shared wrapper)
|
||||
self._generate_job_scripts(generated_dir)
|
||||
|
||||
# Generate job lookup binary
|
||||
self._generate_job_lookup(generated_dir, name)
|
||||
|
||||
package_name = self._get_package_name()
|
||||
print(f"Generated DataBuild package '{name}' in {generated_dir}")
|
||||
if package_name != "UNKNOWN_PACKAGE":
|
||||
print(f"Run 'bazel build \"@databuild//{package_name}/generated:{name}_graph.analyze\"' to use the generated graph")
|
||||
else:
|
||||
print(f"Run 'bazel build generated:{name}_graph.analyze' to use the generated graph")
|
||||
|
||||
def _generate_build_bazel(self, output_dir: str, name: str, deps: list) -> None:
|
||||
"""Generate BUILD.bazel with databuild_job and databuild_graph targets."""
|
||||
import os
|
||||
|
||||
# Get job classes from the lookup table
|
||||
job_classes = sorted(set(self.lookup.values()), key=lambda cls: cls.__name__)
|
||||
|
||||
# Format deps for BUILD.bazel
|
||||
if deps:
|
||||
deps_str = ", ".join([f'"{dep}"' for dep in deps])
|
||||
else:
|
||||
# Fallback to parent package if no deps provided
|
||||
parent_package = self._get_package_name()
|
||||
deps_str = f'"//{parent_package}:dsl_src"'
|
||||
|
||||
# Generate py_binary targets for each job
|
||||
job_binaries = []
|
||||
job_targets = []
|
||||
|
||||
for job_class in job_classes:
|
||||
job_name = self._snake_case(job_class.__name__)
|
||||
binary_name = f"{job_name}_binary"
|
||||
job_targets.append(f'"{job_name}"')
|
||||
|
||||
job_script_name = f"{job_name}.py"
|
||||
job_binaries.append(f'''py_binary(
|
||||
name = "{binary_name}",
|
||||
srcs = ["{job_script_name}"],
|
||||
main = "{job_script_name}",
|
||||
deps = [{deps_str}],
|
||||
)
|
||||
|
||||
databuild_job(
|
||||
name = "{job_name}",
|
||||
binary = ":{binary_name}",
|
||||
)''')
|
||||
|
||||
# Generate the complete BUILD.bazel content
|
||||
build_content = f'''load("@databuild//databuild:rules.bzl", "databuild_job", "databuild_graph")
|
||||
|
||||
# Generated by DataBuild DSL - do not edit manually
|
||||
# This file is generated in a subdirectory to avoid overwriting the original BUILD.bazel
|
||||
|
||||
{chr(10).join(job_binaries)}
|
||||
|
||||
py_binary(
|
||||
name = "{name}_job_lookup",
|
||||
srcs = ["{name}_job_lookup.py"],
|
||||
deps = [{deps_str}],
|
||||
)
|
||||
|
||||
databuild_graph(
|
||||
name = "{name}_graph",
|
||||
jobs = [{", ".join(job_targets)}],
|
||||
lookup = ":{name}_job_lookup",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
|
||||
# Create tar archive of generated files for testing
|
||||
genrule(
|
||||
name = "existing_generated",
|
||||
srcs = glob(["*.py", "BUILD.bazel"]),
|
||||
outs = ["existing_generated.tar"],
|
||||
cmd = "mkdir -p temp && cp $(SRCS) temp/ && find temp -exec touch -t 197001010000 {{}} + && tar -cf $@ -C temp .",
|
||||
visibility = ["//visibility:public"],
|
||||
)
|
||||
'''
|
||||
|
||||
with open(os.path.join(output_dir, "BUILD.bazel"), "w") as f:
|
||||
f.write(build_content)
|
||||
|
||||
def _generate_job_scripts(self, output_dir: str) -> None:
|
||||
"""Generate individual Python scripts for each job class."""
|
||||
import os
|
||||
|
||||
# Get job classes and generate a script for each one
|
||||
job_classes = list(set(self.lookup.values()))
|
||||
graph_module_path = self._get_graph_module_path()
|
||||
|
||||
for job_class in job_classes:
|
||||
job_name = self._snake_case(job_class.__name__)
|
||||
script_name = f"{job_name}.py"
|
||||
|
||||
script_content = f'''#!/usr/bin/env python3
|
||||
"""
|
||||
Generated job script for {job_class.__name__}.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import json
|
||||
from {graph_module_path} import {job_class.__name__}
|
||||
from databuild.proto import PartitionRef, JobConfigureResponse, to_dict
|
||||
|
||||
|
||||
def parse_outputs_from_args(args: list[str]) -> list:
|
||||
"""Parse partition output references from command line arguments."""
|
||||
outputs = []
|
||||
for arg in args:
|
||||
# Find which output type can deserialize this partition reference
|
||||
for output_type in {job_class.__name__}.output_types:
|
||||
try:
|
||||
partition = output_type.deserialize(arg)
|
||||
outputs.append(partition)
|
||||
break
|
||||
except ValueError:
|
||||
continue
|
||||
else:
|
||||
raise ValueError(f"No output type in {job_class.__name__} can deserialize partition ref: {{arg}}")
|
||||
|
||||
return outputs
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) < 2:
|
||||
raise Exception(f"Invalid command usage")
|
||||
|
||||
command = sys.argv[1]
|
||||
job_instance = {job_class.__name__}()
|
||||
|
||||
if command == "config":
|
||||
# Parse output partition references as PartitionRef objects (for Rust wrapper)
|
||||
output_refs = [PartitionRef(str=raw_ref) for raw_ref in sys.argv[2:]]
|
||||
|
||||
# Also parse them into DSL partition objects (for DSL job.config())
|
||||
outputs = parse_outputs_from_args(sys.argv[2:])
|
||||
|
||||
# Call job's config method - returns list[JobConfig]
|
||||
configs = job_instance.config(outputs)
|
||||
|
||||
# Wrap in JobConfigureResponse and serialize using to_dict()
|
||||
response = JobConfigureResponse(configs=configs)
|
||||
print(json.dumps(to_dict(response)))
|
||||
|
||||
elif command == "exec":
|
||||
# The exec method expects a JobConfig but the Rust wrapper passes args
|
||||
# For now, let the DSL job handle the args directly
|
||||
# TODO: This needs to be refined based on actual Rust wrapper interface
|
||||
job_instance.exec(*sys.argv[2:])
|
||||
|
||||
else:
|
||||
raise Exception(f"Invalid command `{{sys.argv[1]}}`")
|
||||
'''
|
||||
|
||||
script_path = os.path.join(output_dir, script_name)
|
||||
with open(script_path, "w") as f:
|
||||
f.write(script_content)
|
||||
|
||||
# Make it executable
|
||||
os.chmod(script_path, 0o755)
|
||||
|
||||
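# Illustrative sketch (not generated code): how a generated per-job script is
# driven from the outside. The script name follows the IngestColorVotes example
# used elsewhere in this change; the partition ref is hypothetical, and the
# "configs" key is assumed from JobConfigureResponse(configs=...) above. Note
# the exec contract is still marked TODO in the generated script.
import json
import subprocess

proc = subprocess.run(
    ["./ingest_color_votes.py", "config", "votes/date=2025-01-01"],
    capture_output=True, text=True, check=True,
)
response = json.loads(proc.stdout)  # JobConfigureResponse serialized as a dict
for cfg in response.get("configs", []):
    # `exec` currently receives the configured args directly (see TODO above).
    subprocess.run(["./ingest_color_votes.py", "exec", *cfg.get("args", [])], check=True)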
def _generate_job_lookup(self, output_dir: str, name: str) -> None:
    """Generate job lookup binary that maps partition patterns to job targets."""
    import os

    # Build the job lookup mappings with full package paths
    package_name = self._get_package_name()
    lookup_mappings = []
    for partition_type, job_class in self.lookup.items():
        job_name = self._snake_case(job_class.__name__)
        pattern = partition_type._raw_pattern
        full_target = f"//{package_name}/generated:{job_name}"
        lookup_mappings.append(f'    r"{pattern}": "{full_target}",')

    lookup_content = f'''#!/usr/bin/env python3
"""
Generated job lookup for DataBuild DSL graph.
Maps partition patterns to job targets.
"""

import sys
import re
import json
from collections import defaultdict


# Mapping from partition patterns to job targets
JOB_MAPPINGS = {{
{chr(10).join(lookup_mappings)}
}}


def lookup_job_for_partition(partition_ref: str) -> str:
    """Look up which job can build the given partition reference."""
    for pattern, job_target in JOB_MAPPINGS.items():
        if re.match(pattern, partition_ref):
            return job_target

    raise ValueError(f"No job found for partition: {{partition_ref}}")


def main():
    if len(sys.argv) < 2:
        print("Usage: job_lookup.py <partition_ref> [partition_ref...]", file=sys.stderr)
        sys.exit(1)

    results = defaultdict(list)
    try:
        for partition_ref in sys.argv[1:]:
            job_target = lookup_job_for_partition(partition_ref)
            results[job_target].append(partition_ref)

        # Output the results as JSON (matching existing lookup format)
        print(json.dumps(dict(results)))
    except ValueError as e:
        print(f"ERROR: {{e}}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
'''

    lookup_file = os.path.join(output_dir, f"{name}_job_lookup.py")
    with open(lookup_file, "w") as f:
        f.write(lookup_content)

    # Make it executable
    os.chmod(lookup_file, 0o755)

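# Sketch of the matching rule implemented by the generated lookup above: the
# first JOB_MAPPINGS pattern that re.match-es a partition ref wins, and refs
# are grouped per job target. The pattern is borrowed from the DSL tests in
# this change; the job target label is hypothetical.
import re

JOB_MAPPINGS = {
    r"category_analysis/category=(?P<category>[^/]+)/date=(?P<data_date>\d{4}-\d{2}-\d{2})":
        "//my/package/generated:category_analysis_job",
}

ref = "category_analysis/category=comedy/date=2025-01-01"
target = next(t for p, t in JOB_MAPPINGS.items() if re.match(p, ref))
assert target == "//my/package/generated:category_analysis_job"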
def _snake_case(self, name: str) -> str:
    """Convert CamelCase to snake_case."""
    import re
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

def _get_graph_module_path(self) -> str:
    """Get the module path for the graph containing this instance."""
    # Try to find the module by looking at where the graph object is defined
    import inspect
    import sys

    # Look through all loaded modules to find where this graph instance is defined
    for module_name, module in sys.modules.items():
        if hasattr(module, 'graph') and getattr(module, 'graph') is self:
            if module_name != '__main__':
                return module_name

    # Look through the call stack to find the module that imported us
    for frame_info in inspect.stack():
        frame_globals = frame_info.frame.f_globals
        module_name = frame_globals.get('__name__')
        if module_name and module_name != '__main__' and 'graph' in frame_globals:
            # Check if this frame has our graph
            if frame_globals.get('graph') is self:
                return module_name

    # Last resort fallback - this will need to be manually configured
    return "UNKNOWN_MODULE"

def _get_package_name(self) -> str:
    """Get the Bazel package name where the DSL source files are located."""
    # Extract package from the graph label if available
    if hasattr(self, 'label') and self.label.startswith('//'):
        # Extract package from label like "//databuild/test/app:dsl_graph"
        package_part = self.label.split(':')[0]
        return package_part[2:]  # Remove "//" prefix

    # Fallback to trying to infer from module path
    module_path = self._get_graph_module_path()
    if module_path != "UNKNOWN_MODULE":
        # Convert module path to package path
        # e.g., "databuild.test.app.dsl.graph" -> "databuild/test/app/dsl"
        parts = module_path.split('.')
        if parts[-1] in ['graph', 'main']:
            parts = parts[:-1]
        return '/'.join(parts)

    return "UNKNOWN_PACKAGE"


@dataclass
class JobConfigBuilder:
    outputs: list[PartitionRef] = field(default_factory=list)
    inputs: list[DataDep] = field(default_factory=list)
    args: list[str] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)

    def build(self) -> JobConfig:
        return JobConfig(
            outputs=self.outputs,
            inputs=self.inputs,
            args=self.args,
            env=self.env,
        )

    def add_inputs(self, *partitions: PartitionPattern, dep_type: DepType = DepType.MATERIALIZE) -> Self:
        for p in partitions:
            dep_type_name = "materialize" if dep_type == DepType.MATERIALIZE else "query"
            self.inputs.append(DataDep(dep_type_code=dep_type, dep_type_name=dep_type_name, partition_ref=PartitionRef(str=p.serialize())))
        return self

    def add_outputs(self, *partitions: PartitionPattern) -> Self:
        for p in partitions:
            self.outputs.append(PartitionRef(str=p.serialize()))
        return self

    def add_args(self, *args: str) -> Self:
        self.args.extend(args)
        return self

    def set_args(self, args: list[str]) -> Self:
        self.args = args
        return self

    def set_env(self, env: dict[str, str]) -> Self:
        self.env = env
        return self

    def add_env(self, **kwargs) -> Self:
        for k, v in kwargs.items():
            assert isinstance(k, str), f"Expected a string key, got `{k}`"
            assert isinstance(v, str), f"Expected a string value, got `{v}`"
            self.env[k] = v
        return self

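# Illustrative only (not part of the original change): a config() method
# written with the JobConfigBuilder above. The `data_date` attribute matches
# the CategoryAnalysisPartition used in the DSL tests below; real jobs will
# use their own partition types, args, and env.
def config(self, outputs: list[PartitionPattern]) -> list[JobConfig]:
    return [
        JobConfigBuilder()
        .add_outputs(out)                   # partitions this run will produce
        .add_args("--date", out.data_date)  # CLI args handed to exec()
        .add_env(LOG_LEVEL="info")          # extra environment for the job run
        .build()
        for out in outputs
    ]
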
@ -1,118 +0,0 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Shared DSL job wrapper that can execute any DataBuildJob defined in a DSL graph.
|
||||
Configured via environment variables:
|
||||
- DATABUILD_DSL_GRAPH_MODULE: Python module path containing the graph (e.g., 'databuild.test.app.dsl.graph')
|
||||
- DATABUILD_JOB_CLASS: Job class name to execute (e.g., 'IngestColorVotes')
|
||||
"""
|
||||
|
||||
import sys
|
||||
import json
|
||||
import os
|
||||
import importlib
|
||||
from typing import List, Any
|
||||
from databuild.proto import JobConfig
|
||||
|
||||
|
||||
def parse_outputs_from_args(args: List[str], job_class: Any) -> List[Any]:
|
||||
"""Parse partition output references from command line arguments into partition objects."""
|
||||
outputs = []
|
||||
for arg in args:
|
||||
# Find which output type can deserialize this partition reference
|
||||
for output_type in job_class.output_types:
|
||||
try:
|
||||
partition = output_type.deserialize(arg)
|
||||
outputs.append(partition)
|
||||
break
|
||||
except ValueError:
|
||||
continue
|
||||
else:
|
||||
raise ValueError(f"No output type in {job_class.__name__} can deserialize partition ref: {arg}")
|
||||
|
||||
return outputs
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: dsl_job_wrapper.py <config|exec> [args...]", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
command = sys.argv[1]
|
||||
|
||||
# Read configuration from environment
|
||||
graph_module_path = os.environ.get('DATABUILD_DSL_GRAPH_MODULE')
|
||||
job_class_name = os.environ.get('DATABUILD_JOB_CLASS')
|
||||
|
||||
if not graph_module_path:
|
||||
print("ERROR: DATABUILD_DSL_GRAPH_MODULE environment variable not set", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
if not job_class_name:
|
||||
print("ERROR: DATABUILD_JOB_CLASS environment variable not set", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
try:
|
||||
# Import the graph module
|
||||
module = importlib.import_module(graph_module_path)
|
||||
graph = getattr(module, 'graph')
|
||||
|
||||
# Get the job class
|
||||
job_class = getattr(module, job_class_name)
|
||||
|
||||
# Create job instance
|
||||
job_instance = job_class()
|
||||
|
||||
except (ImportError, AttributeError) as e:
|
||||
print(f"ERROR: Failed to load job {job_class_name} from {graph_module_path}: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
if command == "config":
|
||||
try:
|
||||
# Parse output partition references from remaining args
|
||||
output_refs = sys.argv[2:]
|
||||
if not output_refs:
|
||||
print("ERROR: No output partition references provided", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
outputs = parse_outputs_from_args(output_refs, job_class)
|
||||
|
||||
# Call job's config method
|
||||
configs = job_instance.config(outputs)
|
||||
|
||||
# Output each config as JSON (one per line for multiple configs)
|
||||
for config in configs:
|
||||
# Convert JobConfig to dict for JSON serialization
|
||||
config_dict = {
|
||||
'outputs': [{'str': ref.str} for ref in config.outputs],
|
||||
'inputs': [
|
||||
{
|
||||
'dep_type_code': dep.dep_type_code,
|
||||
'dep_type_name': dep.dep_type_name,
|
||||
'partition_ref': {'str': dep.partition_ref.str}
|
||||
} for dep in config.inputs
|
||||
],
|
||||
'args': config.args,
|
||||
'env': config.env,
|
||||
}
|
||||
print(json.dumps(config_dict))
|
||||
|
||||
except Exception as e:
|
||||
print(f"ERROR: Config failed: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
elif command == "exec":
|
||||
try:
|
||||
# Read config from stdin
|
||||
job_instance.exec(*sys.argv[2:])
|
||||
|
||||
except Exception as e:
|
||||
print(f"ERROR: Execution failed: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
else:
|
||||
print(f"ERROR: Unknown command '{command}'. Use 'config' or 'exec'.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@ -1,29 +0,0 @@
#!/usr/bin/env python3
"""
DSL code generator that can be run as a py_binary with proper dependencies.
"""

import sys
from databuild.dsl.python.generator_lib import generate_dsl_package


def main():
    if len(sys.argv) != 4:
        print("Usage: generator.py <module_path> <graph_attr> <output_dir>", file=sys.stderr)
        sys.exit(1)

    module_path = sys.argv[1]
    graph_attr = sys.argv[2]
    output_dir = sys.argv[3]

    try:
        generate_dsl_package(module_path, graph_attr, output_dir)
    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()

@ -1,38 +0,0 @@
#!/usr/bin/env python3
"""
Core DSL code generation library that can be imported by different generator binaries.
"""

import os
import importlib


def generate_dsl_package(module_path: str, graph_attr: str, output_dir: str, deps: list = None):
    """
    Generate DataBuild DSL package from a graph definition.

    Args:
        module_path: Python module path (e.g., "databuild.test.app.dsl.graph")
        graph_attr: Name of the graph attribute in the module
        output_dir: Directory where to generate the DSL package
        deps: List of Bazel dependency labels to use in generated BUILD.bazel
    """
    # Extract the base name from the output directory for naming
    name = os.path.basename(output_dir.rstrip('/')) or "graph"

    try:
        # Import the graph module
        module = importlib.import_module(module_path)
        graph = getattr(module, graph_attr)

        # Generate the bazel package
        graph.generate_bazel_package(name, output_dir, deps or [])

        print(f"Generated DataBuild DSL package in {output_dir}")

    except ImportError as e:
        raise ImportError(f"Failed to import {graph_attr} from {module_path}: {e}")
    except AttributeError as e:
        raise AttributeError(f"Module {module_path} does not have attribute {graph_attr}: {e}")
    except Exception as e:
        raise Exception(f"Generation failed: {e}")

@ -1,8 +0,0 @@
py_test(
    name = "dsl_test",
    srcs = glob(["*.py"]),
    deps = [
        "//databuild/dsl/python:dsl",
        "@databuild_pypi//pytest",
    ],
)

@ -1,75 +0,0 @@
|
|||
|
||||
from databuild.dsl.python.dsl import PartitionPattern, DataBuildGraph, DataBuildJob
|
||||
from databuild.proto import JobConfig, PartitionManifest
|
||||
from dataclasses import dataclass
|
||||
import pytest
|
||||
|
||||
|
||||
@dataclass
|
||||
class DateCategory:
|
||||
data_date: str
|
||||
category: str
|
||||
|
||||
|
||||
class CategoryAnalysisPartition(DateCategory, PartitionPattern):
|
||||
_raw_pattern = r"category_analysis/category=(?P<category>[^/]+)/date=(?P<data_date>\d{4}-\d{2}-\d{2})"
|
||||
|
||||
def test_basic_partition_pattern():
|
||||
p1 = CategoryAnalysisPartition(data_date="2025-01-01", category="comedy")
|
||||
assert p1.serialize() == "category_analysis/category=comedy/date=2025-01-01"
|
||||
|
||||
p2 = CategoryAnalysisPartition.deserialize("category_analysis/category=technology/date=2025-01-02")
|
||||
assert p2.data_date == "2025-01-02"
|
||||
assert p2.category == "technology"
|
||||
|
||||
|
||||
class NotEnoughFieldsPartition(DateCategory, PartitionPattern):
|
||||
# Doesn't use the partition fields
|
||||
_raw_pattern = r"invalid_partition_pattern"
|
||||
|
||||
|
||||
class TooManyFieldsPartition(DateCategory, PartitionPattern):
|
||||
# Doesn't use the partition fields
|
||||
_raw_pattern = r"category_analysis/category=(?P<category>[^/]+)/date=(?P<data_date>\d{4}-\d{2}-\d{2})/hour=(?P<hour>\d{2})"
|
||||
|
||||
|
||||
def test_invalid_partition_pattern():
|
||||
with pytest.raises(ValueError):
|
||||
NotEnoughFieldsPartition(data_date="2025-01-01", category="comedy")._validate_pattern()
|
||||
with pytest.raises(ValueError):
|
||||
TooManyFieldsPartition(data_date="2025-01-01", category="comedy")._validate_pattern()
|
||||
|
||||
|
||||
def test_basic_graph_definition():
|
||||
graph = DataBuildGraph("//:test_graph")
|
||||
|
||||
@graph.job
|
||||
class TestJob(DataBuildJob):
|
||||
output_types = [CategoryAnalysisPartition]
|
||||
def exec(self, config: JobConfig) -> None: ...
|
||||
def config(self, outputs: list[PartitionPattern]) -> list[JobConfig]: ...
|
||||
|
||||
assert len(graph.lookup) == 1
|
||||
assert CategoryAnalysisPartition in graph.lookup
|
||||
|
||||
|
||||
def test_graph_collision():
|
||||
graph = DataBuildGraph("//:test_graph")
|
||||
|
||||
@graph.job
|
||||
class TestJob1(DataBuildJob):
|
||||
output_types = [CategoryAnalysisPartition]
|
||||
def exec(self, config: JobConfig) -> None: ...
|
||||
def config(self, outputs: list[PartitionPattern]) -> list[JobConfig]: ...
|
||||
|
||||
with pytest.raises(AssertionError):
|
||||
# Outputs the same partition, so should raise
|
||||
@graph.job
|
||||
class TestJob2(DataBuildJob):
|
||||
output_types = [CategoryAnalysisPartition]
|
||||
def exec(self, config: JobConfig) -> None: ...
|
||||
def config(self, outputs: list[PartitionPattern]) -> list[JobConfig]: ...
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(pytest.main([__file__]))
|
||||
|
|
@ -1,660 +0,0 @@
use crate::*;
use crate::event_log::{BuildEventLogError, Result};
use crate::event_log::storage::BELStorage;
use crate::event_log::query_engine::BELQueryEngine;
use async_trait::async_trait;
use std::sync::{Arc, Mutex};
use rusqlite::Connection;

/// MockBuildEventLog provides an in-memory SQLite database for testing
///
/// This implementation makes it easy to specify test data and verify behavior
/// while using the real code paths for event writing and repository queries.
///
/// Key features:
/// - Uses in-memory SQLite for parallel test execution
/// - Provides event constructors with sensible defaults
/// - Allows easy specification of test scenarios
/// - Uses the same SQL schema as production SQLite implementation
pub struct MockBuildEventLog {
    connection: Arc<Mutex<Connection>>,
}
|
||||
|
||||
impl MockBuildEventLog {
|
||||
/// Create a new MockBuildEventLog with an in-memory SQLite database
|
||||
pub async fn new() -> Result<Self> {
|
||||
let conn = Connection::open(":memory:")
|
||||
.map_err(|e| BuildEventLogError::ConnectionError(e.to_string()))?;
|
||||
|
||||
// Disable foreign key constraints for simplicity in testing
|
||||
// conn.execute("PRAGMA foreign_keys = ON", [])
|
||||
|
||||
let mock = Self {
|
||||
connection: Arc::new(Mutex::new(conn)),
|
||||
};
|
||||
|
||||
// Initialize the schema
|
||||
mock.initialize().await?;
|
||||
|
||||
Ok(mock)
|
||||
}
|
||||
|
||||
/// Create a new MockBuildEventLog with predefined events
|
||||
pub async fn with_events(events: Vec<BuildEvent>) -> Result<Self> {
|
||||
let mock = Self::new().await?;
|
||||
|
||||
// Insert all provided events
|
||||
for event in events {
|
||||
mock.append_event(event).await?;
|
||||
}
|
||||
|
||||
Ok(mock)
|
||||
}
|
||||
|
||||
/// Get the number of events in the mock event log
|
||||
pub async fn event_count(&self) -> Result<usize> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare("SELECT COUNT(*) FROM build_events")
|
||||
.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let count: i64 = stmt.query_row([], |row| row.get(0))
|
||||
.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
Ok(count as usize)
|
||||
}
|
||||
|
||||
/// Get all events ordered by timestamp
|
||||
pub async fn get_all_events(&self) -> Result<Vec<BuildEvent>> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT event_data FROM build_events ORDER BY timestamp ASC"
|
||||
).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let rows = stmt.query_map([], |row| {
|
||||
let event_data: String = row.get(0)?;
|
||||
Ok(event_data)
|
||||
}).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
for row in rows {
|
||||
let event_data = row.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
let event: BuildEvent = serde_json::from_str(&event_data)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
events.push(event);
|
||||
}
|
||||
|
||||
Ok(events)
|
||||
}
|
||||
|
||||
/// Clear all events from the mock event log
|
||||
pub async fn clear(&self) -> Result<()> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
|
||||
// Clear all tables
|
||||
conn.execute("DELETE FROM build_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
conn.execute("DELETE FROM build_request_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
conn.execute("DELETE FROM partition_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
conn.execute("DELETE FROM job_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
conn.execute("DELETE FROM delegation_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
conn.execute("DELETE FROM job_graph_events", [])
|
||||
.map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Initialize the database schema for testing
|
||||
pub async fn initialize(&self) -> Result<()> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
|
||||
// Create main events table
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS build_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
timestamp INTEGER NOT NULL,
|
||||
build_request_id TEXT NOT NULL,
|
||||
event_type TEXT NOT NULL,
|
||||
event_data TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
// Create supporting tables for easier queries
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS build_request_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
status TEXT NOT NULL,
|
||||
requested_partitions TEXT NOT NULL,
|
||||
message TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS partition_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
partition_ref TEXT NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
message TEXT NOT NULL,
|
||||
job_run_id TEXT
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS job_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
job_run_id TEXT NOT NULL,
|
||||
job_label TEXT NOT NULL,
|
||||
target_partitions TEXT NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
message TEXT NOT NULL,
|
||||
config_json TEXT,
|
||||
manifests_json TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS delegation_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
partition_ref TEXT NOT NULL,
|
||||
delegated_to_build_request_id TEXT NOT NULL,
|
||||
message TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS job_graph_events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
job_graph_json TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Append an event to the mock event log
|
||||
pub async fn append_event(&self, event: BuildEvent) -> Result<()> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
|
||||
// Serialize the entire event for storage
|
||||
let event_data = serde_json::to_string(&event)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
|
||||
// Insert into main events table
|
||||
conn.execute(
|
||||
"INSERT INTO build_events (event_id, timestamp, build_request_id, event_type, event_data) VALUES (?1, ?2, ?3, ?4, ?5)",
|
||||
rusqlite::params![
|
||||
event.event_id,
|
||||
event.timestamp,
|
||||
event.build_request_id,
|
||||
match &event.event_type {
|
||||
Some(crate::build_event::EventType::BuildRequestEvent(_)) => "build_request",
|
||||
Some(crate::build_event::EventType::PartitionEvent(_)) => "partition",
|
||||
Some(crate::build_event::EventType::JobEvent(_)) => "job",
|
||||
Some(crate::build_event::EventType::DelegationEvent(_)) => "delegation",
|
||||
Some(crate::build_event::EventType::JobGraphEvent(_)) => "job_graph",
|
||||
Some(crate::build_event::EventType::PartitionInvalidationEvent(_)) => "partition_invalidation",
|
||||
Some(crate::build_event::EventType::JobRunCancelEvent(_)) => "job_run_cancel",
|
||||
Some(crate::build_event::EventType::BuildCancelEvent(_)) => "build_cancel",
|
||||
None => "unknown",
|
||||
},
|
||||
event_data
|
||||
],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
// Insert into specific event type table for better querying
|
||||
match &event.event_type {
|
||||
Some(crate::build_event::EventType::BuildRequestEvent(br_event)) => {
|
||||
let partitions_json = serde_json::to_string(&br_event.requested_partitions)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO build_request_events (event_id, status, requested_partitions, message) VALUES (?1, ?2, ?3, ?4)",
|
||||
rusqlite::params![
|
||||
event.event_id,
|
||||
br_event.status_code.to_string(),
|
||||
partitions_json,
|
||||
br_event.message
|
||||
],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
}
|
||||
Some(crate::build_event::EventType::PartitionEvent(p_event)) => {
|
||||
conn.execute(
|
||||
"INSERT INTO partition_events (event_id, partition_ref, status, message, job_run_id) VALUES (?1, ?2, ?3, ?4, ?5)",
|
||||
rusqlite::params![
|
||||
event.event_id,
|
||||
p_event.partition_ref.as_ref().map(|r| &r.str).unwrap_or(&String::new()),
|
||||
p_event.status_code.to_string(),
|
||||
p_event.message,
|
||||
if p_event.job_run_id.is_empty() { None } else { Some(&p_event.job_run_id) }
|
||||
],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
}
|
||||
Some(crate::build_event::EventType::JobEvent(j_event)) => {
|
||||
let partitions_json = serde_json::to_string(&j_event.target_partitions)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
let config_json = j_event.config.as_ref()
|
||||
.map(|c| serde_json::to_string(c))
|
||||
.transpose()
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
let manifests_json = serde_json::to_string(&j_event.manifests)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO job_events (event_id, job_run_id, job_label, target_partitions, status, message, config_json, manifests_json) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)",
|
||||
rusqlite::params![
|
||||
event.event_id,
|
||||
j_event.job_run_id,
|
||||
j_event.job_label.as_ref().map(|l| &l.label).unwrap_or(&String::new()),
|
||||
partitions_json,
|
||||
j_event.status_code.to_string(),
|
||||
j_event.message,
|
||||
config_json,
|
||||
manifests_json
|
||||
],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
}
|
||||
_ => {} // Other event types don't need special handling for testing
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Get all events for a specific build request
|
||||
pub async fn get_build_request_events(&self, build_request_id: &str, _limit: Option<u32>) -> Result<Vec<BuildEvent>> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT event_data FROM build_events WHERE build_request_id = ? ORDER BY timestamp ASC"
|
||||
).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let rows = stmt.query_map([build_request_id], |row| {
|
||||
let event_data: String = row.get(0)?;
|
||||
Ok(event_data)
|
||||
}).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
for row in rows {
|
||||
let event_data = row.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
let event: BuildEvent = serde_json::from_str(&event_data)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
events.push(event);
|
||||
}
|
||||
|
||||
Ok(events)
|
||||
}
|
||||
|
||||
/// Get all events for a specific partition
|
||||
pub async fn get_partition_events(&self, partition_ref: &str, _limit: Option<u32>) -> Result<Vec<BuildEvent>> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT e.event_data FROM build_events e
|
||||
JOIN partition_events p ON e.event_id = p.event_id
|
||||
WHERE p.partition_ref = ? ORDER BY e.timestamp ASC"
|
||||
).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let rows = stmt.query_map([partition_ref], |row| {
|
||||
let event_data: String = row.get(0)?;
|
||||
Ok(event_data)
|
||||
}).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
for row in rows {
|
||||
let event_data = row.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
let event: BuildEvent = serde_json::from_str(&event_data)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
events.push(event);
|
||||
}
|
||||
|
||||
Ok(events)
|
||||
}
|
||||
|
||||
/// Get the latest status for a partition
|
||||
pub async fn get_latest_partition_status(&self, partition_ref: &str) -> Result<Option<(PartitionStatus, i64)>> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT p.status, e.timestamp FROM build_events e
|
||||
JOIN partition_events p ON e.event_id = p.event_id
|
||||
WHERE p.partition_ref = ? ORDER BY e.timestamp DESC LIMIT 1"
|
||||
).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let result = stmt.query_row([partition_ref], |row| {
|
||||
let status_str: String = row.get(0)?;
|
||||
let timestamp: i64 = row.get(1)?;
|
||||
let status_code = status_str.parse::<i32>().unwrap_or(0);
|
||||
let status = PartitionStatus::try_from(status_code).unwrap_or(PartitionStatus::PartitionUnknown);
|
||||
Ok((status, timestamp))
|
||||
});
|
||||
|
||||
match result {
|
||||
Ok(status_and_timestamp) => Ok(Some(status_and_timestamp)),
|
||||
Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
|
||||
Err(e) => Err(BuildEventLogError::QueryError(e.to_string())),
|
||||
}
|
||||
}
|
||||
|
||||
/// Get events in a timestamp range (used by BELStorage)
|
||||
pub async fn get_events_in_range(&self, start: i64, end: i64) -> Result<Vec<BuildEvent>> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT event_data FROM build_events WHERE timestamp >= ? AND timestamp <= ? ORDER BY timestamp ASC"
|
||||
).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let rows = stmt.query_map([start, end], |row| {
|
||||
let event_data: String = row.get(0)?;
|
||||
Ok(event_data)
|
||||
}).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
for row in rows {
|
||||
let event_data = row.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
let event: BuildEvent = serde_json::from_str(&event_data)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
events.push(event);
|
||||
}
|
||||
|
||||
Ok(events)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/// Utility functions for creating test events with sensible defaults
|
||||
pub mod test_events {
|
||||
use super::*;
|
||||
use crate::event_log::{generate_event_id, current_timestamp_nanos};
|
||||
use uuid::Uuid;
|
||||
|
||||
/// Create a build request received event with random defaults
|
||||
pub fn build_request_received(
|
||||
build_request_id: Option<String>,
|
||||
partitions: Vec<PartitionRef>,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id: build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string()),
|
||||
event_type: Some(build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestReceived as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestReceived.to_display_string(),
|
||||
requested_partitions: partitions,
|
||||
message: "Build request received".to_string(),
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a build request event with specific status
|
||||
pub fn build_request_event(
|
||||
build_request_id: Option<String>,
|
||||
partitions: Vec<PartitionRef>,
|
||||
status: BuildRequestStatus,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id: build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string()),
|
||||
event_type: Some(build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
requested_partitions: partitions,
|
||||
message: format!("Build request status: {:?}", status),
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a partition status event with random defaults
|
||||
pub fn partition_status(
|
||||
build_request_id: Option<String>,
|
||||
partition_ref: PartitionRef,
|
||||
status: PartitionStatus,
|
||||
job_run_id: Option<String>,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id: build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string()),
|
||||
event_type: Some(build_event::EventType::PartitionEvent(PartitionEvent {
|
||||
partition_ref: Some(partition_ref),
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
message: format!("Partition status: {:?}", status),
|
||||
job_run_id: job_run_id.unwrap_or_default(),
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a job event with random defaults
|
||||
pub fn job_event(
|
||||
build_request_id: Option<String>,
|
||||
job_run_id: Option<String>,
|
||||
job_label: JobLabel,
|
||||
target_partitions: Vec<PartitionRef>,
|
||||
status: JobStatus,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id: build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string()),
|
||||
event_type: Some(build_event::EventType::JobEvent(JobEvent {
|
||||
job_run_id: job_run_id.unwrap_or_else(|| Uuid::new_v4().to_string()),
|
||||
job_label: Some(job_label),
|
||||
target_partitions,
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
message: format!("Job status: {:?}", status),
|
||||
config: None,
|
||||
manifests: vec![],
|
||||
})),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use super::test_events::*;
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_mock_build_event_log_basic() {
|
||||
let mock = MockBuildEventLog::new().await.unwrap();
|
||||
|
||||
// Initially empty
|
||||
assert_eq!(mock.event_count().await.unwrap(), 0);
|
||||
|
||||
// Add an event
|
||||
let build_id = "test-build-123".to_string();
|
||||
let partition = PartitionRef { str: "test/partition".to_string() };
|
||||
let event = build_request_received(Some(build_id.clone()), vec![partition]);
|
||||
|
||||
mock.append_event(event).await.unwrap();
|
||||
|
||||
// Check event count
|
||||
assert_eq!(mock.event_count().await.unwrap(), 1);
|
||||
|
||||
// Query events by build request
|
||||
let events = mock.get_build_request_events(&build_id, None).await.unwrap();
|
||||
assert_eq!(events.len(), 1);
|
||||
|
||||
// Clear events
|
||||
mock.clear().await.unwrap();
|
||||
assert_eq!(mock.event_count().await.unwrap(), 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_mock_build_event_log_with_predefined_events() {
|
||||
let build_id = "test-build-456".to_string();
|
||||
let partition = PartitionRef { str: "data/users".to_string() };
|
||||
|
||||
let events = vec![
|
||||
build_request_received(Some(build_id.clone()), vec![partition.clone()]),
|
||||
partition_status(Some(build_id.clone()), partition.clone(), PartitionStatus::PartitionBuilding, None),
|
||||
partition_status(Some(build_id.clone()), partition.clone(), PartitionStatus::PartitionAvailable, None),
|
||||
];
|
||||
|
||||
let mock = MockBuildEventLog::with_events(events).await.unwrap();
|
||||
|
||||
// Should have 3 events
|
||||
assert_eq!(mock.event_count().await.unwrap(), 3);
|
||||
|
||||
// Query partition events
|
||||
let partition_events = mock.get_partition_events(&partition.str, None).await.unwrap();
|
||||
assert_eq!(partition_events.len(), 2); // Two partition events
|
||||
|
||||
// Check latest partition status
|
||||
let latest_status = mock.get_latest_partition_status(&partition.str).await.unwrap();
|
||||
assert!(latest_status.is_some());
|
||||
let (status, _timestamp) = latest_status.unwrap();
|
||||
assert_eq!(status, PartitionStatus::PartitionAvailable);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_event_constructors() {
|
||||
let partition = PartitionRef { str: "test/data".to_string() };
|
||||
let job_label = JobLabel { label: "//:test_job".to_string() };
|
||||
|
||||
// Test build request event constructor
|
||||
let br_event = build_request_received(None, vec![partition.clone()]);
|
||||
assert!(matches!(br_event.event_type, Some(build_event::EventType::BuildRequestEvent(_))));
|
||||
|
||||
// Test partition event constructor
|
||||
let p_event = partition_status(None, partition.clone(), PartitionStatus::PartitionAvailable, None);
|
||||
assert!(matches!(p_event.event_type, Some(build_event::EventType::PartitionEvent(_))));
|
||||
|
||||
// Test job event constructor
|
||||
let j_event = job_event(None, None, job_label, vec![partition], JobStatus::JobCompleted);
|
||||
assert!(matches!(j_event.event_type, Some(build_event::EventType::JobEvent(_))));
|
||||
}
|
||||
}
|
||||
|
||||
/// MockBELStorage is a BELStorage implementation that wraps MockBuildEventLog
|
||||
/// This allows us to use the real BELQueryEngine in tests while having control over the data
|
||||
pub struct MockBELStorage {
|
||||
mock_log: Arc<MockBuildEventLog>,
|
||||
}
|
||||
|
||||
impl MockBELStorage {
|
||||
pub async fn new() -> Result<Self> {
|
||||
let mock_log = Arc::new(MockBuildEventLog::new().await?);
|
||||
Ok(Self { mock_log })
|
||||
}
|
||||
|
||||
pub async fn with_events(events: Vec<BuildEvent>) -> Result<Self> {
|
||||
let mock_log = Arc::new(MockBuildEventLog::with_events(events).await?);
|
||||
Ok(Self { mock_log })
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl BELStorage for MockBELStorage {
|
||||
async fn append_event(&self, event: BuildEvent) -> Result<i64> {
|
||||
self.mock_log.append_event(event).await?;
|
||||
Ok(0) // Return dummy index for mock storage
|
||||
}
|
||||
|
||||
async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage> {
|
||||
// Get all events first (MockBELEventLog uses timestamps, so we get all events)
|
||||
let mut events = self.mock_log.get_events_in_range(0, i64::MAX).await?;
|
||||
|
||||
// Apply filtering based on EventFilter
|
||||
events.retain(|event| {
|
||||
// Filter by build request IDs if specified
|
||||
if !filter.build_request_ids.is_empty() {
|
||||
if !filter.build_request_ids.contains(&event.build_request_id) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// Filter by partition refs if specified
|
||||
if !filter.partition_refs.is_empty() {
|
||||
let has_matching_partition = match &event.event_type {
|
||||
Some(build_event::EventType::PartitionEvent(pe)) => {
|
||||
pe.partition_ref.as_ref()
|
||||
.map(|pr| filter.partition_refs.contains(&pr.str))
|
||||
.unwrap_or(false)
|
||||
}
|
||||
Some(build_event::EventType::BuildRequestEvent(bre)) => {
|
||||
bre.requested_partitions.iter()
|
||||
.any(|pr| filter.partition_refs.contains(&pr.str))
|
||||
}
|
||||
Some(build_event::EventType::JobEvent(je)) => {
|
||||
je.target_partitions.iter()
|
||||
.any(|pr| filter.partition_refs.contains(&pr.str))
|
||||
}
|
||||
_ => false,
|
||||
};
|
||||
if !has_matching_partition {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// Filter by job labels if specified
|
||||
if !filter.job_labels.is_empty() {
|
||||
let has_matching_job = match &event.event_type {
|
||||
Some(build_event::EventType::JobEvent(je)) => {
|
||||
je.job_label.as_ref()
|
||||
.map(|jl| filter.job_labels.contains(&jl.label))
|
||||
.unwrap_or(false)
|
||||
}
|
||||
_ => false,
|
||||
};
|
||||
if !has_matching_job {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// Filter by job run IDs if specified
|
||||
if !filter.job_run_ids.is_empty() {
|
||||
let has_matching_job_run = match &event.event_type {
|
||||
Some(build_event::EventType::JobEvent(je)) => {
|
||||
filter.job_run_ids.contains(&je.job_run_id)
|
||||
}
|
||||
Some(build_event::EventType::JobRunCancelEvent(jrce)) => {
|
||||
filter.job_run_ids.contains(&jrce.job_run_id)
|
||||
}
|
||||
Some(build_event::EventType::PartitionEvent(pe)) => {
|
||||
if pe.job_run_id.is_empty() {
|
||||
false
|
||||
} else {
|
||||
filter.job_run_ids.contains(&pe.job_run_id)
|
||||
}
|
||||
}
|
||||
// Add other job-run-related events here if they exist
|
||||
_ => false,
|
||||
};
|
||||
if !has_matching_job_run {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
true
|
||||
});
|
||||
|
||||
Ok(EventPage {
|
||||
events,
|
||||
next_idx: since_idx + 1, // Simple increment for testing
|
||||
has_more: false, // Simplify for testing
|
||||
})
|
||||
}
|
||||
|
||||
async fn initialize(&self) -> Result<()> {
|
||||
self.mock_log.initialize().await
|
||||
}
|
||||
}
|
||||
|
||||
/// Helper function to create a BELQueryEngine for testing with mock data
pub async fn create_mock_bel_query_engine() -> Result<Arc<BELQueryEngine>> {
    let storage: Arc<dyn BELStorage> = Arc::new(MockBELStorage::new().await?);
    Ok(Arc::new(BELQueryEngine::new(storage)))
}

/// Helper function to create a BELQueryEngine for testing with predefined events
pub async fn create_mock_bel_query_engine_with_events(events: Vec<BuildEvent>) -> Result<Arc<BELQueryEngine>> {
    let storage: Arc<dyn BELStorage> = Arc::new(MockBELStorage::with_events(events).await?);
    Ok(Arc::new(BELQueryEngine::new(storage)))
}

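// Illustrative test sketch (mirrors the tests above; not part of the original
// file): build a query engine over predefined events and read back the latest
// partition status. Partition and build IDs are made up.
#[tokio::test]
async fn example_query_engine_over_mock_events() {
    let partition = PartitionRef { str: "example/data".to_string() };
    let events = vec![
        test_events::build_request_received(Some("build-1".to_string()), vec![partition.clone()]),
        test_events::partition_status(
            Some("build-1".to_string()),
            partition.clone(),
            PartitionStatus::PartitionAvailable,
            None,
        ),
    ];
    let engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
    let latest = engine.get_latest_partition_status(&partition.str).await.unwrap();
    assert_eq!(latest.map(|(status, _)| status), Some(PartitionStatus::PartitionAvailable));
}
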
@ -1,113 +0,0 @@
|
|||
use crate::*;
|
||||
use std::error::Error as StdError;
|
||||
use uuid::Uuid;
|
||||
|
||||
pub mod writer;
|
||||
pub mod mock;
|
||||
pub mod storage;
|
||||
pub mod sqlite_storage;
|
||||
pub mod query_engine;
|
||||
|
||||
#[derive(Debug)]
|
||||
pub enum BuildEventLogError {
|
||||
DatabaseError(String),
|
||||
SerializationError(String),
|
||||
ConnectionError(String),
|
||||
QueryError(String),
|
||||
}
|
||||
|
||||
impl std::fmt::Display for BuildEventLogError {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
BuildEventLogError::DatabaseError(msg) => write!(f, "Database error: {}", msg),
|
||||
BuildEventLogError::SerializationError(msg) => write!(f, "Serialization error: {}", msg),
|
||||
BuildEventLogError::ConnectionError(msg) => write!(f, "Connection error: {}", msg),
|
||||
BuildEventLogError::QueryError(msg) => write!(f, "Query error: {}", msg),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl StdError for BuildEventLogError {}
|
||||
|
||||
pub type Result<T> = std::result::Result<T, BuildEventLogError>;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct QueryResult {
|
||||
pub columns: Vec<String>,
|
||||
pub rows: Vec<Vec<String>>,
|
||||
}
|
||||
|
||||
// Summary types for list endpoints
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct BuildRequestSummary {
|
||||
pub build_request_id: String,
|
||||
pub status: BuildRequestStatus,
|
||||
pub requested_partitions: Vec<String>,
|
||||
pub created_at: i64,
|
||||
pub updated_at: i64,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct PartitionSummary {
|
||||
pub partition_ref: String,
|
||||
pub status: PartitionStatus,
|
||||
pub updated_at: i64,
|
||||
pub build_request_id: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct ActivitySummary {
|
||||
pub active_builds_count: u32,
|
||||
pub recent_builds: Vec<BuildRequestSummary>,
|
||||
pub recent_partitions: Vec<PartitionSummary>,
|
||||
pub total_partitions_count: u32,
|
||||
}
|
||||
|
||||
|
||||
// Helper function to generate event ID
|
||||
pub fn generate_event_id() -> String {
|
||||
Uuid::new_v4().to_string()
|
||||
}
|
||||
|
||||
// Helper function to get current timestamp in nanoseconds
|
||||
pub fn current_timestamp_nanos() -> i64 {
|
||||
std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_nanos() as i64
|
||||
}
|
||||
|
||||
// Helper function to create build event with metadata
|
||||
pub fn create_build_event(
|
||||
build_request_id: String,
|
||||
event_type: crate::build_event::EventType,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(event_type),
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
// Parse build event log URI and create BEL query engine with appropriate storage backend
pub async fn create_bel_query_engine(uri: &str) -> Result<std::sync::Arc<query_engine::BELQueryEngine>> {
    use std::sync::Arc;
    use storage::BELStorage;

    if uri == "stdout" {
        let storage: Arc<dyn BELStorage> = Arc::new(storage::StdoutBELStorage::new());
        storage.initialize().await?;
        Ok(Arc::new(query_engine::BELQueryEngine::new(storage)))
    } else if uri.starts_with("sqlite://") {
        let path = &uri[9..]; // Remove "sqlite://" prefix
        let storage: Arc<dyn BELStorage> = Arc::new(sqlite_storage::SqliteBELStorage::new(path)?);
        storage.initialize().await?;
        Ok(Arc::new(query_engine::BELQueryEngine::new(storage)))
    } else {
        Err(BuildEventLogError::ConnectionError(
            format!("Unsupported build event log URI for BEL query engine: {}", uri)
        ))
    }
}

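// Usage sketch (illustrative): the two URI schemes handled above. The SQLite
// file path is hypothetical.
async fn example_open_event_logs() -> Result<()> {
    // "stdout" wires up the StdoutBELStorage backend.
    let _debug_engine = create_bel_query_engine("stdout").await?;
    // "sqlite://<path>" opens (or creates) a SQLite-backed event log at <path>.
    let _durable_engine = create_bel_query_engine("sqlite:///tmp/databuild/bel.db").await?;
    Ok(())
}
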
@ -1,388 +0,0 @@
|
|||
use super::*;
|
||||
use super::storage::BELStorage;
|
||||
use std::sync::Arc;
|
||||
use std::collections::HashMap;
|
||||
|
||||
/// App-layer aggregation that scans storage events
|
||||
pub struct BELQueryEngine {
|
||||
storage: Arc<dyn BELStorage>,
|
||||
}
|
||||
|
||||
impl BELQueryEngine {
|
||||
pub fn new(storage: Arc<dyn BELStorage>) -> Self {
|
||||
Self { storage }
|
||||
}
|
||||
|
||||
/// Get latest status for a partition by scanning recent events
|
||||
pub async fn get_latest_partition_status(&self, partition_ref: &str) -> Result<Option<(PartitionStatus, i64)>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![partition_ref.to_string()],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
self.aggregate_partition_status(&events.events)
|
||||
}
|
||||
|
||||
/// Get all build requests that are currently building a partition
|
||||
pub async fn get_active_builds_for_partition(&self, partition_ref: &str) -> Result<Vec<String>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![partition_ref.to_string()],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
let mut active_builds = Vec::new();
|
||||
let mut build_states: HashMap<String, BuildRequestStatus> = HashMap::new();
|
||||
|
||||
// Process events chronologically to track build states
|
||||
for event in events.events {
|
||||
match &event.event_type {
|
||||
Some(crate::build_event::EventType::BuildRequestEvent(br_event)) => {
|
||||
if let Ok(status) = BuildRequestStatus::try_from(br_event.status_code) {
|
||||
build_states.insert(event.build_request_id.clone(), status);
|
||||
}
|
||||
}
|
||||
Some(crate::build_event::EventType::PartitionEvent(p_event)) => {
|
||||
if let Some(partition_event_ref) = &p_event.partition_ref {
|
||||
if partition_event_ref.str == partition_ref {
|
||||
// Check if this partition is actively being built
|
||||
if let Ok(status) = PartitionStatus::try_from(p_event.status_code) {
|
||||
if matches!(status, PartitionStatus::PartitionBuilding | PartitionStatus::PartitionAnalyzed) {
|
||||
// Check if the build request is still active
|
||||
if let Some(build_status) = build_states.get(&event.build_request_id) {
|
||||
if matches!(build_status,
|
||||
BuildRequestStatus::BuildRequestReceived |
|
||||
BuildRequestStatus::BuildRequestPlanning |
|
||||
BuildRequestStatus::BuildRequestExecuting |
|
||||
BuildRequestStatus::BuildRequestAnalysisCompleted
|
||||
) {
|
||||
if !active_builds.contains(&event.build_request_id) {
|
||||
active_builds.push(event.build_request_id.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(active_builds)
|
||||
}
|
||||
|
||||
/// Get summary of a build request by aggregating its events
|
||||
pub async fn get_build_request_summary(&self, build_id: &str) -> Result<BuildRequestSummary> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![build_id.to_string()],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
|
||||
// If no events found, build doesn't exist
|
||||
if events.events.is_empty() {
|
||||
return Err(BuildEventLogError::QueryError(format!("Build request '{}' not found", build_id)));
|
||||
}
|
||||
|
||||
let mut status = BuildRequestStatus::BuildRequestUnknown;
|
||||
let mut requested_partitions = Vec::new();
|
||||
let mut created_at = 0i64;
|
||||
let mut updated_at = 0i64;
|
||||
|
||||
for event in events.events {
|
||||
if event.timestamp > 0 {
|
||||
if created_at == 0 || event.timestamp < created_at {
|
||||
created_at = event.timestamp;
|
||||
}
|
||||
if event.timestamp > updated_at {
|
||||
updated_at = event.timestamp;
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(crate::build_event::EventType::BuildRequestEvent(br_event)) = &event.event_type {
|
||||
if let Ok(event_status) = BuildRequestStatus::try_from(br_event.status_code) {
|
||||
status = event_status;
|
||||
}
|
||||
if !br_event.requested_partitions.is_empty() {
|
||||
requested_partitions = br_event.requested_partitions.iter()
|
||||
.map(|p| p.str.clone())
|
||||
.collect();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(BuildRequestSummary {
|
||||
build_request_id: build_id.to_string(),
|
||||
status,
|
||||
requested_partitions,
|
||||
created_at,
|
||||
updated_at,
|
||||
})
|
||||
}
|
||||
|
||||
/// List build requests with pagination and filtering
|
||||
pub async fn list_build_requests(&self, request: BuildsListRequest) -> Result<BuildsListResponse> {
|
||||
// For now, scan all events and aggregate
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
let mut build_summaries: HashMap<String, BuildRequestSummary> = HashMap::new();
|
||||
|
||||
// Aggregate by build request ID
|
||||
for event in events.events {
|
||||
if let Some(crate::build_event::EventType::BuildRequestEvent(br_event)) = &event.event_type {
|
||||
let build_id = &event.build_request_id;
|
||||
let entry = build_summaries.entry(build_id.clone()).or_insert_with(|| {
|
||||
BuildRequestSummary {
|
||||
build_request_id: build_id.clone(),
|
||||
status: BuildRequestStatus::BuildRequestUnknown,
|
||||
requested_partitions: Vec::new(),
|
||||
created_at: event.timestamp,
|
||||
updated_at: event.timestamp,
|
||||
}
|
||||
});
|
||||
|
||||
if let Ok(status) = BuildRequestStatus::try_from(br_event.status_code) {
|
||||
entry.status = status;
|
||||
}
|
||||
entry.updated_at = event.timestamp.max(entry.updated_at);
|
||||
if !br_event.requested_partitions.is_empty() {
|
||||
entry.requested_partitions = br_event.requested_partitions.iter()
|
||||
.map(|p| p.str.clone())
|
||||
.collect();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let mut builds: Vec<_> = build_summaries.into_values().collect();
|
||||
builds.sort_by(|a, b| b.created_at.cmp(&a.created_at)); // Most recent first
|
||||
|
||||
// Apply status filter if provided
|
||||
if let Some(status_filter) = &request.status_filter {
|
||||
if let Ok(filter_status) = status_filter.parse::<i32>() {
|
||||
if let Ok(status) = BuildRequestStatus::try_from(filter_status) {
|
||||
builds.retain(|b| b.status == status);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let total_count = builds.len() as u32;
|
||||
let offset = request.offset.unwrap_or(0) as usize;
|
||||
let limit = request.limit.unwrap_or(50) as usize;
|
||||
|
||||
let paginated_builds = builds.into_iter()
|
||||
.skip(offset)
|
||||
.take(limit)
|
||||
.map(|summary| BuildSummary {
|
||||
build_request_id: summary.build_request_id,
|
||||
status_code: summary.status as i32,
|
||||
status_name: summary.status.to_display_string(),
|
||||
requested_partitions: summary.requested_partitions.into_iter()
|
||||
.map(|s| PartitionRef { str: s })
|
||||
.collect(),
|
||||
total_jobs: 0, // TODO: Implement
|
||||
completed_jobs: 0, // TODO: Implement
|
||||
failed_jobs: 0, // TODO: Implement
|
||||
cancelled_jobs: 0, // TODO: Implement
|
||||
requested_at: summary.created_at,
|
||||
started_at: None, // TODO: Implement
|
||||
completed_at: None, // TODO: Implement
|
||||
duration_ms: None, // TODO: Implement
|
||||
cancelled: false, // TODO: Implement
|
||||
})
|
||||
.collect();
|
||||
|
||||
Ok(BuildsListResponse {
|
||||
builds: paginated_builds,
|
||||
total_count,
|
||||
has_more: (offset + limit) < total_count as usize,
|
||||
})
|
||||
}
|
||||
|
||||
/// Get activity summary for dashboard
|
||||
pub async fn get_activity_summary(&self) -> Result<ActivitySummary> {
|
||||
let builds_response = self.list_build_requests(BuildsListRequest {
|
||||
limit: Some(5),
|
||||
offset: Some(0),
|
||||
status_filter: None,
|
||||
}).await?;
|
||||
|
||||
let active_builds_count = builds_response.builds.iter()
|
||||
.filter(|b| matches!(
|
||||
BuildRequestStatus::try_from(b.status_code).unwrap_or(BuildRequestStatus::BuildRequestUnknown),
|
||||
BuildRequestStatus::BuildRequestReceived |
|
||||
BuildRequestStatus::BuildRequestPlanning |
|
||||
BuildRequestStatus::BuildRequestExecuting |
|
||||
BuildRequestStatus::BuildRequestAnalysisCompleted
|
||||
))
|
||||
.count() as u32;
|
||||
|
||||
let recent_builds = builds_response.builds.into_iter()
|
||||
.map(|b| BuildRequestSummary {
|
||||
build_request_id: b.build_request_id,
|
||||
status: BuildRequestStatus::try_from(b.status_code).unwrap_or(BuildRequestStatus::BuildRequestUnknown),
|
||||
requested_partitions: b.requested_partitions.into_iter().map(|p| p.str).collect(),
|
||||
created_at: b.requested_at,
|
||||
updated_at: b.completed_at.unwrap_or(b.requested_at),
|
||||
})
|
||||
.collect();
|
||||
|
||||
// For partitions, we'd need a separate implementation
|
||||
let recent_partitions = Vec::new(); // TODO: Implement partition listing
|
||||
|
||||
Ok(ActivitySummary {
|
||||
active_builds_count,
|
||||
recent_builds,
|
||||
recent_partitions,
|
||||
total_partitions_count: 0, // TODO: Implement
|
||||
})
|
||||
}
|
||||
|
||||
/// Helper to aggregate partition status from events
|
||||
fn aggregate_partition_status(&self, events: &[BuildEvent]) -> Result<Option<(PartitionStatus, i64)>> {
|
||||
let mut latest_status = None;
|
||||
let mut latest_timestamp = 0i64;
|
||||
|
||||
// Look for the most recent partition event for this partition
|
||||
for event in events {
|
||||
if let Some(crate::build_event::EventType::PartitionEvent(p_event)) = &event.event_type {
|
||||
if event.timestamp >= latest_timestamp {
|
||||
if let Ok(status) = PartitionStatus::try_from(p_event.status_code) {
|
||||
latest_status = Some(status);
|
||||
latest_timestamp = event.timestamp;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(latest_status.map(|status| (status, latest_timestamp)))
|
||||
}
|
||||
|
||||
/// Get build request ID that created an available partition
|
||||
pub async fn get_build_request_for_available_partition(&self, partition_ref: &str) -> Result<Option<String>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![partition_ref.to_string()],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
|
||||
// Find the most recent PARTITION_AVAILABLE event
|
||||
let mut latest_available_build_id = None;
|
||||
let mut latest_timestamp = 0i64;
|
||||
|
||||
for event in events.events {
|
||||
if let Some(crate::build_event::EventType::PartitionEvent(p_event)) = &event.event_type {
|
||||
if let Some(partition_event_ref) = &p_event.partition_ref {
|
||||
if partition_event_ref.str == partition_ref {
|
||||
if let Ok(status) = PartitionStatus::try_from(p_event.status_code) {
|
||||
if status == PartitionStatus::PartitionAvailable && event.timestamp >= latest_timestamp {
|
||||
latest_available_build_id = Some(event.build_request_id.clone());
|
||||
latest_timestamp = event.timestamp;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(latest_available_build_id)
|
||||
}
|
||||
|
||||
/// Append an event to storage
|
||||
pub async fn append_event(&self, event: BuildEvent) -> Result<i64> {
|
||||
self.storage.append_event(event).await
|
||||
}
|
||||
|
||||
/// Get all events for a specific partition
|
||||
pub async fn get_partition_events(&self, partition_ref: &str, _limit: Option<u32>) -> Result<Vec<BuildEvent>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![partition_ref.to_string()],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
Ok(events.events)
|
||||
}
|
||||
|
||||
/// Execute a raw SQL query (for backwards compatibility)
|
||||
pub async fn execute_query(&self, _query: &str) -> Result<QueryResult> {
|
||||
// TODO: Implement SQL query execution if needed
|
||||
// For now, return empty result to avoid compilation errors
|
||||
Ok(QueryResult {
|
||||
columns: vec![],
|
||||
rows: vec![],
|
||||
})
|
||||
}
|
||||
|
||||
/// Get all events in a timestamp range
|
||||
pub async fn get_events_in_range(&self, _start: i64, _end: i64) -> Result<Vec<BuildEvent>> {
|
||||
// TODO: Implement range filtering
|
||||
// For now, get all events
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
Ok(events.events)
|
||||
}
|
||||
|
||||
/// Get all events for a specific job run
|
||||
pub async fn get_job_run_events(&self, job_run_id: &str) -> Result<Vec<BuildEvent>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![job_run_id.to_string()],
|
||||
build_request_ids: vec![],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
Ok(events.events)
|
||||
}
|
||||
|
||||
/// Get all events for a specific build request
|
||||
pub async fn get_build_request_events(&self, build_request_id: &str, _limit: Option<u32>) -> Result<Vec<BuildEvent>> {
|
||||
let filter = EventFilter {
|
||||
partition_refs: vec![],
|
||||
partition_patterns: vec![],
|
||||
job_labels: vec![],
|
||||
job_run_ids: vec![],
|
||||
build_request_ids: vec![build_request_id.to_string()],
|
||||
};
|
||||
|
||||
let events = self.storage.list_events(0, filter).await?;
|
||||
Ok(events.events)
|
||||
}
|
||||
}

@@ -1,154 +0,0 @@
use super::*;
|
||||
use super::storage::BELStorage;
|
||||
use async_trait::async_trait;
|
||||
use rusqlite::{params, Connection};
|
||||
use std::path::Path;
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
pub struct SqliteBELStorage {
|
||||
connection: Arc<Mutex<Connection>>,
|
||||
}
|
||||
|
||||
impl SqliteBELStorage {
|
||||
pub fn new(path: &str) -> Result<Self> {
|
||||
// Create parent directory if it doesn't exist
|
||||
if let Some(parent) = Path::new(path).parent() {
|
||||
std::fs::create_dir_all(parent)
|
||||
.map_err(|e| BuildEventLogError::ConnectionError(
|
||||
format!("Failed to create directory {}: {}", parent.display(), e)
|
||||
))?;
|
||||
}
|
||||
|
||||
let conn = Connection::open(path)
|
||||
.map_err(|e| BuildEventLogError::ConnectionError(e.to_string()))?;
|
||||
|
||||
Ok(Self {
|
||||
connection: Arc::new(Mutex::new(conn)),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl BELStorage for SqliteBELStorage {
|
||||
async fn append_event(&self, event: BuildEvent) -> Result<i64> {
|
||||
let serialized = serde_json::to_string(&event)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
|
||||
let conn = self.connection.lock().unwrap();
|
||||
let _row_id = conn.execute(
|
||||
"INSERT INTO build_events (event_data) VALUES (?)",
|
||||
params![serialized],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
Ok(conn.last_insert_rowid())
|
||||
}
|
||||
|
||||
async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
|
||||
// For simplicity in the initial implementation, we'll do basic filtering
|
||||
// More sophisticated JSON path filtering can be added later if needed
|
||||
let mut query = "SELECT rowid, event_data FROM build_events WHERE rowid > ?".to_string();
|
||||
let mut params_vec = vec![since_idx.to_string()];
|
||||
|
||||
// Add build request ID filter if provided
|
||||
if !filter.build_request_ids.is_empty() {
|
||||
query.push_str(" AND (");
|
||||
for (i, build_id) in filter.build_request_ids.iter().enumerate() {
|
||||
if i > 0 { query.push_str(" OR "); }
|
||||
query.push_str("JSON_EXTRACT(event_data, '$.build_request_id') = ?");
|
||||
params_vec.push(build_id.clone());
|
||||
}
|
||||
query.push_str(")");
|
||||
}
|
||||
|
||||
// Add ordering and pagination
|
||||
query.push_str(" ORDER BY rowid ASC LIMIT 1000");
|
||||
|
||||
let mut stmt = conn.prepare(&query)
|
||||
.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
// Convert params to rusqlite params
|
||||
let param_refs: Vec<&dyn rusqlite::ToSql> = params_vec.iter()
|
||||
.map(|p| p as &dyn rusqlite::ToSql)
|
||||
.collect();
|
||||
|
||||
let rows = stmt.query_map(&param_refs[..], |row| {
|
||||
let rowid: i64 = row.get(0)?;
|
||||
let event_data: String = row.get(1)?;
|
||||
Ok((rowid, event_data))
|
||||
}).map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let mut events = Vec::new();
|
||||
let mut max_idx = since_idx;
|
||||
|
||||
for row in rows {
|
||||
let (rowid, event_data) = row.map_err(|e| BuildEventLogError::QueryError(e.to_string()))?;
|
||||
|
||||
let event: BuildEvent = serde_json::from_str(&event_data)
|
||||
.map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;
|
||||
|
||||
// Apply additional filtering in memory for now
|
||||
let mut include_event = true;
|
||||
|
||||
if !filter.partition_refs.is_empty() {
|
||||
include_event = false;
|
||||
if let Some(event_type) = &event.event_type {
|
||||
if let crate::build_event::EventType::PartitionEvent(pe) = event_type {
|
||||
if let Some(partition_ref) = &pe.partition_ref {
|
||||
if filter.partition_refs.contains(&partition_ref.str) {
|
||||
include_event = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if !filter.job_run_ids.is_empty() && include_event {
|
||||
include_event = false;
|
||||
if let Some(event_type) = &event.event_type {
|
||||
if let crate::build_event::EventType::JobEvent(je) = event_type {
|
||||
if filter.job_run_ids.contains(&je.job_run_id) {
|
||||
include_event = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if include_event {
|
||||
events.push(event);
|
||||
max_idx = rowid;
|
||||
}
|
||||
}
|
||||
|
||||
let has_more = events.len() >= 1000; // If we got the max limit, there might be more
|
||||
|
||||
Ok(EventPage {
|
||||
events,
|
||||
next_idx: max_idx,
|
||||
has_more,
|
||||
})
|
||||
}
|
||||
|
||||
async fn initialize(&self) -> Result<()> {
|
||||
let conn = self.connection.lock().unwrap();
|
||||
|
||||
conn.execute(
|
||||
"CREATE TABLE IF NOT EXISTS build_events (
|
||||
rowid INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
event_data TEXT NOT NULL
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
// Create index for efficient JSON queries
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_build_request_id ON build_events(
|
||||
JSON_EXTRACT(event_data, '$.build_request_id')
|
||||
)",
|
||||
[],
|
||||
).map_err(|e| BuildEventLogError::DatabaseError(e.to_string()))?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}

@@ -1,75 +0,0 @@
use crate::*;
use async_trait::async_trait;
use super::Result;

/// Simple stdout storage backend for debugging
pub struct StdoutBELStorage;

impl StdoutBELStorage {
    pub fn new() -> Self {
        Self
    }
}

#[async_trait]
impl BELStorage for StdoutBELStorage {
    async fn append_event(&self, event: BuildEvent) -> Result<i64> {
        let json = serde_json::to_string(&event)
            .map_err(|e| BuildEventLogError::SerializationError(e.to_string()))?;

        println!("BUILD_EVENT: {}", json);
        Ok(0) // Return dummy index for stdout
    }

    async fn list_events(&self, _since_idx: i64, _filter: EventFilter) -> Result<EventPage> {
        // Stdout implementation doesn't support querying
        Err(BuildEventLogError::QueryError(
            "Stdout storage backend doesn't support querying".to_string()
        ))
    }

    async fn initialize(&self) -> Result<()> {
        Ok(()) // Nothing to initialize for stdout
    }
}

/// Minimal append-only interface optimized for sequential scanning
#[async_trait]
pub trait BELStorage: Send + Sync {
    /// Append a single event, returns the sequential index
    async fn append_event(&self, event: BuildEvent) -> Result<i64>;

    /// List events with filtering, starting from a given index
    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage>;

    /// Initialize storage backend (create tables, etc.)
    async fn initialize(&self) -> Result<()>;
}
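The trait above is the entire surface a backend has to satisfy. As a rough illustration (an editorial sketch, not part of this changeset), a hypothetical in-memory backend could look like the following; it assumes `BuildEvent: Clone`, reuses the `async_trait` import above, and honours only the `build_request_ids` filter field for brevity.

```rust
use std::sync::Mutex;

/// Illustrative in-memory backend: an append-only Vec behind a Mutex.
pub struct MemoryBELStorage {
    events: Mutex<Vec<BuildEvent>>,
}

impl MemoryBELStorage {
    pub fn new() -> Self {
        Self { events: Mutex::new(Vec::new()) }
    }
}

#[async_trait]
impl BELStorage for MemoryBELStorage {
    async fn append_event(&self, event: BuildEvent) -> Result<i64> {
        let mut events = self.events.lock().unwrap();
        events.push(event);
        // The sequential index is the 1-based position in the log.
        Ok(events.len() as i64)
    }

    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage> {
        let events = self.events.lock().unwrap();
        let matching: Vec<BuildEvent> = events
            .iter()
            .enumerate()
            // Skip everything at or before the caller's last-seen index.
            .filter(|(i, _)| (*i as i64 + 1) > since_idx)
            // Apply only the build_request_ids filter in this sketch.
            .filter(|(_, e)| {
                filter.build_request_ids.is_empty()
                    || filter.build_request_ids.contains(&e.build_request_id)
            })
            .map(|(_, e)| e.clone())
            .collect();
        Ok(EventPage {
            events: matching,
            next_idx: events.len() as i64,
            has_more: false,
        })
    }

    async fn initialize(&self) -> Result<()> {
        Ok(())
    }
}
```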

/// Factory function to create storage backends from URI
pub async fn create_bel_storage(uri: &str) -> Result<Box<dyn BELStorage>> {
    if uri == "stdout" {
        Ok(Box::new(StdoutBELStorage::new()))
    } else if uri.starts_with("sqlite://") {
        let path = &uri[9..]; // Remove "sqlite://" prefix
        let storage = crate::event_log::sqlite_storage::SqliteBELStorage::new(path)?;
        storage.initialize().await?;
        Ok(Box::new(storage))
    } else if uri.starts_with("postgres://") {
        // TODO: Implement PostgresBELStorage
        Err(BuildEventLogError::ConnectionError(
            "PostgreSQL storage backend not yet implemented".to_string()
        ))
    } else {
        Err(BuildEventLogError::ConnectionError(
            format!("Unsupported build event log URI: {}", uri)
        ))
    }
}

/// Factory function to create query engine from URI
pub async fn create_bel_query_engine(uri: &str) -> Result<std::sync::Arc<crate::event_log::query_engine::BELQueryEngine>> {
    let storage = create_bel_storage(uri).await?;
    let storage_arc = std::sync::Arc::from(storage);
    Ok(std::sync::Arc::new(crate::event_log::query_engine::BELQueryEngine::new(storage_arc)))
}
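A caller normally goes through these factories rather than constructing a backend directly. A minimal usage sketch (the URI path and build request id below are illustrative, not taken from this changeset):

```rust
use databuild::event_log::{create_bel_query_engine, Result};

async fn open_event_log() -> Result<()> {
    // "stdout" and "sqlite://<path>" are the URI schemes handled by the factory above.
    let query_engine = create_bel_query_engine("sqlite:///tmp/databuild/bel.db").await?;
    let events = query_engine.get_build_request_events("some-build-id", None).await?;
    println!("replaying {} events for this build request", events.len());
    Ok(())
}
```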

@@ -1,457 +0,0 @@
use crate::*;
|
||||
use crate::event_log::{BuildEventLogError, Result, create_build_event, current_timestamp_nanos, generate_event_id, query_engine::BELQueryEngine};
|
||||
use std::sync::Arc;
|
||||
use log::debug;
|
||||
|
||||
/// Common interface for writing events to the build event log with validation
|
||||
pub struct EventWriter {
|
||||
query_engine: Arc<BELQueryEngine>,
|
||||
}
|
||||
|
||||
impl EventWriter {
|
||||
/// Create a new EventWriter with the specified query engine
|
||||
pub fn new(query_engine: Arc<BELQueryEngine>) -> Self {
|
||||
Self { query_engine }
|
||||
}
|
||||
|
||||
/// Append an event directly to the event log
|
||||
pub async fn append_event(&self, event: BuildEvent) -> Result<()> {
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Get access to the underlying query engine for direct operations
|
||||
pub fn query_engine(&self) -> &BELQueryEngine {
|
||||
self.query_engine.as_ref()
|
||||
}
|
||||
|
||||
/// Request a new build for the specified partitions
|
||||
pub async fn request_build(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
) -> Result<()> {
|
||||
debug!("Writing build request event for build: {}", build_request_id);
|
||||
|
||||
let event = create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestReceived as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestReceived.to_display_string(),
|
||||
requested_partitions,
|
||||
message: "Build request received".to_string(),
|
||||
}),
|
||||
);
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Update build request status
|
||||
pub async fn update_build_status(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
status: BuildRequestStatus,
|
||||
message: String,
|
||||
) -> Result<()> {
|
||||
debug!("Updating build status for {}: {:?}", build_request_id, status);
|
||||
|
||||
let event = create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
requested_partitions: vec![],
|
||||
message,
|
||||
}),
|
||||
);
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Update build request status with partition list
|
||||
pub async fn update_build_status_with_partitions(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
status: BuildRequestStatus,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
message: String,
|
||||
) -> Result<()> {
|
||||
debug!("Updating build status for {}: {:?}", build_request_id, status);
|
||||
|
||||
let event = create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
requested_partitions,
|
||||
message,
|
||||
}),
|
||||
);
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Update partition status
|
||||
pub async fn update_partition_status(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
partition_ref: PartitionRef,
|
||||
status: PartitionStatus,
|
||||
message: String,
|
||||
job_run_id: Option<String>,
|
||||
) -> Result<()> {
|
||||
debug!("Updating partition status for {}: {:?}", partition_ref.str, status);
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::PartitionEvent(PartitionEvent {
|
||||
partition_ref: Some(partition_ref),
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
message,
|
||||
job_run_id: job_run_id.unwrap_or_default(),
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Invalidate a partition with a reason
|
||||
pub async fn invalidate_partition(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
partition_ref: PartitionRef,
|
||||
reason: String,
|
||||
) -> Result<()> {
|
||||
// First validate that the partition exists by checking its current status
|
||||
let current_status = self.query_engine.get_latest_partition_status(&partition_ref.str).await?;
|
||||
|
||||
if current_status.is_none() {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot invalidate non-existent partition: {}", partition_ref.str)
|
||||
));
|
||||
}
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::PartitionInvalidationEvent(
|
||||
PartitionInvalidationEvent {
|
||||
partition_ref: Some(partition_ref),
|
||||
reason,
|
||||
}
|
||||
)),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Schedule a job for execution
|
||||
pub async fn schedule_job(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
job_run_id: String,
|
||||
job_label: JobLabel,
|
||||
target_partitions: Vec<PartitionRef>,
|
||||
config: JobConfig,
|
||||
) -> Result<()> {
|
||||
debug!("Scheduling job {} for partitions: {:?}", job_label.label, target_partitions);
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobEvent(JobEvent {
|
||||
job_run_id,
|
||||
job_label: Some(job_label),
|
||||
target_partitions,
|
||||
status_code: JobStatus::JobScheduled as i32,
|
||||
status_name: JobStatus::JobScheduled.to_display_string(),
|
||||
message: "Job scheduled for execution".to_string(),
|
||||
config: Some(config),
|
||||
manifests: vec![],
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Update job status
|
||||
pub async fn update_job_status(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
job_run_id: String,
|
||||
job_label: JobLabel,
|
||||
target_partitions: Vec<PartitionRef>,
|
||||
status: JobStatus,
|
||||
message: String,
|
||||
manifests: Vec<PartitionManifest>,
|
||||
) -> Result<()> {
|
||||
debug!("Updating job {} status to {:?}", job_run_id, status);
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobEvent(JobEvent {
|
||||
job_run_id,
|
||||
job_label: Some(job_label),
|
||||
target_partitions,
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
message,
|
||||
config: None,
|
||||
manifests,
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Cancel a task (job run) with a reason
|
||||
pub async fn cancel_task(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
job_run_id: String,
|
||||
reason: String,
|
||||
) -> Result<()> {
|
||||
// Validate that the job run exists and is in a cancellable state
|
||||
let job_events = self.query_engine.get_job_run_events(&job_run_id).await?;
|
||||
|
||||
if job_events.is_empty() {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel non-existent job run: {}", job_run_id)
|
||||
));
|
||||
}
|
||||
|
||||
// Find the latest job status
|
||||
let latest_status = job_events.iter()
|
||||
.rev()
|
||||
.find_map(|e| match &e.event_type {
|
||||
Some(build_event::EventType::JobEvent(job)) => Some(job.status_code),
|
||||
_ => None,
|
||||
});
|
||||
|
||||
match latest_status {
|
||||
Some(status) if status == JobStatus::JobCompleted as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel completed job run: {}", job_run_id)
|
||||
));
|
||||
}
|
||||
Some(status) if status == JobStatus::JobFailed as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel failed job run: {}", job_run_id)
|
||||
));
|
||||
}
|
||||
Some(status) if status == JobStatus::JobCancelled as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Job run already cancelled: {}", job_run_id)
|
||||
));
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobRunCancelEvent(JobRunCancelEvent {
|
||||
job_run_id,
|
||||
reason,
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Cancel a build request with a reason
|
||||
pub async fn cancel_build(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
reason: String,
|
||||
) -> Result<()> {
|
||||
// Validate that the build exists and is in a cancellable state
|
||||
let build_events = self.query_engine.get_build_request_events(&build_request_id, None).await?;
|
||||
|
||||
if build_events.is_empty() {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel non-existent build: {}", build_request_id)
|
||||
));
|
||||
}
|
||||
|
||||
// Find the latest build status
|
||||
let latest_status = build_events.iter()
|
||||
.rev()
|
||||
.find_map(|e| match &e.event_type {
|
||||
Some(build_event::EventType::BuildRequestEvent(br)) => Some(br.status_code),
|
||||
_ => None,
|
||||
});
|
||||
|
||||
match latest_status {
|
||||
Some(status) if status == BuildRequestStatus::BuildRequestCompleted as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel completed build: {}", build_request_id)
|
||||
));
|
||||
}
|
||||
Some(status) if status == BuildRequestStatus::BuildRequestFailed as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Cannot cancel failed build: {}", build_request_id)
|
||||
));
|
||||
}
|
||||
Some(status) if status == BuildRequestStatus::BuildRequestCancelled as i32 => {
|
||||
return Err(BuildEventLogError::QueryError(
|
||||
format!("Build already cancelled: {}", build_request_id)
|
||||
));
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id: build_request_id.clone(),
|
||||
event_type: Some(build_event::EventType::BuildCancelEvent(BuildCancelEvent {
|
||||
reason,
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())?;
|
||||
|
||||
// Also emit a build request status update
|
||||
self.update_build_status(
|
||||
build_request_id,
|
||||
BuildRequestStatus::BuildRequestCancelled,
|
||||
"Build cancelled by user".to_string(),
|
||||
).await
|
||||
}
|
||||
|
||||
/// Record a delegation event when a partition build is delegated to another build
|
||||
pub async fn record_delegation(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
partition_ref: PartitionRef,
|
||||
delegated_to_build_request_id: String,
|
||||
message: String,
|
||||
) -> Result<()> {
|
||||
debug!("Recording delegation of {} to build {}", partition_ref.str, delegated_to_build_request_id);
|
||||
|
||||
let event = create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::DelegationEvent(DelegationEvent {
|
||||
partition_ref: Some(partition_ref),
|
||||
delegated_to_build_request_id,
|
||||
message,
|
||||
}),
|
||||
);
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
|
||||
/// Record the analyzed job graph
|
||||
pub async fn record_job_graph(
|
||||
&self,
|
||||
build_request_id: String,
|
||||
job_graph: JobGraph,
|
||||
message: String,
|
||||
) -> Result<()> {
|
||||
debug!("Recording job graph for build: {}", build_request_id);
|
||||
|
||||
let event = BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobGraphEvent(JobGraphEvent {
|
||||
job_graph: Some(job_graph),
|
||||
message,
|
||||
})),
|
||||
};
|
||||
|
||||
self.query_engine.append_event(event).await.map(|_| ())
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::event_log::mock::create_mock_bel_query_engine;
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_event_writer_build_lifecycle() {
|
||||
let query_engine = create_mock_bel_query_engine().await.unwrap();
|
||||
let writer = EventWriter::new(query_engine);
|
||||
|
||||
let build_id = "test-build-123".to_string();
|
||||
let partitions = vec![PartitionRef { str: "test/partition".to_string() }];
|
||||
|
||||
// Test build request
|
||||
writer.request_build(build_id.clone(), partitions.clone()).await.unwrap();
|
||||
|
||||
// Test status updates
|
||||
writer.update_build_status(
|
||||
build_id.clone(),
|
||||
BuildRequestStatus::BuildRequestPlanning,
|
||||
"Starting planning".to_string(),
|
||||
).await.unwrap();
|
||||
|
||||
writer.update_build_status(
|
||||
build_id.clone(),
|
||||
BuildRequestStatus::BuildRequestExecuting,
|
||||
"Starting execution".to_string(),
|
||||
).await.unwrap();
|
||||
|
||||
writer.update_build_status(
|
||||
build_id.clone(),
|
||||
BuildRequestStatus::BuildRequestCompleted,
|
||||
"Build completed successfully".to_string(),
|
||||
).await.unwrap();
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_event_writer_partition_and_job() {
|
||||
let query_engine = create_mock_bel_query_engine().await.unwrap();
|
||||
let writer = EventWriter::new(query_engine);
|
||||
|
||||
let build_id = "test-build-456".to_string();
|
||||
let partition = PartitionRef { str: "data/users".to_string() };
|
||||
let job_run_id = "job-run-789".to_string();
|
||||
let job_label = JobLabel { label: "//:test_job".to_string() };
|
||||
|
||||
// Test partition status update
|
||||
writer.update_partition_status(
|
||||
build_id.clone(),
|
||||
partition.clone(),
|
||||
PartitionStatus::PartitionBuilding,
|
||||
"Building partition".to_string(),
|
||||
Some(job_run_id.clone()),
|
||||
).await.unwrap();
|
||||
|
||||
// Test job scheduling
|
||||
let config = JobConfig {
|
||||
outputs: vec![partition.clone()],
|
||||
inputs: vec![],
|
||||
args: vec!["test".to_string()],
|
||||
env: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
writer.schedule_job(
|
||||
build_id.clone(),
|
||||
job_run_id.clone(),
|
||||
job_label.clone(),
|
||||
vec![partition.clone()],
|
||||
config,
|
||||
).await.unwrap();
|
||||
|
||||
// Test job status update
|
||||
writer.update_job_status(
|
||||
build_id.clone(),
|
||||
job_run_id,
|
||||
job_label,
|
||||
vec![partition],
|
||||
JobStatus::JobCompleted,
|
||||
"Job completed successfully".to_string(),
|
||||
vec![],
|
||||
).await.unwrap();
|
||||
}
|
||||
}

databuild/event_transforms.rs (new file, 269 lines)
@@ -0,0 +1,269 @@
use crate::PartitionStatusCode::{PartitionFailed, PartitionLive};
use crate::data_build_event::Event;
use crate::job_run_state::{JobInfo, JobRunWithState, QueuedState, TimingInfo};
use crate::util::current_timestamp;
use crate::want_create_event_v1::Lifetime;
use crate::{
    CancelWantRequest, CancelWantResponse, CreateTaintRequest, CreateTaintResponse,
    CreateWantRequest, CreateWantResponse, EventSource, GetWantResponse, JobRunBufferEventV1,
    JobRunDetail, JobRunStatus, JobRunStatusCode, JobTriggeredEvent, ManuallyTriggeredEvent,
    OriginatingLifetime, PartitionDetail, PartitionRef, PartitionStatus, PartitionStatusCode,
    TaintCancelEventV1, TaintCreateEventV1, TaintDetail, WantAttributedPartitions,
    WantCancelEventV1, WantCreateEventV1, WantDetail, WantStatus, WantStatusCode, event_source,
};
use uuid::Uuid;

impl From<&WantCreateEventV1> for WantDetail {
    fn from(e: &WantCreateEventV1) -> Self {
        e.clone().into()
    }
}
impl From<WantCreateEventV1> for WantDetail {
    fn from(e: WantCreateEventV1) -> Self {
        // Convert want_create_event_v1::Lifetime to want_detail::Lifetime
        let lifetime = e.lifetime.map(|l| match l {
            Lifetime::Originating(orig) => crate::want_detail::Lifetime::Originating(orig),
            Lifetime::Ephemeral(eph) => crate::want_detail::Lifetime::Ephemeral(eph),
        });

        WantDetail {
            want_id: e.want_id,
            partitions: e.partitions,
            upstreams: vec![],
            lifetime,
            comment: e.comment,
            status: Some(WantStatusCode::WantIdle.into()),
            last_updated_timestamp: current_timestamp(),
            job_run_ids: vec![],
            derivative_want_ids: vec![],
            job_runs: vec![],
        }
    }
}
impl From<WantCreateEventV1> for Event {
    fn from(value: WantCreateEventV1) -> Self {
        Event::WantCreateV1(value)
    }
}
impl From<WantCancelEventV1> for Event {
    fn from(value: WantCancelEventV1) -> Self {
        Event::WantCancelV1(value)
    }
}
impl From<TaintCreateEventV1> for Event {
    fn from(value: TaintCreateEventV1) -> Self {
        Event::TaintCreateV1(value)
    }
}
impl From<TaintCancelEventV1> for Event {
    fn from(value: TaintCancelEventV1) -> Self {
        Event::TaintCancelV1(value)
    }
}

impl From<WantCreateEventV1> for WantAttributedPartitions {
    fn from(value: WantCreateEventV1) -> Self {
        Self {
            want_id: value.want_id,
            partitions: value.partitions,
        }
    }
}

impl From<WantStatusCode> for WantStatus {
    fn from(code: WantStatusCode) -> Self {
        WantStatus {
            code: code.into(),
            name: code.as_str_name().to_string(),
        }
    }
}

impl From<JobRunBufferEventV1> for JobRunDetail {
    fn from(value: JobRunBufferEventV1) -> Self {
        use std::collections::HashMap;
        Self {
            id: value.job_run_id,
            job_label: value.job_label,
            status: Some(JobRunStatusCode::JobRunQueued.into()),
            last_heartbeat_at: None,
            building_partitions: value.building_partitions,
            servicing_wants: value.want_attributed_partitions,
            read_deps: vec![],
            read_partition_uuids: HashMap::new(),
            wrote_partition_uuids: HashMap::new(),
            derivative_want_ids: vec![],
            queued_at: Some(current_timestamp()),
            started_at: None,
        }
    }
}

impl From<JobRunBufferEventV1> for JobRunWithState<QueuedState> {
    fn from(event: JobRunBufferEventV1) -> Self {
        let queued_at = current_timestamp();
        JobRunWithState {
            info: JobInfo {
                id: event.job_run_id,
                job_label: event.job_label,
                building_partitions: event.building_partitions,
                servicing_wants: event.want_attributed_partitions,
            },
            timing: TimingInfo {
                queued_at,
                started_at: None,
            },
            state: QueuedState { queued_at },
        }
    }
}

pub fn want_status_matches_any(
    pds: &Vec<Option<PartitionDetail>>,
    status: PartitionStatusCode,
) -> bool {
    pds.iter().any(|pd| {
        pd.clone()
            .map(|pd| pd.status == Some(status.into()))
            .unwrap_or(false)
    })
}

pub fn want_status_matches_all(
    pds: &Vec<Option<PartitionDetail>>,
    status: PartitionStatusCode,
) -> bool {
    pds.iter().all(|pd| {
        pd.clone()
            .map(|pd| pd.status == Some(status.into()))
            .unwrap_or(false)
    })
}

/// Merges a list of partition details into a single status code.
/// Takes the lowest state as the want status.
impl Into<WantStatusCode> for Vec<Option<PartitionDetail>> {
    fn into(self) -> WantStatusCode {
        if want_status_matches_any(&self, PartitionFailed) {
            WantStatusCode::WantFailed
        } else if want_status_matches_all(&self, PartitionLive) {
            WantStatusCode::WantSuccessful
        } else if self.iter().any(|pd| pd.is_none()) {
            WantStatusCode::WantBuilding
        } else {
            WantStatusCode::WantIdle
        }
    }
}
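To make the merge precedence above concrete, here is a hypothetical test-style sketch (an editorial illustration, not part of this changeset). It assumes `PartitionDetail` derives `Default` and the status enums derive `PartialEq`, which prost-generated types normally do:

```rust
#[cfg(test)]
mod want_status_merge_sketch {
    use super::*;

    // Helper: a PartitionDetail carrying only a status, everything else defaulted.
    fn detail(status: PartitionStatusCode) -> Option<PartitionDetail> {
        Some(PartitionDetail {
            status: Some(status.into()),
            ..Default::default()
        })
    }

    #[test]
    fn merge_precedence() {
        // Any failed partition fails the whole want.
        let merged: WantStatusCode = vec![detail(PartitionFailed), detail(PartitionLive)].into();
        assert_eq!(merged, WantStatusCode::WantFailed);

        // All partitions live means the want succeeded.
        let merged: WantStatusCode = vec![detail(PartitionLive), detail(PartitionLive)].into();
        assert_eq!(merged, WantStatusCode::WantSuccessful);

        // A partition with no detail yet keeps the want in the building state.
        let merged: WantStatusCode = vec![None, detail(PartitionLive)].into();
        assert_eq!(merged, WantStatusCode::WantBuilding);
    }
}
```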

impl From<&str> for PartitionRef {
    fn from(value: &str) -> Self {
        Self {
            r#ref: value.to_string(),
        }
    }
}

impl From<PartitionStatusCode> for PartitionStatus {
    fn from(code: PartitionStatusCode) -> Self {
        PartitionStatus {
            code: code.into(),
            name: code.as_str_name().to_string(),
        }
    }
}

impl From<JobRunStatusCode> for JobRunStatus {
    fn from(code: JobRunStatusCode) -> Self {
        JobRunStatus {
            code: code.into(),
            name: code.as_str_name().to_string(),
        }
    }
}

impl From<ManuallyTriggeredEvent> for EventSource {
    fn from(value: ManuallyTriggeredEvent) -> Self {
        Self {
            source: Some(event_source::Source::ManuallyTriggered(value)),
        }
    }
}

impl From<JobTriggeredEvent> for EventSource {
    fn from(value: JobTriggeredEvent) -> Self {
        Self {
            source: Some(event_source::Source::JobTriggered(value)),
        }
    }
}

impl From<&WantDetail> for WantAttributedPartitions {
    fn from(value: &WantDetail) -> Self {
        Self {
            want_id: value.want_id.clone(),
            partitions: value.partitions.clone(),
        }
    }
}

impl From<CreateWantRequest> for WantCreateEventV1 {
    fn from(value: CreateWantRequest) -> Self {
        // User-created wants are always originating (have explicit freshness requirements)
        WantCreateEventV1 {
            want_id: Uuid::new_v4().into(),
            partitions: value.partitions,
            lifetime: Some(Lifetime::Originating(OriginatingLifetime {
                data_timestamp: value.data_timestamp,
                ttl_seconds: value.ttl_seconds,
                sla_seconds: value.sla_seconds,
            })),
            comment: value.comment,
        }
    }
}

impl Into<CreateWantResponse> for Option<WantDetail> {
    fn into(self) -> CreateWantResponse {
        CreateWantResponse { data: self }
    }
}

impl Into<GetWantResponse> for Option<WantDetail> {
    fn into(self) -> GetWantResponse {
        GetWantResponse {
            data: self,
            index: None,
        }
    }
}

impl From<CancelWantRequest> for WantCancelEventV1 {
    fn from(value: CancelWantRequest) -> Self {
        WantCancelEventV1 {
            want_id: value.want_id,
            source: value.source,
            comment: value.comment,
        }
    }
}

impl Into<CancelWantResponse> for Option<WantDetail> {
    fn into(self) -> CancelWantResponse {
        CancelWantResponse { data: self }
    }
}

impl From<CreateTaintRequest> for TaintCreateEventV1 {
    fn from(value: CreateTaintRequest) -> Self {
        todo!()
    }
}

impl Into<CreateTaintResponse> for Option<TaintDetail> {
    fn into(self) -> CreateTaintResponse {
        CreateTaintResponse {
            // TODO
        }
    }
}

@@ -1,144 +0,0 @@
#[cfg(test)]
mod format_consistency_tests {
    use super::*;
    use crate::*;
    use crate::repositories::partitions::PartitionsRepository;
    use crate::event_log::mock::{create_mock_bel_query_engine_with_events, test_events};
    use std::sync::Arc;

    #[tokio::test]
    async fn test_partitions_list_json_format_consistency() {
        // Create test data
        let build_id = "test-build-123".to_string();
        let partition1 = PartitionRef { str: "data/users".to_string() };
        let partition2 = PartitionRef { str: "data/orders".to_string() };

        let events = vec![
            test_events::build_request_received(Some(build_id.clone()), vec![partition1.clone(), partition2.clone()]),
            test_events::partition_status(Some(build_id.clone()), partition1.clone(), PartitionStatus::PartitionBuilding, None),
            test_events::partition_status(Some(build_id.clone()), partition1.clone(), PartitionStatus::PartitionAvailable, None),
            test_events::partition_status(Some(build_id.clone()), partition2.clone(), PartitionStatus::PartitionBuilding, None),
            test_events::partition_status(Some(build_id.clone()), partition2.clone(), PartitionStatus::PartitionFailed, None),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repository = PartitionsRepository::new(query_engine);

        // Test the new unified protobuf format
        let request = PartitionsListRequest {
            limit: Some(10),
            offset: None,
            status_filter: None,
        };

        let response = repository.list_protobuf(request).await.unwrap();

        // Serialize to JSON and verify structure
        let json_value = serde_json::to_value(&response).unwrap();

        // Verify top-level structure matches expected protobuf schema
        assert!(json_value.get("partitions").is_some());
        assert!(json_value.get("total_count").is_some());
        assert!(json_value.get("has_more").is_some());

        let partitions = json_value["partitions"].as_array().unwrap();
        assert_eq!(partitions.len(), 2);

        // Verify each partition has dual status fields
        for partition in partitions {
            assert!(partition.get("partition_ref").is_some());
            assert!(partition.get("status_code").is_some(), "Missing status_code field");
            assert!(partition.get("status_name").is_some(), "Missing status_name field");
            assert!(partition.get("last_updated").is_some());
            assert!(partition.get("builds_count").is_some());
            assert!(partition.get("invalidation_count").is_some());

            // Verify status fields are consistent
            let status_code = partition["status_code"].as_i64().unwrap();
            let status_name = partition["status_name"].as_str().unwrap();

            // Map status codes to expected names
            let expected_name = match status_code {
                1 => "requested",
                2 => "analyzed",
                3 => "building",
                4 => "available",
                5 => "failed",
                6 => "delegated",
                _ => "unknown",
            };

            // Find the partition by status to verify correct mapping
            if status_name == "available" {
                assert_eq!(status_code, 4, "Available status should have code 4");
            } else if status_name == "failed" {
                assert_eq!(status_code, 5, "Failed status should have code 5");
            }
        }

        // Verify JSON serialization produces expected field names (snake_case for JSON)
        let json_str = serde_json::to_string_pretty(&response).unwrap();
        assert!(json_str.contains("\"partitions\""));
        assert!(json_str.contains("\"total_count\""));
        assert!(json_str.contains("\"has_more\""));
        assert!(json_str.contains("\"partition_ref\""));
        assert!(json_str.contains("\"status_code\""));
        assert!(json_str.contains("\"status_name\""));
        assert!(json_str.contains("\"last_updated\""));
        assert!(json_str.contains("\"builds_count\""));
        assert!(json_str.contains("\"invalidation_count\""));

        println!("✅ Partitions list JSON format test passed");
        println!("Sample JSON output:\n{}", json_str);
    }

    #[tokio::test]
    async fn test_status_conversion_utilities() {
        use crate::status_utils::*;

        // Test PartitionStatus conversions
        let status = PartitionStatus::PartitionAvailable;
        assert_eq!(status.to_display_string(), "available");
        assert_eq!(PartitionStatus::from_display_string("available"), Some(status));

        // Test JobStatus conversions
        let job_status = JobStatus::JobCompleted;
        assert_eq!(job_status.to_display_string(), "completed");
        assert_eq!(JobStatus::from_display_string("completed"), Some(job_status));

        // Test BuildRequestStatus conversions
        let build_status = BuildRequestStatus::BuildRequestCompleted;
        assert_eq!(build_status.to_display_string(), "completed");
        assert_eq!(BuildRequestStatus::from_display_string("completed"), Some(build_status));

        // Test invalid conversions
        assert_eq!(PartitionStatus::from_display_string("invalid"), None);

        println!("✅ Status conversion utilities test passed");
    }

    #[test]
    fn test_protobuf_response_helper_functions() {
        use crate::status_utils::list_response_helpers::*;

        // Test PartitionSummary creation
        let summary = create_partition_summary(
            PartitionRef { str: "test/partition".to_string() },
            PartitionStatus::PartitionAvailable,
            1234567890,
            5,
            2,
            Some("build-123".to_string()),
        );

        assert_eq!(summary.partition_ref, Some(PartitionRef { str: "test/partition".to_string() }));
        assert_eq!(summary.status_code, 4); // PartitionAvailable = 4
        assert_eq!(summary.status_name, "available");
        assert_eq!(summary.last_updated, 1234567890);
        assert_eq!(summary.builds_count, 5);
        assert_eq!(summary.invalidation_count, 2);
        assert_eq!(summary.last_successful_build, Some("build-123".to_string()));

        println!("✅ Protobuf response helper functions test passed");
    }
}

@@ -1,43 +0,0 @@
load("@rules_rust//rust:defs.bzl", "rust_binary", "rust_library")
|
||||
|
||||
exports_files([
|
||||
"rust_analyze_wrapper.sh.tpl",
|
||||
"rust_execute_wrapper.sh.tpl",
|
||||
])
|
||||
|
||||
rust_binary(
|
||||
name = "execute",
|
||||
srcs = ["execute.rs"],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
"//databuild",
|
||||
"@crates//:clap",
|
||||
"@crates//:crossbeam-channel",
|
||||
"@crates//:log",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:simple_logger",
|
||||
"@crates//:tokio",
|
||||
"@crates//:uuid",
|
||||
],
|
||||
)
|
||||
|
||||
rust_binary(
|
||||
name = "analyze",
|
||||
srcs = ["analyze.rs"],
|
||||
edition = "2021",
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
"//databuild",
|
||||
"@crates//:clap",
|
||||
"@crates//:crossbeam-channel",
|
||||
"@crates//:log",
|
||||
"@crates//:num_cpus",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:simple_logger",
|
||||
"@crates//:tokio",
|
||||
"@crates//:uuid",
|
||||
],
|
||||
)
|

@@ -1,10 +0,0 @@
|
||||
|
||||
## Entrypoints
|
||||
|
||||
- `graph.build` - Build the requested partitions.
|
||||
- `graph.analyze` - Calculate the `JobGraph` that would produce the requested partitions.
|
||||
- `graph.mermaid` - Calculate a [mermaid](https://mermaid.js.org/syntax/flowchart.html) diagram describing the `JobGraph`.
|
||||
- `graph.serve` - Run the databuild server for this graph.
|
||||
- `graph.image` / `graph.load` - Build a deployable graph artifact and wrap it in a container. `load` registers the container locally.
|

@@ -1,648 +0,0 @@
|
||||
use std::env;
|
||||
use std::process::{Command, exit};
|
||||
use std::sync::{Arc, Mutex};
|
||||
use std::thread;
|
||||
use log::{info, error};
|
||||
use simple_logger::SimpleLogger;
|
||||
use clap::{Arg, Command as ClapCommand};
|
||||
use uuid::Uuid;
|
||||
use databuild::*;
|
||||
use databuild::event_log::{create_bel_query_engine, create_build_event};
|
||||
use databuild::mermaid_utils::generate_mermaid_diagram;
|
||||
|
||||
// Configure a job to produce the desired outputs
|
||||
fn configure(job_label: &str, output_refs: &[String]) -> Result<Vec<Task>, String> {
|
||||
let candidate_jobs_str = env::var("DATABUILD_CANDIDATE_JOBS_CFG")
|
||||
.map_err(|e| format!("Failed to get DATABUILD_CANDIDATE_JOBS_CFG: {}", e))?;
|
||||
|
||||
let job_path_map: HashMap<String, String> = serde_json::from_str(&candidate_jobs_str)
|
||||
.map_err(|e| format!("Failed to parse DATABUILD_CANDIDATE_JOBS_CFG: {}", e))?;
|
||||
|
||||
// Look up the executable path for this job
|
||||
let exec_path = job_path_map.get(job_label)
|
||||
.ok_or_else(|| format!("Job {} is not a candidate job", job_label))?;
|
||||
|
||||
// Check if executable exists
|
||||
if !std::path::Path::new(exec_path).exists() {
|
||||
return Err(format!("Executable not found at path: {}", exec_path));
|
||||
}
|
||||
|
||||
info!("Executing job configuration: {} {:?}", exec_path, output_refs);
|
||||
|
||||
// Execute the job configuration command
|
||||
let output = Command::new(exec_path)
|
||||
.args(output_refs)
|
||||
.output()
|
||||
.map_err(|e| format!("Failed to execute job config: {}", e))?;
|
||||
|
||||
if !output.status.success() {
|
||||
let stderr = String::from_utf8_lossy(&output.stderr);
|
||||
error!("Job configuration failed: {}", stderr);
|
||||
return Err(format!("Failed to run job config: {}", stderr));
|
||||
}
|
||||
|
||||
info!("Job configuration succeeded for {}", job_label);
|
||||
|
||||
// Parse the job configurations
|
||||
let stdout = String::from_utf8_lossy(&output.stdout);
|
||||
let job_configure_response: JobConfigureResponse = serde_json::from_str(&stdout)
|
||||
.map_err(|e| {
|
||||
error!("Error parsing job configs for {}: {}. `{}`", job_label, e, stdout);
|
||||
format!("Failed to parse job configs: {}", e)
|
||||
})?;
|
||||
let job_configs = job_configure_response.configs;
|
||||
|
||||
// Create tasks
|
||||
let tasks: Vec<Task> = job_configs.into_iter()
|
||||
.map(|cfg| Task {
|
||||
job: Some(JobLabel { label: job_label.to_string() }),
|
||||
config: Some(cfg),
|
||||
})
|
||||
.collect();
|
||||
|
||||
info!("Created {} tasks for job {}", tasks.len(), job_label);
|
||||
Ok(tasks)
|
||||
}
|
||||
|
||||
// Resolve produces a mapping of required job refs to the partitions it produces
|
||||
fn resolve(output_refs: &[String]) -> Result<HashMap<String, Vec<String>>, String> {
|
||||
let lookup_path = env::var("DATABUILD_JOB_LOOKUP_PATH")
|
||||
.map_err(|e| format!("Failed to get DATABUILD_JOB_LOOKUP_PATH: {}", e))?;
|
||||
|
||||
// Run the job lookup
|
||||
info!("Executing job lookup: {} {:?}", lookup_path, output_refs);
|
||||
|
||||
let output = Command::new(&lookup_path)
|
||||
.args(output_refs)
|
||||
.output()
|
||||
.map_err(|e| format!("Failed to execute job lookup: {}", e))?;
|
||||
|
||||
if !output.status.success() {
|
||||
error!("Job lookup failed: {}", output.status);
|
||||
let stderr = String::from_utf8_lossy(&output.stderr);
|
||||
error!("stderr: {}", stderr);
|
||||
let stdout = String::from_utf8_lossy(&output.stdout);
|
||||
error!("stdout: {}", stdout);
|
||||
return Err(format!("Failed to run job lookup: {}", stderr));
|
||||
}
|
||||
|
||||
info!("Job lookup succeeded for {} output refs", output_refs.len());
|
||||
|
||||
// Parse the result
|
||||
let stdout = String::from_utf8_lossy(&output.stdout);
|
||||
let result: HashMap<String, Vec<String>> = serde_json::from_str(&stdout)
|
||||
.map_err(|e| {
|
||||
error!("Error parsing job lookup result: {}", e);
|
||||
format!("Failed to parse job lookup result: {}", e)
|
||||
})?;
|
||||
|
||||
info!("Job lookup found {} job mappings", result.len());
|
||||
for (job, refs) in &result {
|
||||
info!(" Job {} produces {} refs", job, refs.len());
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
// Configure multiple jobs in parallel
|
||||
fn configure_parallel(job_refs: HashMap<String, Vec<String>>, num_workers: usize) -> Result<Vec<Task>, String> {
|
||||
// Create a channel for jobs
|
||||
let (job_sender, job_receiver) = crossbeam_channel::unbounded();
|
||||
|
||||
// Fill the jobs channel
|
||||
for (job_label, produced_refs) in job_refs {
|
||||
job_sender.send((job_label, produced_refs)).unwrap();
|
||||
}
|
||||
drop(job_sender); // Close the channel
|
||||
|
||||
// Create a channel for results
|
||||
let (task_sender, task_receiver) = crossbeam_channel::unbounded();
|
||||
let error = Arc::new(Mutex::new(None));
|
||||
|
||||
// Spawn worker threads
|
||||
let mut handles = vec![];
|
||||
for _ in 0..num_workers {
|
||||
let job_receiver = job_receiver.clone();
|
||||
let task_sender = task_sender.clone();
|
||||
let error = Arc::clone(&error);
|
||||
|
||||
let handle = thread::spawn(move || {
|
||||
for (job_label, produced_refs) in job_receiver {
|
||||
// Check if an error has already occurred
|
||||
if error.lock().unwrap().is_some() {
|
||||
return;
|
||||
}
|
||||
|
||||
match configure(&job_label, &produced_refs) {
|
||||
Ok(tasks) => {
|
||||
task_sender.send(tasks).unwrap();
|
||||
}
|
||||
Err(e) => {
|
||||
let mut error_guard = error.lock().unwrap();
|
||||
if error_guard.is_none() {
|
||||
*error_guard = Some(e);
|
||||
}
|
||||
return;
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
handles.push(handle);
|
||||
}
|
||||
|
||||
// Close the task sender
|
||||
drop(task_sender);
|
||||
|
||||
// Wait for all workers to finish
|
||||
for handle in handles {
|
||||
handle.join().unwrap();
|
||||
}
|
||||
|
||||
// Check for errors
|
||||
let error_guard = error.lock().unwrap();
|
||||
if let Some(e) = &*error_guard {
|
||||
return Err(e.clone());
|
||||
}
|
||||
|
||||
// Collect results
|
||||
let mut all_tasks = Vec::new();
|
||||
while let Ok(tasks) = task_receiver.try_recv() {
|
||||
all_tasks.extend(tasks);
|
||||
}
|
||||
|
||||
Ok(all_tasks)
|
||||
}
|
||||
|
||||
// Simple staleness check - all requested partitions need jobs created
|
||||
// Delegation optimization happens in execution phase
|
||||
async fn check_partition_staleness(
|
||||
partition_refs: &[String],
|
||||
_query_engine: &std::sync::Arc<databuild::event_log::query_engine::BELQueryEngine>,
|
||||
_build_request_id: &str
|
||||
) -> Result<(Vec<String>, Vec<String>), String> {
|
||||
// Analysis phase creates jobs for all requested partitions
|
||||
// Execution phase will handle delegation optimization
|
||||
let stale_partitions = partition_refs.to_vec();
|
||||
let delegated_partitions = Vec::new();
|
||||
|
||||
Ok((stale_partitions, delegated_partitions))
|
||||
}
|
||||
|
||||
// Plan creates a job graph for given output references
|
||||
async fn plan(
|
||||
output_refs: &[String],
|
||||
query_engine: Option<std::sync::Arc<databuild::event_log::query_engine::BELQueryEngine>>,
|
||||
build_request_id: &str
|
||||
) -> Result<JobGraph, String> {
|
||||
info!("Starting planning for {} output refs: {:?}", output_refs.len(), output_refs);
|
||||
|
||||
// Log build request received event
|
||||
if let Some(ref query_engine_ref) = query_engine {
|
||||
let event = create_build_event(
|
||||
build_request_id.to_string(),
|
||||
crate::build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestReceived as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestReceived.to_display_string(),
|
||||
requested_partitions: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
|
||||
message: "Analysis started".to_string(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine_ref.append_event(event).await {
|
||||
error!("Failed to log build request event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
// Check for partition staleness and delegation opportunities
|
||||
let (stale_refs, _delegated_refs) = if let Some(ref query_engine_ref) = query_engine {
|
||||
match check_partition_staleness(output_refs, query_engine_ref, build_request_id).await {
|
||||
Ok((stale, delegated)) => {
|
||||
info!("Staleness check: {} stale, {} delegated partitions", stale.len(), delegated.len());
|
||||
(stale, delegated)
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Failed to check partition staleness: {}", e);
|
||||
// Fall back to building all partitions
|
||||
(output_refs.to_vec(), Vec::new())
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// No event log, build all partitions
|
||||
(output_refs.to_vec(), Vec::new())
|
||||
};
|
||||
|
||||
// Only plan for stale partitions that need to be built
|
||||
let mut unhandled_refs = HashSet::new();
|
||||
for ref_str in &stale_refs {
|
||||
unhandled_refs.insert(ref_str.clone());
|
||||
}
|
||||
|
||||
// Note: Partition analysis events will be logged after successful job graph creation
|
||||
|
||||
let mut epoch = 0;
|
||||
let mut nodes = Vec::new();
|
||||
|
||||
// Determine the number of workers based on available CPU cores or environment variable
|
||||
let mut num_workers = num_cpus::get();
|
||||
if let Ok(worker_env) = env::var("DATABUILD_PARALLEL_WORKERS") {
|
||||
if let Ok(parsed_workers) = worker_env.parse::<usize>() {
|
||||
if parsed_workers < 1 {
|
||||
num_workers = 1;
|
||||
info!("Warning: DATABUILD_PARALLEL_WORKERS must be at least 1, using: {}", num_workers);
|
||||
} else {
|
||||
num_workers = parsed_workers;
|
||||
}
|
||||
} else {
|
||||
info!("Warning: Invalid DATABUILD_PARALLEL_WORKERS value '{}', using default: {}", worker_env, num_workers);
|
||||
}
|
||||
}
|
||||
info!("Using {} workers for parallel execution", num_workers);
|
||||
|
||||
// Log planning phase start
|
||||
if let Some(ref query_engine_ref) = query_engine {
|
||||
let event = create_build_event(
|
||||
build_request_id.to_string(),
|
||||
crate::build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestPlanning as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestPlanning.to_display_string(),
|
||||
requested_partitions: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
|
||||
message: "Graph analysis in progress".to_string(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine_ref.append_event(event).await {
|
||||
error!("Failed to log planning event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
while !unhandled_refs.is_empty() {
|
||||
if epoch >= 1000 {
|
||||
error!("Planning timeout: still planning after {} epochs, giving up", epoch);
|
||||
return Err(format!("Still planning after {} epochs, giving up", epoch));
|
||||
}
|
||||
|
||||
info!("Planning epoch {} with {} unhandled refs", epoch, unhandled_refs.len());
|
||||
|
||||
// Resolve jobs for all unhandled refs
|
||||
let unhandled_refs_list: Vec<String> = unhandled_refs.iter().cloned().collect();
|
||||
let job_refs = resolve(&unhandled_refs_list)?;
|
||||
|
||||
// Configure jobs in parallel
|
||||
let new_nodes = configure_parallel(job_refs.clone(), num_workers)?;
|
||||
|
||||
// Remove handled refs
|
||||
for (_, produced_refs) in job_refs {
|
||||
for ref_str in produced_refs {
|
||||
unhandled_refs.remove(&ref_str);
|
||||
}
|
||||
}
|
||||
|
||||
if !unhandled_refs.is_empty() {
|
||||
error!("Error: Still have unhandled refs after configuration phase: {:?}", unhandled_refs);
|
||||
return Err(format!("Should have no unhandled refs after configuration phase, but had: {:?}", unhandled_refs));
|
||||
}
|
||||
|
||||
epoch += 1;
|
||||
|
||||
// Add new nodes to the graph
|
||||
nodes.extend(new_nodes.clone());
|
||||
info!("Planning epoch {} completed: added {} new nodes, total nodes: {}", epoch, new_nodes.len(), nodes.len());
|
||||
|
||||
// Plan next epoch
|
||||
let mut new_unhandled_count = 0;
|
||||
for task in &new_nodes {
|
||||
for input in &task.config.as_ref().unwrap().inputs {
|
||||
if input.dep_type_code == 1 { // MATERIALIZE = 1
|
||||
if !unhandled_refs.contains(&input.partition_ref.as_ref().unwrap().str) {
|
||||
new_unhandled_count += 1;
|
||||
}
|
||||
unhandled_refs.insert(input.partition_ref.as_ref().unwrap().str.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if new_unhandled_count > 0 {
|
||||
info!("Added {} new unhandled refs for next planning epoch", new_unhandled_count);
|
||||
}
|
||||
}

    if !nodes.is_empty() {
        info!("Planning complete: created graph with {} nodes for {} output refs", nodes.len(), output_refs.len());

        // Log analysis completion event
        if let Some(ref query_engine) = query_engine {
            let event = create_build_event(
                build_request_id.to_string(),
                crate::build_event::EventType::BuildRequestEvent(BuildRequestEvent {
                    status_code: BuildRequestStatus::BuildRequestAnalysisCompleted as i32,
                    status_name: BuildRequestStatus::BuildRequestAnalysisCompleted.to_display_string(),
                    requested_partitions: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
                    message: format!("Analysis completed successfully, {} tasks planned", nodes.len()),
                })
            );
            if let Err(e) = query_engine.append_event(event).await {
                error!("Failed to log analysis completion event: {}", e);
            }

            // Store the job graph as an event in the build event log
            let job_graph = JobGraph {
                label: Some(GraphLabel { label: "analyzed_graph".to_string() }),
                outputs: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
                nodes: nodes.clone(),
            };

            let job_graph_event = create_build_event(
                build_request_id.to_string(),
                crate::build_event::EventType::JobGraphEvent(JobGraphEvent {
                    job_graph: Some(job_graph),
                    message: format!("Job graph analysis completed with {} tasks", nodes.len()),
                }),
            );
            if let Err(e) = query_engine.append_event(job_graph_event).await {
                error!("Failed to log job graph event: {}", e);
            }
        }

        Ok(JobGraph {
            label: Some(GraphLabel { label: "analyzed_graph".to_string() }),
            outputs: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
            nodes,
        })
    } else {
        error!("Planning failed: no nodes created for output refs {:?}", output_refs);

        // Log planning failure
        if let Some(ref query_engine) = query_engine {
            let event = create_build_event(
                build_request_id.to_string(),
                crate::build_event::EventType::BuildRequestEvent(BuildRequestEvent {
                    status_code: BuildRequestStatus::BuildRequestFailed as i32,
                    status_name: BuildRequestStatus::BuildRequestFailed.to_display_string(),
                    requested_partitions: output_refs.iter().map(|s| PartitionRef { str: s.clone() }).collect(),
                    message: "No jobs found for requested partitions".to_string(),
                })
            );
            if let Err(e) = query_engine.append_event(event).await {
                error!("Failed to log failure event: {}", e);
            }
        }

        Err("Unknown failure in graph planning".to_string())
    }
}

// Generate a Mermaid flowchart diagram from a job graph
|
||||
// fn generate_mermaid_diagram(graph: &JobGraph) -> String {
|
||||
// // Start the mermaid flowchart
|
||||
// let mut mermaid = String::from("flowchart TD\n");
|
||||
//
|
||||
// // Track nodes we've already added to avoid duplicates
|
||||
// let mut added_nodes = HashSet::new();
|
||||
// let mut added_refs = HashSet::new();
|
||||
//
|
||||
// // Map to track which refs are outputs (to highlight them)
|
||||
// let mut is_output_ref = HashSet::new();
|
||||
// for ref_str in &graph.outputs {
|
||||
// is_output_ref.insert(ref_str.str.clone());
|
||||
// }
|
||||
//
|
||||
// // Process each task in the graph
|
||||
// for task in &graph.nodes {
|
||||
// // Create a unique ID for this job+outputs combination
|
||||
// let outputs_strs: Vec<String> = task.config.as_ref().unwrap().outputs.iter().map(|o| o.str.clone()).collect();
|
||||
// let outputs_key = outputs_strs.join("_");
|
||||
// let mut job_node_id = format!("job_{}", task.job.as_ref().unwrap().label.replace("//", "_"));
|
||||
// job_node_id = job_node_id.replace(":", "_").replace("=", "_").replace("?", "_").replace(" ", "_");
|
||||
// job_node_id = format!("{}_{}", job_node_id, outputs_key.replace("/", "_").replace("=", "_"));
|
||||
//
|
||||
// // Create a descriptive label that includes both job label and outputs
|
||||
// let job_label = &task.job.as_ref().unwrap().label;
|
||||
// let outputs_label = if !task.config.as_ref().unwrap().outputs.is_empty() {
|
||||
// if task.config.as_ref().unwrap().outputs.len() == 1 {
|
||||
// format!(" [{}]", task.config.as_ref().unwrap().outputs[0].str)
|
||||
// } else {
|
||||
// format!(" [{}, ...]", task.config.as_ref().unwrap().outputs[0].str)
|
||||
// }
|
||||
// } else {
|
||||
// String::new()
|
||||
// };
|
||||
//
|
||||
// // Add the job node if not already added
|
||||
// if !added_nodes.contains(&job_node_id) {
|
||||
// // Represent job as a process shape with escaped label
|
||||
// mermaid.push_str(&format!(
|
||||
// " {}[\"`**{}** {}`\"]:::job\n",
|
||||
// job_node_id,
|
||||
// job_label,
|
||||
// outputs_label
|
||||
// ));
|
||||
// added_nodes.insert(job_node_id.clone());
|
||||
// }
|
||||
//
|
||||
// // Process inputs (dependencies)
|
||||
// for input in &task.config.as_ref().unwrap().inputs {
|
||||
// let ref_node_id = format!("ref_{}", input.partition_ref.as_ref().unwrap().str.replace("/", "_").replace("=", "_"));
|
||||
//
|
||||
// // Add the partition ref node if not already added
|
||||
// if !added_refs.contains(&ref_node_id) {
|
||||
// let node_class = if is_output_ref.contains(&input.partition_ref.as_ref().unwrap().str) {
|
||||
// "outputPartition"
|
||||
// } else {
|
||||
// "partition"
|
||||
// };
|
||||
//
|
||||
// // Represent partition as a cylinder
|
||||
// mermaid.push_str(&format!(
|
||||
// " {}[(\"{}\")]:::{}\n",
|
||||
// ref_node_id,
|
||||
// input.partition_ref.as_ref().unwrap().str.replace("/", "_").replace("=", "_"),
|
||||
// node_class
|
||||
// ));
|
||||
// added_refs.insert(ref_node_id.clone());
|
||||
// }
|
||||
//
|
||||
// // Add the edge from input to job
|
||||
// if input.dep_type == 1 { // MATERIALIZE = 1
|
||||
// // Solid line for materialize dependencies
|
||||
// mermaid.push_str(&format!(" {} --> {}\n", ref_node_id, job_node_id));
|
||||
// } else {
|
||||
// // Dashed line for query dependencies
|
||||
// mermaid.push_str(&format!(" {} -.-> {}\n", ref_node_id, job_node_id));
|
||||
// }
|
||||
// }
|
||||
//
|
||||
// // Process outputs
|
||||
// for output in &task.config.as_ref().unwrap().outputs {
|
||||
// let ref_node_id = format!("ref_{}", output.str.replace("/", "_").replace("=", "_"));
|
||||
//
|
||||
// // Add the partition ref node if not already added
|
||||
// if !added_refs.contains(&ref_node_id) {
|
||||
// let node_class = if is_output_ref.contains(&output.str) {
|
||||
// "outputPartition"
|
||||
// } else {
|
||||
// "partition"
|
||||
// };
|
||||
//
|
||||
// // Represent partition as a cylinder
|
||||
// mermaid.push_str(&format!(
|
||||
// " {}[(\"Partition: {}\")]:::{}\n",
|
||||
// ref_node_id,
|
||||
// output.str,
|
||||
// node_class
|
||||
// ));
|
||||
// added_refs.insert(ref_node_id.clone());
|
||||
// }
|
||||
//
|
||||
// // Add the edge from job to output
|
||||
// mermaid.push_str(&format!(" {} --> {}\n", job_node_id, ref_node_id));
|
||||
// }
|
||||
// }
|
||||
//
|
||||
// // Add styling
|
||||
// mermaid.push_str("\n %% Styling\n");
|
||||
// mermaid.push_str(" classDef job fill:#f9f,stroke:#333,stroke-width:1px;\n");
|
||||
// mermaid.push_str(" classDef partition fill:#bbf,stroke:#333,stroke-width:1px;\n");
|
||||
// mermaid.push_str(" classDef outputPartition fill:#bfb,stroke:#333,stroke-width:2px;\n");
|
||||
//
|
||||
// mermaid
|
||||
// }

#[tokio::main]
async fn main() {
    // Initialize logger
    SimpleLogger::new().init().unwrap();

    let mode = env::var("DATABUILD_MODE").unwrap_or_else(|_| "unknown".to_string());
    info!("Starting analyze.rs in mode: {}", mode);

    // Parse command line arguments (only for partition references)
    let matches = ClapCommand::new("analyze")
        .version("1.0")
        .about("DataBuild graph analysis tool")
        .arg(
            Arg::new("partitions")
                .help("Partition references to analyze")
                .required(false)
                .num_args(0..)
                .value_name("PARTITIONS")
        )
        .get_matches();

    let args: Vec<String> = matches.get_many::<String>("partitions")
        .unwrap_or_default()
        .cloned()
        .collect();

    // Validate arguments based on mode
    match mode.as_str() {
        "plan" | "mermaid" => {
            if args.is_empty() {
                error!("Error: Partition references are required for {} mode", mode);
                eprintln!("Error: Partition references are required for {} mode", mode);
                exit(1);
            }
        }
        "import_test" => {
            // No partition arguments needed for test mode
        }
        _ => {
            // Unknown mode, will be handled later
        }
    }

    // Get build event log configuration from environment variables
    let build_event_log_uri = env::var("DATABUILD_BUILD_EVENT_LOG").ok();
    let build_request_id = env::var("DATABUILD_BUILD_REQUEST_ID")
        .unwrap_or_else(|_| Uuid::new_v4().to_string());

    // Initialize build event log if provided
    let query_engine = if let Some(uri) = build_event_log_uri {
        match create_bel_query_engine(&uri).await {
            Ok(engine) => {
                info!("Initialized build event log: {}", uri);
                Some(engine)
            }
            Err(e) => {
                error!("Failed to initialize build event log {}: {}", uri, e);
                exit(1);
            }
        }
    } else {
        None
    };

    match mode.as_str() {
        "plan" => {
            // Get output refs from command line arguments
            match plan(&args, query_engine, &build_request_id).await {
                Ok(graph) => {
                    // Output the job graph as JSON
                    match serde_json::to_string(&graph) {
                        Ok(json_data) => {
                            info!("Successfully generated job graph with {} nodes", graph.nodes.len());
                            println!("{}", json_data);
                        }
                        Err(e) => {
                            error!("Error marshaling job graph: {}", e);
                            eprintln!("Error marshaling job graph: {}", e);
                            exit(1);
                        }
                    }
                }
                Err(e) => {
                    eprintln!("Error: {}", e);
                    exit(1);
                }
            }
        }
        "lookup" => {
            // Get output refs from command line arguments
            match resolve(&args) {
                Ok(result) => {
                    // Output the result as JSON
                    match serde_json::to_string(&result) {
                        Ok(json_data) => {
                            info!("Successfully completed lookup for {} output refs with {} job mappings", args.len(), result.len());
                            println!("{}", json_data);
                        }
                        Err(e) => {
                            error!("Error marshaling lookup result: {}", e);
                            eprintln!("Error marshaling lookup result: {}", e);
                            exit(1);
                        }
                    }
                }
                Err(e) => {
                    eprintln!("Error: {}", e);
                    exit(1);
                }
            }
        }
        "mermaid" => {
            // Get output refs from command line arguments
            match plan(&args, None, &build_request_id).await {
                Ok(graph) => {
                    // Generate and output the mermaid diagram
                    let mermaid_diagram = generate_mermaid_diagram(&graph);
                    println!("{}", mermaid_diagram);
                    info!("Successfully generated mermaid diagram for {} nodes", graph.nodes.len());
                }
                Err(e) => {
                    eprintln!("Error: {}", e);
                    exit(1);
                }
            }
        }
        "import_test" => {
            info!("Running in import_test mode");
            println!("ok :)");
            info!("Import test completed successfully");
        }
        _ => {
            error!("Error: Unknown mode '{}'", mode);
            eprintln!("Unknown MODE `{}`", mode);
            exit(1);
        }
    }
}

@@ -1,815 +0,0 @@
use databuild::{JobGraph, Task, JobStatus, BuildRequestStatus, PartitionStatus, BuildRequestEvent, JobEvent, PartitionEvent, PartitionRef};
|
||||
use databuild::event_log::{create_bel_query_engine, create_build_event};
|
||||
use databuild::build_event::EventType;
|
||||
use databuild::log_collector::{LogCollector, LogCollectorError};
|
||||
use crossbeam_channel::{Receiver, Sender};
|
||||
use log::{debug, error, info, warn};
|
||||
use std::collections::{HashMap, HashSet};
|
||||
use std::io::{BufReader, Read, Write};
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::process::{Command, Stdio};
|
||||
use std::sync::Arc;
|
||||
use std::{env, thread};
|
||||
use std::time::{Duration, Instant};
|
||||
// Command line parsing removed - using environment variables
|
||||
use uuid::Uuid;
|
||||
|
||||
const NUM_WORKERS: usize = 4;
|
||||
const LOG_INTERVAL: Duration = Duration::from_secs(5);
|
||||
const FAIL_FAST: bool = true; // Same default as the Go version
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
enum TaskState {
|
||||
Pending,
|
||||
Running,
|
||||
Succeeded,
|
||||
Failed,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
struct TaskExecutionResult {
|
||||
task_key: String,
|
||||
job_label: String, // For logging
|
||||
success: bool,
|
||||
stdout: String,
|
||||
stderr: String,
|
||||
duration: Duration,
|
||||
error_message: Option<String>,
|
||||
}
|
||||
|
||||
// Generates a unique key for a task based on its JobLabel, input and output references.
|
||||
// Mirrors the Go implementation's getTaskKey.
|
||||
fn get_task_key(task: &Task) -> String {
|
||||
let mut key_parts = Vec::new();
|
||||
key_parts.push(task.job.as_ref().unwrap().label.clone());
|
||||
|
||||
for input_dep in &task.config.as_ref().unwrap().inputs {
|
||||
key_parts.push(format!("input:{}", input_dep.partition_ref.as_ref().unwrap().str));
|
||||
}
|
||||
for output_ref in &task.config.as_ref().unwrap().outputs {
|
||||
key_parts.push(format!("output:{}", output_ref.str));
|
||||
}
|
||||
key_parts.join("|")
|
||||
}
|
||||
|
||||
fn worker(
|
||||
task_rx: Receiver<Arc<Task>>,
|
||||
result_tx: Sender<TaskExecutionResult>,
|
||||
worker_id: usize,
|
||||
) {
|
||||
info!("[Worker {}] Starting", worker_id);
|
||||
while let Ok(task) = task_rx.recv() {
|
||||
let task_key = get_task_key(&task);
|
||||
info!("[Worker {}] Starting job: {} (Key: {})", worker_id, task.job.as_ref().unwrap().label, task_key);
|
||||
let start_time = Instant::now();
|
||||
|
||||
let candidate_jobs_str = env::var("DATABUILD_CANDIDATE_JOBS_EXEC")
|
||||
.map_err(|e| format!("Failed to get DATABUILD_CANDIDATE_JOBS_EXEC: {}", e)).unwrap();
|
||||
|
||||
let job_path_map: HashMap<String, String> = serde_json::from_str(&candidate_jobs_str)
|
||||
.map_err(|e| format!("Failed to parse DATABUILD_CANDIDATE_JOBS_EXEC: {}", e)).unwrap();
|
||||
|
||||
// Look up the executable path for this job
|
||||
let job_label = &task.job.as_ref().unwrap().label;
|
||||
let exec_path = job_path_map.get(job_label)
|
||||
.ok_or_else(|| format!("Job {} is not a candidate job", job_label)).unwrap();
|
||||
|
||||
let config_json = match serde_json::to_string(&task.config.as_ref().unwrap()) {
|
||||
Ok(json) => json,
|
||||
Err(e) => {
|
||||
let err_msg = format!("Failed to serialize task config for {}: {}", task.job.as_ref().unwrap().label, e);
|
||||
error!("[Worker {}] {}", worker_id, err_msg);
|
||||
result_tx
|
||||
.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send error result: {}", worker_id, e));
|
||||
continue;
|
||||
}
|
||||
};
|
||||
|
||||
// Generate a job run ID for this execution
|
||||
let job_run_id = Uuid::new_v4().to_string();
|
||||
|
||||
info!("Running job {} (Path: {}) with config: {}", job_label, exec_path, config_json);
|
||||
let mut cmd = Command::new(&exec_path);
|
||||
cmd.stdin(Stdio::piped())
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::piped());
|
||||
|
||||
// Set environment variables from the current process's environment
|
||||
// This mirrors the Go `cmd.Env = os.Environ()` behavior.
|
||||
// Task-specific env vars from task.config.env are passed via JSON through stdin.
|
||||
cmd.env_clear(); // Start with no environment variables
|
||||
for (key, value) in std::env::vars() {
|
||||
cmd.env(key, value); // Add current process's environment variables
|
||||
}
|
||||
|
||||
// Add the job run ID so the job wrapper can use the same ID
|
||||
cmd.env("DATABUILD_JOB_RUN_ID", &job_run_id);
|
||||
|
||||
match cmd.spawn() {
|
||||
Ok(mut child) => {
|
||||
if let Some(mut child_stdin) = child.stdin.take() {
|
||||
if let Err(e) = child_stdin.write_all(config_json.as_bytes()) {
|
||||
let err_msg = format!("[Worker {}] Failed to write to stdin for {}: {}", worker_id, task.job.as_ref().unwrap().label, e);
|
||||
error!("{}", err_msg);
|
||||
// Ensure child is killed if stdin write fails before wait
|
||||
let _ = child.kill();
|
||||
let _ = child.wait(); // Reap the child
|
||||
result_tx.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send error result: {}", worker_id, e));
|
||||
continue;
|
||||
}
|
||||
drop(child_stdin); // Close stdin to signal EOF to the child
|
||||
} else {
|
||||
let err_msg = format!("[Worker {}] Failed to get stdin for {}", worker_id, task.job.as_ref().unwrap().label);
|
||||
error!("{}", err_msg);
|
||||
result_tx.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send error result: {}", worker_id, e));
|
||||
continue;
|
||||
}
|
||||
|
||||
// Initialize log collector
|
||||
let mut log_collector = match LogCollector::new(LogCollector::default_logs_dir()) {
|
||||
Ok(mut collector) => {
|
||||
// Set the job label mapping for this job run
|
||||
collector.set_job_label(&job_run_id, &task.job.as_ref().unwrap().label);
|
||||
collector
|
||||
},
|
||||
Err(e) => {
|
||||
let err_msg = format!("[Worker {}] Failed to initialize log collector for {}: {}",
|
||||
worker_id, task.job.as_ref().unwrap().label, e);
|
||||
error!("{}", err_msg);
|
||||
result_tx
|
||||
.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send error result: {}", worker_id, e));
|
||||
continue;
|
||||
}
|
||||
};
|
||||
|
||||
// Collect stdout/stderr and process with LogCollector
|
||||
let stdout_handle = child.stdout.take();
|
||||
let stderr_handle = child.stderr.take();
|
||||
|
||||
let mut stdout_content = String::new();
|
||||
let mut stderr_content = String::new();
|
||||
|
||||
// Read stdout and process with LogCollector
|
||||
if let Some(stdout) = stdout_handle {
|
||||
let stdout_reader = BufReader::new(stdout);
|
||||
if let Err(e) = log_collector.consume_job_output(&job_run_id, stdout_reader) {
|
||||
warn!("[Worker {}] Failed to process job logs for {}: {}",
|
||||
worker_id, task.job.as_ref().unwrap().label, e);
|
||||
}
|
||||
}
|
||||
|
||||
// Read stderr (raw, not structured)
|
||||
if let Some(mut stderr) = stderr_handle {
|
||||
if let Err(e) = stderr.read_to_string(&mut stderr_content) {
|
||||
warn!("[Worker {}] Failed to read stderr for {}: {}",
|
||||
worker_id, task.job.as_ref().unwrap().label, e);
|
||||
}
|
||||
}
|
||||
|
||||
// Wait for the process to finish
|
||||
match child.wait() {
|
||||
Ok(status) => {
|
||||
let duration = start_time.elapsed();
|
||||
let success = status.success();
|
||||
|
||||
// Close the log collector for this job
|
||||
if let Err(e) = log_collector.close_job(&job_run_id) {
|
||||
warn!("[Worker {}] Failed to close log collector for {}: {}",
|
||||
worker_id, task.job.as_ref().unwrap().label, e);
|
||||
}
|
||||
|
||||
if success {
|
||||
info!(
|
||||
"[Worker {}] Job succeeded: {} (Duration: {:?}, Job Run ID: {})",
|
||||
worker_id, task.job.as_ref().unwrap().label, duration, job_run_id
|
||||
);
|
||||
} else {
|
||||
error!(
|
||||
"[Worker {}] Job failed: {} (Duration: {:?}, Status: {:?}, Job Run ID: {})\nStderr: {}",
|
||||
worker_id, task.job.as_ref().unwrap().label, duration, status, job_run_id, stderr_content
|
||||
);
|
||||
}
|
||||
result_tx
|
||||
.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success,
|
||||
stdout: format!("Job logs written to JSONL (Job Run ID: {})", job_run_id),
|
||||
stderr: stderr_content,
|
||||
duration,
|
||||
error_message: if success { None } else { Some(format!("Exited with status: {:?}", status)) },
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send result: {}", worker_id, e));
|
||||
}
|
||||
Err(e) => {
|
||||
let err_msg = format!("[Worker {}] Failed to execute or wait for {}: {}", worker_id, task.job.as_ref().unwrap().label, e);
|
||||
error!("{}", err_msg);
|
||||
result_tx
|
||||
.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send execution error result: {}", worker_id, e));
|
||||
}
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
let err_msg = format!("[Worker {}] Failed to spawn command for {}: {} (Path: {:?})", worker_id, task.job.as_ref().unwrap().label, e, exec_path);
|
||||
error!("{}", err_msg);
|
||||
result_tx
|
||||
.send(TaskExecutionResult {
|
||||
task_key,
|
||||
job_label: task.job.as_ref().unwrap().label.clone(),
|
||||
success: false,
|
||||
stdout: String::new(),
|
||||
stderr: err_msg.clone(),
|
||||
duration: start_time.elapsed(),
|
||||
error_message: Some(err_msg),
|
||||
})
|
||||
.unwrap_or_else(|e| error!("[Worker {}] Failed to send spawn error result: {}", worker_id, e));
|
||||
}
|
||||
}
|
||||
}
|
||||
info!("[Worker {}] Exiting", worker_id);
|
||||
}
|
||||
|
||||
fn is_task_ready(task: &Task, completed_outputs: &HashSet<String>) -> bool {
|
||||
let mut missing_deps = Vec::new();
|
||||
|
||||
for dep in &task.config.as_ref().unwrap().inputs {
|
||||
if dep.dep_type_code == 1 { // MATERIALIZE = 1
|
||||
if !completed_outputs.contains(&dep.partition_ref.as_ref().unwrap().str) {
|
||||
missing_deps.push(&dep.partition_ref.as_ref().unwrap().str);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if !missing_deps.is_empty() {
|
||||
debug!("Task {} not ready - missing dependencies: {:?}", task.job.as_ref().unwrap().label, missing_deps);
|
||||
return false;
|
||||
}
|
||||
|
||||
true
|
||||
}
|
||||
|
||||
// Check if partitions are already available or being built by other build requests
|
||||
async fn check_build_coordination(
|
||||
task: &Task,
|
||||
query_engine: &Arc<databuild::event_log::query_engine::BELQueryEngine>,
|
||||
build_request_id: &str
|
||||
) -> Result<(bool, bool, Vec<(PartitionRef, String)>), String> {
|
||||
let outputs = &task.config.as_ref().unwrap().outputs;
|
||||
let mut available_partitions = Vec::new();
|
||||
let mut needs_building = false;
|
||||
|
||||
for output_ref in outputs {
|
||||
debug!("Checking build coordination for partition: {}", output_ref.str);
|
||||
|
||||
// First check if this partition is already available
|
||||
match query_engine.get_latest_partition_status(&output_ref.str).await {
|
||||
Ok(Some((status, _timestamp))) => {
|
||||
debug!("Partition {} has status: {:?}", output_ref.str, status);
|
||||
if status == databuild::PartitionStatus::PartitionAvailable {
|
||||
// Get which build request created this partition
|
||||
match query_engine.get_build_request_for_available_partition(&output_ref.str).await {
|
||||
Ok(Some(source_build_id)) => {
|
||||
info!("Partition {} already available from build {}", output_ref.str, source_build_id);
|
||||
available_partitions.push((output_ref.clone(), source_build_id));
|
||||
continue;
|
||||
}
|
||||
Ok(None) => {
|
||||
error!("Partition {} is available but no source build found - this indicates a bug in the event log implementation", output_ref.str);
|
||||
return Err(format!("Available partition {} has no source build ID. This suggests the event log is missing required data.", output_ref.str));
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Failed to get source build for partition {}: {}", output_ref.str, e);
|
||||
return Err(format!("Cannot determine source build for available partition {}: {}", output_ref.str, e));
|
||||
}
|
||||
}
|
||||
} else {
|
||||
debug!("Partition {} has non-available status {:?}, needs building", output_ref.str, status);
|
||||
needs_building = true;
|
||||
}
|
||||
}
|
||||
Ok(None) => {
|
||||
debug!("Partition {} has no status, needs building", output_ref.str);
|
||||
needs_building = true;
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Failed to check partition status for {}: {}", output_ref.str, e);
|
||||
return Err(format!("Cannot check partition status: {}. Use a queryable event log (e.g., SQLite) for builds that need to check existing partitions.", e));
|
||||
}
|
||||
}
|
||||
|
||||
// Check if this partition is being built by another request
|
||||
match query_engine.get_active_builds_for_partition(&output_ref.str).await {
|
||||
Ok(active_builds) => {
|
||||
let other_builds: Vec<String> = active_builds.into_iter()
|
||||
.filter(|id| id != build_request_id)
|
||||
.collect();
|
||||
|
||||
if !other_builds.is_empty() {
|
||||
info!("Partition {} is already being built by other requests: {:?}. Delegating.",
|
||||
output_ref.str, other_builds);
|
||||
|
||||
// Log delegation event for active builds
|
||||
for delegated_to_build_id in &other_builds {
|
||||
let event = create_build_event(
|
||||
build_request_id.to_string(),
|
||||
EventType::DelegationEvent(databuild::DelegationEvent {
|
||||
partition_ref: Some(output_ref.clone()),
|
||||
delegated_to_build_request_id: delegated_to_build_id.clone(),
|
||||
message: "Delegated to active build during execution".to_string(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(event).await {
|
||||
error!("Failed to log delegation event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
return Ok((false, false, available_partitions)); // Don't build, delegated to active build
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Failed to check active builds for partition {}: {}", output_ref.str, e);
|
||||
return Err(format!("Cannot check active builds: {}. Use a queryable event log (e.g., SQLite) for builds that need to check for concurrent execution.", e));
|
||||
}
|
||||
}
|
||||
|
||||
// If we reach here, this partition needs to be built
|
||||
needs_building = true;
|
||||
}
|
||||
|
||||
// Only skip the job if ALL partitions are already available
|
||||
if !needs_building && available_partitions.len() == outputs.len() {
|
||||
Ok((false, true, available_partitions)) // Don't build, skip due to all partitions available
|
||||
} else {
|
||||
Ok((true, false, available_partitions)) // Need to build (some partitions unavailable)
|
||||
}
|
||||
}
|
||||
|
||||
fn log_status_summary(
|
||||
task_states: &HashMap<String, TaskState>,
|
||||
original_tasks_by_key: &HashMap<String, Arc<Task>>,
|
||||
) {
|
||||
let mut pending_tasks = Vec::new();
|
||||
let mut running_tasks = Vec::new();
|
||||
let mut succeeded_tasks = Vec::new();
|
||||
let mut failed_tasks = Vec::new();
|
||||
|
||||
for (key, state) in task_states {
|
||||
let label = original_tasks_by_key.get(key).map_or_else(|| key.as_str(), |t| t.job.as_ref().unwrap().label.as_str());
|
||||
match state {
|
||||
TaskState::Pending => pending_tasks.push(label),
|
||||
TaskState::Running => running_tasks.push(label),
|
||||
TaskState::Succeeded => succeeded_tasks.push(label),
|
||||
TaskState::Failed => failed_tasks.push(label),
|
||||
}
|
||||
}
|
||||
|
||||
info!("Task Status Summary:");
|
||||
info!(" Pending ({}): {:?}", pending_tasks.len(), pending_tasks);
|
||||
info!(" Running ({}): {:?}", running_tasks.len(), running_tasks);
|
||||
info!(" Succeeded ({}): {:?}", succeeded_tasks.len(), succeeded_tasks);
|
||||
info!(" Failed ({}): {:?}", failed_tasks.len(), failed_tasks);
|
||||
}
|
||||
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||
simple_logger::SimpleLogger::new()
|
||||
.with_level(
|
||||
std::env::var("RUST_LOG")
|
||||
.unwrap_or_else(|_| "info".to_string())
|
||||
.parse()
|
||||
.unwrap_or(log::LevelFilter::Info)
|
||||
)
|
||||
.init()?;
|
||||
|
||||
// Get build event log configuration from environment variables
|
||||
let build_event_log_uri = std::env::var("DATABUILD_BUILD_EVENT_LOG").ok();
|
||||
let build_request_id = std::env::var("DATABUILD_BUILD_REQUEST_ID")
|
||||
.unwrap_or_else(|_| Uuid::new_v4().to_string());
|
||||
|
||||
// Initialize build event log if provided
|
||||
let build_event_log = if let Some(uri) = build_event_log_uri {
|
||||
match create_bel_query_engine(&uri).await {
|
||||
Ok(log) => {
|
||||
info!("Initialized build event log: {}", uri);
|
||||
Some(log)
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Failed to initialize build event log {}: {}", uri, e);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
let mut buffer = String::new();
|
||||
std::io::stdin().read_to_string(&mut buffer)?;
|
||||
let graph: JobGraph = serde_json::from_str(&buffer)?;
|
||||
|
||||
info!("Executing job graph with {} nodes", graph.nodes.len());
|
||||
|
||||
|
||||
// Log build request execution start (existing detailed event)
|
||||
if let Some(ref query_engine) = build_event_log {
|
||||
let event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestExecuting as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestExecuting.to_display_string(),
|
||||
requested_partitions: graph.outputs.clone(),
|
||||
message: format!("Starting execution of {} jobs", graph.nodes.len()),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(event).await {
|
||||
error!("Failed to log execution start event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
let mut task_states: HashMap<String, TaskState> = HashMap::new();
|
||||
let mut original_tasks_by_key: HashMap<String, Arc<Task>> = HashMap::new();
|
||||
let graph_nodes_arc: Vec<Arc<Task>> = graph.nodes.into_iter().map(Arc::new).collect();
|
||||
|
||||
|
||||
for task_node in &graph_nodes_arc {
|
||||
let key = get_task_key(task_node);
|
||||
task_states.insert(key.clone(), TaskState::Pending);
|
||||
original_tasks_by_key.insert(key, task_node.clone());
|
||||
}
|
||||
|
||||
let mut completed_outputs: HashSet<String> = HashSet::new();
|
||||
let mut job_results: Vec<TaskExecutionResult> = Vec::new();
|
||||
|
||||
let (task_tx, task_rx): (Sender<Arc<Task>>, Receiver<Arc<Task>>) = crossbeam_channel::unbounded();
|
||||
let (result_tx, result_rx): (Sender<TaskExecutionResult>, Receiver<TaskExecutionResult>) = crossbeam_channel::unbounded();
|
||||
|
||||
let mut worker_handles = Vec::new();
|
||||
for i in 0..NUM_WORKERS {
|
||||
let task_rx_clone = task_rx.clone();
|
||||
let result_tx_clone = result_tx.clone();
|
||||
worker_handles.push(thread::spawn(move || {
|
||||
worker(task_rx_clone, result_tx_clone, i + 1);
|
||||
}));
|
||||
}
|
||||
// Drop the original result_tx so the channel closes when all workers are done
|
||||
// if result_rx is the only remaining receiver.
|
||||
drop(result_tx);
|
||||
|
||||
|
||||
let mut last_log_time = Instant::now();
|
||||
let mut active_tasks_count = 0;
|
||||
let mut fail_fast_triggered = false;
|
||||
|
||||
loop {
|
||||
// 1. Process results
|
||||
while let Ok(result) = result_rx.try_recv() {
|
||||
active_tasks_count -= 1;
|
||||
info!(
|
||||
"Received result for task {}: Success: {}",
|
||||
result.job_label, result.success
|
||||
);
|
||||
|
||||
let current_state = if result.success {
|
||||
TaskState::Succeeded
|
||||
} else {
|
||||
TaskState::Failed
|
||||
};
|
||||
task_states.insert(result.task_key.clone(), current_state);
|
||||
|
||||
// Log job completion events
|
||||
if let Some(ref query_engine) = build_event_log {
|
||||
if let Some(original_task) = original_tasks_by_key.get(&result.task_key) {
|
||||
let job_run_id = Uuid::new_v4().to_string();
|
||||
|
||||
// Log job completion
|
||||
let job_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::JobEvent(JobEvent {
|
||||
job_run_id: job_run_id.clone(),
|
||||
job_label: original_task.job.clone(),
|
||||
target_partitions: original_task.config.as_ref().unwrap().outputs.clone(),
|
||||
status_code: if result.success { JobStatus::JobCompleted as i32 } else { JobStatus::JobFailed as i32 },
|
||||
status_name: if result.success { JobStatus::JobCompleted.to_display_string() } else { JobStatus::JobFailed.to_display_string() },
|
||||
message: if result.success { "Job completed successfully".to_string() } else { result.error_message.clone().unwrap_or_default() },
|
||||
config: original_task.config.clone(),
|
||||
manifests: vec![], // Would be populated from actual job output
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(job_event).await {
|
||||
error!("Failed to log job completion event: {}", e);
|
||||
}
|
||||
|
||||
// Log partition status updates
|
||||
for output_ref in &original_task.config.as_ref().unwrap().outputs {
|
||||
let partition_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::PartitionEvent(PartitionEvent {
|
||||
partition_ref: Some(output_ref.clone()),
|
||||
status_code: if result.success { PartitionStatus::PartitionAvailable as i32 } else { PartitionStatus::PartitionFailed as i32 },
|
||||
status_name: if result.success { PartitionStatus::PartitionAvailable.to_display_string() } else { PartitionStatus::PartitionFailed.to_display_string() },
|
||||
message: if result.success { "Partition built successfully".to_string() } else { "Partition build failed".to_string() },
|
||||
job_run_id: job_run_id.clone(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(partition_event).await {
|
||||
error!("Failed to log partition status event: {}", e);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if result.success {
|
||||
if let Some(original_task) = original_tasks_by_key.get(&result.task_key) {
|
||||
for output_ref in &original_task.config.as_ref().unwrap().outputs {
|
||||
completed_outputs.insert(output_ref.str.clone());
|
||||
}
|
||||
}
|
||||
} else {
|
||||
if FAIL_FAST {
|
||||
warn!("Fail-fast enabled and task {} failed. Shutting down.", result.job_label);
|
||||
fail_fast_triggered = true;
|
||||
}
|
||||
}
|
||||
job_results.push(result);
|
||||
}
|
||||
|
||||
// 2. Check for fail-fast break
|
||||
if fail_fast_triggered && active_tasks_count == 0 { // Wait for running tasks to finish if fail fast
|
||||
info!("All active tasks completed after fail-fast trigger.");
|
||||
break;
|
||||
}
|
||||
if fail_fast_triggered && active_tasks_count > 0 {
|
||||
// Don't schedule new tasks, just wait for active ones or log
|
||||
} else if !fail_fast_triggered { // Only dispatch if not in fail-fast shutdown
|
||||
// 3. Dispatch ready tasks
|
||||
for task_node in &graph_nodes_arc {
|
||||
let task_key = get_task_key(task_node);
|
||||
if task_states.get(&task_key) == Some(&TaskState::Pending) {
|
||||
if is_task_ready(task_node, &completed_outputs) {
|
||||
// Check build coordination if event log is available
|
||||
let (should_build, is_skipped, available_partitions) = if let Some(ref query_engine) = build_event_log {
|
||||
match check_build_coordination(task_node, query_engine, &build_request_id).await {
|
||||
Ok((should_build, is_skipped, available_partitions)) => (should_build, is_skipped, available_partitions),
|
||||
Err(e) => {
|
||||
error!("Error checking build coordination for {}: {}",
|
||||
task_node.job.as_ref().unwrap().label, e);
|
||||
(true, false, Vec::<(PartitionRef, String)>::new()) // Default to building on error
|
||||
}
|
||||
}
|
||||
} else {
|
||||
(true, false, Vec::<(PartitionRef, String)>::new()) // No event log, always build
|
||||
};
|
||||
|
||||
if !should_build {
|
||||
if is_skipped {
|
||||
// Task skipped due to all partitions already available
|
||||
info!("Task {} skipped - all target partitions already available", task_node.job.as_ref().unwrap().label);
|
||||
|
||||
// Log delegation events for each available partition
|
||||
if let Some(ref query_engine) = build_event_log {
|
||||
for (partition_ref, source_build_id) in &available_partitions {
|
||||
let delegation_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::DelegationEvent(databuild::DelegationEvent {
|
||||
partition_ref: Some(partition_ref.clone()),
|
||||
delegated_to_build_request_id: source_build_id.clone(),
|
||||
message: "Delegated to historical build - partition already available".to_string(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(delegation_event).await {
|
||||
error!("Failed to log historical delegation event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
// Log JOB_SKIPPED event
|
||||
let job_run_id = Uuid::new_v4().to_string();
|
||||
let job_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::JobEvent(JobEvent {
|
||||
job_run_id: job_run_id.clone(),
|
||||
job_label: task_node.job.clone(),
|
||||
target_partitions: task_node.config.as_ref().unwrap().outputs.clone(),
|
||||
status_code: JobStatus::JobSkipped as i32,
|
||||
status_name: JobStatus::JobSkipped.to_display_string(),
|
||||
message: "Job skipped - all target partitions already available".to_string(),
|
||||
config: task_node.config.clone(),
|
||||
manifests: vec![],
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(job_event).await {
|
||||
error!("Failed to log job skipped event: {}", e);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Task delegated to active build
|
||||
info!("Task {} delegated to active build request", task_node.job.as_ref().unwrap().label);
|
||||
}
|
||||
|
||||
task_states.insert(task_key.clone(), TaskState::Succeeded);
|
||||
|
||||
// Mark outputs as completed
|
||||
for output_ref in &task_node.config.as_ref().unwrap().outputs {
|
||||
completed_outputs.insert(output_ref.str.clone());
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
info!("Dispatching task: {}", task_node.job.as_ref().unwrap().label);
|
||||
|
||||
// Log job scheduling events
|
||||
if let Some(ref query_engine) = build_event_log {
|
||||
let job_run_id = Uuid::new_v4().to_string();
|
||||
|
||||
// Log job scheduled
|
||||
let job_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::JobEvent(JobEvent {
|
||||
job_run_id: job_run_id.clone(),
|
||||
job_label: task_node.job.clone(),
|
||||
target_partitions: task_node.config.as_ref().unwrap().outputs.clone(),
|
||||
status_code: JobStatus::JobScheduled as i32,
|
||||
status_name: JobStatus::JobScheduled.to_display_string(),
|
||||
message: "Job scheduled for execution".to_string(),
|
||||
config: task_node.config.clone(),
|
||||
manifests: vec![],
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(job_event).await {
|
||||
error!("Failed to log job scheduled event: {}", e);
|
||||
}
|
||||
|
||||
// Log partition building status
|
||||
for output_ref in &task_node.config.as_ref().unwrap().outputs {
|
||||
let partition_event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::PartitionEvent(PartitionEvent {
|
||||
partition_ref: Some(output_ref.clone()),
|
||||
status_code: PartitionStatus::PartitionBuilding as i32,
|
||||
status_name: PartitionStatus::PartitionBuilding.to_display_string(),
|
||||
message: "Partition build started".to_string(),
|
||||
job_run_id: job_run_id.clone(),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(partition_event).await {
|
||||
error!("Failed to log partition building event: {}", e);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
task_states.insert(task_key.clone(), TaskState::Running);
|
||||
task_tx.send(task_node.clone())?;
|
||||
active_tasks_count += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
// 4. Periodic logging
|
||||
if last_log_time.elapsed() >= LOG_INTERVAL {
|
||||
log_status_summary(&task_states, &original_tasks_by_key);
|
||||
|
||||
// Debug: Check for deadlock (pending tasks with no running tasks)
|
||||
let has_pending = task_states.values().any(|s| *s == TaskState::Pending);
|
||||
if has_pending && active_tasks_count == 0 {
|
||||
warn!("Potential deadlock detected: {} pending tasks with no running tasks",
|
||||
task_states.values().filter(|s| **s == TaskState::Pending).count());
|
||||
|
||||
// Log details of pending tasks and their preconditions
|
||||
for (key, state) in &task_states {
|
||||
if *state == TaskState::Pending {
|
||||
if let Some(task) = original_tasks_by_key.get(key) {
|
||||
warn!("Pending task: {} ({})", task.job.as_ref().unwrap().label, key);
|
||||
warn!(" Required inputs:");
|
||||
for dep in &task.config.as_ref().unwrap().inputs {
|
||||
if dep.dep_type_code == 1 { // MATERIALIZE = 1
|
||||
let available = completed_outputs.contains(&dep.partition_ref.as_ref().unwrap().str);
|
||||
warn!(" {} - {}", dep.partition_ref.as_ref().unwrap().str, if available { "AVAILABLE" } else { "MISSING" });
|
||||
}
|
||||
}
|
||||
warn!(" Produces outputs:");
|
||||
for output in &task.config.as_ref().unwrap().outputs {
|
||||
warn!(" {}", output.str);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
last_log_time = Instant::now();
|
||||
}
|
||||
|
||||
// 5. Check completion
|
||||
let all_done = task_states.values().all(|s| *s == TaskState::Succeeded || *s == TaskState::Failed);
|
||||
if active_tasks_count == 0 && all_done {
|
||||
info!("All tasks are in a terminal state and no tasks are active.");
|
||||
break;
|
||||
}
|
||||
|
||||
// Avoid busy-waiting if no events, give channels time
|
||||
// Select would be better here, but for simplicity:
|
||||
thread::sleep(Duration::from_millis(50));
|
||||
}
|
||||
|
||||
info!("Shutting down workers...");
|
||||
drop(task_tx); // Signal workers to stop by closing the task channel
|
||||
|
||||
for handle in worker_handles {
|
||||
handle.join().expect("Failed to join worker thread");
|
||||
}
|
||||
info!("All workers finished.");
|
||||
|
||||
// Final processing of any remaining results (should be minimal if loop logic is correct)
|
||||
while let Ok(result) = result_rx.try_recv() {
|
||||
active_tasks_count -= 1; // Should be 0
|
||||
info!(
|
||||
"Received late result for task {}: Success: {}",
|
||||
result.job_label, result.success
|
||||
);
|
||||
// Update state for completeness, though it might not affect overall outcome now
|
||||
let current_state = if result.success { TaskState::Succeeded } else { TaskState::Failed };
|
||||
task_states.insert(result.task_key.clone(), current_state);
|
||||
job_results.push(result);
|
||||
}
|
||||
|
||||
|
||||
let success_count = job_results.iter().filter(|r| r.success).count();
|
||||
let failure_count = job_results.len() - success_count;
|
||||
|
||||
info!("Execution complete: {} succeeded, {} failed", success_count, failure_count);
|
||||
|
||||
|
||||
// Log final build request status (existing detailed event)
|
||||
if let Some(ref query_engine) = build_event_log {
|
||||
let final_status = if failure_count > 0 || fail_fast_triggered {
|
||||
BuildRequestStatus::BuildRequestFailed
|
||||
} else {
|
||||
BuildRequestStatus::BuildRequestCompleted
|
||||
};
|
||||
|
||||
let event = create_build_event(
|
||||
build_request_id.clone(),
|
||||
EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: final_status as i32,
|
||||
status_name: final_status.to_display_string(),
|
||||
requested_partitions: graph.outputs.clone(),
|
||||
message: format!("Execution completed: {} succeeded, {} failed", success_count, failure_count),
|
||||
})
|
||||
);
|
||||
if let Err(e) = query_engine.append_event(event).await {
|
||||
error!("Failed to log final build request event: {}", e);
|
||||
}
|
||||
}
|
||||
|
||||
if failure_count > 0 || fail_fast_triggered {
|
||||
error!("Execution finished with errors.");
|
||||
std::process::exit(1);
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}

@@ -1,13 +0,0 @@
#!/bin/bash
set -e

%{RUNFILES_PREFIX}

%{PREFIX}

# Locate the Rust binary using its standard runfiles path
# Assumes workspace name is 'databuild'
EXECUTABLE_BINARY="$(rlocation "databuild/databuild/graph/analyze")"

# Run the analysis
exec "${EXECUTABLE_BINARY}" "$@"

@@ -1,11 +0,0 @@
#!/bin/bash
set -e

%{RUNFILES_PREFIX}

%{PREFIX}

EXECUTABLE_BINARY="$(rlocation "databuild/databuild/graph/execute")"

# Run the execution
exec "${EXECUTABLE_BINARY}" "$@"

@@ -1,5 +0,0 @@
sh_test(
    name = "analyze_test",
    srcs = ["analyze_test.sh"],
    data = ["//databuild/graph:analyze"],
)

@@ -1,3 +0,0 @@
#!/usr/bin/env bash

DATABUILD_MODE=import_test DATABUILD_JOB_LOOKUP_PATH=foo DATABUILD_CANDIDATE_JOBS=bar databuild/graph/analyze

647 databuild/http_server.rs Normal file
@@ -0,0 +1,647 @@
use crate::build_event_log::BELStorage;
|
||||
use crate::build_state::BuildState;
|
||||
use crate::commands::Command;
|
||||
use crate::lineage::build_lineage_graph;
|
||||
use crate::web::templates::{
|
||||
BaseContext, DerivativeWantView, HomePage, JobRunDetailPage, JobRunDetailView, JobRunsListPage,
|
||||
PartitionDetailPage, PartitionDetailView, PartitionsListPage, WantCreatePage, WantDetailPage,
|
||||
WantDetailView, WantsListPage,
|
||||
};
|
||||
use crate::{
|
||||
CancelWantRequest, CreateWantRequest, CreateWantResponse, GetWantRequest, GetWantResponse,
|
||||
ListJobRunsRequest, ListJobRunsResponse, ListPartitionsRequest, ListPartitionsResponse,
|
||||
ListWantsRequest, ListWantsResponse, PartitionStatusCode,
|
||||
};
|
||||
use askama::Template;
|
||||
use axum::{
|
||||
Json, Router,
|
||||
extract::{Path, Query, Request, State},
|
||||
http::{HeaderValue, Method, StatusCode},
|
||||
middleware::{self, Next},
|
||||
response::{Html, IntoResponse, Response},
|
||||
routing::{delete, get, post},
|
||||
};
|
||||
use std::sync::{
|
||||
Arc, RwLock,
|
||||
atomic::{AtomicU64, Ordering},
|
||||
};
|
||||
use std::time::{SystemTime, UNIX_EPOCH};
|
||||
use tokio::sync::{broadcast, mpsc, oneshot};
|
||||
use tower_http::cors::CorsLayer;
|
||||
|
||||
/// Shared application state for HTTP handlers
|
||||
#[derive(Clone)]
|
||||
pub struct AppState {
|
||||
/// Mirrored build state (updated via event broadcast from orchestrator)
|
||||
pub build_state: Arc<RwLock<BuildState>>,
|
||||
/// Shared read-only access to BEL storage (for event log queries if needed)
|
||||
pub bel_storage: Arc<dyn BELStorage>,
|
||||
/// Command sender for write operations (sends to orchestrator)
|
||||
pub command_tx: mpsc::Sender<Command>,
|
||||
/// For idle timeout tracking (epoch millis)
|
||||
pub last_request_time: Arc<AtomicU64>,
|
||||
/// Broadcast channel for shutdown signal
|
||||
pub shutdown_tx: broadcast::Sender<()>,
|
||||
}
|
||||
|
||||
impl AppState {
|
||||
pub fn new(
|
||||
build_state: Arc<RwLock<BuildState>>,
|
||||
bel_storage: Arc<dyn BELStorage>,
|
||||
command_tx: mpsc::Sender<Command>,
|
||||
shutdown_tx: broadcast::Sender<()>,
|
||||
) -> Self {
|
||||
// Initialize last_request_time to current time
|
||||
let now = SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_millis() as u64;
|
||||
|
||||
Self {
|
||||
build_state,
|
||||
bel_storage,
|
||||
command_tx,
|
||||
last_request_time: Arc::new(AtomicU64::new(now)),
|
||||
shutdown_tx,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Middleware to update last request time
|
||||
async fn update_last_request_time(
|
||||
State(state): State<AppState>,
|
||||
req: Request,
|
||||
next: Next,
|
||||
) -> Response {
|
||||
state.last_request_time.store(
|
||||
SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_millis() as u64,
|
||||
Ordering::Relaxed,
|
||||
);
|
||||
next.run(req).await
|
||||
}
|
||||
|
||||
/// Create the Axum router with all endpoints
|
||||
pub fn create_router(state: AppState) -> Router {
|
||||
// Configure CORS for web app development
|
||||
let cors = CorsLayer::new()
|
||||
.allow_origin("http://localhost:3538".parse::<HeaderValue>().unwrap())
|
||||
.allow_methods([Method::GET, Method::POST, Method::DELETE, Method::OPTIONS])
|
||||
.allow_headers([
|
||||
axum::http::header::CONTENT_TYPE,
|
||||
axum::http::header::AUTHORIZATION,
|
||||
]);
|
||||
|
||||
Router::new()
|
||||
// Health check
|
||||
.route("/health", get(health))
|
||||
// HTML pages
|
||||
.route("/", get(home_page))
|
||||
.route("/wants", get(wants_list_page))
|
||||
.route("/wants/create", get(want_create_page))
|
||||
.route("/wants/:id", get(want_detail_page))
|
||||
.route("/partitions", get(partitions_list_page))
|
||||
.route("/partitions/*id", get(partition_detail_page))
|
||||
.route("/job_runs", get(job_runs_list_page))
|
||||
.route("/job_runs/:id", get(job_run_detail_page))
|
||||
// JSON API endpoints
|
||||
.route("/api/wants", get(list_wants_json))
|
||||
.route("/api/wants", post(create_want))
|
||||
.route("/api/wants/:id", get(get_want_json))
|
||||
.route("/api/wants/:id", delete(cancel_want))
|
||||
.route("/api/partitions", get(list_partitions_json))
|
||||
.route("/api/job_runs", get(list_job_runs_json))
|
||||
// Add CORS middleware
|
||||
.layer(cors)
|
||||
// Add middleware to track request time
|
||||
.layer(middleware::from_fn_with_state(
|
||||
state.clone(),
|
||||
update_last_request_time,
|
||||
))
|
||||
.with_state(state)
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Error Handling
|
||||
// ============================================================================
|
||||
|
||||
/// Standard error response structure
|
||||
#[derive(serde::Serialize)]
|
||||
struct ErrorResponse {
|
||||
error: String,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
details: Option<serde_json::Value>,
|
||||
}
|
||||
|
||||
impl ErrorResponse {
|
||||
fn new(error: impl Into<String>) -> Self {
|
||||
Self {
|
||||
error: error.into(),
|
||||
details: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn with_details(error: impl Into<String>, details: serde_json::Value) -> Self {
|
||||
Self {
|
||||
error: error.into(),
|
||||
details: Some(details),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// HTML Page Handlers
|
||||
// ============================================================================
|
||||
|
||||
/// Home page
|
||||
async fn home_page(State(state): State<AppState>) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
// Count active wants (not successful or canceled)
|
||||
let active_wants_count = build_state
|
||||
.list_wants(&ListWantsRequest::default())
|
||||
.data
|
||||
.iter()
|
||||
.filter(|w| {
|
||||
w.status
|
||||
.as_ref()
|
||||
.map(|s| s.name != "Successful" && s.name != "Canceled")
|
||||
.unwrap_or(true)
|
||||
})
|
||||
.count() as u64;
|
||||
|
||||
// Count active job runs (running or queued)
|
||||
let active_job_runs_count = build_state
|
||||
.list_job_runs(&ListJobRunsRequest::default())
|
||||
.data
|
||||
.iter()
|
||||
.filter(|jr| {
|
||||
jr.status
|
||||
.as_ref()
|
||||
.map(|s| s.name == "Running" || s.name == "Queued")
|
||||
.unwrap_or(false)
|
||||
})
|
||||
.count() as u64;
|
||||
|
||||
// Count live partitions
|
||||
let live_partitions_count = build_state
|
||||
.list_partitions(&ListPartitionsRequest::default())
|
||||
.data
|
||||
.iter()
|
||||
.filter(|p| {
|
||||
p.status
|
||||
.as_ref()
|
||||
.map(|s| s.code == PartitionStatusCode::PartitionLive as i32)
|
||||
.unwrap_or(false)
|
||||
})
|
||||
.count() as u64;
|
||||
|
||||
let template = HomePage {
|
||||
base: BaseContext::default(),
|
||||
active_wants_count,
|
||||
active_job_runs_count,
|
||||
live_partitions_count,
|
||||
};
|
||||
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => {
|
||||
tracing::error!("Template render error: {}", e);
|
||||
Html(format!("<h1>Template error: {}</h1>", e)).into_response()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Wants list page
|
||||
async fn wants_list_page(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListWantsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
let response = build_state.list_wants(¶ms);
|
||||
let template = WantsListPage {
|
||||
base: BaseContext::default(),
|
||||
wants: response
|
||||
.data
|
||||
.into_iter()
|
||||
.map(WantDetailView::from)
|
||||
.collect(),
|
||||
page: response.page,
|
||||
page_size: response.page_size,
|
||||
total_count: response.match_count,
|
||||
};
|
||||
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => {
|
||||
tracing::error!("Template render error: {}", e);
|
||||
Html(format!("<h1>Template error: {}</h1>", e)).into_response()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Want detail page
|
||||
async fn want_detail_page(
|
||||
State(state): State<AppState>,
|
||||
Path(want_id): Path<String>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
match build_state.get_want(&want_id) {
|
||||
Some(want) => {
|
||||
// Fetch derivative wants
|
||||
let derivative_wants: Vec<_> = want
|
||||
.derivative_want_ids
|
||||
.iter()
|
||||
.filter_map(|id| build_state.get_want(id))
|
||||
.map(|w| DerivativeWantView::from(&w))
|
||||
.collect();
|
||||
|
||||
// Build lineage graph (up to 3 generations)
|
||||
let lineage_mermaid = build_lineage_graph(&build_state, &want.want_id, 3).to_mermaid();
|
||||
|
||||
let template = WantDetailPage {
|
||||
base: BaseContext::default(),
|
||||
want: WantDetailView::new(&want, derivative_wants, lineage_mermaid),
|
||||
};
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
None => (
|
||||
StatusCode::NOT_FOUND,
|
||||
Html("<h1>Want not found</h1>".to_string()),
|
||||
)
|
||||
.into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Want create page
|
||||
async fn want_create_page() -> impl IntoResponse {
|
||||
let template = WantCreatePage {
|
||||
base: BaseContext::default(),
|
||||
};
|
||||
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Partitions list page
|
||||
async fn partitions_list_page(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListPartitionsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
let response = build_state.list_partitions(¶ms);
|
||||
let template = PartitionsListPage {
|
||||
base: BaseContext::default(),
|
||||
partitions: response
|
||||
.data
|
||||
.into_iter()
|
||||
.map(PartitionDetailView::from)
|
||||
.collect(),
|
||||
page: response.page,
|
||||
page_size: response.page_size,
|
||||
total_count: response.match_count,
|
||||
};
|
||||
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Partition detail page
|
||||
async fn partition_detail_page(
|
||||
State(state): State<AppState>,
|
||||
Path(partition_ref): Path<String>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
// Axum's Path extractor automatically percent-decodes the path parameter
|
||||
match build_state.get_partition(&partition_ref) {
|
||||
Some(partition) => {
|
||||
let template = PartitionDetailPage {
|
||||
base: BaseContext::default(),
|
||||
partition: PartitionDetailView::from(partition),
|
||||
};
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
None => (
|
||||
StatusCode::NOT_FOUND,
|
||||
Html("<h1>Partition not found</h1>".to_string()),
|
||||
)
|
||||
.into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Job runs list page
|
||||
async fn job_runs_list_page(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListJobRunsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
let response = build_state.list_job_runs(¶ms);
|
||||
let template = JobRunsListPage {
|
||||
base: BaseContext::default(),
|
||||
job_runs: response
|
||||
.data
|
||||
.into_iter()
|
||||
.map(JobRunDetailView::from)
|
||||
.collect(),
|
||||
page: response.page,
|
||||
page_size: response.page_size,
|
||||
total_count: response.match_count,
|
||||
};
|
||||
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Job run detail page
|
||||
async fn job_run_detail_page(
|
||||
State(state): State<AppState>,
|
||||
Path(job_run_id): Path<String>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(s) => s,
|
||||
Err(_) => return Html("<h1>Error: state lock poisoned</h1>".to_string()).into_response(),
|
||||
};
|
||||
|
||||
match build_state.get_job_run(&job_run_id) {
|
||||
Some(job_run) => {
|
||||
let template = JobRunDetailPage {
|
||||
base: BaseContext::default(),
|
||||
job_run: JobRunDetailView::from(job_run),
|
||||
};
|
||||
match template.render() {
|
||||
Ok(html) => Html(html).into_response(),
|
||||
Err(e) => Html(format!("<h1>Template error: {}</h1>", e)).into_response(),
|
||||
}
|
||||
}
|
||||
None => (
|
||||
StatusCode::NOT_FOUND,
|
||||
Html("<h1>Job run not found</h1>".to_string()),
|
||||
)
|
||||
.into_response(),
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// JSON API Handlers
|
||||
// ============================================================================
|
||||
|
||||
/// Health check endpoint
|
||||
async fn health() -> impl IntoResponse {
|
||||
(StatusCode::OK, "OK")
|
||||
}
|
||||
|
||||
/// List all wants (JSON)
|
||||
async fn list_wants_json(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListWantsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(state) => state,
|
||||
Err(e) => {
|
||||
tracing::error!("Failed to acquire read lock on build state: {}", e);
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(
|
||||
"Internal server error: state lock poisoned",
|
||||
)),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
};
|
||||
let response = build_state.list_wants_with_index(¶ms);
|
||||
|
||||
(StatusCode::OK, Json(response)).into_response()
|
||||
}
|
||||
|
||||
/// Get a specific want by ID (JSON)
|
||||
async fn get_want_json(
|
||||
State(state): State<AppState>,
|
||||
Path(want_id): Path<String>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(state) => state,
|
||||
Err(e) => {
|
||||
tracing::error!("Failed to acquire read lock on build state: {}", e);
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(
|
||||
"Internal server error: state lock poisoned",
|
||||
)),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
};
|
||||
match build_state.get_want_with_index(&want_id) {
|
||||
Some(response) => (StatusCode::OK, Json(response)).into_response(),
|
||||
None => {
|
||||
tracing::debug!("Want not found: {}", want_id);
|
||||
(
|
||||
StatusCode::NOT_FOUND,
|
||||
Json(ErrorResponse::with_details(
|
||||
"Want not found",
|
||||
serde_json::json!({"want_id": want_id}),
|
||||
)),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a new want
|
||||
async fn create_want(
|
||||
State(state): State<AppState>,
|
||||
Json(req): Json<CreateWantRequest>,
|
||||
) -> impl IntoResponse {
|
||||
// Create oneshot channel for reply
|
||||
let (reply_tx, reply_rx) = oneshot::channel();
|
||||
|
||||
// Send command to orchestrator
|
||||
let command = Command::CreateWant {
|
||||
request: req,
|
||||
reply: reply_tx,
|
||||
};
|
||||
|
||||
if let Err(_) = state.command_tx.send(command).await {
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(serde_json::json!({
|
||||
"error": "Failed to send command to orchestrator"
|
||||
})),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
|
||||
// Wait for orchestrator reply
|
||||
match reply_rx.await {
|
||||
Ok(Ok(response)) => {
|
||||
tracing::info!(
|
||||
"Created want: {}",
|
||||
response
|
||||
.data
|
||||
.as_ref()
|
||||
.map(|w| &w.want_id)
|
||||
.unwrap_or(&"unknown".to_string())
|
||||
);
|
||||
(StatusCode::OK, Json(response)).into_response()
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
tracing::error!("Failed to create want: {}", e);
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(format!("Failed to create want: {}", e))),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
Err(_) => {
|
||||
tracing::error!("Orchestrator did not respond to create want command");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new("Orchestrator did not respond")),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Cancel a want
|
||||
async fn cancel_want(
|
||||
State(state): State<AppState>,
|
||||
Path(want_id): Path<String>,
|
||||
) -> impl IntoResponse {
|
||||
// Create oneshot channel for reply
|
||||
let (reply_tx, reply_rx) = oneshot::channel();
|
||||
|
||||
// Send command to orchestrator
|
||||
let command = Command::CancelWant {
|
||||
request: CancelWantRequest {
|
||||
want_id,
|
||||
source: None, // HTTP requests don't have a source
|
||||
comment: None,
|
||||
},
|
||||
reply: reply_tx,
|
||||
};
|
||||
|
||||
if let Err(_) = state.command_tx.send(command).await {
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(serde_json::json!({
|
||||
"error": "Failed to send command to orchestrator"
|
||||
})),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
|
||||
// Wait for orchestrator reply
|
||||
match reply_rx.await {
|
||||
Ok(Ok(response)) => {
|
||||
tracing::info!(
|
||||
"Cancelled want: {}",
|
||||
response
|
||||
.data
|
||||
.as_ref()
|
||||
.map(|w| &w.want_id)
|
||||
.unwrap_or(&"unknown".to_string())
|
||||
);
|
||||
(StatusCode::OK, Json(response)).into_response()
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
tracing::error!("Failed to cancel want: {}", e);
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(format!("Failed to cancel want: {}", e))),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
Err(_) => {
|
||||
tracing::error!("Orchestrator did not respond to cancel want command");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new("Orchestrator did not respond")),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// List all partitions (JSON)
|
||||
async fn list_partitions_json(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListPartitionsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(state) => state,
|
||||
Err(e) => {
|
||||
tracing::error!("Failed to acquire read lock on build state: {}", e);
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(
|
||||
"Internal server error: state lock poisoned",
|
||||
)),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
};
|
||||
let response = build_state.list_partitions_with_index(¶ms);
|
||||
|
||||
(StatusCode::OK, Json(response)).into_response()
|
||||
}
|
||||
|
||||
/// List all job runs (JSON)
|
||||
async fn list_job_runs_json(
|
||||
State(state): State<AppState>,
|
||||
Query(params): Query<ListJobRunsRequest>,
|
||||
) -> impl IntoResponse {
|
||||
let build_state = match state.build_state.read() {
|
||||
Ok(state) => state,
|
||||
Err(e) => {
|
||||
tracing::error!("Failed to acquire read lock on build state: {}", e);
|
||||
return (
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
Json(ErrorResponse::new(
|
||||
"Internal server error: state lock poisoned",
|
||||
)),
|
||||
)
|
||||
.into_response();
|
||||
}
|
||||
};
|
||||
let response = build_state.list_job_runs_with_index(¶ms);
|
||||
|
||||
(StatusCode::OK, Json(response)).into_response()
|
||||
}
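The handlers above follow two access patterns: reads take a snapshot of the derived state through `state.build_state.read()`, while mutations (`create_want`, `cancel_want`) go through the orchestrator's command channel and wait on a oneshot reply. A minimal sketch of how these handlers might be mounted on a single axum `Router` is below; the route paths, the axum 0.7-style `:param` syntax, the hypothetical `build_router` helper, and the `AppState` construction are illustrative assumptions, not taken from the diff itself.

```rust
use axum::{routing::{get, post}, Router};
use std::sync::{Arc, RwLock};
use tokio::sync::mpsc;

// Sketch only: assumes AppState is Clone and exposes exactly the two fields
// the handlers above use (build_state, command_tx).
fn build_router(build_state: Arc<RwLock<BuildState>>, command_tx: mpsc::Sender<Command>) -> Router {
    let state = AppState { build_state, command_tx };
    Router::new()
        // HTML pages read the derived BuildState directly
        .route("/wants/new", get(want_create_page))
        .route("/wants/:want_id", get(want_detail_page))
        .route("/partitions", get(partitions_list_page))
        .route("/partitions/:partition_ref", get(partition_detail_page))
        .route("/job_runs", get(job_runs_list_page))
        .route("/job_runs/:job_run_id", get(job_run_detail_page))
        // JSON API; writes go through the orchestrator command channel
        .route("/api/health", get(health))
        .route("/api/wants", get(list_wants_json).post(create_want))
        .route("/api/wants/:want_id", get(get_want_json))
        .route("/api/wants/:want_id/cancel", post(cancel_want))
        .route("/api/partitions", get(list_partitions_json))
        .route("/api/job_runs", get(list_job_runs_json))
        .with_state(state)
}
```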
databuild/job.rs (new file, 51 lines)
@@ -0,0 +1,51 @@
use crate::job_run::{JobRunHandle, SubProcessBackend};
use crate::util::DatabuildError;
use crate::{JobConfig, PartitionRef, WantDetail};
use regex::Regex;
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct JobConfiguration {
    pub label: String,
    pub patterns: Vec<String>,
    pub entry_point: String,
    pub environment: HashMap<String, String>,
}

impl JobConfiguration {
    /** Launch job to build the partitions specified by the provided wants. */
    pub fn spawn(
        &self,
        wants: Vec<WantDetail>,
    ) -> Result<JobRunHandle<SubProcessBackend>, std::io::Error> {
        let wanted_refs: Vec<PartitionRef> = wants
            .iter()
            .flat_map(|want| want.partitions.clone())
            .collect();
        let args: Vec<String> = wanted_refs.iter().map(|pref| pref.r#ref.clone()).collect();
        Ok(JobRunHandle::spawn(self.entry_point.clone(), args))
    }

    pub fn matches(&self, refs: &PartitionRef) -> bool {
        self.patterns.iter().any(|pattern| {
            let regex = Regex::new(pattern).expect(&format!("Invalid regex pattern: {}", pattern));
            regex.is_match(&refs.r#ref)
        })
    }
}

impl From<JobConfig> for JobConfiguration {
    fn from(config: JobConfig) -> Self {
        Self {
            label: config.label,
            patterns: config.partition_patterns,
            entry_point: config.entrypoint,
            environment: config.environment,
        }
    }
}

pub fn parse_job_configuration(s: &str) -> Result<JobConfiguration, DatabuildError> {
    let cfg: JobConfig = serde_json::from_str(s)?;
    Ok(cfg.into())
}
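As a quick illustration of how `JobConfiguration` is meant to be used: parse a `JobConfig` from JSON, then check whether its partition patterns cover a wanted partition ref before spawning a run for the matching wants. This is a sketch only; the `databuild::job` module path, the exact JSON field names, and the single-field `PartitionRef` literal are assumptions based on the code above rather than confirmed API details.

```rust
use databuild::job::{parse_job_configuration, JobConfiguration};
use databuild::PartitionRef;

// First configured job whose partition patterns match the wanted ref.
fn pick_job<'a>(jobs: &'a [JobConfiguration], wanted: &PartitionRef) -> Option<&'a JobConfiguration> {
    jobs.iter().find(|job| job.matches(wanted))
}

fn main() {
    // Field names mirror the From<JobConfig> conversion above (label,
    // partition_patterns, entrypoint, environment); treat the exact JSON
    // shape as an assumption.
    let cfg = parse_job_configuration(
        r#"{
            "label": "//jobs:daily_reviews",
            "partition_patterns": ["^reviews/date=.*$"],
            "entrypoint": "./bazel-bin/jobs/daily_reviews",
            "environment": {}
        }"#,
    )
    .expect("config should parse");

    // Assumes PartitionRef carries only its ref string.
    let wanted = PartitionRef { r#ref: "reviews/date=2025-01-01".to_string() };
    assert!(pick_job(&[cfg], &wanted).is_some());
}
```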
@@ -1,27 +0,0 @@
load("@rules_rust//rust:defs.bzl", "rust_binary", "rust_test")
|
||||
|
||||
rust_binary(
|
||||
name = "job_wrapper",
|
||||
srcs = ["main.rs"],
|
||||
visibility = ["//visibility:public"],
|
||||
deps = [
|
||||
"//databuild",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:uuid",
|
||||
"@crates//:sysinfo",
|
||||
],
|
||||
)
|
||||
|
||||
rust_test(
|
||||
name = "job_wrapper_test",
|
||||
srcs = ["main.rs"],
|
||||
deps = [
|
||||
"//databuild",
|
||||
"@crates//:serde",
|
||||
"@crates//:serde_json",
|
||||
"@crates//:uuid",
|
||||
"@crates//:sysinfo",
|
||||
"@crates//:tempfile",
|
||||
],
|
||||
)
|
||||
@@ -1,4 +0,0 @@
# DataBuild Jobs

Contains wrappers and tools for implementing DataBuild jobs.
@@ -1,985 +0,0 @@
use std::env;
|
||||
use std::io::{self, Read, Write};
|
||||
use std::process::{Command, Stdio};
|
||||
use std::sync::{mpsc, Arc, Mutex};
|
||||
use std::thread;
|
||||
use std::time::{Duration, SystemTime, UNIX_EPOCH};
|
||||
// All serialization handled by protobuf serde derives
|
||||
use serde_json;
|
||||
use sysinfo::{Pid, ProcessRefreshKind, System};
|
||||
use uuid::Uuid;
|
||||
|
||||
// Import protobuf types from databuild
|
||||
use databuild::{
|
||||
job_log_entry, log_message, JobConfig, JobLabel, JobLogEntry, LogMessage, PartitionManifest,
|
||||
PartitionRef, Task, WrapperJobEvent,
|
||||
};
|
||||
|
||||
// All types now come from protobuf - no custom structs needed
|
||||
|
||||
// Configuration constants
|
||||
const DEFAULT_HEARTBEAT_INTERVAL_MS: u64 = 30_000; // 30 seconds
|
||||
const DEFAULT_METRICS_INTERVAL_MS: u64 = 100; // 100 milliseconds
|
||||
const TEST_HEARTBEAT_INTERVAL_MS: u64 = 100; // Fast heartbeats for testing
|
||||
const TEST_METRICS_INTERVAL_MS: u64 = 50; // Fast metrics for testing
|
||||
|
||||
#[derive(Debug)]
|
||||
struct HeartbeatMessage {
|
||||
entry: JobLogEntry,
|
||||
}
|
||||
|
||||
fn get_timestamp() -> String {
|
||||
SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_secs()
|
||||
.to_string()
|
||||
}
|
||||
|
||||
trait LogSink {
|
||||
fn emit(&mut self, entry: JobLogEntry);
|
||||
}
|
||||
|
||||
struct StdoutSink;
|
||||
|
||||
impl LogSink for StdoutSink {
|
||||
fn emit(&mut self, entry: JobLogEntry) {
|
||||
println!("{}", serde_json::to_string(&entry).unwrap());
|
||||
}
|
||||
}
|
||||
|
||||
struct JobWrapper<S: LogSink> {
|
||||
job_id: String,
|
||||
sequence_number: u64,
|
||||
start_time: i64,
|
||||
sink: S,
|
||||
}
|
||||
|
||||
impl JobWrapper<StdoutSink> {
|
||||
fn new() -> Self {
|
||||
Self::new_with_sink(StdoutSink)
|
||||
}
|
||||
}
|
||||
|
||||
impl<S: LogSink> JobWrapper<S> {
|
||||
fn new_with_sink(sink: S) -> Self {
|
||||
// Use job ID from environment if provided by graph execution, otherwise generate one
|
||||
let job_id = env::var("DATABUILD_JOB_RUN_ID")
|
||||
.unwrap_or_else(|_| Uuid::new_v4().to_string());
|
||||
|
||||
Self {
|
||||
job_id,
|
||||
sequence_number: 0,
|
||||
start_time: SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_secs() as i64,
|
||||
sink,
|
||||
}
|
||||
}
|
||||
|
||||
fn next_sequence(&mut self) -> u64 {
|
||||
self.sequence_number += 1;
|
||||
self.sequence_number
|
||||
}
|
||||
|
||||
fn emit_log(&mut self, outputs: &[PartitionRef], content: job_log_entry::Content) {
|
||||
let entry = JobLogEntry {
|
||||
timestamp: get_timestamp(),
|
||||
job_id: self.job_id.clone(),
|
||||
outputs: outputs.to_vec(),
|
||||
sequence_number: self.next_sequence(),
|
||||
content: Some(content),
|
||||
};
|
||||
|
||||
self.sink.emit(entry);
|
||||
}
|
||||
|
||||
fn config_mode(&mut self, outputs: Vec<String>) -> Result<(), Box<dyn std::error::Error>> {
|
||||
// Convert to PartitionRef objects
|
||||
let output_refs: Vec<PartitionRef> = outputs
|
||||
.iter()
|
||||
.map(|s| PartitionRef { r#str: s.clone() })
|
||||
.collect();
|
||||
|
||||
// Following the state diagram: wrapper_validate_config -> emit_config_validate_success
|
||||
self.emit_log(
|
||||
&output_refs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "config_validate_success".to_string(),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_status: None,
|
||||
exit_code: None,
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
// For Phase 0, we still need to produce the expected JSON config format
|
||||
// so the current graph system can parse it. Later phases will change this.
|
||||
let config = JobConfig {
|
||||
outputs: output_refs.clone(),
|
||||
inputs: vec![],
|
||||
args: outputs.clone(),
|
||||
env: {
|
||||
let mut env_map = std::collections::HashMap::new();
|
||||
if let Some(partition_ref) = outputs.first() {
|
||||
env_map.insert("PARTITION_REF".to_string(), partition_ref.clone());
|
||||
}
|
||||
env_map
|
||||
},
|
||||
};
|
||||
|
||||
// For config mode, we need to output the standard config format to stdout
|
||||
// The structured logs will come later during exec mode
|
||||
let configs_wrapper = serde_json::json!({
|
||||
"configs": [config]
|
||||
});
|
||||
|
||||
println!("{}", serde_json::to_string(&configs_wrapper)?);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn exec_mode(&mut self, job_binary: &str) -> Result<(), Box<dyn std::error::Error>> {
|
||||
// Read the job config from stdin
|
||||
let mut buffer = String::new();
|
||||
io::stdin().read_to_string(&mut buffer)?;
|
||||
|
||||
let config: JobConfig = serde_json::from_str(&buffer)?;
|
||||
self.exec_mode_with_config(job_binary, config)
|
||||
}
|
||||
|
||||
fn exec_mode_with_config(
|
||||
&mut self,
|
||||
job_binary: &str,
|
||||
config: JobConfig,
|
||||
) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let outputs = &config.outputs;
|
||||
|
||||
// Following the state diagram:
|
||||
// 1. wrapper_validate_config -> emit_config_validate_success
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "config_validate_success".to_string(),
|
||||
job_status: None,
|
||||
exit_code: None,
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
// 2. wrapper_launch_task -> emit_task_launch_success
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_launch_success".to_string(),
|
||||
job_status: None,
|
||||
exit_code: None,
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
// Execute the original job binary with the exec subcommand
|
||||
let mut cmd = Command::new(job_binary);
|
||||
cmd.arg("exec");
|
||||
|
||||
// Add the args from the config
|
||||
for arg in &config.args {
|
||||
cmd.arg(arg);
|
||||
}
|
||||
|
||||
cmd.stdin(Stdio::piped())
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::piped());
|
||||
|
||||
// Set environment variables from config
|
||||
for (key, value) in &config.env {
|
||||
cmd.env(key, value);
|
||||
}
|
||||
|
||||
let mut child = cmd.spawn()?;
|
||||
let child_pid = child.id();
|
||||
|
||||
// Send the config to the job
|
||||
if let Some(stdin) = child.stdin.as_mut() {
|
||||
stdin.write_all(serde_json::to_string(&config).unwrap().as_bytes())?;
|
||||
}
|
||||
|
||||
// Start heartbeat thread with channel communication
|
||||
let heartbeat_job_id = self.job_id.clone();
|
||||
let heartbeat_outputs = outputs.clone();
|
||||
let heartbeat_sequence = Arc::new(Mutex::new(0u64));
|
||||
let heartbeat_sequence_clone = heartbeat_sequence.clone();
|
||||
let (heartbeat_tx, heartbeat_rx) = mpsc::channel::<HeartbeatMessage>();
|
||||
|
||||
let heartbeat_handle = thread::spawn(move || {
|
||||
let mut system = System::new_all();
|
||||
let pid = Pid::from(child_pid as usize);
|
||||
|
||||
let heartbeat_interval_ms = env::var("DATABUILD_HEARTBEAT_INTERVAL_MS")
|
||||
.unwrap_or_else(|_| DEFAULT_HEARTBEAT_INTERVAL_MS.to_string())
|
||||
.parse::<u64>()
|
||||
.unwrap_or(DEFAULT_HEARTBEAT_INTERVAL_MS);
|
||||
|
||||
loop {
|
||||
thread::sleep(Duration::from_millis(heartbeat_interval_ms));
|
||||
|
||||
// Refresh process info
|
||||
system.refresh_processes_specifics(ProcessRefreshKind::new());
|
||||
|
||||
// Check if process still exists
|
||||
if let Some(process) = system.process(pid) {
|
||||
let memory_mb = process.memory() as f64 / 1024.0 / 1024.0;
|
||||
let cpu_percent = process.cpu_usage();
|
||||
|
||||
// Create heartbeat event with metrics
|
||||
let mut metadata = std::collections::HashMap::new();
|
||||
metadata.insert("memory_usage_mb".to_string(), format!("{:.3}", memory_mb));
|
||||
metadata.insert(
|
||||
"cpu_usage_percent".to_string(),
|
||||
format!("{:.3}", cpu_percent),
|
||||
);
|
||||
|
||||
// Get next sequence number for heartbeat
|
||||
let seq = {
|
||||
let mut seq_lock = heartbeat_sequence_clone.lock().unwrap();
|
||||
*seq_lock += 1;
|
||||
*seq_lock
|
||||
};
|
||||
|
||||
let heartbeat_event = JobLogEntry {
|
||||
timestamp: get_timestamp(),
|
||||
job_id: heartbeat_job_id.clone(),
|
||||
outputs: heartbeat_outputs.clone(),
|
||||
sequence_number: seq,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "heartbeat".to_string(),
|
||||
job_status: None,
|
||||
exit_code: None,
|
||||
metadata,
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
})),
|
||||
};
|
||||
|
||||
// Send heartbeat through channel instead of printing directly
|
||||
if heartbeat_tx.send(HeartbeatMessage { entry: heartbeat_event }).is_err() {
|
||||
// Main thread dropped receiver, exit
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
// Process no longer exists, exit heartbeat thread
|
||||
break;
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Track metrics while job is running
|
||||
let job_start_time = SystemTime::now();
|
||||
let mut system = System::new();
|
||||
let pid = Pid::from(child_pid as usize);
|
||||
|
||||
// Initial refresh to establish baseline for CPU measurements
|
||||
system.refresh_cpu();
|
||||
system.refresh_processes_specifics(ProcessRefreshKind::new().with_cpu());
|
||||
|
||||
let mut peak_memory_mb = 0.0f64;
|
||||
let mut cpu_samples = Vec::new();
|
||||
let mut stdout_buffer = Vec::new();
|
||||
let mut stderr_buffer = Vec::new();
|
||||
|
||||
// Sleep briefly to allow the process to start up before measuring
|
||||
let sample_interval_ms = env::var("DATABUILD_METRICS_INTERVAL_MS")
|
||||
.unwrap_or_else(|_| DEFAULT_METRICS_INTERVAL_MS.to_string())
|
||||
.parse::<u64>()
|
||||
.unwrap_or(DEFAULT_METRICS_INTERVAL_MS);
|
||||
thread::sleep(Duration::from_millis(sample_interval_ms));
|
||||
|
||||
// Poll process status and metrics
|
||||
let (output, peak_memory_mb, total_cpu_ms, job_duration) = loop {
|
||||
// Check if process has exited
|
||||
match child.try_wait()? {
|
||||
Some(status) => {
|
||||
// Process has exited, collect any remaining output
|
||||
if let Some(mut stdout) = child.stdout.take() {
|
||||
stdout.read_to_end(&mut stdout_buffer)?;
|
||||
}
|
||||
if let Some(mut stderr) = child.stderr.take() {
|
||||
stderr.read_to_end(&mut stderr_buffer)?;
|
||||
}
|
||||
|
||||
// Calculate final metrics
|
||||
let job_duration = job_start_time.elapsed().map_err(|e| {
|
||||
io::Error::new(
|
||||
io::ErrorKind::Other,
|
||||
format!("Time calculation error: {}", e),
|
||||
)
|
||||
})?;
|
||||
|
||||
// Calculate CPU time: average CPU percentage * wall-clock time
|
||||
let total_cpu_ms = if cpu_samples.is_empty() {
|
||||
0.0
|
||||
} else {
|
||||
let avg_cpu_percent =
|
||||
cpu_samples.iter().sum::<f32>() as f64 / cpu_samples.len() as f64;
|
||||
(avg_cpu_percent / 100.0) * job_duration.as_millis() as f64
|
||||
};
|
||||
|
||||
// Stop heartbeat thread
|
||||
drop(heartbeat_handle);
|
||||
|
||||
// Process any remaining heartbeat messages
|
||||
while let Ok(heartbeat_msg) = heartbeat_rx.try_recv() {
|
||||
self.sink.emit(heartbeat_msg.entry);
|
||||
}
|
||||
|
||||
// Update sequence number to account for heartbeats
|
||||
let heartbeat_count = heartbeat_sequence.lock().unwrap();
|
||||
self.sequence_number = self.sequence_number.max(*heartbeat_count);
|
||||
drop(heartbeat_count);
|
||||
|
||||
// Create output struct to match original behavior
|
||||
let output = std::process::Output {
|
||||
status,
|
||||
stdout: stdout_buffer,
|
||||
stderr: stderr_buffer,
|
||||
};
|
||||
|
||||
break (output, peak_memory_mb, total_cpu_ms, job_duration);
|
||||
}
|
||||
None => {
|
||||
// Check for heartbeat messages and emit them
|
||||
while let Ok(heartbeat_msg) = heartbeat_rx.try_recv() {
|
||||
self.sink.emit(heartbeat_msg.entry);
|
||||
}
|
||||
|
||||
// Process still running, collect metrics
|
||||
// Refresh CPU info and processes
|
||||
system.refresh_cpu();
|
||||
system.refresh_processes_specifics(ProcessRefreshKind::new().with_cpu());
|
||||
|
||||
// Sleep to allow CPU measurement interval
|
||||
thread::sleep(Duration::from_millis(sample_interval_ms));
|
||||
|
||||
// Refresh again to get updated CPU usage
|
||||
system.refresh_cpu();
|
||||
system.refresh_processes_specifics(ProcessRefreshKind::new().with_cpu());
|
||||
|
||||
if let Some(process) = system.process(pid) {
|
||||
let memory_mb = process.memory() as f64 / 1024.0 / 1024.0;
|
||||
peak_memory_mb = peak_memory_mb.max(memory_mb);
|
||||
let cpu_usage = process.cpu_usage();
|
||||
cpu_samples.push(cpu_usage);
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
let success = output.status.success();
|
||||
let exit_code = output.status.code().unwrap_or(-1);
|
||||
|
||||
// Capture and forward job stdout/stderr as log messages
|
||||
if !output.stdout.is_empty() {
|
||||
let stdout_str = String::from_utf8_lossy(&output.stdout);
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::Log(LogMessage {
|
||||
level: log_message::LogLevel::Info as i32,
|
||||
message: stdout_str.to_string(),
|
||||
fields: std::collections::HashMap::new(),
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
if !output.stderr.is_empty() {
|
||||
let stderr_str = String::from_utf8_lossy(&output.stderr);
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::Log(LogMessage {
|
||||
level: log_message::LogLevel::Error as i32,
|
||||
message: stderr_str.to_string(),
|
||||
fields: std::collections::HashMap::new(),
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
// Emit job summary with resource metrics
|
||||
let mut summary_metadata = std::collections::HashMap::new();
|
||||
summary_metadata.insert(
|
||||
"runtime_ms".to_string(),
|
||||
format!("{:.3}", job_duration.as_millis() as f64),
|
||||
);
|
||||
summary_metadata.insert(
|
||||
"peak_memory_mb".to_string(),
|
||||
format!("{:.3}", peak_memory_mb),
|
||||
);
|
||||
summary_metadata.insert("total_cpu_ms".to_string(), format!("{:.3}", total_cpu_ms));
|
||||
summary_metadata.insert("exit_code".to_string(), exit_code.to_string());
|
||||
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "job_summary".to_string(),
|
||||
job_status: None,
|
||||
exit_code: Some(exit_code),
|
||||
metadata: summary_metadata,
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
if success {
|
||||
// Following the state diagram: wrapper_monitor_task -> zero exit -> emit_task_success
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_success".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(exit_code),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
// Then emit_partition_manifest -> success
|
||||
let end_time = SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_secs() as i64;
|
||||
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::Manifest(PartitionManifest {
|
||||
outputs: config.outputs.clone(),
|
||||
inputs: vec![], // Phase 0: no input manifests yet
|
||||
start_time: self.start_time,
|
||||
end_time,
|
||||
task: Some(Task {
|
||||
job: Some(JobLabel {
|
||||
label: env::var("DATABUILD_JOB_LABEL")
|
||||
.unwrap_or_else(|_| "unknown".to_string()),
|
||||
}),
|
||||
config: Some(config.clone()),
|
||||
}),
|
||||
metadata: std::collections::HashMap::new(), // Phase 0: no metadata yet
|
||||
}),
|
||||
);
|
||||
} else {
|
||||
// Following the state diagram: wrapper_monitor_task -> non-zero exit -> emit_task_failed
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_failed".to_string(),
|
||||
job_status: Some("JOB_FAILED".to_string()),
|
||||
exit_code: Some(exit_code),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
|
||||
// Then emit_job_exec_fail -> fail (don't emit partition manifest on failure)
|
||||
self.emit_log(
|
||||
outputs,
|
||||
job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "job_exec_fail".to_string(),
|
||||
job_status: Some("JOB_FAILED".to_string()),
|
||||
exit_code: Some(exit_code),
|
||||
metadata: {
|
||||
let mut meta = std::collections::HashMap::new();
|
||||
meta.insert(
|
||||
"error".to_string(),
|
||||
format!("Job failed with exit code {}", exit_code),
|
||||
);
|
||||
meta
|
||||
},
|
||||
job_label: None, // Will be enriched by LogCollector
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
// Forward the original job's output to stdout for compatibility
|
||||
io::stdout().write_all(&output.stdout)?;
|
||||
io::stderr().write_all(&output.stderr)?;
|
||||
|
||||
if !success {
|
||||
std::process::exit(exit_code);
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||
let args: Vec<String> = env::args().collect();
|
||||
|
||||
if args.len() < 2 {
|
||||
eprintln!("Usage: job_wrapper <config|exec> [args...]");
|
||||
std::process::exit(1);
|
||||
}
|
||||
|
||||
let mode = &args[1];
|
||||
let mut wrapper = JobWrapper::new();
|
||||
|
||||
match mode.as_str() {
|
||||
"config" => {
|
||||
let outputs = args[2..].to_vec();
|
||||
wrapper.config_mode(outputs)?;
|
||||
}
|
||||
"exec" => {
|
||||
// For exec mode, we need to know which original job binary to call
|
||||
// For Phase 0, we'll derive this from environment or make it configurable
|
||||
let job_binary =
|
||||
env::var("DATABUILD_JOB_BINARY").unwrap_or_else(|_| "python3".to_string()); // Default fallback
|
||||
|
||||
wrapper.exec_mode(&job_binary)?;
|
||||
}
|
||||
_ => {
|
||||
eprintln!("Unknown mode: {}", mode);
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
// Test infrastructure
|
||||
struct TestSink {
|
||||
entries: Vec<JobLogEntry>,
|
||||
}
|
||||
|
||||
impl TestSink {
|
||||
fn new() -> Self {
|
||||
Self {
|
||||
entries: Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
fn find_event(&self, event_type: &str) -> Option<&JobLogEntry> {
|
||||
self.entries.iter().find(|entry| {
|
||||
if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
|
||||
event.event_type == event_type
|
||||
} else {
|
||||
false
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl LogSink for TestSink {
|
||||
fn emit(&mut self, entry: JobLogEntry) {
|
||||
self.entries.push(entry);
|
||||
}
|
||||
}
|
||||
|
||||
// Helper functions for testing
|
||||
fn generate_test_config(outputs: &[String]) -> JobConfig {
|
||||
JobConfig {
|
||||
outputs: outputs
|
||||
.iter()
|
||||
.map(|s| PartitionRef { r#str: s.clone() })
|
||||
.collect(),
|
||||
inputs: vec![],
|
||||
args: outputs.to_vec(),
|
||||
env: {
|
||||
let mut env_map = std::collections::HashMap::new();
|
||||
if let Some(partition_ref) = outputs.first() {
|
||||
env_map.insert("PARTITION_REF".to_string(), partition_ref.clone());
|
||||
}
|
||||
env_map
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_job_log_entry_serialization() {
|
||||
let entry = JobLogEntry {
|
||||
timestamp: "1234567890".to_string(),
|
||||
job_id: "test-id".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "test/partition".to_string() }],
|
||||
sequence_number: 1,
|
||||
content: Some(job_log_entry::Content::Log(LogMessage {
|
||||
level: log_message::LogLevel::Info as i32,
|
||||
message: "test message".to_string(),
|
||||
fields: std::collections::HashMap::new(),
|
||||
})),
|
||||
};
|
||||
|
||||
let json = serde_json::to_string(&entry).unwrap();
|
||||
assert!(json.contains("\"timestamp\":\"1234567890\""));
|
||||
assert!(json.contains("\"sequence_number\":1"));
|
||||
assert!(json.contains("\"Log\":{")); // Capitalized field name
|
||||
assert!(json.contains("\"message\":\"test message\""));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sequence_number_increment() {
|
||||
let mut wrapper = JobWrapper::new();
|
||||
assert_eq!(wrapper.next_sequence(), 1);
|
||||
assert_eq!(wrapper.next_sequence(), 2);
|
||||
assert_eq!(wrapper.next_sequence(), 3);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_mode_output_format() {
|
||||
let outputs = vec!["test/partition".to_string()];
|
||||
let config = generate_test_config(&outputs);
|
||||
|
||||
// Verify it produces expected structure
|
||||
assert_eq!(config.outputs.len(), 1);
|
||||
assert_eq!(config.outputs[0].r#str, "test/partition");
|
||||
assert_eq!(config.args, outputs);
|
||||
assert_eq!(
|
||||
config.env.get("PARTITION_REF"),
|
||||
Some(&"test/partition".to_string())
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_multiple_outputs_config() {
|
||||
let outputs = vec![
|
||||
"reviews/date=2025-01-01".to_string(),
|
||||
"reviews/date=2025-01-02".to_string(),
|
||||
];
|
||||
let config = generate_test_config(&outputs);
|
||||
|
||||
assert_eq!(config.outputs.len(), 2);
|
||||
assert_eq!(config.outputs[0].r#str, "reviews/date=2025-01-01");
|
||||
assert_eq!(config.outputs[1].r#str, "reviews/date=2025-01-02");
|
||||
// First output is used as PARTITION_REF
|
||||
assert_eq!(
|
||||
config.env.get("PARTITION_REF"),
|
||||
Some(&"reviews/date=2025-01-01".to_string())
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_wrapper_job_event_creation() {
|
||||
// Test success event
|
||||
let event = WrapperJobEvent {
|
||||
event_type: "task_success".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None,
|
||||
};
|
||||
assert_eq!(event.event_type, "task_success");
|
||||
assert_eq!(event.job_status, Some("JOB_COMPLETED".to_string()));
|
||||
assert_eq!(event.exit_code, Some(0));
|
||||
|
||||
// Test failure event
|
||||
let event = WrapperJobEvent {
|
||||
event_type: "task_failed".to_string(),
|
||||
job_status: Some("JOB_FAILED".to_string()),
|
||||
exit_code: Some(1),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
job_label: None,
|
||||
};
|
||||
assert_eq!(event.event_type, "task_failed");
|
||||
assert_eq!(event.job_status, Some("JOB_FAILED".to_string()));
|
||||
assert_eq!(event.exit_code, Some(1));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_log_message_levels() {
|
||||
let info_log = LogMessage {
|
||||
level: log_message::LogLevel::Info as i32,
|
||||
message: "info message".to_string(),
|
||||
fields: std::collections::HashMap::new(),
|
||||
};
|
||||
assert_eq!(info_log.level, log_message::LogLevel::Info as i32);
|
||||
|
||||
let error_log = LogMessage {
|
||||
level: log_message::LogLevel::Error as i32,
|
||||
message: "error message".to_string(),
|
||||
fields: std::collections::HashMap::new(),
|
||||
};
|
||||
assert_eq!(error_log.level, log_message::LogLevel::Error as i32);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_partition_manifest_structure() {
|
||||
let config = generate_test_config(&vec!["test/partition".to_string()]);
|
||||
let manifest = PartitionManifest {
|
||||
outputs: config.outputs.clone(),
|
||||
inputs: vec![],
|
||||
start_time: 1234567890,
|
||||
end_time: 1234567900,
|
||||
task: Some(Task {
|
||||
job: Some(JobLabel {
|
||||
label: "//test:job".to_string(),
|
||||
}),
|
||||
config: Some(config),
|
||||
}),
|
||||
metadata: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
assert_eq!(manifest.outputs.len(), 1);
|
||||
assert_eq!(manifest.outputs[0].r#str, "test/partition");
|
||||
assert_eq!(manifest.end_time - manifest.start_time, 10);
|
||||
assert!(manifest.task.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_timestamp_generation() {
|
||||
let ts1 = get_timestamp();
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
let ts2 = get_timestamp();
|
||||
|
||||
// Timestamps should be parseable as integers
|
||||
let t1: u64 = ts1.parse().expect("Should be valid timestamp");
|
||||
let t2: u64 = ts2.parse().expect("Should be valid timestamp");
|
||||
|
||||
// Second timestamp should be equal or greater
|
||||
assert!(t2 >= t1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_job_wrapper_initialization() {
|
||||
let wrapper = JobWrapper::new();
|
||||
assert_eq!(wrapper.sequence_number, 0);
|
||||
assert!(!wrapper.job_id.is_empty());
|
||||
assert!(wrapper.start_time > 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_cpu_metrics_are_captured() {
|
||||
use std::io::Write;
|
||||
use tempfile::NamedTempFile;
|
||||
|
||||
// Create a CPU-intensive test script
|
||||
let mut temp_file = NamedTempFile::new().expect("Failed to create temp file");
|
||||
let script_content = r#"#!/usr/bin/env python3
|
||||
import sys
|
||||
import json
|
||||
import time
|
||||
|
||||
if len(sys.argv) > 1 and sys.argv[1] == "config":
|
||||
config = {
|
||||
"outputs": [{"str": "test/cpu"}],
|
||||
"inputs": [],
|
||||
"args": [],
|
||||
"env": {"PARTITION_REF": "test/cpu"}
|
||||
}
|
||||
print(json.dumps({"configs": [config]}))
|
||||
elif len(sys.argv) > 1 and sys.argv[1] == "exec":
|
||||
# CPU-intensive work that runs longer
|
||||
start_time = time.time()
|
||||
total = 0
|
||||
while time.time() - start_time < 0.5: # Run for at least 500ms
|
||||
total += sum(range(1_000_000))
|
||||
print(f"Sum: {total}")
|
||||
"#;
|
||||
|
||||
temp_file
|
||||
.write_all(script_content.as_bytes())
|
||||
.expect("Failed to write script");
|
||||
let script_path = temp_file.path().to_str().unwrap();
|
||||
|
||||
// Make script executable
|
||||
std::fs::set_permissions(
|
||||
script_path,
|
||||
std::os::unix::fs::PermissionsExt::from_mode(0o755),
|
||||
)
|
||||
.expect("Failed to set permissions");
|
||||
|
||||
// Set up environment for fast sampling and the test script
|
||||
env::set_var("DATABUILD_METRICS_INTERVAL_MS", "10"); // Even faster for CPU test
|
||||
env::set_var("DATABUILD_JOB_BINARY", script_path);
|
||||
|
||||
// Create test sink and wrapper
|
||||
let sink = TestSink::new();
|
||||
let mut wrapper = JobWrapper::new_with_sink(sink);
|
||||
|
||||
// Create a JobConfig for the test
|
||||
let config = JobConfig {
|
||||
outputs: vec![PartitionRef {
|
||||
r#str: "test/cpu".to_string(),
|
||||
}],
|
||||
inputs: vec![],
|
||||
args: vec![],
|
||||
env: {
|
||||
let mut env_map = std::collections::HashMap::new();
|
||||
env_map.insert("PARTITION_REF".to_string(), "test/cpu".to_string());
|
||||
env_map
|
||||
},
|
||||
};
|
||||
|
||||
// We need to simulate stdin for exec_mode - let's create a test-specific exec method
|
||||
// that takes the config directly rather than reading from stdin
|
||||
let result = wrapper.exec_mode_with_config(script_path, config);
|
||||
|
||||
// Clean up environment
|
||||
env::remove_var("DATABUILD_METRICS_INTERVAL_MS");
|
||||
env::remove_var("DATABUILD_JOB_BINARY");
|
||||
|
||||
// Check that exec_mode succeeded
|
||||
if let Err(e) = &result {
|
||||
println!("exec_mode failed with error: {}", e);
|
||||
}
|
||||
assert!(result.is_ok(), "exec_mode should succeed: {:?}", result);
|
||||
|
||||
// Find the job_summary event
|
||||
let summary_event = wrapper
|
||||
.sink
|
||||
.find_event("job_summary")
|
||||
.expect("Should have job_summary event");
|
||||
|
||||
if let Some(job_log_entry::Content::JobEvent(event)) = &summary_event.content {
|
||||
// Verify we have CPU metrics
|
||||
let cpu_ms_str = event
|
||||
.metadata
|
||||
.get("total_cpu_ms")
|
||||
.expect("Should have total_cpu_ms metric");
|
||||
let cpu_ms: f64 = cpu_ms_str
|
||||
.parse()
|
||||
.expect("CPU metric should be valid float");
|
||||
|
||||
// For CPU-intensive work, we should get non-zero CPU time
|
||||
assert!(
|
||||
cpu_ms > 0.0,
|
||||
"Expected non-zero CPU time for CPU-intensive workload, but got {:.3}ms",
|
||||
cpu_ms
|
||||
);
|
||||
|
||||
// Also verify runtime is reasonable
|
||||
let runtime_ms_str = event
|
||||
.metadata
|
||||
.get("runtime_ms")
|
||||
.expect("Should have runtime_ms metric");
|
||||
let runtime_ms: f64 = runtime_ms_str
|
||||
.parse()
|
||||
.expect("Runtime metric should be valid float");
|
||||
assert!(runtime_ms > 0.0, "Should have non-zero runtime");
|
||||
|
||||
println!(
|
||||
"CPU test results: {:.3}ms CPU time over {:.3}ms runtime",
|
||||
cpu_ms, runtime_ms
|
||||
);
|
||||
} else {
|
||||
panic!("job_summary event should contain JobEvent");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_heartbeat_functionality() {
|
||||
use std::io::Write;
|
||||
use tempfile::NamedTempFile;
|
||||
|
||||
// Create a longer-running test script to trigger heartbeats
|
||||
let mut temp_file = NamedTempFile::new().expect("Failed to create temp file");
|
||||
let script_content = r#"#!/usr/bin/env python3
|
||||
import sys
|
||||
import json
|
||||
import time
|
||||
|
||||
if len(sys.argv) > 1 and sys.argv[1] == "config":
|
||||
config = {
|
||||
"outputs": [{"str": "test/heartbeat"}],
|
||||
"inputs": [],
|
||||
"args": [],
|
||||
"env": {"PARTITION_REF": "test/heartbeat"}
|
||||
}
|
||||
print(json.dumps({"configs": [config]}))
|
||||
elif len(sys.argv) > 1 and sys.argv[1] == "exec":
|
||||
# Sleep long enough to trigger at least 2 heartbeats
|
||||
time.sleep(0.3) # 300ms with 100ms heartbeat interval should give us 2-3 heartbeats
|
||||
print("Job completed")
|
||||
"#;
|
||||
|
||||
temp_file
|
||||
.write_all(script_content.as_bytes())
|
||||
.expect("Failed to write script");
|
||||
let script_path = temp_file.path().to_str().unwrap();
|
||||
|
||||
// Make script executable
|
||||
std::fs::set_permissions(
|
||||
script_path,
|
||||
std::os::unix::fs::PermissionsExt::from_mode(0o755),
|
||||
)
|
||||
.expect("Failed to set permissions");
|
||||
|
||||
// Set up environment for fast heartbeats and the test script
|
||||
env::set_var("DATABUILD_HEARTBEAT_INTERVAL_MS", &TEST_HEARTBEAT_INTERVAL_MS.to_string());
|
||||
env::set_var("DATABUILD_METRICS_INTERVAL_MS", &TEST_METRICS_INTERVAL_MS.to_string());
|
||||
env::set_var("DATABUILD_JOB_BINARY", script_path);
|
||||
|
||||
// Create test sink and wrapper
|
||||
let sink = TestSink::new();
|
||||
let mut wrapper = JobWrapper::new_with_sink(sink);
|
||||
|
||||
// Create a JobConfig for the test
|
||||
let config = JobConfig {
|
||||
outputs: vec![PartitionRef {
|
||||
r#str: "test/heartbeat".to_string(),
|
||||
}],
|
||||
inputs: vec![],
|
||||
args: vec![],
|
||||
env: {
|
||||
let mut env_map = std::collections::HashMap::new();
|
||||
env_map.insert("PARTITION_REF".to_string(), "test/heartbeat".to_string());
|
||||
env_map
|
||||
},
|
||||
};
|
||||
|
||||
// Run the job
|
||||
let result = wrapper.exec_mode_with_config(script_path, config);
|
||||
|
||||
// Clean up environment
|
||||
env::remove_var("DATABUILD_HEARTBEAT_INTERVAL_MS");
|
||||
env::remove_var("DATABUILD_METRICS_INTERVAL_MS");
|
||||
env::remove_var("DATABUILD_JOB_BINARY");
|
||||
|
||||
// Check that exec_mode succeeded
|
||||
assert!(result.is_ok(), "exec_mode should succeed: {:?}", result);
|
||||
|
||||
// Count heartbeat events
|
||||
let heartbeat_count = wrapper
|
||||
.sink
|
||||
.entries
|
||||
.iter()
|
||||
.filter(|entry| {
|
||||
if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
|
||||
event.event_type == "heartbeat"
|
||||
} else {
|
||||
false
|
||||
}
|
||||
})
|
||||
.count();
|
||||
|
||||
// We should have at least 1 heartbeat event (possibly 2-3 depending on timing)
|
||||
assert!(
|
||||
heartbeat_count >= 1,
|
||||
"Expected at least 1 heartbeat event, but got {}",
|
||||
heartbeat_count
|
||||
);
|
||||
|
||||
// Verify heartbeat event structure
|
||||
let heartbeat_event = wrapper
|
||||
.sink
|
||||
.entries
|
||||
.iter()
|
||||
.find(|entry| {
|
||||
if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
|
||||
event.event_type == "heartbeat"
|
||||
} else {
|
||||
false
|
||||
}
|
||||
})
|
||||
.expect("Should have at least one heartbeat event");
|
||||
|
||||
if let Some(job_log_entry::Content::JobEvent(event)) = &heartbeat_event.content {
|
||||
// Verify heartbeat contains memory and CPU metrics
|
||||
assert!(
|
||||
event.metadata.contains_key("memory_usage_mb"),
|
||||
"Heartbeat should contain memory_usage_mb"
|
||||
);
|
||||
assert!(
|
||||
event.metadata.contains_key("cpu_usage_percent"),
|
||||
"Heartbeat should contain cpu_usage_percent"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
databuild/job_run.rs (new file, 610 lines)
@@ -0,0 +1,610 @@
use crate::data_build_event::Event;
|
||||
use crate::data_deps::JobRunDataDepResults;
|
||||
use crate::util::DatabuildError;
|
||||
use crate::{
|
||||
EventSource, JobRunCancelEventV1, JobRunFailureEventV1, JobRunMissingDepsEventV1, JobRunStatus,
|
||||
JobRunSuccessEventV1, MissingDeps, ReadDeps,
|
||||
};
|
||||
use std::collections::HashMap;
|
||||
use std::io::{BufRead, BufReader};
|
||||
use std::marker::PhantomData;
|
||||
use std::process::{Child, Command, Stdio};
|
||||
use uuid::Uuid;
|
||||
// TODO log to /var/log/databuild/jobruns/$JOB_RUN_ID/, and rotate over max size (e.g. only ever use 1GB for logs)
|
||||
// Leave door open to background log processor that tails job logs, but don't include in jobrun concept
|
||||
|
||||
/// Backend trait that defines the state types and transition logic for different job run implementations
|
||||
pub trait JobRunBackend: Sized {
|
||||
type NotStartedState;
|
||||
type RunningState;
|
||||
type CompletedState;
|
||||
type FailedState;
|
||||
type CanceledState;
|
||||
type DepMissState;
|
||||
|
||||
/// Create a new not-started job run
|
||||
fn create(entry_point: String, args: Vec<String>) -> Self::NotStartedState;
|
||||
|
||||
/// Transition from NotStarted to Running
|
||||
fn start(
|
||||
not_started: Self::NotStartedState,
|
||||
env: Option<HashMap<String, String>>,
|
||||
) -> Result<Self::RunningState, DatabuildError>;
|
||||
|
||||
/// Poll a running job for state changes
|
||||
fn poll(
|
||||
running: &mut Self::RunningState,
|
||||
) -> Result<
|
||||
PollResult<Self::CompletedState, Self::FailedState, Self::DepMissState>,
|
||||
DatabuildError,
|
||||
>;
|
||||
|
||||
/// Cancel a running job
|
||||
fn cancel_job(
|
||||
running: Self::RunningState,
|
||||
source: EventSource,
|
||||
) -> Result<Self::CanceledState, DatabuildError>;
|
||||
}
|
||||
|
||||
/// Result of polling a running job
|
||||
pub enum PollResult<C, F, D> {
|
||||
StillRunning,
|
||||
Completed(C),
|
||||
Failed(F),
|
||||
DepMiss(D),
|
||||
}
|
||||
|
||||
// ===== TYPE-SAFE STATE MACHINE PATTERN =====
|
||||
// Uses parameterized JobRunHandleWithState wrapped in JobRun enum for storage
|
||||
|
||||
/// JobRunHandle with embedded state enum
|
||||
/// Type-safe job run handle struct, parameterized by backend and state
|
||||
/// This struct manages the actual running process/execution and can only perform operations valid for its current state type
|
||||
pub struct JobRunHandleWithState<B: JobRunBackend, S> {
|
||||
pub job_run_id: Uuid,
|
||||
pub state: S,
|
||||
pub _backend: PhantomData<B>,
|
||||
}
|
||||
|
||||
/// Wrapper enum for storing job run handles in a single collection
|
||||
/// This allows us to store jobs in different states together while maintaining type safety
|
||||
pub enum JobRunHandle<B: JobRunBackend> {
|
||||
NotStarted(JobRunHandleWithState<B, B::NotStartedState>),
|
||||
Running(JobRunHandleWithState<B, B::RunningState>),
|
||||
Completed(JobRunHandleWithState<B, B::CompletedState>),
|
||||
Failed(JobRunHandleWithState<B, B::FailedState>),
|
||||
Canceled(JobRunHandleWithState<B, B::CanceledState>),
|
||||
DepMiss(JobRunHandleWithState<B, B::DepMissState>),
|
||||
}
|
||||
|
||||
/// Result of visiting a running job - returns the typed states
|
||||
pub enum VisitResult<B: JobRunBackend> {
|
||||
StillRunning(JobRunHandleWithState<B, B::RunningState>),
|
||||
Completed(JobRunHandleWithState<B, B::CompletedState>),
|
||||
Failed(JobRunHandleWithState<B, B::FailedState>),
|
||||
DepMiss(JobRunHandleWithState<B, B::DepMissState>),
|
||||
}
|
||||
|
||||
pub enum JobRunConfig {
|
||||
SubProcess {
|
||||
entry_point: String,
|
||||
args: Vec<String>,
|
||||
},
|
||||
}
|
||||
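The comments above describe the intent of the type-state pattern: a handle can only be started from `NotStarted`, and only polled or canceled while `Running`, with each transition consuming the previous state. A minimal sketch of how an orchestrator might drive one handle to a terminal state follows; the `databuild::job_run` module path, the entry point, and the polling cadence are illustrative assumptions, and a real orchestrator would track many handles at once.

```rust
use databuild::job_run::{JobRunHandle, SubProcessBackend, VisitResult};
use std::{thread, time::Duration};

// Drive a single subprocess-backed job run to a terminal state and derive the
// event that would be appended to the build state's event log.
fn drive_to_completion(entry_point: String, args: Vec<String>) {
    let handle = JobRunHandle::<SubProcessBackend>::spawn(entry_point, args);

    // NotStarted -> Running (compile error if attempted on any other state).
    let mut running = match handle {
        JobRunHandle::NotStarted(not_started) => not_started.run(None).expect("start failed"),
        _ => unreachable!("spawn always yields NotStarted"),
    };

    loop {
        match running.visit().expect("poll failed") {
            VisitResult::StillRunning(still_running) => {
                running = still_running;
                thread::sleep(Duration::from_millis(50));
            }
            VisitResult::Completed(done) => {
                let _event = done.state.to_event(&done.job_run_id);
                break; // event would be recorded in the log
            }
            VisitResult::Failed(failed) => {
                let _event = failed.state.to_event(&failed.job_run_id);
                break;
            }
            VisitResult::DepMiss(miss) => {
                let _event = miss.state.to_event(&miss.job_run_id);
                break;
            }
        }
    }
}
```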
|
||||
// ===== SubProcess Backend Implementation =====
|
||||
|
||||
/// SubProcess backend for running jobs as local subprocesses
|
||||
pub struct SubProcessBackend;
|
||||
|
||||
/// NotStarted state for SubProcess backend
|
||||
pub struct SubProcessNotStarted {
|
||||
pub entry_point: String,
|
||||
pub args: Vec<String>,
|
||||
}
|
||||
|
||||
/// Running state for SubProcess backend
|
||||
pub struct SubProcessRunning {
|
||||
pub process: Child,
|
||||
pub stdout_buffer: Vec<String>,
|
||||
}
|
||||
|
||||
/// Completed state for SubProcess backend
|
||||
pub struct SubProcessCompleted {
|
||||
pub exit_code: i32,
|
||||
pub stdout_buffer: Vec<String>,
|
||||
pub read_deps: Vec<ReadDeps>,
|
||||
}
|
||||
|
||||
/// Failed state for SubProcess backend
|
||||
pub struct SubProcessFailed {
|
||||
pub exit_code: i32,
|
||||
pub reason: String,
|
||||
pub stdout_buffer: Vec<String>,
|
||||
}
|
||||
|
||||
/// Canceled state for SubProcess backend
|
||||
pub struct SubProcessCanceled {
|
||||
pub source: EventSource,
|
||||
pub stdout_buffer: Vec<String>,
|
||||
}
|
||||
|
||||
pub struct SubProcessDepMiss {
|
||||
pub stdout_buffer: Vec<String>,
|
||||
pub missing_deps: Vec<MissingDeps>,
|
||||
pub read_deps: Vec<ReadDeps>,
|
||||
}
|
||||
|
||||
impl JobRunBackend for SubProcessBackend {
|
||||
type NotStartedState = SubProcessNotStarted;
|
||||
type RunningState = SubProcessRunning;
|
||||
type CompletedState = SubProcessCompleted;
|
||||
type FailedState = SubProcessFailed;
|
||||
type CanceledState = SubProcessCanceled;
|
||||
type DepMissState = SubProcessDepMiss;
|
||||
|
||||
fn create(entry_point: String, args: Vec<String>) -> Self::NotStartedState {
|
||||
SubProcessNotStarted { entry_point, args }
|
||||
}
|
||||
|
||||
fn start(
|
||||
not_started: Self::NotStartedState,
|
||||
env: Option<HashMap<String, String>>,
|
||||
) -> Result<Self::RunningState, DatabuildError> {
|
||||
let process = Command::new(not_started.entry_point)
|
||||
.args(not_started.args)
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::piped())
|
||||
.envs(env.unwrap_or_default())
|
||||
.spawn()?;
|
||||
|
||||
Ok(SubProcessRunning {
|
||||
process,
|
||||
stdout_buffer: Vec::new(),
|
||||
})
|
||||
}
|
||||
|
||||
fn poll(
|
||||
running: &mut Self::RunningState,
|
||||
) -> Result<
|
||||
PollResult<Self::CompletedState, Self::FailedState, Self::DepMissState>,
|
||||
DatabuildError,
|
||||
> {
|
||||
// Non-blocking check for exit status
|
||||
if let Some(exit_status) = running.process.try_wait()? {
|
||||
// Job has exited
|
||||
// Read any remaining stdout
|
||||
if let Some(stdout) = running.process.stdout.take() {
|
||||
let reader = BufReader::new(stdout);
|
||||
for line in reader.lines() {
|
||||
// TODO we should write lines to the job's file logs
|
||||
if let Ok(line) = line {
|
||||
running.stdout_buffer.push(line);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Take ownership of stdout_buffer, parse dep events
|
||||
let stdout_buffer = std::mem::take(&mut running.stdout_buffer);
|
||||
let deps: JobRunDataDepResults = stdout_buffer.clone().into();
|
||||
|
||||
// Check exit status and return appropriate result
|
||||
match exit_status.code() {
|
||||
Some(0) => {
|
||||
// Success case
|
||||
Ok(PollResult::Completed(SubProcessCompleted {
|
||||
exit_code: 0,
|
||||
stdout_buffer,
|
||||
read_deps: deps.reads,
|
||||
}))
|
||||
}
|
||||
Some(code) => {
|
||||
// Failed with exit code
|
||||
match deps.misses {
|
||||
vec if vec.is_empty() => {
|
||||
// No missing deps, job failed
|
||||
let reason = format!("Job failed with exit code {}", code);
|
||||
Ok(PollResult::Failed(SubProcessFailed {
|
||||
exit_code: code,
|
||||
reason,
|
||||
stdout_buffer,
|
||||
}))
|
||||
}
|
||||
misses => Ok(PollResult::DepMiss(SubProcessDepMiss {
|
||||
stdout_buffer,
|
||||
missing_deps: misses,
|
||||
read_deps: deps.reads,
|
||||
})),
|
||||
}
|
||||
}
|
||||
None => {
|
||||
// Terminated by signal (Unix) - treat as failure
|
||||
let reason = format!("Job terminated by signal: {}", exit_status);
|
||||
Ok(PollResult::Failed(SubProcessFailed {
|
||||
exit_code: -1,
|
||||
reason,
|
||||
stdout_buffer,
|
||||
}))
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Still running
|
||||
Ok(PollResult::StillRunning)
|
||||
}
|
||||
}
|
||||
|
||||
fn cancel_job(
|
||||
mut running: Self::RunningState,
|
||||
source: EventSource,
|
||||
) -> Result<Self::CanceledState, DatabuildError> {
|
||||
// Kill the process
|
||||
running.process.kill()?;
|
||||
|
||||
// Wait for it to actually terminate
|
||||
running.process.wait()?;
|
||||
|
||||
// Return canceled state
|
||||
Ok(SubProcessCanceled {
|
||||
source,
|
||||
stdout_buffer: running.stdout_buffer,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// Helper functions to convert between states and events
|
||||
impl SubProcessCompleted {
|
||||
pub fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunSuccessV1(JobRunSuccessEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
read_deps: self.read_deps.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl SubProcessFailed {
|
||||
pub fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunFailureV1(JobRunFailureEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
reason: self.reason.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl SubProcessCanceled {
|
||||
pub fn to_event(&self, job_run_id: &Uuid) -> JobRunCancelEventV1 {
|
||||
JobRunCancelEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
source: Some(self.source.clone()),
|
||||
comment: Some("Job was canceled".to_string()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl SubProcessDepMiss {
|
||||
pub fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunMissingDepsV1(JobRunMissingDepsEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
missing_deps: self.missing_deps.clone(),
|
||||
read_deps: self.read_deps.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// Old JobRunPollResult structure - kept for compatibility during migration
|
||||
pub struct JobRunPollResult {
|
||||
pub new_events: Vec<Event>,
|
||||
pub status: JobRunStatus,
|
||||
}
|
||||
|
||||
// ===== Type-Safe State Transition Implementation =====
|
||||
|
||||
// Factory and helper methods on the JobRunHandle enum
|
||||
impl<B: JobRunBackend> JobRunHandle<B> {
|
||||
/// Create a new job run in the NotStarted state
|
||||
pub fn spawn(entry_point: String, args: Vec<String>) -> Self {
|
||||
JobRunHandle::NotStarted(JobRunHandleWithState {
|
||||
job_run_id: Uuid::new_v4(),
|
||||
state: B::create(entry_point, args),
|
||||
_backend: PhantomData,
|
||||
})
|
||||
}
|
||||
|
||||
/// Get the job run ID regardless of state
|
||||
pub fn job_run_id(&self) -> &Uuid {
|
||||
match self {
|
||||
JobRunHandle::NotStarted(j) => &j.job_run_id,
|
||||
JobRunHandle::Running(j) => &j.job_run_id,
|
||||
JobRunHandle::Completed(j) => &j.job_run_id,
|
||||
JobRunHandle::Failed(j) => &j.job_run_id,
|
||||
JobRunHandle::Canceled(j) => &j.job_run_id,
|
||||
JobRunHandle::DepMiss(j) => &j.job_run_id,
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if the job is in a terminal state
|
||||
pub fn is_terminal(&self) -> bool {
|
||||
matches!(
|
||||
self,
|
||||
JobRunHandle::Completed(_)
|
||||
| JobRunHandle::Failed(_)
|
||||
| JobRunHandle::Canceled(_)
|
||||
| JobRunHandle::DepMiss(_)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
// Type-safe transition: NotStarted -> Running
|
||||
// This method can ONLY be called on NotStarted jobs - compile error otherwise!
|
||||
impl<B: JobRunBackend> JobRunHandleWithState<B, B::NotStartedState> {
|
||||
pub fn run(
|
||||
self,
|
||||
env: Option<HashMap<String, String>>,
|
||||
) -> Result<JobRunHandleWithState<B, B::RunningState>, DatabuildError> {
|
||||
let running = B::start(self.state, env)?;
|
||||
Ok(JobRunHandleWithState {
|
||||
job_run_id: self.job_run_id,
|
||||
state: running,
|
||||
_backend: PhantomData,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// Type-safe transition: Running -> (Running | Completed | Failed | DepMiss)
|
||||
// This method can ONLY be called on Running jobs - compile error otherwise!
|
||||
impl<B: JobRunBackend> JobRunHandleWithState<B, B::RunningState> {
|
||||
pub fn visit(mut self) -> Result<VisitResult<B>, DatabuildError> {
|
||||
match B::poll(&mut self.state)? {
|
||||
PollResult::StillRunning => Ok(VisitResult::StillRunning(self)),
|
||||
PollResult::Completed(completed) => Ok(VisitResult::Completed(JobRunHandleWithState {
|
||||
job_run_id: self.job_run_id,
|
||||
state: completed,
|
||||
_backend: PhantomData,
|
||||
})),
|
||||
PollResult::Failed(failed) => Ok(VisitResult::Failed(JobRunHandleWithState {
|
||||
job_run_id: self.job_run_id,
|
||||
state: failed,
|
||||
_backend: PhantomData,
|
||||
})),
|
||||
PollResult::DepMiss(dep_miss) => Ok(VisitResult::DepMiss(JobRunHandleWithState {
|
||||
job_run_id: self.job_run_id,
|
||||
state: dep_miss,
|
||||
_backend: PhantomData,
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn cancel(
|
||||
self,
|
||||
source: EventSource,
|
||||
) -> Result<JobRunHandleWithState<B, B::CanceledState>, DatabuildError> {
|
||||
let canceled = B::cancel_job(self.state, source)?;
|
||||
Ok(JobRunHandleWithState {
|
||||
job_run_id: self.job_run_id,
|
||||
state: canceled,
|
||||
_backend: PhantomData,
|
||||
})
|
||||
}
|
||||
}
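// Illustrative usage sketch (not part of this change): driving a job run through the
// typestate transitions defined above. The entry point path is a placeholder; error
// handling and the polling interval are simplified.
//
//     let handle = JobRunHandle::<SubProcessBackend>::spawn("path/to/job_binary".to_string(), vec![]);
//     let mut running = match handle {
//         JobRunHandle::NotStarted(not_started) => not_started.run(None).unwrap(),
//         _ => unreachable!("spawn always returns NotStarted"),
//     };
//     loop {
//         match running.visit().unwrap() {
//             VisitResult::StillRunning(next) => {
//                 std::thread::sleep(std::time::Duration::from_millis(50));
//                 running = next; // keep polling
//             }
//             VisitResult::Completed(done) => {
//                 let _event = done.state.to_event(&done.job_run_id); // JobRunSuccessV1
//                 break;
//             }
//             VisitResult::Failed(failed) => {
//                 let _event = failed.state.to_event(&failed.job_run_id); // JobRunFailureV1
//                 break;
//             }
//             VisitResult::DepMiss(dep_miss) => {
//                 let _event = dep_miss.state.to_event(&dep_miss.job_run_id); // JobRunMissingDepsV1
//                 break;
//             }
//         }
//     }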
|
||||
|
||||
// Helper trait for converting states to events
|
||||
pub trait ToEvent {
|
||||
fn to_event(&self, job_run_id: &Uuid) -> Event;
|
||||
}
|
||||
|
||||
impl ToEvent for SubProcessCompleted {
|
||||
fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunSuccessV1(JobRunSuccessEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
read_deps: self.read_deps.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl ToEvent for SubProcessFailed {
|
||||
fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunFailureV1(JobRunFailureEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
reason: self.reason.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl ToEvent for SubProcessDepMiss {
|
||||
fn to_event(&self, job_run_id: &Uuid) -> Event {
|
||||
Event::JobRunMissingDepsV1(JobRunMissingDepsEventV1 {
|
||||
job_run_id: job_run_id.to_string(),
|
||||
missing_deps: self.missing_deps.clone(),
|
||||
read_deps: self.read_deps.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
|
||||
use crate::data_build_event::Event;
|
||||
use crate::data_deps::DATABUILD_MISSING_DEPS_JSON;
|
||||
use crate::job_run::{JobRunBackend, JobRunHandle, SubProcessBackend, VisitResult};
|
||||
use crate::mock_job_run::MockJobRun;
|
||||
use crate::{JobRunMissingDeps, MissingDeps};
|
||||
|
||||
/// Happy path - run that succeeds should emit a JobRunSuccessEventV1
|
||||
#[test]
|
||||
fn test_job_run_success_returns_job_run_success_event() {
|
||||
// Spawn a job run that will succeed (exit code 0)
|
||||
let job_run = JobRunHandle::<SubProcessBackend>::spawn(MockJobRun::bin_path(), vec![]);
|
||||
|
||||
// Start the job - this consumes the NotStarted and returns Running
|
||||
let running_job = match job_run {
|
||||
JobRunHandle::NotStarted(not_started) => not_started.run(None).unwrap(),
|
||||
_ => panic!("Expected NotStarted job"),
|
||||
};
|
||||
|
||||
// Poll until we get completion
|
||||
let mut current_job = running_job;
|
||||
loop {
|
||||
match current_job.visit().unwrap() {
|
||||
VisitResult::Completed(completed) => {
|
||||
// Generate the event from the completed state
|
||||
let event = completed.state.to_event(&completed.job_run_id);
|
||||
assert!(matches!(event, Event::JobRunSuccessV1(_)));
|
||||
break;
|
||||
}
|
||||
VisitResult::Failed(failed) => {
|
||||
panic!("Job failed unexpectedly: {}", failed.state.reason);
|
||||
}
|
||||
VisitResult::StillRunning(still_running) => {
|
||||
// Sleep briefly and poll again
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
current_job = still_running;
|
||||
continue;
|
||||
}
|
||||
VisitResult::DepMiss(_dep_miss) => {
|
||||
panic!("Job dep miss unexpectedly");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Job run that fails should emit a JobRunFailureEventV1
|
||||
#[test]
|
||||
fn test_job_run_failure_returns_job_run_failure_event() {
|
||||
// Spawn a job run
|
||||
let job_run = JobRunHandle::<SubProcessBackend>::spawn(MockJobRun::bin_path(), vec![]);
|
||||
|
||||
// Start the job with an exit code that indicates failure (non-zero)
|
||||
let env = MockJobRun::new().exit_code(1).to_env();
|
||||
let running_job = match job_run {
|
||||
JobRunHandle::NotStarted(not_started) => not_started.run(Some(env)).unwrap(),
|
||||
_ => panic!("Expected NotStarted job"),
|
||||
};
|
||||
|
||||
// Poll until we get completion
|
||||
let mut current_job = running_job;
|
||||
loop {
|
||||
match current_job.visit().unwrap() {
|
||||
VisitResult::Completed(_) => {
|
||||
panic!("Job succeeded unexpectedly");
|
||||
}
|
||||
VisitResult::Failed(failed) => {
|
||||
// Generate the event from the failed state
|
||||
let event = failed.state.to_event(&failed.job_run_id);
|
||||
assert!(matches!(event, Event::JobRunFailureV1(_)));
|
||||
break;
|
||||
}
|
||||
VisitResult::StillRunning(still_running) => {
|
||||
// Sleep briefly and poll again
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
current_job = still_running;
|
||||
continue;
|
||||
}
|
||||
VisitResult::DepMiss(_dep_miss) => {
|
||||
panic!("Job dep miss unexpectedly");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Canceling a job run before it completes should:
/// - Stop the actual subprocess (e.g. no output file should be written)
/// - Emit a JobRunCancelEventV1 event
|
||||
#[test]
|
||||
fn test_job_run_cancel_returns_job_run_cancel_event() {
|
||||
use crate::ManuallyTriggeredEvent;
|
||||
use std::fs;
|
||||
use uuid::Uuid;
|
||||
|
||||
// Create a temp file path for the test
|
||||
let temp_file = format!("/tmp/databuild_test_cancel_{}", Uuid::new_v4());
|
||||
|
||||
// Spawn a job run that will sleep for 1 second and write a file
|
||||
let job_run = JobRunHandle::<SubProcessBackend>::spawn(MockJobRun::bin_path(), vec![]);
|
||||
|
||||
let env = MockJobRun::new()
|
||||
.sleep_ms(1000)
|
||||
.output_file(&temp_file, &"completed".to_string())
|
||||
.exit_code(0)
|
||||
.to_env();
|
||||
let running_job = match job_run {
|
||||
JobRunHandle::NotStarted(not_started) => not_started.run(Some(env)).unwrap(),
|
||||
_ => panic!("Expected NotStarted job"),
|
||||
};
|
||||
|
||||
// Give it a tiny bit of time to start
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
|
||||
// Cancel the job before it can complete - this consumes the running job and returns canceled
|
||||
let canceled_job = running_job
|
||||
.cancel(
|
||||
ManuallyTriggeredEvent {
|
||||
user: "test_user".into(),
|
||||
}
|
||||
.into(),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Generate the cancel event from the canceled state
|
||||
let cancel_event = canceled_job.state.to_event(&canceled_job.job_run_id);
|
||||
|
||||
// Verify we got the cancel event
|
||||
assert_eq!(cancel_event.job_run_id, canceled_job.job_run_id.to_string());
|
||||
assert!(cancel_event.source.is_some());
|
||||
assert_eq!(cancel_event.comment, Some("Job was canceled".to_string()));
|
||||
|
||||
// Verify the output file was NOT written (process was killed before it could complete)
|
||||
assert!(
|
||||
!std::path::Path::new(&temp_file).exists(),
|
||||
"Output file should not exist - process should have been killed"
|
||||
);
|
||||
|
||||
// Cleanup just in case
|
||||
let _ = fs::remove_file(&temp_file);
|
||||
}
|
||||
|
||||
/// Job run that fails and emits a recognized "dep miss" statement should emit a JobRunMissingDepsEventV1
|
||||
#[test]
|
||||
fn test_job_run_fail_on_missing_deps_should_emit_missing_deps_event() {
|
||||
// Spawn a job run that will report a recognized missing-deps line and exit non-zero
|
||||
let job_run = JobRunHandle::<SubProcessBackend>::spawn(MockJobRun::bin_path(), vec![]);
|
||||
|
||||
let expected_dep_miss = JobRunMissingDeps {
|
||||
version: "1".into(),
|
||||
missing_deps: vec![MissingDeps {
|
||||
impacted: vec!["my_fav_output".into()],
|
||||
missing: vec!["cool_input_1".into(), "cool_input_2".into()],
|
||||
}],
|
||||
};
|
||||
let dep_miss_json =
|
||||
serde_json::to_string(&expected_dep_miss).expect("Failed to serialize dep miss");
|
||||
let dep_miss_line = format!("{}{}", DATABUILD_MISSING_DEPS_JSON, dep_miss_json);
|
||||
let env = MockJobRun::new()
|
||||
.stdout_msg(&dep_miss_line)
|
||||
.exit_code(1)
|
||||
.to_env();
|
||||
let running_job = match job_run {
|
||||
JobRunHandle::NotStarted(not_started) => not_started.run(Some(env)).unwrap(),
|
||||
_ => panic!("Expected NotStarted job"),
|
||||
};
|
||||
|
||||
// Poll until we get completion
|
||||
let mut current_job = running_job;
|
||||
loop {
|
||||
match current_job.visit().unwrap() {
|
||||
VisitResult::Completed(_) => {
|
||||
panic!("Job succeeded unexpectedly");
|
||||
}
|
||||
VisitResult::Failed(_failed) => {
|
||||
panic!("Job failed unexpectedly");
|
||||
}
|
||||
VisitResult::StillRunning(still_running) => {
|
||||
// Sleep briefly and poll again
|
||||
std::thread::sleep(std::time::Duration::from_millis(10));
|
||||
current_job = still_running;
|
||||
continue;
|
||||
}
|
||||
VisitResult::DepMiss(dep_miss) => {
|
||||
assert_eq!(dep_miss.state.missing_deps, expected_dep_miss.missing_deps);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
databuild/job_run_state.rs (new file, 578 lines)
@@ -0,0 +1,578 @@
use crate::partition_state::{BuildingPartitionRef, FailedPartitionRef, LivePartitionRef};
|
||||
use crate::util::{HasRelatedIds, RelatedIds, current_timestamp};
|
||||
use crate::{
|
||||
EventSource, JobRunDetail, JobRunStatusCode, MissingDeps, PartitionRef, ReadDeps,
|
||||
WantAttributedPartitions,
|
||||
};
|
||||
use std::collections::BTreeMap;
|
||||
use uuid::Uuid;
|
||||
|
||||
/// State: Job has been queued but not yet started
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct QueuedState {
|
||||
pub queued_at: u64,
|
||||
}
|
||||
|
||||
/// State: Job is currently running
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct RunningState {
|
||||
pub started_at: u64,
|
||||
pub last_heartbeat_at: u64, // NOT optional, defaults to started_at
|
||||
}
|
||||
|
||||
/// State: Job completed successfully
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct SucceededState {
|
||||
pub completed_at: u64,
|
||||
/// The read dependencies reported by the job, preserving impacted→read relationships
|
||||
pub read_deps: Vec<ReadDeps>,
|
||||
/// Resolved UUIDs for partitions that were read (ref → UUID at read time)
|
||||
pub read_partition_uuids: BTreeMap<String, String>,
|
||||
/// Resolved UUIDs for partitions that were written (ref → UUID)
|
||||
pub wrote_partition_uuids: BTreeMap<String, String>,
|
||||
}
|
||||
|
||||
/// State: Job failed during execution
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct FailedState {
|
||||
pub failed_at: u64,
|
||||
pub failure_reason: String,
|
||||
}
|
||||
|
||||
/// State: Job detected missing dependencies
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct DepMissState {
|
||||
pub detected_at: u64,
|
||||
pub missing_deps: Vec<MissingDeps>,
|
||||
pub read_deps: Vec<ReadDeps>,
|
||||
/// Want IDs of ephemeral wants spawned by this dep-miss
|
||||
pub derivative_want_ids: Vec<String>,
|
||||
}
|
||||
|
||||
/// State: Job was explicitly canceled
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CanceledState {
|
||||
pub canceled_at: u64,
|
||||
pub source: Option<EventSource>,
|
||||
pub comment: String,
|
||||
}
|
||||
|
||||
/// Shared information across all job run states
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct JobInfo {
|
||||
pub id: String,
|
||||
pub job_label: String,
|
||||
pub building_partitions: Vec<PartitionRef>,
|
||||
pub servicing_wants: Vec<WantAttributedPartitions>,
|
||||
}
|
||||
|
||||
/// Timing information preserved across state transitions
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct TimingInfo {
|
||||
/// When the job was first queued
|
||||
pub queued_at: u64,
|
||||
/// When the job started running (None if still queued or canceled before starting)
|
||||
pub started_at: Option<u64>,
|
||||
}
|
||||
|
||||
/// Generic job run struct parameterized by state
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct JobRunWithState<S> {
|
||||
pub info: JobInfo,
|
||||
pub timing: TimingInfo,
|
||||
pub state: S,
|
||||
}
|
||||
|
||||
/// Wrapper enum for storing job runs in collections
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum JobRun {
|
||||
Queued(JobRunWithState<QueuedState>),
|
||||
Running(JobRunWithState<RunningState>),
|
||||
Succeeded(JobRunWithState<SucceededState>),
|
||||
Failed(JobRunWithState<FailedState>),
|
||||
DepMiss(JobRunWithState<DepMissState>),
|
||||
Canceled(JobRunWithState<CanceledState>),
|
||||
}
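// Sketch (illustrative, not in this change): the wrapper enum keeps every job run in one
// flat collection while the typed variants preserve state-specific data. `job_runs` is a
// hypothetical map; BuildState's actual storage may differ.
//
//     use std::collections::BTreeMap;
//     let job_runs: BTreeMap<String, JobRun> = BTreeMap::new(); // populated by event replay
//     for job_run in job_runs.values() {
//         if let JobRun::Running(running) = job_run {
//             // Only the Running variant exposes heartbeat data.
//             let _last_heartbeat = running.get_last_heartbeat();
//         }
//     }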
|
||||
|
||||
// ==================== State Transitions ====================
|
||||
|
||||
impl JobRunWithState<QueuedState> {
|
||||
/// Transition from Queued to Running
|
||||
pub fn start_running(self, timestamp: u64) -> JobRunWithState<RunningState> {
|
||||
JobRunWithState {
|
||||
info: self.info,
|
||||
timing: TimingInfo {
|
||||
queued_at: self.timing.queued_at,
|
||||
started_at: Some(timestamp),
|
||||
},
|
||||
state: RunningState {
|
||||
started_at: timestamp,
|
||||
last_heartbeat_at: timestamp, // Initialize to start time
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition from Queued to Canceled (canceled before starting)
|
||||
pub fn cancel(
|
||||
self,
|
||||
timestamp: u64,
|
||||
source: Option<EventSource>,
|
||||
comment: String,
|
||||
) -> JobRunWithState<CanceledState> {
|
||||
JobRunWithState {
|
||||
info: self.info,
|
||||
timing: self.timing, // Preserve timing (started_at remains None)
|
||||
state: CanceledState {
|
||||
canceled_at: timestamp,
|
||||
source,
|
||||
comment,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<RunningState> {
|
||||
/// Update the heartbeat timestamp, staying in Running (consumes and returns self)
|
||||
pub fn heartbeat(mut self, timestamp: u64) -> Self {
|
||||
self.state.last_heartbeat_at = timestamp;
|
||||
self
|
||||
}
|
||||
|
||||
/// Transition from Running to Succeeded
|
||||
pub fn succeed(
|
||||
self,
|
||||
timestamp: u64,
|
||||
read_deps: Vec<ReadDeps>,
|
||||
read_partition_uuids: BTreeMap<String, String>,
|
||||
wrote_partition_uuids: BTreeMap<String, String>,
|
||||
) -> JobRunWithState<SucceededState> {
|
||||
JobRunWithState {
|
||||
info: self.info,
|
||||
timing: self.timing,
|
||||
state: SucceededState {
|
||||
completed_at: timestamp,
|
||||
read_deps,
|
||||
read_partition_uuids,
|
||||
wrote_partition_uuids,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition from Running to Failed
|
||||
pub fn fail(self, timestamp: u64, reason: String) -> JobRunWithState<FailedState> {
|
||||
JobRunWithState {
|
||||
info: self.info,
|
||||
timing: self.timing,
|
||||
state: FailedState {
|
||||
failed_at: timestamp,
|
||||
failure_reason: reason,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition from Running to DepMiss
|
||||
pub fn dep_miss(
|
||||
self,
|
||||
timestamp: u64,
|
||||
missing_deps: Vec<MissingDeps>,
|
||||
read_deps: Vec<ReadDeps>,
|
||||
) -> JobRunWithState<DepMissState> {
|
||||
JobRunWithState {
|
||||
timing: self.timing,
|
||||
info: self.info,
|
||||
state: DepMissState {
|
||||
detected_at: timestamp,
|
||||
missing_deps,
|
||||
read_deps,
|
||||
derivative_want_ids: vec![], // Populated later when ephemeral wants are created
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition from Running to Canceled
|
||||
pub fn cancel(
|
||||
self,
|
||||
timestamp: u64,
|
||||
source: Option<EventSource>,
|
||||
comment: String,
|
||||
) -> JobRunWithState<CanceledState> {
|
||||
JobRunWithState {
|
||||
info: self.info,
|
||||
timing: self.timing,
|
||||
state: CanceledState {
|
||||
canceled_at: timestamp,
|
||||
source,
|
||||
comment,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
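// Sketch (illustrative, not in this change): how an event handler walks a job run through
// the consuming transitions. IDs, labels, and timestamps are placeholder values.
//
//     let queued = JobRunWithState {
//         info: JobInfo {
//             id: "job-run-1".to_string(),
//             job_label: "//jobs:example".to_string(),
//             building_partitions: vec![],
//             servicing_wants: vec![],
//         },
//         timing: TimingInfo { queued_at: 100, started_at: None },
//         state: QueuedState { queued_at: 100 },
//     };
//     let running = queued.start_running(110); // Queued -> Running
//     let succeeded = running.succeed(
//         200,
//         vec![],            // read_deps
//         BTreeMap::new(),   // read_partition_uuids
//         BTreeMap::new(),   // wrote_partition_uuids
//     ); // Running -> Succeeded
//     assert_eq!(succeeded.timing.started_at, Some(110)); // timing survives transitions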
|
||||
|
||||
// ==================== Type-Safe Job Run IDs ====================
|
||||
|
||||
/// Type-safe job run ID wrappers that encode state expectations in function signatures.
|
||||
/// These should be created ephemerally from typestate objects via .get_id() and used immediately—never stored long-term.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct QueuedJobRunId(pub String);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct RunningJobRunId(pub String);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct SucceededJobRunId(pub String);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct FailedJobRunId(pub String);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct DepMissJobRunId(pub String);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct CanceledJobRunId(pub String);
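// Sketch (illustrative): the newtype IDs let callers state, in a function signature, which
// job-run state they expect. `record_heartbeat` is a hypothetical consumer, not a function
// defined in this module.
//
//     fn record_heartbeat(id: &RunningJobRunId, at: u64) {
//         // Callable only with an ID obtained from a JobRunWithState<RunningState>
//         // via .get_id(), so the "job is running" expectation is checked at compile time.
//         let _ = (id, at);
//     }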
|
||||
|
||||
// ==================== State-Specific Methods ====================
|
||||
|
||||
impl JobRunWithState<QueuedState> {
|
||||
pub fn get_id(&self) -> QueuedJobRunId {
|
||||
QueuedJobRunId(self.info.id.clone())
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<RunningState> {
|
||||
pub fn get_id(&self) -> RunningJobRunId {
|
||||
RunningJobRunId(self.info.id.clone())
|
||||
}
|
||||
|
||||
/// Currently building these partitions
|
||||
/// Job run running state is the SOURCE of truth that these partitions are building
|
||||
pub fn get_building_partitions(&self) -> Vec<BuildingPartitionRef> {
|
||||
self.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| BuildingPartitionRef(p.clone()))
|
||||
.collect()
|
||||
}
|
||||
|
||||
pub fn get_last_heartbeat(&self) -> u64 {
|
||||
self.state.last_heartbeat_at
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<SucceededState> {
|
||||
pub fn get_id(&self) -> SucceededJobRunId {
|
||||
SucceededJobRunId(self.info.id.clone())
|
||||
}
|
||||
|
||||
/// Job run success is the SOURCE of truth that these partitions are live
|
||||
pub fn get_completed_partitions(&self) -> Vec<LivePartitionRef> {
|
||||
self.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| LivePartitionRef(p.clone()))
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Get the read dependencies reported by the job
|
||||
pub fn get_read_deps(&self) -> &[ReadDeps] {
|
||||
&self.state.read_deps
|
||||
}
|
||||
|
||||
/// Get the resolved UUIDs for partitions that were read
|
||||
pub fn get_read_partition_uuids(&self) -> &BTreeMap<String, String> {
|
||||
&self.state.read_partition_uuids
|
||||
}
|
||||
|
||||
/// Get the resolved UUIDs for partitions that were written
|
||||
pub fn get_wrote_partition_uuids(&self) -> &BTreeMap<String, String> {
|
||||
&self.state.wrote_partition_uuids
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<FailedState> {
|
||||
pub fn get_id(&self) -> FailedJobRunId {
|
||||
FailedJobRunId(self.info.id.clone())
|
||||
}
|
||||
|
||||
/// Job run failure is the SOURCE of truth that these partitions failed
|
||||
pub fn get_failed_partitions(&self) -> Vec<FailedPartitionRef> {
|
||||
self.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| FailedPartitionRef(p.clone()))
|
||||
.collect()
|
||||
}
|
||||
|
||||
pub fn get_failure_reason(&self) -> &str {
|
||||
&self.state.failure_reason
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<DepMissState> {
|
||||
pub fn get_id(&self) -> DepMissJobRunId {
|
||||
DepMissJobRunId(self.info.id.clone())
|
||||
}
|
||||
|
||||
/// Job run dep miss means building partitions should reset to Missing
|
||||
pub fn get_building_partitions_to_reset(&self) -> Vec<BuildingPartitionRef> {
|
||||
self.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| BuildingPartitionRef(p.clone()))
|
||||
.collect()
|
||||
}
|
||||
|
||||
pub fn get_missing_deps(&self) -> &[MissingDeps] {
|
||||
&self.state.missing_deps
|
||||
}
|
||||
|
||||
pub fn get_read_deps(&self) -> &[ReadDeps] {
|
||||
&self.state.read_deps
|
||||
}
|
||||
|
||||
/// Add a derivative want ID (ephemeral want spawned by this dep-miss)
|
||||
pub fn add_derivative_want_id(&mut self, want_id: &str) {
|
||||
if !self
|
||||
.state
|
||||
.derivative_want_ids
|
||||
.contains(&want_id.to_string())
|
||||
{
|
||||
self.state.derivative_want_ids.push(want_id.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl JobRunWithState<CanceledState> {
|
||||
pub fn get_id(&self) -> CanceledJobRunId {
|
||||
CanceledJobRunId(self.info.id.clone())
|
||||
}
|
||||
|
||||
/// Canceled job means building partitions should reset to Missing
|
||||
pub fn get_building_partitions_to_reset(&self) -> Vec<BuildingPartitionRef> {
|
||||
self.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| BuildingPartitionRef(p.clone()))
|
||||
.collect()
|
||||
}
|
||||
}
|
||||
|
||||
// ==================== HasRelatedIds trait implementation ====================
|
||||
|
||||
impl HasRelatedIds for JobRun {
|
||||
/// Get the IDs of all entities this job run references.
|
||||
/// Note: derivative_want_ids come from BuildState, not from JobRun itself.
|
||||
fn related_ids(&self) -> RelatedIds {
|
||||
// Partition refs from building_partitions (all states have this)
|
||||
let partition_refs: Vec<String> = match self {
|
||||
JobRun::Queued(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
JobRun::Running(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
JobRun::Succeeded(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
JobRun::Failed(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
JobRun::DepMiss(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
JobRun::Canceled(jr) => jr
|
||||
.info
|
||||
.building_partitions
|
||||
.iter()
|
||||
.map(|p| p.r#ref.clone())
|
||||
.collect(),
|
||||
};
|
||||
|
||||
// Partition UUIDs from read/write lineage (only Succeeded state has these)
|
||||
let partition_uuids: Vec<Uuid> = match self {
|
||||
JobRun::Succeeded(jr) => {
|
||||
let mut uuids = Vec::new();
|
||||
for uuid_str in jr.state.read_partition_uuids.values() {
|
||||
if let Ok(uuid) = Uuid::parse_str(uuid_str) {
|
||||
uuids.push(uuid);
|
||||
}
|
||||
}
|
||||
for uuid_str in jr.state.wrote_partition_uuids.values() {
|
||||
if let Ok(uuid) = Uuid::parse_str(uuid_str) {
|
||||
if !uuids.contains(&uuid) {
|
||||
uuids.push(uuid);
|
||||
}
|
||||
}
|
||||
}
|
||||
uuids
|
||||
}
|
||||
_ => vec![],
|
||||
};
|
||||
|
||||
// Want IDs from servicing_wants (all states have this)
|
||||
let want_ids: Vec<String> = match self {
|
||||
JobRun::Queued(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
JobRun::Running(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
JobRun::Succeeded(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
JobRun::Failed(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
JobRun::DepMiss(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
JobRun::Canceled(jr) => jr
|
||||
.info
|
||||
.servicing_wants
|
||||
.iter()
|
||||
.map(|w| w.want_id.clone())
|
||||
.collect(),
|
||||
};
|
||||
|
||||
RelatedIds {
|
||||
partition_refs,
|
||||
partition_uuids,
|
||||
job_run_ids: vec![],
|
||||
want_ids,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ==================== Conversion to JobRunDetail for API ====================
|
||||
|
||||
impl JobRun {
|
||||
pub fn to_detail(&self) -> JobRunDetail {
|
||||
use std::collections::HashMap;
|
||||
|
||||
match self {
|
||||
JobRun::Queued(queued) => JobRunDetail {
|
||||
id: queued.info.id.clone(),
|
||||
job_label: queued.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunQueued.into()),
|
||||
last_heartbeat_at: None,
|
||||
building_partitions: queued.info.building_partitions.clone(),
|
||||
servicing_wants: queued.info.servicing_wants.clone(),
|
||||
read_deps: vec![],
|
||||
read_partition_uuids: HashMap::new(),
|
||||
wrote_partition_uuids: HashMap::new(),
|
||||
derivative_want_ids: vec![],
|
||||
queued_at: Some(queued.timing.queued_at),
|
||||
started_at: queued.timing.started_at,
|
||||
},
|
||||
JobRun::Running(running) => JobRunDetail {
|
||||
id: running.info.id.clone(),
|
||||
job_label: running.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunRunning.into()),
|
||||
last_heartbeat_at: Some(running.state.last_heartbeat_at),
|
||||
building_partitions: running.info.building_partitions.clone(),
|
||||
servicing_wants: running.info.servicing_wants.clone(),
|
||||
read_deps: vec![],
|
||||
read_partition_uuids: HashMap::new(),
|
||||
wrote_partition_uuids: HashMap::new(),
|
||||
derivative_want_ids: vec![],
|
||||
queued_at: Some(running.timing.queued_at),
|
||||
started_at: running.timing.started_at,
|
||||
},
|
||||
JobRun::Succeeded(succeeded) => JobRunDetail {
|
||||
id: succeeded.info.id.clone(),
|
||||
job_label: succeeded.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunSucceeded.into()),
|
||||
last_heartbeat_at: None,
|
||||
building_partitions: succeeded.info.building_partitions.clone(),
|
||||
servicing_wants: succeeded.info.servicing_wants.clone(),
|
||||
read_deps: succeeded.state.read_deps.clone(),
|
||||
read_partition_uuids: succeeded
|
||||
.state
|
||||
.read_partition_uuids
|
||||
.clone()
|
||||
.into_iter()
|
||||
.collect(),
|
||||
wrote_partition_uuids: succeeded
|
||||
.state
|
||||
.wrote_partition_uuids
|
||||
.clone()
|
||||
.into_iter()
|
||||
.collect(),
|
||||
derivative_want_ids: vec![],
|
||||
queued_at: Some(succeeded.timing.queued_at),
|
||||
started_at: succeeded.timing.started_at,
|
||||
},
|
||||
JobRun::Failed(failed) => JobRunDetail {
|
||||
id: failed.info.id.clone(),
|
||||
job_label: failed.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunFailed.into()),
|
||||
last_heartbeat_at: None,
|
||||
building_partitions: failed.info.building_partitions.clone(),
|
||||
servicing_wants: failed.info.servicing_wants.clone(),
|
||||
read_deps: vec![],
|
||||
read_partition_uuids: HashMap::new(),
|
||||
wrote_partition_uuids: HashMap::new(),
|
||||
derivative_want_ids: vec![],
|
||||
queued_at: Some(failed.timing.queued_at),
|
||||
started_at: failed.timing.started_at,
|
||||
},
|
||||
JobRun::DepMiss(dep_miss) => JobRunDetail {
|
||||
id: dep_miss.info.id.clone(),
|
||||
job_label: dep_miss.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunDepMiss.into()),
|
||||
last_heartbeat_at: None,
|
||||
building_partitions: dep_miss.info.building_partitions.clone(),
|
||||
servicing_wants: dep_miss.info.servicing_wants.clone(),
|
||||
read_deps: dep_miss.state.read_deps.clone(),
|
||||
read_partition_uuids: HashMap::new(),
|
||||
wrote_partition_uuids: HashMap::new(),
|
||||
derivative_want_ids: dep_miss.state.derivative_want_ids.clone(),
|
||||
queued_at: Some(dep_miss.timing.queued_at),
|
||||
started_at: dep_miss.timing.started_at,
|
||||
},
|
||||
JobRun::Canceled(canceled) => JobRunDetail {
|
||||
id: canceled.info.id.clone(),
|
||||
job_label: canceled.info.job_label.clone(),
|
||||
status: Some(JobRunStatusCode::JobRunCanceled.into()),
|
||||
last_heartbeat_at: None,
|
||||
building_partitions: canceled.info.building_partitions.clone(),
|
||||
servicing_wants: canceled.info.servicing_wants.clone(),
|
||||
read_deps: vec![],
|
||||
read_partition_uuids: HashMap::new(),
|
||||
wrote_partition_uuids: HashMap::new(),
|
||||
derivative_want_ids: vec![],
|
||||
queued_at: Some(canceled.timing.queued_at),
|
||||
started_at: canceled.timing.started_at,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
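// Sketch (illustrative, not in this change): `to_detail` is the bridge from the typestate
// model to the proto-generated `JobRunDetail` used by the API layer. `job_runs` is a
// hypothetical collection of `JobRun` values held by BuildState.
//
//     let details: Vec<JobRunDetail> = job_runs.values().map(|jr| jr.to_detail()).collect();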
|
||||
@@ -1,41 +1,22 @@
pub mod build_event_log;
|
||||
pub mod build_state;
|
||||
pub mod commands;
|
||||
pub mod config;
|
||||
pub mod daemon;
|
||||
mod data_deps;
|
||||
mod event_transforms;
|
||||
pub mod http_server;
|
||||
mod job;
|
||||
mod job_run;
|
||||
mod job_run_state;
|
||||
pub mod lineage;
|
||||
mod mock_job_run;
|
||||
pub mod orchestrator;
|
||||
mod partition_state;
|
||||
pub mod server_lock;
|
||||
mod util;
|
||||
mod want_state;
|
||||
pub mod web;
|
||||
|
||||
// Include generated protobuf code
|
||||
include!("databuild.rs");
|
||||
|
||||
// Event log module
|
||||
pub mod event_log;
|
||||
|
||||
// Orchestration module
|
||||
pub mod orchestration;
|
||||
|
||||
// Service module
|
||||
pub mod service;
|
||||
|
||||
// Repository pattern implementations
|
||||
pub mod repositories;
|
||||
|
||||
pub mod mermaid_utils;
|
||||
|
||||
// Status conversion utilities
|
||||
pub mod status_utils;
|
||||
|
||||
// Log collection module
|
||||
pub mod log_collector;
|
||||
|
||||
// Log access module
|
||||
pub mod log_access;
|
||||
|
||||
// Metric templates module
|
||||
pub mod metric_templates;
|
||||
|
||||
// Metrics aggregator module
|
||||
pub mod metrics_aggregator;
|
||||
|
||||
// Format consistency tests
|
||||
#[cfg(test)]
|
||||
mod format_consistency_test;
|
||||
|
||||
// Re-export commonly used types from event_log
|
||||
pub use event_log::{BuildEventLogError, create_bel_query_engine};
|
||||
|
||||
// Re-export orchestration types
|
||||
pub use orchestration::{BuildOrchestrator, BuildResult, OrchestrationError};
|
||||
databuild/lineage.rs (new file, 402 lines)
@@ -0,0 +1,402 @@
//! Lineage graph generation for visualizing want → partition → job run dependencies.
|
||||
//!
|
||||
//! This module provides functionality for building Mermaid flowcharts that show
|
||||
//! the dependency relationships between wants, partitions, and job runs.
|
||||
|
||||
use crate::build_state::BuildState;
|
||||
use std::collections::HashSet;
|
||||
|
||||
// =============================================================================
|
||||
// Lineage Graph Data Structures
|
||||
// =============================================================================
|
||||
|
||||
/// Node types in the lineage graph
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub enum LineageNodeType {
|
||||
Want,
|
||||
Partition,
|
||||
JobRun,
|
||||
}
|
||||
|
||||
/// A node in the lineage graph
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct LineageNode {
|
||||
pub id: String,
|
||||
pub label: String,
|
||||
pub node_type: LineageNodeType,
|
||||
pub status_fill: String,
|
||||
pub status_stroke: String,
|
||||
pub url: String,
|
||||
}
|
||||
|
||||
/// An edge in the lineage graph
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct LineageEdge {
|
||||
pub from: String,
|
||||
pub to: String,
|
||||
pub dashed: bool,
|
||||
}
|
||||
|
||||
/// A directed graph representing want/partition/job run lineage
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct LineageGraph {
|
||||
pub nodes: Vec<LineageNode>,
|
||||
pub edges: Vec<LineageEdge>,
|
||||
}
|
||||
|
||||
impl LineageGraph {
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
pub fn add_node(&mut self, node: LineageNode) {
|
||||
// Only add if not already present
|
||||
if !self.nodes.iter().any(|n| n.id == node.id) {
|
||||
self.nodes.push(node);
|
||||
}
|
||||
}
|
||||
|
||||
pub fn add_edge(&mut self, from: String, to: String, dashed: bool) {
|
||||
let edge = LineageEdge { from, to, dashed };
|
||||
// Only add if not already present
|
||||
if !self
|
||||
.edges
|
||||
.iter()
|
||||
.any(|e| e.from == edge.from && e.to == edge.to)
|
||||
{
|
||||
self.edges.push(edge);
|
||||
}
|
||||
}
|
||||
|
||||
/// Generate Mermaid flowchart syntax
|
||||
pub fn to_mermaid(&self) -> String {
|
||||
let mut lines = vec!["flowchart TD".to_string()];
|
||||
|
||||
// Add nodes with labels and shapes
|
||||
for node in &self.nodes {
|
||||
let shape = match node.node_type {
|
||||
LineageNodeType::Want => format!("{}[\"🎯 {}\"]", node.id, node.label),
|
||||
LineageNodeType::Partition => format!("{}[/\"📦 {}\"/]", node.id, node.label),
|
||||
LineageNodeType::JobRun => format!("{}([\"⚙️ {}\"])", node.id, node.label),
|
||||
};
|
||||
lines.push(format!(" {}", shape));
|
||||
}
|
||||
|
||||
lines.push(String::new());
|
||||
|
||||
// Add edges
|
||||
for edge in &self.edges {
|
||||
let arrow = if edge.dashed { "-.->" } else { "-->" };
|
||||
lines.push(format!(" {} {} {}", edge.from, arrow, edge.to));
|
||||
}
|
||||
|
||||
lines.push(String::new());
|
||||
|
||||
// Add styles for status colors
|
||||
for node in &self.nodes {
|
||||
if !node.status_fill.is_empty() {
|
||||
lines.push(format!(" style {} fill:{},stroke:{}", node.id, node.status_fill, node.status_stroke));
|
||||
}
|
||||
}
|
||||
|
||||
lines.push(String::new());
|
||||
|
||||
// Add click handlers for navigation
|
||||
for node in &self.nodes {
|
||||
if !node.url.is_empty() {
|
||||
lines.push(format!(" click {} \"{}\"", node.id, node.url));
|
||||
}
|
||||
}
|
||||
|
||||
lines.join("\n")
|
||||
}
|
||||
}
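// Illustrative output shape for `to_mermaid` (assuming one want, one partition, one job run,
// and that only the want node carries a status color and URL); exact indentation follows the
// format! calls above:
//
//     flowchart TD
//       W_my_want["🎯 my-want"]
//       P_data_alpha[/"📦 data/alpha"/]
//       JR_run_1(["⚙️ //jobs:example"])
//
//       W_my_want --> P_data_alpha
//       P_data_alpha -.-> JR_run_1
//
//       style W_my_want fill:#dcfce7,stroke:#22c55e
//
//       click W_my_want "/wants/my-want"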
|
||||
|
||||
// =============================================================================
|
||||
// Status Color Mapping
|
||||
// =============================================================================
|
||||
|
||||
/// Map status names to colors for the lineage graph.
|
||||
/// Status names are matched case-insensitively against proto-generated enum names.
|
||||
pub fn status_to_fill(status_name: &str) -> &'static str {
|
||||
match status_name.to_lowercase().as_str() {
|
||||
"wantsuccessful" | "partitionlive" | "jobrunsucceeded" => "#dcfce7", // green
|
||||
"wantbuilding" | "partitionbuilding" | "jobrunrunning" => "#ede9fe", // blue
|
||||
"wantidle" | "jobrunqueued" => "#9ca3af", // gray
|
||||
"wantfailed" | "partitionfailed" | "jobrunfailed" => "#fee2e2;", // red
|
||||
"wantdepmiss" | "jobrundepmiss" => "#fef3e2", // orange
|
||||
"wantupstreamfailed" | "partitionupstreamfailed" => "#fee2e2", // red
|
||||
"wantupstreambuilding" | "partitionupstreambuilding" => "#a855f7", // purple
|
||||
"wantcanceled" | "jobruncanceled" => "#6b7280", // dark gray
|
||||
_ => "#e5e7eb", // light gray default
|
||||
}
|
||||
}
|
||||
|
||||
pub fn status_to_stroke(status_name: &str) -> &'static str {
|
||||
match status_name.to_lowercase().as_str() {
|
||||
"wantsuccessful" | "partitionlive" | "jobrunsucceeded" => "#22c55e", // green-500
|
||||
"wantbuilding" | "partitionbuilding" | "jobrunrunning" => "#8b5cf6", // violet-500
|
||||
"wantidle" | "jobrunqueued" => "#6b7280", // gray-500
|
||||
"wantfailed" | "partitionfailed" | "jobrunfailed" => "#ef4444", // red-500
|
||||
"wantdepmiss" | "jobrundepmiss" => "#f59e0b", // amber-500
|
||||
"wantupstreamfailed" | "partitionupstreamfailed" => "#ef4444", // red-500
|
||||
"wantcanceled" | "jobruncanceled" => "#4b5563", // gray-600
|
||||
"wantupstreambuilding" | "partitionupstreambuilding" => "#7c3aed", // violet-600
|
||||
_ => "#9ca3af", // gray-400 default
|
||||
}
|
||||
}
|
||||
|
||||
// =============================================================================
|
||||
// Graph Builder
|
||||
// =============================================================================
|
||||
|
||||
/// Build a lineage graph starting from a want, traversing up to `max_depth` generations.
|
||||
/// The graph shows: Want -> Partitions -> JobRuns -> DerivativeWants -> ...
|
||||
pub fn build_lineage_graph(
|
||||
build_state: &BuildState,
|
||||
want_id: &str,
|
||||
max_depth: usize,
|
||||
) -> LineageGraph {
|
||||
let mut graph = LineageGraph::new();
|
||||
let mut visited_wants: HashSet<String> = HashSet::new();
|
||||
|
||||
fn add_want_to_graph(
|
||||
build_state: &BuildState,
|
||||
graph: &mut LineageGraph,
|
||||
visited_wants: &mut HashSet<String>,
|
||||
want_id: &str,
|
||||
depth: usize,
|
||||
max_depth: usize,
|
||||
) {
|
||||
if depth > max_depth || visited_wants.contains(want_id) {
|
||||
return;
|
||||
}
|
||||
visited_wants.insert(want_id.to_string());
|
||||
|
||||
let Some(want) = build_state.get_want(want_id) else {
|
||||
return;
|
||||
};
|
||||
|
||||
// Create node ID (sanitize for mermaid)
|
||||
let want_node_id = format!("W{}", sanitize_node_id(want_id));
|
||||
let status_name = want.status.as_ref().map(|s| s.name.as_str()).unwrap_or("");
|
||||
|
||||
// Add want node
|
||||
graph.add_node(LineageNode {
|
||||
id: want_node_id.clone(),
|
||||
label: truncate_label(want_id, 50),
|
||||
node_type: LineageNodeType::Want,
|
||||
status_fill: status_to_fill(status_name).to_string(),
|
||||
status_stroke: status_to_stroke(status_name).to_string(),
|
||||
url: format!("/wants/{}", want_id),
|
||||
});
|
||||
|
||||
// Add partition nodes and edges from want to partitions
|
||||
for partition in &want.partitions {
|
||||
let partition_ref = &partition.r#ref;
|
||||
let partition_node_id = format!("P{}", sanitize_node_id(partition_ref));
|
||||
|
||||
// Get partition status if available
|
||||
let partition_status = build_state
|
||||
.get_partition(partition_ref)
|
||||
.and_then(|p| p.status)
|
||||
.map(|s| s.name)
|
||||
.unwrap_or_default();
|
||||
|
||||
graph.add_node(LineageNode {
|
||||
id: partition_node_id.clone(),
|
||||
label: truncate_label(partition_ref, 50),
|
||||
node_type: LineageNodeType::Partition,
|
||||
status_fill: status_to_fill(&partition_status).to_string(),
|
||||
status_stroke: status_to_stroke(&partition_status).to_string(),
|
||||
url: format!("/partitions/{}", urlencoding::encode(partition_ref)),
|
||||
});
|
||||
|
||||
// Want -> Partition edge
|
||||
graph.add_edge(want_node_id.clone(), partition_node_id.clone(), false);
|
||||
}
|
||||
|
||||
// Add job run nodes and edges from partitions to job runs
|
||||
for job_run in &want.job_runs {
|
||||
let job_run_node_id = format!("JR{}", sanitize_node_id(&job_run.id));
|
||||
let job_status_name = job_run
|
||||
.status
|
||||
.as_ref()
|
||||
.map(|s| s.name.as_str())
|
||||
.unwrap_or("");
|
||||
|
||||
// Use job_label if available, otherwise job_run_id
|
||||
let job_label = if job_run.job_label.is_empty() {
|
||||
&job_run.id
|
||||
} else {
|
||||
&job_run.job_label
|
||||
};
|
||||
|
||||
graph.add_node(LineageNode {
|
||||
id: job_run_node_id.clone(),
|
||||
label: truncate_label(job_label, 50),
|
||||
node_type: LineageNodeType::JobRun,
|
||||
status_fill: status_to_fill(job_status_name).to_string(),
|
||||
status_stroke: status_to_stroke(job_status_name).to_string(),
|
||||
url: format!("/job_runs/{}", job_run.id),
|
||||
});
|
||||
|
||||
// Connect partitions being built to this job run (dashed = building relationship)
|
||||
for partition in &job_run.building_partitions {
|
||||
let partition_node_id = format!("P{}", sanitize_node_id(&partition.r#ref));
|
||||
graph.add_edge(partition_node_id, job_run_node_id.clone(), true);
|
||||
}
|
||||
|
||||
// Recurse into derivative wants
|
||||
for derivative_want_id in &job_run.derivative_want_ids {
|
||||
add_want_to_graph(
|
||||
build_state,
|
||||
graph,
|
||||
visited_wants,
|
||||
derivative_want_id,
|
||||
depth + 1,
|
||||
max_depth,
|
||||
);
|
||||
|
||||
// Add edge from job run to derivative want
|
||||
let derivative_want_node_id = format!("W{}", sanitize_node_id(derivative_want_id));
|
||||
graph.add_edge(job_run_node_id.clone(), derivative_want_node_id, false);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
add_want_to_graph(
|
||||
build_state,
|
||||
&mut graph,
|
||||
&mut visited_wants,
|
||||
want_id,
|
||||
0,
|
||||
max_depth,
|
||||
);
|
||||
|
||||
graph
|
||||
}
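// Usage sketch (illustrative): render the lineage for a want as Mermaid source. The
// `build_state` and `want_id` values are assumed to come from the caller (e.g. the web UI).
//
//     let graph = build_lineage_graph(&build_state, want_id, 3);
//     let mermaid_src = graph.to_mermaid();
//     // embed `mermaid_src` in the lineage view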
|
||||
|
||||
/// Sanitize a string to be a valid mermaid node ID (alphanumeric + underscore only)
|
||||
pub fn sanitize_node_id(s: &str) -> String {
|
||||
s.chars()
|
||||
.map(|c| if c.is_alphanumeric() { c } else { '_' })
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Truncate a label to fit in the graph, adding ellipsis if needed
|
||||
pub fn truncate_label(s: &str, max_len: usize) -> String {
|
||||
if s.len() <= max_len {
|
||||
s.to_string()
|
||||
} else {
|
||||
format!("{}...", &s[..max_len - 3])
|
||||
}
|
||||
}
|
||||
|
||||
// =============================================================================
|
||||
// Tests
|
||||
// =============================================================================
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::build_state::BuildState;
|
||||
use crate::util::test_scenarios::multihop_scenario;
|
||||
|
||||
/// Helper to replay events into a fresh BuildState
|
||||
fn build_state_from_events(events: &[crate::data_build_event::Event]) -> BuildState {
|
||||
let mut state = BuildState::default();
|
||||
for event in events {
|
||||
state.handle_event(event);
|
||||
}
|
||||
state
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_lineage_graph_mermaid_generation() {
|
||||
let mut graph = LineageGraph::new();
|
||||
|
||||
graph.add_node(LineageNode {
|
||||
id: "W_beta_want".to_string(),
|
||||
label: "beta-want".to_string(),
|
||||
node_type: LineageNodeType::Want,
|
||||
status_fill: "#22c55e".to_string(),
|
||||
status_stroke: "#22c55e".to_string(),
|
||||
url: "/wants/beta-want".to_string(),
|
||||
});
|
||||
|
||||
graph.add_node(LineageNode {
|
||||
id: "P_data_beta".to_string(),
|
||||
label: "data/beta".to_string(),
|
||||
node_type: LineageNodeType::Partition,
|
||||
status_fill: "#22c55e".to_string(),
|
||||
status_stroke: "#22c55e".to_string(),
|
||||
url: "/partitions/data%2Fbeta".to_string(),
|
||||
});
|
||||
|
||||
graph.add_node(LineageNode {
|
||||
id: "JR_beta_job".to_string(),
|
||||
label: "//job_beta".to_string(),
|
||||
node_type: LineageNodeType::JobRun,
|
||||
status_fill: "#f97316".to_string(),
|
||||
status_stroke: "#f97316".to_string(),
|
||||
url: "/job_runs/beta-job".to_string(),
|
||||
});
|
||||
|
||||
graph.add_edge("W_beta_want".to_string(), "P_data_beta".to_string(), false);
|
||||
graph.add_edge("P_data_beta".to_string(), "JR_beta_job".to_string(), true);
|
||||
|
||||
let mermaid = graph.to_mermaid();
|
||||
|
||||
assert!(
|
||||
mermaid.contains("flowchart TD"),
|
||||
"Should have flowchart header"
|
||||
);
|
||||
assert!(mermaid.contains("W_beta_want"), "Should have want node");
|
||||
assert!(
|
||||
mermaid.contains("P_data_beta"),
|
||||
"Should have partition node"
|
||||
);
|
||||
assert!(mermaid.contains("JR_beta_job"), "Should have job run node");
|
||||
assert!(mermaid.contains("-->"), "Should have solid edge");
|
||||
assert!(mermaid.contains("-.->"), "Should have dashed edge");
|
||||
assert!(
|
||||
mermaid.contains("style W_beta_want fill:#22c55e"),
|
||||
"Should have status color"
|
||||
);
|
||||
assert!(
|
||||
mermaid.contains("click W_beta_want"),
|
||||
"Should have click handler"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sanitize_node_id() {
|
||||
assert_eq!(sanitize_node_id("data/beta"), "data_beta");
|
||||
assert_eq!(sanitize_node_id("job-run-123"), "job_run_123");
|
||||
assert_eq!(sanitize_node_id("simple"), "simple");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_truncate_label() {
|
||||
assert_eq!(truncate_label("short", 10), "short");
|
||||
assert_eq!(truncate_label("this is very long", 10), "this is...");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_build_lineage_graph_from_multihop_scenario() {
|
||||
let (events, ids) = multihop_scenario();
|
||||
let state = build_state_from_events(&events);
|
||||
|
||||
let graph = build_lineage_graph(&state, &ids.beta_want_id, 3);
|
||||
|
||||
// Should have nodes for beta want, alpha want, partitions, and job runs
|
||||
assert!(!graph.nodes.is_empty(), "Graph should have nodes");
|
||||
assert!(!graph.edges.is_empty(), "Graph should have edges");
|
||||
|
||||
// Check that the mermaid output is valid
|
||||
let mermaid = graph.to_mermaid();
|
||||
assert!(mermaid.contains("flowchart TD"));
|
||||
assert!(mermaid.contains("beta_want") || mermaid.contains("beta"));
|
||||
}
|
||||
}
|
||||
@@ -1,440 +0,0 @@
use crate::{JobLogEntry, JobLogsRequest, JobLogsResponse, log_message};
|
||||
use serde_json;
|
||||
use std::collections::HashMap;
|
||||
use std::fs::{self, File};
|
||||
use std::io::{BufRead, BufReader};
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::time::{SystemTime, UNIX_EPOCH};
|
||||
use thiserror::Error;
|
||||
|
||||
#[derive(Error, Debug)]
|
||||
pub enum LogAccessError {
|
||||
#[error("IO error: {0}")]
|
||||
Io(#[from] std::io::Error),
|
||||
#[error("JSON parsing error: {0}")]
|
||||
Json(#[from] serde_json::Error),
|
||||
#[error("Invalid request: {0}")]
|
||||
InvalidRequest(String),
|
||||
#[error("Job not found: {0}")]
|
||||
JobNotFound(String),
|
||||
}
|
||||
|
||||
pub struct LogReader {
|
||||
logs_base_path: PathBuf,
|
||||
}
|
||||
|
||||
impl LogReader {
|
||||
pub fn new<P: AsRef<Path>>(logs_base_path: P) -> Self {
|
||||
Self {
|
||||
logs_base_path: logs_base_path.as_ref().to_path_buf(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create LogReader with the default logs directory
|
||||
pub fn default() -> Self {
|
||||
Self::new(crate::log_collector::LogCollector::default_logs_dir())
|
||||
}
|
||||
|
||||
/// Get job logs according to the request criteria
|
||||
pub fn get_job_logs(&self, request: &JobLogsRequest) -> Result<JobLogsResponse, LogAccessError> {
|
||||
let job_file_path = self.find_job_file(&request.job_run_id)?;
|
||||
|
||||
let file = File::open(&job_file_path)?;
|
||||
let reader = BufReader::new(file);
|
||||
|
||||
let mut entries = Vec::new();
|
||||
let mut count = 0u32;
|
||||
let limit = if request.limit > 0 { request.limit } else { 1000 }; // Default limit
|
||||
|
||||
for line in reader.lines() {
|
||||
let line = line?;
|
||||
|
||||
// Skip empty lines
|
||||
if line.trim().is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Parse the log entry
|
||||
let entry: JobLogEntry = serde_json::from_str(&line)?;
|
||||
|
||||
// Apply filters
|
||||
if !self.matches_filters(&entry, request) {
|
||||
continue;
|
||||
}
|
||||
|
||||
entries.push(entry);
|
||||
count += 1;
|
||||
|
||||
// Stop if we've hit the limit
|
||||
if count >= limit {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Heuristic: if we filled the limit, assume more entries may remain (we do not read ahead)
|
||||
let has_more = count == limit;
|
||||
|
||||
Ok(JobLogsResponse {
|
||||
entries,
|
||||
has_more,
|
||||
})
|
||||
}
|
||||
|
||||
/// List available job run IDs for a given date range
|
||||
pub fn list_available_jobs(&self, date_range: Option<(String, String)>) -> Result<Vec<String>, LogAccessError> {
|
||||
let mut job_ids = Vec::new();
|
||||
|
||||
// If no date range specified, look at all directories
|
||||
if let Some((start_date, end_date)) = date_range {
|
||||
// Parse date range and iterate through dates
|
||||
for date_str in self.date_range_iterator(&start_date, &end_date)? {
|
||||
let date_dir = self.logs_base_path.join(&date_str);
|
||||
if date_dir.exists() {
|
||||
job_ids.extend(self.get_job_ids_from_directory(&date_dir)?);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// List all date directories and collect job IDs
|
||||
if self.logs_base_path.exists() {
|
||||
for entry in fs::read_dir(&self.logs_base_path)? {
|
||||
let entry = entry?;
|
||||
if entry.file_type()?.is_dir() {
|
||||
job_ids.extend(self.get_job_ids_from_directory(&entry.path())?);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Remove duplicates and sort
|
||||
job_ids.sort();
|
||||
job_ids.dedup();
|
||||
|
||||
Ok(job_ids)
|
||||
}
|
||||
|
||||
/// Get metrics points for a specific job
|
||||
pub fn get_job_metrics(&self, job_run_id: &str) -> Result<Vec<crate::MetricPoint>, LogAccessError> {
|
||||
let job_file_path = self.find_job_file(job_run_id)?;
|
||||
|
||||
let file = File::open(&job_file_path)?;
|
||||
let reader = BufReader::new(file);
|
||||
|
||||
let mut metrics = Vec::new();
|
||||
|
||||
for line in reader.lines() {
|
||||
let line = line?;
|
||||
|
||||
// Skip empty lines
|
||||
if line.trim().is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Parse the log entry
|
||||
let entry: JobLogEntry = serde_json::from_str(&line)?;
|
||||
|
||||
// Extract metrics from the entry
|
||||
if let Some(crate::job_log_entry::Content::Metric(metric)) = entry.content {
|
||||
metrics.push(metric);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(metrics)
|
||||
}
|
||||
|
||||
/// Find the JSONL file for a specific job run ID
|
||||
fn find_job_file(&self, job_run_id: &str) -> Result<PathBuf, LogAccessError> {
|
||||
// Search through all date directories for the job file
|
||||
if !self.logs_base_path.exists() {
|
||||
return Err(LogAccessError::JobNotFound(job_run_id.to_string()));
|
||||
}
|
||||
|
||||
for entry in fs::read_dir(&self.logs_base_path)? {
|
||||
let entry = entry?;
|
||||
if entry.file_type()?.is_dir() {
|
||||
let job_file = entry.path().join(format!("{}.jsonl", job_run_id));
|
||||
if job_file.exists() {
|
||||
return Ok(job_file);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Err(LogAccessError::JobNotFound(job_run_id.to_string()))
|
||||
}
|
||||
|
||||
/// Check if a log entry matches the request filters
|
||||
fn matches_filters(&self, entry: &JobLogEntry, request: &JobLogsRequest) -> bool {
|
||||
// Filter by timestamp (since_timestamp is in nanoseconds)
|
||||
if request.since_timestamp > 0 {
|
||||
if let Ok(entry_timestamp) = entry.timestamp.parse::<u64>() {
|
||||
let entry_timestamp_ns = entry_timestamp * 1_000_000_000; // Convert seconds to nanoseconds
|
||||
if entry_timestamp_ns <= request.since_timestamp as u64 {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Filter by log level (only applies to log messages)
|
||||
if request.min_level > 0 {
|
||||
if let Some(crate::job_log_entry::Content::Log(log_msg)) = &entry.content {
|
||||
if log_msg.level < request.min_level {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
// For non-log entries (metrics, events), we include them regardless of min_level
|
||||
}
|
||||
|
||||
true
|
||||
}
|
||||
|
||||
/// Get job IDs from files in a specific directory
|
||||
fn get_job_ids_from_directory(&self, dir_path: &Path) -> Result<Vec<String>, LogAccessError> {
|
||||
let mut job_ids = Vec::new();
|
||||
|
||||
for entry in fs::read_dir(dir_path)? {
|
||||
let entry = entry?;
|
||||
if entry.file_type()?.is_file() {
|
||||
if let Some(file_name) = entry.file_name().to_str() {
|
||||
if file_name.ends_with(".jsonl") {
|
||||
// Extract job ID by removing .jsonl extension
|
||||
let job_id = file_name.trim_end_matches(".jsonl");
|
||||
job_ids.push(job_id.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(job_ids)
|
||||
}
|
||||
|
||||
/// Generate an iterator over date strings in a range (YYYY-MM-DD format)
|
||||
fn date_range_iterator(&self, start_date: &str, end_date: &str) -> Result<Vec<String>, LogAccessError> {
|
||||
// Simple implementation - for production might want more robust date parsing
|
||||
let start_parts: Vec<&str> = start_date.split('-').collect();
|
||||
let end_parts: Vec<&str> = end_date.split('-').collect();
|
||||
|
||||
if start_parts.len() != 3 || end_parts.len() != 3 {
|
||||
return Err(LogAccessError::InvalidRequest("Invalid date format, expected YYYY-MM-DD".to_string()));
|
||||
}
|
||||
|
||||
// For now, just return the start and end dates
|
||||
// In a full implementation, you'd iterate through all dates in between
|
||||
let mut dates = vec![start_date.to_string()];
|
||||
if start_date != end_date {
|
||||
dates.push(end_date.to_string());
|
||||
}
|
||||
|
||||
Ok(dates)
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::{job_log_entry, log_message, LogMessage, PartitionRef, MetricPoint};
|
||||
use std::io::Write;
|
||||
use tempfile::TempDir;
|
||||
|
||||
fn create_test_log_entry(job_id: &str, sequence: u64, timestamp: &str) -> JobLogEntry {
|
||||
JobLogEntry {
|
||||
timestamp: timestamp.to_string(),
|
||||
job_id: job_id.to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "test/partition".to_string() }],
|
||||
sequence_number: sequence,
|
||||
content: Some(job_log_entry::Content::Log(LogMessage {
|
||||
level: log_message::LogLevel::Info as i32,
|
||||
message: format!("Test log message {}", sequence),
|
||||
fields: HashMap::new(),
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
fn create_test_metric_entry(job_id: &str, sequence: u64, timestamp: &str) -> JobLogEntry {
|
||||
JobLogEntry {
|
||||
timestamp: timestamp.to_string(),
|
||||
job_id: job_id.to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "test/partition".to_string() }],
|
||||
sequence_number: sequence,
|
||||
content: Some(job_log_entry::Content::Metric(MetricPoint {
|
||||
name: "test_metric".to_string(),
|
||||
value: 42.0,
|
||||
labels: HashMap::new(),
|
||||
unit: "count".to_string(),
|
||||
})),
|
||||
}
|
||||
}
|
||||
|
||||
fn setup_test_logs(temp_dir: &TempDir) -> Result<(), Box<dyn std::error::Error>> {
|
||||
// Create date directory
|
||||
let date_dir = temp_dir.path().join("2025-01-27");
|
||||
fs::create_dir_all(&date_dir)?;
|
||||
|
||||
// Create a test job file
|
||||
let job_file = date_dir.join("job_123.jsonl");
|
||||
let mut file = File::create(&job_file)?;
|
||||
|
||||
// Write test entries
|
||||
let entry1 = create_test_log_entry("job_123", 1, "1737993600"); // 2025-01-27 12:00:00
|
||||
let entry2 = create_test_log_entry("job_123", 2, "1737993660"); // 2025-01-27 12:01:00
|
||||
let entry3 = create_test_metric_entry("job_123", 3, "1737993720"); // 2025-01-27 12:02:00
|
||||
|
||||
writeln!(file, "{}", serde_json::to_string(&entry1)?)?;
|
||||
writeln!(file, "{}", serde_json::to_string(&entry2)?)?;
|
||||
writeln!(file, "{}", serde_json::to_string(&entry3)?)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_log_reader_creation() {
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
        let reader = LogReader::new(temp_dir.path());

        assert_eq!(reader.logs_base_path, temp_dir.path());
    }

    #[test]
    fn test_get_job_logs_basic() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let request = JobLogsRequest {
            job_run_id: "job_123".to_string(),
            since_timestamp: 0,
            min_level: 0,
            limit: 10,
        };

        let response = reader.get_job_logs(&request).unwrap();

        assert_eq!(response.entries.len(), 3);
        assert!(!response.has_more);

        // Verify the entries are in order
        assert_eq!(response.entries[0].sequence_number, 1);
        assert_eq!(response.entries[1].sequence_number, 2);
        assert_eq!(response.entries[2].sequence_number, 3);
    }

    #[test]
    fn test_get_job_logs_with_timestamp_filter() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let request = JobLogsRequest {
            job_run_id: "job_123".to_string(),
            since_timestamp: 1737993600_000_000_000, // 2025-01-27 12:00:00 in nanoseconds
            min_level: 0,
            limit: 10,
        };

        let response = reader.get_job_logs(&request).unwrap();

        // Should get entries 2 and 3 (after the timestamp)
        assert_eq!(response.entries.len(), 2);
        assert_eq!(response.entries[0].sequence_number, 2);
        assert_eq!(response.entries[1].sequence_number, 3);
    }

    #[test]
    fn test_get_job_logs_with_level_filter() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let request = JobLogsRequest {
            job_run_id: "job_123".to_string(),
            since_timestamp: 0,
            min_level: log_message::LogLevel::Warn as i32, // Only WARN and ERROR
            limit: 10,
        };

        let response = reader.get_job_logs(&request).unwrap();

        // Should get only the metric entry (sequence 3) since log entries are INFO level
        assert_eq!(response.entries.len(), 1);
        assert_eq!(response.entries[0].sequence_number, 3);
    }

    #[test]
    fn test_get_job_logs_with_limit() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let request = JobLogsRequest {
            job_run_id: "job_123".to_string(),
            since_timestamp: 0,
            min_level: 0,
            limit: 2,
        };

        let response = reader.get_job_logs(&request).unwrap();

        assert_eq!(response.entries.len(), 2);
        assert!(response.has_more);
        assert_eq!(response.entries[0].sequence_number, 1);
        assert_eq!(response.entries[1].sequence_number, 2);
    }

    #[test]
    fn test_list_available_jobs() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        // Create another job file
        let date_dir = temp_dir.path().join("2025-01-27");
        let job_file2 = date_dir.join("job_456.jsonl");
        let mut file2 = File::create(&job_file2).unwrap();
        let entry = create_test_log_entry("job_456", 1, "1737993600");
        writeln!(file2, "{}", serde_json::to_string(&entry).unwrap()).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let job_ids = reader.list_available_jobs(None).unwrap();

        assert_eq!(job_ids.len(), 2);
        assert!(job_ids.contains(&"job_123".to_string()));
        assert!(job_ids.contains(&"job_456".to_string()));
    }

    #[test]
    fn test_get_job_metrics() {
        let temp_dir = TempDir::new().unwrap();
        setup_test_logs(&temp_dir).unwrap();

        let reader = LogReader::new(temp_dir.path());
        let metrics = reader.get_job_metrics("job_123").unwrap();

        assert_eq!(metrics.len(), 1);
        assert_eq!(metrics[0].name, "test_metric");
        assert_eq!(metrics[0].value, 42.0);
        assert_eq!(metrics[0].unit, "count");
    }

    #[test]
    fn test_job_not_found() {
        let temp_dir = TempDir::new().unwrap();
        let reader = LogReader::new(temp_dir.path());

        let request = JobLogsRequest {
            job_run_id: "nonexistent_job".to_string(),
            since_timestamp: 0,
            min_level: 0,
            limit: 10,
        };

        let result = reader.get_job_logs(&request);
        assert!(result.is_err());
        assert!(matches!(result.unwrap_err(), LogAccessError::JobNotFound(_)));
    }

    #[test]
    fn test_default_log_reader() {
        let reader = LogReader::default();

        // Should use the default logs directory
        let expected = crate::log_collector::LogCollector::default_logs_dir();
        assert_eq!(reader.logs_base_path, expected);
    }
}
@@ -1,402 +0,0 @@
use crate::{JobLogEntry, job_log_entry};
use serde_json;
use std::collections::HashMap;
use std::fs::{self, File, OpenOptions};
use std::io::{BufRead, Write};
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};
use thiserror::Error;

/// Convert days since Unix epoch to (year, month, day)
/// This is a simplified algorithm good enough for log file naming
fn days_to_ymd(days: i32) -> (i32, u32, u32) {
    // Start from 1970-01-01
    let mut year = 1970;
    let mut remaining_days = days;

    // Handle years
    loop {
        let days_in_year = if is_leap_year(year) { 366 } else { 365 };
        if remaining_days < days_in_year {
            break;
        }
        remaining_days -= days_in_year;
        year += 1;
    }

    // Handle months
    let mut month = 1;
    for m in 1..=12 {
        let days_in_month = days_in_month(year, m);
        if remaining_days < days_in_month as i32 {
            month = m;
            break;
        }
        remaining_days -= days_in_month as i32;
    }

    let day = remaining_days + 1; // Days are 1-indexed
    (year, month, day as u32)
}

/// Check if a year is a leap year
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0)
}

/// Get number of days in a given month
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap_year(year) { 29 } else { 28 },
        _ => 30, // Should never happen
    }
}

#[derive(Error, Debug)]
pub enum LogCollectorError {
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
    #[error("JSON parsing error: {0}")]
    Json(#[from] serde_json::Error),
    #[error("Invalid log entry: {0}")]
    InvalidLogEntry(String),
}

pub struct LogCollector {
    logs_dir: PathBuf,
    active_files: HashMap<String, File>,
    job_label_mapping: HashMap<String, String>, // job_run_id -> job_label
}

impl LogCollector {
    pub fn new<P: AsRef<Path>>(logs_dir: P) -> Result<Self, LogCollectorError> {
        let logs_dir = logs_dir.as_ref().to_path_buf();

        // Ensure the base logs directory exists
        if !logs_dir.exists() {
            fs::create_dir_all(&logs_dir)?;
        }

        Ok(Self {
            logs_dir,
            active_files: HashMap::new(),
            job_label_mapping: HashMap::new(),
        })
    }

    /// Set the job label for a specific job run ID
    pub fn set_job_label(&mut self, job_run_id: &str, job_label: &str) {
        self.job_label_mapping.insert(job_run_id.to_string(), job_label.to_string());
    }

    /// Get the default logs directory based on environment variable or fallback
    pub fn default_logs_dir() -> PathBuf {
        std::env::var("DATABUILD_LOGS_DIR")
            .map(PathBuf::from)
            .unwrap_or_else(|_| {
                // Fallback to ./logs/databuild for safety - avoid system directories
                std::env::current_dir()
                    .unwrap_or_else(|_| PathBuf::from("."))
                    .join("logs")
                    .join("databuild")
            })
    }

    /// Create a date-organized directory path for today
    fn get_date_directory(&self) -> Result<PathBuf, LogCollectorError> {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map_err(|e| LogCollectorError::InvalidLogEntry(format!("System time error: {}", e)))?;

        let timestamp = now.as_secs();

        // Convert timestamp to YYYY-MM-DD format
        // Using a simple calculation instead of chrono
        let days_since_epoch = timestamp / 86400; // 86400 seconds in a day
        let days_since_1970 = days_since_epoch as i32;

        // Calculate year, month, day from days since epoch
        // This is a simplified calculation - good enough for log file naming
        let (year, month, day) = days_to_ymd(days_since_1970);
        let date_str = format!("{:04}-{:02}-{:02}", year, month, day);

        let date_dir = self.logs_dir.join(date_str);

        // Ensure the date directory exists
        if !date_dir.exists() {
            fs::create_dir_all(&date_dir)?;
        }

        Ok(date_dir)
    }

    /// Get or create a file handle for a specific job run
    fn get_job_file(&mut self, job_run_id: &str) -> Result<&mut File, LogCollectorError> {
        if !self.active_files.contains_key(job_run_id) {
            let date_dir = self.get_date_directory()?;
            let file_path = date_dir.join(format!("{}.jsonl", job_run_id));

            let file = OpenOptions::new()
                .create(true)
                .append(true)
                .open(&file_path)?;

            self.active_files.insert(job_run_id.to_string(), file);
        }

        Ok(self.active_files.get_mut(job_run_id).unwrap())
    }

    /// Write a single log entry to the appropriate JSONL file
    pub fn write_log_entry(&mut self, job_run_id: &str, entry: &JobLogEntry) -> Result<(), LogCollectorError> {
        let file = self.get_job_file(job_run_id)?;
        let json_line = serde_json::to_string(entry)?;
        writeln!(file, "{}", json_line)?;
        file.flush()?;
        Ok(())
    }

    /// Consume stdout from a job process and parse/store log entries
    pub fn consume_job_output<R: BufRead>(&mut self, job_run_id: &str, reader: R) -> Result<(), LogCollectorError> {
        for line in reader.lines() {
            let line = line?;

            // Skip empty lines
            if line.trim().is_empty() {
                continue;
            }

            // Try to parse as JobLogEntry
            match serde_json::from_str::<JobLogEntry>(&line) {
                Ok(mut entry) => {
                    // Validate that the job_id matches
                    if entry.job_id != job_run_id {
                        return Err(LogCollectorError::InvalidLogEntry(
                            format!("Job ID mismatch: expected {}, got {}", job_run_id, entry.job_id)
                        ));
                    }

                    // Enrich WrapperJobEvent and Manifest with job_label if available
                    if let Some(job_label) = self.job_label_mapping.get(job_run_id) {
                        match &mut entry.content {
                            Some(job_log_entry::Content::JobEvent(ref mut job_event)) => {
                                job_event.job_label = Some(job_label.clone());
                            }
                            Some(job_log_entry::Content::Manifest(ref mut manifest)) => {
                                if let Some(ref mut task) = manifest.task {
                                    if let Some(ref mut job) = task.job {
                                        job.label = job_label.clone();
                                    }
                                }
                            }
                            _ => {} // No enrichment needed for Log entries
                        }
                    }

                    self.write_log_entry(job_run_id, &entry)?;
                }
                Err(_) => {
                    // If it's not a JobLogEntry, treat it as raw output and create a log entry
                    let raw_entry = JobLogEntry {
                        timestamp: SystemTime::now()
                            .duration_since(UNIX_EPOCH)
                            .unwrap()
                            .as_secs()
                            .to_string(),
                        job_id: job_run_id.to_string(),
                        outputs: vec![], // Raw output doesn't have specific outputs
                        sequence_number: 0, // Raw output gets sequence 0
                        content: Some(crate::job_log_entry::Content::Log(crate::LogMessage {
                            level: crate::log_message::LogLevel::Info as i32,
                            message: line,
                            fields: HashMap::new(),
                        })),
                    };

                    self.write_log_entry(job_run_id, &raw_entry)?;
                }
            }
        }

        Ok(())
    }

    /// Close and flush all active files
    pub fn close_all(&mut self) -> Result<(), LogCollectorError> {
        for (_, mut file) in self.active_files.drain() {
            file.flush()?;
        }
        Ok(())
    }

    /// Close and flush a specific job's file
    pub fn close_job(&mut self, job_run_id: &str) -> Result<(), LogCollectorError> {
        if let Some(mut file) = self.active_files.remove(job_run_id) {
            file.flush()?;
        }
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::{job_log_entry, log_message, LogMessage, PartitionRef};
    use std::io::Cursor;
    use tempfile::TempDir;

    fn create_test_log_entry(job_id: &str, sequence: u64) -> JobLogEntry {
        JobLogEntry {
            timestamp: "1234567890".to_string(),
            job_id: job_id.to_string(),
            outputs: vec![PartitionRef { r#str: "test/partition".to_string() }],
            sequence_number: sequence,
            content: Some(job_log_entry::Content::Log(LogMessage {
                level: log_message::LogLevel::Info as i32,
                message: "Test log message".to_string(),
                fields: HashMap::new(),
            })),
        }
    }

    #[test]
    fn test_log_collector_creation() {
        let temp_dir = TempDir::new().unwrap();
        let collector = LogCollector::new(temp_dir.path()).unwrap();

        assert_eq!(collector.logs_dir, temp_dir.path());
        assert!(collector.active_files.is_empty());
    }

    #[test]
    fn test_write_single_log_entry() {
        let temp_dir = TempDir::new().unwrap();
        let mut collector = LogCollector::new(temp_dir.path()).unwrap();

        let entry = create_test_log_entry("job_123", 1);
        collector.write_log_entry("job_123", &entry).unwrap();

        // Verify file was created and contains the entry
        collector.close_all().unwrap();

        // Check that a date directory was created
        let date_dirs: Vec<_> = fs::read_dir(temp_dir.path()).unwrap().collect();
        assert_eq!(date_dirs.len(), 1);

        // Check that the job file exists in the date directory
        let date_dir_path = date_dirs[0].as_ref().unwrap().path();
        let job_files: Vec<_> = fs::read_dir(&date_dir_path).unwrap().collect();
        assert_eq!(job_files.len(), 1);

        let job_file_path = job_files[0].as_ref().unwrap().path();
        assert!(job_file_path.file_name().unwrap().to_string_lossy().contains("job_123"));

        // Verify content
        let content = fs::read_to_string(&job_file_path).unwrap();
        assert!(content.contains("Test log message"));
        assert!(content.contains("\"sequence_number\":1"));
    }

    #[test]
    fn test_consume_structured_output() {
        let temp_dir = TempDir::new().unwrap();
        let mut collector = LogCollector::new(temp_dir.path()).unwrap();

        let entry1 = create_test_log_entry("job_456", 1);
        let entry2 = create_test_log_entry("job_456", 2);

        let input = format!("{}\n{}\n",
            serde_json::to_string(&entry1).unwrap(),
            serde_json::to_string(&entry2).unwrap()
        );

        let reader = Cursor::new(input);
        collector.consume_job_output("job_456", reader).unwrap();
        collector.close_all().unwrap();

        // Verify both entries were written
        let date_dirs: Vec<_> = fs::read_dir(temp_dir.path()).unwrap().collect();
        let date_dir_path = date_dirs[0].as_ref().unwrap().path();
        let job_files: Vec<_> = fs::read_dir(&date_dir_path).unwrap().collect();
        let job_file_path = job_files[0].as_ref().unwrap().path();

        let content = fs::read_to_string(&job_file_path).unwrap();
        let lines: Vec<&str> = content.trim().split('\n').collect();
        assert_eq!(lines.len(), 2);

        // Verify both entries can be parsed back
        let parsed1: JobLogEntry = serde_json::from_str(lines[0]).unwrap();
        let parsed2: JobLogEntry = serde_json::from_str(lines[1]).unwrap();
        assert_eq!(parsed1.sequence_number, 1);
        assert_eq!(parsed2.sequence_number, 2);
    }

    #[test]
    fn test_consume_mixed_output() {
        let temp_dir = TempDir::new().unwrap();
        let mut collector = LogCollector::new(temp_dir.path()).unwrap();

        let entry = create_test_log_entry("job_789", 1);
        let structured_line = serde_json::to_string(&entry).unwrap();

        let input = format!("{}\nRaw output line\nAnother raw line\n", structured_line);

        let reader = Cursor::new(input);
        collector.consume_job_output("job_789", reader).unwrap();
        collector.close_all().unwrap();

        // Verify all lines were captured (1 structured + 2 raw)
        let date_dirs: Vec<_> = fs::read_dir(temp_dir.path()).unwrap().collect();
        let date_dir_path = date_dirs[0].as_ref().unwrap().path();
        let job_files: Vec<_> = fs::read_dir(&date_dir_path).unwrap().collect();
        let job_file_path = job_files[0].as_ref().unwrap().path();

        let content = fs::read_to_string(&job_file_path).unwrap();
        let lines: Vec<&str> = content.trim().split('\n').collect();
        assert_eq!(lines.len(), 3);

        // First line should be the structured entry
        let parsed1: JobLogEntry = serde_json::from_str(lines[0]).unwrap();
        assert_eq!(parsed1.sequence_number, 1);

        // Second and third lines should be raw output entries
        let parsed2: JobLogEntry = serde_json::from_str(lines[1]).unwrap();
        let parsed3: JobLogEntry = serde_json::from_str(lines[2]).unwrap();
        assert_eq!(parsed2.sequence_number, 0); // Raw output gets sequence 0
        assert_eq!(parsed3.sequence_number, 0);

        if let Some(job_log_entry::Content::Log(log_msg)) = &parsed2.content {
            assert_eq!(log_msg.message, "Raw output line");
        } else {
            panic!("Expected log content");
        }
    }

    #[test]
    fn test_default_logs_dir() {
        let default_dir = LogCollector::default_logs_dir();

        // Should be a valid path
        assert!(default_dir.is_absolute() || default_dir.starts_with("."));
        assert!(default_dir.to_string_lossy().contains("logs"));
        assert!(default_dir.to_string_lossy().contains("databuild"));
    }

    #[test]
    fn test_job_id_validation() {
        let temp_dir = TempDir::new().unwrap();
        let mut collector = LogCollector::new(temp_dir.path()).unwrap();

        let mut entry = create_test_log_entry("wrong_job_id", 1);
        entry.job_id = "wrong_job_id".to_string();

        let input = serde_json::to_string(&entry).unwrap();
        let reader = Cursor::new(input);

        let result = collector.consume_job_output("expected_job_id", reader);
        assert!(result.is_err());
        assert!(result.unwrap_err().to_string().contains("Job ID mismatch"));
    }
}
@@ -1,915 +0,0 @@
use crate::*;
use std::collections::{HashMap, HashSet};

/// Represents the status of a job or partition for visualization
#[derive(Debug, Clone, PartialEq)]
pub enum NodeStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
    Skipped,
    Available,
    Delegated,
}

impl NodeStatus {
    /// Get the CSS class name for this status
    fn css_class(&self) -> &'static str {
        match self {
            NodeStatus::Pending => "pending",
            NodeStatus::Running => "running",
            NodeStatus::Completed => "completed",
            NodeStatus::Failed => "failed",
            NodeStatus::Cancelled => "cancelled",
            NodeStatus::Skipped => "skipped",
            NodeStatus::Available => "available",
            NodeStatus::Delegated => "delegated",
        }
    }
}

/// Extract current status information from build events
pub fn extract_status_map(events: &[BuildEvent]) -> (HashMap<String, NodeStatus>, HashMap<String, NodeStatus>) {
    let mut job_statuses: HashMap<String, NodeStatus> = HashMap::new();
    let mut partition_statuses: HashMap<String, NodeStatus> = HashMap::new();

    // Process events in chronological order to get latest status
    let mut sorted_events = events.to_vec();
    sorted_events.sort_by_key(|e| e.timestamp);

    for event in sorted_events {
        match &event.event_type {
            Some(crate::build_event::EventType::JobEvent(job_event)) => {
                if let Some(job_label) = &job_event.job_label {
                    let status = match job_event.status_code {
                        1 => NodeStatus::Running,   // JOB_SCHEDULED
                        2 => NodeStatus::Running,   // JOB_RUNNING
                        3 => NodeStatus::Completed, // JOB_COMPLETED
                        4 => NodeStatus::Failed,    // JOB_FAILED
                        5 => NodeStatus::Cancelled, // JOB_CANCELLED
                        6 => NodeStatus::Skipped,   // JOB_SKIPPED
                        _ => NodeStatus::Pending,
                    };

                    // Create a unique key using job label + target partitions (same as node ID)
                    let outputs_label = job_event.target_partitions.iter()
                        .map(|p| p.str.clone())
                        .collect::<Vec<_>>()
                        .join("___");
                    let unique_key = encode_id(&(job_label.label.clone() + "___" + &outputs_label));

                    job_statuses.insert(unique_key, status);
                }
            }
            Some(crate::build_event::EventType::PartitionEvent(partition_event)) => {
                if let Some(partition_ref) = &partition_event.partition_ref {
                    let status = match partition_event.status_code {
                        1 => NodeStatus::Pending,   // PARTITION_REQUESTED
                        2 => NodeStatus::Pending,   // PARTITION_ANALYZED
                        3 => NodeStatus::Running,   // PARTITION_BUILDING
                        4 => NodeStatus::Available, // PARTITION_AVAILABLE
                        5 => NodeStatus::Failed,    // PARTITION_FAILED
                        6 => NodeStatus::Delegated, // PARTITION_DELEGATED
                        _ => NodeStatus::Pending,
                    };
                    partition_statuses.insert(partition_ref.str.clone(), status);
                }
            }
            _ => {}
        }
    }

    (job_statuses, partition_statuses)
}

/// Convert NodeStatus to EdgeStatus for edge coloring
fn map_node_status_to_edge_status(node_status: &NodeStatus) -> EdgeStatus {
    match node_status {
        NodeStatus::Failed => EdgeStatus::Failed,
        NodeStatus::Running => EdgeStatus::Running,
        NodeStatus::Completed => EdgeStatus::Completed,
        NodeStatus::Available => EdgeStatus::Available,
        NodeStatus::Pending => EdgeStatus::Pending,
        NodeStatus::Cancelled => EdgeStatus::Failed,   // Treat cancelled as failed
        NodeStatus::Skipped => EdgeStatus::Pending,    // Treat skipped as pending
        NodeStatus::Delegated => EdgeStatus::Available, // Treat delegated as available
    }
}

/// Encodes ID for safe usage in mermaid graph
fn encode_id(id: &str) -> String {
    id.replace("/", "_").replace("=", "_").replace(":", "_")
}

/// Trait for all Mermaid node types
trait MermaidNode {
    fn id(&self) -> &str;
    #[allow(dead_code)]
    fn label(&self) -> &str;
    fn render(&self, status: &NodeStatus) -> String;
}

/// Represents a job node in the Mermaid diagram
struct MermaidJobNode {
    task: Task,
    id: String,
    label: String,
}

impl MermaidJobNode {
    fn from(task: &Task) -> Option<MermaidJobNode> {
        let job_label: String = match &task.job {
            Some(job) => job.label.clone(),
            None => return None,
        };

        let outputs_label: String = match &task.config {
            Some(config) => config.outputs.iter()
                .map(|o| o.str.clone())
                .collect::<Vec<_>>()
                .join("___"),
            None => String::new(),
        };

        let id = encode_id(&(job_label.clone() + "___" + &outputs_label));
        let label = format!("**{}** {}", job_label, outputs_label);

        Some(MermaidJobNode {
            task: task.clone(),
            id,
            label,
        })
    }

    fn to_mermaid(&self, job_statuses: &HashMap<String, NodeStatus>) -> String {
        // Use the same unique ID logic for status lookup as we use for the node ID
        let status = job_statuses.get(&self.id).unwrap_or(&NodeStatus::Pending);
        self.render(status)
    }
}

impl MermaidNode for MermaidJobNode {
    fn id(&self) -> &str {
        &self.id
    }

    fn label(&self) -> &str {
        &self.label
    }

    fn render(&self, status: &NodeStatus) -> String {
        format!("    {}[\"{}\"]:::job_{}\n", self.id, self.label, status.css_class())
    }
}

/// Represents a partition node in the Mermaid diagram
struct MermaidPartitionNode {
    id: String,
    label: String,
    is_output: bool,
}

impl MermaidPartitionNode {
    fn new(partition_ref: &str, is_output: bool) -> Self {
        let id = format!("ref_{}", encode_id(partition_ref));
        let label = partition_ref.to_string();

        Self {
            id,
            label,
            is_output,
        }
    }
}

impl MermaidNode for MermaidPartitionNode {
    fn id(&self) -> &str {
        &self.id
    }

    fn label(&self) -> &str {
        &self.label
    }

    fn render(&self, status: &NodeStatus) -> String {
        let node_class = if self.is_output {
            format!("outputPartition_{}", status.css_class())
        } else {
            format!("partition_{}", status.css_class())
        };

        format!("    {}[(\"{}\")]:::{}\n", self.id, encode_id(&self.label), node_class)
    }
}

/// Types of edges in the diagram
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum EdgeType {
    Solid,  // Regular dependency
    Dotted, // Weak dependency
}

/// Status of an edge for coloring purposes
#[derive(Debug, Clone, PartialEq)]
enum EdgeStatus {
    Failed,    // Red - critical path issues
    Running,   // Yellow - actively processing
    Completed, // Green - successfully processed
    Available, // Light green - data ready
    Pending,   // Gray - waiting/not started
}

/// Represents an edge between two nodes
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MermaidEdge {
    from_id: String,
    to_id: String,
    edge_type: EdgeType,
}

impl MermaidEdge {
    fn new(from_id: String, to_id: String, edge_type: EdgeType) -> Self {
        Self { from_id, to_id, edge_type }
    }

    fn render(&self) -> String {
        match self.edge_type {
            EdgeType::Solid => format!("    {} --> {}\n", self.from_id, self.to_id),
            EdgeType::Dotted => format!("    {} -.-> {}\n", self.from_id, self.to_id),
        }
    }
}

/// Collection of edges with deduplication
struct EdgeCollection {
    edges: HashSet<MermaidEdge>,
}

impl EdgeCollection {
    fn new() -> Self {
        Self {
            edges: HashSet::new(),
        }
    }

    fn add(&mut self, edge: MermaidEdge) {
        self.edges.insert(edge);
    }

    fn render_all(&self) -> String {
        self.edges.iter()
            .map(|edge| edge.render())
            .collect::<Vec<_>>()
            .join("")
    }
}

/// Style rule for a specific node type and status combination
struct StyleRule {
    class_name: String,
    fill: &'static str,
    stroke: &'static str,
    stroke_width: &'static str,
}

impl StyleRule {
    fn render(&self) -> String {
        format!(
            "    classDef {} fill:{},stroke:{},stroke-width:{};\n",
            self.class_name, self.fill, self.stroke, self.stroke_width
        )
    }
}

/// Manages all styling for the Mermaid diagram
struct MermaidStyleSheet {
    rules: Vec<StyleRule>,
}

impl MermaidStyleSheet {
    fn default() -> Self {
        let mut rules = Vec::new();

        // Job status styles
        rules.push(StyleRule {
            class_name: "job_pending".to_string(),
            fill: "#e0e0e0",
            stroke: "#333",
            stroke_width: "1px",
        });
        rules.push(StyleRule {
            class_name: "job_running".to_string(),
            fill: "#ffeb3b",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "job_completed".to_string(),
            fill: "#4caf50",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "job_failed".to_string(),
            fill: "#f44336",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "job_cancelled".to_string(),
            fill: "#ff9800",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "job_skipped".to_string(),
            fill: "#9e9e9e",
            stroke: "#333",
            stroke_width: "1px",
        });

        // Partition status styles
        rules.push(StyleRule {
            class_name: "partition_pending".to_string(),
            fill: "#e3f2fd",
            stroke: "#333",
            stroke_width: "1px",
        });
        rules.push(StyleRule {
            class_name: "partition_running".to_string(),
            fill: "#fff9c4",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "partition_available".to_string(),
            fill: "#c8e6c9",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "partition_failed".to_string(),
            fill: "#ffcdd2",
            stroke: "#333",
            stroke_width: "2px",
        });
        rules.push(StyleRule {
            class_name: "partition_delegated".to_string(),
            fill: "#d1c4e9",
            stroke: "#333",
            stroke_width: "2px",
        });

        // Output partition status styles (highlighted versions)
        rules.push(StyleRule {
            class_name: "outputPartition_pending".to_string(),
            fill: "#bbdefb",
            stroke: "#333",
            stroke_width: "3px",
        });
        rules.push(StyleRule {
            class_name: "outputPartition_running".to_string(),
            fill: "#fff59d",
            stroke: "#333",
            stroke_width: "3px",
        });
        rules.push(StyleRule {
            class_name: "outputPartition_available".to_string(),
            fill: "#a5d6a7",
            stroke: "#333",
            stroke_width: "3px",
        });
        rules.push(StyleRule {
            class_name: "outputPartition_failed".to_string(),
            fill: "#ef9a9a",
            stroke: "#333",
            stroke_width: "3px",
        });
        rules.push(StyleRule {
            class_name: "outputPartition_delegated".to_string(),
            fill: "#b39ddb",
            stroke: "#333",
            stroke_width: "3px",
        });

        Self { rules }
    }

    fn render(&self) -> String {
        let mut result = String::from("\n    %% Styling\n");
        for rule in &self.rules {
            result.push_str(&rule.render());
        }
        result
    }

    fn get_edge_color(&self, status: &EdgeStatus) -> &'static str {
        match status {
            EdgeStatus::Failed => "#ff4444",    // Red
            EdgeStatus::Running => "#ffaa00",   // Orange
            EdgeStatus::Completed => "#44aa44", // Green
            EdgeStatus::Available => "#88cc88", // Light green
            EdgeStatus::Pending => "#888888",   // Gray
        }
    }
}

/// Builder for constructing Mermaid diagrams
struct MermaidDiagramBuilder {
    job_nodes: HashMap<String, MermaidJobNode>,
    partition_nodes: HashMap<String, MermaidPartitionNode>,
    edges: EdgeCollection,
    output_refs: HashSet<String>,
    edge_count: usize,
}

impl MermaidDiagramBuilder {
    fn new() -> Self {
        Self {
            job_nodes: HashMap::new(),
            partition_nodes: HashMap::new(),
            edges: EdgeCollection::new(),
            output_refs: HashSet::new(),
            edge_count: 0,
        }
    }

    fn set_output_refs(&mut self, refs: &[PartitionRef]) {
        for ref_str in refs {
            self.output_refs.insert(ref_str.str.clone());
        }
    }

    fn add_job_node(&mut self, node: MermaidJobNode) {
        self.job_nodes.insert(node.id().to_string(), node);
    }

    fn add_partition_node(&mut self, partition_ref: &str) -> String {
        let is_output = self.output_refs.contains(partition_ref);
        let node = MermaidPartitionNode::new(partition_ref, is_output);
        let id = node.id().to_string();
        self.partition_nodes.entry(partition_ref.to_string())
            .or_insert(node);
        id
    }

    fn add_edge(&mut self, from_id: String, to_id: String, edge_type: EdgeType) {
        self.edges.add(MermaidEdge::new(from_id, to_id, edge_type));
    }

    fn add_edge_with_status(&mut self, from_id: String, to_id: String, edge_type: EdgeType,
                            edge_status: EdgeStatus, result: &mut String, stylesheet: &MermaidStyleSheet) {
        // Create the edge
        let edge = MermaidEdge::new(from_id, to_id, edge_type);

        // Check if this edge already exists (for deduplication)
        if self.edges.edges.contains(&edge) {
            return; // Skip duplicate edge
        }

        // Render the edge
        result.push_str(&edge.render());

        // Add edge to collection for deduplication tracking
        self.edges.add(edge);

        // Immediately render the linkStyle if status is not pending
        if edge_status != EdgeStatus::Pending {
            let color = stylesheet.get_edge_color(&edge_status);
            result.push_str(&format!("    linkStyle {} stroke:{},stroke-width:2px\n",
                self.edge_count, color));
        }

        self.edge_count += 1;
    }

    fn build_with_edges(self, statuses: &(HashMap<String, NodeStatus>, HashMap<String, NodeStatus>),
                        stylesheet: MermaidStyleSheet, edges_content: String) -> String {
        let (job_statuses, partition_statuses) = statuses;
        let mut result = String::from("flowchart TD\n");

        // Render all job nodes
        for (_, job_node) in self.job_nodes {
            result.push_str(&job_node.to_mermaid(job_statuses));
        }

        // Render all partition nodes
        for (partition_ref, node) in self.partition_nodes {
            let status = partition_statuses.get(&partition_ref).unwrap_or(&NodeStatus::Pending);
            result.push_str(&node.render(status));
        }

        // Add the edges content (which includes linkStyle statements)
        result.push_str(&edges_content);

        // Apply styles
        result.push_str(&stylesheet.render());

        result
    }
}


pub fn generate_mermaid_diagram(graph: &JobGraph) -> String {
    generate_mermaid_with_status(graph, &[])
}

/// Generate a mermaid diagram for a job graph with current status annotations
pub fn generate_mermaid_with_status(
    graph: &JobGraph,
    events: &[BuildEvent],
) -> String {
    let statuses = extract_status_map(events);
    let (job_statuses, partition_statuses) = &statuses;
    let mut builder = MermaidDiagramBuilder::new();
    let stylesheet = MermaidStyleSheet::default();

    // Set output refs for highlighting
    builder.set_output_refs(&graph.outputs);

    // String to accumulate edges with their styles
    let mut edges_content = String::new();

    // Process all task nodes
    for task in &graph.nodes {
        if let Some(job_node) = MermaidJobNode::from(task) {
            let job_id = job_node.id().to_string();
            builder.add_job_node(job_node);

            if let Some(config) = &task.config {
                // Process inputs (dependencies)
                for input in &config.inputs {
                    if let Some(partition_ref) = &input.partition_ref {
                        let ref_id = builder.add_partition_node(&partition_ref.str);
                        let edge_type = if input.dep_type_code == 1 {
                            EdgeType::Solid
                        } else {
                            EdgeType::Dotted
                        };

                        // Get partition status for edge coloring
                        let partition_status = partition_statuses.get(&partition_ref.str)
                            .unwrap_or(&NodeStatus::Pending);
                        let edge_status = map_node_status_to_edge_status(partition_status);

                        builder.add_edge_with_status(ref_id, job_id.clone(), edge_type,
                            edge_status, &mut edges_content, &stylesheet);
                    }
                }

                // Process outputs
                for output in &config.outputs {
                    let ref_id = builder.add_partition_node(&output.str);

                    // Get job status for edge coloring
                    let job_status = job_statuses.get(&job_id)
                        .unwrap_or(&NodeStatus::Pending);
                    let edge_status = map_node_status_to_edge_status(job_status);

                    builder.add_edge_with_status(job_id.clone(), ref_id, EdgeType::Solid,
                        edge_status, &mut edges_content, &stylesheet);
                }
            }
        }
    }

    // Build the diagram with edges content
    builder.build_with_edges(&statuses, stylesheet, edges_content)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_id() {
        assert_eq!(encode_id("path/to/file"), "path_to_file");
        assert_eq!(encode_id("key=value"), "key_value");
        assert_eq!(encode_id("scope:item"), "scope_item");
        assert_eq!(encode_id("a/b=c:d"), "a_b_c_d");
    }

    #[test]
    fn test_mermaid_job_node() {
        let mut task = Task::default();
        task.job = Some(JobLabel { label: "test_job".to_string() });
        task.config = Some(JobConfig {
            outputs: vec![
                PartitionRef { str: "output1".to_string() },
                PartitionRef { str: "output2".to_string() },
            ],
            inputs: vec![],
            args: vec![],
            env: HashMap::new(),
        });

        let node = MermaidJobNode::from(&task).expect("Failed to create job node");
        assert_eq!(node.id(), "test_job___output1___output2");
        assert_eq!(node.label(), "**test_job** output1___output2");

        let rendered = node.render(&NodeStatus::Running);
        assert!(rendered.contains("test_job___output1___output2"));
        assert!(rendered.contains("**test_job** output1___output2"));
        assert!(rendered.contains("job_running"));
    }

    #[test]
    fn test_mermaid_partition_node() {
        let node = MermaidPartitionNode::new("data/partition=1", false);
        assert_eq!(node.id(), "ref_data_partition_1");
        assert_eq!(node.label(), "data/partition=1");

        let rendered = node.render(&NodeStatus::Available);
        assert!(rendered.contains("ref_data_partition_1"));
        assert!(rendered.contains("data_partition_1"));
        assert!(rendered.contains("partition_available"));

        // Test output partition
        let output_node = MermaidPartitionNode::new("output/data", true);
        let output_rendered = output_node.render(&NodeStatus::Available);
        assert!(output_rendered.contains("outputPartition_available"));
    }

    #[test]
    fn test_edge_collection() {
        let mut edges = EdgeCollection::new();

        // Add edges
        edges.add(MermaidEdge::new("node1".to_string(), "node2".to_string(), EdgeType::Solid));
        edges.add(MermaidEdge::new("node2".to_string(), "node3".to_string(), EdgeType::Dotted));

        // Test deduplication
        edges.add(MermaidEdge::new("node1".to_string(), "node2".to_string(), EdgeType::Solid));

        let rendered = edges.render_all();
        assert!(rendered.contains("node1 --> node2"));
        assert!(rendered.contains("node2 -.-> node3"));

        // Should only have 2 unique edges
        assert_eq!(rendered.matches("-->").count(), 1);
        assert_eq!(rendered.matches("-.->").count(), 1);
    }

    #[test]
    fn test_simple_graph_generation() {
        // Create task 1
        let mut task1 = Task::default();
        task1.job = Some(JobLabel { label: "job1".to_string() });
        task1.config = Some(JobConfig {
            inputs: vec![{
                let mut input = DataDep::default();
                input.partition_ref = Some(PartitionRef { str: "input/data".to_string() });
                input.dep_type_code = 1; // Solid dependency
                input.dep_type_name = "materialize".to_string();
                input
            }],
            outputs: vec![
                PartitionRef { str: "intermediate/data".to_string() },
            ],
            args: vec![],
            env: HashMap::new(),
        });

        // Create task 2
        let mut task2 = Task::default();
        task2.job = Some(JobLabel { label: "job2".to_string() });
        task2.config = Some(JobConfig {
            inputs: vec![{
                let mut input = DataDep::default();
                input.partition_ref = Some(PartitionRef { str: "intermediate/data".to_string() });
                input.dep_type_code = 0; // Dotted dependency
                input.dep_type_name = "query".to_string();
                input
            }],
            outputs: vec![
                PartitionRef { str: "output/data".to_string() },
            ],
            args: vec![],
            env: HashMap::new(),
        });

        // Create a simple graph
        let mut graph = JobGraph::default();
        graph.nodes = vec![task1, task2];
        graph.outputs = vec![
            PartitionRef { str: "output/data".to_string() },
        ];

        let mermaid = generate_mermaid_diagram(&graph);

        // Check basic structure
        assert!(mermaid.starts_with("flowchart TD\n"));

        // Check nodes - verify both ID and label are present
        assert!(mermaid.contains("job1___intermediate_data"), "Missing job1 node ID");
        assert!(mermaid.contains("**job1** intermediate/data"), "Missing job1 label");
        assert!(mermaid.contains("job2___output_data"), "Missing job2 node ID");
        assert!(mermaid.contains("**job2** output/data"), "Missing job2 label");
        assert!(mermaid.contains("ref_input_data"));
        assert!(mermaid.contains("ref_intermediate_data"));
        assert!(mermaid.contains("ref_output_data"));

        // Check edges
        assert!(mermaid.contains("ref_input_data --> job1"));
        assert!(mermaid.contains("job1___intermediate_data --> ref_intermediate_data"));
        assert!(mermaid.contains("ref_intermediate_data -.-> job2"));
        assert!(mermaid.contains("job2___output_data --> ref_output_data"));

        // Check styling
        assert!(mermaid.contains("classDef job_pending"));
        assert!(mermaid.contains("classDef partition_pending"));
        assert!(mermaid.contains("classDef outputPartition_pending"));
    }

    #[test]
    fn test_status_extraction() {
        let mut event1 = BuildEvent::default();
        event1.timestamp = 1;
        event1.event_type = Some(crate::build_event::EventType::JobEvent({
            let mut job_event = JobEvent::default();
            job_event.job_label = Some(JobLabel { label: "test_job".to_string() });
            job_event.status_code = 2; // JOB_RUNNING
            job_event
        }));

        let mut event2 = BuildEvent::default();
        event2.timestamp = 2;
        event2.event_type = Some(crate::build_event::EventType::PartitionEvent({
            let mut partition_event = PartitionEvent::default();
            partition_event.partition_ref = Some(PartitionRef { str: "test/partition".to_string() });
            partition_event.status_code = 4; // PARTITION_AVAILABLE
            partition_event
        }));

        let events = vec![event1, event2];

        let (job_statuses, partition_statuses) = extract_status_map(&events);

        // Should use the unique key (job_label + target_partitions) instead of just job_label
        assert_eq!(job_statuses.get("test_job"), None, "Should not find job by label alone");
        assert_eq!(partition_statuses.get("test/partition"), Some(&NodeStatus::Available));
    }

    #[test]
    fn test_job_status_per_task_instance() {
        // Test that different task instances with same job label get different status
        let mut event1 = BuildEvent::default();
        event1.event_type = Some(crate::build_event::EventType::JobEvent({
            let mut job_event = JobEvent::default();
            job_event.job_label = Some(JobLabel { label: "same_job".to_string() });
            job_event.target_partitions = vec![PartitionRef { str: "output1".to_string() }];
            job_event.status_code = 2; // JOB_RUNNING
            job_event
        }));

        let mut event2 = BuildEvent::default();
        event2.event_type = Some(crate::build_event::EventType::JobEvent({
            let mut job_event = JobEvent::default();
            job_event.job_label = Some(JobLabel { label: "same_job".to_string() });
            job_event.target_partitions = vec![PartitionRef { str: "output2".to_string() }];
            job_event.status_code = 3; // JOB_COMPLETED
            job_event
        }));

        let events = vec![event1, event2];
        let (job_statuses, _) = extract_status_map(&events);

        // Each task should have its own status based on unique key
        assert_eq!(job_statuses.get("same_job___output1"), Some(&NodeStatus::Running));
        assert_eq!(job_statuses.get("same_job___output2"), Some(&NodeStatus::Completed));
        assert_eq!(job_statuses.get("same_job"), None, "Should not find job by label alone");
    }

    #[test]
    fn test_edge_coloring_with_status() {
        // Create a simple graph with status
        let mut task1 = Task::default();
        task1.job = Some(JobLabel { label: "job1".to_string() });
        task1.config = Some(JobConfig {
            inputs: vec![{
                let mut input = DataDep::default();
                input.partition_ref = Some(PartitionRef { str: "input/data".to_string() });
                input.dep_type_code = 1; // Solid dependency
                input.dep_type_name = "materialize".to_string();
                input
            }],
            outputs: vec![
                PartitionRef { str: "intermediate/data".to_string() },
            ],
            args: vec![],
            env: HashMap::new(),
        });

        let mut graph = JobGraph::default();
        graph.nodes = vec![task1];
        graph.outputs = vec![
            PartitionRef { str: "intermediate/data".to_string() },
        ];

        // Create events to set status
        let mut partition_event = BuildEvent::default();
        partition_event.event_type = Some(crate::build_event::EventType::PartitionEvent({
            let mut pe = PartitionEvent::default();
            pe.partition_ref = Some(PartitionRef { str: "input/data".to_string() });
            pe.status_code = 4; // PARTITION_AVAILABLE
            pe
        }));

        let mut job_event = BuildEvent::default();
        job_event.event_type = Some(crate::build_event::EventType::JobEvent({
            let mut je = JobEvent::default();
            je.job_label = Some(JobLabel { label: "job1".to_string() });
            je.target_partitions = vec![PartitionRef { str: "intermediate/data".to_string() }];
            je.status_code = 2; // JOB_RUNNING
            je
        }));

        let events = vec![partition_event, job_event];
        let mermaid = generate_mermaid_with_status(&graph, &events);

        // Check that linkStyle statements are present
        assert!(mermaid.contains("linkStyle"), "Should contain linkStyle statements");
        assert!(mermaid.contains("#88cc88"), "Should contain available edge color (light green)");
        assert!(mermaid.contains("#ffaa00"), "Should contain running edge color (orange)");

        // Check basic structure is still intact
        assert!(mermaid.contains("flowchart TD"));
        assert!(mermaid.contains("job1___intermediate_data"));
        assert!(mermaid.contains("ref_input_data"));
        assert!(mermaid.contains("ref_intermediate_data"));
    }

    #[test]
    fn test_edge_status_mapping() {
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Failed), EdgeStatus::Failed);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Running), EdgeStatus::Running);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Completed), EdgeStatus::Completed);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Available), EdgeStatus::Available);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Pending), EdgeStatus::Pending);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Cancelled), EdgeStatus::Failed);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Skipped), EdgeStatus::Pending);
        assert_eq!(map_node_status_to_edge_status(&NodeStatus::Delegated), EdgeStatus::Available);
    }

    #[test]
    fn test_edge_deduplication() {
        // Create a graph that could potentially have duplicate edges
        let mut task1 = Task::default();
        task1.job = Some(JobLabel { label: "job1".to_string() });
        task1.config = Some(JobConfig {
            inputs: vec![{
                let mut input = DataDep::default();
                input.partition_ref = Some(PartitionRef { str: "shared_input".to_string() });
                input.dep_type_code = 1;
                input.dep_type_name = "materialize".to_string();
                input
            }],
            outputs: vec![
                PartitionRef { str: "output1".to_string() },
            ],
            args: vec![],
            env: HashMap::new(),
        });

        let mut task2 = Task::default();
        task2.job = Some(JobLabel { label: "job2".to_string() });
        task2.config = Some(JobConfig {
            inputs: vec![{
                let mut input = DataDep::default();
                input.partition_ref = Some(PartitionRef { str: "shared_input".to_string() });
                input.dep_type_code = 1;
                input.dep_type_name = "materialize".to_string();
                input
            }],
            outputs: vec![
                PartitionRef { str: "output2".to_string() },
            ],
            args: vec![],
            env: HashMap::new(),
        });

        let mut graph = JobGraph::default();
        graph.nodes = vec![task1, task2];
        graph.outputs = vec![
            PartitionRef { str: "output1".to_string() },
            PartitionRef { str: "output2".to_string() },
        ];

        let mermaid = generate_mermaid_diagram(&graph);

        // Count how many times the shared edge appears
        let shared_edge_count = mermaid.matches("ref_shared_input --> job").count();

        // Should only appear once per job (2 total), not duplicated
        assert_eq!(shared_edge_count, 2, "Should have exactly 2 edges from shared_input (one to each job)");

        // Verify no duplicate edges in the output
        let lines: Vec<&str> = mermaid.lines().collect();
        let edge_lines: Vec<&str> = lines.iter().filter(|line| line.contains("-->") || line.contains("-.->")).cloned().collect();
        let unique_edges: std::collections::HashSet<&str> = edge_lines.iter().cloned().collect();

        assert_eq!(edge_lines.len(), unique_edges.len(), "Should have no duplicate edges in output");
    }
}
@@ -1,523 +0,0 @@
use crate::{JobLogEntry, job_log_entry, WrapperJobEvent};
use std::collections::HashMap;

/// Template for metric extraction from job events
#[derive(Debug, Clone)]
pub struct MetricTemplate {
    pub name: String,
    pub help: String,
    pub metric_type: MetricType,
    pub extractor: MetricExtractor,
    pub labels: Vec<String>, // Static label names for this metric
}

/// Prometheus metric types
#[derive(Debug, Clone)]
pub enum MetricType {
    Counter,
    Gauge,
    Histogram,
    Summary,
}

/// Strategy for extracting metric values from job events
#[derive(Debug, Clone)]
pub enum MetricExtractor {
    /// Extract from job event metadata by key
    EventMetadata {
        event_type: String,
        metadata_key: String,
        /// Optional conversion function name for non-numeric values
        converter: Option<MetricConverter>,
    },
    /// Count occurrences of specific event types
    EventCount {
        event_type: String,
    },
    /// Extract job duration from start/end events
    JobDuration,
    /// Extract peak memory from job summary
    PeakMemory,
    /// Extract total CPU time from job summary
    TotalCpuTime,
    /// Extract exit code from job events
    ExitCode,
}

/// Converters for non-numeric metadata values
#[derive(Debug, Clone)]
pub enum MetricConverter {
    /// Convert boolean strings to 0/1
    BoolToFloat,
    /// Convert status strings to numeric codes
    StatusToCode(HashMap<String, f64>),
    /// Parse duration strings like "123ms" to seconds
    DurationToSeconds,
}

/// Result of metric extraction
#[derive(Debug)]
pub struct ExtractedMetric {
    pub name: String,
    pub value: f64,
    pub labels: HashMap<String, String>,
    pub help: String,
    pub metric_type: MetricType,
}

impl MetricTemplate {
    /// Extract a metric from a job log entry if applicable
    pub fn extract(&self, entry: &JobLogEntry) -> Option<ExtractedMetric> {
        let value = match &self.extractor {
            MetricExtractor::EventMetadata { event_type, metadata_key, converter } => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if event.event_type == *event_type {
                        if let Some(raw_value) = event.metadata.get(metadata_key) {
                            self.convert_value(raw_value, converter)?
                        } else {
                            return None;
                        }
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
            MetricExtractor::EventCount { event_type } => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if event.event_type == *event_type {
                        1.0
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
            MetricExtractor::JobDuration => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if event.event_type == "job_summary" {
                        if let Some(runtime_str) = event.metadata.get("runtime_ms") {
                            runtime_str.parse::<f64>().ok()? / 1000.0 // Convert to seconds
                        } else {
                            return None;
                        }
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
            MetricExtractor::PeakMemory => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if event.event_type == "job_summary" {
                        if let Some(memory_str) = event.metadata.get("peak_memory_mb") {
                            memory_str.parse::<f64>().ok()?
                        } else {
                            return None;
                        }
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
            MetricExtractor::TotalCpuTime => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if event.event_type == "job_summary" {
                        if let Some(cpu_str) = event.metadata.get("total_cpu_ms") {
                            cpu_str.parse::<f64>().ok()? / 1000.0 // Convert to seconds
                        } else {
                            return None;
                        }
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
            MetricExtractor::ExitCode => {
                if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
                    if let Some(exit_code) = event.exit_code {
                        exit_code as f64
                    } else {
                        return None;
                    }
                } else {
                    return None;
                }
            },
        };

        // Generate labels for this metric
        let mut labels = HashMap::new();

        // Always include job_id as a label (but this is excluded by default for cardinality safety)
        labels.insert("job_id".to_string(), entry.job_id.clone());

        // Extract job label from manifest if available - this is the low-cardinality identifier
        if let Some(job_log_entry::Content::Manifest(manifest)) = &entry.content {
            if let Some(task) = &manifest.task {
                if let Some(job) = &task.job {
                    labels.insert("job_label".to_string(), job.label.clone());
                }
            }
        }

        // Add job status and job label if available from job events
        if let Some(job_log_entry::Content::JobEvent(event)) = &entry.content {
            if let Some(job_status) = &event.job_status {
                labels.insert("job_status".to_string(), job_status.clone());
            }
            if let Some(job_label) = &event.job_label {
                labels.insert("job_label".to_string(), job_label.clone());
            }
        }

        Some(ExtractedMetric {
            name: self.name.clone(),
            value,
            labels,
            help: self.help.clone(),
            metric_type: self.metric_type.clone(),
        })
    }

    fn convert_value(&self, raw_value: &str, converter: &Option<MetricConverter>) -> Option<f64> {
        match converter {
            None => raw_value.parse().ok(),
            Some(MetricConverter::BoolToFloat) => {
                match raw_value.to_lowercase().as_str() {
                    "true" | "1" | "yes" => Some(1.0),
                    "false" | "0" | "no" => Some(0.0),
                    _ => None,
                }
            },
            Some(MetricConverter::StatusToCode(mapping)) => {
                mapping.get(raw_value).copied()
            },
            Some(MetricConverter::DurationToSeconds) => {
                // Parse formats like "123ms", "45s", "2.5m"
                if raw_value.ends_with("ms") {
                    raw_value.trim_end_matches("ms").parse::<f64>().ok().map(|v| v / 1000.0)
                } else if raw_value.ends_with("s") {
                    raw_value.trim_end_matches("s").parse::<f64>().ok()
                } else if raw_value.ends_with("m") {
                    raw_value.trim_end_matches("m").parse::<f64>().ok().map(|v| v * 60.0)
                } else {
                    raw_value.parse::<f64>().ok()
                }
            },
        }
    }
}


/// Get standard DataBuild metric templates
pub fn get_standard_metrics() -> Vec<MetricTemplate> {
    vec![
        // Job execution metrics
        MetricTemplate {
            name: "databuild_job_duration_seconds".to_string(),
            help: "Duration of job execution in seconds".to_string(),
            metric_type: MetricType::Histogram,
            extractor: MetricExtractor::JobDuration,
            labels: vec!["job_label".to_string()],
        },
        MetricTemplate {
            name: "databuild_job_peak_memory_mb".to_string(),
            help: "Peak memory usage of job in megabytes".to_string(),
            metric_type: MetricType::Gauge,
            extractor: MetricExtractor::PeakMemory,
            labels: vec!["job_label".to_string()],
        },
        MetricTemplate {
            name: "databuild_job_cpu_time_seconds".to_string(),
            help: "Total CPU time consumed by job in seconds".to_string(),
            metric_type: MetricType::Counter,
            extractor: MetricExtractor::TotalCpuTime,
            labels: vec!["job_label".to_string()],
        },
        MetricTemplate {
            name: "databuild_job_exit_code".to_string(),
            help: "Exit code of job execution".to_string(),
            metric_type: MetricType::Gauge,
            extractor: MetricExtractor::ExitCode,
            labels: vec!["job_label".to_string(), "job_status".to_string()],
        },

        // Job event counters
        MetricTemplate {
            name: "databuild_job_events_total".to_string(),
            help: "Total number of job events".to_string(),
            metric_type: MetricType::Counter,
            extractor: MetricExtractor::EventCount { event_type: "task_success".to_string() },
            labels: vec!["job_label".to_string()],
        },
        MetricTemplate {
            name: "databuild_job_failures_total".to_string(),
            help: "Total number of job failures".to_string(),
            metric_type: MetricType::Counter,
            extractor: MetricExtractor::EventCount { event_type: "task_failed".to_string() },
            labels: vec!["job_label".to_string()],
        },
        MetricTemplate {
            name: "databuild_heartbeats_total".to_string(),
            help: "Total number of heartbeat events".to_string(),
            metric_type: MetricType::Counter,
            extractor: MetricExtractor::EventCount { event_type: "heartbeat".to_string() },
            labels: vec!["job_label".to_string()],
        },
    ]
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::{PartitionRef, log_message, LogMessage};

    fn create_test_job_summary_entry(job_id: &str, runtime_ms: &str, memory_mb: &str, cpu_ms: &str, exit_code: i32) -> JobLogEntry {
        let mut metadata = HashMap::new();
        metadata.insert("runtime_ms".to_string(), runtime_ms.to_string());
        metadata.insert("peak_memory_mb".to_string(), memory_mb.to_string());
        metadata.insert("total_cpu_ms".to_string(), cpu_ms.to_string());
        metadata.insert("exit_code".to_string(), exit_code.to_string());

        JobLogEntry {
            timestamp: "1234567890".to_string(),
            job_id: job_id.to_string(),
            outputs: vec![PartitionRef { r#str: "reviews/date=2025-01-27".to_string() }],
            sequence_number: 1,
            content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
                event_type: "job_summary".to_string(),
                job_status: Some("JOB_COMPLETED".to_string()),
                exit_code: Some(exit_code),
                metadata,
                job_label: None,
            })),
        }
    }

    fn create_test_task_success_entry(job_id: &str) -> JobLogEntry {
        JobLogEntry {
            timestamp: "1234567890".to_string(),
            job_id: job_id.to_string(),
            outputs: vec![PartitionRef { r#str: "podcasts/date=2025-01-27".to_string() }],
            sequence_number: 2,
            content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
                event_type: "task_success".to_string(),
                job_status: Some("JOB_COMPLETED".to_string()),
                exit_code: Some(0),
                metadata: HashMap::new(),
                job_label: None,
            })),
        }
    }

    #[test]
    fn test_job_duration_extraction() {
        let template = MetricTemplate {
            name: "test_duration".to_string(),
            help: "Test duration".to_string(),
            metric_type: MetricType::Histogram,
            extractor: MetricExtractor::JobDuration,
            labels: vec![],
        };

        let entry = create_test_job_summary_entry("test-job", "2500", "64.5", "1200", 0);
        let metric = template.extract(&entry).unwrap();

        assert_eq!(metric.name, "test_duration");
        assert_eq!(metric.value, 2.5); // 2500ms -> 2.5s
        assert_eq!(metric.labels.get("job_id").unwrap(), "test-job");
        // Note: job_label would only be available from manifest entries, not job_summary
    }

    #[test]
    fn test_memory_extraction() {
        let template = MetricTemplate {
            name: "test_memory".to_string(),
            help: "Test memory".to_string(),
            metric_type: MetricType::Gauge,
            extractor: MetricExtractor::PeakMemory,
            labels: vec![],
        };

        let entry = create_test_job_summary_entry("test-job", "2500", "128.75", "1200", 0);
|
||||
let metric = template.extract(&entry).unwrap();
|
||||
|
||||
assert_eq!(metric.value, 128.75);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_cpu_time_extraction() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_cpu".to_string(),
|
||||
help: "Test CPU".to_string(),
|
||||
metric_type: MetricType::Counter,
|
||||
extractor: MetricExtractor::TotalCpuTime,
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
let entry = create_test_job_summary_entry("test-job", "2500", "64.5", "1500", 0);
|
||||
let metric = template.extract(&entry).unwrap();
|
||||
|
||||
assert_eq!(metric.value, 1.5); // 1500ms -> 1.5s
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_exit_code_extraction() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_exit_code".to_string(),
|
||||
help: "Test exit code".to_string(),
|
||||
metric_type: MetricType::Gauge,
|
||||
extractor: MetricExtractor::ExitCode,
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
let entry = create_test_job_summary_entry("test-job", "2500", "64.5", "1200", 42);
|
||||
let metric = template.extract(&entry).unwrap();
|
||||
|
||||
assert_eq!(metric.value, 42.0);
|
||||
assert_eq!(metric.labels.get("job_status").unwrap(), "JOB_COMPLETED");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_event_count_extraction() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_success_count".to_string(),
|
||||
help: "Test success count".to_string(),
|
||||
metric_type: MetricType::Counter,
|
||||
extractor: MetricExtractor::EventCount { event_type: "task_success".to_string() },
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
let entry = create_test_task_success_entry("test-job");
|
||||
let metric = template.extract(&entry).unwrap();
|
||||
|
||||
assert_eq!(metric.value, 1.0);
|
||||
// Note: job_label would only be available from manifest entries, not job events
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_event_metadata_extraction() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_runtime".to_string(),
|
||||
help: "Test runtime from metadata".to_string(),
|
||||
metric_type: MetricType::Gauge,
|
||||
extractor: MetricExtractor::EventMetadata {
|
||||
event_type: "job_summary".to_string(),
|
||||
metadata_key: "runtime_ms".to_string(),
|
||||
converter: None,
|
||||
},
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
let entry = create_test_job_summary_entry("test-job", "3000", "64.5", "1200", 0);
|
||||
let metric = template.extract(&entry).unwrap();
|
||||
|
||||
assert_eq!(metric.value, 3000.0);
|
||||
}
|
||||
|
||||
|
||||
#[test]
|
||||
fn test_bool_converter() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_bool".to_string(),
|
||||
help: "Test bool".to_string(),
|
||||
metric_type: MetricType::Gauge,
|
||||
extractor: MetricExtractor::EventMetadata {
|
||||
event_type: "test_event".to_string(),
|
||||
metadata_key: "success".to_string(),
|
||||
converter: Some(MetricConverter::BoolToFloat),
|
||||
},
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
assert_eq!(template.convert_value("true", &Some(MetricConverter::BoolToFloat)), Some(1.0));
|
||||
assert_eq!(template.convert_value("false", &Some(MetricConverter::BoolToFloat)), Some(0.0));
|
||||
assert_eq!(template.convert_value("yes", &Some(MetricConverter::BoolToFloat)), Some(1.0));
|
||||
assert_eq!(template.convert_value("no", &Some(MetricConverter::BoolToFloat)), Some(0.0));
|
||||
assert_eq!(template.convert_value("invalid", &Some(MetricConverter::BoolToFloat)), None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_duration_converter() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_duration".to_string(),
|
||||
help: "Test duration".to_string(),
|
||||
metric_type: MetricType::Gauge,
|
||||
extractor: MetricExtractor::EventMetadata {
|
||||
event_type: "test_event".to_string(),
|
||||
metadata_key: "duration".to_string(),
|
||||
converter: Some(MetricConverter::DurationToSeconds),
|
||||
},
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
assert_eq!(template.convert_value("1000ms", &Some(MetricConverter::DurationToSeconds)), Some(1.0));
|
||||
assert_eq!(template.convert_value("5s", &Some(MetricConverter::DurationToSeconds)), Some(5.0));
|
||||
assert_eq!(template.convert_value("2.5m", &Some(MetricConverter::DurationToSeconds)), Some(150.0));
|
||||
assert_eq!(template.convert_value("42", &Some(MetricConverter::DurationToSeconds)), Some(42.0));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_standard_metrics() {
|
||||
let metrics = get_standard_metrics();
|
||||
assert!(!metrics.is_empty());
|
||||
|
||||
// Verify we have the key metrics
|
||||
let metric_names: Vec<&String> = metrics.iter().map(|m| &m.name).collect();
|
||||
assert!(metric_names.contains(&&"databuild_job_duration_seconds".to_string()));
|
||||
assert!(metric_names.contains(&&"databuild_job_peak_memory_mb".to_string()));
|
||||
assert!(metric_names.contains(&&"databuild_job_cpu_time_seconds".to_string()));
|
||||
assert!(metric_names.contains(&&"databuild_job_failures_total".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_extraction_for_wrong_event_type() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_metric".to_string(),
|
||||
help: "Test".to_string(),
|
||||
metric_type: MetricType::Counter,
|
||||
extractor: MetricExtractor::EventCount { event_type: "task_failed".to_string() },
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
let entry = create_test_task_success_entry("test-job"); // This is task_success, not task_failed
|
||||
let result = template.extract(&entry);
|
||||
|
||||
assert!(result.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_extraction_for_log_entries() {
|
||||
let template = MetricTemplate {
|
||||
name: "test_metric".to_string(),
|
||||
help: "Test".to_string(),
|
||||
metric_type: MetricType::Counter,
|
||||
extractor: MetricExtractor::JobDuration,
|
||||
labels: vec![],
|
||||
};
|
||||
|
||||
// Create a log entry instead of job event
|
||||
let entry = JobLogEntry {
|
||||
timestamp: "1234567890".to_string(),
|
||||
job_id: "test-job".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "test/partition".to_string() }],
|
||||
sequence_number: 1,
|
||||
content: Some(job_log_entry::Content::Log(LogMessage {
|
||||
level: log_message::LogLevel::Info as i32,
|
||||
message: "Test log message".to_string(),
|
||||
fields: HashMap::new(),
|
||||
})),
|
||||
};
|
||||
|
||||
let result = template.extract(&entry);
|
||||
assert!(result.is_none());
|
||||
}
|
||||
}
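For orientation, here is a minimal usage sketch (not part of the diff) showing how the templates above might be run over a single log entry. `get_standard_metrics`, `JobLogEntry`, and `extract` come from this module; the `print_extracted_metrics` helper is purely illustrative.

```rust
// Illustrative sketch only: run every standard template over one entry and
// print whatever it extracts. extract() returns None for entries that do not
// match the template's expected event type, so unrelated lines are skipped.
use crate::metric_templates::get_standard_metrics;
use crate::JobLogEntry;

fn print_extracted_metrics(entry: &JobLogEntry) {
    for template in get_standard_metrics() {
        if let Some(metric) = template.extract(entry) {
            println!("{} = {} {:?}", metric.name, metric.value, metric.labels);
        }
    }
}
```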
|
||||
|
|
@ -1,507 +0,0 @@
|
|||
use crate::{JobLogEntry, log_access::LogReader, metric_templates::{MetricTemplate, ExtractedMetric, MetricType, get_standard_metrics}};
|
||||
use std::collections::{HashMap, HashSet};
|
||||
use std::path::Path;
|
||||
use thiserror::Error;
|
||||
|
||||
#[derive(Error, Debug)]
|
||||
pub enum MetricsError {
|
||||
#[error("Log access error: {0}")]
|
||||
LogAccess(#[from] crate::log_access::LogAccessError),
|
||||
#[error("IO error: {0}")]
|
||||
Io(#[from] std::io::Error),
|
||||
#[error("Too many label combinations for metric {metric}: {count} > {limit}")]
|
||||
CardinalityLimit { metric: String, count: usize, limit: usize },
|
||||
}
|
||||
|
||||
/// Aggregated metric value with labels
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct AggregatedMetric {
|
||||
pub name: String,
|
||||
pub help: String,
|
||||
pub metric_type: MetricType,
|
||||
pub samples: Vec<MetricSample>,
|
||||
}
|
||||
|
||||
/// Individual metric sample
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MetricSample {
|
||||
pub labels: HashMap<String, String>,
|
||||
pub value: f64,
|
||||
pub timestamp_ms: Option<u64>,
|
||||
}
|
||||
|
||||
/// Configuration for metrics aggregation
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MetricsConfig {
|
||||
/// Maximum number of unique label combinations per metric (cardinality safety)
|
||||
pub max_cardinality_per_metric: usize,
|
||||
/// Time range for metrics collection (in hours from now)
|
||||
pub time_range_hours: u64,
|
||||
/// Whether to include job_id in labels (can create high cardinality)
|
||||
pub include_job_id_labels: bool,
|
||||
/// Maximum number of jobs to process per metric
|
||||
pub max_jobs_per_metric: usize,
|
||||
}
|
||||
|
||||
impl Default for MetricsConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
max_cardinality_per_metric: 1000, // Prometheus recommended limit
|
||||
time_range_hours: 24, // Last 24 hours
|
||||
include_job_id_labels: false, // Disabled by default for cardinality safety
|
||||
max_jobs_per_metric: 100, // Limit recent jobs
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Aggregates metrics from job logs with cardinality safety
|
||||
pub struct MetricsAggregator {
|
||||
log_reader: LogReader,
|
||||
config: MetricsConfig,
|
||||
templates: Vec<MetricTemplate>,
|
||||
}
|
||||
|
||||
impl MetricsAggregator {
|
||||
/// Create a new metrics aggregator
|
||||
pub fn new<P: AsRef<Path>>(logs_path: P, config: MetricsConfig) -> Self {
|
||||
Self {
|
||||
log_reader: LogReader::new(logs_path),
|
||||
config,
|
||||
templates: get_standard_metrics(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Create with default configuration
|
||||
pub fn with_defaults<P: AsRef<Path>>(logs_path: P) -> Self {
|
||||
Self::new(logs_path, MetricsConfig::default())
|
||||
}
|
||||
|
||||
/// Add custom metric template
|
||||
pub fn add_template(&mut self, template: MetricTemplate) {
|
||||
self.templates.push(template);
|
||||
}
|
||||
|
||||
/// Aggregate all metrics from recent job logs
|
||||
pub fn aggregate_metrics(&self) -> Result<Vec<AggregatedMetric>, MetricsError> {
|
||||
// Get recent job IDs
|
||||
let job_ids = self.get_recent_job_ids()?;
|
||||
|
||||
let mut aggregated: HashMap<String, AggregatedMetric> = HashMap::new();
|
||||
let mut cardinality_counters: HashMap<String, HashSet<String>> = HashMap::new();
|
||||
|
||||
// Process each job's logs
|
||||
for job_id in job_ids.iter().take(self.config.max_jobs_per_metric) {
|
||||
if let Ok(entries) = self.get_job_entries(job_id) {
|
||||
for entry in entries {
|
||||
self.process_entry(&entry, &mut aggregated, &mut cardinality_counters)?;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(aggregated.into_values().collect())
|
||||
}
|
||||
|
||||
/// Generate Prometheus format output
|
||||
pub fn to_prometheus_format(&self) -> Result<String, MetricsError> {
|
||||
let metrics = self.aggregate_metrics()?;
|
||||
let mut output = String::new();
|
||||
|
||||
for metric in metrics {
|
||||
// Add help comment
|
||||
output.push_str(&format!("# HELP {} {}\n", metric.name, metric.help));
|
||||
|
||||
// Add type comment
|
||||
let type_str = match metric.metric_type {
|
||||
MetricType::Counter => "counter",
|
||||
MetricType::Gauge => "gauge",
|
||||
MetricType::Histogram => "histogram",
|
||||
MetricType::Summary => "summary",
|
||||
};
|
||||
output.push_str(&format!("# TYPE {} {}\n", metric.name, type_str));
|
||||
|
||||
// Add samples
|
||||
for sample in metric.samples {
|
||||
output.push_str(&format!("{}{} {}\n",
|
||||
metric.name,
|
||||
self.format_labels(&sample.labels),
|
||||
sample.value
|
||||
));
|
||||
}
|
||||
output.push('\n');
|
||||
}
|
||||
|
||||
Ok(output)
|
||||
}
|
||||
|
||||
/// Get recent job IDs within the configured time range
|
||||
fn get_recent_job_ids(&self) -> Result<Vec<String>, MetricsError> {
|
||||
// For now, get all available jobs. In production, this would filter by date
|
||||
let job_ids = self.log_reader.list_available_jobs(None)?;
|
||||
Ok(job_ids)
|
||||
}
|
||||
|
||||
/// Get log entries for a specific job
|
||||
fn get_job_entries(&self, job_id: &str) -> Result<Vec<JobLogEntry>, MetricsError> {
|
||||
use crate::JobLogsRequest;
|
||||
|
||||
let request = JobLogsRequest {
|
||||
job_run_id: job_id.to_string(),
|
||||
since_timestamp: 0,
|
||||
min_level: 0,
|
||||
limit: 1000, // Get all entries for the job
|
||||
};
|
||||
|
||||
let response = self.log_reader.get_job_logs(&request)?;
|
||||
Ok(response.entries)
|
||||
}
|
||||
|
||||
/// Process a single log entry through all metric templates
|
||||
fn process_entry(
|
||||
&self,
|
||||
entry: &JobLogEntry,
|
||||
aggregated: &mut HashMap<String, AggregatedMetric>,
|
||||
cardinality_counters: &mut HashMap<String, HashSet<String>>,
|
||||
) -> Result<(), MetricsError> {
|
||||
for template in &self.templates {
|
||||
if let Some(mut extracted) = template.extract(entry) {
|
||||
// Apply cardinality safety filters
|
||||
if !self.config.include_job_id_labels {
|
||||
extracted.labels.remove("job_id");
|
||||
}
|
||||
|
||||
// Check cardinality limit
|
||||
let label_signature = self.get_label_signature(&extracted.labels);
|
||||
let cardinality_set = cardinality_counters
|
||||
.entry(extracted.name.clone())
|
||||
.or_insert_with(HashSet::new);
|
||||
|
||||
if cardinality_set.len() >= self.config.max_cardinality_per_metric
|
||||
&& !cardinality_set.contains(&label_signature) {
|
||||
// Skip this metric to avoid cardinality explosion
|
||||
continue;
|
||||
}
|
||||
|
||||
cardinality_set.insert(label_signature);
|
||||
|
||||
// Add to aggregated metrics
|
||||
let agg_metric = aggregated
|
||||
.entry(extracted.name.clone())
|
||||
.or_insert_with(|| AggregatedMetric {
|
||||
name: extracted.name.clone(),
|
||||
help: extracted.help.clone(),
|
||||
metric_type: extracted.metric_type.clone(),
|
||||
samples: Vec::new(),
|
||||
});
|
||||
|
||||
// For counters, sum values with same labels; for gauges, keep latest
|
||||
let existing_sample = agg_metric.samples.iter_mut()
|
||||
.find(|s| s.labels == extracted.labels);
|
||||
|
||||
if let Some(sample) = existing_sample {
|
||||
match extracted.metric_type {
|
||||
MetricType::Counter => {
|
||||
sample.value += extracted.value; // Sum counters
|
||||
},
|
||||
MetricType::Gauge | MetricType::Histogram | MetricType::Summary => {
|
||||
sample.value = extracted.value; // Replace with latest
|
||||
},
|
||||
}
|
||||
} else {
|
||||
agg_metric.samples.push(MetricSample {
|
||||
labels: extracted.labels,
|
||||
value: extracted.value,
|
||||
timestamp_ms: None, // Could add timestamp parsing if needed
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Generate a signature string for label combinations
|
||||
fn get_label_signature(&self, labels: &HashMap<String, String>) -> String {
|
||||
let mut pairs: Vec<_> = labels.iter().collect();
|
||||
pairs.sort_by_key(|&(k, _)| k);
|
||||
pairs.iter()
|
||||
.map(|(k, v)| format!("{}={}", k, v))
|
||||
.collect::<Vec<_>>()
|
||||
.join(",")
|
||||
}
|
||||
|
||||
/// Format labels for Prometheus output
|
||||
fn format_labels(&self, labels: &HashMap<String, String>) -> String {
|
||||
if labels.is_empty() {
|
||||
return String::new();
|
||||
}
|
||||
|
||||
let mut pairs: Vec<_> = labels.iter().collect();
|
||||
pairs.sort_by_key(|&(k, _)| k);
|
||||
|
||||
let formatted_pairs: Vec<String> = pairs.iter()
|
||||
.map(|(k, v)| format!("{}=\"{}\"", k, self.escape_label_value(v)))
|
||||
.collect();
|
||||
|
||||
format!("{{{}}}", formatted_pairs.join(","))
|
||||
}
|
||||
|
||||
/// Escape label values for Prometheus format
|
||||
fn escape_label_value(&self, value: &str) -> String {
|
||||
value
|
||||
.replace('\\', "\\\\")
|
||||
.replace('"', "\\\"")
|
||||
.replace('\n', "\\n")
|
||||
.replace('\t', "\\t")
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::{job_log_entry, PartitionRef, WrapperJobEvent};
|
||||
use std::io::Write;
|
||||
use tempfile::TempDir;
|
||||
|
||||
fn create_test_logs(temp_dir: &TempDir) -> Result<(), Box<dyn std::error::Error>> {
|
||||
// Create date directory
|
||||
let date_dir = temp_dir.path().join("2025-01-27");
|
||||
std::fs::create_dir_all(&date_dir)?;
|
||||
|
||||
// Create test job file with job summary
|
||||
let job_file = date_dir.join("test_job_123.jsonl");
|
||||
let mut file = std::fs::File::create(&job_file)?;
|
||||
|
||||
let entry = JobLogEntry {
|
||||
timestamp: "1753763856".to_string(),
|
||||
job_id: "test_job_123".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "reviews/date=2025-01-27".to_string() }],
|
||||
sequence_number: 4,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "job_summary".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: {
|
||||
let mut meta = HashMap::new();
|
||||
meta.insert("runtime_ms".to_string(), "2500.000".to_string());
|
||||
meta.insert("peak_memory_mb".to_string(), "128.5".to_string());
|
||||
meta.insert("total_cpu_ms".to_string(), "1200.000".to_string());
|
||||
meta.insert("exit_code".to_string(), "0".to_string());
|
||||
meta
|
||||
},
|
||||
job_label: None,
|
||||
})),
|
||||
};
|
||||
|
||||
writeln!(file, "{}", serde_json::to_string(&entry)?)?;
|
||||
|
||||
// Create task_success entry
|
||||
let success_entry = JobLogEntry {
|
||||
timestamp: "1753763857".to_string(),
|
||||
job_id: "test_job_123".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "reviews/date=2025-01-27".to_string() }],
|
||||
sequence_number: 5,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_success".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: HashMap::new(),
|
||||
job_label: None,
|
||||
})),
|
||||
};
|
||||
|
||||
writeln!(file, "{}", serde_json::to_string(&success_entry)?)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_metrics_aggregation() {
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
create_test_logs(&temp_dir).unwrap();
|
||||
|
||||
let aggregator = MetricsAggregator::with_defaults(temp_dir.path());
|
||||
let metrics = aggregator.aggregate_metrics().unwrap();
|
||||
|
||||
assert!(!metrics.is_empty());
|
||||
|
||||
// Find duration metric
|
||||
let duration_metric = metrics.iter()
|
||||
.find(|m| m.name == "databuild_job_duration_seconds")
|
||||
.expect("Should have duration metric");
|
||||
|
||||
assert_eq!(duration_metric.samples.len(), 1);
|
||||
assert_eq!(duration_metric.samples[0].value, 2.5); // 2500ms -> 2.5s
|
||||
|
||||
// Verify labels - should only have job_id (which gets excluded) and job_status
|
||||
let labels = &duration_metric.samples[0].labels;
|
||||
assert_eq!(labels.get("job_status").unwrap(), "JOB_COMPLETED");
|
||||
assert!(!labels.contains_key("job_id")); // Should be excluded by default
|
||||
// Note: job_label would only be available from manifest entries, not job_summary events
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_prometheus_format() {
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
create_test_logs(&temp_dir).unwrap();
|
||||
|
||||
let aggregator = MetricsAggregator::with_defaults(temp_dir.path());
|
||||
let prometheus_output = aggregator.to_prometheus_format().unwrap();
|
||||
|
||||
assert!(prometheus_output.contains("# HELP databuild_job_duration_seconds"));
|
||||
assert!(prometheus_output.contains("# TYPE databuild_job_duration_seconds histogram"));
|
||||
assert!(prometheus_output.contains("databuild_job_duration_seconds{"));
|
||||
assert!(prometheus_output.contains("job_status=\"JOB_COMPLETED\""));
|
||||
assert!(prometheus_output.contains("} 2.5"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_cardinality_safety() {
|
||||
let config = MetricsConfig {
|
||||
max_cardinality_per_metric: 2, // Very low limit for testing
|
||||
time_range_hours: 24,
|
||||
include_job_id_labels: true, // Enable to test cardinality
|
||||
max_jobs_per_metric: 100,
|
||||
};
|
||||
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
|
||||
// Create multiple jobs to test cardinality limit
|
||||
let date_dir = temp_dir.path().join("2025-01-27");
|
||||
std::fs::create_dir_all(&date_dir).unwrap();
|
||||
|
||||
for i in 1..=5 {
|
||||
let job_file = date_dir.join(format!("job_{}.jsonl", i));
|
||||
let mut file = std::fs::File::create(&job_file).unwrap();
|
||||
|
||||
let entry = JobLogEntry {
|
||||
timestamp: "1753763856".to_string(),
|
||||
job_id: format!("job_{}", i),
|
||||
outputs: vec![PartitionRef { r#str: format!("table_{}/date=2025-01-27", i) }],
|
||||
sequence_number: 1,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_success".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: HashMap::new(),
|
||||
job_label: None,
|
||||
})),
|
||||
};
|
||||
|
||||
writeln!(file, "{}", serde_json::to_string(&entry).unwrap()).unwrap();
|
||||
}
|
||||
|
||||
let aggregator = MetricsAggregator::new(temp_dir.path(), config);
|
||||
let metrics = aggregator.aggregate_metrics().unwrap();
|
||||
|
||||
// Find the success count metric
|
||||
let success_metric = metrics.iter()
|
||||
.find(|m| m.name == "databuild_job_events_total")
|
||||
.expect("Should have success count metric");
|
||||
|
||||
// Should be limited by cardinality (max 2 unique label combinations)
|
||||
assert!(success_metric.samples.len() <= 2,
|
||||
"Expected <= 2 samples due to cardinality limit, got {}",
|
||||
success_metric.samples.len());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_label_escaping() {
|
||||
let aggregator = MetricsAggregator::with_defaults("/tmp");
|
||||
|
||||
assert_eq!(aggregator.escape_label_value("normal"), "normal");
|
||||
assert_eq!(aggregator.escape_label_value("with\"quotes"), "with\\\"quotes");
|
||||
assert_eq!(aggregator.escape_label_value("with\\backslash"), "with\\\\backslash");
|
||||
assert_eq!(aggregator.escape_label_value("with\nnewline"), "with\\nnewline");
|
||||
assert_eq!(aggregator.escape_label_value("with\ttab"), "with\\ttab");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_label_signature_generation() {
|
||||
let aggregator = MetricsAggregator::with_defaults("/tmp");
|
||||
|
||||
let mut labels1 = HashMap::new();
|
||||
labels1.insert("job_label".to_string(), "test_job".to_string());
|
||||
labels1.insert("job_status".to_string(), "JOB_COMPLETED".to_string());
|
||||
|
||||
let mut labels2 = HashMap::new();
|
||||
labels2.insert("job_status".to_string(), "JOB_COMPLETED".to_string());
|
||||
labels2.insert("job_label".to_string(), "test_job".to_string());
|
||||
|
||||
// Order shouldn't matter
|
||||
assert_eq!(
|
||||
aggregator.get_label_signature(&labels1),
|
||||
aggregator.get_label_signature(&labels2)
|
||||
);
|
||||
|
||||
let signature = aggregator.get_label_signature(&labels1);
|
||||
assert!(signature.contains("job_label=test_job"));
|
||||
assert!(signature.contains("job_status=JOB_COMPLETED"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_counter_vs_gauge_aggregation() {
|
||||
let temp_dir = TempDir::new().unwrap();
|
||||
let date_dir = temp_dir.path().join("2025-01-27");
|
||||
std::fs::create_dir_all(&date_dir).unwrap();
|
||||
|
||||
let job_file = date_dir.join("test_job.jsonl");
|
||||
let mut file = std::fs::File::create(&job_file).unwrap();
|
||||
|
||||
// Create multiple task_success events (should be summed as counter)
|
||||
for i in 1..=3 {
|
||||
let entry = JobLogEntry {
|
||||
timestamp: format!("175376385{}", i),
|
||||
job_id: "test_job".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "reviews/date=2025-01-27".to_string() }],
|
||||
sequence_number: i,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "task_success".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: HashMap::new(),
|
||||
job_label: None,
|
||||
})),
|
||||
};
|
||||
writeln!(file, "{}", serde_json::to_string(&entry).unwrap()).unwrap();
|
||||
}
|
||||
|
||||
// Create job summaries with different memory values (should use latest as gauge)
|
||||
for (i, memory) in ["100.0", "150.0", "120.0"].iter().enumerate() {
|
||||
let entry = JobLogEntry {
|
||||
timestamp: format!("175376386{}", i),
|
||||
job_id: "test_job".to_string(),
|
||||
outputs: vec![PartitionRef { r#str: "reviews/date=2025-01-27".to_string() }],
|
||||
sequence_number: (i + 10) as u64,
|
||||
content: Some(job_log_entry::Content::JobEvent(WrapperJobEvent {
|
||||
event_type: "job_summary".to_string(),
|
||||
job_status: Some("JOB_COMPLETED".to_string()),
|
||||
exit_code: Some(0),
|
||||
metadata: {
|
||||
let mut meta = HashMap::new();
|
||||
meta.insert("peak_memory_mb".to_string(), memory.to_string());
|
||||
meta.insert("runtime_ms".to_string(), "1000".to_string());
|
||||
meta.insert("total_cpu_ms".to_string(), "500".to_string());
|
||||
meta
|
||||
},
|
||||
job_label: None,
|
||||
})),
|
||||
};
|
||||
writeln!(file, "{}", serde_json::to_string(&entry).unwrap()).unwrap();
|
||||
}
|
||||
|
||||
let aggregator = MetricsAggregator::with_defaults(temp_dir.path());
|
||||
let metrics = aggregator.aggregate_metrics().unwrap();
|
||||
|
||||
// Check counter behavior (task_success events should be summed)
|
||||
let success_metric = metrics.iter()
|
||||
.find(|m| m.name == "databuild_job_events_total")
|
||||
.expect("Should have success count metric");
|
||||
assert_eq!(success_metric.samples[0].value, 3.0); // 3 events summed
|
||||
|
||||
// Check gauge behavior (memory should be latest value)
|
||||
let memory_metric = metrics.iter()
|
||||
.find(|m| m.name == "databuild_job_peak_memory_mb")
|
||||
.expect("Should have memory metric");
|
||||
assert_eq!(memory_metric.samples[0].value, 120.0); // Latest value
|
||||
}
|
||||
}
|
||||
databuild/mock_job_run.rs (new file, 95 lines)

@@ -0,0 +1,95 @@
use crate::data_deps::DataDepLogLine;
use crate::{JobRunMissingDeps, MissingDeps};
use std::collections::HashMap;

pub struct MockJobRun {
    sleep_ms: u64,
    stdout_msg: String,
    output_file: Option<OutputFile>,
    exit_code: u8,
}

pub struct OutputFile {
    path: String,
    contents: String,
}

impl Default for MockJobRun {
    fn default() -> Self {
        Self {
            sleep_ms: 0,
            stdout_msg: "test executed".to_string(),
            output_file: None,
            exit_code: 0,
        }
    }
}

impl MockJobRun {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn sleep_ms(mut self, val: u64) -> Self {
        self.sleep_ms = val;
        self
    }

    pub fn stdout_msg(mut self, val: &String) -> Self {
        self.stdout_msg = val.into();
        self
    }

    pub fn output_file(mut self, path: &String, contents: &String) -> Self {
        self.output_file = Some(OutputFile {
            path: path.to_string(),
            contents: contents.to_string(),
        });
        self
    }

    pub fn exit_code(mut self, val: u8) -> Self {
        self.exit_code = val;
        self
    }

    pub fn dep_miss(self, missing_deps: Vec<MissingDeps>) -> Self {
        self.exit_code(1).stdout_msg(
            &DataDepLogLine::DepMiss(JobRunMissingDeps {
                version: "1".to_string(),
                missing_deps,
            })
            .into(),
        )
    }

    pub fn to_env(&self) -> HashMap<String, String> {
        let mut env = HashMap::new();
        env.insert(
            "DATABUILD_TEST_SLEEP_MS".to_string(),
            self.sleep_ms.to_string(),
        );
        env.insert(
            "DATABUILD_TEST_EXIT_CODE".to_string(),
            self.exit_code.to_string(),
        );
        env.insert("DATABUILD_TEST_STDOUT".to_string(), self.stdout_msg.clone());
        if let Some(output_file) = &self.output_file {
            env.insert(
                "DATABUILD_TEST_OUTPUT_FILE".to_string(),
                output_file.path.clone(),
            );
            env.insert(
                "DATABUILD_TEST_OUTPUT_CONTENTS".to_string(),
                output_file.contents.clone(),
            );
        }
        env
    }

    pub fn bin_path() -> String {
        std::env::var("TEST_SRCDIR")
            .map(|srcdir| format!("{}/_main/databuild/test/test_job_helper", srcdir))
            .unwrap_or_else(|_| "bazel-bin/databuild/test/test_job_helper".to_string())
    }
}
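A hedged sketch of how a test might drive this builder: only `MockJobRun`, `to_env`, and `bin_path` are taken from the file above, and the `std::process::Command` wiring is an assumption about typical usage rather than code from this PR.

```rust
// Sketch only: configure a mock job and launch the test helper binary with
// the resulting environment. The Command wiring here is assumed usage.
use std::process::Command;

fn run_mock_job() -> std::io::Result<std::process::ExitStatus> {
    let mock = MockJobRun::new()
        .sleep_ms(50)
        .exit_code(0)
        .stdout_msg(&"hello from mock".to_string());

    Command::new(MockJobRun::bin_path())
        .envs(mock.to_env())
        .status()
}
```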
@@ -1,15 +0,0 @@
use crate::event_log::BuildEventLogError;

#[derive(Debug, thiserror::Error)]
pub enum OrchestrationError {
    #[error("Event log error: {0}")]
    EventLog(#[from] BuildEventLogError),

    #[error("Build coordination error: {0}")]
    Coordination(String),

    #[error("Invalid build state transition: {current} -> {requested}")]
    InvalidStateTransition { current: String, requested: String },
}

pub type Result<T> = std::result::Result<T, OrchestrationError>;
|
|
@ -1,151 +0,0 @@
|
|||
use crate::*;
|
||||
use crate::event_log::{create_build_event, current_timestamp_nanos, generate_event_id};
|
||||
|
||||
/// Helper functions for creating standardized build events
|
||||
|
||||
pub fn create_build_request_received_event(
|
||||
build_request_id: String,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
) -> BuildEvent {
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestReceived as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestReceived.to_display_string(),
|
||||
requested_partitions,
|
||||
message: "Build request received".to_string(),
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
pub fn create_build_planning_started_event(
|
||||
build_request_id: String,
|
||||
) -> BuildEvent {
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestPlanning as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestPlanning.to_display_string(),
|
||||
requested_partitions: vec![],
|
||||
message: "Starting build planning".to_string(),
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
pub fn create_build_execution_started_event(
|
||||
build_request_id: String,
|
||||
) -> BuildEvent {
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestExecuting as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestExecuting.to_display_string(),
|
||||
requested_partitions: vec![],
|
||||
message: "Starting build execution".to_string(),
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
pub fn create_build_completed_event(
|
||||
build_request_id: String,
|
||||
result: &super::BuildResult,
|
||||
) -> BuildEvent {
|
||||
let message = match result {
|
||||
super::BuildResult::Success { jobs_completed } => {
|
||||
format!("Build completed successfully with {} jobs", jobs_completed)
|
||||
}
|
||||
super::BuildResult::Failed { jobs_completed, jobs_failed } => {
|
||||
format!("Build failed: {} jobs completed, {} jobs failed", jobs_completed, jobs_failed)
|
||||
}
|
||||
super::BuildResult::FailFast { trigger_job } => {
|
||||
format!("Build failed fast due to job: {}", trigger_job)
|
||||
}
|
||||
};
|
||||
|
||||
let status = match result {
|
||||
super::BuildResult::Success { .. } => BuildRequestStatus::BuildRequestCompleted,
|
||||
super::BuildResult::Failed { .. } | super::BuildResult::FailFast { .. } => BuildRequestStatus::BuildRequestFailed,
|
||||
};
|
||||
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: status as i32,
|
||||
status_name: status.to_display_string(),
|
||||
requested_partitions: vec![],
|
||||
message,
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
pub fn create_analysis_completed_event(
|
||||
build_request_id: String,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
task_count: usize,
|
||||
) -> BuildEvent {
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::BuildRequestEvent(BuildRequestEvent {
|
||||
status_code: BuildRequestStatus::BuildRequestAnalysisCompleted as i32,
|
||||
status_name: BuildRequestStatus::BuildRequestAnalysisCompleted.to_display_string(),
|
||||
requested_partitions,
|
||||
message: format!("Analysis completed successfully, {} tasks planned", task_count),
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
pub fn create_job_scheduled_event(
|
||||
build_request_id: String,
|
||||
job_event: &JobEvent,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobEvent(job_event.clone())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn create_job_completed_event(
|
||||
build_request_id: String,
|
||||
job_event: &JobEvent,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::JobEvent(job_event.clone())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn create_partition_available_event(
|
||||
build_request_id: String,
|
||||
partition_event: &PartitionEvent,
|
||||
) -> BuildEvent {
|
||||
BuildEvent {
|
||||
event_id: generate_event_id(),
|
||||
timestamp: current_timestamp_nanos(),
|
||||
build_request_id,
|
||||
event_type: Some(build_event::EventType::PartitionEvent(partition_event.clone())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn create_delegation_event(
|
||||
build_request_id: String,
|
||||
partition_ref: &str,
|
||||
target_build: &str,
|
||||
message: &str,
|
||||
) -> BuildEvent {
|
||||
let partition = PartitionRef {
|
||||
str: partition_ref.to_string(),
|
||||
};
|
||||
|
||||
create_build_event(
|
||||
build_request_id,
|
||||
build_event::EventType::DelegationEvent(DelegationEvent {
|
||||
partition_ref: Some(partition),
|
||||
delegated_to_build_request_id: target_build.to_string(),
|
||||
message: message.to_string(),
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
|
@ -1,261 +0,0 @@
|
|||
use crate::*;
|
||||
use crate::event_log::{writer::EventWriter, query_engine::BELQueryEngine};
|
||||
use log::info;
|
||||
use std::sync::Arc;
|
||||
|
||||
pub mod error;
|
||||
pub mod events;
|
||||
|
||||
pub use error::{OrchestrationError, Result};
|
||||
|
||||
/// Result of a build execution
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum BuildResult {
|
||||
Success { jobs_completed: usize },
|
||||
Failed { jobs_completed: usize, jobs_failed: usize },
|
||||
FailFast { trigger_job: String },
|
||||
}
|
||||
|
||||
/// Core orchestrator for managing build lifecycle and event emission
|
||||
pub struct BuildOrchestrator {
|
||||
event_writer: EventWriter,
|
||||
build_request_id: String,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
}
|
||||
|
||||
impl BuildOrchestrator {
|
||||
/// Create a new build orchestrator
|
||||
pub fn new(
|
||||
query_engine: Arc<BELQueryEngine>,
|
||||
build_request_id: String,
|
||||
requested_partitions: Vec<PartitionRef>,
|
||||
) -> Self {
|
||||
Self {
|
||||
event_writer: EventWriter::new(query_engine),
|
||||
build_request_id,
|
||||
requested_partitions,
|
||||
}
|
||||
}
|
||||
|
||||
/// Get the build request ID
|
||||
pub fn build_request_id(&self) -> &str {
|
||||
&self.build_request_id
|
||||
}
|
||||
|
||||
/// Get the requested partitions
|
||||
pub fn requested_partitions(&self) -> &[PartitionRef] {
|
||||
&self.requested_partitions
|
||||
}
|
||||
|
||||
/// Emit build request received event and start the build lifecycle
|
||||
pub async fn start_build(&self) -> Result<()> {
|
||||
info!("Starting build for request: {}", self.build_request_id);
|
||||
|
||||
self.event_writer.request_build(
|
||||
self.build_request_id.clone(),
|
||||
self.requested_partitions.clone(),
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit build planning started event
|
||||
pub async fn start_planning(&self) -> Result<()> {
|
||||
info!("Starting build planning for request: {}", self.build_request_id);
|
||||
|
||||
self.event_writer.update_build_status(
|
||||
self.build_request_id.clone(),
|
||||
BuildRequestStatus::BuildRequestPlanning,
|
||||
"Starting build planning".to_string(),
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit build execution started event
|
||||
pub async fn start_execution(&self) -> Result<()> {
|
||||
info!("Starting build execution for request: {}", self.build_request_id);
|
||||
|
||||
self.event_writer.update_build_status(
|
||||
self.build_request_id.clone(),
|
||||
BuildRequestStatus::BuildRequestExecuting,
|
||||
"Starting build execution".to_string(),
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit build completion event
|
||||
pub async fn complete_build(&self, result: BuildResult) -> Result<()> {
|
||||
info!("Completing build for request: {} with result: {:?}",
|
||||
self.build_request_id, result);
|
||||
|
||||
let (status, message) = match &result {
|
||||
BuildResult::Success { jobs_completed } => {
|
||||
(BuildRequestStatus::BuildRequestCompleted,
|
||||
format!("Build completed successfully with {} jobs", jobs_completed))
|
||||
}
|
||||
BuildResult::Failed { jobs_completed, jobs_failed } => {
|
||||
(BuildRequestStatus::BuildRequestFailed,
|
||||
format!("Build failed: {} jobs completed, {} jobs failed", jobs_completed, jobs_failed))
|
||||
}
|
||||
BuildResult::FailFast { trigger_job } => {
|
||||
(BuildRequestStatus::BuildRequestFailed,
|
||||
format!("Build failed fast due to job: {}", trigger_job))
|
||||
}
|
||||
};
|
||||
|
||||
self.event_writer.update_build_status(
|
||||
self.build_request_id.clone(),
|
||||
status,
|
||||
message,
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit analysis completed event
|
||||
pub async fn emit_analysis_completed(&self, task_count: usize) -> Result<()> {
|
||||
self.event_writer.update_build_status_with_partitions(
|
||||
self.build_request_id.clone(),
|
||||
BuildRequestStatus::BuildRequestAnalysisCompleted,
|
||||
self.requested_partitions.clone(),
|
||||
format!("Analysis completed successfully, {} tasks planned", task_count),
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit job scheduled event
|
||||
pub async fn emit_job_scheduled(&self, job: &JobEvent) -> Result<()> {
|
||||
let event = events::create_job_scheduled_event(
|
||||
self.build_request_id.clone(),
|
||||
job,
|
||||
);
|
||||
|
||||
self.event_writer.append_event(event).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit job completed event
|
||||
pub async fn emit_job_completed(&self, job: &JobEvent) -> Result<()> {
|
||||
let event = events::create_job_completed_event(
|
||||
self.build_request_id.clone(),
|
||||
job,
|
||||
);
|
||||
|
||||
self.event_writer.append_event(event).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit partition available event
|
||||
pub async fn emit_partition_available(&self, partition: &PartitionEvent) -> Result<()> {
|
||||
let event = events::create_partition_available_event(
|
||||
self.build_request_id.clone(),
|
||||
partition,
|
||||
);
|
||||
|
||||
self.event_writer.append_event(event).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Emit delegation event
|
||||
pub async fn emit_delegation(
|
||||
&self,
|
||||
partition_ref: &str,
|
||||
target_build: &str,
|
||||
message: &str,
|
||||
) -> Result<()> {
|
||||
let partition = PartitionRef { str: partition_ref.to_string() };
|
||||
|
||||
self.event_writer.record_delegation(
|
||||
self.build_request_id.clone(),
|
||||
partition,
|
||||
target_build.to_string(),
|
||||
message.to_string(),
|
||||
).await
|
||||
.map_err(OrchestrationError::EventLog)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_build_lifecycle_events() {
|
||||
// Use mock BEL query engine for testing
|
||||
let query_engine = crate::event_log::mock::create_mock_bel_query_engine().await.unwrap();
|
||||
let partitions = vec![PartitionRef { str: "test/partition".to_string() }];
|
||||
|
||||
let orchestrator = BuildOrchestrator::new(
|
||||
query_engine,
|
||||
"test-build-123".to_string(),
|
||||
partitions.clone(),
|
||||
);
|
||||
|
||||
// Test full build lifecycle
|
||||
orchestrator.start_build().await.unwrap();
|
||||
orchestrator.start_planning().await.unwrap();
|
||||
orchestrator.start_execution().await.unwrap();
|
||||
orchestrator.complete_build(BuildResult::Success { jobs_completed: 5 }).await.unwrap();
|
||||
|
||||
// Note: Since we're using the real BELQueryEngine with mock storage,
|
||||
// we can't easily inspect emitted events in this test without significant refactoring.
|
||||
// The test verifies that the orchestration methods complete without errors,
|
||||
// which exercises the event emission code paths.
|
||||
|
||||
// TODO: If we need to verify specific events, we could:
|
||||
// 1. Query the mock storage through the query engine
|
||||
// 2. Create a specialized test storage that captures events
|
||||
// 3. Use the existing MockBuildEventLog test pattern with dependency injection
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_partition_and_job_events() {
|
||||
// Use mock BEL query engine for testing
|
||||
let query_engine = crate::event_log::mock::create_mock_bel_query_engine().await.unwrap();
|
||||
|
||||
let orchestrator = BuildOrchestrator::new(
|
||||
query_engine,
|
||||
"test-build-456".to_string(),
|
||||
vec![],
|
||||
);
|
||||
|
||||
// Test analysis completed event
|
||||
orchestrator.emit_analysis_completed(3).await.unwrap();
|
||||
|
||||
// Test job event
|
||||
let partition = PartitionRef { str: "data/users".to_string() };
|
||||
let job_event = JobEvent {
|
||||
job_run_id: "job-run-123".to_string(),
|
||||
job_label: Some(JobLabel { label: "//:test_job".to_string() }),
|
||||
target_partitions: vec![partition.clone()],
|
||||
status_code: JobStatus::JobScheduled as i32,
|
||||
status_name: JobStatus::JobScheduled.to_display_string(),
|
||||
message: "Job scheduled".to_string(),
|
||||
config: None,
|
||||
manifests: vec![],
|
||||
};
|
||||
orchestrator.emit_job_scheduled(&job_event).await.unwrap();
|
||||
|
||||
// Note: Same testing limitation as above.
|
||||
// We verify that the methods complete successfully without panicking.
|
||||
}
|
||||
}
|
||||
databuild/orchestrator.rs (new file, 1166 lines)
File diff suppressed because it is too large

databuild/partition_state.rs (new file, 577 lines)

@@ -0,0 +1,577 @@
use crate::util::{HasRelatedIds, RelatedIds};
use crate::{PartitionDetail, PartitionRef, PartitionStatus, PartitionStatusCode};
use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};
use uuid::Uuid;

/// Derive a deterministic UUID from job_run_id and partition_ref.
/// This ensures replay produces the same UUIDs.
pub fn derive_partition_uuid(job_run_id: &str, partition_ref: &str) -> Uuid {
    let mut hasher = Sha256::new();
    hasher.update(job_run_id.as_bytes());
    hasher.update(partition_ref.as_bytes());
    let hash = hasher.finalize();
    Uuid::from_slice(&hash[0..16]).expect("SHA256 produces at least 16 bytes")
}

/// State: Partition is currently being built by a job
#[derive(Debug, Clone)]
pub struct BuildingState {
    pub job_run_id: String,
}

/// State: Partition is waiting for upstream dependencies to be built
#[derive(Debug, Clone)]
pub struct UpstreamBuildingState {
    pub job_run_id: String,
    pub missing_deps: Vec<PartitionRef>, // partition refs that are missing
}

/// State: Upstream dependencies are satisfied, partition is ready to retry building
#[derive(Debug, Clone)]
pub struct UpForRetryState {
    pub original_job_run_id: String, // job that had the dep miss
}

/// State: Partition has been successfully built
#[derive(Debug, Clone)]
pub struct LiveState {
    pub built_at: u64,
    pub built_by: String, // job_run_id
}

/// State: Partition build failed (hard failure, not retryable)
#[derive(Debug, Clone)]
pub struct FailedState {
    pub failed_at: u64,
    pub failed_by: String, // job_run_id
}

/// State: Partition failed because upstream dependencies failed (terminal)
#[derive(Debug, Clone)]
pub struct UpstreamFailedState {
    pub failed_at: u64,
    pub failed_upstream_refs: Vec<PartitionRef>, // which upstream partitions failed
}

/// State: Partition has been marked as invalid/tainted
#[derive(Debug, Clone)]
pub struct TaintedState {
    pub tainted_at: u64,
    pub taint_ids: Vec<String>,
    /// Job run that originally built this partition (before it was tainted)
    pub built_by: String,
}

/// Generic partition struct parameterized by state.
/// Each partition has a unique UUID derived from the job_run_id that created it.
#[derive(Debug, Clone)]
pub struct PartitionWithState<S> {
    pub uuid: Uuid,
    pub partition_ref: PartitionRef,
    pub state: S,
}

/// Wrapper enum for storing partitions in collections.
/// Note: Missing state has been removed - partitions are only created when jobs start building them.
#[derive(Debug, Clone)]
pub enum Partition {
    Building(PartitionWithState<BuildingState>),
    UpstreamBuilding(PartitionWithState<UpstreamBuildingState>),
    UpForRetry(PartitionWithState<UpForRetryState>),
    Live(PartitionWithState<LiveState>),
    Failed(PartitionWithState<FailedState>),
    UpstreamFailed(PartitionWithState<UpstreamFailedState>),
    Tainted(PartitionWithState<TaintedState>),
}

/// Type-safe partition reference wrappers that encode state expectations in function signatures. It
/// is critical that these be treated with respect, not just summoned because it's convenient.
/// These should be created ephemerally from typestate objects via .get_ref() and used
/// immediately — never stored long-term, as partition state can change.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct BuildingPartitionRef(pub PartitionRef);
impl PartitionWithState<BuildingState> {
    pub fn get_ref(&self) -> BuildingPartitionRef {
        BuildingPartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct UpstreamBuildingPartitionRef(pub PartitionRef);
impl PartitionWithState<UpstreamBuildingState> {
    pub fn get_ref(&self) -> UpstreamBuildingPartitionRef {
        UpstreamBuildingPartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct UpForRetryPartitionRef(pub PartitionRef);
impl PartitionWithState<UpForRetryState> {
    pub fn get_ref(&self) -> UpForRetryPartitionRef {
        UpForRetryPartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct LivePartitionRef(pub PartitionRef);
impl PartitionWithState<LiveState> {
    pub fn get_ref(&self) -> LivePartitionRef {
        LivePartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct FailedPartitionRef(pub PartitionRef);
impl PartitionWithState<FailedState> {
    pub fn get_ref(&self) -> FailedPartitionRef {
        FailedPartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct UpstreamFailedPartitionRef(pub PartitionRef);
impl PartitionWithState<UpstreamFailedState> {
    pub fn get_ref(&self) -> UpstreamFailedPartitionRef {
        UpstreamFailedPartitionRef(self.partition_ref.clone())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct TaintedPartitionRef(pub PartitionRef);
impl PartitionWithState<TaintedState> {
    pub fn get_ref(&self) -> TaintedPartitionRef {
        TaintedPartitionRef(self.partition_ref.clone())
    }
}

// Type-safe transition methods for BuildingState
impl PartitionWithState<BuildingState> {
    /// Create a new partition directly in Building state.
    /// UUID is derived from job_run_id + partition_ref for deterministic replay.
    pub fn new(job_run_id: String, partition_ref: PartitionRef) -> Self {
        let uuid = derive_partition_uuid(&job_run_id, &partition_ref.r#ref);
        PartitionWithState {
            uuid,
            partition_ref,
            state: BuildingState { job_run_id },
        }
    }

    /// Transition from Building to Live when a job successfully completes
    pub fn complete(self, timestamp: u64) -> PartitionWithState<LiveState> {
        PartitionWithState {
            uuid: self.uuid,
            partition_ref: self.partition_ref,
            state: LiveState {
                built_at: timestamp,
                built_by: self.state.job_run_id,
            },
        }
    }

    /// Transition from Building to Failed when a job fails (hard failure)
    pub fn fail(self, timestamp: u64) -> PartitionWithState<FailedState> {
        PartitionWithState {
            uuid: self.uuid,
            partition_ref: self.partition_ref,
            state: FailedState {
                failed_at: timestamp,
                failed_by: self.state.job_run_id,
            },
        }
    }

    /// Transition from Building to UpstreamBuilding when job reports missing dependencies
    pub fn dep_miss(
        self,
        missing_deps: Vec<PartitionRef>,
    ) -> PartitionWithState<UpstreamBuildingState> {
        PartitionWithState {
            uuid: self.uuid,
            partition_ref: self.partition_ref,
            state: UpstreamBuildingState {
                job_run_id: self.state.job_run_id,
                missing_deps,
            },
        }
    }
}
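To make the typestate flow concrete, a small sketch (not from the diff) of the transitions defined above; the `PartitionRef` literals assume the single `ref` field implied by the `r#ref` accessors in this file, and the partition names are made up.

```rust
// Sketch only: a Building partition can become Live, Failed, or
// UpstreamBuilding; other transitions simply do not compile, which is the
// point of the typestate pattern.
fn example_transitions(now: u64) {
    let pref = PartitionRef { r#ref: "reviews/date=2025-01-27".to_string() };

    // Happy path: Building -> Live.
    let building = PartitionWithState::<BuildingState>::new("job-run-1".to_string(), pref.clone());
    let live = building.complete(now);
    assert_eq!(live.state.built_by, "job-run-1");

    // Dep-miss path: Building -> UpstreamBuilding, recording what is missing.
    let missing = vec![PartitionRef { r#ref: "users/date=2025-01-27".to_string() }];
    let waiting = PartitionWithState::<BuildingState>::new("job-run-2".to_string(), pref)
        .dep_miss(missing);
    assert_eq!(waiting.state.missing_deps.len(), 1);
}
```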
|
||||
|
||||
// Type-safe transition methods for UpstreamBuildingState
|
||||
impl PartitionWithState<UpstreamBuildingState> {
|
||||
/// Transition from UpstreamBuilding to UpForRetry when all upstream deps are satisfied
|
||||
pub fn upstreams_satisfied(self) -> PartitionWithState<UpForRetryState> {
|
||||
PartitionWithState {
|
||||
uuid: self.uuid,
|
||||
partition_ref: self.partition_ref,
|
||||
state: UpForRetryState {
|
||||
original_job_run_id: self.state.job_run_id,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Transition from UpstreamBuilding to UpstreamFailed when an upstream dep fails
|
||||
pub fn upstream_failed(
|
||||
self,
|
||||
failed_upstream_refs: Vec<PartitionRef>,
|
||||
timestamp: u64,
|
||||
) -> PartitionWithState<UpstreamFailedState> {
|
||||
PartitionWithState {
|
||||
uuid: self.uuid,
|
||||
partition_ref: self.partition_ref,
|
||||
state: UpstreamFailedState {
|
||||
failed_at: timestamp,
|
||||
failed_upstream_refs,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if a specific upstream ref is in our missing deps
|
||||
pub fn is_waiting_for(&self, upstream_ref: &str) -> bool {
|
||||
self.state
|
||||
.missing_deps
|
||||
.iter()
|
||||
.any(|d| d.r#ref == upstream_ref)
|
||||
}
|
||||
|
||||
/// Remove a satisfied upstream from missing deps. Returns remaining count.
|
||||
pub fn satisfy_upstream(mut self, upstream_ref: &str) -> (Self, usize) {
|
||||
self.state.missing_deps.retain(|r| r.r#ref != upstream_ref);
|
||||
let remaining = self.state.missing_deps.len();
|
||||
(self, remaining)
|
||||
}
|
||||
}
|
||||
|
||||
// Type-safe transition methods for LiveState
|
||||
impl PartitionWithState<LiveState> {
|
||||
/// Transition from Live to Tainted when a taint is applied
|
||||
pub fn taint(self, taint_id: String, timestamp: u64) -> PartitionWithState<TaintedState> {
|
||||
PartitionWithState {
|
||||
uuid: self.uuid,
|
||||
partition_ref: self.partition_ref,
|
||||
state: TaintedState {
|
||||
tainted_at: timestamp,
|
||||
taint_ids: vec![taint_id],
|
||||
built_by: self.state.built_by,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Type-safe transition methods for TaintedState
|
||||
impl PartitionWithState<TaintedState> {
|
||||
/// Add another taint to an already-tainted partition
|
||||
pub fn add_taint(mut self, taint_id: String) -> Self {
|
||||
if !self.state.taint_ids.contains(&taint_id) {
|
||||
self.state.taint_ids.push(taint_id);
|
||||
}
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
// Helper methods on the Partition enum
impl Partition {
    /// Get the UUID from any state
    pub fn uuid(&self) -> Uuid {
        match self {
            Partition::Building(p) => p.uuid,
            Partition::UpstreamBuilding(p) => p.uuid,
            Partition::UpForRetry(p) => p.uuid,
            Partition::Live(p) => p.uuid,
            Partition::Failed(p) => p.uuid,
            Partition::UpstreamFailed(p) => p.uuid,
            Partition::Tainted(p) => p.uuid,
        }
    }

    /// Get the partition reference from any state
    pub fn partition_ref(&self) -> &PartitionRef {
        match self {
            Partition::Building(p) => &p.partition_ref,
            Partition::UpstreamBuilding(p) => &p.partition_ref,
            Partition::UpForRetry(p) => &p.partition_ref,
            Partition::Live(p) => &p.partition_ref,
            Partition::Failed(p) => &p.partition_ref,
            Partition::UpstreamFailed(p) => &p.partition_ref,
            Partition::Tainted(p) => &p.partition_ref,
        }
    }

    /// Check if partition is in Live state
    pub fn is_live(&self) -> bool {
        matches!(self, Partition::Live(_))
    }

    /// Check if partition is in a terminal state (Live, Failed, UpstreamFailed, or Tainted)
    pub fn is_terminal(&self) -> bool {
        matches!(
            self,
            Partition::Live(_)
                | Partition::Failed(_)
                | Partition::UpstreamFailed(_)
                | Partition::Tainted(_)
        )
    }

    /// Check if partition is currently being built (includes UpstreamBuilding as it holds a "lease")
    pub fn is_building(&self) -> bool {
        matches!(
            self,
            Partition::Building(_) | Partition::UpstreamBuilding(_)
        )
    }

    /// Check if partition is in UpForRetry state (ready to be rebuilt)
    pub fn is_up_for_retry(&self) -> bool {
        matches!(self, Partition::UpForRetry(_))
    }

    /// Check if partition is failed (hard failure)
    pub fn is_failed(&self) -> bool {
        matches!(self, Partition::Failed(_))
    }

    /// Check if partition is upstream failed
    pub fn is_upstream_failed(&self) -> bool {
        matches!(self, Partition::UpstreamFailed(_))
    }

    /// Check if partition is tainted
    pub fn is_tainted(&self) -> bool {
        matches!(self, Partition::Tainted(_))
    }
}

// ==================== HasRelatedIds trait implementation ====================

impl HasRelatedIds for Partition {
    /// Get the IDs of all entities this partition references.
    /// Note: downstream_partition_uuids and want_ids come from BuildState indexes,
    /// not from Partition itself.
    fn related_ids(&self) -> RelatedIds {
        // Job run ID from the builder (for states that track it)
        let job_run_ids: Vec<String> = match self {
            Partition::Building(p) => vec![p.state.job_run_id.clone()],
            Partition::UpstreamBuilding(p) => vec![p.state.job_run_id.clone()],
            Partition::UpForRetry(p) => vec![p.state.original_job_run_id.clone()],
            Partition::Live(p) => vec![p.state.built_by.clone()],
            Partition::Failed(p) => vec![p.state.failed_by.clone()],
            Partition::UpstreamFailed(_) => vec![],
            Partition::Tainted(p) => vec![p.state.built_by.clone()],
        };

        // Partition refs from missing deps (for UpstreamBuilding state)
        let partition_refs: Vec<String> = match self {
            Partition::UpstreamBuilding(p) => p
                .state
                .missing_deps
                .iter()
                .map(|d| d.r#ref.clone())
                .collect(),
            Partition::UpstreamFailed(p) => p
                .state
                .failed_upstream_refs
                .iter()
                .map(|d| d.r#ref.clone())
                .collect(),
            _ => vec![],
        };

        RelatedIds {
            partition_refs,
            partition_uuids: vec![],
            job_run_ids,
            want_ids: vec![],
        }
    }
}

impl Partition {
    /// Convert to PartitionDetail for API responses and queries.
    /// Note: want_ids and downstream_partition_uuids are empty here and will be
    /// populated by BuildState from its inverted indexes.
    /// Upstream lineage is resolved via built_by_job_run_id → job run's read_deps.
    pub fn to_detail(&self) -> PartitionDetail {
        match self {
            Partition::Building(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionBuilding as i32,
                    name: "PartitionBuilding".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.job_run_id.clone()],
                taint_ids: vec![],
                last_updated_timestamp: None,
                uuid: p.uuid.to_string(),
                built_by_job_run_id: None,
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::UpstreamBuilding(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionBuilding as i32, // Use Building status for UpstreamBuilding
                    name: "PartitionUpstreamBuilding".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.job_run_id.clone()],
                taint_ids: vec![],
                last_updated_timestamp: None,
                uuid: p.uuid.to_string(),
                built_by_job_run_id: None,
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::UpForRetry(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionBuilding as i32, // Still "building" conceptually
                    name: "PartitionUpForRetry".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.original_job_run_id.clone()],
                taint_ids: vec![],
                last_updated_timestamp: None,
                uuid: p.uuid.to_string(),
                built_by_job_run_id: None,
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::Live(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionLive as i32,
                    name: "PartitionLive".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.built_by.clone()],
                taint_ids: vec![],
                last_updated_timestamp: Some(p.state.built_at),
                uuid: p.uuid.to_string(),
                built_by_job_run_id: Some(p.state.built_by.clone()),
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::Failed(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionFailed as i32,
                    name: "PartitionFailed".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.failed_by.clone()],
                taint_ids: vec![],
                last_updated_timestamp: Some(p.state.failed_at),
                uuid: p.uuid.to_string(),
                built_by_job_run_id: None,
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::UpstreamFailed(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionFailed as i32, // Use Failed status
                    name: "PartitionUpstreamFailed".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![],
                taint_ids: vec![],
                last_updated_timestamp: Some(p.state.failed_at),
                uuid: p.uuid.to_string(),
                built_by_job_run_id: None,
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
            Partition::Tainted(p) => PartitionDetail {
                r#ref: Some(p.partition_ref.clone()),
                status: Some(PartitionStatus {
                    code: PartitionStatusCode::PartitionTainted as i32,
                    name: "PartitionTainted".to_string(),
                }),
                want_ids: vec![], // Populated by BuildState
                job_run_ids: vec![p.state.built_by.clone()],
                taint_ids: p.state.taint_ids.clone(),
                last_updated_timestamp: Some(p.state.tainted_at),
                uuid: p.uuid.to_string(),
                built_by_job_run_id: Some(p.state.built_by.clone()),
                downstream_partition_uuids: vec![], // Populated by BuildState
            },
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_derive_partition_uuid_deterministic() {
        let uuid1 = derive_partition_uuid("job-123", "data/beta");
        let uuid2 = derive_partition_uuid("job-123", "data/beta");
        assert_eq!(uuid1, uuid2);
    }

    #[test]
    fn test_derive_partition_uuid_different_inputs() {
        let uuid1 = derive_partition_uuid("job-123", "data/beta");
        let uuid2 = derive_partition_uuid("job-456", "data/beta");
        let uuid3 = derive_partition_uuid("job-123", "data/alpha");
        assert_ne!(uuid1, uuid2);
        assert_ne!(uuid1, uuid3);
        assert_ne!(uuid2, uuid3);
    }

    #[test]
    fn test_partition_building_transitions() {
        let partition = PartitionWithState::<BuildingState>::new(
            "job-123".to_string(),
            PartitionRef {
                r#ref: "data/beta".to_string(),
            },
        );

        // Can transition to Live
        let live = partition.clone().complete(1000);
        assert_eq!(live.state.built_at, 1000);
        assert_eq!(live.state.built_by, "job-123");

        // Can transition to Failed
        let failed = partition.clone().fail(2000);
        assert_eq!(failed.state.failed_at, 2000);
        assert_eq!(failed.state.failed_by, "job-123");

        // Can transition to UpstreamBuilding (dep miss)
        let upstream_building = partition.dep_miss(vec![PartitionRef {
            r#ref: "data/alpha".to_string(),
        }]);
        assert_eq!(upstream_building.state.missing_deps.len(), 1);
        assert_eq!(upstream_building.state.missing_deps[0].r#ref, "data/alpha");
    }

    #[test]
    fn test_upstream_building_transitions() {
        let building = PartitionWithState::<BuildingState>::new(
            "job-123".to_string(),
            PartitionRef {
                r#ref: "data/beta".to_string(),
            },
        );
        let upstream_building = building.dep_miss(vec![PartitionRef {
            r#ref: "data/alpha".to_string(),
        }]);

        // Can transition to UpForRetry
        let up_for_retry = upstream_building.clone().upstreams_satisfied();
        assert_eq!(up_for_retry.state.original_job_run_id, "job-123");

        // Can transition to UpstreamFailed
        let upstream_failed = upstream_building.upstream_failed(
            vec![PartitionRef {
                r#ref: "data/alpha".to_string(),
            }],
            3000,
        );
        assert_eq!(upstream_failed.state.failed_at, 3000);
        assert_eq!(upstream_failed.state.failed_upstream_refs.len(), 1);
        assert_eq!(
            upstream_failed.state.failed_upstream_refs[0].r#ref,
            "data/alpha"
        );
    }
}

@@ -1,409 +0,0 @@
use crate::*;
use crate::event_log::{BuildEventLogError, Result};
use crate::event_log::query_engine::BELQueryEngine;
use crate::{BuildDetailResponse, BuildTimelineEvent as ServiceBuildTimelineEvent};
use std::sync::Arc;
// use std::collections::HashMap; // Commented out since not used with new query engine
use serde::Serialize;

/// Repository for querying build data from the build event log
pub struct BuildsRepository {
    query_engine: Arc<BELQueryEngine>,
}

/// Summary of a build request and its current status
#[derive(Debug, Clone, Serialize)]
pub struct BuildInfo {
    pub build_request_id: String,
    pub status: BuildRequestStatus,
    pub requested_partitions: Vec<PartitionRef>,
    pub requested_at: i64,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
    pub duration_ms: Option<i64>,
    pub total_jobs: usize,
    pub completed_jobs: usize,
    pub failed_jobs: usize,
    pub cancelled_jobs: usize,
    pub cancelled: bool,
    pub cancel_reason: Option<String>,
}

/// Detailed timeline of a build's execution events
#[derive(Debug, Clone, Serialize)]
pub struct BuildEvent {
    pub timestamp: i64,
    pub event_type: String,
    pub status: Option<BuildRequestStatus>,
    pub message: String,
    pub cancel_reason: Option<String>,
}

impl BuildsRepository {
    /// Create a new BuildsRepository
    pub fn new(query_engine: Arc<BELQueryEngine>) -> Self {
        Self { query_engine }
    }

    /// List all builds with their current status
    ///
    /// Returns a list of all build requests that have been made,
    /// including their current status and execution details.
    pub async fn list(&self, limit: Option<usize>) -> Result<Vec<BuildInfo>> {
        // Use query engine to list builds with the protobuf request format
        let request = BuildsListRequest {
            limit: limit.map(|l| l as u32),
            offset: Some(0),
            status_filter: None,
        };
        let response = self.query_engine.list_build_requests(request).await?;

        // Convert from protobuf BuildSummary to repository BuildInfo
        let builds = response.builds.into_iter().map(|build| {
            BuildInfo {
                build_request_id: build.build_request_id,
                status: BuildRequestStatus::try_from(build.status_code).unwrap_or(BuildRequestStatus::BuildRequestUnknown),
                requested_partitions: build.requested_partitions,
                requested_at: build.requested_at,
                started_at: build.started_at,
                completed_at: build.completed_at,
                duration_ms: build.duration_ms,
                total_jobs: build.total_jobs as usize,
                completed_jobs: build.completed_jobs as usize,
                failed_jobs: build.failed_jobs as usize,
                cancelled_jobs: build.cancelled_jobs as usize,
                cancelled: build.cancelled,
                cancel_reason: None, // TODO: Add cancel reason to BuildSummary if needed
            }
        }).collect();

        Ok(builds)
    }

    /// Show detailed information about a specific build
    ///
    /// Returns the complete timeline of events for the specified build,
    /// including all status changes and any cancellation events.
    pub async fn show(&self, build_request_id: &str) -> Result<Option<(BuildInfo, Vec<BuildEvent>)>> {
        // Use query engine to get build summary
        let summary_result = self.query_engine.get_build_request_summary(build_request_id).await;

        match summary_result {
            Ok(summary) => {
                // Convert BuildRequestSummary to BuildInfo
                let build_info = BuildInfo {
                    build_request_id: summary.build_request_id,
                    status: summary.status,
                    requested_partitions: summary.requested_partitions.into_iter()
                        .map(|s| PartitionRef { str: s })
                        .collect(),
                    requested_at: summary.created_at,
                    started_at: None, // TODO: Track started_at in query engine
                    completed_at: Some(summary.updated_at),
                    duration_ms: None, // TODO: Calculate duration in query engine
                    total_jobs: 0, // TODO: Implement job counting in query engine
                    completed_jobs: 0,
                    failed_jobs: 0,
                    cancelled_jobs: 0,
                    cancelled: false, // TODO: Track cancellation in query engine
                    cancel_reason: None,
                };

                // Get all events for this build to create a proper timeline
                let all_events = self.query_engine.get_build_request_events(build_request_id, None).await?;

                // Create timeline from build request events
                let mut timeline = Vec::new();
                for event in all_events {
                    if let Some(crate::build_event::EventType::BuildRequestEvent(br_event)) = &event.event_type {
                        if let Ok(status) = BuildRequestStatus::try_from(br_event.status_code) {
                            timeline.push(BuildEvent {
                                timestamp: event.timestamp,
                                event_type: "build_status".to_string(),
                                status: Some(status),
                                message: br_event.message.clone(),
                                cancel_reason: None,
                            });
                        }
                    }
                }

                // Sort timeline by timestamp
                timeline.sort_by_key(|e| e.timestamp);

                Ok(Some((build_info, timeline)))
            }
            Err(_) => {
                // Build not found
                Ok(None)
            }
        }
    }

    /// Show detailed information about a specific build using protobuf response format
    ///
    /// Returns the complete build details with dual status fields and timeline events.
    pub async fn show_protobuf(&self, build_request_id: &str) -> Result<Option<BuildDetailResponse>> {
        // Get build info and timeline using existing show method
        if let Some((build_info, timeline)) = self.show(build_request_id).await? {
            // Convert timeline events to protobuf format
            let protobuf_timeline: Vec<ServiceBuildTimelineEvent> = timeline
                .into_iter()
                .map(|event| ServiceBuildTimelineEvent {
                    timestamp: event.timestamp,
                    status_code: event.status.map(|s| s as i32),
                    status_name: event.status.map(|s| s.to_display_string()),
                    message: event.message,
                    event_type: event.event_type,
                    cancel_reason: event.cancel_reason,
                })
                .collect();

            let response = BuildDetailResponse {
                build_request_id: build_info.build_request_id,
                status_code: build_info.status as i32,
                status_name: build_info.status.to_display_string(),
                requested_partitions: build_info.requested_partitions,
                total_jobs: build_info.total_jobs as u32,
                completed_jobs: build_info.completed_jobs as u32,
                failed_jobs: build_info.failed_jobs as u32,
                cancelled_jobs: build_info.cancelled_jobs as u32,
                requested_at: build_info.requested_at,
                started_at: build_info.started_at,
                completed_at: build_info.completed_at,
                duration_ms: build_info.duration_ms,
                cancelled: build_info.cancelled,
                cancel_reason: build_info.cancel_reason,
                timeline: protobuf_timeline,
            };

            Ok(Some(response))
        } else {
            Ok(None)
        }
    }

    /// Cancel a build with a reason
    ///
    /// This method uses the EventWriter to write a build cancellation event.
    /// It validates that the build exists and is in a cancellable state.
    pub async fn cancel(&self, build_request_id: &str, _reason: String) -> Result<()> {
        // First check if the build exists and get its current status
        let build_info = self.show(build_request_id).await?;

        if build_info.is_none() {
            return Err(BuildEventLogError::QueryError(
                format!("Cannot cancel non-existent build: {}", build_request_id)
            ));
        }

        let (build, _timeline) = build_info.unwrap();

        // Check if build is in a cancellable state
        match build.status {
            BuildRequestStatus::BuildRequestCompleted => {
                return Err(BuildEventLogError::QueryError(
                    format!("Cannot cancel completed build: {}", build_request_id)
                ));
            }
            BuildRequestStatus::BuildRequestFailed => {
                return Err(BuildEventLogError::QueryError(
                    format!("Cannot cancel failed build: {}", build_request_id)
                ));
            }
            BuildRequestStatus::BuildRequestCancelled => {
                return Err(BuildEventLogError::QueryError(
                    format!("Build already cancelled: {}", build_request_id)
                ));
            }
            _ => {}
        }

        // Create a build cancellation event
        use crate::event_log::{create_build_event, current_timestamp_nanos, generate_event_id};

        let cancel_event = create_build_event(
            build_request_id.to_string(),
            crate::build_event::EventType::BuildRequestEvent(crate::BuildRequestEvent {
                status_code: BuildRequestStatus::BuildRequestCancelled as i32,
                status_name: BuildRequestStatus::BuildRequestCancelled.to_display_string(),
                requested_partitions: build.requested_partitions,
                message: format!("Build cancelled"),
            })
        );

        // Append the cancellation event
        self.query_engine.append_event(cancel_event).await?;

        Ok(())
    }

    /// List builds using protobuf response format with dual status fields
    ///
    /// Returns BuildSummary protobuf messages with status_code and status_name.
    pub async fn list_protobuf(&self, limit: Option<usize>) -> Result<Vec<crate::BuildSummary>> {
        // Get build info using existing list method
        let builds = self.list(limit).await?;

        // Convert to protobuf format
        let protobuf_builds: Vec<crate::BuildSummary> = builds
            .into_iter()
            .map(|build| crate::BuildSummary {
                build_request_id: build.build_request_id,
                status_code: build.status as i32,
                status_name: build.status.to_display_string(),
                requested_partitions: build.requested_partitions.into_iter().map(|p| crate::PartitionRef { str: p.str }).collect(),
                total_jobs: build.total_jobs as u32,
                completed_jobs: build.completed_jobs as u32,
                failed_jobs: build.failed_jobs as u32,
                cancelled_jobs: build.cancelled_jobs as u32,
                requested_at: build.requested_at,
                started_at: build.started_at,
                completed_at: build.completed_at,
                duration_ms: build.duration_ms,
                cancelled: build.cancelled,
            })
            .collect();

        Ok(protobuf_builds)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::event_log::mock::{create_mock_bel_query_engine, create_mock_bel_query_engine_with_events, test_events};

    #[tokio::test]
    async fn test_builds_repository_list_empty() {
        let query_engine = create_mock_bel_query_engine().await.unwrap();
        let repo = BuildsRepository::new(query_engine);

        let builds = repo.list(None).await.unwrap();
        assert!(builds.is_empty());
    }

    #[tokio::test]
    async fn test_builds_repository_list_with_data() {
        let build_id1 = "build-123".to_string();
        let build_id2 = "build-456".to_string();
        let partition1 = PartitionRef { str: "data/users".to_string() };
        let partition2 = PartitionRef { str: "data/orders".to_string() };

        // Create events for multiple builds
        let events = vec![
            test_events::build_request_event(Some(build_id1.clone()), vec![partition1.clone()], BuildRequestStatus::BuildRequestReceived),
            test_events::build_request_event(Some(build_id1.clone()), vec![partition1.clone()], BuildRequestStatus::BuildRequestCompleted),
            test_events::build_request_event(Some(build_id2.clone()), vec![partition2.clone()], BuildRequestStatus::BuildRequestReceived),
            test_events::build_request_event(Some(build_id2.clone()), vec![partition2.clone()], BuildRequestStatus::BuildRequestFailed),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = BuildsRepository::new(query_engine);

        let builds = repo.list(None).await.unwrap();
        assert_eq!(builds.len(), 2);

        // Find builds by id
        let build1 = builds.iter().find(|b| b.build_request_id == build_id1).unwrap();
        let build2 = builds.iter().find(|b| b.build_request_id == build_id2).unwrap();

        assert_eq!(build1.status, BuildRequestStatus::BuildRequestCompleted);
        assert_eq!(build1.requested_partitions.len(), 1);
        assert!(!build1.cancelled);

        assert_eq!(build2.status, BuildRequestStatus::BuildRequestFailed);
        assert_eq!(build2.requested_partitions.len(), 1);
        assert!(!build2.cancelled);
    }

    #[tokio::test]
    async fn test_builds_repository_show() {
        let build_id = "build-789".to_string();
        let partition = PartitionRef { str: "analytics/daily".to_string() };

        let events = vec![
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestReceived),
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestPlanning),
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestExecuting),
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestCompleted),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = BuildsRepository::new(query_engine);

        let result = repo.show(&build_id).await.unwrap();
        assert!(result.is_some());

        let (info, timeline) = result.unwrap();
        assert_eq!(info.build_request_id, build_id);
        assert_eq!(info.status, BuildRequestStatus::BuildRequestCompleted);
        assert!(!info.cancelled);

        assert_eq!(timeline.len(), 4);
        assert_eq!(timeline[0].status, Some(BuildRequestStatus::BuildRequestReceived));
        assert_eq!(timeline[1].status, Some(BuildRequestStatus::BuildRequestPlanning));
        assert_eq!(timeline[2].status, Some(BuildRequestStatus::BuildRequestExecuting));
        assert_eq!(timeline[3].status, Some(BuildRequestStatus::BuildRequestCompleted));
    }

    #[tokio::test]
    async fn test_builds_repository_show_nonexistent() {
        let query_engine = create_mock_bel_query_engine().await.unwrap();
        let repo = BuildsRepository::new(query_engine);

        let result = repo.show("nonexistent-build").await.unwrap();
        assert!(result.is_none());
    }

    #[tokio::test]
    async fn test_builds_repository_cancel() {
        let build_id = "build-cancel-test".to_string();
        let partition = PartitionRef { str: "test/data".to_string() };

        // Start with a running build
        let events = vec![
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestReceived),
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestExecuting),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = BuildsRepository::new(query_engine.clone());

        // Cancel the build
        repo.cancel(&build_id, "User requested cancellation".to_string()).await.unwrap();

        // Verify the cancellation was recorded
        // Note: This test demonstrates the pattern, but the MockBELStorage would need
        // to be enhanced to properly store build cancel events for full verification

        // Try to cancel a non-existent build
        let result = repo.cancel("nonexistent-build", "Should fail".to_string()).await;
        assert!(result.is_err());
    }

    #[tokio::test]
    async fn test_builds_repository_cancel_completed_build() {
        let build_id = "completed-build".to_string();
        let partition = PartitionRef { str: "test/data".to_string() };

        // Create a completed build
        let events = vec![
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestReceived),
            test_events::build_request_event(Some(build_id.clone()), vec![partition.clone()], BuildRequestStatus::BuildRequestCompleted),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = BuildsRepository::new(query_engine);

        // Try to cancel the completed build - should fail
        let result = repo.cancel(&build_id, "Should fail".to_string()).await;
        assert!(result.is_err());

        if let Err(BuildEventLogError::QueryError(msg)) = result {
            assert!(msg.contains("Cannot cancel completed build"));
        } else {
            panic!("Expected QueryError for completed build cancellation");
        }
    }
}

@@ -1,499 +0,0 @@
use crate::*;
use crate::event_log::{BuildEventLogError, Result};
use crate::event_log::query_engine::BELQueryEngine;
use crate::{JobDetailResponse, JobRunDetail as ServiceJobRunDetail};
use std::sync::Arc;
use std::collections::HashMap;
use serde::Serialize;

/// Repository for querying job data from the build event log
pub struct JobsRepository {
    query_engine: Arc<BELQueryEngine>,
}

/// Summary of a job's execution history and statistics
#[derive(Debug, Clone, Serialize)]
pub struct JobInfo {
    pub job_label: String,
    pub total_runs: usize,
    pub successful_runs: usize,
    pub failed_runs: usize,
    pub cancelled_runs: usize,
    pub last_run_timestamp: i64,
    pub last_run_status: JobStatus,
    pub average_partitions_per_run: f64,
    pub recent_builds: Vec<String>, // Build request IDs that used this job
}

/// Detailed information about a specific job execution
#[derive(Debug, Clone, Serialize)]
pub struct JobRunDetail {
    pub job_run_id: String,
    pub job_label: String,
    pub build_request_id: String,
    pub target_partitions: Vec<PartitionRef>,
    pub status: JobStatus,
    pub scheduled_at: i64,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
    pub duration_ms: Option<i64>,
    pub message: String,
    pub config: Option<JobConfig>,
    pub manifests: Vec<PartitionManifest>,
}

impl JobsRepository {
    /// Create a new JobsRepository
    pub fn new(query_engine: Arc<BELQueryEngine>) -> Self {
        Self { query_engine }
    }

    /// List all jobs with their execution statistics
    ///
    /// Returns a summary of all jobs that have been executed, including
    /// success/failure statistics and recent activity.
    pub async fn list(&self, limit: Option<usize>) -> Result<Vec<JobInfo>> {
        // Get all job events from the event log
        let events = self.query_engine.get_events_in_range(0, i64::MAX).await?;

        let mut job_data: HashMap<String, Vec<JobRunDetail>> = HashMap::new();

        // Collect all job events and group by job label
        for event in events {
            if let Some(build_event::EventType::JobEvent(j_event)) = &event.event_type {
                let job_label = j_event.job_label.as_ref()
                    .map(|l| l.label.clone())
                    .unwrap_or_else(|| "unknown".to_string());

                let status = match j_event.status_code {
                    1 => JobStatus::JobScheduled,
                    2 => JobStatus::JobRunning,
                    3 => JobStatus::JobCompleted,
                    4 => JobStatus::JobFailed,
                    5 => JobStatus::JobCancelled,
                    6 => JobStatus::JobSkipped,
                    _ => JobStatus::JobUnknown,
                };

                // Create or update job run detail
                let job_runs = job_data.entry(job_label.clone()).or_insert_with(Vec::new);

                // Find existing run or create new one
                if let Some(existing_run) = job_runs.iter_mut().find(|r| r.job_run_id == j_event.job_run_id) {
                    // Update existing run with new status
                    existing_run.status = status;
                    existing_run.message = j_event.message.clone();

                    match status {
                        JobStatus::JobRunning => {
                            existing_run.started_at = Some(event.timestamp);
                        }
                        JobStatus::JobCompleted | JobStatus::JobFailed | JobStatus::JobCancelled => {
                            existing_run.completed_at = Some(event.timestamp);
                            if let Some(started) = existing_run.started_at {
                                existing_run.duration_ms = Some((event.timestamp - started) / 1_000_000); // Convert to ms
                            }
                            existing_run.manifests = j_event.manifests.clone();
                        }
                        _ => {}
                    }
                } else {
                    // Create new job run
                    let job_run = JobRunDetail {
                        job_run_id: j_event.job_run_id.clone(),
                        job_label: job_label.clone(),
                        build_request_id: event.build_request_id.clone(),
                        target_partitions: j_event.target_partitions.clone(),
                        status,
                        scheduled_at: event.timestamp,
                        started_at: if status == JobStatus::JobRunning { Some(event.timestamp) } else { None },
                        completed_at: None,
                        duration_ms: None,
                        message: j_event.message.clone(),
                        config: j_event.config.clone(),
                        manifests: j_event.manifests.clone(),
                    };
                    job_runs.push(job_run);
                }
            }
        }

        // Convert to JobInfo structs with statistics
        let mut job_infos: Vec<JobInfo> = job_data.into_iter()
            .map(|(job_label, job_runs)| {
                let total_runs = job_runs.len();
                let successful_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobCompleted).count();
                let failed_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobFailed).count();
                let cancelled_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobCancelled).count();

                let (last_run_timestamp, last_run_status) = job_runs.iter()
                    .max_by_key(|r| r.scheduled_at)
                    .map(|r| (r.scheduled_at, r.status.clone()))
                    .unwrap_or((0, JobStatus::JobUnknown));

                let total_partitions: usize = job_runs.iter()
                    .map(|r| r.target_partitions.len())
                    .sum();
                let average_partitions_per_run = if total_runs > 0 {
                    total_partitions as f64 / total_runs as f64
                } else {
                    0.0
                };

                // Get recent unique build request IDs
                let mut recent_builds: Vec<String> = job_runs.iter()
                    .map(|r| r.build_request_id.clone())
                    .collect::<std::collections::HashSet<_>>()
                    .into_iter()
                    .collect();
                recent_builds.sort();
                recent_builds.truncate(10); // Keep last 10 builds

                JobInfo {
                    job_label,
                    total_runs,
                    successful_runs,
                    failed_runs,
                    cancelled_runs,
                    last_run_timestamp,
                    last_run_status,
                    average_partitions_per_run,
                    recent_builds,
                }
            })
            .collect();

        // Sort by last run timestamp (most recent first)
        job_infos.sort_by(|a, b| b.last_run_timestamp.cmp(&a.last_run_timestamp));

        // Apply limit if specified
        if let Some(limit) = limit {
            job_infos.truncate(limit);
        }

        Ok(job_infos)
    }

    /// Show detailed information about a specific job
    ///
    /// Returns all execution runs for the specified job label, including
    /// detailed timing, status, and output information.
    pub async fn show(&self, job_label: &str) -> Result<Option<(JobInfo, Vec<JobRunDetail>)>> {
        // Get all job events for this specific job
        let events = self.query_engine.get_events_in_range(0, i64::MAX).await?;

        let mut job_runs: Vec<JobRunDetail> = Vec::new();

        // Collect all job events for this job label
        for event in events {
            if let Some(build_event::EventType::JobEvent(j_event)) = &event.event_type {
                let event_job_label = j_event.job_label.as_ref()
                    .map(|l| l.label.clone())
                    .unwrap_or_else(|| "unknown".to_string());

                if event_job_label != job_label {
                    continue;
                }

                let status = match j_event.status_code {
                    1 => JobStatus::JobScheduled,
                    2 => JobStatus::JobRunning,
                    3 => JobStatus::JobCompleted,
                    4 => JobStatus::JobFailed,
                    5 => JobStatus::JobCancelled,
                    6 => JobStatus::JobSkipped,
                    _ => JobStatus::JobUnknown,
                };

                // Find existing run or create new one
                if let Some(existing_run) = job_runs.iter_mut().find(|r| r.job_run_id == j_event.job_run_id) {
                    // Update existing run with new status
                    existing_run.status = status;
                    existing_run.message = j_event.message.clone();

                    match status {
                        JobStatus::JobRunning => {
                            existing_run.started_at = Some(event.timestamp);
                        }
                        JobStatus::JobCompleted | JobStatus::JobFailed | JobStatus::JobCancelled => {
                            existing_run.completed_at = Some(event.timestamp);
                            if let Some(started) = existing_run.started_at {
                                existing_run.duration_ms = Some((event.timestamp - started) / 1_000_000); // Convert to ms
                            }
                            existing_run.manifests = j_event.manifests.clone();
                        }
                        _ => {}
                    }
                } else {
                    // Create new job run
                    let job_run = JobRunDetail {
                        job_run_id: j_event.job_run_id.clone(),
                        job_label: job_label.to_string(),
                        build_request_id: event.build_request_id.clone(),
                        target_partitions: j_event.target_partitions.clone(),
                        status,
                        scheduled_at: event.timestamp,
                        started_at: if status == JobStatus::JobRunning { Some(event.timestamp) } else { None },
                        completed_at: None,
                        duration_ms: None,
                        message: j_event.message.clone(),
                        config: j_event.config.clone(),
                        manifests: j_event.manifests.clone(),
                    };
                    job_runs.push(job_run);
                }
            }
        }

        if job_runs.is_empty() {
            return Ok(None);
        }

        // Sort runs by scheduled time (most recent first)
        job_runs.sort_by(|a, b| b.scheduled_at.cmp(&a.scheduled_at));

        // Calculate job statistics
        let total_runs = job_runs.len();
        let successful_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobCompleted).count();
        let failed_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobFailed).count();
        let cancelled_runs = job_runs.iter().filter(|r| r.status == JobStatus::JobCancelled).count();

        let (last_run_timestamp, last_run_status) = job_runs.iter()
            .max_by_key(|r| r.scheduled_at)
            .map(|r| (r.scheduled_at, r.status.clone()))
            .unwrap_or((0, JobStatus::JobUnknown));

        let total_partitions: usize = job_runs.iter()
            .map(|r| r.target_partitions.len())
            .sum();
        let average_partitions_per_run = if total_runs > 0 {
            total_partitions as f64 / total_runs as f64
        } else {
            0.0
        };

        // Get recent unique build request IDs
        let mut recent_builds: Vec<String> = job_runs.iter()
            .map(|r| r.build_request_id.clone())
            .collect::<std::collections::HashSet<_>>()
            .into_iter()
            .collect();
        recent_builds.sort();
        recent_builds.truncate(10); // Keep last 10 builds

        let job_info = JobInfo {
            job_label: job_label.to_string(),
            total_runs,
            successful_runs,
            failed_runs,
            cancelled_runs,
            last_run_timestamp,
            last_run_status,
            average_partitions_per_run,
            recent_builds,
        };

        Ok(Some((job_info, job_runs)))
    }

    /// Show detailed information about a specific job using protobuf response format
    ///
    /// Returns the complete job details with dual status fields and run details.
    pub async fn show_protobuf(&self, job_label: &str) -> Result<Option<JobDetailResponse>> {
        // Get job info and runs using existing show method
        if let Some((job_info, job_runs)) = self.show(job_label).await? {
            // Convert job runs to protobuf format
            let protobuf_runs: Vec<ServiceJobRunDetail> = job_runs
                .into_iter()
                .map(|run| ServiceJobRunDetail {
                    job_run_id: run.job_run_id,
                    build_request_id: run.build_request_id,
                    target_partitions: run.target_partitions,
                    status_code: run.status as i32,
                    status_name: run.status.to_display_string(),
                    started_at: run.started_at,
                    completed_at: run.completed_at,
                    duration_ms: run.duration_ms,
                    message: run.message,
                })
                .collect();

            let response = JobDetailResponse {
                job_label: job_info.job_label,
                total_runs: job_info.total_runs as u32,
                successful_runs: job_info.successful_runs as u32,
                failed_runs: job_info.failed_runs as u32,
                cancelled_runs: job_info.cancelled_runs as u32,
                average_partitions_per_run: job_info.average_partitions_per_run,
                last_run_timestamp: job_info.last_run_timestamp,
                last_run_status_code: job_info.last_run_status as i32,
                last_run_status_name: job_info.last_run_status.to_display_string(),
                recent_builds: job_info.recent_builds,
                runs: protobuf_runs,
            };

            Ok(Some(response))
        } else {
            Ok(None)
        }
    }

    /// List jobs using protobuf response format with dual status fields
    ///
    /// Returns JobsListResponse protobuf message with JobSummary objects containing
    /// last_run_status_code and last_run_status_name fields.
    pub async fn list_protobuf(&self, request: JobsListRequest) -> Result<JobsListResponse> {
        // Get job info using existing list method
        let jobs = self.list(request.limit.map(|l| l as usize)).await?;

        // Convert to protobuf format
        let protobuf_jobs: Vec<crate::JobSummary> = jobs
            .into_iter()
            .map(|job| crate::JobSummary {
                job_label: job.job_label,
                total_runs: job.total_runs as u32,
                successful_runs: job.successful_runs as u32,
                failed_runs: job.failed_runs as u32,
                cancelled_runs: job.cancelled_runs as u32,
                average_partitions_per_run: job.average_partitions_per_run,
                last_run_timestamp: job.last_run_timestamp,
                last_run_status_code: job.last_run_status as i32,
                last_run_status_name: job.last_run_status.to_display_string(),
                recent_builds: job.recent_builds,
            })
            .collect();

        let total_count = protobuf_jobs.len() as u32;

        Ok(JobsListResponse {
            jobs: protobuf_jobs,
            total_count,
        })
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::event_log::mock::{create_mock_bel_query_engine, create_mock_bel_query_engine_with_events, test_events};

    #[tokio::test]
    async fn test_jobs_repository_list_empty() {
        let query_engine = create_mock_bel_query_engine().await.unwrap();
        let repo = JobsRepository::new(query_engine);

        let jobs = repo.list(None).await.unwrap();
        assert!(jobs.is_empty());
    }

    #[tokio::test]
    async fn test_jobs_repository_list_with_data() {
        let build_id = "test-build-123".to_string();
        let job_label1 = JobLabel { label: "//:process_data".to_string() };
        let job_label2 = JobLabel { label: "//:generate_reports".to_string() };
        let partition1 = PartitionRef { str: "data/users".to_string() };
        let partition2 = PartitionRef { str: "reports/summary".to_string() };

        // Create events for multiple jobs
        let events = vec![
            test_events::job_event(Some(build_id.clone()), Some("job-run-1".to_string()), job_label1.clone(), vec![partition1.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("job-run-1".to_string()), job_label1.clone(), vec![partition1.clone()], JobStatus::JobCompleted),
            test_events::job_event(Some(build_id.clone()), Some("job-run-2".to_string()), job_label2.clone(), vec![partition2.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("job-run-2".to_string()), job_label2.clone(), vec![partition2.clone()], JobStatus::JobFailed),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = JobsRepository::new(query_engine);

        let jobs = repo.list(None).await.unwrap();
        assert_eq!(jobs.len(), 2);

        // Find jobs by label
        let process_job = jobs.iter().find(|j| j.job_label == "//:process_data").unwrap();
        let reports_job = jobs.iter().find(|j| j.job_label == "//:generate_reports").unwrap();

        assert_eq!(process_job.total_runs, 1);
        assert_eq!(process_job.successful_runs, 1);
        assert_eq!(process_job.failed_runs, 0);
        assert_eq!(process_job.last_run_status, JobStatus::JobCompleted);

        assert_eq!(reports_job.total_runs, 1);
        assert_eq!(reports_job.successful_runs, 0);
        assert_eq!(reports_job.failed_runs, 1);
        assert_eq!(reports_job.last_run_status, JobStatus::JobFailed);
    }

    #[tokio::test]
    async fn test_jobs_repository_show() {
        let build_id = "test-build-456".to_string();
        let job_label = JobLabel { label: "//:analytics_job".to_string() };
        let partition = PartitionRef { str: "analytics/daily".to_string() };

        let events = vec![
            test_events::job_event(Some(build_id.clone()), Some("job-run-123".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("job-run-123".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobRunning),
            test_events::job_event(Some(build_id.clone()), Some("job-run-123".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobCompleted),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = JobsRepository::new(query_engine);

        let result = repo.show(&job_label.label).await.unwrap();
        assert!(result.is_some());

        let (info, runs) = result.unwrap();
        assert_eq!(info.job_label, "//:analytics_job");
        assert_eq!(info.total_runs, 1);
        assert_eq!(info.successful_runs, 1);
        assert_eq!(info.last_run_status, JobStatus::JobCompleted);

        assert_eq!(runs.len(), 1);
        let run = &runs[0];
        assert_eq!(run.job_run_id, "job-run-123");
        assert_eq!(run.status, JobStatus::JobCompleted);
        assert_eq!(run.target_partitions.len(), 1);
        assert_eq!(run.target_partitions[0].str, "analytics/daily");
    }

    #[tokio::test]
    async fn test_jobs_repository_show_nonexistent() {
        let query_engine = create_mock_bel_query_engine().await.unwrap();
        let repo = JobsRepository::new(query_engine);

        let result = repo.show("//:nonexistent_job").await.unwrap();
        assert!(result.is_none());
    }

    #[tokio::test]
    async fn test_jobs_repository_statistics() {
        let build_id = "test-build-789".to_string();
        let job_label = JobLabel { label: "//:batch_processor".to_string() };
        let partition = PartitionRef { str: "batch/data".to_string() };

        // Create multiple runs with different outcomes
        let events = vec![
            // First run - successful
            test_events::job_event(Some(build_id.clone()), Some("run-1".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("run-1".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobCompleted),
            // Second run - failed
            test_events::job_event(Some(build_id.clone()), Some("run-2".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("run-2".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobFailed),
            // Third run - cancelled
            test_events::job_event(Some(build_id.clone()), Some("run-3".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobScheduled),
            test_events::job_event(Some(build_id.clone()), Some("run-3".to_string()), job_label.clone(), vec![partition.clone()], JobStatus::JobCancelled),
        ];

        let query_engine = create_mock_bel_query_engine_with_events(events).await.unwrap();
        let repo = JobsRepository::new(query_engine);

        let result = repo.show(&job_label.label).await.unwrap();
        assert!(result.is_some());

        let (info, _runs) = result.unwrap();
        assert_eq!(info.total_runs, 3);
        assert_eq!(info.successful_runs, 1);
        assert_eq!(info.failed_runs, 1);
        assert_eq!(info.cancelled_runs, 1);
        assert_eq!(info.average_partitions_per_run, 1.0);
    }
}