# Executor

Executors act as a job execution abstraction layer, adapting the graph service to the different platforms on which jobs can run (e.g. local processes, containers, Kubernetes, cloud container services, Databricks/EMR).

## Capabilities

- stdout/stderr capture
- producing job BEL events
- parsing missing upstream partition deps
- heartbeating, which lets the graph determine which jobs are still live
- job re-entrance

## Job Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Buffering
    Buffering --> Queued : collecting other wants
    Queued --> Running : scheduled
    Running --> Running : heartbeat
    Running --> Failure
    Buffering --> Canceled
    Queued --> Canceled
    Running --> Canceled
    Canceled --> [*] : will not retry
    Running --> MissingDeps
    Running --> Success
    MissingDeps --> [*] : await deps to rerun
    Failure --> [*] : retry according \n to policy
    Success --> [*]
```

At each state transition, the executor emits a BEL event to the graph.
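
The lifecycle in the diagram can be read as a small state machine. The following is a minimal sketch, not the executor's actual implementation: the `JobState`, `TRANSITIONS`, and `JobRun` names, the `emit` callback, and the `{"state": ...}` event shape are all hypothetical stand-ins for emitting a BEL event. Only the states and transitions are taken from the diagram.

```python
from enum import Enum


class JobState(Enum):
    BUFFERING = "Buffering"
    QUEUED = "Queued"
    RUNNING = "Running"
    MISSING_DEPS = "MissingDeps"
    CANCELED = "Canceled"
    SUCCESS = "Success"
    FAILURE = "Failure"


# Allowed transitions, copied from the state diagram.
# Running -> Running is the heartbeat self-transition.
TRANSITIONS = {
    JobState.BUFFERING: {JobState.QUEUED, JobState.CANCELED},
    JobState.QUEUED: {JobState.RUNNING, JobState.CANCELED},
    JobState.RUNNING: {
        JobState.RUNNING,
        JobState.MISSING_DEPS,
        JobState.CANCELED,
        JobState.SUCCESS,
        JobState.FAILURE,
    },
    # Terminal states have no outgoing transitions.
    JobState.MISSING_DEPS: set(),
    JobState.CANCELED: set(),
    JobState.SUCCESS: set(),
    JobState.FAILURE: set(),
}


class JobRun:
    """Hypothetical job run that emits one event per state transition."""

    def __init__(self, emit):
        self.emit = emit  # callable standing in for "emit a BEL event" (assumption)
        self.state = JobState.BUFFERING
        self.emit({"state": self.state.value})  # [*] --> Buffering

    def transition(self, new_state):
        # Reject any transition that is not drawn in the diagram.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.emit({"state": new_state.value})
```

Passing `events.append` as `emit` collects the emitted events in a list, which is a convenient way to check that a run produced the expected sequence of states.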

### Buffering

This state applies to jobs that buffer; non-buffering jobs still emit `Buffering` but immediately move to `Queued`. It is signified by a BEL event carrying the buffering start timestamp and other details relevant to when the job can be queued.

### Queued

The job run will be launched as soon as constraints (pool slots, etc.) allow.

### Running

The job run is active, as indicated by continual heartbeating. In this state, the executor captures logs to disk.
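
One way the graph could use heartbeats to decide which runs are still live is a simple timeout check. This is a sketch under stated assumptions: `HeartbeatTracker`, its method names, and the 30 second default timeout are hypothetical and not taken from the executor itself.

```python
import time


class HeartbeatTracker:
    """Records the last heartbeat per job run and reports which runs look live."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed value; the real threshold is not specified
        self.clock = clock          # injectable so the check can be exercised without sleeping
        self.last_seen = {}

    def heartbeat(self, job_run_id):
        # Called whenever a heartbeat arrives for the given run.
        self.last_seen[job_run_id] = self.clock()

    def is_live(self, job_run_id):
        # A run is live if it has heartbeated within the timeout window.
        seen = self.last_seen.get(job_run_id)
        return seen is not None and self.clock() - seen <= self.timeout_s
```

A monotonic clock is used here so that wall-clock adjustments cannot make a healthy run look dead.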

### MissingDeps

The job run has emitted the `__DATABUILD_ERROR__::{...}` line on stdout; the executor will emit a missing deps event.
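
Detecting the marker line in captured stdout might look like the sketch below. It assumes that the text after `__DATABUILD_ERROR__::` is a single-line JSON object, which the `{...}` suggests but this document does not confirm. The function name `find_error_payloads` is hypothetical, and no particular payload fields are assumed.

```python
import json

MARKER = "__DATABUILD_ERROR__::"


def find_error_payloads(stdout_text):
    """Return the decoded payload of every marker line in captured stdout."""
    payloads = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith(MARKER):
            continue
        try:
            # Assumption: everything after the marker is one JSON document.
            payloads.append(json.loads(line[len(MARKER):]))
        except json.JSONDecodeError:
            # Malformed payload: skip it rather than failing log capture.
            continue
    return payloads
```

Because the function only decodes whatever follows the marker, it keeps working if the payload schema changes; interpreting the payload as missing deps is left to the caller.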

### Canceled

The job run was explicitly canceled; the executor emits a canceled event along with details.

### Success

The job run has succeeded; the executor emits events with the written partitions.

### Failure

The job run has failed. The run will be retried according to the