Executor
Executors act as a job execution abstraction layer that adapts the graph service to the different platforms on which jobs can run (e.g. local processes, containers, Kubernetes, cloud container services, Databricks/EMR).
Capabilities
- stdout/stderr capture
- producing job BEL events
- parsing missing upstream partition deps
- heartbeating - allows the graph to determine which jobs are still live
- job re-entrance
Job Lifecycle
stateDiagram-v2
[*] --> Buffering
Buffering --> Queued : collecting other wants
Queued --> Running : scheduled
Running --> Running : heartbeat
Running --> Failure
Buffering --> Canceled
Queued --> Canceled
Running --> Canceled
Canceled --> [*] : will not retry
Running --> MissingDeps
Running --> Success
MissingDeps --> [*] : await deps to rerun
Failure --> [*] : retry according \n to policy
Success --> [*]
At each state transition, the executor emits a BEL event to the graph.
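The lifecycle above can be sketched as a small transition table. This is a minimal illustration, not the executor's actual implementation: the `JobState` enum, the `TRANSITIONS` table, the `transition` helper, and the shape of the emitted event dict are all hypothetical names chosen for this example. Only the states and arrows come from the diagram.

```python
from enum import Enum


class JobState(Enum):
    BUFFERING = "Buffering"
    QUEUED = "Queued"
    RUNNING = "Running"
    MISSING_DEPS = "MissingDeps"
    CANCELED = "Canceled"
    SUCCESS = "Success"
    FAILURE = "Failure"


# Allowed transitions, copied arrow by arrow from the lifecycle diagram.
# Terminal states map to an empty set.
TRANSITIONS = {
    JobState.BUFFERING: {JobState.QUEUED, JobState.CANCELED},
    JobState.QUEUED: {JobState.RUNNING, JobState.CANCELED},
    JobState.RUNNING: {
        JobState.RUNNING,  # heartbeat self-transition
        JobState.FAILURE,
        JobState.CANCELED,
        JobState.MISSING_DEPS,
        JobState.SUCCESS,
    },
    JobState.MISSING_DEPS: set(),
    JobState.CANCELED: set(),
    JobState.SUCCESS: set(),
    JobState.FAILURE: set(),
}


def transition(current, target, emit):
    """Validate a state change and emit one event for it.

    `emit` is a hypothetical callback standing in for "write a BEL event";
    the event payload here is illustrative only.
    """
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    emit({"from": current.value, "to": target.value})
    return target
```

Keeping the table as plain data makes it easy to compare against the diagram and to reject transitions the diagram does not allow, such as leaving a terminal state.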
Buffering
Applies to jobs that buffer. Non-buffering jobs still emit Buffering but move to Queued immediately. The state is signified by a BEL event carrying the buffering start timestamp and other details relevant to when the job can be queued.
Queued
The job run will be launched as soon as constraints allow (e.g. pool slots).
Running
The job run is active, as indicated by continual heartbeating. In this state, the executor will capture logs to disk.
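One way the graph could use heartbeats to decide which job runs are still live is a timeout check against the last heartbeat seen. The sketch below assumes a fixed timeout and string job run IDs; the `HeartbeatTracker` class, its method names, and the timeout value are hypothetical and not taken from the executor itself.

```python
class HeartbeatTracker:
    """Tracks the most recent heartbeat time per job run (illustrative only)."""

    def __init__(self, timeout_s):
        # Assumed policy: a run is live if it has heartbeated within timeout_s.
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, job_run_id, now):
        """Record a heartbeat for a job run at time `now` (seconds)."""
        self._last_seen[job_run_id] = now

    def live_jobs(self, now):
        """Return the IDs of job runs whose last heartbeat is recent enough."""
        return {
            job_run_id
            for job_run_id, seen in self._last_seen.items()
            if now - seen <= self.timeout_s
        }
```

Passing `now` explicitly, rather than reading the clock inside the class, keeps the check deterministic and easy to test.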
MissingDeps
The job run has emitted a __DATABUILD_ERROR__::{...} line on stdout; the executor will emit a missing deps event.
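Detecting that line in captured stdout might look like the sketch below. It assumes the `{...}` portion after the `__DATABUILD_ERROR__::` marker is a JSON object, which the text suggests but does not state. The function name `find_missing_deps_payload` is hypothetical, and the payload's fields are left uninterpreted because they are not specified here.

```python
import json

MARKER = "__DATABUILD_ERROR__::"


def find_missing_deps_payload(stdout_lines):
    """Return the first parsed marker payload found in stdout, or None.

    Assumption: the text following the marker is a JSON object.
    Lines that carry the marker but do not parse as a JSON object are skipped.
    """
    for line in stdout_lines:
        line = line.strip()
        if not line.startswith(MARKER):
            continue
        try:
            payload = json.loads(line[len(MARKER):])
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    return None
```

A `None` result means no marker line was found, so the executor would fall through to its normal success or failure handling.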
Canceled
The job run was explicitly canceled; the executor emits a canceled event along with details.
Success
The job run has succeeded; the executor emits events with the written partitions.
Failure
The job run has failed. The run will be retried according to the