databuild/docs/design/executor.md
Stuart Axelbrooke ea83610d35
A lot of refactoring
2025-09-27 15:29:22 -07:00


Executor

Executors act as a job execution abstraction layer that adapts the graph service to the different platforms on which jobs can run (e.g. local processes, containers, Kubernetes, cloud container services, Databricks/EMR, etc.).

Capabilities

  • stdout/stderr capture
  • producing job BEL events
  • parsing missing upstream partition deps
  • heartbeating, which lets the graph determine which jobs are still live
  • job re-entrance
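A minimal sketch of how the abstraction layer could look, assuming a Python interface. The names `Executor`, `launch`, `is_alive`, `cancel`, `collect_output`, and `LocalProcessExecutor` are hypothetical and not databuild's actual API. Each platform adapter implements the same small surface, so the graph service does not need to know whether a job runs as a local process or on a cloud service; the subprocess-backed class covers only the "local processes" case.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Executor(ABC):
    """Hypothetical platform adapter: one subclass per execution platform."""

    @abstractmethod
    def launch(self, job_id: str, argv: list[str]) -> None:
        """Start a job run on the target platform."""

    @abstractmethod
    def is_alive(self, job_id: str) -> bool:
        """Liveness probe that backs heartbeating."""

    @abstractmethod
    def cancel(self, job_id: str) -> None:
        """Stop a job run."""

    @abstractmethod
    def collect_output(self, job_id: str) -> tuple[str, str]:
        """Wait for the run to finish and return (stdout, stderr)."""


class LocalProcessExecutor(Executor):
    """Runs each job as a local subprocess with stdout/stderr captured."""

    def __init__(self) -> None:
        self._procs: dict[str, subprocess.Popen] = {}

    def launch(self, job_id: str, argv: list[str]) -> None:
        self._procs[job_id] = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )

    def is_alive(self, job_id: str) -> bool:
        # poll() returns None while the process is still running.
        return self._procs[job_id].poll() is None

    def cancel(self, job_id: str) -> None:
        self._procs[job_id].terminate()

    def collect_output(self, job_id: str) -> tuple[str, str]:
        # communicate() drains both pipes and waits for the process to exit.
        return self._procs[job_id].communicate()


if __name__ == "__main__":
    ex = LocalProcessExecutor()
    ex.launch("job-1", [sys.executable, "-c", "print('hello')"])
    stdout, _stderr = ex.collect_output("job-1")
    print(stdout.strip())  # hello
```

A container or Kubernetes adapter would implement the same four methods against that platform's own client instead of `subprocess`.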

Job Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Buffering
    Buffering --> Queued : collecting other wants
    Queued --> Running : scheduled
    Running --> Running : heartbeat
    Running --> Failure
    Buffering --> Canceled
    Queued --> Canceled
    Running --> Canceled
    Canceled --> [*] : will not retry
    Running --> MissingDeps
    Running --> Success
    MissingDeps --> [*] : await deps to rerun
    Failure --> [*] : retry according \n to policy
    Success --> [*]
```

At each state transition, the executor emits a BEL event to the graph.
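The transitions in the state diagram can be encoded as an explicit table that the executor checks before emitting an event. This is a minimal sketch; the `JobRun` class, the `emit` callback, and the event dictionary shape are hypothetical and are not databuild's actual BEL schema.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    BUFFERING = "Buffering"
    QUEUED = "Queued"
    RUNNING = "Running"
    MISSING_DEPS = "MissingDeps"
    CANCELED = "Canceled"
    SUCCESS = "Success"
    FAILURE = "Failure"


# Allowed transitions, mirroring the state diagram.
TRANSITIONS: dict[State, set[State]] = {
    State.BUFFERING: {State.QUEUED, State.CANCELED},
    State.QUEUED: {State.RUNNING, State.CANCELED},
    # Running -> Running is the heartbeat self-transition.
    State.RUNNING: {State.RUNNING, State.FAILURE, State.CANCELED,
                    State.MISSING_DEPS, State.SUCCESS},
    # Terminal states have no outgoing transitions.
    State.MISSING_DEPS: set(),
    State.CANCELED: set(),
    State.SUCCESS: set(),
    State.FAILURE: set(),
}


class JobRun:
    def __init__(self, job_id: str, emit: Callable[[dict], None]) -> None:
        self.job_id = job_id
        self.emit = emit
        self.state = State.BUFFERING
        # Every run starts in Buffering, so announce that state first.
        emit({"job_id": job_id, "from": None, "to": self.state.value})

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        # Emit the event for this transition, then record the new state.
        self.emit({"job_id": self.job_id, "from": self.state.value,
                   "to": new_state.value})
        self.state = new_state
```

Giving the terminal states no outgoing edges is one reading of the diagram's `[*]` exits: a retry after Failure or a rerun after MissingDeps would start a fresh run rather than rewind this one.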

Buffering

Applies only to jobs that buffer; non-buffering jobs emit Buffering but immediately move to Queued. This state is signified by a BEL event carrying the buffering start timestamp and any other details relevant to deciding when the job can be queued.
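A minimal sketch of the buffering decision, assuming a fixed buffering window measured from the buffering start timestamp. The `BufferingPolicy` and `BufferedJob` names and the `window_s` parameter are hypothetical. A window of zero models a non-buffering job, which passes straight through Buffering to Queued.

```python
from dataclasses import dataclass, field


@dataclass
class BufferingPolicy:
    window_s: float  # 0 means the job does not buffer


@dataclass
class BufferedJob:
    policy: BufferingPolicy
    started_at: float  # buffering start timestamp, in seconds
    wants: list[str] = field(default_factory=list)

    def add_want(self, want_id: str) -> None:
        # Wants arriving during the window are collected into a single run.
        self.wants.append(want_id)

    def ready_to_queue(self, now: float) -> bool:
        # The job may move to Queued once its window has elapsed.
        return now - self.started_at >= self.policy.window_s
```

Passing `now` explicitly, rather than reading a clock inside the method, keeps the decision deterministic and easy to test.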

Queued

The job run will be launched as soon as constraints allow (pool slots, etc.).
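Launching "as soon as constraints allow" can be sketched as a FIFO queue gated by a fixed number of pool slots. This assumes a single pool and first-in-first-out ordering; `SlotPool` and its methods are hypothetical, and real constraints may involve several pools or priorities.

```python
from collections import deque


class SlotPool:
    def __init__(self, slots: int) -> None:
        self.free = slots
        self.queue: deque[str] = deque()

    def enqueue(self, job_id: str) -> list[str]:
        """Queue a job run; return the job ids that can launch right now."""
        self.queue.append(job_id)
        return self._drain()

    def release(self) -> list[str]:
        """A running job finished: free its slot and launch what now fits."""
        self.free += 1
        return self._drain()

    def _drain(self) -> list[str]:
        launched = []
        # Launch queued jobs in arrival order while slots remain.
        while self.queue and self.free > 0:
            self.free -= 1
            launched.append(self.queue.popleft())
        return launched
```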

Running

The job run is active, as indicated by continual heartbeating. In this state, the executor will capture logs to disk.
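On the graph side, continual heartbeating suggests a liveness check based on the time since each job's last heartbeat. A minimal sketch, assuming a fixed timeout; `HeartbeatTracker` and `timeout_s` are hypothetical names, and a suitable timeout would depend on the platform.

```python
class HeartbeatTracker:
    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, job_id: str, now: float) -> None:
        # Record the most recent heartbeat time for this job run.
        self.last_seen[job_id] = now

    def live_jobs(self, now: float) -> set[str]:
        # A job is live if it heartbeated within the timeout window.
        return {j for j, t in self.last_seen.items() if now - t <= self.timeout_s}

    def stale_jobs(self, now: float) -> set[str]:
        # Jobs that have gone quiet and may need to be treated as lost.
        return set(self.last_seen) - self.live_jobs(now)
```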

MissingDeps

The job run has emitted the `__DATABUILD_ERROR__::{...}` line on stdout, so the executor emits a missing-deps event.
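Detecting this signal amounts to scanning stdout for the marker prefix and decoding the payload after it. A minimal sketch, assuming the `{...}` payload is a single-line JSON object; the `missing_deps` key and the `parse_missing_deps` helper are hypothetical, since the actual payload schema is not specified here.

```python
import json
from typing import Iterable, Optional

MARKER = "__DATABUILD_ERROR__::"


def parse_missing_deps(lines: Iterable[str]) -> Optional[list[str]]:
    """Return missing partitions from the first marker line, or None."""
    for line in lines:
        if line.startswith(MARKER):
            # Everything after the marker is assumed to be a JSON object.
            payload = json.loads(line[len(MARKER):])
            # Hypothetical schema: {"missing_deps": ["partition/ref", ...]}
            return list(payload.get("missing_deps", []))
    return None
```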

Canceled

The job run was explicitly canceled; the executor emits a canceled event along with details.

Success

The job run has succeeded; the executor emits events listing the written partitions.

Failure

The job run has failed. The run will be retried according to the retry policy.
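One simple shape for a retry policy is a maximum attempt count combined with exponential backoff. This is a minimal sketch under that assumption; `RetryPolicy` and its fields are hypothetical, and this section does not specify which policy databuild applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3       # total runs allowed, including the first
    base_delay_s: float = 10.0  # delay before the first retry
    multiplier: float = 2.0     # backoff growth factor per retry

    def next_delay(self, attempts_so_far: int) -> Optional[float]:
        """Delay before the next retry (attempts_so_far >= 1), or None when exhausted."""
        if attempts_so_far >= self.max_attempts:
            return None
        return self.base_delay_s * self.multiplier ** (attempts_so_far - 1)
```

With the defaults, a job that fails twice waits 10 s and then 20 s before its retries, and a third failure is final.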