databuild/design/glossary.md
2025-07-26 00:48:18 -07:00

Job

Atomic unit of work, producing and consuming specific partitions. See jobs.

Graph

Composes jobs to build partitions. See graphs.

Partition

Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but multiple jobs cannot produce the same partition; i.e. job -> partition relationships must be unique/canonical.
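The uniqueness rule above can be checked mechanically. The following is a minimal sketch, not databuild's actual validation: the function name `find_conflicting_producers`, the job names, and the partition strings are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def find_conflicting_producers(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every partition claimed by more than one job, with the claiming jobs."""
    producers = defaultdict(list)
    for job, partitions in job_outputs.items():
        for partition in partitions:
            producers[partition].append(job)
    # A canonical mapping has exactly one producer per partition.
    return {p: jobs for p, jobs in producers.items() if len(jobs) > 1}


# Hypothetical jobs: both claim the 2025-01-02 partition, which violates the rule.
job_outputs = {
    "ingest_clicks": ["clicks/date=2025-01-01", "clicks/date=2025-01-02"],
    "backfill_clicks": ["clicks/date=2025-01-02"],
}
conflicts = find_conflicting_producers(job_outputs)
```

An empty result means every partition has a single canonical producer; a non-empty one names the partitions that would need to be reassigned.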

PartitionRef

PartitionRefs are strings that uniquely identify partitions. They can contain anything, but are generally S3 URIs, like s3://companybkt/datasets/foo/date=2025-01-01, or custom formats like dal://prod/clicks/region=4/date=2025-01-01/. PartitionRefs are used as dependency signals during task graph analysis. To keep coupling explicit and usage ergonomic, GSLs generally provide helper classes for creating and parsing PartitionRefs and for accessing their fields.
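As an illustration of the kind of helper class mentioned above, here is a sketch in Python built around the dal:// example format. The class `ClicksPartition` and its methods `to_ref` and `from_ref` are assumptions made for this example, not part of any databuild GSL.

```python
import re
from dataclasses import dataclass

# Hypothetical layout, taken from the dal:// example in the text.
_CLICKS_REF = re.compile(
    r"^dal://prod/clicks/region=(?P<region>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/$"
)


@dataclass(frozen=True)
class ClicksPartition:
    """Typed view over a clicks PartitionRef (illustrative only)."""

    region: int
    date: str  # ISO date, e.g. "2025-01-01"

    def to_ref(self) -> str:
        # Build the canonical string form of this partition.
        return f"dal://prod/clicks/region={self.region}/date={self.date}/"

    @classmethod
    def from_ref(cls, ref: str) -> "ClicksPartition":
        # Parse a ref string, rejecting anything that is not a clicks partition.
        match = _CLICKS_REF.match(ref)
        if match is None:
            raise ValueError(f"not a clicks partition ref: {ref!r}")
        return cls(region=int(match["region"]), date=match["date"])


partition = ClicksPartition.from_ref("dal://prod/clicks/region=4/date=2025-01-01/")
```

Keeping construction and parsing in one class means the string format is defined in a single place, so jobs that depend on the partition work with fields such as `partition.region` rather than splitting strings by hand.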

PartitionPattern

Patterns that group partitions (e.g. a dataset) and allow for validation (e.g. does this job actually produce the expected output partition?).
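One way to picture the validation use case is a regular expression over PartitionRef strings. This is only a sketch under that assumption: the glossary does not say how patterns are actually represented, and `FOO_DATASET` and `unexpected_outputs` are hypothetical names.

```python
import re

# Hypothetical pattern covering every date partition of the "foo" dataset.
FOO_DATASET = re.compile(r"s3://companybkt/datasets/foo/date=\d{4}-\d{2}-\d{2}")


def unexpected_outputs(pattern: re.Pattern, produced_refs: list[str]) -> list[str]:
    """Return the refs a job produced that fall outside its declared pattern."""
    return [ref for ref in produced_refs if pattern.fullmatch(ref) is None]


produced = [
    "s3://companybkt/datasets/foo/date=2025-01-01",
    "s3://companybkt/datasets/bar/date=2025-01-01",
]
stray = unexpected_outputs(FOO_DATASET, produced)
```

Here `stray` holds the ref from the bar dataset, which the job was not declared to produce.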

JobConfig

The complete configuration of a job needed to produce the desired partitions, as calculated by job.config.

JobGraph

A complete graph of job configs, with PartitionRef dependency edges, which when executed will produce the requested partitions.
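To make the idea of PartitionRef dependency edges concrete, the sketch below derives an execution order from a set of job configs. The `JobConfig` fields shown here (`name`, `inputs`, `outputs`) and the function `execution_order` are assumptions for illustration, not databuild's real schema. Ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical, simplified job config: a name plus input and output PartitionRefs."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def execution_order(configs: list[JobConfig]) -> list[str]:
    """Order jobs so that each runs after the jobs producing its input partitions."""
    # Each partition has exactly one producer, so this mapping is well defined.
    producer = {ref: cfg.name for cfg in configs for ref in cfg.outputs}
    sorter = TopologicalSorter()
    for cfg in configs:
        # Inputs with no producer in the graph are assumed to exist already.
        upstream = {producer[ref] for ref in cfg.inputs if ref in producer}
        sorter.add(cfg.name, *upstream)
    return list(sorter.static_order())


configs = [
    JobConfig("report", inputs=("clean/date=2025-01-01",), outputs=("report/date=2025-01-01",)),
    JobConfig("clean", inputs=("raw/date=2025-01-01",), outputs=("clean/date=2025-01-01",)),
    JobConfig("ingest", inputs=(), outputs=("raw/date=2025-01-01",)),
]
order = execution_order(configs)
```

The edges are never declared directly: they fall out of matching each job's input refs against other jobs' output refs, which is why the refs act as dependency signals.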

Graph Specification Language (GSL)

Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic. See graph specification.