databuild/docs/design/glossary.md
Stuart Axelbrooke ea83610d35
A lot of refactoring
2025-09-27 15:29:22 -07:00

Job

Atomic unit of work, producing and consuming specific partitions. See jobs.

Graph

Composes jobs to build partitions. See graphs.

Partition

Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but multiple jobs cannot produce the same partition; i.e. job -> partition relationships must be unique/canonical.
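The uniqueness rule can be illustrated with a small check. This is only a sketch under assumed names: `find_conflicting_producers`, the job names, and the plain dict mapping jobs to output refs are hypothetical and are not part of databuild's API.

```python
from collections import defaultdict


def find_conflicting_producers(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every partition ref that more than one job claims to produce.

    `job_outputs` maps a (hypothetical) job name to the partition refs it produces.
    """
    producers: dict[str, list[str]] = defaultdict(list)
    for job, refs in job_outputs.items():
        # set() so a job that lists the same ref twice is not counted twice.
        for ref in set(refs):
            producers[ref].append(job)
    return {ref: jobs for ref, jobs in producers.items() if len(jobs) > 1}


# One job producing several partitions is fine; two jobs sharing one is not.
job_outputs = {
    "ingest_clicks": ["clicks/date=2025-01-01", "clicks/date=2025-01-02"],
    "backfill_clicks": ["clicks/date=2025-01-01"],
}
conflicts = find_conflicting_producers(job_outputs)
```

An empty result means every partition has exactly one canonical producer.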

PartitionRef

PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3 URIs, like s3://companybkt/datasets/foo/date=2025-01-01, or custom formats like dal://prod/clicks/region=4/date=2025-01-01/. PartitionRefs are used as dependency signals during task graph analysis. To make this coupling explicit and ergonomic, GDLs generally provide helper classes for creating, parsing, and accessing the fields of PartitionRefs.
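As an illustration of what such a helper class might look like, here is a minimal sketch for the dal:// example above. The class name `ClicksPartition`, its fields, and its methods are assumptions made for this example; they do not describe an actual GDL API.

```python
from dataclasses import dataclass
from datetime import date

# Assumed prefix, taken from the dal:// example ref in this glossary.
_PREFIX = "dal://prod/clicks/"


@dataclass(frozen=True)
class ClicksPartition:
    """Hypothetical typed wrapper around one kind of PartitionRef."""

    region: int
    day: date

    def to_ref(self) -> str:
        """Render the fields back into the string ref."""
        return f"{_PREFIX}region={self.region}/date={self.day.isoformat()}/"

    @classmethod
    def parse(cls, ref: str) -> "ClicksPartition":
        """Parse a ref string, raising ValueError if the prefix does not match."""
        if not ref.startswith(_PREFIX):
            raise ValueError(f"not a clicks partition ref: {ref!r}")
        # Split "region=4/date=2025-01-01" into {"region": "4", "date": "2025-01-01"}.
        segments = ref[len(_PREFIX):].strip("/").split("/")
        fields = dict(segment.split("=", 1) for segment in segments)
        return cls(region=int(fields["region"]), day=date.fromisoformat(fields["date"]))


partition = ClicksPartition.parse("dal://prod/clicks/region=4/date=2025-01-01/")
```

Jobs can then read `partition.region` or `partition.day` instead of slicing strings, and `partition.to_ref()` reproduces the original ref.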

PartitionPattern

Patterns that group partitions (e.g. a dataset) and allow for validation (e.g. does this job actually produce the expected output partition?).
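One way to picture a pattern is as a regular expression over PartitionRefs. The sketch below assumes that representation purely for illustration; `FOO_PATTERN` and `unexpected_outputs` are hypothetical names, and databuild may express patterns differently.

```python
import re

# Assumed pattern: every daily partition of the "foo" dataset from the S3 example.
FOO_PATTERN = re.compile(r"s3://companybkt/datasets/foo/date=(?P<date>\d{4}-\d{2}-\d{2})")


def unexpected_outputs(pattern: re.Pattern[str], produced: list[str]) -> list[str]:
    """Return the produced refs that do NOT fully match the expected pattern."""
    return [ref for ref in produced if pattern.fullmatch(ref) is None]


produced = [
    "s3://companybkt/datasets/foo/date=2025-01-01",
    "s3://companybkt/datasets/bar/date=2025-01-01",
]
bad = unexpected_outputs(FOO_PATTERN, produced)
```

Here the `bar` ref falls outside the pattern, so a validation step could flag the job as producing an output it did not declare.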

Graph Definition Language (GDL)

Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic. See graph specification.