Job
Atomic unit of work, producing and consuming specific partitions. See jobs.
Graph
Composes jobs to build partitions. See graphs.
Partition
Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but multiple jobs cannot produce the same partition; that is, each partition has exactly one canonical producing job.
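As an illustration of the uniqueness rule above, here is a minimal sketch that checks a mapping of jobs to their output partitions for conflicts. The function name and the dict-based input are hypothetical and are not part of databuild's API.

```python
from collections import defaultdict


def find_conflicting_producers(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every partition claimed by more than one job, with its claimants.

    `job_outputs` maps a (hypothetical) job name to the partition refs it produces.
    """
    producers: dict[str, set[str]] = defaultdict(set)
    for job, partitions in job_outputs.items():
        for partition in partitions:
            producers[partition].add(job)
    # A partition is valid only if exactly one job produces it.
    return {p: sorted(jobs) for p, jobs in producers.items() if len(jobs) > 1}
```

An empty result means every partition has a single producer; any entry names the jobs that would need to be reconciled.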
PartitionRef
PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3
URIs, like s3://companybkt/datasets/foo/date=2025-01-01, or custom formats like
dal://prod/clicks/region=4/date=2025-01-01/. PartitionRefs are used as dependency signals during
task graph analysis. To make coupling explicit and improve ergonomics, GDLs generally provide
helper classes for creating, parsing, and accessing the fields of PartitionRefs.
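A helper class of the kind described above might look like the following sketch, written for the dal:// example format. The class name, field names, and regular expression are assumptions made for illustration; they are not taken from any actual GDL.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed layout, based on the example ref: dal://<env>/clicks/region=<int>/date=<YYYY-MM-DD>/
_CLICKS_REF = re.compile(
    r"dal://(?P<env>[^/]+)/clicks/region=(?P<region>\d+)/date=(?P<day>\d{4}-\d{2}-\d{2})/?"
)


@dataclass(frozen=True)
class ClicksPartitionRef:
    """Hypothetical typed wrapper around a clicks partition ref string."""

    env: str
    region: int
    day: date

    @classmethod
    def parse(cls, ref: str) -> "ClicksPartitionRef":
        match = _CLICKS_REF.fullmatch(ref)
        if match is None:
            raise ValueError(f"not a clicks partition ref: {ref!r}")
        return cls(
            env=match["env"],
            region=int(match["region"]),
            day=date.fromisoformat(match["day"]),
        )

    def to_ref(self) -> str:
        # Serialize back to the canonical string form, including the trailing slash.
        return f"dal://{self.env}/clicks/region={self.region}/date={self.day.isoformat()}/"
```

Jobs can then read `ref.region` or `ref.day` directly instead of splitting strings, and a malformed ref fails at parse time rather than deep inside a job.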
PartitionPattern
Patterns that group partitions (e.g. into a dataset) and allow for validation (e.g. does this job actually produce the expected output partitions?).
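The validation use case can be sketched as follows, assuming a pattern is expressed as a regular expression over PartitionRef strings. The source does not specify the pattern syntax, so both the regex form and the function name here are hypothetical.

```python
import re

# Assumed pattern for the "foo" dataset from the S3 example: one partition per date.
FOO_DATASET = re.compile(r"s3://companybkt/datasets/foo/date=\d{4}-\d{2}-\d{2}")


def unexpected_outputs(pattern: re.Pattern, produced: list[str]) -> list[str]:
    """Return the produced refs that do not belong to the pattern's dataset."""
    return [ref for ref in produced if pattern.fullmatch(ref) is None]
```

An empty list means every output of the job falls inside the expected dataset; anything returned is a partition the job was not declared to produce.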
Graph Definition Language (GDL)
Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic. See graph specification.