databuild/design/glossary.md
2025-07-26 00:48:18 -07:00

# `Job`
Atomic unit of work, producing and consuming specific partitions. See [jobs](./core-build.md#jobs).
# `Graph`
Composes [jobs](#job) to build partitions. See [graphs](./core-build.md#graphs).
# `Partition`
Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but
multiple jobs cannot produce the same partition, i.e. job -> partition relationships must be unique/canonical: each
partition has exactly one producing job.
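
The uniqueness rule can be checked mechanically. The following is a minimal sketch, not databuild's actual
implementation: the function name, job names, and partition identifiers are all hypothetical, and it assumes each job's
outputs are available as a plain list of strings.

```python
from collections import defaultdict


def find_conflicting_producers(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every partition claimed by more than one job, mapped to the sorted job names."""
    producers: dict[str, set[str]] = defaultdict(set)
    for job, partitions in job_outputs.items():
        for partition in partitions:
            producers[partition].add(job)
    # A partition is in conflict when more than one distinct job claims it.
    return {p: sorted(jobs) for p, jobs in producers.items() if len(jobs) > 1}


# Hypothetical job names and partition identifiers, for illustration only.
job_outputs = {
    "ingest_clicks": ["clicks/date=2025-01-01", "clicks/date=2025-01-02"],
    "backfill_clicks": ["clicks/date=2025-01-01"],
}
conflicts = find_conflicting_producers(job_outputs)
```

Here `conflicts` contains only `clicks/date=2025-01-01`, since that is the one partition with two producers.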
# `PartitionRef`
PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3
URIs, like `s3://companybkt/datasets/foo/date=2025-01-01`, or custom formats like
`dal://prod/clicks/region=4/date=2025-01-01/`. PartitionRefs are used as dependency signals during
[task graph analysis](./core-build.md#graphanalyze). To enable explicit coupling and ergonomics, there are generally
helper classes for creating, parsing, and accessing fields for PartitionRefs in [GSLs](#graph-specification-language-gsl).
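
As an illustration of such a helper, here is a minimal sketch for the `dal://prod/clicks/...` format shown above. The
class name `ClicksRef`, its fields, and the regular expression are assumptions made for this example; they are not part
of any databuild GSL.

```python
import re
from dataclasses import dataclass

# Assumed layout, taken from the example ref: dal://prod/clicks/region=<int>/date=<YYYY-MM-DD>/
_CLICKS_RE = re.compile(
    r"^dal://prod/clicks/region=(?P<region>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/$"
)


@dataclass(frozen=True)
class ClicksRef:
    """Hypothetical typed helper for creating and parsing clicks PartitionRefs."""

    region: int
    date: str

    def to_ref(self) -> str:
        """Render the fields back into the PartitionRef string."""
        return f"dal://prod/clicks/region={self.region}/date={self.date}/"

    @classmethod
    def parse(cls, ref: str) -> "ClicksRef":
        """Parse a PartitionRef string, rejecting anything that is not a clicks ref."""
        match = _CLICKS_RE.match(ref)
        if match is None:
            raise ValueError(f"not a clicks PartitionRef: {ref!r}")
        return cls(region=int(match["region"]), date=match["date"])


parsed = ClicksRef.parse("dal://prod/clicks/region=4/date=2025-01-01/")
```

Keeping creation and parsing in one class means a change to the ref format only has to be made in one place, which is
the kind of explicit coupling the helpers are meant to provide.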
# `PartitionPattern`
Patterns that group partitions (e.g. all partitions of a dataset) and allow for validation (e.g. does this job actually
produce the expected output partition?).
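
This glossary does not specify the pattern syntax, so the sketch below assumes a pattern is an ordinary regular
expression matched against the whole PartitionRef. The function name and the `bar` dataset are hypothetical.

```python
import re


def unexpected_outputs(pattern: str, produced_refs: list[str]) -> list[str]:
    """Return the produced refs that do not fully match the expected pattern."""
    compiled = re.compile(pattern)
    return [ref for ref in produced_refs if compiled.fullmatch(ref) is None]


# Assumed pattern for the `foo` dataset: one partition per calendar date.
FOO_PATTERN = r"s3://companybkt/datasets/foo/date=\d{4}-\d{2}-\d{2}"

produced = [
    "s3://companybkt/datasets/foo/date=2025-01-01",
    "s3://companybkt/datasets/bar/date=2025-01-01",  # hypothetical stray output
]
bad = unexpected_outputs(FOO_PATTERN, produced)
```

Only the `bar` ref ends up in `bad`, which is the signal that the job produced something outside its declared dataset.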
# `JobConfig`
The complete configuration of a job needed to produce the desired partitions, as calculated by
[`job.config`](./core-build.md#jobconfig).
# `JobGraph`
A complete graph of job configs, with [`PartitionRef`](#partitionref) dependency edges, which when executed will
produce the requested partitions.
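
To make the dependency edges concrete, here is a minimal sketch of deriving an execution order from job configs. It is
not databuild's data model: `JobConfigSketch` and its fields are hypothetical, and it assumes any input with no
producer in the graph already exists. Ordering uses `graphlib.TopologicalSorter` from the Python standard library.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class JobConfigSketch:
    """Hypothetical stand-in for a job config: a job name plus input and output refs."""

    job: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def execution_order(configs: list[JobConfigSketch]) -> list[str]:
    """Order jobs so that each runs after the jobs producing its inputs."""
    # Each partition has exactly one producer, so a plain dict is enough.
    producer = {ref: cfg.job for cfg in configs for ref in cfg.outputs}
    sorter: TopologicalSorter = TopologicalSorter()
    for cfg in configs:
        # Inputs with no producer in this graph are assumed to exist already.
        upstream = {producer[ref] for ref in cfg.inputs if ref in producer}
        sorter.add(cfg.job, *upstream)
    return list(sorter.static_order())


# Hypothetical jobs and refs, for illustration only.
configs = [
    JobConfigSketch("daily_report", ("clicks/date=2025-01-01",), ("report/date=2025-01-01",)),
    JobConfigSketch("ingest_clicks", (), ("clicks/date=2025-01-01",)),
]
order = execution_order(configs)
```

`ingest_clicks` is ordered before `daily_report` because the report consumes the partition that the ingest job
produces. A cyclic graph would make `static_order()` raise `graphlib.CycleError`.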
# Graph Specification Language (GSL)
Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic.
See [graph specification](./graph-specification.md).