# `Job`
Atomic unit of work, producing and consuming specific partitions. See [jobs](core-build.md#jobs).

# `Graph`
Composes [jobs](#job) to build partitions. See [graphs](core-build.md#graphs).

# `Partition`
Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but
multiple jobs cannot produce the same partition, i.e. the job -> partition relationship must be unique/canonical.
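The uniqueness rule can be illustrated with a small check over a hypothetical mapping from job names to the partitions they declare. This is only a sketch: `find_conflicting_partitions`, the job names, and the partition strings are assumptions made for illustration, not part of databuild's API.

```python
from collections import defaultdict


def find_conflicting_partitions(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return partitions claimed by more than one job, mapped to the claiming jobs."""
    producers: dict[str, set[str]] = defaultdict(set)
    for job, partitions in job_outputs.items():
        for partition in partitions:
            producers[partition].add(job)
    # A partition is valid only if exactly one job produces it.
    return {p: sorted(jobs) for p, jobs in producers.items() if len(jobs) > 1}


# Hypothetical job declarations; one partition is claimed twice.
job_outputs = {
    "ingest_clicks": ["clicks/date=2025-01-01", "clicks/date=2025-01-02"],
    "backfill_clicks": ["clicks/date=2025-01-01"],  # conflicts with ingest_clicks
}
conflicts = find_conflicting_partitions(job_outputs)
```

An empty result means every partition has a single producing job; any entry names the partition and the jobs that compete for it.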

# `PartitionRef`
PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3
URIs, like `s3://companybkt/datasets/foo/date=2025-01-01`, or custom formats like
`dal://prod/clicks/region=4/date=2025-01-01/`. PartitionRefs are used as dependency signals during
[task graph analysis](core-build.md#graphanalyze). To enable explicit coupling and ergonomics, there are generally
helper classes for creating, parsing, and accessing fields for PartitionRefs in [GDLs](#graph-definition-language-gdl).
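A helper class of this kind might look roughly like the sketch below, which handles the `dal://` example. The class name `ClicksPartitionRef`, its fields, and the assumed layout of the string are hypothetical; a real GDL may expose a different interface.

```python
import re
from dataclasses import dataclass

# Assumed layout of the `dal://` example; the field names are illustrative.
_CLICKS_REF = re.compile(
    r"dal://(?P<env>[^/]+)/clicks/region=(?P<region>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/?"
)


@dataclass(frozen=True)
class ClicksPartitionRef:
    env: str
    region: int
    date: str

    def to_ref(self) -> str:
        """Render the string form of this reference."""
        return f"dal://{self.env}/clicks/region={self.region}/date={self.date}/"

    @classmethod
    def parse(cls, ref: str) -> "ClicksPartitionRef":
        """Parse a reference string, rejecting anything outside the assumed layout."""
        match = _CLICKS_REF.fullmatch(ref)
        if match is None:
            raise ValueError(f"not a clicks partition ref: {ref!r}")
        return cls(env=match["env"], region=int(match["region"]), date=match["date"])


ref = ClicksPartitionRef.parse("dal://prod/clicks/region=4/date=2025-01-01/")
```

Keeping creation and parsing in one class means jobs depend on typed fields such as `ref.region` rather than on ad hoc string slicing.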

# `PartitionPattern`
Patterns that group partitions (e.g. a dataset) and allow for validation (e.g. does this job actually produce the
expected output partition?).
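As a rough illustration of the validation use case, a pattern can be modeled as a regular expression that every output of a job must match. Representing patterns as regexes, the constant `FOO_PATTERN`, and the function `unexpected_outputs` are all assumptions for this sketch, not databuild's actual representation.

```python
import re

# Hypothetical pattern covering every partition of the `foo` dataset.
FOO_PATTERN = re.compile(r"s3://companybkt/datasets/foo/date=\d{4}-\d{2}-\d{2}")


def unexpected_outputs(pattern: re.Pattern[str], produced: list[str]) -> list[str]:
    """Return the produced partition refs that fall outside the pattern."""
    return [ref for ref in produced if pattern.fullmatch(ref) is None]


produced = [
    "s3://companybkt/datasets/foo/date=2025-01-01",
    "s3://companybkt/datasets/foo/date=20250102",  # malformed date segment
]
bad = unexpected_outputs(FOO_PATTERN, produced)
```

A non-empty `bad` list signals that the job emitted something other than what its declared pattern promises.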

# Graph Definition Language (GDL)
Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic.
See [graph specification](graph-specification.md).