# `Job`

Atomic unit of work, producing and consuming specific partitions. See [jobs](./core-build.md#jobs).

# `Graph`

Composes [jobs](#job) to build partitions. See [graphs](./core-build.md#graphs).

# `Partition`

Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but multiple jobs cannot produce the same partition: each partition has exactly one canonical producing job.

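The single-producer rule lends itself to a mechanical check. The following is a minimal sketch, not databuild's implementation: it assumes each job's outputs are available as a mapping from a hypothetical job name to the PartitionRefs that job produces. From that, it builds a partition-to-producer index and rejects any partition claimed by two different jobs.

```python
from collections.abc import Iterable, Mapping


def build_producer_index(job_outputs: Mapping[str, Iterable[str]]) -> dict[str, str]:
    """Map each PartitionRef to the single job that produces it.

    `job_outputs` maps a (hypothetical) job name to the refs it produces.
    Raises ValueError if two different jobs claim the same partition.
    """
    producer: dict[str, str] = {}
    for job, outputs in job_outputs.items():
        for ref in outputs:
            existing = producer.get(ref)
            if existing is not None and existing != job:
                raise ValueError(
                    f"partition {ref!r} is produced by both {existing!r} and {job!r}"
                )
            producer[ref] = job
    return producer
```

An index like this also tells a graph which job to run for a requested partition, because the lookup can never be ambiguous.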
# `PartitionRef`

PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3 URIs, like `s3://companybkt/datasets/foo/date=2025-01-01`, or custom formats like `dal://prod/clicks/region=4/date=2025-01-01/`. PartitionRefs are used as dependency signals during [task graph analysis](./core-build.md#graphanalyze). To make this coupling explicit and ergonomic, [GSLs](#graph-specification-language-gsl) generally provide helper classes for creating, parsing, and accessing the fields of PartitionRefs.

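As an illustration of the kind of helper class a GSL might offer, here is a sketch for the `dal://prod/clicks/region=4/date=2025-01-01/` format shown above. The class name `ClicksPartitionRef`, its fields, and the regex are all hypothetical. They are assumptions about that example format, not an existing databuild API.

```python
import datetime as dt
import re
from dataclasses import dataclass

# Hypothetical grammar for the example format; matched with fullmatch, and the
# trailing slash is optional when parsing.
_CLICKS_REF = re.compile(
    r"dal://(?P<env>[^/]+)/clicks/region=(?P<region>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/?"
)


@dataclass(frozen=True)
class ClicksPartitionRef:
    """Hypothetical helper for refs like dal://prod/clicks/region=4/date=2025-01-01/."""

    env: str
    region: int
    date: dt.date

    def __str__(self) -> str:
        # Render back to the canonical string form, always with a trailing slash.
        return f"dal://{self.env}/clicks/region={self.region}/date={self.date.isoformat()}/"

    @classmethod
    def parse(cls, ref: str) -> "ClicksPartitionRef":
        # Parse a ref string into typed fields, rejecting anything else.
        m = _CLICKS_REF.fullmatch(ref)
        if m is None:
            raise ValueError(f"not a clicks PartitionRef: {ref!r}")
        return cls(
            env=m["env"],
            region=int(m["region"]),
            date=dt.date.fromisoformat(m["date"]),
        )
```

Round-tripping through `parse` and `str` lets jobs exchange plain strings as dependency signals while still accessing typed fields such as `region` and `date` when they need them.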
# `PartitionPattern`

Patterns that group partitions (e.g., all partitions of a dataset) and allow for validation (e.g., does this job actually produce the expected output partition?).

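One way to picture a PartitionPattern is as a named regular expression over PartitionRefs. The sketch below is illustrative only. `PartitionPattern` here is a hypothetical Python class (not the GSL's actual type), and `validate_outputs` is a hypothetical helper. Together they show both uses mentioned above: grouping refs into a dataset, and checking that a job produced the partitions it was expected to produce.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionPattern:
    """Hypothetical sketch: a named regex matching every PartitionRef of one dataset."""

    name: str
    regex: str

    def matches(self, ref: str) -> bool:
        # fullmatch rejects refs that only match a prefix of the pattern
        return re.fullmatch(self.regex, ref) is not None

    def group(self, refs: Iterable[str]) -> list[str]:
        """Keep only the refs that belong to this pattern's dataset."""
        return [ref for ref in refs if self.matches(ref)]


def validate_outputs(
    pattern: PartitionPattern, expected: Iterable[str], produced: Iterable[str]
) -> list[str]:
    """Return human-readable problems; an empty list means the outputs look valid."""
    produced_set = set(produced)
    problems = []
    for ref in expected:
        if not pattern.matches(ref):
            problems.append(f"{ref!r} does not match pattern {pattern.name!r}")
        elif ref not in produced_set:
            problems.append(f"{ref!r} was expected but not produced")
    return problems
```

Using `fullmatch` rather than `match` or `search` keeps a pattern from accepting a ref that merely starts with the dataset's prefix.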
# `JobConfig`

The complete configuration of a job needed to produce the desired partitions, as calculated by [`job.config`](./core-build.md#jobconfig).

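To make the relationship between `job.config` and a JobConfig concrete, here is a rough sketch for a hypothetical job that builds daily click partitions. The `JobConfig` fields (`outputs`, `inputs`, `args`), the `config` function, and the dataset prefixes are all assumptions for illustration. The real schema is described in [`job.config`](./core-build.md#jobconfig). The point is only that the config is computed from the requested output partitions and names everything the job needs, including its input PartitionRefs.

```python
from dataclasses import dataclass

# Hypothetical dataset locations used only for this sketch.
RAW_PREFIX = "s3://companybkt/datasets/clicks/date="
DAILY_PREFIX = "s3://companybkt/datasets/clicks_daily/date="


@dataclass(frozen=True)
class JobConfig:
    """Illustrative shape only; these field names are assumptions."""

    outputs: tuple[str, ...]
    inputs: tuple[str, ...]
    args: tuple[str, ...] = ()


def config(outputs: list[str]) -> JobConfig:
    """Hypothetical `config` for a job that builds daily click partitions."""
    dates = []
    for ref in outputs:
        if not ref.startswith(DAILY_PREFIX):
            raise ValueError(f"this job cannot produce {ref!r}")
        dates.append(ref[len(DAILY_PREFIX):])
    return JobConfig(
        outputs=tuple(outputs),
        # Each daily output depends on the raw clicks partition for the same date.
        inputs=tuple(RAW_PREFIX + d for d in dates),
        args=tuple(f"--date={d}" for d in dates),
    )
```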
# `JobGraph`

A complete graph of job configs, connected by [`PartitionRef`](#partitionref) dependency edges, that produces the requested partitions when executed.

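The following sketch shows how PartitionRef edges can order a JobGraph for execution. It uses Python's standard `graphlib.TopologicalSorter`. `JobConfigNode` and `execution_order` are hypothetical names. The sketch also assumes that any input with no producer in the graph is an already-existing partition. It is not databuild's executor, only an illustration of why the edges suffice to schedule the configs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class JobConfigNode:
    """Hypothetical stand-in for a job config: a name plus its PartitionRefs."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def execution_order(configs: list[JobConfigNode]) -> list[str]:
    """Return config names ordered so producers run before their consumers."""
    # Map each output PartitionRef to the config that produces it.
    producer = {ref: c.name for c in configs for ref in c.outputs}
    sorter = TopologicalSorter()
    for c in configs:
        # Dependency edges come from PartitionRefs: depend on whichever config
        # produces one of our inputs. Inputs with no producer in the graph are
        # assumed to already exist.
        deps = {producer[ref] for ref in c.inputs if ref in producer}
        sorter.add(c.name, *deps)
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(sorter.static_order())
```

Because edges are derived purely from shared PartitionRefs, jobs never need to reference each other directly; the graph falls out of what each config consumes and produces.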
# Graph Specification Language (GSL)

Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic. See [graph specification](./graph-specification.md).