databuild/docs/ideas/metadata.md

It would be useful to support user-defined metadata on partitions, wants, and job runs, and to allow querying that metadata. Basic example: adding a run_url to a job or an adls_location to a partition. More advanced: adding a dbx_cores field to job runs, then querying the job runs downstream of a want to control parallelism down to the number of cores in use.
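A minimal sketch of the cores-based throttling idea, in Python for illustration only. The `JobRun` shape, the `want_id` link (standing in for "downstream from a want"), the `status` values, and the `core_budget` parameter are all hypothetical; only the `run_url` and `dbx_cores` keys come from the note above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class JobRun:
    # Hypothetical job-run record; `metadata` is the user-defined bag.
    run_id: str
    want_id: str  # assumed link from a job run back to the want it serves
    status: str   # assumed values: "running", "queued"
    metadata: Dict[str, Any] = field(default_factory=dict)


def cores_in_use(job_runs: List[JobRun], want_id: str) -> int:
    """Sum dbx_cores over running job runs for one want (missing key counts as 0)."""
    return sum(
        int(run.metadata.get("dbx_cores", 0))
        for run in job_runs
        if run.want_id == want_id and run.status == "running"
    )


def can_start(job_runs: List[JobRun], want_id: str,
              candidate_cores: int, core_budget: int) -> bool:
    """True if starting a run needing `candidate_cores` stays within the budget."""
    return cores_in_use(job_runs, want_id) + candidate_cores <= core_budget


# Example data; the URL is a placeholder.
runs = [
    JobRun("r1", "w1", "running", {"dbx_cores": 8, "run_url": "https://example.invalid/r1"}),
    JobRun("r2", "w1", "running", {"dbx_cores": 4}),
    JobRun("r3", "w1", "queued", {"dbx_cores": 16}),
    JobRun("r4", "w2", "running", {"dbx_cores": 32}),
]
```

With this data, a scheduler holding a 16-core budget for want `w1` could admit a 4-core run but would have to hold back an 8-core one.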

Taints could also be implemented as metadata, e.g. a databuild.tainted_at field that is set to the current time when a partition is tainted. This would involve a couple of endpoints:

  1. Set partition metadata
  2. Get partition metadata
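The two operations above, plus taint-as-metadata, could be sketched roughly as follows. This is an in-memory stand-in, not a proposed API: the class name, method names, and string-only values are assumptions; only the `databuild.tainted_at` key comes from the note.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

TAINTED_AT = "databuild.tainted_at"  # key name taken from the note above


class PartitionMetadataStore:
    """Hypothetical in-memory store keyed by partition reference."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def set_metadata(self, partition_ref: str, key: str, value: str) -> None:
        # Last write wins; no history is kept in this sketch.
        self._data.setdefault(partition_ref, {})[key] = value

    def get_metadata(self, partition_ref: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate stored state.
        return dict(self._data.get(partition_ref, {}))


def taint(store: PartitionMetadataStore, partition_ref: str,
          now: Optional[datetime] = None) -> str:
    """Mark a partition tainted by setting databuild.tainted_at to the current UTC time."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    store.set_metadata(partition_ref, TAINTED_AT, stamp)
    return stamp


def is_tainted(store: PartitionMetadataStore, partition_ref: str) -> bool:
    return TAINTED_AT in store.get_metadata(partition_ref)
```

Under this shape, tainting needs no dedicated endpoint: it is a "set partition metadata" call with a reserved key.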

The big question: do we need taint history, or metadata assignment history more generally? The temptation is to say YAGNI, but it may be worth thinking through here, just to make sure.
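To make the history question concrete, here is one possible shape, sketched under the assumption that assignments are stored as an append-only list of events and the current metadata is derived by replaying them. The `MetadataEvent` type and both helper functions are hypothetical, not an existing databuild structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetadataEvent:
    # Hypothetical record of one metadata assignment.
    partition_ref: str
    key: str
    value: Optional[str]  # None records a removal, e.g. clearing a taint
    at: str               # timestamp of the assignment, as an ISO 8601 string


def current_metadata(events: List[MetadataEvent], partition_ref: str) -> Dict[str, str]:
    """Replay events in order; the latest assignment per key wins."""
    state: Dict[str, str] = {}
    for event in events:
        if event.partition_ref != partition_ref:
            continue
        if event.value is None:
            state.pop(event.key, None)
        else:
            state[event.key] = event.value
    return state


def history(events: List[MetadataEvent], partition_ref: str, key: str) -> List[MetadataEvent]:
    """All assignments of one key on one partition, oldest first."""
    return [e for e in events if e.partition_ref == partition_ref and e.key == key]
```

If history turns out not to be needed, the same set/get interface can sit on a plain key-value map; the event log only matters if someone needs to answer "when was this partition tainted, and how many times?".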