# Observability

## Purpose

Provide comprehensive, platform-agnostic observability for DataBuild applications through standardized job wrapper telemetry.

## Architecture

### Wrapper-Based Observability

All observability flows through the job wrapper:

- **Jobs** emit application logs to stdout/stderr
- **Wrapper** captures and enriches with structured metadata
- **Graph** parses structured logs into metrics, events, and monitoring data
- [**BEL**](./build-event-log.md) stores aggregated telemetry for historical analysis

### Communication Protocol

Log-based telemetry using protobuf-defined structured messages:

- LogMessage: Application stdout/stderr with metadata
- MetricPoint: StatsD-style metrics with labels
- JobEvent: State transitions and system events
- PartitionManifest: Job completion with output metadata

## Implementation

### Metrics Collection

- Format: StatsD-like embedded in structured logs
- Aggregation: Graph components collect and expose via Prometheus
- Storage: Summary metrics stored in BEL for historical analysis
- Scope: Job execution, resource usage, partition metadata

### Logging

- Capture: All job stdout/stderr via wrapper
- Enhancement: Automatic injection of job_id, partition_ref, timestamps
- Format: Structured JSON for consistent parsing
- Retention: Platform-dependent (container logs, cloud logging APIs)

### Monitoring

- Heartbeats: 30-second intervals with resource utilization
- Health: Exit code categorization and failure analysis
- Alerting: Standard Prometheus/alertmanager integration
- Debugging: Complete log trails for job troubleshooting

### Platform Integration

- **Local**: Direct stdout pipe reading
- **Docker**: Container log persistence and `docker logs`
- **Kubernetes**: Pod logs API with configurable retention
- **Cloud**: Platform logging services (CloudWatch, Cloud Logging)
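
The log enhancement and StatsD-style metric format described under Implementation could look roughly like the sketch below. It is illustrative only: the function names `enrich_line` and `parse_metric`, the JSON field names, and the `|#key:value` label syntax (borrowed from the DogStatsD convention) are assumptions, not part of the DataBuild protocol, which defines its messages in protobuf.

```python
import json
import time
from typing import Optional


def enrich_line(line: str, stream: str, job_id: str, partition_ref: str,
                now: Optional[float] = None) -> str:
    """Wrap one raw job output line in a JSON envelope.

    Field names here are hypothetical stand-ins for the LogMessage schema.
    """
    record = {
        "type": "LogMessage",
        "job_id": job_id,
        "partition_ref": partition_ref,
        "stream": stream,  # "stdout" or "stderr"
        "timestamp": time.time() if now is None else now,
        "message": line.rstrip("\n"),
    }
    return json.dumps(record, sort_keys=True)


def parse_metric(text: str) -> dict:
    """Parse 'name:value|kind' with optional '|#k:v,k:v' labels.

    The label suffix is an assumed syntax, not a documented DataBuild format.
    """
    head, _, rest = text.partition("|")
    name, _, value = head.partition(":")
    kind, _, tags = rest.partition("|#")
    # Split each non-empty tag on its first colon into a (key, value) pair.
    labels = dict(tag.split(":", 1) for tag in tags.split(",") if tag)
    return {
        "type": "MetricPoint",
        "name": name,
        "value": float(value),
        "kind": kind,
        "labels": labels,
    }
```

A wrapper built along these lines would call `enrich_line` for every line read from the job's stdout/stderr pipe and write the result to its own stdout, while the graph side would call something like `parse_metric` on metric payloads before exposing them to Prometheus.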