# Service
Purpose: Provide a centrally hostable, human-consumable interface for databuild applications, plus efficient cross-graph coordination.
## Architecture
The service provides two primary capabilities:
1. **Human Interface**: Web dashboard and HTTP API for build management and monitoring
2. **Cross-Graph Coordination**: `GraphService` API enabling efficient event-driven coordination between DataBuild instances
## Correctness Strategy
- Rely on `databuild.proto` and call the same shared code in core
- Fully asserted type safety from core to service to web app
- Core --(`databuild.proto`)--> service --(OpenAPI)--> web app
- No magic strings (open question: how? Protobuf has no constants; use enum values, or code generation from YAML?)
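
One possible answer to the open question, sketched under the assumption that Rust enums (ideally generated from `databuild.proto`) are the source of truth: each variant owns its canonical string, and parsing goes through that same table, so no string literal is repeated at call sites. The `PartitionStatus` variants and strings below are hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a proto-generated enum; the variants and
/// strings are illustrative, not taken from databuild.proto.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionStatus {
    Requested,
    Building,
    Available,
    Failed,
}

impl PartitionStatus {
    /// Every variant, so generators (OpenAPI, TypeScript) can enumerate the
    /// allowed values instead of retyping them.
    pub const ALL: [PartitionStatus; 4] = [
        PartitionStatus::Requested,
        PartitionStatus::Building,
        PartitionStatus::Available,
        PartitionStatus::Failed,
    ];

    /// The single place each variant's string form is written down.
    pub const fn as_str(self) -> &'static str {
        match self {
            PartitionStatus::Requested => "requested",
            PartitionStatus::Building => "building",
            PartitionStatus::Available => "available",
            PartitionStatus::Failed => "failed",
        }
    }
}

impl fmt::Display for PartitionStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for PartitionStatus {
    type Err = String;

    /// Parse by looking up the canonical strings, never by matching literals.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        PartitionStatus::ALL
            .iter()
            .copied()
            .find(|status| status.as_str() == s)
            .ok_or_else(|| format!("unknown partition status: {s}"))
    }
}
```

Because `Display` and `FromStr` both go through `as_str`, a renamed variant string changes in one place and round-trips everywhere.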
## Cross-Graph Coordination
Services expose the `GraphService` API for cross-graph dependency management:
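```rust
trait GraphService {
    /// List events from the log, starting at cursor `since_idx`, that match `filter`.
    /// The returned `EventPage::next_idx` is the cursor to pass on the next call.
    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage>;
}
```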
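
To make the cursor semantics concrete, here is a minimal synchronous, in-memory sketch of `list_events` (the real trait is async and fallible). Assumptions, none of which come from the spec: `since_idx` is inclusive; `next_idx` is one past the last event *scanned*, including filtered-out events, so they are never rescanned; the page size, the simplified `Event` struct, and trailing-`*` prefix matching for `partition_patterns` are all hypothetical.

```rust
/// Simplified stand-in for a build event: its log index and the partition it concerns.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub idx: i64,
    pub partition_ref: String,
}

#[derive(Debug, Clone, Default)]
pub struct EventFilter {
    /// Patterns like "data/users/*"; an empty list matches every event.
    pub partition_patterns: Vec<String>,
}

#[derive(Debug)]
pub struct EventPage {
    pub events: Vec<Event>,
    /// Cursor to pass as `since_idx` on the next call.
    pub next_idx: i64,
}

pub struct InMemoryEventLog {
    events: Vec<Event>, // invariant: events[i].idx == i
    page_size: usize,
}

impl InMemoryEventLog {
    pub fn new(page_size: usize) -> Self {
        assert!(page_size > 0, "page size must be positive");
        Self { events: Vec::new(), page_size }
    }

    /// Append an event and return its log index.
    pub fn append(&mut self, partition_ref: &str) -> i64 {
        let idx = self.events.len() as i64;
        self.events.push(Event { idx, partition_ref: partition_ref.to_string() });
        idx
    }

    /// Return up to `page_size` matching events with `idx >= since_idx`.
    /// `next_idx` advances past every event scanned, matching or not.
    pub fn list_events(&self, since_idx: i64, filter: &EventFilter) -> EventPage {
        let start = since_idx.max(0);
        let mut events = Vec::new();
        let mut next_idx = start;
        for event in self.events.iter().skip(start as usize) {
            if events.len() == self.page_size {
                break;
            }
            next_idx = event.idx + 1;
            if matches(filter, &event.partition_ref) {
                events.push(event.clone());
            }
        }
        EventPage { events, next_idx }
    }
}

/// Trailing-`*` prefix matching; a simplification of whatever pattern syntax the real filter uses.
fn matches(filter: &EventFilter, partition_ref: &str) -> bool {
    filter.partition_patterns.is_empty()
        || filter.partition_patterns.iter().any(|pattern| match pattern.strip_suffix('*') {
            Some(prefix) => partition_ref.starts_with(prefix),
            None => partition_ref == pattern.as_str(),
        })
}
```

Advancing the cursor past non-matching events keeps a narrowly filtered subscriber from rescanning the same irrelevant events on every poll.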
### Cross-Graph Usage Pattern
```rust
// Downstream graph subscribing to upstream partitions
struct UpstreamDependency {
    service_url: String,             // e.g., "https://upstream-databuild.corp.com"
    partition_patterns: Vec<String>, // e.g., ["data/users/*", "ml/models/prod/*"]
    last_sync_idx: i64,              // cursor into the upstream event log
}

// Periodic sync of relevant upstream events; takes `&mut` because it advances the cursor
async fn sync_upstream_events(upstream: &mut UpstreamDependency) -> Result<()> {
    let client = GraphServiceClient::new(&upstream.service_url);
    let filter = EventFilter {
        partition_patterns: upstream.partition_patterns.clone(),
        ..Default::default()
    };
    let page = client.list_events(upstream.last_sync_idx, filter).await?;

    // Process partition availability events for immediate job triggering
    for event in page.events {
        if let EventType::PartitionEvent(pe) = event.event_type {
            if pe.status_code == PartitionStatus::PartitionAvailable {
                trigger_dependent_jobs(&pe.partition_ref).await?;
            }
        }
    }

    // Advance the cursor only after every event in the page was handled (`?` returns early on error)
    upstream.last_sync_idx = page.next_idx;
    Ok(())
}
```
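
The function above handles a single page per call. When a downstream graph has fallen behind, it may want to drain every page until caught up and trigger each available partition only once. A minimal synchronous sketch, assuming the upstream returns a `next_idx` that does not advance when there is nothing new; `Page`, `drain_upstream`, and the `fetch_page` closure are hypothetical stand-ins for the client call.

```rust
use std::collections::BTreeSet;

/// One page from a hypothetical `list_events` call, already reduced to the
/// partition refs that became available, plus the cursor for the next call.
pub struct Page {
    pub available_partitions: Vec<String>,
    pub next_idx: i64,
}

/// Fetch pages starting at `last_sync_idx` until the cursor stops advancing.
/// Returns the deduplicated partitions to trigger on and the new cursor; the
/// caller should persist the cursor only after triggering succeeds.
pub fn drain_upstream<F>(mut last_sync_idx: i64, mut fetch_page: F) -> (BTreeSet<String>, i64)
where
    F: FnMut(i64) -> Page,
{
    let mut to_trigger = BTreeSet::new();
    loop {
        let page = fetch_page(last_sync_idx);
        to_trigger.extend(page.available_partitions);
        if page.next_idx <= last_sync_idx {
            break; // no progress: caught up (and never move the cursor backwards)
        }
        last_sync_idx = page.next_idx;
    }
    (to_trigger, last_sync_idx)
}
```

Collecting into a set means a partition that appears in several pages (for example, rebuilt upstream) triggers dependents once per sync rather than once per event.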
## API
The API enables remote, programmatic interaction with databuild applications and hosts the endpoints
needed by the [web app](#web-app).
See the [OpenAPI spec](../bazel-bin/databuild/client/openapi.json) (run
`bazel build //databuild/client:extract_openapi_spec` first if it's not found).
## Web App
The web app visualizes databuild application state through features such as listings of past builds, job statistics,
partition liveness, and build request status. This section specifies the web app's pages and the features on each. Pages
are described in visual order (generally top to bottom).

General requirements:
- Nav at top of page
  - DataBuild logo in top left
  - Links to each list page:
    - Wants list page
    - Jobs list page
    - Build requests list page
    - Triggers list page
    - Build event log page
  - Graph label at top right
  - Search box for finding builds, jobs, and partitions (needs a new service API?)
### Home Page
Jumping-off point for navigating and building.
- A text box with "Analyze" and "Build" buttons for analyzing or building the entered partitions (autocomplete would
  help, and accepting PartitionRef patterns would mean less typing and more safety)
- List of recent builds with their requested partitions and current status, with link to build request page
- List of recently attempted partitions, with status, link to partition page, and link to build request page
- List of jobs, with (colored) last week success ratio, and link to job page
### Build Request Page
- Build request ID, overall build status (colored), and "Cancel" button at top
- Progress bar indicating the number of needs-build partitions, building partitions, non-live delegated partitions, and
  live partitions
- Summary information table
  - Requested at
  - Analyze time (with datetime range)
  - Build time (with datetime range)
  - Number of tasks in each state (omit states with a count of 0)
  - Number of partitions in each state (omit states with a count of 0)
- Graph diagram of the job graph (collapsible)
  - With each job and partition status color-coded and linked to the related run or partition
- [paginated](#build-event-log-pagination) list of related build events at bottom
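
The progress bar and the zero-omitting counts in the summary table can share one aggregation. A minimal sketch, assuming a hypothetical `BarState` enum holding the four categories listed above, declared in the order the bar draws them; counting through a map naturally omits states with no partitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical partition categories for the progress bar, in drawing order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BarState {
    NeedsBuild,
    Building,
    DelegatedNotLive,
    Live,
}

/// Count partitions per state. Only states that occur get an entry, so
/// zero-count states are omitted without a separate filtering step.
pub fn nonzero_counts(states: &[BarState]) -> BTreeMap<BarState, usize> {
    let mut counts = BTreeMap::new();
    for state in states {
        *counts.entry(*state).or_insert(0) += 1;
    }
    counts
}

/// Width of each progress-bar segment as a percentage of all partitions,
/// in drawing order (the map is sorted by `BarState`'s declaration order).
pub fn segment_percentages(counts: &BTreeMap<BarState, usize>) -> Vec<(BarState, f64)> {
    let total: usize = counts.values().sum();
    if total == 0 {
        return Vec::new();
    }
    counts
        .iter()
        .map(|(state, n)| (*state, *n as f64 * 100.0 / total as f64))
        .collect()
}
```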
### Job Status Page
- Job label
- "Recent Runs" select, controlling page size
- "Recent Runs Page" control: the `< 1 2 3 ... N >` style paginator
- Job success rate across all selected runs (colored)
- Bar graph of job execution run times for the last N runs (N selectable: 31, 100, or 365)
- Recent task runs
  - With links to build request, task run, and partition
  - With task result
  - With run time
  - With expandable partition metadata
- [paginated](#build-event-log-pagination) list of related build events at bottom
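
The colored success rate (here and in the home page's job list) reduces to a ratio over the selected runs plus a bucketing rule. A sketch; the `TaskResult` variants and the 95%/80% color thresholds are placeholder assumptions, and counting cancelled runs as unsuccessful is itself a design choice still to be settled.

```rust
/// Hypothetical task outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskResult {
    Succeeded,
    Failed,
    Cancelled,
}

/// Color bucket for display.
#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Green,
    Yellow,
    Red,
    NoData,
}

/// Fraction of runs that succeeded; `None` when there are no runs to rate.
/// Cancelled runs count against the ratio in this sketch.
pub fn success_ratio(results: &[TaskResult]) -> Option<f64> {
    if results.is_empty() {
        return None;
    }
    let ok = results.iter().filter(|r| **r == TaskResult::Succeeded).count();
    Some(ok as f64 / results.len() as f64)
}

/// Map a ratio to a color; the thresholds are placeholders.
pub fn health(ratio: Option<f64>) -> Health {
    match ratio {
        None => Health::NoData,
        Some(r) if r >= 0.95 => Health::Green,
        Some(r) if r >= 0.80 => Health::Yellow,
        Some(_) => Health::Red,
    }
}
```

Returning `None` for an empty selection lets the page show "no data" instead of a misleading 0% in red.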
### Task Run Page
- Job label, task status, and "Cancel" button at top
- Summary information table
  - Task run ID
  - Output/input partitions
  - Task start and end time
  - Task duration
- Graph similar to the one on the [build request page](#build-request-page), with all partitions and jobs not involved
  in this task rendered translucent (expandable)
- [paginated](#build-event-log-pagination) table of build events at bottom
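
Deciding which graph nodes stay opaque is a set-membership check against the task's job and its input/output partitions. A minimal sketch; `Node`, `TaskRun`, and the 0.25 opacity for uninvolved nodes are hypothetical choices, not part of the spec.

```rust
use std::collections::HashSet;

/// Hypothetical node in the build request graph.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Node {
    Job(String),
    Partition(String),
}

/// Hypothetical view of a task run: the job that ran and the partitions it touched.
pub struct TaskRun {
    pub job_label: String,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
}

/// Nodes drawn at full opacity on the task run page.
pub fn involved_nodes(task: &TaskRun) -> HashSet<Node> {
    let mut nodes = HashSet::new();
    nodes.insert(Node::Job(task.job_label.clone()));
    for partition in task.inputs.iter().chain(task.outputs.iter()) {
        nodes.insert(Node::Partition(partition.clone()));
    }
    nodes
}

/// Opacity for any node in the graph; everything not involved is translucent.
pub fn opacity(involved: &HashSet<Node>, node: &Node) -> f32 {
    if involved.contains(node) { 1.0 } else { 0.25 }
}
```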
### Partition Status Page
- With PartitionRef, link to matching [PartitionPattern](#partitionpattern-page), color-coded status, and "Build" button at top
- List of tasks that produced this partition
- [paginated](#build-event-log-pagination) list of related build events at bottom
### PartitionPattern Page
- Paginated table of partitions that match this partition pattern, sortable by column, including:
  - Partition ref (with link)
  - Partition pattern values
  - Partition status
  - Build request link
  - Task link (with run time next to it)
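
The "Partition pattern values" column implies extracting the variable parts of each partition ref. A sketch, assuming a hypothetical pattern syntax in which `{name}` fills exactly one `/`-separated segment (the real PartitionPattern syntax may differ); values are keyed by placeholder name so each can become its own sortable column.

```rust
use std::collections::BTreeMap;

/// Extract placeholder values from `partition_ref` under a hypothetical syntax
/// where `{name}` matches one whole, non-empty `/`-separated segment.
/// Returns `None` if the ref does not match the pattern.
pub fn pattern_values(pattern: &str, partition_ref: &str) -> Option<BTreeMap<String, String>> {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let ref_segments: Vec<&str> = partition_ref.split('/').collect();
    if pattern_segments.len() != ref_segments.len() {
        return None;
    }
    let mut values = BTreeMap::new();
    for (p, r) in pattern_segments.iter().zip(ref_segments.iter()) {
        match p.strip_prefix('{').and_then(|s| s.strip_suffix('}')) {
            Some(_) if r.is_empty() => return None, // placeholders need a value
            Some(name) => {
                values.insert(name.to_string(), r.to_string());
            }
            None if p == r => {} // literal segment matches
            None => return None, // literal segment differs
        }
    }
    Some(values)
}
```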
### Triggers List Page
- Paginated list of registered triggers
  - With link to trigger detail page
  - With expandable list of produced build requests or wants
### Trigger Detail Page
- Trigger name, last run time, and "Trigger" button at top
- Trigger history table, including:
  - Trigger time
  - Trigger result (successful/failed)
  - Partitions or wants requested
### Wants List Page
### Want Detail Page
### Build Event Log Page
Some users will want to inspect the raw event log directly.
- A [paginated](#build-event-log-pagination) list of build event log entries
### Build Event Log Pagination
This element appears on most pages and should be reusable for any given set of events and filters.
- Table with headers for significant fields, sorted by timestamp by default
  - With timestamp, event ID, and message fields
  - With color-coded event type
  - With links to build requests, jobs, and partitions where IDs are present
  - With expandable details showing the preformatted JSON event contents
  - With the `< 1 2 3 ... N >` style paginator
- Page size of 100
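
A sketch of the paginator logic using the page size of 100 specified above. The `PageItem` type, the `window` parameter (pages shown on each side of the current page), and always showing the first and last page are assumptions about the exact layout.

```rust
/// One item in a `< 1 2 3 ... N >` style paginator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageItem {
    Prev,
    Page(usize),
    Ellipsis,
    Next,
}

pub const PAGE_SIZE: usize = 100;

/// Number of pages for `total_events` events; at least one, so an empty log still renders page 1.
pub fn page_count(total_events: usize) -> usize {
    ((total_events + PAGE_SIZE - 1) / PAGE_SIZE).max(1)
}

/// Paginator items for the 1-based `current` page (expected in `1..=total_pages`):
/// the first and last page, `window` pages on each side of `current`, and an
/// ellipsis wherever pages are skipped.
pub fn paginator(current: usize, total_pages: usize, window: usize) -> Vec<PageItem> {
    let mut items = Vec::new();
    if current > 1 {
        items.push(PageItem::Prev);
    }
    let lo = current.saturating_sub(window).max(1);
    let hi = (current + window).min(total_pages);
    let mut last_shown = 0;
    for page in 1..=total_pages {
        if page == 1 || page == total_pages || (lo..=hi).contains(&page) {
            if page > last_shown + 1 {
                items.push(PageItem::Ellipsis);
            }
            items.push(PageItem::Page(page));
            last_shown = page;
        }
    }
    if current < total_pages {
        items.push(PageItem::Next);
    }
    items
}
```

Keeping this as plain data (rather than rendered markup) fits the requirement that the element be reusable across pages with different event filters.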