# Service
Purpose: Enable a centrally hostable, human-consumable interface for databuild applications, plus efficient cross-graph coordination.

## Architecture
The service provides two primary capabilities:
1. **Human Interface**: Web dashboard and HTTP API for build management and monitoring
2. **Cross-Graph Coordination**: `GraphService` API enabling efficient event-driven coordination between DataBuild instances

## Correctness Strategy
- Rely on databuild.proto and call the same shared code in core
- Fully asserted type safety from core to service to web app
  - Core -- databuild.proto --> service -- openapi --> web app
- No magic strings (how? protobuf doesn't have consts. enum values? code gen over yaml?)

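
One possible answer to the "no magic strings" question is to lean on enum values and keep every string form of a status in a single function. The following is a minimal sketch under stated assumptions, not the project's actual generated code: the `PartitionStatus` variants other than `PartitionAvailable`, the `ALL` constant, and the `as_str` and `parse` helpers are all hypothetical.

```rust
// Hypothetical status enum; in practice it would be generated from databuild.proto.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartitionStatus {
    PartitionRequested,
    PartitionBuilding,
    PartitionAvailable,
    PartitionFailed,
}

impl PartitionStatus {
    /// Every variant, so callers can enumerate statuses without hard-coding names.
    const ALL: [PartitionStatus; 4] = [
        PartitionStatus::PartitionRequested,
        PartitionStatus::PartitionBuilding,
        PartitionStatus::PartitionAvailable,
        PartitionStatus::PartitionFailed,
    ];

    /// The single place where a status is turned into its string name.
    fn as_str(self) -> &'static str {
        match self {
            PartitionStatus::PartitionRequested => "PARTITION_REQUESTED",
            PartitionStatus::PartitionBuilding => "PARTITION_BUILDING",
            PartitionStatus::PartitionAvailable => "PARTITION_AVAILABLE",
            PartitionStatus::PartitionFailed => "PARTITION_FAILED",
        }
    }

    /// Parse a name back into a status; unknown names are rejected rather than guessed.
    fn parse(name: &str) -> Option<PartitionStatus> {
        Self::ALL.iter().copied().find(|status| status.as_str() == name)
    }
}
```

Because `as_str` uses an exhaustive `match`, adding a variant without giving it a name fails to compile, which is the property the bullet above is after.
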
## Cross-Graph Coordination
Services expose the `GraphService` API for cross-graph dependency management:

```rust
trait GraphService {
    // Page through the event log starting from `since_idx`, keeping only events that match `filter`
    async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result<EventPage>;
}
```

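
To make the cursor semantics of `list_events` concrete, here is a synchronous, in-memory sketch. It is only an illustration under assumptions the document does not confirm: events carry a strictly increasing `idx`, `since_idx` is exclusive, and `next_idx` is the index of the last event scanned. The `Event` fields, the `partition_refs` filter field (exact matches only), and the `limit` parameter are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Event {
    idx: i64,              // assumed: position in the append-only event log
    partition_ref: String, // assumed: the partition this event is about
}

#[derive(Debug, Default, Clone)]
struct EventFilter {
    partition_refs: Vec<String>, // hypothetical: exact refs to keep; empty keeps everything
}

#[derive(Debug, PartialEq)]
struct EventPage {
    events: Vec<Event>,
    next_idx: i64, // cursor to pass as `since_idx` on the next call
}

/// Return up to `limit` matching events with `idx` greater than `since_idx`.
/// `log` is assumed to be sorted by `idx`.
fn list_events(log: &[Event], since_idx: i64, filter: &EventFilter, limit: usize) -> EventPage {
    let mut events = Vec::new();
    let mut next_idx = since_idx;
    for event in log.iter().filter(|e| e.idx > since_idx) {
        if events.len() == limit {
            break;
        }
        // Advance the cursor past every scanned event, even filtered-out ones,
        // so the caller never rescans them.
        next_idx = event.idx;
        if filter.partition_refs.is_empty() || filter.partition_refs.contains(&event.partition_ref) {
            events.push(event.clone());
        }
    }
    EventPage { events, next_idx }
}
```

A caller would loop, feeding each page's `next_idx` back in as `since_idx`, until a page comes back empty.
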
### Cross-Graph Usage Pattern
```rust
// Downstream graph subscribing to upstream partitions
struct UpstreamDependency {
    service_url: String,             // e.g., "https://upstream-databuild.corp.com"
    partition_patterns: Vec<String>, // e.g., ["data/users/*", "ml/models/prod/*"]
    last_sync_idx: i64,              // cursor passed as `since_idx` on the next sync
}

// Periodic sync of relevant upstream events.
// Takes `&mut` because the sync cursor is advanced at the end.
async fn sync_upstream_events(upstream: &mut UpstreamDependency) -> Result<()> {
    let client = GraphServiceClient::new(&upstream.service_url);
    let filter = EventFilter {
        partition_patterns: upstream.partition_patterns.clone(),
        ..Default::default()
    };

    let events = client.list_events(upstream.last_sync_idx, filter).await?;

    // Process partition availability events for immediate job triggering
    for event in events.events {
        if let EventType::PartitionEvent(pe) = event.event_type {
            if pe.status_code == PartitionStatus::PartitionAvailable {
                trigger_dependent_jobs(&pe.partition_ref).await?;
            }
        }
    }

    // Advance the cursor only after every event in the page has been handled
    upstream.last_sync_idx = events.next_idx;
    Ok(())
}
```

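
The example patterns (`data/users/*`, `ml/models/prod/*`) suggest glob-style matching, but this document does not define the semantics. The sketch below assumes one possible interpretation, in which `*` matches exactly one `/`-separated segment; `matches_pattern` and `matches_any` are hypothetical helper names, not part of the `GraphService` API.

```rust
/// Returns true when `partition_ref` matches `pattern`, comparing `/`-separated segments.
/// Assumed semantics: `*` matches exactly one whole segment; anything else must match literally.
fn matches_pattern(pattern: &str, partition_ref: &str) -> bool {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let ref_segments: Vec<&str> = partition_ref.split('/').collect();
    pattern_segments.len() == ref_segments.len()
        && pattern_segments
            .iter()
            .zip(&ref_segments)
            .all(|(p, r)| *p == "*" || p == r)
}

/// True when any of the subscribed patterns matches the partition ref.
fn matches_any(patterns: &[String], partition_ref: &str) -> bool {
    patterns.iter().any(|p| matches_pattern(p, partition_ref))
}
```

If `*` should instead match any suffix, including nested segments, the length check would need to be relaxed for a trailing wildcard.
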
## API
The purpose of the API is to enable remote, programmatic interaction with databuild applications, and to host endpoints
needed by the [web app](#web-app).

See [OpenAPI spec](../bazel-bin/databuild/client/openapi.json) (you may need to run
`bazel build //databuild/client:extract_openapi_spec` if it's not found).

## Web App
The web app visualizes databuild application state via features like listing past builds, job statistics,
partition liveness, build request status, etc. This section specifies the hierarchy of functions of the web app. Pages
are described in visual order (generally top to bottom).

General requirements:
- Nav at top of page
  - DataBuild logo in top left
  - Navigation links at the top allowing navigation to each list page:
    - Wants list page
    - Jobs list page
    - Build requests list page
    - Triggers list page
    - Build event log page
  - Graph label at top right
  - Search box for finding builds, jobs, and partitions (needs a new service API?)

### Home Page
Jumping off point to navigate and build.
- A text box, an "Analyze" button, and a "Build" button for doing exactly that (would be great to have autocomplete,
  also PartitionRef patterns would help with ergonomics for less typing / more safety)
- List recent builds with their requested partitions and current status, with link to build request page
- List of recently attempted partitions, with status, link to partition page, and link to build request page
- List of jobs, with (colored) last week success ratio, and link to job page

### Build Request Page
- Show build request ID and overall status of build (colored) and "Cancel" button at top
- Progress bar indicating number of: needs-build partitions, building partitions, non-live delegated partitions, and
  live partitions
- Summary information table
  - Requested at
  - Analyze time (with datetime range)
  - Build time (with datetime range)
  - Number of tasks in each state (don't include states with 0 count)
  - Number of partitions in each state (don't include states with 0 count)
- Show graph diagram of job graph (collapsible)
  - With each job and partition status color coded & linked to related run / partition
- [paginated](#build-event-log-pagination) list of related build events at bottom

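
The per-state counts and the progress bar can share one aggregation. This is a rough sketch only: the `PartitionState` enum mirrors the four categories named for the progress bar, but its variant names, `count_by_state`, and `progress_segments` are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical states, mirroring the four progress bar categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PartitionState {
    NeedsBuild,
    Building,
    DelegatedNotLive,
    Live,
}

/// Count partitions per state. Only states that actually occur get an entry,
/// so states with a count of 0 are left out of the summary automatically.
fn count_by_state(states: &[PartitionState]) -> BTreeMap<PartitionState, usize> {
    let mut counts = BTreeMap::new();
    for state in states {
        *counts.entry(*state).or_insert(0) += 1;
    }
    counts
}

/// Share of each occurring state as a whole-number percentage (rounded down),
/// in enum declaration order, for sizing progress bar segments.
fn progress_segments(counts: &BTreeMap<PartitionState, usize>) -> Vec<(PartitionState, usize)> {
    let total: usize = counts.values().sum();
    counts
        .iter()
        .map(|(state, n)| (*state, n * 100 / total))
        .collect()
}
```

Since the percentages are rounded down, they may sum to slightly less than 100; a real renderer would need to decide where the remainder goes.
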
### Job Status Page
- Job label
- "Recent Runs" select, controlling page size
- "Recent Runs Page" select - the `< 1 2 3 ... N >` style paginator
- Job success rate (for all selected; colored)
- Bar graph showing job execution run times for last N (selectable between 31, 100, 365)
- Recent task runs
  - With links to build request, task run, partition
  - With task result
  - With run time
  - With expandable partition metadata
- [paginated](#build-event-log-pagination) list of related build events at bottom

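
As an illustration of the colored success rate, the sketch below computes the rate over the selected runs and maps it to a color. The `RateColor` enum, both function names, and the 0.95 and 0.80 thresholds are placeholders chosen for the example; the document does not specify them.

```rust
// Hypothetical color buckets for the success rate display.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RateColor {
    Green,
    Yellow,
    Red,
}

/// Fraction of successful runs, or `None` when there are no runs to rate.
/// Each entry in `results` is `true` for a successful run.
fn success_rate(results: &[bool]) -> Option<f64> {
    if results.is_empty() {
        return None;
    }
    let successes = results.iter().filter(|ok| **ok).count();
    Some(successes as f64 / results.len() as f64)
}

/// Map a rate to a display color. The thresholds are placeholders.
fn rate_color(rate: f64) -> RateColor {
    if rate >= 0.95 {
        RateColor::Green
    } else if rate >= 0.80 {
        RateColor::Yellow
    } else {
        RateColor::Red
    }
}
```

Returning `None` for an empty selection lets the page show "no runs" instead of a misleading 0% in red.
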
### Task Run Page
- With job label, task status, and "Cancel" button at top
- Summary information table
  - task run ID
  - output/input partitions
  - task start and end time
  - task duration
- Graph similar to [build request page](#build-request-page), all partitions and jobs not involved in this task made
  translucent (expandable)
- With [paginated](#build-event-log-pagination) table of build events at bottom

### Partition Status Page
- With PartitionRef, link to matching [PartitionPattern](#partitionpattern-page), color-coded status, and "build" button at top
- List of tasks that produced this partition
- [paginated](#build-event-log-pagination) list of related build events at bottom

### PartitionPattern Page
- Paginated table of partitions that match this partition pattern, sortable by cols, including:
  - Partition ref (with link)
  - Partition pattern values
  - Partition status
  - Build request link
  - Task link (with run time next to it)

### Triggers List Page
- Paginated list of registered triggers
  - With link to trigger detail page
  - With expandable list of produced build requests or wants

### Trigger Detail Page
- Trigger name, last run at, and "Trigger" button at top
- Trigger history table, including:
  - Trigger time
  - Trigger result (successful/failed)
  - Partitions or wants requested

### Wants List Page

### Want Detail Page

### Build Event Log Page
Some people want to look at the raw event log.
- A [paginated](#build-event-log-pagination) list of build event log entries

### Build Event Log Pagination
This element is present on most pages, and should be reusable/pluggable for a given set of events/filters.
- Table with headers of significant fields, sorted by timestamp by default
  - With timestamp, event ID, and message field
  - With color coded event type
  - With links to build requests, jobs, and partitions where IDs are present
  - With expandable details that show the preformatted JSON event contents
- With the `< 1 2 3 ... N >` style paginator
- Page size of 100
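
The paging arithmetic behind the `< 1 2 3 ... N >` control can be sketched as follows. Only the page size of 100 comes from this document; the function names and the `lead` parameter (how many leading page numbers to show before the ellipsis) are assumptions made for the example.

```rust
const PAGE_SIZE: usize = 100;

/// Number of pages needed for `total_items`, with at least one (possibly empty) page.
fn page_count(total_items: usize) -> usize {
    std::cmp::max(1, (total_items + PAGE_SIZE - 1) / PAGE_SIZE)
}

/// Half-open item index range `[start, end)` shown on a 1-based `page`.
/// Pages past the end yield an empty range instead of panicking.
fn page_bounds(page: usize, total_items: usize) -> (usize, usize) {
    let start = std::cmp::min((page.max(1) - 1) * PAGE_SIZE, total_items);
    let end = std::cmp::min(start + PAGE_SIZE, total_items);
    (start, end)
}

/// Labels for the paginator: the first `lead` pages, then "..." and the last page.
/// When an ellipsis would hide nothing, every page number is listed instead.
fn paginator_labels(total_pages: usize, lead: usize) -> Vec<String> {
    let mut labels = vec!["<".to_string()];
    if total_pages <= lead + 1 {
        labels.extend((1..=total_pages).map(|p| p.to_string()));
    } else {
        labels.extend((1..=lead).map(|p| p.to_string()));
        labels.push("...".to_string());
        labels.push(total_pages.to_string());
    }
    labels.push(">".to_string());
    labels
}
```

For example, 250 events give three pages, and `page_bounds(3, 250)` selects items 200 up to (but not including) 250.
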