CLI-Server Automation
This document describes how the DataBuild CLI automatically manages the HTTP server lifecycle, providing a "magical" experience where users don't need to think about starting or stopping servers.
Goals
- Zero-config startup: Running databuild want data/alpha should "just work" without manual server management
- Workspace isolation: Multiple graphs can run independently with separate servers and databases
- Resource efficiency: Servers auto-shutdown after idle timeout
- Transparency: Users can inspect server state and logs when needed
Design Overview
Architecture
┌─────────────────────────────────────────────────────────────┐
│ CLI Process │
│ databuild want data/alpha │
│ │
│ 1. Load config (databuild.json) │
│ 2. Check .databuild/${graph_label}/server.lock │
│ 3. If not running → daemonize server │
│ 4. Forward request to http://localhost:${port}/api/wants │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Daemonized Server │
│ PID: 12345, Port: 8080 │
│ │
│ - Holds file lock on server.lock │
│ - Writes logs to server.log │
│ - Auto-shutdown after idle_timeout_seconds │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ .databuild/${graph_label}/ │
│ │
│ server.lock - Lock file + runtime state (JSON) │
│ bel.sqlite - Build Event Log database │
│ server.log - Server stdout/stderr │
└─────────────────────────────────────────────────────────────┘
Directory Structure
project/
├── databuild.json # User-authored config
├── .databuild/
│ └── ${graph_label}/ # e.g., "podcast_reviews"
│ ├── server.lock # Runtime state + file lock
│ ├── bel.sqlite # Build Event Log (SQLite)
│ └── server.log # Server logs
Configuration
Extended Config Schema
The databuild.json (or custom config file) is extended with:
{
"graph_label": "podcast_reviews",
"idle_timeout_seconds": 3600,
"jobs": [
{
"label": "//examples:daily_summaries",
"entrypoint": "./jobs/daily_summaries.sh",
"environment": { "OUTPUT_DIR": "/data/output" },
"partition_patterns": ["daily_summaries/.*"]
}
]
}
| Field | Type | Default | Description |
|---|---|---|---|
| graph_label | string | required | Unique identifier for this graph, used for .databuild/${graph_label}/ directory |
| idle_timeout_seconds | u64 | 3600 | Server auto-shutdown after this many seconds of inactivity |
| jobs | array | [] | Job configurations (existing schema) |
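As a rough illustration of how the extended schema might map onto Rust, the sketch below models the config as a plain struct that applies the documented defaults and derives the per-graph paths from graph_label. The JobConfig shape, the DEFAULT_IDLE_TIMEOUT_SECONDS constant, and the helper methods are hypothetical; JSON parsing is left out to keep the example dependency-free.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Documented default for idle_timeout_seconds.
pub const DEFAULT_IDLE_TIMEOUT_SECONDS: u64 = 3600;

/// One entry of the existing "jobs" schema (fields taken from the example config).
#[derive(Debug, Clone)]
pub struct JobConfig {
    pub label: String,
    pub entrypoint: String,
    pub environment: HashMap<String, String>,
    pub partition_patterns: Vec<String>,
}

/// Hypothetical in-memory form of databuild.json with the new fields.
#[derive(Debug, Clone)]
pub struct DatabuildConfig {
    /// Required; names the .databuild/${graph_label}/ directory.
    pub graph_label: String,
    /// Optional; defaults to DEFAULT_IDLE_TIMEOUT_SECONDS.
    pub idle_timeout_seconds: u64,
    /// Optional; defaults to an empty list.
    pub jobs: Vec<JobConfig>,
}

impl DatabuildConfig {
    /// Builds a config with the documented defaults for the optional fields.
    pub fn new(graph_label: &str) -> Result<Self, String> {
        if graph_label.is_empty() {
            return Err("graph_label is required".to_string());
        }
        Ok(DatabuildConfig {
            graph_label: graph_label.to_string(),
            idle_timeout_seconds: DEFAULT_IDLE_TIMEOUT_SECONDS,
            jobs: Vec::new(),
        })
    }

    /// Per-graph state directory, relative to the project root.
    pub fn state_dir(&self) -> PathBuf {
        PathBuf::from(".databuild").join(&self.graph_label)
    }

    pub fn lock_path(&self) -> PathBuf {
        self.state_dir().join("server.lock")
    }

    pub fn bel_path(&self) -> PathBuf {
        self.state_dir().join("bel.sqlite")
    }

    pub fn log_path(&self) -> PathBuf {
        self.state_dir().join("server.log")
    }
}
```

A real implementation would more likely derive serde's Deserialize and let a #[serde(default = "...")] attribute on idle_timeout_seconds supply the 3600 default.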
Runtime State (server.lock)
The server.lock file serves two purposes:
- File lock: Prevents multiple servers for the same graph
- Runtime state: Contains current server information
{
"pid": 12345,
"port": 8080,
"started_at": 1701234567890,
"config_hash": "sha256:abc123..."
}
| Field | Description |
|---|---|
| pid | Server process ID |
| port | HTTP port the server is listening on |
| started_at | Unix timestamp (milliseconds) when server started |
| config_hash | Hash of config file contents (for detecting config changes) |
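A minimal sketch of a ServerLock type that writes and reads this JSON, assuming pid fits in u32 and port in u16. To stay std-only it hand-writes the JSON and uses a deliberately narrow parser that only understands the exact shape it writes; a real implementation would more likely use serde_json. The field names follow the table; the helper names are illustrative.

```rust
/// Hypothetical in-memory form of the server.lock JSON.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerLock {
    pub pid: u32,
    pub port: u16,
    /// Unix timestamp in milliseconds.
    pub started_at: u64,
    pub config_hash: String,
}

impl ServerLock {
    /// Serializes to the JSON shape shown above.
    pub fn to_json(&self) -> String {
        format!(
            "{{\n  \"pid\": {},\n  \"port\": {},\n  \"started_at\": {},\n  \"config_hash\": \"{}\"\n}}\n",
            self.pid, self.port, self.started_at, self.config_hash
        )
    }

    /// Parses the shape written by to_json; not a general JSON parser.
    pub fn from_json(text: &str) -> Option<ServerLock> {
        Some(ServerLock {
            pid: field(text, "pid")?.parse::<u32>().ok()?,
            port: field(text, "port")?.parse::<u16>().ok()?,
            started_at: field(text, "started_at")?.parse::<u64>().ok()?,
            config_hash: field(text, "config_hash")?.trim_matches('"').to_string(),
        })
    }
}

/// Returns the raw value text after `"key":`, up to the next comma, newline, or brace.
fn field<'a>(text: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":", key);
    let start = text.find(&needle)? + needle.len();
    let rest = &text[start..];
    let end = rest
        .find(|c: char| c == ',' || c == '\n' || c == '}')
        .unwrap_or(rest.len());
    Some(rest[..end].trim())
}
```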
CLI Commands
Existing Commands (Enhanced)
All commands that interact with the server now start it automatically if it is not already running:
# Creates want, auto-starting server if not running
databuild want data/alpha data/beta
# Lists wants, auto-starting server if not running
databuild wants list
# Lists partitions
databuild partitions list
# Lists job runs
databuild job-runs list
New Commands
# Explicitly start server (for users who want manual control)
databuild serve
databuild serve --config ./custom-config.json
# Show server status
databuild status
# Graceful shutdown
databuild stop
Command: databuild status
Shows current server state:
DataBuild Server Status
━━━━━━━━━━━━━━━━━━━━━━━━
Graph: podcast_reviews
Status: Running
PID: 12345
Port: 8080
Uptime: 2h 34m
Database: .databuild/podcast_reviews/bel.sqlite
Active Job Runs: 2
Pending Wants: 5
Command: databuild stop
Gracefully shuts down the server:
$ databuild stop
Stopping DataBuild server (PID 12345)...
Server stopped.
Server Lifecycle
Startup Flow
CLI invocation (e.g., databuild want data/alpha)
│
▼
Load databuild.json (or --config path)
│
▼
Extract graph_label from config
│
▼
Ensure .databuild/${graph_label}/ exists
│
▼
Try flock(server.lock, LOCK_EX | LOCK_NB)
│
├─── Lock acquired → Server not running (or crashed)
│ │
│ ▼
│ Find available port (start from 3538, increment if busy)
│ │
│ ▼
│ Daemonize: fork → setsid → fork → redirect I/O
│ │
│ ▼
│ Child: Start server, hold lock, write server.lock JSON
│ Parent: Wait for health check, then proceed
│
└─── Lock blocked → Server already running
│
▼
Read port from server.lock
│
▼
Health check (GET /health)
│
├─── Success → Use this server
│
└─── Failure → Wait and retry (server starting up)
│
▼
Forward request to http://localhost:${port}/api/...
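The first decision in this flow, a non-blocking exclusive flock(), can be sketched in Rust as follows. This is a Unix-only sketch under stated assumptions: try_lock_server is a hypothetical name, flock is declared directly against the C library (a real implementation would more likely go through the libc or nix crate), and the LOCK_EX/LOCK_NB values are those used on Linux and macOS. Because the kernel drops a flock() lock when the holding process exits, a crashed server leaves the lock free, which is what lets the CLI read "lock acquired" as "server not running (or crashed)".

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Declared against the platform C library to stay std-only (assumption:
// a real implementation would use the libc or nix crate instead).
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

// Values from <sys/file.h> on Linux and macOS.
const LOCK_EX: c_int = 2;
const LOCK_NB: c_int = 4;

/// Tries to take an exclusive, non-blocking lock on `path`.
/// Ok(Some(file)): lock acquired; it is held for as long as `file` stays open.
/// Ok(None): another open file description holds it (server already running).
pub fn try_lock_server(path: &Path) -> io::Result<Option<File>> {
    let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    // SAFETY: `file` owns a valid descriptor for the duration of the call.
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
    if rc == 0 {
        return Ok(Some(file));
    }
    let err = io::Error::last_os_error();
    if err.kind() == io::ErrorKind::WouldBlock {
        Ok(None) // EWOULDBLOCK: lock is held elsewhere
    } else {
        Err(err)
    }
}
```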
Daemonization
The server daemonizes using the classic double-fork pattern:
- First fork: Parent returns immediately to CLI
- setsid(): Become session leader, detach from terminal
- Second fork: The grandchild is not a session leader, so it cannot re-acquire a controlling terminal
- Redirect I/O: stdout/stderr → server.log, stdin → /dev/null
- Write lock file: PID, port, started_at, config_hash
- Start serving: Hold file lock for lifetime of process
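The double fork needs raw fork() and setsid() calls, which Rust's standard library does not expose. The sketch below swaps in a simpler, std-only technique: spawn the server as a child process with stdin from /dev/null, stdout and stderr appended to server.log, and its own process group via process_group(0). This is an approximation rather than the double-fork pattern itself: the child stays in the CLI's session, but Ctrl-C in the CLI's terminal is no longer delivered to it. spawn_detached is a hypothetical helper name.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::process::CommandExt;
use std::path::Path;
use std::process::{Child, Command, Stdio};

/// Spawns `program args` with stdin from /dev/null and stdout/stderr
/// appended to `log_path`, in a new process group (not a new session).
pub fn spawn_detached(program: &str, args: &[&str], log_path: &Path) -> io::Result<Child> {
    let log: File = OpenOptions::new().create(true).append(true).open(log_path)?;
    let log_err = log.try_clone()?; // second handle so stderr shares the same log
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::from(log))
        .stderr(Stdio::from(log_err))
        // Terminal-generated SIGINT goes to the foreground process group,
        // which the child has now left.
        .process_group(0)
        .spawn()
}
```

The CLI could call it with its own binary path (std::env::current_exe()) and the serve subcommand, assuming serve runs in the foreground when invoked that way, and then drop the returned Child without waiting; dropping a Child does not kill the process. Rust opens files with close-on-exec, so a server.lock descriptor held by the CLI is not inherited across the spawn; the server would need to take the lock itself.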
Idle Timeout
The server monitors request activity:
- Track last_request_time (updated on each HTTP request)
- Background task checks every 60 seconds
- If now - last_request_time > idle_timeout_seconds → graceful shutdown
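The bookkeeping behind these rules is small. The sketch below keeps last_request_time as a Unix timestamp in seconds inside an AtomicU64 so request handlers can update it without a mutex. IdleTracker, now_secs, and run_idle_checker are hypothetical names, and the check interval is a parameter; a real server would pass Duration::from_secs(60).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Current Unix time in whole seconds (0 if the clock is before 1970).
pub fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Shared between request handlers (touch) and the background checker.
pub struct IdleTracker {
    last_request_secs: AtomicU64,
    idle_timeout_secs: u64,
}

impl IdleTracker {
    pub fn new(idle_timeout_secs: u64, now: u64) -> Self {
        IdleTracker { last_request_secs: AtomicU64::new(now), idle_timeout_secs }
    }

    /// Called on each HTTP request; fetch_max never moves the timestamp backwards.
    pub fn touch(&self, now: u64) {
        self.last_request_secs.fetch_max(now, Ordering::Relaxed);
    }

    /// True once now - last_request_time > idle_timeout_seconds.
    pub fn should_shutdown(&self, now: u64) -> bool {
        let last = self.last_request_secs.load(Ordering::Relaxed);
        now.saturating_sub(last) > self.idle_timeout_secs
    }
}

/// Sleeps `interval` between checks and calls `on_idle` once the timeout is exceeded.
pub fn run_idle_checker(tracker: &IdleTracker, interval: Duration, on_idle: impl FnOnce()) {
    loop {
        thread::sleep(interval);
        if tracker.should_shutdown(now_secs()) {
            break;
        }
    }
    on_idle();
}
```

Passing the current time into should_shutdown keeps the decision deterministic and easy to test independently of the clock.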
Graceful Shutdown
On shutdown (idle timeout, SIGTERM, or databuild stop):
- Stop accepting new connections
- Wait for in-flight requests to complete (with timeout)
- Signal orchestrator to stop
- Wait for orchestrator thread to finish
- Release file lock (automatic on process exit)
- Exit
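The ordering of these steps can be sketched with a few atomics and a join handle. In this hypothetical sketch, ShutdownState, spawn_orchestrator, and graceful_shutdown are illustrative names; the accept loop is assumed to check accepting and each request handler to increment and decrement in_flight, and the orchestrator is reduced to a placeholder loop. The drain timeout is left to the caller because the design does not fix a value.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

pub struct ShutdownState {
    pub accepting: AtomicBool,         // checked by the (assumed) accept loop
    pub in_flight: AtomicUsize,        // maintained by the (assumed) request handlers
    pub stop_orchestrator: AtomicBool, // polled by the orchestrator thread
}

impl ShutdownState {
    pub fn new() -> Arc<ShutdownState> {
        Arc::new(ShutdownState {
            accepting: AtomicBool::new(true),
            in_flight: AtomicUsize::new(0),
            stop_orchestrator: AtomicBool::new(false),
        })
    }
}

/// Placeholder orchestrator: loops until asked to stop.
pub fn spawn_orchestrator(state: Arc<ShutdownState>) -> JoinHandle<()> {
    thread::spawn(move || {
        while !state.stop_orchestrator.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5)); // stand-in for one orchestrator step
        }
    })
}

/// Runs the shutdown sequence; returns false if in-flight requests did not drain in time.
pub fn graceful_shutdown(state: &ShutdownState, orchestrator: JoinHandle<()>, drain_timeout: Duration) -> bool {
    // Stop accepting new connections.
    state.accepting.store(false, Ordering::SeqCst);
    // Wait for in-flight requests, bounded by the timeout.
    let deadline = Instant::now() + drain_timeout;
    while state.in_flight.load(Ordering::SeqCst) > 0 && Instant::now() < deadline {
        thread::sleep(Duration::from_millis(10));
    }
    let drained = state.in_flight.load(Ordering::SeqCst) == 0;
    // Signal the orchestrator, then wait for its thread to finish.
    state.stop_orchestrator.store(true, Ordering::SeqCst);
    let _ = orchestrator.join();
    drained
}
```

Releasing the file lock needs no code: when the process exits, the server.lock descriptor is closed and the kernel releases the flock() lock.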
Port Selection
When starting a new server:
- Start with default port 3538
- Try to bind; if port in use, increment and retry
- Store selected port in server.lock
- CLI reads port from lock file, not from config
This handles the case where the preferred port is occupied by another process.
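A sketch of this probe-and-increment loop, assuming IPv4 localhost and an attempt cap that the design does not specify (max_attempts and bind_first_free are hypothetical). It returns the bound TcpListener together with its port, so the server can keep serving on the socket it probed with instead of releasing it and re-binding, which would leave a window for another process to take the port.

```rust
use std::net::TcpListener;

/// Preferred starting port from the design.
pub const DEFAULT_PORT: u16 = 3538;

/// Tries start, start + 1, ... on 127.0.0.1 and returns the first port that
/// binds, with its listener. Any bind error is treated as "port busy".
pub fn bind_first_free(start: u16, max_attempts: u16) -> Option<(u16, TcpListener)> {
    for offset in 0..max_attempts {
        let port = start.checked_add(offset)?; // stop instead of wrapping past 65535
        if let Ok(listener) = TcpListener::bind(("127.0.0.1", port)) {
            return Some((port, listener));
        }
    }
    None
}
```

A server would call bind_first_free(DEFAULT_PORT, ...) and write the returned port into server.lock.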
Config Change Detection
The config_hash field in server.lock enables detecting when the config file has changed since the server started:
- On CLI invocation, compute hash of current config file
- Compare with config_hash in server.lock
- If different, warn user:
Warning: Config has changed since server started. Run 'databuild stop && databuild serve' to apply changes.
We don't auto-restart because that could interrupt in-progress builds.
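A sketch of the comparison step. The sha256: prefix in server.lock points to SHA-256, which Rust's standard library does not provide, so this example swaps in 64-bit FNV-1a purely to stay dependency-free; a real implementation would use a SHA-256 crate and keep the sha256: prefix. config_hash and config_change_warning are hypothetical names.

```rust
/// 64-bit FNV-1a over the raw config bytes (stand-in for SHA-256).
pub fn config_hash(contents: &[u8]) -> String {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in contents {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    format!("fnv1a64:{:016x}", hash)
}

/// Returns the warning the CLI would print, or None if the config is unchanged.
pub fn config_change_warning(current_contents: &[u8], hash_in_lock: &str) -> Option<String> {
    if config_hash(current_contents) == hash_in_lock {
        None
    } else {
        Some(
            "Warning: Config has changed since server started. \
             Run 'databuild stop && databuild serve' to apply changes."
                .to_string(),
        )
    }
}
```

Hashing the raw file bytes means a whitespace-only edit also triggers the warning, which errs on the side of telling the user.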
Error Handling
Stale Lock File
If server.lock exists but the lock is not held (process crashed):
- Delete the stale server.lock
- Proceed with normal startup
Server Unreachable
If the lock is held but the health check fails repeatedly:
- Log warning: "Server appears unresponsive"
- After N retries, suggest: "Try 'kill -9 ${pid}' and retry"
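The retry loop can be sketched with a raw HTTP/1.1 request over std's TcpStream, assuming the server answers GET /health with a 200 status line. health_check, wait_for_health, the 500 ms per-probe timeout, and the retry parameters are all hypothetical, since the design leaves N open. The same wait_for_health call also fits the "Parent: Wait for health check" step of the startup flow.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// One GET /health probe against 127.0.0.1:port.
/// Healthy means the response starts with an HTTP/1.x 200 status line.
pub fn health_check(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(s) => s,
        Err(_) => return false, // nothing listening (yet)
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let request = "GET /health HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // Read until we have the 12 bytes of "HTTP/1.1 200", EOF, or a timeout.
    let mut head = Vec::new();
    let mut buf = [0u8; 64];
    while head.len() < 12 {
        match stream.read(&mut buf) {
            Ok(0) | Err(_) => break,
            Ok(n) => head.extend_from_slice(&buf[..n]),
        }
    }
    head.starts_with(b"HTTP/1.1 200") || head.starts_with(b"HTTP/1.0 200")
}

/// Probes up to `retries` times, sleeping `delay` between attempts. A false
/// result is where the CLI would report "Server appears unresponsive".
pub fn wait_for_health(port: u16, retries: u32, delay: Duration) -> bool {
    for attempt in 0..retries {
        if health_check(port, Duration::from_millis(500)) {
            return true;
        }
        if attempt + 1 < retries {
            thread::sleep(delay);
        }
    }
    false
}
```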
Port Conflict
If the preferred port is in use:
- Automatically try the next port (3539, 3540, ...)
- Store the actual port in server.lock
- CLI reads from the lock file, so it always connects to the correct port
Future Considerations
Multi-Graph Scenarios
The graph_label based directory structure supports multiple graphs in the same workspace. Each graph has independent:
- Server process
- Port allocation
- BEL database
- Idle timeout
Remote Servers
The current design assumes localhost. Future extensions could support:
- Remote server URLs in config
- SSH tunneling
- Cloud-hosted servers
Job Re-entrance
Currently, if a server crashes mid-build, job runs are orphaned. Future work:
- Detect orphaned job runs on startup
- Resume or mark as failed
- Track external job processes (e.g., Databricks jobs)
Implementation Checklist
- Extend DatabuildConfig with graph_label and idle_timeout_seconds
- Create ServerLock struct for reading/writing lock file
- Implement file locking with flock()
- Implement daemonization (double-fork pattern)
- Add auto-start logic to existing CLI commands
- Add databuild stop command
- Add databuild status command
- Update example configs with graph_label
- Add integration tests for server lifecycle