# CLI-Server Automation

This document describes how the DataBuild CLI automatically manages the HTTP server lifecycle, providing a "magical" experience where users don't need to think about starting or stopping servers.

## Goals

1. **Zero-config startup**: Running `databuild want data/alpha` should "just work" without manual server management
2. **Workspace isolation**: Multiple graphs can run independently with separate servers and databases
3. **Resource efficiency**: Servers auto-shutdown after idle timeout
4. **Transparency**: Users can inspect server state and logs when needed

## Design Overview

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         CLI Process                         │
│                  databuild want data/alpha                  │
│                                                             │
│  1. Load config (databuild.json)                            │
│  2. Check .databuild/${graph_label}/server.lock             │
│  3. If not running → daemonize server                       │
│  4. Forward request to http://localhost:${port}/api/wants   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Daemonized Server                      │
│                   PID: 12345, Port: 8080                    │
│                                                             │
│  - Holds file lock on server.lock                           │
│  - Writes logs to server.log                                │
│  - Auto-shutdown after idle_timeout_seconds                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 .databuild/${graph_label}/                  │
│                                                             │
│  server.lock  - Lock file + runtime state (JSON)            │
│  bel.sqlite   - Build Event Log database                    │
│  server.log   - Server stdout/stderr                        │
└─────────────────────────────────────────────────────────────┘
```

### Directory Structure

```
project/
├── databuild.json           # User-authored config
├── .databuild/
│   └── ${graph_label}/      # e.g., "podcast_reviews"
│       ├── server.lock      # Runtime state + file lock
│       ├── bel.sqlite       # Build Event Log (SQLite)
│       └── server.log       # Server logs
```

## Configuration

### Extended Config Schema

The `databuild.json` (or custom config file) is extended with:

```json
{
  "graph_label": "podcast_reviews",
  "idle_timeout_seconds": 3600,
  "jobs": [
    {
      "label": "//examples:daily_summaries",
      "entrypoint": "./jobs/daily_summaries.sh",
      "environment": {
        "OUTPUT_DIR": "/data/output"
      },
      "partition_patterns": ["daily_summaries/.*"]
    }
  ]
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `graph_label` | string | **required** | Unique identifier for this graph, used for `.databuild/${graph_label}/` directory |
| `idle_timeout_seconds` | u64 | 3600 | Server auto-shutdown after this many seconds of inactivity |
| `jobs` | array | [] | Job configurations (existing schema) |

### Runtime State (server.lock)

The `server.lock` file serves dual purposes:

1. **File lock**: Prevents multiple servers for the same graph
2. **Runtime state**: Contains current server information

```json
{
  "pid": 12345,
  "port": 8080,
  "started_at": 1701234567890,
  "config_hash": "sha256:abc123..."
}
```

| Field | Description |
|-------|-------------|
| `pid` | Server process ID |
| `port` | HTTP port the server is listening on |
| `started_at` | Unix timestamp (milliseconds) when server started |
| `config_hash` | Hash of config file contents (for detecting config changes) |

## CLI Commands

### Existing Commands (Enhanced)

All commands that interact with the server now auto-start if needed:

```bash
# Creates want, auto-starting server if not running
databuild want data/alpha data/beta

# Lists wants, auto-starting server if not running
databuild wants list

# Lists partitions
databuild partitions list

# Lists job runs
databuild job-runs list
```

### New Commands

```bash
# Explicitly start server (for users who want manual control)
databuild serve
databuild serve --config ./custom-config.json

# Show server status
databuild status

# Graceful shutdown
databuild stop
```

### Command: `databuild status`

Shows current server state:

```
DataBuild Server Status
━━━━━━━━━━━━━━━━━━━━━━━━
Graph:        podcast_reviews
Status:       Running
PID:          12345
Port:         8080
Uptime:       2h 34m
Database:     .databuild/podcast_reviews/bel.sqlite

Active Job Runs: 2
Pending Wants:   5
```

### Command: `databuild stop`

Gracefully shuts down the server:

```bash
$ databuild stop
Stopping DataBuild server (PID 12345)...
Server stopped.
```

## Server Lifecycle

### Startup Flow

```
CLI invocation (e.g., databuild want data/alpha)
        │
        ▼
Load databuild.json (or --config path)
        │
        ▼
Extract graph_label from config
        │
        ▼
Ensure .databuild/${graph_label}/ exists
        │
        ▼
Try flock(server.lock, LOCK_EX | LOCK_NB)
        │
        ├─── Lock acquired → Server not running (or crashed)
        │         │
        │         ▼
        │    Find available port (start from 3538, increment if busy)
        │         │
        │         ▼
        │    Daemonize: fork → setsid → fork → redirect I/O
        │         │
        │         ▼
        │    Child: Start server, hold lock, write server.lock JSON
        │    Parent: Wait for health check, then proceed
        │
        └─── Lock blocked → Server already running
                  │
                  ▼
             Read port from server.lock
                  │
                  ▼
             Health check (GET /health)
                  │
                  ├─── Success → Use this server
                  │
                  └─── Failure → Wait and retry (server starting up)
        │
        ▼
Forward request to http://localhost:${port}/api/...
```

### Daemonization

The server daemonizes using the classic double-fork pattern:

1. **First fork**: Parent returns immediately to CLI
2. **setsid()**: Become session leader, detach from terminal
3. **Second fork**: Prevent re-acquiring terminal
4. **Redirect I/O**: stdout/stderr → `server.log`, stdin → `/dev/null`
5. **Write lock file**: PID, port, started_at, config_hash
6. **Start serving**: Hold file lock for lifetime of process

### Idle Timeout

The server monitors request activity:

1. Track `last_request_time` (updated on each HTTP request)
2. Background task checks every 60 seconds
3. If `now - last_request_time > idle_timeout_seconds` → graceful shutdown

### Graceful Shutdown

On shutdown (idle timeout, SIGTERM, or `databuild stop`):

1. Stop accepting new connections
2. Wait for in-flight requests to complete (with timeout)
3. Signal orchestrator to stop
4. Wait for orchestrator thread to finish
5. Release file lock (automatic on process exit)
6. Exit

## Port Selection

When starting a new server:

1. Start with default port 3538
2. Try to bind; if port in use, increment and retry
3. Store selected port in `server.lock`
4. CLI reads port from lock file, not from config

This handles the case where the preferred port is occupied by another process.

## Config Change Detection

The `config_hash` field in `server.lock` enables detecting when the config file has changed since the server started:

1. On CLI invocation, compute hash of current config file
2. Compare with `config_hash` in `server.lock`
3. If different, warn user:
   ```
   Warning: Config has changed since server started.
   Run 'databuild stop && databuild serve' to apply changes.
   ```

We don't auto-restart because that could interrupt in-progress builds.

## Error Handling

### Stale Lock File

If `server.lock` exists but the lock is not held (process crashed):

1. Delete the stale `server.lock`
2. Proceed with normal startup

### Server Unreachable

If lock is held but health check fails repeatedly:

1. Log warning: "Server appears unresponsive"
2. After N retries, suggest: "Try 'kill -9 ${pid}' and retry"

### Port Conflict

If preferred port is in use:

1. Automatically try next port (3539, 3540, ...)
2. Store actual port in `server.lock`
3. CLI reads from lock file, so it always connects to correct port

## Future Considerations

### Multi-Graph Scenarios

The `graph_label` based directory structure supports multiple graphs in the same workspace. Each graph has independent:

- Server process
- Port allocation
- BEL database
- Idle timeout

### Remote Servers

The current design assumes localhost. Future extensions could support:

- Remote server URLs in config
- SSH tunneling
- Cloud-hosted servers

### Job Re-entrance

Currently, if a server crashes mid-build, job runs are orphaned. Future work:

- Detect orphaned job runs on startup
- Resume or mark as failed
- Track external job processes (e.g., Databricks jobs)

## Implementation Checklist

- [ ] Extend `DatabuildConfig` with `graph_label` and `idle_timeout_seconds`
- [ ] Create `ServerLock` struct for reading/writing lock file
- [ ] Implement file locking with `flock()`
- [ ] Implement daemonization (double-fork pattern)
- [ ] Add auto-start logic to existing CLI commands
- [ ] Add `databuild stop` command
- [ ] Add `databuild status` command
- [ ] Update example configs with `graph_label`
- [ ] Add integration tests for server lifecycle
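
As a starting point for the lock-file, port-selection, and config-hash items above, here is a minimal Python sketch of the mechanics described under Startup Flow, Port Selection, and Config Change Detection. It is illustrative only and not DataBuild's implementation: Python is used purely for brevity, the helper names (`try_acquire_lock`, `write_state`, `read_state`, `find_available_port`, `config_hash`) are hypothetical, and it is Unix-only because it relies on `fcntl.flock`. One deliberate difference from the Stale Lock File steps: the sketch overwrites stale contents in place while holding the lock instead of deleting the file first.

```python
import fcntl
import hashlib
import json
import os
import socket
import time

DEFAULT_PORT = 3538  # default port named in "Port Selection"


def config_hash(config_path):
    """Return 'sha256:<hex>' computed over the raw bytes of the config file."""
    with open(config_path, "rb") as f:
        return "sha256:" + hashlib.sha256(f.read()).hexdigest()


def find_available_port(start=DEFAULT_PORT, attempts=100):
    """Try start, start+1, ... and return the first port that binds on localhost."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port busy, try the next one
            return port
    raise RuntimeError("no free port found")


def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive flock, or None if the lock is held."""
    os.makedirs(os.path.dirname(lock_path) or ".", exist_ok=True)
    # Open without truncating so a running server's state is never clobbered.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    f = os.fdopen(fd, "r+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()  # another open file description holds the lock
        return None
    return f


def write_state(lock_file, port, config_path):
    """Overwrite the lock file's JSON state while the lock stays held."""
    state = {
        "pid": os.getpid(),
        "port": port,
        "started_at": int(time.time() * 1000),  # Unix time in milliseconds
        "config_hash": config_hash(config_path),
    }
    lock_file.seek(0)
    lock_file.truncate()
    json.dump(state, lock_file)
    lock_file.flush()
    return state


def read_state(lock_path):
    """Read the runtime state, as a CLI would when the lock is already held."""
    with open(lock_path) as f:
        return json.load(f)
```

In this sketch, a `None` result from `try_acquire_lock` corresponds to the "Lock blocked" branch, where the CLI would call `read_state` to get the port and compare `config_hash` against the current file. A returned file object corresponds to the "Lock acquired" branch, and it must stay open for the lifetime of the server, since closing it releases the lock.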