diff --git a/docs/ideas/event-sourced-cpn-framework.md b/docs/ideas/event-sourced-cpn-framework.md
new file mode 100644
index 0000000..d2d6314
--- /dev/null
+++ b/docs/ideas/event-sourced-cpn-framework.md
@@ -0,0 +1,241 @@
+# Event-Sourced CPN Framework
+
+A vision for a Rust library/framework combining event sourcing, Colored Petri Net semantics, and compile-time safety for building correct distributed systems.
+
+## The Problem
+
+In highly connected applications with multiple entity types and relationships (like databuild's Wants, JobRuns, Partitions), developers face combinatorial complexity:
+
+For each edge type between entities, you need:
+1. Forward accessor
+2. Inverse accessor (index)
+3. Index maintenance on creation
+4. Index maintenance on deletion
+5. Consistency checks
+6. Query patterns for traversal
+
+As the number of entities and edges grows, this becomes:
+- Hard to keep in your head
+- Error-prone (forgot to update an index)
+- Lots of boilerplate
+- Testing burden for plumbing rather than business logic
+
+The temptation is to "throw hands up" and use SQL with foreign keys, or accept eventual consistency. But this sacrifices the compile-time guarantees Rust can provide.
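
To make the six items above concrete, here is a minimal hand-written sketch of the bookkeeping for a single edge type (a JobRun servicing many Wants). The names `ServicingWants`, `link`, `unlink`, `wants_of`, `jobs_for`, and `is_consistent` are hypothetical, and ids are plain `u64` values rather than real databuild types. This illustrates the boilerplate the framework would aim to generate; it is not the framework's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical ids; real entities would also carry state.
type JobRunId = u64;
type WantId = u64;

/// One edge type (JobRun -> many Want) with a hand-maintained inverse index.
#[derive(Default)]
pub struct ServicingWants {
    forward: BTreeMap<JobRunId, BTreeSet<WantId>>, // job run -> wants it services
    inverse: BTreeMap<WantId, BTreeSet<JobRunId>>, // want -> job runs servicing it
}

impl ServicingWants {
    /// Index maintenance on creation: both maps must be updated together.
    pub fn link(&mut self, job: JobRunId, want: WantId) {
        self.forward.entry(job).or_default().insert(want);
        self.inverse.entry(want).or_default().insert(job);
    }

    /// Index maintenance on deletion: forgetting either half corrupts the index.
    pub fn unlink(&mut self, job: JobRunId, want: WantId) {
        if let Some(wants) = self.forward.get_mut(&job) {
            wants.remove(&want);
        }
        if let Some(jobs) = self.inverse.get_mut(&want) {
            jobs.remove(&job);
        }
    }

    /// Forward accessor.
    pub fn wants_of(&self, job: JobRunId) -> Vec<WantId> {
        self.forward
            .get(&job)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Inverse accessor.
    pub fn jobs_for(&self, want: WantId) -> Vec<JobRunId> {
        self.inverse
            .get(&want)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Consistency check: every forward pair has its inverse, and vice versa.
    pub fn is_consistent(&self) -> bool {
        let forward_ok = self.forward.iter().all(|(job, wants)| {
            wants
                .iter()
                .all(|w| self.inverse.get(w).map_or(false, |jobs| jobs.contains(job)))
        });
        let inverse_ok = self.inverse.iter().all(|(want, jobs)| {
            jobs.iter()
                .all(|j| self.forward.get(j).map_or(false, |wants| wants.contains(want)))
        });
        forward_ok && inverse_ok
    }
}
```

Every additional edge type needs another copy of roughly this code, which is where the combinatorial growth comes from.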
+
+## The Vision
+
+A framework where developers declare:
+- **Entities** with their valid states (state machines)
+- **Edges** between entities (typed, directional, with cardinality)
+- **Transitions** (what state changes are valid, and when)
+
+And the framework provides:
+- Auto-generated accessors (both directions)
+- Auto-maintained indexes
+- Compile-time invalid transition errors
+- Runtime referential integrity (fail-fast or transactional)
+- Event log as source of truth with replay capability
+- Potential automatic concurrency from CPN place-disjointness
+
+## Why
+
+### Correctness Guarantees
+
+- **Compile-time**: Invalid state transitions are type errors
+- **Compile-time**: Edge definitions guarantee bidirectional navigability
+- **Runtime**: Referential integrity violations detected immediately
+- **Result**: "If it compiles and the event log replays, the state is consistent"
+
+### Performance "For Free"
+
+- Indexes auto-maintained as edges are created/destroyed
+- No query planning needed - traversal patterns known at compile time
+- Potential: CPN place-disjointness → automatic safe concurrency
+
+### Developer Experience
+
+- Declare entities, states, edges, transitions
+- Library generates: accessors, inverse indexes, transition methods, consistency checks
+- Focus on *what* not *how* - the plumbing disappears
+- Still Rust: escape hatch to custom logic when needed
+
+### Testing Burden Reduction
+
+- No tests for "did I update the index correctly"
+- No tests for "can I traverse this relationship backwards"
+- Focus tests on business logic, not graph bookkeeping
+
+## How
+
+### Foundations
+
+- **Colored Petri Nets** for state machine composition semantics
+- **Typestate pattern** for compile-time transition validity
+- **Event sourcing** for persistence and replay
+
+### Implementation Approach
+
+Declarative DSL or proc macros for entity/edge/transition definitions:
+
+```rust
+// Hypothetical syntax
+entity! 
{
+    Want {
+        states: [New, Idle, Building, Successful, Failed, Canceled],
+        transitions: [
+            New -> Idle,
+            New -> Building,
+            Idle -> Building,
+            Building -> Successful,
+            Building -> Failed,
+            // ...
+        ]
+    }
+}
+
+edge! {
+    servicing_wants: JobRun -> many Want,
+    built_by: Partition -> one JobRun,
+}
+```
+
+Code generation produces:
+- Entity structs with state type parameters
+- Edge storage with auto-maintained inverses
+- Transition methods that enforce valid source states
+- Query methods for traversal in both directions
+
+### The Graph Model
+
+- Entities are nodes (with state)
+- Edges are typed, directional, with cardinality (one/many)
+- Both directions always queryable
+- Edge creation/deletion is transactional within a step
+
+### Entry Point
+
+Single `step(event) -> Result<(), StepError>` that:
+1. Validates the event against current state
+2. Applies state transitions
+3. Updates all affected indexes
+4. Returns success or rolls back
+
+## Transactionality
+
+### Beyond Fail-Fast
+
+Instead of panicking on consistency violations, support transactional semantics:
+
+```rust
+// Infallible (panics on error)
+state.step(event);
+
+// Fallible (returns error, state unchanged on failure)
+let result: Result<(), StepError> = state.try_step(event);
+
+// Explicit transaction (for multi-event atomicity)
+let mut txn = state.begin();
+txn.apply(event1)?;
+txn.apply(event2)?;
+txn.commit(); // or rollback on drop
+```
+
+### What This Enables
+
+1. **Local atomicity**: A single event either fully applies or doesn't - no partial states
+
+2. **Distributed coordination**: If `step` can return `Err` instead of panicking:
+   - Try to apply an event
+   - If it fails, coordinate with other systems before retrying
+   - Implement saga patterns, 2PC, etc.
+
+3. **Speculative execution**: "What if I applied this event?" without committing
+   - Useful for validation, dry-runs, conflict detection
+
+4. 
**Optimistic concurrency**:
+   - Multiple workers try to apply events concurrently
+   - Conflicts detected and rolled back
+   - Retry with updated state
+
+### Implementation Options
+
+1. **Copy-on-write / snapshot**: Clone state, apply to clone, swap on success
+   - Simple but memory-heavy for large state
+
+2. **Command pattern / undo log**: Record inverse operations, replay backwards on rollback
+   - More complex, but efficient for small changes to large state
+
+3. **MVCC-style**: Version all entities, only "commit" versions on success
+   - Most sophisticated, enables concurrent reads during transaction
+
+## Relationship to Datomic
+
+[Datomic](https://docs.datomic.com/datomic-overview.html) is a distributed database built on similar principles that validates many of these ideas in production:
+
+### Shared Concepts
+
+| Concept | Datomic | This Framework |
+|---------|---------|----------------|
+| Immutable facts | Datoms (E-A-V-T tuples) | BEL events |
+| Time travel | `as-of` queries | Event replay |
+| Speculative execution | [`d/with`](https://docs.datomic.com/transactions/transaction-processing.html) | `try_step()` / transactions |
+| Atomic commits | `d/transact` = `d/with` + durable swap | `step()` = validate + apply + persist |
+| Transaction-time validation | [Transaction functions](https://docs.datomic.com/transactions/transaction-functions.html) with `db-before` | Transition guards |
+| Post-transaction validation | [Entity specs](https://docs.datomic.com/transactions/model.html) with `db-after` | Invariant checks |
+| Single writer | Transactor serializes all writes | Single `step()` entry point |
+| Horizontal read scaling | Peers cache and query locally | Immutable state snapshots |
+
+### Datomic's Speculative Writes
+
+Datomic's `d/with` is particularly relevant - it's a [pure function](https://vvvvalvalval.github.io/posts/2018-11-12-datomic-event-sourcing-without-the-hassle.html) that takes a database value and proposed facts, returning a new 
database value *without persisting*. This enables:
+
+- Testing transactions without mutation
+- Composing transaction data before committing
+- [Enforcing invariants](https://stackoverflow.com/questions/48268887/how-to-prevent-transactions-from-violating-application-invariants-in-datomic) by speculatively applying, checking, then committing or aborting
+- Development against production data safely (via libraries like Datomock)
+
+### What Datomic Doesn't Provide
+
+- **CPN state machine semantics**: Typed transitions between entity states
+- **Compile-time transition validity**: Invalid transitions caught by the type system
+- **Auto-generated bidirectional indexes**: Declared edges automatically traversable both ways
+- **Rust**: Memory safety, zero-cost abstractions, embeddable
+
+The vision here is essentially: *Datomic's transaction model + CPN state machines + Rust compile-time safety*
+
+## Open Questions
+
+- How to express transition guards (conditions beyond "in state X")?
+- How to handle edges to entities that don't exist yet (forward references)?
+- Serialization format for the event log?
+- How much CPN formalism to expose vs. hide?
+- What's the right granularity for "places" in the CPN model?
+- How does this interact with async/distributed systems?
+
+## Potential Names
+
+Something evoking: event-sourced + graph + state machines + Rust
+
+- `petri-graph`
+- `ironweave` (iron = Rust, weave = connected graph)
+- `factforge`
+- `datumflow`
+
+## Prior Art to Investigate
+
+- Datomic (Clojure, distributed immutable database)
+- Bevy ECS (Rust, entity-component-system with events)
+- CPN Tools (Petri net modeling/simulation)
+- Diesel / SeaORM (Rust, compile-time SQL checking)
+- EventStoreDB (event sourcing infrastructure)
+
+## Next Steps
+
+This document captures the "why" and "how" at a conceptual level. To validate:
+
+1. Prototype the macro/DSL syntax for a simple 2-3 entity system
+2. Implement auto-indexed bidirectional edges
+3. 
Implement typestate transitions
+4. Add speculative execution (`try_step`)
+5. Benchmark against hand-written equivalent
+6. Evaluate ergonomics in real use (databuild as first consumer)
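
As a possible starting point for step 4, the copy-on-write option from "Implementation Options" can be sketched in a few lines: clone the state, apply the event to the clone, and swap only on success. Everything below is an assumption made for illustration: the `State`, `Event`, `WantState`, and `StepError` shapes are hypothetical, states are checked at runtime rather than through typestate, and only the transitions the document lists explicitly are allowed.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WantState { New, Idle, Building, Successful, Failed, Canceled }

#[derive(Debug, PartialEq, Eq)]
pub enum StepError {
    UnknownWant(u64),
    InvalidTransition { want: u64, from: WantState, to: WantState },
}

/// Hypothetical event: move several wants to the same target state.
pub struct Event {
    pub wants: Vec<u64>,
    pub to: WantState,
}

#[derive(Clone, Default)]
pub struct State {
    wants: BTreeMap<u64, WantState>,
}

/// Only the transitions spelled out in the document; the elided ones
/// (including anything involving Canceled) are rejected in this sketch.
fn allowed(from: WantState, to: WantState) -> bool {
    use WantState::*;
    matches!(
        (from, to),
        (New, Idle) | (New, Building) | (Idle, Building)
            | (Building, Successful) | (Building, Failed)
    )
}

impl State {
    pub fn insert_new(&mut self, id: u64) {
        self.wants.insert(id, WantState::New);
    }

    pub fn state_of(&self, id: u64) -> Option<WantState> {
        self.wants.get(&id).copied()
    }

    /// Mutates in place, so an error can leave earlier wants already changed.
    fn apply(&mut self, event: &Event) -> Result<(), StepError> {
        for &id in &event.wants {
            let slot = self.wants.get_mut(&id).ok_or(StepError::UnknownWant(id))?;
            if !allowed(*slot, event.to) {
                return Err(StepError::InvalidTransition { want: id, from: *slot, to: event.to });
            }
            *slot = event.to;
        }
        Ok(())
    }

    /// Copy-on-write: apply to a clone and swap it in only if it succeeded.
    pub fn try_step(&mut self, event: &Event) -> Result<(), StepError> {
        let mut draft = self.clone();
        draft.apply(event)?; // on error the draft is dropped, `self` is untouched
        *self = draft;
        Ok(())
    }
}
```

If one want in a multi-want event fails its transition, the partially mutated draft is discarded and the original state is left exactly as it was. The cost is a full clone per step, which is the memory trade-off noted for this option.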