databuild/docs/ideas/event-sourced-cpn-framework.md
2025-12-01 02:14:27 +08:00

Event-Sourced CPN Framework

A vision for a Rust library/framework combining event sourcing, Colored Petri Net semantics, and compile-time safety for building correct distributed systems.

The Problem

In highly connected applications with multiple entity types and relationships (like databuild's Wants, JobRuns, Partitions), developers face combinatorial complexity:

For each edge type between entities, you need:

  1. Forward accessor
  2. Inverse accessor (index)
  3. Index maintenance on creation
  4. Index maintenance on deletion
  5. Consistency checks
  6. Query patterns for traversal
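
To make the bookkeeping concrete, here is a minimal hand-written sketch of a single edge type (JobRun to Want). The names `ManualGraph`, `JobRunId`, and `WantId` are hypothetical and chosen only for illustration; they are not taken from databuild. Every additional edge type would need its own copy of this pattern.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical ID types, for illustration only.
type JobRunId = u64;
type WantId = u64;

#[derive(Default)]
struct ManualGraph {
    // Forward accessor: which wants does a job run service?
    wants_by_job: HashMap<JobRunId, HashSet<WantId>>,
    // Inverse accessor (index): which job runs service a want?
    jobs_by_want: HashMap<WantId, HashSet<JobRunId>>,
}

impl ManualGraph {
    // Index maintenance on creation: both maps must be updated together.
    fn link(&mut self, job: JobRunId, want: WantId) {
        self.wants_by_job.entry(job).or_default().insert(want);
        self.jobs_by_want.entry(want).or_default().insert(job);
    }

    // Index maintenance on deletion: forgetting either half corrupts the index.
    fn unlink(&mut self, job: JobRunId, want: WantId) {
        if let Some(wants) = self.wants_by_job.get_mut(&job) {
            wants.remove(&want);
        }
        if let Some(jobs) = self.jobs_by_want.get_mut(&want) {
            jobs.remove(&job);
        }
    }

    // Consistency check: every forward entry has a matching inverse entry
    // and vice versa.
    fn is_consistent(&self) -> bool {
        let forward_ok = self.wants_by_job.iter().all(|(job, wants)| {
            wants.iter().all(|want| {
                self.jobs_by_want
                    .get(want)
                    .map_or(false, |jobs| jobs.contains(job))
            })
        });
        let inverse_ok = self.jobs_by_want.iter().all(|(want, jobs)| {
            jobs.iter().all(|job| {
                self.wants_by_job
                    .get(job)
                    .map_or(false, |wants| wants.contains(want))
            })
        });
        forward_ok && inverse_ok
    }
}
```

Nothing in the type system ties `link` and `unlink` to both maps; the only protection is a runtime check like `is_consistent` and the tests that call it.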

As the number of entities and edges grows, this becomes:

  • Hard to keep in your head
  • Error-prone (forgot to update an index)
  • Lots of boilerplate
  • Testing burden for plumbing rather than business logic

The temptation is to throw up your hands and use SQL with foreign keys, or to accept eventual consistency. Either choice sacrifices the compile-time guarantees Rust can provide.

The Vision

A framework where developers declare:

  • Entities with their valid states (state machines)
  • Edges between entities (typed, directional, with cardinality)
  • Transitions (what state changes are valid, and when)

And the framework provides:

  • Auto-generated accessors (both directions)
  • Auto-maintained indexes
  • Compile-time errors for invalid transitions
  • Runtime referential integrity (fail-fast or transactional)
  • Event log as source of truth with replay capability
  • Potential automatic concurrency from CPN place-disjointness

Why

Correctness Guarantees

  • Compile-time: Invalid state transitions are type errors
  • Compile-time: Edge definitions guarantee bidirectional navigability
  • Runtime: Referential integrity violations detected immediately
  • Result: "If it compiles and the event log replays, the state is consistent"

Performance "For Free"

  • Indexes auto-maintained as edges are created/destroyed
  • No query planning needed - traversal patterns known at compile time
  • Potential: CPN place-disjointness → automatic safe concurrency
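
As a rough illustration of the place-disjointness idea, the sketch below treats each transition as a set of input and output places and reports two transitions as safe to fire concurrently when those sets do not overlap. The `Transition` type, the `can_fire_concurrently` function, and the place names are assumptions made for this example; a real CPN scheduler would also need to consider tokens and guards.

```rust
use std::collections::HashSet;

// A transition consumes tokens from its input places and produces tokens
// into its output places. Places are identified by name in this sketch.
struct Transition {
    inputs: HashSet<&'static str>,
    outputs: HashSet<&'static str>,
}

impl Transition {
    fn new(inputs: &[&'static str], outputs: &[&'static str]) -> Self {
        Transition {
            inputs: inputs.iter().copied().collect(),
            outputs: outputs.iter().copied().collect(),
        }
    }

    // Every place this transition reads from or writes to.
    fn places(&self) -> HashSet<&'static str> {
        self.inputs.union(&self.outputs).copied().collect()
    }
}

// Conservative check: transitions that touch no common place cannot
// interfere with each other. Disjointness is sufficient, not necessary.
fn can_fire_concurrently(a: &Transition, b: &Transition) -> bool {
    a.places().is_disjoint(&b.places())
}
```

Because the place sets would be known from the declarations, a check like this could in principle run at code-generation time rather than at runtime.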

Developer Experience

  • Declare entities, states, edges, transitions
  • Library generates: accessors, inverse indexes, transition methods, consistency checks
  • Focus on what, not how: the plumbing disappears
  • Still Rust: escape hatch to custom logic when needed

Testing Burden Reduction

  • No tests for "did I update the index correctly"
  • No tests for "can I traverse this relationship backwards"
  • Focus tests on business logic, not graph bookkeeping

How

Foundations

  • Colored Petri Nets for state machine composition semantics
  • Typestate pattern for compile-time transition validity
  • Event sourcing for persistence and replay
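
The event-sourcing foundation can be sketched as a fold over the log: state is never stored directly, only rebuilt by applying events in order. The `Event`, `State`, and `ReplayError` types below are hypothetical and much smaller than anything databuild would need.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WantState {
    New,
    Building,
    Successful,
}

// Hypothetical event types for a single entity kind.
#[derive(Debug, Clone)]
enum Event {
    WantCreated { id: u64 },
    BuildStarted { id: u64 },
    BuildSucceeded { id: u64 },
}

#[derive(Debug, Default, PartialEq)]
struct State {
    wants: HashMap<u64, WantState>,
}

#[derive(Debug, PartialEq)]
struct ReplayError {
    index: usize, // position of the offending event in the log
    reason: &'static str,
}

// Move one want from `from` to `to`, rejecting any other source state.
fn transition(
    state: &mut State,
    id: u64,
    from: WantState,
    to: WantState,
) -> Result<(), &'static str> {
    match state.wants.get_mut(&id) {
        Some(current) if *current == from => {
            *current = to;
            Ok(())
        }
        Some(_) => Err("want is in the wrong state for this event"),
        None => Err("unknown want"),
    }
}

fn apply(state: &mut State, event: &Event) -> Result<(), &'static str> {
    match *event {
        Event::WantCreated { id } => {
            if state.wants.contains_key(&id) {
                return Err("want already exists");
            }
            state.wants.insert(id, WantState::New);
            Ok(())
        }
        Event::BuildStarted { id } => {
            transition(state, id, WantState::New, WantState::Building)
        }
        Event::BuildSucceeded { id } => {
            transition(state, id, WantState::Building, WantState::Successful)
        }
    }
}

// Rebuild state from scratch; the log is the source of truth.
fn replay(log: &[Event]) -> Result<State, ReplayError> {
    let mut state = State::default();
    for (index, event) in log.iter().enumerate() {
        apply(&mut state, event).map_err(|reason| ReplayError { index, reason })?;
    }
    Ok(state)
}
```

Replaying the same log always yields the same state, which is what makes the log usable for recovery and debugging.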

Implementation Approach

Declarative DSL or proc macros for entity/edge/transition definitions:

// Hypothetical syntax
entity! {
    Want {
        states: [New, Idle, Building, Successful, Failed, Canceled],
        transitions: [
            New -> Idle,
            New -> Building,
            Idle -> Building,
            Building -> Successful,
            Building -> Failed,
            // ...
        ]
    }
}

edge! {
    servicing_wants: JobRun -> many Want,
    built_by: Partition -> one JobRun,
}

Code generation produces:

  • Entity structs with state type parameters
  • Edge storage with auto-maintained inverses
  • Transition methods that enforce valid source states
  • Query methods for traversal in both directions
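
A hand-written approximation of what the generated code for `Want` might look like, using the typestate pattern. This is a sketch under assumptions: the method names (`idle`, `start_build`, `succeed`, `fail`) are invented for illustration, and only the transitions listed in the hypothetical `entity!` declaration are covered.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per state.
struct New;
struct Idle;
struct Building;
struct Successful;
struct Failed;

// The state is a type parameter, so it costs nothing at runtime.
struct Want<S> {
    id: u64,
    _state: PhantomData<S>,
}

impl<S> Want<S> {
    // Internal helper; generated code would keep this private to its module.
    fn into_state<T>(self) -> Want<T> {
        Want { id: self.id, _state: PhantomData }
    }

    fn id(&self) -> u64 {
        self.id
    }
}

impl Want<New> {
    fn new(id: u64) -> Self {
        Want { id, _state: PhantomData }
    }
    // New -> Idle
    fn idle(self) -> Want<Idle> {
        self.into_state()
    }
    // New -> Building
    fn start_build(self) -> Want<Building> {
        self.into_state()
    }
}

impl Want<Idle> {
    // Idle -> Building
    fn start_build(self) -> Want<Building> {
        self.into_state()
    }
}

impl Want<Building> {
    // Building -> Successful
    fn succeed(self) -> Want<Successful> {
        self.into_state()
    }
    // Building -> Failed
    fn fail(self) -> Want<Failed> {
        self.into_state()
    }
}
```

With these definitions, `Want::<New>::new(7).idle().start_build().succeed()` type-checks, while `Want::<New>::new(7).succeed()` is rejected by the compiler because `succeed` exists only on `Want<Building>`.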

The Graph Model

  • Entities are nodes (with state)
  • Edges are typed, directional, with cardinality (one/many)
  • Both directions always queryable
  • Edge creation/deletion is transactional within a step
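
One possible shape for the storage behind a `one`-cardinality edge such as `built_by: Partition -> one JobRun` is sketched below. The `ToOneEdge` type and its method names are hypothetical; the point is that the forward map and the inverse index are only ever modified together, and that the cardinality rule is enforced at link time.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

#[derive(Debug, PartialEq)]
enum EdgeError {
    // A `one` edge already exists for this source.
    AlreadyLinked,
}

// Each source has at most one target; each target may have many sources.
struct ToOneEdge<S, T> {
    forward: HashMap<S, T>,
    inverse: HashMap<T, HashSet<S>>,
}

impl<S: Copy + Eq + Hash, T: Copy + Eq + Hash> ToOneEdge<S, T> {
    fn new() -> Self {
        ToOneEdge { forward: HashMap::new(), inverse: HashMap::new() }
    }

    // Creates the edge and its inverse entry, or fails without changes.
    fn link(&mut self, source: S, target: T) -> Result<(), EdgeError> {
        if self.forward.contains_key(&source) {
            return Err(EdgeError::AlreadyLinked);
        }
        self.forward.insert(source, target);
        self.inverse.entry(target).or_default().insert(source);
        Ok(())
    }

    // Removes the edge and its inverse entry; returns the old target.
    fn unlink(&mut self, source: S) -> Option<T> {
        let target = self.forward.remove(&source)?;
        let now_empty = match self.inverse.get_mut(&target) {
            Some(sources) => {
                sources.remove(&source);
                sources.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.inverse.remove(&target);
        }
        Some(target)
    }

    // Forward traversal.
    fn target_of(&self, source: S) -> Option<T> {
        self.forward.get(&source).copied()
    }

    // Inverse traversal (order is unspecified).
    fn sources_of(&self, target: T) -> Vec<S> {
        self.inverse
            .get(&target)
            .map(|sources| sources.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

A `many` edge would differ only in the forward map holding a set instead of a single value.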

Entry Point

A single entry point, step(event) -> Result<(), StepError>, that:

  1. Validates the event against current state
  2. Applies state transitions
  3. Updates all affected indexes
  4. Returns success or rolls back
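
The four steps above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the `State` layout, the `by_state` index, and the `StepError` variants are assumptions, and rollback is avoided by validating everything before the first mutation.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum WantState {
    New,
    Idle,
    Building,
    Successful,
    Failed,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Create { id: u64 },
    Transition { id: u64, to: WantState },
}

#[derive(Debug, PartialEq)]
enum StepError {
    AlreadyExists(u64),
    UnknownWant(u64),
    InvalidTransition { from: WantState, to: WantState },
}

#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    wants: HashMap<u64, WantState>,
    // Index: which wants are currently in each state.
    by_state: HashMap<WantState, BTreeSet<u64>>,
}

// The transition table from the hypothetical `entity!` declaration.
fn allowed(from: WantState, to: WantState) -> bool {
    use WantState::*;
    matches!(
        (from, to),
        (New, Idle) | (New, Building) | (Idle, Building)
            | (Building, Successful) | (Building, Failed)
    )
}

impl State {
    fn step(&mut self, event: Event) -> Result<(), StepError> {
        // 1. Validate against current state; nothing is mutated yet.
        let (id, from, to) = match event {
            Event::Create { id } => {
                if self.wants.contains_key(&id) {
                    return Err(StepError::AlreadyExists(id));
                }
                (id, None, WantState::New)
            }
            Event::Transition { id, to } => {
                let from = *self.wants.get(&id).ok_or(StepError::UnknownWant(id))?;
                if !allowed(from, to) {
                    return Err(StepError::InvalidTransition { from, to });
                }
                (id, Some(from), to)
            }
        };
        // 2. Apply the state transition.
        self.wants.insert(id, to);
        // 3. Update the affected index entries.
        if let Some(from) = from {
            if let Some(ids) = self.by_state.get_mut(&from) {
                ids.remove(&id);
            }
        }
        self.by_state.entry(to).or_default().insert(id);
        // 4. Every error path returned before mutation, so a failed step
        //    leaves the state untouched without an explicit rollback.
        Ok(())
    }
}
```

Validate-then-apply works only while a single event's checks can all be done up front; events whose validity depends on intermediate results would need one of the rollback strategies discussed under Transactionality.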

Transactionality

Beyond Fail-Fast

Instead of panicking on consistency violations, support transactional semantics:

// Infallible (panics on error)
state.step(event);

// Fallible (returns Err and leaves state unchanged on failure)
let result: Result<(), StepError> = state.try_step(event);

// Explicit transaction (for multi-event atomicity)
let mut txn = state.begin();
txn.apply(event1)?;
txn.apply(event2)?;
txn.commit(); // dropping txn without commit rolls back

What This Enables

  1. Local atomicity: A single event either fully applies or doesn't - no partial states

  2. Distributed coordination: If step can return Err instead of panicking:

    • Try to apply an event
    • If it fails, coordinate with other systems before retrying
    • Implement saga patterns, 2PC, etc.
  3. Speculative execution: "What if I applied this event?" without committing

    • Useful for validation, dry-runs, conflict detection
  4. Optimistic concurrency:

    • Multiple workers try to apply events concurrently
    • Conflicts detected and rolled back
    • Retry with updated state
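
Speculative execution and optimistic concurrency can be combined in a small sketch: workers compute a candidate state from a snapshot without touching the shared store, and a commit is accepted only if the store's version has not moved since the snapshot was taken. The `speculate` function, the `Store` type, and the version counter are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Default, PartialEq)]
struct State {
    building: BTreeSet<u64>,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    StartBuild(u64),
    FinishBuild(u64),
}

#[derive(Debug, PartialEq)]
enum StepError {
    AlreadyBuilding(u64),
    NotBuilding(u64),
    // The store changed after the snapshot was taken.
    Conflict { expected: u64, actual: u64 },
}

// Speculative execution: returns the state that would result from the
// event and leaves the input state untouched.
fn speculate(state: &State, event: Event) -> Result<State, StepError> {
    let mut next = state.clone();
    match event {
        Event::StartBuild(id) => {
            if !next.building.insert(id) {
                return Err(StepError::AlreadyBuilding(id));
            }
        }
        Event::FinishBuild(id) => {
            if !next.building.remove(&id) {
                return Err(StepError::NotBuilding(id));
            }
        }
    }
    Ok(next)
}

#[derive(Default)]
struct Store {
    version: u64,
    state: State,
}

impl Store {
    fn snapshot(&self) -> (u64, State) {
        (self.version, self.state.clone())
    }

    // Optimistic commit: accept only if nobody committed since `based_on`.
    fn commit(&mut self, based_on: u64, next: State) -> Result<(), StepError> {
        if based_on != self.version {
            return Err(StepError::Conflict { expected: based_on, actual: self.version });
        }
        self.state = next;
        self.version += 1;
        Ok(())
    }
}
```

A worker that receives `Conflict` takes a fresh snapshot, re-runs `speculate`, and tries again; this is the retry loop described in the list above.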

Implementation Options

  1. Copy-on-write / snapshot: Clone state, apply to clone, swap on success

    • Simple but memory-heavy for large state
  2. Command pattern / undo log: Record inverse operations, replay backwards on rollback

    • More complex, but efficient for small changes to large state
  3. MVCC-style: Version all entities, only "commit" versions on success

    • Most sophisticated, enables concurrent reads during transaction
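
Option 2 (undo log) is the least obvious of the three, so here is a minimal sketch. It assumes a hypothetical `Txn` guard over a plain `HashMap<u64, String>`; each write records its inverse, and dropping the guard without calling `commit` replays the inverses in reverse order.

```rust
use std::collections::HashMap;

// Inverse operations recorded while a transaction is open.
enum Undo {
    Remove(u64),          // the key did not exist before the write
    Restore(u64, String), // the key held this value before the write
}

struct Txn<'a> {
    state: &'a mut HashMap<u64, String>,
    undo: Vec<Undo>,
    committed: bool,
}

impl<'a> Txn<'a> {
    fn begin(state: &'a mut HashMap<u64, String>) -> Self {
        Txn { state, undo: Vec::new(), committed: false }
    }

    // Writes directly to the live state and logs how to reverse the write.
    fn set(&mut self, id: u64, value: &str) {
        match self.state.insert(id, value.to_string()) {
            Some(old) => self.undo.push(Undo::Restore(id, old)),
            None => self.undo.push(Undo::Remove(id)),
        }
    }

    // Consumes the transaction; the Drop impl then skips the rollback.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for Txn<'_> {
    fn drop(&mut self) {
        if self.committed {
            return;
        }
        // Roll back: apply inverse operations, newest first.
        while let Some(op) = self.undo.pop() {
            match op {
                Undo::Remove(id) => {
                    self.state.remove(&id);
                }
                Undo::Restore(id, old) => {
                    self.state.insert(id, old);
                }
            }
        }
    }
}
```

The cost is proportional to the number of writes in the transaction rather than to the size of the state, which is the trade-off the list above describes. Returning early with `?` from a function that holds a `Txn` drops it and therefore rolls back.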

Relationship to Datomic

Datomic is a distributed database built on similar principles that validates many of these ideas in production:

Shared Concepts

| Concept | Datomic | This Framework |
|---|---|---|
| Immutable facts | Datoms (E-A-V-T tuples) | BEL events |
| Time travel | as-of queries | Event replay |
| Speculative execution | d/with | try_step() / transactions |
| Atomic commits | d/transact = d/with + durable swap | step() = validate + apply + persist |
| Transaction-time validation | Transaction functions with db-before | Transition guards |
| Post-transaction validation | Entity specs with db-after | Invariant checks |
| Single writer | Transactor serializes all writes | Single step() entry point |
| Horizontal read scaling | Peers cache and query locally | Immutable state snapshots |

Datomic's Speculative Writes

Datomic's d/with is particularly relevant - it's a pure function that takes a database value and proposed facts, returning a new database value without persisting. This enables:

  • Testing transactions without mutation
  • Composing transaction data before committing
  • Enforcing invariants by speculatively applying, checking, then committing or aborting
  • Development against production data safely (via libraries like Datomock)

What Datomic Doesn't Provide

  • CPN state machine semantics: Typed transitions between entity states
  • Compile-time transition validity: Invalid transitions caught by the type system
  • Auto-generated bidirectional indexes: Declared edges automatically traversable both ways
  • Rust: Memory safety, zero-cost abstractions, embeddable

The vision here is essentially Datomic's transaction model + CPN state machines + Rust compile-time safety.

Open Questions

  • How to express transition guards (conditions beyond "in state X")?
  • How to handle edges to entities that don't exist yet (forward references)?
  • Serialization format for the event log?
  • How much CPN formalism to expose vs. hide?
  • What's the right granularity for "places" in the CPN model?
  • How does this interact with async/distributed systems?

Potential Names

Something evoking: event-sourced + graph + state machines + Rust

  • petri-graph
  • ironweave (iron = Rust, weave = connected graph)
  • factforge
  • datumflow

Prior Art to Investigate

  • Datomic (Clojure, distributed immutable database)
  • Bevy ECS (Rust, entity-component-system with events)
  • CPN Tools (Petri net modeling/simulation)
  • Diesel / SeaORM (Rust; Diesel's query builder is checked at compile time, SeaORM is an async ORM)
  • EventStoreDB (event sourcing infrastructure)

Next Steps

This document captures the "why" and "how" at a conceptual level. To validate the idea:

  1. Prototype the macro/DSL syntax for a simple 2-3 entity system
  2. Implement auto-indexed bidirectional edges
  3. Implement typestate transitions
  4. Add speculative execution (try_step)
  5. Benchmark against hand-written equivalent
  6. Evaluate ergonomics in real use (databuild as first consumer)