Level 4 · 35 min

Sagas

Sagas coordinate multi-step business processes across multiple services or aggregates where a single ACID transaction is impossible. Each step performs a local transaction; on failure, compensating transactions undo the steps that have already completed.

What a Saga Is

A saga is a sequence of local transactions. Each local transaction updates one service's data and publishes an event or message to trigger the next step. If a step fails, the saga executes compensating transactions for all previously completed steps, typically in reverse order.

Sagas replace distributed ACID transactions (2PC, two-phase commit), which are synchronous, blocking, and create tight coupling between services. In 2PC every participant holds its locks from the prepare phase until the coordinator announces the outcome, so a single slow or unavailable service blocks the entire transaction.

Sagas trade atomicity and isolation for availability: each step commits independently, so other transactions can observe intermediate state, and the system is only eventually consistent. Compensating transactions are semantic undos: they apply the inverse business logic (issue a refund, release a seat hold) rather than a database rollback.
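The difference between a semantic undo and a database rollback can be sketched with an append-only ledger. This is a minimal illustration, not a prescribed design: PaymentLedger, its methods, and the amounts are all hypothetical names introduced for the example.

```typescript
// Hypothetical append-only payment ledger; names and shapes are illustrative only.
type LedgerEntry = {
  orderId: string;
  kind: "charge" | "refund";
  amountCents: number;
};

class PaymentLedger {
  private entries: LedgerEntry[] = [];

  // Local transaction: record a charge. Once committed, other readers can see it.
  charge(orderId: string, amountCents: number): void {
    this.entries.push({ orderId, kind: "charge", amountCents });
  }

  // Compensating transaction: the charge is not deleted. A new refund entry
  // reverses its business effect while the history stays intact.
  refund(orderId: string): void {
    const net = this.balance(orderId);
    if (net > 0) {
      this.entries.push({ orderId, kind: "refund", amountCents: net });
    }
  }

  // Net amount still charged for the order.
  balance(orderId: string): number {
    return this.entries
      .filter((e) => e.orderId === orderId)
      .reduce((sum, e) => sum + (e.kind === "charge" ? e.amountCents : -e.amountCents), 0);
  }

  // All entries recorded for the order, oldest first.
  history(orderId: string): LedgerEntry[] {
    return this.entries.filter((e) => e.orderId === orderId);
  }
}

const ledger = new PaymentLedger();
ledger.charge("order-1", 2500);
ledger.refund("order-1"); // compensate after a later saga step failed
```

After the refund the balance is back to zero, but both entries remain in the history. Because refund only covers the outstanding balance, calling it a second time adds nothing, which also makes the compensation safe to retry.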

Choreography vs Orchestration

Choreography: each service knows what to do when it receives an event. OrderPlaced → InventoryService deducts stock and publishes InventoryDeducted → PaymentService charges the card and publishes PaymentCharged → ShippingService creates the shipment. There is no central coordinator. Advantages: loose coupling, and each service stays simple. Disadvantages: business process logic is spread across services, so the overall flow is hard to see and a new step is hard to add.

Orchestration: a saga orchestrator sends commands to each service and listens for replies. The orchestrator knows the full saga flow: send ReserveInventory, await InventoryReserved, send ChargePayment, await PaymentCharged, send CreateShipment. Advantages: the business process is explicit and visible in one place, and it is easier to add or remove steps and to implement rollback logic. Disadvantages: the orchestrator is coupled to every participant, and logic tends to accumulate in it at the expense of the services it drives.

Newman's framing: 'If orchestration is command-and-control, choreographed sagas represent a trust-but-verify architecture.' His team-based heuristic: 'I am very relaxed in the use of orchestrated sagas when one team owns implementation of the entire saga. If you have multiple teams involved, I greatly prefer the more decomposed choreographed saga', because the more loosely coupled architecture allows teams to work more in isolation. On compensating transactions: 'With our saga, we have multiple transactions involved, and some of those may have already committed before we decide to roll back. You need to implement a compensating transaction', that is, a semantic undo that applies inverse business logic, not a database rollback. (Sam Newman, Building Microservices, 2nd ed.)
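The choreographed flow described above can be sketched with a tiny in-memory event bus. This is an illustration under stated assumptions: EventBus is a hypothetical stand-in for a message broker, and the PaymentFailed and InventoryRestored events are invented here to show a compensation path; they do not come from the text.

```typescript
// Hypothetical in-memory event bus; a real system would use a message broker.
type SagaEvent = { type: string; orderId: string };
type Handler = (event: SagaEvent) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();
  readonly published: SagaEvent[] = []; // every event, in publish order

  subscribe(type: string, handler: Handler): void {
    this.handlers.set(type, [...(this.handlers.get(type) ?? []), handler]);
  }

  publish(type: string, orderId: string): void {
    this.published.push({ type, orderId });
    for (const handler of this.handlers.get(type) ?? []) {
      handler({ type, orderId });
    }
  }

  // Event types seen for one order, in publish order.
  trace(orderId: string): string[] {
    return this.published.filter((e) => e.orderId === orderId).map((e) => e.type);
  }
}

const bus = new EventBus();
const declinedOrders = new Set(["order-2"]); // assumption: payment fails for these orders

// InventoryService: deducts stock, and restores it if payment later fails.
bus.subscribe("OrderPlaced", (e) => bus.publish("InventoryDeducted", e.orderId));
bus.subscribe("PaymentFailed", (e) => bus.publish("InventoryRestored", e.orderId));

// PaymentService: charges the card or reports the failure.
bus.subscribe("InventoryDeducted", (e) =>
  bus.publish(declinedOrders.has(e.orderId) ? "PaymentFailed" : "PaymentCharged", e.orderId)
);

// ShippingService: creates the shipment once payment has succeeded.
bus.subscribe("PaymentCharged", (e) => bus.publish("ShipmentCreated", e.orderId));

bus.publish("OrderPlaced", "order-1"); // happy path
bus.publish("OrderPlaced", "order-2"); // payment declined, inventory compensates
```

No single piece of code states the whole process: the flow only becomes visible by reading every subscription, which is the tracing cost the text attributes to choreography.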

Compensating Transactions and Failure Handling

Not all operations can be compensated: a sent email cannot be unsent. The step whose success commits the saga to finishing is the 'pivot transaction' (the point of no return): steps before it must be compensatable, and steps after it must be retried until they succeed. Design sagas so that non-compensatable operations occur at or after the pivot, ideally last.

Idempotency is critical. Saga steps can be retried (network failures, crashes), so each step must be idempotent: calling it twice has the same effect as calling it once. Use idempotency keys for external calls (payment APIs).

Outbox pattern: write the event/command to an outbox table in the same local transaction as the data change. A separate relay then publishes the outbox rows, which gives at-least-once delivery, so consumers must tolerate duplicates.

Saga state machine: persist the saga's current step and any failed steps. If the saga orchestrator crashes, it restarts from the last recorded step.

Common saga frameworks: Axon Saga (Java), Temporal.io (workflow as code), AWS Step Functions.
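The outbox pattern and idempotent handling fit together, as the following sketch tries to show. Everything here is an in-memory stand-in invented for illustration: OrderStore plays the role of a database with an outbox table, relay plays the role of the publisher process, and the crashBeforeMark flag simulates a crash. In a real database the two writes in placeOrder would share one transaction.

```typescript
// Hypothetical in-memory stand-ins for a database, an outbox table, and a consumer.
type OutboxMessage = { id: string; type: string; orderId: string; published: boolean };

class OrderStore {
  readonly orders = new Map<string, string>(); // orderId -> status
  readonly outbox: OutboxMessage[] = [];
  private nextId = 1;

  // Stands in for one local transaction: the state change and the outbox row
  // are written together, so no order exists without its pending message.
  placeOrder(orderId: string): void {
    this.orders.set(orderId, "PENDING");
    this.outbox.push({
      id: `msg-${this.nextId++}`,
      type: "ReserveInventory",
      orderId,
      published: false,
    });
  }
}

// Idempotent consumer: remembers processed message ids and ignores repeats.
class InventoryConsumer {
  private processed = new Set<string>();
  reserved = 0;

  handle(message: OutboxMessage): void {
    if (this.processed.has(message.id)) return; // duplicate delivery, no effect
    this.processed.add(message.id);
    this.reserved += 1;
  }
}

// Relay: delivers unpublished rows, then marks them published. If it stops
// between the two, the row is delivered again on the next run (at-least-once).
function relay(store: OrderStore, consumer: InventoryConsumer, crashBeforeMark = false): void {
  for (const message of store.outbox) {
    if (message.published) continue;
    consumer.handle(message);
    if (crashBeforeMark) return; // simulated crash before the row is marked
    message.published = true;
  }
}

const store = new OrderStore();
const consumer = new InventoryConsumer();
store.placeOrder("order-1");
relay(store, consumer, true); // delivered, but the relay "crashes" before marking
relay(store, consumer);       // redelivered; the consumer deduplicates by message id
```

The message is delivered twice but the reservation happens once. The message id acts as the idempotency key; an external payment API would typically accept a similar key supplied by the caller.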

Key Takeaways

  • Sagas achieve eventual consistency across services without distributed locks. Each step is a local transaction; when a step fails, the completed steps are compensated.
  • Choreography distributes process knowledge across services (loose coupling, harder to trace). Orchestration centralizes process knowledge (easier to trace, explicit flow, more coupling to the orchestrator).
  • All saga steps must be idempotent, because they will be retried. Non-compensatable operations (sent emails, published messages) should come last, because they cannot be undone.

Code example

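// Orchestration saga (pseudocode: inventoryService, paymentService and
// shippingService are assumed to be clients for the three services)
class PlaceOrderSaga {
  async execute(orderId: string): Promise<void> {
    // Compensations for the steps that have actually completed, in order
    const compensations: Array<() => Promise<void>> = [];
    try {
      await inventoryService.reserve(orderId);                       // Step 1
      compensations.push(() => inventoryService.release(orderId));   // Undo for step 1
      await paymentService.charge(orderId);                          // Step 2
      compensations.push(() => paymentService.refund(orderId));      // Undo for step 2
      await shippingService.create(orderId);                         // Step 3 (last, nothing to undo)
    } catch (error) {
      // Roll back only the completed steps, in reverse order
      for (const compensate of compensations.reverse()) {
        await compensate();
      }
      throw error; // let the caller know the saga failed
    }
  }
}

// Each step must be idempotent:
// inventoryService.reserve(orderId) called twice = same result as called once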
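The saga state machine mentioned under failure handling (record the current step, resume from it after a crash) might look like the sketch below. It is a simplified, synchronous illustration: sagaTable stands in for a durable table, the stopAfter parameter is a hypothetical hook that simulates the orchestrator dying mid-saga, and the steps only log their names.

```typescript
type SagaStatus = "RUNNING" | "COMPLETED" | "COMPENSATED";
type SagaState = { nextStep: number; status: SagaStatus };
type Step = { run: () => void; compensate: () => void };

// Stand-in for a durable saga table keyed by saga id.
const sagaTable = new Map<string, SagaState>();

// Runs or resumes a saga. `stopAfter` simulates the orchestrator process
// dying after that many steps have run in this invocation.
function runSaga(sagaId: string, steps: Step[], stopAfter = Infinity): SagaState {
  let state: SagaState = sagaTable.get(sagaId) ?? { nextStep: 0, status: "RUNNING" };
  let executed = 0;
  while (state.status === "RUNNING" && executed < stopAfter) {
    if (state.nextStep === steps.length) {
      state = { ...state, status: "COMPLETED" };
    } else {
      try {
        steps[state.nextStep].run();
        state = { ...state, nextStep: state.nextStep + 1 };
        executed += 1;
      } catch {
        // Business failure: undo the completed steps in reverse order.
        for (let i = state.nextStep - 1; i >= 0; i--) {
          steps[i].compensate();
        }
        state = { ...state, status: "COMPENSATED" };
      }
    }
    sagaTable.set(sagaId, state); // record progress after every transition
  }
  return state;
}

const calls: string[] = [];
const steps: Step[] = [
  { run: () => { calls.push("reserve"); }, compensate: () => { calls.push("release"); } },
  { run: () => { calls.push("charge"); }, compensate: () => { calls.push("refund"); } },
  { run: () => { calls.push("ship"); }, compensate: () => { calls.push("cancel"); } },
];

const afterCrash = runSaga("saga-1", steps, 1); // "crash" after the first step
const resumed = runSaga("saga-1", steps);       // restart picks up at the second step
```

The second call does not repeat the reservation because the recorded nextStep is already 1. In this sketch the simulated crash always lands after the state is saved; a real crash can also land between running a step and recording it, in which case the step runs again on restart, which is why the steps themselves still need to be idempotent.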