Level 4 · 35 min
Sagas
Sagas coordinate multi-step business processes that span multiple services or aggregates, where a single ACID transaction is impossible. Each step runs as a local transaction; on failure, compensating transactions undo the steps that have already completed.
What a Saga Is
A saga is a sequence of local transactions. Each local transaction updates one service's data and publishes an event or message that triggers the next step. If a step fails, the saga runs compensating transactions for all previously completed steps, in reverse order.
Sagas replace distributed ACID transactions based on two-phase commit (2PC), which are synchronous and blocking and couple services tightly. Under 2PC every participant holds its locks until the coordinator announces the outcome, so a single slow or unavailable service blocks the entire transaction. Sagas trade atomicity and isolation for availability: each step commits independently, other requests can observe intermediate states, and the system is eventually consistent.
Compensating transactions are semantic undos: they apply the inverse business logic (issue a refund, release a seat hold) rather than a database rollback.
Choreography vs Orchestration
Choreography: each service knows what to do when it receives an event. OrderPlaced → InventoryService deducts stock and publishes InventoryDeducted → PaymentService charges the card and publishes PaymentCharged → ShippingService creates the shipment. There is no central coordinator. Advantages: loose coupling, and each service stays simple. Disadvantages: the business process logic is spread across services, so the overall flow is hard to see and a new step is hard to add.
Orchestration: a Saga Orchestrator sends commands to each service and listens for replies. The orchestrator knows the full saga flow: send ReserveInventory, await InventoryReserved, send ChargePayment, await PaymentCharged, send CreateShipment. Advantages: the business process is explicit and visible in one place, steps are easier to add or remove, and rollback logic is easier to implement. Disadvantages: the orchestrator is coupled to every participant and can absorb logic that belongs in the services.
Newman's framing: 'If orchestration is command-and-control, choreographed sagas represent a trust-but-verify architecture.' His team-based heuristic: 'I am very relaxed in the use of orchestrated sagas when one team owns implementation of the entire saga. If you have multiple teams involved, I greatly prefer the more decomposed choreographed saga', because 'the more loosely coupled architecture allows teams to work more in isolation.' On compensation: 'With our saga, we have multiple transactions involved, and some of those may have already committed before we decide to roll back. You need to implement a compensating transaction.' (Sam Newman, Building Microservices, 2nd ed.)
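The choreographed flow described above can be sketched with an in-memory event bus. This is a minimal illustration, not a production design: EventBus, wireChoreography, the inventory counter, and the declinedOrders set are hypothetical names, and the PaymentFailed and ShipmentCreated events are assumptions added to show a compensation path that the text does not spell out.

```typescript
// Hypothetical in-memory event bus; a real deployment would use a message broker.
type EventType =
  | "OrderPlaced"
  | "InventoryDeducted"
  | "PaymentCharged"
  | "PaymentFailed" // assumed failure event, not named in the text
  | "ShipmentCreated"; // assumed completion event

interface SagaEvent {
  type: EventType;
  orderId: string;
}

type Handler = (event: SagaEvent) => void;

class EventBus {
  private readonly handlers = new Map<EventType, Handler[]>();
  readonly log: EventType[] = []; // every published event type, in order

  subscribe(type: EventType, handler: Handler): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  publish(event: SagaEvent): void {
    this.log.push(event.type);
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

function wireChoreography(
  bus: EventBus,
  inventory: { units: number },
  declinedOrders: Set<string>, // orders the fake payment step rejects
): void {
  // InventoryService: do the local transaction, then announce the result.
  bus.subscribe("OrderPlaced", (event) => {
    inventory.units -= 1;
    bus.publish({ type: "InventoryDeducted", orderId: event.orderId });
  });

  // PaymentService: reacts to the inventory event and succeeds or fails.
  bus.subscribe("InventoryDeducted", (event) => {
    const type: EventType = declinedOrders.has(event.orderId)
      ? "PaymentFailed"
      : "PaymentCharged";
    bus.publish({ type, orderId: event.orderId });
  });

  // ShippingService: last step of the happy path.
  bus.subscribe("PaymentCharged", (event) => {
    bus.publish({ type: "ShipmentCreated", orderId: event.orderId });
  });

  // InventoryService again: compensating transaction that restocks the item.
  bus.subscribe("PaymentFailed", () => {
    inventory.units += 1;
  });
}
```

No function in this sketch describes the whole process; the flow only emerges from the subscriptions, which is the visibility cost the text attributes to choreography.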
Compensating Transactions and Failure Handling
Not all operations can be compensated: a sent email cannot be unsent. The first step that cannot be undone acts as the saga's 'pivot transaction' (the point of no return); once it commits, the saga can only move forward. Design sagas so that non-compensatable operations occur last.
Idempotency is critical because saga steps can be retried after network failures or crashes. Each step must be idempotent: calling it twice has the same effect as calling it once. Use idempotency keys for external calls such as payment APIs.
Outbox pattern: write the event or command to an outbox table in the same local transaction as the data change, and let a separate relay publish the pending rows. Because a row is marked as published only after it has been sent, delivery is at-least-once, so consumers must tolerate duplicates.
Saga state machine: persist the saga's current step and any failed steps. If the saga orchestrator crashes, it resumes from the last recorded step.
Common saga frameworks: Axon Framework sagas (Java), Temporal (workflow as code), AWS Step Functions.
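A minimal sketch of the idempotency key and outbox ideas, using in-memory structures in place of database tables. PaymentStore, chargeOnce, and relayOutbox are hypothetical names chosen for illustration, and the two writes inside chargeOnce are assumed to share one local transaction in a real database.

```typescript
// Hypothetical in-memory stand-ins for one service's tables.
interface OutboxMessage {
  type: string;
  orderId: string;
  published: boolean;
}

interface PaymentStore {
  charges: Map<string, number>; // idempotency key -> amount charged
  outbox: OutboxMessage[];
}

// Idempotent step: a retry with the same key does not charge again.
function chargeOnce(
  store: PaymentStore,
  idempotencyKey: string,
  orderId: string,
  amount: number,
): "charged" | "duplicate" {
  if (store.charges.has(idempotencyKey)) {
    return "duplicate";
  }
  // Assumption: in a real database these two writes commit in one local transaction.
  store.charges.set(idempotencyKey, amount);
  store.outbox.push({ type: "PaymentCharged", orderId, published: false });
  return "charged";
}

// Relay: sends pending rows and marks each one only after send returns.
// If send throws, the row stays pending and is picked up on the next run.
function relayOutbox(
  store: PaymentStore,
  send: (message: OutboxMessage) => void,
): number {
  let sent = 0;
  for (const message of store.outbox) {
    if (message.published) continue;
    send(message);
    message.published = true;
    sent += 1;
  }
  return sent;
}
```

If the relay crashed after send but before marking the row, the same message would be sent again on the next run. That duplicate is why the receiving step needs to be idempotent as well.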
Code example
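// Orchestration saga (sketch). The three service clients are assumed to be injected.
interface InventoryService { reserve(orderId: string): Promise<void>; release(orderId: string): Promise<void>; }
interface PaymentService { charge(orderId: string): Promise<void>; refund(orderId: string): Promise<void>; }
interface ShippingService { create(orderId: string): Promise<void>; }

class PlaceOrderSaga {
  constructor(
    private readonly inventoryService: InventoryService,
    private readonly paymentService: PaymentService,
    private readonly shippingService: ShippingService,
  ) {}

  async execute(orderId: string): Promise<void> {
    // Compensations for the steps that have committed so far, oldest first
    const compensations: Array<() => Promise<void>> = [];
    try {
      await this.inventoryService.reserve(orderId); // Step 1
      compensations.push(() => this.inventoryService.release(orderId));
      await this.paymentService.charge(orderId); // Step 2
      compensations.push(() => this.paymentService.refund(orderId));
      await this.shippingService.create(orderId); // Step 3 (last step, so it never needs compensating)
    } catch (error) {
      // Roll back in reverse order, undoing only the steps that completed
      for (const compensate of compensations.reverse()) {
        await compensate();
      }
      throw error; // surface the failure to the caller
    }
  }
}

// Each step must be idempotent:
// inventoryService.reserve(orderId) called twice = same result as called once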
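The crash recovery described under failure handling, where the orchestrator restarts from the last recorded step, can be sketched as a checkpointed step loop. SagaRecord, SagaStep, resumeSaga, and the save callback are hypothetical names; a real orchestrator would persist the record in a database, and a framework such as Temporal or AWS Step Functions would handle this bookkeeping itself.

```typescript
// Hypothetical persisted saga record.
interface SagaRecord {
  orderId: string;
  nextStep: number; // index of the first step not yet recorded as done
  status: "running" | "completed";
}

interface SagaStep {
  name: string;
  // Must be idempotent: the step may run again if a crash happens before the checkpoint.
  run: (orderId: string) => void;
}

function resumeSaga(
  record: SagaRecord,
  steps: SagaStep[],
  save: (record: SagaRecord) => void,
): void {
  // Skip everything already checkpointed and continue from nextStep.
  for (const step of steps.slice(record.nextStep)) {
    step.run(record.orderId); // a throw here leaves nextStep pointing at this step
    record.nextStep += 1;
    save({ ...record }); // checkpoint after every completed step
  }
  record.status = "completed";
  save({ ...record });
}
```

Calling resumeSaga again with the same record after a failure repeats only the step that failed and the steps after it. Steps that were already checkpointed are not run a second time.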