Saga Pattern: Managing Distributed Transactions in Microservices

Posted on: 4/21/2026 5:13:44 AM

Imagine placing an order on an e-commerce site: the system needs to create the order, charge the customer's account, decrement inventory, and schedule shipping, and each step lives in a different microservice with its own database. If "charge the account" succeeds but "decrement inventory" fails because the item is sold out, what happens? You can't issue a single ROLLBACK because the data is scattered across four independent databases. That's the classic distributed transaction problem, and the Saga Pattern is a production-proven answer to it, used by Netflix, Uber, Shopee, Grab, and many others.

This post goes deep on Saga Pattern architecture: from the foundations, through Choreography vs Orchestration, to real implementations with MassTransit on .NET 10 and Temporal for durable execution. Everything aimed at production, not just demos.

  • 87%: microservice systems that need distributed transactions
  • 2PC ✗: Two-Phase Commit doesn't scale
  • 3: types of transactions in a saga (Compensable, Pivot, Retryable)
  • <100ms: target saga step latency

1. The problem: why distributed transactions are hard

In a monolith, a single business operation usually fits inside one database transaction — with full ACID guarantees (Atomicity, Consistency, Isolation, Durability). Once you split into microservices and each owns its own database (the database-per-service pattern), that end-to-end ACID guarantee disappears.

Two-Phase Commit (2PC) — the old solution, major limits

The traditional way to handle distributed transactions is Two-Phase Commit (2PC): a coordinator asks all participants to "prepare" (phase 1), then tells them to "commit" or "rollback" (phase 2). It's theoretically correct, but has serious limits in a microservices setting:

| Property | Two-Phase Commit | Saga Pattern |
| --- | --- | --- |
| Mechanism | Locks all resources until commit | Chain of local transactions + compensations |
| Latency | High (waits for all participants) | Low (each step independent) |
| Availability | Low: 1 participant down = everyone blocked | High: failures only trigger compensation |
| Scalability | Poor (lock contention grows with participants) | Good (event-driven, async) |
| Isolation | Full (serializable) | Partial (needs countermeasures) |
| Fits | Single database cluster | Microservices, cross-service workflows |

Real-world warning

Most popular message brokers — RabbitMQ, Apache Kafka, Azure Service Bus — don't support 2PC. If your system communicates via message queues, 2PC is simply not an option — you have to use Saga or an equivalent pattern.

2. What is the Saga Pattern?

The Saga Pattern breaks a distributed transaction into a chain of local transactions — each one runs inside a single service, updates that service's database, and emits an event/message to trigger the next step. If a step fails, the saga runs a reverse chain of compensating transactions to undo the work already completed.
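The mechanics can be sketched in a few lines of in-process C#. This is only an illustration of "run forward, compensate in reverse", not a distributed implementation: `SagaStep` and `SagaRunner` are hypothetical names, and real steps would be remote calls or messages rather than local delegates.

```csharp
using System;
using System.Collections.Generic;

// One saga step: a forward action plus the compensation that undoes it.
public sealed record SagaStep(string Name, Action Execute, Action Compensate);

public static class SagaRunner
{
    // Runs steps in order. On the first failure, runs the compensations of the
    // steps that already completed, newest first, and returns false.
    public static bool Run(IEnumerable<SagaStep> steps, Action<string> log)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                completed.Push(step);
                log($"done: {step.Name}");
            }
            catch (Exception ex)
            {
                log($"failed: {step.Name} ({ex.Message})");
                while (completed.Count > 0)
                {
                    var undo = completed.Pop();
                    undo.Compensate();
                    log($"compensated: {undo.Name}");
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated, on the assumption that a failed local transaction rolled back on its own; only the steps that committed are undone.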

sequenceDiagram
    participant O as Order Service
    participant P as Payment Service
    participant I as Inventory Service
    participant S as Shipping Service

    O->>O: T1: Create order (PENDING)
    O->>P: Event: OrderCreated
    P->>P: T2: Charge payment
    P->>I: Event: PaymentCompleted
    I->>I: T3: Decrement inventory
    I->>S: Event: InventoryReserved
    S->>S: T4: Schedule shipping
    S->>O: Event: ShippingScheduled
    O->>O: Update: CONFIRMED

Figure 1: Happy path saga — 4 local transactions completing in order

Three kinds of saga transactions

Not every step in a saga is equal. The Microsoft Azure Architecture Center classifies them cleanly:

Compensable: Can be undone via a compensating transaction. E.g., create order -> cancel order
Pivot: The "point of no return", the boundary between compensable and retryable phases
Retryable: After the pivot, these steps are idempotent and will be retried until they succeed

graph LR
    T1["T1: Create Order<br/>(Compensable)"]
    T2["T2: Charge Payment<br/>(Compensable)"]
    T3["T3: Confirm Stock<br/>(Pivot)"]
    T4["T4: Ship<br/>(Retryable)"]
    T5["T5: Send Email<br/>(Retryable)"]

    T1 --> T2 --> T3 --> T4 --> T5

    style T1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style T2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style T3 fill:#e94560,stroke:#fff,color:#fff
    style T4 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style T5 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

Figure 2: Transaction classification — Compensable (pink) -> Pivot (red) -> Retryable (green)
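The ordering in Figure 2 can be checked mechanically. The sketch below assumes the rule "zero or more compensable steps, at most one pivot, then zero or more retryable steps"; `StepKind` and `SagaShape` are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;

public enum StepKind { Compensable, Pivot, Retryable }

public static class SagaShape
{
    // Valid order: Compensable*, at most one Pivot, then Retryable*.
    public static bool IsValid(IReadOnlyList<StepKind> steps)
    {
        var pivotSeen = false;
        var retryableSeen = false;
        foreach (var kind in steps)
        {
            switch (kind)
            {
                case StepKind.Compensable:
                    // Nothing compensable may follow the point of no return.
                    if (pivotSeen || retryableSeen) return false;
                    break;
                case StepKind.Pivot:
                    if (pivotSeen || retryableSeen) return false;
                    pivotSeen = true;
                    break;
                case StepKind.Retryable:
                    retryableSeen = true;
                    break;
            }
        }
        return true;
    }
}
```

A check like this could run in a unit test over each saga definition, so a compensable step added after the pivot is caught before it ships.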

3. Choreography vs Orchestration

There are two main ways to implement a saga — each fits a different level of system complexity.

3.1 Choreography: decentralized, event-driven

In the Choreography model, there is no central coordinator. Each service listens for events from the previous service, runs its business logic, and emits the next event. Think of a jazz band — every musician listens and coordinates without a conductor.

graph LR
    OS["Order Service"]
    PS["Payment Service"]
    IS["Inventory Service"]
    SS["Shipping Service"]
    EB["Event Bus"]

    OS -->|OrderCreated| EB
    EB -->|OrderCreated| PS
    PS -->|PaymentCompleted| EB
    EB -->|PaymentCompleted| IS
    IS -->|InventoryReserved| EB
    EB -->|InventoryReserved| SS
    SS -->|ShippingScheduled| EB
    EB -->|ShippingScheduled| OS

    style OS fill:#e94560,stroke:#fff,color:#fff
    style PS fill:#2c3e50,stroke:#fff,color:#fff
    style IS fill:#2c3e50,stroke:#fff,color:#fff
    style SS fill:#2c3e50,stroke:#fff,color:#fff
    style EB fill:#f8f9fa,stroke:#e94560,color:#e94560

Figure 3: Choreography — services talk over an event bus with no coordinator

When a failure happens, the service emits a compensating event and earlier services must listen and undo their own work:

sequenceDiagram
    participant O as Order Service
    participant P as Payment Service
    participant I as Inventory Service

    O->>P: Event: OrderCreated
    P->>P: T2: Charge payment OK
    P->>I: Event: PaymentCompleted
    I->>I: T3: Decrement inventory FAILED (out of stock)
    I->>P: Event: InventoryFailed
    P->>P: C2: Refund
    P->>O: Event: PaymentRefunded
    O->>O: C1: Cancel order

Figure 4: Compensation flow in Choreography — each service handles its own rollback
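The flow in Figure 4 can be simulated with a tiny synchronous in-memory bus. This is a sketch under loud assumptions: a real system would use a broker with asynchronous delivery, and `EventBus`, `Choreography`, and the `OrderCancelled` event name are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Minimal synchronous in-memory bus: handlers are keyed by event name.
public sealed class EventBus
{
    private readonly Dictionary<string, List<Action<Guid>>> _handlers = new();

    // Every published event name, in publication order.
    public List<string> Log { get; } = new();

    public void Subscribe(string eventName, Action<Guid> handler)
    {
        if (!_handlers.TryGetValue(eventName, out var list))
            _handlers[eventName] = list = new List<Action<Guid>>();
        list.Add(handler);
    }

    public void Publish(string eventName, Guid orderId)
    {
        Log.Add(eventName);
        if (_handlers.TryGetValue(eventName, out var list))
            foreach (var handler in list) handler(orderId);
    }
}

public static class Choreography
{
    // Wires the three services from Figure 4. inStock decides whether T3 succeeds.
    public static EventBus Wire(bool inStock)
    {
        var bus = new EventBus();

        // Payment service: charge on OrderCreated, refund on InventoryFailed.
        bus.Subscribe("OrderCreated", id => bus.Publish("PaymentCompleted", id));
        bus.Subscribe("InventoryFailed", id => bus.Publish("PaymentRefunded", id));

        // Inventory service: reserve stock or report the failure.
        bus.Subscribe("PaymentCompleted", id =>
            bus.Publish(inStock ? "InventoryReserved" : "InventoryFailed", id));

        // Order service: cancel the order once the refund is confirmed.
        bus.Subscribe("PaymentRefunded", id => bus.Publish("OrderCancelled", id));

        return bus;
    }
}
```

Note how the compensation logic is spread over three subscriptions with no single place that describes the whole flow, which is the visibility cost of choreography.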

3.2 Orchestration: centralized and explicit

In the Orchestration model, a Saga Orchestrator (a.k.a. Saga Manager) drives the entire flow. It knows the step order, sends commands to each service, receives responses, and decides the next step or triggers compensation. Think of a symphony conductor.

graph TD
    ORCH["Saga Orchestrator<br/>(State Machine)"]
    OS["Order Service"]
    PS["Payment Service"]
    IS["Inventory Service"]
    SS["Shipping Service"]

    ORCH -->|"1. CreateOrder"| OS
    OS -->|"OrderCreated"| ORCH
    ORCH -->|"2. ProcessPayment"| PS
    PS -->|"PaymentCompleted"| ORCH
    ORCH -->|"3. ReserveInventory"| IS
    IS -->|"InventoryReserved"| ORCH
    ORCH -->|"4. ScheduleShipping"| SS
    SS -->|"ShippingScheduled"| ORCH

    style ORCH fill:#e94560,stroke:#fff,color:#fff
    style OS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style PS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style IS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style SS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

Figure 5: Orchestration — a Saga Orchestrator drives the whole flow through a state machine

3.3 Detailed comparison

| Criterion | Choreography | Orchestration |
| --- | --- | --- |
| Coordination | Decentralized: each service knows the next step | Centralized: orchestrator manages the entire flow |
| Coupling | Loose coupling via events | Services only know the orchestrator, not each other |
| Visibility | Hard to trace: logic spread across services | Easy to trace: state machine holds explicit state |
| Cyclic dependencies | Possible: services subscribe to each other's events | None: one-way flow through the orchestrator |
| Single point of failure | No | Yes (orchestrator), needs HA |
| Fits | 2–4 services, simple workflows | 5+ services, complex workflows, audit needed |
| Testing | Hard: must run every service | Easier: mock participants, test the orchestrator |
| Adding a new step | Painful: rewire the event chain | Simple: add a step in the state machine |

Rule of thumb

If drawing the flow on paper takes more than 4 arrows, or you have complex branching logic — pick Orchestration. If the flow is simple and linear and you want maximum decoupling — Choreography is enough. Most large production systems pick Orchestration because it's easier to debug and maintain.

4. Compensating Transactions: the art of undo

A compensating transaction is not simply "undo" — it's a reverse business operation that brings the system back to a consistent state. Key point: a compensating transaction creates a new state that is business-equivalent, not a bit-perfect restoration of the previous state.

Concrete examples

| Forward Transaction | Compensating Transaction | Note |
| --- | --- | --- |
| Create order (status: PENDING) | Update status → CANCELLED | Don't delete the row; keep the audit trail |
| Charge account | Refund + write transaction log | Log the reason: "Saga compensation" |
| Decrement stock (stock -= quantity) | Increment stock (stock += quantity) | Must check for concurrent modifications |
| Send confirmation email | Cannot be undone! | This is exactly why email belongs at the end as a retryable step |
| Charge credit card via gateway | Call the Refund API | Some gateways take 24h to process refunds |

Golden rule

Put irreversible steps (sending email, calling external APIs with no undo, sending SMS) at the end of the saga — after the pivot transaction. That way, if compensation is needed, these steps haven't run yet and don't need to be undone.

5. Data anomalies and how to prevent them

The Saga Pattern doesn't give you the isolation levels of a database transaction. That leads to anomalies you have to handle explicitly:

Common anomalies

| Anomaly | Description | Example |
| --- | --- | --- |
| Lost Updates | Saga A overwrites Saga B's result without knowing | Two orders both decrement stock; only one decrement is recorded |
| Dirty Reads | Saga B reads data that Saga A is modifying (not yet committed, or will be compensated) | A service reads decremented stock, but the order gets cancelled afterward; stock compensation hasn't run yet |
| Fuzzy Reads | Within the same saga, two reads of the same data give different results | Step 1 reads price = 100k, step 3 re-reads = 120k (changed mid-flight) |

Countermeasures

The Microsoft Azure Architecture Center suggests several strategies:

Semantic Lock: Flag records as "in-progress" with an application-level flag
Commutative Update: Design updates so they can be applied in any order
Pessimistic View: Reorder the saga so updates happen in retryable steps
Reread Value: Re-read the data before updating to detect changes

Semantic Lock in practice — instead of leaving Order in "Created" state, flag it as "PENDING_PAYMENT" so other services know the record is inside a saga:

public enum OrderStatus
{
    PendingPayment,   // Semantic lock — saga in flight
    PendingInventory, // Semantic lock — waiting on inventory
    Confirmed,        // Saga completed successfully
    Cancelled,        // Saga was compensated
    Failed            // Saga failed, cannot be compensated
}
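How other code might honor that flag can be sketched as follows. The enum is repeated so the snippet compiles on its own; the `Order` class and its `TryChangeAddress` method are hypothetical and only illustrate one possible policy (reject edits while a saga owns the record).

```csharp
// Repeated from the listing above so the snippet is self-contained.
public enum OrderStatus { PendingPayment, PendingInventory, Confirmed, Cancelled, Failed }

public sealed class Order
{
    public OrderStatus Status { get; set; } = OrderStatus.PendingPayment;
    public string ShippingAddress { get; private set; } = "";

    // True while a saga is still in flight for this record.
    public bool IsLockedBySaga =>
        Status is OrderStatus.PendingPayment or OrderStatus.PendingInventory;

    // Hypothetical customer-facing edit: refused while the semantic lock is held,
    // so the caller can retry later or show a "processing" message.
    public bool TryChangeAddress(string newAddress)
    {
        if (IsLockedBySaga) return false;
        ShippingAddress = newAddress;
        return true;
    }
}
```

Rejecting is only one choice; a caller could equally queue the edit or wait for the saga to finish, depending on the business rule.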

6. Implementing with MassTransit on .NET 10

MassTransit is one of the most widely used message-bus libraries for .NET and ships a Saga State Machine, a clean and expressive way to build orchestration sagas in the .NET ecosystem. The state-machine approach lets you define a saga as a finite state machine with explicit states, events, and transitions.

Project layout

src/
├── OrderSaga/
│   ├── OrderSagaState.cs         // State entity
│   ├── OrderSagaStateMachine.cs  // State machine definition
│   └── Events/
│       ├── OrderCreated.cs
│       ├── PaymentCompleted.cs
│       ├── PaymentFailed.cs
│       ├── InventoryReserved.cs
│       └── InventoryFailed.cs
├── OrderService/
├── PaymentService/
└── InventoryService/

Saga state definition

using MassTransit;

public class OrderSagaState : SagaStateMachineInstance, ISagaVersion
{
    public Guid CorrelationId { get; set; }
    public int Version { get; set; }
    public string CurrentState { get; set; } = default!;

    public Guid OrderId { get; set; }
    public Guid CustomerId { get; set; }
    public decimal TotalAmount { get; set; }
    public int ItemCount { get; set; }

    public DateTime CreatedAt { get; set; }
    public DateTime? PaymentCompletedAt { get; set; }
    public DateTime? InventoryReservedAt { get; set; }
    public string? FailureReason { get; set; }
}

State machine definition

using MassTransit;

public class OrderSagaStateMachine : MassTransitStateMachine<OrderSagaState>
{
    public State AwaitingPayment { get; private set; } = default!;
    public State AwaitingInventory { get; private set; } = default!;
    public State Completed { get; private set; } = default!;
    public State Compensating { get; private set; } = default!;
    public State Failed { get; private set; } = default!;

    public Event<OrderCreated> OrderCreated { get; private set; } = default!;
    public Event<PaymentCompleted> PaymentCompleted { get; private set; } = default!;
    public Event<PaymentFailed> PaymentFailed { get; private set; } = default!;
    public Event<InventoryReserved> InventoryReserved { get; private set; } = default!;
    public Event<InventoryFailed> InventoryFailed { get; private set; } = default!;

    public OrderSagaStateMachine()
    {
        InstanceState(x => x.CurrentState);

        Event(() => OrderCreated,
            x => x.CorrelateById(ctx => ctx.Message.OrderId));
        Event(() => PaymentCompleted,
            x => x.CorrelateById(ctx => ctx.Message.OrderId));
        Event(() => PaymentFailed,
            x => x.CorrelateById(ctx => ctx.Message.OrderId));
        Event(() => InventoryReserved,
            x => x.CorrelateById(ctx => ctx.Message.OrderId));
        Event(() => InventoryFailed,
            x => x.CorrelateById(ctx => ctx.Message.OrderId));

        Initially(
            When(OrderCreated)
                .Then(ctx =>
                {
                    ctx.Saga.OrderId = ctx.Message.OrderId;
                    ctx.Saga.CustomerId = ctx.Message.CustomerId;
                    ctx.Saga.TotalAmount = ctx.Message.TotalAmount;
                    // ItemCount is sent with ReserveInventory later, so it must be
                    // captured here (assumes OrderCreated carries an ItemCount).
                    ctx.Saga.ItemCount = ctx.Message.ItemCount;
                    ctx.Saga.CreatedAt = DateTime.UtcNow;
                })
                .Publish(ctx => new ProcessPayment(
                    ctx.Saga.OrderId,
                    ctx.Saga.CustomerId,
                    ctx.Saga.TotalAmount))
                .TransitionTo(AwaitingPayment)
        );

        During(AwaitingPayment,
            When(PaymentCompleted)
                .Then(ctx =>
                    ctx.Saga.PaymentCompletedAt = DateTime.UtcNow)
                .Publish(ctx => new ReserveInventory(
                    ctx.Saga.OrderId,
                    ctx.Saga.ItemCount))
                .TransitionTo(AwaitingInventory),

            When(PaymentFailed)
                .Then(ctx =>
                    ctx.Saga.FailureReason = ctx.Message.Reason)
                .Publish(ctx => new CancelOrder(ctx.Saga.OrderId))
                .TransitionTo(Failed)
        );

        During(AwaitingInventory,
            When(InventoryReserved)
                .Then(ctx =>
                    ctx.Saga.InventoryReservedAt = DateTime.UtcNow)
                .Publish(ctx => new ConfirmOrder(ctx.Saga.OrderId))
                .TransitionTo(Completed),

            When(InventoryFailed)
                .Then(ctx =>
                    ctx.Saga.FailureReason = ctx.Message.Reason)
                .Publish(ctx => new RefundPayment(
                    ctx.Saga.OrderId,
                    ctx.Saga.TotalAmount))
                .Publish(ctx => new CancelOrder(ctx.Saga.OrderId))
                .TransitionTo(Compensating)
        );
    }
}

stateDiagram-v2
    [*] --> AwaitingPayment : OrderCreated / ProcessPayment
    AwaitingPayment --> AwaitingInventory : PaymentCompleted / ReserveInventory
    AwaitingPayment --> Failed : PaymentFailed / CancelOrder
    AwaitingInventory --> Completed : InventoryReserved / ConfirmOrder
    AwaitingInventory --> Compensating : InventoryFailed / RefundPayment + CancelOrder
    Compensating --> Failed : CompensationCompleted

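Figure 6: OrderSaga state diagram for the MassTransit state machine. The Compensating -> Failed transition relies on a CompensationCompleted event that the listing above does not define, so as written the saga stays in Compensating.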

Registration in Program.cs

builder.Services.AddMassTransit(x =>
{
    x.AddSagaStateMachine<OrderSagaStateMachine, OrderSagaState>()
        .EntityFrameworkRepository(r =>
        {
            r.ConcurrencyMode = ConcurrencyMode.Optimistic;
            r.AddDbContext<DbContext, OrderSagaDbContext>((provider, optionsBuilder) =>
            {
                optionsBuilder.UseSqlServer(
                    builder.Configuration.GetConnectionString("SagaDb"));
            });
        });

    x.UsingRabbitMq((context, cfg) =>
    {
        cfg.Host("rabbitmq://localhost");
        cfg.ConfigureEndpoints(context);
    });
});

Why use Optimistic Concurrency?

MassTransit persists saga state in a database (EF Core, MongoDB, Redis...). When multiple events land on the same saga instance at once, Optimistic Concurrency uses a version column to detect the conflict: the losing write fails with a concurrency exception instead of silently overwriting the winner. The failed message can then be reprocessed against the fresh state, provided a retry policy is configured on the endpoint. This handles concurrent saga updates without a distributed lock.
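The version check itself is simple enough to sketch without any framework. The in-memory store below is a stand-in for the saga table, not MassTransit's implementation; `SagaRow`, `InMemorySagaStore`, and `OptimisticUpdate` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed record SagaRow(string State, int Version);

public sealed class InMemorySagaStore
{
    private readonly Dictionary<Guid, SagaRow> _rows = new();
    private readonly object _gate = new();

    public void Insert(Guid id, string state)
    {
        lock (_gate) _rows[id] = new SagaRow(state, 1);
    }

    public SagaRow Load(Guid id)
    {
        lock (_gate) return _rows[id];
    }

    // Compare-and-set: succeeds only if nobody bumped the version
    // since expectedVersion was read.
    public bool TrySave(Guid id, string newState, int expectedVersion)
    {
        lock (_gate)
        {
            if (_rows[id].Version != expectedVersion) return false;
            _rows[id] = new SagaRow(newState, expectedVersion + 1);
            return true;
        }
    }
}

public static class OptimisticUpdate
{
    // Reload-and-retry loop: on a version conflict, re-read the row and
    // re-apply the transition to the fresh state.
    public static bool Apply(InMemorySagaStore store, Guid id,
        Func<string, string> transition, int maxAttempts = 3)
    {
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            var row = store.Load(id);
            if (store.TrySave(id, transition(row.State), row.Version)) return true;
        }
        return false;
    }
}
```

The important detail is that a retry re-reads the state before deciding what to do, since the event that won the race may have changed which transition is valid.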

7. Temporal: durable execution for sagas

If MassTransit represents the "traditional" state-machine + message-broker approach, Temporal represents a newer paradigm: Durable Execution. Instead of wiring state, events, and retries yourself, Temporal guarantees your code runs to completion — across crashes, network failures, and redeploys mid-flight.

using Temporalio.Exceptions;
using Temporalio.Workflows;

[Workflow]
public class OrderSagaWorkflow
{
    [WorkflowRun]
    public async Task<OrderResult> RunAsync(OrderRequest request)
    {
        var orderId = Workflow.NewGuid();
        // Set once the charge succeeds, so compensation only refunds real charges.
        var paymentCharged = false;

        // Step 1: Create order (outside the try: if it fails, nothing needs undoing)
        await Workflow.ExecuteActivityAsync(
            (OrderActivities a) => a.CreateOrderAsync(orderId, request),
            new() { StartToCloseTimeout = TimeSpan.FromSeconds(30) });

        try
        {
            // Step 2: Charge payment
            await Workflow.ExecuteActivityAsync(
                (PaymentActivities a) => a.ProcessPaymentAsync(
                    orderId, request.Amount),
                new()
                {
                    StartToCloseTimeout = TimeSpan.FromSeconds(30),
                    RetryPolicy = new()
                    {
                        MaximumAttempts = 3,
                        InitialInterval = TimeSpan.FromSeconds(1),
                        BackoffCoefficient = 2.0f
                    }
                });
            paymentCharged = true;

            // Step 3: Decrement inventory
            await Workflow.ExecuteActivityAsync(
                (InventoryActivities a) => a.ReserveInventoryAsync(
                    orderId, request.Items),
                new() { StartToCloseTimeout = TimeSpan.FromSeconds(30) });

            // Step 4: Confirm order
            // Note: a failure here would also need to release the reserved stock;
            // this example defines no such activity.
            await Workflow.ExecuteActivityAsync(
                (OrderActivities a) => a.ConfirmOrderAsync(orderId),
                new() { StartToCloseTimeout = TimeSpan.FromSeconds(10) });

            return new OrderResult(orderId, OrderStatus.Confirmed);
        }
        catch (ActivityFailureException ex)
        {
            // Compensation: undo the completed steps in reverse order
            if (paymentCharged)
            {
                await Workflow.ExecuteActivityAsync(
                    (PaymentActivities a) => a.RefundPaymentAsync(
                        orderId, request.Amount),
                    new() { StartToCloseTimeout = TimeSpan.FromSeconds(30) });
            }

            await Workflow.ExecuteActivityAsync(
                (OrderActivities a) => a.CancelOrderAsync(
                    orderId, ex.Message),
                new() { StartToCloseTimeout = TimeSpan.FromSeconds(10) });

            return new OrderResult(orderId, OrderStatus.Cancelled);
        }
    }
}

MassTransit vs Temporal

| Criterion | MassTransit Saga | Temporal Workflow |
| --- | --- | --- |
| Programming model | Declarative state machine | Imperative code (async/await) |
| Persistence | EF Core, MongoDB, Redis | Temporal Server (PostgreSQL/MySQL/Cassandra) |
| Retry & Timeout | Self-configured via middleware | Built in, per-activity configuration |
| Visibility | Build your own dashboard | Temporal Web UI included |
| Infrastructure | Needs a message broker (RabbitMQ/Kafka) | Needs a Temporal Server cluster |
| Learning curve | Moderate (familiar .NET stack) | Higher (new paradigm) |
| Long-running workflows | Supported but needs timeout management | Excellent: workflows that run for days/weeks |
| Best fit | Pure .NET team with an existing broker | Complex workflows, detailed audit trail |

8. Idempotency: a hard requirement

In distributed systems messages can be redelivered (at-least-once delivery). If a saga handler isn't idempotent, re-processing the same event causes incorrect side effects — e.g., double-charging the same order. Idempotency is not nice-to-have, it's mandatory.

Techniques for enforcing idempotency

1. Idempotency key: every message carries a unique key. The handler checks the key before processing:

public async Task Handle(ProcessPayment message)
{
    var idempotencyKey = $"payment:{message.OrderId}";

    var alreadyProcessed = await _db.ProcessedMessages
        .AnyAsync(m => m.Key == idempotencyKey);

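    if (alreadyProcessed)
        return; // Already processed, skip

    // Caveat: a crash between the charge and SaveChangesAsync leaves no record,
    // so a redelivery would charge again. Where the gateway accepts an
    // idempotency key, pass this one through; a unique index on Key also
    // makes concurrent duplicates fail at save time.
    await _paymentGateway.ChargeAsync(
        message.CustomerId, message.Amount);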

    _db.ProcessedMessages.Add(new ProcessedMessage
    {
        Key = idempotencyKey,
        ProcessedAt = DateTime.UtcNow
    });

    await _db.SaveChangesAsync();
}

2. Conditional update (optimistic): use a WHERE clause to update only when the current state matches:

UPDATE Orders
SET Status = 'Confirmed', Version = Version + 1
WHERE Id = @OrderId AND Status = 'PendingInventory' AND Version = @ExpectedVersion

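3. Outbox pattern: write the message to the database in the same transaction as the business data, then publish later. The publisher gives at-least-once delivery, so duplicates are still possible; combined with an idempotent consumer, the result is effectively-once processing (often loosely called "exactly-once"):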

sequenceDiagram
    participant S as Service
    participant DB as Database
    participant OB as Outbox Publisher
    participant MB as Message Broker

    S->>DB: BEGIN TRANSACTION
    S->>DB: UPDATE business data
    S->>DB: INSERT INTO Outbox (message)
    S->>DB: COMMIT

    OB->>DB: Poll Outbox table
    OB->>MB: Publish message
    OB->>DB: Mark as published

Figure 7: Outbox Pattern — atomicity between business data and message publishing
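The moving parts in Figure 7 can be sketched in memory. Everything here is a stand-in: `ServiceDb` plays the role of the service's database, its `ConfirmOrder` method represents one local transaction, and all type names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OutboxMessage
{
    public Guid Id { get; init; }
    public string Payload { get; init; } = "";
    public bool Published { get; set; }
}

public sealed class ServiceDb
{
    public Dictionary<Guid, string> Orders { get; } = new();
    public List<OutboxMessage> Outbox { get; } = new();

    // Stands in for one local transaction: the business update and the
    // outbox row are written together.
    public void ConfirmOrder(Guid orderId)
    {
        Orders[orderId] = "Confirmed";
        Outbox.Add(new OutboxMessage
        {
            Id = Guid.NewGuid(),
            Payload = $"OrderConfirmed:{orderId}"
        });
    }
}

public sealed class IdempotentConsumer
{
    private readonly HashSet<Guid> _seen = new();
    public List<string> Handled { get; } = new();

    public void Handle(OutboxMessage message)
    {
        if (!_seen.Add(message.Id)) return; // duplicate delivery, skip
        Handled.Add(message.Payload);
    }
}

public static class OutboxRelay
{
    // Publishes every unpublished row, then marks it. A crash between the two
    // steps would cause a redelivery, which the consumer must tolerate.
    public static int PublishPending(ServiceDb db, Action<OutboxMessage> publish)
    {
        var pending = db.Outbox.Where(m => !m.Published).ToList();
        foreach (var message in pending)
        {
            publish(message);
            message.Published = true;
        }
        return pending.Count;
    }
}
```

Resetting `Published` to false and running the relay again simulates the crash-before-mark case: the message goes out twice, but the consumer handles it once.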

MassTransit ships a built-in Outbox

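MassTransit includes a Transactional Outbox that integrates with EF Core. Register it with x.AddEntityFrameworkOutbox<OrderDbContext>() inside AddMassTransit (on the registration configurator, not the transport's cfg) and add the outbox tables to the DbContext model. Messages are then written to the outbox table in the same transaction as your data and delivered by a background service, so there is no need to implement it yourself.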

9. Monitoring and observability for sagas

Sagas span multiple services and can run for seconds or even days. Without observability, you're blind when things go wrong in production.

Metrics to track

| Metric | Description | Suggested alert threshold |
| --- | --- | --- |
| saga_started_total | Number of sagas started | Sudden spike > 3x baseline |
| saga_completed_total | Number of sagas that completed successfully | Drop > 20% vs started |
| saga_compensated_total | Number of sagas that were compensated | Rate > 5% over 15 minutes |
| saga_duration_seconds | Saga runtime (histogram) | P99 > 30s |
| saga_step_failures_total | Step failures (pre-retry) | Sustained rate > 10/min |
| saga_stuck_count | Sagas stuck in the same state | Any saga stuck > 10 minutes |
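As one way to feed saga_stuck_count, a periodic job could scan saga instances for non-terminal states that have not moved recently. The sketch assumes each instance records when it last changed state; `SagaSnapshot`, its `LastTransitionUtc` field, and the choice of "Completed" and "Failed" as terminal states are assumptions, not part of the saga state class shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read model: one row per saga instance.
public sealed record SagaSnapshot(Guid Id, string State, DateTime LastTransitionUtc);

public static class StuckSagaDetector
{
    // States in which a saga is finished and can never be "stuck".
    private static readonly HashSet<string> Terminal = new() { "Completed", "Failed" };

    // Returns sagas that sit in a non-terminal state for longer than the threshold.
    public static List<SagaSnapshot> FindStuck(
        IEnumerable<SagaSnapshot> sagas, DateTime nowUtc, TimeSpan threshold) =>
        sagas.Where(s => !Terminal.Contains(s.State)
                         && nowUtc - s.LastTransitionUtc > threshold)
             .ToList();
}
```

The count of the returned list is the gauge value; the list itself is what an on-call engineer would want in the alert payload.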

Distributed tracing

Propagate the Correlation ID (the saga instance ID) through every message and HTTP call. With OpenTelemetry, you can trace the full saga flow end-to-end:

// MassTransit propagates CorrelationId through message headers automatically.
// Combine with OpenTelemetry:
services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("MassTransit")
        .AddAspNetCoreInstrumentation()
        .AddSqlClientInstrumentation()
        .AddOtlpExporter());

10. When to use (and not use) sagas

Use a saga when:

  • The business process spans multiple services with their own databases
  • You need eventual consistency across services
  • The workflow has clear compensating logic for each step
  • The system needs independent scaling — each service scales on its own
  • Steps can run async — no immediate-response requirement

Don't use a saga when:

  • All data lives in one database — just use a regular database transaction
  • You need strong consistency (serializable isolation) — sagas only give eventual consistency
  • Compensating transactions are infeasible for multiple steps
  • The workflow is trivial (2 services, no branching): saga overhead isn't worth it

graph TD
    Q1{"Data in<br/>one database?"}
    Q2{"Need strong<br/>consistency?"}
    Q3{"Workflow > 4 steps<br/>or has branching?"}
    A1["Use a DB transaction"]
    A2["Consider 2PC<br/>or redesign"]
    A3["Saga Choreography"]
    A4["Saga Orchestration"]

    Q1 -->|Yes| A1
    Q1 -->|No| Q2
    Q2 -->|Yes| A2
    Q2 -->|No| Q3
    Q3 -->|No| A3
    Q3 -->|Yes| A4

    style Q1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style Q2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style Q3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A1 fill:#4CAF50,stroke:#fff,color:#fff
    style A2 fill:#ff9800,stroke:#fff,color:#fff
    style A3 fill:#2c3e50,stroke:#fff,color:#fff
    style A4 fill:#e94560,stroke:#fff,color:#fff

Figure 8: Decision tree — pick the right approach for distributed transactions
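The decision tree translates directly into code, which can be handy as a checklist in design reviews. `Approach` and `ApproachPicker` are hypothetical names; the thresholds are the ones from Figure 8.

```csharp
public enum Approach
{
    DatabaseTransaction,
    TwoPhaseCommitOrRedesign,
    SagaChoreography,
    SagaOrchestration
}

public static class ApproachPicker
{
    // Walks the questions of Figure 8 from top to bottom.
    public static Approach Pick(
        bool singleDatabase, bool needsStrongConsistency, int stepCount, bool hasBranching)
    {
        if (singleDatabase) return Approach.DatabaseTransaction;
        if (needsStrongConsistency) return Approach.TwoPhaseCommitOrRedesign;
        return stepCount > 4 || hasBranching
            ? Approach.SagaOrchestration
            : Approach.SagaChoreography;
    }
}
```

For example, a five-step order flow across separate databases with no strong-consistency requirement lands on orchestration, while a linear three-step flow lands on choreography.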

11. Conclusion

The Saga Pattern is not a silver bullet — it trades isolation for availability and scalability. But in a microservices world where each service needs to stand on its own, it's the most practical pattern for keeping data consistent across the system.

Key takeaways:

  • Choreography for simple workflows, Orchestration for complex ones — most production systems pick orchestration
  • Put irreversible steps after the pivot transaction
  • Idempotency is mandatory — use idempotency keys + conditional updates + outbox pattern
  • Invest in monitoring: saga lifecycle metrics, distributed tracing, alerts on stuck sagas
  • MassTransit is the sweet spot for .NET teams, Temporal for very complex workflows

Practical advice

Start simple. Not every cross-service operation needs a saga. Before you implement one, ask: "Can we redesign this so the operation fits in a single service?" Sometimes, the right service boundary removes the need for a distributed transaction altogether.
