Microsoft Orleans 9 on .NET 10 — Virtual Actors, Distributed Grains, and Stateful Cloud-Native Architecture for Games, IoT, and AI Agents

Posted on: 4/17/2026 4:09:35 AM

1. Why stateful services are reclaiming the spotlight in 2026

For nearly a decade since the cloud-native boom, "stateless" has been close to a doctrine. Every service was expected to scale out easily, state was pushed to databases or caches, and "stateful" sounded old-fashioned. But 2026 tells a different story: multiplayer game systems, low-latency financial exchanges, IoT digital twins, real-time chat, collaboration tools, and — most notably — AI agents with long-term conversation memory all need to keep state inside the process. The reason is simple: every database round-trip is a cost, and when an entity (a player, an order book, a device, a conversation) takes thousands of requests per second, pushing state outside is what caps scaling.

Microsoft Orleans was born at Microsoft Research in 2010 exactly to solve that class of problem. At its core is the Virtual Actor Model — a pragmatic variant of the classical Actor Model (Erlang, Akka) where developers don't manage actor lifecycle, allocation, or cleanup themselves. Every "grain" (Orleans's actor unit) logically always exists; the Orleans runtime automatically activates it when a request arrives, keeps it alive in RAM while it's working, and cleans it up when idle. By Orleans 9 in 2026 — released with .NET 10 — the framework has come a long way since Halo 4 used it to scale presence/matchmaking in 2012.

This article is a practitioner's handbook for architects and senior engineers considering Orleans for 2026 systems. We won't stop at "Hello Grain" — we'll go straight into the hard decisions: when to use Virtual Actors instead of stateless microservices, how to design grains for a real domain, placement strategies, layered persistence, streams and transactions, integrating .NET Aspire 10 to orchestrate the cluster, head-to-head comparisons with Akka.NET, Proto.Actor, and Dapr Actors, four common anti-patterns, and a go-live checklist.

  • ~10M concurrently active grains on a typical 32-silo cluster
  • <5 ms p50 latency for a local grain call (same silo)
  • 100K+ grain activations per second on production clusters
  • 9.0: the 2026 branch, full .NET 10 + partial NativeAOT

Three decisive questions before choosing Orleans

  • Does your domain genuinely have substantial entity state inside each "object" (player, order, device, session), or is it just CRUD around SQL tables?
  • Are you comfortable with single-threaded grains (each grain processes one request at a time) in exchange for concurrency safety for free?
  • Are you willing to operate a cluster membership protocol (whether self-managed via ADO.NET or using Azure/Kubernetes discovery)?

Three "yes" answers mean Orleans saves a remarkable amount of code and buys real performance. Any "no" and a stateless microservice plus a database is often simpler.

2. Evolution — from Project Orleans to .NET 10 Cloud-Native

Knowing Orleans's history explains why today's API has some "unusual" conventions compared with standard DI habits, why the grain directory is so important, and why version 9 puts so much weight on .NET Aspire integration and standard OpenTelemetry observability.

2010 — Project Orleans at Microsoft Research
The eXtreme Computing Group built Orleans on .NET Framework 4.0 to address "cloud services anyone can write". The term "virtual actor" was coined here, distinguishing it from classical actors that must be allocated and destroyed manually.
2012 — Halo 4 presence service
343 Industries used Orleans as the presence, party, and matchmaking backend for Halo 4 — more than a million concurrent players on under 300 servers. The first case study proving virtual actors scale to AAA game size.
2015 — Open-sourced on GitHub
Microsoft moved Orleans to open source under the .NET Foundation. The community grew fast — Skype, Gears of War, Age of Empires, and many European banks announced production use cases.
2018 — Orleans 2.x ported to .NET Core
The end of the .NET Framework era. Configuration moved to the builder pattern, DI became native via IServiceCollection, and cross-platform support for Linux/macOS arrived.
2021 — Orleans 3.x + 4.x with code generation
Roslyn source generators replaced dynamic proxies — no more Fody/Castle dependencies, AOT-friendly. A new serializer was many times faster than BinaryFormatter.
2023 — Orleans 7 with .NET 8
Streaming API reimagined, Reminder v2, placement director improvements, experimental gRPC transport. Client/silo host merged into a single generic host.
Q4 2024 — Orleans 8 + .NET 9
Standard OpenTelemetry telemetry on every grain call, metric cardinality drastically reduced. Grain interfaces support generic-type attributes. Key types extended to composite keys.
2026 — Orleans 9 with .NET 10
Tight integration with .NET Aspire 10 for local/production orchestration, NativeAOT for portions of the hot path (serialization + dispatch), broadcast channel replacing parts of streams, cluster membership that understands Kubernetes implicitly via StatefulSets + headless services, and cross-grain transactions production-stable.

3. Virtual Actor Model — the core difference from classical Actors

The classical Actor Model (Carl Hewitt 1973, Erlang, Akka) has three traits: each actor holds private state, processes messages sequentially, and communicates only via messages. But developers must Spawn actors themselves, manage lifecycle, and supervise them. When your entity count runs into the millions — one actor per user, one actor per device — manual management becomes a burden. Orleans's answer is the virtual actor: every grain logically always exists and is identified by a key (Guid, string, long, or composite). On the first request, the runtime activates an instance in RAM on some silo in the cluster; after sufficient idle time, the runtime deactivates and releases memory. Next request, another activation — possibly on a different silo. Developers "just call the grain", without knowing where it is or whether it's in memory.

graph LR
    CLIENT["Client / Web API"] --> PROXY["IGrainFactory.GetGrain<IPlayerGrain>(playerId)"]
    PROXY --> DIR["Grain Directory: which silo is playerId on?"]
    DIR -->|"not yet active"| PLACEMENT["Placement Director"]
    PLACEMENT --> SILO_C["Silo C (round-robin / prefer-local)"]
    SILO_C --> ACTIVATE["Activate instance + OnActivateAsync"]
    ACTIVATE --> STATE["Load state from storage"]
    STATE --> INVOKE["Invoke method"]
    DIR -->|"already active"| CACHE["Local cache: grain ref"]
    CACHE --> SILO_B["Silo B is hosting the grain"]
    SILO_B --> INVOKE
    INVOKE --> RESP["Return response to client"]
Lifecycle of a grain call — every routing concern hides behind GrainFactory.

The biggest win of this model isn't raw performance, it's the programming model: developers write code as if the entity were local, without thinking about connection pools, shards, or partition keys. Each grain has turn-based concurrency — only one request at a time — so no locks are needed to protect state. That's why game, fintech, and IoT teams accept a little flexibility loss for a huge productivity gain.
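The "just call the grain" experience can be sketched from the caller's side. A minimal sketch, assuming the IPlayerGrain interface defined in section 4; ScoreEndpoint is a hypothetical API-side class, not part of Orleans:

```csharp
// Hedged sketch: resolving a grain reference through an injected IGrainFactory.
// Obtaining the reference never touches the network; only the method call does.
public sealed class ScoreEndpoint
{
    private readonly IGrainFactory _grains;

    public ScoreEndpoint(IGrainFactory grains) => _grains = grains;

    public Task AwardAsync(Guid playerId, int points)
    {
        // No Spawn, no lookup table: the reference is valid whether or not
        // the grain is currently activated anywhere in the cluster.
        var player = _grains.GetGrain<IPlayerGrain>(playerId);
        return player.AddScoreAsync(points);
    }
}
```

If the grain is idle, this call triggers activation transparently; the caller cannot tell the difference.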

Four traits that set Orleans apart from Akka

  • Always-exist: no Spawn step. Calling a grain that "doesn't yet exist" is the same as calling an idle one.
  • Location transparency: placement is a runtime decision; the developer doesn't know which silo a grain is on and shouldn't.
  • Automatic activation/deactivation: the runtime creates/destroys instances according to CollectionAgeLimit.
  • Typed interfaces: grains are called through plain .NET interfaces, not Tell/Ask with boxed types. Roslyn tooling keeps the chain type-safe end-to-end.

4. Grain basics — from interface to implementation

A grain has three parts: an interface inheriting IGrainWithXKey (Guid/String/Integer/CompositeKey), a class inheriting Grain that implements that interface, and optionally [GenerateSerializer] on DTOs that move between grains. Orleans 9 leans on source generators: proxies are built at compile time with no runtime reflection, trim-friendly and AOT-friendly.

public interface IPlayerGrain : IGrainWithGuidKey
{
    Task<PlayerProfile> GetProfileAsync();
    Task AddScoreAsync(int delta);
    Task<MatchResult> EnterMatchAsync(Guid matchId);
}

[GenerateSerializer]
public sealed record PlayerProfile(
    [property: Id(0)] Guid Id,
    [property: Id(1)] string DisplayName,
    [property: Id(2)] int Score,
    [property: Id(3)] DateTimeOffset LastSeen);

public sealed class PlayerGrain : Grain, IPlayerGrain
{
    private readonly IPersistentState<PlayerProfile> _state;
    private readonly ILogger<PlayerGrain> _logger;

    public PlayerGrain(
        [PersistentState("profile", "players")] IPersistentState<PlayerProfile> state,
        ILogger<PlayerGrain> logger)
    {
        _state = state;
        _logger = logger;
    }

    public Task<PlayerProfile> GetProfileAsync() => Task.FromResult(_state.State);

    public async Task AddScoreAsync(int delta)
    {
        _state.State = _state.State with
        {
            Score = _state.State.Score + delta,
            LastSeen = DateTimeOffset.UtcNow
        };
        await _state.WriteStateAsync();
    }

    public async Task<MatchResult> EnterMatchAsync(Guid matchId)
    {
        var match = GrainFactory.GetGrain<IMatchGrain>(matchId);
        return await match.JoinAsync(this.GetPrimaryKey());
    }
}

Note three details. First, IPersistentState<T> is injected directly into the constructor; Orleans DI already supplies the provider. Second, the [Id(0)] serialization attributes on each field: Orleans 9 uses a position-based serializer that supports schema evolution (safe to append new fields, unsafe to reorder). Third, GrainFactory is available on the base Grain class for calling other grains, making it feel like an object graph in a monolith.
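The schema-evolution rule can be made concrete. A minimal sketch; PlayerProfileV2 and the Region field are illustrative names, not from the original:

```csharp
// Hedged sketch of safe schema evolution: v2 appends a new field with the
// next Id. Payloads stored by v1 deserialize with Region left at its default.
[GenerateSerializer]
public sealed record PlayerProfileV2(
    [property: Id(0)] Guid Id,
    [property: Id(1)] string DisplayName,
    [property: Id(2)] int Score,
    [property: Id(3)] DateTimeOffset LastSeen,
    [property: Id(4)] string? Region);   // appended with a fresh Id -- safe

// Reordering fields or reusing Ids 0-3 for new meanings would silently
// misread old payloads -- unsafe.
```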

5. Silos, clusters, and placement — where grains really run

An Orleans cluster consists of multiple silos — each silo is a .NET process that hosts grains. The cluster needs a membership table to know which silos are alive; this can be Azure Storage, ADO.NET, Consul, ZooKeeper, Redis, MongoDB, or in 2026 Kubernetes-aware membership reading endpoints from a headless service. When a silo joins or leaves, the membership ring changes and the grain directory rebalances.

graph TB
    subgraph CLUSTER["Orleans Cluster"]
        S1["Silo 1<br/>- GrainDirectory partition A<br/>- 12K active grains"]
        S2["Silo 2<br/>- GrainDirectory partition B<br/>- 11K active grains"]
        S3["Silo 3<br/>- GrainDirectory partition C<br/>- 9K active grains"]
        S4["Silo 4 (just joined)<br/>- takes partition D<br/>- rebalance 10%"]
    end
    MEMBERSHIP["Membership Table<br/>ADO/AzureStorage/K8s"]
    S1 --- MEMBERSHIP
    S2 --- MEMBERSHIP
    S3 --- MEMBERSHIP
    S4 --- MEMBERSHIP
    STORAGE["Storage Providers<br/>PostgreSQL / Azure / Cosmos / S3"]
    S1 --> STORAGE
    S2 --> STORAGE
    S3 --> STORAGE
    S4 --> STORAGE
A three-silo cluster; silo 4 joins and takes on a portion of the grain directory.

Placement strategies decide which silo a grain is first activated on. Orleans 9 offers the familiar options:

  • RandomPlacement (default) — simple, evenly distributed on large clusters.
  • ActivationCountBasedPlacement — pick the silo with the fewest active grains. Great for homogeneous workloads.
  • PreferLocalPlacement — if the request arrives at silo A, activate there. Ideal when the grain consumes local info (e.g. a cache grain reading a file on the node).
  • HashBasedPlacement — deterministic by key. Useful for routing related grains to the same silo to reduce cross-silo RPCs.
  • SiloRoleBasedPlacement — tag silos (CPU-heavy, GPU-enabled) and place grains into matching roles.
  • ResourceOptimizedPlacement (stable in Orleans 9) — the placement director reads CPU/memory/p95-latency metrics through telemetry and places grains on the least-loaded silo, with configurable weights.

Bad placement is a leading cause of unexplained slowdowns at scale. If your domain has natural locality (game: player + match should sit on the same silo; IoT: device + room sensor should sit on the same silo), investing in HashBasedPlacement with a custom director pays off substantially.
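In code, placement is declared per grain class. A minimal sketch using the attribute names from the list above (Orleans.Placement namespace); INodeCacheGrain and IDeviceGrain are hypothetical interfaces:

```csharp
// Hedged sketch: placement strategy is an attribute on the grain class,
// so the choice lives next to the grain it affects.
[PreferLocalPlacement]                  // activate on the silo the call arrived at
public sealed class NodeCacheGrain : Grain, INodeCacheGrain
{
    // e.g. reads a file cached on the local node
}

[HashBasedPlacement]                    // deterministic by key: related grains co-locate
public sealed class DeviceGrain : Grain, IDeviceGrain
{
    // devices hashing to the same silo avoid cross-silo RPCs
}
```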

6. Persistence — where grain state lives

By default, Orleans stores nothing; state lives only in RAM and vanishes when the grain deactivates. To make state durable, use a grain storage provider: an abstraction that reads/writes by (grainType, grainKey) -> payload. The popular 2026 providers:

| Provider | Best fit | Read/write latency | Operations notes |
| --- | --- | --- | --- |
| ADO.NET (Postgres / SQL Server) | Domains with constraints, standard backup needs, SQL-comfortable teams | 2-8 ms / 5-15 ms | Index on PK (grainTypeHash + grainId). Init scripts ready. |
| Azure Table Storage / Cosmos DB | Azure-native workloads, very high scale, RU-based pricing | 5-20 ms / 8-25 ms | Simple; 1 MB (Table) / 2 MB (Cosmos) entity size limit. |
| AWS DynamoDB | AWS-native workloads, single-digit ms latency, partition auto-scale | 2-10 ms / 5-15 ms | Watch for hot partitions; use adaptive capacity. |
| MongoDB | Document-oriented complex state, easy schema migration | 3-10 ms / 5-20 ms | Run a 3-node replica set. |
| Redis | Ephemeral state or cache-backed data, acceptable to lose on Redis failure | <1 ms / 1-3 ms | Use when latency matters more than durability. |
| MemoryStorage | Unit tests, development | microseconds | Never enable in production. |

Orleans 9 lets you bind multiple providers to the same grain: one attribute [PersistentState("profile", "players-sql")] for durable state, another [PersistentState("cache", "hot-redis")] for hot, recomputable data. That's why domain design around Orleans avoids many painful database migrations: a grain is the natural boundary of an aggregate root.
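The two-provider binding looks like this in a constructor. A minimal sketch; HotStats is a hypothetical DTO, and the provider names must match those registered on the silo:

```csharp
// Hedged sketch: one grain, two storage providers. Durable profile state
// goes to SQL; hot, recomputable data goes to Redis.
public sealed class ProfileWithCacheGrain : Grain
{
    private readonly IPersistentState<PlayerProfile> _profile;
    private readonly IPersistentState<HotStats> _hot;

    public ProfileWithCacheGrain(
        [PersistentState("profile", "players-sql")] IPersistentState<PlayerProfile> profile,
        [PersistentState("cache", "hot-redis")] IPersistentState<HotStats> hot)
    {
        _profile = profile;   // write sparingly, survives everything
        _hot = hot;           // write freely, acceptable to lose
    }
}
```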

A classic persistence trap

Don't confuse WriteStateAsync() with "saved instantly, no errors possible". It completes successfully only once the storage provider confirms the write — but if Azure Table is throttled, that call can take 5-30 seconds. While the grain waits on that write it processes no other requests, and system-wide throughput dips. Mitigations: consider write-behind with an internal buffer plus a reminder that periodically flushes, or switch to an event-sourcing pattern with an append-only log.
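The write-behind mitigation can be sketched with a dirty flag and a periodic flush. This is a hedged sketch, not the author's pattern: ICounterGrain and CounterState are hypothetical, it uses the classic RegisterTimer API (Orleans 9 also offers newer timer APIs), and a production version would also flush in OnDeactivateAsync — up to one interval of writes can still be lost:

```csharp
// Hedged sketch of write-behind: mutations only mark state dirty;
// a timer flushes at most once per interval, smoothing storage load.
public sealed class CounterGrain : Grain, ICounterGrain
{
    private readonly IPersistentState<CounterState> _state;
    private bool _dirty;

    public CounterGrain(
        [PersistentState("counter", "players")] IPersistentState<CounterState> state)
        => _state = state;

    public override Task OnActivateAsync(CancellationToken ct)
    {
        RegisterTimer(_ => FlushAsync(), null,
            TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
        return base.OnActivateAsync(ct);
    }

    public Task IncrementAsync()
    {
        _state.State.Value++;
        _dirty = true;                   // no storage round-trip here
        return Task.CompletedTask;
    }

    private async Task FlushAsync()
    {
        if (!_dirty) return;
        _dirty = false;
        await _state.WriteStateAsync();  // one batched write per interval
    }
}
```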

7. Streams and Broadcast Channel — pub/sub inside the cluster

Orleans streams let a grain publish events to many subscribing consumers with at-most-once or at-least-once delivery guarantees depending on the provider. Typical providers:

  • MemoryStream — in-cluster, not persistent, great for dev.
  • Azure Queue / AWS SQS stream providers — pull-based, silo-side caching, natural backpressure.
  • EventHub stream provider — partitioned by EventHub, each silo gets a partition set; scales well for high-throughput workloads.
  • Kafka stream provider (community, stable in 9.x) — Kafka as transport, rewindable.

// Publisher — the match grain emits an event
var streamProvider = this.GetStreamProvider("kafka");
var streamId = StreamId.Create("matches", matchId);
var stream = streamProvider.GetStream<MatchEvent>(streamId);
await stream.OnNextAsync(new MatchEvent.PlayerJoined(playerId, DateTimeOffset.UtcNow));

// Consumer — stats grain subscribes implicitly
[ImplicitStreamSubscription("matches")]
public sealed class StatsGrain : Grain, IStatsGrain, IAsyncObserver<MatchEvent>
{
    public override async Task OnActivateAsync(CancellationToken ct)
    {
        var provider = this.GetStreamProvider("kafka");
        var stream = provider.GetStream<MatchEvent>(StreamId.Create("matches", this.GetPrimaryKey()));
        await stream.SubscribeAsync(this);
        await base.OnActivateAsync(ct);
    }

    public Task OnNextAsync(MatchEvent evt, StreamSequenceToken? token) =>
        evt switch
        {
            MatchEvent.PlayerJoined j => AddJoinAsync(j),
            MatchEvent.PlayerLeft l => RemoveAsync(l),
            _ => Task.CompletedTask
        };

    // IAsyncObserver<T> also requires these two members
    public Task OnCompletedAsync() => Task.CompletedTask;

    public Task OnErrorAsync(Exception ex) => Task.CompletedTask;
}

Orleans 9 adds the Broadcast Channel — an in-cluster fan-out channel optimized for "broadcast to every active grain of type T". Unlike streams, there's no strict delivery guarantee, but it's very lightweight (memory-only, cluster-internal transport). Typical uses: grain cache invalidation, broadcasting config changes, keep-alive pings.

8. Transactions — ACID across multiple grains

A lot of actor-model discussions end with "but actors don't have transactions". Orleans disproved that from version 3, and by 2026 Orleans transactions are stable enough for production at several European banks and fintechs. The mechanism is a tailored two-phase commit protocol + write-ahead log stored on regular storage.

public interface IAccountGrain : IGrainWithGuidKey
{
    [Transaction(TransactionOption.Join)]
    Task DebitAsync(decimal amount);

    [Transaction(TransactionOption.Join)]
    Task CreditAsync(decimal amount);

    [Transaction(TransactionOption.Supported)]
    Task<decimal> GetBalanceAsync();
}

public interface ITransferService : IGrainWithGuidKey
{
    [Transaction(TransactionOption.Create)]
    Task TransferAsync(Guid from, Guid to, decimal amount);
}

public sealed class TransferService : Grain, ITransferService
{
    public async Task TransferAsync(Guid from, Guid to, decimal amount)
    {
        var a = GrainFactory.GetGrain<IAccountGrain>(from);
        var b = GrainFactory.GetGrain<IAccountGrain>(to);
        await a.DebitAsync(amount);
        await b.CreditAsync(amount);
    }
}

The runtime guarantees: if any step in TransferAsync fails, both grains roll back their state to the pre-transaction point. Three caveats: transactions have noticeable overhead (typically 2-5× regular calls), so reserve them for business logic that truly needs ACID; storage providers must implement ITransactionalStateStorage; and avoid cross-cluster transactions — keep them within a cluster to stay under ~50 ms p99.
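From the caller's side, an aborted transaction surfaces as an exception. A minimal sketch assuming the ITransferService interface above; OrleansTransactionAbortedException is the abort exception type from Orleans.Transactions (verify against your version), and the retry policy is the caller's decision:

```csharp
// Hedged sketch: invoking the transactional transfer and reacting to an abort.
public async Task<bool> TryTransferAsync(
    IGrainFactory grains, Guid from, Guid to, decimal amount)
{
    var transfers = grains.GetGrain<ITransferService>(Guid.NewGuid());
    try
    {
        // Either both the debit and the credit commit, or neither does.
        await transfers.TransferAsync(from, to, amount);
        return true;
    }
    catch (OrleansTransactionAbortedException)
    {
        // Both account grains have rolled back to their pre-transaction state.
        return false;
    }
}
```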

9. Orleans 9 + .NET Aspire 10 — from code to cluster in one command

One of Orleans 9's biggest strengths is first-class integration with .NET Aspire 10. Previously, running a local dev Orleans cluster meant opening several terminals, tweaking ports, and configuring a file-based membership provider. With Aspire 10, a single AppHost declares silos, clients, storage, and the dashboard:

var builder = DistributedApplication.CreateBuilder(args);

var pg = builder.AddPostgres("pg")
    .WithDataVolume()
    .AddDatabase("orleans");

var orleans = builder.AddOrleans("cluster")
    .WithClustering(pg)
    .WithGrainStorage("profile", pg)
    .WithGrainStorage("events", pg)
    .WithReminders(pg)
    .WithStreaming();

builder.AddProject<Projects.GameApi>("api")
    .WithReference(orleans.AsClient());

builder.AddProject<Projects.GameSilo>("silo")
    .WithReference(orleans)
    .WithReplicas(3);

builder.Build().Run();

The result: dotnet run gives you a 3-silo cluster, Postgres, and the Aspire dashboard with topology, logs, and metrics. Aspire 10 can also export Kubernetes manifests or Azure Container Apps deployments. Developer-experience-wise, this is the biggest leap Orleans has had since going open source.
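For completeness, the silo project's entry point stays tiny. A hedged sketch following the Aspire Orleans integration shape; resource-client registrations (e.g. the Postgres client that backs clustering and storage) are omitted, so treat this as an outline rather than a complete Program.cs:

```csharp
// Hedged sketch of the GameSilo entry point under the AppHost above.
// Aspire injects clustering/storage configuration by resource name,
// so UseOrleans() needs no explicit endpoints or membership setup here.
var builder = Host.CreateApplicationBuilder(args);

builder.UseOrleans();   // reads the Aspire-provided Orleans configuration

var host = builder.Build();
await host.RunAsync();
```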

graph LR
    DEV["dotnet run (AppHost)"] --> ASPIRE["Aspire Dashboard"]
    ASPIRE --> API["API project (Orleans Client)"]
    ASPIRE --> SILO1["Silo replica 1"]
    ASPIRE --> SILO2["Silo replica 2"]
    ASPIRE --> SILO3["Silo replica 3"]
    ASPIRE --> PG["Postgres (clustering + storage + reminders)"]
    SILO1 --> PG
    SILO2 --> PG
    SILO3 --> PG
    API -->|"grain call"| SILO1
    API -->|"grain call"| SILO2
    API -->|"grain call"| SILO3
A typical Aspire AppHost topology in dev mode — production only differs in the storage provider and replica count.

10. Observability — seeing the cluster to trust it

Orleans 9 emits standard OpenTelemetry metrics and traces for every grain call, silo lifecycle event, storage latency, reminder tick, and membership change. The most important metrics for a dashboard:

  • orleans.grain.calls broken down by grain.type, result (ok/error/timeout). Quickly spots dead or overloaded grains.
  • orleans.grain.activation.count — grains in RAM. Spikes may indicate a leak or deactivation being blocked.
  • orleans.grain.latency p50/p95/p99 — reveals hot grains.
  • orleans.storage.read/write.duration — rising values mean the storage provider is struggling.
  • orleans.membership.changes — silos joining/leaving. Flapping = network or GC issues.
  • orleans.messaging.queue.length — message backlog; a steady rise means downstream bottleneck.

On large clusters, cardinality can explode if you record metrics per grainId. Orleans 8.x addressed this with attribute-based filters: by default, metrics are recorded at the grain.type level, not tagged with grain.id unless explicitly enabled. 2026 production stacks commonly pipe metrics into Prometheus/Mimir, traces into Tempo/Jaeger, and logs into Loki/Elasticsearch.
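Wiring those signals into an OpenTelemetry pipeline is a few lines. A hedged sketch: the meter and activity-source names ("Microsoft.Orleans", "Microsoft.Orleans.Runtime", "Microsoft.Orleans.Application") follow the Orleans documentation, but verify them against your Orleans version; exporter choices are illustrative:

```csharp
// Hedged sketch: subscribing OpenTelemetry to Orleans metrics and traces.
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.Orleans")            // all Orleans runtime metrics
        .AddPrometheusExporter())                 // scrape target for Prometheus/Mimir
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.Orleans.Runtime")   // silo lifecycle, messaging
        .AddSource("Microsoft.Orleans.Application") // per-grain-call spans
        .AddOtlpExporter());                      // ship to Tempo/Jaeger
```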

11. Quick comparison — Orleans vs Akka.NET, Proto.Actor, Dapr Actors

| Criterion | Orleans 9 | Akka.NET 1.5 | Proto.Actor | Dapr Actors |
| --- | --- | --- | --- | --- |
| Model | Virtual Actor | Classical Actor (with Cluster Sharding) | Classical Actor + Cluster + Grain addon | Virtual Actor (via sidecar) |
| Host language | .NET 10 | .NET / F# | .NET, Go, Kotlin, Python | Any language via sidecar |
| Placement | Many built-in + custom strategies | Consistent hashing / LeastShardAllocation | Random/Partition | Dapr placement service |
| Persistence | Provider-based, diverse plugins | Akka Persistence + journal/snapshot | Persistence addon | Dapr state stores |
| Transactions | Built-in cross-grain ACID | Not native; patterns must be hand-rolled | None | Limited (single actor) |
| Streams | Built-in + multiple providers | Akka Streams (reactive, powerful) | Minimal | Pub/Sub via Dapr, not rewindable by default |
| Tooling | Aspire + OTel | Petabridge tooling | Community | Dapr Dashboard + Radius |
| Client language | .NET | .NET / cross-platform via HTTP | Polyglot | Polyglot via sidecar |
| Best when | Pure .NET team, clear aggregates, needs locality + transactions | Team with reactive DNA, heavy Akka Streams use | Polyglot, lightweight needs | Multi-language, preferring open cloud-native standards |

There's no single correct choice. For a 100% .NET team with clear aggregates (player, order, device, conversation) who want to program against Domain-Driven Design directly without jumping to YAML service meshes, Orleans is still the top pick for 2026. For a polyglot team or one that prefers vendor-neutral cloud-native standards, Dapr Actors deserves serious consideration even though the programming model isn't quite as elegant.

12. Production use cases — three typical scenarios

12.1 Multiplayer game — Player, Match, Leaderboard

A three-grain template: PlayerGrain holds profile + wallet + inventory, MatchGrain holds room state (players, scores, ticks), LeaderboardGrain aggregates by region. Placement: Player uses HashBased by region, Match uses ActivationCountBased, Leaderboard uses SiloRoleBased on an "aggregator" silo with large memory. Matches stream to Leaderboard via the Kafka provider.

12.2 IoT Digital Twin — Device, Room, Gateway

Every IoT device maps to a DeviceGrain receiving telemetry, computing derived metrics, and caching snapshots. RoomGrain aggregates DeviceGrains in the same room; GatewayGrain talks to the physical gateway over MQTT. HashBasedPlacement keeps the same gateway's grains on the same silo, cutting cross-silo calls. Storage: TimescaleDB/Postgres for historical telemetry, Redis for real-time snapshots.

12.3 AI agent with conversation memory — Conversation Grain

A ConversationGrain holds message history, tool-call logs, and per-session token budget. The grain calls LLM providers (OpenAI / Anthropic / self-hosted vLLM) through a regular HTTP client, persists messages, and exposes IAsyncEnumerable token streams to the API. The advantage over stateless: you don't reload the conversation from the DB per request — after the first, subsequent calls for the same conversation hit a warm grain in RAM.
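The token-streaming part of this scenario can be sketched with an IAsyncEnumerable grain method (supported for grain interfaces since Orleans 7). A hedged sketch: IChatModel and ChatMessage are hypothetical abstractions over the LLM HTTP client, not Orleans or vendor APIs:

```csharp
// Hedged sketch: a conversation grain streaming model tokens to the caller.
public interface IConversationGrain : IGrainWithGuidKey
{
    IAsyncEnumerable<string> AskAsync(string userMessage);
}

public sealed class ConversationGrain : Grain, IConversationGrain
{
    private readonly List<ChatMessage> _history = new(); // stays warm in RAM
    private readonly IChatModel _model;                  // hypothetical LLM client

    public ConversationGrain(IChatModel model) => _model = model;

    public async IAsyncEnumerable<string> AskAsync(string userMessage)
    {
        _history.Add(new ChatMessage("user", userMessage));
        var reply = new StringBuilder();
        // History is already in memory -- no per-request DB reload.
        await foreach (var token in _model.CompleteStreamingAsync(_history))
        {
            reply.Append(token);
            yield return token;          // relayed to the API caller as it arrives
        }
        _history.Add(new ChatMessage("assistant", reply.ToString()));
    }
}
```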

13. Four common anti-patterns

Anti-pattern #1 — The God Grain

Stuffing the whole system's state into one "Manager" grain. Grains are single-threaded, so every request queues up. Fix: split by natural aggregate roots; a grain shouldn't serve thousands of distinct entities.

Anti-pattern #2 — Forgetting WriteStateAsync

Mutating state but never writing it; on deactivation, it's gone. The opposite — calling WriteStateAsync on every tiny mutation — hammers storage. The right pattern: batch within a method scope, or use event-sourcing with an append-only log.

Anti-pattern #3 — Long-running work inside a grain handler

A handler calls an external HTTP endpoint that takes 5 seconds. For those 5 seconds the grain blocks every other incoming request. Fix: [AlwaysInterleave] for read-only methods, move long work to dedicated background grains, or use IAsyncEnumerable streaming.
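The [AlwaysInterleave] escape hatch goes on the grain interface method. A minimal sketch; IEnrichmentGrain and its methods are illustrative names:

```csharp
// Hedged sketch: [AlwaysInterleave] (Orleans.Concurrency) lets a read-only
// method run while a slow call is in flight, so status reads are not stuck
// behind a 5-second outbound HTTP call.
public interface IEnrichmentGrain : IGrainWithGuidKey
{
    Task EnrichAsync(Guid itemId);       // may await a slow external API

    [AlwaysInterleave]
    Task<string> GetStatusAsync();       // safe: reads an immutable snapshot
}
```

Only mark methods that never mutate state mid-await; interleaving reintroduces exactly the races turn-based concurrency removes.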

Anti-pattern #4 — Grains calling each other in a cycle

Grain A calls B while B calls A — deadlock because of turn-based concurrency. Catch it early with chaos tests that inject artificial latency. Fix: use [Reentrant] for grains you can prove are safe, or flip the call chain into event-driven via streams.
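The [Reentrant] fix is a class-level attribute. A minimal sketch; the comment describes the behavior under stated assumptions rather than a complete implementation:

```csharp
// Hedged sketch: [Reentrant] (Orleans.Concurrency) allows A -> B -> A call
// cycles to interleave instead of deadlocking. Apply only when you can prove
// the grain's state tolerates interleaved turns at every await point.
[Reentrant]
public sealed class MatchGrain : Grain, IMatchGrain
{
    // A re-entrant call from a player grain back into this grain is admitted
    // at an await point rather than queued behind the original turn.
}
```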

14. Production checklist — Don't forget these before go-live

Before opening the cluster to real traffic

  • At least 3 silos in production — stable membership quorum, tolerates one node down without flapping.
  • Pick the right ClusterId, clearly different across dev/staging/prod. Overlaps are a disaster — old grains bump into new ones.
  • Enable OpenTelemetry from the start, not bolted on after an incident.
  • Limit CollectionAgeLimit per grain type: high-count, cheap-to-activate grains can deactivate after 5 minutes idle; heavy-state grains should stay longer.
  • Silo GC tuning: server GC is on by default, but for heaps >16 GB benchmark standard GC against .NET 10's new DATAS GC.
  • Kubernetes readiness probes: Orleans silos expose a /healthz endpoint (via the Orleans.Diagnostics.HealthChecks package); wire it into K8s probes so rolling updates don't drop traffic.
  • Graceful shutdown: propagate SIGTERM into the silo so it stops gracefully — moves active grains to other silos and flushes state. Use at least a 60 s timeout.
  • Version policy: use [Version(n)] on grain interfaces so the cluster can rolling-upgrade without breaking; enable strict mode to reject old-version clients calling newer grains where risky.
  • Storage backups: daily snapshots with monthly restore tests. Orleans has no safety net — a lost grain state is gone forever.
  • Chaos load tests: kill a silo at peak on staging and measure p99 recovery. >10 s recovery means membership timeouts need tuning.
  • Bound the grain type count: clusters with too many grain types (>500) hit metric cardinality and directory overhead. Rethink DDD aggregates if you approach that.

15. Closing — when Orleans is the right choice

Virtual Actors aren't a silver bullet. If your domain is simple CRUD around SQL tables, traditional stateless microservices are easier to operate and onboard. But when domain entities have meaningful state, frequent self-interaction (e.g. a conversation with dozens of turns), and natural concurrency constraints (a player should only act one thing at a time), Orleans turns those constraints into natural code instead of hand-built cache + lock + queue layers.

In 2026, with Orleans 9 on .NET 10, the framework has addressed its three biggest historical weaknesses: hard local operations (Aspire), non-standard observability (OpenTelemetry), and static placement (Resource Optimized Placement). The rest is on the developer — design grains along aggregate roots, avoid the God Grain, understand the cost of transactions and use them only where warranted, and be ready to operate a cluster with a membership ring instead of just a few stateless pods.

If you're designing a multiplayer game system, a million-device IoT platform, a chat platform with millions of concurrent sessions, or an AI agent framework that needs long-term conversation memory — Orleans 9 belongs on your shortlist. Conversely, if your business is standard REST + Postgres, don't force-fit Orleans; the cluster learning and ops cost outweighs the benefits.
