Microsoft.Extensions.AI — The Unified AI Abstraction Layer for .NET 10
Posted on: 4/25/2026 9:12:55 PM
Table of contents
- Why Microsoft.Extensions.AI?
- Architecture Overview
- Two Core Packages
- IChatClient — The Unified Chat Completion Interface
- Middleware Pipeline — The Real Power
- IEmbeddingGenerator — Vector Embeddings for RAG
- Function Calling (Tool Use)
- Comparison with Other Approaches
- Real-world ASP.NET Core Integration
- Advanced Pattern: Multi-provider with Named Clients
- Writing Custom Middleware
- IImageGenerator — Text-to-Image Generation (Experimental)
- Decision Tree: When to Use What?
- Production Best Practices
- Conclusion
- References
When building AI-powered applications on .NET, developers face a familiar challenge: each provider (OpenAI, Azure OpenAI, Ollama, Anthropic…) comes with its own SDK, its own API surface, and its own way of handling streaming. Switching providers means rewriting code. Microsoft.Extensions.AI was created to solve this problem once and for all — providing a unified abstraction layer that integrates deeply into .NET's familiar Dependency Injection ecosystem, allowing you to swap providers without changing any business logic.
Why Microsoft.Extensions.AI?
Before this library, integrating AI into a .NET application meant choosing between two paths: using a provider's SDK directly (vendor lock-in) or building your own abstraction layer (time-consuming, hard to maintain). Microsoft.Extensions.AI offers a third way — a standard co-developed by Microsoft and the .NET community.
Key Distinction
Microsoft.Extensions.AI does not replace Semantic Kernel. It sits at a lower level — providing primitive types (IChatClient, IEmbeddingGenerator) that Semantic Kernel uses internally. If you only need chat completion or embeddings, Extensions.AI alone is sufficient. When you need complex orchestration (plugins, planning, agents), combine it with Semantic Kernel.
Architecture Overview
The system follows a clean layered architecture where each layer has distinct responsibilities:
graph TD
    A["Application Code<br/>Controller, Service, Worker"] --> B["Microsoft.Extensions.AI<br/>Middleware Pipeline"]
    B --> C["IChatClient / IEmbeddingGenerator<br/>Abstractions"]
    C --> D["Provider Implementations"]
    D --> E["Azure OpenAI"]
    D --> F["OpenAI"]
    D --> G["Ollama<br/>Local Models"]
    D --> H["Anthropic<br/>Claude"]
    D --> I["Custom Provider"]
    B --> J["UseLogging()"]
    B --> K["UseDistributedCache()"]
    B --> L["UseOpenTelemetry()"]
    B --> M["UseFunctionInvocation()"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#16213e,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#0078d4,stroke:#fff,color:#fff
    style F fill:#412991,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#d97706,stroke:#fff,color:#fff
    style I fill:#888,stroke:#fff,color:#fff
    style J fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style K fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style M fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
Layered architecture of Microsoft.Extensions.AI in .NET 10
Two Core Packages
| Package | Role | When to Use |
|---|---|---|
| Microsoft.Extensions.AI.Abstractions | Defines interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator) and exchange types (ChatMessage, ChatResponse…) | For library authors — provider adapter implementations only need this package |
| Microsoft.Extensions.AI | Provides middleware pipeline (logging, caching, telemetry, function invocation) + builder pattern for DI | For application developers — reference this package for full functionality |
IChatClient — The Unified Chat Completion Interface
This is the central interface of the entire library. With just two core methods, it covers every use case from simple prompts to multi-turn conversations with tool calling:
public interface IChatClient : IDisposable
{
    // Non-streaming chat completion — returns the full response
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Streaming — yields incremental updates via IAsyncEnumerable
    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Retrieves typed services such as ChatClientMetadata
    // (the pre-GA Metadata property was replaced by this method)
    object? GetService(Type serviceType, object? serviceKey = null);
}
Basic Example: Chat with Azure OpenAI
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;
// Create IChatClient from Azure OpenAI SDK
IChatClient client = new AzureOpenAIClient(
        new Uri("https://my-resource.openai.azure.com/"),
        new Azure.AzureKeyCredential("api-key"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Call chat — as simple as calling any service
ChatResponse response = await client.GetResponseAsync(
    "Explain Dependency Injection in 3 sentences");

Console.WriteLine(response.Text);
Streaming Response
await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quick sort function in C#"))
{
    Console.Write(update.Text);
}
Switch to Ollama (Local Model) — Change Only 1 Line
// Before: Azure OpenAI
IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o").AsIChatClient();

// After: Ollama local — business code UNCHANGED
IChatClient client = new OllamaChatClient(
    new Uri("http://localhost:11434"), "llama3.2");
Provider Agnosticism Is the Key
All downstream code — chat logic, streaming, function calling — stays exactly the same. Only the IChatClient initialization line changes. This is exactly the pattern .NET developers are familiar with from ILogger and IDistributedCache — now applied to AI.
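As a sketch of what that looks like in practice, a service can depend solely on the abstraction. The `SummaryService` class and its prompt are illustrative, not part of the library; only `IChatClient` and `GetResponseAsync` come from Microsoft.Extensions.AI:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Depends only on the abstraction: works unchanged whether the
// registered client is Azure OpenAI, Ollama, or anything else.
public class SummaryService
{
    private readonly IChatClient _chatClient;

    public SummaryService(IChatClient chatClient) => _chatClient = chatClient;

    public async Task<string> SummarizeAsync(string text, CancellationToken ct = default)
    {
        // The string overload wraps the prompt in a single user message
        ChatResponse response = await _chatClient.GetResponseAsync(
            $"Summarize in one sentence: {text}", cancellationToken: ct);
        return response.Text;
    }
}
```

Swapping providers then touches only the DI registration; `SummaryService` never changes.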
Middleware Pipeline — The Real Power
If the library stopped at abstractions, it would be unremarkable. What makes it a game-changer is the middleware pipeline — allowing you to stack cross-cutting concerns just like ASP.NET Core middleware:
sequenceDiagram
participant App as Application
participant Log as UseLogging()
participant Cache as UseDistributedCache()
participant OTel as UseOpenTelemetry()
participant Func as UseFunctionInvocation()
participant LLM as LLM Provider
App->>Log: GetResponseAsync()
Log->>Log: Log request
Log->>Cache: Forward
Cache->>Cache: Check cache
alt Cache hit
Cache-->>Log: Return cached response
Log-->>App: Response
else Cache miss
Cache->>OTel: Forward
OTel->>OTel: Start trace span
OTel->>Func: Forward
Func->>LLM: Call provider
LLM-->>Func: Response (may include tool calls)
Func->>Func: Execute tools, re-call if needed
Func-->>OTel: Final response
OTel->>OTel: End span
OTel-->>Cache: Response
Cache->>Cache: Store in cache
Cache-->>Log: Response
Log->>Log: Log response
Log-->>App: Response
end
Request flow through the middleware pipeline
Register Middleware via DI
var builder = WebApplication.CreateBuilder(args);

// Register IChatClient with the full middleware pipeline
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new Azure.AzureKeyCredential(
                builder.Configuration["AzureOpenAI:ApiKey"]!))
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()            // Log all requests/responses
        .UseDistributedCache()   // Cache responses for identical prompts
        .UseOpenTelemetry()      // Distributed tracing
        .UseFunctionInvocation() // Auto-invoke tools when the LLM requests them
        .Build(services));

// Cache backend
builder.Services.AddDistributedMemoryCache(); // Or Redis cache
Middleware Order Matters
Components wrap the client in registration order: the first Use*() call becomes the outermost layer. UseLogging() placed first therefore logs everything, including cache hits. Put UseDistributedCache() before UseOpenTelemetry() so cached responses never reach the telemetry layer — no trace spans are created for cache hits, reducing noise in monitoring.
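As a sketch of the ordering rule (here `innerClient` stands in for any provider `IChatClient`), swapping the first two components changes what gets logged:

```csharp
// Order A: logging outermost, so cache hits ARE logged
var clientA = innerClient.AsBuilder()
    .UseLogging()           // outermost layer, sees every call
    .UseDistributedCache()  // short-circuits on a cache hit
    .Build();

// Order B: cache outermost, so cache hits are NOT logged
var clientB = innerClient.AsBuilder()
    .UseDistributedCache()  // short-circuits before logging runs
    .UseLogging()
    .Build();
```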
IEmbeddingGenerator — Vector Embeddings for RAG
The second interface serves RAG (Retrieval-Augmented Generation) systems, semantic search, and memory stores:
public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}
Example: Generate Embeddings for Semantic Search
IEmbeddingGenerator<string, Embedding<float>> generator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Generate embeddings for a batch of documents
var embeddings = await generator.GenerateAsync(new[]
{
    "Microservices architecture patterns",
    "Event-driven design with message queues",
    "CQRS and Event Sourcing in .NET 10"
});

// Each embedding wraps a float vector — store it in a vector database
foreach (var emb in embeddings)
{
    float[] vector = emb.Vector.ToArray();
    // Store in Qdrant, Milvus, pgvector...
}
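Once vectors are stored, ranking documents against a query is plain arithmetic. A minimal cosine-similarity helper (pure C#, independent of the library; in production your vector database computes this for you):

```csharp
using System;

static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), result in [-1, 1]
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float dot = 0f, magA = 0f, magB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
    }
}
```

Identical vectors score 1.0, orthogonal vectors 0.0; rank search results by descending similarity to the query embedding.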
Function Calling (Tool Use)
One of the most powerful features: let the LLM invoke C# functions you define, with UseFunctionInvocation() middleware automatically dispatching calls and re-sending results to the LLM:
using System.ComponentModel; // for [Description]

[Description("Get weather information for a city")]
static string GetWeather(
    [Description("City name")] string city)
{
    // Call an actual weather API here
    return $"Weather in {city}: 28°C, sunny";
}

IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation() // Auto-invoke when the LLM requests it
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

var response = await client.GetResponseAsync(
    "What's the weather like in Hanoi today?", options);

// The LLM calls GetWeather("Hanoi"), receives the result,
// then composes a natural-language answer
Comparison with Other Approaches
| Criteria | Direct SDK | Microsoft.Extensions.AI | Semantic Kernel |
|---|---|---|---|
| Provider lock-in | High — code tightly coupled to one SDK | None — swap via DI | None — uses IChatClient underneath |
| Middleware pipeline | DIY | Built-in (logging, cache, telemetry, function calling) | Built-in + filters, plugins |
| Dependency Injection | Manual wiring | First-class support | First-class support |
| Orchestration (planning, agents) | No | No — primitives only | Yes — plugins, planner, agent framework |
| Learning curve | Low but fragmented | Low, familiar to .NET devs | Medium — many concepts |
| Package size | Small | Small — abstractions plus middleware only | Larger — full orchestration stack |
| Best for | Quick prototypes | Production apps needing simple AI | Complex AI-first applications |
Real-world ASP.NET Core Integration
Here's a complete pattern for an API endpoint using IChatClient via DI:
// Program.cs
var builder = WebApplication.CreateBuilder(args);

// Register the cache backend
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration
        .GetConnectionString("Redis");
});

// Register IChatClient with the middleware pipeline
builder.Services.AddChatClient(services =>
{
    var config = builder.Configuration;
    return new AzureOpenAIClient(
            new Uri(config["AI:Endpoint"]!),
            new Azure.AzureKeyCredential(config["AI:ApiKey"]!))
        .GetChatClient(config["AI:Model"]!)
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseDistributedCache()
        .UseOpenTelemetry(
            configure: otel => otel.EnableSensitiveData = false)
        .UseFunctionInvocation()
        .Build(services);
});
var app = builder.Build();

app.MapPost("/api/chat", async (
    IChatClient chatClient,
    ChatRequest request) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a technical assistant specializing in .NET."),
        new(ChatRole.User, request.Message)
    };
    var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

// Request DTO bound from the JSON body
public record ChatRequest(string Message);
Advanced Pattern: Multi-provider with Named Clients
In production, you often need multiple LLMs for different purposes — e.g., a small model for classification, a large model for generation:
// Register two clients under different keys
builder.Services.AddKeyedSingleton<IChatClient>("fast", (sp, _) =>
    new OllamaChatClient(
        new Uri("http://localhost:11434"), "phi-4-mini"));

builder.Services.AddKeyedSingleton<IChatClient>("smart", (sp, _) =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(sp));

// Inject with [FromKeyedServices]
app.MapPost("/api/classify", async (
    [FromKeyedServices("fast")] IChatClient fast,
    string text) =>
{
    var result = await fast.GetResponseAsync(
        $"Classify intent: {text}. Return: support|sales|billing");
    return Results.Ok(new { intent = result.Text?.Trim() });
});

app.MapPost("/api/generate", async (
    [FromKeyedServices("smart")] IChatClient smart,
    string prompt) =>
{
    var result = await smart.GetResponseAsync(prompt);
    return Results.Ok(new { content = result.Text });
});
Writing Custom Middleware
You can write your own middleware by deriving from DelegatingChatClient. Example — a rate-limiting client built on System.Threading.RateLimiting:
public class RateLimitingChatClient : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingChatClient(
        IChatClient inner, RateLimiter limiter)
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter
            .AcquireAsync(1, cancellationToken);
        if (!lease.IsAcquired)
            // RateLimitExceededException is an app-defined exception type
            throw new RateLimitExceededException("AI rate limit hit");

        return await base.GetResponseAsync(
            messages, options, cancellationToken);
    }
}

// Register
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .Use(inner => new RateLimitingChatClient(
            inner,
            new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 50
            })))
        .UseLogging()
        .Build(services));
IImageGenerator — Text-to-Image Generation (Experimental)
The newest interface (marked experimental) enables text-to-image generation integration with the same familiar DI pattern:
IImageGenerator imageGen = /* provider implementation */;

var result = await imageGen.GenerateAsync(
    new ImageGenerationRequest("A futuristic city with neon lights"),
    new ImageGenerationOptions
    {
        Width = 1024,
        Height = 1024
    });
// result contains URL or byte[] of the generated image
Decision Tree: When to Use What?
graph TD
    Q["Need AI in your .NET app?"] --> A{"Need complex<br/>orchestration?"}
    A -- Yes --> B{"Need agents,<br/>planning, plugins?"}
    B -- Yes --> C["Semantic Kernel<br/>+ Agent Framework"]
    B -- No --> D["Semantic Kernel<br/>simple"]
    A -- No --> E{"Need middleware<br/>(cache, logging)?"}
    E -- Yes --> F["Microsoft.Extensions.AI<br/>+ Middleware Pipeline"]
    E -- No --> G{"Quick<br/>prototype?"}
    G -- Yes --> H["Direct SDK<br/>(OpenAI, Ollama...)"]
    G -- No --> F
    style Q fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Decision tree: Choose the right AI approach for your .NET app
Production Best Practices
1. Always Use DI, Never New Directly
Register IChatClient via AddChatClient() so the middleware pipeline works correctly and remains testable via mock/stub.
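As an illustration of that testability (the `StubChatClient` type below is hand-rolled for this post, not part of the library; it assumes the GA `ChatResponse`/`ChatResponseUpdate` constructors), a canned-response stub is often all a unit test needs:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Canned-response stub: lets you unit test services that depend on
// IChatClient without network calls or API keys.
public sealed class StubChatClient : IChatClient
{
    private readonly string _reply;
    public StubChatClient(string reply) => _reply = reply;

    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default) =>
        Task.FromResult(new ChatResponse(
            new ChatMessage(ChatRole.Assistant, _reply)));

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await Task.Yield();
        yield return new ChatResponseUpdate(ChatRole.Assistant, _reply);
    }

    public object? GetService(Type serviceType, object? serviceKey = null) => null;

    public void Dispose() { }
}
```

In a test host, register it in place of the real client — `services.AddChatClient(_ => new StubChatClient("canned reply"))` — and the system under test is none the wiser.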
2. Cache Aggressively for Deterministic Prompts
For prompts that don't change (system prompts, classification tasks), enable UseDistributedCache() to significantly reduce API call costs.
3. Use OpenTelemetry to Monitor Costs
Every LLM call consumes tokens, and tokens cost money. UseOpenTelemetry() tracks request count, latency, and token usage via metrics — connect them to Prometheus/Grafana and alert when you exceed budget.
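Wiring the middleware's telemetry into an exporter might look like the sketch below (this assumes the OpenTelemetry.Extensions.Hosting and OTLP exporter packages; the "Experimental.Microsoft.Extensions.AI" source/meter name follows the library's convention but may vary by version — verify it against your installed package):

```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // Activity source emitted by UseOpenTelemetry() (name is version-dependent)
        .AddSource("Experimental.Microsoft.Extensions.AI")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        // Token-usage and latency meters from the same middleware
        .AddMeter("Experimental.Microsoft.Extensions.AI")
        .AddOtlpExporter());
```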
4. Be Careful with EnableSensitiveData
When EnableSensitiveData = true is set on the OpenTelemetry middleware, all prompts and responses are logged — dangerous if they contain PII. Keep it false in production.
Conclusion
Microsoft.Extensions.AI is a significant step forward in standardizing how .NET developers integrate AI. Instead of every provider going its own way, you now have a single interface, IChatClient, playing the same role that ILogger plays for logging and IDistributedCache for caching: one abstraction, a flexible middleware pipeline, and provider swaps with no business-code changes. With .NET 10, the library is stable and production-ready. If you're building a .NET application with AI integration, this is the foundation to start with.
References
- Microsoft.Extensions.AI libraries - .NET | Microsoft Learn
- Introducing Microsoft.Extensions.AI Preview - .NET Blog
- NuGet Gallery | Microsoft.Extensions.AI 10.5.0
- GitHub - dotnet/extensions - Microsoft.Extensions.AI source
- Microsoft.Extensions.AI: The New Foundation for .NET AI Development
- Generative AI with LLMs in C# in 2026 - .NET Blog