Microsoft.Extensions.AI — The Unified AI Abstraction Layer for .NET 10

Posted on: 4/25/2026 9:12:55 PM

When building AI-powered applications on .NET, developers face a familiar challenge: each provider (OpenAI, Azure OpenAI, Ollama, Anthropic…) comes with its own SDK, its own API surface, and its own way of handling streaming. Switching providers means rewriting code. Microsoft.Extensions.AI was created to solve this problem once and for all — providing a unified abstraction layer that integrates deeply into .NET's familiar Dependency Injection ecosystem, allowing you to swap providers without changing any business logic.

- 10.5.0: Latest stable version on NuGet
- 3: Core interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator)
- 4+: Built-in middleware (Logging, Cache, Telemetry, Function Calling)
- 0: Lines of code to change when swapping providers

Why Microsoft.Extensions.AI?

Before this library, integrating AI into a .NET application meant choosing between two paths: using a provider's SDK directly (vendor lock-in) or building your own abstraction layer (time-consuming, hard to maintain). Microsoft.Extensions.AI offers a third way — a standard co-developed by Microsoft and the .NET community.

Key Distinction

Microsoft.Extensions.AI does not replace Semantic Kernel. It sits at a lower level — providing primitive types (IChatClient, IEmbeddingGenerator) that Semantic Kernel uses internally. If you only need chat completion or embeddings, Extensions.AI alone is sufficient. When you need complex orchestration (plugins, planning, agents), combine it with Semantic Kernel.

Architecture Overview

The system follows a clean layered architecture where each layer has distinct responsibilities:

graph TD
    A["Application Code
Controller, Service, Worker"] --> B["Microsoft.Extensions.AI
Middleware Pipeline"]
    B --> C["IChatClient / IEmbeddingGenerator
Abstractions"]
    C --> D["Provider Implementations"]
    D --> E["Azure OpenAI"]
    D --> F["OpenAI"]
    D --> G["Ollama
Local Models"]
    D --> H["Anthropic
Claude"]
    D --> I["Custom Provider"]
    B --> J["UseLogging()"]
    B --> K["UseDistributedCache()"]
    B --> L["UseOpenTelemetry()"]
    B --> M["UseFunctionInvocation()"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#16213e,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#0078d4,stroke:#fff,color:#fff
    style F fill:#412991,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#d97706,stroke:#fff,color:#fff
    style I fill:#888,stroke:#fff,color:#fff
    style J fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style K fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style M fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50

Layered architecture of Microsoft.Extensions.AI in .NET 10

Two Core Packages

| Package | Role | When to Use |
| --- | --- | --- |
| Microsoft.Extensions.AI.Abstractions | Defines the interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator) and exchange types (ChatMessage, ChatResponse…) | For library authors — provider adapter implementations only need this package |
| Microsoft.Extensions.AI | Provides the middleware pipeline (logging, caching, telemetry, function invocation) plus the builder pattern for DI | For application developers — reference this package for full functionality |

IChatClient — The Unified Chat Completion Interface

This is the central interface of the entire library. With just two core methods, it covers everything from simple prompts to multi-turn conversations with tool calling:

public interface IChatClient : IDisposable
{
    // Non-streaming chat completion — returns the full response
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Streaming — returns tokens via IAsyncEnumerable
    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    ChatClientMetadata Metadata { get; }
}

Basic Example: Chat with Azure OpenAI

using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;

// Create IChatClient from Azure OpenAI SDK
IChatClient client = new AzureOpenAIClient(
        new Uri("https://my-resource.openai.azure.com/"),
        new Azure.AzureKeyCredential("api-key"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Call chat — as simple as calling any service
ChatResponse response = await client.GetResponseAsync(
    "Explain Dependency Injection in 3 sentences");
Console.WriteLine(response.Text);
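Because GetResponseAsync takes the whole message list, multi-turn conversation is simply a matter of appending to that list and calling again. A minimal sketch (the prompts are illustrative):

```csharp
// Maintain the conversation as a growing list of messages
List<ChatMessage> history =
[
    new(ChatRole.System, "You are a concise .NET assistant."),
    new(ChatRole.User, "What is Dependency Injection?")
];

ChatResponse first = await client.GetResponseAsync(history);
Console.WriteLine(first.Text);

// Append the assistant's reply, then ask a follow-up in context
history.AddRange(first.Messages);
history.Add(new(ChatRole.User, "Show a one-line example in C#."));

ChatResponse second = await client.GetResponseAsync(history);
Console.WriteLine(second.Text);
```

The client itself is stateless; the conversation state lives entirely in the message list you pass in.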

Streaming Response

await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quick sort function in C#"))
{
    Console.Write(update.Text);
}
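Streaming combines naturally with CancellationToken, which lets you stop generation when a client disconnects or a deadline passes. A sketch that also accumulates the full text (the timeout value is arbitrary):

```csharp
// Cancel generation if it runs longer than 30 seconds
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var fullText = new System.Text.StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quick sort function in C#",
    cancellationToken: cts.Token))
{
    Console.Write(update.Text);   // render token-by-token
    fullText.Append(update.Text); // keep the complete text for later use
}
```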

Switch to Ollama (Local Model) — Change Only 1 Line

// Before: Azure OpenAI
IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o").AsIChatClient();

// After: Ollama local — business code UNCHANGED
IChatClient client = new OllamaChatClient(
    new Uri("http://localhost:11434"), "llama3.2");

Provider Agnosticism Is the Key

All downstream code — chat logic, streaming, function calling — stays exactly the same. Only the IChatClient initialization line changes. This is exactly the pattern .NET developers are familiar with from ILogger and IDistributedCache — now applied to AI.
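In practice, this means your services depend only on the abstraction. A sketch of what that looks like (the SummaryService class is illustrative, not part of the library):

```csharp
// Business logic knows nothing about the concrete provider
public class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummarizeAsync(
        string text, CancellationToken ct = default)
    {
        var response = await chatClient.GetResponseAsync(
            $"Summarize in two sentences:\n{text}",
            cancellationToken: ct);
        return response.Text;
    }
}

// Swapping Azure OpenAI for Ollama touches registration only,
// never this class — and tests can inject a stub IChatClient.
```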

Middleware Pipeline — The Real Power

If it stopped at abstraction, Microsoft.Extensions.AI would be unremarkable. What makes it a game-changer is the middleware pipeline — allowing you to stack cross-cutting concerns just like ASP.NET Core middleware:

sequenceDiagram
    participant App as Application
    participant Log as UseLogging()
    participant Cache as UseDistributedCache()
    participant OTel as UseOpenTelemetry()
    participant Func as UseFunctionInvocation()
    participant LLM as LLM Provider

    App->>Log: GetResponseAsync()
    Log->>Log: Log request
    Log->>Cache: Forward
    Cache->>Cache: Check cache
    alt Cache hit
        Cache-->>Log: Return cached response
        Log-->>App: Response
    else Cache miss
        Cache->>OTel: Forward
        OTel->>OTel: Start trace span
        OTel->>Func: Forward
        Func->>LLM: Call provider
        LLM-->>Func: Response (may include tool calls)
        Func->>Func: Execute tools, re-call if needed
        Func-->>OTel: Final response
        OTel->>OTel: End span
        OTel-->>Cache: Response
        Cache->>Cache: Store in cache
        Cache-->>Log: Response
        Log->>Log: Log response
        Log-->>App: Response
    end

Request flow through the middleware pipeline

Register Middleware via DI

var builder = WebApplication.CreateBuilder(args);

// Register IChatClient with full middleware pipeline
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new Azure.AzureKeyCredential(
                builder.Configuration["AzureOpenAI:ApiKey"]!))
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()                    // Log all requests/responses
        .UseDistributedCache()           // Cache responses for identical prompts
        .UseOpenTelemetry()              // Distributed tracing
        .UseFunctionInvocation()         // Auto-invoke tools when LLM requests
        .Build(services));

// Backend cache
builder.Services.AddDistributedMemoryCache(); // Or Redis cache

Middleware Order Matters

Middleware is applied outermost-first. UseLogging() at the outermost position logs everything including cache hits. Place UseDistributedCache() before UseOpenTelemetry() to avoid creating trace spans for cached responses — reducing noise in monitoring.
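The nesting described above can be read directly off the builder chain; a sketch annotating which call wraps which (assumes an innerClient obtained from any provider):

```csharp
// Registration order = nesting order:
// the first Use* call becomes the outermost layer
innerClient
    .AsBuilder()
    .UseLogging()            // outermost — observes every call, even cache hits
    .UseDistributedCache()   // short-circuits here on a cache hit...
    .UseOpenTelemetry()      // ...so no trace span is created for cached responses
    .UseFunctionInvocation() // innermost — closest to the actual provider
    .Build(services);
```

Reordering the chain changes behavior: putting UseOpenTelemetry() before UseDistributedCache() would record a span for every request, cached or not.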

IEmbeddingGenerator — Vector Embeddings for RAG

The second interface serves RAG (Retrieval-Augmented Generation) systems, semantic search, and memory stores:

public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}

IEmbeddingGenerator<string, Embedding<float>> generator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Generate embeddings for a batch of documents
var embeddings = await generator.GenerateAsync(new[]
{
    "Microservices architecture patterns",
    "Event-driven design with message queues",
    "CQRS and Event Sourcing in .NET 10"
});

// Each embedding is a float[] vector — store in a vector database
foreach (var emb in embeddings)
{
    float[] vector = emb.Vector.ToArray();
    // Store in Qdrant, Milvus, pgvector...
}
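Once vectors exist, semantic search reduces to comparing them. A minimal cosine-similarity sketch over the embeddings above, with no vector database involved (the query string is illustrative):

```csharp
// Cosine similarity: dot product normalized by vector magnitudes
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

// Embed the query, then rank documents by similarity to it
var query = await generator.GenerateAsync(["message queue patterns"]);
float[] queryVector = query[0].Vector.ToArray();

var ranked = embeddings
    .Select((e, i) => (Index: i,
        Score: CosineSimilarity(queryVector, e.Vector.Span)))
    .OrderByDescending(x => x.Score);
```

For a handful of documents this in-memory approach is enough; a vector database earns its keep once the corpus no longer fits in memory or needs persistence.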

Function Calling (Tool Use)

One of the most powerful features: let the LLM invoke C# functions you define, with UseFunctionInvocation() middleware automatically dispatching calls and re-sending results to the LLM:

using System.ComponentModel; // for [Description]

[Description("Get weather information for a city")]
static string GetWeather(
    [Description("City name")] string city)
{
    // Call an actual weather API here; hardcoded for the demo
    return $"Weather in {city}: 28°C, sunny";
}

IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation() // Auto-invoke when LLM requests
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

var response = await client.GetResponseAsync(
    "What's the weather like in Hanoi today?", options);
// LLM auto-calls GetWeather("Hanoi"), receives result,
// then composes a natural language answer

Comparison with Other Approaches

| Criteria | Direct SDK | Microsoft.Extensions.AI | Semantic Kernel |
| --- | --- | --- | --- |
| Provider lock-in | High — code tightly coupled to one SDK | None — swap via DI | None — uses IChatClient underneath |
| Middleware pipeline | DIY | Built-in (logging, cache, telemetry, function calling) | Built-in, plus filters and plugins |
| Dependency Injection | Manual wiring | First-class support | First-class support |
| Orchestration (planning, agents) | No | No — primitives only | Yes — plugins, planner, agent framework |
| Learning curve | Low but fragmented | Low, familiar to .NET devs | Medium — many concepts |
| Package size | Small | Small and lightweight | Larger — full orchestration |
| Best for | Quick prototypes | Production apps needing straightforward AI | Complex AI-first applications |

Real-world ASP.NET Core Integration

Here's a complete pattern for an API endpoint using IChatClient via DI:

// Program.cs
var builder = WebApplication.CreateBuilder(args);

// Register cache backend
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration
        .GetConnectionString("Redis");
});

// Register IChatClient with middleware pipeline
builder.Services.AddChatClient(services =>
{
    var config = builder.Configuration;
    return new AzureOpenAIClient(
            new Uri(config["AI:Endpoint"]!),
            new Azure.AzureKeyCredential(config["AI:ApiKey"]!))
        .GetChatClient(config["AI:Model"]!)
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseDistributedCache()
        .UseOpenTelemetry(
            configure: otel => otel.EnableSensitiveData = false)
        .UseFunctionInvocation()
        .Build(services);
});

var app = builder.Build();

app.MapPost("/api/chat", async (
    IChatClient chatClient,
    ChatRequest request) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a technical assistant specializing in .NET."),
        new(ChatRole.User, request.Message)
    };

    var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

// Request DTO bound from the JSON body of the endpoint above
public record ChatRequest(string Message);
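For chat UIs you usually want tokens forwarded as they arrive rather than one blocking response. One way to do this in a minimal API is to return an IAsyncEnumerable, which ASP.NET Core streams to the client incrementally. A sketch (the route is illustrative; it assumes the same ChatRequest DTO and requires using System.Runtime.CompilerServices):

```csharp
app.MapPost("/api/chat/stream", (
    IChatClient chatClient,
    ChatRequest request) =>
{
    // Local async iterator: each yielded chunk is flushed to the client
    async IAsyncEnumerable<string> Stream(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var update in chatClient.GetStreamingResponseAsync(
            request.Message, cancellationToken: ct))
        {
            if (!string.IsNullOrEmpty(update.Text))
                yield return update.Text;
        }
    }

    return Stream();
});
```

The cancellation token is wired to the HTTP request, so generation stops when the client disconnects instead of burning tokens on an answer nobody will read.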

Advanced Pattern: Multi-provider with Named Clients

In production, you often need multiple LLMs for different purposes — e.g., a small model for classification, a large model for generation:

// Register 2 clients with different names
builder.Services.AddKeyedSingleton<IChatClient>("fast", (sp, _) =>
    new OllamaChatClient(
        new Uri("http://localhost:11434"), "phi-4-mini"));

builder.Services.AddKeyedSingleton<IChatClient>("smart", (sp, _) =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(sp));

// Inject with [FromKeyedServices]
app.MapPost("/api/classify", async (
    [FromKeyedServices("fast")] IChatClient fast,
    string text) =>
{
    var result = await fast.GetResponseAsync(
        $"Classify intent: {text}. Return: support|sales|billing");
    return Results.Ok(new { intent = result.Text?.Trim() });
});

app.MapPost("/api/generate", async (
    [FromKeyedServices("smart")] IChatClient smart,
    string prompt) =>
{
    var result = await smart.GetResponseAsync(prompt);
    return Results.Ok(new { content = result.Text });
});

Writing Custom Middleware

You can write your own middleware by subclassing DelegatingChatClient. Example: a rate-limiting middleware built on System.Threading.RateLimiting:

public class RateLimitingChatClient : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingChatClient(
        IChatClient inner, RateLimiter limiter)
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter
            .AcquireAsync(1, cancellationToken);

        if (!lease.IsAcquired)
            throw new RateLimitExceededException( // custom app-defined exception
                "AI rate limit hit");

        return await base.GetResponseAsync(
            messages, options, cancellationToken);
    }
}

// Register
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .Use(inner => new RateLimitingChatClient(
            inner,
            new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 50
            })))
        .UseLogging()
        .Build(services));
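One caveat with the class above: DelegatingChatClient forwards streaming calls separately, so as written it rate-limits only GetResponseAsync. A matching override covers the streaming path too (requires using System.Runtime.CompilerServices for the attribute):

```csharp
public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
    IEnumerable<ChatMessage> messages,
    ChatOptions? options = null,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Same gate as the non-streaming path
    using var lease = await _limiter.AcquireAsync(1, cancellationToken);
    if (!lease.IsAcquired)
        throw new RateLimitExceededException("AI rate limit hit");

    // Forward every update from the inner client
    await foreach (var update in base.GetStreamingResponseAsync(
        messages, options, cancellationToken))
    {
        yield return update;
    }
}
```

This pattern applies to any DelegatingChatClient subclass: cross-cutting concerns usually need to be implemented on both methods, or streaming requests will silently bypass them.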

IImageGenerator — Text-to-Image Generation (Experimental)

The newest interface (marked experimental) enables text-to-image generation integration with the same familiar DI pattern:

IImageGenerator imageGen = /* provider implementation */;

var result = await imageGen.GenerateAsync(
    new ImageGenerationRequest("A futuristic city with neon lights"),
    new ImageGenerationOptions
    {
        Width = 1024,
        Height = 1024
    });

// result contains URL or byte[] of the generated image

Decision Tree: When to Use What?

graph TD
    Q["Need AI in your .NET app?"] --> A{"Need complex
orchestration?"}
    A -- Yes --> B{"Need agents,
planning, plugins?"}
    B -- Yes --> C["Semantic Kernel
+ Agent Framework"]
    B -- No --> D["Semantic Kernel
simple"]
    A -- No --> E{"Need middleware
(cache, logging)?"}
    E -- Yes --> F["Microsoft.Extensions.AI
+ Middleware Pipeline"]
    E -- No --> G{"Quick
prototype?"}
    G -- Yes --> H["Direct SDK
(OpenAI, Ollama...)"]
    G -- No --> F
    style Q fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Decision tree: Choose the right AI approach for your .NET app

Development Timeline

October 2024
Microsoft announces Microsoft.Extensions.AI Preview — introducing IChatClient and IEmbeddingGenerator for the first time.
November 2025
.NET 10 GA — Microsoft.Extensions.AI officially stable, version 10.0.0 ships with .NET 10.
January 2026
Version 10.3.0 — improved middleware pipeline, deeper OpenTelemetry integration.
April 2026
Version 10.5.0 (current) — adds IImageGenerator (experimental), improved function calling, expanded multi-modal content support.

Production Best Practices

1. Always Use DI, Never `new` Directly

Register IChatClient via AddChatClient() so the middleware pipeline is applied consistently and the client stays testable via mocks and stubs.

2. Cache Aggressively for Deterministic Prompts

For prompts that repeat (fixed system prompts, classification tasks), enable UseDistributedCache() to significantly reduce API call costs.

3. Use OpenTelemetry to Monitor Costs

Every LLM call consumes tokens, and tokens cost money. UseOpenTelemetry() tracks request count, latency, and token usage via metrics — connect it to Prometheus/Grafana to alert when you exceed budget.

4. Be Careful with EnableSensitiveData

With EnableSensitiveData = true set in the OpenTelemetry middleware, all prompts and responses are recorded — dangerous if they contain PII. In production, keep it false.

Conclusion

Microsoft.Extensions.AI is a significant step forward in standardizing how .NET developers integrate AI. Instead of each provider having its own way, you now have IChatClient — equivalent to ILogger for logging, IDistributedCache for caching — a single interface, flexible middleware pipeline, and zero-code-change provider swapping. With .NET 10, the library is stable and production-ready. If you're building any .NET application with AI integration, this is the foundation to start with.
