Microsoft.Extensions.AI — The Unified AI Abstraction Layer for .NET 10

Posted on: 4/25/2026 9:12:55 PM

When building AI-powered applications on .NET, developers face a familiar challenge: each provider (OpenAI, Azure OpenAI, Ollama, Anthropic…) comes with its own SDK, its own API surface, and its own way of handling streaming. Switching providers means rewriting code. Microsoft.Extensions.AI was created to solve this problem once and for all — providing a unified abstraction layer that integrates deeply into .NET's familiar Dependency Injection ecosystem, allowing you to swap providers without changing any business logic.

- 10.5.0: Latest stable version on NuGet
- 3: Core interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator)
- 4+: Built-in middleware (Logging, Cache, Telemetry, Function Calling)
- 0: Lines of code to change when swapping providers

Why Microsoft.Extensions.AI?

Before this library, integrating AI into a .NET application meant choosing between two paths: using a provider's SDK directly (vendor lock-in) or building your own abstraction layer (time-consuming, hard to maintain). Microsoft.Extensions.AI offers a third way — a standard co-developed by Microsoft and the .NET community.

Key Distinction

Microsoft.Extensions.AI does not replace Semantic Kernel. It sits at a lower level — providing primitive types (IChatClient, IEmbeddingGenerator) that Semantic Kernel uses internally. If you only need chat completion or embeddings, Extensions.AI alone is sufficient. When you need complex orchestration (plugins, planning, agents), combine it with Semantic Kernel.

Architecture Overview

The system follows a clean layered architecture where each layer has distinct responsibilities:

graph TD
    A["Application Code
Controller, Service, Worker"] --> B["Microsoft.Extensions.AI
Middleware Pipeline"]
    B --> C["IChatClient / IEmbeddingGenerator
Abstractions"]
    C --> D["Provider Implementations"]
    D --> E["Azure OpenAI"]
    D --> F["OpenAI"]
    D --> G["Ollama
Local Models"]
    D --> H["Anthropic
Claude"]
    D --> I["Custom Provider"]
    B --> J["UseLogging()"]
    B --> K["UseDistributedCache()"]
    B --> L["UseOpenTelemetry()"]
    B --> M["UseFunctionInvocation()"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#16213e,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#0078d4,stroke:#fff,color:#fff
    style F fill:#412991,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#d97706,stroke:#fff,color:#fff
    style I fill:#888,stroke:#fff,color:#fff
    style J fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style K fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style M fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50

Layered architecture of Microsoft.Extensions.AI in .NET 10

Two Core Packages

| Package | Role | When to Use |
| --- | --- | --- |
| Microsoft.Extensions.AI.Abstractions | Defines the interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator) and exchange types (ChatMessage, ChatResponse…) | For library authors — provider adapter implementations only need this package |
| Microsoft.Extensions.AI | Provides the middleware pipeline (logging, caching, telemetry, function invocation) plus the builder pattern for DI | For application developers — reference this package for full functionality |

IChatClient — The Unified Chat Completion Interface

This is the central interface of the entire library. With just two core methods, it covers everything from simple prompts to multi-turn conversations with tool calling:

public interface IChatClient : IDisposable
{
    // Non-streaming chat completion — returns the full response
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Streaming — returns tokens via IAsyncEnumerable
    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    ChatClientMetadata Metadata { get; }
}

Basic Example: Chat with Azure OpenAI

using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;

// Create IChatClient from Azure OpenAI SDK
IChatClient client = new AzureOpenAIClient(
        new Uri("https://my-resource.openai.azure.com/"),
        new Azure.AzureKeyCredential("api-key"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Call chat — as simple as calling any service
ChatResponse response = await client.GetResponseAsync(
    "Explain Dependency Injection in 3 sentences");
Console.WriteLine(response.Text);
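Because GetResponseAsync takes the whole message list, multi-turn conversation is simply a matter of appending to that list and calling again. A minimal sketch (the prompts are illustrative):

```csharp
// Maintain the conversation as a growing list of messages
List<ChatMessage> history =
[
    new(ChatRole.System, "You are a concise .NET assistant."),
    new(ChatRole.User, "What is Dependency Injection?")
];

ChatResponse first = await client.GetResponseAsync(history);
Console.WriteLine(first.Text);

// Append the assistant's reply, then ask a follow-up in context
history.AddRange(first.Messages);
history.Add(new(ChatRole.User, "Show a one-line example in C#."));

ChatResponse second = await client.GetResponseAsync(history);
Console.WriteLine(second.Text);
```

The client itself is stateless; the conversation state lives entirely in the message list you pass in.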

Streaming Response

await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quick sort function in C#"))
{
    Console.Write(update.Text);
}
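Streaming combines naturally with CancellationToken, which lets you stop generation when a client disconnects or a deadline passes. A sketch that also accumulates the full text (the timeout value is arbitrary):

```csharp
// Cancel generation if it runs longer than 30 seconds
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var fullText = new System.Text.StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quick sort function in C#",
    cancellationToken: cts.Token))
{
    Console.Write(update.Text);   // render token-by-token
    fullText.Append(update.Text); // keep the complete text for later use
}
```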

Switch to Ollama (Local Model) — Change Only 1 Line

// Before: Azure OpenAI
IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o").AsIChatClient();

// After: Ollama local — business code UNCHANGED
IChatClient client = new OllamaChatClient(
    new Uri("http://localhost:11434"), "llama3.2");

Provider Agnosticism Is the Key

All downstream code — chat logic, streaming, function calling — stays exactly the same. Only the IChatClient initialization line changes. This is exactly the pattern .NET developers are familiar with from ILogger and IDistributedCache — now applied to AI.
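In practice, this means your services depend only on the abstraction. A sketch of what that looks like (the SummaryService class is illustrative, not part of the library):

```csharp
// Business logic knows nothing about the concrete provider
public class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummarizeAsync(
        string text, CancellationToken ct = default)
    {
        var response = await chatClient.GetResponseAsync(
            $"Summarize in two sentences:\n{text}",
            cancellationToken: ct);
        return response.Text;
    }
}

// Swapping Azure OpenAI for Ollama touches registration only,
// never this class — and tests can inject a stub IChatClient.
```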

Middleware Pipeline — The Real Power

If it stopped at abstraction, Microsoft.Extensions.AI would be unremarkable. What makes it a game-changer is the middleware pipeline — allowing you to stack cross-cutting concerns just like ASP.NET Core middleware:

sequenceDiagram
    participant App as Application
    participant Log as UseLogging()
    participant Cache as UseDistributedCache()
    participant OTel as UseOpenTelemetry()
    participant Func as UseFunctionInvocation()
    participant LLM as LLM Provider

    App->>Log: GetResponseAsync()
    Log->>Log: Log request
    Log->>Cache: Forward
    Cache->>Cache: Check cache
    alt Cache hit
        Cache-->>Log: Return cached response
        Log-->>App: Response
    else Cache miss
        Cache->>OTel: Forward
        OTel->>OTel: Start trace span
        OTel->>Func: Forward
        Func->>LLM: Call provider
        LLM-->>Func: Response (may include tool calls)
        Func->>Func: Execute tools, re-call if needed
        Func-->>OTel: Final response
        OTel->>OTel: End span
        OTel-->>Cache: Response
        Cache->>Cache: Store in cache
        Cache-->>Log: Response
        Log->>Log: Log response
        Log-->>App: Response
    end

Request flow through the middleware pipeline

Register Middleware via DI

var builder = WebApplication.CreateBuilder(args);

// Register IChatClient with full middleware pipeline
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new Azure.AzureKeyCredential(
                builder.Configuration["AzureOpenAI:ApiKey"]!))
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()                    // Log all requests/responses
        .UseDistributedCache()           // Cache responses for identical prompts
        .UseOpenTelemetry()              // Distributed tracing
        .UseFunctionInvocation()         // Auto-invoke tools when LLM requests
        .Build(services));

// Backend cache
builder.Services.AddDistributedMemoryCache(); // Or Redis cache

Middleware Order Matters

Middleware is applied outermost-first. UseLogging() at the outermost position logs everything including cache hits. Place UseDistributedCache() before UseOpenTelemetry() to avoid creating trace spans for cached responses — reducing noise in monitoring.
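The nesting described above can be read directly off the builder chain; a sketch annotating which call wraps which (assumes an innerClient obtained from any provider):

```csharp
// Registration order = nesting order:
// the first Use* call becomes the outermost layer
innerClient
    .AsBuilder()
    .UseLogging()            // outermost — observes every call, even cache hits
    .UseDistributedCache()   // short-circuits here on a cache hit...
    .UseOpenTelemetry()      // ...so no trace span is created for cached responses
    .UseFunctionInvocation() // innermost — closest to the actual provider
    .Build(services);
```

Reordering the chain changes behavior: putting UseOpenTelemetry() before UseDistributedCache() would record a span for every request, cached or not.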

IEmbeddingGenerator — Vector Embeddings for RAG

The second interface serves RAG (Retrieval-Augmented Generation) systems, semantic search, and memory stores:

public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}

IEmbeddingGenerator<string, Embedding<float>> generator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Generate embeddings for a batch of documents
var embeddings = await generator.GenerateAsync(new[]
{
    "Microservices architecture patterns",
    "Event-driven design with message queues",
    "CQRS and Event Sourcing in .NET 10"
});

// Each embedding is a float[] vector — store in a vector database
foreach (var emb in embeddings)
{
    float[] vector = emb.Vector.ToArray();
    // Store in Qdrant, Milvus, pgvector...
}
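Once vectors exist, semantic search reduces to comparing them. A minimal cosine-similarity sketch over the embeddings above, with no vector database involved (the query string is illustrative):

```csharp
// Cosine similarity: dot product normalized by vector magnitudes
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

// Embed the query, then rank documents by similarity to it
var query = await generator.GenerateAsync(["message queue patterns"]);
float[] queryVector = query[0].Vector.ToArray();

var ranked = embeddings
    .Select((e, i) => (Index: i,
        Score: CosineSimilarity(queryVector, e.Vector.Span)))
    .OrderByDescending(x => x.Score);
```

For a handful of documents this in-memory approach is enough; a vector database earns its keep once the corpus no longer fits in memory or needs persistence.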

Function Calling (Tool Use)

One of the most powerful features: let the LLM invoke C# functions you define, with UseFunctionInvocation() middleware automatically dispatching calls and re-sending results to the LLM:

using System.ComponentModel; // for [Description]

[Description("Get weather information for a city")]
static string GetWeather(
    [Description("City name")] string city)
{
    // Call an actual weather API here; hardcoded for the demo
    return $"Weather in {city}: 28°C, sunny";
}

IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation() // Auto-invoke when LLM requests
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

var response = await client.GetResponseAsync(
    "What's the weather like in Hanoi today?", options);
// LLM auto-calls GetWeather("Hanoi"), receives result,
// then composes a natural language answer

Comparison with Other Approaches

| Criteria | Direct SDK | Microsoft.Extensions.AI | Semantic Kernel |
| --- | --- | --- | --- |
| Provider lock-in | High — code tightly coupled to one SDK | None — swap via DI | None — uses IChatClient underneath |
| Middleware pipeline | DIY | Built-in (logging, cache, telemetry, function calling) | Built-in, plus filters and plugins |
| Dependency Injection | Manual wiring | First-class support | First-class support |
| Orchestration (planning, agents) | No | No — primitives only | Yes — plugins, planner, agent framework |
| Learning curve | Low but fragmented | Low, familiar to .NET devs | Medium — many concepts |
| Package size | Small | Small and lightweight | Larger — full orchestration |
| Best for | Quick prototypes | Production apps needing straightforward AI | Complex AI-first applications |

Real-world ASP.NET Core Integration

Here's a complete pattern for an API endpoint using IChatClient via DI:

// Program.cs
var builder = WebApplication.CreateBuilder(args);

// Register cache backend
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration
        .GetConnectionString("Redis");
});

// Register IChatClient with middleware pipeline
builder.Services.AddChatClient(services =>
{
    var config = builder.Configuration;
    return new AzureOpenAIClient(
            new Uri(config["AI:Endpoint"]!),
            new Azure.AzureKeyCredential(config["AI:ApiKey"]!))
        .GetChatClient(config["AI:Model"]!)
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseDistributedCache()
        .UseOpenTelemetry(
            configure: otel => otel.EnableSensitiveData = false)
        .UseFunctionInvocation()
        .Build(services);
});

var app = builder.Build();

app.MapPost("/api/chat", async (
    IChatClient chatClient,
    ChatRequest request) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a technical assistant specializing in .NET."),
        new(ChatRole.User, request.Message)
    };

    var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

// Request DTO bound from the JSON body of the endpoint above
public record ChatRequest(string Message);
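For chat UIs you usually want tokens forwarded as they arrive rather than one blocking response. One way to do this in a minimal API is to return an IAsyncEnumerable, which ASP.NET Core streams to the client incrementally. A sketch (the route is illustrative; it assumes the same ChatRequest DTO and requires using System.Runtime.CompilerServices):

```csharp
app.MapPost("/api/chat/stream", (
    IChatClient chatClient,
    ChatRequest request) =>
{
    // Local async iterator: each yielded chunk is flushed to the client
    async IAsyncEnumerable<string> Stream(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var update in chatClient.GetStreamingResponseAsync(
            request.Message, cancellationToken: ct))
        {
            if (!string.IsNullOrEmpty(update.Text))
                yield return update.Text;
        }
    }

    return Stream();
});
```

The cancellation token is wired to the HTTP request, so generation stops when the client disconnects instead of burning tokens on an answer nobody will read.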

Advanced Pattern: Multi-provider with Named Clients

In production, you often need multiple LLMs for different purposes — e.g., a small model for classification, a large model for generation:

// Register 2 clients with different names
builder.Services.AddKeyedSingleton<IChatClient>("fast", (sp, _) =>
    new OllamaChatClient(
        new Uri("http://localhost:11434"), "phi-4-mini"));

builder.Services.AddKeyedSingleton<IChatClient>("smart", (sp, _) =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(sp));

// Inject with [FromKeyedServices]
app.MapPost("/api/classify", async (
    [FromKeyedServices("fast")] IChatClient fast,
    string text) =>
{
    var result = await fast.GetResponseAsync(
        $"Classify intent: {text}. Return: support|sales|billing");
    return Results.Ok(new { intent = result.Text?.Trim() });
});

app.MapPost("/api/generate", async (
    [FromKeyedServices("smart")] IChatClient smart,
    string prompt) =>
{
    var result = await smart.GetResponseAsync(prompt);
    return Results.Ok(new { content = result.Text });
});

Writing Custom Middleware

You can write your own middleware by subclassing DelegatingChatClient. Example: a rate-limiting middleware built on System.Threading.RateLimiting:

public class RateLimitingChatClient : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingChatClient(
        IChatClient inner, RateLimiter limiter)
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter
            .AcquireAsync(1, cancellationToken);

        if (!lease.IsAcquired)
            throw new RateLimitExceededException( // custom app-defined exception
                "AI rate limit hit");

        return await base.GetResponseAsync(
            messages, options, cancellationToken);
    }
}

// Register
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .Use(inner => new RateLimitingChatClient(
            inner,
            new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 50
            })))
        .UseLogging()
        .Build(services));
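One caveat with the class above: DelegatingChatClient forwards streaming calls separately, so as written it rate-limits only GetResponseAsync. A matching override covers the streaming path too (requires using System.Runtime.CompilerServices for the attribute):

```csharp
public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
    IEnumerable<ChatMessage> messages,
    ChatOptions? options = null,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Same gate as the non-streaming path
    using var lease = await _limiter.AcquireAsync(1, cancellationToken);
    if (!lease.IsAcquired)
        throw new RateLimitExceededException("AI rate limit hit");

    // Forward every update from the inner client
    await foreach (var update in base.GetStreamingResponseAsync(
        messages, options, cancellationToken))
    {
        yield return update;
    }
}
```

This pattern applies to any DelegatingChatClient subclass: cross-cutting concerns usually need to be implemented on both methods, or streaming requests will silently bypass them.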

IImageGenerator — Text-to-Image Generation (Experimental)

The newest interface (marked experimental) enables text-to-image generation integration with the same familiar DI pattern:

IImageGenerator imageGen = /* provider implementation */;

var result = await imageGen.GenerateAsync(
    new ImageGenerationRequest("A futuristic city with neon lights"),
    new ImageGenerationOptions
    {
        Width = 1024,
        Height = 1024
    });

// result contains URL or byte[] of the generated image

Decision Tree: When to Use What?

graph TD
    Q["Need AI in your .NET app?"] --> A{"Need complex
orchestration?"}
    A -- Yes --> B{"Need agents,
planning, plugins?"}
    B -- Yes --> C["Semantic Kernel
+ Agent Framework"]
    B -- No --> D["Semantic Kernel
simple"]
    A -- No --> E{"Need middleware
(cache, logging)?"}
    E -- Yes --> F["Microsoft.Extensions.AI
+ Middleware Pipeline"]
    E -- No --> G{"Quick
prototype?"}
    G -- Yes --> H["Direct SDK
(OpenAI, Ollama...)"]
    G -- No --> F
    style Q fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Decision tree: Choose the right AI approach for your .NET app

Development Timeline

October 2024
Microsoft announces Microsoft.Extensions.AI Preview — introducing IChatClient and IEmbeddingGenerator for the first time.
November 2025
.NET 10 GA — Microsoft.Extensions.AI officially stable, version 10.0.0 ships with .NET 10.
January 2026
Version 10.3.0 — improved middleware pipeline, deeper OpenTelemetry integration.
April 2026
Version 10.5.0 (current) — adds IImageGenerator (experimental), improved function calling, expanded multi-modal content support.

Production Best Practices

1. Always Use DI, Never `new` Directly

Register IChatClient via AddChatClient() so the middleware pipeline is applied consistently and the client stays testable via mocks and stubs.

2. Cache Aggressively for Deterministic Prompts

For prompts that repeat (fixed system prompts, classification tasks), enable UseDistributedCache() to significantly reduce API call costs.

3. Use OpenTelemetry to Monitor Costs

Every LLM call consumes tokens, and tokens cost money. UseOpenTelemetry() tracks request count, latency, and token usage via metrics — connect it to Prometheus/Grafana to alert when you exceed budget.

4. Be Careful with EnableSensitiveData

With EnableSensitiveData = true set in the OpenTelemetry middleware, all prompts and responses are recorded — dangerous if they contain PII. In production, keep it false.

Conclusion

Microsoft.Extensions.AI is a significant step forward in standardizing how .NET developers integrate AI. Instead of each provider having its own way, you now have IChatClient — equivalent to ILogger for logging, IDistributedCache for caching — a single interface, flexible middleware pipeline, and zero-code-change provider swapping. With .NET 10, the library is stable and production-ready. If you're building any .NET application with AI integration, this is the foundation to start with.
