# Microsoft.Extensions.AI: A Unified AI Abstraction Layer for .NET 10

Posted on: 4/25/2026 9:12:55 PM

Table of contents

- Why Microsoft.Extensions.AI?
  - Key takeaway
- Architecture overview
- The two core packages
- IChatClient: a unified interface for chat completion
- The middleware pipeline: where the real power lies
  - Registering middleware through DI
    - Middleware order matters
- IEmbeddingGenerator: vector embeddings for RAG
  - Example: generating embeddings for semantic search
- Function calling (tool use)
- Comparison with other approaches
- Integrating into a real ASP.NET Core application
- Advanced pattern: multiple providers with named clients
- Writing custom middleware
- IImageGenerator: text-to-image generation (experimental)
- When to use what?
- Roadmap
- Best practices for production
- Conclusion

When building AI applications on .NET, developers keep running into the same problem. Every provider (OpenAI, Azure OpenAI, Ollama, Anthropic…) ships its own SDK, its own API surface and its own way of handling streaming, so switching providers means rewriting code. **Microsoft.Extensions.AI** was created to solve this problem at the root. It provides a unified abstraction layer that is deeply integrated with the familiar .NET Dependency Injection ecosystem, which lets you swap providers without touching business logic.

- **10.5.0**: latest stable version on NuGet at the time of writing
- **3** core interfaces: IChatClient, IEmbeddingGenerator, IImageGenerator
- **4+** built-in middleware components (logging, caching, telemetry, function calling)
- **0** lines of business logic to change when swapping providers

## Why Microsoft.Extensions.AI?

Before this library existed, integrating AI into a .NET application meant choosing between two paths. You could use each provider's SDK directly and accept vendor lock-in, or you could build your own abstraction layer, which is time-consuming and hard to maintain. Microsoft.Extensions.AI offers a third path: a shared standard co-developed by Microsoft and the .NET community.

#### Key takeaway

Microsoft.Extensions.AI **does not replace** Semantic Kernel. It sits one layer lower and provides the primitive types (IChatClient, IEmbeddingGenerator) that Semantic Kernel itself uses internally. If you only need chat completion or embeddings, using Extensions.AI directly is enough. When you need more complex orchestration (plugins, planning, agents), combine it with Semantic Kernel.

## Architecture overview

The system follows a clear layered design in which each layer has a distinct responsibility:

```mermaid
graph TD
    A["Application Code<br/>Controller, Service, Worker"] --> B["Microsoft.Extensions.AI<br/>Middleware Pipeline"]
    B --> C["IChatClient / IEmbeddingGenerator<br/>Abstractions"]
    C --> D["Provider Implementations"]
    D --> E["Azure OpenAI"]
    D --> F["OpenAI"]
    D --> G["Ollama<br/>Local Models"]
    D --> H["Anthropic<br/>Claude"]
    D --> I["Custom Provider"]

    B --> J["UseLogging()"]
    B --> K["UseDistributedCache()"]
    B --> L["UseOpenTelemetry()"]
    B --> M["UseFunctionInvocation()"]

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#16213e,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#0078d4,stroke:#fff,color:#fff
    style F fill:#412991,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#d97706,stroke:#fff,color:#fff
    style I fill:#888,stroke:#fff,color:#fff
    style J fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style K fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style M fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
```

Layered architecture of Microsoft.Extensions.AI in .NET 10

## The two core packages

| Package | Role | When to use it |
| --- | --- | --- |
| **Microsoft.Extensions.AI.Abstractions** | Defines the interfaces (IChatClient, IEmbeddingGenerator, IImageGenerator) and the exchange types (ChatMessage, ChatResponse…) | For library authors: anyone writing a provider adapter only needs to reference this package |
| **Microsoft.Extensions.AI** | Provides the middleware pipeline (logging, caching, telemetry, function invocation) and the builder pattern for DI | For application developers: reference this package to get the full functionality |

## IChatClient: a unified interface for chat completion

This is the central interface of the whole library. With just two core methods it covers every use case, from a simple prompt to a multi-turn conversation with tool calling:

```csharp
public interface IChatClient : IDisposable
{
    // Non-streaming: returns the complete response once it is ready
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Streaming: yields incremental updates via IAsyncEnumerable
    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    // Service lookup (e.g. ChatClientMetadata or the underlying SDK client);
    // this replaced the Metadata property found in early previews
    object? GetService(Type serviceType, object? serviceKey = null);
}
```

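In current releases, metadata such as `ChatClientMetadata` is retrieved through `GetService(Type, object?)` rather than a dedicated property. A delegating client answers for itself when it can and otherwise forwards the query to the client it wraps. The sketch below models that lookup with hypothetical delegates instead of the real types. A plain string stands in for the metadata object. The sketch only shows how a query walks inward from the outermost layer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a decorator chain (not the library's types): each layer
// answers the service types it owns and forwards every other query inward.
Func<Type, object?> MakeLayer(Dictionary<Type, object> owned, Func<Type, object?>? inner) =>
    serviceType => owned.TryGetValue(serviceType, out var service)
        ? service                      // answered by this layer
        : inner?.Invoke(serviceType);  // otherwise ask the layer it wraps

// Innermost "provider" layer owns the metadata; a string stands in for ChatClientMetadata.
var provider = MakeLayer(
    new Dictionary<Type, object> { [typeof(string)] = "provider=ollama; model=llama3.2" },
    inner: null);

// Outer layers (think logging and caching) own nothing relevant to this query.
var caching = MakeLayer(new Dictionary<Type, object>(), provider);
var logging = MakeLayer(new Dictionary<Type, object>(), caching);

// Queries start at the outermost layer and walk inward until a layer answers.
object? metadata = logging(typeof(string));
object? missing = logging(typeof(Uri));
Console.WriteLine(metadata);        // provider=ollama; model=llama3.2
Console.WriteLine(missing is null); // True
```

With the real library the equivalent call is a single `GetService` query on the outermost client. You never need to know how many middleware layers sit in between.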
### Basic example: chatting with Azure OpenAI

```csharp
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;

// Create an IChatClient from the Azure OpenAI SDK
// (AsIChatClient() comes from the Microsoft.Extensions.AI.OpenAI package)
IChatClient client = new AzureOpenAIClient(
        new Uri("https://my-resource.openai.azure.com/"),
        new Azure.AzureKeyCredential("api-key"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Call the model as you would call any other service
ChatResponse response = await client.GetResponseAsync(
    "Explain Dependency Injection in 3 sentences");
Console.WriteLine(response.Text);
```

### Streaming responses

```csharp
// Print each update as soon as it arrives
await foreach (var update in client.GetStreamingResponseAsync(
    "Write a quicksort function in C#"))
{
    Console.Write(update.Text);
}
```

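`GetStreamingResponseAsync` returns an `IAsyncEnumerable`, so you can print updates as they arrive and also accumulate them into the full reply. The sketch below uses a hypothetical `FakeStream` iterator in place of a real provider so that it runs with the base library only. With Microsoft.Extensions.AI, the loop body would read `update.Text` instead of a plain string.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a provider stream: yields text fragments with a small delay.
async IAsyncEnumerable<string> FakeStream(
    string[] chunks,
    [EnumeratorCancellation] CancellationToken ct = default)
{
    foreach (var chunk in chunks)
    {
        await Task.Delay(10, ct); // simulate latency between updates
        yield return chunk;
    }
}

var buffer = new StringBuilder();
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)); // overall timeout

// Show each fragment immediately and keep the whole reply for later use.
await foreach (var text in FakeStream(new[] { "static ", "void ", "QuickSort", "()" }, cts.Token))
{
    Console.Write(text);
    buffer.Append(text);
}
Console.WriteLine();

string fullReply = buffer.ToString();
```

The cancellation token matters in web apps. Passing the request's token stops the stream when the client disconnects.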
### Switching to Ollama (local model): change one line

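```csharp
// Before: Azure OpenAI
IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o").AsIChatClient();

// After: local Ollama (in practice this replaces the line above); business code does NOT change.
// OllamaApiClient comes from the OllamaSharp package and implements IChatClient.
client = new OllamaApiClient(
    new Uri("http://localhost:11434"), "llama3.2");
```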

#### Provider-agnostic is the key

Everything downstream (chat logic, streaming, function calling) stays the same. Only the line that creates the IChatClient changes. This is exactly the pattern .NET developers already know from ILogger and IDistributedCache, now applied to AI.

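In a real application the "one line" usually lives in the composition root, and configuration drives it (for example a hypothetical `AI:Provider` key). The sketch below is minimal and self-contained. Plain delegates stand in for real clients, and the factory names are illustrative. Business code only ever sees the `Func<string, Task<string>>` abstraction.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical provider factories; in a real app each would build an IChatClient.
var factories = new Dictionary<string, Func<Func<string, Task<string>>>>(StringComparer.OrdinalIgnoreCase)
{
    ["azure"]  = () => prompt => Task.FromResult($"[azure] {prompt}"),
    ["ollama"] = () => prompt => Task.FromResult($"[ollama] {prompt}"),
};

// The only code that knows which provider is in use (the name would come from configuration).
Func<string, Task<string>> CreateClient(string providerName) =>
    factories.TryGetValue(providerName, out var factory)
        ? factory()
        : throw new InvalidOperationException($"Unknown AI provider '{providerName}'.");

// Business logic depends only on the abstraction, never on a concrete provider.
async Task<string> ExplainDi(Func<string, Task<string>> chat) =>
    await chat("Explain Dependency Injection in 3 sentences");

string fromAzure = await ExplainDi(CreateClient("azure"));
string fromOllama = await ExplainDi(CreateClient("Ollama"));
Console.WriteLine(fromAzure);
Console.WriteLine(fromOllama);
```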
## The middleware pipeline: where the real power lies

If it stopped at an abstraction layer, Microsoft.Extensions.AI would be nothing special. What makes it stand out is the **middleware pipeline**, which lets you stack cross-cutting concerns the same way as middleware in ASP.NET Core:

```mermaid
sequenceDiagram
    participant App as Application
    participant Log as UseLogging()
    participant Cache as UseDistributedCache()
    participant OTel as UseOpenTelemetry()
    participant Func as UseFunctionInvocation()
    participant LLM as LLM Provider

    App->>Log: GetResponseAsync()
    Log->>Log: Log request
    Log->>Cache: Forward
    Cache->>Cache: Check cache
    alt Cache hit
        Cache-->>Log: Return cached response
        Log-->>App: Response
    else Cache miss
        Cache->>OTel: Forward
        OTel->>OTel: Start trace span
        OTel->>Func: Forward
        Func->>LLM: Call provider
        LLM-->>Func: Response (may include tool calls)
        Func->>Func: Execute tools, re-call if needed
        Func-->>OTel: Final response
        OTel->>OTel: End span
        OTel-->>Cache: Response
        Cache->>Cache: Store in cache
        Cache-->>Log: Response
        Log->>Log: Log response
        Log-->>App: Response
    end
```

Request flow through the middleware pipeline

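You can reproduce the cache-hit / cache-miss branch of the diagram without the library. This sketch uses hypothetical names throughout, and an in-memory `Dictionary` stands in for `IDistributedCache`. The cache key is a SHA-256 hash of the serialized messages plus options, so only identical requests share an entry. The real `DistributedCachingChatClient` computes its key internally in its own way.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;
using System.Threading.Tasks;

int providerCalls = 0;

// Hypothetical inner "provider": counts how often it is actually invoked.
Func<string[], double, Task<string>> provider = (msgs, temp) =>
{
    providerCalls++;
    return Task.FromResult($"answer #{providerCalls} to '{msgs[^1]}'");
};

var cache = new Dictionary<string, string>(); // stand-in for IDistributedCache

// Key = hash of everything that influences the answer (messages + options).
string CacheKey(string[] messages, double temperature)
{
    byte[] json = JsonSerializer.SerializeToUtf8Bytes(new { messages, temperature });
    return Convert.ToHexString(SHA256.HashData(json));
}

// Caching decorator: return the stored response on a hit, otherwise call inward and store it.
async Task<string> CachedChat(string[] messages, double temperature)
{
    string key = CacheKey(messages, temperature);
    if (cache.TryGetValue(key, out var hit))
        return hit;                                // hit: provider is not called
    string response = await provider(messages, temperature);
    cache[key] = response;                         // miss: store for next time
    return response;
}

string first  = await CachedChat(new[] { "Classify: refund request" }, 0.0);
string second = await CachedChat(new[] { "Classify: refund request" }, 0.0);
string third  = await CachedChat(new[] { "Classify: refund request" }, 0.7); // different options
Console.WriteLine($"{first} | {second} | {third} | provider calls: {providerCalls}");
```

The key includes the options, so the same prompt with a different temperature is a separate entry. This is also why caching pays off most for fixed, deterministic prompts.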
### Registering middleware through DI

```csharp
var builder = WebApplication.CreateBuilder(args);

// Register IChatClient with the full middleware pipeline
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new Azure.AzureKeyCredential(
                builder.Configuration["AzureOpenAI:ApiKey"]!))
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()                    // Log every request/response
        .UseDistributedCache()           // Cache responses for identical requests
        .UseOpenTelemetry()              // Distributed tracing and metrics
        .UseFunctionInvocation()         // Invoke tools automatically when the model asks
        .Build(services));

// UseDistributedCache() resolves IDistributedCache from DI, so register a backend
// (in-memory here, Redis in production). Then inject IChatClient as usual.
builder.Services.AddDistributedMemoryCache();
```

#### Middleware order matters

Middleware is applied outermost-first: the first component you add wraps all the others. With `UseLogging()` on the outside, everything is logged, including cache hits. Placing `UseDistributedCache()` before `UseOpenTelemetry()` means cached responses never create a trace span, which reduces noise in monitoring.

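In "outermost-first", the component registered first wraps everything registered after it. The sketch below is a minimal illustration built on hypothetical delegate-based middleware, not the `ChatClientBuilder` API. A builder records factories and applies them in reverse when building, so the first registration ends up outermost. The trace shows the resulting call order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var trace = new List<string>();
var factories = new List<Func<Func<string, Task<string>>, Func<string, Task<string>>>>();

// Hypothetical middleware factory: records entry and exit around the inner call.
Func<Func<string, Task<string>>, Func<string, Task<string>>> Named(string name) =>
    inner => async prompt =>
    {
        trace.Add($"enter {name}");
        string result = await inner(prompt);
        trace.Add($"exit {name}");
        return result;
    };

factories.Add(Named("logging"));   // registered first -> outermost
factories.Add(Named("cache"));
factories.Add(Named("telemetry")); // registered last -> innermost

// The innermost step stands in for the provider call.
Func<string, Task<string>> pipeline = prompt =>
{
    trace.Add("provider");
    return Task.FromResult("ok");
};

// Build: wrap from the last registration to the first.
for (int i = factories.Count - 1; i >= 0; i--)
    pipeline = factories[i](pipeline);

await pipeline("hello");
Console.WriteLine(string.Join(" -> ", trace));
```

Running it prints `enter logging -> enter cache -> enter telemetry -> provider -> exit telemetry -> exit cache -> exit logging`. A cache hit would return before the telemetry layer is ever entered.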
## IEmbeddingGenerator: vector embeddings for RAG

The second interface serves RAG (Retrieval-Augmented Generation) systems, semantic search and memory stores:

```csharp
// Simplified declaration: the shipped interface also exposes GetService(...) for service lookup
public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}
```

### Example: generating embeddings for semantic search

```csharp
// AsIEmbeddingGenerator() comes from the Microsoft.Extensions.AI.OpenAI package
IEmbeddingGenerator<string, Embedding<float>> generator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Generate embeddings for a batch of documents in one call
var embeddings = await generator.GenerateAsync(new[]
{
    "Microservices architecture patterns",
    "Event-driven design with message queues",
    "CQRS and Event Sourcing in .NET 10"
});

// Each embedding exposes its vector as ReadOnlyMemory<float>; store it in a vector database
foreach (var emb in embeddings)
{
    float[] vector = emb.Vector.ToArray();
    // Store in Qdrant, Milvus, pgvector...
}
```

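After the documents are embedded, semantic search typically ranks them by cosine similarity against the query's embedding. The sketch below uses tiny, hand-made 3-dimensional vectors as stand-ins. The values are hypothetical; real `text-embedding-3-small` vectors have 1,536 dimensions. The helper is written out so it runs on the base library alone. In production, `TensorPrimitives.CosineSimilarity` from the System.Numerics.Tensors package does the same job.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; higher means more similar.
static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension.");
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Toy "embeddings" (hypothetical values, for illustration only).
var documents = new (string Text, float[] Vector)[]
{
    ("Microservices architecture patterns",     new[] { 0.9f, 0.1f, 0.0f }),
    ("Event-driven design with message queues", new[] { 0.2f, 0.9f, 0.1f }),
    ("CQRS and Event Sourcing in .NET 10",      new[] { 0.1f, 0.6f, 0.8f }),
};
float[] query = { 0.1f, 0.8f, 0.3f }; // pretend embedding of "how do I use a message broker?"

// Score every document against the query and sort best-first.
var ranked = documents
    .Select(d => (d.Text, Score: Cosine(d.Vector, query)))
    .OrderByDescending(r => r.Score)
    .ToList();

foreach (var (text, score) in ranked)
    Console.WriteLine($"{score:F3}  {text}");
```

A vector database performs the same ranking at scale using approximate nearest-neighbour indexes instead of a linear scan.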
## Function calling (tool use)

One of the most powerful features is tool use. The model can ask to call C# functions that you define, and the `UseFunctionInvocation()` middleware executes them automatically and calls the model again with the results:

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the current weather for a city")]
static string GetWeather(
    [Description("City name")] string city)
{
    // Call a real weather API here; a fixed value keeps the example short
    return $"Weather in {city}: 28°C, sunny";
}

IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation() // Invoke functions automatically when the model asks
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

var response = await client.GetResponseAsync(
    "What's the weather like in Hanoi today?", options);
// The model requests GetWeather("Hanoi"); the middleware runs it, sends the
// result back, and the model composes a natural-language answer
Console.WriteLine(response.Text);
```

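Conceptually, `UseFunctionInvocation()` runs a loop. It calls the model, and if the reply requests a tool, it invokes the matching .NET function and appends the result to the conversation. It then calls the model again, until the model answers in plain text or an iteration limit is reached. The sketch below reproduces that loop with a hypothetical scripted model and plain tuples. It illustrates the idea only and is not the `FunctionInvokingChatClient` implementation.

```csharp
using System;
using System.Collections.Generic;

// Tools the model may call, keyed by name (hypothetical registry).
var tools = new Dictionary<string, Func<string, string>>
{
    ["GetWeather"] = city => $"Weather in {city}: 28°C, sunny",
};

var conversation = new List<(string Role, string Content)>
{
    ("user", "What's the weather in Hanoi today?"),
};
int modelCalls = 0;

// Hypothetical scripted model: first requests a tool, then answers from the tool result.
(string? ToolName, string? ToolArg, string? Text) FakeModel(List<(string Role, string Content)> history)
{
    modelCalls++;
    var last = history[^1];
    if (last.Role == "tool")
        return (null, null, $"Here is the forecast: {last.Content}.");
    return ("GetWeather", "Hanoi", null);
}

const int maxIterations = 5; // guard against a model that keeps requesting tools
string? finalAnswer = null;
for (int i = 0; i < maxIterations && finalAnswer is null; i++)
{
    var reply = FakeModel(conversation);
    if (reply.ToolName is null)
    {
        finalAnswer = reply.Text; // plain answer: done
    }
    else
    {
        conversation.Add(("assistant", $"call {reply.ToolName}({reply.ToolArg})"));
        string result = tools[reply.ToolName](reply.ToolArg!); // dispatch to .NET code
        conversation.Add(("tool", result));                    // feed the result back
    }
}
Console.WriteLine(finalAnswer);
```

One user question therefore costs two model calls here. That is worth remembering when you estimate token usage for tool-heavy prompts.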
## Comparison with other approaches

| Criterion | Using the SDK directly | Microsoft.Extensions.AI | Semantic Kernel |
| --- | --- | --- | --- |
| **Provider lock-in** | High: code is tied to one SDK | None: swap via DI | None: uses IChatClient underneath |
| **Middleware pipeline** | Build it yourself | Built in (logging, cache, telemetry, function calling) | Yes, plus filters and plugins |
| **Dependency Injection** | Manual wiring | First-class support | First-class support |
| **Orchestration (planning, agents)** | No | No: primitives only | Yes: plugins, planner, agent framework |
| **Learning curve** | Low but fragmented | Low, familiar to .NET developers | Medium: many concepts |
| **Package size** | Small | Very small (lightweight) | Larger: full orchestration |
| **Best suited for** | Quick prototypes | Production apps with straightforward AI needs | Complex AI-first apps |

## Integrating into a real ASP.NET Core application

Below is a complete pattern for an API endpoint that uses IChatClient through DI:

```csharp
// Program.cs
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

// Register the cache backend (Microsoft.Extensions.Caching.StackExchangeRedis package)
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration
        .GetConnectionString("Redis");
});

// Register IChatClient with the middleware pipeline
builder.Services.AddChatClient(services =>
{
    var config = builder.Configuration;
    return new AzureOpenAIClient(
            new Uri(config["AI:Endpoint"]!),
            new Azure.AzureKeyCredential(config["AI:ApiKey"]!))
        .GetChatClient(config["AI:Model"]!)
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseDistributedCache()
        .UseOpenTelemetry(
            configure: otel => otel.EnableSensitiveData = false) // keep prompt/response text out of telemetry
        .UseFunctionInvocation()
        .Build(services);
});

var app = builder.Build();

app.MapPost("/api/chat", async (
    IChatClient chatClient,
    ChatRequest request) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a technical assistant specializing in .NET."),
        new(ChatRole.User, request.Message)
    };

    var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

// Request body for /api/chat (type declarations must follow the top-level statements)
public record ChatRequest(string Message);
```

## Advanced pattern: multiple providers with named clients

In production you often need several models for different purposes, for example a small model for classification and a large one for generation:

```csharp
// Register two clients under different keys
builder.Services.AddKeyedSingleton<IChatClient>("fast", (sp, _) =>
    // OllamaApiClient (OllamaSharp package) implements IChatClient
    new OllamaApiClient(
        new Uri("http://localhost:11434"), "phi4-mini"));

builder.Services.AddKeyedSingleton<IChatClient>("smart", (sp, _) =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(sp));

// Inject with [FromKeyedServices]
app.MapPost("/api/classify", async (
    [FromKeyedServices("fast")] IChatClient fast,
    string text) =>
{
    var result = await fast.GetResponseAsync(
        $"Classify the intent of: {text}. Reply with exactly one of: support|sales|billing");
    return Results.Ok(new { intent = result.Text?.Trim() });
});

app.MapPost("/api/generate", async (
    [FromKeyedServices("smart")] IChatClient smart,
    string prompt) =>
{
    var result = await smart.GetResponseAsync(prompt);
    return Results.Ok(new { content = result.Text });
});
```

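The classifier's reply is free text, so it is safer to map it onto the allowed labels before acting on it. The sketch below shows one way to do that. The `ParseIntent` helper and the `unknown` fallback are illustrative assumptions and are not part of the endpoint above.

```csharp
using System;

string[] allowedIntents = { "support", "sales", "billing" };

// Normalize a model reply such as " Billing.\n" to one of the allowed labels, or "unknown".
string ParseIntent(string? reply)
{
    if (string.IsNullOrWhiteSpace(reply)) return "unknown";
    string cleaned = reply.Trim().TrimEnd('.', '!').ToLowerInvariant();
    return Array.IndexOf(allowedIntents, cleaned) >= 0 ? cleaned : "unknown";
}

foreach (var reply in new[] { " Billing.\n", "sales", "I think it's about refunds", null })
    Console.WriteLine($"{reply?.Trim() ?? "<null>"} -> {ParseIntent(reply)}");
```

Routing on a validated label means a chatty or off-format reply from the small model degrades to a safe default rather than an unexpected branch.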
## Writing custom middleware

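You can write your own middleware too. As an example, here is a rate-limiting middleware built on the `System.Threading.RateLimiting` primitives, which Polly's rate-limiter strategy also uses under the hood: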

```csharp
using System.Threading.RateLimiting;
using Microsoft.Extensions.AI;

public class RateLimitingChatClient : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingChatClient(
        IChatClient inner, RateLimiter limiter)
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Request one permit; the lease is released when disposed
        using var lease = await _limiter
            .AcquireAsync(1, cancellationToken);

        if (!lease.IsAcquired)
            throw new InvalidOperationException("AI rate limit hit");

        return await base.GetResponseAsync(
            messages, options, cancellationToken);
    }

    // GetStreamingResponseAsync is not overridden, so streaming calls bypass the limiter
}

// Registration (in Program.cs)
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .Use(inner => new RateLimitingChatClient(
            inner,
            new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,                              // burst capacity
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 50                           // sustained rate: 50 calls/minute
            })))
        .UseLogging()
        .Build(services));
```

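To build intuition for the `TokenBucketRateLimiterOptions` values above, consider how the bucket behaves. It starts full at `TokenLimit = 100`, and each call consumes one token. Every `ReplenishmentPeriod` adds `TokensPerPeriod = 50` tokens, never exceeding the limit. The code below is a hand-rolled model of those semantics with an injectable clock. It is hypothetical code, not `System.Threading.RateLimiting` itself, and it refills lazily on each call instead of using a timer.

```csharp
using System;

// Hypothetical token bucket mirroring TokenLimit / TokensPerPeriod / ReplenishmentPeriod.
const int tokenLimit = 100;
const int tokensPerPeriod = 50;
TimeSpan period = TimeSpan.FromMinutes(1);

int tokens = tokenLimit; // bucket starts full
DateTimeOffset lastRefill = DateTimeOffset.UnixEpoch;

bool TryAcquire(DateTimeOffset now)
{
    // Add one batch of tokens per fully elapsed period, capped at the limit.
    long periods = (now - lastRefill).Ticks / period.Ticks;
    if (periods > 0)
    {
        tokens = (int)Math.Min(tokenLimit, tokens + periods * tokensPerPeriod);
        lastRefill += TimeSpan.FromTicks(periods * period.Ticks);
    }
    if (tokens == 0) return false; // rejected: caller should back off
    tokens--;
    return true;
}

var t0 = DateTimeOffset.UnixEpoch;
int granted = 0;
for (int i = 0; i < 120; i++) // burst of 120 calls within the first minute
    if (TryAcquire(t0.AddSeconds(1))) granted++;

bool afterOneMinute = TryAcquire(t0.AddMinutes(1).AddSeconds(1));
Console.WriteLine($"granted in burst: {granted}, tokens left after refill: {tokens}");
```

With these settings a burst of up to 100 calls goes through immediately. After that, throughput settles at roughly 50 calls per minute.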
## IImageGenerator: text-to-image generation (experimental)

The newest interface (marked experimental) brings text-to-image generation into the same familiar DI pattern:

```csharp
// Experimental API: it may change between releases
IImageGenerator imageGen = /* provider implementation */;

var result = await imageGen.GenerateAsync(
    new ImageGenerationRequest("A futuristic city with neon lights"),
    new ImageGenerationOptions
    {
        ImageSize = new System.Drawing.Size(1024, 1024)
    });

// The response carries the generated images as content items (e.g. UriContent or DataContent)
```

## When to use what?

```mermaid
graph TD
    Q["Need AI in a .NET app?"] --> A{"Need complex<br/>orchestration?"}
    A -- Yes --> B{"Need agents,<br/>planning, plugins?"}
    B -- Yes --> C["Semantic Kernel<br/>+ Agent Framework"]
    B -- No --> D["Semantic Kernel<br/>(simple)"]
    A -- No --> E{"Need middleware<br/>(cache, logging)?"}
    E -- Yes --> F["Microsoft.Extensions.AI<br/>+ Middleware Pipeline"]
    E -- No --> G{"Quick<br/>prototype?"}
    G -- Yes --> H["SDK directly<br/>(OpenAI, Ollama...)"]
    G -- No --> F

    style Q fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
```

Decision tree: choosing the right AI approach for a .NET app

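For readers who prefer code to diagrams, the decision tree above can be written as a small function. The parameter names are illustrative, and the returned labels follow the diagram's leaves.

```csharp
using System;

// Mirrors the decision tree: orchestration first, then middleware, then prototyping.
string ChooseApproach(bool needsOrchestration, bool needsAgentsOrPlugins,
                      bool needsMiddleware, bool quickPrototype)
{
    if (needsOrchestration)
        return needsAgentsOrPlugins
            ? "Semantic Kernel + Agent Framework"
            : "Semantic Kernel (simple)";
    if (needsMiddleware)
        return "Microsoft.Extensions.AI + Middleware Pipeline";
    return quickPrototype
        ? "SDK directly"
        : "Microsoft.Extensions.AI + Middleware Pipeline";
}

Console.WriteLine(ChooseApproach(needsOrchestration: false, needsAgentsOrPlugins: false,
                                 needsMiddleware: true, quickPrototype: false));
```

Note that two paths lead to Microsoft.Extensions.AI. Unless you are prototyping or need orchestration, it is the default choice.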
## Roadmap

- **October 2024**: Microsoft announces the Microsoft.Extensions.AI preview, introducing IChatClient and IEmbeddingGenerator for the first time.
- **November 2025**: .NET 10 GA. Microsoft.Extensions.AI 10.0.0 ships alongside .NET 10; the library's first stable release, 9.5.0, had already shipped in May 2025.
- **January 2026**: version 10.3.0 improves the middleware pipeline and deepens OpenTelemetry integration.
- **April 2026**: version 10.5.0 (current) adds IImageGenerator (experimental), improves function calling and extends multi-modal content support.

## Best practices for production

### 1. Always use DI instead of constructing clients directly

Register IChatClient through AddChatClient(). The middleware can then resolve what it needs (logger, cache) from the container, and consumers can be tested with a mock or stub.

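Depending on the abstraction is what makes services unit-testable. A test can hand the service a stub that returns canned replies and records the prompts it received. The sketch below shows this with a hypothetical `SummarizeTicket` function and a delegate standing in for IChatClient. With the real library, the stub would be a small class implementing IChatClient.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical business function that depends only on a chat abstraction.
async Task<string> SummarizeTicket(Func<string, Task<string>> chat, string ticket)
{
    if (string.IsNullOrWhiteSpace(ticket))
        throw new ArgumentException("Ticket text is required.", nameof(ticket));
    string reply = await chat($"Summarize in one sentence: {ticket}");
    return reply.Trim();
}

// Stub "client": no network, deterministic reply, and a record of what was sent.
var sentPrompts = new List<string>();
Func<string, Task<string>> stub = prompt =>
{
    sentPrompts.Add(prompt);
    return Task.FromResult("  Customer cannot log in after password reset.  ");
};

string summary = await SummarizeTicket(stub, "User reports login failure since yesterday's reset.");
Console.WriteLine(summary);
Console.WriteLine(sentPrompts[0]);
```

Such tests run in milliseconds, cost nothing, and check both the prompt you build and how you post-process the reply.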
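### 2. Cache aggressively for deterministic prompts

Prompts that do not change, such as fixed system prompts or classification tasks, should go through UseDistributedCache(). An identical request is then served from the cache without an API call, which can cut costs significantly.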

### 3. Use OpenTelemetry to monitor cost

Every LLM call consumes tokens, and tokens cost money. UseOpenTelemetry() records request counts, latency and token usage as metrics. Export them to Prometheus/Grafana to raise an alert when spending exceeds the budget.

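Token counts turn into cost through simple arithmetic, and this calculation is what a budget alert ultimately checks. The sketch below uses hypothetical per-million-token prices and an example budget. Substitute your provider's current price list; the token totals would come from the usage numbers that the telemetry middleware reports.

```csharp
using System;

// Hypothetical prices in USD per 1M tokens; check your provider's price list.
const decimal inputPricePerMillion = 2.50m;
const decimal outputPricePerMillion = 10.00m;

decimal EstimateCost(long inputTokens, long outputTokens) =>
    inputTokens / 1_000_000m * inputPricePerMillion +
    outputTokens / 1_000_000m * outputPricePerMillion;

// Example day: 40M input tokens and 8M output tokens, against a hypothetical budget.
decimal dailyCost = EstimateCost(40_000_000, 8_000_000);
const decimal dailyBudget = 150m;

Console.WriteLine($"Estimated cost: {dailyCost:F2} USD");
if (dailyCost > dailyBudget)
    Console.WriteLine("Budget exceeded: raise an alert.");
```

Here output tokens cost four times as much per token. Even though there are five times fewer of them, they still account for 80 of the 180 USD.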
### 4. Be careful with EnableSensitiveData

With EnableSensitiveData = true, the OpenTelemetry middleware includes full prompts and responses in the telemetry it emits. This is dangerous if they contain PII, so keep it false in production.

## Conclusion

Microsoft.Extensions.AI is an important step toward standardizing how .NET developers integrate AI. Instead of a different style for every provider, you now have IChatClient, the counterpart of ILogger for logging or IDistributedCache for caching. It gives you a single interface, a flexible middleware pipeline, and the ability to swap providers without changing business logic. With .NET 10 the library is stable and ready for production. If you are building any .NET application that integrates AI, this is the foundation to start from.


# Microsoft.Extensions.AI — Tầng trừu tượng AI thống nhất cho .NET 10

Khi xây dựng ứng dụng AI trên .NET, lập trình viên thường phải đối mặt với bài toán quen thuộc: mỗi provider (OpenAI, Azure OpenAI, Ollama, Anthropic…) lại có SDK riêng, API surface riêng, cách xử lý streaming riêng. Đổi provider đồng nghĩa với viết lại code. **Microsoft.Extensions.AI** ra đời để giải quyết triệt để vấn đề này — cung cấp một tầng trừu tượng thống nhất, tích hợp sâu vào hệ sinh thái Dependency Injection quen thuộc của .NET, cho phép swap provider mà không thay đổi business logic.

10.5.0Phiên bản stable mới nhất trên NuGet

3Interface chính: IChatClient, IEmbeddingGenerator, IImageGenerator

4+Middleware tích hợp sẵn (Logging, Cache, Telemetry, Function Calling)

0Dòng code cần thay đổi khi swap provider

## Tại sao cần Microsoft.Extensions.AI?

#### Điểm mấu chốt

Microsoft.Extensions.AI **không thay thế** Semantic Kernel. Nó nằm ở tầng thấp hơn — cung cấp primitive types (IChatClient, IEmbeddingGenerator) mà Semantic Kernel sử dụng bên trong. Nếu bạn chỉ cần chat completion hoặc embedding, dùng trực tiếp Extensions.AI là đủ. Khi cần orchestration phức tạp (plugins, planning, agents), hãy kết hợp với Semantic Kernel.

## Kiến trúc tổng quan

Hệ thống được thiết kế theo mô hình phân tầng rõ ràng, mỗi tầng có trách nhiệm riêng biệt:

```
graph TD
    A["Application Code  
Controller, Service, Worker"] --> B["Microsoft.Extensions.AI  
Middleware Pipeline"]
    B --> C["IChatClient / IEmbeddingGenerator  
Abstractions"]
    C --> D["Provider Implementations"]
    D --> E["Azure OpenAI"]
    D --> F["OpenAI"]
    D --> G["Ollama  
Local Models"]
    D --> H["Anthropic  
Claude"]
    D --> I["Custom Provider"]

B --> J["UseLogging()"]
    B --> K["UseDistributedCache()"]
    B --> L["UseOpenTelemetry()"]
    B --> M["UseFunctionInvocation()"]

style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#16213e,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#0078d4,stroke:#fff,color:#fff
    style F fill:#412991,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#d97706,stroke:#fff,color:#fff
    style I fill:#888,stroke:#fff,color:#fff
    style J fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style K fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style M fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50

```
Kiến trúc phân tầng của Microsoft.Extensions.AI trong .NET 10

## Hai package cốt lõi

| Package | Vai trò | Khi nào dùng |
| --- | --- | --- |
| **Microsoft.Extensions.AI.Abstractions** | Định nghĩa các interface (IChatClient, IEmbeddingGenerator, IImageGenerator) và exchange types (ChatMessage, ChatResponse…) | Cho library authors — ai viết provider adapter chỉ cần reference package này |
| **Microsoft.Extensions.AI** | Cung cấp middleware pipeline (logging, caching, telemetry, function invocation) + builder pattern cho DI | Cho application developers — reference package này để có full functionality |

## IChatClient — Interface thống nhất cho Chat Completion

Đây là interface trung tâm của toàn bộ thư viện. Chỉ với 2 method chính, nó bao quát mọi use case từ simple prompt đến multi-turn conversation có tool calling:

```csharp
public interface IChatClient : IDisposable
{
    // Chat completion đồng bộ — trả về full response
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

// Streaming — trả về từng token qua IAsyncEnumerable
    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

ChatClientMetadata Metadata { get; }
}
```

### Ví dụ cơ bản: Chat với Azure OpenAI

```csharp
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;

// Tạo IChatClient từ Azure OpenAI SDK
IChatClient client = new AzureOpenAIClient(
        new Uri("https://my-resource.openai.azure.com/"),
        new Azure.AzureKeyCredential("api-key"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Gọi chat — đơn giản như gọi một service bất kỳ
ChatResponse response = await client.GetResponseAsync(
    "Giải thích Dependency Injection trong 3 câu");
Console.WriteLine(response.Text);
```

### Streaming response

```csharp
await foreach (var update in client.GetStreamingResponseAsync(
    "Viết hàm quick sort bằng C#"))
{
    Console.Write(update.Text);
}
```

### Đổi sang Ollama (local model) — chỉ thay 1 dòng

```csharp
// Trước: Azure OpenAI
IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o").AsIChatClient();

// Sau: Ollama local — business code KHÔNG đổi
IChatClient client = new OllamaChatClient(
    new Uri("http://localhost:11434"), "llama3.2");
```

#### Provider-agnostic là chìa khóa

## Middleware Pipeline — Sức mạnh thực sự

Nếu chỉ dừng ở abstraction layer, Microsoft.Extensions.AI không có gì đặc biệt. Điểm khiến nó trở thành game-changer là **middleware pipeline** — cho phép bạn xếp chồng các cross-cutting concerns giống như middleware trong ASP.NET Core:

```
sequenceDiagram
    participant App as Application
    participant Log as UseLogging()
    participant Cache as UseDistributedCache()
    participant OTel as UseOpenTelemetry()
    participant Func as UseFunctionInvocation()
    participant LLM as LLM Provider

App->>Log: GetResponseAsync()
    Log->>Log: Log request
    Log->>Cache: Forward
    Cache->>Cache: Check cache
    alt Cache hit
        Cache-->>Log: Return cached response
        Log-->>App: Response
    else Cache miss
        Cache->>OTel: Forward
        OTel->>OTel: Start trace span
        OTel->>Func: Forward
        Func->>LLM: Call provider
        LLM-->>Func: Response (may include tool calls)
        Func->>Func: Execute tools, re-call if needed
        Func-->>OTel: Final response
        OTel->>OTel: End span
        OTel-->>Cache: Response
        Cache->>Cache: Store in cache
        Cache-->>Log: Response
        Log->>Log: Log response
        Log-->>App: Response
    end

```
Luồng xử lý request qua middleware pipeline

### Đăng ký middleware qua DI

```csharp
var builder = WebApplication.CreateBuilder(args);

// Đăng ký IChatClient với full middleware pipeline
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new Azure.AzureKeyCredential(
                builder.Configuration["AzureOpenAI:ApiKey"]!))
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()                    // Log mọi request/response
        .UseDistributedCache()           // Cache response cho prompt giống nhau
        .UseOpenTelemetry()              // Distributed tracing
        .UseFunctionInvocation()         // Tự động gọi tool khi LLM yêu cầu
        .Build(services));

// Inject IChatClient vào controller/service như bình thường
builder.Services.AddDistributedMemoryCache(); // Hoặc Redis cache
```

#### Thứ tự middleware quan trọng

Middleware xếp theo thứ tự outermost-first. `UseLogging()` ở ngoài cùng sẽ log mọi thứ kể cả cache hit. Đặt `UseDistributedCache()` trước `UseOpenTelemetry()` để tránh tạo trace span cho cached response — giảm noise trong monitoring.

## IEmbeddingGenerator — Vector Embeddings cho RAG

Interface thứ hai phục vụ cho các hệ thống RAG (Retrieval-Augmented Generation), semantic search, và memory store:

```csharp
public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}
```

### Ví dụ: Tạo embeddings cho semantic search

```csharp
IEmbeddingGenerator<string, Embedding<float>> generator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Tạo embedding cho batch documents
var embeddings = await generator.GenerateAsync(new[]
{
    "Microservices architecture patterns",
    "Event-driven design with message queues",
    "CQRS and Event Sourcing in .NET 10"
});

// Mỗi embedding là một vector float[] — lưu vào vector database
foreach (var emb in embeddings)
{
    float[] vector = emb.Vector.ToArray();
    // Store in Qdrant, Milvus, pgvector...
}
```

## Function Calling (Tool Use)

Một trong những tính năng mạnh nhất: cho phép LLM gọi các function C# bạn định nghĩa, và middleware `UseFunctionInvocation()` tự động dispatch + re-call LLM với kết quả:

```csharp
[Description("Lấy thông tin thời tiết cho một thành phố")]
static async Task<string> GetWeather(
    [Description("Tên thành phố")] string city)
{
    // Gọi weather API thực tế
    return $"Thời tiết tại {city}: 28°C, trời nắng";
}

IChatClient client = new AzureOpenAIClient(endpoint, key)
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation() // Tự động gọi function khi LLM yêu cầu
    .Build();

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

var response = await client.GetResponseAsync(
    "Thời tiết Hà Nội hôm nay thế nào?", options);
// LLM tự gọi GetWeather("Hà Nội"), nhận kết quả,
// rồi compose câu trả lời tự nhiên
```

## So sánh với các approach khác

| Tiêu chí | Dùng SDK trực tiếp | Microsoft.Extensions.AI | Semantic Kernel |
| --- | --- | --- | --- |
| **Provider lock-in** | Cao — code gắn chặt với 1 SDK | Không — swap bằng DI | Không — dùng IChatClient bên dưới |
| **Middleware pipeline** | Tự viết | Có sẵn (logging, cache, telemetry, function calling) | Có + thêm filters, plugins |
| **Dependency Injection** | Manual wiring | First-class support | First-class support |
| **Orchestration (planning, agents)** | Không | Không — chỉ primitive | Có — plugins, planner, agent framework |
| **Learning curve** | Thấp nhưng fragmented | Thấp, quen thuộc với .NET dev | Trung bình — nhiều concept |
| **Package size** | Nhỏ | Rất nhỏ (~lightweight) | Lớn hơn — full orchestration |
| **Phù hợp cho** | Prototype nhanh | Production app cần AI đơn giản | AI-first app phức tạp |

## Tích hợp vào ứng dụng ASP.NET Core thực tế

Dưới đây là pattern hoàn chỉnh cho một API endpoint sử dụng IChatClient qua DI:

```csharp
// Program.cs
var builder = WebApplication.CreateBuilder(args);

// Đăng ký cache backend
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration
        .GetConnectionString("Redis");
});

// Đăng ký IChatClient với middleware pipeline
builder.Services.AddChatClient(services =>
{
    var config = builder.Configuration;
    return new AzureOpenAIClient(
            new Uri(config["AI:Endpoint"]!),
            new Azure.AzureKeyCredential(config["AI:ApiKey"]!))
        .GetChatClient(config["AI:Model"]!)
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseDistributedCache()
        .UseOpenTelemetry(
            configure: otel => otel.EnableSensitiveData = false)
        .UseFunctionInvocation()
        .Build(services);
});

var app = builder.Build();

app.MapPost("/api/chat", async (
    IChatClient chatClient,
    ChatRequest request) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "Bạn là trợ lý kỹ thuật chuyên .NET."),
        new(ChatRole.User, request.Message)
    };

var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Text });
});

app.Run();
```

## Pattern nâng cao: Multi-provider với Named Clients

Trong production, bạn thường cần nhiều LLM cho các mục đích khác nhau — ví dụ model nhỏ cho classification, model lớn cho generation:

```csharp
// Đăng ký 2 client với tên khác nhau
builder.Services.AddKeyedSingleton<IChatClient>("fast", (sp, _) =>
    new OllamaChatClient(
        new Uri("http://localhost:11434"), "phi-4-mini"));

builder.Services.AddKeyedSingleton<IChatClient>("smart", (sp, _) =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(sp));

// Inject bằng [FromKeyedServices]
app.MapPost("/api/classify", async (
    [FromKeyedServices("fast")] IChatClient fast,
    string text) =>
{
    var result = await fast.GetResponseAsync(
        $"Phân loại intent: {text}. Trả về: support|sales|billing");
    return Results.Ok(new { intent = result.Text?.Trim() });
});

app.MapPost("/api/generate", async (
    [FromKeyedServices("smart")] IChatClient smart,
    string prompt) =>
{
    var result = await smart.GetResponseAsync(prompt);
    return Results.Ok(new { content = result.Text });
});
```

## Viết custom middleware

Bạn hoàn toàn có thể viết middleware riêng. Ví dụ — middleware rate limiting kết hợp Polly:

```csharp
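using System.Runtime.CompilerServices;
using System.Threading.RateLimiting;
using Microsoft.Extensions.AI;

// Middleware: acquire a permit before forwarding each request to the inner client
public class RateLimitingChatClient : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingChatClient(IChatClient inner, RateLimiter limiter)
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter.AcquireAsync(1, cancellationToken);
        if (!lease.IsAcquired)
            throw new RateLimitExceededException("AI rate limit hit");

        return await base.GetResponseAsync(messages, options, cancellationToken);
    }

    // Streaming calls must be limited too, otherwise they bypass the limiter
    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter.AcquireAsync(1, cancellationToken);
        if (!lease.IsAcquired)
            throw new RateLimitExceededException("AI rate limit hit");

        await foreach (var update in base.GetStreamingResponseAsync(messages, options, cancellationToken))
            yield return update;
    }
}

// Application-defined exception (not part of the BCL or Polly)
public class RateLimitExceededException(string message) : Exception(message);

// Registration (e.g. in Program.cs): bucket of 100 tokens, refilled with 50 tokens per minute
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("gpt-4o")
        .AsIChatClient()
        .AsBuilder()
        .Use(inner => new RateLimitingChatClient(
            inner,
            new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 50
            })))
        .UseLogging()
        .Build(services));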
```
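To see what the middleware does when the bucket runs dry, here is a small standalone sketch that uses only `System.Threading.RateLimiting` (part of the ASP.NET Core shared framework, otherwise available as a NuGet package). The tiny bucket, `QueueLimit = 0`, and `AutoReplenishment = false` are choices made here purely to make the outcome deterministic; they are not the values from the registration above.

```csharp
using System;
using System.Threading.RateLimiting;

// Deliberately tiny bucket: 2 tokens, no queue, no automatic refill,
// so the third request is rejected deterministically.
using var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 2,
    TokensPerPeriod = 1,
    ReplenishmentPeriod = TimeSpan.FromMinutes(1),
    QueueLimit = 0,
    AutoReplenishment = false
});

var results = new bool[3];
for (var i = 0; i < results.Length; i++)
{
    // Same call RateLimitingChatClient makes before forwarding a request
    using RateLimitLease lease = await limiter.AcquireAsync(1);
    results[i] = lease.IsAcquired;
    Console.WriteLine($"Request {i + 1}: {(lease.IsAcquired ? "forwarded" : "rejected")}");
}
// Request 1: forwarded
// Request 2: forwarded
// Request 3: rejected
```

Disposing a token-bucket lease does not hand the token back; only replenishment refills the bucket, so each forwarded request spends one token of the current period.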

## IImageGenerator: Text-to-image generation (Experimental)

The newest interface (marked experimental) brings text-to-image generation into the same familiar DI pattern:

```csharp
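#pragma warning disable MEAI001 // IImageGenerator is marked [Experimental]
using System.Drawing;
using Microsoft.Extensions.AI;

IImageGenerator imageGen = /* provider implementation */;

var result = await imageGen.GenerateAsync(
    new ImageGenerationRequest("A futuristic city with neon lights"),
    new ImageGenerationOptions
    {
        ImageSize = new Size(1024, 1024) // output size as System.Drawing.Size
    });

// result.Contents holds the generated images, e.g. DataContent (raw bytes) or UriContent (URL)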
```

## When to use what?

```mermaid
graph TD
    Q["Need AI in a .NET app?"] --> A{"Complex<br/>orchestration?"}
    A -- Yes --> B{"Need agents,<br/>planning, plugins?"}
    B -- Yes --> C["Semantic Kernel<br/>+ Agent Framework"]
    B -- No --> D["Semantic Kernel<br/>(basic usage)"]
    A -- No --> E{"Need middleware<br/>(cache, logging)?"}
    E -- Yes --> F["Microsoft.Extensions.AI<br/>+ Middleware Pipeline"]
    E -- No --> G{"Quick<br/>prototype?"}
    G -- Yes --> H["Direct SDK<br/>(OpenAI, Ollama...)"]
    G -- No --> F

    style Q fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
```
Decision tree: choosing the right AI approach for a .NET app

## Roadmap

October 2024

Microsoft announces the Microsoft.Extensions.AI Preview, introducing IChatClient and IEmbeddingGenerator for the first time.

November 2025

.NET 10 GA: Microsoft.Extensions.AI is officially stable, with version 10.0.0 shipping alongside .NET 10.

January 2026

Version 10.3.0: middleware pipeline improvements and deeper OpenTelemetry integration.

April 2026

Version 10.5.0 (current): adds IImageGenerator (experimental), improves function calling, and extends multi-modal content support.

## Best practices for production

#### 1. Always go through DI; don't instantiate clients directly

Register IChatClient via `AddChatClient()` so the middleware pipeline is applied correctly and the client can be swapped for a mock or stub in tests.
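A minimal sketch of such a stub, assuming the `IChatClient` members of the current stable abstractions (`GetResponseAsync`, `GetStreamingResponseAsync`, `GetService`, `Dispose`). `StubChatClient` is a hypothetical test double, not a library type: it returns a canned reply and records every request so a test can assert on the prompt. It assumes the `Microsoft.Extensions.AI` and `Microsoft.Extensions.DependencyInjection` packages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

// Register the stub through the same entry point production code uses
var stub = new StubChatClient("billing");
var services = new ServiceCollection();
services.AddChatClient(stub);

using var provider = services.BuildServiceProvider();
var client = provider.GetRequiredService<IChatClient>();

var response = await client.GetResponseAsync("Classify the intent: I was charged twice");
Console.WriteLine(response.Text);    // billing
Console.WriteLine(stub.Calls.Count); // 1

// Hypothetical test double: canned reply plus a record of every request it received
public sealed class StubChatClient(string cannedReply) : IChatClient
{
    public List<IReadOnlyList<ChatMessage>> Calls { get; } = new();

    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        Calls.Add(messages.ToList());
        return Task.FromResult(new ChatResponse(new ChatMessage(ChatRole.Assistant, cannedReply)));
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        Calls.Add(messages.ToList());
        await Task.Yield(); // keep the iterator genuinely asynchronous
        yield return new ChatResponseUpdate(ChatRole.Assistant, cannedReply);
    }

    public object? GetService(Type serviceType, object? serviceKey = null) =>
        serviceKey is null && serviceType.IsInstanceOfType(this) ? this : null;

    public void Dispose() { }
}
```

Because the stub goes through the same `AddChatClient()` registration as a real provider, code that depends only on `IChatClient` can be exercised without network calls or API keys.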

#### 2. Cache aggressively for deterministic prompts

Requests that repeat verbatim (fixed system prompts, classification tasks) should go through `UseDistributedCache()`, which can cut API call costs significantly.
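A minimal registration sketch, assuming the ASP.NET Core `builder` and the `endpoint`/`key` variables used in earlier snippets. `UseDistributedCache()` with no arguments resolves an `IDistributedCache` from DI, so one must be registered. Since the cache key is derived from the request (messages and options), only exact repeats are served from the cache.

```csharp
// Fragment: assumes `builder`, `endpoint`, and `key` from earlier snippets.
// Per-process cache; with several app instances, register a shared IDistributedCache
// instead (e.g. Redis via AddStackExchangeRedisCache from
// Microsoft.Extensions.Caching.StackExchangeRedis).
builder.Services.AddDistributedMemoryCache();

builder.Services.AddChatClient(services =>
        new AzureOpenAIClient(endpoint, key)
            .GetChatClient("gpt-4o")
            .AsIChatClient())
    .UseDistributedCache(); // no argument: IDistributedCache is resolved from DI
```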

#### 3. Use OpenTelemetry to monitor cost

Every LLM call consumes tokens, and tokens cost money. `UseOpenTelemetry()` tracks request count, latency, and token usage as metrics; connect them to Prometheus/Grafana to alert when spending exceeds the budget.

#### 4. Be careful with EnableSensitiveData

With `EnableSensitiveData = true` on the OpenTelemetry middleware, full prompts and responses are recorded in telemetry, which is dangerous if they contain PII. Keep it `false` in production.
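A configuration sketch combining tips 3 and 4, assuming the same `builder`, `endpoint`, and `key` as earlier plus the `OpenTelemetry.Extensions.Hosting` package. `"MyCompany.AI"` is a hypothetical source name; `OpenTelemetryChatClient` uses it as the name of both its traces source and its meter, so the same name is enabled on the tracing and metrics sides. Exporter setup (OTLP, Prometheus, ...) is deliberately left out.

```csharp
// Fragment: assumes `builder`, `endpoint`, `key`, and OpenTelemetry.Extensions.Hosting.
const string AiTelemetrySource = "MyCompany.AI"; // hypothetical name

builder.Services.AddChatClient(services =>
        new AzureOpenAIClient(endpoint, key)
            .GetChatClient("gpt-4o")
            .AsIChatClient())
    .UseOpenTelemetry(
        sourceName: AiTelemetrySource,
        // Set explicitly so prompts/responses stay out of telemetry
        configure: otel => otel.EnableSensitiveData = false);

// Collect the traces and metrics emitted under that name (exporter omitted)
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing.AddSource(AiTelemetrySource))
    .WithMetrics(metrics => metrics.AddMeter(AiTelemetrySource));
```

If you also cache, adding `UseDistributedCache()` before `UseOpenTelemetry()` makes the cache the outer layer (the first middleware added is the outermost), so the recorded token usage reflects only calls that actually reach the provider.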

## Conclusion

## References

- [Microsoft.Extensions.AI libraries - .NET | Microsoft Learn](https://learn.microsoft.com/en-us/dotnet/ai/microsoft-extensions-ai)
- [Introducing Microsoft.Extensions.AI Preview - .NET Blog](https://devblogs.microsoft.com/dotnet/introducing-microsoft-extensions-ai-preview/)
- [NuGet Gallery | Microsoft.Extensions.AI 10.5.0](https://www.nuget.org/packages/Microsoft.Extensions.AI/)
- [GitHub - dotnet/extensions - Microsoft.Extensions.AI source](https://github.com/dotnet/extensions/tree/main/src/Libraries/Microsoft.Extensions.AI)
- [Microsoft.Extensions.AI: The New Foundation for .NET AI Development](https://www.dotnetstudioai.com/university/microsoft-extensions-ai-dotnet-guide/)
- [Generative AI with LLMs in C# in 2026 - .NET Blog](https://devblogs.microsoft.com/dotnet/generative-ai-with-large-language-models-in-dotnet-and-csharp/)
