# Meilisearch 2026 deep dive: Hybrid Search, the Fragments API, and the future of AI retrieval

Posted on: 4/20/2026 10:27:44 AM

## 1. Why Meilisearch is the most noteworthy search engine of 2026

If Elasticsearch is the bull of the search world (powerful and versatile, but heavy), Meilisearch is the scalpel: compact, very fast, and designed around **developer experience** from day one. Founded in France in 2018 by a team of three engineers who had worked with Algolia, Meilisearch inherits the "instant search" philosophy but takes the open-source route under the MIT license.

2026 marks a major shift for Meilisearch: from a pure full-text search engine to a **unified search and AI retrieval platform** that combines hybrid search (keyword + semantic), vector search, multi-modal search (the Fragments API), a chat engine for RAG, and, on the roadmap, sharding and serverless indexes. This article digs into the architecture, the newest features in v1.15 and v1.16, the 2026 roadmap, and practical integration with .NET Core.

- **<50 ms**: typical response time
- **48K+**: GitHub stars
- **v1.16**: latest release covered in detail in this article
- **Rust**: core language

## 2. Inside the architecture: Rust, LMDB, and ranking rules

### 2.1. Why Rust + LMDB matters

Meilisearch is written entirely in **Rust**, a language that delivers performance close to C/C++ while enforcing memory safety at compile time. The storage engine is **LMDB** (Lightning Memory-Mapped Database), a B+tree key-value store developed by Symas and known for *zero-copy reads* and MVCC (Multi-Version Concurrency Control) support.

The key difference from Typesense, which keeps everything in RAM, is that Meilisearch uses **memory-mapped files**: the OS loads only the pages it needs into RAM through demand paging. This lets an index grow larger than physical RAM while reads on hot pages stay close to in-memory speed, thanks to the OS page cache.
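
The demand-paging idea can be illustrated with Python's standard `mmap` module. This is only a sketch of the general mechanism, not of LMDB's internals: the throwaway file, the 64-page size, and the `read_slice` helper are assumptions made for the demo.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

# Build a throwaway 64-page file standing in for an on-disk index.
# Every page is filled with its own page number (4 bytes, big-endian).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for page_no in range(64):
        tmp.write(page_no.to_bytes(4, "big") * (PAGE // 4))
    demo_path = tmp.name

# Reading 4 bytes from page 10 touches that page, not the whole file.
chunk = read_slice(demo_path, 10 * PAGE, 4)
page_number = int.from_bytes(chunk, "big")
os.remove(demo_path)
```

The slice comes back without the program ever reading the file sequentially; which pages stay cached afterwards is decided by the OS, which is the behavior the paragraph above relies on.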

```
graph TB
    A[Search Request] --> B[HTTP Server<br/>Actix-web]
    B --> C[Query Parser]
    C --> D{Query Type}
    D -->|Keyword| E[Inverted Index]
    D -->|Semantic| F[Vector Index<br/>HNSW]
    D -->|Hybrid| G[Both + Rerank]
    E --> H[Ranking Rules<br/>Bucket Sort]
    F --> H
    G --> H
    H --> I[LMDB<br/>Memory-mapped]
    I --> J[Response]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style I fill:#16213e,stroke:#fff,color:#fff
    style J fill:#4CAF50,stroke:#fff,color:#fff
    style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
```
Request pipeline in Meilisearch, from HTTP to LMDB

### 2.2. Ranking rules: the heart of relevance

Unlike Elasticsearch, which scores with BM25 by default, Meilisearch applies **tiered ranking rules** (a bucket sort): each rule sorts documents into buckets, and the next rule only breaks ties inside a bucket, instead of computing one aggregate score. The default order is:

1. **Words**: documents containing the most query terms come first
2. **Typo**: documents matched with fewer typos come first
3. **Proximity**: documents where the query terms appear closer together come first
4. **Attribute**: matches in higher-priority attributes come first (title > description)
5. **Sort**: applies the custom sort on numeric or string fields requested at query time
6. **Exactness**: exact matches come first

You can append custom rules, for example `release_date:desc` to favor newer posts. The strength of this approach is that it is **deterministic and easy to debug**: you can trace why a document ranks high one rule at a time, rather than unpacking a single opaque score such as BM25's.

```
// Configure ranking rules
PATCH /indexes/{index_uid}/settings
{
  "rankingRules": [
    "words",
    "typo",
    "proximity",
    "attribute",
    "sort",
    "exactness",
    "release_date:desc",
    "popularity:desc"
  ]
}
```
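
To make the tie-breaking behavior concrete, here is a minimal sketch in Python. It is not Meilisearch's implementation: the `Candidate` fields and the sample values are invented for illustration, and only four rules are modeled. It relies on Python's stable sort, applying rules from least to most significant.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    matched_words: int  # how many query terms the document contains
    typos: int          # typos needed to match
    proximity: int      # distance between matched terms (lower is closer)
    release_date: str   # ISO date, used by the custom rule

def rank(candidates):
    ranked = list(candidates)
    # Sort by the least significant rule first. Because the sort is stable,
    # each less significant rule ends up only breaking ties left by the
    # more significant rules applied after it.
    ranked.sort(key=lambda c: c.release_date, reverse=True)  # release_date:desc
    ranked.sort(key=lambda c: c.proximity)                   # proximity
    ranked.sort(key=lambda c: c.typos)                       # typo
    ranked.sort(key=lambda c: c.matched_words, reverse=True) # words
    return ranked

docs = [
    Candidate("a", 2, 1, 3, "2024-01-01"),
    Candidate("b", 2, 0, 5, "2023-06-01"),
    Candidate("c", 1, 0, 1, "2025-01-01"),
    Candidate("d", 2, 0, 5, "2024-09-01"),
]
order = [c.doc_id for c in rank(docs)]
```

Document `c` is the newest and has the best proximity, yet it ranks last because it loses on the first rule; `d` beats `b` only through the custom date rule, since they tie on everything before it.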

## 3. Meilisearch v1.15: typos on numbers, string filters, and Chinese

### 3.1. Disabling typo tolerance for numbers

Before v1.15, Meilisearch applied typo tolerance to every term, numbers included. That is a nuisance for data such as postal codes, phone numbers, and years: searching for `2024` could also return `2025` or `2004`. Version 1.15 lets you turn typo tolerance off for numbers only:

```
PATCH /indexes/{index_uid}/settings/typo-tolerance
{
  "disableOnNumbers": true
}
```
A side benefit: indexing gets noticeably faster on datasets with many unique numbers, because Meilisearch no longer builds typo variations for them.

### 3.2. Lexicographic string filters

The comparison operators (`<`, `<=`, `>`, `>=`, `TO`) now work on strings, compared in lexicographic order. This is especially useful for ISO-formatted dates:

```
// Find posts published before 2023-07-17
POST /indexes/posts/search
{
  "q": "rust",
  "filter": "release_date < \"2023-07-17\""
}

// Range: versions from v1.10 to v1.15
{
  "filter": "version \"1.10\" TO \"1.15\""
}
```
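
Lexicographic order matches the intended order only when values have a fixed, zero-padded shape. The sketch below uses plain Python string comparison to show where this works and where it does not; `in_range` is a hypothetical helper that mimics an inclusive `TO` range, not a Meilisearch API.

```python
def in_range(value: str, low: str, high: str) -> bool:
    """Inclusive lexicographic range check, similar to `field low TO high`."""
    return low <= value <= high

# Zero-padded ISO dates sort chronologically as plain strings.
iso_ok = "2023-07-16" < "2023-07-17"

# Same-width version strings: string order agrees with numeric order.
padded = in_range("1.12", "1.10", "1.15")

# Mixed widths break it: "1.9" sorts after "1.12" as a string, so a
# range from "1.9" to "1.15" excludes version 1.12.
unpadded = in_range("1.12", "1.9", "1.15")
```

Here `iso_ok` and `padded` are true while `unpadded` is false, so version-like fields are safer to filter on when stored zero-padded (for example `1.09`) or as separate numeric fields.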

### 3.3. Improved Chinese tokenization

Meilisearch's tokenizer, **Charabia**, now segments Chinese more accurately, which matters because Chinese does not put spaces between words the way English does. **Note:** if your dataset contains Chinese content, you must reindex after upgrading to v1.15; otherwise Chinese queries may be ignored entirely.

#### Vietnamese in Meilisearch

Meilisearch handles Vietnamese adequately at a basic level through Unicode normalization, but has no dedicated rules for tone marks and diacritics. To improve matching, index both the accented and unaccented versions of a field and list both in `searchableAttributes`, or use a `synonyms` map (`"ơn": ["on"]`). Some developers pre-process documents in Python with `unidecode` to produce a parallel `title_unsigned` field.
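
A parallel unaccented field can also be produced with the standard library alone. This is a minimal sketch that swaps `unidecode` for `unicodedata`; the `to_unsigned` helper and the sample document are assumptions for illustration.

```python
import unicodedata

def to_unsigned(text: str) -> str:
    """Return `text` with Vietnamese tone and vowel marks removed."""
    # đ/Đ are standalone letters, not a base letter plus a combining mark,
    # so NFD decomposition does not touch them: map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

doc = {"id": 1, "title": "Điện thoại quay vlog tốt"}
doc["title_unsigned"] = to_unsigned(doc["title"])
```

Both `title` and `title_unsigned` would then be listed in `searchableAttributes`, so a query typed without diacritics (`dien thoai`) can still match.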

## 4. Meilisearch v1.16: Fragments API, Exports, and Documents sort

### 4.1. Fragments API: multi-modal search

This is the *most valuable* feature in v1.16. Fragments let you decompose documents and queries into separate semantic parts: a product, for example, can have a text fragment (name, description) and an image fragment (image URL). In practice, a user describes a "loose-fit blue sweater", Meilisearch matches the text embedding against descriptions AND the image embedding against photos, and the results can be markedly better than traditional full-text search.

```
// Configure fragments for the movies index
PATCH /indexes/movies/settings/embedders
{
  "multimodal": {
    "source": "rest",
    "url": "https://api.openai.com/v1/embeddings",
    "indexingFragments": {
      "textPart": {
        "value": {
          "text": "{{doc.title}}. {{doc.description}}"
        }
      },
      "imagePart": {
        "value": {
          "image": "{{doc.poster_url}}"
        }
      }
    },
    "searchFragments": {
      "queryText": {
        "value": { "text": "{{q}}" }
      },
      "queryImage": {
        "value": { "image": "{{media.image}}" }
      }
    }
  }
}

// Search with a base64-encoded image
POST /indexes/movies/search
{
  "q": "phim hành động",
  "hybrid": {
    "embedder": "multimodal",
    "semanticRatio": 0.7
  },
  "media": {
    "image": "data:image/jpeg;base64,/9j/4AAQ..."
  }
}
```
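
The `media.image` value in the search request is a base64 data URI. A client could assemble the request body roughly as follows; this is a sketch only, where `image_data_uri` and `build_search_body` are hypothetical helpers and the four bytes stand in for a real image file.

```python
import base64
import json

def image_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def build_search_body(query: str, image_bytes: bytes) -> str:
    """Build a JSON body with the same fields as the request shown above."""
    body = {
        "q": query,
        "hybrid": {"embedder": "multimodal", "semanticRatio": 0.7},
        "media": {"image": image_data_uri(image_bytes)},
    }
    return json.dumps(body, ensure_ascii=False)

# JPEG SOI and APP0 markers only, used as stand-in data for a real file.
payload = build_search_body("phim hành động", b"\xff\xd8\xff\xe0")
```

In a real client the bytes would come from reading the uploaded image, and the resulting string would be sent as the POST body.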

### 4.2. Exports API: migration without dumps

Before v1.16, moving data between instances meant using a dump or snapshot, which is slow and requires downtime. The Exports API addresses this by transferring documents directly from instance A to instance B over HTTP:

```
PATCH /experimental-features
{
  "exports": true
}

POST /export
{
  "url": "https://target-instance.meilisearch.io",
  "apiKey": "target_master_key",
  "indexes": {
    "products": {},
    "users": {
      "filter": "is_active = true",
      "overrideSettings": false
    }
  }
}
```
This is particularly useful when migrating from local development to Meilisearch Cloud: no SSH into the server, and no copying multi-gigabyte dump files with scp.

### 4.3. Sorting in the Documents API

Previously, sorting was available only in the Search API. v1.16 adds `sort` to the Documents API, so you can list or export documents in any order:

```
GET /indexes/products/documents?sort=price:desc,name:asc&limit=50
```

## 5. Hybrid search: combining keyword and semantic

### 5.1. Why hybrid is the "right" approach

Keyword search excels when users know exactly what they want (`"iPhone 15 Pro Max 256GB"`) but struggles with descriptive queries (`"điện thoại cao cấp để quay vlog"`, that is, "a premium phone for vlogging"). Semantic (vector) search is the opposite: it captures intent well but can miss an important exact match. **Hybrid** search combines the two with an adjustable weight:

```
graph LR
    A["Query: dien thoai<br/>quay vlog tot"] --> B["Keyword<br/>ranking rules"]
    A --> C["Embedder<br/>OpenAI/Jina/Voyage"]
    C --> D["Vector Search<br/>HNSW"]
    B --> E["Score Fusion<br/>semanticRatio: 0.5"]
    D --> E
    E --> F["Reranker<br/>Optional"]
    F --> G[Final Results]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#16213e,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#4CAF50,stroke:#fff,color:#fff
```
Hybrid search flow in Meilisearch: keyword + vector + rerank

### 5.2. Configuring an embedder

```
// Step 1: configure the embedder (OpenAI)
PATCH /indexes/products/settings/embedders
{
  "openai-small": {
    "source": "openAi",
    "model": "text-embedding-3-small",
    "apiKey": "sk-...",
    "documentTemplate": "A product titled {{doc.name}} in category {{doc.category}} with description: {{doc.description}}",
    "dimensions": 1536
  }
}

// Step 2: run a hybrid search
POST /indexes/products/search
{
  "q": "dien thoai quay vlog tot",
  "hybrid": {
    "embedder": "openai-small",
    "semanticRatio": 0.7
  },
  "limit": 20
}
```
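
The effect of `semanticRatio` can be pictured as a weighted blend of two score lists: 0 means keyword only, 1 means semantic only. The sketch below is a deliberately simplified linear blend and should not be read as Meilisearch's actual merge algorithm; the scores, document IDs, and the `fuse` function are all assumptions for illustration.

```python
def fuse(keyword_hits, semantic_hits, semantic_ratio):
    """Blend two {doc_id: score in [0, 1]} maps into one ranked list of IDs.

    A document missing from one side scores 0 on that side.
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    doc_ids = set(keyword_hits) | set(semantic_hits)
    fused = {
        d: (1 - semantic_ratio) * keyword_hits.get(d, 0.0)
        + semantic_ratio * semantic_hits.get(d, 0.0)
        for d in doc_ids
    }
    # Highest blended score first; ties broken by ID for a stable output.
    return sorted(fused, key=lambda d: (-fused[d], d))

keyword = {"iphone-15": 0.9, "gopro": 0.2}
semantic = {"gopro": 0.8, "sony-zv1": 0.7}

keyword_only = fuse(keyword, semantic, 0.0)
balanced = fuse(keyword, semantic, 0.5)
semantic_only = fuse(keyword, semantic, 1.0)
```

With these made-up scores, the exact-match document leads at ratio 0, drops to second at 0.5, and falls to last at 1, which is the trade-off the ratio is meant to tune.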

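#### Composite embedder: a practical pattern

A useful production pattern is to use a **small local** embedder (for example `all-MiniLM-L6-v2` through Ollama) for queries, where latency matters and calls are free, and a **high-quality remote** embedder (OpenAI, Voyage) for indexing, which runs once per document. Note that the two embedders must produce vectors in the same embedding space (in practice, the same model and dimensions); otherwise query vectors cannot be compared with indexed vectors. Meilisearch in 2026 is turning this pattern into a single toggle through the composite embedder API.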

### 5.3. Reranking: one more step up in relevance

After hybrid search returns the top-K candidates, you can pass them through a reranker model to reorder them. Meilisearch supports Cohere Rerank natively and is extending support to Jina, Voyage, and generic REST rerankers:

```
POST /indexes/products/search
{
  "q": "dien thoai quay vlog",
  "hybrid": {
    "embedder": "openai-small",
    "semanticRatio": 0.5
  },
  "ranker": {
    "model": "cohere-rerank-v3",
    "apiKey": "cohere_...",
    "topK": 10
  },
  "limit": 10
}
```

## 6. Roadmap 2026: where is Meilisearch heading?

In March 2026, Meilisearch announced a roadmap built around four major initiatives:

### 6.1. Any Workload: sharding and serverless

**Sharding (released in v1.37):** splits an index into shards across multiple nodes, removing the single-machine limit. High availability comes from replication.

**Serverless indexes (Q3 2026):** inactive indexes move down to object storage (S3) and spin up when a query arrives. The big use case is multi-tenant SaaS: with 1 million tenants of which only 5% are active at any time, you pay for 50,000 active indexes instead of 1 million warm ones.

### 6.2. Hybrid by Default

- **Guided setup** in the Cloud dashboard: enable semantic search in under 2 minutes
- **AI-generated document templates**: Meilisearch analyzes sample documents and generates a suitable template
- **AI Gateway**: middleware between Meilisearch and any AI provider, handling retry, fallback, auth, metering, and aggressive caching (the same text always maps to the same vector, so repeated inputs can be served from cache)
- **Proprietary embedding and reranking models**: run inside the binary, with no external API required
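
The caching argument for the AI Gateway (same text, same vector) can be sketched in a few lines. Nothing here reflects the gateway's real design, which the roadmap does not detail: `EmbeddingCache`, `fake_embed`, and the model name are hypothetical.

```python
import hashlib

class EmbeddingCache:
    """Memoize embeddings by model name and exact input text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical provider call: str -> list[float]
        self._store = {}
        self.hits = 0
        self.misses = 0

    def embed(self, model: str, text: str):
        # Key on model + text: a different model yields a different vector.
        key = hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]

def fake_embed(text):
    # Stand-in for a real provider: deterministic, not a meaningful embedding.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

cache = EmbeddingCache(fake_embed)
first = cache.embed("demo-model", "áo len xanh")
second = cache.embed("demo-model", "áo len xanh")
```

The second call is served from the cache without touching the provider, which is where the cost and latency savings would come from for repeated queries or re-indexed documents.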

### 6.3. From API to Platform

- The Cloud dashboard exposes every API feature: shard management, index swap, webhooks, transfer
- A "Top 50 slowest requests" dashboard with pattern detection and optimization suggestions
- Per-request tracing (v1.35+) with a detailed breakdown: tokenization, keyword, semantic, formatting
- An AI diagnostic helper that explains slow queries in plain language

### 6.4. RAG at Scale

- Chat engine with parallel multi-search: a single tool call fires several searches concurrently
- Dynamic facet discovery: the LLM explores filterable attributes on its own
- Conversational memory scoped per tenant
- A unified OpenAI-compatible gateway for OpenAI, Anthropic, Mistral, Cohere, Vertex, Bedrock, and Ollama, with fallback chains
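
The parallel multi-search item above boils down to fanning several queries out at once and collecting the results in order. A minimal sketch with the standard library follows; `fake_search` and its tiny corpus are invented stand-ins for real HTTP calls, and the roadmap does not say how Meilisearch implements this internally.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(search_fn, queries, max_workers=4):
    """Run every query concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of `queries` in its output.
        return list(pool.map(search_fn, queries))

def fake_search(query):
    # Hypothetical stand-in for an HTTP call to a search endpoint.
    corpus = {
        "docs": ["webhook setup", "api keys"],
        "faq": ["webhook retries"],
    }
    return [t for t in corpus.get(query["index"], []) if query["q"] in t]

results = run_parallel(fake_search, [
    {"index": "docs", "q": "webhook"},
    {"index": "faq", "q": "webhook"},
    {"index": "docs", "q": "keys"},
])
```

With real network calls, total latency approaches that of the slowest single search rather than the sum of all of them.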

## 7. Comparison with Elasticsearch, Typesense, and Algolia

| Criterion | Meilisearch | Elasticsearch | Typesense | Algolia |
| --- | --- | --- | --- | --- |
| License | MIT (OSS) | AGPLv3 / SSPL / ELv2 | GPLv3 (OSS) | Proprietary |
| Core language | Rust | Java | C++ | C++ (closed) |
| Storage | LMDB (mmap on disk) | Lucene (disk) | In-memory | Proprietary |
| Typical latency | <50ms | 100-500ms | <30ms | <50ms |
| RAM requirement | Flexible | High | At least dataset size | N/A (hosted) |
| Typo tolerance | Built-in, on by default | Requires fuzzy config | Built-in | Built-in |
| Hybrid search | Native (v1.6+) | Via kNN + BM25 | Native | Via NeuralSearch (add-on) |
| Multi-modal | Native (v1.16 Fragments) | Via vectors | Limited | Limited |
| Sharding | v1.37+ (new) | Mature | Yes | Yes |
| RAG/Chat | Native (Chats API) | Via ES Relevance Engine | Limited | Via NeuralSearch |
| Setup complexity | Very simple | Complex | Simple | Simple (hosted) |
| Pricing model | Free self-hosting / usage-based Cloud | Elastic Cloud / self-hosting | Free self-hosting / Cloud | Per operation (expensive) |

## 8. Integrating with .NET Core

### 8.1. Setting up the SDK

```
# Install the package
dotnet add package MeiliSearch
```

```
# docker-compose.yml for development
version: '3.8'
services:
  meilisearch:
    image: getmeili/meilisearch:v1.16
    ports:
      - "7700:7700"
    environment:
      - MEILI_MASTER_KEY=your_master_key_here
      - MEILI_ENV=development
    volumes:
      - meili_data:/meili_data

volumes:
  meili_data:
```

### 8.2. Model and service class

```
using Meilisearch;
using Microsoft.Extensions.Configuration;
// Disambiguate from System.Index, which is in scope by default.
using Index = Meilisearch.Index;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
    public string Category { get; set; } = "";
    public decimal Price { get; set; }
    public DateTime CreatedAt { get; set; }
    // Used by the background sync in 8.4 to find changed rows.
    public DateTime UpdatedAt { get; set; }
    public string[] Tags { get; set; } = Array.Empty<string>();
}

public class MeiliSearchService
{
    private readonly MeilisearchClient _client;
    private readonly Index _productsIndex;

    public MeiliSearchService(IConfiguration config)
    {
        _client = new MeilisearchClient(
            config["Meilisearch:Url"] ?? "http://localhost:7700",
            config["Meilisearch:ApiKey"]
        );
        _productsIndex = _client.Index("products");
    }

    public async Task IndexProductsAsync(IEnumerable<Product> products)
    {
        // Indexing is asynchronous on the server: wait for the task to finish.
        var taskInfo = await _productsIndex.AddDocumentsAsync(products);
        await _client.WaitForTaskAsync(taskInfo.TaskUid);
    }

    public async Task<PaginatedSearchResult<Product>> SearchAsync(
        string query,
        int page = 1,
        int pageSize = 20,
        string? categoryFilter = null)
    {
        var searchQuery = new SearchQuery
        {
            HitsPerPage = pageSize,
            Page = page,
            AttributesToHighlight = new[] { "name", "description" },
            HighlightPreTag = "<mark>",
            HighlightPostTag = "</mark>"
        };

        if (!string.IsNullOrWhiteSpace(categoryFilter))
        {
            // Escape backslashes and quotes so user input cannot break out
            // of the quoted filter value.
            var escaped = categoryFilter
                .Replace("\\", "\\\\")
                .Replace("\"", "\\\"");
            searchQuery.Filter = $"category = \"{escaped}\"";
        }

        var result = await _productsIndex.SearchAsync<Product>(query, searchQuery);
        // With Page/HitsPerPage set, the SDK returns the paginated result
        // shape, which carries Page, TotalPages and TotalHits.
        return (PaginatedSearchResult<Product>)result;
    }

    public async Task ConfigureIndexAsync()
    {
        await _productsIndex.UpdateSearchableAttributesAsync(new[]
        {
            "name",
            "description",
            "category",
            "tags"
        });

        await _productsIndex.UpdateFilterableAttributesAsync(new[]
        {
            "category",
            "price",
            "createdAt"
        });

        await _productsIndex.UpdateSortableAttributesAsync(new[]
        {
            "price",
            "createdAt"
        });

        // Default rules first, then a custom rule favoring newer products.
        await _productsIndex.UpdateRankingRulesAsync(new[]
        {
            "words", "typo", "proximity", "attribute",
            "sort", "exactness", "createdAt:desc"
        });
    }
}
```

### 8.3. API controller

```
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
    private readonly MeiliSearchService _search;

    public ProductsController(MeiliSearchService search) => _search = search;

    [HttpGet("search")]
    public async Task<IActionResult> Search(
        [FromQuery] string q,
        [FromQuery] int page = 1,
        [FromQuery] int pageSize = 20,
        [FromQuery] string? category = null)
    {
        var result = await _search.SearchAsync(q, page, pageSize, category);

        return Ok(new
        {
            hits = result.Hits,
            page = result.Page,
            totalPages = result.TotalPages,
            totalHits = result.TotalHits,
            processingTimeMs = result.ProcessingTimeMs
        });
    }
}
```

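### 8.4. Background sync: from polling to CDC with SQL Server

In a real system the master data lives in a SQL database and Meilisearch is only the search layer. A common sync pattern is a background service that polls for changed rows:

```
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public class ProductSyncService : BackgroundService
{
    private readonly IServiceProvider _sp;
    private readonly MeiliSearchService _search;

    public ProductSyncService(IServiceProvider sp, MeiliSearchService search)
    {
        _sp = sp;
        _search = search;
    }

    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            // DbContext is scoped, so open a fresh scope for each polling pass.
            using var scope = _sp.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

            // GetLastSyncTimeAsync / SaveLastSyncTimeAsync are app-specific
            // helpers (not shown) that persist the sync watermark.
            var lastSync = await GetLastSyncTimeAsync();

            var changed = await db.Products
                .Where(p => p.UpdatedAt > lastSync)
                .AsNoTracking()
                .ToListAsync(stop);

            if (changed.Count > 0)
            {
                await _search.IndexProductsAsync(changed);
                // Advance to the newest row actually read rather than UtcNow,
                // so rows updated during this pass are not skipped.
                await SaveLastSyncTimeAsync(changed.Max(p => p.UpdatedAt));
            }

            await Task.Delay(TimeSpan.FromSeconds(10), stop);
        }
    }
}
```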
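
The watermark logic of the polling loop is easier to check in isolation. Below is a minimal Python sketch of one polling pass under simplified assumptions: rows are plain dictionaries, `index_fn` stands in for the indexing call, and the timestamps are made up.

```python
from datetime import datetime, timezone

def sync_once(rows, watermark, index_fn):
    """Index rows changed after `watermark` and return the new watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    if not changed:
        return watermark
    index_fn(changed)
    # Advance to the newest timestamp actually read, not to "now", so a row
    # written while the batch was in flight is picked up by the next pass.
    return max(r["updated_at"] for r in changed)

def ts(minute):
    # Demo timestamps only.
    return datetime(2026, 4, 20, 10, minute, tzinfo=timezone.utc)

rows = [
    {"id": 1, "updated_at": ts(0)},
    {"id": 2, "updated_at": ts(5)},
]
indexed = []
mark = sync_once(rows, ts(1), indexed.extend)  # picks up id 2 only
rows.append({"id": 3, "updated_at": ts(7)})
mark = sync_once(rows, mark, indexed.extend)   # picks up id 3 only
indexed_ids = [r["id"] for r in indexed]
```

Each row is indexed exactly once across the two passes, and the watermark ends at the timestamp of the last row read.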

#### Scale-up pattern

For large datasets, replace polling with Change Data Capture (SQL Server CDC, Debezium): changes are published to Kafka and a consumer calls `IndexProductsAsync`. This keeps Meilisearch eventually consistent with the master database within a few seconds and tolerates very high write loads. Combine it with `UpdateDocumentsAsync` (partial update) to avoid shipping the whole document every time a single field changes.

## 9. Performance and real-world benchmarks

A representative benchmark on a dataset of 1 million e-commerce products (~2 GB of JSON):

| Operation | Meilisearch v1.16 | Elasticsearch 8.x | Typesense |
| --- | --- | --- | --- |
| Initial indexing (1M docs) | ~8 min | ~15 min | ~5 min |
| Index size on disk | ~2.8 GB | ~4.5 GB | ~3.2 GB (RAM) |
| RAM while searching (2 GB dataset) | ~500 MB | ~4 GB | 3.2 GB (required) |
| P50 latency (keyword) | 12ms | 45ms | 8ms |
| P99 latency (keyword) | 48ms | 180ms | 35ms |
| P50 latency (hybrid) | 35ms | 120ms | 28ms |
| QPS, single node (4 vCPU) | ~2,500 | ~1,200 | ~3,800 |
| Update latency | 100-500ms | 1-3s | 50-200ms |

#### Benchmarks: looking beyond the numbers

Typesense wins on raw speed because it keeps everything in RAM, but the trade-off is that you must provision enough RAM for the whole dataset. Meilisearch wins on performance per cost: a 10 GB dataset runs well on a machine with 4 GB of RAM, where Typesense, which must hold the index in memory, would run out of RAM. Elasticsearch loses on both counts but makes up for it with an ecosystem (Kibana, Logstash, ingest pipelines) that Meilisearch does not have.

## 10. Chat API: RAG inside Meilisearch

Since v1.15.1, Meilisearch has shipped a Chats API, an OpenAI-compatible endpoint that enables conversational retrieval without building a separate RAG pipeline:

```
// Configure the chat workspace
PATCH /chats/docs-support/settings
{
  "source": "openAi",
  "orgId": "org-...",
  "apiKey": "sk-...",
  "baseUrl": "https://api.openai.com/v1",
  "prompts": {
    "system": "You are our documentation support assistant. Answer only from the retrieved documents."
  }
}

// Enable the tool per index
PATCH /indexes/docs/settings/chat
{
  "description": "Product technical documentation, including FAQs and guides",
  "documentTemplate": "Title: {{doc.title}}\nContent: {{doc.content}}",
  "searchParameters": {
    "hybrid": { "embedder": "openai-small", "semanticRatio": 0.6 },
    "limit": 5
  }
}

// Send a chat request (OpenAI-compatible)
POST /chats/docs-support/chat/completions
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "How do I configure a webhook?" }
  ],
  "stream": true
}
```

Meilisearch then (1) uses the LLM to turn the message into a search query, (2) queries the index, (3) injects the retrieved documents into the prompt, and (4) streams the response. Citation events are emitted alongside the content stream so the UI can highlight sources. This is a fast way to add an AI chatbot to an app without building a complex RAG pipeline yourself.
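
The four steps can be written out as a small function to show what the Chats API saves you from building. This is a sketch of the general RAG pattern, not of Meilisearch's internals: the three `fake_*` functions replace the LLM and the search index, and the document text is a placeholder.

```python
def answer(message, rewrite_fn, search_fn, generate_fn, limit=5):
    """Run one retrieve-then-generate round and return answer plus sources."""
    query = rewrite_fn(message)        # (1) turn the message into a search query
    docs = search_fn(query)[:limit]    # (2) query the index
    context = "\n".join(               # (3) inject the documents into the prompt
        f"[{i + 1}] {d['title']}: {d['content']}" for i, d in enumerate(docs)
    )
    prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {message}"
    return {
        "answer": generate_fn(prompt),  # (4) generate the reply
        "sources": [d["title"] for d in docs],
    }

def fake_rewrite(message):
    # A real system would ask the LLM for a search query.
    return "webhook configuration"

def fake_search(query):
    # Placeholder document, not real product documentation.
    return [{"title": "Webhooks", "content": "Placeholder text about webhook setup."}]

def fake_generate(prompt):
    # Echo the first cited source line instead of calling a model.
    return prompt.splitlines()[1]

result = answer("How do I configure a webhook?", fake_rewrite, fake_search, fake_generate)
```

Returning `sources` next to the answer is what lets a UI show citations, which corresponds to the citation events mentioned above.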

## 11. When should you choose Meilisearch?

```
graph TD
    A[Need a search engine] --> B{Dataset size?}
    B -->|"< 100M docs"| C{Need AI/semantic?}
    B -->|"> 100M docs + complex analytics"| D[Elasticsearch]

    C -->|Yes| E{Prefer OSS?}
    C -->|"No, keyword only"| F{Latency critical?}

    E -->|Yes| G[✅ Meilisearch]
    E -->|"No, budget OK"| H[Algolia]

    F -->|"Yes, enough RAM"| I[Typesense]
    F -->|Balance cost| G

    style A fill:#e94560,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#2c3e50,stroke:#fff,color:#fff
    style I fill:#2c3e50,stroke:#fff,color:#fff
```
Decision tree for choosing a search engine by workload
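
For readers who prefer code to diagrams, the same decision tree can be expressed as a function. The parameter names are invented for this sketch, and one case the diagram leaves open (more than 100M documents without complex analytics) is assumed here to fall through to the remaining questions.

```python
def choose_engine(docs_millions, complex_analytics, needs_semantic,
                  prefers_oss, latency_critical, ram_fits_dataset):
    """Walk the decision tree above and return the suggested engine."""
    if docs_millions > 100 and complex_analytics:
        return "Elasticsearch"
    if needs_semantic:
        return "Meilisearch" if prefers_oss else "Algolia"
    # Keyword-only workloads: raw latency versus cost balance.
    if latency_critical and ram_fits_dataset:
        return "Typesense"
    return "Meilisearch"

small_saas = choose_engine(5, False, True, True, False, False)
log_platform = choose_engine(500, True, False, True, False, False)
autocomplete = choose_engine(2, False, False, True, True, True)
```

As with the diagram, this is a rule of thumb rather than a substitute for benchmarking on your own data.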

#### ✅ Choose Meilisearch when:

- You need search-as-you-type latency under 50ms for a web or mobile app
- The team is small and wants a fast setup, readable docs, and self-hosting on an ordinary VPS
- The dataset ranges from a few thousand to tens of millions of documents
- You need hybrid search (keyword + semantic) out of the box
- You want to add a RAG chatbot without building the pipeline yourself
- The budget is limited and you do not want to be locked into Algolia pricing

#### ❌ Consider alternatives when:

- You need complex analytics (multi-level aggregations, time series): Elasticsearch/OpenSearch
- You have hundreds of millions of documents under heavy traffic: wait for Meilisearch sharding to mature, or use Elasticsearch
- You need a log management, APM, or SIEM ecosystem: Elastic Stack
- The team wants a fully hosted solution with no infrastructure to manage: Algolia

## 12. Meilisearch development timeline

- **2018**: Meilisearch is created by a team of former Algolia engineers, open source under MIT from the start
- **2021**: v0.20, with stable typo tolerance and custom ranking rules
- **2023**: v1.0 GA, production-ready with a stable API
- **2023 Q4**: v1.6, introducing vector search (HNSW) and hybrid search
- **2024**: v1.10 to v1.13, with a stable Embedder API, Meilisearch Cloud GA, and Chat GA
- **2025 Q4**: v1.15, with typo tolerance disabled on numbers, lexicographic string filters, improved Chinese tokenization, and the Chats API
- **2026 Q1**: v1.16, with the Fragments API (multi-modal), Exports API, and Documents sort
- **2026 Q2**: v1.35 to v1.37, with per-request tracing and the sharding release
- **2026 Q3 (planned)**: serverless indexes, AI Gateway, proprietary embedding and reranking models

## Summary

Meilisearch in 2026 is no longer just "open-source Algolia". It has evolved into a unified platform for search and AI retrieval, with features that a few years ago required combining Elasticsearch + Pinecone + LangChain + a custom reranker. With the Fragments API, the upcoming AI Gateway, serverless indexes, and proprietary embedding models, Meilisearch is betting heavily on a future where every app needs intelligent search rather than plain full-text.

For .NET developers, Meilisearch is a hard-to-beat choice if you need a modern, low-latency, hybrid-ready search engine and do not want to manage a JVM-based Elasticsearch cluster. A five-minute Docker setup, a clean C# SDK, and excellent docs are the DX qualities that Vietnamese developers consistently value.


# Meilisearch 2026 — Deep dive: Hybrid Search, Fragments API và tương lai AI Retrieval

## 1. Vì sao Meilisearch trở thành search engine đáng chú ý nhất 2026

Nếu Elasticsearch là "con bò tót" của thế giới search — mạnh mẽ, vạn năng nhưng cồng kềnh, thì Meilisearch là con dao mổ chính xác: nhỏ gọn, nhanh đến mức khó tin, và được thiết kế cho **developer experience** từ ngày đầu. Ra đời năm 2018 tại Pháp bởi một team gồm ba kỹ sư từng làm việc với Algolia, Meilisearch kế thừa triết lý "search instant" nhưng đi theo con đường open source với giấy phép MIT.

Năm 2026 đánh dấu bước chuyển mình lớn của Meilisearch: từ một full-text search engine thuần túy, nó trở thành **unified search & AI retrieval platform** — tích hợp hybrid search (keyword + semantic), vector search, multi-modal (Fragments API), chat engine cho RAG, và sắp tới là sharding + serverless indexes. Bài viết này sẽ đào sâu vào kiến trúc, các tính năng mới nhất của v1.15/v1.16, roadmap 2026, và cách tích hợp thực tế với .NET Core.

<50ms Response time điển hình

48K+ GitHub Stars

v1.16 Phiên bản mới nhất

Rust Ngôn ngữ core

## 2. Kiến trúc bên trong — Rust, LMDB và ranking rules

### 2.1. Vì sao Rust + LMDB lại quan trọng

Meilisearch được viết hoàn toàn bằng **Rust** — ngôn ngữ cho phép tận dụng hiệu năng gần bằng C/C++ nhưng với memory safety tuyệt đối. Storage engine là **LMDB** (Lightning Memory-Mapped Database) — một key-value store B+Tree được Symas phát triển, nổi tiếng với đặc điểm *zero-copy reads* và hỗ trợ MVCC (Multi-Version Concurrency Control).

Sự khác biệt then chốt so với Typesense (keep everything in RAM) là Meilisearch sử dụng **memory-mapped files**: OS chỉ load các page cần thiết vào RAM theo nguyên lý demand paging. Điều này cho phép index dataset lớn hơn RAM vật lý mà vẫn giữ được tốc độ đọc gần như in-memory (nhờ OS page cache).

```
graph TB
    A[Search Request] --> B[HTTP Server  
Actix-web]
    B --> C[Query Parser]
    C --> D{Query Type}
    D -->|Keyword| E[Inverted Index]
    D -->|Semantic| F[Vector Index  
HNSW]
    D -->|Hybrid| G[Both + Rerank]
    E --> H[Ranking Rules  
Bucket Sort]
    F --> H
    G --> H
    H --> I[LMDB  
Memory-mapped]
    I --> J[Response]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style I fill:#16213e,stroke:#fff,color:#fff
    style J fill:#4CAF50,stroke:#fff,color:#fff
    style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50

```
Pipeline xử lý request trong Meilisearch — từ HTTP tới LMDB

### 2.2. Ranking Rules — Trái tim của relevance

Khác với Elasticsearch dùng BM25 làm mặc định, Meilisearch có hệ thống **ranking rules theo tầng** (bucket sort) — mỗi rule lọc dần ứng viên thay vì tính điểm tổng hợp. Thứ tự mặc định:

1. **Words** — document chứa nhiều term của query nhất
2. **Typo** — ưu tiên document có ít lỗi chính tả hơn
3. **Proximity** — các term xuất hiện gần nhau
4. **Attribute** — term match attribute ưu tiên cao hơn (title > description)
5. **Sort** — custom sort theo field số/chuỗi
6. **Exactness** — exact match được ưu tiên

Bạn có thể thêm custom rule, ví dụ `release_date:desc` để ưu tiên bài mới hơn. Điểm mạnh của cách tiếp cận này là **deterministic và dễ debug**: lý do một document được xếp hạng cao có thể trace từng rule một, không phải là "magic score" như BM25.

```
// Cấu hình ranking rules
{
  "rankingRules": [
    "words",
    "typo",
    "proximity",
    "attribute",
    "sort",
    "exactness",
    "release_date:desc",
    "popularity:desc"
  ]
}
```

## 3. Meilisearch v1.15 — Typo số, string filter và tiếng Trung

### 3.1. Disable typo tolerance cho số

Trước v1.15, Meilisearch áp dụng typo tolerance cho mọi từ kể cả số. Điều này gây phiền toái với dữ liệu như mã bưu điện, số điện thoại, năm — tìm `2024` có thể trả về cả `2025` hay `2004`. Phiên bản 1.15 cho phép tắt typo tolerance riêng cho số:

```
PATCH /indexes/{index_uid}/settings/typo-tolerance
{
  "disableOnNumbers": true
}
```
Bonus: indexing nhanh hơn đáng kể với dataset có nhiều số unique, vì Meilisearch không cần build typo variations cho chúng.

### 3.2. Lexicographic string filter

Operator so sánh (`<`, `<=`, `>`, `>=`, `TO`) giờ hoạt động với chuỗi, sắp xếp theo thứ tự lexicographic. Cực kỳ hữu dụng với date ISO format:

```
// Tìm bài viết trước ngày 17/07/2023
POST /indexes/posts/search
{
  "q": "rust",
  "filter": "release_date < \"2023-07-17\""
}

// Range: các phiên bản từ v1.10 đến v1.15
{
  "filter": "version \"1.10\" TO \"1.15\""
}
```

### 3.3. Cải thiện tokenizer tiếng Trung

Tokenizer **Charabia** của Meilisearch được cải tiến để segment tiếng Trung chính xác hơn — quan trọng vì tiếng Trung không có space giữa các từ như tiếng Anh. **Lưu ý:** nếu dataset của bạn có nội dung tiếng Trung, bắt buộc phải reindex sau khi upgrade lên v1.15, nếu không query tiếng Trung có thể bị ignore hoàn toàn.

#### Tiếng Việt trong Meilisearch

Meilisearch hỗ trợ tiếng Việt ổn ở mức cơ bản qua Unicode normalization, nhưng chưa có rule đặc biệt cho thanh điệu/dấu. Để tối ưu, nên thêm cả version có dấu và không dấu vào `searchableAttributes`, hoặc dùng `synonyms` map (`"ơn": ["on"]`). Một số developer dùng pre-processing Python với `unidecode` để tạo field `title_unsigned` song song.

## 4. Meilisearch v1.16 — Fragments API, Exports và Documents Sort

### 4.1. Fragments API — Multi-modal search

Đây là tính năng *đáng giá nhất* của v1.16. Fragments cho phép decompose document và query thành các phần ngữ nghĩa riêng biệt — ví dụ một sản phẩm có thể có fragment text (tên, mô tả) và fragment image (URL ảnh). Ứng dụng thực tế: người dùng mô tả "áo len xanh dương dáng rộng" → Meilisearch dùng text embedding match description VÀ image embedding match ảnh → kết quả tốt hơn hẳn full-text truyền thống.

```
// Cấu hình fragments cho movie index
PATCH /indexes/movies/settings/embedders
{
  "multimodal": {
    "source": "rest",
    "url": "https://api.openai.com/v1/embeddings",
    "indexingFragments": {
      "textPart": {
        "value": {
          "text": "{{doc.title}}. {{doc.description}}"
        }
      },
      "imagePart": {
        "value": {
          "image": "{{doc.poster_url}}"
        }
      }
    },
    "searchFragments": {
      "queryText": {
        "value": { "text": "{{q}}" }
      },
      "queryImage": {
        "value": { "image": "{{media.image}}" }
      }
    }
  }
}

// Search với ảnh base64
POST /indexes/movies/search
{
  "q": "phim hành động",
  "hybrid": {
    "embedder": "multimodal",
    "semanticRatio": 0.7
  },
  "media": {
    "image": "data:image/jpeg;base64,/9j/4AAQ..."
  }
}
```

### 4.2. Exports API — Migration không cần dump

```
POST /experimental-features
{
  "exports": true
}

POST /export
{
  "url": "https://target-instance.meilisearch.io",
  "apiKey": "target_master_key",
  "indexes": {
    "products": {},
    "users": {
      "filter": "is_active = true",
      "overrideSettings": false
    }
  }
}
```
Particularly hữu dụng khi migrate từ local dev lên Meilisearch Cloud — không cần SSH vào server, không cần scp file dump hàng GB.

### 4.3. Sort trên Documents API

Trước đây sort chỉ có trong Search API; v1.16 thêm sort cho Documents API — list/export documents theo thứ tự bất kỳ:

```
GET /indexes/products/documents?sort=price:desc,name:asc&limit=50
```

## 5. Hybrid Search — Kết hợp keyword + semantic

### 5.1. Vì sao hybrid mới là "đúng đắn"

Keyword search xuất sắc khi user biết chính xác họ tìm gì (`"iPhone 15 Pro Max 256GB"`), nhưng thất bại với query mô tả (`"điện thoại cao cấp để quay vlog"`). Semantic search (vector) ngược lại: hiểu intent tốt nhưng có thể miss exact match quan trọng. **Hybrid** kết hợp cả hai với tỷ trọng có thể điều chỉnh:

```
graph LR
    A[Query: dien thoai  
quay vlog tot] --> B[Keyword  
BM25-like]
    A --> C[Embedder  
OpenAI/Jina/Voyage]
    C --> D[Vector Search  
HNSW]
    B --> E[Score Fusion  
semanticRatio: 0.5]
    D --> E
    E --> F[Reranker  
Optional]
    F --> G[Final Results]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#16213e,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#4CAF50,stroke:#fff,color:#fff

```
Luồng hybrid search trong Meilisearch — keyword + vector + rerank

### 5.2. Cấu hình embedder

```
// Bước 1: Cấu hình embedder (OpenAI)
PATCH /indexes/products/settings/embedders
{
  "openai-small": {
    "source": "openAi",
    "model": "text-embedding-3-small",
    "apiKey": "sk-...",
    "documentTemplate": "A product titled {{doc.name}} in category {{doc.category}} with description: {{doc.description}}",
    "dimensions": 1536
  }
}

// Step 2: Run a hybrid search
POST /indexes/products/search
{
  "q": "dien thoai quay vlog tot",
  "hybrid": {
    "embedder": "openai-small",
    "semanticRatio": 0.7
  },
  "limit": 20
}
```
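
The `documentTemplate` decides which text is actually sent to the embedder for each document. The sketch below shows the idea in Python with a deliberately tiny placeholder substitution; Meilisearch itself uses Liquid templates, so `render_template` is a hypothetical stand-in that only handles the simple `{{doc.field}}` form used above.

```python
import re

# Matches simple placeholders such as {{doc.name}} or {{ doc.name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*doc\.(\w+)\s*\}\}")


def render_template(template: str, doc: dict) -> str:
    """Replace {{doc.field}} placeholders with values from the document."""
    def substitute(match: re.Match) -> str:
        value = doc.get(match.group(1))
        # Missing or null fields render as an empty string in this sketch.
        return "" if value is None else str(value)

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = ("A product titled {{doc.name}} in category {{doc.category}} "
                "with description: {{doc.description}}")
    doc = {"name": "Pixel 9", "category": "phones",
           "description": "Great camera for vlogging"}
    print(render_template(template, doc))
```

Previewing the rendered text for a few sample documents is a cheap way to check that the template contains the fields that matter for relevance and nothing noisy.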

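#### Composite embedder: a practical pattern

The most useful pattern for production is to use a **small local** embedder (for example `all-MiniLM-L6-v2` served through Ollama) for queries (low latency, free) and a **high-quality remote** embedder (OpenAI, Voyage) for indexing (runs once, better vectors). Note that query and document vectors are only comparable when both embedders produce the same embedding space, which in practice means the same model and dimensions on both sides. Meilisearch 2026 is turning this pattern into a single toggle through the composite embedder API.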

### 5.3. Reranking: one more step up in relevance

```
POST /indexes/products/search
{
  "q": "dien thoai quay vlog",
  "hybrid": {
    "embedder": "openai-small",
    "semanticRatio": 0.5
  },
  "ranker": {
    "model": "cohere-rerank-v3",
    "apiKey": "cohere_...",
    "topK": 10
  },
  "limit": 10
}
```

## 6. Roadmap 2026: where is Meilisearch heading?

In March 2026, Meilisearch published a roadmap built around four major initiatives:

### 6.1. Any Workload — Sharding & Serverless

**Sharding (released in v1.37):** splits an index into multiple shards across multiple nodes, removing the single-machine limit. High availability comes from replication.

**Serverless Indexes (Q3 2026):** inactive indexes are moved to object storage (S3) and spun up when a query arrives. The main use case is multi-tenant SaaS with millions of tenants of which only 5% are active at the same time: instead of paying for 1 million warm indexes, you pay only for the 50,000 active ones.
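
The arithmetic behind that example is simple enough to check directly. The Python sketch below uses a hypothetical helper, `warm_index_count`, and takes the active share as a whole-number percentage to avoid floating-point rounding.

```python
def warm_index_count(total_tenants: int, active_percent: int) -> int:
    """Number of indexes that must stay warm if only a share of tenants is active."""
    return total_tenants * active_percent // 100


if __name__ == "__main__":
    total = 1_000_000
    warm = warm_index_count(total, 5)
    cold = total - warm
    print(f"warm: {warm}, moved to object storage: {cold}")
```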

### 6.2. Hybrid by Default

- **Guided setup** on the Cloud dashboard: enable semantic search in under 2 minutes
- **AI-generated document templates**: Meilisearch analyzes sample documents and generates an optimized template
- **AI Gateway**: middleware between Meilisearch and any AI provider, handling retry, fallback, auth, metering, and aggressive caching (same text → same vector → 100% cache hit)
- **Proprietary embedding + reranking models**: run directly inside the binary, with no external API required
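
The caching idea in the AI Gateway item (same text, same vector) can be sketched in a few lines of Python. This is an illustration of the principle only, not the gateway's design: `EmbeddingCache` and the injected `embed` callable are hypothetical names.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by a hash of (model, text) so repeated text is free."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._embed = embed  # hypothetical function that calls the real provider
        self._store: dict[str, list[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, model: str, text: str) -> list[float]:
        # The model name is part of the key: different models give different vectors.
        key = hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        vector = self._embed(text)
        self._store[key] = vector
        return vector


if __name__ == "__main__":
    cache = EmbeddingCache(lambda text: [float(len(text))])  # fake embedder
    cache.get("demo-model", "hello")
    cache.get("demo-model", "hello")
    print(cache.hits, cache.misses)
```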

### 6.3. From API to Platform

- The Cloud dashboard exposes every API feature: shard management, index swap, webhooks, transfer
- A "Top 50 slowest requests" dashboard with pattern detection and optimization suggestions
- Per-request tracing (v1.35+) with a detailed breakdown: tokenization, keyword, semantic, formatting
- An AI diagnostic helper that explains slow queries in plain language

### 6.4. RAG at Scale

- Chat engine with **parallel multi-search**: a single tool call fires several searches in parallel
- Dynamic facet discovery: the LLM explores filterable attributes on its own
- Conversational memory scoped per tenant
- Unified OpenAI-compatible gateway: OpenAI, Anthropic, Mistral, Cohere, Vertex, Bedrock, Ollama, with fallback chains

## 7. Comparison with Elasticsearch, Typesense, and Algolia

| Criterion | Meilisearch | Elasticsearch | Typesense | Algolia |
| --- | --- | --- | --- | --- |
| **License** | MIT (OSS) | SSPL/AGPL | GPLv3 (OSS) | Proprietary |
| **Core language** | Rust | Java | C++ | C++ (closed) |
| **Storage** | LMDB (mmap disk) | Lucene (disk) | In-memory | Proprietary |
| **Typical latency** | <50ms | 100-500ms | <30ms | <50ms |
| **RAM requirement** | Flexible | High | Equal to dataset size | N/A (hosted) |
| **Typo tolerance** | Built-in, on by default | Requires fuzzy config | Built-in | Built-in |
| **Hybrid search** | Native (v1.6+) | Via kNN + BM25 | Native | Via NeuralSearch (add-on) |
| **Multi-modal** | Native (v1.16 Fragments) | Via vectors | Limited | Limited |
| **Sharding** | v1.37+ (new) | Mature | Yes | Yes |
| **RAG/Chat** | Native (Chats API) | Via ES Relevance Engine | Limited | Via NeuralSearch |
| **Setup complexity** | Very simple | Complex | Simple | Simple (hosted) |
| **Pricing model** | Self-host free / Cloud usage-based | Elastic Cloud / self-host | Self-host free / Cloud | Per operation (expensive) |

## 8. Integrating with .NET Core

### 8.1. Setup SDK

```
# Install the package
dotnet add package MeiliSearch

# docker-compose.yml for development
version: '3.8'
services:
  meilisearch:
    image: getmeili/meilisearch:v1.16
    ports:
      - "7700:7700"
    environment:
      - MEILI_MASTER_KEY=your_master_key_here
      - MEILI_ENV=development
    volumes:
      - meili_data:/meili_data

volumes:
  meili_data:
```

### 8.2. Model and service class

```
using Meilisearch;
using Microsoft.Extensions.Configuration;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
    public DateTime CreatedAt { get; set; }
    public string[] Tags { get; set; }
}

public class MeiliSearchService
{
    private readonly MeilisearchClient _client;
    // Fully qualified to avoid ambiguity with System.Index.
    private readonly Meilisearch.Index _productsIndex;

    public MeiliSearchService(IConfiguration config)
    {
        _client = new MeilisearchClient(
            config["Meilisearch:Url"] ?? "http://localhost:7700",
            config["Meilisearch:ApiKey"]
        );
        _productsIndex = _client.Index("products");
    }

    public async Task IndexProductsAsync(IEnumerable<Product> products)
    {
        // AddDocumentsAsync only enqueues a task; wait until indexing has finished.
        var taskInfo = await _productsIndex.AddDocumentsAsync(products);
        await _client.WaitForTaskAsync(taskInfo.TaskUid);
    }

    public async Task<PaginatedSearchResult<Product>> SearchAsync(
        string query,
        int page = 1,
        int pageSize = 20,
        string? categoryFilter = null)
    {
        var searchQuery = new SearchQuery
        {
            HitsPerPage = pageSize,
            Page = page,
            AttributesToHighlight = new[] { "name", "description" },
            HighlightPreTag = "<mark>",
            HighlightPostTag = "</mark>"
        };

        if (!string.IsNullOrWhiteSpace(categoryFilter))
        {
            // Escape backslashes and quotes so user input cannot alter the filter expression.
            var escaped = categoryFilter.Replace("\\", "\\\\").Replace("\"", "\\\"");
            searchQuery.Filter = $"category = \"{escaped}\"";
        }

        // With HitsPerPage/Page set, the SDK returns a page-based result
        // that exposes Page, TotalPages and TotalHits.
        var result = await _productsIndex.SearchAsync<Product>(query, searchQuery);
        return (PaginatedSearchResult<Product>)result;
    }

    public async Task ConfigureIndexAsync()
    {
        await _productsIndex.UpdateSearchableAttributesAsync(new[]
        {
            "name",
            "description",
            "category",
            "tags"
        });

        await _productsIndex.UpdateFilterableAttributesAsync(new[]
        {
            "category",
            "price",
            "createdAt"
        });

        await _productsIndex.UpdateSortableAttributesAsync(new[]
        {
            "price",
            "createdAt"
        });

        // Default rules first, then newest products as a final tie-breaker.
        await _productsIndex.UpdateRankingRulesAsync(new[]
        {
            "words", "typo", "proximity", "attribute",
            "sort", "exactness", "createdAt:desc"
        });
    }
}
```
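
One detail of the `category` filter deserves a closer look: the value comes from user input and is embedded in a quoted filter string. The Python sketch below shows one way to escape it before interpolation. The helper names `quote_filter_value` and `category_filter` are hypothetical, and the sketch assumes that backslash-escaping quotes inside a double-quoted filter value is sufficient.

```python
def quote_filter_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def category_filter(category: str) -> str:
    """Build a filter expression that matches exactly one category."""
    return f"category = {quote_filter_value(category)}"


if __name__ == "__main__":
    print(category_filter("phones"))
    # A hostile value stays inside the quoted string instead of adding conditions.
    print(category_filter('a" OR price > 0 OR category = "b'))
```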

### 8.3. API Controller

```
[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
    private readonly MeiliSearchService _search;

    public ProductsController(MeiliSearchService search) => _search = search;

    [HttpGet("search")]
    public async Task<IActionResult> Search(
        [FromQuery] string q,
        [FromQuery] int page = 1,
        [FromQuery] int pageSize = 20,
        [FromQuery] string? category = null)
    {
        var result = await _search.SearchAsync(q, page, pageSize, category);

        return Ok(new
        {
            hits = result.Hits,
            page = result.Page,
            totalPages = result.TotalPages,
            totalHits = result.TotalHits,
            processingTimeMs = result.ProcessingTimeMs
        });
    }
}
```

### 8.4. Background sync: CDC pattern with SQL Server

In a real system the master data lives in a SQL database and Meilisearch is only the search layer. A common sync pattern looks like this:

```
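public class ProductSyncService : BackgroundService
{
    private readonly IServiceProvider _sp;
    private readonly MeiliSearchService _search;

    // MeiliSearchService is assumed to be registered as a singleton.
    public ProductSyncService(IServiceProvider sp, MeiliSearchService search)
    {
        _sp = sp;
        _search = search;
    }

    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            // DbContext is scoped, so create a fresh scope for each polling cycle.
            using var scope = _sp.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

            // GetLastSyncTimeAsync/SaveLastSyncTimeAsync persist the watermark (not shown here).
            var lastSync = await GetLastSyncTimeAsync();

            var changed = await db.Products
                .Where(p => p.UpdatedAt > lastSync)
                .AsNoTracking()
                .ToListAsync(stop);

            if (changed.Count > 0)
            {
                await _search.IndexProductsAsync(changed);
                // Advance to the newest row actually synced, so rows updated
                // while this cycle was running are picked up next time.
                await SaveLastSyncTimeAsync(changed.Max(p => p.UpdatedAt));
            }

            await Task.Delay(TimeSpan.FromSeconds(10), stop);
        }
    }
}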
```
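
The correctness of this kind of polling hinges on how the watermark is advanced. The Python sketch below isolates that logic with hypothetical names (`Row`, `next_batch`): it moves the watermark to the newest `updated_at` actually seen rather than to the wall clock, so a row modified while a batch is being indexed is not skipped on the next cycle.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Row:
    id: int
    updated_at: datetime


def next_batch(rows: list[Row], last_sync: datetime) -> tuple[list[Row], datetime]:
    """Return rows changed since `last_sync` and the watermark to persist afterwards."""
    changed = sorted(
        (row for row in rows if row.updated_at > last_sync),
        key=lambda row: row.updated_at,
    )
    # If nothing changed, the watermark stays where it was.
    new_watermark = changed[-1].updated_at if changed else last_sync
    return changed, new_watermark


if __name__ == "__main__":
    t = lambda minute: datetime(2026, 1, 1, 12, minute, tzinfo=timezone.utc)
    rows = [Row(1, t(0)), Row(2, t(5)), Row(3, t(9))]
    batch, watermark = next_batch(rows, last_sync=t(3))
    print([row.id for row in batch], watermark.isoformat())
```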

#### Scale-up pattern

For large datasets, replace polling with **Change Data Capture** (SQL Server CDC, Debezium) → publish messages to Kafka → a consumer calls `IndexProductsAsync`. This pattern keeps Meilisearch *eventually consistent* with the master database within a few seconds and handles very high write loads. Combine it with `UpdateDocumentsAsync` (partial update) to avoid shipping the whole document every time a single field changes.

## 9. Performance and real-world benchmarks

A typical benchmark on a dataset of 1 million e-commerce products (~2 GB of JSON):

| Operation | Meilisearch v1.16 | Elasticsearch 8.x | Typesense |
| --- | --- | --- | --- |
| Initial indexing (1M docs) | ~8 min | ~15 min | ~5 min |
| Index size on disk | ~2.8 GB | ~4.5 GB | ~3.2 GB (RAM) |
| RAM during search (2 GB dataset) | ~500 MB | ~4 GB | 3.2 GB (required) |
| P50 latency (keyword) | 12ms | 45ms | 8ms |
| P99 latency (keyword) | 48ms | 180ms | 35ms |
| P50 latency (hybrid) | 35ms | 120ms | 28ms |
| QPS single node (4 vCPU) | ~2,500 | ~1,200 | ~3,800 |
| Update latency | 100-500ms | 1-3s | 50-200ms |

#### Benchmark: looking beyond the numbers

Typesense wins on raw speed because it keeps everything in RAM, but the tradeoff is that you must provision enough RAM for the whole dataset. Meilisearch wins on **performance per cost**: a 10 GB dataset runs well on a machine with 4 GB of RAM, whereas Typesense would run out of memory. Elasticsearch loses on both counts but makes up for it with an ecosystem (Kibana, Logstash, ingest pipelines) that Meilisearch does not have.
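
When reproducing latency figures like the P50 and P99 rows above on your own data, it helps to be explicit about how percentiles are computed. The Python sketch below uses the nearest-rank method; the `percentile` helper and the sample values are illustrative assumptions, not the method or data behind the table.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in milliseconds: mostly fast, with a slow tail.
    latencies = [10.0] * 98 + [40.0, 200.0]
    print(percentile(latencies, 50), percentile(latencies, 99))
```

A single slow outlier barely moves P50 but dominates the high percentiles, which is why both are worth reporting.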

## 10. Chat API: RAG directly in Meilisearch

Since v1.15.1, Meilisearch has shipped a **Chats API**, an OpenAI-compatible endpoint that enables conversational retrieval without building a separate RAG pipeline:

```
// Configure the chat workspace
POST /chats/docs-support/settings
{
  "source": "openAi",
  "orgId": "org-...",
  "apiKey": "sk-...",
  "baseUrl": "https://api.openai.com/v1",
  "prompts": {
    "system": "You are our documentation support assistant. Answer only from the retrieved documents."
  }
}

// Enable the tool per index
PATCH /indexes/docs/settings/chat
{
  "description": "Product technical documentation, including FAQ and guides",
  "documentTemplate": "Title: {{doc.title}}\nContent: {{doc.content}}",
  "searchParameters": {
    "hybrid": { "embedder": "openai-small", "semanticRatio": 0.6 },
    "limit": 5
  }
}

// Send a chat request (OpenAI-compatible)
POST /chats/docs-support/chat/completions
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "How do I configure a webhook?" }
  ],
  "stream": true
}
```
Meilisearch handles the whole loop itself: (1) it uses the LLM to generate a search query from the message, (2) queries the index, (3) injects the retrieved documents into the prompt, and (4) streams the response. Citation events are emitted alongside the content stream so the UI can highlight sources. This is the quickest way to add an AI chatbot to an app without building a complex RAG pipeline yourself.
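
The four steps can be sketched as a small Python generator. This only illustrates the control flow: `rewrite_query`, `search` and `complete` are hypothetical callables standing in for the LLM and the index, and nothing here reflects how Meilisearch implements the loop internally.

```python
from typing import Callable, Iterable, Iterator


def answer(
    question: str,
    rewrite_query: Callable[[str], str],
    search: Callable[[str, int], list[str]],
    complete: Callable[[str], Iterable[str]],
    limit: int = 5,
) -> Iterator[str]:
    """Retrieve-then-generate loop that yields the answer chunk by chunk."""
    # (1) Turn the user message into a search query.
    query = rewrite_query(question)
    # (2) Query the index for the most relevant documents.
    documents = search(query, limit)
    # (3) Inject the documents into the prompt, numbered so they can be cited.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    prompt = (
        "Answer using only these documents:\n"
        f"{context}\n\nQuestion: {question}"
    )
    # (4) Stream the model's response back to the caller.
    yield from complete(prompt)


if __name__ == "__main__":
    chunks = answer(
        "How do I configure a webhook?",
        rewrite_query=lambda q: "webhook configuration",
        search=lambda q, n: ["Webhooks live under Settings."][:n],
        complete=lambda prompt: iter(["Use ", "Settings."]),
    )
    print("".join(chunks))
```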

## 11. When should you choose Meilisearch?

```
graph TD
    A[Need a search engine] --> B{Dataset size?}
    B -->|<100M docs| C{Need AI/semantic?}
    B -->|>100M docs + complex analytics| D[Elasticsearch]

    C -->|Yes| E{Prefer OSS?}
    C -->|No, keyword only| F{Is latency critical?}

    E -->|Yes| G[✅ Meilisearch]
    E -->|No, budget OK| H[Algolia]

    F -->|Yes, enough RAM| I[Typesense]
    F -->|Balance cost| G

    style A fill:#e94560,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#2c3e50,stroke:#fff,color:#fff
    style I fill:#2c3e50,stroke:#fff,color:#fff
```
Decision tree for choosing a search engine by workload
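
For readers who prefer code to diagrams, the same tree can be written as a small Python function. The function name and parameters are hypothetical; one case the diagram leaves open (more than 100M documents without complex analytics) is treated here like the smaller-dataset branch, which is an assumption of this sketch.

```python
def choose_engine(
    docs: int,
    complex_analytics: bool,
    needs_semantic: bool,
    prefers_oss: bool,
    latency_critical_with_ram: bool,
) -> str:
    """Walk the decision tree and return the suggested engine."""
    if docs > 100_000_000 and complex_analytics:
        return "Elasticsearch"
    # Assumption: large datasets without complex analytics follow the same path
    # as smaller ones, since the diagram does not cover that case.
    if needs_semantic:
        return "Meilisearch" if prefers_oss else "Algolia"
    # Keyword-only workloads: raw latency with enough RAM, or balanced cost.
    return "Typesense" if latency_critical_with_ram else "Meilisearch"


if __name__ == "__main__":
    print(choose_engine(5_000_000, False, True, True, False))
```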

### ✅ Choose Meilisearch when:

- You need search-as-you-type latency under 50ms for web or mobile apps
- The team is small and wants fast setup, readable documentation, and self-hosting on an ordinary VPS
- The dataset ranges from a few thousand to tens of millions of documents
- You need hybrid search (keyword + semantic) out of the box
- You want to add a RAG chatbot without building the pipeline yourself
- The budget is limited and you do not want to be locked into Algolia pricing

### ❌ Consider alternatives when:

- You need complex analytics (multi-level aggregations, time series): Elasticsearch/OpenSearch
- The dataset has hundreds of millions of documents with heavy traffic: wait for Meilisearch sharding to mature, or use Elasticsearch
- You need a log management, APM, or SIEM ecosystem: Elastic Stack
- The team wants a fully hosted solution with no infrastructure to manage: Algolia

## 12. Meilisearch development timeline

2018

Meilisearch is created by a team of former Algolia engineers, open source under the MIT license from the start

2021

**v0.20** — Stable typo tolerance, custom ranking rules

2023

**v1.0 GA** — Production-ready, API stable

2023 Q4

**v1.6**: vector search (HNSW) and hybrid search are introduced

2024

**v1.10-v1.13** — Embedder API stable, Meilisearch Cloud GA, Chat GA

2025 Q4

**v1.15**: disable typo tolerance on numbers, lexicographic string filter, improved Chinese tokenizer, Chats API

2026 Q1

**v1.16** — Fragments API (multi-modal), Exports API, Documents sort

2026 Q2

**v1.35-v1.37** — Per-request tracing, sharding release

2026 Q3 (planned)

**Serverless indexes**, AI Gateway, proprietary embedding + reranking models

## Summary

### References

- [Meilisearch 1.16 — Fragments, Exports, Documents Sort](https://www.meilisearch.com/blog/meilisearch-1-16)
- [Meilisearch 1.15 — Typo tolerance, string filter, Chinese](https://www.meilisearch.com/blog/meilisearch-1-15)
- [Meilisearch Roadmap Roundup — March 2026](https://www.meilisearch.com/blog/2026-march-roadmap)
- [Hybrid Search Documentation](https://www.meilisearch.com/docs/capabilities/hybrid_search/overview)
- [Meilisearch — Comparison to Alternatives](https://www.meilisearch.com/docs/learn/resources/comparison_to_alternatives)
- [Meilisearch .NET SDK — GitHub](https://github.com/meilisearch/meilisearch-dotnet)
- [Meilisearch Releases — GitHub](https://github.com/meilisearch/meilisearch/releases)
- [Top 10 Elasticsearch Alternatives 2026](https://www.meilisearch.com/blog/elasticsearch-alternatives)

