
Design a Typeahead Autocomplete (Search-as-You-Type)

How to build search-as-you-type autocomplete in .NET: trie data structure, Redis sorted set rankings, debouncing on the client, and freshness via streaming updates.

Table of contents
  1. When does typeahead need its own design?
  2. What numbers should I budget for?
  3. What does the architecture look like?
  4. What is the .NET 10 wiring with Redis sorted sets?
  5. How does the freshness pipeline keep rankings recent?
  6. What scale-out path does this support?
  7. What failure modes does this introduce?
  8. When is autocomplete overkill?
  9. Where should you go from here?

Search-as-you-type is the case study where data structures matter: the wrong choice gives you 500 ms latency and the right choice gives you 20 ms. This chapter shows the two practical implementations - Redis sorted sets and a custom trie service - and the .NET wiring plus the freshness pipeline that keeps the rankings recent.

When does typeahead need its own design?

Three signals.

Latency must be under 100 ms. Anything slower and the user has typed past the suggestions before they render. The cache layer is mandatory; the database alone cannot hit that latency at scale.

The vocabulary changes. New product names, trending searches, freshly registered cities - the data is not static. You need a pipeline to update rankings, not a one-time index build.

Many terms, many prefixes. Search across millions of terms with millions of prefixes. The naive WHERE name LIKE @prefix + '%' query collapses; specialised data structures earn their keep.

If you have hundreds of static terms (country list, language dropdown), client-side filtering is the right answer.

What numbers should I budget for?

Searches / day              50M
Avg keystrokes / search     7
Total prefix queries        50M * 7 = 350M / day
After client debounce       350M / 7 = 50M (one per pause)
Peak QPS                    (50M / ~100K s per day) * 5 = 2,500
Latency budget              < 100 ms p99 end-to-end
Vocabulary size             1M unique terms
Avg prefix bucket size      top 10 completions

Without client debounce, the QPS is 7x higher; debouncing is the single biggest server-side optimisation. The 1M-term vocabulary fits comfortably in Redis - the streaming pipeline and nightly rebuild keep the scores fresh, with rate limiting protecting the endpoint from abuse.
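The table's arithmetic can be sanity-checked in a few lines; the 100K figure is the usual round-number approximation of 86,400 seconds per day:

```csharp
using System;

// Sanity check for the capacity table. Assumes ~100K seconds per day
// (the round-number approximation of 86,400).
const long SearchesPerDay = 50_000_000;
const int AvgKeystrokesPerSearch = 7;

long totalPrefixQueries = SearchesPerDay * AvgKeystrokesPerSearch; // 350M/day raw
long afterDebounce = totalPrefixQueries / AvgKeystrokesPerSearch;  // one request per pause
long avgQps = afterDebounce / 100_000;                             // ~500 QPS average
long peakQps = avgQps * 5;                                         // 5x peak factor

Console.WriteLine($"{totalPrefixQueries:N0} raw queries/day, peak {peakQps:N0} QPS");
```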

What does the architecture look like?

flowchart LR
    Client[Browser] -->|debounce 200ms| App[ASP.NET Core]
    App -->|ZRANGE prefix| Redis[(Redis<br/>sorted sets)]
    App --> Result[top K completions]
    SearchLog[Search clicks] --> Stream[(Kafka / event queue)]
    Stream --> Agg[Aggregator]
    Agg -->|update scores| Redis
    Reload[Nightly batch] -->|full rebuild| Redis

The query path is one Redis call. The freshness path is a streaming aggregator updating scores. A nightly batch job rebuilds from scratch to recover from any drift.

What is the .NET 10 wiring with Redis sorted sets?

Index time:

using StackExchange.Redis;

public class AutocompleteIndexer(IConnectionMultiplexer redis)
{
    public async Task IndexTermAsync(string term, double popularity)
    {
        var db = redis.GetDatabase();
        var lower = term.ToLowerInvariant();
        // For each prefix from length 2 to len(term), add the term to that
        // prefix's sorted set, scored by popularity.
        for (var len = 2; len <= lower.Length; len++)
        {
            var prefix = lower[..len];
            await db.SortedSetAddAsync($"ac:{prefix}", term, popularity);
        }
    }
}

// A more memory-efficient alternative stores each term once in a trie,
// where all prefixes share nodes instead of duplicating the term per prefix.

Query time:

app.MapGet("/autocomplete", async (string q, IConnectionMultiplexer redis) =>
{
    if (string.IsNullOrWhiteSpace(q) || q.Length < 2) return Results.Ok(Array.Empty<string>());
    var db = redis.GetDatabase();
    var key = $"ac:{q.ToLowerInvariant()}";
    var top = await db.SortedSetRangeByRankAsync(key, 0, 9, Order.Descending);
    return Results.Ok(top.Select(v => v.ToString()).ToArray());
})
.RequireRateLimiting("autocomplete");

Three details. The prefix-fanout at index time multiplies storage by the average term length (~10x). For a million terms, that is ~10M sorted-set entries - cheap on Redis. The query is a single ZRANGE - sub-millisecond. Rate-limiting is essential because autocomplete endpoints are easy to abuse.

How does the freshness pipeline keep rankings recent?

flowchart LR
    SearchEvent --> K[(Kafka)]
    K --> Counter[Counting worker]
    Counter --> Window[(Sliding 7d counts)]
    Window --> Updater[Score updater]
    Updater -->|ZADD or ZINCRBY| Redis
    Reload[Nightly job] --> Redis

Search events stream through Kafka. A windowed aggregator computes 7-day counts per term. An updater pushes the new scores to Redis via ZADD (overwrite) or ZINCRBY (incremental). The nightly job is the safety net that rebuilds the entire index from logs; even if every streaming worker breaks, autocomplete catches up within 24 hours.
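A minimal sketch of the windowed counter, assuming one count bucket per term per day; the SlidingCounter name and shape are illustrative, not a library type - a real aggregator would live inside the Kafka consumer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var counts = new SlidingCounter(windowDays: 7);
var today = new DateOnly(2025, 1, 10);
counts.Record("system design interview", today.AddDays(-1), count: 5);
counts.Record("system design interview", today.AddDays(-8), count: 100); // aged out
Console.WriteLine(counts.Score("system design interview", today)); // prints 5

// Sliding 7-day count per term: one bucket per day, summed over the window.
public class SlidingCounter(int windowDays = 7)
{
    // term -> (day number -> count)
    private readonly Dictionary<string, Dictionary<int, long>> _buckets = new();

    public void Record(string term, DateOnly day, long count = 1)
    {
        if (!_buckets.TryGetValue(term, out var days))
            _buckets[term] = days = new Dictionary<int, long>();
        days[day.DayNumber] = days.GetValueOrDefault(day.DayNumber) + count;
    }

    // The score pushed to Redis via ZADD: total count over the last N days.
    public long Score(string term, DateOnly today)
    {
        if (!_buckets.TryGetValue(term, out var days)) return 0;
        return days.Where(kv => kv.Key > today.DayNumber - windowDays)
                   .Sum(kv => kv.Value);
    }
}
```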

What scale-out path does this support?

For very large vocabularies (>10M terms), a custom trie service in .NET (a server holding the trie in memory) cuts Redis memory substantially.
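A minimal sketch of that trie, assuming the whole vocabulary fits in one process; a production service would precompute the top-K list on each node rather than walk the subtree on every query, but the shape is the same:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var trie = new Trie();
trie.Add("system design", 100);
trie.Add("system", 50);
trie.Add("sysadmin", 10);
Console.WriteLine(string.Join(", ", trie.Complete("sys", k: 2)));
// prints "system design, system"

// In-memory trie: shared prefix nodes instead of one sorted set per prefix.
public class Trie
{
    private sealed class Node
    {
        public Dictionary<char, Node> Children { get; } = new();
        public string? Term;   // non-null marks the end of an indexed term
        public double Score;
    }

    private readonly Node _root = new();

    public void Add(string term, double score)
    {
        var node = _root;
        foreach (var ch in term.ToLowerInvariant())
        {
            if (!node.Children.TryGetValue(ch, out var next))
                node.Children[ch] = next = new Node();
            node = next;
        }
        node.Term = term;
        node.Score = score;
    }

    public IReadOnlyList<string> Complete(string prefix, int k = 10)
    {
        // Walk down to the prefix node; no node means no completions.
        var node = _root;
        foreach (var ch in prefix.ToLowerInvariant())
            if (!node.Children.TryGetValue(ch, out node!))
                return Array.Empty<string>();

        // Collect all terms under the prefix, rank by score, take top K.
        var results = new List<(string Term, double Score)>();
        Collect(node, results);
        return results.OrderByDescending(r => r.Score)
                      .Take(k).Select(r => r.Term).ToList();
    }

    private static void Collect(Node node, List<(string, double)> acc)
    {
        if (node.Term is not null) acc.Add((node.Term, node.Score));
        foreach (var child in node.Children.Values) Collect(child, acc);
    }
}
```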

What failure modes does this introduce?

Three worth planning for. If Redis is down, return an empty suggestion list and let the user keep typing - search still works, autocomplete degrades gracefully. If the streaming workers fall behind or break, rankings go stale; the nightly rebuild bounds that staleness at 24 hours. And without rate limiting, one request per keystroke per user makes the endpoint an easy amplification target.

When is autocomplete overkill?

When the input is from a small fixed set (state names, language codes) - load on the client and filter there. When the search backend already returns top-10 results in under 50 ms (small indexes), call it directly. The dedicated autocomplete service is for the case where the search engine is too slow or too expensive to call on every keystroke.
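For the small-fixed-set case the filter really is a one-liner; in a browser it would be JavaScript, but the shape is the same in the chapter's C#:

```csharp
using System;
using System.Linq;

// Client-side filtering over a small static list - no server round trip.
string[] countries = { "Denmark", "Germany", "Georgia", "Ghana" };

Console.WriteLine(string.Join(", ", Filter(countries, "ge")));
// prints "Germany, Georgia"

static string[] Filter(string[] terms, string prefix) =>
    terms.Where(t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
         .ToArray();
```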

Where should you go from here?

Next case study: payment / checkout system - the case where idempotency, sagas, and reliability all collide. The hardest case study in the series.

Frequently asked questions

Trie or Redis sorted set?
Sorted set per prefix is simpler and uses one Redis call per query. Trie is more memory-efficient when you have millions of terms but requires a custom service. For most product autocompletes (a few hundred thousand terms), the sorted-set approach is sufficient and easier to operate. Drop in the trie when memory cost becomes a problem.
Why do I need to debounce on the client?
Without it, every keystroke sends a request - one user typing 'system' sends six requests in a few hundred ms. Debouncing waits for ~200 ms of inactivity before sending the request. With seven keystrokes per average search, that cuts server load roughly 7x, and it improves perceived UX because the user sees one stable result list instead of several flickering ones.
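A sketch of the debounce in C# terms (browsers do the same dance with setTimeout/clearTimeout); the Debouncer type is illustrative, not a framework class:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var hits = 0;
var debouncer = new Debouncer(TimeSpan.FromMilliseconds(200));
debouncer.Signal(() => hits++); // cancelled by the next keystroke
debouncer.Signal(() => hits++); // cancelled by the next keystroke
debouncer.Signal(() => hits++); // fires after 200 ms of quiet
await Task.Delay(500);
Console.WriteLine(hits); // prints 1

// Each keystroke cancels the pending timer and restarts it; only the last
// keystroke in a burst survives long enough to trigger the request.
public class Debouncer(TimeSpan delay)
{
    private CancellationTokenSource? _cts;

    public void Signal(Action action)
    {
        _cts?.Cancel();
        var cts = _cts = new CancellationTokenSource();
        _ = Task.Delay(delay, cts.Token).ContinueWith(
            t => { if (!t.IsCanceled) action(); },
            TaskScheduler.Default);
    }
}
```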
How do I keep the rankings fresh?
Streaming pipeline. Search clicks emit events, an aggregator computes daily and weekly counts, and a worker updates Redis sorted sets. The analytics events case study covers the same shape - autocomplete is a downstream consumer of the same event stream.
What about typo tolerance?
Two layers. First, normalise input (lowercase, strip accents) before lookup. Second, a fuzzy fallback - if the prefix returns nothing, query Elasticsearch with Fuzziness AUTO. The search chapter covers fuzzy matching; combine the two for the best UX.
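The first layer - normalisation - is a few lines with the BCL: decompose to Unicode FormD, drop the combining marks, recompose:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

Console.WriteLine(NormalizeTerm("Café")); // prints "cafe"

// Lowercase and strip accents so "Café" and "cafe" hit the same prefix key.
// FormD splits accented characters into base letter + combining mark;
// filtering out NonSpacingMark removes the accents.
static string NormalizeTerm(string input)
{
    var decomposed = input.ToLowerInvariant().Normalize(NormalizationForm.FormD);
    var filtered = decomposed.Where(c =>
        CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
    return new string(filtered.ToArray()).Normalize(NormalizationForm.FormC);
}
```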