Design a Typeahead Autocomplete (Search-as-You-Type)
How to build search-as-you-type autocomplete in .NET: trie data structure, Redis sorted set rankings, debouncing on the client, and freshness via streaming updates.
Table of contents
- When does typeahead need its own design?
- What numbers should I budget for?
- What does the architecture look like?
- What is the .NET 10 wiring with Redis sorted sets?
- How does the freshness pipeline keep rankings recent?
- What scale-out path does this support?
- What failure modes does this introduce?
- When is autocomplete overkill?
- Where should you go from here?
Search-as-you-type is the case study where data structures matter: the wrong choice gives you 500 ms latency and the right choice gives you 20 ms. This chapter shows the two practical implementations - Redis sorted sets and a custom trie service - and the .NET wiring plus the freshness pipeline that keeps the rankings recent.
When does typeahead need its own design?
Three signals.
Latency must be under 100 ms. Anything slower than that and the user types past the suggestions. The cache layer is mandatory; the database alone cannot hit that latency at scale.
The vocabulary changes. New product names, trending searches, freshly registered cities - the data is not static. You need a pipeline to update rankings, not a one-time index build.
Many terms, many prefixes. Search across millions of terms with millions of prefixes. The naive "WHERE name LIKE @prefix + '%'" query collapses; specialised data structures earn their keep.
If you have hundreds of static terms (country list, language dropdown), client-side filtering is the right answer.
What numbers should I budget for?
| Quantity | Value |
| --- | --- |
| Searches / day | 50M |
| Avg keystrokes / search | 7 |
| Total prefix queries | 50M * 7 = 350M / day |
| After client debounce | 350M / 7 = 50M (one per pause) |
| Peak QPS | 50M / 100K * 5 = 2,500 |
| Latency budget | < 100 ms p99 end-to-end |
| Vocabulary size | 1M unique terms |
| Avg prefix bucket size | top 10 completions |
Without client debounce, the QPS is 7x higher; debouncing is the single biggest server-side optimisation. The 1M-term vocabulary fits comfortably in Redis - the freshness pipeline keeps the scores current, and rate limiting protects the endpoint from abuse.
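The arithmetic above is worth sanity-checking in code; a few lines mirroring the numbers (a sketch, not a capacity plan - ~100K seconds/day and a 5x peak multiplier are the rounding conventions used here):

```csharp
// Back-of-envelope numbers from the budget above.
const long SearchesPerDay = 50_000_000;
const int AvgKeystrokes = 7;

long rawPrefixQueries = SearchesPerDay * AvgKeystrokes;   // 350M/day without debounce
long debouncedQueries = rawPrefixQueries / AvgKeystrokes; // ~50M/day, one per pause
long peakQps = debouncedQueries / 100_000 * 5;            // ~100K s/day, 5x peak factor

Console.WriteLine($"raw {rawPrefixQueries:N0}, debounced {debouncedQueries:N0}, peak {peakQps:N0} QPS");
```

The 7x gap between the raw and debounced numbers is exactly why the client debounce is the biggest single optimisation.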
What does the architecture look like?
flowchart LR
Client[Browser] -->|debounce 200ms| App[ASP.NET Core]
App -->|ZRANGE prefix| Redis[(Redis<br/>sorted sets)]
App --> Result[top K completions]
SearchLog[Search clicks] --> Stream[(Kafka / event queue)]
Stream --> Agg[Aggregator]
Agg -->|update scores| Redis
Reload[Nightly batch] -->|full rebuild| Redis
The query path is one Redis call. The freshness path is a streaming aggregator updating scores. A nightly batch job rebuilds from scratch to recover from any drift.
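The 200 ms debounce in the diagram lives on the client, but the pattern is the same in any .NET caller; a minimal sketch using CancellationTokenSource (the Debouncer type is illustrative, not part of the chapter's code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Cancels the pending request whenever a new keystroke arrives;
// only the last keystroke before a pause actually triggers a query.
public sealed class Debouncer(TimeSpan delay)
{
    private CancellationTokenSource? _cts;

    public async Task RunAsync(Func<CancellationToken, Task> action)
    {
        _cts?.Cancel();
        _cts = new CancellationTokenSource();
        var token = _cts.Token;
        try
        {
            await Task.Delay(delay, token); // wait out the typing pause
            await action(token);            // fires only if not superseded
        }
        catch (TaskCanceledException) { /* superseded by a newer keystroke */ }
    }
}
```

Wired to a text box, each keystroke calls RunAsync with the query lambda; seven quick keystrokes produce one request, which is where the 7x reduction in the budget table comes from.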
What is the .NET 10 wiring with Redis sorted sets?
Index time:
public class AutocompleteIndexer(IConnectionMultiplexer redis)
{
    public async Task IndexTermAsync(string term, double popularity)
    {
        var db = redis.GetDatabase();
        var lower = term.ToLowerInvariant();
        // For each prefix from length 2 to the full term, add the term to that
        // prefix's sorted set. Batch the fanout so one term costs one round trip.
        var batch = db.CreateBatch();
        var writes = new List<Task>();
        for (var len = 2; len <= lower.Length; len++)
        {
            var prefix = lower[..len];
            writes.Add(batch.SortedSetAddAsync($"ac:{prefix}", term, popularity));
        }
        batch.Execute();
        await Task.WhenAll(writes);
    }
}
// A trie-based service instead stores each term once and shares prefixes
// structurally - more memory-efficient at large vocabularies (see the scale-out section).
Query time:
app.MapGet("/autocomplete", async (string q, IConnectionMultiplexer redis) =>
{
    if (string.IsNullOrWhiteSpace(q) || q.Length < 2)
        return Results.Ok(Array.Empty<string>());
    var db = redis.GetDatabase();
    var key = $"ac:{q.ToLowerInvariant()}";
    // Top 10 completions by score, highest first - a single Redis call.
    var top = await db.SortedSetRangeByRankAsync(key, 0, 9, Order.Descending);
    return Results.Ok(top.Select(v => v.ToString()).ToArray());
})
.RequireRateLimiting("autocomplete");
Three details. The prefix fanout at index time multiplies storage by roughly the average term length (~10x); for a million terms, that is ~10M sorted-set entries - cheap for Redis. The query is a single ZRANGE - sub-millisecond. Rate limiting is essential because autocomplete endpoints are easy to abuse.
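The ~10x fanout claim is just the number of prefixes a term generates; a quick check (the helper is hypothetical, shown only to make the arithmetic concrete):

```csharp
using System;
using System.Linq;

// Each term fans out to prefixes of length 2..len, i.e. len - 1 sorted-set entries.
static string[] Prefixes(string term) =>
    Enumerable.Range(2, Math.Max(0, term.Length - 1))
              .Select(len => term[..len])
              .ToArray();

// "chocolate" (9 chars) produces 8 entries: "ch", "cho", ..., "chocolate".
// With ~10-char average terms, storage multiplies by roughly 9-10x.
Console.WriteLine(string.Join(", ", Prefixes("chocolate")));
```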
How does the freshness pipeline keep rankings recent?
flowchart LR
SearchEvent --> K[(Kafka)]
K --> Counter[Counting worker]
Counter --> Window[(Sliding 7d counts)]
Window --> Updater[Score updater]
Updater -->|ZADD or ZINCRBY| Redis
Reload[Nightly job] --> Redis
Search events stream through Kafka. A windowed aggregator computes 7-day counts per term. An updater pushes the new scores to Redis via ZADD (overwrite) or ZINCRBY (incremental). The nightly job is the safety net that rebuilds the entire index from logs; even if every streaming worker breaks, autocomplete catches up within 24 hours.
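The core of the windowed aggregator is small; a sketch of a per-term 7-day counter with daily buckets (types and names are illustrative, not the chapter's pipeline code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Per-term daily buckets; the score pushed to Redis is the sum of the last 7 days.
public sealed class SlidingWindowCounter
{
    private readonly Dictionary<string, Dictionary<DateOnly, long>> _buckets = new();

    public void Record(string term, DateOnly day)
    {
        if (!_buckets.TryGetValue(term, out var days))
            _buckets[term] = days = new Dictionary<DateOnly, long>();
        days[day] = days.GetValueOrDefault(day) + 1;
    }

    // Score = total events in the 7 days ending at 'today' (inclusive).
    public long Score(string term, DateOnly today)
    {
        if (!_buckets.TryGetValue(term, out var days)) return 0;
        var cutoff = today.AddDays(-6);
        return days.Where(kv => kv.Key >= cutoff && kv.Key <= today)
                   .Sum(kv => kv.Value);
    }
}
```

Note the design choice this forces: pushing Score(term, today) with ZADD overwrites cleanly each cycle, whereas ZINCRBY would need a compensating decrement when a day falls out of the window.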
What scale-out path does this support?
- Redis: cluster, shard sorted sets by prefix hash; one prefix fits one shard so no cross-shard query.
- App tier: stateless, scales horizontally; each query is one Redis call so the bottleneck is Redis CPU, not app.
- Streaming pipeline: Kafka partitioning + Flink/Spark or custom .NET workers; partition by term hash.
- Storage of cold terms: Postgres for the canonical list; Redis for the hot prefixes only.
For very large vocabularies (>10M terms), a custom trie service in .NET (a server holding the trie in memory) cuts Redis memory substantially.
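For orientation, the trie-service idea looks roughly like this: each node caches its own top-K completions, so a lookup is a prefix walk plus a cached-list read. A minimal sketch (an illustration of the idea, not a production service):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class TrieNode
{
    public Dictionary<char, TrieNode> Children { get; } = new();
    public List<(string Term, double Score)> TopK { get; } = new();
}

public sealed class Trie(int k)
{
    private readonly TrieNode _root = new();

    public void Add(string term, double score)
    {
        var node = _root;
        foreach (var c in term.ToLowerInvariant())
        {
            if (!node.Children.TryGetValue(c, out var child))
                node.Children[c] = child = new TrieNode();
            node = child;
            // Maintain this node's top-K list, highest score first.
            node.TopK.Add((term, score));
            node.TopK.Sort((a, b) => b.Score.CompareTo(a.Score));
            if (node.TopK.Count > k) node.TopK.RemoveAt(k);
        }
    }

    public IReadOnlyList<string> Complete(string prefix)
    {
        var node = _root;
        foreach (var c in prefix.ToLowerInvariant())
        {
            if (!node.Children.TryGetValue(c, out var next))
                return Array.Empty<string>();
            node = next;
        }
        return node.TopK.Select(t => t.Term).ToList();
    }
}
```

Because every term is stored once and prefixes are shared structurally, this avoids the ~10x key fanout that the sorted-set model pays.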
What failure modes does this introduce?
- Stale rankings - the streaming pipeline is broken; rankings are outdated. Mitigation: nightly batch rebuild as the safety net; alert on aggregator lag.
- Hot prefix - short prefixes like "a" are queried by a huge share of users, so one Redis shard's CPU saturates. Mitigation: cache hot prefixes in IMemoryCache per app instance for a few seconds.
- Cold prefix on first user - new product launches, prefix has no data. Mitigation: fall back to Elasticsearch query if Redis returns empty; populate Redis on miss.
- Profanity in suggestions - user-entered queries become suggestions to others. Mitigation: blocklist filter at index time and at query time.
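The blocklist filter in the last mitigation is a set lookup applied in both places; a sketch (the SuggestionFilter type and blocklist contents are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Applied at index time (never store a blocked term) and again at query time,
// in case a term was indexed before the blocklist was updated.
public sealed class SuggestionFilter(IEnumerable<string> blocklist)
{
    private readonly HashSet<string> _blocked = new(blocklist, StringComparer.OrdinalIgnoreCase);

    public bool IsAllowed(string term) =>
        !term.Split(' ', StringSplitOptions.RemoveEmptyEntries)
             .Any(word => _blocked.Contains(word));

    public IEnumerable<string> Clean(IEnumerable<string> suggestions) =>
        suggestions.Where(IsAllowed);
}
```

Word-level matching is deliberately crude here; real deployments layer normalisation (leetspeak, spacing tricks) on top.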
When is autocomplete overkill?
When the input is from a small fixed set (state names, language codes) - load on the client and filter there. When the search backend already returns top-10 results in under 50 ms (small indexes), call it directly. The dedicated autocomplete service is for the case where the search engine is too slow or too expensive to call on every keystroke.
Where should you go from here?
Next case study: payment / checkout system - the case where idempotency, sagas, and reliability all collide. The hardest case study in the series.
Frequently asked questions
Trie or Redis sorted set?
Redis sorted sets are simpler and comfortably handle a 1M-term vocabulary; a custom in-memory trie service pays off past ~10M terms, where it cuts Redis memory substantially.
Why do I need to debounce on the client?
Without it, every keystroke is a server query - 7x the load. Debouncing sends one request per typing pause, the single biggest server-side optimisation.
How do I keep the rankings fresh?
Stream search events through Kafka, aggregate 7-day counts per term, and push scores to Redis; a nightly batch rebuild is the safety net against pipeline drift.
What about typo tolerance?
That is the search engine's job (e.g. Elasticsearch's fuzziness AUTO). The search chapter covers fuzzy matching; combine the two for the best UX.