Design a URL Shortener (TinyURL/Bitly) in .NET
End-to-end design for a URL shortener: capacity estimation, base62 encoding, Postgres + Redis storage, click analytics, and the ASP.NET Core code that ties it together.
Table of contents
- When does someone actually ask you to build this?
- What back-of-envelope numbers shape the design?
- What does the architecture look like?
- How does the .NET 10 implementation look end-to-end?
- What does the click analytics pipeline look like?
- What scale-out path does the design support?
- What failure modes need monitoring?
- When is a custom URL shortener overkill?
- Where should you go from here?
The URL shortener is the simplest end-to-end system you can build that exercises every block in the series: cache, database, queue, observability, rate limiting. This chapter designs one, then wires it in ASP.NET Core, with the back-of-envelope numbers that justify each component choice.
When does someone actually ask you to build this?
Three contexts. Interview, where it is the warm-up question. Internal tool, where Slack/email links need to be shorter and trackable. Product feature, where you are building Bitly or a QR-code service.
The architectural ideas transfer to any "single-key resource lookup" system: feature flags, geo-DNS, A/B test assignment. URL shortener is the canonical training problem because it makes every constraint explicit.
What back-of-envelope numbers shape the design?
Reuse the calculations from chapter 2:
| Metric | Estimate |
| --- | --- |
| DAU | 1M |
| Shortens / day | 1M (one per active user) |
| Redirects / day | 100M (100:1 read:write ratio) |
| Peak redirects/s | 100M / ~100K seconds per day * 5 peak factor = 5K req/s |
| Avg URL row size | 200 bytes (short + long + meta) |
| Storage / year | 1M * 365 * 200 bytes = 73 GB |
| Cache hit rate | 90% (Zipf distribution; ~1% of URLs serve 90% of reads) |
| DB read load | 500 req/s after cache (5K peak * 10% miss rate) |
The numbers say: one Postgres node, one Redis cache, two ASP.NET Core replicas, one analytics queue. No sharding, no NoSQL, no microservices.
What does the architecture look like?
flowchart LR
Client[Browser] -->|GET /abc123| LB[Load Balancer]
LB --> App[ASP.NET Core]
App -->|GET cache| Redis[(Redis cache)]
Redis -->|miss| App
App -->|SELECT long_url| PG[(Postgres)]
App -->|publish ClickEvent| Q[(Queue)]
App -->|301 / 302 redirect| Client
Q --> Analytics[Analytics worker]
Analytics --> CH[(ClickHouse / DW)]
Two paths. Hot path: redirect, cache hit, single Redis call, return. Cold path: cache miss, Postgres SELECT, populate cache, return. Analytics is async via queue - never block the redirect on click counting.
How does the .NET 10 implementation look end-to-end?
// Schema
public class ShortUrl
{
public long Id { get; set; }
public string Code { get; set; } = ""; // base62 of Id
public string LongUrl { get; set; } = "";
public DateTime CreatedAt { get; set; }
public Guid OwnerId { get; set; }
}
// Code generator
public static class Base62
{
private const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static string Encode(long n)
{
if (n == 0) return "0";
var sb = new StringBuilder();
while (n > 0) { sb.Insert(0, Alphabet[(int)(n % 62)]); n /= 62; }
return sb.ToString();
}
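    // Decode (a sketch, not part of the chapter's listing): the inverse of Encode.
    // With it, a handler could turn a short code back into the numeric primary key
    // and look the row up by Id instead of by an indexed Code column.
    public static long Decode(string code)
    {
        long n = 0;
        foreach (var c in code) n = n * 62 + Alphabet.IndexOf(c);
        return n;
    }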
}
// Shorten endpoint
app.MapPost("/shorten", async (ShortenDto dto, AppDbContext db, ClaimsPrincipal user) =>
{
var entity = new ShortUrl { LongUrl = dto.Url, OwnerId = user.GetUserId(), CreatedAt = DateTime.UtcNow };
db.ShortUrls.Add(entity);
await db.SaveChangesAsync();
entity.Code = Base62.Encode(entity.Id); // Id assigned by DB after Save
await db.SaveChangesAsync();
return Results.Ok(new { code = entity.Code, url = $"https://anhtu.dev/{entity.Code}" });
})
.RequireAuthorization()
.RequireRateLimiting("per-user");
// Redirect endpoint
app.MapGet("/{code}", async (string code, IDistributedCache cache, AppDbContext db,
IPublishEndpoint bus, HttpContext ctx) =>
{
var cacheKey = $"u:{code}";
var cached = await cache.GetStringAsync(cacheKey);
if (cached is not null)
{
await bus.Publish(new ClickEvent(code, DateTimeOffset.UtcNow, ctx.Connection.RemoteIpAddress?.ToString()));
return Results.Redirect(cached, permanent: false); // 302
}
var url = await db.ShortUrls.AsNoTracking().Where(u => u.Code == code).Select(u => u.LongUrl).FirstOrDefaultAsync();
if (url is null) return Results.NotFound();
await cache.SetStringAsync(cacheKey, url,
new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24) });
await bus.Publish(new ClickEvent(code, DateTimeOffset.UtcNow, ctx.Connection.RemoteIpAddress?.ToString()));
return Results.Redirect(url, permanent: false);
});
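The listing leans on a few supporting types that are not shown above. Minimal sketches of what they could look like (the shapes are assumptions, not the chapter's definitions):
// Request body for POST /shorten.
public record ShortenDto(string Url);
// Event published to the queue on every redirect.
public record ClickEvent(string Code, DateTimeOffset At, string? Ip);
// Claims helper used by the shorten endpoint.
public static class UserClaimsExtensions
{
    public static Guid GetUserId(this ClaimsPrincipal user) =>
        Guid.Parse(user.FindFirst(ClaimTypes.NameIdentifier)!.Value);
}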
Three details. The double SaveChangesAsync is the simplest pattern for using the auto-generated ID as the code; production systems batch this with a sequence pre-allocation. Results.Redirect(..., permanent: false) returns 302 and lets us count clicks. The click event goes to a queue (chapter 6) - never to the database synchronously.
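If the extra round trip matters, one way to collapse the two saves (a sketch, assuming the Npgsql EF Core provider's HiLo support): pre-allocate ID blocks from a Postgres sequence so EF assigns entity.Id at Add time, before SaveChangesAsync.
// In AppDbContext.OnModelCreating (sketch): HiLo reserves blocks of IDs from a
// Postgres sequence, so EF fills in entity.Id when the entity is added to the
// context, and the code can be computed before a single SaveChangesAsync.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<ShortUrl>()
        .Property(u => u.Id)
        .UseHiLo("short_url_hilo");
}
With that in place the shorten endpoint becomes Add, set Code from Base62.Encode(entity.Id), then one SaveChangesAsync.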
What does the click analytics pipeline look like?
flowchart LR
App[Redirect handler] --> Q[(RabbitMQ)]
Q --> W1[Click counter worker]
Q --> W2[Geo enricher worker]
W1 --> CH[(ClickHouse)]
W2 --> CH
CH --> Dash[Dashboard]
The queue fans out to multiple consumers. The counter aggregates per minute and writes to ClickHouse (or Postgres for low traffic). The geo enricher resolves IP to country and writes the enriched event. Dashboards read aggregates from ClickHouse. The analytics events case study covers this layer in detail.
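A minimal shape for the counter worker, assuming MassTransit for the queue (which the IPublishEndpoint in the redirect handler suggests); the per-minute buckets would be flushed to ClickHouse or Postgres by a background job not shown here:
public class ClickCounterConsumer : IConsumer<ClickEvent>
{
    // In-memory per-minute counters keyed by (code, minute bucket).
    private static readonly ConcurrentDictionary<(string Code, string Minute), long> Buckets = new();

    public Task Consume(ConsumeContext<ClickEvent> context)
    {
        var click = context.Message;
        var minute = click.At.UtcDateTime.ToString("yyyy-MM-dd HH:mm");
        Buckets.AddOrUpdate((click.Code, minute), 1, (_, count) => count + 1);
        return Task.CompletedTask;
    }
}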
What scale-out path does the design support?
Each component scales independently:
- Web tier: stateless, add replicas behind the load balancer.
- Cache: scale Redis cluster; partition by code prefix.
- DB: at 1B URLs, partition the table by year of creation; read replicas for the analytics dashboards.
- Queue: RabbitMQ cluster; or migrate to Kafka if click volume exceeds 1M/s.
- Analytics: ClickHouse shards naturally by date.
The only design choice that hurts at scale is the auto-increment ID - above roughly 10K shortens/sec the database sequence becomes the bottleneck, and ID generation moves into a separate Snowflake-style service.
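A sketch of what such a generator could look like - the classic bit layout (millisecond timestamp, machine ID, per-millisecond sequence), not the chapter's implementation; the epoch is arbitrary:
public class SnowflakeIdGenerator
{
    private static readonly DateTimeOffset Epoch = new(2024, 1, 1, 0, 0, 0, TimeSpan.Zero); // custom epoch
    private readonly long _machineId;   // 10 bits: up to 1,024 nodes
    private long _lastTimestampMs;
    private long _sequence;             // 12 bits: 4,096 IDs per node per millisecond
    private readonly object _gate = new();

    public SnowflakeIdGenerator(long machineId) => _machineId = machineId & 0x3FF;

    public long NextId()
    {
        lock (_gate)
        {
            var now = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
            if (now == _lastTimestampMs)
            {
                _sequence = (_sequence + 1) & 0xFFF;
                if (_sequence == 0)
                    while (now <= _lastTimestampMs) // sequence exhausted: wait for the next millisecond
                        now = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
            }
            else
            {
                _sequence = 0;
            }
            _lastTimestampMs = now;
            return (now << 22) | (_machineId << 12) | _sequence; // 41 + 10 + 12 bits
        }
    }
}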
What failure modes need monitoring?
- Cache stampede on a viral URL - one key getting 100K req/s; the cache handles it, but the DB falls over on a miss. Mitigation: single-flight lock on cache miss (sketch after this list); pre-populate hot keys.
- Code collision (if hashing) - rare but real. Mitigation: retry with new salt; counter-based avoids it entirely.
- Click queue backlog - analytics consumer slow, queue grows. Mitigation: alert on queue depth; under sustained backlog it is acceptable to drop click events - they are observability data, not the source of truth for redirects.
- Open redirect abuse - spammers shorten malicious URLs. Mitigation: domain allowlist, malware scanning at shorten time.
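A minimal single-flight guard for the cache-miss path (a sketch: one SemaphoreSlim per code so only one concurrent request per short code reaches Postgres; waiters re-check the cache once they acquire the lock):
public static class SingleFlight
{
    // Grows with the number of distinct codes seen; evict idle entries in production.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Gates = new();

    public static async Task<string?> GetOrLoadAsync(
        string code, IDistributedCache cache, Func<Task<string?>> loadFromDb)
    {
        var gate = Gates.GetOrAdd(code, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Another request may have filled the cache while we were waiting.
            var cached = await cache.GetStringAsync($"u:{code}");
            if (cached is not null) return cached;

            var url = await loadFromDb();
            if (url is not null)
                await cache.SetStringAsync($"u:{code}", url,
                    new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24) });
            return url;
        }
        finally
        {
            gate.Release();
        }
    }
}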
When is a custom URL shortener overkill?
Two cases.
One: low volume. If you shorten 100 URLs/day for an internal tool, a Bitly account is cheaper. Rebuilding the schema, analytics, and abuse handling is not worth it at that scale.
Two: rich tracking needs. UTM parameters, conversion tracking, A/B link tests - these are 80% of what specialised products (Bitly, Rebrandly) charge for. Building from scratch means rebuilding their feature set, not just the redirect.
Where should you go from here?
Next case study: design a rate limiter - the rate-limit middleware from chapter 14 was the consumer side; the case study shows how to build a distributed limiter from scratch. After that, the news feed and chat case studies get more involved.