
Design a URL Shortener (TinyURL/Bitly) in .NET

End-to-end design for a URL shortener: capacity estimation, base62 encoding, Postgres + Redis storage, click analytics, and the ASP.NET Core code that ties it together.

Table of contents
  1. When does someone actually ask you to build this?
  2. What back-of-envelope numbers shape the design?
  3. What does the architecture look like?
  4. How does the .NET 10 implementation look end-to-end?
  5. What does the click analytics pipeline look like?
  6. What scale-out path does the design support?
  7. What failure modes need monitoring?
  8. When is a custom URL shortener overkill?
  9. Where should you go from here?

The URL shortener is the simplest end-to-end system you can build that exercises every block in the series: cache, database, queue, observability, rate limiting. This chapter designs one, then wires it up in ASP.NET Core, with the back-of-envelope numbers that justify each component choice.

When does someone actually ask you to build this?

Three contexts. Interview, where it is the warm-up question. Internal tool, where Slack/email links need to be shorter and trackable. Product feature, where you are building a Bitly-style product or a QR-code service.

The architectural ideas transfer to any "single-key resource lookup" system: feature flags, geo-DNS, A/B test assignment. The URL shortener is the canonical training problem because it makes every constraint explicit.

What back-of-envelope numbers shape the design?

Reuse the calculations from chapter 2:

DAU                 1M
Shortens / day      1M  (one per active user)
Redirects / day     100M (100:1 read:write ratio)
Peak redirects/s    100M / 100K (seconds/day) * 5 (peak factor) = 5K req/s
Avg URL row size    200 bytes (short + long + meta)
Storage / year      1M * 365 * 200 = 73 GB
Cache hit rate      90% (Zipf distribution; 1% of URLs serve 90% of reads)
DB read load        500 req/s after cache

The numbers say: one Postgres node, one Redis cache, two ASP.NET Core replicas, one analytics queue. No sharding, no NoSQL, no microservices.
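
If you want to sanity-check those numbers rather than take them on faith, the arithmetic fits in a scratch Program.cs. A minimal sketch using the same rounded constants as the table (a day treated as 100K seconds):

// Back-of-envelope check for the capacity table above.
const long redirectsPerDay = 100_000_000;   // 100M
const long secondsPerDay   = 100_000;       // ~86,400, rounded for mental math
const long shortensPerDay  = 1_000_000;     // 1M
const int  bytesPerRow     = 200;

long peakRedirectsPerSec = redirectsPerDay / secondsPerDay * 5;                  // x5 peak factor -> 5,000 req/s
long storagePerYearGb    = shortensPerDay * 365 * bytesPerRow / 1_000_000_000;   // 73 GB
long dbReadsPerSec       = peakRedirectsPerSec / 10;                             // 90% cache hit -> 500 req/s

Console.WriteLine($"peak {peakRedirectsPerSec} req/s, {storagePerYearGb} GB/year, {dbReadsPerSec} req/s to Postgres");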

What does the architecture look like?

flowchart LR
    Client[Browser] -->|GET /abc123| LB[Load Balancer]
    LB --> App[ASP.NET Core]
    App -->|GET cache| Redis[(Redis cache)]
    Redis -->|miss| App
    App -->|SELECT long_url| PG[(Postgres)]
    App -->|publish ClickEvent| Q[(Queue)]
    App -->|301 / 302 redirect| Client
    Q --> Analytics[Analytics worker]
    Analytics --> CH[(ClickHouse / DW)]

Two paths. Hot path: redirect, cache hit, single Redis call, return. Cold path: cache miss, Postgres SELECT, populate cache, return. Analytics is async via queue - never block the redirect on click counting.

How does the .NET 10 implementation look end-to-end?

// Schema
public class ShortUrl
{
    public long Id { get; set; }
    public string Code { get; set; } = "";   // base62 of Id
    public string LongUrl { get; set; } = "";
    public DateTime CreatedAt { get; set; }
    public Guid OwnerId { get; set; }
}

// Code generator
public static class Base62
{
    private const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static string Encode(long n)
    {
        if (n == 0) return "0";
        var sb = new StringBuilder();
        while (n > 0) { sb.Insert(0, Alphabet[(int)(n % 62)]); n /= 62; }
        return sb.ToString();
    }
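
    // Hypothetical inverse of Encode (not in the original listing) - useful in tests and
    // when mapping a code back to its numeric Id. Assumes the input only uses Alphabet characters.
    public static long Decode(string code)
    {
        long n = 0;
        foreach (var c in code) n = n * 62 + Alphabet.IndexOf(c);
        return n;   // round trip: Decode(Encode(125)) == 125, Encode(125) == "21"
    }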
}

// Shorten endpoint
app.MapPost("/shorten", async (ShortenDto dto, AppDbContext db, ClaimsPrincipal user) =>
{
    var entity = new ShortUrl { LongUrl = dto.Url, OwnerId = user.GetUserId(), CreatedAt = DateTime.UtcNow };
    db.ShortUrls.Add(entity);
    await db.SaveChangesAsync();
    entity.Code = Base62.Encode(entity.Id);   // Id assigned by DB after Save
    await db.SaveChangesAsync();
    return Results.Ok(new { code = entity.Code, url = $"https://anhtu.dev/{entity.Code}" });
})
.RequireAuthorization()
.RequireRateLimiting("per-user");

// Redirect endpoint
app.MapGet("/{code}", async (string code, IDistributedCache cache, AppDbContext db,
                              IPublishEndpoint bus, HttpContext ctx) =>
{
    var cacheKey = $"u:{code}";
    var cached = await cache.GetStringAsync(cacheKey);
    if (cached is not null)
    {
        await bus.Publish(new ClickEvent(code, DateTimeOffset.UtcNow, ctx.Connection.RemoteIpAddress?.ToString()));
        return Results.Redirect(cached, permanent: false);  // 302
    }

    var url = await db.ShortUrls.AsNoTracking()
        .Where(u => u.Code == code)
        .Select(u => u.LongUrl)
        .FirstOrDefaultAsync();
    if (url is null) return Results.NotFound();

    await cache.SetStringAsync(cacheKey, url,
        new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24) });
    await bus.Publish(new ClickEvent(code, DateTimeOffset.UtcNow, ctx.Connection.RemoteIpAddress?.ToString()));
    return Results.Redirect(url, permanent: false);
});

Three details. The double SaveChangesAsync is the simplest way to derive the code from the database-generated ID; production systems avoid the second round trip by pre-allocating ID blocks from a sequence. Results.Redirect(..., permanent: false) returns a 302, so every click still hits the service and can be counted. The click event goes to a queue (chapter 6) - never to the database synchronously.
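
For completeness, the listing references a few pieces it never defines. A minimal sketch of plausible shapes - the records, the GetUserId helper, and the "per-user" policy registration are assumptions, not the article's exact code:

// Contracts the endpoints bind to (assumed shapes).
public record ShortenDto(string Url);
public record ClickEvent(string Code, DateTimeOffset At, string? Ip);

// Hypothetical helper behind user.GetUserId().
public static class ClaimsPrincipalExtensions
{
    public static Guid GetUserId(this ClaimsPrincipal user) =>
        Guid.Parse(user.FindFirst(ClaimTypes.NameIdentifier)!.Value);
}

// One plausible registration for the "per-user" policy (Program.cs; remember app.UseRateLimiter() in the pipeline).
builder.Services.AddRateLimiter(options =>
    options.AddPolicy("per-user", ctx =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: ctx.User.FindFirst(ClaimTypes.NameIdentifier)?.Value ?? "anonymous",
            _ => new FixedWindowRateLimiterOptions { PermitLimit = 30, Window = TimeSpan.FromMinutes(1) })));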

What does the click analytics pipeline look like?

flowchart LR
    App[Redirect handler] --> Q[(RabbitMQ)]
    Q --> W1[Click counter worker]
    Q --> W2[Geo enricher worker]
    W1 --> CH[(ClickHouse)]
    W2 --> CH
    CH --> Dash[Dashboard]

The queue fans out to multiple consumers. The counter aggregates per minute and writes to ClickHouse (or Postgres for low traffic). The geo enricher resolves IP to country and writes the enriched event. Dashboards read aggregates from ClickHouse. The analytics events case study covers this layer in detail.
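
A simplified sketch of the counter worker, assuming MassTransit (to match the IPublishEndpoint in the redirect handler) and a hypothetical click_counts table in Postgres with a unique (code, minute) index. A higher-traffic version would batch in memory and flush once per minute instead of upserting per message:

public class ClickCounterConsumer : IConsumer<ClickEvent>
{
    private readonly AppDbContext _db;
    public ClickCounterConsumer(AppDbContext db) => _db = db;

    public async Task Consume(ConsumeContext<ClickEvent> context)
    {
        // Truncate the click timestamp to its minute bucket.
        var at = context.Message.At.UtcDateTime;
        var minute = new DateTime(at.Year, at.Month, at.Day, at.Hour, at.Minute, 0, DateTimeKind.Utc);

        // One row per (code, minute) instead of one row per click.
        await _db.Database.ExecuteSqlInterpolatedAsync($"""
            INSERT INTO click_counts (code, minute, clicks)
            VALUES ({context.Message.Code}, {minute}, 1)
            ON CONFLICT (code, minute) DO UPDATE SET clicks = click_counts.clicks + 1
            """);
    }
}
// Registered alongside the bus: x.AddConsumer<ClickCounterConsumer>() inside AddMassTransit(...).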

What scale-out path does the design support?

Each component scales independently: the stateless app tier adds replicas behind the load balancer, Redis moves to a cluster only when the hot set outgrows one node, Postgres gains read replicas for the redirect path long before it needs sharding, and the analytics queue simply gets more consumers.

The only design choice that hurts at scale is the auto-increment ID - distributed sequences need a separate service (Snowflake-style ID generator) above ~10K shortens/sec.
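
A minimal sketch of that Snowflake-style approach, assuming the common 41/10/12 bit split (milliseconds since a custom epoch, machine id, per-millisecond sequence). Illustrative only - it does not handle clocks moving backwards:

public sealed class SnowflakeIdGenerator
{
    private static readonly DateTimeOffset Epoch = new(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);  // arbitrary custom epoch
    private readonly long _machineId;   // 0..1023, unique per app instance
    private readonly object _lock = new();
    private long _lastTimestamp = -1;
    private long _sequence;

    public SnowflakeIdGenerator(long machineId) => _machineId = machineId & 0x3FF;

    public long NextId()
    {
        lock (_lock)
        {
            var now = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
            if (now == _lastTimestamp)
            {
                _sequence = (_sequence + 1) & 0xFFF;              // 4,096 ids per ms per machine
                if (_sequence == 0)                               // sequence exhausted: wait for the next ms
                    while (now <= _lastTimestamp)
                        now = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
            }
            else
            {
                _sequence = 0;
            }
            _lastTimestamp = now;
            return (now << 22) | (_machineId << 12) | _sequence; // ids stay roughly time-ordered, so base62 codes stay short
        }
    }
}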

What failure modes need monitoring?

The design implies three alerts. Redis down or cold: the 5K req/s redirect peak falls on a Postgres node sized for ~500 req/s, so watch cache hit rate and database latency together. Queue backlog: redirects keep working but click counts go stale; monitor consumer lag, not just queue depth. 404 spikes: usually code enumeration or a partner shipping broken links, and the cheapest abuse signal you have.
When is a custom URL shortener overkill?

Two cases.

One: low volume. If you shorten 100 URLs/day for an internal tool, a Bitly account is cheaper. The cost of building the schema, analytics, and abuse handling is not worth it at that scale.

Two: rich tracking needs. UTM parameters, conversion tracking, A/B link tests - these are 80% of what specialised products (Bitly, Rebrandly) charge for. Building from scratch means rebuilding their feature set, not just the redirect.

Where should you go from here?

Next case study: design a rate limiter - the rate-limit middleware from chapter 14 was the consumer side; the case study shows how to build a distributed limiter from scratch. After that the news feed and chat case studies get more involved.

Frequently asked questions

Hash or counter for the short key?
Counter encoded as base62 wins for reads (sequential, cache-friendly) and for storage (no collision retry). Hash (MD5/SHA truncated) wins when the key must be unguessable - public link sharing where users should not enumerate. Most production systems are counter-based with a salt to obscure the order; collision handling is unnecessary.
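
One illustrative way to add that salt on top of the Base62 helper from the implementation section - an assumption of this sketch, not the article's approach: XOR the id with a fixed constant before encoding. XOR with a constant is a bijection, so codes stay unique and decode back to the id; it only hides the raw counter from casual inspection, and libraries such as Hashids.net package a stronger version of the same idea.

public static class ObfuscatedCode
{
    // Arbitrary salt; kept below 2^62 so the XOR result stays a positive long.
    private const long Salt = 0x5DEECE66D;

    public static string FromId(long id)   => Base62.Encode(id ^ Salt);
    public static long   ToId(string code) => Base62.Decode(code) ^ Salt;   // Decode is the inverse sketched earlier
}
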
301 or 302 for the redirect?
302 (temporary) for analytics, 301 (permanent) for performance. 301 lets browsers cache the redirect forever - subsequent clicks bypass your service, which is great for load but terrible for click counting. Most shorteners use 302 because click tracking is the product. Cache-Control: private, max-age=86400 on the 302 keeps day-long re-clicks in the browser cache.
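
If you want that header, a small addition to the redirect handler before it returns - a sketch using GetTypedHeaders, the built-in way to set a typed Cache-Control, with the one-day MaxAge matching the value above:

ctx.Response.GetTypedHeaders().CacheControl =
    new CacheControlHeaderValue { Private = true, MaxAge = TimeSpan.FromDays(1) };
return Results.Redirect(url, permanent: false);   // still a 302; repeat clicks within a day stay in the browser
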
Why not just hash the long URL?
Two problems: (1) hash collisions are real even at 8 chars and add a retry loop on writes; (2) the same long URL hashes to the same short key, so two users get the same key - usually undesired since each user wants their own link with their own analytics. Counter-based gives one short per shorten request and avoids both problems.
How do I handle the 'celebrity URL' case?
One short URL going viral generates millions of redirect requests per minute. The cache absorbs almost all of it - the URL is hot, Redis answers in <1 ms. Click events go to a queue and are aggregated downstream; never write to a counter on every click. The news feed case study covers the same hot-key pattern in more depth.