ClickHouse — Real-Time Analytics Database for Large-Scale Systems

Posted on: 4/25/2026 10:10:31 AM

When your system starts processing billions of events per day — from access logs and monitoring metrics to clickstream data — traditional databases begin to choke. PostgreSQL, MySQL, and SQL Server are all designed for OLTP (transaction processing), not for scanning millions of rows in the blink of an eye. That's exactly where ClickHouse comes in — an open-source columnar database, designed from the ground up for real-time analytics at massive scale.

  • 100x faster than PostgreSQL for analytical queries
  • 1B+ rows inserted per second (cluster)
  • 6x less storage compared to row-store
  • <100ms query latency on petabyte-scale data

1. Why Do We Need a Columnar Database?

In traditional row-oriented databases, each row is stored contiguously on disk. When you run an analytical query like SELECT AVG(price) FROM orders WHERE created_at > '2026-01-01', the database has to read every complete row — including columns you don't need (customer_name, address, phone...) — before extracting the price column.

Columnar databases flip this logic: data for each column is stored contiguously. The query above only needs to read 2 columns (price and created_at), completely skipping the rest. With a 50-column table, you only read 4% of the data.
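You can observe this effect directly with ClickHouse's EXPLAIN ESTIMATE, which reports how many parts, rows, and index marks a query would have to read (the orders table here is hypothetical):

```sql
-- Hypothetical table; EXPLAIN ESTIMATE reports the parts/rows/marks to be read
EXPLAIN ESTIMATE
SELECT avg(price)
FROM orders
WHERE created_at > '2026-01-01';
```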

graph LR
    subgraph ROW["Row-Oriented Storage"]
        R1["Row 1: id|name|price|date|addr"]
        R2["Row 2: id|name|price|date|addr"]
        R3["Row 3: id|name|price|date|addr"]
    end
    subgraph COL["Column-Oriented Storage"]
        C1["id: 1, 2, 3, 4, 5..."]
        C2["name: A, B, C, D, E..."]
        C3["price: 10, 20, 15, 30..."]
        C4["date: 01, 02, 03, 04..."]
    end
    Q["SELECT AVG(price) WHERE date > X"] --> COL
    Q -.->|"Reads entire rows"| ROW
    style ROW fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style COL fill:#e8f5e9,stroke:#4CAF50,color:#2c3e50
    style Q fill:#e94560,stroke:#fff,color:#fff

How row-store vs column-store read data differently

2. How Does ClickHouse Work?

2.1. MergeTree Engine Architecture

At the heart of ClickHouse is the MergeTree engine family — a storage engine designed for extremely fast writes and high-performance analytical queries. When you INSERT data, ClickHouse doesn't insert rows one by one into a B-tree index like traditional databases. Instead:

  1. Writes as immutable "parts": Each INSERT batch creates a "part" — a directory containing data sorted by primary key, with each column stored as a separate file.
  2. Background merging: ClickHouse continuously merges smaller parts into larger ones in the background, similar to how LSM-trees work — but optimized for columnar data.
  3. Sparse primary index: Instead of indexing every row, ClickHouse only stores an index entry for every 8,192 rows (a granule). A 1-billion-row table needs only ~122,000 index entries — small enough to fit entirely in RAM.
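Each part shows up as a row in the system.parts table, which makes the write-then-merge behavior easy to inspect (assuming a table named api_logs, like the one designed later in this post):

```sql
-- Active parts for a table: name, row count, sparse-index marks, size on disk
SELECT name, rows, marks, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'api_logs' AND active
ORDER BY modification_time DESC;
```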
graph TD
    INSERT["INSERT batch"] --> P1["Part 1 (sorted, immutable)"]
    INSERT --> P2["Part 2"]
    INSERT --> P3["Part 3"]
    P1 --> MERGE["Background Merge"]
    P2 --> MERGE
    P3 --> MERGE
    MERGE --> BIG["Merged Part (larger, optimized)"]
    BIG --> QUERY["Query Engine"]
    QUERY --> IDX["Sparse Index (1 entry / 8192 rows)"]
    IDX --> SCAN["Column Scan (reads only needed columns)"]
    style INSERT fill:#e94560,stroke:#fff,color:#fff
    style MERGE fill:#2c3e50,stroke:#fff,color:#fff
    style BIG fill:#16213e,stroke:#fff,color:#fff
    style QUERY fill:#4CAF50,stroke:#fff,color:#fff
    style IDX fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style SCAN fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

Data flow in the MergeTree Engine

2.2. Vectorized Query Execution

ClickHouse processes data in batches (chunks) rather than one row at a time — a technique called vectorized execution, inspired by the MonetDB/X100 model. Each operator in the query pipeline receives a block of thousands of rows, applies computations to the entire block at once, fully leveraging SIMD instructions (Single Instruction, Multiple Data) of modern CPUs.

Why is vectorized execution faster?

In the traditional Volcano model (row-at-a-time), each row must pass through a chain of virtual function calls. With 1 billion rows, that's 1 billion virtual function invocations. Vectorized execution reduces this to just ~122,000 calls (1B / 8,192) — a 99.99% reduction in overhead. Combined with SIMD, the CPU can process 4-16 values simultaneously in a single instruction cycle.
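The block size handed to each pipeline operator is a tunable server setting; you can inspect the values on your own instance (max_block_size and max_threads are real ClickHouse setting names, defaults vary by version):

```sql
-- Rows per block handed to each pipeline operator, and the query thread pool size
SELECT name, value
FROM system.settings
WHERE name IN ('max_block_size', 'max_threads');
```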

2.3. Smart Data Compression

Because data in the same column shares the same type and often has repeating patterns, ClickHouse achieves extremely high compression ratios:

| Codec | Best For | Compression Ratio |
|---|---|---|
| LZ4 | Default, balanced speed and compression | 3-5x |
| ZSTD | Cold data, prioritize storage savings | 5-10x |
| Delta + ZSTD | Timestamp columns, monotonically increasing values | 10-20x |
| T64 | Integer columns with small range | 8-15x |
| DoubleDelta | Time-series data | 15-40x |

A 1TB log table in PostgreSQL can shrink to just 60-150GB in ClickHouse — significant storage cost savings, especially on cloud infrastructure.
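Codecs are declared per column. A minimal sketch for a hypothetical time-series table, plus a system.columns query to verify the ratio actually achieved once data is inserted:

```sql
-- Hypothetical table with explicit per-column codecs
CREATE TABLE sensor_readings
(
    ts DateTime CODEC(DoubleDelta, ZSTD),
    sensor_id UInt32 CODEC(T64, LZ4),
    value Float64 CODEC(Gorilla)
)
ENGINE = MergeTree()
ORDER BY (sensor_id, ts);

-- After inserting data, check the per-column compression ratio
SELECT
    name,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE table = 'sensor_readings';
```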

3. MergeTree Engine Variants

ClickHouse offers more than just the basic MergeTree — the family includes several variants for different use cases:

| Engine | Function | Use Case |
|---|---|---|
| ReplacingMergeTree | Automatically deduplicates rows with the same primary key during merge | CDC (Change Data Capture), upsert patterns |
| AggregatingMergeTree | Pre-aggregates data during the merge process | Materialized views, metric rollups |
| CollapsingMergeTree | Supports update/delete logic by inserting "cancel" rows | Data with frequently changing state |
| SummingMergeTree | Automatically sums numeric columns during merge | Counters, revenue aggregation |
| VersionedCollapsingMergeTree | Like Collapsing but supports out-of-order inserts | Distributed ingestion without guaranteed ordering |
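As a concrete example, an upsert-style feed with ReplacingMergeTree might look like this (table and columns are illustrative). Keep in mind that deduplication only happens at merge time, so queries that need exact results must use FINAL:

```sql
-- Illustrative upsert pattern: the latest row per user_id wins (by updated_at)
CREATE TABLE user_profiles
(
    user_id UInt64,
    email String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- FINAL forces deduplication at query time (slower, but exact)
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```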

4. Practical Schema Design for Log Analytics

Imagine you're building an API monitoring system handling hundreds of millions of requests per day. Here's how to design an optimized schema in ClickHouse:

CREATE TABLE api_logs
(
    timestamp DateTime64(3),
    date Date DEFAULT toDate(timestamp),
    service LowCardinality(String),
    endpoint LowCardinality(String),
    method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4, 'PATCH' = 5),
    status_code UInt16,
    response_time_ms UInt32,
    request_size UInt32,
    response_size UInt32,
    user_id UInt64,
    trace_id UUID,
    error_message Nullable(String),
    tags Map(String, String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (service, endpoint, timestamp)
TTL date + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;

Key Design Principles

  • LowCardinality: Use for string columns with few unique values (service name, endpoint). Creates automatic dictionary encoding, reducing storage 5-10x and speeding up filters.
  • Enum8/Enum16: For columns with a fixed set of values (HTTP method). Stores only 1-2 bytes instead of character strings.
  • ORDER BY: Order from low cardinality to high. Filtering by service + endpoint becomes extremely fast thanks to the sparse index.
  • PARTITION BY: Partition by month. TTL automatically deletes data older than 90 days — no cron job needed.
  • Map type: Flexible for non-fixed metadata (tags, labels) without needing ALTER TABLE.

4.1. Sample Analytical Queries

-- Top 10 slowest endpoints in the last 24 hours
SELECT
    service,
    endpoint,
    quantile(0.95)(response_time_ms) AS p95_ms,
    quantile(0.99)(response_time_ms) AS p99_ms,
    count() AS total_requests,
    countIf(status_code >= 500) AS errors,
    round(countIf(status_code >= 500) * 100.0 / count(), 2) AS error_rate
FROM api_logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY service, endpoint
ORDER BY p95_ms DESC
LIMIT 10;
-- Hourly traffic pattern analysis (heatmap data)
SELECT
    toHour(timestamp) AS hour,
    toDayOfWeek(timestamp) AS day_of_week,
    count() AS requests,
    avg(response_time_ms) AS avg_latency
FROM api_logs
WHERE date >= today() - 7
GROUP BY hour, day_of_week
ORDER BY day_of_week, hour;

5. Materialized Views — Pre-compute for Speed

One of ClickHouse's most powerful features is Materialized Views — allowing you to pre-aggregate data as it's inserted, without needing a separate ETL pipeline.

-- Create destination table for per-minute metrics
CREATE TABLE api_metrics_1m
(
    timestamp DateTime,
    service LowCardinality(String),
    endpoint LowCardinality(String),
    requests AggregateFunction(count, UInt64),
    p95_latency AggregateFunction(quantile(0.95), UInt32),
    errors AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (service, endpoint, timestamp);

-- Materialized view auto-aggregates on INSERT
CREATE MATERIALIZED VIEW api_metrics_1m_mv
TO api_metrics_1m AS
SELECT
    toStartOfMinute(timestamp) AS timestamp,
    service,
    endpoint,
    countState() AS requests,
    quantileState(0.95)(response_time_ms) AS p95_latency,
    sumState(if(status_code >= 500, 1, 0)) AS errors
FROM api_logs
GROUP BY timestamp, service, endpoint;

Real-World Impact

Querying the source table api_logs (1 billion rows): ~2-5 seconds. The same query on api_metrics_1m (pre-aggregated data only): ~10-50ms. That's a 100-500x speedup, with zero additional ETL effort.
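Because api_metrics_1m stores intermediate aggregation states (AggregateFunction columns), reading it requires the matching -Merge combinators to finalize those states:

```sql
-- Finalize the stored aggregate states with -Merge combinators
SELECT
    service,
    endpoint,
    countMerge(requests) AS total_requests,
    quantileMerge(0.95)(p95_latency) AS p95_ms,
    sumMerge(errors) AS errors
FROM api_metrics_1m
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service, endpoint;
```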

6. ClickHouse vs Other Analytics Solutions

| Criteria | ClickHouse | PostgreSQL | Elasticsearch |
|---|---|---|---|
| Storage Model | Columnar (OLAP) | Row-oriented (OLTP) | Inverted index + doc store |
| Analytical Query Speed | Ultra-fast (~ms on billions of rows) | Slow (minutes-hours on billions) | Moderate (seconds-minutes) |
| INSERT Throughput | Millions of rows/sec | Thousands of rows/sec | Tens of thousands of docs/sec |
| Data Compression | Excellent (6-20x) | Average (2-3x) | Poor (1.5-3x) |
| Full-text Search | Basic (bloom filter) | Good (tsvector) | Excellent (inverted index) |
| UPDATE/DELETE | Limited (async mutation) | Excellent (MVCC) | Good (doc-level) |
| Operational Cost | Low (less RAM, less storage) | Medium | High (RAM-heavy) |
| SQL Support | Very comprehensive | Most complete | Limited (DSL-first) |

7. Production Deployment Architecture

A production analytics system typically combines ClickHouse with other components following the Kappa architecture pattern:

graph LR
    APP["Application Services"] --> KAFKA["Apache Kafka / Redpanda"]
    KAFKA --> CH["ClickHouse Cluster"]
    CH --> GRAFANA["Grafana Dashboard"]
    CH --> API["Analytics API (.NET / Node.js)"]
    API --> UI["Frontend (Vue.js)"]
    CH --> MV["Materialized Views"]
    MV --> CH
    KAFKA --> REDIS["Redis (real-time cache)"]
    REDIS --> API
    style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style KAFKA fill:#2c3e50,stroke:#fff,color:#fff
    style CH fill:#e94560,stroke:#fff,color:#fff
    style GRAFANA fill:#16213e,stroke:#fff,color:#fff
    style API fill:#4CAF50,stroke:#fff,color:#fff
    style UI fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style MV fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style REDIS fill:#ff9800,stroke:#fff,color:#fff

Kappa Architecture with ClickHouse as the analytics core

Data Flow Explained

  1. Ingestion: Application services send events to Kafka/Redpanda. They should never write directly to ClickHouse, because many small INSERTs are an anti-pattern.
  2. Batch insert: The ClickHouse Kafka engine or a consumer service reads batches from Kafka and inserts them in large blocks (10,000-100,000 rows/batch) — the optimal approach.
  3. Pre-aggregation: Materialized Views automatically aggregate data in real-time upon insert, serving dashboards faster.
  4. Query layer: Grafana connects directly (official plugin). An API layer serves custom analytics for the frontend.
  5. Hot cache: Redis caches frequently-requested query results, reducing ClickHouse load under high concurrent queries.
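Step 2 can be done entirely inside ClickHouse with the Kafka table engine plus a materialized view that drains the topic into api_logs. A minimal sketch — broker, topic, and consumer-group names are placeholders, and the column list is abbreviated:

```sql
-- Kafka engine table: a streaming "view" over a topic (names are placeholders)
CREATE TABLE api_logs_queue
(
    timestamp DateTime64(3),
    service String,
    endpoint String,
    status_code UInt16,
    response_time_ms UInt32
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'api-logs',
         kafka_group_name = 'clickhouse-ingest',
         kafka_format = 'JSONEachRow';

-- Materialized view moves consumed rows into the MergeTree table in batches
CREATE MATERIALIZED VIEW api_logs_consumer TO api_logs AS
SELECT * FROM api_logs_queue;
```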

8. Query Performance Optimization

8.1. The Golden Rule: Read Less Data

ClickHouse is fastest when it reads the least data. Every optimization revolves around this principle:

Query Optimization Checklist

  • Filter first, transform later: Always put WHERE clauses filtering on columns in the ORDER BY key. ClickHouse uses the sparse index to skip entire irrelevant granules.
  • Use PREWHERE: For small filter columns. ClickHouse reads the PREWHERE column first, eliminates non-matching rows, then reads remaining columns — significantly reducing I/O.
  • Avoid SELECT *: Only select the columns you need. Each additional column = additional I/O.
  • Projections: Create alternative physical sort orders for queries with filter patterns different from the PRIMARY KEY.
  • Skip indexes: Bloom filter indexes for high-cardinality columns (trace_id, user_id) not in the ORDER BY.
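The projections item from the checklist can be applied to the existing api_logs table; this sketch builds an alternative physical sort order for user-centric lookups:

```sql
-- Alternative sort order for queries that filter on user_id instead of service
ALTER TABLE api_logs
    ADD PROJECTION by_user
    (
        SELECT * ORDER BY user_id, timestamp
    );

-- Rebuild the projection for data already in the table
ALTER TABLE api_logs MATERIALIZE PROJECTION by_user;
```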

8.2. PREWHERE and Skip Index Examples

-- Add bloom filter index for trace_id
ALTER TABLE api_logs
ADD INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- ClickHouse auto-moves lightweight filters to PREWHERE
-- But you can force it when needed:
SELECT timestamp, endpoint, response_time_ms, error_message
FROM api_logs
PREWHERE service = 'payment-service'
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND status_code >= 500;

9. ClickHouse Cloud and SharedMergeTree

ClickHouse Cloud (managed service) introduces SharedMergeTree — a new engine that completely separates storage and compute:

| Feature | MergeTree (self-hosted) | SharedMergeTree (Cloud) |
|---|---|---|
| Storage | Local disk or attached volume | Object storage (S3/GCS) |
| Compute Scaling | Must rebalance data when adding nodes | Add stateless nodes, query immediately |
| Merge Process | Runs on query nodes, affects latency | Runs on separate nodes, no query impact |
| Cost | Pay for servers 24/7 | Pay-per-query + storage |
| Best For | Stable workloads, full control | Variable workloads, cost optimization |

10. When NOT to Use ClickHouse

ClickHouse is not a silver bullet

  • OLTP applications: Need frequent UPDATE/DELETE, ACID transactions → use PostgreSQL, SQL Server.
  • Complex full-text search: Need relevance scoring, fuzzy matching → use Elasticsearch/OpenSearch.
  • Small data (<100GB): PostgreSQL with proper indexes is fast enough, no need for extra complexity.
  • Heavy multi-table JOINs: ClickHouse supports JOINs but isn't optimized like traditional RDBMS for complex join patterns.
  • Point lookups: Fetching a single row by primary key — row-stores are faster.

Conclusion

ClickHouse has proven itself as the leading solution for real-time data analytics at massive scale. With its columnar architecture, vectorized execution, and smart compression system, it enables processing billions of rows in milliseconds — something traditional databases need minutes or hours for. When combined with Kafka for ingestion, Redis for caching, and Grafana for visualization, ClickHouse becomes the centerpiece of a modern analytics architecture, meeting the increasingly complex observability and business intelligence needs of modern software systems.
