October 3, 2023 • 10 min read • Part 3 of 10

Part 3: The MergeTree Family

Understanding the core engine of ClickHouse: MergeTree, Sorting, and Partitioning.

ClickHouse MergeTree Engines

If ClickHouse is a race car, MergeTree is its V12. It is the most widely used and most versatile table engine family in ClickHouse.

What is MergeTree?

The MergeTree family of engines is designed for inserting very large amounts of data into a table. The data is quickly written to the table part by part, and then rules are applied for merging these parts in the background.

Key Concepts

Primary Key

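Unlike in most databases, the primary key in ClickHouse does not enforce uniqueness. It determines how rows are sorted inside each part and which columns the sparse primary index covers. That index stores one entry per granule (8192 rows by default, controlled by the index_granularity setting) rather than one entry per row. If you omit the PRIMARY KEY clause, ClickHouse uses the ORDER BY expression as the primary key.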
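
To make the non-unique primary key concrete, here is a minimal sketch. The table name pk_demo and its rows are made up for illustration. The final statement uses EXPLAIN with indexes = 1 to show how the primary index is applied to the query.

```sql
-- Hypothetical demo table; ORDER BY doubles as the primary key here
-- because no PRIMARY KEY clause is given.
CREATE TABLE pk_demo
(
    user_id UInt64,
    url     String
)
ENGINE = MergeTree
ORDER BY user_id;

-- Two rows with the same primary key value are both accepted.
INSERT INTO pk_demo VALUES (1, '/home'), (1, '/pricing');

-- Returns 2: the key sorts and indexes rows, it does not deduplicate them.
SELECT count() FROM pk_demo WHERE user_id = 1;

-- Shows which parts and granules the primary index selects for this filter.
EXPLAIN indexes = 1
SELECT count() FROM pk_demo WHERE user_id = 1;
```

If you need duplicates collapsed, that is a job for other members of the family (such as ReplacingMergeTree), not for the primary key itself.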

Partitioning

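Partitioning splits your data into logical blocks (e.g., by month). Parts that belong to different partitions are never merged with each other. Queries that filter on the partitioning column can skip whole partitions, and data management gets cheaper because an entire partition can be dropped, detached, or moved at once.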

Creating a Table

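CREATE TABLE hits (
    timestamp  DateTime,
    user_id    UInt64,
    url        String,
    event_type Enum('view' = 1, 'click' = 2)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)  -- one partition per calendar month
ORDER BY (timestamp, user_id);    -- sorting key; also the primary key, since no PRIMARY KEY is given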
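
As a rough sketch of what the PARTITION BY clause buys you, the statements below insert a few rows into hits, list the resulting parts per partition through the system.parts table, and then drop one month. The sample values are invented for illustration.

```sql
-- Made-up sample rows spanning two months.
INSERT INTO hits VALUES
    ('2023-09-30 23:59:00', 1, '/home', 'view'),
    ('2023-10-01 00:01:00', 1, '/home', 'click'),
    ('2023-10-02 12:00:00', 2, '/docs', 'view');

-- One INSERT that touches two partitions creates at least one part in each.
SELECT partition, name, rows
FROM system.parts
WHERE database = currentDatabase() AND table = 'hits' AND active
ORDER BY partition, name;

-- A filter on the partitioning column lets ClickHouse skip other months.
SELECT count() FROM hits WHERE timestamp >= '2023-10-01 00:00:00';

-- Remove all of September 2023 without rewriting the remaining data.
ALTER TABLE hits DROP PARTITION 202309;
```

Dropping a partition is usually much cheaper than deleting the same rows with a mutation, which is one reason time-based partitioning is common for data with a retention period.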

How Merges Work

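Every INSERT creates at least one new “part” on disk for each partition it touches, and the rows inside a part are stored sorted by the ORDER BY key. In the background, ClickHouse merges parts of the same partition into larger sorted parts. This keeps the number of parts (and files) manageable, so queries have fewer parts to read.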
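
You can watch this happen on the hits table from earlier. The following is a sketch for experimenting on a small table, with made-up rows. Background merges run on their own schedule, so the exact part names and counts you see may differ.

```sql
-- Two separate INSERTs into the same month usually leave two active parts.
INSERT INTO hits VALUES ('2023-10-03 10:00:00', 1, '/home', 'view');
INSERT INTO hits VALUES ('2023-10-03 10:05:00', 2, '/docs', 'click');

-- "level" counts how many merge generations a part has been through.
SELECT partition, name, rows, level
FROM system.parts
WHERE database = currentDatabase() AND table = 'hits' AND active
ORDER BY name;

-- Force an unscheduled merge. Handy for experiments, expensive on large tables.
OPTIMIZE TABLE hits FINAL;

-- Each partition should now be down to a single active part with a higher level.
SELECT partition, name, rows, level
FROM system.parts
WHERE database = currentDatabase() AND table = 'hits' AND active
ORDER BY name;
```

In production you would normally let the background merges do this work and send fewer, larger INSERT batches instead of calling OPTIMIZE.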

Conclusion

Understanding MergeTree is crucial for mastering ClickHouse. In the next part, we’ll look at how to model your data effectively.


