October 4, 2023 • 10 min read • Part 4 of 10

Part 4: Data Modeling

Best practices for schema design, primary keys, and denormalization in ClickHouse.

Tags: ClickHouse, Data Modeling, Schema Design

Modeling data in ClickHouse differs from modeling for a traditional relational database. Forget Third Normal Form (3NF); here, we embrace denormalization.

Denormalization is King

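Joins are expensive in analytical and distributed systems. ClickHouse’s hash-based joins build the right-hand table into an in-memory hash table, and on a cluster the joined data may also have to travel between nodes. It’s therefore often faster to store wide tables with many columns, resolving relationships once at insert time, than to join multiple tables on every query.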
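To make the trade-off concrete, here is a minimal Python sketch (plain Python, not ClickHouse code) of resolving a lookup once, at write time, so that later queries scan a single wide table. The `orders` and `customers` data and the `denormalize` and `revenue_by_country` helpers are hypothetical; in practice this step would live in your ingestion pipeline.

```python
# Hypothetical normalized source data: a fact table (orders) and a dimension (customers).
customers = {
    1: {"name": "Acme", "country": "DE"},
    2: {"name": "Globex", "country": "US"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250},
    {"order_id": 101, "customer_id": 2, "amount": 75},
    {"order_id": 102, "customer_id": 1, "amount": 120},
]

def denormalize(orders, customers):
    """Resolve the customer lookup once, at write time, producing wide rows."""
    wide = []
    for o in orders:
        c = customers[o["customer_id"]]
        wide.append({**o, "customer_name": c["name"], "customer_country": c["country"]})
    return wide

def revenue_by_country(rows):
    """A single-table aggregation: no join is needed at query time."""
    totals = {}
    for r in rows:
        totals[r["customer_country"]] = totals.get(r["customer_country"], 0) + r["amount"]
    return totals

wide_orders = denormalize(orders, customers)
print(revenue_by_country(wide_orders))  # {'DE': 370, 'US': 75}
```

The cost of this design is duplication: the customer’s name and country are copied into every order row, so if a customer’s attributes change, existing rows keep the old values unless you rewrite them.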

Choosing a Primary Key

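In a MergeTree table, the primary key is by default the sorting key you declare in ORDER BY. It determines how rows are sorted on disk within each data part, and ClickHouse builds a sparse index from it: one index entry per granule (8,192 rows by default) rather than one per row. A query that filters on key columns can use this index to skip granules that cannot match, which is why the choice of key is critical for query performance.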
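ClickHouse reads data in granules (blocks of rows, 8,192 by default) and keeps one primary-index entry per granule. The following is a simplified Python model of that idea, not ClickHouse’s actual implementation: rows are sorted by a single-column key, the index stores the first key of each granule, and a lookup binary-searches those entries to decide which granules to read. `GRANULE = 4` is a deliberately tiny stand-in for the default, and all function names are illustrative.

```python
import bisect

# Stand-in for index_granularity; ClickHouse's default is 8192 rows per granule.
GRANULE = 4

def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep one entry per granule (its first key), not one per row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]

def granules_to_read(index, value):
    """Indexes of granules that may contain `value`.

    Granule g holds keys between index[g] and index[g + 1] (inclusive), so it
    can match only if index[g] <= value and, unless it is the last granule,
    value <= index[g + 1].
    """
    first = max(bisect.bisect_left(index, value) - 1, 0)
    last = bisect.bisect_right(index, value)  # exclusive upper bound
    return list(range(first, last))

def lookup(sorted_keys, index, value, granule=GRANULE):
    """Scan only the selected granules; return (matching rows, rows read)."""
    matches = rows_read = 0
    for g in granules_to_read(index, value):
        chunk = sorted_keys[g * granule:(g + 1) * granule]
        rows_read += len(chunk)
        matches += chunk.count(value)
    return matches, rows_read

keys = [1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 8, 9, 9, 9, 10]  # 16 rows, sorted by the key
index = build_sparse_index(keys)                          # [1, 3, 6, 9]
print(lookup(keys, index, 7))  # (2, 4): 2 matches found by reading 4 of 16 rows
```

The index stays small because it grows with the number of granules, not rows, yet it still lets a selective filter on the key skip most of the data.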

Rule of Thumb

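Choose the columns you filter on most often in your WHERE clauses, and put low-cardinality columns first, ordering key columns roughly from lowest to highest cardinality. Long runs of identical values in the leading columns let the index skip granules even when a query filters only on a later key column, and they also compress better.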
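Why does column order matter for the rule of thumb above? The Python sketch below is a simplified, conservative model loosely based on how a sparse index can prune granules for a filter on the second key column: it can only exclude a granule when both bounding index entries share the same first-column value. The table (4 hypothetical `event_type` values by 1,000 `user_id` values, one row per pair) and the helper names are assumptions for illustration, and `GRANULE = 100` is scaled down from the 8,192-row default.

```python
EVENT_TYPES = ["click", "purchase", "signup", "view"]  # low cardinality (4 values)
USER_IDS = range(1000)                                 # high cardinality (1,000 values)
GRANULE = 100  # scaled down from ClickHouse's default of 8192 rows

def may_contain(mark, next_mark, col, value):
    """Could the granule starting at `mark` hold a row whose key[col] == value?"""
    if col == 0:
        # The first key column is sorted globally: bound it by this mark and the next.
        return mark[0] <= value and (next_mark is None or value <= next_mark[0])
    # The second key column is sorted only within a run of equal first-column
    # values, so we can prune only when both marks share that first value.
    if next_mark is not None and mark[0] == next_mark[0]:
        return mark[1] <= value <= next_mark[1]
    return True

def granules_read(rows, col, value, granule=GRANULE):
    """Count granules the model must read for the filter key[col] == value."""
    rows = sorted(rows)  # sort the tuples as the table's ORDER BY would
    marks = [rows[i] for i in range(0, len(rows), granule)]
    return sum(
        may_contain(m, marks[g + 1] if g + 1 < len(marks) else None, col, value)
        for g, m in enumerate(marks)
    )

low_first = [(e, u) for e in EVENT_TYPES for u in USER_IDS]   # ORDER BY (event_type, user_id)
high_first = [(u, e) for e in EVENT_TYPES for u in USER_IDS]  # ORDER BY (user_id, event_type)

# Granules read (out of 40) for WHERE user_id = 42 and WHERE event_type = 'purchase':
print(granules_read(low_first, 1, 42), granules_read(low_first, 0, "purchase"))    # 8 11
print(granules_read(high_first, 0, 42), granules_read(high_first, 1, "purchase"))  # 1 40
```

In this model, putting the low-cardinality column first keeps both filters selective (8 and 11 of 40 granules), while putting the high-cardinality column first is ideal for `user_id` but forces a full scan for `event_type`. Granules that straddle a change in the first column are always read, which is why the counts are not even lower.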

Data Types Matter

Using the narrowest correct data type saves storage and improves speed, because queries read and process fewer bytes.

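Use LowCardinality(String) for strings with few unique values; it dictionary-encodes the column and works best below roughly 10,000 distinct values.
Use DateTime (second precision, 4 bytes) or DateTime64 (sub-second precision, 8 bytes) for timestamps instead of storing them as strings.
Use the smallest UInt* type (UInt8, UInt16, UInt32, UInt64) that fits your non-negative numbers.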
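For a rough sense of scale, the following back-of-envelope Python estimates compare storing every string in full with a simple dictionary encoding (the idea behind LowCardinality), and compare a timestamp string with a 4-byte DateTime. This is a simplified model under stated assumptions: it ignores compression (ClickHouse compresses columns with LZ4 by default), length prefixes, and the real on-disk layout, and the helpers `smallest_uint`, `plain_string_bytes`, and `dictionary_bytes` are hypothetical.

```python
# Fixed widths of some ClickHouse types, in bytes.
TYPE_BYTES = {"UInt8": 1, "UInt16": 2, "UInt32": 4, "UInt64": 8,
              "DateTime": 4, "DateTime64": 8}

def smallest_uint(max_value):
    """Narrowest UInt* type that can hold every value in 0..max_value."""
    for name, bits in (("UInt8", 8), ("UInt16", 16), ("UInt32", 32), ("UInt64", 64)):
        if max_value < 2 ** bits:
            return name
    raise ValueError("value does not fit in UInt64")

def plain_string_bytes(values):
    """Rough size if every row stores its own string (no prefixes, no compression)."""
    return sum(len(v.encode()) for v in values)

def dictionary_bytes(values):
    """Rough size with dictionary encoding: each distinct string once,
    plus one index per row using the narrowest unsigned type that fits."""
    dictionary = sorted(set(values))
    index_width = TYPE_BYTES[smallest_uint(len(dictionary) - 1)]
    return sum(len(v.encode()) for v in dictionary) + index_width * len(values)

browsers = ["Chrome", "Firefox", "Safari", "Edge"] * 250_000  # 1,000,000 rows
print(plain_string_bytes(browsers))                         # 5750000
print(dictionary_bytes(browsers))                           # 1000023
print(len("2023-10-04 12:00:00"), TYPE_BYTES["DateTime"])   # 19 4
print(smallest_uint(200), smallest_uint(70_000))            # UInt8 UInt32
```

Even before compression, the dictionary-encoded column is under a fifth of the plain size here, and a DateTime stores the same second-precision timestamp in 4 bytes instead of 19.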

Conclusion

A well-designed schema is the foundation of a fast ClickHouse cluster. In the next part, we’ll learn how to get data INTO that schema.

← Part 3 - The MergeTree Family Next: Part 5 - Ingestion →

