Part 4: Data Modeling
Best practices for schema design, primary keys, and denormalization in ClickHouse.
Modeling data in ClickHouse is different from traditional relational databases. Forget 3rd Normal Form; here, we embrace denormalization.
Denormalization is King
Joins in distributed systems are expensive. In ClickHouse, it’s often better to store wide tables with many columns rather than joining multiple tables at query time.
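To make the trade-off concrete, here is a minimal sketch. The tables page_views, users, and page_views_wide and all their columns are hypothetical names chosen for illustration, not part of any real schema.

```sql
-- Hypothetical normalized layout: the country lives in a separate
-- users table, so every query pays for a join.
SELECT u.country, count() AS views
FROM page_views AS v
INNER JOIN users AS u ON u.user_id = v.user_id
WHERE v.event_date = '2024-01-01'
GROUP BY u.country;

-- Hypothetical denormalized layout: country is stored on every row,
-- so the same question becomes a single-table scan.
SELECT country, count() AS views
FROM page_views_wide
WHERE event_date = '2024-01-01'
GROUP BY country;
```

The wide table repeats the country on every row, but because ClickHouse stores and compresses each column separately, repeated values usually cost far less space than they would in a row-oriented database.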
Choosing a Primary Key
In ClickHouse, the ORDER BY clause of a MergeTree table determines how data is sorted on disk and, unless a separate PRIMARY KEY is specified, also serves as the primary key. This is critical for query performance.
Rule of Thumb
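Choose columns that you frequently filter on in your WHERE clauses, and order them from lowest to highest cardinality. Putting low-cardinality columns first keeps filters on the later key columns effective and tends to improve compression.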
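As a sketch of this rule, assume a hypothetical page-view table whose queries mostly filter by country and date. The table name, columns, and key order below are illustrative assumptions, not a recommendation for any particular workload.

```sql
-- Hypothetical table: names and key choice are assumptions.
CREATE TABLE page_views_wide
(
    event_date Date,
    country    LowCardinality(String),
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree
-- Key columns run from fewest distinct values to most.
ORDER BY (country, event_date, user_id);

-- A filter on the leading key columns lets ClickHouse skip
-- ranges of rows instead of scanning the whole table.
SELECT count()
FROM page_views_wide
WHERE country = 'DE' AND event_date = '2024-01-01';
```

To check whether a given query actually uses the key, prefix it with EXPLAIN indexes = 1 and look at how many granules are selected.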
Data Types Matter
Using the narrowest data type that fits your data saves storage and speeds up queries, because there are fewer bytes to read and decompress.
- Use LowCardinality(String) for strings with few unique values.
- Use DateTime or DateTime64 for timestamps.
- Use UInt* types for non-negative numbers.
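The guidelines above could be applied along these lines. The sensor_readings table and its columns are hypothetical, and the right integer widths depend on your actual value ranges.

```sql
-- Hypothetical table: names and type widths are assumptions.
CREATE TABLE sensor_readings
(
    ts        DateTime64(3),          -- millisecond-precision timestamp
    status    LowCardinality(String), -- a handful of distinct values
    sensor_id UInt32,                 -- non-negative identifier
    retries   UInt8                   -- small non-negative counter
)
ENGINE = MergeTree
ORDER BY (sensor_id, ts);

-- After loading data, compare on-disk size per column to see
-- what each type choice costs.
SELECT name, type, data_compressed_bytes, data_uncompressed_bytes
FROM system.columns
WHERE database = currentDatabase() AND table = 'sensor_readings';
```

Running the second query before and after a type change is a simple way to confirm that the change pays off on your own data.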
Conclusion
A well-designed schema is the foundation of a fast ClickHouse cluster. In the next part, we’ll learn how to get data INTO that schema.