Part 4: Data Modeling & Schema Design
How to design tables for a distributed system. Why UUIDs are your best friend and SERIAL is your enemy.
In a single-node database, you don’t worry much about where data lives. In a distributed database, data locality is everything.
The Primary Key is King
In CockroachDB, the Primary Key determines how data is sorted and distributed across the cluster. It’s not just a unique constraint; it’s the physical location of your data.
The Problem with SERIAL
In Postgres, we love SERIAL (auto-incrementing integers). In CockroachDB, this is a bad idea.
Why? Because 1, 2, 3, 4… are all close together. This creates a “hot range”. All your writes will go to the single node that holds the end of the table.
Enter UUIDs
Use UUID for primary keys. They are random, which means your data is spread evenly across all ranges (and thus all nodes).
-- Good
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
amount DECIMAL
);
JSONB Support
CockroachDB has excellent support for JSONB (Binary JSON). This allows you to have a semi-structured schema, which is great for rapid development.
SELECT * FROM users WHERE metadata->>'role' = 'admin';
You can even create inverted indexes on JSONB columns for blazing fast lookups.
Conclusion
Distributed data modeling requires a shift in thinking. Avoid hotspots, use UUIDs, and leverage JSONB. Next, let’s break things.