November 4, 2023 • 10 min read • Part 4 of 10

Part 4: Data Modeling & Schema Design

How to design tables for a distributed system. Why UUIDs are your best friend and SERIAL is your enemy.

CockroachDB Data Modeling Schema

In a single-node database, you don’t worry much about where data lives. In a distributed database, data locality is everything.

The Primary Key is King

In CockroachDB, the Primary Key determines how data is sorted and distributed across the cluster. It’s not just a unique constraint; it’s the physical location of your data.

The Problem with SERIAL

In Postgres, we love SERIAL (auto-incrementing integers). In CockroachDB, this is a bad idea.

Why? Because 1, 2, 3, 4… are all close together. This creates a “hot range”. All your writes will go to the single node that holds the end of the table.

Enter UUIDs

Use UUID for primary keys. They are random, which means your data is spread evenly across all ranges (and thus all nodes).

-- Good
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    amount DECIMAL
);

JSONB Support

CockroachDB has excellent support for JSONB (Binary JSON). This allows you to have a semi-structured schema, which is great for rapid development.

SELECT * FROM users WHERE metadata->>'role' = 'admin';

You can even create inverted indexes on JSONB columns for blazing fast lookups.

Conclusion

Distributed data modeling requires a shift in thinking. Avoid hotspots, use UUIDs, and leverage JSONB. Next, let’s break things.

← Part 3 - SQL Compatibility Next: Part 5 - Fault Tolerance →

Tags: CockroachDB Data Modeling Schema

← Back to Blog