Cassandra Data Modelling

Basic Modelling Principles

Cassandra is a distributed database in which data is partitioned and stored across multiple nodes within a cluster. Two goals guide the data model:

  1. Spread data evenly around the cluster: every node in the cluster should hold roughly the same amount of data.
  2. Minimize the number of partitions read: a query should read rows from as few partitions as possible. Partitions are groups of rows that share the same partition key.
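The two goals can be illustrated with a small toy model. This is only a sketch under stated assumptions: the MD5 hash and the modulo step stand in for Cassandra's real partitioner and token ring, and the row fields (`catalog`, `supplier_id`, `item_id`) and function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def node_for(partition_key: str, num_nodes: int) -> int:
    """Toy placement: hash the partition key and map it to a node index.

    Assumption: MD5 plus modulo is a stand-in for the real partitioner.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def group_by_partition(rows, key_fn):
    """Group rows into partitions; rows with the same key share a partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_fn(row)].append(row)
    return dict(partitions)


# Hypothetical rows, for illustration only.
rows = [
    {"catalog": "office", "supplier_id": "s1", "item_id": "i1"},
    {"catalog": "office", "supplier_id": "s1", "item_id": "i2"},
    {"catalog": "tools", "supplier_id": "s2", "item_id": "i1"},
]

# Partition by catalog name: the two "office" rows land in the same partition.
partitions = group_by_partition(rows, lambda r: r["catalog"])
for key, members in partitions.items():
    print(key, "-> node", node_for(key, 3), "rows:", len(members))
```

All rows of a partition live together on the node chosen for its key, so the choice of partition key decides both how evenly data spreads and how many partitions a query has to touch.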

Use Case

Defining the data model, and in particular the partition key, for the Ariba Catalog solution is challenging because its customer base is diverse and spans industry verticals. Typical queries look up items either by catalog name or by item key, i.e. the compound key of supplier id and item id, for a particular customer.

Catalog Name as the partition key

Using the catalog name as the partition key makes looking up items by catalog name efficient. However, looking up items by item key will be suboptimal, because a catalog can have millions of items.
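To make the trade-off concrete, here is a minimal sketch that counts partition sizes when the catalog name is the partition key, which in CQL terms would be something like `PRIMARY KEY ((catalog_name), supplier_id, item_id)`. The column names, catalog names and item counts are all made up for illustration.

```python
from collections import Counter

# Hypothetical data set: one large catalog and two small ones.
items = (
    [("big_catalog", "supplier_1", f"item_{i}") for i in range(1000)]
    + [("small_a", "supplier_2", f"item_{i}") for i in range(10)]
    + [("small_b", "supplier_3", f"item_{i}") for i in range(10)]
)

# Partition key = catalog name, so every item of a catalog shares one partition.
partition_sizes = Counter(catalog for catalog, _supplier, _item in items)


def partitions_read_for_catalog(catalog_name: str) -> int:
    """A catalog lookup touches only the single partition of that catalog."""
    return 1 if catalog_name in partition_sizes else 0


largest = max(partition_sizes.values())
share_of_largest = largest / len(items)

print("partition sizes:", dict(partition_sizes))
print("partitions read for big_catalog:", partitions_read_for_catalog("big_catalog"))
print("share of all rows in the largest partition:", round(share_of_largest, 2))
```

In this toy data set the catalog lookup reads a single partition, but almost all rows end up in one partition, which is the situation a catalog with millions of items would create.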

Item key as the partition key

While this will spread items evenly across nodes, looking up items by catalog name will be highly inefficient, because the query has to scan many partitions.
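The opposite effect can be sketched the same way. With the item key as the partition key, roughly `PRIMARY KEY ((supplier_id, item_id))` in CQL, each item becomes its own partition. The column names, sample data and helper names below are hypothetical.

```python
# Hypothetical items: (catalog name, supplier id, item id).
items = [
    ("catalog_a", "supplier_1", "item_1"),
    ("catalog_a", "supplier_1", "item_2"),
    ("catalog_a", "supplier_2", "item_1"),
    ("catalog_b", "supplier_3", "item_1"),
]


def partition_key(supplier_id: str, item_id: str) -> tuple:
    """Partition key = item key, i.e. the (supplier id, item id) pair."""
    return (supplier_id, item_id)


def partitions_read_for_item(supplier_id: str, item_id: str) -> int:
    """An item-key lookup goes straight to one partition (if it exists)."""
    key = partition_key(supplier_id, item_id)
    return 1 if any(partition_key(s, i) == key for _c, s, i in items) else 0


def partitions_read_for_catalog(catalog_name: str) -> int:
    """A catalog lookup must visit every partition holding one of its items."""
    return len({partition_key(s, i) for c, s, i in items if c == catalog_name})


print("item lookup:", partitions_read_for_item("supplier_1", "item_2"))
print("catalog_a lookup:", partitions_read_for_catalog("catalog_a"))
```

The number of partitions read for a catalog grows with the number of items in it, so a catalog with millions of items would mean millions of partitions to visit.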

Hybrid approach — Enforcing the number of partitions

Anand

Experienced technical leader in designing and developing scalable products across domains.