During some recent pre-production testing on a Cassandra cluster, I came across an odd load imbalance:
We’re using the random partitioner and a replication factor of 2 with the NetworkTopologyStrategy. The replication is working (hence the 200% Effective-Ownership), but there is an obvious load imbalance between nodes.
I tracked down the problem using nodetool’s “describering” command:
The “endpoints” indicate which nodes receive row copies for a given key range. A sketch of the endpoints shows what’s happening:
As expected, we are replicating across our two racks. However, all the replicated data is going to only node 1-a and node 2-a.
Why? The DataStax documentation gives the answer:
With NetworkTopologyStrategy, replica placement is determined independently within each data center (or replication group). The first replica per data center is placed according to the partitioner (same as with SimpleStrategy). Additional replicas in the same data center are then determined by walking the ring clockwise until a node in a different rack from the previous replica is found.
Going back to the output of the original “ring” command, here’s a visualization of the token assignments:
So if a key belongs to node 1-a, we place the replicated row by walking clockwise along the ring until we hit node 2-a, the first node in a different rack. However, under the same logic, a key belonging to node 1-b will also replicate to node 2-a!
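To make the walk concrete, here is a minimal Python sketch of the placement rule quoted above. It is a simplified model, not Cassandra's actual implementation: it assumes a single data center, four equally sized token ranges, and a ring laid out in the order implied by the walk just described. The rack names and helper functions are made up for illustration.

```python
from collections import Counter

# Hypothetical ring in clockwise token order: (node name, rack).
# Node and rack names are illustrative, not taken from real cluster output.
RING = [("1-a", "rack1"), ("1-b", "rack1"), ("2-a", "rack2"), ("2-b", "rack2")]


def replicas_for(primary_index, ring, replication_factor=2):
    """Return the nodes holding copies of the range owned by ring[primary_index].

    Simplified model of the documented rule: the first replica is the owner,
    and each additional replica is the next node clockwise whose rack differs
    from the rack of the previous replica.
    """
    replicas = [ring[primary_index]]
    step = 1
    while len(replicas) < replication_factor and step < len(ring):
        candidate = ring[(primary_index + step) % len(ring)]
        if candidate[1] != replicas[-1][1]:
            replicas.append(candidate)
        step += 1
    return [name for name, _rack in replicas]


def ranges_per_node(ring, replication_factor=2):
    """Count how many token ranges each node stores (owned plus replicated)."""
    counts = Counter({name: 0 for name, _rack in ring})
    for i in range(len(ring)):
        counts.update(replicas_for(i, ring, replication_factor))
    return dict(counts)


print(ranges_per_node(RING))
```

With the racks grouped together on the ring like this, two of the nodes end up storing three ranges each while the other two store only the single range they own.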
Once we understood what was going on, the imbalance was easy to fix. We changed our token assignments to alternate between racks:
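As a rough sketch of how such an assignment could be computed, the Python snippet below interleaves nodes rack by rack and spaces their tokens evenly over the random partitioner's range of 0 to 2**127 - 1. It assumes every rack has the same number of nodes. The rack names and the helper function are hypothetical, and the printed values are only candidates for each node's initial_token setting, not the tokens we actually used.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in the range 0 .. 2**127 - 1

# Hypothetical node names grouped by rack, matching the labels used in this post.
RACKS = {"rack1": ["1-a", "1-b"], "rack2": ["2-a", "2-b"]}


def alternating_tokens(racks):
    """Interleave nodes rack by rack and give them evenly spaced tokens.

    Assumes every rack contains the same number of nodes.
    """
    # zip(*...) takes one node from each rack in turn: 1-a, 2-a, 1-b, 2-b.
    interleaved = [node for group in zip(*racks.values()) for node in group]
    spacing = RING_SIZE // len(interleaved)
    return {node: i * spacing for i, node in enumerate(interleaved)}


for node, token in alternating_tokens(RACKS).items():
    print(f"{node}: initial_token = {token}")
```

Because neighbours on the ring now always sit in different rack groups, each range's replica lands on the very next node clockwise.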
Now node 1-a replicates to node 2-a and node 1-b replicates to node 2-b:
Each of the four nodes now has 50% effective-ownership of the key space. Problem solved.