Command Palette

Search for a command to run...

Level 3 · 30 min

Sharding

Sharding distributes data across multiple servers (shards) to scale beyond single-node capacity. Choosing the right shard key is the most critical sharding decision — a wrong choice is difficult and expensive to reverse.

Shard Key Selection

A shard key determines how documents are distributed across shards. Good shard key criteria: high cardinality (many distinct values — enables fine-grained distribution), even write distribution (no single shard receives disproportionate writes), query isolation (queries can target a specific shard vs scatter-gather across all). Targeted queries include the shard key in the filter — mongos routes to the specific shard(s). Scatter-gather queries do not include the shard key — mongos broadcasts to all shards and merges results (expensive). Bad shard key examples: low-cardinality fields (country with 50 values = 50 possible shards maximum), monotonically increasing fields (all writes go to the last shard).

Hash vs Range Sharding

Range sharding distributes chunks based on shard key value ranges. Adjacent shard key values land on the same shard. Pros: efficient range queries (all documents in a date range on one shard). Cons: monotonically increasing keys (ObjectId, timestamp) create write hotspots — all new documents go to the last shard. Hash sharding applies a hash function to the shard key before distributing. Hash-adjacent values are distributed across shards. Pros: even write distribution even for monotonically increasing keys. Cons: range queries scatter across all shards (hash destroys order). Zone sharding (tag-based) assigns shard key ranges to specific shards — used for geo-local data, compliance requirements, or tiered storage. Key insight from MongoDB: The Definitive Guide (3rd ed., Bradshaw, Brazil, Chodorow): default chunk size is 64 MB, providing 'a good balance' between migration overhead and distribution granularity. The authors are explicit on shard key immutability: 'Once you shard a collection you cannot change your shard key, so it is important to choose correctly.' Shard key cardinality directly bounds the maximum number of shards: a logLevel field with only 'DEBUG', 'WARN', 'ERROR' values means 'MongoDB wouldn't be able to break up your data into more than three chunks' regardless of cluster size.

Chunk Management and Balancing

Data is distributed as chunks (default 128 MB). Each chunk covers a contiguous range of shard key values. The balancer automatically migrates chunks between shards when the distribution becomes uneven (threshold: 8 chunks difference). Chunk splits occur when a chunk grows beyond the max size. Jumbo chunks: chunks that cannot be split because all documents in the chunk have the same shard key value — a symptom of low shard key cardinality. Jumbo chunks cannot be migrated by the balancer, causing permanent imbalance. The balancer runs in the background and causes some performance overhead during migrations — schedule balancing windows during off-peak hours with balancerSchedule.

Key Takeaways

  • The shard key cannot be changed after sharding (MongoDB 5.0+ allows limited resharding). Choose carefully upfront — model your query patterns before selecting.
  • Monotonically increasing shard keys (timestamp, ObjectId) cause write hotspots with range sharding. Use hashed sharding for uniform write distribution.
  • Jumbo chunks are a sign of low shard key cardinality. They cannot be balanced, causing permanent uneven distribution. Prevent by choosing high-cardinality shard keys.

Code example

// Enable sharding on a database\nsh.enableSharding('myapp')\n\n// Shard a collection with hashed shard key (avoids hotspot)\nsh.shardCollection(\n  'myapp.events',\n  {tenantId: 1, _id: 'hashed}\n)\n\n// Check shard distribution\ndb.events.getShardDistribution()\n\n// Check for jumbo chunks\ndb.getSiblingDB('config').chunks.find({jumbo: true}).count()