Level 3 · 30 min

Clustering

Redis Cluster provides horizontal scaling and automatic failover. It shards data across multiple master nodes using 16384 hash slots, with each master optionally having replica nodes for high availability.

Hash Slots

Redis Cluster assigns each key to a hash slot by computing CRC16 (XMODEM variant) of the key, modulo 16384. If the key contains a hash tag, only the substring between the first { and the first } that follows it is hashed, provided that substring is not empty: CLUSTER KEYSLOT {user:1001}.session and CLUSTER KEYSLOT {user:1001}.cart both return the same slot value.

Nodes communicate via a gossip protocol on the cluster bus port, which by default is the client port plus 10000. Each node gossips cluster state to a random subset of peers every second.

A MOVED redirect is permanent: the node responsible for that slot has changed, and the client should update its slot-to-node routing table. An ASK redirect is transient: it occurs during slot migration (MIGRATING/IMPORTING state). The client must prefix the redirected command with ASKING and must NOT update its routing table, because the slot is still in motion.

Use CLUSTER KEYSLOT key to inspect slot assignment during debugging.
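The slot calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the server's own code: the function names crc16_xmodem and hash_slot are chosen here for the example, and keys are assumed to be UTF-8 strings.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 XMODEM: polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Return the hash slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; a key such as "{}foo" is hashed whole.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    # Same hash tag, so both keys land in the same slot.
    print(hash_slot("{user:1001}.session") == hash_slot("{user:1001}.cart"))
    # Without braces the whole key is hashed, so the slots are computed independently.
    print(hash_slot("user:1001.session"), hash_slot("user:1001.cart"))
```

Comparing the result against CLUSTER KEYSLOT on a live node is a quick way to check that a client-side implementation agrees with the server.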

Failover Timing and cluster-node-timeout

A production incident: a Redis Cluster with 6 nodes (3 primaries, 3 replicas) had cluster-node-timeout set to 15000 ms. During a JVM GC pause on one primary lasting 12 seconds, replicas detected the primary as unreachable but had not yet exceeded the 15-second timeout, so no failover was initiated. Some clients that had already timed out began routing reads to replicas using READONLY and received data that was 12 seconds stale. After cluster-node-timeout was reduced to 5000 ms, a subsequent 4-second GC pause triggered failover in 4.8 seconds, and 47,000 writes were rejected with CLUSTERDOWN errors before the elected replica began accepting writes.

The tradeoff is explicit: a lower cluster-node-timeout shortens the failover window but increases false-positive failovers on transient network hiccups or short GC pauses.

Production insight from Redis in Action: Carlson documents that when a slave (replica) connects to a master, the master initiates a BGSAVE snapshot; if multiple slaves reconnect simultaneously after a network partition heals, they all share the same snapshot. "Though this is great from a memory standpoint, it can cause master Redis servers to have a lot of connections to deal with at once."

The gossip protocol that Cluster uses for node discovery scales gossip message volume quadratically with node count. Inspect cluster_stats_messages_ping_sent in the CLUSTER INFO output to verify that gossip overhead stays manageable as you add nodes beyond 12–15.
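One way to watch gossip overhead is to sample CLUSTER INFO twice and compare the ping counters. The sketch below only parses the text that CLUSTER INFO returns; the helper names parse_cluster_info and ping_rate are made up for this example, and the sample values are invented rather than taken from a real cluster.

```python
def parse_cluster_info(text: str) -> dict:
    """Turn the 'field:value' lines of CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        info[name] = value
    return info


def ping_rate(before: dict, after: dict, seconds: float) -> float:
    """Pings sent per second by this node between two samples."""
    field = "cluster_stats_messages_ping_sent"
    return (int(after[field]) - int(before[field])) / seconds


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart on the same node.
    first = parse_cluster_info(
        "cluster_state:ok\r\ncluster_known_nodes:6\r\n"
        "cluster_stats_messages_ping_sent:1000\r\n"
    )
    second = parse_cluster_info(
        "cluster_state:ok\r\ncluster_known_nodes:6\r\n"
        "cluster_stats_messages_ping_sent:1600\r\n"
    )
    print(ping_rate(first, second, 60.0))  # prints 10.0
```

Tracking this rate per node before and after adding nodes gives a concrete number to compare, rather than relying on the node count alone.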

Slot Migration and Deployment

Slot migration uses two intermediate states. To move slot 3999 from node A to node B, node A marks the slot as MIGRATING and node B marks it as IMPORTING. Keys still on A are served by A; keys already migrated are served by B. If a client asks A for a key that has already been migrated, A returns ASK (not MOVED), and the client must send ASKING before retrying on B.

In Docker and Kubernetes, Redis nodes must advertise external-facing addresses via cluster-announce-ip, cluster-announce-port, and cluster-announce-bus-port. Without these, gossip messages contain container-internal IPs that are unreachable by nodes on different hosts.

replica-lazy-flush yes (Redis 4.0+, named slave-lazy-flush before Redis 5.0) makes a replica free its old dataset asynchronously during a full resync, so the flush does not block the replica while the old data is reclaimed.

CLUSTER NODES shows the full cluster topology, slot ranges, and node roles.
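The difference between MOVED and ASK during migration comes down to what the client does with its routing table. The following is a rough sketch of that decision only, with no networking; SlotRouter, plan_retry, and parse_redirect are hypothetical names for this example, and real cluster-aware client libraries handle more cases (for instance connection pooling and retry limits).

```python
from dataclasses import dataclass


@dataclass
class Redirect:
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str):
    """Parse 'MOVED 3999 127.0.0.1:7001' or 'ASK ...'; return None otherwise."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


class SlotRouter:
    """Hypothetical client-side routing table: slot -> (host, port)."""

    def __init__(self):
        self.slot_to_node = {}

    def plan_retry(self, error: str, command: list):
        """Return (target_node, commands_to_send) for a redirect, else None."""
        redirect = parse_redirect(error)
        if redirect is None:
            return None
        target = (redirect.host, redirect.port)
        if redirect.kind == "MOVED":
            # Permanent: remember the new owner for future requests.
            self.slot_to_node[redirect.slot] = target
            return target, [command]
        # ASK is transient: send ASKING first and leave the table untouched.
        return target, [["ASKING"], command]


if __name__ == "__main__":
    router = SlotRouter()
    router.slot_to_node[3999] = ("127.0.0.1", 7000)
    print(router.plan_retry("ASK 3999 127.0.0.1:7001", ["GET", "mykey"]))
    print(router.slot_to_node[3999])  # still ('127.0.0.1', 7000)
```

Keeping the table unchanged on ASK matters because only some keys of the slot have moved; the next key in the same slot may still live on the source node.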

Key Takeaways

16384 hash slots distributed among masters. Keys hash to slots — clients are redirected if they hit the wrong node.
Hash tags ({user:1001}) co-locate related keys in the same slot, which multi-key operations such as MGET require.
Cluster provides automatic failover via replica promotion — transparent to clients with a cluster-aware client library.

Code example

# Hash tag: co-locate session and cart on same slot
SET {user:1001}.session "data"
SET {user:1001}.cart "[item1, item2]"
# Both keys hash only the tag content user:1001 → same slot → MGET works
MGET {user:1001}.session {user:1001}.cart

# CLUSTER INFO
127.0.0.1:7000> CLUSTER INFO
cluster_enabled:1
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_known_nodes:6  # 3 masters + 3 replicas

# MOVED redirect (client must retry at the address in the reply)
127.0.0.1:7000> GET mykey
(error) MOVED 3999 127.0.0.1:7001
# Client connects to 7001, retries, and updates its slot map
# (redis-cli -c follows redirects automatically)