Here's something nobody warns you about when you first add Redis to your stack: the caching itself is the easy part. You'll have it running in twenty minutes. What'll eat your next three weekends is figuring out when to invalidate, what TTL to use, and why your users are seeing stale data even though you're "definitely clearing the cache on update." I've shipped Redis caching into four different production Node.js apps over the years, and every single time, the bugs weren't in the Redis commands -- they were in the strategy.
So that's what this post is really about. Not just the GET and SET commands, but the actual patterns, trade-offs, and landmines you'll step on when Redis becomes a critical part of your architecture.
Why Redis Is the Default Choice (And When It Shouldn't Be)
Redis gives you sub-millisecond reads from memory. That's the pitch, and it's a good one. A database query that takes 40ms suddenly takes 0.3ms. Your API response times drop, your database stops sweating, and everyone's happy.
But Redis isn't just a dumb key-value store. It supports strings, hashes, lists, sets, sorted sets, streams, and more. That versatility is why people use it for everything from session management to rate limiting to real-time leaderboards to job queues. It's the Swiss Army knife of infrastructure.
That said, I want to be honest about when Redis is overkill or the wrong fit. If your app has 50 users and your database queries return in 5ms, adding Redis just adds operational complexity for negligible gain. If you're caching things that change every second, you'll spend more time invalidating than you save on reads. And if you don't have a plan for what happens when Redis goes down, you've turned an optional performance boost into a single point of failure. Start with the question "what problem am I solving?" not "how do I add Redis?"
Setting Up Redis Clients: ioredis vs node-redis
Two client libraries dominate the Node.js ecosystem: ioredis and node-redis. Both work fine. Both support async/await. I've used both in production. Here's my honest take: I default to ioredis. Its cluster and sentinel support has historically been more battle-tested, and its API feels more natural. But node-redis v4+ has closed the gap significantly, so pick whichever your team already uses.
# Install ioredis
npm install ioredis
# Or install node-redis
npm install redis
Here's a production-ready ioredis setup. Notice the retry strategy and lazyConnect -- you want control over when the connection actually happens, especially in testing.
const Redis = require('ioredis');
const redis = new Redis({
host: process.env.REDIS_HOST || '127.0.0.1',
port: parseInt(process.env.REDIS_PORT, 10) || 6379,
password: process.env.REDIS_PASSWORD || undefined,
db: 0,
retryStrategy: (times) => Math.min(times * 100, 3000), // back off, cap at 3s
maxRetriesPerRequest: 3,
lazyConnect: true
});
redis.on('connect', () => console.log('Redis connected'));
redis.on('error', (err) => console.error('Redis error:', err));
await redis.connect();
And the node-redis equivalent:
const { createClient } = require('redis');
const redis = createClient({
url: process.env.REDIS_URL || 'redis://localhost:6379',
socket: {
reconnectStrategy: (retries) => Math.min(retries * 100, 5000)
}
});
redis.on('connect', () => console.log('Redis connected'));
redis.on('error', (err) => console.error('Redis error:', err));
await redis.connect();
One thing that bites people in production: always configure a reconnect strategy. Redis connections drop. Networks hiccup. If your client doesn't auto-reconnect with backoff, your app will start throwing errors and never recover until you restart it. Also, enable TLS if Redis is accessed over any network boundary. I've seen credentials fly over plain TCP more times than I'd like to admit.
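Part of that failure plan is making sure a dead Redis degrades to a cache miss, not an outage. Here's a minimal sketch of a fail-open read wrapper -- `safeCacheGet`, the one-second timeout, and the loader-function shape are my own conventions, not part of either client library:

```javascript
// Fail-open cache read: any Redis error or slow response degrades to a
// cache miss, so the request still succeeds via the loader function.
async function safeCacheGet(redis, key, loadFn, ttlSeconds = 3600) {
  let cached = null;
  try {
    cached = await Promise.race([
      redis.get(key),
      new Promise((_, reject) => {
        const t = setTimeout(() => reject(new Error('cache read timeout')), 1000);
        if (t.unref) t.unref(); // don't hold the event loop open for the timer
      })
    ]);
  } catch (err) {
    console.warn('Cache read failed, falling back to source:', err.message);
  }
  if (cached) return JSON.parse(cached);

  const data = await loadFn();
  try {
    // Best-effort write-back; a failure here shouldn't fail the request
    await redis.set(key, JSON.stringify(data), 'EX', ttlSeconds);
  } catch (err) {
    console.warn('Cache write failed:', err.message);
  }
  return data;
}
```

The point of the wrapper is that Redis stays what it should be: a performance boost, not a dependency your uptime hangs on.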
The Cache-Aside Pattern (Your Bread and Butter)
This is the pattern you'll use 80% of the time, and it's the one you should reach for by default. The logic is dead simple: check the cache first. If it's there, return it. If not, fetch from the database, store it in the cache, and return it. Sometimes called "lazy loading" because you only cache data that's actually requested.
async function getUserById(userId) {
const cacheKey = `user:${userId}`;
// Step 1: Check the cache
const cached = await redis.get(cacheKey);
if (cached) {
console.log('Cache hit for', cacheKey);
return JSON.parse(cached);
}
// Step 2: Cache miss - fetch from database
console.log('Cache miss for', cacheKey);
const result = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
const user = result.rows[0];
if (!user) return null;
// Step 3: Store in cache with TTL
await redis.set(cacheKey, JSON.stringify(user), 'EX', 3600); // 1 hour TTL
return user;
}
Simple and effective. But here's the thing nobody mentions in the basic tutorials: this pattern has a nasty problem called the thundering herd. Imagine a popular cache key expires. In the next 100ms, 500 requests all hit that endpoint. Every single one sees a cache miss. Every single one queries the database. Your database gets hammered with 500 identical queries simultaneously. I've watched this take down a production Postgres instance.
The fix is a locking mechanism. Only one request actually fetches from the database; everyone else waits for it to populate the cache.
async function getUserByIdWithLock(userId) {
const cacheKey = `user:${userId}`;
const lockKey = `lock:${cacheKey}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Try to acquire a lock
const lockAcquired = await redis.set(lockKey, '1', 'EX', 10, 'NX');
if (lockAcquired) {
try {
// Double-check after acquiring lock
const recheck = await redis.get(cacheKey);
if (recheck) return JSON.parse(recheck);
const result = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
const user = result.rows[0];
if (user) {
await redis.set(cacheKey, JSON.stringify(user), 'EX', 3600);
}
return user;
} finally {
await redis.del(lockKey);
}
} else {
// Someone else is fetching. Wait and retry.
await new Promise(resolve => setTimeout(resolve, 200));
return getUserByIdWithLock(userId);
}
}
That NX flag on the lock SET is crucial -- it means "only set if the key doesn't already exist," which makes it an atomic lock acquisition. The 10-second expiry on the lock is your safety net in case the process holding the lock crashes. Without it, you'd deadlock forever. Ask me how I know.
Write-Through and Read-Through Patterns
Cache-aside works great for read-heavy workloads where you can tolerate brief staleness. But sometimes you need the cache to always reflect reality. That's where write-through comes in: every write to the database immediately updates the cache too.
async function updateUser(userId, updates) {
const cacheKey = `user:${userId}`;
// Step 1: Update the database
const result = await db.query(
'UPDATE users SET name = $1, email = $2, updated_at = NOW() WHERE id = $3 RETURNING *',
[updates.name, updates.email, userId]
);
const updatedUser = result.rows[0];
// Step 2: Update the cache immediately
if (updatedUser) {
await redis.set(cacheKey, JSON.stringify(updatedUser), 'EX', 3600);
}
return updatedUser;
}
async function createUser(userData) {
const result = await db.query(
'INSERT INTO users (name, email, password_hash) VALUES ($1, $2, $3) RETURNING *',
[userData.name, userData.email, userData.passwordHash]
);
const newUser = result.rows[0];
// Write to cache so subsequent reads are instant
await redis.set(`user:${newUser.id}`, JSON.stringify(newUser), 'EX', 3600);
return newUser;
}
The trade-off is clear: writes get slower (two operations instead of one), but your cache is never stale for that key. There's a subtle gotcha here too. If your database write succeeds but the Redis write fails, you now have inconsistency in the other direction -- the database has new data but the cache has old data. In high-stakes scenarios, you might want to wrap both in a try/catch and invalidate the cache key on Redis failure rather than leaving stale data sitting there.
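Here's what that defensive version might look like. This is a sketch: I'm passing the `db` and `redis` handles in explicitly so the helper is easy to test, and the query shape just mirrors the earlier example:

```javascript
async function updateUserSafe(db, redis, userId, updates) {
  const cacheKey = `user:${userId}`;
  const result = await db.query(
    'UPDATE users SET name = $1, email = $2, updated_at = NOW() WHERE id = $3 RETURNING *',
    [updates.name, updates.email, userId]
  );
  const updatedUser = result.rows[0];
  if (!updatedUser) return null;
  try {
    // Write-through: keep the cache in sync with the row we just wrote
    await redis.set(cacheKey, JSON.stringify(updatedUser), 'EX', 3600);
  } catch (err) {
    // Cache write failed: delete the key so readers fall through to the
    // database instead of seeing stale data. If the delete also fails,
    // the TTL on the old entry is the backstop.
    await redis.del(cacheKey).catch(() => {});
    console.error('Write-through failed, invalidated', cacheKey, err.message);
  }
  return updatedUser;
}
```

Note the order of fallbacks: fresh cache if possible, no cache if not, stale-until-TTL only as a last resort.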
The read-through pattern takes cache-aside and wraps it in a clean abstraction. Instead of scattering caching logic across every function, you build a caching layer that handles the fetch-on-miss automatically.
class CacheThrough {
constructor(redisClient, defaultTTL = 3600) {
this.redis = redisClient;
this.defaultTTL = defaultTTL;
}
async get(key, fetchFn, ttl) {
const cached = await this.redis.get(key);
if (cached) {
return JSON.parse(cached);
}
const data = await fetchFn();
if (data !== null && data !== undefined) {
await this.redis.set(key, JSON.stringify(data), 'EX', ttl || this.defaultTTL);
}
return data;
}
async invalidate(key) {
await this.redis.del(key);
}
async invalidatePattern(pattern) {
// SCAN instead of KEYS so a large keyspace never blocks the server
let cursor = '0';
do {
const [nextCursor, keys] = await this.redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
cursor = nextCursor;
if (keys.length > 0) {
await this.redis.del(...keys);
}
} while (cursor !== '0');
}
}
}
// Usage -- much cleaner than scattering cache logic everywhere
const cache = new CacheThrough(redis);
const user = await cache.get(
`user:${userId}`,
() => db.query('SELECT * FROM users WHERE id = $1', [userId]).then(r => r.rows[0]),
1800
);
I genuinely recommend building something like this early in your project. It pays for itself immediately. Every developer on the team uses the same caching interface, and when you need to change the caching behavior (add logging, add metrics, switch TTL strategies), you do it in one place.
TTL Strategies and Cache Invalidation (The Actually Hard Part)
There's a famous quote, usually attributed to Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things." Having dealt with both extensively, I can confirm cache invalidation is worse.
Setting the right TTL is more art than science. Too short and you're barely caching at all. Too long and your users see stale data. Here's what I've landed on after lots of trial and error:
// Different TTL strategies for different data types
const TTL = {
USER_PROFILE: 3600, // 1 hour - changes infrequently
PRODUCT_LIST: 300, // 5 minutes - changes moderately
STOCK_PRICE: 10, // 10 seconds - changes frequently
SITE_CONFIG: 86400, // 24 hours - changes very rarely
SESSION: 1800, // 30 minutes - security requirement
RATE_LIMIT: 60 // 1 minute - sliding window
};
await redis.set('user:42', data, 'EX', TTL.USER_PROFILE);
await redis.set('products:featured', data, 'EX', TTL.PRODUCT_LIST);
The key insight: TTL should be based on how much staleness your users can tolerate, not how often the data changes. A product catalog that updates once a day could still have a 5-minute TTL if seeing a slightly stale price for 5 minutes is acceptable.
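One refinement on top of picking a TTL (my own helper, not standard Redis vocabulary): add jitter so keys written at the same moment don't expire at the same moment. Synchronized expiry is how a deploy-time cache warm quietly turns into a thundering herd exactly one TTL later:

```javascript
// Stretch a base TTL by a random amount (up to `spread`, 10% by default)
// so keys cached together don't all expire in the same instant.
function withJitter(baseTTLSeconds, spread = 0.1) {
  const jitter = Math.floor(Math.random() * baseTTLSeconds * spread);
  return baseTTLSeconds + jitter;
}

// Usage: await redis.set('user:42', data, 'EX', withJitter(TTL.USER_PROFILE));
```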
For active invalidation, the simplest approach is event-based: when data changes, explicitly kill the relevant cache keys.
async function deleteUser(userId) {
await db.query('DELETE FROM users WHERE id = $1', [userId]);
// Invalidate all related cache entries
await redis.del(`user:${userId}`);
await redis.del(`user:${userId}:posts`);
await redis.del(`user:${userId}:followers`);
// Invalidate list caches that may contain this user
// (keys() is fine for a demo; in production use SCAN -- more on that below)
const listKeys = await redis.keys('users:list:*');
if (listKeys.length > 0) {
await redis.del(...listKeys);
}
}
This works, but it gets messy fast. You're now maintaining a mental model of every cache key that might be affected by every data change. Miss one, and you've got stale data. One approach that helps is version-based invalidation -- instead of deleting keys, you bump a version number and let old keys expire naturally.
async function getProductsWithVersion(category) {
const version = (await redis.get(`products:version:${category}`)) || '0'; // treat missing as v0
const cacheKey = `products:${category}:v${version}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
const products = await db.query(
'SELECT * FROM products WHERE category = $1',
[category]
);
await redis.set(cacheKey, JSON.stringify(products.rows), 'EX', 3600);
return products.rows;
}
// When products change, just bump the version
async function onProductUpdated(category) {
await redis.incr(`products:version:${category}`);
}
One critical production warning: never use the KEYS command in production. It scans every single key in your Redis instance and blocks the entire server while it does it. I've seen this cause a 30-second freeze on a Redis instance with 2 million keys. Use SCAN instead -- it does the same thing but iteratively, without blocking.
async function invalidateByPattern(pattern) {
let cursor = '0';
do {
const [nextCursor, keys] = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
cursor = nextCursor;
if (keys.length > 0) {
await redis.del(...keys);
}
} while (cursor !== '0');
}
Pub/Sub and Session Storage
If you're running multiple Node.js instances behind a load balancer (and in production, you should be), you have a problem: if instance A invalidates a cache key in Redis, that's fine. But what if instance A also has an in-memory LRU cache? How do the other instances know to clear their local copies? Enter Redis Pub/Sub.
// Publisher (e.g., in your API when data changes)
async function publishUserUpdate(userId, userData) {
await redis.publish('user:updated', JSON.stringify({ userId, data: userData }));
}
// Subscriber (e.g., in each application instance)
const subscriber = new Redis(); // Dedicated connection for subscriptions
subscriber.subscribe('user:updated', 'order:created', (err, count) => {
if (err) console.error('Subscribe error:', err);
console.log(`Subscribed to ${count} channels`);
});
subscriber.on('message', (channel, message) => {
const data = JSON.parse(message);
switch (channel) {
case 'user:updated':
// Invalidate local in-memory cache
localCache.delete(`user:${data.userId}`);
break;
case 'order:created':
// Trigger real-time notification
notifyAdmins(data);
break;
}
});
One gotcha that trips up basically everyone the first time: a Redis connection used for subscriptions can't be used for anything else. The moment you call subscribe(), that connection enters subscriber mode and will reject regular commands. Always create a separate dedicated connection for Pub/Sub.
For session storage, Redis is the no-brainer choice for Node.js apps. It lets you share sessions across multiple instances and survives app restarts. The setup with Express is straightforward:
const session = require('express-session');
const RedisStore = require('connect-redis').default;
const app = express();
app.use(session({
store: new RedisStore({
client: redis,
prefix: 'sess:',
ttl: 1800 // 30 minutes
}),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false,
cookie: {
secure: process.env.NODE_ENV === 'production',
httpOnly: true,
maxAge: 1800000 // 30 minutes in milliseconds
}
}));
Set resave: false and saveUninitialized: false. I see people leave these as true all the time and then wonder why their Redis memory usage keeps climbing. With those defaults, every single request creates or touches a session, even for unauthenticated users hitting your health check endpoint.
Rate Limiting with Redis
Redis is perfect for rate limiting because it gives you atomic operations and built-in expiration -- the two things you absolutely need for accurate rate limiting in a distributed system. I've seen people try to do rate limiting in application memory, and it falls apart the moment you have more than one instance. User hits instance A five times, then hits instance B five times, and suddenly they've made ten requests against a limit of five.
The sliding window approach using sorted sets is the most accurate method. It avoids the boundary problem that plagues fixed windows (where someone could make double the allowed requests by timing them at the edge of two windows).
async function slidingWindowRateLimit(identifier, limit, windowSeconds) {
const key = `ratelimit:${identifier}`;
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
const pipeline = redis.pipeline();
// Remove entries outside the window
pipeline.zremrangebyscore(key, 0, windowStart);
// Count entries in the window
pipeline.zcard(key);
// Add the current request
pipeline.zadd(key, now, `${now}:${Math.random()}`);
// Set expiry on the key
pipeline.expire(key, windowSeconds);
const results = await pipeline.exec();
const requestCount = results[1][1]; // exec() returns [err, result] pairs; [1] is the zcard
return {
allowed: requestCount < limit,
remaining: Math.max(0, limit - requestCount - 1),
resetAt: new Date(now + windowSeconds * 1000)
};
}
// Express middleware
async function rateLimitMiddleware(req, res, next) {
const identifier = req.ip;
const { allowed, remaining, resetAt } = await slidingWindowRateLimit(identifier, 100, 60);
res.set('X-RateLimit-Limit', '100');
res.set('X-RateLimit-Remaining', String(remaining));
res.set('X-RateLimit-Reset', resetAt.toISOString());
if (!allowed) {
return res.status(429).json({ error: 'Too many requests. Please try again later.' });
}
next();
}
Notice the use of pipeline() here. That batches all four Redis commands into a single round trip. Without pipelining, you'd make four separate network calls for every single incoming request. That latency adds up fast.
If you don't need sliding-window accuracy and want something simpler, a fixed window counter uses less memory and fewer operations:
async function fixedWindowRateLimit(identifier, limit, windowSeconds) {
const key = `ratelimit:${identifier}:${Math.floor(Date.now() / (windowSeconds * 1000))}`;
const count = await redis.incr(key);
if (count === 1) {
await redis.expire(key, windowSeconds);
}
return {
allowed: count <= limit,
remaining: Math.max(0, limit - count)
};
}
For most APIs, the fixed window approach is perfectly fine. Don't over-engineer the rate limiter when a simpler solution does the job. Save the sliding window for scenarios where abuse at window boundaries is a real concern (payment APIs, authentication endpoints).
Sentinel and Cluster: Running Redis in Production Without Losing Sleep
Running a single Redis instance in production is playing with fire. When -- not if -- that instance goes down, your cache disappears, your rate limiter stops working, and your sessions evaporate. Redis gives you two high-availability options, and you need at least one of them.
Redis Sentinel is the simpler option. It monitors your primary Redis instance and a set of replicas. If the primary dies, Sentinel automatically promotes a replica. Your application connects through Sentinel and gets transparently rerouted.
const redis = new Redis({
sentinels: [
{ host: 'sentinel-1.example.com', port: 26379 },
{ host: 'sentinel-2.example.com', port: 26379 },
{ host: 'sentinel-3.example.com', port: 26379 }
],
name: 'mymaster', // Name of the Sentinel master group
password: process.env.REDIS_PASSWORD,
sentinelPassword: process.env.SENTINEL_PASSWORD,
db: 0
});
redis.on('reconnecting', () => console.log('Redis reconnecting (failover?)'));
redis.on('end', () => console.log('Redis connection closed, no more retries'));
Redis Cluster shards your data across multiple nodes using hash slots. This is for when your data outgrows a single machine's memory, or you need write throughput beyond what one node can handle.
const Redis = require('ioredis');
const cluster = new Redis.Cluster([
{ host: 'redis-node-1.example.com', port: 6379 },
{ host: 'redis-node-2.example.com', port: 6379 },
{ host: 'redis-node-3.example.com', port: 6379 }
], {
redisOptions: {
password: process.env.REDIS_PASSWORD,
tls: process.env.NODE_ENV === 'production' ? {} : undefined
},
scaleReads: 'slave', // Read from replicas for better throughput
clusterRetryStrategy: (times) => Math.min(times * 200, 5000)
});
// Usage is the same as a regular Redis client
await cluster.set('key', 'value');
const value = await cluster.get('key');
The biggest gotcha with Redis Cluster: multi-key operations only work when all keys hash to the same slot. Commands like MGET, transactions, and Lua scripts that touch multiple keys will blow up if those keys live on different nodes. The workaround is hash tags -- wrap the common part of your key in curly braces so Redis hashes only that portion.
// These keys will be stored on the same node because {user:42} is the hash tag
await cluster.set('{user:42}:profile', profileData);
await cluster.set('{user:42}:settings', settingsData);
// Now MGET works safely
const [profile, settings] = await cluster.mget('{user:42}:profile', '{user:42}:settings');
My recommendation: start with Sentinel. It's simpler to operate, easier to reason about, and handles 95% of production use cases. You only need Cluster when you've genuinely outgrown a single node's memory or write capacity. Many teams jump to Cluster too early and spend months dealing with the operational complexity of resharding and cross-slot limitations when Sentinel would have been perfectly fine. Scale when the metrics tell you to, not when your architecture diagram looks cool.