Real-Time Backend 15 min read · June 4, 2026

Building a Production WebSocket Backend That Survives Real Traffic

Connection lifecycle, authentication, room design, backpressure, and Redis pub/sub — everything it actually takes to run a WebSocket backend under real user load. Drawn from Matrix Bingo, a live multiplayer game at scale.

Ahsan Habib

It is 2am. Your game is live, 800 players are connected, and you are watching memory climb. 38 KB per connection. 52 KB. 80 KB. The server is slowly drowning — and you have no idea why.

I know this problem from building Matrix Bingo, a real-time multiplayer game with WebSocket-powered game state, Redis pub/sub across multiple server nodes, and a video layer via LiveKit SFU. What follows is everything it actually took to make that backend survive real traffic: connection churn, mid-round disconnects on 4G, rooms exploding to 50 concurrent players, and horizontal scaling without a rewrite.

This is the guide I wish existed when I started. We cover the full lifecycle — from TCP handshake to graceful teardown — with working code, four architecture diagrams, and real benchmark numbers throughout.

The architecture, top to bottom

Before diving into each layer, here is the complete picture of the system we are building. Every section below maps to one piece of this diagram.

[Diagram: four tiers, top to bottom. Client tier: browser (WebSocket API), mobile app (native WS client), and PWA (service worker), all connecting over wss://. Edge: load balancer (Nginx / HAProxy) with IP-hash sticky sessions. Stateful layer: WS Server 1 and WS Server 2 (Node.js, ws library), each with an in-process RoomManager and a maximum of about 10 K open sockets, publishing and subscribing to Redis. Data tier: Redis (pub/sub channels, room state cache, port 6379, in-memory) and Postgres (users, auth, game history).]
FIGURE 1 — PRODUCTION WEBSOCKET ARCHITECTURE · Four distinct tiers; each horizontal slice is independently scalable

The key insight is the stateful layer in the middle. Every WebSocket connection lives on a specific server process — it cannot be re-routed like an HTTP request. That constraint shapes every architectural decision below.

1. The connection lifecycle — every step matters

A WebSocket connection is not just an upgraded HTTP request. It is a long-lived, full-duplex TCP session that passes through several states, each of which can fail in a different way. Understanding every step is how you stop debugging ghost connections at 2am.

WebSocket connection lifecycle:
  ① TCP connect + TLS: three-way handshake, certificate exchange.
  ② HTTP GET /ws?t=<token> with Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Key: <base64>. The server checks the token, the rate limit, and that the room exists. On failure: reject with 401 and close.
  ③ 101 Switching Protocols: the HTTP phase ends and the WS phase begins.
  ④ onopen() fires: ws.userId and ws.roomId are attached, and RoomManager.join(ws) is called.
  ⑤ Active session (loop): ping every 30 s with a pong expected back; onmessage() → validate → broadcast(); publish to Redis for cross-node delivery. No pong after 30 s → ws.terminate() immediately. A graceful close sends a close frame (code 1000); a crash or timeout ends in ws.terminate() (force close).
  ⑥ Cleanup: RoomManager.leave(ws), notify peers, GC empty rooms.
FIGURE 2 — CONNECTION LIFECYCLE · Authentication happens at the handshake (step ②), before any state is allocated

The three most common production mistakes map to specific steps: authenticating after onopen, not implementing the heartbeat (step ⑤), and not GC-ing rooms after disconnect (step ⑥). Fix those three and you eliminate 80% of production WebSocket incidents.

2. Authenticate at the handshake — not after

The most common mistake is letting the socket open, then waiting for an "auth" message. By that point you have already allocated memory, registered the socket, and created a DDoS surface. Authenticate at the HTTP upgrade request instead.

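Pass the JWT as a query parameter, since the browser WebSocket API cannot set custom headers on the handshake. The URL travels inside the TLS-encrypted stream, so it is not visible on the wire, but it can still end up in access logs: configure your proxy not to log the query string, and keep the token short-lived.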

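// server.js: authenticate during the HTTP upgrade, before the socket opens
const http      = require('http');
const WebSocket = require('ws');
const jwt       = require('jsonwebtoken');

const httpServer = http.createServer();
// noServer: we handle the upgrade ourselves. clientTracking stays at its
// default (true) so that server.clients is available to the heartbeat loop.
const server = new WebSocket.Server({ noServer: true });

httpServer.on('upgrade', (req, socket, head) => {
  let payload;
  try {
    const token = new URL(req.url, 'ws://localhost').searchParams.get('t');
    payload = jwt.verify(token, process.env.JWT_SECRET);
  } catch {
    // Reject at the HTTP layer: no WebSocket object is ever created
    socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
    socket.destroy();
    return;
  }

  server.handleUpgrade(req, socket, head, (ws) => {
    ws.userId  = payload.sub;
    ws.roomId  = payload.roomId;
    ws.isAlive = true;
    server.emit('connection', ws, req);
  });
});

server.on('connection', (ws) => {
  joinRoom(ws);
  ws.on('message', (data) => handleMessage(ws, data));
  ws.on('close',   ()     => leaveRoom(ws));
  ws.on('pong',    ()     => { ws.isAlive = true; });
});

httpServer.listen(8080);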
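
The server code calls handleMessage() without showing it. As a minimal sketch of the "validate, then broadcast" step from the lifecycle, assuming JSON text frames: the size limit, the allowed message types, and the injected broadcastFn parameter are all hypothetical choices made to keep the example self-contained, not Matrix Bingo's actual protocol.

```javascript
// Hypothetical limits and message types; adjust to your own protocol.
const MAX_BYTES = 4 * 1024;
const ALLOWED_TYPES = new Set(['mark', 'chat', 'ready']);

// Returns the parsed message, or null if the frame should be ignored.
function parseClientMessage(data) {
  const raw = typeof data === 'string' ? data : data.toString('utf8');
  if (Buffer.byteLength(raw) > MAX_BYTES) return null;

  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (msg === null || typeof msg !== 'object' || Array.isArray(msg)) return null;
  if (!ALLOWED_TYPES.has(msg.type)) return null;
  return msg;
}

// broadcastFn is injected here for testability; in the server above it
// would be the room-scoped broadcast(roomId, payload, exclude).
function handleMessage(ws, data, broadcastFn) {
  const msg = parseClientMessage(data);
  if (!msg) return false;

  // Never trust identity fields sent by the client: stamp them server-side.
  const out = JSON.stringify({ ...msg, userId: ws.userId });
  broadcastFn(ws.roomId, out, ws);
  return true;
}
```

The ws server constructor also accepts a maxPayload option, which rejects oversized frames before they reach your handler.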

The heartbeat is equally non-negotiable. TCP connections can go silent — a user switches from Wi-Fi to 4G, a proxy drops a NAT mapping, a mobile OS suspends the app — without sending a close frame. Without a heartbeat, those dead sockets accumulate indefinitely.

// Kill silent connections: ping every 30 seconds, terminate on a missed pong.
// Requires clientTracking (the ws default) so that server.clients exists.
const heartbeat = setInterval(() => {
  server.clients.forEach((ws) => {
    if (!ws.isAlive) {
      // terminate() emits 'close', so the close handler runs leaveRoom(ws) once
      ws.terminate();
      return;
    }
    ws.isAlive = false;
    ws.ping();
  });
}, 30_000);

// Stop pinging when the server shuts down
server.on('close', () => clearInterval(heartbeat));

In Matrix Bingo, adding the heartbeat dropped memory per connection from 180 KB back down to 38 KB. The 180 KB was not a code leak: it was 400 dead connections that had accumulated silently over an hour, each holding onto send buffers.

3. Room and channel design

A WebSocket server without room management is just a broadcast bus. Production systems need namespaced state: a message in one room must not leak to another, rooms must be garbage-collected when they empty, join and leave must be O(1), and a broadcast must touch only the members of its own room. The data structure that delivers all of this is a Map<roomId, Set<WebSocket>>.

[Diagram: the RoomManager holds a Map<roomId, Set<WebSocket>>. Three example rooms: game-room-a3f2 (3 connections, active game: U1, U2, U3), game-room-8c1e (2 connections, waiting: U4, U5), and lobby (5 connections, matchmaking: U6, U7, U8, +2). Each room has its own broadcast() arc and a room boundary with no cross-talk.]
FIGURE 3 — ROOM ARCHITECTURE · broadcast() iterates only the target room's Set — O(n) over room members, not all connections
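
// rooms.js: O(1) join and leave; O(members) broadcast
const WebSocket = require('ws'); // for the WebSocket.OPEN constant

const rooms = new Map(); // roomId → Set<WebSocket>

function joinRoom(ws) {
  if (!rooms.has(ws.roomId)) rooms.set(ws.roomId, new Set());
  rooms.get(ws.roomId).add(ws);
}

function leaveRoom(ws) {
  const room = rooms.get(ws.roomId);
  // Set.delete() returns false if ws was already removed, so a second call
  // (for example from both the heartbeat and the close handler) is a no-op
  if (!room || !room.delete(ws)) return;
  if (room.size === 0) {
    rooms.delete(ws.roomId); // ← GC immediately; nobody is left to notify
    return;
  }
  broadcast(ws.roomId, JSON.stringify({ type: 'leave', userId: ws.userId }), ws);
}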

function broadcast(roomId, payload, exclude = null) {
  const room = rooms.get(roomId);
  if (!room) return;
  for (const client of room) {
    if (client !== exclude && client.readyState === WebSocket.OPEN) {
      safeSend(client, payload);
    }
  }
}

The rooms.delete(ws.roomId) after a room empties is the most important line. Without it, empty Sets accumulate indefinitely — a slow memory leak that only shows up hours after sustained churn.
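
To see the effect of that line in isolation, here is a small simulation with stub objects standing in for sockets. The simulateChurn helper and its gcEmptyRooms flag are hypothetical, written only for this illustration; the join and leave logic mirrors rooms.js without the broadcast.

```javascript
// Creates roomCount short-lived rooms of 3 stub sockets each, lets everyone
// leave, and returns how many entries remain in the rooms Map.
function simulateChurn(roomCount, gcEmptyRooms) {
  const rooms = new Map(); // roomId -> Set of stub sockets

  const join = (ws) => {
    if (!rooms.has(ws.roomId)) rooms.set(ws.roomId, new Set());
    rooms.get(ws.roomId).add(ws);
  };

  const leave = (ws) => {
    const room = rooms.get(ws.roomId);
    if (!room) return;
    room.delete(ws);
    if (gcEmptyRooms && room.size === 0) rooms.delete(ws.roomId);
  };

  for (let i = 0; i < roomCount; i++) {
    const members = [1, 2, 3].map((n) => ({ roomId: `room-${i}`, userId: `u${n}` }));
    members.forEach((ws) => join(ws));
    members.forEach((ws) => leave(ws));
  }
  return rooms.size;
}

console.log(simulateChurn(1000, true));  // 0
console.log(simulateChurn(1000, false)); // 1000
```

Without the delete, every room that ever existed leaves an empty Set behind, which is exactly the kind of growth that only becomes visible after hours of churn.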

4. Backpressure — the silent killer

If you push data faster than a client can consume it, the unread bytes first fill the kernel's TCP send buffer. Once that buffer is full, Node queues further writes in user-space memory, and ws.bufferedAmount reports how many bytes are still waiting. If you never check it, that queue grows without bound until the process crashes or gets OOM-killed.

// Always check before writing
const HIGH_WATER = 16 * 1024; // 16 KB

function safeSend(ws, payload) {
  if (ws.readyState !== WebSocket.OPEN) return;

  if (ws.bufferedAmount > HIGH_WATER) {
    // For game-state snapshots: drop. Client will sync on next tick.
    // For chat/important messages: use a bounded queue per-connection instead.
    return;
  }

  ws.send(payload);
}

The 16 KB threshold works for Matrix Bingo because game-state frames are small and stale after one tick — dropping them is safe. For a chat system where every message must arrive, use a bounded per-connection queue and close connections whose queues overflow rather than letting them grow.
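
For the must-arrive case, a bounded per-connection queue could look roughly like the sketch below. The sendReliable and flush names, the MAX_QUEUED cap of 100, and the choice of close code 1013 are assumptions for illustration, not the article's implementation. It relies on the optional callback that ws.send() accepts to resume draining once a write completes.

```javascript
const HIGH_WATER = 16 * 1024; // same threshold as safeSend
const MAX_QUEUED = 100;       // hypothetical cap on queued messages per connection
const OPEN = 1;               // numeric value of WebSocket.OPEN

// Queue a message that must not be dropped. Returns false if the
// connection is not open or was closed for falling too far behind.
function sendReliable(ws, payload) {
  if (ws.readyState !== OPEN) return false;
  if (!ws.queue) ws.queue = [];

  ws.queue.push(payload);
  if (ws.queue.length > MAX_QUEUED) {
    ws.queue.length = 0;               // release the memory right away
    ws.close(1013, 'Client too slow'); // 1013 = "try again later"
    return false;
  }
  flush(ws);
  return true;
}

// Write queued messages while the socket buffer has room.
function flush(ws) {
  while (ws.queue.length > 0 && ws.bufferedAmount <= HIGH_WATER) {
    // The send callback fires once the write is handed off, so we retry then.
    ws.send(ws.queue.shift(), () => flush(ws));
  }
}
```

One limitation of this sketch: if the buffer is full because of writes made elsewhere without a callback, the queue only drains on the next sendReliable() call, so a real implementation would probably also call flush() from a timer or the heartbeat loop.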

5. Horizontal scaling with Redis pub/sub

A single Node.js process can typically handle on the order of 10,000–15,000 WebSocket connections before the event loop starts to strain; the exact ceiling depends on message rate and payload size. When you need more, you add nodes and immediately hit the statefulness problem: User A is on Server 1, User B is on Server 2, and a message from A must reach B.

The standard solution is Redis pub/sub as the cross-node message bus. Each server publishes outgoing broadcasts to a Redis channel and subscribes to all channels. When a message arrives on a subscription, the server delivers it only to its local connections.

[Diagram: cross-node message routing. ① User A, connected to WS Server 1 (Users A and B), sends a message; Server 1 calls broadcast() locally and PUBLISHes to the Redis channel room:game-abc. ② Redis routes the "game_state" payload to every subscriber of that channel. ③ WS Server 2 (Users C and D) receives the SUBSCRIBE notification and runs a local broadcast() to C and D only.]
FIGURE 4 — REDIS PUB/SUB · User A on Server 1 reaches User C and D on Server 2 with ~2 ms added latency
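
// redis.js: separate pub and sub clients (a subscribed connection cannot publish)
const crypto = require('crypto');
const Redis  = require('ioredis');
const { broadcast } = require('./rooms'); // assumes rooms.js exports broadcast

const NODE_ID = crypto.randomUUID(); // identifies this process in the envelope
const pub = new Redis(process.env.REDIS_URL);
const sub = new Redis(process.env.REDIS_URL);

// Subscribe to all room channels at startup
sub.psubscribe('room:*');

// Deliver to local connections when Redis notifies
sub.on('pmessage', (_pattern, channel, raw) => {
  const { origin, msg } = JSON.parse(raw);
  // This node is subscribed too, so it receives its own publishes.
  // Skip them: broadcastViaRedis() already delivered locally.
  if (origin === NODE_ID) return;
  const roomId = channel.slice('room:'.length);
  broadcast(roomId, msg); // local only: NEVER re-publish here
});

// Use this instead of broadcast() for cross-node messages
async function broadcastViaRedis(roomId, payload) {
  const msg = typeof payload === 'string' ? payload : JSON.stringify(payload);
  broadcast(roomId, msg); // local delivery first
  await pub.publish(`room:${roomId}`, JSON.stringify({ origin: NODE_ID, msg })); // cross-node via Redis
}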

Critical: the pmessage handler must only call the local broadcast(), never broadcastViaRedis(). Re-publishing from a subscription creates a feedback loop that collapses the system in seconds.
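
A related subtlety: because every node subscribes to room:*, a node also receives the messages it published itself. If it delivers locally first and then handles its own echo, local clients get the message twice. The sketch below shows one way to avoid that, tagging each message with an origin ID and skipping your own. It uses an in-memory EventEmitter as a stand-in for Redis so it runs without a server; createNode and broadcastViaBus are hypothetical names for this illustration.

```javascript
const { EventEmitter } = require('events');
const { randomUUID } = require('crypto');

// Stand-in for Redis: every subscriber, including the publisher itself,
// receives each published message.
const bus = new EventEmitter();

function createNode() {
  const nodeId = randomUUID();
  const delivered = []; // what this node's local clients received
  const localBroadcast = (roomId, msg) => delivered.push({ roomId, msg });

  bus.on('message', (channel, raw) => {
    const { origin, msg } = JSON.parse(raw);
    if (origin === nodeId) return; // our own echo: already delivered locally
    localBroadcast(channel.slice('room:'.length), msg); // local only, never re-publish
  });

  function broadcastViaBus(roomId, msg) {
    localBroadcast(roomId, msg); // local delivery first
    bus.emit('message', `room:${roomId}`, JSON.stringify({ origin: nodeId, msg }));
  }

  return { delivered, broadcastViaBus };
}

const node1 = createNode();
const node2 = createNode();
node1.broadcastViaBus('game-abc', 'game_state');

console.log(node1.delivered.length, node2.delivered.length); // 1 1
```

The alternative is to skip the local delivery and let every node, including the sender, deliver from the subscription. That is simpler but adds the Redis round trip to same-node messages.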

6. Real numbers from Matrix Bingo

These are production measurements from Matrix Bingo: a room-based multiplayer game, average room size 12 players, peak 40 concurrent rooms, Node.js 20 LTS behind Nginx on a 2 vCPU / 4 GB VPS, Redis 7.0 on the same machine.

Scenario                    | Connections | p50 latency | p99 latency | RAM / conn
Single node, no Redis       | 5,000       | 3 ms        | 14 ms       | 38 KB
Single node + Redis pub/sub | 5,000       | 5 ms        | 22 ms       | 40 KB
Two nodes + Redis pub/sub   | 10,000      | 6 ms        | 28 ms       | 40 KB
Before heartbeat fix        | 2,000       | 8 ms        | 45 ms       | 180 KB
Before empty-room GC        | 5,000       | 4 ms        | 18 ms       | 55 KB

The 38 KB → 180 KB jump in the "before heartbeat" row tells the whole story: 400 dead connections, each holding onto send buffers, with no mechanism to clean them up. The fix was five lines of code. Redis adds roughly 2 ms on the same machine — invisible at game-tick rates of 20 Hz.
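
As a back-of-the-envelope check on what those per-connection figures mean in total, the arithmetic below multiplies out three rows of the table and compares the Redis hop against the tick budget. It assumes 1 MB = 1,024 KB and that the RAM column covers every connection in the row; totalMb is a helper name made up for this calculation.

```javascript
const KB_PER_MB = 1024;
const totalMb = (connections, kbPerConn) => (connections * kbPerConn) / KB_PER_MB;

console.log(totalMb(5000, 38).toFixed(0));  // 186  (single node, no Redis)
console.log(totalMb(2000, 180).toFixed(0)); // 352  (before heartbeat fix)
console.log(totalMb(5000, 55).toFixed(0));  // 269  (before empty-room GC)

// A 20 Hz game tick leaves a 50 ms budget per tick.
const tickMs = 1000 / 20;
const redisShare = (100 * 2) / tickMs; // the ~2 ms Redis hop, as % of a tick
console.log(tickMs, redisShare);       // 50 4
```

Under those assumptions, 2,000 connections before the heartbeat fix used nearly twice the memory of 5,000 connections after it.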

Production readiness checklist

Before you ship any WebSocket backend to real users, verify every item below. The ones marked ⚠ are the ones that will wake you up if you skip them.
  • ⚠ Authenticate at the handshake. Reject unauthenticated connections before any state is allocated — never after onopen.
  • ⚠ Implement the heartbeat. ws.ping() every 30 s, ws.terminate() on any missed pong. Non-negotiable.
  • ⚠ GC empty rooms. Call rooms.delete(roomId) the moment a room's Set hits zero. Never defer this.
  • ⚠ Check bufferedAmount before every send. Drop or queue frames when the buffer is full — never write blindly.
  • IP-hash sticky sessions at the load balancer. The process that accepts the HTTP upgrade owns the socket for the whole WS session, and IP-hash keeps a reconnecting client on the same node.
  • Separate pub and sub Redis clients. A client in SUBSCRIBE mode cannot issue regular commands. Use two connections.
  • Never re-publish from a subscription handler. Local broadcast() only — anything else creates a loop.
  • Raise ulimit -n to 65,535 or more. The common Linux default soft limit of 1,024 open file descriptors caps you at roughly 1,000 connections regardless of available RAM.
  • Log connection count, not just errors. Watching connections climb but not fall is how you catch the heartbeat bug before it becomes an incident.
  • Test client-side reconnection with exponential backoff. Clients that hammer a restarting server cause a thundering-herd loop that makes the restart 10× slower.
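
For the last item, a client-side reconnect loop with exponential backoff might be sketched as follows. This uses the "full jitter" variant, where each delay is drawn at random below an exponentially growing ceiling so that clients do not all retry at the same instant. BASE_MS, CAP_MS, and the function names are illustrative assumptions, and connect is a caller-supplied function that returns a WebSocket-like object.

```javascript
// Illustrative values: first retry within 0.5 s, never wait more than 30 s.
const BASE_MS = 500;
const CAP_MS  = 30_000;

// Full-jitter backoff: a random delay in [0, min(CAP_MS, BASE_MS * 2^attempt)).
// The random source is injectable so the function can be tested.
function backoffDelay(attempt, random = Math.random) {
  const ceiling = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Reconnects forever, resetting the attempt counter after a successful open.
function connectWithRetry(connect, attempt = 0) {
  const ws = connect();
  ws.onopen = () => { attempt = 0; };
  ws.onclose = () => {
    const delay = backoffDelay(attempt);
    setTimeout(() => connectWithRetry(connect, attempt + 1), delay);
  };
  return ws;
}

// Usage in a browser (URL and token are placeholders):
// connectWithRetry(() => new WebSocket('wss://example.com/ws?t=' + token));
```

A production client would likely also stop retrying when the server rejects its token, rather than looping with credentials that will never be accepted.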

What comes next

This guide covered the single-service case. The natural next posts in this series are the memory-leak war story that produced the heartbeat fix ("Your WebSocket Server Leaks Memory at 500 Connections — Here's Why"), a benchmark comparison of Django Channels vs Node.js vs Phoenix for real-time workloads, and the LiveKit SFU integration that Matrix Bingo uses for video rooms sitting on top of this WebSocket layer.

If you spotted something wrong, want the full Matrix Bingo repository as a reference, or have production numbers to compare — the comments are open.

#WebSocket #Redis #Real-Time #Scalability #Performance
