WebSockets From Scratch, No Library
June 5, 2026
Every time you see a live notification, a real-time cursor, or a chat message appear without refreshing the page, WebSockets are probably involved. Most tutorials hand you Socket.io and call it a day. This post skips the library and goes straight to the protocol.
Why Not Just HTTP?
HTTP is a request-response protocol. The client asks, the server answers, and the exchange is over. Even when the underlying TCP connection is kept alive for reuse, the server cannot say anything until the client sends another request. That model works fine for fetching a webpage, but it falls apart the moment you need the server to push data without the client asking first. Long-polling was the workaround people used before WebSockets existed. The client would send a request, the server would hold it open until something happened, respond, and then the client would immediately send another request. It worked, barely, and it was expensive: every update cost a full request with its complete set of headers.
WebSockets solve this cleanly. One handshake, and both sides can send data whenever they want for as long as the connection stays open.
The Handshake
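A WebSocket connection does not start as a WebSocket connection. It starts as a plain HTTP request. The client sends a regular GET request but includes Upgrade: websocket and Connection: Upgrade headers signaling that it wants to switch protocols. It also sends a Sec-WebSocket-Version header (13 for the current protocol) and a Sec-WebSocket-Key header holding 16 randomly generated bytes, base64-encoded.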
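Here is a minimal sketch of that opening request in Python. The header names come from RFC 6455; the function name build_handshake_request and the example.com host are made up for illustration, and a real client would also need to open the socket and may send extra headers such as Origin.

```python
import base64
import os
from typing import Tuple


def build_handshake_request(host: str, path: str = "/") -> Tuple[bytes, str]:
    """Return (raw HTTP request bytes, the Sec-WebSocket-Key that was sent)."""
    # The key is 16 random bytes, base64-encoded. It is a nonce, not a secret.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii"), key


# Hypothetical host and path, purely for demonstration.
request, key = build_handshake_request("example.com", "/chat")
print(request.decode("ascii"))
```

The client keeps the key around because it needs it a moment later to check the server's reply.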
The server responds with a 101 Switching Protocols status. To build that response, it takes the client's key, appends a fixed magic string (a GUID) defined in the spec, runs a SHA-1 hash on the result, and sends back the base64 encoding of that hash in the Sec-WebSocket-Accept header. This is not encryption. It is just proof that the server genuinely understands the WebSocket protocol and is not accidentally replying to the wrong kind of request.
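The server's side of that computation fits in a few lines. This is a sketch, not a full server: compute_accept and build_handshake_response are names chosen for this post, and the sample key is the one used in the RFC 6455 handshake example.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455. Every conforming server appends this exact string.
MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + MAGIC_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(client_key: str) -> bytes:
    """Build a bare-bones 101 response (no subprotocols or extensions)."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {compute_accept(client_key)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Sample key from the RFC 6455 handshake example.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client runs the same computation on the key it sent and rejects the connection if the two values differ.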
Once the client validates that response, the HTTP connection is handed over entirely. Both sides are now speaking WebSocket framing over the same TCP connection that started the conversation.
Frames
Data over a WebSocket travels in frames, not as a raw stream. Every message is wrapped in a small binary header before being sent across the wire. That header carries a few important pieces of information: whether this is the last fragment of a larger message, what type of data is inside, whether the payload is masked, and how long the payload is.
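To make the header concrete, here is a sketch of a parser for it. The bit layout follows RFC 6455: a length field of 126 or 127 is an escape value meaning the real length follows in the next 2 or 8 bytes. The FrameHeader type and parse_header function are names invented for this post, and the sketch ignores the three reserved bits used by extensions.

```python
from typing import NamedTuple, Optional


class FrameHeader(NamedTuple):
    fin: bool                   # True if this is the final fragment of a message
    opcode: int                 # what kind of frame this is
    masked: bool                # True if the payload is XOR-masked
    payload_len: int            # payload length in bytes
    mask_key: Optional[bytes]   # 4-byte key, present only when masked
    header_len: int             # where the payload starts within the frame


def parse_header(data: bytes) -> FrameHeader:
    """Parse the frame header at the start of data; raise ValueError if it is incomplete."""
    if len(data) < 2:
        raise ValueError("incomplete header")
    fin = bool(data[0] & 0x80)
    opcode = data[0] & 0x0F
    masked = bool(data[1] & 0x80)
    length = data[1] & 0x7F
    offset = 2
    # 126 and 127 are not lengths: they say how wide the extended length field is.
    ext = 2 if length == 126 else 8 if length == 127 else 0
    if len(data) < offset + ext + (4 if masked else 0):
        raise ValueError("incomplete header")
    if ext:
        length = int.from_bytes(data[offset:offset + ext], "big")
        offset += ext
    mask_key = None
    if masked:
        mask_key = data[offset:offset + 4]
        offset += 4
    return FrameHeader(fin, opcode, masked, length, mask_key, offset)


# An unmasked text frame containing "Hello".
print(parse_header(b"\x81\x05Hello"))
```

For that sample frame the header is just two bytes: 0x81 sets the final-fragment bit with the text opcode, and 0x05 is the payload length with the mask bit clear.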
The type information is called an opcode. Text messages, binary messages, connection closes, pings, and pongs each have their own opcode. Ping and pong are built directly into the protocol as a keep-alive mechanism. Either side can send a ping at any time and the other side is expected to respond with a pong. This is how dropped connections get detected before either side tries to send real data into a void.
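The opcode values are small enough to list in full. The sketch below uses the numbers assigned in RFC 6455; the Opcode enum, is_control, and pong_for are names made up here, and pong_for builds the unmasked form a server would send (a client's pong would have to be masked).

```python
from enum import IntEnum


class Opcode(IntEnum):
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def is_control(opcode: int) -> bool:
    # Control opcodes (close, ping, pong) have the top bit of the 4-bit opcode set.
    return bool(opcode & 0x8)


def pong_for(ping_payload: bytes) -> bytes:
    """Build an unmasked pong frame that echoes the ping's payload."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    # 0x80 sets the final-fragment bit; control frames are never fragmented.
    return bytes([0x80 | Opcode.PONG, len(ping_payload)]) + ping_payload


print(pong_for(b"Hello").hex())
```

Echoing the payload matters: it lets the sender match a pong to the ping that triggered it.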
Masking
Every frame sent from a client to a server must be masked. This is not optional and it is not about security in the traditional sense. It exists to protect infrastructure. Older proxies and caches sitting between a client and a server were not built with WebSockets in mind. If unmasked data happened to look like an HTTP response, a caching proxy could store it and serve it to other users. Masking makes the payload look like noise to anything that does not understand the WebSocket protocol.
The client generates a fresh, random four-byte key for every frame and XORs each byte of the payload with the corresponding key byte, cycling through the key as needed. The key travels in the frame header, so the server reverses the masking by applying the same XOR before reading the message. Frames going from server to client are never masked.
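Because XOR is its own inverse, one function handles both masking and unmasking. The sketch below assumes short payloads (125 bytes or fewer) to keep the length encoding trivial; apply_mask and build_masked_text_frame are illustrative names, and the fixed key in the demo is the one from the RFC 6455 sample frame, used only so the output is reproducible.

```python
import os
from typing import Optional


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the key, cycling through its 4 bytes."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_masked_text_frame(text: str, key: Optional[bytes] = None) -> bytes:
    """Build a single-frame, client-to-server text message (short payloads only)."""
    payload = text.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch only handles payloads up to 125 bytes")
    if key is None:
        key = os.urandom(4)  # real clients pick a new random key per frame
    # 0x81 = final fragment + text opcode; 0x80 | length sets the mask bit.
    return bytes([0x81, 0x80 | len(payload)]) + key + apply_mask(payload, key)


demo_key = bytes.fromhex("37fa213d")
masked = apply_mask(b"Hello", demo_key)
print(masked.hex())                    # 7f9f4d5158
print(apply_mask(masked, demo_key))    # b'Hello'
```

Applying the mask a second time with the same key returns the original bytes, which is exactly what the server does on receipt.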
Message Fragmentation
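Large messages do not have to be sent as one frame. The protocol supports fragmentation, where a single logical message is split across multiple frames. The first frame carries the message's real opcode (text or binary) with the final-fragment flag cleared to signal that more is coming. Every frame after it carries the continuation opcode, and the last one sets the final-fragment flag to indicate the message is complete. The receiver reassembles them in order. Control frames such as ping, pong, and close cannot be fragmented themselves, but they may arrive in between the fragments of a data message. Fragmentation matters in practice because it allows a sender to start streaming data before it knows the full size of what it is sending.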
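Reassembly on the receiving side can be sketched as a small state machine. This version assumes frames have already been parsed and unmasked into (fin, opcode, payload) tuples, which is a representation chosen for this post; it simply skips interleaved control frames, where a real implementation would handle them as they arrive.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Frame = Tuple[bool, int, bytes]  # (fin, opcode, payload), already unmasked


def reassemble(frames: Iterable[Frame]) -> Iterator[Tuple[int, bytes]]:
    """Yield (opcode, full_payload) for each complete data message."""
    current_opcode: Optional[int] = None
    parts: List[bytes] = []
    for fin, opcode, payload in frames:
        if opcode >= 0x8:
            # Control frames can appear between fragments; this sketch ignores them.
            continue
        if opcode == 0x0:
            if current_opcode is None:
                raise ValueError("continuation frame with no message in progress")
        else:
            if current_opcode is not None:
                raise ValueError("new message started before the previous one finished")
            current_opcode = opcode
        parts.append(payload)
        if fin:
            yield current_opcode, b"".join(parts)
            current_opcode, parts = None, []


# "Hello" as two fragments: a text frame without FIN, then a final continuation frame.
fragments = [(False, 0x1, b"Hel"), (True, 0x0, b"lo")]
print(list(reassemble(fragments)))  # [(1, b'Hello')]
```

Note that the message type is only known from the first fragment, which is why the receiver has to remember it until the final frame arrives.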
Closing the Connection
Closing a WebSocket connection is a two-step process. The side that wants to close sends a close frame, optionally with a status code and a reason. The other side responds with its own close frame, and then the underlying TCP connection is torn down. This graceful shutdown exists so neither side drops data that was already in flight when the decision to close was made.
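The body of a close frame has a simple layout: a 2-byte big-endian status code followed by an optional UTF-8 reason. Here is a sketch of encoding and decoding it. The function names are invented for this post; 1000 is the status code RFC 6455 assigns to a normal closure.

```python
import struct
from typing import Optional, Tuple


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Encode a close frame body: 2-byte status code, then a UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return body


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Decode a close frame body; an empty body means no code was given."""
    if not payload:
        return None, ""
    if len(payload) < 2:
        raise ValueError("a close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


body = build_close_payload(1000, "bye")
print(body)                       # b'\x03\xe8bye'
print(parse_close_payload(body))  # (1000, 'bye')
```

The 125-byte cap applies because close is a control frame, so a long reason string has to be trimmed before sending.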
Where Libraries Fit In
Libraries like Socket.io are built on top of all of this. They add rooms, namespaces, automatic reconnection, and a fallback to long-polling when WebSockets are not available. That is genuinely useful. But they are abstractions over a protocol that is already fairly simple. Understanding the layer underneath means you can reason about latency, debug unexpected disconnects, implement the protocol in environments where no library exists, and make informed decisions about when a library is actually solving a problem versus adding weight.
The WebSocket spec, RFC 6455, is short and readable. The handshake is a few headers. The framing is a handful of bit manipulations. Everything else (broadcasts, rooms, presence) is application logic that you build on top.