Mercurial
annotate mrjunejune/src/blog/websocket-demystified/index.md @ 169:295ac2e5ec00
[MrJuneJune] Created separate target for generating html from md.
| author | MrJuneJune <me@mrjunejune.com> |
|---|---|
| date | Mon, 19 Jan 2026 17:33:18 -0800 |
| parents | 902e29c38d66 |
| children |
---
title: WebSocket Demystified
description: A deep dive into WebSocket internals, debunking the 65535 port myth and building a WebSocket implementation from scratch.
---

# WebSocket Demystified

|
133
902e29c38d66
[Blog] Final copy for websocket one.
June Park <parkjune1995@gmail.com>
parents:
132
diff
changeset
|
WebSockets have been around for more than 10 years now. [RFC 6455](https://www.rfc-editor.org/rfc/rfc6455) was published back in 2011. This was inevitable. As apps got more complex, people wanted real bidirectional communication without resorting to hacky solutions like raw TCP connections with custom keys or the constant overhead of short/long polling.

Today, it’s a standard choice for everything from chat apps to LLM interfaces, where the model streams bytes back to you one token at a time as it predicts the next word.

|
132
7a63e41a21fb
[Seobeo] Added debug targets.
June Park <parkjune1995@gmail.com>
parents:
131
diff
changeset
|
Most developers just grab a library like [ws](https://github.com/websockets/ws) for Node.js or [websockets](https://websockets.readthedocs.io/) for Python and call it a day. But many don’t realize that the underlying mechanism is simple enough to implement yourself in a day or so, and they often misunderstand what a WebSocket actually does. Let's build one from scratch and debunk a myth along the way.

### The "65,535" Myth

Before we write a single line of code, let's talk about a common myth. You’ll often hear developers say, *"A server can only handle 65,535 WebSocket connections because there are only 65,535 ports."*

If this were true, Discord or Slack would need millions of separate IP addresses just to function. The confusion comes from the 16-bit size of the TCP port field, but a connection isn't defined by a port alone. You'll see why later in this post.

---

## Requirements

* Ability to type
* Half a brain, to follow the real mechanics instead of the surface-level myths
* A computer

---

## The Lifecycle

To get a WebSocket up and running, you have to follow a specific dance. It’s not just "connecting to a port"; it's an evolution of an existing relationship.

1. **The Handshake:** A client sends a "pretty please" HTTP request asking to upgrade the connection.
2. **The Response:** The server agrees (101 Switching Protocols) and sends back a specific hash.
3. **The Switch:** Both sides stop talking "HTTP" and start talking "Frames."
4. **The Interaction:** Bidirectional, binary-framed messaging until someone closes the door.

---

## Opening Handshakes

To start the upgrade from HTTP to WebSocket, the client sends a standard GET request but with some very specific headers.
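
As a rough sketch of what that request looks like on the wire, here is one way to build it with `snprintf`. The function name `build_upgrade_request` and its parameters are made up for this example and are not part of Seobeo; the header names and the version number `13` come from RFC 6455.

```c
#include <stdio.h>

/* Hypothetical helper: writes a minimal WebSocket upgrade request into `out`.
 * Returns the number of bytes written, or -1 if `out` is too small. */
static int build_upgrade_request(char *out, size_t cap,
                                 const char *host, const char *path,
                                 const char *key) {
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Upgrade: websocket\r\n"
                     "Connection: Upgrade\r\n"
                     "Sec-WebSocket-Key: %s\r\n"
                     "Sec-WebSocket-Version: 13\r\n"
                     "\r\n",
                     path, host, key);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The blank line at the end (`\r\n\r\n`) is what tells the server the headers are done, exactly like any other HTTP request.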

<div class="center"> <img src="/public/web-socket-header.webp" /> </div>

I’m assuming you know how HTTP works. If not, open your browser's developer tools by right-clicking the page, go to the Network tab, and refresh the page. The only interesting value here is `Sec-WebSocket-Key`. This key is a 16-byte random value encoded in **Base64**.

**Note:** It’s not for security. It’s to prevent intermediate caches from accidentally serving a cached WebSocket response to a different client.

But before we jump into that, we need to construct that Base64 key.

### What is Base64?

Let's ask Gemini:

> "Base64 is a binary-to-text encoding scheme that represents data in an ASCII string format by translating it into a radix-64 representation, using a specific set of 64 printable characters." (Gemini)

Nice, it didn't hallucinate. Let's construct it. Here are the 64 characters that are safe to print in ASCII:

```c
static const char base64_chars[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
```

In Python, this is a one-liner:

```python
import base64
import os
print(base64.b64encode(os.urandom(16)).decode("ascii"))
```

But we are rewriting this from scratch in **C**, so we need to suffer a little. The logic is: take 3 bytes (24 bits) and split them into 4 chunks of 6 bits each. Each 6-bit chunk becomes an index into our `base64_chars` array.
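
To make the 3-bytes-to-4-characters idea concrete, here is a tiny worked example that only handles one full group. The names `b64_alphabet` and `encode_triplet` are invented for this sketch, and it uses `<stdint.h>` types rather than the project's own typedefs.

```c
#include <stdint.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encodes exactly 3 bytes into 4 Base64 characters plus a NUL. */
static void encode_triplet(const uint8_t in[3], char out[5]) {
    /* Pack the 3 bytes into the low 24 bits of one integer. */
    uint32_t group = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    /* Peel off four 6-bit indices, most significant first. */
    out[0] = b64_alphabet[(group >> 18) & 0x3F];
    out[1] = b64_alphabet[(group >> 12) & 0x3F];
    out[2] = b64_alphabet[(group >> 6) & 0x3F];
    out[3] = b64_alphabet[group & 0x3F];
    out[4] = '\0';
}
```

Feed it the bytes of `"Man"` (`0x4D 0x61 0x6E`) and the four indices come out as 19, 22, 5, 46, which map to `TWFu`.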

#### Step 1: Generate Random Bytes

First, we grab 16 random bytes.

```c
// rand() is fine for a demo, but it is not a cryptographically secure source
srand((unsigned int)time(NULL));
uint8 random_value[16];

for (int i = 0; i < 16; i++)
    random_value[i] = (uint8)(rand() % 256);
```

#### Step 2: The Bit-Shifting Magic

We loop through our 16 bytes in groups of 3. We pack them into a 32-bit integer, then carve that integer into 6-bit slices.

> **Note:** 16 bytes always encode to 24 Base64 characters (22 data characters plus `==` padding), so the 32-byte buffer below leaves room for the terminating NUL.

```c
char result[32] = {0};
int32 result_index = 0;

for (int i = 0; i < 16; i += 3) {
    uint32 first_value = 0, second_value = 0, third_value = 0;

    if (i < 15) {
        // Full group: three bytes packed into 24 bits
        first_value  = (uint32)random_value[i] << 16;
        second_value = (uint32)random_value[i+1] << 8;
        third_value  = (uint32)random_value[i+2];
    } else {
        // Trailing group: 16 = 5 * 3 + 1, so only one byte is left
        first_value = (uint32)random_value[i] << 16;
    }

    uint32 group_value = first_value | second_value | third_value;

    // Map bits to characters: 0x3F is 0011 1111 (keeps only 6 bits)
    result[result_index++] = base64_chars[(group_value >> 18) & 0x3F];
    result[result_index++] = base64_chars[(group_value >> 12) & 0x3F];
    if (i < 15) {
        result[result_index++] = base64_chars[(group_value >> 6) & 0x3F];
        result[result_index++] = base64_chars[group_value & 0x3F];
    } else {
        // One input byte yields two characters; the rest is '=' padding
        result[result_index++] = '=';
        result[result_index++] = '=';
    }
}
```

Now you have a `Sec-WebSocket-Key`. When the server gets it, it appends a "Magic String" (`258EAFA5-E914-47DA-95CA-C5AB0DC85B11`) to the key exactly as it was sent, SHA-1 hashes the result, and Base64 encodes the digest back to you as `Sec-WebSocket-Accept`.
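
Here is a self-contained sketch of that server-side computation. Everything in it (`sha1_small`, `base64_encode`, `websocket_accept`) is named for this example only and is not Seobeo's code; the SHA-1 is a deliberately small version that assumes the input fits in two 64-byte blocks, which is enough for a 24-character key plus the 36-character magic string.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Minimal SHA-1 for messages of up to 119 bytes (two 64-byte blocks).
 * Returns 0 on success, -1 if the message is too long for this sketch. */
static int sha1_small(const uint8_t *msg, size_t len, uint8_t digest[20]) {
    uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};
    uint8_t buf[128] = {0};
    if (len > 119) return -1;

    /* Padding: message, 0x80, zeros, then the bit length as 64-bit big-endian. */
    size_t total = ((len + 8) / 64 + 1) * 64;
    memcpy(buf, msg, len);
    buf[len] = 0x80;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        buf[total - 1 - i] = (uint8_t)(bits >> (8 * i));

    for (size_t off = 0; off < total; off += 64) {
        uint32_t w[80];
        for (int t = 0; t < 16; t++) {
            const uint8_t *p = buf + off + 4 * t;
            w[t] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | p[3];
        }
        for (int t = 16; t < 80; t++)
            w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; t++) {
            uint32_t f, k;
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            uint32_t tmp = rotl32(a, 5) + f + e + k + w[t];
            e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    for (int i = 0; i < 5; i++) {
        digest[4 * i]     = (uint8_t)(h[i] >> 24);
        digest[4 * i + 1] = (uint8_t)(h[i] >> 16);
        digest[4 * i + 2] = (uint8_t)(h[i] >> 8);
        digest[4 * i + 3] = (uint8_t)h[i];
    }
    return 0;
}

/* Base64 with '=' padding; `out` needs 4 * ((n + 2) / 3) + 1 bytes. */
static void base64_encode(const uint8_t *in, size_t n, char *out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < n; i += 3) {
        size_t left = n - i;
        uint32_t g = (uint32_t)in[i] << 16;
        if (left > 1) g |= (uint32_t)in[i + 1] << 8;
        if (left > 2) g |= in[i + 2];
        out[o++] = tbl[(g >> 18) & 0x3F];
        out[o++] = tbl[(g >> 12) & 0x3F];
        out[o++] = left > 1 ? tbl[(g >> 6) & 0x3F] : '=';
        out[o++] = left > 2 ? tbl[g & 0x3F] : '=';
    }
    out[o] = '\0';
}

/* Computes Sec-WebSocket-Accept for a client key; `accept` needs 29 bytes. */
static int websocket_accept(const char *client_key, char accept[29]) {
    static const char magic[] = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
    char joined[120];
    uint8_t digest[20];
    int n = snprintf(joined, sizeof joined, "%s%s", client_key, magic);
    if (n < 0 || (size_t)n >= sizeof joined) return -1;
    if (sha1_small((const uint8_t *)joined, (size_t)n, digest) != 0) return -1;
    base64_encode(digest, sizeof digest, accept);
    return 0;
}
```

A handy sanity check is the sample from RFC 6455: the key `dGhlIHNhbXBsZSBub25jZQ==` should produce the accept value `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. In real code you would probably reach for a vetted crypto library instead of hand-rolling SHA-1.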

---

## Upgrading the Protocol

If you are the client, you just wait for that `101 Switching Protocols` response and check that its `Sec-WebSocket-Accept` matches what you expect. Once you see it, you stop sending HTTP text and start sending frames.

If you are the **server**, you need to keep that connection alive. In my project, `Seobeo`, I create a separate connection object, throw away the HTTP request info to save memory, and start a fresh buffer for WebSocket frames.

```c
// Transitioning the state from HTTP to WebSocket
// (production code should check both malloc() results for NULL)
Seobeo_WebSocket_Server_Connection *p_conn = malloc(sizeof(Seobeo_WebSocket_Server_Connection));
memset(p_conn, 0, sizeof(Seobeo_WebSocket_Server_Connection));

p_conn->p_handle = p_handle; // handle that wraps the socket's file descriptor
p_conn->is_active = TRUE;
p_conn->fragment_capacity = 4096;
p_conn->fragment_buffer = malloc(p_conn->fragment_capacity);
```

### Wait, what is the p_handle or file descriptor here?

The File Descriptor (FD) is the ID badge the operating system hands your server process for each open connection. The OS identifies a unique connection via a 4-tuple: Source IP, Source Port, Destination IP, Destination Port. You can think of the OS as a giant hashmap that links these 4-tuples to an integer (the FD). Every client can connect to the same server IP and port (say, 443) because the client's IP and port differ, so the 16-bit port field never caps how many connections a single listening port can accept.

So the real limits are the number of FDs and RAM capacity. You can check your system's FD limit with:

```bash
ulimit -n
```
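
If you would rather ask from inside the server process, POSIX exposes the same number through `getrlimit`. This is a small sketch for POSIX systems only; the wrapper name `fd_soft_limit` is made up for the example.

```c
#include <limits.h>
#include <sys/resource.h>

/* Hypothetical helper: returns the soft limit on open file descriptors
 * for this process, or -1 if the limit cannot be read. */
static long long fd_soft_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return LLONG_MAX;  /* no soft limit configured */
    return (long long)lim.rlim_cur;
}
```

The value it returns is the per-process soft limit, which is the same figure `ulimit -n` prints in the shell that launched the process.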

With that, the myth is debunked. Now let's see how the protocol actually works.

---

## Frame-Based Protocols

This is where the logic gets "cancerous." WebSockets don't just send raw strings; they wrap everything in a **Frame**.

### The Opcode Table

The first byte contains the `FIN` bit (is this the end of the message?) and the `Opcode` (what kind of data is this?).

| Opcode (Hex) | Meaning | Description |
| --- | --- | --- |
| `0x0` | Continuation | Part of a multi-frame message |
| `0x1` | Text | UTF-8 payload |
| `0x2` | Binary | Raw binary data |
| `0x8` | Close | Terminate the connection |
| `0x9` | Ping | Heartbeat check |
| `0xA` | Pong | Heartbeat response |
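
The table maps naturally onto an enum plus a couple of one-line helpers for picking the first byte apart. The `ws_` names below are invented for this sketch (Seobeo has its own `Seobeo_WebSocket_Opcode` type, whose definition is not shown in this post); the numeric values are the ones from the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values come straight from the opcode table. */
typedef enum {
    WS_OP_CONTINUATION = 0x0,
    WS_OP_TEXT         = 0x1,
    WS_OP_BINARY       = 0x2,
    WS_OP_CLOSE        = 0x8,
    WS_OP_PING         = 0x9,
    WS_OP_PONG         = 0xA
} ws_opcode;

/* FIN is the top bit of the first byte. */
static bool ws_is_final(uint8_t first_byte) {
    return (first_byte & 0x80) != 0;
}

/* The opcode is the low 4 bits of the first byte. */
static ws_opcode ws_get_opcode(uint8_t first_byte) {
    return (ws_opcode)(first_byte & 0x0F);
}

/* Control frames (close, ping, pong) have the opcode's high bit set. */
static bool ws_is_control(ws_opcode op) {
    return (op & 0x8) != 0;
}
```

For example, a first byte of `0x81` decodes to "final frame, text".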

### The Masking Rule

* **Client to Server:** MUST be masked.
* **Server to Client:** MUST NOT be masked.

If a client sends unmasked data, the server must close the connection. It’s the law.

### Why the Bitwise Mess? (Endianness)

In the code below, you’ll see things like `payload_length >> 56`. This is because network protocol headers use **Big-Endian** (most significant byte first). Most computers are Little-Endian, so you can't just copy the integer's bytes from memory onto the wire. Shifting pulls out each byte by value, most significant first, which produces the right order no matter what the host's byte order is.
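
Here is that shifting trick in isolation, as a pair of helpers for the 8-byte extended length. The names `put_be64` and `get_be64` are made up for this sketch.

```c
#include <stdint.h>

/* Writes `value` into out[0..7], most significant byte first. */
static void put_be64(uint8_t out[8], uint64_t value) {
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(value >> (56 - 8 * i));
}

/* Reads 8 big-endian bytes back into a host integer. */
static uint64_t get_be64(const uint8_t in[8]) {
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value = (value << 8) | in[i];
    return value;
}
```

A length of 70000 (`0x11170`) goes out as `00 00 00 00 00 01 11 70`, and reading those bytes back gives 70000 again on any machine.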

---

## Sending a Frame (Client Side)

Here is how we construct a frame to send data to the server.

```c
uint8 frame[14];
size_t frame_len = 0;

// Byte 0: FIN bit (0x80) and Opcode
frame[0] = (fin ? 0x80 : 0x00) | (opcode & 0x0F);
frame_len++;

// Generate a 4-byte mask key
uint8 mask_key[4];
for (int i = 0; i < 4; i++)
    mask_key[i] = (uint8)(rand() % 256);

// Byte 1+: Payload Length logic
if (payload_length < 126) {
    frame[1] = 0x80 | (uint8)payload_length; // 0x80 sets the MASK bit
    frame_len++;
} else if (payload_length <= 65535) {
    frame[1] = 0x80 | 126;
    frame[2] = (uint8)((payload_length >> 8) & 0xFF);
    frame[3] = (uint8)(payload_length & 0xFF);
    frame_len += 3;
} else {
    frame[1] = 0x80 | 127;
    // Cast to 64 bits first so the shift is valid even if size_t is 32 bits
    for (int i = 0; i < 8; i++)
        frame[2 + i] = (uint8)(((uint64)payload_length >> (56 - i * 8)) & 0xFF);
    frame_len += 9;
}

// Attach the mask key
memcpy(frame + frame_len, mask_key, 4);
frame_len += 4;
```

Before sending the payload, you XOR every byte with the mask key, cycling through its 4 bytes:

```c
for (size_t i = 0; i < length; i++)
    data[i] ^= mask_key[i % 4];
```
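
A nice property of XOR masking is that the same loop both masks and unmasks. Below is a standalone version of that loop you can poke at; the function name `ws_apply_mask` is invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* XORs each payload byte with the 4-byte key, cycling through the key.
 * Running it twice with the same key restores the original bytes. */
static void ws_apply_mask(uint8_t *data, size_t len, const uint8_t key[4]) {
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % 4];
}
```

With the key `37 fa 21 3d`, the text `Hello` becomes `7f 9f 4d 51 58`, which matches the sample masked frame in RFC 6455. Apply the same call again and you get `Hello` back.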

---

## Receiving a Frame (Server Side)

On the server side, we have to do the reverse. We peel the onion layer by layer.

#### 1. Parse the Header

We check the first two bytes to see how big the payload is and if it's masked.

```c
uint8 *buf = p_conn->p_handle->read_buffer;
uint8 byte1 = buf[0];
uint8 byte2 = buf[1];

boolean fin = (byte1 & 0x80) != 0;
Seobeo_WebSocket_Opcode opcode = (Seobeo_WebSocket_Opcode)(byte1 & 0x0F);
boolean masked = (byte2 & 0x80) != 0;
uint64 payload_len = byte2 & 0x7F;

size_t header_len = 2;
```

#### 2. Handle Extended Lengths

If the length is 126 or 127, it means the actual size is stored in the next 2 or 8 bytes.

```c
if (payload_len == 126) {
    payload_len = (buf[2] << 8) | buf[3];
    header_len += 2;
} else if (payload_len == 127) {
    payload_len = 0;
    for (int i = 0; i < 8; i++)
        payload_len = (payload_len << 8) | buf[2 + i];
    header_len += 8;
}
```

#### 3. Unmask the Payload

If the data is masked (and it should be if it's from a client), we use that 4-byte key to flip the bits back to normal.

```c
uint8 mask_key[4] = {0};
if (masked) {
    memcpy(mask_key, buf + header_len, 4);
    header_len += 4;
}

// Real code should first confirm the buffer holds header_len + payload_len
// bytes, and check the malloc() result for NULL
uint8 *payload = malloc((size_t)payload_len);
memcpy(payload, buf + header_len, (size_t)payload_len);

if (masked)
    Seobeo_WebSocket_Unmask_Data(payload, payload_len, mask_key);
```
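
The snippets above index straight into the read buffer, which is fine once a whole frame has arrived. On a real socket the header can show up split across reads, so here is a hedged sketch of the same three steps with length checks added. The struct and function names (`ws_frame_header`, `ws_parse_header`) are made up for this example and are not Seobeo's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool fin;
    uint8_t opcode;
    bool masked;
    uint64_t payload_len;
    uint8_t mask_key[4];
    size_t header_len; /* offset of the first payload byte */
} ws_frame_header;

/* Parses a frame header from buf[0..avail).
 * Returns true on success, false if more bytes are needed first. */
static bool ws_parse_header(const uint8_t *buf, size_t avail, ws_frame_header *h) {
    if (avail < 2) return false;
    h->fin = (buf[0] & 0x80) != 0;
    h->opcode = buf[0] & 0x0F;
    h->masked = (buf[1] & 0x80) != 0;
    h->payload_len = buf[1] & 0x7F;
    size_t pos = 2;

    /* 126 means a 2-byte length follows, 127 means an 8-byte length. */
    size_t ext = h->payload_len == 126 ? 2 : h->payload_len == 127 ? 8 : 0;
    if (avail < pos + ext) return false;
    if (ext > 0) {
        h->payload_len = 0;
        for (size_t i = 0; i < ext; i++)
            h->payload_len = (h->payload_len << 8) | buf[pos + i];
        pos += ext;
    }

    /* The 4-byte mask key is present only when the MASK bit is set. */
    for (int i = 0; i < 4; i++) h->mask_key[i] = 0;
    if (h->masked) {
        if (avail < pos + 4) return false;
        for (int i = 0; i < 4; i++) h->mask_key[i] = buf[pos + i];
        pos += 4;
    }

    h->header_len = pos;
    return true;
}
```

When it returns `false`, the caller would keep reading from the socket and try again. When it returns `true`, the payload starts at `buf + header_len`, and the caller still has to wait until `header_len + payload_len` bytes are available before unmasking.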

---

## Conclusion

That’s it. That is WebSockets in a nutshell. Once you handle the bit-shifting for the length and the XOR masking, you’re just reading and writing to a socket like any other protocol.

You can test my implementation at [this page](https://mrjunejune.com/talk). Open two tabs and talk to yourself!