comparison mrjunejune/src/blog/websocket-demystified/index.md @ 131:b230a743a01e

Added blog.
author June Park <parkjune1995@gmail.com>
date Fri, 09 Jan 2026 07:42:04 -0800
parents 3a564ffb2092
children 7a63e41a21fb
1 # WebSocket Demystified
2
3 WebSockets have been around for more than 10 years now; [RFC 6455](https://www.rfc-editor.org/rfc/rfc6455) was published back in 2011. This was inevitable. As apps got more complex, people wanted real bidirectional communication without resorting to hacky solutions like raw TCP connections with custom keys or the constant overhead of short/long polling.
4
5 Today, it’s the standard for everything from chat apps to LLM interfaces, where the model streams bytes back to you one token at a time as it predicts the next word.
6
7 Most developers just grab a library like [ws](https://github.com/websockets/ws) for Node.js or [websockets](https://websockets.readthedocs.io/) for Python and call it a day. But many don’t realize the underlying mechanism is actually pretty simple to implement yourself in a day or so. Let's look at how to build it from scratch.
8
9 ---
10
11 ## Requirements
12
13 * Ability to type
14 * Half a brain
15 * A computer
16
17 ---
18
19 ## The Lifecycle
20
21 To get a WebSocket up and running, you have to follow a specific dance. It’s not just "connecting to a port"; it's an evolution of an existing relationship.
22
23 1. **The Handshake:** A client sends a "pretty please" HTTP request asking to upgrade the connection.
24 2. **The Response:** The server agrees (101 Switching Protocols) and sends back a specific hash.
25 3. **The Switch:** Both sides stop talking "HTTP" and start talking "Frames."
26 4. **The Interaction:** Bidirectional, binary-framed messaging until someone closes the door.
27
28 ---
29
30 ## Opening Handshakes
31
32 To start the upgrade from HTTP to WebSocket, the client sends a standard GET request but with some very specific headers.
33
34 <div class="center"> <img src="/public/white-noise-grass.png" /> </div>
35
36 I’m assuming you know how HTTP works. If not, open your browser's developer tools (right-click the page and choose Inspect), switch to the Network tab, and refresh the page. The only interesting value here is `Sec-WebSocket-Key`: a 16-byte random value encoded in **Base64**.
37
38 **Note:** It’s not for security—it’s to prevent intermediate caches from accidentally serving a cached WebSocket response to a different client.
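To make the request concrete, here is a minimal sketch that assembles it with a single `snprintf`. The function name `ws_build_handshake` and its parameters are made up for illustration; the header names and `Sec-WebSocket-Version: 13` come from RFC 6455.

```c
#include <stdio.h>

// Hypothetical helper: writes a WebSocket upgrade request into `out`.
// Returns the request length, or -1 if `out` is too small.
int ws_build_handshake(char *out, size_t out_size,
                       const char *host, const char *path, const char *key_b64)
{
    int n = snprintf(out, out_size,
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Key: %s\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n",
        path, host, key_b64);

    // snprintf reports the length it wanted, so n >= out_size means truncation.
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The key is passed in as an already-encoded string, so this helper does not care how you generated it.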
39
40 But before we jump into that, we need to construct that Base64 key.
41
42 ### What is Base64?
43
44 Let's ask Gemini:
45
46 > "Base64 is a binary-to-text encoding scheme that represents data in an ASCII string format by translating it into a radix-64 representation, using a specific set of 64 printable characters." — Gemini
47
48
49 Nice, it didn't hallucinate. Let's construct the alphabet. Here are the 64 characters that are safe to print in ASCII:
50
51 ```c
52 static const char base64_chars[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
53
54 ```
55
56 In Python, this is a one-liner:
57
58 ```python
59 import base64
60 import os
61 print(base64.b64encode(os.urandom(16)).decode())
62
63 ```
64
65 But we are rewriting this from scratch in **C**, so we need to suffer a little. The logic is: take 3 bytes (24 bits) and split them into 4 chunks of 6 bits each. Each 6-bit chunk becomes an index into our `base64_chars` array.
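A tiny worked example helps here. The sketch below encodes exactly one 3-byte group; the function name `b64_encode_triplet` is just for illustration. Feeding it the bytes of `"Man"` yields the classic `"TWFu"`.

```c
#include <stdint.h>

static const char base64_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes exactly 3 input bytes into 4 Base64 characters (no padding involved).
void b64_encode_triplet(const uint8_t in[3], char out[4])
{
    // Pack the 3 bytes into the low 24 bits of one integer.
    uint32_t group = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    // Carve out four 6-bit indices, most significant first.
    out[0] = base64_chars[(group >> 18) & 0x3F];
    out[1] = base64_chars[(group >> 12) & 0x3F];
    out[2] = base64_chars[(group >> 6) & 0x3F];
    out[3] = base64_chars[group & 0x3F];
}
```

For `"Man"` the 24 bits split into the indices 19, 22, 5, and 46, which map to `T`, `W`, `F`, and `u`.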
66
67 #### Step 1: Generate Random Bytes
68
69 First, we grab 16 random bytes.
70
71 ```c
72 srand((unsigned int)time(NULL));
73 uint8 random_value[16];
74
75 for (int i = 0; i < 16; i++)
76 random_value[i] = (uint8)(rand() % 256);
77 ```
78
79 #### Step 2: The Bit-Shifting Magic
80
81 We loop through our 16 bytes in groups of 3. We pack them into a 32-bit integer, then carve that integer into 6-bit slices.
82
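83 > **Note:** The 32-byte buffer is just headroom. 16 input bytes always encode to 24 Base64 characters (22 data characters plus `==` padding), and the extra space leaves room for a terminating NUL.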
84
85 ```c
86 char result[32] = {0};
87 int32 result_index = 0;
88
89 for (int i = 0; i < 16; i += 3) {
90 uint32 first_value = 0, second_value = 0, third_value = 0;
91
92 if (i < 15) {
93 first_value = (uint32)random_value[i] << 16;
94 second_value = (uint32)random_value[i+1] << 8;
95 third_value = (uint32)random_value[i+2];
96 } else {
97 // Trailing group: only one byte is left, so the last two characters are '=' padding
98 first_value = (uint32)random_value[i] << 16;
99 }
100
101 uint32 group_value = first_value | second_value | third_value;
102
103 // Map bits to characters: 0x3F is 0011 1111 (keeps only 6 bits)
104 result[result_index++] = base64_chars[(group_value >> 18) & 0x3F];
105 result[result_index++] = base64_chars[(group_value >> 12) & 0x3F];
106 result[result_index++] = (i < 15) ? base64_chars[(group_value >> 6) & 0x3F] : '=';
107 result[result_index++] = (i < 15) ? base64_chars[group_value & 0x3F] : '=';
108 }
109
110 ```
111
112 Now you have a `Sec-WebSocket-Key`. When the server gets it, it appends a "Magic String" (`258EAFA5-E914-47DA-95CA-C5AB0DC85B11`), hashes the result with SHA-1, and sends the Base64-encoded digest back to you as `Sec-WebSocket-Accept`.
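Here is a rough sketch of that server-side computation with no external libraries. Everything in it (`sha1_short`, `b64_encode`, `ws_accept_key`) is an illustrative name, not part of Seobeo, and the SHA-1 routine is deliberately limited to short inputs, which is all a handshake needs. In production you would probably reach for a vetted crypto library instead.

```c
#include <stdint.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Minimal SHA-1 for inputs of at most 119 bytes (fits in two 64-byte blocks).
static void sha1_short(const uint8_t *msg, size_t len, uint8_t digest[20])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    uint8_t buf[128] = {0};
    size_t total = (len + 9 <= 64) ? 64 : 128;      // length after padding
    uint64_t bits = (uint64_t)len * 8;

    memcpy(buf, msg, len);
    buf[len] = 0x80;                                // mandatory 1 bit, then zeros
    for (int i = 0; i < 8; i++)                     // big-endian bit count at the end
        buf[total - 1 - i] = (uint8_t)(bits >> (8 * i));

    for (size_t off = 0; off < total; off += 64) {
        uint32_t w[80];
        for (int t = 0; t < 16; t++)
            w[t] = ((uint32_t)buf[off + 4 * t] << 24) | ((uint32_t)buf[off + 4 * t + 1] << 16) |
                   ((uint32_t)buf[off + 4 * t + 2] << 8) | buf[off + 4 * t + 3];
        for (int t = 16; t < 80; t++)
            w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; t++) {
            uint32_t f, k;
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            uint32_t tmp = rotl32(a, 5) + f + e + k + w[t];
            e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 4; j++)
            digest[4 * i + j] = (uint8_t)(h[i] >> (24 - 8 * j));
}

// Base64 with '=' padding; `out` needs 4 * ceil(len / 3) + 1 bytes.
static void b64_encode(const uint8_t *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        size_t left = len - i;
        uint32_t g = (uint32_t)in[i] << 16;
        if (left > 1) g |= (uint32_t)in[i + 1] << 8;
        if (left > 2) g |= in[i + 2];
        out[o++] = b64[(g >> 18) & 0x3F];
        out[o++] = b64[(g >> 12) & 0x3F];
        out[o++] = (left > 1) ? b64[(g >> 6) & 0x3F] : '=';
        out[o++] = (left > 2) ? b64[g & 0x3F] : '=';
    }
    out[o] = '\0';
}

// Computes Sec-WebSocket-Accept from Sec-WebSocket-Key.
// Returns 0 on success, -1 if the key is implausibly long.
int ws_accept_key(const char *key, char accept[29])
{
    static const char guid[] = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
    size_t key_len = strlen(key);
    if (key_len > 64) return -1;                    // keeps the input within sha1_short's limit

    uint8_t joined[64 + sizeof guid];
    memcpy(joined, key, key_len);
    memcpy(joined + key_len, guid, sizeof guid - 1);

    uint8_t digest[20];
    sha1_short(joined, key_len + sizeof guid - 1, digest);
    b64_encode(digest, sizeof digest, accept);
    return 0;
}
```

RFC 6455 ships its own test vector: the key `dGhlIHNhbXBsZSBub25jZQ==` should produce the accept value `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.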
113
114 ---
115
116 ## Upgrading the Protocol
117
118 If you are the client, you just wait for that `101 Switching Protocols` response. Once you see it, you stop sending HTTP text and start sending frames.
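On the client, the "wait for 101" step can be as small as checking the status line. This is a sketch with a made-up function name, `ws_is_switching_protocols`, and it assumes the response is already in a NUL-terminated buffer. A real client should also compare `Sec-WebSocket-Accept` against the value it expects.

```c
#include <string.h>

// Returns 1 if the response starts with an "HTTP/1.1 101" status line, else 0.
int ws_is_switching_protocols(const char *response)
{
    static const char expected[] = "HTTP/1.1 101";
    size_t n = sizeof expected - 1;

    if (strncmp(response, expected, n) != 0)
        return 0;

    // The status code must end here, so "HTTP/1.1 1010" does not slip through.
    return response[n] == ' ' || response[n] == '\r';
}
```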
119
120 If you are the **server**, you need to keep that connection alive. In my project, `Seobeo`, I create a separate connection object, throw away the HTTP request info to save memory, and start a fresh buffer for WebSocket frames.
121
122 ```c
123 // Transitioning the state from HTTP to WebSocket
124 Seobeo_WebSocket_Server_Connection *p_conn = malloc(sizeof(Seobeo_WebSocket_Server_Connection));
125 memset(p_conn, 0, sizeof(Seobeo_WebSocket_Server_Connection));
126
127 p_conn->p_handle = p_handle;
128 p_conn->is_active = TRUE;
129 p_conn->fragment_capacity = 4096;
130 p_conn->fragment_buffer = malloc(p_conn->fragment_capacity);
131
132 ```
133
134 ---
135
136 ## Frame-Based Protocols
137
138 This is where the logic gets "cancerous." WebSockets don't just send raw strings; they wrap everything in a **Frame**.
139
140 ### The Opcode Table
141
142 The first byte contains the `FIN` bit (is this the end of the message?) and the `Opcode` (what kind of data is this?).
143
144 | Opcode (Hex) | Meaning | Description |
145 | --- | --- | --- |
146 | `0x0` | Continuation | Part of a multi-frame message |
147 | `0x1` | Text | UTF-8 payload |
148 | `0x2` | Binary | Raw binary data |
149 | `0x8` | Close | Terminate the connection |
150 | `0x9` | Ping | Heartbeat check |
151 | `0xA` | Pong | Heartbeat response |
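In code, the table maps naturally onto an enum. The names below (`ws_opcode`, `ws_is_control_opcode`, `ws_is_known_opcode`) are hypothetical and chosen for this sketch; only the numeric values come from the spec.

```c
// Hypothetical enum mirroring the opcode table (values from RFC 6455).
typedef enum {
    WS_OP_CONTINUATION = 0x0,
    WS_OP_TEXT         = 0x1,
    WS_OP_BINARY       = 0x2,
    WS_OP_CLOSE        = 0x8,
    WS_OP_PING         = 0x9,
    WS_OP_PONG         = 0xA
} ws_opcode;

// Control frames are the opcodes with the top bit of the 4-bit field set (0x8 to 0xF).
int ws_is_control_opcode(ws_opcode op)
{
    return (op & 0x8) != 0;
}

// Returns 1 for opcodes listed in the table, 0 for reserved values.
int ws_is_known_opcode(unsigned op)
{
    return op <= 0x2 || (op >= 0x8 && op <= 0xA);
}
```

Checking for reserved opcodes early lets you fail the connection before you try to interpret a payload you don't understand.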
152
153 ### The Masking Rule
154
155 * **Client to Server:** MUST be masked.
156 * **Server to Client:** MUST NOT be masked.
157 If a client sends unmasked data, the server must close the connection. It’s the law.
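A server can enforce this by looking at the top bit of the second header byte. The sketch below uses an invented helper name, `ws_check_client_mask`; the 1002 "protocol error" close code is the one RFC 6455 suggests for this situation.

```c
#include <stdint.h>

#define WS_CLOSE_PROTOCOL_ERROR 1002  // close code defined by RFC 6455

// Server-side check for a frame received from a client.
// Returns 0 if the MASK bit is set, otherwise the close code to send.
int ws_check_client_mask(uint8_t second_byte)
{
    return (second_byte & 0x80) ? 0 : WS_CLOSE_PROTOCOL_ERROR;
}
```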
158
159 ### Why the Bitwise Mess? (Endianness)
160
161 In the code below, you’ll see things like `payload_length >> (56 - i * 8)`. This is because network protocol headers use **Big-Endian** (most significant byte first). Most machines are Little-Endian, so you can't just `memcpy` the integer into the frame; shifting and masking one byte at a time puts the bytes on the wire in the right order no matter what your CPU uses.
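If the shifting feels abstract, a pair of helpers makes the byte order visible. These are illustrative names (`ws_put_u64_be`, `ws_get_u64_be`), not functions from any library.

```c
#include <stdint.h>

// Writes a 64-bit value most significant byte first (network byte order).
void ws_put_u64_be(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(value >> (56 - 8 * i));
}

// Reads a 64-bit big-endian value back into a native integer.
uint64_t ws_get_u64_be(const uint8_t in[8])
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value = (value << 8) | in[i];
    return value;
}
```

For a payload length of 70000 (`0x11170`), the eight bytes on the wire are `00 00 00 00 00 01 11 70`.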
162
163 ---
164
165 ## Sending a Frame (Client Side)
166
167 Here is how we construct a frame to send data to the server.
168
169 ```c
170 uint8 frame[14];
171 size_t frame_len = 0;
172
173 // Byte 0: FIN bit (0x80) and Opcode
174 frame[0] = (fin ? 0x80 : 0x00) | (opcode & 0x0F);
175 frame_len++;
176
177 // Generate a 4-byte mask key
178 uint8 mask_key[4];
179 for (int i = 0; i < 4; i++)
180 mask_key[i] = (uint8)(rand() % 256);
181
182 // Byte 1+: Payload Length logic
183 if (payload_length < 126) {
184 frame[1] = 0x80 | (uint8)payload_length; // 0x80 sets the MASK bit
185 frame_len++;
186 } else if (payload_length <= 65535) {
187 frame[1] = 0x80 | 126;
188 frame[2] = (uint8)((payload_length >> 8) & 0xFF);
189 frame[3] = (uint8)(payload_length & 0xFF);
190 frame_len += 3;
191 } else {
192 frame[1] = 0x80 | 127;
193 for (int i = 0; i < 8; i++)
194 frame[2 + i] = (uint8)((payload_length >> (56 - i * 8)) & 0xFF);
195 frame_len += 9;
196 }
197
198 // Attach the mask key
199 memcpy(frame + frame_len, mask_key, 4);
200 frame_len += 4;
201
202 ```
203
204 To actually send the data, you XOR every byte with the mask:
205
206 ```c
207 for (size_t i = 0; i < length; i++)
208 data[i] ^= mask_key[i % 4];
209
210 ```
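Putting the header and the masking together, here is a small sketch that builds a complete frame for short text payloads. The function name `ws_build_small_text_frame` is made up, and the mask key is passed in so the output is reproducible; a real client should pick a fresh, unpredictable key for every frame.

```c
#include <stdint.h>
#include <string.h>

// Builds a masked, single-frame text message for payloads under 126 bytes.
// `out` must hold at least 6 + len bytes.
// Returns the total frame size, or 0 if the payload is too long for this sketch.
size_t ws_build_small_text_frame(uint8_t *out, const uint8_t *payload, size_t len,
                                 const uint8_t mask_key[4])
{
    if (len >= 126)
        return 0;

    out[0] = 0x80 | 0x1;             // FIN bit + text opcode
    out[1] = 0x80 | (uint8_t)len;    // MASK bit + 7-bit payload length
    memcpy(out + 2, mask_key, 4);    // mask key follows the length

    for (size_t i = 0; i < len; i++) // masked payload follows the key
        out[6 + i] = payload[i] ^ mask_key[i % 4];

    return 6 + len;
}
```

With the payload `"Hello"` and the key `37 fa 21 3d`, this produces `81 85 37 fa 21 3d 7f 9f 4d 51 58`, which is the masked example given in RFC 6455.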
211
212 ---
213
214 ## Receiving a Frame (Server Side)
215
216 On the server side, we have to do the reverse. We peel the onion layer by layer.
217
218 #### 1. Parse the Header
219
220 We check the first two bytes to see how big the payload is and if it's masked.
221
222 ```c
223 uint8 *buf = p_conn->p_handle->read_buffer;
224 uint8 byte1 = buf[0];
225 uint8 byte2 = buf[1];
226
227 boolean fin = (byte1 & 0x80) != 0;
228 Seobeo_WebSocket_Opcode opcode = (Seobeo_WebSocket_Opcode)(byte1 & 0x0F);
229 boolean masked = (byte2 & 0x80) != 0;
230 uint64 payload_len = byte2 & 0x7F;
231
232 size_t header_len = 2;
233
234 ```
235
236 #### 2. Handle Extended Lengths
237
238 If the length is 126 or 127, it means the actual size is hidden in the next 2 or 8 bytes.
239
240 ```c
241 if (payload_len == 126) {
242 payload_len = (buf[2] << 8) | buf[3];
243 header_len += 2;
244 } else if (payload_len == 127) {
245 payload_len = 0;
246 for (int i = 0; i < 8; i++)
247 payload_len = (payload_len << 8) | buf[2 + i];
248 header_len += 8;
249 }
250
251 ```
252
253 #### 3. Unmask the Payload
254
255 If the data is masked (and it should be if it's from a client), we use that 4-byte key to flip the bits back to normal.
256
257 ```c
258 uint8 mask_key[4] = {0};
259 if (masked) {
260 memcpy(mask_key, buf + header_len, 4);
261 header_len += 4;
262 }
263
264 uint8 *payload = malloc(payload_len); // real code should cap payload_len and check for NULL
265 memcpy(payload, buf + header_len, payload_len);
266
267 if (masked)
268 Seobeo_WebSocket_Unmask_Data(payload, payload_len, mask_key);
269
270 ```
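One thing the snippets above skip is checking that the bytes you are about to read have actually arrived. The sketch below folds the three steps into a single bounds-checked header parser. The struct and function names (`ws_frame_header`, `ws_parse_header`, `ws_frame_complete`) are hypothetical and not taken from Seobeo.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      fin;
    int      opcode;
    int      masked;
    uint64_t payload_len;
    size_t   header_len;    // includes the mask key when present
    uint8_t  mask_key[4];
} ws_frame_header;

// Returns 1 when a full header was parsed, 0 when more bytes are needed.
int ws_parse_header(const uint8_t *buf, size_t avail, ws_frame_header *h)
{
    if (avail < 2)
        return 0;

    h->fin         = (buf[0] & 0x80) != 0;
    h->opcode      = buf[0] & 0x0F;
    h->masked      = (buf[1] & 0x80) != 0;
    h->payload_len = buf[1] & 0x7F;
    h->header_len  = 2;

    // 126 and 127 are markers: the real length lives in the next 2 or 8 bytes.
    size_t ext  = h->payload_len == 126 ? 2 : h->payload_len == 127 ? 8 : 0;
    size_t need = 2 + ext + (h->masked ? 4 : 0);
    if (avail < need)
        return 0;

    if (ext) {
        h->payload_len = 0;
        for (size_t i = 0; i < ext; i++)
            h->payload_len = (h->payload_len << 8) | buf[2 + i];
    }
    h->header_len += ext;

    for (int i = 0; i < 4; i++)
        h->mask_key[i] = h->masked ? buf[h->header_len + i] : 0;
    if (h->masked)
        h->header_len += 4;

    return 1;
}

// Returns 1 if the whole frame (header + payload) is already in the buffer.
int ws_frame_complete(const ws_frame_header *h, size_t avail)
{
    return avail >= h->header_len && avail - h->header_len >= h->payload_len;
}
```

Only copy and unmask the payload once `ws_frame_complete` says the frame is fully buffered; otherwise keep reading from the socket.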
271
272 ---
273
274 ## Conclusion
275
276 That’s it. That is WebSockets in a nutshell. Once you handle the bit-shifting for the length and the XOR masking, you’re just reading and writing to a socket like any other protocol.
277
278 You can test my implementation at `mrjunejune.com/talk`. Open two tabs and talk to yourself—it’s a great way to verify your frames are flying correctly.