Mercurial
comparison third_party/luajit/doc/ext_buffer.html @ 178:94705b5986b3
[ThirdParty] Added WRK and luajit for load testing.
| author | MrJuneJune <me@mrjunejune.com> |
|---|---|
| date | Thu, 22 Jan 2026 20:10:30 -0800 |
comparison
| 177:24fe8ff94056 | 178:94705b5986b3 |
|---|---|
| 1 <!DOCTYPE html> | |
| 2 <html> | |
| 3 <head> | |
| 4 <title>String Buffer Library</title> | |
| 5 <meta charset="utf-8"> | |
| 6 <meta name="Copyright" content="Copyright (C) 2005-2023"> | |
| 7 <meta name="Language" content="en"> | |
| 8 <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> | |
| 9 <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> | |
| 10 <style type="text/css"> | |
| 11 .lib { | |
| 12 vertical-align: middle; | |
| 13 margin-left: 5px; | |
| 14 padding: 0 5px; | |
| 15 font-size: 60%; | |
| 16 border-radius: 5px; | |
| 17 background: #c5d5ff; | |
| 18 color: #000; | |
| 19 } | |
| 20 </style> | |
| 21 </head> | |
| 22 <body> | |
| 23 <div id="site"> | |
| 24 <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> | |
| 25 </div> | |
| 26 <div id="head"> | |
| 27 <h1>String Buffer Library</h1> | |
| 28 </div> | |
| 29 <div id="nav"> | |
| 30 <ul><li> | |
| 31 <a href="luajit.html">LuaJIT</a> | |
| 32 <ul><li> | |
| 33 <a href="https://luajit.org/download.html">Download <span class="ext">»</span></a> | |
| 34 </li><li> | |
| 35 <a href="install.html">Installation</a> | |
| 36 </li><li> | |
| 37 <a href="running.html">Running</a> | |
| 38 </li></ul> | |
| 39 </li><li> | |
| 40 <a href="extensions.html">Extensions</a> | |
| 41 <ul><li> | |
| 42 <a href="ext_ffi.html">FFI Library</a> | |
| 43 <ul><li> | |
| 44 <a href="ext_ffi_tutorial.html">FFI Tutorial</a> | |
| 45 </li><li> | |
| 46 <a href="ext_ffi_api.html">ffi.* API</a> | |
| 47 </li><li> | |
| 48 <a href="ext_ffi_semantics.html">FFI Semantics</a> | |
| 49 </li></ul> | |
| 50 </li><li> | |
| 51 <a class="current" href="ext_buffer.html">String Buffers</a> | |
| 52 </li><li> | |
| 53 <a href="ext_jit.html">jit.* Library</a> | |
| 54 </li><li> | |
| 55 <a href="ext_c_api.html">Lua/C API</a> | |
| 56 </li><li> | |
| 57 <a href="ext_profiler.html">Profiler</a> | |
| 58 </li></ul> | |
| 59 </li><li> | |
| 60 <a href="https://luajit.org/status.html">Status <span class="ext">»</span></a> | |
| 61 </li><li> | |
| 62 <a href="https://luajit.org/faq.html">FAQ <span class="ext">»</span></a> | |
| 63 </li><li> | |
| 64 <a href="https://luajit.org/list.html">Mailing List <span class="ext">»</span></a> | |
| 65 </li></ul> | |
| 66 </div> | |
| 67 <div id="main"> | |
| 68 <p> | |
| 69 The string buffer library allows <b>high-performance manipulation of | |
| 70 string-like data</b>. | |
| 71 </p> | |
| 72 <p> | |
| 73 Unlike Lua strings, which are constants, string buffers are | |
| 74 <b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data | |
| 75 can be stored, formatted and encoded into a string buffer and later | |
| 76 converted, extracted or decoded. | |
| 77 </p> | |
| 78 <p> | |
| 79 The convenient string buffer API simplifies common string manipulation | |
| 80 tasks that would otherwise require creating many intermediate strings. | |
| 81 String buffers improve performance by eliminating redundant memory | |
| 82 copies, object creation, string interning and garbage collection | |
| 83 overhead. In conjunction with the FFI library, they allow zero-copy | |
| 84 operations. | |
| 85 </p> | |
| 86 <p> | |
| 87 The string buffer library also includes a high-performance | |
| 88 <a href="#serialize">serializer</a> for Lua objects. | |
| 89 </p> | |
| 90 | |
| 91 <h2 id="use">Using the String Buffer Library</h2> | |
| 92 <p> | |
| 93 The string buffer library is built into LuaJIT by default, but it's not | |
| 94 loaded by default. Add this to the start of every Lua file that needs | |
| 95 one of its functions: | |
| 96 </p> | |
| 97 <pre class="code"> | |
| 98 local buffer = require("string.buffer") | |
| 99 </pre> | |
| 100 <p> | |
| 101 The convention for the syntax shown on this page is that <tt>buffer</tt> | |
| 102 refers to the buffer library and <tt>buf</tt> refers to an individual | |
| 103 buffer object. | |
| 104 </p> | |
| 105 <p> | |
| 106 Please note the difference between a Lua function call, e.g. | |
| 107 <tt>buffer.new()</tt> (with a dot) and a Lua method call, e.g. | |
| 108 <tt>buf:reset()</tt> (with a colon). | |
| 109 </p> | |
| 110 | |
| 111 <h3 id="buffer_object">Buffer Objects</h3> | |
| 112 <p> | |
| 113 A buffer object is a garbage-collected Lua object. After creation with | |
| 114 <tt>buffer.new()</tt>, it can (and should) be reused for many operations. | |
| 115 When the last reference to a buffer object is gone, it will eventually | |
| 116 be freed by the garbage collector, along with the allocated buffer | |
| 117 space. | |
| 118 </p> | |
| 119 <p> | |
| 120 Buffers operate like a FIFO (first-in first-out) data structure. Data | |
| 121 can be appended (written) to the end of the buffer and consumed (read) | |
| 122 from the front of the buffer. These operations may be freely mixed. | |
| 123 </p> | |
| 124 <p> | |
| 125 The buffer space that holds the characters is managed automatically | |
| 126 — it grows as needed and already consumed space is recycled. Use | |
| 127 <tt>buffer.new(size)</tt> and <tt>buf:free()</tt> if you need more | |
| 128 control. | |
| 129 </p> | |
| 130 <p> | |
| 131 The maximum size of a single buffer is the same as the maximum size of a | |
| 132 Lua string, which is slightly below two gigabytes. For huge data sizes, | |
| 133 neither strings nor buffers are the right data structure — use the | |
| 134 FFI library to directly map memory or files up to the virtual memory | |
| 135 limit of your OS. | |
| 136 </p> | |
| 137 | |
| 138 <h3 id="buffer_overview">Buffer Method Overview</h3> | |
| 139 <ul> | |
| 140 <li> | |
| 141 The <tt>buf:put*()</tt>-like methods append (write) characters to the | |
| 142 end of the buffer. | |
| 143 </li> | |
| 144 <li> | |
| 145 The <tt>buf:get*()</tt>-like methods consume (read) characters from the | |
| 146 front of the buffer. | |
| 147 </li> | |
| 148 <li> | |
| 149 Other methods, like <tt>buf:tostring()</tt>, only read the buffer | |
| 150 contents and don't change the buffer. | |
| 151 </li> | |
| 152 <li> | |
| 153 The <tt>buf:set()</tt> method allows zero-copy consumption of a string | |
| 154 or an FFI cdata object as a buffer. | |
| 155 </li> | |
| 156 <li> | |
| 157 The FFI-specific methods allow zero-copy read/write-style operations or | |
| 158 modifying the buffer contents in-place. Please check the | |
| 159 <a href="#ffi_caveats">FFI caveats</a> below, too. | |
| 160 </li> | |
| 161 <li> | |
| 162 Methods that don't need to return anything specific return the buffer | |
| 163 object itself as a convenience. This allows method chaining, e.g.: | |
| 164 <tt>buf:reset():encode(obj)</tt> or <tt>buf:skip(len):get()</tt> | |
| 165 </li> | |
| 166 </ul> | |
| 167 | |
| 168 <h2 id="create">Buffer Creation and Management</h2> | |
| 169 | |
| 170 <h3 id="buffer_new"><tt>local buf = buffer.new([size [,options]])<br> | |
| 171 local buf = buffer.new([options])</tt></h3> | |
| 172 <p> | |
| 173 Creates a new buffer object. | |
| 174 </p> | |
| 175 <p> | |
| 176 The optional <tt>size</tt> argument ensures a minimum initial buffer | |
| 177 size. This is strictly an optimization when the required buffer size is | |
| 178 known beforehand. The buffer space will grow as needed, in any case. | |
| 179 </p> | |
| 180 <p> | |
| 181 The optional table <tt>options</tt> sets various | |
| 182 <a href="#serialize_options">serialization options</a>. | |
| 183 </p> | |
| 184 | |
| 185 <h3 id="buffer_reset"><tt>buf = buf:reset()</tt></h3> | |
| 186 <p> | |
| 187 Reset (empty) the buffer. The allocated buffer space is not freed and | |
| 188 may be reused. | |
| 189 </p> | |
| 190 | |
| 191 <h3 id="buffer_free"><tt>buf = buf:free()</tt></h3> | |
| 192 <p> | |
| 193 The buffer space of the buffer object is freed. The object itself | |
| 194 remains intact, empty and may be reused. | |
| 195 </p> | |
| 196 <p> | |
| 197 Note: you normally don't need to use this method. The garbage collector | |
| 198 automatically frees the buffer space when the buffer object is | |
| 199 collected. Use this method if you need to free the associated memory | |
| 200 immediately. | |
| 201 </p> | |
| 202 | |
| 203 <h2 id="write">Buffer Writers</h2> | |
| 204 | |
| 205 <h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [,…])</tt></h3> | |
| 206 <p> | |
| 207 Appends a string <tt>str</tt>, a number <tt>num</tt> or any object | |
| 208 <tt>obj</tt> with a <tt>__tostring</tt> metamethod to the buffer. | |
| 209 Multiple arguments are appended in the given order. | |
| 210 </p> | |
| 211 <p> | |
| 212 Appending a buffer to a buffer is possible and short-circuited | |
| 213 internally, but it still involves a copy. It's better to combine the | |
| 214 buffer writes so they use a single buffer. | |
| 215 </p> | |
| 216 | |
| 217 <h3 id="buffer_putf"><tt>buf = buf:putf(format, …)</tt></h3> | |
| 218 <p> | |
| 219 Appends the formatted arguments to the buffer. The <tt>format</tt> | |
| 220 string supports the same options as <tt>string.format()</tt>. | |
| 221 </p> | |
| 222 | |
| 223 <h3 id="buffer_putcdata"><tt>buf = buf:putcdata(cdata, len)</tt><span class="lib">FFI</span></h3> | |
| 224 <p> | |
| 225 Appends the given <tt>len</tt> number of bytes from the memory pointed | |
| 226 to by the FFI <tt>cdata</tt> object to the buffer. The object needs to | |
| 227 be convertible to a (constant) pointer. | |
| 228 </p> | |
| 229 | |
| 230 <h3 id="buffer_set"><tt>buf = buf:set(str)<br> | |
| 231 buf = buf:set(cdata, len)</tt><span class="lib">FFI</span></h3> | |
| 232 <p> | |
| 233 This method allows zero-copy consumption of a string or an FFI cdata | |
| 234 object as a buffer. It stores a reference to the passed string | |
| 235 <tt>str</tt> or the FFI <tt>cdata</tt> object in the buffer. Any buffer | |
| 236 space originally allocated is freed. This is <i>not</i> an append | |
| 237 operation, unlike the <tt>buf:put*()</tt> methods. | |
| 238 </p> | |
| 239 <p> | |
| 240 After calling this method, the buffer behaves as if | |
| 241 <tt>buf:free():put(str)</tt> or <tt>buf:free():putcdata(cdata, len)</tt> | |
| 242 had been called. However, the data is only referenced and not copied, as | |
| 243 long as the buffer is only consumed. | |
| 244 </p> | |
| 245 <p> | |
| 246 In case the buffer is written to later on, the referenced data is copied | |
| 247 and the object reference is removed (copy-on-write semantics). | |
| 248 </p> | |
| 249 <p> | |
| 250 The stored reference is an anchor for the garbage collector and keeps the | |
| 251 originally passed string or FFI cdata object alive. | |
| 252 </p> | |
| 253 | |
| 254 <h3 id="buffer_reserve"><tt>ptr, len = buf:reserve(size)</tt><span class="lib">FFI</span><br> | |
| 255 <tt>buf = buf:commit(used)</tt><span class="lib">FFI</span></h3> | |
| 256 <p> | |
| 257 The <tt>reserve</tt> method reserves at least <tt>size</tt> bytes of | |
| 258 write space in the buffer. It returns a <tt>uint8_t *</tt> FFI | |
| 259 cdata pointer <tt>ptr</tt> that points to this space. | |
| 260 </p> | |
| 261 <p> | |
| 262 The available length in bytes is returned in <tt>len</tt>. This is at | |
| 263 least <tt>size</tt> bytes, but may be more to facilitate efficient | |
| 264 buffer growth. You can either make use of the additional space or ignore | |
| 265 <tt>len</tt> and only use <tt>size</tt> bytes. | |
| 266 </p> | |
| 267 <p> | |
| 268 The <tt>commit</tt> method appends the <tt>used</tt> bytes of the | |
| 269 previously returned write space to the buffer data. | |
| 270 </p> | |
| 271 <p> | |
| 272 This pair of methods allows zero-copy use of C read-style APIs: | |
| 273 </p> | |
| 274 <pre class="code"> | |
| 275 local MIN_SIZE = 65536 | |
| 276 repeat | |
| 277 local ptr, len = buf:reserve(MIN_SIZE) | |
| 278 local n = C.read(fd, ptr, len) | |
| 279 if n == 0 then break end -- EOF. | |
| 280 if n < 0 then error("read error") end | |
| 281 buf:commit(n) | |
| 282 until false | |
| 283 </pre> | |
| 284 <p> | |
| 285 The reserved write space is <i>not</i> initialized. At least the | |
| 286 <tt>used</tt> bytes <b>must</b> be written to before calling the | |
| 287 <tt>commit</tt> method. There's no need to call the <tt>commit</tt> | |
| 288 method if nothing is added to the buffer (e.g. on error). | |
| 289 </p> | |
| 290 | |
| 291 <h2 id="read">Buffer Readers</h2> | |
| 292 | |
| 293 <h3 id="buffer_length"><tt>len = #buf</tt></h3> | |
| 294 <p> | |
| 295 Returns the current length of the buffer data in bytes. | |
| 296 </p> | |
| 297 | |
| 298 <h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf […]</tt></h3> | |
| 299 <p> | |
| 300 The Lua concatenation operator <tt>..</tt> also accepts buffers, just | |
| 301 like strings or numbers. It always returns a string and not a buffer. | |
| 302 </p> | |
| 303 <p> | |
| 304 Note that although this is supported for convenience, this thwarts one | |
| 305 of the main reasons to use buffers, which is to avoid string | |
| 306 allocations. Rewrite it with <tt>buf:put()</tt> and <tt>buf:get()</tt>. | |
| 307 </p> | |
| 308 <p> | |
| 309 Mixing this with unrelated objects that have a <tt>__concat</tt> | |
| 310 metamethod may not work, since these probably only expect strings. | |
| 311 </p> | |
| 312 | |
| 313 <h3 id="buffer_skip"><tt>buf = buf:skip(len)</tt></h3> | |
| 314 <p> | |
| 315 Skips (consumes) <tt>len</tt> bytes from the buffer up to the current | |
| 316 length of the buffer data. | |
| 317 </p> | |
| 318 | |
| 319 <h3 id="buffer_get"><tt>str, … = buf:get([len|nil] [,…])</tt></h3> | |
| 320 <p> | |
| 321 Consumes the buffer data and returns one or more strings. If called | |
| 322 without arguments, the whole buffer data is consumed. If called with a | |
| 323 number, up to <tt>len</tt> bytes are consumed. A <tt>nil</tt> argument | |
| 324 consumes the remaining buffer data (this only makes sense as the last | |
| 325 argument). Multiple arguments consume the buffer data in the given | |
| 326 order. | |
| 327 </p> | |
| 328 <p> | |
| 329 Note: a zero length or no remaining buffer data returns an empty string | |
| 330 and not <tt>nil</tt>. | |
| 331 </p> | |
| 332 | |
| 333 <h3 id="buffer_tostring"><tt>str = buf:tostring()<br> | |
| 334 str = tostring(buf)</tt></h3> | |
| 335 <p> | |
| 336 Creates a string from the buffer data, but doesn't consume it. The | |
| 337 buffer remains unchanged. | |
| 338 </p> | |
| 339 <p> | |
| 340 Buffer objects also define a <tt>__tostring</tt> metamethod. This means | |
| 341 buffers can be passed to the global <tt>tostring()</tt> function and | |
| 342 many other functions that accept this in place of strings. The important | |
| 343 internal uses in functions like <tt>io.write()</tt> are short-circuited | |
| 344 to avoid the creation of an intermediate string object. | |
| 345 </p> | |
| 346 | |
| 347 <h3 id="buffer_ref"><tt>ptr, len = buf:ref()</tt><span class="lib">FFI</span></h3> | |
| 348 <p> | |
| 349 Returns a <tt>uint8_t *</tt> FFI cdata pointer <tt>ptr</tt> that | |
| 350 points to the buffer data. The length of the buffer data in bytes is | |
| 351 returned in <tt>len</tt>. | |
| 352 </p> | |
| 353 <p> | |
| 354 The returned pointer can be directly passed to C functions that expect a | |
| 355 buffer and a length. You can also do bytewise reads | |
| 356 (<tt>local x = ptr[i]</tt>) or writes | |
| 357 (<tt>ptr[i] = 0x40</tt>) of the buffer data. | |
| 358 </p> | |
| 359 <p> | |
| 360 In conjunction with the <tt>skip</tt> method, this allows zero-copy use | |
| 361 of C write-style APIs: | |
| 362 </p> | |
| 363 <pre class="code"> | |
| 364 repeat | |
| 365 local ptr, len = buf:ref() | |
| 366 if len == 0 then break end | |
| 367 local n = C.write(fd, ptr, len) | |
| 368 if n < 0 then error("write error") end | |
| 369 buf:skip(n) | |
| 370 until n >= len | |
| 371 </pre> | |
| 372 <p> | |
| 373 Unlike Lua strings, buffer data is <i>not</i> implicitly | |
| 374 zero-terminated. It's not safe to pass <tt>ptr</tt> to C functions that | |
| 375 expect zero-terminated strings. If you're not using <tt>len</tt>, then | |
| 376 you're doing something wrong. | |
| 377 </p> | |
| 378 | |
| 379 <h2 id="serialize">Serialization of Lua Objects</h2> | |
| 380 <p> | |
| 381 The following functions and methods allow <b>high-speed serialization</b> | |
| 382 (encoding) of a Lua object into a string and decoding it back to a Lua | |
| 383 object. This allows convenient storage and transport of <b>structured | |
| 384 data</b>. | |
| 385 </p> | |
| 386 <p> | |
| 387 The encoded data is in an <a href="#serialize_format">internal binary | |
| 388 format</a>. The data can be stored in files, binary-transparent | |
| 389 databases or transmitted to other LuaJIT instances across threads, | |
| 390 processes or networks. | |
| 391 </p> | |
| 392 <p> | |
| 393 Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or | |
| 394 server-class system, even when serializing many small objects. Decoding | |
| 395 speed is mostly constrained by object creation cost. | |
| 396 </p> | |
| 397 <p> | |
| 398 The serializer handles most Lua types, common FFI number types and | |
| 399 nested structures. Functions, thread objects, other FFI cdata and full | |
| 400 userdata cannot be serialized (yet). | |
| 401 </p> | |
| 402 <p> | |
| 403 The encoder serializes nested structures as trees. Multiple references | |
| 404 to a single object will be stored separately and create distinct objects | |
| 405 after decoding. Circular references cause an error. | |
| 406 </p> | |
| 407 | |
| 408 <h3 id="serialize_methods">Serialization Functions and Methods</h3> | |
| 409 | |
| 410 <h3 id="buffer_encode"><tt>str = buffer.encode(obj)<br> | |
| 411 buf = buf:encode(obj)</tt></h3> | |
| 412 <p> | |
| 413 Serializes (encodes) the Lua object <tt>obj</tt>. The stand-alone | |
| 414 function returns a string <tt>str</tt>. The buffer method appends the | |
| 415 encoding to the buffer. | |
| 416 </p> | |
| 417 <p> | |
| 418 <tt>obj</tt> can be any of the supported Lua types — it doesn't | |
| 419 need to be a Lua table. | |
| 420 </p> | |
| 421 <p> | |
| 422 This function may throw an error when attempting to serialize | |
| 423 unsupported object types, circular references or deeply nested tables. | |
| 424 </p> | |
| 425 | |
| 426 <h3 id="buffer_decode"><tt>obj = buffer.decode(str)<br> | |
| 427 obj = buf:decode()</tt></h3> | |
| 428 <p> | |
| 429 The stand-alone function deserializes (decodes) the string | |
| 430 <tt>str</tt>, the buffer method deserializes one object from the | |
| 431 buffer. Both return a Lua object <tt>obj</tt>. | |
| 432 </p> | |
| 433 <p> | |
| 434 The returned object may be any of the supported Lua types — | |
| 435 even <tt>nil</tt>. | |
| 436 </p> | |
| 437 <p> | |
| 438 This function may throw an error when fed with malformed or incomplete | |
| 439 encoded data. The stand-alone function throws when there's left-over | |
| 440 data after decoding a single top-level object. The buffer method leaves | |
| 441 any left-over data in the buffer. | |
| 442 </p> | |
| 443 <p> | |
| 444 Attempting to deserialize an FFI type will throw an error if the FFI | |
| 445 library is not built in or has not been loaded yet. | |
| 446 </p> | |
| 447 | |
| 448 <h3 id="serialize_options">Serialization Options</h3> | |
| 449 <p> | |
| 450 The <tt>options</tt> table passed to <tt>buffer.new()</tt> may contain | |
| 451 the following members (all optional): | |
| 452 </p> | |
| 453 <ul> | |
| 454 <li> | |
| 455 <tt>dict</tt> is a Lua table holding a <b>dictionary of strings</b> that | |
| 456 commonly occur as table keys of objects you are serializing. These keys | |
| 457 are compactly encoded as indexes during serialization. A well-chosen | |
| 458 dictionary saves space and improves serialization performance. | |
| 459 </li> | |
| 460 <li> | |
| 461 <tt>metatable</tt> is a Lua table holding a <b>dictionary of metatables</b> | |
| 462 for the table objects you are serializing. | |
| 463 </li> | |
| 464 </ul> | |
| 465 <p> | |
| 466 <tt>dict</tt> needs to be an array of strings and <tt>metatable</tt> needs | |
| 467 to be an array of tables. Both must start at index 1 and have no holes (no | |
| 468 <tt>nil</tt> in between). The tables are anchored in the buffer object and | |
| 469 internally modified into a two-way index (don't do this yourself, just pass | |
| 470 a plain array). The tables must not be modified after they have been passed | |
| 471 to <tt>buffer.new()</tt>. | |
| 472 </p> | |
| 473 <p> | |
| 474 The <tt>dict</tt> and <tt>metatable</tt> tables used by the encoder and | |
| 475 decoder must be the same. Put the most common entries at the front. Extend | |
| 476 at the end to ensure backwards-compatibility — older encodings can | |
| 477 then still be read. You may also set some indexes to <tt>false</tt> to | |
| 478 explicitly drop backwards-compatibility. Old encodings that use these | |
| 479 indexes will throw an error when decoded. | |
| 480 </p> | |
| 481 <p> | |
| 482 Metatables that are not found in the <tt>metatable</tt> dictionary are | |
| 483 ignored when encoding. Decoding returns a table with a <tt>nil</tt> | |
| 484 metatable. | |
| 485 </p> | |
| 486 <p> | |
| 487 Note: parsing and preparation of the options table is somewhat | |
| 488 expensive. Create a buffer object only once and recycle it for multiple | |
| 489 uses. Avoid mixing encoder and decoder buffers, since the | |
| 490 <tt>buf:set()</tt> method frees the already allocated buffer space: | |
| 491 </p> | |
| 492 <pre class="code"> | |
| 493 local options = { | |
| 494 dict = { "commonly", "used", "string", "keys" }, | |
| 495 } | |
| 496 local buf_enc = buffer.new(options) | |
| 497 local buf_dec = buffer.new(options) | |
| 498 | |
| 499 local function encode(obj) | |
| 500 return buf_enc:reset():encode(obj):get() | |
| 501 end | |
| 502 | |
| 503 local function decode(str) | |
| 504 return buf_dec:set(str):decode() | |
| 505 end | |
| 506 </pre> | |
| 507 | |
| 508 <h3 id="serialize_stream">Streaming Serialization</h3> | |
| 509 <p> | |
| 510 In some contexts, it's desirable to do piecewise serialization of large | |
| 511 datasets, also known as <i>streaming</i>. | |
| 512 </p> | |
| 513 <p> | |
| 514 This serialization format can be safely concatenated and supports streaming. | |
| 515 Multiple encodings can simply be appended to a buffer and later decoded | |
| 516 individually: | |
| 517 </p> | |
| 518 <pre class="code"> | |
| 519 local buf = buffer.new() | |
| 520 buf:encode(obj1) | |
| 521 buf:encode(obj2) | |
| 522 local copy1 = buf:decode() | |
| 523 local copy2 = buf:decode() | |
| 524 </pre> | |
| 525 <p> | |
| 526 Here's how to iterate over a stream: | |
| 527 </p> | |
| 528 <pre class="code"> | |
| 529 while #buf ~= 0 do | |
| 530 local obj = buf:decode() | |
| 531 -- Do something with obj. | |
| 532 end | |
| 533 </pre> | |
| 534 <p> | |
| 535 Since the serialization format doesn't prepend a length to its encoding, | |
| 536 network applications may need to transmit the length, too. | |
| 537 </p> | |
| 538 | |
| 539 <h3 id="serialize_format">Serialization Format Specification</h3> | |
| 540 <p> | |
| 541 This serialization format is designed for <b>internal use</b> by LuaJIT | |
| 542 applications. Serialized data is upwards-compatible and portable across | |
| 543 all supported LuaJIT platforms. | |
| 544 </p> | |
| 545 <p> | |
| 546 It's an <b>8-bit binary format</b> and not human-readable. It uses e.g. | |
| 547 embedded zeroes and stores embedded Lua string objects unmodified, which | |
| 548 are 8-bit-clean, too. Encoded data can be safely concatenated for | |
| 549 streaming and later decoded one top-level object at a time. | |
| 550 </p> | |
| 551 <p> | |
| 552 The encoding is reasonably compact, but tuned for maximum performance, | |
| 553 not for minimum space usage. It compresses well with any of the common | |
| 554 byte-oriented data compression algorithms. | |
| 555 </p> | |
| 556 <p> | |
| 557 Although documented here for reference, this format is explicitly | |
| 558 <b>not</b> intended to be a 'public standard' for structured data | |
| 559 interchange across computer languages (like JSON or MessagePack). Please | |
| 560 do not use it as such. | |
| 561 </p> | |
| 562 <p> | |
| 563 The specification is given below as a context-free grammar with a | |
| 564 top-level <tt>object</tt> as the starting point. Alternatives are | |
| 565 separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats. | |
| 566 Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are | |
| 567 either plain hex numbers, encoded as bytes, or have a <tt>.format</tt> | |
| 568 suffix. | |
| 569 </p> | |
| 570 <pre> | |
| 571 object → nil | false | true | |
| 572 | null | lightud32 | lightud64 | |
| 573 | int | num | tab | tab_mt | |
| 574 | int64 | uint64 | complex | |
| 575 | string | |
| 576 | |
| 577 nil → 0x00 | |
| 578 false → 0x01 | |
| 579 true → 0x02 | |
| 580 | |
| 581 null → 0x03 // NULL lightuserdata | |
| 582 lightud32 → 0x04 data.I // 32 bit lightuserdata | |
| 583 lightud64 → 0x05 data.L // 64 bit lightuserdata | |
| 584 | |
| 585 int → 0x06 int.I // int32_t | |
| 586 num → 0x07 double.L | |
| 587 | |
| 588 tab → 0x08 // Empty table | |
| 589 | 0x09 h.U h*{object object} // Key/value hash | |
| 590 | 0x0a a.U a*object // 0-based array | |
| 591 | 0x0b a.U a*object h.U h*{object object} // Mixed | |
| 592 | 0x0c a.U (a-1)*object // 1-based array | |
| 593 | 0x0d a.U (a-1)*object h.U h*{object object} // Mixed | |
| 594 tab_mt → 0x0e (index-1).U tab // Metatable dict entry | |
| 595 | |
| 596 int64 → 0x10 int.L // FFI int64_t | |
| 597 uint64 → 0x11 uint.L // FFI uint64_t | |
| 598 complex → 0x12 re.L im.L // FFI complex | |
| 599 | |
| 600 string → (0x20+len).U len*char.B | |
| 601 | 0x0f (index-1).U // String dict entry | |
| 602 | |
| 603 .B = 8 bit | |
| 604 .I = 32 bit little-endian | |
| 605 .L = 64 bit little-endian | |
| 606 .U = prefix-encoded 32 bit unsigned number n: | |
| 607 0x00..0xdf → n.B | |
| 608 0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B | |
| 609 0x1fe0.. → 0xff n.I | |
| 610 </pre> | |
| 611 | |
| 612 <h2 id="error">Error handling</h2> | |
| 613 <p> | |
| 614 Many of the buffer methods can throw an error. Out-of-memory or usage | |
| 615 errors are best caught with an outer wrapper for larger parts of code. | |
| 616 There's not much one can do after that, anyway. | |
| 617 </p> | |
| 618 <p> | |
| 619 On the other hand, you may want to catch some errors individually. Buffer methods need | |
| 620 to receive the buffer object as the first argument. The Lua colon-syntax | |
| 621 <tt>obj:method()</tt> does that implicitly. But to wrap a method with | |
| 622 <tt>pcall()</tt>, the arguments need to be passed like this: | |
| 623 </p> | |
| 624 <pre class="code"> | |
| 625 local ok, err = pcall(buf.encode, buf, obj) | |
| 626 if not ok then | |
| 627 -- Handle error in err. | |
| 628 end | |
| 629 </pre> | |
| 630 | |
| 631 <h2 id="ffi_caveats">FFI caveats</h2> | |
| 632 <p> | |
| 633 The string buffer library has been designed to work well together with | |
| 634 the FFI library. But due to the low-level nature of the FFI library, | |
| 635 some care needs to be taken: | |
| 636 </p> | |
| 637 <p> | |
| 638 First, please remember that FFI pointers are zero-indexed. The space | |
| 639 returned by <tt>buf:reserve()</tt> and <tt>buf:ref()</tt> starts at the | |
| 640 returned pointer and covers exactly <tt>len</tt> bytes. | |
| 641 </p> | |
| 642 <p> | |
| 643 I.e. the first valid index is <tt>ptr[0]</tt> and the last valid index | |
| 644 is <tt>ptr[len-1]</tt>. If the returned length is zero, there's no valid | |
| 645 index at all. The returned pointer may even be <tt>NULL</tt>. | |
| 646 </p> | |
| 647 <p> | |
| 648 The space pointed to by the returned pointer is only valid as long as | |
| 649 the buffer is not modified in any way (neither append, nor consume, nor | |
| 650 reset, etc.). The pointer is also not a GC anchor for the buffer object | |
| 651 itself. | |
| 652 </p> | |
| 653 <p> | |
| 654 Buffer data is only guaranteed to be byte-aligned. Casting the returned | |
| 655 pointer to a data type with higher alignment may cause unaligned | |
| 656 accesses. It depends on the CPU architecture whether this is allowed or | |
| 657 not (it's always OK on x86/x64 and mostly OK on other modern | |
| 658 architectures). | |
| 659 </p> | |
| 660 <p> | |
| 661 FFI pointers or references do not count as GC anchors for an underlying | |
| 662 object. E.g. an <tt>array</tt> allocated with <tt>ffi.new()</tt> is | |
| 663 anchored by <tt>buf:set(array, len)</tt>, but not by | |
| 664 <tt>buf:set(array+offset, len)</tt>. The addition of the offset | |
| 665 creates a new pointer, even when the offset is zero. In this case, you | |
| 666 need to make sure there's still a reference to the original array as | |
| 667 long as its contents are in use by the buffer. | |
| 668 </p> | |
| 669 <p> | |
| 670 Even though each LuaJIT VM instance is single-threaded (but you can | |
| 671 create multiple VMs), FFI data structures can be accessed concurrently. | |
| 672 Be careful when reading/writing FFI cdata from/to buffers to avoid | |
| 673 concurrent accesses or modifications. In particular, the memory | |
| 674 referenced by <tt>buf:set(cdata, len)</tt> must not be modified | |
| 675 while buffer readers are working on it. Shared, but read-only memory | |
| 676 mappings of files are OK, but only if the file does not change. | |
| 677 </p> | |
| 678 <br class="flush"> | |
| 679 </div> | |
| 680 <div id="foot"> | |
| 681 <hr class="hide"> | |
| 682 Copyright © 2005-2023 | |
| 683 <span class="noprint"> | |
| 684 · | |
| 685 <a href="contact.html">Contact</a> | |
| 686 </span> | |
| 687 </div> | |
| 688 </body> | |
| 689 </html> |
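
The FIFO behaviour described under "Buffer Objects", "Buffer Writers" and "Buffer Readers" can be sketched as follows. This is a minimal illustration, assuming LuaJIT 2.1 or later where `require("string.buffer")` is available; the literal strings and variable names are made up for the example.

```lua
local buffer = require("string.buffer")

-- Create one buffer and reuse it; writes append to the end.
local buf = buffer.new()
buf:put("hello", " ", "world"):putf(" %d", 42)
local total = #buf            -- length of the unread data in bytes

-- tostring() copies the contents without consuming them.
local snapshot = buf:tostring()

-- Reads consume from the front.
local head = buf:get(5)       -- first five bytes
buf:skip(1)                   -- drop the separating space
local rest = buf:get()        -- everything that is left
local empty = buf:get(3)      -- nothing left: empty string, not nil
```

Because `put` and `putf` return the buffer itself, the two writes chain into a single expression, which is the method-chaining convenience the overview mentions.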
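
The `.U` prefix encoding at the end of the format grammar may be easier to follow as code. The sketch below implements the three ranges exactly as the grammar states them, in plain Lua arithmetic so it does not depend on a bit library. The function names `encode_u` and `decode_u` are hypothetical helpers for illustration and are not part of the string buffer API.

```lua
-- Encode an unsigned number n (0 <= n < 2^32) in the ".U" format.
local function encode_u(n)
  if n < 0xe0 then
    -- 0x00..0xdf: the value itself as one byte.
    return string.char(n)
  elseif n < 0x1fe0 then
    -- 0xe0..0x1fdf: two bytes, high bits folded into the 0xe0 prefix.
    local m = n - 0xe0
    return string.char(0xe0 + math.floor(m / 256), m % 256)
  else
    -- 0x1fe0 and above: 0xff marker, then n as 32 bit little-endian.
    return string.char(0xff,
      n % 256,
      math.floor(n / 256) % 256,
      math.floor(n / 65536) % 256,
      math.floor(n / 16777216) % 256)
  end
end

-- Decode a ".U" number from string s starting at pos (default 1).
-- Returns the value and the position of the next unread byte.
local function decode_u(s, pos)
  pos = pos or 1
  local b = s:byte(pos)
  if b < 0xe0 then
    return b, pos + 1
  elseif b < 0xff then
    return (b - 0xe0) * 256 + s:byte(pos + 1) + 0xe0, pos + 2
  else
    local b0, b1, b2, b3 = s:byte(pos + 1, pos + 4)
    return b0 + b1 * 256 + b2 * 65536 + b3 * 16777216, pos + 5
  end
end

local encoded = encode_u(300)          -- two bytes: 0xe0, 0x4c
local value, nextpos = decode_u(encoded)
```

The two-byte form never produces a first byte of `0xff`, since the largest value in that range maps to a prefix of `0xfe`. That is what lets the decoder tell the two-byte and five-byte forms apart from the first byte alone.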
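
The difference between the stand-alone `buffer.decode()` and the `buf:decode()` method, together with the `pcall()` calling convention from the error handling section, can be sketched like this. It assumes LuaJIT 2.1 or later with the string buffer library available; the sample values are illustrative.

```lua
local buffer = require("string.buffer")

-- Two encodings appended back to back, as in the streaming section.
local stream = buffer.encode("first") .. buffer.encode("second")

-- The stand-alone function expects exactly one top-level object, so the
-- left-over data makes it throw; pcall() turns that into a status value.
local ok, err = pcall(buffer.decode, stream)

-- The buffer method decodes one object per call and keeps the rest.
local buf = buffer.new()
buf:set(stream)
local a = buf:decode()
local left_after_first = #buf

-- A method wrapped in pcall() needs the buffer passed explicitly.
local ok2, b = pcall(buf.decode, buf)
```

For a stream of several objects, the buffer method is therefore the one to use. The stand-alone function suits the case where a string is expected to hold exactly one encoded object and anything extra should be treated as an error.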