comparison third_party/luajit/doc/ext_profiler.html @ 178:94705b5986b3
[ThirdParty] Added WRK and luajit for load testing.
| author | MrJuneJune <me@mrjunejune.com> |
|---|---|
| date | Thu, 22 Jan 2026 20:10:30 -0800 |
| parents | |
| children |
| 177:24fe8ff94056 | 178:94705b5986b3 |
|---|---|
| 1 <!DOCTYPE html> | |
| 2 <html> | |
| 3 <head> | |
| 4 <title>Profiler</title> | |
| 5 <meta charset="utf-8"> | |
| 6 <meta name="Copyright" content="Copyright (C) 2005-2023"> | |
| 7 <meta name="Language" content="en"> | |
| 8 <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> | |
| 9 <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> | |
| 10 </head> | |
| 11 <body> | |
| 12 <div id="site"> | |
| 13 <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> | |
| 14 </div> | |
| 15 <div id="head"> | |
| 16 <h1>Profiler</h1> | |
| 17 </div> | |
| 18 <div id="nav"> | |
| 19 <ul><li> | |
| 20 <a href="luajit.html">LuaJIT</a> | |
| 21 <ul><li> | |
| 22 <a href="https://luajit.org/download.html">Download <span class="ext">»</span></a> | |
| 23 </li><li> | |
| 24 <a href="install.html">Installation</a> | |
| 25 </li><li> | |
| 26 <a href="running.html">Running</a> | |
| 27 </li></ul> | |
| 28 </li><li> | |
| 29 <a href="extensions.html">Extensions</a> | |
| 30 <ul><li> | |
| 31 <a href="ext_ffi.html">FFI Library</a> | |
| 32 <ul><li> | |
| 33 <a href="ext_ffi_tutorial.html">FFI Tutorial</a> | |
| 34 </li><li> | |
| 35 <a href="ext_ffi_api.html">ffi.* API</a> | |
| 36 </li><li> | |
| 37 <a href="ext_ffi_semantics.html">FFI Semantics</a> | |
| 38 </li></ul> | |
| 39 </li><li> | |
| 40 <a href="ext_buffer.html">String Buffers</a> | |
| 41 </li><li> | |
| 42 <a href="ext_jit.html">jit.* Library</a> | |
| 43 </li><li> | |
| 44 <a href="ext_c_api.html">Lua/C API</a> | |
| 45 </li><li> | |
| 46 <a class="current" href="ext_profiler.html">Profiler</a> | |
| 47 </li></ul> | |
| 48 </li><li> | |
| 49 <a href="https://luajit.org/status.html">Status <span class="ext">»</span></a> | |
| 50 </li><li> | |
| 51 <a href="https://luajit.org/faq.html">FAQ <span class="ext">»</span></a> | |
| 52 </li><li> | |
| 53 <a href="https://luajit.org/list.html">Mailing List <span class="ext">»</span></a> | |
| 54 </li></ul> | |
| 55 </div> | |
| 56 <div id="main"> | |
| 57 <p> | |
| 58 LuaJIT has an integrated statistical profiler with very low overhead. It | |
| 59 allows sampling the currently executing stack and other parameters at | |
| 60 regular intervals. | |
| 61 </p> | |
| 62 <p> | |
| 63 The integrated profiler can be accessed from three levels: | |
| 64 </p> | |
| 65 <ul> | |
| 66 <li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the | |
| 67 <a href="#j_p"><tt>-jp</tt></a> command line option.</li> | |
| 68 <li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li> | |
| 69 <li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li> | |
| 70 </ul> | |
| 71 | |
| 72 <h2 id="hl_profiler">High-Level Profiler</h2> | |
| 73 <p> | |
| 74 The bundled high-level profiler offers basic profiling functionality. It | |
| 75 generates simple textual summaries or source code annotations. It can be | |
| 76 accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option | |
| 77 or from Lua code by loading the underlying <tt>jit.p</tt> module. | |
| 78 </p> | |
| 79 <p> | |
| 80 To cut to the chase — run this to get a CPU usage profile by | |
| 81 function name: | |
| 82 </p> | |
| 83 <pre class="code"> | |
| 84 luajit -jp myapp.lua | |
| 85 </pre> | |
| 86 <p> | |
| 87 It's <em>not</em> a stated goal of the bundled profiler to add every | |
| 88 possible option or to cater for special profiling needs. The low-level | |
| 89 profiler APIs are documented below. They may be used by third-party | |
| 90 authors to implement advanced functionality, e.g. IDE integration or | |
| 91 graphical profilers. | |
| 92 </p> | |
| 93 <p> | |
| 94 Note: Sampling works for both interpreted and JIT-compiled code. The | |
| 95 results for JIT-compiled code may sometimes be surprising. LuaJIT | |
| 96 heavily optimizes and inlines Lua code — there's no simple | |
| 97 one-to-one correspondence between source code lines and the sampled | |
| 98 machine code. | |
| 99 </p> | |
| 100 | |
| 101 <h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3> | |
| 102 <p> | |
| 103 The <tt>-jp</tt> command line option starts the high-level profiler. | |
| 104 When the application run by the command line terminates, the profiler | |
| 105 stops and writes the results to <tt>stdout</tt> or to the specified | |
| 106 <tt>output</tt> file. | |
| 107 </p> | |
| 108 <p> | |
| 109 The <tt>options</tt> argument specifies how the profiling is to be | |
| 110 performed: | |
| 111 </p> | |
| 112 <ul> | |
| 113 <li><tt>f</tt> — Stack dump: function name, otherwise module:line. | |
| 114 This is the default mode.</li> | |
| 115 <li><tt>F</tt> — Stack dump: ditto, but dump module:name.</li> | |
| 116 <li><tt>l</tt> — Stack dump: module:line.</li> | |
| 117 <li><tt>&lt;number&gt;</tt> — Stack dump depth (callee ← | |
| 118 caller). Default: 1.</li> | |
| 119 <li><tt>-&lt;number&gt;</tt> — Inverse stack dump depth (caller | |
| 120 → callee).</li> | |
| 121 <li><tt>s</tt> — Split stack dump after first stack level. Implies | |
| 122 depth ≥ 2 or depth ≤ -2.</li> | |
| 123 <li><tt>p</tt> — Show full path for module names.</li> | |
| 124 <li><tt>v</tt> — Show VM states.</li> | |
| 125 <li><tt>z</tt> — Show <a href="#jit_zone">zones</a>.</li> | |
| 126 <li><tt>r</tt> — Show raw sample counts. Default: show percentages.</li> | |
| 127 <li><tt>a</tt> — Annotate excerpts from source code files.</li> | |
| 128 <li><tt>A</tt> — Annotate complete source code files.</li> | |
| 129 <li><tt>G</tt> — Produce raw output suitable for graphical tools.</li> | |
| 130 <li><tt>m&lt;number&gt;</tt> — Minimum sample percentage to be shown. | |
| 131 Default: 3%.</li> | |
| 132 <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds. | |
| 133 Default: 10ms.<br> | |
| 134 Note: The actual sampling precision is OS-dependent.</li> | |
| 135 </ul> | |
| 136 <p> | |
| 137 The default output for <tt>-jp</tt> is a list of the most CPU consuming | |
| 138 spots in the application. Increasing the stack dump depth with (say) | |
| 139 <tt>-jp=2</tt> may help to point out the main callers or callees of | |
| 140 hotspots. But sample aggregation is still flat per unique stack dump. | |
| 141 </p> | |
| 142 <p> | |
| 143 To get a two-level view (split view) of callers/callees, use | |
| 144 <tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second | |
| 145 level are relative to the first level. | |
| 146 </p> | |
| 147 <p> | |
| 148 To see how much time is spent in each line relative to a function, use | |
| 149 <tt>-jp=fl</tt>. | |
| 150 </p> | |
| 151 <p> | |
| 152 To see how much time is spent in different VM states or | |
| 153 <a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>. | |
| 154 </p> | |
| 155 <p> | |
| 156 Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level | |
| 157 views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time | |
| 158 spent in a VM state or zone vs. hotspots. This can be used to answer | |
| 159 questions like "Which time-consuming functions are only interpreted?" or | |
| 160 "What's the garbage collector overhead for a specific function?". | |
| 161 </p> | |
| 162 <p> | |
| 163 Multiple options can be combined — but not all combinations make | |
| 164 sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels | |
| 165 deep at 4ms intervals and shows a split view of the CPU consuming | |
| 166 functions and their callers with a 1% threshold. | |
| 167 </p> | |
| 168 <p> | |
| 169 Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are | |
| 170 always flat and at the line level. Obviously, the source code files need | |
| 171 to be readable by the profiler script. | |
| 172 </p> | |
| 173 <p> | |
| 174 The high-level profiler can also be started and stopped from Lua code with: | |
| 175 </p> | |
| 176 <pre class="code"> | |
| 177 require("jit.p").start(options, output) | |
| 178 ... | |
| 179 require("jit.p").stop() | |
| 180 </pre> | |
| 181 | |
| 182 <h3 id="jit_zone"><tt>jit.zone</tt> — Zones</h3> | |
| 183 <p> | |
| 184 Zones can be used to provide information about different parts of an | |
| 185 application to the high-level profiler. E.g. a game could make use of an | |
| 186 <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical, | |
| 187 organized as a stack. | |
| 188 </p> | |
| 189 <p> | |
| 190 The <tt>jit.zone</tt> module needs to be loaded explicitly: | |
| 191 </p> | |
| 192 <pre class="code"> | |
| 193 local zone = require("jit.zone") | |
| 194 </pre> | |
| 195 <ul> | |
| 196 <li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li> | |
| 197 <li><tt>zone()</tt> pops the current zone from the zone stack and | |
| 198 returns its name.</li> | |
| 199 <li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li> | |
| 200 <li><tt>zone:flush()</tt> flushes the zone stack.</li> | |
| 201 </ul> | |
| 202 <p> | |
| 203 To show the time spent in each zone use <tt>-jp=z</tt>. To show the time | |
| 204 spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>. | |
| 205 </p> | |
| 206 | |
| 207 <h2 id="ll_lua_api">Low-level Lua API</h2> | |
| 208 <p> | |
| 209 The <tt>jit.profile</tt> module gives access to the low-level API of the | |
| 210 profiler from Lua code. This module needs to be loaded explicitly:</p> | |
| 211 <pre class="code"> | |
| 212 local profile = require("jit.profile") | |
| 213 </pre> | |
| 214 <p> | |
| 215 This module can be used to implement your own higher-level profiler. | |
| 216 A typical profiling run starts the profiler, captures stack dumps in | |
| 217 the profiler callback, adds them to a hash table to aggregate the number | |
| 218 of samples, stops the profiler and then analyzes all captured | |
| 219 stack dumps. Other parameters can be sampled in the profiler callback, | |
| 220 too. But it's important not to spend too much time in the callback, | |
| 221 since this may skew the statistics. | |
| 222 </p> | |
| 223 | |
| 224 <h3 id="profile_start"><tt>profile.start(mode, cb)</tt> | |
| 225 — Start profiler</h3> | |
| 226 <p> | |
| 227 This function starts the profiler. The <tt>mode</tt> argument is a | |
| 228 string holding options: | |
| 229 </p> | |
| 230 <ul> | |
| 231 <li><tt>f</tt> — Profile with precision down to the function level.</li> | |
| 232 <li><tt>l</tt> — Profile with precision down to the line level.</li> | |
| 233 <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds (default | |
| 234 10ms).<br> | |
| 235 Note: The actual sampling precision is OS-dependent. | |
| 236 </li> | |
| 237 </ul> | |
| 238 <p> | |
| 239 The <tt>cb</tt> argument is a callback function which is called with | |
| 240 three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is | |
| 241 called on a separate coroutine; the <tt>thread</tt> argument is the | |
| 242 state that holds the stack to sample for profiling. Note: do | |
| 243 <em>not</em> modify the stack of that state or call functions on it. | |
| 244 </p> | |
| 245 <p> | |
| 246 <tt>samples</tt> gives the number of accumulated samples since the last | |
| 247 callback (usually 1). | |
| 248 </p> | |
| 249 <p> | |
| 250 <tt>vmstate</tt> holds the VM state at the time the profiling timer | |
| 251 triggered. This may or may not correspond to the state of the VM when | |
| 252 the profiling callback is called. The state is either <tt>'N'</tt> | |
| 253 native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt> | |
| 254 C code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT | |
| 255 compiler. | |
| 256 </p> | |
| 257 | |
| 258 <h3 id="profile_stop"><tt>profile.stop()</tt> | |
| 259 — Stop profiler</h3> | |
| 260 <p> | |
| 261 This function stops the profiler. | |
| 262 </p> | |
| 263 | |
| 264 <h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt> | |
| 265 — Dump stack </h3> | |
| 266 <p> | |
| 267 This function allows taking stack dumps in an efficient manner. It | |
| 268 returns a string with a stack dump for the <tt>thread</tt> (coroutine), | |
| 269 formatted according to the <tt>fmt</tt> argument: | |
| 270 </p> | |
| 271 <ul> | |
| 272 <li><tt>p</tt> — Preserve the full path for module names. Otherwise, | |
| 273 only the file name is used.</li> | |
| 274 <li><tt>f</tt> — Dump the function name if it can be derived. Otherwise, | |
| 275 use module:line.</li> | |
| 276 <li><tt>F</tt> — Ditto, but dump module:name.</li> | |
| 277 <li><tt>l</tt> — Dump module:line.</li> | |
| 278 <li><tt>Z</tt> — Zap the following characters for the last dumped | |
| 279 frame.</li> | |
| 280 <li>All other characters are added verbatim to the output string.</li> | |
| 281 </ul> | |
| 282 <p> | |
| 283 The <tt>depth</tt> argument gives the number of frames to dump, starting | |
| 284 at the topmost frame of the thread. A negative number dumps the frames in | |
| 285 inverse order. | |
| 286 </p> | |
| 287 <p> | |
| 288 The first example prints a list of the current module names and line | |
| 289 numbers of up to 10 frames in separate lines. The second example prints | |
| 290 semicolon-separated function names for all frames (up to 100) in inverse | |
| 291 order: | |
| 292 </p> | |
| 293 <pre class="code"> | |
| 294 print(profile.dumpstack(thread, "l\n", 10)) | |
| 295 print(profile.dumpstack(thread, "fZ;", -100)) | |
| 296 </pre> | |
| 297 | |
| 298 <h2 id="ll_c_api">Low-level C API</h2> | |
| 299 <p> | |
| 300 The profiler can be controlled directly from C code, e.g. for | |
| 301 use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see | |
| 302 <a href="ext_c_api.html">Lua/C API</a> extensions). | |
| 303 </p> | |
| 304 | |
| 305 <h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt> | |
| 306 — Start profiler</h3> | |
| 307 <p> | |
| 308 This function starts the profiler. <a href="#profile_start">See | |
| 309 above</a> for a description of the <tt>mode</tt> argument. | |
| 310 </p> | |
| 311 <p> | |
| 312 The <tt>cb</tt> argument is a callback function with the following | |
| 313 declaration: | |
| 314 </p> | |
| 315 <pre class="code"> | |
| 316 typedef void (*luaJIT_profile_callback)(void *data, lua_State *L, | |
| 317 int samples, int vmstate); | |
| 318 </pre> | |
| 319 <p> | |
| 320 <tt>data</tt> is available for use by the callback. <tt>L</tt> is the | |
| 321 state that holds the stack to sample for profiling. Note: do | |
| 322 <em>not</em> modify this stack or call functions on this stack — | |
| 323 use a separate coroutine for this purpose. <a href="#profile_start">See | |
| 324 above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>. | |
| 325 </p> | |
| 326 | |
| 327 <h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt> | |
| 328 — Stop profiler</h3> | |
| 329 <p> | |
| 330 This function stops the profiler. | |
| 331 </p> | |
| 332 | |
| 333 <h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt> | |
| 334 — Dump stack </h3> | |
| 335 <p> | |
| 336 This function allows taking stack dumps in an efficient manner. | |
| 337 <a href="#profile_dump">See above</a> for a description of <tt>fmt</tt> | |
| 338 and <tt>depth</tt>. | |
| 339 </p> | |
| 340 <p> | |
| 341 This function returns a <tt>const char *</tt> pointing to a | |
| 342 private string buffer of the profiler. The <tt>int *len</tt> | |
| 343 argument returns the length of the output string. The buffer is | |
| 344 overwritten on the next call and deallocated when the profiler stops. | |
| 345 You either need to consume the content immediately or copy it for later | |
| 346 use. | |
| 347 </p> | |
| 348 <br class="flush"> | |
| 349 </div> | |
| 350 <div id="foot"> | |
| 351 <hr class="hide"> | |
| 352 Copyright © 2005-2023 | |
| 353 <span class="noprint"> | |
| 354 · | |
| 355 <a href="contact.html">Contact</a> | |
| 356 </span> | |
| 357 </div> | |
| 358 </body> | |
| 359 </html> |
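
The "typical profiling run" described in the low-level Lua API section of this file (start the profiler, capture stack dumps in the callback, aggregate them in a hash table, stop, then analyze) could look roughly like the sketch below. It uses only the calls documented above (`profile.start`, `profile.dumpstack`, `profile.stop`). The `workload` function, the table names, the 5ms interval and the depth of 5 are illustrative assumptions, not part of LuaJIT. It needs a LuaJIT build that ships the `jit.profile` module, and a very short run may collect few or no samples.

```lua
-- Minimal sampling-profiler sketch built on jit.profile.
local profile = require("jit.profile")

local counts = {}    -- stack dump string -> accumulated sample count
local vmstates = {}  -- VM state letter ('N', 'I', 'C', 'G', 'J') -> sample count
local total = 0      -- total number of samples seen

-- Profiler callback. Keep it short: time spent here skews the statistics.
-- Do not modify the stack of `thread` or call functions on it.
local function on_sample(thread, samples, vmstate)
  -- "fZ;" with a negative depth: up to 5 frames in inverse order,
  -- function names separated by ';', no trailing ';' after the last frame.
  local stack = profile.dumpstack(thread, "fZ;", -5)
  counts[stack] = (counts[stack] or 0) + samples
  vmstates[vmstate] = (vmstates[vmstate] or 0) + samples
  total = total + samples
end

-- Hypothetical workload, present only to give the profiler something to sample.
local function workload()
  local t = {}
  for i = 1, 5e7 do
    t[i % 1000 + 1] = math.sqrt(i)
  end
  return t
end

profile.start("fi5", on_sample)  -- function-level precision, 5ms interval
workload()
profile.stop()

-- Analysis: order the unique stack dumps by sample count, highest first.
local stacks = {}
for stack in pairs(counts) do
  stacks[#stacks + 1] = stack
end
table.sort(stacks, function(a, b) return counts[a] > counts[b] end)

for _, stack in ipairs(stacks) do
  print(string.format("%5.1f%%  %s", 100 * counts[stack] / total, stack))
end
for state, n in pairs(vmstates) do
  print(string.format("VM state %s: %d samples", state, n))
end
```

Aggregation is flat per unique stack dump string, which mirrors what the file says about the default `-jp` output. Switching the mode to `"l"` and the format to `"l\n"` would give line-level entries instead, at the cost of more distinct keys.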