comparison third_party/bun/node_modules/js-tokens/README.md @ 12:de54585a40f1
Adding bun and node modules.
| author | June Park <parkjune1995@gmail.com> |
|---|---|
| date | Thu, 02 Oct 2025 14:39:48 -0700 |
| parents | |
| children | |
| 11:f33d9ff8b6e8 | 12:de54585a40f1 |
|---|---|
| 1 Overview | |
| 2 ======== | |
| 3 | |
| 4 A regex that tokenizes JavaScript. | |
| 5 | |
| 6 ```js | |
| 7 var jsTokens = require("js-tokens").default | |
| 8 | |
| 9 var jsString = "var foo=opts.foo;\n..." | |
| 10 | |
| 11 jsString.match(jsTokens) | |
| 12 // ["var", " ", "foo", "=", "opts", ".", "foo", ";", "\n", ...] | |
| 13 ``` | |
| 14 | |
| 15 | |
| 16 Installation | |
| 17 ============ | |
| 18 | |
| 19 `npm install js-tokens` | |
| 20 | |
| 21 ```js | |
| 22 import jsTokens from "js-tokens" | |
| 23 // or: | |
| 24 var jsTokens = require("js-tokens").default | |
| 25 ``` | |
| 26 | |
| 27 | |
| 28 Usage | |
| 29 ===== | |
| 30 | |
| 31 ### `jsTokens` ### | |
| 32 | |
| 33 A regex with the `g` flag that matches JavaScript tokens. | |
| 34 | |
| 35 The regex _always_ matches, even invalid JavaScript and the empty string. | |
| 36 | |
| 37 The next match is always directly after the previous. | |
| 38 | |
| 39 ### `var token = matchToToken(match)` ### | |
| 40 | |
| 41 ```js | |
| 42 import {matchToToken} from "js-tokens" | |
| 43 // or: | |
| 44 var matchToToken = require("js-tokens").matchToToken | |
| 45 ``` | |
| 46 | |
| 47 Takes a `match` returned by `jsTokens.exec(string)`, and returns a `{type: | |
| 48 String, value: String}` object. The following types are available: | |
| 49 | |
| 50 - string | |
| 51 - comment | |
| 52 - regex | |
| 53 - number | |
| 54 - name | |
| 55 - punctuator | |
| 56 - whitespace | |
| 57 - invalid | |
| 58 | |
| 59 Multi-line comments and strings also have a `closed` property indicating if the | |
| 60 token was closed or not (see below). | |
| 61 | |
| 62 Comments and strings both come in several flavors. To distinguish them, check if | |
| 63 the token starts with `//`, `/*`, `'`, `"` or `` ` ``. | |
| 64 | |
| 65 Names are ECMAScript IdentifierNames, that is, including both identifiers and | |
| 66 keywords. You may use [is-keyword-js] to tell them apart. | |
| 67 | |
| 68 Whitespace includes both line terminators and other whitespace. | |
| 69 | |
| 70 [is-keyword-js]: https://github.com/crissdev/is-keyword-js | |
| 71 | |
| 72 | |
| 73 ECMAScript support | |
| 74 ================== | |
| 75 | |
| 76 The intention is to always support the latest ECMAScript version whose feature | |
| 77 set has been finalized. | |
| 78 | |
| 79 If adding support for a newer version requires changes, a new version with a | |
| 80 major version bump will be released. | |
| 81 | |
| 82 Currently, ECMAScript 2018 is supported. | |
| 83 | |
| 84 | |
| 85 Invalid code handling | |
| 86 ===================== | |
| 87 | |
| 88 Unterminated strings are still matched as strings. JavaScript strings cannot | |
| 89 contain (unescaped) newlines, so unterminated strings simply end at the end of | |
| 90 the line. Unterminated template strings can contain unescaped newlines, though, | |
| 91 so they go on to the end of input. | |
| 92 | |
| 93 Unterminated multi-line comments are also still matched as comments. They | |
| 94 simply go on to the end of the input. | |
| 95 | |
| 96 Unterminated regex literals are likely matched as division and whatever is | |
| 97 inside the regex. | |
| 98 | |
| 99 Invalid ASCII characters have their own capturing group. | |
| 100 | |
| 101 Invalid non-ASCII characters are treated as names, to simplify the matching of | |
| 102 names (except unicode spaces which are treated as whitespace). Note: See also | |
| 103 the [ES2018](#es2018) section. | |
| 104 | |
| 105 Regex literals may contain invalid regex syntax. They are still matched as | |
| 106 regex literals. They may also contain repeated regex flags, to keep the regex | |
| 107 simple. | |
| 108 | |
| 109 Strings may contain invalid escape sequences. | |
| 110 | |
| 111 | |
| 112 Limitations | |
| 113 =========== | |
| 114 | |
| 115 Tokenizing JavaScript using regexes—in fact, _one single regex_—won’t be | |
| 116 perfect. But that’s not the point either. | |
| 117 | |
| 118 You may compare jsTokens with [esprima] by using `esprima-compare.js`. | |
| 119 See `npm run esprima-compare`! | |
| 120 | |
| 121 [esprima]: http://esprima.org/ | |
| 122 | |
| 123 ### Template string interpolation ### | |
| 124 | |
| 125 Template strings are matched as single tokens, from the starting `` ` `` to the | |
| 126 ending `` ` ``, including interpolations (whose tokens are not matched | |
| 127 individually). | |
| 128 | |
| 129 Matching template string interpolations requires recursive balancing of `{` and | |
| 130 `}`—something that JavaScript regexes cannot do. Only one level of nesting is | |
| 131 supported. | |
| 132 | |
| 133 ### Division and regex literals collision ### | |
| 134 | |
| 135 Consider this example: | |
| 136 | |
| 137 ```js | |
| 138 var g = 9.82 | |
| 139 var number = bar / 2/g | |
| 140 | |
| 141 var regex = / 2/g | |
| 142 ``` | |
| 143 | |
| 144 A human can easily understand that in the `number` line we’re dealing with | |
| 145 division, and in the `regex` line we’re dealing with a regex literal. How come? | |
| 146 Because humans can look at the whole code to put the `/` characters in context. | |
| 147 A JavaScript regex cannot. It only sees forwards. (Well, ES2018 regexes can also | |
| 148 look backwards. See the [ES2018](#es2018) section). | |
| 149 | |
| 150 When the `jsTokens` regex scans through the above, it will see the following | |
| 151 at the end of both the `number` and `regex` rows: | |
| 152 | |
| 153 ```js | |
| 154 / 2/g | |
| 155 ``` | |
| 156 | |
| 157 It is then impossible to know if that is a regex literal, or part of an | |
| 158 expression dealing with division. | |
| 159 | |
| 160 Here is a similar case: | |
| 161 | |
| 162 ```js | |
| 163 foo /= 2/g | |
| 164 foo(/= 2/g) | |
| 165 ``` | |
| 166 | |
| 167 The first line divides the `foo` variable by `2/g`. The second line calls the | |
| 168 `foo` function with the regex literal `/= 2/g`. Again, since `jsTokens` only | |
| 169 sees forwards, it cannot tell the two cases apart. | |
| 170 | |
| 171 There are some cases where we _can_ tell division and regex literals apart, | |
| 172 though. | |
| 173 | |
| 174 First off, we have the simple cases where there’s only one slash in the line: | |
| 175 | |
| 176 ```js | |
| 177 var foo = 2/g | |
| 178 foo /= 2 | |
| 179 ``` | |
| 180 | |
| 181 Regex literals cannot contain newlines, so the above cases are correctly | |
| 182 identified as division. Things are only problematic when there is more than | |
| 183 one non-comment slash in a single line. | |
| 184 | |
| 185 Secondly, not every character is a valid regex flag. | |
| 186 | |
| 187 ```js | |
| 188 var number = bar / 2/e | |
| 189 ``` | |
| 190 | |
| 191 The above example is also correctly identified as division, because `e` is not a | |
| 192 valid regex flag. I initially wanted to future-proof by allowing `[a-zA-Z]*` | |
| 193 (any letter) as flags, but it is not worth it since it increases the amount of | |
| 194 ambiguous cases. So only the standard `g`, `m`, `i`, `y`, `u` and `s` flags are | |
| 195 allowed. This means that the above example will be identified as division as | |
| 196 long as you don’t rename the `e` variable to some permutation of `gmiyus` 1 to 6 | |
| 197 characters long. | |
| 198 | |
| 199 Lastly, we can look _forward_ for information. | |
| 200 | |
| 201 - If the token following what looks like a regex literal is not valid after a | |
| 202 regex literal, but is valid in a division expression, then the regex literal | |
| 203 is treated as division instead. For example, a flagless regex cannot be | |
| 204 followed by a string, number or name, but all of those three can be the | |
| 205 denominator of a division. | |
| 206 - Generally, if what looks like a regex literal is followed by an operator, the | |
| 207 regex literal is treated as division instead. This is because regexes are | |
| 208 seldom used with operators (such as `+`, `*`, `&&` and `==`), but division | |
| 209 could likely be part of such an expression. | |
| 210 | |
| 211 Please consult the regex source and the test cases for precise information on | |
| 212 when regex or division is matched (should you need to know). In short, you | |
| 213 could sum it up as: | |
| 214 | |
| 215 If the end of a statement looks like a regex literal (even if it isn’t), it | |
| 216 will be treated as one. Otherwise it should work as expected (if you write sane | |
| 217 code). | |
| 218 | |
| 219 ### ES2018 ### | |
| 220 | |
| 221 ES2018 added some nice regex improvements to the language. | |
| 222 | |
| 223 - [Unicode property escapes] should allow telling names and invalid non-ASCII | |
| 224 characters apart without blowing up the regex size. | |
| 225 - [Lookbehind assertions] should allow telling division and regex | |
| 226 literals apart in more cases. | |
| 227 - [Named capture groups] might simplify some things. | |
| 228 | |
| 229 These things would be nice to do, but are not critical. They probably have to | |
| 230 wait until the oldest maintained Node.js LTS release supports those features. | |
| 231 | |
| 232 [Unicode property escapes]: http://2ality.com/2017/07/regexp-unicode-property-escapes.html | |
| 233 [Lookbehind assertions]: http://2ality.com/2017/05/regexp-lookbehind-assertions.html | |
| 234 [Named capture groups]: http://2ality.com/2017/05/regexp-named-capture-groups.html | |
| 235 | |
| 236 | |
| 237 License | |
| 238 ======= | |
| 239 | |
| 240 [MIT](LICENSE). |
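
The deleted README describes a global regex whose matches follow directly after one another, plus a `matchToToken` helper that turns a match into a `{type, value}` object. The sketch below illustrates that idea with a deliberately tiny stand-in. `toyTokens`, `toyMatchToToken` and `toyTokenize` are hypothetical names invented for this illustration, and the regex is not the js-tokens regex: it only knows whitespace, simple numbers, ASCII names and single-character punctuators, and, unlike the real regex, it does not match the empty string.

```js
// Toy stand-in, NOT the js-tokens implementation.
// Each alternative has its own capturing group. The final `[\s\S]`
// alternative matches any single character, so some alternative matches at
// every position and each match starts where the previous one ended.
const toyTokens = /(\s+)|(\d+(?:\.\d+)?)|([A-Za-z_$][\w$]*)|([\s\S])/g

// Hypothetical helper in the spirit of matchToToken: the token type is
// decided by which capturing group took part in the match.
function toyMatchToToken(match) {
  const token = {type: "invalid", value: match[0]}
  if (match[1] !== undefined) token.type = "whitespace"
  else if (match[2] !== undefined) token.type = "number"
  else if (match[3] !== undefined) token.type = "name"
  else if (match[4] !== undefined) token.type = "punctuator"
  return token
}

// Runs the regex over the whole input with exec(), which advances
// lastIndex after every match because of the `g` flag.
function toyTokenize(code) {
  toyTokens.lastIndex = 0
  const tokens = []
  let match
  while ((match = toyTokens.exec(code)) !== null) {
    tokens.push(toyMatchToToken(match))
  }
  return tokens
}

const tokens = toyTokenize("var g = 9.82")
console.log(tokens.map(token => token.type).join(" "))
// name whitespace name whitespace punctuator whitespace number
```

Because every character is consumed by exactly one match, joining the `value` fields reproduces the input. The toy has no notion of strings, comments or regex literals, so it sidesteps the division versus regex literal ambiguity that the Limitations section discusses.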