# CJS Module Lexer

[![Build Status][travis-image]][travis-url]

A [very fast](#benchmarks) JS CommonJS module syntax lexer used to detect the most likely list of named exports of a CommonJS module.

Outputs the list of named exports (`exports.name = ...`) and possible module reexports (`module.exports = require('...')`), including the common transpiler variations of these cases.

Forked from https://github.com/guybedford/es-module-lexer.

_Comprehensively handles the JS language grammar while remaining small and fast: ~90ms per MB of JS cold and ~15ms per MB of JS warm, [see benchmarks](#benchmarks) for more info._

### Usage

```
npm install cjs-module-lexer
```

For use in CommonJS:

```js
const { parse } = require('cjs-module-lexer');

// `init` returns a promise for parity with the ESM API, but you do not have to call it

const { exports, reexports } = parse(`
  // named exports detection
  module.exports.a = 'a';
  (function () {
    exports.b = 'b';
  })();
  Object.defineProperty(exports, 'c', { value: 'c' });
  /* exports.d = 'not detected'; */

  // reexports detection
  if (maybe) module.exports = require('./dep1.js');
  if (another) module.exports = require('./dep2.js');

  // literal exports assignments
  module.exports = { a, b: c, d, 'e': f }

  // __esModule detection
  Object.defineProperty(module.exports, '__esModule', { value: true })
`);

// exports === ['a', 'b', 'c', '__esModule']
// reexports === ['./dep1.js', './dep2.js']
```

When using the ESM version, Wasm is supported instead:

```js
import { parse, init } from 'cjs-module-lexer';
// init needs to be called and waited upon
await init();
const { exports, reexports } = parse(source);
```

The Wasm build is around 1.5x faster and has no cold start.

### Grammar

CommonJS exports matches are run against the source token stream.
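Because matching runs on tokens rather than raw text, export-like text inside comments and string literals is not reported, as the commented-out `exports.d` in the usage example shows. The sketch below is a toy illustration of that idea only. It is not how the library's lexer is implemented, the helper names `blankCommentsAndStrings` and `detectDotAssignExports` are hypothetical, and it ignores template strings, regex literals and most of the grammar.

```js
// Toy sketch (NOT the cjs-module-lexer implementation): blank out comments
// and string literals, then match `exports.name =` / `module.exports.name =`.
function blankCommentsAndStrings(source) {
  let out = '';
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: skip up to (not including) the end of the line.
      while (i < source.length && source[i] !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment: skip to the closing `*/`.
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? source.length : end + 2;
      out += ' ';
    } else if (ch === '"' || ch === "'") {
      // String literal: skip to the matching unescaped quote.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === '\\') i++;
        i++;
      }
      i++;
      out += '""';
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

// Hypothetical helper: collects names from dot assignments only.
function detectDotAssignExports(source) {
  const pattern = /\b(?:module\s*\.\s*)?exports\s*\.\s*([A-Za-z_$][\w$]*)\s*=(?!=)/g;
  const names = new Set();
  for (const match of blankCommentsAndStrings(source).matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names];
}

const names = detectDotAssignExports(`
  module.exports.a = 'a';
  (function () { exports.b = 'b'; })();
  /* exports.c = 'in a comment'; */
  const s = "exports.d = 'in a string'";
  // exports.e = 'in a line comment';
`);
console.log(names); // expected: ['a', 'b']
```

Only `a` and `b` survive, since the other assignments sit inside a comment or a string. The real lexer tracks far more token state than this sketch does.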

The token grammar is:

```
IDENTIFIER: As defined by ECMA-262, without support for identifier `\` escapes, filtered to remove strict reserved words:
            "implements", "interface", "let", "package", "private", "protected", "public", "static", "yield", "enum"

STRING_LITERAL: A `"` or `'` bounded ECMA-262 string literal.

IDENTIFIER_STRING: ( `"` IDENTIFIER `"` | `'` IDENTIFIER `'` )

MODULE_EXPORTS: `module` `.` `exports`

EXPORTS_IDENTIFIER: MODULE_EXPORTS | `exports`

EXPORTS_DOT_ASSIGN: EXPORTS_IDENTIFIER `.` IDENTIFIER `=`

EXPORTS_LITERAL_COMPUTED_ASSIGN: EXPORTS_IDENTIFIER `[` IDENTIFIER_STRING `]` `=`

EXPORTS_LITERAL_PROP: (IDENTIFIER (`:` IDENTIFIER)?) | (IDENTIFIER_STRING `:` IDENTIFIER)

EXPORTS_SPREAD: `...` (IDENTIFIER | REQUIRE)

EXPORTS_MEMBER: EXPORTS_DOT_ASSIGN | EXPORTS_LITERAL_COMPUTED_ASSIGN

EXPORTS_DEFINE: `Object` `.` `defineProperty` `(` EXPORTS_IDENTIFIER `,` IDENTIFIER_STRING `, {`
  (`enumerable: true,`)?
  (
    `value:` |
    `get` (`: function` IDENTIFIER? )? `() {` `return` IDENTIFIER (`.` IDENTIFIER | `[` IDENTIFIER_STRING `]`)? `;`? `}`
  )
  `})`

EXPORTS_LITERAL: MODULE_EXPORTS `=` `{` ((EXPORTS_LITERAL_PROP | EXPORTS_SPREAD) `,`)+ `}`

REQUIRE: `require` `(` STRING_LITERAL `)`

EXPORTS_ASSIGN: (`var` | `const` | `let`) IDENTIFIER `=` REQUIRE

MODULE_EXPORTS_ASSIGN: MODULE_EXPORTS `=` REQUIRE

EXPORT_STAR: (`__export` | `__exportStar`) `(` REQUIRE

EXPORT_STAR_LIB: `Object.keys(` IDENTIFIER$1 `).forEach(function (` IDENTIFIER$2 `) {`
  (
    `if (` IDENTIFIER$2 `===` ( `'default'` | `"default"` ) `||` IDENTIFIER$2 `===` ( `'__esModule'` | `"__esModule"` ) `) return` `;`? |
    `if (` IDENTIFIER$2 `!==` ( `'default'` | `"default"` ) `)`
  )
  (
    `if (` IDENTIFIER$2 `in` EXPORTS_IDENTIFIER `&&` EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] ===` IDENTIFIER$1 `[` IDENTIFIER$2 `]) return` `;`?
  )?
  (
    EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] =` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? |
    `Object.defineProperty(` EXPORTS_IDENTIFIER `, ` IDENTIFIER$2 `, { enumerable: true, get: function () { return ` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? `} })` `;`?
  )
  `})`
```

Spacing between tokens is taken to be any ECMA-262 whitespace, ECMA-262 block comment or ECMA-262 line comment.

* The returned export names are taken to be the combination of the `IDENTIFIER` and `IDENTIFIER_STRING` slots for all `EXPORTS_MEMBER`, `EXPORTS_LITERAL` and `EXPORTS_DEFINE` matches.
* The reexport specifiers are taken to be the combination of:
  1. The `REQUIRE` matches of the last match of either `MODULE_EXPORTS_ASSIGN` or `EXPORTS_LITERAL`.
  2. All _top-level_ `EXPORT_STAR` `REQUIRE` matches and `EXPORTS_ASSIGN` matches whose `IDENTIFIER` also matches the first `IDENTIFIER` in `EXPORT_STAR_LIB`.

### Parsing Examples

#### Named Exports Parsing

The basic matching rules for named exports are `exports.name`, `exports['name']` or `Object.defineProperty(exports, 'name', ...)`.
This matching is done without scope analysis and regardless of the expression position:

```js
// DETECTS EXPORTS: a, b
(function (exports) {
  exports.a = 'a';
  exports['b'] = 'b';
})(exports);
```

Because there is no scope analysis, the above detection may overclassify:

```js
// DETECTS EXPORTS: a, b, c
(function (exports, Object) {
  exports.a = 'a';
  exports['b'] = 'b';
  if (false)
    exports.c = 'c';
})(NOT_EXPORTS, NOT_OBJECT);
```

It will in turn underclassify in cases where the identifiers are renamed:

```js
// DETECTS: NO EXPORTS
(function (e) {
  e.a = 'a';
  e['b'] = 'b';
})(exports);
```

`Object.defineProperty` is detected specifically for value forms and for getter forms returning an identifier or member expression:

```js
// DETECTS: a, b, c, d, __esModule
Object.defineProperty(exports, 'a', {
  enumerable: true,
  get: function () {
    return q.p;
  }
});
Object.defineProperty(exports, 'b', {
  enumerable: true,
  get: function () {
    return q['p'];
  }
});
Object.defineProperty(exports, 'c', {
  enumerable: true,
  get () {
    return b;
  }
});
Object.defineProperty(exports, 'd', { value: 'd' });
Object.defineProperty(exports, '__esModule', { value: true });
```

Alternative object definition structures or getter function bodies are not detected:

```js
// DETECTS: NO EXPORTS
Object.defineProperty(exports, 'a', {
  enumerable: false,
  get () {
    return p;
  }
});
Object.defineProperty(exports, 'b', {
  configurable: true,
  get () {
    return p;
  }
});
Object.defineProperty(exports, 'c', {
  get: () => p
});
Object.defineProperty(exports, 'd', {
  enumerable: true,
  get: function () {
    return dynamic();
  }
});
Object.defineProperty(exports, 'e', {
  enumerable: true,
  get () {
    return 'str';
  }
});
```

`Object.defineProperties` is also not supported.

#### Exports Object Assignment

A best-effort attempt is made to detect `module.exports` object assignments, but because this is not a full parser, arbitrary expressions are not handled in the object parsing process.

Simple object definitions are supported:

```js
// DETECTS EXPORTS: a, b, c
module.exports = {
  a,
  'b': b,
  c: c,
  ...d
};
```

Object properties that are not identifiers or string expressions will bail out of the object detection, while spreads are ignored:

```js
// DETECTS EXPORTS: a, b
module.exports = {
  a,
  ...d,
  b: require('c'),
  c: "not detected since require('c') above bails the object detection"
}
```

`Object.defineProperties` is not currently supported either.

#### module.exports Reexport Assignment

Any `module.exports = require('mod')` assignment is detected as a reexport, but only the last one is returned:

```js
// DETECTS REEXPORTS: c
module.exports = require('a');
(module => module.exports = require('b'))(NOT_MODULE);
if (false) module.exports = require('c');
```

This is to avoid over-classification in Webpack bundles with externals, which include `module.exports = require('external')` in their source for every external dependency.

In an exports object assignment, any spreads of `require()` are detected as separate reexports:

```js
// DETECTS REEXPORTS: a, b
module.exports = require('ignored');
module.exports = {
  ...require('a'),
  ...require('b')
};
```

#### Transpiler Re-exports

For named exports, transpiler output works well with the rules described above.

But for star re-exports, special care is taken to support common patterns of transpiler outputs from Babel and TypeScript as well as bundlers like RollupJS.
These reexport and star reexport patterns are only detected at the top level, as found in the direct output of these tools.

For example, `export * from 'external'` is output by Babel as:

```js
"use strict";

exports.__esModule = true;

var _external = require("external");

Object.keys(_external).forEach(function (key) {
  if (key === "default" || key === "__esModule") return;
  exports[key] = _external[key];
});
```

Here the `var _external = require("external")` statement is specifically detected, as is the `Object.keys(_external)` statement, down to the exact form of that entire expression, including minor variations of the output. The `_external` and `key` identifiers are carefully matched in this detection.

Similarly for TypeScript, `export * from 'external'` is output as:

```js
"use strict";
function __export(m) {
    for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p];
}
Object.defineProperty(exports, "__esModule", { value: true });
__export(require("external"));
```

Here the `__export(require("external"))` statement is explicitly detected as a reexport, including the variations `tslib.__export` and `__exportStar`.

### Environment Support

Node.js 10+, and [all browsers with WebAssembly support](https://caniuse.com/#feat=wasm).

### JS Grammar Support

* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
* Always correctly parses valid JS source, but may parse invalid JS source without errors.

### Benchmarks

Benchmarks can be run with `npm run bench`.

Current results:

JS Build:

```
Module load time
> 5ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 323ms

Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 14.84ms
test/samples/angular.min.js (303 KiB)
> 4.8ms
test/samples/d3.js (553 KiB)
> 7.84ms
test/samples/d3.min.js (250 KiB)
> 4ms
test/samples/magic-string.js (34 KiB)
> 0.72ms
test/samples/magic-string.min.js (20 KiB)
> 0.4ms
test/samples/rollup.js (698 KiB)
> 9.32ms
test/samples/rollup.min.js (367 KiB)
> 6.52ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 44ms
```

Wasm Build:

```
Module load time
> 11ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 42ms

Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 9.92ms
test/samples/angular.min.js (303 KiB)
> 3.2ms
test/samples/d3.js (553 KiB)
> 5.2ms
test/samples/d3.min.js (250 KiB)
> 2.52ms
test/samples/magic-string.js (34 KiB)
> 0.16ms
test/samples/magic-string.min.js (20 KiB)
> 0.04ms
test/samples/rollup.js (698 KiB)
> 6.44ms
test/samples/rollup.min.js (367 KiB)
> 3.96ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 30.48ms
```

### Wasm Build Steps

To build, download the WASI SDK from https://github.com/WebAssembly/wasi-sdk/releases.

The Makefile assumes the existence of "wasi-sdk-11.0" and "wabt" (optional) as sibling folders to this project.

The build through the Makefile is then run via `make lib/lexer.wasm`, which can also be triggered via `npm run build-wasm` to create `dist/lexer.js`.

On Windows it may be preferable to use the Linux subsystem.

After the WebAssembly build, the CJS build can be triggered via `npm run build`.

Optimization passes are run with [Binaryen](https://github.com/WebAssembly/binaryen) prior to publishing to reduce the WebAssembly footprint.

### License

MIT

[travis-url]: https://travis-ci.org/guybedford/es-module-lexer
[travis-image]: https://travis-ci.org/guybedford/es-module-lexer.svg?branch=master