# CJS Module Lexer

[![Build Status][travis-image]][travis-url]

A [very fast](#benchmarks) JS CommonJS module syntax lexer used to detect the most likely list of named exports of a CommonJS module.

Outputs the list of named exports (`exports.name = ...`) and possible module reexports (`module.exports = require('...')`), including the common transpiler variations of these cases.

Forked from https://github.com/guybedford/es-module-lexer.

_Comprehensively handles the JS language grammar while remaining small and fast: ~90ms per MB of JS cold and ~15ms per MB of JS warm, [see benchmarks](#benchmarks) for more info._

### Usage

```
npm install cjs-module-lexer
```

For use in CommonJS:

```js
const { parse } = require('cjs-module-lexer');

// `init` returns a promise for parity with the ESM API, but you do not have to call it

const { exports, reexports } = parse(`
  // named exports detection
  module.exports.a = 'a';
  (function () {
    exports.b = 'b';
  })();
  Object.defineProperty(exports, 'c', { value: 'c' });
  /* exports.d = 'not detected'; */

  // reexports detection
  if (maybe) module.exports = require('./dep1.js');
  if (another) module.exports = require('./dep2.js');

  // literal exports assignments
  module.exports = { a, b: c, d, 'e': f }

  // __esModule detection
  Object.defineProperty(module.exports, '__esModule', { value: true })
`);

// exports === ['a', 'b', 'c', '__esModule']
// reexports === ['./dep1.js', './dep2.js']
```

When using the ESM version, Wasm is supported instead:

```js
import { parse, init } from 'cjs-module-lexer';
// init needs to be called and waited upon
await init();
const { exports, reexports } = parse(source);
```

The Wasm build is around 1.5x faster and has no cold start.

### Grammar

CommonJS exports matches are run against the source token stream.

The token grammar is:

```
IDENTIFIER: As defined by ECMA-262, without support for identifier `\` escapes, filtered to remove strict reserved words:
            "implements", "interface", "let", "package", "private", "protected", "public", "static", "yield", "enum"

STRING_LITERAL: A `"` or `'` bounded ECMA-262 string literal.

IDENTIFIER_STRING: ( `"` IDENTIFIER `"` | `'` IDENTIFIER `'` )

MODULE_EXPORTS: `module` `.` `exports`

EXPORTS_IDENTIFIER: MODULE_EXPORTS | `exports`

EXPORTS_DOT_ASSIGN: EXPORTS_IDENTIFIER `.` IDENTIFIER `=`

EXPORTS_LITERAL_COMPUTED_ASSIGN: EXPORTS_IDENTIFIER `[` IDENTIFIER_STRING `]` `=`

EXPORTS_LITERAL_PROP: (IDENTIFIER (`:` IDENTIFIER)?) | (IDENTIFIER_STRING `:` IDENTIFIER)

EXPORTS_SPREAD: `...` (IDENTIFIER | REQUIRE)

EXPORTS_MEMBER: EXPORTS_DOT_ASSIGN | EXPORTS_LITERAL_COMPUTED_ASSIGN

EXPORTS_DEFINE: `Object` `.` `defineProperty` `(` EXPORTS_IDENTIFIER `,` IDENTIFIER_STRING `, {`
  (`enumerable: true,`)?
  (
    `value:` |
    `get` (`: function` IDENTIFIER? )? `() {` `return` IDENTIFIER (`.` IDENTIFIER | `[` IDENTIFIER_STRING `]`)? `;`? `}`
  )
  `})`

EXPORTS_LITERAL: MODULE_EXPORTS `=` `{` ((EXPORTS_LITERAL_PROP | EXPORTS_SPREAD) `,`)+ `}`

REQUIRE: `require` `(` STRING_LITERAL `)`

EXPORTS_ASSIGN: (`var` | `const` | `let`) IDENTIFIER `=` REQUIRE

MODULE_EXPORTS_ASSIGN: MODULE_EXPORTS `=` REQUIRE

EXPORT_STAR: (`__export` | `__exportStar`) `(` REQUIRE

EXPORT_STAR_LIB: `Object.keys(` IDENTIFIER$1 `).forEach(function (` IDENTIFIER$2 `) {`
  (
    `if (` IDENTIFIER$2 `===` ( `'default'` | `"default"` ) `||` IDENTIFIER$2 `===` ( `'__esModule'` | `"__esModule"` ) `) return` `;`? |
    `if (` IDENTIFIER$2 `!==` ( `'default'` | `"default"` ) `)`
  )
  (
    `if (` IDENTIFIER$2 `in` EXPORTS_IDENTIFIER `&&` EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] ===` IDENTIFIER$1 `[` IDENTIFIER$2 `]) return` `;`?
  )?
  (
    EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] =` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? |
    `Object.defineProperty(` EXPORTS_IDENTIFIER `, ` IDENTIFIER$2 `, { enumerable: true, get: function () { return ` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? `} })` `;`?
  )
  `})`
```

Spacing between tokens is taken to be any ECMA-262 whitespace, ECMA-262 block comment or ECMA-262 line comment.
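As a rough illustration of how token spacing interacts with a rule such as `EXPORTS_DOT_ASSIGN`, the sketch below approximates that rule with a regular expression. It is not how the lexer is implemented: the names `SPACING`, `IDENT`, `EXPORTS_ID` and `findDotAssignExports` are hypothetical, identifiers are assumed to be ASCII only, and unlike a real token stream a regex cannot tell that a match sits inside a string or a comment.

```js
// Hypothetical approximation of EXPORTS_DOT_ASSIGN; not the lexer's implementation.
// Spacing between tokens: whitespace, block comments or line comments.
const SPACING = String.raw`(?:\s|/\*[\s\S]*?\*/|//[^\n\r\u2028\u2029]*)*`;
// Simplified IDENTIFIER: ASCII only, reserved words are not filtered out.
const IDENT = String.raw`[A-Za-z_$][\w$]*`;
// EXPORTS_IDENTIFIER: `module` `.` `exports`, or a bare `exports`.
const EXPORTS_ID = String.raw`(?:module${SPACING}\.${SPACING}exports|exports)`;

const DOT_ASSIGN = new RegExp(
  // The lookbehind rejects `foo.exports` and `myexports`; `=(?!=)` rejects `==`.
  String.raw`(?<![\w$.])${EXPORTS_ID}${SPACING}\.${SPACING}(${IDENT})${SPACING}=(?!=)`,
  'g'
);

function findDotAssignExports(source) {
  return Array.from(source.matchAll(DOT_ASSIGN), (match) => match[1]);
}

// Comments and newlines between the tokens do not prevent a match.
console.log(
  findDotAssignExports('exports /* note */ . a // trailing\n = 1; module.exports.b = 2;')
);
```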

* The returned export names are taken to be the combination of the `IDENTIFIER` and `IDENTIFIER_STRING` slots for all `EXPORTS_MEMBER`, `EXPORTS_LITERAL` and `EXPORTS_DEFINE` matches.
* The reexport specifiers are taken to be the combination of:
  1. The `REQUIRE` matches of the last match of either `MODULE_EXPORTS_ASSIGN` or `EXPORTS_LITERAL`.
  2. All _top-level_ `EXPORT_STAR` `REQUIRE` matches and `EXPORTS_ASSIGN` matches whose `IDENTIFIER` also matches the first `IDENTIFIER` in `EXPORT_STAR_LIB`.

### Parsing Examples

#### Named Exports Parsing

The basic matching rules for named exports are `exports.name`, `exports['name']` or `Object.defineProperty(exports, 'name', ...)`. This matching is done without scope analysis and regardless of the expression position:

```js
// DETECTS EXPORTS: a, b
(function (exports) {
  exports.a = 'a';
  exports['b'] = 'b';
})(exports);
```

Because there is no scope analysis, the above detection may overclassify:

```js
// DETECTS EXPORTS: a, b, c
(function (exports, Object) {
  exports.a = 'a';
  exports['b'] = 'b';
  if (false)
    exports.c = 'c';
})(NOT_EXPORTS, NOT_OBJECT);
```

It will in turn underclassify in cases where the identifiers are renamed:

```js
// DETECTS: NO EXPORTS
(function (e) {
  e.a = 'a';
  e['b'] = 'b';
})(exports);
```

`Object.defineProperty` is detected specifically for the value form and for getter forms that return an identifier or member expression:

```js
// DETECTS: a, b, c, d, __esModule
Object.defineProperty(exports, 'a', {
  enumerable: true,
  get: function () {
    return q.p;
  }
});
Object.defineProperty(exports, 'b', {
  enumerable: true,
  get: function () {
    return q['p'];
  }
});
Object.defineProperty(exports, 'c', {
  enumerable: true,
  get () {
    return b;
  }
});
Object.defineProperty(exports, 'd', { value: 'd' });
Object.defineProperty(exports, '__esModule', { value: true });
```

Alternative object definition structures or getter function bodies are not detected:

```js
// DETECTS: NO EXPORTS
Object.defineProperty(exports, 'a', {
  enumerable: false,
  get () {
    return p;
  }
});
Object.defineProperty(exports, 'b', {
  configurable: true,
  get () {
    return p;
  }
});
Object.defineProperty(exports, 'c', {
  get: () => p
});
Object.defineProperty(exports, 'd', {
  enumerable: true,
  get: function () {
    return dynamic();
  }
});
Object.defineProperty(exports, 'e', {
  enumerable: true,
  get () {
    return 'str';
  }
});
```

`Object.defineProperties` is also not supported.

#### Exports Object Assignment

A best-effort attempt is made to detect `module.exports` object assignments, but because this is not a full parser, arbitrary expressions are not handled in the object parsing process.

Simple object definitions are supported:

```js
// DETECTS EXPORTS: a, b, c
module.exports = {
  a,
  'b': b,
  c: c,
  ...d
};
```

Object properties that are not identifiers or string expressions will bail out of the object detection, while spreads are ignored:

```js
// DETECTS EXPORTS: a, b
module.exports = {
  a,
  ...d,
  b: require('c'),
  c: "not detected since require('c') above bails the object detection"
}
```
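The bail-out behaviour in the two examples above can be sketched as a small loop over the object literal's properties. This is only an approximation written for this README, not the lexer's implementation: `detectLiteralExports` and its `eat` helper are hypothetical names, identifiers are assumed to be ASCII only, and comments inside the literal are not handled. A property name is recorded as soon as its key is seen, and detection stops at the first value that is not a plain identifier.

```js
// Hypothetical sketch of the bail-out behaviour; not the lexer's implementation.
function detectLiteralExports(source) {
  const open = /module\s*\.\s*exports\s*=\s*\{/.exec(source);
  if (!open) return [];
  let rest = source.slice(open.index + open[0].length);
  const names = [];

  // Consume `pattern` from the front of `rest`, returning the match or null.
  const eat = (pattern) => {
    const match = pattern.exec(rest);
    if (match) rest = rest.slice(match[0].length);
    return match;
  };

  while (true) {
    // Spreads of a require() call or an identifier contribute no names.
    const spread = eat(
      /^\s*\.\.\.\s*(?:require\s*\(\s*(?:'[^'\n]*'|"[^"\n]*")\s*\)|[A-Za-z_$][\w$]*)/
    );
    if (!spread) {
      const key = eat(/^\s*(?:([A-Za-z_$][\w$]*)|'([A-Za-z_$][\w$]*)'|"([A-Za-z_$][\w$]*)")/);
      if (!key) break;
      names.push(key[1] || key[2] || key[3]);
      // An explicit value must be a plain identifier, otherwise bail out.
      if (eat(/^\s*:/) && !eat(/^\s*[A-Za-z_$][\w$]*(?=\s*[,}])/)) break;
    }
    // Anything other than a comma ends the detection.
    if (!eat(/^\s*,/)) break;
  }
  return names;
}

console.log(detectLiteralExports("module.exports = { a, ...d, b: require('c'), c: 'x' }"));
```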

`Object.defineProperties` is not currently supported either.

#### module.exports reexport assignment

Any `module.exports = require('mod')` assignment is detected as a reexport, but only the last one is returned:

```js
// DETECTS REEXPORTS: c
module.exports = require('a');
(module => module.exports = require('b'))(NOT_MODULE);
if (false) module.exports = require('c');
```

This is to avoid over-classification in Webpack bundles with externals which include `module.exports = require('external')` in their source for every external dependency.
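The "last one wins" rule described above can be sketched in a few lines. The function name `lastModuleExportsReexport` is hypothetical and the regular expression is a simplification (no comments between tokens, no escapes inside the string literal); it only illustrates that each later match replaces the earlier one rather than being added to a list.

```js
// Hypothetical sketch: keep only the last `module.exports = require(...)` match.
function lastModuleExportsReexport(source) {
  const assignment =
    /(?<![\w$.])module\s*\.\s*exports\s*=\s*require\s*\(\s*(?:'([^'\n]*)'|"([^"\n]*)")\s*\)/g;
  let last = null;
  for (const match of source.matchAll(assignment)) {
    // Each later match replaces the earlier one instead of accumulating.
    last = match[1] !== undefined ? match[1] : match[2];
  }
  return last === null ? [] : [last];
}

const source = [
  "module.exports = require('a');",
  "(module => module.exports = require('b'))(NOT_MODULE);",
  "if (false) module.exports = require('c');"
].join('\n');

console.log(lastModuleExportsReexport(source));
```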

In an exports object assignment, any spreads of `require()` are detected as separate reexports:

```js
// DETECTS REEXPORTS: a, b
module.exports = require('ignored');
module.exports = {
  ...require('a'),
  ...require('b')
};
```
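Since reexports are returned as specifiers rather than as names, a consumer that wants the full list of names has to analyse the reexported modules as well. The sketch below shows one way this might be done. It is an assumption about usage, not part of this package: `collectExportNames` and the `readSource` and `resolve` hooks are hypothetical, and `parse` is passed in so that the example runs with a stand-in instead of the real lexer.

```js
// Hypothetical helper: `parse`, `readSource` and `resolve` are supplied by the caller.
function collectExportNames(entry, hooks, seen = new Set()) {
  const names = new Set();
  if (seen.has(entry)) return names; // guard against reexport cycles
  seen.add(entry);

  const { exports, reexports } = hooks.parse(hooks.readSource(entry));
  for (const name of exports) names.add(name);

  for (const specifier of reexports) {
    const target = hooks.resolve(specifier, entry);
    for (const name of collectExportNames(target, hooks, seen)) names.add(name);
  }
  return names;
}

// Stand-ins so the sketch runs without the lexer or a file system:
// each "source" is just a key into a table of precomputed parse results.
const results = {
  '/main.js': { exports: ['a'], reexports: ['./dep.js'] },
  '/dep.js': { exports: ['b', 'c'], reexports: ['./main.js'] }
};
const hooks = {
  readSource: (path) => path,
  parse: (source) => results[source],
  resolve: (specifier) => '/' + specifier.replace(/^\.\//, '')
};

console.log([...collectExportNames('/main.js', hooks)]);
```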

#### Transpiler Re-exports

For named exports, transpiler output works well with the rules described above.

But for star re-exports, special care is taken to support common patterns of transpiler outputs from Babel and TypeScript as well as bundlers like RollupJS.
These reexport and star reexport patterns are restricted to only be detected at the top-level as provided by the direct output of these tools.

For example, `export * from 'external'` is output by Babel as:

```js
"use strict";

exports.__esModule = true;

var _external = require("external");

Object.keys(_external).forEach(function (key) {
  if (key === "default" || key === "__esModule") return;
  exports[key] = _external[key];
});
```

Here both the `var _external = require("external")` statement and the `Object.keys(_external)` statement are specifically detected, down to the exact form of the entire expression, including minor variations of the output. The `_external` and `key` identifiers are carefully matched in this detection.
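The identifier matching between the two statements can be sketched as follows. This is a simplified illustration under stated assumptions, not the lexer's implementation: `findBabelStarReexports` is a hypothetical name, identifiers are assumed to be ASCII only, and the sketch checks only the head of the `forEach` call, whereas the lexer also matches the loop body and restricts detection to the top level.

```js
// Hypothetical sketch of the identifier matching; the real lexer also checks the loop body.
const IDENT = String.raw`[A-Za-z_$][\w$]*`;
// `var x = require('spec')`, with `const` and `let` also accepted.
const REQUIRE_ASSIGN = new RegExp(
  String.raw`(?<![\w$.])(?:var|const|let)\s+(${IDENT})\s*=\s*require\s*\(\s*(?:'([^'\n]*)'|"([^"\n]*)")\s*\)`,
  'g'
);
// `Object.keys(x).forEach(function (key) {`
const KEYS_LOOP = new RegExp(
  String.raw`Object\s*\.\s*keys\s*\(\s*(${IDENT})\s*\)\s*\.\s*forEach\s*\(\s*function\s*\(\s*(${IDENT})\s*\)\s*\{`,
  'g'
);

function findBabelStarReexports(source) {
  // Remember which identifier was bound to which require() specifier.
  const bindings = new Map();
  for (const [, name, single, double] of source.matchAll(REQUIRE_ASSIGN)) {
    bindings.set(name, single !== undefined ? single : double);
  }
  // A binding only counts as a reexport if Object.keys() iterates the same identifier.
  const reexports = new Set();
  for (const [, name] of source.matchAll(KEYS_LOOP)) {
    if (bindings.has(name)) reexports.add(bindings.get(name));
  }
  return [...reexports];
}

const babelOutput = [
  'var _external = require("external");',
  'var _other = require("other");',
  'Object.keys(_external).forEach(function (key) {',
  '  if (key === "default" || key === "__esModule") return;',
  '  exports[key] = _external[key];',
  '});'
].join('\n');

console.log(findBabelStarReexports(babelOutput));
```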

Similarly for TypeScript, `export * from 'external'` is output as:

```js
"use strict";
function __export(m) {
    for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p];
}
Object.defineProperty(exports, "__esModule", { value: true });
__export(require("external"));
```

Where the `__export(require("external"))` statement is explicitly detected as a reexport, including variations `tslib.__export` and `__exportStar`.

### Environment Support

Node.js 10+, and [all browsers with WebAssembly support](https://caniuse.com/#feat=wasm).

### JS Grammar Support

* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
* Always correctly parses valid JS source, but may parse invalid JS source without errors.

### Benchmarks

Benchmarks can be run with `npm run bench`.

Current results:

JS Build:

```
Module load time
> 5ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 323ms

Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 14.84ms
test/samples/angular.min.js (303 KiB)
> 4.8ms
test/samples/d3.js (553 KiB)
> 7.84ms
test/samples/d3.min.js (250 KiB)
> 4ms
test/samples/magic-string.js (34 KiB)
> 0.72ms
test/samples/magic-string.min.js (20 KiB)
> 0.4ms
test/samples/rollup.js (698 KiB)
> 9.32ms
test/samples/rollup.min.js (367 KiB)
> 6.52ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 44ms
```

Wasm Build:

```
Module load time
> 11ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 42ms

Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 9.92ms
test/samples/angular.min.js (303 KiB)
> 3.2ms
test/samples/d3.js (553 KiB)
> 5.2ms
test/samples/d3.min.js (250 KiB)
> 2.52ms
test/samples/magic-string.js (34 KiB)
> 0.16ms
test/samples/magic-string.min.js (20 KiB)
> 0.04ms
test/samples/rollup.js (698 KiB)
> 6.44ms
test/samples/rollup.min.js (367 KiB)
> 3.96ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 30.48ms
```

### Wasm Build Steps

To build, download the WASI SDK from https://github.com/WebAssembly/wasi-sdk/releases.

The Makefile assumes the existence of "wasi-sdk-11.0" and "wabt" (optional) as sibling folders to this project.

The build through the Makefile is then run via `make lib/lexer.wasm`, which can also be triggered via `npm run build-wasm` to create `dist/lexer.js`.

On Windows it may be preferable to use the Linux subsystem.

After the WebAssembly build, the CJS build can be triggered via `npm run build`.

Optimization passes are run with [Binaryen](https://github.com/WebAssembly/binaryen) prior to publishing to reduce the WebAssembly footprint.

### License

MIT

[travis-url]: https://travis-ci.org/guybedford/es-module-lexer
[travis-image]: https://travis-ci.org/guybedford/es-module-lexer.svg?branch=master