# sax js

A sax-style parser for XML and HTML.

Designed with [node](http://nodejs.org/) in mind, but should work fine in
the browser or other CommonJS implementations.

## What This Is

* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
  docs.

## What This Is (probably) Not

* An HTML Parser - That's a fine goal, but this isn't it. It's just
  XML.
* A DOM Builder - You can use it to build an object model out of XML,
  but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX
  implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but
  not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
  masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.

## Regarding `<!DOCTYPE`s and `<!ENTITY`s

The parser will handle the basic XML entities in text nodes and attribute
values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
entities in XML by putting them in the DTD. This parser doesn't do anything
with that. If you want to listen to the `ondoctype` event, and then fetch
the doctypes, and read the entities and add them to `parser.ENTITIES`, then
be my guest.

Unknown entities will fail in strict mode, and in loose mode, will pass
through unmolested.

## Usage

```javascript
var sax = require("./lib/sax"),
  strict = true, // set to false for html-mode
  parser = sax.parser(strict);

parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text.  t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag.  node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute.  attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

// stream usage
// takes the same options as the parser
var fs = require("fs"),
  options = {}, // settings object, see "Arguments" below
  saxStream = require("sax").createStream(strict, options)
saxStream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e)
  // clear the error
  this._parser.error = null
  this._parser.resume()
})
saxStream.on("opentag", function (node) {
  // same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
  .pipe(saxStream)
  .pipe(fs.createWriteStream("file-copy.xml"))
```

## Arguments

Pass the following arguments to the parser function. All are optional.

`strict` - Boolean. Whether or not to be a jerk. Default: `false`.

`opt` - Object bag of settings regarding string formatting. All default to `false`.

Settings supported:

* `trim` - Boolean. Whether or not to trim text and comment nodes.
* `normalize` - Boolean. If true, then turn any whitespace into a single
  space.
* `lowercase` - Boolean. If true, then lowercase tag names and attribute names
  in loose mode, rather than uppercasing them.
* `xmlns` - Boolean. If true, then namespaces are supported.
* `position` - Boolean. If false, then don't track line/col/position.
* `strictEntities` - Boolean. If true, only parse [predefined XML
  entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
  (`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`)

## Methods

`write` - Write bytes onto the stream. You don't have to do this all at
once.
You can keep writing as much as you want.

`close` - Close the stream. Once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.

`resume` - To gracefully handle errors, assign a listener to the `error`
event. Then, when the error is taken care of, you can call `resume` to
continue parsing. Otherwise, the parser will not continue while in an error
state.

## Members

At all times, the parser object will have the following members:

`line`, `column`, `position` - Indications of the position in the XML
document where the parser currently is looking.

`startTagPosition` - Indicates the position where the current tag starts.

`closed` - Boolean indicating whether or not the parser can be written to.
If it's `true`, then wait for the `ready` event to write again.

`strict` - Boolean indicating whether or not the parser is a jerk.

`opt` - Any options passed into the constructor.

`tag` - The current tag being dealt with.

And a bunch of other stuff that you probably shouldn't touch.

## Events

All events emit with a single argument. To listen to an event, assign a
function to `on<eventname>`. Functions get executed in the this-context of
the parser object. The list of supported events is also in the exported
`EVENTS` array.

When using the stream interface, assign handlers using the EventEmitter
`on` function in the normal fashion.

`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. By
listening to this event, you can keep an eye on that kind of stuff. Note:
this happens *much* more in strict mode. Argument: instance of `Error`.

`text` - Text node. Argument: string of text.

`doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.

`processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
object with `name` and `body` members. Attributes are not parsed, as
processing instructions have implementation dependent semantics.

`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
after all.

`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.

`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have a
`local`, `prefix`, and `uri` member.

`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closetag` emitted immediately after `opentag`.
Argument: tag name.

`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contain namespace
information.

`comment` - A comment node. Argument: the string of the comment.

`opencdata` - The opening tag of a `<![CDATA[` block.

`cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get
quite large, this event may fire multiple times for a single block, if it
is broken up into multiple `write()`s. Argument: the string of random
character data.

`closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.

`opennamespace` - If the `xmlns` option is set, then this event will
signal the start of a new namespace binding.

`closenamespace` - If the `xmlns` option is set, then this event will
signal the end of a namespace binding.

`end` - Indication that the closed stream has ended.

`ready` - Indication that the stream has reset, and is ready to be written
to.

`noscript` - In non-strict mode, `<script>` tags trigger a `"script"`
event, and their contents are not checked for special xml characters.
If you pass `noscript: true`, then this behavior is suppressed.

## Reporting Problems

It's best to write a failing test if you find an issue. I will always
accept pull requests with failing tests if they demonstrate intended
behavior, but it is very hard to figure out what issue you're describing
without a test. Writing a test is also the best way for you yourself
to figure out if you really understand the issue you think you have with
sax-js.
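
Going back to the "A DOM Builder" item under "What This Is (probably) Not": the `opentag`, `text`, and `closetag` events are enough to assemble a simple object model yourself. The following is only a rough sketch, not something sax ships. The helper name `attachTreeBuilder` and the `{ name, attributes, children }` node shape are made up for illustration, and it assumes a parser object that calls the `on<eventname>` handlers as described under Events.

```javascript
// Sketch only: `attachTreeBuilder` and the node shape are hypothetical,
// not part of sax. It relies solely on the onopentag / ontext / onclosetag
// handlers described in the Events section.
function attachTreeBuilder(parser) {
  var root = null
  var stack = [] // elements that are currently open, innermost last

  parser.onopentag = function (node) {
    var element = { name: node.name, attributes: node.attributes, children: [] }
    if (stack.length > 0) {
      // nested tag: attach to the innermost open element
      stack[stack.length - 1].children.push(element)
    } else if (root === null) {
      // first tag seen becomes the root
      root = element
    }
    stack.push(element)
  }

  parser.ontext = function (text) {
    // text outside of any element is ignored in this sketch
    if (stack.length > 0) {
      stack[stack.length - 1].children.push(text)
    }
  }

  parser.onclosetag = function () {
    stack.pop()
  }

  // call the returned function after parsing to get the tree
  return function getRoot() {
    return root
  }
}

// With the real library (assumes `sax` is installed):
//   var parser = require("sax").parser(true)
//   var getRoot = attachTreeBuilder(parser)
//   parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close()
//   console.log(JSON.stringify(getRoot(), null, 2))
```

The stack mirrors the nesting of open tags, so each text node or child tag lands on whatever element is innermost at that moment. Comments, CDATA, and namespaces are left out; handling them would mean assigning the corresponding handlers in the same way.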