# UCL - Universal Configuration Language

**Table of Contents**  *generated with [DocToc](http://doctoc.herokuapp.com/)*

- [Introduction](#introduction)
- [Basic structure](#basic-structure)
- [Improvements to the json notation](#improvements-to-the-json-notation)
    - [General syntax sugar](#general-syntax-sugar)
    - [Automatic arrays creation](#automatic-arrays-creation)
    - [Named keys hierarchy](#named-keys-hierarchy)
    - [Convenient numbers and booleans](#convenient-numbers-and-booleans)
- [General improvements](#general-improvements)
    - [Comments](#comments)
    - [Macros support](#macros-support)
    - [Variables support](#variables-support)
    - [Multiline strings](#multiline-strings)
    - [Single quoted strings](#single-quoted-strings)
- [Emitter](#emitter)
- [Validation](#validation)
- [Performance](#performance)
- [Conclusion](#conclusion)

## Introduction

This repository provides the `C` library for parsing configurations written in `UCL`, the universal configuration language. It also provides functions to work with other formats:

* `JSON`: read, write and pretty format
* `Messagepack`: read and write
* `S-Expressions`: read only (canonical form)
* `Yaml`: limited write support (mainly for compatibility)

If you are looking for the libucl API documentation you can find it at [this page](doc/api.md).

## Basic structure

`UCL` is heavily influenced by the `nginx` configuration format as an example of a convenient configuration
system. However, `UCL` is fully compatible with the `JSON` format and is able to parse json files.
For example, you can write the same configuration in the following ways:

* in nginx like:

```nginx
param = value;
section {
    param = value;
    param1 = value1;
    flag = true;
    number = 10k;
    time = 0.2s;
    string = "something";
    subsection {
        host = {
            host = "hostname";
            port = 900;
        }
        host = {
            host = "hostname";
            port = 901;
        }
    }
}
```

* or in JSON:

```json
{
    "param": "value",
    "section": {
        "param": "value",
        "param1": "value1",
        "flag": true,
        "number": 10000,
        "time": "0.2s",
        "string": "something",
        "subsection": {
            "host": [
                {
                    "host": "hostname",
                    "port": 900
                },
                {
                    "host": "hostname",
                    "port": 901
                }
            ]
        }
    }
}
```

## Improvements to the json notation

Several things make a ucl configuration more convenient to edit than strict json:

### General syntax sugar

* Braces are not necessary to enclose a top object: it is automatically treated as an object:

```json
"key": "value"
```
is equal to:
```json
{"key": "value"}
```

* Quotes are not required for strings and keys. Moreover, `:` may be replaced with `=` or even be skipped for objects:

```nginx
key = value;
section {
    key = value;
}
```
is equal to:
```json
{
    "key": "value",
    "section": {
        "key": "value"
    }
}
```

* No comma mess: you can safely place a comma or semicolon after the last element in an array or an object:

```json
{
    "key1": "value",
    "key2": "value",
}
```
### Automatic arrays creation

* Non-unique keys in an object are allowed and are automatically converted to arrays internally:

```json
{
    "key": "value1",
    "key": "value2"
}
```
is converted to:
```json
{
    "key": ["value1", "value2"]
}
```

### Named keys hierarchy

UCL accepts named keys and organizes them into an object hierarchy internally. Here is an example of this process:
```nginx
section "blah" {
    key = value;
}
section foo {
    key = value;
}
```

is converted to the following object:

```nginx
section {
    blah {
        key = value;
    }
    foo {
        key = value;
    }
}
```

Plain definitions may be more complex and contain more than a single level of nested objects:

```nginx
section "blah" "foo" {
    key = value;
}
```

is presented as:

```nginx
section {
    blah {
        foo {
            key = value;
        }
    }
}
```

### Convenient numbers and booleans

* Numbers can have suffixes to specify standard multipliers:
    + `[kKmMgG]` - standard 10 base multipliers (so `1k` is translated to 1000)
    + `[kKmMgG]b` - 2 power multipliers (so `1kb` is translated to 1024)
    + `[s|min|d|w|y]` - time multipliers, all time values are translated to a floating point number of seconds, for example `10min` is translated to 600.0 and `10ms` is translated to 0.01
* Hexadecimal integers can be written with the `0x` prefix, for example `key = 0xff`. However, floating point values can use decimal base only.
* Booleans can be specified as `true` or `yes` or `on` and `false` or `no` or `off`.
* It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.

## General improvements

### Comments

UCL supports different styles of comments:

* single line: `#`
* multiline: `/* ... */`

Multiline comments may be nested:
```c
# Sample single line comment
/*
 some comment
 /* nested comment */
 end of comment
*/
```

### Macros support

UCL supports external macros, both multiline and single-line:
```nginx
.macro_name "sometext";
.macro_name {
    Some long text
    ....
};
```

Moreover, each macro can accept an optional list of arguments in parentheses. These
arguments are themselves a `UCL` object that is parsed and passed to the macro as
options:

```nginx
.macro_name(param=value) "something";
.macro_name(param={key=value}) "something";
.macro_name(.include "params.conf") "something";
.macro_name(#this is multiline macro
param = [value1, value2]) "something";
.macro_name(key="()") "something";
```

UCL also provides a convenient `include` macro to load content from other files
into the current UCL object. This macro accepts either a path to a file:

```nginx
.include "/full/path.conf"
.include "./relative/path.conf"
.include "${CURDIR}/path.conf"
```

or a URL (if ucl is built with url support provided by either `libcurl` or `libfetch`):

    .include "http://example.com/file.conf"

The `.include` macro supports a set of options:

* `try` (default: **false**) - if this option is `true` then UCL treats errors on loading of
this file as non-fatal. For example, such a file can be absent but it won't stop the parsing
of the top-level document.
* `sign` (default: **false**) - if this option is `true` UCL loads and checks the signature for
a file from the path named `<FILEPATH>.sig`. Trusted public keys should be provided to the UCL API after
the parser is created but before any configurations are parsed.
* `glob` (default: **false**) - if this option is `true` UCL treats the filename as a GLOB pattern and loads
all files that match the specified pattern (normally the format of patterns is defined in the `glob` manual page
for your operating system). This option is meaningless for URL includes.
* `url` (default: **true**) - allow URL includes.
* `path` (default: empty) - a UCL_ARRAY of directories to search for the include file.
Search ends after the first match, unless `glob` is true, in which case all matches are included.
* `prefix` (default: **false**) - put included contents inside an object, instead
of loading them into the root. If no `key` is provided, one is automatically generated based on each file's basename().
* `key` (default: empty string) - key to load the contents of the include into. If
the key already exists, it must be of the correct type.
* `target` (default: object) - specify if the `prefix` `key` should be an
object or an array.
* `priority` (default: 0) - specify priority for the include (see below).
* `duplicate` (default: 'append') - specify the policy for resolving duplicates:
    - `append` - default strategy: if the new object has a higher priority then it replaces the old one, if the new object has a lower priority it is ignored completely, and if two duplicate objects have the same priority then the result is a multi-value key (implicit array)
    - `merge` - if we have an object or array, then new keys are merged inside; if we have a plain object then an implicit array is formed (regardless of priorities)
    - `error` - create an error on duplicate keys and stop parsing
    - `rewrite` - always rewrite an old value with the new one (ignoring priorities)

The UCL parser uses priorities to decide how objects are rewritten when other files are included,
as follows:

* If two objects have the same priority then an implicit array is formed
* If a new object has a higher priority then it overwrites the old one
* If a new object has a lower priority then it is ignored

By default, the priority of the top-level object is set to zero (lowest priority). Currently,
you can define up to 16 priorities (from 0 to 15). Includes with higher priorities will
rewrite keys from the objects with lower priorities as specified by the policy.
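
The three priority rules above can be sketched in C as follows. This is only an illustration of the policy as described in this section (the default `append` strategy), not libucl's actual implementation; the `resolve_action` enum and the `resolve_by_priority` function are hypothetical names invented for this example and are not part of the libucl API.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not part of the libucl API. */
enum resolve_action {
    RESOLVE_IGNORE_NEW,     /* new object has lower priority: drop it */
    RESOLVE_IMPLICIT_ARRAY, /* equal priorities: keep both as an implicit array */
    RESOLVE_OVERWRITE_OLD   /* new object has higher priority: replace the old one */
};

/*
 * Decide what happens to a duplicate key, given the priority of the
 * existing object and of the newly included one.
 * Assumes both priorities are in the documented range 0..15.
 */
static enum resolve_action
resolve_by_priority(unsigned old_prio, unsigned new_prio)
{
    assert(old_prio <= 15 && new_prio <= 15);

    if (new_prio > old_prio) {
        return RESOLVE_OVERWRITE_OLD;
    }
    if (new_prio < old_prio) {
        return RESOLVE_IGNORE_NEW;
    }
    return RESOLVE_IMPLICIT_ARRAY;
}
```

For instance, a key defined at the default priority 0 and redefined by an include with `priority=3` would be overwritten: under these assumptions `resolve_by_priority(0, 3)` yields `RESOLVE_OVERWRITE_OLD`, while two definitions at the same priority yield `RESOLVE_IMPLICIT_ARRAY`.
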
The priority of the top-level or any other object can be changed with the `.priority` macro, which has no
options and takes the new priority:

```
# Default priority: 0.
foo = 6
.priority 5
# The following will have priority 5.
bar = 6
baz = 7
# The following will be included with a priority of 3, 5, and 6 respectively.
.include(priority=3) "path.conf"
.include(priority=5) "equivalent-path.conf"
.include(priority=6) "highpriority-path.conf"
```

### Variables support

UCL supports variables in input. Variables are registered by a user of the UCL parser and can be written in the following forms:

* `${VARIABLE}`
* `$VARIABLE`

UCL currently does not support nested variables. To escape variables, use double dollar signs:

* `$${VARIABLE}` is converted to `${VARIABLE}`
* `$$VARIABLE` is converted to `$VARIABLE`

However, if no valid variables are found in a string, no expansion is performed (and `$$` thus remains unchanged). This may
change in future libucl releases.

### Multiline strings

UCL can handle multiline strings as well as single line ones. It uses shell/perl like notation for such objects:
```
key = <<EOD
some text
splitted to
lines
EOD
```

In this example `key` will be interpreted as the following string: `some text\nsplitted to\nlines`.
Here are some rules for this syntax:

* The multiline terminator must start just after the `<<` symbols and it must consist of capital letters only (e.g.
`<<eof` or `<< EOF` won't work);
* The terminator must end with a single newline character (and no spaces are allowed between the terminator and the newline character);
* To finish a multiline string you need to include the terminator string just after a newline and followed by a newline (no spaces or other characters are allowed here either);
* The initial and the final newlines are not inserted into the resulting string, but you can still specify newlines at the beginning and at the end of a value, for example:

```
key <<EOD

some
text

EOD
```

### Single quoted strings

It is possible to use single quoted strings to simplify escaping rules. All values passed in single quoted strings are *NOT* escaped, with two exceptions: a `'` character that directly follows a `\` character, and a newline character just after a `\` character, which is ignored.

```
key = 'value'; # Read as value
key = 'value\n\'; # Read as  value\n\
key = 'value\''; # Read as value'
key = 'value\
bla'; # Read as valuebla
```

## Emitter

Each UCL object can be serialized to one of the five supported formats:

* `JSON` - canonical json notation (with a space-indented structure);
* `Compacted JSON` - compact json notation (without spaces or newlines);
* `Configuration` - nginx like notation;
* `YAML` - yaml inlined notation;
* `messagepack` - MessagePack binary format.

## Validation

UCL allows validation of objects. It uses the same schema that is used for json: [json schema v4](http://json-schema.org). UCL supports the full set of json schema with the exception of remote references, a feature that is unlikely to be useful for configuration objects. Of course, a schema definition can be written in UCL format instead of JSON, which simplifies writing schemas.
Moreover, since UCL supports multiple values for keys in an object, it is possible to specify the generic integer constraints `maxValues` and `minValues` to limit the number of values in a single key. UCL is currently not absolutely strict about validating the schemas themselves, therefore UCL users should supply valid schemas (as defined in json-schema draft v4) to ensure that the input objects are validated properly.

## Performance

Are the UCL parser and emitter fast enough? Well, here are some numbers.
I got a 19Mb file that consists of ~700 thousand lines of json (obtained via
http://www.json-generator.com/). Then I checked the jansson library, which performs json
parsing and emitting, and compared it with UCL. Here are the results:

```
jansson: parsed json in 1.3899 seconds
jansson: emitted object in 0.2609 seconds

ucl: parsed input in 0.6649 seconds
ucl: emitted config in 0.2423 seconds
ucl: emitted json in 0.2329 seconds
ucl: emitted compact json in 0.1811 seconds
ucl: emitted yaml in 0.2489 seconds
```

In this test, UCL is about twice as fast as jansson on parsing and slightly faster on emitting. Moreover,
UCL compiled with optimizations (-O3) performs significantly faster:
```
ucl: parsed input in 0.3002 seconds
ucl: emitted config in 0.1174 seconds
ucl: emitted json in 0.1174 seconds
ucl: emitted compact json in 0.0991 seconds
ucl: emitted yaml in 0.1354 seconds
```

You can do your own benchmarks by running `make check` in the libucl top directory.

## Conclusion

UCL has a clear design that should be very convenient for reading and writing. At the same time it is compatible with
the JSON language and therefore can be used as a simple JSON parser. Macro logic provides the ability to extend the configuration
language (for example by including some lua code), and comments make it possible to disable or enable parts of a configuration
quickly.