# UCL - Universal Configuration Language

**Table of Contents**  *generated with [DocToc](http://doctoc.herokuapp.com/)*

- [Introduction](#introduction)
- [Basic structure](#basic-structure)
- [Improvements to the json notation](#improvements-to-the-json-notation)
	- [General syntax sugar](#general-syntax-sugar)
	- [Automatic arrays creation](#automatic-arrays-creation)
	- [Named keys hierarchy](#named-keys-hierarchy)
	- [Convenient numbers and booleans](#convenient-numbers-and-booleans)
- [General improvements](#general-improvements)
	- [Comments](#comments)
	- [Macros support](#macros-support)
	- [Variables support](#variables-support)
	- [Multiline strings](#multiline-strings)
	- [Single quoted strings](#single-quoted-strings)
- [Emitter](#emitter)
- [Validation](#validation)
- [Performance](#performance)
- [Conclusion](#conclusion)

## Introduction

This repository provides a `C` library for parsing configurations written in `UCL`, the Universal Configuration Language. It also provides functions to work with other formats:

* `JSON`: read, write and pretty-print
* `MessagePack`: read and write
* `S-Expressions`: read only (canonical form)
* `YAML`: limited write support (mainly for compatibility)

If you are looking for the libucl API documentation, you can find it at [this page](doc/api.md).

## Basic structure

`UCL` is heavily inspired by the `nginx` configuration format, which is an example of a convenient configuration
system. However, `UCL` is fully compatible with the `JSON` format and can parse JSON files.
For example, you can write the same configuration in either of the following ways:

* in nginx-like syntax:

```nginx
param = value;
section {
    param = value;
    param1 = value1;
    flag = true;
    number = 10k;
    time = 0.2s;
    string = "something";
    subsection {
        host = {
            host = "hostname";
            port = 900;
        }
        host = {
            host = "hostname";
            port = 901;
        }
    }
}
```

* or in JSON:

```json
{
    "param": "value",
    "section": {
        "param": "value",
        "param1": "value1",
        "flag": true,
        "number": 10000,
        "time": 0.2,
        "string": "something",
        "subsection": {
            "host": [
                {
                    "host": "hostname",
                    "port": 900
                },
                {
                    "host": "hostname",
                    "port": 901
                }
            ]
        }
    }
}
```

## Improvements to the json notation

Several features make UCL configurations more convenient to edit than strict JSON:

### General syntax sugar

* Braces are not required around the top-level object; it is treated as an object automatically:

```json
"key": "value"
```
is equivalent to:
```json
{"key": "value"}
```

* Quotes are not required for strings and keys. Moreover, `:` may be replaced with `=`, or omitted entirely for objects:

```nginx
key = value;
section {
    key = value;
}
```
is equivalent to:
```json
{
    "key": "value",
    "section": {
        "key": "value"
    }
}
```

* No comma mess: a trailing comma or semicolon after the last element of an array or an object is allowed:

```json
{
    "key1": "value",
    "key2": "value",
}
```

### Automatic arrays creation

* Non-unique keys in an object are allowed and are automatically converted to arrays internally:

```json
{
    "key": "value1",
    "key": "value2"
}
```
is converted to:
```json
{
    "key": ["value1", "value2"]
}
```

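A minimal, self-contained C sketch of the idea (not libucl's internal representation): a repeated key appends to the values already stored under it instead of replacing them. The `kv_object` type and the `kv_find`/`kv_insert` helpers are hypothetical, values are plain strings, and the capacities are fixed to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size model of an object whose keys may hold several
 * values. Strings are stored by pointer, so the caller keeps them alive. */
#define MAX_KEYS 16
#define MAX_VALUES 8

struct kv_entry {
    const char *key;
    const char *values[MAX_VALUES];
    size_t nvalues;
};

struct kv_object {
    struct kv_entry entries[MAX_KEYS];
    size_t nentries;
};

/* Returns the entry stored under `key`, or NULL if there is none. */
static struct kv_entry *kv_find(struct kv_object *obj, const char *key)
{
    for (size_t i = 0; i < obj->nentries; i++) {
        if (strcmp(obj->entries[i].key, key) == 0)
            return &obj->entries[i];
    }
    return NULL;
}

/* Adds `value` under `key`. A key that already exists gets the value
 * appended, forming an implicit array. Returns 0, or -1 when full. */
static int kv_insert(struct kv_object *obj, const char *key, const char *value)
{
    struct kv_entry *e = kv_find(obj, key);

    if (e == NULL) {
        if (obj->nentries == MAX_KEYS)
            return -1;
        e = &obj->entries[obj->nentries++];
        e->key = key;
        e->nvalues = 0;
    }
    if (e->nvalues == MAX_VALUES)
        return -1;
    e->values[e->nvalues++] = value;
    return 0;
}
```

Inserting `key` with `value1` and then with `value2` leaves a single entry holding both values, which mirrors the converted JSON above.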
### Named keys hierarchy

UCL accepts named keys and organizes them into an object hierarchy internally. Here is an example of this process:

```nginx
section "blah" {
	key = value;
}
section foo {
	key = value;
}
```

is converted to the following object:

```nginx
section {
	blah {
		key = value;
	}
	foo {
		key = value;
	}
}
```

Such definitions may also be more complex and contain more than one level of nested objects:

```nginx
section "blah" "foo" {
	key = value;
}
```

is represented as:

```nginx
section {
	blah {
		foo {
			key = value;
		}
	}
}
```

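The conversion can be pictured with the sketch below, which only rewrites text: it takes the list of names from a definition like `section "blah" "foo"` and emits the equivalent nested blocks. `expand_named_keys` and `append_line` are hypothetical helpers; libucl itself builds objects rather than text, and the sketch assumes a one-line body and at most eight levels of nesting.

```c
#include <stddef.h>
#include <stdio.h>

/* Tabs used for indentation; this also caps the supported depth at 8. */
static const char tabs[] = "\t\t\t\t\t\t\t\t";

/* Appends one line: `depth` tabs, then `text` and `suffix`, then '\n'.
 * Returns 0 on success or -1 if `out` (capacity `cap`) is too small. */
static int append_line(char *out, size_t cap, size_t *len, size_t depth,
                       const char *text, const char *suffix)
{
    int w = snprintf(out + *len, cap - *len, "%.*s%s%s\n",
                     (int)depth, tabs, text, suffix);

    if (w < 0 || (size_t)w >= cap - *len)
        return -1;
    *len += (size_t)w;
    return 0;
}

/* Hypothetical helper: turns a named definition such as
 * section "blah" "foo" { key = value; } into nested blocks, one level per
 * name, with `body` at the innermost level. Returns the length written,
 * or -1 if the buffer is too small or the nesting is too deep. */
static int expand_named_keys(const char *const names[], size_t n,
                             const char *body, char *out, size_t cap)
{
    size_t len = 0;

    if (n > sizeof tabs - 1 || cap == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {          /* open one block per name */
        if (append_line(out, cap, &len, i, names[i], " {") != 0)
            return -1;
    }
    if (append_line(out, cap, &len, n, body, "") != 0)
        return -1;
    for (size_t i = n; i-- > 0;) {            /* close innermost first */
        if (append_line(out, cap, &len, i, "}", "") != 0)
            return -1;
    }
    return (int)len;
}
```

With the names `section`, `blah`, `foo` and the body `key = value;`, the output is the last nested example above, indented with tabs.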
### Convenient numbers and booleans

* Numbers can have suffixes to specify standard multipliers:
    + `[kKmMgG]` - standard base-10 multipliers (so `1k` is translated to 1000)
    + `[kKmMgG]b` - power-of-2 multipliers (so `1kb` is translated to 1024)
    + `[ms|s|min|d|w|y]` - time multipliers; all time values are translated to a floating point number of seconds, for example `10min` is translated to 600.0 and `10ms` is translated to 0.01
* Hexadecimal integers can be written with the `0x` prefix, for example `key = 0xff`. Floating point values, however, can use decimal notation only.
* Booleans can be specified as `true`, `yes` or `on`, and `false`, `no` or `off`.
* Numbers and booleans can still be treated as strings by enclosing them in double quotes.

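A sketch of these rules in plain C, assuming the value has already been isolated as a string. `parse_suffixed_number` and `parse_boolean` are hypothetical names, not libucl API. Simplifications: every number comes back as a `double` (libucl keeps integers and time values as separate types), suffixes and keywords are matched case-insensitively, hexadecimal input relies on C99 `strtod` (which, unlike UCL, would also accept hexadecimal floats), and `y` is assumed to mean a 365-day year.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Case-insensitive comparison of two whole strings. */
static bool str_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical sketch of the suffix rules above (not libucl's lexer).
 * Every result is returned as a double; time values are in seconds. */
static bool parse_suffixed_number(const char *s, double *out)
{
    static const struct { const char *suffix; double mult; } table[] = {
        {"k", 1e3}, {"m", 1e6}, {"g", 1e9},                        /* base 10 */
        {"kb", 1024.0}, {"mb", 1048576.0}, {"gb", 1073741824.0},   /* base 2 */
        {"ms", 0.001}, {"s", 1.0}, {"min", 60.0}, {"d", 86400.0},
        {"w", 604800.0}, {"y", 31536000.0},   /* assumption: 365-day year */
    };
    char *end;
    double v = strtod(s, &end);   /* also accepts hex such as 0xff (C99) */

    if (end == s)
        return false;             /* no number at all */
    if (*end == '\0') {
        *out = v;                 /* plain number, no suffix */
        return true;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (str_ieq(end, table[i].suffix)) {
            *out = v * table[i].mult;
            return true;
        }
    }
    return false;                 /* unknown suffix */
}

/* Recognizes the boolean keywords listed above (case-insensitively). */
static bool parse_boolean(const char *s, bool *out)
{
    static const char *const truthy[] = {"true", "yes", "on"};
    static const char *const falsy[] = {"false", "no", "off"};

    for (size_t i = 0; i < 3; i++) {
        if (str_ieq(s, truthy[i])) { *out = true; return true; }
        if (str_ieq(s, falsy[i])) { *out = false; return true; }
    }
    return false;
}
```

For example, `1kb` yields 1024, `10min` yields 600 and `on` yields `true`, while an unknown suffix such as `10parsecs` is rejected.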
## General improvements

### Comments

UCL supports two styles of comments:

* single line: `#`
* multiline: `/* ... */`

Multiline comments may be nested:
```c
# Sample single line comment
/*
 some comment
 /* nested comment */
 end of comment
*/
```

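Nesting means the lexer has to count comment openers rather than stop at the first closing marker. A minimal sketch of that depth counter follows; `skip_nested_comment` is a hypothetical helper (not libucl's lexer) and, for brevity, it does not treat comment markers inside quoted strings specially.

```c
#include <stddef.h>

// Hypothetical sketch of nested comment skipping (not libucl's lexer).
// `p` must point at the opening "/*". Returns a pointer just past the
// matching "*/", or NULL if the comment is never closed.
static const char *skip_nested_comment(const char *p)
{
    int depth = 0;

    if (p[0] != '/' || p[1] != '*')
        return NULL;              // not at the start of a comment
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {
            depth++;              // a (possibly nested) comment opens
            p += 2;
        } else if (p[0] == '*' && p[1] == '/') {
            depth--;              // the innermost open comment closes
            p += 2;
            if (depth == 0)
                return p;         // the outermost comment is finished
        } else {
            p++;
        }
    }
    return NULL;
}
```

Given the nested comment from the example above, it returns the position right after the final closing marker; with an unbalanced input it returns `NULL`.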
### Macros support

UCL supports external macros, both single line and multiline ones:
```nginx
.macro_name "sometext";
.macro_name {
    Some long text
    ....
};
```

Moreover, each macro can accept an optional list of arguments in parentheses. These
arguments are themselves a `UCL` object that is parsed and passed to the macro as
options:

```nginx
.macro_name(param=value) "something";
.macro_name(param={key=value}) "something";
.macro_name(.include "params.conf") "something";
.macro_name(#this is multiline macro
param = [value1, value2]) "something";
.macro_name(key="()") "something";
```

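The last example shows why an argument list cannot simply end at the first `)`: parentheses inside a quoted string, and anything inside a `#` comment, must be skipped. The sketch below illustrates only that scanning step under these assumptions; `find_macro_args_end` is hypothetical, and libucl itself parses the arguments as a full UCL object.

```c
#include <stddef.h>

/* Hypothetical sketch: `p` points just after the opening '(' of a macro
 * argument list. Parentheses inside double-quoted strings and '#' comments
 * are ignored, so key="()" does not end the list. Returns a pointer to the
 * matching ')' or NULL if there is none. */
static const char *find_macro_args_end(const char *p)
{
    int depth = 1;

    while (*p != '\0') {
        switch (*p) {
        case '"':
            /* Skip a double-quoted string, honoring backslash escapes. */
            for (p++; *p != '\0' && *p != '"'; p++) {
                if (*p == '\\' && p[1] != '\0')
                    p++;
            }
            if (*p == '\0')
                return NULL;      /* unterminated string */
            break;
        case '#':
            /* Skip a single line comment up to the end of the line. */
            while (*p != '\0' && *p != '\n')
                p++;
            if (*p == '\0')
                return NULL;
            break;
        case '(':
            depth++;
            break;
        case ')':
            if (--depth == 0)
                return p;
            break;
        default:
            break;
        }
        p++;
    }
    return NULL;
}
```

For `key="()") "something";` it returns the `)` after the closing quote rather than the one inside the string.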
UCL also provides a convenient `include` macro to load content from other files
into the current UCL object. This macro accepts either a path to a file:

```nginx
.include "/full/path.conf"
.include "./relative/path.conf"
.include "${CURDIR}/path.conf"
```

or a URL (if UCL is built with URL support provided by either `libcurl` or `libfetch`):

```nginx
.include "http://example.com/file.conf"
```

The `.include` macro supports the following options:

* `try` (default: **false**) - if this option is `true`, UCL treats errors while loading
this file as non-fatal. For example, the file may be absent without stopping the parsing
of the top-level document.
* `sign` (default: **false**) - if this option is `true`, UCL loads and checks the signature for
the file from a file named `<FILEPATH>.sig`. Trusted public keys must be provided through the UCL API after
the parser is created but before any configuration is parsed.
* `glob` (default: **false**) - if this option is `true`, UCL treats the filename as a glob pattern and loads
all files that match it (the pattern syntax is normally described in the `glob` manual page
for your operating system). This option has no effect on URL includes.
* `url` (default: **true**) - allow URL includes.
* `path` (default: empty) - a `UCL_ARRAY` of directories to search for the include file.
The search stops at the first match unless `glob` is `true`, in which case all matches are included.
* `prefix` (default: **false**) - put the included contents inside an object instead
of loading them into the root. If no `key` is provided, one is generated automatically from each file's basename.
* `key` (default: empty string) - the key to load the included contents into. If
the key already exists, it must have the correct type.
* `target` (default: object) - whether the `prefix` `key` should be an
object or an array.
* `priority` (default: 0) - the priority of the included content (see below).
* `duplicate` (default: `append`) - the policy for resolving duplicate keys:
	- `append` - the default strategy: a new object with a higher priority replaces the old one, a new object with a lower priority is ignored completely, and two duplicate objects with the same priority form a multi-value key (implicit array);
	- `merge` - if the existing value is an object or an array, the new keys are merged into it; if it is a plain (scalar) value, an implicit array is formed (regardless of priorities);
	- `error` - report an error on duplicate keys and stop parsing;
	- `rewrite` - always replace the old value with the new one (ignoring priorities).

The UCL parser uses priorities to decide how objects are rewritten when other files
are included:

* if two objects have the same priority, they form an implicit array;
* if a new object has a higher priority, it overwrites the old one;
* if a new object has a lower priority, it is ignored.

By default, the priority of the top-level object is zero (the lowest priority). Currently,
you can define up to 16 priorities (from 0 to 15). Includes with higher priorities
rewrite keys from objects with lower priorities as specified by the policy. The priority
of the top-level or any other object can be changed with the `.priority` macro, which has no
options and takes the new priority as its argument:

```
# Default priority: 0.
foo = 6
.priority 5
# The following will have priority 5.
bar = 6
baz = 7
# The following will be included with a priority of 3, 5, and 6 respectively.
.include(priority=3) "path.conf"
.include(priority=5) "equivalent-path.conf"
.include(priority=6) "highpriority-path.conf"
```

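The decision rules for `duplicate` and priorities can be summarized in a small C function. This is a hypothetical model written from the descriptions above (`resolve_duplicate` and the enums are not part of the libucl API); `old_is_container` stands for "the existing value is an object or an array".

```c
#include <stdbool.h>

/* Hypothetical model of the duplicate-key policies; not libucl API. */
enum dup_policy { DUP_APPEND, DUP_MERGE, DUP_ERROR, DUP_REWRITE };

enum dup_action {
    ACT_KEEP_OLD,       /* ignore the new value */
    ACT_REPLACE,        /* the new value replaces the old one */
    ACT_IMPLICIT_ARRAY, /* keep both values as a multi-value key */
    ACT_MERGE,          /* merge the new keys into the old object/array */
    ACT_FAIL            /* report an error and stop parsing */
};

/* Decides what happens when an existing key is defined again. */
static enum dup_action resolve_duplicate(enum dup_policy policy,
                                         unsigned old_prio, unsigned new_prio,
                                         bool old_is_container)
{
    switch (policy) {
    case DUP_APPEND:
        if (new_prio > old_prio)
            return ACT_REPLACE;
        if (new_prio < old_prio)
            return ACT_KEEP_OLD;
        return ACT_IMPLICIT_ARRAY;        /* same priority */
    case DUP_MERGE:
        /* Priorities are ignored by this policy. */
        return old_is_container ? ACT_MERGE : ACT_IMPLICIT_ARRAY;
    case DUP_ERROR:
        return ACT_FAIL;
    case DUP_REWRITE:
        return ACT_REPLACE;               /* priorities are ignored */
    }
    return ACT_FAIL;                      /* unreachable for valid input */
}
```

Applied to the `.priority` example with the default `append` policy: if `path.conf` (priority 3) also defined `bar`, that definition would be ignored; one in `equivalent-path.conf` (priority 5) would turn `bar` into an implicit array; and one in `highpriority-path.conf` (priority 6) would replace it.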
### Variables support

UCL supports variables in its input. Variables are registered by the application that uses the UCL parser and can be written in the following forms:

* `${VARIABLE}`
* `$VARIABLE`

UCL does not currently support nested variables. To escape a variable, use a double dollar sign:

* `$${VARIABLE}` is converted to `${VARIABLE}`
* `$$VARIABLE` is converted to `$VARIABLE`

However, if no valid variables are found in a string, no expansion is performed at all (so `$$` remains unchanged). This may change
in future libucl releases.

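A self-contained sketch of these expansion rules, including the "no valid variables, no expansion" case. The `struct variable` table and both functions are hypothetical; in libucl the application registers variables with the parser instead (for example via `ucl_parser_register_variable`). For the bare `$NAME` form, the sketch assumes that the first registered name that is a prefix of the following text matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table standing in for the parser's registry. */
struct variable {
    const char *name;
    const char *value;
};

/* `p` points just after a '$'. If it starts with "{NAME}" or "NAME" for a
 * registered NAME, returns that variable and stores the length of the
 * reference in *ref_len; otherwise returns NULL. */
static const struct variable *match_variable(const char *p,
        const struct variable *vars, size_t nvars, size_t *ref_len)
{
    for (size_t i = 0; i < nvars; i++) {
        size_t n = strlen(vars[i].name);

        if (p[0] == '{' && strncmp(p + 1, vars[i].name, n) == 0 &&
            p[n + 1] == '}') {
            *ref_len = n + 2;
            return &vars[i];
        }
        if (strncmp(p, vars[i].name, n) == 0) {
            *ref_len = n;
            return &vars[i];
        }
    }
    return NULL;
}

/* Expands ${NAME} and $NAME; "$$" becomes a single '$'. If the input holds
 * no valid variable reference at all, it is copied unchanged, so "$$"
 * stays "$$". Returns false if `out` (capacity `cap`) is too small. */
static bool expand_variables(const char *in, const struct variable *vars,
                             size_t nvars, char *out, size_t cap)
{
    size_t len = 0, ref_len = 0;
    bool found = false;

    /* First pass: look for at least one valid, unescaped reference. */
    for (const char *p = in; *p != '\0' && !found; p++) {
        if (p[0] == '$' && p[1] == '$')
            p++;                                  /* skip the escaped '$' */
        else if (p[0] == '$')
            found = match_variable(p + 1, vars, nvars, &ref_len) != NULL;
    }
    if (!found) {
        if (strlen(in) >= cap)
            return false;
        memcpy(out, in, strlen(in) + 1);
        return true;
    }

    /* Second pass: copy the input, substituting variables. */
    for (const char *p = in; *p != '\0';) {
        const char *chunk = p;
        size_t chunk_len = 1;
        const struct variable *v;

        if (p[0] == '$' && p[1] == '$') {
            p += 2;                               /* emit one '$' */
        } else if (p[0] == '$' &&
                   (v = match_variable(p + 1, vars, nvars, &ref_len)) != NULL) {
            chunk = v->value;
            chunk_len = strlen(v->value);
            p += 1 + ref_len;
        } else {
            p++;
        }
        if (len + chunk_len >= cap)
            return false;
        memcpy(out + len, chunk, chunk_len);
        len += chunk_len;
    }
    out[len] = '\0';
    return true;
}
```

With `CURDIR` registered as `/etc/app`, `${CURDIR}/path.conf` becomes `/etc/app/path.conf` and `$${CURDIR} in ${CURDIR}` becomes `${CURDIR} in /etc/app`, while a string such as `cost: $$5`, which contains no valid variable, is returned unchanged.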
### Multiline strings

UCL can handle multiline strings as well as single line ones. It uses a shell/Perl-like (heredoc) notation for such strings:
```
key = <<EOD
some text
split into
lines
EOD
```

In this example `key` is interpreted as the following string: `some text\nsplit into\nlines`.
Here are the rules for this syntax:

* The terminator must start immediately after the `<<` symbols and consist of capital letters only (e.g. `<<eof` or `<< EOF` won't work);
* The terminator must be followed by a single newline character (no spaces are allowed between the terminator and the newline);
* To finish the multiline string, put the terminator at the start of a line, immediately followed by a newline (no spaces or other characters are allowed there either);
* The initial and final newlines are not included in the resulting string, but you can still put newlines at the beginning and at the end of a value, for example:

```
key <<EOD

some
text

EOD
```

Here the value of `key` is `\nsome\ntext\n`.

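The rules above can be checked with a short C sketch that validates the terminator and extracts the value. `parse_heredoc` is a hypothetical helper written from these rules, not libucl's lexer; it expects the input to start at `<<`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the heredoc rules (not libucl's lexer).
 * `src` starts at "<<". On success the string value is copied to `out`
 * (capacity `cap`) and true is returned. */
static bool parse_heredoc(const char *src, char *out, size_t cap)
{
    const char *term, *body, *line;
    size_t term_len = 0;

    if (strncmp(src, "<<", 2) != 0)
        return false;
    term = src + 2;
    /* The terminator follows "<<" directly and has capital letters only. */
    while (term[term_len] >= 'A' && term[term_len] <= 'Z')
        term_len++;
    /* It must be followed by exactly one newline, with no spaces between. */
    if (term_len == 0 || term[term_len] != '\n')
        return false;

    body = term + term_len + 1;
    /* Find a line that is exactly the terminator followed by a newline. */
    for (line = body; line != NULL && *line != '\0';) {
        if (strncmp(line, term, term_len) == 0 && line[term_len] == '\n') {
            /* Drop the newline that precedes the terminator line. */
            size_t len = (line == body) ? 0 : (size_t)(line - body) - 1;

            if (len >= cap)
                return false;
            memcpy(out, body, len);
            out[len] = '\0';
            return true;
        }
        line = strchr(line, '\n');                /* go to the next line */
        if (line != NULL)
            line++;
    }
    return false;                                 /* no terminator found */
}
```

It rejects `<<eof` and `<< EOF`, as well as a closing terminator followed by a space, and for the second example it returns `\nsome\ntext\n`.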
### Single quoted strings

Single quoted strings can be used to simplify escaping rules. Escape sequences in single quoted strings are *NOT* interpreted, with two exceptions: a `'` character just after a `\` character (`\'`) produces a literal `'`, and a newline character just after a `\` character is ignored together with that backslash.

```
key = 'value'; # Read as value
key = 'value\n\'; # Read as value\n\
key = 'value\''; # Read as value'
key = 'value\
bla'; # Read as valuebla
```

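A sketch of the unescaping step for single quoted strings, applied to the text between the quotes. `unescape_squoted` is a hypothetical helper; it treats a backslash and the following character as a pair, which is an assumption about how pairs such as `\\` are handled.

```c
#include <stddef.h>

/* Hypothetical sketch of the single-quote rules (not libucl code).
 * `raw` holds the `len` bytes between the quotes; `out` needs len + 1 bytes.
 * A backslash and the character after it are handled as a pair:
 *   \'          -> '
 *   \<newline>  -> nothing (line continuation)
 *   \x          -> \x, kept verbatim
 * A lone backslash at the very end is also kept. Returns the value length. */
static size_t unescape_squoted(const char *raw, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (raw[i] == '\\' && i + 1 < len) {
            char next = raw[++i];                 /* consume the pair */

            if (next == '\'') {
                out[o++] = '\'';
            } else if (next != '\n') {
                out[o++] = '\\';
                out[o++] = next;
            }
            continue;
        }
        out[o++] = raw[i];
    }
    out[o] = '\0';
    return o;
}
```

For example, the raw text `value\'` yields `value'`, a backslash followed by a newline and `bla` yields `valuebla`, and `value\n` (a backslash and the letter `n`) is kept as is.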
## Emitter

Each UCL object can be serialized to one of the five supported formats:

* `JSON` - canonical JSON notation (an indented structure using spaces);
* `Compacted JSON` - compact JSON notation (without spaces or newlines);
* `Configuration` - nginx-like notation;
* `YAML` - inline YAML notation;
* `messagepack` - MessagePack binary format.

## Validation

UCL allows validation of objects. It uses the same schema format that is used for JSON: [json schema v4](http://json-schema.org). UCL supports the full set of JSON schema features except remote references, which are unlikely to be useful for configuration objects. A schema definition can also be written in UCL format instead of JSON, which simplifies writing schemas. Moreover, since UCL supports multiple values for a key in an object, it is possible to specify the generic integer constraints `maxValues` and `minValues` to limit the number of values of a single key. UCL is currently not strict about validating the schemas themselves, so UCL users should supply valid schemas (as defined in json-schema draft v4) to ensure that input objects are validated properly.

## Performance

Are the UCL parser and emitter fast enough? Here are some numbers.
I took a 19 MB file consisting of ~700 thousand lines of JSON (obtained via
http://www.json-generator.com/). Then I compared UCL with the jansson library, which also performs JSON
parsing and emitting. Here are the results:

```
jansson: parsed json in 1.3899 seconds
jansson: emitted object in 0.2609 seconds

ucl: parsed input in 0.6649 seconds
ucl: emitted config in 0.2423 seconds
ucl: emitted json in 0.2329 seconds
ucl: emitted compact json in 0.1811 seconds
ucl: emitted yaml in 0.2489 seconds
```

In this test UCL parses about twice as fast as jansson (0.66 s vs 1.39 s) and emits JSON slightly faster (0.23 s vs 0.26 s). Moreover,
UCL compiled with optimizations (`-O3`) is faster still, more than halving the parse time:

```
ucl: parsed input in 0.3002 seconds
ucl: emitted config in 0.1174 seconds
ucl: emitted json in 0.1174 seconds
ucl: emitted compact json in 0.0991 seconds
ucl: emitted yaml in 0.1354 seconds
```

You can run your own benchmarks with `make check` in the libucl top-level directory.

## Conclusion

UCL has a clear design that should be very convenient for reading and writing. At the same time, it is compatible with
JSON and can therefore be used as a simple JSON parser. Macros make it possible to extend the configuration
language (for example, by including some Lua code), and comments allow parts of a configuration to be disabled or enabled
quickly.