<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
-->
  </style>
  <title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips
to help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and might be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format, for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but not all the steps needed to truly gain the
benefits of XML have been taken, possibly because of a lack of
documentation. These guidelines hope to improve the matter and provide a
better overview of XML processing and of the associated steps needed to
deploy it successfully.</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the XML format itself. It may arrive a bit too late,
since the structure of the document may already be cast in existing and
deployed code. Still, here are a few rules which might be helpful when
designing a new XML vocabulary or revising an existing format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case a lot of the existing tools like DTDs, schemas
and stylesheets may already be available. If you are looking at a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because some semantics or use
cases are too different, studying those vocabularies will still help you avoid
design errors like targeting the vocabulary at the wrong abstraction level. In
this format design phase, try to be synthetic, be sure to express the real
content of your data, and use the XML structure to express the semantics and
context of those data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed by instances is the core of the design process of the
vocabulary. Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names.</li>
  <li>do not use attributes for general textual content: attribute values
    are normalized by the parser before reaching the application, so
    whitespace and line breaks may be modified.</li>
  <li>use a separate element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
    <pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex, while
    attributes have to remain very simple.</li>
</ul>
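<p>Two of the tips above can be illustrated with a minimal sketch using
Python's standard <code>xml.etree.ElementTree</code> module. The
<code>localized_text</code> helper, its fallback to English and the
<code>note</code> attribute are assumptions made for this illustration; they
are not part of any existing format or API.</p>

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The "note" attribute is hypothetical; it only shows attribute normalization.
DOC = """<welcome note="two
lines">
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def localized_text(parent, tag, lang, fallback="en"):
    """Return the text of the <tag> child whose xml:lang is lang, else the fallback."""
    by_lang = {child.get(XML_LANG): child.text for child in parent.findall(tag)}
    return by_lang.get(lang, by_lang.get(fallback))


welcome = ET.fromstring(DOC)
print(localized_text(welcome, "msg", "fr"))  # the French sibling
print(localized_text(welcome, "msg", "de"))  # no German sibling: falls back to "en"
# The line break typed inside the attribute value reaches the application as a space.
print(repr(welcome.get("note")))
```

<p>Because each translation is a sibling element, adding a language does not
change the structure seen by existing code, whereas the attribute value has
already lost its line break by the time the application reads it.</p>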

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not have considered for the current
version. There are two parts to this:</p>
<ul>
  <li>Make sure the instance contains a version number, which makes
    backward compatibility easy to handle. Something as simple as having a
    <code>version="1.0"</code> attribute on the root element of the instance
    is sufficient.</li>
  <li>While designing the code that analyzes the data provided by the
    XML parser, make sure you can work with unknown versions: generate a UI
    warning and process only the tags recognized by your version, and keep in
    mind that you should not break on unknown elements when the version
    attribute is not in the recognized set.</li>
</ul>
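<p>The second point can be sketched as follows. This is only an illustration
under stated assumptions: the <code>list</code>, <code>title</code>,
<code>item</code> and <code>priority</code> element names are hypothetical,
and a Python warning stands in for the UI warning.</p>

```python
import warnings
import xml.etree.ElementTree as ET

KNOWN_VERSIONS = {"1.0"}        # versions this reader was written for (hypothetical)
KNOWN_TAGS = {"title", "item"}  # elements this reader understands (hypothetical)


def read_instance(xml_text):
    """Return (recognized, skipped): (tag, text) pairs, and the tags left aside."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version not in KNOWN_VERSIONS:
        # Stand-in for the UI warning: report the problem, then keep going.
        warnings.warn(f"unknown format version {version!r}, "
                      "unrecognized elements are skipped")
    recognized, skipped = [], []
    for child in root:
        if child.tag in KNOWN_TAGS:
            recognized.append((child.tag, child.text))
        else:
            skipped.append(child.tag)  # ignored, never a fatal error
    return recognized, skipped


NEWER = ('<list version="2.0"><title>Shopping</title>'
         '<priority>high</priority><item>milk</item></list>')
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    recognized, skipped = read_instance(NEWER)
print(recognized, skipped, len(caught))
```

<p>The reader keeps what it understands from the newer instance and reports
the rest, instead of rejecting the whole document.</p>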

<h3>Other design parts:</h3>

<p>While defining your vocabulary, try to think in terms of other uses of your
data, for example how XSLT stylesheets could be used to make an HTML
view of your data, or to convert it into a different format. Looking at XML
Schemas, and at defining an XML Schema with more complete validation and
datatyping of your data structures, is also important: this helps avoid
some mistakes in the design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example binding a specific processing from a graphic shell
like Nautilus to an instance of your data) then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project. See the next section about this. In practice this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows tools to
recognize your elements and disambiguate their processing. Uniqueness of the
namespace name can, for the most part, be guaranteed by the use of the DNS
registry. A namespace can also be used to carry versioning information, as
in:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use them is to make them the default namespace on the
root element of the XML instance like:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements like data are in
the given namespace.</p>
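<p>A minimal sketch of what the pair of namespace name and element name looks
like to an application, here with Python's standard
<code>xml.etree.ElementTree</code> module, which reports each element as
<code>{namespace}name</code>. The <code>split_name</code> helper is
introduced only for this illustration.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://www.gnome.org/project/projectname/1.0/"
DOC = f'<structure xmlns="{NS}"><data>...</data></structure>'


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into a (namespace, local) pair."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


root = ET.fromstring(DOC)
print(split_name(root.tag))     # the root element is in the namespace
print(split_name(root[0].tag))  # and so is its descendant
# A lookup by bare name finds nothing: the element is only known by its pair.
print(root.find("data"))
print(root.find(f"{{{NS}}}data") is not None)
```

<p>Code processing such documents therefore has to match on the namespace name
as well as on the local name.</p>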

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web there is a natural synergy between the two. XML was designed to
be available on the Web, and keeping the infrastructure that way helps
deploying XML resources. The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource. The canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
there at all, and that the processing should still work if there is no local
copy (provided that the machine doing the processing is connected to the
Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it but always use the canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                         "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                         "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within that server space, reserve the subtree where you
    intend to keep those data</li>
  <li>version the URL so that multiple concurrent versions of the resources
    can be hosted simultaneously</li>
</ul>
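<p>As a rough aid, a project could check its instances for references that
use a local name instead of a canonical URL. The sketch below is one
possible check, not an established tool: the
<code>uses_canonical_url</code> name and the choice to accept only absolute
<code>http</code> or <code>https</code> URLs are assumptions made for the
illustration.</p>

```python
from urllib.parse import urlsplit


def uses_canonical_url(reference):
    """True if the reference is an absolute http(s) URL rather than a local path."""
    parts = urlsplit(reference)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The two DocBook system identifiers discussed in this section.
print(uses_canonical_url("/usr/share/xml/docbook/4.2/docbookx.dtd"))
print(uses_canonical_url("http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
```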

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows the XML processing
tools to use a local copy of a resource if it is available, even if the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code> or
defined by the user). They are a tree of XML documents defining the mappings
between the canonical naming space and the locally installed copies; this can
be seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource it will
automatically test for a locally available version in the catalog, starting
from the root catalog, and possibly fetching sub-catalog resources until it
determines whether the catalog has that resource or not. If not, the default
processing of fetching the resource from the Web is done, which in most
cases allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog"></p>

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a 2-level hierarchical cache,
the root catalog containing only "delegates" indicating a separate subcatalog
dedicated to the project. The goal is to keep the root catalog clean and
simplify the maintenance of the catalog by using separate catalogs per
project. For example, when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference corresponding to them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or on a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses a
PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the given
subcatalog. Similarly, the second and third entries give the delegation
rules for SYSTEM identifiers (as used in a DOCTYPE) and for normal URI
references when the URL starts with the
<code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
which indicates the location on the W3C server where the XHTML1 resources are
stored. This is the beginning of all Canonical URLs for XHTML1 resources.
Those three rules are sufficient in practice to capture all references to XHTML1
resources and direct the processing tools to the right subcatalog.</p>
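<p>The delegation step can be sketched as a simple prefix match. This is a
simplified model written for illustration, not the algorithm of any
particular catalog implementation: the <code>delegated_catalogs</code> name
and the tuple layout are assumptions, and a real resolver also has to order
competing delegates.</p>

```python
# The three root catalog entries described above: (kind, start string, subcatalog).
SUBCATALOG = "file:///usr/share/sgml/xhtml1/xmlcatalog"
ROOT_DELEGATES = [
    ("public", "-//W3C//DTD XHTML 1.0", SUBCATALOG),
    ("system", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
    ("uri", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
]


def delegated_catalogs(kind, identifier):
    """Return the subcatalogs whose start string is a prefix of the identifier."""
    return [catalog for entry_kind, start, catalog in ROOT_DELEGATES
            if entry_kind == kind and identifier.startswith(start)]


# An XHTML1 PUBLIC identifier is handed over to the XHTML1 subcatalog.
print(delegated_catalogs("public", "-//W3C//DTD XHTML 1.0 Strict//EN"))
# A DocBook system identifier matches no delegate: this is a catalog miss,
# so the resource would be fetched from its canonical URL.
print(delegated_catalogs(
    "system", "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
```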

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource: it points to the DTD using Canonical URLs, and
    the root element defines a namespace (but based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules: the first 3 are direct mappings for the 3
    PUBLIC identifiers defined by the XHTML1 specification, associating
    them with the local resource containing the DTD; the last 2 are
    rewrite rules which build the local filename for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
    keeping the same structure as the on-line server at the Canonical URL</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), the base being the URL of the containing
    subcatalog, which means that in practice the copy of the XHTML1 strict DTD
    is stored locally in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>
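<p>How the relative references are resolved against the subcatalog URL can be
checked with a small sketch based on Python's standard library. It models
only the <code>public</code> and <code>rewriteSystem</code> entries of an
abbreviated copy of the subcatalog; the <code>resolve_public</code> and
<code>resolve_system</code> helpers are assumptions made for the
illustration, not part of any catalog API.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Where the subcatalog is installed; relative references resolve against it.
CATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"
CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
# Abbreviated copy of the XHTML1 subcatalog (one public entry, one rewrite rule).
CATALOG_XML = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)


def resolve_public(public_id):
    """Map a PUBLIC identifier to a local URL, or None on a catalog miss."""
    for entry in catalog.iter(CATALOG_NS + "public"):
        if entry.get("publicId") == public_id:
            return urljoin(CATALOG_URL, entry.get("uri"))
    return None


def resolve_system(system_id):
    """Apply the rewriteSystem rules to a SYSTEM identifier, or None on a miss."""
    for entry in catalog.iter(CATALOG_NS + "rewriteSystem"):
        start = entry.get("systemIdStartString")
        if system_id.startswith(start):
            prefix = urljoin(CATALOG_URL, entry.get("rewritePrefix"))
            return prefix + system_id[len(start):]
    return None


print(resolve_public("-//W3C//DTD XHTML 1.0 Strict//EN"))
print(resolve_system("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
print(resolve_public("-//OASIS//DTD DocBook XML V4.2//EN"))
```

<p>Both lookups lead to the same local file, because the rewrite rule keeps
the layout of the on-line server below the rewritten prefix.</p>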

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. The catalog files, being XML
resources, should be processed with XML-based tools to avoid problems with the
generated files. The xmlcatalog command coming with libxml2 allows you to create
catalogs, and add or remove rules at that time. Here is a complete example
coming from the post-install script of the RPM for the XHTML1 DTDs. While this
example is platform and packaging specific, it can be useful as an example in
other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
fi

if [ -w $ROOTCATALOG ]
then
        /usr/bin/xmlcatalog --noout --add "delegatePublic" \
                "-//W3C//DTD XHTML 1.0" \
                "file://$CATALOG" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --add "delegateSystem" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --add "delegateURI" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" $ROOTCATALOG
fi</pre>

<p>The XHTML1 subcatalog is not created on-the-fly in that case; it is
installed as part of the files of the package. So the only work needed is to
make sure the root catalog exists and register the delegate rules.</p>

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w $ROOTCATALOG ]
    then
            /usr/bin/xmlcatalog --noout --del \
                    "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
    fi
fi</pre>
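
<p>Note the test against $1: RPM passes the number of instances of the
package that remain installed, which is 0 only on a real removal, so this
test is needed to avoid removing the delegate rules when the package is
upgraded.</p>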
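<p>For readers who want to see what the two scripts do to the root catalog,
here is a minimal in-memory model of the registration and removal steps. It
is only a sketch for understanding: the <code>add_delegate</code> and
<code>remove_delegates</code> helpers are hypothetical, and packaging scripts
should keep using the xmlcatalog command as shown above.</p>

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
# Attribute carrying the start string for each kind of delegate element.
START_ATTR = {
    "delegatePublic": "publicIdStartString",
    "delegateSystem": "systemIdStartString",
    "delegateURI": "uriStartString",
}


def add_delegate(root, kind, start, catalog):
    """Append a delegate entry unless an identical one is already present."""
    tag = f"{{{CATALOG_NS}}}{kind}"
    for entry in root.findall(tag):
        if entry.get(START_ATTR[kind]) == start and entry.get("catalog") == catalog:
            return
    ET.SubElement(root, tag, {START_ATTR[kind]: start, "catalog": catalog})


def remove_delegates(root, start):
    """Remove every delegate entry whose start string equals start."""
    for entry in list(root):
        kind = entry.tag.split("}", 1)[-1]
        if entry.get(START_ATTR.get(kind, "")) == start:
            root.remove(entry)


root = ET.Element(f"{{{CATALOG_NS}}}catalog")  # an empty root catalog
sub = "file:///usr/share/sgml/xhtml1/xmlcatalog"
# Install: register the three delegates; a repeated call adds nothing.
add_delegate(root, "delegatePublic", "-//W3C//DTD XHTML 1.0", sub)
add_delegate(root, "delegateSystem", "http://www.w3.org/TR/xhtml1/DTD", sub)
add_delegate(root, "delegateURI", "http://www.w3.org/TR/xhtml1/DTD", sub)
add_delegate(root, "delegateURI", "http://www.w3.org/TR/xhtml1/DTD", sub)
installed = len(root)
# Removal: drop the rules again, leaving the root catalog as it was.
remove_delegates(root, "-//W3C//DTD XHTML 1.0")
remove_delegates(root, "http://www.w3.org/TR/xhtml1/DTD")
print(installed, len(root))
```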

<p>Following the set of guidelines and tips provided in this document should
help deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>