XInclude Analysis

From zedwiki

How XInclude works

From a syntax perspective, XInclude is very intuitive. If you want to insert the content from "includedDoc.xml" into your current document, you just use the following element:

<xi:include href="includedDoc.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />

There are a lot of other options and complications, but for a simple document inclusion, that's all there is to it.

XInclude and ZedAI schemas

What may not be as intuitive is that the XInclude operation takes place on XML Information Sets. The XInclude processor parses the source document and, when it finds an xi:include element, parses the referenced external document and merges it into the result infoset. What this means for us is that we don't need to add xi:include to our content models: the inclusion is transparent to schema validation, which will take place on the result infoset.
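The merge described above can be observed with a small JAXP program. This is a minimal sketch, not the sample code from the repository: the class name, the book/chapter/p element names, and the temporary files are all hypothetical and chosen only for illustration. It parses a two-file document with an XInclude-aware DOM parser and counts what is left in the result.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XIncludeDomSketch {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    /** Writes a hypothetical two-file sample to a temp directory and returns the merged DOM. */
    static Document parseMergedSample() {
        try {
            Path dir = Files.createTempDirectory("xinclude-dom");
            Files.write(dir.resolve("includedDoc.xml"),
                    "<chapter><p>Included text</p></chapter>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                    ("<book xmlns:xi=\"" + XI_NS + "\"><xi:include href=\"includedDoc.xml\"/></book>")
                            .getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // xi:include is recognized by its namespace
            factory.setXIncludeAware(true);  // ask the parser to perform the inclusion
            return factory.newDocumentBuilder().parse(main.toFile());
        } catch (Exception e) {
            throw new IllegalStateException("Sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseMergedSample().getDocumentElement();
        // Number of xi:include elements remaining in the result
        System.out.println(root.getElementsByTagNameNS(XI_NS, "include").getLength());
        // Number of chapter elements merged in from the included file
        System.out.println(root.getElementsByTagName("chapter").getLength());
    }
}
```

Code that walks the resulting DOM sees only the merged content, which is why the content models do not need to mention xi:include.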

Because XInclude operates on XML infosets, the included documents must be well-formed, and hence must have a single top-level element. This limits the ways in which documents can be broken up into multiple files, but that should not pose any serious issues.

The XInclude spec requires that an xml:base attribute be added to the top-level included element whenever its base URI differs from that of the including element, in order to keep relative URIs correct. This would require us to add xml:base to our set of core attributes so that the merged infoset will validate after inclusion. This is a relatively minor change. XHTML2 has an open issue relating to XInclude support, and I wonder if they might do this themselves.
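The effect of base URI fixup can be checked directly. The sketch below is an illustration under assumptions: the class name and sample files are hypothetical, and the feature URI is a Xerces-specific switch (also recognized by the parser bundled with the JDK) rather than part of JAXP or the XInclude spec. Other parsers may reject it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlBaseFixupSketch {
    // Xerces-specific feature; an assumption about the parser in use, not part of the spec.
    static final String FIXUP_BASE_URIS =
            "http://apache.org/xml/features/xinclude/fixup-base-uris";

    /** Returns true if the included top-level element carries xml:base after the merge. */
    static boolean includedRootHasXmlBase(boolean fixupBaseUris) {
        try {
            Path dir = Files.createTempDirectory("xinclude-base");
            Files.write(dir.resolve("includedDoc.xml"),
                    "<chapter><p>Included text</p></chapter>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                    ("<book xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                            + "<xi:include href=\"includedDoc.xml\"/></book>")
                            .getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setXIncludeAware(true);
            // Throws ParserConfigurationException if the parser does not know this feature.
            factory.setFeature(FIXUP_BASE_URIS, fixupBaseUris);

            Document doc = factory.newDocumentBuilder().parse(main.toFile());
            Element chapter = (Element) doc.getElementsByTagName("chapter").item(0);
            return chapter.hasAttribute("xml:base");
        } catch (Exception e) {
            throw new IllegalStateException("Sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fixup on:  " + includedRootHasXmlBase(true));
        System.out.println("fixup off: " + includedRootHasXmlBase(false));
    }
}
```

With fixup left on, a schema that does not allow xml:base on the included element would reject the merged document, which is the motivation for adding the attribute to the schema.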

XInclude and processing performance

Because the merging of included documents into the resulting infoset takes place at parse time, XInclude will not solve all of the performance problems caused by large document sizes. Any process that operates on the entire book will still need to work on the entire result infoset, and hence will presumably have the same performance issues. The authoring process may be helped, however, because the chunks can be edited independently. They will not validate against the schema (because they do not use html as the document element), but an XML editor can still use the schema to flag other errors, suggest elements, etc. (I used Oxygen to create sample documents, and this worked great.)

Support for XInclude

There is support for XInclude in various XML parsing libraries. JAXP supports it for SAX and DOM but not for StAX (apparently it is not difficult to build a filter that adds it for StAX). There is a .NET implementation from the Mvp.Xml project. There are probably other implementations out there as well.
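For the SAX side of JAXP, the switch is the same setXIncludeAware call on the factory. The following is a minimal sketch with a hypothetical class name and sample files; it records the element names reported to a handler, so that the included content shows up as ordinary SAX events.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class XIncludeSaxSketch {
    /** Parses a hypothetical two-file sample and returns element names in document order. */
    static List<String> elementNamesSeen() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sax");
            Files.write(dir.resolve("includedDoc.xml"),
                    "<chapter><p>Included text</p></chapter>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                    ("<book xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                            + "<xi:include href=\"includedDoc.xml\"/></book>")
                            .getBytes(StandardCharsets.UTF_8));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setXIncludeAware(true);

            List<String> names = new ArrayList<>();
            factory.newSAXParser().parse(main.toFile(), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    names.add(qName); // record each element as the parser reports it
                }
            });
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNamesSeen());
    }
}
```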

Conclusions and recommendations

Based on the above information, XInclude meets the criteria for a good solution to the problem of authoring large ZedAI documents. It is standards-based, has support in current tools, and is flexible enough to allow a wide variety of document-partitioning schemes. Therefore, the ZedAI standard should be crafted to support the use of XInclude.

It appears that the only step that need be taken to support XInclude is to add the xml:base attribute to our set of common attributes so that it can appear on all top-level elements in the included files, as is required by the XInclude specification. The other attribute that may be added by XInclude processors is xml:lang, but we already allow this everywhere so no change is needed.

Issues for discussion

  1. Do we need to explicitly state anything about XInclude in the ZedAI specification itself? Because XInclude is, in effect, a preprocessing step for parsing, it will be transparent to programs that process ZedAI documents. Therefore, there is no pressing need to reference it in the specification. Other options for promoting XInclude include an informative note that is not part of the standard itself that points to XInclude as an authoring solution, or information about this in the Structure Guidelines or similar "best practices" documentation.
  2. Is this a sufficient response to the complaints about having to contain books within a single document? It solves authoring issues, but not infoset processing issues. Many (most? all?) of these are playback (distribution) issues, however, and hence are not a concern of ZedAI.

Sample documents and code

The following examples are in the sandbox of the ZedAI code repository at GoogleCode:

The source document validates in Oxygen when XInclude support is turned on (via the XML->XML Parser tree of the Options->Preferences menu item).

  • CloneDocument.xsl is a simple transform that just outputs its input. If you apply this to the sample source document using a parser that supports XInclude, you should get the merged document as output. This works in Oxygen, for example, when XInclude support is turned on.
  • XIncludeTester.java: Java code to apply an XSLT transform to an input document, using JAXP with XInclude support. Note that I have turned off the addition of @xml:base so that the output will validate against our schema. Some XInclude processors include this option, but it is outside the spec itself.
  • CS_XIncludeTester.zip: C# source code for a command-line application that does the same thing as the Java sample above.
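For readers without access to the repository, the general approach described for the Java sample can be sketched as follows. This is not the repository code: the class name, the inline identity stylesheet (standing in for CloneDocument.xsl, whose actual contents are not shown here), and the sample files are all hypothetical, and the fixup-base-uris feature is a Xerces-specific assumption about the parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class XIncludeTransformSketch {
    // A standard XSLT 1.0 identity transform, used here as a stand-in stylesheet.
    static final String IDENTITY_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"@*|node()\">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template></xsl:stylesheet>";

    /** Merges a hypothetical two-file sample via XInclude, then serializes it through XSLT. */
    static String transformMergedSample() {
        try {
            Path dir = Files.createTempDirectory("xinclude-xslt");
            Files.write(dir.resolve("includedDoc.xml"),
                    "<chapter><p>Included text</p></chapter>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                    ("<book xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                            + "<xi:include href=\"includedDoc.xml\"/></book>")
                            .getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setXIncludeAware(true);
            // Xerces-specific: suppress the added xml:base so the output matches the schema.
            factory.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
            Document merged = factory.newDocumentBuilder().parse(main.toFile());

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(merged), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Sample could not be transformed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformMergedSample());
    }
}
```

Parsing to a DOM first keeps the XInclude step under the control of the DocumentBuilderFactory, so the transform itself never sees an xi:include element.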