ZedAI: A Discussion on Inline Content Models
From zedwiki
Introduction
This page discusses which content model to use for block elements such as x:p, x:li, x:td, x:th, d:note and x:caption.
I discuss the pros and cons of various models and present my view on what the content model for these elements should be.
Two very different content models
The following sections present the two most extreme (but not so extreme that they are irrelevant) models for a block of text.
The very loose model
In the very loose model, text may be freely mixed with various elements, some representing inline elements and others representing block elements. The following is an example:
<p>
In this paragraph we have some <em>special words</em> words<d:noteref ref="n0003">3</d:noteref>.
<d:s>The following two lines capture the spirit of this paragraph</d:s>
<l>It contains some text that seems a bit artificial and <d:w xml:lang="no">oppkonstruert</d:w>.</l>
<l>It contains markup that seems to be a bit messy, but that actually could be appropriate in a real-world case.</l>
<d:pagebreak value="34" />
As we have now started a new page, there is room for a list, which states that ...
<ul>
<li>... the list has two list items.</li>
<li>... this is the last one.</li>
</ul>
Still some available space on this page, so we will have a graph illustrating something:
<object src="graph.png">
A graph showing that ...
</object>
One could argue that an image of some kind should <em>not</em> form the part of a paragraph,
but the same thing could be said about the list...
</p>
Some properties of a very loose content model:
- A person doing the markup might consider this a very good model, as you simply add whatever markup is required, no more and no less, to give the text the proper semantics.
- Writing good transformation rules from the ZedAI file to the various distribution formats becomes more difficult. For example, when creating a talking book from the ZedAI file, you might want sentence detection, so that every sentence is marked up as such in the content document of the DTB. This would be rather difficult to achieve with a loose model.
- Markup based on this model does not look very clean.
The very tight model
The very tight model assumes that a paragraph consists of a set of sentences, and requires that every sentence is marked up properly:
<p>
<d:s>The very tight model assumes that a paragraph consists of a set of sentences,
and requires that every sentence is marked up properly.</d:s>
<d:s>Inside the sentence, several other elements are allowed.</d:s>
<d:s>So if you have to, you could <em>mark up some text as italic</em>, and
you could add a reference to a footnote.<d:noteref ref="fn05">5</d:noteref></d:s>
</p>
The problem with this model is that the assumption is wrong. Paragraphs and other block elements normally contain more than just a set of sentences. A pagebreak element is just one example.
Obviously, a very tight model makes for very predictable markup. This simplifies the transformation from the ZedAI file to distribution formats. On the other hand, this kind of markup gets very verbose.
The needs of the real world
Fortunately, we don't have to choose between these two models, or try to define some kind of intermediate model.
Because Relax NG supports choices between patterns, we can define a set of basic (sub) content models and then make all of them available for the markup of a block of text.
What follows is a presentation of typical needs for markup of a piece of text.
Sub-model 1: Text only
One possible content model is the plain text model. Very often, all that is needed is to identify a part of the text as, say, a list item, with no further granularity:
<li>
They burn less fuel than a gasoline engine performing the same work, due to the engine's high efficiency and diesel fuel's higher energy density than gasoline.
</li>
<li>
Diesel fuel is considered safer than gasoline in many applications. Although diesel fuel will burn in open air using a wick, it will not explode and does not release a large amount of flammable vapour.
</li>
Sub-model 2: Text mixed with a small set of child elements
Often, you will also need to add markup to small parts of the text, ranging from one character to a few words:
<caption>Carbon dioxide (CO<sub>2</sub>) and global warming</caption>
<p>
A typical sound reinforcement system consists of three primary parts: <em>input transducers</em>,
which convert sound energy into an audio signal, <em>signal processors</em> which alter the audio
signal characteristics, and <em>output transducers</em>, which convert the electrical audio signal
into sound energy.
</p>
Only certain kinds of elements can be used like this, without any further markup of the context they are placed in. The following is a very restricted list of elements that may be used as children of block elements:
- a
- em
- d:noteref
- d:pagebreak
- sub
- sup
There may be good reasons to extend this list with other elements, such as abbr and cite, so a discussion is needed to define the set of allowed children.
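As a rough illustration, sub-model 2 could be written as a Relax NG `mixed` pattern over the restricted element list. This is only a sketch under assumptions. The namespace URI bound to `d:` is a placeholder. Unprefixed elements are treated as being in no namespace. The attributes are guessed from the examples on this page (`href` on `a` does not appear anywhere and is purely hypothetical). Each inline element is assumed to hold text only.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:d="http://example.org/ns/zedai-d">
  <!-- The d: namespace URI above is a placeholder, not the real ZedAI one. -->

  <start>
    <ref name="caption"/>
  </start>

  <!-- One block element, used here only to exercise the pattern. -->
  <define name="caption">
    <element name="caption">
      <ref name="submodel2"/>
    </element>
  </define>

  <!-- Sub-model 2: text freely interleaved with the restricted inline set.
       Sub-model 1 (text only) is the special case with no child elements. -->
  <define name="submodel2">
    <mixed>
      <zeroOrMore>
        <ref name="inline"/>
      </zeroOrMore>
    </mixed>
  </define>

  <!-- The six elements from the list above; content and attributes are assumed. -->
  <define name="inline">
    <choice>
      <element name="a"><optional><attribute name="href"/></optional><text/></element>
      <element name="em"><text/></element>
      <element name="sub"><text/></element>
      <element name="sup"><text/></element>
      <element name="d:noteref"><attribute name="ref"/><text/></element>
      <element name="d:pagebreak"><attribute name="value"/><empty/></element>
    </choice>
  </define>
</grammar>
```

Under these assumptions, the caption example with `<sub>2</sub>` would be accepted, and extending the list with abbr or cite would mean adding one more branch to the `inline` choice.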
Sub-model 3a: Children only (no block elements)
Often, there will be a need to clarify the structure of a block of text with additional markup, typically to identify the sentences that make up the block.
<p>
<d:s>Many types of input transducers can be found in a sound reinforcement system,
with microphones being the most commonly used input device.</d:s>
<d:s>They can be classified according to <x:em>their method of transduction</x:em>,
<x:em>their pickup (or polar) pattern</x:em> or <x:em>their functional application</x:em>.</d:s>
<d:pagebreak value="103" />
<d:s>Most microphones used in sound reinforcement are either dynamic or condenser microphones.</d:s>
</p>
Sub-model 3b: Children only (inline and block elements)
This content model permits block elements inside another block of text; the typical example would be a list inside a paragraph:
<p>
<d:s>Other less common power sources include:</d:s>
<ul>
<li>
<d:s>Electric motors, often linked to solar panels to create a solar-powered aircraft.</d:s>
<object src="solarpanel.png">
The image shows a medium sized solar panel, attached to the wings of the aircraft.
</object>
</li>
<li>Rubber bands, wound many times to store energy, are mostly used for flying models.</li>
</ul>
</p>
Note that, with this content model, using an object element in the first list item requires the text in that list item to be wrapped inside a d:s element.
A proposed content model for a block of text
The markup of a block of text must be valid according to one of ...
- ... sub-model 1, with no markup inside the text
- ... sub-model 2, with text mixed with the following elements
  - a
  - em
  - d:noteref
  - d:pagebreak
  - sub
  - sup
- ... sub-model 3b, allowing the following elements as children of the block
  - blockquote
  - d:s
  - l
  - d:pagebreak
  - ol
  - ul
All markup examples above, except the very first, will be valid according to this content model, as they are all valid according to at least one of the sub-models.
The first example is not valid as the proposed content model disallows a mix of text and several of the children used inside the p element.
Note that I (a bit reluctantly) choose sub-model 3b instead of 3a. I am not a big fan of putting block elements inside other block elements. However, the alternative is to put something that is grammatically bound closely together into separate elements, and for that reason I suggest that blockquotes and lists may be located inside a block of text.
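The proposal could be sketched in Relax NG roughly as follows: a block's content is a `choice` between mixed inline content (sub-models 1 and 2) and element-only content (sub-model 3b). This is an illustration, not a definitive schema. The pattern names, the `d:` namespace URI, the attributes and the content of d:s, l and blockquote are all assumptions. The object element is left out because it does not appear in the proposed list, even though the sub-model 3b example uses it inside a list item.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:d="http://example.org/ns/zedai-d">
  <!-- The d: namespace URI above is a placeholder, not the real ZedAI one. -->

  <start>
    <choice>
      <ref name="p"/>
      <ref name="list"/>
    </choice>
  </start>

  <!-- Two of the block elements; both share the same content model. -->
  <define name="p">
    <element name="p"><ref name="block.content"/></element>
  </define>
  <define name="li">
    <element name="li"><ref name="block.content"/></element>
  </define>

  <!-- A block is EITHER mixed inline content OR element-only content. -->
  <define name="block.content">
    <choice>
      <ref name="inline.content"/>
      <ref name="children.only"/>
    </choice>
  </define>

  <!-- Sub-models 1 and 2: text, optionally mixed with the inline set. -->
  <define name="inline.content">
    <mixed>
      <zeroOrMore>
        <choice>
          <element name="a"><optional><attribute name="href"/></optional><text/></element>
          <element name="em"><text/></element>
          <element name="sub"><text/></element>
          <element name="sup"><text/></element>
          <element name="d:noteref"><attribute name="ref"/><text/></element>
          <ref name="pagebreak"/>
        </choice>
      </zeroOrMore>
    </mixed>
  </define>

  <!-- Sub-model 3b: no loose text, only the listed child elements.
       The content of d:s, l and blockquote is assumed for this sketch. -->
  <define name="children.only">
    <oneOrMore>
      <choice>
        <element name="d:s"><ref name="inline.content"/></element>
        <element name="l"><ref name="inline.content"/></element>
        <element name="blockquote">
          <oneOrMore>
            <element name="d:s"><ref name="inline.content"/></element>
          </oneOrMore>
        </element>
        <ref name="pagebreak"/>
        <ref name="list"/>
      </choice>
    </oneOrMore>
  </define>

  <define name="pagebreak">
    <element name="d:pagebreak"><attribute name="value"/><empty/></element>
  </define>

  <define name="list">
    <choice>
      <element name="ol"><oneOrMore><ref name="li"/></oneOrMore></element>
      <element name="ul"><oneOrMore><ref name="li"/></oneOrMore></element>
    </choice>
  </define>
</grammar>
```

With a grammar of this shape, a paragraph that mixes loose text with a ul matches neither branch of `block.content`, which is the behaviour the proposal asks for. Whitespace between child elements in the element-only branch is not treated as text by Relax NG, so pretty-printed markup is unaffected.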
What is a "block of text"?
By a "block of text", I mean ...
- ... a paragraph (p)
- ... a list item (li)
- ... a definition (dd)
- ... an entry in a table (td and th)
- ... a caption (caption)
- ... a note (d:note)
- ... an annotation (d:annotation)
It is not obvious that all these elements should have the same content model, so we may need to have a discussion on that. For example, the caption element should perhaps not be allowed to contain a d:pagebreak or a table. Also, we don't want a paragraph to contain a paragraph, but it could be nice to split the content of a list item or a table cell into paragraphs.
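If the discussion concludes that some elements need a different model, the shared parts can still be reused. The sketch below shows one hypothetical way to give caption the same inline set as a paragraph but without d:pagebreak. As before, the pattern names, the `d:` namespace URI and the attributes are assumptions made for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:d="http://example.org/ns/zedai-d">
  <!-- Placeholder namespace URI for d:, assumed for this sketch only. -->

  <start>
    <choice>
      <ref name="p"/>
      <ref name="caption"/>
    </choice>
  </start>

  <!-- Inline elements shared by both blocks in this sketch. -->
  <define name="inline.basic">
    <choice>
      <element name="a"><optional><attribute name="href"/></optional><text/></element>
      <element name="em"><text/></element>
      <element name="sub"><text/></element>
      <element name="sup"><text/></element>
      <element name="d:noteref"><attribute name="ref"/><text/></element>
    </choice>
  </define>

  <define name="pagebreak">
    <element name="d:pagebreak"><attribute name="value"/><empty/></element>
  </define>

  <!-- A paragraph may contain page breaks among its inline content. -->
  <define name="p">
    <element name="p">
      <mixed>
        <zeroOrMore>
          <choice>
            <ref name="inline.basic"/>
            <ref name="pagebreak"/>
          </choice>
        </zeroOrMore>
      </mixed>
    </element>
  </define>

  <!-- A caption reuses the same inline set but leaves out d:pagebreak. -->
  <define name="caption">
    <element name="caption">
      <mixed>
        <zeroOrMore><ref name="inline.basic"/></zeroOrMore>
      </mixed>
    </element>
  </define>
</grammar>
```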
Note that ...
- blockquote is not on the list, as I would expect a quote to consist only of one or more sentences, so it will need a tighter model than the one described above. We also want to avoid nested blockquotes.
- blockcode is not on the list, as the content of a blockcode should represent a piece of code, and should require no markup at all.
Invalid markup according to this model
The following examples are all invalid according to the proposed content model. If you find that these examples represent markup that is natural to use, then my suggested content model is in trouble.
<p>
In Norway, the typical process of saying "Hello" or "Good morning" in an informal way, would be to
smile and say <span xml:lang="no">"Hei"</span> or <span xml:lang="no">"God morgen"</span>.
Obviously, the process is not very different from the way it is done in other corners of the World.
</p>
The span element cannot be mixed with text in this way. It would probably be better to use the q element.
<ol>
<li>Fill a kettle with water.</li>
<li>Add
<ul>
<li>carrots</li>
<li>honey</li>
<li>mushrooms</li>
</ul>
</li>
<li>Boil and stir for fifteen minutes.</li>
</ol>
Plain text and lists cannot be mixed inside a list item.
<p>
Mainland Spain is dominated by high plateaus and mountain ranges, such as the Sierra Nevada.
Running from these heights are several major rivers such as the Tagus, the Ebro,
the Duero, the Guadiana and the Guadalquivir.
Alluvial plains are found along the coast, the largest of which is that of the Guadalquivir in Andalusia.
<object src="spain02.png">
Spanish countryside in Huesca, Aragon in northeastern Spain.
</object>
</p>
Plain text and objects cannot be mixed inside a paragraph.
<caption>The Birth of the <em>World Wide Web Consortium</em> (<abbr full="#w3c">W3C</abbr>)</caption>
The abbr element is not allowed as a child of the caption element.
<td>
<div xml:id="dt002-match">Liverpool FC vs. Arsenal</div>
<div xml:id="dt002-date">2003-07-05</div>
<div xml:id="dt002-result">2-1</div>
</td>
A td element may not have div elements as children.
