XHTML Grammar Overview
There are about 100 elements in the XHTML Document Type family. Here, only a subset will be introduced.
Refer to the Read More section for more details.
If you have not yet reviewed the XML Syntax Introduction section, you are advised to do so before proceeding.
- Root element
- Children of root
- The XHTML DTD defines that the root element <html> can have only two children:
<html> <head>...</head> <body>...</body> </html>
The <head> element contains data (children), such as meta information, that is not necessarily presented to the user of the completed document.
The <body> element contains all of the document data/text. The text of the document occurs in different elements, all of which are children of <body>.
According to the XHTML DTD, <head> must come before <body>; thus the following example is invalid:
<html> <body>...</body> <head>...</head> </html>
but the following is valid:
<html> <head>...</head> <body>...</body> </html>
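The ordering rule above can be checked mechanically. The following is a minimal Python sketch, not a DTD validator: it uses the standard library's xml.etree.ElementTree, the helper name has_valid_root_children is made up for illustration, and the sample documents are assumed to carry no namespace declaration.

```python
import xml.etree.ElementTree as ET


def has_valid_root_children(markup: str) -> bool:
    """Return True if the root is <html> with exactly <head> then <body>.

    Hypothetical helper: it checks only the rule described above,
    not the rest of the XHTML DTD.
    """
    root = ET.fromstring(markup)
    child_names = [child.tag for child in root]
    return root.tag == "html" and child_names == ["head", "body"]


valid = "<html><head><title>t</title></head><body><p>x</p></body></html>"
invalid = "<html><body><p>x</p></body><head><title>t</title></head></html>"

print(has_valid_root_children(valid))
print(has_valid_root_children(invalid))
```

A real production check would validate against the DTD itself; this sketch only makes the head-before-body rule concrete.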
The <title> element contains the document title.
The text node of <title> will not be displayed as a part of the document itself, but may be displayed or used in other ways, such as in the caption bar of the browser window.
<meta /> elements contain meta level information about the document.
Since the meta element is empty, the information is contained within attributes instead of text nodes.
... <head> <title>A Farewell To Arms</title> <meta name='dc:title' content='A Farewell To Arms' /> <meta name='dc:creator' content='Ernest Hemingway' /> </head> ...
For more information on metadata, refer to the DTB Metadata Section.
XHTML allows six heading levels. Each level increment describes one additional step down the hierarchical structure axis. The element names to be used are <h1>, <h2>, <h3>, <h4>, <h5> and <h6>.
In DAISY DTBs, heading levels must not be omitted; that is, in the sequence of levels, a level must not be "skipped".
The following example is forbidden:
<h1>Chapter 1</h1> <p>Paragraph text</p> <h3>Chapter 1.1.1</h3> <p>Paragraph text</p>
The following example is correct:
<h1>Chapter 1</h1> <p>Paragraph text</p> <h2>Chapter 1.1</h2> <p>Paragraph text</p>
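The "no skipped level" rule lends itself to an automated check. The sketch below is one possible approach in Python, assuming headings are written as plain <h1> to <h6> start tags; the function name find_skipped_headings is hypothetical, and a regular expression is used only because the fragments above have no single root element.

```python
import re


def find_skipped_headings(markup: str) -> list[tuple[int, int]]:
    """Return (previous_level, level) pairs where a level was skipped.

    Hypothetical helper: stepping down by more than one level
    (for example h1 directly to h3) counts as a skip; stepping
    back up by any amount is allowed.
    """
    levels = [int(n) for n in re.findall(r"<h([1-6])[\s>]", markup)]
    skipped = []
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            skipped.append((previous, current))
    return skipped


forbidden = "<h1>Chapter 1</h1> <p>Paragraph text</p> <h3>Chapter 1.1.1</h3> <p>Paragraph text</p>"
correct = "<h1>Chapter 1</h1> <p>Paragraph text</p> <h2>Chapter 1.1</h2> <p>Paragraph text</p>"

print(find_skipped_headings(forbidden))
print(find_skipped_headings(correct))
```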
A very common element in XHTML is the paragraph element.
In XHTML the element name for paragraph is <p>.
XHTML includes two types of lists. The first, unordered, is often called a "bullet list". Syntax in XHTML is:
<ul> <li>list item 1</li> <li>list item 2</li> <li>list item 3</li> </ul>
As shown above, in the unordered list <ul> is the parent of any number of <li> (list item) elements.
The second list type in XHTML, the ordered list, uses the same syntax, but the element name is "ol". This produces numbered list items.
<ol> <li>list item 1</li> <li>list item 2</li> <li>list item 3</li> </ol>
Definition lists are used to define terms and words. Three elements are used in combination:
<dl> definition list
<dt> definition term
<dd> definition data
<dl> <dt>XML</dt> <dd>Abbreviation for eXtensible Markup Language</dd> <dt>XHTML</dt> <dd>An XML DTD, XHTML is an abbreviation for eXtensible HyperText Markup Language</dd> </dl>
As shown above, the definition list <dl> is the parent of any number of paired <dt> and <dd> elements.
The span and div elements are used when no other element in the XHTML DTD is suitable to describe what kind of text the element contains.
For example, in the XHTML DTD there is no element for a "page". Instead, we use the span element, and add a class attribute to describe what the element represents.
There are several class attributes used in Daisy 2.02 to specify element content.
<span class="page-normal">23</span> <span class="page-front">IV</span> <span class="page-special">A-10</span> <span class="noteref">1</span> <div class="notebody">notebody text</div> <span class="sidebar">sidebar text</span>
Note that besides the class attribute values above, which you must use when including these types of text/data in Daisy 2.02 DTBs, you are free to create class attribute values of your own. XHTML specifies element names, but it does not specify class attribute values.
Read more about special element usage in DAISY DTBs in the DAISY XHTML Element Usage Requirements Section.
When you create class attribute values of your own, it is your responsibility to make them semantically meaningful.
Example of custom class attribute values: <span class="sent"> <span class="wrd">This</span> <span class="wrd">is</span> <span class="wrd">a</span> <span class="wrd">sentence</span> <span class="dot">.</span> </span>
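Because the class attribute carries the meaning of a span or div, a program can select elements by class value. The following Python sketch collects the page number spans from a small sample; the function name page_labels and the sample markup are assumptions made for illustration, while the three page class values are the ones listed above.

```python
import xml.etree.ElementTree as ET

# Class values for page numbers, as listed in the text above.
PAGE_CLASSES = {"page-normal", "page-front", "page-special"}


def page_labels(markup: str) -> list[tuple[str, str]]:
    """Return (class_value, text) for every page span, in document order."""
    root = ET.fromstring(markup)
    return [
        (span.get("class"), (span.text or "").strip())
        for span in root.iter("span")
        if span.get("class") in PAGE_CLASSES
    ]


# Hypothetical sample document body.
sample = (
    "<body>"
    '<span class="page-front">IV</span>'
    '<p>Text <span class="noteref">1</span></p>'
    '<span class="page-normal">23</span>'
    "</body>"
)

print(page_labels(sample))
```

The noteref span is skipped because its class value is not one of the page classes.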
Images are included in the document using the img element. Two important attributes are added: src, which is the link (URL) to the image, and alt, which is a short text describing the image.
<img src="flower01.jpg" alt="An image of a flower" />
<img src="http://www.botanica.org/gfx/flower01.jpg" alt="An image of a flower" />
When a short text equivalent does not suffice, provide additional information in a file referenced in the longdesc attribute.
<img src="http://www.botanica.org/gfx/flower01.jpg" alt="An image of a flower" longdesc="/flower01.html" />
The XHTML DTD differentiates between inline and block elements.
Block elements are elements that may contain text nodes or other elements as children.
Inline elements normally contain only text nodes (and possibly other inline elements), NOT nested block elements.
Inline elements may be nested within block elements, but block elements may not be nested within inline elements.
<p>This paragraph has an <em>emphasis</em> element nested inside it </p>
The above example is allowed, because <p> is block and <em> is inline.
But the below example is not allowed (invalid) because <span> is inline and <h1> is block.
<span>This span has a <h1>heading</h1> element nested inside it, which is not allowed. </span>
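The nesting rule can be expressed as a small check. This Python sketch assumes two deliberately incomplete sets of element names (they are not the full lists from the DTD), and the function name block_inside_inline is hypothetical.

```python
import xml.etree.ElementTree as ET

# Small, deliberately incomplete sets chosen for illustration only.
BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dl", "table"}
INLINE = {"span", "em", "strong", "kbd", "q", "a", "code"}


def block_inside_inline(markup: str) -> list[tuple[str, str]]:
    """Return (inline_parent, block_child) pairs found in the markup."""
    root = ET.fromstring(markup)
    problems = []
    for parent in root.iter():
        if parent.tag in INLINE:
            for child in parent:
                if child.tag in BLOCK:
                    problems.append((parent.tag, child.tag))
    return problems


allowed = "<p>This paragraph has an <em>emphasis</em> element nested inside it </p>"
not_allowed = "<span>This span has a <h1>heading</h1> element nested inside it. </span>"

print(block_inside_inline(allowed))
print(block_inside_inline(not_allowed))
```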
Some commonly used inline XHTML elements are:
- <em>: emphasis
<p>It is <em>very</em> important to understand this. </p>
- <strong>: strong emphasis
<p> We <strong>strongly recommend</strong> that you try this at home. </p>
- <kbd>: keyboard (indicates this is a computer keyboard shortcut).
<p> To close the program, you may use the <kbd> Alt+F4 </kbd> keyboard shortcut. </p>
- <q>: inline quote
<p> And then he said <q> I think I understand this. </q> </p>
- <span>: with class attribute
<p>And then he asked <span class='question'> Do you think this is easy? </span> </p>
In most browsers, the visual display differs between inline and block elements; a block element causes a line break, but an inline element does not.
Example: Use of the span element will cause all text nodes to appear on the same line, because span is an inline element.
<div> <span>span element</span> <span>span element</span> </div>
Example: Use of the div element will cause text nodes to appear on one line each, because div is a block element.
<div> <div>div element</div> <div>div element</div> </div>
Below is the result of the above two examples:
span element span element
div element
div element
The table element is used with the following children in combination:
<tr> table row
<td> table data or table cell
<th> table header or column heading
<table summary="Table summary text"> <caption>Table caption</caption> <tr> <th>Column 1</th> <th>Column 2</th> </tr> <tr> <td>Cell 1</td> <td>Cell 2</td> </tr> <tr> <td>Cell 3</td> <td>Cell 4</td> </tr> </table>
Hyperlinks are created using the anchor element (<a>) in combination with the href attribute.
The hyperlink points to another resource, either in the same document or in another document somewhere else on the Web. The current document is the source of the link; the value of the href attribute, a URL, is the target.
<a href="http://www.daisy.org"> DAISY website </a>
The target resource can either be another document, or a fragment of a document.
When fragments are referenced, the targets should consist of an id attribute value.
<a href="news.html#workshop"> DAISY Workshop News </a> [pointing to:] <h3 id="workshop">DAISY Workshops 2003</h3> <p>A workshop is being held in August...</p>
The id attribute (target) value can also be duplicated in an anchor element with a name attribute, for user agent compatibility purposes.
<h3> <a id="workshop" name="workshop"> APCD Workshops 2003 </a> </h3> <p>A workshop is being held in July...</p>
Note: the name attribute was deprecated as a fragment identifier in XHTML 1.0 and removed entirely in XHTML 1.1.
Every document on the Web has a unique address. The document's address is known as its uniform resource locator - URL.
Several XHTML elements include a URL attribute value, including hyperlinks, inline images, and forms. All use the same URL syntax.
DAISY DTBs also use the URL syntax (although sometimes referred to as a URI) to provide linkage information.
- Absolute - pointing to web server on the World Wide Web
- Absolute - pointing to specific document at web server on the World Wide Web
- Relative - pointing to other document at same server
- Relative - pointing to fragment (of an id or name attribute) in other document
- Relative - pointing to fragment (of an id or name attribute) in same document
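The difference between absolute and relative URLs becomes concrete when a relative reference is resolved against the address of the current document. The Python sketch below uses urljoin and urldefrag from the standard library; the base address is a made-up example, and the helper name resolve is hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical address of the document that contains the links.
BASE = "http://www.daisy.org/publications/index.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve an absolute or relative reference against the base URL."""
    return urljoin(base, reference)


# Absolute reference: the base is ignored.
print(resolve("http://www.daisy.org"))
# Relative reference to a fragment in another document on the same server.
print(resolve("news.html#workshop"))
# Relative reference to a fragment in the same document.
print(resolve("#workshop"))

# Splitting a resolved URL into the document part and the fragment part.
document, fragment = urldefrag(resolve("news.html#workshop"))
print(document, fragment)
```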
There are two main cases when certain characters cannot be typed as-is in text nodes or attribute values:
- Their presence would be misinterpreted as markup
- They are not available on the used system/keyboard
To handle this problem, XML uses a construct called character entity references. Such a reference is a "virtual" reference to a certain character.
A character entity reference always begins with the ampersand sign (&) and always ends with the semicolon sign (;).
To cover for the first case above (misinterpretation as markup), XML predefines five character entity references. These are:
- &lt; for the less-than sign, the opening angle bracket (<)
- &amp; for the ampersand (&)
- &gt; for the greater-than sign, the closing angle bracket (>)
- &quot; for the straight double quotation mark (")
- &apos; for the apostrophe, also known as the straight single quote (')
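Replacing these characters by hand is error prone, so it is usually left to a library. The sketch below uses escape and unescape from Python's xml.sax.saxutils, which handle the ampersand and the angle brackets by default; the two quote characters are passed in explicitly. The helper names escape_all and unescape_all are made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quotes must be added explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
QUOTES_REVERSED = {"&quot;": '"', "&apos;": "'"}


def escape_all(text: str) -> str:
    """Replace all five markup-sensitive characters with entity references."""
    return escape(text, QUOTES)


def unescape_all(text: str) -> str:
    """Turn the five predefined entity references back into characters."""
    return unescape(text, QUOTES_REVERSED)


raw = "<em> & 'x' \"y\""
escaped = escape_all(raw)
print(escaped)
print(unescape_all(escaped) == raw)
```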
To cover for the second case above (character not available on the used keyboard), XHTML has defined three sets of named character entity references.
Refer to the XHTML 1 Alphanumeric Character Entities table for a full listing.
Let's say you have a piece of XHTML markup code that you want to include in a tutorial about markup. The semantically correct element to use for code snippets is code. The piece of markup that you want to include in the tutorial text looks like this:
<p> A paragraph with an <em>emphasized</em> word. </p>
Now, two problems arise.
First, if you just paste this example in as-is, the less-than and greater-than signs will be interpreted as markup by the XHTML parser/browser (Internet Explorer, for example). As a result, the code snippet will not be shown as code in the display.
This is solved by escaping the less-than and greater-than signs using character entities as described above.
Second, the whitespace (line breaks, tabs, spaces etc.) in the example will be collapsed into a single space. This is the default behavior of parsers/browsers.
This is solved by using the pre element. The name pre means "preformatted" and indicates that the text of pre shall have its original whitespace characters preserved.
The actual code of the example snippet becomes:
<pre> <code> &lt;p&gt; A paragraph with an &lt;em&gt;emphasized&lt;/em&gt; word. &lt;/p&gt; </code> </pre>
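One way to confirm that an escaped snippet survives intact is to parse it back and compare. The following Python sketch wraps a snippet in pre and code elements and reads it back with the standard library parser; the variable names and the line layout of the snippet are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The snippet to display, with the line breaks and indentation to preserve.
snippet = "<p>\n  A paragraph with an <em>emphasized</em> word.\n</p>"

# Escape the markup characters, then wrap the result in <pre><code>.
wrapped = "<pre><code>" + escape(snippet) + "</code></pre>"
print(wrapped)

# Parsing it back yields the snippet as plain text: no child elements
# were created, and the whitespace is still there.
code = ET.fromstring(wrapped).find("code")
print(code.text == snippet)
print(len(code))
```

Without the escape step the parser would treat p and em as real child elements of code instead of text to display.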
The lang and xml:lang attributes are used to convey the natural language of the presentation. See further in the Internationalization tutorial.
Language codes are available in Language Code Listing.
<html lang="en-GB" xml:lang="en-GB" > <head>...</head> <body>...</body> </html>
In an XHTML document, the DOCTYPE declaration for Transitional DTD content should be as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> ... </html>
The DOCTYPE declaration for Strict DTD content should be as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> ... </html>
An XML declaration may precede the DOCTYPE declaration:
<?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> ... </html>
If you use a character set encoding other than UTF-8 or UTF-16, you must specify the encoding in the XML declaration. (See further in the Internationalization tutorial.)
<?xml version="1.0" encoding="iso-8859-1" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> ... </html>
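The effect of the encoding declaration can be seen by writing a document as bytes and parsing it again. This is a minimal Python sketch under stated assumptions: build_document is a hypothetical helper, the document body is a made-up sample, and the parser does not fetch or apply the DTD named in the DOCTYPE.

```python
import xml.etree.ElementTree as ET

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)


def build_document(body_text: str, encoding: str) -> bytes:
    """Return a small document whose bytes match its declared encoding."""
    text = (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        f"{DOCTYPE}\n"
        "<html><head><title>t</title></head>"
        f"<body><p>{body_text}</p></body></html>"
    )
    return text.encode(encoding)


data = build_document("caf\u00e9", "iso-8859-1")

# The accented letter is stored as the single byte 0xE9, not as UTF-8.
print(b"caf\xe9" in data)

# The parser reads the encoding from the XML declaration and decodes correctly.
root = ET.fromstring(data)
print(root.find("body/p").text)
```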
(Note: the two links above do not point out elements and attributes that were deprecated in XHTML 1.1 and later.)
- Getting started with HTML by Dave Raggett
- Getting started with HTML - advanced by Dave Raggett
- URI specification (RFC 2396)
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.