Introduction to the XML syntax
XML markup describes a document's storage layout and logical structure. Markup is the method by which the structure and semantics of the information in a document are conveyed.
This section introduces markup basics and some common terminology central to all XML-related technologies.
Create a new text (.txt) file and save it with your name as the filename.
Write the name, street address, city, country, and two telephone numbers below on one line each in the text document.
Miki Azuma
3-2-11 Nishi-Shinjuku
Shinjuku
Japan
03-5909-8220
03-5909-8289
XML's basic unit of data and markup is called an element.
The element name usually describes the kind of data or text contained within the element.
Example of an element:
<paragraph>This is a paragraph</paragraph>
Another example of an element:
Apart from some restrictions on which characters can be used, XML itself imposes no restrictions on element names. The author of the language decides which name is appropriate for the particular data or structure.
In so-called document-centric XML, element names mostly describe the structural and/or semantic role of the enclosed text in its context.
Add element names to each line in your text document. Choose names whose semantics appropriately describe the element content.
<name>Miki Azuma</name>
<street>3-2-11 Nishi-Shinjuku</street>
<city>Shinjuku</city>
<country>Japan</country>
<telephone>03-5909-8220</telephone>
<telephone>03-5909-8289</telephone>
Note that XML element names cannot contain spaces.
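The no-spaces rule can be observed with Python's standard-library parser, xml.etree.ElementTree. This is a minimal sketch; the helper name_is_accepted is a hypothetical name introduced here, and it simply reports whether the parser accepts a fragment. A space ends the element name, so the parser reads what follows as the start of an attribute and then fails.

```python
import xml.etree.ElementTree as ET

def name_is_accepted(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(name_is_accepted("<telephone>03-5909-8220</telephone>"))    # True
print(name_is_accepted("<phone-home>03-5909-8289</phone-home>"))  # True: hyphens are allowed
print(name_is_accepted("<tele phone>03-5909-8220</tele phone>"))  # False: the space ends the name
```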
Open tag and Close tag
The XML element has an open tag and a closing tag. These are delimited by the less than ("<") and the greater than (">") characters.
In the example above, <name> is the open tag of the element, and </name> is the closing tag of the element.
Note that the only difference is the "slash" character ("/") in the closing tag.
The open tag and the closing tag of an element always have the same name. The example below is an error.
Also note that element names are case sensitive. The example below is an error, because the open tag uses lower case, and the closing tag uses uppercase.
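Both rules (matching names and case sensitivity) are enforced by any conforming XML parser. As a sketch, assuming Python's standard-library xml.etree.ElementTree and a hypothetical helper called parses, a closing tag with a different name, or with the same name in a different case, is rejected:

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(parses("<name>Miki Azuma</name>"))    # True: open and closing tag match
print(parses("<name>Miki Azuma</street>"))  # False: closing tag has a different name
print(parses("<name>Miki Azuma</Name>"))    # False: same word, different case
```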
All XML documents form a tree structure.
Every XML document must have exactly one root element. Example:
<root>
  <paragraph>text</paragraph>
  <paragraph>text</paragraph>
</root>
A commonly used synonym for "root" is "document element".
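The single-root rule, and the tree that results from it, can be seen with Python's standard-library xml.etree.ElementTree. In this sketch, ET.fromstring returns the root (document) element, and iterating over an element yields its children. Two top-level elements with no common root are rejected; the variable names here are illustrative only.

```python
import xml.etree.ElementTree as ET

one_root = "<root><paragraph>text</paragraph><paragraph>text</paragraph></root>"
two_roots = "<paragraph>text</paragraph><paragraph>text</paragraph>"

root = ET.fromstring(one_root)
print(root.tag)                       # root
print([child.tag for child in root])  # ['paragraph', 'paragraph']

try:
    ET.fromstring(two_roots)
    two_roots_accepted = True
except ET.ParseError:
    # The parser stops at the second top-level element.
    two_roots_accepted = False
print(two_roots_accepted)             # False
```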
Add a root element to your document. Remember to choose a name whose semantics describe the element content. Since this is the root element, the name should describe the content of the whole document.
<address>
  <name>Miki Azuma</name>
  <street>3-2-11 Nishi-Shinjuku</street>
  <city>Shinjuku</city>
  <country>Japan</country>
  <telephone>03-5909-8220</telephone>
  <telephone>03-5909-8289</telephone>
</address>
Any XML element may have attributes that contain additional information about its data or text.
For example, in <paragraph content="introduction"></paragraph>, the attribute name is "content" and the attribute value is "introduction".
There must always be an equals sign (=) between the attribute name and the attribute value.
The attribute value must be enclosed in double or single quotes. The following two attributes are regarded as identical.
<paragraph content="introduction"></paragraph> <paragraph content='introduction'></paragraph>
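The equivalence of the two quote styles can be checked with Python's standard-library xml.etree.ElementTree, which exposes an element's attributes as a dictionary in its attrib property. This is a small sketch; the quote character is a delimiter only and does not become part of the value.

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<paragraph content="introduction"></paragraph>')
single_quoted = ET.fromstring("<paragraph content='introduction'></paragraph>")

print(double_quoted.attrib)                          # {'content': 'introduction'}
print(double_quoted.attrib == single_quoted.attrib)  # True
```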
An attribute is always contained within the open tag of the element.
Attributes on closing tags are forbidden.
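The attribute rules above (an equals sign, a quoted value, and placement in the open tag only) can be tried out with Python's standard-library parser. In this sketch, accepted is a hypothetical helper that reports whether xml.etree.ElementTree parses a fragment without raising ParseError.

```python
import xml.etree.ElementTree as ET

def accepted(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(accepted('<paragraph content="introduction">Text</paragraph>'))  # True: attribute in open tag
print(accepted('<paragraph>Text</paragraph content="introduction">'))  # False: attribute on closing tag
print(accepted('<paragraph content=introduction>Text</paragraph>'))    # False: value not quoted
print(accepted('<paragraph content "introduction">Text</paragraph>'))  # False: no equals sign
```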
There is a problem in the semantics used in the document so far: there is no way to distinguish between the two telephone numbers. Which one is the home number, and which one is the office number?
Although this problem could have been solved by giving the two telephone numbers different element names (phone-home, for example), let's solve it by adding attributes instead.
Add a type attribute to the telephone elements. Give the attributes values that appropriately describe the semantics.
[...]
<telephone type="office">03-5909-8220</telephone>
<telephone type="home">03-5909-8289</telephone>
[...]
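With the type attribute in place, software can tell the two numbers apart. This sketch uses the limited XPath support in Python's standard-library xml.etree.ElementTree, where the predicate [@type='home'] selects an element by attribute value. For brevity it assumes an address element that contains only the two telephone elements; the other children are left out.

```python
import xml.etree.ElementTree as ET

address = ET.fromstring(
    "<address>"
    '<telephone type="office">03-5909-8220</telephone>'
    '<telephone type="home">03-5909-8289</telephone>'
    "</address>"
)

# Select each number by the value of its type attribute.
home = address.find("telephone[@type='home']")
office = address.find("telephone[@type='office']")
print(home.text)    # 03-5909-8289
print(office.text)  # 03-5909-8220
```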
Child, Parent and Sibling: "nesting"
Elements often form a parent-child relationship. Example:
<parent>
  <child>Text</child>
</parent>
This relationship is often described as elements being nested within each other. In the example above, child is nested within parent.
Of course, a parent of one child may at the same time be the child of another parent. Example:
<root>
  <paragraph>
    <sentence>This is a sentence</sentence>
    <sentence>This is another sentence</sentence>
  </paragraph>
</root>
The above is an example of three-level nesting (root-paragraph-sentence).
<paragraph> is the parent of <sentence>, and at the same time <paragraph> is the child of <root>.
Elements that share the same parent, and therefore occur at the same nesting level (such as the two <sentence> elements above), form a sibling relationship.
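Parent, child, and sibling relationships can be navigated in code. In Python's standard-library xml.etree.ElementTree, iterating over an element yields its children, but elements keep no link to their parent, so this sketch builds a child-to-parent dictionary first (parent_of is an illustrative name). Siblings are then the other children of the same parent.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><paragraph>"
    "<sentence>This is a sentence</sentence>"
    "<sentence>This is another sentence</sentence>"
    "</paragraph></root>"
)

# ElementTree stores only parent-to-child links, so build a child-to-parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

paragraph = root.find("paragraph")
sentences = paragraph.findall("sentence")

print(parent_of[paragraph].tag)     # root: paragraph is a child of root
print([s.tag for s in paragraph])   # ['sentence', 'sentence']: children of paragraph

# Siblings of the first sentence: the other children of its parent.
first = sentences[0]
siblings = [e for e in parent_of[first] if e is not first]
print([s.text for s in siblings])   # ['This is another sentence']
```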
Refine the semantics of the name element by adding two child elements to it. The first child element should be first-name and the second should be last-name.
[...]
<name>
  <first-name>Miki</first-name>
  <last-name>Azuma</last-name>
</name>
[...]
The name element now has two children: first-name and last-name.
The first-name and last-name elements are siblings to each other. They have a common parent: the name element.
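As a quick check of the refined structure, this sketch parses the revised name element with Python's standard-library xml.etree.ElementTree and lists its children in document order. It assumes the name element exactly as shown in the exercise answer.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring(
    "<name><first-name>Miki</first-name><last-name>Azuma</last-name></name>"
)

# Both children share the parent "name", so they are siblings.
print([child.tag for child in name])                             # ['first-name', 'last-name']
print(name.find("first-name").text, name.find("last-name").text)  # Miki Azuma
```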
The Text Node
Some elements have a text node, others do not.
In the example below, "paragraph" has a text node, but "document" does not.
<document> <paragraph>This is the text node</paragraph> </document>
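Python's standard-library xml.etree.ElementTree does not model text nodes as separate objects; it stores the text that starts an element's content in the element's text property, which is None when there is no text. This sketch uses that property to show the difference between the two elements in the example; other APIs, such as the DOM, represent text as separate nodes instead.

```python
import xml.etree.ElementTree as ET

document = ET.fromstring(
    "<document><paragraph>This is the text node</paragraph></document>"
)
paragraph = document.find("paragraph")

print(repr(paragraph.text))  # 'This is the text node'
print(repr(document.text))   # None: document contains only an element
```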
The Empty element
Elements that do not have text nodes, nor other elements as children, can be expressed as empty elements.
The syntax for empty elements is slightly different:
Note that this element does not have an opening and a closing tag! They are merged into one.
Empty elements often have attributes that contain additional information. Example:
<image file="/myImages/image.png" />
In this example, the attribute "file" contains a pointer to an image.
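To a parser, the empty-element syntax and an open tag immediately followed by its closing tag describe the same thing: an element with no children and no text. This sketch, assuming Python's standard-library xml.etree.ElementTree, parses both forms and shows that ElementTree writes such an element back out in the short form.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<image file="/myImages/image.png" />')
long_form = ET.fromstring('<image file="/myImages/image.png"></image>')

for elem in (short_form, long_form):
    # Empty: no child elements and no text, but attributes are still available.
    print(len(elem), elem.text, elem.get("file"))  # 0 None /myImages/image.png

# ElementTree serializes childless, textless elements in the short form.
print(ET.tostring(long_form, encoding="unicode"))  # <image file="/myImages/image.png" />
```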
Add the empty element added as the first child of the root. Give this element a date attribute with today's date as its value.
<address>
  <added date="2003-08-01" />
  <name>
  [...]
Does the date attribute value follow the international standard for dates?
The international standard for dates (ISO 8601) uses the date format yyyy-mm-dd. Make sure the date value you added complies with this standard.
Add a scheme attribute to the added element that makes it explicitly clear that the date format used is ISO 8601.
<address>
  <added date="2003-08-01" scheme="iso 8601" />
  [...]
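A program reading the document could check that the date attribute really uses the yyyy-mm-dd form. This is a sketch under stated assumptions: is_iso_date is a hypothetical helper, the regular expression enforces the exact yyyy-mm-dd shape (newer Python versions let date.fromisoformat accept other ISO 8601 forms too), and date.fromisoformat then rejects impossible calendar days.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_iso_date(value):
    """Hypothetical helper: True if value has the shape yyyy-mm-dd and names a real day."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

added = ET.fromstring('<added date="2003-08-01" scheme="iso 8601" />')
print(is_iso_date(added.get("date")))  # True
print(is_iso_date("01/08/2003"))       # False: not yyyy-mm-dd
print(is_iso_date("2003-02-30"))       # False: no such day
```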
Wellformedness and "Malformedness"
If every element in the document that is opened (<elementName>) is also closed (</elementName>) at its respective nesting level, and the other syntax rules described above are followed (such as a single root element and quoted attribute values), the document is wellformed. Wellformedness is a fundamental requirement.
Otherwise, the document is malformed. Malformedness is a fundamental error.
Example of malformedness:
<name>
  <first-name>Miki
  <last-name>Azuma</last-name>
</name>
The example above is malformed XML because the first-name element was not closed.
Another example of malformedness:
<name>
  <first-name>This is the text</last-name>
</name>
The example above is malformed XML because the first-name element was not closed, and because there was a closing tag for an element, last-name, that had not been opened.
Another example of malformedness:
<document>
  <paragraph>
    <sentence>This is a sentence inside a paragraph.
  </paragraph>
</sentence>
</document>
The example above is malformed XML because the paragraph element was closed before the sentence element was closed. Elements must be closed at the same nesting level as they were opened.
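A wellformedness check is what every XML parser performs before doing anything else with a document. As a sketch, assuming Python's standard-library xml.etree.ElementTree, the hypothetical function is_wellformed below runs the three malformed examples from this section, plus a corrected version, through the parser; any ParseError means the text is not wellformed.

```python
import xml.etree.ElementTree as ET

def is_wellformed(xml_text):
    """Hypothetical helper: True if xml_text parses as XML, i.e. is wellformed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

examples = {
    "first-name never closed":
        "<name><first-name>Miki<last-name>Azuma</last-name></name>",
    "closing tag for an element that was not opened":
        "<name><first-name>This is the text</last-name></name>",
    "closed at the wrong nesting level":
        "<document><paragraph><sentence>This is a sentence inside a paragraph."
        "</paragraph></sentence></document>",
    "corrected":
        "<name><first-name>Miki</first-name><last-name>Azuma</last-name></name>",
}

for label, text in examples.items():
    print(f"{label}: {is_wellformed(text)}")
```

Only the corrected example prints True; the other three are rejected by the parser.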
Make sure your document is well formed. In your text editor, choose "save as" and save the document with the file extension ".xml".
Then go to the folder where the document is placed. Open the document in Internet Explorer.
If your document is well formed, the XML Document tree will be displayed in Internet Explorer as a collapsible tree. If it is not well formed, an error message will be shown.
XML provides mechanisms to impose constraints on a document's storage layout and logical structure. One of these mechanisms is the schema.
There are several XML schema languages. The original and most broadly supported schema language (also defined by the XML 1.0 specification) is the Document Type Definition (DTD).
The DTD defines a collection of element and attribute names that are allowed in the document. It also defines the relationship between these names (which element is allowed as a child of which, which attribute is allowed on which element, etcetera). This collection is sometimes referred to as an XML grammar, and sometimes an XML language.
Example: The XHTML 1.0 DTD defines that the element name for paragraph is "p". In the document it should read as follows:
<p>This is a paragraph</p>
The DTD also defines for which elements text nodes are allowed, and which elements are empty.
Why would you want to impose grammatical constraints on an XML language?
If all elements, attributes, and so on use names and syntax as defined in the DTD, the document is valid. If this is not the case, it is invalid. Example:
<para>This is a paragraph using the element name "para".</para>
"para" is unknown to the XHTML DTD. The above example is wellformed XML but invalid XHTML. The element name used must be "p":
<p>This is a paragraph using the element name "p" as defined by the XHTML DTD</p>
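Validity needs a grammar in addition to a parser. Python's standard library checks wellformedness but does not validate against a DTD (the third-party lxml package can). The sketch below therefore uses a deliberately tiny, made-up grammar: the hypothetical ALLOWED_CHILDREN table and is_valid function, loosely modeled on the XHTML "p" example. It is not the XHTML DTD and not a real DTD validator; it only illustrates how a document can be wellformed and still invalid.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy grammar: element name -> names allowed as its children.
# NOT the XHTML DTD; real validation also covers attributes, order, and text.
ALLOWED_CHILDREN = {
    "body": {"p"},
    "p": set(),
}

def is_valid(root):
    """Check every element name and parent/child pair against ALLOWED_CHILDREN."""
    for elem in root.iter():
        if elem.tag not in ALLOWED_CHILDREN:
            return False
        for child in elem:
            if child.tag not in ALLOWED_CHILDREN[elem.tag]:
                return False
    return True

# Both documents are wellformed: fromstring parses them without error.
good = ET.fromstring('<body><p>This is a paragraph using the element name "p".</p></body>')
bad = ET.fromstring('<body><para>This is a paragraph using the element name "para".</para></body>')

print(is_valid(good))  # True
print(is_valid(bad))   # False: "para" is not in the toy grammar
```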
Wellformedness and Validity
Wellformedness refers to the syntactical correctness of the document.
Validity refers to the grammatical correctness of the document.
A document that is not wellformed can never be valid.
A document that is wellformed can be invalid.
The Prolog: XML and DOCTYPE declarations
The prolog resides above the root element open tag.
The prolog contains the XML and DOCTYPE declarations.
The DOCTYPE declaration is required to determine whether the document is valid, since it tells which DTD the document is associated with.
The XML declaration is required only if the document's character encoding is something other than UTF-8 or UTF-16, but it is recommended to always include the XML declaration.
The XML declaration, when present, must be the very first thing in the file; it is not an element. The DOCTYPE declaration always occurs after the XML declaration and before the root element open tag:
<?xml version="1.0"?>
<!DOCTYPE ... >
<root>
  ...
</root>
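The prolog sits outside the element tree. This sketch parses a small document with an XML declaration and a DOCTYPE declaration using Python's standard-library xml.etree.ElementTree; "address.dtd" is a hypothetical DTD name, and ElementTree neither fetches nor applies it. The root element is still address. Putting anything, even a newline, before the XML declaration makes the document malformed.

```python
import xml.etree.ElementTree as ET

# "address.dtd" is a hypothetical file name; ElementTree does not load or validate against it.
document = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE address SYSTEM "address.dtd">\n'
    "<address><name>Miki Azuma</name></address>"
)

root = ET.fromstring(document)
print(root.tag)  # address: the prolog is not part of the element tree

try:
    ET.fromstring("\n" + document)
    leading_newline_accepted = True
except ET.ParseError:
    # The XML declaration is no longer the first thing in the input.
    leading_newline_accepted = False
print(leading_newline_accepted)  # False
```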
Terms you should understand now
- open tag
- close tag
- root element/document element
- empty element
- text node
- DTD Validity
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.