XML Documents

This lesson shows you how XML documents are constructed. Like an HTML document, an XML document consists of an optional prolog at the top of the document, followed by the content.

Consider the following XML example:
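Assembled from the parts listed in the table below, the example document would look roughly like this. The DOCTYPE and SYSTEM keywords are written in upper case and the declaration names the root element, as XML requires; the indentation is only a presentation choice.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE tutorials SYSTEM "tutorials.dtd">
<!-- Here is a comment -->
<?xml-stylesheet type="text/css" href="myStyles.css"?>
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <url>https://www.quackit.com/xml/tutorial</url>
  </tutorial>
  <tutorial>
    <name>HTML Tutorial</name>
    <url>https://www.quackit.com/html/tutorial</url>
  </tutorial>
</tutorials>
```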

The following table provides an explanation of each part of the XML document in the above example:

Prolog (optional)
    XML Declaration: <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    Document Type Definition (DTD): <!DOCTYPE tutorials SYSTEM "tutorials.dtd">
    Comment: <!-- Here is a comment -->
    Processing Instructions: <?xml-stylesheet type="text/css" href="myStyles.css"?>
    White Space

Elements & Content (required)
    Root element opening tag: <tutorials>
    Child elements and content:

        <tutorial>
          <name>XML Tutorial</name>
          <url>https://www.quackit.com/xml/tutorial</url>
        </tutorial>
        <tutorial>
          <name>HTML Tutorial</name>
          <url>https://www.quackit.com/html/tutorial</url>
        </tutorial>

    Root element closing tag: </tutorials>

Here's a more detailed explanation of each part:

Prolog

Right at the top of the document, we have a prolog (also spelt prologue). A prolog is optional, but if it is included, it must come at the beginning of the document. The prolog can contain the XML declaration, comments, processing instructions, white space, and a document type declaration. Although the prolog (and everything in it) is optional, it's recommended that you include the XML declaration in your XML documents.

XML Declaration

The XML declaration indicates that the document is written in XML and specifies which version of XML. The XML declaration, if included, must be the very first thing in the document: nothing may precede it, not even white space or a comment.

The XML declaration can also specify the character encoding for the document (optional) and whether the document depends on external markup declarations (optional). In our example, we specify that the document uses UTF-8 encoding (although we don't strictly need to, as a processor assumes UTF-8 when there is no encoding declaration or byte order mark), and we specify standalone="no". This is not a standalone document as it relies on an external resource (i.e. the DTD).

Although the XML declaration is optional, the XML recommendation says that documents should begin with one. Validation against a DTD does not strictly depend on it, but including it removes any doubt about the version and encoding in use.
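As a minimal sketch of the other case, here is a cut-down version of the example with no DTD at all. Because it depends on no external declarations, it can be marked standalone="yes". Only version is required in the declaration; encoding and standalone are optional, but when present they must appear in this order.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <url>https://www.quackit.com/xml/tutorial</url>
  </tutorial>
</tutorials>
```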

Document Type Definition (DTD)

The DTD defines the rules of your XML document. Although XML itself has rules, the rules defined in a DTD are specific to your own needs. More specifically, the DTD allows you to specify the names of the elements that are allowed in the document, which elements are allowed to be nested inside other elements, and which elements can only contain data.

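The DTD is used when you validate your XML document. A validating parser checks the document against the DTD and reports any violations. Unlike well-formedness errors, which force the parser to stop, validity errors don't have to halt processing; it's up to the application to decide how to handle a document that doesn't adhere to the DTD.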

DTDs can be internal (i.e. specified within the document) or external (i.e. specified in an external file). In our example, the DTD is external.
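For comparison, here is a sketch of the same kind of document with an internal DTD. The contents of tutorials.dtd aren't shown in this lesson, so the element declarations below are an assumption about what such a DTD might contain, based only on the structure of the example.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE tutorials [
  <!-- Assumed rule: the root holds one or more tutorial elements -->
  <!ELEMENT tutorials (tutorial+)>
  <!-- Assumed rule: each tutorial holds a name followed by a url -->
  <!ELEMENT tutorial (name, url)>
  <!-- name and url may only contain character data -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT url (#PCDATA)>
]>
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <url>https://www.quackit.com/xml/tutorial</url>
  </tutorial>
</tutorials>
```

Each ELEMENT declaration covers one of the rule types described above: which element names are allowed, which elements nest inside which, and which elements contain only data.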

Comments

XML comments begin with <!-- and end with -->. Similar to HTML comments, XML comments allow you to write notes within your document without them being treated as content by the processor. You normally write comments as an explanatory note to yourself or another programmer. Comments can appear almost anywhere within your document, but not before the XML declaration and not inside a tag, and the comment text itself can't contain two consecutive hyphens (--).
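The following sketch, based on a cut-down version of the example, shows several places where a parser accepts a comment: in the prolog after the XML declaration, between elements, and after the root element's closing tag.

```xml
<?xml version="1.0"?>
<!-- A comment in the prolog, after the XML declaration -->
<tutorials>
  <!-- A comment between elements -->
  <tutorial>
    <name>XML Tutorial</name> <!-- A comment following an element -->
  </tutorial>
</tutorials>
<!-- A comment after the root element -->
```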

Processing Instructions

Processing instructions begin with <? and end with ?>. Processing instructions are instructions for the application that processes the document. The XML recommendation defines only their syntax, not their meaning. Rather, they are processor-dependent, so not all processors understand all processing instructions. Our example is a common processing instruction that many processors understand: it tells the processor to use an external style sheet.

White Space

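White space is simply blank space created by carriage returns, line feeds, tabs, and/or spaces. White space in the prolog and between elements generally doesn't affect the processing of the document, so you can choose to include it or not. White space inside an element's text, on the other hand, is part of the content and is passed on to the application.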

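Technically speaking, the XML recommendation requires the processor to normalize line endings when it reads a document: a carriage return followed by a line feed, or a carriage return on its own, is translated to a single linefeed character (ASCII code 10). This means that the application always sees the UNIX convention for line endings, whichever convention the file was saved with.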

Speaking of white space, there is a special attribute (xml:space) that you can use to signal that white space within an element should be preserved (but we won't concern ourselves with that just now).
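For reference, the attribute defined by the XML recommendation is named xml:space, and it takes the value "preserve" or "default". A minimal sketch follows; the description element is hypothetical and not part of the original example, and what an application actually does with the preserved white space is up to that application.

```xml
<?xml version="1.0"?>
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <!-- description is a hypothetical element added for this illustration -->
    <description xml:space="preserve">First line
    second line, with its leading spaces kept</description>
  </tutorial>
</tutorials>
```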

Elements & Content

This is where the document's content goes. It consists of one or more elements, nested within a single root element.

Root Element Opening Tag

All XML documents must have one (and only one) root element. All other elements must be nested inside this root element. In other words, the root element must contain all other elements within the document. Therefore, the first tag in the document will always be the opening tag of the root element (the closing tag will always be at the bottom of the document).

Child Elements and Content

These are the elements that are contained within the root element. Elements are usually represented by an opening and closing tag. Data and other elements reside between the opening and closing tag of an element.

Although most elements have an opening and closing tag, XML allows you to use empty elements. An empty element is one that has no content. You might be familiar with some empty elements used in HTML such as the <img> element or the <br> element. In XML, you can write an empty element as a single tag with a forward slash before the > symbol (for example, <br />), or as an opening tag immediately followed by its closing tag (<br></br>).
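Applied to the tutorials example, the two ways of writing an empty element might look like this. The featured element is hypothetical and used only for illustration.

```xml
<?xml version="1.0"?>
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <!-- featured is a hypothetical element; this is the empty-element tag form -->
    <featured />
  </tutorial>
  <tutorial>
    <name>HTML Tutorial</name>
    <!-- Equivalent form: an opening tag immediately followed by its closing tag -->
    <featured></featured>
  </tutorial>
</tutorials>
```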

Elements can also contain one or more attributes. An attribute is a name/value pair that you place within an opening tag to provide extra information about an element. You may be familiar with attributes in HTML. For example, the HTML img tag requires the src attribute, which specifies the location of an image (e.g. <img src="myImage.gif" />). In XML, attribute values must always be enclosed in quotes.
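Here is a sketch of how attributes could be added to the tutorials example. The id and level attributes are assumptions made for illustration; the original example doesn't use any attributes.

```xml
<?xml version="1.0"?>
<tutorials>
  <!-- id and level are hypothetical attributes, not part of the original example -->
  <tutorial id="t1" level="beginner">
    <name>XML Tutorial</name>
    <url>https://www.quackit.com/xml/tutorial</url>
  </tutorial>
  <tutorial id="t2" level="beginner">
    <name>HTML Tutorial</name>
    <url>https://www.quackit.com/html/tutorial</url>
  </tutorial>
</tutorials>
```

Each attribute name may appear only once within a given opening tag, and each value is quoted.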

Root Element Closing Tag

The last tag of the document will always be the closing tag of the root element. This is because all other elements are nested inside the root element.