XML Schema

XML Schema and DTD

XML Schema and DTD are two different ways to specify the allowed content in XML documents.
You may use either of these to specify XML documents. Later, we will look at RELAX NG, which is a third way to specify XML documents.
For various reasons, sometimes one or another of these specification techniques will be chosen.

Some key advantages of XML Schema:

  • An XML Schema is written in the XML language. A DTD is written in a subset of SGML.
  • XML Schema supports namespaces. DTD does not.
  • XML Schema provides for data type specification. DTD does not.
  • XML Schema provides better support for reuse of pieces of the model.
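
As a small sketch of the data type point: the following XML Schema says that the content of an element must be an integer. The element name quantity is made up for this illustration. A DTD could only say that the element contains text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- quantity is a hypothetical element name, used only for illustration -->
  <!-- integer is one of the built-in XML Schema data types -->
  <element name="quantity" type="integer"/>
</schema>
```

With this schema, an instance containing <quantity>12</quantity> would be valid, and <quantity>twelve</quantity> would not.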

On the other hand, DTD is simpler. So, DTD is easier to use for simple document specification.

We can see that DTD is sometimes used, even for more complicated specifications. For example, the Document Object Model (DOM) is used for programming support of web pages in the client's browser or other user agent. (The term user agent includes browsers. It also includes other Internet interface programs that are used in other environments, such as cell phones.) The DOM uses DTD, but it provides extensions to the DTD specification to meet the more advanced requirements of the DOM.

Initial example

Before looking at all the details, you might wish to look at a simple example first. Look at the first example in the XML Schemas chapter in the textbook. See the reading assignment below.

Don't try to understand the details of this example yet. It just helps to see the overall picture, before we start on the details.

Two XML documents

When we worked with a DTD and an XML document, the DTD specification document was written in simplified SGML. We then created an XML document that was a valid instance, meeting the specifications given in the DTD document.

Now, working with XML Schema, we still have two documents. The first document is the specification. The specification document is the XML Schema document. The second document is our instance document, which meets the specifications given in the XML Schema document.

The XML Schema document is written in the XML language.
Our instance document is also written in the XML language.

Start XML Schema document

Let's start writing our XML Schema document. This document will specify what is required and what is allowed in an instance document that meets this specification.

Because the XML Schema document is written in XML, the first thing we need is an XML declaration. The XML declaration says it is an XML document. Example:

<?xml version="1.0" encoding="UTF-8"?>

Root element for the XML Schema

The root element for an XML Schema document must be  
schema

The root element will use the namespace for XML Schema. You may remember from week 1 that a namespace is usually specified with a URI. A URI looks like a URL, but it need not refer to a real directory or file on the web site. A URI is used because the web site name belongs to a single organization. This allows the URIs to be unique.
The URI for the namespace of an XML Schema is:
http://www.w3.org/2001/XMLSchema
We specify this as the default namespace for our XML Schema document with:
xmlns="http://www.w3.org/2001/XMLSchema"

Later we will build one or more instance documents, which will conform to the specifications we write in this XML Schema. These instance documents are also sometimes called target documents.
We will specify a namespace to be used in the target documents. We will specify it twice here in our XML Schema document: once so we can use it within our XML Schema document, and once so it can be used in the target document.

We will choose our own namespace for the target documents. I will choose:
http://voyager.deanza.edu/~oldham/XMLSchemaSample1
The first part of this URI matches a URL for my web site; the last part, XMLSchemaSample1, is not a real directory or file.

The first specification for the target document namespace is to give it a prefix. Then we can use this prefix, if we need it in our XML Schema document.
Let's use the prefix target. We specify this prefix with:
xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"

Then we specify this same namespace, for use in the target XML document:
targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"

Lastly, we specify elementFormDefault, to control the way namespaces are used in the target XML document. Specifying it as qualified means that the child elements we declare inside other elements must be in the target namespace in the target XML document, just like the root element.
elementFormDefault="qualified"

Now we put all these in the root element, schema. That means our XML Schema contains, so far:

   <?xml version="1.0" encoding="UTF-8"?>
   <schema
       xmlns="http://www.w3.org/2001/XMLSchema"
       xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
       targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
       elementFormDefault="qualified"
       >
          

Specify the root element for the target XML documents

Next, in our XML Schema document, we will specify the root element for a target XML document. As a very simple example, we will only have a string of text within the root element. We can do this with an element tag. Notice that, in this example, it ends with /> so it needs no closing tag.
<element name="you-specify-this-name" type="string"/>
I will choose the root element name in my target document to be simpleExample in this example.

We can put these elements together and add a closing tag for the schema element to get the following very simple complete XML Schema document.

Complete XML Schema document

    <?xml version="1.0" encoding="UTF-8"?>
    <schema
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
        targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
        elementFormDefault="qualified"
        >
      <element name="simpleExample" type="string"/>
    </schema>
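
You will often see XML Schema documents that give the XML Schema namespace a prefix, instead of making it the default namespace. The following sketch is the same schema written that way. The prefix xs is my choice here; xsd is also common. Notice that the built-in type must then be written with the prefix too, as xs:string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same schema as above, but the XML Schema namespace uses the prefix xs -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
    targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
    elementFormDefault="qualified"
    >
  <!-- string is defined in the XML Schema namespace, so it needs the xs prefix -->
  <xs:element name="simpleExample" type="xs:string"/>
</xs:schema>
```

Both forms describe the same target documents. Which one to use is a matter of style.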
          

XML instance document

Now we have our XML Schema document. We can build XML target documents that conform to the specification we wrote in the XML Schema document.

The first thing we need in our target instance document is the XML declaration. This says it is an XML document.

<?xml version="1.0" encoding="UTF-8"?>

Next we build the root element in our instance XML document.

There are four things we need to know in order to build the root element in our instance document:

  1. The name of the root element of our instance document. We look in our XML Schema to find the specification of the element. It is specified by:
    <element name="simpleExample" type="string"/>
    So, the name of the root element in our instance document must be simpleExample
  2. The namespace for our target instance document. We look in our XML Schema to find the value of the targetNamespace attribute in the schema element of the XML Schema. It is specified as:
    targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
  3. The namespace used in the standard, for instance document specification. It is:
    http://www.w3.org/2001/XMLSchema-instance
  4. The location and name of our XML Schema document. I put the XML Schema in the same directory with the instance XML document and gave it the name:
    sample1.xsd

There are several attributes we need within the opening tag of our root element. We will put the four things listed above into these.

Our XML Schema specifies that the target namespace is: http://voyager.deanza.edu/~oldham/XMLSchemaSample1
So we will use that as our default namespace:
xmlns="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"

We also need to use something from the XML Schema standard for an instance document.
To do this we will set up a second namespace. Let's give this namespace the prefix: xsi
The XML Schema standard for an instance document has a URI of: http://www.w3.org/2001/XMLSchema-instance
Put this together and we set up the xsi prefix for the standard instance document namespace:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

Now we will work on the schemaLocation attribute in our root element.
The schemaLocation attribute is specified in the XML Schema standard for an instance document, so we will use the prefix we set up for its namespace, which is:
xsi
So, the namespace and the schemaLocation attribute are:
xsi:schemaLocation

There are two things to put in the value for the xsi:schemaLocation attribute. They are the namespace of our XML instance document again, and the location and name of our XML Schema document.
Our namespace was specified as:
http://voyager.deanza.edu/~oldham/XMLSchemaSample1
The location and name of our XML Schema document is just:
sample1.xsd
We separate these two with a space and get:
xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample1   sample1.xsd"

Putting all these parts together, along with some text and a closing tag, we get the following XML document:

Complete XML instance document

    <?xml version="1.0" encoding="UTF-8"?>
    <simpleExample 
        xmlns="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample1   sample1.xsd"
        >
      Hello world.
    </simpleExample>
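
The default namespace is a convenience, not a requirement. As a sketch, the same instance document can be written with a prefix for our target namespace. The prefix ex is an arbitrary choice for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same instance document, using the hypothetical prefix ex instead of a default namespace -->
<ex:simpleExample
    xmlns:ex="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample1   sample1.xsd"
    >
  Hello world.
</ex:simpleExample>
```

What matters for validation is the namespace URI, not the prefix, so this document should validate against the same XML Schema.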
          

Validation

There are four steps in XML Schema validation. I will list the four steps. Then, I will discuss exactly how to do the steps using XML Copy Editor. If you are using a different XML validation technique, the details will be different, but the four steps will be the same.

The first two steps are to validate your XML Schema document.

  1. Check your XML Schema document, to make sure it is well formed.
  2. Validate your XML Schema document against the standard specification for an XML Schema.

The remaining two steps are to validate each of your XML instance documents against your XML Schema document. Usually you will have many XML instance documents that use the same XML Schema document.

  1. Check each XML instance document, to make sure it is well formed.
  2. Validate each XML instance document against your XML Schema.
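
The difference between the well-formed check and validation can be seen in a small sketch. The following instance document is well formed, so it passes the first check. It should fail validation against sample1.xsd, because simpleExample was declared with type string, and a string cannot contain a child element. The element name greeting is made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well formed, but NOT valid against sample1.xsd -->
<simpleExample
    xmlns="http://voyager.deanza.edu/~oldham/XMLSchemaSample1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample1   sample1.xsd"
    >
  <!-- greeting is a hypothetical element; the schema allows only text here -->
  <greeting>Hello world.</greeting>
</simpleExample>
```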

Now let's look at exactly how to do these four steps in XML Copy Editor.
In the XML Copy Editor, validate your XML Schema document:

  1. Check that the Schema document is well formed by using the left check mark icon.
  2. Do NOT use the right check mark icon; it does not work for XML Schemata.
    Instead, select in the menu: - XML - validate - xmlschema.

Then, for each XML instance document:

  1. Check that the document is well formed by using the left check mark icon.
  2. Validate each XML instance document against your XML Schema by using the right check mark icon.

Complex type element declaration

You can have simple or complex content within an element.
You can specify that an element is simple, so that it only contains text. Just specify the element in your XML Schema, and it will be simple. To contain anything other than just text, you need to specify that the content of the element is complexType.
You do this by putting a complexType element within the element element in your XML Schema.
In the following example, your root element contains a complexType, which contains a sequence of elements.

XML Schema document with complexType

<?xml version="1.0" encoding="UTF-8"?>
    <schema
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
        targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
        elementFormDefault="qualified"
        >
         <element name="animal">
           <complexType>
             <sequence>
               <element name="rate" type="string"/>
               <element name="kind" type="string"/>
             </sequence>
           </complexType>
         </element>
    </schema>
          

XML instance document using complexType

<?xml version="1.0" encoding="UTF-8"?>
<animal 
     xmlns="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample2   sample2-complex.xsd"
     >
   <rate>discount</rate>
   <kind>rabbit</kind>
</animal>
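
This example is a good place to see what elementFormDefault="qualified" does. As a sketch, suppose we had a variant of the schema above in which that attribute is left out (the default value is unqualified). Under that assumption, the child elements rate and kind would belong to no namespace, so the instance document could not simply use a default namespace for everything. The root element would need a prefix, and the children would be written without one. The prefix t is an arbitrary choice for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumes a variant of the schema WITHOUT elementFormDefault="qualified" -->
<t:animal
    xmlns:t="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample2   sample2-complex.xsd"
    >
  <!-- rate and kind have no prefix and no default namespace: they are in no namespace -->
  <rate>discount</rate>
  <kind>rabbit</kind>
</t:animal>
```

With qualified, as in our schema, we avoid this mix, and one default namespace covers the whole instance document.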
          

sequence, all, and choice

sequence

In the example before, we saw the use of sequence. The elements specified in a sequence must all be used, and must be in the order specified.

all

all may be used, as an alternative to sequence. When all is used, all the elements must be used, but they can be in any order.
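
As a sketch, here is the animal schema from the complexType example with sequence replaced by all. I am reusing the same namespace and element names only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    elementFormDefault="qualified"
    >
  <element name="animal">
    <complexType>
      <!-- all: rate and kind must both appear, in either order -->
      <all>
        <element name="rate" type="string"/>
        <element name="kind" type="string"/>
      </all>
    </complexType>
  </element>
</schema>
```

With this schema, an animal element containing kind followed by rate would be valid, as would rate followed by kind.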

choice

choice may be used, as another alternative to sequence. When choice is used, only one of the listed elements may be used.
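
As a sketch, here is the animal schema from the complexType example written with choice. I am reusing the same namespace and element names only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    elementFormDefault="qualified"
    >
  <element name="animal">
    <complexType>
      <!-- choice: the animal element contains either rate or kind, not both -->
      <choice>
        <element name="rate" type="string"/>
        <element name="kind" type="string"/>
      </choice>
    </complexType>
  </element>
</schema>
```

With this schema, an animal element containing only rate would be valid, and so would one containing only kind. An animal element containing both would not be valid.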

attribute

Another thing that can be put in a complexType is attribute. See the following example:

XML Schema document with attribute

<?xml version="1.0" encoding="UTF-8"?>
    <schema
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
        targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
        elementFormDefault="qualified"
        >
         <element name="animal">
           <complexType>
             <sequence>
               <element name="rate" type="string"/>
               <element name="kind" type="string"/>
             </sequence>
             <attribute name="age" type="string"/>
           </complexType>
         </element>
    </schema>
          

XML instance document using attribute

<?xml version="1.0" encoding="UTF-8"?>
<animal 
     xmlns="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://voyager.deanza.edu/~oldham/XMLSchemaSample2   sample2-complex.xsd"
     age="four"
     >
   <rate>discount</rate>
   <kind>rabbit</kind>
</animal>
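
An attribute declared as above is optional, so an animal element without age would also be valid. As a sketch, the use attribute of the attribute element can make it required, and a more specific built-in type can restrict its value. The choice of positiveInteger here is my assumption for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:target="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    targetNamespace="http://voyager.deanza.edu/~oldham/XMLSchemaSample2"
    elementFormDefault="qualified"
    >
  <element name="animal">
    <complexType>
      <sequence>
        <element name="rate" type="string"/>
        <element name="kind" type="string"/>
      </sequence>
      <!-- use="required": every animal element must carry an age attribute -->
      <!-- positiveInteger: the value must be a whole number greater than zero -->
      <attribute name="age" type="positiveInteger" use="required"/>
    </complexType>
  </element>
</schema>
```

With this variant, age="four" would no longer be valid, but age="4" would.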
          

Reading assignment

Reading assignments are in Beginning XML, by David Hunter, et al.
Read the chapter.

Chapter 5: XML Schemas

You may wish to download the examples from the web site for the book. Follow the link to the left.