Rules for well-formed XML (Extensible Markup Language)

 

XML declaration

 

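- is optional in XML 1.0 (it is required in XML 1.1), but it is good practice to always provide one

- if present, it must begin at the very first character of the first line in the file; nothing, not even whitespace or a comment, may come before it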

- <?xml version='1.0' encoding='UTF-8' standalone='yes'?>

- unlike the attributes of an element, these attributes must appear in the order shown; version is required, while encoding and standalone are optional

- versions are 1.0 and 1.1 - we will use 1.0

- several encodings are available - we will use UTF-8

- standalone may be yes or no - we will start with yes, and later use no
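
As a rough check of the declaration rules above, the sketch below feeds three small documents to the parser in Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the note element are made up for illustration; they are not part of these notes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Declaration at the very start, attributes in the required order.
good = b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?>\n<note/>"

# Same document, but one space now precedes the declaration.
leading_space = b" " + good

# encoding placed before version.
wrong_order = b"<?xml encoding='UTF-8' version='1.0'?>\n<note/>"

for label, doc in [("good", good),
                   ("leading space", leading_space),
                   ("wrong order", wrong_order)]:
    print(label, is_well_formed(doc))
```

Only the first document should be accepted; the other two break the position rule and the ordering rule respectively.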

 

XML names

- are case-sensitive

- start with a letter (including non-Latin Unicode letters) or an underscore (not commonly used)

- following characters may be letters, underscores, digits, hyphens, and periods

- may NOT contain spaces

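- may contain a colon, but only to separate a namespace prefix from the rest of the name

- should not start with the letter sequence xml (upper, lower, or mixed case), which is reserved for the XML standards themselves

- whitespace is allowed after the name inside a tag, for example <note > and </note >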
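
The naming rules above can be approximated with a short regular expression. This is only a simplified sketch: the function name is_simple_xml_name is hypothetical, Python's \w class is not identical to the character set the XML specification allows, and colons (namespace prefixes) are deliberately left out.

```python
import re

# Simplified pattern: the first character is a letter or underscore,
# the rest are letters, digits, underscores, hyphens, or periods.
_NAME = re.compile(r"[^\W\d][\w.\-]*")

def is_simple_xml_name(name):
    """Approximate check of the naming rules (no namespace prefixes)."""
    # Names starting with xml in any letter case are treated as off limits.
    if name.lower().startswith("xml"):
        return False
    return _NAME.fullmatch(name) is not None

for candidate in ["note", "_id", "item-1.a", "2nd", "first name", "XmlData"]:
    print(repr(candidate), is_simple_xml_name(candidate))
```

Because names are case-sensitive, note and Note both pass this check but would still be two different names in a document.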

 

XML elements

 

- must have an XML name

- no whitespace is allowed between the opening < (or </) and the element name

- must have an opening tag

- must have a closing tag, which begins with </ (or the element must be written as a single self-closing tag, ending with />)

- must be nested correctly (the inner element closing tag must be before the outer element closing tag)

- only one root element, which must contain all other elements, is allowed in a document

- whitespace within PCDATA text is passed on to the application unchanged; whitespace inside a tag (for example, between attributes) is not significant

- the order of the elements is sometimes important
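
The element rules above can be tried out against a real parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper is_well_formed and the one-letter element names are invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<a><b></b></a>",   # correctly nested
    "<a><b/></a>",      # self-closing inner element
    "<a ></a >",        # whitespace after the name is fine
    "< a></a>",         # whitespace before the name is not
    "<a><b></a></b>",   # closing tags in the wrong order
    "<a/><b/>",         # two root elements
    "<a>",              # missing closing tag
]

for doc in samples:
    print(doc, is_well_formed(doc))
```

The first three samples should parse; the remaining four each break one of the rules listed above.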

 

XML attributes

 

- must have an XML name, which is unique within the element

- attributes are placed in the opening tag, not the closing tag

- the attribute name must be followed by an equals sign, followed by the value

- every attribute must have a value

- the value of the attribute must be enclosed in quotes, either single or double; the same kind of quote must open and close the value

- a line break inside an attribute value is converted to a single space

- the order of the attributes does not matter
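
A small experiment with the attribute rules above, again using Python's standard xml.etree.ElementTree parser. The helper is_well_formed and the attribute names x, y, and t are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a x='1' y=\"2\"/>"))   # single or double quotes
print(is_well_formed('<a x="1" x="2"/>'))     # duplicate attribute name
print(is_well_formed("<a x=1/>"))             # value not quoted
print(is_well_formed("<a checked/>"))         # attribute without a value
print(is_well_formed("<a></a x='1'>"))        # attribute in the closing tag

# Attribute order does not matter: both forms give the same mapping.
first = ET.fromstring('<a x="1" y="2"/>').attrib
second = ET.fromstring('<a y="2" x="1"/>').attrib
print(first == second)

# A line break inside the value reaches the application as a space.
value = ET.fromstring('<a t="line one\nline two"/>').get("t")
print(repr(value))
```

Only the first well-formedness check should succeed; the comparison of first and second should report that they are equal.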

 

XML comments

 

- begin with <!-- and end with -->

- may not contain -- anywhere between the opening <!-- and the closing -->

- may not be put inside a tag

- may not be nested (one comment inside another comment)
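
The comment rules above can be checked the same way. This is a sketch using Python's standard xml.etree.ElementTree parser; is_well_formed is a helper written for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<a><!-- a normal comment --></a>",
    "<a><!-- two -- hyphens --></a>",                # -- inside the comment
    "<a <!-- inside a tag --> />",                   # comment inside a tag
    "<a><!-- outer <!-- inner --> outer --></a>",    # nested comments
    "<a><-- missing exclamation mark --></a>",       # wrong opening sequence
]

for doc in samples:
    print(doc, is_well_formed(doc))
```

Only the first sample should be accepted. The nested case fails for the same reason as the second one: the inner opening sequence puts a double hyphen inside the outer comment.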

 

XML entity references

 

&amp;     &

&lt;      <

&gt;      >

&apos;    '

&quot;    "

- these five are the only predefined entity references in XML version 1.0; any others must be declared in a DTD

- the leading ampersand and ending semicolon are required
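
To see the five entity references being resolved, the sketch below parses a small document with Python's standard xml.etree.ElementTree module and looks at the resulting text. The element name t and the helper is_well_formed are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# All five predefined references in one piece of text.
doc = "<t>&lt;b&gt; &amp; &apos;&quot;</t>"
resolved = ET.fromstring(doc).text
print(resolved)

# &nbsp; is an HTML entity; it is not predefined in XML.
print(is_well_formed("<t>a&nbsp;b</t>"))

# The closing semicolon is required.
print(is_well_formed("<t>fish &amp chips</t>"))
```

The parser hands the application the plain characters, so resolved holds the symbols themselves rather than the references. Both of the later documents should be rejected.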

 

XML PCDATA

 

- all text outside tags, entity references, and CDATA sections is PCDATA

- it may not contain & unless you mean the start of an entity reference

- it may not contain < unless you mean the start of a tag
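
When text is generated by a program, the two characters above have to be escaped before the text is placed in a document. A minimal sketch using escape from Python's standard xml.sax.saxutils module follows; the sample text and the element name t are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T < IBM"

# Placing the raw text directly in PCDATA breaks the document.
try:
    ET.fromstring("<t>" + raw + "</t>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)

# escape() replaces & and < (and also >) with entity references.
safe = escape(raw)
print(safe)

# The parser turns the references back into the original characters.
round_trip = ET.fromstring("<t>" + safe + "</t>").text
print(round_trip == raw)
```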

 

XML CDATA

 

- if you do not want the parser to look for & and < in your text, put it in a CDATA section

- a CDATA section starts with <![CDATA[

- a CDATA section ends with ]]>

- the sequence ]]> may not appear in PCDATA; outside a CDATA section, write it as ]]&gt;
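
The sketch below shows a CDATA section carrying text that would otherwise need escaping, parsed with Python's standard xml.etree.ElementTree module. The sample code string, the element name t, and the helper is_well_formed are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

code = "if (x < 1 && y) {}"

# Inside CDATA the parser does not treat & and < as markup.
doc = "<t><![CDATA[" + code + "]]></t>"
inside = ET.fromstring(doc).text
print(inside == code)

# The same text without CDATA is rejected.
print(is_well_formed("<t>" + code + "</t>"))

# A stray ]]> in ordinary text is rejected; the escaped form is accepted.
print(is_well_formed("<t>]]></t>"))
print(is_well_formed("<t>]]&gt;</t>"))
```

The application receives the text of a CDATA section exactly as typed, with the opening and closing markers removed.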

 

XML processing instructions

 

- are not commonly used; we will not use them