Rules for well-formed XML
(Extensible Markup Language)
XML declaration
- is optional in XML 1.0, but it is good practice to always provide one
- if present, it must start at the first character of the first line of the file (nothing, not even whitespace, may come before it)
- <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
- unlike ordinary XML attributes, these must appear in the order shown (version, encoding, standalone); version is required, while encoding and standalone may be omitted
- versions are 1.0 and 1.1 - we will use 1.0
- several encodings are available - we will use UTF-8
- standalone may be yes or no - we will start with yes, and later use no
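
The placement and ordering rules above can be tried out with a short Python sketch. It uses the standard library parser xml.etree.ElementTree; the helper name is_well_formed and the note element are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Declaration starts at the very first character: accepted.
good = b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><note/>"

# A single space before the declaration: rejected.
leading_space = b" <?xml version='1.0' encoding='UTF-8' standalone='yes'?><note/>"

# encoding placed before version: rejected.
wrong_order = b"<?xml encoding='UTF-8' version='1.0' standalone='yes'?><note/>"

# No declaration at all: still accepted for XML 1.0.
no_declaration = b"<note/>"

for label, doc in [
    ("good", good),
    ("leading_space", leading_space),
    ("wrong_order", wrong_order),
    ("no_declaration", no_declaration),
]:
    print(label, is_well_formed(doc))
```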
XML names
- are case-sensitive
- start with a letter (including non-Latin Unicode letters) or an underscore (not usually used)
- following characters may be letters, underscores, digits, hyphens, and periods
- may NOT contain spaces
- may contain a colon only when specifying namespaces
- may not start with the letter sequence xml (upper, lower, or mixed case)
- trailing spaces are allowed after a name inside a tag
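
As an informal check of the naming rules, the Python sketch below builds a tiny document around each candidate name and asks the standard library parser whether it is accepted. The helper names parses and accepts_name, and all of the sample names, are invented for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def accepts_name(name):
    """Try the name as the name of an empty root element."""
    return parses("<" + name + "/>")


print(accepts_name("item"))       # plain letters
print(accepts_name("_item-2.a"))  # underscore start, then hyphen, digit, period
print(accepts_name("прибор"))     # non-Latin letters
print(accepts_name("2item"))      # starts with a digit
print(accepts_name("-item"))      # starts with a hyphen
# "my item" is read as element "my" with a valueless attribute "item",
# which is also an error.
print(accepts_name("my item"))

# Names are case-sensitive, so Item and item do not match each other.
print(parses("<Item></item>"))

# Spaces after the name, inside the tag, are tolerated.
print(parses("<item ></item >"))
```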
XML elements
- must have an XML name
- no spaces are allowed between the opening less than symbol (or less than symbol and slash) and the element name
- must have an opening tag
- must have a closing tag, which begins with a less than symbol and a slash; alternatively, an empty element may be written as a single self-closing tag, which ends with a slash and a greater than symbol
- must be nested correctly (the inner element closing tag must be before the outer element closing tag)
- only one root element, which must contain all other elements, is allowed in a document
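- whitespace within PCDATA text is kept and passed on to the application; whitespace inside tags (for example, between attributes) is not significant
- the order of the elements is preserved by the parser and may be significant to the application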
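
The element rules can be explored with a sketch like the following, which reports the parser's complaint for each sample document. It relies on Python's xml.etree.ElementTree; the helper name check and the element names a, b, r, x, and y are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET


def check(doc):
    """Return None if the document parses, else the parser's error text."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return str(err)


samples = [
    "<a><b>text</b></a>",  # correctly nested
    "<a><b/></a>",         # self-closing singleton tag
    "<a><b></a></b>",      # closing tags in the wrong order
    "<a><b></a>",          # b is never closed
    "<a></a><b></b>",      # two root elements
    "< a></a>",            # space before the element name
]

for doc in samples:
    print(doc, "->", check(doc) or "well-formed")

# Child elements come back in document order.
root = ET.fromstring("<r><x/><y/></r>")
child_tags = [child.tag for child in root]
print(child_tags)
```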
XML attributes
- must have an XML name, which is unique within the element
- attributes are placed in the opening tag, not the closing tag
- the attribute name must be followed by an equal sign, followed by the value
- every attribute must have a value
- the value of the attribute must be within quotes; either single or double quotes
- each newline in an attribute value is replaced by a single space
- the order of the attributes does not matter
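
A small Python sketch of the attribute rules follows. It assumes the standard library parser xml.etree.ElementTree; the helper name parses and the attribute names x and y are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Single or double quotes are both fine.
element = ET.fromstring("<a x=\"1\" y='2'/>")
print(element.attrib)

print(parses('<a x="1" x="2"/>'))   # duplicate attribute name
print(parses("<a x/>"))             # attribute without a value
print(parses("<a x=1/>"))           # value without quotes
print(parses('<a></a x="1">'))      # attribute in the closing tag

# A newline inside an attribute value reaches the application as a space.
wrapped = ET.fromstring('<a x="one\ntwo"/>').get("x")
print(repr(wrapped))

# Attribute order does not change the result.
first = ET.fromstring('<a x="1" y="2"/>').attrib
second = ET.fromstring('<a y="2" x="1"/>').attrib
print(first == second)
```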
XML comments
- begin with <!-- and end with -->
- may not contain --
- may not be put inside a tag
- may not be nested (one comment inside another)
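
The comment rules can be checked in the same informal way. The sketch below again uses Python's xml.etree.ElementTree; the helper name parses and the sample documents are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


inside_element = "<a><!-- a plain comment --></a>"
before_root = "<!-- before the root --><a/>"
double_hyphen = "<a><!-- not -- allowed --></a>"
inside_tag = "<a <!-- comment --> ></a>"
nested = "<a><!-- outer <!-- inner --> --></a>"

for label, doc in [
    ("inside_element", inside_element),
    ("before_root", before_root),
    ("double_hyphen", double_hyphen),
    ("inside_tag", inside_tag),
    ("nested", nested),
]:
    print(label, parses(doc))
```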
XML entity references
- &amp; represents &
- &lt; represents <
- &gt; represents >
- &apos; represents '
- &quot; represents "
- there are no other predefined entity references in XML version 1.0; any others must be declared in a DTD
- the leading ampersand and ending semicolon are required
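
To see the predefined entity references in action, the following Python sketch parses a document that uses all five and then tries two malformed ones. The helper name parses and the element name t are placeholders for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# All five predefined entity references are replaced by the parser.
text = ET.fromstring("<t>&amp; &lt; &gt; &apos; &quot;</t>").text
print(text)

# nbsp is not predefined in XML, so without a DTD it is rejected.
print(parses("<t>&nbsp;</t>"))

# The ending semicolon is required.
print(parses("<t>&amp</t>"))

# Without the leading ampersand this is just ordinary text.
plain = ET.fromstring("<t>amp;</t>").text
print(plain)
```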
XML PCDATA
- all text outside tags, entity references, and CDATA sections is PCDATA
- it may not contain & unless you mean the start of an entity reference
- it may not contain < unless you mean the start of a tag
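
When text is generated by a program, one way to satisfy the two PCDATA restrictions is to escape it before placing it in the document. The sketch below uses escape from Python's xml.sax.saxutils for that; the sample text and the element name t are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw = "if a < b & b > c"

# Placing the raw text straight into PCDATA breaks the document.
print(parses("<t>" + raw + "</t>"))

# escape() replaces &, < and > with entity references.
escaped = escape(raw)
print(escaped)

# The parser turns the entity references back into the original text.
round_trip = ET.fromstring("<t>" + escaped + "</t>").text
print(round_trip == raw)
```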
XML CDATA
- if you do not want the parser to look for & and < in your text, put it in a CDATA section
- a CDATA section starts with <![CDATA[
- a CDATA section ends with ]]>
- the sequence ]]> may not appear in text outside a CDATA section; write ]]&gt; instead
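
The CDATA rules can be illustrated with a short Python sketch using xml.etree.ElementTree. The helper name parses, the element name t, and the sample text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Inside a CDATA section, < and & are taken literally.
literal = ET.fromstring("<t><![CDATA[a < b & c]]></t>").text
print(literal)

# Entity references are not expanded inside a CDATA section.
unexpanded = ET.fromstring("<t><![CDATA[&amp;]]></t>").text
print(unexpanded)

# The closing sequence is not allowed in ordinary text.
print(parses("<t>a ]]> b</t>"))

# Escaping the greater than symbol makes the same text acceptable.
escaped_end = ET.fromstring("<t>a ]]&gt; b</t>").text
print(escaped_end)
```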
XML processing instructions
- are not commonly used; we will not use them