CIS 92A Introduction to XML

Lecture notes

 

Week 1 - XML

 

History of markup languages

 

The first markup language I saw was called Script.

It was used to mark up text for publication in programming manuals.

(That was long, long ago, when programming manuals were actually shipped with the software.)

Later it was somewhat revised into Generalized Markup Language (GML).

Script has markup tags for paragraphs, headings, lists, and similar structures that you still see in HTML.

 

The guys over in the Research Division built a markup language they called Standard Generalized Markup Language (SGML).  It is used to define other markup languages.

 

HTML is an extension of the old script markup language that adds images, links, and other features useful in web pages.

 

The formal definition of HTML is written in SGML.

You can read it in the standards documents.

You can; SGML is too difficult for me to read.

(Well, I can read some of it.)

 

 

What is XML?

 


Full name: Extensible Markup Language

XML is a simple markup language, used to describe and contain data.

It is actually a very simplified derivative of SGML.

 

How is XML different from HTML?

HTML is a markup language that contains the data, and describes how it is displayed in a browser window.

XML only contains the data, not the description of how it is displayed in a browser window.

XML just organizes the data.

 

What is good, or bad, about the fact that XML only contains the data?

 

HTML was developed to build web pages that display in a browser window.

HTML is a well-balanced compromise between containing the data and displaying the data.  HTML has served us well for years for building web pages, and will continue to do so.  If you just want to build a page with fixed information, use HTML.

 

Because XML only describes the data, and does not describe how it is to be displayed, it can be displayed in various ways.

The same data may be used or displayed in various ways.

Different data may be inserted into the same web page.

 

Example:

Suppose you are a big on-line store.

You build a standard web page for displaying the description of a book.

Then you insert the book information.

The web page does not contain information for a specific book.

Each book can be retrieved from a database into XML format and inserted in the page.

There is a lot of retrieval and processing of data before it is inserted into the web page.  This is an excellent XML application.
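
As a sketch of what such a record might look like, here is one book expressed in XML.  The element and attribute names (book, title, author, price, and so on) and all of the values are made up for illustration; a real store would define its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical book record; names and values are illustrative only -->
<book id="b1001">
  <title>An Example Book Title</title>
  <author>Jane Example</author>
  <price currency="USD">29.95</price>
  <inStock>true</inStock>
</book>
```

Notice that nothing here says how the book should be displayed; the standard web page decides that when the data is inserted.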

 

Example:

Suppose you want to display data on a cell phone, or other small device.

The traditional HTML formatting may be a poor match to the small size of the device.

You build a new display program to show the data.

The data is provided in XML, and displayed by your display program.

 

Example:

Suppose you are a company, and want to send an order to your supplier.
You can use an agreed-upon XML format to contain the order.
Your software can build and understand the order.
The supplier also has software that can understand the order.
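
A minimal sketch of such an order follows.  The format is hypothetical: the element names, part numbers, dates, and company names are invented here, and in practice the two companies would agree on the names in advance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical order format; every name and value is an assumption -->
<order number="0001" date="2000-01-01">
  <buyer>Example Company</buyer>
  <supplier>Example Supplier</supplier>
  <item partNumber="A-100" quantity="12">
    <description>Small widget</description>
  </item>
  <item partNumber="B-200" quantity="3">
    <description>Large widget</description>
  </item>
</order>
```

Because both sides know what each element means, the software at either end can build or read the order without a person retyping it.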

 

HTML is good for building simple web pages.

XML is better when managing a lot of data from a database, such as for an on-line store.  XML is better when displaying the data on new hardware devices that are not similar to a desktop or laptop computer.  XML can be used to exchange information between different systems.

 

 

So, why do we learn XML?

 

Because it is an important markup language used in managing and processing
data, as well as showing it in a web page.

 

So, use HTML to just build web pages.

Use XML if the data needs to be stored, processed, passed between application programs, and displayed in a web page.

 

In other words, simple web sites can just use HTML.

Commercial sites often use XML.

 

 

Where is XML used?

 

XML is used for web page data.

Besides being used in XHTML, XML is used to transfer data to HTML pages.

This can be done in two ways.

1) XML data is used in the server, to build the data into the web page, which is then sent to the user.

2) While the web page is being used in the browser on the user's machine, more data can be sent over the Internet, and put into the user's page, without loading a new web page.  This technology is called AJAX.

You see AJAX at work when you use Google Maps.  When you move the map, a white space may appear where the data is not yet available.  This white space is filled in with the map data, while you are still using the web page.

 

XML is used by the browser to manage web page display.

Modern browsers keep an XML-style tree model of the web page, which keeps track of how the data is displayed.  It is called the Document Object Model (DOM).  This model is used internally by the browsers.  You can also change the web page by changing the model, using JavaScript.  Using JavaScript to change a page while it is in use is called dynamic HTML, or dynamic web pages.

 

XML is used to transfer commercial data.

XML data definitions have been created for many forms of commercial data.  One of these definitions can be used to create an order for auto parts.

This XML data can then be exchanged between the auto factory and a parts supplier, to manage the orders for parts needed to manufacture new cars.

This application has nothing to do with the display of data with a browser.

 

XML is used to transfer data to handheld devices.

Cell phones and personal data devices are too small to have regular browsers to display data.  They may use a limited set of XML or XHTML data.  The display can then be implemented to use this XML data.  The implementation is much simpler than building an HTML browser.

 

 

What we learn in this course

 

The first week we will learn to write well-formed XML.  (Everything except namespaces and a few other things, which we will learn later.)

The rest of the course is on how to control and use XML.

 

We will not learn how to write programs managing XML.

We will not learn JavaScript, the Document Object Model, or AJAX in this course, though we may discuss them a little.

 

 

HTML, XML, and XHTML standards

 

You are expected to already know how to write HTML.

Tonight we will learn how to write XML.

XHTML is HTML, which also meets the requirements for XML, so it is both HTML and XML.

 

The current standard for HTML is 4.01.

The current standards for XHTML are 1.0 and 1.1.

XHTML 1.1 is not much different, but begins to divide HTML up into limited subsets, for easier use on handheld devices.

The current standard for XML is 1.0.

 

I am currently mostly using XHTML 1.0 Strict.

The standards committees are working on two competing possible next level standards:

 

HTML 5

XHTML 2

 

HTML 5 is being designed to tie the HTML language to processing programs.

On the other hand, XHTML 2 assumes that the processing programs will change, so it follows the XML objective of avoiding any ties to programs, with the expectation that all new processing programs will be able to process the simpler XHTML.  We can watch to see how this will come out.

 

The current standards are more strict than older levels of the standard, especially in requiring containers to be closed.

The XML markup language is the simplest.

However, XML is very precise in exactly how it must be written.

So, XML is simple, but must be done correctly.

 

HTML specifies closing tags, but works without them.

 

EXAMPLE:

<ul>

  <li> red

  <li> blue

</ul>

 

There is a closing </ul> tag to complete the ul container, but there are no closing </li> tags.  You must always use the closing tags to create standard XHTML.

You can get by without some closing tags in HTML.

Other closing tags are required in HTML.

You can learn which tags require closing tags, or just code all the closing tags.
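
For comparison, here is the same list with every closing tag coded, which is what XHTML requires:

```xml
<ul>
  <li>red</li>
  <li>blue</li>
</ul>
```

This version works as HTML too, so coding all the closing tags is always the safe choice.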

 

In XML, you must always have a closing tag.

There are no exceptions.

This makes XML consistent and simple.

But, you must always code it exactly correctly.

 

Of course, some tags are not containers; they have no closing tag in HTML.

 

EXAMPLE:

<hr>      in HTML

OR

<hr/>     in XML, which is shorthand for the empty <hr></hr> container

OR

<hr />    which works in both HTML and XML; this is XHTML

 

What is the difference between HTML and XML in this example?

In HTML, the name hr must be followed by a space or the closing >.

In XML, an empty (singleton) tag must have a slash before the closing >.

Ending with a space and a slash makes <hr /> work in both HTML and XML;
this is XHTML.

 

What is XHTML, anyway?

XHTML meets the specification for HTML, so it is HTML.

XHTML meets the specification for XML, so it is XML also.
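
A minimal XHTML 1.0 Strict page might look like the sketch below.  The title and body content are placeholders chosen for illustration; the DOCTYPE and namespace lines are the ones given in the XHTML 1.0 recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Placeholder title</title>
  </head>
  <body>
    <p>Every container is closed, and names are in lower case.</p>
    <hr />
    <p>Empty tags such as hr end with a space and a slash.</p>
  </body>
</html>
```

A browser can display this as a web page, and an XML processor can read it as XML.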

 

 

 

Well-formed XML

 

XML must meet a few rules to be well-formed.

All XML MUST be well-formed.

It is the duty of any application using XML
to detect any XML that is not well-formed.
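
A small document that follows the main rules is sketched below.  The element names are made up for illustration, and the comments point out only the most common rules, not the complete list from the XML 1.0 specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rule: exactly one root element contains everything else -->
<colors>
  <!-- Rule: every opening tag has a matching closing tag -->
  <color>red</color>
  <!-- Rule: attribute values are always quoted -->
  <color shade="light">blue</color>
  <!-- Rule: elements nest properly; the inner element closes first -->
  <group><color>green</color></group>
  <!-- Rule: an empty element may use the shorthand form -->
  <separator/>
  <!-- Rule: names are case sensitive, so opening and closing names must match exactly -->
</colors>
```

By contrast, <color>red with no closing tag, or <Color>red</color> with mismatched case, is not well-formed, and an XML processor must report the error.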

 

Pass out quick reference page of rules

 

You must follow the rules exactly.

 

Week 1 on the web site

 

Now let's look at the on-line material for week 1.