A Pleasant Little Chat about XML

October 6, 2009, 10:35 am

An important part of the ProfHacker 101 Manifesto is that we want to foster change by teaching people bits and pieces of technology that we use every day but others find totally intimidating. Markup languages, programming languages, database schemas, and similar technologies are things I live and breathe, but I know they send others running for the hills.

I am not here to tell you to use, or even begin to think about working with, anything that doesn’t have a natural place in your personal or professional life, but I am here to help you gain a working vocabulary for some of these topics. Plus, the next installment of the “Working with APIs” series (see parts 1, 2, 3) will use XML, and I wanted to make sure I had something relatively concise that you could reference. This post is simply a gentle introduction to XML and what it is intended to do.

XML [EXtensible Markup Language] is what its name suggests: a markup language, much like HTML. In fact, XML and HTML are siblings (if you want to think of it that way) in that they are both derivatives of SGML, or “Standard Generalized Markup Language”.

Data that has been encoded (“marked up”) in XML lives in a plain text file, with the text surrounded by tags. At its core, XML is platform- and application-independent: you don’t need a proprietary word processing program to create or read an XML document, and a text file containing XML can be opened on Windows, Mac, Linux/UNIX, or any other operating system that opens text files. [However, if you are looking for an XML editor, I recommend the <oXygen/> XML editor, although there are others.]

XML is not a programming language (neither is HTML for that matter). XML alone will not do anything at all. Much to @warnick’s dismay, a pony will not magically spring forth from an XML document… unless some other process has put the pony in place and XML is available to transport it to you, because XML was designed to store, transport, and exchange data.

Here’s the kicker: there are no XML tags to memorize, because with XML you create your own. When you mark up an HTML document you have to know things like <html><head><title></title></head><body></body></html> and everything in between, but with XML the structure of the document and the language you use to describe the data being stored is completely up to you.
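To see that the tag names really are yours to choose, here is a minimal sketch in Python (my choice for illustration; the post itself doesn't depend on any programming language) using the standard library's xml.etree.ElementTree module. The element names borrow the book-catalog example used later in this post; the parser has no built-in knowledge of them.

```python
import xml.etree.ElementTree as ET

# None of these tag names come from a predefined vocabulary; we simply
# pick names that describe our data.
books = ET.Element("Books")
book = ET.SubElement(books, "Book")
ET.SubElement(book, "Title").text = "A Very Good Book"
ET.SubElement(book, "Author").text = "Jane Doe"

# Serialize the tree back to plain text.
xml_text = ET.tostring(books, encoding="unicode")
print(xml_text)
# <Books><Book><Title>A Very Good Book</Title><Author>Jane Doe</Author></Book></Books>
```

Swap in any other names (Recipe, Ingredient, Quiz, Question) and the same code works unchanged, which is the point: the structure and vocabulary are up to you.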

The image used in this post shows an example of an XML document. XML documents have two major parts: the prolog and the body. The prolog contains the XML declaration statement, plus any processing instructions and comments you want to add. The following snippet is a valid prolog, and you can see at least some of it in the image as well:

<?xml version="1.0" ?>
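One practical detail about the XML declaration: it has to be the very first thing in the document. A minimal Python sketch with the standard library's xml.etree.ElementTree shows a parser accepting the prolog above and rejecting the same text when a single newline sneaks in ahead of it. The comment text and the empty quiz element are placeholders of my own; only the root name "quiz" comes from the image.

```python
import xml.etree.ElementTree as ET

# The prolog from the post, a comment, and a tiny (hypothetical) body.
good = '<?xml version="1.0" ?>\n<!-- a comment in the prolog -->\n<quiz/>'
root = ET.fromstring(good)
print(root.tag)  # quiz

# The XML declaration must come first; even one leading newline
# makes the parser reject the document.
bad = "\n" + good
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print("rejected:", err)
```

This is a common stumbling block when XML is generated by hand or by a template that emits a blank line before the declaration.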

After the prolog comes the content structure. XML is hierarchical, like a book: you know that, in general, books have titles and chapters, each of which contains paragraphs, and so forth. There is only one root element in an XML document, and in the case of the example in the image, the root element is “quiz”. But since I just mentioned a book, and books are easy to grasp, let me use the example of a book in a catalog. [NOTE: I yanked this example from Chapter 28, "Working with XML," of Sams Teach Yourself PHP, MySQL and Apache All in One, 4th ed., written by yours truly.]
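The one-root rule is something parsers enforce, not just a convention. In this minimal Python sketch (standard-library xml.etree.ElementTree), the question elements are hypothetical stand-ins, since we only know the image's root is called quiz. Two questions wrapped in a single quiz element parse fine; the same two questions with no common root are rejected.

```python
import xml.etree.ElementTree as ET

one_root = "<quiz><question>1</question><question>2</question></quiz>"
two_roots = "<question>1</question><question>2</question>"

# A single root element wrapping everything else is well-formed.
quiz = ET.fromstring(one_root)
print(quiz.tag, len(quiz))  # quiz 2

# Two top-level elements with no shared root are not.
try:
    ET.fromstring(two_roots)
except ET.ParseError as err:
    print("rejected:", err)
```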

The root element in this example is “Books”; the tags <Books></Books> surround all other information. Next, child elements are added to the document. In my example I’ll just pretend that each Book needs only elements for title, author, and publishing information. But the publishing information will likely contain more than one bit of information: you’ll need a publisher’s name, location, and year of publication. Not a problem; just create another set of child elements within your parent element (which is itself a child of <Book>, which in turn is a child of the root element). For example, just the <PublishingInfo> element could look like this:

<PublishingInfo>
  <PublisherName>Sams Publishing</PublisherName>
  <PublisherCity>Indianapolis</PublisherCity>
  <PublishedYear>2008</PublishedYear>
</PublishingInfo>
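To make the parent/child relationship concrete, here is a small recursive sketch in Python (standard-library xml.etree.ElementTree; the outline function is a hypothetical helper of my own, not part of any library) that prints one indented line per element in the <PublishingInfo> snippet, so the nesting is visible at a glance.

```python
import xml.etree.ElementTree as ET

snippet = """<PublishingInfo>
<PublisherName>Sams Publishing</PublisherName>
<PublisherCity>Indianapolis</PublisherCity>
<PublishedYear>2008</PublishedYear>
</PublishingInfo>"""

def outline(element, depth=0):
    """Return one indented line per element, children indented under parents."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(snippet))))
# PublishingInfo
#   PublisherName: Sams Publishing
#   PublisherCity: Indianapolis
#   PublishedYear: 2008
```

Because the function calls itself for every child, it would also handle a whole <Books> document, printing <PublishingInfo> one level deeper than <Book>.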

All together, a sample books.xml document with one entry could look something like this:

<?xml version="1.0" ?>
<!--Sample XML document -->
<Books>
  <Book>
    <Title>A Very Good Book</Title>
    <Author>Jane Doe</Author>
    <PublishingInfo>
      <PublisherName>Sams Publishing</PublisherName>
      <PublisherCity>Indianapolis</PublisherCity>
      <PublishedYear>2008</PublishedYear>
    </PublishingInfo>
  </Book>
</Books>
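XML by itself does nothing, so some other program has to read it. As a minimal sketch of that step, here is Python's standard-library xml.etree.ElementTree pulling the fields out of the sample books.xml document above. The document is embedded as a string so the example is self-contained; with the file saved on disk, ET.parse("books.xml").getroot() would return the same root element. The summarize function is a hypothetical helper of my own.

```python
import xml.etree.ElementTree as ET

books_xml = """<?xml version="1.0" ?>
<!--Sample XML document -->
<Books>
  <Book>
    <Title>A Very Good Book</Title>
    <Author>Jane Doe</Author>
    <PublishingInfo>
      <PublisherName>Sams Publishing</PublisherName>
      <PublisherCity>Indianapolis</PublisherCity>
      <PublishedYear>2008</PublishedYear>
    </PublishingInfo>
  </Book>
</Books>"""

def summarize(root):
    """Return one readable line per <Book> under the <Books> root."""
    lines = []
    for book in root.findall("Book"):
        title = book.findtext("Title")
        author = book.findtext("Author")
        # A path with "/" reaches into the nested <PublishingInfo> element.
        publisher = book.findtext("PublishingInfo/PublisherName")
        year = book.findtext("PublishingInfo/PublishedYear")
        lines.append(f"{title} by {author} ({publisher}, {year})")
    return lines

root = ET.fromstring(books_xml)
for line in summarize(root):
    print(line)  # A Very Good Book by Jane Doe (Sams Publishing, 2008)
```

Adding a second <Book> entry to the document would simply produce a second line; the code doesn't need to change.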

There are two important rules (among many) for creating well-formed XML documents:

  • XML is case sensitive, so <Book> and <book> would be considered different elements.
  • All XML tags must be properly closed and properly nested; overlapping tags are not allowed.
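Both rules are checked mechanically by any XML parser. This minimal Python sketch (standard-library xml.etree.ElementTree; is_well_formed is a hypothetical helper of my own) feeds it small fragments that break each rule in turn.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Case sensitivity: <book> cannot close <Book>.
print(is_well_formed("<Book>A Very Good Book</Book>"))  # True
print(is_well_formed("<Book>A Very Good Book</book>"))  # False

# Every tag must be closed.
print(is_well_formed("<Book><Title>A Very Good Book</Book>"))  # False

# Tags must nest; overlapping is not allowed.
print(is_well_formed("<Book><Title>A Very Good Book</Book></Title>"))  # False

# Searching is case sensitive too: Book and book are different elements.
root = ET.fromstring("<Books><Book/></Books>")
print(root.find("Book") is not None, root.find("book") is not None)  # True False
```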

Ok, so that’s what XML can look like, but when do you use it? Well, it depends. Technically, you probably use XML every day, at least if you read blogs via a feed reader—all of the content that gets to that reader is in XML format—or if you use a third-party client to interact with Twitter. Remember, XML is used to store and transport data; when a client interfaces with Twitter using the Twitter API, data is sent back in response. That data looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US"
xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
xmlns="http://www.w3.org/2005/Atom"
xmlns:twitter="http://api.twitter.com/">
... (snip) ...
<entry>
  <id>tag:search.twitter.com,2005:4655492548</id>
  <published>2009-10-06T14:00:37Z</published>
  <link type="text/html"
  href="http://twitter.com/mkgold/statuses/4655492548"
  rel="alternate"/>
  <title>Late on these, but congrats to the @ProfHacker
  team on the @chronicle article and to @chutry on the First
  Monday piece. </title>
  <content type="html">Late on these, but congrats to
  the &lt;a href="http://twitter.com/ProfHacker"&gt;@&lt;b&gt;ProfHacker&lt;/b&gt;&lt;/a&gt;
  team on the &lt;a href="http://twitter.com/chronicle"&gt;@chronicle&lt;/a&gt;
  article and to &lt;a href="http://twitter.com/chutry"&gt;@chutry&lt;/a&gt; on
  the First Monday piece.</content>
  <updated>2009-10-06T14:00:37Z</updated>
  <link type="image/png"
  href="http://a3.twimg.com/profile_images/415797105/pic.jpg" rel="image"/>
  <twitter:source>&lt;a href="http://twitter.com"
  rel="nofollow"&gt;web&lt;/a&gt;</twitter:source>
  <twitter:lang>en</twitter:lang>
  <author>
    <name>mkgold (Matt Gold) </name>
    <uri>http://twitter.com/mkgold</uri>
  </author>
</entry>
</feed>

The result above is the first result of the Twitter search for “ProfHacker” when I last checked a few minutes ago. If you were to run the same search via the web, the data would not be sent to your browser as XML. But in order for a third-party Twitter client to show you the results, it first has to get the data (via XML) and then transform it into something readable that the client can pass on to you.
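Here is a minimal sketch of that "get the data, then transform it" step, in Python with the standard-library xml.etree.ElementTree. The input is a trimmed copy of the search result above (only a few child elements kept, the title joined onto one line, trailing spaces removed), and readable_entries is a hypothetical helper, not how any particular Twitter client actually works. The main wrinkle is namespaces: Atom elements live in the http://www.w3.org/2005/Atom namespace and Twitter's extras in http://api.twitter.com/, so the lookups go through a prefix map of our own choosing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one <entry> from the search result, inside its <feed>
# so the namespace declarations come along.
atom = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:twitter="http://api.twitter.com/">
  <entry>
    <published>2009-10-06T14:00:37Z</published>
    <title>Late on these, but congrats to the @ProfHacker team on the @chronicle article and to @chutry on the First Monday piece.</title>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>mkgold (Matt Gold)</name>
    </author>
  </entry>
</feed>"""

# Short prefixes (our choice) mapped to the namespace URIs used in the feed.
NS = {"atom": "http://www.w3.org/2005/Atom", "twitter": "http://api.twitter.com/"}

def readable_entries(feed_text):
    """Turn each Atom <entry> into a one-line, human-readable string."""
    feed = ET.fromstring(feed_text)
    lines = []
    for entry in feed.findall("atom:entry", NS):
        who = entry.findtext("atom:author/atom:name", namespaces=NS)
        when = entry.findtext("atom:published", namespaces=NS)
        lang = entry.findtext("twitter:lang", namespaces=NS)
        text = entry.findtext("atom:title", namespaces=NS)
        lines.append(f"{who} [{lang}] at {when}: {text}")
    return lines

for line in readable_entries(atom):
    print(line)
```

A real client would fetch the feed over HTTP first and would likely render the escaped HTML in <content> rather than the plain-text <title>; both are left out to keep the sketch short.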

Right then. So, XML stores and transports data, and you get to figure out the structure and the tags that you use. Needless to say, there are thousands of other posts (and a ton of books, too) about XML and all of the other technologies that play a role in the storage, transformation, and transportation of data. Heck, the O’Reilly XML in a Nutshell book is over 600 pages long. That’s a heck of a large nut.

Tell me, besides getting a pony, why exactly you think you might want to use XML, and what it is you don’t understand. I could write for hours about what I want to do, but my problems are not your problems, and at ProfHacker we’re here to help you solve your problems. What practical example do you want to see next?

[Image from Wikimedia Commons]

This entry was posted in Software.

12 Responses to A Pleasant Little Chat about XML

G. Michael Guy - October 6, 2009 at 2:46 pm

Perhaps I’m far less techy than others here… but I think a better direction than the equation, “why exactly you think you might want to use XML?” is “here’s what XML can do for you.” I know it’s all over the place, and I accidentally open files all the time full of it, but I have no idea what use it might be for me. So beyond the built in XML in the programs we love, or for hardcore programmers, I ask you, “what should I be doing with XML?”

Rhonda - October 6, 2009 at 12:29 pm

I’d like to know how to get the xml feed from my google calendar and display events with my own formatting on my own web page.

Julie Meloni - October 6, 2009 at 12:50 pm

Ok, will you be retrieving and transforming the information using a client-side or server-side language? A little more info and description of what you want to do specifically will help me determine if you can use an existing third-party plugin (like if your web site is WordPress-powered), an existing Google Calendar Gadget, or something that accesses the Google Calendar API directly to retrieve & transform data. There are a lot of different possibilities!

G. Michael Guy - October 6, 2009 at 2:47 pm

I didn’t mean equation, I meant question. Can anyone guess what area I profess?! :-)

Brian Croxall - October 6, 2009 at 5:04 pm

I’m with Michael. I know the basics of XML, and I know that it’s used in TEI projects (and hundreds of other things). But I think I need a few suggestions of what one might do with XML in order for me to start thinking of awesome-but-not-quite-pony things that I want to try to do.

George H. Williams - October 7, 2009 at 1:48 pm

Would it be correct, however, to say that XHTML is an example of using XML even though XML is not XHTML?

George H. Williams - October 7, 2009 at 7:22 am

One suggestion: A web page can be coded in XHTML, so in the back of my head I have a plan to convert all of my teaching and administrative files into XHTML so that they can be displayed within a web browser (using a web-appropriate stylesheet) or printed and put into a binder (using a printer-appropriate stylesheet).

Advantages? The documents will not be bound to a particular, proprietary word-processing program, and I won’t have to jump through hoops to convert a document originally created for print into a document suitable for web display. If a student or colleague needs a copy, I’ll just point them to the appropriate web page (where, if they hit “print” the printer-friendly stylesheet is automatically invoked) or use the “Print-to-PDF” function in the Mac environment and email it as an attachment.

That’s the plan, anyway…

It’s possible that I’ll work on this and write it up for ProfHacker over the winter break, but we’ll see if time and other unpredictables allow.

Brian - October 6, 2009 at 11:38 pm

I knew just about as much about XML as you’ve outlined in your post before I got to the point where I had to learn more in order to get something done.

My particular task concerned “My Library” in Google Books. I had entered all of the ISBNs from an inventory of my library, and this made it easy for me to search the full text of (almost all) of the books on my shelves. However, I also wanted to make a printed backup of my list of books. Google made it easy to download an XML file containing the books in my library, but there weren’t any available tools for formatting this for printing.

So, I spent an afternoon learning about XSLT and writing a .xsl file that performed the transformation.

I’d suggest that introducing XSLT might be the next logical posting on this subject.

Julie Meloni - October 7, 2009 at 10:36 am

yes. yes it does.

George H. Williams - October 7, 2009 at 10:30 am

First, I want to point out the important fact that XML and XHTML are not the same thing

What?!

Hangs head in shame.

Does this mean I have to return the keys to the unicorn stable at ProfHacker HQ?

Julie Meloni - October 7, 2009 at 10:25 am

First, I want to point out the important fact that XML and XHTML are not the same thing. XML, as I’m introducing here, is a non-proprietary/independent means for storing and transferring data. XHTML is HTML fashioned as an XML application, or an XML derivative, in that XML is a metalanguage (a language to create other languages) in the same way that SGML is a metalanguage from which HTML (and XML, for that matter) sprang. So, while making an XHTML-compliant web site is in a sense making an XML-compliant site, it is not the same as using XML as described in this post, because the language has already been defined and already has a presentation layer as part of its definition.

Making a web site or web application purely in XML still needs (as Brian #2 wisely points out) a presentation template that transforms the data stored/described in the XML document into browser-renderable HTML. That is done with XSLT, which would be the next step in this discussion—what to do with data you have, if you plan to show it directly/without an intermediary agent.

Using an intermediary agent, though, is how XML is most often used: as a transport language between applications. This gets to the original commenters’ questions of how or why to use it. That’s something I can’t really answer, because the goal here wasn’t to say “here are 5 ways XML can change your life.” What I’m more interested in at this point is that people understand what it means when someone throws XML into the conversation: that they’re talking about a structural/semantic markup language and not a presentation language. Then you can get down to business.

That business is going to depend entirely on the project you’re working on, the question you’re trying to answer, and so on. Brian C, for example: I would personally use XML as the transport language for data that goes into (and comes out of) SIMILE-based timelines; that would allow me to build custom interfaces for input by students. XML carries data to and from application APIs, of which there are many. Fundamentally, if I wanted to store data for the long term in a non-proprietary format that I could be reasonably sure would be readable twenty years from now, I would use XML to do it; in fact, the idea for this XML introduction post came from Jason’s recent posts on institutional memory and what to do with all that information. Instead of thinking “OK, what can I do with XML,” I would suggest thinking of data problems you need to solve, and considering whether XML is one way to start solving them.

Julie Meloni - October 7, 2009 at 2:45 pm

If using fully-compliant XHTML with proper XML declaration and XHTML doctype, then the resulting markup in the page would be an application of XML. However, it would be very specifically the derivative language created by the XHTML Working Group. That’s not a bad thing, it’s been and continues to be a good thing for web development (and so is HTML 5, which is different).

XHTML is one application of XML; XML is not XHTML; and XML can be used to make MyML, YourML, etc., as necessary for the documents you want to create and store. You could think of it this way: when data is sent between apps or just sent for the purpose of being stored somewhere, and that language has been defined by the people creating it, then those derivatives of XML are at the same child level as XHTML. For instance, I mentioned how Twitter sends streams of data in XML format; that format has been defined by them. Technically, you could call it TwitterML or something. TwitterML and XHTML are the same type of derivative language. And so it goes.

My goal here and elsewhere is just to make sure people really know what they’re saying when they toss off an acronym or other term in conversation. The next step would be to do stuff with it, but you can’t even think about putting the technology in motion so that it can help solve a problem if you don’t know what that technology really does. For instance, I want my car to run better. I know there are some mechanical parts I could tweak or add to the engine. But I don’t know what they do, so I don’t even know if they would make my car run better, which is why I don’t go to Pep Boys and browse the aisles without my mechanic on speed dial (or something to that effect).

  • The Chronicle of Higher Education
  • 1255 Twenty-Third St, N.W.
  • Washington, D.C. 20037