The Semantic Abyss - Plumbing the Semantic Web: Dirt simple XML schema for acronyms

Although it was not my original intent to dive into XML "code" so soon, I was feeling more than a little disoriented and needed to get some footing before delving into all of the conceptual angles. In particular, I figured that by trying out an interactive XML schema design tool I could quickly get a small schema running without having to master all of the nuances of XML Schema. The process did not go quite as smoothly as I had expected, but several hours later I do have two small test schemas for acronyms, as well as two test XML files based on those schemas.

Without further ado I will present the two test XML files, but I do not intend to offer a tutorial on all of the XML angles at this time. Some of it is obvious and some of it may not be explainable even in a series of blog posts. Focus on what is obvious and ignore the rest, for now. One might wonder why I do not present the schemas first, but XML schemas are somewhat cryptic, and it is much easier to make sense of them once you have some sample XML text in your head. You may also be wondering why I have two schemas, but that will be clear in a moment.

All of my XML-related files will be kept on my Software Agent Web site, Agtivity.com.

The tool that I used to create the XML schemas and XML test files is Liquid XML Studio 6.1.17.0 - Free Community Edition from Liquid Technologies Limited.

So, here it is, my first test XML file for acronyms, acronym1.xml:

<?xml version="1.0" encoding="utf-8"?>

<Acronyms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://agtivity.com/xsd/acronym1.xsd">
<Acronym Term="ABC" CompoundTerm="Agent-Based Computing" />
<Acronym Term="RDF" CompoundTerm="Resource Description Framework" />
</Acronyms>

It only has two acronyms, but it should be fairly obvious how to add more. They are completely expressed by these two lines:

<Acronym Term="ABC" CompoundTerm="Agent-Based Computing" />
<Acronym Term="RDF" CompoundTerm="Resource Description Framework" />

Each acronym has a term and the equivalent compound term. Pretty simple stuff, or so it would seem. In XML parlance Term and CompoundTerm are known as attributes. In this schema, each acronym has two attributes, a Term, and a CompoundTerm.
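To see what those attributes look like from a program's point of view, here is a minimal Python sketch that reads them with the standard-library xml.etree.ElementTree module. The function name load_acronyms and the inline sample string are hypothetical illustrations (the sample repeats the two acronym lines above and leaves out the schema reference for brevity); this is not a tool used to produce the files.

```python
import xml.etree.ElementTree as ET

# Sample data in the attribute style of acronym1.xml
# (schema reference on the root element omitted for brevity).
ACRONYM1_XML = """<Acronyms>
  <Acronym Term="ABC" CompoundTerm="Agent-Based Computing" />
  <Acronym Term="RDF" CompoundTerm="Resource Description Framework" />
</Acronyms>"""


def load_acronyms(xml_text):
    """Return a dict mapping each Term to its CompoundTerm."""
    root = ET.fromstring(xml_text)
    acronyms = {}
    for item in root.findall("Acronym"):
        # Attributes are read with .get(); it returns None if one is missing.
        acronyms[item.get("Term")] = item.get("CompoundTerm")
    return acronyms


for term, compound in load_acronyms(ACRONYM1_XML).items():
    print(term, "=", compound)
```

Each attribute becomes a simple name/value lookup on its element, which is about as much structure as attributes can carry.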

With this image of what the XML data actually looks like, it will be easier to make sense of the XML schema.

So, here it is, my first XML Schema for acronyms, acronym1.xsd:

<?xml version="1.0" encoding="utf-8" ?>

<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Acronyms" type="AcronymList" />
<xs:complexType name="Acronym">
    <xs:attribute name="Term" type="xs:string" />
    <xs:attribute name="CompoundTerm" type="xs:string" />
</xs:complexType>
<xs:complexType name="AcronymList">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded"
          name="Acronym" type="Acronym" />
    </xs:sequence>
</xs:complexType>
</xs:schema>

There is plenty of gibberish there, but the essence is this: the schema declares the root element Acronyms to be of a complex type named AcronymList, which consists of zero or more elements named Acronym. Each Acronym is itself of a complex type (also named Acronym) that consists simply of two string attributes, one called Term and the other called CompoundTerm.
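One way to make the schema's rules concrete is to spell them out as ordinary code. The following Python sketch is a rough, hand-written approximation of what acronym1.xsd requires; it is not real XSD validation (the standard library has no schema validator), and the function name conforms_to_acronym1 is hypothetical. One thing it makes visible: because the schema does not say use="required", both attributes are optional, so an empty Acronym element is accepted.

```python
import xml.etree.ElementTree as ET

ALLOWED_ATTRIBUTES = {"Term", "CompoundTerm"}


def conforms_to_acronym1(xml_text):
    """Rough hand-written check mirroring acronym1.xsd; not real XSD validation."""
    root = ET.fromstring(xml_text)
    # The schema declares a single root element named Acronyms.
    if root.tag != "Acronyms":
        return False
    for child in root:
        # AcronymList allows only Acronym elements, zero or more of them.
        if child.tag != "Acronym":
            return False
        # The Acronym type declares two optional attributes and nothing else.
        if not set(child.attrib) <= ALLOWED_ATTRIBUTES:
            return False
        # No child elements or text content are declared for Acronym.
        if len(child) != 0 or (child.text or "").strip():
            return False
    return True


print(conforms_to_acronym1(
    '<Acronyms><Acronym Term="ABC" CompoundTerm="Agent-Based Computing" /></Acronyms>'
))
```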

Back in acronym1.xml, you can see that the xsi:noNamespaceSchemaLocation attribute gives the URL of the schema file, acronym1.xsd.
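As a small illustration of how a program might find that schema URL, here is a Python sketch using the standard-library xml.etree.ElementTree module. The function name schema_location and the one-element sample are hypothetical; the only real detail relied on is that ElementTree expands a prefixed attribute name such as xsi:noNamespaceSchemaLocation into the form {namespace-uri}local-name.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the xsi prefix in the test files.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Cut-down sample: just the root element with its schema reference.
SAMPLE = (
    '<Acronyms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="http://agtivity.com/xsd/acronym1.xsd" />'
)


def schema_location(xml_text):
    """Return the xsi:noNamespaceSchemaLocation value, or None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace-uri}local-name".
    return root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)


print(schema_location(SAMPLE))
```

Note that the attribute is only a hint to tools; a parser like this one reads the file perfectly well without ever fetching the schema.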

If you can make sense out of all of this, that is great, but at least you have been exposed to what it takes to do even something very simple in XML. Actually, it is not too bad, but it is a bit more like looking at the components and wiring inside your computer rather than simply figuring out how to use it.

But wait... we are only halfway done. I said that there were two distinct approaches to the schema and test file. The first schema defined an acronym in terms of two attributes, which is fine for very simple, unstructured data, but is too limiting for structured data. The second approach to the schema uses elements rather than attributes.

So, here it is, my second test XML file for acronyms, acronym2.xml, using elements, rather than attributes:

<?xml version="1.0" encoding="utf-8"?>

<Acronyms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://agtivity.com/xsd/acronym2.xsd">
<Acronym>
    <Term>ABC</Term>
    <CompoundTerm>Agent-Based Computing</CompoundTerm>
</Acronym>
<Acronym>
    <CompoundTerm>Resource Description Framework</CompoundTerm>
    <Term>RDF</Term>
</Acronym>
</Acronyms>

The header is almost identical but points to the second schema. The main difference is that each acronym takes four lines rather than a single line. My simple acronym example does not (yet) need the power of structured (nested) elements, but I hope you can see how it might be used. Future blog posts will explore the matter further. Anyway, a single acronym has four lines:

<Acronym>
    <Term>ABC</Term>
    <CompoundTerm>Agent-Based Computing</CompoundTerm>
</Acronym>

The first line is the same as the acronym line in the first test file, but without the attributes. The last line marks the "end" of the acronym, and the elements of the acronym are in between. It is fairly obvious how the values of the Term and CompoundTerm elements are expressed.
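For comparison with the attribute style, here is a minimal Python sketch that reads the element style with the standard-library xml.etree.ElementTree module. The function name load_acronyms_from_elements is hypothetical, and the inline sample repeats the body of acronym2.xml without the schema reference.

```python
import xml.etree.ElementTree as ET

# Sample data in the element style of acronym2.xml
# (schema reference on the root element omitted for brevity).
ACRONYM2_XML = """<Acronyms>
  <Acronym>
    <Term>ABC</Term>
    <CompoundTerm>Agent-Based Computing</CompoundTerm>
  </Acronym>
  <Acronym>
    <CompoundTerm>Resource Description Framework</CompoundTerm>
    <Term>RDF</Term>
  </Acronym>
</Acronyms>"""


def load_acronyms_from_elements(xml_text):
    """Return a dict mapping each Term's text to its CompoundTerm's text."""
    root = ET.fromstring(xml_text)
    acronyms = {}
    for item in root.findall("Acronym"):
        # findtext() looks a child element up by name, so the order of
        # Term and CompoundTerm inside an Acronym does not matter here.
        acronyms[item.findtext("Term")] = item.findtext("CompoundTerm")
    return acronyms


for term, compound in load_acronyms_from_elements(ACRONYM2_XML).items():
    print(term, "=", compound)
```

The values now live in the text of child elements rather than in attributes, which is what leaves room for nesting further structure inside them later.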

Now, here is my second XML Schema for acronyms, acronym2.xsd, using elements rather than attributes:

<?xml version="1.0" encoding="utf-8" ?>

<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Acronyms" type="AcronymList" />
<xs:complexType name="Acronym">
    <xs:all>
      <xs:element name="Term" type="xs:string" />
      <xs:element name="CompoundTerm" type="xs:string" />
    </xs:all>
</xs:complexType>
<xs:complexType name="AcronymList">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded"
          name="Acronym" type="Acronym" />
    </xs:sequence>
</xs:complexType>
</xs:schema>

The AcronymList complex type is the same as in the first schema. The essential difference is that the Acronym complex type now consists of an xs:all group of elements, each of which must appear exactly once in any XML data but in any order (which is why the RDF entry in acronym2.xml can list CompoundTerm before Term), and those elements are simple, unstructured string values.
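The "exactly once, in any order" rule of the xs:all group can be sketched in a few lines of Python. This is only a hand-written imitation of that one rule, not schema validation, and the names satisfies_all_group and check are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# xs:all with default occurrence settings: each child exactly once.
REQUIRED_CHILDREN = Counter({"Term": 1, "CompoundTerm": 1})


def satisfies_all_group(acronym):
    """True if the element has exactly one Term and one CompoundTerm, in any order."""
    # Counting tags ignores their order, which is the point of xs:all.
    return Counter(child.tag for child in acronym) == REQUIRED_CHILDREN


def check(xml_fragment):
    """Parse a single Acronym element and apply the rule above."""
    return satisfies_all_group(ET.fromstring(xml_fragment))


print(check("<Acronym><CompoundTerm>Resource Description Framework</CompoundTerm>"
            "<Term>RDF</Term></Acronym>"))
```

Had the schema used xs:sequence instead of xs:all for the Acronym type, the order would matter and the RDF entry would have to list Term first.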

Once again, if you can make sense out of all of this, that is great, but at least you have been exposed to what it takes to do even something very simple in XML.

The good news is that now that we have a lot of the basic stuff out of the way, we can incrementally build on it.

Note that this is still not truly the Semantic Web, since it does not use RDF, but it does show the kind of XML groundwork that Semantic Web technologies build on. At some point down the road I will convert the XML Schema to a full-blown OWL ontology and start using RDF triples.

-- Jack Krupansky

Wednesday, September 24, 2008