Friday, September 26, 2008

Adding multiple definitions for an acronym

In the real world, a single acronym may have multiple definitions. Sometimes they come from distinct, unrelated domains, but sometimes they have evolved over time within a single domain, possibly to reflect variations in usage or different audiences. For example, RSS is commonly accepted to stand for Really Simple Syndication, but earlier versions of the format expanded it as Rich Site Summary or RDF Site Summary.

There are three ways to give multiple definitions for a single acronym:

  1. Define the acronym in multiple XML documents.
  2. Place multiple acronym definitions in a single XML document.
  3. Extend the schema definition for acronym to allow multiple definitions.

Ultimately, #1 is probably best: it reflects the distributed nature of the Web and Semantic Web and supports definitions within distinct domains. #2 can make sense when there is some obvious connection between the definitions, as in my RSS example. #3 is a tighter way of doing #2 that also ties the multiple meanings together.

I have created a sample XML document, acronym2a.xml, that illustrates placing multiple definitions of the same acronym term in a single XML document. Here is the fragment of that document for RSS:

<Acronym>
  <Term>RSS</Term>
  <CompoundTerm>Really Simple Syndication</CompoundTerm>
</Acronym>
<Acronym>
  <Term>RSS</Term>
  <CompoundTerm>Rich Site Summary</CompoundTerm>
</Acronym>
<Acronym>
  <Term>RSS</Term>
  <CompoundTerm>RDF Site Summary</CompoundTerm>
</Acronym>

This sample document uses the same schema as my second example, acronym2.xsd.

This approach basically works, but does nothing to suggest that these "meanings" are related and requires excessive verbiage.

Next, I modified the schema to allow an arbitrary list of compound term definitions for each acronym. Unfortunately, I have not yet figured out how to design such a schema without an extra level of XML element to represent the list. The new schema does work, but is a bit wordier than I would prefer.

So, using the old schema we wrote:

<Acronym>
  <Term>RDF</Term>
  <CompoundTerm>Resource Description Framework</CompoundTerm>
</Acronym>

But with the new schema that same exact definition becomes:

<Acronym>
  <Term>RDF</Term>
  <CompoundTerms>
    <CompoundTerm>Resource Description Framework</CompoundTerm>
  </CompoundTerms>
</Acronym>

I am still hoping that I can find a way to design the schema to make that extra level of XML element grouping optional, but for now at least this approach is functional.

Anyway, the XML that combines the three RSS definitions for one acronym now becomes:

<Acronym>
  <Term>RSS</Term>
  <CompoundTerms>
    <CompoundTerm>Really Simple Syndication</CompoundTerm>
    <CompoundTerm>Rich Site Summary</CompoundTerm>
    <CompoundTerm>RDF Site Summary</CompoundTerm>
  </CompoundTerms>
</Acronym>

This is now finally starting to look somewhat useful for structuring information, albeit at a very simple level.

One thing that immediately stands out for future work is that rather than "RDF" simply being a string, it would be preferable to actually link that first word of the third definition of RSS to the synonym definition for RDF. That would then start to have the feel of more of a "semantic" Web.

The full sample XML, acronym3.xml, is also available online:

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Liquid XML Studio 6.1.17.0 - FREE Community Edition (http://www.liquid-technologies.com) -->
<Acronyms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://agtivity.com/xsd/acronym3.xsd">
  <Acronym>
    <Term>ABC</Term>
    <CompoundTerms>
      <CompoundTerm>Agent-Based Computing</CompoundTerm>
    </CompoundTerms>
  </Acronym>
  <Acronym>
    <Term>RDF</Term>
    <CompoundTerms>
      <CompoundTerm>Resource Description Framework</CompoundTerm>
    </CompoundTerms>
  </Acronym>
  <Acronym>
    <Term>RSS</Term>
    <CompoundTerms>
      <CompoundTerm>Really Simple Syndication</CompoundTerm>
      <CompoundTerm>Rich Site Summary</CompoundTerm>
      <CompoundTerm>RDF Site Summary</CompoundTerm>
    </CompoundTerms>
  </Acronym>
</Acronyms>

The full schema, acronym3.xsd, is starting to get a little verbose, but still fairly manageable:

<?xml version="1.0" encoding="utf-8" ?>
<!--Created with Liquid XML Studio 6.1.17.0 - FREE Community Edition (http://www.liquid-technologies.com)-->
<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Acronyms" type="AcronymList" />
  <xs:complexType name="Acronym">
    <xs:all>
      <xs:element name="Term" type="xs:string" />
      <xs:element name="CompoundTerms" type="CompoundTermList" />
    </xs:all>
  </xs:complexType>
  <xs:complexType name="AcronymList">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="Acronym" type="Acronym" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CompoundTermList">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="CompoundTerm" type="CompoundTerm" />
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="CompoundTerm">
    <xs:restriction base="xs:string" />
  </xs:simpleType>
</xs:schema>

Basically, I added two defined types: a complex type named CompoundTermList that is a container for the arbitrary list of acronym definitions, and a simple type named CompoundTerm that represents a single compound term. The other change was that the second element of Acronym is now a CompoundTermList rather than a simple string. I could have stayed with simple strings for the elements of a CompoundTermList, but I have thoughts about wanting to allow for more structure within a compound term in the future, such as "RDF" being a URI reference to the RDF synonym.
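
On the extra CompoundTerms level mentioned earlier, here is one possible way to drop it, offered as an untested sketch rather than a revision of acronym3.xsd. XML Schema 1.0 limits each child of xs:all to at most one occurrence, which is why a repeating CompoundTerm cannot sit directly beside Term there. Swapping xs:all for xs:sequence lifts that limit, at the cost of fixing the element order (Term first). The minOccurs="1" is my assumption; the type and element names are the ones already used in acronym3.xsd.

```xml
<!-- Sketch only: a possible replacement for the Acronym complex type. -->
<!-- CompoundTermList would no longer be needed under this variant. -->
<xs:complexType name="Acronym">
  <xs:sequence>
    <xs:element name="Term" type="xs:string" />
    <!-- Assumption: every acronym carries at least one definition. -->
    <xs:element name="CompoundTerm" type="CompoundTerm"
        minOccurs="1" maxOccurs="unbounded" />
  </xs:sequence>
</xs:complexType>
```

Under that assumed type, the RSS entry would lose one level of nesting:

```xml
<Acronym>
  <Term>RSS</Term>
  <CompoundTerm>Really Simple Syndication</CompoundTerm>
  <CompoundTerm>Rich Site Summary</CompoundTerm>
  <CompoundTerm>RDF Site Summary</CompoundTerm>
</Acronym>
```

This removes the wrapper outright rather than making it optional, so documents written against acronym3.xsd would need to be converted.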

Once again, do not despair if a lot of this seems like total gibberish -- because it is! The goal at this stage is simply to get a flavor of XML, schemas, and Semantic Web technologies so we have a sense of footing before diving off the deep end.

The next thing I am thinking about is to produce rudimentary term and phrase schemas so that an acronym can refer to a term as a full-fledged XML resource and so that a compound term would be a sequence of references to term resources rather than literal string values.
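
A much smaller step in that direction could stay inside the current document format. The following is only a sketch against acronym3.xsd as shown above, and I have not run it through a validator. The TermRef element and the names TermKey and TermRefExists are hypothetical, invented here for illustration. CompoundTerm becomes a mixed-content complex type, replacing the simple type of the same name, and an xs:key/xs:keyref pair on the Acronyms element asks a validating parser to check that every TermRef matches the Term of some acronym in the same document.

```xml
<!-- Sketch only: replaces the Acronyms element declaration. -->
<xs:element name="Acronyms" type="AcronymList">
  <!-- Each Acronym must have a Term, and no two may share one. -->
  <xs:key name="TermKey">
    <xs:selector xpath="Acronym" />
    <xs:field xpath="Term" />
  </xs:key>
  <!-- Every TermRef (hypothetical element) must equal some Term. -->
  <xs:keyref name="TermRefExists" refer="TermKey">
    <xs:selector xpath="Acronym/CompoundTerms/CompoundTerm/TermRef" />
    <xs:field xpath="." />
  </xs:keyref>
</xs:element>

<!-- Sketch only: replaces the CompoundTerm simple type. -->
<!-- mixed="true" allows plain text around any TermRef children. -->
<xs:complexType name="CompoundTerm" mixed="true">
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
    <xs:element name="TermRef" type="xs:string" />
  </xs:sequence>
</xs:complexType>
```

With those assumed changes, the third RSS definition could be written as:

```xml
<CompoundTerm><TermRef>RDF</TermRef> Site Summary</CompoundTerm>
```

Note that the key forces each Term to appear only once per document, which suits the combined CompoundTerms form but would reject the earlier style of repeating a separate Acronym element for each RSS definition.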

-- Jack Krupansky
