Thursday, October 2, 2008

Defining compound terms for acronyms

So far in my little acronym experiment, I have defined a compound term simply as a string that happens to be a sequence of words. I actually started a separate experiment to look into defining a mini-dictionary or glossary of words and then using URI references to those XML resources in the definition of a compound term, but I ran into some issues that I have been unable to resolve so far. I may come back to that side experiment later, but it may become moot, since I think the real solution is that each compound term should itself be a discrete XML resource and the acronym resource should simply tie the acronym term to the XML resource for the compound term.

I have not figured out all of the details yet, but rather than the acronym term "ABC" being defined as the string "Agent-Based Computing", or even as a sequence of references to the XML resources for the individual terms "Agent-Based" and "Computing", its definition would be a single reference to a distinct XML resource for agent-based_computing.

Similarly, the definition for the acronym term "RSS" would be a collection of three references to distinct XML resources for really_simple_syndication, rich_site_summary, and rdf_site_summary.
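As a rough illustration of that idea, here is a minimal Python sketch of an acronym resource whose definitions are nothing but references. The element names (acronym, compoundTermRef), the href attribute, and the terms/ URI layout are all my own placeholders for the sake of the example, not a worked-out schema.

```python
import xml.etree.ElementTree as ET

def build_acronym(term, compound_term_uris):
    """Build an <acronym> element that only references compound terms.

    The element and attribute names are hypothetical placeholders.
    """
    acronym = ET.Element("acronym", {"term": term})
    for uri in compound_term_uris:
        # One reference per compound term; no expansion text is stored here.
        ET.SubElement(acronym, "compoundTermRef", {"href": uri})
    return acronym

# "ABC" has a single definition; "RSS" has three.
abc = build_acronym("ABC", ["terms/agent-based_computing.xml"])
rss = build_acronym("RSS", [
    "terms/really_simple_syndication.xml",
    "terms/rich_site_summary.xml",
    "terms/rdf_site_summary.xml",
])

print(ET.tostring(abc, encoding="unicode"))
print(ET.tostring(rss, encoding="unicode"))
```

The point of the sketch is only that the acronym resource carries no expansion text of its own; everything it knows about a compound term is the reference.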

I have not yet worked out the details, but I think I need to construct a standalone XML schema for a compound term, or maybe have the concept of a compound term glossary, which is a list of compound terms relevant to a particular domain or subdomain. So a compound term could be represented on its own in a single XML document, or a project could collect all of its compound terms into one glossary. There are pros and cons to both approaches.

The only problem with referencing a separate resource is that it introduces a separation between the abstract compound term for an acronym and the text of the words from which the individual letters of the acronym are derived.

One solution is to include both the text definition and the XML resource reference. Or, if the text of the compound term is included in the XML resource definition for the compound term then it can be obtained indirectly.
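The second option, obtaining the text indirectly, might look something like the following Python sketch. It assumes the compound term resource carries its own text in a text child element, and it uses an in-memory dictionary as a stand-in for actually fetching XML resources by URI. All of the element names and URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for fetching XML resources by URI.
RESOURCES = {
    "terms/agent-based_computing.xml": (
        '<compoundTerm id="agent-based_computing">'
        "<text>Agent-Based Computing</text>"
        "</compoundTerm>"
    ),
}

# The acronym resource holds only a reference, not the text itself.
ACRONYM_XML = (
    '<acronym term="ABC">'
    '<compoundTermRef href="terms/agent-based_computing.xml"/>'
    "</acronym>"
)

def expansions(acronym_xml, resources):
    """Follow each reference and return the text of each compound term."""
    acronym = ET.fromstring(acronym_xml)
    texts = []
    for ref in acronym.findall("compoundTermRef"):
        compound_term = ET.fromstring(resources[ref.get("href")])
        texts.append(compound_term.findtext("text"))
    return texts

print(expansions(ACRONYM_XML, RESOURCES))
```

The trade-off is the usual one: keeping the text only in the compound term resource avoids duplication, but every lookup of the expansion costs an extra dereference.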

Or maybe the process by which the text of the acronym was derived is simply historical and is not strictly needed to operate at the purely semantic level.

Another approach is to actually decompose the words of the compound term and represent them in a structure that is organized by the sequence of letters of the acronym term. This structure would be kept with the acronym even though there is also a direct reference from the acronym resource to the XML resource for the compound term.
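To make the decomposition idea a bit more concrete, here is a small Python sketch that pairs each letter of the acronym with the word that supplies it. The matching rule is my own simplifying assumption: words are split on whitespace and hyphens, each letter must be the initial of a word, and words that contribute no letter are skipped. Real acronyms break those rules often enough that this is only a starting point.

```python
import re

def decompose(acronym, compound_text):
    """Pair each acronym letter with the word whose initial supplies it.

    Returns a list of (letter, word) tuples, or None when the letters
    cannot be matched to word initials in order.
    """
    # Split on whitespace and hyphens so "Agent-Based" yields two words.
    words = [w for w in re.split(r"[\s\-]+", compound_text) if w]
    pairs = []
    i = 0
    for letter in acronym:
        # Skip words that contribute no letter to the acronym.
        while i < len(words) and words[i][0].upper() != letter.upper():
            i += 1
        if i == len(words):
            return None
        pairs.append((letter, words[i]))
        i += 1
    return pairs

print(decompose("ABC", "Agent-Based Computing"))
print(decompose("RSS", "RDF Site Summary"))
```

The resulting list of letter and word pairs is the kind of structure that could be stored with the acronym resource alongside the reference to the compound term.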

Incidentally, I already have a lot of resources on the Web for compound terms and acronyms, but they are plain text in HTML documents rather than XML. I will give some thought to how I might want to organize those compound terms and how to split the existing HTML into raw XML and presentation HTML that feeds off of that XML. There are links between many of my compound terms, which would mean XML resource references in the XML as well as synthesized HTML links for the presentation of the compound terms.
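A minimal Python sketch of that split might look like the following, with the raw XML as the source of truth and the HTML generated from it. The compoundTerm, text, and seeAlso element names, the software_agent cross-reference, and the convention of linking to a same-named .html page are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical raw XML for one compound term, with one cross-reference.
TERM_XML = (
    '<compoundTerm id="agent-based_computing">'
    "<text>Agent-Based Computing</text>"
    '<seeAlso href="software_agent"/>'
    "</compoundTerm>"
)

def render_html(term_xml):
    """Synthesize presentation HTML from one compound term's raw XML."""
    term = ET.fromstring(term_xml)
    lines = ['<h2 id="{}">{}</h2>'.format(
        escape(term.get("id"), quote=True),
        escape(term.findtext("text")),
    )]
    # Each XML resource reference becomes a synthesized HTML link.
    for ref in term.findall("seeAlso"):
        target = ref.get("href")
        lines.append('<a href="{}.html">{}</a>'.format(
            escape(target, quote=True), escape(target)))
    return "\n".join(lines)

print(render_html(TERM_XML))
```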

OTOH, it was not my intention to dive into how to solve the problem of representing full-blown term and compound term definitions at this time, but rather to tackle the simpler problem of acronyms. I need to figure out which portion of the problem to carve off to continue work on acronyms.

-- Jack Krupansky
