Wednesday, February 27, 2008

Modeling the real world, approximations, and heuristics

As with traditional computer software, the Semantic Web provides capabilities to model the real world. The Semantic Web does in fact provide a much richer set of modeling capabilities than have traditionally been available to software developers, but it is still rather difficult to model many aspects of the real world, especially human institutions, culture, and social structures, with anything better than a relatively low level of fidelity. That is not so much a negative statement about the architects of the Semantic Web as a very positive statement about the vibrant richness of human life.

At its heart, any model of any portion of the real world is simply an approximation of what exists and transpires in the real world. The Semantic Web does give us better tools for enhancing the fidelity of that approximation, but it does not eliminate the semantic gap between the approximation of the model and the actuality of the real world.

Many users find computers frustrating because, on the one hand, they have such promise and such immense capabilities and seem to do so well at many things, but on the other hand they fail at even many simple tasks. Web search is a great example. Half the time our favorite search engine actually seems to have read our mind and gives us exactly what we want with only minimal input, and then the other half of the time the search engine is completely unable to satisfy our queries no matter how hard we try. Why this dichotomy? The answer, in one word: heuristics.

Heuristics are techniques that computer software designers use as shortcuts to approximate a significant fraction of the "right" answer to a problem. The beauty of a great heuristic, like the beauty of any great shortcut, is that it gets us to where we want to go with minimal effort. The downside of a heuristic, like the ugliness of any shortcut, is that it does not always work, or work as well as we would like, and does not always get us to every destination that we seek. A great search engine employs a vast library of heuristics, but such libraries are finite. The great mystery is not that search engines fail us so frequently, but that they work as well and as often as they do.
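To make the idea of a heuristic concrete, here is a minimal sketch of one of the simplest possible search shortcuts: rank documents by how many query words they share. The documents, the function names, and the scoring rule are all invented for illustration, and no real search engine is claimed to work this way.

```python
# Hypothetical mini-corpus, invented for illustration only.
DOCUMENTS = [
    "cheap flights to boston",
    "affordable airfare for boston travelers",
    "boston marathon results",
]


def keyword_score(query, document):
    """Heuristic: count the distinct words shared by query and document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def search(query, documents):
    """Return documents with at least one shared word, best match first."""
    scored = [(keyword_score(query, doc), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


# The shortcut works when the user happens to pick the document's own words...
hit = search("cheap flights", DOCUMENTS)

# ...and fails when the same intent is phrased with different words.
miss = search("low cost plane tickets", DOCUMENTS)
```

The first query finds the flights page, while the second, which a person would read as the same request, matches nothing, and neither query finds the "affordable airfare" page. The heuristic matches words, not meaning, which is exactly where it runs out.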

Alas, the Semantic Web is not a magic bullet that will solve all semantic issues between computers and people, but it is a framework for modeling the real world and using heuristics to increase the fidelity of our software approximation of the real world.

The Semantic Web does have a lot of great promise to advance the state of affairs in how we model the real world, but we do have to remain cognizant of the fact that we will have to continuously "mind the gap" between reality and models of reality.

-- Jack Krupansky

Tuesday, February 26, 2008

Claim this blog in Technorati

My apologies, but this post is needed to ensure that the Technorati blog search engine finds this blog.

Technorati Profile

-- Jack Krupansky


The Semantic Web is not about real-world meaning

Although semantics is all about meaning and the Semantic Web is a bold attempt to layer a representation of meaning onto a web of raw information, the Semantic Web is not about real-world meaning at all. Rather, the semantics and meaning of the Semantic Web are the same type of semantics that we encounter in traditional computer programming languages and relational databases.

Consider the most common example of the Semantic Web in practice, the ubiquitous RSS web feeds for blogs. A feed is equivalent to a sequence of database records or rows, with each record having a collection of fields or columns. The titles of blog posts are directly analogous to a column in a relational database. The summaries of blog posts would be another database field or column. There is a database meaning or semantics for blog post titles as discrete pieces of information, but it in no way attempts to comprehend or represent the intended human meaning of the words in the blog post title. Similarly, the bodies of blog posts would be another field or column. Even tags, where a lot of the meaning of a blog post is categorized, are analogous to records or rows in a relational database table, one per tag with an "id" to tie it to the blog post, but with little or no attempt to tie the literal tag to an associated human meaning. Besides, the tags are usually assigned by the user on an ad hoc basis, with no Semantic Web verification that the body of a blog post really ties closely to the tags in terms of the semantics that a person has in mind. Yes, there is plenty of semantics and "meaning" in the traditional database sense, but not in the dictionary or encyclopedia sense of "what do you mean?"
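The feed-as-database analogy can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the feed content is made up, the `feed_to_tables` helper and the `posts`/`tags` row layout are hypothetical names, and the feed uses the plain RSS 2.0 `item`, `title`, `description`, and `category` elements, parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration only.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>In the beginning</title>
      <description>Why the Web lacks semantic structure.</description>
      <category>semantic web</category>
      <category>abyss</category>
    </item>
    <item>
      <title>Mind the gap</title>
      <description>Models are approximations.</description>
      <category>modeling</category>
    </item>
  </channel>
</rss>"""


def feed_to_tables(xml_text):
    """Flatten a feed into two row lists: one row per post, one row per tag."""
    root = ET.fromstring(xml_text)
    posts = []
    tags = []
    for post_id, item in enumerate(root.iter("item"), start=1):
        # Each post becomes a record; title and summary are just columns.
        posts.append({
            "id": post_id,
            "title": item.findtext("title", default=""),
            "summary": item.findtext("description", default=""),
        })
        # Each tag becomes its own row, tied to the post only by an id.
        for category in item.findall("category"):
            tags.append({"post_id": post_id, "tag": category.text or ""})
    return posts, tags


posts, tags = feed_to_tables(FEED)
```

Nothing in either table records what "abyss" or "Mind the gap" means to a reader. The software knows only that one string sits in a title column and another in a tag column.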

The Semantic Web does a credible job of utilizing metadata to associate meaning with data, but this is the meaning of the database world and not the meaning of the real world and real users.

This is the essence of the semantic abyss, the semantic gap between the internal meaning associated with computer data structures and the external meaning that real people keep in their minds and attempt to communicate through natural language speech, writing, music, art, and other cultural artifacts.

-- Jack Krupansky

In the beginning

The Web consists primarily of linked text documents, with very little semantic structure beyond the natural language text within the Web documents themselves. Search engines do a great job of finding documents based on keywords, but without regard to the true meaning of those words. Granted, specialized applications can be custom designed for the specific structured layout of some web pages (e.g., price comparison buying agents), or hard-wired for the service interfaces of specific web sites, but that only shows that the meaning is in the eye of the beholder and not represented in the Web documents themselves.

The intent of the Semantic Web is to represent information in a common, structured format that can easily be processed by a wide range of applications, without those applications needing to be custom written for the specific data format or document source.

At least, that is the promise.

Alas, execution on that promise remains incomplete.

This blog is dedicated to exploring the premises and promises of the Semantic Web and how they can be tied back to real users and real-world applications. Where there are gaps, it will explore the nature of those semantic gaps and seek approaches to bridging them, with the ultimate goal of extricating ourselves from the semantic abyss.

I am optimistic that we can achieve success on at least some fronts, but at this stage I am unable to promise that "They all lived happily ever after."

But before we can make even modest progress, first we must gaze down into the semantic abyss.

-- Jack Krupansky