Monday, February 16, 2009

Refinement and expansion of terms and concepts

Terms and concepts tend to be used rather loosely when a new field of interest is fairly young or poorly understood. That is to be expected. But as people drill down into more detailed examination of concepts they tend to refine terms. Similarly, as they find commonality and pursue application of concepts they expand the range of terms.

Refinement tends to bring concepts and terms into sharper focus, sharper and narrower than the pioneers required for their primitive needs.

Expansion recognizes that concepts have a greater utility and greater variation than many pioneers may have recognized.

Refinement can also recognize that a concept or term may be relatively generic or general and that there is value in specialization or subsetting of a concept or term. The specialized concepts essentially fan in to the more general concept.

Expansion can also recognize that a concept or term can be supplemented to increase its utility for certain forms of application. The more general concepts essentially fan out to the supplemented concepts and terms.

Refinement implies a many-to-one relationship of the refinements to the general concept. Alternatively, there is a one-to-many relationship between a general concept and its refinements.

Expansion implies a one-to-many relationship from the general concept to the expanded concepts. Alternatively, there is a many-to-one relationship between expanded concepts and their general concept.
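The two directions of the relationship can be sketched as a small data structure. This is only an illustration in Python; the `Concept` class, its field names, and the example terms are hypothetical and do not come from any particular vocabulary system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Concept:
    """A term or concept (hypothetical structure, for illustration only)."""
    name: str
    # Many-to-one: each derived concept points back to one general concept.
    general: Optional["Concept"] = None
    # One-to-many: a general concept lists its refinements or expansions.
    derived: List["Concept"] = field(default_factory=list)

    def derive(self, name: str) -> "Concept":
        """Create a refinement or expansion and link it in both directions."""
        child = Concept(name, general=self)
        self.derived.append(child)
        return child


vehicle = Concept("vehicle")
car = vehicle.derive("car")
truck = vehicle.derive("truck")

print([c.name for c in vehicle.derived])      # one-to-many view
print(car.general.name, truck.general.name)   # many-to-one view
```

The same pair of links serves both refinement and expansion; only the reading direction differs.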

-- Jack Krupansky

Sunday, February 15, 2009

The difference between truth and fiction is that fiction has to make sense

There was an amusing aphorism about truth and fiction in the new movie The International (with Clive Owen and Naomi Watts). I may not have the exact wording, but it is roughly:

The difference between truth and fiction is that fiction has to make sense.

(Or maybe it was "There is a difference between truth and fiction -- fiction has to make sense.")

That sounded like it was probably a noteworthy quote from somebody, so I did a Google search. Mark Twain's name popped up a few times with various wordings. I did another search using his name and found these two quotes on BrainyQuote.com, so they are probably the definitive quotes:

It's no wonder that truth is stranger than fiction. Fiction has to make sense.

Why shouldn't truth be stranger than fiction? Fiction, after all, has to make sense.

A similar quote is attributed to Leo Rosten:

Truth is stranger than fiction; fiction has to make sense.

And a similar quote is attributed to Tom Clancy:

The difference between fiction and reality? Fiction has to make sense.

My suspicion is that the film used Clancy's version. If I ever meet Clancy, I'll ask if he "borrowed" from Twain's adage.

Finally, Alex Lane asserts that Twain's adage is "roundly refuted" by the popularity of The X-Files.

So, can we use the fact that a proposition "makes sense" as a criterion for judging truth or lie, fact or fiction? If not, what good is it for us to obsess over whether anything "makes sense"?

-- Jack Krupansky

Tracking the evolution of meaning

Even the dictionary is not completely static and engraved in stone. In addition to the appearance of new words, old words can take on new meanings and cease to necessarily connote old meanings. Over time, the editors of dictionaries try to track the evolution of the meanings of words and phrases in both written and spoken language. Even when the dictionary is quite clear and most people solidly recognize the "official" meaning of a word, there will always be outliers, renegades, and revolutionaries (evolutionaries?) who insist on redefining words to have meanings of their own choice or "context." Dictionary editors do a fairly good job of tracking and reporting the evolution of meanings of words. Enter the Semantic Web.

The Semantic Web is not about natural language per se, but there is an intention to represent or at least indicate real-world concepts using URI resources and inferences.

There was an interesting email thread on the W3C Semantic Web email list triggered by an email from Jeremy J. Carroll, Chief Product Architect at TopQuadrant, with the subject line "live meaning and dead languages." Jeremy opined that:

In terms of meaning on the web, I see that the web as a place where the life world is produced, by active extensions of our linguistic apparatus. I hence have an aversion to techniques and technologies that somehow pretend that meaning on the web, and in particular the semantic web, should or could be made static and somehow lifeless. So, I have difficulty seeing the meaning of any URI as univocal or fixed or even particularly well-defined. This leads to some hesitation concerning systems of definitions and axioms built on top of such univocity.

I think this worry becomes more so as axioms and systems of axioms become more complicated. (I just about see similarities between OWL2 and the Shorter Latin Primer I had at high school).

A term which is too tightly nailed down in its relationship to other terms has been dug into an early grave. Having fixed its meaning, as our world moves on, the term will become useless.

The trick, in natural language, is that the meaning of terms is somewhat loose, and moves with the times, while still having some limits.
This looseness of definition gives rise to some misunderstandings (aka interoperability failures), but not too many, we hope.

So I wonder, as some people try to describe some part of their world with great precision, using the latest and greatest formal techniques, just how long that way of describing the world will last. Maybe there is a role in such precision in allowing us to be clear about differences of opinion --- but it doesn't seem to me to be a good foundation for building knowledge.

He tells us that his thoughts were in part inspired by his recent reading of the book Emptiness & Brightness by Don Cupitt, from which he quotes:

By language, I mean the dance of signs, the continuous process of symbolic exchange between people, the humming communication network of which the human life world consists. I mean also to invoke the vast strange and multi-dimensional world of linguistic mean-ing -- and I am hyphenating mean-ing, like be-ing, because mean-ing is a process too. We need to make this point because for so long European intellectuals studied only dead languages -- Latin, Greek and Hebrew -- and failed to grasp the way the transactions of life are carried out and the life world is produced and formed by the motion of living language.

The book is (of course) available at Amazon.

There ensued a long discussion on the email list, including this issue of the distinction or disparity of the Semantic Web and natural language. This unresolved aspect of the Semantic Web will continue to haunt the practical application of the Semantic Web until somebody comes up with a model to transcend "Web" meaning and human meaning. Meanwhile, practitioners will continue to invent all manner of contrived methods for pretending that the vast gap between the two does not exist.

My immediate reaction to Jeremy's original email, sent directly to him, was:

That is why the Semantic Web is based on URIs rather than "keywords" -- as "meaning" evolves over time, people can simply construct new URIs representing the same natural language text but with the new "meaning." Sure, there is always the problem of misuse of URI when the associated natural language text "matches" but the meaning is not aligned with the real world context, but that is always going to be true in any language system, natural or non-natural. Over time, people can gradually detect "meaning misalignment" (or even "suspected meaning misalignment") and add knowledge of the perceived misalignment, so that the perceived strength of any inferences can be reduced to reflect the ambiguity of any inferred meaning.
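The idea of minting new URIs as meaning evolves, and of discounting inferences once misalignment is reported, might be sketched as follows. This is a rough illustration only: the `MeaningUri` class, the example.org URIs, and the discount formula are all assumptions made up for this sketch, not part of any Semantic Web standard.

```python
from dataclasses import dataclass


@dataclass
class MeaningUri:
    """One URI standing for one meaning of a piece of natural language text."""
    uri: str
    label: str                      # the natural language text
    misalignment_reports: int = 0   # how often misuse has been flagged

    def report_misalignment(self) -> None:
        """Record that someone saw this URI used with a non-matching meaning."""
        self.misalignment_reports += 1

    def inference_strength(self, base: float = 1.0) -> float:
        """Discount inferences as reports accumulate (arbitrary formula)."""
        return base / (1 + self.misalignment_reports)


# Same text, two URIs: the newer one carries the newer meaning.
old = MeaningUri("http://example.org/meaning/v1/web", "web")
new = MeaningUri("http://example.org/meaning/v2/web", "web")

old.report_misalignment()
print(old.inference_strength(), new.inference_strength())
```

Any real system would need a far more careful weighting than a simple count, but the shape is the same: the text stays put while the URIs and the confidence attached to them change.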

In summary, we have two big problems here: 1) representing real-world meaning in the Semantic Web, and 2) tracking evolution of real-world meaning in the Semantic Web.

There are at least four distinct forms of variation in meaning that need to be tracked:

  1. Meaning evolves over time, either to take the meaning in a different direction or simply to refine or expand the existing direction.
  2. Different camps or contexts have distinct interpretations.
  3. Different individuals interpret and use terms or concepts differently.
  4. Obsolete terms and concepts are superseded by distinct, newer terms and concepts.
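One way to picture tracking these four forms is a record that tags each meaning with the kind of variation it represents. The sketch below is purely illustrative; the `Variation` enum, the `MeaningRecord` fields, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Variation(Enum):
    EVOLVED_OVER_TIME = 1   # form 1 in the list above
    CAMP_OR_CONTEXT = 2     # form 2
    INDIVIDUAL_USAGE = 3    # form 3
    SUPERSEDED = 4          # form 4


@dataclass
class MeaningRecord:
    term: str
    meaning: str
    variation: Variation
    qualifier: str  # a date, camp, person, or replacement term, depending on the form


def variations_of(records: List[MeaningRecord], term: str,
                  kind: Variation) -> List[MeaningRecord]:
    """Return the recorded meanings of a term for one form of variation."""
    return [r for r in records if r.term == term and r.variation is kind]


# Sample entries, invented for illustration.
records = [
    MeaningRecord("mouse", "small rodent", Variation.CAMP_OR_CONTEXT, "biology"),
    MeaningRecord("mouse", "pointing device", Variation.CAMP_OR_CONTEXT, "computing"),
    MeaningRecord("wireless", "radio receiver", Variation.SUPERSEDED, "radio"),
]

for r in variations_of(records, "mouse", Variation.CAMP_OR_CONTEXT):
    print(r.qualifier, "->", r.meaning)
```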

-- Jack Krupansky

Thursday, February 12, 2009

Sarcasm, satire, truth, and lies for semantic data mining

Although semantic data mining has a lot of potential, it is quite a minefield of tricky issues. Even if we successfully filter, say, a blog post or purported article on a Web site into succinct statements, we then have the issue of determining the veracity of those statements. That is difficult enough in its own right, and then you have sarcasm and satire, where the author makes statements that both the author and most human readers know are not the author's actual opinion, but that superficially do appear to be explicit statements of belief by the author.

In essence, statements using sarcasm and satire are inherently "lies" in a superficial sense, but for most human readers they certainly do not betray any intention of misleading the reader.

An immediate application is for semantic data mining applications that seek to uncover brand reputation issues. For example, a sarcastic product review read only superficially would have the reputation 180 degrees wrong. A wiseacre might express lavish, albeit sarcastic, praise for a poor product that he despises or withering, albeit sarcastic, criticism of a great product that he personally admires (maybe simply to tweak the insufferably zealous fans of the product).

Still, there is value in recognizing the sarcasm and satire, even if a particular application (brand reputation) does not need it.
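The brand reputation case can be made concrete with a tiny sketch. It assumes the hard part, detecting sarcasm, has already been done by some upstream step that is not shown; the `Review` class and the `is_sarcastic` flag are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    surface_polarity: float  # +1.0 praise, -1.0 criticism, as read literally
    is_sarcastic: bool       # assumed to come from an upstream detector


def intended_polarity(review: Review) -> float:
    """Flip the literal polarity when the review is flagged as sarcastic."""
    if review.is_sarcastic:
        return -review.surface_polarity
    return review.surface_polarity


reviews = [
    Review("Oh sure, best product ever.", surface_polarity=1.0, is_sarcastic=True),
    Review("Works well for me.", surface_polarity=1.0, is_sarcastic=False),
]

literal = sum(r.surface_polarity for r in reviews) / len(reviews)
intended = sum(intended_polarity(r) for r in reviews) / len(reviews)
print(literal, intended)
```

Keeping both the literal and the intended reading, rather than discarding one, preserves the fact that sarcasm was present at all, which may matter to applications other than brand reputation.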

-- Jack Krupansky