The Semantic Abyss - Plumbing the Semantic Web: What color is an apple?

I have been trying to think about how to encode even relatively simple human knowledge in simple RDF triples and what issues arise. What could be simpler than... an apple and its color? Sure, some things are simpler, but so much is much more complicated.

In a simple, toy system I might define a class of objects called "fruit", a sub-class called "apple", and have instances of the apple class. Simple enough. I might have a "color" property. Simple enough. Hmmm... but what are the values of color? A literal string like "red"? A numeric set of RGB values? Shades of primary colors? Add another object of class "color", and push out the same questions to the propertiesof that class? So, in my simple, toy system, I would now have objects of class "apple" each of which has an associated "color" object. Although, I am not so comfortable saying that basic properties must be promoted to the level of objects even if having all values be objects may be a better system architecture. This is starting to be a lot of complexity for simple things.

One question: Does each object have its own color? Sure, that makes sense? But, shouldn't I be able to ask questions about objects of that class in general? Sure. Now, there are two ways to go about that: 1) do a statistical analysis of all instances of the class and then summarize the results, possible as a histogram or something like that, or 2) contrive an abstract rule that generally describes what the population of the class would be, even if that is only an approximation. I might simply want to know "generally", or "typically", or "as a common case" what color are apples. Some are yellow, some green, but many are red, so how can I represent that information as a "rule" rather than have to do a massive data collection effort and sift through the results?

But even for apples that are "red" or "green" or whatever, rarely would they be exactly and perfectly one single color. They may be "reddish" or "greenish" or "mostly red" or "mostly green."

In fact, rather than an individual apple having a specific color, you can come up with a wide range of colors by sampling various points on the object's surface, each of which can have its own color. Once again, we can do a massive amount of data collection and then look at distributions of values and highlight dominant values or average values. And that is just for one instance of the class and we would like to do similar analysis across the population of the class instances. And this is just for a relatively simple object. Once again, we would like to come up with relatively simple rules to describe the class and its instances.

The general problem here is not simply how to encode information about objects and classes, but in what forms do users want to examine that information? What is the user's context? Do they want a simple, general answer? Do they want a simple, general, and specific answer that may not be technically accurate for individual objects (e.g., "Most apples are red."), but nonetheless be generally useful? Maybe they want that statistical summary. Maybe they want that vaguer but more accurate simple answer, "reddish." Maybe they want that full, sampled image with its discrete values for their own analysis. Or maybe they have a reference color and they simply want to know if it is an "acceptable" or even "normal" (not "rotten") value for an apple.

Maybe there are a range of generic "query formats" for fetching object and class properties that can be shared across all objects and classes or at least over broad classes of objects. So, one of these generic formats could be chosen by the user and the actual property value(s) would be transformed to match the requested format.

And maybe they just want to start somewhere with a basic answer and then examine it and drill down for more detail or more abstraction on their own as they see fit.

Maybe the user can specify a "degree of specificity" in their request and that would guide what specific form the returned property value would take.

And now we go on to other classes of objects and their "color" and we wonder to what extent we can compare or use colors across those disparate classes. Is a particular car the same color as objects of the class apple?

I am probably only scratching the surface, but these are some of the issues with trying to represent human-level knowledge in a technology world where representing even only basic information is still a real challenge.

Hmmm... I wonder if developing functionally complete ontologies for apples and colors may be even more of a challenge than even a dozen doctorate dissertations? Toy ontologies, no problem; human-level knowledge, that's a harder nut to crack.

-- Jack Krupansky