Monday, July 6, 2009

Meaning and the Semantic Web

If we look simply at the term Semantic Web, we assume that it is a web that has something to do with semantics, and semantics is essentially about meaning. I think most (but not all) people can agree with that. The rub comes when we try to figure out what the various factions mean by meaning. Some of the common meanings of meaning (semantics):

  • The association of types with data, so that a computer can understand what the data means at the level of which type a given piece of data refers to.
  • Denotation of which object is referred to by words or terms, such as in a dictionary.
  • Human-level understanding of the "meaning" of words, terms, and statements, which is potentially (or even usually) subjective.
  • Human-level "meaning" in a deeper, more personal sense for an individual: how someone feels about or experiences a concept.
  • Rich knowledge, as opposed to mere information or raw data, which permits the reader to infer a much wider range of truth and acceptable behavior.
  • Formal semantics of computer science, used to define a domain and the operations permitted over that domain in a way that is complete, consistent, unambiguous (accurate), and verifiable. Even that raises the question of whether a description of a domain on a computer accurately matches the real world as it exists or as we think we know it.
  • Artificial intelligence (or computational intelligence) applying formal semantics to attempt to approximate human-level understanding.
  • Simple tagging to point from a term (e.g., keyword) to an object to cue a computer program as to the intended "meaning" of a term.
  • Simple textual natural language, even in simple HTML or simple XML, can embody an incredible range of meaning, although full processing of natural language by non-human entities is still only a partially solved problem.

The question of what "semantic" means in Semantic Web now comes down to how much and what kind of meaning is embodied in the Semantic Web. Alternatively phrased, is there enough semantic meaning embodied in the so-called Semantic Web to warrant the term "semantic"? Some might contend that the existing conceptualization of the Semantic Web is too weak, while others might assert that all of the complexity of RDF is simply not needed for most contemporary applications that need to work with limited forms of "meaning." In the end (or at the beginning), the folks at W3C made a call and sincerely believed that their concept of the Semantic Web was a close enough match between what they believed was needed and what they believed could be done. Whether their views will hold up over time remains to be seen.

At a primitive, operational level the Semantic Web really is just a Web of data, or a Web of Linked Data. The modifier "typed" is implicit in there, since typing is where most of the power comes from. Few dispute this operational view, and most agree with that characterization, even if they chafe at or disagree with the term Semantic Web per se.
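To make "typed" Linked Data a little more concrete, here is a minimal Python sketch, assuming a toy in-memory graph of subject-predicate-object triples. Only `RDF_TYPE` is the real `rdf:type` URI from the RDF vocabulary; the `example.org` resources and the helper functions `instances_of` and `types_of` are hypothetical names chosen for illustration, not part of any RDF library.

```python
# A toy graph of RDF-style triples: (subject, predicate, object).
# RDF_TYPE is the actual rdf:type URI; everything under EX is hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

triples = {
    (EX + "alice", RDF_TYPE, EX + "Person"),
    (EX + "alice", EX + "worksFor", EX + "acme"),  # a plain link between resources
    (EX + "acme", RDF_TYPE, EX + "Company"),
    (EX + "bob", RDF_TYPE, EX + "Person"),
}


def instances_of(graph, cls):
    """Return (sorted) the subjects declared via rdf:type to be instances of cls."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == cls)


def types_of(graph, resource):
    """Return (sorted) the classes a resource is declared to belong to."""
    return sorted(o for (s, p, o) in graph if s == resource and p == RDF_TYPE)


print(instances_of(triples, EX + "Person"))
print(types_of(triples, EX + "acme"))
```

The `worksFor` triple alone only says that two things are linked; the `rdf:type` triples are what let a program ask "which resources are people?" without any human reading the data, which is the modest but real kind of "meaning" the operational view relies on.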

Others believe that raw XML (and related non-RDF technologies) by itself is more than sufficient to represent and manipulate the lion's share of the kinds of "meaning" that people need today in their applications. Fair enough, as far as it goes. RDF has somewhat grander goals, but many contemporary applications can do just fine with a subset of non-RDF XML-based technologies. None of that, however, is a robust argument against RDF enabling a richer form of Semantic Web.

The hard-core computer scientists probably do have a point that the current RDF-based technology stack still isn't quite up to snuff as a formal semantics, but even that is not a truly robust argument against billing the RDF-based Semantic Web as a major advance in introducing semantics and meaning into the Web of Linked Data. Yes, the computer scientists can reasonably argue that we can and should do better to produce a true semantic web, but once again that is not a great argument for withholding the "semantic" label per se. Sometimes you can make better progress with the known bird in hand than by spending too much effort pursuing another bird or two in the bush. Some might claim that alternative approaches are less risky, but such matters can be debated endlessly without resolution. Sometimes it is better to make rapid, informed decisions and run with them than to slow progress with an endless stream of second-guessed decisions. Or, who knows, maybe eventually there will be a "Version 2.0" of the Semantic Web that leapfrogs the current Semantic Web with a more robust sense of formal semantics.

Some of us would really like to see more of a Knowledge Web that goes well beyond merely linking together lots of typed data, and it is not at all clear that the current RDF-based Semantic Web technology stack is well-suited for that purpose. Even so, this is not a valid reason to block the use of the "semantic" label. One could also argue that a "knowledge" web needs more than "mere" semantics, including pragmatics and full-blown semiotics, but that certainly does not argue for withholding the "semantic" label.

More recently, much of the emphasis in the Semantic Web community has been on Linked Data, Linking Open Data, and producing and populating a realistic Web of Linked Data. That is all well and good, but once again it does not by itself argue against the use of the "semantic" label.

My personal view is that all of these efforts are at heart attempts to increase the emphasis on meaning. Even if any given effort does not meet some impossibly high bar for the meaning of meaning, I do think it is the direction and intention of our efforts that matter. Sure, many of the current efforts focus simply on replicating basic data and information processing capabilities at Web-scale, but ultimately we are trying to get to the original Semantic Web vision of a comprehensive information infrastructure that software agents can use to automate a much broader swath of our manual tasks.

My other view is that the decision was made years ago and does have at least some valid technical and communication value, so we have more to gain by sticking with it than by jumping ship to some other term that may offer some short-term clarity, possibly at the expense of losing focus on the long-term vision.

Meanwhile, "meaning" can be found wherever it is stored, whether in RDF, RSS, XML, HTML, or raw text. Storing that meaning can be rather straightforward, but interpreting it is another story. Simple file structures have obvious advantages, but RDF is designed to reach far enough to give us some real intellectual leverage over non-RDF XML, yet not so far that real applications become impractical today, or at least in the not-too-distant future.

-- Jack Krupansky


