Semantic Web challenges 2009
Here are my current thoughts about the challenges facing the Semantic Web, vintage January 2009:
- Mind the Gap
  - Thesis: There is a dramatic semantic gap between how users think and communicate about knowledge and the mechanisms that the Semantic Web supports for organizing knowledge.
  - Superset of the semantic data mining problem
  - How to jump from comfort with natural language to comfort with the Semantic Web
  - Extent to which the user "sees" the Semantic Web, as opposed to the Semantic Web simply providing more power "under the hood" in a completely transparent manner
- Mind the Gap II
  - How do we map and transition between natural language and the Semantic Web?
  - How to represent natural language in the Semantic Web
  - Concepts, statements, reasoning, processes, prose passages, stories, outlines
- Semantic search engines
  - Not just raw text, semantic inferences as well
  - No single best form for a database; need open access to create specialized databases
- Inference Broker
  - Need for inference brokers to mediate between creators of knowledge and users of knowledge. Due to:
    - Desire for privacy
    - Protection of intellectual property
    - Massive scalability requirements - divide and conquer
    - Division of labor, factoring large problems into smaller problems
- Social structure of knowledge
  - Individuals have only some of the puzzle pieces of knowledge
  - Propositions of uncertain classification
  - Social groups aggregate and classify knowledge
- A medium for intelligent agents
  - Software agents can act more intelligently with a richer, knowledge-centric information stream
- Statements that are not strict, objective facts
  - Personal facts, opinions, speculation, gossip, questions
  - "Creations" - text, graphics, images, audio, video [? Separate challenge?]
- False statements
  - May be outright lies, deceptions, misunderstandings, misstatements, changed information
- Medical record difficulties
  - Pen on paper still most convenient for input
  - Quick human scan of paper still most convenient for browsing
  - Input decision process still far too intrusive
  - Semi-structured and unstructured data still far too inconvenient
- Distributed resource storage
  - Extremely diversified storage to assure timely and efficient access
  - Needs to be part of net infrastructure that is automatic and not subject to human whim and error
- Robust personal and organizational identity, as well as roles and interests
- Authority and provenance identification and tracking
- The cost of knowledge engineering, especially maintenance and testing
  - Who can really afford it?
- Semantic matching challenges
  - Apparent differences that are easily bridged by a human
  - Subtle or apparently insignificant distinctions that a human would say are too significant for a match
    - For example, improper reuse of a resource for a different "meaning"
  - Incomplete matches due to cultural differences
  - Concept matches but with differences in contractual commitments (for services)
- Distributed semantic matching/mapping services
  - Manual creation of libraries of semantic "logic" services for bridging semantic gaps - hide the details of how to get from "A" to "B"
- Support for time and version dimensions of information
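
As a rough illustration of that last item, here is a minimal Python sketch of what time and version dimensions might look like, assuming statements are kept as simple subject-predicate-object records that carry a validity interval and a version number. The `TimedStatement` class, the `as_of` function, and the `ex:` names are all hypothetical, invented for this example; they are not part of RDF or any other Semantic Web standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TimedStatement:
    """A subject-predicate-object statement with a validity interval and a version.

    Hypothetical structure for illustration only.
    """
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    version: int              # higher versions supersede lower ones


def as_of(statements: List[TimedStatement], subject: str,
          predicate: str, when: date) -> Optional[TimedStatement]:
    """Return the highest-version matching statement valid on `when`, or None."""
    candidates = [
        s for s in statements
        if s.subject == subject
        and s.predicate == predicate
        and s.valid_from <= when
        and (s.valid_to is None or when < s.valid_to)
    ]
    return max(candidates, key=lambda s: s.version, default=None)


statements = [
    TimedStatement("ex:alice", "ex:employer", "ex:AcmeCorp",
                   date(2005, 1, 1), date(2008, 6, 1), 1),
    # Version 1 recorded the new employer with a misspelled identifier...
    TimedStatement("ex:alice", "ex:employer", "ex:Initec",
                   date(2008, 6, 1), None, 1),
    # ...and version 2 corrects it for the same time interval.
    TimedStatement("ex:alice", "ex:employer", "ex:Initech",
                   date(2008, 6, 1), None, 2),
]

past = as_of(statements, "ex:alice", "ex:employer", date(2007, 3, 1))
present = as_of(statements, "ex:alice", "ex:employer", date(2009, 1, 1))
print(past.obj)     # ex:AcmeCorp
print(present.obj)  # ex:Initech
```

The sketch keeps the two dimensions separate: the interval says when a statement was true in the world, while the version says which recording of that statement is the most recent. A real system would also have to decide how the two interact with provenance and with inference, which this example does not attempt.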