Where is the Semantic Web?
Quite a few people and organizations have been busily slaving away on the development of the Semantic Web for a number of years now, so where exactly is the Semantic Web? Not what stage of development it is at, but where do we go to find it? At a simplistic, operational level the Semantic Web is fragmented and scattered over a significant number of Web servers all around the world. If you know where to look, you can find bits and pieces here and there. The bottom line is that it is still too early in the development of the Semantic Web to think of it as one monolithic (although distributed) "thing" the way we think of the traditional World Wide Web.
In truth, the structure of the Semantic Web is really not a lot different from that of the existing Web. Both consist of files stored on servers that run Web server software and both are based on hyperlinks from one file to another.
But, if you did not know anything about the content of the current Web, where would you start? There actually isn't a logical answer since there is no master "root" of the Web. Sure, you could consider Google to be the place to start, but how would you even know about Google? And even Google doesn't know everything about the Web, at least not in a form that a user could make any sense of.
Back in the early days of the Web (vintage 1994 or 1995) the "answer" was one of:
- Your Web browser was pre-configured with a "home" page that had a bunch of links to interesting Web pages.
- Somebody gave you an explicit URL which you carefully typed into the Web browser address box, or you copied and pasted the URL from an email message.
- You browsed the Yahoo "directory" of registered web sites, including its "What's New" page.
- You used the Lycos search engine from Carnegie Mellon University to search for keywords and then browsed through the results to select a web page. AltaVista and a number of other search engines came along, and eventually Google joined the fray.
- Once you "landed" on one Web page you could follow links from that page to a number of other pages. Rinse and repeat and you could quickly navigate "all over" the Web. Or at least it seemed as if you were navigating "everywhere", although in actuality you were viewing only a very tiny portion of the vast Web, even in those early days.
- Paper trade publications and even the traditional media began to review and highlight Web sites and Web resources. Eventually those publications opened shop online, putting the text of those articles on the Web so that the links to those Web sites and resources could be clicked to navigate to them quickly.
- Businesses advertised their Web addresses in magazines, newspapers, TV, and even billboards, as well as business cards and brochures.
- Gradually, a number of Web portals emerged which endeavored to provide you with dense snapshots of portions of the Web that the authors imagined that you would find useful - news, sports, weather, finance, entertainment, etc.
- Google introduced (or at least popularized) the concept of ranking search results more highly based on popularity or the number of inbound links for each Web page. This allowed users to find higher quality and more relevant Web pages with far less effort.
- Web advertising emerged, providing another technique for informing the user of Web pages that they might find of interest.
- Search engines began "crawling" and indexing ever-larger portions of the total Web, making it more likely that if a Web page existed, then the user could find it if they only had the proper combination of keywords.
- Web site content developers put a considerable amount of effort into soliciting other Web sites to exchange links, both to provide more paths to their sites and to boost their "Google juice" to get a higher ranking in Google's search results.
- Search Engine Optimization (SEO) and Search Engine Marketing (SEM) became full-fledged "disciplines" to increase the likelihood that users would "find" targeted Web sites.
- Web 2.0 emerged with blogs, spaces, and various social media and social networking sites and technologies which enabled mere users and a wide range of professionals to rapidly generate their own content, including links to content that they found interesting.
- Highly specialized Web sites (including Web 2.0 sites) emerged that catered to advising users what they might find interesting, including TechMeme, TechCrunch, Digg, StumbleUpon, and Twitter.
That's a brief summary of where we are today with the traditional Web in terms of how users can view the available content and answer the question "Where is the Web?" In short, there are plenty of "arrows" pointing users to an interesting subset of the total World Wide Web.
Unfortunately, the Semantic Web does not have this kind of rich support infrastructure, yet.
Sure, you can do a search for "Semantic Web" in Google, but mostly that will get you resources that describe the Semantic Web and its technologies rather than point you to the Semantic Web itself.
There is a foundational question of the extent to which mere users would even want to know anything about the Semantic Web, since it is all about data rather than the presentation that users are used to with the traditional Web. Instead, it is applications and application developers who "need to know" where the Semantic Web data resides. Still, application developers do need many of the same kinds of tools that are available to traditional Web site developers in order to find what is available on the Semantic Web that they can use. The fact that the Semantic Web architecture encourages code to be able to discover resources directly only makes the problem more difficult, and more interesting.
Some might assert that the Semantic Web should be completely invisible to users, but they are promoting a view that access to data should be controlled by various gatekeepers. In contrast, the view of open data advocates, such as the Open Data Movement, is that there should be no gatekeepers to prevent access to the data or to enforce selective filtering of it. Over time, developers will develop better and better tools that will allow even users to manipulate complex data as directly as they desire. We aren't there yet, but the vision is there. Sure, there will still be plenty of need and demand for ever more sophisticated tools for filtering and presenting data, including so-called mashups for combining data from many sources, but the emphasis is still on transparency so that the user can still discern where the data really came from. No matter how finely or richly data is presented, users should always be able to do their own mashups and filtering of data, as they see fit. The bottom line is that users should have direct access to the data of the Semantic Web, and hence that the Semantic Web must be visible. But Semantic Web data will also in some cases be integrated with traditional Web applications so that users may indirectly "access" the Semantic Web without being aware that the Semantic Web is being accessed or that it even exists at all.
Another model is that the Semantic Web would be more of an on-call phantom, lurking in the background, but always available to be brought to the foreground if and when the user desires. Maybe the user will generally see a more traditional Web page interface, but occasionally drill down to examine the data more closely. For example, a Web page might present a conclusion, but the user may want to see the justification or provenance for that conclusion.
Still, even if the user does occasionally wish to see actual data, in general the Semantic Web should vanish into transparent ubiquity, meaning that it is always there, always everywhere, but generally is effectively invisible. But even if that is the case, users will on occasion still want to know where the data is and how to access and use it.
Eventually, as the Semantic Web does become ubiquitous, it will merge with the traditional Web so that there will once again be only one Web, but there will still be the conception of the Web of data that lies beneath the surface UI and presentation layer.
For now, how do you find out what is available on the Semantic Web? I'll summarize some of the current techniques:
- Subscribe to various Semantic Web email lists and simply read about Semantic Web resources as they are discussed. In some cases projects are mentioned and you can visit the project web site to find out where the relevant Semantic Web data resources reside.
- Ditto for trade journals and conference proceedings for the Semantic Web.
- A friend or colleague emails you a link to Semantic Web data.
- Using a data browser such as Tabulator, view a Semantic Web data source and then navigate data links much as you navigate links from a traditional Web page.
- Check out the wiki for the more recent Linking Open Data (LOD) community project. One wiki page lists many of the known Semantic Web Linked Data datasets for the emerging Web of Linked Data. There is a nice bubble diagram that shows the various LOD datasets and their relationships. This represents the best overall view of the Semantic Web, to date.
- People are beginning to create search engine-like "crawlers" to index the known fragments of the LOD portion of the Semantic Web as caches of the LOD cloud. For example, OpenLink Software provides a cache of the LOD cloud that supports text searches and queries.
- There are also some experimental semantic web search engines such as Swoogle.
- Various semantic databases, such as Freebase, are beginning to provide Linked Data interfaces.
- Vendors are beginning to promote Semantic Web data that they are beginning to provide, either as RDF files or as so-called SPARQL endpoints.
- Some vendors are providing access to their underlying relational databases, once again in the form of SPARQL endpoints.
- With Linked Data, once you access one element of data you will generally have the opportunity to navigate to other, linked data, much as you would navigate the traditional Web by following links.
- RDFa permits the embedding of Semantic Web data within HTML Web pages, so that the traditional Web and the Semantic Web can in at least some situations be co-located.
- Google and Yahoo are in the early stages of experimenting with Semantic Web technologies, so we can expect that users will eventually be able to "find" interesting portions of the Semantic Web directly from our traditional search engines.
- Plug-ins for traditional Web browsers are available or under development or in the research stage so that users will eventually be able to "see" the Semantic Web directly from the Web browser.
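The link-following idea that runs through several of the items above (data browsers, Linked Data) can be sketched in a few lines of Python. This is only a rough illustration, not how Tabulator or any real crawler works: the `TOY_WEB` dictionary, the `crawl` and `is_uri` helpers, and every example.org URI are made up for this sketch, and the dictionary lookup stands in for what would really be an HTTP fetch followed by RDF parsing. Only the FOAF and RDFS property URIs are real vocabulary terms.

```python
from collections import deque

# Hypothetical toy data: each key is a resource URI, each value is the list of
# (predicate, object) pairs a server might return when that URI is dereferenced.
# All example.org URIs are invented for illustration.
TOY_WEB = {
    "http://example.org/people/alice": [
        ("http://xmlns.com/foaf/0.1/name", '"Alice"'),
        ("http://xmlns.com/foaf/0.1/knows", "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://xmlns.com/foaf/0.1/name", '"Bob"'),
        ("http://xmlns.com/foaf/0.1/based_near", "http://example.org/places/boston"),
    ],
    "http://example.org/places/boston": [
        ("http://www.w3.org/2000/01/rdf-schema#label", '"Boston"'),
    ],
}


def is_uri(term):
    """Treat anything starting with http:// or https:// as a followable link."""
    return term.startswith(("http://", "https://"))


def crawl(start, lookup, max_resources=100):
    """Breadth-first walk that follows data links outward from one resource.

    `lookup` stands in for an HTTP fetch plus RDF parse; here it is a dict.
    Returns the URIs in the order they were first discovered.
    """
    seen = [start]
    queue = deque([start])
    while queue:
        subject = queue.popleft()
        for _predicate, obj in lookup.get(subject, []):
            if len(seen) >= max_resources:
                return seen  # stop once the cap on discovered resources is hit
            if is_uri(obj) and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


visited = crawl("http://example.org/people/alice", TOY_WEB)
print(visited)
```

Starting from a single known resource, the walk discovers every resource reachable through data links, which is the Semantic Web analogue of landing on one Web page and clicking onward from there.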
That's what I have discovered so far and my search is only in the early stages. I am sure there are additional resources (about resources) that I have not yet discovered, and the "industry" is still in the early development stages, maybe comparable to the Web in, say, 1994, before Yahoo appeared on the scene and helped popularize a user-friendly approach to promoting Web resources.
Some loose ends:
- How does non-RDF XML-based data relate to the Semantic Web and Linked Data (Linking of Open Data)?
- How do RSS feeds relate to the Semantic Web? RSS feeds are problematic in at least one sense: they are frequently only a severe subset of the available data, so they certainly do not provide full access to the underlying data.
- How does data in online text files and non-W3C data formats, including CSV and spreadsheet files that users can directly access from the Web, fit in? Some sort of automated translation or "adaptor module" approach is needed so that such data can be accessed as if it were in a Semantic Web format.
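To make the "adaptor module" idea in the last item a little more concrete, here is a minimal Python sketch that turns CSV text into N-Triples lines. It is one possible approach under stated assumptions, not an established tool: the `csv_to_ntriples` and `escape_literal` functions, the `BASE` namespace, and the sample data are all hypothetical, each row is assumed to have an identifier column, and every cell value is emitted as a plain string literal with no datatype guessing.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, chosen for illustration


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text, id_column, base=BASE):
    """Translate CSV text into a list of N-Triples lines.

    Each row becomes a subject URI built from `id_column`; each other
    non-empty cell becomes one triple whose predicate is named after the column.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}row/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, surplus cells (column is None), and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{base}column/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


SAMPLE = "id,name,city\n1,Alice,Boston\n2,Bob,\n"
triples = csv_to_ntriples(SAMPLE, id_column="id")
for line in triples:
    print(line)
```

A real adaptor would also need to map columns onto shared vocabularies and link values to existing resources; minting a private predicate per column, as done here, makes the data readable as RDF but does not yet connect it to anything else.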
Maybe one overly simplistic answer to my question is that the Semantic Web is spread all over the place, and you just need to know where and how to look for it.