Web 2.0 is on everyone’s lips these days. Dynamic, interactive, social, it has basically become the embodiment of a new and hip lifestyle. But while everyone is busy making a fuss about something that is already old news, computer scientists are thinking up the Web’s next version. And its use of semantics might very well surprise you: it will empower machines to understand the meaning of the Web pages they display.
Imagine a world in which your personal netbot assistant crawls the vast expanses of the Web collating information for you, in which search engines ask you questions to refine or disambiguate your searches so that their results are always spot on, in which Web pages are made on the fly especially for you. Stop dreaming: such things already exist (albeit in limited form) or very soon will. Welcome to the world of the semantic Web: the Web 3.0.
The Web’s current limitations
As we know it today, the Web is full of limitations, and even though we manage to sidestep them as best we can, they have prompted computer scientists to start thinking of ways to overcome them. The first obvious snag is information retrieval: key the search term “jaguar” into any search engine and you will probably get, more or less haphazardly, results about the animal, the car, or even a previous version of Mac OS X. Undeterred, some might resort to typing “Panthera onca” (the jaguar’s scientific name) but, given their initial search, they would then miss altogether the results in which the word “jaguar” does not appear.
This is not the only problem the Web has with information: it is for now also utterly incapable of extracting information from a picture (or a text displayed as an image, for example) other than from the words surrounding it. Indeed, only a human agent is presently capable of making sense of such things and of deriving new information via logical deductions.
Another limitation lies in the maintenance of complex websites pointing to lots of external content on the Web: if any part of that content changes location (and therefore leaves a broken link behind), nothing today can redirect the user to the content’s new location so that it can still be found.
A fourth shortcoming of today’s Web is its lack of personalization. Consider the ubiquitous Google: the search engine adapts to your behavior by analyzing your previous searches. The problem is that Google does not consider the context in which a search is made: you might, just this once, be looking up information to help your sister with her upcoming presentation, and will therefore not search the Web the way you usually do.
The solution is rather simple: for meaning to be interpreted correctly by machines, it needs to be encoded into semantic metadata they can read. This will allow the Web to process information in an automated way, relate and combine data which would have seemed incongruous before, and deduce implicit information from existing knowledge, resulting in a global database containing a universal network of semantic propositions.
Describing content semantically with RDF
This encoding of meaning can easily be achieved through straightforward processing of information with a few frameworks serving as guidelines. Consider the following simple fact: “John Doe’s phone number is 414-566-7733”. Represented in XML, this might yield results as diverse as:
<person name="John Doe">
  <phone>414-566-7733</phone>
</person>
<person name="John Doe" phone="414-566-7733" />
This is obviously a problem, as neither of these gives a clear rendition of the link between John and the provided phone number. The solution lies in decomposing the fact very simply into a Subject + Predicate + Object triple: Subject = John Doe, Predicate = has phone number, Object = 414-566-7733. This triple structure serves as the basis for the Resource Description Framework (RDF), a framework used to describe and represent simple facts on the semantic Web (http://www.w3.org/RDF/). Facts can thus be connected to one another according to logical relationships, allowing for semantic exploration of content. RDF is already implemented on websites such as http://dbpedia.org (the semantic version of Wikipedia) and will soon spread to other sites: the BBC music site (www.bbc.co.uk/music) uses automated information retrieval to generate artist pages that are always up to date thanks to semantic information.
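To make the triple idea concrete, here is a minimal sketch in Python of a toy triple store. The facts and the `query` helper are invented for illustration; real RDF identifies subjects and predicates with URIs rather than plain strings.

```python
# A toy triple store: every fact is a (subject, predicate, object) triple.
triples = {
    ("John Doe", "has phone number", "414-566-7733"),
    ("John Doe", "works for", "Acme Corp"),  # hypothetical extra fact
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None acts as a wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# What is John Doe's phone number? The pattern leaves the object open.
print(query("John Doe", "has phone number"))
```

Because every fact shares the same three-part shape, the same wildcard query works for any question, which is precisely what the XML encodings above could not guarantee.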
But even the RDF system is not enough for a computer to understand the world as we do. Take the example of any historical character, say Abraham Lincoln. Some biographical facts about him, represented thanks to RDF, might look like:
Abraham Lincoln, has birth date, 02/12/1809
Abraham Lincoln, has death date, 04/15/1865
(for more information about RDF and its actual syntax, see http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/)
While the conclusion might seem obvious to any human, these two triples are not sufficient for a computer to deduce that Abraham Lincoln is dead. That is why another layer, made up of ontologies (expressed in languages such as OWL, the Web Ontology Language) and rules, governed by logic, needs to be added: it ensures that the computer knows the constraints that make up the world as we know it, namely, in this case, that having a death date means that one is dead, and that being dead absolutely excludes being alive.
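As a rough illustration of what such a rule layer does, the following Python sketch applies the two rules above to the Lincoln triples by forward chaining. The string predicates and the `infer` helper are hypothetical simplifications, not actual OWL syntax.

```python
# RDF-style triples about Abraham Lincoln (hypothetical string predicates).
triples = {
    ("Abraham Lincoln", "has birth date", "1809-02-12"),
    ("Abraham Lincoln", "has death date", "1865-04-15"),
}

def infer(facts):
    """Apply the rules repeatedly until no new fact is produced (forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(facts):
            # Rule 1: having a death date entails being dead.
            if p == "has death date" and (s, "is", "dead") not in facts:
                facts.add((s, "is", "dead"))
                changed = True
            # Rule 2: being dead excludes being alive.
            if p == "is" and o == "dead" and (s, "is", "alive") in facts:
                facts.discard((s, "is", "alive"))
                changed = True
    return facts

facts = infer(triples)
print(("Abraham Lincoln", "is", "dead") in facts)  # True
```

The machine never "knows" who Lincoln was; it merely applies the stated constraints to the stated triples, which is exactly the division of labor the ontology layer provides.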
Although we have come a long way from the days of static pages, there is still a lot to be done toward the implementation of semantics on the Web, but the possibilities are breathtakingly immense. In terms of content indexing for further use in non-linear contexts, the use of semantics could provide users with a unique and more natural experience, much closer to how their own minds operate.
This article was largely inspired by the lectures given by Prof. Harald Sack of the Hasso Plattner Institute at the University of Potsdam, Germany, in the course of his MOOC about the Semantic Web (https://openhpi.de/course/semanticweb).