Though the following might seem obvious to some of you, I thought I’d take the time to write a few lines about the data life-cycle on the Web and try to highlight some implicit assumptions and processes. We all know the old story: data itself may not be very exciting, and people actually prefer applications to scrolling through endless tables or viewing a CSV file in a text editor. However, data is what ultimately drives applications and, to a certain extent, our lives.
When my family moved over from Austria recently, I experienced first-hand how much data is involved in our everyday life. Want to find a nice house? Searching for it requires quite a bit of data (on both ends), such as location, price range, number of rooms, etc. Have to register? Again, data is needed (insurance numbers, birth dates, etc.). Looking for a new car? A new mobile phone contract? And so on.
Ok, you get the idea. We need the data. It is not an end in itself, though. I want to relocate, buy stuff, sell stuff, find a new job (ahm, not really, right – this was just an example ;) and so on. For all this I need data. It is not so much that I’m interested in the data itself, but in what I can do with it. See above.
Enough motivation. What’s the message? Well, so far (essentially the past 15 years) we have seen people using data on the Web. In services, in documents, etc. – traditionally it would look a bit like:
However, we can do better. Two key technologies enable us to get rid of a conceptually unnecessary component (the screen scraper) and offer data directly to the applications (while still serving humans the nice CSS-styled and Ajax-powered HTML pages) – one is a concrete RDF serialization called RDFa, the other is a set of principles called linked data. So, what is possible with the above-mentioned technologies is something like:
This is essentially a paradigm shift from consumer-pull (that is, using layout information in HTML to guess the semantics) to publisher-push (that is, the one who publishes the data along with the document explicitly declares what the data is and what its semantics are). All you need is a universal and uniform way to refer to entities (such as houses, cars, mobile phones, etc.), which turns out to be URIs, a way to move the data around (you’ve guessed it, it’s HTTP) and a common data model to structure your data (correct, we’re talking about RDF). How does this fit together? Well, these three technologies are the core of linked data, and RDFa is the way to deliver RDF in HTML. Sounds easy? It is ;)
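To make the publisher-push idea a bit more tangible, here is a minimal Python sketch of what an application can do once the page declares its own data. It is not a real RDFa processor: it only understands a tiny subset of the attributes (about, vocab, property and content), and the page fragment, the example.org URIs and the class name TinyRDFaReader are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the URIs and values are invented for this example.
PAGE = """
<div about="http://example.org/houses/42" vocab="http://example.org/vocab#">
  <h1 property="name">Nice house near the park</h1>
  <span property="price" content="350000">350,000 EUR</span>
</div>
"""


class TinyRDFaReader(HTMLParser):
    """Collects (subject, predicate, object) triples from a small RDFa subset.

    Only `about`, `vocab`, `property` and `content` are handled here;
    a real RDFa processor covers far more (prefixes, typeof, nesting, ...).
    """

    def __init__(self):
        super().__init__()
        self.subject = None   # URI of the entity currently being described
        self.vocab = ""       # vocabulary URI that property terms are appended to
        self.pending = None   # predicate still waiting for the element's text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "property" in a:
            predicate = self.vocab + a["property"]
            if "content" in a:
                # The machine-readable value is given explicitly.
                self.triples.append((self.subject, predicate, a["content"]))
            else:
                # Otherwise the element's text is the value.
                self.pending = predicate

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


reader = TinyRDFaReader()
reader.feed(PAGE)
for triple in reader.triples:
    print(triple)
```

Note that nothing in this sketch looks at layout: the subject is a URI, the predicates are URIs, and the price comes from the content attribute rather than from guessing what "350,000 EUR" means.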
Ok, enough theory. Now, two things to remember: first, this is not a vision or a dream. It’s reality. You can use it NOW. In your Web site, in your Web application. Second: it’s cheap. Just change the templates that generate the HTML from the RDBMS, or use a CMS which has built-in support for it (for example, in Drupal you can already use it with some tiny configuration changes). And you can test and view the results: for example, using Google’s rich snippets test tool or, say, in a generic Web of Data browser.
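To give a rough idea of how small the template change can be, here is a hedged Python sketch. The row, its column names, the example.org base URI and vocabulary, and the two function names are assumptions for the sake of the example; your own templating system will look different, but the kind of change is the same.

```python
from html import escape

# Assumed values for illustration only.
BASE = "http://example.org/houses/"
VOCAB = "http://example.org/vocab#"

# Hypothetical row as it might come out of the RDBMS.
row = {"id": 42, "name": "Nice house & garden", "price": 350000}


def render_plain(row):
    """The old template: fine for humans, applications have to guess."""
    name = escape(row["name"])
    price = int(row["price"])
    return (
        "<div>\n"
        f"  <h1>{name}</h1>\n"
        f"  <span>{price:,} EUR</span>\n"
        "</div>"
    )


def render_rdfa(row, base=BASE, vocab=VOCAB):
    """The same template with RDFa attributes added; the visible text is unchanged."""
    subject = escape(f"{base}{row['id']}")
    name = escape(row["name"])
    price = int(row["price"])
    return (
        f'<div about="{subject}" vocab="{escape(vocab)}">\n'
        f'  <h1 property="name">{name}</h1>\n'
        f'  <span property="price" content="{price}">{price:,} EUR</span>\n'
        "</div>"
    )


print(render_plain(row))
print(render_rdfa(row))
```

The only difference between the two functions is a handful of attributes, which is why the cost is mostly a matter of touching the templates once.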