Data Lifecycle

2009-09-14

Though the following might seem obvious to some of you, I thought I’d take the time to write some lines about the data life-cycle on the Web and try to highlight some implicit assumptions and processes. We all know the old story: data itself may not be very exciting, and people actually like applications rather than scrolling through endless tables or viewing a CSV file in a text editor. However, data is what ultimately drives applications and, to a certain extent, our lives.

When my family moved over from Austria recently, I experienced personally how much data is involved in our everyday life. Want to find a nice house? Searching for it requires quite some data (on both ends) such as location, price range, number of rooms, etc. Have to register? Again, data is needed (insurance numbers, birth dates, etc.). Looking for a new car? New mobile phone contract? etc.

Ok, you get the idea. We need the data. It is not an end in itself, though. I want to relocate, buy stuff, sell stuff, find a new job (ahm, not really, right – this was just an example ;) and so on. For all this I need data. It is not so much that I’m interested in the data itself, but in what I can do with it. See above.

Enough motivation. What’s the message? Well, so far (essentially the past 15 years) we have seen people using data on the Web. In services, in documents, etc. – traditionally it would look a bit like:

[Figure: traditional data lifecycle]

However, we can do better. Two key technologies enable us to get rid of a conceptually unnecessary component (the screen scraper) and offer data directly to the applications (while still serving humans the nice CSS-styled and Ajax-powered HTML pages) – one is a concrete RDF serialization called RDFa, the other is a set of principles called linked data. So, what becomes possible with these two is something like:

[Figure: Web of Data lifecycle]

This is essentially a paradigm shift from consumer-pull (that is, using layout information in HTML to guess the semantics) to publisher-push (that is, the one who publishes the data along with the document explicitly declares what the data is and what its semantics are). All you need is a global and uniform way to refer to entities (such as houses, cars, mobile phones, etc.), which turns out to be URIs, a way to move the data around (you’ve guessed it, it’s HTTP) and a common data model to structure your data (correct, we’re talking about RDF). How does this fit together? Well, these three technologies are the core of linked data, and RDFa is the way to deliver RDF in HTML. Sounds easy? It is ;)
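To make the publisher-push idea a bit more tangible, here is a minimal sketch in Python. It is not a conformant RDFa parser: it only understands the `about`, `property` and `content` attributes, assumes that every element is properly closed and that elements carrying a `property` contain plain text only. The class name, the `ex:` prefix and the example.org URI are made up for illustration.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, property, value) statements from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = [None]   # stack of subjects, one entry per open element
        self._pending = None      # (subject, property) waiting for its text value
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An element either names a new subject via @about or inherits it.
        subject = a.get("about", self._subjects[-1])
        self._subjects.append(subject)
        if "property" in a:
            if "content" in a:
                # The value is given explicitly by the publisher.
                self.triples.append((subject, a["property"], a["content"]))
            else:
                # The value is the text content of the element.
                self._pending = (subject, a["property"])
                self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, prop = self._pending
            self.triples.append((subject, prop, "".join(self._text).strip()))
            self._pending = None
        self._subjects.pop()


def extract(html):
    """Return the statements the publisher declared in the given HTML."""
    parser = TinyRDFaExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical page fragment about a house (URI and ex: terms are invented).
HOUSE_HTML = """
<div about="http://example.org/house/42">
  <span property="ex:price">250000</span>
  <span property="ex:rooms">4</span>
</div>
"""

if __name__ == "__main__":
    for triple in extract(HOUSE_HTML):
        print(triple)
```

The point is that nothing here guesses at the layout: the application only reads what the publisher declared, and the same HTML still renders fine for humans.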

Ok, enough theory. Now, two things to remember: first, this is not a vision or a dream. It’s reality. You can use it NOW. In your Web site, in your Web application. Second: it’s cheap. Just change your templates, which generate the HTML from the RDBMS, or use a CMS which has built-in support for it (for example, in Drupal you can already use it with some tiny configuration changes). And you can test and view the results: for example, using Google’s rich snippets test tool or, say, in a generic Web of Data browser.


What else?

2009-07-20

So Paul asked recently: Does Linked Data need RDF? If you drink a certain sort of coffee, I guess you are familiar with my answer: What else? ;)

Seriously. Let’s step back for a second and try to work through to the core of the issue from a totally different angle.

Compare a set of predefined, fixed terms for certain domains, easy to use, etc. with a flexible and generic (hence, maybe, a bit more initial effort required) approach for annotating data, that is structured data on the Web. Sounds familiar? You’re right. I assume that you are aware of the old discussion around microformats vs RDFa, right? So, there we go …

Now, if one looks closer into the HTML 4 spec, one finds a bunch of link types, such as next, help, section, etc.; I’m gonna pick two, IMO, important sentences from there:

User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.

Ah, so the targeted consumer of the link is indeed a machine, not a human in the first place. Further:

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.

Ok, so there is a sort of extensibility mechanism defined in the HTML 4 spec as well. Very well! Or?

An analogy might help now to understand the point I’m trying to drive home, here. If you think back to microformats vs. RDFa, the same can be said about HTML 4 link types vs. RDF(a) …

HTML 4 link types, as defined in section 6.12 of the spec, are essentially the poor man’s semantic links, directly available in HTML. They target machines (rather than human users in the first place), but they are predefined and quite limited.
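To illustrate the ‘poor man’s semantic links’ reading, here is a small Python sketch that treats each link type as a typed link between two documents, that is, as a (source, type, target) statement. The class name and the example.org URLs are invented, and using the XHTML vocabulary namespace as the prefix for the link types is an assumption of this sketch rather than anything the HTML 4 spec prescribes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: link types are turned into URIs by prefixing this namespace.
XHV = "http://www.w3.org/1999/xhtml/vocab#"


class LinkTypeReader(HTMLParser):
    """Reads rel/href pairs on <a> and <link> as typed links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        a = dict(attrs)
        rel, href = a.get("rel"), a.get("href")
        if not rel or not href:
            return
        target = urljoin(self.base_url, href)
        # rel holds a space-separated list of case-insensitive link types.
        for link_type in rel.lower().split():
            self.triples.append((self.base_url, XHV + link_type, target))


def typed_links(base_url, html):
    """Return (source, link type URI, target) for every typed link found."""
    reader = LinkTypeReader(base_url)
    reader.feed(html)
    reader.close()
    return reader.triples


if __name__ == "__main__":
    page = '<link rel="next" href="page2.html"><a rel="Help" href="/help">Help</a>'
    for triple in typed_links("http://example.org/docs/page1.html", page):
        print(triple)
```

The middle element is exactly where the limitation shows: with plain link types it can only come from a short, predefined list (or a profile), whereas in RDF it can be any URI.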

If you agree up to here by and large, then the question is really: what is the alternative? What technology out there, deployed, with community support and a set of tools, is available to represent, in a generic way (needed to write a generic parser), any sort of typed link between two entities on the Web?

RDF.

What else? ;)


Note: credits go out to Juergen Umbrich, with whom I discussed that issue yesterday evening and who inspired me to write the post …


Toying around with (embedded) WebAccessControl

2009-05-18

So TimBL has provided a nice write-up on a Web of Data version of a simple authorization scheme and protocol called WebAccessControl (WAC). It includes a draft of a vocabulary and a protocol (see also open issues with it). I thought it might be nice to have a visual representation of the schema and hence fired up my OmniGraffle app, yielding:
[Figure: visualization of the WAC ACL schema]
There are already some first implementations of WAC (for example Joe Presbrey’s Apache mod). Actually, I first pondered implementing WAC in PHP, but then Melvin Carvalho pointed out that this is in the pipeline for foaf.me anyway (I might contribute to that, let’s see ;)

Finally I ended up toying around with WAC in RDFa. The result is WACup, an experimental embedded-WAC explorer/viewer for RDFa-marked-up WAC policies:
[Figure: WACup screenshot]

Now, how about taking this idea further? Say, you have generated a WAC file in RDF/XML and want to inject that into the DOM? You could dynamically decorate certain resources and people (using an advanced version of WACup) would have an idea what they are allowed to do with them.
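As a rough idea of what such a decoration step would have to compute, here is a Python sketch that answers ‘what may this agent do with this resource?’ from a set of WAC statements given as plain triples. It assumes the term names `acl:Authorization`, `acl:agent`, `acl:accessTo` and `acl:mode` under the `http://www.w3.org/ns/auth/acl#` namespace; the draft vocabulary discussed above may differ in detail, and the policy, function name and example.org URIs are made up.

```python
# Assumed namespace and term names; check them against the actual WAC draft.
ACL = "http://www.w3.org/ns/auth/acl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def allowed_modes(triples, agent, resource):
    """Return the set of access modes granted to agent for resource."""
    # Group the statements by subject so each authorization can be inspected.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, set()).add(o)

    modes = set()
    for props in by_subject.values():
        if ACL + "Authorization" not in props.get(RDF_TYPE, set()):
            continue
        grants_agent = agent in props.get(ACL + "agent", set())
        covers_resource = resource in props.get(ACL + "accessTo", set())
        if grants_agent and covers_resource:
            modes |= props.get(ACL + "mode", set())
    return modes


# Hypothetical policy: Alice may read and write a document, Bob may only read it.
DOC = "http://example.org/doc"
ALICE = "http://example.org/alice#me"
BOB = "http://example.org/bob#me"
POLICY = [
    ("_:auth1", RDF_TYPE, ACL + "Authorization"),
    ("_:auth1", ACL + "accessTo", DOC),
    ("_:auth1", ACL + "agent", ALICE),
    ("_:auth1", ACL + "mode", ACL + "Read"),
    ("_:auth1", ACL + "mode", ACL + "Write"),
    ("_:auth2", RDF_TYPE, ACL + "Authorization"),
    ("_:auth2", ACL + "accessTo", DOC),
    ("_:auth2", ACL + "agent", BOB),
    ("_:auth2", ACL + "mode", ACL + "Read"),
]

if __name__ == "__main__":
    print(sorted(allowed_modes(POLICY, ALICE, DOC)))
    print(sorted(allowed_modes(POLICY, BOB, DOC)))
```

A viewer could then attach the resulting modes to the corresponding element in the page, for instance as a small badge next to the resource.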

Comments? Ideas? Feature requests? I’d love to hear your opinion on this topic!

UPDATE: If you want to learn how this fits into the big picture regarding a write-enabled Web of Data, have a look at http://esw.w3.org/topic/WriteWebOfData


Exploring linked data inline

2009-04-30

So there are a couple of great RDF browsers out there already, such as Tabulator, ODE or VisiNav, but what I haven’t seen so far is what I call inline-browsing. The aforementioned browsers all more or less take the entire RDF graph and render some sort of tabular or other view (time line, map, etc.).

However, RDFa gives you the possibility to take the context of an RDF statement into account. Take for example a longish FOAF document. The HTML structure already gives you a nice hint as to what is discussed where. You might only be interested in the social network one has, or in the contact details. This contextual information is lost when one switches over to the pure, global RDF view.
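Here is a small Python sketch of what ‘keeping the context’ could mean in practice: every RDFa annotation is filed under the heading it appears below, so the social network part and the contact details stay apart. This is only an illustration and not how lidaman works internally; the class name, the sample markup and the choice of headings as context are assumptions.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ContextualProperties(HTMLParser):
    """Groups property/rel annotations by the nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.by_context = {}
        self._context = None        # text of the most recent heading, if any
        self._in_heading = False
        self._heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading = True
            self._heading_text = []
            return
        a = dict(attrs)
        for name in ("property", "rel"):
            if name in a:
                self.by_context.setdefault(self._context, []).append(a[name])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._in_heading:
            self._in_heading = False
            self._context = "".join(self._heading_text).strip()


def properties_by_section(html):
    """Return a dict mapping heading text to the annotations found below it."""
    parser = ContextualProperties()
    parser.feed(html)
    parser.close()
    return parser.by_context


# Hypothetical FOAF page fragment.
FOAF_HTML = """
<div about="#me">
  <h2>Contact</h2>
  <span property="foaf:name">Alice</span>
  <a rel="foaf:mbox" href="mailto:alice@example.org">mail</a>
  <h2>Social network</h2>
  <a rel="foaf:knows" href="http://example.org/bob#me">Bob</a>
</div>
"""

if __name__ == "__main__":
    for section, props in properties_by_section(FOAF_HTML).items():
        print(section, props)
```

In the flat RDF graph the three statements would simply sit next to each other; the grouping by section exists only in the HTML.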

Well, last Sunday I sat down and hacked – as a proof of concept – lidaman, an inline linked data browser for RDFa. You can integrate it into your site using the source, or install the bookmarklet and play around in the sandbox. It then looks like the following:

[Figure: lidaman screenshot]


Alternative approach to escape the ‘non-information’ resource dilemma

2009-02-19

I have just posted to the TAG list my ‘attempt to define non-information resources without using non-information resource’. In case you are not yet subscribed to the TAG mailing list, I’d love to learn about your opinion here ;)