Linked Data for RESTafarians

2009-10-09

So, you took the red pill? You’re a full-blown RESTafarian brother? Good news for you, then. You’ll understand linked data in less than 30 seconds. Ok. Step by step. REST, understood as a ‘set of constraints that inform an architecture’:

  1. Resource Identification
  2. Uniform Interface
  3. Self-Describing Messages
  4. Hypermedia Driving Application State
  5. Stateless Interactions

… and now read the linked data principles with your ‘REST goggles’ on:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs, so that they can discover more things.

In linked data, we use HTTP URIs for everything: for documents, but also for concepts and real-world entities such as people. Linked data provides a uniform (read-only) interface through HTTP GET. The messages are self-describing through RDF and RDF-based vocabularies. And through the last of the linked data principles, what we have in the LOD cloud is a highly connected (or: interlinked) system.
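The ‘look up a URI, get RDF back’ step can be sketched in a few lines of Python. This is only a minimal sketch using the standard library; the example URI and the media types chosen for the Accept header are assumptions for illustration, and the request is merely built here, not sent.

```python
from urllib.request import Request

# Media types for two common RDF serializations, with HTML as a fallback
# (the exact preference order is an assumption for this example).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def build_lookup(uri: str) -> Request:
    """Build a read-only (GET) request that asks for an RDF representation."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# Hypothetical URI that names a person rather than a document.
req = build_lookup("http://example.org/people/alice")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; what comes back depends entirely on how the publisher has set up content negotiation.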

As nicely described by Leonard Richardson and Sam Ruby in RESTful Web Services, you design a RESTful (ROA) system in that you:

  • identify and name your resources (using HTTP URIs),
  • design your representations (documents & data), and
  • link the resources to each other.

You’ll typically end up in a 3D design space such as the following (kudos to Cesare Pautasso and Erik Wilde):
REST-design-space

The same actually happens when you publish linked data, with some simplifications. Due to the read-only characteristic of linked data, you only have to worry about one HTTP verb (GET). With RDF as the unified data model, you pick one of the RDF serializations based on your preferences and needs (preferably RDFa, as it nicely integrates with HTML and hence allows you to serve both humans and programs). When you have your data in RDF (or so ;) you’ll mainly find yourself worrying about how to interlink it with other data on the Web. But this really is a huge benefit – finally enabling us to use the Web as one huge database.

As an aside: I’m aware of the fact that we still need to sort out some issues along the way, both in academia and in practice. However, I encourage people in both camps (RESTful yadayada and Linked Data rogues) to look beyond their own noses and eventually understand that there is only one Web and we all ‘live’ in it ;)


Data Lifecycle

2009-09-14

Though the following might seem obvious to some of you, I thought I’d take the time to write some lines about the data life-cycle on the Web and try to highlight some implicit assumptions and processes. We all know the old story: data itself may not be very exciting, and people actually like applications rather than scrolling through endless tables or viewing a CSV file in a text editor. However, data is what ultimately drives applications and, to a certain extent, our lives.

When my family moved over from Austria recently, I experienced personally how much data is involved in our everyday life. Want to find a nice house? Searching for it requires quite some data (on both ends) such as location, price range, number of rooms, etc. Have to register? Again, data is needed (insurance numbers, birth dates, etc.). Looking for a new car? New mobile phone contract? etc.

Ok, you get the idea. We need the data. It is not an end in itself, though. I want to relocate, buy stuff, sell stuff, find a new job (ahm, not really, right – this was just an example ;) and so on. For all this I need data. It is not precisely that I’m so much interested in the data, but what I can do with the data. See above.

Enough motivation. What’s the message? Well, so far (essentially the past 15 years) we have seen people using data on the Web. In services, in documents, etc. – traditionally it would look a bit like:

traditional-data-lifecycle

However, we can do better. Two key technologies enable us to get rid of a conceptually unnecessary component (the screen scraper) and offer data directly to the applications (while still serving humans the nice CSS-styled and Ajax-powered HTML pages) – one is a concrete RDF serialization called RDFa, the other is a set of principles called linked data. So, what becomes possible with these two is something like:

wod-data-lifecycle

This is essentially a paradigm shift from consumer-pull (that is, using layout information in HTML to guess the semantics) to publisher-push (that is, the one who publishes the data along with the document explicitly declares what the data is and what its semantics are). All you need is a globally universal and uniform way to refer to entities (such as houses, cars, mobile phones, etc.), which turns out to be URIs, a way to move the data around (you’ve guessed it, it’s HTTP) and a common data model to structure your data (correct, we’re talking about RDF). How does this fit together? Well, these three technologies are the core of linked data, and RDFa is the way to deliver RDF in HTML. Sounds easy? It is ;)
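To make publisher-push a bit more tangible, here is a minimal sketch in Python of how a consumer could read explicitly declared data out of an HTML page instead of guessing from the layout. It only understands the `about`, `property` and `content` attributes and leaves prefixes such as `dc:` unexpanded, so it is nowhere near a real RDFa processor; the house URI and the `ex:` vocabulary are made up for the example.

```python
from html.parser import HTMLParser


class TinyRDFaReader(HTMLParser):
    """Collects (subject, property, literal) triples from a small RDFa subset."""

    def __init__(self, base):
        super().__init__()
        self.stack = [("", base)]  # (tag, current subject) per open element
        self.pending = None        # property still waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # @about sets the subject for this element and everything inside it
        subject = a.get("about", self.stack[-1][1])
        self.stack.append((tag, subject))
        prop = a.get("property")
        if prop and "content" in a:
            self.triples.append((subject, prop, a["content"]))
        elif prop:
            self.pending = prop

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.stack[-1][1], self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None
        # unwind to the matching open tag (tolerates unclosed void elements)
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


page = """
<div about="http://example.org/houses/42">
  <h2 property="dc:title">Nice house in Galway</h2>
  <span property="ex:rooms" content="4">four rooms</span>
</div>
"""

reader = TinyRDFaReader("http://example.org/listing.html")
reader.feed(page)
for triple in reader.triples:
    print(triple)
```

The human still reads ‘four rooms’, while the program gets the value the publisher declared in `content`; no screen scraping involved.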

Ok, enough theory. Now, two things to remember: first, this is not a vision or a dream. It’s reality. You can use it NOW. In your Web site, in your Web application. Second: it’s cheap. Just change the templates that generate the HTML from the RDBMS, or use a CMS which has built-in support for it (for example, in Drupal you can already use it with some tiny configuration changes). And you can test and view the results: for example, using Google’s rich snippets test tool or, say, in a generic Web of Data browser.


Using Linked Data

2009-07-26

Ok, so finally the IEEE Internet Computing article on Exploiting Linked Data to Build Web Applications is available. Though this is a nice first step, much more is needed to advance the field. The goal is to enable people to actually use linked data in their Web applications, rather than ‘only’ publish datasets. Don’t misunderstand me here: it is a good thing to publish on the Web of Data, but ultimately data is meant to be used somewhere, right? Publishing linked data is not an end in itself.

To support this effort, I’m currently compiling a technical report in and for DERI’s Linked Data Research Centre (LiDRC) that looks at current examples of linked data-driven Web applications, gathers good practices and discusses the anatomy of a typical application (in the last part of the report issues and challenges are discussed, as well). So, one of the central contributions is a proposed concept for linked data-driven Web applications, which renders as follows:

A concept of linked data-driven Web applications

The proposed components read as follows:

  • A local RDF store, able to cache results and act as a permanent storage device to track users, etc. We note that an RDF store such as ARC2 or Virtuoso is not a strict requirement, though often it makes sense to manage the RDF data in a native environment.
  • Some logic (a controller) and UI modules implementing the business logic, the User Interface (UI) and the interaction parts of the application. These components are not specific to linked data-driven Web applications, but are typically required and found in the wild.
  • A data integration component, focusing on fetching linked data from the Web of Data, either directly from the LOD cloud or via a semantic indexer such as Sindice or Falcons.
  • A republishing component that eventually exposes parts of the application’s (interlinked) data on the Web of Data. It is a good practice to republish the application’s data, thereby feeding input back into the LOD cloud.
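How the four components above could fit together is sketched below in Python. All class and function names are hypothetical and not taken from the report; the store is a plain in-memory set, the fetcher is injected (and faked here, so nothing touches the network), and all three positions of a triple are assumed to be URIs.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Fetcher = Callable[[str], Iterable[Triple]]


class LocalStore:
    """Stand-in for the local RDF store: an in-memory set of triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triples: Iterable[Triple]) -> None:
        self.triples.update(triples)

    def about(self, subject: str) -> Set[Triple]:
        return {t for t in self.triples if t[0] == subject}


class Integrator:
    """Data integration: fetches triples for a URI and caches them locally."""

    def __init__(self, store: LocalStore, fetch: Fetcher) -> None:
        self.store = store
        self.fetch = fetch

    def load(self, uri: str) -> Set[Triple]:
        if not self.store.about(uri):  # cache miss: go out to the Web of Data
            self.store.add(self.fetch(uri))
        return self.store.about(uri)


class Republisher:
    """Republishing: exposes stored triples as N-Triples-style lines."""

    def __init__(self, store: LocalStore) -> None:
        self.store = store

    def export(self, subject: str) -> List[str]:
        return sorted(f"<{s}> <{p}> <{o}> ." for s, p, o in self.store.about(subject))


def controller(uri: str, integrator: Integrator) -> List[Tuple[str, str]]:
    """Application logic: load a resource and hand its properties to the UI."""
    return sorted((p, o) for _, p, o in integrator.load(uri))


calls = []


def fake_fetch(uri: str) -> List[Triple]:
    # Fake fetcher standing in for a lookup in the LOD cloud or at an indexer.
    calls.append(uri)
    return [(uri, "http://xmlns.com/foaf/0.1/knows", "http://example.org/people/bob")]


store = LocalStore()
integrator = Integrator(store, fake_fetch)
alice = "http://example.org/people/alice"
print(controller(alice, integrator))
print(controller(alice, integrator))  # second call is served from the store
print(Republisher(store).export(alice))
```

Injecting the fetcher keeps the integration component independent of whether the data comes straight from the source or via an indexer.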

I’d be happy to hear from you what you think about this proposal. Any architectural feedback is welcome!


What else?

2009-07-20

So Paul asked recently: Does Linked Data need RDF? If you drink a certain sort of coffee, I guess you are familiar with my answer: What else? ;)

Seriously. Let’s step back for a second and try to work through to the core of the issue from a totally different angle.

Compare a set of predefined, fixed terms for certain domains, easy to use, etc. with a flexible and generic (hence, maybe, a bit more initial effort required) approach for annotating data, that is structured data on the Web. Sounds familiar? You’re right. I assume that you are aware of the old discussion around microformats vs RDFa, right? So, there we go …

Now, if one looks closer into the HTML 4 spec, one finds a bunch of link types, such as next, help, section, etc.; I’m gonna pick two, IMO, important sentences from there:

User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.

Ah, so the targeted consumer of the link is indeed a machine, not a human in the first place. Further:

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.

Ok, so there is a sort of extensibility mechanism defined in the HTML 4 spec as well. Very well! Or?

An analogy might help now to understand the point I’m trying to drive home, here. If you think back to microformats vs. RDFa, the same can be said about HTML 4 link types vs. RDF(a) …

HTML 4 link types as of section 6.12 of the spec are essentially the poor man’s semantic links, directly available in HTML. They are targeting machines (not human users in the first place), but are predefined in a sense and quite limited.
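To see how close these link types already are to typed links between two entities, consider the following minimal Python sketch. It reads `rel` and `href` from `a` and `link` elements and turns them into (document, link type, target) tuples; the document URI and the markup are invented for the example, and the link types are kept as bare names rather than mapped to any vocabulary.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTypeReader(HTMLParser):
    """Turns HTML link types into (document, link type, target) tuples."""

    def __init__(self, doc_uri):
        super().__init__()
        self.doc_uri = doc_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("a", "link") and a.get("rel") and a.get("href"):
            target = urljoin(self.doc_uri, a["href"])
            # @rel holds a space-separated, case-insensitive list of link types
            for link_type in a["rel"].lower().split():
                self.links.append((self.doc_uri, link_type, target))


html = """
<head><link rel="Next" href="chapter3.html"></head>
<body><a rel="help" href="/help">Help</a> <a href="other.html">untyped</a></body>
"""

reader = LinkTypeReader("http://example.org/book/chapter2.html")
reader.feed(html)
for link in reader.links:
    print(link)
```

Each tuple has the shape of a statement about two resources, only that the middle position is a short name from a fixed list rather than something anyone can define and look up.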

If you agree up to here by and large, then the question is really: what is the alternative? What technology out there, deployed, with community support, a set of tools available, etc. is available to represent, in a generic way (needed to write a generic parser), any sort of typed link between two entities on the Web?

RDF.

What else? ;)


Note: credits go out to Juergen Umbrich, with whom I discussed that issue yesterday evening and who inspired me to write the post …


Streamed Linked Data ?

2009-06-08

Though I was not able to attend the 1st International Workshop on Stream Reasoning during ESWC2009 (as I was co-chairing our SPOT workshop) I did enjoy the workshop wrap-up and summary on the last day. Each workshop organizer had ten minutes to talk about how the workshop went, what the upcoming challenges are, etc.

Now, I was sitting in this session and thinking about what might be in it from a linked data perspective. I know, it’s a very limited thing to do: for a hammer, everything looks like a nail, right? ;)

However, let’s for a moment put on our linked data glasses and see what we can find. I’ve earlier argued that microblogging is sort of linked data in disguise. Many more of these ‘micro-content’ things on the Web come to mind: news feeds in Atom or RSS, chat logs, one could even understand mailing lists as sort of slowly progressing, rather heavyweight streaming data sources, etc.

Looking at all these sources, they have a couple of things in common that one needs to address in order to apply linked data to them:

  • addressing the temporal dimension: one needs to keep track of when something happened
  • addressing the order: it can be quite important to know in which order something happened
  • addressing the provenance: one has to keep track where something came from
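Just to illustrate the three issues above (this is not one of the known proposals), a minimal Python sketch could attach a timestamp, a sequence number and a source to every incoming statement. All names and example URIs are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import List, Optional


@dataclass(frozen=True)
class StreamedStatement:
    subject: str
    predicate: str
    obj: str
    source: str        # provenance: where the statement came from
    seen_at: datetime  # temporal dimension: when it was observed
    seq: int           # order: position within the recorded stream


class StreamRecorder:
    """Records incoming statements together with time, order and source."""

    def __init__(self) -> None:
        self._seq = count(1)
        self.log: List[StreamedStatement] = []

    def record(self, s: str, p: str, o: str, source: str,
               seen_at: Optional[datetime] = None) -> StreamedStatement:
        when = seen_at or datetime.now(timezone.utc)
        item = StreamedStatement(s, p, o, source, when, next(self._seq))
        self.log.append(item)
        return item

    def from_source(self, source: str) -> List[StreamedStatement]:
        return [item for item in self.log if item.source == source]


recorder = StreamRecorder()
feed = "http://example.org/feeds/news.atom"
chat = "http://example.org/chat/log"
recorder.record("http://example.org/posts/1", "dc:title", "First post", feed)
recorder.record("http://example.org/msgs/7", "dc:creator", "alice", chat)
recorder.record("http://example.org/posts/2", "dc:title", "Second post", feed)

print([item.seq for item in recorder.from_source(feed)])
```

Whether such bookkeeping belongs next to the triples, as here, or inside the data model itself is exactly the open question.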

There are for sure other issues I’ve not thought of so far, but from the short list above one can already guess that we’ll have to invest some more resources and energy to address the streaming aspect from a linked data perspective.

Let me know your favorite solution to the issues above (some known proposals/work are, among others, named graphs – hey, this should be in RDF core by now, right? – as well as Olaf’s great work on provenance).


Lightning Talks at ESWC09

2009-06-06

I had the pleasure to host the lightning talks at ESWC 2009. The slide sets are now available and soon the talks themselves will be available (including the very interesting discussions) via http://videolectures.net/ …

Some of the talks were quite controversial, most of them related to linked data in a sense – check it out!


Technology MalBestPracticing

2009-05-24

Reading RESTful Web Services by Leonard Richardson and Sam Ruby, it suddenly struck me like thunder: yes indeed, it’s very often the case that technologies are (often unknowingly) abused in obscure ways, which is then perceived by the community as good or best practice. So much for the generic introduction explaining the title – let’s flesh it out ;)

A couple of years ago I used to develop Web applications using JSP and relational databases (RDB). One pattern I often found (and, I have to admit, followed myself in pretty much the same way): treating the RDB only as a dump store without exploiting the features it offers. So, you load whatever you need via some SQL command in the beginning, process it in memory and when you’re done you dump it back again into the RDB. Is this the way RDBs are supposed to be used? Certainly not.

Then, as motivated by the RESTful WS book, HTTP naturally provides a set of methods for CRUD operations. However, certain so-called ‘Web’ solutions merely use HTTP as a transport protocol and redefine most of the logic in rather complex ways (RPC-style being one example, but hybrids also exist that partially use HTTP for reading, but define their own mechanisms to update resources).

Anyway, there seems to be a pattern, and now I was wondering whether we know of such MalBestPracticing in the RDF world as well. What comes to mind are the following (ok, very roughly, but feel free to add yours):

  • Using RDF in a closed-world setup: often seen and often seen failing. Whenever you have a closed-world application, that is, something that’s supposed to do a job in an environment you entirely control (Intranet, desktop, etc.) and there is no need to share/incorporate other data, using RDF is probably not a smart choice. You’re better off with the RDB of your choice and some hand-coded rules, both in terms of complexity and performance.
  • Thinking of RDF on the serialisation level. Yes, there are a couple of RDF serialisations such as RDF/XML, RDFa, Turtle, etc. but that’s not the point. If I want to, I can put my RDF glasses on and view (nearly) everything as RDF, but one should rather think of RDF on the data model level. The important point is that RDF provides a way to express structured data as a graph, which happens to have the same shape as the Web from a morphological point of view.
  • No interlinking between data. Huh, that’s a heavy one. Publishing RDF without interlinking to other data in RDF out there. But to be fair, this has been properly addressed by TimBL in his LinkedData note and the community has picked it up since. Imagine HTML documents on all of the computers in the world … without a single hyperlink between them. Would you call that the Web? Certainly not. Believe it or not, this was more or less what we’ve been doing for more than six years or so in the Semantic Web.

So, what’s your favorite MalBestPracticing in the Semantic Web world?


Discussing POWDER and discovery mechanisms on the Web …

2009-05-21

My colleague Juergen Umbrich and I had a reading group on POWDER and related technologies yesterday here at DERI. There were some interesting questions and discussion around that, esp. regarding the involved costs for implementing such mechanisms, the use cases and the progress in this area.

The resource and metadata discovery domain seems, all in all, still in its early days, and people with different backgrounds (just compare POWDER with XRD) should start talking to each other.

What is your take on it?


Toying around with (embedded) WebAccessControl

2009-05-18

So TimBL has provided a nice write-up on a Web of Data version of a simple authorization scheme and protocol called WebAccessControl (WAC). It includes a draft of a vocabulary and a protocol (see also open issues with it). I thought it might be nice to have a visual representation of the schema and hence fired up my OmniGraffle app, yielding:
wac-acl-vis
There are already some first implementations for WAC (for example Joe Presbrey’s Apache mod). Actually, I was first pondering whether to implement WAC in PHP, but then Melvin Carvalho pointed out that this is in the pipe for foaf.me anyway (I might contribute to that, let’s see ;)

Finally I ended up toying around with WAC in RDFa. The result is WACup, an experimental embedded-WAC explorer/viewer for RDFa-marked-up WAC policies:
wac-up-screenshot

Now, how about taking this idea further? Say, you have generated a WAC file in RDF/XML and want to inject that into the DOM? You could dynamically decorate certain resources and people (using an advanced version of WACup) would have an idea what they are allowed to do with it.
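What such a check could boil down to, once the policies are loaded, is sketched below in Python. The term names (`accessTo`, `agent`, `mode`, `Read`, `Write`) follow the ACL vocabulary at http://www.w3.org/ns/auth/acl# as I understand it and may differ from the current draft; the policies are plain dictionaries rather than parsed RDF, and all example URIs are made up.

```python
ACL = "http://www.w3.org/ns/auth/acl#"
READ, WRITE = ACL + "Read", ACL + "Write"

ALICE = "http://example.org/people/alice#me"
BOB = "http://example.org/people/bob#me"
REPORT = "http://example.org/docs/report"

# Each policy says: this agent may access this resource in these modes.
policies = [
    {"accessTo": REPORT, "agent": ALICE, "mode": {READ, WRITE}},
    {"accessTo": REPORT, "agent": BOB, "mode": {READ}},
]


def allowed(policies, agent, resource, mode):
    """True if at least one policy grants the agent this mode on the resource."""
    return any(
        p["accessTo"] == resource and p["agent"] == agent and mode in p["mode"]
        for p in policies
    )


print(allowed(policies, ALICE, REPORT, WRITE))
print(allowed(policies, BOB, REPORT, WRITE))
```

A viewer that decorates resources in the page would call something like `allowed` for the current user and each resource, and show the outcome next to it.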

Comments? Ideas? Feature requests? I’d love to hear your opinion on this topic!

UPDATE: If you want to learn how this fits into the big picture regarding a write-enabled Web of Data, have a look at http://esw.w3.org/topic/WriteWebOfData


Exploring linked data inline

2009-04-30

So there are a couple of great RDF browsers out there already, such as Tabulator, ODE or VisiNav, but what I haven’t seen so far is what I call inline-browsing. The aforementioned browsers all more or less take the entire RDF graph and render some sort of tabular or other view (time line, map, etc.).

However, RDFa gives you the possibility to take the context of an RDF statement into account. Take for example a longish FOAF document. The HTML structure already gives you a nice hint where what is discussed. You might be only interested in the social network one has or the contact details. This contextual information is lost when one switches over to the pure, global RDF view.

Well, last Sunday I sat down and hacked – as a proof of concept – lidaman, an inline linked data browser for RDFa. You can integrate it into your site using the source, or install the bookmarklet and play around in the sandbox. It then looks like the following:

lidaman screenshot