Having read Adam Jacobs’ The Pathologies of Big Data and Stefano Mazzocchi’s Data Smoke and Mirrors, I found myself asking: what is the motivation for people to publish linked data, and in turn to consume it? (Sounds funny, you think? Well, just because the data is available doesn’t necessarily mean it is useful or actually used ;)
Ok, so let’s start with a nice statement from Adam’s ACM article:
“Here’s the big truth about big data in traditional databases: it’s easier to get the data in than out.”
Yup, I think I agree, and I guess the same is true for Linked Data. There are tons of ‘cheap’ ways to publish in RDF (for example, regarding relational databases, we’re currently trying to define a standard). However, there is still a need for high-quality data and high-quality links between the data items in order to allow the data to be used sensibly in applications!
Right, so my hunch is that data providers have a couple of reasons to publish their data in an open and easily accessible way, but I guess one main reason may be that by providing the raw data, one can simply cut costs. Rather than writing a Web application that serves humans and then offering an additional Web service/API on top (as Flickr or Delicious did), one can expose the original data directly via Linked Data and open up the possibility for others to develop cool applications on top of it (see also our recent work in this direction).
On the other hand, data consumers benefit from a single (RESTful) API with a uniform data model (RDF, in case it isn’t that obvious ;), which in turn simplifies application development and allows the reuse of data (just as the BBC no longer has to maintain the artist and song data itself, but reuses MusicBrainz data).
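To make the ‘uniform data model’ point a bit more tangible, here is a minimal sketch in plain Python. It is not how the BBC or MusicBrainz actually do it: the two sources, all the example.org URIs and the helper names (match, artists_played) are made up for illustration, and triples are just tuples rather than real RDF fetched over HTTP. The only thing it is meant to show is that when every publisher uses subject-predicate-object statements, merging two datasets is a set union and one generic lookup function works across both.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object); this mirrors the RDF data model
# but uses plain strings instead of a real RDF library.
Triple = Tuple[str, str, str]

# Hypothetical vocabulary terms (assumptions, not a real vocabulary).
NAME = "http://example.org/vocab/name"
PERFORMED_BY = "http://example.org/vocab/performedBy"
PLAYED = "http://example.org/vocab/played"

# Hypothetical data from a music metadata publisher.
MUSIC_SOURCE: Set[Triple] = {
    ("http://example.org/artist/1", NAME, "Example Band"),
    ("http://example.org/song/7", PERFORMED_BY, "http://example.org/artist/1"),
}

# Hypothetical data from a broadcaster that links to the publisher's song URI
# instead of maintaining its own copy of the artist and song data.
BROADCASTER_SOURCE: Set[Triple] = {
    ("http://example.org/programme/42", PLAYED, "http://example.org/song/7"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def artists_played(graph: Set[Triple], programme: str) -> List[str]:
    """Follow programme -> song -> artist -> name across the merged graph."""
    names = []
    for _, _, song in match(graph, s=programme, p=PLAYED):
        for _, _, artist in match(graph, s=song, p=PERFORMED_BY):
            for _, _, name in match(graph, s=artist, p=NAME):
                names.append(name)
    return names


# Because both sources share one data model, merging is just a set union.
merged = MUSIC_SOURCE | BROADCASTER_SOURCE
```

Calling artists_played(merged, "http://example.org/programme/42") walks links that span both sources. Note that this only works because the broadcaster reused the publisher’s song URI, which is exactly the ‘high-quality links’ requirement from above.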
Let me know – what is your incentive to publish/consume Linked Data?