The incentives to put structured data on the Web seem to slowly seep in, but why does it make sense to link your data to other data? Why to invest time and resources to offer 5 star data? Even though the interlinking itself becomes more of a commodity these days – for example, the 24/7 platform we’re deploying in LATC is an interlinking cloud offering – the motivation for dataset publisher to set links to other datasets is, in my experience, not obvious.
I think it’s important to have a closer look at the motivation for interlinking data on the Web from a data integration perspective. Traditionally, you would download data from, say, Infochimps or you find it via CKAN or via the many other places that either directly offer data or provide a data catalog. Then you would put it in your favorite (NoSQL) database and use it in your application. Simple, isn’t it?
Let’s say you’re using a dataset about companies such as the Central Contractor Registration (CCR) . These companies typically have a physical address (or: location) attached:
Now, imagine I ask you to render the location of a selection of companies on a map. This requires you to look up the geographical coordinates of a company in a service such as Geonames:
I bet you can automate this, right? Maybe a bit of manual work involved, but not too much, I guess. So, all is fine, right?
The next developer that comes along and wants to use the company data and nicely map it has to go through the exact same process. Figure what geo service to use, write some look-up/glue code, import the data and so on.
Wouldn’t it make more sense, from a re-usability point of view, if the original dataset provider (CCR in our example) would have a look at its data and identify what entities (such as companies) are there and provide the links to other datasets (such as location data) up-front? This is, in a nutshell, what Tim says concerning the 5th star of Open Data deployment:
Link your data to other people’s data to provide context.
To sum up: if you have data, think about providing this context – link it to other data in the Web and you make your data more useful and more usable and, in the long run, more used.
PS: the working title of this blog post was ‘As we may link’, to render homage to Vannevar Bush, but then I thought that might be a bit too cheesy ;)