
On the usage of Linksets

Daniel Koller asked an interesting question on Twitter:

… are linksets today evaluated in an automated way? or does it depend on a person to interpret it?

I'll try to answer this question here, but let’s step back a bit first: in 2008, when I started to dive into ‘LOD metadata’, one of my main use cases was indeed how to automate the handling of LOD datasets. I wanted a formal description of a dataset’s characteristics in order to write a sort of middleware (there it is again, this bad word) that could use the dataset metadata and take away the burden from a human of sifting through the ‘natural language’ descriptions found in the Wiki pages, such as the Dataset page.

Where are we today?

Looking at the deployment of voiD, I guess we can say that there is a certain uptake: several publishers and systems support voiD, and there are dedicated voiD stores available out there, such as the Talis voiD store and the RKB voiD store.

In our LDOW2009 paper, “Describing Linked Datasets”, we outlined a couple of potential use cases for voiD and gave some examples of actual usage. Most notably, Linksets are used for ranking datasets (see the DING! paper) and for distributed query processing.

However, to date I’m not aware of any implementation of the idea I outlined above: a middleware that exploits Linksets. So, I guess one answer to Daniel’s question is: at the moment, it is mainly humans who look at Linksets and use them.

What can be done?

The key to voiD really is its abstraction level. We describe entire datasets and their characteristics, not single resources such as a certain place, a book or a gene. Once you understand that the links are the essence of a truly global, distributed information space, you can see that Linksets are the key to automatically processing LOD datasets, as they carry the high-level metadata about the interlinking.

When you write an application today that consumes data from the LOD cloud, you need to manually code which datasets you are going to use. Now, imagine a piece of software that really operates on Linksets: suddenly, it would be possible to specify certain requirements and capabilities (such as: ‘needs to be linked with some geo data and with statistical data’) and dynamically plug in matching datasets. Of course, there are other problems to overcome on the way to realising this vision (for example, concerning the supported vocabularies vs. the SPARQL queries used in the application). Still, at least to me, this is a very appealing area, worth investing more resources in.
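
To make this a bit more concrete, here is a minimal sketch in Python of what such a matching step could look like. It is not an existing implementation, just an illustration under a few assumptions: the `Dataset` and `Linkset` records are simplified in-memory stand-ins loosely modelled on `void:Dataset` and `void:Linkset` (with `void:subjectsTarget`, `void:objectsTarget` and `void:linkPredicate`), the topic labels stand in for `dcterms:subject` values, and all URIs, topics and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset description, loosely modelled on void:Dataset."""
    uri: str
    topics: frozenset  # hypothetical stand-in for dcterms:subject values


@dataclass(frozen=True)
class Linkset:
    """An interlinking description, loosely modelled on void:Linkset."""
    subjects_target: str  # URI of the dataset holding the link subjects
    objects_target: str   # URI of the dataset holding the link objects
    link_predicate: str   # e.g. owl:sameAs


def linked_topics(dataset, datasets, linksets):
    """Collect the topics of all datasets linked to `dataset`, in either direction."""
    by_uri = {d.uri: d for d in datasets}
    topics = set()
    for ls in linksets:
        if ls.subjects_target == dataset.uri:
            other = by_uri.get(ls.objects_target)
        elif ls.objects_target == dataset.uri:
            other = by_uri.get(ls.subjects_target)
        else:
            continue
        if other is not None:
            topics |= other.topics
    return topics


def find_matching(required_topics, datasets, linksets):
    """Return the datasets whose linked neighbours cover all required topics."""
    required = set(required_topics)
    return [d for d in datasets
            if required <= linked_topics(d, datasets, linksets)]


# Made-up example descriptions (not real LOD datasets).
DATASETS = [
    Dataset("http://example.org/people", frozenset({"people"})),
    Dataset("http://example.org/geo", frozenset({"geo"})),
    Dataset("http://example.org/stats", frozenset({"statistics"})),
    Dataset("http://example.org/books", frozenset({"books"})),
]
LINKSETS = [
    Linkset("http://example.org/people", "http://example.org/geo", "foaf:based_near"),
    Linkset("http://example.org/people", "http://example.org/stats", "rdfs:seeAlso"),
    Linkset("http://example.org/books", "http://example.org/people", "dcterms:creator"),
]

if __name__ == "__main__":
    # 'needs to be linked with some geo data and with statistical data'
    for d in find_matching({"geo", "statistics"}, DATASETS, LINKSETS):
        print(d.uri)
```

The point of the sketch is that the application states a requirement and the selection happens purely on the dataset-level descriptions, without touching a single resource. A real middleware would read the voiD descriptions from a store and would also have to deal with the vocabulary and query issues mentioned above.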

I hope this answers your question, Daniel, and I’m happy to keep you posted concerning the progress in this area.

About woddiscovery

Web of Data researcher and practitioner

Discussion

6 thoughts on “On the usage of Linksets”

  1. I have a data set that attempts to map semantic representations of species to each other. I had been using SKOS for this, but some think a specialized vocabulary might be better. This data set also has a voiD description. See this RDF for an example: http://lod.taxonconcept.org/ses/mCcSp.rdf

    Posted by Pete DeVries | 2010-05-19, 10:50
  2. Thanks for the on-the-fly blog entry!

    Your input helps, especially as both the history and the current status are reflected.

    Regarding potential use cases: I especially liked the visualization use case and I would add something like:

    You have an object at hand (e.g. a person, a gene, a location) and the end user is interested in: what can I do with this object? (e.g. get in contact, get related diseases, get a route, get events at that location.) Linksets help in this kind of information retrieval, but they solve only the object linking; the information about what e.g. a foaf:Person can do with a geo:City could also be described. (Even if this happens outside voiD: how could we document this kind of relation knowledge?)

    –> this would enable a small-footprint approach to consuming linked data with reduced coding demands (and end users might be able to collect the resulting information on their own; compare e.g. the meshup idea, communicated also by @kidehen).

    You focussed on the consumption of linksets; however, when we want more data in LOD, we would need more users to provide links between data sources, or even linksets. –> therefore an efficient (end-user compatible) discovery of existing data sources would help.

    So, I fully support your vision of dynamically plugging together data based on linksets, but especially regarding ‘how to code that’ I am not sure whether we have already everything needed.

    (Yes, also this answer is longer than 140 chars)

    Posted by Daniel Koller | 2010-05-19, 11:25
    • Daniel,

      Thanks for your reply. Concerning:

      “You have an object at hand (e.g. a person, a gene, a location) and the end user is interested in: what can I do with this object? (e.g. get in contact, get related diseases, get a route, get events at that location.) Linksets help in this kind of information retrieval, but they solve only the object linking; the information about what e.g. a foaf:Person can do with a geo:City could also be described. (Even if this happens outside voiD: how could we document this kind of relation knowledge?)”

      Funny you should mention it: I recently hacked together a demo of this idea, see [1], and the code is available as well [2]. I coined the term “a LOD cloud of interaction” (aLODdin) for it … what do you think?

      Cheers,
      Michael

      [1] http://lab.linkeddata.deri.ie/alodin/agent/
      [2] http://code.google.com/p/alodin/

      Posted by woddiscovery | 2010-05-19, 11:31
  3. Michael, I love the aLODdin concept! Whether for published works or datasets, we need to provide accessible mechanisms for users (and user agents) to answer the question, “What can I do with this?”

    More than a decade ago, colleagues and I envisioned something similar to this based on the DOI. In that scenario each third-party service endpoint would be registered in an object’s handle system record.

    That approach had the advantage of authenticity of reference (an authority must maintain the record) but is limited in scalability. In particular, it makes the creation of unexpected and serendipitous identifier-based services difficult. aLODdin’s approach, based on LOD, seems to be naturally scalable.

    Posted by John Erickson | 2010-05-19, 12:34
    • John,

      Thanks! Handle looks interesting – will look into it as well.

      Indeed, though aLODin is utterly naive (both the idea and the impl, FWIW) it has two big advantages over similar concepts such as Semantic Web Services, etc.: 1. it’s data-driven (and the data is already available now, thanks to LOD), and 2. it scales, due to the Linked Data fundamentals URI/HTTP/RDF.

      Let’s see where this journey takes us. Did I already mention that we’re living in very exciting times, these days? :)

      Cheers,
      Michael

      Posted by woddiscovery | 2010-05-19, 13:09
