you're reading...
Experiment, Linked Data

Linked Enterprise Data in a nutshell

If you haven’t been living under a rock for the last weeks you might have noticed a new release of the LOD cloud diagram with some 200 datasets and some 25 billion triples. Very impressive, one may think, but let’s not forget that publishing Linked Data is not an end in itself.

So, I thought, how can I do something useful with the data and I ended up with a demo app that utilizes LOD data in an enterprise setup: the DERI guide. Essentially, what it does is telling you where in the DERI building you find an expert for a certain topic. So, if you just have some 5min, have a look at the screen-cast:

Behind the curtain

Now, let’s take a deeper look how the app works. So, the objective was clear: create a Linked Data app using LOD data with a bunch of shell scripts. And here is what the DERI guide conceptually looks like:

I’m using three datasets in this demo:

All that is needed, then, is an RDF store (I chose 4Store; easy to set up and use, at least on MacOS) to manage the data locally and a bunch of shell scripts to query the data and format the result. The data in the local RDF store (after loading it from the datasets) typically looks like this:

The main script (dg-find.sh) takes a term (such as “Linked Data”) as an input, queries the store for units that are tagged with the topic (http://dbpedia.org/resource/Linked_Data), then pulls in information from the FOAF profiles of the matching members and eventually runs it through an XSLT to produce a HTML page that opens in the default browser:

echo "=== DERI guide v0.1"
echo "Trying to find people for topic: "$1

topicURI=$( echo "http://dbpedia.org/resource/"$1 | sed 's/ /_/')

curl -s --data-urlencode query="SELECT DISTINCT
 ?person WHERE { ?idperson <http://www.w3.org/2002/07/owl#sameAs> ?person ;
<http://www.w3.org/ns/org#hasMembership> ?membership .
?membership <http://www.w3.org/ns/org#organization> ?org .
?org <http://www.w3.org/ns/org#purpose> <$topicURI> . }"
http://localhost:8021/sparql/ > tmp/found-people.xml
webids=$( xsltproc get-person-webid.xsl tmp/found-people.xml )

echo "<h2>Results for: $1</h2>" >> result.html
echo "<div style='padding: 20px; width: 500px'>" >> result.html
for webid in $webids
foaffile=$( util/getfoaflink.sh $webid )
echo "Checking <"$foaffile"> and found WebID <"$webid">"
./dg-initdata-person.sh $foaffile $webid
./dg-render-person.sh $webid $topicURI
echo "</div><div style='border-top: 1px solid #3e3e3e;
 padding: 5px'>Linked Data Research Centre, (c) 2010</div>" >> result.html

rm tmp/found-people.xml
util/room2roomsec.sh result.html result-final.html
rm result.html
open result-final.html

The result for the example query ./dg-find.sh "Linked Data" yields a HTML page such as this:

Lessons learned

I was amazed by the fact how easy and quick it was to use the data from different sources to build a shell-based app. Most of the time I spent writing the scripts (hey, I’m not a shell guru and reading the sed manual is not exactly fun) and tuning the XSLT to output some nice HTML. The actual data integration part, that is, loading the data it into the store and querying it, was straight-forward (beside overcoming some inconsistencies in the data).

From the approximately eight hours I worked on the demo, some 70% went into the former (shell scripts and XSLT), some 20% into the latter (4store handling via curl and creating the SPARQL queries) and the remaining 10% were needed to create the shiny figures and the screen-cast, above. To conclude: the only thing you really need to create a useful LOD app is a good idea which sources to use, the rest is pretty straight-forward and, in fact, fun ;)

About these ads

About woddiscovery

Web of Data researcher and practitioner


4 thoughts on “Linked Enterprise Data in a nutshell

  1. Michael, out of curiosity, which part of this did you expect to be difficult (and therefore were amazed when it was not)?


    Posted by Lee Feigenbaum | 2010-09-27, 16:01
    • Lee,

      Thanks a lot for this very valid question! So, first of all I never used 4store before and was hence surprised how easy it was to handle via curl. Also, as I said, I’m not exactly a shell script hacker, so this part took longer (along with the styling, 70% of the overall time), but not as long as I would have expected.

      Additionally, I was not very familiar with the data itself, so I was unsure if the task at hand (show location and some other relevant data for an expert on a certain topic) was doable or not. Turns out that, though there where minor issues I had to overcome regarding the data – workaround for FOAF profiles via @rel from the HTML, using owl:sameAs between room/org and profiles due to differing identifiers (DERI room data has been generated from an Excel sheet that contains some bugs, my favorite one is the one for Manfred Hauswirth) – writing the respective SPARQL queries to get all the relevant data was rather easy (some minor things, such as with struggling with OPTIONALs, cause not all people have location data assigned). So, summing up, the data was in a much better shape as I would have expected.

      Not sure if this answers your question?

      Posted by woddiscovery | 2010-09-28, 07:30


  1. Pingback: Tweets that mention Linked Enterprise Data in a nutshell « Web of Data -- Topsy.com - 2010-09-28

  2. Pingback: Scott Banwart's Blog » Blog Archive » Distributed Weekly 70 - 2010-10-01

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s



Get every new post delivered to your Inbox.

Join 2,151 other followers

%d bloggers like this: