Microblogging – linked data in disguise?

2009-03-19

You may wonder what microblogging (such as Twitter and identi.ca) and linked data have in common. My hunch is: a lot. Some observations:

  • each microblogging post has a URI
  • many microblogging posts can be understood as (structured) statements
  • in microblogging posts, one finds explicit, semantically typed links to other statements (via #foo or @somechap)
  • microblogging posts connect real-world entities (such as events, books, etc.) with other real-world entities but also with documents on the Web

An example may highlight this: my identi.ca page run through OpenLink’s URI burner. Inspect the content and see how smoothly the container (the post) can be converted to linked data – the same is certainly possible for the content of the post.
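To make the last point a bit more concrete, here is a minimal sketch of how the content of a post could be turned into triples. It is only an illustration under stated assumptions: the function name `post_to_triples`, the `http://example.org/` base URI and the choice of the SIOC terms `content`, `topic` and `addressed_to` are mine, not what the URI burner or smesher actually do.

```python
import re

SIOC = "http://rdfs.org/sioc/ns#"

def post_to_triples(post_uri, text, base="http://example.org/"):
    """Return (subject, predicate, object) triples for one microblog post.

    Assumption: hashtags become topics and @mentions become addressees,
    both minted under a hypothetical base URI.
    """
    # The post itself: its URI plus the raw text as content.
    triples = [(post_uri, SIOC + "content", text)]
    # Each #foo is read as a typed link to a topic.
    for tag in re.findall(r"#(\w+)", text):
        triples.append((post_uri, SIOC + "topic", base + "tag/" + tag))
    # Each @somechap is read as a typed link to a person/account.
    for user in re.findall(r"@(\w+)", text):
        triples.append((post_uri, SIOC + "addressed_to", base + "user/" + user))
    return triples

if __name__ == "__main__":
    for triple in post_to_triples("http://example.org/notice/1",
                                  "Reading about #linkeddata with @somechap"):
        print(triple)
```

The point of the sketch is merely that the post URI serves as the subject, while #foo and @somechap supply the typed links.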

Related stuff you may want to read: Benjamin Nowack’s quick thoughts on semantic microblogging and his awesome personal semantic microblogging tool called smesher as well as our related work in the SIOC domain re semantic microblogging.


How does pushback relate to …

2009-03-14

Since the day we initiated pushback (the day the idea was first discussed properly and in detail, with Richard here at DERI and with TimBL via IRC), people have challenged me to explain how pushback fits into the current scheme of things. On the one hand, we can use proprietary APIs such as the flickr API to change or insert data in Web 2.0 data silos; on the other hand, we have SPARQL Update to update the RDF data in, say, a triple store (for example, ARC2).

But what if we’re surfing the Web of Data, that is, viewing an RDF document using, for example, Tabulator? What we’re actually able to do nowadays is update a native RDF data source (with SPARQL Update). But how many sites or services in the real world actually use a triple store? You get the point …

On the other hand, many Web 2.0 sites provide HTML forms for the ‘updateable’ part of their data; this could be an order form at Amazon or a Twitter post. In the linked data world, people have invested time to create incredibly useful so-called linked datasets. One can understand some of these datasets as a sort of read-wrapper around Web 2.0 data sources (e.g., Alexandre’s flickr wrapper). In the same way, we’re able to set up write-wrappers that know how to handle the corresponding Web 2.0 data source. We then just need a flexible and generic method to talk to these write-wrappers based on the data at hand, which means that the starting point in the context of pushback is always an RDF document (it may be virtual, i.e. produced via GRDDL or SPARQL DESCRIBE or whatever; that doesn’t matter to us).
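To illustrate the write-wrapper idea, here is a rough sketch of the translation step such a wrapper would need: going from triples about one resource to the fields of an HTML form. Everything in it is hypothetical (the function `triples_to_form`, the field name `status`, the example URIs); it is not the actual pushback design, just one way to picture it.

```python
SIOC = "http://rdfs.org/sioc/ns#"

def triples_to_form(triples, field_map):
    """Translate triples into form fields a write-wrapper could submit.

    field_map maps predicate URIs to the (hypothetical) field names of
    the target Web 2.0 form; predicates without a mapping are skipped.
    """
    payload = {}
    for _subject, predicate, obj in triples:
        field = field_map.get(predicate)
        if field is not None:
            payload[field] = obj
    return payload

if __name__ == "__main__":
    # Assumed mapping for a microblogging form with a single text field.
    field_map = {SIOC + "content": "status"}
    triples = [
        ("http://example.org/notice/1", SIOC + "content", "Hello Web of Data"),
        ("http://example.org/notice/1", SIOC + "topic", "http://example.org/tag/lod"),
    ]
    print(triples_to_form(triples, field_map))
```

The RDF document stays the starting point; only the mapping is specific to the Web 2.0 data source being wrapped.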

Take everything from above, shake it swiftly and voila, there you go (this is a morphological analysis, introduced by Fritz Zwicky a couple of years ago):

Morphological analysis of update mechanisms on the Web


On the Effectiveness and Efficiency of Discovery

2009-03-01

Imagine you’re running a research group of 100 people. You want to find out the expertise of your chaps and aggregate profiles. Sure, you can perfectly well sit down and browse through the tons of material you have about your people: their homepages, project pages, subversion commits, blog posts, tweets, IRC logs, you name it. Then, for each person, you collect all the data found on the Web (or in internal information sources) and dump it into a data bank of your choice (hum? you’re using MS Excel, never mind ;)

This process might certainly be effective. You’ve gathered detailed information about 100 people and know precisely what they do and what they’re good at. Additionally, you’ve spent (or wasted?) the equivalent of 5000€, as it took you, say, a week? And I’m only talking about gathering the data here, not the tedious task of aggregating it nicely and formatting it properly so that you can use it to impress your sponsor.

Now, let’s imagine the same situation, but rather than going out and collecting the data yourself, you ask people to provide their profiles themselves. All you do is set up a standardised form which contains fields for bio data, publications, projects, etc., and the people themselves provide this data by filling in the relevant fields. Then, after the deadline, you just press the ‘dump now’ button and voila, there you go …

Why am I telling this story? Mainly because I am often faced with the question: why should one care about (using) voiD? With follow-your-nose (FYN), it is true that RDF offers a way to discover everything you like, provided you’re not limited by time and/or budget. So, we note that this method is effective but NOT efficient.

To put it in other words, to a certain extent, FYN allows you to discover, gather and integrate all RDF-based data out there. It’s effective, but not very efficient. That is where voiD comes into play: people who have the data (or, at least, know it very well :) provide a sort of summary of the dataset (regarding topics covered, license, vocabularies used, statistics on triples, interlinking, etc., as explained in the voiD guide). Then, all you need to do is operate on this summary. Hence, using voiD for the task of discovery, in support of gathering, aggregating, and integrating data, is both effective and efficient, IMHO.
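What ‘operating on the summary’ could look like is sketched below, assuming the voiD descriptions have already been fetched and reduced to plain Python dictionaries. The dataset URIs, topics and triple counts are made up for illustration, and the helper `select_datasets` is hypothetical; a real consumer would parse the published voiD descriptions instead.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"

# Hypothetical, hand-written summaries standing in for voiD descriptions.
SUMMARIES = [
    {"dataset": "http://example.org/void#staff",
     "subjects": {"http://dbpedia.org/resource/Research"},
     "vocabularies": {FOAF},
     "triples": 12000},
    {"dataset": "http://example.org/void#blogs",
     "subjects": {"http://dbpedia.org/resource/Blog"},
     "vocabularies": {SIOC, FOAF},
     "triples": 450000},
]

def select_datasets(summaries, subject=None, vocabulary=None):
    """Pick datasets by topic and/or vocabulary without touching their data."""
    selected = []
    for summary in summaries:
        if subject is not None and subject not in summary["subjects"]:
            continue
        if vocabulary is not None and vocabulary not in summary["vocabularies"]:
            continue
        selected.append(summary["dataset"])
    return selected

if __name__ == "__main__":
    # Which datasets are worth a visit if we are after SIOC data?
    print(select_datasets(SUMMARIES, vocabulary=SIOC))
```

The decision which datasets to visit is made on a handful of summary records, rather than by following links through every single triple.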


Web Programming – Assembler:Java is like Web 2.0 programming:?

2009-02-10

As Kevin Kelly pointed out in 2007 in his seminal talk Predicting the next 5,000 days of the web, the Web is one huge machine. We have different views on it (be it as a human through HTML or as a machine consuming RDF), but it is one machine.

A machine that can be and actually is programmed. OK, so we know that there is the data (yeah, TimBL, right, tell it them people: GIMME YOUR RAW DATA) and we are developers, right?

Let’s step back. May I quiz you a bit? So, tell me, Assembler:Java is like Web 2.0 programming:?.

My two cents are: the ? is a bunch of Web of Data technologies such as URIs, RDF, SPARQL, and now voiD. Rather than learning new APIs and proprietary formats based on XML or the like, one learns the RDF data model, then some vocabularies such as FOAF, SIOC, etc., and maybe some domain-specific ones. Then, one exploits linked data, that is, the datasets that are already available on the Web, and starts developing her application.

Thoughts, anyone?


Human-led discovery

2009-02-07

Just following and contributing to a thread over at public-lod@w3.org, where Hugh Glaser asked: can we lower the LD entry cost?:

We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.

Comments, anyone?