Imagine you’re running a research group of 100 people. You want to find out what your chaps are expert in and aggregate their profiles. Sure, you could sit down and browse through the tons of material you have about your people: their homepages, project pages, Subversion commits, blog posts, tweets, IRC logs, you name it. Then, for each person, you collect all the data found on the Web (or in internal information sources) and dump it into a data bank of your choice (hum? you’re using MS Excel, never mind 😉).
This process might certainly be effective. You’ve gathered detailed information about 100 people and know precisely what they do and what they’re good at. However, you’ve also spent (or wasted?) the equivalent of 5000€, as it took you, say, a week. And that’s just for gathering the data, not the tedious task of aggregating it nicely and formatting it properly so that you can use it to impress your sponsor.
Now, let’s imagine the same situation, but rather than going out and collecting the data yourself, you ask people to provide their profiles themselves. All you do is set up a standardised form with fields for bio data, publications, projects, etc., and people fill in the relevant fields. Then, after the deadline, you just press the ‘dump now’ button and voilà, there you go …
Why am I telling this story? Mainly because I am often faced with the question: why should one care about (using) voiD? It is true that, with follow-your-nose (FYN), RDF offers a way to discover everything you like, as long as you’re not limited by time and/or budget. So, we note that this method is effective but NOT efficient.
To put it in other words: to a certain extent, FYN allows you to discover, gather and integrate all the RDF-based data out there. It’s effective, but not very efficient. That is where voiD comes into play: the people who have the data (or at least know it very well 🙂) provide a sort of summary of the dataset, covering the topics, license, vocabularies used, statistics on triples, interlinking, etc., as explained in the voiD guide. Then, all you need to do is operate on this summary. Hence, using voiD for discovery, in support of gathering, aggregating and integrating data, is both effective and efficient, IMHO.
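To make "operate on this summary" a bit more tangible, here is a minimal sketch in Python. It is not how any particular voiD tool works: the summaries are plain dicts rather than parsed RDF, the dataset titles, endpoints and numbers are made up, and `find_datasets` is a hypothetical helper. Only the property names (`void:triples`, `void:vocabulary`, `void:sparqlEndpoint`, `dcterms:subject`, `dcterms:license`, `dcterms:title`) are taken from voiD and Dublin Core.

```python
# Hypothetical voiD-style summaries, written as plain dicts for illustration.
# Keys mirror voiD / Dublin Core property names; all values are invented.
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

DATASETS = [
    {
        "dcterms:title": "Example People Dataset",
        "dcterms:subject": ["researchers", "publications"],
        "dcterms:license": "http://creativecommons.org/licenses/by/3.0/",
        "void:vocabulary": [FOAF],
        "void:triples": 120000,
        "void:sparqlEndpoint": "http://example.org/people/sparql",
    },
    {
        "dcterms:title": "Example Places Dataset",
        "dcterms:subject": ["places"],
        "dcterms:license": "http://creativecommons.org/licenses/by/3.0/",
        "void:vocabulary": [GEO],
        "void:triples": 5000000,
        "void:sparqlEndpoint": "http://example.org/places/sparql",
    },
    {
        "dcterms:title": "Tiny Test Dataset",
        "dcterms:subject": ["researchers"],
        "dcterms:license": "http://creativecommons.org/licenses/by/3.0/",
        "void:vocabulary": [FOAF],
        "void:triples": 500,
        "void:sparqlEndpoint": "http://example.org/tiny/sparql",
    },
]


def find_datasets(summaries, vocabulary=None, subject=None, min_triples=0):
    """Select datasets by looking only at their summaries (no crawling)."""
    matches = []
    for summary in summaries:
        if vocabulary is not None and vocabulary not in summary.get("void:vocabulary", []):
            continue
        if subject is not None and subject not in summary.get("dcterms:subject", []):
            continue
        if summary.get("void:triples", 0) < min_triples:
            continue
        matches.append(summary)
    return matches


# Which datasets use FOAF and are big enough to be worth a closer look?
candidates = find_datasets(DATASETS, vocabulary=FOAF, min_triples=1000)
candidate_titles = [d["dcterms:title"] for d in candidates]
candidate_endpoints = [d["void:sparqlEndpoint"] for d in candidates]
```

The point of the sketch is the cost: the selection touches three small summaries, not the millions of triples behind them. With FYN you would have had to dereference your way through the data itself to learn the same thing.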