So, we have all come across a 404 on the Web of Documents, right?
Bernhard Haslhofer recently raised an issue on the email@example.com mailing list regarding Broken Links in LOD Data Sets, which unfortunately didn’t spark much discussion. I intend to rehash the thread here, come up with a straw-man proposal, and ask you for your comments.
Here is Bernhard’s core message:
If we assume that the consumers of LOD data are not humans but applications, broken links/references are not only “annoying” but could lead to severe processing errors if an application relies on a kind of “referential integrity”.
Today I gave it a quick thought after reviewing the recent TAG discussion on how they intend to deal with broken links in their documents. The straw-man proposal is called ‘repairing vintage link values’ (revival) and may look as follows.
- A human (e.g. through a built-in feature in a Web of Data browser such as Tabulator) encounters a broken link and reports it to the respective dataset publisher (the authoritative one who ‘owns’ it)
- A machine encounters a broken link (should it then directly ping the dataset publisher or first ‘ask’ its master for permission?)
- The dataset publisher acknowledges the broken link and creates the corresponding triples, as is done for documents (cf. TAG’s proposal)
We note that there are two important assumptions: someone who uses the data reports the broken link, and the dataset publisher (rather than a centralised service) fixes it.
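To make the proposal a bit more tangible, here is a rough Python sketch of the report-and-acknowledge steps. Everything named in it is my own assumption and not part of the TAG proposal: the `http://example.org/revival#` namespace, the `brokenLink` and `replacedBy` properties, and the function names are purely hypothetical. I also assume that the publisher to notify is the authority (host) of the resource that contains the link, and that any HTTP status of 400 or above counts as broken.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from urllib.parse import urlsplit

# Hypothetical vocabulary, for illustration only; not defined by the TAG proposal.
REVIVAL_NS = "http://example.org/revival#"


@dataclass(frozen=True)
class BrokenLinkReport:
    source: str     # URI of the resource that contains the link
    target: str     # URI that could not be dereferenced
    status: int     # HTTP status code observed by the consumer
    publisher: str  # authority of the dataset that contains the link (assumption)


def check_link(source: str, target: str,
               fetch_status: Callable[[str], int]) -> Optional[BrokenLinkReport]:
    """Consumer side: return a report if the target looks broken, else None."""
    status = fetch_status(target)
    if status < 400:  # assumption: only 4xx/5xx responses count as broken
        return None
    return BrokenLinkReport(source, target, status, urlsplit(source).netloc)


def acknowledge(report: BrokenLinkReport,
                replacement: Optional[str] = None) -> List[str]:
    """Publisher side: describe the broken link as N-Triples lines."""
    triples = [f"<{report.source}> <{REVIVAL_NS}brokenLink> <{report.target}> ."]
    if replacement is not None:
        triples.append(
            f"<{report.target}> <{REVIVAL_NS}replacedBy> <{replacement}> ."
        )
    return triples
```

The `fetch_status` argument is passed in so that a human-facing browser and a crawler can each plug in their own way of dereferencing a URI; how the report then reaches the publisher (and whether a machine needs permission first) is left open, as in the list above.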