Somewhere between Ross 154 and Ross 248

2009-02-22

Imagine you are somewhere between Ross 154 and Ross 248 (roughly ten light-years from Earth). These days you would likely see the announcement ‘W3C Issues Recommendation for Resource Description Framework (RDF)’:

RDF provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources. RDF can be used in a variety of application areas; for example: in resource discovery to provide better search engine capabilities, in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library, by intelligent software agents to facilitate knowledge sharing and exchange, in content rating, in describing collections of pages that represent a single logical “document”, for describing intellectual property rights of Web pages, and for expressing the privacy preferences of a user as well as the privacy policies of a Web site. RDF with digitally signed documents will be key to building the “Web of Trust” for electronic commerce, collaboration, and other applications.

Funnily enough, the first use case mentioned is actually discovery ;)

Ah, yeah, and btw, precisely 10 years ago the RDF namespace was published: http://www.w3.org/1999/02/22-rdf-syntax-ns# – Happy Birthday, old girl!


Alternative approach to escape the ‘non-information’ resource dilemma

2009-02-19

As just posted to the TAG list, here is my ‘attempt to define non-information resources without using non-information resource’. I’d love to hear your opinion here, in case you are not yet subscribed to the TAG mailing list ;)


An Attempt at a Web Discovery Stack

2009-02-14

Continuing the (good?) tradition of creating stacks (OSI model, Semantic Web, W3C Technology, etc.) when there are too many technologies floating around, I did some drawing and came up with:

Web Discovery Stack, v1, 2009-02-14

The Web Discovery Stack as shown here is an attempt to get the many proposals and ideas around discovery into some shape or scheme. The WD stack is basically a layering of specifications (or proposals for specifications) that mainly serve the discovery of resources on the Web.

Very happy to hear your comments (either here, via email or on the IRC #swig channel).


Thoughts on Self-Describing Web Resources, part 1

2009-02-13

This is the first part of a series of posts regarding the recent W3C TAG finding The Self-Describing Web. The TAG started the work on the document around two years ago, in February 2007. However, its roots can be traced back to a 2005 post by Dan Connolly titled ‘new issue? squatting on link relationship names, x-tokens, registries, and URI-based extensibility’, in which he was basically contemplating ‘squatting on the community resource of link relationship names’.

In the following, we will review the finding regarding self-describing Web resources (SDWR) section by section and discuss it in the context of Web of Data discovery. Each section has a motivation and a discussion, along with one or more so-called good practices. The post at hand is about the first section, the introduction.

Section ‘1 Introduction’ of the SDWR finding opens with a strong statement:

Supporting ad-hoc exploration is a goal of the Web.

Wow. I’m impressed. Can I have a reference for this please?

The section then continues, based on the Architecture of the World Wide Web, Volume One, to introduce the overall goal:

… how to create, deploy and access self-describing Web resource representations that can be correctly interpreted using only widely available information.

We note: create, deploy and access. Creating stuff (especially when humans are involved) can be expensive, so you’d likely want to see a certain degree of automation in this process. Further, still in the publisher’s realm, we have the deployment, that is, the actual way the resources are packaged and delivered along with the things that describe these resources (we will come back to this in one of the following posts in this series). Only the last of the three terms addresses the consumer of the resource: access. Note that there is no consume or interpret or the like, simply access; we will elaborate on this one as well.

The text in the first section continues:

Furthermore, when self-describing representations are linked together, the Web as a whole can support reliable, ad hoc discovery of information.

Worth noting are the linked together (how? who?) and the claim of ad hoc discovery.

This first section of the SDWR finding ends with a principle and a good practice:

Self-describing resources promote ad hoc discovery of information.

… which is an obvious (too obvious?) statement. It serves as a general motivation along these lines: you put self-descriptive stuff on the Web, this in turn supports ad hoc discovery, and this, finally, makes the entire system more valuable (anyone got another conclusion for the last part?).

The good practice that comes with it is pretty straightforward:

Web resource representations should be self-describing.

So, we get the point. But why, again, is ad hoc discovery so valuable? Is it only because it fosters automation, or is there another, deeper reason?

My favorite half-sentence from the last paragraph of the first section of the SDWR finding is:

… why it’s important that interpretation of Web representations be grounded unambiguously in the core specifications of the Web …

Yes! It is important to ground the interpretation in the core specifications. Even if the big players in the search engine business at large don’t honor it, even if crappy software doesn’t care, even if people wonder why, for example, validating HTML does matter.

We have now seen the first, introductory section of the SDWR finding and are certainly curious about what the rest of the document will reveal. Stay tuned!


How to Deal with Broken Data Links

2009-02-12

So, we have all come across a 404 on the Web of Documents, right?

Bernhard Haslhofer recently raised an issue on the public-lod@w3.org mailing list regarding Broken Links in LOD Data Sets, which unfortunately didn’t spark much discussion. I intend to rehash the thread here, come up with a straw-man proposal and ask for your comments.

Here is Bernhard’s core message:

If we assume that the consumers of LOD data are not humans but applications, broken links/references are not only “annoying” but could lead to severe processing errors if an application relies on a kind of “referential integrity”.

Today I gave it a quick thought after reviewing the recent TAG discussion on how they intend to deal with broken links in their documents. The straw-man proposal is called ‘repairing vintage link values’ (revival) and may look as follows.

  1. A human (e.g. through a built-in feature in a Web of Data browser such
    as Tabulator) encounters a broken link and reports it to the respective
    dataset publisher (the authoritative one who ‘owns’ it)
    OR
    a machine encounters a broken link (should it then directly ping the
    dataset publisher or first ‘ask’ its master for permission?)
  2. The dataset publisher acknowledges the broken link and creates the
    corresponding triples, as is done for documents (cf. TAG’s proposal)

We note that there are two important assumptions: someone who uses the data reports the broken link, and the dataset publisher (rather than a centralised service) fixes it.
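To make the two steps a bit more tangible, here is a minimal Python sketch of how revival might look. Everything in it is an assumption on my side: the `http://example.org/revival#` vocabulary with its `reportedIn` and `replacedBy` terms is made up (I am not reproducing the TAG's actual terms here), and `get_status` is an injected function so the sketch runs offline; in practice it would wrap something like an HTTP HEAD request.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical vocabulary, invented for this sketch only.
REVIVAL = "http://example.org/revival#"


@dataclass(frozen=True)
class BrokenLinkReport:
    source: str   # dataset URI in which the link was found
    target: str   # URI that could not be dereferenced
    status: int   # HTTP status code the consumer observed


def find_broken_links(source: str,
                      targets: Iterable[str],
                      get_status: Callable[[str], int]) -> List[BrokenLinkReport]:
    """Step 1: a consumer (human tool or machine) collects broken links to report."""
    reports = []
    for target in targets:
        status = get_status(target)
        if status >= 400:  # treat any 4xx/5xx answer as broken
            reports.append(BrokenLinkReport(source, target, status))
    return reports


def repair_triples(report: BrokenLinkReport,
                   replacement: Optional[str] = None) -> List[str]:
    """Step 2: the publisher acknowledges the report as N-Triples lines."""
    lines = [f"<{report.target}> <{REVIVAL}reportedIn> <{report.source}> ."]
    if replacement is not None:
        lines.append(f"<{report.target}> <{REVIVAL}replacedBy> <{replacement}> .")
    return lines
```

Note that the sketch keeps both assumptions from above visible: reporting lives on the consumer side (`find_broken_links`), fixing on the publisher side (`repair_triples`).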

Comments, anyone?



Web Programming – Assembler:Java is like Web 2.0 programming:?

2009-02-10

As Kevin Kelly pointed out in 2007 in his seminal talk Predicting the next 5,000 days of the web, the Web is one huge machine. We have different views on it (be it as a human through HTML or as a machine consuming RDF), but it is one machine.

A machine that can be, and actually is, programmed. Ok, so we know that there is the data (yeah, TimBL, right, tell it them people: GIMME YOUR RAW DATA) and we are developers, right?

Let’s step back. May I quiz you a bit? So, tell me, Assembler:Java is like Web 2.0 programming:?.

My two cents are: the ? is a bunch of Web of Data technologies such as URIs, RDF, SPARQL, and now voiD. Rather than learning new APIs and proprietary formats based on XML or the like, one learns the RDF data model, then some vocabularies such as FOAF, SIOC, etc. and maybe some domain-specific ones. Then, one exploits linked data, that is the datasets that are already available on the Web, and starts developing her application.
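As a rough illustration of what "learning the RDF data model" buys you, here is a small Python sketch that treats a graph as a set of (subject, predicate, object) triples and queries it with a pattern, loosely in the spirit of a SPARQL triple pattern. The `example.org` URIs and the names are made up, and for brevity the sketch does not distinguish literals from URIs; only the FOAF namespace and the `name` and `knows` terms are real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Made-up example data: three triples using two FOAF properties.
triples = {
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def names_of_friends(graph, person):
    """Follow foaf:knows from a person, then read each friend's foaf:name."""
    friends = [o for _, _, o in match(graph, s=person, p=FOAF + "knows")]
    return [name
            for friend in friends
            for _, _, name in match(graph, s=friend, p=FOAF + "name")]
```

The point is that the same `match` works for any vocabulary: swap FOAF for SIOC or a domain-specific one and the code stays as it is, only the data changes.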

Thoughts, anyone?


The Self-Describing Web

2009-02-08

Just announced on the www-tag@w3.org list: The Self-Describing Web, a recent finding of the W3C TAG. From its abstract:

The Web is designed to support flexible exploration of information by human users and by automated agents. For such exploration to be productive, information published by many different sources and for a variety of purposes must be comprehensible to a wide range of Web client software, and to users of that software.

HTTP and other Web technologies can be used to deploy resource representations that are self-describing: information about the encodings used for each representation is provided explicitly within the representation. Starting with a URI, there is a standard algorithm that a user agent can apply to retrieve and interpret such representations. Furthermore, representations can be what we refer to as grounded in the Web, by ensuring that specifications required to interpret them are determined unambiguously based on the URI, and that explicit references connect the pertinent specifications to each other. Web-grounding ensures that the specifications needed to interpret information on the Web can be identified unambiguously. When such self-describing, Web-grounded resources are linked together, the Web as a whole can support reliable, ad hoc discovery of information.

This finding describes how document formats, markup conventions, attribute values, and other data formats can be designed to facilitate the deployment of self-describing, Web-grounded Web content.

There are many important issues re discovery and usage of Web of Data resources included; a must-read, indeed.
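One way to read the "retrieve and interpret" part of the abstract: the user agent looks at what the response says about itself and picks an interpreter accordingly. Below is a minimal Python sketch of that dispatch step, assuming the HTTP Content-Type header carries the media type and charset. The handler table is hypothetical and only maps media types to descriptive strings; the retrieval step itself is left out.

```python
from email.message import Message

# Hypothetical handler table; a real user agent would know many more media types.
HANDLERS = {
    "text/html": "HTML parser",
    "application/xhtml+xml": "XHTML parser",
    "application/rdf+xml": "RDF/XML parser",
}


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def choose_handler(header_value):
    """Pick an interpreter based only on what the response declares about itself."""
    media_type, charset = parse_content_type(header_value)
    return HANDLERS.get(media_type), charset
```

If no handler is registered for the declared media type, `choose_handler` returns `None` for it rather than guessing, which I think is in the spirit of not sniffing content.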


P2P linked data?

2009-02-08

So, Hugh Glaser seems to be toying around with deploying linked data on P2P architectures. He’s using the magnet: URI scheme and rumor has it he has put stuff on thepiratebay.org – stay tuned!


Human-led discovery

2009-02-07

Just following and contributing to a thread over at public-lod@w3.org where Hugh Glaser asked: can we lower the LD entry cost?:

We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.
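Such a "simple way" could be as small as a label index. Here is a minimal Python sketch, assuming the site already has (URI, label) pairs at hand, e.g. taken from rdfs:label values; the function names and the normalisation rule (lower-casing and collapsing whitespace) are my own choices, not something proposed in the thread.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case and collapse whitespace so 'Tim  Berners-Lee' matches 'tim berners-lee'."""
    return " ".join(text.lower().split())


def build_label_index(labelled):
    """Map each normalised label to the set of URIs that carry it."""
    index = defaultdict(set)
    for uri, label in labelled:
        index[normalise(label)].add(uri)
    return index


def find_uris(index, text):
    """Return the URIs whose label matches the natural language identifier."""
    return sorted(index.get(normalise(text), set()))
```

A label can map to more than one URI, so the lookup returns a list and leaves disambiguation to the human.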

Comments, anyone?