Thoughts on Self-Describing Web Resources, part 1

2009-02-13

This is the first part of a series of posts on the recent W3C TAG finding The Self-Describing Web. The TAG started work on the document around two years ago, in February 2007. Its roots, however, can be traced back to a 2005 post by Dan Connolly titled ‘new issue? squatting on link relationship names, x-tokens, registries, and URI-based extensibility’, in which he was essentially contemplating ‘squatting on the community resource of link relationship names’.

In the following, we will review the finding on self-describing Web resources (SDWR) section by section and discuss it in the context of Web of Data discovery. Each section contains a motivation and discussion along with so-called good practice(s). This post covers the first section, the introduction.

Section ‘1 Introduction’ of the SDWR finding opens right away with a strong statement:

Supporting ad-hoc exploration is a goal of the Web.

Wow. I’m impressed. Can I have a reference for this, please?

Building on the Architecture of the World Wide Web, Volume One, the section then introduces the overall goal:

… how to create, deploy and access self-describing Web resource representations that can be correctly interpreted using only widely available information.

We note: create, deploy and access. Creating stuff (especially when humans are involved) can be expensive, so you’d likely want to see a certain degree of automation in this process. Still in the publisher’s realm, we then have deployment, that is, the actual way the resources are packaged and delivered along with the things that describe them (we will come back to this in one of the following posts in this series). Only the last of the three terms addresses the consumer of the resource: access. Note that there is no ‘consume’ or ‘interpret’ or the like, simply ‘access’ … we will elaborate on this one as well.

The text in the first section continues:

Furthermore, when self-describing representations are linked together, the Web as a whole can support reliable, ad hoc discovery of information.

Worth noting are the phrase ‘linked together’ (how? by whom?) and the claim of ad hoc discovery.

This first section of the SDWR finding ends with a principle and a good practice:

Self-describing resources promote ad hoc discovery of information.

… which is an obvious (too obvious?) statement. It serves as a general motivation along these lines: you put self-descriptive stuff on the Web, this in turn supports ad hoc discovery, and this, finally, makes the entire system more valuable (anyone got another conclusion for the last part?).

The good practice that comes with it is pretty straightforward:

Web resource representations should be self-describing.

So, we get the point. But why, again, is ad hoc discovery so valuable? Only because it fosters automation, or is there another, deeper reason?

My favorite half-sentence from the last paragraph of the first section of the SDWR finding is:

… why it’s important that interpretation of Web representations be grounded unambiguously in the core specifications of the Web …

Yes! It is important to ground the interpretation in the core specifications. Even if the big players in the search engine business don’t honor it, even if crappy software doesn’t care, even if people wonder why, for example, validating HTML matters.

We’ve now seen the first, introductory section of the SDWR finding and are certainly curious about what the rest of the document will reveal. Stay tuned!


The Self-Describing Web

2009-02-08

Just announced on the www-tag@w3.org list: The Self-Describing Web, a recent finding of the W3C TAG. From its abstract:

The Web is designed to support flexible exploration of information by human users and by automated agents. For such exploration to be productive, information published by many different sources and for a variety of purposes must be comprehensible to a wide range of Web client software, and to users of that software.

HTTP and other Web technologies can be used to deploy resource representations that are self-describing: information about the encodings used for each representation is provided explicitly within the representation. Starting with a URI, there is a standard algorithm that a user agent can apply to retrieve and interpret such representations. Furthermore, representations can be what we refer to as grounded in the Web, by ensuring that specifications required to interpret them are determined unambiguously based on the URI, and that explicit references connect the pertinent specifications to each other. Web-grounding ensures that the specifications needed to interpret information on the Web can be identified unambiguously. When such self-describing, Web-grounded resources are linked together, the Web as a whole can support reliable, ad hoc discovery of information.

This finding describes how document formats, markup conventions, attribute values, and other data formats can be designed to facilitate the deployment of self-describing, Web-grounded Web content.

It covers many important issues regarding the discovery and usage of Web of Data resources; a must-read, indeed.
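The abstract mentions a standard algorithm that starts from a URI and leads to an interpretation of the representation. As a rough, non-authoritative illustration, the Python sketch below walks such a chain for a toy agent: first the URI scheme, then the media type from the Content-Type header, then (optionally) the XML namespace of the root element. The lookup tables, the function name interpretation_chain and the example URI are assumptions made for this sketch and are not taken from the finding; a real agent would consult the IANA registries and the full specifications rather than a hard-coded table.

```python
from urllib.parse import urlsplit

# Illustrative, deliberately incomplete lookup tables (assumptions for this
# sketch). A real user agent would consult the IANA registries instead.
SCHEME_SPECS = {
    "http": "RFC 2616 (HTTP/1.1)",
    "https": "RFC 2818 (HTTP Over TLS)",
}
MEDIA_TYPE_SPECS = {
    "text/html": "RFC 2854 (text/html registration)",
    "application/xhtml+xml": "RFC 3236 (application/xhtml+xml registration)",
    "application/rdf+xml": "RFC 3870 (application/rdf+xml registration)",
}
NAMESPACE_SPECS = {
    "http://www.w3.org/1999/xhtml": "XHTML 1.0",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "RDF/XML Syntax Specification",
}


def interpretation_chain(uri, content_type, root_namespace=None):
    """Return the specifications a toy agent would consult, in order.

    Hypothetical helper: walks URI scheme -> media type -> (optional)
    XML namespace and raises LookupError at the first link it cannot
    resolve, i.e. where the representation stops being self-describing
    for this toy agent.
    """
    chain = []

    # Step 1: the URI scheme tells us which protocol specification applies.
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in SCHEME_SPECS:
        raise LookupError(f"unknown URI scheme: {scheme!r}")
    chain.append(SCHEME_SPECS[scheme])

    # Step 2: the media type from Content-Type; drop parameters such as
    # '; charset=utf-8' before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in MEDIA_TYPE_SPECS:
        raise LookupError(f"unknown media type: {media_type!r}")
    chain.append(MEDIA_TYPE_SPECS[media_type])

    # Step 3 (optional): the root element's namespace for XML-based formats.
    if root_namespace is not None:
        if root_namespace not in NAMESPACE_SPECS:
            raise LookupError(f"unknown namespace: {root_namespace!r}")
        chain.append(NAMESPACE_SPECS[root_namespace])

    return chain


if __name__ == "__main__":
    for spec in interpretation_chain(
        "http://example.org/doc",
        "application/xhtml+xml; charset=utf-8",
        "http://www.w3.org/1999/xhtml",
    ):
        print(spec)
```

The point of the sketch is the shape of the chain rather than its contents: each step is resolved only from information carried by the URI or the representation itself, and a missing link (say, an unregistered media type) is exactly where ‘correctly interpreted using only widely available information’ breaks down.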