Jan Algermissen recently compiled a very useful Classification of HTTP-based APIs. This, together with Mike Amundsen’s interesting review of Hypermedia Types, made me think about data and the Web.
One important aspect of data is “on the Web vs. in the Web” as Rick Jelliffe already highlighted in 2002:
To explain my POV, let me make a distinction between a resource being “on” the Web or “in” the Web. If it is merely “on” the Web, it does not have any links pointing to it. If a resource is “in” the Web, it has links from other resources to it. […] A service that has no means of discovery (i.e. a link) or advertising is “on” the Web but not “in” the Web, under those terms. It just happens to use a set of protocols but it is not part of a web. So it should not be called a web service, just an unlinked-to resource.
In 2007 Tom Heath repeated this essential statement in the context of Linked Data.
So I thought it would make sense to revisit some (more or less) well-known data formats and services and try to pin down what “in the Web” means, as a first step towards measuring how well integrated they are with the Web. In the following, I’ll call the degree to which they are “in the Web” the Link factor. I suggest that the Link factor ranges from -2 (totally “on the Web”) to +2 (totally “in the Web”), with the following attempt at a definition of the scale:
-2 … proprietary, desktop-centric document formats
-1 … structured data that can be exposed and accessed via the Web
0 … standardised, Web-aligned (XML-based) formats or Web services
1 … open, standardised (document) formats
2 … fully REST-compliant, open (data) standards natively supporting links
Here is what I have so far. Feel free to ping me if you disagree or have other suggestions:
Technology | Examples | Link factor |
Documents | MS Word, PDF | -2 |
Spreadsheets | MS Excel | -1 |
RDBMS | Oracle DB, MySQL | -1 |
NoSQL | BigTable, HBase, Amazon S3, etc. | 0 |
Hypertext and Hypermedia | HTML, VoiceML, SVG, Google Docs | 1 |
Hyperdata | Atom, OData, Linked Data | 2 |
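For readers who want to play with the classification, the scale and the table above could be captured in a small lookup structure. This is only a sketch: the names `LINK_FACTOR_SCALE`, `LINK_FACTORS` and `link_factor` are made up for illustration, and the values simply restate the table rather than compute anything.

```python
# Hypothetical encoding of the Link factor scale and table from the post.
# All names here are illustrative assumptions, not an established API.

# Scale: Link factor -> meaning, as defined in the post.
LINK_FACTOR_SCALE = {
    -2: "proprietary, desktop-centric document formats",
    -1: "structured data that can be exposed and accessed via the Web",
    0: "standardised, Web-aligned (XML-based) formats or Web services",
    1: "open, standardised (document) formats",
    2: "fully REST-compliant, open (data) standards natively supporting links",
}

# Table: technology -> (examples, Link factor), copied from the post.
LINK_FACTORS = {
    "Documents": (["MS Word", "PDF"], -2),
    "Spreadsheets": (["MS Excel"], -1),
    "RDBMS": (["Oracle DB", "MySQL"], -1),
    "NoSQL": (["BigTable", "HBase", "Amazon S3"], 0),
    "Hypertext and Hypermedia": (["HTML", "VoiceML", "SVG", "Google Docs"], 1),
    "Hyperdata": (["Atom", "OData", "Linked Data"], 2),
}


def link_factor(example: str) -> int:
    """Return the Link factor the table assigns to a named example."""
    for examples, factor in LINK_FACTORS.values():
        if example in examples:
            return factor
    raise KeyError(f"{example!r} is not listed in the table")


if __name__ == "__main__":
    for name in ("PDF", "MySQL", "HTML", "Linked Data"):
        factor = link_factor(name)
        print(f"{name}: {factor:+d} ({LINK_FACTOR_SCALE[factor]})")
```

Keeping the scale and the table as two separate mappings means a disputed entry (say, PDF) can be moved to another factor without touching the definitions of the scale itself.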
Michael,
How about this dichotomy?
Hypermedia Resources (Hypertext or Hyperdata) and Non Hypermedia Resources (traditional platform and application specific resource types).
It also becomes somewhat easier to understand the value proposition of middleware products that offer the following:
1. Transformation of Non Hypermedia Resources into Hypermedia Resources
2. Transformation of Hypertext Resources into Hyperdata Resources.
Kingsley
Interesting classification, but I don’t completely agree, especially with the PDF bit. PDF documents (and even Word documents) are hypertext documents not only internally but also as part of the Web, since it is easy to add external links to them (I do it all the time). So, PDF documents can be very much “in the Web”. Also, note that PDF is in fact an open ISO standard! I would even challenge the claim that PDF is “desktop-centric”. Rather, I would call PDF a “print-centric” format, whereas HTML is a screen-centric format.
Knud,
“… as it is easy to add external links to them (I do it all the time).”
Really? Maybe you mean links *from* (within) them? If you can show me how to link *to*, say, a certain section of an MS Word document from the Web, I’ll come upstairs and get you a coffee (and/or) a cookie for free 😉
Right, I meant linking from within the document to the outside. PDF has some disadvantages with respect to linking in the opposite direction – even though I think there is some arcane mechanism to link to specific pages or sections within a PDF document (does that qualify for a cookie?).
Still, to say that PDF is a proprietary, document-centric format and not at all “in” the Web sounds wrong to me!