Technology MalBestPracticing

Reading RESTful Web Services by Leonard Richardson and Sam Ruby, it suddenly struck me like thunder: very often technologies are (often unknowingly) abused in obscure ways, and the community then comes to perceive that abuse as good or even best practice. So much for the generic introduction explaining the title – let’s flesh it out ;)

A couple of years ago I used to develop Web applications using JSP and relational databases (RDB). One pattern I often found (and, I have to admit, followed myself in pretty much the same way): treating the RDB only as a dump store without exploiting the features it offers. So, you load whatever you need via some SQL command in the beginning, process it in memory and, when you’re done, dump it back into the RDB. Is this the way RDBs are supposed to be used? Certainly not.
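
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its rows and both function names are made up for illustration; the point is only that the second function lets the database do the grouping instead of pulling every row into memory.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
)


def totals_in_memory(conn):
    """'Dump store' style: fetch every row, then aggregate in application code."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database do the grouping and summing."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)  # each row is a (customer, total) pair


print(totals_in_memory(conn))
print(totals_in_database(conn))
```

Both functions return the same totals; the difference is where the work happens and how much data has to leave the database to get there.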

Then, as motivated by the RESTful WS book: HTTP naturally provides a set of methods for CRUD operations. However, certain so-called ‘Web’ solutions merely use HTTP as a transport protocol and redefine most of the logic in rather complex ways (RPC-style being one example, but hybrids also exist that use HTTP for reading yet define their own mechanisms to update resources).
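
As a rough sketch of the difference (no real server involved, just an in-memory dictionary), the first function below maps the HTTP method straight onto a CRUD operation, while the second tunnels an operation name through the payload in RPC style. The names `store`, `handle` and `handle_rpc`, the operation names and the paths are all assumptions made for this example.

```python
# Hypothetical in-memory resource store; keys are resource paths.
store = {}


def handle(method, path, body=None):
    """Map the HTTP method onto a CRUD operation; return (status, body)."""
    if method == "GET":  # Read
        if path in store:
            return 200, store[path]
        return 404, None
    if method == "PUT":  # Create or replace at a known URI
        created = path not in store
        store[path] = body
        return (201 if created else 200), body
    if method == "DELETE":  # Delete
        if path not in store:
            return 404, None
        del store[path]
        return 204, None
    return 405, None  # Method not allowed


def handle_rpc(body):
    """RPC style, for contrast: one endpoint, operation named in the payload."""
    ops = {"getThing": "GET", "saveThing": "PUT", "removeThing": "DELETE"}
    return handle(ops[body["op"]], body["path"], body.get("data"))
```

In the first style a generic client or cache can tell from the method alone what a request does; in the second it has to understand the application-specific `op` vocabulary first.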

Anyway, there seems to be a pattern, and now I am wondering whether we know of such MalBestPracticing in the RDF world as well. What comes to mind is the following (ok, very roughly, but feel free to add yours):

  • Using RDF in a closed-world setup: often seen and often seen failing. Whenever you have a closed-world application, that is, something that’s supposed to do a job in an environment you entirely control (Intranet, desktop, etc.) and there is no need to share/incorporate other data, using RDF is probably not a smart choice. You’re better off with the RDB of your choice and some hand-coded rules, both in terms of complexity and performance.
  • Thinking of RDF on the serialisation level. Yes, there are a couple of RDF serialisations such as RDF/XML, RDFa, Turtle, etc., but that’s not the point. If I want to, I can put my RDF glasses on and view (pretty much) everything as RDF, but one should rather think of RDF on the data model level. The important point is that RDF provides a way to express structured data as a graph, which happens to be the same shape as the Web from a morphological point of view.
  • No interlinking between data. Huh, that’s a heavy one. Publishing RDF without interlinking to other RDF data out there. But to be fair, this has been properly addressed by TimBL in his LinkedData note and the community has picked it up since. Imagine HTML documents on all of the computers in the world … without a single hyperlink between them. Would you call that the Web? Certainly not. Believe it or not, this was more or less what we’ve been doing for more than six years or so in the Semantic Web.
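
The last two points can be sketched without any RDF library or serialisation at all. Below, a graph is simply a set of (subject, predicate, object) tuples, and a small helper finds the triples that point to a different host. The example.org and example.com URIs and the helper names are hypothetical, and telling URIs from literals by their scheme is a crude simplification that a real RDF toolkit would not need.

```python
from urllib.parse import urlparse

# A graph as a plain set of (subject, predicate, object) triples.
# How it gets serialised (RDF/XML, Turtle, RDFa, ...) is a separate concern.
graph = {
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/name",
     "Alice"),
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people/bob"),
    ("http://example.org/people/alice",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.com/id/alice"),
}


def is_uri(term):
    """Crude check (an assumption of this sketch): http(s) strings are URIs."""
    return urlparse(term).scheme in ("http", "https")


def host(uri):
    return urlparse(uri).netloc


def outgoing_links(graph):
    """Triples whose object is a URI on a different host than the subject."""
    return {
        (s, p, o)
        for (s, p, o) in graph
        if is_uri(o) and host(o) != host(s)
    }


for triple in outgoing_links(graph):
    print(triple)
```

A dataset for which `outgoing_links` comes back empty is the RDF equivalent of those HTML documents without a single hyperlink between them.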

So, what’s your favorite MalBestPracticing in the Semantic Web world?

About woddiscovery

Web of Data researcher and practitioner

Discussion

12 thoughts on “Technology MalBestPracticing”

  1. Regarding the recommendation against RDF for a closed-world setup:

    You constrain the recommendation to cases where “there is no need to share/incorporate other data”, but I think that’s a pretty rare event. As soon as your data involves “users”, for example, there’s potential for connecting with other data. There are other costs with having developers use RDF, so the benefits of a potential connection need to be properly considered.

    But imagine if all projects in your organization used RDF for their local data, linking only on users. That would still allow for some pretty nice data browsing, in my opinion.

    I also think there are fewer varied conventions for the structuring of RDF data compared to SQL. Often it is hard for me to follow a foreign SQL project’s use of keys, connector tables, subclass tables, column-naming-that-encodes-constraints, etc. Having the data as RDF, perhaps even through D2R, makes it much easier to explore and manipulate.

    Posted by drewp | 2009-05-24, 09:32
  2. Hi, I wrote a response here: http://blogs.talis.com/n2/archives/470 but in addition to my point there, here are some more:

    Flat Data – when data only uses object literals, so there is no ‘graph’, only unconnected records

    Putting things in literals that need to be parsed again – eg: :Foo ex:list "apples,oranges" . instead of :Foo ex:list "apples" , "oranges" .

    Not publishing enough data! I think it will become increasingly important to publish more ‘metadata’ about the document the RDF is published in, and the dataset that it is part of, and/or, the service that generated it. To make reliable re-use of data, we need to know about its licensing and provenance.

    Posted by Keith Alexander | 2009-05-24, 12:13
  3. Perhaps not surprisingly, I disagree that these are “malbestpractices” of RDF. And perhaps not surprisingly, this tickles one of my biggest objections of the Semantic Web community: claiming that certain uses of the technologies are right and that others are wrong.

    There are *plenty* of good uses of RDF in a closed world, and there are plenty of uses of RDF that require a closed world (e.g. data collection validation). RDF as an enabling technology has many benefits aside from an open world. Unless you mean “closed world” as in “the full data structure is known in advance and will never, ever change or come in contact with any other information”, I just can’t agree here.

    I don’t think I disagree with you on the serialization bit, I just don’t understand the last sentence :-)

    The last one, of course, I disagree with pretty strongly. I think adding links is a tremendous value add, but by no means the end-all and be-all. There are plenty of isolated Web sites that do useful things, either behind or outside of firewalls. There are about a bajillion uses of SemWeb technologies that I can think of that derive benefits without relying on external links. (Both OWL-heavy use cases like some of the SNOMED validation work and RDF-heavy use cases like the spreadsheet work that we do at Cambridge Semantics come immediately to mind.)

    Lee

    Posted by Lee Feigenbaum | 2009-05-24, 14:50
  4. Creating more than one way to say things. I’m probably as guilty as the next person of doing this, but defining a vocabulary that lets people say the same thing in several different ways makes data really hard to query later on.

    Examples include properties which are inverses of each other (i.e. { :A :child :B } means the same as { :B :parent :A }), creating reified and non-reified ways of saying the same thing, and unnecessary duplication of properties and classes which already exist on other vocabularies.

    RDFS and OWL can help us out here, but this places a heavy reasoning burden on the query engine.

    Posted by Toby Inkster | 2009-05-24, 15:52
  5. Hi,

    I agree with LeeF here: there are plenty of reasons why using RDF in a closed world manner is useful and better than alternatives.

    In the past I’ve re-architected a publishing system around RDF that was an entirely “closed-world” application, and it was much better and much more flexible than the original solution based on a relational database. RDF as a flexible semi-structured graph model has value even if you aren’t using it on the web.

    Applying closed-world assumptions, and even constraining RDF/XML serializations to a specific sub-set, have their uses, e.g. in validation, moving between XML-RDF based views of the world, etc. None of these are “wrong”: the important thing is to always understand the trade-offs involved.

    So rather than focus on “MalBestPractices”, I think it’d be better to highlight the trade-offs so that the technologies can be better applied and understood.

    Posted by Leigh Dodds | 2009-05-24, 17:14
  6. Regarding your first point, RDF makes a lot of sense in many closed-world applications, particularly those applications that evolve frequently. Not being constrained by the “tyranny of the database” makes for a very flexible and agile development environment. Need a new data property? Add it. Don’t recognize one? Ignore it.

    I’m not sure I follow your second point. If you’re saying that we should not be holding up RDF/XML, N3, turtle, etc. and proclaiming “THIS is RDF”, then I agree.

    Your third point regarding linking data is a stretch I think. My SharePoint site or my internal wiki might not point to any external entities/pages/whatever. Does that make it any less “webby” or less valuable to me or my company?

    Anyway, my two cents. :)

    Posted by Brian Manley | 2009-05-24, 18:52
  7. First, let me thank you for the many excellent replies (exclusively from SW practitioners so far, which makes them even more valuable to me). I very much appreciate your criticism; this should really help us move forward.

    So, this intentionally provocative post triggered quite some discussion. Certainly, I’m not in the position to judge what is good or bad – who am I? just a nerdy academic researcher ;). However, IMO, we as a community should develop a certain ability to express self-criticism in order to better serve our customers, in order to more effectively disseminate, etc.

    Especially the first anti-pattern (closed-world) seemed to cause a lot of discussion. So, let me say, @drewp, @Lee, @Leigh, and @Brian: yes, indeed, I think you’re right that there are many cases where RDF is indeed very usable when it comes to flexibility, agile development, instant data integration, etc. – I agree that my initial definition of closed world needs a rephrasing and could indeed rather look as Lee pointed out: ‘the full data structure is known in advance and will never, ever change or come in contact with any other information’.

    Regarding the serialisation anti-pattern I agree with Brian (‘… we should not be holding up RDF/XML, N3, turtle, etc. and proclaiming “THIS is RDF”’) and absolutely support Leigh when he says ‘… highlight the trade-offs so that the technologies can be better applied and understood.’ – but note that this, IMO, is rather an additional step and does not spare us from identifying cases where RDF is not the optimal solution.

    Also the third (no link) anti-pattern triggered strong disagreement. I guess I owe you clear words regarding what I meant, as this is the first step for mutual understanding. It is a matter of scope, of course. Linked data, for example, can perfectly be applied to an enterprise setup (Lee), internal links are extremely valuable (Brian), but in all cases there *are* connections/relations between data items. Again, it seems to me that we are in agreement but that my wording was not precise enough.

    What I learned from the discussion is: (i) I should carefully choose the wording (for example regarding judgments such as ‘smart choice’), (ii) I should clearly define what I mean and give more examples, and (iii) we as a community should be able to self-critically assess our technologies without overdoing it. Btw, Kendall, stating ‘Bad bad advice’ [1] is fine, but doesn’t really help progress the discussion (concrete counter-examples, however, do ;)

    Again, thanks a lot for all your valuable input!

    Cheers,
    Michael

    [1] http://twitter.com/kendall/status/1904483261

    Posted by woddiscovery | 2009-05-25, 06:00
  8. I agree with Brian Manley on the value of RDF in some closed world applications – for the reasons that Brian explains. It may be a closed world in terms of having access restricted to a particular community, eg one company, but in data model terms, many such applications are very much ‘open world’. As data evolves, you get broken links, duplicate/conflicting statements, rapid changes of schema etc. Being able to handle this kind of thing is a great strength of RDF.

    But I definitely agree with Michael’s second point on the harmful confusion around particular RDF serializations. The rather turgid RDF/XML is not intended to be read by humans! And only the most ardent angle bracket enthusiast should think of RDF in those terms.

    Posted by Bill Roberts | 2009-05-25, 12:57

