This is the first post in the solving-tomorrow’s-problems-with-yesterday’s-tools series.
[Relational database systems] support four basic activities: Create, Read, Update and Delete. For databases, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and duly note the previous value for that record.
I don’t find this claim puzzling at all. As Pat Helland rightly says:
In large-scale systems, you don’t update data, you add new data or create a new version.
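A minimal sketch of this "add new data or create a new version" approach follows. It assumes a simple in-memory key/value store; the `AppendOnlyStore` class and its `put`, `current` and `history` methods are hypothetical and not any particular database’s API. Every write appends an immutable record that notes the value it supersedes, so nothing is ever overwritten:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One immutable version of a value; never modified after insertion."""
    key: str
    value: str
    version: int
    previous: Optional[str]  # the value this record supersedes, if any


class AppendOnlyStore:
    """Hypothetical store that turns every 'update' into an insert."""

    def __init__(self) -> None:
        self._log: list[Record] = []

    def put(self, key: str, value: str) -> Record:
        current = self.current(key)
        record = Record(
            key=key,
            value=value,
            version=0 if current is None else current.version + 1,
            previous=None if current is None else current.value,
        )
        self._log.append(record)  # append only; nothing is overwritten
        return record

    def current(self, key: str) -> Optional[Record]:
        # The latest record for a key is the last one appended for it.
        for record in reversed(self._log):
            if record.key == key:
                return record
        return None

    def history(self, key: str) -> list[Record]:
        # All versions of a key, oldest first.
        return [r for r in self._log if r.key == key]
```

For example, after `store.put("michael/address", "address-1")` followed by `store.put("michael/address", "address-2")`, `store.current("michael/address")` returns version 1, whose `previous` field still holds `"address-1"`. The linear scan in `current` keeps the sketch short; a real store would index the latest version per key.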
OK, I guess arguing this on an abstract level serves nobody. Let’s get our hands dirty and look at a concrete example. I’ll pick one from the Linked Data world, but nothing about it is specific to Linked Data; it just happens to be the data language I speak and dream in ;)
- If I ask ‘Where has Michael lived previously?’, I can no longer get an answer once the update has been performed, unless I kept a local copy of the old data.
- Whenever I ask ‘Where does Michael live?’, I implicitly have to add ‘at the moment’, as the information is not scoped in time.
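To make both problems concrete, here is a small sketch in Python. It models a triple store as a plain set of (subject, predicate, object) tuples and an UPDATE as DELETE + INSERT; the identifiers `ex:Michael`, `ex:livesIn`, `ex:CityA` and `ex:CityB` are hypothetical placeholders, not data from a real dataset:

```python
# A tiny in-memory "triple store": a set of (subject, predicate, object) tuples.
# All identifiers below are hypothetical placeholders.
Triple = tuple[str, str, str]

store: set[Triple] = {("ex:Michael", "ex:livesIn", "ex:CityA")}


def update(store: set[Triple], old: Triple, new: Triple) -> None:
    """An UPDATE modelled as DELETE + INSERT, as in a destructive store."""
    store.discard(old)  # DELETE: the old fact is gone for good
    store.add(new)      # INSERT


def lives_in(store: set[Triple], person: str) -> set[str]:
    # Answers 'Where does <person> live?', implicitly 'at the moment'.
    return {o for s, p, o in store if s == person and p == "ex:livesIn"}


update(store,
       ("ex:Michael", "ex:livesIn", "ex:CityA"),
       ("ex:Michael", "ex:livesIn", "ex:CityB"))

print(lives_in(store, "ex:Michael"))  # {'ex:CityB'}
# 'Where has Michael lived previously?' cannot be answered:
# no triple in the store still mentions ex:CityA.
print(any("ex:CityA" in t for t in store))  # False
```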
There are a few ways to deal with this, though. As a consequence, here is what I demand:
- Never, ever DELETE data: it’s slow and lossy. Updating data is no better, as UPDATE is essentially DELETE + INSERT and hence lossy as well.
- Each piece of data must be versioned – in the Linked Data world one could, for example, use quads rather than triples to capture the context of the assertion expressed in the data.
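Here is a minimal sketch of the quad idea with hypothetical identifiers. The fourth element of each quad names the graph (the context) in which the assertion was made, and here each graph stands for one version. The `version_order` table is an assumption of this sketch; in practice the ordering could come from, say, timestamps attached to the graphs. Because nothing is deleted, both questions from above can be answered:

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph holding the assertion, used here as a version label


# Hypothetical data: each version of the fact lives in its own named graph.
quads: list[Quad] = [
    Quad("ex:Michael", "ex:livesIn", "ex:CityA", "ex:version1"),
    Quad("ex:Michael", "ex:livesIn", "ex:CityB", "ex:version2"),
]
# Assumed ordering of the version graphs (oldest first).
version_order = {"ex:version1": 1, "ex:version2": 2}


def residences(person: str) -> list[str]:
    """Every place ever asserted for `person`, oldest version first."""
    matches = [q for q in quads if q.subject == person and q.predicate == "ex:livesIn"]
    matches.sort(key=lambda q: version_order[q.graph])
    return [q.obj for q in matches]


def current_residence(person: str) -> Optional[str]:
    # 'At the moment' is now explicit: the assertion in the latest version.
    history = residences(person)
    return history[-1] if history else None


def previous_residences(person: str) -> list[str]:
    # Still answerable, because no quad was ever deleted.
    return residences(person)[:-1]


print(current_residence("ex:Michael"))    # ex:CityB
print(previous_residences("ex:Michael"))  # ['ex:CityA']
```

Recording another move is then just one more append (a new quad in a new graph plus a new entry in `version_order`); no existing quad is touched.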
Oh, BTW, my dear colleagues from the SPARQL Working Group: having said this, I think SPARQL Update is heading in the wrong direction, or at the very least would benefit from an appendix that discusses ‘large-scale deployment considerations’ at the system level. Can we still change this, pretty please?
PS: disk space is cheap these days, as nicely pointed out by Dorian Taylor ;)