
Quick RDFa profiling report

The other day on semantic-web@w3.org, William Waites asked how to link to an RDF serialisation from an HTML document (if I understood him correctly), as he does not seem to be too much into RDFa. While the answer to this one is IMHO rather straightforward (use <link href="yadayada.rdf" rel="alternate" type="application/rdf+xml" />), the follow-up discussion reminded me of the section on Usability Issues in our Linked Data with RDFa tutorial (note that this document will soon be moved to another location):

One practical issue you may want to check against sometimes occurs with fine-grained, high-volume datasets. Imagine a detailed description of audio-visual content (say, a one-hour video incl. audio track) in RDF, or, equally, a detailed RDF representation of a multi-dimensional table of statistics, with dozens of columns and potentially thousands of rows. In both cases, one ends up with potentially many triples, which might mean some 100k triples or more. As both humans and machines are expected to consume the RDFa document, one certainly has to find a trade-off between using RDFa for the entire description (meaning to embed all triples in the HTML document) and an entirely externalised solution, for example using RDF/XML:

We also give a rough guideline on how to decide how much is too much:

… having the entire RDF graph embedded certainly is desirable, however, one has to check the usability of the site. Usability expert Jakob Nielsen advocates a size limit for Web pages yielding an approximate 10 sec response time limit. Based on this we propose to perform a simple sort of response time testing, once with the plain HTML page and once with the embedded RDF graph. In case of a significant difference, one should contemplate if the all-in-RDFa approach is appropriate for the use case at hand.

Now, I wanted to get some real figures regarding how the number of triples embedded with RDFa impacts the loading time of an HTML page in a browser and did the following: I loaded some 17 cities from DBpedia (such as Amsterdam) into an RDF store and created a number of generic RDFa+HTML documents essentially with:

SELECT * WHERE { ?s ?p ?o } LIMIT $SIZE

… where $SIZE would range from 10 to 20,000 – each triple looks essentially like:

<div about='http://dbpedia.org/resource/William_Howitt'>
<a rel='dbp:deathPlace'
  href='http://dbpedia.org/resource/Rome'>
http://dbpedia.org/resource/Rome
</a>
</div>
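
To make the document generation step a bit more concrete, here is a minimal Python sketch of how such test pages could be produced from a list of triples. This is not the script I actually used; the function names (triple_to_rdfa, build_document) are made up for illustration, and it assumes that every object is a resource (an IRI) rather than a literal, as in the pattern above. The bare HTML wrapper is also an assumption: a real page would need a doctype and the prefix declarations (e.g. for dbp).

```python
from html import escape


def triple_to_rdfa(subject: str, predicate: str, obj: str) -> str:
    """Render one triple using the div/a pattern shown above.

    Assumption: the object is an IRI, so it is used both as the link
    target and as the visible link text.
    """
    s = escape(subject, quote=True)
    p = escape(predicate, quote=True)
    o = escape(obj, quote=True)
    return (
        f"<div about='{s}'>\n"
        f"<a rel='{p}'\n"
        f"  href='{o}'>\n"
        f"{o}\n"
        f"</a>\n"
        f"</div>"
    )


def build_document(triples, size: int) -> str:
    """Embed the first `size` triples in a bare HTML page.

    `size` plays the role of $SIZE in the query above. Doctype and
    prefix declarations are omitted here for brevity (hypothetical wrapper).
    """
    body = "\n".join(triple_to_rdfa(s, p, o) for s, p, o in triples[:size])
    return f"<html>\n<body>\n{body}\n</body>\n</html>"
```

Calling build_document(triples, 10), build_document(triples, 100) and so on would then yield one test page per size.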

Then I used Firebug with the NetExport extension and a shell script to gather the load times. The (raw) results are available online, as are two figures that give a rough idea of what is happening:

Note the following regarding the test setup: I did a local test (no network dependencies) with all caches turned off; the tests were performed with Firefox 3.6 on Mac OS X 10.5.8 (2.53 GHz Intel Core 2 Duo with 4 GB of 1067 MHz DDR3 RAM on board). Each document was loaded five times; the numbers above show the averages over these runs.
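
For the averaging step, a small Python sketch along the following lines would do. It assumes the raw results have already been reduced to (number of triples, load time in ms) pairs, one pair per run; that input shape and the function name average_load_times are assumptions for illustration, not the format NetExport writes out.

```python
from collections import defaultdict
from statistics import mean


def average_load_times(runs):
    """Average the load times per document size.

    `runs` is an iterable of (triple_count, load_time_ms) pairs, one per
    run (assumed input shape). Returns a dict mapping each triple count
    to the mean load time, ordered by ascending triple count.
    """
    by_size = defaultdict(list)
    for triple_count, load_time_ms in runs:
        by_size[triple_count].append(load_time_ms)
    return {size: mean(times) for size, times in sorted(by_size.items())}
```

With five runs per document, each value in the returned dict is the mean of five measurements, which is the kind of number plotted in the figures.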

 


UPDATES: William has clarified that his question was more about whether one should do the linking than about how to do it. Further, Martin Hepp has pointed out some related work he did. Thanks to both for the interesting and valuable feedback!

 


About woddiscovery

Web of Data researcher and practitioner
