<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Web of Data &#187; Cloud Computing</title>
	<atom:link href="http://webofdata.wordpress.com/category/cloud-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://webofdata.wordpress.com</link>
	<description>data and computing at scale</description>
	<lastBuildDate>Wed, 22 May 2013 10:12:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='webofdata.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/7b693270b7a65d5773cbb0c737109c5e?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Web of Data &#187; Cloud Computing</title>
		<link>http://webofdata.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://webofdata.wordpress.com/osd.xml" title="Web of Data" />
	<atom:link rel='hub' href='http://webofdata.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Cloud Cipher Capabilities</title>
		<link>http://webofdata.wordpress.com/2013/03/24/cloud-encryption-support/</link>
		<comments>http://webofdata.wordpress.com/2013/03/24/cloud-encryption-support/#comments</comments>
		<pubDate>Sun, 24 Mar 2013 16:44:55 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IaaS]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[SaaS]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[SSL]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=856</guid>
		<description><![CDATA[Where I'm reviewing support for encryption in the context of IaaS&#124;PaaS&#124;SaaS cloud service offerings as well as concerning Hadoop. While the motivation for encryption might differ, the primary question is if systems support this (transparently) or if developers are forced to code this in the application logic. <a href="http://webofdata.wordpress.com/2013/03/24/cloud-encryption-support/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=856&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><em>&#8230; or, the lack of it.</em></p>
<p>A recent discussion with a customer made me take a closer look at support for encryption in XaaS cloud service offerings as well as in Hadoop. In general, this can be broken down into over-the-wire (cf. SSL/<a href="http://en.wikipedia.org/wiki/Transport_Layer_Security">TLS</a>) and back-end encryption. While the former is widely used, the latter is rarely found.</p>
<p>There are different reasons why one might want to encrypt data, ranging from preserving a competitive advantage to end-user privacy issues. No matter the reason, the question is whether systems support this (transparently) or whether developers are forced to code it in the application logic.</p>
<p><strong>IaaS-level</strong>. Especially in this category, file storage for app development, one would expect wide support for built-in encryption.</p>
<ul>
<li>Amazon&#8217;s S3 indeed provides <a href="http://aws.amazon.com/about-aws/whats-new/2011/10/04/amazon-s3-announces-server-side-encryption-support/">server-side support for encryption</a></li>
<li>Google Storage <a href="https://developers.google.com/storage/docs/developer-guide">does not encrypt files</a></li>
<li>Same for Rackspace&#8217;s Cloud Files &#8211; <a href="http://www.rackspace.com/knowledge_center/product-faq/cloud-files">no encryption</a>, ATM</li>
<li>As well as for Microsoft&#8217;s Azure storage &#8211; <a href="http://social.msdn.microsoft.com/Forums/en-US/windowsazuresecurity/thread/d8b461bd-87c4-4552-99ed-aab9faa16457">not encrypting files</a></li>
<li>And last but not least, HP Cloud&#8217;s Object Storage is in good company by <a href="https://docs.hpcloud.com/api/object-storage">not supporting encryption</a></li>
</ul>
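<p>To make the S3 case concrete, here is a minimal Python sketch of what transparent, server-side encryption looks like from the client side: the upload request simply carries the <code>x-amz-server-side-encryption</code> header. The bucket and object names are hypothetical, the request is only built and never sent, and AWS request signing is deliberately left out, so treat this as an illustration rather than a working upload.</p>

```python
from urllib.request import Request

# Hypothetical bucket and key, used for illustration only.
BUCKET_URL = "https://example-bucket.s3.amazonaws.com"
OBJECT_KEY = "reports/q1.csv"


def build_encrypted_put(payload: bytes) -> Request:
    """Build (but do not send) a PUT request asking S3 to encrypt at rest."""
    return Request(
        url=BUCKET_URL + "/" + OBJECT_KEY,
        data=payload,
        method="PUT",
        headers={
            # Asks S3 to encrypt the object with AES-256 before storing it.
            "x-amz-server-side-encryption": "AES256",
            # A real request additionally needs an AWS signature (omitted).
        },
    )


request = build_encrypted_put(b"id,amount\n1,42\n")
print(request.get_method(), request.full_url)
print(dict(request.header_items()))
```

<p>The point of the sketch is that the application logic stays untouched: the payload is sent as-is and the storage service takes care of encrypting it.</p>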
<p>On the <strong>PaaS level</strong> things look pretty much the same: for example, <a href="http://aws.amazon.com/elasticbeanstalk/">AWS Elastic Beanstalk</a> provides no support for encryption of the data (unless you consider S3) and, concerning Google&#8217;s App Engine, <a href="http://stackoverflow.com/questions/6040673/encrypting-user-data-on-app-engine">good practices for data encryption</a> seem to be only just emerging.</p>
<p>Offerings on the <strong>SaaS level</strong> provide an equally poor picture:</p>
<ul>
<li>Dropbox offers encryption <a href="https://www.dropbox.com/help/27/en">via S3</a>.</li>
<li>Google Drive and Microsoft Skydrive seem to not offer any encryption options for storage.</li>
<li>Apple&#8217;s iCloud is a notable exception: not only does it provide support but also <a href="http://support.apple.com/kb/ht4865">nicely explains it</a>.</li>
<li>For many if not most of the above SaaS-level offerings there are plug-ins that enable encryption, such as those provided by <a href="http://www.syncdocs.com/how-to-set-up-google-drive-encryption/">Syncdocs</a> or <a href="http://www.cloudfogger.com/en/">Cloudfogger</a>.</li>
</ul>
<p>In <strong>Hadoop-land</strong> things also look rather <a href="http://stackoverflow.com/questions/7649936/using-encryption-with-hadoop">sobering</a>; there are only a few efforts to make HDFS and the like support encryption, such as <a href="https://launchpad.net/ecryptfs">ecryptfs</a> or <a href="http://www.gazzang.com/encrypt-hadoop">Gazzang&#8217;s</a> offering. Last but not least: for Hadoop in the cloud, encryption is available via AWS&#8217;s EMR when using S3.</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/856/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/856/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=856&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2013/03/24/cloud-encryption-support/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
		<item>
		<title>Elephant filet</title>
		<link>http://webofdata.wordpress.com/2013/03/10/elephant-filet/</link>
		<comments>http://webofdata.wordpress.com/2013/03/10/elephant-filet/#comments</comments>
		<pubDate>Sun, 10 Mar 2013 13:37:46 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[multitenancy]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SDN]]></category>
		<category><![CDATA[sharing]]></category>
		<category><![CDATA[throughput]]></category>
		<category><![CDATA[utilisation]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=843</guid>
		<description><![CDATA[In situations where Hadoop is used in a shared setup we witness two competing forces:  the user expects performance vs. the view of the cluster owner who aims to optimise throughput and maximise utilisation. In the post, Michael elaborates a bit on challenges and solutions on this topic.  <a href="http://webofdata.wordpress.com/2013/03/10/elephant-filet/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=843&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>At the end of January I participated in a panel discussion on <a title="Big Data Architecture, Sizing and Scaling: From Basics to Warehouse Scale" href="https://www.ciscolive365.com/connect/sessionDetail.ww?SESSION_ID=6169" target="_blank">Big Data</a>, held during the <a href="http://www.computerweekly.com/guides/Cisco-Live-London-2013" target="_blank">CISCO live</a> event in London. One of my fellow panelists, I believe it was Sean McKeown of CISCO, said something along the lines of:</p>
<blockquote><p>&#8230; ideally the cluster is at 99% utilisation, concerning CPU, I/O, and network &#8230;</p></blockquote>
<p>This stuck in my head and I gave it some thought. In the following I will elaborate a bit on it for the case where Hadoop is used in a shared setup, for example in <a href="http://webofdata.wordpress.com/2012/11/08/hosted-mapreduce-hadoop/" target="_blank">hosted offerings</a> or, say, within an enterprise that runs different systems such as Storm, Lucene/Solr, and Hadoop on one cluster.</p>
<p>In essence, we witness two competing forces: the perspective of a single user, who expects performance, vs. the view of the cluster owner or operator, who wants to optimise throughput and maximise utilisation. If you&#8217;re not familiar with these terms you might want to read up on Cary Millsap&#8217;s Thinking Clearly About Performance (<a href="http://cacm.acm.org/magazines/2010/9/98033-thinking-clearly-about-performance-part-1/fulltext" target="_blank">part 1</a> | <a href="http://cacm.acm.org/magazines/2010/10/99486-thinking-clearly-about-performance-part-2/fulltext" target="_blank">part 2</a>).</p>
<p>Now, in such a shared setup we may experience a spectrum of loads, from compute-intensive through I/O-intensive to communication-intensive, illustrated in the following, not overly scientific figure:<br />
<a href="http://webofdata.files.wordpress.com/2013/03/shared-hadoop.png"><img class="aligncenter size-large wp-image-844" alt="Utilisations" src="http://webofdata.files.wordpress.com/2013/03/shared-hadoop.png?w=750&#038;h=498" width="750" height="498" /></a></p>
<p>Here are some observations and thoughts as potential starting points for deeper research or experiments.</p>
<p><strong>Multitenancy</strong>. We see more and more deployments that require <a href="http://www.mapr.com/company/press-releases/mapr-enables-hadoop-as-a-service-with-multi-tenancy-security-and-end-to-end-management" target="_blank">strong</a> <a href="http://www.slideshare.net/treasure-data/hadoop-meets-cloud-with-multitenancy-16107610" target="_blank">support</a> <a href="http://extremehadoop.wordpress.com/tag/multitenant/" target="_blank">for</a> multitenancy; check out the <a href="http://hadoop.apache.org/docs/r1.1.1/capacity_scheduler.html" target="_blank">CapacityScheduler</a>, learn from <a href="http://developer.yahoo.com/blogs/hadoop/posts/2010/08/apache_hadoop_best_practices_a/" target="_blank">best practices</a> or use a distribution that natively supports the specification of <a href="http://www.mapr.com/doc/display/MapR/Node+Topology" target="_blank">topologies</a>. Additionally, you might still want to keep an eye on <a href="http://serengeti.cloudfoundry.com/" target="_blank">Serengeti</a> &#8211; VMware&#8217;s Hadoop virtualisation project &#8211; which seems to have gone quiet in recent months, but I still have hope for it.</p>
<p><strong>Software Defined Networks (SDN)</strong>. See Wikipedia&#8217;s <a href="http://en.wikipedia.org/wiki/Software-defined_networking" target="_blank">definition</a>, it&#8217;s not too bad. CISCO, for example, is <a href="http://blogs.cisco.com/tag/software-defined-networking/" target="_blank">very active</a> in this area, and only recently the IEEE Communications Magazine (<a href="http://dl.comsoc.org/comsocdl/?publication=TOC2631&amp;label=IEEE%20Communications%20Magazine%2C%202013%20February" target="_blank">February 2013</a>) ran a special issue covering SDN research. I can perfectly see &#8211; and indeed this was also briefly discussed on our CISCO live panel back in January &#8211; how SDN can enable new ways to optimise throughput and performance. Imagine an SDN that is dynamically workload-aware in the sense that it knows the difference between a node that runs a <a href="http://wiki.apache.org/hadoop/TaskTracker">task tracker</a>, a <a href="http://wiki.apache.org/hadoop/DataNode">data node</a> or a <a href="http://wiki.apache.org/solr/SolrCloud">Solr shard</a> &#8211; it should be possible to transparently improve the operational parameters so that everyone involved, the users as well as the cluster owner, benefits from it.</p>
<p>As usual, I&#8217;m very interested in what you think about the topic and I&#8217;m looking forward to learning about resources in this space from you.</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/843/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/843/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=843&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2013/03/10/elephant-filet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>

		<media:content url="http://webofdata.files.wordpress.com/2013/03/shared-hadoop.png?w=750" medium="image">
			<media:title type="html">Utilisations</media:title>
		</media:content>
	</item>
		<item>
		<title>Hosted MapReduce and Hadoop offerings</title>
		<link>http://webofdata.wordpress.com/2012/11/08/hosted-mapreduce-hadoop/</link>
		<comments>http://webofdata.wordpress.com/2012/11/08/hosted-mapreduce-hadoop/#comments</comments>
		<pubDate>Thu, 08 Nov 2012 09:34:47 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Azure]]></category>
		<category><![CDATA[Compute Engine]]></category>
		<category><![CDATA[EMR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[SaaS]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=820</guid>
		<description><![CDATA[Today&#8217;s question is: where are we regarding MapReduce/Hadoop in the cloud? That is, what are the offerings of Hadoop-as-a-Service or other hosted MapReduce implementations, currently? A year ago, InfoQ ran a story Hadoop-as-a-Service from Amazon, Cloudera, Microsoft and IBM which will serve us as a baseline here. This article contains the following statement: According to &#8230; <a href="http://webofdata.wordpress.com/2012/11/08/hosted-mapreduce-hadoop/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=820&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://webofdata.files.wordpress.com/2012/11/mr-in-the-cloud.png"><img class=" wp-image-822 alignleft" title="Hadoop in the cloud" alt="Hadoop in the cloud" src="http://webofdata.files.wordpress.com/2012/11/mr-in-the-cloud.png?w=224&#038;h=134" height="134" width="224" /></a></p>
<p>Today&#8217;s question is: <em>where are we regarding MapReduce/Hadoop in the cloud?</em> That is, what are the offerings of Hadoop-as-a-Service or other hosted MapReduce implementations, currently?</p>
<p>A year ago, InfoQ ran a story, <a href="http://www.infoq.com/news/2011/10/Hadoop-as-a-Service" target="_blank">Hadoop-as-a-Service from Amazon, Cloudera, Microsoft and IBM</a>, which will serve as our baseline here. The article contains the following statement:</p>
<blockquote><p>According to a 2011 TDWI survey, 34% of the companies use big data analytics to help them making decisions. Big data and Hadoop seem to be playing an important role in the future.</p></blockquote>
<p>One year later, we learn from a recent MarketsAndMarkets study, <a href="http://www.marketsandmarkets.com/Market-Reports/hadoop-market-766.html" target="_blank">Hadoop &amp; Big Data Analytics Market &#8211; Trends, Geographical Analysis &amp; Worldwide Market Forecasts (2012 – 2017)</a> that &#8230;</p>
<blockquote><p>The Hadoop market in 2012 is worth $1.5 billion and is expected to grow to about $13.9 billion by 2017, at a [Compound Annual Growth Rate] of 54.9% from 2012 to 2017.</p></blockquote>
<p>In the past year there have also been some quite <a href="http://gigaom.com/2012/08/13/considerations-for-hadoop-in-the-cloud/" target="_blank">vivid</a> <a href="http://cloudconexpoconference2012.sched.org/event/7a61ae390b77c773c1808e977d93f44c#.UJoBS2morOI" target="_blank">discussions</a> around the topic &#8216;Hadoop in the cloud&#8217;.</p>
<p>So, here are some current offerings and announcements I&#8217;m aware of:</p>
<ul>
<li>Amazon&#8217;s Elastic MapReduce (<a href="http://aws.amazon.com/elasticmapreduce/" target="_blank">EMR</a>), featuring MapR&#8217;s rock-solid and fast <a href="http://aws.amazon.com/elasticmapreduce/mapr/" target="_blank">Hadoop distribution</a>.</li>
<li>Google&#8217;s App Engine, a PaaS offering, allows for <a href="https://developers.google.com/appengine/docs/python/dataprocessing/overview" target="_blank">experimental MapReduce processing in Python</a>.</li>
<li>Microsoft&#8217;s Azure, also a PaaS offering, now has <a href="https://www.hadooponazure.com/" target="_blank">Hadoop support</a>.</li>
<li>VMware has launched <a href="http://serengeti.cloudfoundry.com/">Project Serengeti</a> to enable rapid deployment of Hadoop clusters in their Cloud Foundry environment.</li>
<li>HStreaming&#8217;s Cloud Beta <a href="http://www.hstreaming.com/products/cloud/" target="_blank">hooked up with AWS</a> as well.</li>
<li>There is a report on a <a href="http://jaigak.blogspot.ie/2012/07/paas-on-hadoop-yarn-idea-and-prototype.html" target="_blank">PaaS on Hadoop Yarn &#8211; Idea and Prototype</a> available.</li>
</ul>
<p>&#8230; and now it&#8217;s up to you, dear reader &#8211; I would appreciate it if you could point me to more offerings and/or announcements you know of concerning MapReduce and Hadoop in the cloud!</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/820/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/820/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=820&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/11/08/hosted-mapreduce-hadoop/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>

		<media:content url="http://webofdata.files.wordpress.com/2012/11/mr-in-the-cloud.png" medium="image">
			<media:title type="html">Hadoop in the cloud</media:title>
		</media:content>
	</item>
		<item>
		<title>MapReduce for and with the kids</title>
		<link>http://webofdata.wordpress.com/2012/11/05/mapreduce-for-kids/</link>
		<comments>http://webofdata.wordpress.com/2012/11/05/mapreduce-for-kids/#comments</comments>
		<pubDate>Mon, 05 Nov 2012 11:22:21 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Experiment]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=808</guid>
		<description><![CDATA[Last week was Halloween and of course we went trick-or-treating with our three kids, which resulted in piles of sweets in the living room. Powered by the sugar, the kids would stay up late to count their harvest, and while I was observing them at it, I was wondering if it is possible to explain the &#8230; <a href="http://webofdata.wordpress.com/2012/11/05/mapreduce-for-kids/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=808&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Last week was Halloween and of course we went trick-or-treating with our three kids, which resulted in piles of sweets in the living room. Powered by the sugar, the kids would stay up late to count their harvest, and while I was observing them at it, I was wondering if it is possible to explain the <a href="http://research.google.com/archive/mapreduce.html">MapReduce</a> paradigm to them, or even better: to <strong>do</strong> MapReduce with them.</p>
<p>Now, it turns out that Halloween and counting kinds of sweets are a perfect setup. Have a look at the following:</p>
<p><a href="http://webofdata.files.wordpress.com/2012/11/mr-halloween.png"><img class="aligncenter size-medium wp-image-810" title="MapReduce for counting kinds of sweet after Halloween harvest." alt="MapReduce for counting kinds of sweet after Halloween harvest." src="http://webofdata.files.wordpress.com/2012/11/mr-halloween.png?w=500" width="500" /></a></p>
<p>So, the goal was to figure out how many sweets of a certain kind (say, Twix) we now have available overall for consumption.</p>
<p>We started off with every child having her or his pile of sweets in front of them. Now, in the first step I&#8217;d ask the kids to shout how many of the sweet X they have in their own pile. So one kid would go like <em>I&#8217;ve got 4 fizzers</em>, etc. &#8230; and then we&#8217;d gather all the same sweets and their respective counts together. Second, we&#8217;d add up the individual counts for each kind of sweet which would give us the desired result: number of X in total.</p>
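<p>For the grown-ups, the same procedure can be sketched in a few lines of Python. The piles and counts below are made up, and the three functions are only a single-machine illustration of the map, shuffle and reduce steps described above, not a distributed implementation.</p>

```python
from collections import defaultdict

# Made-up piles, one per child; the sweets and counts are illustrative.
piles = {
    "kid_1": ["twix", "fizzers", "fizzers", "lollipop"],
    "kid_2": ["twix", "twix", "fizzers"],
    "kid_3": ["lollipop", "fizzers", "twix"],
}


def map_pile(pile):
    """Map: each kid counts the kinds of sweets in their own pile."""
    counts = defaultdict(int)
    for sweet in pile:
        counts[sweet] += 1
    return list(counts.items())


def shuffle(mapped):
    """Shuffle: gather the per-kid counts of the same sweet together."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for sweet, count in pairs:
            grouped[sweet].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce: add up the individual counts for each kind of sweet."""
    return {sweet: sum(counts) for sweet, counts in grouped.items()}


totals = reduce_counts(shuffle(map_pile(pile) for pile in piles.values()))
print(totals)
```

<p>Each kid only ever looks at their own pile in the map step, which is exactly why the counting can happen in parallel.</p>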
<p><em>Lesson learned: MapReduce is child&#8217;s play. Making kids share sweets is certainly not &#8211; believe me, I speak from experience</em> <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/experiment/'>Experiment</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/808/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/808/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=808&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/11/05/mapreduce-for-kids/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>

		<media:content url="http://webofdata.files.wordpress.com/2012/11/mr-halloween.png?w=500" medium="image">
			<media:title type="html">MapReduce for counting kinds of sweet after Halloween harvest.</media:title>
		</media:content>
	</item>
		<item>
		<title>Denormalizing graph-shaped data</title>
		<link>http://webofdata.wordpress.com/2012/09/20/denormalising-graph-shaped-data/</link>
		<comments>http://webofdata.wordpress.com/2012/09/20/denormalising-graph-shaped-data/#comments</comments>
		<pubDate>Thu, 20 Sep 2012 10:26:23 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Idea]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[denormalization]]></category>
		<category><![CDATA[graphdb]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=803</guid>
		<description><![CDATA[As nicely pointed out by Ilya Katsov: Denormalization can be defined as the copying of the same data into multiple documents or tables in order to simplify/optimize query processing or to fit the user’s data into a particular data model. So, I was wondering, why is &#8211; in Ilya&#8217;s write-up &#8211; denormalization not considered to be &#8230; <a href="http://webofdata.wordpress.com/2012/09/20/denormalising-graph-shaped-data/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=803&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>As nicely <a href="http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/">pointed out</a> by Ilya Katsov:</p>
<blockquote><p>Denormalization can be defined as the copying of the same data into multiple documents or tables in order to simplify/optimize query processing or to fit the user’s data into a particular data model.</p></blockquote>
<p>So, I was wondering, why is &#8211; in Ilya&#8217;s write-up &#8211; <strong>denormalization</strong> not considered to be applicable for GraphDBs?</p>
<p>I suppose the main reason is that the relationships (or links, as we call them in the Linked Data world) are typically not resolved or dereferenced, which means <a href="https://github.com/tinkerpop/gremlin/wiki">traversing</a> the graph is fast, but for a number of <a href="http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/">operations</a>, such as range queries, denormalized data would be better.</p>
<p>Now, the question is: can we achieve this in GraphDBs, incl. <a href="http://www.garshol.priv.no/blog/231.html">RDF stores</a>? I would hope so. Here are some design ideas:</p>
<ul>
<li>Up-front: when inserting new data items (nodes), immediately dereference the links (embedded links).</li>
<li>Query-time: apply database <a href="http://www.cwi.nl/2010/1082/databasecracking">cracking</a>.</li>
</ul>
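<p>To illustrate the up-front idea, here is a toy Python sketch under my own assumptions: an in-memory dictionary stands in for the graph store, and all names and figures are made up. On insert, each link is dereferenced and a copy of the target node is embedded, so a range query can later be answered without traversing the graph.</p>

```python
# Toy in-memory graph store; all names here are illustrative.
nodes = {}


def insert_node(node_id, properties, links=()):
    """Insert a node and embed a copy of every linked node's properties."""
    embedded = {
        target: dict(nodes[target]["properties"])
        for target in links
        if target in nodes
    }
    nodes[node_id] = {
        "properties": properties,
        "links": list(links),   # normalized form: just the references
        "embedded": embedded,   # denormalized form: copied target data
    }


def people_in_cities_with_population(low, high):
    """Range query answered from embedded copies, without traversal."""
    return sorted({
        node_id
        for node_id, node in nodes.items()
        for target in node["embedded"].values()
        if target.get("population", -1) in range(low, high + 1)
    })


insert_node("city_a", {"population": 80000})
insert_node("city_b", {"population": 1300000})
insert_node("alice", {"name": "Alice"}, links=["city_a"])
insert_node("bob", {"name": "Bob"}, links=["city_b"])

print(people_in_cities_with_population(50000, 100000))
```

<p>The usual denormalization trade-off applies: the embedded copies go stale when the linked node changes, so updates have to be propagated to every copy.</p>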
<p>Here is the question for you, dear reader: <em>are you aware of people doing this? My google skills have failed me so far &#8211; happy to learn about it in greater detail!</em></p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/idea/'>Idea</a>, <a href='http://webofdata.wordpress.com/category/linked-data/'>Linked Data</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/803/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/803/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=803&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/09/20/denormalising-graph-shaped-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
		<item>
		<title>Interactive analysis of large-scale datasets</title>
		<link>http://webofdata.wordpress.com/2012/09/02/large-scale-interactive-analysis/</link>
		<comments>http://webofdata.wordpress.com/2012/09/02/large-scale-interactive-analysis/#comments</comments>
		<pubDate>Sun, 02 Sep 2012 19:23:07 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[datastore]]></category>
		<category><![CDATA[Dremel]]></category>
		<category><![CDATA[Drill]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[interactive]]></category>
		<category><![CDATA[large-sc]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=790</guid>
		<description><![CDATA[The value of large-scale datasets &#8211; stemming from IoT sensors, end-user and business transactions, social networks, search engine logs, etc. &#8211; apparently lies in the patterns buried deep inside them. Being able to identify and analyze these patterns is vital, be it for detecting fraud, determining a new customer segment or predicting a trend. As &#8230; <a href="http://webofdata.wordpress.com/2012/09/02/large-scale-interactive-analysis/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=790&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>The value of large-scale datasets &#8211; stemming from IoT sensors, end-user and business transactions, social networks, search engine logs, etc. &#8211; apparently lies in the patterns buried deep inside them. Being able to identify and analyze these patterns is vital, be it for detecting fraud, determining a new customer segment or predicting a trend. As we&#8217;re moving from billions to trillions of records (or: from the terabyte to the peta- and exabyte scale), the more &#8216;traditional&#8217; methods, including MapReduce, seem to have reached the end of their capabilities. The question is: what now?</p>
<p>But a second issue has to be addressed as well: current large-scale data processing solutions work in batch mode (defined here, arbitrarily but in line with the state of the art, as any query that takes longer than 10 seconds to execute), whereas the need for <strong>interactive analysis</strong> is increasing. As a complement, <a href="http://en.wikipedia.org/wiki/Visual_analytics">visual analytics</a> may or may not be helpful, but it comes with its <a title="The Top 10 Challenges in Extreme-Scale  Visual Analytics (IEEE Computer Graphics and Applications)" href="http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2012/0812/W_CG_TheTop10Challenges.pdf">own set of challenges</a>.</p>
<p>Recently, a proposal for a new Apache Incubator group called <a href="http://wiki.apache.org/incubator/DrillProposal">Drill</a> has been made. This group aims at building a:</p>
<blockquote><p>&#8230; distributed system for interactive analysis of large-scale datasets [...] It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds.</p></blockquote>
<p>Drill&#8217;s design is inspired by Google&#8217;s <a href="http://research.google.com/pubs/pub36632.html">Dremel</a> and aims to efficiently process nested data (think: <a href="http://code.google.com/p/protobuf/">Protocol Buffers</a>). You can learn more about requirements and design considerations from Tomer Shiran&#8217;s <a href="http://wiki.apache.org/incubator/DrillProposal?action=AttachFile&amp;do=get&amp;target=Drill+slides.pdf">slide set</a>.</p>
<p>In order to better understand where Drill fits into the overall picture, have a look at the following (admittedly naïve) plot that tries to place it in relation to well-known and deployed data processing systems:</p>
<p><a href="http://webofdata.files.wordpress.com/2012/09/a-comparison-of-large-scale-data-processing-systems.png"><img class="aligncenter size-full wp-image-799" title="A comparison of large-scale data processing systems" src="http://webofdata.files.wordpress.com/2012/09/a-comparison-of-large-scale-data-processing-systems.png?w=750&#038;h=527" alt="" width="750" height="527" /></a></p>
<p>BTW, if you want to test-drive Dremel, you can do this already today; it&#8217;s an <a href="http://en.wikipedia.org/wiki/Cloud_computing#Infrastructure_as_a_service_.28IaaS.29">IaaS</a> service offered in Google&#8217;s cloud computing suite, called <a href="https://developers.google.com/bigquery/">BigQuery</a>.</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/790/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/790/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=790&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/09/02/large-scale-interactive-analysis/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>

		<media:content url="http://webofdata.files.wordpress.com/2012/09/a-comparison-of-large-scale-data-processing-systems.png" medium="image">
			<media:title type="html">A comparison of large-scale data processing systems</media:title>
		</media:content>
	</item>
		<item>
		<title>Why I luv JSON &#8230;</title>
		<link>http://webofdata.wordpress.com/2012/03/24/why-i-luv-json/</link>
		<comments>http://webofdata.wordpress.com/2012/03/24/why-i-luv-json/#comments</comments>
		<pubDate>Sat, 24 Mar 2012 09:43:47 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[backend]]></category>
		<category><![CDATA[data exchange]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datastore]]></category>
		<category><![CDATA[frontend]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[Web data]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=751</guid>
		<description><![CDATA[&#8230; because it&#8217;s simple, agnostic and an end-to-end solution. Wat? OK, let&#8217;s slow down a bit and go through the above keywords step by step. Simple Over 150 frameworks, libraries and tools directly support JSON in over 30 (!) languages. This might well be because the entire specification (incl. ToC, all the legal stuff and &#8230; <a href="http://webofdata.wordpress.com/2012/03/24/why-i-luv-json/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=751&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>&#8230; because it&#8217;s <em>simple</em>, <em>agnostic</em> and an <em>end-to-end</em> solution.</p>
<p><strong>Wat?</strong></p>
<p>OK, let&#8217;s slow down a bit and go through the above keywords step by step.</p>
<h3>Simple</h3>
<p>Over 150 frameworks, libraries and tools directly <a href="http://json.org/">support JSON</a> in over 30 (!) languages. This might well be because the <a href="http://tools.ietf.org/html/rfc4627">entire specification</a> (incl. ToC, all the legal stuff and contact information) is only 10 pages long, printed. Implementing support for JSON in any given language, that is, parsing/mapping to native objects/types, is very cheap and straightforward.</p>
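<p>To give a feel for what &#8216;mapping to native objects/types&#8217; looks like in practice, here is a minimal sketch using the <code>json</code> module from the Python standard library. The document and its field names are made up for illustration; they do not come from any real service.</p>

```python
import json

# Hypothetical JSON document; the field names are invented for this sketch.
text = (
    '{"name": "example", "servers": 10000, "ratio": 0.5, '
    '"tags": ["big data", "nosql"], "beta": true, "owner": null}'
)

# Parsing maps every JSON value onto a native Python type.
record = json.loads(text)
native_types = {key: type(value).__name__ for key, value in record.items()}
print(native_types)

# Serialising and parsing again yields an equal object.
round_tripped = json.loads(json.dumps(record))
print(round_tripped == record)
```

<p>No schema, no code generation and no configuration are involved, which is arguably a good part of why support for JSON is so widespread.</p>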
<h3>Agnostic</h3>
<p>Just as HTTP is agnostic to the payload &#8211; you can transfer HTML over HTTP but also any other kind of representation incl. binary stuff &#8211; with JSON you have something really agnostic at hand. Want to encode a key-value list? JSON can do it. Need to represent any given tree in JSON? <a href="http://www.w3.org/TR/rdf-sparql-json-res/">No problem</a>. A graph serialised in JSON? Of course that&#8217;s <a href="http://webofdata.wordpress.com/2012/02/05/json-http-data-links/">possible</a>! I suppose this flexibility makes JSON attractive for a lot of different people with a multitude of use cases in mind.</p>
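<p>The three shapes mentioned above can be sketched in a few lines of Python. The layouts below (nested <code>children</code> arrays for the tree, a node list plus an edge list for the graph) are just one possible encoding chosen for illustration, not a standard, and all names are hypothetical.</p>

```python
import json

# A key-value list maps directly onto a JSON object.
key_value = {"host": "example.org", "port": 8080}

# A tree: every node carries a label and a list of child nodes.
tree = {
    "label": "root",
    "children": [
        {"label": "a", "children": []},
        {"label": "b", "children": [{"label": "c", "children": []}]},
    ],
}

# A graph: a node list plus an edge list, so cycles are no problem.
graph = {
    "nodes": ["a", "b", "c"],
    "edges": [["a", "b"], ["b", "c"], ["c", "a"]],
}


def count_nodes(node):
    """Count the nodes of a tree encoded as nested 'children' lists."""
    return 1 + sum(count_nodes(child) for child in node["children"])


# All three shapes survive a serialise/parse round trip unchanged.
for shape in (key_value, tree, graph):
    assert json.loads(json.dumps(shape)) == shape

print(count_nodes(tree))
```

<p>The edges are written as two-element lists rather than tuples because JSON only knows arrays; a Python tuple would come back as a list after the round trip.</p>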
<h3>End-to-end</h3>
<p>What I mean by this is that JSON is available and used throughout, from <em>front</em>-end to <em>back</em>-end:</p>
<ul>
<li>Front-end examples: jQuery, Dojo, etc.</li>
<li>Back-end examples: MongoDB, CouchDB, <a href="http://www.elasticsearch.org/">elasticsearch</a>, <a href="http://nodejs.org/">Node.js</a>, etc.</li>
</ul>
<p>OK, I reckon it is time to say: &#8216;Thank you, <a href="http://www.crockford.com/">Doug</a>!&#8217; in case you haven&#8217;t done it today, yet <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/751/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/751/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=751&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/03/24/why-i-luv-json/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
		<item>
		<title>Hosted NoSQL</title>
		<link>http://webofdata.wordpress.com/2012/03/18/hosted-nosql/</link>
		<comments>http://webofdata.wordpress.com/2012/03/18/hosted-nosql/#comments</comments>
		<pubDate>Sun, 18 Mar 2012 07:26:50 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[DynamoDB]]></category>
		<category><![CDATA[GAE]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Joyent]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Neo4j]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[riak]]></category>
		<category><![CDATA[SimpleDB]]></category>
		<category><![CDATA[SPARQL]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=742</guid>
		<description><![CDATA[I admit I dunno how I got here in the first place &#8230; ah, right, yesterday was Paddy&#8217;s day and I was sitting at home with a sick child. Now, I tinkered around a bit with a hosted CouchDB solution to store/query JSON output from a side-project of mine. Then I thought: where are we &#8230; <a href="http://webofdata.wordpress.com/2012/03/18/hosted-nosql/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=742&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I admit I dunno how I got here in the first place &#8230; ah, right, yesterday was Paddy&#8217;s day and I was sitting at home with a sick child. Now, I tinkered around a bit with a hosted CouchDB solution to store/query JSON output from a <a href="https://github.com/mhausenblas/racoon">side-project</a> of mine. </p>
<p>Then I thought: where are we regarding hosted NoSQL in general? Seems others had that question <a href="http://www.quora.com/Is-there-a-sufficiently-large-market-opportunity-for-hosted-NoSQL-platforms-Cassandra-Riak-MongoDB-Redis-etc">as well</a>. So I sat down and here is a (naturally incomplete) list of so-called NoSQL datastores that are available &#8216;in the cloud&#8217;. Most of them come with an established <a href="http://en.wikipedia.org/wiki/Freemium">freemium</a> model, a few of them are in (public) beta. In terms of type (K/V, wide-column, doc, graph) we find pretty much everything, incl. proprietary types &#8211; like Amazon and Google have &#8211; where it&#8217;s sorta hard to tell what kind of beasts they are. Not that it matters, but for completeness <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>OK, nuff time wasted, here we go:</p>
<h3>Amazon&#8217;s hosted NoSQL datastores</h3>
<p>Both <a href="http://aws.amazon.com/simpledb/">SimpleDB</a> and <a href="http://aws.amazon.com/dynamodb/">DynamoDB</a> are sorta key-value stores where the latter seems to be for more serious business (scale out). They explain the <a href="http://aws.amazon.com/dynamodb/faqs/#How_does_Amazon_DynamoDB_differ_from_Amazon_SimpleDB_Which_should_I_use">difference between SimpleDB and DynamoDB</a> in detail. Pricing is in place, looks sensible. I have not tried any of these yet.</p>
<h3>Google&#8217;s hosted NoSQL datastores</h3>
<p>Tightly integrated with Google App Engine (GAE) comes the <a href="http://code.google.com/appengine/docs/python/datastore/">datastore</a> with its own query language. If you&#8217;re on GAE, this is what you get and what you have to use, anyways. And then, for a bit more than a year now, there has been <a href="https://developers.google.com/bigquery/">BigQuery</a>, which I&#8217;ve been <a href="http://code.google.com/p/bigquery-linkeddata/">toying around</a> with for a year or so. Very performant and powerful, but the pricing strategy is not the most obvious or clear.</p>
<h3>Joyent&#8217;s Riak</h3>
<p>Joyent offers a so-called <a href="http://www.joyentcloud.com/products/appliances/riak-smartmachine/">Riak Smartmachine</a>. I was <a href="http://webofdata.wordpress.com/2010/10/14/riak-for-linked-data/">toying around with Riak</a> a while ago but haven&#8217;t found time to test Joyent&#8217;s Riak offering (though I&#8217;m pleased with their <a href="https://no.de/">Node.js</a> offering, hence I assume a similar level of service, documentation, etc.).</p>
<h3>Cassandra in the cloud</h3>
<p>I only found <em>one</em> <a href="http://cassandra.io/">hosted Cassandra offering</a>. Can that be? Didn&#8217;t look closer. Anyone?</p>
<h3>CouchDB</h3>
<p>So, both <a href="http://cloudno.de/">cloudno.de</a> and <a href="http://cloudant.com">Cloudant</a> offer hosted CouchDB instances (the former also offers Redis). I am currently using the free plan (&#8216;Oxygen&#8217;) with Cloudant and find it very straightforward and easy to use. Pricing looks OK in both cases though I sometimes find it hard to pick the &#8216;best fit&#8217; for a given workload. Could anyone write an app please that does this for me? <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>MongoDB</h3>
<p>For MongoDB, too, I was able to spot two offerings: <a href="http://mongohq.com">MongoHQ</a> seems to be somewhat the established player in the field, with nice docs and sensible pricing. Apparently, Joyent is also offering a <a href="http://www.joyentcloud.com/products/appliances/mongodb-smartmachine/">MongoDB Smartmachine</a> &#8211; anyone tried it?</p>
<h3>Graph datastores</h3>
<p>There are quite a few offerings in this area: the general-purpose sort of graph data stores and the <a href="http://www.w3.org/RDF/">RDF</a>-focused ones. In the former category there is <a href="http://devcenter.heroku.com/articles/neo4j">Neo4j&#8217;s Heroku add-on</a>, which I had the pleasure to test-drive and found very usable and useful. And then there is an OrientDB-based offering called <a href="http://www.nuvolabase.com/site/">Nuvolabase</a>; I signed up and tried it out some weeks ago and I must say I really like it. Disclaimer: I know the <a href="http://www.linkedin.com/in/garulli">main person</a> behind OrientDB as we did a joint (research) project some years ago.</p>
<p>Last but not least: RDF-focused graph datastores in the cloud. I guess my absolute favourite still is <a href="http://dydra.com/">Dydra</a>, which I&#8217;ve been using both manually (SPARQL endpoint, curl, etc.) and programmatically, in <a href="https://github.com/mhausenblas/cloudisus/">applications</a>. I <em>think</em> they are still in beta and pricing is not yet announced. And then there is the good old <a href="http://www.talis.com/platform/">Talis Platform</a>, the established cloud-RDF-store for a couple of years now. Any plans known?</p>
<p>UPDATE (2013-05-22): This page has been translated into <a href="http://www.webhostinghub.com/support/es/misc/no-sql">Spanish</a> by Maria Ramos from <a href="http://www.webhostinghub.com/support/edu">Webhostinghub.com/support/edu</a>.</p>
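<p>For readers who have not talked to a SPARQL endpoint programmatically before, here is a rough sketch of how such a request can be put together with the Python standard library. The endpoint URL is a placeholder and not the address of Dydra or any other real service; the request is only built, not sent.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol query via HTTP GET."""
    # The SPARQL protocol passes the query string in the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request(ENDPOINT, query)
print(request.get_method(), request.full_url)
```

<p>Against a real endpoint, passing the request to <code>urllib.request.urlopen</code> and feeding the response body to <code>json.loads</code> would be the obvious next step; authentication, which hosted stores typically require, is left out here.</p>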
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/742/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/742/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=742&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/03/18/hosted-nosql/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
		<item>
		<title>Large-Scale Linked Data Processing: Cloud Computing to the Rescue?</title>
		<link>http://webofdata.wordpress.com/2012/03/01/large-scale-linked-data-processing/</link>
		<comments>http://webofdata.wordpress.com/2012/03/01/large-scale-linked-data-processing/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 13:14:22 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[FYI]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[challenges]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[requirements]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=729</guid>
		<description><![CDATA[At the upcoming 2nd International Conference on Cloud Computing and Services Science (CLOSER 2012) we &#8211; Robert Grossman, Andreas Harth, Philippe Cudré-Mauroux and myself &#8211; will present a paper with the title Large-Scale Linked Data Processing: Cloud Computing to the Rescue? and the following abstract: Processing large volumes of Linked Data requires sophisticated methods and &#8230; <a href="http://webofdata.wordpress.com/2012/03/01/large-scale-linked-data-processing/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=729&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>At the upcoming <em>2nd International Conference on Cloud Computing and Services Science</em> (<a href="http://closer.scitevents.org/">CLOSER</a> 2012) we &#8211; <a href="http://rgrossman.com/about/">Robert Grossman</a>, <a href="http://harth.org/andreas/">Andreas Harth</a>, <a href="http://people.csail.mit.edu/pcm/">Philippe Cudré-Mauroux</a> and myself &#8211; will present a paper with the title <strong>Large-Scale Linked Data Processing: Cloud Computing to the Rescue?</strong> and the following abstract:</p>
<blockquote><p>
Processing large volumes of Linked Data requires sophisticated methods and tools. In the recent years we have mainly focused on systems based on relational databases and bespoke systems for Linked Data processing. Cloud computing offerings such as SimpleDB or BigQuery, and cloud-enabled NoSQL systems including Cassandra or CouchDB as well as frameworks such as Hadoop offer appealing alternatives along with great promises concerning performance, scalability and elasticity. In this paper we state a number of Linked Data-specific requirements and review existing cloud computing offerings as well as NoSQL systems that may be used in a cloud computing setup, in terms of their applicability and usefulness for processing datasets on a large-scale.
</p></blockquote>
<p>A <a href='http://webofdata.files.wordpress.com/2012/03/closer12-processing-lod.pdf'>pre-print</a> is available now and if you have any suggestions please let me know.</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/fyi/'>FYI</a>, <a href='http://webofdata.wordpress.com/category/linked-data/'>Linked Data</a>, <a href='http://webofdata.wordpress.com/category/nosql/'>NoSQL</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/729/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/729/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=729&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/03/01/large-scale-linked-data-processing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
		<item>
		<title>Open Data &#8211; a virtual natural resource</title>
		<link>http://webofdata.wordpress.com/2012/01/30/open-data-virtual-natural-resource/</link>
		<comments>http://webofdata.wordpress.com/2012/01/30/open-data-virtual-natural-resource/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 09:13:54 +0000</pubDate>
		<dc:creator>mhausenblas</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Linked Open Data]]></category>
		<category><![CDATA[open data]]></category>

		<guid isPermaLink="false">http://webofdata.wordpress.com/?p=695</guid>
		<description><![CDATA[A virtual natural resource? Doesn&#8217;t make sense, does it? Let me explain. Natural resources are derived from the environment. Many of them are essential for our survival while others are used for satisfying our wants. &#8230; is what Wikipedia says about natural resources. Now, some 150 years ago a handful of people saw the potential &#8230; <a href="http://webofdata.wordpress.com/2012/01/30/open-data-virtual-natural-resource/">Continue reading <span class="meta-nav">&#187;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=695&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A virtual natural resource? Doesn&#8217;t make sense, does it?</p>
<p>Let me explain.</p>
<blockquote><p>Natural resources are derived from the environment. Many of them are essential for our survival while others are used for satisfying our wants.</p></blockquote>
<p>&#8230; is what Wikipedia <a href="http://en.wikipedia.org/wiki/Natural_resource">says</a> about natural resources. </p>
<p>Now, some <a href="http://en.wikipedia.org/wiki/Petroleum#History">150 years ago</a> a handful of people saw the potential of petroleum, which is nowadays the basis for a multi-billion dollar industry. Roughly the <a href="http://en.wikipedia.org/wiki/Electricity#History">same</a> holds for electricity. The crude resource itself is not of much interest and, FWIW, might even be dangerous to handle &#8211; ever dipped your fingers into crude oil? ever touched a power outlet with bare hands?</p>
<p>However, as I already <a href="http://webofdata.wordpress.com/2010/11/20/open-data-is-the-electricity-of-the-21st-century/">said</a> a while ago, the <strong>applications</strong> on top of the natural resources are valuable and our modern society couldn&#8217;t do without them.</p>
<p>Back to Wikipedia. The <a href="http://techantropology.blogspot.com/2009/11/anthropology-of-homo-digitalis-and-his.html">Homo Digitalis</a> lives in a digital environment, producing data almost as a side-product of the daily activities and depending on it as it drives the applications.</p>
<p>In this sense, yes, Open Data is <strong>the</strong> virtual natural resource #1 and here to stay. </p>
<p>When will you realise the <a href="http://www.hks.harvard.edu/presspol/publications/papers/discussion_papers/d70_kundra.html">potential</a> of (Linked) Open Data?</p>
<br />Filed under: <a href='http://webofdata.wordpress.com/category/big-data/'>Big Data</a>, <a href='http://webofdata.wordpress.com/category/cloud-computing/'>Cloud Computing</a>, <a href='http://webofdata.wordpress.com/category/linked-data/'>Linked Data</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/webofdata.wordpress.com/695/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/webofdata.wordpress.com/695/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=webofdata.wordpress.com&#038;blog=6169642&#038;post=695&#038;subd=webofdata&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://webofdata.wordpress.com/2012/01/30/open-data-virtual-natural-resource/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/5c3807aaaf0ffefe6c75e3dbbb8588b5?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">mhausenblas</media:title>
		</media:content>
	</item>
	</channel>
</rss>
