
Turning tabular data into entities

Two widely used data formats on the Web are CSV and JSON. To enable fine-grained access in a hypermedia-oriented fashion, I’ve started to work on Tride, a mapping language that takes one or more CSV files as input and produces a set of (connected) JSON documents.

In the two-minute demo video, I use two CSV files (people.csv and group.csv) as well as a mapping file (group-map.json) to produce a set of interconnected JSON documents.

So, the following mapping file:

{
  "input" : [
    { "name" : "people", "src" : "people.csv" },
    { "name" : "group", "src" : "group.csv" }
  ],
  "map" : {
    "people" : {
      "base" : "http://localhost:8000/people/",
      "output" : "../out/people/",
      "with" : {
        "fname" : "people.first-name",
        "lname" : "people.last-name",
        "member" : "link:people.group-id to:group.ID"
      }
    },
    "group" : {
      "base" : "http://localhost:8000/group/",
      "output" : "../out/group/",
      "with" : {
        "title" : "group.title",
        "homepage" : "group.homepage",
        "members" : "where:people.group-id=group.ID link:group.ID to:people.ID"
      }
    }
  }
}

… produces JSON documents representing groups. One concrete example output is shown below:
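
To make the mapping semantics a bit more tangible, here is a minimal Python sketch of how the two "map" entries might be interpreted. It is not Tride's actual implementation: the sample CSV rows, the `ID` column in people.csv, and the helper names `read_rows` and `build_entities` are all assumptions made for illustration. The sketch reads a `link:... to:...` rule as "emit the URL of the target entity" and a `where:` clause as a simple equality filter over the other table.

```python
import csv
import io
import json

# Hypothetical sample data. The column names follow the mapping file;
# the rows themselves are invented for this sketch.
PEOPLE_CSV = """ID,first-name,last-name,group-id
1,Jane,Doe,10
2,John,Roe,10
3,Mary,Major,20
"""

GROUP_CSV = """ID,title,homepage
10,Group A,http://example.org/a
20,Group B,http://example.org/b
"""

# The "base" values from the mapping file.
PEOPLE_BASE = "http://localhost:8000/people/"
GROUP_BASE = "http://localhost:8000/group/"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_entities(people_rows, group_rows):
    """Return (people_docs, group_docs), each mapping entity URL to a document."""
    people_docs = {}
    for p in people_rows:
        people_docs[PEOPLE_BASE + p["ID"]] = {
            "fname": p["first-name"],
            "lname": p["last-name"],
            # Assumed reading of "link:people.group-id to:group.ID"
            "member": GROUP_BASE + p["group-id"],
        }

    group_docs = {}
    for g in group_rows:
        group_docs[GROUP_BASE + g["ID"]] = {
            "title": g["title"],
            "homepage": g["homepage"],
            # Assumed reading of
            # "where:people.group-id=group.ID link:group.ID to:people.ID"
            "members": [
                PEOPLE_BASE + p["ID"]
                for p in people_rows
                if p["group-id"] == g["ID"]
            ],
        }
    return people_docs, group_docs


people_docs, group_docs = build_entities(read_rows(PEOPLE_CSV), read_rows(GROUP_CSV))
print(json.dumps(group_docs[GROUP_BASE + "10"], indent=2))
```

Keying each document by its URL mirrors the idea that every row becomes an addressable entity, with links between documents expressed as plain URLs rather than embedded copies.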


About mhausenblas

Chief Data Engineer EMEA @MapR #bigdata #hadoop #apachedrill

Discussion

6 thoughts on “Turning tabular data into entities”

  1. The inverse of this would be massively useful, and put massive amounts of data into the hands of common people.

    Especially if it was also a Google Spreadsheet module which allowed one to map data from any JSON source into a spreadsheet – that would *really* open up the data for consumption.

    Best and KUTGW :)

    Nathan

    Posted by webr3 | 2012-05-11, 08:44
    • Hey there – long time no see! Good to see you’re active again! Yes, I agree, the other direction is also very interesting (though, maybe rather challenging?), gotta look into that. Are you aware of any approaches out there? Anyways, thanks for the hint ;)

      Posted by mhausenblas | 2012-05-11, 08:49
  2. Hi Michael :) Well, you’ve got a good chunk of the approach here, as surely those maps can be inverted; then all you need are a few simple functions like function getData(url) { var result = UrlFetchApp.fetch(url); return Utilities.jsonParse(result.getContentText()); } to dereference data and give you objects to work with in script – technically it’s all very simple. I know Melvin is importing game data from game -> data.fm -> Google Spreadsheet using this approach.

    I guess the only real challenge, or bit I at least am unfamiliar with, would be making a UI or menu options so that it’s usable by the masses – that is all very well documented though, and the scripting lang used is just JS with a nice set of available runtime libraries (which can handle everything from parsing through to OAuth). I honestly believe it’s just a case of RTFM’ing any bits one isn’t comfortable with.

    Best!

    Posted by webr3 | 2012-05-11, 08:58
  3. Hi Michael,
    Nice work with Tride!

    This type of simple data mapping tool is incredibly useful. It’s generic so you don’t have to write the same mapping functions again and again. And on top of that, the business logic is not hard-coded in some obfuscated source code, but explicitly available – and editable – in mapping files.

    I have designed and developed a couple of similar data mapping systems for Yahoo! myself, which we are using for turning high-quality entity-oriented feeds into entity-relationship graphs. Over the years, I have redesigned them to make them work at scale, plugged in data normalization capabilities, and done my best to make them usable by non-technical people.

    Any plan towards these directions?

    Best!

    PS: Please ignore/suppress my previous reply. Same content, but wrong WordPress metadata attached to it…

    Posted by Nicolas Torzec | 2012-05-13, 23:21
    • Nicolas,

      Thanks a lot for your kind words, very encouraging! There are plenty of ideas (some of them noted in the To Do section of the README.md file over on GitHub) and others around extending it to the write case (see Mike’s suggestion) and in the inverse direction (JSON2CSV, see Nathan, above). What I wanted to try out was a JavaScript version and some more data manipulation functions. I’d be more than happy to join forces or exchange thoughts on this one, either via GitHub (issues, pull requests, etc.) or we can schedule a Skype/G+ chat to discuss this. Anyways, thanks a lot and looking forward to hearing from you again!

      Cheers,
      Michael

      Posted by mhausenblas | 2012-05-14, 09:39

