Best way to use Tessera in a project with a custom GUI


Jeremiah Rounds

Sep 24, 2015, 5:24:28 PM
to rhipe
Hi,

I have seen this problem before and I am wondering what the thoughts are about it nowadays.


The scenario is this: you are analyzing data on HDFS with Tessera (Rhipe/datadr), and other scientists on the project want to interact with Tessera's outputs through the JavaScript visualization ecosystem.

For example, a stack might look like:
  1. GUI D3.js
  2. Angular
  3. < Blank >
  4. HDFS Tessera Stack Key/Value Pairs in Map Files


Suppose in 4 the Map files are keyed so that an R user can rapidly look up any data they need, but that is a process designed to be done from R, and it causes a difficulty if someone else wants to use it. Essentially, in layer 1 they want to present specific key/value pair data without starting R. Is that possible? I remember Saptarshi used to talk about using HBase at Mozilla. I have also heard of just pushing data out to MySQL as needed. Is there a better way to approach this problem?


Jeremiah Rounds

Sep 24, 2015, 6:27:03 PM
to rhipe
Going to see if Saptarshi's old HBase material still works.

Saptarshi Guha

Sep 25, 2015, 6:43:05 PM
to rh...@googlegroups.com
Hi
If you look at my GitHub repo, there is some Java code for using HBase
as an InputFormat and an OutputFormat.


https://github.com/saptarshiguha/RhipeHbaseMozilla

Regarding this

"Suppose in 4 the Map files are keyed so that an R user can rapidly
look up any data they need, but that is a process designed to be done
from R, and it causes a difficulty if someone else wants to use it."

It's not really done from R. The MapFile format stores its keys in
sorted order, and a small fraction of them is kept aside as an index.
When a user seeks a key K, Hadoop has an API that will tell you which
sequence file (and which block) contains it, and once the file is
found the rest is a sequential search.
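
To make that lookup concrete, here is a minimal Python sketch of the idea: sorted keys, a sparse index, then a sequential scan. It is an in-memory toy, not the Hadoop MapFile API; the names `build_index`, `lookup`, and `INDEX_INTERVAL` are made up for illustration, and the interval of 4 is an arbitrary assumption.

```python
import bisect

# Hypothetical setting: every 4th key is copied into the sparse index.
INDEX_INTERVAL = 4


def build_index(records, interval=INDEX_INTERVAL):
    """Return [(key, position)] for every interval-th record.

    `records` must be a list of (key, value) pairs sorted by key.
    """
    return [(records[i][0], i) for i in range(0, len(records), interval)]


def lookup(records, index, key):
    """Find `key` using the sparse index, then scan forward sequentially."""
    index_keys = [k for k, _ in index]
    # Last index entry whose key is <= the requested key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    start = index[slot][1]
    for k, v in records[start:]:
        if k == key:
            return v
        if k > key:
            break  # keys are sorted, so the key cannot appear later
    return None


# Toy data standing in for the sorted key/value pairs of one file.
records = [(f"key{i:03d}", {"n": i}) for i in range(20)]
index = build_index(records)
```

Only the small index needs to be searched in memory; the scan touches at most one interval's worth of records, which is why a lookup stays cheap even when the file itself is large.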

You can write wrappers around it, or use Thrift to talk to HBase. How
would a JS GUI talk directly to something in the Hadoop ecosystem?

Ryan

Sep 28, 2015, 11:01:46 AM
to rhipe
One option is to push the data out to something the JS app can talk to directly, although if your data are big enough to be in HDFS, this probably isn't an option.

Another option is to set up a REST service that sends data back to the app when requested by key. You can do this with OpenCPU (https://www.opencpu.org). A box with OpenCPU, RHIPE, and a connection to the cluster is all you need. The JS app would either need to know how to construct a key for the data it wants (suppose keys are simply strings) or it would need to know the entire collection of keys. This is how Trelliscope works: the data are in a MapFile and the viewer knows all the keys, so it can look up the data for the subsets being plotted.
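
As a rough illustration of the "request by key, get data back" shape of such a service, here is a minimal Python sketch using only the standard library. It is not OpenCPU and not RHIPE: the `/kv?key=...` route, the `STORE` dict, and the function names are all hypothetical, and a real service would fetch the value from the MapFile (or HBase) instead of a dict.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for the key/value data; keys are plain strings.
STORE = {"site-001": {"mean": 3.2, "n": 120}}


def lookup_response(path, store):
    """Map a request path like /kv?key=site-001 to (status, JSON body)."""
    parsed = urlparse(path)
    keys = parse_qs(parsed.query).get("key")
    if parsed.path != "/kv" or not keys:
        return 400, json.dumps({"error": "expected /kv?key=<key>"})
    key = keys[0]
    if key not in store:
        return 404, json.dumps({"error": "unknown key", "key": key})
    return 200, json.dumps({"key": key, "value": store[key]})


class KeyValueHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the JSON produced by lookup_response."""

    def do_GET(self):
        status, body = lookup_response(self.path, STORE)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Block and serve requests until interrupted (call this to run)."""
    HTTPServer((host, port), KeyValueHandler).serve_forever()
```

Calling `serve()` and requesting `/kv?key=site-001` from the JS app would return the value as JSON. Keeping the lookup in a plain function separate from the handler means the key-to-value logic can be swapped for a real backend without touching the HTTP code.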




Jeremiah Rounds

Sep 28, 2015, 5:54:24 PM
to rhipe
Thanks.

I scanned some solutions for serving R over HTTP (not sure REST was in my vocabulary). While I was investigating the issue, I used Rook to write a super hacked version of what OpenCPU does. That is, I have R running in a screen session with Rhipe loaded and rhinit()'d, listening for HTTP GET requests. When it receives a well-formed GET request, it uses datadr/ddo to get the key/value pair and responds with the queried value as JSON. After I put that up, I thought I would ping the world and ask: what is the "right" way to do this?