Incremental updates with Elephantdb-cascading?


Jeroen van Dijk

Apr 10, 2013, 3:17:20 AM
to elephan...@googlegroups.com
Hi,

I'm using Elephantdb to serve the results of my Cascalog queries. Currently these queries run infrequently and over the complete dataset, so it is OK that the new results replace the old ones. Now I want to run more frequent updates using only the new data. How would this work with Elephantdb-cascading? Is there such a thing as updates, or do I have to grab the old shards and combine them with the new data to create new shards?

Thanks,
Jeroen

Soren Macbeth

Apr 10, 2013, 3:29:14 AM
to elephan...@googlegroups.com
Correct.

You need to grab the old shards and combine them with your new data. Previous versions did have a notion of incremental updates, handled by the ElephantOutputFormat, but in most cases it turned out to be slower than just doing the update as a MapReduce job.
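To make the "combine old shards with new data" step concrete, here is a minimal sketch of the merge logic for a counts-style domain, written in plain Python rather than Cascalog. It is not the ElephantDB or Elephantdb-cascading API: `merge_counts` and `rebuild_shards` are hypothetical helpers, the old shard contents are assumed to be readable as (key, count) pairs, and the CRC32-modulo partitioner is an assumption standing in for whatever sharding scheme ElephantDB actually uses. The point it illustrates is that the whole key space is regenerated, not just the keys that changed.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def merge_counts(old_records: Iterable[Tuple[str, int]],
                 new_records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combine (key, count) pairs read from the old shards with new increments.

    Hypothetical helper: in a real job this would be the reduce side of a
    MapReduce/Cascalog query (group by key, sum); here it runs in memory.
    """
    merged: Counter = Counter()
    for key, count in old_records:
        merged[key] += count
    for key, count in new_records:
        merged[key] += count
    return dict(merged)


def rebuild_shards(merged: Dict[str, int], num_shards: int) -> List[Dict[str, int]]:
    """Write every merged key into a fresh set of shards.

    Assumption: CRC32 modulo num_shards is only a stand-in partitioner;
    ElephantDB's real sharding scheme may differ.
    """
    shards: List[Dict[str, int]] = [dict() for _ in range(num_shards)]
    for key, value in merged.items():
        idx = zlib.crc32(key.encode("utf-8")) % num_shards
        shards[idx][key] = value
    return shards


old = [("a", 3), ("b", 1)]   # contents of the previously served shards
new = [("b", 2), ("c", 5)]   # only the fresh data
merged = merge_counts(old, new)   # {"a": 3, "b": 3, "c": 5}
shards = rebuild_shards(merged, 4)
```

For domains where a new value should overwrite the old one instead of being added to it, the merge step would keep the latest record per key rather than summing.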


--
http://about.me/soren

Jeroen van Dijk

Apr 10, 2013, 3:42:33 AM
to elephan...@googlegroups.com
Thanks for the quick confirmation! I think for most of my domains it would not be slower, since the datasets are small (counts). There are a few domains with dense data where it might be slower.

I'll experiment and report back.  

On Wednesday, April 10, 2013 09:29:14 UTC+2, Soren Macbeth wrote: