|Post Processing of Indexed Data with Surfiki Refine||AnthonyNystrom||5/9/13 11:42 PM|
Hi everyone, we are avid users of Elasticsearch, both for ourselves and for our clients (http://www.intridea.com). In fact, we use Elasticsearch in one of our products, Surfiki. More on that here: http://surfiki.com/about
However, that isn't why I am writing.
After using Elasticsearch for quite some time, we found the need to do post and inline processing on our data. Some examples are:
1. Combining data from multiple indices into a new index, either ad hoc or on a scheduled basis.
1a. Splitting data from a single index into new indices based upon some criteria.
2. Statistical facets had been causing us some heap issues, and we found that an easier solution was to process/transform/count this data on a set schedule and expose the results in a new index.
3. Transforming data already indexed with additional data. For example, some of our data has location information (lat, long), but we also wanted to store metadata associated with those locations, such as state, city, and county information. While it makes sense to do some of this inline, it seemed to make more sense to do it after the initial data was already indexed.
4. Accessing third-party APIs to append data to existing indexed data.
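As a rough illustration of items 1a and 3, here is a minimal sketch that works on plain Python dicts rather than a live cluster. The field names (`location`, `state`), the `events-` index prefix, the `lookup_state` helper, and the bounding-box table are all hypothetical stand-ins; this is not Surfiki Refine code, and a real job would read from and write to Elasticsearch.

```python
from collections import defaultdict

# Hypothetical lookup table: state -> (min_lat, max_lat, min_lon, max_lon).
# A real job would call a geocoding service or use proper shape data.
STATE_BOXES = {
    "CO": (37.0, 41.0, -109.05, -102.05),
    "WY": (41.0, 45.0, -111.05, -104.05),
}


def lookup_state(lat, lon):
    """Return the state whose bounding box contains the point, else None."""
    for state, (lat_lo, lat_hi, lon_lo, lon_hi) in STATE_BOXES.items():
        if lat_lo <= lat < lat_hi and lon_lo <= lon < lon_hi:
            return state
    return None


def enrich(doc):
    """Return a copy of doc with a 'state' field derived from its location."""
    lat, lon = doc["location"]["lat"], doc["location"]["lon"]
    return {**doc, "state": lookup_state(lat, lon)}


def split_by(docs, key):
    """Group documents under target index names based on a field value."""
    targets = defaultdict(list)
    for doc in docs:
        value = doc.get(key) or "unknown"
        targets[f"events-{str(value).lower()}"].append(doc)
    return dict(targets)


docs = [
    {"id": 1, "location": {"lat": 39.74, "lon": -104.99}},
    {"id": 2, "location": {"lat": 42.87, "lon": -106.31}},
    {"id": 3, "location": {"lat": 10.0, "lon": 10.0}},
]
routed = split_by([enrich(d) for d in docs], "state")
```

Each key in `routed` names a target index and each value is the list of enriched documents that would be written to it. Documents that match no box land under `events-unknown` instead of being dropped.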
For this we decided to create a product (Surfiki Refine) that combines several open source tools with new code, the result being a Python-based map-reduce tier. It met all of our needs for additional processing, and more. We are going to release this open source tool in the coming week.
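To give a feel for what a map-reduce job of this kind might look like, here is a small in-memory sketch that precomputes the sort of per-key statistics a statistical facet would return. The function names (`map_doc`, `reduce_values`, `run_job`) and the `category` and `price` fields are assumptions made for illustration, not the Surfiki Refine API.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["category"], doc["price"]


def reduce_values(key, values):
    """Reduce step: collapse all values for a key into summary statistics."""
    values = list(values)
    return {
        "category": key,
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def run_job(docs):
    """Group mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return [reduce_values(k, v) for k, v in sorted(groups.items())]


docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "music", "price": 5.0},
]
summary = run_job(docs)
```

Each entry in `summary` could then be indexed as its own document in a separate stats index on a schedule, so queries read precomputed numbers instead of running the facet over the raw data.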
My question is this:
1. For people who expose their indices publicly via API, is a hosted version of this platform of interest to you? Does hosting it at a minimal cost make sense, in order to have a platform ready and running for use against your public API? We are considering a hosted version as a viable commercial offering. We also have a distributed version that can tackle heftier data transforms and manipulation.
Either way, it will be released, and you guys/gals can play with it and see if it meets any needs or resolves any issues you encounter when you need to manipulate data that is already indexed.
I wanted to give a heads up and get your opinion on the above.
Attached are some screenshots of the tool/platform.
Some nice features are:
Completely browser based
Built exclusively to work with Elasticsearch
Browser based code editing
Job creation and management
Fellow, Managing Director of Engineering
Intridea, Inc. | www.intridea.com
(o) 888.968.4332 x502