Streaming API client

Achim Domma

Jan 8, 2015, 1:04:49 PM
to helio...@googlegroups.com
Hi,

I'm trying to better understand the new features of HS/Solr and was reading this post: http://heliosearch.org/streaming-aggregation-for-solrcloud/

I don't get what's going on under the hood in that case. What exactly is the search result that is streamed? Suppose I specify a query involving facets: will those be streamed too?

But the most important question: it looks like the communication is still HTTP. Is this bound to the Java SolrStream class? Or should it be possible to implement such a client (with reasonable effort) in a different language? To be honest: I don't like Java, and it would be the last language I would choose for doing data analysis. ;-)

cheers,
Achim

Joel Bernstein

Jan 8, 2015, 2:34:52 PM
to helio...@googlegroups.com
The end results of the Streaming API are Tuples and Metrics. Tuples are the records, and Metrics provide analytics for the stream. Facets generated by Heliosearch are not yet included in the stream, so for now you would need to generate Metrics on the stream for analytics.

Under the covers, the various stream implementations call out to Heliosearch SolrCloud collections and retrieve JSON records as streams. Those streams can then be transformed by merging, joining, intersecting, grouping, etc. This lays the foundation for distributed aggregation and other workflows over search results.
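
To give a feel for the merge step outside of Java, here is a minimal Python sketch. It is not the Heliosearch code: it assumes each shard stream yields tuples as dicts that are already sorted on the same field, and the names `merge_streams` and `id` are hypothetical.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]  # one tuple from a stream, as decoded JSON


def merge_streams(streams: Iterable[Iterable[Record]], key: str) -> Iterator[Record]:
    """Lazily merge streams that are each already sorted on `key`.

    Only one pending record per input stream is held in memory, so the
    inputs can be arbitrarily long.
    """
    return heapq.merge(*streams, key=lambda record: record[key])


# Two hypothetical shard results, each sorted on "id".
shard_a = [{"id": 1, "name": "a"}, {"id": 4, "name": "d"}]
shard_b = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

merged = list(merge_streams([iter(shard_a), iter(shard_b)], key="id"))
```

Because the merge is lazy, the same function would work on generators that read records off an HTTP response as they arrive.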

If you don't like Java, you can port the Java implementations to another language. For a class like CloudSolrStream, this would involve calling out to ZooKeeper and working with the SolrCloud state information to query the nodes in SolrCloud collections. Classes like MergeJoinStream and HashJoinStream should be fairly straightforward to port.
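
As a rough idea of how small such a port could be, here is a generic hash join over two tuple streams in Python. This is the textbook technique, not a translation of HashJoinStream, whose actual behavior may differ; the function name `hash_join` and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]  # one tuple from a stream, as decoded JSON


def hash_join(left: Iterable[Record], right: Iterable[Record], key: str) -> Iterator[Record]:
    """Inner-join two streams of records on `key`.

    The right stream is read fully into an in-memory index, then the left
    stream is processed one record at a time.
    """
    index: Dict[Any, List[Record]] = defaultdict(list)
    for right_record in right:
        index[right_record[key]].append(right_record)

    for left_record in left:
        for right_record in index.get(left_record[key], []):
            joined = dict(right_record)
            joined.update(left_record)  # left-hand fields win on a name clash
            yield joined


# Hypothetical inputs: only id 2 appears on both sides.
left_stream = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right_stream = [{"id": 2, "b": "z"}, {"id": 3, "b": "w"}]

joined_records = list(hash_join(left_stream, right_stream, key="id"))
```

The smaller stream should go on the right, since that side is held in memory.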