Hi,
On Thursday, 26 February 2015 16:23:59 UTC, Lishu Liu wrote:
> Yes, I saw that thread. Thanks Hugo. Do you mean I should build a Kafka cluster to read from Cassandra as stream?
>
No. I was suggesting looking at the Kafka consumer to figure out how to
deal with streams.
Not that I know much about streaming, but I figure that whenever you read from the Cassandra DB it will always be batch processing. I think what you want is to:
1. Read your daily data as a stream (Kafka or whatever you are using now)
2. Process the stream however you want, before it reaches Cassandra
3. Store the results in Cassandra
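To make the three steps concrete, here is a toy sketch in plain Python of that micro-batch shape: pull batches from a stream, transform each batch, then upsert into a store. No Kafka, Spark, or Cassandra involved; every name here (fake_stream, cassandra_store, the record format) is made up purely for illustration.

```python
def fake_stream():
    """Stand-in for a Kafka topic: yields micro-batches of raw records."""
    yield ["user=a clicks=3", "user=b clicks=5"]
    yield ["user=a clicks=2"]

def process(batch):
    """Step 2: parse/transform each record before it is stored."""
    out = []
    for rec in batch:
        user_part, clicks_part = rec.split()
        out.append({"user": user_part.split("=")[1],
                    "clicks": int(clicks_part.split("=")[1])})
    return out

cassandra_store = {}  # stand-in for a Cassandra table keyed by user

for batch in fake_stream():        # step 1: read the stream in micro-batches
    for row in process(batch):     # step 2: process the batch
        key = row["user"]          # step 3: upsert into the "table"
        cassandra_store[key] = cassandra_store.get(key, 0) + row["clicks"]

print(cassandra_store)
```

In a real job, steps 1 and 2 would be a Spark DStream transformation and step 3 a write via the Cassandra connector, but the per-batch structure is the same.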
Spark supports streaming via a StreamingContext (see [1]), so use that for your
stream processing. For a common use case, see [2]. You could get fancier and
think of multiple streams: maybe one for pre-processing data before it goes to Cassandra, and another for real-time statistics and monitoring.
If, however, you want the Cassandra DB itself to be the source of a stream, then page 37
of the slides in [3] is what you are looking for.
HTHs
HF
[1] https://spark.apache.org/streaming/
[2] http://www.slideshare.net/helenaedelson/streaming-bigdata-helenawebinarv3
[3] http://www.slideshare.net/helenaedelson?utm_campaign=profiletracking&utm_medium=sssite&utm_source=ssslideview