Dear Matei, Tathagata,

Could you provide an example, with an explanation, of sliding windows in Spark Streaming? I am confused about the relationship between batchInterval, windowDuration, and slideDuration.
Thanks so much for your explanation. I would like to ask more specifically:

1. About batchInterval. Given this pseudocode:

    inputStream = readStream("http://...", "2s")
    // start Block_1
    ones = inputStream.map(event => (event.url, 1))
    counts = ones.runningReduce((a, b) => a + b)
    // end Block_1

Does that mean Block_1 runs every 2 seconds (I had thought this case = while(true) { execute Block_1; sleep(2s); })? And would the lifecycle of the DStream inputStream look like the following?

    @t_0: execute Block_1 with inputStream = [RDD@t_0]
    @t_2: execute Block_1 with inputStream = [RDD@t_0, RDD@t_2]
    @t_4: execute Block_1 with inputStream = [RDD@t_0, RDD@t_2, RDD@t_4]

2. About the window method. Given this pseudocode:

    inputStream = readStream("http://...", "2s")
    windowStream = inputStream.window(Seconds(8), Seconds(4))
    // start Block_2
    ones = windowStream.map(event => (event.url, 1))
    counts = ones.runningReduce((a, b) => a + b)
    // end Block_2

Does that mean Block_2 runs every 4 seconds (I had thought this case = while(true) { execute Block_2; sleep(4s); })? And would the lifecycle of the DStream windowStream look like the following?

    @t_0 -> @t_7: no window completed, Block_2 does not execute.
    @t_8:  execute Block_2 with windowStream = [RDD@t_0->8]
    @t_12: execute Block_2 with windowStream = [RDD@t_0->8, RDD@t_4->12]
    @t_16: execute Block_2 with windowStream = [RDD@t_0->8, RDD@t_4->12, RDD@t_8->16]

3. About the realtime data store. I have read a paper about a realtime datastore named Druid (from Metamarkets, if I remember correctly). In short, Druid = streaming engine (Kafka + Storm) + ZooKeeper + realtime nodes + historical nodes (it is said that Druid queries 6 TB of in-memory data in 1.4 seconds). So the question is: if I want to build a realtime data store solution on the Spark platform (or BDAS), what is the recommended model?
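To make my question about windowDuration/slideDuration concrete, here is a rough sketch of my understanding in plain Scala (no Spark; `WindowSketch`, `batchTimes`, and `windowBatches` are hypothetical names I made up for illustration). The assumption being tested: with a 2 s batch interval, a window of duration 8 s sliding every 4 s would, at each slide time t, cover the batches whose end times fall in the interval (t - 8, t].

```scala
// A sketch of my understanding of window(Seconds(8), Seconds(4)) over a
// DStream with a 2s batch interval. This is NOT Spark's implementation,
// just a model of which batch RDDs I think each windowed RDD would contain.
object WindowSketch {
  val batchInterval  = 2 // seconds between batch RDDs
  val windowDuration = 8 // seconds of data per window
  val slideDuration  = 4 // seconds between successive windows

  // Times at which the input stream would produce a batch RDD, up to time t.
  def batchTimes(t: Int): Seq[Int] = 0 to t by batchInterval

  // Batches covered by the window ending at time t: those whose batch time
  // falls in the half-open interval (t - windowDuration, t].
  def windowBatches(t: Int): Seq[Int] =
    batchTimes(t).filter(b => b > t - windowDuration && b <= t)

  def main(args: Array[String]): Unit = {
    // A new windowed RDD would be produced every slideDuration seconds.
    for (t <- slideDuration to 16 by slideDuration)
      println(s"window ending at t=$t covers batches at: " +
        windowBatches(t).mkString(", "))
  }
}
```

Under this model, the window ending at t=8 covers the batches at t=2, 4, 6, 8 (i.e. 8 seconds of data), and the window ending at t=12 covers t=6, 8, 10, 12. Is that the right way to think about it, or does the first window really wait until t=8 before anything executes?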