Can we read data in and out of Cassandra using the Spark connector in a non-blocking fashion? All the examples I see that make calls to Cassandra using the Spark connector look like blocking calls.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
It would be great to have non-blocking I/O, or even better a reactive driver interface, rather than creating and wrapping futures around every I/O call.
As far as multiple threads are concerned, we know that parking and unparking threads can become a bottleneck, to the point where Cassandra itself is planning to move from SEDA (staged event-driven architecture) to TPC (thread per core). Below are the respective tickets.
https://issues.apache.org/jira/browse/CASSANDRA-10989
https://issues.apache.org/jira/browse/CASSANDRA-10993
Thanks!
On Monday, August 22, 2016 at 3:14:45 PM UTC-7, Russell Spitzer wrote:
> You can always run the calls from another thread or use Scala Futures, but most users want blocking code because they want to fully utilize their cluster for one operation until the next starts. Could you give a larger explanation of what you are trying to do?
>
>
> On Mon, Aug 22, 2016 at 2:49 PM kant kodali <kant...@gmail.com> wrote:
> Can we read data in and out of Cassandra using the Spark connector in a non-blocking fashion? All the examples I see that make calls to Cassandra using the Spark connector look like blocking calls.
>
> --
> Russell Spitzer
> Software Engineer
>
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md
> http://spark-packages.org/package/datastax/spark-cassandra-connector
Hi, I don't have any code samples yet, but I can certainly let you know once I do; until then we can keep this discussion a bit more theoretical, if you don't mind. I use Java, and I am also new to Spark, though I know Hadoop well enough. I want to start by addressing the following question:
> Do you have any specific functions you want to have return Futures other than "saveToCassandra"?
Yes. Basically, for any I/O call to Cassandra it would be great to have a non-blocking version as well. In my practical experience with the performance of large-scale distributed systems, non-blocking I/O calls have given a significant performance boost. To keep it simple, say I need to make two I/O calls (to the network or to disk): call 1 and call 2. With a non-blocking interface I can issue both calls from a single thread without waiting for either to finish, whereas with a blocking interface I either have to wait for one to finish before issuing the other, or I have to spawn multiple threads, which at large scale is proven to be more expensive even with a thread pool. So I would avoid dismissing this as "just wrapping a Future around it".

The Cassandra driver has a complete non-blocking interface, where both reads and writes are done in a non-blocking fashion and return a ListenableFuture<ResultSet>, which is very useful. saveToCassandra is only one function; what about reads? Say I am performing ETL, where I read data from Cassandra, do some large-scale computation, and then store the result back in Cassandra. I would like to read rows in batches, issue multiple read requests, and start computing on whichever result comes in first. Again, I could do this with multiple threads, but that goes back to the point I made in the previous paragraph.

Another thing you mentioned was that there really isn't any local work. I wonder how Spark does MapReduce, then? Aren't the intermediate files written to the local disk, with the reduce side doing remote reads? I thought MapReduce was all about local writes and remote reads, no? In the Spark architecture, which components act as the map process and which act as the reduce process? It would be great if you could let me know. Thanks much!
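To make the two-calls point concrete, here is a minimal sketch using plain Scala Futures. The two read functions are hypothetical stand-ins for connector reads (no real Cassandra calls happen here); note that wrapping a blocking call in a Future still occupies a pool thread for its duration, so this trades threads for responsiveness rather than being truly non-blocking the way the driver's ListenableFuture is:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-ins for two independent I/O calls (e.g. two
// partition reads); each blocks for a different amount of time.
def readPartitionA(): Seq[Int] = { Thread.sleep(200); Seq(1, 2, 3) }
def readPartitionB(): Seq[Int] = { Thread.sleep(50); Seq(4, 5, 6) }

// Issue both calls without waiting for either: each Future runs on the
// execution context's pool, so the calling thread is free to continue.
val a = Future(readPartitionA())
val b = Future(readPartitionB())

// Start computing on whichever result arrives first...
val first = Await.result(Future.firstCompletedOf(Seq(a, b)), 5.seconds)

// ...and still collect both results once they complete.
val both = Await.result(a.zip(b), 5.seconds)
```

Here `first` is the faster read's result (partition B in this sketch), while `both` gathers the pair, so computation on the early result can overlap with the slower call.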
Hi,
What about using actors for async actions, like writing the output of microbatches to C*? That would let you maximize time spent writing while the next microbatch is being computed. The extra cost would be more memory?
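A minimal sketch of that actor idea, using a single-threaded executor as the "mailbox" so writes are serialized on one background thread while the caller goes on computing the next batch. The `write` function is a hypothetical stand-in for a real saveToCassandra call; the memory cost shows up as batches queued in the executor until the writer drains them:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

// Records each "written" batch; a real version would call
// saveToCassandra here instead.
val written = ListBuffer.empty[Seq[Int]]
def write(batch: Seq[Int]): Unit = written += batch

// Single-threaded executor: submissions run one at a time, in order,
// like messages processed by an actor's mailbox.
val writer = Executors.newSingleThreadExecutor()

def writeAsync(batch: Seq[Int]): Unit =
  writer.submit(new Runnable { def run(): Unit = write(batch) })

// The driver loop: enqueue the previous microbatch's write and
// immediately move on to "computing" the next one.
for (i <- 1 to 3) writeAsync(Seq(i, i + 1))

// Drain the mailbox before shutting down.
writer.shutdown()
writer.awaitTermination(5, TimeUnit.SECONDS)
```

Because the executor has one thread, batches are written in submission order with no locking needed in `write`, which is essentially the ordering guarantee an actor gives you.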
You have made some very good points, and thank you so much for sharing your knowledge as well as your thoughts. I am very excited to go through the links you pointed out; I just finished watching a few, with a few more to go!

kant