Thanks, Miguel. Do you have any benchmark suite for testing concurrency? That would help me run it in our setup. I'm thinking of using Crossdata as the engine for "Data as a Service", where people can query all types of data. The sources of data could be:
1. Druid - for real-time event data and event analytics. Access should be native, through the PlyQL JDBC interface to Druid. Do we need to create a connector for this? How do I enable a JDBC connector in Crossdata for native access?
2. HBase or ScyllaDB (a Cassandra drop-in) - as a fast-access store. Use native mode only for point and range queries on one month of raw events, as well as on the dimension and fact entities derived/prepared from the batch pipeline (Spark/Hadoop). If anyone wants to run join queries, we can create a view in Crossdata, and that should run through Spark.
3. Redshift / Hive - for OLAP queries on dimensions and facts. We may need to write some connectors for this. Any idea how to enable JDBC-based native access to Redshift? Do I need to write a connector for Redshift? Hive queries, I know, we can integrate easily.
4. Cross-data-source queries - create views in Crossdata; for example, I want to join data between Druid + ScyllaDB + Redshift. This should probably be handled through Spark.
5. Interactive ad-hoc analysis of the deep history available on S3 + Hive - we can connect through Crossdata for this, and Spark will handle it.
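On the JDBC question in point 1: PlyQL can expose a MySQL-compatible gateway (an experimental feature, started with something like `plyql -h <broker> --experimental-mysql-gateway 3307`), so any MySQL JDBC driver on the classpath can talk to Druid through it. A minimal sketch, assuming a hypothetical broker host and the default-style port 3307 (the host name, port, table name, and query are placeholders, not values from any real setup):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlyqlJdbcSketch {

    // Builds a JDBC URL for PlyQL's MySQL-compatible gateway.
    // The gateway speaks the MySQL wire protocol, so the standard
    // MySQL Connector/J URL format applies.
    static String jdbcUrl(String host, int port) {
        return "jdbc:mysql://" + host + ":" + port + "/";
    }

    // Sketch of querying through the gateway; requires a running
    // PlyQL gateway and a MySQL JDBC driver on the classpath.
    // "wikipedia" here is just an illustrative Druid data source.
    static void topPages(String url) throws Exception {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) FROM wikipedia GROUP BY page LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical host; only prints the URL here, since connecting
        // needs a live gateway.
        System.out.println(jdbcUrl("druid-broker.internal", 3307));
    }
}
```

Whether Crossdata can register such a JDBC endpoint directly, or needs a thin connector wrapping it, is exactly the question above.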
Another thing I want to ask: in the Spark cluster we are using Tachyon, so we store the intermediate state of processed data in Tachyon with tiered storage (memory -> SSD -> HDD) enabled. Our objective is to avoid HDFS when reading data from S3, and instead use the in-memory file system Tachyon for fast access. Do you see any challenges with this setup?
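For reference, this is roughly the configuration shape we have in mind for the tiers. These are 0.8-era Tachyon property names (Alluxio later renamed them), and the paths, quotas, and bucket name are placeholders, so treat it as a sketch rather than our exact config:

```
# conf/tachyon-site.properties (property names per Tachyon 0.8 docs;
# verify against your version)
tachyon.worker.tieredstore.level.max=3
tachyon.worker.tieredstore.level0.alias=MEM
tachyon.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
tachyon.worker.tieredstore.level0.dirs.quota=16GB
tachyon.worker.tieredstore.level1.alias=SSD
tachyon.worker.tieredstore.level1.dirs.path=/mnt/ssd
tachyon.worker.tieredstore.level1.dirs.quota=200GB
tachyon.worker.tieredstore.level2.alias=HDD
tachyon.worker.tieredstore.level2.dirs.path=/mnt/hdd
tachyon.worker.tieredstore.level2.dirs.quota=1TB

# S3 as the under filesystem, so Tachyon caches S3 data with no HDFS in the path
tachyon.underfs.address=s3n://my-bucket/tachyon
```

Spark jobs would then read through Tachyon with `tachyon://<master>:19998/...` paths, with the tachyon-client jar on the executor classpath.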
Let me know your thoughts.
Regards,
-Sambit