A few problems when trying to use Crossdata


Borisa Zivkovic

Jul 18, 2015, 6:58:29 PM
to crossda...@googlegroups.com
Hi everyone, I downloaded the latest Crossdata Vagrant box from stratio.com and I am experimenting with it, just to see what it can do.

I ran into a few problems: inconsistent documentation, a strange problem with connectors, and being unable to install and use the MongoDB connector.

I am mainly interested in why the join does not work (even though README.TXT says it should) and why I cannot install and use the MongoDB connector.

Thanks

Here are the details:

1) The online documentation at http://docs.stratio.com/modules/crossdata/development/Sandbox.html is misleading: it refers to a connector that does not exist:

[vagrant@crossdata ~]$ service connector_deep start
connector_deep: unrecognized service


The README.TXT provided with the Vagrant demo is better, but I wasted some time figuring out what was wrong with the online documentation. It would be great if the two were in sync.



2) I started following README.TXT step by step, and the first join failed:

xdsh:root:test> SELECT id, age FROM table2;
[Driver] 18-07-2015 10:37:22.026 [INFO|Shell] Query e9c08066-1318-4461-8f34-677e3776d545 in progress

xdsh:root:test> [Driver] 18-07-2015 10:37:22.092 [INFO|BasicDriver$] Query e9c08066-1318-4461-8f34-677e3776d545 finished. 
Session Id: a66048da-572e-4795-8e1b-f4b874d80be3
Query: SELECT id, age FROM table2;
Execution time: 0 seconds



Result: QID: e9c08066-1318-4461-8f34-677e3776d545

Partial result: false
--------------
| id   | age | 
--------------
| 1001 | 42  | 
| 999  | 23  | 
| 1000 | 35  | 
--------------

Result page: 0
Removing results handler for: e9c08066-1318-4461-8f34-677e3776d545
xdsh:root:test> SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
[Driver] 18-07-2015 10:37:43.346 [INFO|Shell] Query 218fb159-55ec-41a0-ac47-189d3c27bc26 in progress

xdsh:root:test> 
Result: The operation for query 218fb159-55ec-41a0-ac47-189d3c27bc26 cannot be executed:
Cannot determine execution path as no connector supports INNER JOIN ([test.table1, test.table2]) ON [test.table1.id = test.table2.id]

xdsh:root:test> describe connectors
    » ;
[Driver] 18-07-2015 10:38:20.509 [INFO|Shell] 
Connector: connector.CassandraConnector ONLINE [cluster.cassandra_prod] [datastore.Cassandra] [akka.tcp://CrossdataSe...@10.0.2.15:35304/user/ConnectorActor/]
Connector: connector.SparkSQLConnector ONLINE [] [datastore.cassandra, datastore.hbase, datastore.hdfs] [akka.tcp://CrossdataSe...@10.0.2.15:35364/user/ConnectorActor/]


3) I could not get UPDATE working, probably because neither the Cassandra connector nor the SparkSQL connector supports update operations:

xdsh:root:test> SELECT id, age FROM table2;
[Driver] 18-07-2015 10:54:37.524 [INFO|Shell] Query b6f33d4e-1cc4-49e7-bcbc-cd7d56727c1c in progress

xdsh:root:test> [Driver] 18-07-2015 10:54:37.619 [INFO|BasicDriver$] Query b6f33d4e-1cc4-49e7-bcbc-cd7d56727c1c finished. 
Session Id: a66048da-572e-4795-8e1b-f4b874d80be3
Query: SELECT id, age FROM table2;
Execution time: 0 seconds



Result: QID: b6f33d4e-1cc4-49e7-bcbc-cd7d56727c1c

Partial result: false
--------------
| id   | age | 
--------------
| 1001 | 42  | 
| 999  | 23  | 
| 1000 | 35  | 
--------------

Result page: 0
Removing results handler for: b6f33d4e-1cc4-49e7-bcbc-cd7d56727c1c
xdsh:root:test> update table2 set age = age + 10;
[Driver] 18-07-2015 10:54:43.522 [INFO|Shell] Query f9e91452-d34e-4ebf-98b6-5e2a5f3d62f7 in progress

xdsh:root:test> 
Result: The operation for query f9e91452-d34e-4ebf-98b6-5e2a5f3d62f7 cannot be executed:
There is no any attached connector supporting: 
[UPDATE_NO_FILTERS]

xdsh:root:test> update table2 set age = age + 10 where age > 0;
[Driver] 18-07-2015 10:54:54.331 [INFO|Shell] Query 1959d75c-7988-4b0f-9260-7600c997f481 in progress

xdsh:root:test> 
Result: The operation for query 1959d75c-7988-4b0f-9260-7600c997f481 cannot be executed:
There is no any attached connector supporting: 
[UPDATE_NON_INDEXED_GT]

xdsh:root:test> create default index abcIdx on table2(age);
[Driver] 18-07-2015 10:56:11.531 [INFO|Shell] Query 8721c1a2-336b-4748-aed4-a50c419d6139 in progress

xdsh:root:test> 
Result: QID: 8721c1a2-336b-4748-aed4-a50c419d6139
INDEX created successfully
xdsh:root:test> update table2 set age = age + 10 where age > 0;
[Driver] 18-07-2015 10:56:16.828 [INFO|Shell] Query b28bbbce-75df-4a43-856e-f99de05e33bf in progress

xdsh:root:test> 
Result: The operation for query b28bbbce-75df-4a43-856e-f99de05e33bf cannot be executed:
There is no any attached connector supporting: 
[UPDATE_INDEXED_GT]
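The error messages above suggest that Crossdata chooses a connector by matching the operation a statement needs (for example `UPDATE_INDEXED_GT`) against the operations each attached connector declares. The following is a hypothetical model of that check, not Crossdata's planner code: the `UPDATE_*` names are copied from the log, while the helper names and the capability set given to CassandraConnector are assumptions (placeholders) chosen to reproduce the errors.

```python
# Hypothetical model of capability-based connector selection.
# UPDATE_* names come from the shell output; the capability set assigned
# to CassandraConnector below is an ASSUMPTION, not its real manifest.

def update_operation(has_where: bool, indexed: bool) -> str:
    """Name the operation an 'UPDATE ... SET ...' needs (only '>' filters modeled)."""
    if not has_where:
        return "UPDATE_NO_FILTERS"
    return "UPDATE_INDEXED_GT" if indexed else "UPDATE_NON_INDEXED_GT"


def pick_connector(operation: str, attached: dict) -> str:
    """Return the first attached connector that declares `operation`."""
    for name, supported in attached.items():
        if operation in supported:
            return name
    # Message text copied verbatim from the shell output above.
    raise LookupError(f"There is no any attached connector supporting: [{operation}]")


# Only CassandraConnector is attached to cassandra_prod in this session,
# and (by assumption) it declares no UPDATE_* operation.
attached = {"CassandraConnector": {"PROJECT", "FILTER_PK_EQ"}}

for has_where, indexed in [(False, False), (True, False), (True, True)]:
    op = update_operation(has_where, indexed)
    try:
        print(op, "->", pick_connector(op, attached))
    except LookupError as err:
        print(err)
```

Under this model, creating the index only changes which operation is requested (`UPDATE_NON_INDEXED_GT` becomes `UPDATE_INDEXED_GT`); the statement is still rejected unless some attached connector declares that operation.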

4) I think there is a problem with the registration of the SparkSQL connector on the Vagrant sandbox machine:

xdsh:root:test> describe connectors;
[Driver] 18-07-2015 11:03:34.215 [INFO|Shell] 
Connector: connector.CassandraConnector ONLINE [cluster.cassandra_prod] [datastore.Cassandra] [akka.tcp://CrossdataSe...@10.0.2.15:35304/user/ConnectorActor/]
Connector: connector.SparkSQLConnector ONLINE [] [datastore.cassandra, datastore.hbase, datastore.hdfs] [akka.tcp://CrossdataSe...@10.0.2.15:35364/user/ConnectorActor/]


xdsh:root:test> ATTACH CONNECTOR SparkSQLConnector TO cassandra_prod WITH OPTIONS {'DefaultLimit': '1000'};
[Driver] 18-07-2015 11:03:37.810 [INFO|Shell] Query 4ab0b5a4-43af-4b90-973d-6bdc46cee7a2 in progress

xdsh:root:test> ATTACH CONNECTOR CassandraConnector TO cassandra_prod WITH OPTIONS {'DefaultLimit': '1000'} AND PRIORITY=1;
[Driver] 18-07-2015 11:03:42.069 [INFO|Shell] Query 0cda60a7-3c34-4a34-985c-49a5e0e45b86 in progress

xdsh:root:test> 
Result: The operation for query 0cda60a7-3c34-4a34-985c-49a5e0e45b86 cannot be executed:
The connection to cassandra_prod already exists.

xdsh:root:test> describe system;
[Driver] 18-07-2015 11:05:16.386 [INFO|Shell] 
Datastore datastore.Cassandra:
Cluster cassandra_prod:
Connector CassandraConnector



5) I tried to build and install the MongoDB connector from GitHub (it would be great if a prebuilt one were available):


This file refers to a script that should be created during packaging but does not exist: target/stratio-connector-mongodb-[VERSION]/bin/stratio-connector-mongodb-core[VERSION] start

I managed to create the RPM and tried to install it on the running Crossdata Vagrant machine, but for some reason it does not install correctly:

[root@crossdata vagrant]# rpm -i stratio-connector-mongodb-0.5.0_SNAPSHOT.noarch.rpm 
cp: cannot stat `/opt/sds/stratio-connector-mongodb/template/MongoDBConnector': No such file or directory
warning: %post(stratio-connector-mongodb-0.5.0-0.1.20150718.222944.noarch) scriptlet failed, exit status 1


Some parts were installed, but apparently not everything. I tried to run:

[root@crossdata vagrant]# service stratio-connector-mongodb start
stratio-connector-mongodb: unrecognized service


It looks like the documentation is wrong: the service name is connector-mongodb:

[root@crossdata vagrant]#  service connector-mongodb start
env: /etc/init.d/connector-mongodb: Permission denied


After fixing the permissions, it looks like the script references a file that does not exist:

[root@crossdata init.d]#  service connector-mongodb start
/etc/init.d/connector-mongodb: line 1: /opt/sds/stratio-connector-mongodb/bin/stratio-connector-mongodb: No such file or directory
/etc/init.d/connector-mongodb: line 1: exec: /opt/sds/stratio-connector-mongodb/bin/stratio-connector-mongodb: cannot execute: No such file or directory
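The last two failures are different: "Permission denied" means the init script lacks the execute bit, while "No such file or directory" means the binary it execs was never put in place (plausibly because the `%post` scriptlet failed during `rpm -i`). A small diagnostic sketch that checks both conditions; the paths are taken from the log above, and the `diagnose` helper is hypothetical, not part of the connector package.

```python
import os


def diagnose(init_script: str, target_binary: str) -> list:
    """Return human-readable problems found for an init script and its target."""
    problems = []
    if not os.path.isfile(init_script):
        problems.append(f"{init_script}: missing")
    elif not os.access(init_script, os.X_OK):
        # Corresponds to 'env: ...: Permission denied' from `service ... start`.
        problems.append(f"{init_script}: not executable (try chmod +x)")
    if not os.path.isfile(target_binary):
        # Corresponds to 'exec: ...: cannot execute: No such file or directory'.
        problems.append(f"{target_binary}: missing (the RPM %post step may have failed)")
    return problems


if __name__ == "__main__":
    for problem in diagnose(
        "/etc/init.d/connector-mongodb",
        "/opt/sds/stratio-connector-mongodb/bin/stratio-connector-mongodb",
    ):
        print(problem)
```

Fixing the permissions clears only the first problem; the second one points back at the failed install step rather than at the service script.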


Juan Jose Lopez Martin

Jul 20, 2015, 2:44:28 AM
to crossda...@googlegroups.com
Hi,

I'll try to answer each of your points:

1. You're right. The online docs and README.TXT are out of sync. We will try to fix that ASAP.
2. The problem is that the SparkSQLConnector is not attached. Please run this query to attach it:
ATTACH CONNECTOR SparkSQLConnector to cassandra_prod WITH OPTIONS { 'DefaultLimit' : '1000' }
3. The UPDATE fails because it is not currently supported with a WHERE clause. We are working on it.
4. Maybe the problem is that the first ATTACH CONNECTOR succeeded but did not show any response.
5. Please open an issue on the stratio-connector-mongodb GitHub page. Another group is working on this connector, and they will be glad to resolve your use case.
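Points 2 and 4 can be pictured with a hypothetical model of connector attachment: a started connector appears in `describe connectors` with an empty cluster list (`[]`), a query can only be routed to connectors attached to the table's cluster, and attaching an already-attached pair is rejected. This is a sketch under assumptions, not Crossdata's code; in particular, treating SparkSQLConnector as the only join-capable connector is inferred from point 2, and the class and method names are invented for illustration.

```python
# Hypothetical model of ATTACH CONNECTOR semantics. The join_capable set is
# an ASSUMPTION inferred from the reply above, not Crossdata metadata.

class ConnectorRegistry:
    def __init__(self) -> None:
        self.clusters = {}  # connector name -> set of attached cluster names

    def register(self, connector: str) -> None:
        """A started connector is listed with no clusters, i.e. '[]'."""
        self.clusters.setdefault(connector, set())

    def attach(self, connector: str, cluster: str) -> None:
        """Mirror ATTACH CONNECTOR; re-attaching an existing pair is rejected."""
        attached = self.clusters[connector]
        if cluster in attached:
            raise ValueError(f"The connection to {cluster} already exists.")
        attached.add(cluster)

    def join_candidates(self, cluster: str, join_capable: set) -> list:
        """Connectors attached to `cluster` that can also run the join."""
        return sorted(c for c, attached in self.clusters.items()
                      if cluster in attached and c in join_capable)


registry = ConnectorRegistry()
for name in ("CassandraConnector", "SparkSQLConnector"):
    registry.register(name)
registry.attach("CassandraConnector", "cassandra_prod")

join_capable = {"SparkSQLConnector"}  # assumption, see lead-in
print(registry.join_candidates("cassandra_prod", join_capable))  # []

registry.attach("SparkSQLConnector", "cassandra_prod")
print(registry.join_candidates("cassandra_prod", join_capable))  # ['SparkSQLConnector']
```

An empty candidate list corresponds to the "Cannot determine execution path" error, and attaching CassandraConnector to cassandra_prod a second time raises the same "already exists" message seen in the log.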


Now that you are familiar with Crossdata, let me suggest that you download the latest versions of Crossdata, the Cassandra Connector, and the SparkSQL Connector from the Stratio GitHub (master branch). Some bugs have been fixed and new functionality has been implemented.

Thank you very much for your feedback, and we will try to resolve the problem with the doc.

Borisa Zivkovic

Jul 20, 2015, 5:46:04 AM
to crossda...@googlegroups.com
Thanks Juan,

I will contact the MongoDB team and will consider building Crossdata directly from git.

I tried the Crossdata sandbox again and this time the connectors registered successfully, but there are a few stability issues: the join does not work the first few times I try it, and then it does work. See below:



xdsh:root:test> SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
[Driver] 20-07-2015 09:42:25.706 [INFO|Shell] Query 4793883f-81cc-43ba-a97a-4b7a0b7f0daf in progress

xdsh:root:test> SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
[Driver] 20-07-2015 09:42:45.700 [INFO|Shell] Query 5d6196e0-5c93-4fcc-8448-7150ef4a9d61 in progress

xdsh:root:test> SELECT id, age FROM table2;
[Driver] 20-07-2015 09:43:07.077 [INFO|Shell] Query 708e1ff5-24e2-4ae3-9fcd-b84c8588fe03 in progress

xdsh:root:test> [Driver] 20-07-2015 09:43:07.601 [INFO|BasicDriver$] Query 708e1ff5-24e2-4ae3-9fcd-b84c8588fe03 finished. 
Session Id: 5c05a7b1-140c-4a88-839f-896e067df7f6
Query: SELECT id, age FROM table2;
Execution time: 0 seconds



Result: QID: 708e1ff5-24e2-4ae3-9fcd-b84c8588fe03

Partial result: false
--------------
| id   | age | 
--------------
| 999  | 23  | 
| 1000 | 35  | 
--------------

Result page: 0
Removing results handler for: 708e1ff5-24e2-4ae3-9fcd-b84c8588fe03
xdsh:root:test> SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
[Driver] 20-07-2015 09:43:11.833 [INFO|Shell] Query f6515789-4bf8-4390-abe7-030b2f8fde61 in progress

xdsh:root:test> select * from table1;
[Driver] 20-07-2015 09:43:27.243 [INFO|Shell] Query 1e836752-cec5-4987-8f50-355dae0c83c4 in progress

xdsh:root:test> [Driver] 20-07-2015 09:43:27.485 [INFO|BasicDriver$] Query 1e836752-cec5-4987-8f50-355dae0c83c4 finished. 
Session Id: 5c05a7b1-140c-4a88-839f-896e067df7f6
Query: select * from table1;
Execution time: 0 seconds



Result: QID: 1e836752-cec5-4987-8f50-355dae0c83c4

Partial result: false
------------------------------------------------------------
| id   | serial | name    | rating | email                 | 
------------------------------------------------------------
| 1001 | 34539  | John    | 9.3    | cros...@stratio.com   | 
| 999  | 54000  | Peter   | 8.9    | mye...@yahoo.com     | 
| 1000 | 71098  | Charles | 2.7    | con...@stratio.com   | 
------------------------------------------------------------

Result page: 0
Removing results handler for: 1e836752-cec5-4987-8f50-355dae0c83c4
xdsh:root:test> SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
[Driver] 20-07-2015 09:43:31.076 [INFO|Shell] Query bff9adfb-bb82-4e39-abe1-0aded60c938f in progress

xdsh:root:test> [Driver] 20-07-2015 09:43:33.952 [INFO|BasicDriver$] Query f6515789-4bf8-4390-abe7-030b2f8fde61 finished. 
Session Id: 5c05a7b1-140c-4a88-839f-896e067df7f6
Query: SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
Execution time: 22 seconds



Result: QID: f6515789-4bf8-4390-abe7-030b2f8fde61

Partial result: false
-----------------
| name    | age | 
-----------------
| Peter   | 23  | 
| Charles | 35  | 
-----------------

Result page: 0
Removing results handler for: f6515789-4bf8-4390-abe7-030b2f8fde61
xdsh:root:test> [Driver] 20-07-2015 09:43:45.057 [INFO|BasicDriver$] Query bff9adfb-bb82-4e39-abe1-0aded60c938f finished. 
Session Id: 5c05a7b1-140c-4a88-839f-896e067df7f6
Query: SELECT name, age FROM table1 INNER JOIN table2 ON table1.id=table2.id;
Execution time: 13 seconds



Result: QID: bff9adfb-bb82-4e39-abe1-0aded60c938f

Partial result: false
-----------------
| name    | age | 
-----------------
| Peter   | 23  | 
| Charles | 35  | 
-----------------

Result page: 0
Removing results handler for: bff9adfb-bb82-4e39-abe1-0aded60c938f
xdsh:root:test> 

Borisa Zivkovic

Jul 20, 2015, 5:52:06 AM
to crossda...@googlegroups.com
I created a GitHub issue for the MongoDB connector: https://github.com/Stratio/stratio-connector-mongodb/issues/1

I hope someone is watching it, since there are no other issues there and I could not find any other way to report problems.

Regards

Juan Jose Lopez Martin

Jul 20, 2015, 6:02:37 AM
to crossda...@googlegroups.com
I think the problem is related to the Spark cluster memory and the Vagrant sandbox memory (there is some kind of resource contention between the connector and Vagrant). If you end up trying Crossdata from the GitHub repository, I suppose you won't have this problem.


Regards.

Borisa Zivkovic

Jul 20, 2015, 6:31:16 AM
to crossda...@googlegroups.com
OK, I will try two things: increasing the Vagrant memory (10GB should be enough) and using Crossdata from GitHub.

Thanks a lot

Borisa Zivkovic

Jul 23, 2015, 2:34:50 AM
to Crossdata Users, jjl...@stratio.com
I tried increasing the available memory to 8GB.

Everything works OK except that joins are slow: join execution time starts at 31 seconds and never goes below 9 seconds.

I am using the data from /etc/sds/crossdata/README.txt, 6 rows in total.

I will also try the latest available code from GitHub.

Miguel Angel Fernandez

Jul 23, 2015, 1:50:03 PM
to Crossdata Users, jjl...@stratio.com, borisha....@gmail.com
Hi Borisa,

This long execution time is due to the default Cassandra configuration, which uses 256 virtual nodes. As a result, this query is divided into 2 small tasks, each of which is further split into around 200 tasks. Most of those 9 seconds are therefore spent sending data to, and collecting results from, these roughly 400 tiny tasks. For more information, you can take a look here:


where you can find the following information:

"Attention: DataStax Enterprise turns off virtual nodes (vnodes) by default. DataStax does not recommend turning on vnodes for DSE Hadoop or BYOH nodes. Before turning vnodes on for Hadoop, understand the implications of doing so. DataStax Enterprise does support turning on vnodes for Spark nodes."
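A back-of-envelope check of the figures above. The inputs (2 tasks, about 200 sub-tasks each, about 9 seconds total) come from this thread; treating the whole 9 seconds as per-task overhead is an assumption, so the per-task figure is an upper bound, and the helper name is hypothetical.

```python
# Back-of-envelope check of the task-count explanation.
# Inputs come from the thread; attributing all 9 s to per-task overhead
# is an ASSUMPTION, so the per-task figure is an upper bound.

def per_task_overhead(scans: int, tasks_per_scan: int, total_seconds: float):
    """Return (total task count, average seconds per task)."""
    tasks = scans * tasks_per_scan
    return tasks, total_seconds / tasks


tasks, seconds = per_task_overhead(scans=2, tasks_per_scan=200, total_seconds=9.0)
print(f"{tasks} tasks, about {seconds * 1000:.1f} ms each")  # 400 tasks, about 22.5 ms each
```

With only 6 rows of sample data, a per-task cost in the tens of milliseconds, multiplied by a few hundred tasks, is consistent with the explanation that the task count, rather than the data volume, dominates the join time.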

Anyway, we'll take it into account when we create the next Vagrant environment. Thanks for the feedback.

Miguel Angel,
Crossdata team