Problem with importing csv using Copy


Bharani Subramaniam

<bharani.sub@gmail.com>
Nov 3, 2015, 1:46:08 AM
to ScyllaDB users
Hi,

I am trying to import CSV data into ScyllaDB using the COPY command through the cqlsh shell. I run into this error:


<ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input...
">
Aborting import at record #1. Previously inserted records are still present, and some records after that may be present as well.


Details

- The table I am trying to insert into has 5 columns, 3 of which form the composite primary key. I am not sure the error is due to the data - it is a clean and sorted file

- I am using Fedora 22 64-bit with ScyllaDB installed from the RPMs (scylla-server-0.10-20151012.2ed34b0.fc22.x86_64.rpm)


Let me know if you need sample data to reproduce it.

Thanks
Bharani


Avi Kivity

<avi@scylladb.com>
Nov 3, 2015, 2:02:23 AM
to scylladb-users@googlegroups.com
Please rerun the COPY command in cqlsh, but first add the --debug option
to cqlsh. This will print the query, and we can see if it hit something
that Scylla does not yet support.

Looking at the code, it is a simple INSERT INTO statement, but maybe I'm
missing something.

Bharani Subramaniam

<bharani.sub@gmail.com>
Nov 3, 2015, 2:53:28 AM
to ScyllaDB users
Here is the table:
CREATE TABLE demo.ctest (
    a int,
    b int,
    c int,
    d int,
    e float,
    PRIMARY KEY (a, b, c)
) WITH CLUSTERING ORDER BY (b ASC, c ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL","rows_per_partition":"ALL"}'
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


Here is the output of cqlsh with --debug:

[bharani@localhost ~]$ cqlsh --debug
Using CQL driver: <module 'cassandra' from '/usr/share/scylla/cassandra/lib/cassandra-driver-internal-only-2.6.0c2.post.zip/cassandra-driver-2.6.0c2.post/cassandra/__init__.py'>
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use demo;
cqlsh:demo> truncate ctest;
cqlsh:demo> select * from ctest;

 a | b | c | d | e
---+---+---+---+---

(0 rows)
cqlsh:demo> copy ctest(a,b,c,d,e) from '/home/bharani/Downloads/s3.csv';
None

<ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input...
">
Aborting import at record #1. Previously inserted records are still present, and some records after that may be present as well.

1200 rows imported in 0.243 seconds.

I am also sending you the data so that you can check on your end. Let me know if you need more information.

Thanks
Bharani
s3.csv.gz

Pekka Enberg

<penberg@scylladb.com>
Nov 3, 2015, 3:10:01 AM
to ScyllaDB users, Bharani Subramaniam
On Tue, Nov 3, 2015 at 9:53 AM, Bharani Subramaniam
<bhara...@gmail.com> wrote:
> I am also sending you the data so that you can check in your end. Let me
> know if you need more information

It's not a Scylla issue, you'll see the same behavior with C*:

cqlsh> copy demo.ctest(a,b,c,d,e) from '/home/penberg/Downloads/s3.csv';

<ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:56
no viable alternative at input ')' (... (a,b,c,d,[e]))">
Aborting import at record #1. Previously inserted records are still
present, and some records after that may be present as well.

1200 rows imported in 0.470 seconds.

AFAICT, the problem is the CSV header row. After deleting the header
row, Scylla imports the dataset fine:

cqlsh> copy demo.ctest(a,b,c,d,e) from '/home/penberg/Downloads/s3.csv';
Processed 10000 rows; Write: 60114.60 rows/s
19999 rows imported in 0.570 seconds.

- Pekka

Avi Kivity

<avi@scylladb.com>
Nov 3, 2015, 3:51:02 AM
to scylladb-users@googlegroups.com, Bharani Subramaniam


On 11/03/2015 10:10 AM, Pekka Enberg wrote:
> On Tue, Nov 3, 2015 at 9:53 AM, Bharani Subramaniam
> <bhara...@gmail.com> wrote:
>> I am also sending you the data so that you can check in your end. Let me
>> know if you need more information
> It's not a Scylla issue, you'll see the same behavior with C*:
>
> cqlsh> copy demo.ctest(a,b,c,d,e) from '/home/penberg/Downloads/s3.csv';
>
> <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:56
> no viable alternative at input ')' (... (a,b,c,d,[e]))">

Well, C* gave a nicer error here.

Bharani Subramaniam

<bharani.sub@gmail.com>
Nov 3, 2015, 4:07:04 AM
to ScyllaDB users, bharani.sub@gmail.com
Never thought the header would be the problem. Thanks, I will now return to playing around with ScyllaDB, and thanks for taking the effort to build this.

ishita@excellenceinfonet.com

<ishita@excellenceinfonet.com>
Nov 27, 2015, 5:33:28 AM
to ScyllaDB users
Yes, removing the header row solves the issue.

Pekka Enberg

<penberg@scylladb.com>
Jan 17, 2016, 6:02:38 AM
to ScyllaDB users, lorina@datastax.com, ishita@excellenceinfonet.com, Bharani Subramaniam
Hi Lorina,

On Sat, Jan 16, 2016 at 12:55 AM, <lor...@datastax.com> wrote:
> In CQL, using the WITH HEADER=TRUE is required to use a HEADER. See:
> http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html

Right, I had missed that cqlsh feature, thanks!

- Pekka
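
For reference, COPY can also be told that the file has a header row, so the header does not need to be deleted. A minimal sketch, assuming the same table and file path used in the runs above:

cqlsh> COPY demo.ctest (a, b, c, d, e) FROM '/home/penberg/Downloads/s3.csv' WITH HEADER = TRUE;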

purnima.shah@iet.ahduni.edu.in

<purnima.shah@iet.ahduni.edu.in>
Jun 22, 2016, 5:24:11 AM
to ScyllaDB users, lorina@datastax.com, ishita@excellenceinfonet.com, bharani.sub@gmail.com
I loaded the same file s3.csv using the COPY command in Cassandra, but it only loaded 1129 rows.
If I use my own sample file, it shows 428 rows processed, but the data does not actually get stored in the table.
Why?

Avi Kivity

<avi@scylladb.com>
Jun 22, 2016, 7:31:10 AM
to scylladb-users@googlegroups.com, lorina@datastax.com, ishita@excellenceinfonet.com, bharani.sub@gmail.com

It's impossible to answer without context.  What is your schema?  What did you do? Did you have duplicate keys in the input?  How did you verify data is missing?


Purnima Shah

<purnima.shah@iet.ahduni.edu.in>
Jun 22, 2016, 8:21:58 AM
to scylladb-users@googlegroups.com, lorina@datastax.com, ishita@excellenceinfonet.com, bharani.sub@gmail.com
My schema is like this:


CREATE KEYSPACE project1 WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
USE project1;

// Q1:
CREATE TABLE markets_by_id (
    id TEXT,
    district TEXT,
    latitude DOUBLE,
    longitude DOUBLE,
    godown TEXT,
    cold_storage TEXT,
    nrs TEXT,
    distance DOUBLE,
    commodity SET<TEXT>,
    contact TEXT,
    address TEXT,
    PRIMARY KEY (id)
);

My sample CSV file:


hduser@purnima:~/weather/final csv$ cat -v APMC_market_final.csv |more
1,Barvala,71.895672,22.1542889,NIL,NIL,Bhimnath,10,"Wheat, Jowar, Rai, Bajra",NI
L,NIL,"APMC, Dhandhuka,
Post, Barvala- 382450"
2,Bavla,72.4813809,22.9679557,"MCG,CSC,
SWG",PRC,Bavla,1,"Paddy, Cotton, Wheat,
Redgram","02714-
32234",NIL,"APMC, Bavla-382220"
3,Dhandhuka,71.9816261,22.3796555,"MCG,CSC,
SWG",NIL,Dhandhuka,1,"Potato, Bajra, Cotton,
Wheat","02713-
22357",NIL,"APMC,       
Dhandhuka-382460"
4,Dholera,72.1934421,22.2498636,NIL,NIL,Dhandhuka,27,"Potato, Bajra, Cotton,
Wheat",NIL,NIL,"APMC, Dhandhuka,
Post, Dholera-382455"
5,Dholka,72.551564,22.946007,NIL,PRC,Dholka,1,"Cotton, Wheat, Paddy, Bajra",NIL,
NIL,"APMC,Dholka-387810"
6,Dholka (Veg.),72.551564,22.946007,NIL,NIL,Dholka,1,"Potato, Onion, Tomato,
Brinjal",NIL,NIL,"APMC, Dholka-387810"
7,"Kalupur Keri
Pitha",72.5992477,23.0094868,NIL,NIL,Ahmedabad,1,"Potato, Onion, Chillies,
Brinjal",NIL,NIL,"APMC, Kalupur Keri
Pith,
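
One thing worth noting about the schema above: the primary key is just (id), so any CSV rows that repeat an id will overwrite each other instead of adding new rows, which could make the stored row count lower than the number of rows cqlsh reports as processed. A minimal sketch of one way to check what actually landed, assuming the keyspace and table above have been created:

cqlsh> USE project1;
cqlsh:project1> SELECT COUNT(*) FROM markets_by_id;

Comparing that count with the number of distinct ids in the CSV (rather than the raw line count, since several fields contain embedded newlines) should show whether rows are really missing.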



