TypeError: Received an argument of invalid type for column


Daiyue Weng

Jun 26, 2016, 12:32:56 PM
to DataStax Python Driver for Apache Cassandra User Mailing List

I am using the Cassandra Python driver to create a table. The table has a column (called date) defined as timestamp. I am trying to read values from a dataframe and insert them into the table as rows. The dataframe value corresponding to date has this format (captured from PyCharm debug mode):

Timestamp('2006-09-29 00:00:00')

I used pandas.to_datetime() with the format parameter on the dataframe before using Cassandra to insert rows into the table, but I got the following error:

TypeError: Received an argument of invalid type for column "date". Expected: <class 'cassandra.cqltypes.DateType'>, Got: <class 'str'>; (DateType arguments must be a datetime, date, or timestamp)

The table is defined as,

CREATE TABLE test_keyspace.test_table (
key1 text PRIMARY KEY,
key2 float,
key3 text,
key4 text,
key5 text,
date timestamp
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Adam Holmberg

Jun 27, 2016, 3:08:22 PM
to python-dr...@lists.datastax.com
I don't think you should have to convert anything. Note that Timestamp can actually be used in place of datetime. According to the docs:

TimeStamp is the pandas equivalent of python's Datetime and is interchangable with it in most cases. [sic]

DateType.serialize(datetime(2012, 5, 1), 4) == DateType.serialize(Timestamp(datetime(2012, 5, 1)), 4)
# True

I don't think that's an issue. Rather, it looks like you're somehow binding a string value instead of a Timestamp or datetime value. I'm not sure how this is happening since pandas.to_datetime returns a Timestamp and Timestamp.to_datetime returns a datetime.datetime. 

If removing the conversion does not help, we might need to see more of your code.
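One way to check whether string values are reaching the driver is to inspect the Python types actually stored in the column. The following is only a sketch: the dataframe `df` and its contents are made up for illustration, and the real data may look different.

```python
import pandas as pd

# Hypothetical dataframe: one real Timestamp and one value left as a string.
df = pd.DataFrame({"date": [pd.Timestamp("2006-09-29"), "2006-09-30"]})

# Count the Python type of every value in the column.
type_counts = df["date"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Select the rows that would be bound as str and rejected by the driver.
bad = df[df["date"].map(lambda v: isinstance(v, str))]
print(bad)
```

If anything other than Timestamp or datetime shows up in the counts, those rows are the ones to look at.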

Adam


Daiyue Weng

Jun 28, 2016, 5:06:21 AM
to python-dr...@lists.datastax.com
Yes, you are right. The value returned by pandas.to_datetime works as a Cassandra timestamp. The problem was that when values are converted with pandas.to_datetime, anything that fails to parse is set to NaT, so the dataframe had some NaTs in it. I forgot to remove those NaTs before inserting the rows into Cassandra.

many thanks
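For anyone hitting the same thing, the fix described above might look roughly like the sketch below. The sample data, the date format string, and the use of errors="coerce" are assumptions for illustration and are not taken from the original code. The insert itself is left as a comment because `session` and `insert_stmt` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data; "not a date" stands in for an unparseable value.
df = pd.DataFrame({
    "key1": ["a", "b", "c"],
    "date": ["2006-09-29", "not a date", "2006-10-02"],
})

# With errors="coerce", values that fail to parse become NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Drop the rows whose date failed to parse, so only real timestamps remain.
clean = df.dropna(subset=["date"])

# Convert each pandas Timestamp to a plain datetime.datetime before binding.
rows = [(r.key1, r.date.to_pydatetime()) for r in clean.itertuples(index=False)]

for row in rows:
    # Assumed usage with an existing driver session and prepared statement:
    # session.execute(insert_stmt, row)
    print(row)
```

Dropping the rows is only one option. Depending on the table, binding None for the missing dates may be more appropriate.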