Hello,
I'm having some problems working with Cassandra python driver and datetime.datetime objects with millisecond precision.
The problem seems to be that the conversion datetime.datetime -> int64 (I asume cassandra stores timestamp objects using an int64) when storing a row is different from the conversion int64 -> datetime.datetime when retrieveing it, so the date you have when storing a row is different from the date you have when you retrieve it.
I have a table defined like this:
CREATE TABLE dat_datasource (
did uuid,
date timestamp,
content text,
PRIMARY KEY (did,date)
);The application receives the date and content from external agents. the date is received as a string in ISO 8601 format with
millisecond precision. This string is converted to a datetime.datetime
object, and did, date and content are stored in the database.
To represent a Row, I declared the following class:
class DatasourceData:
def __init__(self, did, date=None, content=None):
self.did=did
self.date=date
self.content=content
This object is stored in the database using the function insert_datasource_data, which receives a DatasourceData object as argument. The code is the following:
def insert_datasource_data(dsdobj):
if not isinstance(dsdobj, ormdatasource.DatasourceData):
return False
else:
connection.session.execute(stmtdatasource.I_A_DATDATASOURCE_B_DID_DATE,(dsdobj.did,dsdobj.date,dsdobj.content))
return True
The CQL statement executed on Cassandra to store each row and which is passed to the execute function is:
'insert into dat_datasource (did,date,content) values (?,?,?)', where the type of each data passed to the statement is (uuid.UUID, datetime.datetime, unicode string)
So, after I programed some quick (and dirty) tests to check if information was stored correctly, the problem arised.
TEST 1:
Description:
The first test generates datetime.datetime objects from millisecond 0 to millisecond 999 and stores it on Cassandra. After storing it, It retrieves the data, passing the same datetime.datetime object that is passed to the connection.execute function. This is the code of the test:
def test_insert_datasource_data_millisecond_precision_success(self):
''' insert_datasource_data must store data with the same date we pass in the DatasourceData object '''
did=uuid.uuid4()
# we generate a date in isoformat with millisecond precision (by default datetime.datetime objects have microsecond precision) from millisecond 0 to 999
base_sdate=datetime.datetime.utcnow().isoformat().split('.')[0]
for i in range(0,1000):
ms=str(i)
if len(ms)==1:
ms='00'+ms
elif len(ms)==2:
ms='0'+ms
content=base_sdate+'.'+ms #at this point I have a date in iso 8601 format with millisecond precision
logger.logger.debug('Storing: '+content)
date=dateutil.parser.parse(content)
dsdobj=ormdatasource.DatasourceData(did=did, date=date, content=content)
self.assertTrue(datasourceapi.insert_datasource_data(dsdobj=dsdobj))
db_data=datasourceapi.get_datasource_data_at(did=did, date=date)
self.assertTrue(isinstance(db_data, ormdatasource.DatasourceData))
self.assertEqual(db_data.did, did)
self.assertEqual(db_data.content, content)
self.assertEqual(db_data.date, date)
As you can see, we store the date in iso format in the "content" column too, so we know which is the theoretical date stored in each row.
Error: This test fails with the following error:
self.assertEqual(db_data.date, date)
AssertionError: datetime.datetime(2014, 12, 31, 17, 57, 53, 999) != datetime.datetime(2014, 12, 31, 17, 57, 53, 1000) <- in the logs, the 999 and 1000 values are microseconds
Error Analisys:So, the result of this test is that the content is the same (I am retrieving the same row if I use the same datetime.datetime object that in the insert) but the date of the row received is different (the conversion datetime->int64 and int64->datetime.datetime is not "symetric")
TEST 2:
Description:
the second test generates datetime.datetime objects from millisecond 0 to millisecond 999 and stores it on Cassandra. After storing it, it retrieves the row, first using the original datetime.datetime object and the using the datetime.datetime object of the object retrieved, and compares the columns of each one. This is the code:
def test_insert_datasource_data_millisecond_precision_success_2(self):
''' insert_datasource_data must retrieve the same data if we use the date from the insert or the date from the stored row '''
did=uuid.uuid4()
base_sdate=datetime.datetime.utcnow().isoformat().split('.')[0]
for i in range(0,1000):
ms=str(i)
if len(ms)==1:
ms='00'+ms
elif len(ms)==2:
ms='0'+ms
content=base_sdate+'.'+ms
logger.logger.debug('Storing: '+content)
date=dateutil.parser.parse(content)
dsdobj=ormdatasource.DatasourceData(did=did, date=date, content=content)
self.assertTrue(datasourceapi.insert_datasource_data(dsdobj=dsdobj))
db_data=datasourceapi.get_datasource_data_at(did=did, date=date)
db_data2=datasourceapi.get_datasource_data_at(did=did, date=db_data.date)
self.assertTrue(isinstance(db_data, ormdatasource.DatasourceData))
self.assertTrue(isinstance(db_data2, ormdatasource.DatasourceData))
self.assertEqual(db_data.did, db_data2.did)
self.assertEqual(db_data.content, db_data2.content)
Error:The error during the execution of this test I get the following error:
self.assertEqual(db_data.content, db_data2.content)
AssertionError: '2014-12-31T18:11:47.001' != '2014-12-31T18:11:47.000'
- 2014-12-31T18:11:47.001
? ^
+ 2014-12-31T18:11:47.000
? ^
Error analisys:It seems that when I use the datetime object used to insert the row I retrieve the correct row, but when I use the datetime object stored in that row, I retrieve a different one (the one of the previous millisecond), so again seems that the conversion from datetime -> int64 and int64 -> datetime is not symetric.
-------
- Have anyone experienced this issue?
- How do you store/retrieve timestamps with millisecond precision consistently?
thank you very much in advance for your help.
Juan