problem working with datetime.datetime objects with millisecond precision

444 views
Skip to first unread message

Juan Cañete

unread,
Dec 31, 2014, 4:21:10 PM12/31/14
to python-dr...@lists.datastax.com
Hello,

I'm having some problems working with Cassandra python driver and datetime.datetime objects with millisecond precision.

The problem seems to be that the conversion datetime.datetime -> int64 (I asume cassandra stores timestamp objects using an int64) when storing a row is different from the conversion int64 -> datetime.datetime when retrieveing it, so the date you have when storing a row is different from the date you have when you retrieve it.

I have a table defined like this:


         CREATE TABLE dat_datasource (
             did uuid,
             date timestamp,
             content text,
             PRIMARY KEY (did,date)
        );


The application receives the date and content from external agents. the date is received as a string in ISO 8601 format with millisecond precision. This string is converted to a datetime.datetime object, and did, date and content are stored in the database.

To represent a Row, I declared the following class:

class DatasourceData:
    def __init__(self, did, date=None, content=None):
        self.did=did
        self.date=date
        self.content=content

This object is stored in the database using the function insert_datasource_data, which receives a DatasourceData object as argument. The code is the following:

def insert_datasource_data(dsdobj):
    if not isinstance(dsdobj, ormdatasource.DatasourceData):
        return False
    else:
        connection.session.execute(stmtdatasource.I_A_DATDATASOURCE_B_DID_DATE,(dsdobj.did,dsdobj.date,dsdobj.content))
        return True


The CQL statement executed on Cassandra to store each row and which is passed to the execute function is:

    'insert into dat_datasource (did,date,content) values (?,?,?)', where the type of each data passed to the statement is (uuid.UUID, datetime.datetime, unicode string)



So, after I programed some quick (and dirty) tests to check if information was stored correctly, the problem arised.

TEST 1:

Description:

The first test generates datetime.datetime objects from millisecond 0 to millisecond 999 and stores it on Cassandra. After storing it, It retrieves the data, passing the same datetime.datetime object that is passed to the connection.execute function. This is the code of the test:

    def test_insert_datasource_data_millisecond_precision_success(self):
       
''' insert_datasource_data must store data with the same date we pass in the DatasourceData object '''
        did
=uuid.uuid4()
       
# we generate a date in isoformat with millisecond precision (by default datetime.datetime objects have microsecond precision) from millisecond 0 to 999
        base_sdate
=datetime.datetime.utcnow().isoformat().split('.')[0]
       
for i in range(0,1000):
            ms
=str(i)
           
if len(ms)==1:
                ms
='00'+ms
           
elif len(ms)==2:
                ms
='0'+ms
            content
=base_sdate+'.'+ms #at this point I have a date in iso 8601 format with millisecond precision
            logger
.logger.debug('Storing: '+content)
            date
=dateutil.parser.parse(content)
            dsdobj
=ormdatasource.DatasourceData(did=did, date=date, content=content)
           
self.assertTrue(datasourceapi.insert_datasource_data(dsdobj=dsdobj))
            db_data
=datasourceapi.get_datasource_data_at(did=did, date=date)
           
self.assertTrue(isinstance(db_data, ormdatasource.DatasourceData))
           
self.assertEqual(db_data.did, did)
           
self.assertEqual(db_data.content, content)
           
self.assertEqual(db_data.date, date)




As you can see, we store the date in iso format in the "content" column too, so we know which is the theoretical date stored in each row.

Error:

This test fails with the following error:

    self.assertEqual(db_data.date, date)
AssertionError: datetime.datetime(2014, 12, 31, 17, 57, 53, 999) != datetime.datetime(2014, 12, 31, 17, 57, 53, 1000)   <- in the logs, the 999  and 1000 values are microseconds

Error Analisys:

So, the result of this test is that the content is the same (I am retrieving the same row if I use the same datetime.datetime object that in the insert) but the date of the row received is different (the conversion datetime->int64 and int64->datetime.datetime is not "symetric")

TEST 2:

Description:

 
the second test generates datetime.datetime objects from millisecond 0 to millisecond 999 and stores it on Cassandra. After storing it, it retrieves the row, first using the original datetime.datetime object and the using the datetime.datetime object of the object retrieved, and compares the columns of each one. This is the code:

    def test_insert_datasource_data_millisecond_precision_success_2(self):
       
''' insert_datasource_data must retrieve the same data if we use the date from the insert or the date from the stored row '''
        did
=uuid.uuid4()
        base_sdate
=datetime.datetime.utcnow().isoformat().split('.')[0]
       
for i in range(0,1000):
            ms
=str(i)
           
if len(ms)==1:
                ms
='00'+ms
           
elif len(ms)==2:
                ms
='0'+ms
            content
=base_sdate+'.'+ms
            logger
.logger.debug('Storing: '+content)
            date
=dateutil.parser.parse(content)
            dsdobj
=ormdatasource.DatasourceData(did=did, date=date, content=content)
           
self.assertTrue(datasourceapi.insert_datasource_data(dsdobj=dsdobj))
            db_data
=datasourceapi.get_datasource_data_at(did=did, date=date)
            db_data2
=datasourceapi.get_datasource_data_at(did=did, date=db_data.date)
           
self.assertTrue(isinstance(db_data, ormdatasource.DatasourceData))
           
self.assertTrue(isinstance(db_data2, ormdatasource.DatasourceData))
           
self.assertEqual(db_data.did, db_data2.did)
           
self.assertEqual(db_data.content, db_data2.content)




Error:

The error during the execution of this test I get the following error:

    self.assertEqual(db_data.content, db_data2.content)
AssertionError: '2014-12-31T18:11:47.001' != '2014-12-31T18:11:47.000'
- 2014-12-31T18:11:47.001
?                       ^
+ 2014-12-31T18:11:47.000
?                       ^

Error analisys:

It seems that when I use the datetime object used to insert the row I retrieve the correct row, but when I use the datetime object stored in that row, I retrieve a different one (the one of the previous millisecond), so again seems that the conversion from datetime -> int64 and int64 -> datetime is not symetric.

-------


  • Have anyone experienced this issue?
  • How do you store/retrieve timestamps with millisecond precision consistently?


thank you very much in advance for your help.

Juan












Juan Cañete

unread,
Dec 31, 2014, 4:25:37 PM12/31/14
to python-dr...@lists.datastax.com
I forgot to say that I'm using cassandra-driver 2.1.3 and apache-cassandra 2.1.0

Adam Holmberg

unread,
Jan 5, 2015, 11:03:11 AM1/5/15
to python-dr...@lists.datastax.com
In "Test 1", does something manipulate the 'date' object outside the test context? I don't understand how this loop would generate an expected time with a millisecond component of 1000.

Using a simple script derived from your test (inline below), I am not able to reproduce this here.

Adam

----------------------------------------------------
from cassandra.cluster import Cluster
s = Cluster().connect()

ins = s.prepare('INSERT INTO test.dat_datasource (did, date, content) VALUES (?,?,?)')
sel = s.prepare('SELECT * FROM test.dat_datasource WHERE did=? AND date=?')

did = uuid.uuid4()
base_sdate=datetime.datetime.utcnow().isoformat().split('.')[0]
for i in range(0,1000):
    content="%s.%03d" % (base_sdate, i)
    date=dateutil.parser.parse(content)
    s.execute(ins, (did, date, content))
    row = s.execute(sel, (did, date))[0]
    assert(row.date == date)


To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.

Juan Cañete

unread,
Jan 5, 2015, 11:54:43 AM1/5/15
to python-dr...@lists.datastax.com
Hi,

No, nothing manipulates the date object from the outside. The 1000 value appears because the datetime object does not have a field for milliseconds but microseconds. So the 1000 is correct, meaning 1 ms. The incorrect value is in the object received from cassandra-driver (999 microseconds).

I tried your script and I can reproduce the same error. This is your script with minimal modifications:

#!/usr/bin/env python3

import uuid
import dateutil.parser
import datetime

from cassandra.cluster import Cluster
s
= Cluster().connect()

ins
= s.prepare('INSERT INTO test.dat_datasource (did, date, content) VALUES (?,?,?)')
sel
= s.prepare('SELECT * FROM test.dat_datasource WHERE did=? AND date=?')

did
= uuid.uuid4()
base_sdate
=datetime.datetime.utcnow().isoformat().split('.')[0]
for i in range(0,1000):
    content
="%s.%03d" % (base_sdate, i)
    date
=dateutil.parser.parse(content)
    s
.execute(ins, (did, date, content))
    row
= s.execute(sel, (did, date))[0]

   
assert(row.date == date),"row: %s, expected: %s" % (row.date, date)




The error it throws:

$ ./test.py
Traceback (most recent call last):
 
File "./test.py", line 20, in <module>
   
assert(row.date == date),"row: %s, expected: %s" % (row.date, date)
AssertionError: row: 2015-01-05 16:49:53.000999, expected: 2015-01-05 16:49:53.001000




I'm using python 3.3.2


Regards,

Juan

To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-user+unsub...@lists.datastax.com.

Adam Holmberg

unread,
Jan 5, 2015, 1:34:36 PM1/5/15
to python-dr...@lists.datastax.com
You're right. It looks like datetime.datetime.utcfromtimestamp is yielding different results for python3:

In [4]: datetime.utcfromtimestamp(1420479023.001)
Out[4]: datetime.datetime(2015, 1, 5, 17, 30, 23, 1000)

In [2]: datetime.utcfromtimestamp(1420479023.001)
Out[2]: datetime.datetime(2015, 1, 5, 17, 30, 23, 999)

Will have to see if there's a workaround.

Adam

To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.

Adam Holmberg

unread,
Jan 6, 2015, 9:24:08 AM1/6/15
to python-dr...@lists.datastax.com
A colleague created a ticket to track this issue:

Juan Cañete

unread,
Jan 6, 2015, 9:47:09 AM1/6/15
to python-dr...@lists.datastax.com
Thank you for your help.

Regards,
Juan

Alex Popescu

unread,
Jan 6, 2015, 11:01:04 AM1/6/15
to python-dr...@lists.datastax.com

[:>-a)

Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru
Reply all
Reply to author
Forward
0 new messages