comparing datetime objects in db query fails

593 views
Skip to first unread message

Luis Goncalves

unread,
Aug 19, 2011, 2:52:59 AM8/19/11
to web...@googlegroups.com

Doing database queries on datetime.datetime fields seems broken.  Here's an example:

I have a database with a table 'transEntry' which includes a 'datetime' field:

  db.define_table('itemEntry',
      Field('dateTime','datetime'),
      Field('upc', 'string'),
      Field('description', 'string'),
      Field('storeNumber', 'integer'),
      Field('terminalNumber', 'integer'),
      Field('transactionNumber', 'integer'),
      Field('operatorNumber', 'integer'),
      Field('quantity', 'integer'),
      Field('price', 'double'),
      Field('action', 'integer'),
      Field('sequenceNumber', 'integer'))


I do a search based on other fields, and it works fine:

first_search = db((db.transEntry.storeNumber==store_num) &
                            (db.transEntry.terminalNumber==lane_num)).select(orderby=db.transEntry.dateTime)
len(first_search)
>> 5213

Let's look at one of the results:

In [173]: ex = first_search[524].dateTime
In [174]: ex
Out[174]: datetime.datetime(2010, 12, 11, 17, 50, 55)

Now try to find a subset of the original query, entries antecedent to  ex :

In [183]: broken_search = db( (db.transEntry.dateTime < ex) &
                                                (db.transEntry.storeNumber==store_num) &
                                                (db.transEntry.terminalNumber==lane_num) ).select( orderby=db.transEntry.dateTime)

In [184]: len(gar2)
Out[184]: 270

?? Why are there only 270, we were expecting 523 of them??

Let's take a closer look:

In [186]: gar2[1].dateTime
Out[186]: datetime.datetime(2010, 12, 10, 10, 55, 39)

In [187]: gar2[2].dateTime
Out[187]: datetime.datetime(2010, 12, 10, 10, 56, 19)

In [189]: gar2[269].dateTime
Out[189]: datetime.datetime(2010, 12, 10, 22, 40, 26)

In [190]: ex
Out[190]: datetime.datetime(2010, 12, 11, 17, 50, 55)

?? For some reason, the closest result found is almost 24 hours away from    'ex' ??

?? Why didn't it find this one :

In [191]: gar[523].dateTime
Out[191]: datetime.datetime(2010, 12, 11, 17, 49, 37)

I can't understand what's happening! It seems that comparisons of datetime fields in db queries just don't work correctly?

If anyone can explain/fix this, it will be much appreciated!!

Thanks!

Luis.

Jose C

unread,
Aug 19, 2011, 3:12:53 AM8/19/11
to web2py-users
> In [183]: broken_search = db( (db.transEntry.dateTime < ex) &
> (db.transEntry.storeNumber==store_num) &
> (db.transEntry.terminalNumber==lane_num) ).select(
> orderby=db.transEntry.dateTime)
>
> In [184]: len(gar2)
> Out[184]: 270

In your exmaple above, in [183] you're assigning to variable
`broken_search`. From there on you're working with the contents of
variable `gar2`. Could that be your problem, or was it a cut and
paste error?

Luis Goncalves

unread,
Aug 19, 2011, 3:33:19 AM8/19/11
to web...@googlegroups.com

Cut and paste, sorry!

Wanted to rename broken_search for clarity and failed. Late night...

—« sent from my Droid tablet »—

Jose C

unread,
Aug 19, 2011, 5:40:36 AM8/19/11
to web2py-users
Hmm, nothing stands out as wrong in the logic to me.

Could `store_num` or `lane_num` have changed between runs?
Is it a shared db, i.e. is it possible someone else is accessing the
database (adding/removing records) while you're testing?

You have some other inconsistencies in the output above. For
example:
1) you define `itemEntry` yet query `transEntry` in your examples;
2) your record count shows 5213, yet later you said you expect 523
records;
3) you reference variable `gar` at the end, presumably meant to be
`first_search` )

I'd suggest starting a new session and running your code again from
scratch. That usually helps resolve issues for me after a late night
of coding tunnel vision ;)

Anthony

unread,
Aug 19, 2011, 9:53:53 AM8/19/11
to web...@googlegroups.com
Are all the results (2012, 12, 10) and earlier? Maybe it's only comparing the dates and not the times for some reason (don't know why). In that case, since you're using strict < (rather than <=), it would make sense that no results would be later than (2012, 12, 10).
 
Anthony

Luis Goncalves

unread,
Aug 19, 2011, 1:03:59 PM8/19/11
to web...@googlegroups.com
Apologies again - all typos. The first query was built by typing (no longer in my ipython history), and full of mistakes. 

The proper way to do this is to create a simple sample that everyone can replicate (too tired late last night to do so..).

Will do so now, constrained by work interruptions.

Thanks,
Luis.

Luis Goncalves

unread,
Aug 19, 2011, 1:08:02 PM8/19/11
to web...@googlegroups.com
I think you are right, Anthony!

In the original query, first_search (or gar with),  first_search[270] is the first entry on the new date!!!
(and also  first_search[269] is the same entry as broken_query[269], so the search results are the same up to that point).

I will still make a simple example all can try, but I think Anthony is right -- it only checks the date, not the time!

Luis.

Luis Goncalves

unread,
Aug 19, 2011, 1:51:59 PM8/19/11
to web...@googlegroups.com
Here is a simple example ... but it seems to behave properly:

  """ test datetime in db queries """
 
  import sys
  sys.path.append("/mnt/data1/web2py")
  from gluon import *
 
  import datetime
  import random
 
  db = DAL('sqlite://debug_datetime.sqlite', folder='/home/goncalves/matlab/objrec/target_stacked_lanes', auto_import=True)
 
  db.define_table('magic',
      Field('DT', 'datetime'),
      Field('val', 'integer'))
 
  # Dec 12 2010, 12:30:45
  init_date = datetime.datetime(2010, 12, 10, 12, 30, 45)
 
  cur_date = init_date
  cur_val = 0
  for i in range(1000):
      db.magic.insert(DT = cur_date, val = cur_val)
     
      # we increment from 30 to 300 seconds at a time
      rand_int = random.randint(30, 300)
      delta = datetime.timedelta(0, rand_int)
     
      cur_date = cur_date + delta
      cur_val = cur_val + rand_int
     
  # make test date 24 hours later
  test_date = init_date + datetime.timedelta(1)
     
  results = db( db.magic.DT < test_date).select( orderby=db.magic.DT )
 
  print 'Initial date:     ', init_date
  print 'Test date:        ', test_date
  print 'Last query result:', results[-1].DT
  print 'Next entry       :', db.magic[len(results)+1].dT

When run it produces:

Initial date:      2010-12-10 12:30:45
Test date:         2010-12-11 12:30:45
Last query result: 2010-12-11 12:29:55
Next entry in DB : 2010-12-11 12:33:45

So it behaves perfectly fine.  Something else is the problem in my application! Not sure what, though.

I'm running python 2.7 on my desktop, 2.6.5 on laptop, and have the problem on both.
Using webp2y version : Version 1.97.1 (2011-06-26 19:25:44)
I am the only person accesssing the DB.
The DB is 62MB.

Could it be that using a field name of 'dateTime'  is somehow confusing web2py?

Thanks,
Luis.

Massimo Di Pierro

unread,
Aug 19, 2011, 3:41:47 PM8/19/11
to web2py-users
Perhaps not the problem but this:

sys.path.append("/mnt/data1/web2py")

is not thread safe. It will be executed at every request and in the
best of cases you will get a memory leak.

Luis Goncalves

unread,
Aug 19, 2011, 5:00:00 PM8/19/11
to web...@googlegroups.com
Ciao Massimo,

I wasn't clear in my post :  It's not a web app,  I'm using DAL as part of a data analysis program I am writing.  I debug with ipython, but the script/programs I write can be called like :  python test_412_determine_open_or_closed_lane.py   ,   which outputs results to a file.

thx,
Luis.

Luis Goncalves

unread,
Aug 19, 2011, 9:09:51 PM8/19/11
to web...@googlegroups.com
I think I have figured it out, with the help of a python expert at work.

I am using sqlite3, which internally stores data as strings.

When creating my real database (not the toy example), I read raw date and time data as an isoformat string, and relied on DAL/web2py to convert. But perhaps in sqlite the field was just stored as the isoformat string:

db.recs.insert( ...., dateTime = isoformat_string, ... )

where 'dateTime' is a field of type  'datetime'

 If I look at a database row, dateTime is shown as a datetime object, and includes both the proper date and time, so presumably in reading the sqlite database web2py/DAL knows how to translate an isoformat to a datetime:

In [12]: db.recs[110].dateTime
Out[12]: datetime.datetime(2010, 12, 11, 16, 16, 36)

The mismatch occurs when attempting to do a db query with datetime fields, because I think that web2py/DAL converts a datetime to a string with str() instead of datetime.isoformat(),  and so when sqlite does the comparision,  only the date matches, but not the time:

In [7]: str(dt)
Out[7]: '2010-12-11 16:34:34'

In [8]: dt.isoformat()
Out[8]: '2010-12-11T16:34:34'

Before I had this all figured out, I had added a new field to my database,  iso_str, of type 'string',  and now I do all my date/time queries with that and it works fine.   Presumably if I rebuilt my database so that I converted my isoformat date/time data to a datetime object before inserting into the DB, all would be fine too!

Luis.

nick name

unread,
Aug 19, 2011, 10:49:46 PM8/19/11
to web...@googlegroups.com
This thread might be relevant: https://groups.google.com/d/topic/web2py/1N0TwMOgp3o/discussion

If you do use the PARSE_COLTYPES feature, datetimes work perfectly.

Luis Goncalves

unread,
Sep 1, 2011, 1:18:42 PM9/1/11
to web...@googlegroups.com
Thanks for the solution. Sorry for the slow reply - I wanted to try it out, but haven't since I had already fixed the problem by explicitly converting from strings to datetimes, and moved on with my work.

One question:  Why isn't the use of PARSE_COLTYPES the default behavior?  It seems DAL/sqlite doesn't do the right thing without it. 

thx,
Luis.

nick name

unread,
Sep 1, 2011, 7:27:39 PM9/1/11
to web...@googlegroups.com
If I understand correctly, Massimo was unaware of this option when he originally implemented the DAL for sqlite, and making it default now is not backward compatible.

Massimo Di Pierro

unread,
Sep 1, 2011, 11:38:26 PM9/1/11
to web2py-users
Want to try send me a patch?

nick name

unread,
Sep 2, 2011, 11:26:41 PM9/2/11
to web...@googlegroups.com
Two patches that solve this in different ways (both against trunk, but would probably apply cleanly to any recent version):

1 - patch to welcome application

diff -r 1462bcd5ea7a applications/welcome/models/db.py
--- a/applications/welcome/models/db.py    Fri Sep 02 20:41:54 2011 -0500
+++ b/applications/welcome/models/db.py    Fri Sep 02 23:04:10 2011 -0400
@@ -7,7 +7,14 @@
 
 if not request.env.web2py_runtime_gae:    
     ## if NOT running on Google App Engine use SQLite or other DB
-    db = DAL('sqlite://storage.sqlite')
+    db_url = 'sqlite://storage.sqlite'
+    if db_url[:7]=='sqlite:':
+        import sqlite3
+        db = DAL(db_url, driver_args={'detect_types':sqlite3.PARSE_DECLTYPES})
+        del sqlite3
+    else:
+        db = DAL(db_url)
+    del db_url
 else:
     ## connect to Google BigTable (optional 'google:datastore://namespace')
     db = DAL('google:datastore')

This is perfectly backward compatible - it would only apply to new applications generated by the wizard, and not change behavior of old ones (and, I delete sqlite3 and db_url at the end so that no new variables are introduced in scope). But it is awkward and confusing.

2 - I believe the existing behavior is a bug, in which case this could be fixed in DAL.py;

diff -r 1462bcd5ea7a gluon/dal.py
--- a/gluon/dal.py    Fri Sep 02 20:41:54 2011 -0500
+++ b/gluon/dal.py    Fri Sep 02 23:16:58 2011 -0400
@@ -1581,6 +1581,9 @@
                 dbpath = os.path.join(self.folder.decode(path_encoding).encode('utf8'),dbpath)
         if not 'check_same_thread' in driver_args:
             driver_args['check_same_thread'] = False
+        if not 'detect_types' in driver_args:
+            driver_args['detect_types'] = self.driver.PARSE_DECLTYPES
+   
         def connect(dbpath=dbpath, driver_args=driver_args):
             return self.driver.Connection(dbpath, **driver_args)
         self.pool_connection(connect)

Which I believe is simpler and more elegant.

(minor: I prefer to write it as "if 'detect_types' not in driver_args", but I stayed with the original style).

Massimo Di Pierro

unread,
Sep 14, 2011, 12:57:26 AM9/14/11
to web...@googlegroups.com
I added this in trunk so we can test it but I do not fully understand what this does. Do you? Can this break backward compatibility?

nick name

unread,
Sep 14, 2011, 1:28:36 AM9/14/11
to web...@googlegroups.com
by default, the sqlite driver will return everything as a string, regardless of what the real type of the field is -- because sqlite actually stores everything as a string (as the original poster describes here https://groups.google.com/d/msg/web2py/5ABy8XH4ujU/6xtq0xQB8nEJ ; ).

I can't find my test case from February that made me start this thread: https://groups.google.com/d/topic/web2py/1N0TwMOgp3o/discussion , and other changes to DAL since then may have fixed all the paths - I was occasionally getting strings out of the database rather than datetimes() as I expected.

This flag to sqlite makes it return date() and datetime() objects if the field type is "date" or "timestamp", instead of strings, like a database driver is expected to do.

I will have time tomorrow to look at my source control and perhaps find an example.

nick name

unread,
Sep 14, 2011, 1:30:30 AM9/14/11
to web...@googlegroups.com
Oh, and regarding backward compatibility - it will break anything that was expecting a string returned from selecting a date or timestamp field. Such code wouldn't work on other databases.

nick name

unread,
Sep 14, 2011, 2:56:15 AM9/14/11
to web...@googlegroups.com
Ok, so I've just spent an hour trying to recreate the old problems I have (without the PARSE_DECLTYPES), and I am unable to.

I know for sure that I was occasionally getting strings back when I expected a datetime() object, because I had a test in the code that would report that, and it stopped reporting after I started using the PARSE_DECLTYPES when you added the driver_args={} parameter in February.

So either there was some other bug that was fixed in the meantime, or it had something to do with the data that was already in my sqlite, or I'm not remembering what triggered the problem.

I spent an hour or so reviewing dal.py (including historical versions), and can't see a way for strings returned from the database to escape to the caller and not be converted by parse().

And as I can't produce a test case that shows this is needed, I apologize for the trouble; as it does change the semantics (e.g., it supports microseconds if already stored in the database, unlike before the change), it is probably wise to back this out until I manage to recreate the problem.

Massimo Di Pierro

unread,
Sep 14, 2011, 8:13:33 AM9/14/11
to web2py-users
Are you telling me that the line

driver_args['detect_types'] = self.driver.PARSE_DECLTYPES

which I added yesterday can be safely removed?

I thought this was a performance issue as well as db(...).select()
returns parsed types and it is best if the driver does it as opposed
if web2py does it.

nick name

unread,
Sep 14, 2011, 10:11:13 AM9/14/11
to web...@googlegroups.com
Yes.

I'm saying I can't find my test case that was failing without it (although I'm still trying to recreate one).


> I thought this was a performance issue as well as db(...).select()  returns parsed types and it is best if the driver does it as opposed if web2py does it.

It probably is better when the driver does it -- both in the sense of being faster (though need benchmark to prove it) and of being compatible with the "standard" handling; e.g., if you use the PARSE_DECL and your database already contains data that has microseconds, then you will get them when selecting (because the database will produce datetime() objects), whereas if you don't use PARSE_DECL and let web2py do the parsing, those will be silently ignored.

The difference in behavior is given in this example:

------------------------

# testapp/models/test.py
db.define_table('test', Field('t', 'datetime'))
db.test.insert(t=request.now)

# without PARSE_DECLTYPES
./web2py.py -M -S testapp
In [1]: db.executesql('select * from test')
Out[1]: [(1, u'2011-09-14 09:49:00')]

# with PARSE_DECLTYPES
./web2py.py -M -S testapp
In [1]: db.executesql('select * from test')
Out[1]: [(1, datetime.datetime(2011, 9, 14, 9, 56, 36))]

-----------------------

I was sure I had an example where the strings were received from a select(), but I can't find one in my repository and I can't recreate one. I still think the PARSE_DECLTYPES addition is useful for all the reasons above, but it is obviously not as critical as I thought -- I apologies for bringing this up several times, I should have posted a test when I first posted about it.

(Also - I just did hg pull, but did not see it in dal.py -- is development happening on a place other then the hg tip?)

Apologies again.
Reply all
Reply to author
Forward
0 new messages