Re: [mongodb-dev] PyMongo datetime overflow


Bernie Hackett

Dec 6, 2012, 9:36:08 AM
to mongo...@googlegroups.com
Hi,

Can you tell me what the actual exception is? I assume "OverflowError: date value out of range"? What is your application doing with these timestamps (i.e. 1.353432683e+12)? Is it converting them to datetime.datetime instances, then doing a range query? If so, how are you doing the conversion? Have you tried with PyMongo's C extensions?

Finally, can you share more of your code so I can understand what is going on?


On Thu, Dec 6, 2012 at 1:47 AM, Guti Deng <guti...@gmail.com> wrote:
Hi, guys,

I've just run into a case where the calculated 'seconds' variable causes an overflow.

The original source code (bson/__init__.py, line 220):

def _get_date(data, position, as_class, tz_aware, uuid_subtype):
    seconds = float(struct.unpack("<q", data[position:position + 8])[0]) / 1000.0
    position += 8
    if tz_aware:
        return EPOCH_AWARE + datetime.timedelta(seconds=seconds), position
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position

I added a print(seconds) before the last line, and got something like this:
1353569100.0
1353395263.58
1353398841.0
1353432688.84
1.353432683e+12
Traceback (most recent call last):
  File "./???.py", line 124, in <module>
    ???.compute({'dt': sys.argv[2]})
  File "./???.py", line 99, in compute
    for doc in cur:
  File "***/py/pymongo/cursor.py", line 778, in next
    if len(self.__data) or self._refresh():
  File "***/py/pymongo/cursor.py", line 742, in _refresh
    limit, self.__id))
  File "***/py/pymongo/cursor.py", line 686, in __send_message
    self.__uuid_subtype)
  File "***/py/pymongo/helpers.py", line 111, in _unpack_response
    as_class, tz_aware, uuid_subtype)
  File "***/py/bson/__init__.py", line 522, in decode_all
    tz_aware, uuid_subtype))
  File "***/py/bson/__init__.py", line 332, in _elements_to_dict
    tz_aware, uuid_subtype)
  File "***/py/bson/__init__.py", line 322, in _element_to_dict
    tz_aware, uuid_subtype)
  File "***/py/bson/__init__.py", line 231, in _get_date
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position



I've added several lines to get Cursor.next() working. I will manually check the results and drop the meaningless datetime values, so my project works fine for now.

# EPOCH_OVERFLOW hack:
EPOCH_OVERFLOW = 2**32

## and ##

def _get_date(data, position, as_class, tz_aware, uuid_subtype):
    seconds = float(struct.unpack("<q", data[position:position + 8])[0]) / 1000.0
    position += 8
    if tz_aware:
        return EPOCH_AWARE + datetime.timedelta(seconds=seconds), position
    # hacking overflow issue
    if seconds > EPOCH_OVERFLOW:
        seconds = EPOCH_OVERFLOW
    if seconds < 0:
        seconds = 0
    # hacking overflow issue
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position

I tried to filter out these weird values by adding '$lt' and '$gt' conditions to the 'specs' parameter of Cursor.find(), but it didn't work. mongod itself runs without problems, so I guess the problem is on the Python driver side.

Now I'm leaving it to the maintainers of PyMongo. I hope my experience helps.
Thank you! :D

--
You received this message because you are subscribed to the Google Groups "mongodb-dev" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-dev/-/wIHvHByI4T0J.
To post to this group, send email to mongo...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-dev?hl=en.

Guti Deng

Dec 10, 2012, 6:43:42 AM
to mongo...@googlegroups.com
Hi, Bernie,

1) The exception is exactly OverflowError:
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position
OverflowError: date value out of range

2) These timestamps record when certain events happen. They were written to MongoDB by a PHP driver as our users' requests came in. I download the whole collection once a day to run some analysis tasks. Interestingly, the overflowed value doesn't break the PHP driver.

3) I scan the collection using a plain query:
        col = pymongo.Connection(MONGO_HOST, MONGO_PORT)['XXX']['yyyyyyyy']
        cur = col.find(fields=['ptid', 'ctime', 'start_time', 'status', 'gid',])

        for doc in cur:    # <== exception throws here
            gid, regtime, jstr = PbiItem.parse_mongodoc(doc)
            if regtime > t_boundary: continue
            line = gid + '\t' + 'pbi' + '\t' + jstr + '\n'
            file_idx = int(gid) % FILE_NUM
            files[file_idx].write(line)

The fields 'ctime' and 'start_time' are both of datetime type. The exception occurs during the automatic conversion to Python's datetime type.
Oh, I've tried adding query conditions to limit the upper and lower bounds of the datetime field (1970-01-01 ~ 2038-01-01), but it doesn't work.

4) I have no idea what PyMongo's C extensions are :(

5) I tried to dump the original 'data' parameter, but since it is not plain text, I don't know how to do it. If it would help, could you please show me how?

def _get_date(data, position, as_class, tz_aware, uuid_subtype):
    seconds = float(struct.unpack("<q", data[position:position + 8])[0]) / 1000.0   # <<== sth. wrong here?
    position += 8
    if tz_aware:
        return EPOCH_AWARE + datetime.timedelta(seconds=seconds), position
    #   if seconds > EPOCH_OVERFLOW:
    #       seconds = EPOCH_OVERFLOW
    #   if seconds < 0:
    #       seconds = 0
    print 'data: %s, position: %s, seconds: %s' % (data, position, seconds)
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position


data: _idP?,?h??Kq|?? ctime[ؤ;gid 11395990ptid 10009217 start_time?n?;status, position: 89, seconds: 1353398841.0
data: _idP??ph??Kq}x ctime???;gid 35750251ptid 10033126 start_time?0L???status, position: 32, seconds: 1353432688.84
data: _idP??ph??Kq}x ctime???;gid 35750251ptid 10033126 start_time?0L???status, position: 89, seconds: 1.353432683e+12
Traceback (most recent call last):
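
One way to dump the raw bytes behind a single datetime field is to hex-encode only the 8 bytes that _get_date reads, instead of printing the whole 'data' buffer. The following is a minimal sketch, not part of bson: dump_date_bytes is a hypothetical helper, 'position' is assumed to be the offset before the `position += 8` line, and the sample bytes are packed by hand rather than taken from the real collection.

```python
import binascii
import struct


def dump_date_bytes(data, position):
    """Return (hex string, signed 64-bit millisecond count) for one BSON date.

    Hypothetical debugging helper; `position` is the offset where the
    8-byte value starts, i.e. before `position += 8` in _get_date.
    """
    raw = data[position:position + 8]
    # BSON stores a datetime as a little-endian signed 64-bit integer
    # holding milliseconds since the Unix epoch.
    millis = struct.unpack("<q", raw)[0]
    return binascii.hexlify(raw).decode("ascii"), millis


# Hand-packed sample: 1.353432683e+12 seconds expressed in milliseconds.
sample = struct.pack("<q", 1353432683000000)
hex_bytes, millis = dump_date_bytes(sample, 0)
print("%s -> %d ms" % (hex_bytes, millis))
```

Printing the integer millisecond count next to the hex makes it easier to tell whether the stored value itself is out of range or whether the decoding step went wrong.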


thanks :)

Dan Pasette

Dec 10, 2012, 8:23:40 AM
to mongo...@googlegroups.com
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-dev/-/vaUz0SCLlb0J.

Glenn Maynard

Dec 10, 2012, 10:32:48 AM
to mongo...@googlegroups.com
On Mon, Dec 10, 2012 at 5:43 AM, Guti Deng <guti...@gmail.com> wrote:
Hi, Bernie,

1) The exception is exactly OverflowError:
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds), position
OverflowError: date value out of range

 
data: _idP??ph??Kq}x ctime???;gid 35750251ptid 10033126 start_time?0L???status, position: 89, seconds: 1.353432683e+12
Traceback (most recent call last):

FYI, the maximum "year" value of Python's datetime class is 9999, and this value is about 42000 years in the future.  I don't know why Python has this limitation (if it can handle the year 9999, I'd expect it to generalize), or what the PyMongo bson implementation can do to avoid problems here.  I assume this value is an error in your data, but having the client throw an overflow decoding data that's actually on the server isn't great...
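
The overflow can be reproduced without MongoDB. This is a small sketch that repeats the addition from the quoted _get_date code; EPOCH_NAIVE is assumed here to be 1970-01-01 (the thread does not show its definition), and try_decode is a hypothetical helper used only for illustration.

```python
import datetime

# Assumed value of the EPOCH_NAIVE constant used in the quoted bson code.
EPOCH_NAIVE = datetime.datetime(1970, 1, 1)


def try_decode(seconds):
    """Return the decoded datetime, or the OverflowError it raised."""
    try:
        return EPOCH_NAIVE + datetime.timedelta(seconds=seconds)
    except OverflowError as exc:
        return exc


ok = try_decode(1353432688.84)          # a normal value from the log
too_big = try_decode(1.353432683e+12)   # the value that broke the cursor
# Rough size of the offending value in years after 1970.
approx_years = 1.353432683e+12 / (365.2425 * 24 * 3600)
```

Note that 1.353432683e+12 is exactly 1000 times 1353432683, which is within a few seconds of the 'ctime' value in the same document. That may point to a seconds/milliseconds mix-up on the writing side, though nothing in the thread confirms it.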

--
Glenn Maynard

Bernie Hackett

Dec 10, 2012, 7:40:23 PM
to mongo...@googlegroups.com
As Glenn said, Python's datetime object is limited to years between 1 and 9999:


PHP is capable of creating timestamps for years far beyond the range that standard Python datetimes can handle. To filter out these documents you can do a query like this:

>>> import pymongo
>>> from datetime import datetime
>>> c = pymongo.MongoClient()
>>> for doc in c.dates.dates.find({'mongodate': {'$lte': datetime(9999, 12, 31, 23, 59)}}):
...    doc

Obviously, replace 'mongodate' with whatever the datetime field is called in your collection. datetime(9999, 12, 31, 23, 59) is within a minute of the largest datetime Python can create (datetime.max, i.e. datetime(9999, 12, 31, 23, 59, 59, 999999)).

4) I have no idea what PyMongo's C extensions are :(

PyMongo has optional C extensions that greatly improve its performance. See the install docs here for how to get them installed:
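
To check whether an existing installation is already using the C extensions, pymongo.has_c() can be called. A minimal sketch follows; the wrapper c_extensions_in_use is a hypothetical name, and it returns None when PyMongo is not importable at all.

```python
def c_extensions_in_use():
    """Return pymongo.has_c(), or None if PyMongo is not installed."""
    try:
        import pymongo
    except ImportError:
        return None
    # has_c() reports whether PyMongo's C extension module was importable.
    return pymongo.has_c()


status = c_extensions_in_use()
print("PyMongo C extensions in use: %s" % status)
```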





Bernie Hackett

Dec 10, 2012, 8:13:06 PM
to mongo...@googlegroups.com
Interestingly, Python's datetime C API appears to be capable of creating datetime instances with year values outside of what the pure Python API will accept. If you install PyMongo's C extensions you will be able to decode timestamps with larger year values. Here's an example of PyMongo's C extensions decoding a timestamp with a year beyond 9999:

{u'_id': ObjectId('50c6692e44415e5207000180'), u'string': u'Sun, 10 Dec 12023 23:58:54 +0100', u'mongodate': datetime.datetime(12023, 12, 10, 22, 58, 54), u'title': u'Example date'}

But if I try to create that datetime in the python interactive shell:

>>> datetime(12023, 12, 10, 22, 58, 54)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year is out of range

This seems to be a wild inconsistency between Python's pure Python datetime module and its C API. There is nothing we can do about it in PyMongo, though.

The moral of the story is always use PyMongo's C extensions if at all possible. :-)

Guti Deng

Dec 11, 2012, 3:57:13 AM
to mongo...@googlegroups.com
Hi, Bernie,

I want to correct my former statement:
     Oh, I've tried to add query conditions to limit the upper and lower boundary of the datetime field (1970-01-01 ~ 2038-01-01), it doesn't work.
 
It actually works. I think I was putting the restriction on only one of the problematic datetime fields (there are two)...

Well, there are also negative values causing "OverflowError: date value out of range":
    data: _idP??vh??Kq~?% ctime??';gid 11395990ptid 10411796 start_timeH?,;status, position: 32, seconds: 1353577590.72
    data: _idP??vh??Kq~?% ctime??';gid 11395990ptid 10411796 start_timeH?,;status, position: 89, seconds: 1353667517.0
    data: _idP??ph??Kq~?? ctime?]?';gid 11420441ptid 10454924 start_time???'č??status, position: 32, seconds: 1353578864.03
    data: _idP??ph??Kq~?? ctime?]?';gid 11420441ptid 10454924 start_time???'č??status, position: 89, seconds: -1.2560136087e+11
    Traceback (most recent call last):

So it is necessary to add another restriction (greater than 1970). My application runs without exceptions this way:
       cur = col.find({
                'ctime': {'$lt': datetime.datetime(2038,1,1,0,0), '$gt': datetime.datetime(1970,1,1,0,0)},
                'start_time': {'$lt': datetime.datetime(2038,1,1,0,0), '$gt': datetime.datetime(1970,1,1,0,0)}
            }, fields=['ptid', 'ctime', 'start_time', 'status', 'gid', ])


Since MongoDB allows other drivers (like PHP) to submit values that cannot be represented by Python's datetime type, I think we have two options to deal with it:
    1) Introduce an exception mechanism for such situations to tell users of the Python driver what is happening. Prompt them to add query restrictions so that their code runs without exceptions, although some records may be skipped, just like what you did for me haha :)
    2) Clamp the overflowed values into the range that Python's datetime can handle. (This way we keep the record but lose the field's real value.)

Personally I prefer the first one.
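
The two options can be sketched outside the driver like this. Everything here is hypothetical and only illustrates the idea: decode_millis and DateOutOfRange are made-up names, not PyMongo or bson APIs, and the epoch is assumed to be a naive 1970-01-01.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # assumed naive epoch


class DateOutOfRange(ValueError):
    """Hypothetical error for BSON dates that Python's datetime cannot hold."""


def decode_millis(millis, clamp=False):
    """Decode milliseconds since the epoch into a naive datetime."""
    try:
        return EPOCH + datetime.timedelta(milliseconds=millis)
    except OverflowError:
        if clamp:
            # Option 2: keep the document, replace the value with the
            # nearest bound that datetime can represent.
            return datetime.datetime.max if millis > 0 else datetime.datetime.min
        # Option 1: fail with a message that says what to do next.
        raise DateOutOfRange(
            "%d ms is outside datetime's range (years 1 to 9999); "
            "add a range condition to the query to skip such documents" % millis
        )
```

With clamp=False the caller learns which value was rejected and why; with clamp=True the loop keeps going, but the original value is silently lost, which is the trade-off between the two options.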

About the C extension:
The moral of the story is always use PyMongo's C extensions if at all possible. :-)
I'm glad to learn about the C extensions. It has been a while since I set up my environment. Next time I will try them out, I promise :D



Thanks to all of you for your attention and help :)


Guti

Bernie Hackett

Dec 11, 2012, 8:19:34 AM
to mongo...@googlegroups.com
You don't have to limit your app to dates between the epoch and 2038, just years between 1 and 9999. Your code should look like this:

cur = col.find({
                'ctime': {'$lt': datetime.datetime(9999,12,31,23,59,59), '$gt': datetime.datetime(1,1,1,1,1)},
                'start_time': {'$lt': datetime.datetime(9999,12,31,23,59,59), '$gt': datetime.datetime(1,1,1,1,1)}
            }, fields=['ptid', 'ctime', 'start_time', 'status', 'gid', ])

Python (and PyMongo) are fully capable of handling those date ranges:

>>> c = pymongo.MongoClient()
>>> c.dates.dates.insert({'date': datetime(1, 1, 1, 1, 1, 1)})
ObjectId('50c7314b430ee6eea78a7ddb')
>>> c.dates.dates.insert({'date': datetime(9999, 12, 31, 23, 59, 59)})
ObjectId('50c73161430ee6eea78a7ddc')
>>>  
>>> list(c.dates.dates.find())
[{u'date': datetime.datetime(1, 1, 1, 1, 1, 1), u'_id': ObjectId('50c7314b430ee6eea78a7ddb')}, {u'date': datetime.datetime(9999, 12, 31, 23, 59, 59), u'_id': ObjectId('50c73161430ee6eea78a7ddc')}]

Furthermore, if you install PyMongo's C extensions you can decode dates with years beyond 9999.
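
When several fields need the same guard, the range condition from the query above can be built once. This is only a sketch: datetime_range_filter is a hypothetical helper, and using datetime.min and datetime.max as inclusive bounds is an assumption about what the application wants, not something PyMongo requires.

```python
import datetime


def datetime_range_filter(field_names):
    """Build a query dict limiting each field to datetime's own range."""
    bounds = {"$gte": datetime.datetime.min, "$lte": datetime.datetime.max}
    # Copy the bounds per field so the sub-documents are independent.
    return dict((name, dict(bounds)) for name in field_names)


query = datetime_range_filter(["ctime", "start_time"])
# The result can be passed as the first argument to find(), e.g.
# cur = col.find(query)
```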



Guti Deng

Dec 12, 2012, 3:38:43 AM
to mongo...@googlegroups.com
Ok, I got it.   Thanks! :D