to_datetime unexpected behaviour


Vincent Davis

May 16, 2015, 11:29:42 AM
to pyd...@googlegroups.com
I expect the line below to return NaT; instead it raises ValueError: day is out of range for month.
pd.to_datetime(2291991, format="%m%d%Y", coerce=True, exact=True)

Maybe this is a bug: the following is inconsistent with the above in that it returns NaT.
pd.to_datetime(3321991, format="%m%d%Y", coerce=True, exact=True)

Is there a way to get the first line to return NaT?



pd.to_datetime(2291991, format="%m%d%Y", coerce=True, exact=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/tseries/tools.py in _convert_listlike(arg, box, format)
    326             try:
--> 327                 values, tz = tslib.datetime_to_datetime64(arg)
    328                 return DatetimeIndex._simple_new(values, None, tz=tz)

pandas/tslib.pyx in pandas.tslib.datetime_to_datetime64 (pandas/tslib.c:23779)()

TypeError: Unrecognized value type: <class 'int'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-44-1b71262facfd> in <module>()
----> 1 pd.to_datetime(2291991, format="%m%d%Y", coerce=True, exact=True)
      2 

/Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/tseries/tools.py in to_datetime(arg, errors, dayfirst, utc, box, format, exact, coerce, unit, infer_datetime_format)
    340         return _convert_listlike(arg, box, format)
    341 
--> 342     return _convert_listlike(np.array([ arg ]), box, format)[0]
    343 
    344 class DateParseError(ValueError):

/Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/tseries/tools.py in _convert_listlike(arg, box, format)
    328                 return DatetimeIndex._simple_new(values, None, tz=tz)
    329             except (ValueError, TypeError):
--> 330                 raise e
    331 
    332     if arg is None:

/Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/tseries/tools.py in _convert_listlike(arg, box, format)
    302                     try:
    303                         result = tslib.array_strptime(
--> 304                             arg, format, exact=exact, coerce=coerce
    305                         )
    306                     except (tslib.OutOfBoundsDatetime):

pandas/tslib.pyx in pandas.tslib.array_strptime (pandas/tslib.c:40436)()

ValueError: day is out of range for month
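
For readers on a later pandas release: the `coerce=True` keyword used in the call above was subsequently replaced by `errors='coerce'`. The sketch below assumes a recent pandas version and passes the dates as zero-padded strings (an assumption; the original call passes integers). It is an illustration of the behaviour the post asks for, not a statement about the 2015 code path.

```python
import pandas as pd

# Zero-padded strings are an assumption here; the original post passes integers.
FORMAT = "%m%d%Y"

# 29 February 1991 does not exist (1991 is not a leap year).
feb_29 = pd.to_datetime("02291991", format=FORMAT, errors="coerce")
# Day 32 can never match %d.
mar_32 = pd.to_datetime("03321991", format=FORMAT, errors="coerce")
# A valid date for comparison.
valid = pd.to_datetime("02281991", format=FORMAT, errors="coerce")

# Both invalid inputs are expected to come back as NaT rather than raise.
print(pd.isna(feb_29), pd.isna(mar_32), valid)
```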

Joris Van den Bossche

May 16, 2015, 11:58:54 AM
to pyd...@googlegroups.com
In any case, this seems inconsistent between the 32nd of March and the 29th of February (both have a day out of range for the month). Can you open an issue for this? https://github.com/pydata/pandas/issues

It has something to do with the day number: only values above 31 convert to NaT, while 31 or lower raises the error (e.g. 31 April also raises an error instead of giving NaT).
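
One way to picture the split described here, as a sketch using only the standard library rather than pandas internals: a day above 31 can never satisfy a `%d` field, so it fails when the text is matched against the format, whereas a day such as 29 February or 31 April matches the format and fails only when the calendar date is checked. The helper name `failure_stage` and the fixed MMDDYYYY layout are assumptions made for illustration; the thread does not confirm that this is the cause inside pandas.

```python
import calendar


def failure_stage(text):
    """Classify an 8-digit MMDDYYYY string (hypothetical helper).

    Returns "ok", "format" (a field can never be valid in any month),
    or "calendar" (the fields look fine but the date does not exist).
    """
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        return "format"
    month, day, year = int(text[:2]), int(text[2:4]), int(text[4:])
    # Stage 1: range checks of the kind a directive such as %m or %d enforces.
    if not 1 <= month <= 12 or not 1 <= day <= 31 or year < 1:
        return "format"
    # Stage 2: the day must exist in that particular month and year.
    days_in_month = calendar.monthrange(year, month)[1]
    return "ok" if day <= days_in_month else "calendar"


for value in ("02291991", "04311991", "03321991", "02281991"):
    print(value, failure_stage(value))
```

Under these assumptions, 29 February 1991 and 31 April 1991 land in the "calendar" stage while 32 March 1991 lands in the "format" stage, which mirrors the 31-or-lower versus above-31 boundary noted above.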

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vincent Davis

May 16, 2015, 12:04:59 PM
to pyd...@googlegroups.com
I will open an issue later today.
 

Vincent Davis

May 31, 2015, 11:54:52 PM
to pyd...@googlegroups.com
This should return '2015-02-29' rather than raise a ValueError, correct?

In [12]: pd.to_datetime('2015-02-29', errors='ignore', format="%Y-%m-%d", coerce=False)

Joris Van den Bossche

Jun 1, 2015, 3:31:18 AM
to pyd...@googlegroups.com

--

Vincent Davis

Jul 6, 2015, 2:00:10 PM
to pyd...@googlegroups.com
OK, it has been too long. I have a patch for the specific issue #10154 and a set of tests. I was asked to rebase my initial pull request, which I completely botched. Starting over, I want to ask someone to review the tests before I create a new pull request. (Thanks for your time; I am far from an expert, but I enjoy helping.)
I think all of these tests should pass, but they do not. My initial patch addresses only part of the test failures.
You can view the tests file here and the patch for #10154 here

Please review these tests.
# Assumed imports (the post omits them); these are the pandas test helpers of that era.
import pandas.util.testing as tm
from pandas import isnull, to_datetime


class TestDaysInMonth(tm.TestCase):

    # coerce=True: a day that does not exist in the month should give NaT.
    def test_day_not_in_month_coerce_true_NaT(self):
        self.assertTrue(isnull(to_datetime('2015-02-29', coerce=True)))
        self.assertTrue(isnull(to_datetime('2015-02-29', format="%Y-%m-%d", coerce=True)))
        self.assertTrue(isnull(to_datetime('2015-02-32', format="%Y-%m-%d", coerce=True)))
        self.assertTrue(isnull(to_datetime('2015-04-31', format="%Y-%m-%d", coerce=True)))

    # coerce=False with errors='raise': the same inputs should raise ValueError.
    def test_day_not_in_month_coerce_false_raise(self):
        self.assertRaises(ValueError, to_datetime, '2015-02-29', errors='raise', coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-02-29', errors='raise', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-02-32', errors='raise', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-04-31', errors='raise', format="%Y-%m-%d", coerce=False)

    # coerce=False with errors='ignore': expectations as originally posted.
    def test_day_not_in_month_coerce_false_ignore(self):
        self.assertRaises(ValueError, to_datetime, '2015-02-29', errors='ignore', coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-02-29', errors='ignore', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-02-32', errors='ignore', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-04-31', errors='ignore', format="%Y-%m-%d", coerce=False)




Vincent Davis

Jul 6, 2015, 5:04:16 PM
to pyd...@googlegroups.com
I need to correct the last part of the tests. The second line below is different.
    def test_day_not_in_month_coerce_false_ignore(self):
        self.assertEqual(to_datetime('2015-02-29', errors='ignore', coerce=False), '2015-02-29')
        self.assertRaises(ValueError, to_datetime, '2015-02-29', errors='ignore', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-02-32', errors='ignore', format="%Y-%m-%d", coerce=False)
        self.assertRaises(ValueError, to_datetime, '2015-04-31', errors='ignore', format="%Y-%m-%d", coerce=False)
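
On pandas versions where `coerce=True` is no longer accepted, the coerce and raise test groups might be expressed roughly as follows. This is a sketch with plain checks rather than `tm.TestCase`, assuming `errors='coerce'` and `errors='raise'` as the replacements; the helper `raises_value_error` is hypothetical, and the `errors='ignore'` cases are left out because the thread does not settle their expected behaviour.

```python
import pandas as pd

# The three invalid dates used in the tests above.
BAD_DATES = ["2015-02-29", "2015-02-32", "2015-04-31"]
FORMAT = "%Y-%m-%d"


def raises_value_error(text):
    """Return True if strict parsing of `text` raises ValueError (hypothetical helper)."""
    try:
        pd.to_datetime(text, format=FORMAT, errors="raise")
    except ValueError:
        return True
    return False


# errors='coerce' should turn every invalid date into NaT.
coerced = [pd.to_datetime(d, format=FORMAT, errors="coerce") for d in BAD_DATES]
# errors='raise' should raise ValueError for every invalid date.
strict = [raises_value_error(d) for d in BAD_DATES]

print(coerced, strict)
```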

