DATE column are considered INTEGER in sqlite

2,685 views
Skip to first unread message

Sébastien de Menten

unread,
Nov 24, 2014, 2:19:46 PM11/24/14
to sqlalche...@googlegroups.com
Hello,

When SA create a DATE column for sqlite it outputs as DDL a DATE type which is interpreted as INTEGER by sqlite.

By default, this is not an issue as sqlite serialises a date as a string through the storage_format string format and the latter is equal to  "%(year)04d-%(month)02d-%(day)02d ".
Hence, the date 23/11/2014 gets formatted as 2014-11-23 and pushes to sqlite which stores it as a TEXT (even if the column is INTEGER as there is dynamic types in sqlite).

However, if we change the storage_format to "%(year)04d%(month)02d%(day)02d", the date gets rendered to 20141123 which is pushed to sqlite and can be interpreted as an INTEGER (which is the default type of the column).However, once SA reads back the date, it cannot parse it anymore (if we use a regexp=r"(\d{4})(\d{2})(\d{2})") as it says that 20141123 is not a string (indeed sqlite handled the value as an INTEGER).

A solution is to replace in the "CREATE TABLE" generation for the sqlite dialect, the DATE type by a TEXT type (and idem for DATETIME and TIME).

Here below the code example that shows the bug.

import datetime

from sqlalchemy import create_engine, Column, Date, Integer
from sqlalchemy.dialects import sqlite
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker


Base = declarative_base()


class BuggyDate(Base):
    __tablename__ = "buggy"
    id = Column(Integer(), primary_key=True)
    sdate = Column(Date())
    idate = Column(sqlite.DATE(
        storage_format="%(year)04d%(month)02d%(day)02d",
        regexp=r"(\d{4})(\d{2})(\d{2})"
    ))


engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

d = datetime.date(2014, 2, 22)
bd = BuggyDate(sdate=d, idate=d)

Session = sessionmaker(autocommit=False, autoflush=False, bind=engine)()

Session.add(bd)
Session.commit()
print list(engine.execute("select * from buggy"))
# OUTPUT = [(1, u'2014-02-22', 20140222)]
print Session.query(BuggyDate).all()
# ValueError: Couldn't parse date string '20140222' - value is not a string.

A hacky solution is to change the coltype in sqlite DDL Compiler (as below) ... but there should be a cleaner way to do it (but I am not familiar at all with Compilers and Dialects)

class SQLiteDDLCompiler(compiler.DDLCompiler):

    def get_column_specification(self, column, **kwargs):
        if isinstance(column.type, (DATE, TIME, DATETIME)):
            coltype="TEXT"
        else:
            coltype = self.dialect.type_compiler.process(column.type)

kr

sebastien


Michael Bayer

unread,
Nov 24, 2014, 2:52:14 PM11/24/14
to sqlalche...@googlegroups.com
On Nov 24, 2014, at 2:19 PM, Sébastien de Menten <sdem...@gmail.com> wrote:

Hello,

When SA create a DATE column for sqlite it outputs as DDL a DATE type which is interpreted as INTEGER by sqlite.

By default, this is not an issue as sqlite serialises a date as a string through the storage_format string format and the latter is equal to  "%(year)04d-%(month)02d-%(day)02d ".
Hence, the date 23/11/2014 gets formatted as 2014-11-23 and pushes to sqlite which stores it as a TEXT (even if the column is INTEGER as there is dynamic types in sqlite).

However, if we change the storage_format to "%(year)04d%(month)02d%(day)02d", the date gets rendered to 20141123 which is pushed to sqlite and can be interpreted as an INTEGER (which is the default type of the column).However, once SA reads back the date, it cannot parse it anymore (if we use a regexp=r"(\d{4})(\d{2})(\d{2})") as it says that 20141123 is not a string (indeed sqlite handled the value as an INTEGER).

A solution is to replace in the "CREATE TABLE" generation for the sqlite dialect, the DATE type by a TEXT type (and idem for DATETIME and TIME).

it can’t be TEXT because we can’t reflect that back as a date/datetime type.   I’d recommend the name DATE_CHAR , DATETIME_CHAR, which per https://www.sqlite.org/datatype3.html should trigger the text type affinity, however for the moment won’t work with reflection unless a column_reflect() event is used to intercept.


A hacky solution is to change the coltype in sqlite DDL Compiler (as below) ... but there should be a cleaner way to do it (but I am not familiar at all with Compilers and Dialects)

class SQLiteDDLCompiler(compiler.DDLCompiler):

    def get_column_specification(self, column, **kwargs):
        if isinstance(column.type, (DATE, TIME, DATETIME)):
            coltype="TEXT"
        else:
            coltype = self.dialect.type_compiler.process(column.type)

Mark Grandi

unread,
Nov 24, 2014, 6:01:18 PM11/24/14
to sqlalche...@googlegroups.com


I was discussing this on IRC, and it seems to be that per the sqlite3 documentation, a DATETIME can be stored in one of 3 'types':

SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

So I'm not too familiar on the sqlite internals, but it seems to be setting the actual type affinity when you try and insert data into the row/column, so if you store it as a ISO string, it sets it to TEXT, but if you are storing it like '20141008", then it thinks its an integer so therefore stores it as INTEGER.

I don't see a DATE_CHAR type listing on that url that Michael posted in the sqlite documentation, but I honestly think the code should already work. The only problem is that the processor.py:str_to_datetime_processor_factory() function is using the passed in regex (that you pass in when you create the DATETIME instance), but the 're' module only can work on strings, but we are getting back an integer.

on line 35 of processor.py, i changed

m = rmatch(value)

to be

m = rmatch(str(value))

and that seemed to work for the sample code posted above, however I'm not sure of any other side effects. Since sqlite3 can store datetimes  in 3 different formats, SQLAlchemy should be prepared to receive all 3 formats.

Michael Bayer

unread,
Nov 24, 2014, 6:35:00 PM11/24/14
to sqlalche...@googlegroups.com
On Nov 24, 2014, at 6:01 PM, Mark Grandi <markg...@gmail.com> wrote:

I was discussing this on IRC, and it seems to be that per the sqlite3 documentation, a DATETIME can be stored in one of 3 'types':

SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

So I'm not too familiar on the sqlite internals, but it seems to be setting the actual type affinity when you try and insert data into the row/column, so if you store it as a ISO string, it sets it to TEXT, but if you are storing it like '20141008", then it thinks its an integer so therefore stores it as INTEGER.

I don't see a DATE_CHAR type listing on that url that Michael posted in the sqlite documentation,

SQLite doesn’t have “types” in the usual sense, it has what’s known as a “type affinity”.  This affinity is determined based on either the name of the type, or of the data that’s placed within it.   Per https://www.sqlite.org/datatype3.html, read section 2.1, “Determination of Type Affinity”.  At the column level, this would be rule #2: "If the declared type of the column **contains any of the strings**   "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity."


but I honestly think the code should already work.

Disagree because SQLAlchemy’s date types for SQLite are documented as implementing exactly one of the above schemes: 


The implementation classes are DATETIME, DATE and TIME. These types represent dates and times as **ISO formatted strings**, which also nicely support ordering. 

The types themselves also make this clear:


Represent a Python datetime object in SQLite using a string.

The default string storage format is:

	"%(year)04d-%(month)02d-%(day)02d %(hour)02d:%(min)02d:%(second)02d.%(microsecond)06d"
The documentation is very clear that these types store strings only with limited ability to modify the format.   There should not be an expectation that these types will attempt to figure out what type of conversion to use based on the kind of data actually observed; this kind of thing is guessing, and would be error prone and non-performant.  

The use cases of types based on epoch and julian days are fully supported, by creating your own type which does the conversion you want.    An example of exactly this type is provided in the documentation for TypeDecorator at http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.TypeDecorator, however we can add this example as an explicitly listed example as well under http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#typedecorator-recipes.

class MyEpochType(types.TypeDecorator):
    impl = types.Integer

    epoch = datetime.date(1970, 1, 1)

    def process_bind_param(self, value, dialect):
        return (value - self.epoch).days

    def process_result_value(self, value, dialect):
        return self.epoch + timedelta(days=value)


Since sqlite3 can store datetimes  in 3 different formats, SQLAlchemy should be prepared to receive all 3 formats. 

I will gladly accept a pull request that contains all three variants for all three types (DATETIME, TIME, DATE, for a total of nine types) provided they are idiomatically written and include high quality full documentation and tests as well.   In particular, the numeric types should also support some degree of arithmetic, which is one of the advantages of epoch / julian day types, so some tests for that also would be nice.



Sébastien de Menten

unread,
Nov 24, 2014, 11:44:02 PM11/24/14
to sqlalche...@googlegroups.com


On Tuesday, November 25, 2014 12:35:00 AM UTC+1, Michael Bayer wrote:

On Nov 24, 2014, at 6:01 PM, Mark Grandi <markg...@gmail.com> wrote:

I was discussing this on IRC, and it seems to be that per the sqlite3 documentation, a DATETIME can be stored in one of 3 'types':

SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

So I'm not too familiar on the sqlite internals, but it seems to be setting the actual type affinity when you try and insert data into the row/column, so if you store it as a ISO string, it sets it to TEXT, but if you are storing it like '20141008", then it thinks its an integer so therefore stores it as INTEGER.

I don't see a DATE_CHAR type listing on that url that Michael posted in the sqlite documentation,

SQLite doesn’t have “types” in the usual sense, it has what’s known as a “type affinity”.  This affinity is determined based on either the name of the type, or of the data that’s placed within it.   Per https://www.sqlite.org/datatype3.html, read section 2.1, “Determination of Type Affinity”.  At the column level, this would be rule #2: "If the declared type of the column **contains any of the strings**   "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity."


but I honestly think the code should already work.

Disagree because SQLAlchemy’s date types for SQLite are documented as implementing exactly one of the above schemes: 


The implementation classes are DATETIME, DATE and TIME. These types represent dates and times as **ISO formatted strings**, which also nicely support ordering. 

The types themselves also make this clear:


Represent a Python datetime object in SQLite using a string.

The default string storage format is:

	"%(year)04d-%(month)02d-%(day)02d %(hour)02d:%(min)02d:%(second)02d.%(microsecond)06d"
The documentation is very clear that these types store strings only with limited ability to modify the format.   There should not be an expectation that these types will attempt to figure out what type of conversion to use based on the kind of data actually observed; this kind of thing is guessing, and would be error prone and non-performant.  


If I am using a storage_format that does not render "ISO formatted strings" (for instance "20141125"), should I expect more issues than simply the one of this post (not being able to parse back) ? I am thinking about date comparison or other similar things. From reading the doc "The storage format can be customized to some degree using the storage_format and regexpparameters, such as:", it was not clear to me that it had to stay an ISO formatted string (as recognised by sqlite in paragraph "Time Strings" of its doc).

If so, then it is indeed not a good idea to reuse the sqlite.DATETIME(storage_format="...", regexp="...") for my needs. I'll implement a TypeDecorator with, if dialect=sqlite:
- type = sqlite.TEXT
- process_bind_param that reuses sqlite.DATETIME(storage_format="...", regexp="...).bind_processor(engine)(value)
- process_result_value that reuses sqlite.DATETIME(storage_format="...", regexp="...).result_processor(engine, sqlite.TEXT)(value)
to avoid reimplementing the logic of parsing/writing of dates already implemented in the sqlite SA types.

Sébastien de Menten

unread,
Nov 24, 2014, 11:46:56 PM11/24/14
to sqlalche...@googlegroups.com
I have tried the following but without success (as it does not change, in the CREATE TABLE, the DATETIME to TEXT). Isn't it the way to use it ?

@compiles(DATE, 'sqlite')
@compiles(DATETIME, 'sqlite')
@compiles(TIME, 'sqlite')
def compile_date(element, compiler, **kw):
    return "TEXT"

class MyDateTime(types.TypeDecorator):
    """Used to customise the DateTime type for sqlite (ie without the separators as in gnucash)
    """
    impl = types.TypeEngine

    def load_dialect_impl(self, dialect):
        if dialect.name == "sqlite":
            return sqlite.DATETIME(
            storage_format="%(year)04d%(month)02d%(day)02d%(hour)02d%(minute)02d%(second)02d",
                regexp=r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})",
            )
        else:
            return types.DateTime()
 

Michael Bayer

unread,
Nov 25, 2014, 12:19:47 AM11/25/14
to sqlalche...@googlegroups.com
On Nov 24, 2014, at 11:44 PM, Sébastien de Menten <sdem...@gmail.com> wrote:


If I am using a storage_format that does not render "ISO formatted strings" (for instance "20141125"), should I expect more issues than simply the one of this post (not being able to parse back) ?

the fact that SQLite is interpreting as an integer when there’s no alpha chars is a new issue.  Any format can be used, including straight numeric plus a single letter even, if the format is completely numeric then apparently the DDL type name has to be changed.  this is something that should be at least documented or warned about at some point, at the moment I’m just trying to give you workarounds since there’s no immediate change planned.


I am thinking about date comparison or other similar things. From reading the doc "The storage format can be customized to some degree using the storage_format and regexpparameters, such as:", it was not clear to me that it had to stay an ISO formatted string (as recognised by sqlite in paragraph "Time Strings" of its doc).

it’s intended to be ISO but you can change the format to whatever you want.


If so, then it is indeed not a good idea to reuse the sqlite.DATETIME(storage_format="...", regexp="...") for my needs. I'll implement a TypeDecorator with, if dialect=sqlite:
- type = sqlite.TEXT
- process_bind_param that reuses sqlite.DATETIME(storage_format="...", regexp="...).bind_processor(engine)(value)
- process_result_value that reuses sqlite.DATETIME(storage_format="...", regexp="...).result_processor(engine, sqlite.TEXT)(value)
to avoid reimplementing the logic of parsing/writing of dates already implemented in the sqlite SA types.

Simple solution here is to use sqlite.DATETIME as is and just reimplement the DDL to render a type name with TEXT or CHAR in the name as referred to earlier.

Michael Bayer

unread,
Nov 25, 2014, 12:33:08 AM11/25/14
to sqlalche...@googlegroups.com
On Nov 24, 2014, at 11:46 PM, Sébastien de Menten <sdem...@gmail.com> wrote:



On Monday, November 24, 2014 8:52:14 PM UTC+1, Michael Bayer wrote:

On Nov 24, 2014, at 2:19 PM, Sébastien de Menten <sdem...@gmail.com> wrote:

A hacky solution is to change the coltype in sqlite DDL Compiler (as below) ... but there should be a cleaner way to do it (but I am not familiar at all with Compilers and Dialects)

class SQLiteDDLCompiler(compiler.DDLCompiler):

    def get_column_specification(self, column, **kwargs):
        if isinstance(column.type, (DATE, TIME, DATETIME)):
            coltype="TEXT"
        else:
            coltype = self.dialect.type_compiler.process(column.type)



I have tried the following but without success (as it does not change, in the CREATE TABLE, the DATETIME to TEXT). Isn't it the way to use it ?

there’s no need to use a custom type in the solution where we merely change the compilation of DATE, TIME, and DATETIME; you continue using those types.  The following example illustrates a full round trip using your format (and the CHAR_* naming format):

from sqlalchemy.dialects.sqlite import DATE, DATETIME, TIME
from sqlalchemy.ext.compiler import compiles


@compiles(DATE, 'sqlite')
@compiles(DATETIME, 'sqlite')
@compiles(TIME, 'sqlite')
def compile_date(element, compiler, **kw):
    return "CHAR_%s" % element.__class__.__name__

from sqlalchemy import Table, Column, create_engine, MetaData, select
import datetime


m = MetaData()
t = Table(
    'dates', m,
    Column('date', DATE(
        storage_format="%(year)04d%(month)02d%(day)02d",
        regexp=r"(\d{4})(\d{2})(\d{2})",
    )),
    Column('datetime', DATETIME(
        storage_format="%(year)04d%(month)02d%(day)02d%(hour)02d%(minute)02d%(second)02d",
        regexp=r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})",
    )),
    Column('time', TIME(
        storage_format="%(hour)02d%(minute)02d%(second)02d",
        regexp=r"(\d{2})(\d{2})(\d{2})",
    ))
)

e = create_engine("sqlite://", echo='debug')
m.create_all(e)

now = datetime.datetime.today().replace(microsecond=0)
nowdate = now.date()
nowtime = now.time()

with e.begin() as conn:
    conn.execute(
        t.insert().values(
            date=nowdate,
            datetime=now,
            time=nowtime
        )
    )
    row = conn.execute(select([t.c.date, t.c.datetime, t.c.time])).first()
    assert row == (nowdate, now, nowtime), "%s %s" % (row, (nowdate, now, nowtime))

output (note echo=‘debug’ illustrates strings coming back):


CREATE TABLE dates (
date CHAR_DATE, 
datetime CHAR_DATETIME, 
time CHAR_TIME
)


2014-11-25 00:30:54,563 INFO sqlalchemy.engine.base.Engine ()
2014-11-25 00:30:54,563 INFO sqlalchemy.engine.base.Engine COMMIT
2014-11-25 00:30:54,564 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2014-11-25 00:30:54,564 INFO sqlalchemy.engine.base.Engine INSERT INTO dates (date, datetime, time) VALUES (?, ?, ?)
2014-11-25 00:30:54,564 INFO sqlalchemy.engine.base.Engine ('20141125', '20141125003054', '003054')
2014-11-25 00:30:54,564 INFO sqlalchemy.engine.base.Engine SELECT dates.date, dates.datetime, dates.time 
FROM dates
2014-11-25 00:30:54,564 INFO sqlalchemy.engine.base.Engine ()
2014-11-25 00:30:54,565 DEBUG sqlalchemy.engine.base.Engine Col ('date', 'datetime', 'time')
2014-11-25 00:30:54,565 DEBUG sqlalchemy.engine.base.Engine Row (u'20141125', u'20141125003054', u'003054')
2014-11-25 00:30:54,565 INFO sqlalchemy.engine.base.Engine COMMIT



Sébastien de Menten

unread,
Nov 25, 2014, 2:46:46 PM11/25/14
to sqlalche...@googlegroups.com


On Tuesday, November 25, 2014 6:19:47 AM UTC+1, Michael Bayer wrote:

On Nov 24, 2014, at 11:44 PM, Sébastien de Menten <sdem...@gmail.com> wrote:


If I am using a storage_format that does not render "ISO formatted strings" (for instance "20141125"), should I expect more issues than simply the one of this post (not being able to parse back) ?

the fact that SQLite is interpreting as an integer when there’s no alpha chars is a new issue.  Any format can be used, including straight numeric plus a single letter even, if the format is completely numeric then apparently the DDL type name has to be changed.  this is something that should be at least documented or warned about at some point, at the moment I’m just trying to give you workarounds since there’s no immediate change planned.


Your workaround works perfectly and is indeed the solution i was looking for (and not my horrible hack) ! 

I wonder if the DDL type name shouldn't have a TEXT affinity instead of a NUMERIC given the use by SA of this type (storing "strings"). As we can see, the current situation works just because we push strings to the column and that sqlite is fine storing/retrieving strings in a numeric column. 
If this is too much of a change (in terms of backward-compatibility), maybe just a try/except/raise clause to clarify the error message when getting back an int for date field in sqlite could be sufficient (+doc update)... but it is 2 cents tip :-)

Michael Bayer

unread,
Nov 25, 2014, 3:19:17 PM11/25/14
to sqlalche...@googlegroups.com

> On Nov 25, 2014, at 2:46 PM, Sébastien de Menten <sdem...@gmail.com> wrote:
>
>
> Your workaround works perfectly and is indeed the solution i was looking for (and not my horrible hack) !
>
> I wonder if the DDL type name shouldn't have a TEXT affinity instead of a NUMERIC given the use by SA of this type (storing "strings"). As we can see, the current situation works just because we push strings to the column and that sqlite is fine storing/retrieving strings in a numeric column.
> If this is too much of a change (in terms of backward-compatibility), maybe just a try/except/raise clause to clarify the error message when getting back an int for date field in sqlite could be sufficient (+doc update)... but it is 2 cents tip :-)

yes I agree, a solution here would be that we ensure a name of “TEXT” or “CHAR” is in the DDL name so that the type affinity is definitely string, however I’m still evaluating if I want to spring that on the user base as a full change, or perhaps an option or something that is documented / warned about.


Michael Bayer

unread,
Nov 25, 2014, 3:49:53 PM11/25/14
to sqlalche...@googlegroups.com
the issue created is at https://bitbucket.org/zzzeek/sqlalchemy/issue/3257/date-datetime-time-names-on-sqlite-use, but for your purposes, you have the workaround so you’re all set. Thanks for reporting the issue.


Reply all
Reply to author
Forward
0 new messages