Truncated Data on Migration

306 views
Skip to first unread message

Jonathan Z.

unread,
Jan 22, 2011, 11:02:47 AM1/22/11
to web2py-users
I have a number of fields specifying a length > 255 (some as large as
2K). On a recent schema migrate, web2py truncated all of these fields
to 255, resulting in substantial data loss.

My database is mysql. I'm on version 1.91.6 of web2py. I'm pretty
sure this has something to do with the new DAL.

Massimo Di Pierro

unread,
Jan 22, 2011, 1:55:00 PM1/22/11
to web2py-users
Can you tell us what the model was before and after migration?

Massimo

Jonathan Z.

unread,
Jan 22, 2011, 2:48:38 PM1/22/11
to web2py-users
Massimo,

Here is an example of one of the tables that got a truncated field:

before...

db.define_table("thing",
Field("user_id", db.auth_user, notnull=True, readable=False,
writable=False),
Field("title", "string", length=128, notnull=True, unique=True,
required=True),
Field("slug", "string", length=128, notnull=True, unique=True,
required=True),
Field("state", "string", length=10, notnull=True, required=True,
default="created", requires=IS_IN_SET(("created", "started",
"completed"))),
Field("description", "string", length=2048),
Field("created", "datetime", notnull=True, readable=False,
writable=False, default=request.now),
Field("updated", "datetime", readable=False, writable=False,
update=request.now),
Field("uuid", "string", length=16, notnull=True, unique=True,
readable=False, writable=False),
format="%(title)s"
)

after...

db.define_table("thing",
Field("user_id", db.auth_user, notnull=True, readable=False,
writable=False),
Field("title", "string", length=128, notnull=True, unique=True,
required=True),
Field("slug", "string", length=128, notnull=True, unique=True,
required=True),
Field("state", "string", length=10, notnull=True, required=True,
default="created", requires=IS_IN_SET(("created", "started",
"completed"))),
Field("description", "string", length=2048),
Field("things", "list:string"), # NEW FIELD
Field("created", "datetime", notnull=True, readable=False,
writable=False, default=request.now),
Field("updated", "datetime", readable=False, writable=False,
update=request.now),
Field("uuid", "string", length=16, notnull=True, unique=True,
readable=False, writable=False),
format="%(title)s"
)

After this change (addition of a new field) and subsequent DB
migration, the "description" field was truncated from 2048 characters
to 255. Note that this change didn't alter the field length on
sqlite. But it did truncate the description field on mysql.

On Jan 22, 10:55 am, Massimo Di Pierro <massimo.dipie...@gmail.com>
wrote:

Jonathan Z.

unread,
Jan 22, 2011, 3:43:30 PM1/22/11
to web2py-users
It's probably also important to note that the "before" table existed
since 1.85.x. I keep the "migrate=False" flag set on it, until I make
a change. As I noted previously, the change happened on 1.91.6. So
the issue is data loss affected by changes in the web2py DAL.

I've been digging around, and my guess is that this line (3694) in
dal.Table is the problem:

field.length = min(field.length,self._db and
self._db._adapter.maxcharlength or INFINITY)

Since maxcharlength for MySQLAdapter is 255, this has the affect of
silently truncating my specified field lengths.

Massimo Di Pierro

unread,
Jan 22, 2011, 4:06:12 PM1/22/11
to web2py-users
I agree that is the problem:

Field("description", "string", length=2048),

according to: http://dev.mysql.com/doc/refman/5.0/en/char.html
"Values in VARCHAR columns are variable-length strings. The length can
be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to
65,535 in 5.0.3 and later versions".

We set varchar max to 255 to avoid problems with older versions of
mysql. Since problems were reported by users.
There should not have been a migration in your case. I will look in
this more.

Jonathan Z.

unread,
Jan 22, 2011, 4:25:27 PM1/22/11
to web2py-users
Thanks for checking it out. In the mean time, I'm working around the
problem by overriding maxcharlength right after setting db, like this:

db = DAL(...)
db._adapter.maxcharlength = 65535

Now I just need to restore a bunch of truncated data :-(

On Jan 22, 1:06 pm, Massimo Di Pierro <massimo.dipie...@gmail.com>

Chris

unread,
Sep 14, 2012, 9:39:14 AM9/14/12
to web...@googlegroups.com
The main problem is that the truncation is done silently.  In my case, we were creating new data models with fields like string(1000) and just recently a team member noted the result in the database was varchar(255).  It violates the principle of least amazement :)  that a directive to create a data field created a field of a different size and didn't complain.

The ideal solution would be to detect the version of MySQL (via select @@version;) during define_table and set the max length based on the version.  (I thought it might be possible to query the MySQL environment variables (show variables;) to get the max length more directly, but that doesn't seem to exist; maybe someone else knows a way to get that.)

I suppose the same could be done for other DBMSs. And whatever the max length is for the particular DBMS / version, if the FIeld() call specifies a length longer than the max, return the same error as for other invalid field lengths (like -1).

Does this seem like the right approach?  I'd be happy to code & test this change and submit for the experts' review.

Chris

unread,
Apr 3, 2013, 2:40:16 PM4/3/13
to web...@googlegroups.com
It would probably be difficult to keep DAL up to date with DBMS releases -- would hate to have to rev some part of web2py every time even one of the supported DBMSs has a version change.  And even then, version alone may not answer the question, since max field lengths can be affected by database configuration options as well.

A better solution would be for DAL to probe the actual DBMS capabilities -- if there is a question about an issue like a max string length, allow DAL to control one table (dbms_capabilities?) that can be used to test actual limits.  E.g. if a DB model file specifies a string field of length 8000 and that is above the previously-tested limit, drop and create dbms_capabilities with a string(8000) field, write 8000 characters to it, read it back, and see if the result matches the input.  This would be a very thorough way to find the limits of any database exactly as configured.

Not sure if this would be needed for other data types too (decimal precision or ??) but could be generalized.

The original problem of silently accepting a database definition that the DBMS can't produce is really concerning.




On Friday, September 14, 2012 9:39:14 AM UTC-4, Chris wrote:
... whatever the max length is for the particular DBMS / version, if the Field() call specifies a length longer than the max, return the same error as for other invalid field lengths (like -1).


Niphlod

unread,
Apr 3, 2013, 2:49:44 PM4/3/13
to web...@googlegroups.com
that's more a place for unittesting the app itself than having web2py to check for it for every db (and every version of engine) at all times.
It's true that DAL absracts away the 90% of incompatibilities, but still you *need* to evaluate your choices when it comes to a production app, being aware of all pros and cons.

Cliff Kachinske

unread,
Apr 3, 2013, 3:09:16 PM4/3/13
to web...@googlegroups.com
I don't think it's smart to let DAL migrate your tables in production.  Much better to write your own scripts to handle it.  

Then of course you should test those scripts before you get to the production instance.  

The simplest and easiest place to create and test the scripts is on your development machine.  That means your development machine should be running the same DBMS as your production box.

Using the same DBMS on your production and development boxes is the right way to do things IMO, because it will help you avoid the kinds of surprises that triggered this thread in the first place.

At some point you have to get to know your DBMS.

Chris

unread,
Jun 19, 2013, 8:52:26 AM6/19/13
to web...@googlegroups.com
The main problem is that the failure is silent.  It makes no sense that a developer declares a field to be length 1000, and DAL arbitrarily reduces that to 255 instead of providing an error message.  I understand why 255 may be a safe default limit for MySQL; but if so then fail the field creation with a relevant error message (and maybe reference Jonathan's solution re: how to override maxcharlength if the developer knows the DBMS version can handle the larger limit).

In the current 2.5.1 code, looking in dal.py:

class MySQLAdapter(BaseAdapter):
    ....
    maxcharlength = 255

...

class Table(object):
    ...
    def __init__(        self,
        db,
        tablename,
        *fields,
        **args
        ):
        ...
        for field in fields:
            ...
            if db and not field.type in ('text', 'blob', 'json') and \
                    db._adapter.maxcharlength < field.length:
                field.length = db._adapter.maxcharlength   ### problem line


I'd recommend the last line above, marked ### problem line, be replaced with something like, raise ValueError('Requested field length exceeds adapter maxcharlength, cannot create.  See http://whatever for a way to override this limit.')

??

Derek

unread,
Jun 19, 2013, 7:15:44 PM6/19/13
to web...@googlegroups.com
It's not web2py's fault that MySQL silently truncates data.

Chris

unread,
Jun 22, 2013, 12:46:33 PM6/22/13
to web...@googlegroups.com
Derek, the main issue isn't truncation at insert time.  In a web2py DAL-mediated create table statement, DAL code unilaterally reduces MySQL varchar lengths to 255, whether the specific version of MySQL could tolerate a longer varchar or not.  The developer says, column abc should be varchar(1000); DAL runs the define_table; the resulting column is varchar(255); no error is raised; developer has no reason to think there's a problem; then inserts data and mySQL silently truncates (corrupts) it.  The DAL problem would be equally bad, although easier for the developer to notice / and sooner, if MySQL didn't perform truncation.  The fact that MySQL performs truncation silently makes it that much more important that DAL not change the column definition silently.

If I use the mysql command line tool, create table xyz (abc varchar(1000)) results in a column of length 1000.  If I use the logically equivalent DAL define_table statement, it results in a column of length 255.  The two approaches, given the same inputs, really should have the same outcome.

The behavior is also inconsistent among DB engines.  For any DBMS other then MySQL, DAL enforces a max varchar length of 2^15.  However, Oracle 9i has a max varchar length of 4000; SQL Server 2005 has a max of 8000; etc.  Many DBMS versions have some limit.  In those cases, DAL attempts to define the table per the developer's instructions and returns a DBMS-specific error if a column length is unacceptable.  The same should be done for MySQL.

The correct behavior is to return an error if the developer issues a define_table statement that can't be fulfilled by the database layer.

Massimo Di Pierro

unread,
Jun 22, 2013, 2:16:58 PM6/22/13
to web...@googlegroups.com
There are historical reason for this and I do not oppose to change it since modern mysql versions support longer varchar.
You can change this yourlself:

db = DAL('mysql://...'_
db._adapter.maxcharlength = 1024 # or whatever you want

If we change the default in web2py this would trigger a migration. Can you help us test it?
Should we change it or remove the check completely. Whatever value we set it to (for mysql and other dbs) it will change in the future and we will run in the same problem. Should web2py perhaps have a check and raise an error? Or should it pass the length to the db without checks?

Anthony

unread,
Jun 22, 2013, 2:55:01 PM6/22/13
to web...@googlegroups.com
In general, I would say it's not a good idea to silently change the explicitly set length of a field. Either pass the length to the database and let the database throw an exception, or raise an exception in web2py (the former is probably preferable, since web2py won't know the true limit of the particular database version).

Anthony

Massimo Di Pierro

unread,
Jun 23, 2013, 12:53:05 AM6/23/13
to web...@googlegroups.com
OK. I removed the maxcharlength from dal.py (in trunk).
We now need to understand what will happen to existing mysql apps.
Reply all
Reply to author
Forward
0 new messages