vanilla SQLite supports a case insensitive LIKE statement only for
ASCII characters. So the following works
SELECT 'a' LIKE 'A'
-> 1
but for characters outside the ASCII range the LIKE statement is case sensitive:
SELECT 'ä' LIKE 'Ä'
-> 0
The same holds true for the LOWER and UPPER functions which work only
for ASCII characters:
SELECT UPPER 'abc'
-> ABC
SELECT UPPER 'μαöä'
-> μαöä
To solve this, SQLite supports the ICU library [1] via the ICU
extension [2]. The problem [3] and a solution [4] has already been
presented on this list. However, there are several drawbacks in using
the ICU library:
• Compared to the quite small and lightweight SQLite library, ICU is
quite large and weights in with several MB.
• The ICU library is a C++ library which brings in a dependeny to the
C++ runtime library.
• ICU should be compiled with the same compiler the Python version was
compiled with. Since Python 2.5 on Windows was compiled using MSVC
2003 and ICU only ships with a solution file for MSVC 2009 this is no
easy task.
• Queries using LIKE become about 3 times slower.
This bothered me quite some time now, and with some luck I recently
came across some simpler solution. sqlite3_unicode.zip [5] is a small
extension for SQLite which tries to solve this case folding problem.
It has been around a while and has been updated to support also more
recent SQLite versions [6]. I was finally able to compile it, and the
latest version, which contains bugfixed tables for accented latin
characters can be found attached to this email. The precompiled
Windows binary for Python 2.5 can be found here: [7] This binary was
compiled to support proper case folding for the LIKE statement. It
supports the UPPER and LOWER functions for characters outside the
ASCII range. The only thing changed from the default #defines was the
automatic unaccenting, which was turned off. So with this build 'ä'
is different from 'a' which is not the case using the default
settings.
• The resulting library is only 9 kB larger than the original one.
• No dependency to the C++ runtime.
• I used MSVC 2003 to compile it, so the same as used for Python 2.5.
• Queries run at the same speed as without the extension.
Currently, the head version of web2py has to be used, which was
recently modified [8] to use a locally installed pysqlite library
before trying to import the version which ships with Python. With
this version you can try in the web2py shell the following:
print db.executesql('SELECT UPPER("ä")')[0][0].encode('utf-8')
which will output the character Ä.
Massimo: Should this version of SQLite be included in the default
web2py binary distribution? The used source file is in the public
domain and has no copyright. I am only able to provide the Windows
binary.
Kind regards,
Markus
[1] http://site.icu-project.org/
[2] http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt
[3] http://groups.google.com/group/web2py/browse_thread/thread/a98973adb0225649/
[4] http://groups.google.com/group/web2py/browse_thread/thread/43fe62d6da330833/
[5] http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support/
[6] http://ioannis.mpsounds.net/blog/2009/01/11/sqlite3_unicode-updated-for-sqlite3-v367/
[7] http://xile.org/le/pysqlite-2.5.5.win32-py2.5.exe
[8] http://bazaar.launchpad.net/~mdipierro/web2py/devel/revision/870
Since there were no objections, it should probably be included:
http://groups.google.com/group/web2py/browse_thread/thread/a7688ce510d46357/
Markus
I precompiled it only for Python 2.5 for Windows since I use this
version. The official web2py binary download also uses 2.5, so it
would be usable there too.
For 2.4 or 2.6 one has to compile the pysqlite extension and
incorporate sqlite3_unicode.c in the compilation.
Markus
No. The data in the database is not affected by this patch. It only
changes how non-ASCII characters are treated in queries when using
UPPER, LOWER, LIKE, ...
> 2) Can you send me step by step instructions for including the file
> [7] that you linked into web2py_win.zip
• Download the source version of pysqlite from
http://oss.itsystementwicklung.de/trac/pysqlite/
• You need to have MSVC 2003 installed.
• Try building pysqlite by calling
C:\Python25\python setup.py build_static
C:\Python25\python setup.py bdist_wininst
• Place the attached file sqlite3_unicode.c from my original email
into the 'amalgamation' directory.
• Apply the patch which is attached to this email.
• Rebuild pysqlite using
C:\Python25\python setup.py build_static
C:\Python25\python setup.py bdist_wininst
Now you have an installabel pysqlite extension in the 'dist' directory.
Kind regards,
Markus
http://xile.org/le/web2py_win.zip
I will leave the file there for about a week.
Markus