Request for SQLITE_ENABLE_COLUMN_METADATA on by default in Pypi

30 views
Skip to first unread message

Peter Maivald

unread,
Apr 18, 2023, 4:20:00 PM4/18/23
to python-sqlite
Hi, Thanks for making APSW maximally compatible with so many variations of Python and Sqlite. I am using APSW installed by pip on debian and Python 3.9.2, and I don't have description_full available even though the Sqlite C library I have installed does support sqlite3_column_table_name, etc. (I regularly use it through the C api.) I realize that this could potentially break those situations where someone does not have that enabled in their Sqlite library, but my guess is that situation would not be common.

Of course, the most elegant solution would be to detect if the loaded library has these functions defined, but perhaps that's not important. I'll plan to look into compiling APSW myself, but I was hoping to avoid that since it seems that having SQLITE_ENABLE_COLUMN_METADATA on by default shouldn't be a problem.

Roger Binns

unread,
Apr 18, 2023, 4:48:18 PM4/18/23
to Python-SQLite group
On Tue, 18 Apr 2023, at 11:03, Peter Maivald wrote:
> I am using APSW installed by pip on debian and
> Python 3.9.2, and I don't have description_full available even though
> the Sqlite C library I have installed does support
> sqlite3_column_table_name, etc.

When you install APSW from pip, the SQLite library is embedded inside, and not exposed outside of the extension. The system SQLite is completely ignored.

> (I regularly use it through the C api.)
> I realize that this could potentially break those situations where
> someone does not have that enabled in their Sqlite library, but my
> guess is that situation would not be common.

The SQLite configuration in terms of which APIs are present has to be known at compile time for APSW, whether statically embedding or dynamic linking, because of how C functions are called.

> Of course, the most elegant solution would be to detect if the loaded
> library has these functions defined, but perhaps that's not important.

That would only be possible if the functions were found at runtime using dlsym/GetProcAddress which would require a drastic rearchitecting of APSW. In the earlier days I did consider having APSW be a SQLite loadable extension because that could work (a table of function pointers is provided at init time). But yeah that is a different thing.

> I'll plan to look into compiling APSW myself, but I was hoping to avoid
> that since it seems that having SQLITE_ENABLE_COLUMN_METADATA on by
> default shouldn't be a problem.

I'm getting to a solution for you, but some context is worthwhile ...

SQLITE_ENABLE_COLUMN_METADATA is off by default by choice of the authors of SQLite, and they should know best. I did look into the actual implementation a while back and it was several additional memory allocations and string copies for each prepared statement. They take the "Lite" in the name seriously.

Column names are a fairly frequent post in the SQLite forums. Question 28 in the FAQ also covers it: https://www.sqlite.org/faq.html#q28 - you are expected to name your result columns in the query if you want them to be meaningful, and not depend on arbitrary names the library may happen to come up with.

Onto a resolution. You can grab the APSW source distribution from pypi or the github releases. The pypi version includes a setup.cfg that forces a download and embedding of SQLite. Delete the setup.cfg to prevent that.

Then use something like:

python3 setup.py build_ext --use-system-sqlite-config install --user

Of note the --use-system-sqlite-config flag will use ctypes to find a SQLite library and find its compile options, applying them to the APSW build.

This is covered in the documentation, but there is a lot of detail in there. Feedback for improvements is welcome.

https://rogerbinns.github.io/apsw/download.html
https://rogerbinns.github.io/apsw/build.html

As for turning on SQLITE_ENABLE_COLUMN_METADATA by default in the same setup.cfg so all pip installs get it, is something I've considered. Feedback welcome.

Roger

Peter Maivald

unread,
Apr 19, 2023, 12:06:59 PM4/19/23
to python-sqlite
Thanks for the detailed answers, since you asked for feedback, I'll try sharing my thoughts...
 
SQLITE_ENABLE_COLUMN_METADATA is off by default by choice of the authors of SQLite, and they should know best. I did look into the actual implementation a while back and it was several additional memory allocations and string copies for each prepared statement. They take the "Lite" in the name seriously.
 
I like the "Lite," but that was extremely important, I would use the C interface. I think that once Python is in the mix a few strdups wouldn't make a perceptible difference. Interestingly enough, I've been using the SQLite package on debian for some time and apparently they do have SQLITE_ENABLE_COLUMN_METADATA on. That said, I imagine that I could live without this, but I thought that switching to Python means not worrying about a few extra things going on internally.

Column names are a fairly frequent post in the SQLite forums. Question 28 in the FAQ also covers it: https://www.sqlite.org/faq.html#q28 - you are expected to name your result columns in the query if you want them to be meaningful, and not depend on arbitrary names the library may happen to come up with.

Actually, I don't usually use the column names in my queries at all, rather I index into the rows by number. The times I most use table and column names is for prototyping and testing, when I want to automatically display or log as much information about the query as possible. In both situations, I often use querys to search for errors in a database, if they return no rows, then I just continue on. If they do return some rows, then I log the output using column and table names. For prototyping, I like to quickly write queries and worry about naming columns, etc. later.

While thinking about this, I realized that I often like to see if there's any data before deciding what to do next. I think a function like bool hasdata(APSWCursor *self) return self->status == C_ROW; /*exported to Python cursor.hasdata()*/.

It's also occurred to me to be able to call something like sqlite3_reset on a cursor to start a result set over again, but now I think that those situations can be handled better in other ways.

Onto a resolution. You can grab the APSW source distribution from pypi or the github releases.

Let me summarize how I got to this point:

I just tried import apsw, and it worked! Apparently I had the debian calibre package installed which has a dependency on python3-apsw. That was good for a while, but I got interested in apsw.ext which the python3-apsw package doesn't have (or any other debian package I think).

To install from pypi, I needed to uninstall python3-apsw, which I could do by editing /var/lib/dpkg/status and removing the python3-apsw dependency so I could keep calibre. After past experiences with python debian packages, I checked for leftovers and found the files /usr/lib/python3/dist-packages/{apsw-3.34.0_r1.egg-info,apsw.cpython-39-x86_64-linux-gnu.so} which I deleted to avoid possible later headaches.

Installing the pypi went quite easily, as did removing it once I went to try building it. I wasn't set on using the sqlite system library, for some reason, I just made the assumption the pypi package was.

This is covered in the documentation, but there is a lot of detail in there. Feedback for improvements is welcome.
https://rogerbinns.github.io/apsw/download.html
https://rogerbinns.github.io/apsw/build.html

I juggled around the calls to setup.py for a while, but without looking into the setup.py code:
python3 setup.py fetch --all
python3 setup.py build_ext --definevalues SQLITE_ENABLE_COLUMN_METADATA
python3 setup.py build
sudo python3 setup.py install
python3 setup.py test

I have it working now, but some things I couldn't figure out from the documentation:
What is the difference between fetch --all and --sqlite (or are these the same)?
What is the difference between build and build_ext? Do you need both and in what order (build doesn't take the --definevalues argument)?
Where does test get apsw from? It wouldn't run until I ran the install, is that required?

As for turning on SQLITE_ENABLE_COLUMN_METADATA by default in the same setup.cfg so all pip installs get it, is something I've considered. Feedback welcome.

My vote is for turning it on. 

Roger Binns

unread,
Apr 19, 2023, 8:00:09 PM4/19/23
to Python-SQLite group


On Wed, 19 Apr 2023, at 08:59, Peter Maivald wrote:
> I've been
> using the SQLite package on debian for some time and apparently they do
> have SQLITE_ENABLE_COLUMN_METADATA on.

I went and checked a bunch of Linux distros and they all had the option on, but they didn't agree on most other options! For example only some have SQLITE_ENABLE_API_ARMOR, which APSW also enables by default.

So it will be enabled by default in the next release:

https://github.com/rogerbinns/apsw/issues/435

> The times I most use table and
> column names is for prototyping and testing, when I want to
> automatically display or log as much information about the query as
> possible.

That is also where I want to make APSW more useful. There is a lot of work behind the scenes to catch any potential bugs in code, and give useful diagnostics. That is one reason API armour is on, and why apsw.trace exists.

> While thinking about this, I realized that I often like to see if
> there's any data before deciding what to do next.

This is a bit tricky. It would be easy if Python had a standard way of telling if an iterator is exhausted, but you can't without asking for the next item. If the status is C_ROW then you are sitting at a result row, but that doesn't mean there are any more.

Code I've done recently uses apsw.ext.query_info() to get the information about the query and columns, and then for/else for the rows:

for row in con.execute("....."):
print(row)
else:
print("no rows returned")

> It's also occurred to me to be able to call something like
> sqlite3_reset on a cursor to start a result set over again

That is quite doable, although as you noted it isn't that necessary. The second time someone finds it useful I'll add it.

> I just tried import apsw, and it worked! Apparently I had the debian
> calibre package installed which has a dependency on python3-apsw. That
> was good for a while, but I got interested in apsw.ext which the
> python3-apsw package doesn't have (or any other debian package I think).

The various distros are behind by varying amounts. The version of APSW requires at least the corresponding version of the SQLite library, so they have to decide to update SQLite first. Repology has a nice table of what version is in each distro:

https://repology.org/project/python:apsw/versions

> To install from pypi, I needed to uninstall python3-apsw

That is not the case. The pypi installation completely ignores whatever SQLite is on your system, and even if it didn't that SQLite would probably be too old.

The trick is to install with --user which puts apsw below your home directory. It is only if you are running pip as root that you'll clash with the system package. When you get to Python 3.11 pip will give an error message that "This environment is externally managed" and points to PEP 668 for details.

> What is the difference between fetch --all and --sqlite (or are these
> the same)?

In the olden days there used to be additional components distributed from the SQLite site like fts and an asyncvfs (WAL is better). So --all gets everything while --sqlite only gets SQLite. The SQLite team now include fts directly in the distributed amalgamation, so --all doesn't get anything extra. But every time I go to remove --all I see rumblings on the forum about there being something that would apply. I'll go ahead and remove it from the docs, until it actually does something again.

> What is the difference between build and build_ext?

build_ext only builds an extension coded in C. build runs all the build steps which could include doc, translation tables, fortran extensions, and build_ext. It was an important distinction over 20 years ago. Python has been trying to replace setup.py completely, but so far nothing has stuck, which is why I am still stuck on setup.py.

> Do you need both and in what order (build doesn't take the --definevalues argument)?

I should probably reorder the doc to have this first, but what I want you to do, and what would have been easiest is to follow the recommended command:

https://rogerbinns.github.io/apsw/build.html#recommended

Unfortunately the install step is discouraged in recent Python versions so updating those instructions is going to be tricky. As you can probably tell, Python keeps making it difficult for compiled extensions.

build can be omitted, and doesn't take --definevalues because build_ext does as that is affecting the compilation.

> Where does test get apsw from? It wouldn't run until I ran the install,
> is that required?

Yes, the tests import apsw as the first thing so apsw has to be somewhere Python looks for it.

I think I am going to have to combine the Download and Building pages into one, and have it start with the most likely to be used things, then getting into the weeds as the page progresses.

Roger

Peter Maivald

unread,
Apr 23, 2023, 11:53:21 AM4/23/23
to python-sqlite
Thanks for the detailed explanations. This was the first time I built a Python extension, so my issues with that are rather basic.

On Thursday, April 20, 2023 at 2:00:09 AM UTC+2 Roger Binns wrote:
This is a bit tricky. It would be easy if Python had a standard way of telling if an iterator is exhausted, but you can't without asking for the next item. If the status is C_ROW then you are sitting at a result row, but that doesn't mean there are any more.
 
If the status is C_ROW then next would indeed return a row. What I had been using in C would most directly translate to

cur = con.execute(...)
if cur.status == C_ROW: LogError('Inconsistent ...', cur)

I see what you mean about iterators, though. It should instead look like

def TestAndLogError(message, cur):
  for row in cur: print(row)
  else: return False
  return True
cur = con.execute(...)
if TestAndLogError('Inconsistent ...', cur): ...
cur.execute(...)
if TestAndLogError(...): ...

I should probably reorder the doc to have this first, but what I want you to do, and what would have been easiest is to follow the recommended command:
https://rogerbinns.github.io/apsw/build.html#recommended
build can be omitted, and doesn't take --definevalues because build_ext does as that is affecting the compilation.

Hence my confusion about build vs build_ext and specifying --definevalues

I think I am going to have to combine the Download and Building pages into one, and have it start with the most likely to be used things, then getting into the weeds as the page progresses.

I think everyone may have different experiences with these pages. For me, the Download page was fine, but the Build page mentioned setup.py, and I didn't realize that was a standard python utility...

Roger Binns

unread,
Apr 23, 2023, 1:04:43 PM4/23/23
to Python-SQLite group
On Sun, 23 Apr 2023, at 08:52, Peter Maivald wrote:
> Thanks for the detailed explanations. This was the first time I built a
> Python extension, so my issues with that are rather basic.

It used to be simple - APSW still does it the way it was done two decades ago. Since then the mechanism has been deprecated, replaced multiple times, reabsorbed, stalled, authorities setup, alternatives (eg Conda) and more!

> If the status is C_ROW then next would indeed return a row.

My confusion was over a general next call vs the very first one. For example if the iterator of "select 3" is sitting at the result row of (3,) then a call to next wouldn't return another row.

It looks like what you really want is to know if the cursor is exhausted - ie would treating it as an iterator return anything. Under the hood that looks for C_DONE state. A better name than exhausted is needed.

> cur = con.execute(...)
> if cur.status == C_ROW: LogError('Inconsistent ...', cur)

So what we would have is:

cur = con.execute(....)
if cur.exhausted:
....
else:
for row in cur:
...

However this doesn't generally apply unless the code doesn't know what it is executing. "Normal" code would be expecting query results for the queries it issued. eg you won't do a SELECT unless you wanted the results, while you don't care about CREATE TABLE other than it succeeds.

> I think everyone may have different experiences with these pages. For
> me, the Download page was fine, but the Build page mentioned setup.py,
> and I didn't realize that was a standard python utility...

I should probably have a brief mention that setup.py is standard, and also deprecated, and also doesn't have a suitable replacement!

Roger

Peter Maivald

unread,
Apr 26, 2023, 10:04:41 AM4/26/23
to python-sqlite
BTW:
for row in con.execute("....."):
    print(row)
else:
    print("no rows returned")
doesn't work since the else clause is executed after any normal termination of the for, including the case when there are no rows. The primary use of else is that it doesn't get executed during an exit with a break statement. This is what works:
row = None
for row in con.execute("....."):
    print(row)
if row is None:

    print("no rows returned")
Reply all
Reply to author
Forward
0 new messages