Parameterized conversion types and dataclasses

12 views
Skip to first unread message

Peter Maivald

unread,
May 11, 2023, 1:40:31 PM5/11/23
to python-sqlite
Hello, I'm interested in parameterized types and wondered if anyone had some thoughts about this, particularly when it comes to converting them from an sql query to python types. What I mean by parameterized types is something like the Decimal type in the python library.

I made some sample code to illustrate an idea I had for using this: https://github.com/rogerbinns/apsw/issues/440

Roger Binns

unread,
May 11, 2023, 2:15:25 PM5/11/23
to Python-SQLite group
On Thu, 11 May 2023, at 10:40, Peter Maivald wrote:
> Hello, I'm interested in parameterized types

What are your thoughts on how to detect what the intended Python type should be?

The declared type string for a table column is one source that the existing code uses. Another source is result column names, but that will be project specific.

A shortcut would be to pickle values going into SQLite, and then unpickle all blobs coming out, which would retain full fidelity of the values.

Roger

Peter Maivald

unread,
May 11, 2023, 4:34:31 PM5/11/23
to python-sqlite
The declared type string for a table column is one source that the existing code uses. Another source is result column names, but that will be project specific.

Perhaps it's a project specific choice to use column names as a source for type information; however, due to SQLite it is the *only* source for type information for queries that do arithmetic in the column name part of the select:

SELECT sum(col1) AS sum_DEC$2 FROM ...
SELECT col1,col2,col1+col2 AS sum_DEC$2 FROM ...

In the two examples above, SQLite does not return any type information at all for the values sum(col1) and col1+col2. So, any project that does not use the name from the AS clauses does not have any other way to get that information from the SQL code. The fact that this is the only way to get type information in this situation says to me that there isn't a real choice here. If you want the information, you have use the name from the AS clause.

That said, I don't see any big drawback to doing it this way. Yes you have to take some care in parsing out the name, type and arguments from the AS clause, but everything needs some care.

A shortcut would be to pickle values going into SQLite, and then unpickle all blobs coming out, which would retain full fidelity of the values.

This shortcut costs you quite a few things. You can't easily use any other front end than python. You can't use arithmetic on the columns in SQL, comparison operators, GROUP BY maybee ... I'd believe that it also bloats the database both in terms of storage space and access speed.

Roger Binns

unread,
May 11, 2023, 6:18:11 PM5/11/23
to Python-SQLite group
On Thu, 11 May 2023, at 13:34, Peter Maivald wrote:
> Perhaps it's a project specific choice to use column names as a source
> for type information;

What I meant was that it is project specific what they mean - eg one project may want to map to the decimal type, while another wants to map to a different type.

One thing we didn't discuss is combining columns. For example a complex number in Python could be represented as two columns in SQLite and then combined back into a single complex coming back out.

> Due to SQLite it is the *only* source
> for type information for queries that do arithmetic in the column name
> part of the select:

Python types that are compatible with 64 bit signed integers in SQLite are a way smaller subset of possible types. Is that all that needs to be solved?

> SELECT sum(col1) AS sum_DEC$2 FROM ...
> SELECT col1,col2,col1+col2 AS sum_DEC$2 FROM ...

I also tested using CAST eg

SELECT CAST(col1+col2 AS COMPLEX)

Sadly the "COMPLEX" bit does not come back out in Cursor.description just Nones.

>> A shortcut would be to pickle values going into SQLite, and then unpickle all blobs coming out, which would retain full fidelity of the values.
>
> This shortcut costs you quite a few things. You can't easily use any
> other front end than python.

True.

> You can't use arithmetic on the columns in
> SQL, comparison operators, GROUP BY maybee ...

You could do all of those by functions implemented in Python so you'd have to

SELECT MYADD(col1, col2) .... WHERE MYCMP(col3, col4)>0 GROUP BY MYGROUPER(col5)

> I'd believe that it also
> bloats the database both in terms of storage space and access speed.

Indeed. For example pickling 3+4j is 57 bytes.

SQLite has put a lot of effort into JSON support recently so it is an alternative to pickle and portable to more platforms, plus can be used with builtin functions if you extract relevant parts and that is workable with your types. The same value as JSON is only 30 bytes:

{"type":"complex","i":3,"j":4}

The upcoming release allows using JSON5 which means the double quotes around the keys can be omitted saving another 6 bytes.

Roger

Peter Maivald

unread,
May 12, 2023, 6:23:53 AM5/12/23
to python-sqlite
> Perhaps it's a project specific choice to use column names as a source
> for type information;

What I meant was that it is project specific what they mean - eg one project may want to map to the decimal type, while another wants to map to a different type.

Yes, so each project would register the specific converters and adapters that would cover their needs. They all likely need a way to map type information in sql to python and back. I think that allowing types with arguments enhances that for all projects.
 
One thing we didn't discuss is combining columns. For example a complex number in Python could be represented as two columns in SQLite and then combined back into a single complex coming back out.

Right, combining columns is more difficult. Complex numbers as well as coordinate pairs, triplets, and so on share this problem. It's not only more difficult in the mapping in and out of the database, it's also more difficult in usage within sql itself. sql doesn't have structures, functions of structures, or anything that would help.

Because of this, I've tended to map structures in C to single columns in SQL. The sql column type could be int or blob depending on the size and types of data in the structure. I've always transferred the structures in binary to avoid conversion overhead, but that's not cross platform and doesn't work well with python.

So yes, having a good way to map multi column data to python objects should be a better way.

Python types that are compatible with 64 bit signed integers in SQLite are a way smaller subset of possible types. Is that all that needs to be solved?

Yes, that is a subset of possible types, although an important one. However, nothing other than the two examples in the code I posted limits you to ints in SQLite. I deliberately left space in the sql type declaration for an SQLite affinity. You could use FLOAT_DOU$2 for a float in the database with two decimal digits when output, or BLOB_COOR$3 for 3 binary floats packed into a blob to make a coordinate point.
 
> You can't use arithmetic on the columns in
> SQL, comparison operators, GROUP BY maybee ...

You could do all of those by functions implemented in Python so you'd have to

SELECT MYADD(col1, col2) .... WHERE MYCMP(col3, col4)>0 GROUP BY MYGROUPER(col5)

> I'd believe that it also
> bloats the database both in terms of storage space and access speed.

Indeed. For example pickling 3+4j is 57 bytes.

SQLite has put a lot of effort into JSON support recently so it is an alternative to pickle and portable to more platforms, plus can be used with builtin functions if you extract relevant parts and that is workable with your types. The same value as JSON is only 30 bytes:

{"type":"complex","i":3,"j":4}

The upcoming release allows using JSON5 which means the double quotes around the keys can be omitted saving another 6 bytes.

Yes, all of this is possible, and different approaches likely useful on a project by project basis. I always cringe at the thought of all those conversions back and forth when the database has data in ASCII or some complicated format. But, if it's not an embedded system, then perhaps it's ok. This is definitively a project by project consideration.
Reply all
Reply to author
Forward
0 new messages