Is there a way to access columns by column name in firebird-driver?

linux guy

Oct 27, 2022, 1:00:13 PM
to firebird-python
Is there a way to access the elements of a row, i.e. its columns, by
column name in firebird-driver?

...
cur.execute(select)

for row in cur:
    print(row.columnName("myColumn"))
...

Or are the elements of a row accessible by index only, e.g. row[1]?

I know that one could enumerate the column names and use them as the index.
Just wondering if something is already built into the driver.

$ pip list | grep firebird
firebird-base      1.4.3
firebird-driver    1.6.0

Thanks

linux guy

Oct 27, 2022, 3:24:54 PM
to firebird-python
FWIW, this code will get the column names for a table.

# get a cursor to the database
cur = con.cursor()

# get the table columns from the system catalog
columnsQuery = ("select rdb$field_name from rdb$relation_fields "
                "where rdb$relation_name = 'table name'")
cur.execute(columnsQuery)

columns = []
for row in cur:
    # row is a tuple; the first item is the column name,
    # which Firebird pads with trailing whitespace, so strip it
    # (note: str.strip() returns a new string, it does not modify in place)
    columnName = row[0].strip()
    columns.append(columnName)

for col in columns:
    print(col)

Is there a way to get the column names for a query from the cursor
object that executes the query?

i.e.
cur.execute(myQuery)
firstColumn = cur.something.something[0]

Thanks

Tomasz Tyrakowski

Oct 28, 2022, 3:16:48 AM
to firebird-python
A pythonic way would be to unpack the tuple:

for myColumn1, myColumn2, myColumn3 in cur:
    print(myColumn1)
    ...

Is it not feasible in your scenario for some reason?

regards
Tomasz

Tomasz Tyrakowski

Oct 28, 2022, 3:16:48 AM
to firebird-python
To be more specific: if you're looking for a feature like the Postgres psycopg2 driver's cursor_factory=psycopg2.extras.RealDictCursor, which causes the fetch methods to return dictionaries rather than tuples, there's no such thing in the FB driver AFAIK. It's not required by the PEP-249 specification, and firebird-driver doesn't currently support it (unless I'm wrong, which I'd kinda like to be, 'cause it would be a nice thing to have ;) ).
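
For illustration, the psycopg2 pattern I mean looks roughly like this (a sketch; the DSN, table, and column names are made up):

import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=test user=postgres")  # made-up DSN
# RealDictCursor makes the fetch methods return dict-like rows keyed by column name
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute("select id, name from customers")  # made-up table
for row in cur:
    print(row["name"])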

regards
Tomasz

On Thursday, October 27, 2022 at 9:24:54 PM UTC+2 linux...@gmail.com wrote:

pcisar2

Oct 29, 2022, 5:35:05 AM
to firebird-python
The old FDB driver had this via Cursor.fetchonemap() etc. The firebird-driver does not support this, as it may lead to excess creation of dictionaries (or dict-like objects) for each row. Also, constructing Cursor.description is not free (it may require additional queries for certain data types).
However, this is a nice feature and I'd like to reintroduce it in firebird-driver. I just have not yet figured out an elegant way to do that without the drawbacks that the FDB implementation (btw, inherited from KInterbasDB) has. But I have a few ideas, so I'll try to implement it this year - for the Christmas release ;-)
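
In the meantime, a per-row dict can be built by hand from Cursor.description (a sketch; per PEP-249 the first item of each description tuple is the column name, and the query is made up):

cur.execute("select id, name from customers")  # made-up query
names = [desc[0] for desc in cur.description]
for row in cur:
    d = dict(zip(names, row))
    print(d["NAME"])  # unquoted identifiers come back uppercase in Firebird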

On Friday, October 28, 2022 at 9:16:48 AM UTC+2 hipercom...@gmail.com wrote:

pcisar2

Oct 29, 2022, 9:18:24 AM
to firebird-python
Well, I added a Cursor.to_dict() method that returns (or updates a provided) dictionary from the row data passed as an argument. It's available in git for experiments and will appear in the next release.
It's not exactly how FDB or psycopg2 handles this feature, but I did not like either solution. New fetch methods (like in FDB) would only double the already high (mind the scrollable cursors!) number of Cursor.fetch_* methods, and returning a dict instead of a tuple is a direct violation of PEP 249, which requires fetch methods to return sequences (which dicts are not).

Please note that this method does not perform any checks for a valid row or dict-to-be-updated, to achieve maximum performance. It does only a single assert that the number of items in the row equals the number of output columns of the last statement executed by the Cursor.
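
For example (a sketch; the query is made up):

cur.execute("select id, name from customers")  # made-up query

# a new dict per row:
for row in cur:
    d = cur.to_dict(row)

# or reuse a single dict for all rows:
d = {}
for row in cur:
    cur.to_dict(row, d)
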
On Saturday, October 29, 2022 at 11:35:05 AM UTC+2 pcisar2 wrote:

Tomasz Tyrakowski

Oct 29, 2022, 10:28:46 AM
to firebird-python
Great! Thanks a lot. I'd started making some arrangements with the OP to implement this and submit a PR, but I'm sure your implementation fits better with the rest of the driver code (not to mention I'd never have done it in a single afternoon). I know PEP-249 requires rows to be represented by sequences, and a dict is not one, but still, the original question / request seemed a reasonable one to me (I've been using dict rows in psycopg from time to time, and they are useful in some scenarios, like dynamically built queries which fetch different data sets depending on factors known only at runtime - those are rare cases, though). Besides, extending the driver beyond what the PEP requires is not a violation strictly speaking - the tuple-based API is still there. The performance penalty shouldn't be surprising either, and is a cost to be considered before deciding to use the dict-based API. No free lunch.
As an aside, if one is not sure what columns are returned by a query (and in which order), and simple tuple unpacking doesn't seem flexible enough, it may suggest a certain code smell (like using SELECT *). But hey, I won't be the one to throw that stone ;).
Anyway, I still think it's a valuable addition. I will clone the repo, do some testing and report issues if I find any.
Thank you again for your work.

best regards
Tomasz

Tomasz Tyrakowski

Oct 29, 2022, 1:28:15 PM
to firebird-python
Well, I did some testing and rough benchmarking, and it seems to work just fine. The performance penalty is 2-4%. I tested on a table with about 300k rows, selecting 7 columns of different types (varchar, double precision, timestamp).
The test compared the work time of the loop:

for col1, col2, col3, col4, col5, col6, col7 in cur: # unpacking intentional
    pass

vs

for row in cur:
    d = cur.to_dict(row)

using time.perf_counter. Only the loop was measured (cursor.execute is excluded from the timing). On my dataset and my machine the actual times were ~8.7s for tuples vs ~8.9s for dictionaries (the times varied between executions, which I repeated 10 times, but the dict/tuple ratio was always between 102 and 104 percent). I connected to localhost:db_path, so it was not an embedded connection (but no actual network round trips either).
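
For reference, the timing harness was essentially of this shape (a sketch; the query is a placeholder):

import time

cur.execute("select c1, c2, c3, c4, c5, c6, c7 from test_table")  # placeholder
start = time.perf_counter()
for row in cur:
    d = cur.to_dict(row)
print(f"loop time: {time.perf_counter() - start:.2f}s")
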
Of course the first thing I checked was if the dictionary actually contained valid row data and column names (it did ;) ).

cheers
Tomasz

Pavel Cisar

Oct 30, 2022, 5:54:48 AM
to firebir...@googlegroups.com

The psycopg approach changes the return data type of the fetch methods, which is not an extension but a replacement, and thus a direct violation of the spec. Well, it has to be explicitly requested, but still. The KInterbasDB/FDB approach was much cleaner, as it extends Cursor with additional fetch* methods (like fetchonemap) that return a dict-like object. When I created firebird-driver, I thought about keeping this feature, but firebird-driver supports scrollable cursors with a collection of new fetch* methods, so Cursor would have ended up with 18 (!) individual fetch methods. Which is awful, so I didn't keep the feature as designed in FDB. But I always wanted to have it in some form, as it's really useful.

The to_dict() method is a compromise that I didn't like much either (which is the reason why it was not implemented in early driver versions), as it does not fit well with some fetch patterns.

But recently I realized that the most used pattern is:

for row in cur.execute(cmd):
    ...

into which to_dict() can be integrated nicely without much hassle and
ugly code.

BTW, a simple loop like:

for row in cur.execute(cmd):
    d = cur.to_dict(row)
    ...

may create a lot of dictionaries, and thus consume a significant amount of memory (at least until the GC kicks in) or have unstable performance (because the GC kicks in).

To process a lot of rows one by one, it's better to use a construct like:

d = {}
for row in cur.execute(cmd):
    d = cur.to_dict(row, d)
    ...

which uses only one dictionary for all the rows fetched. Guess it would be worth running through timeit as well?

BTW, after a three-year detour to create firebird-driver, the new pytest-based Firebird QA, and an internal suite of tools for Firebird trace (IBPhoenix developed a new trace plugin better suited for machine data processing than the standard awful text that has to be parsed), I'm back to Saturnin (and thus Firebird Butler). You may expect some nice updates next year.

best regards
Pavel Cisar
IBPhoenix

On 29. 10. 22 at 16:28, Tomasz Tyrakowski wrote:

Tomasz Tyrakowski

Oct 31, 2022, 7:12:08 AM
to firebird-python
Creating an empty dict before the loop doesn't impact processing time in any significant way. Could you explain why you think it should? If you do

d = cur.to_dict(row)

in each iteration, a new dictionary is allocated and returned by to_dict and assigned to d, so starting with an empty dict before the loop may save only the time needed to create a local-scope symbol in the Python interpreter, which probably is not much (and my crude way of measurement doesn't even detect it). The times with and without the pre-definition of d were almost identical (varying slightly between particular script executions); sometimes the version without it was even marginally faster ;).

regards
Tomasz

Mark Rotteveel

Oct 31, 2022, 7:20:22 AM
to firebir...@googlegroups.com
On 31-10-2022 12:12, Tomasz Tyrakowski wrote:
> Creating an empty dict before the loop doesn't impact processing time
> in any significant way. Could you explain why you think it should? If
> you do
> d = cur.to_dict(row)
> in each iteration, a new dictionary is allocated and returned by
> to_dict and assigned to d, so starting with an empty dict before the
> loop may save only the time needed to create a local-scope symbol in
> the Python interpreter, which probably is not much (and my crude way
> of measurement doesn't even detect it). The times with and without the
> pre-definition of d were almost identical (varying slightly between
> particular script executions); sometimes the version without it was
> even marginally faster ;).

I think the idea is to *pass* the dict to cur.to_dict(row, d) (where d
is the dict), so the dict is reused.

Mark
--
Mark Rotteveel

Tomasz Tyrakowski

Oct 31, 2022, 7:25:56 AM
to firebird-python
Yes, you're right. I didn't notice d was passed as the second argument. Sorry for that.
The performance of both versions doesn't differ much. The to_dict(row) version takes on average (over 10 executions) 8.56s in my test case; the version with to_dict(row, d) takes on average 8.40s (so the difference is about 2%).

regards
Tomasz

Tomasz Tyrakowski

Oct 31, 2022, 7:47:20 AM
to firebird-python
Just to be clear: the measurements I made cannot in any way be treated as reliable benchmarks. I did it on my Windows (Win 11) laptop (no isolated environment) with a local install of the FB server (server 2.5.9, client 3.0.10), Python 3.11, a database I happened to be working on at the moment, and a hastily stitched-together Python script. I just wanted to see whether the performance penalty is in the range of a single-digit percentage or closer to 50% or 100%. So a difference like 2% can just as well be caused by external factors (like a temporary load spike of my Windows system - who knows what it does and when ;) ). The only conclusion I can draw is that using to_dict has a relatively small performance penalty, which would probably be even smaller when an actual network connection is used (and the data travel time increases relative to the local processing time).

regards
Tomasz

Pavel Cisar

Oct 31, 2022, 8:40:06 AM
to firebir...@googlegroups.com
Hi,

It's great that the impact of to_dict() is marginal. The reuse of the dictionary may lead to significant memory savings in comparison to creating a new dictionary for each row. The reused dict means a direct update, i.e. roughly d.update(zip(names, data)) followed by return d, while creating a new one is return dict(zip(names, data)). I expected only a marginal time difference between new and update, but it was worth confirming.
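
The time side can also be checked in isolation, independent of the driver (a standalone sketch with made-up column names and row data):

import timeit

names = ["COL%d" % i for i in range(7)]
data = tuple(range(7))

def new_dict():
    # allocate a fresh dict per "row"
    return dict(zip(names, data))

d = {}

def reuse_dict():
    # update the same dict in place
    d.update(zip(names, data))
    return d

print("new:  ", timeit.timeit(new_dict, number=1_000_000))
print("reuse:", timeit.timeit(reuse_dict, number=1_000_000))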

regards
Pavel Cisar
IBPhoenix

Tomasz Tyrakowski

Oct 31, 2022, 9:27:06 AM
to firebird-python
I'll think about how to measure the difference in memory usage (I don't have much experience in that), but I don't think the to_dict(row) version would use much more memory than the version with the recycled dictionary. Dictionaries, like other Python objects, are refcounted, so when the dictionary of the current row goes out of scope at the end of an iteration, its memory should be released (without waiting for a GC cycle). But then again, maybe a fast sequence of small allocs/deallocs will indeed increase overall allocation (due to memory fragmentation maybe? although the small-object allocator in Python should probably handle that - I don't know, I never actually checked). Besides, apart from the dictionary itself (which is just a single object), there are also the row values inside (and the keys, but those are probably just references to metadata allocated once, upon query execution), which are refcounted as well and have to be deallocated when you clear the dictionary to reuse it (and then allocated anew for the next row).
I'll read up on how best to measure memory consumption in a script and let you know when (if?) I have something interesting to report.

regards
Tomasz

Tomasz Tyrakowski

Oct 31, 2022, 10:00:34 AM
to firebird-python
Okay, so I've measured the loops with and without dictionary reuse with respect to memory allocation.
To be specific, I compared

import tracemalloc

tracemalloc.start()
for row in cur:
    d = cur.to_dict(row)
print(tracemalloc.get_traced_memory())
tracemalloc.stop()

with

tracemalloc.start()
d = {}
for row in cur:
    d = cur.to_dict(row, d)
print(tracemalloc.get_traced_memory())
tracemalloc.stop()

over 336337 rows (both loops were executed in separate scripts to avoid affecting each other in any way).
The peak memory allocation in the first case was 7380 bytes, in the second case 2644 bytes. So, yes, a significant difference. The current allocation (just after the loop, when get_traced_memory gets called) was 5502 bytes in the first case and just 766 bytes in the second (that's a bit puzzling - I'd rather have guessed the second one should show more allocated memory after the loop, 'cause the last content of d is still there and d is still in scope, but I double-checked and the result is indeed as shown above). I ran the test cases twice; the results were identical.

regards
Tomasz

Pavel Cisar

Oct 31, 2022, 1:46:06 PM
to firebir...@googlegroups.com
Tomasz

thanks for your measurements!

regards
Pavel Cisar
IBPhoenix
