[pandas] SQLAlchemy bridge

Will Furnass

Dec 8, 2011, 7:13:05 AM
to pystatsmodels
It'd be really neat if there were a clean way of turning the result of
a SQLAlchemy query into a pandas DataFrame. It looks as though someone's
already had this idea [1]; does anyone know much about this referenced
feature proposal and how it would be implemented?

Cheers,

Will

[1] https://github.com/wesm/pandas/issues/191

Wouter Overmeire

Dec 8, 2011, 8:06:10 AM
to pystat...@googlegroups.com

Also curious how [1] will turn out.

Most of the time I work with pandas, the data is retrieved from a database through SQLAlchemy. I don't really feel the need to make it easier/cleaner to convert a SQLAlchemy query result into a pandas Series, DataFrame, ...

By the way, the "someone" you are referring to is the author and, I would say, "benevolent dictator for life" of pandas :-)

Wes McKinney

Dec 12, 2011, 4:19:22 PM
to pystat...@googlegroups.com

There's not a whole lot to it, frankly, assuming you've created a
sqlalchemy Engine:

df = DataFrame.from_records(engine.execute(sql))

Any post-processing / adding of indexes will be up to you, e.g. you could do:

indexed_df = df.set_index(index_column_list)

Here engine is an Engine and sql is a string containing an SQL query.

- Wes

Wes McKinney

Dec 12, 2011, 4:29:50 PM
to pystat...@googlegroups.com

Sorry, shooting from the hip here (or have had too much coffee).
Here's code that may actually work (forgot that SQLAlchemy returns a
ResultProxy object that pandas doesn't understand):

result = engine.execute(sql)
rows = [tuple(x) for x in result]
data = DataFrame.from_records(rows, columns=list(result.keys()))
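A self-contained sketch of this pattern, using an in-memory SQLite engine (the table name and data here are invented for illustration; recent SQLAlchemy versions want queries wrapped in text() and executed on a connection rather than on the engine directly):

import pandas as pd
import sqlalchemy as sa

# Toy in-memory database standing in for a real one.
engine = sa.create_engine("sqlite://")
with engine.connect() as conn:
    conn.execute(sa.text("CREATE TABLE people (name TEXT, age INTEGER)"))
    conn.execute(sa.text("INSERT INTO people VALUES ('alice', 30), ('bob', 25)"))
    result = conn.execute(sa.text("SELECT name, age FROM people"))
    # The result object is not a plain sequence of tuples, so convert:
    rows = [tuple(x) for x in result]
    cols = list(result.keys())

df = pd.DataFrame.from_records(rows, columns=cols)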

- W

aristotle137

Dec 13, 2011, 12:58:37 PM
to pystatsmodels
I very often find myself implementing a bridge between SQLAlchemy and
pandas along the lines Wes described, but I have problems with pandas'
type inference as once I get the data back from the database, the
dtypes are all object. I just posted a question relating to this
issue.
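One workaround for the object-dtype problem is to coerce the affected columns explicitly after building the frame; a minimal sketch (column names and data invented, and note that a database driver may hand back numbers as strings, as simulated here):

import pandas as pd

# Rows as they might come back from a driver: values stringly typed.
rows = [(1, '2011-12-13', '0.5'), (2, '2011-12-14', '1.5')]
df = pd.DataFrame.from_records(rows, columns=['id', 'date', 'value'])

# Coerce the object columns to proper dtypes.
df['value'] = pd.to_numeric(df['value'])
df['date'] = pd.to_datetime(df['date'])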

On Dec 12, 9:29 pm, Wes McKinney <wesmck...@gmail.com> wrote:
> On Mon, Dec 12, 2011 at 4:19 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > On Thu, Dec 8, 2011 at 8:06 AM, Wouter Overmeire <loda...@gmail.com> wrote:
> >> 2011/12/8 Will Furnass <willfurn...@gmail.com>

hmg

Dec 20, 2011, 11:43:46 AM
to pystatsmodels
Following up on this, I'm just trying to weave together SQLAlchemy's
ORM functionality and pandas (great stuff!). The discussion above rather
covered the case where you more or less use only Python's DB-API.

With an ORM-based query, I will get a list of user-defined objects
mapping (roughly) to rows of database tables; the data are held in
these objects' attributes (names corresponding to columns). One added
complication is that sometimes the data I care about are 'a couple of
tables away', so I need to catch things like
"qry_result[0].some_attr.data_from_another_table".

The best I have come up with so far is pasted below. Does this seem
like a common enough use case to merit an alternative to
DataFrame.from_records() eventually? I'd be happy to look into the
pandas core if given some pointers.

Best,
Hans-Martin


def getattr_recursively(obj, name):
    """Same as the getattr() built-in, but first split *name* into parts
    and descend into the object hierarchy.

    Return *None* if any of the attributes along the way is *None*.
    """
    current_object = obj
    for n in name.split('.'):
        try:
            current_object = getattr(current_object, n)
        except AttributeError:
            if current_object is None:
                break
            else:
                raise
    return current_object


columns = ('age_id', 'is_working.value')
list_of_lists = [[getattr_recursively(row, col) for col in columns]
                 for row in qry_result]
data = DataFrame.from_records(list_of_lists, columns=columns)
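The helper above can be exercised without a database at all; here is a runnable sketch where SimpleNamespace objects stand in for ORM-mapped rows (attribute names are made up to mirror the columns tuple above):

from types import SimpleNamespace

def getattr_recursively(obj, name):
    """Follow a dotted attribute path, returning None if an
    intermediate attribute is None (as in the post above)."""
    current = obj
    for n in name.split('.'):
        try:
            current = getattr(current, n)
        except AttributeError:
            if current is None:
                break
            raise
    return current

# Stand-ins for ORM instances, one with a missing relationship.
row = SimpleNamespace(age_id=3, is_working=SimpleNamespace(value=True))
empty = SimpleNamespace(age_id=7, is_working=None)

getattr_recursively(row, 'is_working.value')    # True
getattr_recursively(empty, 'is_working.value')  # None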

Wes McKinney

Dec 25, 2011, 10:43:27 PM
to pystat...@googlegroups.com

Having only made very basic use of SQLAlchemy I can't immediately
comment, but I imagine you can inspect each value in result.keys() and
if it's a nested record (a row in another table) then I imagine you
get another ResultProxy? So if you wanted to get a "flattened
DataFrame" from the SQLAlchemy results this would probably be the way
to go. Performance won't be all that great for large data sets, but
then you might not be using SQLAlchemy if that were a major concern.

- Wes

Brian Howard

Oct 23, 2013, 9:17:45 PM
to pystat...@googlegroups.com
Just ran across this thread... the solution I adopted for getting my SQLAlchemy db hooked up to pandas:

from sqlalchemy import create_engine
import pandas as pd

q = "select * from mytable"
db = create_engine(<your connection details>)

rs = db.execute(q)
d = rs.fetchall()
h = rs.keys()

dtf = pd.DataFrame.from_records(d, columns=h)

dtf.plot()



Note that if you use:
   dtf = pd.DataFrame(d,columns=h)  
rather than from_records, pandas screws up type inference, leaving you all object dtypes, causing pandas to barf with an error about no numeric data to plot
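For what it's worth, later pandas versions ship pd.read_sql, which takes care of the fetchall/keys/from_records dance, including column names and basic type conversion. A minimal sketch using the stdlib sqlite3 driver (table and data invented for illustration):

import sqlite3
import pandas as pd

# Hypothetical in-memory table standing in for "mytable".
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE mytable (x REAL, y REAL)')
conn.executemany('INSERT INTO mytable VALUES (?, ?)',
                 [(1.0, 2.0), (3.0, 4.0)])

dtf = pd.read_sql('select * from mytable', conn)
# Columns come back with proper numeric dtypes, so dtf.plot() works.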