Fwd: [pydata] Re: Read dbf files to Panda dataframe


Peter Lubell-Doughtie

Jun 3, 2013, 11:31:04 PM
to bambo...@googlegroups.com

Interesting thread on using geospatial data with pandas.

---------- Forwarded message ----------
From: "Daniel Arribas-Bel" <dreame...@gmail.com>
Date: Jun 3, 2013 4:28 PM
Subject: [pydata] Re: Read dbf files to Panda dataframe
To: <pyd...@googlegroups.com>
Cc:

Hi there,

PySAL has I/O support for .dbf files and, although pandas integration is not built in, it is fairly straightforward to use its dbf reader to load .dbf files into DataFrames. For example, if you have a .dbf with the columns "col1", "col2" and "col3", something like this should work:

>>> import pysal as ps
>>> import pandas as pd
>>> dbf_link = "/path/to/my/file.dbf"
>>> dbf = ps.open(dbf_link)
>>> d = {col: dbf.by_col(col) for col in ["col1", "col2", "col3"]}
>>> df = pd.DataFrame(d)

If you just want to load the whole file without necessarily typing out every column name, you can replace the second-to-last line with:

>>> d = {col: dbf.by_col(col) for col in dbf.header}

But it can be useful to load only certain columns in some cases...

Just because it's pretty handy and I use it very often, I wrote a very simple wrapper here.
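
The linked wrapper is not reproduced in this message. As a rough idea of what such a helper might look like, here is a minimal sketch built only from the calls shown above (ps.open, dbf.header, dbf.by_col). The function names columns_from_dbf and dbf_to_dataframe are made up for illustration, and it assumes a PySAL version where ps.open is available and the reader has a close() method:

```python
def columns_from_dbf(reader, cols=None):
    """Return {column name: list of values} from a PySAL-style dbf reader.

    `reader` is assumed to expose `header` (list of column names) and
    `by_col(name)`, as in the session above. Hypothetical helper.
    """
    if cols is None:
        # No selection given: take every column listed in the header.
        cols = list(reader.header)
    missing = [c for c in cols if c not in reader.header]
    if missing:
        raise KeyError("columns not found in dbf header: %s" % missing)
    return {col: reader.by_col(col) for col in cols}


def dbf_to_dataframe(path, cols=None):
    """Load a .dbf file into a pandas DataFrame. Hypothetical helper."""
    # Imported here so the column-selection logic above can be used
    # (and tested) without pandas or PySAL installed.
    import pandas as pd
    import pysal as ps

    reader = ps.open(path)
    try:
        data = columns_from_dbf(reader, cols)
    finally:
        # Assumes the reader has close(), like other PySAL file handlers.
        reader.close()
    # Passing `columns` fixes the column order to the requested order
    # (or the header order when cols is None).
    return pd.DataFrame(data, columns=list(data))
```

With that in place, dbf_to_dataframe("/path/to/my/file.dbf") would load everything, and dbf_to_dataframe("/path/to/my/file.dbf", cols=["col1", "col3"]) only the two named columns.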

Hope it helps,

]d[


-- 
======================================
Daniel Arribas-Bel, PhD.
Url: darribas.org
Mail: darr...@feweb.vu.nl

Department of Spatial Economics 
Faculty of Economics and Business Administration
VU University, Amsterdam (Netherlands)
======================================

On Monday, June 3, 2013 6:03:04 AM UTC+2, Bjorn Nyberg wrote:
Hej 

I have a question/suggestion regarding the IO tools for pandas: http://pandas.pydata.org/pandas-docs/dev/io.html#excel-files. I deal with several hundred shapefiles (GIS) and I'd like to perform geostatistics on those datasets, which (imo) the GIS GUI environments do rather poorly. The attribute data are stored as dBase (.dbf) files, but unfortunately the GIS Python extension (arcpy) only allows row-by-row iteration, not the dynamic indexing of pandas. As far as I can tell, one cannot get that data into pandas without using the module http://dbfpy.sourceforge.net/ to convert it into a csv and then reading that into a DataFrame? An example would be appreciated. So I was wondering whether anyone has needed to do something similar and, if so, how? I recently came across this blog post, http://geodacenter.asu.edu/blog/2012/01/17/dbf-files-and-p, and thought something along those lines might be of interest to others who use pandas, and perhaps worth implementing?

Cheers,
B

 
 