Re: [pydata] Read dbf files to Panda dataframe


Andreas Hilboll

Jun 3, 2013, 8:23:54 AM
to pyd...@googlegroups.com, Bjorn Nyberg
On 03.06.2013 06:03, Bjorn Nyberg wrote:
> Hej
>
> I have a question/suggestion regarding the IO tools for
> Pandas: http://pandas.pydata.org/pandas-docs/dev/io.html#excel-files. I
> deal with several hundred shapefiles (GIS) and I'd like to perform
> geostatistics on those datasets, which otherwise (imo) is done rather
> poorly in the GIS GUI environments. These datasets are stored as dBase
> (.dbf) files, but unfortunately the GIS python extension (arcpy) only
> allows iteration row by row, not the dynamic indexing of pandas.
> As far as I can tell, one cannot parse that data into pandas without
> using the module http://dbfpy.sourceforge.net/ and converting it into a
> csv and then into a dataframe - is that right? An example would be
> welcome. So I was wondering if anyone has needed to do something similar
> and if so how? I recently came across this blog
> http://geodacenter.asu.edu/blog/2012/01/17/dbf-files-and-p and thought
> something along those lines might be of interest to implement for
> others who use Pandas?
>
> Cheers,
> B

Hi Bjorn,

I'll soon have to dive into shapefile analysis myself. So far I haven't
tried loading the data with pandas yet, but I came across pyshp during
my research:

http://code.google.com/p/pyshp/

It looks like the data are available in a `fields` member, which might
be easy to convert into a pd.DataFrame.

As I said, I never tried this, but it looked promising when I had a
glance a month ago.

Please let us know how it goes!

Andreas.

Andy Wilson

Jun 3, 2013, 11:06:57 AM
to pyd...@googlegroups.com
You might also want to check out fiona: http://toblerity.github.io/fiona/manual.html

Fiona and pyshp are both solid - not sure I could recommend one over the other for just working with the dbf part of shapefiles, but fiona is a little cleaner if you need to work with actual geometries of features (intersections, unions and such).
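For the dbf part specifically, a rough sketch of the fiona route (untested; assumes fiona is installed and the path is real) - each feature fiona yields is a dict whose "properties" entry holds the attribute row:

```python
# Sketch only (untested): collect each feature's attribute mapping
# into a DataFrame, ignoring the geometries entirely.
import fiona
import pandas as pd

def dbf_attributes_to_df(shp_path):
    with fiona.open(shp_path) as src:
        return pd.DataFrame([dict(feat["properties"]) for feat in src])
```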

-andy

Daniel Arribas-Bel

Jun 3, 2013, 4:28:06 PM
to pyd...@googlegroups.com
Hi there,

PySAL has I/O support for dbf files and, although pandas integration is not built in, it is fairly straightforward to use its dbf reader to load .dbf files into DataFrames. For example, if you have a .dbf with the columns "col1", "col2" and "col3", something like this should work:

>>> import pysal as ps
>>> import pandas as pd
>>> dbf_link = "/path/to/my/file.dbf"
>>> dbf = ps.open(dbf_link)
>>> d = {col: dbf.by_col(col) for col in ["col1", "col2", "col3"]}
>>> df = pd.DataFrame(d)

If you just wanted to load up the whole thing without necessarily typing out every column by hand, you could replace the second-to-last line with:

>>> d = {col: dbf.by_col(col) for col in dbf.header}

But it can be useful to load only certain columns in some cases...

Just because it's pretty handy and I use it very often, I wrote a very simple wrapper here.

Hope it helps,

]d[


-- 
======================================
Daniel Arribas-Bel, PhD.
Url: darribas.org
Mail: darr...@feweb.vu.nl

Department of Spatial Economics 
Faculty of Economics and Business Administration
VU University, Amsterdam (Netherlands)
======================================

Bjorn Nyberg

Jun 20, 2013, 6:26:14 AM
to pyd...@googlegroups.com
Hej,
Sorry for the late update, but I have tested a few of the suggestions mentioned above, and I have to say that the best for my application - converting to a dataframe to analyze the data in pandas/ipython - has been the GeoDaSandbox tool. It would be nice to have this feature packaged into either pysal or pandas IO, although it is simple enough to install from the github. Another nice feature would be the ability to update/delete a feature's attribute table (and the associated .shp etc.) from a pandas dataframe, although now I might be asking too much - I shall explore the pysal and pyshp tools further for that.

Bjorn

Luc Dekoninck

Feb 5, 2014, 8:23:38 AM
to pyd...@googlegroups.com
Hello,

New to this group.

I use shplib and dbflib (installed by the shapelib library).

I have an old function (indentation got mangled in the mail) to load a dbf file into a dict of dicts (it might become large for large dbf's, but loading the records as tuples would work as well).

def loadDBFasDict(iFilename, ID_field, subset_IDs=None):
    # print 'reading dbf file', iFilename
    dbf = dbflib.open(iFilename, 'r')
    dbf_dict = {}
    fld_info = []
    for i in range(dbf.field_count()):
        fld_info.append(dbf.field_info(i))
    tableDBF = []
    # print dir(dbf)
    for i in range(dbf.record_count()):
        recD = dbf.read_record(i)
        if subset_IDs is not None:
            if recD[ID_field] in subset_IDs:
                tableDBF.append(recD)
                dbf_dict[recD[ID_field]] = recD
        else:
            tableDBF.append(recD)
            dbf_dict[recD[ID_field]] = recD
    dbf.close()
    return dbf_dict, fld_info


Then I only have to call

    DataFrame.from_dict(dbf_dict, orient="index")

and the dbf is a dataframe.
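A toy illustration of that last step, with made-up records standing in for what loadDBFasDict returns:

```python
# Each key is an ID field value; each value is one dbf record as a dict.
import pandas as pd

dbf_dict = {
    1: {"ID": 1, "name": "a", "value": 10.0},
    2: {"ID": 2, "name": "b", "value": 20.0},
}
# orient="index" turns each key into a row label and each record into a row
df = pd.DataFrame.from_dict(dbf_dict, orient="index")
```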


Greetings,


Luc

Tim Michelsen

Feb 5, 2014, 5:55:19 PM
to pyd...@googlegroups.com
Hello,

> I use shplib and dbflib (installed by the shapelib library).
>
> I have an old function (indentation got mangled in the mail) to load a
> dbf file into a dict of dicts (it might become large for large dbf's,
> but loading the records as tuples would work as well).
This all sounds very nice.

But how do you maintain the spatial reference once you are only
working with the DBF file?

Looking forward to your answer,
Timmie
