New parser branch merged; please help with testing
Date: Thu, 15 Nov 2012 21:44:02 -0500
Message-ID: <CA+Ypf_o+zqKLmihBmo-3vt=5g95H=ahz9ASiWb39H6RLz7T...@mail.gmail.com>
Subject: New parser branch merged; please help with testing
From: Wes McKinney <w...@lambdafoundry.com>
To: pydata@googlegroups.com
Hi folks,
I just merged the new-and-improved (faster, low-memory use) file
parser branch (i.e. the guts of read_csv and read_table). If you work
regularly with medium-size, 100MB+ datasets, I dare say this will be
life-altering.
The new parser branch also includes:
- Ability to yield NumPy record arrays instead of pandas.DataFrame if
you want (as_recarray=True)
- Explicit dtypes: e.g. dtype={'C': np.float64, 'D': 'S5'}
- usecols option: read a subset of the columns in a file with low memory use
- Reading of compressed (gzip, bz2) files (e.g. compression='gzip')
- Easier specification of CSV/delimited file dialect options (e.g.
skipinitialspace=True)
- Lower-level/faster handling of European decimal formats (decimal=',')
- Special-casing whitespace delimited files for high performance
(delim_whitespace=True)
- Ability to disable NA detection logic altogether (na_filter=False)
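As a rough illustration of several of the options listed above, something like the following should work. This is only a sketch: the sample data and column names are made up for the example, and exact behavior may differ between pandas versions.

```python
import gzip
import io

import numpy as np
import pandas as pd

# Hypothetical sample data: semicolon-delimited, a space after each
# delimiter, and European-style decimal commas in column C.
raw = "A;B;C;D\n1; x; 1,5; NA\n2; y; 2,25; world\n"

options = dict(
    sep=";",
    skipinitialspace=True,    # drop the space following each delimiter
    decimal=",",              # treat "1,5" as 1.5
    usecols=["A", "C", "D"],  # parse only a subset of the columns
    dtype={"A": np.int64, "C": np.float64},  # explicit dtypes
    na_filter=False,          # no NA detection: "NA" stays a plain string
)

# Read from an in-memory text buffer.
df = pd.read_csv(io.StringIO(raw), **options)

# Read the same data gzip-compressed, from an in-memory bytes buffer.
gz_bytes = gzip.compress(raw.encode("utf-8"))
df_gz = pd.read_csv(io.BytesIO(gz_bytes), compression="gzip", **options)

print(df)
print(df_gz)
```

Both calls should give the same three-column frame. With na_filter=False, the "NA" entry in column D is kept as an ordinary string rather than being converted to a missing value.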
If you're able, I'd appreciate some help beating any remaining bugs
out of the code. All you need to do is install the development branch
and use pandas normally. If you run into any problems, please report
them on GitHub.
http://github.com/pydata/pandas
For Windows users, we're working on getting development builds (look
for 0.9.2dev-...) up on the pandas website (they aren't there yet):
http://pandas.pydata.org/pandas-build/dev/
- Wes