Message from discussion New parser branch merged; please help with testing

Date: Thu, 15 Nov 2012 21:44:02 -0500
Message-ID: <CA+Ypf_o+zqKLmihBmo-3vt=5g95H=ahz9ASiWb39H6RLz7T...@mail.gmail.com>
Subject: New parser branch merged; please help with testing
From: Wes McKinney <w...@lambdafoundry.com>
To: pydata@googlegroups.com

Hi folks,

I just merged the new-and-improved (faster, low-memory use) file
parser branch (i.e. the guts of read_csv and read_table). If you work
regularly with medium-size, 100MB+ datasets, I dare say this will be
life-altering.

The new parser branch also includes:

- Ability to yield NumPy record arrays instead of pandas.DataFrame if
you want (as_recarray=True)
- Explicit dtypes: e.g. dtype={'C': np.float64, 'D': 'S5'}
- usecols option: read a subset of the columns in a file with low memory use
- Reading of compressed (gzip, bz2) files (e.g. compression='gzip')
- Easier specification of CSV/delimited file dialect options (e.g.
skipinitialspace=True)
- Lower-level/faster handling of European decimal formats (decimal=',')
- Special-casing whitespace delimited files for high performance
(delim_whitespace=True)
- Ability to disable NA detection logic altogether (na_filter=False)
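
To make a few of these options concrete, here is a minimal sketch. It is not taken from the branch's test suite: the sample data, the column names and the temporary file name are all made up for illustration, and it assumes a reasonably recent pandas. Two options from the list are deliberately spelled differently. As far as I know, later pandas releases dropped as_recarray (DataFrame.to_records() gives a comparable record array) and prefer sep=r"\s+" over delim_whitespace=True, so the sketch uses those forms.

```python
import gzip
import io
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up sample: semicolon-delimited, a space after each delimiter,
# European decimal commas in column C, and a literal "NA" in column B.
raw = (
    "A; B; C; D\n"
    "1; x; 1,5; hello\n"
    "2; y; 2,25; world\n"
    "3; NA; 3,0; again\n"
)

# usecols + explicit dtype + dialect options + European decimals.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    skipinitialspace=True,      # drop the space that follows each ';'
    decimal=",",                # parse "1,5" as 1.5
    usecols=["A", "C"],         # only these columns are materialized
    dtype={"A": np.float64},    # force column A to float64
)

# Record-array view of the same data (stand-in for as_recarray=True).
rec = df.to_records(index=False)

# Default NA detection turns the literal "NA" into a missing value...
with_na = pd.read_csv(io.StringIO(raw), sep=";", skipinitialspace=True)

# ...while na_filter=False leaves it as the plain string "NA".
no_na = pd.read_csv(
    io.StringIO(raw), sep=";", skipinitialspace=True, na_filter=False
)

# Whitespace-delimited input (stand-in for delim_whitespace=True).
ws = pd.read_csv(io.StringIO("x y\n1   2\n3 4\n"), sep=r"\s+")

# Compressed input: write a gzip file, then read it back directly.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv.gz")  # hypothetical file name
    with gzip.open(path, "wt") as fh:
        fh.write(raw)
    gz = pd.read_csv(path, sep=";", skipinitialspace=True, compression="gzip")
```

On a large file, usecols and na_filter=False are the two options here most likely to matter for memory use and speed, since they reduce the amount of work done per row.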

If you're able, I'd appreciate some help beating any remaining bugs
out of the code. All you need to do is install the development branch
and use pandas normally. If you run into any problems, please report
them on GitHub.

http://github.com/pydata/pandas

For Windows users, we're working on getting development builds (look
for 0.9.2dev-...) up on the pandas website (they aren't there yet):

http://pandas.pydata.org/pandas-build/dev/

- Wes