data_utils is for building the datasets that go into our datasets
collection; it is definitely not meant for reading data. I am probably
going to rewrite it in the near future, so it shouldn't be of too much
concern.
You want to look at
numpy.genfromtxt (and related functions in npyio).
http://docs.scipy.org/numpy/docs/numpy.lib.npyio.genfromtxt/
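A minimal sketch of the genfromtxt usage suggested above. The column names and values are made up for illustration, and an in-memory buffer stands in for a real file; this assumes a reasonably recent NumPy on Python 3.

```python
import io

import numpy as np

# Made-up CSV content standing in for a real file on disk.
raw = io.StringIO("year,gdp,population\n2008,14.7,304.1\n2009,14.4,306.8\n")

# names=True takes the field names from the first row; with the default
# dtype every column is parsed as a float.
data = np.genfromtxt(raw, delimiter=",", names=True)

print(data.dtype.names)  # field names taken from the header row
print(data["gdp"])       # columns can be pulled out by name
```

The result is a structured ndarray, so each column is addressed by its header name rather than by position.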
You might also have a look at
scikits.statsmodels.lib.io.genfromdta
It will read relatively recent Stata binary .dta files into structured
ndarrays, though, just looking at the docs, they're pretty much
non-existent. Note that it only works on Stata datasets for data
version >= 9 and might not be bug-free (hence it being hidden for
now). I won't have much time to look at this soon, so you might
just want to remember it for later.
Skipper
http://docs.scipy.org/numpy/docs/numpy.lib.npyio.loadtxt/
> numpy.lib.npyio.loadtxt/
I am not able to import this, I actually don't have a npyio file/folder in my numpy install. Not sure why this is.
But I do have this:
numpy.lib.io.recfromcsv
They recently changed numpy.lib.io to numpy.lib.npyio because of a
conflict with python's built-in io. Not a big fan of this change, but
so it goes...
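One way to stay independent of the module rename is to use the functions from the top-level numpy namespace, which is what ended up working elsewhere in this thread (numpy.loadtxt). A small sketch with made-up data:

```python
import io

import numpy as np

# The public reader functions are exposed on the top-level namespace, so
# user code does not need to know whether the implementation lives in
# numpy.lib.io or numpy.lib.npyio.
arr = np.loadtxt(io.StringIO("1 2\n3 4\n"))

print(arr.shape)
```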
So, in python I tend to use csv very heavily. It is simple and
straightforward and clear what you are doing - you can load SOMETHING
fairly quickly - even if it's not in the right layout straight away!
You don't get the data into the pretty dataset form you might need,
but I find it much easier to think about doing that once I am in
python. I often find with R that it plain refuses to load a file for
one reason or another, and it can be a real pain to debug.
Also, the advantage csv has is that it comes with python (or numpy), so
the user will already have it installed. Since dependencies can be
some of the most confusing aspects of getting a new piece of kit
working, I think having a dependency that is only needed for an
example would be a little irritating.
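For reference, a minimal sketch of the plain csv approach described here, using only the standard library. The data and column names are invented for illustration.

```python
import csv
import io

# Made-up file content; in practice this would be open("some_file.csv").
raw = io.StringIO("name,score\nalice,3.5\nbob,4.0\n")

reader = csv.reader(raw)
header = next(reader)            # first row holds the column names
rows = [row for row in reader]   # every remaining field is still a string

# The layout and types are sorted out afterwards, in ordinary Python.
scores = [float(row[1]) for row in rows]

print(header, rows, scores)
```

Nothing is converted automatically, which is the trade-off being discussed: more lines, but each step is explicit.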
By all means support and demo the complicated fancy methods, but I
think we should also include the basic simple ones - even if they take
more lines of code - so that it's clear what is going on.
There will be a lot of people coming to this module (like me) who have
large libraries of existing code, and just need it for a couple of
specific algos - so they will already have the data loaded from their
databases etc how they want it and won't use any of these loaders.
Just my two penneth; feel free to ignore.
On 12 Apr, 06:16, Skipper Seabold <jsseab...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 12:58 AM, Vincent Davis <vinc...@vincentdavis.net> wrote:
I used the python csv module a lot in the past, but Pierre, with the
help of Skipper and Bruce, improved genfromtxt a lot last year. And it
only requires numpy. There are still a few rough edges, but in
contrast to the csv module, it is very nice to have automatic type
conversion from strings to numbers with nan handling. And it works
well for clean csv files.
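A small sketch of the automatic conversion and nan handling mentioned above. The data is made up, and "NA" as a missing-value code is an assumption chosen for the example.

```python
import io

import numpy as np

# Made-up data: one empty field and one field coded as "NA".
raw = io.StringIO("x,y\n1.0,2.0\n3.0,\nNA,6.0\n")

# Empty fields and the listed missing_values code are filled in with nan
# when the column is parsed as float.
data = np.genfromtxt(raw, delimiter=",", names=True, missing_values="NA")

print(data["x"])
print(data["y"])
```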
The advantage of using (a unicode enabled) csv module is that it is
much more flexible in handling "weird" csv files, e.g. I have some with
different non-ASCII characters for various missing value codes, and any
automatic conversion barfs (or it might be possible, but with more
effort than just using plain python).
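As an illustration of the plain-python route for "weird" files, here is a sketch that maps a set of missing-value codes to nan by hand. The codes in MISSING_CODES and the to_float helper are hypothetical, picked only for this example.

```python
import csv
import io
import math

# Hypothetical missing-value codes, including non-ASCII ones.
MISSING_CODES = {"", "n/a", "·", "ø"}


def to_float(text):
    """Convert one csv field to float, mapping known missing codes to nan."""
    text = text.strip()
    if text in MISSING_CODES:
        return math.nan
    return float(text)


raw = io.StringIO("id,value\n1,2.5\n2,·\n3,n/a\n4,7\n")
values = [to_float(row["value"]) for row in csv.DictReader(raw)]

print(values)
```

Any field that is neither a known code nor a number still raises ValueError, so unexpected codes show up immediately instead of being silently converted.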
Another small problem with genfromtxt examples is that they may
require numpy 1.4 and might not work with numpy 1.3.
my 2 cents (Canadian)
Josef
>> > On Sun, Apr 11, 2010 at 10:53 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
>>
>> >>> numpy.lib.npyio.loadtxt/
>>
>> >> I am not able to import this, I actually don't have a npyio file/folder in my numpy install. Not sure why this is
>> >> But I do have this
>> >> numpy.lib.io.recfromcsv
>>
>> > Dumb mistake, I was trying to import it from numpy.lib.npyio
>> > numpy.loadtxt works fine
Josef "it is very nice to have automatic type
conversion, from string to numbers with nan handling."
Almost all of these (except loadtxt, I think, without checking) just
call genfromtxt but specify different defaults, so I mainly use
genfromtxt and fiddle with the arguments myself. So recfromcsv
defaults the delimiter to a comma, though you can change it, I
think. There is also recfromtxt, which splits on whitespace (the
default in genfromtxt).
So basically, just see genfromtxt for the arguments you can specify
except in the case of loadtxt, which can't handle names and missing
data, I don't think, so is a bit different but more "lightweight".
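A sketch of the "fiddle with the arguments myself" approach next to loadtxt. The genfromtxt call only approximates what recfromcsv is described as doing above (comma delimiter, names from the header); it is not a claim about the exact defaults recfromcsv uses. The data is made up.

```python
import io

import numpy as np

text = "a,b\n1,2.5\n3,4.5\n"

# genfromtxt with explicit arguments: comma delimiter, field names from
# the header row, and dtype=None so each column's type is inferred.
rec = np.genfromtxt(io.StringIO(text), delimiter=",", names=True, dtype=None)

# loadtxt is the lighter-weight reader: the header is skipped by hand and
# the result is a plain 2-D float array without field names.
plain = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

print(rec.dtype)
print(plain)
```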
Also note that savetxt doesn't let you specify names (there is a
ticket filed for this, but I can never get anyone to commit the
one-line change), so I added a savetxt to our scikits.statsmodels.lib.io.
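For readers on NumPy releases newer than the ones discussed in this thread: numpy.savetxt later gained header and comments arguments, so a header row can be written directly. A sketch, assuming such a release; this is not the statsmodels savetxt mentioned above, and the column names are made up.

```python
import io

import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])

buf = io.StringIO()
# header supplies the first line; comments="" stops numpy from prefixing
# that line with "# ".
np.savetxt(buf, data, delimiter=",", header="x,y", comments="", fmt="%.1f")

text = buf.getvalue()
print(text)
```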
Skipper
I didn't know about the statsmodels savetxt. A summary of IO will be
very useful, I always struggle with this.
Thanks,
Josef
Yeah, I got sick of rewriting it every time I wanted to save a csv
with a header row. It's in my revision 2003 (really 2004 with a bug
fix). Note that I used the slightly older version of numpy.savetxt,
since savetxt has been rewritten for the Python 3 transition and now
requires some compatibility functions that are only available in newer
versions of numpy, and I didn't want to force people to be on the
bleeding edge.
Skipper
However, I do not recall any complaints about that change.
Bruce
IIUC, isn't that why relative imports were introduced?
http://docs.python.org/whatsnew/2.5.html#pep-328
Someone suggested changing to relative imports IIRC, but not very loudly.
Skipper
I do not know if that would have even solved the problem. Also, I think
at least Chuck does not like having modules with the same name across
projects. I know that in R this causes shadowing of functions,
which can be a problem.
Bruce
Yeah you're right. Oh well.
Skipper