Just a few quick thoughts:
- Since the file is in ASCII, the binary representation will be much more compact: one line holding one number of the form 1.234567e12 is 12 bytes in ASCII, but only 4 bytes as a numpy.float32. I'd try numpy.loadtxt (or numpy.genfromtxt, or similar) on a smaller portion of the file and see how large the array actually turns out to be.
- If you have a multicolumn file, can you convert one column at a time (e.g. via numpy.loadtxt's usecols argument)? Each resulting array will be much smaller than the full file, so problem solved.
- If you are only doing this once, you could simply let your RAM "swap out" to disk (the OS does this automagically): leave your script running overnight, no problem.
- Most scalable, probably: read and append chunks (say, 10k lines at a time) to the hdf5 file. A couple of loops and some numpy-style slicing and you're good. One line at a time will also work, but will probably be much slower; for a once-off, though, that might be fine, too.
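For the size check in the first point, a minimal sketch; the io.StringIO here just stands in for your actual filename, and max_rows needs a reasonably recent numpy:

```python
import io

import numpy as np

# Stand-in for a slice of the big ASCII file: one number per line.
text = "\n".join(f"{x:.6e}" for x in range(1000))

# Load only the first 100 rows as float32 and check the memory footprint;
# with your real data, pass the filename instead of the StringIO.
sample = np.loadtxt(io.StringIO(text), dtype=np.float32, max_rows=100)
print(sample.shape, sample.nbytes)  # 100 values * 4 bytes = 400 bytes
```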
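And for the chunked approach, a sketch using a resizable h5py dataset; the filenames, chunk size, and the StringIO source are placeholders for your actual setup:

```python
import io
from itertools import islice

import h5py
import numpy as np

# Stand-in for the big ASCII file (one float per line); with real data,
# use something like `open("yourfile.txt")` instead.
src = io.StringIO("\n".join(f"{x:.6e}" for x in range(25_000)))

chunk_rows = 10_000  # lines per chunk; tune to your available RAM

with h5py.File("out.h5", "w") as f:
    # Resizable 1-D dataset that grows as chunks are appended.
    dset = f.create_dataset("data", shape=(0,), maxshape=(None,),
                            dtype=np.float32, chunks=(chunk_rows,))
    while True:
        lines = list(islice(src, chunk_rows))
        if not lines:
            break
        chunk = np.atleast_1d(np.loadtxt(lines, dtype=np.float32))
        n = dset.shape[0]
        dset.resize((n + chunk.size,))
        dset[n:n + chunk.size] = chunk  # numpy-style slice assignment
```

The maxshape=(None,) is what makes the dataset resizable along that axis; without it, resize() will refuse to grow past the original shape.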
Did that help?
Paul