Oh, I read incorrectly, sorry. So, while it is difficult to understand
what's going on behind the scenes, I would say that the problem with
handling big chunks (scenario 2a) is that HDF5 normally needs a copy to
get the data from disk or the OS page cache into the actual data
buffer. memmap, on the other hand, does not require that, so that *may*
account for the 2x slowdown. For small chunks (2b), the copy may happen
at the CPU cache level, so maybe this is why the difference in speed is
smaller (and I must say that I am a bit surprised that HDF5 performance
is better here).
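Just to make the comparison concrete, here is a rough sketch of the two
read paths (file names, dtypes and sizes below are made up; adapt them
to your setup):

    import numpy as np
    import tables

    N = 10_000_000   # hypothetical number of float64 elements

    # memmap path: slicing goes straight through the OS page cache,
    # so no extra copy into a separate read buffer is needed
    mm = np.memmap('data.bin', dtype='float64', mode='r', shape=(N,))
    big_chunk = np.array(mm[:1_000_000])   # copy forces the actual read

    # HDF5/PyTables path: the slice is read from disk (or the page
    # cache) and copied into a newly allocated NumPy buffer
    with tables.open_file('data.h5', mode='r') as f:
        big_chunk_h5 = f.root.data[:1_000_000]

Timing both loops with your real chunk sizes should tell you whether
the extra copy is really what you are paying for.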
> Probably what is happening here is that the memmap
> approach is just more I/O efficient than HDF5, but having some
> profiles
> would be nice. Do you have some that you can show?
>
> Here are two detailed profiles of the system, one for PyTables and
> another for memmap:
>
> https://hyperbrowser.uio.no/gtrackcore/u/henrik/h/profiling-memmap-vs-pytables
> To view them: Click the 'eye'-icon in each of the history items on the
> right hand side.
Yeah, after having a look at the profiles I would say that reading is
the bottleneck, and that the way memmap works versus a regular file
read explains the difference in performance.
Hmm, perhaps we are not speaking the same language. When I say row
order, I mean something like what is shown in slide 24 of
http://www.pytables.org/docs/StarvingCPUs.pdf. In your example above
you are storing just a 1-dim array, so in this case there is no concept
of 'rows' and 'columns'.
Besides, there is the additional 'complication' that EArrays are stored
in chunks, as shown in slide 14 of
http://www.pytables.org/docs/PUG-Austin-2012-v3.pdf. Typically the
chunks chosen by PyTables are such that complete rows fit in a chunk,
so it is usually better to read your EArray row-wise. In addition, and
as I suggested before, if the chunks fit comfortably in the CPU cache
(a higher-level cache is enough), you don't incur the additional copy
(for practical purposes).
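For instance, something along these lines (shapes and file names are
just placeholders) lets you check what chunkshape PyTables has chosen
and iterate row-wise:

    import numpy as np
    import tables

    # 2-d EArray: the leading (0) dimension is the extendible one, so
    # appended blocks are rows and chunks hold complete rows
    with tables.open_file('earray.h5', mode='w') as f:
        earr = f.create_earray(f.root, 'data',
                               atom=tables.Float64Atom(),
                               shape=(0, 100))
        earr.append(np.random.rand(1000, 100))
        print(earr.chunkshape)   # e.g. (some_number_of_rows, 100)

    # Row-wise iteration follows the chunk layout, so each chunk only
    # needs to be brought into the CPU cache once
    with tables.open_file('earray.h5', mode='r') as f:
        for row in f.root.data:
            pass   # process one complete row at a time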
> But on
> another hand, if your datasets are homogeneous, certainly an Array
> (or
> better, an EArray for allowing the appends like the Table object)
> could
> be a faster way to iterate. But you should do your own benchmarks.
>
> > And how large speed improvement could we expect?
>
> It is difficult to assess. On one hand, the Array containers do not
> have the overhead of having to deal with heterogeneous data
> (Table). On
> the other hand, the Table object has an extremely fine-tuned
> iterator.
> As always, there is no replacement for experimentation.
>
>
> I will probably experiment more with Arrays to see if they improve
> upon tables.
>
>
> Also, is your data compressible? If it is, you can experiment
> with that
> too, especially with the integrated Blosc compressor.
>
>
> I tried turning on compression, but it didn't improve performance.
Hmm, what is the compression ratio of your datasets? How large are
they? In theory, compression helps with I/O because you are
reading/writing less. In some cases, especially when doing benchmarks,
your files end up living in the OS page cache, so the performance is
somewhat skewed. It is usually good practice to flush the OS page cache
in order to reproduce actual disk I/O. For example, on Linux you can do
that by following this:
http://tecadmin.net/flush-memory-cache-on-linux-server/
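In case it helps, a quick sketch of how to enable Blosc and get a rough
idea of the compression ratio (parameters and shapes are made up, not a
tuned recommendation):

    import os
    import numpy as np
    import tables

    # Blosc-compressed EArray; complevel/complib are just a starting point
    filters = tables.Filters(complevel=5, complib='blosc')
    with tables.open_file('compressed.h5', mode='w') as f:
        earr = f.create_earray(f.root, 'data',
                               atom=tables.Float64Atom(),
                               shape=(0, 100),
                               filters=filters)
        earr.append(np.random.rand(1000, 100))

    # Rough compression ratio: logical size vs. on-disk size
    logical = 1000 * 100 * 8   # elements * itemsize in bytes
    print(logical / os.path.getsize('compressed.h5'))

If the ratio is close to 1, your data is basically incompressible and
compression will not buy you any I/O savings.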
-- Francesc Alted