Evon Silvia wrote:
> I like this idea because it gives the option to multi-thread the LAZ
> read/write without having simultaneous drive seeks, thereby speeding
> up the most expensive part of the LAZ IO – the (de)compression. The
> way I see this working is that the user would request a chunk of 50k
> (or whatever) points, LASzip would lock a mutex, read the raw data
> chunk from disk, unlock a mutex, and then perform the decompression on
> the raw bytes. Writing could work the same way except in reverse.
Until I saw this thread I (naively?) assumed that this was the way it
worked, since it is the "only sensible" way to do it. :-)
However, as long as Martin is using the standard C file IO or C++ stream
library calls, this happens under the hood anyway: even if you do
single-byte fgetc() file reads, the IO library will fill a 2-64K internal
buffer with a single block operation and then supply data from that
buffer. This is (among many other things) to allow the library code to
suppress CR ('\r') bytes from an input stream on a Windows platform.
I have, however, also written quite a bit of serious
compression/decompression code, in the form of DVD and BluRay decoding,
the world's fastest Ogg Vorbis decoder and probably also the fastest LZ4
code: I always allocate a significant chunk of buffer space (a multiple
of the 4K page size and common file block allocation size) and do all
subsequent processing to/from this buffer.
>
> The end result would be that multiple threads could read and write at
> the same time, but the mutex would protect against simultaneous disk IO.
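The quoted scheme, sketched minimally (the names here are invented for
illustration and are not the real LASzip API): the mutex guards only the
raw block read, and decompression happens outside the lock so several
threads can decompress concurrently.

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <vector>

std::mutex io_mutex;  // serializes disk access across threads

// Read up to n raw (still-compressed) bytes while holding the lock;
// the caller then decompresses them with no lock held.
std::vector<unsigned char> read_raw_chunk(std::FILE* f, std::size_t n) {
    std::vector<unsigned char> raw(n);
    std::size_t got;
    {
        std::lock_guard<std::mutex> lock(io_mutex); // guard only the seek/read
        got = std::fread(raw.data(), 1, n, f);
    }
    raw.resize(got);
    return raw; // decompression of these bytes runs unlocked
}
```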
You _can_ do this, but I have run tests showing clearly that it isn't
needed and can often hurt: you just need a local buffer large enough
that each OS file IO call becomes big enough. In my tests, 1 MB was the
point where a single laptop hard drive could handle at least 10
simultaneous full-resolution h264 video streams with no stutter or
glitching.
For lastile, which can easily have 1000+ open file handles, this would
require a GB or more of buffer space, so there the cutoff point should
probably be lowered.
One very interesting point, though, is that the Microsoft Visual C++
compiler library will in fact do a lot of mutex handling under the hood!
I.e. when you use the MT (multithreaded) library, every single fgetc()
byte read is internally protected by a mutex to make sure that multiple
users of the same file handle cannot stomp on each other. This is
superfluous for at least 99% of all code, even code that does a lot of
true MT work. This one misfeature means that the same C(++) source code
running on the same hardware can be 3X faster on Linux than on Windows!
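To illustrate the difference (a sketch, not LASzip code): both functions
below count the bytes in a file, but with the MSVC MT runtime the first
one can take an internal lock on every single byte, while the second
pays that cost at most once per 64K block.

```cpp
#include <cstddef>
#include <cstdio>

std::size_t count_bytes_fgetc(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::size_t n = 0;
    while (std::fgetc(f) != EOF) ++n; // one potential lock per byte
    std::fclose(f);
    return n;
}

std::size_t count_bytes_fread(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    unsigned char buf[65536];
    std::size_t n = 0, got;
    while ((got = std::fread(buf, 1, sizeof buf, f)) > 0) n += got; // one call per 64K
    std::fclose(f);
    return n;
}
```

Both return the same count; only the number of locked library calls
differs.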
>
> Alternatively, you could add a read/write interface that accepts a raw
> buffer instead of a file pointer, and then leave the actual
> reading/writing of raw bytes up to the user.
This is a good idea which follows quite naturally when you start with a
buffer interface.
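A buffer-oriented read interface could be as simple as the following
(all names invented for illustration; nothing here is part of the actual
LASzip API): the caller owns the file IO and hands the library raw bytes
to consume.

```cpp
#include <cstddef>
#include <cstring>

// User-supplied buffer of raw (compressed) bytes plus a read cursor.
struct byte_source {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos;
};

// Drop-in replacement for an internal fread(): pull up to n bytes from
// the user's buffer instead of a FILE*.
std::size_t source_read(byte_source* s, unsigned char* out, std::size_t n) {
    std::size_t avail = s->size - s->pos;
    if (n > avail) n = avail;
    std::memcpy(out, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```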
Terje
>
> Evon
>
> On Sat, Nov 26, 2016 at 9:10 AM, Martin Isenburg
> <martin....@gmail.com> wrote:
>
> Hello,
>
> under the hood of the dynamic linked library API (DLL) is the
> LASzip library that decompresses the points one by one so from a
> decompression speed point of view there is nothing to be gained
> from decompressing points in chunks (-> those will be implemented
> by decompressing single point by single point anyways). What
> would be improved is the number of DLL calls and the repeated copying
> of point attributes from and to the struct (assuming the single
> chunk that they could alternatively be decompressed into directly
> maps to your internal representation). By linking statically (LASzip is
> LGPL 2.1 with static linking exception) you could overcome the
> penalty of the repeated DLL calls (not sure how much overhead this
> is).
>
> In summary, currently laszip_read_chunk_of_points() does not
> exist. Would it be worthwhile to add this from a performance point of
> view? Maybe you could experiment yourself? It should not be too
> hard to write such a function given the info above.
>
> Regards,
>
> Martin @rapidlasso
>
> On Sat, Nov 26, 2016 at 3:07 PM, Simone Rapposelli
> <simone.r...@gmail.com>
--
- <Terje.M...@tmsw.no>
"almost all programming can be viewed as an exercise in caching"