Suitability of blosc2's frames and superchunks as a basis for random access.

Tomasz Sobczyk

Aug 23, 2020, 3:42:16 PM
to blosc
Hi,

I have a problem that I imagine is not specific to me, but I still can't find a satisfactory solution.

I need to handle very large binary files (>1 TiB) of fixed-size records that support the following operations:
- read bytes in a range, usually between 1 kiB and 100 kiB in size; effectively, relatively coarse-grained random read access
- append to the end of the file

Additionally, all metadata needs to live in the same file, i.e. the file must be self-contained.

Currently I'm thinking about rolling my own solution. It would involve blocks with an uncompressed size of at most 128 kiB or 256 kiB, an index kept at the end of the file (loaded fully into memory, serialized when the file is closed after changes; writes initially overwrite the stored index), and a transparent cache for recently decompressed blocks.
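
Roughly, the layout I have in mind looks like this (just a sketch; all names are placeholders, not an existing API):

/* Sketch of the proposed self-contained file layout:
 *
 *   [block 0][block 1]...[block N-1][index][index size]
 *
 * The index is loaded fully into memory on open and re-serialized on
 * close; appends initially overwrite the stored index region.
 */
#include <stdint.h>

typedef struct {
    uint64_t offset; /* byte offset of the compressed block in the file */
    uint32_t csize;  /* compressed size on disk */
    uint32_t dsize;  /* uncompressed size, at most 128 kiB or 256 kiB */
} block_entry;

typedef struct {
    uint64_t num_blocks;
    block_entry *entries; /* in-memory copy of the on-disk index */
    /* plus an LRU cache of recently decompressed blocks, keyed by block id */
} block_index;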

Blosc1 didn't have anything that would be of help, but the new features of blosc2 seem like the right tool for this kind of job. So can this be implemented using blosc2 features without much pain? In particular I want to know:
- Does accessing a random chunk within a superchunk have constant complexity? Would such access be HDD-friendly (i.e. ideally one read per access)?
- Can a superchunk handle tens of millions of chunks? What would be the memory and storage space overhead?
- Can appending a chunk to a (likely already large) frame-bound superchunk incur significant costs unrelated to the size of the appended chunk?
- Does a frame-bound superchunk use memory beyond what is necessary to perform the requested operations?
- Is it possible to have a superchunk with chunks of varying size (but each chunk smaller than 256 kiB)? I ask because I have sometimes had problems with this check: https://github.com/Blosc/c-blosc2/blob/73c1a4c6a052f467dc1ad877fbc5b4788495f730/blosc/schunk.c#L149. I could settle for a solution that requires padding smaller chunks.

Francesc Alted

Aug 25, 2020, 2:45:41 AM
to Blosc
Hi Tomasz,

On Sun, Aug 23, 2020 at 9:42 PM Tomasz Sobczyk <ts.tom...@gmail.com> wrote:
> Hi,
>
> I have a problem that I imagine is not specific to me, but I still can't find a satisfactory solution.
>
> I need to handle very large binary files (>1 TiB) of fixed-size records that support the following operations:
> - read bytes in a range, usually between 1 kiB and 100 kiB in size; effectively, relatively coarse-grained random read access
> - append to the end of the file
>
> Additionally, all metadata needs to live in the same file, i.e. the file must be self-contained.
>
> Currently I'm thinking about rolling my own solution. It would involve blocks with an uncompressed size of at most 128 kiB or 256 kiB, an index kept at the end of the file (loaded fully into memory, serialized when the file is closed after changes; writes initially overwrite the stored index), and a transparent cache for recently decompressed blocks.

Yes, I also think that Blosc2 should be a sensible option here. 


> Blosc1 didn't have anything that would be of help, but the new features of blosc2 seem like the right tool for this kind of job. So can this be implemented using blosc2 features without much pain? In particular I want to know:
> - Does accessing a random chunk within a superchunk have constant complexity? Would such access be HDD-friendly (i.e. ideally one read per access)?

Yes. 

> - Can a superchunk handle tens of millions of chunks? What would be the memory and storage space overhead?

Yes, a superchunk/frame can store up to 2^31 chunks. Regarding overhead, it is essentially 16 bytes (chunk header) + 8 bytes (offset) per chunk.
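
For scale: with, say, 50 million chunks, that is roughly 50e6 * 24 bytes ~= 1.2 GB of fixed per-chunk overhead, independent of how well the data itself compresses.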

> - Can appending a chunk to a (likely already large) frame-bound superchunk incur significant costs unrelated to the size of the appended chunk?

The only cost besides the actual chunk append is rewriting the trailer section (see https://github.com/Blosc/c-blosc2/blob/master/README_FRAME_FORMAT.rst#trailer), which is typically small (a few tens of bytes), plus the index of offsets (8 bytes/chunk).
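
For illustration, a minimal append against the current beta API (a sketch only; names like blosc2_new_frame() and blosc2_new_schunk() may still change before 2.0-final):

#include <blosc2.h>

#define RECORD_SIZE 32  /* hypothetical fixed record size */

int append_records(const char *path, void *records, size_t nbytes) {
    blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
    blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
    cparams.typesize = RECORD_SIZE;

    /* A disk-based frame; the superchunk on top of it is frame-bound. */
    blosc2_frame *frame = blosc2_new_frame((char *)path);
    blosc2_schunk *schunk = blosc2_new_schunk(cparams, dparams, frame);

    /* Compresses `records` into one chunk, appends it, and rewrites the
     * offsets index and the trailer. */
    int nchunks = blosc2_schunk_append_buffer(schunk, records, nbytes);

    blosc2_free_schunk(schunk);
    return nchunks;  /* total number of chunks, or negative on error */
}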

> - Does a frame-bound superchunk use memory beyond what is necessary to perform the requested operations?

Not sure about this one, but if that is critical, my recommendation is, as always: measure.

> - Is it possible to have a superchunk with chunks of varying size (but each chunk smaller than 256 kiB)? I ask because I have sometimes had problems with this check: https://github.com/Blosc/c-blosc2/blob/73c1a4c6a052f467dc1ad877fbc5b4788495f730/blosc/schunk.c#L149. I could settle for a solution that requires padding smaller chunks.

Nope, this is not yet possible. The frame format does support it, but it is not implemented yet (not a high priority for us).

Finally, remember that Blosc2 is still in beta, so the format might still change before 2.0-final is out (by the end of 2020, hopefully).

Hope this helps


Tomasz Sobczyk

Aug 25, 2020, 6:19:27 AM
to blosc
Thank you for the reply. It clarified some things.

I looked at the implementation of appending a chunk to a frame to see how exactly it behaves, and I have some concerns.
It seems that the current implementation rewrites all chunk offsets on each append, which in my case would easily reach 100x the size of an individual chunk.
Moreover, each access through frame_get_chunk reads the offsets from disk and decompresses them (https://github.com/Blosc/c-blosc2/blob/master/blosc/frame.c#L1119). This would contradict the statement that reading a random chunk has constant complexity.

As a side note, I also suspect that the current implementation does not correctly handle files larger than 2 GiB on Windows.
It uses fseek, which takes the offset as a long; on the Linux ABI a long is 64 bits, but on the Windows ABI it is 32 bits. I've run into this issue myself, and the only feasible solution was the Windows-specific _fseeki64. It could also be partially worked around by seeking relative to the end of the file, where the offsets would be smaller. If this is indeed a problem I will open an issue in the blosc2 repo later.
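
For reference, the kind of wrapper I ended up with (a sketch; on 32-bit POSIX builds you would also want -D_FILE_OFFSET_BITS=64 so that off_t is 64 bits):

#include <stdio.h>

/* fseek takes a long, which is 32 bits on the Windows ABI, so files
 * larger than 2 GiB need the Windows-specific _fseeki64 there. */
static int portable_fseek64(FILE *f, long long offset, int whence) {
#if defined(_WIN32)
    return _fseeki64(f, offset, whence);
#else
    return fseeko(f, (off_t)offset, whence);
#endif
}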

>> - Is it possible to have a superchunk with chunks of varying size (but each chunk smaller than 256 kiB)? I ask because I have sometimes had problems with this check: https://github.com/Blosc/c-blosc2/blob/73c1a4c6a052f467dc1ad877fbc5b4788495f730/blosc/schunk.c#L149. I could settle for a solution that requires padding smaller chunks.
> Nope, this is not yet possible. The frame format does support it, but it is not implemented yet (not a high priority for us).
I initially thought I could add padding to work around this issue, but on further inspection that looks like false hope. With no support for appending smaller chunks I cannot add data from multiple batches, so for now this is killer number one.

There is one more issue I see, however. Assuming chunks of variable dsize were possible, how would one know which chunk to fetch in order to read a byte range (a, b)? From the format documentation it seems that only the compressed size is stored, so it is not possible to binary-search on the chunks' dsizes.

With that said, I really hope blosc2 matures, as design-wise it is a huge step forward from blosc1, but I'm afraid that for now I'm forced to build an in-house solution with the old blosc routines as a backend (in which case, are there any advantages to using blosc2 over blosc1 apart from the new filters? Has the other stuff improved somewhat?).

Valentin Haenel

Aug 25, 2020, 6:40:09 AM
to bl...@googlegroups.com, Tomasz Sobczyk
Hi Tomasz,

* Tomasz Sobczyk <ts.tom...@gmail.com> [2020-08-25]:
> With that said, I really hope blosc2 matures, as design-wise it is a
> huge step forward from blosc1, but I'm afraid that for now I'm forced
> to build an in-house solution with the old blosc routines as a backend
> (in which case, are there any advantages to using blosc2 over blosc1
> apart from the new filters? Has the other stuff improved somewhat?).

Just briefly: it sounds like you could benefit from looking at:

https://github.com/blosc/bloscpack

While it is an older program, it is a very simple way to blosc-compress
data to files; it has append capabilities and it can store (compressed)
metadata. Maybe it can serve your needs, and if you can't use it
directly, perhaps it can serve as an inspiration for writing your
in-house solution?

Best wishes,

V-

Tomasz Sobczyk

Aug 25, 2020, 7:50:13 AM
to blosc
Thanks for mentioning this, but it doesn't solve any of my problems. It still has severe limitations when appending: "However there are certain limitations on the amount of data can be appended. For example, if there is an offsets section, there must be enough room to store the offsets for the appended chunks. If no offsets exists, you may append as much data as possible given the limitations governed by the maximum number of chunks and the chunk-size." And it has no random access. Also, I need a C API.

> "perhaps it can serve as an inspiration for writing your in-house solution"
I already have a good specification for my solution; I was just looking for existing library solutions first.

Tomasz Sobczyk

Aug 26, 2020, 6:25:11 AM
to blosc
> It uses fseek, which takes the offset as a long.
Looking into it further, it seems there is actually a define at the top of frame.c for _fseeki64 on Windows. Quite confusing.

Francesc Alted

Aug 27, 2020, 1:46:34 PM
to Blosc
You have raised good points.

Regarding chunk offsets during appends: yes, I forgot about the chunk-offset rewrite, whose cost can be significant, especially when the chunksize is relatively small (which seems to be your case). On a side note, the typical scenario where frames are efficient is when chunks are relatively large (~10 MB or more is usual for modern CPUs with large L3 caches) and up to a few million chunks are added. In that case, the cost of rewriting the offsets is of the same order of magnitude as the chunk being appended (and typically quite a bit less, since the offsets compress very well, and quickly).
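
To put rough numbers on it: a million 10 MB chunks means 8 B * 1e6 = 8 MB of raw offsets, i.e. slightly less than one chunk; with 128 kiB chunks the same 8 MB of offsets is ~64x one chunk, which is in line with your ~100x estimate.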

And regarding frame_get_chunk() performance, the function responsible for getting the chunk offsets (get_coffsets()) has an internal cache, so the offsets need to be read and decompressed only once; that is the reason for the constant complexity.

As regards chunks of variable size, both the uncompressed and compressed sizes are stored in every chunk header (https://github.com/Blosc/c-blosc2/blob/master/README_CHUNK_FORMAT.rst). So, for random access, one would need to traverse all the chunks once in order to gather that data and build the in-memory index. Although this is an expensive operation, it only needs to be done once. Accelerating it would mean adding a chunk similar to the offsets chunk but for chunksizes, and that would require a change in the frame format.
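
A sketch of that one-time traversal plus the byte-to-chunk lookup (using blosc2_schunk_get_chunk() and blosc_cbuffer_sizes(), which exist in the current beta; exact signatures may still shift):

#include <blosc2.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Traverse the chunk headers once to build a cumulative index:
 * cum[i] is the uncompressed byte offset where chunk i starts,
 * and cum[nchunks] is the total uncompressed size. */
int64_t *build_dsize_index(blosc2_schunk *schunk) {
    int64_t n = schunk->nchunks;
    int64_t *cum = malloc((size_t)(n + 1) * sizeof(int64_t));
    cum[0] = 0;
    for (int64_t i = 0; i < n; i++) {
        uint8_t *chunk;
        bool needs_free;
        blosc2_schunk_get_chunk(schunk, (int)i, &chunk, &needs_free);
        size_t nbytes, cbytes, blocksize;
        blosc_cbuffer_sizes(chunk, &nbytes, &cbytes, &blocksize);
        cum[i + 1] = cum[i] + (int64_t)nbytes;
        if (needs_free) free(chunk);
    }
    return cum;
}

/* Binary search: return the index of the chunk containing byte `pos`. */
int64_t chunk_for_offset(const int64_t *cum, int64_t n, int64_t pos) {
    int64_t lo = 0, hi = n;  /* invariant: cum[lo] <= pos < cum[hi] */
    while (hi - lo > 1) {
        int64_t mid = lo + (hi - lo) / 2;
        if (cum[mid] <= pos) lo = mid; else hi = mid;
    }
    return lo;
}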

Finally, you can see a list of the extra features in C-Blosc2 here: https://github.com/Blosc/c-blosc2/blob/master/ROADMAP.md. For a more up-to-date account of the features that we added recently, see: https://blosc.org/posts/mid-2020-progress-report/.

Francesc


--
Francesc Alted

Tomasz Sobczyk

Aug 27, 2020, 3:37:46 PM
to blosc
Indeed, get_coffsets caches the index; I missed that.

Thank you for answering my questions.