Ogawa and (zlib?) data compression


Milan Bulat

Jan 22, 2015, 9:09:42 AM
to alembic-d...@googlegroups.com
Is there any way to make Ogawa compress sample data?

Looking at the code, it seems to ignore the Archive's setCompressionHint.

As far as I understood, one of Ogawa's advertised features was better support for compression (i.e. no need to chunk data up).

Thanks,
Milan

Lucas Miller

Jan 22, 2015, 12:40:38 PM
to alembic-d...@googlegroups.com
After extensive testing with zlib, and exploring much faster alternatives like lz4 and snappy, I found that compressing sample data greatly decreased read performance, so I left it out of AbcCoreOgawa. Compression could theoretically help with certain very large data sets, but those are unlikely to be encountered in the average Alembic file.
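Lucas's point about per-sample compression can be illustrated with a toy comparison (a Python sketch on synthetic data, not Alembic code): compressing each sample independently resets the compressor's dictionary every time, so many small compressed chunks cost more than one large stream, and every read pays a separate decompression call.

```python
import zlib

# Synthetic "sample" data: a repetitive 4 KB pattern standing in for one
# property sample. Illustration only, not Alembic code.
sample = bytes(range(256)) * 16          # one 4 KB sample
samples = [sample] * 256                 # 256 samples, ~1 MB total

# Per-sample compression: the dictionary resets for every sample,
# and every read pays a separate decompression call.
per_sample = sum(len(zlib.compress(s)) for s in samples)

# Whole-stream compression: one pass over the concatenated data.
whole = len(zlib.compress(b"".join(samples)))

print(per_sample, whole)
assert whole < per_sample  # one stream compresses far better than many small ones
```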

I don't believe Ogawa ever advertised better compression support; you most likely confused that with improved data sharing.

Lucas

--
You received this message because you are subscribed to the Google Groups "alembic-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alembic-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Milan Bulat

Jan 22, 2015, 5:32:12 PM
to alembic-d...@googlegroups.com
I would think that if HDD / server speed is the bottleneck, rather than CPU cycles, compression would help?

A prime example would be simulation datasets, where de-duplication does not help much. I've found that compressing Alembic files with Rar / Zip reduces their size by 2x to 5x. That translates to a 2x to 5x speed increase in read-from-platter compared to uncompressed data, and reduces server load.

I've just tested compressing an Alembic file containing a sphere of 500,000 polygons, and it shows a 7x decrease in file size when Rar-ed.

Thanks,
Milan 

Lucas Miller

Jan 22, 2015, 5:39:14 PM
to alembic-d...@googlegroups.com
You didn't account for the amount of time it takes to do the decompression in your calculation. 5x less data isn't a 5x speedup in reads; it's 5x less transfer time plus the decompression time for that data.

Decompressing one large dataset (the whole file) is also better than having to decompress several much smaller datasets (per-sample data).
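The trade-off both sides are describing can be put into a back-of-envelope model: reading compressed data costs transfer time for the smaller file plus decompression time for the full uncompressed size. All numbers below are illustrative (17 ns/byte is roughly gzip-class decompression speed); this is a sketch, not a measurement.

```python
# Illustrative model: compression only wins when transfer, not
# decompression, dominates. Sizes and bandwidths are made up.
def read_time_s(raw_bytes, ratio, bandwidth_Bps, decompress_ns_per_byte):
    transfer = (raw_bytes / ratio) / bandwidth_Bps          # move the smaller file
    decompress = raw_bytes * decompress_ns_per_byte * 1e-9  # decode the full size
    return transfer + decompress

GB = 1e9
raw = 5 * GB

# Fast local disk (~500 MB/s): decompression dominates, compression loses.
local_plain = read_time_s(raw, 1, 500e6, 0)    # 10 s
local_zip = read_time_s(raw, 5, 500e6, 17)     # 2 s transfer + 85 s decode

# Saturated file server (~10 MB/s per client): transfer dominates, compression wins.
net_plain = read_time_s(raw, 1, 10e6, 0)       # 500 s
net_zip = read_time_s(raw, 5, 10e6, 17)        # 100 s transfer + 85 s decode

assert local_plain < local_zip and net_zip < net_plain
```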

Lucas

Milan Bulat

Jan 22, 2015, 5:55:05 PM
to alembic-d...@googlegroups.com
Yep, but if you have enough CPU cycles to do it and not enough server bandwidth, it would make a lot of sense.

Also, it sounds like the main performance issue in that case would be interleaving HDD reads with decompression. Did you do that as well in your tests?

Thanks,
Milan

Lucas Miller

Jan 22, 2015, 5:57:02 PM
to alembic-d...@googlegroups.com

> Also, it sounds like the main performance issue in that case would be interleaving HDD reads with decompression. Did you do that as well in your tests?


Yes. 

Milan Bulat

Jan 22, 2015, 6:14:47 PM
to alembic-d...@googlegroups.com
Would you be willing to share your patch if you have it handy? I'd like to put it through its paces over here.

Ben Houston

Jan 22, 2015, 6:27:15 PM
to alembic-d...@googlegroups.com

Our testing shows that lzma works amazingly well on mesh data: 2x better than gzip.

Best regards,
Ben Houston
http://Clara.io Online 3d modeling and rendering



Lucas Miller

Jan 22, 2015, 6:34:32 PM
to alembic-d...@googlegroups.com
Sadly no, it was something quickly tested and abandoned without being formally committed.

Lucas

Lucas Miller

Jan 22, 2015, 8:25:13 PM
to alembic-d...@googlegroups.com
2x faster decompress times on your average mesh data, or 2x better compression ratios?

Lucas

Ben Houston

Jan 22, 2015, 8:53:21 PM
to alembic-d...@googlegroups.com
LZMA made 2x smaller files than gzip on standard polymesh data streams - no joke, I tried a number of representative files and LZMA was just awesome. The main issue with LZMA is that it is slow to decompress compared to GZIP - quite a bit slower.

The solution to LZMA's slow decompression time is LZHAM: https://code.google.com/p/lzham/ But I haven't tested LZHAM in any real way, I've just read about it. I'm not sure LZHAM can easily be compiled on all platforms yet.

The ultimate decompression codec comparison is here (it lists LZ4, LZHAM, and LZMA) and looks at size and compression/decompression speed:

http://mattmahoney.net/dc/text.html
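One concrete reason LZMA can beat gzip by this much is dictionary size: zlib/gzip can only match against the previous 32 KB of data, while LZMA's dictionary spans megabytes. A contrived Python sketch of that effect on synthetic data (not real mesh streams, where the ratios will differ):

```python
import lzma
import random
import zlib

# Contrived demonstration: a 256 KB pseudo-random "frame" repeated 8 times,
# mimicking near-identical data spaced far apart in a file.
rng = random.Random(0)
frame = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
data = frame * 8

z = len(zlib.compress(data, 9))         # 32 KB window: cannot see the repeats
x = len(lzma.compress(data, preset=6))  # multi-MB dictionary: can

print(len(data), z, x)
assert x < z // 2  # LZMA comes out well over 2x smaller here
```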

Best regards,
Ben Houston
Voice: 613-762-4113 Skype: ben.exocortex Twitter: @exocortexcom
http://Clara.io - Professional-Grade WebGL-based 3D Content Creation

Milan Bulat

Jan 23, 2015, 5:09:06 AM
to alembic-d...@googlegroups.com
I think users could benefit from having 3 compression modes:

1. None
2. Streaming (LZHAM)
3. Archive (LZMA)

and they could pick what suits their needs the best.

3. is obvious: for scenarios where disk space or server bandwidth is problematic (i.e. large-scale render-farm rendering, where frame render time dwarfs decompression speed and only ~1 frame needs to be decompressed).

2. is questionable for scenes with lots of objects and small sample sizes. It sounds like it would do very well for typical VFX data (large samples, a low number of objects, no deduplication), due to the less frequent need to restart the decompressor.

Lucas, I assume the decompressor needs to be restarted for each Alembic property? There is no way to (de)compress the whole schema sample in one go?

Thanks,
Milan

Ben Houston

Jan 23, 2015, 9:37:49 AM
to alembic-d...@googlegroups.com
I'm not endorsing adding compression to Alembic -- I'm neutral. But if you did, the compression difference between LZMA and LZHAM is fairly minor, so just go with LZHAM if it is stable. Having both is unnecessary.

Here are the relevant lines:

7zip/lzma: resulting size: 178,965,454, compression speed: 503 ns/byte, decompression speed: 546 ns/byte, memory used: 1630
lzham: resulting size: 206,549,091, compression speed: 595 ns/byte, decompression speed: 9 ns/byte, memory used: 4800
gzip: resulting size: 322,630,796, compression speed: 101 ns/byte, decompression speed: 17 ns/byte, memory used: 1.6

I think compressing meshes gives about the same ratios in my testing, although I only tested gzip and lzma.

Best regards,
Ben Houston
http://Clara.io Online 3d modeling and rendering

Lucas Miller

Jan 23, 2015, 3:14:44 PM
to alembic-d...@googlegroups.com

> Lucas, I assume decompressor needs to be restarted for each Alembic property? There is no way to (de)compress the whole schema sample in one go?


At the lowest level, the individual property samples get shared. A schema is made up of several properties, and each property can have a number of samples, so there wouldn't be a good way to decompress whole schemas; instead, you would look to do it where it makes sense for the individual property samples.

Lucas 
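Lucas's description suggests any compression hook would live at the per-property-sample level. A minimal sketch of what that could look like (CompressedSampleStore is a hypothetical illustration, not Alembic API):

```python
import zlib

# Hypothetical illustration, not Alembic API: each property sample is
# compressed as its own independent stream, which is why the
# decompressor effectively restarts for every sample read.
class CompressedSampleStore:
    def __init__(self):
        self._samples = {}

    def write_sample(self, prop_name, index, raw):
        # Each sample is a self-contained zlib stream; no shared dictionary.
        self._samples[(prop_name, index)] = zlib.compress(raw)

    def read_sample(self, prop_name, index):
        # Random access to any one sample without touching the others.
        return zlib.decompress(self._samples[(prop_name, index)])

store = CompressedSampleStore()
store.write_sample("P", 0, b"\x00" * 4096)
assert store.read_sample("P", 0) == b"\x00" * 4096
```

The per-sample streams preserve random access (any sample decodes alone), at the cost of the ratio and read-speed penalties discussed earlier in the thread.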

Simon Haegler

Feb 23, 2016, 4:57:47 AM
to alembic-discussion
FYI, I did some stress testing of our Alembic exporter:
procedural city model: 92G
7zip (ultra): 5G
-> compression factor = 18

Of course, nobody uses .abc files this big in production, and 7zip took about 4h ;-)
the data is available here: https://esri.box.com/s/hplqq5cpdrrp058y9bh01b9iotxfr3i8 (nyc_all_high)