A faster Alembic back-end: Ogawa


Steve LaVietes

May 8, 2013, 5:42:02 PM
to alembic-d...@googlegroups.com

Guided by feedback both internal and external, continual improvement of Alembic's performance remains an ongoing goal of the project. And, over time, it has become clear that the dependence upon HDF5 (as Alembic uses it) is a significant hurdle to further performance gains.


Specifically:

1) HDF5 is thread-safe but not thread-efficient. As client tools increasingly rely upon concurrent access for reading scene data, this matters more and more. (See details here: http://www.hdfgroup.org/hdf5-quest.html#tsafe)

2) While Alembic atop HDF5 stores large datasets very efficiently, it carries significant structural overhead for scenes with many small objects and properties.


Fortunately, Alembic's API design has an abstraction layer atop the underlying data format (on disk or otherwise). We initially chose HDF5 as the container format based on its history of successful use within ILM/Imageworks, but we designed it to be swappable should the need arise in the future.


With a few years of experience with Alembic now, it’s clear we need a back-end data format that meets the following requirements:


1) A simple API to easily and efficiently express the higher-level Alembic APIs without change to client code.

2) Minimal library dependencies

3) Data sharing for Alembic's de-duplication features

4) Input/output abstraction (for potentially reading from non-file sources)

5) Thread-efficient reading

6) Low overhead for small data cases


We’ve looked extensively and haven’t found anything freely available that meets our requirements without further trade-offs. So, we have prototyped a new library (Ogawa) and integrated it with Alembic for testing. If you’re curious, you can see the work in progress here:

https://code.google.com/r/millerlucas-dev/source/browse?name=ogawa


As this is a prototype version, we will be spending the next few months gathering and vetting suggestions from the community and honing the implementation to certify it for production use. We are excited by its prospects as -- even at this early stage -- we're seeing these significant improvements in alignment with our goals for Alembic’s future:

1) File sizes are on average 5-15% smaller. Scenes with many small objects should see even greater reductions.

2) Single-threaded reads average around 4x faster

3) Multi-threaded reads can improve by 25x (relative to the same operations in the existing HDF5 back-end) on 8-core systems.


We’re pretty enthusiastic about it.


Of course, backwards compatibility remains another key goal of the Alembic project, and as such the ability to read and write Alembic data with the existing HDF5 back-end will continue to be supported for the foreseeable future. Only very minor changes to client code are required to read the new format transparently alongside the old. (This relates only to the instantiation of the archive itself and should be confined to a line or two of code.)


We look forward to your feedback and any questions you may have.


Steve LaVietes

Jonathan Gibbs

May 8, 2013, 5:54:40 PM
to alembic-d...@googlegroups.com
This is really really exciting. Keep it up, Alembic team!!!

--jono



Nicholas Yue

May 8, 2013, 5:54:47 PM
to alembic-d...@googlegroups.com




Hi Steve,

  Should one plan to be able to use the Ogawa layer independently for other VFX tools? I.e., treat it like a better HDF5.

Cheers
--
Nicholas Yue
Graphics - RenderMan, Visualization, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools

Steve LaVietes

May 8, 2013, 6:05:12 PM
to alembic-d...@googlegroups.com
While Ogawa is potentially useful outside of Alembic, it's not intended to be a general-purpose replacement of HDF5. Rather, it's the minimal (in terms of complexity and overhead) set of functionality necessary to support the higher-level Alembic APIs. It's closer to a stream format than to a self-describing structured hierarchical container.

-stevel




Ben Houston

May 8, 2013, 6:44:47 PM
to alembic-d...@googlegroups.com
Very very cool.

Best regards,
-ben
--
Best regards,
Ben Houston
CTO, Exocortex Technologies, Inc.
http://www.exocortex.com

Alex Suter

May 8, 2013, 8:47:14 PM
to alembic-d...@googlegroups.com
I've been testing it a bunch here at ILM, and I can attest to the speed improvements, particularly on caches with very deep hierarchies. Caches are smaller on disk as well.

I also want to mention that it drops right into current code. There's a nifty factory object that will return a valid archive object for either HDF5- or Ogawa-based Alembic caches. It's a two-line change to read the new format, and a one-line change to write it. I'm a big fan.
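
For anyone curious, the change might look roughly like this (a sketch; names follow the prototype's AbcCoreFactory and AbcCoreOgawa headers, so treat the exact signatures as approximate):

#include <Alembic/Abc/All.h>
#include <Alembic/AbcCoreFactory/IFactory.h>
#include <Alembic/AbcCoreOgawa/All.h>

// Reading: the factory detects the back-end, so the same two lines
// handle both HDF5- and Ogawa-backed caches.
Alembic::AbcCoreFactory::IFactory factory;
Alembic::AbcCoreFactory::IFactory::CoreType coreType;
Alembic::Abc::IArchive inArchive = factory.getArchive("scene.abc", coreType);
// coreType now reports which back-end was detected.

// Writing: pick the Ogawa WriteArchive instead of the HDF5 one.
Alembic::Abc::OArchive outArchive(
    Alembic::AbcCoreOgawa::WriteArchive(), "sceneOut.abc");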

Take it for a spin and let us know what you see.

Steven Caron

May 9, 2013, 12:57:01 AM
to alembic-d...@googlegroups.com
awesome news!

does this also mean improved build support across platforms? years ago when i was first looking at alembic, i quickly noticed how hard it was to build HDF5 on windows. since then helge and ben at exocortex have taken care of that for us. but anything to ease the build process is always welcome for novice programmers like myself.

thanks
steven

Steve LaVietes

May 9, 2013, 1:05:30 AM
to alembic-d...@googlegroups.com
Yes and no.

The yes part: Ogawa itself introduces no external dependencies and (thus far) appears to compile cleanly on Windows.
The no part: For backwards compatibility with existing HDF5-based archives, AbcCoreHDF5 is still present.

It's possible to build without HDF5 but you sacrifice the ability to read older archives.

-stevel

Steven Caron

May 9, 2013, 1:43:15 AM
to alembic-d...@googlegroups.com
ah, that makes sense. well, alembic is still pretty new; maybe it won't be long before we can shed hdf5 entirely.

s

Ivan Busquets

May 9, 2013, 3:04:46 AM
to alembic-d...@googlegroups.com
That's fantastic news. Really exciting.

Are there plans to add a command-line converter to help test performance vs older archives?

Thanks,
Ivan 




Diego Garces

May 9, 2013, 3:09:55 AM
to alembic-discussion
That's great news. Are there any recommended versions for the different dependencies? Are you using ilmbase 2.0.0 for your tests or the stable 1.0.3?

Thanks for all the great work,
Diego


Steve LaVietes

May 9, 2013, 7:46:23 AM
to alembic-d...@googlegroups.com, alembic-discussion
The core library is fine with 1.0.3 or 2.0.0. The optional Python bindings depend on features from 2.0.0 (as well as on Boost.Python).

Performance should be identical in either case, as ilmbase is mostly used (in the core library) for typing the array data.

-stevel

Lucas Miller

May 9, 2013, 11:08:12 AM
to alembic-d...@googlegroups.com

A command-line converter is already included with the prototype. For more details see:

https://code.google.com/r/millerlucas-dev/source/browse?name=ogawa#hg%2Fexamples%2Fbin%2FAbcConvert

Francois

May 9, 2013, 5:03:01 PM
to alembic-d...@googlegroups.com
Just got Ogawa compiled last night; I'll test it out more today, but I haven't seen any issues so far. Everything went smoothly!

Francois



Ivan Busquets

May 11, 2013, 3:17:25 PM
to alembic-d...@googlegroups.com
Oh, cool!
I had checked out the default branch instead of the "ogawa" branch :S

Many thanks,
Ivan

Michel Lerenard

May 13, 2013, 3:09:38 AM
to alembic-d...@googlegroups.com
Awesome news! We have some speed issues on large scenes; I can't wait to test Ogawa on them.

Steve, are there any known drawbacks/bugs that we should be aware of?





Michel Lerenard

Michel Lerenard

May 13, 2013, 5:42:13 AM
to alembic-d...@googlegroups.com
Hi,

I've tried to compile the branch, but there is a compile issue in the OBJ convert tool.
In AbcReader I get:

/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:60: error: no 'void AbcClients::WFObjConvert::AbcReader::vt(Alembic::AbcCoreAbstract::v5::index_t, double)' member function declared in class 'AbcClients::WFObjConvert::AbcReader'
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:70: error: no 'void AbcClients::WFObjConvert::AbcReader::vt(Alembic::AbcCoreAbstract::v5::index_t, const Imath::V2d&)' member function declared in class 'AbcClients::WFObjConvert::AbcReader'
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:80: error: no 'void AbcClients::WFObjConvert::AbcReader::vt(Alembic::AbcCoreAbstract::v5::index_t, const Imath::V3d&)' member function declared in class 'AbcClients::WFObjConvert::AbcReader'
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:90: error: no 'void AbcClients::WFObjConvert::AbcReader::vn(Alembic::AbcCoreAbstract::v5::index_t, const Imath::V3d&)' member function declared in class 'AbcClients::WFObjConvert::AbcReader'
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp: In member function 'virtual void AbcClients::WFObjConvert::AbcReader::f(const AbcClients::WFObjConvert::Reader::IndexVec&, const AbcClients::WFObjConvert::Reader::IndexVec&, const AbcClients::WFObjConvert::Reader::IndexVec&)':
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:117: error: 'm_texIndices' was not declared in this scope
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:125: error: 'm_normIndices' was not declared in this scope
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp: In member function 'void AbcClients::WFObjConvert::AbcReader::makeCurrentObject()':
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:157: error: 'm_texIndices' was not declared in this scope
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:158: error: 'm_texVertices' was not declared in this scope
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:184: error: 'm_normIndices' was not declared in this scope
/home/michel-local/src/3rdparty/millerlucas-dev/examples/AbcClients/WFObjConvert/AbcReader.cpp:185: error: 'm_normals' was not declared in this scope
make[2]: *** [examples/AbcClients/WFObjConvert/CMakeFiles/AbcWFObjConvert.dir/AbcReader.cpp.o] Error 1

Apparently there is a header mixup due to multiple include paths being available: replacing the original

#include <AbcClients/WFObjConvert/AbcReader.h>

with

#include "AbcReader.h"

fixes the problem.


Steve LaVietes

May 13, 2013, 11:11:13 AM
to alembic-d...@googlegroups.com
We are not aware of any bugs in the current Ogawa and AbcCoreOgawa implementation. Any testing or feedback you can provide is greatly appreciated.

The only potential drawback is that plug-ins compiled against older versions of the library cannot read the Ogawa-backed archives. This is the first time we've broken *forwards* compatibility -- and we don't do so lightly. (We still maintain *backwards* compatibility for reading and writing HDF5-backed archives from the newer library version.)

-stevel

Michel Lerenard

May 13, 2013, 11:45:32 AM
to alembic-d...@googlegroups.com
Hi,

I've started to modify my app to use the factory to get an archive that can read Ogawa files. My first request: a function that returns a pointer to an IArchive.
I can create one using the type returned by factory.getArchive, but it would be quicker if one function directly returned a pointer.


I made a few tests and so far it is indeed very promising.
We have an Alembic test scene created using a script (with a lot of duplication in it), weighing about 225 MB. It takes just under 100 seconds to import it into our app (including what we need to do internally).
I've converted the scene using the tool; the file now weighs 54 MB (I guess the converter de-duplicated a lot of things). Import took 42 seconds. For the parse part alone, I'd put the performance boost at more than 5x.

I'll run a benchmark later and post detailed results here. I'll try to compare scenes with little duplicated data, to check whether the performance gain is the same.

Steven Caron

May 13, 2013, 4:59:15 PM
to alembic-d...@googlegroups.com
i just wanted to ask again a question which presented itself on this list a while ago... alembic crashing when a file it's reading is updated/overwritten. was this behavior imposed by HDF5 or does it live in the alembic api (abstraction layer)? will moving to ogawa mean no more crashes?

s

Lucas Miller

May 13, 2013, 5:13:29 PM
to alembic-d...@googlegroups.com
This isn't necessarily Alembic- or HDF5-specific; it's something that can happen when keeping any file open for deferred reading while accidentally trying to open it for a truncated write.

Alembic backed by Ogawa should behave similarly, as writing and reading continue to be segregated.

You can also close all references to the archive before opening it for rewriting.
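
As a sketch of that last point (plain RAII scoping; the file and archive names here are hypothetical):

{
    Alembic::AbcCoreFactory::IFactory factory;
    Alembic::Abc::IArchive reader = factory.getArchive("scene.abc");
    // ... pull whatever data is needed ...
}   // reader, and its open file handle, are released here

// Only now open the same path for a truncating write.
Alembic::Abc::OArchive writer(
    Alembic::AbcCoreOgawa::WriteArchive(), "scene.abc");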

Lucas



Steven Caron

May 13, 2013, 5:30:13 PM
to alembic-d...@googlegroups.com
ok, just wanted to know if HDF5 was the culprit and if shedding it would work around it.

"You can also close all references to the archive before opening it for rewriting."

yes, now it's just a matter of knowing if the file has changed. the plugin maintainers (exocortex in this case) have proposed some options for this. was just wishfully hoping :)

thanks guys
s

Michel Lerenard

May 14, 2013, 8:01:16 AM
to alembic-d...@googlegroups.com
More feedback on speed.

I've performed some tests and I have to say: I like Ogawa!

All tests have been performed using Clarisse iFX, on a dual Xeon E5645 (12 cores/24 HT) with 12 GB of RAM and a "simple" SATA drive (no RAID, no SSD). I'm working with CentOS 6.

The tests I've done have two parts:
- import: parse the file and gather information, then create data in the app. This part is single-threaded.
- time before render (tbf): read geometric data from nodes to build up a scene. This part is multi-threaded; when building a scene, I have up to 24 threads reading the file at the same time. If data is deformed, this includes building a cache containing array digests (done once).

Files have been converted using the AbcConvert tool from Lucas' branch.

Times: Ogawa vs. HDF5

Building 1: 33k objects (67k including single-child xforms), max depth 10, file size 225 MB (54 MB with Ogawa), lots of duplicates:
    import: 9s / 34s
    tbf: ~3s / ~10s

Building 2: 7k objects (14k including single-child xforms), max depth 9, file size 240 MB (191 MB with Ogawa), few duplicates:
    import: 2s / 8s
    tbf: ~6s / ~20s

Animated characters: 20 files, 591 objects, about 15-20 per file, max depth 2. Total file size 803 MB (777 MB with Ogawa), very few duplicates:
    import: average per file: 3ms / 15ms
    tbf: immediate / ~5s
The very good thing on this particular scene is that we're now able to play the animation. Using HDF5 we could not get more than 1 fps (around 0.8); now we can play it at more than 20 (23.8 average)!
I've attached a screenshot of the scene so you can get an idea of its complexity. Each object is stored in a separate file and has several meshes. (Courtesy of Cluser Studio.)

I haven't encountered a single crash; it's really stable. Amazing job!

One thing that could be interesting is to modify the converter to accept wildcards for batch-converting files (adding _ogawa to the output filename, or replacing the original file).

Alex Suter

May 14, 2013, 12:09:10 PM
to alembic-d...@googlegroups.com
Great news. That's in line with the gains we are seeing as well. 

       -- Alex

Sent from Mailbox for iPhone



Sam Assadian

May 14, 2013, 5:49:07 PM
to alembic-d...@googlegroups.com
Guys, as more figures are coming in, it's getting more and more exciting! :)

A suggestion, if it's not already in the pipe: is it possible to store a UUID/hash/MD5 by standard in ABCs when using Ogawa?
That would be even more awesome. Basically it would resolve, once and for all, the ABC update issues we are having, which Michel mentioned weeks ago in his topic.

Cheers,

Barnaby Robson

May 14, 2013, 6:08:32 PM
to alembic-d...@googlegroups.com
Hi Sam,

What exactly would be in the hash? Is the hash of a parent expected to recursively cover all the hashes of the children?

barnaby.




Lucas Miller

May 14, 2013, 6:21:45 PM
to alembic-d...@googlegroups.com
Hi Barnaby,

The previous discussion from Michel that Sam was hinting at is here:

Lucas



Barnaby Robson

May 14, 2013, 6:25:58 PM
to alembic-d...@googlegroups.com
Thanks Lucas, so actually what was requested in that thread was just a mandatory timestamp at the very top level. That seems much more reasonable than getting into the details of a hierarchical hashing scheme.

barnaby.



Sam Assadian

May 14, 2013, 6:29:48 PM
to alembic-d...@googlegroups.com
Hi Barnaby,

Basically, a hash to identify the ABC itself. I'm not asking for a unique identifier that depends on the file content (obviously that would be the ideal solution, as we could even detect redundant files or copies during import...).

No, what I would love to see is an identifier in the ABC that would let us detect instantly whether an ABC has changed. Basically it would be used to resync Clarisse with updated ABCs. For now we mainly rely on file name/date/size, which is completely unreliable, as many users tend to copy their ABCs locally.

Jonathan Gibbs

May 14, 2013, 7:50:37 PM
to alembic-d...@googlegroups.com
I might have said this in the other thread. All our in-house file formats store a uuid in their header, generated by libuuid (man uuid_generate).
It's really useful.
--jono



Francois

May 14, 2013, 7:51:31 PM
to alembic-d...@googlegroups.com
I second that... we're having the same issue now with our abcProxy in Maya. It would be great to know if the file has changed on disk, and a unique abcId for the file would make that job very easy (we could use the timestamp as well, but it's not as nice, and Linux timestamps seem to be kinda flaky here...).

Lucas Miller

May 14, 2013, 8:34:38 PM
to alembic-d...@googlegroups.com
You can always put a UUID into the MetaData string of the OArchive.

If there is a reliable multi-platform way to do this that wouldn't add additional external dependencies, we would add it to Abc/ArchiveInfo.cpp.
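
For illustration, the write side could look like this sketch (Boost.Uuid here is just one possible multi-platform generator, and the CreateArchiveWithInfo call follows Abc/ArchiveInfo.h; treat the exact signature as approximate):

#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_generators.hpp>
#include <boost/uuid/uuid_io.hpp>
#include <Alembic/Abc/All.h>
#include <Alembic/AbcCoreOgawa/All.h>

// Generate a fresh UUID for this write and stash it in the archive MetaData.
Alembic::Abc::MetaData md;
md.set("uuid", boost::uuids::to_string(boost::uuids::random_generator()()));

Alembic::Abc::OArchive archive = Alembic::Abc::CreateArchiveWithInfo(
    Alembic::AbcCoreOgawa::WriteArchive(), "scene.abc",
    "myExporter 1.0",    // application writer string (hypothetical)
    "example archive",   // user description
    md);

// A reader could then fetch it back with something like:
//   std::string uuid = iArchive.getPtr()->getMetaData().get("uuid");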

Lucas

Stephen Parker

May 14, 2013, 8:42:27 PM
to alembic-d...@googlegroups.com
Boost supports generating UUIDs.



Barnaby Robson

May 14, 2013, 8:42:49 PM
to alembic-d...@googlegroups.com
A combination of the time, the MAC address, and a random number would probably be unique enough for our purposes, right? I'm pretty sure we can find some open-source code to get the MAC address in a cross-platform way.

barnaby.

Luke Emrose

May 14, 2013, 9:13:19 PM
to alembic-discussion
Why would you want to put the MAC address and time in the hash for a file?

That will simply reproduce the issue people are already having....

You want a hash entirely based on the contents of the file, to ensure that duplicate files can be identified.

Adding machine or time specific information will actually make the issues worse, since the uniqueness of the file will be compromised.

Unless I am missing the point.

Stephen Parker

May 14, 2013, 10:42:49 PM
to alembic-d...@googlegroups.com
A checksum like an .md5 would be the ideal solution, but I don't think anyone wants a sidecar file lying around. What's being offered is a compromise: if you generate a new random unique number every time you write out the Alembic file, you have something to compare against some other persistent record to tell whether the file has changed. It's a bit limiting what you can do with just a unique identifier, but I suppose it's better than nothing.




Jonathan Gibbs

May 15, 2013, 2:20:25 PM
to alembic-d...@googlegroups.com
I think there are two different threads here:

1. A uuid in the file will tell you if one file is a copy of another, like if a file in a cache is the same as a source file. As files move through a pipeline, you can see if a file has been just copied along or rewritten. This can be done with libuuid on Linux (man uuid). There are a few methods in there, but one is a MAC address + time + random bits.

2. A checksum/hash of the file's contents will tell you if two files generated at different times have the same contents. This is more powerful than the uuid, but more expensive.

--jono

Ben Houston

May 15, 2013, 2:24:29 PM
to alembic-d...@googlegroups.com
> 2. A checksum/hash of the file's contents will tell you if two files
> generated at different times have the same contents. This is more powerful
> than the uuid, but more expensive.

Hierarchical hashes are not expensive at all if the hashes of the data streams are already computed (which they are with Alembic). I think a 5000-node hierarchical hash can be computed in 0.1 seconds or less if you are smart about it, whereas actually writing out that file would take many seconds, so the cost is likely to be less than 1% of the total, if not much less. I think the usefulness of hierarchical hashes would be really long-term, and much greater than that of uuids. The caching potential is pretty huge.
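
In the abstract, the structure looks something like this sketch (toy Node type and combiner; Alembic's existing per-array digests would play the role of propertyDigest, and a real implementation would fold 128-bit digests rather than size_t):

#include <cstddef>
#include <vector>

struct Node
{
    std::size_t propertyDigest;   // digest of this node's own data
    std::vector<Node> children;
};

// Fold one digest into another (boost::hash_combine-style toy combiner).
std::size_t combineDigests(std::size_t seed, std::size_t h)
{
    return seed ^ (h + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// A node's hierarchical hash covers its own digest plus, recursively,
// those of all descendants; unchanged subtrees reuse leaf digests that
// were already computed at write time, which is why this is cheap.
std::size_t hierarchicalHash(const Node &n)
{
    std::size_t acc = n.propertyDigest;
    for (std::size_t i = 0; i < n.children.size(); ++i)
        acc = combineDigests(acc, hierarchicalHash(n.children[i]));
    return acc;
}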

--
Best regards,
Ben Houston
CTO, Exocortex Technologies, Inc.
http://www.exocortex.com

Sam Assadian

May 16, 2013, 6:15:11 AM
to alembic-d...@googlegroups.com
That would be really cool, especially since 95% of the job is already done.

Alex Suter

May 16, 2013, 1:06:35 PM
to alembic-d...@googlegroups.com
Agreed that it's really cool, but it's tricky to nail down exactly what should be included in the hash. For example, do you include UVs? Topology? Hierarchy and points should be included, but sometimes you want to know if those match between caches while not caring if UVs do. Do you include other metadata like string properties on the objects? If you do, you run into situations where the hashes don't match because a date changed, or little things like that.

You could have one hash for points and hierarchy, another for UVs and topological information, but finding a single way to break it down for everyone is definitely a challenge.



Ben Houston

May 16, 2013, 1:15:16 PM
to alembic-d...@googlegroups.com
A classic hierarchical hash would be different for a node if anything is different on that node. Hierarchical hashes are first and foremost comprehensive at the root, but you can traverse down the tree to find out exactly where the differences are and what isn't different.

Given that there are already hashes for data blocks (which UVs and topology are), you can traverse these sub-hashes to see if they changed. I think having metadata invalidate the node's hash is an important feature, because that metadata may influence how one interprets the node, thus necessitating an update.

Basically there is a sub-hash for each sub-item. Properties on a node that are not in a hashed data block would need to be hashed as a separate item and then combined into the node's hash.

I'd suggest not writing dates on each individual node but rather on the base node. Thus one can save out a new file with just a new date, and while the primary hash is different (because a hierarchical hash is fully comprehensive), once you traverse into the tree you'll notice it is all the same.

This scheme is at least as efficient for update detection as a random UUID, and it has the potential to be drastically more efficient. And all of us developers can implement these improved caching structures incrementally.

Best regards,
-ben

Aghiles

May 16, 2013, 2:34:12 PM
to alembic-d...@googlegroups.com

Very nice that you guys are thinking about supporting something other than crappy HDF5, a file format that was already behind in terms of technology and performance 10 years ago.

Still, why not take the format that naturally fits all your needs: http://libgit2.github.com.

Yes, git (the library).

Advantages:

- Speed and compactness that you won't be able to match (unless you hire a small army of Linus clones)
- Handles hashes and compression *by design*. This also works across frames, and across anything for that matter. (Basically you won't have to deal with hashes in Alembic anymore; libgit2 takes care of that.)
- A unique cryptographic hash of the content. (An Alembic file will have a "unique" hash.)
- Versioning: save an Alembic file and then save more iterations of work on top of it, roll back iterations, etc. Versioning is of course space-efficient because of the hashing/compression.
- Internet protocol: pull and push Alembic files from/to the network using a safe protocol for work sharing.
- A multi-platform library that actually compiles.
- File safety guaranteed by design: if such an Alembic file gets corrupted, it will be immediately known on the first read (you won't read possibly bad data).
- ...

Aghiles
 





Alex Suter

May 16, 2013, 4:02:38 PM
to alembic-d...@googlegroups.com
A classic hierarchical hash would solve the unique cache ID problem, sure. So if all you're trying to figure out is whether a cache has changed on disk, you'd be fine.

We tend to want to answer questions like, "Is the animation in this cache still compatible with the rig of this other one?" That requires us to break out the hashing a bit more: one hash for hierarchy and naming, another for animation and deformation, another for topology and UVs, etc.

It could still work with the scheme you detail, but we'd be interested in having those compatibility hashes bubble up to the root object so we can quickly identify the answer to that question without walking the entire hierarchy.

We can do this with our own string properties, of course, but if others see value in standardizing we should hash it out.

I am deeply sorry about the pun.

                                 -- Alex

Jeremy Cowles

May 16, 2013, 4:18:33 PM
to alembic-d...@googlegroups.com
For what it's worth: we implemented a similar hashing scheme internally, tracking the specific topological changes that we care about (assuming this would be the most useful thing downstream), along with logic to notify the user and show visual debugging cues in renders. This sounds great in theory, but it literally /never/ gets used, because we update our caches so frequently.

We also have unique IDs per file, which we now explicitly remove for network efficiency (among other arcane reasons).

Congrats on the release of Ogawa!

--
Jeremy

Ryan Galloway

May 16, 2013, 4:30:23 PM
to alembic-d...@googlegroups.com

We developed a similar lib that generates a sidecar file to store hierarchy, topology and other attr hashes.

I have to agree with Alex. The definition of compatible seems highly contextual and pipeline dependent. Perhaps one day we can all conform to a consolidated VFX pipeline spec, but until then it seems like the best way to track files is still with a good ol' fashioned AMS.

That said, perhaps there are a few canonical attributes that could be hashed, but even then it's easy to do this in the export tool and store it in the metadata if you're averse to sidecar files.

RE: libgit2, interesting idea. I wonder if it could work. I think we need to be careful about portability with too many back-ends, though.

Michel Lerenard

May 17, 2013, 4:53:40 AM
to alembic-d...@googlegroups.com
My 2 cents on the hash key use.


"Is the animation in this cache still compatible with the rig of this other one?"
=> We do handle this kind of question at runtime in Clarisse, using array property digests, and in my opinion this should not be stored in the file as an attribute: each application has its own way of using the information, so there's no way to know whether the hash attribute will be efficient. Even worse, an application can decide to use a property or not depending on some attributes. Example: we can use UVs, or not. If the UVs property changes and we don't use UVs, the resource is still valid.

My initial remark was about having a way to quickly identify whether a whole file has changed. For me it would be enough to have a hash key on the root node. For the reasons stated above, I'm not sure individual hash keys would be really efficient:
- If the file is saved and loaded by the same app, it would work. You can set a hash key on a node that is calculated from the use you make of it (i.e. not taking all properties into account).
- From one app to another it would be mostly useless: the key would have to reflect the full content of the node, including all properties stored in it, whether the app decides to use them or not, since you don't know which properties will be used. Such a hash could not be used to identify a resource, only to tell whether the data has changed. Which means that if the data has changed, you still have to check manually whether you have to update your resource in memory (see the UVs example above).

Lucas Miller

May 17, 2013, 12:58:01 PM
to alembic-d...@googlegroups.com
My takeaway so far is that hierarchical hashes are useful, but aren't sufficient to cover all of the workflows in which someone might consider a shape or a partial hierarchy to be the same.

They're great at determining whether things are EXACTLY the same, but not for cases where a mesh is exactly the same as another except for an extra arbitrary attribute, or a partial hierarchy is the same but the names on the transforms are slightly different.

I'm still on the fence as to whether including them will be worth the time to calculate (I know this can be done in a reasonable time on write) and the extra disk space.

Lucas

Lerenard Michel

May 17, 2013, 3:06:12 PM
to alembic-d...@googlegroups.com
That was my point: it won't be standard. Detecting whether data in a file has changed and identifying a resource are two separate things.

Maybe adding helper functions so people can easily retrieve digests from properties would be a good idea; that way we can decide how we want to compute a custom hash key and store it in an extra custom attribute. Let people compute the key they want, if they want to.
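
Something along these lines, for instance (getKey/ArraySampleKey follow the AbcCoreAbstract headers; the choice of properties and the naive concatenation are purely illustrative):

#include <Alembic/AbcGeom/All.h>

// Illustrative only: build a custom resource key from the per-sample
// digests Alembic already stores for array properties, folding in only
// the properties this app actually consumes (positions and topology
// here, deliberately not UVs, per the example above).
std::string customMeshKey(Alembic::AbcGeom::IPolyMeshSchema &schema)
{
    Alembic::AbcCoreAbstract::ArraySampleKey key;
    std::string combined;

    if (schema.getPositionsProperty().getKey(key))
        combined += key.digest.str();
    if (schema.getFaceIndicesProperty().getKey(key))
        combined += key.digest.str();

    return combined; // hash this again, or store it in a custom attribute
}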



Alex Suter

May 17, 2013, 3:14:36 PM
to alembic-d...@googlegroups.com
We have our own solution for the more fine-grained hashes that we require (and wouldn't want to inflict it on others, as it is probably unique to our workflow).

If we had these full hashes all the way down the hierarchy, like .childBnds (and possibly optional, like .childBnds), we could use them to determine branches that are identical in the cache. That might be useful.

Outside of that, I don't think we'd personally have a use for it.

Lucas Miller

Jun 4, 2013, 1:42:10 PM
to alembic-d...@googlegroups.com
After careful consideration and compromise, we've decided to introduce the hierarchical hashes on IObject/OObject.

The current implementation has already been released to my work-in-progress branch:

It currently stores two hashes per IObject: one which represents all of its properties, and one which represents all of its descendant objects.

http://code.google.com/r/millerlucas-dev/source/browse/lib/Alembic/AbcCoreAbstract/ObjectReader.h?name=ogawa#154

Reading these hashes is optional, so current read times remain unchanged.

Since calculating the hashes proved to be relatively unobtrusive, we have currently decided against making it optional when writing Ogawa data.

These hashes won't be generated when writing HDF5 data.
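
For the curious, reading them back might look roughly like this (accessor names per the linked ObjectReader.h; prototype API, so treat it as approximate):

#include <Alembic/Abc/All.h>

// Sketch: compare two objects via their stored hierarchical hashes.
// The accessors return false when no hash was written (e.g. for
// HDF5-backed archives), in which case a deep compare is needed.
bool sameSubtree(Alembic::Abc::IObject &a, Alembic::Abc::IObject &b)
{
    Alembic::Util::Digest aProp, aChild, bProp, bChild;

    if (a.getPropertiesHash(aProp) && a.getChildrenHash(aChild) &&
        b.getPropertiesHash(bProp) && b.getChildrenHash(bChild))
    {
        return aProp == bProp && aChild == bChild;
    }
    return false; // hashes unavailable
}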

Lucas

Ben Houston

Jun 4, 2013, 8:58:35 PM
to alembic-d...@googlegroups.com
Whoa, that was unexpected, but a nice development. I am confident that a lot of stuff can piggyback on this, even by comparing embedded sub-trees of hashes (not just specific nodes), and it will pay dividends over time. Also, the formal computer science name for this hashing structure is a "Merkle tree":
http://en.wikipedia.org/wiki/Merkle_tree
-ben