pyalembic: Faster vertex data access


Tyler Fox

Dec 14, 2017, 7:53:45 PM12/14/17
to alembic-discussion
Loading Alembic vertex data for use in Python can be very slow, so I went looking for a faster way. I'm going to list the approaches I found, to make this message easier to stumble upon with the search function later.

My sample data is 562 frames of a 35877 vertex mesh.

Option 1: Vertex access.
This is the slow way that I wanted to improve upon. Also, it's noticeably faster to index the vertex components than it is to unpack them.

On my sample data, this took about 48 seconds.
data1 = np.array([[(vert[0], vert[1], vert[2]) for vert in sample] for sample in prop.samples])

Option 2: Component access.
V3fArray objects have component accessors.  Rather than reading each vertex (with a lot of Python object creation overhead) you can access the .x, .y, and .z components of the arrays individually.

On my sample data, this took about 26 seconds.
data2 = np.array([(list(sample.x), list(sample.y), list(sample.z)) for sample in prop.samples])
data2 = data2.transpose((0, 2, 1))  # (frames, 3, verts) -> (frames, verts, 3); a flat reshape would scramble the components
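Worth spelling out: the component arrays stack up as (frames, 3, verts), and recovering per-vertex rows takes an axis swap, not a flat reshape (a reshape just regroups consecutive x values into fake "vertices"). A quick pure-NumPy toy, with plain lists standing in for the imath component arrays sample.x / sample.y / sample.z, shows the layout:

```python
import numpy as np

# Toy stand-in for two samples of four vertices each; the three lists per
# sample play the role of the .x, .y, and .z component arrays.
samples = [
    ([0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]),
    ([4, 5, 6, 7], [14, 15, 16, 17], [24, 25, 26, 27]),
]
data = np.array(samples)            # shape (frames, 3, verts) = (2, 3, 4)
verts = data.transpose((0, 2, 1))   # shape (frames, verts, 3): rows are (x, y, z)
```

Here verts[0, 0] is the first vertex (x0, y0, z0) of the first frame, whereas data.reshape((2, -1, 3))[0, 0] would be (x0, x1, x2) -- three x components, not a vertex.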

Option 3: imathnumpy
Did you know that, along with the imath module, there's a *separate* imathnumpy module? Because I sure didn't. And as of this writing, there are only 4 Google results for it, so it seems nobody else knew either.
That said, there's a *bit* of a caveat with this one. The numpy array returned from imathnumpy.arrayToNumpy is really just a memoryview of the imath array's data, so if the source imath array gets garbage collected, your numpy array will contain junk data. To fix this, wrap the call in an array copy as shown.

On my sample data, this took about 5 seconds on a fresh run (about 1.2 seconds on subsequent runs).
import imathnumpy
data3 = np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True) for s in prop.samples])
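The copy=True part matters for the reason above: without it you keep a view into memory you don't own. The view-vs-copy distinction is plain NumPy behavior and can be demonstrated without imath at all:

```python
import numpy as np

owner = np.arange(6, dtype=np.float32)  # stand-in for the imath array's storage
view = owner[:]                         # basic slicing: shares memory with owner
safe = np.array(owner, copy=True)       # independent storage, detached from owner

owner[0] = 99.0                         # simulate the owner's data changing underneath you
# view[0] now reflects the change (99.0); safe[0] is still the original 0.0
```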


So I think we have a winner. However, there was something else I saw that *may* have beaten imathnumpy ... if it wasn't bugged.

Option 4: Serialization *BUGGED*
IArrayProperty has a serialize() method. It looks like it *should* read the sample data of a property, and return it as a string. This would, of course, be extremely useful for reading data directly into numpy without the slow stopover in python.
However, every single type of property I tried gives me this error:
TypeError: No to_python (by-value) converter found for C++ type: class std::basic_stringstream<char,struct std::char_traits<char>,class std::allocator<char> >
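For a sense of why a working serialize() would be so useful: raw bytes can be handed straight to numpy with no per-vertex Python loop at all. This is only a hypothetical sketch -- `raw` below is a stand-in payload of four fake float32 vertices, not actual serialize() output:

```python
import numpy as np

# Hypothetical: pretend `raw` is the bytestring a fixed serialize() would
# return for one V3fArray sample (four vertices of three float32s).
raw = np.arange(12, dtype=np.float32).tobytes()

# Decode the whole sample in one shot, no Python-level iteration.
verts = np.frombuffer(raw, dtype=np.float32).reshape(-1, 3)
```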


TL;DR: Use imathnumpy. It's roughly 10x faster on a fresh run (closer to 40x on warm runs). Just make sure to copy the array as in the example; otherwise you'll get garbage data.

~T.Fox

Dorian FEVRIER

Dec 15, 2017, 7:49:06 AM12/15/17
to alembic-discussion, Tyler Fox
Thanks for the tips Tyler!



Lucas Miller

Dec 15, 2017, 3:26:06 PM12/15/17
to alembic-d...@googlegroups.com
Thank you! This info was really interesting.

As we start thinking about changes to accommodate Python 3, is there a
new hotness we should look into to more quickly deal with large
arrays?

Lucas

Tyler Fox

Dec 15, 2017, 6:40:08 PM12/15/17
to alembic-discussion
The first one, I think, is just to make the .serialize() method work. That would quickly get the bytestring data out of the alembic files without having to worry about corrupted memoryviews or extra module imports.
Plus, it looks like it would work for all data types, so long as we interpret the bytestring correctly.

Then, of course, we have the opposite problem: Building arrays for output.
Currently, we have to iterate through our data in Python to load something like a V3fArray, when that could (should?) be handled in C++.
Updating the pyilmbase code to use the python buffer protocol would go a long way to fix that.
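To sketch what buffer-protocol support could look like from the Python side (this is an illustration only -- the bytearray stands in for storage a buffer-aware V3fArray might expose; no such API exists today):

```python
import numpy as np

# Stand-in for the backing store of a 4-vertex V3fArray:
# 4 verts * 3 components * 4 bytes per float32.
storage = bytearray(4 * 3 * 4)

# np.frombuffer wraps any writable buffer zero-copy; writes land in `storage`
# directly, with no per-vertex Python iteration.
arr = np.frombuffer(storage, dtype=np.float32).reshape(-1, 3)
arr[:] = [[1.0, 2.0, 3.0]] * 4
```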

And once that change happens, it needs to be in the tutorials post-haste.

And since I've got your ear right now, I've got a general pyalembic complaint
Automatic garbage collection does not play well with alembic.
I can't count the number of times Python garbage collection wasn't called in time to destroy the wrapped OArchive object before I copied the .abc file. And sometimes even gc.collect() won't fix it (circular references keeping some OObjects alive, I think).
In Python, please give us a .forceWrite() method or *something* that guarantees the archive is written and closed.
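For anyone hitting this in the meantime, one workaround pattern is to scope the archive inside a function so its refcount drops deterministically at return, then sweep for cycles. FakeArchive and write_cache below are made-up names for illustration; the stand-in destructor plays the role of the OArchive teardown that actually flushes the file:

```python
import gc

class FakeArchive:
    # Stand-in for a wrapped OArchive whose C++ destructor flushes
    # and closes the .abc file.
    flushed = []

    def __del__(self):
        FakeArchive.flushed.append(True)  # pretend the file is written here

def write_cache():
    archive = FakeArchive()
    # ... create OObjects and write samples ...
    # return nothing that holds a reference to `archive`

write_cache()  # refcount hits zero at return; the destructor runs here
gc.collect()   # sweep any reference cycles that might still pin objects
```

This only helps when nothing outside the function keeps a reference alive; circular references into OObjects can still defeat it, which is exactly why a guaranteed close method would be better.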

~T.Fox

Tyler Fox

Jan 24, 2018, 4:01:10 PM1/24/18
to alembic-discussion
I take it back. I apparently didn't think things through completely.
The imathnumpy python module handles the output as well as input.
I talked about a caveat in the first post of this chain: the array from imathnumpy.arrayToNumpy is effectively a memoryview of the imath data.
Well, you can write to a memoryview as well as read from it. *smacks forehead*

from imath import V3fArray
from imathnumpy import arrayToNumpy
import numpy as np

# pts is a (N, 3) shaped np.array
array = V3fArray(len(pts))
memView = arrayToNumpy(array)
np.copyto(memView, pts)

~T.Fox