Table vs CArray

Ken Walker

Jun 20, 2018, 3:29:08 PM
to pytables-users
I'm looking for insights regarding when to use a Table and when to use a CArray.
As I understand:
- Tables are always 2D with unstructured data types.
- All array types can have any dimension (1D, 2D, 3D, etc).
- Arrays can be NumPy arrays and scalars.
- CArrays can have mixed types (defined by Atom class), similar to NumPy unstructured arrays.
If I have that right, what's the difference between a Table and a 2D CArray (especially if I don't need an enlargeable array)?
Are there advantages of one over the other?

For background, I'm working with HDF5 data provided to me (I don't control the format).
All datasets are saved in Table format, and PyTables table methods have been sufficient to find and manipulate the data I need.
The data values are floats with ints that represent location and time IDs.
Here are 2 examples of exported .coldtypes and .coldtypes.shape:
LOC_ID := int64, ()
UX := float64, ()
UY := float64, ()
UZ := float64, ()
TIME_ID := int64, ()

LOC_ID := int64, ()
VALUE := ('<f8', (6,)), (6,)
TIME_ID := int64, ()
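
For readers who want to reproduce these two layouts, here is a minimal sketch that rebuilds them as NumPy structured dtypes. The column names and types come from the listing above; the variable names `scalar_dtype` and `vector_dtype` are made up for illustration, and the column order is assumed to match the listing.

```python
import numpy as np

# First layout: three scalar float64 columns between the two ID columns.
scalar_dtype = np.dtype([
    ("LOC_ID", "<i8"),
    ("UX", "<f8"),
    ("UY", "<f8"),
    ("UZ", "<f8"),
    ("TIME_ID", "<i8"),
])

# Second layout: one float64 column holding 6 values per row.
vector_dtype = np.dtype([
    ("LOC_ID", "<i8"),
    ("VALUE", "<f8", (6,)),
    ("TIME_ID", "<i8"),
])

# Print each column's base type and per-row shape, similar to the listing.
for dtype in (scalar_dtype, vector_dtype):
    for name in dtype.names:
        field = dtype[name]
        print(f"{name} := {field.base}, {field.shape}")
    print(f"bytes per row: {dtype.itemsize}\n")
```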

A typical dataset has about 1e5 LOC_IDs for each TIME_ID.
The number of TIME_IDs is highly variable. Some datasets will have 1, other sets could have 1000s.
The largest dataset might have 500,000 LOC_IDs and 20,000 TIME_IDs (1e10 rows).
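
To put the largest case in perspective, here is a rough back-of-the-envelope sketch of the row count and raw size. It assumes 8 bytes per int64/float64 value and no compression or per-chunk overhead, so the numbers are only an uncompressed lower bound, not what the file will actually occupy.

```python
# Upper-bound dimensions quoted for the largest dataset.
n_loc = 500_000
n_time = 20_000
n_rows = n_loc * n_time

# Bytes per row for the two column layouts (int64 and float64 are 8 bytes each).
row_bytes_scalar = 2 * 8 + 3 * 8  # LOC_ID, TIME_ID + UX, UY, UZ
row_bytes_vector = 2 * 8 + 6 * 8  # LOC_ID, TIME_ID + VALUE[6]

# Raw, uncompressed table size in decimal gigabytes (1 GB = 10**9 bytes).
table_gb_scalar = n_rows * row_bytes_scalar // 10**9
table_gb_vector = n_rows * row_bytes_vector // 10**9

print(f"rows: {n_rows:,}")
print(f"scalar layout: {table_gb_scalar} GB uncompressed")
print(f"vector layout: {table_gb_vector} GB uncompressed")
```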
I usually want to access all LOC_ID data for one TIME_ID (but there are times I slice in other ways).
table.where and table.read have been very useful to slice the data as needed.
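
As an illustration of that access pattern, here is a minimal sketch on a tiny synthetic table. The file path, the node name `disp`, and the sizes are hypothetical; only the column layout is taken from the first example above. It uses `Table.read_where`, which returns the matching rows as a NumPy structured array, alongside a plain `Table.read` over a row range.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Column layout copied from the first example in this post.
row_dtype = np.dtype([
    ("LOC_ID", "<i8"),
    ("UX", "<f8"),
    ("UY", "<f8"),
    ("UZ", "<f8"),
    ("TIME_ID", "<i8"),
])

n_loc, n_time = 4, 3  # tiny hypothetical sizes
rows = np.zeros(n_loc * n_time, dtype=row_dtype)
rows["LOC_ID"] = np.tile(np.arange(n_loc), n_time)
rows["TIME_ID"] = np.repeat(np.arange(n_time), n_loc)
rows["UX"] = rows["LOC_ID"] + 10.0 * rows["TIME_ID"]

path = os.path.join(tempfile.mkdtemp(), "demo_table.h5")
with tb.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "disp", obj=rows)  # hypothetical node name

    # In-kernel query: every LOC_ID row for one TIME_ID.
    one_step = table.read_where("TIME_ID == 1")

    # Plain range read: the first n_loc rows, with no condition applied.
    first_rows = table.read(start=0, stop=n_loc)

print(one_step["LOC_ID"], one_step["UX"])
```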

Are there any benefits to reorganizing the Table into a 3D CArray with the same columns/rows except using the LOC_ID as the third dimension?
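
To make the question concrete, here is one possible 3D layout as a minimal sketch. The axis order (TIME_ID, LOC_ID, component), the node name `disp_3d`, the chunk shape, and the tiny sizes are all assumptions chosen for illustration, not a recommendation. The sketch uses a single `Float64Atom`, so LOC_ID and TIME_ID are no longer stored as columns; they become positions along the first two axes.

```python
import os
import tempfile

import numpy as np
import tables as tb

n_time, n_loc = 3, 4  # tiny hypothetical sizes
path = os.path.join(tempfile.mkdtemp(), "demo_carray.h5")

with tb.open_file(path, mode="w") as h5:
    carray = h5.create_carray(
        "/", "disp_3d",                # hypothetical node name
        atom=tb.Float64Atom(),
        shape=(n_time, n_loc, 3),      # last axis holds UX, UY, UZ
        chunkshape=(1, n_loc, 3),      # one chunk per TIME_ID slab (assumed)
    )

    # Fill one TIME_ID slab at a time; the values are placeholders.
    for t in range(n_time):
        carray[t, :, :] = np.full((n_loc, 3), float(t))

    # All LOC_IDs for one TIME_ID: a single slab along the first axis.
    one_step = carray[1, :, :]

    # UX history for one LOC_ID across all TIME_IDs (touches every chunk).
    one_loc = carray[:, 2, 0]

print(one_step.shape, one_loc)
```

With this chunk shape, reading one TIME_ID maps to a single chunk, while reading one LOC_ID across time has to visit every chunk, so the chunk shape would need to follow whichever slice is most common.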

Thanks,
-Ken