Jagged array operations, coming to uproot 3.0


Jim Pivarski

Aug 11, 2018, 9:20:43 AM

to scikit-h...@googlegroups.com

Hi everyone,

We're currently preparing uproot for version 3.0, which will add two basic features:

* ability to write ROOT files (just simple objects at first, eventually building up to histograms and TTrees)

* remove physics objects and special array types from the main uproot repository and move them to specialized libraries: "uproot-methods" for physics objects and "awkward-array" for arrays

The first is coming together at a good pace, developed by DIANA-HEP summer student Pratyush Das. We can already make ROOT files containing arbitrarily many TObjStrings, which is not particularly useful for physics but is a solid foundation for understanding the structure of this file format.

The second is what I wanted to e-mail everyone about, because a core piece of awkward-array is ready for testing: Numpy-like operations on jagged arrays.

If you've ever opened a ROOT file containing std::vector<something> (or equivalent) in uproot, you've encountered jagged arrays. They are collections containing variable-length subcollections, unlike 2D arrays, which are fixed-size in both dimensions. In uproot 2.x, you can read these objects and execute nested for loops over their contents, but this is a different execution model and much slower than the Numpy operations you can apply to flat arrays.
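To make the idea concrete, here is a minimal sketch in plain Numpy of one common way to store a jagged array: a single flat array of values plus offsets marking where each subcollection starts and stops. The names `content`, `counts`, and `offsets` are chosen for illustration only, and this is not the awkward-array API.

```python
import numpy as np

# Three "events" holding 3, 0, and 2 values, written as nested Python lists.
events = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

# Columnar layout (an assumption for illustration): all values in one flat
# array, plus offsets saying where each event begins and ends.
content = np.array([x for event in events for x in event])
counts = np.array([len(event) for event in events])
offsets = np.concatenate(([0], np.cumsum(counts)))

# Event i is the slice content[offsets[i]:offsets[i + 1]].
second_event = content[offsets[1]:offsets[2]]   # the empty event
third_event = content[offsets[2]:offsets[3]]    # the last two values
```

With this layout, any elementwise operation on `content` touches every value in every event at once, with no Python-level nested loop.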

Among other things, the awkward-array library defines Numpy-like operations on jagged arrays: multidimensional indexing, slicing, masking, and fancy indexing. In most cases, Numpy's rules extend naturally to jaggedness: e.g. reducing (sum, min, max) a flat array gives you a scalar, so reducing a jagged array gives you a flat array. If a jagged array represents events that contain particles, masking it with a flat array of booleans filters events, but masking it with a jagged array of booleans of the same structure filters particles. There's even a "per-event cross-join" algorithm developed by Google Summer of Code student Jaydeep Nandi, which simulates nested for loops over pairs of particles, all with Numpy.
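The reduction and masking rules can be sketched with plain Numpy on a flat-values-plus-counts layout. This only illustrates the semantics; it is not the awkward-array API or its implementation, and the names `content`, `counts`, and `parents` are hypothetical.

```python
import numpy as np

# A jagged array of 3 events holding 3, 0, and 2 particles (illustrative
# layout: flat values plus a per-event count).
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
counts = np.array([3, 0, 2])

# For each particle, the index of the event it belongs to: [0, 0, 0, 2, 2].
parents = np.repeat(np.arange(len(counts)), counts)

# Reducing a jagged array gives a flat array: one sum per event.
# The empty event contributes 0.
sums = np.bincount(parents, weights=content, minlength=len(counts))

# A flat mask (one boolean per event) filters whole events.
event_mask = counts >= 3                      # keep only the first event
kept_counts = counts[event_mask]
kept_content = content[event_mask[parents]]   # particles of kept events

# A jagged mask (one boolean per particle) filters particles and keeps
# every event, including those that end up empty.
particle_mask = content > 2.0
new_content = content[particle_mask]
new_counts = np.bincount(parents[particle_mask], minlength=len(counts))
```

Here the flat mask leaves one event with all three of its particles, while the jagged mask leaves all three events, holding 2, 0, and 2 particles.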

Below is an online, interactive demo of the new features. They won't be available in uproot until version 3.0, but the demo lets you get a first look at what's coming and maybe make suggestions if you need something that isn't here.

https://mybinder.org/v2/gh/scikit-hep/awkward-array/0.0.5?filepath=binder%2Fjagged-arrays.ipynb

(The demo doesn't use HEP examples because I'm using it to communicate with computer scientists as well. Interest in jagged arrays goes beyond HEP!)

If you want to run these outside of the Binder notebook, note that at least one feature has been found to depend on Numpy 1.14+. This will be fixed; eventually, the library should support versions back to Numpy 1.8.

Cheers,

Jim
