Vector representation

35 views

Skip to first unread message

Eric Holk

unread,

Oct 2, 2012, 6:40:01 PM10/2/12

to harla...@googlegroups.com

During the DPP meeting today, we spent a lot of time discussing the way vectors are represented, and it seems like we could make some changes that would open up more optimization opportunities.

Currently, vectors are stored in memory as a length field followed by the contents. Values of type vector are conceptually a pair of a pointer to a region and an offset into that region. However, the Harlan compiler removes the pointer to the region and tracks this statically through other means.

The change we discussed today was removing the length field of the in-heap portion of the vector, and instead represent vector references as a (region *, offset, length) triple, although again the region pointer would be a purely static entity. This makes all vector references an extra word long, but it also makes it easier for us to do 2D kernels on ragged structures. Say we have the following structure:

(let ((X (vector (vector 1) (vector 2 3) (vector 4 5 6)))) ...)

If we wanted to do (kernel ((row X)) (kernel ((i row)) (add1 i)), we could generate a one dimensional kernel with 6 threads. Each thread could have enough information to figure out which leaf it is operating on, and could simply do that part.

Of course, it seems like there are a few complications, such as allocating the output shape array (which is harder if the innermost body were something like (iota i) instead of (add1 i)), but I'm sure we can figure out a way to do this.

As another step, we could make vectors include the length of each inner vector, as well as the prefix-sum of all the length to the left of an element. This would let threads do a binary search to find their target instead of a linear search.

Ultimately, there are a lot of things we could include. For example, in addition to the outermost length, and the prefix sum, we might want the total size of all nested data. Of course, there are times when this isn't useful. For example, we might only be kerneling over the two outermost dimensions of a 3D data structure. Would we want to store the size at each level?

It seems like this technique could also generalize well to other data structures, such as trees. We could have all types include some notion of a size, which is basically the number of elements a kernel would process. For trees, this might be the number of leaves in the left or right subtree. The bookkeeping for these should be fairly simple, because all of our data structures are immutable once constructed.

Anyway, these is a really quick summary of some of the things we talked about doing in our meeting today. Thoughts? Questions?

Reply all

Reply to author

Forward

0 new messages