Hi all,
New to the community, and thanks in advance for reading! I'm considering using GraphLab for a large probabilistic inference problem on astronomical images, and I'm wondering whether GraphLab has certain features that would be crucial for us. Our model can be represented by two "layers" of vertices, each distributed (roughly) uniformly over space, where each vertex is connected only to "nearby" vertices in the other layer. The top layer has S source nodes (corresponding to latent stellar objects), and the bottom layer has N observation nodes (pixel data, represented either as one node per pixel or as one node per image patch; we haven't settled on which). Because each edge only links vertices that are close together in space, the number of edges K is much smaller than S*N (K << S*N). I'm relatively confident that our inference algorithm can be expressed as Gather-Apply-Scatter steps on these vertices and edges.
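For concreteness, here is a rough sketch of the graph structure described above, in plain Python rather than any GraphLab API. It assumes one vertex per pixel and a hypothetical fixed interaction radius (in pixels) between a source and the pixels it touches; the real model's notion of "nearby" may well differ.

```python
import math
from typing import List, Tuple

def build_bipartite_edges(
    sources: List[Tuple[float, float]], width: int, height: int, radius: float
) -> List[Tuple[int, int]]:
    """Link each source to every pixel whose center lies within `radius` of it.

    Pixel (px, py) has id py * width + px and center (px + 0.5, py + 0.5).
    Only pixels in each source's bounding box are examined, so the work is
    proportional to the number of edges K rather than to S * N.
    """
    edges = []
    r2 = radius * radius
    for s, (sx, sy) in enumerate(sources):
        # Bounding box of the disc around the source, clamped to the image.
        x0, x1 = max(0, math.floor(sx - radius)), min(width - 1, math.floor(sx + radius))
        y0, y1 = max(0, math.floor(sy - radius)), min(height - 1, math.floor(sy + radius))
        for py in range(y0, y1 + 1):
            for px in range(x0, x1 + 1):
                dx, dy = px + 0.5 - sx, py + 0.5 - sy
                if dx * dx + dy * dy <= r2:
                    edges.append((s, py * width + px))
    return edges

# Toy example: 3 sources on a 100x100 image (N = 10,000 pixel vertices).
sources = [(10.0, 10.0), (50.0, 40.0), (90.0, 80.0)]
edges = build_bipartite_edges(sources, width=100, height=100, radius=3.0)
print(len(edges), "edges vs S*N =", len(sources) * 100 * 100)  # 96 edges vs S*N = 30000
```

Even in this tiny example each source touches only 32 of the 10,000 pixels, which is the K << S*N sparsity we'd hope the framework can exploit.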
With that context, I'd like to pick your brains about two features we'd need. It's hard to tell from the tutorials and the API documentation whether GraphLab supports them, and if it does, what should I look for in the documentation?
1. Since the observation data (N * data_per_observation_node) is large (multiple terabytes), we'd like to minimize data movement between machines. Is there a way to tell GraphLab to do this, i.e., to give observation node n, and operations on its local neighborhood, an affinity for machine g, instead of distributing it to a random machine every time the scheduler decides to update n's local neighborhood? And can GraphLab handle fault tolerance for machine g's vertices if machine g goes down?
2. A priori, we do not know the number of sources S, and we'd like to vary it using reversible-jump Metropolis-Hastings. This would correspond to adding or removing source vertices and edges in the graph between iterations or during GAS rounds. Is it possible to do this while the system is running, i.e., while update steps are occurring? Or would we need to start with a pool of disabled source vertices and enable them as needed? That might be very inefficient, since the graph would need to contain every plausible edge between a potential source vertex and the observation vertices.
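To make question 1 concrete, here is roughly the placement we have in mind, written as plain Python with no GraphLab calls: a fixed spatial tiling in which each observation vertex is pinned to the machine that owns its image tile, and each source vertex to the machine owning the tile its position falls in. The `tile_owner` helper and the 2x2 grid are hypothetical choices for illustration only. With this layout, only edges that straddle a tile boundary would need cross-machine traffic.

```python
from collections import Counter
from typing import Dict, Tuple

def tile_owner(px: int, py: int, width: int, height: int, grid: Tuple[int, int]) -> int:
    """Machine id for pixel (px, py) when the image is cut into a gx-by-gy grid of tiles."""
    gx, gy = grid
    tx = px * gx // width   # tile column, 0..gx-1
    ty = py * gy // height  # tile row, 0..gy-1
    return ty * gx + tx

width, height, grid = 8, 8, (2, 2)  # toy image split across 4 machines

# Pin every pixel vertex to the machine owning its tile; this map never changes between updates.
pixel_owner: Dict[int, int] = {
    py * width + px: tile_owner(px, py, width, height, grid)
    for py in range(height)
    for px in range(width)
}

# A source vertex goes to the machine that owns the pixel containing its position.
source_pos = (3.5, 3.5)  # sits exactly on the corner shared by all four tiles
source_machine = tile_owner(int(source_pos[0]), int(source_pos[1]), width, height, grid)

# Edges from that source (id 0) to the four pixels around it; count those that cross machines.
edges = [(0, 3 * width + 3), (0, 3 * width + 4), (0, 4 * width + 3), (0, 4 * width + 4)]
cut = sum(1 for _, n in edges if pixel_owner[n] != source_machine)

print(Counter(pixel_owner.values()))  # 16 pixels per machine
print(cut, "of", len(edges), "edges cross machines")  # 3 of 4 edges cross machines
```

A source on a tile corner is the worst case; with a small interaction radius, sources in a tile's interior would have no cut edges at all.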
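For question 2, this is roughly what we mean by the "pool of disabled source vertices" fallback, sketched in plain Python with no GraphLab calls. `SourcePool`, `birth_death_step` and the `log_accept_ratio` callback are hypothetical names; the callback stands in for the model-specific reversible-jump acceptance ratio, which isn't written out here. This only covers the bookkeeping: it does not avoid the cost of keeping edges around for every inactive slot, which is exactly the inefficiency we're worried about.

```python
import math
import random
from typing import Callable, List, Optional

class SourcePool:
    """Fixed-capacity pool of source-vertex slots; an inactive slot is a source that does not currently exist."""

    def __init__(self, capacity: int) -> None:
        self.active: List[bool] = [False] * capacity

    def num_active(self) -> int:
        return sum(self.active)

    def pick(self, rng: random.Random, want_active: bool) -> Optional[int]:
        """Uniformly pick a slot with the requested state, or None if there is none."""
        slots = [i for i, a in enumerate(self.active) if a == want_active]
        return rng.choice(slots) if slots else None

def birth_death_step(
    pool: SourcePool,
    rng: random.Random,
    log_accept_ratio: Callable[[str, int], float],
) -> bool:
    """One reversible-jump birth-or-death move with a Metropolis-Hastings accept/reject.

    `log_accept_ratio(kind, slot)` is a hypothetical, model-specific callback that must
    return the full RJMCMC log acceptance ratio (likelihood, prior, proposal and
    Jacobian terms). Returns True if the move was accepted.
    """
    kind = "birth" if rng.random() < 0.5 else "death"
    # A birth activates a free slot; a death deactivates a live one.
    slot = pool.pick(rng, want_active=(kind == "death"))
    if slot is None:
        return False  # no free slot for a birth, or no live source to remove
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    if math.log(u) <= log_accept_ratio(kind, slot):
        pool.active[slot] = kind == "birth"
        return True
    return False

# Toy run with a trivial callback that accepts every move (log ratio 0).
pool = SourcePool(capacity=5)
rng = random.Random(0)
accepted = sum(birth_death_step(pool, rng, lambda kind, slot: 0.0) for _ in range(100))
print(pool.num_active(), "active sources;", accepted, "moves accepted")
```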
Thanks for your help! GraphLab looks awesome, and I hope that we can use it!
Best,
Brenton Partridge
Harvard University