Hey Matan,
It's only intended to solve the latter problem: to be a PaaS for Caffe instances. For the "grid computing" approach, check out
ParameterServer. There was recently a Caffe issue discussing this, but I can't find it.
Issue 876 has some discussion about it.
Here's the itch I am personally trying to scratch: after using Caffe for a while, I started finding it inconvenient to run things on my own laptop. Often I'd kick off a job and then need to throw my laptop in my backpack and run out the door, and so the job would stop running. I came to the realization that I needed to run Caffe in the cloud.
After thinking it through, I also realized I wanted to be able to run multiple training jobs in parallel, and to queue up jobs so that I could fire off a bunch of experiments all at once and check back later after they were all done. I also wanted to be able to specify how many workers would be servicing the queue, depending on how fast I needed the results.
Going further down the feature creep rabbit hole, I decided that the system should be multi-tenant and allow multiple users, where each user only sees their own data. This would be useful for a team to share a single cluster in the cloud, so they only have to set it up once and have a single thing to maintain.
Even further: to make it as easy as possible to use, I wanted to be able to interact with it via a web interface (and even a mobile app), so I figured I'd start with a REST API and build those on top of it later.
So those are the core problems I'm trying to solve.