Hello to everyone!
I'm starting a thread to discuss the design and development of Platoon.
Platoon is a mini-framework whose purpose is to provide Theano users with training algorithms implemented for multi-GPU/multi-node infrastructures.
Here I would like to pose questions about its current state, discuss its specific purpose and its responsibilities in the ecosystem, ask for new features to be included, and discuss its design and implementation details overall.
Without further introduction, I am posing my own questions about its current state, and afterwards I will suggest changes to its requirements.
1
The implementation uses posix_ipc.ExistentialError in 4 places in channel.py. I am not sure I understand its purpose. It is used once when the Controller unlinks its semaphore and once when a Worker unlinks its shared memory, both before creating each of those and when destroying them. Is it for making sure that previously declared semaphores/shared memory segments with the same name are cleared out?
3
I notice that a zmq push socket (asocket) is used to deliver minibatches from the controller to the workers, with a single publisher and multiple subscribers on the same topic (port). I believe this is costly, especially for large minibatches, because for multi-process/multi-GPU computation (1 process = 1 GPU), batches will be copied to O(n) worker processes and from there to O(n) devices. I think this introduces a lot of latency. So I suggest the following:
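To illustrate the cost I am talking about, here is a toy pyzmq sketch of that delivery pattern in a single process (inproc instead of TCP, one worker instead of many; the socket names are mine). The point is that every delivered batch is a fresh serialized copy, once per receiving worker:

```python
import numpy as np
import zmq

ctx = zmq.Context.instance()

# Controller side: pushes minibatches out to the workers.
push = ctx.socket(zmq.PUSH)
push.bind("inproc://minibatches")

# Worker side: each of the N worker processes would hold one such
# socket, so each batch is serialized and copied once per delivery.
pull = ctx.socket(zmq.PULL)
pull.connect("inproc://minibatches")

batch = np.arange(12, dtype="float32").reshape(3, 4)
push.send_pyobj(batch)          # pickles and copies the whole batch
received = pull.recv_pyobj()    # a second copy materializes here
```

With n workers and large batches, those per-delivery copies are exactly the overhead described above.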
Implementations of (1) will use the mechanics that will be developed in (2).
Implementations of (2) can be wrappers around functions of non-parallel optimization algorithms (e.g. Adadelta, RMSProp, Adam, Nesterov momentum, etc.) using Theano with the libgpuarray/pygpu backend.
Implementations of (3) can be a simple periodic rule, a schedule of periods, or something more exotic (say, entropy-based).
Implementations of (4) will be the Downpour or EASGD mechanism, or a simpler one, and will be what actually characterizes the algorithmic template of the inherited abstraction.
Implementations of (5) can again be something like (3).
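A sketch of what pluggable conditions for (3) and (5) could look like (all names are mine, purely hypothetical):

```python
class Periodic(object):
    """Fires every `period` training steps."""
    def __init__(self, period):
        self.period = period

    def __call__(self, step):
        return step > 0 and step % self.period == 0

class Schedule(object):
    """Fires at an explicit, pre-computed set of steps."""
    def __init__(self, steps):
        self.steps = frozenset(steps)

    def __call__(self, step):
        return step in self.steps

# Any callable taking the current step (or richer state, e.g. an
# entropy estimate) and returning a bool would fit the same slot.
```

The same tiny interface would serve both the synchronization condition (3) and the stopping condition (5).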
Splitting responsibilities like this would allow us to build an independent, reusable codebase, to create many variations of algorithms quickly, to minimize user coding, and to disentangle hyperparameters from a concrete usable algorithm (something that defines all of (1), (2), (3), (4) and (5)), so that each is managed in the place ((1), (2), (3), (4) or (5)) where it is actually used.
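The algorithmic template that glues the five parts together could then be as thin as this (every name here is hypothetical; a sketch of the idea, not a proposed API):

```python
class TrainingTemplate(object):
    # (1) sampler: yields minibatches, (2) local_step: optimization
    # update, (3) sync_cond: decides when to synchronize, (4) sync_rule:
    # how parameters are merged, (5) stop_cond: decides when to finish.
    def __init__(self, sampler, local_step, sync_cond, sync_rule, stop_cond):
        self.sampler = sampler
        self.local_step = local_step
        self.sync_cond = sync_cond
        self.sync_rule = sync_rule
        self.stop_cond = stop_cond

    def train(self, params):
        step = 0
        for batch in self.sampler:
            params = self.local_step(params, batch)
            if self.sync_cond(step):
                params = self.sync_rule(params)
            step += 1
            if self.stop_cond(step):
                break
        return params
```

Each slot can then be swapped independently, which is exactly the point of the split.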
In order for this to work well, implementations of (4), apart from defining the "synchronization" rule as it is defined now, will need to define routines built upon controller or worker communication helper functions, which will tell a concrete controller or worker how to act on (4).
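To make (4) concrete: the EASGD rule mentioned above, for example, boils down to an elastic pull between each worker's parameters and the central copy. A hedged numpy sketch of that published update, where alpha is the elasticity hyperparameter:

```python
import numpy as np

def easgd_sync(local, central, alpha=0.5):
    """One EASGD synchronization between a worker and the central params.

    The worker is pulled toward the central copy and the central copy
    toward the worker, by the same elastic force alpha * (local - central).
    """
    diff = alpha * (local - central)
    return local - diff, central + diff
```

Downpour would plug a different rule (accumulated gradient push plus parameter fetch) into the same slot.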
What do you think about these thoughts? Most probably the notions of a controller and a worker will have to change or be augmented to support such changes (e.g. a worker could be related to one device or to a group of devices), but handling the communication layer will remain their responsibility.
Since I probably have not expressed my thoughts well enough, I can draw a diagram to serve as a flexible reference for what we will be discussing, if you want.
- - - - - - - - - - -
tl;dr: If you agree that we need to discuss the Platoon API in general, I propose organizing a web meeting to discuss and decide. Do you agree?
--
Christos
---
You received this message because you are subscribed to the Google Groups "theano-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Explaining changes in `Worker`:
Each node spawns a single Controller instance in its own process. Each node also spawns N Worker instances, each in its own process. As before, the controller manages and coordinates the N workers, and each worker corresponds to a single GPU device on the node.
For `init_worlds`:
For `new_shared_mem`:
This is going to be the current `init_shared_params`, with 2 differences:
For a collective, such as `all_reduce`:
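As a reminder of the semantics only (not any proposed API): an all-reduce combines one buffer per worker with a reduction op and hands the very same result back to every worker. A single-process numpy mock, with names of my own choosing:

```python
import numpy as np

def mock_all_reduce(buffers, op=np.add):
    # Reduce the per-worker buffers element-wise...
    result = buffers[0].copy()
    for buf in buffers[1:]:
        result = op(result, buf)
    # ...then every worker receives an identical copy of the result.
    return [result.copy() for _ in buffers]
```

In the real implementation this would of course be backed by GPU/inter-node collectives rather than in-process copies.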
These are the "major" changes in worker/controller.
Regarding the other changes, I will ask you to read platoon/generic_training.py in my clone.
Please begin from line 239.
I tried to express my thoughts as an interface and as documentation; please ask me for further explanation.
I am posting 2 diagrams with my thoughts on the communication, the Worker/Controller, and the generic training class suite.
The diagram file from StarUML is also in the attachment.
I feel like I should explain more about decisions in this proposal.
Please, let's organize a web meeting with everyone who is interested, in order to discuss the changes towards multi-node.
Christos
Well, it's not necessary to use the proposed framework as a whole. Someone who already has some things implemented can use, say, a Sampler or a Condition as a standalone. Also, because this design expects the worker to be instantiated by the user's code, one can still use the worker by itself, without the implied flow of GenericTraining's train. Nevertheless, someone who starts building a trainer can use the class and all the required object types to create and experiment with reusable and fast (due to the multi-GPU/node interface) training "parts", which can be combined in any way. Finally, someone who does not wish to adapt code or use the 'train' function can still use the 'sync' function alone. It stays a mini-framework in the sense that it does not force users into its ways, but provides reusable helpers.
I agree with the callback injection!
I would also like to include a 'validate' function and a 'make_validation_sample'.
I will rewrite the lstm example accordingly, as a proof of concept.
Also, as I expect to use this thing to experiment with various dynamics and also to implement existing ones: how does one unit test a gradient descent?
> Well, it's not necessary to use the proposed framework as a whole. [...] It stays a mini-framework in a sense that does not enforce the user to use its ways but will provide reusable helpers.

As long as that stays possible, then it should be ok. But we might have to underline this in the docs so that people don't feel like they have to buy the whole thing.

> I agree with the callback injection!

:)

> I would like to include also a 'validate' function and a 'make_validation_sample' as well.

One of the things to watch for is that if we are doing synchronous training, then it might be worth it to distribute the validation and test. Otherwise it's going to be one worker doing it and the others twiddling their thumbs.

> I will rewrite lstm example accordingly, as a proof of concept.

Good idea.

> Also as I expect to use this thing to experiment with various dynamics and also implement existing ones, how does one unit test a gradient descent???

It's not really possible to "unit test" gradient descent. You can test the individual parts, but it will require some integration tests that train a known model to some degree of accuracy.
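To illustrate the kind of integration test meant here, a toy numpy check (not Platoon code) that plain SGD fits a known linear model to some accuracy:

```python
import numpy as np

def sgd_linear_regression(x, y, lr=0.1, epochs=200):
    # Fit y ~ w*x + b by gradient descent on mean squared error.
    rng = np.random.RandomState(0)
    w, b = rng.randn(), 0.0
    for _ in range(epochs):
        err = w * x + b - y
        # Gradients of MSE with respect to w and b.
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)
    return w, b

x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 0.5                      # known ground truth
w, b = sgd_linear_regression(x, y)
final_loss = np.mean((w * x + b - y) ** 2)
```

The test then asserts that `final_loss` is below a tolerance and the recovered parameters are near the known ones, rather than asserting anything about individual update steps.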
Hi,
This is great, but first wrap NCCL in libgpuarray and use it, before changing too many things and working on multi-node.
For multi-node, we need to think about it. I don't think it duplicates the effort of Theano-MPI, as Theano-MPI is about model parallelism and this project is about data parallelism only.
Fred
Christos