Apr 20, 2021, 11:18:46 AM
to TensorFlow Developers, jonathan...@gmail.com, jan.b...@golem.network
I’m Jan's colleague at Golem. Thank you, Jonathan. I would say that your answer is spot on and in general you are right, but… there are some edge cases where our idea makes a bit more sense than in the general case that has already been discussed. We hope to find a use case where Golem fits.
One of the basic scenarios we are exploring is using homogeneous machines in a university or office after hours. Often there is a lot of powerful hardware sitting unused. When forming a TF training cluster on Golem, we can use a geolocation tag to build the cluster only from machines in a given location. The ping times and bandwidth between those machines should then make distributed training more feasible.
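A minimal sketch of that selection step, assuming each node advertises a location tag. The node records and the `group_by_location` and `build_tf_configs` helpers are hypothetical and are not part of Golem's API; the only TensorFlow-specific part is the `TF_CONFIG` JSON layout (`cluster` and `task`) that multi-worker training reads from the environment.

```python
import json
from collections import defaultdict

# Hypothetical node records; field names are illustrative, not Golem's API.
NODES = [
    {"host": "10.0.0.11", "location": "lab-a"},
    {"host": "10.0.0.12", "location": "lab-a"},
    {"host": "10.0.1.21", "location": "office-b"},
    {"host": "10.0.0.13", "location": "lab-a"},
]


def group_by_location(nodes):
    """Group node hosts by their location tag, preserving input order."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["location"]].append(node["host"])
    return dict(groups)


def build_tf_configs(hosts, port=12345):
    """Return one TF_CONFIG JSON string per worker in the cluster.

    The port is an arbitrary assumption; every worker gets the same
    cluster spec and its own task index.
    """
    workers = [f"{host}:{port}" for host in hosts]
    return [
        json.dumps(
            {"cluster": {"worker": workers},
             "task": {"type": "worker", "index": index}}
        )
        for index in range(len(workers))
    ]


# Form a cluster only from the machines tagged "lab-a".
lab_a_hosts = group_by_location(NODES)["lab-a"]
configs = build_tf_configs(lab_a_hosts)
```

Each string in `configs` would be exported as the `TF_CONFIG` environment variable on the matching machine before the training script starts.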
Additionally, the training set can be generated locally on the worker nodes by a simulator engine. This is the approach I used to train a CNN for genetic sequencing, because there was simply not enough training data available. Our friends from SkyEngine also use training data generation as their basic strategy. So there are cases that do not require transferring a large training set.
Also, there are countries with very good network infrastructure, and 5G is coming. At Golem we are thinking ahead in a 5-10 year time frame, so I think that we should not stop…
We would like to continue with some form of PoC here. We can assume that all the machines are on the same Ethernet network. We know that for serious HPC, Ethernet alone is not enough and one needs something like an InfiniBand interconnect, but we would like to try a small PoC, and if the results are even 10% better than on a single machine, that is OK.
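To get a rough feeling for whether the 10% target is reachable over Ethernet, here is a back-of-the-envelope sketch. It assumes synchronous data-parallel training with a ring all-reduce, perfect scaling of the compute part, no overlap between communication and computation, and a bandwidth-only cost model (latency ignored). The function names and the numbers in the usage lines are illustrative assumptions, not measurements.

```python
def ring_allreduce_seconds(gradient_bytes, workers, bandwidth_bytes_per_s):
    """Bandwidth-only cost of a ring all-reduce.

    Each worker sends and receives 2 * (N - 1) / N times the gradient size.
    """
    if workers < 2:
        return 0.0
    return 2 * (workers - 1) / workers * gradient_bytes / bandwidth_bytes_per_s


def estimated_speedup(single_box_step_s, gradient_bytes, workers,
                      bandwidth_bytes_per_s):
    """Speedup over one machine for the same global batch.

    Assumes compute time divides evenly across workers and that the
    gradient exchange happens after the compute, with no overlap.
    """
    step_s = (single_box_step_s / workers
              + ring_allreduce_seconds(gradient_bytes, workers,
                                       bandwidth_bytes_per_s))
    return single_box_step_s / step_s


# Illustrative numbers: 100 MB of gradients, 4 workers, 1 Gbit/s Ethernet.
GRADIENT_BYTES = 100e6
BANDWIDTH = 125e6  # 1 Gbit/s expressed in bytes per second

slow_step = estimated_speedup(4.0, GRADIENT_BYTES, 4, BANDWIDTH)
fast_step = estimated_speedup(1.0, GRADIENT_BYTES, 4, BANDWIDTH)
```

Under these assumptions a step that takes 4 s on one machine clears the 10% bar, while a step that takes 1 s ends up slower than the single machine, so the ratio of compute time to gradient size matters more than the raw number of workers.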
Having said that, can you give us any advice on which existing strategy we should build on, or on implementing a strategy from scratch?