Minutes of the meeting:
We discussed the current status: the Dashboard/API is working, OpenMPI with a simple example is working; only the minibatch-SGD example and metrics gathering for the example still need to be done.
Discussed how we should measure performance:
The clock should be stopped while calculating accuracy so as not to falsify results; evaluation should not be included in the benchmarking metric.
Maybe just save checkpoints at regular intervals during training and calculate accuracy at the end of the run.
Training can either run until a target accuracy is reached, with time taken as the metric, or run for a fixed time, with final accuracy as the metric (see the timing sketch after this point).
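As a minimal sketch of the "stop the clock during evaluation" idea, assuming a hypothetical timer class and placeholder training/checkpoint functions (none of this is decided API):

```python
import time

class BenchmarkTimer:
    """Wall-clock timer that can be paused so that checkpointing and
    accuracy evaluation are excluded from the reported benchmark time."""

    def __init__(self):
        self.elapsed = 0.0
        self._start = None

    def resume(self):
        self._start = time.monotonic()

    def pause(self):
        self.elapsed += time.monotonic() - self._start
        self._start = None

def train_one_epoch():
    time.sleep(0.1)  # placeholder for the actual training work

def save_checkpoint(epoch):
    pass  # placeholder: persist model state; accuracy is computed after the run

timer = BenchmarkTimer()
for epoch in range(3):
    timer.resume()
    train_one_epoch()       # counted toward the benchmark metric
    timer.pause()
    save_checkpoint(epoch)  # not counted; evaluated offline at the end

print(f"benchmark time: {timer.elapsed:.2f}s")
```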
Discussed data distribution:
Data should already be present on the workers when benchmarking starts (possibly already in memory).
Possibility for algorithms to load data themselves in the open division.
We need to decide on some datasets that are fixed for the closed division; no other datasets are allowed.
Preprocessing needs to be fixed in the closed division as well.
We could create 2-3 default dataloaders to cover the most common use cases; those would be prescribed for all benchmarking tasks (all data on all workers, even split of data among workers, ?). A sketch of the two strategies follows below.
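A rough sketch of the two partitioning strategies mentioned above; the function names and list-based API are assumptions, not a decided interface:

```python
def full_replication(dataset, rank, world_size):
    """Strategy 1: every worker gets the complete dataset."""
    return list(dataset)

def even_split(dataset, rank, world_size):
    """Strategy 2: disjoint, near-even shards, one per worker."""
    return [x for i, x in enumerate(dataset) if i % world_size == rank]

# Toy example: 10 samples split among 4 workers.
data = list(range(10))
for rank in range(4):
    print(rank, even_split(data, rank, 4))
```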
Decided to create a new file on GitHub detailing the open and closed divisions.
Dimensions to compare are (for now): hardware (Google Cloud vs. AWS, GPUs, etc.), scaling (1, 2, 4, 8, ... nodes), and network bandwidth.
In the beginning, only synchronous training will be implemented.
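For reference, a minimal sketch of what one synchronous minibatch-SGD step over OpenMPI could look like, using mpi4py with a dummy gradient; this is an illustration under assumed names, not our implementation:

```python
# Run with e.g.: mpirun -n 4 python sync_sgd.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
w = np.zeros(3)   # model weights, identical on every worker
lr = 0.1          # learning rate

for step in range(5):
    # Each worker computes a gradient on its local minibatch (dummy here).
    local_grad = rng.normal(size=3)
    # Synchronous step: sum gradients across all workers, then average, so
    # every worker applies the identical update and stays in lockstep.
    grad_sum = np.empty_like(local_grad)
    comm.Allreduce(local_grad, grad_sum, op=MPI.SUM)
    w -= lr * grad_sum / size

if rank == 0:
    print("final weights:", w)
```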
Fair comparison is the most important criterion for our implementation.
Please let me know if I missed something.