Benchmarking on CIFAR-10 datasets


lie.he

Aug 17, 2018, 9:14:31 AM
to mlbench
We can use CIFAR-10 to develop the framework and produce some first benchmarking results. Training on CIFAR-10 is fast, and there are plenty of papers presenting results on it.

Models for Benchmarking 

The `torchvision` package already includes the most common models, such as ResNet (18, 50, etc.), VGG, and DenseNet.
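As a quick illustration, here is a minimal sketch (assuming PyTorch and torchvision are installed; the CIFAR-specific stem change is a common convention, not something torchvision applies automatically) of instantiating one of these models for CIFAR-10:

```python
# Minimal sketch: a torchvision ResNet-18 adapted for 32x32 CIFAR-10 inputs.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(num_classes=10)  # CIFAR-10 has 10 classes

# torchvision's ResNet is built for 224x224 ImageNet images; a common CIFAR-10
# adjustment is a 3x3 stem convolution and no initial max-pooling.
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()

# Sanity check on a CIFAR-10-sized batch.
out = model(torch.randn(8, 3, 32, 32))
print(out.shape)  # torch.Size([8, 10])
```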

In DAWNBench, every submission includes its source code and performance numbers (time to accuracy, training cost, inference latency).
Some customized models, like the `custom wide resnet` used by fast.ai, are very fast; it would be very interesting to add them to our benchmark.

Are there any other models you think we should add?

Goals

Our goals for benchmarking on CIFAR-10 include:

Make sure mlbench:
    1. can be deployed on different platforms;
    2. makes it easy to swap modules (models, optimizers, etc.) and to tune hyperparameters.
Benchmark:
    1. the hardware environment (network, different hardware, etc.); maybe we can include a small module to verify that the hardware environment is reasonable.
    2. the overhead mlbench adds compared to the original implementations (CPU idle time, etc.).
    3. the impact of different numbers of devices and nodes, and of different communication backends (e.g. NCCL vs. MPI); see the sketch below.
Visualization:
    1. common learning curves;
    2. allow users to upload their own results, as discussed in https://groups.google.com/forum/#!topic/mlbench/5ZoYuEGY2uQ.
Other issues, like checkpointing, may not be very serious, considering the size of the dataset.
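To make the backend comparison (and the hardware sanity check) under "Benchmark" concrete, here is a minimal sketch. It assumes PyTorch's `torch.distributed` with a launcher that sets the usual rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE); the `BACKEND` variable and the ~4 MB tensor size are arbitrary choices for illustration:

```python
import os
import time

import torch
import torch.distributed as dist

# Backend is a one-line switch; "nccl" needs GPUs, "gloo"/"mpi" run on CPU.
backend = os.environ.get("BACKEND", "gloo")
dist.init_process_group(backend=backend)

device = torch.device("cpu")
if backend == "nccl":
    # Pin each rank to one GPU before issuing any collective call.
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

# Crude hardware/network sanity check: time one all-reduce of ~4 MB.
tensor = torch.ones(1 << 20, device=device)
dist.barrier()
start = time.time()
dist.all_reduce(tensor)
if device.type == "cuda":
    torch.cuda.synchronize()
elapsed = time.time() - start

if dist.get_rank() == 0:
    print(f"{backend}: all_reduce of {tensor.numel()} floats took {elapsed:.4f} s")
```

Running the same script with different BACKEND values (e.g. under `mpirun` for MPI) would give a first comparison point before launching full training runs.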

Are there any suggestions regarding the choice of models, implementations, or functionality?

Best regards,