[CM task force update] Automating, visualizing and comparing upcoming MLPerf inference v4.0


Grigori Fursin

Sep 22, 2023, 10:52:37 AM
to Collective Knowledge, artifact-...@googlegroups.com

The MLCommons Task Force on Automation and Reproducibility is pleased to announce the release of the CM automation language v1.5.3 with a beta MLPerf inference explorer that provides a unified interface to run, visualize and compare MLPerf inference benchmarks v3.0+.


Here is a highlight of the new features added thanks to important feedback from our users, including Neural Magic, TTA, One Stop Systems, Nutanix, Collabora, Deelvin and the participants in our public MLPerf challenges:


  • We improved the MLCommons CK playground with better visualization and comparison of inference results via our beta MLPerf explorer.

  • It is now possible to generate comparison reports before, during or after submission as demonstrated at https://github.com/mlcommons/ck/tree/master/cm-mlops/report/mlperf-inference-v3.1-analysis-ctuning :

    • For the first time, all reference implementations were benchmarked in the closed division (edge category) of v3.1 using CM.

    • The community obtained competitive performance results using the Nvidia implementation with CM automation on AWS and GCP.

    • CM helped several external companies, including One Stop Systems, submit top performance numbers on an Nvidia-powered Rigel Edge Supercomputer.

    • We benchmarked a small 4-core Azure instance with Intel’s BERT implementation automated by CM.

    • We added a new CM feature to automate and visualize multiple experiments, such as exploring sparsity, quantization and batch size of BERT models across multiple AMD, Intel and ARM-based systems in terms of performance, accuracy and power consumption (an illustrative sweep sketch follows the feature list below).




  • It is now possible to use a common CM interface to run all MLPerf inference benchmarks natively or inside containers, and to prepare open and closed submissions with compliance tests in an automated and reproducible way, with practically any combination of the following (see the command sketch right after this list):

    • All MLPerf models, including GPT-J, with preliminary support for the DeepSparse Zoo, the Hugging Face Hub and BERT pruners from the NeurIPS paper;

    • Main MLPerf implementations, including the reference ones as well as Nvidia, Intel, TFLite and the Modular Inference Library;

    • Main frameworks and runtimes, including DeepSparse, PyTorch, TensorFlow, TFLite, TVM, TensorRT, ONNX, NCNN and Triton; we plan to add support for QAIC and other frameworks for v4.0;

    • Diverse hardware, including Coral TPU, Nvidia GPUs (A100, T4, L4, RTX 4090, Jetson Orin), Intel/AMD servers, Graviton, Neoverse and Apple Metal.
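
For example, a typical end-to-end flow looks roughly like the following. This is a minimal sketch based on the CM documentation in the mlcommons/ck repository; the exact script tags and flags (such as --model, --implementation, --backend, --device, --scenario, --division and --category) may differ between CM versions, so please check the MLPerf inference docs before copying these commands:

    # Install the CM (Collective Mind) framework and pull the MLCommons automation recipes
    python3 -m pip install cmind
    cm pull repo mlcommons@ck

    # Run one benchmark (reference BERT-99 on CPU with ONNX Runtime, Offline scenario, performance estimation);
    # the flag names follow the v3.1 docs and may change in future CM versions
    cm run script --tags=run-mlperf,inference,_find-performance \
        --model=bert-99 --implementation=reference --backend=onnxruntime \
        --device=cpu --scenario=Offline --quiet

    # Assemble a submission tree and run the submission checker
    # (the tags and flags below are assumptions - see the generate-mlperf-inference-submission script in cm-mlops)
    cm run script --tags=generate,mlperf,inference,submission \
        --division=open --category=edge --quiet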

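The experiment sweeps mentioned earlier in this list can be driven by a plain shell loop around the same run script, with the CK playground / MLPerf explorer then used to compare the resulting experiments. This is an illustration only: the --batch_size flag and the deepsparse backend name are assumptions that depend on the chosen model and implementation:

    # Hypothetical sweep over batch sizes for BERT; --batch_size is implementation-specific,
    # so treat this as a sketch rather than exact commands
    for BS in 1 8 32 128; do
        cm run script --tags=run-mlperf,inference,_find-performance \
            --model=bert-99 --implementation=reference --backend=deepsparse \
            --device=cpu --scenario=Offline --batch_size=${BS} --quiet
    done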

As for the v4.0 submission round, we have received requests to add the CM interface for the Intel, Qualcomm and a few other inference implementations, and to improve our GUI for generating CM commands. Please note that this is a community project that is extended based on your suggestions and feedback - please don’t hesitate to get in touch with our task force via our public Discord server or email if you want to learn how to use CM automation for MLPerf, add new hardware backends, showcase new platforms or optimize submissions.


We are also excited to see that our CM automation language is now used outside MLCommons:


  • ACM/IEEE MICRO’23 used our CM automation language to provide a unified interface to prepare, run, reproduce and visualize results from different papers (an illustrative sketch follows this list): https://github.com/ctuning/cm-reproduce-research-projects/tree/main/script

  • Arjun Suresh and I were appointed as MLPerf liaisons for the Student Cluster Competition at SuperComputing’23 to help students and researchers run and optimize MLPerf inference benchmarks across different servers on the spot using CM automation - please get in touch if you want to showcase your technology there.

  • We will give a CM-MLPerf tutorial at IEEE IISWC’23 in a week: https://iiswc.org/iiswc2023/#/program/ - please feel free to join us and/or spread the word.
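
As promised above, here is an illustrative sketch of reproducing one of the MICRO’23 papers via CM. The repository alias comes from the URL in the first item, but the script tags and variations below are assumptions, so please check that repository's README for the exact commands:

    # Pull the repository with the MICRO'23 reproducibility scripts
    cm pull repo ctuning@cm-reproduce-research-projects

    # Hypothetical variations for one paper: install dependencies, run the experiments, plot the results
    cm run script --tags=reproduce,project,micro-2023,<paper-name>,_install_deps
    cm run script --tags=reproduce,project,micro-2023,<paper-name>,_run
    cm run script --tags=reproduce,project,micro-2023,<paper-name>,_plot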


We would like to deeply thank the many individuals who helped us submit more than 12,000 out of ~13,500 MLPerf inference v3.1 results, including Anandhu S, Michael Goin, Jose Armando Hernandez, Ilya Kozulin, Amrutha Sheleenderan, Justin Faust, Datta Nimmaturi, Byoungjun Seo, Resmi Arjun, Miro Hodak, Nijo Jacob, Iliya Slavutin, Ashwin Nanjappa and Ethan Cheng.


Looking forward to collaborating with you to automate and optimize MLPerf inference v4.0!


Grigori Fursin and Arjun Suresh

MLCommons Task Force on Automation and Reproducibility and the cTuning foundation


