The outcome of the 1st community submission to MLPerf inference v3.0, and a request for your feedback on our v3.1 plans


Grigori Fursin

Apr 17, 2023, 10:23:24 AM
to Collective Knowledge

Dear all,


We would like to give you an update on the progress of our MLCommons taskforce on automation and reproducibility since the beginning of this year. Our goal was to provide a unified and automated way for any organization to prepare, optimize and submit their MLPerf inference results, no matter which SW/HW stack or inference engine they use, thus reducing MLPerf submission time and costs.


Thanks to your feedback, we have developed a universal, open-source workflow with a GUI on top of the technology-agnostic MLCommons CK/CM automation framework, with portable and reusable scripts that automatically connect different models, data sets, inference engines, compilers and hardware descriptions in a transparent and non-intrusive way.
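As an illustration of how the CM interface is used, here is a minimal sketch in Python, assuming the cmind package from PyPI and the public mlcommons@ck script repository; the tags shown below are a simple built-in probe, and the exact tags and flags for a given MLPerf benchmark may differ:

    # Install the MLCommons CM framework and pull the public automation scripts:
    #   pip install cmind
    #   cm pull repo mlcommons@ck

    import cmind

    # Run a portable CM script by its tags; 'detect,os' is a simple built-in
    # probe, while MLPerf benchmarks are exposed through similar tag sets.
    r = cmind.access({'action': 'run',
                      'automation': 'script',
                      'tags': 'detect,os'})

    # CM returns a dictionary; a non-zero 'return' code signals an error.
    if r['return'] > 0:
        print('CM error:', r.get('error', ''))

The same dictionary-based entry point is reused by all portable CM scripts, which is what keeps the workflow technology-agnostic across engines and hardware.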


We particularly thank Neural Magic for working with our taskforce to integrate their DeepSparse engine with our open-source CK/CM workflow and for using it to prepare their diverse end-to-end submissions, including power results.


Furthermore, such a universal and portable workflow allowed our taskforce, together with the cTuning foundation and cKnowledge, to organize the 1st community challenge, letting external students and researchers run, reproduce and optimize MLPerf inference v3.0 benchmarks using our new CK/CM end-to-end submission workflow.


We are very pleased to announce that the CK/CM technology has helped to automate, unify and reproduce more than 80% of all v3.0 submission results, including 98% of power results, with very diverse technology and benchmark implementations from the cTuning foundation, Neural Magic, Qualcomm, Nvidia, AMD, Intel, Apple, KRAI, HPE, Lenovo and Hugging Face across diverse CPUs, GPUs and DSPs with PyTorch, ONNX, QAIC, TF/TFLite, TVM and TensorRT, using popular cloud providers (GCP, AWS, Azure) as well as individual servers and edge devices provided by our great volunteers.


You can learn more about the highlights of our cTuning results in the following two articles at ZDNet and Forbes: "CTuning took the top spot for the lowest latency, the shortest time from submission of a query to when the answer comes back, for four out of five tasks on the benchmark for edge computing, within the closed category."


Furthermore, we have created a prototype of a free on-prem platform (the MLCommons CK playground), backed by this MLCommons repo, to make it easier for anyone to visualize and compare MLPerf inference results publicly or privately at any time during or after submission.


For example, you can see and compare all MLPerf inference v3.0, v2.1 and v2.0 results online, together with reproducibility reports, including the MLPerf BERT model from the Hugging Face Zoo on the Nvidia Jetson Orin platform. You can even create your own derived metrics (such as performance per Watt), provide your own constraints using this MLCommons repository, and visualize them as shown in this example for power efficiency, with the X axis showing latency in ms and the Y axis inferences per Joule.
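As a concrete illustration of such a derived metric, inferences per Joule for a single-stream run can be computed from the measured latency and the average power draw, since each inference takes latency seconds and therefore consumes latency times power Joules. A minimal sketch with hypothetical numbers, not taken from any actual submission:

    # Derived metric: inferences per Joule from single-stream latency and power.
    latency_ms = 5.0     # measured latency per inference (hypothetical value)
    avg_power_w = 15.0   # average system power in Watts (hypothetical value)

    # Energy consumed by one inference, in Joules (W * s).
    energy_per_inference_j = (latency_ms / 1000.0) * avg_power_w

    # Inferences per Joule is the reciprocal of energy per inference.
    inferences_per_joule = 1.0 / energy_per_inference_j

    print(f'{inferences_per_joule:.2f} inferences/Joule')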




Our current plan is to continue improving the MLCommons CK/CM technology and encoding all the best practices and optimization techniques from all MLCommons members into our universal workflow. We will also continue providing free help to all MLCommons organizations to unify, optimize and prepare their MLPerf inference v3.1 results at scale while minimizing their costs. 


That’s why we would like to hear from you about how we can help with your MLPerf inference v3.1 submission while connecting you with our students and researchers to test your tools and optimize your workloads. Please check out the next public challenges, powered by the MLCommons CK playground, that we are preparing with the community to help you reproduce and improve various MLPerf results for the v3.1 submission round.


Please feel free to get in touch with any feedback and suggestions!


Thank you, and we look forward to collaborating with you,

Grigori and Arjun

=============================================================

Grigori Fursin

* MLCommons taskforce on automation and reproducibility

* cTuning foundation and cKnowledge

