Universal Flash Storage (UFS) is a flash storage specification for digital cameras, mobile phones and consumer electronic devices.[1][2] It was designed to bring higher data transfer speed and increased reliability to flash memory storage, while reducing market confusion and removing the need for different adapters for different types of cards.[3] The standard encompasses both packages permanently attached (embedded) within a device (eUFS), and removable UFS memory cards.
The specification is supported by consumer electronics companies such as Nokia, Sony Ericsson, Texas Instruments, STMicroelectronics, Samsung, Micron, and SK Hynix.[5] UFS is positioned as a replacement for eMMC and SD cards. The electrical interface for UFS uses M-PHY,[6] developed by the MIPI Alliance, a high-speed serial interface targeting 2.9 Gbit/s per lane and scalable to 5.8 Gbit/s per lane.[7][8] UFS implements a full-duplex serial LVDS interface that scales to higher bandwidths better than the 8-bit parallel, half-duplex interface of eMMC. Unlike eMMC, Universal Flash Storage is based on the SCSI architectural model and supports SCSI Tagged Command Queuing.[9] The standard is developed by, and available from, the JEDEC Solid State Technology Association.
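Tagged command queuing lets the host keep several commands in flight at once, with each completion matched back to its request by a tag, so the device can finish them in whatever order is fastest. The following is a minimal conceptual sketch of that idea in Python; the class, field names, and queue depth are illustrative and do not reflect the actual UFS/UniPro wire protocol:

```python
from dataclasses import dataclass

@dataclass
class Command:
    tag: int        # unique tag identifying an outstanding command
    lba: int        # logical block address to access
    length: int     # number of blocks to transfer

class TaggedQueue:
    """Toy model of tagged command queuing: several commands can be
    outstanding at once, and the device may complete them in any order."""
    def __init__(self, depth: int):
        self.depth = depth
        self.outstanding = {}            # tag -> Command

    def submit(self, cmd: Command) -> bool:
        if len(self.outstanding) >= self.depth:
            return False                 # queue full; host must wait
        self.outstanding[cmd.tag] = cmd
        return True

    def complete(self, tag: int) -> Command:
        # Completions carry the tag, so out-of-order finish is unambiguous.
        return self.outstanding.pop(tag)

q = TaggedQueue(depth=32)
q.submit(Command(tag=1, lba=0, length=8))
q.submit(Command(tag=2, lba=4096, length=8))
q.complete(2)   # the device may finish tag 2 before tag 1
q.complete(1)
```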
In September 2013, JEDEC published JESD220B UFS 2.0, an update to the UFS v1.1 standard published in June 2012. JESD220B Universal Flash Storage v2.0 offers increased link bandwidth for improved performance, a security features extension, and additional power-saving features over UFS v1.1.
On 30 January 2018, JEDEC published version 3.0 of the UFS standard, raising the data rate to 11.6 Gbit/s per lane (1450 MB/s) through the use of MIPI M-PHY v4.1 and UniPro v1.8. At MWC 2018, Samsung unveiled embedded UFS (eUFS) v3.0 and uMCP (UFS-based multi-chip package) solutions.[12][13][14]
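The quoted 1450 MB/s appears to be simply the raw line rate divided by eight; a quick sanity check, keeping in mind that actual payload throughput is lower once line encoding and UniPro/UFS protocol overhead are accounted for:

```python
def lane_mb_per_s(line_rate_gbps: float) -> float:
    """Raw per-lane throughput in MB/s, ignoring encoding and protocol
    overhead (which is how the headline UFS figures appear to be quoted)."""
    return line_rate_gbps * 1000 / 8   # Gbit/s -> MB/s

print(lane_mb_per_s(11.6))   # UFS 3.0 per-lane rate: 1450.0 MB/s
print(lane_mb_per_s(5.8))    # earlier 5.8 Gbit/s gear: 725.0 MB/s
```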
On 30 January 2020, JEDEC published version 3.1 of the UFS standard.[15] UFS 3.1 introduces Write Booster, Deep Sleep, Performance Throttling Notification, and Host Performance Booster for faster, more power-efficient, and cheaper UFS solutions. The Host Performance Booster feature is optional.[16]
On 7 July 2016, Samsung announced its first UFS cards, in 32, 64, 128, and 256 GB storage capacities.[19] The cards were based on the UFS 1.0 Card Extension Standard. The 256 GB version was reported to offer sequential read performance of up to 530 MB/s, sequential write performance of up to 170 MB/s, and random performance of 40,000 read IOPS and 35,000 write IOPS. However, the cards apparently never went on sale to the public.
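For context, random IOPS figures are conventionally quoted for 4 KiB transfers (an assumption here, since the announcement does not state the block size), which makes them easy to convert into an equivalent bandwidth:

```python
def iops_to_mb_per_s(iops: int, block_kib: int = 4) -> float:
    """Bandwidth implied by a random-I/O rate, assuming each operation
    moves block_kib KiB (4 KiB is the usual convention)."""
    return iops * block_kib * 1024 / 1e6

print(iops_to_mb_per_s(40_000))  # ~163.8 MB/s of random reads
print(iops_to_mb_per_s(35_000))  # ~143.4 MB/s of random writes
```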
On 14 May 2019, OnePlus introduced the OnePlus 7 and OnePlus 7 Pro, the first phones to feature built-in eUFS 3.0 (the Galaxy Fold, originally planned to be the first smartphone with UFS 3.0, was ultimately delayed until after the OnePlus 7's launch).[21]
The first UFS cards went on public sale in early 2020. According to a Universal Flash Storage Association press release, Samsung planned to transition its products to UFS cards during 2020.[22] Several consumer devices with UFS card slots were released in 2020.
On 8 December 2022, iQOO announced the iQOO 11, the first phone to ship with UFS 4.0 storage. Other Android OEMs subsequently adopted UFS 4.0 in their flagship and upper mid-range smartphones.[23]
On 30 March 2016, JEDEC published version 1.0 of the UFS Card Extension Standard (JESD220-2), which offered many of the features and much of the same functionality as the existing UFS 2.0 embedded device standard, but with additions and modifications for removable cards.[43]
A UFS drive's write/erase endurance affects its lifespan. There is a limit to how many write/erase cycles a flash block can accept before it produces errors or fails altogether, because each write/erase cycle degrades the memory cell's oxide layer. The reliability of a drive is commonly rated on three factors: the age of the drive, the total terabytes written (TBW) over time, and the drive writes per day (DWPD).[49] This is typical of flash memory in general.
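TBW and DWPD are two views of the same endurance budget. A minimal sketch of the standard conversion between them, using hypothetical numbers for illustration:

```python
def dwpd(tbw: float, capacity_gb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating over the warranty period."""
    full_drive_writes = tbw * 1000 / capacity_gb     # TB written -> drive fills
    return full_drive_writes / (warranty_years * 365)

# Hypothetical example: a 256 GB part rated for 150 TBW over 3 years
print(round(dwpd(tbw=150, capacity_gb=256, warranty_years=3), 2))  # ~0.54 DWPD
```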
Dell Technologies recently submitted results to the MLPerf Inference v2.0 benchmark suite. This blog examines the results of two specialty edge servers: the Dell PowerEdge XE2420 server with the NVIDIA T4 Tensor Core GPU and the Dell PowerEdge XR12 server with the NVIDIA A2 Tensor Core GPU.
The Dell PowerEdge XE2420 and PowerEdge XR12 servers are designed for edge computing workloads. The design criteria are based on real-life scenarios such as extreme heat, dust, and vibration on factory floors. However, despite these servers not being physically located in a data center, server reliability and performance are not compromised.
The PowerEdge XE2420 server is a specialty edge server that delivers high performance in harsh environments. It is designed for demanding edge applications such as streaming analytics, manufacturing logistics, 5G cell processing, and other AI applications. It is a short-depth, dense, dual-socket, 2U server built to withstand significant environmental stress on its electrical and physical components. The server is also well suited to low-latency, large-storage edge applications because it supports 16x DDR4 RDIMM/LRDIMM (12 DIMMs are balanced) at up to 2933 MT/s, and it supports several GPU/Flash PCIe card configurations.
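As a rough sanity check on the memory subsystem, assuming the "balanced" configuration corresponds to 12 populated DDR4 channels at 2933 MT/s (the channel count is an assumption here, not stated in the text):

```python
def ddr4_peak_gb_per_s(mt_per_s: float, channels: int) -> float:
    """Theoretical peak DDR4 bandwidth: transfers/s x 8 bytes, per channel."""
    return mt_per_s * 1e6 * 8 * channels / 1e9

# 12 channels at 2933 MT/s -> ~281.6 GB/s theoretical peak
print(round(ddr4_peak_gb_per_s(2933, 12), 1))
```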
The PowerEdge XR12 server is part of a line of rugged servers that deliver high performance and reliability in extreme conditions. It is a marine-compliant, single-socket 2U server that brings boosted services to the edge. It is powered by a single 3rd Generation Intel Xeon Scalable processor with up to 36 x86 cores, and it supports accelerators, DDR4, PCIe 4.0, persistent memory, and up to six drives.
The MIG numbers are normalized per MIG instance: the aggregate results have been divided by 28, that is, the four physical GPU cards in the system multiplied by the seven MIG instances per card. The non-MIG numbers are per card.
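A small helper that shows this normalization (the aggregate figure below is hypothetical, chosen only to make the arithmetic visible):

```python
def per_mig_instance(aggregate: float, cards: int = 4, mig_per_card: int = 7) -> float:
    """Normalize an aggregate result to one MIG instance:
    4 cards x 7 MIG instances per card = 28 instances total."""
    return aggregate / (cards * mig_per_card)

# Hypothetical aggregate of 28,000 samples/s across all MIG instances
print(per_mig_instance(28_000))  # 1000.0 samples/s per MIG instance
```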
For the ResNet 50 benchmark, the PowerEdge XE2420 server with the T4 GPU showed more than double the performance of the PowerEdge XR12 server with the A2 GPU. The PowerEdge XE8545 server with the A100 MIG showed competitive performance against the PowerEdge XE2420 server with the T4 GPU, with a 12.8 percent delta in favor of the PowerEdge XE2420 system. However, the PowerEdge XE2420 server with the A30 GPU takes the top spot in this comparison, delivering almost triple the performance of the PowerEdge XE2420 server with the T4 GPU.
The SSD-ResNet34 model falls under the computer vision category because it performs object detection. The PowerEdge XE2420 server with the A30 GPU performed more than three times better than the PowerEdge XE8545 server with the A100 MIG.
The following figure shows a comparison of the Recurrent Neural Network Transducers (RNNT) Offline performance of the PowerEdge XR12 server with the A2 GPU and the PowerEdge XE2420 server with the T4 GPU:
The RNNT model falls under the speech recognition category, which covers applications such as automatic closed captioning on YouTube videos and voice commands on smartphones. For speech recognition workloads, however, the PowerEdge XE2420 server with the T4 GPU and the PowerEdge XR12 server with the A2 GPU are closer in performance, with only a 32 percent delta between them.
The following figure shows a comparison of the BERT Offline performance of default and high accuracy runs of the PowerEdge XR12 server with the A2 GPU and the PowerEdge XE2420 server with the A30 GPU:
BERT is a state-of-the-art language representation model used for Natural Language Processing applications such as sentiment analysis. Although the PowerEdge XE2420 server with the A30 GPU shows significant performance gains, the PowerEdge XR12 server with the A2 GPU comes out ahead when performance is weighed against cost.
The following figure shows a comparison of the Deep Learning Recommendation Model (DLRM) Offline performance for the PowerEdge XE2420 server with the T4 GPU and the PowerEdge XR12 server with the A2 GPU:
DLRM uses collaborative filtering and predictive analytics-based approaches to make recommendations based on the dataset provided. Recommender systems are extremely important in search, online shopping, and online social networks. In the Offline scenario, the PowerEdge XE2420 server with the T4 GPU performed 40 percent better than the PowerEdge XR12 server with the A2 GPU.
Despite the higher performance from the PowerEdge XE2420 server with the T4 GPU, the PowerEdge XR12 server with the A2 GPU is an excellent option for edge-related workloads. The A2 GPU is designed for high performance at the edge and consumes less power than the T4 GPU for similar workloads. Also, the A2 GPU is the more cost-effective option.
It is important to budget power consumption for the critical load in a data center. The critical load includes components such as servers, routers, storage devices, and security devices. For the MLPerf Inference v2.0 submission, Dell Technologies submitted power numbers for the PowerEdge XR12 server with the A2 GPU. Figures 8 through 11 showcase the performance and power results achieved on the PowerEdge XR12 system; the blue bars are the performance results, and the green bars are the system power results. For all power submissions with the A2 GPU, Dell Technologies claimed the number-one spot in performance per watt for the ResNet 50, RNNT, BERT, and DLRM benchmarks.
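The performance-per-watt metric is simply benchmark throughput divided by the average system power measured over the run. A minimal sketch, with hypothetical numbers for illustration:

```python
def perf_per_watt(throughput_samples_s: float, avg_system_power_w: float) -> float:
    """Benchmark throughput divided by average system power over the run."""
    return throughput_samples_s / avg_system_power_w

# Hypothetical: 1,500 samples/s at an average of 300 W -> 5.0 samples/s per watt
print(perf_per_watt(1_500, 300))
```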
Note: For the MLPerf Inference v2.0 power submission, the PowerEdge XR12 server was not tuned for an optimal performance-per-watt score. These results reflect the power consumption of a performance-optimized configuration.
MLPerf Inference is a standardized test for machine learning (ML) systems that allows users to compare performance across different types of computer hardware. The test helps determine how well models, such as GPT-J, perform on various machines. Previous blogs provide a detailed MLPerf Inference introduction: for in-depth details, see Introduction to MLPerf inference v1.0 Performance with Dell Servers, and for step-by-step instructions for running the benchmark, see Running the MLPerf inference v1.0 Benchmark on Dell Systems. MLPerf Inference v3.1 is the seventh inference round in which Dell Technologies has participated. The submission shows the latest system performance for different deep learning (DL) tasks and models.