The eMMC chip wears out, causing the MCU and the vehicle to malfunction. What went wrong?

Monitoring costs | eMMC NAND flash technology and use case requirements

A recent information request from the Office of Defects Investigation (ODI) concerning older Tesla Model S and Model X vehicles highlighted how badly the storage workload had been underestimated: the main control unit (MCU), based on the NVIDIA Tegra 3 processor with an integrated 8GB eMMC NAND flash memory, was failing in the field. The problem was compounded as new firmware updates brought additional features to the electric vehicles (EVs), which accelerated the wear of the NAND flash. The firmware was not a problem at the beginning, and there was enough memory to handle the logging workload, but each firmware upgrade brought new features and left less free storage space than the one before.

In response to ODI’s information request, Tesla listed 2,399 complaints and field reports, 7,777 warranty claims, and 4,746 non-warranty claims related to MCU replacements. A faulty MCU causes the rear camera image to disappear from the display when reversing. Once the NAND flash memory is completely worn out, the driver can no longer use certain vehicle functions, such as HVAC (defogging), the audible chimes related to ADAS, Autopilot, and the turn signals. Strictly speaking, the owner can still drive the vehicle, but it can no longer be recharged, which ultimately makes it inoperable.

An eMMC module has a predetermined service life because it is based on NAND flash memory technology. NAND flash supports only a limited number of program/erase (P/E) cycles, and even if a company initially designs within these specifications, it must also anticipate that the same system will have to cope with a growing workload over time. Ultimately, there are three aspects to this problem: a lack of understanding of NAND flash technology, use cases that are more complex and multifaceted than expected, and the assumption that the life of the drive depends entirely on the NAND flash technology rather than on the flash controller being used.

Understand NAND flash technology

According to Tesla repair experts, the eMMC wear found in older Model S and Model X units is caused by the structure of the NAND flash memory cells in the eMMC. This is correct to a certain extent. Different types of NAND flash technology support different (but always limited) numbers of P/E cycles, which some call “write cycles”.

• SLC NAND flash memory technology: about 100,000 P/E cycles

• MLC NAND flash memory technology: about 3,500 to 10,000 P/E cycles

• TLC NAND flash memory technology: about 3,000 P/E cycles

• QLC NAND flash memory technology: about 100 to 1,000 P/E cycles

This means that once these cycles are exhausted, the drive will no longer be able to store data reliably. According to Tesla’s report, Hynix cells “are rated for 3,000 program/erase cycles for each NAND flash memory block in eMMC.”

To understand why NAND flash memory cells always have a limited number of P/E cycles, it is necessary to understand the underlying technology. NAND flash memory is a non-volatile memory (NVM) technology that uses charge trap or floating gate MOSFET transistors to store data in an array of memory cells. When a high voltage is applied to the control gate of the transistor while the source and drain are grounded, the electrons in the channel gain enough energy to overcome the oxide barrier and move from the channel to the floating gate. Trapping electrons in the floating gate is the programming (or “write”) operation of the flash memory device, and it corresponds to a logic bit 0. In contrast, the erase operation extracts electrons from the floating gate, thereby switching the data stored in the NAND flash memory cell back to a logic bit 1. The cells wear out because the programming and erasing cycles eventually damage the isolation layer between the floating gate and the substrate. This reduces data retention and may result in data loss or accidental programming of the cell.
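The program and erase behavior described above can be illustrated with a toy model. The Python sketch below is not how any real eMMC firmware works; the class name `NandBlock`, its parameters, and the tiny block size are assumptions made purely for illustration. It follows the usual convention that an erased cell reads as 1 and a programmed cell as 0, and it counts one P/E cycle per block erase.

```python
class NandBlock:
    """Toy model of one single-bit-per-cell NAND block (illustrative only)."""

    def __init__(self, n_bits=8, rated_pe_cycles=3000):
        self.bits = [1] * n_bits              # erased cells read as logic 1
        self.pe_cycles = 0                    # completed program/erase cycles
        self.rated_pe_cycles = rated_pe_cycles

    def program(self, data):
        """Programming can only move a bit from 1 to 0, never back to 1."""
        if len(data) != len(self.bits):
            raise ValueError("data length must match the block size")
        self.bits = [old & new for old, new in zip(self.bits, data)]

    def erase(self):
        """Erasing resets every bit to 1 and consumes one P/E cycle."""
        self.bits = [1] * len(self.bits)
        self.pe_cycles += 1

    @property
    def worn_out(self):
        """True once the block has used up its rated P/E cycles."""
        return self.pe_cycles >= self.rated_pe_cycles


block = NandBlock(n_bits=4, rated_pe_cycles=3)
block.program([1, 0, 1, 0])
block.program([0, 1, 1, 0])   # the second bit stays 0 until the block is erased
print(block.bits)             # [0, 0, 1, 0]
block.erase()
print(block.bits, block.pe_cycles, block.worn_out)
```

The point of the sketch is that data cannot be overwritten in place: changing a 0 back to a 1 requires erasing the whole block, and every erase moves the block closer to its rated limit.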

Understand the workload of the use case

Tesla electric cars are a challenging environment for any storage application, not only because of automotive-grade temperature and functional safety requirements, but also because each car is used in a different way. Here, the wear on the eMMC module depends on how long the car is driven each day, how long it charges, how much music is streamed, and a range of other factors. In addition, extremely important functions and features depend on the MCU’s ability to do its work reliably. The eMMC in this ecosystem faces a unique, industrial-grade workload, which can only be handled properly by a high-quality flash memory controller designed in accordance with industry standards.

Tesla states that “calculated based on the daily P/E cycle usage rate of 0.7 per block, it takes 11 to 12 years to obtain an average of 3,000 P/E cycles per block in the device. In the 95th percentile of the P/E cycle usage rate, it takes 5 to 6 years to accumulate 3,000 P/E cycles in a device on average.” Ultimately, the compounding effect of successive firmware updates caused these drives to fail earlier than expected. This raises the question: why do these MCUs fail so early?
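The arithmetic behind these figures is easy to check. The Python sketch below uses only the 3,000-cycle rating and the 0.7 cycles per block per day quoted above; the function names are made up for this example, and the daily write volume and pool sizes in the second half are hypothetical values chosen only to show the trend, not figures reported by Tesla. The wear model (host writes spread evenly over the writable pool) is also a simplifying assumption.

```python
DAYS_PER_YEAR = 365.25


def years_to_reach(rated_pe_cycles, pe_cycles_per_day):
    """Years until a block reaches its rated P/E cycles at a constant daily rate."""
    if pe_cycles_per_day <= 0:
        raise ValueError("pe_cycles_per_day must be positive")
    return rated_pe_cycles / pe_cycles_per_day / DAYS_PER_YEAR


def pe_cycles_per_day(daily_writes_gb, writable_pool_gb, write_amplification=1.0):
    """Average P/E cycles per block per day if writes spread evenly over the pool.

    Assumption: simplistic model that ignores how a real controller places data.
    """
    return daily_writes_gb * write_amplification / writable_pool_gb


# Figures quoted in the text: 3,000 rated cycles at 0.7 cycles per block per day.
print(f"{years_to_reach(3000, 0.7):.1f} years")   # 11.7 years

# Hypothetical illustration: the same daily writes land on a pool that
# shrinks as firmware images grow, so each block is cycled more often.
for pool_gb in (4.0, 3.0, 2.0):
    rate = pe_cycles_per_day(daily_writes_gb=2.8, writable_pool_gb=pool_gb)
    print(pool_gb, round(rate, 2), round(years_to_reach(3000, rate), 1))
```

Under this simple model, halving the space available for rewriting doubles the per-block cycle rate and halves the expected life, which is one way growing firmware can shorten the life of a drive whose workload otherwise stays the same.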

Understand the role of NAND flash controller

The role of the flash memory controller in high-end storage systems is often overlooked. While the NAND flash memory usually gets the attention, many people neglect to evaluate how well the controller can manage the application, given that the selected flash memory has a predefined number of P/E cycles. Although the flash memory technology plays an important part in defining the life of the drive, the selected controller should mask the inherent weaknesses of the flash memory to extend its life and ensure that there are no device malfunctions or data corruption.

For example, a flash memory controller can apply the most suitable type of error correction coding (ECC) for a particular storage device, depending on the characteristics of the selected NAND flash memory and the processing performance available in the controller. Different types of errors are more common in different types of NAND flash memory. For example, read disturb errors are more likely to occur in multi-level cells (MLC). Other controller functions, such as wear leveling and the timing of garbage collection, are also affected by the amount of over-provisioning in the NAND flash memory. The controller therefore needs to be carefully matched to the characteristics of the NAND flash memory. If this is ignored, it is not surprising that the drive fails before its predicted end of life. This is an expensive oversight. Choosing the correct flash memory controller is an essential part of designing efficient and reliable storage systems (such as eMMC modules).
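To make the effect of wear leveling concrete, the Python sketch below compares erase counts with and without it. It is a deliberately simplified illustration, not the algorithm of any particular controller: the function `simulate`, the eight-block device, and the single repeatedly rewritten piece of data are all assumptions, and static data, garbage collection, and over-provisioning are ignored.

```python
def simulate(writes, n_blocks, wear_leveling):
    """Return per-block erase counts after rewriting one piece of hot data."""
    erase_counts = [0] * n_blocks
    for _ in range(writes):
        if wear_leveling:
            # Simplified dynamic wear leveling: use the least-worn block.
            target = min(range(n_blocks), key=erase_counts.__getitem__)
        else:
            # No leveling: the hot data always lands on the same block.
            target = 0
        erase_counts[target] += 1
    return erase_counts


fixed = simulate(writes=3000, n_blocks=8, wear_leveling=False)
leveled = simulate(writes=3000, n_blocks=8, wear_leveling=True)
print(max(fixed), max(leveled))   # 3000 375
```

Without leveling, one block absorbs all 3,000 erases while the others stay untouched; with leveling, the same workload is spread across all eight blocks, so no block sees more than 375.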

Ultimately, system failures and data corruption are less acceptable in industrial applications than in other markets, because lifetime expectations and the cost of failure are higher. Storage systems such as eMMC modules need to be designed for their unique workloads and managed properly to avoid failures in their specific field of use. Finally, the flash memory controller plays a very important role in masking the shortcomings of the selected NAND flash technology and should be regarded as a core component, not merely as support for the NAND flash.

micohuang