
WELCOME TO LORI COOLING. LORI IS A COOLING SOLUTION PROVIDER.

Call: +86 17775668621     E-mail: sales@lori-cn.com

AI Server NVIDIA H200 Cooling Solution: Liquid Cooling Technology

2024-09-04

Against the backdrop of the global artificial intelligence (AI) boom, NVIDIA's newly released H200 Tensor Core GPU has attracted a lot of attention. Built on NVIDIA's Hopper architecture, it is NVIDIA's first GPU to use HBM3e high-bandwidth memory, which offers higher bandwidth and greater capacity (141 GB at 4.8 TB/s). This allows the H200 to process large datasets more efficiently, which is critical for developing large language models.

The NVIDIA H200 GPU is particularly well suited to large data centers and enterprise server environments. It is designed for scenarios such as AI-as-a-Service (AIaaS), large-scale machine learning training, and high-performance computing. Leading server vendors such as Supermicro have announced partnerships with NVIDIA to deliver a range of server systems built around the H200, which can significantly accelerate AI model training and improve data center efficiency and scalability. These H200-based servers will, in turn, require more efficient liquid cooling systems to dissipate their heat.

Three core reasons for the development of liquid cooling

1. Chip power consumption has reached the limit of air cooling.

Training and deploying large AI models requires more computing power per chip, and therefore more power consumption per chip. Chip temperature directly affects performance: when a chip's operating temperature approaches 70-80 °C, its performance drops by about 10% for every further 2 °C rise. Higher per-chip power therefore means more heat that must be removed to keep the chip within its performance range. The NVIDIA H200 has a TDP of up to 700 W, well beyond the roughly 350-400 W that air cooling can handle for a single processor, and NVIDIA GPUs are trending toward 1000 W.
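To put these figures in perspective, the sketch below compares a device's TDP with the air-cooling limit quoted above and totals the GPU heat load of a server. The 400 W limit is simply the upper end of the 350-400 W range cited in this article, and the 8-GPU server and the function names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch: compare per-device TDP against an assumed air-cooling
# ceiling and total up the GPU heat load of a hypothetical server.

AIR_COOLING_LIMIT_W = 400  # upper end of the 350-400 W range cited above (assumption)


def exceeds_air_cooling(tdp_w: float, limit_w: float = AIR_COOLING_LIMIT_W) -> bool:
    """Return True if a single device's TDP is above the assumed air-cooling limit."""
    return tdp_w > limit_w


def server_gpu_heat_w(gpu_count: int, tdp_w: float) -> float:
    """Total GPU heat (W) a server must reject if every GPU runs at its TDP."""
    return gpu_count * tdp_w


if __name__ == "__main__":
    h200_tdp_w = 700  # H200 TDP figure from the text
    gpus = 8          # hypothetical 8-GPU server (assumption for illustration)
    print("H200 above air-cooling limit:", exceeds_air_cooling(h200_tdp_w))
    print("GPU heat per server (W):", server_gpu_heat_w(gpus, h200_tdp_w))
    print("Excess over limit per GPU (W):", h200_tdp_w - AIR_COOLING_LIMIT_W)
```

With eight 700 W GPUs, the GPUs alone produce 5,600 W of heat, before counting CPUs, memory, and power-supply losses.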

2. National policy places higher demands on data center PUE.

Data centers are major energy consumers, and energy consumption by China's data centers is especially high. Lowering PUE (Power Usage Effectiveness) therefore plays a big role in saving energy and reducing emissions.
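PUE is defined as total facility energy divided by the energy delivered to IT equipment, so a PUE of 1.0 would mean zero overhead for cooling and power distribution. The sketch below computes PUE and the overhead energy saved when PUE drops; the IT load and the two PUE values are hypothetical numbers chosen for illustration, not measured data.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


def overhead_kwh(it_kwh: float, pue_value: float) -> float:
    """Non-IT energy (cooling, power distribution, etc.) implied by a given PUE."""
    return it_kwh * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kwh = 1_000_000   # hypothetical annual IT energy
    before, after = 1.5, 1.2  # hypothetical PUE before and after the change
    saved = overhead_kwh(it_load_kwh, before) - overhead_kwh(it_load_kwh, after)
    print(f"PUE check: {pue(1_500_000, it_load_kwh):.2f}")
    print(f"Overhead energy saved: {saved:,.0f} kWh")
```

Because the IT load is unchanged, every 0.1 reduction in PUE removes overhead energy equal to 10% of the IT energy.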

3. Liquid cooling has clear advantages over air cooling.

  • A given volume of liquid can carry away about 3,000 times as much heat as the same volume of air;

  • Its thermal conductivity is about 25 times that of air;

  • For the same cooling capacity, a liquid-cooled system is much quieter than an air-cooled one under both idle and full load; laboratory data show an average noise reduction of 10-15 dB;

  • A liquid cooling system is about 30% more energy efficient than an air cooling system.
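The first two points can be roughly checked from textbook property values for water and air near room temperature. The sketch below uses approximate handbook values at about 20-25 °C; these are assumptions, and real coolants such as water-glycol mixtures will differ somewhat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fluid:
    name: str
    density: float        # kg/m^3
    specific_heat: float  # J/(kg*K)
    conductivity: float   # W/(m*K)

    @property
    def volumetric_heat_capacity(self) -> float:
        """Heat absorbed per cubic metre per kelvin of temperature rise, J/(m^3*K)."""
        return self.density * self.specific_heat


# Approximate handbook values near 20-25 °C (assumptions; real coolants vary).
WATER = Fluid("water", density=998.0, specific_heat=4182.0, conductivity=0.60)
AIR = Fluid("air", density=1.204, specific_heat=1005.0, conductivity=0.0257)

if __name__ == "__main__":
    heat_ratio = WATER.volumetric_heat_capacity / AIR.volumetric_heat_capacity
    k_ratio = WATER.conductivity / AIR.conductivity
    print(f"Volumetric heat capacity, water/air: ~{heat_ratio:,.0f}x")
    print(f"Thermal conductivity, water/air: ~{k_ratio:.0f}x")
```

With these values the ratios come out at roughly 3,400x and 23x, in line with the rounded figures above.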

Classification of liquid cooling technology

By cooling method, liquid cooling falls into two categories: indirect cooling and direct cooling. Indirect cooling relies mainly on a cold plate as an intermediate medium for exchanging heat with the device, and this approach is widely used in the industry. Direct cooling is divided into immersion and spray types. Immersion cooling is further divided into single-phase (no phase change) and two-phase (phase change) types, of which single-phase is the more widely used. Spray cooling is rarely used. Next, we focus on cold plate liquid cooling.
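The classification above forms a small tree. The sketch below encodes it as nested dictionaries (an illustrative data structure, not a standard schema) and lists each path from the root to a leaf, which makes the nesting of immersion types under direct cooling explicit.

```python
from typing import Dict, Iterator, List, Optional

# Nested dict: each key is a category; an empty dict marks a leaf.
LIQUID_COOLING: Dict[str, dict] = {
    "indirect cooling": {
        "cold plate": {},
    },
    "direct cooling": {
        "immersion": {
            "single-phase (no phase change)": {},
            "two-phase (phase change)": {},
        },
        "spray": {},
    },
}


def leaf_paths(tree: Dict[str, dict],
               prefix: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Yield every root-to-leaf path in the category tree."""
    prefix = prefix or []
    for name, children in tree.items():
        path = prefix + [name]
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


if __name__ == "__main__":
    for path in leaf_paths(LIQUID_COOLING):
        print(" > ".join(path))
```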

The biggest difference between cold plate liquid cooling and immersion liquid cooling is that the CPU (or GPU), the component that generates the most heat, does not come into direct contact with the coolant. Instead, the coolant flows through a conduction device (the cold plate) mounted on the chip, and the heat is carried away through that device. Cold plate liquid cooling has been used commercially in HPC and AI high-density computing for more than 8 years; the technology is mature, the ecosystem is complete, and the overall cost is controllable. Moreover, cold plate liquid cooling does not change the customer's usage habits: hard disks, optical modules, and other components are the same as in air-cooled systems, and the operation and maintenance model and server room floor-loading requirements are basically the same as for air cooling. At the same time, its single-point cooling capacity exceeds 700 W, so it can effectively reduce data center PUE, making it well suited to large-scale commercial deployment.
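The heat carried away by a cold plate follows the steady-state energy balance Q = m_dot × c_p × ΔT, where m_dot is the coolant mass flow rate, c_p its specific heat, and ΔT its temperature rise across the plate. The sketch below estimates the flow needed to remove 700 W, the single-point capacity cited above. The 10 K temperature rise and the fluid properties are assumptions for illustration; a real design must also account for pressure drop and the thermal resistance between chip and coolant.

```python
def required_flow_lpm(heat_w: float, delta_t_k: float,
                      specific_heat: float = 4182.0,  # J/(kg*K), water (assumed)
                      density: float = 998.0) -> float:  # kg/m^3, water (assumed)
    """Volumetric flow (litres per minute) needed to absorb heat_w with a
    fluid temperature rise of delta_t_k, from Q = m_dot * c_p * dT."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_w / (specific_heat * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000 * 60  # m^3/s -> L/min


if __name__ == "__main__":
    # 700 W per cold plate with a 10 K rise (hypothetical design point)
    print(f"Water: {required_flow_lpm(700, 10):.2f} L/min")
    # Same heat and rise using approximate air properties, for comparison
    print(f"Air: {required_flow_lpm(700, 10, specific_heat=1005.0, density=1.204):.0f} L/min")
```

Under these assumptions, about 1 L/min of water removes 700 W, whereas air would need roughly 3,500 L/min for the same temperature rise. Doubling the allowed temperature rise halves the required flow.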

Liquid cooling technology is key to overcoming the challenges of AI cloud computing, paving the way for very large-scale cloud services. As a global cooling solution supplier, Lori specializes in manufacturing standard PC heat sinks and server heat sinks, and researches liquid cooling technology to provide cooling solutions for AI servers.
