The Business & Technology Network
Helping Business Interpret and Use Technology

Nvidia’s Blackwell GPUs are so hot, they could bake a cake

DATE POSTED: November 18, 2024

Nvidia’s Blackwell GPUs are facing overheating problems that affect major tech clients. The next-generation processors overheat when installed in server racks that house 72 GPUs, raising concerns at companies such as Google, Meta, and Microsoft about timely deployment. Reports indicate that Nvidia has revised its rack designs several times because of the overheating, which risks damaging components and limits GPU performance. Each of these racks is expected to draw up to 120 kW.
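
To put the rack figures in perspective, here is a minimal sketch that divides the reported 120 kW rack power by the 72 GPUs per rack. The function name is hypothetical, and the result is only a rough upper bound on the share per GPU, since a rack also powers components other than GPUs.

```python
def per_gpu_power_budget_kw(rack_power_kw: float, gpus_per_rack: int) -> float:
    """Return the average rack power available per GPU slot, in kW.

    This is a plain division of the rack total by the GPU count, so it
    overstates what each GPU itself draws (other rack components are
    included in the total).
    """
    if gpus_per_rack <= 0:
        raise ValueError("gpus_per_rack must be positive")
    return rack_power_kw / gpus_per_rack


# Figures reported in the article: up to 120 kW per rack, 72 GPUs per rack.
RACK_POWER_KW = 120.0
GPUS_PER_RACK = 72

budget_kw = per_gpu_power_budget_kw(RACK_POWER_KW, GPUS_PER_RACK)
print(f"{budget_kw:.2f} kW per GPU slot")  # prints "1.67 kW per GPU slot"
```

Roughly 1.67 kW per GPU slot, packed 72 to a rack, gives a sense of why cooling is the constraint in these configurations.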

Insiders told The Information that Nvidia’s Blackwell GPUs for AI and high-performance computing (HPC) have overheated in high-capacity servers, affecting launch timelines for clients that rely on them. To address the problem, Nvidia has repeatedly asked its suppliers to modify the rack designs. An Nvidia spokesperson emphasized the company’s collaboration with cloud service providers and described the design changes as a routine part of the development process.

Adjustments in design to counteract overheating issues

Previously, delays in the Blackwell production ramp were attributed to a “yield-killing” design flaw. The Blackwell B100 and B200 GPUs use TSMC’s CoWoS-L packaging technology, which links two chiplets with data transfer speeds of up to 10 TB/s. However, a mismatch in thermal expansion characteristics between the GPU chiplets and other components led to warping and system failures. To resolve this, Nvidia modified the GPU silicon’s metal layers and bump structures.
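
The thermal expansion mismatch can be illustrated with the standard linear expansion relation, ΔL = α · L · ΔT. The sketch below is only an illustration: the silicon coefficient is the commonly cited textbook value, while the substrate coefficient, part length, and temperature swing are assumed numbers chosen for the example, not Blackwell specifications.

```python
def differential_expansion_um(alpha_a_per_k: float,
                              alpha_b_per_k: float,
                              length_mm: float,
                              delta_t_k: float) -> float:
    """Return how much two bonded materials differ in expansion, in micrometers.

    Applies delta_L = alpha * L * delta_T to each material and takes the
    difference. Lengths are given in mm and converted to micrometers.
    """
    mismatch_mm = abs(alpha_a_per_k - alpha_b_per_k) * length_mm * delta_t_k
    return mismatch_mm * 1000.0  # mm -> micrometers


SILICON_CTE = 2.6e-6     # per kelvin, commonly cited value for silicon
SUBSTRATE_CTE = 15e-6    # per kelvin, ASSUMED value for an organic substrate
PACKAGE_LENGTH_MM = 50   # ASSUMED package dimension, for illustration only
TEMP_SWING_K = 60        # ASSUMED temperature rise under load

mismatch_um = differential_expansion_um(
    SILICON_CTE, SUBSTRATE_CTE, PACKAGE_LENGTH_MM, TEMP_SWING_K
)
print(f"Differential expansion: {mismatch_um:.1f} micrometers")
```

Under these assumed numbers the two materials would try to differ in length by tens of micrometers, which is large relative to the fine interconnect bumps in an advanced package and is the kind of stress that can cause warping.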

The revised GPUs entered mass production only in late October, pushing expected shipping dates back to late January. The delay matters for Nvidia clients such as Google, Meta, and Microsoft, which depend on these GPUs to advance their most powerful AI models. Nvidia previously touted the Blackwell chips as 30 times faster than earlier models at tasks such as responding to chatbot queries.

Nvidia’s Blackwell chip revenue was projected to reach $6 billion in the next quarter, a sign of high demand despite ongoing supply constraints. Nvidia, which recently surpassed Apple, is now the world’s most valuable company, with a market capitalization of $3.482 trillion. However, continued setbacks with the Blackwell processors threaten to disrupt the AI advancements that major tech players have planned.

Featured image credit: Nvidia