Comments on: When Push Comes To Shove, Google Invests Heavily In GPU Compute https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Mon, 31 Jul 2023 11:09:16 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Timothy Prickett Morgan https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-211876 Mon, 31 Jul 2023 11:09:16 +0000 https://www.nextplatform.com/?p=142363#comment-211876 In reply to JayN.

I do not think their manufacturing yields on Ponte Vecchio were very high. I have heard some horror stories. But they will get better, and IBM will help them–they know packaging, too.

]]>
By: JayN https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-211852 Mon, 31 Jul 2023 01:20:40 +0000 https://www.nextplatform.com/?p=142363#comment-211852 In reply to Matt.

I believe Intel does their own advanced packaging on their data center GPUs, so those are not necessarily limited by the TSM packaging bottleneck.

]]>
By: Matt https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208463 Sun, 14 May 2023 17:03:48 +0000 https://www.nextplatform.com/?p=142363#comment-208463 In reply to EC.

You can research the Alps and Venado supercomputers as well as discussions/announcements of Grace itself from March 2022 to see that Grace was planned for “early 2023”. Here’s something from Nvidia’s website published August 23, 2022 which is still saying “first half of 2023”: https://developer.nvidia.com/blog/inside-nvidia-grace-cpu-nvidia-amps-up-superchip-engineering-for-hpc-and-ai/

]]>
By: EC https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208426 Sun, 14 May 2023 00:46:52 +0000 https://www.nextplatform.com/?p=142363#comment-208426 In reply to Thomas Hoberg.

Grace wasn’t planned to ship until 2H 23. Certainly Nvidia has planned and booked packaging capacity for Grace and Grace+Hopper SKUs, but those are probably (typically) conservative projections.

]]>
By: Hubert https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208419 Sat, 13 May 2023 19:46:09 +0000 https://www.nextplatform.com/?p=142363#comment-208419 In reply to OGeneral.

I wouldn’t underestimate the importance of a well-integrated software stack, down to the bottom layers of hardware-specific libraries, even when developing at the much higher level of PyTorch or Hugging Face. In “HPC” (for example) there’s quite a difference in MATLAB performance (high-level programming) depending on whether one uses generic BLAS, GotoBLAS, rocBLAS (AMD), or MKL (Intel). In particular, using MKL on AMD CPUs, or rocBLAS on Intel chips, can be most entertaining … This being said, today, both PyTorch and Hugging Face seem to favor (support) the Nvidia GPU ecosystem a bit more than others ( https://pytorch.org/docs/stable/backends.html , https://huggingface.co/pricing#spaces ) and so I would expect developers to have a better experience with that HW, or something compatible-ish 8^p. This is a situation one hopes will improve over time, with PhDs from more HW vendors contributing highly tuned code libraries to those high-level programming frameworks (but I could be missing something…).
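For what it’s worth, one can probe which accelerator backends a given PyTorch build actually exposes — a minimal sketch, assuming only that Python is available (PyTorch itself may or may not be installed on the machine):

```python
# Hedged sketch: report which accelerator backends this PyTorch build exposes,
# per the torch.backends docs (https://pytorch.org/docs/stable/backends.html).
# Degrades gracefully when PyTorch is not installed at all.
def probe_torch_backends():
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    info = {"torch_installed": True}
    # CUDA (Nvidia) -- the most broadly supported GPU backend today.
    info["cuda"] = torch.cuda.is_available()
    # ROCm (AMD) builds report availability through the same torch.cuda API;
    # torch.version.hip is set only on ROCm builds.
    info["rocm_build"] = getattr(torch.version, "hip", None) is not None
    # oneDNN (MKL-DNN) CPU acceleration.
    info["mkldnn"] = torch.backends.mkldnn.is_available()
    return info

if __name__ == "__main__":
    print(probe_torch_backends())
```

On a box without an Nvidia GPU (or without PyTorch at all) this makes the “which HW does my stack actually support” question explicit rather than a surprise at runtime.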

]]>
By: Hubert https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208416 Sat, 13 May 2023 18:07:27 +0000 https://www.nextplatform.com/?p=142363#comment-208416 In reply to Thomas Hoberg.

I think they may be prioritizing the opportunity of pairing with Sapphire Rapids (nicely performant), as a substitute for Ponte Vecchio (possibly a bit lackluster at this time).

]]>
By: Matt https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208412 Sat, 13 May 2023 16:58:36 +0000 https://www.nextplatform.com/?p=142363#comment-208412 In reply to Thomas Hoberg.

Grace was supposed to be a 2023 part while Hopper and BlueField-3 were to be 2022 parts. Actually, Hopper and BlueField-3 seem to have appeared more slowly than originally promised. Perhaps AMD’s and especially Intel’s delays with PCIe 5-capable platforms are the main reason for that, but BlueField-3 especially still seems a bit scarce.
Grace seems to have been delayed by about 6 months. Just the fact that it’s Nvidia’s first foray into server CPUs makes that unsurprising. Who knows why it’s been delayed. But the CoWoS capacity issue likely only materialized after the demand surge related to ChatGPT hype. Any 1H 2023 plans for Grace should have been in place well before that, so my guess is that the delay is unrelated to CoWoS capacity. It’s likely also unrelated to demand for Grace. Los Alamos National Laboratory and the Swiss National Supercomputer Center likely wanted their Grace chips in H1 2023 and not H2 2023.
In TSMC’s latest earnings conference call, on April 20th, their CEO, CC Wei, said “…just recently in these 2 days, I received a customer’s phone call requesting a big increase on the back-end capacity, especially in the CoWoS. We are still evaluating that.” Whoever that customer is has made the request rather recently. I would guess it’s Nvidia. The rumors are that TSMC are unable to increase their capacity this year by much more than they already have.

]]>
By: OGeneral https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208398 Sat, 13 May 2023 11:19:24 +0000 https://www.nextplatform.com/?p=142363#comment-208398 In reply to Timothy Prickett Morgan.

I think you are pretty wrong on this, since more and more people are moving to basically just using Hugging Face models and infrastructure, which completely separates them from device-specific code. Most people do not care as long as it runs reliably and fast; they do not write any Nvidia-specific CUDA kernels. That’s maybe 5% of users out there (which is negligible).

]]>
By: Thomas Hoberg https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208394 Sat, 13 May 2023 09:02:19 +0000 https://www.nextplatform.com/?p=142363#comment-208394 In reply to Matt.

So is that what’s holding back Grace? Or is it that customers don’t want to put all their eggs into the same admiral?

It’s very odd that they were co-designed and now only the GPUs and DPUs are showing up in numbers…

]]>
By: Matt https://www.nextplatform.com/2023/05/11/when-push-comes-to-shove-google-invests-heavily-in-gpu-compute/#comment-208381 Fri, 12 May 2023 20:06:10 +0000 https://www.nextplatform.com/?p=142363#comment-208381 I’ve read that the limiting factor for Nvidia’s H100 production is TSMC’s CoWoS packaging capacity. If so, the situation with AI datacenter GPUs now is much the same as with gaming GPUs during the crypto craze in 2021. When everyone is affected by the exact same production bottleneck, the market leader’s inability to service demand does not mean others are able to pick up the slack. AMD was not able to make and sell particularly many extra PC gaming GPUs during the crypto craze because they just couldn’t get extra wafers and components. Nvidia is going to scoop up every drop of spare CoWoS capacity TSMC has, and AMD is not going to be able to significantly increase their production. Nvidia can afford to pay more for the previously untapped capacity because they can charge customers higher prices, since their GPUs have far greater utility due to the maturity of the software ecosystem that surrounds them. The same goes for any AI ASIC that relies on TSMC’s CoWoS packaging, which, correct me if I’m wrong, is just about any of them that use HBM, including Intel’s Habana Gaudi 2, Intel’s datacenter GPUs, and Google’s TPUv4.

]]>