Comments on: Stacking Up AMD MI200 Versus Nvidia A100 Compute Engines
https://www.nextplatform.com/2021/12/06/stacking-up-amd-mi200-versus-nvidia-a100-compute-engines/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Matt
https://www.nextplatform.com/2021/12/06/stacking-up-amd-mi200-versus-nvidia-a100-compute-engines/#comment-172618
Wed, 15 Dec 2021 20:09:58 +0000
In reply to Michael.

Arcturus (CDNA-1) has 25.6B transistors in 750 square mm on TSMC 7 nm, while GA100 has 54.2B transistors in 826 square mm on TSMC 7 nm. So NVIDIA has a much higher areal transistor density than either CDNA-1 or CDNA-2. I won't venture to guess why that is, but it is die area, not transistor count, that is the important scaling factor to look at.
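To put rough numbers on that density gap, here is a back-of-the-envelope sketch in Python using the figures above (illustrative only; real densities vary with the logic/SRAM mix of each die):

    # Transistor density from the die sizes quoted above (both on TSMC 7 nm).
    dies = {
        "Arcturus (CDNA-1)": (25.6e9, 750),  # (transistors, die area in mm^2)
        "GA100":             (54.2e9, 826),
    }
    for name, (transistors, area_mm2) in dies.items():
        print(f"{name}: {transistors / area_mm2 / 1e6:.1f} MTr/mm^2")
    # Arcturus (CDNA-1): 34.1 MTr/mm^2
    # GA100: 65.6 MTr/mm^2  -> roughly 1.9x the areal density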
As for AMD's ability to outperform the A100, it must be said that Aldebaran is a next-generation part compared to the A100, and those performance data are from supercomputing applications. Aldebaran is a part that caters to the supercomputer market while the A100 is not. It would be interesting to see Aldebaran's performance in the areas where NVIDIA is commercially targeting its processors these days.

By: Matt
https://www.nextplatform.com/2021/12/06/stacking-up-amd-mi200-versus-nvidia-a100-compute-engines/#comment-172616
Wed, 15 Dec 2021 19:38:50 +0000
In reply to peter j connell.

NVIDIA has been incubating, patenting, and validating chiplets for years as well. It is the direction the whole industry is moving, and that is well known. It should not be at all controversial for the article to strongly suspect that NVIDIA will move to a chiplet architecture. The only difference is that NVIDIA likely won't call them chiplets; in its research it calls them multi-chip modules. See, for example, this paper from four and a half years ago: https://research.nvidia.com/publication/2017-06_MCM-GPU%3A-Multi-Chip-Module-GPUs

I’m not sure why you are claiming NVIDIA must use PCIe to connect multiple GPUs when it has had NVLink for years. And its ability to do the same with x86 CPUs is not a matter of technical ability but of competition: the x86 chip makers will not allow the technology on their chips. Finally, you seem to be under the impression that AMD achieves its aggregate memory bandwidth of 3 TB/s across its Infinity Fabric, which has a bandwidth of 800 GB/s. I don't understand that assertion at all.
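To make concrete why those two numbers measure different things, here is a rough sketch. The 800 GB/s fabric figure is the one quoted in this thread, and the per-die HBM bandwidth is assumed from AMD's published ~3.2 TB/s aggregate for the MI250X; treat both as approximate:

    # Aggregate HBM bandwidth is the sum of each die's LOCAL memory bandwidth.
    # Traffic that has to cross between the two dies is bounded by the
    # Infinity Fabric links instead.
    hbm_bw_per_die = 1.6   # TB/s, each MI250X GCD to its own HBM2e stacks
    num_dies       = 2
    fabric_bw      = 0.8   # TB/s, die-to-die Infinity Fabric, as quoted above

    aggregate_hbm = hbm_bw_per_die * num_dies   # the ~3 TB/s-class figure
    print(f"Aggregate HBM bandwidth: {aggregate_hbm:.1f} TB/s")
    print(f"Die-to-die fabric bandwidth: {fabric_bw:.1f} TB/s "
          f"({fabric_bw / aggregate_hbm:.0%} of aggregate)")
    # A kernel pinned to one die still sees ~1.6 TB/s locally; only remote
    # accesses pay the fabric's much lower bandwidth.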

Regarding NVIDIA's access to host memory and cache, that should change in 2023 with Grace for supercomputers, and perhaps with CXL in the data center, although it's unclear to me whether it will be the 2.0 or the 3.0 specification by the time it is introduced there.

By: Michael
https://www.nextplatform.com/2021/12/06/stacking-up-amd-mi200-versus-nvidia-a100-compute-engines/#comment-172363
Wed, 08 Dec 2021 09:57:20 +0000

Although the MI250X uses two dies, they are quite small in comparison to an A100, and together they use only about 7% more transistors overall. That AMD have been able to outperform the A100 with only 7% more transistors, by more than 2x in certain tasks, is nothing short of a marvel. They have come from essentially no presence in the data centre to competing with their competitor's top products after just two generations of CDNA.
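For what it's worth, the perf-per-transistor arithmetic behind that claim looks like this (a rough Python sketch; the 58.2B figure is AMD's published MI250X transistor count, which matches the ~7% claim above):

    mi250x_transistors = 58.2e9   # two Aldebaran GCDs combined (AMD's figure)
    a100_transistors   = 54.2e9
    speedup            = 2.0      # "over 2x in certain tasks", per the comment

    ratio = mi250x_transistors / a100_transistors
    print(f"Transistor budget: {ratio:.2f}x the A100's")          # ~1.07x
    print(f"Perf per transistor in those tasks: ~{speedup / ratio:.1f}x")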

By: peter j connell
https://www.nextplatform.com/2021/12/06/stacking-up-amd-mi200-versus-nvidia-a100-compute-engines/#comment-172341
Tue, 07 Dec 2021 11:02:39 +0000

I am no expert, but there seem to be some astonishing "don't you worry about that" pontifications here.

"We strongly suspect that Nvidia will move to a chiplet architecture" – excuse me?

We are not talking about some new fashion that can be chosen and implemented on a whim.

Chiplets and, vitally, Infinity Fabric have been incubated, patented, and validated since the early 2000s.

The article treats Nvidia's 2 TB/s and AMD's 3 TB/s bandwidth figures as if they were like-for-like metrics, which is absurd.

One is a measure of a GPU's onboard local bus. The other is the Infinity Fabric speed that can connect cores and cache across multiple GPUs. Nvidia must fall back to the snail's pace of PCIe for this.

Nvidia has very little say in the host's ecosystem, unlike AMD, which has total control over its co-developed CPU/GPU/platform ecosystem.
