Comments on: How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware?
https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/

By: itellu3times
Tue, 28 Jan 2025 00:50:57 +0000
https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246851

So it’s all mechanics, and really not a touch of actual theory. But then, what is the theory behind LLMs? Doh!

By: Carl Schumacher
Tue, 28 Jan 2025 00:03:51 +0000
https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246845

If this is indeed “DeepFake”, then it’s one of the best-engineered shorts (in both an energy and an IT capital sense) in decades.

By: Tapa Ghosh
Mon, 27 Jan 2025 22:33:03 +0000
https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246833

“And here is another side effect: The V3 model uses pipeline parallelism and data parallelism, but because the memory is managed so tightly, and overlaps forward and backward propagations as the model is being built, V3 does not have to use tensor parallelism at all. Weird, right?”

This is mostly because of the small number of GPUs used: at that scale they can use expert parallelism as well to eliminate the need for tensor parallelism. If you used more GPUs, you would need TP.
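A rough sketch of the arithmetic behind that point may help. Everything below is illustrative, not DeepSeek’s published configuration: the parallel_plan helper is hypothetical, and the 2,048-GPU count, 16-way pipeline, and 64-way expert-parallel factors are assumed numbers. The sketch only shows how a fixed GPU count factors into tensor-, pipeline-, and data-parallel dimensions, with expert parallelism tiling the data-parallel group in the way many MoE training setups do.

```python
# Hypothetical sketch (not DeepSeek's code): factoring a GPU count into
# parallelism dimensions. All specific numbers are assumptions for illustration.

def parallel_plan(world_size: int, pp: int, ep: int, tp: int = 1) -> dict:
    """Factor world_size GPUs into tensor- (tp), pipeline- (pp), and
    data-parallel (dp) groups. In many MoE setups, expert parallelism (ep)
    shards experts across ranks of the data-parallel group rather than
    consuming a separate slice of world_size."""
    assert world_size % (pp * tp) == 0, "pp * tp must divide the GPU count"
    dp = world_size // (pp * tp)
    assert dp % ep == 0, "expert-parallel groups must tile the DP group"
    return {"tp": tp, "pp": pp, "dp": dp, "ep": ep}

# Modest cluster, no tensor parallelism: experts are spread across
# data-parallel ranks instead of splitting individual layers.
print(parallel_plan(world_size=2048, pp=16, ep=64))
# -> {'tp': 1, 'pp': 16, 'dp': 128, 'ep': 64}

# Much larger cluster: a tensor-parallel factor can be folded in as well.
# The only invariant the sketch shows is total GPUs = tp * pp * dp,
# with ep tiling dp.
print(parallel_plan(world_size=16384, pp=16, ep=64, tp=8))
# -> {'tp': 8, 'pp': 16, 'dp': 128, 'ep': 64}
```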
