
Nvidia has unveiled a new class of AI supercomputer with massive memory capacity. The system, a new variant of the Nvidia DGX, is built around GH200 Grace Hopper Superchips and the NVLink Switch System.

According to the company, the AI supercomputer is designed to enable the creation of massive, cutting-edge models for generative AI language applications, recommender systems, and data analytics workloads.

Let’s dive deeper into the key points.

The New Wave Of AI: Nvidia’s AI Supercomputer 

The DGX GH200 supercomputer was officially unveiled at the Computex tech conference in Taipei. The system is powered by 256 Grace Hopper Superchips, each of which combines Nvidia’s Grace CPU, a 72-core Arm processor engineered for high-performance computing, with the Hopper GPU.

Within each superchip, the CPU and GPU are linked by Nvidia’s high-speed NVLink-C2C chip-to-chip interconnect.

The DGX GH200 offers a combined memory capacity of more than 144TB, shared across the system via Nvidia’s NVLink interconnect technology. According to Ian Buck, Vice President and General Manager of Nvidia’s Hyperscale and HPC Business Unit, this streamlined design lets software see the system’s processors as a single, large GPU with a unified memory pool.
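The 144TB figure can be sanity-checked from the per-superchip memory. A minimal sketch, assuming the commonly cited 480GB of LPDDR5X attached to each Grace CPU and 96GB of HBM3 on each Hopper GPU (figures not stated in this article):

```python
# Back-of-the-envelope check of the DGX GH200's shared memory pool.
# Per-superchip figures below are assumptions, not taken from the article:
# 480 GB of LPDDR5X on the Grace CPU plus 96 GB of HBM3 on the Hopper GPU.
SUPERCHIPS = 256
LPDDR5X_GB = 480
HBM3_GB = 96

per_chip_gb = LPDDR5X_GB + HBM3_GB          # 576 GB per Grace Hopper Superchip
total_tb = SUPERCHIPS * per_chip_gb / 1024  # convert GB -> TB (binary)

print(f"{total_tb:.0f} TB")  # → 144 TB
```

Under these assumed figures, 256 superchips at 576GB each works out to exactly the 144TB the article quotes.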

Buck said that, with Nvidia’s assistance, the system can be deployed to train AI models whose memory requirements exceed the limits of a single GPU. Training such massive models, he added, calls for a new system architecture that can move beyond one terabyte of memory.

According to Nvidia, the exaFLOP performance figure is achieved using eight-bit FP8 processing. Most AI processing today uses 16-bit BFloat16 (Brain Floating Point) instructions; because each BFloat16 value is twice as wide as an FP8 value, the same work takes roughly twice as long.
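The exaFLOP claim can likewise be checked with simple arithmetic. A sketch assuming roughly 4 PFLOPS of FP8 throughput per Hopper GPU (a commonly quoted ballpark, not a figure from this article):

```python
# Rough sanity check of the one-exaFLOP FP8 figure.
# Both inputs are assumptions, not figures from the article.
GPUS = 256
FP8_PFLOPS_PER_GPU = 4  # approximate FP8 throughput per Hopper GPU

total_exaflops = GPUS * FP8_PFLOPS_PER_GPU / 1000  # PFLOPS -> exaFLOPS
print(f"~{total_exaflops:.2f} exaFLOPS FP8")       # → ~1.02 exaFLOPS FP8

# FP8 packs values into 8 bits versus BFloat16's 16, so at the same
# memory bandwidth an FP8 pipeline moves twice as many values per second.
speedup_vs_bf16 = 16 / 8  # 2.0
```

With these assumed per-GPU numbers, 256 GPUs land just over one exaFLOP, matching the headline figure.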

Using NVLink in place of a conventional PCI Express interconnect delivers a sevenfold increase in bandwidth between the GPU and CPU while cutting interconnect power consumption to a fifth of the original.
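The sevenfold figure is consistent with published link speeds. A sketch assuming NVLink-C2C’s 900 GB/s total bandwidth against roughly 128 GB/s for a PCIe Gen5 x16 link (both figures are my assumptions, not stated in the article):

```python
# Comparing NVLink-C2C against PCIe Gen5 x16 bandwidth.
# Both figures are assumed, not taken from the article.
NVLINK_C2C_GBPS = 900  # total bidirectional bandwidth, GB/s
PCIE5_X16_GBPS = 128   # total bidirectional bandwidth, GB/s

ratio = NVLINK_C2C_GBPS / PCIE5_X16_GBPS
print(f"~{ratio:.1f}x")  # → ~7.0x
```

Under these assumptions the ratio comes out at almost exactly seven, matching the article’s claim.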

The DGX GH200 is expected to be initially available to Google Cloud, Meta, and Microsoft for exploring its potential on generative AI workloads. Nvidia also plans to provide the DGX GH200 blueprint to cloud service providers and hyperscalers so they can further customize it for their infrastructure. The Nvidia DGX GH200 supercomputers are slated for release around the end of the current calendar year.

Bundled With Enterprise-Level Nvidia Software

The supercomputers are equipped with pre-installed Nvidia software, offering a comprehensive solution that encompasses Nvidia AI Enterprise, the core software layer for its AI platform, comprising frameworks, pre-trained models, and development tools. Additionally, the supercomputers are integrated with Base Command, a cluster management system designed for enterprise-level operations.

The DGX GH200 represents a milestone in supercomputing, as it is the first system to pair the powerful Grace Hopper Superchips with Nvidia’s NVLink Switch System. This interconnect technology lets the GPUs in the system work together seamlessly, effectively functioning as a single unit. Earlier NVLink-based systems topped out at eight GPUs working together.

Despite the considerable heat generated by the Hopper GPUs, with the system drawing 700 watts of power, the DGX GH200 is air-cooled. Nvidia stated that it is developing liquid-cooled systems internally and is actively discussing them with customers and partners; for now, however, the DGX GH200 relies on fans.

According to Charlie Boyle, VP of DGX systems at Nvidia, the system’s current user base is not yet ready for liquid cooling. “In the future, there will be designs that require liquid cooling, but we were able to keep this one air-cooled,” he added.

Conclusion

Nvidia’s latest product is poised to establish a new benchmark in the rapidly evolving field of artificial intelligence. This cutting-edge solution holds the potential to deliver unprecedented breakthroughs and progress in the domain of AI-powered technologies. 

The integration of AI is set to transform the very core of modern economies, serving as the digital engine that propels us into the future. It is thrilling to see how top tech companies will leverage this technology to build cutting-edge, robust large language models.

Sanjay Mehan| SunArc Technologies