NVIDIA GB200 NVL72 - Next-Gen AI Supercomputer

Highlights

Breakthrough Performance for AI Workloads

LLM Inference

30X

vs. NVIDIA H100 Tensor Core GPU

LLM Training

4X

vs. H100

Energy Efficiency

25X

vs. H100

Data Processing

18X

vs. CPU

LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first-token latency (FTL) = 5 s, 32,768-token input / 1,024-token output, NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. Training: 1.8T-parameter MoE model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768.

Data processing benchmark: a database join and aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+. Projected performance, subject to change.

Real-Time LLM Inference

The GB200 NVL72 delivers 30X faster real-time inference for trillion-parameter language models. Powered by second-generation Transformer Engine with FP4 AI and fifth-generation NVLink, it combines 1.4 exaFLOPS of AI performance with 30TB of high-speed memory in a single unified architecture.
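As a rough sanity check (not an official derivation), the headline figures follow from the per-GPU numbers in the spec table: 72 Blackwell GPUs per rack, with the 20 PFLOPS-per-GPU FP4 figure assumed to be with sparsity, and fast memory taken as rack HBM3e plus Grace LPDDR5X.

```python
# Back-of-envelope check of the headline figures from the spec table.
# Assumes dense-rack totals; the per-GPU FP4 number is with sparsity.

GPUS_PER_RACK = 72

fp4_per_gpu_pflops = 20                                # FP4 Tensor Core, per GPU
rack_fp4_pflops = GPUS_PER_RACK * fp4_per_gpu_pflops   # 1,440 PFLOPS

hbm3e_tb = 13.5                                        # GPU HBM3e, rack total
lpddr5x_tb = 17                                        # Grace LPDDR5X, rack total
fast_memory_tb = hbm3e_tb + lpddr5x_tb                 # ~30 TB fast memory

print(f"{rack_fp4_pflops / 1000:.2f} exaFLOPS FP4")    # 1.44 exaFLOPS
print(f"{fast_memory_tb:.1f} TB fast memory")          # 30.5 TB
```

Both results line up with the quoted "1.4 exaFLOPS" and "30TB of high-speed memory".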

Accelerated LLM Training

Train large language models 4X faster with the second-generation Transformer Engine featuring FP8 precision. Fifth-generation NVLink provides 1.8TB/s GPU-to-GPU interconnect, complemented by InfiniBand networking and NVIDIA Magnum IO software for maximum throughput.

Sustainable AI Infrastructure

Liquid cooling technology enables 25X better performance than H100 air-cooled systems at the same power consumption. This advanced cooling solution increases compute density, reduces datacenter footprint, and minimizes water usage while enabling high-bandwidth, low-latency GPU communication.

Accelerated Data Analytics

Speed up database queries by 18X compared to CPU with high-bandwidth memory, NVLink-C2C, and dedicated decompression engines. The NVIDIA Blackwell architecture delivers 5X better total cost of ownership for enterprise data processing workloads.
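To make the benchmarked workload concrete, here is a minimal CPU-only Python sketch of a TPC-H Q4-style join and aggregation: count orders, grouped by priority, that have at least one line item received after its commit date. The sample rows are invented for illustration; the actual benchmark uses custom GPU implementations and compressed data.

```python
from collections import Counter
from datetime import date

# Toy stand-in tables (hypothetical rows, not benchmark data).
orders = [
    {"o_orderkey": 1, "o_orderpriority": "1-URGENT"},
    {"o_orderkey": 2, "o_orderpriority": "3-MEDIUM"},
    {"o_orderkey": 3, "o_orderpriority": "1-URGENT"},
]
lineitem = [
    {"l_orderkey": 1, "l_commitdate": date(2024, 1, 10), "l_receiptdate": date(2024, 1, 15)},
    {"l_orderkey": 2, "l_commitdate": date(2024, 1, 12), "l_receiptdate": date(2024, 1, 11)},
    {"l_orderkey": 3, "l_commitdate": date(2024, 1, 5),  "l_receiptdate": date(2024, 1, 9)},
]

# Semi-join: order keys with at least one late line item.
late_orders = {li["l_orderkey"] for li in lineitem
               if li["l_commitdate"] < li["l_receiptdate"]}

# Aggregation: count qualifying orders per priority.
counts = Counter(o["o_orderpriority"] for o in orders
                 if o["o_orderkey"] in late_orders)
print(dict(counts))  # {'1-URGENT': 2}
```

The join (building `late_orders`) and the group-by aggregation are exactly the operations the decompression engines, high-bandwidth memory, and NVLink-C2C accelerate at scale.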

Features

Technological Breakthroughs

Blackwell Architecture

The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.

NVIDIA Grace CPU

The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2X the energy efficiency of today's leading server processors.

Fifth-Generation NVIDIA NVLink

Fifth-generation NVLink delivers 1.8TB/s GPU-to-GPU interconnect bandwidth, enabling seamless communication across massive GPU clusters for trillion-parameter AI models and exascale computing workloads.
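A quick back-of-envelope, assuming the per-GPU figure simply scales linearly across the 72-GPU NVLink domain:

```python
# Aggregate NVLink bandwidth across the NVL72 domain, assuming
# 1.8 TB/s of GPU-to-GPU bandwidth per GPU (from the text above).
GPUS = 72
PER_GPU_TBPS = 1.8

aggregate_tbps = GPUS * PER_GPU_TBPS
print(f"{aggregate_tbps:.1f} TB/s aggregate")  # 129.6 TB/s
```

This matches the 130 TB/s "NVLink Bandwidth" rack-level figure quoted in the spec table.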

NVIDIA Networking

NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet provide the high-performance networking backbone for distributed AI training and inference, enabling efficient scaling across thousands of Blackwell GPUs.

Specifications

GB200 NVL72 Specs¹

|                           | GB200 NVL72                           | GB200 Grace Blackwell Superchip       |
|---------------------------|---------------------------------------|---------------------------------------|
| Configuration             | 36 Grace CPUs : 72 Blackwell GPUs     | 1 Grace CPU : 2 Blackwell GPUs        |
| FP4 Tensor Core²          | 1,440 PFLOPS                          | 40 PFLOPS                             |
| FP8/FP6 Tensor Core²      | 720 PFLOPS                            | 20 PFLOPS                             |
| INT8 Tensor Core²         | 720 POPS                              | 20 POPS                               |
| FP16/BF16 Tensor Core²    | 360 PFLOPS                            | 10 PFLOPS                             |
| TF32 Tensor Core²         | 180 PFLOPS                            | 5 PFLOPS                              |
| FP32                      | 6,480 TFLOPS                          | 180 TFLOPS                            |
| FP64                      | 3,240 TFLOPS                          | 90 TFLOPS                             |
| FP64 Tensor Core          | 3,240 TFLOPS                          | 90 TFLOPS                             |
| GPU Memory / Bandwidth    | Up to 13.5 TB HBM3e / 576 TB/s        | Up to 384 GB HBM3e / 16 TB/s          |
| NVLink Bandwidth          | 130 TB/s                              | 3.6 TB/s                              |
| CPU Core Count            | 2,592 Arm® Neoverse V2 cores          | 72 Arm Neoverse V2 cores              |
| LPDDR5X Memory / Bandwidth | Up to 17 TB LPDDR5X / Up to 18.4 TB/s | Up to 480 GB LPDDR5X / Up to 512 GB/s |

1 Preliminary specifications. May be subject to change.

2 With sparsity.
