A History of NVIDIA Datacenter GPUs, from P100 to B200

The development of NVIDIA’s datacenter GPUs can be seen as a steady progression of increasingly capable hardware rather than a series of dramatic leaps. It began with the Tesla P100 of the Pascal generation. This GPU offered thousands of CUDA cores and high-bandwidth HBM2 memory, but it did not yet include tensor cores, which would later become central to accelerating AI workloads. Even so, it established an important foundation for large-scale computing tasks.
With the Volta generation came the V100. It introduced tensor cores alongside more CUDA cores and faster HBM2 memory, making it significantly better suited to training deep learning models. Researchers began to view GPUs not just as graphics processors but as specialized tools for machine learning and high-performance computing.
The Ampere generation added the A100, which broadened flexibility by supporting a wide range of precision formats, from FP64 and TF32 down to INT8 and INT4. This adaptability allowed it to trade accuracy for efficiency depending on the task. The A100, especially in its SXM form, became widely used in datacenters for both training and inference workloads.
The Hopper generation followed, bringing the H100. It increased the number of CUDA and tensor cores, expanded memory bandwidth, and added the Transformer Engine with FP8 support, a feature aimed squarely at large language models and generative AI. The H100 NVL variant offered even more memory, making it well suited to large inference workloads. Shortly after, the H200 extended this further by adopting HBM3e memory and increasing bandwidth, helping it handle larger and more complex models.
In parallel, NVIDIA introduced Superchips: systems that pair a Grace CPU with one or more GPUs over the NVLink-C2C interconnect. This design lets CPU and GPU memory work more closely together, reducing bottlenecks in workloads that operate on very large datasets.
In 2024, NVIDIA announced the Blackwell generation. The B200 in particular uses a dual-die design connected by NV-HBI, NVIDIA's high-bandwidth die-to-die interface, enabling high throughput and large memory capacity with HBM3e. It was designed for generative AI and other data-intensive tasks.
Looking ahead, Rubin is expected to follow Blackwell around 2026, bringing further improvements in performance and efficiency.
Taken together, the path from Pascal to Volta, from Ampere to Hopper, and now to Blackwell—and eventually Rubin—illustrates a clear trajectory: each generation incrementally adds more cores, more memory, and better interconnects, making GPUs progressively more capable of supporting the growing demands of AI and high-performance computing.