This is a two-part series on the post-DirectX 10 history of ATI and Nvidia, the two largest commercial GPU vendors in the world.  A word of caution: though I have tried to make this accessible to as many people as possible, I must still go into detail to be informative.  As such, you may find this series rather dense and technical, and for that I apologize.  The goal is to provide perspective on the constant clash between these two GPU vendors.

For the uninitiated, graphics cards have become one of the most vital parts of modern computers, though they are relatively young components compared to others like the CPU, RAM, or hard drive.  Beginning in the mid-to-late 90s, companies started selling accelerator cards that drove 2D graphics through APIs like Microsoft’s DirectX.  Before this, graphics tasks were bound to the CPU, even for games.  To my generation, the notion of running high-end games and applications on a CPU alone seems impossible, because GPUs have been important for our entire lifetime.  In the late 90s, having a dedicated card for 2D graphics, let alone one of the later 3D accelerators, was a luxury.  It wasn’t until games like Quake II and Half-Life that the benefits of 3D accelerators became clear.  Over the past decade, graphics cards have become the fastest-growing segment in PC hardware, with new cards arriving much more frequently than new processors.  The graphics card has also become the most powerful piece of hardware in a computer, though that power is often left untapped unless a demanding game calls on it.  The most powerful GPUs offer processing power in excess of 2 TeraFLOPS (trillions of floating-point operations per second – a standard indicator of processing power) while the most powerful Intel processor doesn’t go beyond 100 GigaFLOPS.  Suffice it to say, these are incredible pieces of silicon and PCB.

Graphics cards were designed specifically to be compatible with Microsoft’s DirectX graphics API.  As such, their features were dependent on the capabilities exposed by DirectX (look back to my previous post on graphics APIs to fill in the blanks on what DirectX actually is).  New generations of graphics cards are therefore often described by the version of DirectX they support.  OpenGL was also used, but was aimed at professional applications before being adopted by games.  This article focuses specifically on the post-DirectX 10 era, so it is important to understand what differentiates DX10 from previous versions.

DX10 was initially released alongside Windows Vista in 2006.  Along with modernizing many existing features, DX10 introduced the Unified Shader Model to graphics rendering.  Starting with DX8, graphics chips had been divided into dedicated hardware units called shaders, each processing a specific type of data, such as vertex or pixel data.  The Unified Shader Model threw out this design of specialized units and re-imagined the hardware as a cluster of very small, very rudimentary processor cores, linked together in sets of hundreds, that could process any type of data passed through them.  This was the Unified Shader Architecture, and it allowed for remarkable improvements in performance and efficiency.  It also allowed developers to rethink GPUs not just as graphics hardware but as massive vector processors, able to apply the same operation to large sets of data at once.
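The “massive vector processor” idea can be sketched in ordinary code: instead of looping over elements one at a time, the same operation is applied across a whole array at once, which is exactly how an array of unified shader cores chews through vertex or pixel data.  A minimal illustration (using NumPy on the CPU purely as a stand-in for the GPU’s data-parallel hardware — the numbers and transform are made up for the example):

```python
import numpy as np

# Thousands of vertices, each an (x, y, z) position -- the kind of
# uniform, independent data a unified shader core array excels at.
vertices = np.random.rand(100_000, 3).astype(np.float32)

# One transform (a uniform scale plus a translation) is applied to
# every vertex "at once": one operation, many data elements.
scale = np.float32(2.0)
offset = np.array([0.0, 1.0, 0.0], dtype=np.float32)
transformed = vertices * scale + offset

print(transformed.shape)  # (100000, 3)
```

On a real GPU, each of those hundreds of shader cores would handle a slice of the array in parallel; the key point is that no vertex’s result depends on any other vertex, so the work scales almost perfectly with core count.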

ATI’s very first Unified Shader design is found not in any consumer computer, but in Microsoft’s Xbox 360.  Known as C1, R500, and most popularly as Xenos, the GPU was commissioned by Microsoft and served as a sort of proof of concept that the architecture worked.  The chip organized its 48 shader units into SIMD groups that processed all types of data.  Though the chip was built on the Unified Shader Architecture, Microsoft released the Xbox 360 before DX10 was completed, and while optimizations were made to the 360’s version of DX9, it never had full access to the features of DX10.  After ATI completed Xenos and finished a merger with Advanced Micro Devices (AMD) – a competitor to Intel – they began work on their first desktop DX10 card series, R600.  R600 arrived late and performed weakly, making it a rough patch for ATI, which was still reeling financially from the merger.  Though based on Xenos, R600 was drastically different, with up to 320 stream processors (as they were now called) that were individually far weaker than before.  Such a large number allowed for good scaling between high-end and low-end chips, but the high-end cards were never fast or sophisticated enough to compete with Nvidia’s offerings.  Because R600 was several months late to a demanding market and could not compete on performance or price, most gamers chose Nvidia, and for over six months ATI couldn’t manage to pull ahead.

However, by November of 2007, ATI had redefined their strategy.  Knowing that the time and money needed to build a new architecture capable of beating Nvidia would only push them farther behind, ATI instead shrank R600 onto a smaller manufacturing process, producing the RV670 or “HD 3000 series”.  The die shrink made the cards cooler, less power-hungry, and cheaper to produce.  ATI’s new flagship GPU, replacing the HD 2900 XT, was the HD 3870.  Notice how the naming scheme changed between these two generations – instead of letter combinations like Pro, XT, and XTX to denote differences in performance, ATI now used a third digit like 3, 5, and 7 to show variation.  While the HD 3870 and its variants were not as powerful as their Nvidia counterparts, they exceeded the performance of R600 while being cheaper.  The HD 3000s became favorites among budget gamers and OEMs, where their performance-per-dollar exceeded Nvidia’s, helped by early support for DX10.1.  The efficiency of the RV670 core also allowed ATI to create the HD 3870 X2, the world’s first graphics card to break 1 TeraFLOP, by pairing two GPUs on a single board.  This strategy let ATI build smaller, lower-power cores and then link two together to compete with Nvidia on raw performance, and it is this niche approach that returned ATI to profitability and real success.  The R700 series, with 800 stream processors per chip, continued the strategy: the wildly popular HD 4870 was not as powerful as Nvidia’s offerings but was considered one of the best value cards of 2008/2009, and the HD 4870 X2 was the first graphics card to break the 2 TeraFLOP barrier.

In October of 2009, ATI took the initiative and released the first DirectX 11 GPUs: the HD 5000 series, codenamed “Evergreen”.  The HD 5870 has 1600 stream processors delivering over 2.5 TeraFLOPs, while the dual-GPU HD 5970 peaks at around 4.6 TeraFLOPs, making it the most powerful slab of silicon available to consumers.  DX11 brings many new features to GPUs, most importantly native support for General-Purpose GPU (GPGPU) processing through compute shaders.  Because modern GPUs are in practice massive vector processors with far more raw throughput than CPUs, GPGPU processing allows that power to be tapped by non-graphics programs.  Once fully leveraged by developers, every consumer will have a virtual supercomputer at their fingertips.  DX11 is not the only API to introduce this: Apple spearheaded the OpenCL standard, Nvidia has been pushing CUDA, and ATI was developing its own solution, Stream SDK, before adopting DX11 and OpenCL.  As of today, ATI remains the only DX11 graphics card vendor, well ahead of Nvidia’s solution, and is now the current market leader.
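What a “non-graphics program” on a GPU looks like is easier to see with a concrete workload.  The sketch below estimates pi by testing a million random points at once – each point’s test is independent of every other, which is precisely the shape of problem that CUDA, OpenCL, and DX11 compute shaders map across hundreds of shader cores.  Again NumPy stands in for the GPU, and the specific numbers are illustrative, not from the article:

```python
import numpy as np

# Scatter a million random points in the unit square; the fraction
# landing inside the quarter circle of radius 1 approximates pi/4.
rng = np.random.default_rng(seed=42)
points = rng.random((1_000_000, 2), dtype=np.float32)

# One data-parallel "kernel": every point is tested independently,
# so on a GPU each test could run on a separate shader core.
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * inside.mean()

print(f"pi is approximately {pi_estimate:.2f}")
```

Tasks with this embarrassingly parallel structure – physics, video encoding, scientific simulation – are exactly where GPGPU turns a consumer graphics card into the “virtual supercomputer” described above.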

Check back in the coming days for my post on Nvidia’s last few years.