Texture Mappers (1995-1999)
Fixed T&L, Early Shaders (2000-2002)
Shader Model 2.0/3.0 (2003-2007)
Unified Shaders (2008+)
Featureset determines where a card will go, not its year of introduction
Sapphire Atlantis Radeon 9700 128MB - 2002
ATI had never really been able to compete in the big boy races, it'd always been either 3DFX or Nvidia who'd been at the top of things. Once 3DFX screwed themselves, it was just Nvidia. ATI tried, they really did, but dismal hardware, horrible drivers and senseless design decisions (R100's 2x3 architecture was moronic and R200 sacrificed a lot of render quality) always meant they'd be second best in a two horse race.
There was another horse, however. ArtX was a company which was formed by ex-SGI engineers and hoped to get into the PC GPU business. A product was released, though it was somewhat poor and ended up in a forgotten ALI (Acer) Socket 7 chipset. ArtX then produced the underperforming "Flipper" GPU for Nintendo's underperforming "Gamecube" console, a 4x1 fixed function TCL chip which was R300's direct ancestor. ATI bought ArtX in 2000 and directed R300 development from there. In 1999 or so, ArtX had a very public spat with the website Ars Technica, where Ars proved that ArtX PR had lied, so ArtX invented fake witnesses and got them to post made-up stories on the Ars website. The resulting PR backlash was a minor embarrassment.
Voodoo2, Geforce (NV10), Geforce3 (NV20) are all names of products which stood beyond the competition with ease. It was with some surprise that 'Radeon 9700', the R300 GPU, was added to the list. The R300 was, put simply, a monster. It not only introduced Shader Model 2.0, it excelled at it. Normally being first to market is a poisoned chalice: The next product will have had time to refine the implementation and learn from the mistakes you made. So mighty was the R300 that a Radeon 9700 (or Pro), released July 2002, was still competitive in mid-2006, four whole years later! Such a long lifespan was, quite frankly, absolutely unheard of in an industry with six-month product cycles, and yet even by June 2007 a Radeon 9700 would still out-perform some low-end solutions, such as the GeForce 7300 or Radeon X1300. To still be comparable with in-production cards after five years just shows how ridiculously powerful the 9700 was when it was new.
By 2011, AMD's entry level was the Radeon HD 5450, which performed pretty much the same as the Radeon 9700 in games I had available to test. The 5450 was paired with a much faster CPU (a 2.6 GHz Athlon X2) than the R9700 was (2.2 GHz Athlon XP) and performed similarly in Aquamark3 and Oblivion.
Despite an enormous shader advantage, Cedar - almost a decade newer - was not able to beat back the R300 due to its bandwidth deficit and lack of convincing ROP throughput ("Pixels" in the table).
|Product||GPU||Clock (MHz)||Pixels/s (millions)||RAM B/W (MB/s)||GFLOPS
|Radeon 9700 128 MB||R300||275||2,200||17,280||32
|Radeon HD 5450 512 MB||Cedar||650||2,600||12,800||104
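The fill rate and bandwidth columns in the table follow from simple arithmetic. Here is a minimal sketch in Python, assuming the R300's 8 ROPs and 256 bit bus at 540 MHz effective (taken from the spec block for this card). The Cedar figures of 4 ROPs and a 64 bit bus at 1600 MHz effective are not stated on this page; they are assumptions chosen because they reproduce the table's numbers. The function names are made up for illustration.

```python
def pixel_fill_mpix(rops, core_mhz):
    """Peak pixel fill rate in millions of pixels per second."""
    return rops * core_mhz


def bandwidth_mb_s(bus_bits, effective_mhz):
    """Peak memory bandwidth in MB/s: bytes per transfer times transfers per second."""
    return bus_bits // 8 * effective_mhz


# Cedar's ROP count, bus width and memory rate are assumptions (see text above).
cards = {
    "Radeon 9700 (R300)": dict(rops=8, core_mhz=275, bus_bits=256, mem_mhz=540),
    "Radeon HD 5450 (Cedar)": dict(rops=4, core_mhz=650, bus_bits=64, mem_mhz=1600),
}

for name, c in cards.items():
    fill = pixel_fill_mpix(c["rops"], c["core_mhz"])
    bandwidth = bandwidth_mb_s(c["bus_bits"], c["mem_mhz"])
    print(f"{name}: {fill} Mpixels/s, {bandwidth} MB/s")
```

Under these assumptions Cedar's higher clock only just makes up for having half the ROPs, while the narrow bus leaves it well short on bandwidth.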
Most of the R300's massive performance came from its 256 bit wide memory bus, enabling almost 20GB/sec of raw memory bandwidth. This, together with its angle-dependent anisotropic filtering (AF), meant that using AF on an R300 carried little to no performance penalty in an era when performance hits of up to 70% were the norm for enabling AF. Most other video cards had four render output pipelines (ROPs) with two texture mappers each (4x2 configuration) so only reached their peak performance when multitexturing was in use. R300 carried eight (!) ROPs with one texture mapper each (8x1), so was running at peak performance all the time.
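The difference between a 4x2 and an 8x1 layout can be shown with a toy model. This is a sketch under a simplifying assumption that is mine, not the page's: each pipeline emits one pixel per clock as long as it has enough texture mappers for the textures being applied, and otherwise needs extra passes. The function name is hypothetical.

```python
import math


def pixels_per_clock(pipes, tmus_per_pipe, textures_per_pixel):
    """Pixels emitted per clock in a toy model of a fixed pipeline layout."""
    # A pipe needs one pass per group of textures its TMUs can apply at once.
    passes = max(1, math.ceil(textures_per_pixel / tmus_per_pipe))
    return pipes / passes


for layout, (pipes, tmus) in {"4x2": (4, 2), "8x1": (8, 1)}.items():
    for textures in (1, 2):
        pixels = pixels_per_clock(pipes, tmus, textures)
        texels = pixels * textures
        print(f"{layout}, {textures} texture(s): {pixels:g} pixels/clock, {texels:g} texels/clock")
```

In this model the 4x2 layout only reaches its peak of 8 texels per clock with two textures per pixel, while the 8x1 layout reaches 8 texels per clock either way and draws twice as many pixels when single-texturing.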
The GPU wasn't all that different from ArtX/Nintendo "Flipper", the ROPs were very similar and the pixel shader obviously an extension of the fixed-function TCL in "Flipper". It appears, however, that ArtX owned the rights to the design and so needed to pay no royalties to Nintendo.
ATI were rumoured to be having yield problems from the sheer size of the R300 GPU, 110 million transistors compared to the 61 million of the NV25 (GF4Ti), and from wanting a "300MHz+" clock when NV25 only barely scraped 300MHz. However, overclockers need not have worried. Most R300s were stable, depending on cooling, to 350MHz and even beyond. This particular card ran perfectly happily at 356MHz GPU clock (on stock cooling!) and 300MHz RAM clock; the RAM used in both the Pro and vanilla was not good for overclocking. At 356MHz the R300 was (to use a technical term) taking the piss.
The specs for this particular card are a 275MHz GPU clock, a 270MHz RAM clock, 128MB of DDR-SDRAM (the R300 supported GDDR2, but never used it) and the usual VGA/DVI/TV connectors. The 9700 Pro was about 15% more expensive and featured a 325MHz GPU clock and a 310MHz RAM clock, so for the performance it was actually better value than the plain 9700. It also turns out the PCB I have is a later version; most were sold with a slightly different PCB. Just left of the GPU ASIC, most cards had an unfilled double pin header. Mine has an unfitted IC, labelled "U12". It is not clear what function this IC would have, and I can find no examples of similar cards with it fitted.
So successful was R300 that R4xx and R5xx are all based on it and the Xenos GPU in the Xbox360 is, at heart, descended from the R300 (the ROPs are identical). A moment of respect for a former king, please.
Drivers: Drivers included with WindowsXP (Service Pack 2), Linux (as r300).
Core: R300 with 8 ROPs, 1 TMU per ROP, 275MHz (2200 million texels per second)
RAM: 256 bit DDR SDRAM, 540MHz, 17280MB/s
Shader: 8x PS2.0, 4x VS2.0
MADD GFLOPS: 29
I accidentally lost the high-res image for this, so have borrowed one of the same card from vccollection.ru. It is for illustration only and has no claim of title.
Gigabyte GV-N52128DE Geforce FX5200 128MB - 2002
This was about as slow as AGP video cards got and was a fairly standard FX 5200. Now the Geforce FX series weren't terribly great to begin with so one can imagine how bad the FX5200 was (about a tenth of the speed of a Radeon 9700, which was released about the same time). To be fair, it was very cheap and did beat integrated video...though not by much. While extremely slow it does still implement a full DirectX 9.0 featureset, so is compatible with Windows Vista's Aero Glass interface.
In a pure marketing move, most FX5200s were given 128MB of memory when both the memory and the GPU were extremely slow, warranting no more than 32 or 64MB. The memory on this card is bog standard (and incredibly cheap) 200MHz DDR, normally destined to go on PC3200 memory modules. An FX5200 loaded with the entire 128MB in use would be a slideshow measured in seconds per frame.
I'll let the specs speak for themselves.
Core: NV34 with 2 pipeline, 2 TMU, 200MHz (800 million texels per second)*
RAM: 64 bit DDR SDRAM, 400MHz, 3200MB/s
Shader: 1x PS2.0, 1x VS2.0
MADD GFLOPS: 5.5
The NV3x rasterisation pipeline was very weird. It could work on, in this case, four different pixels at once so long as they weren't being textured or part of a Z operation at the time. So the pipeline looks like "2 pipes, 2 TMU" or "4 pipes, 0 TMU" depending on how one looks at it. This meant the FX series could reach maximum fillrate quite easily and was technology NVidia acquired from 3DFX, to have been part of Spectre: NV3x could do stencil and colour operations "for free".
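The two ways of describing the NV34 pipeline can be put into numbers. This is a rough sketch in Python, assuming a simple either/or switch between the two modes (real hardware would mix them within a frame) and reading "2 pipes, 2 TMU" as two texture mappers per pipe, which matches the 800 million texels per second quoted at 200MHz. The function name `nv34_rates` is made up for illustration.

```python
def nv34_rates(core_mhz, texturing_or_z):
    """Return (million pixels/s, million texels/s) for the two NV34 modes."""
    if texturing_or_z:
        # "2 pipes, 2 TMU": two pixels per clock, four texels per clock.
        pixels_per_clock, texels_per_clock = 2, 4
    else:
        # "4 pipes, 0 TMU": four untextured colour/stencil pixels per clock.
        pixels_per_clock, texels_per_clock = 4, 0
    return core_mhz * pixels_per_clock, core_mhz * texels_per_clock


for mode in (True, False):
    pixels, texels = nv34_rates(200, mode)
    print(f"texturing_or_z={mode}: {pixels} Mpixels/s, {texels} Mtexels/s")
```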
Thanks to Filoni for providing the part
Inno3D Geforce FX 5200 128MB - 2002
They don't really get much worse than this. Let's start with the memory, 128 MB of Elixir N2DS25616B-5T. These are 256 Mbit capacity, 16 bit wide DDR SDRAMs. They'd commonly be found on PC3200 or PC2700 DIMMs, since they (the "5T" model) were rated for CL 2.5 at 166 MHz and CL 3 at 200 MHz. The Gigabyte card above uses a different vendor, but the specs are identical.
DIMMs are 64 bits wide, so four of these would be needed for a 128 MB memory module, as cheap as it got back with DDR.
Oddly enough, while the dominant FX 5200 was the 128 bit model, both samples here are 64 bits wide. With 128 MB it too uses four RAM chips. The PCB, as you can see, has provision for two more (and two more on the rear), but these would be 128 Mbit DRAMs and would give the FX 5200 a 128 bit bus and 128 MB.
The GPU is an NV34, seen in every FX 5200 and the hilarious FX 5100. In this 64 bit model (even the FX 5200 was standard with 128 bit memory) the GPU is clocked to only 250 MHz. With a 2x2 pipeline and 166 MHz memory, it had just 2.7 GB/s of memory bandwidth, less than the TNT2 Ultra had several years before.
It was cheap, slow and... well, just cheap and slow.
Update: This actually went into use, providing video for a 3.0 GHz Pentium4-HT. In Aquamark3, it scored 7,300 (only 790 for graphics!), but then was overclocked to 298 MHz GPU and 432 MHz memory (without even trying, as it was still running passive). This improved its AM3 score immensely, to GPU 1,000 and total score 9,100, as the memory was overclocked to 130% of its standard value and all FX5200s, except the Ultra, were enormously memory limited. As shown, the card would not overclock any more than 298 MHz on the GPU. Even 300 MHz was just not quite there.
Some of the detail above was also modified, as it turns out the standard 64 bit RAM FX5200 runs DDR400 memory, but this clocks to only DDR333. The standard GPU is also 240 MHz, this one runs 250 MHz. It does, however, use memory capable of DDR400... and, via overclocking, even higher. I'm told that with a few minor hardware mods (e.g. adding a fan!), the NV34 GPU was quite happy going to 400 - 500 MHz.
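The overclocking figures above are easier to compare as percentages. This is a small sketch using only numbers quoted in this entry (DDR333 stock memory, 432 MHz overclocked, 250 MHz stock GPU, 298 MHz overclocked, and the Aquamark3 scores); the helper names are hypothetical.

```python
def bandwidth_mb_s(bus_bits, effective_mhz):
    """Peak memory bandwidth in MB/s for a given bus width and effective clock."""
    return bus_bits // 8 * effective_mhz


def percent_of_stock(stock, overclocked):
    """Overclocked value expressed as a percentage of the stock value."""
    return overclocked / stock * 100


print(f"Memory clock: {percent_of_stock(333, 432):.0f}% of stock")
print(f"GPU clock:    {percent_of_stock(250, 298):.0f}% of stock")
print(f"AM3 graphics: {percent_of_stock(790, 1000):.0f}% of stock")
print(f"AM3 total:    {percent_of_stock(7300, 9100):.0f}% of stock")
print(f"Bandwidth:    {bandwidth_mb_s(64, 333)} -> {bandwidth_mb_s(64, 432)} MB/s")
```

The graphics score scales more closely with the memory clock than with the GPU clock, which fits the claim that the card was memory limited.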
Core: NV34 with 2 pipeline, 2 TMU, 250MHz (1,000 million texels per second)
RAM: 64 bit DDR SDRAM, 333MHz, 2700MB/s
Shader: 1x PS2.0, 1x VS2.0
MADD GFLOPS: 6.5
MSI FX5600XT-VTDR128 (MS8912) - 2003
The Geforce FX wasn't terribly good and we'll leave it like that. Part of this was that it had ATI's exceptional R300 to deal with: It was poor by comparison, but part of it was simply that the Geforce FX was underperforming on its own terms. The 5600 was refreshed once from the NV31 it appeared with to a "flip chip" NV36 and a corresponding bump in clock to 400 MHz.
It was priced alongside the Radeon 9600 family, all of which beat the FX 5600 very soundly in everything except Doom3.
At the time, two very big games were released, Doom3 and Half-Life 2. HL2 was optimised and coded for the Radeon 9700 (R300), Doom3 was a heavy OpenGL stencil and Z-buffer user, which Geforce FX was better at.
Valve Software famously refused to use Geforce FX's 64 bit pixel shader modes and coded everything in 96 bit for the R300, which the Geforce FX series had to work in 128 bit for. A fan-made patch for Half-Life 2 rewrote many of the shaders to "hint" down to 64 bit when it wouldn't be noticed, hence gaining large amounts of performance on the Geforce FX without any video quality loss. With that patch, the FX 5600 was only slightly behind the Radeon 9600.
But who pays the same money for less product?
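The "64 bit", "96 bit" and "128 bit" shader modes are per-pixel widths: four components (red, green, blue, alpha) at 16, 24 or 32 bits each. This tiny sketch makes the arithmetic explicit; the names are made up for illustration.

```python
COMPONENTS = 4  # red, green, blue, alpha


def bits_per_pixel(bits_per_component):
    """Width of one four-component shader value."""
    return COMPONENTS * bits_per_component


for name, bits in (("16 bit per component", 16),
                   ("24 bit per component", 24),
                   ("32 bit per component", 32)):
    print(f"{name}: {bits_per_pixel(bits)} bit per pixel")
```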
Core: NV31 with 4 ROPs, 1 TMU per ROP, 325 MHz (1,300 million texels per second)
RAM: 128 bit DDR SDRAM, 550MHz, 8,800MB/s
Shader: 2x PS2.0, 1x VS2.0
MADD GFLOPS: 8.5
Thanks to Ars AV forum member SubPar
ATI Radeon 9650 256 MB (Mac) - 2004
This rather bulky thing was common in PowerMac G5 machines way, way back. Apple loved to confuse people and hide the actual hardware which was on sale: The 9650 was significantly slower and more cheaply made than the 9600XT which was also available on G5 machines! You'd be spending six hundred US dollars for one of these.
It clocked in at 400 MHz on the GPU, which controlled 270 MHz DDR memory at 128 bits wide. The GPU was fitted with a single R350-based quad (four pixel pipelines), two vertex shaders and four pixel shaders. It was capable of a shader throughput of about 12 MADD GFLOPS.
The Radeon 9600 PRO ($120) was more or less the same, but with 300 MHz DDR memory, so it was in fact faster.
The RV351 GPU was simply the RV350 with a Mac BIOS on its external BIOS chip - in terms of features and performance, it was identical to the RV350 in the Radeon 9600. For the Mac edition, a larger heatsink than usual was fitted (cooling in a G5 was horrendous) and it was given dual-DVI, while the S-video output was stripped off: Macs were intended to use highly profitable Apple displays or nothing. One of the DVI outputs was dual-link, able to run a 30", 2560x1600 Apple Cinema display, but the other was single link and limited to 1920x1080 or so.
Adding insult to injury, ATI went and sold the EXACT SAME CARD as the "Radeon 9600 Pro Mac and PC edition", with dual-firmwares but without the actual Radeon 9600 Pro's 600 MHz memory clock. ATI released it at $199 USD, it was usually available around $170 or so.
The previous PowerMac G4s commonly had fearsome Radeon 9700 GPUs, which were far more powerful than these, while the high end G5s had Radeon X800XTs, but Apple being Apple, the Mac edition of the X800XT ran its GPU at 473 MHz, the real version was rated at 500 MHz. It also cost - I shit you not - four and half times more.
Mac video "solutions" were full of plain ol' PC video cards with slightly different BIOS firmware and down-rated. It was quite common for OS X suckers to go out, buy the PC version, and flash a Mac BIOS on it.
For example, this Radeon 9650, bought from ATI by Apple and resold to you, would have been $599.95. The Radeon 9600 PRO (slightly faster) retailed around $120, the Radeon 9600 Pro Mac and PC edition - identical to this card, but officially and genuinely from ATI and guaranteed compatible with the PowerMac G5 - was $199 at launch. So you could give Apple $400 for... well... free money for Apple is always good, right?
What about those wanting REAL POWER? (Caps!) ATI would sell you the workstation-class FireGL X1-256, a Radeon 9700 Pro with 256 MB on board and fiendishly powerful for $540. By 2004, the FireGL X3 was available, at a mite over $1,000, and the fastest workstation class video card in the world.
Core: RV351 with 4 ROPs, 1 TMU per ROP, 400 MHz (1,600 million texels per second)
RAM: 128 bit DDR SDRAM, 540 MHz, 8600 MB/s
Shader: 4x PS2.0, 2x VS2.0
MADD GFLOPS: 12
Thanks to Ben
Gainward GeForce 6600 128MB AGP - 2004
With an 8x1 pipeline clocked at 300MHz and 128 bit, 250MHz DDR, this had a pixel fill rate of 2400 megapixels per second and a memory bandwidth of 8000MB/sec. Looking closely at the core in the big image, it can be seen to be labelled "GF6800-A4", which identifies it as the A4 revision of the NV40, most likely intended for a GF6800. Maybe it was a little faulty or maybe Nvidia just made too many of them, who knows. In whichever case, two of the 4x1 blocks were disabled to cut it down from 16x1 to 8x1 and it was plonked behind an AGP bridge chip (the little one at the bottom) to bridge the NV40's native PCIe to AGP.
This bridged arrangement was collectively known as NV43 and used in the AGP versions of GeForce 6200, 6600 LE, 6600 and 6600 GT. It wasn't a great performer, the Radeon 9700 earlier on this page would quite easily beat it, but it did perform well with Shader Model 3.0 (which the 9700 didn't support) and was a good way of gaining SM3.0 support on an entry level to mainstream performance card. However, games using SM3.0 were just too intensive for the meagre 8GB/sec memory bandwidth of the 6600.
The fate of this card, after being kindly donated by Al Wall, was to have its AGP to PCIe bridge chip damaged during a heatsink replacement. It ended up at the bottom of a storage box until 2016, when it was finally discarded.
Thanks to Al for his kind donation of the hardware
Core: NV40 with 8 ROPs, 1 TMU per ROP, 300MHz (2400 million texels per second)
RAM: 128 bit DDR SDRAM, 500MHz, 8000MB/s
Shader: 8x PS3.0, 3x VS3.0
MADD GFLOPS: 28
Club 3D GeForce 6600 256MB AGP - 2004
In general, this is nearly identical to the Gainward above, but the GPU is labelled "GF6600" and it has 256 MB onboard: Very likely, it is a later model from 2005 or so. When testing this, and seeing the "256 MB" on the pre-POST VGA BIOS splash, I had high hopes for the memory on this.
Nvidia cards have always been a diverse lot, with plenty of completely non-standard cards from the different vendors, so the hope was that this'd have faster memory than normal. Instead it had slower memory! Club 3D had used standard DRAMs intended for PC3200 DIMMs. Cheap as all hell, but left the GPU starved.
In the same system (a 3 GHz P4 Northwood with HT), the Radeon 9700 above could score 29,000 in Aquamark3, but this thing only managed 23,000. For what it's worth, the same Radeon 9700 coupled to an Athlon XP 3200+ was pushing well over 40,000. As an AthlonXP 3200+ and a Pentium4 3.0 are very similar, I suspected something was afoot. True enough, Throttle Watch told me that the P4 was throttling anything between 15% and 20% while running the tests. One heatsink-clean (eww) later, we had 25,200 out of the Geforce 6600 and 39,000 from the Radeon 9700. For comparison, the under-performing Radeon HD4250 IGP built into a cheap single-core AMD laptop (2.4 GHz) got 32,400.
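The before and after scores say something about where each card's bottleneck was. This is a quick sketch of the percentage gains, using only the Aquamark3 numbers quoted above; the helper name is hypothetical.

```python
def percent_gain(before, after):
    """Improvement from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


scores = {
    "GeForce 6600": (23000, 25200),
    "Radeon 9700": (29000, 39000),
}

for card, (throttled, cleaned) in scores.items():
    print(f"{card}: {percent_gain(throttled, cleaned):.1f}% faster once the CPU stopped throttling")
```

One possible reading is that the Radeon 9700 had been held back mostly by the throttling CPU, while the GeForce 6600 gained far less because its own memory was the limit.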
The GPUs themselves are of quite similar spec, in fact this GF6600 has a small advantage in terms of clock rate and shader model, as well as twice as much memory... But memory speed means much more than size. The R9700 could push over 17 GB/s, this thing just 6.4 GB/s.
Core: NV40 with 8 ROPs, 1 TMU per ROP, 300MHz (2400 million texels per second)
RAM: 128 bit DDR SDRAM, 400MHz, 6400MB/s
Shader: 8x PS3.0, 3x VS3.0
MADD GFLOPS: 28
EVGA (?) GeForce 6800 GT 256MB PCI-E - 2004
Very roughly the same performance as the Radeon X800XT below, but also sold quite a bit worse than expected. Why? There were thirteen different models! In order of performance, they were 6800XT, 6800LE, 6800, 6800 GTO, 6800GS, 6800GT, 6800 Ultra, 6800 Ultra Extreme. Additionally, the 6800 GT, Ultra and Ultra Extreme had both AGP and PCI-E versions.
The lowest end, the 6800XT, was actually slower than the fastest 6600, the GT, so nobody knew what the hell they were buying, sales were poor, so Nvidia introduced even more models.
Nvidia introduced Shader Model 3.0 with the Geforce 6 series, which truthfully was just a minor extension to SM2.0 to relax instruction limits and introduce better flow control. The basic NV40 GPU was a 16 ROP, 16 pixel shader, 6 vertex shader device which was unfortunately dogged by enormous (for the time) power consumption and Nvidia had made such a complex GPU that it couldn't actually be manufactured very well (NV would repeat this with Fermi). The full NV40 GPU was only enabled in the 6800 GT, Ultra and Ultra Extreme and even then it was clocked quite low, no Geforce 6800 ever went faster than 450 MHz (the hilariously expensive Ultra Extreme) and most clustered between 350 and 400, while ATI's R420 was topping 500 MHz with ease.
NV40 was over-ambitious and this made it expensive and power hungry. Additionally NV had still not caught up to ATI's phenomenally fast R300-based shaders, so the very fastest Geforce 6 still had about three quarters of the shader power of its Radeon competitor. NV40's derivation, G70 used in the Geforce 7 series, corrected these problems.
Thanks to Kami
Core: NV40 with 16 ROPs, 1 TMU per ROP, 350MHz (5,600 million texels per second)
RAM: 256 bit GDDR3 SDRAM, 1000MHz, 32,000MB/s
Shader: 16x PS3.0, 6x VS3.0
MADD GFLOPS: 66
Radeon X800XT 256MB VIVO AGP - 2004
The R420 GPU was a stop-gap solution never really intended for release. The actual R420 was to become the R6xx series (Radeon HD2xxx) and was plagued by delays and problems from day one, to the point that even when it was finally released it was barely working at all.
R420 as released was R390, basically two R360s in tandem on the same die and a sort of contingency plan. When NV40 was much, much faster than ATI had expected, R390 had to be expedited to release, yesterday. It took the R420 codename and was quickly adapted to a 130nm TSMC process. However, its age showed.
At heart it is a direct descendant of R300 (see above) and it shares many of the same shader limitations. It did relax some limits, but it wasn't full Shader Model 3.0, it was "2.0b". Shader Model 3.0 was really hyped up by Nvidia but, to developers, it was more a "2.1" version which didn't really add all that much. In general anything possible on SM3.0 was also possible on SM2.0b of the Radeon X series.
The GPU design also showed its R300 heritage: Vertex shaders were decoupled from the core (so every R4xx product had six vertex shaders) but pixel shaders were not. Each quad (four ROPs) had four pixel shaders and so this 16x1 ROP design had 16 pixel shaders. This was the exact same design first introduced in the Radeon 9700, which had four 'global' vertex shaders and four pixel shaders per quad.
The X800XT and Geforce 6800 Ultra traded places at the top of benchmarks throughout their production life, neither one having a decisive advantage. NV40 was a more efficient, more elegant part, but it was also limited to a top speed of 400 MHz and had a sizeable shader performance disadvantage.
In terms of raw pixel shader power, R420 was a little less efficient than NV40 (because NV40 could use SM3.0 shaders) but clock for clock about 30% more powerful. When running SM2.0 shaders, NV40's efficiency savings evaporated and the raw performance advantage of R420 (which was also clocked higher) came into play.
Core: R420 with 16 ROPs, 1 TMU per ROP, 500MHz (8000 million texels per second)
RAM: 256 bit GDDR3 SDRAM, 1000MHz, 32000MB/s
Shader: 16x PS2.0b, 6x VS2.0
MADD GFLOPS: 94
eVGA Geforce 7900 GTO 512MB - 2006
The 7900 GTO was something of a surprise when it was released in June 2006: it had not been announced, no PR was sent and suddenly vendors had the outrageously cheap 7900 GTO in stock. It is, for all intents and purposes, an underclocked 7900 GTX made for OEM customers. It had the 7900GT's memory bandwidth and the 7900GTX's GPU speed. The GTX has 800MHz GDDR3 memory, the GTO has merely 660MHz GDDR3 memory, giving the GTO 42.2GB/s memory bandwidth to its 512MB of onboard RAM. Both have a 650MHz GPU, 16 ROPs, 24 texture units, 24 pixel shaders, 8 vertex shaders and Shader Model 3.0 support.
Apparently, some Geforce 7900 GTX cards had faulty RAM so rather than absorb the losses, Nvidia recalled them, flashed the BIOS to underclock the RAM, and patched the hardware ID from "0290" (7900 GTX) which was in the hardware to the "0291" that the much slower Geforce 7900 GT used. Under a sticker on the back of this eVGA card is the designation "Geforce 7900 GTX 512MB" but the sticker says "Geforce 7900 GTO 512MB". The RAM problem remained, however. The GTO used the very same RAM chips as the GTX, which clocked them at 1600 MHz, and most GTOs would overclock that far, but this one will not pass 1460 MHz memory.
The G70/G71 GPU is the 'full version' of the GPU in the Playstation3 and a development of NV40, the Geforce6 series base GPU. The cooler on this card has been replaced by a Zalman CNPS "Flower" which, rather annoyingly, doesn't have the correct fan fitting for the video card so can't be automatically controlled by the video card.
The card was a monster to put it simply. It performed within a hair of the much more expensive 7900 GTX and beat the 7950 GT by that same hair. This card eventually died in August 2011 when the third party heatsink's fan connector failed, leaving the GPU without cooling for several days, by which time it had suffered fatal damage.
Update: In April 2012, it was resurrected. Turns out a capacitor (the right-most on the image) had been knocked off the PCB. Replacing it with a large electrolytic from the box o' spares means it works normally. Some boots don't detect it, but this may well be an artefact of the rather cruelly bodged up system it's in (including the Opteron 165 from the CPU section!).
Update 2!: In 2015 it was donated to Ben (the same guy behind the PowerMac G5's GPU) after sitting on a shelf for a while, and turned out to not work anymore. It ended up in Ben's bin.
Core: G71 with 16 ROPs, 24 TMUs, 650MHz (15.6 billion texels per second, 10.4 billion pixels per second)
RAM: 256 bit GDDR3 SDRAM, 1320MHz, 42240MB/s
Shader: 24x PS3.0, 8x VS3.0
MADD GFLOPS: 301
The 300+ GFLOPS of the G71 was originally thought to be a typo, as G70 could only peak at perhaps 180 GFLOPS. G71 optimises commonly used loop and branch functions to stop them stalling the execution units, so it doesn't actually have more hardware available, but it can use what it has much better than G70. In shader heavy benchmarks, G71 will wipe the floor with G70 even at the same clock.
Just what is a pixel shader anyway?
Nvidia and ATI had been, for a long time, adding slightly programmable portions to their GPUs. NV15 (Geforce 2) had a bank of register combiners, which is part of an ALU, the part of a CPU that "does the work". It used this for its triangle setup and T&L engine, drivers were able to customise how it worked for different games and so gain maximum performance.
These devices were called "shaders" in engineering parlance, since their most common use was to solve lighting equations per-pixel. Their origin can be traced as far back as the Voodoo3's guardband culler, a piece of hardware which assisted in rotating triangles in 3D space to be correctly oriented with respect to the view frustum.
The origin in the lighting and transformation engine gave us two types of shader. One was dedicated to operating on RGB data, textures, pixels, colours; This was the pixel shader and came from the device used to solve lighting equations.
The other was higher precision but slower and more complex, it was used to alter vertex positions, it came from the transformation and setup part of the engine.
Microsoft saw what was going on (as they control DirectX, they have a very big say in what GPUs can and cannot do) and decided that a standardised model for what a shader should be able to do was needed. DirectX 8.0 included shader model 1.0 which was based around the capabilities of the shaders in the Geforce 3. An earlier version, shader model 0.5, was based around what a Geforce 2 could do but was deemed too inflexible to expose to programmers in the DirectX API. It can be addressed with some NV specific OpenGL extensions, but few used them.
This control of the shader models by Microsoft had quite an unintended effect on the industry. Shader model 1.0 was extended to 1.1 (Geforce 3), 1.3 (Geforce 4) and 1.4 (Radeon 8500) but it was a limited hack based around a simple transformation and lighting unit.
Shader model 2.0 was the birthplace of the modern GPU and its harbinger was the ATI R300 (see below in the Radeon 9700). Nvidia also had a very flexible shading part, NV30 (Geforce FX 5800) which was in many ways far more powerful than the R300 (while slower in raw crunch rate, it could do more in fewer instructions and had fewer restrictions). The two were, however, solving the same problem in different ways. Microsoft did not want to split the shader model in half, DirectX was only ever going to support one shader model and it was ATI's. SM2.0 was based around what the R300 could do so NV30 was left having to emulate capabilities it didn't have.
R300 was fixed at 24 bit precision in SM2.0, but NV30 had 12, 16 and 32 bit modes - no 24 bit. NV30 had to emulate 24 bit using its 32 bit mode which was just less than half as fast as the 16 bit mode and often gave no advantage.
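The practical difference between the 16, 24 and 32 bit floating point modes is how much rounding error each one introduces. The sketch below uses Python's standard `struct` module to round a value through each format. FP16 and FP32 are real IEEE formats; the 24 bit function is only an approximation I am assuming for illustration, made by truncating a 32 bit float's mantissa to 16 bits and ignoring the narrower exponent range of a true 24 bit format. NV30's 12 bit mode is not modelled.

```python
import struct


def to_fp32(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp24_approx(x):
    """Approximate a 24 bit float: keep 16 of a single's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFFF80  # clear the low 7 mantissa bits (truncation, not rounding)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


value = 1.0 / 3.0
for name, convert in (("FP16", to_fp16), ("FP24 (approx)", to_fp24_approx), ("FP32", to_fp32)):
    print(f"{name}: error {abs(convert(value) - value):.3e}")
```

Each step up in width shrinks the error by a couple of orders of magnitude, which is why running everything at 32 bits "often gave no advantage" visually while costing a lot of speed.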
Shader Model 3.0 was designed around what NV40 could do, putting ATI at a considerable disadvantage, their extended SM2.0 (SM2_b, seen in R360 and above, which is R9600, R9800 and the Radeon Xx00 family, such as the Radeon X800 XT I used to have) was nowhere near as powerful as Nvidia's SM3.0.
Finally, Shader Model 4.0 with DirectX 10 unified the shaders, which had become as complex as simple CPUs anyway, doing away with vertex shaders and pixel shaders to provide a single unified shader. This was good, very good. Most games used an 80:20 split between pixel and vertex shaders, but occasionally they'd reverse it, which caused momentary losses of framerate. This meant that hardware was made for the most common case: The G71 in the Geforce 7900 series (see below) had 24 pixel shaders and 8 vertex shaders with a pixel shader performance of about 250 GFLOPS (billion floating point operations per second), but it had significantly poorer vertex performance. Everything did, because vertex operations are so much more complex than simple RGB pixel operations, it had fewer vertex units, and games only spent 20% of their shading time doing vertices.
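Why a reversed workload hurts a split design can be shown with a toy model. This is a sketch under a big assumption that is mine, not the page's: every shader unit does one unit of work per unit of time, whether pixel or vertex, which real hardware did not do. The unit counts are the G71's 24 pixel and 8 vertex shaders; the function names and workload numbers are hypothetical.

```python
def frame_time_split(pixel_work, vertex_work, pixel_units, vertex_units):
    """Dedicated units: the frame waits for whichever pool finishes last."""
    return max(pixel_work / pixel_units, vertex_work / vertex_units)


def frame_time_unified(pixel_work, vertex_work, total_units):
    """Unified units: any unit can take any kind of work."""
    return (pixel_work + vertex_work) / total_units


for pixel_work, vertex_work in ((80, 20), (20, 80)):
    split = frame_time_split(pixel_work, vertex_work, pixel_units=24, vertex_units=8)
    unified = frame_time_unified(pixel_work, vertex_work, total_units=32)
    print(f"pixel:vertex {pixel_work}:{vertex_work} -> split {split:.2f}, unified {unified:.2f}")
```

In this model the usual 80:20 split costs the dedicated design very little, but the reversed 20:80 case leaves most of the pixel units idle and takes about three times as long as the unified design.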