As an Amazon Associate I earn money from qualifying purchases.

Sunday, September 7, 2025

Nvidia GeForce RTX 50-Series Graphics Card Performance Hierarchy

The Nvidia Blackwell architecture mostly rehashes Ada, using the same process node. Only the RTX 5090 stands out as a major(-ly expensive) upgrade.


Welcome to the modern era of Nvidia graphics cards, courtesy of the Blackwell architecture. Except, if we're being honest here — unlike Nvidia — not a whole helluvalot has changed architecturally with Blackwell relative to Ada Lovelace. We can sum up the major upgrades quite quickly.

First, Blackwell has native FP4 support on the tensor cores, which so far has only been used in a handful of applications, like the Flux AI image generation test built into a special version of UL's Procyon benchmark. Blackwell also offers native FP6 support, which is sort of a hybrid between FP4 and FP8 that can potentially reduce memory requirements, but our understanding is that it's not really any faster than just using native FP8 operations.

Blackwell does offer some new features for ray tracing applications, but as with all other ray tracing tools, developer uptake can often be quite slow, particularly when it comes to new games actually using the features. An enhanced triangle cluster intersection engine allows Mega Geometry (a new buzzword from Nvidia!) to better render massively complex scenes without bogging down. It feels a bit like the old over-tessellation approach, where Nvidia got some games (like Crysis 2 or 3, IIRC) to utilize massive amounts of tessellation... on flat surfaces! Why? Because AMD's GPUs struggled mightily with the workload, so it made Nvidia's GPUs look better.

Linear Swept Spheres is another new RT tool to better handle things like hair. HairWorks, anyone? We don't doubt there's some potential use here, but it just feels like Nvidia is reaching to find things to make the new hardware "better" than the old hardware.

The only other major changes with Blackwell are the addition of an AI Management Processor, to help the GPU better schedule and utilize its various resources, and some tweaks to the display output engines, including hardware flip metering, which is basically only used to help multi-frame generation. As for MFG, I'm 100% certain most of it could run on Ada Lovelace GPUs with hardly any effort... but then how could Jensen make the claim that "an RTX 5070 is as fast as an RTX 4090?" LOL

Okay, let's throw Nvidia a bone. Blackwell also heralds the first generation of GPUs to support GDDR7 memory. The boost in bandwidth is awesome. What would have been really awesome is if Nvidia, as the sole current user of GDDR7 memory, had told the DRAM companies to just skip the 2GB (16Gb) chips and only manufacture 3GB (24Gb) chips. Those chips exist, they're used in the RTX 6000 Pro Blackwell Edition as well as the RTX 5090 Laptop GPU, but the current supply isn't quite sufficient and so all the consumer desktop GPUs get saddled with 2GB chips.

That's perhaps excusable for the 5090, as 32GB is certainly sufficient for any non-AI workloads we're likely to run in the next four years... but it totally kills the potential of lower tier GPUs like the RTX 5050, 5060, 5060 Ti 8GB, and even the 5070. Having 50% more VRAM capacity on every one of those GPUs would have been incredibly awesome, so no wonder Nvidia held back. (Cue the mid-cycle Super refresh next year.)
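The capacity math here is straightforward, since each GDDR7 chip sits on a 32-bit channel. A minimal sketch (ignoring clamshell designs, which put two chips per channel and double the total):

```python
def vram_gb(bus_width_bits: int, chip_density_gb: int) -> int:
    """VRAM capacity from bus width and per-chip density.

    Each GDDR7 chip uses a 32-bit channel, so chip count is
    bus_width / 32 (clamshell boards double this; not modeled).
    """
    chips = bus_width_bits // 32
    return chips * chip_density_gb

# RTX 5070's 192-bit bus means six memory chips:
print(vram_gb(192, 2))  # 12 GB with today's 2GB (16Gb) chips
print(vram_gb(192, 3))  # 18 GB if 3GB (24Gb) chips were used
print(vram_gb(512, 2))  # 32 GB on the 5090's 512-bit bus
```

That's where the "50% more VRAM" figure comes from: swapping 2GB chips for 3GB chips on the same bus.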

DisplayPort 2.1 with full 80 Gbps bandwidth support is also present, incidentally. It's really not as important as you might think, because all the monitors that support stuff like 4K at 240Hz also support DSC (Display Stream Compression), which means even the old RTX 20-series can run such displays at 4K 240Hz. I've done it, and frankly I couldn't tell the difference between DSC operation on the older GPUs (which are limited to 120 Hz at 4K without DSC) and non-DSC operation on the newer GPUs.
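Some back-of-the-napkin math shows why DSC comes into play. This sketch ignores blanking overhead, so real requirements run a bit higher, but the link coding factors (8b/10b for DP1.4a, 128b/132b for DP2.1) are exact:

```python
# Effective payload rates after link coding overhead.
DP14_PAYLOAD_GBPS = 32.4 * 8 / 10     # HBR3 x4 lanes: ~25.9 Gbps
DP21_PAYLOAD_GBPS = 80.0 * 128 / 132  # UHBR20 x4 lanes: ~77.6 Gbps

def video_gbps(width, height, hz, bpp=30):
    """Uncompressed pixel data rate in Gbps (10-bit RGB = 30 bpp),
    ignoring blanking intervals."""
    return width * height * hz * bpp / 1e9

uhd240 = video_gbps(3840, 2160, 240)   # ~59.7 Gbps
print(uhd240 > DP14_PAYLOAD_GBPS)  # True: DP1.4a needs DSC (~3:1)
print(uhd240 < DP21_PAYLOAD_GBPS)  # True: DP2.1 fits it uncompressed
```

So DP2.1 can drive 4K 240Hz without compression, but DSC makes the same monitor work on a DP1.4a card, which is why the extra bandwidth matters less in practice.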

Otherwise, Blackwell and the RTX 50-series end up mostly being a moderate bump in performance at every level when compared to the RTX 40-series GPUs. Every level except at the top, that is, because the RTX 5090 is a legitimate beast of a GPU. Nvidia went from a 608 mm^2 die on the RTX 4090 with 128 streaming multiprocessors (SMs) enabled and 1 TB/s of bandwidth, to a 750 mm^2 die with 170 SMs and a whopping 1.8 TB/s of bandwidth. It's the first full 512-bit memory interface from Nvidia since the GeForce GTX 280 back in 2008 — and only the third time Nvidia has done a 512-bit single GPU interface ever. The net result is that, on paper, you get 77% more memory bandwidth than the outgoing 4090, with 27% more compute.
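The bandwidth figures are easy to verify: bus width times per-pin data rate, divided by eight bits per byte. Using the 4090's public specs (384-bit GDDR6X at 21Gbps) for the comparison:

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Memory bandwidth in GB/s: bus width x per-pin rate / 8 bits."""
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gbs(512, 28))  # 1792.0 GB/s (RTX 5090, GDDR7 @ 28Gbps)
print(bandwidth_gbs(384, 21))  # 1008.0 GB/s (RTX 4090, GDDR6X @ 21Gbps)
print(bandwidth_gbs(512, 28) / bandwidth_gbs(384, 21))  # ~1.78, the quoted ~77% gain
```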

Everything else in the RTX 50-series lineup ends up being more incremental over the prior generation. The 5080 has the same 256-bit interface and 16GB of VRAM as the 4080, with only a few more SMs enabled. The 5070 Ti looks good against the 4070 Ti, less so against the newer 4070 Ti Super as that has the same 16GB, so again it's just a handful more SMs. The 5070 lands between the 4070 and 4070 Super in SM counts, with the same 12GB of VRAM on a 192-bit interface. And everything below the 5070 has a 128-bit interface with 8GB of VRAM, except for the double-the-memory 5060 Ti 16GB.

Here's the sortable table of performance, power, and efficiency alongside the specs. Prices are the lowest we could find at a reputable retail outlet (e.g. Amazon, Newegg, B&H Photo, and similar), rather than using eBay prices since these newest GPUs should all be readily available online without resorting to eBay's garbage. Prices are mostly similar to the outgoing 40-series counterparts, so if you don't have an RTX 40-series card, these are the best Nvidia GPUs to pick up right now.

Nvidia GeForce RTX 50-Series Blackwell GPU Performance

| Graphics Card | Price (MSRP) | Overall Performance | Value (FPS/$) | 4K Ultra | 1440p Ultra | 1080p Ultra | 1080p Medium | Power (Watts) | Efficiency (FPS/W) | Specifications |
|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 5090 | $2,280 ($2,000) | 146.3 | 0.064 | 95.3 | 141.2 | 163.9 | 207.9 | 416.3 | 0.352 | GB202, 21760 shaders, 2407MHz, 32GB GDDR7@28Gbps, 1792GB/s, 575W |
| GeForce RTX 5080 | $1,000 ($1,000) | 107.6 | 0.108 | 59.8 | 99.5 | 127.9 | 175.9 | 264.7 | 0.406 | GB203, 10752 shaders, 2617MHz, 16GB GDDR7@30Gbps, 960GB/s, 360W |
| GeForce RTX 5070 Ti | $750 ($750) | 97.2 | 0.130 | 52.1 | 88.7 | 116.8 | 165.2 | 257.1 | 0.378 | GB203, 8960 shaders, 2452MHz, 16GB GDDR7@28Gbps, 896GB/s, 300W |
| GeForce RTX 5070 | $524 ($550) | 79.0 | 0.151 | 40.1 | 70.4 | 97.6 | 141.5 | 215.7 | 0.366 | GB205, 6144 shaders, 2512MHz, 12GB GDDR7@28Gbps, 672GB/s, 250W |
| GeForce RTX 5060 Ti 16GB | $430 ($430) | 60.5 | 0.141 | 29.4 | 53.7 | 75.8 | 112.0 | 150.2 | 0.403 | GB206, 4608 shaders, 2572MHz, 16GB GDDR7@28Gbps, 448GB/s, 180W |
| GeForce RTX 5060 Ti 8GB | $335 ($380) | 45.6 | 0.136 | 14.0 | 42.8 | 65.4 | 109.8 | 134.6 | 0.339 | GB206, 4608 shaders, 2572MHz, 8GB GDDR7@28Gbps, 448GB/s, 180W |
| GeForce RTX 5060 | $296 ($300) | 40.1 | 0.136 | 12.4 | 37.8 | 57.5 | 95.9 | 128.4 | 0.312 | GB206, 3840 shaders, 2497MHz, 8GB GDDR7@28Gbps, 448GB/s, 160W |
GPU Testbed
AMD Ryzen 7 9800X3D CPU
Asus ROG Crosshair 870E Hero
G.Skill 2x16GB DDR5
Crucial T705 4TB SSD
Corsair HX1500i PSU
Cooler Master 280mm AIO
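For clarity on the derived columns: Value and Efficiency are simply the overall FPS divided by the current price and the measured power draw, respectively. A quick sketch using the RTX 5080 row:

```python
def value_fps_per_dollar(overall_fps: float, price: float) -> float:
    """Value metric: composite FPS per dollar at the current price."""
    return round(overall_fps / price, 3)

def efficiency_fps_per_watt(overall_fps: float, watts: float) -> float:
    """Efficiency metric: composite FPS per measured watt."""
    return round(overall_fps / watts, 3)

# RTX 5080 row: 107.6 FPS overall, $1,000, 264.7W measured
print(value_fps_per_dollar(107.6, 1000))      # 0.108
print(efficiency_fps_per_watt(107.6, 264.7))  # 0.406
```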

As with the RTX 40-series, I ran every single test, multiple times, on each GPU. And that causes some immediate issues with certain results. The RTX 4090 was already smacking into CPU limitations at 1080p, so making a chip that's potentially 30% faster just means the CPU bottleneck becomes even more pronounced. At the bottom of the GPU totem pole, we also have the RTX 5060 and RTX 5060 Ti 8GB basically sputtering and dying at 4K ultra, and often struggling at 1440p ultra. (Don't worry: I intend to pick up an RTX 5050 at some point, just to punish myself further....) The goal is to have meaningful measurements of overall performance, using the combined 1080p, 1440p, and 4K results.
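For the curious, combining the four per-setting results as a geometric mean (rather than a plain average, so no single resolution dominates the score) lines up with the Overall Performance column in the table above:

```python
from math import prod

def overall_score(fps_results: list[float]) -> float:
    """Geometric mean of per-setting FPS results: the nth root of
    the product of n values."""
    return prod(fps_results) ** (1 / len(fps_results))

# RTX 5090 row: 4K ultra, 1440p ultra, 1080p ultra, 1080p medium
print(round(overall_score([95.3, 141.2, 163.9, 207.9]), 1))  # 146.3
```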

The GeForce RTX 5090 reigns supreme, which should surprise no one, as the RTX 4090 remains the second fastest GPU overall. AMD and Intel aren't even bothering to try to compete with the halo Nvidia GPUs, in part because both companies know that PC gaming enthusiasts willing to spend $1,500 or more on a GPU are almost all buying Nvidia hardware. But it's not always the landslide victory that the specs would imply. 4K ultra shows a massive 35% lead for the 5090 over the 4090, and a much larger 59% gap relative to the second string RTX 5080. At 1440p ultra, however, the margins shrink to 23% over the 4090 and 42% over the 5080. 1080p ultra shows even less of a performance advantage, just 13% and 28% over the 4090 and 5080. And at 1080p medium, only five of the tested games showed a noticeable performance advantage for the 5090 over the 4090, with the rest being completely CPU limited; that works out to only an 8% uplift relative to the 4090, but a larger 18% increase over the 5080.

The RTX 5080 sits in second place among the 50-series, at least until the seemingly inevitable 5080 Super arrives next year with 24GB of VRAM. Until then, if you want more than 16GB on an Nvidia GPU, the 4090 and 5090 are the only two options. 16GB remains sufficient for nearly any current game, but we'll start to see games push beyond 16GB at 4K ultra in the coming years. Compared to the prior generation 4080, the 5080 only delivers a 10% performance uplift, and against the 4080 Super that drops to just 6% (but 12% at 4K). Without enabling MFG, which comes with some serious snake oil marketing, the 50-series just doesn't offer that much more performance than the 40-series.

The 5070 Ti has the same 16GB as the 5080, just with lower clocks and fewer GPU shaders. It's mostly about the same performance as the prior generation 4080, for a decently lower price. Or at least, the cheapest 5070 Ti cards are now available at MSRP, which took about six months after the cards first launched. Depending on the prior generation comparison you make, the 5070 Ti looks pretty decent or pretty mediocre. It's 19% faster than the 4070 Ti, for a slightly lower MSRP, but it's also only 11% faster than the 4070 Ti Super — again, for less money. The generational performance gains are very muted with Blackwell, in other words.

The GeForce RTX 5070 has a memory handicap, the same as the prior generation 4070 Ti, 4070 Super, and 4070. 12GB should be "enough" for most games, but we're definitely encountering situations where 4K ultra can exceed that mark and performance takes a bigger hit. At least the 5070 doesn't cost more than the prior gen 4070, so getting 20% more performance for the same price looks pretty decent. Note, however, that power use also went up 18%, so really it's just Nvidia pushing more watts to get more performance. Efficiency gains are minimal at best.

The RTX 5060 Ti, like its predecessor RTX 4060 Ti, tells the story of Dr. Jekyll and Mr. Hyde. The 16GB variant does quite well overall, though it's still a large 24% drop in performance compared to the RTX 5070. So, it's basically linear performance scaling with price right now. The 8GB card meanwhile ties the 16GB card at 1080p medium, but then falls 14% behind at 1080p ultra, and that grows to a 20% deficit at 1440p ultra, and finally an exceptionally awful 53% drop at 4K ultra. Granted, it's not supposed to be a 4K card, but for $50 more, I just can't countenance the existence of the 8GB card.

Speaking of 8GB cards, the RTX 5060 has the same 8GB as the 5060 Ti 8GB, the RTX 4060, and the new whipping boy RTX 5050. It's only 12% slower than the 5060 Ti 8GB, while costing 21% less (at MSRP), so I guess it has that going for it. I don't have 5050 data yet, but testing so far reveals performance that's just a bit behind the prior gen 4060. Either way, you really don't want to run maxed out settings on an 8GB GPU these days, unless you're only running games made before about 2022.

Here are the charts for the overall performance tests of the RTX 50-series. Once I've wrapped up all the individual families, I'll see about putting together the official monolithic GPU benchmarks hierarchy to showcase how the past four generations of Nvidia, AMD, and even Intel GPUs stack up. Until then, here are the Blackwell results.

Nvidia Blackwell RTX 50-Series 1080p medium GPU benchmarks

Nvidia Blackwell RTX 50-Series 1080p ultra GPU benchmarks

Nvidia Blackwell RTX 50-Series 1440p ultra GPU benchmarks

Nvidia Blackwell RTX 50-Series 4K ultra GPU benchmarks


