If you are in the market for a new computer, or thinking of upgrading your current system, choosing the right CPU can be a daunting yet incredibly important task. With dozens upon dozens of CPU models available from Intel and AMD, each with its own unique set of specifications, it can be hard to determine which will give you the best possible performance within your budget.

Editor's Note:
Matt Bach is the head of Puget Labs and has been part of Puget Systems, a boutique builder of gaming and workstation PCs, since the early days. This article was originally published on the Puget blog.

While hardware review sites like TechSpot do a great job testing and comparing different CPUs, unless they specifically benchmark the applications you personally use, their results may not accurately reflect the performance that you would see. After all, as reliable as review numbers may be, if reviewers were to test every possible application they simply would not be able to complete their testing by the time the CPU becomes obsolete.

When you are choosing a CPU, there are two primary specifications you need to pay attention to that define the relative performance of CPUs:

  • The frequency is how many operations a single CPU core can complete in a second (how fast it is).
  • The number of cores is how many physical cores there are within a CPU (how many operations it can run simultaneously).

This doesn't take into account any differences in architecture (AMD versus Intel, Haswell versus Ivy Bridge, etc.) but when comparing two CPUs from the same family they are the two main specifications that determine the relative performance capability of a CPU.

If your software only uses a single core, the frequency is a decent indicator of how well a CPU will perform. However, if your software is able to utilize multiple CPU cores, it becomes very hard to estimate the performance of different CPU models since almost no program is going to be 100% efficient at using those cores. The trick is to determine exactly how efficient your program is at using multiple CPU cores (its parallelization efficiency) and use that number to estimate the performance of different CPU models.

To calculate the parallelization efficiency, you need to use a mathematical equation called Amdahl's Law. At Puget Systems, we were first introduced to this equation about a year and a half ago when we hired Dr. Donald Kinghorn to help us get established in the scientific computing market. He has been invaluable as a resource in that segment, but his knowledge has also been useful in many ways we never anticipated -- including the practical application of Amdahl's Law.

What is Amdahl's Law?

At the most basic level, Amdahl's Law is a way of showing that unless a program (or part of a program) is 100% efficient at using multiple CPU cores, you will receive less and less of a benefit by adding more cores. At a certain point - which can be mathematically calculated once you know the parallelization efficiency - you will receive better performance by using fewer cores that run at a higher frequency than using more cores that run at a lower frequency.

Amdahl's Law:

$$S(n) = \frac{T(1)}{T(n)} = \frac{1}{B + \frac{1}{n}\left(1 - B\right)}$$

$S(n)$ is the theoretical speedup
$T(n)$ is the time an algorithm takes to finish when running $n$ threads
$B$ is the fraction of the algorithm that is strictly serial (so $1 - B$ is how much of the program can be run in parallel)

Unless you deal with complex equations regularly, this equation may look a bit daunting. However, since we are primarily concerned with the maximum speedup that can be achieved by increasing the number of CPU cores, this equation can be simplified a bit into the following:

Parallelization Formula:

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

$S(N)$ is the theoretical speedup
$P$ is the fraction of the algorithm that can be made parallel
$N$ is the number of CPU threads

What this is basically saying is that the amount of speedup a program will see by using N cores is based on how much of the program is serial (can only be run on a single CPU core) and how much of it is parallel (can be split up among multiple CPU cores).
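As a rough illustration of the parallelization formula, here is a minimal Python sketch. The function name `amdahl_speedup` is our own choice for this example and not part of any library. The loop shows the diminishing returns described earlier: for a fixed P, the speedup can never exceed 1 / (1 - P) no matter how many cores are added.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Theoretical speedup: 1 / ((1 - P) + P / N).

    p is the parallel fraction (0 to 1) and n is the number of cores.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# Diminishing returns for a program assumed to be 97% parallel.
for cores in (1, 2, 4, 8, 16, 64, 1024):
    print(f"{cores:>5} cores -> {amdahl_speedup(0.97, cores):.2f}x")
```

With P = 0.97 the ceiling is 1 / 0.03, or roughly 33x, which is why piling on cores eventually stops paying off.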

In order to use this equation, you first need to determine the parallelization efficiency of your program. With that number, you can then use Amdahl's Law and a CPU's frequency to fairly accurately estimate the performance of almost any CPU that uses a similar architecture to the CPU you used for testing. While you are certainly invited to follow this guide in its entirety, if you are more concerned about actually estimating a CPU's performance than all the math behind it, feel free to skip ahead to the "Easy Way: Using a Google Doc spreadsheet" section.

Amdahl's Law Limitations

While the method we described above is great for determining how much of a program can be run in parallel, it (and Amdahl's Law in general) has some limitations:

  • Not every action done in a program will have the same amount of parallelization. If you look at the results of our recent article, Adobe Photoshop CC CPU Multi-threading Performance, you will find that how many CPU cores Photoshop can use varies greatly depending on what you are actually doing. You can mitigate this limitation somewhat by testing various tasks and calculating the parallelization efficiency for each task individually (which is what we did), but depending on the program it may not be feasible to test every single possible action.
  • Amdahl's Law only applies if the CPU is the bottleneck. If what you are doing is not being limited by the CPU, you will find that after a certain number of cores you stop seeing any performance gain. If your video card, RAM, or hard drive performance is preventing the program from running any faster, adding more CPU cores will never help even if the program is 100% parallel. Also, keep in mind that if you end up purchasing a faster CPU than the one you tested with, it is entirely possible that the new CPU will be fast enough that something else in the system (RAM, hard drive, GPU, etc.) may then become the bottleneck and limit the performance of your new, faster CPU.
  • Many programs are hard-coded to use a certain number of cores. Even if it may be possible for a program to try to use more cores, many programs have a hard-set number of CPU cores that can be utilized. In fact, a large majority of software available today still only uses a single CPU core! This is done for a variety of reasons, ranging from the nature of what the program is doing making it non-conducive to using multiple CPU cores to it simply being easier to program for a fixed number of cores.
  • Estimating the performance of a CPU will only be accurate for CPUs based on a similar architecture. If the CPU you used to determine the parallel efficiency of a program is vastly different from the CPU you are considering purchasing, you may not be able to accurately estimate the performance of that CPU. Even outside of AMD vs Intel CPUs, if the CPU you used to test is more than a generation or two old, the actual performance of a newer CPU may be vastly different (and usually faster) than what Amdahl's Law will estimate. You can still accurately calculate the parallel efficiency and use that to compare the relative performance of two or more CPUs that use the same architecture, but you won't be able to determine more than a general idea of the actual performance.

Step 1: Test your program with various numbers of CPU cores

Unfortunately, determining the parallelization efficiency of a program is not something you can find just by looking in a ReadMe.txt file. The easiest way we have found to do this is to simply run your program and time how long it takes to complete a task with the number of CPU cores it can use limited artificially. Luckily, you don't need to change out your CPU a bunch of times to do this. Instead, you can simply set the program's affinity through Task Manager in Windows. This is not as good as completely disabling the CPU cores through the BIOS - which is possible on some motherboards - but we have found it to be much more accurate than you would expect.

To set the affinity, simply launch the program you want to test, open Task Manager, right-click on the program listing under Details, select "Set Affinity", and choose the threads that you want to allow the program to use. Note that if your CPU supports Hyperthreading there will actually be twice as many threads listed as your CPU has cores. You can either disable Hyperthreading in the BIOS before doing your testing, or simply select two threads for every CPU core you want to test. Hyperthreading threads are always listed immediately after the physical core in Windows, so you would select two threads for every CPU core you want the program to use. In other words, selecting threads 1&2 will allow the program to only use a single CPU core, selecting threads 1-4 will allow the program to use two CPU cores, and so on.
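The thread-selection rule above can be sketched as a small helper. This is only an illustration under the article's stated assumption that each physical core's Hyperthreading thread is listed immediately after it; the function name `threads_for_cores` is hypothetical, and the numbering is 1-based to match the text (your Task Manager dialog may label the same entries starting from CPU 0).

```python
def threads_for_cores(cores: int, hyperthreading: bool = True) -> list[int]:
    """Return the 1-based thread numbers to tick for `cores` physical cores.

    Assumes Hyperthreading threads are listed right after their physical
    core, so two consecutive threads map to one core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    threads_per_core = 2 if hyperthreading else 1
    return list(range(1, cores * threads_per_core + 1))


for cores in (1, 2, 4):
    print(f"{cores} core(s): select threads {threads_for_cores(cores)}")
```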

Note that setting the affinity only lasts until the program is closed. The next time you run the program, you have to set the affinity again. However, if you want to quickly test a single action using various numbers of CPU cores, you don't have to close the program before changing the affinity - just click on "Set Affinity" and change it on the fly. Still, you will get more accurate results by closing the program between runs as that will clear out the RAM that is already allocated to the program.

With the ability to set how many CPU cores a program can use, all you need to do is perform a repeatable action using a variety of CPU cores. For example, you may time how long it takes to complete a render in AutoCAD or export images in Lightroom using a variety of CPU cores. The larger the variety of number of cores you test the better, but you need to at least test with a single CPU core and all possible CPU cores. If possible, we recommend testing with as many combinations as possible (so if you have an 8-core CPU, test with 1, 2, 3, 4, 5, 6, 7, and 8 cores).
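If the action you are testing can be triggered from a script, a small timing harness can make the runs more repeatable than a stopwatch. The sketch below is one possible approach, not the method used for the numbers in this article: `time_action` and `placeholder_action` are hypothetical names, the workload is a stand-in, and the affinity still has to be set separately as described above.

```python
import statistics
import time
from typing import Callable


def time_action(action: Callable[[], object], repeats: int = 3) -> float:
    """Run `action` `repeats` times and return the median wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def placeholder_action() -> None:
    """Stand-in workload; replace with a call that starts your real task."""
    sum(i * i for i in range(200_000))


print(f"median time: {time_action(placeholder_action):.4f} s")
```

Taking the median of a few runs is a design choice to reduce the effect of one-off hiccups; a single careful run per core count also works if the action takes minutes.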

Step 2: Determining the parallelization fraction

At this point, you should have a list that shows how long it took your program to complete an action using various numbers of CPU cores. Just to have an example, let's say your results look like those in the "Action Time (seconds)" column in the chart below:

| # of Cores | Action Time (seconds) | Actual Speedup | Amdahl's Law Speedup (97% efficient) |
|---|---|---|---|
| 1 | 645.4 | 1 | 1 |
| 2 | 328.3 | 1.97 | 1.95 |
| 3 | 230 | 2.8 | 2.8 |
| 4 | 172 | 3.75 | 3.67 |
| 5 | 140.3 | 4.6 | 4.5 |
| 6 | 117.5 | 5.5 | 5.2 |
| 7 | 108 | 6 | 5.9 |
| 8 | 97.8 | 6.6 | 6.6 |

The easiest way we have found to use these results to determine the parallelization efficiency of a program is to first determine how much faster the program completed the task with N cores versus how long it took with a single core. To find this out, you simply need to divide how long the action took with a single core by how long it took with N cores. In our example, for two cores the speedup is 645.4/328.3 which equals 1.97. Fill this in for each row and we can use these numbers to determine the parallelization fraction of the program.
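The division described above is easy to script. The following is a minimal sketch using the example action times from this article; the names `action_times` and `actual_speedups` are our own for illustration.

```python
# Action times in seconds from the example results, keyed by number of cores.
action_times = {1: 645.4, 2: 328.3, 3: 230.0, 4: 172.0,
                5: 140.3, 6: 117.5, 7: 108.0, 8: 97.8}


def actual_speedups(times: dict[int, float]) -> dict[int, float]:
    """Divide the single-core time by the time measured with N cores."""
    baseline = times[1]
    return {cores: baseline / seconds for cores, seconds in sorted(times.items())}


for cores, speedup in actual_speedups(action_times).items():
    print(f"{cores} cores: {speedup:.2f}x")
```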

There is a complex mathematical way to use the actual speedup numbers to directly find the parallelization fraction using non-linear least squares curve fitting, but the easiest way we have found is to simply guess at the fraction, see how close the results are, then tweak it until the actual speedup is close to the speedup calculated using Amdahl's Law. Using a program like Excel or Google Sheets makes this much easier, but you can do it with just a calculator and a pad of paper if you want to do it manually and have hours to kill.

To find the parallelization fraction, you need to use the parallelization equation we listed earlier and plug in different values for P:

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

A good place to start might be to try P=.8 (or 80% parallel efficient) and perform this calculation for each # of cores. For example, for 4 cores the equation would be:

$$S(4) = \frac{1}{(1 - 0.8) + \frac{0.8}{4}} = \frac{1}{0.2 + 0.2}$$

This equals 2.5. Compare this to our actual speedup in our example (which was 3.75) and you will see that our example program is actually more than 80% efficient, so we need to increase the parallelization fraction to something higher. In our case, the actual fraction was .97 (97%) which is pretty decent. You will find that the results don't line up perfectly every single time since there is a certain margin of error that always exists when you run benchmarks - you simply have to average it out and get it as close as you can. Having this in a spreadsheet where you can graph both data series makes it much easier, as you'll see in the Easy Way section.
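The guess-and-tweak process can also be automated. The sketch below simply tries every value of P from 0.000 to 1.000 and keeps the one with the smallest sum of squared differences between actual and predicted speedup. This is a brute-force stand-in, not the non-linear least squares curve fitting mentioned above and not necessarily how the 97% figure was arrived at; all function names are our own for illustration.

```python
# Example action times in seconds from this article, keyed by number of cores.
action_times = {1: 645.4, 2: 328.3, 3: 230.0, 4: 172.0,
                5: 140.3, 6: 117.5, 7: 108.0, 8: 97.8}


def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by the parallelization formula."""
    return 1.0 / ((1.0 - p) + p / n)


def squared_error(p: float, times: dict[int, float]) -> float:
    """Sum of squared gaps between actual and predicted speedup."""
    baseline = times[1]
    return sum((baseline / t - amdahl_speedup(p, n)) ** 2
               for n, t in times.items())


def fit_parallel_fraction(times: dict[int, float], steps: int = 1000) -> float:
    """Try P = 0/steps, 1/steps, ..., 1 and return the best match."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda p: squared_error(p, times))


best_p = fit_parallel_fraction(action_times)
print(f"best-fitting P is about {best_p:.3f}")
```

Because benchmark results are noisy, the fitted value should be read as an estimate; for the example data it lands close to the .97 used in this article rather than matching it exactly.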

Step 3: Estimate CPU performance using the parallelization fraction

Once you have the parallelization fraction, you can use it to estimate the performance of any other CPU that uses the same or similar architecture as the CPU you used for testing. If you are interested in a CPU that uses an entirely different architecture, you can still use this method to determine the relative difference in performance between a number of different CPU models from the same family, but it will likely not be an accurate representation of the actual performance you'd see with that CPU.

To estimate a CPU's performance, you need to know the operating frequency and number of cores of both the CPU you used for benchmarking and the CPU you are interested in. With those specs in hand, you first need to calculate how many effective cores both CPUs have, which is done by using the equation:

$$\text{Effective cores} = \frac{1}{(1 - P) + \frac{P}{\text{Number of cores}}}$$

Basically, this is using the same parallelization equation we used earlier but using the actual number of cores the CPU has. This gives us the effective number of CPU cores the CPU has when running your program if the program was actually 100% efficient. From this, we can multiply the number of effective cores by each CPU's operating frequency to get what is essentially how many operations per second the CPU is able to complete (or GFLOPs):

$$\text{GFLOPs} = \text{Effective cores} \times \text{Frequency}$$

Finally, we can estimate how long it would take the CPU you are interested in to complete the same action you benchmarked by dividing the GFLOPs of the test CPU by the GFLOPs of the new CPU and multiplying the result by the time it took your test CPU to complete the action with all of its cores enabled:

$$\text{Estimated time} = \frac{\text{GFLOPs}_{\text{test CPU}}}{\text{GFLOPs}_{\text{new CPU}}} \times \text{Time}_{\text{test CPU}}$$

With this, you should end up with an estimation of how long it would take a CPU to complete the action you benchmarked.
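Putting the three calculations in this step together, a minimal Python sketch might look like the following. The function names are our own for illustration, and the example inputs (a ten-core 2.6GHz test CPU that took 85.3 seconds, a parallelization fraction of 0.97, and an eight-core 3.2GHz target) are taken from the comparison in the Conclusion.

```python
def effective_cores(p: float, cores: int) -> float:
    """Effective core count from the parallelization formula."""
    return 1.0 / ((1.0 - p) + p / cores)


def relative_throughput(p: float, cores: int, ghz: float) -> float:
    """Effective cores times frequency (the figure this article calls GFLOPs)."""
    return effective_cores(p, cores) * ghz


def estimate_time(p: float, test_cores: int, test_ghz: float,
                  test_seconds: float, new_cores: int, new_ghz: float) -> float:
    """Scale the measured time by the ratio of the two throughput figures."""
    test = relative_throughput(p, test_cores, test_ghz)
    new = relative_throughput(p, new_cores, new_ghz)
    return test_seconds * test / new


# Ten-core 2.6GHz test CPU (85.3 s measured) versus an eight-core 3.2GHz CPU.
print(f"{estimate_time(0.97, 10, 2.6, 85.3, 8, 3.2):.1f} seconds")
```

Note that the result is only a relative estimate: it assumes both CPUs share a similar architecture and that the CPU remains the bottleneck.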

Easy Way: Using a Google Doc spreadsheet

If you want to estimate the performance of a CPU using Amdahl's Law and don't love math, you will probably have a headache by the time you complete this guide. Lucky for you, we took the time to put together a Google Doc that has all the equations already done and ready: Estimating CPU Performance. You will need to make a copy of the Doc (go to File->Make a Copy), but once you have done that you will be able to use it as much as you like.

To use this doc, do the following:

  • Complete Step 1: Test your program with various numbers of CPU cores. Unfortunately, you simply have to do this step yourself.
  • Once you have tested your application with various numbers of CPU cores active, input your results into the orange cells in the Google Doc (replacing the example results)
  • Adjust the parallel efficiency fraction (the yellow cell) until the two lines on the graph are similar. If you cannot get the two lines to line up, it may be that your program is not CPU limited (see the Amdahl's Law Limitations section)
  • Change the light blue cells to reflect the cores and frequency of the CPU you used for testing (row 28) and the CPU(s) you are interested in estimating the performance of (row 29-30)
  • You should see an estimation of how long it should take each CPU to perform the action you benchmarked in the green cells

This is much easier than trying to keep track of all the different equations, although we understand that there are some people who strangely love doing math.

Conclusion

Whether you followed the step-by-step instructions or just used the Google Doc we linked, you should now have the resources and information needed to estimate the performance of a CPU for your exact program and application. While this is not the easiest process in the world, it can be invaluable when trying to decide what CPU to use in your new computer.

Say you are purchasing a new system but are torn between two CPU models that are similar in cost, but very different in terms of frequency and core count. As an example, let's use a Xeon E5-2667 V3 and a Xeon E5-2690 V3. Using the data from the example in Step 2 and assuming that our test CPU was a Xeon E5-2660 V3 2.6GHz Ten Core, we can estimate the performance of these two CPUs to be:

| CPU Model | ~MSRP | Estimated Action Time |
|---|---|---|
| Intel Xeon E5-2660 V3 2.6GHz Ten Core (Test CPU) | $1450 | 85.3 seconds |
| Intel Xeon E5-2667 V3 3.2GHz Eight Core | $2057 | 82.5 seconds |
| Intel Xeon E5-2690 V3 2.6GHz Twelve Core | $2090 | 74.4 seconds |

In this example, an E5-2667 V3 should take about 82.5 seconds to complete the action we benchmarked, while an E5-2690 V3 should only take about 74.4 seconds. Since the two CPUs are only $33 apart in price, this makes it almost a no-brainer that the E5-2690 V3 is the better choice in this instance.
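The outcome of this comparison depends heavily on the parallelization fraction. As a rough sketch using the same effective-cores-times-frequency estimate from Step 3 (function names are our own, and the result ignores everything except core count and frequency), the following shows which of the two CPUs the method favors at different values of P:

```python
def relative_throughput(p: float, cores: int, ghz: float) -> float:
    """Effective cores (parallelization formula) multiplied by frequency."""
    return ghz / ((1.0 - p) + p / cores)


def faster_cpu(p: float, cpu_a: tuple, cpu_b: tuple) -> str:
    """Each CPU is (label, cores, ghz); return the label estimated to be faster."""
    a = relative_throughput(p, cpu_a[1], cpu_a[2])
    b = relative_throughput(p, cpu_b[1], cpu_b[2])
    return cpu_a[0] if a >= b else cpu_b[0]


eight_core = ("E5-2667 V3", 8, 3.2)
twelve_core = ("E5-2690 V3", 12, 2.6)

for p in (0.50, 0.80, 0.90, 0.95, 0.97):
    print(f"P = {p:.2f}: {faster_cpu(p, eight_core, twelve_core)}")
```

For this particular pair the estimate flips somewhere between 90% and 95% parallel: below that, the higher-frequency eight-core part comes out ahead, which is the "fewer cores at a higher frequency" case described at the start of this article.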

Remember that this only applies to CPUs that are of a similar architecture to the one you used for testing and only for the action that you benchmarked. Anything different (even within the same program) may have drastically different results. However, if you keep finding yourself waiting on a render to finish, an export to complete, or any other single task, you can limit your testing to only those tasks. Each may have a different parallelization efficiency, but if you determine the efficiency for each task and give them a certain weight (likely based on how often you are waiting on each to finish) you can make a much more educated decision on which CPU is right for you.
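One simple way to apply such weights is a weighted average of the estimated time for each task. The sketch below is only one possible interpretation of the weighting idea; the function name, task names, times, and weights are all made up for illustration and are not measurements.

```python
def weighted_time(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-task estimated times in seconds."""
    total_weight = sum(weights[task] for task in estimates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(estimates[task] * weights[task] for task in estimates) / total_weight


# Made-up numbers: estimated seconds per task on one candidate CPU, and how
# often you find yourself waiting on each task relative to the other.
estimates = {"render": 80.0, "export": 40.0}
weights = {"render": 3.0, "export": 1.0}
print(f"weighted estimate: {weighted_time(estimates, weights):.1f} seconds")
```

Computing this figure for each candidate CPU gives a single number per CPU to compare, at the cost of hiding how each individual task behaves.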

If you followed this guide, we'd love to hear what you tested, what issues (if any) you ran into, and what parallelization fraction you found to be the closest match.

Header image via Shutterstock