# When calculating the amount of FLOPS in a GCN GPU.

The 7970 has 32 GCN Compute Units. Each GCN Compute Unit contains 4 SIMD units, each 16 lanes wide.

4 SIMDs X 16 lanes = 64 stream processors per GCN Compute Unit

64 stream processors X 32 Compute Units = 2048 stream processors, correct?

2048 (stream processors) X 0.925 (frequency in GHz) X 2 (operations per cycle, 1 FMA) = 3788.8 GFLOPS

What does the number 2 represent? Did I label all of them correctly?
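For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same steps. The constant names are my own and purely illustrative, and the result is a theoretical peak, not a measured figure.

```python
# Theoretical peak throughput for the HD 7970, using the figures from this post.
# All names below are illustrative, not taken from any AMD tool or API.
COMPUTE_UNITS = 32
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16
CLOCK_GHZ = 0.925
FLOPS_PER_LANE_PER_CYCLE = 2  # assumption: one FMA per cycle = multiply + add

lanes_per_cu = SIMDS_PER_CU * LANES_PER_SIMD        # stream processors per CU
stream_processors = lanes_per_cu * COMPUTE_UNITS    # stream processors on the chip
peak_gflops = stream_processors * CLOCK_GHZ * FLOPS_PER_LANE_PER_CYCLE

print(f"{lanes_per_cu} per CU, {stream_processors} total, {peak_gflops:.1f} GFLOPS")
```

Keeping each factor as its own constant makes it easy to swap in another card's numbers.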

Edited by nigelhere9901

##### Share on other sites

It signifies two floating-point operations per clock cycle: each stream processor can issue one fused multiply-add (FMA) per cycle, which counts as a multiply plus an add. This number varies with the processor design and type. For example, I think Intel Sandy Bridge and Ivy Bridge cores peak at 8 double-precision operations per clock cycle per core, and Haswell at 16.

##### Share on other sites

Also, do you happen to know how to calculate FLOPS for a CPU? Is it the same method? Cores * Speed in GHz * FLOPs per cycle?

Edited by nigelhere9901

##### Share on other sites

Technically, it's Sockets * Cores Per Socket * Clock Speed Per Core * Floating-Point Operations Per Cycle, but the first number is usually 1 in most desktop systems.

The actual calculation for the 7970 would be 32 * 64 * 0.925 * 2 = 3788.8 GFLOPS
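As a rough sketch, the formula above fits in one small Python function. The function name, the parameter names, and the CPU figures are hypothetical, chosen only for illustration. For the GPU, compute units take the place of sockets and stream processors the place of cores.

```python
def peak_gflops(units, cores_per_unit, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: the product of all four factors."""
    return units * cores_per_unit * clock_ghz * flops_per_cycle

# HD 7970: 32 compute units, 64 stream processors each, 0.925 GHz, 2 FLOPs per cycle.
hd7970 = peak_gflops(32, 64, 0.925, 2)

# Hypothetical desktop CPU (made-up figures): 1 socket, 4 cores, 3.5 GHz,
# 8 FLOPs per cycle per core.
example_cpu = peak_gflops(1, 4, 3.5, 8)

print(f"HD 7970: {hd7970:.1f} GFLOPS, example CPU: {example_cpu:.1f} GFLOPS")
```

Both results are upper bounds on paper; real workloads rarely keep every unit busy on every cycle.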

##### Share on other sites

That is no different from my method. Each Compute Unit has 64 stream processors, so you just multiply 64 by the number of Compute Units, then by the clock speed, and then by the floating-point operations per cycle.

##### Share on other sites

The mathematics is the same, but it shows the whole process in a single sum. That way you don't have to work out the total core count first, which is relatively easy for a single GPU or server but much more difficult across distributed computing systems.

##### Share on other sites

Alright, thanks for your help, kind sir.
