Why is float division faster than integer division in C++?

Question:

Consider the following C++ code snippets (Visual Studio 2015):

First block

const int size = 500000000;
int sum = 0;
int *num1 = new int[size]; // initialized between 1-250
int *num2 = new int[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}

Second block

const int size = 500000000;
int sum = 0;
float *num1 = new float[size]; // initialized between 1-250
float *num2 = new float[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}

I expected the first block to run faster because it uses integer operations. But the second block is considerably faster, even though it uses floating point operations. Here are the results of my benchmark:

Division:

Type    Time
uint8   879.5ms
uint16  885.284ms
int     982.195ms
float   654.654ms

Floating point multiplication is also faster than integer multiplication. Here are the results of my benchmark:

Multiplication:

Type    Time
uint8   166.339ms
uint16  524.045ms
int     432.041ms
float   402.109ms

My system spec: Intel Core i7-7700 CPU, 64 GB RAM, Visual Studio 2015

Answer

Floating point division is faster than integer division because of the exponent part in the floating point representation.

int32_t division requires fast division of 31-bit numbers, whereas float division requires fast division of 24-bit mantissas (the leading one in the mantissa is implied and not stored in the floating point number) plus a much cheaper subtraction of 8-bit exponents.
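The split into a mantissa division and an exponent subtraction can be sketched in software. This is only an illustration of the idea, not how the hardware divider is actually built; `divide_via_parts` is a hypothetical helper name, and the sketch assumes finite, non-zero inputs whose quotient is a normal float.

```cpp
#include <cmath>

// Hypothetical helper for illustration only: divides two floats by
// handling the mantissa and the exponent separately.
// Assumes finite, non-zero inputs and a quotient in the normal float range.
float divide_via_parts(float a, float b) {
    int exp_a = 0;
    int exp_b = 0;

    // std::frexp splits x into m * 2^e with 0.5 <= |m| < 1.
    const float man_a = std::frexp(a, &exp_a);
    const float man_b = std::frexp(b, &exp_b);

    const float man_q = man_a / man_b;  // division on the short mantissas
    const int exp_q = exp_a - exp_b;    // plain integer subtraction on the exponents

    // std::ldexp reassembles man_q * 2^exp_q.
    return std::ldexp(man_q, exp_q);
}
```

Calling `divide_via_parts(250.0f, 7.0f)` should give the same value as `250.0f / 7.0f`, since scaling by a power of two does not change the rounding of the mantissa quotient.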

See an excellent detailed explanation of how division is performed in the CPU.

It may be worth mentioning that SSE and AVX instructions provide only floating point division, not integer division. SSE instructions/intrinsics can easily be used to quadruple the speed of your float calculation.
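A minimal sketch of what using the SSE intrinsics might look like, dividing four floats per instruction with `_mm_div_ps`. The helper name `divide_arrays` and the macro `HAVE_SSE_SKETCH` are hypothetical, the feature check is an assumption about common compilers, and the code falls back to a scalar loop where SSE is not detected. It covers only the element-wise division, not the summation in the question's loop.

```cpp
#include <cstddef>

// Assumption: these macros identify SSE support on GCC/Clang and MSVC.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#endif

// Hypothetical helper: computes out[i] = a[i] / b[i] for i in [0, n).
void divide_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_SSE_SKETCH
    // Process four floats at a time; unaligned loads/stores keep the
    // sketch independent of how the arrays were allocated.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_div_ps(va, vb));
    }
#endif
    // Scalar loop for the remaining elements (or for everything without SSE).
    for (; i < n; ++i) {
        out[i] = a[i] / b[i];
    }
}
```

An optimizing compiler may already vectorize a plain loop like the scalar one, so whether the hand-written version is faster is worth measuring rather than assuming.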

If you look into Agner Fog's instruction tables, for example for Skylake, the latency of 32-bit integer division is 26 CPU cycles, whereas the latency of SSE scalar float division is 11 CPU cycles (and, surprisingly, it takes the same time to divide four packed floats).
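Figures like these differ between microarchitectures, so it may be worth measuring on your own machine. The following is a rough timing sketch in the spirit of the question's loop; `DivisionTiming` and `time_division` are hypothetical names. It measures the wall-clock time of the whole loop, memory traffic included, not the latency of a single instruction.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
template <typename T>
struct DivisionTiming {
    T sum;          // returned so the compiler cannot discard the loop
    double millis;  // wall-clock duration of the loop in milliseconds
};

// Divides num[i] by den[i] element by element and accumulates the quotients.
// Assumes den has at least as many elements as num and contains no zeros.
template <typename T>
DivisionTiming<T> time_division(const std::vector<T>& num, const std::vector<T>& den) {
    const auto start = std::chrono::steady_clock::now();

    T sum = T(0);
    for (std::size_t i = 0; i < num.size(); ++i) {
        sum += num[i] / den[i];
    }

    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {sum, elapsed.count()};
}
```

Running it once with `std::vector<int>` and once with `std::vector<float>` of the same length, filled with values between 1 and 250, gives a comparison similar to the one in the question.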

Also note that in C and C++ there is no division on numbers shorter than int, so uint8_t and uint16_t operands are first promoted to int and then the int division happens. uint8_t division looks faster than int division because the promoted value has fewer bits set, which lets the division complete sooner.
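The promotion can be checked at compile time. A small sketch, assuming a platform where int is wider than 16 bits (true for Visual Studio and other mainstream compilers); `divide_bytes` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <type_traits>

// Dividing two uint8_t or two uint16_t values yields an int, because both
// operands undergo integral promotion before the division is performed.
static_assert(std::is_same<decltype(std::uint8_t{1} / std::uint8_t{1}), int>::value,
              "uint8_t / uint8_t is computed as int");
static_assert(std::is_same<decltype(std::uint16_t{1} / std::uint16_t{1}), int>::value,
              "uint16_t / uint16_t is computed as int");

// float operands are not promoted: the division stays in float.
static_assert(std::is_same<decltype(1.0f / 1.0f), float>::value,
              "float / float is computed as float");

// Hypothetical helper: the division below is an int division, even though
// both parameters are 8-bit. Assumes b is non-zero.
int divide_bytes(std::uint8_t a, std::uint8_t b) {
    return a / b;
}
```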
