MSVC /arch:[instruction set] - SSE3, AVX, AVX2

Problem description:

Here is an example of a class that reports the supported instruction sets: https://msdn.microsoft.com/en-us/library/hskdteyh.aspx

I want to write three different implementations of a single function, each using a different instruction set. But with the /arch:AVX2 flag, for example, the app will never run on anything other than 4th-generation or later Intel processors, so checking for support at run time becomes pointless.

So the question is: what exactly does this flag do? Does it enable support for the given instruction sets, or does it enable compiler optimizations that use them?

In other words, can I remove the flag entirely and keep using functions from immintrin.h, emmintrin.h, etc.?

The /arch:AVX2 option lets the compiler make the best use of the CPU's YMM registers and AVX2 instructions. But if the CPU does not support those instructions, the program will crash. If you use AVX2 instructions together with the /arch:SSE2 compiler flag, performance can drop (by roughly a factor of two).

So the best approach is to compile each implementation of your function with the corresponding compiler option (/arch:AVX2, /arch:SSE2, and so on). The easiest way to do this is to put the implementations (scalar, SSE, AVX) in separate files and compile each file with its own compiler options.

It is also a good idea to create a separate file that checks the CPU's capabilities and calls the corresponding implementation of your function.
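
A minimal sketch of such a dispatch file might look like the following. All names here (`sum`, `sum_scalar`, `sum_sse2`, `sum_avx2`, `Isa`, `detect_isa`, `select_sum`) are hypothetical. The three implementations are plain stand-ins kept in one file so the sketch is self-contained; in a real project each would live in its own file built with its own /arch option, and only the dispatcher would be built without one. The MSVC branch uses `__cpuid`, `__cpuidex` and `_xgetbv`; the GCC/Clang branch is an assumption added for portability.

```cpp
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // __cpuid, __cpuidex, _xgetbv
#endif

// Stand-ins for the per-ISA implementations (hypothetical; scalar bodies).
static int sum_scalar(const int* p, std::size_t n) {
    int s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
static int sum_sse2(const int* p, std::size_t n) { return sum_scalar(p, n); }
static int sum_avx2(const int* p, std::size_t n) { return sum_scalar(p, n); }

enum class Isa { Scalar, Sse2, Avx2 };

// Detects the best instruction set the current CPU and OS support.
static Isa detect_isa() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int r[4];  // EAX, EBX, ECX, EDX
    __cpuid(r, 0);
    const int max_leaf = r[0];
    __cpuid(r, 1);
    const bool sse2    = (r[3] & (1 << 26)) != 0;  // EDX bit 26
    const bool osxsave = (r[2] & (1 << 27)) != 0;  // ECX bit 27
    const bool avx     = (r[2] & (1 << 28)) != 0;  // ECX bit 28
    // The OS must save XMM and YMM state (XCR0 bits 1 and 2).
    const bool ymm_ok = osxsave && avx && ((_xgetbv(0) & 0x6) == 0x6);
    bool avx2 = false;
    if (max_leaf >= 7) {
        __cpuidex(r, 7, 0);
        avx2 = (r[1] & (1 << 5)) != 0;             // EBX bit 5
    }
    if (ymm_ok && avx2) return Isa::Avx2;
    return sse2 ? Isa::Sse2 : Isa::Scalar;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return Isa::Avx2;
    if (__builtin_cpu_supports("sse2")) return Isa::Sse2;
    return Isa::Scalar;
#else
    return Isa::Scalar;  // non-x86 target: no SIMD variant selected
#endif
}

using SumFn = int (*)(const int*, std::size_t);

// Maps a detected instruction set to the matching implementation.
static SumFn select_sum(Isa isa) {
    switch (isa) {
        case Isa::Avx2: return sum_avx2;
        case Isa::Sse2: return sum_sse2;
        default:        return sum_scalar;
    }
}

// Public entry point: detects once, then always calls the chosen variant.
int sum(const int* p, std::size_t n) {
    static const SumFn fn = select_sum(detect_isa());
    return fn(p, n);
}
```

Callers only ever see `sum`; the CPU check runs once on first use, so the AVX2 code path is never entered on a machine that cannot execute it.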

There is an example of a library that checks the CPU and calls one of the implemented functions.