Disable all AVX-512 instructions for a g++ build
Hi, I'm trying to build without any AVX-512 instructions by using these flags: -march=native -mno-avx512f. However, I still get a binary that contains an AVX512 instruction (vmovss); I'm using elfx86exts to check. Any idea how to disable those?
-march=native -mno-avx512f are the correct options; vmovss itself only requires AVX1. There is an AVX512F EVEX encoding of vmovss, but GAS won't use it unless the register involved is xmm16..31. GCC won't emit asm using those registers when you disable AVX512F with -mno-avx512f, or when you don't enable it in the first place, e.g. with something like -march=skylake or -march=znver2.
If you're still not sure, check the actual disassembly + machine code to see what prefix the instruction starts with:
- a C5 or C4 byte: start of a 2- or 3-byte VEX prefix, AVX1 encoding.
- a 62 byte: start of an EVEX prefix, AVX512F encoding.
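As a rough sketch of that check, the rule can be written as a small C++ function. The Encoding enum and classify_first_byte are hypothetical names made up for this illustration, not part of any tool. It assumes 64-bit code and that you pass the first byte of the instruction itself (after any segment or address-size prefix):

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of binutils or elfx86exts.
enum class Encoding { Vex2, Vex3, Evex, Other };

// Classify an instruction by its first byte (64-bit mode assumed).
constexpr Encoding classify_first_byte(std::uint8_t b) {
    switch (b) {
        case 0xC5: return Encoding::Vex2;   // 2-byte VEX prefix
        case 0xC4: return Encoding::Vex3;   // 3-byte VEX prefix
        case 0x62: return Encoding::Evex;   // 4-byte EVEX prefix (AVX-512)
        default:   return Encoding::Other;  // anything else: not VEX/EVEX
    }
}

// Compile-time checks of the two cases that matter for this question.
static_assert(classify_first_byte(0xC5) == Encoding::Vex2, "VEX, fine without AVX512F");
static_assert(classify_first_byte(0x62) == Encoding::Evex, "EVEX, needs AVX512F");
```

Only an instruction that classifies as Evex actually needs AVX-512 support at run time.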
.intel_syntax noprefix
vmovss xmm15, [rdi]
vmovss xmm15, [r11]
vmovss xmm16, [rdi]
Assemble with gcc -c avx.s and disassemble with objdump -drwC -Mintel avx.o:
0000000000000000 <.text>:
0: c5 7a 10 3f vmovss xmm15,DWORD PTR [rdi] # AVX1
4: c4 41 7a 10 3b vmovss xmm15,DWORD PTR [r11] # AVX1
9: 62 e1 7e 08 10 07 vmovss xmm16,DWORD PTR [rdi] # AVX512F
The listing shows 2- and 3-byte VEX prefixes and a 4-byte EVEX prefix before the 10 opcode byte. (The ModRM bytes are different too; xmm0 and xmm16 would differ only in the extra register bit from the prefix, not in the ModRM byte.)
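To make the "extra register bit" concrete, here is a minimal sketch that recombines the prefix bits with the ModRM.reg field to recover the XMM register number. The function names are hypothetical. The bit positions (R stored inverted in bit 7 of the byte after C5/C4/62, and the EVEX R' bit stored inverted in bit 4 of the byte after 62) are as I understand the Intel manual's encoding tables, so double-check them there before relying on this:

```cpp
// Hypothetical helpers for illustration; not taken from any real disassembler.

// ModRM.reg is bits 5:3 of the ModRM byte.
constexpr unsigned modrm_reg(unsigned modrm) { return (modrm >> 3) & 7; }

// VEX (C5 or C4): bit 7 of the byte after the escape byte holds R, inverted.
constexpr unsigned vex_reg_number(unsigned byte1, unsigned modrm) {
    return ((((byte1 >> 7) & 1) ^ 1) << 3) | modrm_reg(modrm);
}

// EVEX (62): bit 7 holds inverted R, bit 4 holds inverted R' (selects 16..31).
constexpr unsigned evex_reg_number(unsigned p0, unsigned modrm) {
    return ((((p0 >> 4) & 1) ^ 1) << 4)
         | ((((p0 >> 7) & 1) ^ 1) << 3)
         | modrm_reg(modrm);
}

// Prefix and ModRM bytes copied from the disassembly above.
static_assert(vex_reg_number(0x7a, 0x3f) == 15, "c5 7a 10 3f -> xmm15");
static_assert(vex_reg_number(0x41, 0x3b) == 15, "c4 41 7a 10 3b -> xmm15");
static_assert(evex_reg_number(0xe1, 0x07) == 16, "62 e1 7e 08 10 07 -> xmm16");
```

With the same ModRM byte 07 but the R' bit flipped in the prefix (f1 instead of e1), evex_reg_number gives 0, i.e. xmm0.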
GAS uses the AVX1 VEX encoding of vmovss and other instructions when possible. So you can count on instructions that have a non-AVX512F form to be using the non-AVX512F form whenever possible. This is how the GNU toolchain (used by GCC) makes -mno-avx512f work.
This applies even when the EVEX encoding is shorter, for example when a [reg + constant] addressing mode could use an AVX512 scaled disp8 (scaled by the element width) but the AVX1 encoding would need a 32-bit displacement that counts in bytes.
f: c5 7a 10 bf 00 01 00 00 vmovss xmm15,DWORD PTR [rdi+0x100] # AVX1 [reg+disp32]
17: 62 e1 7e 08 10 47 40 vmovss xmm16,DWORD PTR [rdi+0x100] # AVX512 [reg + disp8*4]
1e: c5 78 28 bf 00 01 00 00 vmovaps xmm15,XMMWORD PTR [rdi+0x100] # AVX1 [reg+disp32]
26: 62 e1 7c 08 28 47 10 vmovaps xmm16,XMMWORD PTR [rdi+0x100] # AVX512 [reg + disp8*16]
Note the last byte, or last 4 bytes, of the machine code encodings: it's a 32-bit little-endian 0x100 byte displacement for the AVX1 encodings, but an 8-bit displacement of 0x40 dwords or 0x10 dqwords for the AVX512 encodings.
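The displacement arithmetic can be sketched like this. The function names are made up for illustration, and N is the scale factor the instruction implies (4 for vmovss, 16 for a 128-bit vmovaps, matching the comments in the listing above):

```cpp
// Hypothetical helpers illustrating the displacement-size choice; not assembler code.

constexpr bool fits_int8(long disp) { return disp >= -128 && disp <= 127; }

// VEX / legacy encodings: disp8 counts bytes, so it only covers -128..127.
constexpr bool vex_can_use_disp8(long disp) { return fits_int8(disp); }

// EVEX encodings: the stored disp8 is multiplied by N, so the byte offset
// must be a multiple of N and the quotient must fit in a signed byte.
constexpr bool evex_can_use_disp8(long disp, long n) {
    return disp % n == 0 && fits_int8(disp / n);
}

// The byte that ends up in the encoding when the compressed form applies.
constexpr long evex_disp8(long disp, long n) { return disp / n; }

static_assert(!vex_can_use_disp8(0x100), "VEX needs disp32 for +0x100");
static_assert(evex_can_use_disp8(0x100, 4) && evex_disp8(0x100, 4) == 0x40, "vmovss");
static_assert(evex_can_use_disp8(0x100, 16) && evex_disp8(0x100, 16) == 0x10, "vmovaps xmm");
```

An offset that is not a multiple of N, like 0x101 for vmovss, cannot use the compressed form and needs a disp32 in either encoding.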
But with an asm-source override, {evex} vmovaps xmm0, [rdi+256], we can get the compact encoding even for "low" registers:
62 f1 7c 08 28 47 10 vmovaps xmm0,XMMWORD PTR [rdi+0x100]
GCC will of course not do that with -mno-avx512f.
Unfortunately, GCC and clang also miss that optimization when you do enable AVX512F, e.g. when compiling __m128 load(__m128 *p){ return p[16]; } with -O3 -march=skylake-avx512 (Godbolt). To see this, use binary mode, or simply note the lack of an {evex} tag on that asm source line of the compiler output.