ARM GCC inline assembler optimization problem

Question:

Why does my inline assembler routine stop working when I use the optimization flag -O3, while it works with the other optimization flags (-O0, -O1, -O2, -Os)?

I even added volatile to all my assembler instructions, which I thought would tell the compiler not to touch or reorder anything.

Best regards,

Mr. Gigu

Answer:

GCC inline assembler is very sensitive to correct specification.

In particular, you have to be extremely precise about specifying the correct constraints to make sure the compiler does not decide to "optimize" your assembler code. There are a few things to watch out for. Take an example.

The following two functions:

    int myasmfunc(int arg)    /* definitely buggy ... */
    {
        register int myval asm("r2") = arg;

        asm ("add r1, r0, #22\n" ::: "r1");
        asm ("adds r0, r1, r0\n" ::: "r0", "cc");
        asm ("subeq r2, #123\n" ::: "r2");
        asm ("subne r2, #213\n" ::: "r2");
        return myval;
    }

and

    int myasmfunc(int arg)
    {
        int myval = arg, plus = arg;

        asm ("add %0, #22\n\t" : "+r"(plus));
        asm ("adds %1, %2\n\t"
             "subeq %0, #123\n\t"
             "subne %0, #213\n\t" : "+r"(myval), "+r"(plus) : "r"(arg) : "cc");
        return myval;
    }

might look similar at first sight, and you'd naively assume they do the same thing; but they are very far from that!
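For orientation, the arithmetic that the second version appears to be aiming for can be spelled out in plain C. This is only a sketch of one reading of the assembly, not something from the original code: the name `myasmfunc_ref` is hypothetical, and unsigned arithmetic is used to mimic the wrap-around behaviour of the ARM `add`/`sub` instructions.

```c
#include <stdint.h>

/* Hypothetical plain-C reference for the second version of myasmfunc(). */
int myasmfunc_ref(int arg)
{
    uint32_t a = (uint32_t)arg;
    uint32_t plus = a + 22u;      /* add   plus, #22                    */
    plus += a;                    /* adds  plus, arg  (Z set if zero)   */
    uint32_t myval = (plus == 0u)
                   ? a - 123u     /* subeq myval, #123                  */
                   : a - 213u;    /* subne myval, #213                  */
    return (int)myval;            /* wraps like the machine registers   */
}
```

A reference like this is handy for comparing against the disassembly: whatever the compiler emits for the inline assembly should produce the same result for every input.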

There are multiple problems with the first version of this code.

For one, if you specify it as separate asm() statements, the compiler is free to insert arbitrary code in between. In particular, that means the sub instructions, even though they don't modify the condition codes themselves, can fall foul of instructions the compiler chooses to insert that do.
Second, again due to the split of the instructions when specifying separate asm() statements, there's no guarantee the code generator will choose the same register to put myval in both times, the asm("r2") spec in the variable declaration notwithstanding.
Third, the assumption made in the first version that r0 contains the argument of the function is wrong; by the time it gets to the assembly block, the compiler might have chosen to move this argument to any other place. Worse still, you again have the split statements, and no guarantee is made as to what happens between two asm() statements. Even if you specify __asm__ __volatile__(...); the compiler treats two such blocks as independent entities.
Fourth, you're not telling the compiler that you're clobbering / assigning myval. It might have chosen to temporarily move it elsewhere because you're clobbering "r2" and, when returning, decide to restore it from ... (???).
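If instructions really have to live in separate statements, the first point suggests handing results from one statement to the next through C variables named in the operand lists, never through the condition flags or hard-coded registers. The following is a minimal sketch under that assumption; `add_setting_flags` and `myasmfunc_split` are hypothetical names, and the `#else` branch is a portable stand-in so the sketch also builds on non-ARM hosts.

```c
/* Only the add stays in assembly; its result leaves through an output operand. */
static int add_setting_flags(int plus, int arg)
{
#if defined(__arm__) && !defined(__thumb__)
    int sum;
    /* "cc" tells the compiler that this statement overwrites the flags. */
    __asm__ ("adds %0, %1, %2" : "=r"(sum) : "r"(plus), "r"(arg) : "cc");
    return sum;
#else
    /* Portable stand-in with the same wrap-around result. */
    return (int)((unsigned)plus + (unsigned)arg);
#endif
}

int myasmfunc_split(int arg)
{
    unsigned a = (unsigned)arg;
    int sum = add_setting_flags((int)(a + 22u), arg);
    /* The comparison is done in C, so nothing depends on flags
       surviving between two separate asm() statements. */
    unsigned myval = (sum == 0) ? a - 123u : a - 213u;
    return (int)myval;
}
```

The compiler is then free to schedule code between the two steps, because every value that crosses the boundary is visible to it as an ordinary variable.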

Just for the fun of it, here's the output of the first function for the following four cases:

default - gcc -c tst.c
optimized - gcc -O8 -c tst.c
with some unusual options - gcc -c -finstrument-functions tst.c
that plus optimization - gcc -c -O8 -finstrument-functions tst.c

Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e28db000    add fp, sp, #0  ; 0x0
   8:   e24dd00c    sub sp, sp, #12 ; 0xc
   c:   e50b0008    str r0, [fp, #-8]
  10:   e51b2008    ldr r2, [fp, #-8]
  14:   e2811016    add r1, r1, #22 ; 0x16
  18:   e0910000    adds    r0, r1, r0
  1c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  20:   124220d5    subne   r2, r2, #213    ; 0xd5
  24:   e1a03002    mov r3, r2
  28:   e1a00003    mov r0, r3
  2c:   e28bd000    add sp, fp, #0  ; 0x0
  30:   e8bd0800    pop {fp}
  34:   e12fff1e    bx  lr


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e1a03000    mov r3, r0
   4:   e2811016    add r1, r1, #22 ; 0x16
   8:   e0910000    adds    r0, r1, r0
   c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  10:   124220d5    subne   r2, r2, #213    ; 0xd5
  14:   e1a00003    mov r0, r3
  18:   e12fff1e    bx  lr


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e92d4830    push    {r4, r5, fp, lr}
   4:   e28db00c    add fp, sp, #12 ; 0xc
   8:   e24dd008    sub sp, sp, #8  ; 0x8
   c:   e1a0500e    mov r5, lr
  10:   e50b0010    str r0, [fp, #-16]
  14:   e59f0038    ldr r0, [pc, #56]   ; 54 
  18:   e1a01005    mov r1, r5
  1c:   ebfffffe    bl  0 
  20:   e51b2010    ldr r2, [fp, #-16]
  24:   e2811016    add r1, r1, #22 ; 0x16
  28:   e0910000    adds    r0, r1, r0
  2c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  30:   124220d5    subne   r2, r2, #213    ; 0xd5
  34:   e1a04002    mov r4, r2
  38:   e59f0014    ldr r0, [pc, #20]   ; 54 
  3c:   e1a01005    mov r1, r5
  40:   ebfffffe    bl  0 
  44:   e1a03004    mov r3, r4
  48:   e1a00003    mov r0, r3
  4c:   e24bd00c    sub sp, fp, #12 ; 0xc
  50:   e8bd8830    pop {r4, r5, fp, pc}
  54:   00000000    .word   0x00000000


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e92d4070    push    {r4, r5, r6, lr}
   4:   e1a0100e    mov r1, lr
   8:   e1a05000    mov r5, r0
   c:   e59f0028    ldr r0, [pc, #40]   ; 3c 
  10:   e1a0400e    mov r4, lr
  14:   ebfffffe    bl  0 
  18:   e2811016    add r1, r1, #22 ; 0x16
  1c:   e0910000    adds    r0, r1, r0
  20:   0242207b    subeq   r2, r2, #123    ; 0x7b
  24:   124220d5    subne   r2, r2, #213    ; 0xd5
  28:   e59f000c    ldr r0, [pc, #12]   ; 3c 
  2c:   e1a01004    mov r1, r4
  30:   ebfffffe    bl  0 
  34:   e1a00005    mov r0, r5
  38:   e8bd8070    pop {r4, r5, r6, pc}
  3c:   00000000    .word   0x00000000

As you can see, none of these does what you'd be hoping to see; the second version of the code, though, with gcc -c -O8 ... ends up as:

Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e1a03000    mov r3, r0
   4:   e2833016    add r3, r3, #22 ; 0x16
   8:   e0933000    adds    r3, r3, r0
   c:   0240007b    subeq   r0, r0, #123    ; 0x7b
  10:   124000d5    subne   r0, r0, #213    ; 0xd5
  14:   e12fff1e    bx  lr

and that is, rather closely, what you've specified in your assembly and what you're expecting.

Moral: Be explicit and exact with your constraints and your operand assignments, and keep interdependent lines of assembly within the same asm() block (make it a multi-line statement).
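As a sketch of that advice, the whole routine can go into a single statement with GCC's named operands (`%[name]`), which makes each constraint easier to check against the instruction that uses it. The function name `myasmfunc_named` is hypothetical. Because the first instruction now writes `plus` before `arg` has been read, this sketch marks the read-write operands as early-clobber (`+&r`), on the assumption that the compiler could otherwise place `arg` in the same register. The `#else` branch is a portable stand-in for non-ARM hosts.

```c
int myasmfunc_named(int arg)
{
    int myval = arg, plus = arg;
#if defined(__arm__) && !defined(__thumb__)
    __asm__ ("add   %[plus], %[plus], #22\n\t"
             "adds  %[plus], %[plus], %[arg]\n\t"
             "subeq %[myval], %[myval], #123\n\t"
             "subne %[myval], %[myval], #213\n\t"
             : [myval] "+&r"(myval),   /* read and written           */
               [plus]  "+&r"(plus)     /* written before arg is read */
             : [arg]   "r"(arg)        /* read only                  */
             : "cc");                  /* adds changes the flags     */
#else
    /* Portable stand-in with the same wrap-around arithmetic. */
    unsigned a = (unsigned)arg;
    unsigned sum = (unsigned)plus + 22u + a;
    myval = (int)((sum == 0u) ? a - 123u : a - 213u);
#endif
    return myval;
}
```

For example, `myasmfunc_named(-11)` takes the `subeq` path, since -11 + 22 + -11 is zero, while `myasmfunc_named(0)` takes the `subne` path.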
