How can I test AVX-512 instructions without hardware support?
I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers supports them. I tried using various disassemblers (from Visual Studio to online ones) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. It would also be nice to run some instructions and see their actual output.
So I'm wondering whether there is an online service that lets me assemble a small piece of x86-64 assembly code and run it, or step through it, on a specific processor (say, Intel's Sandy Bridge, Cannon Lake, etc.)?
Use Intel® Software Development Emulator (SDE) to run an executable on an emulated CPU that supports future instruction sets. It's freeware (a free download, but not open source), and is available for Linux, Windows, and I think also OS X.
https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for how to debug with it on Windows or Linux: SDE can work as a GDB remote, so you can run sde -debug -- ./your-program, then in another terminal run gdb ./your-program and use target remote :portnumber to connect to the SDE process so you can set breakpoints and single-step.
You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.
QEMU definitely has configurable instruction-set support, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandybridge). SDE can probably do the same thing.
You could even tell it to emulate something you won't find on real hardware, like AVX2 but not BMI1/2, if you want to verify that your CPUID checks don't assume anything implies anything else that isn't guaranteed.
Remember that both of these are essentially useless for performance testing; they only help you check the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect and doesn't model memory bottlenecks at all. (It only models the actual pipeline in some level of detail.)