2D Morton code encode/decode, 64 bits

Question:

How can I encode and decode Morton codes (z-order), given [x, y] as 32-bit unsigned integers, producing a 64-bit Morton code, and vice versa? I have xy2d and d2xy, but only for coordinates that are 16 bits wide, producing a 32-bit Morton number. I searched the net a lot but couldn't find anything. Please help.

If you can use architecture-specific instructions, you'll likely be able to accelerate the operation beyond what is possible with bit-twiddling hacks:

For example, if you write code for Intel Haswell and later CPUs, you can use the BMI2 instruction set, which contains the pext and pdep instructions. These can (among other great things) be used to build your functions.

Here is a complete example (tested with GCC):

#include <immintrin.h>
#include <stdint.h>

// on GCC, compile with option -mbmi2, requires Haswell or better.

uint64_t xy_to_morton (uint32_t x, uint32_t y)
{
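  // the 64-bit variants are needed for a 64-bit result (x86-64 builds only):
  // x goes to the even bit positions, y to the odd ones
  return _pdep_u64(x, 0x5555555555555555) | _pdep_u64(y, 0xaaaaaaaaaaaaaaaa);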
}

void morton_to_xy (uint64_t m, uint32_t *x, uint32_t *y)
{
  *x = _pext_u64(m, 0x5555555555555555);
  *y = _pext_u64(m, 0xaaaaaaaaaaaaaaaa);
}
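
For machines without BMI2, the same mapping can be written with plain shifts and masks. The following is a minimal portable sketch, not part of the original answer; the helper names spread_bits_32 and compact_bits_64 and the _portable suffixes are made up for illustration. It widens the well-known 16-bit "magic number" interleave to 64 bits.

```c
#include <stdint.h>

/* Spread the 32 bits of v over the even bit positions of a 64-bit word
   (hypothetical helper name, chosen for this sketch). */
static uint64_t spread_bits_32(uint32_t v)
{
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

/* Inverse of spread_bits_32: gather the even bits of x into 32 bits. */
static uint32_t compact_bits_64(uint64_t x)
{
    x &= 0x5555555555555555ULL;
    x = (x | (x >> 1))  & 0x3333333333333333ULL;
    x = (x | (x >> 2))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x >> 4))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x >> 8))  & 0x0000FFFF0000FFFFULL;
    x = (x | (x >> 16)) & 0x00000000FFFFFFFFULL;
    return (uint32_t)x;
}

/* x lands on the even bit positions, y on the odd ones. */
uint64_t xy_to_morton_portable(uint32_t x, uint32_t y)
{
    return spread_bits_32(x) | (spread_bits_32(y) << 1);
}

void morton_to_xy_portable(uint64_t m, uint32_t *x, uint32_t *y)
{
    *x = compact_bits_64(m);
    *y = compact_bits_64(m >> 1);
}
```

This uses the same bit layout as the pdep/pext masks above (x in the even bits, y in the odd bits), so it could serve as a fallback path or as a reference to test the intrinsic version against.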

If you have to support earlier CPUs or the ARM platform, not all is lost. You can still get help, at least for the xy_to_morton function, from instructions intended for cryptography.

A lot of CPUs support carry-less multiplication these days. On ARM it is vmul_p8 from the NEON instruction set. On x86 you'll find it as PCLMULQDQ from the CLMUL instruction set (available since 2010).

The trick here is that a carry-less multiplication of a number with itself returns a bit pattern that contains the original bits of the argument with zero bits interleaved. So it gives the same result as the pdep call with the 0x5555... mask shown above. E.g. it turns the following byte:

 +----+----+----+----+----+----+----+----+
 | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
 +----+----+----+----+----+----+----+----+

into:

 +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
 | 0  | b7 | 0  | b6 | 0  | b5 | 0  | b4 | 0  | b3 | 0  | b2 | 0  | b1 | 0  | b0 |
 +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
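
To see why squaring interleaves zeros, here is a small software model of carry-less multiplication for a single byte. It is only an illustrative sketch (the names clmul8 and carryless_square8 are made up here, and a loop like this is far slower than the hardware instruction). In a carry-less product, partial products are combined with XOR instead of addition, so every cross term bi*bj with i != j appears twice and cancels, and only the terms bi at bit position 2*i remain.

```c
#include <stdint.h>

/* Carry-less (polynomial over GF(2)) multiplication of two bytes:
   the usual shift-and-add loop, with XOR in place of addition. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t result = 0;
    for (int i = 0; i < 8; ++i) {
        if ((b >> i) & 1u)
            result ^= (uint16_t)((uint16_t)a << i);
    }
    return result;
}

/* Squaring moves bit i of v to bit 2*i; all odd bits end up zero. */
uint16_t carryless_square8(uint8_t v)
{
    return clmul8(v, v);
}
```

For instance, carryless_square8(0xFF) yields 0x5555, which is the pattern drawn in the second diagram with every b set to 1.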

Now you can build the xy_to_morton function as follows (shown here for the CLMUL instruction set):

#include <wmmintrin.h>
#include <stdint.h>

// on GCC, compile with option -mpclmul

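uint64_t carryless_square (uint32_t x)
{
  // Move x into the low 64 bits of an SSE register (upper half zeroed).
  // This avoids casting a uint64_t array to __m128i*, which is not
  // guaranteed to be 16-byte aligned. _mm_cvtsi64_si128 and
  // _mm_cvtsi128_si64 are SSE2 intrinsics available on x86-64.
  __m128i a = _mm_cvtsi64_si128 ((int64_t)x);
  a = _mm_clmulepi64_si128 (a, a, 0);
  return (uint64_t)_mm_cvtsi128_si64 (a);
}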

uint64_t xy_to_morton (uint32_t x, uint32_t y)
{
  return carryless_square(x)|(carryless_square(y) <<1);
}

_mm_clmulepi64_si128 generates a 128-bit result, of which we only use the lower 64 bits. So you can even improve on the version above and use a single _mm_clmulepi64_si128 to do the job.
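
One way the single-multiplication variant might look is sketched below. The packing scheme is an assumption of this sketch, not something spelled out above: y is placed in the upper 32 bits of the 64-bit operand and x in the lower 32 bits. Because carry-less squaring is linear, the two halves do not mix: the spread bits of x end up in the low 64 bits of the result and the spread bits of y in the high 64 bits. The function name is made up, and a slow software fallback is included only so the sketch also builds when PCLMUL is not enabled.

```c
#include <stdint.h>

#if defined(__PCLMUL__) && defined(__x86_64__)
#include <wmmintrin.h>

/* Hardware path: build with -mpclmul on x86-64. */
uint64_t xy_to_morton_single_clmul(uint32_t x, uint32_t y)
{
    /* Pack y above x so that one squaring spreads both coordinates. */
    uint64_t packed = ((uint64_t)y << 32) | x;
    __m128i v  = _mm_cvtsi64_si128((int64_t)packed);
    __m128i sq = _mm_clmulepi64_si128(v, v, 0);
    /* Low 64 bits: x on even positions. High 64 bits: y on even positions. */
    uint64_t xs = (uint64_t)_mm_cvtsi128_si64(sq);
    uint64_t ys = (uint64_t)_mm_cvtsi128_si64(_mm_srli_si128(sq, 8));
    return xs | (ys << 1);
}

#else

/* Software model of a 64x64 -> 128 bit carry-less multiplication,
   used here only as a stand-in when PCLMUL is unavailable. */
static void clmul64_soft(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i);
        }
    }
    *lo = l;
    *hi = h;
}

uint64_t xy_to_morton_single_clmul(uint32_t x, uint32_t y)
{
    uint64_t packed = ((uint64_t)y << 32) | x;
    uint64_t xs, ys;
    clmul64_soft(packed, packed, &xs, &ys);
    return xs | (ys << 1);
}

#endif
```

Whether this is actually faster than two separate multiplications depends on the CPU, since the saving on the multiply is partly offset by the extra shift and register moves; measuring on the target hardware is the only reliable way to tell.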

That is as good as you can get on mainstream platforms (e.g. modern ARM with NEON, and x86). Unfortunately, I don't know of any trick to speed up the morton_to_xy function using the cryptography instructions, and I tried really hard for several months.