Divide 64-bit integers as though the dividend is shifted left 64 bits, without having 128-bit types
Apologies for the confusing title. I'm not sure how to better describe what I'm trying to accomplish. I'm essentially trying to do the reverse of getting the high half of a 64-bit multiplication in C for platforms where
int64_t divHi64(int64_t dividend, int64_t divisor) {
    return ((__int128)dividend << 64) / (__int128)divisor;
}
isn't possible due to lacking support for __int128.
This can be done without a multi-word division
Suppose we want to compute ⌊2⁶⁴ × x⁄y⌋. The expression can be split into two terms: ⌊2⁶⁴⁄y⌋ × x plus (2⁶⁴ % y)⁄y × x.
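The transformed expression itself did not survive in the text. Judging from the two terms discussed in the following paragraphs, it is presumably the standard split of 2⁶⁴ into quotient and remainder with respect to y. This is a reconstruction, not a quotation:

```latex
\[
2^{64} = \left\lfloor \frac{2^{64}}{y} \right\rfloor y + \left(2^{64} \bmod y\right)
\quad\Longrightarrow\quad
\left\lfloor \frac{2^{64}\,x}{y} \right\rfloor
  = \left\lfloor \frac{2^{64}}{y} \right\rfloor x
  + \left\lfloor \frac{\left(2^{64} \bmod y\right) x}{y} \right\rfloor
\]
```

The first term is an integer, which is why it can be moved outside the floor.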
The first term is trivially done as ((-y)/y + 1)*x, as per the question How to compute 2⁶⁴/n in C?
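As a minimal sketch of why that expression works: unsigned negation wraps modulo 2⁶⁴, so -y is 2⁶⁴ - y, and (2⁶⁴ - y)/y + 1 equals ⌊2⁶⁴/y⌋. The function name pow2_64_div is hypothetical, and the sketch assumes y ≥ 2 (for y = 1 the true value, 2⁶⁴, does not fit in 64 bits and the result wraps to 0).

```c
#include <stdint.h>

/* Hypothetical helper: floor(2^64 / y) using only 64-bit arithmetic.
   Assumes y >= 2; y == 0 divides by zero and y == 1 wraps to 0. */
uint64_t pow2_64_div(uint64_t y) {
    /* -y wraps to 2^64 - y, so this is (2^64 - y)/y + 1 = floor(2^64 / y) */
    return (-y) / y + 1;
}
```

Multiplying the returned value by x then gives the first term of the split.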
The second term is equivalent to (2⁶⁴ % y)/y*x and is a little trickier. I've tried various ways, but all of them need 128-bit multiplication and 128/64 division if only integer operations are used. That can be done with the algorithms for calculating MulDiv64(a, b, c) = a*b/c in the questions below:
- Most accurate way to do a combined multiply-and-divide operation in 64-bit?
- How to multiply a 64 bit integer by a fraction in C++ while minimizing error?
- (a * b) / c MulDiv and dealing with overflow from intermediate multiplication
- How can I multiply and divide 64-bit ints accurately?
However, they may be slow, and if you have those functions you can calculate the whole expression more easily, as something like MulDiv64(x, UINT64_MAX, y) + x/y + something, without bothering with the above transformation.
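For reference, the integer-only route can also be sketched as a plain shift-and-subtract (restoring) long division. This is not one of the algorithms from the linked questions, just a simple baseline. The name divHi64_shift_subtract is hypothetical, and the sketch assumes x < y so that the quotient fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical baseline: floor((x * 2^64) / y) with 64-bit operations only.
   Precondition: x < y (otherwise the quotient does not fit in 64 bits). */
uint64_t divHi64_shift_subtract(uint64_t x, uint64_t y) {
    uint64_t rem = x;   /* running remainder, always kept below y */
    uint64_t quot = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t carry = rem >> 63;  /* bit shifted out of the 64-bit remainder */
        rem <<= 1;                   /* bring in the next dividend bit (always 0) */
        quot <<= 1;
        if (carry || rem >= y) {     /* the true 65-bit remainder is >= y */
            rem -= y;                /* wraps to the correct value when carry is set */
            quot |= 1;
        }
    }
    return quot;
}
```

It runs 64 iterations per call, so it is mainly useful as a correctness reference for faster variants.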
Using long double seems to be the easiest way if it has 64 bits of precision or more. The second term can then be computed as (2⁶⁴ % y)/(long double)y*x:
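uint64_t divHi64(uint64_t x, uint64_t y) {
    uint64_t mod_y = UINT64_MAX % y + 1;   // 2^64 % y, or y itself when y divides 2^64
    uint64_t result = ((-y)/y + 1)*x;      // first term: floor(2^64 / y) * x
    if (mod_y != y)                        // second term is zero when y divides 2^64
        result += (uint64_t)((mod_y/(long double)y)*x);
    return result;
}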
The overflow check was omitted for simplicity. A slight modification is needed if you need signed division.
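One way the omitted check might look is sketched below, together with a test of whether long double really has 64 significand bits. The names divHi64_checked and long_double_has_64_bits are hypothetical. The sketch assumes a binary floating-point format (FLT_RADIX == 2) and rejects y == 0 and x >= y, since the quotient does not fit in 64 bits in the latter case.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guarded variant of divHi64: returns false instead of
   computing when floor(2^64 * x / y) is undefined or does not fit. */
bool divHi64_checked(uint64_t x, uint64_t y, uint64_t *out) {
    if (y == 0 || x >= y)   /* division by zero, or quotient >= 2^64 */
        return false;
    uint64_t mod_y = UINT64_MAX % y + 1;   /* 2^64 % y, or y when y divides 2^64 */
    uint64_t result = ((-y)/y + 1)*x;      /* floor(2^64 / y) * x */
    if (mod_y != y)
        result += (uint64_t)((mod_y/(long double)y)*x);
    *out = result;
    return true;
}

/* True when long double carries at least 64 significand bits
   (assumes FLT_RADIX == 2, as on x87 extended precision). */
bool long_double_has_64_bits(void) {
    return LDBL_MANT_DIG >= 64;
}
```

On platforms where long double is the same as double (MSVC, for example), long_double_has_64_bits returns false and the floating-point term can be less accurate.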
If you're targeting 64-bit Windows with MSVC, which doesn't have __int128, there is now a 128/64-bit divide intrinsic (_udiv128) that simplifies the job significantly without a 128-bit integer type. You still need to handle overflow, though, because the div instruction raises an exception in that case.
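#include <errno.h>
#include <stdint.h>
#include <intrin.h>      // _umul128
#include <immintrin.h>   // _udiv128

uint64_t divHi64(uint64_t x, uint64_t y) {
    uint64_t high, remainder;
    uint64_t low = _umul128(UINT64_MAX, y, &high);  // high:low = UINT64_MAX * y
    if (x <= high /* && 0 <= low */)
        return _udiv128(x, 0, y, &remainder);       // (x:0) / y, i.e. (x << 64) / y
    // overflow case
    errno = EOVERFLOW;
    return 0;
}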
The overflow check above can be simplified to checking whether x < y, because the result overflows whenever x >= y.
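The equivalence of the two checks can be seen by writing out the product computed by _umul128; this short derivation assumes y ≥ 1:

```latex
\[
(2^{64} - 1)\,y = (y - 1)\cdot 2^{64} + (2^{64} - y)
\quad\Longrightarrow\quad
\mathit{high} = y - 1,\qquad \mathit{low} = 2^{64} - y
\]
\[
x \le \mathit{high} \iff x < y \iff \left\lfloor \frac{2^{64}\,x}{y} \right\rfloor < 2^{64}
\]
```

So the multiplication can be dropped and x < y tested directly.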
See also
- Efficient Multiply/Divide of two 128-bit Integers on x86 (no 64-bit)
- Efficient computation of 2**64 / divisor via fast floating-point reciprocal
Exhaustive tests on 16/16-bit division show that my solution works correctly for all cases. However, you do need double even though float has more than 16 bits of precision; otherwise a result that is too small by one is occasionally returned. This can be fixed by adding an epsilon value before truncating: (uint64_t)((mod_y/(long double)y)*x + epsilon). That means you'll need __float128 in gcc for precise 64/64-bit output if you don't correct the result with epsilon. (The -m128bit-long-double option only pads the x87 long double to 16 bytes and does not add precision.) However, __float128 is available on 32-bit targets, unlike __int128, which is supported only on 64-bit targets, so life will be a bit easier. Of course, you can use the function as-is if a very close result is all you need.
Below is the code I used for verification:
#include <thread>
#include <iostream>
#include <limits>
#include <climits>
#include <cstdint>
#include <mutex>

std::mutex print_mutex;  // serializes output from the worker threads

#define MAX_THREAD 8
#define NUM_BITS 27
#define CHUNK_SIZE (1ULL << NUM_BITS)

// typedef uint32_t T;
// typedef uint64_t T2;
// typedef double D;
typedef uint64_t T;
typedef unsigned __int128 T2;   // the type twice as wide as T
typedef long double D;
// typedef __float128 D;
const D epsilon = 1e-14;

T divHi(T x, T y) {
    T mod_y = std::numeric_limits<T>::max() % y + 1;
    T result = ((-y)/y + 1)*x;
    if (mod_y != y)
        result += (T)((mod_y/(D)y)*x + epsilon);
    return result;
}

void testdiv(T midpoint)
{
    T begin = midpoint - CHUNK_SIZE/2;
    T end   = midpoint + CHUNK_SIZE/2;
    for (T i = begin; i != end; i++)
    {
        T x = i & ((1 << NUM_BITS/2) - 1);
        T y = CHUNK_SIZE/2 - (i >> NUM_BITS/2);
        // if (y == 0)
        //     continue;
        auto q1 = divHi(x, y);
        T2 q2 = ((T2)x << sizeof(T)*CHAR_BIT)/y;  // reference result in the wider type
        if (q2 != (T)q2)
        {
            // std::lock_guard<std::mutex> guard(print_mutex);
            // std::cout << "Overflowed: " << x << '&' << y << '\n';
            continue;
        }
        else if (q1 != q2)
        {
            std::lock_guard<std::mutex> guard(print_mutex);
            std::cout << x << '/' << y << ": " << q1 << " != " << (T)q2 << '\n';
        }
    }
    std::lock_guard<std::mutex> guard(print_mutex);
    std::cout << "Done testing [" << begin << ", " << end << "]\n";
}

uint16_t divHi16(uint32_t x, uint32_t y) {
    uint32_t mod_y = std::numeric_limits<uint16_t>::max() % y + 1;
    int result = ((((1U << 16) - y)/y) + 1)*x;
    if (mod_y != y)
        result += (mod_y/(double)y)*x;
    return result;
}

void testdiv16(uint32_t begin, uint32_t end)
{
    for (uint32_t i = begin; i != end; i++)
    {
        uint32_t y = i & 0xFFFF;
        if (y == 0)
            continue;
        uint32_t x = i & 0xFFFF0000;
        uint32_t q2 = x/y;
        if (q2 > 0xFFFF) // overflowed
            continue;
        uint16_t q1 = divHi16(x >> 16, y);
        if (q1 != q2)
        {
            std::lock_guard<std::mutex> guard(print_mutex);
            std::cout << x << '/' << y << ": " << q1 << " != " << q2 << '\n';
        }
    }
}

int main()
{
    std::thread t[MAX_THREAD];
    for (int i = 0; i < MAX_THREAD; i++)
        t[i] = std::thread(testdiv, std::numeric_limits<T>::max()/MAX_THREAD*i);
    for (int i = 0; i < MAX_THREAD; i++)
        t[i].join();

    std::thread t2[MAX_THREAD];
    constexpr uint32_t length = std::numeric_limits<uint32_t>::max()/MAX_THREAD;
    uint32_t begin, end = 0;  // start at 0 so the first chunk [0, length) is covered too
    for (int i = 0; i < MAX_THREAD - 1; i++)
    {
        begin = end;
        end += length;
        t2[i] = std::thread(testdiv16, begin, end);
    }
    // the last thread takes the remainder of the 32-bit range
    t2[MAX_THREAD - 1] = std::thread(testdiv16, end, UINT32_MAX);
    for (int i = 0; i < MAX_THREAD; i++)
        t2[i].join();
    std::cout << "Done\n";
}