What is the most efficient and convenient way to generate Gaussian random numbers in Cython?
I am writing a cython application where I need to generate a Gaussian random variable on-the-fly in a tight nested loop. I would like to do this without introducing any extra dependencies, e.g., on GSL.
Here is a minimal version of how I currently do this on-the-fly with uniform random numbers:
from libc.stdlib cimport rand, RAND_MAX
import numpy as np

cdef double random_uniform():
    cdef double r = rand()
    return r/RAND_MAX

def my_function(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_uniform()
    return result
The above code is functionally equivalent to numpy.random.rand(n), and can be compiled with the following minimal setup file:
from setuptools import setup  # distutils was removed from the standard library in Python 3.12
from Cython.Build import cythonize
import numpy as np
setup(ext_modules=cythonize("example.pyx"), include_dirs=[np.get_include()])
# compile instructions:
# python setup.py build_ext --inplace
To answer this question, what I am looking for is the same kind of minimal solution for the functional equivalent of np.random.randn(n), again ideally with any dependency directly cimported from libc.stdlib for reasons of portability.
There is an example implementation on the Wikipedia entry for the Box-Muller algorithm, but I have had trouble implementing it due to the way that the constant epsilon is defined.
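For what it's worth, my reading is that the epsilon in that example only exists to keep the first uniform draw away from zero, so that the logarithm stays finite. Below is a minimal pure-Python sketch of the basic (trigonometric) Box-Muller transform under that assumption. The name `box_muller_pair` and the constant `EPSILON` are hypothetical, and Python's `random` module stands in for `rand()`.

```python
import math
import random
import sys

# Smallest positive normal double. Assumption: this plays the role of the
# "epsilon" constant, whose only job is to keep log(u1) finite.
EPSILON = sys.float_info.min

def box_muller_pair(rng=random):
    """Return two independent standard normal samples (basic Box-Muller)."""
    u1 = rng.random()
    # Redraw until u1 is safely above zero, so that log(u1) is finite.
    while u1 <= EPSILON:
        u1 = rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

In Cython the same guard could presumably be written as a `while` loop around the `rand()`-based uniform draw, with `log`, `sqrt`, `cos`, and `sin` cimported from `libc.math`.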
I created a function that generates Gaussian-distributed random numbers based on the polar version of the Box-Muller transformation, as described by the pseudocode here. (I originally found this on the page archived here.)
This method generates two Gaussian-distributed random numbers at a time. That means that to get full Cython speed, we need to figure out a way to pass two numbers around without turning them into Python objects. The most straightforward way to do so (that I can think of) is to pass the buffer in for direct manipulation by the generator. That's what my_gaussian_fast does, and it beats numpy by a modest margin.
from libc.stdlib cimport rand, RAND_MAX
from libc.math cimport log, sqrt
import numpy as np
import cython

cdef double random_uniform():
    cdef double r = rand()
    return r / RAND_MAX

cdef double random_gaussian():
    # Polar Box-Muller; returns one value and discards the other.
    cdef double x1, x2, w
    w = 2.0
    # Reject points outside the unit circle, and the origin (log(0) is undefined).
    while w >= 1.0 or w == 0.0:
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    return x1 * w

@cython.boundscheck(False)
cdef void assign_random_gaussian_pair(double[:] out, int assign_ix):
    # Writes both generated values directly into the buffer.
    cdef double x1, x2, w
    w = 2.0
    while w >= 1.0 or w == 0.0:
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    out[assign_ix] = x1 * w
    out[assign_ix + 1] = x2 * w

@cython.boundscheck(False)
def my_uniform(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_uniform()
    return result

@cython.boundscheck(False)
def my_gaussian(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_gaussian()
    return result

@cython.boundscheck(False)
def my_gaussian_fast(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n // 2):  # Int division leaves the trailing index unfilled if n is odd.
        assign_random_gaussian_pair(result, i * 2)
    if n % 2 == 1:
        result[n - 1] = random_gaussian()
    return result
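To check the logic without compiling anything, the pairwise filling scheme of my_gaussian_fast can be mirrored in pure Python. This is only a reference sketch: the names `polar_pair` and `gaussian_fill` are hypothetical, Python's `random` module stands in for `rand()`, and it will be far slower than the Cython version. It also rejects the point w == 0, where log(w) is undefined.

```python
import math
import random

def polar_pair(rng=random):
    """Return two standard normal samples via the polar Box-Muller method."""
    while True:
        x1 = 2.0 * rng.random() - 1.0
        x2 = 2.0 * rng.random() - 1.0
        w = x1 * x1 + x2 * x2
        # Accept only points strictly inside the unit circle, excluding the origin.
        if 0.0 < w < 1.0:
            break
    scale = math.sqrt(-2.0 * math.log(w) / w)
    return x1 * scale, x2 * scale

def gaussian_fill(n, rng=random):
    """Fill a list of length n two values at a time, like my_gaussian_fast."""
    out = [0.0] * n
    for i in range(n // 2):
        out[2 * i], out[2 * i + 1] = polar_pair(rng)
    if n % 2 == 1:
        # Odd length: generate one more pair and keep only its first value.
        out[n - 1] = polar_pair(rng)[0]
    return out
```

Comparing the sample mean and variance of `gaussian_fill(n)` against 0 and 1 for a large n is a cheap way to catch indexing mistakes, such as an unfilled trailing element.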
Tests. Here's a uniform benchmark:
In [3]: %timeit numpy.random.uniform(size=10000)
10000 loops, best of 3: 130 µs per loop
In [4]: %timeit numpy.array(example.my_uniform(10000))
10000 loops, best of 3: 85.4 µs per loop
So this is definitely faster than numpy for ordinary random numbers. And if we're smart about it, it's faster for Gaussian random numbers too:
In [5]: %timeit numpy.random.normal(size=10000)
1000 loops, best of 3: 393 µs per loop
In [6]: %timeit numpy.array(example.my_gaussian(10000))
1000 loops, best of 3: 542 µs per loop
In [7]: %timeit numpy.array(example.my_gaussian_fast(10000))
1000 loops, best of 3: 266 µs per loop
As confirmed by Robert Kern, numpy uses both values generated. my_gaussian throws one away; my_gaussian_fast uses both and stores them quickly. (See this answer's history for a naive my_gaussian_pair that tries to return the pair in a slow way.)
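If values are needed one at a time inside a loop rather than as a whole buffer, another way to avoid throwing a value away is to cache the second one and hand it out on the next call. The sketch below illustrates that idea in pure Python; the class `CachedGaussian` is hypothetical and is not taken from numpy's source. In Cython the same state could presumably live in a cdef class or in module-level C variables.

```python
import math
import random

class CachedGaussian:
    """Polar Box-Muller generator that keeps the spare value for the next call."""

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._spare = 0.0
        self._has_spare = False

    def next(self):
        # Hand out the cached value from the previous pair, if there is one.
        if self._has_spare:
            self._has_spare = False
            return self._spare
        while True:
            x1 = 2.0 * self._rng.random() - 1.0
            x2 = 2.0 * self._rng.random() - 1.0
            w = x1 * x1 + x2 * x2
            # Reject points outside the unit circle and the origin itself.
            if 0.0 < w < 1.0:
                break
        scale = math.sqrt(-2.0 * math.log(w) / w)
        self._spare = x2 * scale
        self._has_spare = True
        return x1 * scale
```

Usage: create one generator, for example `gen = CachedGaussian(random.Random(42))`, and call `gen.next()` wherever a single sample is needed. Every second call returns the cached value without drawing new uniforms.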