NumPy random choice to produce a 2-D array with all unique values
So I am wondering if there's a more efficient solution for generating a 2-D array with np.random.choice where each row has unique values.
For example, for an array with shape (3,4), we expect an output of:
# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])
This means that the values in each row must be unique, drawn from a range whose size equals the number of columns. So for each row in out, the integers should only fall between 0 and 3.
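To make the requirement concrete, here is a small checker that tests whether every row of a 2-D array contains each of 0..n-1 exactly once, where n is the number of columns. The name is_row_permutations is hypothetical (it is not a NumPy function), and this is only a sketch:

```python
import numpy as np

def is_row_permutations(arr):
    """Hypothetical helper: True if each row is a permutation of 0..n-1 (n = number of columns)."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        return False
    n = arr.shape[1]
    # A row holds unique values in range(n) exactly when sorting it gives [0, 1, ..., n-1].
    target = np.broadcast_to(np.arange(n), arr.shape)
    return bool(np.array_equal(np.sort(arr, axis=1), target))

expected = np.array([[0, 1, 3, 2],
                     [2, 3, 1, 0],
                     [1, 3, 2, 0]])
print(is_row_permutations(expected))        # True
print(is_row_permutations([[0, 0, 1, 2]]))  # False: 0 repeats and 3 is missing
```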
I know that I can achieve it by passing False to the replace argument. But I can only do it for each row and not for the whole matrix. For instance, I can do this:
>>> np.random.choice(4, size=(1,4), replace=False)
array([[0, 2, 3, 1]])
But when I try to do this:
>>> np.random.choice(4, size=(3,4), replace=False)
I get an error like this:
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
ValueError: Cannot take a larger sample than population when 'replace=False'
I assume it's because, given the size of the matrix, it's trying to draw 3 x 4 = 12 samples without replacement, but I'm only giving it a population of 4.
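That assumption is right: size=(3, 4) asks for 3 * 4 = 12 draws from a single population, and without replacement a population of 4 can supply at most 4 distinct values. A minimal sketch reproducing this; only the exception type is relied on, since the message text may differ between NumPy versions:

```python
import numpy as np

m, n = 3, 4
total_draws = m * n     # size=(3, 4) means 12 draws from one population of n values
print(total_draws > n)  # True: more draws requested than distinct values exist

try:
    np.random.choice(n, size=(m, n), replace=False)
except ValueError as err:
    print("ValueError:", err)

# A single row works: 4 draws without replacement from 4 values is a permutation.
row = np.random.choice(n, size=n, replace=False)
print(sorted(row.tolist()))  # [0, 1, 2, 3]
```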
I know that I can solve it by using a for-loop:
>>> a = (np.random.choice(4, size=4, replace=False) for _ in range(3))
>>> np.vstack(list(a))  # passing a generator directly to vstack is deprecated since NumPy 1.16
array([[3, 1, 2, 0],
       [1, 2, 0, 3],
       [2, 0, 3, 1]])
But I wanted to know: is there a workaround that avoids for-loops entirely? (I'm assuming that a Python-level loop might make it slower once the number of rows exceeds 1000. But as you can see, I am actually creating a generator in a, so I'm also not sure whether it makes a difference after all.)
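One trick I have used often is generating a random array and using argsort to get unique indices as the required unique numbers. The argsort of a row of length n always contains each index 0..n-1 exactly once, and since the random values are independent, every ordering is equally likely (exact ties between floats are negligibly rare). Thus, we could do -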
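Here is the core step on a single row, as an illustration. The seed 0 is arbitrary and only there for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
values = rng.random(4)          # 4 independent floats in [0, 1)
perm = values.argsort()         # positions that would sort `values`

print(values)
print(perm)
# Whatever the random values are, the positions 0..3 each appear exactly once.
print(sorted(perm.tolist()))                      # [0, 1, 2, 3]
# Indexing with `perm` puts the values in ascending order.
print(bool(np.all(np.diff(values[perm]) >= 0)))   # True
```

Applying the same thing to a 2-D array with argsort(axis=1) gives each row its own independent permutation, which is exactly what the function below does.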
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output
    return np.random.rand(m, n).argsort(axis=axis)
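The same trick can also be written against the newer Generator API (NumPy 1.17+), which takes an explicit rng and is reproducible when seeded. This is a sketch: the name random_choice_noreplace_rng and the optional k parameter are hypothetical additions, not part of the answer above. Keeping the first k columns of each row's permutation selects k distinct values out of range(n), mirroring np.random.choice(n, size=k, replace=False) once per row.

```python
import numpy as np

def random_choice_noreplace_rng(m, n, k=None, rng=None):
    """Hypothetical helper: m rows, each holding k distinct ints drawn from range(n).

    k defaults to n, in which case every row is a full permutation of 0..n-1.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = n if k is None else k
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Argsort each row of uniform floats -> an independent random permutation per row.
    perms = rng.random((m, n)).argsort(axis=1)
    # The first k entries of a uniform random permutation form a uniform k-subset.
    return perms[:, :k]

rng = np.random.default_rng(42)  # arbitrary seed
out = random_choice_noreplace_rng(3, 4, rng=rng)
print(out.shape)     # (3, 4)
sample = random_choice_noreplace_rng(1000, 10, k=3, rng=rng)
print(sample.shape)  # (1000, 3)
```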
Sample run -
In [98]: random_choice_noreplace(3,7)
Out[98]:
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])
In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]:
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
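The two sample runs can also be checked programmatically: with the default axis=-1 each row is a permutation of 0..n-1, while with axis=0 each column is a permutation of 0..m-1. The function is repeated so the snippet runs on its own; the checks are a sketch, not part of the original answer.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output
    return np.random.rand(m, n).argsort(axis=axis)

rows = random_choice_noreplace(3, 7)          # unique along each row
cols = random_choice_noreplace(5, 7, axis=0)  # unique along each column

# Each row, once sorted, must be exactly 0..6 (arange(7) broadcasts across rows).
print(bool((np.sort(rows, axis=1) == np.arange(7)).all()))          # True
# Each column, once sorted, must be exactly 0..4 (compare against a column vector).
print(bool((np.sort(cols, axis=0) == np.arange(5)[:, None]).all()))  # True
```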
Runtime test -
# Original approach
def loopy_app(m, n):
    a = (np.random.choice(n, size=n, replace=False) for _ in range(m))
    return np.vstack(list(a))  # vstack expects a sequence, not a generator
Timings -
In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop
In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
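On NumPy 1.20 or later, Generator.permuted is another loop-free option, and it is a different technique from the argsort trick: it shuffles each slice along the given axis independently, so permuting a tiled arange along axis=1 gives every row its own permutation. The helper name permuted_rows is hypothetical, and no timing figures are claimed for it; measure it with %timeit alongside the two approaches above if speed matters.

```python
import numpy as np

def permuted_rows(m, n, rng=None):
    """Hypothetical helper: m rows, each an independent permutation of 0..n-1."""
    rng = np.random.default_rng() if rng is None else rng
    base = np.tile(np.arange(n), (m, 1))  # every row starts as [0, 1, ..., n-1]
    # Generator.permuted with axis=1 shuffles each row independently.
    # (By contrast, rng.permutation(base, axis=1) reorders whole columns,
    #  applying the same shuffle to every row.)
    return rng.permuted(base, axis=1)

out = permuted_rows(1000, 100, rng=np.random.default_rng(7))  # arbitrary seed
print(out.shape)  # (1000, 100)
```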