`numpy.mean` with a tuple as the `axis` argument: not working with a masked array

Problem description:

I have a simple 3D array a1, and its masked analog a2:

import numpy

a1 = numpy.array([[[ 0.00,  0.00,  0.00],
                   [ 0.88,  0.80,  0.78],
                   [ 0.75,  0.78,  0.77]],

                  [[ 0.00,  0.00,  0.00],
                   [ 3.29,  3.29,  3.30],
                   [ 3.27,  3.27,  3.26]],

                  [[ 0.00,  0.00,  0.00],
                   [ 0.41,  0.42,  0.40],
                   [ 0.42,  0.43,  0.41]]])


a2 = numpy.ma.masked_equal(a1, 0.)

I want to take the mean of this array along several axes at once (this is a peculiar, undocumented use of the axis argument in numpy.mean, see e.g. here for an example):

numpy.mean(a1, axis=(0, 1))

This works fine with a1, but I get the following error with the masked array a2:

TypeError: tuple indices must be integers, not tuple

I get the same error with the masked version, numpy.ma.mean(a2, axis=(0, 1)), and also if I unmask the array through a2[a2.mask]=0.

I am using a tuple for the axis argument in numpy.mean because it is not actually hardcoded: this command is applied to arrays with potentially different numbers of dimensions, and the tuple is adapted accordingly.

Problem encountered with numpy versions 1.9.1 and 1.9.2.

For a MaskedArray argument, numpy.mean calls MaskedArray.mean, which doesn't support a tuple axis argument. You can get the correct behavior by reimplementing MaskedArray.mean in terms of operations that do support tuples for axis:

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    counts = numpy.logical_not(a.mask).sum(axis=axis)
    if counts.shape:
        sums = a.filled(0).sum(axis=axis)
        mask = (counts == 0)
        return numpy.ma.MaskedArray(data=sums * 1. / counts, mask=mask, copy=False)
    elif counts:
        # Return scalar, not array
        return a.filled(0).sum(axis=axis) * 1. / counts
    else:
        # Masked scalar
        return numpy.ma.masked

Or, if you're willing to rely on MaskedArray.sum working with a tuple axis (which you likely are, given that you're already relying on undocumented behavior of numpy.mean):

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

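    # MaskedArray.sum skips masked entries and masks fully masked slices.
    sums = a.sum(axis=axis)
    # Number of unmasked entries contributing to each sum.
    counts = numpy.logical_not(a.mask).sum(axis=axis)
    return sums * 1. / counts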

Here we rely on MaskedArray.sum to handle the mask.
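As a rough illustration of the sum-and-count idea behind both functions, here is a minimal sketch applied to the arrays from the question. The names valid, counts, sums and result are only illustrative, and numpy.maximum(counts, 1) is an addition of this sketch to avoid a division-by-zero warning for fully masked slices; it is not part of the functions above.

```python
import numpy

a1 = numpy.array([[[0.00, 0.00, 0.00],
                   [0.88, 0.80, 0.78],
                   [0.75, 0.78, 0.77]],

                  [[0.00, 0.00, 0.00],
                   [3.29, 3.29, 3.30],
                   [3.27, 3.27, 3.26]],

                  [[0.00, 0.00, 0.00],
                   [0.41, 0.42, 0.40],
                   [0.42, 0.43, 0.41]]])
a2 = numpy.ma.masked_equal(a1, 0.)

axis = (0, 1)

# True where an entry is NOT masked; getmaskarray always returns a full
# boolean array, even when the mask is nomask.
valid = numpy.logical_not(numpy.ma.getmaskarray(a2))

# Plain ndarray reductions accept a tuple for axis.
counts = valid.sum(axis=axis)        # unmasked entries per output element
sums = a2.filled(0).sum(axis=axis)   # masked entries contribute 0

# Output elements with no unmasked entries stay masked.
result = numpy.ma.MaskedArray(sums / numpy.maximum(counts, 1),
                              mask=(counts == 0))

print(result)                        # mean over the 6 unmasked values per column
print(numpy.mean(a1, axis=axis))     # unmasked mean also counts the zeros
```

Each column here has six unmasked values out of nine, so the masked mean is 1.5 times the plain mean of a1.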

I have only lightly tested these functions; before using them, make sure they actually work, and write some tests. For example, if the output is 0-dimensional and there are no masked values, whether the output is a 0D MaskedArray or a scalar depends on whether the input mask is nomask or an array of all False. This matches the default MaskedArray.mean behavior, but it may not be what you want; I suspect the default behavior is a bug.
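One possible way to write such a test, as a sketch only: compare a sum-and-count mean against a reference that folds the reduced axes into a single trailing axis and then calls the ordinary single-axis MaskedArray.mean. The helpers masked_mean and reference_mean below are hypothetical names made up for this example; masked_mean is a condensed stand-in for the functions above, not a copy of them.

```python
import numpy


def masked_mean(a, axis):
    """Hypothetical stand-in: mean over a tuple of axes via sums and counts."""
    counts = numpy.logical_not(numpy.ma.getmaskarray(a)).sum(axis=axis)
    sums = a.filled(0).sum(axis=axis)
    # Guard the division; fully masked slices are masked in the result anyway.
    return numpy.ma.MaskedArray(sums / numpy.maximum(counts, 1),
                                mask=(counts == 0))


def reference_mean(a, axis):
    """Reference: collapse the reduced axes into one, then use a single axis."""
    axis = tuple(ax % a.ndim for ax in axis)
    kept = tuple(i for i in range(a.ndim) if i not in axis)
    # Move the axes to reduce to the end, then flatten them into one axis.
    moved = a.transpose(kept + axis)
    kept_shape = tuple(a.shape[i] for i in kept)
    flat = moved.reshape(kept_shape + (-1,))
    return flat.mean(axis=-1)


# Test data: one fully masked column and one partially masked column.
data = numpy.arange(24, dtype=float).reshape(2, 3, 4)
mask = numpy.zeros(data.shape, dtype=bool)
mask[:, :, 0] = True     # every entry feeding output element 0 is masked
mask[0, 1, 2] = True     # a single masked entry feeding output element 2
a = numpy.ma.MaskedArray(data, mask=mask)

got = masked_mean(a, axis=(0, 1))
want = reference_mean(a, axis=(0, 1))

assert numpy.array_equal(numpy.ma.getmaskarray(got),
                         numpy.ma.getmaskarray(want))
assert numpy.allclose(got.filled(0), want.filled(0))
```

The same comparison can be repeated for other axis tuples and for inputs whose mask is nomask, which is where the scalar versus 0D MaskedArray difference mentioned above would show up.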