使用C API访问NumPy数组的视图

问题描述：

在我用C ++编写的Python扩展模块中，我使用以下代码段将NumPy数组转换为

In a Python extension module I've written in C++, I use the following snippet of code to convert a NumPy array into an Armadillo array for use in the C++ portion of the code:

static arma::mat convertPyArrayToArma(PyArrayObject* pyarr, int nrows, int ncols)
{
    // Check if the dimensions are what I expect.
    if (!checkPyArrayDimensions(pyarr, nrows, ncols)) throw WrongDimensions();
    
    const std::vector<int> dims = getPyArrayDimensions(pyarr);  // Gets the dimensions using the API
    
    PyArray_Descr* reqDescr = PyArray_DescrFromType(NPY_DOUBLE);
    if (reqDescr == NULL) throw std::bad_alloc();
    
    // Convert the array to Fortran-ordering as required by Armadillo
    PyArrayObject* cleanArr = (PyArrayObject*)PyArray_FromArray(pyarr, reqDescr, 
                                                                NPY_ARRAY_FARRAY);
    if (cleanArr == NULL) throw std::bad_alloc();
    reqDescr = NULL;  // The new reference from DescrFromType was stolen by FromArray

    double* dataPtr = static_cast<double*>(PyArray_DATA(cleanArr));
    arma::mat result (dataPtr, dims[0], dims[1], true);  // this copies the data from cleanArr
    Py_DECREF(cleanArr);
    return result;
}

问题是当我通过NumPy数组的 view (即my_array[:, 3])时，它似乎无法正确处理基础C数组的步幅.根据输出，函数似乎接收到的数组pyarr实际上是完整的基本数组，而不是视图(或者至少当我使用PyArray_DATA访问数据时，我似乎正在获得指向完整的基本数组).如果我改为将此视图的副本(即my_array[:, 3].copy())传递给该函数，则它可以按预期工作，但是我不想每次都记得这样做.

The problem is that when I pass this a view of a NumPy array (i.e. my_array[:, 3]), it doesn't seem to handle the strides of the underlying C array correctly. Based on the output, it seems like the array pyarr received by the function is actually the full base array, and not the view (or at least when I access the data using PyArray_DATA, I seem to be getting a pointer to the full base array). If I instead pass this function a copy of the view (i.e. my_array[:, 3].copy()), it works as expected, but I don't want to have to remember to do that every time.

那么，有没有办法让PyArray_FromArray只复制我想要的矩阵的一部分?我尝试使用标志NPY_ARRAY_ENSURECOPY，但这没有帮助.

So, is there a way to make PyArray_FromArray copy only the slice of the matrix I want? I tried using the flag NPY_ARRAY_ENSURECOPY, but that didn't help.

如评论中所建议，这是一个完整的工作示例:

As suggested in the comments, here is a full working example:

在文件example.cpp中:

#define NPY_NO_DEPRECATED_API NPY_1_9_API_VERSION

extern "C" {
    #include <Python.h>
    #include <numpy/arrayobject.h>
}
#include <exception>
#include <cassert>
#include <string>
#include <type_traits>
#include <map>
#include <vector>
#include <armadillo>

class WrongDimensions : public std::exception
{
public:
    WrongDimensions() {}
    const char* what() const noexcept { return msg.c_str(); }

private:
    std::string msg = "The dimensions were incorrect";
};

class NotImplemented : public std::exception
{
public:
    NotImplemented() {}
    const char* what() const noexcept { return msg.c_str(); }

private:
    std::string msg = "Not implemented";
};

class BadArrayLayout : public std::exception
{
public:
    BadArrayLayout() {}
    const char* what() const noexcept { return msg.c_str(); }

private:
    std::string msg = "The matrix was not contiguous";
};

static const std::vector<npy_intp> getPyArrayDimensions(PyArrayObject* pyarr)
{
    npy_intp ndims = PyArray_NDIM(pyarr);
    npy_intp* dims = PyArray_SHAPE(pyarr);
    std::vector<npy_intp> result;
    for (int i = 0; i < ndims; i++) {
        result.push_back(dims[i]);
    }
    return result;
}

/* Checks the dimensions of the given array. Pass -1 for either dimension to say you don't
 * care what the size is in that dimension. Pass dimensions (X, 1) for a vector.
 */
static bool checkPyArrayDimensions(PyArrayObject* pyarr, const npy_intp dim0, const npy_intp dim1)
{
    const auto dims = getPyArrayDimensions(pyarr);
    assert(dims.size() <= 2 && dims.size() > 0);
    if (dims.size() == 1) {
        return (dims[0] == dim0 || dim0 == -1) && (dim1 == 1 || dim1 == -1);
    }
    else {
        return (dims[0] == dim0 || dim0 == -1) && (dims[1] == dim1 || dim1 == -1);
    }
}

template<typename outT>
static arma::Mat<outT> convertPyArrayToArma(PyArrayObject* pyarr, int nrows, int ncols)
{
    if (!checkPyArrayDimensions(pyarr, nrows, ncols)) throw WrongDimensions();

    int arrTypeCode;
    if (std::is_same<outT, uint16_t>::value) {
        arrTypeCode = NPY_UINT16;
    }
    else if (std::is_same<outT, double>::value) {
        arrTypeCode = NPY_DOUBLE;
    }
    else {
        throw NotImplemented();
    }

    const auto dims = getPyArrayDimensions(pyarr);
    if (dims.size() == 1) {
        outT* dataPtr = static_cast<outT*>(PyArray_DATA(pyarr));
        return arma::Col<outT>(dataPtr, dims[0], true);
    }
    else {
        PyArray_Descr* reqDescr = PyArray_DescrFromType(arrTypeCode);
        if (reqDescr == NULL) throw std::bad_alloc();
        PyArrayObject* cleanArr = (PyArrayObject*)PyArray_FromArray(pyarr, reqDescr, NPY_ARRAY_FARRAY);
        if (cleanArr == NULL) throw std::bad_alloc();
        reqDescr = NULL;  // The new reference from DescrFromType was stolen by FromArray

        outT* dataPtr = static_cast<outT*>(PyArray_DATA(cleanArr));
        arma::Mat<outT> result (dataPtr, dims[0], dims[1], true);  // this copies the data from cleanArr
        Py_DECREF(cleanArr);
        return result;
    }
}

static PyObject* convertArmaToPyArray(const arma::mat& matrix)
{
    npy_intp ndim = matrix.is_colvec() ? 1 : 2;
    npy_intp nRows = static_cast<npy_intp>(matrix.n_rows);  // NOTE: This narrows the integer
    npy_intp nCols = static_cast<npy_intp>(matrix.n_cols);
    npy_intp dims[2] = {nRows, nCols};

    PyObject* result = PyArray_SimpleNew(ndim, dims, NPY_DOUBLE);
    if (result == NULL) throw std::bad_alloc();

    double* resultDataPtr = static_cast<double*>(PyArray_DATA((PyArrayObject*)result));
    for (int i = 0; i < nRows; i++) {
        for (int j = 0; j < nCols; j++) {
            resultDataPtr[i * nCols + j] = matrix(i, j);
        }
    }

    return result;
}

extern "C" {
    // An example function that takes a NumPy array and converts it to
    // an arma::mat and back. This should return the array unchanged.
    static PyObject* example_testFunction(PyObject* self, PyObject* args)
    {
        PyArrayObject* myArray = NULL;

        if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &myArray)) {
            return NULL;
        }

        PyObject* output = NULL;
        try {
            arma::mat myMat = convertPyArrayToArma<double>(myArray, -1, -1);
            output = convertArmaToPyArray(myMat);
        }
        catch (const std::bad_alloc&) {
            PyErr_NoMemory();
            Py_XDECREF(output);
            return NULL;
        }
        catch (const std::exception& err) {
            PyErr_SetString(PyExc_RuntimeError, err.what());
            Py_XDECREF(output);
            return NULL;
        }

        return output;
    }

    static PyMethodDef example_methods[] =
    {
        {"test_function", example_testFunction, METH_VARARGS, "A test function"},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef example_module = {
       PyModuleDef_HEAD_INIT,
       "example",   /* name of module */
       NULL, /* module documentation, may be NULL */
       -1,       /* size of per-interpreter state of the module,
                    or -1 if the module keeps state in global variables. */
       example_methods
    };

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        import_array();

        PyObject* m = PyModule_Create(&example_module);
        if (m == NULL) return NULL;
        return m;
    }
}

和setup.py进行编译:

from setuptools import setup, Extension
import numpy as np

example_module = Extension(
    'example',
    include_dirs=[np.get_include(), '/usr/local/include'],
    libraries=['armadillo'],
    library_dirs=['/usr/local/lib'],
    sources=['example.cpp'],
    language='c++',
    extra_compile_args=['-std=c++11', '-mmacosx-version-min=10.10'],
    )

setup(name='example',
      ext_modules=[example_module],
      )

现在假设我们有示例数组

Now assume we have the example array

a = np.array([[ 1, 2, 3, 4, 5, 6], 
              [ 7, 8, 9,10,11,12], 
              [13,14,15,16,17,18]], dtype='float64')

该函数对于像a[:, :3]这样的多维切片似乎工作正常，并且它返回的矩阵没有改变，正如我所期望的那样.但是，如果我给它一个一维切片，除非复制，否则我将得到错误的组件:

The function seems to work fine for multidimensional slices like a[:, :3], and it returns the matrix unaltered as I'd expect. But if I give it a single-dimensional slice, I get the wrong components unless I make a copy:

>>> example.test_function(a[:, 3])
array([ 4.,  5.,  6.])

>>> example.test_function(a[:, 3].copy())
array([  4.,  10.,  16.])

答

数组的视图只是同一数据数组的另一个信息包装器. Numpy不在此处复制任何数据.如果有用，仅调整用于解释数据的信息，并移动指向数据的指针.

The view of the array is just another information-wrapper for the same data array. Numpy doesn't copy any data here. Only the information for the interpretation of the data are adjusted and the pointer to the data moved if useful.

在您的代码中，您假设向量a[:, 3]的数据表示为内存中的向量，对于NPY_ARRAY_CARRAY和NPY_ARRAY_FARRAY而言，它们没有什么不同.但是，只有在创建数组本身的(fortran排序的)副本之后，您才能获得这种表示形式.

In your code you're assuming, that the data of a vector a[:, 3] is represented as a vector in memory which wouldn't differ for NPY_ARRAY_CARRAY and NPY_ARRAY_FARRAY. But this representation you only get after creating a (fortran ordered) copy of the array itself.

为使其正常工作，我对您的convertPyArrayToArma()函数进行了一些修改以创建副本，即使它是矢量也是如此:

To make it work I modified your convertPyArrayToArma() function a little to create a copy even if it's a vector:

template<typename outT>
static arma::Mat<outT> convertPyArrayToArma(PyArrayObject* pyarr, int nrows, int ncols)
{
    if (!checkPyArrayDimensions(pyarr, nrows, ncols)) throw WrongDimensions();

    int arrTypeCode;
    if (std::is_same<outT, uint16_t>::value) {
        arrTypeCode = NPY_UINT16;
    }
    else if (std::is_same<outT, double>::value) {
        arrTypeCode = NPY_DOUBLE;
    }
    else {
        throw NotImplemented();
    }

    PyArray_Descr* reqDescr = PyArray_DescrFromType(arrTypeCode);
    if (reqDescr == NULL) throw std::bad_alloc();
    PyArrayObject* cleanArr = (PyArrayObject*)PyArray_FromArray(pyarr, reqDescr, NPY_ARRAY_FARRAY);
    if (cleanArr == NULL) throw std::bad_alloc();
    reqDescr = NULL;  // The new reference from DescrFromType was stolen by FromArray

    const auto dims = getPyArrayDimensions(pyarr);
    outT* dataPtr = static_cast<outT*>(PyArray_DATA(cleanArr));

    // this copies the data from cleanArr
    arma::Mat<outT> result;
    if (dims.size() == 1) {
        result = arma::Col<outT>(dataPtr, dims[0], true);
    }
    else {
        result = arma::Mat<outT>(dataPtr, dims[0], dims[1], true);
    }

    Py_DECREF(cleanArr);
    return result;
}

使用C API访问NumPy数组的视图

相关推荐