OpenCV: CUDA context initialization differs between methods

Problem description:

I'm working on a simple C++ program to evaluate the performance of some OpenCV GPU methods (cv::cuda). I am using OpenCV 3.1 on Ubuntu 15 (with CUDA 7.5) and a GeForce 770.

I previously read that the CUDA environment needs to be initialized up front to avoid a slow first call. So I initialize my program with cv::cuda::getDevice() and cv::cuda::setDevice().

Then I test two methods:
- cv::cuda::resize() (factor 0.5)
- cv::cuda::meanStdDev()

Initialization takes 400 ms. Then resizing takes 2 or 3 ms, which is fine. But meanStdDev takes 476 ms! If I run two successive meanStdDev calls, the second one is much faster (3 ms).

I really don't understand why the initialization has an effect on resize() but not on meanStdDev().

I compiled OpenCV with -DCUDA_ARCH_BIN=3.0. I also tried -DCUDA_ARCH_PTX="", but the problem is still the same.

Thanks for your help.

Pierre

#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include "opencv2/cudawarping.hpp"
#include "opencv2/cudaarithm.hpp"

using namespace std;

int main(int argc, char *argv[]) 
{

    double t_init_cuda = (double)cv::getTickCount();
    int CudaDevice;
    if(cv::cuda::getCudaEnabledDeviceCount()==0)
    {
        cerr<<endl<<"ERROR: NO CudaEnabledDevice"<<endl;
        exit(2);
    }
    else
    {
        CudaDevice = cv::cuda::getDevice();
        cv::cuda::setDevice(CudaDevice);
    }
    t_init_cuda = ((double)cv::getTickCount() - t_init_cuda)/cv::getTickFrequency() * 1000;
    cout<<endl<<"\t*T_INIT_CUDA="<<t_init_cuda<<"ms\n";

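    // Require an image path on the command line before touching argv[1]
    if (argc < 2) exit(1);
    cv::Mat src = cv::imread(argv[1], 0); // 0 = load as grayscale
    if (!src.data) exit(1);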
    cv::cuda::GpuMat d_src(src);


    //CV::CUDA::RESIZE
    cv::cuda::GpuMat d_dst;
    double factor = 0.5;
    double t_gpu_resize = cv::getTickCount();
    cv::cuda::resize(d_src, d_dst, cv::Size( (int) ((float) (d_src.cols)*factor) , (int) ((float) (d_src.rows)*factor)), 0, 0, cv::INTER_AREA);
    t_gpu_resize = ((double)cv::getTickCount() - t_gpu_resize)/cv::getTickFrequency() * 1000;
    cout<<endl<<"D_SRC="<<d_src.rows<<"x"<<d_src.cols<<" => D_DST="<<d_dst.rows<<"x"<<d_dst.cols<<endl;
    cout<<endl<<"\t*T_GPU_RESIZE="<<t_gpu_resize<<"ms\n";

    //CV::CUDA::MEANSTDDEV
    double t_meanstddev = (double)cv::getTickCount();
    cv::Scalar mean, stddev;
    std::vector<cv::cuda::GpuMat> d_src_split;
    cv::cuda::split(d_src, d_src_split);
    cv::cuda::meanStdDev (d_src_split[0], mean, stddev); 
    t_meanstddev = ((double)cv::getTickCount() - t_meanstddev)/cv::getTickFrequency() * 1000.0;
    cout<<endl<<"mean="<<mean.val[0]<<" | stddev="<<stddev.val[0]<<endl;    
    cout<<endl<<"\t*T_GPU_MEANSTDDEV="<<t_meanstddev<<"ms\n";

    return 0;
}


When you call the same function twice:

1- The first call allocates new memory on the device for the buffer it needs (according to the OpenCV wiki).

2- The second call reuses the memory that was already allocated, so it is fast.
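
The allocate-once, reuse-later pattern described in the two points above can be sketched in plain C++, with no GPU involved. This is only an illustration under stated assumptions: `ScratchBuffer` and `count_allocations` are hypothetical names, the storage is ordinary host memory, and none of this is OpenCV's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scratch buffer: it only allocates when the
// requested size exceeds what it already holds, otherwise it reuses storage.
class ScratchBuffer {
public:
    // Returns a pointer to at least `bytes` bytes of storage.
    unsigned char* acquire(std::size_t bytes) {
        assert(bytes > 0);
        if (bytes > storage_.size()) {
            storage_.resize(bytes);  // the expensive step, paid on first use
            ++allocations_;
        }
        return storage_.data();
    }

    // Number of times acquire() had to grow the storage.
    int allocations() const { return allocations_; }

private:
    std::vector<unsigned char> storage_;
    int allocations_ = 0;
};

// Calls acquire() `calls` times with the same size on a fresh buffer and
// reports how many real allocations happened.
int count_allocations(int calls, std::size_t bytes) {
    ScratchBuffer buf;
    for (int i = 0; i < calls; ++i) {
        buf.acquire(bytes);
    }
    return buf.allocations();
}
```

Calling `count_allocations(2, 1024)` reports a single allocation: the second request of the same size is served from the storage created by the first.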

I pulled this function from the OpenCV source so you can see why:

void cv::cuda::meanStdDev(InputArray _src, OutputArray _dst, Stream& stream)
{
    if (!deviceSupports(FEATURE_SET_COMPUTE_13))
        CV_Error(cv::Error::StsNotImplemented, "Not sufficient compute capebility");

    const GpuMat src = getInputMat(_src, stream);

    CV_Assert( src.type() == CV_8UC1 );

    GpuMat dst = getOutputMat(_dst, 1, 2, CV_64FC1, stream);

    NppiSize sz;
    sz.width  = src.cols;
    sz.height = src.rows;

    int bufSize;
#if (CUDA_VERSION <= 4020)
    nppSafeCall( nppiMeanStdDev8uC1RGetBufferHostSize(sz, &bufSize) );
#else
    nppSafeCall( nppiMeanStdDevGetBufferHostSize_8u_C1R(sz, &bufSize) );
#endif

    BufferPool pool(stream);
    GpuMat buf = pool.getBuffer(1, bufSize, CV_8UC1); // <--- this line creates a new GpuMat

    NppStreamHandler h(StreamAccessor::getStream(stream));

    nppSafeCall( nppiMean_StdDev_8u_C1R(src.ptr<Npp8u>(), static_cast<int>(src.step), sz, buf.ptr<Npp8u>(), dst.ptr<Npp64f>(), dst.ptr<Npp64f>() + 1) );

    syncOutput(dst, _dst, stream);
}

And this function:

GpuMat cv::cuda::BufferPool::getBuffer(int rows, int cols, int type)
{
    GpuMat buf(allocator_);
    buf.create(rows, cols, type);
    return buf;
}
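
If you want to separate the one-time cost from the steady-state cost in your own benchmark, one option is to time a warm-up call on its own and then time the following calls. Here is a minimal sketch using only the standard library. `time_ms` and `LazyOp` are hypothetical names, and `LazyOp` merely simulates a one-time setup cost with a 20 ms busy-wait; it is not OpenCV code.

```cpp
#include <cassert>
#include <chrono>

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical operation with a one-time setup cost on its first call.
struct LazyOp {
    bool ready = false;
    int setups = 0;

    void operator()() {
        if (!ready) {
            // Simulate expensive first-call setup by busy-waiting ~20 ms.
            auto until = std::chrono::steady_clock::now() +
                         std::chrono::milliseconds(20);
            while (std::chrono::steady_clock::now() < until) {
            }
            ready = true;
            ++setups;
        }
        // Steady-state work would go here.
    }
};
```

Timing `time_ms(op)` twice on the same `LazyOp` instance shows the setup cost only in the first measurement, which is the same shape as the 476 ms versus 3 ms you observed.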

I hope this helps.