Will array_view.synchronize_async wait for parallel_for_each to complete?

Problem description:

If I have a concurrency::array_view being operated on in a concurrency::parallel_for_each loop, my understanding is that I can continue other tasks on the CPU while the loop is executing:

using namespace Concurrency;

array_view<int> av; // assume av has been bound to host data of size 'number'
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av
});

// do some stuff on the CPU while we wait

av.synchronize(); // wait for the parallel_for_each loop to finish and copy the data

But what if, instead of waiting for the parallel_for_each to finish, I want to start copying data back from the GPU as soon as possible? Will the following work?

using namespace Concurrency;

array_view<int> av; // assume av has been bound to host data of size 'number'
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av
});

// I know that we won't be waiting to synch when I call this, but will we be waiting here
// until the data is available on the GPU end to START copying?
completion_future waitOnThis = av.synchronize_async();

// will this line execute before parallel_for_each has finished processing, or only once it
// has finished processing and the data from "av" has started copying back?

waitOnThis.wait();

I read about this topic on The Moth, but after reading the following I'm not really any wiser:


Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.
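
If I've read that correctly, the behaviour being described would look something like the sketch below (the vector of ones and the doubling kernel are just placeholders I've made up), though I'm still not sure how synchronize_async fits in:

#include <amp.h>
#include <vector>
using namespace concurrency;

void sketch()
{
  std::vector<int> data(1024, 1);
  array_view<int> av(static_cast<int>(data.size()), data);

  // "some_code_A" would go here.

  parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
  {
    av[idx] *= 2; // the kernel runs on the GPU
  });
  // parallel_for_each has returned, but the kernel may still be running.

  // "some_code_B": CPU work that does not touch av runs concurrently
  // with the kernel.
  int unrelated = 0;
  for (int i = 0; i < 1000; ++i) unrelated += i;

  // The first host-side access to av blocks until the kernel has finished
  // and the results are available, which is why it "looks" synchronous.
  int first = av[0];
}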


I don't like the way this has been explained. A better way to think about it is that parallel_for_each queues work for the GPU, so it returns almost immediately. There are numerous ways that your CPU-side code can then block until the queued work is complete; for example, explicitly calling synchronize, or accessing data from one of the array_view instances used within the parallel_for_each:

using namespace concurrency;

array_view<int> av; // assume av has been bound to host data of size 'number'
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // Queue (or schedule if you like) some intense computations on av
});

Host code can execute now. The AMP computations may or may not have started. If the code here accesses av then it will block until the work on the GPU is complete and the data in av has been written and can be synchronized with the host memory.

This is a future, so it is also a scheduled task; it is not guaranteed to execute at any particular point. Once it has been scheduled, it will block the thread it runs on until av is correctly synchronized with the host memory (as above).

completion_future waitOnThis = av.synchronize_async();

More host code can execute here. If the host code accesses av then it will block until the parallel_for_each has completed (as above). At some point the runtime will execute the future and block until av has synchronized with the host memory. If it is writable and has been changed then it will be copied back to the host memory.

waitOnThis.wait();

The call to wait will block until the future has completed (prior to calling wait there is no guarantee that anything has actually executed). At this point you are guaranteed that the GPU calculations are complete and that av can be accessed on the CPU.
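
Pulling those fragments together, a complete version of the asynchronous pattern might look something like this (the function wrapper, the backing vector and the kernel body are placeholders of mine, not code from the question):

#include <amp.h>
#include <vector>
using namespace concurrency;

void async_pattern(int number)
{
  std::vector<int> data(number);
  array_view<int> av(number, data);

  parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
  {
    av[idx] = idx[0] * 2; // placeholder GPU computation
  });

  // Ask the runtime to synchronize av back to host memory asynchronously.
  completion_future waitOnThis = av.synchronize_async();

  // Independent CPU work can run here, provided it does not touch av;
  // touching av would block, exactly as described above.

  waitOnThis.wait(); // blocks until the synchronization (and the kernel) is done

  // av and its backing vector are now safe to use on the CPU.
}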

Having said all that, adding the waitOnThis future seems to be overcomplicating matters:

array_view<int> av; // assume av has been bound to host data of size 'number'
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av on the GPU
});

// do some independent CPU computation here.

av.synchronize();

// do some computation on the CPU that relies on av here.
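
Written out as a complete function (again, the backing vector and the kernel are placeholder choices of mine), the simple pattern is:

#include <amp.h>
#include <vector>
using namespace concurrency;

void simple_pattern(int number)
{
  std::vector<int> data(number);
  array_view<int> av(number, data);

  parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
  {
    av[idx] = idx[0] * idx[0]; // placeholder GPU computation
  });

  // Do some independent CPU computation here; it overlaps with the kernel
  // as long as it does not touch av.

  av.synchronize(); // blocks until the kernel is done and av is copied back

  // The results are now visible both through av and through the backing
  // vector, so CPU code that relies on them can run here.
  int first = data[0];
}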

The MSDN docs aren't very good on this topic. The following blog post is better. There are some other posts on the async APIs on the same blog.