MTLBuffer allocation + CPU/GPU synchronization

Problem description:

I am using a Metal Performance Shader (MPSImageHistogram) to compute a result into an MTLBuffer, which I then read back, process on the CPU, and display via MTKView. The MTLBuffer output from the shader is small (~4K bytes), so I am allocating a new MTLBuffer object for every render pass, and there are at least 30 renders per second, one for every video frame.

calculation = MPSImageHistogram(device: device, histogramInfo: &histogramInfo)
let bufferLength = calculation.histogramSize(forSourceFormat: MTLPixelFormat.bgra8Unorm)
let buffer = device.makeBuffer(length: bufferLength, options: .storageModeShared)
let commandBuffer = commandQueue?.makeCommandBuffer()

calculation.encode(to: commandBuffer!, sourceTexture: metalTexture!, histogram: buffer!, histogramOffset: 0)
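
// Completion handlers must be registered before the command buffer is committed.
commandBuffer?.addCompletedHandler({ (cmdBuffer) in
    // The GPU has finished, so the shared buffer is safe to read on the CPU.
    let dataPtr = buffer!.contents().assumingMemoryBound(to: UInt32.self)
    ...
    ...

})
commandBuffer?.commit()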

My questions:

  1. Is it okay to make a new buffer every time using device.makeBuffer(..), or is it better to statically allocate a few buffers and reuse them? If reuse is better, how do we synchronize CPU/GPU reads and writes on these buffers?

Another, unrelated question: is it okay to draw the results in an MTKView on a non-main thread? Or must MTKView drawing happen only on the main thread (even though I read that Metal is truly multithreaded)?

  1. Allocations are somewhat expensive, so I'd recommend a reusable buffer scheme. My preferred way to do this is to keep a mutable array (queue) of buffers, enqueuing a buffer when the command buffer that used it completes (or in your case, after you've read back the results on the CPU), and allocating a new buffer when the queue is empty and you need to encode more work. In the steady state, you'll find that this scheme will rarely allocate more than 2-3 buffers total, assuming your frames are completing in a timely fashion. If you need this scheme to be thread-safe, you can protect access to the queue with a mutex (implemented with a dispatch_semaphore).
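The queue-based reuse scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the answer: `ReusablePool`, `acquire`, `release`, and `allocationCount` are hypothetical names, and the pool is kept generic so it does not depend on Metal. A binary `DispatchSemaphore` stands in for the mutex mentioned above.

```swift
import Dispatch

/// Hypothetical helper (not part of Metal): a thread-safe queue of reusable resources.
final class ReusablePool<Resource> {
    private var available: [Resource] = []              // idle resources, oldest first
    private let lock = DispatchSemaphore(value: 1)      // binary semaphore used as a mutex
    private let makeResource: () -> Resource            // called only when the queue is empty
    private(set) var allocationCount = 0                // total allocations, for diagnostics

    init(makeResource: @escaping () -> Resource) {
        self.makeResource = makeResource
    }

    /// Returns an idle resource, allocating a new one only if none is available.
    func acquire() -> Resource {
        lock.wait()
        defer { lock.signal() }
        if !available.isEmpty {
            return available.removeFirst()
        }
        allocationCount += 1
        return makeResource()
    }

    /// Puts a resource back in the queue once the CPU has finished reading it.
    func release(_ resource: Resource) {
        lock.wait()
        defer { lock.signal() }
        available.append(resource)
    }
}
```

With Metal, the resource type would be MTLBuffer and the factory closure would call device.makeBuffer(length:options:). You would call acquire() before encoding the histogram and call release(_:) at the end of the completed handler, after the contents have been read. Because a buffer only returns to the queue after the GPU work that wrote it has completed and the CPU has read it, no further CPU/GPU synchronization on the buffer itself should be needed.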

You can use another thread to encode rendering work that draws into a drawable vended by an MTKView, as long as you follow standard multithreading precautions. Remember that while command queues are thread-safe (in the sense that you can create and encode to multiple command buffers from the same queue concurrently), command buffers themselves and encoders are not. I'd advise you to profile the single-threaded case and only introduce the complication of multi-threading if/when absolutely necessary.