How to use async_work_group_copy in OpenCL?


I would like to understand how to correctly use the async_work_group_copy() call in OpenCL. Let's look at a simplified example:

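    __kernel void test(__global float *x) {
        // GROUP_SIZE is assumed to be a compile-time constant equal to the work-group size
        __local float xcopy[GROUP_SIZE];
        int globalid = get_global_id(0);
        int localid = get_local_id(0);
        // x + globalid - localid is the first element belonging to this work-group
        event_t e = async_work_group_copy(xcopy, x + globalid - localid, GROUP_SIZE, 0);
        wait_group_events(1, &e);
    }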

The reference says: "Perform an async copy of num_elements gentype elements from src to dst. The async copy is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined."

But this does not clarify my questions ...

I would like to know whether the following assumptions are correct:

  1. The call to async_work_group_copy() must be executed by all work-items in the work-group.
  2. The call must be made so that the source address is identical for all work-items and points to the first element of the memory area to be copied.
  3. Since my source address is relative to the global work-item ID of the first work-item in the work-group, I have to subtract the local ID to get the same address for all work-items.
  4. Is the third parameter really the number of elements (not the size in bytes)?

    Bonus questions:

    a. Can I just use barrier(CLK_LOCAL_MEM_FENCE) instead of wait_group_events() and ignore the return value? If so, would that possibly be faster?

    b. Does a local copy also make sense when running on a CPU, or is it just overhead, since the cores share a cache anyway?

    Regards, Stephen


    One of the main reasons this function exists is to let the driver/kernel compiler copy the memory efficiently without the developer having to make assumptions about the hardware.

    You describe the memory you need copied as if it were a single-threaded copy, and async_work_group_copy gets it done for you using the parallel hardware.

    For your specific questions:

    1. I have never seen async_work_group_copy used by only some of the work-items in a group. I have always assumed that this is because it is required. I think the blocking nature of wait_group_events forces all work-items to take part in the copy.

    2. Yes. The source (and destination) addresses need to be the same for all work-items.

    3. You can subtract your local ID to get the correct address, but I find that basing the address on the group ID (get_group_id) solves this problem as well.

    4. Yes, the third parameter (num_elements) is the number of elements, not the size in bytes.

      a. No. The copy is event-based: your barrier will be reached almost immediately by the work-items, and the data will not necessarily have been copied by then. This makes sense because some OpenCL hardware might not even use the compute units to perform the actual copy operation.

      b. I think CPU OpenCL implementations may end up using the L1 cache when you use local memory. The only way to know for sure whether a local copy performs better is to benchmark your application with different settings.
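The address arithmetic from points 3 and 4 can be checked on the host. The following is a minimal sketch in plain C, not kernel code: it emulates the 1-D index math, assuming no global work offset, a global size that is a multiple of the group size, and a local size equal to GROUP_SIZE. The function names, the value 64 for GROUP_SIZE, and the num_groups parameter are hypothetical and chosen only for illustration. In a kernel, the group-based form would read `x + get_group_id(0) * GROUP_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation of the 1-D index arithmetic; this is not kernel code.
   GROUP_SIZE mirrors the compile-time constant used in the question
   (the value 64 is an assumption for illustration). */
#define GROUP_SIZE 64

/* Base offset as computed in the question: global ID minus local ID. */
static size_t base_from_ids(size_t global_id, size_t local_id) {
    return global_id - local_id;
}

/* Base offset as suggested in the answer: group ID times group size. */
static size_t base_from_group(size_t group_id) {
    return group_id * GROUP_SIZE;
}

/* Returns 1 if both formulas agree for every work-item of a 1-D range
   with no global work offset; num_groups is a hypothetical launch size. */
static int bases_agree(size_t num_groups) {
    for (size_t group_id = 0; group_id < num_groups; ++group_id) {
        for (size_t local_id = 0; local_id < GROUP_SIZE; ++local_id) {
            size_t global_id = group_id * GROUP_SIZE + local_id;
            if (base_from_ids(global_id, local_id) != base_from_group(group_id)) {
                return 0;
            }
        }
    }
    return 1;
}

/* The third argument of the copy counts elements, so the number of bytes
   moved for float data is the element count times sizeof(float). */
static size_t bytes_copied(size_t num_elements) {
    return num_elements * sizeof(float);
}

/* Small self-check exercising both helpers. */
static void self_check(void) {
    assert(bases_agree(16));
    assert(bytes_copied(GROUP_SIZE) == GROUP_SIZE * sizeof(float));
}
```

If the kernel is launched with a nonzero global work offset, get_global_id includes that offset, so the two formulas would no longer give the same base and the group-based form would need the offset added explicitly.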
