Wrox Programmer Forums
BOOK: Professional CUDA C Programming
This is the forum to discuss the Wrox book Professional CUDA C Programming by John Cheng, Max Grossman, Ty McKercher; ISBN: 978-1-118-73932-7
You are currently viewing the BOOK: Professional CUDA C Programming section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book, please contact http://hub.wiley.com
 
February 23rd, 2015, 06:09 AM
gpuRecursiveReduce2

When I run the downloaded code for this example, shown on page 131, the test fails.

My understanding is that
a) when a child grid is launched, its view of global memory is consistent with its parent thread only, and
b) if __syncthreads() is called before the child grid is launched, the child grid's view of global memory will be consistent with all threads in the block its parent thread belongs to.

In this example, threads in the child grid access locations that are updated by threads in blocks other than the parent's block. Does the test fail because there is no way to synchronize threads from different blocks in this example? That is, in the following code
idata[threadIdx.x] += idata[threadIdx.x + iStride];
thread 0 in block 1, on the first recursion, will access memory locations 512 and 640. However, there is no guarantee that thread 0 in block 1 from the previous recursion level will have updated memory location 512 before this happens.
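
To make the addresses above concrete, here is a small host-side sketch in plain C (no GPU needed) that only models the indexing of that line. It rests on assumptions that are not stated in this post: each block works on its own slice of 512 elements starting at blockIdx.x * 512, and iStride starts at 256 and halves on every recursion, which is what the locations 512 and 640 suggest. The names reduce_access, print_levels and elems_per_block are made up for illustration and are not from the book's code.

```c
#include <assert.h>
#include <stdio.h>

/* The two global-memory indices one thread touches when it runs
 *   idata[threadIdx.x] += idata[threadIdx.x + iStride];
 */
typedef struct {
    int dst; /* element that is read and then overwritten */
    int src; /* element that is only read */
} AccessPair;

/* Assumption: block b owns the slice starting at b * elems_per_block. */
AccessPair reduce_access(int block_idx, int thread_idx, int stride,
                         int elems_per_block)
{
    /* Only threads 0 .. stride-1 take part at a given level. */
    assert(thread_idx >= 0 && thread_idx < stride);

    int base = block_idx * elems_per_block;
    AccessPair p = { base + thread_idx, base + thread_idx + stride };
    return p;
}

/* Print what thread 0 of one block touches at every recursion level. */
void print_levels(int block_idx, int elems_per_block)
{
    for (int stride = elems_per_block / 2; stride >= 1; stride /= 2) {
        AccessPair p = reduce_access(block_idx, 0, stride, elems_per_block);
        printf("stride %3d: idata[%d] += idata[%d]\n",
               stride, p.dst, p.src);
    }
}
```

Calling print_levels(1, 512) would show thread 0 of block 1 touching 512 and 768 at stride 256, then 512 and 640 at stride 128. Under these assumptions the stride-128 level reads location 512, which the stride-256 level of the same block writes, so the result depends on that write being visible first. The sketch says nothing about timing; it only shows which addresses overlap between levels.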








