When I run the downloaded code for the example shown on page 131, the test fails.
My understanding is that
a) when a child grid is launched, its view of global memory is consistent only with that of its parent thread, and
b) if __syncthreads() is called before the child grid is launched, the child threads' view of global memory will be consistent with that of all threads in the block its parent belongs to.
In this example, threads in the child grid access locations that are updated by threads in blocks other than the one containing their parent thread. Does the test fail because, in this example, there is no way to synchronize threads from different blocks? That is, in the following code
idata[threadIdx.x] += idata[threadIdx.x + iStride];
thread 0 in block 1 on the first recursion will try to access memory locations 512 and 640. However, there is no guarantee that thread 0 in block 1 from the previous recursion will have updated memory location 512 before this happens.