[parsec-users] Porting Bodytrack on GP-GPUs -- Problems and Issues

Jim Dempsey jim at quickthreadprogramming.com
Tue Aug 9 09:03:44 EDT 2011

Does your GPGPU support double precision?
nVidia GTX570 and similar cards do.
ATI FireStream and other high-end cards do.
And nVidia Tesla does.
On a GPGPU, double precision tends to be slower than single precision,
especially in the trig functions. But it may still be faster than transporting
the data out, calculating on the CPU, and transporting the data back in.
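For a sense of scale: single precision has a 24-bit significand, so its machine
epsilon is 2^-23 (about 1.2e-7). Differences around the 6th-7th decimal place
between a CPU double-precision sin/cos and a GPU single-precision one are
therefore expected rather than a bug. The Python sketch below emulates single
precision by rounding through `struct`. It is an approximation: it models only
the rounding of the input and the output, not the additional few-ulp error of a
particular GPU's sinf. Note also that building with nvcc's -use_fast_math
replaces sinf with the faster but less accurate __sinf intrinsic.

```python
import math
import struct

# Machine epsilon for IEEE single precision: spacing of floats just above 1.0.
FLT_EPSILON = 2.0 ** -23


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sin_f32(x: float) -> float:
    """Emulated single-precision sin: round the argument and the result to float32.

    Only models rounding; a real GPU sinf may add a few ulps of its own error.
    """
    return f32(math.sin(f32(x)))


def max_abs_diff(angles):
    """Largest |sin_double - sin_single| over the given angles (radians)."""
    return max(abs(math.sin(a) - sin_f32(a)) for a in angles)


angles = [i * 0.001 for i in range(1, 3000)]
print(FLT_EPSILON, max_abs_diff(angles))
```

The difference printed for angles in (0, 3) radians stays on the order of
1e-7, i.e. exactly the 6th-7th decimal place.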
Note: if you pipeline the CPU and GPGPU stages in parallel, the transport
time is overlapped with other calculation time and may not affect the
throughput of the pipeline (although latency will be affected).
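The pipelining idea can be sketched as a bounded queue between a producer (the
CPU stage) and a consumer (the GPU stage), so the CPU prepares batch k+1 while
batch k is being processed. This is only a structural sketch: `cpu_stage` and
`gpu_stage` are hypothetical placeholders, and Python threads will not actually
run CPU-bound work in parallel because of the GIL. In a CUDA port the same
overlap would typically come from pinned host buffers plus cudaMemcpyAsync on
separate streams.

```python
import math
import queue
import threading


def cpu_stage(batch):
    """Hypothetical CPU work, e.g. particle generation plus sin/cos."""
    return [(math.sin(a), math.cos(a)) for a in batch]


def gpu_stage(prepared):
    """Hypothetical stand-in for the GPU stages; here it sums sin^2 + cos^2."""
    return sum(s * s + c * c for s, c in prepared)


def run_pipeline(batches, depth=2):
    """Two-stage pipeline: the producer fills the queue while the consumer drains it.

    The bounded queue (maxsize=depth) plays the role of double buffering:
    the producer can run at most `depth` batches ahead of the consumer.
    """
    q = queue.Queue(maxsize=depth)
    results = []

    def producer():
        for b in batches:
            q.put(cpu_stage(b))
        q.put(None)  # sentinel: no more batches

    t = threading.Thread(target=producer)
    t.start()
    while True:
        item = q.get()
        if item is None:
            break
        results.append(gpu_stage(item))
    t.join()
    return results


batches = [[0.1 * i + j for i in range(10)] for j in range(5)]
print(run_pipeline(batches))
```

The results are identical to running the two stages back to back; only the
scheduling changes, which is why throughput can improve while the latency of
any single batch does not.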
I haven't looked at the calculation part of the code, but I am aware that
there are some "tricks" you can use to effectively increase the precision.
Examples of these are:
1) Move the observation point from which you measure the arc angle between
particles such that the angle differences tend to be large.
2) When performing calculations containing both large and small numbers,
see if you can apply a bias to the large number(s) so that it becomes
small-ish. This avoids losing precision bits in the binary mantissa.
Example:
     result = func(large, small)
becomes
     result = func(large - bias, small) + bias
The above is oversimplified but illustrates the process.
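A minimal numeric illustration of the bias trick, assuming `func` is a mean,
for which mean(x) = mean(x - bias) + bias holds exactly in real arithmetic.
Single precision is emulated in Python by rounding every intermediate through
`struct`; the data set and the bias of 4096.0 are made up for the
demonstration. Without the bias the running sum grows to about 1.7e7, where the
float32 spacing is 1.0 or more, so the small offsets are rounded away. With the
bias the running sum stays below 128 and every step is exact.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_f32(values, bias=0.0):
    """Mean accumulated entirely in (emulated) single precision.

    `bias` is subtracted from each value before accumulation and added back
    at the end: mean(x) = mean(x - bias) + bias.
    """
    b = f32(bias)
    acc = f32(0.0)
    for v in values:
        acc = f32(acc + f32(f32(v) - b))  # one float32 subtract, one float32 add
    return f32(f32(acc / len(values)) + b)


# 4096 samples near 4096.0 with offsets that are multiples of 1/256;
# every value is exactly representable in float32.
values = [4096.0 + (k % 16) / 256.0 for k in range(4096)]

exact = sum(values) / len(values)       # double-precision reference: 4096.029296875
naive = mean_f32(values)                # large running sum drops the offsets
biased = mean_f32(values, bias=4096.0)  # small running sum keeps them
print(exact, naive, biased)
```

In practice the bias would be some value close to the data (for example a
reference position or the first sample); 4096.0 is just a convenient choice
here.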
Jim Dempsey


From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of aftab
Sent: Tuesday, August 09, 2011 2:16 AM
To: parsec-users at lists.cs.princeton.edu
Subject: [parsec-users] Porting Bodytrack on GP-GPUs -- Problems and Issues

Dear All,
I am trying to port the Bodytrack application to GP-GPUs for my MS
thesis. I have working code, but my tracking results are off.
When I investigated the code further, I found that the differences between
the sin/cos calculations on the CPU and on the GPU are causing the problem.
For some particles, the sin/cos difference (an error at the 6th-7th decimal
place) accumulates in later stages (body geometry calculations, projection
calculations, error term calculations). In the edge error term calculation
I get one extra sample point, which changes the error weight, so the final
normalized weight for that particle differs at the 4th decimal place (a
large error). And this happens in the initialization stage of the particle
filter (weight calculation).
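The extra sample point is a discontinuity effect: any count computed with
floor() jumps by one when its argument crosses an integer, so two lengths that
agree to 8 significant digits can still yield different counts. The sketch
below uses a hypothetical sampling rule (one point every `step` pixels) and
made-up lengths, not Bodytrack's actual edge-sampling code.
`near_sample_boundary` is likewise a hypothetical debugging helper for flagging
particles whose count could flip between the CPU and GPU versions.

```python
import math


def num_edge_samples(length, step):
    """Hypothetical edge sampling: points at 0, step, 2*step, ... up to length."""
    return math.floor(length / step) + 1


def near_sample_boundary(length, step, tol=1e-5):
    """True if length/step lies within tol of an integer, i.e. the count may flip."""
    q = length / step
    return abs(q - round(q)) < tol


cpu_len = 12.0000001  # made-up edge length, e.g. from double-precision sin/cos
gpu_len = 11.9999999  # same edge, e.g. from single-precision sin/cos
print(num_edge_samples(cpu_len, 1.0), num_edge_samples(gpu_len, 1.0))
```

The two lengths differ by only 2e-7, yet one edge gets 13 sample points and
the other 12, which then changes every quantity computed from those samples.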

This in turn introduces error in the following iterations: in the particle
generation stage of the next iteration, a wrong particle is selected, which
introduces further error, and the final estimate for a frame ends up very
different from the CPU estimate.
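The same sensitivity shows up in resampling. The sketch below uses systematic
resampling (an assumption; Bodytrack's own resampling scheme may differ) and
shows that moving 2^-12 (about 0.00024, a change at the 4th decimal place) of
weight from particle 1 to particle 0 changes which particles survive. The
weights and the uniform draw `u0` are made up.

```python
import bisect
import itertools


def systematic_resample(weights, u0):
    """Systematic resampling: select N indices using the points (u0 + k) / N.

    `weights` must sum to 1 and `u0` is a single uniform draw in [0, 1).
    """
    n = len(weights)
    cdf = list(itertools.accumulate(weights))
    cdf[-1] = 1.0  # guard against round-off in the running sum
    # bisect_right returns the first index whose cumulative weight exceeds u.
    return [bisect.bisect_right(cdf, (u0 + k) / n) for k in range(n)]


weights_cpu = [0.25, 0.25, 0.25, 0.25]
weights_gpu = [0.25 + 2**-12, 0.25 - 2**-12, 0.25, 0.25]
u0 = 0.0004
print(systematic_resample(weights_cpu, u0))  # [0, 1, 2, 3]
print(systematic_resample(weights_gpu, u0))  # [0, 0, 2, 3]
```

Particle 1 is dropped and particle 0 is duplicated. Since the next frame is
generated from the surviving particles, one changed index like this is enough
for the CPU and GPU runs to diverge from then on.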

I have implemented the following stages on the GPU because they are the
most compute-intensive stages of the application.

1- Body Geometry
2- Projection Calculation
3- Error Terms (Inside Error Term, Edge Error Term)

When I move the sin/cos calculation to the CPU, the execution time I gain in
the GPU stages is cancelled out by the particle generation stage, because
there I have to rearrange the data (copy from the CPU data structure to the
GPU data structure, plus the sin/cos calculation) into the layout that makes
the GPU implementation fast. Because of this, the overall application
speedup is not very interesting.
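One way to keep the rearrangement cost down is to do the layout conversion and
the sin/cos evaluation in a single pass over the particles, writing directly
into the buffers that will be copied to the GPU. The sketch below assumes each
particle is simply a list of joint angles of equal length and uses a
hypothetical structure-of-arrays layout (`[j * n + i]` for angle j of particle
i) stored as float32 via Python's `array('f')`. Bodytrack's real particle
structures are more involved.

```python
import math
from array import array


def pack_particles(particles):
    """Convert per-particle angle lists (array of structures) into flat float32
    sin/cos buffers (structure of arrays), evaluating sin/cos in the same pass.

    Element [j * n + i] holds angle j of particle i, so consecutive GPU threads
    (one per particle) would read consecutive addresses. Assumes every particle
    has the same number of angles.
    """
    n = len(particles)
    m = len(particles[0]) if n else 0
    sin_buf = array("f", [0.0]) * (n * m)  # float32 storage, zero-filled
    cos_buf = array("f", [0.0]) * (n * m)
    for i, angles in enumerate(particles):
        for j, a in enumerate(angles):
            # Evaluated in double precision on the CPU, rounded to float32 on store.
            sin_buf[j * n + i] = math.sin(a)
            cos_buf[j * n + i] = math.cos(a)
    return sin_buf, cos_buf


particles = [[0.0, math.pi / 2], [math.pi, 0.5]]
s, c = pack_particles(particles)
print(list(s), list(c))
```

Because sin/cos are computed on the CPU in double precision before rounding,
the GPU stages would see the same values as the CPU code up to one float32
rounding, and the separate copy pass disappears.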

Can anyone help me with this issue? My thesis is stuck because of it.


Best Regards

Aftab Hussain
Research Assistant,
High Performance Computing Lab,
NUST School of Electrical Engineering and Computer Science
