[parsec-users] Porting Bodytrack on GP-GPUs -- Problems and Issues
msinclair at wisc.edu
Tue Aug 9 08:30:26 EDT 2011
What version of sine and cosine are you using in your GPU kernels?
Are you using the native ones? Those are less precise than the
slower, non-native ones. So, if you're using the native ones, you
might try the non-native ones, even though it will hurt performance,
and see if they solve your issue. Also, there was a talk at GTC 2010
that dealt with the imprecision of the sin/cos functions in CUDA, how
it affected some astronomy calculations, and how the authors got
around it. I can send a link to it if you think that would be helpful.
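To get a feel for the size of the discrepancy, here is a minimal Python sketch that emulates single precision on the CPU by rounding through a 32-bit float. It is an illustration only: `to_f32`, `sin_f32`, and `max_sin_error` are hypothetical helper names, and the result is a best-case single-precision sine. Fast intrinsics such as CUDA's `__sinf` or OpenCL's `native_sin` can be less accurate than this, so the gap on a real GPU may be larger.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sin_f32(x: float) -> float:
    """Best-case single-precision sine: float32 input, float32 result."""
    return to_f32(math.sin(to_f32(x)))

def max_sin_error(angles):
    """Largest |single - double| sine difference over the given angles."""
    return max(abs(sin_f32(a) - math.sin(a)) for a in angles)

# Angles covering roughly (0, 2*pi) in steps of 0.001 rad.
angles = [i * 0.001 for i in range(1, 6284)]
err = max_sin_error(angles)
print(f"max |sin_f32 - sin_f64| = {err:.3e}")
```

On this grid the difference stays below 1e-6, which is consistent with a discrepancy in the 6th-7th decimal place.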
Also, what version of CUDA are you using (I'm assuming you're using
CUDA)? If you're using 4.0+, then you might be able to look into
overlapping memory transfers with kernel execution, which would
alleviate some of the performance bottlenecks you're seeing. If
you're using OpenCL, are you setting the memory transfers to be
blocking or non-blocking?
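As a back-of-the-envelope illustration of why overlapping helps, the sketch below models the data as equal chunks with a fixed copy time and kernel time per chunk. The function names and the timings are made up for illustration, and the model assumes ideal conditions (one copy overlapping one kernel at a time, no launch overhead). In CUDA the host buffers would also need to be page-locked for `cudaMemcpyAsync` to overlap with kernels.

```python
def serialized_time(n_chunks: int, copy_ms: float, kernel_ms: float) -> float:
    """Blocking transfers: every chunk is copied, then processed, one after another."""
    return n_chunks * (copy_ms + kernel_ms)

def overlapped_time(n_chunks: int, copy_ms: float, kernel_ms: float) -> float:
    """Idealized two-stage pipeline: the copy of chunk i+1 runs while the
    kernel for chunk i executes, so only the first copy and the last kernel
    are exposed and the slower stage sets the pace in between."""
    return copy_ms + kernel_ms + (n_chunks - 1) * max(copy_ms, kernel_ms)

# Hypothetical numbers: 4 chunks, 2 ms to copy and 3 ms to process each one.
blocking = serialized_time(4, 2.0, 3.0)
pipelined = overlapped_time(4, 2.0, 3.0)
print(f"blocking: {blocking} ms, overlapped: {pipelined} ms")
```

With these made-up timings the blocking version takes 20 ms and the overlapped one 14 ms. The benefit shrinks when one stage dominates, because the pipeline can never run faster than its slower stage.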
I've done quite a bit of work myself on porting the PARSEC benchmarks
to GPUs, and I thought bodytrack was a pretty tough one to port
(just because of how it's written, and the fact that there's so much
code), so good for you to have made this much progress! Do you plan
to release it eventually?
2011/8/9 aftab hussain <aftab.hussain at seecs.edu.pk>:
> Dear All,
> I am trying to port the Bodytrack application to GP-GPUs for my MS
> thesis. I have working code, but my tracking results are wrong.
> When I investigated further, I found that differences in the sin/cos
> calculations between the CPU and the GPU are causing the problem.
> For some particles, the difference in the sin/cos calculations (errors
> in the 6th-7th decimal place) accumulates in later stages
> (Body Geometry calculations, projection calculations, Error term
> calculations). In the edge error term calculations I get one extra
> sample point, which changes the error weight, so the final
> normalized weight for that particular particle differs
> in the 4th decimal place (a lot of error). This happens in the Initialization
> stage of the particle filter (weight calculation).
> This in turn produces errors in the following iterations: in the particle
> generation stage for the next iteration, a wrong particle is
> selected, which introduces further error, and the final estimate for a frame
> is very different from the CPU estimate.
> I have implemented the following stages on the GPU because they are the most
> compute-intensive stages of the application.
> 1- Body Geometry
> 2- Projection Calculation
> 3- Error Terms (Inside Error Term, Edge Error Term)
> When I move the sin/cos calculation to the CPU, the improvement in execution
> time I get from the GPU stages is wiped out by the particle generation
> stage, because there I have to compute sin/cos and copy the data from the
> CPU data structure into the GPU-friendly data structure that gives the
> speedup in execution. As a result, the overall application
> speedup is not very interesting.
> Can anyone help me with this issue? My thesis is stuck because of it.
> Best Regards
> Aftab Hussain
> Research Assistant,
> High Performance Computing Lab,
> NUST School of Electrical Engineering and Computer Science
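The failure mode described in the quoted message, where a tiny numerical difference flips a discrete decision, can be reproduced with a toy example. The sketch below is not Bodytrack's code: `edge_sample_count` is a hypothetical sampling rule (samples at 0, step, 2*step, ... along an edge), `select_particle` is a plain cumulative-sum resampler, and all numbers are invented to sit near a boundary.

```python
import math
from bisect import bisect_left
from itertools import accumulate

def edge_sample_count(edge_length: float, step: float = 1.0) -> int:
    """Hypothetical rule: samples at 0, step, 2*step, ... up to edge_length."""
    return math.floor(edge_length / step) + 1

def select_particle(weights, u: float) -> int:
    """Cumulative-sum resampling: index of the first particle whose
    cumulative weight reaches the draw u in [0, 1)."""
    cumulative = list(accumulate(weights))
    # Clamp in case rounding leaves the final cumulative sum slightly below u.
    return min(bisect_left(cumulative, u), len(weights) - 1)

# A projected edge length that straddles a sampling boundary.
cpu_samples = edge_sample_count(9.9999999)
gpu_samples = edge_sample_count(10.0000001)

# Normalized weights that differ in the 4th decimal place.
cpu_weights = [0.3000, 0.2000, 0.5000]
gpu_weights = [0.3001, 0.2000, 0.4999]
u = 0.30005  # the same random draw used on both sides

cpu_pick = select_particle(cpu_weights, u)
gpu_pick = select_particle(gpu_weights, u)
print(cpu_samples, gpu_samples, cpu_pick, gpu_pick)
```

Here a length difference in the 7th decimal place adds one sample point (10 versus 11), and a weight difference in the 4th decimal place makes the same draw select particle 1 with the CPU weights but particle 0 with the GPU weights. Once a different particle is selected, the two runs diverge even if every later calculation is bit-identical, so matching the CPU output exactly may be less realistic than checking that the tracking quality is comparable.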