[parsec-users] performance improvement for fluidanimate

Fedorova, Julia Julia.Fedorova at intel.com
Tue Jun 29 09:06:11 EDT 2010


Chris, 

Here are patches for the pthreads and serial versions.

If you have a newer version, I would also be interested in looking at it from a performance perspective :)
(if it is complete and you are ready to share it, of course)


The patch for the tbb version would be similar, but I am hesitant to suggest it because the tbb version seems to have data races.

This can be seen by running the tbb version several times with the same number of iterations and comparing the outputs: from time to time they differ in the low-order digits.

Also, our (Intel) new Inspector tool reports data races.

So I will look into it further.
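
The repeat-and-compare check described above could be automated roughly as follows. This is a minimal sketch, assuming the results of each run have already been loaded into arrays of floats; `DiffSummary` and `compare_runs` are hypothetical names and are not part of PARSEC.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of how the results of two runs differ.
struct DiffSummary {
    std::size_t mismatches = 0; // number of positions whose values differ
    double max_abs_diff = 0.0;  // largest absolute difference seen
};

// Compare the outputs of two runs value by value.
// Assumption: both runs produced the same number of values.
DiffSummary compare_runs(const std::vector<float>& run_a,
                         const std::vector<float>& run_b) {
    assert(run_a.size() == run_b.size());
    DiffSummary s;
    for (std::size_t i = 0; i < run_a.size(); ++i) {
        if (run_a[i] != run_b[i]) {
            ++s.mismatches;
            const double d = std::fabs(static_cast<double>(run_a[i]) -
                                       static_cast<double>(run_b[i]));
            if (d > s.max_abs_diff) s.max_abs_diff = d;
        }
    }
    return s;
}
```

A nonzero mismatch count between runs with identical input and iteration count shows only that the results are not bit-reproducible; it does not by itself locate the race.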


Regards, Julia

-----Original Message-----
From: parsec-users-bounces at lists.cs.princeton.edu [mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Christian A Bienia
Sent: Monday, June 28, 2010 7:05 PM
To: PARSEC Users
Subject: Re: [parsec-users] performance improvement for fluidanimate

Hi Julia,

I'm very interested in all patches, including your optimization for fluidanimate. :) I rewrote the program for the next version of PARSEC, so your change might not apply anymore. Please send me a patch and I'll take a look.

Best,
Chris



----- Original Message -----
From: "Julia Fedorova" <Julia.Fedorova at intel.com>
To: "PARSEC Users" <parsec-users at lists.cs.princeton.edu>
Sent: Monday, June 28, 2010 07:14:55 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: [parsec-users] performance improvement for fluidanimate

Hi Chris, all

I have a performance improvement for the fluidanimate application.

It consists of a few small code changes in ComputeForces (serial version) and in the ComputeForcesMT and ComputeDensitiesMT functions (pthreads version) that give a ~18% speedup, measured for the serial and pthreads versions with the native input and 500 frames on a Xeon Core i7 single-socket platform.



gcc-serial (1 thread):
  Initial   - 358.6 sec
  Optimized - 293.4 sec

gcc-pthreads (4 threads):
  Initial   - 109.7 sec
  Optimized - 89 sec
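
The timings above can be turned into a percentage with a little arithmetic. This is a small sketch; the function names are illustrative only and do not come from the benchmark.

```cpp
#include <cassert>

// Percentage by which the run time dropped: (before - after) / before * 100.
double time_reduction_percent(double before_sec, double after_sec) {
    assert(before_sec > 0.0);
    return (before_sec - after_sec) / before_sec * 100.0;
}

// Speedup factor: before / after.
double speedup_factor(double before_sec, double after_sec) {
    assert(after_sec > 0.0);
    return before_sec / after_sec;
}
```

For the serial build, 358.6 sec to 293.4 sec is about 18.2% less run time (a factor of about 1.22). For the 4-thread pthreads build, 109.7 sec to 89 sec is about 18.9% less (a factor of about 1.23).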





The gain comes from reducing the excessive branch misprediction cost in the code: adding an extra "if" earlier in the control flow gives a visible performance boost.

Although the numbers above are for an Intel platform, I believe the changes will also benefit other platforms/CPUs, as they minimize the wasted work created by branch misprediction.
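
The actual changes are in the attached patches and are not reproduced here. As a generic illustration only, the sketch below shows what moving a test earlier in the control flow can look like in a pairwise particle loop. The `Particle` type, its `active` flag, and both function names are hypothetical and are not taken from fluidanimate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a particle; not the PARSEC fluidanimate type.
struct Particle {
    float x, y, z;
    bool active; // assumed flag that does not change inside the inner loop
};

// Squared distance between two particles.
static float dist_sq(const Particle& p, const Particle& q) {
    const float dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return dx * dx + dy * dy + dz * dz;
}

// Before: the test on p is repeated for every (p, q) pair.
std::size_t count_pairs_inner_check(const std::vector<Particle>& a,
                                    const std::vector<Particle>& b, float h) {
    assert(h > 0.0f);
    const float h_sq = h * h;
    std::size_t count = 0;
    for (const Particle& p : a) {
        for (const Particle& q : b) {
            if (p.active && dist_sq(p, q) < h_sq) ++count;
        }
    }
    return count;
}

// After: the test on p is done once, before the inner loop is entered.
std::size_t count_pairs_early_check(const std::vector<Particle>& a,
                                    const std::vector<Particle>& b, float h) {
    assert(h > 0.0f);
    const float h_sq = h * h;
    std::size_t count = 0;
    for (const Particle& p : a) {
        if (!p.active) continue; // extra "if" placed earlier in the control flow
        for (const Particle& q : b) {
            if (dist_sq(p, q) < h_sq) ++count;
        }
    }
    return count;
}
```

Both functions return the same count. The second one executes fewer conditional branches in the inner loop, so there are fewer branches to mispredict; whether that gives a measurable gain depends on the data, the compiler, and the CPU.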



Is there interest in such changes? If so, I will send the patches.

Thanks

Regards, Julia Fedorova
Intel

--------------------------------------------------------------------
Closed Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park, 
17 Krylatskaya Str., Bldg 4, Moscow 121614, 
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies. 
_______________________________________________
parsec-users mailing list
parsec-users at lists.cs.princeton.edu
https://lists.cs.princeton.edu/mailman/listinfo/parsec-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_serial
Type: application/octet-stream
Size: 997 bytes
Desc: patch_serial
URL: <http://lists.cs.princeton.edu/pipermail/parsec-users/attachments/20100629/4fdb6af3/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_pthreads
Type: application/octet-stream
Size: 995 bytes
Desc: patch_pthreads
URL: <http://lists.cs.princeton.edu/pipermail/parsec-users/attachments/20100629/4fdb6af3/attachment-0001.obj>

