[parsec-users] speed-up of parsec

Christian Bienia cbienia at CS.Princeton.EDU
Fri Aug 1 14:17:00 EDT 2008


Hi Qingyuan,

 

You can use Amdahl's Law to calculate an upper bound for the speedup. If you
haven't done so yet, subtract the instructions in the initialization and
shutdown phases from the instruction count of the main thread. These are the
instructions spent in the serial parts of the program. For the parallel
phase, simply add up all instructions executed by the other threads. Assume
perfect load balancing and divide that sum by the number of processors. The
sum of the serial instruction count and this per-processor parallel count is
a metric for how "fast" the program was.
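
As a rough illustration of the calculation described above, here is a minimal
Python sketch. The function names, parameter names, and instruction counts are
hypothetical and chosen only for this example. The sketch also assumes that
the same instruction counts are reused for the one-processor baseline, which
ignores any extra instructions that parallelization itself adds; with counts
from separate runs you could call estimated_cost once per run instead.

```python
def estimated_cost(serial_instructions, worker_instructions, num_processors):
    """Serial instruction count plus the perfectly balanced parallel share.

    serial_instructions: instructions of the initialization and shutdown
        phases of the main thread (assumed to be measured already).
    worker_instructions: per-thread instruction counts of the other threads.
    num_processors: number of processors the parallel work is spread over.
    """
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    parallel_total = sum(worker_instructions)
    # Perfect load balancing is assumed: the parallel work divides evenly.
    return serial_instructions + parallel_total / num_processors


def speedup_upper_bound(serial_instructions, worker_instructions, num_processors):
    """Amdahl-style bound: cost on one processor divided by cost on N."""
    baseline = estimated_cost(serial_instructions, worker_instructions, 1)
    scaled = estimated_cost(serial_instructions, worker_instructions, num_processors)
    return baseline / scaled


# Made-up counts: 100 serial instructions, two workers with 450 each.
bound = speedup_upper_bound(100, [450, 450], 4)
print(f"{bound:.2f}")  # prints 3.08
```

With these made-up numbers the serial part is 10% of all instructions, so the
bound on 4 processors is 1000 / 325, noticeably below 4. Timing effects are
not modeled at all.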

 

Obviously this metric has some limitations, which we also mention in the tech
report. Basically, all timing effects are neglected. It is nevertheless
useful for measuring the effect of serial sections and synchronization
overhead.

 

Best,

Chris

 

 

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Qingyuan
Deng
Sent: Friday, August 01, 2008 2:55 AM
To: PARSEC Users
Subject: Re: [parsec-users] speed-up of parsec

 

Hi Chris,

 

Thanks a lot for your points. That makes sense!

 

Yes, the main thread in Canneal stays idle during the ROI, but in Bodytrack
there are plenty of interactions between the main thread and the worker
threads during that phase, which makes the calculation more interesting. I
have the instruction counts for all the threads (main and workers) collected
by Pin, but I am a little confused about how to do the math. I am wondering
whether the execution time of the main thread can be hidden by the workers,
so that we still don't need to consider the main thread's execution time in
this case. May I know your method of doing this?

 

In addition, could you please give me a hint about your method for
calculating the speedups of the pipelined benchmarks? I think the
instruction counts differ between the thread groups of the different stages.
I am wondering whether the longest stage can hide the execution time of the
shorter stages. There is still an "initialization" latency between
neighboring stages, though, so can the execution time be calculated as the
sum of the execution times of a single thread in each stage?

 

Thank you!

 

Have a nice weekend,

Qingyuan
