[parsec-users] Parsec 2.0, M5 Simulations, Linux Idle loop.

Rick Strong rstrong at cs.ucsd.edu
Wed Sep 16 16:06:43 EDT 2009


I have attached the pictures this time. Hopefully, they make it to the 
mailing list.

-Rick

Rick Strong wrote:
> Dear all,
>
> I am a current Ph.D. student at UCSD studying computer architecture 
> for multicore systems and its interaction with the OS. My goal for 
> the last half year has been to run PARSEC 2.0 on the M5 simulator for 
> the Alpha ISA on many-core architectures.
>
> I have most of the benchmarks compiled and ready to go, but I find 
> that IPC is lower than I would expect. The attached figure shows IPC 
> for 2, 4, 8, 16, and 32 cores for a hypothetical 22nm process 
> technology running at 3.5 GHz on an out-of-order processor modeling 
> the Alpha EV6. The IPC seems fine for 2 cores, but as more cores are 
> added, an alarming amount of time is spent in the idle loop of the 
> Linux kernel, which puts the processor to sleep through a quiesce 
> instruction. You can see the amount of time spent sleeping in 
> profile_quiesce.png, also attached (this stat is gathered in a 
> gprof-like manner). The input set used was simsmall, and I started 
> simulation measurement at the beginning of the Region of Interest.
>
> There are many things that could be going wrong, but the problem 
> seems to be related to a lack of work available to be scheduled on 
> the idle cores. Some possible causes include:
> (1) The Linux scheduler has not load-balanced the parallel 
> application, leaving some cores unscheduled.
> (2) The threads are stalling on a barrier and the core has nothing 
> left to do.
> (3) Poor startup performance. I see this occur when I simulate the 
> benchmarks with simsmall on an x86 Nehalem architecture, where the 8 
> virtual CPUs never reach 100% utilization.
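As a note on cause (1): the quickest check I know of is to read each thread's last-run CPU out of /proc inside the simulated guest. Below is a rough Python sketch of that check (it assumes a Linux guest with /proc mounted; the field index follows proc(5), and `thread_cpus` is just an illustrative helper name I made up):

```python
import os

def thread_cpus(pid):
    """Return {tid: last_cpu} for every thread of `pid`, read from /proc.

    Field 39 of /proc/<pid>/task/<tid>/stat is the CPU the thread last
    ran on (see proc(5)).
    """
    cpus = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            stat = f.read()
        # comm (field 2) may contain spaces, so split after its closing paren;
        # the remaining fields then start at field 3, making field 39 index 36
        fields = stat.rsplit(")", 1)[1].split()
        cpus[int(tid)] = int(fields[36])
    return cpus

if __name__ == "__main__":
    # demo on the current process; point it at the benchmark's pid instead
    print(thread_cpus(os.getpid()))
```

Sampling this repeatedly against the benchmark's pid during the ROI should show whether the worker threads ever spread beyond the first couple of CPUs.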
>
> This introduction brings me to the following questions for the PARSEC 
> team, as I am hoping your experience and expert knowledge can direct 
> my instrumentation more effectively.
>
> (1) Have you noticed Linux scheduler load balancing taking a 
> disproportionately large share of execution time with simsmall?
>
> (2) Is there an easy way to determine whether a PARSEC benchmark is 
> indeed scheduled and running on all available cores?
>
> (3) Does simsmall contain enough work to saturate core utilization, 
> or is it just too small? If so, which input size is optimal?
>
> (4) Are there known reasons why the PARSEC benchmark suite would not 
> play nicely with the Alpha architecture running a Linux kernel, for 
> benchmarks compiled using pthreads (I am purposely leaving out OpenMP)?
>
> (5) Is there a way to easily test the barrier stall hypothesis?
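On question (5), the instrumentation I have in mind is to time every barrier wait and accumulate per-thread stall time. The sketch below uses Python's threading.Barrier purely to illustrate the measurement; in PARSEC itself one would wrap the pthread_barrier_wait calls the same way (that placement is my assumption, not something I have verified in the sources):

```python
import threading
import time

N = 4
barrier = threading.Barrier(N)
waits = [0.0] * N  # seconds each thread spent blocked at the barrier

def worker(i):
    # thread 0 is a deliberate straggler, so the other threads must stall
    time.sleep(0.2 if i == 0 else 0.0)
    t0 = time.perf_counter()
    barrier.wait()
    waits[i] = time.perf_counter() - t0

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(waits)  # straggler waits ~0; the others wait roughly its delay
```

A straggler shows up as a near-zero wait in one thread and large waits everywhere else; seeing that pattern at the barriers in a simsmall run would support hypothesis (2).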
>
> Thanks in advance,
> -Richard Strong
>
> _______________________________________________
> parsec-users mailing list
> parsec-users at lists.cs.princeton.edu
> https://lists.cs.princeton.edu/mailman/listinfo/parsec-users
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: IPCSummary:Warning_cycle_time_is_based_on_detail_cpu0.png
Type: image/png
Size: 14896 bytes
Desc: not available
URL: <http://lists.cs.princeton.edu/pipermail/parsec-users/attachments/20090916/2d6096ec/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: profile_quiesce.png
Type: image/png
Size: 24380 bytes
Desc: not available
URL: <http://lists.cs.princeton.edu/pipermail/parsec-users/attachments/20090916/2d6096ec/attachment-0003.png>

