[parsec-users] Fluid animate

Christian Bienia cbienia at CS.Princeton.EDU
Thu Jun 10 13:27:34 EDT 2010


Hey Jim,

Interesting numbers! Do you have any idea why the program doesn't achieve higher speedups? It should scale fairly well with such a large input. Did you use any form of thread pinning?
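
For reference, a minimal sketch of what thread pinning can look like on Linux with pthreads; the helper name pin_self_to_cpu is hypothetical, and which logical CPU id maps to which core or HT sibling is machine-specific:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// Sketch: pin the calling thread to one logical CPU (Linux/pthreads).
// Inspect the machine's topology before choosing ids, since the
// core/HT-sibling layout differs between systems.
void pin_self_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}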


- Chris

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Dempsey
Sent: Thursday, June 10, 2010 12:35 PM
To: 'PARSEC Users'
Subject: [parsec-users] Fluid animate


Chris and others:

I got some time on a Dell R610 with dual Intel Xeon 5570 processors. The readers of this mailing list might find the results of interest.

Results from running fluidanimate on the in_500K.fluid input with 100 iterations.

Runtimes using the QuickThread threading toolkit:

Threads   Total time in ROI   Speedup
  1            92.494s        1.0000x
  2            48.265s        1.9164x
  3            35.771s        2.5857x
  4            28.770s        3.2149x
  5            23.912s        3.8681x
  6            21.912s        4.2212x
  7            20.918s        4.4217x
  8            18.428s        5.0192x
  9            18.897s        4.8946x   * note 1
 10            18.396s        5.0279x
 11            18.002s        5.1380x
 12            17.991s        5.1411x
 13            17.946s        5.1540x
 14            16.071s        5.7553x
 15            16.057s        5.7604x
 16            14.398s        6.4241x
 17            41.042s        2.2536x   ** note 2
 18           553.489s        0.1671x   ** note 3


Each processor has 4 cores with Hyper-Threading, for a total of 8 cores and 16 hardware threads.

fluidanimate is a floating-point and memory-access intensive application.

Note 1:

On this configuration, QuickThread distributes work to cores first, then back-fills to Hyper-Threading siblings second. The result is a fairly steady slope from 1 to 8 threads (the full set of cores), then a shallower slope as the HT siblings are filled in. One way to express such a fill order is sketched below.
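
As an illustration only (QuickThread's actual policy is internal; the CPU numbering below is an assumption), a cores-first fill order amounts to building a pin order that covers every physical core before any HT sibling:

#include <vector>

// Sketch of a cores-first fill order, not QuickThread's implementation.
// Assumes the HT siblings of each core are numbered adjacently
// (c, c+1, ...), which is only one possible enumeration.
std::vector<int> core_first_order(int n_logical, int threads_per_core) {
    std::vector<int> order;
    // First pass: one logical CPU per physical core.
    for (int c = 0; c < n_logical; c += threads_per_core)
        order.push_back(c);
    // Second pass: back-fill the remaining HT siblings.
    for (int c = 0; c < n_logical; c += threads_per_core)
        for (int s = 1; s < threads_per_core; ++s)
            order.push_back(c + s);
    return order;
}

Worker i would then be pinned to order[i]; with core_first_order(16, 2) the first 8 workers land on distinct cores and workers 9-16 back-fill the siblings.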


Note 2:

At 17 threads the workers oversubscribe the 16 hardware threads; note the adverse effect on the cache. One way to guard against this is sketched below.
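
One portable guard against the oversubscription cliff, as a sketch (clamp_workers is a hypothetical helper, not part of PARSEC or QuickThread):

#include <algorithm>
#include <thread>

// Sketch: cap the worker count at the hardware thread count to avoid
// running more software threads than the 16 hardware threads here.
unsigned clamp_workers(unsigned requested) {
    unsigned hw = std::thread::hardware_concurrency(); // 16 on this box
    if (hw == 0) hw = 1;  // the call may return 0 when it cannot tell
    return std::min(requested, hw);
}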


Note 3:

At 18 threads, the adverse effect on the cache appears to grow exponentially; the run is roughly 13x slower than at 17 threads. Additional run data would provide some insight, as would profiling.


The above results are from one set of test runs on a remote system; in other words, I could not verify that no other activity was present on the system.


Jim Dempsey


