[parsec-users] Fluid animate
cbienia at CS.Princeton.EDU
Thu Jun 10 13:27:34 EDT 2010
Interesting numbers! Do you have an idea why the program doesn't achieve
higher speedups? It should scale fairly well with such a big input. Did you
use any form of thread pinning?
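For what it's worth, on Linux a thread can be pinned to a logical CPU with the GNU extension pthread_setaffinity_np(). A minimal sketch (the helper name pin_to_cpu is mine, not anything from QuickThread or fluidanimate):

```c
/* Pin the calling thread to one logical CPU on Linux.
 * Illustration only; requires the _GNU_SOURCE feature macro. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Restrict the calling thread to the given logical CPU.
 * Returns 0 on success, an errno value on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Pinning one worker per core before doubling up on HT siblings is one way to make runs like the ones below reproducible.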
From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Sent: Thursday, June 10, 2010 12:35 PM
To: 'PARSEC Users'
Subject: [parsec-users] Fluid animate
Chris and others:
I got some time on a Dell R610 with dual Intel Xeon 5570 processors.
Readers of this mailing list might find the results of interest.
Results from running fluidanimate using in_500K.fluid with 100 iterations
Runtimes using the QuickThread threading toolkit (thread count, ROI time, speedup relative to 1 thread):
1 Total time spent in ROI: 92.494s 1.0000x
2 Total time spent in ROI: 48.265s 1.9164x
3 Total time spent in ROI: 35.771s 2.5857x
4 Total time spent in ROI: 28.770s 3.2149x
5 Total time spent in ROI: 23.912s 3.8681x
6 Total time spent in ROI: 21.912s 4.2212x
7 Total time spent in ROI: 20.918s 4.4217x
8 Total time spent in ROI: 18.428s 5.0192x
9 Total time spent in ROI: 18.897s 4.8946x * note 1
10 Total time spent in ROI: 18.396s 5.0279x
11 Total time spent in ROI: 18.002s 5.1380x
12 Total time spent in ROI: 17.991s 5.1411x
13 Total time spent in ROI: 17.946s 5.1540x
14 Total time spent in ROI: 16.071s 5.7553x
15 Total time spent in ROI: 16.057s 5.7604x
16 Total time spent in ROI: 14.398s 6.4241x
17 Total time spent in ROI: 41.042s 2.2536x ** note 2
18 Total time spent in ROI: 553.489s 0.1671x ** note 3
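(The speedup column is simply the 1-thread ROI time divided by the n-thread ROI time; a quick sanity check against the table, in a throwaway helper of my own:)

```c
#include <math.h>

/* Speedup relative to the single-thread run: S(n) = T(1) / T(n). */
double speedup(double t1, double tn)
{
    return t1 / tn;
}

/* e.g. speedup(92.494, 14.398) reproduces the 6.4241x figure
 * reported for 16 threads above, to rounding. */
```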
Each processor has 4 cores with HyperThreading
Total of 8 cores and 16 hardware threads
fluidanimate is a floating-point- and memory-access-intensive application.
On this configuration, QuickThread distributes work to distinct physical cores first, then backfills to the HyperThread siblings.
The result is a fairly steady slope from 1 thread to 8 threads (the full set of physical cores), then a shallower slope as the HT threads are filled in.
At 17 threads we have oversubscription of threads; note the adverse effect on the runtime.
At 18 threads, the adverse effect on the cache appears to be exponential.
Additional run data would provide some insight as would profiling.
The above results were from a single set of test runs on a remote system; in other words, I could not verify that no other activity was present on the system.