[parsec-users] Freqmine Question Help

Raghav Mohan rmohan2 at wisc.edu
Wed Aug 8 16:59:37 EDT 2012


I am using RHEL 6.3 and GCC 4.7.0. OpenMP is enabled: when I run top alongside the program, the %CPU usage it shows matches the number of threads I provide. Here is the requested output.

COMMAND: ./freqmine kosarak_990k.dat 790

OUTPUT:
NUMTHREADS: 1 the data preparation cost 0.595663 seconds, the FPgrowth cost 23.579778 seconds
NUMTHREADS: 2 the data preparation cost 0.595737 seconds, the FPgrowth cost 29.178112 seconds
NUMTHREADS: 3 the data preparation cost 0.595698 seconds, the FPgrowth cost 24.463154 seconds
NUMTHREADS: 4 the data preparation cost 0.595958 seconds, the FPgrowth cost 20.679875 seconds
NUMTHREADS: 5 the data preparation cost 0.631013 seconds, the FPgrowth cost 21.178104 seconds
NUMTHREADS: 6 the data preparation cost 0.595853 seconds, the FPgrowth cost 19.078028 seconds
NUMTHREADS: 7 the data preparation cost 0.598170 seconds, the FPgrowth cost 17.646492 seconds
NUMTHREADS: 8 the data preparation cost 0.597291 seconds, the FPgrowth cost 18.438906 seconds
NUMTHREADS: 9 the data preparation cost 0.596892 seconds, the FPgrowth cost 17.640142 seconds
NUMTHREADS: 10 the data preparation cost 0.601513 seconds, the FPgrowth cost 16.806145 seconds
NUMTHREADS: 11 the data preparation cost 0.597656 seconds, the FPgrowth cost 17.051052 seconds
NUMTHREADS: 12 the data preparation cost 0.600122 seconds, the FPgrowth cost 15.583760 seconds
NUMTHREADS: 13 the data preparation cost 0.601045 seconds, the FPgrowth cost 16.162628 seconds
NUMTHREADS: 14 the data preparation cost 0.598893 seconds, the FPgrowth cost 15.565976 seconds
NUMTHREADS: 15 the data preparation cost 0.599190 seconds, the FPgrowth cost 15.765923 seconds
NUMTHREADS: 16 the data preparation cost 0.600952 seconds, the FPgrowth cost 15.196432 seconds
NUMTHREADS: 17 the data preparation cost 0.601894 seconds, the FPgrowth cost 14.385916 seconds
NUMTHREADS: 18 the data preparation cost 0.601292 seconds, the FPgrowth cost 15.297303 seconds
NUMTHREADS: 19 the data preparation cost 0.609123 seconds, the FPgrowth cost 15.814151 seconds
NUMTHREADS: 20 the data preparation cost 0.599771 seconds, the FPgrowth cost 16.419628 seconds
NUMTHREADS: 21 the data preparation cost 0.601651 seconds, the FPgrowth cost 15.231015 seconds
NUMTHREADS: 22 the data preparation cost 0.602804 seconds, the FPgrowth cost 14.558048 seconds

So running this without the output file gives some speedup, but not of the magnitude shown in the output you attached to your email or reported in the papers. (I would expect a speedup of at least 4 in the best case.)
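To put a number on this, here is a minimal sketch that parses output lines in the format above and reports the FPgrowth speedup relative to the 1-thread run. The helper names (parse_timings, speedups) are only illustrative and are not part of freqmine or PARSEC, and the sample holds just three of the lines listed above. By this calculation the best run in that list (17 threads) comes out at roughly 1.64x.

```python
import re

# Matches lines in the format printed above, e.g.
# "NUMTHREADS: 4 the data preparation cost 0.595958 seconds, the FPgrowth cost 20.679875 seconds"
LINE_RE = re.compile(
    r"NUMTHREADS:\s*(\d+)\s+the data preparation cost\s+([\d.]+)\s+seconds,"
    r"\s+the FPgrowth cost\s+([\d.]+)\s+seconds"
)


def parse_timings(text):
    """Return {thread_count: fpgrowth_seconds} for every matching line in text."""
    return {int(m.group(1)): float(m.group(3)) for m in LINE_RE.finditer(text)}


def speedups(timings):
    """Speedup of each thread count relative to the 1-thread FPgrowth time."""
    base = timings[1]  # assumes a 1-thread run is present in the input
    return {n: base / t for n, t in sorted(timings.items())}


# Three lines copied from the output above; paste the full listing to see every point.
SAMPLE = """\
NUMTHREADS: 1 the data preparation cost 0.595663 seconds, the FPgrowth cost 23.579778 seconds
NUMTHREADS: 4 the data preparation cost 0.595958 seconds, the FPgrowth cost 20.679875 seconds
NUMTHREADS: 17 the data preparation cost 0.601894 seconds, the FPgrowth cost 14.385916 seconds
"""

if __name__ == "__main__":
    for n, s in speedups(parse_timings(SAMPLE)).items():
        print(f"{n:2d} threads: {s:.2f}x")
```

The same two functions work on the output-file runs further down, where the ratio falls below 1 (a slowdown) as threads are added.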
I noticed that you do not provide an output file in your run. This drastically changes my results. Running with an output file, I get:

COMMAND: ./freqmine kosarak_250k.dat 220 /scratch/mohan/out.txt


OUTPUT:
NUMTHREADS: 1   the data preparation cost 0.161744 seconds, the FPgrowth cost 2.922024 seconds
NUMTHREADS: 2   the data preparation cost 0.190038 seconds, the FPgrowth cost 4.751599 seconds
NUMTHREADS: 3   the data preparation cost 0.161909 seconds, the FPgrowth cost 6.822731 seconds
NUMTHREADS: 4   the data preparation cost 0.161746 seconds, the FPgrowth cost 7.654892 seconds
NUMTHREADS: 5   the data preparation cost 0.163237 seconds, the FPgrowth cost 8.025010 seconds
NUMTHREADS: 6   the data preparation cost 0.162682 seconds, the FPgrowth cost 8.104605 seconds
NUMTHREADS: 7   the data preparation cost 0.163109 seconds, the FPgrowth cost 7.985950 seconds
NUMTHREADS: 8   the data preparation cost 0.162191 seconds, the FPgrowth cost 8.088410 seconds

NUMTHREADS: 9   the data preparation cost 0.162928 seconds, the FPgrowth cost 8.148432 seconds
NUMTHREADS: 10  the data preparation cost 0.192140 seconds, the FPgrowth cost 8.509589 seconds
NUMTHREADS: 11  the data preparation cost 0.167107 seconds, the FPgrowth cost 8.685088 seconds
NUMTHREADS: 12  the data preparation cost 0.162842 seconds, the FPgrowth cost 9.417641 seconds



Looking at the code, the only difference I see is whether fout is NULL in the last routine, FP_growth. However, this routine is not threaded and does not wait for any threads to complete execution, so I am curious why its computation time increases with the number of threads. Again, I apologize if I am missing something obvious that I could not spot.


Thank you for your prompt responses and help,
Raghav




Also, I get different results depending on whether or not I supply an output file for the 3rd argument. Specifically,

On 08/08/12, Joseph Greathouse wrote:
> On 8/8/2012 3:04 PM, Raghav Mohan wrote:
> >Hi,
> >
> >I am trying to parallelize the Freqmine (PARSEC v2.1) benchmark with my own parallel library instead of OpenMP. I ran the freqmine benchmark and compared the results of the sequential version with those of the OpenMP version. I would expect the OpenMP time to be drastically lower; however, it keeps increasing with the number of threads (essentially a reverse speedup). I am running freqmine on a Hyper-Threaded Intel Xeon E5620 CPU. This machine has 8 cores that are hyperthreaded, giving 16 threads. Here are the sample results:
> >
> >
> >Command:
> >./freqmine kosarak_250k.dat 220 out.txt
> >
> >
> >
> >Sequential Version Result :
> >the data preparation cost 0.163102 seconds, the FPgrowth cost 2.720993 seconds
> >
> >
> >OMP Version Result (16 threads):
> >the data preparation cost 0.191582 seconds, the FPgrowth cost 9.168250 seconds
> >
> >
> >
> >
> >As one can see, the FPgrowth cost for the threaded version is about 3.4 times that of the sequential version. This behavior is replicated for all inputs.
> >
> >
> >I apologize if I am missing something or misinterpreting the results and this is the expected behavior. I read the manual, however, and could not find any information on this.
> >Any help is greatly appreciated.
> >
> >
> >Thank you.
> 
> Hi Raghav,
> 
> I agree with Yungang, those numbers appear strange. I've attached outputs from a few freqmine runs on a Xeon E5520 (which is a Nehalem-based core, rather than a Westmere-based core like yours, but otherwise also has 8 physical cores and 16 virtual cores). This is running on RHEL 5.8, compiled with GCC 4.1.2 (Red Hat patch 52).
> 
> As you can see, adding more threads gives a steady decrease in runtime.
> 
> What OS and compiler are you using? What environment variables are set?
> 
> Also, you showed the FPgrowth output for the serial version and your 16-thread version. Could you show us the outputs of the 2-, 4-, and 8-threaded versions as well?
> 
> -Joe
> 
> -----------------------------------------------
> 
> bash-3.2$ cd ../inst/amd64-linux.gcc-serial/bin/
> bash-3.2$ time ./freqmine ../../../inputs/webdocs_250k.dat 11000
> ...
> the data preparation cost 4.136187 seconds, the FPgrowth cost 935.675228 seconds
> 
> real 15m39.923s
> user 15m38.940s
> sys 0m0.729s
> 
> bash-3.2$ cd ../../amd64-linux.gcc-openmp/bin/
> bash-3.2$ OMP_NUM_THREADS=4
> bash-3.2$ export OMP_NUM_THREADS
> bash-3.2$ time ./freqmine ../../../inputs/webdocs_250k.dat 11000
> ...
> the data preparation cost 4.151570 seconds, the FPgrowth cost 215.969161 seconds
> 
> real 3m40.163s
> user 14m26.022s
> sys 0m0.891s
> 
> bash-3.2$ OMP_NUM_THREADS=8
> bash-3.2$ export OMP_NUM_THREADS
> bash-3.2$ time ./freqmine ../../../inputs/webdocs_250k.dat 11000
> ...
> the data preparation cost 4.094214 seconds, the FPgrowth cost 116.869059 seconds
> 
> real 2m0.972s
> user 15m21.003s
> sys 0m1.030s
> 
> bash-3.2$ OMP_NUM_THREADS=16
> bash-3.2$ export OMP_NUM_THREADS
> bash-3.2$ time ./freqmine ../../../inputs/webdocs_250k.dat 11000
> ...
> the data preparation cost 4.145387 seconds, the FPgrowth cost 92.685972 seconds
> 
> real 1m36.841s
> user 21m38.168s
> sys 0m1.801s

