[parsec-users] Dedup scalability limitations

Austin Clements aclements+parsec at csail.mit.edu
Thu Jul 28 13:46:26 EDT 2011


'Lo.  I've encountered an issue with the dedup benchmark that limits
the application's scalability.  Specifically, the Rabin fingerprint
used to divide the input splits runs of input NULL bytes into blocks
of only 32 bytes (the minimum block size), even though the average
block size is set to 4K.  This results in lock contention for the hash
bucket corresponding to these blocks that severely limits the
scalability of the benchmark.  For larger inputs, it can also result
in far more blocks than would be ideal, which both slows down
SendBlock and increases the size of the output file.

One solution is to place block boundaries when the low 12 bits of the
fingerprint are 1 (or some non-zero value), rather than when they are
0 (since runs of NULL bytes generate a zero fingerprint).  A patch to
this effect is below.  This has the effect of dividing runs of NULL
bytes into the 60K maximum block size (which is the nature of any
rolling hash applied to a constant input).  For the "native" test
input (the Fedora Core 6 ISO image), this reduces the number of
all-NULL blocks from about 200,000 to about 80.  On our 80 core
machine with jemalloc (since basically nothing scales to 80 cores with
glibc malloc), this reduces the ROI from 2.200 seconds (15X scaling)
to 1.138 seconds (28X scaling) and increases system utilization during
the ROI from 28% to 51%.

Has anybody else encountered or addressed this bottleneck?  Is this
low utilization simply considered part of the benchmark?

--- a/pkgs/kernels/dedup/src/rabin.c
+++ b/pkgs/kernels/dedup/src/rabin.c
@@ -98,7 +98,7 @@
         h = (h<<8)|p[i];
           h ^= rabintab[x];
           }
-          if((h & RabinMask) == 0)
+          if((h & RabinMask) == 1)
                 return i;
                 while(i<n){
                        x = p[i-NWINDOW];
@@ -107,7 +107,7 @@
          h <<= 8;
            h |= p[i++];
              h ^= rabintab[x];
-               if((h & RabinMask) == 0)
+                     if((h & RabinMask) == 1)
                              return i;
                              }
                              return n;


More information about the parsec-users mailing list