[parsec-users] ferret-serial

Christian Bienia cbienia at CS.Princeton.EDU
Tue Mar 30 15:59:06 EDT 2010

Hey Jim,


Good to hear. Evaluating the correctness of the output files is tough. Ferret
does similarity search, and evaluating how similar two pictures are is
inherently difficult. Ferret produces a ranking as text output that you can
use to check for correctness, but you need to pick one ranking that you
simply assume to be correct.
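One way to apply that advice is to treat one run's ranking (for example, the one from the serial version) as the reference and count how many of another run's top-K results also appear in it. A minimal sketch, assuming each ranking has already been parsed into an array of distinct image-name strings; the helper name topk_overlap is hypothetical, and ferret's actual output format is not described here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many entries of `candidate` also appear
   in `reference`. Both arrays hold k image names from a top-K ranking and
   are assumed to contain no duplicates. The O(k*k) scan is fine for small
   K such as 50. */
size_t topk_overlap(const char *const *reference,
                    const char *const *candidate, size_t k)
{
    size_t hits = 0;
    for (size_t i = 0; i < k; i++) {
        for (size_t j = 0; j < k; j++) {
            if (strcmp(candidate[i], reference[j]) == 0) {
                hits++;
                break; /* count each candidate entry at most once */
            }
        }
    }
    return hits;
}
```

A result of k means the two runs agree on the set of top-K images, though not necessarily on their order.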


I've had some exposure to the GSL source code, but I quickly decided to stay
away from it. Your best course of action is probably to work with the latest
publicly available version and submit your patches directly to the GSL dev
team. You can get GSL here: http://www.gnu.org/software/gsl/
We have all of PARSEC and the included libraries running on amd64 CPUs, so
there should be no fundamental problems on 64-bit CPUs.


The LSH libraries are indeed quite old, just like a couple of other pieces of
code that ferret uses. It's mostly scientific code written by grad students
and post-docs, so don't expect too much.


I'm not too sure about the command line of the benchmark programs; it seems
those were used for the development of the benchmark version of ferret. I'll
poke Wei again, since he did the work.


- Chris





From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Sent: Tuesday, March 30, 2010 11:59 AM
To: 'PARSEC Users'
Subject: [parsec-users] ferret-serial




I think I have ferret-serial working now. At least it is churning away,
producing an output file. As for correctness?

Do you have a suggestion for how to check the correctness of the output?


In the archive you sent me there is a file named runbench2 containing
command lines like these:


./benchmark corel lsh image 50 "-L 8 -T 20" 25 1 out1

./benchmark corel lsh image 50 "-L 8 -T 20" 25 2 out2

./benchmark corel lsh image 50 "-L 8 -T 20" 25 3 out3


Looking at the command-line parsing, it appears that this command file has
an extra argument ("-L 8 -T 20").

This assumes that 1, 2, 3, ... are the numbers of threads and out1, out2,
out3, ... are the output file names:


db_dir      = argv[1];        /* corel */
table_name  = argv[2];        /* lsh */
query_dir   = argv[3];        /* image */
top_K       = atoi(argv[4]);  /* 50 */
DEPTH       = atoi(argv[5]);  /* "-L 8 -T 20"  -- should this be argv[6]? */
NTHREAD     = atoi(argv[6]);  /* 25  -- should this be argv[7]? */
output_path = argv[7];        /* 1   -- should this be argv[8]? */
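As written, atoi("-L 8 -T 20") returns 0, so DEPTH would silently become 0 and output_path would be "1". If runbench2 is the intended invocation, the parser would need to keep the quoted LSH option string at argv[5] and shift every later index by one. A sketch of that reading, assuming (unconfirmed) that 25 is DEPTH, the next number is NTHREAD, and the last argument is the output path; struct bench_args, parse_bench_args, and the lsh_params field are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for a runbench2-style command line:
   ./benchmark corel lsh image 50 "-L 8 -T 20" 25 1 out1 */
struct bench_args {
    const char *db_dir;       /* argv[1], e.g. corel */
    const char *table_name;   /* argv[2], e.g. lsh */
    const char *query_dir;    /* argv[3], e.g. image */
    int top_K;                /* argv[4], e.g. 50 */
    const char *lsh_params;   /* argv[5], e.g. "-L 8 -T 20" (kept as one string) */
    int DEPTH;                /* argv[6], e.g. 25 */
    int NTHREAD;              /* argv[7], e.g. 1 */
    const char *output_path;  /* argv[8], e.g. out1 */
};

/* Fills *a from argv, with every index after argv[4] shifted by one to
   make room for the quoted LSH option string. Returns 0 on success, or
   -1 (after printing a usage line) if the argument count is wrong. */
int parse_bench_args(int argc, char *argv[], struct bench_args *a)
{
    if (argc != 9) {
        fprintf(stderr, "usage: benchmark db_dir table_name query_dir top_K "
                        "\"lsh_params\" depth nthread output_path\n");
        return -1;
    }
    a->db_dir      = argv[1];
    a->table_name  = argv[2];
    a->query_dir   = argv[3];
    a->top_K       = atoi(argv[4]);
    a->lsh_params  = argv[5];
    a->DEPTH       = atoi(argv[6]);
    a->NTHREAD     = atoi(argv[7]);
    a->output_path = argv[8];
    return 0;
}
```

Whether runbench or runbench2 reflects the intended interface is exactly the open question here, so this index mapping is a guess, not the benchmark's documented usage.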


There is another command file, named runbench, that contains three additional
command-line arguments.


Can you clarify which command-line arguments I should use?


Summary of the adventure


The GSL library's functionality is good, but from a developer's perspective
it needs some serious work.


The folder layout is fine, except that the development folder layout is
different from the end-user folder layout (with respect to include paths).

Also, the folder paths are used inconsistently within the library. I made
changes to the GSL files here so that the path naming is consistent. The
GSL "people" may wish to change this. I have no objection to such a change,
as long as switching from developing the GSL library to testing it as an
end user requires changing only one path (not dozens, as it is/was).


Another assumption in GSL is that a void* points to an object with a size of
1 byte. In other words:


  void* p = somewhere;


  p = p + n; // advance p by n bytes


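In C++ the object to which p points is void and has no size (IOW, not even a
size of 0), so the addition is ill-formed. Strictly speaking, ISO C forbids
it as well; it works only because GCC, as an extension, treats the size of
void as 1.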
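A portable way to express the same byte offset is to do the arithmetic on a character pointer, whose element size is 1 by definition. A minimal sketch; advance_bytes is a hypothetical helper name, not a GSL function.

```c
#include <stddef.h>

/* Portable replacement for `p = p + n` on a void*: convert to char*,
   advance by n bytes, and convert back. Valid in both C and C++. */
static void *advance_bytes(void *p, size_t n)
{
    return (char *) p + n;
}
```

In place, the original line would become `p = (char *) p + n;`, which needs no helper at all.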


In C it is valid to write


   someType* p = mallocOrOtherFunctionReturningVoidPointer(...);


whereas in C++ this requires a cast. The cast is also accepted in C, so
casting keeps the code valid in both languages.
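For example, an allocation written with the explicit cast compiles unchanged as C or C++. A sketch; alloc_doubles is a hypothetical helper used only for illustration.

```c
#include <stdlib.h>

/* The (double *) cast is redundant in C but required in C++, so this
   function compiles under both compilers. Returns NULL on failure;
   the caller must free() the result. */
double *alloc_doubles(size_t n)
{
    double *p = (double *) malloc(n * sizeof *p);
    return p;
}
```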


Additionally, I haven't experimented with x64 builds yet, but I expect
portability problems.


LSH has similar problems.


In the components that open files, fopen is called with "r" to read binary
files. It should be called with "rb". Although "r" works on Linux, it is not
correct: on Windows, text mode translates CR/LF pairs and treats Ctrl-Z as
end of file, which corrupts binary data. POSIX systems accept and ignore the
"b", so "rb" is portable. Should some platform require "r" to read binary
files, the appropriate measure is to define and use a macro name, as was
done in JMEMDOS.C, although I would recommend placing it in a common header
file, CONFIG.H, or a header included by config.h.
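A sketch of that macro approach, assuming the definition lives in the shared config header; READ_BINARY and read_binary_file are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Would live in the common config header. "rb" is standard C, and POSIX
   systems accept and ignore the 'b'. A platform that needs a different
   mode string can predefine READ_BINARY before this point. */
#ifndef READ_BINARY
#define READ_BINARY "rb"
#endif

/* Hypothetical helper: reads up to cap bytes from path into buf.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_binary_file(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp = fopen(path, READ_BINARY);
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, cap, fp);
    fclose(fp);
    return (long) n;
}
```

On Linux both mode strings read the same bytes; the difference only shows up under a C runtime that distinguishes text and binary streams, such as Microsoft's.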


Many of the rather large macros, which were required in C, should be
rewritten as C++ templates. With templates, you can actually debug the code.
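Templates are C++ only. As a C-side illustration of the same debuggability argument, here is a swapped-in technique rather than the one proposed above: a hypothetical multi-statement macro (VEC_DOT, not taken from GSL) next to a static inline function with the same effect. In C++ the function could further become a template over the element type.

```c
#include <stddef.h>

/* Hypothetical multi-statement macro: it expands at every call site,
   so a debugger cannot step into it or set a breakpoint inside it. */
#define VEC_DOT(n, x, y, result)            \
    do {                                    \
        size_t i_;                          \
        (result) = 0.0;                     \
        for (i_ = 0; i_ < (n); i_++)        \
            (result) += (x)[i_] * (y)[i_];  \
    } while (0)

/* Equivalent static inline function: arguments are type-checked, and a
   debugger can break in it and step through the loop. */
static inline double vec_dot(size_t n, const double *x, const double *y)
{
    double result = 0.0;
    for (size_t i = 0; i < n; i++)
        result += x[i] * y[i];
    return result;
}
```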


In some of the support libraries there is the practice of


/* foo.c bla bla bla */


#ifdef ASDF

#include "something.c"

#endif



While the above does exactly what the makefile programmer wants to
accomplish, it makes the library awkward to maintain in an Integrated
Development Environment. For example, Solution Explorer cannot list
something.c as an ordinary source file in the solution, because it would
then also be compiled on its own.


I realize that these libraries were written many years ago, and at the time
these quirks may have been acceptable; today, however, these libraries may
need a good dusting off to be serviceable with today's development tools.


Jim Dempsey

