# [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Jim Dempsey jim at quickthreadprogramming.com
Fri Feb 26 10:47:04 EST 2010

Chris,

There are two problems with your solution: the test is incomplete, and it is
in the wrong place.

Although the placement of that test will work for one specific cell pairing,
it will not protect against alternate cell pairings, and you are performing
more work than necessary. Assume cells in a 4x4x4 arrangement (0:3, 0:3, 0:3)
in Fortran notation. A face of the cube looks like

/   /   /   /   /|...
+---+---+---+---+ |
|030|031|032|033| |
+---+---+---+---+/|
|020|021|022|023| |
+---+---+---+---+/|
|010|011|012|013| |
+---+---+---+---+/|
|000|001|002|003| |
+---+---+---+---+/

These are cells, not particles. Each cell can contain an arbitrary number of
particles (with associated particle numbers/addresses).
Assume current cell of focus is 012. GetNeighborCells returns

001, 002, 003, 011, 012, 013, 021, 022, 023, (101, ...)

Your ComputeForces (code snippet from your email below) will at some point
reach a pairing of cells

011,012

(recall that the current cell of focus is 012) the pairing is with the
neighboring cell immediately to the left.
Then the particles within that cell pairing will be used within
ComputeForces.

At some point later, the cell of focus will be 011,  GetNeighborCells
returns

000, 001, 002, 010, 011, 012, 020, 021, 022,(100, ...)

Then ComputeForces will eventually reach the cell pairing of

011, 012

Same pairing as seen earlier when focus was on cell 012. Now ComputeForces
will recompute for a cell pairing previously completed.

The solution for this is for GetNeighborCells to return only those
neighboring cells whose cell address (index) is <= the address (index) of
the current cell of focus.

Then your "if(&neigh.p[iparNeigh] < &cell.p[ipar])" test will work, but it
may be more work than necessary. What I think will work more efficiently is
to provide the collection of cells (from GetNeighborCells) with a virtual 1D
indexing capability. Then the permutations can go

for(i=0; i<nTotalParticlesInCollectionOfNeighbors; ++i)
for(j=i+1; j<nTotalParticlesInCollectionOfNeighbors; ++j)
{...

And then the "if(&neigh.p[iparNeigh] < &cell.p[ipar])" test can be
eliminated and fewer loop iterations are required.
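The half-shell idea above can be sketched as follows. This is a minimal
illustration, not PARSEC's actual API: GridDim, LinearIndex, and
GetNeighborCellsHalf are hypothetical names. The point is that by keeping
only neighbors whose linear index is <= the focus cell's index, a sweep
over all cells visits each unordered pair of adjacent cells exactly once.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch (names are hypothetical, not PARSEC's actual code):
// return only those neighbor cells whose linear index is <= the focus
// cell's index. Across a sweep over all cells, every unordered pair of
// adjacent cells is then visited exactly once.
struct GridDim { int nx, ny, nz; };

int LinearIndex(const GridDim& g, int x, int y, int z)
{
    return (z * g.ny + y) * g.nx + x;
}

std::vector<int> GetNeighborCellsHalf(const GridDim& g,
                                      int cx, int cy, int cz)
{
    std::vector<int> neigh;
    const int self = LinearIndex(g, cx, cy, cz);
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                const int x = cx + dx, y = cy + dy, z = cz + dz;
                if (x < 0 || x >= g.nx || y < 0 || y >= g.ny ||
                    z < 0 || z >= g.nz)
                    continue;           // neighbor lies outside the domain
                const int idx = LinearIndex(g, x, y, z);
                if (idx <= self)        // keep only the "half shell"
                    neigh.push_back(idx);
            }
    return neigh;
}
```

For example, in a 4x4x4 grid the corner cell (0,0,0) has no neighbor with a
smaller index, so only itself is returned.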

P.S. if you incorporate my ideas and if they are new to you, please
attribute.

Jim Dempsey

_____

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Christian
Bienia
Sent: Thursday, February 25, 2010 9:23 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Hi Jim,

Now I get what you mean. Sounds good, just want to point out that the
current version of the program avoids the redundant computations of the
density by comparing the memory address of the particle data:

if(&neigh.p[iparNeigh] < &cell.p[ipar])  // <-- This eliminates redundant computation
{
  float distSq = (cell.p[ipar] - neigh.p[iparNeigh]).GetLengthSq();
  if(distSq < hSq)
  {
    float t = hSq - distSq;
    float tc = t*t*t;
    cell.density[ipar] += tc;
    neigh.density[iparNeigh] += tc;
  }
}

Best,

Chris

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Dempsey
Sent: Thursday, February 25, 2010 9:18 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

See enclosed comments

_____

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Christian
Bienia
Sent: Thursday, February 25, 2010 4:17 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Hi Jim,

That sounds exciting. I’d be interested in hearing more about your insights.
We’re always looking for ways to improve the benchmarks. Here are a few
quick things I’d like to point out:

·         Intel used a simple linked list of particles in an early version
of the program, but the performance was disappointing. I don’t have any
numbers on that though.
[JD] The particles in my implementation are stored in an array of Particle
objects. Each Particle object has a link pointer whose purpose is to form a
small collection of the particles within a Cell, with the Cell holding the
head pointer. In your revised method (current PARSEC) each cell maintained a
table of 16 particles, and when particles migrated they were supposed to be
copied from one cell to another, but the copying phase was not performed in
the code (the particles were simply reset). The lack of the copy may have
produced a misleading assumption that the revised method (current PARSEC)
gave superior performance.

·         Please keep in mind that increasing the kernel radius h is not
necessarily desirable. Setting h to infinity would give you perfectly
realistic fluid behavior, but for a program like fluidanimate which computes
fluid movements for an animation all we need is fluid that behaves “good
enough” from the point of view of a human observer. You will get diminishing
returns as you increase h.
[JD] I addressed this in my prior email. The purpose of the (current PARSEC)
cell is to reduce the number of candidate cells yet preserve some level of
fidelity. The current PARSEC cell is designed so that the majority of the
particles in a cell will interact (those within distance h of each other).
The cell size could exceed the interaction radius h; for example, a cell
size of 2h would contain 8x (2^3) the number of particles. Yet the
calculations can continue to test if(distSq < hSq) and produce the same
answer, at the expense of taking the false path around the if statement that
qualifies the within-h distance. The benefit of a larger cell size (but not
a larger h) comes from the reduction in the number of cells; the expense is
additional calculations. The cost savings is in the reduction of shuffling
particles from cell to cell. There should be a sweet spot for cell size.
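The trade-off can be put in rough numbers. This is a back-of-envelope
sketch assuming a uniform particle density rho (an assumption not stated
in the thread); the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Back-of-envelope sketch, assuming uniform particle density rho
// (particles per unit volume): with cubic cells of side k*h, the 3x3x3
// cell neighborhood searched for a focus cell holds about
// 27*rho*(k*h)^3 candidate particles, while only the ball of radius h
// around a particle (volume (4/3)*pi*h^3) holds true interaction
// partners. Larger k means fewer cells to manage and less particle
// shuffling, but more failed if(distSq < hSq) tests.
const double kPi = 3.14159265358979323846;

double CandidatesPerNeighborhood(double rho, double h, double k)
{
    const double side = k * h;
    return 27.0 * rho * side * side * side;
}

double TruePartnersPerParticle(double rho, double h)
{
    return (4.0 / 3.0) * kPi * rho * h * h * h;
}
```

Doubling the cell side (k = 2) multiplies the candidate count by 8, which
is exactly the extra false-path work the "sweet spot" analysis would weigh
against the savings in cell management.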

Note: the current PARSEC calculations are redundantly recalculating
AND redundantly accumulating density and acceleration. E.g., assume Cell X
has 26 neighbors Xn1, Xn2, ... Xn26, and assume for this example that Cell X
is processed first in ComputeForces. When the Cell X computation is
complete, Cell X will contain the accumulated densities and forces amongst
the particles within Cell X and cells Xn1, Xn2, ... Xn26, AND cells Xn1,
Xn2, ... Xn26 will have accumulated the forces amongst themselves as well as
with Cell X. Next, ComputeForces advances to the next cell in the system. At
some point the cell formerly known as Xn1 will be processed; call it Y. Yn26
could indeed be the former Cell X. Now cells X and Y interact again.

GetNeighborCells must be modified, or some other technique employed,
such that each pair of neighboring cells is processed only once.

·         In any case the computational complexity of the program grows
significantly as you increase h, which will limit the kernel radius in
practice. Increasing h causes a cubic increase of the volume in which
particle interactions are considered, and computing particle interactions is
a quadratic in the number of particles because you need all pair-wise
interactions.
[JD] see prior discussion.

·         The cell sizes are chosen such that they match exactly the kernel
radius, which means you can get a list of all particles that might interact
with a particle in the current cell simply by looking up all the particles
in neighboring cells. This is the acceleration structure I refer to in the
technical report; its only purpose is to let us cut away uninteresting
particles as fast as possible, so you probably want to keep that
relationship between the cell size and the kernel radius to avoid additional
checks of the distances between particles. If you find a better way to do
that then please let me know.
[JD] I will begin with the cell size = h and then experiment with increasing
the size.

It should be noted that this is a relatively simple design that could serve
as the basis for expansion, for example adding charge to the particles (and
mass, radius, etc.). When adding charge you will likely require a different
h. Possibly this can be accommodated by having the get-neighbors routine
take an argument that is a multiplier of h (using the same cell size), e.g.
the current behavior is within 1 h of the current cell; the new feature
would be within n * h of the current cell.

Jim

Best,

Chris

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Dempsey
Sent: Thursday, February 25, 2010 4:18 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Chris,

I am in the process of writing my own implementation of fluidanimate for use
with QuickThread. I am taking generous liberties with the design. My current
design (work in progress) is having each Cell use a simple linked list of
Particles (new data object) as opposed to an array of particles. This linked
list needs management only when a particle migrates from one cell to
another. To reduce the frequency of migration I can play with the magnitude
of h (the length of the cell cube dimension). However, increasing h will
increase the number of computations in the density calculation (a larger
number of failures on if(distSq < hSq)). Some trade-off analysis will have
to be run.

The GetNeighborCells has been replaced with a new struct whose ctor builds a
table of neighboring particles (not cells). For this table of neighboring
particles I too settled on a linked list of arrays (1024 particles per node,
with the first node on the stack in the base object). I have been toying
with the idea that in addition to the particle position p (x, y, z), the
Cell object could also carry the integer quantized position ix, iy, iz
(currently in units of h relative to the base of the domain). With this
addition I can increase the h of the cell but quantize ix, iy, iz to what
was the old h (kernelRadiusMultiplier / restParticlesPerMeter), IOW a unit
with a high probability of adjacent positions interacting. With this
installed, GetNeighboringParticles could collect only the current cell's
particles plus the particles in the adjacent cells at the shared wall
(corner). The integer test can be performed without doing the distance test:
GetNeighboringParticles will effectively collect the current cell's
particles and the particles in the adjacent cells that are one quantized
unit away from the cell. Using this subset, the quantized positions can be
used to quickly determine whether two particles are close enough to
interact; if so, the floating point determination can be made, and if at
that point interaction is warranted, the values will be updated. With
floating point calculations as fast as they are now, I will have to run some
tests to see whether the time required to avoid code exceeds the time
required to execute it.

I can also eliminate locks with appropriate non-locked thread progress
detection; I've done this before with an n-Body interaction. fluidanimate is
similar except for the Cell concept used to produce subsets of particles
that have a high probability of interaction. In the n-Body case, all bodies
interact.

I will look at the website for docs on ferret.

Jim Dempsey

_____

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Christian
Bienia
Sent: Thursday, February 25, 2010 2:31 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Hi Jim,

On fluidanimate:

We chose to implement cells as a “linked list of arrays”. That way it’s
about as flexible as a list but still nearly as efficient as an array
because the cost of pointer chasing is amortized over many particles. We
also wrote our own specialized memory management system that is
thread-private to cut down on allocation / freeing cost and nearly
completely eliminate additional synchronization in the system libraries. We
would prefer not to publish that before the next official PARSEC release.
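A "linked list of arrays" cell might look roughly like the following. This
is a minimal sketch, not PARSEC's actual (unpublished) code; the field
names and the batch size of 16 are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of a "linked list of arrays" cell (names and batch size
// are illustrative, not PARSEC's actual code): each node holds a
// fixed-size batch of particle slots, and overflow chains to a freshly
// allocated node, so the cost of pointer chasing is amortized over
// PARTICLES_PER_NODE particles.
const int PARTICLES_PER_NODE = 16;

struct Vec3 { float x, y, z; };

struct CellNode {
    Vec3 p[PARTICLES_PER_NODE];  // positions for this batch of particles
    CellNode* next;              // next batch, or nullptr if last
    CellNode() : next(nullptr) {}
};

struct Cell {
    CellNode head;   // first batch lives inline in the cell: no allocation
    int numPars;     // total particle count across all batches
    Cell() : numPars(0) {}

    void AddParticle(const Vec3& pos) {
        CellNode* node = &head;
        int slot = numPars;
        while (slot >= PARTICLES_PER_NODE) {  // walk to the last batch
            if (node->next == nullptr)
                node->next = new CellNode();  // grow only on overflow
            node = node->next;
            slot -= PARTICLES_PER_NODE;
        }
        node->p[slot] = pos;
        ++numPars;
    }
};
```

Since most cells hold fewer than one batch of particles, the common case
touches no pointers at all, which is what keeps it "nearly as efficient as
an array".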

On ferret:

Great job porting it to Windows x64! That sounds like it was a fight. You
can read the high-level description I wrote up in the technical report that
we publish on the PARSEC website. That should give you at least a rough idea
on what the program does. The ferret documentation you’re referring to must
be the one in the doc/manual subdirectory of the source? I gotta say I never
worked with it because the authors of the program sit in my office. We can
certainly add some pdf / html files. What exactly is it that you think we
should add?

- Chris

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Dempsey
Sent: Wednesday, February 24, 2010 4:56 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Hi Chris,

Thanks for the rapid response. I was aware that the hard-coded limit of 16
particles per cell could be a potential problem. The initialization code,
with evenly distributed particles, would result in ~8 particles per cell (on
the 500K sim with 4 cores). So unless pressure or density exceeded 2x the
initial value in any given cell there would have been no problem. For a
relatively static gas it might have been well enough behaved not to
experience an overflow.

One solution would be to use expandable vectors, but you might want to hand
code this as opposed to using std::vector (or TBB's concurrent_vector),
since using those vectors might preclude the use of SSE vector instructions
on them.

A second solution would be adjustable cells, i.e. when a given cell requires
more than its current maximum number of particles, the cell's particle
capacity changes. As long as the capacity changes were infrequent, the extra
work required to resize the particle capacity might not be all that bad.

Using expandable vectors could be problematic if, for example, you have a
high pressure wave propagate through the cavity. In that situation you would
require excessive amounts of memory. It would seem the way to resolve this
would be to permit the vectors to shrink as well as expand; you would like
to avoid paging if at all possible.

When might we expect the revised version? Could you post the revised
"serial" version of the code as alpha or beta and then post the others as
they mature? Each version is a single file and requires no change in build
procedure (other than for addition of visualization)

Off topic

I was porting ferret to Windows x64. The port was rather difficult in that I
also had to port GSL, LSH and Magick (plus Pthreads). The program runs, but
I haven't a clue as to what it is doing. The manual to ferret is not on the
system as a .PDF or HTML file, however it can be built using texi2dvi.
Unfortunately on Windows TeX and texi2dvi are not present. Could these be
included in the tools folder?

Jim Dempsey

_____

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Christian
Bienia
Sent: Wednesday, February 24, 2010 1:50 PM
To: 'PARSEC Users'
Subject: Re: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

Hi Jim,

There is indeed a bug in fluidanimate that makes the program recompute the
first frame over and over again, as you and Milo pointed out correctly.
However, the issue is a little more complicated than simply swapping the
pointers to the cell arrays because if the fluid state is simulated for more
frames then the number of particles per cell can exceed the hard-coded limit
of 16 particles per cell that are currently supported by fluidanimate. (That
is the reason for the crash that Milo reported.)

We fixed the issues and have a significantly improved version of the program
that will be released with the next version of PARSEC. The new version also
includes some more features such as the possibility to visualize the fluid.

Best,

Chris

From: parsec-users-bounces at lists.cs.princeton.edu
[mailto:parsec-users-bounces at lists.cs.princeton.edu] On Behalf Of Jim
Dempsey
Sent: Wednesday, February 24, 2010 1:38 PM
To: 'PARSEC Users'
Subject: [parsec-users] fluidanimate Cells2 and cnumPars2 bug?

I have been looking at fluidanimate with an eye towards reducing the number
of (or eliminating) critical sections. In performing this investigation I've
noticed two issues. These issues had been reported earlier by Milo Martin
(https://lists.cs.princeton.edu/pipermail/parsec-users/2008-August/000193.html).

He mentioned his fix wasn't correct, but he may have subsequently corrected
the problem.

The Cells2 and cnumPars2 are built at file load time and never changed
afterwards.

As Milo pointed out, RebuildGrid is essentially performing a Reset Grid to
initial values. IOW no advancement is occurring (particles are "vibrating"
so to speak).

From my cursory look at the code, it seems the author's intention was for
Cells2 and cnumPars2 to contain the prior state (i.e. the read-in initial
state and, from integration step 2 onwards, the post-AdvanceParticles
state), which would then be used by RebuildGrid to prepare the grid for the
next integration step.

To fix this (which I haven't done as of this writing) either

AdvanceParticles has to advance Cells into Cells2 (and copy new cnumPars
to cnumPars2)

or

AdvanceFrame needs to add a new function CopyCellsAndcnumPars() following
AdvanceParticles()

Not only were the initial states (p, v, hv) not updated, but the particle
counts in cnumPars2 were also not updated, so the local particle density
could not change.
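One possible shape of the second fix is sketched below. The helper name
follows the one suggested above; the Cell struct here is only a stand-in
for fluidanimate's real per-cell state, and whether the buffers are copied
or pointer-swapped is an implementation choice.

```cpp
#include <cassert>

// Hypothetical sketch of the fix: after AdvanceParticles(), copy the
// advanced state into the second buffer so that RebuildGrid() works from
// the new state rather than the read-in initial state. Cell is a
// placeholder for the real per-cell particle state.
struct Cell { float p, hv, v; };

void CopyCellsAndcnumPars(Cell* cells2, int* cnumPars2,
                          const Cell* cells, const int* cnumPars,
                          int numCells)
{
    for (int i = 0; i < numCells; ++i) {
        cells2[i] = cells[i];        // post-AdvanceParticles particle state
        cnumPars2[i] = cnumPars[i];  // updated per-cell particle counts
    }
}
```

Called from AdvanceFrame() right after AdvanceParticles(), this keeps
Cells2/cnumPars2 one step behind Cells/cnumPars, which appears to be what
the original design intended.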

After I examine this a bit further, I will try to report back my findings.
(bug/noBug, if bug - bug fix)

It might be an interesting diagnostic to display the particles during
integration using a relatively small set of particles. Coding errors
(assuming this is an error) would quite easily be identified. Although this
might not catch all errors in coding the fluid dynamic formulas.

Jim Dempsey

