Joe Papac

CS240A Homework 0

A parallelized, adaptive algorithm for multiphase flows in general geometries

Application

Fluid flow problems involving multiple phases can be difficult and expensive to compute. Multiphase flows have a sharp interface across which density and viscosity can be discontinuous and at which surface tension forces can be significant.

This paper presents a parallelized algorithm for the solution of multiphase fluid flows with sharp boundaries. The fluid motion is computed using the coupled level set/volume-of-fluid (CLSVOF) method.

The parallel platform

Computations were performed on an IBM eServer pSeries 690 supercomputer running AIX at Florida State University. It has 512 CPUs, which are IBM Power4 chips with a clock speed of 1.1 GHz. The processors are arranged into 16 nodes, each a tightly coupled set of 32 processors. Fifteen of these nodes give their processors access to 32 gigabytes of memory; the remaining node has 24 gigabytes. Each node also has 72 gigabytes of local disk storage. Execution on multiple processors within a single node may use OpenMP or MPI. Execution across multiple nodes requires MPI.

While any arrangement of processors can be used in parallel, one common arrangement is to have an OpenMP program running on a node, with the 32 processors sharing the entire memory. Another arrangement uses MPI, in which case processors on different nodes can cooperate, but even if the processors share the same node, they divide up the node's memory rather than sharing it (this seems to be the approach used by the author in this work). A more elaborate hybrid programming scheme allows MPI to set up several processes, assigning a single process to each node. On each node, OpenMP divides up the task among all the processors.
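The arithmetic behind these arrangements can be checked with a short calculation. The sketch below uses only the figures quoted above; the even split of a node's memory among MPI processes is an assumption made for illustration, not something the platform description states.

```python
# Figures taken from the platform description above.
NODES_32GB = 15
NODES_24GB = 1
PROCS_PER_NODE = 32

total_procs = (NODES_32GB + NODES_24GB) * PROCS_PER_NODE
total_memory_gb = NODES_32GB * 32 + NODES_24GB * 24

# OpenMP on one 32 GB node: all 32 threads address the node's full memory.
openmp_shared_gb = 32

# Pure MPI with one process per processor: ASSUME the node's memory is
# divided evenly among its 32 processes (illustrative assumption only).
mpi_per_process_gb = 32 / PROCS_PER_NODE

print(total_procs)         # 512
print(total_memory_gb)     # 504
print(mpi_per_process_gb)  # 1.0
```

Under that assumption, a pure MPI run leaves each process with about 1 gigabyte, which is one reason the hybrid scheme can be attractive for memory-hungry problems.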

Programming tools and software

This work used the parallel boxlib library developed by the CCSE group at Lawrence Berkeley National Laboratory. The boxlib library is designed to be used together with MPI.

The numerical algorithm uses adaptive mesh refinement and a multigrid method. In the grid generation step, the maximum size of each grid is determined by the number of processors available.
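One way to picture how the processor count can limit grid size is the sketch below. It is not the boxlib procedure or the author's algorithm; the function `choose_max_grid_size`, its halving rule, and the `min_size` parameter are all hypothetical and chosen only to illustrate the idea that more processors force smaller grids.

```python
from math import prod

def choose_max_grid_size(domain, n_procs, min_size=8):
    """Hypothetical rule: halve the largest box dimension until the domain
    is covered by at least n_procs boxes (or boxes would get too small)."""
    box = list(domain)

    def n_boxes(b):
        # Number of boxes needed to tile the domain (ceiling division per axis).
        return prod(-(-d // s) for d, s in zip(domain, b))

    while n_boxes(box) < n_procs:
        axis = max(range(len(box)), key=lambda i: box[i])
        if box[axis] // 2 < min_size:
            break  # refuse to create boxes smaller than min_size
        box[axis] //= 2
    return tuple(box), n_boxes(box)

for p in (1, 2, 4, 8, 16):
    print(p, choose_max_grid_size((64, 64, 64), p))
```

With a 64x64x64 domain, this rule gives every processor at least one box, at the cost of smaller boxes as the processor count grows.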

Numerical examples

3d wobbling bubble

In this problem, an air bubble rises in silicone oil. The numerical results exhibit the expected unsteady, wobbling behavior, and the rise velocity agrees well with experiments. Adaptive mesh refinement was not used, so that the effects of parallelism could be studied independently.

The computational grid is 64x64x64, and Table 1 shows the speed-up for 2, 4, 8, and 16 processors. The results show good speed-up until the 16 processor case. For the 16 processor case, the ratio of ghost cells to interior cells is 0.66. In other words, for every three interior cells a processor must also fill about two ghost cells, with data that generally has to be transferred from another processor.
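The growth of this ratio with processor count follows from simple geometry, as the sketch below shows. The box shapes and the ghost-layer width of 2 are assumptions picked for illustration; the paper's actual decomposition is not given here, so the numbers will not necessarily reproduce the reported 0.66.

```python
def ghost_to_interior_ratio(box, ghost_width):
    """Ratio of ghost cells to interior cells for one 3D box padded by
    ghost_width layers on every face."""
    nx, ny, nz = box
    g = ghost_width
    interior = nx * ny * nz
    padded = (nx + 2 * g) * (ny + 2 * g) * (nz + 2 * g)
    return (padded - interior) / interior

# Hypothetical decompositions of a 64x64x64 grid (one box per processor).
assumed_boxes = {
    1: (64, 64, 64),
    2: (32, 64, 64),
    4: (32, 32, 64),
    8: (32, 32, 32),
    16: (16, 32, 32),
}
for procs, box in assumed_boxes.items():
    print(procs, box, round(ghost_to_interior_ratio(box, ghost_width=2), 3))
```

The ratio rises steadily as the boxes shrink, because a box's surface shrinks more slowly than its volume.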

Remarks

The author claims that he achieves parallel speed-up up to about 80%. This amounts to an overall time savings of days for large 3d simulations. He remarks that the parallel speed-up is sensitive to the ratio of "ghost cells" to interior cells on each processor. The boxlib library provides routines for initializing the ghost cells of a grid with data that may exist on other processors. The need to introduce ghost cells is an indicator that the problem is spatially local in nature: updating a cell requires only the values of its near neighbors.
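The role of ghost cells can be seen in a one-dimensional toy problem. The sketch below is not boxlib code and uses no MPI; the functions `smooth_serial` and `smooth_with_ghost_cells` are hypothetical, and the "transfer" is just a copy between Python lists standing in for a message between processors.

```python
def smooth_serial(u):
    """Three-point average on the interior of a 1D array; end values held fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out

def smooth_with_ghost_cells(u, n_chunks):
    """Same update, but each chunk only sees its own cells plus one ghost
    cell per side copied from its neighbour (a message in a real MPI code)."""
    size = len(u) // n_chunks  # assumes len(u) is divisible by n_chunks
    chunks = [u[k * size:(k + 1) * size] for k in range(n_chunks)]
    result = []
    for k, chunk in enumerate(chunks):
        # Fill ghost cells; None marks a physical boundary with no neighbour.
        left = chunks[k - 1][-1] if k > 0 else None
        right = chunks[k + 1][0] if k < n_chunks - 1 else None
        padded = [left] + chunk + [right]
        for j in range(1, size + 1):
            if padded[j - 1] is None or padded[j + 1] is None:
                result.append(padded[j])  # boundary value held fixed
            else:
                result.append((padded[j - 1] + padded[j] + padded[j + 1]) / 3.0)
    return result

u = [float((i * i) % 7) for i in range(16)]
assert smooth_with_ghost_cells(u, 4) == smooth_serial(u)
```

Each chunk reproduces the serial result using only one borrowed value per side, which is the sense in which the computation is local. The cost is that those borrowed values must be refreshed every step.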

Based on the author's comments, moving to a parallel architecture is worthwhile; however, we can expect only modest performance gains because the problem is tightly coupled. The dependence of the speed-up on the number of ghost cells suggests that he is not able to use anything close to the maximum capacity of the supercomputer, because the problem is not highly parallelizable. While the problem and its numerical algorithm are not ideal candidates for parallel computing, parallelization can still bring modest benefits.

While it may seem a bit unnecessary to move to parallel computation for the simulation of a single bubble, it will become quite necessary for larger, more realistic multiphase flows. The key challenge will be to minimize the transfer of data between nodes in order to take advantage of more processors. This requires finding more parallelism in the numerical algorithm.

References

1. Sussman, M., A parallelized, adaptive algorithm for multiphase flows in general geometries, Computers and Structures, 83 (2005) 435-444.

2. Boxlib website: http://seesar.lbl.gov/ccse/Software/index.html

3. FSU Eclipse supercomputer technical description: http://www.scs.fsu.edu/hpc/sp4_specs.php