Galaxies

The mpC application simulating the evolution of a system of stars in a galaxy

Description of the problem

Let us consider an irregular application simulating the evolution of a system of stars in a galaxy (or set of galaxies) under the influence of Newtonian gravitational attraction.
Let our system consist of a number of large groups of bodies. It is known, that since the magnitude of interaction between bodies falls off rapidly with distance, the effect of a large group of bodies may be approximated by a single equivalent body, if the group is far enough away from the point at which the effect is being evaluated. Let it be true in our case. So we can parallelize the problem, and our application will use a few virtual processors, each of which updates data characterizing a single group of bodies. Each virtual processor holds attributes of all the bodies constituting the corresponding group as well as masses and centers of gravity of other groups. The attributes characterizing a body include its position, velocity and mass.
   Finally, let our application allow both the number of groups and the number of bodies in each group to be defined in run time.

Algorithm and program in mpC
The application implements the following scheme:

Initializing the galaxy on the virtual host-processor
Creation of the network
Scattering groups over virtual processors
Parallel computing the masses of groups
Interchanging the masses of among virtual processors
while (1) {
   Visualization of the galaxy on the virtual host-processor
   Parallel computation of centers of gravity of groups
   Interchanging the centers among virtual processors
   Parallel updating groups
   Gathering groups on the virtual host-processor
}

The corresponding mpC program looks as follows:

#define DELTA 3600.0
#define INTERVAL 3

/* The maximum number of groups */
#define MaxGs 30

/* The maximum number of bodies in a group */
#define MaxBs 60

typedef double Triplet[3];
typedef struct {Triplet pos; Triplet v; double m;} Body;

/* The number of groups */
int [host]M;

/* The numbers of bodies in groups */
int [host]N[MaxGs];

int repl dM, dN[MaxGs];

/* The galaxy timer */
double [host]t;

/* Bodies of a galaxy */
Body (*[
host]Galaxy[MaxGs])[MaxBs];

nettype GalaxyNet(m,n[m]) {
   
coord I=m;
   
node {I>=0: fast*n[I] scalar;};
   
link (J=m) {

      J>0: length*(-1) [J]->[0];
      J>0: length*1 [I]->[J];
   }
}

void [host] Input(), UpdateGroup();
void [host] VisualizeGalaxy();

void [*]Nbody(char *[host]infile) {
   /* Initializing Galaxy, M and N */
   Input(infile);

   /* Broadcasting the numbers of bodies in groups */
   dN[] = N[];

   {
      
net GalaxyNet(dM,dN) g;
      
int [g]myN, [g]mycoord;
      Body [g]Group[MaxBs];
      Triplet [g]Centers[MaxGs];
      
double [g] Masses[MaxGs];
      
int repl [g]i;

      void [net GalaxyNet(m,n[m])] Mintegrity(double (*)[MaxGs]);
      
void [net GalaxyNet(m,n[m])] Cintegrity(Triplet (*)[MaxGs]);

      mycoord = I coordof body_count;
      myN = dN[mycoord];

      /* Scattering groups */
      
for(i=0;i<[g]dM;i++)
         [g:I==i]Group[] = (*Galaxy[i])[];

      for(i=0;i<myN;i++)
         Masses[mycoord]+=Group[i].m;

      ([([g]dM,[g]dN)g])Mintegrity(Masses);

      while(1) {
         
if(((int)(t/DELTA))%INTERVAL==0)
            VisualizeGalaxy();
         Centers[mycoord][]=0.0;

         for(i=0;i<myN;i++)
            Centers[mycoord][] +=

               (Group[i].m/Masses[mycoord])*(Group[i].pos)[];
         ([([g]dM,[g]dN)g])Cintegrity(Centers);
         ([g]UpdateGroup)(Centers, Masses, Group, [g]dM);
         t+=DELTA;
         
if(((int)(t/DELTA))%INTERVAL==0)
            /* gathering groups */
            for(i=0;i<[g]dM;i++)
               (*Galaxy[i])[]=[g:I==i]Group[];
      }
   }
}

void [net GalaxyNet(m,n[m])p] Mintegrity

   (double (*Masses)[MaxGs])
{
   
double MassOfMyGroup;

   repl i, j;
   MassOfMyGroup = (*Masses)[I
coordof i];
   
for(i=0;i<m;i++)
      
for(j=0;j<m;j++)
         [p:I==i](*Masses)[j] = [p:I==j]MassOfMyGroup;
}

void [net GalaxyNet(m,n[m])p] Cintegrity
   (Triplet (*Centers)[MaxGs])
{
   Triplet MyCenter;
   
repl i, j;
   MyCenter[] = (*Centers)[I coordof i][];
   
for(i=0;i<m;i++)
      
for(j=0;j<m;j++)
         [p:I==i](*Centers)[j][] =
            [p:I==j]MyCenter[];
}

This mpC source file contains the following external definitions:

In general, a network function is called and executed on some network or subnetwork, and its value is also distributed over this region of the computing space. The header of the definition of the network function either specifies an identifier of a global static network or subnetwork, or declares an identifier of the network being a special formal parameter of the function. In the first case, the function can be called only on the specified region of the computing space. In the second case, it can be called on any network or subnetwork of a suitable type. In any case, only the network specified in the header of the function may be used in the function body. No network can be declared in the body. Only data objects belonging to the network specified in the header may be defined in the body. In addition, corresponding components of an externally-defined distributed data object may be used. Unlike basic functions, network functions (as well as nodal functions) can be called in parallel.
   Network functions
Input and VisualizeGalaxy, both associated with the virtual host-processor, as well as the nodal function UpdateGroup are declared and called here.
   Automatic network
g, executing most of computations and communications, is defined in such a way, that it consists of M virtual processors, and the relative performance of each processor is characterized by the number of bodies in the group which it computes.
   So, the more powerful is the virtual processor, the larger group of bodies it computes, and more intensive is the data transfer between two virtual processors, the shorter link connects them (length specifier
length*(-1) specifies a shorter link than length*1 does).
   The mpC programming environment bases on this information to map the virtual processors constituting network
g into the processes constituting the entire computing space in the most appropriate way. Since it does it in run time, the user does not need to recompile this mpC program, to port it to another DMM.
   The result of the binary operator
coordof (in the first statement of the inner block of the function Nbody) is an integer value distributed over g, each component of which is equal to the value of coordinate I of the virtual processor to which the component belongs. The right operand of operator coordof is not evaluated and used only to specify a region of the computing space. Note, the coordinate variable I is treated as an integer variable distributed over the region.
   Call expression
([g]UpdateGroup)(...) causes parallel execution of the nodal function UpdateGroup on each of virtual processors of network g. It is meant, that function name UpdateGroup is converted to a pointer-to-function distributed over the entire computing space, and operator [g] cuts from this pointer a pointer distributed over g. So, the value of expression [g]UpdateGroup is a pointer-to-function distributed over g. Therefore, expression ([g]UpdateGroup)(...) denotes a distributed call to a set of undistributed functions.
   Network functions
Mintegrity and Cintegrity have 3 special formal parameters. Network parameter p denotes the network executing the function. Parameter m is treated as a replicated over p integer variable, and parameter n treated as a pointer to the initial member of integer unmodifiable m-member array replicated over p. The syntactic construct ([(dM, dN)g]), placed on the left of the name of the function called in the call expressions in function Nbody, just specifies the actual arguments corresponding to the special formal parameters.

Experimental results

We compared the running time of our mpC program to the carefully written MPI counterpart. We used 3 workstations - SPARCstation 5 (hostname gamma), SPARCclassic (omega), and SPARCstation 20 (alpha), connected via 10Mbits Ethernet. There were 23 other computers in the same segment of the local network. We used LAM MPI version 5.2 as a particular communication platform.
   The computing space of the mpC programming environment consists of 15 processes, 5 processes running on each workstation. The dispatcher runs on gamma and uses the following relative performances of the workstations obtained automatically upon the creation of the virtual parallel machine: 1150 (gamma), 331 (omega), 1662 (alpha).
   The MPI program is written in such a way to minimize communication overheads. All our experiments deal with 9 groups of bodies. We map 3 MPI processes to gamma, 1 process to omega, and 5 processes to alpha, providing the optimal mapping if the numbers of bodies in these groups are equal to each other.
   The first experiment compares the mpC and MPI programs for homogeneous input data when all groups consist of the same number of bodies. Figure 1 shows the running time of both programs simulating 15 hours of the galaxy evolution depending on the number of bodies in groups.
   In fact, it shows how much we pay for the usage of mpC instead of the MPI. One can see that the running time of the MPI program consists about 95-97% of the running time of the mpC program. That is, in this case we loose 3-5% of performance.

The second experiment compares these programs for heterogeneous input data. Let out groups consist of 10, 10, 10, 100, 100, 100, 600, 600, and 600 bodies correspondingly.
The running time of the mpC program does not depend on the order of numbers. In any case, the dispatcher selects:

The mpC program takes 94 seconds to simulate 15 hours of the galaxy evolution.
The running time of the MPI program essentially depends on the order of these numbers. It takes from 88 to 391 seconds to simulate 15 hours of the galaxy evolution depending on the particular order Figure 2 shows the relative running time of the MPI and mpC programs for different permutations of these numbers. All possible permutations can be broken down into 24 disjoint subsets of the same power in such a way in such a way that if two permutations belong to the same subset, corresponding running time is equal to each other. Let these subsets be numerated so that the greater number the subset has, the longer time the MPI program takes. In Figure 2, each such a subset is represented by a bar, the height of which is equal to the corresponding value of tMPI/ tmpC.
   One can see that almost for all input data the running time of the MPI program exceeds (and often, essentially) the running time of the mpC program.


Back to The mpC Program Examples