Galaxies
The mpC application simulating the evolution of a system of stars in a galaxy
Description of the problem
Let us consider an irregular application simulating the evolution of a system of stars in a galaxy (or set of galaxies) under the influence of Newtonian gravitational attraction.
Let our system consist of a number of large groups of bodies. It is known, that since the magnitude of interaction between bodies falls off rapidly with distance, the effect of a large group of bodies may be approximated by a single equivalent body, if the group is far enough away from the point at which the effect is being evaluated. Let it be true in our case. So we can parallelize the problem, and our application will use a few virtual processors, each of which updates data characterizing a single group of bodies. Each virtual processor holds attributes of all the bodies constituting the corresponding group as well as masses and centers of gravity of other groups. The attributes characterizing a body include its position, velocity and mass.
Finally, let our application allow both the number of groups and the number of bodies in each group to be defined in run time.
Algorithm and program in mpC
Initializing the galaxy on the virtual host-processor
Creation of the network
Scattering groups over virtual processors
Parallel computing the masses of groups
Interchanging the masses of among virtual processors
while (1) {
Visualization of the galaxy on the virtual host-processor
Parallel computation of centers of gravity of groups
Interchanging the centers among virtual processors
Parallel updating groups
Gathering groups on the virtual host-processor
}
The corresponding mpC program looks as follows:
#define
DELTA 3600.0/* The maximum number of groups */
/* The maximum number of bodies in a group */
typedef double
Triplet[3];/* The number of groups */
/* The numbers of bodies in groups */
int
repl dM, dN[MaxGs];/* The galaxy timer */
/* Bodies of a galaxy */
Body (*[
nettype
GalaxyNet(m,n[m]) { J>0: length*(-1) [J]->[0];
J>0: length*1 [I]->[J];
}
}
void
[host] Input(), UpdateGroup();void
[*]Nbody(char *[host]infile) { /* Broadcasting the numbers of bodies in groups */
dN[] = N[];
{
void [net GalaxyNet(m,n[m])] Mintegrity(double (*)[MaxGs]);
mycoord = I
coordof body_count; /* Scattering groups */
for(i=0;i<myN;i++)
([([g]dM,[g]dN)g])Mintegrity(Masses);
while(1) {
for(i=0;i<myN;i++)
(Group[i].m/Masses[mycoord])*(Group[i].pos)[];
([([g]dM,[g]dN)g])Cintegrity(Centers);
([g]UpdateGroup)(Centers, Masses, Group, [g]dM);
t+=DELTA;
void
[net GalaxyNet(m,n[m])p] Mintegrity(
double (*Masses)[MaxGs])repl i, j;
void
[net GalaxyNet(m,n[m])p] CintegrityThis mpC source file contains the following external definitions:
In general, a network function is called and executed on some network or subnetwork, and its value is also distributed over this region of the computing space. The header of the definition of the network function either specifies an identifier of a global static network or subnetwork, or declares an identifier of the network being a special formal parameter of the function. In the first case, the function can be called only on the specified region of the computing space. In the second case, it can be called on any network or subnetwork of a suitable type. In any case, only the network specified in the header of the function may be used in the function body. No network can be declared in the body. Only data objects belonging to the network specified in the header may be defined in the body. In addition, corresponding components of an externally-defined distributed data object may be used. Unlike basic functions, network functions (as well as nodal functions) can be called in parallel.
Network functions
Experimental results
We compared the running time of our mpC program to the carefully written MPI counterpart. We used 3 workstations - SPARCstation 5 (hostname gamma), SPARCclassic (omega), and SPARCstation 20 (alpha), connected via 10Mbits Ethernet. There were 23 other computers in the same segment of the local network. We used LAM MPI version 5.2 as a particular communication platform.
The computing space of the mpC programming environment consists of 15 processes, 5 processes running on each workstation. The dispatcher runs on gamma and uses the following relative performances of the workstations obtained automatically upon the creation of the virtual parallel machine: 1150 (gamma), 331 (omega), 1662 (alpha).
The MPI program is written in such a way to minimize communication overheads. All our experiments deal with 9 groups of bodies. We map 3 MPI processes to gamma, 1 process to omega, and 5 processes to alpha, providing the optimal mapping if the numbers of bodies in these groups are equal to each other.
The first experiment compares the mpC and MPI programs for homogeneous input data when all groups consist of the same number of bodies. Figure 1 shows the running time of both programs simulating 15 hours of the galaxy evolution depending on the number of bodies in groups.
In fact, it shows how much we pay for the usage of mpC instead of the MPI. One can see that the running time of the MPI program consists about 95-97% of the running time of the mpC program. That is, in this case we loose 3-5% of performance.
The second experiment compares these programs for heterogeneous input data. Let out groups consist of 10, 10, 10, 100, 100, 100, 600, 600, and 600 bodies correspondingly.
The running time of the mpC program does not depend on the order of numbers. In any case, the dispatcher selects:
The mpC program takes 94 seconds to simulate 15 hours of the galaxy evolution.
The running time of the MPI program essentially depends on the order of these numbers. It takes from 88 to 391 seconds to simulate 15 hours of the galaxy evolution depending on the particular order Figure 2 shows the relative running time of the MPI and mpC programs for different permutations of these numbers. All possible permutations can be broken down into 24 disjoint subsets of the same power in such a way in such a way that if two permutations belong to the same subset, corresponding running time is equal to each other. Let these subsets be numerated so that the greater number the subset has, the longer time the MPI program takes. In Figure 2, each such a subset is represented by a bar, the height of which is equal to the corresponding value of tMPI/ tmpC.
One can see that almost for all input data the running time of the MPI program exceeds (and often, essentially) the running time of the mpC program.
Back to
The mpC Program Examples