Networks of computers - hardware
As recently as ten years ago, parallel computing was mostly confined to so-called supercomputers: distributed memory multiprocessors (MPPs) and shared memory multiprocessors (SMPs). Parallel computing on common networks of workstations and PCs did not make sense, since the low performance of commodity network equipment prevented it from speeding up the solution of most problems. But in the 1990s, network capacity increases surpassed processor speed increases. Up-to-date commodity network technologies, such as Fast Ethernet, ATM, and Myrinet, enable data transfer between computers at rates of hundreds of megabits and even gigabits per second. As a result, not only specialized parallel computers but also local, and even global, networks of computers can now be used as parallel systems for high performance computing.
Networks of computers have thus become the most common parallel architecture available, and it is often possible to achieve more performance by using an existing set of computers, connected with up-to-date network equipment, as a single distributed memory machine than by buying a new, more powerful computer.
Networks of computers - software
The use of networks for high performance parallel computing is held back only by the absence of appropriate system software. The point is that, unlike supercomputers, networks are inherently heterogeneous: they consist of diverse computers of different performances interconnected via mixed network equipment providing communication links of different speeds and bandwidths. Traditional parallel software, ported from (homogeneous) supercomputers, distributes data, computations and communications without taking into account the differences in performance among the processors and communication links of the network. On a heterogeneous network it therefore makes the network behave as if it were a homogeneous network built from the weakest participating computers. This sharply decreases the efficiency with which the performance potential of the network is utilized and results in its poor usage for high performance parallel computing.
Currently, the main parallel programming tools for networks are MPI, PVM and HPF.
PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are message-passing packages providing, in effect, the assembler level of parallel programming for networks of computers. The low level of their parallel primitives makes writing really complex and useful parallel applications in PVM/MPI tedious and error-prone. In addition, these tools are not designed to support the development of adaptable parallel applications, that is, applications that distribute computations and communications in accordance with the input data and the peculiarities of the executing heterogeneous network. Of course, thanks to their low level, one may write a special run-time system to provide that property for a particular application, but such a system is usually so complicated that the effort required to develop it deters most users.
HPF (High Performance Fortran) is a high-level parallel language originally designed with (homogeneous) supercomputers as the target architecture. Therefore, the only parallel machine visible when programming in HPF is a homogeneous multiprocessor providing very fast communications among its processors. HPF supports neither irregular and/or uneven data distribution nor coarse-grained parallelism. A typical HPF compiler translates an HPF program into a message-passing program in PVM or MPI, and the programmer cannot influence the balance among the processes of the target message-passing program. In addition, HPF is a very difficult language to compile. Even the most advanced HPF compilers (such as those produced by the Portland Group Inc., the world leader in portable HPF compilers) produce target code that runs on homogeneous clusters of workstations on average 2-3 times slower than the corresponding MPI counterparts (report on the 1998 HPF Users Group's annual meeting in Porto, Portugal, published in IEEE Computational Science & Engineering, 5(3), pp. 92-93). So, HPF is also not suitable for programming high-performance parallel computations on networks.
Summary: to utilize a heterogeneous network of computers as a single distributed memory machine, dedicated tools are needed.
Our programming tools for networks
We have addressed this problem and developed dedicated tools that solve it. Namely, we have developed a high-level parallel language, mpC, designed specifically for writing portable, adaptable applications for heterogeneous networks of computers. The main idea underlying mpC is that an mpC application explicitly defines an abstract network and distributes data, computations and communications over that network. The mpC programming system uses this information to map the abstract network onto any real executing network in a way that ensures efficient execution of the application on that network. The mapping is performed at run time, based on information about the performances of the processors and links of the real network, so the program dynamically adapts to the executing network.
The first version of the mpC programming system for networks of workstations and PCs became available early in 1997 from our homepage http://www.ispras.ru/~mpc.
The mpC programming system includes a compiler, a run-time support system (RTSS), a library, and a command-line user interface. The compiler translates a source mpC program into an ANSI C program with calls to RTSS functions. The RTSS manages the processes that constitute the parallel program and provides communications. It encapsulates a particular communication platform (currently, a subset of MPI), ensuring the platform-independence of the rest of the system components.
Our work has been supported by a two-year grant from the US Office of Naval Research (June 1995 - June 1997). The research has been listed among the most significant achievements of the Russian Academy of Sciences in computer science over the last 15 years.
Our technology of parallel programming for heterogeneous networks
Over several years of experimenting with the mpC system, we have developed a technology for its use in high-performance computing on heterogeneous networks.
The technology has been successfully applied to solving the following problems:
- efficient use on heterogeneous networks of legacy parallel software ported from supercomputers (the corresponding success story is the development of an interface between mpC and the most famous parallel linear algebra package, ScaLAPACK, which allows the latter to be used efficiently on heterogeneous networks; it took only a week for one of us to develop that interface, and only a couple of days to port a complex ScaLAPACK application to a heterogeneous network using it);
- rewriting supercomputer parallel applications in mpC, with modifications of the underlying algorithm that ensure their efficient execution on heterogeneous networks (the corresponding success story is the port of an oil-extraction simulation, written in Fortran 77 with calls to PVM (about 3000 lines of source code), from a Parsytec supercomputer to a heterogeneous network of PCs and workstations; rewriting this application in mpC, modifying the algorithm to take the heterogeneity of processor performances into account, and testing the resulting mpC program took about two weeks of work by one of us; executed on a network of 8 workstations and PCs, the program ran 3 times faster than its Fortran/PVM counterpart on the same network and 2 times faster than the Fortran/PVM application on an 8-processor segment of the Parsytec supercomputer);
- parallelization of serial applications to run on heterogeneous networks (the corresponding success story is the development of a parallel version of quanc8, the classic automatic adaptive Fortran routine based on the 8-panel Newton-Cotes rule for numerical computation of definite integrals; for computationally expensive integrands, the mpC application lets a user substantially speed up numerical integration using the available computers of a local network; note that the application automatically redistributes computations at run time, changing the volume of computations performed by each computer depending on its current workload; the work took two days by one of us);
- development of original dedicated mpC applications for heterogeneous computing networks (the corresponding success story is an application simulating the evolution of systems of bodies under Newtonian gravitational attraction; the application demonstrates multi-fold speedups compared with its very carefully written MPI counterpart when running on local heterogeneous networks of workstations).
The mpC approach has the following key properties:
- Once developed, an mpC application will run as efficiently as possible on any heterogeneous network of computers without any changes to its source code (we call this property efficient portability).
- The mpC language makes it possible to write applications that not only adapt to the nominal performances of processors but also redistribute computations and communications in response to dynamic changes in the workload of individual computers of the executing network.
- To date, mpC is unique, with no research or industrial analogs. There are tools that perform some functions of a distributed operating system and try to take the heterogeneity of processor performances in commodity networks of computers into account when scheduling tasks, in order to maximize the throughput of the network. Unlike such tools, mpC aims to minimize the running time of a given application on the executing network. This is what matters most to end users, while network throughput matters to network administrators.
- No special requirements must be met to install and use the mpC system. Only highly standard, highly portable and freely available software is needed to get the mpC programming system working.