This paper describes a delegation-based, high-throughput MPI communication mechanism that operates under tight memory constraints on a many-core-oriented hybrid parallel computer. Toward the exascale era, attention is focusing on hybrid parallel computers that combine many-core and multi-core architectures on the same node. Although many-core architectures such as GPUs or Intel MIC offer high potential computing power through their large number of cores, their per-core computing power is lower than that of multi-core CPUs. Furthermore, the memory available to many-core CPUs is much smaller than that available to multi-core CPUs. As a result, using a standard MPI library can incur a penalty in memory utilization for MPI communications. We therefore deploy a delegatee process on each node to merge MPI communications and minimize the memory used for an MPI communicator. Another advantage of the delegatee process scheme is that it minimizes memory utilization on the many-core CPUs by delegating MPI requests to the associated delegatee process on the multi-core CPUs. In this paper, we show the performance advantages and effective resource utilization of our proposed scheme compared with the original MPI implementation.
Paper available at IEEE.