Patent attributes
Computer-implemented techniques for fast block-based parallel message passing interface (MPI) transpose are disclosed. The techniques achieve an in-place parallel matrix transpose of an input matrix in a distributed-memory multiprocessor environment with reduced consumption of computer processing time and storage media resources. An in-memory copy of the input matrix or a submatrix thereof to use as the send buffer for MPI send operations is not needed. Instead, by dividing the input matrix in-place into data blocks having up to at most a predetermined size and sending the corresponding data block(s) for a given submatrix using an MPI API before receiving any data block(s) for the given submatrix using an MPI API in the place of the sent data block(s), making the in-memory copy to use a send buffer can be avoided and yet the input matrix can be transposed in-place.