The SLURM scheduling and queueing system pins processes to CPUs. This worked fine with OpenMPI 1.6, but OpenMPI 1.8 introduced its own pinning policy that overrides the scheduler's. As a result, our jobs ran very slowly: every job that used fewer cores than a whole machine got pinned to the first n cores of its node, where n is the number of cores requested by the job, so multiple jobs sharing a node ended up competing for the same cores while the rest sat idle.
Fixing this is pretty simple: just disable OpenMPI’s pinning so that the MPI processes inherit the scheduler’s pinning. To do that, add the following to your job script:
export OMPI_MCA_hwloc_base_binding_policy=none
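For context, here is a minimal sketch of how that line fits into a SLURM batch script; the task count, partition name, and application name are placeholders for whatever your job actually uses:

#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --partition=compute

# Disable OpenMPI's own binding so the MPI ranks inherit SLURM's CPU pinning
export OMPI_MCA_hwloc_base_binding_policy=none

mpirun ./my_mpi_app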
Of course, as a more permanent and more fool-proof solution, set this parameter in the OpenMPI configuration file on all of your compute nodes, have the scheduler export the environment variable itself, or put it into your OpenMPI module file. The first and last options are sketched below.
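For the configuration-file route, the system-wide MCA parameter file (typically $PREFIX/etc/openmpi-mca-params.conf, where $PREFIX is your OpenMPI installation prefix) takes the same parameter without the OMPI_MCA_ prefix:

# Inherit the scheduler's CPU binding instead of applying OpenMPI's own policy
hwloc_base_binding_policy = none

For the module-file route, assuming you use Environment Modules with Tcl-style modulefiles, a line like the following in the OpenMPI modulefile would set the variable for everyone who loads the module:

setenv OMPI_MCA_hwloc_base_binding_policy none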