QE fails when calling MPI

24 Oct 2022 - Gaomou XU

Tags: qe, mpi

In the Quantum Espresso 7.1 release, $QE_ROOT/LAXlib/ptoolkit.f90 calls MPI incorrectly: it tries to free communicators that do not exist, which produces the following error:

Abort(134821893) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Comm_free: 
Invalid communicator, error stack:
PMPI_Comm_free(137): MPI_Comm_free(comm=0x7ffd27046258) failed
PMPI_Comm_free(85).: Null communicator

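Someone has already reported this bug on the official forum, but I could not find any useful response there. Fortunately, it has been resolved in the GitLab issues, although the fix has not yet been released.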

As a workaround, run make clean, manually edit $QE_ROOT/LAXlib/ptoolkit.f90 as follows, and recompile. After that, QE runs normally:

@@ line 5554
    END DO

    ! split communicator is present and must be freed on all processors
-    CALL mpi_comm_free( col_comm, ierr )
-    IF( ierr /= 0 ) &
+    IF( col_comm /= MPI_COMM_NULL ) THEN
+       CALL mpi_comm_free( col_comm, ierr )
+       IF( ierr /= 0 ) &
          CALL lax_error__( " pztrtri ", " in mpi_comm_free 25 ", ABS( ierr ) )
+    ENDIF

    DEALLOCATE(B)
    DEALLOCATE(C)
    END DO
@@ line 5564
@@ line 5931
    ! split communicator is present and must be freed on all processors
-    CALL mpi_comm_free( col_comm, ierr )
-    IF( ierr /= 0 ) &
+    IF( col_comm /= MPI_COMM_NULL ) THEN
+       CALL mpi_comm_free( col_comm, ierr )
+       IF( ierr /= 0 ) &
          CALL lax_error__( " pdtrtri ", " in mpi_comm_free 25 ", ABS( ierr ) )
+    ENDIF

    DEALLOCATE(B)
    DEALLOCATE(C)
@@ line 5941
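
Before recompiling, it may be worth checking that both hunks were actually applied. The sketch below is only an illustration: the helper name count_null_guards is hypothetical, and it assumes the guard line is written exactly as in the patch above (same spacing) and does not already occur elsewhere in the file.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a Fortran source file that guard
# the communicator free with a comparison against MPI_COMM_NULL, i.e. the
# line added twice by the patch above.
count_null_guards() {
    # -F matches the pattern as a fixed string, -c prints the number of
    # matching lines. grep exits non-zero when that number is 0, so
    # "|| true" keeps the function usable under "set -e".
    grep -cF 'IF( col_comm /= MPI_COMM_NULL ) THEN' "$1" || true
}

# Usage (assumes QE_ROOT points at the QE 7.1 source tree):
#   count_null_guards "$QE_ROOT/LAXlib/ptoolkit.f90"
```

Under those assumptions, a count of 2 suggests that both the pztrtri and the pdtrtri hunks are in place, while 0 or 1 means at least one edit is missing. After that, rebuild with the same make target used for the original build.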