LZN's Blog CodePlayer

【已解决】CAM_F05 Resolution Out of Memory

2014-11-08
LZN

之前在天河二号(包括学校超算集群)测试的F05的CAM,都没有出现什么问题,最近在天河二号上重新跑CAM5 F05,却出现了out of memory的错误,重新配置了CAM4,还是一样的错误:

Fatal error in PMPI_Alltoallv: Other MPI error, error stack: PMPI_Alltoallv(540)....................: MPI_Alltoallv(sbuf=0x806f43e0, scnts=0xe6c75e0, sdispls=0xe764390, dtype=0x4c000829, rbuf=0x805f7650, rcnts=0xe765ba0, rdispls=0xe7673b0, dtype=0x4c000829, comm=0x84000006) failed MPIR_Alltoallv_impl(378)...............: MPIR_Alltoallv(341)....................: MPIR_Alltoallv_intra(192)..............: MPIC_Waitall(720)......................: MPIR_Waitall_impl(163).................: MPIDI_CH3I_Progress(363)...............: MPID_nem_mpich_blocking_recv(906)......: MPID_nem_glex_poll(234)................: MPIDI_nem_glex_ER_progress_PUT(74).....: MPIDI_nem_glex_ER_recv_progress(378)...: MPID_nem_handle_pkt(635)...............: MPIDI_CH3_PktHandler_EagerSend(640)....: failure occurred while posting a receive for message data (MPIDI_CH3_PKT_EAGER_SEND)

重新配置PE层,增多节点数,调整MPIBUFFER,都没用。目前搁置。forum上有贴子:

https://bb.cgd.ucar.edu/node/1001941

eaton提到检查内存leak can be a very challenging exercise.

#Up to 20141108#

俊文寒假前将model统一升级到1.2.2后似乎没有这个问题了。不过我也退出折腾高分辨率啦,还是看paper学英语重要~嘿嘿

#Up to 20150519#


Similar Posts

Comments