Helping JL port WRF 3.8.1 to Tianhe-2. First, adjust the environment variables.
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# edited on Apr 1, 2018 for WRF
# User specific aliases and functions
#bug fix
source /WORK/app/osenv/ln1/set2.sh
export MODULEPATH=/WORK/app/modulefiles
#set COMMON input data path for CESM
#export COMMON_CESM_INPUT=/WORK/app/CESM_inputdata
#set Intel compiler and MPI
module load intel-compilers/13.0.0
module load MPI/Intel/MPICH/3.1-icc13
module load cmake/3.0.2
#set hdf5 and netcdf4 for WRF
module load hdf5/1.8.13/03-CF-13
module load netcdf/4.3.2/02-CF-13
#set jasper
export JASPER=/HOME/sio_goc017/WORKSPACE/jjl/soft/jasper-1.900.1
export JASPERLIB=$JASPER/lib
export JASPERINC=$JASPER/include
#set WRF
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export WRF_EM_CORE=1
export NETCDF4=0
export BUFR=1
export CRTM=1
export J="-j 12"
export MP_STACK_SIZE=4000000000
export OMP_STACKSIZE=4000000000
export KMP_STACKSIZE=4000000000
ulimit -s unlimited
ulimit -c unlimited
export MPI_BUFFER_SIZE=128
Note that the build sits at the cumulus modules for about 20 minutes. With the environment variables above, the build failed at link time:
wrf.o: In function `MAIN__':
wrf.f90:(.text+0x1f): undefined reference to `__intel_new_feature_proc_init'
libwrflib.a(module_check_a_mundo.o): In function `module_check_a_mundo_mp_check_nml_consistency_':
module_check_a_mundo.f90:(.text+0x31fa): undefined reference to `__intel_ssse3_rep_memmove'
module_check_a_mundo.f90:(.text+0x35ea): undefined reference to `__intel_ssse3_rep_memmove'
libwrflib.a(module_mp_etanew.o): In function `module_mp_etanew_mp_etanewinit_':
module_mp_etanew.f90:(.text+0xacb0): undefined reference to `__intel_ssse3_rep_memmove'
libwrflib.a(module_mp_thompson.o): In function `module_mp_thompson_mp_table_ccnact_':
module_mp_thompson.f90:(.text+0x2ed74): undefined reference to `__intel_ssse3_rep_memmove'
libwrflib.a(module_ra_rrtm.o): In function `module_ra_rrtm_mp_rrtm_lookuptable_':
module_ra_rrtm.f90:(.text+0x256ea): undefined reference to `__intel_ssse3_rep_memmove'
libwrflib.a(module_ra_rrtmg_lw.o):module_ra_rrtmg_lw.f90:(.text+0x769bf): more undefined references to `__intel_ssse3_rep_memmove' follow
/WORK/sio_goc017/jjl/wrfdata/WRFV3/external/io_grib_share/libio_grib_share.a(io_grib_share.o): In function `transpose_grib_':
io_grib_share.f90:(.text+0x1ee4): undefined reference to `_intel_fast_memmove'
/WORK/sio_goc017/jjl/wrfdata/WRFV3/external/io_grib_share/libio_grib_share.a(io_grib_share.o): In function `transpose1d_grib_':
io_grib_share.f90:(.text+0x26f5): undefined reference to `_intel_fast_memmove'
io_grib_share.f90:(.text+0x29e3): undefined reference to `_intel_fast_memmove'
0.57user 4.84system 0:08.12elapsed 66%CPU (0avgtext+0avgdata 487696maxresident)k
12392inputs+102200outputs (47major+183178minor)pagefaults 0swaps
make[1]: [em_wrf] Error 1 (ignored)
A Google search turned up the following on a troubleshooting page on Intel's site:
I believe the message may be caused by mixing objects built with different compiler versions. The __intel_new_feature_proc_init entry point is only in recent compiler libraries; if you have code built with a recent compiler, but are linking to older run-time libraries, you might encounter such a problem.
That makes sense: objects from an earlier build were probably never removed. Run
$ ./clean -a
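The link step above ended with `Error 1 (ignored)`, so make carried on regardless and a failure like this is easy to miss in a long build. A minimal sketch for checking a saved build log after rebuilding; the helper name `count_link_errors` and the log file name `compile.log` are assumptions for illustration, not part of WRF:

```shell
#!/bin/sh
# Hypothetical helper: count "undefined reference" lines in a saved build log.
# $1 is the path to the log file (for example compile.log, an assumed name).
count_link_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper from reporting failure.
    grep -c 'undefined reference' "$1" || true
}
```

For example, if the build output was redirected to `compile.log`, running `count_link_errors compile.log` prints 0 once no undefined references remain.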
Trying again produced a puzzling error:
nup_em.f90(68): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [MODULE_INITIALIZE_REAL]
USE module_initialize_real, only : wrfu_initialize
-------^
real_em.f90(12): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [MODULE_INITIALIZE_REAL]
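The message says the compiled module file could not be opened. I checked the directory and the file was definitely there, which is bizarre. Rerunning the compile command then succeeded. I suspect Tianhe's file system is unreliable.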
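Since a failed step can leave the build incomplete without stopping it, it may help to confirm that the executables exist once the compile finishes. A rough sketch, assuming the em_real build places `wrf.exe` and `real.exe` under `main/`; the helper name `missing_executables` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every expected file that is missing or not
# executable. $1 is the directory to look in; the remaining arguments are
# the file names to check.
missing_executables() {
    dir=$1
    shift
    for exe in "$@"; do
        # Print the name only when the file is absent or lacks the x bit.
        [ -x "$dir/$exe" ] || echo "$exe"
    done
}
```

Run from the WRF source directory, `missing_executables main wrf.exe real.exe` prints nothing when both files are present and executable.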
Building WPS then failed:
make[1]: pathf90: Command not found
It turned out I had picked the wrong architecture option: for WPS the right choice is #19, not #15 as in the WRF configure step.
Updated 2018-04-01
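According to CM, WRF on Tianhe does not work unless it is compiled with NetCDF-4 (NC4) enabled. Sure enough, JL's run failed. Next step: test whether turning NC4 on fixes it.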
Updated 2018-04-03
Started testing the TensorFlow image classifier! Following this video, the training step failed:
ERROR:tensorflow:Couldn't understand architecture name ''
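The cause was that the environment variable I had set could not be read, so I put the ARCH assignment and the python command together in one shell script, which fixed it. Training took a little under five minutes; the 30-60 min figure people quote is presumably for CPU-only runs, and TensorFlow probably picked up CUDA automatically.
Testing afterwards gave very good results.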
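The log does not say why the variable was unreadable. One common cause, shown here only as a possibility, is that a variable assigned without `export` is visible to the current shell but not to the child processes it starts. A minimal sketch; the value `example_arch` is a placeholder, not the value used in the original run:

```shell
#!/bin/sh
# Start from a clean state so an ARCH inherited from the caller does not
# affect the demonstration.
unset ARCH

# Placeholder value; the original value of ARCH is not recorded in the log.
ARCH="example_arch"

# A plain assignment is not passed on to child processes.
unexported=$(sh -c 'printf "%s" "$ARCH"')

# After export, child processes inherit the variable.
export ARCH
exported=$(sh -c 'printf "%s" "$ARCH"')

echo "before export: '$unexported'"
echo "after export:  '$exported'"
```

Keeping the assignment and the command in the same script also works without `export` when the script passes the value on the command line, because the shell expands `${ARCH}` itself before launching the program.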
Updated 2018-03-30
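On Wednesday I went back to the East Campus for TA duty and decided to deal with the lab server and find out whether the content of my WordPress blog could still be saved. Since I had rebuilt the disk in slot 1 last time, this time I tried swapping out the old, degraded disk in slot 0. But once the slot 0 disk was replaced, the slot 1 disk showed up as a non-RAID disk and the new disk in slot 0 was not recognized at all. WTF. I suspect the previous rebuild never finished, so I put the old disk back into slot 0 and left the RAID card to sort itself out. With the old disk back in, the RAID information was recognized normally and the system booted. I checked the MySQL database and found that the wp_posts data was all still there. What luck, my tech blog can be saved! I exported it right away.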
mysqldump -h localhost -u root -p --databases wordpress | gzip > backupfile.sql.gz
gunzip -c backupfile.sql.gz > bck.sql
Then import it from within mysql:
source ~/bck.sql;
When viewing Chinese text, make sure the connection uses UTF-8:
set names utf8;
Later I plan to write a Python script that converts the old posts to Markdown and puts them into the Jekyll blog.
Updated 2018-03-28