The Central Cluster System at CUHK has 3 GPU nodes with NVIDIA GTX 1080 Ti cards, and we hope to use these resources for our deep learning research. Here we archive the process of setting up the TensorFlow environment.
First, we create a Slurm script to apply for login access to a GPU node.
#!/bin/bash
#SBATCH -J gpu_test
#SBATCH -N 1
#SBATCH --gres=gpu:GTX1080Ti:2
sleep 99999
Submit the script with `sbatch`; now we can `ssh` to the assigned GPU node. Log in and type `nvidia-smi`, which returns:
Tue Apr 14 16:49:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 0% 32C P5 17W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:05:00.0 Off | N/A |
| 0% 41C P0 62W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:08:00.0 Off | N/A |
| 0% 37C P5 17W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
| 0% 40C P5 16W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108... Off | 00000000:85:00.0 Off | N/A |
| 0% 35C P0 61W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108... Off | 00000000:86:00.0 Off | N/A |
| 0% 39C P0 61W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108... Off | 00000000:89:00.0 Off | N/A |
| 0% 32C P0 61W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108... Off | 00000000:8A:00.0 Off | N/A |
| 0% 35C P0 61W / 260W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Here we see that there are 8 GTX 1080 Ti cards on the GPU node, with NVIDIA driver version 418.67 and CUDA version 10.1. Note that the CUDA version shown here is the “driver API” version, which is not what we need for deep learning.
We need the so-called runtime CUDA libraries, which contain the `nvcc` compiler.
Please check this link for the compatibility table, and be sure to install the matching versions to avoid problems.
Note that the CUDA 10.1 package may have some trouble configuring all the default paths: see this post.
Use:
sh cuda_10.1.105_418.39_linux.run --toolkit --toolkitpath=$HOME/tkit --defaultroot=$HOME/tkit --samples --samplespath=$HOME/tkit/samples
If you see the following summary, bingo!
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /users/b145872/soft/cuda_10_1/
Samples: Not Selected
Please make sure that
- PATH includes /users/b145872/soft/cuda_10_1/bin
- LD_LIBRARY_PATH includes /users/b145872/soft/cuda_10_1/lib64, or, add /users/b145872/soft/cuda_10_1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /users/b145872/soft/cuda_10_1/bin
Please see CUDA_Installation_Guide_Linux.pdf in /users/b145872/soft/cuda_10_1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
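Following the summary's note, the new toolkit has to be put on `PATH` and `LD_LIBRARY_PATH` by hand. A minimal sketch for `~/.bashrc`, where `CUDA_HOME` is a hypothetical prefix — match it to the `--toolkitpath` you actually passed to the installer:

```shell
# Hypothetical install prefix; adjust to your own --toolkitpath.
CUDA_HOME=$HOME/soft/cuda_10_1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}
```

After `source ~/.bashrc`, `which nvcc` should point into the new prefix.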
Check the `nvcc` compiler:
(base) [b145872@chpc-login01 cuda_10_1]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
Next, install cuDNN. Follow the instructions: just download the tar pack and unzip it into the CUDA install path. Note that you need to complete a survey first.
An error occurs:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation Variable/IsInitialized/VarIsInitializedOp: node Variable/IsInitialized/VarIsInitializedOp (defined at neural_style.py:243) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
Using the baseline test from the TensorFlow official site:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
With the result:
Num GPUs Available: 0
This is strange. I then traced the error message and found that TF could not find the cuDNN lib. This is because the unpacked cuDNN files were not put into the right `cuda/lib64` and `cuda/include` directories. Just `mv` them there.
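The move can be sketched as below. `CUDA_HOME` and the unpack location are hypothetical, and the two "demo setup" lines only exist so the sketch runs outside the cluster — skip them in real use:

```shell
# The cuDNN tar unpacks into a directory literally named "cuda".
CUDA_HOME=$HOME/soft/cuda_10_1          # hypothetical toolkit prefix
# --- demo setup (skip on the real machine) ---
mkdir -p cuda/include cuda/lib64 "$CUDA_HOME/include" "$CUDA_HOME/lib64"
touch cuda/include/cudnn.h cuda/lib64/libcudnn.so.7
# --- place headers and libs where TF looks for them ---
mv cuda/include/cudnn*.h "$CUDA_HOME/include/"
mv cuda/lib64/libcudnn*  "$CUDA_HOME/lib64/"
```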
With the environment settings done, `pip install tensorflow-gpu` will install the newest (2.1) version, which works.
I recommend using a fresh conda environment and `conda install` to avoid dependency issues. Version 2.1 can use a similar procedure; 1.14 needs a Python 3.6 environment.
conda create -n tensorflow1.14 python=3.6
source activate tensorflow1.14
conda install tensorflow-gpu==1.14
Conda will automatically install corresponding cuda and cudnn.
Updated 2020-04-14
YQ needs to use a new version of CMAQ on the HKUST cluster. Here we archive the procedure to refresh the CMAQ version.
We first check the `pgi` and `mpi` versions, which are already loaded in the environment.
which pgfortran
/usr/local/pgi-16cos7/linux86-64/2016/
which mpifort
/usr/local/pgi-16cos7/linux86-64/2016/mpi/mpich/bin/mpifort
Nice. Next we compile `HDF5` and `NetCDF`.
First, download the `zlib` source from the official site, then `configure`, `make`, `make install` as usual.
Next, download the HDF5 source code from the official site. We chose version 1.10 instead of 1.12, as v1.12 seems to have changed quite a lot.
Installation commands:
./configure --with-zlib=/home/yhuangci/soft/zlib-1.2.11-gcc --prefix=/home/yhuangci/soft/hdf5-1.10.6-pgi-16cos7 --enable-hl
make check
The check process hits a problem:
VDS SWMR tests failed with 1 errors.
I searched for a while but found few useful results. After `make clean`, I ran `make` only and there was no error report. Here I just hope to have a quick check of whether this outcome still supports the NetCDF compilation.
Set `bashrc`:
# set hdf5
HDF5=/home/yhuangci/soft/hdf5-1.10.6-pgi-16cos7
export PATH=$HDF5/bin:$PATH
export LD_LIBRARY_PATH=$HDF5/lib:$LD_LIBRARY_PATH
export INCLUDE=$HDF5/include:$INCLUDE
Source the `bashrc`, then install `NetCDF`. Pay attention to assigning the compiler and disabling the remote (DAP) support, which we do not need now:
CPPFLAGS='-I${H5DIR}/include -I${ZDIR}/include' LDFLAGS='-L${H5DIR}/lib -L${ZDIR}/lib' ./configure --prefix=${NCDIR} --disable-dap CC=pgcc
make
make check
make install
Successful! Then build the Fortran bindings.
CPPFLAGS='-I${H5DIR}/include -I${ZDIR}/include -I/home/yhuangci/soft/netcdf-473-pgi-16cos7/include' LDFLAGS='-L${H5DIR}/lib -L${ZDIR}/lib -L/home/yhuangci/soft/netcdf-473-pgi-16cos7/lib' ./configure --prefix=${NCDIR} FC=pgfortran
make
make check
make install
Great, all done! It seems that the specific failing item in the HDF5 `make check` does not influence the ensuing compilation of NetCDF. CMAQ can then be compiled using the above NetCDF configuration.
Updated 2020-03-23
After repeating the Sandy (2012) case, we finally start building the coupling framework over the Guangdong-Hong Kong-Macao Greater Bay Area (GBA). Here we archive the process of building the coupling framework for simulating Mangkhut (2018).
The Sandy (2012) case can be smoothly repeated by following the instructions in the manual, while building up the coupling framework over the GBA needs more effort.
I skip the WPS procedure for building the WRF grid system here. Now we form the ROMS grid system using the WPS-generated `geo_em` data.
1.1 Use WPS to generate a `geo_em` file that is equal to or larger than your target ROMS domain. For example, in our case we need ROMS at around 2.2 km spatial resolution with a 900x600 grid, so I first generate a `geo_em` file at 3 km with a 1000x1000 grid.
1.2 Generate the ROMS grid system from the WRF-WPS `geo_em`. There is an urban legend that all islands smaller than 4 px need to be removed to ensure a smooth run. I even tried a computer vision algorithm to label the connected components and then remove the small elements. In the end, I found this is not necessary: the underlying cause of unstable integration is still the bathymetry. step1_roms_grid_from_wps_200219.m
Meanwhile, we need to be careful to satisfy the conditions on `Lm` and `Mm`:
! Lm Number of INTERIOR grid RHO-points in the XI-direction for
! each nested grid, [1:Ngrids]. If using NetCDF files as
! input, Lm=xi_rho-2 where "xi_rho" is the NetCDF file
! dimension of RHO-points. Recall that all RHO-point
! variables have a computational I-range of [0:Lm+1].
!
! Mm Number of INTERIOR grid RHO-points in the ETA-direction for
! each nested grid, [1:Ngrids]. If using NetCDF files as
! input, Mm=eta_rho-2 where "eta_rho" is the NetCDF file
! dimension of RHO-points. Recall that all RHO-point
! variables have a computational J-range of [0:Mm+1].
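The rule above reduces to simple arithmetic; a sketch with hypothetical `xi_rho`/`eta_rho` values matching our 900x600 target (read the real ones from `ncdump -h roms_grid.nc`):

```shell
xi_rho=902; eta_rho=602        # hypothetical RHO-point dimensions from ncdump
Lm=$((xi_rho - 2))             # interior points in the XI-direction
Mm=$((eta_rho - 2))            # interior points in the ETA-direction
echo "Lm = $Lm, Mm = $Mm"      # Lm = 900, Mm = 600
```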
Final Domain Configuration (Outer and white inner box for WRF, and black dashed box for ROMS):
2.1 Use ETOPO bathymetry data to fill `h` in the generated file `roms_grid.nc`. Here we use a simple neighboring method to fill `h` in each grid cell. step2_ETOPO_bath_to_roms_200219.ncl
2.2 Preliminarily process the bathymetry using a 9-point smoother. This is optional, as the following LP method is quite effective at optimizing the bathymetry. Of note, here we also set the minimum depth to 10 m. This is quite important, as the vertical velocity can easily violate the CFL condition if the shallowest water near the coast is too shallow.
How do we judge “too shallow”? In this case, with a strong typhoon and a 10 s dtime in ROMS, the 10-m minimum depth works fine. Meanwhile, we restricted the deepest bathymetry to 3000 m, as we are not concerned with deep-sea processes in this relatively short simulation. step3_OPTIONAL_prim_process_roms_bath_200219.ncl
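The depth cut-off can be sanity-checked against the external-mode CFL estimate; a rough sketch, where the numbers are our grid spacing and depth cut-off and `awk` just does the float math:

```shell
dx=2200     # grid spacing [m]
h=3000      # deepest bathymetry after the cut-off [m]
awk -v dx="$dx" -v h="$h" 'BEGIN {
    c = sqrt(9.81 * h)                       # fastest external gravity wave [m/s]
    printf "c ~ %.0f m/s, dt_max ~ %.1f s\n", c, dx / c
}'
```

Keep in mind that ROMS sub-steps the barotropic mode relative to the 10 s baroclinic dtime, so this bounds the fast time step, not dtime itself.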
Original Bathymetry:
2.3 Smooth the bathymetry to satisfy the rx threshold, using the LP method. There is a toolkit, LP Bathymetry, from IRB in Croatia that provides several ways to deal with the problem.
Note that the `lp_solve` command-line tool needs to be installed first. Whichever version you pick, remember to download the file named lp_solve_x.x.x.x_exe_ux64.tar.gz. step4_LP_smooth_bath_200219.m
Bathymetry change after LP optimization. Note that the deepest bathymetry has been cut off at 3000 m:
Why and how do we smooth the bathymetry in ROMS? I quote a very insightful post from the ROMS forum here:
1) compute vertically stable! profile of temp and salt for the domain you are running, horizontally homogeneous! , hence not producing any density gradients. In other words, create ana vertical profile for temp and salt in a similar way as is done for many examples in ana_initial.h . In that way if you start your simulation WITHOUT any forcing! the ocean should stay (not exactly, diffusion etc, but lets pretend it does, with big confidence) in stable state, you have only vertically stable stratified density, not producing any velocity (because there are no density gradients). However we are not on z grid and there is a sigma slope so you will have effects of HPGE, which are direct consequence of “bad” rx1.
In that way you can see what is the effect of “bad bathymetry and big rx1” and where it is introducing artificial currents because of HPGE.
Usually look at the bottom, where pressure builds up.
2) I identify those regions (i.e. using magnitude of velocity) and create weight factor for smoothing, big magnitude big weight. You want to smooth only there where you have big errors from HPGE, and keep as close as possible to real bathy. After smoothing you get new bathy, run the same case again and compute magnitude again, do that in a iterative way until you are happy. In my case I have to run model for a week to get to steady state with “HPGE” currents that are not changing any more.
Also according to Dutour et al. (2009):
The initial error, called an error of the first type (Mellor et al., 1994), is easy to estimate: simply run the model with no forcing and a horizontally uniform vertical T/S profile to get the induced currents. The remaining HPG error, called an error of the second type, is much smaller but still creates artificial currents, which limit the stability of the model. It is very difficult to estimate this error and there are only heuristic rules about it based on experience with the ROMS model. To be on the safe side, it is recommended to use grids with , and most simulations with ROMS are done with and (see Shchepetkin and McWilliams (2003) for more details). Therefore, we call a grid numerically stable if , even if this is somewhat arbitrary.
Useful links:
* https://www.myroms.org/forum/viewtopic.php?t=2330
* https://www.myroms.org/forum/viewtopic.php?t=612
There is a bug in the original mtools from COAWST when using OPeNDAP through the network interface. Thus, we download the HYCOM data manually through a Python script and rewrote some code in the original mtool.
3.1 Download HYCOM data. 200211-down-hycom.py
3.2 Interpolate to the model layers. This will take a while. step5_local_gba_roms_master_climatology_coawst_mw_200219.m
In `ROMS/Modules/mod_scalars.F`, change `max_speed = 20.0_dp` to `max_speed = 100.0_dp`, as a TC may cause very strong surface currents.
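The edit is a one-liner with `sed`; below we demo it on a scratch copy so the effect is visible (the declaration line is a paraphrase, not the exact source line — on the real tree, point `sed` at `ROMS/Modules/mod_scalars.F`):

```shell
f=$(mktemp)                                   # stand-in for ROMS/Modules/mod_scalars.F
echo "        real(dp) :: max_speed = 20.0_dp" > "$f"
sed -i 's/max_speed = 20.0_dp/max_speed = 100.0_dp/' "$f"
grep "max_speed" "$f"                         # now reads 100.0_dp
```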
Follow the instructions in the manual and compile the `scrip_coawst` utility for generating the weighting file. Note that the paths in `scrip_coawst_XXX.in` should be no longer than 80 characters.
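A quick guard against the 80-character limit (the path below is just an example):

```shell
p="/home/yhuangci/work/COAWST/Projects/GBA/scrip_weights"   # example path
if [ ${#p} -le 80 ]; then
    echo "path OK (${#p} chars)"
else
    echo "path too long for scrip_coawst"
fi
```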
Current settings across 3 component models in the COAWST:
Task-based Load-Balancing Configuration:
The ROMS surf layer temp:
We really appreciate Dr. John Warner for his great efforts in building the COAWST coupling framework, which gives us the possibility to investigate our ideas in TC-wave-sea simulation. We also appreciate Dr. Mathieu Dutour Sikirić for his LP bathymetry tool to optimize the bathymetry over the SCS, which is very important for running the model stably.
Updated 2020-03-17