The newly updated SCREAM repo removed the README describing how to port the code, but the older version can still be fetched by tracing back through the commit history.
The first problem comes from building Kokkos:
CMake Error at cmake/kokkos_functions.cmake:64 (MESSAGE):
Matching option found for Kokkos_ENABLE_SERIAL with the wrong case
KOKKOS_ENABLE_SERIAL. Please delete your CMakeCache.txt and change option
to -DKokkos_ENABLE_SERIAL=ON. This is now enforced to avoid hard-to-debug
CMake cache inconsistencies.
This error essentially asks you to use the camel-case spelling of the flags (Kokkos_ENABLE_*) on the command line. Just follow what it suggests.
The command changes to:
cmake \
-D CMAKE_INSTALL_PREFIX=${RUN_ROOT_DIR}/kokkos/install \
-D CMAKE_BUILD_TYPE=Debug \
-DKokkos_ENABLE_DEBUG=ON \
-DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=OFF \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ENABLE_PROFILING=OFF \
-DKokkos_ENABLE_DEPRECATED_CODE=OFF \
-DKokkos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=OFF \
${KOKKOS_SRC_LOC}
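After the configuration succeeds, the usual build-and-install step follows (the job count here is just an example):
make -j8 install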
With this, Kokkos installs successfully. When building SCREAM, however, several errors appear:
CMake Error: File /users/b145872/project-dir/app/scream/components/scream/../cam/src/physics/rrtmgp/external/rrtmgp/data/rrtmgp-data-lw-g224-2018-12-04.nc does not exist.
CMake Error: File /users/b145872/project-dir/app/scream/components/scream/../cam/src/physics/rrtmgp/external/rrtmgp/data/rrtmgp-data-sw-g224-2018-12-04.nc does not exist.
These files can easily be downloaded with a quick search, but here I noticed something odd: some modules seem to be missing from the scream folder. They are supposed to live in the externals folder, whose structure is visible on GitHub, yet it was never cloned to the local path.
That is interesting! It turns out these folders on GitHub actually point to other repos: they are [git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)!
The right way to clone the repo with submodules:
git clone --recurse-submodules
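If the repo was already cloned without submodules, they can also be fetched in place:
git submodule update --init --recursive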
Next comes an MPI issue (cannot find mpi.h). Add the include paths to .bashrc:
export CPLUS_INCLUDE_PATH=$INCLUDE
export C_INCLUDE_PATH=$INCLUDE
New error:
/users/b145872/project-dir/app/scream/components/scream/ekat/src/ekat/util/scream_arch.cpp(34): error: class "Kokkos::Serial" has no member "impl_is_initialized"
ss << "ExecSpace initialized: " << (DefaultDevice::execution_space::impl_is_initialized() ? "yes" : "no") << "\n";
We then found that the default Kokkos settings are not what we want; Serial is not acceptable.
-- Final kokkos settings variable:
-- env;KOKKOS_CMAKE=yes;KOKKOS_SRC_PATH=/users/b145872/project-dir/app/scream/externals/kokkos;KOKKOS_PATH=/users/b145872/project-dir/app/scream/externals/kokkos;KOKKOS_INSTALL_PATH=/users/b145872/project-dir/app/scream_run/scream_test01/kokkos/install;KOKKOS_ARCH=None;KOKKOS_DEVICES=Serial;KOKKOS_DEBUG=no;KOKKOS_OPTIONS=disable_dualview_modify_check;KOKKOS_USE_TPLS=librt
We then re-source ~/.bashrc. The configure step now picks up the MPI settings, with OpenMP as the parallel backend. A new error appears when configuring SCREAM:
CMake Error at /users/b145872/project-dir/app/scream/externals/kokkos/cmake/kokkos_functions.cmake:49 (MESSAGE):
Matching option found for Kokkos_ENABLE_DEBUG with the wrong case
Kokkos_ENABLE_Debug. Please delete your CMakeCache.txt and change option
to -DKokkos_ENABLE_DEBUG=FALSE. This is now enforced to avoid
hard-to-debug CMake cache inconsistencies.
This is weird: when configuring Kokkos we passed exactly "-DKokkos_ENABLE_DEBUG=ON", and when configuring SCREAM there is no such option at all. Using grep, we found several occurrences of "Debug" in components/scream/ekat/cmake/Kokkos.cmake; after changing them to "DEBUG", we pass this point…
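For reference, the offending lines can be located with a plain grep on the file named in the error message:
grep -n "Debug" components/scream/ekat/cmake/Kokkos.cmake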
New error, at 26% of the build:
/users/b145872/project-dir/app/scream/components/scream/ekat/src/ekat/scream_kokkos_meta.hpp(18): error: class "Kokkos::MemoryTraits<0U>" has no member "RandomAccess"
value = ((View::traits::memory_traits::RandomAccess ? Kokkos::RandomAccess : 0) |
According to the Kokkos programming guide, RandomAccess is a memory trait tied to CUDA. Does that mean we also have to build CUDA for SCREAM? If so, SCREAM would be required to run on GPU nodes. As a simple test, let us first see what happens if we build Kokkos in Serial mode.
Serial mode shows similar errors. Moreover, it seems all memory-access traits, including "Atomic", are missing, so this may not simply be a "need CUDA" issue. Interesting.
When I used the latest Kokkos from GitHub, different errors occurred when trying "Serial" and "OpenMP". When only "Serial" is turned on, MPI errors occur while compiling SCREAM.
On Jul 20, I found a new version of the master branch; following the new instructions in build.md, the test finished successfully.
However, when I turn on CUDA (version 10.1), there are still problems.
Here is a simple example of the fork-and-pull-request workflow on GitHub.
We first fork the community repo, then clone it to our local machine. There, we create our own branch:
git checkout -b test-brch
After modification:
git add .
git commit -m "test-brch"
git push origin test-brch
This pushes the revised branch to our remote GitHub fork. Go to that repo, switch to the branch, and submit a pull request to the original repo; the community admins will then see our pull request.
If we hope to merge the branch locally:
git checkout master    # switch to the master branch
git merge test-brch
Now it is safe to delete the test-brch branch.
git branch -d test-brch
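To also remove the branch from the remote fork:
git push origin --delete test-brch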
Updated 2020-07-10
In the Spellcaster! project, we combined observational analysis data and S2S forecast data to form the forecast inputs.
With the initial scratch code I wrote, the process was very slow: for 2000 stations, it took nearly 30 minutes to complete the combination.
I speculated that the bottleneck was in the IO, so here I try Python's multiprocessing to increase the IO throughput.
The original hotspot code:
for idx, row in sta_df.iterrows():
    sta_num = str(int(row['区站号']))           # station ID
    # print(sta_num+' '+row['省份']+' '+row['站名'])  # province, station name
    lat_sta = conv_deg(row['纬度(度分)'][0:-1])  # latitude in degree-minute form
    lon_sta = conv_deg(row['经度(度分)'][0:-1])  # longitude in degree-minute form
    var = var1.sel(lat=lat_sta, lon=lon_sta, method='nearest')
    # climatology and anomaly are recomputed for every station -- the hotspot
    clim_var = var.loc['1981-01-01':'2010-12-31'].groupby("time.month").mean()
    ano_var = (var.groupby("time.month") - clim_var)
    ano_series = np.concatenate((ano_var.values, np.array((0.0,)), (fcst_var1.sel(LAT=lat_sta, LON=lon_sta, method='nearest').values,)))
    np_time = np.append(hist_time.values, np.datetime64('now'))
    np_time = np.append(np_time, fcst_time.values)
    df = pd.DataFrame(ano_series, index=np_time, columns=['prec_ano'])
    df = df.fillna(0)
    df.to_csv(blend_outdir + sta_num + '.prec.csv')
Rewriting this part with multiprocessing, the main function (with the imports the script needs) becomes:
import os
import time
from multiprocessing import Pool

import numpy as np
import pandas as pd
import xarray as xr

def main():
    # number of processes in use
    ntasks = 4
    # PREC/L data
    ds = xr.open_dataset(prec_arch_fn)
    var1 = ds['precip'].loc['1979-01-01':, :, :]
    hist_time = ds['time'].loc['1979-01-01':]
    #print(var1.loc['1981-01-01':'2010-12-31',:,:])
    clim_var1 = var1.loc['1981-01-01':'2010-12-31'].groupby("time.month").mean()
    ano_var1 = (var1.groupby("time.month") - clim_var1)
    # S2S data
    ds_s2s = xr.open_dataset(s2s_fcst_file)
    fcst_var1 = ds_s2s['anom'][0, 0, 0, :, :]
    fcst_time = ds_s2s['TIME']
    np_time = np.append(hist_time.values, np.datetime64('now'))
    np_time = np.append(np_time, fcst_time.values)
    # read in station metadata
    sta_df = get_station_df(sta_meta_file)
    print('Parent process %s.' % os.getpid())
    # start process pool
    process_pool = Pool(ntasks)
    len_df = sta_df.shape[0]
    len_per_task = len_df // ntasks
    # dispatch tasks ID 0 to ntasks-2, each with an equal-sized slice of stations
    for itsk in range(ntasks-1):
        process_pool.apply_async(combine_data, args=(itsk, sta_df[itsk*len_per_task:(itsk+1)*len_per_task], ano_var1, fcst_var1, np_time, blend_outdir,))
    # task ID ntasks-1 also takes the residual stations
    process_pool.apply_async(combine_data, args=(ntasks-1, sta_df[(ntasks-1)*len_per_task:], ano_var1, fcst_var1, np_time, blend_outdir,))
    print('Waiting for all subprocesses done...')
    process_pool.close()
    process_pool.join()
    print('All subprocesses done.')
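One detail not shown above: the script still needs an entry point, and on platforms that spawn new interpreters instead of forking, the __main__ guard is required so that child processes do not re-run the pool setup:
if __name__ == '__main__':
    main()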
The parallelized function:
def combine_data(itsk, sta_df, ano_var1, fcst_var1, np_time, blend_outdir):
    print('Run task %s (%s)...' % (itsk, os.getpid()))
    start = time.time()
    for idx, row in sta_df.iterrows():
        sta_num = str(int(row['区站号']))           # station ID
        # print(sta_num+' '+row['省份']+' '+row['站名'])  # province, station name
        lat_sta = conv_deg(row['纬度(度分)'][0:-1])  # latitude in degree-minute form
        lon_sta = conv_deg(row['经度(度分)'][0:-1])  # longitude in degree-minute form
        ano_var = ano_var1.sel(lat=lat_sta, lon=lon_sta, method='nearest')
        ano_series = np.concatenate((ano_var.values, np.array((0.0,)), (fcst_var1.sel(LAT=lat_sta, LON=lon_sta, method='nearest').values,)))
        df = pd.DataFrame(ano_series, index=np_time, columns=['prec_ano'])
        df = df.fillna(0)
        df.to_csv(blend_outdir + sta_num + '.prec.csv')
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (itsk, (end - start)))
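As a side note, the manual slicing in main() could be replaced with numpy's array_split, which produces near-equal chunks and handles the residual automatically; a minimal sketch reusing the same names:
for itsk, chunk in enumerate(np.array_split(sta_df, ntasks)):
    process_pool.apply_async(combine_data, args=(itsk, chunk, ano_var1, fcst_var1, np_time, blend_outdir,))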
Note:
- NetCDF operations are "lazy": combining lazy xarray operations like groupby with multiprocessing causes HDF5 IO errors, so these conflicting operations must be kept out of the parallelized function.
- Timing for the per-station sel loop: serial, ~1 min; 4 parallel tasks, ~10 s.
- The principle for optimization: perform the lazy, whole-dataset computations (opening the files, the climatology/anomaly groupby) once in the parent process, and parallelize only the independent per-station sel and CSV writes.
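A minimal sketch of that principle, reusing the names above (an illustration, not the exact production code): force the lazy xarray computation into memory in the parent process before dispatching, so the workers never touch the underlying HDF5 file handles.
ds = xr.open_dataset(prec_arch_fn)   # lazy open, parent process only
var1 = ds['precip'].loc['1979-01-01':, :, :]
clim_var1 = var1.loc['1981-01-01':'2010-12-31'].groupby("time.month").mean()
ano_var1 = (var1.groupby("time.month") - clim_var1).load()   # eager read happens here
ds.close()   # the pool workers then receive plain in-memory arrays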
Updated 2020-06-30