The swap function from the last lecture only works for 4-byte values such as int. How can we make it universal? Look at the following implementation:
#include <string.h>  // memcpy

void swap(void *vp1, void *vp2, int size)
{
    char buffer[size];          // temporary storage for size bytes
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}
int x = 17, y = 37;
swap(&x, &y, sizeof(int));
`void *` is a generic pointer: it holds an address but carries no information about the type stored there.
`char buffer[size]` sets aside enough space to hold size bytes.
`memcpy(target, source, size)` is a more generic copy: it copies size bytes from source to target, whatever the type.
Most of the effort goes into allocating and deallocating the temporary buffer.
The perk of this method compared to a C++ template is that it is lean and economical: one compiled copy of swap serves every type, while a C++ template generates a separate copy of the code in the executable for each type it is instantiated with.
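To make the "one copy serves every type" point concrete, here is a minimal sketch that reuses the same byte-wise swap for a double and for a struct. The names `swap_bytes`, `struct point` and `demo_swap` are hypothetical, chosen only for this illustration, and the buffer relies on C99 variable-length arrays.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise swap, same idea as swap above (hypothetical name swap_bytes). */
static void swap_bytes(void *vp1, void *vp2, size_t size)
{
    assert(size > 0);               /* a variable-length array needs a positive size */
    unsigned char buffer[size];     /* C99 variable-length array on the stack */
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Hypothetical struct, used only to show a non-4-byte type. */
struct point { double x, y; };

/* Returns 1 if both swaps exchanged the values as expected. */
static int demo_swap(void)
{
    double a = 1.5, b = -2.0;
    swap_bytes(&a, &b, sizeof(double));

    struct point p = {1.0, 2.0}, q = {3.0, 4.0};
    swap_bytes(&p, &q, sizeof(struct point));

    return a == -2.0 && b == 1.5
        && p.x == 3.0 && p.y == 4.0
        && q.x == 1.0 && q.y == 2.0;
}
```

The caller is responsible for passing the right size; nothing in the function can check that both arguments really have that size.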
Wrong usage:
int i = 44;
short j = 5;
swap(&i, &j, sizeof(short));
With generic code you have to be very careful about types. The call above gives an unexpected result because short and int are different types with different byte lengths: only sizeof(short) bytes of i are exchanged, and the rest of i is left unchanged.
Another example of the generic code (a very complicated case):
char * husband = strdup("Fred");
char * wife = strdup("Wilma");
// Right usage
swap(&husband, &wife, sizeof(char *));
Note that here we actually swap the bit patterns (the addresses stored in the char * variables) of the two pointers, not the characters themselves. We still need the address-of operator &. See the blackboard:
Some terrible errors:
swap(husband, wife, sizeof(char *)); // with 4-byte pointers, swaps the first 4 chars of the two strings in the heap
swap(husband, &wife, sizeof(char *)); // chaos: the first 4 chars of "Fred" end up in wife and are interpreted as an address!
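The correct call can be checked with a small sketch. It assumes a byte-wise swap like the one above (repeated here under the hypothetical name `swap_bytes` so the block stands alone) and uses string literals instead of strdup to avoid heap management.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise swap, repeated so this block is self-contained. */
static void swap_bytes(void *vp1, void *vp2, size_t size)
{
    assert(size > 0);
    unsigned char buffer[size];
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
}

/* Returns 1 if the two pointers were exchanged. */
static int demo_string_swap(void)
{
    const char *husband = "Fred";
    const char *wife = "Wilma";

    /* Pass the addresses of the pointer variables: the stored
       addresses are exchanged, the characters never move. */
    swap_bytes(&husband, &wife, sizeof(char *));

    return strcmp(husband, "Wilma") == 0 && strcmp(wife, "Fred") == 0;
}
```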
An example for linear search:
int lsearch(int key, int array[], int size)
{
    for (int i = 0; i < size; i++) {
        if (array[i] == key) {
            return i;
        }
    }
    return -1;
}
Let’s make it generic! What about an array whose element type is unknown?
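void *lsearch(void *key, void *base, int n, int elemSize)
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;  // address of the i-th element
        if (memcmp(key, elemAddr, elemSize) == 0)
            return elemAddr;
    }
    return NULL;  // reached only after every element has been checked
}
It does not work for character pointers or C-strings: memcmp would compare the pointer values (addresses), not the characters they point to.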
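One common way around this limitation, offered here as an assumption rather than something stated in these notes, is to let the caller supply a comparison function. The sketch below uses the hypothetical names `lsearch_cmp`, `cmp_cstring` and `find_note`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic linear search that delegates the comparison to the caller. */
static void *lsearch_cmp(const void *key, void *base, size_t n, size_t elemSize,
                         int (*cmp)(const void *, const void *))
{
    for (size_t i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmp(key, elemAddr) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Both arguments point to a char *, so dereference once before strcmp. */
static int cmp_cstring(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    return strcmp(s1, s2);
}

/* Returns the index of name in a small sample array, or -1 if absent. */
static int find_note(const char *name)
{
    static const char *notes[] = {"Ab", "F#", "B", "Gb", "D"};
    size_t n = sizeof notes / sizeof notes[0];
    const char **found = lsearch_cmp(&name, notes, n, sizeof(char *), cmp_cstring);
    return found == NULL ? -1 : (int)(found - notes);
}
```

Note that the key is passed as `&name`, the address of a `char *`, so that the key and the array elements have the same shape.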
Function templates are special functions that can operate with generic types. This allows us to create a function template whose functionality can be adapted to more than one type or class without repeating the entire code for each type.
template <class myType>
myType GetMax (myType a, myType b) {
return (a>b?a:b);
}
int x = 3, y = 7;
GetMax<int>(x, y);
perk: an extra benefit; sedate (v.): to calm or soothe; sentinel: a guard or sentry
Updated 2020-08-06
double d=3.1416;
char ch = *(char *)&d; // 1. &d takes the address of d --> 2. (char *) reinterprets it as a pointer to char --> 3. * dereferences it, reading one byte as a `char`
Another dangerous test:
short s=45;
double d=*(double *) &s;
Note that a double needs 8 bytes to store its value, while a short takes only 2 bytes, so this operation is very dangerous: the dereference reads 6 bytes beyond s that do not belong to it.
Big Endian and Little Endian:
See this plot:
If the bytes of a short with value 1 are copied from a big-endian machine to a little-endian machine, the result reads as 256. This is not a problem for a forced type conversion (cast), which works on values rather than raw bytes.
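The 1-becomes-256 effect is simply the two bytes of the short changing places. A minimal sketch, with the hypothetical helper names `swap_bytes16` and `is_little_endian`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exchange the two bytes of a 16-bit value, which is what happens when
   raw bytes move between machines of opposite endianness. */
static uint16_t swap_bytes16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 on a little-endian host, 0 on a big-endian one,
   by looking at the first byte of the value 1 in memory. */
static int is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

In hexadecimal, 1 is 0x0001 and its byte-swapped form is 0x0100, which is 256.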
struct fraction{
int num;
int denum;
};
fraction pi;
pattern:
||||pi.denum
||||pi.num
^
|
pointer
See the quirky syntax:
((fraction *)&(pi.denum))->num = 12;
It first takes the address of pi.denum (4 bytes) and then reinterprets it as a pointer to a fraction struct! What will happen? The original pi.denum is treated as the start of a new fraction, so that struct's num field sits exactly on top of pi.denum. Thus, the assignment stores 12 in pi.denum.
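The reason the cast lands where it does is the field layout: num sits at offset 0 of the struct, so accessing `->num` through a pointer to pi.denum addresses pi.denum itself, while `->denum` lies further on. A small sketch using the standard `offsetof` macro; the helper names `num_offset` and `denum_offset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct fraction {
    int num;
    int denum;
};

/* Byte offset of each field from the start of the struct. */
static size_t num_offset(void)   { return offsetof(struct fraction, num); }
static size_t denum_offset(void) { return offsetof(struct fraction, denum); }
```

The first member of a struct is always at offset 0. On typical platforms denum follows immediately at offset sizeof(int), though the language only guarantees that it comes after num.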
A similar example:
((fraction *)&(pi.denum))->denum = 33;
What you will see: 33 is written to the 4 bytes just past pi.denum, which lie outside pi.
Actually, we need to accept the concept that everything in C/C++ is a pointer. Look at arrays:
array<=>&array[0]
array+k<=>&array[k]
*array<=>array[0]
*(array+k)<=>array[k]
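These four equivalences can be checked directly. A minimal sketch, with the hypothetical function name `check_equivalences` and arbitrary sample values:

```c
#include <assert.h>

/* Returns 1 if all four array/pointer equivalences hold. */
static int check_equivalences(void)
{
    int array[5] = {10, 20, 30, 40, 50};
    int k = 3;

    return array == &array[0]          /* the array name decays to its first element's address */
        && array + k == &array[k]      /* pointer arithmetic counts in elements, not bytes */
        && *array == array[0]
        && *(array + k) == array[k];
}
```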
If we do this:
int array[10];
array[10] = 1;
Be aware that this will not cause a compiler error. C compilers favor efficiency and do no bounds checking.
array[10] is translated into a byte offset of 10 * sizeof(array[0]), which is 40. Thus, starting from &array[0] and counting 40 bytes forward, the 4-byte space found there will be set to 1.
This even tolerates negative indices. (This is just code, not good code!)
The neighbouring addresses are very likely to hold other variables. See activation record.
The above code is equivalent to:
*(array+10)=1;
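Since the compiler will not check the index, any check has to be written by hand. The sketch below computes the byte offset described above and guards a store with an explicit range test; `byte_offset` and `checked_store` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from &array[0] to array + k for an int array. */
static size_t byte_offset(size_t k)
{
    return k * sizeof(int);
}

/* Stores value at array[index] only if index is inside the array.
   Returns 1 on success and 0 if the index is out of range. */
static int checked_store(int *array, size_t n, size_t index, int value)
{
    if (index >= n)
        return 0;
    array[index] = value;
    return 1;
}
```

With 4-byte ints, byte_offset(10) is 40, matching the arithmetic above, and checked_store refuses index 10 for a 10-element array.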
There are many crazier examples (actually, there is an error in this one; see if you can find it):
Now we see the struct:
The corresponding bit patterns in memory: https://s1.ax1x.com/2020/08/06/aRKCwj.jpg
Try this:
pupils[2].name=strdup("Adam");
Here something like a linked structure appears: name now points to a copy of the string in the heap:
And this:
pupils[3].name=pupils[0].suid+6;
You will see:
Another one:
strcpy(pupils[1].suid, "40415xx");
Remember that strcpy also copies the terminating null character \0 (NUL).
Try a scary one:
strcpy(pupils[3].name, "123456");
See the result:
void swap (int *ap, int *bp)
{
int temp = *ap;
*ap = *bp;
*bp = temp;
}
int x=7;
int y=117;
swap(&x, &y);
See the flow:
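asterisk: the * symbol; ampersand: the & symbol; synonymous: having the same meaning; arithmetic: relating to computation with numbers; verbatim: word for word; backslash: the \ symbol; jurisdiction: the authority to govern or decide
two to the ninth: 2^9; gibberish: nonsense talk; contrived: deliberately created rather than arising naturally or spontaneously
phantom: an apparition or illusion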
Updated 2020-08-06
The newly updated SCREAM repo removed the readme file that explained how to port the code, but the older version of the readme can be retrieved from the commit history.
The first problem comes from building Kokkos:
CMake Error at cmake/kokkos_functions.cmake:64 (MESSAGE):
Matching option found for Kokkos_ENABLE_SERIAL with the wrong case
KOKKOS_ENABLE_SERIAL. Please delete your CMakeCache.txt and change option
to -DKokkos_ENABLE_SERIAL=ON. This is now enforced to avoid hard-to-debug
CMake cache inconsistencies.
This error asks you to use the exact mixed-case option names (Kokkos_ENABLE_...) on the command line when setting the configuration flags. Just follow its suggestion.
The command changes to:
cmake \
-D CMAKE_INSTALL_PREFIX=${RUN_ROOT_DIR}/kokkos/install \
-D CMAKE_BUILD_TYPE=Debug \
-DKokkos_ENABLE_DEBUG=ON \
-DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=OFF \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ENABLE_PROFILING=OFF \
-DKokkos_ENABLE_DEPRECATED_CODE=OFF \
-DKokkos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=OFF \
${KOKKOS_SRC_LOC}
With this we can install Kokkos successfully. When building SCREAM, several errors appear:
CMake Error: File /users/b145872/project-dir/app/scream/components/scream/../cam/src/physics/rrtmgp/external/rrtmgp/data/rrtmgp-data-lw-g224-2018-12-04.nc does not exist.
CMake Error: File /users/b145872/project-dir/app/scream/components/scream/../cam/src/physics/rrtmgp/external/rrtmgp/data/rrtmgp-data-sw-g224-2018-12-04.nc does not exist.
These files can easily be found and downloaded with a simple search. But something did not look right: some modules seemed to be missing from the scream folder. They live in the external folder; the folder structure is visible on GitHub, but it had not been cloned to the local path.
That is interesting! I then found that these folders on GitHub actually point to other repos. Each one is a [git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules)!
The right way to clone the repo with submodules:
git clone --recurse-submodules
Next comes the MPI issue (mpi.h cannot be found). Add the include path to .bashrc:
export CPLUS_INCLUDE_PATH=$INCLUDE
export C_INCLUDE_PATH=$INCLUDE
New error:
/users/b145872/project-dir/app/scream/components/scream/ekat/src/ekat/util/scream_arch.cpp(34): error: class "Kokkos::Serial" has no member "impl_is_initialized"
ss << "ExecSpace initialized: " << (DefaultDevice::execution_space::impl_is_initialized() ? "yes" : "no") << "\n";
We then found that the Kokkos default settings are not what we want: Serial is not acceptable.
-- Final kokkos settings variable:
-- env;KOKKOS_CMAKE=yes;KOKKOS_SRC_PATH=/users/b145872/project-dir/app/scream/externals/kokkos;KOKKOS_PATH=/users/b145872/project-dir/app/scream/externals/kokkos;KOKKOS_INSTALL_PATH=/users/b145872/project-dir/app/scream_run/scream_test01/kokkos/install;KOKKOS_ARCH=None;KOKKOS_DEVICES=Serial;KOKKOS_DEBUG=no;KOKKOS_OPTIONS=disable_dualview_modify_check;KOKKOS_USE_TPLS=librt
We then re-source ~/.bashrc. The configure step now seems to pick up the MPI settings, with OpenMP as the parallel backend. New error when configuring SCREAM:
CMake Error at /users/b145872/project-dir/app/scream/externals/kokkos/cmake/kokkos_functions.cmake:49 (MESSAGE):
Matching option found for Kokkos_ENABLE_DEBUG with the wrong case
Kokkos_ENABLE_Debug. Please delete your CMakeCache.txt and change option
to -DKokkos_ENABLE_DEBUG=FALSE. This is now enforced to avoid
hard-to-debug CMake cache inconsistencies.
This is weird. When configuring Kokkos, we passed exactly “-DKokkos_ENABLE_DEBUG=ON”, and when configuring SCREAM there is no such option. Using grep, we found several occurrences of “Debug” in components/scream/ekat/cmake/Kokkos.cmake. After changing them to “DEBUG”, we got past this point…
New error [26%]:
/users/b145872/project-dir/app/scream/components/scream/ekat/src/ekat/scream_kokkos_meta.hpp(18): error: class "Kokkos::MemoryTraits<0U>" has no member "RandomAccess"
value = ((View::traits::memory_traits::RandomAccess ? Kokkos::RandomAccess : 0) |
According to the Kokkos programming guide, RandomAccess is a trait from CUDA. Does this mean we also have to build CUDA for SCREAM? If so, SCREAM would have to run on GPU nodes. Okay, as a simple test, let us first see what happens if we build Kokkos in serial mode.
Serial mode shows similar errors. Besides, I noticed that all the memory-access traits, including “Atomic Access”, seem to be missing, so this may not simply be a “need CUDA” issue. Interesting.
When I used the latest Kokkos from GitHub, different errors occurred with “Serial” and “OPENMP”. When only “Serial” is turned on, MPI errors occur while compiling SCREAM.
On Jul 20, I found a new version of the master branch, and by following the new instructions in build.md, the test finished successfully.
When I turn on CUDA with CUDA version 10.1, there are still problems.