The ZCHPC is conducting the National Student Cluster Building training program to promote and develop an HPC curriculum for undergraduates. The pilot program includes eight tertiary institutions: Africa University (AU), Midlands State University (MSU), University of Zimbabwe (UZ), National University of Science and Technology (NUST), Bindura University of Science Education (BUSE), Chinhoyi University of Technology (CUT), Harare Institute of Technology (HIT) and Great Zimbabwe University (GZU). Participating students are expected to take part in a cluster-building competition, to be held at a date to be advised. Undergraduate students from these institutions will have a chance to showcase their knowledge of cluster building, parallel processing and HPC applications. The students receive hands-on HPC training that they can use in any HPC-related field, and they get an opportunity to learn software applications used to solve real-world problems.
The training uses virtualisation technology to simulate real-world HPC clusters: each cluster is built on a single host running several virtual machines. VMware Workstation is used, and students install CentOS 7 Linux on each virtual machine. The Linux firewall and SELinux are configured for the HPC environment. The following services are installed and configured on the virtual cluster:
- RSH
- SSH
- NIS
- NFS
Installing and configuring these services is an integral part of cluster building. They provide remote shell access (RSH, SSH), shared user accounts (NIS) and a shared file system (NFS), so that all nodes can communicate smoothly with each other and collaborate in parallel processing.
Once the cluster is built, students are taught how to install and configure TORQUE, a job submission and management package. It is a resource management system for submitting and controlling jobs on supercomputers, clusters, and grids. TORQUE manages jobs that users submit to various queues on a computer system, each queue representing a group of resources with attributes necessary for the queue’s jobs. Before implementing TORQUE, students are introduced in theory to several other job submission and management systems.
In an HPC environment, resources need to be monitored. For this training, several monitoring tools were evaluated before Ganglia was chosen for implementation. With Ganglia, you can monitor performance at the cluster level, the node level and the processor level.
An implementation of the Message Passing Interface (MPI) is another important component required in most HPC environments. Eight out of ten HPC systems worldwide use MPICH, which is also the implementation used for this training. MPICH is installed and configured, and sample programs are then run to test it.
Parallel programming is another skill that the students are equipped with. It is taught using the C programming language, and students write several programs that are run in parallel using mpirun.
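To illustrate the kind of reasoning behind such programs, here is a minimal sketch in plain C of how a range of work items might be divided among parallel processes. It is not taken from the training material. The function names (block_range, partial_sum, simulated_total) are hypothetical, and the rank and size values are passed in as ordinary parameters; in an actual MPI program they would normally be obtained from MPI_Comm_rank and MPI_Comm_size, and the final loop would be replaced by a reduction such as MPI_Reduce.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open range [*start, *end) of items
 * that process `rank` (0-based) handles when `n` items are split across
 * `size` processes. The first (n % size) ranks receive one extra item. */
void block_range(long n, int rank, int size, long *start, long *end)
{
    assert(size > 0 && rank >= 0 && rank < size);
    long base = n / size;   /* items every rank gets */
    long extra = n % size;  /* leftover items, one each for the lowest ranks */
    *start = rank * base + (rank < extra ? rank : extra);
    *end = *start + base + (rank < extra ? 1 : 0);
}

/* Work done by one process: sum of the integers in [start, end). */
long partial_sum(long start, long end)
{
    long sum = 0;
    for (long i = start; i < end; i++) {
        sum += i;
    }
    return sum;
}

/* Serial stand-in for the reduction step: loops over every rank and adds
 * the partial sums. In an MPI program each rank would compute only its own
 * partial sum and the results would be combined by the MPI library. */
long simulated_total(long n, int size)
{
    long total = 0;
    for (int rank = 0; rank < size; rank++) {
        long start, end;
        block_range(n, rank, size, &start, &end);
        total += partial_sum(start, end);
    }
    return total;
}
```

For example, with 10 items and 4 processes, this scheme assigns ranks 0 and 1 three items each and ranks 2 and 3 two items each, so no process holds more than one item beyond any other.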
The cluster building training workshop concludes with the installation, configuration and running of different HPC applications.