cluster:Worker/2
Introduction
Worker Nodes (WNs) are the workhorses of the system. Their role is to run the batch jobs dispatched by the batch server. Jobs spend most of their life cycle executing. While a job is running, its status can be queried with qstat. When a job has completed, two files are created by default: one holding the job's standard output (stdout) and one holding its error messages (stderr).
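As a small illustration of the job life cycle described above, the following sketch derives the names of the two output files from a job name and a job identifier. It assumes the usual Torque default naming (<job_name>.o<sequence_number> and <job_name>.e<sequence_number>); the helper name job_output_files and the example host name are hypothetical, and sites may override the defaults.

```shell
#!/bin/bash
# Sketch: derive the default Torque output file names of a job.
# Assumption: the default pattern <job_name>.o<seq> / <job_name>.e<seq> is in use.

job_output_files() {
    local job_name="$1"
    local job_id="$2"             # e.g. 1234.batch.example.org (hypothetical)
    local seq="${job_id%%.*}"     # sequence number: the part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$job_name" "$seq" "$job_name" "$seq"
}

# Typical use on a submit host (needs a running batch system, so commented out):
#   job_id=$(qsub myjob.sh)       # qsub prints the job identifier
#   qstat -f "$job_id"            # full status while the job is queued or running
#   job_output_files myjob.sh "$job_id"
```

The function only formats names; it does not contact the batch server, so it can be tried on any machine.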
MPI
The Message Passing Interface (MPI) is a standard developed by the Message Passing Interface Forum (MPIF).
The standard includes:
- Point-to-point communication
- Collective operations
- Process groups
- Communication contexts
- Process topologies
- Bindings for Fortran 77 and C
- Environmental management and inquiry
- Profiling interface
There are different implementations of the MPI standard. The D-Grid reference installation uses Open MPI (http://www.open-mpi.org/), an MPI-2 implementation. To use MPI on the Grid infrastructure, MPI-Start is required. MPI-Start is a set of shell scripts that closes the gap between the workload management system of a Grid infrastructure and the configuration of the nodes on which MPI applications run. To use MPI, the following hosts must be configured within the cluster:
- grid middleware server (gLite CE 3.1)
- WNs
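To make the role of MPI-Start more concrete, here is a minimal sketch of a job wrapper that hands an MPI application over to it. It assumes that MPI-Start is driven by the environment variables I2G_MPI_APPLICATION and I2G_MPI_TYPE and that I2G_MPI_START points to the mpi-start script, as is common for gLite MPI jobs; the function name run_with_mpi_start is hypothetical.

```shell
#!/bin/bash
# Sketch of a wrapper that passes an MPI application to MPI-Start.
# Assumption: I2G_MPI_START is set by the site configuration on the WN.

run_with_mpi_start() {
    local application="$1"    # path to the MPI executable
    local flavour="$2"        # MPI flavour, e.g. "openmpi"

    if [ -z "${I2G_MPI_START:-}" ]; then
        echo "I2G_MPI_START is not set" >&2
        return 1
    fi

    # MPI-Start reads its job description from the environment.
    export I2G_MPI_APPLICATION="$application"
    export I2G_MPI_TYPE="$flavour"

    "$I2G_MPI_START"
}

# Example call inside a batch job (commented out, needs a configured WN):
#   run_with_mpi_start ./hello_mpi openmpi
```

The wrapper itself does not decide how many processes to start; that information is expected to come from the batch system through MPI-Start.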
Worker node
Prepare
- Operating system
- Scientific Linux version 5.6 64 bit
Optimizing the configuration:
Use a minimal operating system installation without a firewall. To verify that a package is installed, use the command
-
rpm -qa | grep package_name
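When several packages have to be checked at once, the query can be wrapped in a loop. The following is a minimal sketch that uses rpm -q for each name and prints only the packages that are not installed; the function name missing_packages is hypothetical.

```shell
#!/bin/bash
# Sketch: report which of the given packages are not installed.

missing_packages() {
    local pkg
    for pkg in "$@"; do
        # "rpm -q" exits with a non-zero status if the package is not installed
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example (commented out, needs an RPM-based system):
#   missing_packages wget make gcc gcc-c++ tar sed zlib openssl
```

An empty result means that all listed packages are present.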
Install the following additional packages:
-
yum -y install wget yum rpm make gcc gcc-c++ tar sed zlib openssl
After the installation is complete, turn off any unnecessary services (like gpm, sendmail, cups, haldaemon, messagebus, pcmcia, anacron, atd) with the following command:
-
chkconfig <SERVICE> off
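The command above can be applied to the whole list of services in one pass. The sketch below assumes that chkconfig --list <name> fails for services that are not registered on the node, and skips those; the function name disable_services is hypothetical.

```shell
#!/bin/bash
# Sketch: switch off a list of services at boot time.
# Assumption: "chkconfig --list <name>" returns non-zero for unknown services.

disable_services() {
    local service
    for service in "$@"; do
        if chkconfig --list "$service" >/dev/null 2>&1; then
            chkconfig "$service" off && echo "disabled: $service"
        else
            echo "skipped (not installed): $service"
        fi
    done
}

# Example (commented out, must be run as root on the WN):
#   disable_services gpm sendmail cups haldaemon messagebus pcmcia anacron atd
```

Note that chkconfig only changes the boot-time configuration; a service that is already running keeps running until it is stopped or the node is rebooted.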
Configure the following settings for the server:
- proxy
- ntp
- /etc/resolv.conf
- Torque for WN
- UMD Repo
- Firewall configuration
Allowing incoming connections to the WNs is optional; Resource Providers are free to decide whether to permit them. However, when such inbound connections are blocked, GridFTP data transfers are forced into "single-stream" mode and their performance may degrade accordingly (see: how to open a port in the firewall).
| Service | Incoming ports (TCP) | Change to default configuration |
| GridFTP | 20000-25000 | Yes |
| The WN must have access to the external network |
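If a host firewall based on iptables is used on the WN after all, the GridFTP range from the table could be opened with a rule like the one generated below. This is a sketch only: the function name gridftp_rule is hypothetical, it merely prints the command instead of executing it, and the right chain and position of the rule depend on the local firewall policy.

```shell
#!/bin/bash
# Sketch: print an iptables rule that accepts incoming TCP connections
# on the GridFTP port range (default 20000-25000, as in the table above).

gridftp_rule() {
    local first="${1:-20000}"
    local last="${2:-25000}"
    # -I inserts the rule at the top of the INPUT chain, so that it is
    # evaluated before a possible final REJECT rule.
    echo "iptables -I INPUT -p tcp --dport ${first}:${last} -j ACCEPT"
}

# Example (commented out): review the rule first, then run it as root.
#   gridftp_rule
#   eval "$(gridftp_rule)"
```

Printing the rule first makes it easy to review the change before it is applied.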
administrator's script: prepare.sh
#!/bin/bash
# install umd
# must be run as root (e.g. from a "sudo su -" shell)
# clean old repository definitions
rm -f /etc/yum.repos.d/UMD* /etc/yum.repos.d/epel*
# download the EPEL and UMD release packages
wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://repository.egi.eu/sw/production/umd/1/sl5/x86_64/updates/umd-release-1.0.2-1.el5.noarch.rpm
# install the repositories and the priorities plugin
yum -y install epel-release-5-4.noarch.rpm
yum -y install yum-priorities
yum -y install umd-release-1.0.2-1.el5.noarch.rpm
# remove the downloaded files only after they have been installed
rm -f epel-release-5-4.noarch.rpm umd-release-1.0.2-1.el5.noarch.rpm
# set the repository priorities
sed -i -e "s/priority=.*/priority=5/g" /etc/yum.repos.d/UMD-1-base.repo
sed -i -e "s/priority=.*/priority=4/g" /etc/yum.repos.d/UMD-1-updates.repo
# Missing packages installation
yum -y install gcc gcc-c++ openssl
#<- end routine
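The sed commands in the script only rewrite priority= lines that already exist; if a repository file contains no such line, nothing is changed. The following sketch checks the result afterwards. The function name check_repo_priority is hypothetical, and the check assumes the priority=<number> spelling without spaces that the script itself relies on.

```shell
#!/bin/bash
# Sketch: verify that every priority= line of a yum repository file
# carries the expected value. Fails if the file has no priority= line.

check_repo_priority() {
    local file="$1"
    local expected="$2"
    local values
    # collect the distinct priority values found in the file
    values=$(sed -n 's/^priority=//p' "$file" | sort -u)
    [ "$values" = "$expected" ]
}

# Example (commented out, needs the UMD repository files):
#   check_repo_priority /etc/yum.repos.d/UMD-1-base.repo 5 || echo "check UMD-1-base.repo"
#   check_repo_priority /etc/yum.repos.d/UMD-1-updates.repo 4 || echo "check UMD-1-updates.repo"
```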
Install
The following packages must be installed on the cluster node to provide the Worker Node functionality:
- glite-WN packages to operate with Grid middleware
administrator's script: install.sh
#!/bin/bash
# install worker node packages
# install glite-WN-3.2
yum -y install glite-yaim-torque-utils.noarch yaim-voms.x86_64 \
    glite-yaim-clients.noarch glite-yaim-core.noarch glite-yaim-mpi.noarch \
    glite-yaim-torque-client.noarch lcgutil-yaim.noarch lfc-yaim.noarch \
    yaim-glexec-wn.noarch
Configure
- Mount File system
- Configure users
- Prepare WNs for gLite
- The packages for the gLite middleware and OGSA-DAI are provided by an NFS server.
- The middleware configuration is specific to each WN.
- This requires that each WN has write permission on the directory /opt/glite-MW, so that the configuration scripts can write to it.
- The directory is therefore mounted with the appropriate write permissions.
- The permissions can be changed later, after the general configuration.
- The site-specific configuration can be built from the prepared templates at: http://www.d-grid.de/index.php?id=132
- The site-info.def, groups.conf and users.conf files are required for the WN configuration.
Note: JAVA_LOCATION in site-info.def must be configured!
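Before yaim is run, the requirements listed above can be checked with a few lines of shell. The sketch below assumes that the three configuration files are kept in one directory, which is an assumption about the local layout; the function name check_yaim_config is hypothetical.

```shell
#!/bin/bash
# Sketch: check that the files needed for the WN configuration exist
# and that JAVA_LOCATION has a non-empty value in site-info.def.
# Assumption: all three files live in the directory given as argument.

check_yaim_config() {
    local dir="$1"
    local status=0
    local name
    for name in site-info.def groups.conf users.conf; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    # accept JAVA_LOCATION=value and JAVA_LOCATION="value", reject empty values
    if [ -f "$dir/site-info.def" ] &&
       ! grep -q '^JAVA_LOCATION="\{0,1\}[^"[:space:]]' "$dir/site-info.def"; then
        echo "JAVA_LOCATION is not set in $dir/site-info.def"
        status=1
    fi
    return $status
}

# Example (commented out, the directory is site specific):
#   check_yaim_config /opt/glite-mw
```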
WARNING:
The dgrid_env.sh script must be edited and the variables VOS, INSTALL_ROOT and DGRID_VO_DIRECTORY adjusted. The script ensures that only users of the D-Grid VOs receive the middleware environment variables.
Note: dgrid_env.sh calls another script, grid_env.sh.
- Optional adjustment: To speed up the WN configuration, the certificate and CRL configuration can be skipped (it is executed on the gLite CE instead). This requires removing the following functions from the TAR_WN_FUNCTIONS list in $GLITE_DIR/glite/yaim/scripts/node-info.def:
- install_certs_userland
- config_fix_edg-fetch-crl-cron
- config_crl
Note: the following error message is NOT important and can be ignored: [ERROR] Failed to add group
administrator's script: configure.sh
Update
administrator's script: update.sh
MPI support
Prepare
Configuration is necessary on both the CEs (gLite) and WNs in order to support and advertise MPI correctly (see Site configuration for MPI for details). This is performed by the gLite YAIM module glite-yaim-mpi which should be run on both the CE and WNs.
administrator's script: prepare.sh
#!/bin/bash
# prepare mpi on wn
#-> start routine
yum -y install torque-devel-2.3.6 glite-yaim-mpi
yum -y install gcc-c++ compat-gcc-32 compat-gcc-32-c++
#<- end routine
Install
Install the following packages:
- openmpi
- MPI-START
administrator's script: install.sh
#!/bin/sh
# install MPI packages
#-> start routine
# rpm is used for both packages because only rpm supports relocation
# with --prefix (the packages must be relocatable)
rpm -ivh --prefix=/opt/glite-mw/openmpi-1.2.9 http://mirror.scc.kit.edu/downloads/rpms/wns/2010.1/openmpi-1.2.9-1.x86_64.rpm
rpm -ivh --prefix=/opt/glite-mw/i2g http://mirror.scc.kit.edu/downloads/rpms/wns/2010.1/i2g-mpi-start-0.0.58-1
#<- end routine
Configure
- Add the following to the site-info.def of the CE and WNs; see YaimConfig for detailed information.
- Export the set of environment variables to avoid the message INFO: No MPI flavours enabled.
- Execute the yaim command to configure the node.
WARNING: In /etc/hosts the WN must be listed with its full hostname (hostname -f must return, for example, wn.fzk.de); otherwise yaim cannot determine the hostname and aborts the configuration!
After the yaim configuration has finished, edit /etc/hosts again and restore the previous hostname of the WN; otherwise the node is seen twice, as two different nodes wn and wn.fzk.de, when nodes are reserved for an MPI job!
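The hostname requirement from the warning can be checked before yaim is started. The following sketch tests whether hostname -f returns a name that contains a domain part; the function names is_fqdn and check_hostname are hypothetical, and the test is deliberately simple (it only looks for a dot with text on both sides).

```shell
#!/bin/bash
# Sketch: check that the node reports a fully qualified hostname.

is_fqdn() {
    # a name counts as fully qualified here if it contains a dot
    # with at least one character on each side
    case "$1" in
        ?*.?*) return 0 ;;
        *)     return 1 ;;
    esac
}

check_hostname() {
    local name
    name=$(hostname -f 2>/dev/null) || return 1
    is_fqdn "$name"
}

# Example (commented out):
#   check_hostname || echo "hostname -f is not fully qualified, fix /etc/hosts first"
```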
administrator's script: configure.sh
#!/bin/bash
# configure mpi on wns
#-> start routine
# append the MPI settings to site-info.def
# (the quoted 'EOF' keeps $INSTALL_ROOT unexpanded in the file)
cat >> /opt/glite-mw/site-info.def <<'EOF'
MPI_OPENMPI_ENABLE="yes"
MPI_OPENMPI_PATH="/opt/glite-mw/openmpi-1.2.9"
MPI_OPENMPI_VERSION="1.2.9"
MPI_SHARED_HOME="yes"
MPI_SSH_HOST_BASED_AUTH="no"
I2G_MPI_START="$INSTALL_ROOT/i2g/bin/mpi-start"
EOF
# export environment variables
export MPI_OPENMPI_VERSION="1.2.9"
export MPI_OPENMPI_PATH="/opt/glite-mw/openmpi-1.2.9"
export MPI_OPENMPI_ENABLE="yes"
# execute yaim command
/opt/glite-mw/glite/yaim/bin/yaim -c -s site-info.def -n MPI_WN -n WN_TAR -n TORQUE_client
#<- end routine
Initial test
- You can try submitting a job to your site using the instructions found via the page job submission
- You can do some basic tests by logging in on a WN as a pool user and running the following:
administrator's script: test.sh
#!/bin/bash
# initial test mpi
POOL_USER='griduser'
# list the MPI variables in the login environment of the pool user
su - "$POOL_USER" -c 'env | grep MPI_'
# Result should look similar to this (values depend on the site configuration):
# MPI_MPICC_OPTS=-m32
# MPI_SSH_HOST_BASED_AUTH=yes
# MPI_OPENMPI_PATH=/opt/openmpi/1.1
# MPI_LAM_VERSION=7.1.2
# MPI_MPICXX_OPTS=-m32
# MPI_LAM_PATH=/usr
# MPI_OPENMPI_VERSION=1.1
# MPI_MPIF77_OPTS=-m32
# MPI_MPICH_VERSION=1.2.7
# MPI_MPIEXEC_PATH=/opt/mpiexec-0.80
# MPI_MPICH2_PATH=/opt/mpich2-1.0.4
# MPI_MPICH2_VERSION=1.0.4
# I2G_MPI_START=/opt/i2g/bin/mpi-start
# MPI_MPICH_PATH=/opt/mpich-1.2.7p1
exit 0
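Instead of comparing the output by eye, the test can be reduced to a check that the variables needed for Open MPI are present. The sketch below covers only the three variables configured on this page (MPI_OPENMPI_PATH, MPI_OPENMPI_VERSION and I2G_MPI_START); the function name check_mpi_env is hypothetical, and other MPI flavours would need their own variables added.

```shell
#!/bin/bash
# Sketch: report which of the expected MPI variables are not set
# in the current environment.

check_mpi_env() {
    local var value
    local status=0
    for var in MPI_OPENMPI_PATH MPI_OPENMPI_VERSION I2G_MPI_START; do
        # look up the value of the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "not set: $var"
            status=1
        fi
    done
    return $status
}

# Example (commented out, run as a pool user on the WN):
#   check_mpi_env && echo "MPI environment looks complete"
```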