cluster:Torque/216/client
Torque/216/client
Prepare
- Operating system
- Scientific Linux version 4.5 64 bit
Optimizing the configuration:
Use a minimal operating system installation without a firewall. To check whether a package is installed, use the command:
-
rpm -qa | grep package_name
Install the following additional packages:
-
yum -y install wget yum rpm make gcc gcc-c++ tar sed zlib openssl
After the installation is complete, turn off any unnecessary services (like gpm, sendmail, cups, haldaemon, messagebus, pcmcia, anacron, atd) with the following command:
-
chkconfig <SERVICE> off
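The per-service command above can be wrapped in a loop. The following is a minimal sketch, assuming the service list from the text; the function name `disable_services` and the `CHKCONFIG` override (useful for a dry run with `echo`) are hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Services named in the text above; adjust the list to the local installation.
SERVICES="gpm sendmail cups haldaemon messagebus pcmcia anacron atd"

# Hypothetical override: set CHKCONFIG=echo to print the calls instead of running them.
CHKCONFIG="${CHKCONFIG:-chkconfig}"

disable_services() {
    local svc
    for svc in "$@"; do
        # Disable the service at boot; warn and continue if that fails
        # (for example because the service is not installed).
        $CHKCONFIG "$svc" off || echo "warning: could not disable $svc" >&2
    done
}
```

To apply it, run `disable_services $SERVICES` as root after loading the function.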
Configure the following settings for the server:
- Firewall configuration
If firewalls are running on the server or node machines, make sure they allow connections on the appropriate ports for each machine. By default, the TORQUE pbs_mom daemons use UDP port 1023, and the pbs_server/pbs_mom daemons use ports 15001-15004 (how to open port in firewall).
Firewall-based issues are often associated with server-to-MOM communication failures and messages such as 'premature end of message' in the log files. The tcpdump program can be used to verify that the correct network packets are being sent.
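If a firewall has to stay enabled, the default port range could be opened along the following lines. This is only a sketch under several assumptions: the firewall is iptables, the rules belong in the INPUT chain, and both TCP and UDP are opened for 15001-15004 (the text does not name the protocol). The function name `torque_firewall_rules` is hypothetical. The rules are printed for review, not applied, and no rule for UDP port 1023 is generated.

```shell
#!/bin/bash
# Print (do not apply) iptables rules for the default TORQUE port range.
# Assumption: both TCP and UDP are needed on ports 15001-15004.
torque_firewall_rules() {
    local proto
    for proto in tcp udp; do
        echo "iptables -A INPUT -p ${proto} --dport 15001:15004 -j ACCEPT"
    done
}

torque_firewall_rules
```

After checking the printed rules against the site policy, they can be executed as root or added to the persistent firewall configuration.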
administrator's script: prepare.sh
#!/bin/bash
# prepare
REPO_URL="http://dgiref.d-grid.de/svn/dgiref/PROD/cf3/repl/repos/external/"
wget -O /etc/yum.repos.d/sl-dgiref.repo ${REPO_URL}/sl-dgiref.repo
Install
MAUI uses encrypted connections between the client and the server. The symmetric encryption keys are embedded in the binaries. It is therefore absolutely necessary to install the RPM packages for clients and servers from the same source, i.e. with the same keys in the binaries!
administrator's script: install.sh
#!/bin/bash
# install torque client
# Choose the OS architecture:
OS_arch="x86_64" # x86_64 for 64 bit
# OS_arch="i686" # i686 for 32 bit
# Install packages
yum -y install torque-2.1.6-1cri_2dgrid_sl4.${OS_arch} torque-client-2.1.6-1cri_2dgrid_sl4.${OS_arch}
yum -y install maui-3.2.6p19-snap1171482917_1dgrid_sl4.${OS_arch} maui-client-3.2.6p19-snap1171482917_1dgrid_sl4.${OS_arch}
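One way to check that client and server carry the same package builds is to save the output of `rpm -q torque maui` on each host and compare the two lists. The sketch below assumes the lists have already been written to two files, one package per line; the function name `same_package_set` and the file names in the usage note are hypothetical.

```shell
#!/bin/bash
# Succeeds when both files list exactly the same packages, regardless of order.
# $1, $2: files holding one "rpm -q" output line per package.
same_package_set() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

For example, `same_package_set client.txt server.txt || echo "package versions differ"` flags a mismatch before the daemons are started.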
Configure
- For each compute host, the MOM server must be configured to trust the pbs_server daemon. In TORQUE 2.0.0p5 and later, this can also be done by creating the $(TORQUECFG)/server_name file and placing the server hostname inside.
- Additional config parameters may be added to $(TORQUECFG)/mom_priv/config (see the MOM config page for details.)
Data management allows job data to be staged in and out between the server and the compute nodes.
- For shared filesystems (e.g. NFS, DFS, AFS) use the $usecp parameter in the mom_priv/config files to specify how to map a user’s home directory. (Example: $usecp gridmaster.tmx.com:/home /home)
- For local, non-shared filesystems, rcp or scp must be configured to allow direct copy without prompting for passwords (key authentication, etc.)
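The MOM settings described above could be written to the config file as in the following sketch. It assumes that the pbs_server host is named with the `$pbsserver` parameter and reuses the `$usecp` example from the text; the function name `write_mom_config` and its argument order are hypothetical.

```shell
#!/bin/bash
# Write a minimal mom_priv/config file.
# Arguments: <config file> <pbs_server host> <shared export, e.g. host:/home> <local path>
write_mom_config() {
    local cfg="$1" server="$2" export_path="$3" local_path="$4"
    {
        # Host running pbs_server, which this MOM should trust
        echo "\$pbsserver ${server}"
        # Map the shared filesystem so that files are copied locally with cp
        echo "\$usecp ${export_path} ${local_path}"
    } > "$cfg"
}
```

A call such as `write_mom_config /var/spool/pbs/mom_priv/config gridmaster.tmx.com gridmaster.tmx.com:/home /home` would produce the two lines; the config path is an assumption based on the /var/spool/pbs directory used elsewhere on this page. Note that the function overwrites an existing file.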
administrator's script: configure.sh
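#!/bin/bash
# Torque client configuration
# TORQUE_SERVER must be set to the hostname of the pbs_server before running this script
# server_name configuration
echo ${TORQUE_SERVER} > /var/spool/pbs/server_name
# Maui configuration
# For MAUI the file /var/spool/maui/maui.cfg must be configured
# (double quotes are required so that the shell expands ${TORQUE_SERVER})
sed -i -e "s/localhost/${TORQUE_SERVER}/g" /var/spool/maui/maui.cfg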
exit 0
Proceed
To start / stop use the commands:
administrator's script: proceed.sh
#!/bin/bash
# proceed
# Start the pbs_mom daemon on each compute node
pbs_mom
exit 0
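The script above only starts the daemon. A small start/stop wrapper might look like the sketch below. It assumes that `momctl -s` is available to shut down the local pbs_mom; the function name `mom_service` and the `MOM_START`/`MOM_STOP` overrides are hypothetical.

```shell
#!/bin/bash
# Commands used to start and stop the MOM; both can be overridden for a dry run.
MOM_START="${MOM_START:-pbs_mom}"
MOM_STOP="${MOM_STOP:-momctl -s}"   # assumption: momctl is installed with the TORQUE packages

mom_service() {
    case "$1" in
        start) $MOM_START ;;
        stop)  $MOM_STOP ;;
        *)     echo "usage: mom_service start|stop" >&2; return 2 ;;
    esac
}
```

Run `mom_service start` or `mom_service stop` as root on each compute node.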
Initial test
- From a user account, it should be possible to submit a 'Hello World' job and to open an interactive shell on a worker node (WN).
- The job results are written to the files STDIN.o<JOBID> (standard output) and STDIN.e<JOBID> (standard error).
- Test MAUI.
- The test on the gLite-CE should work as the edginfo user once the gLite packages have been configured.
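The naming rule for the result files can be sketched as a small helper that derives the expected file names from the job id printed by qsub. The function name `job_output_files` is hypothetical; the default job name STDIN applies to jobs submitted on standard input, as in the test script on this page.

```shell
#!/bin/bash
# Derive the expected output file names from a job id such as "6.dgiref-batch.fzk.de".
# $1: job id as printed by qsub, $2: job name (defaults to STDIN)
job_output_files() {
    local jobid="$1" name="${2:-STDIN}"
    local seq="${jobid%%.*}"   # keep only the sequence number before the first dot
    echo "${name}.o${seq} ${name}.e${seq}"
}
```

For the job id 6.dgiref-batch.fzk.de this yields STDIN.o6 and STDIN.e6, which should appear in the home directory of the submitting user once the job has finished.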
administrator's script: test.sh
#!/bin/bash
USER='grid user' # e.g. dgdt0001
su ${USER} # as the grid user
echo "sleep 60; hostname" | /usr/bin/qsub
# Then you get e.g. the following:
# 6.dgiref-batch.fzk.de
echo "/bin/hostname" | qsub -q ${QUEUE}
qsub -I -q ${QUEUE}
# To check the status of the jobs, use the following command while the submitted jobs are still alive:
qstat
# The result should be something like the following:
# Job id | Name | User | Time Use | S | Queue
# 6.node1 | STDIN | user1 | 0 | R | batch
# The output files should be stored at the end of the job's execution in the grid user's home directory:
ls ~
# STDIN.e6 STDIN.o6
# MAUI can be tested with
showq
showstats
diagnose -g
exit 0
Update
To update, use the standard rpm command syntax.
administrator's script: update.sh
#!/bin/bash
# Reinstall packages:
rpm -Uvh --force package-to-reinstall.rpm
# Delete installation (rpm -e takes the package name, not the .rpm file):
rpm -e package-to-delete
exit 0