cluster:Torque/257/wn
TORQUE WN v.2.5.7
Prepare
- Operating system: Scientific Linux 5.6, 64-bit
Configure the following settings for the host:
- UMD repository
- Host-based SSH authentication between the Torque server and the middleware hosts (see ssh auth)
administrator's script: prepare.sh
# install UMD
# clean old repository definitions
rm -f /etc/yum.repos.d/UMD* /etc/yum.repos.d/epel*
wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://repository.egi.eu/sw/production/umd/1/sl5/x86_64/updates/umd-release-1.0.2-1.el5.noarch.rpm
yum install epel-release-5-4.noarch.rpm
yum install yum-priorities
yum install umd-release-1.0.2-1.el5.noarch.rpm
# remove the downloaded packages only after they have been installed
rm -f epel-release-5-4.noarch.rpm umd-release-1.0.2-1.el5.noarch.rpm
# set repository priorities (lower value = higher priority)
sed -i -e "s/priority=.*/priority=5/g" /etc/yum.repos.d/UMD-1-base.repo
sed -i -e "s/priority=.*/priority=4/g" /etc/yum.repos.d/UMD-1-updates.repo
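After prepare.sh, the priority values written by the two sed commands can be checked. The sketch below uses a hypothetical helper, `repo_priorities` (not part of yum or UMD), that prints each repository section of a .repo file together with its `priority=` value. With the yum-priorities plugin a lower number wins, so UMD-1-updates (4) is preferred over UMD-1-base (5).

```shell
#!/bin/bash
# Hypothetical helper: print "<[section]> <priority>" for every repository
# section in a yum .repo file that carries a priority= line.
repo_priorities() {
    awk -F'=' '
        /^\[.*\]$/   { section = $0 }         # remember the current [section]
        /^priority=/ { print section, $2 }    # report its priority value
    ' "$1"
}

# Usage on the WN after prepare.sh:
#   repo_priorities /etc/yum.repos.d/UMD-1-base.repo
#   repo_priorities /etc/yum.repos.d/UMD-1-updates.repo
```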
Install
- Install torque-client and torque-mom with yum from the UMD repository
administrator's script: install.sh
# the EMI torque package can be used as well
yum -y install torque-client torque-mom
Configure
If a separate internal network is used for communication between the batch server and the WNs, create a hostname alias for the batch server in /etc/hosts.
All nodes should be listed in /var/torque/server_priv/nodes on the batch server host (see configure batch server).
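Each WN gets one line in the nodes file on the batch server, in the form `<hostname> np=<job slots>`. A minimal sketch for generating such lines for a numbered range of WNs; the helper name `gen_nodes`, the `wn` prefix and the slot count are illustrative assumptions, not values taken from this site. pbs_server reads the nodes file at startup, so it has to pick up the change afterwards (for example by restarting it).

```shell
#!/bin/bash
# Hypothetical helper: print Torque nodes-file entries for WNs named
# <prefix><start> .. <prefix><end>, each with <np> job slots.
gen_nodes() {
    local prefix=$1 start=$2 end=$3 np=$4 i
    for ((i = start; i <= end; i++)); do
        echo "${prefix}${i} np=${np}"
    done
}

# Usage on the batch server (values are examples only):
#   gen_nodes wn 10 12 4 >> /var/torque/server_priv/nodes
```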
- Customize the mom config file
- Prepare the munge service
administrator's script: configure.sh
# batch server name and its internal IP address
BATCH_SERVER=dgiref-batch.fzk.de
BATCH_SERVER_INTERNAL_IP="10.0.171.205"
echo "$BATCH_SERVER_INTERNAL_IP $BATCH_SERVER" >> /etc/hosts
# the file holds just the server name, so overwrite rather than append
echo "$BATCH_SERVER" > /var/torque/server_name
echo "\$pbsserver $BATCH_SERVER
\$logevent 255
\$ideal_load 4
\$max_load 10
\$usecp *:/home /home
\$usecp *:/srv/nfs/home /srv/nfs/home" > /var/torque/mom_priv/config
scp $BATCH_SERVER:/etc/munge/munge.key /etc/munge/munge.key
chown munge:munge /var/log/munge/munged.log*
chown munge:munge /etc/munge/munge.key
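configure.sh appends the batch server alias to /etc/hosts with `>>`, so running it twice leaves a duplicate line. A minimal sketch of an idempotent alternative, using a hypothetical helper `add_host_alias` that only appends when the name is not already present on a non-comment line:

```shell
#!/bin/bash
# Hypothetical helper: add "<ip> <name>" to a hosts file (default /etc/hosts)
# only if no non-comment line already lists <name>.
add_host_alias() {
    local ip=$1 name=$2 file=${3:-/etc/hosts}
    # awk exits 0 if <name> appears as a hostname field, 1 otherwise
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" 2>/dev/null; then
        echo "$ip $name" >> "$file"
    fi
}

# Usage in configure.sh instead of the plain echo:
#   add_host_alias "$BATCH_SERVER_INTERNAL_IP" "$BATCH_SERVER"
```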
Proceed
- Enable the munge and pbs_mom services at boot and start them
administrator's script: proceed.sh
# start munge first, then the MOM
service munge restart
service pbs_mom restart
chkconfig munge on
chkconfig pbs_mom on
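Whether the two `chkconfig ... on` calls took effect can be read from `chkconfig --list`, which prints one line per service with `<runlevel>:on|off` fields. A minimal sketch, assuming that output format and using a hypothetical helper `enabled_in_runlevel` that reads the listing on stdin:

```shell
#!/bin/bash
# Hypothetical helper: read "chkconfig --list" output on stdin and succeed
# if <service> is switched on in <runlevel>.
enabled_in_runlevel() {
    local service=$1 level=$2
    awk -v s="$service" -v l="$level:on" '
        $1 == s { for (i = 2; i <= NF; i++) if ($i == l) ok = 1 }
        END { exit !ok }
    '
}

# Usage on the WN after proceed.sh (runlevel 3 is the usual server runlevel):
#   for svc in munge pbs_mom; do
#       chkconfig --list "$svc" | enabled_in_runlevel "$svc" 3 || echo "$svc is not enabled"
#   done
```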
Initial test
- Check the WN information from the batch server
- Submit a test job to the WN
administrator's script: test.sh
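# name of the WN to check
wn="wn10"
# show information about the WN
qnodes -q "$wn"
# try a test job (submit as a regular user, Torque rejects root jobs by default)
# write the job to a separate file so this script (test.sh) is not overwritten
cat > test_job.sh << 'EOF'
#!/bin/bash
sleep 60
EOF
qsub -q dgiseq test_job.sh
# to check the supported queues:
# ldapsearch -x -H ldap://<CE_FQDN>:2170 -b mds-vo-name=resource,o=grid
# to find the job ID:
qstat -Q
qstat -a
# to check the job:
tracejob <jobid>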
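Before submitting, it helps to confirm that the batch server sees the WN as `free`. `pbsnodes <node>` prints the node name followed by indented `key = value` lines, one of which is `state`. A minimal sketch, assuming that output layout and using a hypothetical helper `node_state` that extracts the state from the listing on stdin:

```shell
#!/bin/bash
# Hypothetical helper: print the "state" value from pbsnodes output on stdin
# (e.g. "free", "job-exclusive", "down,offline").
node_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*state$/ { print $2; exit }'
}

# Usage on the batch server:
#   pbsnodes wn10 | node_state          # expect "free" before the test job
#   jobid=$(qsub -q dgiseq <job script>)  # qsub prints the new job ID
#   tracejob "$jobid"
```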