TORQUE client v.2.3.6
Prepare
- Operating system
- Scientific Linux version 5.4 64 bit
Note: If you want to use the MAUI scheduler instead of the TORQUE scheduler, do not start TORQUE's pbs_sched daemon after installing TORQUE.
- Firewall configuration
Make sure that any firewalls running on the server or node machines allow connections on the appropriate ports for each machine. By default, the TORQUE pbs_mom daemons use UDP port 1023, and the pbs_server/pbs_mom daemons use ports 15001-15004 (see how to open a port in the firewall).
Firewall-related issues often show up as server-to-MOM communication failures and messages such as 'premature end of message' in the log files. The tcpdump program can be used to verify that the correct network packets are being sent.
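The port list above can be turned into firewall rules. The following is a minimal sketch, assuming the hosts use iptables and that both TCP and UDP should be allowed on 15001-15004; the function name torque_fw_rules is made up for illustration. It only prints the commands so that they can be reviewed before being run as root.

```shell
#!/bin/bash
# Print (do not execute) iptables commands for the TORQUE ports named above.
# Assumption: TCP and UDP are both allowed on 15001-15004; adapt to your site policy.
torque_fw_rules() {
    for proto in tcp udp; do
        echo "iptables -I INPUT -p ${proto} --dport 15001:15004 -j ACCEPT"
    done
    # UDP port 1023 is used by pbs_mom according to the text above
    echo "iptables -I INPUT -p udp --dport 1023 -j ACCEPT"
}

torque_fw_rules
```

After review, the printed lines can be executed as root. To watch the traffic while a job is submitted, something like `tcpdump -n -i any portrange 15001-15004` should show whether packets arrive on the node.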
administrator's script: prepare.sh
#!/bin/bash
# prepare torque client
#-> start routine
REPO_URL="http://dgiref.d-grid.de/svn/dgiref/PROD/cf3/repl/repos/external/"
wget -O /etc/yum.repos.d/sl-dgiref.repo ${REPO_URL}/sl-dgiref.repo
#<- end routine
Install
MAUI uses encrypted connections between the client and the server. The symmetric encryption keys are embedded in the binaries. It is therefore essential to install the RPM packages for clients and servers from the same source, i.e. with the same keys in the binaries!
administrator's script: install.sh
#!/bin/bash
# install torque client
# load parameters from prepare section
#-> start routine
# Choose the OS architecture:
# OS_arch="x86_64" # x86_64 for 64 bit
OS_arch="i386" # i386 for 32 bit
# install for SL4
yum -y install torque-2.3.6-1cri.slc4.${OS_arch}
yum -y install torque-client-2.3.6-1cri.slc4.${OS_arch}
# install for SL5
yum -y install torque-2.3.6-1cri.sl5 torque-client-2.3.6-1cri.sl5
# only for WN -----------------------------------------------------
yum -y install torque-mom-2.3.6-1cri.slc4.${OS_arch}
# or
yum -y install torque-mom-2.3.6-1cri.sl5
#<- end routine
#<- end routine
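The script lists the SL4 and SL5 variants one after the other, so only the lines for the actual platform should be run. As a hedged sketch of making that choice explicit, the helper below (torque_packages, a name invented here) prints the package names for one platform, using only the package names that appear in the script.

```shell
#!/bin/bash
# Print the TORQUE 2.3.6 package names for one platform.
# $1: SL major version (4 or 5)
# $2: architecture (i386 or x86_64), only used for SL4 package names
# $3: "wn" to include torque-mom for worker nodes (optional)
torque_packages() {
    case "$1" in
        4) suffix="2.3.6-1cri.slc4.$2" ;;
        5) suffix="2.3.6-1cri.sl5" ;;
        *) echo "unsupported SL version: $1" >&2; return 1 ;;
    esac
    pkgs="torque-${suffix} torque-client-${suffix}"
    if [ "${3:-}" = "wn" ]; then
        pkgs="${pkgs} torque-mom-${suffix}"
    fi
    echo "${pkgs}"
}

torque_packages 5 x86_64 wn
```

On an SL5 worker node the result could then be passed on as `yum -y install $(torque_packages 5 x86_64 wn)`.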
Configure
- For each compute host, the MOM server must be configured to trust the pbs_server daemon. In TORQUE 2.0.0p5 and later, this can also be done by creating the $(TORQUECFG)/server_name file and placing the server hostname inside.
- Additional config parameters may be added to $(TORQUECFG)/mom_priv/config (see the MOM config page for details.)
Data management allows job data to be staged in and out, i.e. copied to and from the server and the compute nodes.
- For shared filesystems (e.g. NFS, DFS, AFS) use the $usecp parameter in the mom_priv/config files to specify how to map a user’s home directory. (Example: $usecp gridmaster.tmx.com:/home /home)
- For local, non-shared filesystems, rcp or scp must be configured to allow direct copy without prompting for passwords (key authentication, etc.)
- To use the PBS StageIN/StageOUT mechanism to copy the input/output files between CEs and Worker Nodes, add the configuration to /var/spool/pbs/mom_priv/config like this:
$usecp *:/home /home
$usecp *:/srv/nfs/home /srv/nfs/home
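Each $usecp line tells pbs_mom to use a plain local copy for files whose host:path matches the pattern, instead of rcp or scp. The following sketch adds such mappings without creating duplicates when it is run more than once; the helper name add_usecp is an assumption for illustration, and the demonstration writes to a scratch file rather than the real config.

```shell
#!/bin/bash
# Append a $usecp mapping to a MOM config file unless the exact line already exists.
# $1: config file, $2: "host:remote_path" pattern, $3: local path
add_usecp() {
    line="\$usecp $2 $3"
    touch "$1"
    if ! grep -qxF -- "${line}" "$1"; then
        echo "${line}" >> "$1"
    fi
}

# Demonstration on a scratch file; on a real node the target would be
# /var/spool/pbs/mom_priv/config, and pbs_mom has to reread it afterwards.
cfg="$(mktemp)"
add_usecp "${cfg}" '*:/home' /home
add_usecp "${cfg}" '*:/srv/nfs/home' /srv/nfs/home
add_usecp "${cfg}" '*:/home' /home   # repeated call adds nothing
cat "${cfg}"
```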
administrator's script: configure.sh
#!/bin/bash
# configure
# Declare the variables section ------------
# Please insert your actual configuration
# TORQUE_SERVER="torque server name"
# from here ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TORQUE_SERVER="sn05"
# till here ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#-> start routine
# server_name configuration
echo "${TORQUE_SERVER}" > /var/spool/pbs/server_name
echo "\
\$pbsserver ${TORQUE_SERVER}
\$logevent 255
\$ideal_load 4
\$max_load 10
\$usecp *:/home /home
\$usecp *:/srv/nfs/home /srv/nfs/home" > /var/spool/pbs/mom_priv/config
#<- end routine
Initial test
- From a user account, it should be possible to submit a 'Hello World' job and to open an interactive shell on a WN.
- The job results are written to the files STDIN.o<JOBID> (standard output) and STDIN.e<JOBID> (standard error).
- test MAUI
- On the gLite-CE, the test should work as the edginfo user once the gLite packages have been configured.
- To check the status of submitted jobs during their lifetime, use the following command:
qstat
# The result should be something like the following:
# Job id  | Name  | User  | Time Use | S | Queue
# 6.node1 | STDIN | user1 | 0        | R | batch
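For scripted checks, the state column (S) can be pulled out of the qstat listing. This is a minimal sketch, assuming the default whitespace-separated qstat table in which the job id is the first field and the state is the fifth; job_state is a hypothetical helper name and the sample text stands in for real qstat output.

```shell
#!/bin/bash
# Read qstat output on stdin and print the state letter of the given job id.
# Assumption: default qstat layout with fields id, name, user, time, state, queue.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Sample listing used in place of a live qstat call
sample='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6.node1                   STDIN            user1                  0 R batch'

echo "${sample}" | job_state 6.node1
```

On a real system the call would be `qstat | job_state 6.node1`; empty output means the job is no longer listed.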
administrator's script: test.sh
#!/bin/bash
# test
# Declare the variables section ------------
# Please insert your actual configuration
# USER='grid user'
# QUEUE=queue name
# from here ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
USER='dgdt0001' # grid user, e.g. dgdt0001
QUEUE=dgiseq
# till here ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#-> start routine
su ${USER} # as the grid user
echo "sleep 60; hostname" | /usr/bin/qsub
# Then you get e.g. the following:
# 6.dgiref-isn02.fzk.de
echo "/bin/hostname" | qsub -q ${QUEUE}
qsub -I -q ${QUEUE}
qstat
# The output files should be stored by the end of the jobs execution in the grid-user home directory:
ls ~
# STDIN.e6 STDIN.o6
#<- end routine
Update
To update, use the standard rpm command syntax.
administrator's script: update.sh
#!/bin/bash
# remove torque client
yum remove torque
# update torque client
yum update torque
exit 0