cluster:Torque/236/client

Please open an NGI-DE ticket if you experience any installation or configuration problems.

TORQUE client v.2.3.6

Prepare

Operating system
Scientific Linux version 5.4 64 bit
Note: If you want to use the MAUI scheduler instead of the TORQUE scheduler, do not start TORQUE's pbs_sched daemon after the TORQUE installation.

Firewall configuration

If firewalls are running on the server or node machines, make sure they allow connections on the appropriate ports for each machine. TORQUE pbs_mom daemons use UDP port 1023, and the pbs_server and pbs_mom daemons use ports 15001-15004 by default (see how to open a port in the firewall).
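The port list above can be turned into firewall rules. The following is a minimal sketch, assuming iptables is the firewall in use. Because the text does not say which protocol ports 15001-15004 use, the sketch opens both TCP and UDP; that choice is an assumption. The function name torque_firewall_rules is hypothetical, and the rules are only printed, not applied.

```shell
#!/bin/sh
# Print (do not apply) iptables rules for the TORQUE ports named above.
# Assumption: iptables is the firewall in use, and both TCP and UDP are
# opened for 15001-15004 because the text does not name the protocol.
torque_firewall_rules() {
    for proto in tcp udp; do
        echo "iptables -I INPUT -p $proto --dport 15001:15004 -j ACCEPT"
    done
    # UDP port 1023 for pbs_mom, as stated in the text.
    echo "iptables -I INPUT -p udp --dport 1023 -j ACCEPT"
}

torque_firewall_rules
```

After reviewing the printed rules, they can be run as root on each machine. Restricting them to the cluster's own addresses with the -s option is usually preferable to opening the ports to every host.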

Note: Firewall-based issues are often associated with server-to-MOM communication failures and with messages such as 'premature end of message' in the log files.

The tcpdump program can also be used to verify that the correct network packets are being sent.
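A possible tcpdump invocation for this check is sketched below. It assumes the default port range 15001-15004 is in use; the interface name eth0, the host name torque-server.example.org, and the helper torque_capture_filter are placeholders, not values from this page. Capturing requires root, so the sketch only prints the command.

```shell
#!/bin/sh
# Build a tcpdump capture filter for TORQUE traffic to or from one peer host.
# Assumption: the default port range 15001-15004 from the text is in use.
torque_capture_filter() {
    peer_host=$1
    echo "host $peer_host and portrange 15001-15004"
}

# Hypothetical server name and interface; the command is printed, not run.
echo "tcpdump -n -i eth0 '$(torque_capture_filter torque-server.example.org)'"
```

Run the printed command on a compute node while a job is being submitted. If no packets from the server appear, a firewall between the machines is a likely cause.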

Install

MAUI uses encrypted connections between the client and the server, and the symmetric encryption keys are embedded in the binaries. It is therefore essential to install the client and server RPM packages from the same source, i.e. built with the same keys in the binaries.


Configure

  • On each compute host, the pbs_mom daemon must be configured to trust the pbs_server daemon. This is done by setting the $pbsserver parameter in $(TORQUECFG)/mom_priv/config. In TORQUE 2.0.0p5 and later, it can also be done by creating the $(TORQUECFG)/server_name file and placing the server hostname inside.
  • Additional configuration parameters may be added to $(TORQUECFG)/mom_priv/config (see the MOM config page for details).
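The two files described above can be created as in the following sketch. It writes into a scratch directory rather than the live configuration, so it is safe to try. The server name torque-server.example.org and the function write_mom_config are hypothetical; $pbsserver is the pbs_mom parameter that names the server.

```shell
#!/bin/sh
# Write the two MOM-side files described above into a TORQUE config directory.
# Assumptions: the directory and server hostname are passed in by the caller;
# torque-server.example.org is an illustrative value only.
write_mom_config() {
    torque_cfg=$1
    server_host=$2
    mkdir -p "$torque_cfg/mom_priv"
    # server_name holds just the pbs_server hostname.
    echo "$server_host" > "$torque_cfg/server_name"
    # $pbsserver tells pbs_mom which server to trust and report to.
    printf '$pbsserver %s\n' "$server_host" > "$torque_cfg/mom_priv/config"
}

# Dry run against a scratch directory instead of the live configuration.
scratch_dir=$(mktemp -d)
write_mom_config "$scratch_dir" torque-server.example.org
cat "$scratch_dir/server_name" "$scratch_dir/mom_priv/config"
```

On a real compute node, pass the actual TORQUE configuration directory instead of the scratch directory. Note that the function overwrites an existing mom_priv/config, and pbs_mom has to be restarted to pick up the change.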

Data management allows job data to be staged in and out, that is, copied to and from the server and the compute nodes.

  • For shared filesystems (e.g., NFS, DFS, AFS), use the $usecp parameter in the mom_priv/config files to specify how to map a user’s home directory. (Example: $usecp gridmaster.tmx.com:/home /home)
  • For local, non-shared filesystems, rcp or scp must be configured to allow direct copying without prompting for passwords (for example, by using key authentication).
  • To use the PBS stage-in/stage-out mechanism to copy input and output files between CEs and Worker Nodes, add lines like the following to /var/spool/pbs/mom_priv/config:
$usecp *:/home /home
$usecp *:/srv/nfs/home /srv/nfs/home


Initial test

  • From a user account, it should be possible to submit a 'Hello World' job and to open an interactive shell on a WN.
  • The job results are written to the files STDIN.o<JOBID> (standard output) and STDIN.e<JOBID> (standard error).
  • Test MAUI.
  • On the gLite-CE, the test should work as the edginfo user once the gLite packages have been configured.
  • To check the status of submitted jobs during their lifetime, use the following command:
qstat
# The result should look something like the following:
# Job id   | Name  | User  | Time Use | S | Queue
# 6.node1  | STDIN | user1 | 0        | R | batch
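
The 'Hello World' test described above could be run as in the following sketch. The job is piped to qsub on standard input, which is why TORQUE names it STDIN and writes STDIN.o<JOBID> and STDIN.e<JOBID>. The helper hello_job is hypothetical, and the submission is skipped on machines where qsub is not installed.

```shell
#!/bin/sh
# Print a minimal job script; piping it to qsub makes the job name STDIN,
# which matches the STDIN.o<JOBID> / STDIN.e<JOBID> file names above.
hello_job() {
    echo 'echo "Hello World from $(hostname)"'
}

# Submit only where the TORQUE client tools are installed.
if command -v qsub >/dev/null 2>&1; then
    hello_job | qsub
    qstat
else
    echo "qsub not found; the job script would be:"
    hello_job
fi
```

An interactive shell on a WN can be requested with qsub -I. For the MAUI test, the showq command lists the jobs known to the scheduler.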

Update

To update, use the standard rpm command syntax (rpm -Uvh with the new package files).

