middleware:Globus/service/GRAM
WS-GRAM
Configure WS-GRAM
After successful installation of the recommended D-Grid package, Globus is already configured to use PBS (more precisely, TORQUE) as the Local Resource Management System (LRMS). The interface for submitting jobs to the LRMS is provided by a component called the Scheduler Adapter, which is essentially a Perl module, <$GLOBUS_LOCATION>/lib/perl/Globus/GRAM/JobManager/pbs.pm. It should be patched as described below.
Patching the Scheduler Adapter
Lines 387-388:
elsif($description->jobtype() eq 'mpi' || $description->jobtype() eq 'multiple')
should be replaced by:
elsif( $description->jobtype() eq 'mpi' || ($description->jobtype() eq 'multiple' and ($description->host_count() > 1 or $description->count() > 1) ) )
At line 408, the statement:
print CMD "#!/bin/sh\n";
should be extended so that the generated job script also sources /etc/profile (the trailing newline keeps the following script line separate):
print CMD "#!/bin/sh\n";
print CMD ". /etc/profile\n";
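Because line numbers can shift between Globus releases, it may help to confirm the edits by content rather than by position. The following is a minimal sketch and not part of the D-Grid package: check_pbs_patch is a hypothetical helper name, and it assumes that the two strings it looks for do not occur in the unpatched module.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if both edits described above are present in the file.
check_pbs_patch() {
    file="$1"
    # Edit 1: the extended jobtype condition tests host_count() > 1.
    grep -qF 'host_count() > 1' "$file" || return 1
    # Edit 2: the generated job script sources /etc/profile.
    grep -qF '. /etc/profile' "$file" || return 1
    return 0
}

# Example call (GLOBUS_LOCATION must be set in the environment):
# check_pbs_patch "$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm" && echo "patched"
```

Keeping a copy of the original pbs.pm before editing makes it easy to revert if job submission stops working.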
A more rigorous patch is possible if the Mpiexec implementation by Pete Wyckoff is installed on the cluster. Note that this version of Mpiexec is not fully compatible with those provided with MPICH and other MPI implementations. Most notably, the parameter '-machinefile' does not exist.
Configuring Sudo
In order to submit jobs on behalf of a user, Globus needs to be authorized to invoke specific commands via sudo (Super User Do). To this end, add the following lines to /etc/sudoers:
#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
# You have to run "ssh -t hostname sudo <cmd>".
#
# Defaults requiretty
# Globus GRAM entries
globus ALL=(ALL) NOPASSWD: \
/usr/local/globus/libexec/globus-gridmap-and-execute \
-g /etc/grid-security/grid-mapfile \
/usr/local/globus/libexec/globus-job-manager-script.pl *
globus ALL=(ALL) NOPASSWD: \
/usr/local/globus/libexec/globus-gridmap-and-execute \
-g /etc/grid-security/grid-mapfile \
/usr/local/globus/libexec/globus-gram-local-proxy-tool *
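As a quick sanity check after editing, one can count the GRAM rules that actually ended up in the file. This is only a sketch: count_gram_rules is a hypothetical helper name, and the pattern assumes the rules start with "globus ALL=(ALL) NOPASSWD:" exactly as shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the number of GRAM sudo rules found in a sudoers file.
count_gram_rules() {
    grep -cE '^globus[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:' "$1"
}

# Example call (as root); two rules are expected for the entries above:
# count_gram_rules /etc/sudoers
# As root, "visudo -c" additionally checks the syntax of the installed sudoers file.
```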
Pre-WS GRAM configuration
- As the root user, create the file /etc/xinetd.d/gsigatekeeper with the following content:
service gsigatekeeper
{
    socket_type = stream
    protocol = tcp
    wait = no
    user = root
    env += LD_LIBRARY_PATH=<$GLOBUS_LOCATION>/lib
    env += GLOBUS_TCP_PORT_RANGE=20000,25000
    server = <$GLOBUS_LOCATION>/sbin/globus-gatekeeper
    server_args = -conf <$GLOBUS_LOCATION>/etc/globus-gatekeeper.conf
    disable = no
}
- As the root user, restart the xinetd daemon:
$ /etc/init.d/xinetd restart
GRAM tests
To verify that GRAM accepts jobs, run the following as a grid user:
- For WS-GRAM
> globusrun-ws -submit -F <FQDN of the Globus Frontend> -s -c /bin/hostname
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:66720d6a-6aac-11dd-82c4-af7ae8031d29
Termination time: 08/16/2008 09:27 GMT
Current job state: Pending
Current job state: Active
Current job state: CleanUp-Hold
dgiref-globus.fzk.de
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
- For Pre-WS-GRAM
> globus-job-run localhost:2119/jobmanager-fork /bin/date
Fri Dec 21 10:59:52 CEST 2007
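When both tests are run regularly, a small wrapper can reduce them to a pass/fail line each. The sketch below is an illustration only: run_check is a hypothetical helper, and it assumes that the Globus clients exit with a non-zero status when submission fails.

```shell
#!/bin/sh
# Hypothetical helper: run a command quietly and report PASS or FAIL by exit status.
run_check() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Example calls as a grid user (FRONTEND is a placeholder for the frontend's FQDN):
# run_check "WS-GRAM" globusrun-ws -submit -F "$FRONTEND" -s -c /bin/hostname
# run_check "Pre-WS-GRAM" globus-job-run localhost:2119/jobmanager-fork /bin/date
```

The command output is discarded, so a failing check should be repeated by hand to see the actual error message.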
Turn off fork scheduler
To turn off the fork scheduler, rename the following configuration files, and restart the container as root user:
$ cd $GLOBUS_LOCATION/etc/gram-service-Fork
$ mv jndi-config.xml jndi-config.xml_save
$ cd $GLOBUS_LOCATION/etc/grid-services
$ mv jobmanager-fork jobmanager-fork.save
$ /etc/init.d/globus-container restart
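Since the files are only renamed, the change is easy to reverse. The two functions below are a sketch of that idea: disable_fork and enable_fork are hypothetical names, the file names and suffixes are those used in the commands above, and the container restart is deliberately left as a separate manual step.

```shell
#!/bin/sh
# Hypothetical helpers; the argument is the Globus installation directory.
disable_fork() {
    loc="$1"
    # Hide the WS-GRAM Fork resource and the pre-WS fork job manager.
    mv "$loc/etc/gram-service-Fork/jndi-config.xml" "$loc/etc/gram-service-Fork/jndi-config.xml_save" &&
    mv "$loc/etc/grid-services/jobmanager-fork" "$loc/etc/grid-services/jobmanager-fork.save"
}

enable_fork() {
    loc="$1"
    # Restore both files under their original names.
    mv "$loc/etc/gram-service-Fork/jndi-config.xml_save" "$loc/etc/gram-service-Fork/jndi-config.xml" &&
    mv "$loc/etc/grid-services/jobmanager-fork.save" "$loc/etc/grid-services/jobmanager-fork"
}

# Example (as root), followed by the container restart described in the text:
# disable_fork "$GLOBUS_LOCATION"
```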
A grid user can verify that the fork scheduler is disabled as follows; both submissions are expected to fail:
- For WS-GRAM
> globusrun-ws -submit -c /bin/hostname
Submitting job...Failed.
globusrun-ws: Error submitting job
globus_soap_message_module: SOAP Fault
Fault code: soapenv:Server.userException
Fault string: java.rmi.RemoteException: Job creation failed.; nested exception is:
java.rmi.RemoteException: The Managed Job Factory Service at https://10.156.10.69:8443/wsrf/services/ManagedJobFactoryService does not have a resource with key "Fork".
> globusrun-ws -submit -Ft Fork -c /bin/hostname
Submitting job...Failed.
globusrun-ws: Error submitting job
globus_soap_message_module: SOAP Fault
Fault code: soapenv:Server.userException
Fault string: java.rmi.RemoteException: Job creation failed.; nested exception is:
java.rmi.RemoteException: The Managed Job Factory Service at https://10.156.10.69:8443/wsrf/services/ManagedJobFactoryService does not have a resource with key "Fork".
- For Pre-WS-GRAM
> globus-job-run localhost:2119/jobmanager-fork /bin/date
GRAM job submission failed because the gatekeeper failed to find the requested service (error code 93)