middleware:Glite/31
Introduction
The gLite middleware is a complex system of interconnected parts that interact over the network. It includes both the middleware for storing data (dCache Storage Element, SE) and the cluster resources (Worker Nodes, Local Resource Management System, NFS server). Every gLite instance has a Computing Element (CE) as the frontend for job submission. All connections must pass through a generic interface to the cluster (Grid Gate). The Information Service (IS), or "site BDII", provides information about the Grid resources and their status, which can be used for monitoring and accounting. The current D-Grid gLite implementation uses the Globus Monitoring & Discovery Service (MDS) for resource discovery and for publishing the resource status.
Note from ticket 614: WARNING: User files in the home directories of the user accounts may be deleted by a gLite cleanup cronjob if the site uses gLite together with other middleware (Globus, UNICORE or both). The problematic cronjob is /etc/cron.d/cleanup-grid-accounts. D-Grid uses shared accounts with fixed user mappings instead of pool accounts, hence this cronjob should be disabled. The cronjob writes log files to /var/log/cleanup-grid-accounts.log* and is associated with the file /opt/lcg/etc/cleanup-grid-accounts.conf. The recommended solution is to disable the cronjob completely on the gLite CE. Please be aware that:
gLite server v.3.1
Prepare
- Software
- Scientific Linux version 4.8 32 bit
- Java JDK >= 1.6.0
- perl
- Torque Client
Optimizing the configuration:
Use a minimal operating system installation without a firewall. To verify whether a package is installed, use the command
-
rpm -qa | grep package_name
Install the following additional packages:
-
yum -y install wget yum rpm make gcc gcc-c++ tar sed zlib openssl
After the installation is complete, turn off any unnecessary services (like gpm, sendmail, cups, haldaemon, messagebus, pcmcia, anacron, atd) with the following command:
-
chkconfig <SERVICE> off
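The services named above can be switched off in one pass. The following is only a minimal sketch: the RUN variable is a hypothetical dry-run switch that is not part of the reference installation. By default the chkconfig commands are only printed; start the script with RUN="" as root to execute them.

```shell
#!/bin/bash
# Sketch: disable the unnecessary services listed in the text in one loop.
# RUN is a hypothetical dry-run switch: unset means "only print the commands".
RUN="${RUN-echo}"
SERVICES="gpm sendmail cups haldaemon messagebus pcmcia anacron atd"

disable_services() {
    local svc
    for svc in "$@"; do
        # With the default RUN=echo this prints "chkconfig <service> off"
        $RUN chkconfig "$svc" off
    done
}

disable_services $SERVICES
```

Services that are not installed on a minimal system make chkconfig report an error, which can be ignored in this context.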
Configure the following settings for the server:
- Server Certificates for gLite CE
The supported installation method for SL4 is the yum tool. You have to configure the yum repositories yourself and install the meta packages in your preferred way.
| Please note that YAIM DOES NOT SUPPORT INSTALLATION |
- Download the following repo files into /etc/yum.repos.d:
- jpackage.repo
- lcg-CA.repo
- lcg-CE.repo
- glite-TORQUE_utils.repo
- Firewall configuration
The LCG/gLite frontend runs the LCG CE and Site-BDII services. To enable the communication, make sure that the following ports are open (how to open port in firewall):
| Service | Incoming ports (TCP) | Differs from default configuration |
| GRAM Gatekeeper + Jobmanager | 2119 | No |
| Globus port-range (Jobmanager, GridFTP) | 20000-25000 | No |
| BDII | 2170 | No |
| GridFTP | 2811 | No |
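If a host firewall is active after all, the ports from the table could be opened with iptables. The sketch below only prints the rules so that they can be reviewed first. It assumes that the INPUT chain is used and that appending with -A is sufficient; on a host whose chain already ends in a REJECT or DROP rule the new rules would have to be inserted before that rule instead.

```shell
#!/bin/bash
# Sketch: print iptables rules accepting the incoming TCP ports from the table.
# Assumption: rules go into the INPUT chain; a port range is written as first:last.
PORTS="2119 20000:25000 2170 2811"

print_rules() {
    local port
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

print_rules $PORTS
```

The printed lines can be executed as root once they have been checked against the local firewall policy.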
administrator's script: prepare.sh
#!/bin/bash
# prepare gLite to install
REPO_URL="http://mirror.scc.kit.edu/downloads/yum.repo"
# Configure repositories
# Add yum repositories
wget -O /etc/yum.repos.d/sl-dgiref.repo ${REPO_URL}/sl-dgiref.repo
wget -O /etc/yum.repos.d/jpackage17.repo ${REPO_URL}/jpackage17.repo
wget -O /etc/yum.repos.d/dag.repo ${REPO_URL}/dag.repo
wget -O /etc/yum.repos.d/lcg-CA.repo ${REPO_URL}/glite/lcg-CA.repo
yum clean all
# Missing packages installation
yum -y install perl java
# create directory for the grid host certificates (no error if it already exists)
mkdir -p /etc/grid-security/
# afterwards, copy the host certificate and host key into /etc/grid-security/
Install
The D-Grid reference installation uses the LCG CE variant for the gLite computing resources. Hence the following three main gLite components must be installed on the CE (Computing Element):
- Computing Element: lcg-CE package
- Information system: glite-BDII package
- Batch system components: glite-TORQUE_utils package
administrator's script: install.sh
#!/bin/bash
# install glite
yum -y install lcg-CE lcg-CA glite-BDII
yum -y install glite-TORQUE_utils
Configure
| To install the gLite Monitoring services (BDII and RGMA), please refer to gLite services page. |
Generally speaking, the gLite configuration is done by the YAIM packages (for a description of YAIM, check the YAIM guide). There are three important site-specific configuration files:
- site-info.def has site-specific configuration, (check also: /opt/glite/etc/gip/ldif/glite-info-site.ldif)
- users.conf to set up users,
- groups.conf for access rules.
A description of the file structure can be found in /opt/glite/yaim/examples/ (for example users.conf.README). The file users.conf must be created or adapted for the users of all VOs. During the configuration, the YAIM configuration tool creates these users if they do not exist yet. If the user accounts already exist, YAIM does not change the UIDs/GIDs. The entries are controlled in the directory /etc/grid-security/gridmapdir.
- Certificates
The certificates can be installed in two ways:
- Use the apt savannah.fzk.de repository. Examples:
- install the fzk-vomscert package from the apt repository:
rpm savannah.fzk.de repository/fzk security
cat << EOF > /etc/apt/sources.list.d/fzk.list
###
### FZK apt repository containing some packages needed for DGrid
### Currently these are the VOMS server certificate, and the GridKa-CA
### configuration rpms. Do not remove this repository.
###
rpm http://savannah.fzk.de repository/fzk security
EOF
apt-get update
apt-get install fzk-vomscert
- GSI configuration. Install the ca_FZK-local package from the following apt repository:
rpm savannah.fzk.de repository/fzk security
- Use the d-grid download area (see the following script)
administrator's script: configure.sh
#!/bin/bash
# configure gLite
# load parameters from prepare section
BASE_URL=http://mirror.scc.kit.edu/downloads/src/glite/2009.1
# Host certificates
# The host certificate and the associated key are copied into the directory /etc/grid-security:
cp hostcert.pem hostkey.pem /etc/grid-security
chmod 400 /etc/grid-security/hostkey.pem
chmod 644 /etc/grid-security/hostcert.pem
# VOMS server certificate
# Copy the d-grid VOMS server certificate into /etc/grid-security/vomsdir
mkdir -p /etc/grid-security/vomsdir
wget -O /etc/grid-security/vomsdir/dgrid-voms.fzk.de ${BASE_URL}/dgrid-voms.fzk.de
# GSI configuration for the GridKA CA (needed for grid-cert-request, etc.):
# either download and install the GSI configuration rpm
rpm -ihv ${BASE_URL}/ca_FZK-local-1.0-1.noarch.rpm
# Yaim Configuration
cp /opt/glite/yaim/examples/siteinfo/site-info.def /opt/glite/yaim/site-info.def
# Since the site-info.def file contains passwords, it should NOT be readable for users!
chmod 600 /opt/glite/yaim/site-info.def
# and/or
chmod 700 /opt/glite/yaim
# The following warnings may occur during the configuration; they can be ignored:
#   /sbin/ldconfig: <LIBRARY> is not a symbolic link
#   rfiod: unrecognized service
#   users_getprduser: could not find prd user for <VO> in users.conf
#
# /opt/glite/yaim/site-info.def configuration
#
# lcg CE
# Required variables in the site-info.def for the configuration of the lcg-CE are:
#   BATCH_SERVER
#   BDII_HOST
#   CE_BATCH_SYS
#   CE_CPU_MODEL
#   CE_CPU_SPEED
#   CE_CPU_VENDOR
#   CE_INBOUNDIP
#   CE_LOGCPU
#   CE_MINPHYSMEM
#   CE_MINVIRTMEM
#   CE_OS
#   CE_SMPSIZE
#   CE_OS_RELEASE
#   CE_OS_VERSION
#   CE_OS_ARCH
#   CE_OUTBOUNDIP
#   CE_PHYSCPU
#   CE_RUNTIMEENV
#   CE_SF00
#   CE_SI00
#   GROUPS_CONF
#   <queue-name>_GROUP_ENABLE
#   JOB_MANAGER
#   QUEUES
#   SE_LIST
#   USERS_CONF
#   VOS
#   VO_<vo-name>_VOMS_SERVERS
#   VO_<vo-name>_SW_DIR
#   VO_<vo-name>_VOMS_CA_DN
#   VO_<vo-name>_VOMSES
# Torque utils
# For the configuration of the torque utilities the following variables have to be set in the site-info.def:
#   BATCH_SERVER
#   CE_HOST
#   QUEUES
#   SITE_NAME
#   WN_LIST
# The configuration is done by
/opt/glite/yaim/bin/yaim -c -s "/opt/glite/yaim/site-info.def" -n TORQUE_utils -n lcg-CE
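Before yaim is run, it can help to check that the required variables are actually assigned in site-info.def. The following is a minimal sketch, not part of YAIM: the helper name missing_vars is hypothetical, and it only looks for plain VAR= assignments, so the templated names such as VO_<vo-name>_VOMSES have to be expanded by hand.

```shell
#!/bin/bash
# Sketch: missing_vars FILE VAR... prints every VAR without a "VAR=" assignment in FILE.
# missing_vars is a hypothetical helper name, not a YAIM command.
missing_vars() {
    local file="$1" var
    shift
    for var in "$@"; do
        grep -q "^[[:space:]]*${var}=" "$file" || echo "$var"
    done
}

# Example: check a subset of the variables listed in the script above.
SITE_INFO="${SITE_INFO:-/opt/glite/yaim/site-info.def}"
if [ -r "$SITE_INFO" ]; then
    missing_vars "$SITE_INFO" BATCH_SERVER BDII_HOST CE_HOST QUEUES SITE_NAME VOS WN_LIST
fi
```

An empty output means that every checked name has an assignment; it does not verify that the assigned values are meaningful.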
Proceed
The gLite instance is started automatically.
| To make the stagein/stageout options available for PBS jobs, /etc/ssh/shosts.equiv and /etc/ssh/ssh_known_hosts should be distributed from the gLite CE to all worker nodes. The reference installation uses cfengine to implement this procedure. |
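Where cfengine is not available, the two files could also be copied by hand. The sketch below only prints the scp commands; the root login on the worker nodes and the input file with one worker node hostname per line are assumptions, not part of the reference installation.

```shell
#!/bin/bash
# Sketch: print scp commands that copy the two SSH files to every worker node.
# Assumptions: the argument is a text file with one hostname per line,
# and root may log in to the worker nodes via ssh.
FILES="/etc/ssh/shosts.equiv /etc/ssh/ssh_known_hosts"

print_copy_commands() {
    local wn_file="$1" wn
    while read -r wn; do
        if [ -n "$wn" ]; then
            echo "scp $FILES root@${wn}:/etc/ssh/"
        fi
    done < "$wn_file"
}

# Usage (hypothetical path): print_copy_commands /opt/glite/yaim/wn-list.conf
```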
administrator's script: proceed.sh
#!/bin/bash
# proceed
Initial test
Examine the newly installed system with the following commands:
administrator's script: test.sh
#!/bin/bash
# initial tests for gLite installation
### Create a voms proxy
[grid user] $ voms-proxy-init --voms dgtest
### Show proxy info
[grid user] $ voms-proxy-info --all
#subject : /C=DE/O=GermanGrid/OU=FZK/CN=Grid User/CN=proxy
#issuer : /C=DE/O=GermanGrid/OU=FZK/CN=Grid User
#identity : /C=DE/O=GermanGrid/OU=FZK/CN=Grid User
#type : proxy
#strength : 512 bits
#path : /tmp/x509up_u7632
#timeleft : 7:46:28
#=== VO dgtest extension information ===
#VO : dgtest
#subject : /C=DE/O=GermanGrid/OU=FZK/CN=Grid Jrad
#issuer : /O=GermanGrid/OU=FZK/CN=host/dgrid-voms.fzk.de
#attribute : /dgtest/Role=NULL/Capability=NULL
### Create a sample job
[user]$ vi hostname.jdl
Executable = "hostname.sh";
StdOutput = "stdout";
StdError = "stderr";
InputSandbox = {"hostname.sh"};
OutputSandbox = {"stdout", "stderr"};
### Create input file for the job
[user]$ vi hostname.sh
hostname
/usr/bin/id
sleep 10
### Submit a sample job.
[grid user]$ glite-wms-job-submit -a hostname.jdl
#Connecting to the service https://iwrrb.fzk.de:7443/glite_wms_wmproxy_server
#====================== glite-wms-job-submit Success ======================
#
#The job has been successfully submitted to the WMProxy
#Your job identifier is:
#
#https://iwrrb.fzk.de:9000/TsAUEzstiFMmbupVY37KWg
#
#==========================================================================
### Show job status
[user] $ glite-wms-job-status https://iwrrb.fzk.de:9000/TsAUEzstiFMmbupVY37KWg
### If status is done, get the job output and store it locally
[user] $ glite-wms-job-output --dir . https://iwrrb.fzk.de:9000/TsAUEzstiFMmbupVY37KWg
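The two sample files can also be written without an editor. The following is a minimal sketch: JOBDIR is a hypothetical working directory (a temporary directory by default), and the #!/bin/sh line in hostname.sh is an addition that is not in the listing above.

```shell
#!/bin/bash
# Sketch: create the sample job files from the test above with here-documents.
# JOBDIR is a hypothetical working directory; a temporary one is used by default.
JOBDIR="${JOBDIR:-$(mktemp -d)}"

# Job payload: print the host name and the mapped account, then wait briefly.
cat > "$JOBDIR/hostname.sh" << 'EOF'
#!/bin/sh
hostname
/usr/bin/id
sleep 10
EOF
chmod +x "$JOBDIR/hostname.sh"

# Job description with the same attributes as in the test above.
cat > "$JOBDIR/hostname.jdl" << 'EOF'
Executable = "hostname.sh";
StdOutput = "stdout";
StdError = "stderr";
InputSandbox = {"hostname.sh"};
OutputSandbox = {"stdout", "stderr"};
EOF

echo "Job files written to $JOBDIR"
```

Since the InputSandbox entry is a relative file name, the submit command from the test is best run from within that directory.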
Update
Updates to gLite 3.1 are released regularly. Running yum update is enough to update the instance.
WARNING: Several sites use an automatic update mechanism. Middleware updates sometimes require non-trivial configuration changes or a reconfiguration of the service. This can involve database schema changes, service restarts, new configuration files, etc., which makes it difficult to ensure that automatic updates will not break a service. Therefore DO NOT USE AN AUTOMATIC UPDATE PROCEDURE OF ANY KIND!
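A manual update can start with a check for pending packages, without installing anything. The sketch below relies on the documented exit codes of yum check-update (100 when updates are available, 0 when there are none); the function name pending_updates is hypothetical.

```shell
#!/bin/bash
# Sketch: report whether updates are pending, without installing anything.
# yum check-update exits with 100 if updates are available and 0 if not.
pending_updates() {
    yum -q check-update > /dev/null 2>&1
    case $? in
        0)   echo "up-to-date" ;;
        100) echo "updates-pending" ;;
        *)   echo "error" ;;
    esac
}

# Usage: run pending_updates, review "yum check-update" by hand, then "yum update".
```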
administrator's script: update.sh
#!/bin/bash
# update for glite
# If reconfiguration of any kind is necessary, just run the following command
# (don't forget to list all node types installed in your host):
node="-n lcg-CE -n TORQUE_utils"
/opt/glite/yaim/bin/yaim -c -s site-info.def $node
gLite services
Top Level
There are some "top level" services, provided for the interaction between provider sites and users.
- "top level BDII" - the Berkeley Database Information Index is a Lightweight Directory Access Protocol (LDAP) server which collects the data from the sites' information services.
- Replica Catalog - keeps track of where data are stored.
- Workload Management System (WMS) - a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources, in such a way that applications are executed conveniently, efficiently and effectively. The core component is the Workload Manager (WM), whose purpose is to accept and satisfy requests for job management coming from its clients.
Site-level
BDII
To install BDII, use the following procedure:
- download the glite-BDII.repo
- install package glite-BDII with yum
- configure site-info.def
cd /etc/yum.repos.d
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/glite-BDII.repo
yum -y install glite-BDII
## glite site BDII
# For the configuration of the Site BDII the following variables have to be set in the site-info.def:
#   CE_HOST
#   DCACHE_ADMIN ## when using dCache as SE
#   SITE_EMAIL
#   SITE_LAT
#   SITE_LONG
#   SITE_NAME
#   BDII_HOST
#   SITE_BDII_HOST
#   BDII_REGIONS
#   BDII_CE_URL
#   BDII_SE_URL ## when a SE is available on the Site
# The configuration is done by
/opt/glite/yaim/bin/yaim -c -s "site-info.def" -n BDII_site
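Once the site BDII is configured, it can be queried over LDAP on the BDII port 2170 listed in the firewall table. The sketch below only builds the ldapsearch command. The host name and site name are placeholders, and the base DN of the form mds-vo-name=<SITE_NAME>,o=grid is an assumption about how the site BDII publishes its data.

```shell
#!/bin/bash
# Sketch: build an ldapsearch command that queries a site BDII.
# Assumptions: the base DN has the form mds-vo-name=<SITE_NAME>,o=grid;
# host and site name below are placeholders.
bdii_query() {
    local host="$1" site="$2"
    echo "ldapsearch -x -LLL -H ldap://${host}:2170 -b mds-vo-name=${site},o=grid"
}

bdii_query "ce.example.org" "MY-SITE"
```

Running the printed command against the real host should list the published entries if the BDII is up; an empty result or a connection error points to a configuration or firewall problem.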
MON-RGMA
| The current Reference Installation does not use any of the services provided by the MON component, in particular it doesn't make use of RGMA at all. If your site needs it, use the following documentation. |
To install the MON component, use the following procedure:
- download the file glite-MON.repo
- install the package glite-MON with yum
- configure site-info.def
cd /etc/yum.repos.d
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/glite-MON.repo
yum -y install glite-MON
## glite MonBox
# For the configuration of the MonBox the following variables have to be set in the site-info.def:
#   APEL_DB_PASSWORD
#   CE_HOST
#   GRIDICE_SERVER_HOST
#   MON_HOST
#   MYSQL_PASSWORD
#   SITE_NAME
#   SITE_BDII_HOST
# The configuration is done by
/opt/glite/yaim/bin/yaim -c -s "site-info.def" -n MON
Attribute-based authorization
Attribute-based authorization is already part of the gLite user administration; it only requires configuring the /opt/glite/yaim/etc/groups.conf file.
release:/glite/yaim/etc/groups.conf
JavaGAT
Regarding security, the gLite adaptor behaves mostly like Globus. The difference between the Globus Toolkit and gLite is that, instead of an entirely self-signed proxy, gLite uses so-called VOMS proxies for authentication and authorization.
- Place the personal certificate files userkey.pem and usercert.pem in the directory $HOME/.globus.
- Place the host certificates of the Grid hosts you would like to access in the directory $HOME/.globus/certificates.
- The file $HOME/.globus/cog.properties should exist and look like this:
cat $HOME/.globus/cog.properties
#Java CoG Kit Configuration File
#usercert: The path to the file containing your dgrid certificate.
usercert=/home/dgdt0000/.globus/usercert.pem
# userkey: The path to the file containing your Grid key.
userkey=/home/dgdt0000/.globus/userkey.pem
# proxy: The name under which your proxy certificate which you create with grid-proxy-init is stored.
proxy=/tmp/x509up_u1000
#cacert: The path of the directory, which contains the host certificates.
#cacert=/etc/grid-security/certificates
cacert=/home/dgdt0000/.globus/cog-certificates
To be able to make the VOMS-proxy request on behalf of the user, the gLite adaptor needs to know a few additional pieces of data:
- The name of the VO for which the user wants to obtain a credential (e.g. dgtest)
- The endpoint of the VOMS server webservice (this address is usually different to the URL at which the VOMS admin can be accessed with a browser)
- The port at which the VOMS server is listening to requests
- The distinguished name (DN) of the VOMS Host. If you are unsure about this, you can usually find the information on the "Configuration" page in the VOMS admin server application.
An example configuration of all the necessary parameters for the gLite adaptor could look as follows:
GATContext context = new GATContext();
CertificateSecurityContext secContext = new CertificateSecurityContext(
    new URI("/home/dgdt0000/.globus/userkey.pem"),
    new URI("/home/dgdt0000/.globus/usercert.pem"),
    "mysupersecretpwd");
Preferences globalPrefs = new Preferences();
globalPrefs.put("vomsServerURL", "skurut19.cesnet.cz");
globalPrefs.put("vomsServerPort", "7001");
globalPrefs.put("vomsHostDN", "/DC=cz/DC=cesnet-ca/O=CESNET/CN=skurut19.cesnet.cz");
globalPrefs.put("VirtualOrganisation", "voce");
context.addPreferences(globalPrefs);
context.addSecurityContext(secContext);
mpi
Prepare
Configuration is necessary on both the CEs (gLite) and WNs in order to support and advertise MPI correctly (see Site configuration for MPI for details). This is performed by the gLite YAIM module glite-yaim-mpi which should be run on both the CE and WNs.
administrator's script: prepare.sh
#!/bin/bash
# prepare
exit 0
Install
Install the following package:
-
glite-MPI_utils
administrator's script: install.sh
#!/bin/bash
# install MPI packages
# write the repository definition into its own file below /etc/yum.repos.d
echo "[glite-MPI_utils]
name=glite 3.1 MPI
enabled=1
gpgcheck=0
baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.1/glite-MPI_utils/sl4/i386/" > /etc/yum.repos.d/glite-MPI_utils.repo
yum -y install glite-MPI_utils
exit 0
Configure
- Add the MPI settings to the site-info.def of the CE and WNs; see YaimConfig for detailed information.
- Export the set of environment variables shown in the script to avoid the message INFO: No MPI flavours enabled.
- Execute the yaim command to configure.
WARNING: In /etc/hosts the WN must be listed with its fully qualified hostname (for example wn.fzk.de); otherwise yaim cannot determine it with hostname -f and aborts the configuration.
After the yaim configuration has finished, edit /etc/hosts again and restore the previous hostname of the WN; otherwise the node is seen twice, as the two different nodes wn and wn.fzk.de, when nodes are reserved for an MPI job.
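Whether the host currently reports a fully qualified name can be checked before yaim is started. The following is a minimal sketch; the helper name is_fqdn is hypothetical, and it only tests that the name contains a dot with text on both sides.

```shell
#!/bin/bash
# Sketch: warn if "hostname -f" does not return a fully qualified name.
# is_fqdn is a hypothetical helper: it succeeds for names such as wn.fzk.de
# and fails for short names such as wn.
is_fqdn() {
    case "$1" in
        .*|*.) return 1 ;;   # leading or trailing dot
        *.*)   return 0 ;;   # at least one inner dot
        *)     return 1 ;;   # no dot at all
    esac
}

name="$(hostname -f 2>/dev/null || hostname)"
if is_fqdn "$name"; then
    echo "OK: $name is fully qualified"
else
    echo "WARNING: $name is not fully qualified; check /etc/hosts before running yaim"
fi
```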
administrator's script: configure.sh
#!/bin/bash
# configure mpi on gLite CE
# export variables
export MPI_OPENMPI_ENABLE="yes"
export MPI_OPENMPI_VERSION="1.2.9"
/opt/glite/yaim/bin/yaim -c -s site-info.def -n MPI_CE -n lcg-CE -n glite-MON -n glite-TORQUE_utils -n glite-BDII_site
# To allow Torque to allocate the correct CPU number requested by an MPI job,
# edit /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm and change:
#   $cluster = 0;
#   $cpu_per_node = 0;
# to:
#   $cluster = 1;
#   $cpu_per_node = 1;
# /var/spool/pbs/torque.cfg should contain the line:
#   SUBMITFILTER /var/spool/pbs/submit_filter.pl
cat /var/spool/pbs/torque.cfg
# Change Torque dgipar queue configuration
# from: set queue dgipar resources_default.ncpus = 2
# to:   set queue dgipar resources_default.ncpus = 1
qmgr -c "set queue dgipar resources_default.ncpus = 1"
Initial test
- You can try submitting a job to your site using the instructions found via the page job submission
- You can do some basic tests by logging in on a WN as a pool user and running the following:
administrator's script: test.sh
#!/bin/bash
# initial test mpi
USER='griduser'
WN='worker node address'
# run the check on the worker node itself
ssh "$USER@$WN" 'env | grep MPI_'
# Result should look similar to:
# MPI_MPICC_OPTS=-m32
# MPI_SSH_HOST_BASED_AUTH=yes
# MPI_OPENMPI_PATH=/opt/openmpi/1.1
# MPI_LAM_VERSION=7.1.2
# MPI_MPICXX_OPTS=-m32
# MPI_LAM_PATH=/usr
# MPI_OPENMPI_VERSION=1.1
# MPI_MPIF77_OPTS=-m32
# MPI_MPICH_VERSION=1.2.7
# MPI_MPIEXEC_PATH=/opt/mpiexec-0.80
# MPI_MPICH2_PATH=/opt/mpich2-1.0.4
# MPI_MPICH2_VERSION=1.0.4
# I2G_MPI_START=/opt/i2g/bin/mpi-start
# MPI_MPICH_PATH=/opt/mpich-1.2.7p1
exit 0