data:Dcache/195

Introduction

dCache is a mass storage system developed by Deutsches Elektronen-Synchrotron (DESY) and Fermi National Accelerator Laboratory (Fermilab). It provides disk cache management, transfer management, and protocols for data access and data management.

The goal of this project is to provide a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.

Simplified dCache architecture

The dCache system consists of several components (nodes) that interact over the network:

  1. A provider site.
  2. The user goes through the SRM node for resource access, authentication, and authorization.
  3. From the SRM node the user can reach the PNFS node, which knows the data locations, and the ADMIN node, which acts as the information system for dCache.
  4. All connections have to pass through the GridFTP door, which acts as a generic interface to the pool nodes managed by PNFS.

Package:    dcache 1.9.12-10
 os:             Scientific Linux version 5.6 64 bit
 server:        dgiref-dcache.fzk.de
 manuals:   dcache server v. 1.9.12 / extensions
 monitoring: monitoring page


Archive links
Information links
Download links
Guidelines links
Tutorials links


Please open an NGI-DE ticket if you experience any installation or configuration problems.

dCache server v1.9.5

Prepare

Operating system
Scientific Linux v.5.4 64 bit

Optimizing the configuration:


Use minimal operating system installation without firewall. To verify installed packages use the command

  • rpm -qa | grep package_name

Install the following additional packages:

  • yum -y install wget yum rpm make gcc gcc-c++ tar sed zlib openssl

After the installation is complete, turn off any unnecessary services (like gpm, sendmail, cups, haldaemon, messagebus, pcmcia, anacron, atd) with the following command:

  • chkconfig <SERVICE> off
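The per-service command above can be applied to the whole list of services in one loop. The following is a minimal sketch that only prints the commands so they can be reviewed first; the function name disable_cmds is made up for this page and is not part of any package.

```shell
#!/bin/sh
# Print one "chkconfig <service> off" line per service name passed in.
# Nothing is changed on the system by this function.
disable_cmds() {
  for svc in "$@"; do
    printf 'chkconfig %s off\n' "$svc"
  done
}

# Service names taken from the list above.
disable_cmds gpm sendmail cups haldaemon messagebus pcmcia anacron atd
```

After reviewing the printed lines, they can be piped to sh as root to apply them.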

Configure the following settings for the server:

  • The host running the srm transfer service needs to have a host certificate and a host key in place (/etc/grid-security/hostcert.pem, /etc/grid-security/hostkey.pem).

Firewall configuration

dCache ports (additionally see dCache book, chapter 22)

Protocol  Port(s)      Direction               Nodes
dCap      22125        incoming                doorDomain (admin node)
          any          outgoing                pools
GSIdCap   22128        incoming                gsidcapDomain (where GSIDCAP=yes in node_config)
          any          outgoing                pools
GridFTP   2811         incoming                gridftpDomain (where GRIDFTP=yes in node_config)
          20000-25000  outgoing (active FTP)   pools
          20000-25000  incoming (passive FTP)  gridftpDomain
SRM v1    8443         incoming                srmDomain
SRM v2    8444         incoming                srmDomain
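If a host firewall based on iptables is in use on a node, the incoming ports from the table could be opened roughly as sketched below. This is an assumption about the local setup: the page itself recommends an installation without firewall, and the INPUT chain and the function name incoming_rules are chosen for illustration only. The sketch prints the rules instead of applying them.

```shell
#!/bin/sh
# Print iptables ACCEPT rules for the incoming TCP ports listed in the table.
# A port range is written as "first:last", the form iptables expects.
incoming_rules() {
  for port in 22125 22128 2811 20000:25000 8443 8444; do
    printf 'iptables -A INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
  done
}

incoming_rules
```

Not every node needs every rule; the Nodes column of the table tells which domain, and therefore which host, listens on which port.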

There are two ways to configure the firewall ports:

  1. Before the dCache setup. The firewall ports can be configured through the following site-info.def variables:
    • DCACHE_PORT_RANGE_PROTOCOLS_SERVER_GSIFTP. Sets the port range for dCache as a GSIFTP server in "passive" mode. The default is 50000 to 52000 ("50000,52000").
    • DCACHE_PORT_RANGE_PROTOCOLS_CLIENT_GSIFTP. Sets the port range for dCache as a GSIFTP client in "active" mode. The default is 33115 to 33125 ("33115,33125").
    • DCACHE_PORT_RANGE_PROTOCOLS_SERVER_MISC. Sets the port range for dCache as a (GSI)DCAP and xrootd server in "passive" mode. The default is 60000 to 62000 ("60000,62000").
  2. After the dCache setup. Modify the following variables in /opt/d-cache/config/dCacheSetup:
    • Java Configuration section
      • -Dorg.globus.tcp.port.range to "20000,25000"
      • -Dorg.dcache.net.tcp.portrange to "33115,33215"
    • Network Configuration section
      • dCapPort to "22125"
      • dCapGsiPort to "22128"
      • gsiFtpPortNumber to "2811"
      • srmPort to "8443"
      • clientDataPortRange to "20000,25000"
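The variables of the Network Configuration section can be changed by hand in an editor. For repeated setups, a small helper like the one sketched below may be convenient. It assumes that each variable sits on its own line in plain key=value form; set_var is a hypothetical name, and the demo works on a scratch file rather than on the live dCacheSetup.

```shell
#!/bin/sh
# set_var FILE KEY VALUE
# Replace an existing "KEY=..." line in FILE, or append one if KEY is absent.
# Assumes plain "key=value" lines; VALUE must not contain "|", "&" or "\".
set_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  if grep -q "^${key}=" "$file"; then
    # Rewrite only the matching line and copy all other lines unchanged.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back through the original file so owner and mode are preserved.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demo on a scratch file, not on /opt/d-cache/config/dCacheSetup.
demo=$(mktemp)
printf 'dCapPort=22000\nsrmPort=8443\n' > "$demo"
set_var "$demo" dCapPort 22125
set_var "$demo" clientDataPortRange 20000,25000
cat "$demo"
```

The Java options are part of a longer option string and are better edited by hand.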

Install

WARNING: The users edguser and edginfo must exist on the information provider nodes (in general this is the "admin node"). They are not needed on other nodes but, since their presence does no harm, they may be added on all nodes.

The users postgres and (after PostgreSQL is installed) srmdcache, which connect dCache to the databases, should be created. Handling PostgreSQL databases requires rights that differ from those of a normal Unix user, so use the PostgreSQL tools with the option -U to execute commands under a different user account.

Install dCache server and clients

  • dcache-server
  • dcap
  • libdcap
  • dcache-srmclient

Configure

node_config

Edit the file node_config and place it inside of /opt/d-cache/etc.

Note: A template is shipped together with the dCache server rpm and can be found at /opt/d-cache/etc/node_config.template. The template contains some documentation.

This flat file defines a few node-specific variables that are needed on every node of the dCache setup. The following values have to be set in the node_config file:

NAMESPACE=chimera
    Defines the type of namespace dCache will use. Possible values are pnfs or chimera.
NAMESPACE_NODE=<FQDN of your headnode>
    Defines the (fully qualified domain) name of the node that hosts the dCache service for managing the namespace.
NODE_TYPE=admin
    Defines the role this node has to fulfill in the global dCache setup. Set this either to admin, so that a predefined set of services is launched on this host, or to custom and select every single service by means of the next variable.
SERVICES="srm dcap gsidcap gridftp info"
    Lists all services that this node should host. Possible values are dcap, gsidcap, srm, gridftp, xrootd, info, gPlazma, chimera, admin, httpd, utility, dir, statistics, lm, dCache, replica.
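Before running the installation, it can help to confirm that node_config really sets the variables described above. The sketch below assumes that the file contains only plain VAR=value assignments, so that sourcing it has no side effects; check_node_config is a hypothetical helper, not a dCache tool.

```shell
#!/bin/sh
# check_node_config FILE
# Source FILE in a subshell and report variables that are unset, or a
# NAMESPACE value other than the two named above. Pass FILE as a path that
# contains a slash, otherwise "." would search PATH for it.
check_node_config() {
  (
    . "$1" || exit 1
    for name in NAMESPACE NAMESPACE_NODE NODE_TYPE; do
      eval "value=\${$name:-}"
      [ -n "$value" ] || { echo "not set: $name"; exit 1; }
    done
    case $NAMESPACE in
      pnfs|chimera) ;;
      *) echo "unexpected NAMESPACE: $NAMESPACE"; exit 1 ;;
    esac
  )
}

# Intended use: check_node_config /opt/d-cache/etc/node_config
```

The subshell keeps the sourced variables out of the calling shell, so the check can be run repeatedly without leftovers.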

dCacheSetup

The next important file for configuration is the dCacheSetup file. Again we have to edit some variables inside of it and place it afterwards at /opt/d-cache/config/dCacheSetup.

Note: dCache comes with a template of the dCacheSetup file, too. It can be found at /opt/d-cache/etc/dCacheSetup.template.

In the dCacheSetup file almost all variables are of global importance. Hence, it is strongly advised to keep this file as identical as possible on all nodes of the dCache setup (this eases administration a lot). Set the following values inside of it:

serviceLocatorHost=<FQDN of your headnode>
    Adjust this to point to one unique server for one and only one dCache instance (usually the admin node).
java="/usr/java/latest/bin/java"
    Set this to the JDK executable you want to use for dCache.
java_options="-server -Xmx64m -XX:MaxDirectMemorySize=64m (...)"
    The numbers for Xmx and MaxDirectMemorySize indicate the amount of memory allocated for each Java task started by dCache.
cacheInfo=pnfs
    dCache needs to know the locations of all stored files. This variable decides whether this information is kept in a separate database or in the namespace service itself. Usually pnfs is chosen if Chimera is used and companion otherwise.
logArea=/var/log/dCache
    In principle dCache can write its log files to any existing directory. Since every dCache service has its own log file, their number can become large, so it may be beneficial to keep them separate from the default system log files.

Chimera

  • As user postgres create a database for Chimera, the namespace type of our choice.
  • The Chimera NFS server uses the /etc/exports file to manage exports. A typical exports file looks like this:
/ localhost(rw)
/pnfs *.your.domain(rw)
Note: If this file does not exist yet, create it.
  • After you have edited /etc/exports, start the NFS server with the script:
/opt/d-cache/libexec/chimera/chimera-nfs-run.sh start
  • Create the main Chimera path.
Note: In the production system, mount localhost:/pnfs /pnfs can be used. For configuration and initialization, however, mount localhost:/ /mnt is needed.
  • Create the root of the Chimera namespace, called pnfs (for legacy reasons).
  • Add the directory tags "sGroup", "OSMTemplate", "RetentionPolicy", "AccessLatency".
Note: Directory tags in PNFS are metadata that dCache evaluates and that future subdirectories inherit.
Note: The Chimera configuration is now complete, so unmount the NFSv3 mount: umount /mnt

Whenever you need to change the configuration, you have to remount the admin view (localhost:/) at a temporary location such as /mnt.

  • Create directories for storing data as needed. Remember to adapt the owners for the directories accordingly (the directories should not be owned by root)! For this, have a look at attribute-based authorization in dCache.
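The directory tags named in the list above are written through pseudo-files of the form ".(tag)(<name>)" inside the directory that should carry them. The following is a minimal sketch of that step. The tag values are example assumptions for a disk-only setup and are not prescribed by this page, and write_tags is a name invented here.

```shell
#!/bin/sh
# write_tags DIR
# Write the four directory tags into DIR using the ".(tag)(<name>)"
# pseudo-file convention of PNFS/Chimera.
# All four values are illustrative assumptions; adapt them to your site.
write_tags() {
  dir=$1
  echo "StoreName sql" > "$dir/.(tag)(OSMTemplate)"
  echo "chimera"       > "$dir/.(tag)(sGroup)"
  echo "REPLICA"       > "$dir/.(tag)(RetentionPolicy)"
  echo "ONLINE"        > "$dir/.(tag)(AccessLatency)"
}

# Intended use while the admin view is mounted on /mnt:
#   write_tags /mnt/pnfs
```

Because tags are inherited only by subdirectories created afterwards, it makes sense to set them before the data directories are created.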

Miscellaneous tasks

  • Create the directory for dCache to put its logfiles in, if it does not exist yet (remember the settings from the dCacheSetup file).
    • Normally every dCache component has its own logfile, which is stored in this directory. All current dCache versions have one exception to this rule: /opt/d-cache/libexec/apache-tomcat-5.5.20/logs/catalina.out. This file is used as the logfile for the SRM domain. If you would like to work with this logfile as with every other dCache logfile, then consider installing a symbolic link:
ln -s /opt/d-cache/libexec/apache-tomcat-5.5.20/logs/catalina.out /var/log/dCache/srm-dgiref-dcacheDomain.log
  • Create a small wrapper for the dccp command that sets the environment variable LD_LIBRARY_PATH whenever it is executed.
  • Once all this configuration is done, make it take effect by running /opt/d-cache/install/install.sh
Note: install.sh must be executed every time basic settings have changed, e.g. after an update of the dCache version.
  • Configure dCache pools
Note: dCache pools hold all the data ever written into dCache. These pools are completely independent of the PNFS directories (for the time being). In fact, they could be created anywhere and then mounted locally. Do not use the whole available disk space for the pools! dCache needs some additional space to keep records of the metadata linked to the files stored in each pool.
  • The (logical) dCache pools are created by means of /opt/d-cache/bin/dcache pool create <size> <target location>.
  • Add pools to a dCache domain (usage: /opt/d-cache/bin/dcache pool add [--fqdn] [--domain=<domain>] [--lfs=<mode>] <pool name> <target location>). If no domain is specified, the pool is added to one named after the hostname (in our case "dgiref-dcache.fzk.deDomain"). "lfs=precious" makes a pool disk-only, that is, without an HSM back-end.
  • Adapt the file /opt/d-cache/config/PoolManager.conf.
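The dccp wrapper mentioned in the list above could look roughly like the script generated below. Both paths inside the wrapper are assumptions about where the dcap client and its library are installed; check them against your system. make_dccp_wrapper is a hypothetical helper that writes the wrapper to a location of your choice.

```shell
#!/bin/sh
# make_dccp_wrapper TARGET
# Write a small wrapper script to TARGET that extends LD_LIBRARY_PATH and
# then runs the real dccp with all arguments passed through.
make_dccp_wrapper() {
  cat > "$1" <<'EOF'
#!/bin/sh
# Assumed install locations of the dcap library and the dccp binary.
DCAP_LIB=/opt/d-cache/dcap/lib
DCCP_BIN=/opt/d-cache/dcap/bin/dccp
LD_LIBRARY_PATH=$DCAP_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
exec "$DCCP_BIN" "$@"
EOF
  chmod 755 "$1"
}

# Example: make_dccp_wrapper /usr/local/bin/dccp
```

The wrapper prepends the library directory instead of overwriting LD_LIBRARY_PATH, so settings already present in the user's environment remain in effect.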

Proceed

  • Make dCache a known service so that it is started during the boot phase.
  • Start dCache as a service:
Starting lmDomain Done (pid=19164)
Starting dCacheDomain Done (pid=19233)
Starting dirDomain Done (pid=19305)
Starting adminDoorDomain Done (pid=19389)
Starting httpdDomain Done (pid=19477)
Starting utilityDomain 6 Done (pid=19560)
Starting gPlazma-dgiref-dcacheDomain 6 Done (pid=19712)
Starting chimeraDomain 6 Done (pid=19800)
Starting pool1Domain Done (pid=19900)
Starting pool2Domain Done (pid=19987)
Starting infoProviderDomain Done (pid=20083)
Starting infoDomain Done (pid=20163)
statisticsDomain might still be running
Starting dcap-dgiref-dcacheDomain Done (pid=20335)
Starting gridftp-dgiref-dcacheDomain 6 Done (pid=20460)
Starting gsidcap-dgiref-dcacheDomain Done (pid=20621)
Using CATALINA_BASE:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME:       /usr/java/jdk1.6.0_17
 
Pinging srm server to wake it up, will take few seconds ...
VersionInfo : v2.2
backend_type:dCache
 
Done
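In a long start listing like the one above, a domain that did not come up cleanly is easy to overlook. The following sketch filters the output so that only domain lines without a "Done (pid=...)" marker remain. It relies only on the line format shown above; not_started is a name chosen for this page.

```shell
#!/bin/sh
# not_started
# Read the start output on stdin and print every line that mentions a
# domain but does not end in "Done (pid=<number>)".
not_started() {
  grep 'Domain' | grep -v 'Done (pid=[0-9]*)$'
}

# Intended use: pipe the output of the dCache start command into not_started.
```

For the listing above, this would leave only the statisticsDomain line, which is the one that deserves a second look.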

Initial test

  • Check the dCache web interface on port 2288 of the dCache server (http://<dcache-headnode-fqdn>:2288/cellinfo).
  • Write a file via dccp (that is, unauthenticated).
  • Write a file via srmcp (using a voms-proxy).

Additional information can be found under Troubleshooting.

Update

To update the dCache server, do the following:

  1. Get the dCache rpms for the version you want to install
  2. Remember to read the changelog for the version and adapt the following steps accordingly
  3. Stop all running dCache services
  4. Install the rpms
  5. Run /opt/d-cache/install/install.sh
  6. Restart dCache services

dCache extensions

Authorization in dCache

Assume that dCache is installed and configured, including two pools for disk-only files (as described on data:Dcache/195/server). Now we need to tell dCache who is actually allowed to write to and read from the pools. The easiest way to enable attribute-based authorization is the vorole mapping. For this, another set of files has to be edited.

The official documentation for dCache (“dCache The Book”) has an up-to-date chapter about this method: http://www.dcache.org/manuals/Book/config/cf-gplazma-vorole.shtml.

/opt/d-cache/etc/dcachesrm-gplazma.policy

In this file dCache looks up which authorization methods are enabled and in which order they are applied (there may be up to 5 different methods). For our purpose we only need to activate gplazmalite-vorole-mapping:

kpwd="OFF"
gplazmalite-vorole-mapping="ON"

Accordingly, two other variables need to be checked (remember their values):

# Built-in gPLAZMAlite grid VO role mapping
gridVoRolemapPath="/etc/grid-security/grid-vorolemap"
gridVoRoleStorageAuthzPath="/etc/grid-security/storage-authzdb"

/etc/grid-security/grid-vorolemap

The grid-vorolemap file covers the first part of attribute-based authorization: the mapping of user roles onto user names. This file may be generated automatically, but it is also very easy to maintain by hand. However, no template is shipped with the dCache installation, so it has to be created at /etc/grid-security/grid-vorolemap.

"*" "<group>" <username>

For example:

"*" "/ops" ops001

This line introduces the mapping of every possible DN (the asterisk is used as wildcard, but a specific DN is also valid) together with the attribute /ops onto the user “ops001”. This attribute can also reflect the role a user came with:

"*" "/<group>/<subgroup>" <username>
"*" "/<group>/Role=<role>" <username>

At this point it does not matter which user names are used for the mappings, as they do not have to be existing Unix user accounts. To compensate for this, the storage-authzdb file is needed.
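To check which user name a given group or role attribute is mapped to, the file can be queried with a few lines of awk. This is a minimal sketch that only handles the wildcard-DN form shown above (lines with a specific DN containing spaces are not parsed); lookup_user is a hypothetical helper name.

```shell
#!/bin/sh
# lookup_user FILE FQAN
# Print the user name of the first line in FILE that maps the wildcard DN
# together with exactly FQAN. Expected line form:  "*" "<FQAN>" <username>
lookup_user() {
  awk -v fqan="$2" '
    /^[ \t]*#/ { next }                      # skip comment lines
    $1 == "\"*\"" && $2 == "\"" fqan "\"" {  # wildcard DN and exact FQAN
      print $3
      exit
    }' "$1"
}

# Example: lookup_user /etc/grid-security/grid-vorolemap /ops
```

An empty result means that no wildcard line exists for that attribute.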

/etc/grid-security/storage-authzdb

As mentioned before, this file is needed to give the “virtual users” used in the grid-vorolemap a proper uid and gid.

authorize <username> read-write <uid> <gid> / / /
authorize ops001 read-write 22001 5850 / / /

In addition, the very first (non-commented) line must specify the version of the storage-authzdb format: “version 2.1”.

The three slashes at the end of the line exist mostly for legacy reasons. Have a look at dCache - The Book (http://www.dcache.org/manuals/Book/config/cf-gplazma-authzdb.shtml) for further details.
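Both requirements, the version line and the uid/gid columns, can be checked with a short awk script. The sketch below assumes the line layout shown above (user name in the second field, uid and gid in the fourth and fifth); authz_ids is a name made up for this page.

```shell
#!/bin/sh
# authz_ids FILE USER
# Fail unless the first non-comment, non-empty line of FILE is "version 2.1".
# Otherwise print "<uid>:<gid>" from the "authorize" line of USER, and fail
# if there is no such line.
authz_ids() {
  awk -v user="$2" '
    /^[ \t]*(#|$)/ { next }
    !seen {
      seen = 1
      if ($1 != "version" || $2 != "2.1") { bad = 1; exit }
      next
    }
    $1 == "authorize" && $2 == user { print $4 ":" $5; found = 1; exit }
    END { exit (bad || !found) ? 1 : 0 }' "$1"
}

# Example: authz_ids /etc/grid-security/storage-authzdb ops001
```

The uid:gid form of the output matches what chown accepts as an owner specification.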

/opt/d-cache/etc/LinkGroupAuthorization.conf

Lastly, dCache needs to know which users and roles are allowed to work with the defined link groups. This is configured in /opt/d-cache/etc/LinkGroupAuthorization.conf. We can either allow any authenticated user and role, or restrict access to known VOs.

LinkGroup default-linkGroup
*/Role=*

or

LinkGroup default-linkGroup
<group>
<group>/<subgroup>
<group>/Role=<role>

Create directory structure

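# insert domainName value
domainName=fzk.de

# create one data directory per VO
for vo in dgops medigrid c3grid ingrid textgrid wisent hepcg kerndgrid astrogrid dgtest gdigrid progrid partgrid fingrid lifescience education bwgrid dgcms aerogrid bisgrid bauvogrid biz2grid bioif interloggrid mosgrid
do
  mkdir -p "/pnfs/$domainName/data/$vo"
done

# set the owner (uid:gid) of each VO directory from the gPlazma mapping files
for path in "/pnfs/$domainName/data"/*/
do
  dir=$(basename "$path")
  # user name mapped to this VO in grid-vorolemap (third field)
  user=$(grep -m1 "/$dir" /etc/grid-security/grid-vorolemap | cut -d' ' -f 3)
  [ -n "$user" ] || continue
  # uid and gid of that user in storage-authzdb (fourth and fifth field)
  owner=$(grep -m1 "^authorize $user " /etc/grid-security/storage-authzdb | cut -d' ' -f 4-5 | sed 's/ /:/')
  [ -n "$owner" ] && chown "$owner" "$path"
done

# verify the result
ls -dln "/pnfs/$domainName/data"/*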