data:Dcache/1912

Introduction

dCache is a mass storage system developed by Deutsches Elektronen-Synchrotron (DESY) and Fermi National Accelerator Laboratory (Fermilab). It provides disk cache management, transfer management, and data access and data management protocols.

The goal of this project is to provide a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.

Simplified dCache architecture

The dCache system consists of several components (nodes) which interact over the network:

  1. A provider site.
  2. The user goes through the SRM node for resource access and for authentication and authorization to use the resources.
  3. From the SRM node the user can reach the PNFS node, which knows the data location, and the ADMIN node, which works as the information system for dCache.
  4. All connections have to pass through the GridFTP door, which acts as a generic interface to the pool of nodes managed by the PNFS.

Package:    dcache 1.9.12-10
 os:             Scientific Linux version 5.6 64 bit
 server:        dgiref-dcache.fzk.de
 manuals:   dcache server v. 1.9.12 / extensions
 monitoring: monitoring page


Archive links
Information links
Download links
Guidelines links
Tutorials links


Please open an NGI-DE ticket if you experience any installation or configuration problems.

dCache server v1.9.12

Prepare

Operating system
Scientific Linux v.5.6 64 bit

Optimizing the configuration:


Use a minimal operating system installation without a firewall. To verify that a package is installed, use the command:

  • rpm -qa | grep package_name

Install the following additional packages:

  • yum -y install wget yum rpm make gcc gcc-c++ tar sed zlib openssl

After the installation is complete, turn off any unnecessary services (like gpm, sendmail, cups, haldaemon, messagebus, pcmcia, anacron, atd) with the following command:

  • chkconfig <SERVICE> off
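The service list above can be switched off in one pass. This is a minimal sketch, assuming the services are exactly those named in the text; the DRY_RUN variable and the disable_services function are illustrative names, and with the default DRY_RUN=1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Services named in the text above; trim the list to what is actually installed.
SERVICES="gpm sendmail cups haldaemon messagebus pcmcia anacron atd"

# Print the chkconfig calls (DRY_RUN=1, the default) or execute them (DRY_RUN=0).
disable_services() {
    for svc in $SERVICES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "chkconfig $svc off"
        else
            chkconfig "$svc" off
        fi
    done
}

disable_services
```

Run it once as is to review the commands, then again with DRY_RUN=0 as root. Note that chkconfig only changes the boot configuration; a service that is already running keeps running until it is stopped or the node is rebooted.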

Configure the following settings for the server:

  • The host running the srm transfer service needs to have a valid host certificate and a host key in place (/etc/grid-security/hostcert.pem, /etc/grid-security/hostkey.pem).
  • The grid-vorolemap and storage-authzdb files have to be prepared [dCache ABA].
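A quick check of the host credentials can catch the most common mistakes before dCache is started. This is a sketch only: the function name check_host_credentials is made up for illustration, and the rule that the key must not be accessible by group or others is the usual grid convention, assumed here rather than taken from the text.

```shell
#!/bin/sh
# Check that the host certificate and key exist and that the key is private.
# The directory argument defaults to the location named in the text.
check_host_credentials() {
    dir="${1:-/etc/grid-security}"
    for f in hostcert.pem hostkey.pem; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            return 1
        fi
    done
    # Characters 5-10 of the ls -l mode string are the group and other bits.
    perms="$(ls -l "$dir/hostkey.pem" | cut -c5-10)"
    if [ "$perms" != "------" ]; then
        echo "hostkey.pem is accessible by group or others (chmod 400 suggested)"
        return 1
    fi
    echo "host credentials look fine in $dir"
}
```

Calling check_host_credentials without an argument inspects /etc/grid-security; a different directory can be passed as the first argument.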

Firewall configuration

For normal communication between the dCache server and its clients, the firewall must accept traffic on the default ports listed below (dCache default ports).

Port number | Description | Component / direction
32768 | Used by the NFS layer within dCache, which is based on RPC. This service is essential for RPC. | NFS
1939, 33808 | Used by the portmapper, which is also involved in the RPC dependencies of dCache. | portmap
34075 | Used by the postmaster listening to requests for the PostgreSQL database for dCache database functionality. | Outbound for SRM, PnfsDomain, dCacheDomain and doors; inbound for PostgreSQL server.
33823 | Used for internal dCache communication. | By default: outbound for all components, inbound for dCache domain.
8443 | The SRM port. See Chapter 14, dCache Storage Resource Manager. | Inbound for SRM
2288 | Used by the web interface to dCache. | Inbound for httpdDomain
22223 | Used for the dCache admin interface. See the section called “The Admin Interface”. | Inbound for adminDomain
22125 | Used for the dCache dCap protocol. | Inbound for dCap door
22128 | Used for the dCache GSIdCap protocol. | Inbound for GSIdCap doors
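If a firewall is in use, the inbound client-facing ports from the table can be opened with iptables. The following sketch only prints the rules; the choice of ports (SRM, web interface, admin interface, dCap, GSIdCap) and the function name print_firewall_rules are assumptions made for illustration, and internal ports such as 33823 or the PostgreSQL port are deliberately left out.

```shell
#!/bin/sh
# Inbound TCP ports from the table above: SRM, web interface, admin, dCap, GSIdCap.
INBOUND_PORTS="8443 2288 22223 22125 22128"

# Print one iptables ACCEPT rule per port; nothing is applied by this script.
print_firewall_rules() {
    for port in $INBOUND_PORTS; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

print_firewall_rules
```

Review the printed rules and apply them as root once they match the local policy. Because -A appends to the chain, the rules have no effect if an earlier rule already rejects the traffic.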

Install

Install dCache server and clients

  • dcache-server
  • dcap
  • libdcap
  • dcache-srmclient

Configure

Layout

dCache keeps the list of domains, and of the services that run within these domains, in the layout files. Every domain is a separate Java VM.

Note: dCache provides three typical layout files: head.conf, pools.conf and single.conf.

Note: If the layout file contains more than one domain, the parameter messageBroker=cells must be set so that the domains can communicate with each other. If all services run in a single domain, this parameter should be set to none.
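The rule about the message broker can be checked mechanically against a layout file. This sketch assumes that a domain is declared by a line of the form [domainName] and a service by [domainName/serviceName]; the function name suggest_message_broker is hypothetical, and the parameter name is simply the one used in the text above.

```shell
#!/bin/sh
# Count the domains declared in a layout file and suggest a messageBroker value.
# Assumption: domains appear as [domainName], services as [domainName/serviceName].
suggest_message_broker() {
    layout="$1"
    # Match section headers that contain no slash, i.e. domain declarations only.
    domains="$(grep -c '^\[[^]/]*][[:space:]]*$' "$layout" || true)"
    if [ "$domains" -gt 1 ]; then
        echo "messageBroker=cells"
    else
        echo "messageBroker=none"
    fi
}
```

The function takes the path of the layout file as its only argument and prints the suggested setting.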

dcache.conf

dcache.conf is the most important configuration file. It defines which layout file the node uses and sets the parameters specific to the services on this node. The available parameters can be looked up in the properties files in the /usr/share/dcache/defaults/ directory.

Chimera

Chimera is the namespace provider that gives a single-rooted view of the files distributed across dCache. To configure Chimera:

  • create a database for Chimera
  • load create.sql and pgsql-procedures.sql into it
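The two steps above might look as follows with the standard PostgreSQL client tools. Everything specific here is an assumption to verify locally: the database and role name chimera, the postgres superuser, the SQL directory, and the need for createlang (only relevant on older PostgreSQL releases where plpgsql is not enabled by default). With the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Directory holding the Chimera SQL scripts; an assumption, adjust to your install.
SQL_DIR="${SQL_DIR:-/usr/share/dcache/chimera/sql}"

# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_chimera_db() {
    # Create a database and a database role, both named "chimera" (assumed names).
    run createdb -U postgres chimera
    run createuser -U postgres --no-superuser --no-createrole --createdb chimera
    # Load the schema first.
    run psql -U chimera chimera -f "$SQL_DIR/create.sql"
    # Older PostgreSQL releases need plpgsql enabled before loading procedures.
    run createlang -U postgres plpgsql chimera
    run psql -U chimera chimera -f "$SQL_DIR/pgsql-procedures.sql"
}

setup_chimera_db
```

The schema is loaded before the stored procedures because the procedures refer to the tables created by create.sql.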

Pools

Note: dCache pools hold all the data ever written into dCache. These pools are completely independent of the PNFS directories (for the time being). In fact, they could be created anywhere and then mounted locally. Do not use the whole available disk space for the pools! dCache needs some additional space to keep the metadata records linked to the files stored in each pool.

All files in the dCache system are located in pools. Pools can be defined on several "pool nodes". To define a pool:

  • create the pool directory
  • create a new domain for the pool, or add the pool service to an existing domain in the layout file
  • configure PoolManager.conf
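The first two steps can often be done with the dcache command itself. This sketch assumes that the dcache script of this release offers a "pool create" subcommand that creates the pool directory and adds the pool service to the given domain in the layout file; check the output of the dcache command on your node before relying on it. The directory, pool name and domain name are hypothetical, and with the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Hypothetical values for illustration; adjust to your node.
POOL_DIR="${POOL_DIR:-/srv/dcache/pool1}"
POOL_NAME="${POOL_NAME:-pool1}"
POOL_DOMAIN="${POOL_DOMAIN:-poolDomain}"

# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_pool() {
    # Assumed subcommand: creates the directory and registers the pool service.
    run dcache pool create "$POOL_DIR" "$POOL_NAME" "$POOL_DOMAIN"
    # The domain has to be started before the pool becomes available.
    run dcache start "$POOL_DOMAIN"
}

create_pool
```

PoolManager.conf still has to be adjusted by hand if the new pool should belong to a specific pool group or link.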

PnfsManager

The NFS interface of Chimera provides access to the dCache file tree. Pnfs makes it possible to read and change information about a file without accessing the file data.

Note: Pnfs is only a service for manipulating the file tree. It can create and delete files and change their attributes, but it cannot access the file data directly.

Note: Directory tags in PNFS are metadata which are evaluated by dCache and inherited by future subdirectories.

Publishing dCache in site-bdii

See details [1]

Proceed

  • Register dCache as an operating system service and enable it at boot, so that it starts automatically.
  • Start dCache
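On Scientific Linux 5 both steps are typically done with chkconfig and service. The init script name dcache-server is an assumption (check /etc/init.d on your node); with the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Name of the init script; an assumption, check /etc/init.d on your node.
INIT_SCRIPT="${INIT_SCRIPT:-dcache-server}"

# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_and_start() {
    # Register the init script and enable it for the default runlevels.
    run chkconfig --add "$INIT_SCRIPT"
    run chkconfig "$INIT_SCRIPT" on
    # Start dCache now and show the state of the configured domains.
    run service "$INIT_SCRIPT" start
    run dcache status
}

enable_and_start
```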

Initial test

Simple dCache test for client side:

  • Initialize the grid user proxy
  • Generate a random data file
  • Make a directory via the srm service
  • Upload the random file to dCache via srm
  • Download the file from the dCache server
  • Delete the new file and the new folder from the destination host
  • Compare the uploaded and the downloaded file

If all tests pass, your dCache service is most likely working normally.
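The test sequence above could be scripted roughly as follows. This is a sketch under several assumptions: the VO name ops, the namespace path under /pnfs, and the directory and file names are hypothetical, and the SRM client commands (srmmkdir, srmcp, srmrm, srmrmdir) are expected to come from the dcache-srmclient package. Only the two helper functions run without a grid environment; srm_roundtrip is defined but not called.

```shell
#!/bin/sh
# Hypothetical endpoint and namespace path; replace with values valid for your VO.
SRM_BASE="${SRM_BASE:-srm://dgiref-dcache.fzk.de:8443/pnfs/fzk.de/data/ops}"

# Create a file of random data; $1 is the path, $2 the size in KiB.
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1024 count="$2" 2>/dev/null
}

# Succeed only if both files are byte-identical.
files_match() {
    cmp -s "$1" "$2"
}

# Full round trip; needs the SRM client tools and a working VOMS setup.
srm_roundtrip() {
    src="$(mktemp)" || return 1
    dst="$src.back"
    voms-proxy-init --voms ops || return 1       # grid user init (VO assumed)
    make_random_file "$src" 1024 || return 1
    srmmkdir "$SRM_BASE/test-dir" || return 1
    # $src is absolute, so the local URLs end up with four slashes.
    srmcp -2 "file:///$src" "$SRM_BASE/test-dir/test-file" || return 1
    srmcp -2 "$SRM_BASE/test-dir/test-file" "file:///$dst" || return 1
    srmrm "$SRM_BASE/test-dir/test-file"
    srmrmdir "$SRM_BASE/test-dir"
    files_match "$src" "$dst" && echo "round trip OK"
}
```

Call srm_roundtrip from a client host with a valid user certificate; it prints "round trip OK" only if the downloaded copy is identical to the uploaded file.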

Update

Note: To keep updates easy, do not change the settings in /usr/share/dcache/defaults/*.properties manually (all changes should be made in dcache.conf).

  • To install a new version from UMD, just use yum update.

dCache extensions

Authorization in dCache

Assume that dCache is installed and configured, including two pools for disk-only files (as described on data:Dcache/1912/server). Now we need to tell dCache who is actually allowed to write into and read from the pools. The easiest way of enabling attribute-based authorization is via the vorole mapping. For this, another set of files has to be edited.

The official documentation for dCache (“dCache The Book”) has an up-to-date chapter about this method: http://www.dcache.org/manuals/Book/config/cf-gplazma-vorole.shtml.

/etc/dcache/dcachesrm-gplazma.policy

In this file dCache looks up which authorization methods are enabled and in which order they are applied (there may be up to 5 different methods). For our purpose we only need to activate gplazmalite-vorole-mapping:

kpwd="OFF"
gplazmalite-vorole-mapping="ON"

Accordingly, two other variables need to be checked (remember their values):

# Built-in gPLAZMAlite grid VO role mapping
gridVoRolemapPath="/etc/grid-security/grid-vorolemap"
gridVoRoleStorageAuthzPath="/etc/grid-security/storage-authzdb"

/etc/grid-security/grid-vorolemap

The grid-vorolemap does the first part of attribute-based authorization: it maps user roles onto user names. This file may be generated automatically, but it is also easy to maintain by hand. However, no template is shipped with the dCache installation, so one has to create it at /etc/grid-security/grid-vorolemap.

"*" "<group>" <username>

For example:

"*" "/ops" ops001

This line maps every possible DN (the asterisk is used as a wildcard, but a specific DN is also valid) together with the attribute /ops onto the user “ops001”. The attribute can also reflect the role a user came with:

"*" "/<group>/<subgroup>" <username>
"*" "/<group>/Role=<role>" <username>

At this point it does not matter which user names are used for the mappings, as they do not have to be existing Unix user accounts. To compensate for this, the storage-authzdb file is needed.

/etc/grid-security/storage-authzdb

As mentioned above, this file is needed to give the “virtual users” employed in the grid-vorolemap a proper uid and gid.

authorize <username> read-write <uid> <gid> / / /
authorize ops001 read-write 22001 5850 / / /

In addition, the very first (non-commented) line must specify the version of the storage-authzdb format: “version 2.1”.

The three slashes at the end of the line exist mostly for legacy reasons. Have a look at dCache - The Book (http://www.dcache.org/manuals/Book/config/cf-gplazma-authzdb.shtml) for further details.
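Because the two files are maintained separately, it is easy to map a role onto a user name that has no authorize line. The following sketch checks both rules described above; the function name check_authz_files is made up for illustration, and the check assumes that the user name is the last field of each grid-vorolemap line and the second field of each authorize line.

```shell
#!/bin/sh
# Verify that storage-authzdb starts with "version 2.1" and that every user name
# used in grid-vorolemap has an "authorize" line in storage-authzdb.
check_authz_files() {
    vorolemap="$1"
    authzdb="$2"
    status=0

    # First line that is neither blank nor a comment.
    first="$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$authzdb" | head -n 1)"
    if [ "$first" != "version 2.1" ]; then
        echo "storage-authzdb: first line is not 'version 2.1'"
        status=1
    fi

    # The user name is the last field of each mapping line.
    for user in $(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$vorolemap" \
                  | awk '{ print $NF }' | sort -u); do
        if ! awk -v u="$user" \
             '$1 == "authorize" && $2 == u { found = 1 } END { exit !found }' \
             "$authzdb"; then
            echo "no authorize line for user: $user"
            status=1
        fi
    done
    return "$status"
}
```

A typical call is check_authz_files /etc/grid-security/grid-vorolemap /etc/grid-security/storage-authzdb; the function prints one line per problem and returns a non-zero status if it found any.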

/etc/dcache/LinkGroupAuthorization.conf

Lastly, dCache needs to know which users and roles are allowed to work with the defined link groups. This is configured in /etc/dcache/LinkGroupAuthorization.conf. We can either allow any authenticated user and role, or restrict access to known VOs.

LinkGroup default-linkGroup
*/Role=*

or

LinkGroup default-linkGroup
<group>
<group>/<subgroup>
<group>/Role=<role>