RFC #5: Resources CS section structure
Authors: A.Tsaregorodtsev, R.Graciani
Last Modified: 17.04.2012
The /Resources section of the DIRAC configuration contains a description of all IT Resources available for the users of a given DIRAC installation. By IT Resources we understand the services providing access to computing, storage and all other necessary services (catalogs, databases,...) to build a functional distributed computing system, as well as the description of the resources themselves (capacity, limits,...). This section does not include high level DIRAC services, but it does include DIRAC interfaces to some of the above resources.
The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements. It has naturally evolved as the functionality of DIRAC has increased. Now the schema shows some limitations that we would like to overcome:
- There is no clear dependency relation between some of the access points (resources) and the sites responsible for their operation. Such a relation is essential for properly monitoring the status of the infrastructure.
- When several communities or virtual organizations share the same DIRAC installation, there is no well defined way to express which resources are accessible to each of them.
- When several funding agencies or providers support the resources at different sites, the current schema makes it hard to avoid defining the sites and the access points to the resources twice.
- Different communities, or even different setups for the same community, might want to define different access points for common resources like catalogs, FTS servers,... This is currently not possible.
The main idea is to structure the section by the logic of resource provisioning. Therefore, it is based on the notion of a Site as the main body responsible for the services offered to the user communities. Access to each resource is provided by a Site. The managers of the Site are responsible for the availability of the resources as well as for defining the rules for accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact when needed. At the same time it provides a natural proximity relation between different types of resources, which is essential for DIRAC in order to optimize the scheduling.
The key concepts in the new schema are:
- Community (or Virtual Organization, VO): is a group of users that use the resources in a coordinated way. They are essential pieces in the DIRAC functionality since each Community may define its own policies for accessing and using the resources, within the limits allowed by the Sites. In the grid world Communities are called VOs.
- Site: is the central piece of the new schema both from a functional and an administrative point of view. On the one hand, it is the entity that collects access points to resources that are related by locality in a functional sense, i.e. the storage at a given Site is considered local to the CPU at the same Site and this relation will be used by DIRAC. On the other hand, a Site must provide a single entry point responsible for the availability of the resources that it encompasses. In the DIRAC sense, a Site can be anything from a fraction of a physical computer center to a whole regional grid. It is the responsibility of the DIRAC administrator of the installation to properly define the sites. Not all Sites need to grant access to all VOs supported in the DIRAC installation.
- Domain: is a supra-site organization meaningful in the context of a given DIRAC installation. Domains might be related to the funding of the resources (GISELA, WLCG,...), to administrative domains (NGIs) or any other reason relevant for the installation. They have no functional meaning in DIRAC and are only used for reporting purposes to group contributions beyond the Site level. Sites can belong to more than one Domain, in some cases exclusively but not necessarily. By default, all installations support the DIRAC domain.
- Resource Type: is the main category in which relevant IT Resources are grouped. At the moment the following types are relevant for this document: Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement.
The /Resources section contains all the information about the Sites, the services they provide and the resources behind those services. This section will also include the information about the Communities supported by the resources and the Domains that are relevant to them. The Domains are described under the same /Resources section and the Communities are described under the top level /Registry section (not described in this document). Another top level /Operations section (also not described in this document) will allow the VOs to define how their resources are used.
With all the above in mind, the new schema of the /Resources section is proposed as follows:
/Resources/Sites/[Site Name]/[Resource Type]/[Name Of Service]/[Type Of Access Point]/[Name Of Access Point]
/Resources/Domains/[Domain Name]
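As an illustration of the proposed layout, the following Python sketch splits a path under /Resources/Sites into the five levels named above. The function name and the dictionary keys are hypothetical and are not part of any DIRAC API. The example path combines names that appear elsewhere in this document purely for demonstration.

```python
# Level names follow the schema above; the helper itself is illustrative only.
LEVELS = ("Site", "ResourceType", "Service", "AccessPointType", "AccessPoint")


def split_site_path(path):
    """Map a /Resources/Sites/... path onto the schema levels it reaches."""
    parts = [p for p in path.split("/") if p]
    if parts[:2] != ["Resources", "Sites"]:
        raise ValueError("not a /Resources/Sites path: %r" % path)
    levels = parts[2:]
    if len(levels) > len(LEVELS):
        raise ValueError("path is deeper than the schema allows: %r" % path)
    # zip() stops at the shorter sequence, so partial paths are accepted.
    return dict(zip(LEVELS, levels))


if __name__ == "__main__":
    print(split_site_path(
        "/Resources/Sites/CERN.ch/Computing/some.cream.ce/Queues/cream-sge-long"
    ))
```

A path that stops at an intermediate level, such as /Resources/Sites/PIC.es/Storage, simply yields fewer keys.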
The naming conventions, the usage of each of the levels and the relevant options that need to be defined are described in the following sections of this document.
In the DIRAC configuration Sites have names resulting from concatenation of the name of the Site and the country code according to the ISO 3166 standard with a dot as a separator: [Site].[co]. These short Site names are used in the CS as the name of the sections containing all the resources provided by the Site.
Together with the short Site names, a full Site name is constructed by adding the "Domain" prefix. The full DIRAC Site Name becomes of the form: [Domain].[Site].[co]. The full site names are used everywhere when the site resources are assigned to the context of a particular Domain: in the accounting, monitoring, configuration of the Operations parameters, etc.
Both the Site and the Domain names must be alphanumeric strings that are unique irrespective of case. The only additional character allowed is the underscore ("_").
This convention will be enforced and Names not following it will not be usable.
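The naming rules above could be checked along the following lines. This is a minimal sketch, not the actual enforcement code: the function names are hypothetical, and it assumes that the country code is the two-letter ISO 3166 form used in the examples of this document (ch, fr, es).

```python
import re

# Assumption: a name is one or more ASCII letters, digits or underscores.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")
# Assumption: two-letter country codes, as in CERN.ch or IN2P3.fr.
COUNTRY_RE = re.compile(r"[A-Za-z]{2}")


def short_site_name(site, country):
    """Build the short name [Site].[co] after validating both parts."""
    if not NAME_RE.fullmatch(site):
        raise ValueError("invalid Site name: %r" % site)
    if not COUNTRY_RE.fullmatch(country):
        raise ValueError("invalid country code: %r" % country)
    return "%s.%s" % (site, country)


def full_site_name(domain, site, country):
    """Build the full name [Domain].[Site].[co]."""
    if not NAME_RE.fullmatch(domain):
        raise ValueError("invalid Domain name: %r" % domain)
    return "%s.%s" % (domain, short_site_name(site, country))


def check_unique(names):
    """Raise if two names differ only by letter case."""
    seen = {}
    for name in names:
        key = name.lower()
        if key in seen:
            raise ValueError("%r clashes with %r" % (name, seen[key]))
        seen[key] = name


if __name__ == "__main__":
    print(short_site_name("CERN", "ch"))
    print(full_site_name("EGI", "IN2P3", "fr"))
```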
Domains and Communities are defined, together with their contact information and other details, under the /Resources/Domains and /Registry/Communities sections of the configuration, respectively:
/Resources/Domains
    /EGI
    /GISELA
    ...
/Registry
    /Communities
        ...
Domains and Communities are essential for the use of the information kept under the /Resources/Sites section. Therefore, at each level of the tree under this section, i.e. for each type of resource, for each access point or for the site as a whole, a list of supported "Domains" and served "Communities" can be defined. These values are inherited by all subsections, and a subsection can override the corresponding settings of its parent section.
For multi-VO installations, at each level, sections named after each of the supported Communities can be used to overwrite the common options of the parent section. This makes it possible to define a different contact at a Site for a certain Community, or a different Port for a VOMS Server, or a different SpaceToken in an SRM Storage Element, etc.:
/Resources/Sites
    /CERN.ch
        ...
    /IN2P3.fr
        /Domains = EGI, LCG, GISELA
        /Communities = biomed, esr
        /ContactEmail = someone@somewhere
        /biomed
            /ContactEmail = someoneelse@somewhereelse
    /PIC.es
        ...
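The per-Community override shown for IN2P3.fr can be sketched in Python as a lookup that first tries the subsection named after the Community and then falls back to the common option. The function name and the nested-dictionary representation of the CS section are assumptions made for this illustration only.

```python
def get_option(section, option, community=None):
    """Return an option value, preferring the Community subsection if it defines one."""
    if community is not None:
        override = section.get(community)
        # A Community override is a subsection, represented here as a nested dict.
        if isinstance(override, dict) and option in override:
            return override[option]
    return section.get(option)


# The IN2P3.fr section from the example above, as a nested dictionary.
in2p3 = {
    "Domains": ["EGI", "LCG", "GISELA"],
    "Communities": ["biomed", "esr"],
    "ContactEmail": "someone@somewhere",
    "biomed": {"ContactEmail": "someoneelse@somewhereelse"},
}

if __name__ == "__main__":
    print(get_option(in2p3, "ContactEmail", community="biomed"))
    print(get_option(in2p3, "ContactEmail", community="esr"))
```

With this data, the biomed Community gets its own contact while esr falls back to the common one.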
Sites are usually grouped in larger infrastructures like Grids, Clouds, etc. or provisioned by different funding bodies like national or international grid projects. This grouping might not be exclusive and Sites might belong to one or more of these groups (or none). We propose to call these groups Domains. The information related to them is kept in the /Resources/Domains section, with one subsection for each Domain. Typical examples are:
/Resources/Domains
    /gLite
    /AMAZON
    /StratusLab
    /BOINC
The use of these Domains is mostly for reporting purposes, Accounting and Monitoring, and it is the responsibility of the administrator of the DIRAC installation to choose them in such a way that they are meaningful for the communities and for the computing resources served by the installation. In any case, DIRAC will always be the default Domain if nothing else is specified for a given resource.
Each resource section, at any level in the tree, can have a Domains option which is a list of Domains providing this resource or to which the resource belongs by some other relation. The Domains option is not mandatory. If it is not present, the resource will be assigned to the DIRAC Domain when used.
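The combination of inheritance and the DIRAC default can be sketched as follows. The sketch assumes that the most specific level defining a Domains option wins and that the default applies only when no level on the path defines one. The function name and the list-of-dictionaries input are hypothetical.

```python
DEFAULT_DOMAINS = ["DIRAC"]


def effective_domains(sections):
    """Resolve the Domains of a resource.

    `sections` holds the option dictionaries along the path, ordered from
    the Site down to the most specific level (for example Site, CE, Queue).
    """
    for options in reversed(sections):
        if "Domains" in options:
            return list(options["Domains"])
    # No level defines Domains: fall back to the default Domain.
    return list(DEFAULT_DOMAINS)


if __name__ == "__main__":
    site = {"Domains": ["EGI"]}
    ce = {}
    queue = {"Domains": ["Grid1"]}
    print(effective_domains([site, ce, queue]))
    print(effective_domains([site, ce]))
    print(effective_domains([{}, {}]))
```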
Sites provide access to the resources, therefore the /Resources/Sites section is the main place where the description of the resources is stored. At the next level there is a list of sections representing each of the Sites, named following the short [Site].[co] convention as defined above:
/Resources/Sites
    /CERN.ch
    /IN2P3.fr
    /PIC.es
The subsection for each Site contains several options describing the Site as a whole, and a number of Sections describing the type of Resources it provides access to. At the moment this list includes Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement, but it can be extended in the future if necessary. The resulting section will look as follows:
/ContactEmail = someone@somewhere
/WebURL = http://some.whe.re
/Coordinates =
/MoUTierLevel = 1
/Computing
    /...
/Storage
    /...
/...
As for any section underneath, the options Domains and Communities can be defined. Their values will be inherited by all subsections that do not explicitly overwrite them.
This section contains information about the interfaces to access Computing resources at the Site. It can have options common to all Computing interfaces at the Site and each of these interfaces (ComputingElements) has its own subsection. The subsection name is at the same time the name of the ComputingElement (CE). The CE subsection contains the options describing the CE, for example:
.../Computing/some.cream.ce
    /CEType = CREAM
    /Host = some.cream.ce
    /Communities = VO1, VO2
    /Domains = Grid1, Grid2
    /Queues
        ...
    ...
Note that, unlike in the current CS, the name of the CE section is not necessarily the CE host name. However, if the host name is not given as an explicit Host option, the name of the CE section is interpreted as the host name. These details should all be taken care of by the Resources helper utility.
Notice that the CE section can contain a Communities option which is a comma-separated list of the Communities allowed to use the given CE. This list can be defined for the CE as a whole or for each Queue of the CE in the corresponding section. The Communities value defined for the Queue overrides the one in the CE section. Similarly to the Communities option, the Domains option is a list of Domains. A Site, CE or Queue can contribute resources in the name of one or more Domains. This information will eventually make accounting per Domain possible.
Each ComputingElement section has a subsection (Queues, for those CEs providing access to batch systems like WMS, CREAM, Torque, SGE, SSHSGE, etc.) which in turn contains one subsection per named "queue". The name of the "queue" section is interpreted as the queue name. The queue subsection contains queue options, for example:
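The two CE rules described above, the Host fallback and the Queue-level Communities override, could look like this in a helper. This is a sketch with hypothetical function names, not the Resources helper utility itself. It treats an empty Host value as "not given", which is an assumption.

```python
def ce_host(ce_name, ce_options):
    """Use the explicit Host option if given, otherwise the CE section name."""
    # Assumption: an empty Host value counts as "not given".
    return ce_options.get("Host") or ce_name


def queue_communities(ce_options, queue_options):
    """The Queue-level Communities list, if defined, replaces the CE-level one."""
    if "Communities" in queue_options:
        return list(queue_options["Communities"])
    return list(ce_options.get("Communities", []))


if __name__ == "__main__":
    ce = {"CEType": "CREAM", "Communities": ["VO1", "VO2"]}
    print(ce_host("some.cream.ce", ce))
    print(queue_communities(ce, {"Communities": ["VO1"]}))
    print(queue_communities(ce, {}))
```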
.../Queues/cream-sge-long
    /Communities = VO1, VO2
    /Domains = Grid1, Grid2
    /MaxCPUTime =
    /SI00 =
    /MaxWaitingJobs =
    /MaxTotalJobs =
    /OutputURL =
    ...
...
The format of the name of the queues and their options can be different for different CE types.
This section contains information about the interfaces to access Storage resources at the Site. It can have options common to all Storage interfaces at the Site and each of these interfaces (StorageElements) has its own subsection. The Storage section contains subsections per named Storage Element (SE). A named SE is an SE in the DIRAC sense, in other words, as seen by the DIRAC users. Different named SEs can point to the same StorageElement server and make use of different options to upload/retrieve data from different backend storages, for instance a different base path or a different SRM Space Token for different types of data. Different named SEs can also be used with identical definitions for the purpose of accounting classification. See below for the SE naming convention. The SE section contains the options describing the SE as a whole, for example:
.../Storage/disk
    /BackendType =
    /ReadAccess =
    /WriteAccess =
    /AccessProtocols
        ...
    ...
In general the SE name is a logical name and not a hostname. As for CEs, the Storage and SE sections can contain Communities and Domains options that apply to all the AccessProtocols. In turn they can inherit those options from their Site.
The SE section contains an AccessProtocols subsection in which each subsection is dedicated to one access point description. For example:
.../AccessProtocols/SRM
    /Host =
    /Port =
    /Protocol =
    /Path =
    ...
...
The options might be different for different protocols.
In DIRAC, SE names follow the convention [SiteName]-[SEQualifier]. Examples: CERN-disk, IN2P3-USER. To avoid typos and to enforce this convention, only the SEQualifier is used in the Storage section of the CS. Based on the parent site name, the full name is built by the Helper tools. The full SE name must be used everywhere when the SE is referenced. In order to allow for compatibility with LFC records created by lcg-utils, we must also allow SEs named [SE Fully Qualified Host Name] (host.name.of.the.srm.server). In this case, the SE will be accessible in read-only mode and its DIRAC name will be the same host name.
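A helper building the full SE name might look like the sketch below. It rests on two assumptions that the text does not spell out: that [SiteName] is the short Site name without its country code (as the examples CERN-disk and IN2P3-USER suggest), and that a Storage subsection name containing a dot is a fully qualified host name. The function name is hypothetical.

```python
def full_se_name(site_short_name, se_section):
    """Return (DIRAC SE name, read_only) for a Storage subsection.

    `site_short_name` is the parent Site section name, e.g. "CERN.ch".
    `se_section` is the name of the Storage subsection.
    """
    # Assumption: a dot in the section name marks a fully qualified host
    # name, kept as-is and accessible in read-only mode.
    if "." in se_section:
        return se_section, True
    # Assumption: [SiteName] is the Site name without the country code.
    site = site_short_name.rsplit(".", 1)[0]
    return "%s-%s" % (site, se_section), False


if __name__ == "__main__":
    print(full_se_name("CERN.ch", "disk"))
    print(full_se_name("IN2P3.fr", "USER"))
    print(full_se_name("CERN.ch", "host.name.of.the.srm.server"))
```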
This section contains description of the configured File Catalogs. This includes third party catalogs, e.g. LcgFileCatalog, but also DIRAC File Catalogs. The name of the section is a unique File Catalog name (and not its type). The options of the FileCatalog include, for example:
.../Catalog/DFC
    /CatalogType = DIRACFileCatalog
    /CatalogModes = Replica, Metadata
    /Url =
    ...
...
If the CatalogType option is not given, the section name is interpreted as the Catalog Type. The CatalogModes option lists Replica, Metadata or both. In this way the FileCatalog class can properly choose which catalog to use for different methods, and getReplicasWithMetadata() can be issued with AMGA/LFC, DFC/LFC or DFC installations.
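The CatalogType fallback and the selection by CatalogModes can be sketched as below. The function names and the dictionary layout are hypothetical, and the sketch assumes that a catalog without a CatalogModes option is not selected for any mode, which the text does not specify.

```python
def catalog_type(name, options):
    """Use the explicit CatalogType if given, otherwise the section name."""
    return options.get("CatalogType") or name


def catalogs_for_mode(catalogs, mode):
    """Return the names of the catalogs whose CatalogModes include `mode`."""
    # Assumption: a catalog without CatalogModes serves no mode.
    return [
        name
        for name, options in catalogs.items()
        if mode in options.get("CatalogModes", [])
    ]


# A DFC/LFC installation; the second section relies on the type fallback.
catalogs = {
    "DFC": {"CatalogType": "DIRACFileCatalog", "CatalogModes": ["Replica", "Metadata"]},
    "LcgFileCatalog": {"CatalogModes": ["Replica"]},
}

if __name__ == "__main__":
    print(catalog_type("LcgFileCatalog", catalogs["LcgFileCatalog"]))
    print(catalogs_for_mode(catalogs, "Metadata"))
    print(catalogs_for_mode(catalogs, "Replica"))
```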
We will have to find a way to discover the configuration for an LcgFileCatalogCombined. Probably at this level they are all Catalogs of Type LcgFileCatalog, and the Combined is defined in the Operations section of the VO. As for the SEs, the full name to refer to catalogs will be [Site]-[CatalogName], for instance CERN-LFC.
This section includes the description of third party transfer services like FTS. It contains a number of sections, one per available server (currently only FTS is available, but there are plans to provide a service with similar functionality in DIRAC, and other file transfer services also exist in the market). We follow the same convention as in the previous section: subsections are given a unique name:
.../FileTransfer/FTS
    /TransferType = FTS
    /Status =
    /URL =
    /Channels
        ...
    ...
If the TransferType option is not given, the section name is interpreted as the TransferType. Each Type of server will have its own set of Options, apart from the default ones to report the Status. For each server there could be a Channels section defining the transfer channels that are supported by the server. As for the SEs, the full name to refer to a transfer server will be [Site]-[FileTransferName].
This section describes Community Management services like the VOMS servers:
.../CommunityManagement/VOMS
    /Type = VOMS
    /Host =
    /Status =
    /Port =
    ...
For VOMS servers and multi-VO installations, there is a Section per VO that holds the specific Port for each VO.
This section describes instances of the database servers available in the installation, like the Oracle Conditions database. "ConditionsDB" is LHCb specific. In principle, there is nothing specific to "Conditions" in this subsection and it describes generic database access parameters, for example:
.../Database/Conditions
    /Type = ConditionsDB
    /Connection =
    /User =
    /Password =
    ...
...
We foresee a DIRAC Service exposing access to a MySQL server using the DISET protocol, with functionality similar to that of the current MySQL class. As for other cases, the full name to refer to a database service will be [Site]-[DBName].
The proposed CS schema can be used directly by the RSS in its internal Resources mapping. In most cases it corresponds to the three levels of the resources hierarchy, which can be loosely described by the schema Sites->Resources->Nodes. In this case Resources are CEs, SEs, etc.; Nodes are Queues, AccessProtocols, Channels, etc.
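To illustrate the Sites->Resources->Nodes mapping, the sketch below flattens a nested dictionary that mimics /Resources/Sites into (site, resource, node) triples. It does not reflect how the RSS is implemented. The function name is hypothetical, and the list of node containers is taken from the examples in this document.

```python
# Assumption: these are the subsections that hold Nodes in this document.
NODE_SECTIONS = ("Queues", "AccessProtocols", "Channels")


def rss_triples(sites):
    """Yield (site, resource, node) triples from a nested /Resources/Sites dict."""
    for site, site_content in sites.items():
        for resources in site_content.values():
            if not isinstance(resources, dict):
                continue  # a Site-level option such as ContactEmail
            for resource, content in resources.items():
                if not isinstance(content, dict):
                    continue  # an option, not a CE, SE, ... subsection
                for container in NODE_SECTIONS:
                    for node in content.get(container, {}):
                        yield site, resource, node


sites = {
    "CERN.ch": {
        "ContactEmail": "someone@somewhere",
        "Computing": {
            "some.cream.ce": {"CEType": "CREAM", "Queues": {"cream-sge-long": {}}},
        },
        "Storage": {
            "disk": {"AccessProtocols": {"SRM": {}}},
        },
    },
}

if __name__ == "__main__":
    for triple in rss_triples(sites):
        print(triple)
```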