Refactoring of the RM
The inheritance scheme is heavy and adds little: it consists only of wrapper methods, and the checks they perform are repeated further down the chain anyway. The idea is to merge CatalogToStorage and ReplicaManager into a single class which, instead of inheriting from all these interfaces, uses the StorageElement and the FileCatalog directly.
This attribute is rarely used and makes the class more complex: it should disappear from the ReplicaManager, the StorageElement and the FileCatalog. The default will be the Successful/Failed dictionary convention. We can provide a helper function that converts a Successful/Failed result into an S_OK/S_ERROR.
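The helper mentioned above could look roughly like the following. This is only a sketch: `S_OK` and `S_ERROR` are redefined here as local stand-ins so the snippet runs on its own, and the name `to_single_result` is hypothetical, not an existing function.

```python
def S_OK(value=None):
    """Local stand-in for DIRAC's S_OK return structure."""
    return {"OK": True, "Value": value}


def S_ERROR(message=""):
    """Local stand-in for DIRAC's S_ERROR return structure."""
    return {"OK": False, "Message": message}


def to_single_result(result):
    """Collapse a one-file Successful/Failed result into S_OK/S_ERROR.

    Hypothetical helper: assumes `result` follows the convention
    S_OK({'Successful': {lfn: value}, 'Failed': {lfn: error}}).
    """
    if not result["OK"]:
        # The call as a whole failed: nothing to convert
        return result
    value = result["Value"]
    if value["Failed"]:
        # Report the error message of the first failed entry
        return S_ERROR(next(iter(value["Failed"].values())))
    if value["Successful"]:
        # Return the value of the first successful entry
        return S_OK(next(iter(value["Successful"].values())))
    return S_ERROR("Neither Successful nor Failed entry found")
```

A caller that only deals with one file would wrap the call, for example `to_single_result(fc.getFileSize(lfn))`, and then test `res['OK']` as usual.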
As for SingleFile, it should disappear. Anyone who really wants to specify manually which catalog to use can create a new FileCatalog and give the catalog name in its constructor. This concerns only very few modules.
The PFN should not be used as an argument in the ReplicaManager, the FileCatalog or the StorageElement. Even though we claimed we were not using the PFN stored in the LFC, we are, and this has bad consequences, such as making it very difficult (if not impossible) to change the base path of a storage element transparently. The proposed solution is to always use the LFN. To refer to a particular replica, we should use the LFN and the SE name. The only place where we would need the PFN stored in the LFC is the removal of a replica, but this can be sorted out internally.
This should make the following methods obsolete:
- getCatalogLFNForPFN. Used by:
  - Dirac: StorageManagementSystem/Agent/SENamespaceCatalogCheckAgent.py and DataManagementSystem/Client/DataIntegrityClient.py
- getLfnForPfn. Used by:
  - Dirac: DataManagementSystem/Client/DataIntegrityClient.py
- getPfnForLfn. Used by:
  - Dirac: TransformationSystem/Agent/TransformationCleaningAgent.py
  - LHCbDirac: DataManagementSystem/scripts/dirac-dms-lfn-replicas.py
- getPfnForProtocol. Used by:
  - Dirac: StorageManagementSystem/Agent/SENamespaceCatalogCheckAgent.py, DataManagementSystem/Client/DataIntegrityClient.py, and DataManagementSystem/Client/FTSClient.py
The only places where the actual PFN should be used are inside the Catalog plugins and the Storage plugins, so it is never visible to the user.
Many methods just forward the ReplicaManager call to the StorageElement or the FileCatalog classes. We should make sure that these methods are now called directly on the StorageElement/FileCatalog, and no longer through the ReplicaManager. The idea is that the ReplicaManager should be used only when both the FileCatalog and the StorageElement are involved. A detailed list of the places where these methods are used is available on request.
Method | Replacement |
---|---|
addCatalogFile | Call FC.addFile |
addCatalogReplica | Call FC.addReplica |
getCatalogDirectoryMetadata | Call FC.getDirectoryMetadata |
getCatalogExists | Call FC.exists |
getCatalogFileMetadata | Call FC.getFileMetadata |
getCatalogFileSize | Call FC.getFileSize |
getCatalogLFNForPFN | Call FC.getLFNforPFN. But do we really need it if we get rid of the PFN? |
getCatalogListDirectory | Call FC.listDirectory with default verbose = False |
getCatalogReplicas | Call FC.getReplicas with default allStatus = False |
getCatalogReplicaStatus | Call FC.getReplicaStatus |
getLfnForPfn | This method should be removed |
getPfnForLfn | Call SE.getPfnForLfn. But do we really need it if we get rid of the PFN? |
getPfnForProtocol | Call SE.getPfnForProtocol. The default protocol requested in the RM is SRM2. |
getStorageFile | Call SE.getFile |
getStorageFileAccessUrl | Call SE.getAccessUrl |
getStorageFileExists | Call SE.exists |
getStorageFileMetadata | Call SE.getFileMetadata |
getStorageFileSize | Call SE.getFileSize |
getStorageListDirectory | Call SE.listDirectory |
pinStorageFile | Call SE.pinFile with default lifetime = 86400 |
prestageStorageFile | Call SE.prestageFile with default lifetime = 86400 |
putStorageDirectory | Call SE.putDirectory |
removeCatalogDirectory | Call FC.removeDirectory with default recursive = False |
removeCatalogFile | Call FC.removeFile, but sort the LFNs from the longest to the shortest |
removeCatalogReplica | Call FC.removeReplica |
removeStorageDirectory | Call SE.removeDirectory with default recursive = False |
removeStorageFile | Call SE.removeFile |
setCatalogReplicaStatus | Call FC.setReplicaStatus |
These replacements concern 34 files in Dirac, and 29 in LHCbDirac.
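One detail in the table that is easy to lose when calling FC.removeFile directly is the ordering applied by removeCatalogFile. A minimal sketch of that ordering, assuming "longest to shortest" refers to the length of the LFN string (the function name and the example paths are hypothetical):

```python
def sort_lfns_for_removal(lfns):
    """Return the LFNs ordered from the longest to the shortest string."""
    return sorted(lfns, key=len, reverse=True)


# Hypothetical example paths
lfns = ["/vo/data", "/vo/data/run1/file.dst", "/vo/data/run1"]
ordered = sort_lfns_for_removal(lfns)
# ordered is now:
# ['/vo/data/run1/file.dst', '/vo/data/run1', '/vo/data']
```

The sorted list would then be passed to FC.removeFile in place of the original one.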
The proposed plan is the following:
- Modification of the ReplicaManager, FileCatalog and StorageElement for v6r[asap]. There are only very few calls to gConfig (1 for the RM, 2 for the SE, 4 for the FC), so porting them to v7 should be easy. We would make the changes backward compatible: if a method is called via the ReplicaManager instead of directly on the StorageElement/FileCatalog, we could forward it to the proper class and issue a message. This hack would stay for one release only, to give people more time to adapt.
- Modification of the scripts, agents, services, etc. will be done progressively. The target version for finishing the changes would be v7r0. We would then drop the helper that ensures backward compatibility.
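The temporary forwarding described in the plan could be sketched as follows. Everything here is illustrative: the `FileCatalog` and `StorageElement` classes are minimal stand-ins, the sketch assumes a single StorageElement instance (the real RM deals with SEs by name), only two entries of the replacement table are shown, and Python's `warnings` module stands in for whatever message mechanism would really be used.

```python
import warnings


class FileCatalog:
    """Minimal stand-in for the real FileCatalog (illustration only)."""

    def exists(self, lfn):
        return {"OK": True, "Value": {"Successful": {lfn: True}, "Failed": {}}}


class StorageElement:
    """Minimal stand-in for the real StorageElement (illustration only)."""

    def exists(self, lfn):
        return {"OK": True, "Value": {"Successful": {lfn: True}, "Failed": {}}}


class ReplicaManager:
    # Old RM method name -> (attribute holding the target, new method name)
    _FORWARDED = {
        "getCatalogExists": ("fc", "exists"),
        "getStorageFileExists": ("se", "exists"),
    }

    def __init__(self, fc, se):
        self.fc = fc
        self.se = se

    def __getattr__(self, name):
        # Only reached when the normal attribute lookup fails,
        # i.e. for methods that no longer exist on the ReplicaManager
        try:
            target, method = self._FORWARDED[name]
        except KeyError:
            raise AttributeError(name)
        warnings.warn(
            "ReplicaManager.%s is deprecated, call %s.%s directly"
            % (name, target.upper(), method),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(getattr(self, target), method)
```

With this in place, `ReplicaManager(fc, se).getCatalogExists(lfn)` still works during the transition release but emits a message pointing to `FC.exists`. Dropping the `_FORWARDED` mapping and `__getattr__` afterwards removes the compatibility layer in one step.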
Several methods seem to be unused. We could remove them, unless they are kept for future use or are used somewhere I could not spot.
Method | Behavior |
---|---|
createCatalogDirectory | Call FC.createDirectory |
createCatalogLink | Call FC.createLink |
getCatalogDirectoryReplicas | Call FC.getDirectoryReplicas |
getCatalogDirectorySize | Call FC.getDirectorySize |
getCatalogIsDirectory | Call FC.isDirectory |
getCatalogIsFile | Call FC.isFile |
getCatalogIsLink | Call FC.isLink |
getCatalogReadLink | Call FC.readLink |
getPrestageStorageFileStatus | Call SE.prestageFileStatus |
getStorageDirectory | Call SE.getDirectory |
getStorageDirectoryIsDirectory | Call SE.isDirectory |
getStorageDirectoryMetadata | Call SE.getDirectoryMetadata |
getStorageDirectorySize | Call SE.getDirectorySize |
getStorageFileIsFile | Call SE.isFile |
putStorageFile | Call SE.putFile |
releaseStorageFile | Call SE.releaseFile |
removeCatalogLink | Call FC.removeLink |
replicateStorageFile | Call SE.replicateFile |
setCatalogReplicaHost | Call FC.setReplicaHost |
With the current chain (RM -> SE/FC -> plugins), an error happening in the plugins is often printed three times, which makes the logs heavy and harder to read and parse. The proposal is that all the messages issued by these classes should be at the debug level. They are just tools, so it is up to the script/agent/service using them to report the error.
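The logging policy above can be illustrated with a small sketch. It uses Python's standard `logging` module as a stand-in for the DIRAC logger, and all function names are hypothetical. The tool layers log at debug level and simply return the error; only the caller reports it.

```python
import logging

log = logging.getLogger("dms.sketch")


def plugin_get_file(lfn):
    """Hypothetical storage plugin call that always fails."""
    log.debug("plugin: cannot get %s", lfn)
    return {"OK": False, "Message": "File not found: %s" % lfn}


def storage_element_get_file(lfn):
    """Hypothetical StorageElement layer: forwards the result, logs at debug only."""
    result = plugin_get_file(lfn)
    if not result["OK"]:
        log.debug("SE: getFile failed: %s", result["Message"])
    return result


def agent_cycle(lfn):
    """Hypothetical caller (script/agent/service): the only place that reports the error."""
    result = storage_element_get_file(lfn)
    if not result["OK"]:
        log.error("Could not get %s: %s", lfn, result["Message"])
    return result
```

At the usual log level the failure then appears once, from the caller, while the per-layer details remain available when debug output is switched on.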