In order to gather your comments, I summarise here some private exchanges with Federico, Chris, Andrei, Johan and some open questions related to this RFC.
For reference the current dev version is under:
https://github.com/arrabito/DIRAC/tree/TSrelv6r13/TransformationSystem
and integration tests under:
https://github.com/arrabito/TestDIRAC/blob/testTestDIRACTS/Integration/TransformationSystem/TestClientTransformation.py
* About the list of methods to instrument with meta-data filters:
- addFile
- setMetadata
- addReplica
Although the need to instrument the addFile and setMetadata methods is evident, the need for the addReplica method is less so. Indeed, it makes sense only if input data queries include the location of the input data among the metadata. This goes down to the level of the computing model: do we consider that data at certain locations are "different" from the same data at other locations? In LHCb the answer is "no", and for the moment also in CTA. There could also be a performance issue: if the whole list of queries is scanned every time a replica is added or removed, this extra logic may hurt the performance of the system.
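To make the filter and its cost concrete, here is a minimal sketch in plain Python. It is not the code from the branch above: `matches`, `transformations_for_file`, the query layout and the metadata values are all hypothetical, and a query is assumed to be a simple dict of exact-match key/value pairs.

```python
# Hypothetical stand-ins: none of these names come from DIRAC.
def matches(query, metadata):
    """Return True if every key/value of the query is present in the file metadata."""
    return all(metadata.get(key) == value for key, value in query.items())


def transformations_for_file(queries, metadata):
    """Return the IDs of the transformations whose input data query the file satisfies.

    Every call scans the whole list of queries; this is the cost that would be
    paid on each addFile, setMetadata and (if instrumented) addReplica call.
    """
    return sorted(trans_id for trans_id, query in queries.items()
                  if matches(query, metadata))


# Input data queries keyed by transformation ID (illustrative values only).
queries = {
    1: {"particle": "gamma", "zenith": 20},
    2: {"particle": "proton"},
}

file_metadata = {"particle": "gamma", "zenith": 20, "site": "Paranal"}
print(transformations_for_file(queries, file_metadata))  # [1]
```

In this sketch the "site" key plays no role because no query mentions it, which is the situation described for LHCb and CTA: instrumenting addReplica would add a scan without ever changing the result.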
* Appending/removing files to a transformation
When a change of file meta-data makes the file match the query condition of a transformation, the file is attached to the transformation. What should happen when a change of file meta-data means that a query condition that was previously satisfied is no longer satisfied?
Should the file be removed from the transformation? If yes, we could end up with a quite confusing situation, since some of the files attached to a transformation would remain attached, while others would be removed. One could also imagine removing only the files that are unprocessed; however, this means that in the end the transformation would have processed only part of the files having the same meta-data.
To avoid this confusing situation, it may be better that a file always remains attached to a transformation until the transformation is cleaned or the file itself is removed. This means that we consider the files attached to a transformation as the files that matched the inputdata query at a given time.
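The "attach only, never detach" policy proposed above could be sketched as follows. This is a toy model under stated assumptions, not the TransformationSystem implementation: the `Transformation` class and its method names are hypothetical, and queries are again assumed to be exact-match dicts.

```python
class Transformation(object):
    """Toy model of a transformation with an input data query (hypothetical)."""

    def __init__(self, query):
        self.query = query
        self.files = set()

    def on_metadata_change(self, lfn, metadata):
        """Attach the file if it now matches; never detach it if it no longer does."""
        if all(metadata.get(key) == value for key, value in self.query.items()):
            self.files.add(lfn)
        # Deliberately no 'else' branch: a file that stops matching stays attached.

    def on_file_removed(self, lfn):
        """A file leaves the transformation only when the file itself is removed."""
        self.files.discard(lfn)


trans = Transformation({"particle": "gamma"})
trans.on_metadata_change("/cta/file1", {"particle": "gamma"})   # matches: attached
trans.on_metadata_change("/cta/file1", {"particle": "proton"})  # no longer matches
print(sorted(trans.files))  # ['/cta/file1']
```

The set of attached files is then a record of what matched at some point, not a live view of the query result.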
* Implementation of meta-data filters on client or server side?
Currently it's on the client side. However, after some discussions we concluded that it would be better to move it to the server side (not necessarily into the service code, but at least server side). Indeed, if a file is added to a transformation, we want to have it logged in a single place and to understand why, without having to check all the jobs' logs. Especially if a message queue is used, we would simply dump the info into it and let the listener apply the filter.
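As a rough illustration of the message-queue variant, the sketch below uses the standard-library `queue` module as a stand-in for a real message queue. The function name `apply_filters`, the event layout (an LFN plus its metadata dict) and the logger name are assumptions made for the example only.

```python
import logging
import queue

logger = logging.getLogger("ts_filter")  # hypothetical logger name


def apply_filters(events, queries):
    """Listener side: drain the queue, apply every query, log each attachment once.

    'events' holds (lfn, metadata) tuples dumped by the clients;
    'queries' maps a transformation ID to an exact-match dict.
    """
    attached = {}
    while True:
        try:
            lfn, metadata = events.get_nowait()
        except queue.Empty:
            break
        for trans_id, query in queries.items():
            if all(metadata.get(key) == value for key, value in query.items()):
                attached.setdefault(trans_id, []).append(lfn)
                # Single place where the "why" of an attachment is recorded.
                logger.info("attached %s to transformation %s (matched %s)",
                            lfn, trans_id, query)
    return attached


# Client side: only dump the information into the queue, no filtering here.
events = queue.Queue()
events.put(("/cta/file1", {"particle": "gamma"}))
events.put(("/cta/file2", {"particle": "proton"}))

attached = apply_filters(events, {1: {"particle": "gamma"}})
print(attached)  # {1: ['/cta/file1']}
```

With this split the clients carry no filter logic at all, so the reason a file was attached can be read from the listener's log alone.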
* Instantiation of FileCatalog within the TransformationClient
The TransformationClient is meant to be used as a FileCatalog plugin. However, in its current implementation we instantiate a FileCatalog there, so there is something conceptually wrong. Indeed, we end up having a FileCatalog instantiating a TSCatalog instantiating a TransformationClient instantiating a FileCatalog instantiating a TSCatalog, and luckily stopping there.
The reason for instantiating the FileCatalog there is that we need to call a few meta-data methods to implement the logic of the filter, essentially to update the metadict of a file.
How can we avoid this? Even moving the logic to the server side would, in the end, result in the same chain of instantiations.
* Multiple inheritance in the TransformationClient
Currently, and also in v6r13, we have:
class TransformationClient( Client, FileCatalogueBase ):
but we only call the init of the Client, not that of the FileCatalog. Is there any reason for this?
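For readers less familiar with the consequence, here is a self-contained toy reproduction of the pattern. The class bodies and the attribute names `timeout` and `readMethods` are invented for the example and do not reflect the real DIRAC classes; only the inheritance shape is taken from the line quoted above.

```python
class Client(object):
    def __init__(self):
        self.timeout = 120  # hypothetical attribute


class FileCatalogueBase(object):
    def __init__(self):
        self.readMethods = []  # hypothetical attribute


class TransformationClient(Client, FileCatalogueBase):
    def __init__(self):
        # Only the first base is initialised explicitly. Because Client.__init__
        # does not call super().__init__(), FileCatalogueBase.__init__ never runs.
        Client.__init__(self)


tc = TransformationClient()
print(hasattr(tc, "timeout"))      # True
print(hasattr(tc, "readMethods"))  # False
```

So whether skipping the second init is harmless depends on whether it sets any state that the inherited methods rely on; in this toy case any method using `readMethods` would raise an AttributeError.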
* Add the possibility to create a transformation without any inputdataquery and to add an inputdataquery afterwards
* Missing unit-tests
Thanks in advance for your comments,
Luisa