Hi Stephen,
The line of code you're referring to imports the Executable class, which is simply used to execute several Java programs sequentially with the help of an executions.xml file. There's no requirement to use it at all.
Yeah, you're right, the current documentation for that is confusing. Let me tell you why (I will update it once I finish writing this message).
At the beginning only one program was needed to import the UniProt module: ImportUniProtTitan.
However, as TrEMBL kept growing exponentially, that importing strategy started to throw exceptions: Titan could not cope with adding so much data at once while also handling a lot of index queries during the process. That's why, in order for it to work with the latest TrEMBL versions, we had to split the process in two:
- First, importing all the vertices that should be created as part of the UniProt module with ImportUniProtVerticesUsingFolderTitan
- Second, importing all the relationships among the nodes that have already been stored in the database with ImportUniProtEdgesUsingFolderTitan
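To illustrate the idea behind that two-step split (this is not the actual Bio4j code), here is a toy sketch in Java that uses plain in-memory collections in place of Titan. The class, the Entry type and the method names are all hypothetical; the only point is that the second pass looks up vertices that already exist instead of creating them while it adds relationships.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: in-memory stand-ins, not Titan and not Bio4j classes.
class TwoPassImportSketch {

    // Hypothetical minimal record: an accession plus the accessions it relates to.
    static final class Entry {
        final String accession;
        final List<String> relatedTo;

        Entry(String accession, List<String> relatedTo) {
            this.accession = accession;
            this.relatedTo = relatedTo;
        }
    }

    // Stand-in for the vertex index: accession -> vertex id.
    final Map<String, Integer> vertexIds = new HashMap<>();
    // Stand-in for stored relationships: {fromId, toId} pairs.
    final List<int[]> edges = new ArrayList<>();

    // Pass 1: only create vertices, no relationship lookups at all.
    void importVertices(Iterable<Entry> entries) {
        for (Entry e : entries) {
            vertexIds.putIfAbsent(e.accession, vertexIds.size());
        }
    }

    // Pass 2: only add edges between vertices that pass 1 already stored.
    void importEdges(Iterable<Entry> entries) {
        for (Entry e : entries) {
            Integer from = vertexIds.get(e.accession);
            if (from == null) {
                continue; // source vertex was never imported
            }
            for (String target : e.relatedTo) {
                Integer to = vertexIds.get(target);
                if (to != null) {
                    edges.add(new int[] {from, to});
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("A", Arrays.asList("B")),
                new Entry("B", Arrays.asList("A")));
        TwoPassImportSketch sketch = new TwoPassImportSketch();
        sketch.importVertices(entries);
        sketch.importEdges(entries);
        System.out.println(sketch.vertexIds.size() + " vertices, " + sketch.edges.size() + " edges");
    }
}
```

In this sketch the input is read twice, which is the price paid for keeping each pass simple: the first never queries relationships and the second never creates vertices.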
Even with this new solution we still had some issues with the latest version of TrEMBL, so we had to split the huge TrEMBL XML file into several smaller ones (I don't remember exactly how many, but somewhere in the hundreds or thousands). The program used for this is SplitUniProtXMLFile, which is why it appears first in the executionsBio4jTitan.xml file.
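For anyone curious what splitting a large XML file into smaller ones can look like, here is a minimal sketch in Java. It is not the SplitUniProtXMLFile source: the class name, the chunk_N.xml file names and the <entries> wrapper element are assumptions made for illustration, and it assumes every <entry ...> opening tag and </entry> closing tag sits on its own line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not the Bio4j SplitUniProtXMLFile program.
class XmlEntrySplitter {

    // Assumed wrapper element so that each chunk is a well-formed XML document.
    private static final String OPEN_ROOT = "<entries>";
    private static final String CLOSE_ROOT = "</entries>";

    // Streams the input line by line and writes groups of entriesPerFile
    // <entry> elements to chunk_0.xml, chunk_1.xml, ... inside outDir.
    // Returns the number of chunk files written.
    static int split(BufferedReader reader, int entriesPerFile, Path outDir) throws IOException {
        if (entriesPerFile < 1) {
            throw new IllegalArgumentException("entriesPerFile must be at least 1");
        }
        int fileCount = 0;
        int entriesInCurrent = 0;
        boolean insideEntry = false;
        BufferedWriter writer = null;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!insideEntry && (trimmed.startsWith("<entry ") || trimmed.startsWith("<entry>"))) {
                    insideEntry = true;
                    if (writer == null) {
                        // Open the next chunk lazily, so no empty files are created.
                        writer = Files.newBufferedWriter(
                                outDir.resolve("chunk_" + fileCount + ".xml"), StandardCharsets.UTF_8);
                        writer.write(OPEN_ROOT);
                        writer.newLine();
                        fileCount++;
                    }
                }
                if (insideEntry) {
                    writer.write(line);
                    writer.newLine();
                    if (trimmed.startsWith("</entry>")) {
                        insideEntry = false;
                        entriesInCurrent++;
                        if (entriesInCurrent == entriesPerFile) {
                            writer.write(CLOSE_ROOT);
                            writer.newLine();
                            writer.close();
                            writer = null;
                            entriesInCurrent = 0;
                        }
                    }
                }
            }
        } finally {
            if (writer != null) {
                // Close the last, possibly partially filled, chunk.
                writer.write(CLOSE_ROOT);
                writer.newLine();
                writer.close();
            }
        }
        return fileCount;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: XmlEntrySplitter <input.xml> <entriesPerFile> <outputDir>");
            return;
        }
        Path outDir = Files.createDirectories(Paths.get(args[2]));
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            int files = split(reader, Integer.parseInt(args[1]), outDir);
            System.out.println("Wrote " + files + " chunk files to " + outDir);
        }
    }
}
```

Reading line by line keeps memory use flat no matter how large the input is, which is the main reason to prefer streaming over loading the whole document for a file of this size.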
Regarding GO annotation information, you would still need to import two more modules:
I know this might not seem very intuitive, and that's why we always recommend using Bio4j releases that are already imported in AWS.
Cheers,
Pablo