Hi all,
First of all, I am a fan of MS Amanda (which I use in standalone mode), since it is one of the few (perhaps the only) search engines that support customizable neutral loss specification for modifications, which is a great feature. As Chair of the PSI, I am trying to encourage the use of mzIdentML as much as possible, and it is also a submission format for PRIDE.
I noticed a few glitches in the MS Amanda mzIdentML output. The files are technically (almost) legal, but they could be improved quite a lot with a couple of minor fixes, as follows:
First, the spectrumID attribute is set wrongly:
<SpectrumIdentificationResult id="SIR_42931"
spectrumID="42931" spectraData_ref="SD_1">
<SpectrumIdentificationItem id="SII_0_206120" chargeState="2" experimentalMassToCharge="904.406463069461" peptide_ref="EQEDLPLYQHQATR_0000000000000100" rank="1" passThreshold="True">
<PeptideEvidenceRef peptideEvidence_ref="EQEDLPLYQHQATR_0000000000000100_sp|Q8NI35" />
<cvParam name="Amanda:AmandaScore" value="35.1037212392773" cvRef="PSI-MS" accession="MS:1002319" />
</SpectrumIdentificationItem>
It is an important part of the specification that for an MGF formatted input file, the output should look as follows:
<SpectrumIdentificationResult id="SIR_42931"
spectrumID="index=42931" spectraData_ref="SD_1">
<SpectrumIdentificationItem id="SII_0_206120" chargeState="2"
experimentalMassToCharge="904.406463069461"
peptide_ref="EQEDLPLYQHQATR_0000000000000100" rank="1"
passThreshold="True">
<PeptideEvidenceRef peptideEvidence_ref="EQEDLPLYQHQATR_0000000000000100_sp|Q8NI35" />
<cvParam name="Amanda:AmandaScore" value="35.1037212392773" cvRef="PSI-MS" accession="MS:1002319" />
</SpectrumIdentificationItem>
This assumes that the result comes from searching the spectrum at position 42931 in the file, where index=0 is the first spectrum searched. I realise this looks a bit odd, but we wrote it this way in the specification for historical reasons that seemed to make sense at the time, and reading software packages now rely on this value for connecting results to spectra.
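A minimal sketch of that rule in Python, assuming the writer knows each spectrum's zero-based position in the MGF file. The function names are hypothetical and are not part of MS Amanda or any mzIdentML library:

```python
def mgf_spectrum_id(zero_based_index: int) -> str:
    """Build the spectrumID string for a spectrum in an MGF file.

    The first spectrum in the file has index 0.
    """
    if zero_based_index < 0:
        raise ValueError("spectrum index must be non-negative")
    return f"index={zero_based_index}"


def parse_mgf_spectrum_id(spectrum_id: str) -> int:
    """Recover the zero-based position from a spectrumID of the form 'index=N'."""
    prefix = "index="
    if not spectrum_id.startswith(prefix):
        # A bare number such as "42931" is rejected, since readers
        # cannot tell which numbering scheme it refers to.
        raise ValueError(f"not an MGF-style spectrumID: {spectrum_id!r}")
    return int(spectrum_id[len(prefix):])
```

A reader would call parse_mgf_spectrum_id on the attribute value and use the result to look up the spectrum in the file referenced by spectraData_ref.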
Second, the SII element is missing the attribute calculatedMassToCharge. This is an optional attribute, but if you can set this value it is preferable, since downstream software can then easily calculate the mass error, produce plots, etc.
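As a rough illustration of why this helps, here is a Python sketch of how calculatedMassToCharge could be derived and how a reader might compute the mass error from it. It assumes the engine has the peptide's neutral monoisotopic mass available and that the charge carrier is a proton (the 1.00727649 value is the ChargeCarrierMass quoted in the protocol example in this post); the function names are hypothetical:

```python
# Assumed charge carrier mass (a proton), taken from the
# ChargeCarrierMass userParam in the protocol example in this post.
CHARGE_CARRIER_MASS = 1.00727649


def calculated_mz(neutral_mass: float, charge: int,
                  carrier_mass: float = CHARGE_CARRIER_MASS) -> float:
    """Theoretical m/z of a peptide with the given neutral mass and charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * carrier_mass) / charge


def ppm_error(experimental_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million, experimental relative to theoretical."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6
```

With both experimentalMassToCharge and calculatedMassToCharge in the file, downstream software only needs the second function; without it, the reader has to recompute the peptide mass from the sequence and modifications.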
Third, the SpectrumIdentificationProtocol would be greatly improved if you included some basic settings in there, such as the precursor and fragment tolerance, modifications, enzyme searched for, etc., as per MSGF (user params are obviously at your own discretion, but it would be straightforward to copy them across from your settings.xml):
<SpectrumIdentificationProtocol analysisSoftware_ref="ID_software" id="SearchProtocol_1">
<SearchType>
<cvParam cvRef="PSI-MS" accession="MS:1001083" name="ms-ms search"/>
</SearchType>
<AdditionalSearchParams>
<cvParam cvRef="PSI-MS" accession="MS:1001211" name="parent mass type mono"/>
<cvParam cvRef="PSI-MS" accession="MS:1001256" name="fragment mass type mono"/>
<userParam name="TargetDecoyApproach" value="false"/>
<userParam name="MinIsotopeError" value="0"/>
<userParam name="MaxIsotopeError" value="1"/>
<userParam name="FragmentMethod" value="HCD"/>
<userParam name="Instrument" value="QExactive"/>
<userParam name="Protocol" value="Phosphorylation"/>
<userParam name="NumTolerableTermini" value="2"/>
<userParam name="NumMatchesPerSpec" value="10"/>
<userParam name="MaxNumModifications" value="2"/>
<userParam name="MinPepLength" value="8"/>
<userParam name="MaxPepLength" value="30"/>
<userParam name="MinCharge" value="2"/>
<userParam name="MaxCharge" value="4"/>
<userParam name="ChargeCarrierMass" value="1.00727649"/>
</AdditionalSearchParams>
<ModificationParams>
<SearchModification fixedMod="true" massDelta="57.021465" residues="C">
<cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>
</SearchModification>
<SearchModification fixedMod="false" massDelta="79.96633" residues="S">
<cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
</SearchModification>
<SearchModification fixedMod="false" massDelta="79.96633" residues="T">
<cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
</SearchModification>
<SearchModification fixedMod="false" massDelta="79.96633" residues="Y">
<cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
</SearchModification>
</ModificationParams>
<Enzymes>
<Enzyme semiSpecific="false" missedCleavages="1000" id="Tryp">
<EnzymeName>
<cvParam cvRef="PSI-MS" accession="MS:1001251" name="Trypsin"/>
</EnzymeName>
</Enzyme>
</Enzymes>
<ParentTolerance>
<cvParam cvRef="PSI-MS" accession="MS:1001412" name="search tolerance plus value" value="5.0" unitAccession="UO:0000169" unitName="parts per million" unitCvRef="UO"/>
<cvParam cvRef="PSI-MS" accession="MS:1001413" name="search tolerance minus value" value="5.0" unitAccession="UO:0000169" unitName="parts per million" unitCvRef="UO"/>
</ParentTolerance>
<Threshold>
<cvParam cvRef="PSI-MS" accession="MS:1001494" name="no threshold"/>
</Threshold>
</SpectrumIdentificationProtocol>
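For testing, a small Python sketch like the following could report which of these settings blocks are absent from a SpectrumIdentificationProtocol element. The list of element names checked is my own choice for illustration (it includes FragmentTolerance, which the example above does not show), and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Child elements suggested in this post; the selection is an assumption
# for illustration, not a list of elements the schema requires.
RECOMMENDED_CHILDREN = (
    "ModificationParams",
    "Enzymes",
    "ParentTolerance",
    "FragmentTolerance",
)


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}Enzymes' down to 'Enzymes'."""
    return tag.rsplit("}", 1)[-1]


def missing_protocol_settings(protocol_xml: str) -> list:
    """Return the recommended child elements absent from a protocol element."""
    root = ET.fromstring(protocol_xml)
    present = {local_name(child.tag) for child in root}
    return [name for name in RECOMMENDED_CHILDREN if name not in present]
```

Passing in a protocol element that contains only SearchType and Threshold would return all four names, which is a quick way to confirm that a revised writer is filling them in.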
If you are able to make some changes to the mzIdentML output, that would be fantastic, and I would be happy to help, test files, etc.
Best wishes,
Andy