Some improvements to the MS Amanda mzIdentML output

Andy Jones

Nov 27, 2018, 7:05:37 AM
to MS Amanda
Hi all,

First of all, I am a fan of MS Amanda (which I use in standalone mode), since it is one of the few (perhaps the only) search engines that support customizable neutral loss specification for modifications, which is a great feature. As Chair of the PSI, I am trying to encourage the use of mzIdentML as much as possible, and it is also a submission format for PRIDE.

I noticed a few glitches in the MS Amanda mzIdentML output. The files are technically (almost) legal, but they could be improved quite a lot with a couple of minor fixes, as follows:

First, the spectrumID attribute is set wrongly:

<SpectrumIdentificationResult id="SIR_42931" spectrumID="42931" spectraData_ref="SD_1">
      <SpectrumIdentificationItem id="SII_0_206120" chargeState="2" experimentalMassToCharge="904.406463069461" peptide_ref="EQEDLPLYQHQATR_0000000000000100" rank="1" passThreshold="True">
        <PeptideEvidenceRef peptideEvidence_ref="EQEDLPLYQHQATR_0000000000000100_sp|Q8NI35" />
        <cvParam name="Amanda:AmandaScore" value="35.1037212392773" cvRef="PSI-MS" accession="MS:1002319" />
      </SpectrumIdentificationItem>


It is an important part of the specification that for an MGF formatted input file, the output should look as follows:

<SpectrumIdentificationResult id="SIR_42931" spectrumID="index=42931" spectraData_ref="SD_1">
      <SpectrumIdentificationItem id="SII_0_206120" chargeState="2" experimentalMassToCharge="904.406463069461" peptide_ref="EQEDLPLYQHQATR_0000000000000100" rank="1" passThreshold="True">
        <PeptideEvidenceRef peptideEvidence_ref="EQEDLPLYQHQATR_0000000000000100_sp|Q8NI35" />
        <cvParam name="Amanda:AmandaScore" value="35.1037212392773" cvRef="PSI-MS" accession="MS:1002319" />
      </SpectrumIdentificationItem>

This assumes that the result comes from searching the spectrum at position 42931 in the file, where index=0 is the first spectrum searched. I realise this looks a bit odd, but we wrote it this way in the specification for historical reasons that seemed to make sense at the time, and reading software packages now rely on this value for connecting results to spectra.
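
To make the convention concrete, here is a minimal Python sketch of the "index=N" form for MGF input. The function names are hypothetical and not part of MS Amanda or any PSI library; the only thing taken from the post above is that N is the zero-based position of the spectrum in the file.

```python
def mgf_spectrum_id(zero_based_index: int) -> str:
    """Return the spectrumID string for a spectrum at a zero-based MGF position."""
    if zero_based_index < 0:
        raise ValueError("spectrum index must be non-negative")
    return f"index={zero_based_index}"


def parse_mgf_spectrum_id(spectrum_id: str) -> int:
    """Recover the zero-based position from an 'index=N' spectrumID."""
    prefix = "index="
    if not spectrum_id.startswith(prefix):
        # A bare number such as "42931" is rejected here on purpose.
        raise ValueError(f"not an MGF-style spectrumID: {spectrum_id!r}")
    return int(spectrum_id[len(prefix):])
```

A writer would call mgf_spectrum_id() when emitting each SpectrumIdentificationResult, and a reader would call parse_mgf_spectrum_id() to find the matching spectrum in the MGF file.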

Second, the SpectrumIdentificationItem (SII) element is missing the attribute calculatedMassToCharge. This is an optional attribute, but if you can set this value it is preferable, since downstream software can then easily calculate the mass error, produce plots, etc.
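
As an illustration of why calculatedMassToCharge helps, a reader can derive the precursor mass error directly from the two attributes. The following is a rough Python sketch, not code from any existing tool: the function names are hypothetical, and the default charge carrier mass is simply the proton mass, the same value that appears as ChargeCarrierMass in the protocol example further down.

```python
def calculated_mz(neutral_mass: float, charge: int,
                  charge_carrier_mass: float = 1.00727649) -> float:
    """Theoretical m/z of a peptide with the given neutral mass and charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * charge_carrier_mass) / charge


def mass_error_ppm(experimental_mz: float, calculated_mz_value: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (experimental_mz - calculated_mz_value) / calculated_mz_value * 1e6


if __name__ == "__main__":
    # experimentalMassToCharge is taken from the SII above; the calculated
    # value is a made-up placeholder, since the real file does not provide it.
    print(mass_error_ppm(904.406463069461, 904.4050))
```

Without calculatedMassToCharge in the file, the reader would have to recompute the theoretical mass from the sequence and modifications before it could do this.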

Third, the SpectrumIdentificationProtocol would be greatly improved if you included some basic settings, such as the precursor and fragment tolerances, the modifications and the enzyme searched for, as per MSGF (user params are at your own discretion, but it would be straightforward to copy them across from your settings.xml):

<SpectrumIdentificationProtocol analysisSoftware_ref="ID_software" id="SearchProtocol_1">
    <SearchType>
      <cvParam cvRef="PSI-MS" accession="MS:1001083" name="ms-ms search"/>
    </SearchType>
    <AdditionalSearchParams>
      <cvParam cvRef="PSI-MS" accession="MS:1001211" name="parent mass type mono"/>
      <cvParam cvRef="PSI-MS" accession="MS:1001256" name="fragment mass type mono"/>
      <userParam name="TargetDecoyApproach" value="false"/>
      <userParam name="MinIsotopeError" value="0"/>
      <userParam name="MaxIsotopeError" value="1"/>
      <userParam name="FragmentMethod" value="HCD"/>
      <userParam name="Instrument" value="QExactive"/>
      <userParam name="Protocol" value="Phosphorylation"/>
      <userParam name="NumTolerableTermini" value="2"/>
      <userParam name="NumMatchesPerSpec" value="10"/>
      <userParam name="MaxNumModifications" value="2"/>
      <userParam name="MinPepLength" value="8"/>
      <userParam name="MaxPepLength" value="30"/>
      <userParam name="MinCharge" value="2"/>
      <userParam name="MaxCharge" value="4"/>
      <userParam name="ChargeCarrierMass" value="1.00727649"/>
    </AdditionalSearchParams>
    <ModificationParams>
      <SearchModification fixedMod="true" massDelta="57.021465" residues="C">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>
      </SearchModification>
      <SearchModification fixedMod="false" massDelta="79.96633" residues="S">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
      </SearchModification>
      <SearchModification fixedMod="false" massDelta="79.96633" residues="T">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
      </SearchModification>
      <SearchModification fixedMod="false" massDelta="79.96633" residues="Y">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
      </SearchModification>
    </ModificationParams>
    <Enzymes>
      <Enzyme semiSpecific="false" missedCleavages="1000" id="Tryp">
        <EnzymeName>
          <cvParam cvRef="PSI-MS" accession="MS:1001251" name="Trypsin"/>
        </EnzymeName>
      </Enzyme>
    </Enzymes>
    <ParentTolerance>
      <cvParam cvRef="PSI-MS" accession="MS:1001412" name="search tolerance plus value" value="5.0" unitAccession="UO:0000169" unitName="parts per million" unitCvRef="UO"/>
      <cvParam cvRef="PSI-MS" accession="MS:1001413" name="search tolerance minus value" value="5.0" unitAccession="UO:0000169" unitName="parts per million" unitCvRef="UO"/>
    </ParentTolerance>
    <Threshold>
      <cvParam cvRef="PSI-MS" accession="MS:1001494" name="no threshold"/>
    </Threshold>
  </SpectrumIdentificationProtocol>
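
For what it is worth, writing the tolerance part takes only a few lines. Below is a minimal Python sketch that builds a ParentTolerance element like the one above with the standard library's xml.etree.ElementTree. The function name and its parameters are hypothetical; the accessions and unit attributes are copied from the example, and the mzIdentML namespace that a real file would declare is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Unit attributes copied from the ParentTolerance example above.
PPM_UNIT = {
    "unitAccession": "UO:0000169",
    "unitName": "parts per million",
    "unitCvRef": "UO",
}


def parent_tolerance_element(plus_ppm: float, minus_ppm: float) -> ET.Element:
    """Build a ParentTolerance element with plus/minus tolerances in ppm."""
    parent = ET.Element("ParentTolerance")
    for accession, name, value in (
        ("MS:1001412", "search tolerance plus value", plus_ppm),
        ("MS:1001413", "search tolerance minus value", minus_ppm),
    ):
        attributes = {
            "cvRef": "PSI-MS",
            "accession": accession,
            "name": name,
            "value": str(value),
        }
        attributes.update(PPM_UNIT)
        ET.SubElement(parent, "cvParam", attributes)
    return parent


if __name__ == "__main__":
    print(ET.tostring(parent_tolerance_element(5.0, 5.0), encoding="unicode"))
```

The two values would come from whatever the search settings hold for the precursor tolerance; a Dalton tolerance would need different unit attributes, which this sketch does not cover.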

If you are able to make some changes to the mzIdentML output, that would be fantastic, and I would be happy to help, test files, etc.
best wishes
Andy

Dorfer Viktoria

Nov 27, 2018, 7:43:25 AM
to msam...@googlegroups.com

Hi Andy,

 

thanks a lot for looking so closely at our mzIdentML output; we greatly appreciate your comments! We will definitely work on improving the output format and would be happy if you could give us feedback again after our changes. I will let you know as soon as an update is available!

 

Thanks again!

Best regards,

Viktoria

Andy Jones

Nov 27, 2018, 7:57:47 AM
to MS Amanda
And one more thing - the recommended encoding for mzIdentML is gzip compression, with the extension .mzid.gz

It is a minor thing, but files usually compress by more than 10X, so this saves users hard drive space, and reading software is supposed to accept .mzid.gz by default.
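
In case it is useful, the compression step itself is small. Here is a minimal Python sketch using only the standard library; the function name is hypothetical, and it assumes the uncompressed .mzid file has already been written to disk.

```python
import gzip
import shutil
from pathlib import Path


def compress_mzid(mzid_path: str) -> Path:
    """Gzip an .mzid file next to the original and return the .mzid.gz path."""
    source = Path(mzid_path)
    target = source.with_name(source.name + ".gz")
    # Stream the copy so large result files are not loaded into memory.
    with source.open("rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    return target
```

The original file is left in place, so the caller can decide whether to delete it once the compressed copy has been written.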

Andy Jones

Nov 29, 2018, 4:33:22 AM
to MS Amanda
Hi Viktoria,
That's great, many thanks! Let me know when you have some new files, and we will run some tests on them.
Andy

Viktoria Dorfer

Aug 2, 2019, 8:22:53 AM
to MS Amanda
Hi Andy,

we finally have a new version of MS Amanda available that includes your requested changes to the mzIdentML output. We would be very happy if you could verify that our result files now conform to the expected format!

Thanks a lot!
Best regards,
Viktoria