re. SIM-XL mzIdentML output


colin...@googlemail.com

Jun 27, 2018, 10:54:57 AM
to SIM-XL Group
Hi,

I think I see an issue with the mzIdentML output from SIM-XL.

It seems the PeptideEvidence elements are missing their 'start' and 'end' attributes, which give the position of the peptide in the protein. For example, see the SIM-XL generated files in PRIDE project PXD006574 (https://www.ebi.ac.uk/pride/archive/projects/PXD006574).

The schema (v1.2.0, para 6.49) says that these attributes "must be provided unless this is a de novo search." So I think they should be included in SIM-XL's output?
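A minimal sketch of how one might check a file for this, using only the Python standard library. The function name and the sample fragment are hypothetical illustrations (the fragment is not taken from a real SIM-XL file), and tags are matched by local name so the exact namespace URI does not matter:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def peptide_evidence_missing_positions(xml_text):
    """Return the ids of PeptideEvidence elements lacking 'start' or 'end'."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        if local_name(elem.tag) != "PeptideEvidence":
            continue
        if "start" not in elem.attrib or "end" not in elem.attrib:
            missing.append(elem.get("id"))
    return missing


# Hypothetical, heavily simplified fragment for illustration only.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <SequenceCollection>
    <PeptideEvidence id="PE_1" peptide_ref="P1" dBSequence_ref="DB1" start="12" end="20"/>
    <PeptideEvidence id="PE_2" peptide_ref="P2" dBSequence_ref="DB1"/>
  </SequenceCollection>
</MzIdentML>"""

print(peptide_evidence_missing_positions(SAMPLE))  # ['PE_2']
```

For a real .mzid file, which can be large, reading it with `ET.iterparse` instead of loading the whole text would probably be a better fit.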

What do you think?

best wishes,
Colin

Paulo C Carvalho

Jun 27, 2018, 11:19:55 AM
to si...@googlegroups.com
Dear Colin, 

I'm not familiar with the details of the schema; Diogo might be better able to address this issue. But does the schema allow two identifications within the same spectrum? If so, I'm sure we can update the software.

Diogo, could you please check on this as well. 

Thanks
Paulo

--
You received this message because you are subscribed to the Google Groups "SIM-XL Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to simxl+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/simxl.
To view this discussion on the web visit https://groups.google.com/d/msgid/simxl/2f807f48-4e66-4b3f-bac6-69b35937a7a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Paulo Costa Carvalho
http://pcarvalho.com

Paulo C Carvalho

Jun 27, 2018, 11:23:08 AM
to si...@googlegroups.com
BTW, the files generated for the PXD below were the first ever XL files done for mzIdentML. We worked together with the folks from PRIDE on this. As that specific PXD was the first, it might not reflect the latest revisions of the format. I know this was updated recently within SIM-XL.


colin c

Jun 28, 2018, 8:31:59 AM
to si...@googlegroups.com
Hi,
Thanks for the reply.

I think the schema allows multiple identifications of the same spectrum (a SpectrumIdentificationResult can contain an unbounded number of SpectrumIdentificationItems).
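As a rough illustration of that structure, the sketch below counts the SpectrumIdentificationItem children of each SpectrumIdentificationResult. The function name and the sample are hypothetical, and the sample omits the other attributes and child elements that a real file would carry:

```python
import xml.etree.ElementTree as ET


def count_items_per_result(xml_text):
    """Map each SpectrumIdentificationResult id to its number of
    direct SpectrumIdentificationItem children."""
    root = ET.fromstring(xml_text)
    counts = {}
    for elem in root.iter():
        # Compare local names so the namespace URI is irrelevant.
        if elem.tag.rsplit("}", 1)[-1] != "SpectrumIdentificationResult":
            continue
        counts[elem.get("id")] = sum(
            1
            for child in elem
            if child.tag.rsplit("}", 1)[-1] == "SpectrumIdentificationItem"
        )
    return counts


# Hypothetical, heavily simplified fragment for illustration only.
SAMPLE_RESULTS = """<SpectrumIdentificationList id="SIL_1">
  <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0">
    <SpectrumIdentificationItem id="SII_1_1" rank="1"/>
    <SpectrumIdentificationItem id="SII_1_2" rank="1"/>
  </SpectrumIdentificationResult>
  <SpectrumIdentificationResult id="SIR_2" spectrumID="index=1">
    <SpectrumIdentificationItem id="SII_2_1" rank="1"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationList>"""

print(count_items_per_result(SAMPLE_RESULTS))  # {'SIR_1': 2, 'SIR_2': 1}
```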

Yes, I thought I should check output from the most recent version of your software. Are there examples online somewhere? It might be good to add an example to https://github.com/HUPO-PSI/mzIdentML/tree/master/examples/1_2examples/crosslinking

I think PXD006574 is your second contribution to PRIDE, the first was PXD001677? ;)

best wishes,
colin


Diogo Borges

Jun 28, 2018, 8:39:56 AM
to SIM-XL
Dear Colin,

Firstly, welcome to our discussion forum, and thanks for reporting these missing values in the SIM-XL output file to us.

Actually, when the XL schema was developed (at the beginning of last year), both attributes ('start' and 'end') were optional even for cross-linking results; however, after the final version was released they became required, except for de novo searches.

We validate our schema and results many times before releasing each new SIM-XL version, and every time (even after you sent us this email) they pass the official mzIdentML Validator.

However, we agree with you, and we have updated the SIM-XL output to include these attributes ('start' and 'end' on the PeptideEvidence element).
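For readers unfamiliar with these attributes, here is a small sketch of how such positions can be derived from a protein sequence. It is not SIM-XL's implementation; the function name and the sequences are made up for illustration, and it assumes the convention, as I understand the schema description, that positions are 1-based and 'end' is the index of the peptide's last residue:

```python
def peptide_positions(protein_seq, peptide_seq):
    """Return (start, end) pairs, 1-based and inclusive, for every
    occurrence of peptide_seq in protein_seq (overlaps included)."""
    if not peptide_seq:
        return []
    positions = []
    idx = protein_seq.find(peptide_seq)
    while idx != -1:
        # Python indices are 0-based, so shift the start by one;
        # the inclusive end is then idx + len(peptide_seq).
        positions.append((idx + 1, idx + len(peptide_seq)))
        idx = protein_seq.find(peptide_seq, idx + 1)
    return positions


# Made-up sequences for illustration only.
print(peptide_positions("MKTAYIAKQRQISFVKSHFSRQ", "QISFVK"))  # [(11, 16)]
```

A peptide that occurs more than once in a protein yields several pairs, and each would presumably correspond to its own PeptideEvidence element.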

We've released a new version of SIM-XL (v. 1.5.1.3) with this update.

Below you can see the validation report from the mzIdentML Validator for the output of this new SIM-XL version.
PS: None of the warning messages apply to the XL case; they are relevant only for classic shotgun results.

Thanks once again for your email.

######
The following messages were obtained during the validation of your XML file:


Message 1:
    Rule ID: CvListObjectRule
    Level: INFO
    Context(/MzIdentML/cvList )
    --> The cv element for PSI-MS uses an old version.
    Tip: Provide the newest version for all cv element under the CvList element./MzIdentML/cvList


Message 2:
    Rule ID: SpectrumIdentificationList_may_rule
    Level: INFO
    Context(/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' because no values were found:
  - Any children term of MS:1001184 (search statistics). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 3:
    Rule ID: DBSequence_ProteinDescription_may_rule
    Level: INFO
    Context(/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/SequenceCollection/DBSequence/cvParam/@accession' because no values were found:
  - The sole term MS:1001088 (protein description) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.


Message 4:
    Rule ID: SearchDatabase_may_rule
    Level: INFO
    Context(/searchDatabase/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/cvParam/@accession' because no values were found:
  - Any children term of MS:1001011 (search database details). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1000561 (data file checksum type). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 5:
    Rule ID: DBSequence_may_rule
    Level: INFO
    Context(/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/SequenceCollection/DBSequence/cvParam/@accession' because no values were found:
  - Any children term of MS:1001089 (molecule taxonomy). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001342 (database sequence details). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1002636 (proteogenomics attribute). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 6:
    Rule ID: SourceFile_may_rule
    Level: INFO
    Context(/sourceFile/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/sourceFile/cvParam/@accession' because no values were found:
  - Any children term of MS:1000561 (data file checksum type). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 7:
    Rule ID: SearchDatabaseDatabaseName_may_rule
    Level: INFO
    Context(/searchDatabase/databaseName/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/databaseName/cvParam/@accession' because no values were found:
  - Any children term of MS:1001013 (database name). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 8:
    Rule ID: SpectrumIdentificationResult_may_rule
    Level: INFO
    Context(/spectrumIdentificationResult/cvParam/@accession ) in 248 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/spectrumIdentificationResult/cvParam/@accession' because no values were found:
  - The sole term MS:1000894 (retention time) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001405 (spectrum identification result details). The term can be repeated. The matching value has to be the identifier of the term, not its name.


======== RULE STATISTICS ========
Invalid XML schema validation: 0

CvMappingRule total count: 50
CvMappingRules not run: 5
CvMappingRules run & not matching: 7
CvMappingRules invalid XPath: 0
CvMappingRules run & valid: 38

ObjectRules total count: 15
ObjectRules not run: 2
ObjectRules run & not matching: 1
ObjectRules run & valid: 12

Unanticipated CV terms: 0
XL interaction scoring messages: 0
Not matching messages received: 7
#####

_____________________________________

Diogo Borges Lima
Postdoctoral fellow in Computational Biology
Mass Spectrometry for Biology Unit
Institut Pasteur - CNRS USR 2000
Computational Mass Spectrometry Group - Fiocruz / Brazil

About me: @diogobor | diogobor
URL: diogobor



Diogo Borges

Jun 28, 2018, 9:02:20 AM
to SIM-XL
Hi Colin,

We recently submitted an example to the GitHub repository, but so far we have not received an answer. However, I think in a few days you will be able to see this example at the same link you sent us (https://github.com/HUPO-PSI/mzIdentML/tree/master/examples/1_2examples/crosslinking).

PXD001677 was the first XL mzIdentML submitted to PRIDE, but its schema was v. 1.1. Afterwards, we submitted the PXD006574 project, which was the first XL mzIdentML submitted to PRIDE with schema version 1.2. As Paulo said previously, we worked together with the PRIDE team to develop this new schema.

Best regards,
_____________________________________

Diogo Borges Lima
Postdoctoral fellow in Computational Biology
Mass Spectrometry for Biology Unit
Institut Pasteur - CNRS USR 2000
Computational Mass Spectrometry Group - Fiocruz / Brazil

About me: @diogobor | diogobor
URL: diogobor


colin c

Jun 28, 2018, 9:32:54 AM
to si...@googlegroups.com
Hi Diogo,

thanks for making this change.

Colin


Diogo Borges

Jun 28, 2018, 11:03:41 AM
to SIM-XL
Hi Colin,

Just so you know, you can see an example of the *.mzid file generated by SIM-XL at this link:

Best regards,
Diogo

_____________________________________

Diogo Borges Lima
Postdoctoral fellow in Computational Biology
Mass Spectrometry for Biology Unit
Institut Pasteur - CNRS USR 2000
Computational Mass Spectrometry Group - Fiocruz / Brazil

About me: @diogobor | diogobor
URL: diogobor
