"base_name" constraint in pepXML
Hendrik Weisser  
From: Hendrik Weisser <weis...@imsb.biol.ethz.ch>
Date: Wed, 11 Nov 2009 14:53:54 -0800 (PST)
Local: Wed, Nov 11 2009 5:53 pm
Subject: "base_name" constraint in pepXML
Hi!

I'm working on the pepXML parser in OpenMS. I've been confronted with
a type of pepXML file I hadn't seen before, where search results from
different search engines - but for the same experiment - were
collected in one file (with one "msms_run_summary" per search engine).
I've added (maybe prematurely) support for this to the OpenMS parser,
and then wanted to construct a simple pepXML file for testing
purposes.

In doing so, I've now come across a constraint in the pepXML schema
(at least from v1.8 on) that says values of the "base_name" attribute
(supposed to contain the full path to the searched mzXML file) in the
"search_summary" element have to be unique within the document.
What is the rationale behind this constraint? Is it supposed to
prevent the above case, where different searches of the same
experiment end up in one file? Why would that be desirable/necessary?
(Also note that I can construct a valid and parseable pepXML file from
two different search runs of the same file if I change the path in
"base_name"...)
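A minimal sketch of what the constraint described above rejects, assuming only the element and attribute names mentioned in this post. The sample document, its paths, and the helper names are hypothetical; this is a quick duplicate check with the Python standard library, not a schema validator.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed pepXML-like document: the same mzXML file
# searched by two engines, one msms_run_summary per engine.
SAMPLE = """<msms_pipeline_analysis>
  <msms_run_summary base_name="/data/exp1/run1">
    <search_summary base_name="/data/exp1/run1" search_engine="X! Tandem"/>
  </msms_run_summary>
  <msms_run_summary base_name="/data/exp1/run1">
    <search_summary base_name="/data/exp1/run1" search_engine="MASCOT"/>
  </msms_run_summary>
</msms_pipeline_analysis>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def duplicate_search_base_names(xml_text):
    """Return base_name values shared by more than one search_summary."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get("base_name")
        for elem in root.iter()
        if local_name(elem.tag) == "search_summary"
        and elem.get("base_name") is not None
    )
    return sorted(name for name, n in counts.items() if n > 1)


duplicates = duplicate_search_base_names(SAMPLE)
print(duplicates)
```

Changing the path in one of the two "search_summary" elements makes the list come back empty, which is the workaround mentioned in the parenthesis above.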

In an earlier discussion
(http://groups.google.com/group/spctools-discuss/msg/7760dcda02877922?hl=en),
it was mentioned that
"base_name"s in "msms_run_summary" elements had to be unique in the
document - however, as per the schema, that's not true. Also, the
"base_name" of an "msms_run_summary" is not tied to the "base_name" in
subordinate "search_summary"s. If there were such a constraint, it
would be impossible to have more than one "search_summary" under an
"msms_run_summary" - however, this is allowed in the schema.
When does it make sense to have different "base_name"s in an
"msms_run_summary" and its subordinate "search_summary"(s)? Judging
from the schema documentation and the files I've seen, it seems that
the values should be the same. On the other hand, why have the
attribute in both elements then?
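To see how the two attributes relate in a given file, one could list each "msms_run_summary" "base_name" next to those of its child "search_summary" elements. This is only a sketch: the sample document and paths are made up, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed document: the second run's search_summary points
# at a different path than its parent msms_run_summary.
SAMPLE = """<msms_pipeline_analysis>
  <msms_run_summary base_name="/data/exp1/run1">
    <search_summary base_name="/data/exp1/run1"/>
  </msms_run_summary>
  <msms_run_summary base_name="/data/exp1/run2">
    <search_summary base_name="/results/tandem/run2"/>
  </msms_run_summary>
</msms_pipeline_analysis>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def base_name_pairs(xml_text):
    """Return (run base_name, search base_name) for every search_summary."""
    root = ET.fromstring(xml_text)
    pairs = []
    for run in root.iter():
        if local_name(run.tag) != "msms_run_summary":
            continue
        for child in run:
            if local_name(child.tag) == "search_summary":
                pairs.append((run.get("base_name"), child.get("base_name")))
    return pairs


pairs = base_name_pairs(SAMPLE)
mismatches = [pair for pair in pairs if pair[0] != pair[1]]
print(mismatches)
```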

All this adds to my confusion about the appropriate use of
"base_name"...

I would be happy if someone could clear things up for me.

Best regards

Hendrik


Eric Deutsch  
From: "Eric Deutsch" <edeut...@systemsbiology.org>
Date: Fri, 13 Nov 2009 00:31:17 -0800
Local: Fri, Nov 13 2009 3:31 am
Subject: RE: [spctools-discuss] "base_name" constraint in pepXML

Hi Hendrik, I think we need to get an authoritative answer from David on
this one. And he is currently traveling in the Land of the Finns. We will
let/ask him to answer when he is next able.

Regards,
Eric


David Shteynberg  
From: David Shteynberg <dshteynb...@systemsbiology.org>
Date: Fri, 20 Nov 2009 10:27:32 -0800
Local: Fri, Nov 20 2009 1:27 pm
Subject: Re: [spctools-discuss] Re: "base_name" constraint in pepXML
Hi Hendrik,

The element msms_pipeline_analysis/msms_run_summary has an attribute
base_name to specify the path to the datafile.  In case the searched
file specified is different from the original data file there is
another entry in the element
msms_pipeline_analysis/msms_run_summary/search_summary for base_name.
As far as I know, there is nothing in the schema that requires these
to be unique in the pepXML file. Can you point me to where this
constraint is specified in the schema?  I checked version 1.8.

-David



David Shteynberg  
From: David Shteynberg <dshteynb...@systemsbiology.org>
Date: Fri, 20 Nov 2009 10:30:10 -0800
Local: Fri, Nov 20 2009 1:30 pm
Subject: Re: [spctools-discuss] Re: "base_name" constraint in pepXML
OK I take that back.  I see where the unique constraint is listed.  I
will have to consider your questions further.

-David



David Shteynberg  
From: David Shteynberg <dshteynb...@systemsbiology.org>
Date: Fri, 20 Nov 2009 10:51:10 -0800
Local: Fri, Nov 20 2009 1:51 pm
Subject: Re: [spctools-discuss] Re: "base_name" constraint in pepXML
I will try to reason this through by thinking from the
original authors' point of view.  The idea is that different searches
of the same data would happen in separate directories and the
base_name (full path to the data file) would identify one search of an
mzXML file representing one msms_run, and more than one search would
never happen in one directory on the same file.  Also it is natural to
keep all results from one search engine run on an mzXML file in one
place in the pepXML file. However, you could have more than one search
that references the same data and these don't necessarily have to be
placed together in the pepXML file.  Although the problem with this is
that you could have different paths to the same data and these would
all be listed.  In the iProphet tool (which combines results from
multiple searches of the same data), I don't look at either base_name
but rather the spectrum names themselves, with the combination of
experiment_label, which is a user specified parameter that identifies
data from the same experiment.  The idea is that the combination of
experiment_label and spectrum name will uniquely identify a spectrum
searched.  I hope this is helpful.  Let us know if you have other
questions.
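The matching idea described above can be sketched as grouping search results on the pair (experiment_label, spectrum name). This is not iProphet's actual code; the record layout, labels, spectrum names, and peptides below are all hypothetical and only illustrate the keying scheme.

```python
from collections import defaultdict

# Hypothetical search hits, as they might look after parsing pepXML output
# from several engines. The last record has the same spectrum name but a
# different experiment_label, so it must not be merged with the others.
hits = [
    {"experiment_label": "exp1", "spectrum": "run1.00042.00042.2",
     "engine": "X! Tandem", "peptide": "PEPTIDEK"},
    {"experiment_label": "exp1", "spectrum": "run1.00042.00042.2",
     "engine": "MASCOT", "peptide": "PEPTIDEK"},
    {"experiment_label": "exp2", "spectrum": "run1.00042.00042.2",
     "engine": "SEQUEST", "peptide": "SAMPLER"},
]


def group_by_spectrum(records):
    """Collect engine names per (experiment_label, spectrum) key."""
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["experiment_label"], rec["spectrum"])
        grouped[key].append(rec["engine"])
    return dict(grouped)


grouped = group_by_spectrum(hits)
for key, engines in sorted(grouped.items()):
    print(key, engines)
```

Neither "base_name" attribute appears in the key, which is why differing paths to the same data do not matter for this kind of matching.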

-David



Greg Bowersock  
From: Greg Bowersock <bowers...@gmail.com>
Date: Fri, 20 Nov 2009 13:34:08 -0600
Local: Fri, Nov 20 2009 2:34 pm
Subject: Re: [spctools-discuss] Re: "base_name" constraint in pepXML

David, just a quick reply to part of your message. Normally, I make a
directory for an experiment and I will process the mascot, sequest,
and possibly X!Tandem data from each mzXML file in the same directory. I do
append the name of the search to the TPP files, so I can determine which
search engine the data came from. These data sometimes are combined (not
always with iProphet, since I just recently implemented it), so I would
violate your separate directory for each search engine rule. I've never seen
any documentation that either way is required, so I doubt I am the only one
processing data this way.

Greg



Hendrik Weisser  
From: Hendrik Weisser <weis...@imsb.biol.ethz.ch>
Date: Sat, 21 Nov 2009 11:27:22 -0800 (PST)
Local: Sat, Nov 21 2009 2:27 pm
Subject: Re: "base_name" constraint in pepXML
Dear David and Greg, thanks for your answers.

I still don't see the sense in having the "base_name" constraint,
though. The basic question for me is, what would break without the
constraint? Because as far as I can tell now, it just prevents a
sensible use case - searching the same mzXML file with different
tools, and aggregating the results in one pepXML file.

> David Shteynberg <dshteynb...@systemsbiology.org> wrote:
> > The idea is that different searches
> > of the same data would happen in separate directories and the
> > base_name (full path to the data file) would identify one search of an
> > mzXML file representing one msms_run, and more than one search would
> > never happen in one directory on the same file.

I understand that you might keep the results from different search
engines in different directories, but surely you wouldn't copy (or
link, if you're smart) an mzXML file to three different directories to
search it with Mascot, Sequest and X!Tandem, would you?

> > In the iProphet tool (which combines results from
> > multiple searches of the same data), I don't look at either base_name
> > but rather the spectrum names themselves, with the combination of
> > experiment_label, which is a user specified parameter that identifies
> > data from the same experiment.  The idea is that the combination of
> > experiment_label and spectrum name will uniquely identify a spectrum
> > searched.

This is exactly the problem that I have: Wouldn't it be far easier to
use the "base_name" attribute to make the connection to the mzXML
file, rather than a custom label? I think it would be - only you
couldn't do it with the current schema because of that strange
uniqueness constraint.
I'm working with iProphet results and I have to do quite some post-
processing to find out which results belong to which mzXML file, so
that I can annotate features derived from the mzXML with peptide
sequences. (To be fair, the main reason is that only for X!Tandem
"base_name" really contains the path to the original mzXML...)
From the perspective of a third-party developer, I would much
rather see a clean-up of the existing pepXML than the addition of new
elements.

Hendrik

