>ENST00000591655_6 [141 - 263] cdna:known chromosome:GRCh37:HG311_PATCH:118380352:118404649:1 gene:ENSG00000266200 gene_biotype:polymorphic_pseudogene transcript_biotype:nonsense_mediated_decay
CCPLGPSAFSCWPQSEEKRSATDNLAAFLMKNHGQEPFSDL
>ENST00000567207_4 [81 - 371] cdna:known chromosome:GRCh37:16:90160431:90162566:1 gene:ENSG00000261812 gene_biotype:polymorphic_pseudogene transcript_biotype:processed_transcript
TCHRLRWHLPRGQPPAAGARQRAPPRGQRWQVRASRCARGSGAGHHGLRALGALRAGLQARQLHFPSVWGRKQLGQGTLHRRRGADGVSDGRCQKGG
>ENST00000390539_12 [1 - 1020] cdna:known chromosome:GRCh37:14:106053226:106054732:-1 gene:ENSG00000211890 gene_biotype:IG_C_gene transcript_biotype:IG_C_gene
ASPTSPKVFPLSLDSTPQDGNVVVACLVQGFFPQEPLSVTWSESGQNVTARNFPPSQDASGDLYTTSSQLTLPATQCPDGKSVTCHVKHYTNSSQDVTVPCRVPPPPPCCHPRLSLHRPALEDLLLGSEANLTCTLTGLRDASGATFTWTPSSGKSAVQGPPERDLCGCYSVSSVLPGCAQPWNHGETFTCTAAHPELKTPLTANITKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQGSQELPREKYLTWASRQEPSQGTTTYAVTSILRVAAEDWKKGETFSCMVGHEALPLAFTQKTIDRMAGKPTHINVSVVMAEADGTCY
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASWRIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVFYYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGEEQNKEALQDVEDENQ
Hi Pratik,
As far as I can see the problem is with the parsing of the database headers.
Where do all the non-UniProt headers come from? If these are not in a standard format we cannot parse them.
At the moment this database cannot be indexed by our code:
Galaxy21-[Merged_and_Filtered_FASTA_from_data_10,_data_11,_and_others].fasta
What we need is a generic Java regular expression that can be used to pick up these headers and parse them correctly. Do all of the non-UniProt headers have this general formatting?
dmic_c_1_469 Dialister micraerophilus DSM 19965 [161699 - 160872] aspartate-semialdehyde dehydrogenase Database
And can everything up to the first white space always be used as the accession number?
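To make the question concrete, here is a minimal Java sketch of the "everything up to the first white space is the accession" rule. The class name, method names and pattern are hypothetical illustrations and are not taken from SearchGUI, PeptideShaker or compomics-utilities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderAccessionSketch {
    // Hypothetical pattern: optional '>', then the accession (a run of
    // non-whitespace characters), then the rest of the line as description.
    private static final Pattern FIRST_TOKEN = Pattern.compile("^>?(\\S+)\\s*(.*)$");

    /** Returns the text before the first whitespace, or null for a blank header. */
    static String accession(String header) {
        Matcher m = FIRST_TOKEN.matcher(header.trim());
        return m.matches() ? m.group(1) : null;
    }

    /** Returns everything after the first whitespace, or null for a blank header. */
    static String description(String header) {
        Matcher m = FIRST_TOKEN.matcher(header.trim());
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String header = "dmic_c_1_469 Dialister micraerophilus DSM 19965 "
                + "[161699 - 160872] aspartate-semialdehyde dehydrogenase";
        System.out.println(accession(header));
        System.out.println(description(header));
    }
}
```

This only works if the first token is unique within the database, which is exactly what would need to be confirmed for the non-UniProt entries.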
While we're looking into this, perhaps you can redo the tests on a database without the non-UniProt headers?
I don't get the part about MS-GF+ resulting in empty outputs though (15-17). As far as I can see there is an mzid output and peptide and protein reports?
Best regards,
Harald
On 2014-09-03 21:41, Pratik Jagtap wrote:
Hello Ira, Bjoern, Harald and Marc,
Please see attached an Excel sheet that lists the steps and issues
we encountered with SearchGUI-PeptideShaker.
Please see a link to the history for this test:
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/p4-control-peptideshaker-test-september-2014
[1]
The good news: Basic search with OMSSA, X!Tandem and MS-GF+ searches
against UniProt database with PeptideShaker worked (items 18-20 in the
history).
Error 1: MS-GF+ only searches gave empty outputs (items 15-17 in the
history).
Error 2: Basic search with OMSSA, X!Tandem and MS-GF+ searches against
UniProt database (along with HOMD and 3-frame translated cDNA db) with
PeptideShaker did not work (items 22-24 in the history).
Please see and suggest what would need to be done to resolve this
issue. Please let me know if you need more information.
Regards,
Pratik
Pratik Jagtap,
Managing Director,
Center for Mass Spectrometry and Proteomics,
43 Gortner Laboratory
1479 Gortner Avenue
St. Paul, MN 55108
Phone: 612-624-9275 [2]
Links:
------
[1]
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/p4-control-peptideshaker-test-september-2014
[2] tel:612-624-9275
Hi again,
I've now added a regular expression to parse the cDNA FASTA headers.
New beta versions are available here:
https://www.dropbox.com/s/h4msbpxj9um3j8t/SearchGUI-1.20.8-beta-mac_and_linux.tar.gz?dl=0
https://www.dropbox.com/s/0nseai5d9nejqt1/PeptideShaker-0.33.6-beta.zip?dl=0
This should solve the database issues.
Best regards,
Harald
On 2014-09-04 03:38, Pratik Jagtap wrote:
I don't get the part about MS-GF+ resulting in empty outputs though (15-17). As far as I can see there is an mzid output and
peptide and protein reports?
The mzid, peptide and protein outputs are all reported as empty.
Please let me know if you need more information.
I downloaded the files for 15-17 via the link you sent in the first e-mail, and these were far from empty. So I'm not sure we're talking about the same files?
Also, there are no error messages for 15-17?
Harald
On 2014-09-04 03:52, Pratik Jagtap wrote:
Will this take care of the microbial database as well?
Yes, it should. As long as the formatting is the same. At least I was able to load both databases and add decoys in the new beta version of SearchGUI.
But if you come across a database that doesn't work, just make it available to us and we'll see what we can do.
Also, will this need to be "Galaxy-wrapped" by Bjoern, Ira or JJ?
Not sure what you mean. You have to do the same as usual when updating to new versions. There is nothing special about these versions in that regard.
Harald
On 2014-09-04 04:48, Pratik Jagtap wrote:
You have to do the same as usual when updating to new versions. There is nothing special about these versions in that regard.
Thanks ! Bjoern, Ira or JJ - can you please let us know when the new
beta version has been Galaxy-wrapped and placed in toolshed for Trevor
to upload onto galaxyp-dev?
I noticed in the mzid export that you're using PeptideShaker 0.31.4?
If this is correct, updating to the latest versions of both SearchGUI and PeptideShaker will fix a couple of bugs and also bring improvements in the protein inference.
But perhaps most importantly it adds support for MyriMatch and includes various zip export options in SearchGUI making it easier to link SearchGUI and PeptideShaker. For details see "Optional output compression parameters" at http://code.google.com/p/searchgui/wiki/SearchCLI. The zip file from SearchGUI can be given directly as input to PeptideShaker, as detailed in a previous e-mail from Marc.
Also note the new species_update parameter added to the PeptideShakerCLI (http://code.google.com/p/peptide-shaker/wiki/PeptideShakerCLI) to control the download and update of the gene and GO mappings.
And as always, if you have any questions about the changes or come across any issues, please let us know.
Best regards,
Harald
On 04.09.2014 at 14:24, Harald Barsnes wrote:
On 2014-09-04 13:44, Björn Grüning wrote:
@Harald: What is the FASTA header used for internally? I'm very
sceptical about adding a new regex to PeptideShaker every time a new FASTA
header is encountered. FASTA headers are not really standardised, or at
least the standards are not widely followed. I think we should reformat FASTA files
in Galaxy to fit your input standard, otherwise you will end up
supporting an endless list of file formats.
Great!
The FASTA headers are used as a way of referring to a specific protein
in the FASTA file via our index file. Thus we have to be able to parse
the headers to extract a unique accession number. This is indeed not
trivial. For our current FASTA header parsing see:
http://code.google.com/p/compomics-utilities/source/browse/trunk/src/main/java/com/compomics/util/protein/Header.java
Note that if none of our patterns kick in, we end up trying to use the
whole header as the accession number. So even if a header is not
recognized it should (in most cases) not break the parsing.
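As a rough illustration of the "try the known patterns, otherwise use the whole header" behaviour described here, a sketch in Java could look like the following. The two patterns and all names are assumptions made for this example; the actual list of supported formats is in the Header.java file linked above and is considerably longer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderFallbackSketch {
    // Illustrative patterns only, not the ones used in compomics-utilities.
    private static final Pattern UNIPROT =
            Pattern.compile("^>?(?:sp|tr)\\|([^|\\s]+)\\|.*");
    private static final Pattern GENERIC =
            Pattern.compile("^>?generic[^|\\s]*\\|([^|]+)\\|.*");

    /** Tries each known pattern in order, then falls back to the whole header. */
    static String accession(String header) {
        for (Pattern p : new Pattern[] {UNIPROT, GENERIC}) {
            Matcher m = p.matcher(header);
            if (m.matches()) {
                return m.group(1);
            }
        }
        // No pattern matched: use the complete header (minus a leading '>').
        return header.startsWith(">") ? header.substring(1) : header;
    }

    public static void main(String[] args) {
        System.out.println(accession(">sp|P62258|1433E_HUMAN 14-3-3 protein epsilon"));
        System.out.println(accession("dmic_c_1_469 Dialister micraerophilus DSM 19965"));
    }
}
```

With a fallback like this, an unrecognized header still yields a key, but the key then contains every character of the header, including any brackets or spaces.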
Reformatting the headers in Galaxy would make things simpler for
SearchGUI/PeptideShaker, but I think it would just move the problem, as
the issue of how to handle non-standard headers will be the same, just
at a different location in the pathway.
However, we do have a recommended format for such non-standard headers:
http://code.google.com/p/searchgui/wiki/DatabaseHelp#Non_Standard_FASTA
I'm not talking about converting every format, but Galaxy is flexible enough to easily create a pipeline that can convert header X to Non_Standard_FASTA. We just need to inform our users and give advice on how to do so if they have a FASTA file with non-standard headers.
But not sure how easy it would be to convert all possible FASTA file
headers to this format?
We could of course agree on a different common format. The challenge is
that it has to be unique and not clash with the other supported header
formats.
@Pratik: Can you try to reformat the header in Galaxy to fit the
uniprot norm? That would solve your problem.
Yes, makes sense. Actually, this is what I meant: convert your FASTA file to a SearchGUI standard FASTA file (whatever that is). I was just assuming that this is UniProt-based.
I would strongly advise against this, as the header would then be picked
up as a UniProt header and annotated as such. So while the header could
then be parsed, the code would also attempt to link it to UniProt which
of course wouldn't work. So I'd rather go for our own unique header
formatting rather than trying to mimic an existing format.
Thanks,
Bjoern
Harald
--
You received this message because you are subscribed to the Google Groups "Galaxy for Proteomics" group.
To post to this group, send email to gal...@umn.edu.
Visit this group at http://groups.google.com/a/umn.edu/group/galaxyp/.
To view this discussion on the web visit https://groups.google.com/a/umn.edu/d/msgid/galaxyp/CAFMfZ42axs7YhG9RKMC69S6bV9jOEndbgxnP9OYf%3Do4EGZATFg%40mail.gmail.com.
To unsubscribe from this group and stop receiving emails from it, send an email to galaxyp+u...@umn.edu.
<PeptideShaker Test PartTwo.xls>
Hello Everyone,
Here is a history that goes from RAW files to PeptideShaker outputs:
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/control-p4-raw-to-second-step-search
Here is the workflow that makes it possible:
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/w/copy-of-p4-mzml-to-second-step-search
The workflow uses a dataset collection input and has msconvert, MGF formatter, Protein database Downloader, Regex Find and Replace, FASTA Merge Files and PeptideShaker as tools.
I will keep you updated as we make more progress.
Regards,
Pratik
On Sat, Sep 6, 2014 at 8:37 PM, Pratik Jagtap <pja...@umn.edu> wrote:
Hello Everyone,
Here is a cleaner history and its workflow:
I will keep everyone updated on any issues / successes.
Regards,
Pratik
On Fri, Sep 5, 2014 at 10:20 AM, Pratik Jagtap <pja...@umn.edu> wrote:
My bad - I was looking at another history.
I will look at this closer and see if the outputs can be used for further processing.
Thanks again!
On Fri, Sep 5, 2014 at 10:17 AM, Jim Johnson <john...@umn.edu> wrote:
Pratik,
Are we still looking at: http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/p4-control-peptideshaker-test-september-2014-part-two
It appears to me that the job completed successfully. The 3 output datasets all seem to contain data.
Should there be any additional outputs?
The stdout/stderr does contain info about java VM settings and logging bindings.
But I didn't see anything that would indicate that the job failed.
Is there something I'm missing?
JJ
On 9/5/14, 9:57 AM, Pratik Jagtap wrote:
Hello Everyone,
Items 37, 38 and 39 failed with the following error: Fatal error: Java ExceptionPicked up _JAVA_OPTIONS: -Xmx6291456k
I will wait for JJ or Tom to answer about the "Picked up _JAVA_OPTIONS: -Xmx6291456k" issue and Bjoern's question about the wrapper.
Regards,
Pratik
On Fri, Sep 5, 2014 at 9:13 AM, Pratik Jagtap <pja...@umn.edu> wrote:
Hello Harald, Bjoern and JJ,
Thanks JJ - I am testing JJ's suggestion now (see items 37, 38 and 39) in http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/p4-control-peptideshaker-test-september-2014-part-two
@Harald - the "generic_HOMD|sp|" with "sp|" and "generic_HOMD|tr|" with "tr|" issue was only in item# 25 database. Item #31 which picked up the Java error did not have those entries.
I will ask JJ, Trevor or Tom to look at the Java issue.
@Bjoern -
@Pratik: can you check you have the following lines in your wrapper?
<stdio>
    <exit_code range="1:" level="fatal" description="Job Failed" />
    <regex match="Error" level="fatal" description="Error encountered!" />
</stdio>
This should filter junk from stderr and only fail if there are real errors, indicated by "error" or a real Unix error code.
I will request JJ, Trevor or Tom to look at this. Thanks.
Thanks and Regards,
Pratik
On Fri, Sep 5, 2014 at 8:32 AM, Jim Johnson <john...@umn.edu> wrote:
I'm not sure if my previous email, which suggested a different Ensembl regex, actually got sent.
Find Regex:
>(ENST\S*) \[(\d+) - (\d+)]\s*(.*)
Replacement:
>generic_EnSEMBL|\1_:\2:\3|\4
Here's the stack trace:
java.util.regex.PatternSyntaxException: Illegal character range near index 38
AA_coverage_ccs_ENST00000261769_44_[3-2837]_cus_ENST00000562836_44_[183-2717]_cus_H3BNC6_cus_H3BVI7_cus_P12830
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.range(Pattern.java:2594)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.split(String.java:2313)
at java.lang.String.split(String.java:2355)
at eu.isas.peptideshaker.utils.IdentificationFeaturesCache.getObjectKey(IdentificationFeaturesCache.java:644)
at eu.isas.peptideshaker.utils.IdentificationFeaturesCache.addObject(IdentificationFeaturesCache.java:242)
at eu.isas.peptideshaker.utils.IdentificationFeaturesGenerator.getAACoverage(IdentificationFeaturesGenerator.java:143)
at eu.isas.peptideshaker.utils.IdentificationFeaturesGenerator.estimateSequenceCoverage(IdentificationFeaturesGenerator.java:203)
at eu.isas.peptideshaker.utils.IdentificationFeaturesGenerator.getSequenceCoverage(IdentificationFeaturesGenerator.java:438)
at eu.isas.peptideshaker.export.sections.ProteinSection.getFeature(ProteinSection.java:345)
at eu.isas.peptideshaker.export.sections.ProteinSection.writeSection(ProteinSection.java:186)
at eu.isas.peptideshaker.export.PSExportFactory.writeExport(PSExportFactory.java:296)
at eu.isas.peptideshaker.cmd.CLIMethods.exportReport(CLIMethods.java:249)
at eu.isas.peptideshaker.cmd.ReportCLI.call(ReportCLI.java:135)
at eu.isas.peptideshaker.cmd.ReportCLI.main(ReportCLI.java:256)
On 9/5/14, 8:23 AM, Jim Johnson wrote:
The fasta IDs are used to construct a key for caching Features, and the key is used as a regex:
eu.isas.peptideshaker.utils.IdentificationFeaturesCache.getObjectKey(IdentificationFeaturesCache.java:644)
    /**
     * Convenience method returning the object key based on the cache key.
     *
     * @param cacheKey the cache key
     * @return the object key
     */
    private String getObjectKey(String cacheKey) {
        StringBuilder buf = new StringBuilder();
        String[] splittedKey = cacheKey.split(cacheKey);
        for (int i = 1; i < splittedKey.length; i++) {
            buf.append(splittedKey[i]);
        }
        return buf.toString();
    }
So the generated "generic" FASTA header lines should not contain characters that specify regex constructs: [ ] ( ) \
The range designation in the Ensembl headers needs to be something other than "[2653-3087]":
>generic_EnSEMBL|ENST00000460658_48_[2653-3087]|cdna:known chromosome:GRCh37:22:31484088:31497769:1 gene:ENSG00000183963 gene_biotype:protein_coding transcript_biotype:retained_intron
FLPESIKPFPHSIPCQVMAVPSPQLLLERPLLPVSFMFLTSHPPPRLVCPMHLCICAVWVLVALLRMHGASPAQTSGTRSGNGGCRRHGAGQGRGAATQPLRPPRGTASGQLMALLSALLPRLSGSSTPMMAHGRPAPPQWSRVS
Would using ":2653:3087" work (or does some other application rely on the "[2653-3087]" construct)?
generic_EnSEMBL|ENST00000460658_48_:2653:3087|cdna:known chromosome:GRCh37:22:31484088:31497769:1 gene:ENSG00000183963 gene_biotype:protein_coding
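The failure mode can be reproduced in isolation. The sketch below is not PeptideShaker code: the class and method names are made up, the key is shortened from the one in the stack trace, and treating "_ccs_" as the separator is an assumption based only on how that key looks. It shows that a string containing "[3-2837]" does not compile as a regex (the range "3-2" runs backwards), that the colon form does, and that Pattern.quote is one possible way to split on arbitrary text literally.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitKeySketch {
    /** True if the text cannot be compiled as a regex, as String.split would attempt. */
    static boolean failsAsRegex(String text) {
        try {
            Pattern.compile(text);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    /** Splits on the separator taken literally, whatever characters it contains. */
    static String[] splitLiteral(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String bracketKey = "AA_coverage_ccs_ENST00000261769_44_[3-2837]";
        String colonKey = "AA_coverage_ccs_ENST00000261769_44_:3:2837";
        System.out.println(failsAsRegex(bracketKey));
        System.out.println(failsAsRegex(colonKey));
        // Assumed separator "_ccs_"; index 1 holds the part after it.
        System.out.println(splitLiteral(bracketKey, "_ccs_")[1]);
    }
}
```

Note that a bracketed range such as "[2653-3087]" happens to be a valid character class, so it would not throw but would silently be interpreted as a regex, which is arguably worse.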
Find Regex:
>(ENST\S*) \[(\d+) - (\d+)]\s*(.*)
Replacement:
>generic_EnSEMBL|\1_:\2:\3|\4
Demonstrated in Step#5 of history:
http://galaxyp-dev.msi.umn.edu:8081/u/jjohnson/h/fasta-id-conversions
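For anyone who wants to try the same conversion outside Galaxy, here is a small Java sketch of the find/replace pair above. The class and method names are hypothetical; the only changes to the expression are Java string escaping, an explicit escape of the closing bracket, and "$1" style group references in place of "\1".

```java
import java.util.regex.Pattern;

class EnsemblHeaderSketch {
    // The find pattern from the message above, anchored to the whole line.
    private static final Pattern ENSEMBL =
            Pattern.compile("^>(ENST\\S*) \\[(\\d+) - (\\d+)\\]\\s*(.*)$");

    /** Rewrites an Ensembl cDNA header line; other lines are returned unchanged. */
    static String toGeneric(String line) {
        // Java replacement strings refer to groups as $1..$4.
        return ENSEMBL.matcher(line).replaceFirst(">generic_EnSEMBL|$1_:$2:$3|$4");
    }

    public static void main(String[] args) {
        String header = ">ENST00000591655_6 [141 - 263] cdna:known "
                + "chromosome:GRCh37:HG311_PATCH:118380352:118404649:1";
        System.out.println(toGeneric(header));
        // Sequence lines do not match the pattern and pass through as-is.
        System.out.println(toGeneric("CCPLGPSAFSCWPQSEEKRSATDNLAAFLMKNHGQEPFSDL"));
    }
}
```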
--
James E. Johnson Minnesota Supercomputing Institute University of Minnesota
Pratik, thanks for sharing! Soon there will be a new version of OpenMS with hopefully all tools!
Cheers,
Bjoern
Hello Everyone,
Here is a cleaner history and its workflow:
History:
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/h/p4-mzml-to-second-step-search-1
Workflow:
http://galaxyp-dev.msi.umn.edu:8081/u/pratik/w/copy-of-p4-raw-to-second-step-search
I will keep everyone updated on any issues / successes.
Regards,
Pratik