Alternative converters : BioProject or SRA => ISA-TAB ?


John Matese

Aug 29, 2016, 3:00:35 PM
to ISAforum
Hi All,

I am aware of ISA-converter (to Pride-ML, MAGE-Tab, or SRA)
and also MAGE-TAB => ISA-TAB


Has anyone written an NCBI BioProject => ISA-TAB converter
or NCBI BioProject+SRA => ISA-Tab converter ?

Just wondering what currently exists...

Thanks for the info,
John Matese


John Matese

Aug 29, 2016, 3:30:52 PM
to ISAforum
Found the following sra2isatab script; has anyone had success with it?

John Matese

Aug 29, 2016, 3:39:21 PM
to ISAforum
And on http://isatools.readthedocs.io/en/latest/sraconversion.html
There is a brief note :

Importing SRA to ISA tab

from isatools.convert import sra2isatab
sra2isatab.sra_to_isatab_batch_convert(...)

John Matese

Aug 29, 2016, 5:04:59 PM
to ISAforum
For anyone else interested, I did eventually get this working (in my macOS environment), after:
1) Downloading SaxonHE9-7-0-7J.zip and unzipping it as ~/Applications/SaxonHE/, as that is the coded DEFAULT_SAXON_EXECUTABLE
2) Running a pip install; for whatever reason, it did not install the following 2 required files, so I had to grab them independently and put them in the coded location

3) Then, this test script seemed to work, so thanks!

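import io

from isatools.convert import sra2isatab

# Convert the SRA study; the result is read like a file below.
isaio = sra2isatab.sra_to_isatab_batch_convert("your_SRP_accession_here")
# Save the returned bytes as a zip archive.
with io.open('test.zip', 'wb') as file:
    file.write(isaio.read())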

Alejandra Gonzalez-Beltran

Aug 31, 2016, 6:18:30 AM
to isaf...@googlegroups.com
Hi John,

I am glad that you managed to run the test script.

We will investigate why the pip install did not download all the required files.

Thanks for your feedback,

Alejandra (on behalf of the ISAtools team)

--
--
--
 
You received this message because you are subscribed to the Google
Groups "ISAforum" group.
To post to this group, send email to isaf...@googlegroups.com
To unsubscribe from this group, send email to
isaforum+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/isaforum?hl=en-GB
 
Visit the ISA tools website at http://isa-tools.org and http://isacommons.org
---
You received this message because you are subscribed to the Google Groups "ISAforum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isaforum+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Matese

Sep 8, 2016, 1:12:17 PM
to ISAforum
Just following up: although it did write out some ISA-TAB files, the results for my specific SRA accession did not validate/load into ISAcreator, so your mileage may vary... [i.e. additional work may be required]

Alejandra Gonzalez-Beltran

Sep 8, 2016, 1:21:13 PM
to jcma...@gmail.com, isaf...@googlegroups.com
Hi John

Thanks for the follow up.

It would be great if you could provide more details, such as the SRA accession you are using and what validation errors you are seeing.

BTW, we have just released the ISA API v0.3 (available via PyPI in the isatools package), which contains newly rewritten validation methods (for ISA-Tab and also the ISA-JSON representation) and it would be good to try with those.

Many thanks,

Alejandra


John Matese

Sep 8, 2016, 1:32:25 PM
to Alejandra Gonzalez-Beltran, isaf...@googlegroups.com
The SRA accession was SRP068846. I don’t know if it successfully accounted for all the SRA-coded nodes [the s_* file had 1301 lines, but the a_* file had only 28 lines, which seemed a little odd (2 orders of magnitude difference), but I suppose it could be possible].
SRP068846.zip
Screen Shot 2016-09-08 at 1.25.05 PM.png
Screen Shot 2016-09-08 at 1.25.27 PM.png
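
The line-count comparison described above can be repeated on any converted archive. The following is only a minimal sketch, assuming the zip contains tab-delimited s_* and a_* files; the helper name count_lines_by_member and the demo file names are made up for illustration and are not part of isatools.

```python
import io
import zipfile


def count_lines_by_member(zip_bytes):
    """Map each s_*/a_* file in a zip archive (given as bytes) to its line count."""
    counts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]  # ignore any folder prefix
            if base.startswith(("s_", "a_")):
                text = archive.read(name).decode("utf-8")
                counts[name] = len(text.splitlines())
    return counts


# Stand-in archive built in memory; file names and contents are placeholders,
# not real converter output.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("s_demo.txt", "Sample Name\nS1\nS2\nS3\n")
    archive.writestr("a_demo.txt", "Sample Name\nS1\n")
    archive.writestr("i_demo.txt", "ignored\n")

print(count_lines_by_member(demo.getvalue()))
```

For a file on disk, the same function could be fed the bytes read from the saved zip. A large gap between the two counts is only a hint that something may be missing, not proof of it.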

David Johnson

Sep 9, 2016, 3:35:30 AM
to ISAforum, alejandra.gon...@gmail.com
Hi John,

I think this is a bug. I've had a look at the SRA record and the assay file, and the assay file is missing a whole bunch of assays that correspond to fastq files in the original SRA record. I'm not familiar with what the correct output from the SRA importer should be, so I will have to liaise with my colleague who developed the script and get back to you on that.

Re: the missing XML files from the pip install - thanks for reporting this, it's just a packaging error and we can fix that right away.

Best/David

David Johnson

Sep 9, 2016, 9:20:20 AM
to ISAforum, alejandra.gon...@gmail.com
Hi John,

I believe we have traced this down to an existing bug https://github.com/ISA-tools/xslt2isa/issues/3 that we hadn't got round to fixing yet.

I'll take a look at it now to see if we can push a fix asap.

Best/David 

David Johnson

Sep 12, 2016, 12:34:41 PM
to ISAforum, alejandra.gon...@gmail.com
Hi John,

I've pushed a fix to the master branch of isa-api, which I have tested against the accession you were trying out, as well as with a larger study (the 3000 Rice Genomes study, ERP005654; since this one is very large, it takes a while to retrieve and convert...), and it seems to work as expected now.

Having had a look at the record on ENA for SRP068846 there should be 1299 samples and 344 experiments, which maps to the number of lines that should now appear if you try the sra2isatab import script.

Let us know how you get on with it!

Best/David

John Matese

Sep 13, 2016, 9:09:52 AM
to ISAforum, alejandra.gon...@gmail.com
Thanks!  I will have a look at it, once I get through with these other datasets in my queue.
Cheers,
John

John Matese

Sep 22, 2016, 10:16:37 AM
to ISAforum, alejandra.gon...@gmail.com
Yep, things look better after pulling your fix (345 lines in the a_* file). I have a question about lines 337-345 of the resulting a_SRP068846.txt file.
In the Raw Data File column, there are 4 ftp URIs (semi-colon delimited), presumably matching the "Parameter Value[read information {index;type;class;base coord}]" type:classes.  I don't think I was aware that multiple raw files could be coded in ISATAB that way; learn something new every day?
-john
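
For anyone post-processing such rows, a minimal sketch of splitting a semicolon-delimited Raw Data File cell into individual URIs follows. The table content, the example.org URIs and the helper name raw_files_per_row are invented for illustration; only the column name and the semicolon delimiter come from the message above.

```python
import csv
import io

# Invented two-row assay table; real a_* files carry many more columns.
ASSAY_TEXT = (
    "Sample Name\tRaw Data File\n"
    "SAMPLE_A\tftp://example.org/run1_1.fastq.gz;ftp://example.org/run1_2.fastq.gz\n"
    "SAMPLE_B\tftp://example.org/run2.fastq.gz\n"
)


def raw_files_per_row(assay_text, column="Raw Data File", sep=";"):
    """Return one list of file URIs per assay row, splitting the cell on `sep`."""
    reader = csv.DictReader(io.StringIO(assay_text), delimiter="\t")
    return [
        [uri.strip() for uri in row[column].split(sep) if uri.strip()]
        for row in reader
    ]


for uris in raw_files_per_row(ASSAY_TEXT):
    print(len(uris), uris)
```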

John Matese

Oct 12, 2016, 12:36:33 PM
to ISAforum, alejandra.gon...@gmail.com, djco...@gmail.com
Hi Alejandra and David,

I just wanted to report back with another possible issue with converting that SRA accession [ SRP068846 ], this time with potentially missing samples [as opposed to the missing assays I reported before].  After trying to load and curate the conversion result with ISACreator, it eventually reported back these errors in a validation attempt.  
SRS1260580 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260573 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260574 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260575 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260576 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260577 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260578 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
SRS1260579 is a Sample Name in a_srp068846_metagenome_sequencing_nucleotide_sequencing.txt, but it is not defined in the Study Sample File.
error - s_SRP068846.txt
Validation failed due to sample names being referenced in assay files that do not exist! See the log messages for details
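
The kind of cross-check behind these messages can be approximated outside ISAcreator. This is a rough sketch only, not ISAcreator's actual validation code: it assumes both files are tab-delimited with a "Sample Name" column, and the helper names and sample data are hypothetical.

```python
import csv
import io


def sample_names(table_text, column="Sample Name"):
    """Collect the non-empty values of `column` from a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def undefined_samples(study_text, assay_text):
    """Sample Names used in the assay table but absent from the study table."""
    return sorted(sample_names(assay_text) - sample_names(study_text))


# Invented miniature study and assay tables.
STUDY_TEXT = "Source Name\tSample Name\nsrc1\tSAMPLE_1\nsrc2\tSAMPLE_2\n"
ASSAY_TEXT = (
    "Sample Name\tRaw Data File\n"
    "SAMPLE_1\tr1.fastq\n"
    "SAMPLE_POOL\tr2.fastq\n"
)

print(undefined_samples(STUDY_TEXT, ASSAY_TEXT))
```

Running it on the text of the real s_* and a_* files would list every assay sample that has no matching row in the study file.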

The assays (whose samples are not successfully found) are the following

and queries for one of the missing samples ( SRS1260579 ) link directly to the assay (for example,  http://www.ebi.ac.uk/ena/data/view/SRX1544194  ). I think they all reference pools of samples, but perhaps don’t have a dedicated SRA accession, themselves [unless perhaps it is the library name, like H6XTJ2O_region1, H7CICOE_region1, etc.]?  I suspect it has something to do with the encoding/decoding of pooled/multiplexed samples from SRA/ENA and successfully representing them in the converted ISATAB study file, but I don't know that for certain.

Just thought I would report it as a potential issue, but thought you two would know best.  Let me know if I can assist with troubleshooting this.

Cheers,
John

David Johnson

Oct 13, 2016, 5:27:41 AM
to ISAforum, alejandra.gon...@gmail.com, djco...@gmail.com
Hi John,

Thanks for this report. I'll try and have a look today, but I'm not sure about the full range of capabilities of the sra2isatab importer, as it relies on some XSLTs written by another developer here, so I will discuss with the ISA Team to see if there are any known issues like what you're seeing.

Can I check, I presume you're using the SRA importer quite extensively then? Am just wondering so we can prioritise fixes on this accordingly.

Best/David

Philippe

Oct 13, 2016, 5:42:54 AM
to isaf...@googlegroups.com, alejandra.gon...@gmail.com, djco...@gmail.com

Hi John

Thanks for reporting. I just had a look at this submission and indeed, the hint you gave is the cause of the missing samples from the resulting ISA.

The SRA submission does not declare the pool properly so walking back the xml tree stops with the absence of proper linking in the input file.

So it seems we need to get in touch with SRA/ENA on that one, as there is not much I can do on the XSLT side to address this, except throw a warning when/if I detect such cases.

I can follow up with ENA, unless you have already contacted them

All the best

Philippe


John Matese

Oct 13, 2016, 9:00:49 AM
to isaf...@googlegroups.com, Alejandra Gonzalez-Beltran, djco...@gmail.com
Hi Philippe,

I had not contacted either ENA or SRA yet, as I was not altogether clear what the issue was.  SRA/ENA clearly knows that the pooled sample exists, because you can query with the pooled accession [ https://www.ncbi.nlm.nih.gov/sra/?term=SRS1260579 ] and somehow SRA knows the relationships between the constituent samples (table at the bottom of that resulting SRA assay page) => library => assay. Perhaps the relationship gets lost in SRA-to-ENA transport (I think they attempt to mirror data)?

David, for our project I guess we are running the full gamut;  we’ll be importing previously published datasets from public repositories (various converters), and also producing new curated datasets (ISAcreator).  So in that sense we are using a reasonable chunk of the ISA software suite.  In the past, we had mainly used both isacreator and http://isatab.sourceforge.net/magetoisa/ .  Given the direction of technology, I suspect we will probably start employing sra2isa and possibly Metabolights integration as well (pull and/or push)?  

Thanks again for all the support your team provides,
John


Philippe

Oct 13, 2016, 12:10:44 PM
to isaf...@googlegroups.com, jcma...@gmail.com

Hi John,

Right, in fact, the XML for the ENA experiment indeed defines the pool:

http://www.ebi.ac.uk/ena/data/view/SRX1544187&display=xml so no loss here between SRA and ENA

I think I was misled by the ENA HTML rendering, which does not reference the samples but simply indicates something like '107 samples'; the XML document itself is fine.

So it means I am missing an XSLT template to support the pool/pool_members. I'll need to dig into this transformation and push a fix. Adding a new issue to our tracker.

best

Philippe
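
To see which member samples an experiment XML document declares for a pool, something like the sketch below may help. It is not the XSLT fix discussed above. The XML fragment is hand-written with placeholder accessions, and its element layout (SAMPLE_DESCRIPTOR containing POOL/MEMBER) is an assumption about the SRA experiment format that should be checked against a real record such as the one linked above.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; element layout and accessions are assumptions.
EXPERIMENT_XML = """
<EXPERIMENT_SET>
  <EXPERIMENT accession="EXP_POOLED">
    <DESIGN>
      <SAMPLE_DESCRIPTOR accession="SAMPLE_POOL">
        <POOL>
          <MEMBER accession="SAMPLE_1" member_name="m1"/>
          <MEMBER accession="SAMPLE_2" member_name="m2"/>
        </POOL>
      </SAMPLE_DESCRIPTOR>
    </DESIGN>
  </EXPERIMENT>
</EXPERIMENT_SET>
"""


def pool_members(xml_text):
    """Map each pooled sample accession to the accessions of its members."""
    root = ET.fromstring(xml_text.strip())
    pools = {}
    for descriptor in root.iter("SAMPLE_DESCRIPTOR"):
        members = [m.get("accession") for m in descriptor.findall("POOL/MEMBER")]
        if members:  # descriptors without a pool are skipped
            pools[descriptor.get("accession")] = members
    return pools


print(pool_members(EXPERIMENT_XML))
```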

John Matese

Feb 16, 2017, 11:14:34 AM
to ISAforum, jcma...@gmail.com
Just checking, was this pooling issue addressed?  I saw that the API was recently updated: http://isa-tools.org/2017/01/isa-api-milestone/

Philippe

Feb 16, 2017, 11:16:48 AM
to isaf...@googlegroups.com, jcma...@gmail.com
Hi John

Unfortunately not yet, it is still outstanding.
We'll do our best to fit this in our next release.

Sorry for the delay

Best wishes

Philippe

