Local directories and single-end data


Joanna R.

Nov 30, 2021, 11:23:12 AM
to EvidentialGene
Dear Don,

I hope this finds you well!

I've left my postdoc for a job in industry but am still appreciating EvidentialGene and encouraging others to use it. I have a couple of questions:

1. I have the demo working, but I'm not sure I have the right syntax to start it at step 3 with local data. How should I specify starting at step 3 and tell it where to find data I have already downloaded? 

2. Unfortunately, the publicly available data for the species I now work on are single-end rather than paired-end. When I download from the SRA, the download script appears to split the single-end reads in half, making artificial pairs. Is this what it's supposed to do?

Thanks so much,

Joanna



Don Gilbert

Dec 4, 2021, 3:27:04 PM
to Joanna R., chereddy....@dc.tohoku.ac.jp, EvidentialGene
Dear Joanna R. and Shankar,

SRA2Genes, which you are asking about, has limited documentation and how-to
text as yet.  It is somewhat complex, but so are other gene reconstruction pipelines,
and it is not too different from them in its basics, just better at this, I suppose :)

I have spent a bit of time just now re-running the SRA2Genes Test Drive, which is your
best starting point: try it with the sample data to see how it works, then replace
the samples with your own RNA data.  See the updated "sra2genes_testdrive_help.txt" at
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/
 
Joanna q1: how to start at step 3 with local data?
  I need more time to re-test this step. For now, see the updated test drive help:
    replace the pairfa/ data with your own RNAset_1.fa and RNAset_2.fa
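
As a rough illustration of the file swap described above (not part of SRA2Genes itself), the Python sketch below copies a local read pair into a pairfa/ folder after checking that both files hold the same number of FASTA records. The function names, the `workdir` argument, and the assumption that pairfa/ sits directly inside the run directory are hypothetical; sra2genes_testdrive_help.txt is the place to check the layout the pipeline actually expects.

```python
import shutil
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def stage_pairs(left, right, workdir, subdir="pairfa"):
    """Copy a left/right read pair into workdir/subdir.

    Assumption: the pipeline reads pairs from a 'pairfa' folder inside the
    run directory; adjust 'subdir' if your setup differs.
    """
    n_left = count_fasta_records(left)
    n_right = count_fasta_records(right)
    if n_left != n_right:
        # Unequal counts usually mean the two files are not a true pair.
        raise ValueError(f"pair mismatch: {n_left} vs {n_right} records")

    dest = Path(workdir) / subdir
    dest.mkdir(parents=True, exist_ok=True)

    staged = []
    for src in (left, right):
        target = dest / Path(src).name
        shutil.copyfile(src, target)
        staged.append(target)
    return staged, n_left
```

Calling `stage_pairs("RNAset_1.fa", "RNAset_2.fa", "myrun")` would leave both files under myrun/pairfa/ and return their paths with the shared record count. The record-count check is only a cheap sanity test and does not verify that read IDs match between the two files.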

Joanna q2: use single-end rather than paired-end RNA data?
  This is not configured yet. Only paired-end Illumina/short-read RNA works with the
assemblers that are currently configured in.  I have used it some with long-read RNA, but
you will need to do the assembly yourself (or dis-assembly, for reads longer than genes),
then start at Step 7, reducing over-assembly with tr2aacds, from data in the trsets/ folder.
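
For anyone preparing their own assemblies for that Step 7 entry point, here is a minimal Python sketch (again, not part of SRA2Genes) that gathers several transcript assemblies into a trsets/ folder and prefixes each sequence ID with an assembly label so IDs do not collide when the sets are combined. The function name, the label scheme, the `.tr` file suffix, and the location of trsets/ inside the run directory are all assumptions for illustration; compare against the trsets/ contents the test drive produces before relying on them.

```python
from pathlib import Path


def stage_assemblies(assemblies, workdir, subdir="trsets", suffix=".tr"):
    """Write each assembly into workdir/subdir with label-prefixed IDs.

    'assemblies' maps a short label (for example an assembler name) to the
    path of its transcript FASTA file. The folder name and suffix are
    assumptions, not confirmed SRA2Genes settings.
    """
    dest = Path(workdir) / subdir
    dest.mkdir(parents=True, exist_ok=True)

    counts = {}
    for label, src in assemblies.items():
        n = 0
        out_path = dest / f"{label}{suffix}"
        with open(src) as fin, open(out_path, "w") as fout:
            for line in fin:
                if line.startswith(">"):
                    # Prefix the ID so it stays unique across assemblies.
                    line = f">{label}_{line[1:]}"
                    n += 1
                fout.write(line)
        counts[label] = n
    return counts
```

For example, `stage_assemblies({"asmA": "asmA_transcripts.fa", "asmB": "asmB_transcripts.fa"}, "myrun")` would write myrun/trsets/asmA.tr and myrun/trsets/asmB.tr and return the number of transcripts in each. The input file names here are placeholders.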

Maybe you all can discuss how to work with SRA2Genes on this list <evident...@googlegroups.com>

-- Don Gilbert



--
don gilbert - www.bio.net - bioinformatics - indiana.u.

Joanna R.

Dec 6, 2021, 11:15:21 AM
to Don Gilbert, chereddy....@dc.tohoku.ac.jp, EvidentialGene
Dear Don,

Thanks, that's very helpful! And the sample data are definitely helpful - the pipeline ran smoothly both on the 1KP Austrocedrus sample and on other paired-end data from the SRA. I just wanted to make sure I was taking the best approach to adapting it to different use cases. 

I'll run an assortment of single-end hybrid assemblies (there's both Illumina and 454 data) and then start from step 7. I'm interested in functional work, so I'm hoping that with a combination of BLAST homology annotations and orthology from other species, I'll be able to design the right primers eventually.

Thanks again for your help!

Best,

Joanna
--
Joanna Rifkin PhD
Lead geneticist at ISPA Environmental Labs

