Help for a beginner


Krešimir Križanović

Nov 17, 2022, 7:51:36 AM
to pasapipeline-users
Hello,

I have a large genome assembly (from long reads) and also transcripts (from short reads).

I've downloaded the PASA pipeline and am trying to use it via a Docker image. I've run the sample data using Docker.

However, for my own data I'm not sure how to obtain the file transcripts.fasta.clean. I wanted to use the seqclean utility (mentioned in the wiki), but I can't find it.

I hope someone can help me.

Brian Haas

Nov 17, 2022, 9:26:47 AM
to Krešimir Križanović, pasapipeline-users
Hi,

In the docker, you'll find seqclean at:

/usr/local/src/PASApipeline/bin

best,

~b
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas
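
For readers following along, here is a minimal sketch of how the seqclean path given above might be used to produce transcripts.fasta.clean from outside the container. The image name `pasapipeline/pasapipeline:latest`, the helper name `build_seqclean_command`, and the choice to mount the data directory at the same path inside the container are assumptions for illustration, not something stated in this thread; adjust them to your setup.

```python
import shlex
from pathlib import PurePosixPath

# Path given in the reply above.
SEQCLEAN = "/usr/local/src/PASApipeline/bin/seqclean"
# Assumption: the image tag you pulled; replace with your own.
IMAGE = "pasapipeline/pasapipeline:latest"


def build_seqclean_command(fasta_path, image=IMAGE):
    """Build a `docker run` argument list that runs seqclean on one FASTA file.

    Hypothetical helper: it only assembles the command, it does not run it.
    The FASTA's directory is mounted at the same path inside the container
    and used as the working directory, so outputs land next to the input.
    """
    fasta = PurePosixPath(fasta_path)
    if not fasta.is_absolute():
        raise ValueError("docker -v needs an absolute host path")
    workdir = str(fasta.parent)
    inner = f"{SEQCLEAN} {shlex.quote(fasta.name)}"
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",  # bind-mount the data directory
        "-w", workdir,                 # run from inside that directory
        image,
        "bash", "-c", inner,
    ]


def expected_clean_output(fasta_path):
    """Name of the cleaned FASTA that seqclean writes next to its input."""
    return f"{fasta_path}.clean"
```

To actually run it, something like `subprocess.run(build_seqclean_command("/data/run1/transcripts.fasta"), check=True)` should work, after which `/data/run1/transcripts.fasta.clean` is the file to pass on to PASA.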

Krešimir Križanović

Nov 30, 2022, 10:29:21 AM
to pasapipeline-users
Hello, I appreciate the answer, and sorry for not acknowledging it earlier.

I have another question for a beginner :).

I've managed to run the pipeline from Docker. However, when I compare BUSCO scores for the transcriptome before PASA with those for sample_mydb_pasa.sqlite.assemblies.fasta, the file produced by the PASA pipeline has the lower BUSCO score.

Could someone explain why that is?

Brian Haas

Nov 30, 2022, 10:39:08 AM
to Krešimir Križanović, pasapipeline-users
Usually this happens for a few reasons.

1. Only the subset of transcripts that meet rigorous alignment
validation criteria are used for PASA assembly
2. The quality of the genome matters - if there are gaps resulting in
partial alignments, these won't be included
3. If the input transcript data derive from a de novo assembly, these
could include artifacts (imperfections) that won't affect BUSCO scores
but would affect genome annotations - related to points 1 and 2 above.

Probably others too. If you take transcripts that have BUSCO hits but
aren't included in PASA assemblies, you could delve deeper into
specific examples and the reasons they were excluded.

hope this helps,

~b
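
The suggestion above (find transcripts with BUSCO hits that did not make it into PASA assemblies) can be sketched roughly as follows. This assumes the tab-separated layout of BUSCO's `full_table.tsv` (comment lines starting with `#`, then BUSCO id, status, sequence id, ...) and that you have separately collected the set of input transcript IDs that PASA incorporated; how you extract that set from your PASA outputs is left open here, and both function names are hypothetical. Depending on the BUSCO version, sequence IDs may carry extra suffixes that need normalising before the comparison.

```python
def busco_hit_sequences(full_table_text,
                        statuses=("Complete", "Duplicated", "Fragmented")):
    """Map each sequence id to the set of BUSCO ids it matched.

    Parses the text of a BUSCO full_table.tsv. Rows with status "Missing"
    have no sequence column and are skipped.
    """
    hits = {}
    for line in full_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # e.g. "Missing" rows: only id and status
        busco_id, status, seq_id = fields[0], fields[1], fields[2]
        if status in statuses:
            hits.setdefault(seq_id, set()).add(busco_id)
    return hits


def busco_transcripts_missing_from_pasa(hits, pasa_transcript_ids):
    """Keep only BUSCO-hit transcripts that PASA did not incorporate.

    `pasa_transcript_ids` is an assumed input: any set of transcript ids
    known to be part of PASA assemblies.
    """
    used = set(pasa_transcript_ids)
    return {seq: ids for seq, ids in hits.items() if seq not in used}
```

The transcripts returned by the second function are the candidates to inspect by hand, for example by checking whether they align only partially to the genome or span assembly gaps.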
