about database version


wu__kang

Feb 14, 2024, 9:21:28 AM
to alt_predictions@googleg…, altan...@gmail.com
Dear altanalyze friends:

The highest database version selectable in the Windows GUI of v2.1.4.4 is "EnsMart72". According to the software notes, "EnsMart72" corresponds to hg19.

Since it is now 2024, I wonder whether there is a way to download databases for hg38 or for T2T CHM13v2.0/hs1, whether through the GUI or the command line.

Many thanks for your clarification.

Kang Wu
Shanghai, China

Nathan Salomonis

Feb 14, 2024, 9:30:12 AM
to alt_pre...@googlegroups.com, wu__...@126.com, altan...@gmail.com
Hi Kang,

We have additional database versions that are accessible from the command line but have not been rolled out to the GUI, only because we have not extensively vetted them across many species. Ensembl 91 and 100 are available and can be downloaded for the GUI through "hidden" command-line options using the .exe or .app file. If T2T is a requirement: the current Ensembl version is still hg38, but there appears to be a mechanism to get a supporting database (https://rapid.ensembl.org/Homo_sapiens_GCA_009914755.4/Info/Index). Otherwise, follow the directions below:

For command-line AltAnalyze:
python AltAnalyze.py --species Hs --update Official --version EnsMart100 --additional all

For GUI Mac (cd on the terminal to the directory with the AltAnalyze executable):
./AltAnalyze.app/Contents/MacOS/AltAnalyze --species Hs --update Official --version EnsMart100 --additional all

For Windows (cd on the command-line to the directory with the AltAnalyze executable):
./AltAnalyze.exe --species Hs --update Official --version EnsMart100 --additional all #(note you won't see updates until it is complete)
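
For anyone scripting the update, the same command can also be assembled in Python. This is only a sketch: build_update_command and run_update are hypothetical helper names (not part of AltAnalyze), the flags are copied unchanged from the commands above, and it assumes AltAnalyze.py sits in the current directory.

```python
import subprocess
import sys


def build_update_command(version="EnsMart100", species="Hs", script="AltAnalyze.py"):
    """Build the database-update command quoted in this thread as an argument list.

    build_update_command is a hypothetical helper, not part of AltAnalyze.
    """
    return [
        sys.executable, script,
        "--species", species,
        "--update", "Official",
        "--version", version,
        "--additional", "all",
    ]


def run_update(command, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # Assumes the current directory contains AltAnalyze.py.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_update(build_update_command("EnsMart100"))
```

Passing every flag and value as its own list element means no shell is involved in splitting the command, so each option reaches AltAnalyze exactly as written.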

Best,
Nathan



Nathan Salomonis

Feb 18, 2024, 10:29:51 AM
to wu__kang, "alt_predictions@googleg…", altan...@gmail.com
Hi Kang,

AltAnalyze provides the ability to generate gene expression and limited splicing results from FASTQ files using the software Kallisto (TPM measurements and known exon-exon junctions), as well as unbiased splicing results using BAM files.

For the database issue, for human this should install fine, but we don’t have a PC environment for immediate testing. Supporting this through the GUI will require some focused time. 

For the Kallisto analyses, you need to save a groups file and a comps file (for example, groups.cancer.txt and comps.cancer.txt) in a folder named ExpressionInput inside the designated output folder. The first column of the groups file should contain the common name of each pair of paired-end FASTQ files. For example, tumor1_read1.fastq.gz becomes tumor1. For BAM files (tumor1.bam), it will be tumor1.bed. If a run errors out because of the names in the groups file (the GUI names these files automatically, so this is only an issue on the command line), AltAnalyze creates an arrays.txt file with the expected names for the groups file. Hope this helps.
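
A minimal Python sketch of writing the two files is given here. It rests on assumptions that should be checked against the "Creating Groups and Comps Outside AltAnalyze" manual page: that the groups file is tab-delimited with sample name, group number, and group name, and that the comps file lists one pair of group numbers per line. The function name write_groups_and_comps and the sample names are made up for illustration.

```python
import os


def write_groups_and_comps(output_dir, dataset, samples, comparisons):
    """Write groups.<dataset>.txt and comps.<dataset>.txt into ExpressionInput.

    samples:     list of (sample_name, group_name), e.g. ("tumor1", "tumor")
    comparisons: list of (group_name, group_name) pairs to compare
    Assumed layout (verify against the AltAnalyze manual):
      groups: sample_name <TAB> group_number <TAB> group_name
      comps:  group_number <TAB> group_number
    """
    input_dir = os.path.join(output_dir, "ExpressionInput")
    os.makedirs(input_dir, exist_ok=True)

    # Number the groups 1, 2, ... in order of first appearance.
    group_ids = {}
    for _, group in samples:
        group_ids.setdefault(group, len(group_ids) + 1)

    groups_path = os.path.join(input_dir, "groups.%s.txt" % dataset)
    comps_path = os.path.join(input_dir, "comps.%s.txt" % dataset)

    with open(groups_path, "w") as handle:
        for sample, group in samples:
            handle.write("%s\t%d\t%s\n" % (sample, group_ids[group], group))

    with open(comps_path, "w") as handle:
        for first, second in comparisons:
            handle.write("%d\t%d\n" % (group_ids[first], group_ids[second]))

    return groups_path, comps_path


if __name__ == "__main__":
    # Hypothetical samples; for BAM input the names would end in .bed instead.
    paths = write_groups_and_comps(
        "altanalyze_output", "cancer",
        [("tumor1", "tumor"), ("tumor2", "tumor"), ("normal1", "normal")],
        [("tumor", "normal")],
    )
    print(paths)
```

Which group in each comps pair is treated as the experimental one and which as the baseline is not covered by this sketch; the manual page is the reference for that.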

Best,
Nathan

On Feb 18, 2024, at 6:27 AM, wu__kang <wu__...@126.com> wrote:


Hi Prof. Nathan,

I tried to analyze raw FASTQ files on a remote server, but I have no idea how to create the groups and comps files, even though I have read the section "Creating Groups and Comps Outside AltAnalyze" (https://altanalyze.readthedocs.io/en/latest/ManualGroupsCompsCreation/).

Could you please tell me how to prepare the corresponding groups and comps files, as well as the commands to run? Or could you share some additional materials?

Thanks a lot.

Kang Wu

============================================
Hi Nathan,

I tried both the command-line command and the Windows command you recommended.
The Windows one seems to work.
However, the command-line one displayed "EnsMart100 --additional is not a valid version of Ensembl, while EnsMart72 is."

I am afraid my Windows computer cannot complete the task because of its RAM limit (16 GB), so I am also running it on a Linux server in parallel.
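
One possible cause of an error like this, offered only as a guess and not confirmed anywhere in this thread, is that a command copied from an email or web page carries a non-breaking space or similar character, so the shell hands "EnsMart100 --additional" to the program as a single argument. The small Python check below lists any non-ASCII characters in a pasted command; suspicious_characters is a hypothetical helper name.

```python
import unicodedata


def suspicious_characters(command):
    """Return (index, Unicode name) for every non-ASCII character in command."""
    return [
        (index, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(command)
        if ord(char) > 127
    ]


if __name__ == "__main__":
    # Example with a non-breaking space (U+00A0) hidden after the version string.
    pasted = "--version EnsMart100\u00a0--additional all"
    for index, name in suspicious_characters(pasted):
        print("position %d: %s" % (index, name))
```

If the list comes back empty, the command text is plain ASCII and the cause lies elsewhere; retyping the command by hand is another way to rule this out.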

---- Replied Message ----
From: Nathan Salomonis <nsalo...@gmail.com>
Date: 2/14/2024 22:30
To: <alt_pre...@googlegroups.com>, <wu__...@126.com>
Cc: altan...@gmail.com <altan...@gmail.com>
Subject: Re: AltAnalyze User Group about database version

Nathan Salomonis

Feb 23, 2024, 3:04:22 PM
to wu__kang, alt_predictions@googleg…, altan...@gmail.com
Sorry for the late reply. EnsMart100 will match any hg38-based release, but not T2T.
Best,
Nathan


On Sun, Feb 18, 2024 at 10:57 PM wu__kang <wu__...@126.com> wrote:
Hi Prof. Nathan,

It seems BAM is better than FASTQ for splicing results, as you mentioned and as the online manual indicates.
I had better use BAM files.

My question is:
Since I downloaded EnsMart100, which Ensembl release should I use to prepare the BAM files? Are release 100 and all later releases OK?

Thanks.

Kang