sjdirs parameter majiq build


Juan Ferrer-bonsoms

Apr 27, 2021, 7:28:11 AM
to majiq_voila
Hi, 

I have problems running majiq build.

$ majiq -v
2.2-e25c4ac

My code:

majiq build /home/annotation/ensembl.hg19.gff3 \
-c /home/majiq/confifile.txt \
-j 4 \
-o /home/majic/build

and my confifile.txt is:

[info]
readlen=100
bamdirs=~/home/BAM_output/
genome=hg19
standness=None
[experiment]
ref_samples=control_1Aligned.sortedByCoord.out,control_2Aligned.sortedByCoord.out,control_3Aligned.sortedByCoord.out
liver=liver_1Aligned.sortedByCoord.out,liver_2Aligned.sortedByCoord.out,liver_3Aligned.sortedByCoord.out


The output of majiq is:


/home/env/lib/python3.7/site-packages/majiq/src/config.py:82: UserWarning: sjdirs parameter not found in config file, using "./" instead
  'sjdirs parameter not found in config file, using "./" instead'
/home/env/lib/python3.7/site-packages/majiq/src/config.py:90: UserWarning: "readlen" parameter is deprecated and will not be used. MAJIQ now detects the maximum read length of each experiment automatically. 
  '"readlen" parameter is deprecated and will not be used.'
2021-04-27 13:17:24,123 (PID:3522805) - INFO - Majiq Build v2.2-e25c4ac
2021-04-27 13:17:24,124 (PID:3522805) - INFO - Command: /home/env/bin/majiq build /home/annotation/ensembl.hg19.gff3 -c /home/majic/confifile.txt -j 12 -o /home/majic/build
2021-04-27 13:17:24,124 (PID:3522805) - INFO - Parsing GFF3
2021-04-27 13:17:52,036 (PID:3522805) - INFO - Reading bamfiles
2021-04-27 13:17:52,038 (PID:3522805) - INFO - Detecting LSVs ngenes: 57773 
2021-04-27 13:18:47,572 (PID:3522805) - INFO - 0 LSV found
2021-04-27 13:18:49,938 (PID:3522805) - INFO - MAJIQ Builder is ended successfully!



I don't understand why it is asking for the sjdirs parameter. I couldn't find any information about this parameter at https://biociphers.bitbucket.io/majiq/quick.html#conf_file

Moreover, I must be doing something wrong, because "0 LSV found" doesn't make sense, but I can't find my mistake.

In case it is useful, I ran STAR with the following parameters:

--alignEndsType EndToEnd
--chimSegmentMin 2 
--outFilterMismatchNmax 3
--alignIntronMax 299999

All the BAM files are sorted and indexed (each has a .bai file).
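
To double-check this against the config, here is a minimal sketch that reads a config in the format posted above and lists every experiment whose .bam or index file cannot be found in bamdirs. The function name missing_bam_files is hypothetical, the index is assumed to be named <name>.bam.bai, and "~" is expanded explicitly because whether MAJIQ expands it is not established in this thread.

```python
import configparser
from pathlib import Path


def missing_bam_files(config_text):
    """List '<group>: <file>' entries for BAM/index files not found in bamdirs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group names as written
    parser.read_string(config_text)

    # bamdirs is treated as a comma-separated list of directories (assumption).
    dirs = [
        Path(d.strip()).expanduser()
        for d in parser["info"]["bamdirs"].split(",")
        if d.strip()
    ]

    missing = []
    for group, names in parser["experiment"].items():
        for name in (n.strip() for n in names.split(",")):
            if not name:
                continue
            # Assumed naming: <name>.bam plus its index <name>.bam.bai
            for suffix in (".bam", ".bam.bai"):
                if not any((d / (name + suffix)).is_file() for d in dirs):
                    missing.append(f"{group}: {name}{suffix}")
    return missing
```

Calling missing_bam_files(Path("confifile.txt").read_text()) would return an empty list when every listed experiment has both files in one of the directories.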





Paul Jewell

Apr 27, 2021, 5:17:56 PM
to majiq_voila
Hello, 

I will clear up a few things first:

- If you have listed experiments and the corresponding bamfiles are not found, the program will throw an error rather than running and finding zero LSVs, so I believe your bamfiles are being found.
- The sjdirs parameter is used together with the bamdirs parameter when you are using incremental mode. Both will end up being defined, but it's totally fine that no sj files are found. Basically, the message is superfluous and not an error.

Another common issue that can cause this is when the names of the chromosomes/seqids used in your bamfiles differ from those in the annotation db you are using. Have you verified that they match?
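
A minimal sketch of that check, assuming the BAM header text comes from `samtools view -H` and the annotation is a standard GFF3 file whose first column is the seqid. The function names (bam_seqids, gff3_seqids, compare_seqids) are hypothetical helpers, not part of MAJIQ.

```python
def bam_seqids(header_text):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gff3_seqids(gff3_lines):
    """Collect the first-column seqids from GFF3 lines, skipping comments."""
    names = set()
    for line in gff3_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split(None, 1)[0])
    return names


def compare_seqids(bam_names, gff3_names):
    """Report which seqids are shared and which appear on one side only."""
    return {
        "shared": sorted(bam_names & gff3_names),
        "bam_only": sorted(bam_names - gff3_names),
        "gff3_only": sorted(gff3_names - bam_names),
    }
```

With real files, the header text could be obtained with subprocess.run(["samtools", "view", "-H", bam_path], capture_output=True, text=True).stdout, and the GFF3 lines by iterating over the open annotation file. An empty "shared" list would point to a naming mismatch such as "chr1" versus "1". Scanning the whole annotation matters here, because the first few lines of a file show only one or two seqids.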

-Paul

Juan Ferrer-bonsoms

Apr 28, 2021, 10:17:28 AM
to Paul Jewell, majiq_voila
Hi,

First of all, thanks for your quick answer.

I checked whether the chromosome names in my BAM files have the "chr" prefix, and they do.

I paste here the header and first reads of one of my .bam files, followed by the head of the .gff3 file:

samtools view -H SRR1513329Aligned.sortedByCoord.out.bam
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr2 LN:243199373
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566


samtools view  SRR1513329Aligned.sortedByCoord.out.bam | head
787594_0_40489_1020_156 419 chr1 14407 1 101M = 14462 156 CTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGC ?CCC@DDDDDDDDCDEFHHHDCCCCCHIJJJJJIIJJIHHHJJJJJJIIIJJJJJJJIJJJJJJJGEEEEHJJJFFFCCDDDDDDDDDDDDDDEGIFFFFA NH:i:3 HI:i:3 AS:i:196 nM:i:2
336917_1_39539_98_117 99 chr1 14460 0 101M = 14476 117 ACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTT BDDEEEIJJJJJJJJIHHFFFFFF@9FFFFFFEEDDDDDDDDDCCBDDDDCCDACDDDDDDFFFFFFFHHGGFFHH8CFFHHHIGG@@CDCDDCDDDDDDC NH:i:5 HI:i:1 AS:i:198 nM:i:1
787594_0_40489_1020_156 339 chr1 14462 1 101M = 14407 -156 ACAGCGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAAAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGA ?@?;@8?A@?5();DDDCC@CDDDDDDDDCDDDDEIIIHHJJJJJJIJJJJGJJJJJIJJJJJIHIJIGIIHHH?<DDD@:<5>CEEDDDDDECC????>C NH:i:3 HI:i:3 AS:i:196 nM:i:2
336917_1_39539_98_117 147 chr1 14476 0 101M = 14460 -117 TGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCCTCCGCTCCTTGAAGCTGGTCTCCACA JJJJJJJJIIIIHFFJJJG>@FFFHHEEDDDDEEDDDAACGJJJJJJJIJJJJJJJJJJJIIJJIHIHHECDB@3EIGIHHGGHJJJIGJJJIGIEEDCCC NH:i:5 HI:i:1 AS:i:198 nM:i:1
739636_0_39539_1497_133 163 chr1 14501 3 101M = 14533 133 CACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCC @)7=FFFFFFHIJJJJJJJJJJJJJJJJJJJJJIIGFFIIGGIHHFDDDFFDDDD?CCFFFFFFDDDDHCCDDDDDDDDFF?8DDDDDDDDDDEFFEDADD NH:i:2 HI:i:1 AS:i:200 nM:i:0
739636_0_39539_1497_133 83 chr1 14533 3 101M = 14501 -133 GTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGC DDDD@HGDDDDDD><ADEDDDCCCDDEEDB<<???BDDDCCBHHHHHHHHGHGCCA2?GIHHHHHHHHF?95<5>A<::DDDDDBDDDDDDDDDDEEDDDC NH:i:2 HI:i:1 AS:i:200 nM:i:0
778826_1_39539_234_257 99 chr1 14596 3 101M = 14752 397 CCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCC CCDDDDEHIJJJJIJJJJCEEDDDDDDDEDBCDEEEDD77@009@BFFFBDDDDDDDDDEDDFFHHHHFFFHHHHEEEEDDDGHHIJJIJJIJJJGIIJJJ NH:i:2 HI:i:1 AS:i:202 nM:i:0
778826_1_39539_234_257 147 chr1 14752 3 78M140N23M = 14596 -397 GAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTG BBBDDDDDDDDDDDDDDFFFFDDDCCCCCCB?DDDCC?<@GIIIHFFFBBDDFFDCCCACC@CCCCEECEDDDDDDDDDDDDDDDDDD?>3:=3<85&9B@ NH:i:2 HI:i:1 AS:i:202 nM:i:0
242194_0_39228_1079_169 419 chr1 16665 3 101M = 16733 261 CGCGGTTGAGGGTGGGAGTGGGGGTGCACTGGCCAGCACCTCAGGAGCTGGGGGTGGTGGTGGGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGT @FFFGGIJJJIJJJJJJJIJJJJJJJJJJIJJHAAHHHGFGEDEGHHGBDDDFEEEDDDDCHHHHHFFEHHHHHHFHHIJJJJIIIIIIIIJJJHHFFFFF NH:i:2 HI:i:2 AS:i:202 nM:i:0
242194_0_39228_1079_169 339 chr1 16733 3 33M92N68M = 16665 -261 GTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAGGGGCAGAGGGGG DDDDDDBDDDGCDDDDDDDDDDDEDDDDDCCCEEEEDC9@C?C@@@IJJIHHJJIHHGHHHHHGIJIIJJJJJJJJJJJJJJJJJJJJJIGIHDD=@AA9@ NH:i:2 HI:i:2 AS:i:202 nM:i:0



and the head of the .gff3 file:

head ensembl.hg19.gff3
chr14 ensGene gene 101511494 101518132 . + . ID=ENSG00000258861;Name=MIR381HG
chr14 ensGene transcript 101511494 101518132 . + . ID=ENST00000553692;Parent=ENSG00000258861;parent_name=MIR381HG
chr14 ensGene exon 101511494 101511521 . + . ID=exon:ENST00000553692:1;Parent=ENST00000553692;exon_number=1;exon_id=ENST00000553692.1;gene_name=MIR381HG
chr14 ensGene exon 101517404 101517606 . + . ID=exon:ENST00000553692:2;Parent=ENST00000553692;exon_number=2;exon_id=ENST00000553692.2;gene_name=MIR381HG
chr14 ensGene exon 101517939 101518132 . + . ID=exon:ENST00000553692:3;Parent=ENST00000553692;exon_number=3;exon_id=ENST00000553692.3;gene_name=MIR381HG
chr14 ensGene gene 60706839 60715483 . - . ID=ENSG00000254718;Name=CTD-2184C24.2
chr14 ensGene transcript 60706860 60712286 . - . ID=ENST00000553775;Parent=ENSG00000254718;parent_name=CTD-2184C24.2
chr14 ensGene exon 60706860 60707172 . - . ID=exon:ENST00000553775:3;Parent=ENST00000553775;exon_number=1;exon_id=ENST00000553775.1;gene_name=CTD-2184C24.2
chr14 ensGene exon 60709481 60709588 . - . ID=exon:ENST00000553775:2;Parent=ENST00000553775;exon_number=2;exon_id=ENST00000553775.2;gene_name=CTD-2184C24.2
chr14 ensGene exon 60712263 60712286 . - . ID=exon:ENST00000553775:1;Parent=ENST00000553775;exon_number=3;exon_id=ENST00000553775.3;gene_name=CTD-2184C24.2

Thanks in advance,

Juan






Paul Jewell

Apr 28, 2021, 1:19:52 PM
to majiq_voila
Hi Juan, 

Thanks for confirming it. 

Could you try removing the 'readlen' and 'strandness' parameters? First one at a time, and then both together?
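
One way to prepare those test configs is sketched below, assuming the config is INI-style as posted earlier in the thread. The helper config_variants is hypothetical. It returns one variant per key plus one with all keys removed, and reports which keys were actually present. Note that the config posted above spells the key `standness`, so asking to remove `strandness` from that file removes nothing.

```python
import configparser
import io


def config_variants(config_text, keys, section="info"):
    """Return (removed_keys, new_config_text) for each key alone, then all keys."""

    def without(drop):
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep key and group names as written
        parser.read_string(config_text)
        # remove_option returns False when the key is not in the section
        removed = tuple(k for k in drop if parser.remove_option(section, k))
        buffer = io.StringIO()
        parser.write(buffer, space_around_delimiters=False)
        return removed, buffer.getvalue()

    combos = [[k] for k in keys] + [list(keys)]
    return [without(combo) for combo in combos]
```

Each returned text can be written to its own file and passed to `majiq build` with `-c`, as in the original command. An empty removed_keys tuple means that variant is identical in content to the input.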

Thanks, 
-Paul