GFF3 error: gene doesn't contain Name attribute


Paula Navarrete

Sep 16, 2021, 8:31:01 AM
to majiq_voila
Hi,

When I run the majiq build command with a GFF3 file downloaded from Ensembl, which contains the Name attribute, I get the following message while it reads the annotation:
INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
I would like to know whether this is just a warning and, if not, how I can solve the issue. I checked the GFF3 file downloaded from Ensembl and it does contain the Name attribute for genes.

Thank you in advance,

Paula

George Tollefson

Oct 25, 2021, 5:48:54 PM
to majiq_voila
Hi, 

I'm experiencing the same error. Paula, were you able to solve this? Any update from the developers? I am running a default build command using the hg38 GFF3 file downloaded from Ensembl.

Thank you,
George 

jai...@biociphers.org

Oct 25, 2021, 9:12:05 PM
to majiq_voila
Dear Paula, George,

In the past, we've found that some GFF3 files are missing these attributes for only some of the records that MAJIQ parses as genes or transcripts. I made a feature branch, majiq@log-gff3-missing-attributes, that should provide slightly more detailed logging when the GFF3 parsing hits a record with the missing attribute. Could you check out this branch, reinstall with the updated code, and rerun your analysis? It should give more information about the records missing the desired attribute. Once you rerun, could you post the updated log message?
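
To find the affected records before rerunning, a short script can scan the GFF3 file directly. The following is a minimal sketch and not part of MAJIQ: it assumes the standard nine-column GFF3 layout with `key=value` pairs in column 9, and the set of feature types counted as genes (`GENE_TYPES`) is a guess that may differ from what MAJIQ actually parses as a gene. The function names and the two example records are made up for illustration.

```python
import io

# Assumption: feature types treated as genes; adjust to match your annotation.
GENE_TYPES = {"gene", "ncRNA_gene", "pseudogene"}
# Attribute keys listed in the MAJIQ log message.
NAME_KEYS = ("Name", "gene_name")


def parse_attributes(column9):
    """Split GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def genes_missing_name(lines):
    """Yield (line_number, ID) for gene records lacking every key in NAME_KEYS."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in GENE_TYPES:
            continue  # not a well-formed gene record
        attrs = parse_attributes(cols[8])
        if not any(key in attrs for key in NAME_KEYS):
            yield lineno, attrs.get("ID", "<no ID>")


# Made-up two-record example; the second gene has no Name or gene_name.
example = io.StringIO(
    "##gff-version 3\n"
    "1\tensembl\tgene\t100\t900\t.\t+\t.\tID=gene:G1;Name=ABC1;biotype=protein_coding\n"
    "1\tensembl\tgene\t2000\t2900\t.\t-\t.\tID=gene:G2;biotype=lncRNA\n"
)
missing = list(genes_missing_name(example))
print(missing)  # [(3, 'gene:G2')]
```

To run it on a real annotation, pass an open file handle in place of the example, for instance `genes_missing_name(open("annotation.gff3"))`, where the filename is a placeholder.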

Thanks,
Joseph

Tollefson, George

Oct 26, 2021, 3:07:11 PM
to jai...@biociphers.org, majiq_voila
Hi Joseph,

Thank you very much for your response. I see now that only some lines may be causing the error. I think this may be the case with my run: the .sj and .majiq files are all generated and are around 200-300M in size. I've pasted the standard output for the run below, which states that the MAJIQ Builder ended successfully. I tried checking out the feature branch via the Bitbucket link you provided, but the command displayed by clicking the "checkout" button on the feature branch shows only the branch name and not the complete command to check it out on my local machine. I'm not familiar with Bitbucket commands and am not sure how to format the checkout command to try it out.

However, I do see by using grep that my gene of interest is within the .majiq files. Should I take time to troubleshoot using the feature branch, or do you suggest I continue with my analysis since I see my gene of interest in the majiq files and see that the build run was successful via the run output?

...
2021-10-25 17:43:48,365 (PID:154932) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-25 17:43:48,365 (PID:154932) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-25 17:43:50,638 (PID:154932) - INFO - Reading bamfiles
2021-10-25 17:43:50,645 (PID:154932) - INFO - Group h9, number of experiments: 6, minexperiments: 3
2021-10-25 17:43:50,645 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-1_Aligned.sortedByCoord.out.bam
2021-10-25 17:45:13,204 (PID:154932) - INFO - Detect Intron retention H9-1_Aligned.sortedByCoord.out
2021-10-25 17:47:01,009 (PID:154932) - INFO - Done Reading file H9-1_Aligned.sortedByCoord.out
2021-10-25 17:47:02,050 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-2_Aligned.sortedByCoord.out.bam
2021-10-25 17:48:31,110 (PID:154932) - INFO - Detect Intron retention H9-2_Aligned.sortedByCoord.out
2021-10-25 17:50:23,017 (PID:154932) - INFO - Done Reading file H9-2_Aligned.sortedByCoord.out
2021-10-25 17:50:24,318 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-3_Aligned.sortedByCoord.out.bam
2021-10-25 17:51:52,040 (PID:154932) - INFO - Detect Intron retention H9-3_Aligned.sortedByCoord.out
2021-10-25 17:53:41,598 (PID:154932) - INFO - Done Reading file H9-3_Aligned.sortedByCoord.out
2021-10-25 17:53:42,596 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-4_Aligned.sortedByCoord.out.bam
2021-10-25 17:55:07,300 (PID:154932) - INFO - Detect Intron retention H9-4_Aligned.sortedByCoord.out
2021-10-25 17:56:54,355 (PID:154932) - INFO - Done Reading file H9-4_Aligned.sortedByCoord.out
2021-10-25 17:56:55,592 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-5_Aligned.sortedByCoord.out.bam
2021-10-25 17:58:21,444 (PID:154932) - INFO - Detect Intron retention H9-5_Aligned.sortedByCoord.out
2021-10-25 18:00:14,796 (PID:154932) - INFO - Done Reading file H9-5_Aligned.sortedByCoord.out
2021-10-25 18:00:15,904 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//H9-6_Aligned.sortedByCoord.out.bam
2021-10-25 18:01:42,110 (PID:154932) - INFO - Detect Intron retention H9-6_Aligned.sortedByCoord.out
2021-10-25 18:03:30,188 (PID:154932) - INFO - Done Reading file H9-6_Aligned.sortedByCoord.out
2021-10-25 18:03:31,302 (PID:154932) - INFO - Group sa23, number of experiments: 6, minexperiments: 3
2021-10-25 18:03:31,304 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-1_Aligned.sortedByCoord.out.bam
2021-10-25 18:05:09,920 (PID:154932) - INFO - Detect Intron retention SA23-1_Aligned.sortedByCoord.out
2021-10-25 18:07:07,643 (PID:154932) - INFO - Done Reading file SA23-1_Aligned.sortedByCoord.out
2021-10-25 18:07:08,840 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-2_Aligned.sortedByCoord.out.bam
2021-10-25 18:08:55,674 (PID:154932) - INFO - Detect Intron retention SA23-2_Aligned.sortedByCoord.out
2021-10-25 18:11:02,083 (PID:154932) - INFO - Done Reading file SA23-2_Aligned.sortedByCoord.out
2021-10-25 18:11:03,034 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-3_Aligned.sortedByCoord.out.bam
2021-10-25 18:12:34,898 (PID:154932) - INFO - Detect Intron retention SA23-3_Aligned.sortedByCoord.out
2021-10-25 18:14:25,251 (PID:154932) - INFO - Done Reading file SA23-3_Aligned.sortedByCoord.out
2021-10-25 18:14:26,476 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-4_Aligned.sortedByCoord.out.bam
2021-10-25 18:16:03,264 (PID:154932) - INFO - Detect Intron retention SA23-4_Aligned.sortedByCoord.out
2021-10-25 18:17:57,412 (PID:154932) - INFO - Done Reading file SA23-4_Aligned.sortedByCoord.out
2021-10-25 18:17:58,460 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-5_Aligned.sortedByCoord.out.bam
2021-10-25 18:19:32,521 (PID:154932) - INFO - Detect Intron retention SA23-5_Aligned.sortedByCoord.out
2021-10-25 18:21:22,455 (PID:154932) - INFO - Done Reading file SA23-5_Aligned.sortedByCoord.out
2021-10-25 18:21:23,498 (PID:154932) - INFO - Reading bam file User/george/genex_splicing_project/star_alignment_ensembl//SA23-6_Aligned.sortedByCoord.out.bam
2021-10-25 18:22:58,579 (PID:154932) - INFO - Detect Intron retention SA23-6_Aligned.sortedByCoord.out
2021-10-25 18:24:49,164 (PID:154932) - INFO - Done Reading file SA23-6_Aligned.sortedByCoord.out
2021-10-25 18:24:50,392 (PID:154932) - INFO - Detecting LSVs ngenes: 60664
2021-10-25 18:25:27,084 (PID:154932) - INFO - 179106 LSV found
2021-10-25 18:25:27,188 (PID:154932) - INFO - DUMP file b'H9-1_Aligned.sortedByCoord.out'
2021-10-25 18:25:47,345 (PID:154932) - INFO - Create majiq file
2021-10-25 18:25:50,146 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:25:51,657 (PID:154932) - INFO - H9-1_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:25:51,841 (PID:154932) - INFO - DUMP file b'H9-2_Aligned.sortedByCoord.out'
2021-10-25 18:26:09,163 (PID:154932) - INFO - Create majiq file
2021-10-25 18:26:11,966 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:26:13,378 (PID:154932) - INFO - H9-2_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:26:13,564 (PID:154932) - INFO - DUMP file b'H9-3_Aligned.sortedByCoord.out'
2021-10-25 18:26:30,372 (PID:154932) - INFO - Create majiq file
2021-10-25 18:26:33,149 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:26:34,587 (PID:154932) - INFO - H9-3_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:26:34,778 (PID:154932) - INFO - DUMP file b'H9-4_Aligned.sortedByCoord.out'
2021-10-25 18:26:51,076 (PID:154932) - INFO - Create majiq file
2021-10-25 18:26:53,837 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:26:55,406 (PID:154932) - INFO - H9-4_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:26:55,600 (PID:154932) - INFO - DUMP file b'H9-5_Aligned.sortedByCoord.out'
2021-10-25 18:27:11,770 (PID:154932) - INFO - Create majiq file
2021-10-25 18:27:14,525 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:27:16,115 (PID:154932) - INFO - H9-5_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:27:16,312 (PID:154932) - INFO - DUMP file b'H9-6_Aligned.sortedByCoord.out'
2021-10-25 18:27:32,323 (PID:154932) - INFO - Create majiq file
2021-10-25 18:27:35,041 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:27:36,631 (PID:154932) - INFO - H9-6_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:27:36,827 (PID:154932) - INFO - DUMP file b'SA23-1_Aligned.sortedByCoord.out'
2021-10-25 18:27:52,450 (PID:154932) - INFO - Create majiq file
2021-10-25 18:27:55,185 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:27:56,728 (PID:154932) - INFO - SA23-1_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:27:56,917 (PID:154932) - INFO - DUMP file b'SA23-2_Aligned.sortedByCoord.out'
2021-10-25 18:28:12,646 (PID:154932) - INFO - Create majiq file
2021-10-25 18:28:15,287 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:28:16,741 (PID:154932) - INFO - SA23-2_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:28:16,921 (PID:154932) - INFO - DUMP file b'SA23-3_Aligned.sortedByCoord.out'
2021-10-25 18:28:32,733 (PID:154932) - INFO - Create majiq file
2021-10-25 18:28:35,445 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:28:37,032 (PID:154932) - INFO - SA23-3_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:28:37,220 (PID:154932) - INFO - DUMP file b'SA23-4_Aligned.sortedByCoord.out'
2021-10-25 18:28:53,067 (PID:154932) - INFO - Create majiq file
2021-10-25 18:28:55,793 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:28:57,329 (PID:154932) - INFO - SA23-4_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:28:57,520 (PID:154932) - INFO - DUMP file b'SA23-5_Aligned.sortedByCoord.out'
2021-10-25 18:29:12,985 (PID:154932) - INFO - Create majiq file
2021-10-25 18:29:15,680 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:29:17,301 (PID:154932) - INFO - SA23-5_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:29:17,491 (PID:154932) - INFO - DUMP file b'SA23-6_Aligned.sortedByCoord.out'
2021-10-25 18:29:33,118 (PID:154932) - INFO - Create majiq file
2021-10-25 18:29:35,820 (PID:154932) - INFO - Dump majiq file
2021-10-25 18:29:37,413 (PID:154932) - INFO - SA23-6_Aligned.sortedByCoord.out: 138085 LSVs
2021-10-25 18:29:38,821 (PID:154932) - INFO - MAJIQ Builder is ended successfully!


Thank you,
George  


Bea C

Oct 27, 2021, 8:37:59 AM
to majiq_voila

Hi Joseph, George and Paula,

Thank you, George, for your contribution in the last message.

I had the same error. I saw that you could obtain results despite it, so I tried as well. I can identify some LSVs (n=60384). Isn't this number rather low?
I can see that I don't have many novel LSVs in my data, and I don't know whether that is real or whether I am missing some of them because of the error in the GFF3 file.

I have the same question as George. Can we ignore the error and continue with the analysis, or should we take time to troubleshoot using the feature branch?
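
To put the number of these messages in context, one option is to tally them from the build log and compare the count against the gene total reported on the `Detecting LSVs ngenes` line. The sketch below is not a MAJIQ utility; it only assumes the log line format shown in this thread, and `summarize_build_log` is a hypothetical helper name. It does not tell you whether the flagged genes were skipped, only how many records triggered the message in the log you feed it.

```python
import re

# Substring of the message as it appears in the build logs in this thread.
MISSING_MSG = "Gene doesn't contain one of the Name attribute"
NGENES_RE = re.compile(r"Detecting LSVs ngenes: (\d+)")
SAMPLE_RE = re.compile(r"INFO - (\S+): (\d+) LSVs$")


def summarize_build_log(lines):
    """Return (missing_name_messages, ngenes, {sample: lsv_count})."""
    missing = 0
    ngenes = None
    lsvs = {}
    for line in lines:
        line = line.rstrip()
        if MISSING_MSG in line:
            missing += 1
            continue
        match = NGENES_RE.search(line)
        if match:
            ngenes = int(match.group(1))
            continue
        match = SAMPLE_RE.search(line)
        if match:
            lsvs[match.group(1)] = int(match.group(2))
    return missing, ngenes, lsvs


# A few lines copied from the log in this thread.
sample_log = [
    "2021-10-27 11:15:23,199 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']",
    "2021-10-27 11:15:23,200 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']",
    "2021-10-27 11:17:42,415 (PID:3972689) - INFO - Detecting LSVs ngenes: 60664",
    "2021-10-27 11:18:06,477 (PID:3972689) - INFO - 64925 LSV found",
    "2021-10-27 11:18:15,485 (PID:3972689) - INFO - 65.1.genome.sorted: 60384 LSVs",
]
result = summarize_build_log(sample_log)
print(result)  # (2, 60664, {'65.1.genome.sorted': 60384})
```

For a full run, pass the lines of the saved build log instead of `sample_log`; a log that was truncated when pasted will undercount the messages.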

THANK YOU!

2021-10-27 11:15:23,199 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,200 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,201 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,201 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,201 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,204 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:23,205 (PID:3972689) - INFO - Error, Gene doesn't contain one of the Name attribute  information values: ['Name', 'gene_name']
2021-10-27 11:15:26,051 (PID:3972689) - INFO - Reading bamfiles
2021-10-27 11:15:26,061 (PID:3972689) - INFO - Group case, number of experiments: 2, minexperiments: 1
2021-10-27 11:15:26,061 (PID:3972689) - INFO - Reading bam file /65.1.genome.sorted.bam
2021-10-27 11:15:39,547 (PID:3972689) - INFO - Detect Intron retention 65.1.genome.sorted
2021-10-27 11:15:58,417 (PID:3972689) - INFO - Done Reading file 65.1.genome.sorted
2021-10-27 11:15:58,694 (PID:3972689) - INFO - Reading bam file /66.1.genome.sorted.bam
2021-10-27 11:16:11,752 (PID:3972689) - INFO - Detect Intron retention 66.1.genome.sorted
2021-10-27 11:16:30,113 (PID:3972689) - INFO - Done Reading file 66.1.genome.sorted
2021-10-27 11:16:30,348 (PID:3972689) - INFO - Group control, number of experiments: 2, minexperiments: 1
2021-10-27 11:16:30,349 (PID:3972689) - INFO - Reading bam file 70.1.genome.sorted.bam
2021-10-27 11:16:46,650 (PID:3972689) - INFO - Detect Intron retention 70.1.genome.sorted
2021-10-27 11:17:07,903 (PID:3972689) - INFO - Done Reading file 70.1.genome.sorted
2021-10-27 11:17:08,129 (PID:3972689) - INFO - Reading bam file 71.1.genome.sorted.bam
2021-10-27 11:17:22,799 (PID:3972689) - INFO - Detect Intron retention 71.1.genome.sorted
2021-10-27 11:17:42,112 (PID:3972689) - INFO - Done Reading file 71.1.genome.sorted
2021-10-27 11:17:42,415 (PID:3972689) - INFO - Detecting LSVs ngenes: 60664
2021-10-27 11:18:06,477 (PID:3972689) - INFO - 64925 LSV found
2021-10-27 11:18:06,498 (PID:3972689) - INFO - DUMP file b'65.1.genome.sorted'
2021-10-27 11:18:13,339 (PID:3972689) - INFO - Create majiq file
2021-10-27 11:18:14,862 (PID:3972689) - INFO - Dump majiq file
2021-10-27 11:18:15,485 (PID:3972689) - INFO - 65.1.genome.sorted: 60384 LSVs
2021-10-27 11:18:15,511 (PID:3972689) - INFO - DUMP file b'66.1.genome.sorted'
2021-10-27 11:18:22,708 (PID:3972689) - INFO - Create majiq file
2021-10-27 11:18:24,143 (PID:3972689) - INFO - Dump majiq file
2021-10-27 11:18:24,774 (PID:3972689) - INFO - 66.1.genome.sorted: 60384 LSVs
2021-10-27 11:18:24,800 (PID:3972689) - INFO - DUMP file b'70.1.genome.sorted'
2021-10-27 11:18:32,328 (PID:3972689) - INFO - Create majiq file
2021-10-27 11:18:33,721 (PID:3972689) - INFO - Dump majiq file
2021-10-27 11:18:34,332 (PID:3972689) - INFO - 70.1.genome.sorted: 60384 LSVs
2021-10-27 11:18:34,358 (PID:3972689) - INFO - DUMP file b'71.1.genome.sorted'
2021-10-27 11:18:41,539 (PID:3972689) - INFO - Create majiq file
2021-10-27 11:18:42,876 (PID:3972689) - INFO - Dump majiq file
2021-10-27 11:18:43,502 (PID:3972689) - INFO - 71.1.genome.sorted: 60384 LSVs
2021-10-27 11:18:44,373 (PID:3972689) - INFO - MAJIQ Builder is ended successfully!