Dear everyone!
I want to retrieve CDS sequences from the MAKER output. However, the augustus_masked features carry no CDS or exon annotation the way the maker features do. Is there any way to retrieve CDS from augustus_masked? There were protein sequences in the output directory but no CDS information.
Thank you!
Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea
_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Thanks for your quick response, Mike.
I looked at the transcript file, but I suspect it may include UTRs. I want to calculate Ka/Ks values, so I need coding sequences. Is there any indication of where the exact START and STOP are in the transcript file?
Thank you
Thank you for the quick response again!
I found sequences that do not start with ATG in the transcript file. I thought these would be UTR traces, and I also found an offset value some way after the ‘>’ character in the header. Does that indicate the position of the starting ATG?
Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files? I am asking because some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.
Thank you
Only maker.transcripts.fasta will have offsets other than 0; you can use these to find where translation starts within the transcript. All other *.transcript.fasta files will always have an offset of 0 for the reason previously mentioned. Some genes will not start with ATG or will lack stop codons. These are partial models. Set always_complete=1 to reduce these.
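As a rough illustration of using the offset, here is a minimal Python sketch. It assumes each header in maker.transcripts.fasta carries a token of the form `offset:N`, and that the coding sequence runs in frame from that position to the first stop codon (or to the last complete codon for partial models). The function names `read_fasta` and `extract_cds` are made up for this example and are not part of MAKER.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def extract_cds(header, seq):
    """Return the in-frame coding sequence starting at the header's offset.

    Assumption: the header contains 'offset:N'; if it does not, 0 is used.
    The CDS ends at the first in-frame stop codon (inclusive), or at the
    last complete codon when no stop codon is found (partial model).
    """
    match = re.search(r"offset:(\d+)", header)
    offset = int(match.group(1)) if match else 0
    seq = seq.upper()
    end = offset
    for i in range(offset, len(seq) - 2, 3):
        end = i + 3
        if seq[i:i + 3] in STOP_CODONS:
            break
    return seq[offset:end]
```

To use it on a real file, something like `for h, s in read_fasta(open("maker.transcripts.fasta")): print(">" + h.split()[0]); print(extract_cds(h, s))` should work; the file name here is only a placeholder for your own output. It is worth checking a few translated results against the matching entries in maker.proteins.fasta before running Ka/Ks on them.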
Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?
Final selected annotations go in the maker.proteins.fasta and maker.transcripts.fasta files. Raw unfiltered ab initio predictions from augustus go in the augustus_masked.proteins.fasta and augustus_masked.transcripts.fasta files (these are for reference purposes). A set of non-redundant rejected models goes in the non_overlapping_ab_initio.transcripts.fasta and non_overlapping_ab_initio.proteins.fasta files (if you are missing a gene you expected to find, look in these files first; you can add the models back if, for example, you find protein domains in them).
The reason why I’m asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.
This is because some of the augustus generated models made it into the final annotation set.
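If you want to see which augustus models were carried into the final set, one rough way is to compare the protein sequences in the two files. The Python sketch below does this by exact sequence match, ignoring case and a trailing `*`. It is only an approximation: it assumes an accepted model keeps an identical protein sequence in both files, and `shared_sequences` is a name made up for this example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def seq_id(header):
    """Return the first whitespace-delimited token of a FASTA header."""
    parts = header.split()
    return parts[0] if parts else header

def shared_sequences(fasta_a, fasta_b):
    """Map each protein sequence found in both inputs to (ids in a, ids in b)."""
    ids_by_seq = {}
    for header, seq in read_fasta(fasta_a):
        key = seq.upper().rstrip("*")
        ids_by_seq.setdefault(key, []).append(seq_id(header))
    shared = {}
    for header, seq in read_fasta(fasta_b):
        key = seq.upper().rstrip("*")
        if key in ids_by_seq:
            shared.setdefault(key, (ids_by_seq[key], []))[1].append(seq_id(header))
    return shared
```

Passing the open augustus_masked.proteins.fasta file as `fasta_a` and the open maker.proteins.fasta file as `fasta_b` gives the sequences that appear in both, together with their IDs in each file.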