I am trying to do exactly the same thing as the person above: add eGFP to a genome. I followed the directions above and then created a FASTQ file with 4 reads that should all align to the eGFP gene. When I run the alignment and look at eGFPAlignedtoTranscriptome.out.bam, only 1 of the 4 reads is present.
When I look at eGFPAligned.out.bam, all four reads are present and are aligned to the eGFP chromosome:
DGR4KXP1:301:H7LRPADXX:2:1207:10120:44392 0 eGFP 508 255 100M * 0 0 CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGA CCCFFFFFHHHHHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJIIJJJJJJJJJJHHHHFFFDDDDDDDDDDDDDDDDDDDDDDD@DD NH:i:1 HI:i:1 AS:i:98 nM:i:0
DGR4KXP1:301:H7LRPADXX:2:1207:10151:44403 0 eGFP 339 255 100M * 0 0 GAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTAC CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJHIJJJJJJJIJJJJJJJGIIJJJJJJJJJJIJJJJJHHHHHHHFFFFFFEEEFFEEDDDEDDDD NH:i:1 HI:i:1 AS:i:98 nM:i:0
DGR4KXP1:301:H7LRPADXX:2:1207:10488:44274 0 eGFP 170 255 100M * 0 0 CCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCC CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJIJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJIIJJJJJJJHHHHHFFFFFEEEEEDDBDDDDDDDDDDEE NH:i:1 HI:i:1 AS:i:98 nM:i:0
DGR4KXP1:301:H7LRPADXX:2:1207:10340:44430 0 eGFP 1 255 100M * 0 0 ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG CCCFFFFFHHHHHIJJJJJJJJJJJJJJJJJJJJJJJJJJHJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFFFEEEEEEEEEEEDDDDDDDDDDDD NH:i:1 HI:i:1 AS:i:98 nM:i:0
I don't understand why the transcriptome output does not include all four reads as aligning to eGFP.
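In case it helps anyone reproduce this, here is a rough sketch of how the two BAM files could be compared. It lists which read names are in the genome BAM but missing from the transcriptome BAM, and prints the start and end of each genomic alignment so they can be compared with the exon coordinates in the GTF. The helper names (missing_reads, aln_span) and the intermediate text files are made up for illustration, and the usage lines assume samtools is installed.

```shell
# Hypothetical helpers, not part of STAR or samtools.

# Print names listed in file $1 that are absent from file $2 (one name per line).
missing_reads() {
  a=$(mktemp) b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# For SAM records on stdin, print read name, start, and end position.
# Only handles ungapped CIGARs of the form <N>M, like the 100M records above.
aln_span() {
  awk '$1 !~ /^@/ && $6 ~ /^[0-9]+M$/ {
    len = $6; sub(/M$/, "", len)
    print $1, $4, $4 + len - 1
  }'
}

# Example usage (assumes samtools is installed; BAM names as in this post):
#   samtools view eGFPAligned.out.bam | cut -f1 > genome_names.txt
#   samtools view eGFPAlignedtoTranscriptome.out.bam | cut -f1 > tx_names.txt
#   missing_reads genome_names.txt tx_names.txt
#   samtools view eGFPAligned.out.bam | aln_span
```

This only narrows down which reads are dropped and where they sit on eGFP. It does not by itself explain why they are dropped.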
The exact steps I took to generate the genome index are as follows.
1. Created an eGFP FASTA file called eGFP.fa
First five lines of the file:
>eGFP
ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACC
CTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAG
2. Added eGFP to my GTF file
Last 2 lines of the GTF, with the last line being the one I added for eGFP:
chrsM ENSEMBL exon 15356 15422 . - . gene_id "ENSMUSG00000064372.1"; transcript_id "ENSMUST00000082423.1"; gene_type "Mt_tRNA"; gene_status "KNOWN"; gene_name "mt-Tp"; transcript_type "Mt_tRNA"; transcript_status "KNOWN"; transcript_name "mt-Tp-201"; exon_number 1; exon_id "ENSMUSE00000521550.1"; level 3; transcript_support_level "NA"; tag "basic";
eGFP AddedGenes exon 1 720 . + 0 gene_id "eGFP"; transcript_id "eGFP";
3. Loaded star/2.4.2a
4. Ran the genome generation command
bsub -n 8 -R "span[hosts=1]" STAR --runMode genomeGenerate \
--runThreadN 8 \
--genomeFastaFiles GRCm38.p4.genome.fa eGFP.fa \
--genomeDir gencode.vM10eGFP2 \
--sjdbGTFfile gencode.vM10.annotation.gtf \
--sjdbOverhang 100
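To rule out a problem with the inputs themselves, the sketch below checks two things. First, the sequence in eGFP.fa should be as long as the exon end in the GTF line (720). Second, the added GTF line should have the 9 tab-separated fields that the GTF format expects, since a line pasted with spaces instead of tabs would not parse as intended. The function names fasta_len and check_gtf_fields are hypothetical helpers written for this post, not STAR tools.

```shell
# Hypothetical helpers, not part of STAR.

# Total number of bases in a single-record FASTA file (header lines skipped).
fasta_len() {
  awk '/^>/ { next } { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Report GTF lines on stdin that do not have exactly 9 tab-separated fields.
# Exits non-zero if any such line is found; comment lines (#) are ignored.
check_gtf_fields() {
  awk -F'\t' '/^#/ { next }
    NF != 9 { printf "line %d: %d tab-separated fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }'
}

# Example usage (file names taken from the steps above):
#   fasta_len eGFP.fa
#   tail -n 2 gencode.vM10.annotation.gtf | check_gtf_fields
```

If fasta_len does not report the same length as the exon end in the GTF, or check_gtf_fields flags the eGFP line, that would be the first thing to fix before regenerating the index.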
Any help would be greatly appreciated, and let me know if I can provide any more information.
Thanks,
Matt