Very large Unmapped.out.mate files

olfe...@gmail.com

Jan 5, 2025, 11:14:24 PM
to rna-star
Happy New Year Everyone,

My Unmapped.out.mate1 and Unmapped.out.mate2 FASTQ files are very large (3 GB each). My input files, HD0096V01_1.fq.gz and HD0096V01_2.fq.gz, are 4.1 and 4.2 GB, respectively. The uniquely mapped read rate is 82.16%. The output BAM file takes up only 5.4 GB, while Unmapped.out.mate1 and Unmapped.out.mate2 take up 3 GB each. My data come from BGI DNBSEQ150. Based on my previous experience, each Unmapped.out.mate file should be about 0.7 GB and the output BAM about 8 GB. I recently updated STAR to version 2.7.11b. I have run a batch of files and they all show the same problem.

Any idea what could be the reason for that?



My command is as below:
```bash
STAR --genomeDir GRCh38_release_44_AR \
     --genomeLoad LoadAndKeep \
     --readFilesIn HD0096V01_1.fq.gz HD0096V01_2.fq.gz \
     --readFilesCommand zcat \
     --runThreadN 32 \
     --outFilterMismatchNoverLmax 0.05 \
     --outFilterScoreMinOverLread 0.90 \
     --outFilterMatchNminOverLread 0.90 \
     --outFilterMultimapNmax 20 \
     --alignIntronMax 1000000 \
     --outReadsUnmapped Fastx \
     --quantMode GeneCounts \
     --outStd SAM
```
My Log.final.out is as follows:

```text
Started job on | Jan 05 16:55:29
Started mapping on | Jan 05 16:55:30
Finished on | Jan 05 17:03:22
Mapping speed, Million of reads per hour | 458.87

Number of input reads | 60162948
Average input read length | 299
 UNIQUE READS:
Uniquely mapped reads number | 49427665
Uniquely mapped reads % | 82.16%
 Average mapped length | 298.33
Number of splices: Total | 24747153
Number of splices: Annotated (sjdb) | 24359267
Number of splices: GT/AG | 24393941
Number of splices: GC/AG | 198375
Number of splices: AT/AC | 9652
Number of splices: Non-canonical | 145185
Mismatch rate per base, % | 0.38%
Deletion rate per base | 0.01%
Deletion average length | 1.77
Insertion rate per base | 0.01%
Insertion average length | 1.51
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1848269
 % of reads mapped to multiple loci | 3.07%
 Number of reads mapped to too many loci | 4099
% of reads mapped to too many loci | 0.01%
UNMAPPED READS:
 Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
 Number of reads unmapped: too short | 8854378
% of reads unmapped: too short | 14.72%
 Number of reads unmapped: other | 28537
% of reads unmapped: other | 0.05%
CHIMERIC READS:
Number of chimeric reads | 0
 % of chimeric reads | 0.00%
```
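
As a rough sanity check on the numbers in this log, the sketch below estimates how large each Unmapped.out.mate file would be if it holds every read pair counted under UNMAPPED READS as plain, uncompressed FASTQ. The counts are copied from the log above. The per-mate length of 150 bases (taken from the average input read length of 299 for the pair) and the 40-byte read name are assumptions made for illustration, not values reported by STAR, so the result is only an order-of-magnitude estimate.

```shell
#!/usr/bin/env bash
# Counts copied from the UNMAPPED READS section of the Log.final.out above.
too_many_mismatches=0
too_short=8854378
other=28537

# Read pairs reported as unmapped.
unmapped=$(( too_many_mismatches + too_short + other ))

# Assumption: each mate is about 150 bases (the log reports 299 for the pair).
mate_len=150
# Assumption: read name line of about 40 bytes (hypothetical value).
header_bytes=40

# One FASTQ record = name line + sequence line + "+" line + quality line,
# each terminated by a newline.
record_bytes=$(( (header_bytes + 1) + (mate_len + 1) + 2 + (mate_len + 1) ))

# Estimated uncompressed size of one mate file, in bytes and in GB (10^9 bytes).
est_bytes=$(( unmapped * record_bytes ))
est_gb=$(awk -v b="$est_bytes" 'BEGIN { printf "%.1f", b / 1e9 }')

echo "unmapped read pairs:          $unmapped"
echo "bytes per FASTQ record:       $record_bytes"
echo "estimated size per mate file: ${est_gb} GB"
```

Under these assumptions the estimate lands close to 3 GB per mate file. That suggests it is worth comparing uncompressed sizes with uncompressed sizes (the inputs here are gzip-compressed) and looking at the 14.72% "too short" fraction, before suspecting the file writing itself.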
