Thanks so much for the response. I'll test your recommendations, but in the meantime, here is an example Log.final.out where we had issues:
Started job on | Jun 14 21:10:05
Started mapping on | Jun 14 21:21:18
Finished on | Jun 14 21:30:55
Mapping speed, Million of reads per hour | 813.41
Number of input reads | 130371712
Average input read length | 60
UNIQUE READS:
Uniquely mapped reads number | 77430509
Uniquely mapped reads % | 59.39%
Average mapped length | 59.18
Number of splices: Total | 5823745
Number of splices: Annotated (sjdb) | 5776569
Number of splices: GT/AG | 5678374
Number of splices: GC/AG | 22112
Number of splices: AT/AC | 5403
Number of splices: Non-canonical | 117856
Mismatch rate per base, % | 1.12%
Deletion rate per base | 0.03%
Deletion average length | 1.42
Insertion rate per base | 0.03%
Insertion average length | 1.27
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 38461826
% of reads mapped to multiple loci | 29.50%
Number of reads mapped to too many loci | 456
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 10.90%
% of reads unmapped: other | 0.21%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
For comparison, here is the Log.final.out from our mapping to the human reference:
Started job on | Mar 19 00:41:17
Started mapping on | Mar 19 01:03:00
Finished on | Mar 19 01:40:06
Mapping speed, Million of reads per hour | 961.75
Number of input reads | 594681159
Average input read length | 66
UNIQUE READS:
Uniquely mapped reads number | 459120957
Uniquely mapped reads % | 77.20%
Average mapped length | 65.53
Number of splices: Total | 128971189
Number of splices: Annotated (sjdb) | 128712920
Number of splices: GT/AG | 127927630
Number of splices: GC/AG | 610176
Number of splices: AT/AC | 37485
Number of splices: Non-canonical | 395898
Mismatch rate per base, % | 0.96%
Deletion rate per base | 0.01%
Deletion average length | 1.28
Insertion rate per base | 0.00%
Insertion average length | 1.19
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 97235568
% of reads mapped to multiple loci | 16.35%
Number of reads mapped to too many loci | 1127
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 4.22%
% of reads unmapped: other | 2.23%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
The difference in multi-mappers is about 13 percentage points (29.50% vs. 16.35%), and it seems to significantly change the percentage of pseudogenes called later. I figured it likely had to do with the genome assembly, but I'm not sure why the mouse and human genomes (from the same source) would differ.
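
In case it helps anyone reproduce the comparison, here is a rough Python sketch of how the two summaries above can be diffed. It assumes the "key | value" layout shown in the logs; the helper names (parse_star_log, percent) are my own and not part of STAR, and I am treating the first log as the mouse run. Only a few of the values pasted above are copied into it.

```python
def parse_star_log(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Return a percentage field such as '29.50%' as a float."""
    return float(stats[key].rstrip("%"))


# Values copied from the two logs above (first log assumed to be mouse).
mouse = parse_star_log("""\
UNIQUE READS:
Uniquely mapped reads % | 59.39%
MULTI-MAPPING READS:
% of reads mapped to multiple loci | 29.50%
""")

human = parse_star_log("""\
UNIQUE READS:
Uniquely mapped reads % | 77.20%
MULTI-MAPPING READS:
% of reads mapped to multiple loci | 16.35%
""")

for key in ("Uniquely mapped reads %", "% of reads mapped to multiple loci"):
    diff = percent(mouse, key) - percent(human, key)
    print(f"{key}: {diff:+.2f} percentage points (mouse minus human)")
```

With real files, the same function should work on open("Log.final.out").read(), since the keys are stripped of surrounding whitespace and tabs.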