Hi Manu,
1. SAM input to HTSeq:
I believe this is caused by the jM:B:c and jI:B:i SAM attributes (please see this post). The solution is to avoid the --outSAMattributes All option.
With the latest STAR patch, you can specify which attributes you need in the SAM output:
--outSAMattributes    a string of desired SAM attributes, in the order desired for the output SAM
    NH HI AS nM NM MD jM jI XS : any combination, in any order
    Standard : NH HI AS nM
    All      : NH HI AS nM NM MD jM jI
    None     : no attributes
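If re-running STAR with an explicit attribute list is not practical, one possible workaround (an untested assumption, not part of the recommended fix above) is to drop the jM and jI fields from an existing SAM file before passing it to HTSeq. A minimal Python sketch, assuming standard tab-separated SAM with 11 mandatory columns; strip_tags, filter_sam and DROP_TAGS are hypothetical names:

```python
from typing import Iterable, TextIO

# STAR-specific junction attributes suspected of confusing HTSeq
# (hypothetical default set; adjust as needed).
DROP_TAGS = frozenset({"jM", "jI"})


def strip_tags(line: str, drop: frozenset = DROP_TAGS) -> str:
    """Return one SAM line with the optional fields whose tag is in `drop` removed."""
    if line.startswith("@"):
        return line  # header lines are passed through unchanged
    fields = line.rstrip("\n").split("\t")
    # SAM has 11 mandatory columns; everything after them is TAG:TYPE:VALUE
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in drop]
    return "\t".join(mandatory + kept) + "\n"


def filter_sam(lines: Iterable[str], out: TextIO) -> None:
    """Write every input line to `out` with the unwanted tags stripped."""
    for line in lines:
        out.write(strip_tags(line))
```

For example, filter_sam(sys.stdin, sys.stdout) in a small script would turn Aligned.out.sam into a copy without the jM/jI fields. Regenerating the SAM with an explicit --outSAMattributes list is still the cleaner route.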
2. I am not sure why RSEM fails to generate the transcript sequences; it's better to ask about it on the RSEM mailing list. I have used it once with Ensembl, like this:
rsem-prepare-reference --no-polyA --no-bowtie --no-ntog --gtf Homo_sapiens.GRCh37.66.chrOnly.gtf fullGenome.fa RSEMtr
It worked fine.
What is the size of the UCSC transcriptome file? It should be ~300MB.
Please try this option: --genomeChrBinNbits 14 (if it still does not work, go even lower, to 13 or 12). Keep --limitGenomeGenerateRAM 7000000000, as it prevents your machine from swapping.
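For picking a starting value of --genomeChrBinNbits, the STAR manual gives a scaling rule for references made of many sequences: min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). A minimal Python sketch of that rule, assuming a plain FASTA input; rounding down to an integer and the default read length of 100 are assumptions of this sketch, and fasta_stats/chr_bin_nbits are hypothetical helper names:

```python
import math
from typing import Tuple


def fasta_stats(path: str) -> Tuple[int, int]:
    """Return (total sequence length, number of sequences) for a FASTA file."""
    total, n_refs = 0, 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                n_refs += 1  # each header line starts a new reference
            else:
                total += len(line.strip())  # count bases, ignoring newlines
    return total, n_refs


def chr_bin_nbits(genome_length: int, n_refs: int, read_length: int = 100) -> int:
    """min(18, log2(max(genome_length / n_refs, read_length))), rounded down."""
    per_ref = max(genome_length / n_refs, read_length)
    return min(18, int(math.log2(per_ref)))
```

For example, total, n = fasta_stats("transcriptome.fa") followed by chr_bin_nbits(total, n) gives a candidate value (the file name is hypothetical). A transcriptome with many short sequences yields values well below the default of 18, which is why lowering this parameter helps.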
Cheers
Alex