rsem-calculate-expression: 0 expected read counts


pathunk

Apr 25, 2012, 6:17:25 PM4/25/12
to RSEM Users
Hi rsem community,

When I run rsem-calculate-expression, the expected counts for nearly
all my 'genes' (contigs) are 0. The very few that are not 0 are very
low numbers. This is puzzling since the reference was assembled with
the same libraries that I gave to rsem to calculate expression. All 6
paired-end libraries are experimental expression assays from the same
genotype.

I built a de novo transcriptome assembly using ABySS. The assembly has
261,872 contigs and N50 = 1389. The assembly is rough, as indicated by
the very large number of contigs, but I would still expect an
appreciable fraction of reads to align. Yet RSEM's output reports:

/Volumes/pichia/aman/bowtie-0.12.7/bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 4 -a -m 200 -S k76_nn/rsem_k76_nn/k76_nn_ref -1 fp_asg105.fa,fp_asg106.fa,fp_asg107.fa,fp_asg108.fa,fp_asg109.fa,fp_asg110.fa -2 rp_asg105.fa,rp_asg106.fa,rp_asg107.fa,rp_asg108.fa,rp_asg109.fa,rp_asg110.fa | gzip > k76_nn_rsem.sam.gz
# reads processed: 55173258
# reads with at least one reported alignment: 6872 (0.01%)
# reads that failed to align: 55166386 (99.99%)
Reported 106235 paired-end alignments to 1 output stream(s)

Looking over the posts in this group, others seem to have had this
problem but I couldn't find a solution. Any ideas on what's going
wrong? Could this be a problem with bowtie? For other purposes I've
been using bowtie2 for alignments since it's recommended for longer
read lengths (mine are 100bp reads).

My commands:
/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-prepare-reference --no-polyA --bowtie-path /Volumes/pichia/aman/bowtie-0.12.7 Ua_76_nn-contigs.fa k76_nn_ref

nohup /Volumes/pichia/aman/rsem-1.1.18-modified/rsem-calculate-expression -p 4 --calc-ci --tag non-unique --bowtie-path /Volumes/pichia/aman/bowtie-0.12.7 --paired-end fp_asg105.fa,fp_asg106.fa,fp_asg107.fa,fp_asg108.fa,fp_asg109.fa,fp_asg110.fa rp_asg105.fa,rp_asg106.fa,rp_asg107.fa,rp_asg108.fa,rp_asg109.fa,rp_asg110.fa k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem &

Here's the first few lines of my output file,
k76_nn_rsem.genes.results:
1 0.00 0 0 4.72016114363459e-05 1
10 0.00 0 0 5.76845577305948e-06 10
100 0.00 0 0 0 100
1000 0.00 0 0 1.80834669529904e-05 1000
10000 0.00 0 0 0 10000
100000 0.00 0 0 0 100000
100001 0.00 0 0 0 100001
100002 0.00 0 0 0 100002
100004 0.00 0 0 0 100004
100005 0.00 0 0 0 100005
100007 0.00 0 0 0 100007
100008 0.00 0 0 0 100008
100009 0.00 0 0 0 100009
10001 0.00 0 0 3.05669191568949e-06 10001
100010 0.00 0 0 0 100010
100013 0.00 0 0 0 100013

k76_nn_rsem.isoforms.results begins the same way and is nearly
identical, as expected given that this is a raw unannotated assembly.

Colin Dewey

Apr 27, 2012, 2:15:45 PM4/27/12
to rsem-...@googlegroups.com
Hi pathunk,

I noticed that you have paired-end reads. One thing you might check is whether your assembler is using the paired-end data to scaffold your contigs, or if it is simply returning just the contigs. RSEM requires that both ends of a read pair align to the *same transcript/contig sequence* in order for that read pair to have a valid alignment. If your assembly is not scaffolded, then many of your read pairs will not have a valid alignment according to this criterion.
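
The same-contig requirement can be checked directly on the bowtie SAM output before running RSEM. Below is a rough sketch (not part of RSEM; the demo records are made up): in SAM, a paired alignment record whose mate maps to the same reference sequence carries '=' in the RNEXT column.

```python
# Rough sketch (not an RSEM tool): for each aligned pair record in a SAM
# file, check whether the mate maps to the same reference sequence.
# Column 7 (RNEXT) is '=' when the mate is on the same reference.
def same_contig_fraction(sam_lines):
    """Return (records_with_mate_on_same_ref, paired_records_total)."""
    same = total = 0
    for line in sam_lines:
        if line.startswith('@'):          # skip header records
            continue
        fields = line.rstrip('\n').split('\t')
        flag = int(fields[1])
        if not (flag & 0x1) or (flag & 0x4) or (flag & 0x8):
            continue                      # unpaired, or read/mate unmapped
        total += 1
        if fields[6] == '=':              # mate on the same reference
            same += 1
    return same, total

# Tiny made-up example: one pair on the same contig, one pair split
# across two contigs (the latter is not a valid pair for RSEM).
demo = [
    "@SQ\tSN:contig1\tLN:1500",
    "r1\t99\tcontig1\t10\t255\t100M\t=\t200\t290\tACGT\tIIII",
    "r1\t147\tcontig1\t200\t255\t100M\t=\t10\t-290\tACGT\tIIII",
    "r2\t97\tcontig1\t50\t255\t100M\tcontig2\t5\t0\tACGT\tIIII",
    "r2\t145\tcontig2\t5\t255\t100M\tcontig1\t50\t0\tACGT\tIIII",
]
print(same_contig_fraction(demo))  # (2, 4): half the records pair within one contig
```

If most pairs land on different contigs, that points at the unscaffolded assembly rather than at bowtie or RSEM.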

If you can only get contigs, then for quantification with RSEM you will need to use only one end of your paired-end reads (i.e., treat one end of your paired-end reads as single-end read data). For example, try:

nohup /Volumes/pichia/aman/rsem-1.1.18-modified/rsem-calculate-expression -p 4 --calc-ci --tag non-unique --bowtie-path /Volumes/pichia/aman/bowtie-0.12.7 fp_asg105.fa,fp_asg106.fa,fp_asg107.fa,fp_asg108.fa,fp_asg109.fa,fp_asg110.fa k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem &

Also, no need to use bowtie2 here. RSEM uses bowtie (v1) in a customized way for optimal quantification and has no problem with longer reads (with the exception that indels are currently not supported).

Best,
Colin

pathunk

May 5, 2012, 9:59:22 AM5/5/12
to RSEM Users
Hi Colin,
Thanks for your input. As you suggested, I ran rsem-calculate-expression
using only my forward reads, treating them as single-end. 86% of the
reads aligned, a big improvement. But RSEM is getting stuck at
rsem-run-em. I searched for threads addressing this problem, but
couldn't find anything that seems likely to apply to me. When I ran this
version of RSEM with paired-end reads, it didn't get stuck at this step
(though close to 0% of my reads aligned).

command:
nohup /Volumes/pichia/aman/rsem-1.1.18-modified/rsem-calculate-expression -p 4 --calc-ci --tag non-unique --bowtie-path /Volumes/pichia/aman/bowtie-0.12.7 fp_asg105.fa,fp_asg106.fa,fp_asg107.fa,fp_asg108.fa,fp_asg109.fa,fp_asg110.fa k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem_fp &

rsem-run-em runs indefinitely. When I killed it (more than 24 hours
after starting it), the log read:

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-run-em k76_nn/rsem_k76_nn/k76_nn_ref 1 k76_nn_rsem_fp k76_nn_rsem_fp -p 4 -b s k76_nn_rsem_fp.sam.gz 0 --gibbs-out
"/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-run-em k76_nn/rsem_k76_nn/k76_nn_ref 1 k76_nn_rsem_fp k76_nn_rsem_fp -p 4 -b s k76_nn_rsem_fp.sam.gz 0 --gibbs-out" failed! Plase check if you provide correct parameters/options for the pipeline!

b...@cs.wisc.edu

May 5, 2012, 2:29:49 PM5/5/12
to rsem-...@googlegroups.com
Hi Pathunk,

Is there any intermediate output generated by rsem-run-em?

Thanks,
Bo

pathunk

May 5, 2012, 8:01:38 PM5/5/12
to RSEM Users
Hi Bo, I am not sure which, if any, are from rsem-run-em, but these are
the folders and files I get:

k76_nn_rsem_fp.sam.gz (12.2gb)
k76_nn_rsem_fp.stat (folder)
k76_nn_rsem_fp.cnt (4kb)
k76_nn_rsem_fp_run3.temp (folder)
k76_nn_rsem_fp_alignable.fq (7.6 gb)
k76_nn_rsem_fp_alignable.fq.ridx (148mb)
k76_nn_rsem_fp_un.fq (1.22gb)
k76_nn_rsem_fp.dat (3.88gb)
k76_nn_rsem_fp.mparams (4kb)

b...@cs.wisc.edu

May 5, 2012, 8:17:37 PM5/5/12
to rsem-...@googlegroups.com
Hi Pathunk,

Can you find a "nohup.out" file?

Best,
Bo

pathunk

May 6, 2012, 10:19:20 AM5/6/12
to RSEM Users
Hi Bo, yes there's also a nohup.out file. I pasted the tail end of the
output from that file earlier, but here's the full contents.


/Volumes/pichia/aman/bowtie-0.12.7/bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -p 4 -a -m 200 -S k76_nn/rsem_k76_nn/k76_nn_ref fp_asg105.fa,fp_asg106.fa,fp_asg107.fa,fp_asg108.fa | gzip > k76_nn_rsem_fp_run3.sam.gz
# reads processed: 40670279
# reads with at least one reported alignment: 35052287 (86.19%)
# reads that failed to align: 5257258 (12.93%)
# reads with alignments suppressed due to -m: 360734 (0.89%)
Reported 386736847 alignments to 1 output stream(s)

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-parse-alignments k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem_fp_run3 k76_nn_rsem_fp_run3 s k76_nn_rsem_fp_run3.sam.gz -t 1 -tag non-unique
[samopen] SAM header is present: 261872 sequences.
Parsed 1000000 entries
Parsed 2000000 entries
[... identical progress lines omitted ...]
Parsed 391000000 entries
Parsed 392000000 entries
Done!

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-build-read-index 32 1 0 k76_nn_rsem_fp_run3.temp/k76_nn_rsem_fp_run3_alignable.fq
FIN 1000000
FIN 2000000
[... identical progress lines omitted ...]
FIN 34000000
FIN 35000000
Build Index k76_nn_rsem_fp_run3.temp/k76_nn_rsem_fp_run3_alignable.fq is Done!

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-run-em k76_nn/rsem_k76_nn/k76_nn_ref 1 k76_nn_rsem_fp_run3 k76_nn_rsem_fp_run3 -p 4 -b s k76_nn_rsem_fp_run3.sam.gz 0 --gibbs-out
"/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-run-em k76_nn/rsem_k76_nn/k76_nn_ref 1 k76_nn_rsem_fp_run3 k76_nn_rsem_fp_run3 -p 4 -b s k76_nn_rsem_fp_run3.sam.gz 0 --gibbs-out" failed! Plase check if you provide correct parameters/options for the pipeline!



b...@cs.wisc.edu

May 6, 2012, 11:49:52 AM5/6/12
to rsem-...@googlegroups.com
Hi Pathunk,

That's really strange. Can you try using only one reads file (say
fp_asg105.fa) and see if RSEM can finish?

Best,
Bo

pathunk

May 7, 2012, 9:52:37 AM5/7/12
to RSEM Users
Hi Bo, no luck. I started a run using a single read file more than 20
hours ago, and it's still on rsem-run-em (I didn't kill the process
this time; it's still running). nohup.out looks basically the same as
before, and so does the set of intermediate files created. The
nohup.out file was last modified about 2 hours after I initiated
rsem-calculate-expression.

nohup /Volumes/pichia/aman/rsem-1.1.18-modified/rsem-calculate-expression -p 4 --calc-ci --tag non-unique --bowtie-path /Volumes/pichia/aman/bowtie-0.12.7 fp_asg105.fa k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem_fp_run4 &

nohup.out:
/Volumes/pichia/aman/bowtie-0.12.7/bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -p 4 -a -m 200 -S k76_nn/rsem_k76_nn/k76_nn_ref fp_asg105.fa | gzip > k76_nn_rsem_fp_run4.sam.gz
# reads processed: 11133404
# reads with at least one reported alignment: 9566220 (85.92%)
# reads that failed to align: 1477080 (13.27%)
# reads with alignments suppressed due to -m: 90104 (0.81%)
Reported 104810892 alignments to 1 output stream(s)

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-parse-alignments k76_nn/rsem_k76_nn/k76_nn_ref k76_nn_rsem_fp_run4 k76_nn_rsem_fp_run4 s k76_nn_rsem_fp_run4.sam.gz -t 1 -tag non-unique
Done!

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-build-read-index 32 1 0 k76_nn_rsem_fp_run4.temp/k76_nn_rsem_fp_run4_alignable.fq
FIN 1000000
FIN 2000000
FIN 3000000
FIN 4000000
FIN 5000000
FIN 6000000
FIN 7000000
FIN 8000000
FIN 9000000
Build Index k76_nn_rsem_fp_run4.temp/k76_nn_rsem_fp_run4_alignable.fq is Done!

/Volumes/pichia/aman/rsem-1.1.18-modified/rsem-run-em k76_nn/rsem_k76_nn/k76_nn_ref 1 k76_nn_rsem_fp_run4 k76_nn_rsem_fp_run4 -p 4 -b s k76_nn_rsem_fp_run4.sam.gz 0 --gibbs-out

b...@cs.wisc.edu

May 7, 2012, 10:34:22 AM5/7/12
to rsem-...@googlegroups.com
Hi Pathunk,

Two things. 1) Can you check whether
k76_nn/rsem_k76_nn/k76_nn_ref[.seq/.ti/.grp] exist? 2) Why do you want
to set the --tag option?

Best,
Bo

pathunk

May 7, 2012, 11:09:38 AM5/7/12
to RSEM Users
Hi Bo,

Those files do exist. Here's the contents of k76_nn/rsem_k76_nn/:
k76_nn_ref.1.ebwt
k76_nn_ref.2.ebwt
k76_nn_ref.3.ebwt
k76_nn_ref.4.ebwt
k76_nn_ref.chrlist
k76_nn_ref.grp
k76_nn_ref.idx.fa
k76_nn_ref.rev.1.ebwt
k76_nn_ref.rev.2.ebwt
k76_nn_ref.seq
k76_nn_ref.ti
k76_nn_ref.transcripts.fa

I set the --tag option because I have not yet thoroughly evaluated or
processed my assembly (this is a de novo transcriptome assembly with
261,872 contigs, built from the same read libraries for which I'd like
to calculate expression differences), and I was worried that some reads
might align to a large number of contigs. As I understand it, the --tag
option flags reads that have "too many valid alignments."

Could the large number of contigs be an issue here?

b...@cs.wisc.edu

May 7, 2012, 12:44:05 PM5/7/12
to rsem-...@googlegroups.com
Hi Pathunk,

I do not think so. Please try this: take the header and the first 1000
reads from the generated .sam.gz file (let us call it the truncated sam
file) and see if RSEM can finish it in time. If not, please send me the
truncated sam file and k76_nn_ref.transcripts.fa, and I'll see what
happens.
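
That truncation step can be sketched as follows (a rough helper, not an RSEM tool; note that one read can produce several alignment lines, so this keeps the first 1000 alignment records rather than exactly 1000 reads):

```python
import gzip

def truncate_sam(in_path, out_path, n_records=1000):
    """Copy the full SAM header plus the first n_records alignment lines."""
    kept = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("@"):      # header lines: keep all of them
                dst.write(line)
            elif kept < n_records:
                dst.write(line)
                kept += 1
            else:
                break

# e.g. truncate_sam("k76_nn_rsem_fp.sam.gz", "k76_nn_rsem_fp_trunc.sam")
```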

Thanks,
Bo


pathunk

May 15, 2012, 3:51:04 PM5/15/12
to RSEM Users
Hi Bo,

I created a truncated sam file by concatenating the entire header
section with the first 1000 lines of the sequence portion of the file.
When I ran rsem-calculate-expression, I did get isoforms.results and
genes.results files, but again nearly all of the expected read counts
were 0 (this was the original problem I posted to initiate this
thread).

When I looked line-by-line at the truncated sam file, I noticed that
many of the contigs were short and contained duplicated regions
relative to other contigs. In case this presented a problem, I tried a
second run in which I filtered my original reference fasta (i.e. my de
novo assembly) to retain only those sequences >1000nt. I ran
rsem-prepare-reference on this filtered reference and then ran
rsem-calculate-expression. Again, rsem-run-em ran forever. I then went
through the same truncation procedure. When I ran
rsem-calculate-expression with the new truncated sam, the .results
files were created, but again ~99% of 'genes' showed 0 expected counts,
and the other ~1% had an expected count of 1.
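
The length filter described above can be done in a few lines; this is a minimal sketch (the parser and the 1000nt threshold are illustrative, and dedicated tools would work just as well):

```python
def read_fasta(lines):
    """Minimal FASTA parser yielding (header, sequence) tuples."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def filter_fasta(records, min_len=1000):
    """Keep only records whose sequence is longer than min_len bases."""
    return [(h, s) for h, s in records if len(s) > min_len]

# Tiny made-up example: only the contig over 1000nt survives.
demo = [">short", "ACGT", ">long", "A" * 1200]
print([h for h, s in filter_fasta(read_fasta(demo))])  # ['long']
```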

The reference was created with these same sets of reads, so it is
strange to me that I would have so few expected counts. Though I
didn't see alignment stats in the output from the run with the
truncated sam, the run with the full (>1000nt) filtered reference
showed that 24 million reads aligned:
# reads processed: 61349468
# reads with at least one reported alignment: 24294996 (39.60%)
# reads that failed to align: 37025542 (60.35%)
# reads with alignments suppressed due to -m: 28930 (0.05%)

I will send you the filtered truncated sam and the ref.transcripts.fa
file in case you'd like to look at them. Please let me know if you
have any thoughts.

Thanks,
Aman (pathunk)

b...@cs.wisc.edu

May 27, 2012, 2:22:18 PM5/27/12
to rsem-...@googlegroups.com
Hi Aman,

Sorry for the late reply.

For the truncated data, only 388 of your reads align to your reference,
so it is normal that most of your contigs have an expected count of 0.

I cannot understand why RSEM gets stuck on your large dataset while it
can finish this small one.

But there are still things we can check. First, for the large data set,
have you seen output like the following:

Refs.loadRefs finished!
Thread 0 : N = 49, NHit = 53
Thread 1 : N = 47, NHit = 53
Thread 2 : N = 45, NHit = 53
Thread 3 : N = 47, NHit = 53
Thread 4 : N = 46, NHit = 53
Thread 5 : N = 49, NHit = 53
Thread 6 : N = 48, NHit = 53
DAT 0 reads left
Thread 7 : N = 57, NHit = 59
EM_init finished!
estimateFromReads, N0 finished.
estimateFromReads, N1 finished.
estimateFromReads, N2 finished.

In addition, can you send me the sample.cnt file under the sample.temp directory?

Thanks,
Bo

