How are 'est_clonal_exp' values calculated in TCR-ALL.txt files?

hischm...@googlemail.com

Nov 14, 2018, 12:51:21 PM
to TRUST for T cell receptor hypervariable region assembly
Hello,
How exactly are the values in the 'est_clonal_exp' column of the '...TCR-ALL.txt' files calculated? You have mentioned in previous posts that these are FPKM-like values, but I can't reproduce the estimate.

For example:
sample1.bam    5442    0.0512820512821    TRBV29-1*01_TRBV29-1*02_TRBV29-1*03    TRBJ1-1*01        TRBC2|chr7:142498725-142499111    CSSTTGLNTEAFF    TGCAGCTCCACGACTGGCCTGAACACTGAAGCTTTCTTT    16.8568652736    AAGACAGCAGCATATATCTCTGCAGCTCCACGACTGGCCTGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAGGCCACACTGGTGTGCCTGGCCACAGGCTTCTACCCCGACCACGTGGAGCTGAGCGGGAGTGTGTGGGGA    5

But when I calculate:
SF = libsize / 1,000,000      -->     5442 / 1,000,000 = 0.005442
RPM = reads / SF        -->     5 / 0.005442 = 918.78
RPKM = RPM / gene length in kb  -->   918.78 / 0.233 = 3943.26
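
For anyone who wants to repeat the arithmetic, here is a minimal Python sketch of the same RPKM-style calculation. It assumes the reading of the columns used in this post (5442 as the library size, 5 as the read count, 233 bp as the contig length); the function name `rpkm` is only illustrative and is not part of TRUST.

```python
def rpkm(read_count, library_size, length_bp):
    """Plain RPKM: reads per kilobase of sequence per million library reads."""
    scaling_factor = library_size / 1_000_000   # library size in millions
    rpm = read_count / scaling_factor           # reads per million
    return rpm / (length_bp / 1_000)            # divide by length in kb

# Values taken from the example row, under the column reading assumed above.
value = rpkm(read_count=5, library_size=5442, length_bp=233)
reported = 16.8568652736                        # est_clonal_exp in the file

print(round(value, 2))     # about 3943.26
print(round(reported, 2))  # 16.86, so a plain RPKM does not reproduce it
```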

Could you please clarify? Thank you in advance.

Kind regards,
Paul

Bo Li

Nov 14, 2018, 3:09:41 PM
to Paul K, trus...@googlegroups.com
Hi Paul,

As you can imagine, a short DNA fragment may map to more than one CDR3 contig. Therefore, when we estimate the frequency, we take an EM approach that splits the ambiguous short reads among the contigs according to their estimated frequencies (the procedure is repeated until the EM converges). The FPKM notation was just an analogy, not an exact description. In reality, we use the expected read coverage (from EM) divided by the total number of TCR reads.
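
To make the idea concrete, here is a toy Python sketch of EM-based splitting of ambiguous reads. It is not the TRUST source code: the function `em_read_assignment`, the input layout `compat`, and the convergence settings are all assumptions chosen for illustration, and it normalises expected read counts rather than per-base coverage.

```python
def em_read_assignment(compat, n_contigs, n_iter=200, tol=1e-9):
    """Toy EM. compat[r] lists the contig indices that read r could map to."""
    freqs = [1.0 / n_contigs] * n_contigs       # start from uniform frequencies
    expected = [0.0] * n_contigs
    for _ in range(n_iter):
        # E-step: split each read among its contigs in proportion to freqs.
        expected = [0.0] * n_contigs
        for contigs in compat:
            total = sum(freqs[c] for c in contigs)
            for c in contigs:
                expected[c] += freqs[c] / total
        # M-step: new frequency = expected reads / total number of reads.
        new_freqs = [e / len(compat) for e in expected]
        converged = max(abs(a - b) for a, b in zip(freqs, new_freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return expected, freqs

# Hypothetical input: 3 reads unique to contig 0, 1 unique to contig 1,
# and 2 reads that fit both contigs equally well.
compat = [[0], [0], [0], [1], [0, 1], [0, 1]]
expected, freqs = em_read_assignment(compat, n_contigs=2)
print([round(f, 3) for f in freqs])     # close to [0.75, 0.25]
print([round(e, 3) for e in expected])  # close to [4.5, 1.5]
```

In this toy case the two ambiguous reads end up split roughly 3:1 in favour of the contig with more unique support, which is the behaviour described above.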

However, I need to emphasize that unless you believe your data is deep enough, the frequency estimate is not accurate. This is why we never report any analysis of this value in our paper. Our simulation results suggested that, at low coverage, the estimate has almost no correlation with the real values, due to high sampling error.

Thanks,
Bo
