Discrepancy between *coverage.txt and *ALL.txt files in 3.0


jbea...@gmail.com

Apr 17, 2018, 7:09:41 PM
to TRUST for T cell receptor hypervariable region assembly
Hi, could you please help me figure out why my *ALL.txt file has 2 lines while my *coverage.txt file has 176 lines? Here is an example:


Here is the entire *ALL.txt file:

filename est_lib_size est_clonal_exp Vgene Jgene Cgene reportgene cdr3aa cdr3dna minus_log_Eval totaldna contig_reads_count

PCGA-01-0001-029-19762-00897BX-1007485R.clean.bam 11057 0.025641025641 TRBJ2-1*01 LAGSSYNEQFF CTAGCGGGGAGCTCCTACAATGAGCAGTTCTTC 11.697414907 CTAGCGGGGAGCTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAG 1

PCGA-01-0001-029-19762-00897BX-1007485R.clean.bam 11057 0.0512820512821 TRDJ1*01 IFRPTDKLIF ATATTCCGGCCCACCGATAAACTCATCTTT 12.6021047272 ATATTCCGGCCCACCGATAAACTCATCTTTGGAAAAGGAACCCGTGTGACTGTGGAACCAA 2



Here is a portion of the *coverage.txt file:

TRAJ44|chr14:22963778-22963868 24

TRBJ2-7|chr7:142495112-142495186 95

TRAJ18|chr14:22994592-22994685 9

TRDV2|chr14:22891537-22892072 12

TRAJ24|chr14:22988919-22989009 1

TRAV21|chr14:22520780-22521354 19

TRAJ22|chr14:22990988-22991075 6

TRBV6-1|chr7:142028178-142028649 49

TRDC|chr14:22933196-22933315 56

TRGV1|chr7:38407162-38407656 11

TRAJ57|chr14:22947833-22947923 9

TRAJ50|chr14:22957553-22957640 3

TRBV4-1|chr7:142013036-142013528 7

TRAV14/DV4|chr14:22392314-22392866 10

TRAV24|chr14:22573620-22574123 3

TRGC2|chr7:38279628-38279768 185

TRBV29-1|chr7:142448120-142448780 66

TRAC|chr14:23016447-23016719 976

TRDJ4|chr14:22924241-22924288 1

TRAV30|chr14:22636325-22636923 18

TRAV35|chr14:22689792-22690410 86

TRGC2|chr7:38284785-38284832 30

TRGV7|chr7:38374642-38375012 37

TRGV5P|chr7:38384631-38385100 2

TRAV3|chr14:22192136-22192607 21

TRBV5-5|chr7:142148888-142149391 8

TRBV6-6|chr7:142161891-142162362 27

TRDJ3|chr14:22928062-22928148 5

TRBV21-1|chr7:142344427-142344926 28

TRAJ6|chr14:23007988-23008077 2

TRAJ38|chr14:22971187-22971276 1

TRAJ12|chr14:23000861-23000948 11

TRAV26-1|chr14:22591485-22592282 9

TRAV8-4|chr14:22362741-22363248 37

TRAJ25|chr14:22987762-22987849 1

TRAJ42|chr14:22965844-22965937 13

TRAJ29|chr14:22982893-22982980 12

TRAJ34|chr14:22976623-22976708 3

TRAJ43|chr14:22964868-22964949 42

TRBV5-6|chr7:142131372-142131876 15

TRAJ52|chr14:22955188-22955284 5

TRAV10|chr14:22293672-22294278 14

TRAV12-3|chr14:22433818-22434329 40

TRAV8-3|chr14:22320735-22321222 57

TRBJ2-2|chr7:142494216-142494294 25

TRBV6-4|chr7:142250663-142251136 44

TRAJ40|chr14:22968644-22968732 1

TRGC2|chr7:38282030-38282077 39

TRAV8-2|chr14:22314944-22315441 95

TRBJ2-6|chr7:142494895-142494975 12

TRAJ39|chr14:22970557-22970647 10

TRDC|chr14:22934525-22935569 187

TRAV27|chr14:22616046-22616626 10

TRBVA|chr7:142389237-142389730 2

TRAJ30|chr14:22981806-22981890 26

TRAV9-2|chr14:22409369-22409887 27

TRAV26-2|chr14:22670477-22671301 4

TRBV23-1|chr7:142353468-142354003 52

TRBV10-2|chr7:142206473-142206876 2

TRAV23/DV6|chr14:22554729-22555277 18

TRAJ10|chr14:23002417-23002508 2

TRAJ13|chr14:22999998-23000088 3

TRBV5-7|chr7:142111353-142111858 3

TRDC|chr14:22932766-22932831 40

TRDV1|chr14:22564327-22564935 119

TRAC|chr14:23019501-23019608 390

TRAJ56|chr14:22948482-22948571 1

TRAJ28|chr14:22984573-22984666 2

TRBC2|chr7:142500187-142500210 101

TRBV19|chr7:142326571-142327085 29

TRAJ16|chr14:22997459-22997546 3

TRBV10-1|chr7:142231533-142231938 7



The log file doesn't seem to have any noticeable errors.

Thank you!

Bo Li

Apr 17, 2018, 8:49:17 PM
to Jennifer Beane, TRUST for T cell receptor hypervariable region assembly
The coverage file gives the number of reads falling into each gene region. The *ALL file contains the CDR3 calls. There is no error here.
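
To make the coverage format concrete, here is a minimal sketch that reads lines shaped like the excerpt above (`GENE|chr:start-end COUNT`) and sums the read counts per gene. The layout is inferred only from the posted excerpt, and the function name `parse_coverage` and the choice to add up multiple regions of the same gene are assumptions for illustration, not part of TRUST.

```python
from collections import defaultdict


def parse_coverage(lines):
    """Sum read counts per gene from lines like 'TRAC|chr14:23016447-23016719 976'.

    Assumes (from the posted excerpt only) that each line is
    '<gene>|<region>' followed by whitespace and an integer count.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the posted excerpt has blank lines between records
        region, count = line.rsplit(None, 1)  # split on the last whitespace run
        gene = region.split("|", 1)[0]        # keep the gene name, drop the coordinates
        totals[gene] += int(count)
    return dict(totals)


# Lines copied from the excerpt in the first message.
sample = [
    "TRAC|chr14:23016447-23016719 976",
    "TRAC|chr14:23019501-23019608 390",
    "TRGC2|chr7:38279628-38279768 185",
    "TRGC2|chr7:38284785-38284832 30",
    "TRGC2|chr7:38282030-38282077 39",
    "TRBJ2-7|chr7:142495112-142495186 95",
]
coverage = parse_coverage(sample)
print(coverage)
```

Genes such as TRAC and TRGC2 appear on several lines with different coordinates, so the per-gene total here is a sum over those regions; keep the raw lines if the per-region counts matter.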

Best,
Bo

Jennifer Beane

Apr 18, 2018, 2:13:48 PM
to Bo Li, TRUST for T cell receptor hypervariable region assembly
Hi,

Thank you for the response. I think I'm still struggling to understand the output. The data I had were 100 nt paired-end reads with 50 nt of overlap, so TRUST switches to single-end mode, and I'm not sure how the assembly is conducted in single-end mode. From reading your paper, I thought that reads were first mapped to the gene regions and then assembled into CDR3 contigs. However, in the above results, TRBJ2-1 and TRDJ1, which appear in the *ALL.txt file, are not present in the *coverage.txt file. Can you help me understand the output or point me to a document explaining the output in more detail?

Thank you in advance,
Jen

Bo Li

Apr 18, 2018, 2:45:01 PM
to Jennifer Beane, TRUST for T cell receptor hypervariable region assembly
For the coverage file, we only analyzed the variable genes. That's why the joining genes were not included.

Best,
Bo

Jennifer Beane

Apr 18, 2018, 4:51:03 PM
to Bo Li, TRUST for T cell receptor hypervariable region assembly
Hi Dr. Li,

Thank you again for the prompt response. I have one more question. The coverage file does contain TRBJ and TRDJ genes; however, the *ALL.txt file has a TRBJ gene that is not in the *coverage.txt file. For the above sample, the following TRBJ and TRDJ genes were in the *coverage.txt file:

TRBJ2-7|chr7:142495112-142495186 95

TRBJ2-2|chr7:142494216-142494294 25

TRBJ2-6|chr7:142494895-142494975 12

TRBJ2-2P|chr7:142494374-142494426 19

TRBJ2-3|chr7:142494503-142494579 89

TRBJ2-5|chr7:142494775-142494850 26

TRBJ2-4|chr7:142494654-142494731 22

TRDJ4|chr14:22924241-22924288 1

TRDJ3|chr14:22928062-22928148 5

TRDJ2|chr14:22925652-22925734 2

TRDJ1|chr14:22919053-22919131 39

 
In the *ALL.txt file the J genes are:
TRBJ2-1*01 and TRDJ1*01  
 
Can you please try to clarify one more time why TRBJ2-1 would be in the *ALL.txt file but not in the *coverage.txt file?
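
For anyone wanting to reproduce this check, a minimal sketch of the comparison follows. It uses the gene names listed in this message; the helper names and the assumption that `*01` is an allele suffix to strip before comparing are mine, not something TRUST documents here.

```python
def base_gene(name):
    """Drop an allele suffix such as '*01' (assumed naming convention)."""
    return name.split("*", 1)[0]


def genes_missing_from_coverage(all_genes, coverage_genes):
    """Return genes reported in *ALL.txt that have no entry in *coverage.txt."""
    covered = set(coverage_genes)
    reported = {base_gene(name) for name in all_genes}
    return sorted(gene for gene in reported if gene not in covered)


# J genes listed above from the *coverage.txt file.
coverage_j_genes = [
    "TRBJ2-7", "TRBJ2-2", "TRBJ2-6", "TRBJ2-2P", "TRBJ2-3", "TRBJ2-5",
    "TRBJ2-4", "TRDJ4", "TRDJ3", "TRDJ2", "TRDJ1",
]
# J genes listed above from the *ALL.txt file.
all_j_genes = ["TRBJ2-1*01", "TRDJ1*01"]

missing = genes_missing_from_coverage(all_j_genes, coverage_j_genes)
print(missing)  # ['TRBJ2-1']
```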

Thank you again,
Jen

Bo Li

Apr 20, 2018, 8:51:38 PM
to Jennifer Beane, TRUST for T cell receptor hypervariable region assembly
Hi Jen,

I would suggest reading our method description in the 2017 NG paper. For a PE library, TRUST searches for unmapped reads whose paired mates are mapped to a V, J, or constant gene. The mapped reads give the gene information, which is included in the *ALL file.
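
The selection rule described above can be illustrated with a toy sketch. This is not TRUST's code: the `Read` record, the `candidate_reads` function, and the sample pairs are all hypothetical, and a real pipeline would read these fields from a BAM file rather than from in-memory tuples.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    pair_id: str          # identifier shared by both mates (hypothetical field)
    mapped: bool          # whether this read aligned to the reference
    gene: Optional[str]   # V, J or constant gene the read overlaps, if any


def candidate_reads(reads):
    """Return (pair_id, mate_gene) for unmapped reads whose mate maps to a gene."""
    by_pair = {}
    for read in reads:
        by_pair.setdefault(read.pair_id, []).append(read)

    hits = []
    for pair_id, mates in by_pair.items():
        if len(mates) != 2:
            continue  # ignore incomplete pairs in this toy example
        # Check both orientations: either mate may be the unmapped one.
        for read, mate in (mates, mates[::-1]):
            if not read.mapped and mate.mapped and mate.gene is not None:
                hits.append((pair_id, mate.gene))
    return hits


# Made-up pairs for illustration only.
reads = [
    Read("p1", False, None), Read("p1", True, "TRBJ2-1"),  # unmapped + gene-mapped mate
    Read("p2", True, "TRAC"), Read("p2", True, "TRAC"),    # both mapped: not a candidate
    Read("p3", False, None), Read("p3", False, None),      # both unmapped: no gene info
]
hits = candidate_reads(reads)
print(hits)  # [('p1', 'TRBJ2-1')]
```

In this picture the gene name attached to a CDR3 call comes from the mapped mate, which is a different quantity from the per-region read counts in the coverage file.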

Best,
Bo