cdr3 boundaries


Jakub Otwinowski

Nov 7, 2016, 3:40:35 PM
to par...@googlegroups.com
I want to get the cdr3 boundaries. The code has changed, so what is now the easiest way to spit out the boundaries in the annotation csv?

Jakub

Duncan Ralph

Nov 7, 2016, 4:29:21 PM
to Jakub Otwinowski, partis
This should do it, modulo pseudocodery (also attached, but I'm not sure how Google Groups will handle that):

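import csv
import sys

partis_path = '.'  # path to your partis checkout
sys.path.insert(1, partis_path + '/python')
import utils
import glutils

# read the human heavy-chain germline set
glfo = glutils.read_glfo(partis_path + '/data/germlines/human', chain='h')

with open(partis_path + '/test/reference-results/annotate-new-simu.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for line in reader:
        # convert the csv string fields into python types
        utils.process_input_line(line)
        # compute derived info such as 'codon_positions'; existing_implicit_keys names
        # implicit keys that this csv already contains
        utils.add_implicit_info(glfo, line, existing_implicit_keys=(
            'aligned_d_seqs', 'aligned_j_seqs', 'aligned_v_seqs', 'cdr3_length', 'naive_seq',
            'in_frames', 'mutated_invariants', 'stops', 'mut_freqs'))
        # print the annotation so the cdr3 slices below can be checked by eye
        utils.print_reco_event(glfo['seqs'], line)
        # codon_positions holds the first base of the conserved V (cysteine) and
        # J (tryptophan/phenylalanine) codons; +3 takes in the whole J codon
        cdr3_bounds = (line['codon_positions']['v'], line['codon_positions']['j'] + 3)
        print('')
        print('should match the above:')
        print('%s naive cdr3' % line['naive_seq'][cdr3_bounds[0]:cdr3_bounds[1]])
        print('%s mature' % line['seqs'][0][cdr3_bounds[0]:cdr3_bounds[1]])
        break  # only show the first annotation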



--
You received this message because you are subscribed to the Google Groups "partis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to partis+unsubscribe@googlegroups.com.
To post to this group, send email to par...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/partis/CAB-exDKcZOG9L2eLt5tZos7%2Bd8Bxs4r_oT7ofWnRao%2BUNkZLYQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

tmp.py

Jakub Otwinowski

Nov 7, 2016, 6:19:46 PM
to Duncan Ralph, partis
Thanks, it mostly works, except that it eventually hits this error:

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/partis/python/utils.py", line 822, in add_implicit_info
    uneroded_gl_seq = glfo['seqs'][region][line[region + '_gene']]
KeyError: ''


Duncan Ralph

Nov 7, 2016, 6:40:05 PM
to Jakub Otwinowski, partis
yeah, that'll be a failed sequence (no gene was assigned, hence the empty gene name in the KeyError); presumably you want to skip it.
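A minimal sketch of that skip, assuming (from the `KeyError: ''` above) that a failed query shows up as a row whose `v_gene`/`d_gene`/`j_gene` columns are empty. `is_failed` and `successful_rows` are hypothetical helpers, not partis functions, and the toy csv is made up.

```python
import csv
import io

REGIONS = ('v', 'd', 'j')


def is_failed(line):
    """Hypothetical check: treat a row as failed if any <region>_gene column is empty
    (an empty gene name is what produced KeyError: '' in add_implicit_info)."""
    return any(line.get(region + '_gene', '') == '' for region in REGIONS)


def successful_rows(csvfile):
    """Yield only the rows that have a gene assigned for every region."""
    for line in csv.DictReader(csvfile):
        if is_failed(line):
            continue  # skip failed sequences instead of crashing later
        yield line


# toy csv with one annotated and one failed query (made-up values)
example = io.StringIO('unique_ids,v_gene,d_gene,j_gene\n'
                      'seq-a,IGHV1-2*02,IGHD3-10*01,IGHJ4*02\n'
                      'seq-b,,,\n')
print([line['unique_ids'] for line in successful_rows(example)])
```

In the original script, the equivalent check would go at the top of the loop, before `utils.process_input_line(line)`.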

Duncan Ralph

Nov 18, 2016, 9:26:09 PM
to partis
Removed the need for the crappy implicit key argument (existing_implicit_keys), and added this script to GitHub:

https://github.com/psathyrella/partis/blob/master/bin/example-output-processing.py

halas...@gmail.com

Aug 11, 2017, 12:26:59 PM
to partis
Hi,

Is there a version for getting the CDR3 boundaries (i.e. where the V ends, where the N region starts and ends, etc.) for TCR sequences?

Hussein

Duncan Ralph

Aug 11, 2017, 2:42:59 PM
to halas...@gmail.com, partis
An old version of the example actually shows how to access the cdr3 boundaries. Let me know if it isn't clear.

a couple of notes:
  - cdr3 boundaries and V/N boundaries aren't the same -- if you want the latter, which I call regional bounds, they're stored in the same dict under the key 'regional_bounds', as a (start, end) pair for each region (v, d, j) that's zero-indexed and follows python slice conventions (i.e. the end is exclusive).
  - there's nothing different about how to access this for TCRs, although, as you've probably already worked out, you need to specify --locus on the command line.
  - and... I just realized last week that I screwed up my cdr3 length nomenclature! The numbers I point you to above ^ are fine, since all my codon positions are fine. But IMGT calls the "cdr3 length" the length *excluding* the six bases of the two conserved codons, while "junction length" *includes* them, and I somehow reversed this at some point. When I finish freaking out about this and figure out how to fix it without confusing backwards-compatibility breakage, I'll send a mail to the list.
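To make these notes concrete, here is a minimal sketch on plain dicts shaped like the annotation keys mentioned in this thread: `codon_positions` (first base of the conserved V and J codons, which is what the `+ 3` in the earlier script implies) and `regional_bounds` (a half-open `(start, end)` per region). It uses the IMGT definitions from the last note: the junction includes both conserved codons and the CDR3 excludes them. It also assumes the N insertions are exactly the gaps between consecutive regions. The helper names are hypothetical, not partis functions, and the positions are made up.

```python
def junction_bounds(codon_positions):
    """Half-open (start, end) from the conserved V codon through the end of the J codon,
    i.e. the IMGT junction (same slice as cdr3_bounds in the earlier script)."""
    return codon_positions['v'], codon_positions['j'] + 3


def imgt_lengths(codon_positions):
    """IMGT junction length includes the six codon bases; IMGT CDR3 length excludes them."""
    start, end = junction_bounds(codon_positions)
    junction_length = end - start
    return {'junction_length': junction_length, 'cdr3_length': junction_length - 6}


def insertion_bounds(regional_bounds):
    """Half-open bounds of the V-D and D-J insertions, assuming they are the gaps
    between consecutive (start, end) regional bounds."""
    return {
        'vd': (regional_bounds['v'][1], regional_bounds['d'][0]),
        'dj': (regional_bounds['d'][1], regional_bounds['j'][0]),
    }


# made-up positions for illustration only
codon_positions = {'v': 285, 'j': 336}
regional_bounds = {'v': (0, 296), 'd': (301, 312), 'j': (315, 363)}
print(junction_bounds(codon_positions))   # (285, 339)
print(imgt_lengths(codon_positions))      # {'junction_length': 54, 'cdr3_length': 48}
print(insertion_bounds(regional_bounds))  # {'vd': (296, 301), 'dj': (312, 315)}
```

Because every bound is half-open, `seq[start:end]` gives the corresponding piece of a sequence directly, and the lengths are plain differences.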


Duncan Ralph

Jan 17, 2018, 2:13:28 PM
to partis
I just added an option --extra-annotation-columns so you can specify non-default annotation columns to write to the annotation output file. So, for instance to add the cdr3 seqs you'd add `--extra-annotation-columns cdr3_seqs`. To see other choices, run `./bin/partis annotate --help` (perhaps with `|grep -C3 extra-ann`). This also makes it easy to add anything extra that you want without cluttering up the default output, so if you want something that isn't there just ask.
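A minimal sketch of reading that extra column back in Python. It assumes the output is the annotation csv used earlier in this thread, with one row per annotation and per-sequence values in `unique_ids` and `cdr3_seqs` joined by ':'; check this against your own output. `read_cdr3_seqs` is a hypothetical helper and the toy rows are made up.

```python
import csv
import io


def read_cdr3_seqs(csvfile):
    """Map each sequence id to its cdr3 sequence from an annotation csv written with
    --extra-annotation-columns cdr3_seqs (assumes ':'-separated per-sequence values)."""
    cdr3_by_id = {}
    for line in csv.DictReader(csvfile):
        if not line.get('cdr3_seqs'):
            continue  # e.g. failed queries with nothing annotated
        uids = line['unique_ids'].split(':')
        cdr3s = line['cdr3_seqs'].split(':')
        cdr3_by_id.update(zip(uids, cdr3s))
    return cdr3_by_id


# made-up rows: one annotation covering two sequences, plus one failed query
example = io.StringIO('unique_ids,cdr3_seqs\n'
                      'seq-a:seq-b,TGTGCGAGATGG:TGTGCAAGATGG\n'
                      'seq-c,\n')
print(read_cdr3_seqs(example))
```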

