CDR3 seq column for partis annotate?


nbst...@gmail.com

Nov 27, 2017, 2:46:45 PM
to partis
I don't see a way for partis annotate to output a CDR3 sequence column (i.e. a column that gives the actual sequence of the CDR3 for each of the input seqs). This would be useful info to have. Given the information that is output by partis annotate, it seems pretty complicated to figure out what the CDR3 seq is. Is there a way to do this? Does one need to run ./bin/partis view-annotations? Thanks for any help you can provide!

Best,

Nicolas Strauli

Duncan Ralph

Nov 27, 2017, 3:12:40 PM
to Nicolas Strauli, partis
Yeah, we don't write that explicitly to the file. To get it you need to process the output a bit; an example of processing the output files is here:



and an old version that actually shows how to print the cdr3 seq is here:


and a previous discussion:
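
For readers without the linked scripts to hand, the slicing step itself can be sketched roughly as below. This is a minimal sketch, not the code from the partis repository: the function name `extract_cdr3` and its two position arguments are hypothetical, and it assumes you already have, for each sequence, the 0-based start positions of the two conserved codons that bound the CDR3 (the V-side cysteine and the J-side tryptophan/phenylalanine), and that both codons count as part of the CDR3.

```python
def extract_cdr3(seq, v_codon_pos, j_codon_pos):
    """Return the CDR3 nucleotide sequence from `seq`.

    Hypothetical helper for illustration only.
    v_codon_pos: 0-based start of the conserved V-side codon (assumed input).
    j_codon_pos: 0-based start of the conserved J-side codon (assumed input).
    Both bounding codons are included, hence the + 3 on the right edge.
    """
    if not 0 <= v_codon_pos <= j_codon_pos:
        raise ValueError("codon positions must satisfy 0 <= v <= j")
    end = j_codon_pos + 3
    if end > len(seq):
        raise ValueError("J-side codon runs past the end of the sequence")
    return seq[v_codon_pos:end]


# Toy input: AAA TGT GCG AGA TGG GGC, with the bounding codons at 3 and 12.
print(extract_cdr3("AAATGTGCGAGATGGGGC", 3, 12))
```

Where the two positions come from depends on the partis version; the example scripts referenced in this message are the place to check.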


Nicolas Strauli

Nov 27, 2017, 5:23:16 PM
to Duncan Ralph, partis
When running the (modified) example-output-processing.py script on my partis annotate output I get the following error:

Traceback (most recent call last):
  File "get_cdr3_seqs_from_partis_annotate_output.py", line 36, in <module>
    get_cdr3_seqs_foreach_line(input_partis_annotate_csv_filepath=sys.argv[1], output_filepath=sys.argv[2], path_to_partis_master=sys.argv[3])
  File "get_cdr3_seqs_from_partis_annotate_output.py", line 24, in get_cdr3_seqs_foreach_line
    utils.add_implicit_info(glfo, line)
  File "/Users/nstrauli/tools/partis-master/python/utils.py", line 981, in add_implicit_info
    alternate_name = glutils.convert_to_duplicate_name(glfo, line[region + '_gene'])
  File "/Users/nstrauli/tools/partis-master/python/glutils.py", line 80, in convert_to_duplicate_name
    raise Exception('couldn\'t find alternate name for %s (and we\'re probably looking for an alternate name because it wasn\'t in glfo to start with)' % gene)
Exception: couldn't find alternate name for IGHV4-34*01+C67T (and we're probably looking for an alternate name because it wasn't in glfo to start with)

Any ideas?

Thanks,

Nicolas

Duncan Ralph

Nov 27, 2017, 8:02:34 PM
to Nicolas Strauli, partis
Yeah, it looks like it inferred a new allele in your sample, so it needs the germline info directory for your sample instead of the default location. So replace the path in this line


it will have printed out where the parameters were being written while it was running. The germline dir is just the parameter dir + '/hmm/germline-sets'; if you're in the right place you should see an '/igh/' subdir.
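
As a quick sanity check before editing the script, the path rule described here (parameter dir + '/hmm/germline-sets', containing an '/igh/' subdir) could be sketched as follows. The helper names `germline_dir_for` and `has_locus_subdir` are made up for illustration and are not part of partis; the parameter directory path in the usage comment is a placeholder.

```python
import os


def germline_dir_for(parameter_dir):
    """Build the germline dir path: parameter dir + '/hmm/germline-sets'."""
    return os.path.join(parameter_dir, "hmm", "germline-sets")


def has_locus_subdir(germline_dir, locus="igh"):
    """Return True if the expected locus subdir (e.g. 'igh') exists."""
    return os.path.isdir(os.path.join(germline_dir, locus))


# Usage (replace the placeholder with the parameter dir partis printed):
#   gdir = germline_dir_for("/path/to/your/parameter-dir")
#   if not has_locus_subdir(gdir):
#       raise SystemExit("no igh/ subdir under %s, wrong parameter dir?" % gdir)
```

If the check passes, that directory is the one to use in place of the default germline location in the processing script.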