--
You received this message because you are subscribed to the Google Groups "partis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to partis+unsubscribe@googlegroups.com.
To post to this group, send email to par...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/partis/91f5fb0b-1fe4-427a-a143-b5fcb67e9ec1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Oh, sorry, I was forgetting about the 'duplicates' column. So let me amend that to say "we don't _use_ multiplicity in any intelligent way to, for instance, inform annotation or partitioning". The 'duplicates' column is just the sequences that partis found to be identical after removing the non-coding regions 5' of v and 3' of j. So I guess it might make sense to allow adding your initial duplicates to that number? But then it might get confusing which count is coming from where, and you can probably do that afterward anyway.
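Doing it "afterward" could look roughly like the sketch below, which adds the pre-partis read counts of a sequence and of everything listed in its 'duplicates' entry. This is only an illustration, not part of partis: the function name total_count, the initial_counts dict (unique ID -> DUPCOUNT from your own headers), and the ':' separator for the ID list are all assumptions, so check the separator against your actual output file.

```python
def total_count(uid, duplicates, initial_counts, sep=":"):
    """Sum the pre-partis read counts for one output row.

    uid            -- the row's unique ID
    duplicates     -- the row's 'duplicates' entry, a list of unique IDs
                      joined by `sep` (ASSUMPTION: ':' is the separator)
    initial_counts -- hypothetical dict mapping unique ID -> read count
                      taken from the original headers (e.g. DUPCOUNT)
    """
    # An empty 'duplicates' entry yields no extra IDs.
    dup_ids = [d for d in duplicates.split(sep) if d]
    # IDs without a recorded count are treated as a single read.
    return sum(initial_counts.get(u, 1) for u in [uid] + dup_ids)


if __name__ == "__main__":
    counts = {"a": 26, "b": 3}
    print(total_count("a", "b:c", counts))
```

Keeping the two sources separate until this step also avoids the confusion about which count came from where.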
On Tue, Nov 21, 2017 at 2:26 PM, Nicolas Strauli <nbst...@gmail.com> wrote:
Ah, now I see that I was interpreting the 'duplicates' column incorrectly. My 'unique_ids' are coded as integers, so I interpreted the 'duplicates' column as counts. Now I see that they are lists of unique IDs. Thanks. And I'll try your hack.
On Tue, Nov 21, 2017 at 1:40 PM, Duncan Ralph <dkr...@gmail.com> wrote:
No, you're right, there isn't, since we don't use multiplicity information at all. We should, though, at some point. If you just want the information available in the output file, you can just make it part of the unique ID, i.e. whatever comes after the '>' and before either any white space or pipes.
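A minimal sketch of that hack, assuming headers shaped like the pRESTO example quoted below ('@name|DUPCOUNT=26|...'). The function name fold_count_into_id and the '_DUPCOUNT-26' suffix format are made up for illustration; the only thing taken from this thread is that the ID ends at the first white space or pipe, so the count has to be moved in front of that point.

```python
import re

# Matches the pRESTO-style annotation, e.g. "|DUPCOUNT=26"
DUPCOUNT_RE = re.compile(r"\|DUPCOUNT=(\d+)")


def fold_count_into_id(header):
    """Rewrite one fasta/fastq header line so the count is part of the ID.

    Only pass header lines to this function: in fastq files a quality
    line can also start with '@', so do not apply it blindly to every line.
    """
    marker = header[0]                 # '>' (fasta) or '@' (fastq)
    body = header[1:].strip()
    # The ID is everything before the first white space or pipe.
    uid = re.split(r"[\s|]", body, maxsplit=1)[0]
    match = DUPCOUNT_RE.search(body)
    count = int(match.group(1)) if match else 1   # no annotation: one read
    # Hypothetical suffix format; avoids '|', white space, ':' and '='.
    return "%s%s_DUPCOUNT-%d" % (marker, uid, count)


if __name__ == "__main__":
    print(fold_count_into_id("@blah_blah_blah|DUPCOUNT=26|more_blah_blah_blah"))
```

The count can then be parsed back out of the unique IDs in the output file by splitting on the same suffix.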
On Nov 21, 2017 1:21 PM, <nbst...@gmail.com> wrote:
Is there a way to tell partis the count for each input sequence? By count, I mean the number of reads that have the same sequence (after running basic QC stuff). Prior to running partis, I have been using PRESTO to do basic QC stuff (i.e. merging read pairs, Q-score filtering, etc), and at the end I collapse all identical reads. After which, the count information for each sequence is stored in the header line of each entry in the fastq file. The headers end up looking something like: @blah_blah_blah|DUPCOUNT=26|more_blah_blah_blah . Is there a way to tell partis that the count information for this sequence is given by 'DUPCOUNT' and is 26? I looked through the documentation, but didn't see anything there. Sorry if I missed something. It would be nice if this information would be passed on to the 'duplicates' column in the output csv files.
Best,
Nicolas