Reconcile sequence ids from sequence summary file with .csv of .jplace file from PhyloSift

Michael Doane

Dec 12, 2017, 1:19:01 PM
to PhyloSift
Hi all

I am trying to manipulate the PhyloSift output to generate richer figures that visualize the differences between my datasets. To do so, I converted my .jplace files to .csv using the guppy command to_csv. The .csv file lists all my sequence IDs and their placement information within the tree. I am taking the sequence_summary file from the PhyloSift output and running an INDEX/MATCH lookup against the generated .csv file, in an effort to attach a taxonomic label to each edge number (which is not present in the sequence_summary file but is present in the .csv made from the .jplace file).

The main problem I am having is that not all sequences in the sequence_summary file (identified by unique sequence ID, e.g. M00285:33:000000000-AB18F:1:1107:22277:13300 1:N:0:4) appear in the .csv generated by guppy. Has anyone else run into this problem?

Thanks in advance.

Mike

 

Michael Doane

Mar 4, 2018, 12:57:50 PM
to PhyloSift
Hi,

I wanted to get this thread going again. I am still working through reconciling sequence ID mismatches between the .jplace file and sequence_taxa.txt. Using guppy, I have converted my .jplace to a .csv. My aim is to make a single file that has sequence ID, NCBI taxon number, taxon rank, taxon name, and cumulative probability.

However, going through the files, several IDs in my converted jplace are not present in my sequence_taxa file. Why are some of the sequences not in the sequence_taxa file?

My second question pertains to the types of sequence IDs found in the converted jplace.

Here are the two forms:
K26CO:03160:01585.2.199

M00285:18:000000000A611E:1:2101:15667:135531:N:0:20.1.495

I am assuming the first is the reference sequence used to make the initial tree? I know the second is my sequence ID. 

Thanks in advance for the information.
Mike D

Guillaume Jospin

Mar 4, 2018, 7:13:46 PM
to phyl...@googlegroups.com
Hi Michael,
Sorry, I lost track of this.
Are you saying that you have some sequences from your input file that are in the Jplace file but not in the sequence_taxa file?
That's weird, though I may not recall what the intended behavior is.
Is your jplace the one for the concat marker? Some files only include results for the housekeeping markers (DNGNGWU) that get concatenated into the "concat" marker.
Also, your converted jplace will include all the taxa on the tree, and the reference sequences wouldn't get reported in the sequence_taxa file. Maybe that's where the differences come from.

So for the two forms, are those two headers part of your input file?
They should be of the form <read_name>.<start>.<end>, where start and end are the read coordinates for the match. In the case of the concat marker, it's all the coordinates for the concatenation.
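
As a rough illustration of that naming form, here is a minimal Python sketch that splits a placement name into the read name and its coordinates. The function name split_placement_name is made up for this example, and it assumes a single integer start/end pair at the end of the name; concat-marker names carrying more coordinates would need extra handling.

```python
import re

# Assumption: a placement name ends in ".<start>.<end>" with integer coordinates.
_SUFFIX = re.compile(r"^(?P<read>.+)\.(?P<start>\d+)\.(?P<end>\d+)$")


def split_placement_name(name):
    """Split '<read_name>.<start>.<end>' into (read_name, start, end).

    Returns (name, None, None) when the name does not end in two
    dot-separated integers, so unexpected formats are easy to spot.
    """
    match = _SUFFIX.match(name)
    if match is None:
        return name, None, None
    return match.group("read"), int(match.group("start")), int(match.group("end"))


# One of the IDs quoted earlier in this thread:
print(split_placement_name("K26CO:03160:01585.2.199"))
# → ('K26CO:03160:01585', 2, 199)
```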
I wasn't able to find K26CO in my references. 

I'm not sure I am answering your question correctly... but I hope this helps.


--
You received this message because you are subscribed to the Google Groups "PhyloSift" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phylosift+unsubscribe@googlegroups.com.
To post to this group, send email to phyl...@googlegroups.com.
Visit this group at https://groups.google.com/group/phylosift.
For more options, visit https://groups.google.com/d/optout.

Michael Doane

Mar 21, 2018, 8:24:22 PM
to PhyloSift
Hey Guillaume,

Thank you so much for the response. Okay, I have sorted out my predicament with reconciling sequences. First, I realized (finally) that I am working with files from two different sequencing platforms. The sequence IDs therefore follow different naming formats, and that answers the ID problem. The second issue was that the match coordinates are appended to the end of the sequence ID in the jplace file.
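
For anyone who lands here with the same mismatch, the reconciliation can be sketched roughly as below. This is only a sketch under assumptions: normalize_id and reconcile are hypothetical helper names, the inputs are plain (ID, value) pairs that you would first read out of the guppy .csv and the sequence_taxa file yourself, and it assumes the only differences between the two ID styles are whitespace and the trailing .<start>.<end> coordinates.

```python
import re

# Assumption: placement names carry a trailing ".<start>.<end>" coordinate pair.
_COORDS = re.compile(r"\.\d+\.\d+$")


def normalize_id(raw, has_coords=False):
    """Drop all whitespace and, for placement names, the trailing coordinates."""
    key = "".join(raw.split())
    if has_coords:
        key = _COORDS.sub("", key)
    return key


def reconcile(placements, taxa):
    """Match placement rows to taxonomy rows on the normalized read ID.

    placements: iterable of (placement_name, edge_num) pairs
    taxa:       iterable of (sequence_id, taxon_name) pairs
    Returns (matched, unmatched): matched maps normalized ID to
    (edge_num, taxon_name); unmatched lists placement names with no
    taxonomy row, e.g. reference sequences on the tree.
    """
    taxa_by_id = {normalize_id(seq_id): taxon for seq_id, taxon in taxa}
    matched, unmatched = {}, []
    for name, edge_num in placements:
        key = normalize_id(name, has_coords=True)
        if key in taxa_by_id:
            # A read placed more than once keeps only its last edge here.
            matched[key] = (edge_num, taxa_by_id[key])
        else:
            unmatched.append(name)
    return matched, unmatched
```

Inspecting the unmatched list first is a quick way to see whether the leftovers are reference sequences or reads in a naming format the normalization does not cover.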

So all is well now. Thank you for the help.

Michael 