Markers w/in marker database

Michael Doane

Oct 23, 2017, 7:13:32 PM
to PhyloSift

Hi, 
I'm interested only in the markers of bacterial/archaeal origin. Is any one of these marker .tgz files specific to only THOSE marker genes? If so, I could just download it and perform the 'build' and 'index' steps myself, correct? If anyone knows, please help.

Thank you in advance.
Mike

Name                   Last modified      Size
markers_20110804.tgz   2011-08-04 22:39   13M
markers_20110811.tgz   2011-08-11 23:22   34M
markers_20110812.tgz   2011-08-12 20:26   41M
markers_20111212.tgz   2011-12-12 06:43   6.5M
markers_20111215.tgz   2011-12-15 01:19   325M
markers_20111220.tgz   2011-12-20 11:23   222M
markers_20111221.tgz   2011-12-21 01:12   225M
markers_20120309.tgz   2012-10-16 12:49   450M
markers_20121021.tgz   2012-11-12 12:16   489M
markers_20131009.tgz   2013-11-05 03:09   445M
markers_20131209.tgz   2013-12-08 19:27   467M
markers_20140224.tgz   2014-03-20 21:42   749M
markers_20140913.tgz   2014-10-21 19:54   739M

Guillaume Jospin

Oct 23, 2017, 7:36:51 PM
to phyl...@googlegroups.com
Download the normal database and keep just the DNGNGWU markers and the concat markers. 

Then run the index step you mentioned, and you should be good to go. 
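A minimal sketch of the pruning step, assuming every marker lives in its own subdirectory of the marker folder and that the ones to keep start with "DNGNGWU" or "concat" (the prefix tuple and the function name are hypothetical). It is a dry run: it only reports what it would keep and remove, and it never touches standalone files such as marker_list.txt.

```python
from pathlib import Path

# Assumption: marker directories to keep start with one of these prefixes.
KEEP_PREFIXES = ("DNGNGWU", "concat")


def plan_prune(marker_dir):
    """Dry run: return (keep, remove) lists of marker subdirectory names.

    Only subdirectories are classified; standalone files such as
    marker_list.txt or ncbi_taxonomy.db are never listed for removal.
    Nothing is deleted by this function.
    """
    keep, remove = [], []
    for entry in sorted(Path(marker_dir).iterdir(), key=lambda p: p.name):
        if not entry.is_dir():
            continue  # leave standalone files alone
        target = keep if entry.name.startswith(KEEP_PREFIXES) else remove
        target.append(entry.name)
    return keep, remove
```

After reviewing the `remove` list by hand, each entry could be deleted with `shutil.rmtree`, followed by the index step.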


Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "PhyloSift" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phylosift+...@googlegroups.com.
To post to this group, send email to phyl...@googlegroups.com.
Visit this group at https://groups.google.com/group/phylosift.
For more options, visit https://groups.google.com/d/optout.

Guillaume Jospin

Oct 23, 2017, 7:37:41 PM
to phyl...@googlegroups.com
I would pick the latest release. We haven’t worked on an update in a long time. 
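Picking the latest release from a listing like the one in the first message can be sketched as below; this assumes the file names follow the markers_YYYYMMDD.tgz pattern shown there (the helper name is hypothetical). It uses the date stamp in the name rather than the "Last modified" column; in that listing both point to the same newest file.

```python
import re

RELEASE = re.compile(r"^markers_(\d{8})\.tgz$")


def latest_release(filenames):
    """Return the markers_YYYYMMDD.tgz name with the newest date stamp.

    YYYYMMDD strings sort chronologically, so a plain max() works.
    Returns None if no name matches the pattern.
    """
    dated = []
    for name in filenames:
        m = RELEASE.match(name)
        if m:
            dated.append((m.group(1), name))
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    print(latest_release(["markers_20131209.tgz", "markers_20140913.tgz", "markers_20140224.tgz"]))
    # prints markers_20140913.tgz
```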

Michael Doane

Oct 23, 2017, 11:31:01 PM
to PhyloSift
Thank you for the response.

I have deleted all the other markers in the marker folder (leaving all the DNGNGWU and concat markers). When I run the index command (with --debug on), it finds the 40 markers and then stalls. I tried cancelling this process and running PS on a dataset anyway, but that also seems to stall. 

Is deleting the other markers and then re-indexing the right approach for having PS look at only a subset of the markers?

Thanks
Mike

On Monday, October 23, 2017 at 7:15:56 PM UTC-7, Guillaume Jospin wrote:
I’m working on it now. The migration from SGE to Slurm has been complicated. 
I don’t recall where things broke, but I’m close. I think I need to figure out why the taxonomy-to-pplacer-nodes file isn’t getting generated; that’s the very last step of the marker build. 
I even gave up on some of the PS gadgetry and launched some builds manually to push things through. 
I’ll definitely release the package once it’s really updated. 


Guillaume Jospin

Oct 24, 2017, 9:53:03 AM
to phyl...@googlegroups.com
Hrm. Did you also delete some of the other files in there?
The indexing process looks for representative sequences in all the marker directories and builds lastal database files for searching. 
I’m not sure why it would stall. 
You should have at least 41 markers (40 DNGNGWU and 1 concat), plus some standalone files such as the marker list and index references. 
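The "collect representatives, then build a search database" idea described above can be sketched roughly as follows. This is not PhyloSift's indexing code; it assumes each marker directory holds a FASTA file named <marker>/<marker>.rep (as in the DNGNGWU00001 listing later in the thread), and the function name is hypothetical. The combined file could then be turned into a LAST database with `lastdb`; the exact options PhyloSift passes are not shown in this thread.

```python
from pathlib import Path


def gather_reps(marker_dir, out_fasta):
    """Concatenate every <marker>/<marker>.rep file into one FASTA file.

    Assumes the .rep files are already FASTA. Returns the names of the
    markers whose .rep file was found, in sorted order.
    """
    found = []
    subdirs = sorted((p for p in Path(marker_dir).iterdir() if p.is_dir()),
                     key=lambda p: p.name)
    with open(out_fasta, "w") as out:
        for sub in subdirs:
            rep = sub / f"{sub.name}.rep"
            if not rep.is_file():
                continue  # e.g. the concat directory lists no .rep file
            text = rep.read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep records from running together
            found.append(sub.name)
    return found
```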

Another way to go about it would be to use a custom file that restricts which candidate sequences get printed for alignment. 
To do so, create a text file with the names of the markers you want to keep, one marker name per line. You can then specify --custom <list of markers file> when running phylosift. This is less efficient, because it will still search all the markers; it just won’t print the results for the markers not in your list. 
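A minimal sketch of writing that one-name-per-line file for the 40 DNGNGWU markers (DNGNGWU00001 through DNGNGWU00040, as in the directory listing later in the thread). The output file name keep_markers.txt is hypothetical; the file would then be passed to phylosift with --custom as described above.

```python
from pathlib import Path


def write_custom_list(marker_names, path):
    """Write marker names to a text file, one name per line."""
    Path(path).write_text("".join(f"{name}\n" for name in marker_names))


# The 40 DNGNGWU marker names, DNGNGWU00001 through DNGNGWU00040.
DNGNGWU_MARKERS = [f"DNGNGWU{i:05d}" for i in range(1, 41)]

if __name__ == "__main__":
    # "keep_markers.txt" is a hypothetical file name.
    write_custom_list(DNGNGWU_MARKERS, "keep_markers.txt")
```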

Also make sure you have deleted any lock files that may be in the parent directory. Sometimes those can block a process after an unfinished indexing run. 
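The exact names PhyloSift uses for its lock files are not given here, so the sketch below only lists candidates matching hypothetical patterns (anything containing "lock", plus the .nfs/.NFS leftovers that network filesystems create) for a person to review before deleting anything.

```python
from pathlib import Path


def find_candidate_locks(parent_dir, patterns=("*lock*", ".nfs*", ".NFS*")):
    """List files in parent_dir matching hypothetical lock-file patterns.

    Only lists candidates; nothing is deleted. Review the list first,
    since a broad pattern like "*lock*" can match unrelated files.
    """
    parent = Path(parent_dir)
    hits = set()
    for pat in patterns:
        hits.update(p.name for p in parent.glob(pat) if p.is_file())
    return sorted(hits)
```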

Michael Doane

Oct 24, 2017, 10:53:24 AM
to PhyloSift
Hi Guillaume,

So the indexing process worked this time, after deleting all other markers and removing the .NFS files located in the directory. 

The marker_list.txt file has all the bacterial markers but excludes concat. I even renamed the concat files to drop the .updated part, because I read in another post that indexing does NOT look at the .updated files. However, this did not work. 

Output after running phylosift index --debug:
[doane@anthill share]$ ~/phylosift_v1.0.1/./phylosift index --debug
Found 40 markers
Requesting Indexed_markers : 0
Indexing 40 in /home3/doane/share/phylosift/markers
Inside index_marker_db


Files inside DNGNGWU00001
CONTENTS.json       DNGNGWU00001.fasta  DNGNGWU00001.log     DNGNGWU00001.rep  DNGNGWU00001.taxonmap  phylo_modelpGMMqI.json
DNGNGWU00001.clean  DNGNGWU00001.hmm    DNGNGWU00001.masked  DNGNGWU00001.stk  DNGNGWU00001.tree

Files inside Concat
concat.updated.clean     concat.updated.pruned.log  concat.updated.pruned.tree  CONTENTS.json           seq_ids.csv  tax_ids.txt
concat.updated.gene_map  concat.updated.pruned.pda  concat.updated.taxonmap     phylo_modelqvjoLH.json  taxa.csv


Some of the files inside the concat marker directory are different. Will this affect whether indexing works properly?

Thanks!
Mike

Guillaume Jospin

Oct 24, 2017, 11:24:06 AM
to phyl...@googlegroups.com
Oh, don’t change the .updated files; those are for the placement and taxonomy parts of the run. 
Also, the concat marker needs to be in there (for later steps) but won’t necessarily show up in the marker list. 
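A quick sanity check on a pruned marker folder might look like this, assuming the layout described in the thread: at least 40 DNGNGWU marker directories, a concat directory, and a marker_list.txt file (which, per the reply above, need not name concat). The function name and thresholds are hypothetical.

```python
from pathlib import Path


def check_marker_dir(marker_dir, min_dngngwu=40):
    """Return a list of problems; an empty list means every check passed."""
    root = Path(marker_dir)
    subdirs = {p.name for p in root.iterdir() if p.is_dir()}
    problems = []
    n = sum(1 for name in subdirs if name.startswith("DNGNGWU"))
    if n < min_dngngwu:
        problems.append(f"expected at least {min_dngngwu} DNGNGWU markers, found {n}")
    # concat must be present even though marker_list.txt need not name it.
    if "concat" not in subdirs:
        problems.append("missing concat marker directory")
    if not (root / "marker_list.txt").is_file():
        problems.append("missing marker_list.txt")
    return problems
```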

So the command still stalls? No new files get generated?

It should generate new lastal index files. It will probably also show errors when trying to do the RNA steps; you can ignore those. 

Michael Doane

Oct 24, 2017, 12:51:25 PM
to PhyloSift
So here is what is in the marker directory:

concat        DNGNGWU00004  DNGNGWU00009  DNGNGWU00014  DNGNGWU00019  DNGNGWU00024  DNGNGWU00029  DNGNGWU00034  DNGNGWU00039           replast0.bck  replast0.suf  rnadb0.prj  rnadb.fasta
concat.codon  DNGNGWU00005  DNGNGWU00010  DNGNGWU00015  DNGNGWU00020  DNGNGWU00025  DNGNGWU00030  DNGNGWU00035  DNGNGWU00040           replast0.des  replast0.tis  rnadb0.sds  rnadb.prj
DNGNGWU00001  DNGNGWU00006  DNGNGWU00011  DNGNGWU00016  DNGNGWU00021  DNGNGWU00026  DNGNGWU00031  DNGNGWU00036  marker_list.txt        replast0.prj  replast.prj   rnadb0.ssp  taxdmp.zip
DNGNGWU00002  DNGNGWU00007  DNGNGWU00012  DNGNGWU00017  DNGNGWU00022  DNGNGWU00027  DNGNGWU00032  DNGNGWU00037  ncbi_taxonomy.db       replast0.sds  rnadb0.bck    rnadb0.suf  version.txt
DNGNGWU00003  DNGNGWU00008  DNGNGWU00013  DNGNGWU00018  DNGNGWU00023  DNGNGWU00028  DNGNGWU00033  DNGNGWU00038  ncbi_tree.updated.tre  replast0.ssp  rnadb0.des    rnadb0.tis

The replast* files were all generated on my last indexing attempt (their timestamps match the time of the index run). The index did run to completion that time.
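To confirm that an index run actually produced its output, one could check for the replast files seen in the directory listing above: replast.prj plus the replast0 volume with the .bck, .des, .prj, .sds, .ssp, .suf and .tis extensions (these appear to be LAST database files). The helper below is a hypothetical sketch built only from that listing.

```python
from pathlib import Path

# Extensions seen on the replast0 volume in the directory listing.
VOLUME_EXTS = (".bck", ".des", ".prj", ".sds", ".ssp", ".suf", ".tis")


def missing_index_files(marker_dir, db="replast", volume=0):
    """Return the expected index file names that are not present."""
    root = Path(marker_dir)
    expected = [f"{db}.prj"] + [f"{db}{volume}{ext}" for ext in VOLUME_EXTS]
    return [name for name in expected if not (root / name).is_file()]
```

Comparing each file's modification time (`Path.stat().st_mtime`) against the start of the index run would also show whether they are fresh, which is the check described above.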

I think my last concern, then, is that marker_list.txt lacks the concat marker, but as you mention, it should be absent. I will rename the concat files back to include .updated and re-index.

I will attempt a PS run and see how it goes.

Thanks for all the help!
Mike

Michael Doane

Oct 25, 2017, 2:10:39 PM
to PhyloSift
Hey,

I have done a full PS run using only the DNGNGWU markers (+concat) and all seems to run. However, no trees are produced. 

In run_info.txt, all steps are listed as completed. See below:
Chunk 1 sequences processed 20b5b0b6d73b20a830ab3bbeb042afc6
Chunk 1 Search completed 2017-10-24 12:34:08 2017-10-24 15:33:01 10733
Chunk 1 Align completed 2017-10-24 15:33:01 2017-10-24 15:33:14 13
Chunk 1 Place completed 2017-10-24 15:33:14 2017-10-24 15:33:14 0
Chunk 1 Summarize completed 2017-10-24 15:33:14 2017-10-24 15:33:23 9
Chunk 2 sequences processed dcaf5f7c95921f7bd0e232b92c4e0d84
Chunk 2 Search completed 2017-10-24 15:33:23 2017-10-24 17:51:58 8315
Chunk 2 Align completed 2017-10-24 17:51:58 2017-10-24 17:52:12 14
Chunk 2 Place completed 2017-10-24 17:52:12 2017-10-24 17:52:12 0
Chunk 2 Summarize completed 2017-10-24 17:52:12 2017-10-24 17:52:21 9
Chunk 3 sequences processed 3c01814394c7a0a765557b75cee1b6e1
Chunk 3 Search completed 2017-10-24 17:52:21 2017-10-24 20:18:39 8778
Chunk 3 Align completed 2017-10-24 20:18:39 2017-10-24 20:18:53 14
Chunk 3 Place completed 2017-10-24 20:18:53 2017-10-24 20:18:53 0
Chunk 3 Summarize completed 2017-10-24 20:18:53 2017-10-24 20:19:02 9
Chunk 4 sequences processed a7161aa1be467bee6d37f542052ee8d0
Chunk 4 Search completed 2017-10-24 20:19:02 2017-10-24 20:52:13 1991
Chunk 4 Align completed 2017-10-24 20:52:13 2017-10-24 20:52:21 8
Chunk 4 Place completed 2017-10-24 20:52:21 2017-10-24 20:52:21 0
Chunk 4 Summarize completed 2017-10-24 20:52:21 2017-10-24 20:52:27 6
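One way to spot the suspicious step in a log like this is to parse each "completed" line and flag steps whose start and end timestamps are identical. The sketch assumes the nine-field line shape shown above, with the last field being elapsed seconds (it matches the timestamps, e.g. 10733 s = 2 h 58 min 53 s for chunk 1's Search). On this excerpt it flags Place for all four chunks; a zero-second step is not proof of failure, but it fits the empty tree directory described next.

```python
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S"


def parse_run_info(lines):
    """Yield (chunk, step, computed_seconds, logged_seconds) per 'completed' line.

    Expected shape, taken from the run_info.txt excerpt:
    Chunk 1 Search completed 2017-10-24 12:34:08 2017-10-24 15:33:01 10733
    """
    for line in lines:
        f = line.split()
        if len(f) == 9 and f[0] == "Chunk" and f[3] == "completed":
            start = datetime.strptime(f"{f[4]} {f[5]}", STAMP)
            end = datetime.strptime(f"{f[6]} {f[7]}", STAMP)
            yield int(f[1]), f[2], int((end - start).total_seconds()), int(f[8])


def zero_time_steps(lines):
    """Return (chunk, step) pairs whose start and end stamps are identical."""
    return [(c, s) for c, s, secs, _ in parse_run_info(lines) if secs == 0]
```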

In the blast_dir, marker_summary.txt is full of hits, totaling 13495. 

In the align directory, there are alignment files numbered 1, 2, 3, and 4, one for each of the chunks above.

Then in the tree.dir there is nothing.

In addition, all of the standalone summary files are empty. 

It appears that everything runs except the pplacer step, which should place all of these aligned sequences onto the tree. Any thoughts on why manipulating the database would keep this step from going forward?

Mike

Guillaume Jospin

Oct 25, 2017, 2:16:17 PM
to phyl...@googlegroups.com
Hrm. The placement and summary stats are done on the concat marker. Do you have concat files in the treeDir?

Maybe we messed up something in the concat markers. I think you should have both a concat.updated and concat in the marker directory. 
PS is looking for updated versions at this point to get better trees for the placement. 
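Both questions above (is there a concat entry alongside concat.updated in the marker directory, and did anything concat-related reach the treeDir?) can be answered with a small check like this. The function and key names are hypothetical, and it uses exists() so it works whether those entries are files or directories.

```python
from pathlib import Path


def concat_status(marker_dir, tree_dir):
    """Report which concat entries exist and what concat* files tree_dir holds."""
    markers, tree = Path(marker_dir), Path(tree_dir)
    return {
        "concat": (markers / "concat").exists(),
        "concat.updated": (markers / "concat.updated").exists(),
        "tree_concat_files": sorted(p.name for p in tree.glob("concat*"))
        if tree.is_dir() else [],
    }
```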

Michael Doane

Oct 25, 2017, 2:24:46 PM
to PhyloSift
Alright, looking in my marker database, it appears only concat.codon.updated and concat.updated were present. The concat marker itself was not in my marker directory.

I will re-run on a small dataset and hopefully have results soon.

Thanks
mike

Michael Doane

Oct 25, 2017, 2:52:33 PM
to PhyloSift
I have completed a small run after adding the concat marker (which contains only CONTENTS.json) and indexing it. The results are different than before. The marker_summary.txt in the blastdir identifies several hits to each DNGNGWU marker. The aligndir contains an alignment file for each DNGNGWU marker (the marker_summary.txt in this directory shows a count for each DNGNGWU marker), plus concat.codon.updated.1.fasta, concat.updated.1.fasta, and concat16.codon.updated.fasta.

Again, the treedir is empty (no concat files in the treedir) and all standalone summary files are empty.

Michael Doane

Oct 25, 2017, 3:46:23 PM
to PhyloSift
Sorry, in my last message I stated the results were different from before. In fact, the results of the run were NOT different from the previous run. Sorry for the confusion; I wanted to clarify.

Thanks

Michael Doane

Oct 27, 2017, 12:05:10 PM
to PhyloSift
Hi,

I have re-run my test sample using the entire database (downloaded again and left at the defaults), and the two run types create different files in the alignment directory. The marker_summary.txt in the align dir shows the same number of alignments for both runs. The major difference is the presence of the concat.codon.updated.sub1.1.fasta file in this directory.

When the whole database is utilized I get these concat files in the alignment dir:
concat.codon.updated.1.fasta
concat.codon.updated.sub1.1.fasta
concat.updated.1.fasta
concat16.codon.updated.1.fasta

When I run with just the DNGNGWU markers and concat, the only files present are:
concat.codon.updated.1.fasta
concat.updated.1.fasta
concat16.codon.updated.1.fasta

Also, the tree placement step is run just fine when the whole database is utilized.
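The comparison above boils down to a set difference between the two alignment-directory listings. A minimal sketch with the file names from the two lists, assuming the two concat16 entries refer to the same file; for real directories, `os.listdir(path)` could supply each listing.

```python
def compare_listings(full_run, subset_run):
    """Return (only_in_full, only_in_subset), each sorted."""
    full, subset = set(full_run), set(subset_run)
    return sorted(full - subset), sorted(subset - full)


FULL_DB = [
    "concat.codon.updated.1.fasta",
    "concat.codon.updated.sub1.1.fasta",
    "concat.updated.1.fasta",
    "concat16.codon.updated.1.fasta",
]
SUBSET_DB = [
    "concat.codon.updated.1.fasta",
    "concat.updated.1.fasta",
    "concat16.codon.updated.1.fasta",
]

if __name__ == "__main__":
    print(compare_listings(FULL_DB, SUBSET_DB))
    # prints (['concat.codon.updated.sub1.1.fasta'], [])
```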

Mike