Re: Hi!

A. Murat Eren

May 4, 2017, 4:30:19 PM

to Tom Delmont, an...@googlegroups.com, Luke McKay, Özcan Esen

Hi Luke,

CC'ing the relevant parts of the response to the discussion group so it is archived and accessible to others as well.

For your bins, you can use something like this in a loop:

anvi-get-sequences-for-hmm-hits -p PROFILE.db \
-c CONTIGS.db \
-C COLLECTION_NAME \
-b BIN_NAME \
-o OUTPUT.fa \
--hmm-source Campbell_et_al \
--gene-names Ribosomal_L27,Ribosomal_X,Ribosomal_Y \
--return-best-hit \
--get-aa-sequences

You can replace the gene names given to `--gene-names` (Ribosomal_X and Ribosomal_Y are placeholders) with gene names you will find in the first column of this file:

https://github.com/merenlab/anvio/blob/master/anvio/data/hmm/Campbell_et_al/genes.txt

I am sure all the genes you are interested in are already in Campbell_et_al, so I don't think you need a custom HMM to access those ribosomal proteins.

The rest should be rather straightforward, but if you give me the list of gene names you are interested in, I can put together an ad hoc script for you :)
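A minimal sketch of such a loop, in case it is useful. It assumes the bin names are passed by hand (Bin_1 and Bin_2 below are placeholders), that each bin gets its own BIN_NAME.fa output file, and it adds a hypothetical DRY_RUN switch that only prints the commands so they can be checked before anything runs. The flags are the ones from the command above; the function names are made up for this example.

```shell
# Gene names to extract; replace with names from the HMM source you use.
GENES="Ribosomal_L27,Ribosomal_L28,Ribosomal_L3"

# Print the command when DRY_RUN=1 (hypothetical switch), otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: get_seqs_per_bin COLLECTION_NAME BIN_NAME [BIN_NAME ...]
# Writes one FASTA file per bin, named after the bin.
get_seqs_per_bin() {
    collection="$1"
    shift
    for bin in "$@"; do
        run anvi-get-sequences-for-hmm-hits -p PROFILE.db \
            -c CONTIGS.db \
            -C "$collection" \
            -b "$bin" \
            -o "${bin}.fa" \
            --hmm-source Campbell_et_al \
            --gene-names "$GENES" \
            --return-best-hit \
            --get-aa-sequences
    done
}

# Example with placeholder names:
#   DRY_RUN=1 get_seqs_per_bin COLLECTION_NAME Bin_1 Bin_2
```

Running it first with DRY_RUN=1 shows the exact commands without touching the databases.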

Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

On Thu, May 4, 2017 at 3:11 PM, Tom Delmont <tomod...@gmail.com> wrote:

On Thu, May 4, 2017 at 2:58 PM, Luke McKay <mcg...@gmail.com> wrote:
Quick question, what is the easiest way to extract a list of genes from each bin? For example, let's say I want to make a concatenated ribosomal protein tree and I have a list of 16 ribo proteins of interest. Is there a simple way to retrieve this information quickly from a handful of specified bins?
--
Dr. Luke McKay

Postdoctoral Fellow
NASA Astrobiology Program
Department of Land Resources and Environmental Sciences (815 LJH)
Center for Biofilm Engineering (313 Barnard Hall)
Montana State University

Tom Delmont

May 4, 2017, 4:32:35 PM

to A. Murat Eren, an...@googlegroups.com, Luke McKay, Özcan Esen

hehe,

well, yes it is what I meant,

sorry for the confusion,

I should respond to emails only in the morning :)

Tom

Luke McKay

May 4, 2017, 4:32:37 PM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

This is awesome. Thank you!

Luke McKay

May 4, 2017, 5:08:50 PM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

Just a thought for the future...

It would be REALLY cool if I could run that same command (anvi-get-sequences-for-hmm-hits), but also pass something like "--concatenate", and the output would be a fasta file with a separate entry for each bin followed by the desired aa sequences for that bin in the order in which they were requested (the comma separated list after --gene-names). This would be a very powerful tool and also help with downstream alignments for phylogenetic stuff.

Take it or leave it, I'm just being difficult :)

A. Murat Eren

May 4, 2017, 5:13:27 PM

to Luke McKay, Tom Delmont, an...@googlegroups.com, Özcan Esen

I'll look into this.

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

A. Murat Eren

May 4, 2017, 11:36:13 PM

to Luke McKay, Tom Delmont, an...@googlegroups.com, Özcan Esen

Hi Luke,

We now have a `--concatenate` flag for the program `anvi-get-sequences-for-hmm-hits`. Thank you for the suggestion (note to self: the way meren did this currently sucks, but we will fix it later, and it will not affect the user experience).

Here is how I tested it, so you can try it on your own data if you switch to the anvi'o master repository (please let me know if you test it and if things break).

I first downloaded the anvi'o data pack for the infant gut data, which is here:

https://ndownloader.figshare.com/files/8252861

Once it was on my disk, I unpacked it and went into it:

tar -zxvf INFANTGUTTUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL

And imported the collection `merens`:

anvi-import-collection additional-files/collections/merens.txt -p PROFILE.db -c CONTIGS.db -C merens

Then I ran the program `anvi-get-sequences-for-hmm-hits` from the anvi'o master this way:

anvi-get-sequences-for-hmm-hits -p PROFILE.db \
-c CONTIGS.db \
-C merens \
-o OUTPUT.fa \
--hmm-source Campbell_et_al \
--gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
--return-best-hit \
--get-aa-sequences \
--concatenate

This instructs anvi'o to get all three of those ribosomal proteins as amino acid sequences, and to concatenate them for each bin in the collection merens (you can specify your bins, too, in which case it will not go through the entire collection). This is the resulting file (the alignments suck because there is a eukaryotic genome bin in this collection):

cat OUTPUT.fa
>P_rhinitidis|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
------------------------------------------MKYLVGKKIGMTQI----------FDEEGTVTPVSVIEVEPNVVVQKKTIESDGYNAIQVATQEVKEK--------KL
NKPQKGHLDKAGVGYKKHLSEFRTDDVD-SYNLG---------------------------------------------------DEIKVD-IFEVAEHVDVVGTSKGKGTAGVIKRHNF
GRGRETHG-SKFHRMPGGMGAASYPGKVFKNHRMAGKMGNERVTVQNLEIVRI------------------------------DTDKNLILVKGAIPGPKKGTVKIKSTVKLTK------
-------------------------------------XXX-MIKFDLLLFSS----------------------KKGAGSSKNGRDSNSKRLGVKRGDGQFVLAGNILVRQRGTKIHPGE
NVMKGSDDTLFATADGVLRF----------------------------TTKGKGG-----------------------------------------------------------------
----KKFANVY-------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------VEEKVEAXXX-----------------------------------------------
--------------------------------------MAKRCEICGKEKTFGNKISFSHSRSNRSWSPNLRKVKAIVN--GSPKRIYVCTRCLRS------------------------
-----------------------------------------------------GKVERAI--------------------------------------
>C_albicans|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
MSHRKYEAPRHGSLGFLPRKRAAKQRGRVKSFPKDVKSKPVALTAFLGYKAGMTTIVRDLDRPGSKMHKREVVEAATVVDTPPMVVVGVV-----GYVETPRGLRSLTTVWAEHLSEEVR
RRFYKNWYKSKKKAFTKYSGKYATDAKQVETELARIKKYASVVRVLAHTQIKKTPLSQKKAHLAEIQINGGSVSDKVDWAKEHFEKEVSVDSVFEQDEMIDVIAVTKGHGFEGVTHRWGT
KKLPRKT--HRGLRKVACIG-AWHPANVNWTVARAGQNGYHHRTSINHKVYRVGKGTDEANGATEFDRTKKTINPMGGFVRYGNVNNDFVLLKGSIPGVKKRVVTLRKSLYVDTSRRAVE
KVNLKWIDTASRFGKGRFQTPAEKHAFMGTLKKDLENXXX-MSSFVKGLFSHTRKSIDLTSNPLHTSIQIRTAKKRVSGSRTNNKDSAGRRLGPKKNEGHFVNPGQIIMRQRGTKIHPGD
NVKIGVDHTIFAVEPGYVRYYFDPFHPLRKYVGVSLKKNLKLPRPHFEPRLRRFGYVQITDPIEAQEEEASQSRKEMLAQPELEKLKEKKLNEKIQFIESTKTALVNEFGFDSEPSSKQL
EDASERLYNIYQLRASGQLLSEARIQTTFNTLYDLKLQAQKNNIDSLPNLLNEAKEFITRIDSIVGIEPTGELFKNLTKEEQLNLQKEISSELDTLYQTKALEKDYRIEAKKLINTPGVF
EPLQREELMAKYLPQVLPMDYPGSIIEISDSDSKNKNKKLSENIVIQRIFDETTRKVKLIGRPKEAFASAXXXMNVFRGLISIPRISCVSQIYSARQLSSTLPLSTKRTYDKFYKITKQL
QPIDKNVYEIGQERPDNISIPKDLPEFPKYEYEPRFFKRQNRGLYGGLQRKRSKSCSEYLNKTLRAHRPNAQWTKLWSETLNKRLRLRVATRVLKTISKEGGLDQYLLKSTPARVKTMGL
KAWQLRYRILQEREQKQRGNVTLLDGTTKPIQYISSNGLKFHATKDAMLSELYEAVQRDSYYPIKPFHFERDYSWLSYEEIVKKLEQYNWDFSELATK
>F_magna|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
------------------------------------------MKSILGKKIGMTQI----------FNEDGSVVPVTVIEAGPMVVTQIKTKEKEGYNAIQVGYIEKKEK--------HV
NQPMRGHFGKAGVSFKKHLQEFRIGDDE-QFNLG---------------------------------------------------DEIKSD-IFQDGDVVDVIGISKGKGTQGAIVRHNY
SRGPMGHG-SKSHRVAGARSAGSYPARVFKGRKGSGKMGHDRVTVQNLKIVKV------------------------------DNERNLLLIKGAVPGNKGGVVTVREAIKSK-------
-------------------------------------XXXMMIKLDLQLFSS----------------------KKGVSSTKNGRDSESKRLGTKKGDGQYVLAGNILVRQRGTKIHPGN
NVGKGGDDTLFTKIDGVVKF----------------------------ERIGKN------------------------------------------------------------------
----RKQVSVY-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------PKEAXXX-----------------------------------------------
------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
>Aneorococcus_sp|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
------------------------------------------MKSIFTTKVGMTQV----------IDEDGVVTPVTVLKADENVVVQVKTEETDGYNAVQIGYMDKKEK--------NV
KKPVKGHFDKAGASYKRYLKEVNYGNDPIELAVG---------------------------------------------------DKLAVD-IFEAGEVVDVVATSKGKGTQGAI-----
------------------------------------------------------------------------------------------------------------------------
-------------------------------------XXX--------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------XXX-----------------------------------------------
------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
>E_facealis|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
-----------------------------------------MTKGILGKKVGMTQI----------FTESGELIPVTVVEATPNVVLQVKTVETDGYEAIQVGYQDKREV--------LS
NKPAKGHVAKANTAPKRFIKEFKNVELG-EYEVG---------------------------------------------------KEIKVD-VFQAGDVVDVTGTTKGKGFQGAIKRHGQ
SRGPMSHG-SRYHRRPGSMG-PVAPNRVFKNKRLAGRMGGDRVTIQNLEVVKV------------------------------DVERNVILIKGNIPGAKKSLITIKSAVKAK-------
-------------------------------------XXXMLLTMNLQLFAH----------------------KKGGGSTSNGRDSESKRLGAKSADGQTVTGGSILYRQRGTKIYPGV
NVGIGGDDTLFAKVDGVVRF----------------------------ERKGRD------------------------------------------------------------------
----KKQVSVY-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------PVANXXX-----------------------------------------------
--------------------------------------MAKVCYFTGRKTSSGNNRSHAMNSTKRTVKPNLQKVRVLID--GKPKKVWVSTRALKS------------------------
-----------------------------------------------------GKIERV---------------------------------------
>S_aureus|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
-------------------------------------XXX-MLKLNLQFFAS----------------------KKGVSSTKNGRDSESKRLGAKRADGQFVTGGSILYRQRGTKIYPGE
NVGRGGDDTLFAKIDGVVKF----------------------------ERKGRD------------------------------------------------------------------
----KKQVSVY-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------AVAEXXX-----------------------------------------------
--------------------------------------MGKQCFVTGRKASTGNRRSHALNSTKRRWNANLQKVRILVD--GKPKKVWVSARALKS------------------------
-----------------------------------------------------GKVTRV---------------------------------------
>S_epidermidis|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
-----------------------------------------MTKGILGRKIGMTQV----------FGENGELIPVTVVEASQNVVLQKKTEEVDGYNAIQVGFEDKQAYKKGSKSNKYA
NKPAEGHAKKADTAPKRFIREFRNVNVD-EYEVG---------------------------------------------------QEVSVD-TFETGDIIDVTGVSKGKGFQGAIKRHGQ
GRGPMAHG-SHFHRAPGSVGMASDASKVFKGQKMPGRMGGNTVTVQNLEVVQV------------------------------DTENSVILVKGNVPGPKKGLVEITTSIKKGNK-----
-------------------------------------XXX-MLKLNLQFFAS----------------------KKGVSSTKNGRDSESKRLGAKRADGQYVSGGSILYRQRGTKIYPGE
NVGRGGDDTLFAKIDGVVKF----------------------------ERKGRD------------------------------------------------------------------
----KKQVSVY-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------AVAEXXX-----------------------------------------------
--------------------------------------MGKQCFVTGRKASTGNHRSHALNANKRRWNANLQKVRILVD--GKPKKVWVSARALKS------------------------
-----------------------------------------------------GKVTRV---------------------------------------
>P_avidum|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
------------------------------------MTNERTVKGVLGTKLGMTQL----------WDEHNKLVPVTVIQAGPCVVTQVRTPETDGYSAVQLGIGAVKAK--------KV
TKPEAGHFEKAGVTPRRHLVELRTADAS-EYTLG---------------------------------------------------QEITAD-VFSESDFVDVTGTSKGKGTAGVMKRHGF
GGLRATHGVHRKHRSPGSIGGCSTPGKVIKGLRMAGRMGAERVTVQNLQVHSV------------------------------DAERGIMLVRGAVPGPKGSLLVVRSAAKKAAKNGDAA
-------------------------------------XXX---------MAH----------------------KKGASSSRNGRDSNAQRLGVKRFGGQLVNAGEIIVRQRGTHFHPGD
GVGRGGDDTLFALRDGNVEF---------------------------GTRRG--------------------------------------------------------------------
----RKIVNVN-------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------PVEVPVEAXXX-----------------------------------------------
--------------------------------------MSRRCQVRGTKPGFGNNVSHSQRHTKRRWNPNIQKKRYWVPSLGRQVTLTLTPKAMKEIDRRG-------------------
--------------------------------------------VDVVIAEMLARGEKI---------------------------------------
>S_hominis|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
-----------------------------------------MTKGILGRKIGMTQV----------FGENGELIPVTVVEANQNVVLQKKTEEVDGYNAIQVGFADKQAYKKDAKSNKYA
NKPAEGHAKKAGAAPKRFIREFRNVNVD-EYEVG---------------------------------------------------QEVTVD-TFEAGDIIDVTGTSKGKGFQGAIKRHGQ
GRGPMAHG-SHFHRAPGSVGMASDASRVFKGQKMPGRMGGNTVTVQNLEVVQV------------------------------DTDNNVILVKGNVPGPKKGFVEIKSSIKKGNK-----
-------------------------------------XXX-MLKLNLQFFAS----------------------KKGVSSTKNGRDSESKRLGAKRADGQFVTGGSILYRQRGTKIYAGE
NVGRGGDDTLFAKIDGVVRF----------------------------ERKGRD------------------------------------------------------------------
----KKQVSVY-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------AVAEXXX-----------------------------------------------
------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
>L_citreum|genes:Ribosomal_L3,Ribosomal_L27,Ribosomal_L28|separator:XXX
-----------------------------------------MTKGILGRKVGMTQV----------FTESGELIAVTAVEATPNVVLQVKNIATDGYNAIQLGYQDKRTV--------LS
NKPEQGHASKANTTPKRYVREVRDAEG--EFNAG---------------------------------------------------DEIKVD-TFQAGDYVDVTGITKGHGFQGAIKKLGQ
SRGPMAHG-SRYHRRPGSMGAII--NRVFKGKLLPGRMGNNKRTMQNVAIVHV------------------------------DVENNLLLLKGNVPGANKSLLTIKSTVKVN-------
-------------------------------------XXX--------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------XXX-----------------------------------------------
------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
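For anyone who wants to pull individual genes back out of a file like the one above, here is a rough sketch. It assumes the header layout visible in this output (>BIN|genes:...|separator:XXX), that the gene list in the header matches the order of the concatenated blocks, and that the separator string never occurs inside a real sequence, which may not hold for proteins that contain runs of X. The function name is made up for this example.

```shell
# Read a concatenated FASTA on stdin and print one tab-separated line per
# bin and gene: bin name, gene name, sequence with alignment gaps removed.
split_concatenated_fasta() {
    awk '
    # Emit the record collected so far, one line per gene.
    function emit(   n, i, genes, parts, s) {
        if (name == "") return
        n = split(genelist, genes, ",")
        split(seq, parts, sep)
        for (i = 1; i <= n; i++) {
            s = parts[i]
            gsub(/-/, "", s)          # drop gap characters
            printf "%s\t%s\t%s\n", name, genes[i], s
        }
    }
    /^>/ {
        emit()
        split(substr($0, 2), h, "|")  # BIN | genes:... | separator:...
        name = h[1]
        genelist = h[2]; sub(/^genes:/, "", genelist)
        sep = h[3];      sub(/^separator:/, "", sep)
        seq = ""
        next
    }
    { seq = seq $0 }                  # sequence lines may wrap
    END { emit() }
    '
}

# Example: split_concatenated_fasta < OUTPUT.fa
```

A gene that is missing from a bin shows up with an empty third column, since its block contains only gap characters.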

Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

Luke McKay

May 10, 2017, 7:54:15 PM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

Hi Meren,

Thank you so much for working on this--this is an incredibly powerful option and makes phylogenomics way faster/easier! I switched to master and tried the new --concatenate-genes option for anvi-get-sequences-for-hmm-hits. It went through on the first try without any errors, but there is one kind of important issue... the genes are not concatenated in the same order as is written after --gene-names in the command. This part is crucial for downstream alignments, especially those where you align your sequences with someone else's published alignment which is typically the case (they need to be in the same order).

That said, I couldn't be happier that this is even being considered--so thanks!!

Cheers,

Luke

For reference, here is the command I ran:

anvi-get-sequences-for-hmm-hits -c contigs.db -p PROFILE.db/PROFILE.db -C tSNE_5000 -b C000137_1 --gene-names Ribosomal_L15,Ribosomal_S10,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L18,Ribosomal_L6,Ribosomal_S8,Ribosomal_L5,Ribosomal_L24,Ribosomal_L14,Ribosomal_S17,Ribosomal_S3,Ribosomal_L22,Ribosomal_S19,Ribosomal_10 --return-best-hit --hmm-source Rinke_et_al --get-aa-sequences -o C137_RiboProt_Concat.fa --concatenate

And here is the resulting fasta:

>C000137_1|genes:Ribosomal_L5,Ribosomal_L4,Ribosomal_L22,Ribosomal_S19,Ribosomal_L3,Ribosomal_L14,Ribosomal_S8,Ribosomal_L6,Ribosomal_S17|separator:XXX

MNPMREVRVEKVTLNIGVGEGGEKLSKAETLLERLTGQKPARTLAKKTVREFGMRKGEPVGVKVTLRGRRAEEMLPKLLQAVDNRLSSRSFDGQGNFSFGIKEYIDLPGVRYDPEIGMFG

MDVCVTLERPGYRVKRRKRQRRRIPSSHAITREEAIKFVTEKWGVTVVEXXXMSWRVKVPVFSLEGSPVGELELPEFFQEEYRPDLIRRAVLSSQTARLQPKGVDPMAGKRTSAETWGKG

YGVARVRRVKGRGYPAAGRGAFAPHTVGGRRAHPPKVEKVLKERINRKEKRLALRSAVAATARAELVRSRGHLTEGVPSLPLVVEDRLEELSETKRVKEVFQKLGLWKEVERVSSGIKIR

AGRGKMRGRRYRSPVGPLVVVSKDRGIKKGAGNLPGVKVVEARNLGVEDLAPGGMPARLTVWTPSALQELGERVXXXMPRFGYSCKVEEPCARAMGKELRISPKDAVEICREIRGMGLRE

AQSYLQEVAKGKRAVPFRRHAKKVAHHRGTGGAGAYPVKAAKAILQVLKNAEANAVYKGLNTEKLRVVHASASKGITLPGILPRAFGRATPYNKPLTNVQIVLKEAXXXMPKKFTYRGYT

LEELKKLSLEEFAKLLPSRQRRSLLRMLRGTHPEARKLLAKVRKLSKKGNGETIIRTHRREMIIVPEMVGLKFGVYNGKEFVVVEVKPEMIGHRLGEFSQTRKKVVHGAPGIGATRSSMF

VPLKXXXMGRKAHRPRRGSLAYTPRVRAPSPLPHIRSWPEREETRLLGFAGYKAGMTHVFYLDDRKTSPTAGKEVFSPATVIEAPPLRVWGLRLLGRTSRGLKTLTEVWAENLPKELGRV

LSLPKKSDPERMKRAEELVREGKVEEVRVLACTQPELCSGPKKKPDVLELGVGGKDVGSKWEYAKSLLGKEVRASEVFKPGEYVDVTAITKGKGFQGPVKRWGVKILPRKTDEGRRQVGT

LGPWTPARIMWTVPAAGQMGYHQRTELNKRILKIGRGEEITPKGGFKRWGPVRSDYVLVAGSLPGPTKRLVHLRLARRMPEAGGVPTLTYVASTGAGGXXXMGKRAGGIMATGMAAPVRA

LPIGARLVCADNTGAKEVQIISVVGLRGARRRLVSAGVGDQVVVSVKKGTPELRKQIVRAVVVRQRKPYRRADGLRVKFEDNAVAIIAPDGSPKGSEIRGPIAKEAAERWPKLAGIAAMV

IXXXMLLDPLANALSKINNYEKAGKREVFIFPSSKLIEGVLGVFREEGYIEGFEREGEGLRVRLAGRINRCGAVKPRHSLKKGEYEKWEKRFLPASGLGTIVITSSQGIMSLAKAKERGI

GGRLLAYVYXXXMVVEEIEIPEGVEVKIEGGRVEVSGPKGKVVRNLSLRDLEVRREGNKVRIYSRSDKRKVKAMAGTVKAHLRNAFKGVTQGFVYKLKIVYSHFPITVKVEGDRVTIHNL

LGEKVPRVARIVGGAEVEVVGDEILVRGVDKEEVGQTALNIEQASRPKGKDPRVFQDGCYLFERGXXXMKSVLGVKPPEATCSDPRCPFHGNLSVRGRVLEGTVVSDKRAKTVTVEIPRV

QRVRKYERLEKRTSRIHAHNPPCLNARVGDRVKIAECRRLSKTKAFVVIAKEGG

A. Murat Eren

May 11, 2017, 4:27:28 PM

to Luke McKay, Tom Delmont, an...@googlegroups.com, Özcan Esen

Hi Luke,

I just fixed it in the master repo. If you run the same command, the gene order should be preserved.

Can you please let us know whether it is working properly if you have a chance to test it?
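A quick way to check this, sketched under the assumption that the FASTA header keeps the >BIN|genes:...|separator:XXX layout seen earlier in this thread. The function name is hypothetical. It compares the gene order in each header against the order given to `--gene-names`, ignoring requested genes that were not found, and lists those separately.

```shell
# Usage: check_gene_order REQUESTED_GENES < concatenated.fa
# REQUESTED_GENES is the comma-separated list passed to --gene-names.
# Prints: bin name, order-ok or order-differs, and the missing genes.
# Returns non-zero if any header deviates from the requested order.
check_gene_order() {
    awk -v requested="$1" '
    BEGIN { nreq = split(requested, req, ",") }
    /^>/ {
        split(substr($0, 2), h, "|")
        found = h[2]; sub(/^genes:/, "", found)

        # Record which genes this header reports.
        split("", present)
        nf = split(found, f, ",")
        for (i = 1; i <= nf; i++) present[f[i]] = 1

        # Requested order restricted to the genes that were found.
        expected = ""; missing = ""
        for (i = 1; i <= nreq; i++) {
            if (req[i] in present)
                expected = expected (expected == "" ? "" : ",") req[i]
            else
                missing = missing (missing == "" ? "" : ",") req[i]
        }

        verdict = (expected == found) ? "order-ok" : "order-differs"
        if (expected != found) bad = 1
        printf "%s\t%s\tmissing:%s\n", h[1], verdict, (missing == "" ? "none" : missing)
    }
    END { exit bad }
    '
}
```

For example, `check_gene_order Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 < OUTPUT.fa` would report one line per bin.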

Thanks!

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

Luke McKay

May 12, 2017, 10:48:45 AM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

Hi Meren,

I updated my codebase from the master and tried again. It looks like the protein order in the resulting fasta has changed, but unfortunately it still is not consistent with the order listed in the command. I tried this using a single bin, the full collection, and each with Rinke and with Campbell as the sources. Below is what I tried for a specified bin.

command:

Luke$ anvi-get-sequences-for-hmm-hits -c contigs.db -p PROFILE.db/PROFILE.db -C tSNE_5000 -b C000137_1 --gene-names Ribosomal_L15,Ribosomal_S10,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L18,Ribosomal_L6,Ribosomal_S8,Ribosomal_L5,Ribosomal_L24,Ribosomal_L14,Ribosomal_S17,Ribosomal_S3,Ribosomal_L22,Ribosomal_S19,Ribosomal_10 --return-best-hit --hmm-source Rinke_et_al --get-aa-sequences -o C137_RiboProt_Concat_update.fa --concatenate

And the resulting fasta:

>C000137_1|genes:Ribosomal_L14,Ribosomal_L22,Ribosomal_L3,Ribosomal_L4,Ribosomal_L5,Ribosomal_L6,Ribosomal_S17,Ribosomal_S19,Ribosomal_S8|separator:XXX

MGKRAGGIMATGMAAPVRALPIGARLVCADNTGAKEVQIISVVGLRGARRRLVSAGVGDQVVVSVKKGTPELRKQIVRAVVVRQRKPYRRADGLRVKFEDNAVAIIAPDGSPKGSEIRGP

IAKEAAERWPKLAGIAAMVIXXXMPRFGYSCKVEEPCARAMGKELRISPKDAVEICREIRGMGLREAQSYLQEVAKGKRAVPFRRHAKKVAHHRGTGGAGAYPVKAAKAILQVLKNAEAN

AVYKGLNTEKLRVVHASASKGITLPGILPRAFGRATPYNKPLTNVQIVLKEAXXXMGRKAHRPRRGSLAYTPRVRAPSPLPHIRSWPEREETRLLGFAGYKAGMTHVFYLDDRKTSPTAG

KEVFSPATVIEAPPLRVWGLRLLGRTSRGLKTLTEVWAENLPKELGRVLSLPKKSDPERMKRAEELVREGKVEEVRVLACTQPELCSGPKKKPDVLELGVGGKDVGSKWEYAKSLLGKEV

RASEVFKPGEYVDVTAITKGKGFQGPVKRWGVKILPRKTDEGRRQVGTLGPWTPARIMWTVPAAGQMGYHQRTELNKRILKIGRGEEITPKGGFKRWGPVRSDYVLVAGSLPGPTKRLVH

LRLARRMPEAGGVPTLTYVASTGAGGXXXMSWRVKVPVFSLEGSPVGELELPEFFQEEYRPDLIRRAVLSSQTARLQPKGVDPMAGKRTSAETWGKGYGVARVRRVKGRGYPAAGRGAFA

PHTVGGRRAHPPKVEKVLKERINRKEKRLALRSAVAATARAELVRSRGHLTEGVPSLPLVVEDRLEELSETKRVKEVFQKLGLWKEVERVSSGIKIRAGRGKMRGRRYRSPVGPLVVVSK

DRGIKKGAGNLPGVKVVEARNLGVEDLAPGGMPARLTVWTPSALQELGERVXXXMNPMREVRVEKVTLNIGVGEGGEKLSKAETLLERLTGQKPARTLAKKTVREFGMRKGEPVGVKVTL

RGRRAEEMLPKLLQAVDNRLSSRSFDGQGNFSFGIKEYIDLPGVRYDPEIGMFGMDVCVTLERPGYRVKRRKRQRRRIPSSHAITREEAIKFVTEKWGVTVVEXXXMVVEEIEIPEGVEV

KIEGGRVEVSGPKGKVVRNLSLRDLEVRREGNKVRIYSRSDKRKVKAMAGTVKAHLRNAFKGVTQGFVYKLKIVYSHFPITVKVEGDRVTIHNLLGEKVPRVARIVGGAEVEVVGDEILV

RGVDKEEVGQTALNIEQASRPKGKDPRVFQDGCYLFERGXXXMKSVLGVKPPEATCSDPRCPFHGNLSVRGRVLEGTVVSDKRAKTVTVEIPRVQRVRKYERLEKRTSRIHAHNPPCLNA

RVGDRVKIAECRRLSKTKAFVVIAKEGGXXXMPKKFTYRGYTLEELKKLSLEEFAKLLPSRQRRSLLRMLRGTHPEARKLLAKVRKLSKKGNGETIIRTHRREMIIVPEMVGLKFGVYNG

KEFVVVEVKPEMIGHRLGEFSQTRKKVVHGAPGIGATRSSMFVPLKXXXMLLDPLANALSKINNYEKAGKREVFIFPSSKLIEGVLGVFREEGYIEGFEREGEGLRVRLAGRINRCGAVK

PRHSLKKGEYEKWEKRFLPASGLGTIVITSSQGIMSLAKAKERGIGGRLLAYVY

A. Murat Eren

May 12, 2017, 11:02:18 AM

to Luke McKay, Tom Delmont, an...@googlegroups.com, Özcan Esen

My tests are failing me again? :/

Can you please send me your profile and contigs dbs so I can take another look?

Sorry for the inconvenience.

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

Luke McKay

May 12, 2017, 11:05:41 AM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

Yes...

A. Murat Eren

May 12, 2017, 4:36:05 PM

to Luke McKay, Tom Delmont, an...@googlegroups.com, Özcan Esen

Hi Luke,

Thank you very much for making the files available. I took a look at things using them, and I discovered multiple things. First, I did make a mistake (which is a previously published finding, but still very interesting that we see that over and over again): I changed the library, but forgot to commit my changes in the program. It is done now (please git pull). Second, some of the gene names you had did not occur in the HMM source (for instance, Rinke et al does not have a Ribosomal_L15, but it has Ribosomal_L15e, etc). Please make sure you compare your gene names with what is in the HMM source by first running the same command line with the `--list-available-gene-names` flag. Although, this could still be my mistake given the first finding ;)
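To catch such mismatches before running the extraction, a sketch like the following may help. It assumes you have a local copy of the HMM source's genes.txt with gene names in the first column (as in the Campbell_et_al file linked earlier in this thread; that other sources ship the same layout is an assumption). The function name is hypothetical, and `--list-available-gene-names` remains the authoritative check.

```shell
# Usage: missing_gene_names REQUESTED_GENES path/to/genes.txt
# REQUESTED_GENES is the comma-separated list meant for --gene-names.
# Prints every requested name that is absent from the first column.
missing_gene_names() {
    awk -v requested="$1" '
    { available[$1] = 1 }             # first column holds the gene name
    END {
        n = split(requested, req, ",")
        for (i = 1; i <= n; i++)
            if (!(req[i] in available)) print req[i]
    }
    ' "$2"
}
```

Empty output means every requested name exists in the file; any name printed needs to be corrected (for example Ribosomal_L15 versus Ribosomal_L15e) before it can be extracted.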

Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

Luke McKay

May 12, 2017, 4:55:55 PM

to A. Murat Eren, Tom Delmont, an...@googlegroups.com, Özcan Esen

Thanks very much for this, Meren! Good point about the different protein names in Campbell vs Rinke, I forgot about that.
