Re: [dartR] Phylogenetic tree from the DArTseq SNP data

Peter Unmack

May 29, 2023, 1:16:09 AM
to da...@googlegroups.com
To make a phylogenetic tree you need to use the sequence data, not
distances.

gl2fasta(gl, method=3, outfile="filename.fas", outpath=getwd())

If you create a field in your metadata file called phylo_label you can
define the labels for each sequence and have them included by running
these commands prior to using the gl2fasta command above.

pop(gl)<- rep("",nInd(gl))
indNames(gl)<- gl@other$ind.metrics$phylo_label

Once you have the FASTA file, you need to import or convert it so you
can use RAxML or a Bayesian program to run a phylogenetic analysis.

Cheers
Peter Unmack

On 29/05/2023 3:07 pm, Divya SHAJI wrote:
> Hi,
>
> I would like to construct a phylogenetic tree from the DArTseq SNP data.
> I am using the following code, but I am not sure whether it is correct.
>
>> library(dartR)
>
>> gl <- gl.read.dart(filename="/home/user/Desktop/testset.csv")
>
>> saveRDS(gl, file="gl.Rdata")
>
>> gl <- readRDS("gl.Rdata")
>
> For creating a distance matrix
>
> D <- utils.dist.ind.snp (gl, method='Euclidean',scale=TRUE)
>
> capture.output (D, file = "output.tab", append = TRUE)
>
>
> How do I create a phylogenetic tree from the distance matrix?
>
> Thank you so much for your time and consideration.
>
> Best regards,
>
> Divya.

Divya SHAJI

May 29, 2023, 2:53:55 AM
to dartR
Hi Peter.

Thank you for your reply.

I used the gl2fasta function to obtain the fasta file. However, all sequences have the same header. Is it possible to modify the header based on AlleleID?

>_
GNTACTCGTNGTCNTGTCAGCATTAGGTGGAGATTGTCCCTNATATTCGTGCGCAATTGTTNNAGCGANGCAGAACGGNGTNGGGCCNGCCGTNTAGACNAACCCTNGCCAGTAGNNNTNACGAAAAGGCNGNAGACACGCNCANCGCCCAGACCCTCGCGGAGGCGNCGGCAAATNNNCTTTCTCTTGNAARTTNNAAGGACTAGCAAAAATGCTTGAATATATNCCCGTTCCGGTTTTNCCTTCCTANTGGCA  
>_
GNTACTCGTNGTCTTGTCAGCATTAGGTGGAGATTGTCMCTNATATTCGTGCGCAATTGTTNNAGCGANGCAGAACGGNGTNGGGCCNGCCGTNTAGACGAASCCTGGCCAGTAGAGNTNNCGAAAAGGCNGNANACACGCNCANCGCCCAGACCCTCGCGGAGGCGNCGRCAAATNNNCTTTCTCTTGNAAAYNNNAAGGACTAGCAAAAATGCTTGAATATATNCCCGTTCCGGTTTTNCCTTCNTANTGGCA  

Divya.
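
A quick way to confirm the symptom described above is to read the headers back from the FASTA file and count how many are distinct. This is a minimal base-R sketch, not part of dartR: the temporary file and the shortened sequences are made up for illustration, and in practice `fasta_path` would point at the file written by gl2fasta.

```r
# Write a tiny FASTA file that mimics the problem: every record is named ">_".
# (Illustrative data only; replace fasta_path with your own gl2fasta output.)
fasta_path <- tempfile(fileext = ".fas")
writeLines(c(">_", "GNTACTCG", ">_", "GNTACTCA"), fasta_path)

# Header lines start with ">"; everything else is sequence.
fasta_lines <- readLines(fasta_path)
headers <- sub("^>", "", fasta_lines[startsWith(fasta_lines, ">")])

n_records  <- length(headers)
n_distinct <- length(unique(headers))

# Flag the file if headers repeat or carry no real name.
has_problem <- n_distinct < n_records || any(headers %in% c("", "_"))
print(has_problem)
```

If `has_problem` is TRUE, the individual names (and possibly the population names) were empty when the file was written.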

Arthur Georges

May 29, 2023, 6:04:30 PM
to da...@googlegroups.com
Hi Divya,

The options for doing a distance phylogeny using dartR are based on the script gl2phylip. This script will generate an input file for Felsenstein's PHYLIP workflow, including bootstrapping your tree.

If the trees generated in this way are to reflect a pattern of ancestry and descent, the terminals need to be well-defined entities that are not subject to recent cross-entity gene flow, hybridization or introgression. This approach should certainly not be applied to individuals.

If you are interested in doing a phylogeny on the SNPs as characters, then SNAPPER is probably your best option (https://github.com/rbouckaert/snapper). We do not yet have a gl2snapper, but gl2snapp might get you most of the way there.

Best. A

Bernd.Gruber

May 29, 2023, 7:01:16 PM
to da...@googlegroups.com

Hi Divya,

I assume you have not named your loci. You can test this with

locNames(gl)

If this returns empty/NULL, you could name them as follows, assuming the AlleleID is in your metadata:

locNames(gl) <- gl@other$loc.metrics$AlleleID

Then rerun gl2fasta.

Cheers, Bernd

Dr Bernd Gruber
Professor Ecological Modelling
Institute for Applied Ecology
Faculty of Science and Technology
University of Canberra ACT 2601 AUSTRALIA

Divya SHAJI

May 29, 2023, 8:24:48 PM
to da...@googlegroups.com
Dear Dr. Bernd Gruber,

Thank you so much for your help.

locNames(gl) returned the AlleleIDs. However, the FASTA headers remain the same.

> locNames(gl)
  [1] "100049687-12-A/G" "100049698-16-C/T" "100049728-23-T/G"
  [4] "100049805-56-A/T" "100049816-51-C/T" "100049839-39-A/T"
  [7] "100049926-33-C/T" "100049990-20-G/T" "100050079-57-T/G"

> pop(gl)<- rep("",nInd(gl))
> indNames(gl)<- gl@other$ind.metrics$phylo_label
> gl2fasta(gl, method=3, outfile="filename.fas", outpath=getwd())
Starting gl2fasta
  Processing genlight object with SNP data
  Warning: Dataset contains monomorphic loci which will be included in the output fasta file
  Assigning ambiguity codes to heterozygote SNPs, concatenating SNPs
  Removing loci for which SNP position is outside the length of the trimmed sequences
Generating haplotypes ... This may take some time
Completed: gl2fasta
NULL

Regards,
Divya.

Bernd.Gruber

May 29, 2023, 8:42:13 PM
to da...@googlegroups.com

Hi Divya,

My mistake, gl2fasta works by individuals.

Can you make sure that there is no typo? Running

gl@other$ind.metrics$phylo_label

should produce the individual names. Can you check that?

Divya SHAJI

May 29, 2023, 8:46:32 PM
to da...@googlegroups.com
Hi Dr.Bernd Gruber,

It produced NULL

> gl@other$ind.metrics$phylo_label
NULL

Divya

Bernd.Gruber

May 29, 2023, 8:49:26 PM
to da...@googlegroups.com

Then your phylo_label is not set correctly. You need to find a field that holds the labels, or just assign generic names:

indNames(gl) <- paste0(“Ind_”,1:nInd(gl))
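
The "use the label column if it exists, otherwise fall back to generic names" idea can be sketched in base R. `pick_labels` is a hypothetical helper written for this illustration, not a dartR function, and the `metrics` data frame stands in for `gl@other$ind.metrics`.

```r
# Hypothetical helper: return the requested metadata column as labels, or
# generic "Ind_<i>" names when the column is missing, has the wrong length,
# or contains NA / empty entries.
pick_labels <- function(ind_metrics, field, n) {
  labels <- ind_metrics[[field]]   # NULL when the column does not exist
  if (is.null(labels) || length(labels) != n ||
      anyNA(labels) || any(labels == "")) {
    labels <- paste0("Ind_", seq_len(n))
  }
  as.character(labels)
}

# Stand-in for gl@other$ind.metrics with no phylo_label column.
metrics <- data.frame(id = c("a", "b", "c"), pop = c("p1", "p1", "p2"))
print(pick_labels(metrics, "phylo_label", nrow(metrics)))

# Once the column exists, it is used as-is.
metrics$phylo_label <- c("x", "y", "z")
print(pick_labels(metrics, "phylo_label", nrow(metrics)))
```

With a genlight object loaded, the equivalent call would presumably be `indNames(gl) <- pick_labels(gl@other$ind.metrics, "phylo_label", nInd(gl))`.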

Divya SHAJI

May 29, 2023, 9:07:08 PM
to da...@googlegroups.com
Hi Dr.Bernd,

> indNames(gl) <- paste0(“Ind_”,1:nInd(gl))locNames(gl)
Error: unexpected input in "indNames(gl) <- paste0(“"

I applied the following code. The output is a FASTA file, but the sequence headers are all the same.

> library(dartR)

> gl <- gl.read.dart(filename="/home/user/Desktop/testset.csv")

> saveRDS(gl, file="gl.Rdata")

> gl <- readRDS("gl.Rdata")

> pop(gl)<- rep("",nInd(gl))

indNames(gl)<- gl@other$ind.metrics$phylo_label

> gl2fasta(gl, method=3, outfile="filename.fas", outpath=getwd())


Thank you for your help.


Regards,

Divya.



Bernd.Gruber

May 29, 2023, 10:45:11 PM
to da...@googlegroups.com

You need to retype the quotes. If you copy the code from the email, the quotes become the wrong type of character.

Cheers, Bernd
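
To make the quote problem concrete: R only accepts straight quotes (`"` or `'`) around strings, and the curly quotes that email clients substitute trigger the "unexpected input" error. The sketch below shows the straight-quote form and a small base-R helper that repairs pasted code. `straighten_quotes` is a hypothetical name for this illustration, and `n` stands in for `nInd(gl)`.

```r
# The same assignment typed with straight quotes parses fine.
n <- 3                              # stand-in for nInd(gl)
ind_names <- paste0("Ind_", 1:n)
print(ind_names)

# Hypothetical helper: replace curly quotes in pasted code with straight ones.
straighten_quotes <- function(code) {
  code <- gsub("[\u201c\u201d]", "\"", code)  # curly double quotes
  gsub("[\u2018\u2019]", "'", code)           # curly single quotes
}

pasted <- "paste0(\u201cInd_\u201d, 1:3)"     # as it arrives from the email
fixed  <- straighten_quotes(pasted)
print(fixed)

# The repaired text now parses and evaluates.
print(eval(parse(text = fixed)))
```

Retyping the quotes by hand, as suggested above, is usually the quickest fix. A helper like this is mainly useful when pasting longer snippets.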

Jose Luis Mijangos

May 29, 2023, 11:09:58 PM
to dartR
Hi Divya,

The function gl2fasta first concatenates individual names and the population name of each individual, and then this information is used as the name for each FASTA sequence.

First, check that the fields for individual names and population names are there:

> indNames(testset.gl)
> popNames(testset.gl)

If these fields are not there, you will see the word "NULL" in the R console.
One way to input this information into these fields is by using a CSV file when you read your DArT report into dartR using the function gl.read.dart(). Please check our tutorial starting on page 8 regarding how to input this information into dartR:

http://georges.biomatix.org/storage/app/media/uploaded-files/tutorial3adartrdatastructuresandinput22-dec-21-2.pdf

You can find our other tutorials at:

http://georges.biomatix.org/dartR 
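
As a rough illustration of the CSV route, the sketch below builds a small individual-metadata file in base R. The sample names, populations and labels are invented. The `id` and `pop` column names and the `ind.metafile` argument reflect my understanding of what gl.read.dart() expects, so please confirm them against the tutorial linked above. `phylo_label` is the optional extra column discussed earlier in this thread.

```r
# Invented example metadata: one row per individual in the DArT report.
# The id values must match the sample names used in the report.
meta <- data.frame(
  id          = c("sampleA", "sampleB", "sampleC"),
  pop         = c("north", "north", "south"),
  phylo_label = c("sampleA_north", "sampleB_north", "sampleC_south")
)

meta_path <- file.path(tempdir(), "ind_metrics.csv")
write.csv(meta, meta_path, row.names = FALSE)

# Read it back to confirm the column names survived the round trip.
print(names(read.csv(meta_path)))

# With dartR loaded, the file would then be supplied when reading the report
# (assumed usage, not run here):
# gl <- gl.read.dart(filename = "testset.csv", ind.metafile = meta_path)
```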

Alternatively, you could assign dummy names, for example:

> # assigning numbers as individual names
> indNames(testset.gl) <- as.character(1:nInd(testset.gl))
> # assigning "pop1" to all individuals
> pop(testset.gl) <- rep("pop1",nInd(testset.gl))

Then, when using the gl2fasta function, each sequence will be named:

> gl2fasta(testset.gl, method=3, outfile="filename.fas", outpath=getwd())

Cheers,
Luis
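
The naming rule described above also explains the `>_` headers seen earlier in the thread. The sketch below imitates it in base R. `make_headers` is a hypothetical stand-in, not the gl2fasta source, and the underscore separator is an assumption inferred from the `>_` output.

```r
# Hypothetical imitation of the header rule: ">" + individual name + "_" + population.
make_headers <- function(ind_names, pop_names) {
  paste0(">", ind_names, "_", pop_names)
}

# With dummy names assigned, every header is distinct.
print(make_headers(as.character(1:3), rep("pop1", 3)))

# With no individual names (NULL) and empty population names,
# every header collapses to ">_".
print(make_headers(NULL, rep("", 2)))
```

This is why setting pop(gl) to empty strings and then assigning a NULL phylo_label to indNames(gl) leaves every sequence with the same header.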

