ClusterConsensusSequence

Swati Puranik


18 Jan 2023, 06:02:59

to dartR

Hello,

My locus metadata has all the typical loc.metrics. The "TrimmedSequence" field is missing, which the manual says can happen. However, there is a column named "ClusterConsensusSequence" after "AlleleSequence". I could not find any earlier question in this group about the loc.metric "ClusterConsensusSequence". Does anyone know whether "TrimmedSequence" and "ClusterConsensusSequence" are the same metric or different ones?

I am trying to use the function "gl.report.bases", but it fails because "TrimmedSequence" is absent, and I get the following error:

Fatal Error: Dataset does not include variable TrimmedSequence!

Thank you.

Best regards

Swati

Jose Luis Mijangos


19 Jan 2023, 04:22:13

to dartR

Hi Swati,

Was DArT the provider of your dataset?
It seems so.

Looking at the subset of your dataset you sent me, I have never encountered the fields "AlleleSequence" and "ClusterConsensusSequence" in DArT data before.
Perhaps these are new features added by DArT.

The field "AlleleSequence" seems to be the "TrimmedSequence" field. In that case, just create the field "TrimmedSequence" as shown below.

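# Copy AlleleSequence into a new TrimmedSequence column
gl$other$loc.metrics$TrimmedSequence <- gl$other$loc.metrics$AlleleSequence
# Report base frequencies and transition/transversion percentages
ReportBases <- gl.report.bases(gl, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)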
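If you have to repeat this workaround across several datasets, a small guard avoids overwriting a genuine "TrimmedSequence" column when DArT has supplied one. The sketch below is a hypothetical helper (add_trimmed_sequence is not part of dartR), written in base R against a data frame standing in for gl$other$loc.metrics. It assumes, as above, that "AlleleSequence" is an acceptable stand-in, which is worth confirming with DArT.

```r
# Hypothetical helper (not a dartR function): copy AlleleSequence into
# TrimmedSequence only when TrimmedSequence is absent.
add_trimmed_sequence <- function(loc_metrics) {
  if ("TrimmedSequence" %in% names(loc_metrics)) {
    return(loc_metrics)  # keep DArT's own trimmed tags untouched
  }
  if (!"AlleleSequence" %in% names(loc_metrics)) {
    stop("Neither TrimmedSequence nor AlleleSequence is present")
  }
  loc_metrics$TrimmedSequence <- loc_metrics$AlleleSequence
  loc_metrics
}

# Small made-up stand-in for gl$other$loc.metrics
lm_demo <- data.frame(AlleleSequence = c("ACGT", "GGTA"),
                      stringsAsFactors = FALSE)
lm_demo <- add_trimmed_sequence(lm_demo)
lm_demo$TrimmedSequence
```

With a real genlight object it would be applied as gl$other$loc.metrics <- add_trimmed_sequence(gl$other$loc.metrics) before calling gl.report.bases.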

I would suggest contacting DArT to confirm this is the case.

Cheers,

Luis

Swati...


19 Jan 2023, 05:38:53

to da...@googlegroups.com

Dear Luis,

Thank you for checking the data and your suggestions. I will contact the provider to get more information.

Best regards

Swati

--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/_0aB6kVCw44/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/fbde7447-542d-40fa-a631-6b4a32154798n%40googlegroups.com.

--

Best Regards

Swati

----------------------------------------------------------------------------------
SWATI PURANIK, Ph.D., MSCA Fellow

Junior Researcher

Global Change Research Institute (CzechGlobe), Czech Academy of Sciences
Bělidla 986/4a, 603 00 Brno; Czech Republic
E-mail: swatipu...@gmail.com
Google Scholar | ResearchGate

ORCID ID: https://orcid.org/0000-0003-1787-1550

----------------------------------------------------------------------------------

Swati Puranik


10 Feb 2023, 05:18:05

to dartR

Dear Luis,

I have now had it confirmed that the field "AlleleSequence" is the "TrimmedSequence" field. Using the commands you sent, I created the field, and the function "gl.report.bases" now works.

I was also able to calculate the percentages of transitions and transversions.

> test <- readRDS("C:/bbbbb/cccccc/xxxx/yyyy/test.rds")
> test$other$loc.metrics$TrimmedSequence <- test$other$loc.metrics$AlleleSequence
> ReportBases <- gl.report.bases (test, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)

Starting gl.report.bases
Processing genlight object with SNP data
Counting the bases
Counting Transitions and Transversions
Average trimmed sequence length: 69 ( 69 to 69 )
Total number of trimmed sequences: 400
Base frequencies (%)
A: 22.06
G: 30.92
T: 19.04
C: 27.98
Transitions : 59.25
Transversions: 40.75
tv/ts ratio: 1.454

Is there any way for the function to also return the actual numbers of transversions (Tv) and transitions (Ts)? I would like to see the counts of A/G and C/T transitions and of A/C, A/T, C/G and G/T transversions. Something like this below:

Thank you.

Best regards

Swati

Arthur Georges


12 Feb 2023, 17:06:06

to da...@googlegroups.com

Hi Swati,

My understanding is that the "AlleleSequence" field contains the sequence tags including the adaptors, and that the "TrimmedSequence" field contains the sequence tags with the adaptors stripped out. You need to explicitly request the "TrimmedSequence" when you order the service from DArT.

The impact on a script like gl.report.bases is that the bases in the adaptor sequence will be included if you use "AlleleSequence", so take care on that front. Inclusion of adaptors should not affect tv/ts ratios, because the adaptor sequence does not contain any SNPs.
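To see why the adaptor matters for base composition, here is a minimal base-R sketch of a base-frequency calculation. It is not dartR's code: base_freq is a hypothetical helper, and the tags and the "AAAA" adaptor are made-up values chosen only to show how prepending a constant sequence to every tag shifts the percentages.

```r
# Hypothetical helper: percentage of A, C, G and T across sequence tags.
# Characters other than A/C/G/T become NA in the factor and are not counted.
base_freq <- function(seqs) {
  bases <- unlist(strsplit(toupper(seqs), ""), use.names = FALSE)
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  round(100 * counts / sum(counts), 2)
}

tags <- c("ACGTAC", "GGTTAA")          # made-up trimmed tags
adaptor <- "AAAA"                      # made-up adaptor sequence
with_adaptor <- paste0(adaptor, tags)  # same tags with the adaptor kept

f_tags <- base_freq(tags)
f_adaptor <- base_freq(with_adaptor)
f_tags
f_adaptor
```

In this toy example the A share rises from 33.33% to 60% purely because of the added adaptor, while the tags themselves, and so their SNP positions, are unchanged.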

I have no idea what the "ClusterConsensusSequence" is. It may be an intermediate field from the DArT pipeline that the operator neglected to filter out.

The idea of a table of Tv and Ts counts is a good one, and we can add that to the script gl.report.bases in a future release.
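Until such a table is available in gl.report.bases, the counts asked for above can be tabulated directly. The sketch below is a hypothetical base-R helper (count_substitutions is not a dartR function). It assumes a character vector of SNP descriptions ending in "REF>ALT", such as "23:A>G", which is how DArT's SNP column is usually formatted; check this against your own loc.metrics before relying on it. A>G and G>A are pooled as "A/G", and likewise for the other pairs.

```r
# Hypothetical helper: count each substitution class from SNP strings.
# ASSUMPTION: every entry ends in "REF>ALT", e.g. "23:A>G".
count_substitutions <- function(snp) {
  change <- substr(snp, nchar(snp) - 2, nchar(snp))  # e.g. "A>G"
  ref <- substr(change, 1, 1)
  alt <- substr(change, 3, 3)
  # Unordered pair, so A>G and G>A are both counted as "A/G"
  pair <- ifelse(ref < alt, paste0(ref, "/", alt), paste0(alt, "/", ref))
  all_pairs <- c("A/G", "C/T", "A/C", "A/T", "C/G", "G/T")
  # Entries not matching one of the six pairs become NA and are dropped
  n <- as.integer(table(factor(pair, levels = all_pairs)))
  data.frame(pair = all_pairs,
             type = ifelse(all_pairs %in% c("A/G", "C/T"), "Ts", "Tv"),
             n = n)
}

snps <- c("23:A>G", "5:G>A", "40:C>T", "12:A>C", "61:G>T", "7:T>C")  # made up
tab <- count_substitutions(snps)
tab
tapply(tab$n, tab$type, sum)  # totals of transitions and transversions
```

If your loc.metrics has a SNP column in that format, count_substitutions(gl$other$loc.metrics$SNP) would give one row per substitution class.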

All the best. A


Swati...


15 Feb 2023, 08:35:58

to da...@googlegroups.com

Dear Arthur,

Thank you for the explanation and the heads-up! I appreciate you taking up the suggestion to add Tv and Ts counts in a future release.

I have two more questions, which I hope you (or Luis) can help me with.

1. Just as "gl.report.heterozygosity" can provide He, Ho and FIS values for each population, is there a way to also calculate PIC and MAF values for each population?

2. I have a mixed-ploidy dataset. However, "ploidy(gl)" reports all individuals as diploid (2). I have set 2x, 4x and 6x as "pop" in the individual metadata input file, but I am not sure whether anything needs to be changed in the locus metadata input file, or how to do it. Is there a command to change the ploidy level of individuals?

Thank you.

Best regards

Swati


Jose Luis Mijangos


15 Feb 2023, 20:32:08

to dartR

Hi Swati,

1. You could use the below code to calculate MAF and PIC by population:

library(dartR)
# test dataset
t1 <- platypus.gl
# separating populations and storing them in a list
t1 <- seppop(t1)
# using lapply to recalculate loc metrics in every population
t1 <- lapply(t1, gl.recalc.metrics)
# MAF
t1$SEVERN_ABOVE$other$loc.metrics$maf
# PIC
t1$SEVERN_ABOVE$other$loc.metrics$AvgPIC
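For readers curious about what the per-population MAF in the code above represents, the following base-R sketch computes it from a genotype matrix. It is an illustration, not dartR's implementation: maf_by_pop is a hypothetical helper, and it assumes genotypes coded as counts of one allele (0, 1, 2, with NA for missing) and individuals in rows. For a genlight object, as.matrix(gl) returns a matrix in that coding, and pop(gl) gives the population labels.

```r
# Hypothetical helper: minor allele frequency per locus within each population.
# geno: individuals x loci matrix of allele counts (0, 1, 2; NA = missing)
# pop:  population label for each row of geno
maf_by_pop <- function(geno, pop) {
  per_pop <- lapply(split(seq_len(nrow(geno)), pop), function(rows) {
    p <- colMeans(geno[rows, , drop = FALSE], na.rm = TRUE) / 2  # allele freq
    pmin(p, 1 - p)                                               # minor allele
  })
  do.call(rbind, per_pop)  # one row per population, one column per locus
}

# Made-up data: 4 individuals, 2 loci, 2 populations
geno <- matrix(c(0, 1, 2, 2,
                 1, 1, 0, 0), nrow = 4)
pop <- c("P1", "P1", "P2", "P2")
res <- maf_by_pop(geno, pop)
res
```

With real data, maf_by_pop(as.matrix(gl), pop(gl)) would return one row per population and one column per locus, which can be compared against the maf values from gl.recalc.metrics.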

2. dartR is designed to analyse diploid data. A nice article describing software to analyse polyploid data is Meirmans, Patrick G., Shenglin Liu, and Peter H. van Tienderen. "The analysis of polyploid genetic data." Journal of Heredity 109.3 (2018): 283-296.

Cheers,

Luis

Swati Puranik


11 Aug 2023, 08:40:30

to dartR

Dear Luis,

Hello. I have searched, but I am wondering whether there is also code I can use to calculate genetic diversity indices (allelic richness, Ho, uHe, Shannon diversity index, polyLoc, monoLoc, etc.) for each individual/genotype within a particular population. Functions such as gl.report.heterozygosity or gl.report.diversity return one value per population. How can I use these functions to obtain the values of these parameters for every genotype?

Thank you.

Best regards

Swati

Jose Luis Mijangos


14 Aug 2023, 20:44:48

to dartR

Hi Swati,

You can calculate heterozygosity by individual using the function below:

> library(dartR)
> res <- gl.report.heterozygosity(platypus.gl, method="ind")
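As an illustration of what a per-individual value means, observed heterozygosity for one individual is the proportion of its non-missing loci that are heterozygous. The base-R sketch below computes that from a 0/1/2 genotype matrix with individuals in rows. ho_by_ind is a hypothetical helper, and dartR's gl.report.heterozygosity(method = "ind") may handle missing data or other details differently, so treat it as a cross-check rather than a replacement.

```r
# Hypothetical helper: observed heterozygosity per individual.
# geno: individuals x loci matrix coded 0/1/2 (1 = heterozygous), NA = missing
ho_by_ind <- function(geno) {
  # heterozygous calls divided by non-missing calls, row by row
  rowSums(geno == 1, na.rm = TRUE) / rowSums(!is.na(geno))
}

# Made-up data: 3 individuals, 4 loci
geno <- matrix(c(0, 1,  2, NA,
                 1, 1,  0,  2,
                 1, 0, NA,  1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ind1", "ind2", "ind3"), NULL))
ho <- ho_by_ind(geno)
ho
```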

Cheers,

Luis
