ClusterConsensusSequence


Swati Puranik

18 Jan 2023, 06:02:59
to dartR
Hello,
My locus metadata has all the typical loc.metrics. "TrimmedSequence" is missing, which the manual says can happen, but there is a column named "ClusterConsensusSequence" after the allele sequence. I could not find any earlier question in this group about the loc.metric "ClusterConsensusSequence". Does anyone know whether "TrimmedSequence" and "ClusterConsensusSequence" are the same metric, or are they different?

I am trying to use the function "gl.report.bases" but it doesn't work due to the absence of "TrimmedSequence" and I get the following error:
Fatal Error: Dataset does not include variable TrimmedSequence!

Thank you.

Best regards
Swati

Jose Luis Mijangos

19 Jan 2023, 04:22:13
to dartR
Hi Swati,

Was DArT the provider of your dataset?
It seems so.

Looking at the subset of your dataset that you sent me, I see the fields "AlleleSequence" and "ClusterConsensusSequence", which I had never encountered in DArT data before.
Perhaps these are new features added by DArT.

The field "AlleleSequence" seems to correspond to the "TrimmedSequence" field. If so, just create the field "TrimmedSequence" as shown below.

gl$other$loc.metrics$TrimmedSequence <- gl$other$loc.metrics$AlleleSequence
ReportBases <- gl.report.bases (gl, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)

I would suggest contacting DArT to ensure this is the case.

Cheers,
Luis

Swati...

19 Jan 2023, 05:38:53
to da...@googlegroups.com
Dear Luis,

Thank you for checking the data and your suggestions. I will contact the provider to get more information.

Best regards
Swati

--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/_0aB6kVCw44/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/fbde7447-542d-40fa-a631-6b4a32154798n%40googlegroups.com.


--
Best Regards
Swati
----------------------------------------------------------------------------------
SWATI PURANIK, Ph.D., MSCA Fellow
Junior Researcher 
Global Change Research Institute (CzechGlobe), Czech Academy of Sciences
Bělidla 986/4a, 603 00 Brno; Czech Republic 
E-mail: swatipu...@gmail.com

Google Scholar | ResearchGate
----------------------------------------------------------------------------------

Swati Puranik

10 Feb 2023, 05:18:05
to dartR
Dear Luis,

I have had it confirmed that the field "AlleleSequence" is the "TrimmedSequence" field. Using the commands you sent, I created that field, and the function "gl.report.bases" now works.
I was also able to calculate the percentages of transitions and transversions.

> test <- readRDS("C:/bbbbb/cccccc/xxxx/yyyy/test.rds") 
> test$other$loc.metrics$TrimmedSequence <- test$other$loc.metrics$AlleleSequence 
> ReportBases <- gl.report.bases (test, plot.out = TRUE, save2tmp = FALSE, verbose = NULL)
   Starting gl.report.bases 
   Processing genlight object with SNP data 
   Counting the bases 
   Counting Transitions and Transversions 
   Average trimmed sequence length: 69 ( 69 to 69 ) 
   Total number of trimmed sequences: 400 
   Base frequencies (%) 
   A: 22.06 
   G: 30.92 
   T: 19.04 
   C: 27.98 
  Transitions : 59.25 
  Transversions: 40.75 
   tv/ts ratio: 1.454


Is there any way for the command to also return the actual numbers of Tv and Ts? I would like to see the counts of A/G and C/T transitions and of A/C, A/T, C/G and G/T transversions. Something like this below:


Thank you.

Best regards
Swati

Arthur Georges

12 Feb 2023, 17:06:06
to da...@googlegroups.com
Hi Swati,

My understanding is that the "AlleleSequence" field contains the sequence tags including the adaptors, and that the "TrimmedSequence" field contains the sequence tags with the adaptors stripped out. You need to explicitly request the "TrimmedSequence" when you submit your service to DArT.

The impact on a script like gl.report.bases is that the bases in the adaptor sequence will be counted if you use "AlleleSequence", so take care on that front. Including the adaptors should not affect tv/tr ratios, because the adaptor sequence does not contain any SNPs.

I have no idea what the "ClusterConsensusSequence" is. It may be an intermediate field from the DArT pipeline that the operator neglected to filter out.

The idea of a table of tvs and trs is a good one, and we can add that to the script gl.report.bases in a future release.

All the best. A
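
Until such a table is part of gl.report.bases, the per-type counts asked about above can be sketched in base R. This is only an illustration, not the package's method: the helper name `count_substitutions` is hypothetical, and it assumes the locus metadata has an "SNP" column with entries of the form "position:ref>alt" (for example "25:C>T"), which should be checked against your own dataset.

```r
# Hypothetical helper: tabulate substitution types from SNP strings
# such as "25:C>T" (assumed format of the SNP column in loc.metrics).
count_substitutions <- function(snp) {
  snp <- toupper(as.character(snp))
  # Keep only the "X>Y" part; entries without such a pattern are dropped
  hit <- regmatches(snp, regexpr("[ACGT]>[ACGT]", snp))
  ref <- substr(hit, 1, 1)
  alt <- substr(hit, 3, 3)
  # Sort the two bases so that C>T and T>C are both counted as "C/T"
  pair <- paste(pmin(ref, alt), pmax(ref, alt), sep = "/")
  types <- c("A/G", "C/T", "A/C", "A/T", "C/G", "G/T")
  counts <- table(factor(pair, levels = types))
  data.frame(
    substitution = types,
    class = rep(c("transition", "transversion"), times = c(2, 4)),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up strings, only to show the call
snp_example <- c("12:A>G", "40:T>C", "7:C>A", "33:G>T", "5:G>A")
count_substitutions(snp_example)

# With a real dataset (assuming the SNP column exists):
# count_substitutions(test$other$loc.metrics$SNP)
```

Because the counts come from the SNP description rather than from the sequence tags, they should not depend on whether the adaptors are included in the sequence field.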






Swati...

15 Feb 2023, 08:35:58
to da...@googlegroups.com
Dear Arthur,

Thank you for the explanation and the heads-up! I appreciate you taking up the suggestion to add the Tv and Ts counts in a future release.

I have two more questions that I hope you (or Luis) can help with.

1. Just like how "gl.report.heterozygosity" can provide He, Ho and FIS values for each individual population, is there a way to calculate PIC and MAF values also for each population?

2. I have a mixed-ploidy dataset. However, "ploidy(gl)" reports all individuals as diploid (2). I have set 2x, 4x and 6x as "pop" in the individual metadata input file, but I am not sure whether anything needs to be changed in the locus metadata input file, or how to do it. Is there a command to change the ploidy level of individuals?

Thank you.

best regards
Swati


Jose Luis Mijangos

15 Feb 2023, 20:32:08
to dartR
Hi Swati,

1. You could use the code below to calculate MAF and PIC by population:

library(dartR)
# test dataset
t1 <- platypus.gl
# separating populations and storing them in a list
t1 <- seppop(t1)
# using lapply to recalculate loc metrics in every population
t1 <- lapply(t1,gl.recalc.metrics)
# MAF
t1$SEVERN_ABOVE$other$loc.metrics$maf
# PIC
t1$SEVERN_ABOVE$other$loc.metrics$AvgPIC

2. dartR is designed to analyse diploid data. A nice article describing software to analyse polyploid data is Meirmans, Patrick G., Shenglin Liu, and Peter H. van Tienderen. "The analysis of polyploid genetic data." Journal of Heredity 109.3 (2018): 283-296.
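
If one summary value per population is wanted for point 1, the recalculated metrics can be gathered into a single table. The sketch below is only a suggestion: `summarise_loc_metrics` is a hypothetical helper, it works on plain data frames so it can be tried without dartR, and it takes a simple mean over loci. Whether loci that are monomorphic within a population should be dropped first is a choice left open here.

```r
# Hypothetical helper: mean of selected locus metrics for each population.
# `metrics_list` is a named list of loc.metrics data frames, one per population.
summarise_loc_metrics <- function(metrics_list, cols = c("maf", "AvgPIC")) {
  means <- lapply(metrics_list, function(m) {
    colMeans(m[, cols, drop = FALSE], na.rm = TRUE)
  })
  # One row per population, one column per metric
  out <- do.call(rbind, means)
  data.frame(pop = rownames(out), out, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Made-up values, only to show the shape of the input and output
toy <- list(
  popA = data.frame(maf = c(0.10, 0.30), AvgPIC = c(0.20, 0.40)),
  popB = data.frame(maf = c(0.25, 0.05), AvgPIC = c(0.30, 0.10))
)
summarise_loc_metrics(toy)

# With the list t1 produced by the code in point 1:
# summarise_loc_metrics(lapply(t1, function(g) g$other$loc.metrics))
```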

Cheers,
Luis

Swati Puranik

11 Aug 2023, 08:40:30
to dartR
Dear Luis,

Hello. I have looked but could not find it: is there code I can use to calculate different genetic diversity indices (allelic richness, Ho, uHe, Shannon diversity index, polyLoc, monoLoc etc.) for each individual/genotype within a particular population? Functions such as gl.report.heterozygosity or gl.report.diversity return one value per population. How can I use these functions to get the values of these parameters for every genotype?

Thank you.

Best regards
Swati

Jose Luis Mijangos

14 Aug 2023, 20:44:48
to dartR
Hi Swati,

You can calculate heterozygosity by individual using the function below:

> library(dartR)
> res <- gl.report.heterozygosity(platypus.gl, method="ind")
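
For intuition only, observed heterozygosity per individual can be thought of as the proportion of non-missing loci scored as heterozygous. The base R sketch below illustrates that idea on a made-up matrix. `ind_heterozygosity` is a hypothetical helper, not a dartR function, and its values may differ from those reported by gl.report.heterozygosity, which may apply further adjustments.

```r
# Hypothetical helper: proportion of heterozygous calls per individual.
# `geno` is an individuals x loci matrix coded 0/1/2 (1 = heterozygote),
# with NA for missing calls, as returned by as.matrix() on a genlight object.
ind_heterozygosity <- function(geno) {
  rowMeans(geno == 1, na.rm = TRUE)
}

# Made-up genotypes, only to show the call
toy_geno <- matrix(c(0, 1, 2, 1,
                     1, 1, NA, 0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("ind1", "ind2"), paste0("loc", 1:4)))
ind_heterozygosity(toy_geno)

# With a real genlight object:
# ind_heterozygosity(as.matrix(platypus.gl))
```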

Cheers,
Luis 
