
Qvalue R Package Download


Carlee Panella, Jan 2, 2024, 2:50:10 PM
A lot of papers I've read have used the qvalue package from Bioconductor for multiple testing correction. The stats package that ships with R has a function called p.adjust that seems to do the same thing. What is the advantage of the qvalue package, and why is it so frequently used instead of the built-in R version?


p.adjust() and the qvalue package aren't doing exactly the same thing. They do similar things with similar ends in mind, but the algorithms differ, so they produce different results (Gordon Smyth, who wrote p.adjust(), has a short summary of the history here, and Tim Triche gives a nice explanation of the differences in the same thread). Interestingly, the packages I use almost always call p.adjust() rather than the qvalue package, though papers that use them may still say "q-value" rather than "BH-adjusted p-value".
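To make the difference concrete, here is a minimal Python sketch. It is written in Python only so that it is self-contained, and the function names are ours, not from either package. `bh_adjust` follows the step-up formula we understand p.adjust(p, method = "BH") to use. `scaled_qvalues` multiplies those values by $\pi_0$, the proportion of true nulls, which is, as far as we know, how qvalue turns BH-style values into q-values. Here $\pi_0$ is passed in by hand rather than estimated.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for k, i in enumerate(order):
        rank = m - k  # ascending rank of pvalues[i] (1 = smallest)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def scaled_qvalues(pvalues, pi0):
    """Storey-style q-values: BH-adjusted values multiplied by pi0 (0 < pi0 <= 1)."""
    return [pi0 * q for q in bh_adjust(pvalues)]


p = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(p))            # about [0.02, 0.04, 0.04, 0.02]
print(scaled_qvalues(p, 0.5))  # about half of the BH values
```

With pi0 = 1 the two functions coincide. Any gap between p.adjust and qvalue output therefore comes from the $\pi_0$ estimate, and because that estimate is at most 1, the q-values are never larger than the BH values.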









I know I'm a bit late to the party, but I wanted to point out that the qvalue package is not necessarily tailored to genomic studies. The qvalue algorithm has underlying assumptions that limit its applicability to data without dependence and with a specific p-value distribution. If there is dependence in the data and a "U-shaped" p-value distribution, then qvalue is not applicable (according to the qvalue package vignette). I'm honestly not sure whether dependence affects the BH method in p.adjust.


Thanks for the clarification. My understanding is that the BH procedure still controls the FDR when the test statistics are positively correlated. The BY modification can be applied in other cases but is more conservative.
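Below is a minimal Python sketch of the two step-up adjustments. It is written from the formulas we understand p.adjust to use for method = "BH" and method = "BY", and the function name `adjust` is ours. BY multiplies every BH term by the harmonic number $c(m) = \sum_{i=1}^{m} 1/i$, which is why it is more conservative.

```python
def adjust(pvalues, method="BH"):
    """Step-up FDR adjustment: "BH" (Benjamini-Hochberg) or "BY" (Benjamini-Yekutieli)."""
    m = len(pvalues)
    if method == "BH":
        c_m = 1.0
    elif method == "BY":
        # Harmonic-number penalty that makes BY valid under arbitrary dependence.
        c_m = sum(1.0 / i for i in range(1, m + 1))
    else:
        raise ValueError("method must be 'BH' or 'BY'")
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps values at 1
    for k, i in enumerate(order):
        rank = m - k  # ascending rank (1 = smallest p-value)
        running_min = min(running_min, c_m * pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.001, 0.01, 0.02, 0.2]
print(adjust(p, "BH"))
print(adjust(p, "BY"))  # each value is at least as large as its BH counterpart
```

The price of BY's validity under any dependence structure is a penalty that grows roughly like log(m). With thousands of tests, this can remove many discoveries that BH would keep.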

Edit: I associated qvalue with genomics studies from this paper.


Yup, that's the same paper I associate the q-value with as well. The vignette I was speaking of is from the R/Bioconductor qvalue package, which I think was developed by the authors of the PNAS paper. Here is the vignette; section 5.2 specifically discusses the applicability of the method with different p-value distributions.






This package takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values and local FDR values. The q-value of a test measures the proportion of false positives incurred (called the false discovery rate) when that particular test is called significant. The local FDR measures the posterior probability the null hypothesis is true given the test's p-value. Various plots are automatically generated, allowing one to make sensible significance cut-offs. Several mathematical results have recently been shown on the conservative accuracy of the estimated q-values from this software. The software can be applied to problems in genomics, brain imaging, astrophysics, and data mining.
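The following crude Python sketch illustrates the local FDR idea, assuming that null p-values are uniformly distributed: lfdr(p) = $\pi_0 f_0(p) / f(p)$ with $f_0 = 1$. Here $f$ is estimated with a plain histogram and $\pi_0$ with a single $\lambda$. As far as we know, the qvalue package uses a smoothed density estimate instead, so treat this as an illustration of the formula, not of the package. The function names are hypothetical.

```python
def pi0_estimate(pvalues, lam=0.5):
    """Fraction of p-values at or above lam, rescaled and capped at 1: estimates pi0."""
    m = len(pvalues)
    return min(1.0, sum(p >= lam for p in pvalues) / (m * (1.0 - lam)))


def histogram_lfdr(pvalues, pi0, n_bins=10):
    """Local FDR pi0 * f0(p) / f(p), with f0 = 1 and f a histogram density on [0, 1]."""
    m = len(pvalues)
    width = 1.0 / n_bins
    # Bin index of each p-value; p = 1.0 is put in the last bin.
    bins = [min(int(p / width), n_bins - 1) for p in pvalues]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    lfdr = []
    for b in bins:
        density = counts[b] / (m * width)  # histogram estimate of f(p)
        lfdr.append(min(1.0, pi0 / density))  # a posterior probability cannot exceed 1
    return lfdr


p = [0.01] * 8 + [0.55, 0.95]
pi0 = pi0_estimate(p)          # 2 of 10 p-values are >= 0.5, so 0.4
print(histogram_lfdr(p, pi0))  # small for the p-values near 0
```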


Package "qvalue" was not available for R version 3.6.1, so I installed the "BiocManager" package instead, but couldn't find the example code. Has anyone tried it? Or please tell me alternative ways to calculate q-values.
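One alternative, if installing the package is the obstacle, is to compute q-values directly. Below is a minimal Python sketch in the style of Storey (2002), under two simplifying assumptions of our own. First, $\pi_0$ is estimated at a single fixed $\lambda = 0.5$; as far as we know, the package smooths over a grid of $\lambda$ values by default. Second, the q-values are the BH-adjusted values multiplied by that estimate. The function name `qvalues` is ours.

```python
def qvalues(pvalues, lam=0.5):
    """Return (pi0, q-values) with pi0 estimated at a single lambda."""
    m = len(pvalues)
    # pi0: share of p-values at or above lam, divided by the share expected under the null.
    pi0 = min(1.0, sum(p >= lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0  # BH step-up with monotonicity and a cap at 1
    for k, i in enumerate(order):
        rank = m - k  # ascending rank (1 = smallest p-value)
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = pi0 * running_min
    return pi0, q


pi0, q = qvalues([0.001, 0.002, 0.003, 0.004, 0.6, 0.9])
print(pi0)  # 2 of 6 p-values are >= 0.5, so about 0.667
print(q)
```

Within R itself, installing BiocManager does not install qvalue. The Bioconductor package is normally installed with BiocManager::install("qvalue"), and its vignette then contains worked examples.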


I'm trying to use the qvalue package in R to work out the proportion of true alternative tests. However, when I use the package with a small $\lambda$ tuning parameter, I get an estimate of $\pi_0$ (the proportion of true nulls) greater than 1. Also, $\pi_0(\lambda)$ seems to decrease as $\lambda$ increases.
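Here is a small Python sketch of the estimator the question refers to, $\hat\pi_0(\lambda) = \#\{p_i \ge \lambda\} / (m(1-\lambda))$. We count p-values at or above $\lambda$; some write $>$. When few p-values fall below a small $\lambda$, the numerator is close to $m$, so the estimate is roughly $1/(1-\lambda) > 1$. If the p-values then thin out as $\lambda$ grows, the estimate falls. The toy p-values below are invented to reproduce that pattern. As far as we know, qvalue caps its final $\pi_0$ estimate at 1.

```python
def pi0_lambda(pvalues, lambdas):
    """Map each lambda to the raw (uncapped) estimate of pi0."""
    m = len(pvalues)
    return {lam: sum(p >= lam for p in pvalues) / (m * (1.0 - lam)) for lam in lambdas}


# Invented p-values: none near 0, bunched between 0.2 and 0.45.
p = [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.6, 0.8]
for lam, est in pi0_lambda(p, [0.1, 0.3, 0.4, 0.5]).items():
    print(f"lambda={lam}: pi0={est:.3f}")
# prints pi0 = 1.111, 1.071, 0.833, 0.500 for the four lambdas
```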


I'd encourage you to rethink your choice of $\lambda$ or to look into alternative methods for estimating $\pi_0$. Storey's 2011 paper accompanying the qvalue package provides a good introduction and references. Additionally, the full code for the qvalue package is found here.
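One way to choose $\lambda$ is the bootstrap approach described by Storey and colleagues. Compute $\hat\pi_0(\lambda)$ on a grid, estimate each value's mean squared error from bootstrap resamples against the smallest full-data estimate, and keep the $\lambda$ with the lowest error. The Python sketch below is our own minimal version of that idea; the function names, grid, number of resamples and seed are assumptions. It is not a copy of the qvalue code, whose default is, as far as we know, a smoother over the grid.

```python
import random


def pi0_at(pvalues, lam):
    """Raw estimate of pi0 at one lambda."""
    return sum(p >= lam for p in pvalues) / (len(pvalues) * (1.0 - lam))


def pi0_bootstrap(pvalues, lambdas, n_boot=200, seed=1):
    """Pick lambda by minimising a bootstrap estimate of the MSE of pi0(lambda)."""
    rng = random.Random(seed)
    full = [pi0_at(pvalues, lam) for lam in lambdas]
    target = min(full)  # plug-in stand-in for the unknown true pi0
    # Draw the resamples once and reuse them for every lambda.
    samples = [rng.choices(pvalues, k=len(pvalues)) for _ in range(n_boot)]
    mse = [sum((pi0_at(s, lam) - target) ** 2 for s in samples) / n_boot
           for lam in lambdas]
    best = min(range(len(lambdas)), key=lambda j: mse[j])
    return lambdas[best], min(1.0, full[best])


p = [i / 100 for i in range(1, 100)] + [0.001] * 30  # uniform-ish nulls plus signals
print(pi0_bootstrap(p, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]))
```

Reusing the same resamples across the grid keeps the comparison between $\lambda$ values fair. The cap at 1 is applied only to the returned estimate, not to the MSE computation.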




In the following, we show how the pcadapt package can perform genome scans for selection based on individual genotype data. We show how to run the package using the example geno3pops, which contains genotype data. A total of 150 individuals coming from three different populations were genotyped at 1,500 diploid markers. Simulations were performed with simuPOP using a divergence model assuming that 150 SNPs confer a selective advantage. To run the package on the provided example, just copy and paste the shaded R chunks.


Now perform a t-test (use rowttests) comparing CEU samples processed in 2003 to those processed in 2004. Then use the qvalue package to obtain q-values for each gene. How many genes have q-values less than 0.05?
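The exercise uses R (genefilter's rowttests and the qvalue package). For readers without that setup, here is a self-contained Python sketch of the same pipeline under stated assumptions:

- an equal-variance two-sample t-test per row, which we believe is what rowttests computes;
- two-sided p-values from the t distribution via the regularized incomplete beta function, using a standard continued-fraction evaluation as in Numerical Recipes;
- q-values with $\pi_0$ estimated at a single $\lambda = 0.5$.

The data layout (a list of rows plus 0/1 group labels) and all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df):
    """Two-sided p-value of a Student t statistic: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def row_ttest_pvalues(rows, groups):
    """Equal-variance two-sample t-test on every row; groups holds 0/1 labels."""
    pvals = []
    for row in rows:
        x = [v for v, g in zip(row, groups) if g == 0]
        y = [v for v, g in zip(row, groups) if g == 1]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        df = len(x) + len(y) - 2
        pooled = (sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)) / df
        se = math.sqrt(pooled * (1.0 / len(x) + 1.0 / len(y)))  # assumes pooled > 0
        pvals.append(t_pvalue((mx - my) / se, df))
    return pvals


def count_below(pvalues, cutoff=0.05, lam=0.5):
    """Number of tests whose q-value (pi0 at a single lambda times BH) is < cutoff."""
    m = len(pvalues)
    pi0 = min(1.0, sum(p >= lam for p in pvalues) / (m * (1.0 - lam)))
    running_min, count = 1.0, 0
    for k, p in enumerate(sorted(pvalues, reverse=True)):
        running_min = min(running_min, p * m / (m - k))  # BH step-up
        if pi0 * running_min < cutoff:
            count += 1
    return count


rows = [[1, 2, 3, 10, 11, 12], [1, 2, 3, 1, 2, 3]]  # two made-up "genes"
groups = [0, 0, 0, 1, 1, 1]                          # e.g. 2003 vs 2004 batches
print(count_below(row_ttest_pvalues(rows, groups)))  # 1
```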


Now we are going to compare ethnicities as was done in the original publication in which these data were first presented. Use the rowttests function to compare the ASN population to the CEU population. Once again, use the qvalue function to obtain q-values. How many genes have q-values …

Using the functions rowttests and qvalue, compare the two groups. Because this is a smaller dataset, which decreases our power, we will use the more lenient FDR cut-off of 10%. How many genes have q-values less than 0.1?


This function is adapted from package qvalue. The reason we provide our own copy is that package qvalue contains additional functionality that relies on Tcl/Tk, which has led to multiple problems. Our copy does not require Tcl/Tk.


This short tutorial presents the main steps of a population genomic data set analysis using the R computer package LEA. A first objective of the tutorial is to illustrate the use of the functions snmf and lfmm for running large-scale ancestry analyses and for performing genome scans for local adaptation. A second objective is to provide guidelines for controlling false discovery rates in genome scans for selection.


LEA can handle several classical formats for genotypic matrices. More specifically, the package uses the lfmm and geno formats and provides functions to convert from other formats such as ped, vcf, and ancestrymap. Ecological variables must be formatted in the env format. In this tutorial, we process data simulated for 165 diploid individuals genotyped at 500 SNPs. The SNP data are available as genotype.lfmm.


However, under IATA and ICAO rules, if the first of the individual bottles in the above-mentioned combination packaging holds 5 L, that bottle alone already gives the combination package a Q value of 1. Dividing the 5 L in the bottle by the maximum quantity allowed in the outer container, 5 L, gives 1. The container therefore could not be shipped with the other hazardous materials in the same outer packaging, because the combination packaging, with all its different dangerous goods inside, would immediately exceed the Q value limit of 1.
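The arithmetic can be sketched in a few lines of Python, assuming the usual combination formula $Q = n_1/M_1 + n_2/M_2 + \dots$. Here $n_i$ is the net quantity of each dangerous good and $M_i$ the maximum quantity permitted for it in the outer packaging, and $Q$ may not exceed 1. The second substance and its limits in the example are made up; check the current IATA/ICAO tables, including any rounding rule, before relying on this.

```python
def q_value(net_quantities, max_quantities):
    """Q = n1/M1 + n2/M2 + ... for the dangerous goods in one outer packaging."""
    return sum(n / m for n, m in zip(net_quantities, max_quantities))


def may_pack_together(net_quantities, max_quantities):
    """Combination allowed only if Q does not exceed 1."""
    return q_value(net_quantities, max_quantities) <= 1.0


print(q_value([5.0], [5.0]))                      # 1.0: the 5 L bottle alone reaches the limit
print(may_pack_together([5.0, 0.5], [5.0, 1.0]))  # False: Q = 1.5 (second item is hypothetical)
```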


For today's lesson, we will use data from the Bioconductor package airway. The airway data are from Himes et al. (2014). These data, which are contained within a RangedSummarizedExperiment object, are from a bulk RNAseq experiment. In the experiment, the authors "characterized transcriptomic changes in four primary human ASM cell lines that were treated with dexamethasone," a common therapy for asthma. The airway package includes RNAseq count data from 8 airway smooth muscle cell samples: each cell line has a treated sample and an untreated negative control. Differential expression testing using DESeq2 was applied to these data.


We have already mapped gene symbols and Entrez IDs to our data using AnnotationDbi and EnsDb.Hsapiens.v75. However, clusterProfiler also supports gene ID conversion through the functions bitr() and bitr_kegg(). bitr() uses the OrgDb packages, of which there are 19 databases, while bitr_kegg() unsurprisingly uses KEGG organism annotations. Use search_kegg_organism() to find the appropriate KEGG organism code.


clusterProfiler, along with complementary packages, can easily be used to generate functional enrichment results using over-representation analysis from the following databases: GO, KEGG, DOSE, REACTOME, Wikipathways, DisGeNET, network of cancer genes.


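p.adjust may be more conservative, and potentially more robust, than qvalue. The BH method in p.adjust effectively assumes that all null hypotheses are true ($\pi_0 = 1$), whereas qvalue scales by an estimate of $\pi_0 \le 1$, so its q-values are never larger. See the following discussions: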

- conceptual question about FDR, FDR adjusted p-value and q-value

- Why use the qvalue package instead of p.adjust in R stats?


Because GO is organized as a directed acyclic graph, parent terms can show enrichment due to over-represented child terms. To reduce redundancy among enriched terms, clusterProfiler provides a simplify() function that uses the "GOSemSim package to calculate semantic similarities among enriched GO terms using multiple methods based on information content or graph structure." Among GO terms with similarity > 0.7, only the most significant term is kept to represent them (Wu et al. 2021).
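The redundancy-removal step can be illustrated with a simplified greedy Python sketch. It assumes the pairwise semantic similarities have already been computed (in clusterProfiler that is GOSemSim's job). The sketch walks the terms from most to least significant and drops any term whose similarity to an already kept term exceeds 0.7. The term labels and similarity values are made up. The greedy rule is our simplification, and simplify() may resolve chains of similar terms differently.

```python
def simplify_terms(adj_p, similarity, cutoff=0.7):
    """Greedy redundancy filter.

    adj_p: term -> adjusted p-value; similarity: frozenset({t1, t2}) -> score in [0, 1].
    """
    kept = []
    for term in sorted(adj_p, key=adj_p.get):  # most significant first
        # Pairs missing from the table are treated as unrelated (similarity 0).
        if all(similarity.get(frozenset((term, k)), 0.0) <= cutoff for k in kept):
            kept.append(term)
    return kept


adj_p = {"termA": 0.001, "termB": 0.01, "termC": 0.02}  # hypothetical GO terms
similarity = {
    frozenset(("termA", "termB")): 0.9,  # B is redundant with A
    frozenset(("termB", "termC")): 0.8,
    frozenset(("termA", "termC")): 0.2,
}
print(simplify_terms(adj_p, similarity))  # ['termA', 'termC']
```

In this toy case termC survives even though it is similar to termB, because termB has already been dropped in favour of termA.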


Using clusterProfiler you can conduct ORA with GO, KEGG, MKEGG (KEGG modules), and WikiPathways. Using complementary packages (i.e., DOSE, ReactomePA), ORA can also be conducted with Disease Ontology (DO), Reactome, network of cancer genes, and DisGeNET.
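ORA boils down to a one-sided hypergeometric test per gene set. As far as we know, that is what clusterProfiler's ORA functions compute before adjusting the p-values across sets. Below is a minimal Python sketch; the argument names and the numbers in the example are ours.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background universe, K: of those annotated to the term,
    n: genes in the input list, k: of those annotated to the term.
    """
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper, so impossible
    # overlaps contribute nothing.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


# Hypothetical: 10,000 background genes, 100 in the term, 200 in our list, 8 overlap.
print(ora_pvalue(8, 200, 100, 10000))
```

Exact integer arithmetic avoids underflow in the binomial coefficients. Only the final division is done in floating point.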


Similar to ORA, GSEA can be performed using GO (gseGO), KEGG pathways (gseKEGG), KEGG modules (gseMKEGG), and WikiPathways (gseWP). With the DOSE package you can also use gseDGN, gseDO, and gseNCG, and with ReactomePA, gsePathway.
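At the core of GSEA is a weighted running-sum enrichment score. Below is a minimal Python sketch of that statistic as described by Subramanian et al. (2005). It assumes the genes are already ranked by a score in decreasing order and uses weight 1. It leaves out the permutation-based significance testing; as far as we know, clusterProfiler's gse* functions delegate that part to the fgsea package by default. Names and data are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Maximum-deviation running-sum enrichment score.

    ranked_genes: gene IDs sorted by decreasing score; scores: the matching scores.
    Assumes the set contains at least one, but not all, of the ranked genes.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the weighted score at a hit, down by a constant at a miss.
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["g1", "g2", "g3", "g4"]
scores = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(genes, scores, {"g1", "g2"}))  # close to +1: set at the top
print(enrichment_score(genes, scores, {"g3", "g4"}))  # -1.0: set at the bottom
```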


Testing multiple hypotheses simultaneously increases the number of false positive findings if the corresponding p-values are not corrected. While this multiple testing problem is well known, the classic and advanced correction methods are yet to be implemented into a coherent Python package. This package sets out to fill this gap by implementing methods for controlling the family-wise error rate (FWER) and the false discovery rate (FDR).
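As an example of the FWER side, here is a minimal Python sketch of Holm's step-down adjustment. As far as we know, it is the same formula R's p.adjust uses for method = "holm". It is our own illustration, not code from the package described above.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (controls the FWER)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p-value first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        # Multiply by the number of hypotheses not yet rejected, cap at 1,
        # and keep the sequence non-decreasing.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03, 0.005]))  # about [0.03, 0.06, 0.06, 0.02]
```

Holm is uniformly more powerful than plain Bonferroni (multiplying every p-value by m) while controlling the same error rate.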
