Stainless steel tanks holding 3,000 US gallons (11,000 L; 2,500 imp gal) or more are used. Operators must carefully manage the mixer and the temperature control system and, using computer controls, follow the formula instructions so that the correct types and amounts of raw materials are added at the specified times. The process consists of three stages: compounding the batch, a quality control check, and filling and packing.[8]
In the first phase, the main batch tank is filled with water, and the suspending agents and some of the other ingredients are added. Mixing is done at a low rate for adequate dispersion. No air is introduced into the mixture during mixing.
In the second phase, the pigment is processed. A suitable amount is added to water and, unlike in the first phase, mixed at a very high rate. Once the pigment particles are small enough, the dispersion is added to the main batch.
A batch code is an identification code assigned to a product batch. It contains information such as the manufacturer code and the production date. An expiration date is often attached for products with a limited shelf life, such as food or medicine.
Knowing what a batch code is leads to the next question: how do we track it? The batch number must be recorded, in addition to the item number, every time an item enters stock (purchased or produced) and every time it leaves.
However, when the number of items is large, tracking batch numbers manually is both tedious and error-prone. We therefore need a system that tracks batch numbers automatically.
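The recording rule described above (log the item number and the batch number on every entry and exit) can be sketched as a small in-memory ledger. This is a minimal illustration, not a description of any particular inventory system: the names `Movement`, `BatchLedger`, `receive`, `issue`, `on_hand`, and `history` are hypothetical, and a real system would persist the records in a database.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Movement:
    """One stock movement; all field names are illustrative assumptions."""
    item_number: str
    batch_number: str
    quantity: int  # positive = entered stock, negative = left stock
    when: date


class BatchLedger:
    """Hypothetical ledger that records the batch number on every movement."""

    def __init__(self):
        self.movements = []
        self.stock = defaultdict(int)  # (item_number, batch_number) -> quantity

    def receive(self, item_number, batch_number, quantity, when):
        """Record items entering stock (purchased or produced)."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.movements.append(Movement(item_number, batch_number, quantity, when))
        self.stock[(item_number, batch_number)] += quantity

    def issue(self, item_number, batch_number, quantity, when):
        """Record items leaving stock; refuse to issue more than is on hand."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.stock[(item_number, batch_number)] < quantity:
            raise ValueError("not enough stock in this batch")
        self.movements.append(Movement(item_number, batch_number, -quantity, when))
        self.stock[(item_number, batch_number)] -= quantity

    def on_hand(self, item_number, batch_number):
        """Current quantity for one item in one batch."""
        return self.stock[(item_number, batch_number)]

    def history(self, batch_number):
        """Every movement that involved the given batch, in recorded order."""
        return [m for m in self.movements if m.batch_number == batch_number]


ledger = BatchLedger()
ledger.receive("ITEM-001", "B2024-07", 100, date(2024, 7, 1))
ledger.issue("ITEM-001", "B2024-07", 30, date(2024, 7, 5))
print(ledger.on_hand("ITEM-001", "B2024-07"))  # 70
```

Keeping the full movement list, rather than only the running totals, is what makes it possible to trace where a given batch went if it later has to be recalled.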
Correcting a heterogeneous dataset that presents artefacts from several confounders is often an essential bioinformatics task. Attempting to remove these batch effects will result in some biologically meaningful signals being lost. Thus, a central challenge is assessing if the removal of unwanted technical variation harms the biological signal that is of interest to the researcher.
We describe a novel framework, B-CeF, to evaluate the effectiveness of batch correction methods and their tendency toward over- or under-correction. The approach is based on comparing co-expression of adjusted gene-gene pairs to a-priori knowledge of highly confident gene-gene associations based on thousands of unrelated experiments derived from an external reference. Our framework includes three steps: (1) data adjustment with the desired methods; (2) calculating gene-gene co-expression measurements for adjusted datasets; (3) evaluating the performance of the co-expression measurements against a gold standard. Using the framework, we evaluated five batch correction methods applied to RNA-seq data of six representative tissue datasets derived from the GTEx project.
Our framework enables evaluating how well batch correction methods preserve the original biological signal. We show that using a multiple linear regression model to correct for known confounders outperforms factor analysis-based methods that estimate hidden confounders. The code is publicly available as an R package.
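To make the idea of correcting for known confounders with a linear regression model concrete, the sketch below regresses one gene's expression on a set of known covariates by ordinary least squares and subtracts the fitted covariate contribution while keeping the intercept. It is an illustrative assumption about how such an adjustment can be done, not the code of the R package mentioned above; the function names `solve_linear_system` and `regress_out` are hypothetical, and the solver is written in plain Python only to keep the example dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def regress_out(expression, covariates):
    """Remove the linear effect of known covariates from one gene.

    expression: list of n sample values for one gene
    covariates: list of n rows, each a list of p known confounder values
    Returns the expression values with the fitted covariate effect removed;
    the intercept (the gene's baseline level) is kept.
    """
    design = [[1.0] + [float(v) for v in row] for row in covariates]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, expression)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [
        y - sum(b * v for b, v in zip(beta[1:], row[1:]))
        for row, y in zip(design, expression)
    ]


# Two samples from batch 0 and two from batch 1; batch 1 is shifted by +2.
batch = [[0.0], [0.0], [1.0], [1.0]]
gene = [5.5, 4.5, 7.5, 6.5]
adjusted = regress_out(gene, batch)  # approximately [5.5, 4.5, 5.5, 4.5]
```

In the toy example the batch shift is removed while the within-batch variation, which stands in for the biological signal, is left untouched. A categorical covariate with more than two levels would need to be expanded into indicator columns (dropping one level) before being passed in.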
Although ultrahigh-throughput sequencing technologies for gene expression profiling that measure the expression levels of thousands of genes in a single experiment present a promising technique to discover novel biomedical phenomena, they may suffer from artifacts that can delay the discovery. The adjustment of heterogeneous gene expression data that present noise generated by a single or multiple confounding factors needs to be taken into account. Attempting to remove batch effects may result in overfitting, which results in the loss of some of the biologically meaningful components of the measurement (i.e., signal). Thus, evaluating the results of the adjustment methods is as pivotal as the batch effect removal process itself [1]. The lack of such evaluation tools may even result in an elevated distortion of the data following adjustment, introducing serious errors in the results of any downstream analysis performed. For example, a loss of an expected biological signal of healthy and diseased colorectal/breast cancer patients was detected following batch correction with a PCA (principal component analysis) based method [2], and the work in [3] evaluated the extent to which various batch correction algorithms remove true biological heterogeneity using replicate samples. A pivotal challenge thus arises of how to determine whether an adjustment assists or damages the biological (i.e., non-technical) signal in the data.
Batch correction approaches can be roughly divided into three categories: (1) those aimed at removing known covariates, e.g., ComBat [4], which applies an empirical Bayes approach, (2) those aimed at removing unknown covariates, e.g., inferring hidden covariates using principal components [5] or factor analysis [6], and (3) those aimed at removing both known and unknown covariates. Several powerful approaches aimed at correcting hidden batch effects prior to differential expression analysis were suggested [7-11]. The Surrogate Variable Analysis (SVA) method [8] and its SVAseq [9] extension for RNA-seq data use SVD (singular value decomposition) to define hidden confounders on the signal-removed residual matrix. The method uses permutation tests to choose the significant singular vectors, finds a subset of genes that account for them and finally creates a surrogate vector for each gene subset. Focusing on detecting biological heterogeneity, the pSVA approach [3] reverses the common application of SVA to estimate biological heterogeneity as those features measured from genes not associated with the a-priori known technical covariates in the model matrix. The SVAPLSseq [10] method estimates hidden confounders using a partial least squares regression model of the original expression matrix on the primary-signal-removed expression matrix or using a set of control features. The RUV-2 method [11] suggested adjusting for batch effects using the variation between conditions of a-priori negative control genes known not to be altered by or related to the biological factor of interest (i.e., not differentially expressed). Using factor analysis, the negative control genes were incorporated into a linear regression model to adjust for unwanted variation in a dataset resulting from batch effects. These methods are dedicated to a downstream differential expression analysis that takes into account the differential biological variation between the contrasted groups supervising their computation. This makes them less than intuitive to use for the unsupervised batch correction computation required for a downstream co-expression analysis.
Recently, several combined methods were developed to account for data overcorrection. They were mostly based on assessing data variation or reducing it using factor or principal component analysis combined with prior knowledge (e.g., known batches). For example, the Harman method [12] refined principal component analyses using known batch effects to adjust for data variation related to known batches. They generated principal components on per-batch summation of the original data. A p-value for the significance of the batch-related first principal component variation is then used for the data adjustment. The HCP (Hidden Covariates with Prior) method [13] also refined principal components-based analyses using known batches. To assess their method, they evaluated the accuracy of the constructed co-expression network (gene-gene pairs from the batch-corrected expression datasets) to predict functional networks based on gene ontology (GO) categories. Inferred hidden confounder factors, PEER factors [6], were used to adjust for batch effects for the GTEx human tissues dataset [14-16]. With the aim of generating co-expression networks, [14] followed the methodology suggested in [13] to preserve the desired biological signal and used GO categories to quantify the reasonable numbers of principal components to be adjusted in each tissue with respect to the optimal GO enrichment. The work in [17] used a-priori knowledge on the true noise to evaluate adjustment methods. They used control data of technical replicates (comparing their correlation before and after batch adjustments) and principal component analysis on simulated data.
Here we present B-CeF (Batch Correction Evaluation Framework), a novel framework for assessment of batch correction approaches on actual data considering the genuine biological signal left. Focusing on the desired downstream co-expression analysis following the batch correction, we suggest computing a metric that compares the biological signal left in the adjusted datasets, represented by gene-gene co-expression, to an a-priori external knowledgebase, a gold standard, of a genuine biological signal. The gold standard, derived from the GIANT database [18], is represented by a set of actual high-confidence gene-gene associations based on co-expression and protein-interaction networks derived from thousands of experiments. We use the B-CeF methodology to evaluate five batch correction methodologies applied to six representative tissues from the GTEx dataset [15, 16].
The B-CeF assessment framework uses a-priori gene-gene true and false associations to evaluate the effectiveness of batch correction methods in preserving meaningful biological signals (see Fig. 1 for a schematic overview). A true gene-gene association is defined as two genes that are verified to be co-associated across multiple biological conditions (i.e., based on co-expression and biological interactions, see Methods), and a false association is defined as two genes that are thought not to be associated. An adjustment method is considered effective if the number of true positive or true negative pairs in the adjusted dataset increases with respect to the raw unadjusted data. Specifically, the steps of our methodology include: (1) construct the a-priori gold standard of high probability true and false gene-gene pairs (co-associations); (2) construct for the adjusted dataset a corresponding set of gene-gene pairs and their estimated correlation coefficients and p-values; and finally (3) evaluate the performance of each adjustment method using these p-values as scores against the gold standard pairs for generating ROC curves and AUC (see Methods). We demonstrate the B-CeF methodology by contrasting five batch correction methods and raw data.
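Steps (2) and (3) can be illustrated with a short sketch that scores each gold-standard pair by its co-expression in an adjusted dataset and summarises the separation of true from false pairs as an AUC. This is a simplified illustration under stated assumptions, not the B-CeF implementation: the function names are hypothetical, the toy expression values and pairs are made up, and the absolute Pearson correlation is used as the score in place of the p-value. For a fixed number of samples the two-sided p-value of a Pearson correlation decreases monotonically with the absolute correlation, so both scores rank pairs identically and give the same AUC.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def auc(scores_true, scores_false):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen true pair scores higher than a randomly chosen false pair
    (ties count as one half)."""
    wins = 0.0
    for t in scores_true:
        for f in scores_false:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5
    return wins / (len(scores_true) * len(scores_false))


def evaluate_adjustment(expr, true_pairs, false_pairs):
    """Score one adjusted dataset against a gold standard.

    expr: dict mapping gene name -> list of expression values across samples
    true_pairs / false_pairs: lists of (gene_a, gene_b) tuples
    """
    def score(pair):
        return abs(pearson(expr[pair[0]], expr[pair[1]]))

    return auc([score(p) for p in true_pairs], [score(p) for p in false_pairs])


# Toy data: genes A and B move together, gene C is unrelated to A.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, -1.0, -1.0, 1.0],
}
result = evaluate_adjustment(expr, true_pairs=[("A", "B")], false_pairs=[("A", "C")])
print(result)  # 1.0
```

Running `evaluate_adjustment` once per adjustment method (and once on the raw data) on the same gold standard gives one AUC per method, which is the kind of comparison the framework describes. The pairwise AUC loop is quadratic in the number of pairs, so a rank-based formulation would be preferable for a realistically sized gold standard.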