Statistics in real-time quantitative PCR


STATISTICS AND GENE EXPRESSION ANALYSIS
by Terry Speed



Why do we measure gene expression? The most common experiment is comparative: we want to compare the mRNA levels of one or more genes in cells from different sources. Comparisons of interest include tumour vs normal cells, cells from a specific organ in a mutant or genetically modified organism vs cells from the same organ in a normal organism of the same strain, and cells before and after an intervention such as a drug treatment. Another important class is the time-course experiments, where cells are sampled at different times, e.g. after the administration of a drug, or as the cell cycle or development proceeds, and interest is in temporal patterns of gene expression. Yet other experiments focus on spatial patterns of gene expression. There are many other kinds of gene expression experiments, essentially as many as there are organisms, cell types and conditions of biological interest.
How do we measure gene expression? As stated above, there are many techniques for doing so, but most rely on DNA-RNA or DNA-DNA hybridization. This is the process through which single-stranded DNA or RNA molecules find and base-pair with their complementary sequences amidst a complex mixture of many molecules of the same kind. The terminology we adopt names the sequence representing a gene of interest the probe, while the pool within which a complementary copy of the probe is sought is named the target DNA or RNA. Other terminologies are the reverse of ours.
On what scale do we measure gene expression? Much of the recent interest by statisticians in this area stems from the availability of data sets giving expression measurements on tens of thousands of genes, so-called microarray gene expression data. However, nylon membrane filters with thousands of genes spotted on them have been around for over a decade, and smaller-scale quantitative expression data for much longer. We begin with a discussion of the first and simplest method of quantifying RNA, as many of the features of the high-throughput methods are already present here.

Real-time PCR Statistics
Joshua S. Yuan and C. Neal Stewart Jr.
PCR Encyclopedia (2005): 101127-49     http://www.pcr-encyclopedia.com/
Department of Plant Sciences and Genomics Hub, University of Tennessee, Knoxville, TN 37996, USA


Real-time quantitative RT-PCR: design, calculations, and statistics.
Rieu I, Powers SJ.
Plant Cell. 2009 21(4): 1023
Two recent letters to the editor of The Plant Cell (Gutierrez et al., 2008; Udvardi et al., 2008) highlighted the importance of following correct experimental protocol in quantitative RT-PCR (qRT-PCR). In these letters, the authors outlined measures to allow precise estimation of gene expression by ensuring the quality of material, refining laboratory practice, and using a normalization of relative quantities of transcripts of genes of interest (GOI; also called target genes) where multiple reference genes have been analyzed appropriately. In this letter, we build on the issues raised by considering the statistical design of qRT-PCR experiments, the calculation of normalized gene expression, and the statistical analysis of the subsequent data. This letter comprises advice for taking account of, in particular, the first and the last of these three vital issues. We concentrate on the situation of comparing transcript levels in different sample types (treatments) using relative quantification, but many of the concerns, particularly those with respect to design, are equally applicable to absolute quantification.
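The normalization against multiple reference genes mentioned in this letter can be illustrated with a minimal sketch. It assumes the widely used approach of dividing the relative quantity of the gene of interest by the geometric mean of the reference-gene relative quantities; the function name, the default amplification factor of 2.0 and the example Cq values are hypothetical and are not taken from the letter.

```python
import math

def normalized_expression(goi_cq, ref_cqs, e_goi=2.0, e_refs=None):
    """Relative quantity of a gene of interest, normalized to several reference genes.

    goi_cq  : quantification cycle (Cq) of the gene of interest
    ref_cqs : Cq values of the reference genes in the same sample
    e_goi, e_refs : amplification factors per cycle (2.0 = perfect doubling);
                    assumed values, in practice they are estimated per amplicon
    """
    e_refs = e_refs or [2.0] * len(ref_cqs)
    # Relative quantity is proportional to E^(-Cq).
    goi_rq = e_goi ** (-goi_cq)
    ref_rqs = [e ** (-cq) for e, cq in zip(e_refs, ref_cqs)]
    # Geometric mean of the reference quantities, computed on the log scale.
    geo_mean = math.exp(sum(math.log(r) for r in ref_rqs) / len(ref_rqs))
    return goi_rq / geo_mean

# Made-up Cq values for one sample with two reference genes.
example = normalized_expression(24.0, [20.0, 22.0])
```

Working on the log scale keeps the geometric mean numerically stable when the relative quantities are very small.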

Statistical Selection of Maintenance Genes for Normalization of Gene Expressions.
Yifan Huang, Jason C. Hsu, Mario Peruggia, Abigail A. Scott
Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 4

Maintenance genes can be used for normalization in the comparison of gene expressions. Even though the absolute expression levels of maintenance genes may vary considerably among different tissues or cells, a set of maintenance genes may provide suitable normalization if their expression levels are relatively constant in the specific tissues or cells of interest. A statistical procedure is proposed to select maintenance genes for normalization of gene expression data from tissues or cells of interest. This procedure is based on simultaneous confidence intervals for practical equivalence of relative gene expressions in these tissues or cells. As an illustration, the procedure is applied to the maintenance gene expression data from Vandesompele et al. (2002).

The qPCR Data Statistical Analysis - Integromics White Paper
   Ramon Goni, Patricia García and Sylvain Foissac
   Integromics SL, Madrid Science Park, Santiago Grisolía, 28760 Tres Cantos, Spain

Abstract: Data analysis represents one of the biggest bottlenecks in qPCR experiments, and the statistical aspects of the analysis are sometimes considered confusing for the non-expert. In this document we present some of the usual methods used in qPCR data analysis and a practical example using Integromics® RealTime StatMiner®, the unique software analysis package specialized for qPCR experiments which is compatible with all Applied Biosystems Instruments. RealTime StatMiner® uses a simple, step-by-step analysis workflow guide that includes parametric, non-parametric and paired tests for relative quantification of gene expression, as well as 2-way ANOVA for two-factor differential expression analysis.




Statistical Significance of quantitative PCR.
Yann Karlen, Alan McNair, Sebastien Perseguers, Christian Mazza & Nicolas Mermod
BMC Bioinformatics 2007, 8: 131
Background
PCR has the potential to detect and precisely quantify specific DNA sequences, but it is not yet often used as a fully quantitative method. A number of data collection and processing strategies have been described for the implementation of quantitative PCR. However, they can be experimentally cumbersome, their relative performances have not been evaluated systematically, and they often remain poorly validated statistically and/or experimentally. In this study, we evaluated the performance of known methods, and compared them with newly developed data processing strategies in terms of sensitivity, precision and robustness.

Results
Our results indicate that simple methods that do not rely on the estimation of the efficiency of the PCR amplification may provide reproducible and sensitive data, but that they do not quantify DNA with precision. Other evaluated methods based on sigmoidal or exponential curve fitting were generally of both poor sensitivity and precision. A statistical analysis of the parameters that influence efficiency indicated that it depends mostly on the selected amplicon and to a lesser extent on the particular biological sample analyzed. Thus, we devised various strategies based on individual or averaged efficiency values, which were used to assess the regulated expression of several genes in response to a growth factor.

Conclusions
Overall, qPCR data analysis methods differ significantly in their performance, and this analysis identifies methods that provide DNA quantification estimates of high precision, robustness and reliability. These methods allow reliable estimations of relative expression ratio of two-fold or higher, and our analysis provides an estimation of the number of biological samples that have to be analyzed to achieve a given precision.


Statistical diagnostics emerging from external quality control of real-time PCR.
Marubini E, Verderio P, Raggi CC, Pazzagli M, Orlando C; Italian Network for
Quality Assessment of Tumor Biomarkers; Italian Society of Clinical Chemistry and Clinical Molecular Biology.
Institute of Medical Statistics and Biometry, Universita degli Studi di Milano, Milan, Italy.

Original Paper: Int J Biol Markers. 2004 Apr-Jun; 19(2): 141-146
Erratum: Int J Biol Markers. 2004 Jul-Sep; 19(3): 256
Besides the application of conventional qualitative PCR as a valuable tool to enrich or identify specific sequences of nucleic acids, a new revolutionary technique for quantitative PCR determination has been introduced recently. It is based on real-time detection of PCR products revealed as a homogeneous accumulating signal generated by specific dyes. However, as far as we know, the influence of the variability of this technique on the reliability of the quantitative assay has not been thoroughly investigated. A national program of external quality assurance (EQA) for real-time PCR determination involving 42 Italian laboratories has been developed to assess the analytical performance of real-time PCR procedures. Participants were asked to perform a conventional experiment based on the use of an external reference curve (standard curve) for real-time detection of three cDNA samples with different concentrations of a specific target. In this paper the main analytical features of the standard curve have been investigated in an attempt to produce statistical diagnostics emerging from external quality control. Specific control charts were drawn to help biochemists take technical decisions aimed at improving the performance of their laboratories. Overall, our results indicated a subset of seven laboratories whose performance appeared to be markedly outside the limits for at least one of the standard curve features investigated. Our findings suggest the usefulness of the approach presented here for monitoring the heterogeneity of results produced by different laboratories and for selecting those laboratories that need technical advice on their performance.
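The external reference (standard) curve used in this programme can be illustrated with a minimal sketch. It assumes the usual linear model of threshold cycle against log10 concentration and the standard conversion of slope to amplification efficiency; the function names and the dilution-series numbers are hypothetical and do not come from the paper.

```python
import math

def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Efficiency implied by the slope: 1.0 corresponds to perfect doubling per cycle.
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency

def quantify(ct, slope, intercept):
    """Read an unknown sample's concentration off the fitted curve."""
    return 10 ** ((ct - intercept) / slope)

# Made-up ten-fold dilution series.
concs = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [15.0, 18.4, 21.7, 25.1, 28.4]
slope, intercept, eff = fit_standard_curve(concs, cts)
unknown = quantify(23.0, slope, intercept)
```

The slope, intercept and efficiency are the kind of standard-curve features that could be tracked per laboratory on a control chart.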

Statistical Inference for Quantitative Polymerase Chain Reaction Using a Hidden Markov Model: A Bayesian Approach
Nadia Lalam, Chalmers University of Technology, Sweden
Statistical Applications in Genetics and Molecular Biology: Vol. 6  : Iss. 1, Article 10.
Quantitative Polymerase Chain Reaction (Q-PCR) aims at determining the initial quantity of specific nucleic acids from the observation of the number of amplified DNA molecules. The most widely used technology to monitor the number of DNA molecules as they replicate is based on fluorescence chemistry. Considering this measurement technique, the observation of DNA amplification by PCR contains intrinsically two kinds of variability. On the one hand, the number of replicated DNA molecules is random, and on the other hand, the measurement of the fluorescence emitted by the DNA molecules is collected with some random error. Relying on a stochastic model of these two types of variability, we aim at providing estimators of the parameters arising in the proposed model, and, more specifically, of the initial amount of molecules. The theory of branching processes is classically used to model the evolution of the number of DNA molecules at each replication cycle. The model is a binary splitting Galton-Watson branching process. Its unknown parameters are the initial number of DNA molecules and the reaction efficiency of PCR, which is defined as the probability of replication of a DNA molecule. The number of DNA molecules is indirectly observed through noisy fluorescence measurements resulting in a so-called Hidden Markov Model. We aim at inference of the parameters of the underlying branching process, and the parameters of the noise from the fluorescence measurements in a Bayesian framework. Using simulations and experimental data, we investigate the performance of the Bayesian estimators obtained by Markov Chain Monte Carlo methods.
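The two sources of variability described in this abstract can be illustrated with a minimal simulation sketch. The replication step follows the binary splitting Galton-Watson process described above, with each molecule duplicated with probability equal to the efficiency. The additive Gaussian noise on the fluorescence, the `scale` factor and the function name are assumptions made for illustration only; the abstract does not specify the paper's noise model.

```python
import random

def simulate_pcr(n0, efficiency, cycles, noise_sd=0.0, scale=1.0, rng=None):
    """Simulate molecule counts and noisy fluorescence over PCR cycles.

    n0         : initial number of DNA molecules
    efficiency : probability that a molecule is replicated in one cycle
    noise_sd   : standard deviation of the assumed Gaussian measurement error
    scale      : assumed fluorescence per molecule
    """
    rng = rng or random.Random()
    n = n0
    counts, signal = [], []
    for _ in range(cycles):
        # Binary splitting: every molecule independently duplicates or does not.
        n += sum(1 for _ in range(n) if rng.random() < efficiency)
        counts.append(n)
        # Hidden count observed through a noisy fluorescence reading.
        signal.append(scale * n + rng.gauss(0.0, noise_sd))
    return counts, signal

# Example run: 50 starting molecules, 90% efficiency, 10 cycles.
counts, signal = simulate_pcr(50, 0.9, 10, noise_sd=5.0, rng=random.Random(42))
```

Only `signal` would be available to an experimenter; inference on `n0` and `efficiency` from it is the problem the paper addresses with MCMC.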

Common practice in molecular biology may introduce statistical bias and misleading biological interpretation.
Hocquette JF, Brandstetter AM.
J Nutr Biochem. 2002 Jun;13(6):370-377.
Unite de Recherches sur les Herbivores, Equipe Croissance et Metabolismes du
Muscle, Theix, 63122, Saint-Genes-Champanelle, France

In studies on enzyme activity or gene expression at the protein level, data are usually analyzed by using a standard curve after subtracting blank values. In most cases and for most techniques (spectrophotometric assays, ELISA), this approach satisfies the basic principles of linearity and specificity. In our experience, this might also be the case for Western-blot analysis. By contrast, mRNA data are usually presented as arbitrary units of the ratio of a target RNA over levels of a control RNA species. We here demonstrate by simple experiments and various examples that this data-normalization procedure may result in misleading conclusions. Common molecular biology techniques have never been carefully tested according to the basic principles of validation of quantitative techniques. We thus prefer a regression-based approach for quantifying mRNA levels relative to a control RNA species by Northern-blot, semi-quantitative RT-PCR or similar techniques. This type of technique is also characterized by a lower reproducibility for repeated assays when compared to biochemical analyses. Therefore, we also recommend designing experiments that allow the detection of a similar range of variance by biochemical and molecular biology techniques. Otherwise, spurious conclusions may be provided regarding the control level of gene expression.

Confidence interval estimation for DNA and mRNA concentration by real-time PCR:  A new environment for an old theorem.
Verderio P, Orlando C, Casini Raggi C, Marubini E.
Int J Biol Markers. 2004 Jan-Mar;19(1):76-9.

Operative Unit of Medical Statistics and Biometry, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.


Bravais-Pearson and Spearman correlation coefficients: meaning, test of hypothesis and confidence interval.
Artusi R, Verderio P, Marubini E.
Int J Biol Markers. 2002 Apr-Jun;17(2):148-51.

Operative Unit of Medical Statistics and Biometry, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.


Biostatistics and tumor marker studies in breast cancer: design, analysis and interpretation issues.
Biganzoli E, Boracchi P, Marubini E.
Int J Biol Markers. 2003 Jan-Mar;18(1):40-8.

Operative Unit of Medical Statistics and Biometry, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.


SAS programs for real-time RT-PCR having multiple independent samples.
Cook P, Fu C, Hickey M, Han ES, Miller KS.
Biotechniques. 2004 Dec;37(6): 990-995.
University of Tulsa, Tulsa, OK 74104, USA.

Relative real-time reverse transcription PCR (RT-PCR) has become an important tool for quantifying changes in messenger RNA (mRNA) populations following differential development or stimulation of tissues or cells. However, the best methods for conducting such experiments and analyzing the resultant data remain an issue of discussion. In this report we describe an appropriate experimental methodology and the computer programs necessary to generate a meaningful statistical analysis of the combined biological and experimental variability in such experiments. Specifically, logarithmic transformations of raw fluorescence data from the log-linear portion of real-time PCR growth curves for both target and reference genes are analyzed using a SAS/STAT Mixed Procedure program specifically designed to give a point estimate of the relative expression ratio of the target gene with associated 95% confidence interval. The program code is open-source and is printed in the text.

Relative Expression Software Tool  (REST©)  for group wise comparison
and statistical analysis of relative expression results in real-time PCR

Michael W. Pfaffl, Graham W. Horgan & Leo Dempfle
Nucleic Acids Research 2002 May 1; 30(9): E36

Real-time reverse transcription followed by polymerase chain reaction (RT-PCR) is the most suitable method for the detection and quantification of mRNA. It offers high sensitivity, good reproducibility, and a wide quantification range. Today relative expression is increasingly used, where the expression of a target gene is standardised by a non-regulated reference gene. Several mathematical algorithms have been developed to compute an expression ratio, based on real-time PCR efficiency and the crossing point deviation of an unknown sample versus a control. But all published equations and available models for the calculation of relative expression ratio allow only for the determination of a single transcription difference between one control and one sample. Therefore a new software tool was established, named REST© (Relative Expression Software Tool), which compares two groups, with up to 16 data points in the sample group and 16 in the control group, for reference and up to four target genes. The mathematical model used is based on the PCR efficiencies and the mean crossing point deviation between sample and control group. Subsequently the expression ratio results of the four investigated transcripts are tested for significance by a randomisation test. Herein development and application of REST is explained and the usefulness of relative expression in real-time PCR using REST is discussed.
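The model described in this abstract, an expression ratio built from PCR efficiencies and mean crossing point (CP) deviations and then tested by randomisation, can be illustrated with a minimal sketch. This is not the REST implementation: the function names, the data layout and the simple label-shuffling scheme (each sample's target and reference CP are kept together) are assumptions made for illustration.

```python
import math
import random
from statistics import mean

def expression_ratio(ctrl, samp, e_target=2.0, e_ref=2.0):
    """Efficiency-corrected ratio; ctrl and samp are lists of (target_cp, reference_cp)."""
    d_target = mean(t for t, _ in ctrl) - mean(t for t, _ in samp)
    d_ref = mean(r for _, r in ctrl) - mean(r for _, r in samp)
    return (e_target ** d_target) / (e_ref ** d_ref)

def randomisation_p_value(ctrl, samp, n_perm=2000, seed=0, **eff):
    """Two-sided p-value: share of random regroupings at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(math.log(expression_ratio(ctrl, samp, **eff)))
    pooled = list(ctrl) + list(samp)
    hits = 0
    for _ in range(n_perm):
        # Reallocate samples to the two groups at random, keeping group sizes.
        rng.shuffle(pooled)
        r = expression_ratio(pooled[:len(ctrl)], pooled[len(ctrl):], **eff)
        if abs(math.log(r)) >= observed - 1e-12:
            hits += 1
    return hits / n_perm

# Made-up crossing points: (target CP, reference CP) per sample.
control = [(25.1, 20.0), (24.9, 20.2), (25.3, 19.9), (25.0, 20.1)]
treated = [(23.0, 20.1), (22.8, 20.0), (23.2, 20.2), (23.1, 19.9)]
ratio = expression_ratio(control, treated)
p_value = randomisation_p_value(control, treated)
```

A randomisation test avoids distributional assumptions about the ratio, which is why it suits small groups such as these.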

Kinetic Outlier Detection (KOD) in real-time PCR.
Tzachi Bar, Anders Stahlberg, Anders Muszta and Mikael Kubista
NAR Vol 31(17)  e105



Department of Chemistry and Bioscience, Chalmers University of Technology, Medicinargatan 7B, 405 30 Gothenburg, Sweden,
Department of Mathematical Statistics, Eklandagatan 86, 412 96, Gothenburg, Sweden
TATAA Biocenter, Medicinargatan 7B, 405 30 Gothenburg, Sweden

Real-time PCR is becoming the method of choice for precise quantification of minute amounts of nucleic acids. For proper comparison of samples, almost all quantification methods assume similar PCR efficiencies in the exponential phase of the reaction. However, inhibition of PCR is common when working with biological samples and may invalidate the assumed similarity of PCR efficiencies. Here we present a statistical method, Kinetic Outlier Detection (KOD), to detect samples with dissimilar efficiencies. KOD is based on a comparison of PCR efficiency, estimated from the amplification curve of a test sample, with the mean PCR efficiency of samples in a training set. KOD is demonstrated and validated on samples with the same initial number of template molecules, where PCR is inhibited to various degrees by elevated concentrations of dNTP; and in detection of cDNA samples with an aberrant ratio of two genes. Translating the dissimilarity in efficiency to quantity, KOD identifies outliers that differ by 1.3- to 1.9-fold in their quantity from normal samples with a P-value of 0.05. This precision is higher than the minimal 2-fold difference in number of DNA molecules that real-time PCR usually aims to detect. Thus, KOD may be a useful tool for outlier detection in real-time PCR.
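The comparison at the heart of KOD, a test sample's efficiency against the mean efficiency of a training set, can be illustrated with a minimal sketch. It is not the published procedure: estimating efficiency from a log-linear fit over exponential-phase readings, the normal approximation with a 1.96 cut-off, and all names and numbers are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def efficiency_from_curve(fluorescence):
    """Amplification factor per cycle from consecutive exponential-phase readings.

    Fits log(fluorescence) against cycle index; 2.0 means perfect doubling.
    """
    ys = [math.log(f) for f in fluorescence]
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(slope)

def is_kinetic_outlier(test_eff, training_effs, z_crit=1.96):
    """Flag a sample whose efficiency deviates from the training-set mean."""
    z = (test_eff - mean(training_effs)) / stdev(training_effs)
    return abs(z) > z_crit, z

# Made-up training efficiencies and one inhibited test curve.
training = [1.90, 1.92, 1.94, 1.91, 1.93]
test_eff = efficiency_from_curve([1.0, 1.7, 2.9, 4.9])
flagged, z_score = is_kinetic_outlier(test_eff, training)
```

The choice of which cycles count as the exponential phase matters a great deal in practice and is left to the caller here.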

Intuitive Biostatistics
     http://www.graphpad.com/www/book/book.htm   

"The book's title suggests that he can make biostatistics intuitive for non-statisticians (e.g. physicians, clinicians and nurses). After reading through it he has made a believer out of me! He introduces concepts through examples and touches on most of the important statistical methods that are used in the medical literature. ... My usual concern with such books is that concepts are oversimplified and the presentation is too cook-bookish. Amazingly that is not the case here. Motulsky carefully explains concepts such as confidence intervals, p-values, multiple comparison issues, Bayesian thinking and Bayesian controversy in a way that should be understandable to his intended audience." by  Michael R. Chernick, PhD (review posted on amazon.com)
We created the GraphPad library to help biologists (and other scientists) learn about data analysis. This "library" contains articles and manuals written by GraphPad, as well as links to web sites and books written by others. http://www.graphpad.com/index.cfm?cmd=library.index

Applied Robust Statistics
David J. Olive,  Southern Illinois University,  Department of Mathematics,  Carbondale, IL 62901-4408
  Book Content ( 5 pages )
  Complete Book ( 532 pages  4.7 MB )