FlyChip
Functional Genomics for Drosophila
Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, CB2 1QR, UK
Tel: +44 (0)1223-760280.   Fax: +44 (0)1223-760241.

Statistics:

Overview

We perform a basic statistical analysis of your data using a range of analytical tools. This analysis includes p-value estimation, which facilitates direct comparisons between, for example, samples and their controls. For other experimental designs, such as reference designs or time-course analyses, we do not routinely perform a statistical analysis.

Result file names consist of the replicate group number, the normalisation method identifier and the statistical software used for the analysis. For each replicate group of your project we produce summary files which describe the nature of each spot, the transformed normalised intensity differences between the Cy5 and Cy3 channels for the replicate slides (M-values), and the transformed normalised average intensities of the replicate slides (A-values). Normalised data have the dye-swap taken into account; therefore, all M-values should be treated as if there had been no dye-swap.

M and A values

Differential expression is expressed as the ratio of the Cy5 spot signal to the Cy3 spot signal, presented on a log2 scale. The log2 differential expression ratio for each spot, 'M', is calculated as follows:

M = log2(Cy5 / Cy3) or, equivalently, M = log2(Cy5) - log2(Cy3)

The log intensity of the spot, 'A' (a measure of the overall brightness of the spot), is:

A = log2(Cy5 * Cy3) / 2 or, equivalently, A = (log2(Cy5) + log2(Cy3)) / 2
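As an illustration of the two definitions above, the following minimal Python sketch computes M and A for a single spot. The function name ma_values and the example intensities are hypothetical and chosen only for illustration; it assumes the intensities are already background-corrected and strictly positive.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) for one spot from positive Cy5 and Cy3 intensities."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    log_cy5 = math.log2(cy5)
    log_cy3 = math.log2(cy3)
    m = log_cy5 - log_cy3        # log2 ratio of Cy5 over Cy3
    a = (log_cy5 + log_cy3) / 2  # mean log2 intensity of the spot
    return m, a

# Hypothetical spot: Cy5 is four times brighter than Cy3.
m, a = ma_values(4096.0, 1024.0)
print(m, a)  # 2.0 11.0
```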

In our files, M-values are interpreted as follows.

Positive M-values indicate an increase in relative intensity (Cy5 greater than Cy3); negative values indicate a decrease in relative intensity (Cy5 less than Cy3). Remember that both M- and A-values are log2 transformed. Numbers of equal magnitude but opposite sign indicate equivalent fold changes up and down respectively: for example, M = 1 is a two-fold increase and M = -1 is a two-fold decrease.
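Because M is on a log2 scale, converting it back to a fold change is a matter of raising 2 to the power of its magnitude. The sketch below shows this conversion; the function name describe_m and the "up"/"down"/"unchanged" labels are illustrative assumptions, not part of our result files.

```python
def describe_m(m):
    """Convert a log2 ratio M into (fold change magnitude, direction)."""
    fold = 2.0 ** abs(m)
    if m > 0:
        direction = "up"         # Cy5 greater than Cy3
    elif m < 0:
        direction = "down"       # Cy5 less than Cy3
    else:
        direction = "unchanged"
    return fold, direction

print(describe_m(1.0))   # (2.0, 'up')
print(describe_m(-1.0))  # (2.0, 'down')
print(describe_m(3.0))   # (8.0, 'up')
```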

Since dye-swaps have been taken into account, the numbers across all replicate slides are comparable and should ideally change in the same direction.
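The sketch below illustrates one common way of taking a dye-swap into account, namely reversing the sign of M on the swapped slides so that all replicates share the same orientation. This convention is an assumption made for illustration and is not necessarily the exact procedure applied to your data; the function names and example values are hypothetical.

```python
def orient_m_values(m_values, dye_swapped):
    """Negate M on dye-swapped slides so all slides share one orientation."""
    if len(m_values) != len(dye_swapped):
        raise ValueError("need one dye-swap flag per slide")
    return [-m if swapped else m for m, swapped in zip(m_values, dye_swapped)]

def same_direction(m_values):
    """True if every replicate M-value is positive or every one is negative."""
    return all(m > 0 for m in m_values) or all(m < 0 for m in m_values)

# Hypothetical spot measured on three slides; the second slide is a dye-swap.
raw = [1.2, -1.0, 0.9]
oriented = orient_m_values(raw, [False, True, False])
print(oriented)                  # [1.2, 1.0, 0.9]
print(same_direction(oriented))  # True
```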

Limma

Limma stands for Linear Models for Microarray Data. For each gene, limma will fit a linear model to the expression data and employ an empirical Bayes method to stabilise the analysis.

A design matrix and a contrast matrix have to be established for the data of a given replicate group. In a paired-data design the number of coefficients is one fewer than the number of RNA sources (e.g. for wildtype vs mutant, the number of coefficients equals 1). The first step is to fit a linear model that describes the systematic part of the data. Moderated t-statistics are then calculated for each probe and each contrast. A moderated t-statistic has the same interpretation as an ordinary t-statistic, except that the standard errors have been moderated across genes, i.e. shrunk towards a common value by using a simple Bayesian model. P-values are then adjusted using Benjamini and Hochberg's step-up method for controlling the false discovery rate (FDR).

For further information, please see Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Article 3. PMID 16646809 (abstract).
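To make the two ideas in the paragraph above concrete, here is a minimal Python sketch of variance moderation and of the Benjamini and Hochberg adjustment. It is not limma itself: limma estimates the prior variance and prior degrees of freedom from all genes, whereas here they are simply supplied as assumed values (s0_2 and d0). All function names and example numbers are hypothetical.

```python
import math

def moderated_t(coef, s2, df, stdev_unscaled, s0_2, d0):
    """Moderated t: the gene variance s2 is shrunk towards the prior s0_2.

    s0_2 and d0 are ASSUMED inputs here; limma estimates them from the data.
    """
    s2_post = (d0 * s0_2 + df * s2) / (d0 + df)  # posterior variance
    return coef / (math.sqrt(s2_post) * stdev_unscaled)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene: mean M of 1.0 over 3 replicates with a very small variance.
t_mod = moderated_t(coef=1.0, s2=0.01, df=2,
                    stdev_unscaled=1 / math.sqrt(3), s0_2=0.04, d0=4)
t_ord = 1.0 / (math.sqrt(0.01) / math.sqrt(3))
print(round(t_mod, 2), round(t_ord, 2))

# Hypothetical raw p-values for four probes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In this example the moderated statistic is smaller than the ordinary one because the gene's own variance is well below the assumed prior variance, so it is pulled upwards.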

siggenes

We can perform the popular Significance Analysis of Microarrays (SAM) using the siggenes software package in Bioconductor.

SAM identifies genes with statistically significant changes in expression by assimilating a set of gene-specific t-tests. Each gene is assigned a score on the basis of its change in gene expression relative to the standard deviation of repeated measurements for that gene. Genes with scores beyond a certain threshold are deemed potentially significant. The percentage of such genes identified by chance is the false discovery rate (FDR). To estimate the FDR, nonsense genes are identified by analyzing permutations of the measurements. The threshold can be adjusted to identify smaller or larger sets of genes, and FDRs are calculated for each set.
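The following Python sketch illustrates the idea behind the description above for the one-class case (replicate M-values per gene). It is a deliberate simplification and not the siggenes implementation: it uses a fixed cutoff on the absolute score rather than SAM's comparison of ordered observed and expected scores, it permutes by flipping the sign of whole slides, and the fudge factor s0 is supplied as an assumed positive constant rather than estimated. All names and data are hypothetical.

```python
import math
import random
import statistics

def sam_scores(m_matrix, s0):
    """Score each gene as mean(M) / (standard error + s0); s0 > 0 is assumed."""
    scores = []
    for row in m_matrix:
        se = statistics.stdev(row) / math.sqrt(len(row))
        scores.append(statistics.mean(row) / (se + s0))
    return scores

def estimate_fdr(m_matrix, s0, threshold, n_perm=200, seed=0):
    """Return (number of genes called, estimated FDR) for a fixed |score| cutoff."""
    rng = random.Random(seed)
    observed = sam_scores(m_matrix, s0)
    n_called = sum(abs(d) >= threshold for d in observed)
    if n_called == 0:
        return 0, None  # no genes called, so the FDR is undefined
    n_slides = len(m_matrix[0])
    false_counts = []
    for _ in range(n_perm):
        # Flip the sign of whole slides to mimic data with no real change.
        signs = [rng.choice((-1, 1)) for _ in range(n_slides)]
        permuted = [[s * m for s, m in zip(signs, row)] for row in m_matrix]
        perm_scores = sam_scores(permuted, s0)
        false_counts.append(sum(abs(d) >= threshold for d in perm_scores))
    fdr = min(1.0, statistics.median(false_counts) / n_called)
    return n_called, fdr

# Hypothetical M-values: 4 genes (rows) on 4 replicate slides (columns).
data = [
    [2.1, 1.9, 2.3, 2.0],      # consistently up
    [0.1, -0.2, 0.05, -0.1],   # no clear change
    [-0.3, 0.2, 0.1, -0.05],   # no clear change
    [-1.8, -2.2, -2.0, -1.9],  # consistently down
]
scores = sam_scores(data, s0=0.1)
n_called, fdr = estimate_fdr(data, s0=0.1, threshold=5.0)
print([round(d, 2) for d in scores], n_called, fdr)
```

Raising or lowering the threshold changes the number of genes called and therefore the estimated FDR, which mirrors the trade-off described in the text.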

For further information, please refer to Tusher, V.G., Tibshirani, R., Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98(9), 5116-21. PMID 11309499 (abstract).

Deciding which genes are differentially expressed

There are several points to consider before deciding whether a gene in your experiment is differentially expressed.

Further assistance

Please refer to the FL003 data release CD help file or contact FlyChip for further guidance. Please be aware that funding limitations mean that FlyChip is unable to provide comprehensive analytical support for end-users.