FlyChip
Functional Genomics for Drosophila
Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, CB2 1QR, UK
Tel: +44 (0)1223-760280.   Fax: +44 (0)1223-760241.

1. General introduction

Each project is divided into replica groups that correspond to each pair of samples being compared. Each of the replica groups is hybridised to four microarrays. Two of these hybridisations are dye swaps, meaning that the sample that was labelled with Cy3 is now labelled with Cy5 and vice versa. This is to allow compensation for the differential labelling efficiencies of each dye.

The project index file describes the groups of replicate hybridisations and how each sample has been labelled. FlyChip project codes are in the form Pnnnnn, where nnnnn is a unique integer. FlyChip replica group codes are in the form Rnnnnn, where nnnnn is a unique integer. Additionally, some files are named after the microarray slide they represent. FlyChip microarray slide codes are in the form Snnnnnn, where nnnnnn is a unique integer.
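As an illustration only, the sketch below sorts file names by the kind of code they start with. The function name classify_code and the example file names are made up for this sketch; only the P, R and S prefixes come from the description above.

```python
import re

# Prefix letters are taken from the text above; the labels are our own wording.
CODE_KINDS = {"P": "project", "R": "replica group", "S": "slide"}
CODE_RE = re.compile(r"^([PRS])(\d+)")

def classify_code(filename):
    """Return (kind, number) for a name starting with a FlyChip-style code, else None."""
    match = CODE_RE.match(filename)
    if match is None:
        return None
    letter, digits = match.groups()
    return CODE_KINDS[letter], int(digits)

# Hypothetical file names, for illustration only.
for name in ["P00123_info.text", "R00045_vsn.tab", "S000678.state.dat"]:
    print(name, classify_code(name))
```

The pattern accepts any number of digits so that it does not depend on the exact width of the codes.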

For your convenience we provide normalised data and we recommend that you examine these first. The tab-delimited text files are suitable for further processing, e.g. clustering with other tools. Raw data are also provided in case you wish to normalise the data yourself. If you wish to perform spot-finding and all of the data analysis yourself, we have provided the raw grey-scale TIF images.

Each of the terms and codes defined above will be used throughout this text and many of the files we have sent to you have names derived from these codes. We will now explain what each of the file types we have sent to you contains. You will find it easier to understand this help text if you have a copy of the file being described open at the same time. This will enable you to compare our descriptions with the file of interest.

2. Project index file: Pnnnnn_info.text description

Pnnnnn_info.text files contain a description of how your project was organised and what sample has been labelled in each channel for each slide within each replica group. You may want to refer to this file when analysing your results. This tab-delimited file is best viewed with the freely available spreadsheet program within OpenOffice or else Microsoft Excel because some text editors will have problems with the variable column spacing.

Important note; dye swaps and gene expression ratios:

3. What images have you sent to me?

We have dispatched two different types of images. The first type are the raw grey-scale 16-bit TIF images that we analyse to quantify how much labelled sample and control has bound to each spot. The second type are false colour PNG images. These provide you with a visual (not normalised) representation of your results. Such images can be used to check for slide-specific problems, and for presentations.

4. Raw data

Each raw data file is named after the microarray slide it represents. The spot signals within this file have NOT been normalised, so the file contains raw, unprocessed data. For dye-swap slides (those marked swap_status = 1 within the Pnnnnn_info.text file), the dye swap has not been taken into account. If you wish to normalise the data yourself, you will need to take the dye swap into account; we recommend using the single channel data for this purpose.
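A minimal sketch of what taking the dye swap into account can look like when working from single channel signals is given below. It is not the FlyChip pipeline: the function name oriented_log_ratio is made up, the signals are invented numbers, and a plain log2 ratio without background correction or normalisation is used only to keep the example short.

```python
import math

def oriented_log_ratio(cy5_signal, cy3_signal, swap_status):
    """Log2 ratio expressed as if the slide had not been dye-swapped.

    swap_status follows the convention of the project index file:
    1 marks a dye-swap slide, 0 a normal slide.
    """
    if swap_status == 1:
        # On a dye-swap slide the two samples have exchanged dyes,
        # so the channels are exchanged back before forming the ratio.
        cy5_signal, cy3_signal = cy3_signal, cy5_signal
    return math.log2(cy5_signal / cy3_signal)

# Invented signals for one spot on a normal slide and on its dye swap.
print(oriented_log_ratio(800.0, 200.0, swap_status=0))
print(oriented_log_ratio(200.0, 800.0, swap_status=1))
```

After this step both slides report the change in the same direction, so replicate values can be compared directly.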

We recommend viewing this tab-delimited file with the freely available spreadsheet program within OpenOffice or else Microsoft Excel, because some text editors will have problems with the variable column spacing. Header information is denoted by # at the beginning of the line and all other columns are defined below. However, please note that the first number within the grid_x column is the total spot number, and the first number within the grid_y column is the number of channels.
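If you prefer a script to a spreadsheet, a file of this kind can be split into header lines and data rows as sketched below. The function name read_state_dat and the sample content are invented for illustration; the real files contain more columns than shown here.

```python
import io

def read_state_dat(handle):
    """Split a tab-delimited file into '#' header lines and data rows."""
    header_lines, rows = [], []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            header_lines.append(line.lstrip("#").strip())
        else:
            rows.append(line.split("\t"))
    return header_lines, rows

# Made-up content standing in for a real file.
sample = "# made-up header line\n1\t2\t0.5\n3\t4\t0.7\n"
headers, rows = read_state_dat(io.StringIO(sample))
print(headers)
print(rows)
```

For a real file, replace the io.StringIO object with an opened file, e.g. open(path, newline="").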

Column definitions for Snnnnnn.state.dat

These first few columns denote where the spot is located within the microarray. Such locations are provided using a system of Cartesian co-ordinates. The x-axis corresponds to the width of the image (the shortest side) and the y-axis corresponds to the length of the image (the longest side). The reference point for these co-ordinates (0,0) is the top left spot in each image.

The following columns provide a description of what each spot is. This description includes the 'Drosophila Gene Collection' clones and the predicted gene for each spot. The last of these columns defines whether the spot should be included in any normalisation, should you choose to do this yourself.

We then have further columns that provide details about the spot status, the signal and a pixel count for the foreground (i.e., the spot) and the background (i.e., the local area around the spot). Spots with very few foreground pixels are probably unreliable because they contain too few pixels for a reliable estimate of the spot signal.

5. Normalised data

Measured fluorescent spot signals differ systematically between microarray hybridisations and dyes: there are differences in background fluorescence as well as in overall brightness, e.g. one dye may be twice as bright as the other. The process of correcting for such systematic differences is called normalisation. The normalisation method employed is closely based on the work published by Huber et al. (2002). We only normalise spots with an [A]ccepted spot status that also have the norm_ignore flag set to 0 (see above).

For each dye and microarray, the background fluorescence and a factor reflecting overall brightness are inferred so that the signals for this subset of non-differentially expressed genes become identical. A necessary assumption is that more than half the genes are NOT differentially expressed. For further (technical) information on the normalisation please refer to Huber et al. (2002) Bioinformatics 18(1), S96-104.
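The transformation described by Huber et al. (2002) has the form arsinh(offset + scale * signal), with one offset and one scale per dye and microarray. The sketch below is only meant to show why differences of such values behave like log-ratios for bright spots. The function name glog_m and the default parameters are placeholders, not fitted values, and dividing by ln 2 to obtain base-2 units is our assumption.

```python
import math

def glog_m(cy5, cy3, offset5=0.0, scale5=1.0, offset3=0.0, scale3=1.0):
    """Difference of arsinh-transformed signals, expressed in base-2 units.

    The offsets and scales stand in for the per-dye, per-slide parameters
    that a real normalisation would estimate from the data.
    """
    h5 = math.asinh(offset5 + scale5 * cy5)
    h3 = math.asinh(offset3 + scale3 * cy3)
    return (h5 - h3) / math.log(2)

# For bright spots the value is very close to log2(cy5 / cy3).
print(glog_m(20000.0, 10000.0), math.log2(20000.0 / 10000.0))

# For weak spots the value stays finite, even when one signal is zero.
print(glog_m(0.0, 10.0))
```

Unlike a plain log-ratio, the transformed difference is defined for zero or very small signals, which is one reason the normalised values are only similar to, and not identical to, log-ratios.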

Normalised data have had the dye swap taken into account and are presented as one file for each replica group, with a separate column for each slide. The dye-swap slide column data are unswapped during the analysis and are presented as a ratio of Cy5 over Cy3, without any background subtraction. This means that all fn:M numbers should be treated as though there has been no dye swap (meaning swap_status = 0 within Pnnnnn_info.text); for further analysis you do not need to make any compensation for dye swaps. Within the replica group files the order of the slide data columns (e.g. f0:M, f1:M..., f0:A, f1:A...) is the same as the order of the slides within the replica group in the Pnnnnn_info.text file.

The normalised data are very similar to a log[2] scale, i.e

After normalisation, we first calculate 'M' (similar to a log-ratio) for each spot on each slide. This allows partial self-normalisation of spatial effects, e.g., variations in hybridisation efficiency across the slide surface. We then calculate an uncertainty of 'M' from the 'pixel intensity fluctuations' reported by dapple. This assumes a flat disc spot model. Then, separately for 'normal' and 'dye-swapped' slides, we calculate a weighted average, using the certainties of 'M' as weights. Lastly, we form a simple average of 'M' for 'normal' and 'dye-swapped' slides. The average is unweighted to allow partial compensation of dye-swap effects.
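The two averaging steps can be sketched as follows. This is a simplified illustration, not the code used to produce your files: the function names are made up, the numbers are invented, and the weights are assumed to be inverse squared uncertainties, which the text above does not specify.

```python
def weighted_mean(values, uncertainties):
    """Average in which more certain values count more.

    Assumption: each weight is 1 / uncertainty**2.
    """
    weights = [1.0 / (u * u) for u in uncertainties]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def combine_replicates(normal, swapped):
    """Combine (M, uncertainty) pairs for one spot across a replica group.

    The dye-swapped M values are expected to be unswapped already.
    """
    normal_mean = weighted_mean(*zip(*normal))
    swapped_mean = weighted_mean(*zip(*swapped))
    # Unweighted average of the two groups, so that neither dye
    # orientation dominates the result.
    return 0.5 * (normal_mean + swapped_mean)

# Invented values for one spot on two normal and two dye-swapped slides.
normal = [(1.0, 0.1), (2.0, 0.1)]
swapped = [(0.5, 0.2), (0.5, 0.4)]
print(combine_replicates(normal, swapped))
```

Keeping the final average unweighted means that a dye-specific bias, which enters the two groups with opposite sign, tends to cancel.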

What do Rnnnnn_vsn.tab files contain?

For each replica group within your project we produce a summary file that contains a description of what each spot is, the transformed normalised intensity differences between the Cy5 and Cy3 channels for the replicate slides (f0:M, f1:M, f2:M, f3:M), and the transformed normalised average intensities of the replicate slides (f0:A, f1:A, f2:A, f3:A). These files can be used for further downstream processing, for example, clustering.
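For downstream processing, the M columns can be pulled out of such a file with a few lines of code. The sketch below assumes that the first row holds the column names and that the first column identifies the spot; both are assumptions, and the sample content (including the column name spot_id) is invented.

```python
import csv
import io

def read_m_columns(handle):
    """Return {first-column value: [M values]} from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Columns such as f0:M, f1:M, ... are recognised by their ':M' suffix.
    m_index = [i for i, name in enumerate(header) if name.endswith(":M")]
    return {row[0]: [float(row[i]) for i in m_index] for row in reader if row}

# Made-up content with two slides instead of four.
sample = (
    "spot_id\tf0:M\tf1:M\tf0:A\tf1:A\n"
    "spotA\t1.2\t0.9\t10.1\t9.8\n"
    "spotB\t-0.4\t-0.6\t8.3\t8.5\n"
)
print(read_m_columns(io.StringIO(sample)))
```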

The first group of columns within the Rnnnnn_vsn.tab file identify which gene is represented by the spot. The genes are named using FlyChip and FlyBase identifiers.

The following columns contain the transformed normalised intensity differences between the Cy5 and Cy3 channels for the replicate slides (f0:M, f1:M, f2:M, f3:M), and the transformed normalised average intensities of the channels (f0:A, f1:A, f2:A, f3:A). You will probably be most interested in f0:M, f1:M, f2:M and f3:M. Positive numbers indicate an increase in relative intensity (Cy5 greater than Cy3), and negative numbers indicate a decrease in relative intensity (Cy5 less than Cy3). Numbers of equal size but opposite sign indicate equivalent fold changes up and down respectively. Please note, dye-swaps have been taken into account so that the numbers across all replicate slides are comparable, and should ideally change in the same direction.
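As a rough guide, and treating M as if it were an exact log2 ratio (it is only similar to one, especially for weak spots), an M value can be turned into an approximate fold change as sketched below. The function name describe_m is made up for this illustration.

```python
def describe_m(m_value):
    """Return (direction, approximate fold change) for an M value.

    Assumption: M is treated as a log2 ratio of Cy5 over Cy3.
    """
    fold = 2.0 ** abs(m_value)
    if m_value > 0:
        direction = "up"
    elif m_value < 0:
        direction = "down"
    else:
        direction = "unchanged"
    return direction, fold

for m in (1.0, -1.0, 0.0):
    print(m, describe_m(m))
```

Values of +1 and -1 both correspond to an approximately two-fold change, up and down respectively.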

The last column provides an indication of how good each spot looked and can be used to determine whether the reported expression changes are reliable or subject to error due to problems with either the printing, hybridisation, or spot-finding.

Ranking of microarray data is hard and an active area of research. The data provided should be treated as unranked.

6. Glossary of terms