Significance Analysis of Microarrays implementation using Matlab

The demonstration proceeds to a standard MatSAM analysis. To see the complete documentation for the functions used in the demo, please refer to the API reference home page.

Contents

1- Global arguments

Defines the names of global variables and sets their values.

global data

fprintf ('Declaring and defining global arguments ...\n\n');
global_args ();
Declaring and defining global arguments ...

2- The data set

Specifies the data set from which the experiments and the observations data are extracted. The data set is expected to be provided as two text files (tabulation separated), the first containing only gene expression data values, the second comprised of gene names/tags. This demo is based on the Platinum Spike data set (Zhu et al, 2010).

exprs_file = 'inputs/gse21344_expressions.txt';
obs_file = 'inputs/gse21344_names.txt';
fprintf ('Reading into raw files ...\n');
sam_input (exprs_file, obs_file);
Reading into raw files ...
	Observations count : 18952
	Experiments count : 18

3- Experiment groups

Sets up the design following which the experiments are divided into classes. The experiments are divided into two groups (or classes). In the dataset here used, there are 18 experiments, the first 9 belong to one class (the control group) while the 9 last ones belong to another class (the treatement group). To specify groups, we use 0s and 1s labels. Thus, the classes design will be the following : 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1.

cl = ones (1, size (data, 2));
k = 1;
l = 9;
cl (k:l) = 0;
disp(cl);
  Columns 1 through 13

     0     0     0     0     0     0     0     0     0     1     1     1     1

  Columns 14 through 18

     1     1     1     1     1

4- The main SAM function

The function gives results related, including but not limited to, the expected & observed D scores, the cut-off helping to find expressed & false called observations, the scores of the statistical test, and the fold changes.

fprintf ('Beginning SAM ...\n\n');
[d, d_bar, vec_false, p_value, q_value, fold_out, s] = sam_func(cl);
Beginning SAM ...

Doing statistical test ...

Deciding which statistical test to apply on the data ...

Computing the genes fold changes ...

Computing the numerators and the denominators of the statistical test ...

Computing the fudge factor, "S0" ...

Computing the D-scores ...

Randomly permuting the experiments ...
Computing the D-bar scores and the P-Values ...

Estimating the prior probability that a gene is differential expressed ...

Warning: All data points with NaN in their value will be ignored. 


Computing the q-value for each gene ...

5- Thresholds

Gives the watersheds from which the user will determine how many significant observations to pick up.

fprintf ('Summary of obtained results :\n');
full_results (d, d_bar, vec_false);
Summary of obtained results :
Delta	p0	False	Called	FDR	cutlow	cutup
0.1	0.1	7310.3	13741.0	0.0	-0.7	0.2
0.7	0.1	1514.0	4463.0	0.0	-1.7	1.5
1.3	0.1	8130.9	2464.0	0.2	-2.5	2.4
1.9	0.1	4889.7	1952.0	0.2	-3.1	3.2
2.5	0.1	17238.4	1679.0	0.7	-3.8	3.9
3.1	0.1	2679.2	1468.0	0.1	-4.4	4.6
3.7	0.1	1664.4	1385.0	0.1	-5.0	5.2
4.3	0.1	3852.8	1287.0	0.2	-5.6	5.8
4.9	0.1	0.0	1181.0	0.0	-6.3	6.7
5.4	0.1	5585.4	1133.0	0.3	-6.8	7.1
6.0	0.1	1261.0	1051.0	0.1	-7.4	7.5
6.6	0.1	0.0	889.0	0.0	-8.2	8.2
7.2	0.1	9046.4	785.0	0.8	-8.8	8.8
7.8	0.1	0.0	728.0	0.0	-9.5	9.4
8.4	0.1	5849.6	680.0	0.6	-10.1	10.0
9.0	0.1	7910.0	648.0	0.8	-10.7	10.6
9.6	0.1	0.0	616.0	0.0	-11.3	11.3
10.2	0.1	0.0	585.0	0.0	-12.1	11.9
10.8	0.1	0.0	552.0	0.0	-12.5	12.5
11.4	0.1	6983.7	506.0	1.0	-13.2	13.2
12.0	0.1	2052.7	456.0	0.3	-13.8	13.8
12.6	0.1	2642.6	434.0	0.4	-14.5	14.5
13.2	0.1	5299.1	405.0	0.9	-15.0	15.1
13.8	0.1	6767.2	371.0	1.3	-15.6	15.8
14.4	0.1	645.3	328.0	0.1	-16.2	16.4
15.0	0.1	474.5	298.0	0.1	-16.8	17.1
15.6	0.1	0.0	258.0	0.0	-17.4	17.8
16.1	0.1	8851.8	230.0	2.7	-18.0	18.6
16.7	0.1	1724.1	216.0	0.6	-18.5	19.3
17.3	0.1	9255.7	191.0	3.3	-19.1	20.6
17.9	0.1	1185.9	182.0	0.4	-19.7	21.1
18.5	0.1	647.1	166.0	0.3	-20.4	21.7
19.1	0.1	4920.5	157.0	2.2	-21.0	Inf
19.7	0.1	5916.7	149.0	2.7	-21.7	Inf
20.3	0.1	62.1	139.0	0.0	-22.3	Inf
20.9	0.1	5486.5	126.0	3.0	-22.9	Inf
21.5	0.1	7599.5	111.0	4.7	-23.5	Inf
22.1	0.1	2379.1	101.0	1.6	-24.2	Inf
22.7	0.1	0.0	87.0	0.0	-24.7	Inf
23.3	0.1	9062.0	76.0	8.2	-25.4	Inf
23.9	0.1	5312.8	63.0	5.8	-26.0	Inf
24.5	0.1	2867.8	53.0	3.7	-26.7	Inf
25.1	0.1	4464.3	36.0	8.6	-27.4	Inf
25.7	0.1	7474.6	27.0	19.1	-28.1	Inf
26.3	0.1	0.0	21.0	0.0	-28.8	Inf
26.8	0.1	4321.2	14.0	21.3	-29.4	Inf
27.4	0.1	0.0	8.0	0.0	-30.1	Inf
28.0	0.1	0.0	7.0	0.0	-31.0	Inf
28.6	0.1	0.0	4.0	0.0	-31.6	Inf
29.2	0.1	0.0	1.0	0.0	-32.5	Inf

6- SAM results

Sets the delta value using the precendetely printed thresholds. The up & down regulated observations' statistics are computed based on the chosen delta value.

delta = 16.7;
fprintf ('Gathering SAM results ...\n');

[mat_sig, mat_sig_hdr, called_names, mat_fdr] =...
    sam_results (delta, d, d_bar, vec_false, s, p_value, q_value, fold_out);
Gathering SAM results ...
Determining Delta, prior probability, false called genes, FDR, low and up cut-offs ...

7- Plotting

Plots SAM QxQ plot.

fprintf ('Plotting the SAM results ...\n');
sam_plot (d, d_bar, delta, vec_false);
Plotting the SAM results ...
Determining Delta, prior probability, false called genes, FDR, low and up cut-offs ...

8- Printing

Saves the results on the disk for further analysis purpose.

fprintf ('Printing the SAM results ...\n\n');

fold_status = mkdir ('results');
if fold_status == 1
    file_name = sprintf ('results/golub_results__delta_%.1f.txt', delta);
    save_sam_results (...
        mat_sig, mat_sig_hdr, mat_fdr, called_names, delta, file_name);
else
    fprintf ('Results not saved; check destination folder settings.\n');
end
Printing the SAM results ...

Warning: Directory already exists. 
delta : 16.7
s0 : 0.0265144
Number of permutations : 1000
prior probability : 6.904917e-02
low cutoff : -18.4928
up cutoff : 19.3441
Identified genes : 216
Falsely called genes : 1724.12
FDR : 0.551153

The sesults were saved in : results/golub_results__delta_16.7.txt

9- References

10- Authors contact

Eric NIMPAYE

Ouafae KAISSI

Tiratha Raj SINGH

Brigitte VANNIER

Azeddine IBRAHIMI

Ahmed MOUSSA