Contents
GLOBAL_ARGS declares and defines global arguments.
GLOBAL_ARGS() sets all the global arguments that are required by, and shared across, the functions of this SAM implementation.
- data : a PxQ matrix of P observations and Q experiments.
- gene_names : the names of the P observations.
- var_equal : if 0 (FALSE), Welch's t-statistic will be computed. If 1 (TRUE), the pooled variance will be used in the computation of the t-statistic.
- B : how many permutations should be used in the estimation of the null distribution.
- med : if TRUE, the median number of falsely called observations is computed; if FALSE, their mean is computed.
- s0 : the fudge factor. If NaN, s0 will be computed automatically.
- s_alpha : quantiles of the s values, where each component of s is the denominator of the test statistic computed for one observation.
- include_zero : if 1 (TRUE), s0 = 0 is also a possible choice for the fudge factor, so the usual t-statistic or F-statistic can also be chosen as the expression score d. If 0 (FALSE), s0 = 0 will not be a possible choice for the fudge factor. The latter follows the definition of the fudge factor in Tusher et al. (2001), in which only strictly positive values are considered.
- n_subset : how many permutations are considered simultaneously when computing the p-values and the number of falsely called genes. If med = 1 (TRUE), n_subset will be set to 1.
- mat_samp : a matrix with Q columns, except in the two-class paired case, where mat_samp has Q/2 columns. Each row specifies a permutation of the group labels used in the computation of the expected expression scores d_bar. If not specified (mat_samp = 0), a BxQ matrix is generated automatically and used in the computation of d_bar. In the two-class unpaired case and the multi-class cases, each row of mat_samp must contain the same group labels as cl. In the one-class and the two-class paired cases, each row must contain -1's and 1's. In the one-class case, the expression values are multiplied by these -1's and 1's. In the two-class paired case, each column corresponds to one observation pair whose difference is multiplied by either -1 or 1.
- B_more : if the number of all possible permutations is smaller than or equal to (1+B_more)*B, full permutation will be done. Otherwise, B permutations are used. This avoids using only B permutations when the number of all possible permutations is just a little larger than B.
- B_max : if the number of all possible permutations is smaller than or equal to B_max, B randomly selected permutations will be used in the computation of the null distribution. Otherwise, B random draws of the group labels are used. With the latter way of permuting, some permutations may be used more than once.
- R_fold : if the fold change of an observation is smaller than or equal to R_fold, or larger than or equal to 1/R_fold, respectively, then this gene will be excluded from the SAM analysis. The expression score d of excluded genes is set to NaN. By default, R_fold is set to 1 such that all genes are included in the SAM analysis. Setting R_fold to 0 or a negative value skips the computation of the fold change. The fold change is only computed in the two-class unpaired cases.
- use_dm : if 1 (TRUE), the fold change is computed by 2 to the power of the difference between the mean log2 intensities of the two groups, i.e. 2 to the power of the numerator of the test statistic. If 0 (FALSE), the fold change is determined by computing 2 to the power of data (if R_unlog = TRUE) and then calculating the ratio of the mean intensity in the group coded by 1 to the mean intensity in the group coded by 0. The latter is the definition of the fold change used in Tusher et al. (2001).
- R_unlog : if 1 (TRUE), the anti-log of data will be used in the computation of the fold change. Otherwise, data is used. This transformation should be done when data is log2-transformed (in a SAM analysis it is highly recommended to use log2-transformed expression data). Ignored if use_dm = 1 (TRUE).
- na_replace : if 1 (TRUE), missing values will be replaced by the gene-wise statistic specified by na_method. If a gene has fewer than 2 non-missing values, this gene will be excluded from further analysis. If na_replace = 0 (FALSE), all genes with one or more missing values will be excluded from further analysis. The expression score d of excluded genes is set to NaN.
- na_method : a handle to the function that computes the statistic by which missing values will be replaced when na_replace = 1 (TRUE). Must be either "mean" or "median".
- rand_seed : if not NaN, the random number generator will be set to a reproducible state.
- delta : a set of values for the threshold Delta that should be used. If 0 or empty, le_delta Delta values will be computed automatically.
- le_delta : the number of Delta values that will be computed over the range of all possible values for Delta if delta is not specified.
- p0 : the prior probability pi0 that a gene is not differentially expressed. If NaN, p0 will be computed by the function pi0_estimation.
- lambda : the lambda values used in the estimation of the prior probability.
- ncs_value : Only used if lambda is a vector. Either "max" or "paper".
- q_version : an indicator for the q-value's version to compute. If q_version = 2, the original version of the q-value, i.e. min{pFDR}, will be computed. If q_version = 1, min{FDR} will be used in the calculation of the q-value. Otherwise, the q-value is not computed.
- WILC_SIGNED_RANK_STAT : analysis for the one-class case using Wilcoxon signed rank statistics.
- WILC_RANK_SUMS_t : analysis for the two-class unpaired case using Wilcoxon rank sums and assuming unequal variances (var_equal = 0, FALSE).
- WILC_RANK_SUMS_t_equalvar : analysis for the two-class unpaired case using Wilcoxon rank sums and assuming equal variances (var_equal = 1, TRUE).
- WILC_SIGNED_RANK_SCORES : analysis for the two-class paired case using Wilcoxon signed rank scores.
- F_STAT : analysis for the multi-class case.
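
The interplay of B, B_more and B_max, and the two fold-change definitions selected by use_dm and R_unlog, can be sketched as follows. This is a minimal Python illustration, not code taken from this implementation. The function names permutation_strategy and fold_change, the order in which the two thresholds are tested, the direction of the difference (group 1 minus group 0) and the numeric values in the usage lines are all assumptions made for the example.

```python
from math import comb  # requires Python 3.8+


def permutation_strategy(n_perm_total, B, B_more, B_max):
    """Classify how permutations would be chosen (hypothetical helper).

    Assumption: the B_more rule is tested before the B_max rule.
    """
    if n_perm_total <= (1 + B_more) * B:
        return "full"           # use every possible permutation
    if n_perm_total <= B_max:
        return "random_subset"  # B randomly selected permutations
    return "random_draws"       # B random label draws, repeats possible


def fold_change(log2_group0, log2_group1, use_dm, R_unlog=True):
    """Fold change of one gene in the two-class unpaired case (hypothetical helper)."""
    def mean(values):
        return sum(values) / len(values)

    if use_dm:
        # 2 to the power of the difference of the mean log2 intensities.
        # Assumption: the difference is taken as group 1 minus group 0.
        return 2.0 ** (mean(log2_group1) - mean(log2_group0))

    # Otherwise: optionally undo the log2 transform, then take the ratio
    # of the group 1 mean to the group 0 mean.
    if R_unlog:
        log2_group0 = [2.0 ** x for x in log2_group0]
        log2_group1 = [2.0 ** x for x in log2_group1]
    return mean(log2_group1) / mean(log2_group0)


# Illustrative values only; they are not claimed to be this implementation's defaults.
B, B_more, B_max = 100, 0.1, 30000

# Number of distinct label assignments for n0 versus n1 samples is comb(n0 + n1, n0).
print(permutation_strategy(comb(8, 4), B, B_more, B_max))    # 70 permutations
print(permutation_strategy(comb(12, 6), B, B_more, B_max))   # 924 permutations
print(permutation_strategy(comb(20, 10), B, B_more, B_max))  # 184756 permutations

# The same gene under both fold-change definitions.
print(fold_change([0.0, 2.0], [2.0, 2.0], use_dm=True))
print(fold_change([0.0, 2.0], [2.0, 2.0], use_dm=False, R_unlog=True))
```

With these illustrative settings, the three calls fall into the full, random-subset and random-draw branches in turn. The two fold_change calls give different results for the same gene because the mean of the log2 values is not the log2 of the mean intensity.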