cospar.tl.differential_genes

cospar.tl.differential_genes(adata, cell_group_A=None, cell_group_B=None, FDR_cutoff=0.05, sort_by='ratio', min_frac_expr=0.05, pseudocount=1)

Perform differential gene expression (DGE) analysis and plot the top differentially expressed genes.

We use the Wilcoxon rank-sum test to calculate P values, followed by Benjamini-Hochberg correction for multiple testing.
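The two-step procedure described above can be sketched as follows. This is a minimal illustration on toy data, not CoSpar's internal code: the helper `bh_adjust`, the toy matrix `expr`, and the masks `group_A` and `group_B` are hypothetical names introduced here, and the sketch assumes SciPy's `scipy.stats.ranksums` as the rank-sum implementation.

```python
import numpy as np
from scipy.stats import ranksums


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (illustrative helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest P value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# Toy data: 60 cells x 4 genes; gene 0 is shifted upward in group A.
rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(60, 4)).astype(float)
group_A = np.zeros(60, dtype=bool)
group_A[:30] = True
group_B = ~group_A
expr[group_A, 0] += 5.0

# One rank-sum test per gene, then correct across genes.
pvals = np.array(
    [ranksums(expr[group_A, g], expr[group_B, g]).pvalue for g in range(expr.shape[1])]
)
qvals = bh_adjust(pvals)
significant = qvals < 0.05  # plays the role of FDR_cutoff
```

Here `significant` marks the genes whose corrected P value falls below the cutoff, which is the role `FDR_cutoff` plays in the function signature.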

Parameters
adata : AnnData object

Must contain a gene expression matrix.

cell_group_A : np.array, optional (default: None)

A boolean array of the size adata.shape[0] for defining population A. If not specified, we set it to be adata.obs[‘cell_group_A’].

cell_group_B : np.array, optional (default: None)

A boolean array of the size adata.shape[0] for defining population B. If not specified, we set it to be adata.obs[‘cell_group_B’].

FDR_cutoff : float, optional (default: 0.05)

Cutoff for the corrected P value of each gene. Only genes below this cutoff will be shown.

sort_by : str, optional (default: ‘ratio’)

The key to sort the differentially expressed genes. The key can be: ‘ratio’ or ‘Qvalue’.

min_frac_expr : float, optional (default: 0.05)

Minimum expression fraction among selected states for a gene to be considered for DGE analysis.

pseudocount : int, optional (default: 1)

Pseudocount added to the mean expression of each group before taking the gene expression ratio between the two groups.
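To illustrate how min_frac_expr and pseudocount interact, here is a small sketch on toy data. It is written under stated assumptions rather than taken from CoSpar's source: the ratio is assumed to be the plain quotient of pseudocount-shifted group means (the library may report it on a log scale), the expression fraction is assumed to be the share of selected cells with nonzero expression, and all variable names are hypothetical.

```python
import numpy as np

# Toy matrix: 4 cells x 3 genes (hypothetical values).
expr = np.array(
    [
        [4.0, 0.0, 0.0],
        [2.0, 0.0, 1.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 3.0],
    ]
)
group_A = np.array([True, True, False, False])
group_B = ~group_A
pseudocount = 1
min_frac_expr = 0.05

# Fraction of selected cells (A or B) in which each gene is detected.
selected = group_A | group_B
frac_expr = (expr[selected] > 0).mean(axis=0)
keep = frac_expr >= min_frac_expr  # assumed comparison; gene 1 is never detected

# The pseudocount keeps the ratio finite when a group mean is zero.
mean_A = expr[group_A].mean(axis=0)
mean_B = expr[group_B].mean(axis=0)
ratio = (mean_A + pseudocount) / (mean_B + pseudocount)
```

Gene 0 has a mean of zero in group B, so without the pseudocount its ratio would be undefined. With pseudocount=1 it is finite and comparable across genes.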

Returns

  • diff_gene_A (pd.DataFrame) – Genes differentially expressed in cell state group A, ranked by the ratio of mean expressions between the two groups, with the top being more differentially expressed.

  • diff_gene_B (pd.DataFrame) – Genes differentially expressed in cell state group B, ranked by the ratio of mean expressions between the two groups, with the top being more differentially expressed.