dynamo.tl.two_groups_degs

dynamo.tl.two_groups_degs(adata, genes, layer, group, test_group, control_groups, X_data=None, exp_frac_thresh=None, log2_fc_thresh=None, qval_thresh=0.05, subset_control_vals=None)

Find marker genes between two groups of cells based on gene expression or velocity values as specified by the layer.

Tests each gene for differential expression between cells in one group and cells from one or more other groups via the Mann-Whitney U test. It also calculates the fraction of cells with non-zero expression, the log2 fold change, and the specificity (calculated as 1 - Jensen-Shannon distance between the distribution of the percentage of cells with expression across all groups and the hypothetical perfect distribution in which only the current group of cells has expression). In addition, the rank-biserial correlation (rbc) and qval are calculated.

The rank-biserial correlation assesses the relationship between a dichotomous (two-level) categorical variable and an ordinal variable, so it can only be used when the categorical variable has exactly two levels. It is closely related to the non-parametric Mann-Whitney U test, which compares two independent groups on an ordinal variable; the Mann-Whitney U test is preferable when comparing independent groups. qval is calculated using the Benjamini-Hochberg adjustment.
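The statistics named above can be illustrated with a minimal, standard-library-only sketch. It is not dynamo's implementation: the helper names (average_ranks, rank_biserial, bh_qvalues, specificity) are hypothetical, the base-2 logarithm in the Jensen-Shannon distance is an assumption, and so is the normalization of the per-group expression fractions so that they sum to 1.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_biserial(test_vals, control_vals):
    """Mann-Whitney U of the test group and the rank-biserial correlation."""
    n1, n2 = len(test_vals), len(control_vals)
    ranks = average_ranks(list(test_vals) + list(control_vals))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1, 2 * u1 / (n1 * n2) - 1


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    qvals = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


def specificity(expr_fracs, group_index):
    """1 - Jensen-Shannon distance to a one-hot 'perfect' distribution.

    ASSUMPTIONS: base-2 logarithm, and expr_fracs (with a positive sum)
    is normalized to sum to 1 before the comparison.
    """
    total = sum(expr_fracs)
    p = [f / total for f in expr_fracs]
    q = [1.0 if i == group_index else 0.0 for i in range(len(p))]
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    js_divergence = (kl(p, mix) + kl(q, mix)) / 2
    return 1 - math.sqrt(max(js_divergence, 0.0))
```

With these definitions, an rbc of 1 means every test-group value exceeds every control value, and a specificity of 1 means only the current group shows expression.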

Parameters
  • adata (AnnData) – an AnnData object

  • genes (list or None (default: None)) – The list of genes that will be used to subset the data for the differential expression test. If None, all genes will be used.

  • layer (str or None (default: None)) – The layer that will be used to retrieve data for the differential expression test. If None, .X is used.

  • group (str or None (default: None)) – The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types or different time points) of cells. This will be used for calculating group-specific genes.

  • test_group (str or None (default: None)) – The group name from group for which markers are to be found.

  • control_groups (list) – The list of group name(s) from group against which the test group is compared.

  • X_data (np.ndarray (default: None)) – User-supplied data that will be used directly for marker gene detection.

  • exp_frac_thresh (float (default: None)) – The minimum fraction of cells with expression required for a gene to proceed to the differential expression test. If layer is not velocity related (a velocity-related layer is, e.g., velocity_S), exp_frac_thresh defaults to 0.1; otherwise it defaults to 0.

  • log2_fc_thresh (float (default: None)) – The minimum log2 fold change required for a gene to proceed to the differential expression test. If layer is not velocity related (a velocity-related layer is, e.g., velocity_S), log2_fc_thresh defaults to 1; otherwise it defaults to 0.

  • qval_thresh (float (default: 0.05)) – The maximum qval at which a gene is considered significant.

  • subset_control_vals (None or bool (default: None)) – Whether to subset the top-ranked control values. When subset_control_vals = None, it is set to True if layer is not a velocity-, acceleration-, or curvature-related layer, and to False otherwise. When layer is not a velocity-, acceleration-, or curvature-related layer, the control values will be sorted by absolute value.
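The None defaults described for exp_frac_thresh, log2_fc_thresh and subset_control_vals can be summarized in a short sketch. This is only an illustration of the documented rules, not dynamo's code: resolve_defaults is a hypothetical helper, and deciding whether a layer is velocity, acceleration, or curvature related by its name prefix is an assumption.

```python
def resolve_defaults(layer, exp_frac_thresh=None, log2_fc_thresh=None,
                     subset_control_vals=None):
    """Fill in None-valued thresholds following the documented rules."""
    # ASSUMPTION: a layer's kind is inferred from its name prefix.
    name = layer or ""
    is_velocity = name.startswith("velocity")
    is_dynamics = is_velocity or name.startswith(("acceleration", "curvature"))

    if exp_frac_thresh is None:
        # 0.1 for layers that are not velocity related, otherwise 0
        exp_frac_thresh = 0.0 if is_velocity else 0.1
    if log2_fc_thresh is None:
        # 1 for layers that are not velocity related, otherwise 0
        log2_fc_thresh = 0.0 if is_velocity else 1.0
    if subset_control_vals is None:
        # True unless the layer is velocity, acceleration, or curvature related
        subset_control_vals = not is_dynamics
    return exp_frac_thresh, log2_fc_thresh, subset_control_vals


print(resolve_defaults("velocity_S"))
print(resolve_defaults(None))
```

Values passed explicitly are returned unchanged; only arguments left as None are replaced.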

Returns

A pandas DataFrame of the differential expression analysis results between the two groups.

Return type

pandas DataFrame