dynamo.tl.glm_degs

dynamo.tl.glm_degs(adata, X_data=None, genes=None, layer=None, fullModelFormulaStr='~cr(integral_time, df=3)', reducedModelFormulaStr='~1', family='NB2')

Differential gene expression tests using generalized linear regressions.

Tests each gene for differential expression as a function of integral time (the time estimated via the reconstructed vector field function) or pseudotime, using generalized additive models with a natural spline basis. This function can also use other covariates, as specified in the full (e.g. ~clusters) and reduced model formulas, to identify differentially expressed genes across different categories, groups, etc.

glm_degs relies on the statsmodels package and is adapted from the differentialGeneTest function in Monocle. Note that glm_degs supports DEG analysis on any layer or normalized data in your adata object. That is, you can use the total, new, unspliced, velocity or other layers for the differential expression analysis.
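The comparison between the full and reduced model formulas is a likelihood ratio test. The following is a minimal sketch of that test in plain Python, not the code used inside glm_degs. The helper names chi2_sf and likelihood_ratio_test, the log-likelihood values and the parameter counts are all hypothetical and chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q = math.exp(-half)             # closed form for df = 2
        a = 1.0
    else:
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
        a = 0.5
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(llf_full, llf_reduced, n_params_full, n_params_reduced):
    """Return the test statistic and p-value for two nested models."""
    stat = 2.0 * (llf_full - llf_reduced)
    df = n_params_full - n_params_reduced
    return stat, chi2_sf(max(stat, 0.0), df)


# Illustrative numbers only: a full model with 4 parameters against an
# intercept-only reduced model with 1 parameter.
stat, p_value = likelihood_ratio_test(-120.0, -125.0, 4, 1)
print(stat, p_value)
```

A small p-value indicates that the extra terms in the full model (for example the spline over integral time) explain the expression of that gene better than the reduced model does.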

Parameters
  • adata (AnnData) – An AnnData object.

  • X_data (np.ndarray (default: None)) – User-supplied data that will be used directly for differential expression analysis.

  • genes (list or None (default: None)) – The list of genes that will be used to subset the data for differential expression analysis. If None, all genes will be used.

  • layer (str or None (default: None)) – The layer that will be used to retrieve data for differential expression analysis. If None, .X is used.

  • fullModelFormulaStr (str (default: ~cr(integral_time, df=3))) – A formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.

  • reducedModelFormulaStr (str (default: ~1)) – A formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.

  • family (str (default: NB2)) – The distribution family used for the expression responses in statsmodels. This argument is currently ignored and NB2 is always used. The NB model requires a parameter $\alpha$, which expresses the variance in terms of the mean as follows: $\text{variance} = \text{mean} + \alpha \cdot \text{mean}^p$. When $p = 2$, this corresponds to the NB2 model. To obtain a suitable $\alpha$ (in sm.genmod.families.family.NegativeBinomial(link=None, alpha=1.0) it defaults to 1), we use the auxiliary OLS regression without a constant from Cameron and Trivedi. More details can be found here: https://towardsdatascience.com/negative-binomial-regression-f99031bb25b4.
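The auxiliary regression mentioned for the family parameter can be sketched as follows. Under the NB2 variance, $E[(y - \mu)^2] - \mu = \alpha \mu^2$, so regressing $((y - \mu)^2 - \mu) / \mu$ on $\mu$ without a constant gives an estimate of $\alpha$. This is one common form of the regression and may differ in detail from what glm_degs does internally (the test is also often written with $y$ in place of the subtracted mean, which has the same expectation). The function name estimate_nb2_alpha and the toy counts are hypothetical.

```python
def estimate_nb2_alpha(counts, fitted_means):
    """Estimate the NB2 dispersion alpha by OLS through the origin.

    counts:       observed counts y_i for one gene
    fitted_means: fitted means mu_i, e.g. from a Poisson regression
    """
    numerator = 0.0
    denominator = 0.0
    for y, mu in zip(counts, fitted_means):
        # Auxiliary response: excess of the squared residual over the mean,
        # scaled by the mean, so that its expectation is alpha * mu.
        aux = ((y - mu) ** 2 - mu) / mu
        # Closed-form OLS slope without intercept: sum(x * y) / sum(x * x)
        numerator += mu * aux
        denominator += mu * mu
    return numerator / denominator


# Toy data (assumption): for an intercept-only Poisson model the fitted
# mean of every cell is simply the sample mean of the counts.
counts = [0, 8]
mean = sum(counts) / len(counts)
alpha = estimate_nb2_alpha(counts, [mean] * len(counts))
print(alpha)
```

The estimate can come out negative for under-dispersed data, in which case a negative binomial model is not an improvement over Poisson for that gene.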

Returns

  • Returns an updated anndata.AnnData with a new key glm_degs in the .uns attribute, storing the differential expression test results after the GLM test.
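A minimal usage sketch based on the signature and return description above. The wrapper run_glm_degs is a hypothetical helper, not part of dynamo. It assumes that adata.obs contains an integral_time column (as the default full model formula suggests) and it handles both possibilities of the function updating adata in place or returning the updated object.

```python
def run_glm_degs(adata, genes=None, layer=None):
    """Run dynamo.tl.glm_degs and return the stored test results."""
    # Imported lazily so that this helper can be defined without dynamo installed.
    import dynamo as dyn

    result = dyn.tl.glm_degs(
        adata,
        genes=genes,
        layer=layer,
        fullModelFormulaStr="~cr(integral_time, df=3)",
        reducedModelFormulaStr="~1",
    )
    # Assumption: the results live under .uns["glm_degs"] of either the
    # returned object or, if nothing is returned, the input adata.
    updated = adata if result is None else result
    return updated.uns["glm_degs"]
```

Passing a short list of genes first is a cheap way to check that the formulas and the chosen layer work before testing every gene.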