dynamo.tl.glm_degs

dynamo.tl.glm_degs(adata, X_data=None, genes=None, layer=None, fullModelFormulaStr='~cr(integral_time, df=3)', reducedModelFormulaStr='~1', family='NB2')

Differential gene expression tests using generalized linear regressions.

The results are stored in adata.uns["glm_degs"], and the update is made in place.

Tests each gene for differential expression as a function of integral time (the time estimated via the reconstructed vector field function) or pseudotime, using generalized additive models with a natural spline basis. The function can also use other covariates, specified in the full (e.g. ~clusters) and reduced model formulas, to identify differentially expressed genes across categories, groups, etc.

glm_degs relies on the statsmodels package and is adapted from the differentialGeneTest function in Monocle. Note that glm_degs supports differential expression (DEG) analysis on any layer or normalized data in your adata object. That is, you can use the total, new, unspliced, or velocity layer, etc., for the differential expression analysis.

Parameters:
  • adata (AnnData) – An AnnData object.

  • X_data (Optional[ndarray]) – The user supplied data that will be used for differential expression analysis directly. Defaults to None.

  • genes (Optional[List[str]]) – The genes to test for differential expression. When X_data is provided, genes must correspond to its columns. Defaults to None.

  • layer (Optional[str]) – The layer from which expression data is retrieved for the differential expression analysis. If None, .X is used. Defaults to None.

  • fullModelFormulaStr (str) – A formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature. Defaults to “~cr(integral_time, df=3)”.

  • reducedModelFormulaStr (str) – A formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature. Defaults to “~1”.

  • family (Literal['NB2']) – The distribution family used for the expression responses in statsmodels. Currently NB2 is always used and this argument is ignored. The NB model requires a parameter alpha, which expresses the variance in terms of the mean as follows: variance = mean + alpha * mean^p. When p=2, this corresponds to the NB2 model. To obtain a suitable alpha (the alpha argument of sm.genmod.families.family.NegativeBinomial(link=None, alpha=1.0), which defaults to 1), we use the auxiliary OLS regression without a constant described by Cameron and Trivedi. More details can be found here: https://towardsdatascience.com/negative-binomial-regression-f99031bb25b4. Defaults to "NB2".

Raises:
  • ValueError – X_data is provided but genes does not correspond to its columns.

  • Exception – Factors in the model formula fullModelFormulaStr are invalid.

Return type:

None
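
The auxiliary OLS regression without a constant mentioned under the family parameter can be sketched in plain Python. This is an illustrative sketch, not the code used by glm_degs: the helper name estimate_nb2_alpha and the toy counts are hypothetical, and the dependent variable uses the form ((y - mu)^2 - y) / mu, which is an assumption about the exact variant. The fitted means mu would normally come from a Poisson regression; here an intercept-only Poisson model is assumed, whose fitted mean is simply the sample mean.

```python
from statistics import fmean


def estimate_nb2_alpha(counts, fitted_means):
    """Estimate the NB2 dispersion alpha (hypothetical helper).

    Regresses ((y - mu)^2 - y) / mu on mu by ordinary least squares
    without a constant; the slope is the estimate of alpha.
    """
    aux = [((y - mu) ** 2 - y) / mu for y, mu in zip(counts, fitted_means)]
    # OLS slope through the origin: sum(x * z) / sum(x * x), with x = mu.
    numerator = sum(mu * z for mu, z in zip(fitted_means, aux))
    denominator = sum(mu * mu for mu in fitted_means)
    return numerator / denominator


# Toy overdispersed counts for one gene (assumption, for illustration only).
counts = [0, 1, 0, 3, 12, 0, 7, 1, 25, 2]

# Intercept-only Poisson model: every cell shares the sample mean.
mu_hat = fmean(counts)
alpha_hat = estimate_nb2_alpha(counts, [mu_hat] * len(counts))

print(f"estimated alpha = {alpha_hat:.3f}")
```

An estimate clearly above zero indicates overdispersion relative to a Poisson model. Such a value could then be supplied as the alpha argument of the statsmodels NegativeBinomial family in place of the default of 1.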