dynamo.tl.gene_wise_confidence

dynamo.tl.gene_wise_confidence(adata, group, lineage_dict=None, genes=None, ekey='M_s', vkey='velocity_S', X_data=None, V_data=None, V_threshold=1)

Diagnostic measure to identify genes that contribute to “wrong” directionality of the vector flow.

The results are stored under a new gene_wise_confidence key in .uns, which contains the gene-wise confidence score in each cell state. .var is also updated with the avg_prog_confidence and avg_mature_confidence keys, which correspond to the average gene-wise confidence in the progenitor state and in the mature cell state, respectively.

In some scenarios, you may find unexpected “wrong vector back-flow” in your dynamo analyses. To diagnose those cases, we can identify the genes that show up in the wrong position of the phase portrait, and then remove them to “correct” the velocity vectors. This requires some priors about which cell types are progenitors and which are terminal. The rationale boils down to the following two scenarios:

1). If the progenitors’ expression is low, then starting from time point 0, cells should begin to increase expression, so there must be progenitors above the steady-state line. However, if most of the progenitors lie below the line, we will have negative velocity, which leads to reversed vector flow.

2). If the progenitors start from high expression, then starting from time point 0, cells should begin to decrease expression, so there must be progenitors below the steady-state line. However, if most of the progenitors lie above the steady-state line, we will have positive velocity, which leads to reversed vector flow.

The same rationale can be applied to the mature cell states.

Thus, we design an algorithm to assess the confidence that each gene obeys the above two constraints. We first check whether a gene should be in the induction or repression phase from each progenitor to each terminal cell state (based on the shift of the median gene expression between these two states). If it is in the induction phase, cells should show mostly >= small negative velocities; otherwise <= small negative velocities. One minus the ratio of cells whose velocities pass those thresholds (defined by V_threshold) in each state is then defined as the velocity confidence measure.
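
The idea in the paragraph above can be illustrated with a small, pure-Python sketch for a single gene in a single cell state. This is a simplified illustration, not dynamo's implementation: the function name toy_gene_confidence and the fixed tolerance tol are hypothetical, and the way dynamo derives its actual cutoff from V_threshold is not reproduced here.

```python
from statistics import median


def toy_gene_confidence(expr_prog, expr_mature, vel_state, tol=0.1):
    """Toy confidence score for one gene in one cell state.

    expr_prog / expr_mature: expression of the gene in progenitor / mature cells.
    vel_state: velocities of the gene in the state being scored.
    tol: hypothetical tolerance for "small" velocities of the wrong sign.
    """
    # Induction if the median expression rises from progenitor to mature state.
    induction = median(expr_mature) > median(expr_prog)
    if induction:
        # Expression should rise, so velocities below -tol contradict the phase.
        violations = sum(v < -tol for v in vel_state)
    else:
        # Expression should fall, so velocities above +tol contradict the phase.
        violations = sum(v > tol for v in vel_state)
    return 1 - violations / len(vel_state)


expr_prog = [0.1, 0.2, 0.3]
expr_mature = [2.0, 2.5, 3.0]
velocities = [0.5, 0.4, -0.05, -0.8]

# Induction case: only the -0.8 velocity contradicts the expected direction.
print(toy_gene_confidence(expr_prog, expr_mature, velocities))  # 0.75
```

A gene with a low score in this toy setting is one whose velocities mostly point against the direction implied by the progenitor-to-terminal expression shift, which is the kind of gene the diagnostic is meant to flag.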

Note that this heuristic method requires you to provide meaningful progenitors_groups and mature_cells_groups (the keys and values of lineage_dict, respectively). In particular, the progenitor groups should in principle have cells going out (transcriptomically), while mature groups should end up in a different expression state, with intermediate cells going to the dead-end cells in each terminal group (or most terminal groups).

Parameters:
  • adata (AnnData) – An AnnData object.

  • group (str) – The column key/name that identifies the cell state grouping information of cells. This will be used for calculating gene-wise confidence score in each cell state.

  • lineage_dict (Optional[Dict[str, str]]) – A dictionary that describes lineage priors. Each key is a group name from group that corresponds to the state of one progenitor type, while each value gives the group name(s) from group that correspond to one or multiple terminal cell states. As a best practice, choose fully functional cells as terminal cell states rather than intermediate cell states. Note that in Python a dictionary key cannot be a list, so if two progenitor types converge on one terminal cell state, you need to create two records, each with the same terminal cell state as the value but a different progenitor as the key. A value can be either a string for one cell group or a list of strings for multiple cell groups. Defaults to None.

  • genes (Optional[List[str]]) – The list of genes that will be used for the gene-wise confidence score calculation. If None, all genes that go through velocity estimation will be used. Defaults to None.

  • ekey (str) – The layer that will be used to retrieve data for identifying whether the gene is in the induction or repression phase at each cell state. If None, .X is used. Defaults to “M_s”.

  • vkey (str) – The layer that will be used to retrieve velocity data for calculating gene-wise confidence. If None, velocity_S is used. Defaults to “velocity_S”.

  • X_data (Optional[ndarray]) – User-supplied data that will be used directly for identifying whether the gene is in the induction or repression phase at each cell state. Defaults to None.

  • V_data (Optional[ndarray]) – User-supplied data that will be used directly for calculating gene-wise confidence. Defaults to None.

  • V_threshold (float) – The velocity threshold used to calculate the gene-wise confidence. Defaults to 1.
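
As a rough illustration of the lineage_dict format described above, the sketch below builds a dictionary in which two progenitor groups converge on the same terminal state, and wraps the call in a small function. The group labels ("HSC", "MPP", "Erythrocyte", "Neutrophil"), the cell_type column name, and the helpers normalize_lineage_dict and run_gene_wise_confidence are all assumptions made for illustration; only the dyn.tl.gene_wise_confidence call and the .var keys come from this page.

```python
# Hypothetical group labels; replace them with values found in adata.obs[group].
lineage_dict = {
    "HSC": ["Erythrocyte", "Neutrophil"],  # one progenitor, several terminal states
    "MPP": "Neutrophil",  # a second progenitor converging on the same terminal state
}


def normalize_lineage_dict(lineage_dict):
    """Return a copy in which every value is a list of terminal group names."""
    return {
        prog: [terms] if isinstance(terms, str) else list(terms)
        for prog, terms in lineage_dict.items()
    }


def run_gene_wise_confidence(adata, group="cell_type"):
    """Run the diagnostic with the priors above; `cell_type` is an assumed column."""
    import dynamo as dyn  # imported lazily so the snippet loads without dynamo

    dyn.tl.gene_wise_confidence(adata, group=group, lineage_dict=lineage_dict)
    # Per the description on this page, average scores are written to adata.var.
    return adata.var[["avg_prog_confidence", "avg_mature_confidence"]]


# Inspect the priors with every value expanded to a list.
print(normalize_lineage_dict(lineage_dict))
```

The normalization helper is only there to make the priors easy to inspect; the function itself is documented as accepting either a string or a list of strings as a value.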

Raises:
  • ValueError – X_data is provided but genes does not correspond to its columns.

  • Exception – The progenitor cell extracted from lineage_dict is not in adata.obs[group].

  • Exception – The terminal cell extracted from lineage_dict is not in adata.obs[group].
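
Because the last two exceptions are raised when a name in lineage_dict is absent from adata.obs[group], it can help to check the priors before calling the function. The following is a minimal sketch of such a check; missing_groups is a hypothetical helper, not part of dynamo, and the labels list stands in for adata.obs[group].

```python
def missing_groups(lineage_dict, labels):
    """Return group names used in lineage_dict that never occur in labels."""
    known = set(labels)
    wanted = []
    for prog, terms in lineage_dict.items():
        wanted.append(prog)
        # A value may be a single group name or a list of group names.
        wanted.extend([terms] if isinstance(terms, str) else terms)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [g for g in dict.fromkeys(wanted) if g not in known]


labels = ["HSC", "HSC", "Neutrophil", "Monocyte"]  # stand-in for adata.obs[group]
print(missing_groups({"HSC": ["Neutrophil", "Erythrocyte"]}, labels))  # ['Erythrocyte']
```

An empty list means every progenitor and terminal name in the priors matches at least one cell label.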

Return type:

None