dynamo.tl.gene_wise_confidence

dynamo.tl.gene_wise_confidence(adata, group, lineage_dict=None, genes=None, ekey='M_s', vkey='velocity_S', X_data=None, V_data=None, V_threshold=1)

Diagnostic measure to identify genes that contribute to “wrong” directionality of the vector flow.

In some scenarios, you may find unexpected “wrong vector backflow” in your dynamo analysis. To diagnose those cases, we can identify the genes that show up in the wrong phase portrait position. We may then remove those genes to “correct” the velocity vectors. This requires us to give some priors about which cell types are progenitors and which are terminal. The rationale boils down to the following two scenarios:

1). If the progenitors' expression is low, then starting from time point 0, cells should begin to increase expression, so there must be progenitors that are above the steady-state line. However, if most of the progenitors lie below the line (indicated by the red cells), we will have negative velocity, and this will lead to reversed vector flow.

2). If the progenitors start from high expression, then starting from time point 0, cells should begin to decrease expression, so there must be progenitors that are below the steady-state line. However, if most of the progenitors lie above the steady-state line, we will have positive velocity, and this will lead to reversed vector flow.

The same rationale can be applied to the mature cell states.

Thus, we design an algorithm to assess the confidence that each gene obeys the above two constraints. We first check whether a gene should be in the induction or repression phase from each progenitor to each terminal cell state (based on the shift of the median gene expression between these two states). If it is in the induction phase, most cells should show velocity >= a small negative velocity; otherwise, most cells should show velocity <= a small negative velocity. The velocity confidence measure in each state is then defined as 1 minus the ratio of cells whose velocity passes that threshold (defined by V_threshold).
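The heuristic described above can be sketched for a single gene and a single progenitor/terminal pair. This is a minimal illustration and not dynamo's implementation: the function name `sketch_gene_confidence`, the toy values, and the group labels are hypothetical, and the sketch assumes that the "small negative velocity" is `-V_threshold` and that a cell counts against the gene when its velocity falls on the wrong side of that value.

```python
from statistics import median


def sketch_gene_confidence(expr, vel, labels, progenitor, terminal, v_threshold=1.0):
    """Illustrative only: confidence of one gene for one progenitor -> terminal pair.

    expr, vel and labels are equal-length sequences (one entry per cell).
    """
    prog_idx = [i for i, g in enumerate(labels) if g == progenitor]
    term_idx = [i for i, g in enumerate(labels) if g == terminal]

    # Induction if the median expression rises from progenitor to terminal state,
    # repression otherwise.
    induction = median(expr[i] for i in term_idx) > median(expr[i] for i in prog_idx)

    def confidence(idx):
        if induction:
            # Assumed reading: cells are expected at velocity >= -v_threshold.
            violating = sum(vel[i] < -v_threshold for i in idx)
        else:
            # Assumed reading: cells are expected at velocity <= -v_threshold.
            violating = sum(vel[i] > -v_threshold for i in idx)
        return 1 - violating / len(idx)

    return induction, confidence(prog_idx), confidence(term_idx)


# Toy data: three hypothetical progenitor cells ("HSC") and three terminal cells ("Ery").
expr = [0.1, 0.2, 0.3, 2.0, 2.5, 3.0]
vel = [0.5, -2.0, 1.0, 0.2, 0.1, -0.3]
labels = ["HSC", "HSC", "HSC", "Ery", "Ery", "Ery"]

induction, prog_conf, term_conf = sketch_gene_confidence(expr, vel, labels, "HSC", "Ery")
print(induction, round(prog_conf, 3), term_conf)
```

In this toy case the gene is in the induction phase, and one of the three progenitor cells has a strongly negative velocity, which lowers the progenitor confidence while the terminal confidence stays at 1.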

Note that this heuristic method requires you to provide meaningful progenitors_groups and mature_cells_groups. In particular, the progenitor groups should in principle have cells going out (transcriptomically), while mature groups should end up in a different expression state, with intermediate cells going to the dead-end cells in each terminal group (or most terminal groups).

Parameters
  • adata (AnnData) – An AnnData object.

  • group (str) – The column key/name that identifies the cell state grouping information of cells. This will be used for calculating gene-wise confidence score in each cell state.

  • lineage_dict (dict) – A dictionary that describes lineage priors. Each key is a group name from group that corresponds to the state of one progenitor type, while each value gives the group name(s) from group that correspond to one or multiple terminal cell states. As a best practice, choose fully functional cells rather than intermediate cell states as terminal cell states. Note that in Python a dictionary key cannot be a list, so if two progenitor types converge into one terminal cell state, you need to create two records, each with the same terminal cell state as the value but a different progenitor as the key. A value can be either a string for one cell group or a list of strings for multiple cell groups.

  • genes (list or None (default: None)) – The list of genes that will be used for the gene-wise confidence score calculation. If None, all genes that go through velocity estimation will be used.

  • ekey (str or None (default: M_s)) – The layer that will be used to retrieve data for identifying whether the gene is in the induction or repression phase at each cell state. If None, .X is used.

  • vkey (str or None (default: velocity_S)) – The layer that will be used to retrieve velocity data for calculating gene-wise confidence. If None, velocity_S is used.

  • X_data (np.ndarray (default: None)) – The user-supplied data that will be used directly for identifying whether the gene is in the induction or repression phase at each cell state.

  • V_data (np.ndarray (default: None)) – The user-supplied data that will be used directly for calculating gene-wise confidence.

  • V_threshold (float (default: 1)) – The velocity threshold used to calculate the gene-wise confidence.
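As an illustration of the lineage_dict format described above, the following sketch builds a dictionary in which two progenitor types share a terminal state, and checks it against the available group names. The group labels (`HSC`, `MEP`, `Ery`, `Meg`), the column name `cell_type`, and the helper `check_lineage_dict` are all hypothetical and are not part of dynamo.

```python
# Two progenitor types converging on one terminal state need two records,
# because a list cannot be used as a dictionary key.
lineage_dict = {
    "HSC": ["Ery", "Meg"],  # one progenitor, several terminal states
    "MEP": "Ery",           # a single terminal state may be a plain string
}


def check_lineage_dict(lineage_dict, available_groups):
    """Hypothetical helper: turn every value into a list and verify group names."""
    normalized = {}
    for progenitor, terminals in lineage_dict.items():
        terminals = [terminals] if isinstance(terminals, str) else list(terminals)
        missing = [g for g in [progenitor, *terminals] if g not in available_groups]
        if missing:
            raise ValueError(f"groups not found in the grouping column: {missing}")
        normalized[progenitor] = terminals
    return normalized


# In practice the available groups would come from the column named by `group`,
# e.g. set(adata.obs["cell_type"]) for an assumed column "cell_type".
normalized = check_lineage_dict(lineage_dict, {"HSC", "MEP", "Ery", "Meg"})
print(normalized)

# With a processed AnnData object (assumed to exist), the call would look like:
# import dynamo as dyn
# dyn.tl.gene_wise_confidence(adata, group="cell_type", lineage_dict=lineage_dict)
```

Checking the group names before the call is only a convenience; it catches misspelled labels early rather than partway through the calculation.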

Returns

  • An updated adata object with a new gene_wise_confidence key in .uns, which contains the gene-wise confidence score in each cell state. .var will also be updated with the avg_prog_confidence and avg_mature_confidence keys, which correspond to the average gene-wise confidence in the progenitor state or the mature cell state.
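One way the returned averages might be used is to keep only genes whose confidence is high in both progenitor and mature states. The sketch below works on plain dictionaries standing in for the two .var columns. The helper `select_confident_genes`, the gene names, the scores, and the 0.8 cutoff are hypothetical, and the sketch assumes that higher scores indicate more trustworthy genes.

```python
def select_confident_genes(avg_prog, avg_mature, min_confidence=0.8):
    """Hypothetical helper: keep genes whose two averages both reach the cutoff."""
    return [
        gene
        for gene in avg_prog
        if avg_prog[gene] >= min_confidence
        and avg_mature.get(gene, 0.0) >= min_confidence
    ]


# Toy per-gene scores standing in for avg_prog_confidence and
# avg_mature_confidence (made-up values).
avg_prog = {"GeneA": 0.95, "GeneB": 0.40, "GeneC": 0.90}
avg_mature = {"GeneA": 0.85, "GeneB": 0.90, "GeneC": 0.55}

confident = select_confident_genes(avg_prog, avg_mature)
print(confident)  # ['GeneA']
```

With a real AnnData object, the two dictionaries could be obtained with `adata.var["avg_prog_confidence"].to_dict()` and `adata.var["avg_mature_confidence"].to_dict()`, since .var is a pandas DataFrame.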