dynamo.pd.fate_bias

dynamo.pd.fate_bias(adata, group, basis='umap', inds=None, use_sink_percentage=True, step_used_percentage=None, speed_percentile=5, dist_threshold=None, source_groups=None, metric='euclidean', metric_kwds=None, cores=1, seed=19491001, **kwargs)

Calculate the lineage (fate) bias of states whose trajectories are predicted.

Fate bias is currently calculated as the percentage of points along the predicted cell fate trajectory whose 0-th nearest neighbors on the data are close enough to any cell from each group specified by the group key. Closeness is determined by dist_threshold and the median distance of all observed cells to their 1-st nearest neighbors. The details are as follows:

Cell fates predicted by our vector field method sometimes end up in regions that are not sampled with cells. We thus developed a heuristic method that iteratively walks backward along the integration path to assign cell fate. We first identify the regions with small velocity in the tail of the integration path (determined by speed_percentile), then check whether the 0-th nearest observed points to those path points are far from the observed data (determined by dist_threshold). If they are not all close to the data, we walk backward along the trajectory one time step at a time until the distances from the currently visited integration path points to their 0-th nearest observed cells are small enough. To calculate the cell fate probability, we diffuse one step further from the identified nearest neighbors to find more nearby observed cells, especially those from terminal cell types, in case the nearby cells first identified are all close to some random progenitor cells. We then use the group information of those observed cells to define the fate probability.
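The walk-back heuristic described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not dynamo's implementation: the helper names `nearest_cell_distance` and `walk_back` are hypothetical, a brute-force distance search stands in for the nearest-neighbor index, and the loop stops once all visited points are within `dist_threshold * median_dist` of an observed cell.

```python
import numpy as np


def nearest_cell_distance(points, cells):
    """Distance from each point to its closest observed cell (brute force)."""
    diffs = points[:, None, :] - cells[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def walk_back(path, speeds, cells, median_dist, speed_percentile=5, dist_threshold=1.0):
    """Return (indices, walk_back_steps) for one integration path (hypothetical helper).

    path: (n_steps, n_dims) predicted states; speeds: (n_steps,) speed at each step;
    cells: (n_cells, n_dims) observed cells in the same embedding.
    """
    # Low-speed steps are the candidate sink region.
    indices = np.flatnonzero(speeds <= np.percentile(speeds, speed_percentile))
    limit = dist_threshold * median_dist
    steps = 0
    # Shift every index one step toward the start until all points are near data.
    while indices.min() > 0 and np.any(nearest_cell_distance(path[indices], cells) > limit):
        indices = indices - 1
        steps += 1
    return indices, steps


# Toy data: the path runs along the x-axis past the last observed cell at x = 6.
path = np.array([[float(i), 0.0] for i in range(11)])
cells = np.array([[float(i), 0.0] for i in range(7)])
speeds = np.array([5, 5, 5, 5, 5, 4, 3, 2, 1, 0.5, 0.1], dtype=float)

indices, steps = walk_back(path, speeds, cells, median_dist=1.0, speed_percentile=25)
print(indices, steps)  # [5 6 7] 3
```

In this toy case the three slowest steps (8, 9, 10) lie beyond the sampled cells, so the indices are shifted back three times before every visited point is within one median distance of the data.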

fate_bias also calculates a confidence score for the computed fate probability with a simple metric, defined as

\(1 - \frac{\mathrm{sum}(\mathrm{distances} > \mathrm{dist\_threshold} \times \mathrm{median\_dist}) + \mathrm{walk\_back\_steps}}{\mathrm{len}(\mathrm{indices}) + \mathrm{walk\_back\_steps}}\)

Here, distances are the distances from the currently visited integration path points to their 0-th nearest observed cells. median_dist is the median distance of all observed cells to their 1-st nearest cells. walk_back_steps is the number of steps walked backward along the integration path until the 0-th nearest observed cells of all currently visited integration points satisfy the distance threshold. indices are the time indices of the integration points regarded as the regions with small velocity (note that after walking backward, the corresponding points do not necessarily have small velocity anymore).
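As a worked instance of the confidence formula, the sketch below evaluates it in plain Python. The function name `fate_confidence` is hypothetical, and it assumes one distance per entry of indices, so that `len(distances)` equals `len(indices)`.

```python
def fate_confidence(distances, median_dist, dist_threshold, walk_back_steps):
    """1 - (n_far + walk_back_steps) / (n_indices + walk_back_steps)."""
    limit = dist_threshold * median_dist
    # Count visited points that are still farther from the data than allowed.
    n_far = sum(d > limit for d in distances)
    return 1.0 - (n_far + walk_back_steps) / (len(distances) + walk_back_steps)


# One of four points is too far and one backward step was taken:
# 1 - (1 + 1) / (4 + 1) = 0.6
score = fate_confidence([0.2, 0.5, 1.5, 0.8], median_dist=1.0,
                        dist_threshold=1.0, walk_back_steps=1)
```

Both far-away points and backward steps lower the score, so a value of 1 means every selected point was already close to the data without any walking back.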

Parameters
  • adata (AnnData) – AnnData object that contains the predicted fate trajectories in the uns attribute.

  • group (str) – The column key that corresponds to the cell type or other group information for quantifying the bias of cell state.

  • basis (str or None (default: 'umap')) – The embedding data space where cell fates were predicted and where the cell fate bias will be quantified.

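  • inds (Optional[list]) – The indices of the time steps that will be used for calculating fate bias. If inds is None, the steps are chosen according to use_sink_percentage or step_used_percentage. Otherwise, inds needs to be a list of integers giving the time steps.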

  • use_sink_percentage (bool) – If inds is None and use_sink_percentage is True, the sink calculation will be applied to determine the indices used for the fate bias calculation.

  • step_used_percentage (Optional[float]) – If inds is None and step_used_percentage is not None, step_used_percentage will be regarded as a percentage, and the last step_used_percentage of steps will be used for the fate bias calculation.

  • speed_percentile (float (default: 5)) – The percentile of speed that will be used to determine the terminal cells (or sink region on the prediction path where speed is smaller than this speed percentile).

  • dist_threshold (float or None (default: None)) – A multiplier of the median nearest cell distance on the embedding, used to determine cells that are outside the sampled domain of cells. If the mean distance of the identified “terminal cells” is above this number, we look backward along the trajectory (by decreasing all indices by 1) until we find cells that satisfy this threshold. By default it is set to 1 to ensure that only points very close to observed data points are considered.

  • source_groups (list or None (default: None)) – The groups that correspond to progenitor groups. They must have at least one intersection with the groups from the group column. If source_groups is not None, any identified cells that happen to be in those groups will be ignored, and the fate probability of those cells will be reassigned to the group that has the highest fate probability among the other, non-source_groups cells.

  • metric (str or callable, default='euclidean') – The distance metric to use for the tree. The default metric is 'euclidean'. See the documentation of DistanceMetric for a list of available metrics. If metric is 'precomputed', X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

  • metric_kwds (dict, default=None) – Additional keyword arguments for the metric function.

  • cores (int (default: 1)) – The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • seed (int (default: 19491001)) – Random seed to ensure the reproducibility of each run.

  • kwargs – Additional arguments that will be passed to each nearest neighbor search algorithm.
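The way inds, step_used_percentage, and speed_percentile interact can be illustrated with a small sketch. It is an assumption-laden reading of the parameter descriptions above, not dynamo's code: the function name `select_step_indices` is hypothetical, step_used_percentage is assumed to be a fraction between 0 and 1, the precedence among the options is assumed, and use_sink_percentage is left out because its exact behavior is not specified here.

```python
import numpy as np


def select_step_indices(speeds, inds=None, step_used_percentage=None, speed_percentile=5):
    """Pick the time-step indices of one trajectory to use (hypothetical helper)."""
    speeds = np.asarray(speeds, dtype=float)
    n_steps = len(speeds)
    if inds is not None:
        # An explicit list of integer time steps is used as given.
        return np.asarray(inds, dtype=int)
    if step_used_percentage is not None:
        # Assumed to be a fraction in [0, 1]: keep the last part of the path.
        n_last = max(1, int(round(n_steps * step_used_percentage)))
        return np.arange(n_steps - n_last, n_steps)
    # Otherwise keep the steps whose speed falls below the given percentile.
    return np.flatnonzero(speeds <= np.percentile(speeds, speed_percentile))


speeds = [5, 5, 5, 5, 5, 4, 3, 2, 1, 0.1]
explicit = select_step_indices(speeds, inds=[1, 2])
last_part = select_step_indices(speeds, step_used_percentage=0.3)
slow_part = select_step_indices(speeds)
```

With these toy speeds, `last_part` covers the final three steps, while `slow_part` keeps only the single slowest step at the end of the path.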

Returns

fate_bias – A DataFrame that stores the fate bias for each cell state (row) to each cell group (column).

Return type

pandas.DataFrame
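
To make the shape of the returned table concrete, the sketch below builds a DataFrame with one row per cell state and one column per group from toy neighbor labels. The cell names, group names, and the assumption that each row holds the fraction of nearby observed cells per group are illustrative only; they are not output from dynamo.

```python
import pandas as pd

# Hypothetical group labels of the observed cells found near each predicted path.
neighbor_groups = {
    "cell_0": ["Ery", "Ery", "Meg", "Ery"],
    "cell_1": ["Meg", "Meg", "Meg", "Ery"],
}

# One row per cell state, one column per group; each row holds group fractions.
fate_bias = (
    pd.DataFrame({cell: pd.Series(labels).value_counts(normalize=True)
                  for cell, labels in neighbor_groups.items()})
    .T.fillna(0.0)
)

# The group with the largest bias in each row is the most likely fate.
most_likely = fate_bias.idxmax(axis=1)
```

Reading the table row by row in this way gives, for each cell state, the group it is most biased toward.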