dynamo.vf.cluster_field

dynamo.vf.cluster_field(adata, basis='pca', features=['speed', 'potential', 'divergence', 'acceleration', 'curvature', 'curl'], add_embedding_basis=True, embedding_basis=None, normalize=False, method='leiden', cores=1, copy=False, resolution=1.0, **kwargs)

Cluster cells based on vector field features.

We would like to see whether the vector field can be used to better define cell states/types. One way to approach this is to characterize critical points (attractors, saddles, repellers, etc.) and characteristic curves (nullclines, separatrices). However, computing these is not easy. For example, under a strict definition an attractor is a state where the velocity is 0 and all eigenvalues of the Jacobian matrix at that point have negative real parts. Under this strict definition, the attractors we find are sometimes very far from the sampled cell states, which makes them less meaningful, although this can be largely avoided by removing the density correction during the velocity projection. This is not unexpected: the learned vector field is defined by a set of basis functions built on Gaussian kernels, so it is hard to satisfy that strict definition.
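The strict attractor test described above can be illustrated on a toy system. The following is a minimal sketch, not dynamo's implementation: the field `f(x, y) = (x - x**3, -y)`, its hand-written Jacobian, and the helper `classify_fixed_point` are all hypothetical names chosen for illustration.

```python
import numpy as np


def velocity(x):
    """Toy 2D vector field f(x, y) = (x - x**3, -y) (illustrative only)."""
    return np.array([x[0] - x[0] ** 3, -x[1]])


def jacobian(x):
    """Analytical Jacobian of the toy field above."""
    return np.array([[1.0 - 3.0 * x[0] ** 2, 0.0],
                     [0.0, -1.0]])


def classify_fixed_point(x, tol=1e-8):
    """Classify a state by velocity and the real parts of the Jacobian eigenvalues."""
    if np.linalg.norm(velocity(x)) > tol:
        return "not a fixed point"
    real_parts = np.linalg.eigvals(jacobian(x)).real
    if np.any(np.abs(real_parts) < tol):
        return "non-hyperbolic"  # linearization alone cannot decide stability
    if np.all(real_parts < 0):
        return "attractor"
    if np.all(real_parts > 0):
        return "repeller"
    return "saddle"


for point in ([1.0, 0.0], [0.0, 0.0], [0.5, 0.0]):
    print(point, classify_fixed_point(np.array(point)))
```

In this toy field the states (1, 0) and (-1, 0) pass the strict test, while (0, 0) is a saddle. For a kernel-based field learned from data, the difficulty is locating states where the velocity vanishes in the first place.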

Fortunately, a different set of ideas handles this better. Instead of locating critical points with classical dynamical systems methods, we can use machine learning approaches that extract geometric features of streamlines to “cluster vector field space” and thereby define cell states/types. This requires calculating potential (ordered pseudotime), speed, curl, divergence, acceleration, curvature, etc. Because the Jacobian matrix can be calculated analytically, these quantities of the vector field function can be computed conveniently and efficiently.
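To make the role of the Jacobian concrete, here is a small sketch that derives several of these quantities for a toy 2D linear field. It uses standard textbook definitions (divergence as the trace of the Jacobian, 2D curl as the difference of its off-diagonal terms, acceleration as the Jacobian applied to the velocity), which may differ in detail from dynamo's own formulas. The matrix `A` and the helper `field_features` are assumptions made for illustration, and potential is left out because it is not a pointwise function of the Jacobian.

```python
import numpy as np

# Toy linear field f(x) = A @ x: a slowly contracting rotation (illustrative only).
A = np.array([[-0.1, -1.0],
              [1.0, -0.1]])

FEATURES = ["speed", "divergence", "curl", "acceleration", "curvature"]


def vector_field(X):
    """Velocity at each row of X (cells x 2)."""
    return X @ A.T


def jacobian(x):
    """Analytical Jacobian; constant for a linear field."""
    return A


def field_features(X):
    """Per-cell features from velocity and Jacobian. Cells must have nonzero speed."""
    rows = []
    for x, v in zip(X, vector_field(X)):
        J = jacobian(x)
        a = J @ v                              # acceleration along the streamline
        speed = np.linalg.norm(v)
        divergence = np.trace(J)
        curl = J[1, 0] - J[0, 1]               # scalar curl in 2D
        acceleration = np.linalg.norm(a)
        # curvature = sqrt(|v|^2 |a|^2 - (v.a)^2) / |v|^3
        cross_sq = speed ** 2 * acceleration ** 2 - np.dot(v, a) ** 2
        curvature = np.sqrt(max(cross_sq, 0.0)) / speed ** 3
        rows.append([speed, divergence, curl, acceleration, curvature])
    return np.array(rows)


cells = np.array([[1.0, 0.0], [0.0, 2.0], [-1.5, 0.5]])
print(dict(zip(FEATURES, field_features(cells).T)))
```

Each row of the resulting matrix describes one cell, so the matrix can be passed to any standard clustering algorithm.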

Parameters:
  • adata (AnnData) – adata object that includes both newly synthesized and total gene expression of cells. Alternatively, the object should include both unspliced and spliced gene expression of cells.

  • basis (str) – The space that will be used for calculating vector field features. Valid names include, for example, pca, umap, etc.

  • features (List) – The vector field features to cluster on. Each has to be selected from [‘speed’, ‘potential’, ‘divergence’, ‘acceleration’, ‘curvature’, ‘curl’].

  • add_embedding_basis (bool) – Whether to add the embedding basis to the feature space for clustering.

  • embedding_basis (Optional[str]) – The embedding basis that will be combined with the vector field feature space for clustering.

  • normalize (bool) – Whether to mean center and scale the feature across all cells.

  • method (str) – The method that will be used for clustering, one of {‘kmeans’, ‘hdbscan’, ‘louvain’, ‘leiden’}. If louvain or leiden is used, cdlib needs to be installed.

  • cores (int) – The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • copy (bool) – Whether to return a new deep copy of adata instead of updating adata object passed in arguments.

  • resolution (float) – Clustering resolution; higher values yield more fine-grained clusters.

  • kwargs – Any additional arguments that will be passed to either kmeans, hdbscan, louvain or leiden clustering algorithms.
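The effect of add_embedding_basis and normalize on the clustering input can be pictured with a short sketch. This is an illustration under stated assumptions, not dynamo's code: the helper `build_clustering_matrix` is hypothetical, and standardizing the combined matrix (rather than the vector field features alone) is an assumption, since the order of the two steps is not specified here.

```python
import numpy as np


def build_clustering_matrix(features, embedding=None, normalize=False):
    """Combine per-cell features with optional embedding coordinates.

    features:  (cells x n_features) array of vector field features
    embedding: optional (cells x n_dims) array of embedding coordinates
    normalize: if True, mean center and scale each column across cells
    """
    X = np.asarray(features, dtype=float)
    if embedding is not None:
        # Append embedding coordinates as extra columns.
        X = np.hstack([X, np.asarray(embedding, dtype=float)])
    if normalize:
        std = X.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero after centering
        X = (X - X.mean(axis=0)) / std
    return X


features = [[1.0, 10.0], [3.0, 30.0]]   # e.g. speed and divergence for two cells
embedding = [[0.0, 0.0], [1.0, 1.0]]    # e.g. two embedding coordinates per cell
print(build_clustering_matrix(features, embedding, normalize=True))
```

Scaling matters because the features live on very different numeric ranges. Without it, a distance-based method such as kmeans would be dominated by whichever column has the largest spread.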

Return type:

Optional[AnnData]

Returns:

Either updates adata in place or, if copy is True, returns a new adata object.
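A call might look like the sketch below. The wrapper `cluster_cells_by_field` is a hypothetical helper, and it assumes that `adata` already contains velocities and a learned vector field in the chosen basis. Only parameters documented above are used. `method='kmeans'` is chosen here so that cdlib is not required.

```python
def cluster_cells_by_field(adata, basis="pca"):
    """Hypothetical helper: cluster cells on a subset of vector field features.

    Assumes `adata` already holds velocities and a learned vector field
    for `basis`; preprocessing is outside the scope of this sketch.
    """
    import dynamo as dyn  # imported lazily so the helper can be defined without dynamo

    return dyn.vf.cluster_field(
        adata,
        basis=basis,
        features=["speed", "divergence", "acceleration", "curvature"],
        normalize=True,   # mean center and scale features across cells
        method="kmeans",
        copy=True,        # return a new AnnData instead of updating adata
    )
```

With copy=True the original object is left untouched and the clustered copy is returned. With the default copy=False the same call would update adata instead.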