dynamo.tl.hdbscan

dynamo.tl.hdbscan(adata, X_data=None, genes=None, layer=None, basis='pca', dims=None, n_pca_components=30, n_components=2, result_key=None, copy=False, **hdbscan_kwargs)[source]

Apply hdbscan to cluster cells in the space defined by basis.

HDBSCAN is a clustering algorithm developed by Campello, Moulavi, and Sander (https://doi.org/10.1007/978-3-642-37456-2_14). It extends DBSCAN by converting it into a hierarchical clustering algorithm and then extracting a flat clustering based on the stability of clusters. Here you can use hdbscan to cluster your data in any space specified by basis. The data used to produce this space can be specified by layer, so you can use, for example, either the unspliced or the new RNA data for dimension reduction and clustering. Because HDBSCAN is a density-based method, clustering should be performed in a relatively low-dimensional space (for example, the top 30 PCs or the top 5 UMAP dimensions) with at least several thousand cells. In practice, HDBSCAN assigns the label -1 to cells that have low local density and therefore cannot be confidently assigned to any cluster.
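To make "low local density" concrete, the sketch below computes a k-th-nearest-neighbour distance (often called the core distance) for a handful of toy points. This is only an illustration of the density notion that HDBSCAN builds on, not the HDBSCAN algorithm itself and not what dynamo runs internally; the function name core_distances and the toy coordinates are made up for this example.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Five tightly packed points plus one isolated point (toy data, not real cells).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
core = core_distances(points, k=3)

# The point with the largest core distance sits in the sparsest region.
isolated = max(range(len(points)), key=lambda i: core[i])
print(isolated, core[isolated])
```

A point whose core distance is much larger than that of its peers lies in a sparse region; points of this kind are the ones a density-based method tends to leave unassigned.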

The hdbscan package by Leland McInnes, John Healy, and Steve Astels is used.

Parameters:
  • adata (AnnData) – An AnnData object.

  • X_data (Optional[ndarray]) – The user supplied data that will be used for clustering directly. Defaults to None.

  • genes (Optional[List[str]]) – The list of genes that will be used to subset the data for dimension reduction and clustering. If None, all genes will be used. Defaults to None.

  • layer (Optional[str]) – The layer that will be used to retrieve data for dimension reduction and clustering. If None, .X is used. Defaults to None.

  • basis (str) – The space that will be used for clustering. Valid names include, for example, pca, umap, and velocity_pca (that is, you can use velocity for clustering). Defaults to “pca”.

  • dims (Optional[List[int]]) – The list of dimensions that will be selected for clustering. If None, all dimensions will be used. Defaults to None.

  • n_pca_components (int) – The number of pca components that will be used. Defaults to 30.

  • n_components (int) – The number of dimensions that the non-linear dimension reduction will project to. Defaults to 2.

  • result_key (Optional[str]) – The key for storing clustering results in .obs and .uns. Defaults to None.

  • copy (bool) – Whether to return a new deep copy of adata instead of updating the adata object passed in. Defaults to False.

Raises:

ImportError – Package hdbscan not installed.

Return type:

Optional[AnnData]

Returns:

An updated AnnData object with the clustering results added. hdbscan and hdbscan_prob are two newly added columns in .obs, holding the cluster label of each cell and the probability of the cell belonging to a cluster, respectively. The hdbscan key in .uns holds a dictionary with additional results returned from the hdbscan run. The object is returned only if copy is True.
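A minimal sketch of calling the function and inspecting the result, assuming the default result_key so that the labels are stored in adata.obs["hdbscan"] as described above. The helper names summarize_labels and cluster_and_summarize are hypothetical and not part of dynamo; dynamo is imported lazily so the summary helper can be used on any list of labels.

```python
from collections import Counter


def summarize_labels(labels):
    """Count cells per cluster and report the fraction labelled -1 (unassigned)."""
    labels = [int(label) for label in labels]
    counts = Counter(labels)
    n_noise = counts.pop(-1, 0)  # -1 marks cells not assigned to any cluster
    noise_fraction = n_noise / len(labels) if labels else 0.0
    return dict(counts), noise_fraction


def cluster_and_summarize(adata, basis="pca", **hdbscan_kwargs):
    """Run dynamo.tl.hdbscan in place, then summarize the labels in .obs."""
    import dynamo as dyn  # lazy import: only needed when this function is called

    dyn.tl.hdbscan(adata, basis=basis, **hdbscan_kwargs)
    # Assumption: with result_key=None the labels are in adata.obs["hdbscan"].
    return summarize_labels(adata.obs["hdbscan"])


# Stand-alone check of the summary helper on made-up labels.
counts, noise_fraction = summarize_labels([0, 0, 1, -1, -1, 1, 0, -1])
print(counts, noise_fraction)
```

A high fraction of -1 labels may indicate that the chosen space has too many dimensions or that there are too few cells for a density-based method.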