dynamo.pp.pca

dynamo.pp.pca(adata, X_data=None, n_pca_components=30, pca_key='X_pca', pcs_key='PCs', genes_to_append=None, layer=None, svd_solver='randomized', random_state=0, use_truncated_SVD_threshold=500000, use_incremental_PCA=False, incremental_batch_size=None, return_all=False)

Perform PCA dimension reduction for the monocle recipe.

For large datasets (e.g. 1 million cells), incremental PCA is recommended to avoid memory issues. When the number of cells is below half a million, PCA or _truncatedSVD_with_center (which operates on a sparse matrix without explicitly centering it) is used by default. TruncatedSVD is the fastest method. Unlike the other methods, which center the data first, it performs the SVD on the raw input. Use it only when the dataset is too large for the other methods.
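The selection rule described above can be sketched in plain Python. This is an illustrative reading of the description, not dynamo's implementation: the helper name choose_reduction_method, the is_sparse flag, the dense/sparse split between PCA and _truncatedSVD_with_center, and the strict "less than" comparison are all assumptions.

```python
def choose_reduction_method(
    n_obs,
    is_sparse=False,
    use_incremental_PCA=False,
    use_truncated_SVD_threshold=500_000,
):
    """Return the name of the method the description above suggests.

    Hypothetical helper: the branching order and the strict '<'
    comparison are assumptions, not taken from dynamo's source.
    """
    # Incremental PCA is an explicit opt-in for data that does not fit in memory.
    if use_incremental_PCA:
        return "IncrementalPCA"
    # Below the threshold, a centered decomposition is used.
    if n_obs < use_truncated_SVD_threshold:
        # Assumption: sparse input goes through the implicit-centering variant.
        return "_truncatedSVD_with_center" if is_sparse else "PCA"
    # At or above the threshold, fall back to SVD on the raw (uncentered) input.
    return "TruncatedSVD"


print(choose_reduction_method(10_000))
print(choose_reduction_method(10_000, is_sparse=True))
print(choose_reduction_method(1_000_000))
print(choose_reduction_method(1_000_000, use_incremental_PCA=True))
```

Because TruncatedSVD skips centering, its components are not guaranteed to match those of the centered methods on the same data.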

Parameters:
  • adata (AnnData) – an AnnData object.

  • X_data (Optional[ndarray]) – the data to perform dimension reduction on. Defaults to None.

  • n_pca_components (int) – the number of principal components to reduce to. Defaults to 30.

  • pca_key (str) – the key to store the reduced data. Defaults to “X_pca”.

  • pcs_key (str) – the key to store the principal axes in feature space. Defaults to “PCs”.

  • genes_to_append (Optional[List[str]]) – a list of genes that should be inspected. Defaults to None.

  • layer (Union[List[str], str, None]) – the layer(s) to perform dimension reduction on. Overridden by X_data if it is provided. Defaults to None.

  • svd_solver (Literal['randomized', 'arpack']) – the solver used for the SVD in PCA. Defaults to “randomized”.

  • random_state (int) – the seed used to initialize the random state for PCA. Defaults to 0.

  • use_truncated_SVD_threshold (int) – the number of observations at which truncated SVD is used instead of standard PCA for efficiency. Defaults to 500000.

  • use_incremental_PCA (bool) – whether to use incremental PCA. Enabling it is recommended when the dataset is too large to fit in memory. Defaults to False.

  • incremental_batch_size (Optional[int]) – the number of samples to use for each batch when performing incremental PCA. If None, the batch size is inferred from the data and set to 5 * n_features. Defaults to None.

  • return_all (bool) – whether to return the PCA fit model and the reduced array together with the updated AnnData object. Defaults to False.
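As a rough illustration of the incremental_batch_size default described above, the following sketch splits the samples into consecutive batches and falls back to 5 * n_features when no batch size is given. The helper batch_slices is hypothetical and only mirrors the documented rule; it is not how dynamo or its underlying PCA model actually iterates over the data.

```python
def batch_slices(n_samples, n_features, incremental_batch_size=None):
    """Return (start, stop) index pairs covering all samples in order.

    Hypothetical helper illustrating the documented default only.
    """
    # Documented default: infer the batch size as 5 * n_features.
    if incremental_batch_size is None:
        incremental_batch_size = 5 * n_features
    return [
        (start, min(start + incremental_batch_size, n_samples))
        for start in range(0, n_samples, incremental_batch_size)
    ]


# 25 samples with 2 features: inferred batch size is 10.
print(batch_slices(25, 2))
# An explicit batch size overrides the inferred one.
print(batch_slices(25, 2, incremental_batch_size=20))
```

Smaller batches lower peak memory use at the cost of more passes through the update step.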

Raises:
Return type:

Union[AnnData, Tuple[AnnData, Union[PCA, TruncatedSVD], ndarray]]

Returns:

If return_all is False, the updated AnnData object with the reduced data. Otherwise, a tuple (adata, fit, X_pca), where adata is the updated AnnData object, fit is the fitted dimension reduction model, and X_pca is the reduced array.
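Since the return shape depends on return_all, calling code may want to handle both cases uniformly. A minimal sketch follows, assuming a hypothetical helper unpack_pca_result that is not part of dynamo; the commented-out call uses only the parameters listed in the signature above.

```python
def unpack_pca_result(result):
    """Normalize either return shape into (adata, fit, X_pca).

    Hypothetical helper: when only the AnnData object is returned,
    fit and X_pca are reported as None.
    """
    if isinstance(result, tuple):
        adata, fit, X_pca = result
        return adata, fit, X_pca
    return result, None, None


# Intended use with dynamo (not executed here):
#   import dynamo as dyn
#   result = dyn.pp.pca(adata, n_pca_components=30, return_all=True)
#   adata, fit, X_pca = unpack_pca_result(result)

# Stand-in objects show both shapes without requiring dynamo.
print(unpack_pca_result("adata"))
print(unpack_pca_result(("adata", "fit", "X_pca")))
```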