dynamo.ext.select_genes_by_pearson_residuals
- dynamo.ext.select_genes_by_pearson_residuals(adata, layer=None, theta=100, clip=None, n_top_genes=2000, batch_key=None, chunksize=1000, check_values=True, inplace=True)
Gene selection and normalization based on [Lause21]. Applies gene selection based on Pearson residuals. Expects raw count input.
Params
- adata
The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- theta
The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.inf corresponds to a Poisson model.
- clip
Determines if and how residuals are clipped:
If None, residuals are clipped to the interval [-sqrt(n), sqrt(n)], where n is the number of cells in the dataset (default behavior).
If a scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.inf for no clipping.
- n_top_genes
Number of highly-variable genes to keep.
- batch_key
If specified, highly variable genes (HVGs) are selected within each batch separately and then merged. This simple process avoids the selection of batch-specific genes and acts as a lightweight batch correction method. Genes are first sorted by the number of batches in which they are an HVG. Ties are broken by the median rank (across batches) based on within-batch residual variance.
- chunksize
Determines how many genes are processed at once while computing the Pearson residual variance. Choosing a smaller value reduces the required memory.
- n_pca_components
Number of principal components to compute in the PCA step.
- check_values
If True, check whether the counts in the selected layer are integers and issue a warning if they are not.
- inplace
Whether to place results in adata or return them.
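Example
The role of theta, clip, and chunksize can be illustrated with a minimal NumPy sketch. This is not dynamo's implementation: the helper name pearson_residual_variance is hypothetical, the expected counts follow the usual null model from [Lause21] (cell total times gene total divided by the grand total), and the toy data are made up. The sketch assumes a dense matrix in which every gene has at least one count.

```python
import numpy as np


def pearson_residual_variance(counts, theta=100.0, clip=None, chunksize=1000):
    """Per-gene variance of clipped Pearson residuals (hypothetical helper).

    Assumes a dense cells x genes matrix of raw counts in which every gene
    has at least one count; an all-zero gene would give mu = 0 and an
    undefined residual.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells, n_genes = counts.shape
    if clip is None:
        clip = np.sqrt(n_cells)  # default bound: sqrt(number of cells)

    cell_sums = counts.sum(axis=1, keepdims=True)  # shape (n_cells, 1)
    gene_sums = counts.sum(axis=0, keepdims=True)  # shape (1, n_genes)
    total = counts.sum()

    variances = np.empty(n_genes)
    # Process genes in chunks so only an (n_cells x chunksize) block of
    # residuals is held in memory at any time.
    for start in range(0, n_genes, chunksize):
        stop = min(start + chunksize, n_genes)
        # Expected counts under the null model.
        mu = cell_sums * gene_sums[:, start:stop] / total
        # Negative binomial variance: mean + mean^2 / theta.
        residuals = (counts[:, start:stop] - mu) / np.sqrt(mu + mu**2 / theta)
        residuals = np.clip(residuals, -clip, clip)
        variances[start:stop] = residuals.var(axis=0)
    return variances


# Toy data: 200 cells, 50 Poisson genes; gene 0 is boosted in half the cells.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(200, 50))
counts[:100, 0] += 20

variances = pearson_residual_variance(counts, chunksize=16)
top_genes = np.argsort(variances)[::-1][:5]  # indices of the 5 most variable genes
```

Genes with the largest residual variance are the ones kept when n_top_genes is applied. The chunk size changes only peak memory, not the result.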
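The batch_key merging rule (sort by the number of batches in which a gene is an HVG, then break ties by median rank) can be sketched as follows. The function merge_batch_rankings and its input layout are hypothetical and exist only for illustration. Taking the median over all batches, rather than only those in which the gene is an HVG, is an assumption of this sketch and may differ from the library's behavior.

```python
import numpy as np


def merge_batch_rankings(variances_per_batch, n_top_genes):
    """Select genes across batches (hypothetical helper, not dynamo's code).

    variances_per_batch: array of shape (n_batches, n_genes) holding the
    within-batch residual variance of every gene.
    """
    variances = np.asarray(variances_per_batch, dtype=float)
    # Within each batch, rank 0 is the gene with the highest variance.
    ranks = np.argsort(np.argsort(-variances, axis=1), axis=1)
    # A gene is an HVG in a batch if it is among that batch's top genes.
    n_batches_hvg = (ranks < n_top_genes).sum(axis=0)
    # Assumption: the median is taken over all batches.
    median_rank = np.median(ranks, axis=0)
    # np.lexsort uses the LAST key as the primary key: genes that are HVGs
    # in more batches come first, then lower median rank breaks ties.
    order = np.lexsort((median_rank, -n_batches_hvg))
    return order[:n_top_genes]


# Two batches, four genes (made-up variances).
batch_variances = [
    [5.0, 4.0, 1.0, 0.5],
    [0.4, 5.0, 6.0, 1.0],
]
selected = merge_batch_rankings(batch_variances, n_top_genes=2)
```

In this toy input, gene 1 is an HVG in both batches and is therefore ranked first. Genes 0 and 2 are each an HVG in one batch, so the median rank decides between them.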