dynamo.pp.Preprocessor

class dynamo.pp.Preprocessor(collapse_species_adata_function=<function collapse_species_adata>, convert_gene_name_function=<function convert2symbol>, filter_cells_by_outliers_function=<function filter_cells_by_outliers>, filter_cells_by_outliers_kwargs={}, filter_genes_by_outliers_function=<function filter_genes_by_outliers>, filter_genes_by_outliers_kwargs={}, filter_cells_by_highly_variable_genes_function=<function filter_cells_by_highly_variable_genes>, filter_cells_by_highly_variable_genes_kwargs={}, normalize_by_cells_function=<function normalize>, normalize_by_cells_function_kwargs={}, size_factor_function=<function calc_sz_factor>, size_factor_kwargs={}, select_genes_function=<function select_genes_monocle>, select_genes_kwargs={}, normalize_selected_genes_function=None, normalize_selected_genes_kwargs={}, norm_method=<function log1p>, norm_method_kwargs={}, pca_function=<function pca>, pca_kwargs={}, gene_append_list=[], gene_exclude_list=[], force_gene_list=None, sctransform_kwargs={}, regress_out_kwargs={}, cell_cycle_score_enable=False, cell_cycle_score_kwargs={})[source]

Preprocessor constructor.

The default preprocessing functions are those of the monocle recipe. You can pass your own Callable objects (functions) directly to this constructor, and they will be used in the corresponding preprocessing steps. The functions and their keyword arguments are stored on the Preprocessor instance, so you can also set these attributes directly to your own implementations.
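
The pattern described above (a step function plus its keyword arguments, both stored as instance attributes that can be replaced later) can be sketched without dynamo installed. This is a minimal illustration only: `MiniPreprocessor`, `keep_above`, and `keep_top` are hypothetical names invented for this sketch and are not part of dynamo, and the real Preprocessor operates on AnnData objects rather than plain lists.

```python
from typing import Any, Callable, Dict, List, Optional


def keep_above(values: List[float], threshold: float = 0.0) -> List[float]:
    """Default step: keep values strictly greater than the threshold."""
    return [v for v in values if v > threshold]


def keep_top(values: List[float], n: int = 2) -> List[float]:
    """Alternative step: keep the n largest values, in descending order."""
    return sorted(values, reverse=True)[:n]


class MiniPreprocessor:
    """Hypothetical stand-in for illustration; NOT dynamo's Preprocessor."""

    def __init__(
        self,
        filter_function: Callable[..., List[float]] = keep_above,
        filter_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # The callable and its kwargs are saved on the instance,
        # so either one can be swapped after construction.
        self.filter_function = filter_function
        self.filter_kwargs = dict(filter_kwargs or {})

    def run(self, values: List[float]) -> List[float]:
        # The stored kwargs are forwarded to the stored callable.
        return self.filter_function(values, **self.filter_kwargs)


data = [0.5, 1.5, 3.0, 2.0]

# Configure the step through the constructor.
pre = MiniPreprocessor(filter_kwargs={"threshold": 1.0})
filtered_default = pre.run(data)

# Replace the step and its kwargs by setting the attributes directly.
pre.filter_function = keep_top
pre.filter_kwargs = {"n": 1}
filtered_custom = pre.run(data)
```

The sketch copies the kwargs dictionary on construction so that instances do not share a mutable default. That is a design choice of this example and says nothing about how dynamo handles its `{}` defaults.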

Parameters:
  • collapse_species_adata_function (Callable) – function for collapsing the species data. Defaults to collapse_species_adata.

  • convert_gene_name_function (Callable) – function for transforming gene names. The default, convert2symbol, converts unofficial gene names to official gene symbols. Defaults to convert2symbol.

  • filter_cells_by_outliers_function (Callable) – function for filtering cells by thresholds. Defaults to filter_cells_by_outliers.

  • filter_cells_by_outliers_kwargs (Dict[str, Any]) – arguments that will be passed to filter_cells_by_outliers. Defaults to {}.

  • filter_genes_by_outliers_function (Callable) – function for filtering genes by thresholds. Defaults to filter_genes_by_outliers.

  • filter_genes_by_outliers_kwargs (Dict[str, Any]) – arguments that will be passed to filter_genes_by_outliers. Defaults to {}.

  • filter_cells_by_highly_variable_genes_function (Callable) – filter cells by highly variable genes. Defaults to filter_cells_by_highly_variable_genes.

  • filter_cells_by_highly_variable_genes_kwargs (Dict[str, Any]) – arguments that will be passed to filter_cells_by_highly_variable_genes. Defaults to {}.

  • normalize_by_cells_function (Callable) – function for performing cell-wise normalization. Defaults to normalize.

  • normalize_by_cells_function_kwargs (Dict[str, Any]) – arguments that will be passed to normalize_by_cells_function. Defaults to {}.

  • size_factor_function (Callable) – function for calculating size factors. Defaults to calc_sz_factor.

  • size_factor_kwargs (Dict[str, Any]) – arguments that will be passed to size_factor_function. Defaults to {}.

  • select_genes_function (Callable) – function for selecting gene features. Defaults to select_genes_monocle.

  • select_genes_kwargs (Dict[str, Any]) – arguments that will be passed to select_genes. Defaults to {}.

  • normalize_selected_genes_function (Optional[Callable]) – function for normalizing the selected genes. Defaults to None.

  • normalize_selected_genes_kwargs (Dict[str, Any]) – arguments that will be passed to normalize_selected_genes. Defaults to {}.

  • norm_method (Callable) – function used to normalize (transform) the layers in adata. Defaults to log1p.

  • norm_method_kwargs (Dict[str, Any]) – arguments passed to norm_method. Defaults to {}.

  • pca_function (Callable) – function for performing PCA. Defaults to pca in utils.py.

  • pca_kwargs (Dict[str, Any]) – arguments that will be passed to pca. Defaults to {}.

  • gene_append_list (List[str]) – genes that must appear among the selected genes throughout the recipe pipeline. Defaults to [].

  • gene_exclude_list (List[str]) – genes to exclude from the selected genes throughout the recipe pipeline. Defaults to [].

  • force_gene_list (Optional[List[str]]) – if given, use this gene list as the selected genes throughout the recipe pipeline. Defaults to None.

  • sctransform_kwargs (Dict[str, Any]) – arguments passed into sctransform function. Defaults to {}.

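  • regress_out_kwargs (Dict[str, Any]) – arguments passed into regress_out function. Defaults to {}.

  • cell_cycle_score_enable (bool) – whether to enable cell cycle scoring. Defaults to False.

  • cell_cycle_score_kwargs (Dict[str, Any]) – arguments used for cell cycle scoring. Defaults to {}.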
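
The three gene-list parameters interact, and a small sketch may make one plausible reading concrete. The function below is hypothetical and is not dynamo code. It assumes that `force_gene_list`, when given, replaces the selection entirely, and that otherwise appended genes are added first and excluded genes are removed last. The parameter descriptions above do not state this precedence, so treat it as an assumption.

```python
from typing import Iterable, List, Optional


def resolve_selected_genes(
    selected: Iterable[str],
    gene_append_list: Iterable[str] = (),
    gene_exclude_list: Iterable[str] = (),
    force_gene_list: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: combine a gene selection with the three lists.

    Assumed precedence (not confirmed by the documentation above):
    force_gene_list overrides everything; otherwise append, then exclude.
    """
    if force_gene_list is not None:
        # The forced list is used as-is, with duplicates dropped in order.
        return list(dict.fromkeys(force_gene_list))

    excluded = set(gene_exclude_list)
    # dict.fromkeys keeps first-seen order while removing duplicates.
    combined = dict.fromkeys(list(selected) + list(gene_append_list))
    return [gene for gene in combined if gene not in excluded]


# Gene names here are made up for the example.
with_lists = resolve_selected_genes(
    ["geneA", "geneB"],
    gene_append_list=["geneC"],
    gene_exclude_list=["geneB"],
)
forced = resolve_selected_genes(["geneA", "geneB"], force_gene_list=["geneX"])
```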

__init__(collapse_species_adata_function=<function collapse_species_adata>, convert_gene_name_function=<function convert2symbol>, filter_cells_by_outliers_function=<function filter_cells_by_outliers>, filter_cells_by_outliers_kwargs={}, filter_genes_by_outliers_function=<function filter_genes_by_outliers>, filter_genes_by_outliers_kwargs={}, filter_cells_by_highly_variable_genes_function=<function filter_cells_by_highly_variable_genes>, filter_cells_by_highly_variable_genes_kwargs={}, normalize_by_cells_function=<function normalize>, normalize_by_cells_function_kwargs={}, size_factor_function=<function calc_sz_factor>, size_factor_kwargs={}, select_genes_function=<function select_genes_monocle>, select_genes_kwargs={}, normalize_selected_genes_function=None, normalize_selected_genes_kwargs={}, norm_method=<function log1p>, norm_method_kwargs={}, pca_function=<function pca>, pca_kwargs={}, gene_append_list=[], gene_exclude_list=[], force_gene_list=None, sctransform_kwargs={}, regress_out_kwargs={}, cell_cycle_score_enable=False, cell_cycle_score_kwargs={})[source]

Preprocessor constructor.

Methods

__init__([collapse_species_adata_function, ...])

Preprocessor constructor.

add_experiment_info(adata[, tkey, ...])

Infer the experiment type and experiment layers stored in the AnnData object and record the info in unstructured metadata (.uns).

config_monocle_pearson_residuals_recipe(adata)

Automatically configure the preprocessor for using the Monocle-Pearson-residuals style recipe.

config_monocle_recipe(adata[, n_top_genes])

Automatically configure the preprocessor for using the monocle recipe.

config_pearson_residuals_recipe(adata)

Automatically configure the preprocessor for using the Pearson residuals style recipe.

config_sctransform_recipe(adata)

Automatically configure the preprocessor for using the sctransform style recipe.

config_seurat_recipe(adata)

Automatically configure the preprocessor for using the seurat style recipe.

preprocess_adata(adata[, recipe, tkey, ...])

Preprocess the AnnData object with the recipe specified.

preprocess_adata_monocle(adata[, tkey, ...])

Preprocess the AnnData object based on Monocle style preprocessing recipe.

preprocess_adata_monocle_pearson_residuals(adata)

A combined pipeline of monocle and pearson_residuals.

preprocess_adata_pearson_residuals(adata[, ...])

A pipeline proposed in Pearson residuals (Lause, Berens & Kobak, 2021).

preprocess_adata_sctransform(adata[, tkey, ...])

Python implementation of https://github.com/satijalab/sctransform.

preprocess_adata_seurat(adata[, tkey, ...])

The preprocess pipeline in Seurat based on dispersion, implemented by dynamo authors.

preprocess_adata_seurat_wo_pca(adata[, ...])

Preprocess the AnnData object according to the standard Seurat recipe, without PCA.

standardize_adata(adata, tkey, experiment_type)

Process the AnnData object to make it meet the standards of dynamo.
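
The recipe-specific methods listed above share the naming pattern `preprocess_adata_<recipe>`, so a thin wrapper can pick one by name. The sketch below is a hedged illustration: `KNOWN_RECIPES`, `recipe_method_name`, and `run_recipe` are hypothetical helpers, the recipe names are simply read off the method names in the table, and the methods are assumed to modify `adata` in place. dynamo is imported lazily inside `run_recipe`, so the name-checking part runs without it.

```python
from typing import Any

# Recipe names derived from the method names listed above (an assumption
# of this sketch, not an official list of accepted `recipe` values).
KNOWN_RECIPES = (
    "monocle",
    "monocle_pearson_residuals",
    "pearson_residuals",
    "sctransform",
    "seurat",
)


def recipe_method_name(recipe: str) -> str:
    """Map a recipe name to the matching Preprocessor method name."""
    if recipe not in KNOWN_RECIPES:
        raise ValueError(
            f"Unknown recipe {recipe!r}; expected one of {KNOWN_RECIPES}"
        )
    return f"preprocess_adata_{recipe}"


def run_recipe(adata: Any, recipe: str = "monocle") -> Any:
    """Run one recipe on an AnnData object with a default Preprocessor."""
    method_name = recipe_method_name(recipe)

    # Lazy import: dynamo is only required when a recipe is actually run.
    import dynamo as dyn

    preprocessor = dyn.pp.Preprocessor()
    # Look up e.g. preprocessor.preprocess_adata_monocle and call it.
    getattr(preprocessor, method_name)(adata)
    # Assumed: the recipe methods update `adata` in place.
    return adata
```

In practice, `preprocess_adata(adata, recipe=...)` is the documented single entry point for choosing a recipe. The wrapper above only makes the relationship between recipe names and the per-recipe methods explicit.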