dynamo.pp.recipe_monocle

dynamo.pp.recipe_monocle(adata, normalized=None, layer=None, total_layers=None, genes_to_use=None, method='pca', num_dim=30, sz_method='median', norm_method=None, pseudo_expr=1, feature_selection='SVR', n_top_genes=2000, relative_expr=True, keep_filtered_cells=True, keep_filtered_genes=True, scopes=None, fc_kwargs=None, fg_kwargs=None, sg_kwargs=None)

This function is partly based on the Monocle R package (https://github.com/cole-trapnell-lab/monocle3).

Parameters
  • adata (AnnData) – AnnData object.

  • normalized (None or bool (default: None)) – Set this to True if the data are already normalized (or recipe_monocle has already been run) to avoid normalizing them again. If None (the default), the first 20 values of adata.X (if adata.X is sparse) or its first column are checked to determine whether the data are already normalized. This check only works for UMI-based or read-count data.

  • layer (str (default: None)) – The layer(s) to be normalized. By default all layers are normalized, including RNA (X, raw), spliced, unspliced, protein, etc.

  • total_layers (bool, list or None (default: None)) – The layer(s) that can be summed to obtain the total mRNA, for example [“spliced”, “unspliced”], [“uu”, “ul”, “su”, “sl”] or [“total”]. If total_layers is True, it is set to total when the data have labeling but no splicing, or to [“uu”, “ul”, “su”, “sl”] when the data have both labeling and splicing.

  • genes_to_use (list (default: None)) – A list of gene names to use as the feature genes for downstream analysis.

  • method (str (default: pca)) – The linear dimension reduction method to be used.

  • num_dim (int (default: 30)) – The number of linear dimensions to reduce to.

  • sz_method (str (default: median)) – The method used to calculate the expected total reads / UMI for the size factor calculation. Only mean-geometric-mean-total / geometric and median are supported. When median is used, locfunc is replaced with np.nanmedian.

  • norm_method (function or None (default: None)) – The method used to normalize the data. Can be any numpy function or Freeman_Tukey. By default, only .X is size-factor normalized and log1p transformed, while data in other layers are only size-factor normalized.

  • pseudo_expr (int (default: 1)) – A pseudocount added to the gene expression value before log/log2 normalization.

  • feature_selection (str (default: SVR)) – Which scoring method, either dispersion, SVR or Gini index, is used to select genes.

  • n_top_genes (int (default: 2000)) – How many top genes, ranked by the scoring method (specified by feature_selection), are selected as feature genes.

  • relative_expr (bool (default: True)) – A logical flag that determines whether gene expression values are first divided by the size factor before normalization.

  • keep_filtered_cells (bool (default: True)) – Whether to keep cells that don’t pass the filtering in the adata object.

  • keep_filtered_genes (bool (default: True)) – Whether to keep genes that don’t pass the filtering in the adata object.

  • scopes (str, list-like or None (default: None)) – Scopes are needed when you use non-official gene names as your gene indices (adata.var_names). This argument specifies the type(s) of identifier used for the input query terms, given either as a list or as comma-separated fields, e.g. “entrezgene”, “entrezgene,symbol”, [“ensemblgene”, “symbol”]. Refer to the official MyGene.info docs (https://docs.mygene.info/en/latest/doc/query_service.html#available_fields) for the full list of fields.

  • fc_kwargs (dict or None (default: None)) – Additional parameters passed to the filter_cells function.

  • fg_kwargs (dict or None (default: None)) – Additional parameters passed to the filter_genes function.

  • sg_kwargs (dict or None (default: None)) – Additional parameters passed to the select_genes function.
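
To make the interaction between sz_method, relative_expr and pseudo_expr more concrete, here is a minimal pure-Python sketch of size-factor normalization followed by a log transform. It is not dynamo's implementation: the helper names size_factors and normalize are hypothetical, the formula (cell total divided by a reference total) is an assumption about how size factors are typically computed, and statistics.median stands in for np.nanmedian without any NaN handling.

```python
import math
import statistics


def size_factors(counts, sz_method="median"):
    """Per-cell size factor: the cell's total count divided by a reference total.

    Assumption: the reference is the median (or geometric mean) of all cell totals.
    """
    totals = [sum(cell) for cell in counts]
    if sz_method == "median":
        reference = statistics.median(totals)
    elif sz_method == "mean-geometric-mean-total":
        reference = statistics.geometric_mean(totals)
    else:
        raise ValueError(f"unsupported sz_method: {sz_method!r}")
    return [total / reference for total in totals]


def normalize(counts, sz_method="median", pseudo_expr=1, relative_expr=True):
    """Optionally divide by the size factor, then take log(x + pseudo_expr)."""
    if relative_expr:
        sfs = size_factors(counts, sz_method)
    else:
        sfs = [1.0] * len(counts)
    return [
        [math.log(x / sf + pseudo_expr) for x in cell]
        for cell, sf in zip(counts, sfs)
    ]


# Toy matrix: rows are cells, columns are genes (made-up counts).
counts = [
    [2, 0, 2],  # total 4
    [4, 4, 0],  # total 8
    [8, 0, 8],  # total 16
]
sfs = size_factors(counts)      # reference total is the median, 8
normalized = normalize(counts)  # pseudo_expr=1 makes this a log1p transform
```

With pseudo_expr=1 the transform reduces to log1p of the size-factor-scaled counts, which matches the default behaviour described for .X under norm_method.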

Returns

adata – An updated AnnData object containing Size_Factor, normalized expression values, X, reduced dimensions, etc.

Return type

AnnData
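
A call might look like the sketch below. The keyword arguments and their values are taken from the signature above; the helpers choose_total_layers and run_recipe are hypothetical and only illustrate the layer combinations mentioned under total_layers. Using [“spliced”, “unspliced”] for splicing-only data is an assumption drawn from the examples listed there, and dynamo is imported lazily so that the layer helper can be used without it.

```python
def choose_total_layers(has_splicing, has_labeling):
    """Pick the layers that sum to total mRNA, following the total_layers notes."""
    if has_labeling and has_splicing:
        return ["uu", "ul", "su", "sl"]
    if has_labeling:
        return ["total"]
    if has_splicing:
        # Assumption: splicing-only data use the spliced/unspliced pair.
        return ["spliced", "unspliced"]
    return None  # leave total_layers at its default


def run_recipe(adata, has_splicing=True, has_labeling=False):
    """Run recipe_monocle with explicit settings (requires dynamo to be installed)."""
    import dynamo as dyn  # lazy import: only needed when the recipe is actually run

    return dyn.pp.recipe_monocle(
        adata,
        total_layers=choose_total_layers(has_splicing, has_labeling),
        method="pca",
        num_dim=30,
        sz_method="median",
        feature_selection="SVR",
        n_top_genes=2000,
        keep_filtered_cells=True,
        keep_filtered_genes=True,
    )
```

Given an AnnData object with spliced and unspliced layers, adata = run_recipe(adata) would then return the updated object described under Returns.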