dynamo.pp.recipe_monocle

dynamo.pp.recipe_monocle(adata, reset_X=False, tkey=None, t_label_keys=None, experiment_type=None, normalized=None, layer=None, total_layers=None, splicing_total_layers=False, X_total_layers=False, genes_use_for_norm=None, genes_to_use=None, genes_to_append=None, genes_to_exclude=None, exprs_frac_max=1, method='pca', num_dim=30, sz_method='median', scale_to=None, norm_method=None, pseudo_expr=1, feature_selection='SVR', n_top_genes=2000, maintain_n_top_genes=True, relative_expr=True, keep_filtered_cells=True, keep_filtered_genes=True, keep_raw_layers=True, scopes=None, fc_kwargs=None, fg_kwargs=None, sg_kwargs=None, copy=False)

This function is partly based on the Monocle R package (https://github.com/cole-trapnell-lab/monocle3).

Parameters
  • adata (AnnData) – AnnData object.

  • tkey (str or None (default: None)) – The column key for the labeling time of cells in .obs. Used for labeling-based scRNA-seq data (will also support conventional scRNA-seq data). Note that tkey will be saved to adata.uns['pp']['tkey'] and used in dyn.tl.dynamics, where, when group is None, tkey will also be used for calculating the 1st/2nd moments or covariance. We recommend using hours as the unit of time.

  • t_label_keys (str, list or None (default: None)) – The column key(s) for the labeling time label of cells in .obs. Used for either “conventional” or “labeling based” scRNA-seq data. Not used for now and tkey is implicitly assumed as t_label_key (however, tkey should just be the time of the experiment).

  • experiment_type (str {deg, kin, one-shot, mix_std_stm, 'mixture'} or None, (default: None)) – Experiment type for labeling single-cell RNA-seq. Available options are: (1) 'conventional': conventional single-cell RNA-seq experiment; if experiment_type is None and there is only splicing data, this will be set to conventional; (2) 'deg': chase/degradation experiment. Cells are first labeled for an extended period, followed by chase; (3) 'kin': pulse/synthesis/kinetics experiment. Cells are labeled for different durations in a time series; (4) 'one-shot': one-shot kinetic experiment. Cells are only labeled for a short pulse duration. Other possible experiments include: (5) 'mix_pulse_chase' or 'mix_kin_deg': a mixture chase experiment in which the entire experiment lasts for a certain period of time, with an initial pulse followed by washing out at different time points but chasing cells at the same time point. This type of labeling strategy was adopted in the scEU-seq paper. For this kind of experiment, we need to assume non-steady-state dynamics; (6) 'mix_std_stm'.

  • reset_X (bool (default: False)) – Whether to let dynamo reset adata.X based on the layers stored in your experiment. One critical functionality of dynamo is visualizing RNA velocity vector flows, which requires proper data into which the high-dimensional RNA velocity vectors will be projected. (1) For kinetics experiments, we recommend using the total layer as adata.X; (2) for degradation/conventional scRNA-seq experiments, we recommend using the splicing layer as adata.X. Set reset_X to True to apply those defaults if you are not sure.

  • normalized (None or bool (default: None)) – If you have already normalized your data (or have already run recipe_monocle), set this to True to avoid renormalizing your data. By default it is None, and the first 20 values of adata.X (if adata.X is sparse) or its first column will be checked to determine whether the data are already normalized. This only works for UMI-based or read-count data.

  • layer (str (default: None)) – The layer(s) to be normalized. Default is all, including RNA (X, raw) or spliced, unspliced, protein, etc.

  • total_layers (bool, list or None (default None)) – The layer(s) that can be summed up to get the total mRNA, for example ["spliced", "unspliced"], ["uu", "ul", "su", "sl"] or ["total"], etc. If total_layers is True, it will be set to total or ["uu", "ul", "su", "sl"], depending on whether you have labeling but no splicing data, or both labeling and splicing data.

  • splicing_total_layers (bool (default False)) – Whether to also normalize spliced / unspliced layers by size factor from total RNA.

  • X_total_layers (bool (default False)) – Whether to also normalize adata.X by size factor from total RNA.

  • genes_use_for_norm (list (default: None)) – A list of gene names that will be used to calculate total RNA for each cell and then the size factor for normalization. This is often very useful when you want to use only the host genes to normalize the dataset in a virus infection experiment (e.g. CMV or SARS-CoV-2 infection).

  • genes_to_use (list (default: None)) – A list of gene names that will be set as the feature genes for downstream analysis.

  • genes_to_append (list (default: None)) – A list of gene names that will be appended to the feature genes list for downstream analysis.

  • genes_to_exclude (list (default: None)) – A list of gene names that will be excluded from the feature genes list for downstream analysis.

  • exprs_frac_max (float (default: 1)) – The maximal fraction of a gene's counts relative to the total counts across cells; genes above this fraction are filtered out. By default it is 1, which means no genes are filtered, but it can be lowered to a value such as 0.005 to remove some highly expressed housekeeping genes.

  • method (str (default: pca)) – The linear dimension reduction method to be used.

  • num_dim (int (default: 30)) – The number of linear dimensions to reduce to.

  • sz_method (str (default: median)) – The method used to calculate the expected total reads / UMI used in size factor calculation. Only mean-geometric-mean-total / geometric and median are supported. When median is used, locfunc will be replaced with np.nanmedian.

  • scale_to (float or None (default: None)) – The final total expression that each cell will be scaled to.

  • norm_method (function or None (default: None)) – The method used to normalize the data. Can be any numpy function or Freeman_Tukey. By default, only .X will be size-factor normalized and log1p transformed, while data in other layers will only be size-factor normalized.

  • pseudo_expr (int (default: 1)) – A pseudocount added to the gene expression value before log/log2 normalization.

  • feature_selection (str (default: SVR)) – Which scoring method, either dispersion, SVR or Gini index, to use to select genes.

  • n_top_genes (int (default: 2000)) – How many top genes, ranked by the scoring method (specified by feature_selection), will be selected as feature genes.

  • maintain_n_top_genes (bool (default: True)) – Whether to ensure that 2000 feature genes are selected no matter what genes_to_use, genes_to_append, etc. specify. The only exception is when genes_to_use is supplied with n_top_genes.

  • relative_expr (bool (default: True)) – A logical flag that determines whether gene expression values are first divided by the size factor before normalization.

  • keep_filtered_cells (bool (default: True)) – Whether to keep cells that don’t pass the filtering in the returned adata object.

  • keep_filtered_genes (bool (default: True)) – Whether to keep genes that don’t pass the filtering in the returned adata object.

  • keep_raw_layers (bool (default: True)) – Whether to keep layers with raw measurements in the returned adata object.

  • scopes (str, list-like or None (default: None)) – Scopes are needed when you use non-official gene names as your gene indices (or adata.var_names). This argument corresponds to the types of identifiers, given either as a list or as comma-separated fields, that specify the type of the input qterms, e.g. “entrezgene”, “entrezgene,symbol”, [“ensemblgene”, “symbol”]. Refer to the official MyGene.info docs (https://docs.mygene.info/en/latest/doc/query_service.html#available_fields) for the full list of fields.

  • fc_kwargs (dict or None (default: None)) – Other parameters passed to the filter_cells function.

  • fg_kwargs (dict or None (default: None)) – Other parameters passed to the filter_genes function.

  • sg_kwargs (dict or None (default: None)) – Other parameters passed to the select_genes function.

  • copy (bool) – Whether to return a new deep copy of adata instead of updating adata object passed in arguments.
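
How sz_method, scale_to and pseudo_expr fit together can be sketched in plain Python. This illustrates size-factor normalization in general and is not dynamo's implementation. It assumes that "median" divides each cell's total by the median total, that "mean-geometric-mean-total" uses the geometric mean of the per-cell totals, and that scale_to, when given, replaces that reference value. The helper names size_factors and normalize are hypothetical.

```python
import math
from statistics import median


def size_factors(counts, sz_method="median", scale_to=None):
    """Return one size factor per cell (hypothetical helper).

    counts is a list of cells, each a list of per-gene counts.
    Cells with a total of zero are not handled in this sketch.
    """
    totals = [sum(cell) for cell in counts]
    if scale_to is not None:
        # Assumption: scale_to overrides the reference total.
        reference = scale_to
    elif sz_method == "median":
        reference = median(totals)
    elif sz_method == "mean-geometric-mean-total":
        # Assumption: geometric mean of the per-cell totals.
        reference = math.exp(sum(math.log(t) for t in totals) / len(totals))
    else:
        raise ValueError(f"unsupported sz_method: {sz_method}")
    return [total / reference for total in totals]


def normalize(counts, pseudo_expr=1, **kwargs):
    """Divide each cell by its size factor, then log-transform with a pseudocount."""
    factors = size_factors(counts, **kwargs)
    return [
        [math.log(value / factor + pseudo_expr) for value in cell]
        for cell, factor in zip(counts, factors)
    ]


counts = [[1, 1], [2, 2], [3, 3]]  # three cells, two genes
factors = size_factors(counts)      # totals 2, 4, 6 divided by their median, 4
normalized = normalize(counts)      # with pseudo_expr=1 this equals log1p
```

In this toy input the three cells differ only in sequencing depth, so after dividing by the size factors they end up with identical normalized values.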

Returns

adata – A new or updated AnnData object, depending on the copy parameter, updated with Size_Factor, normalized expression values, X, reduced dimensions, etc.

Return type

AnnData
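
A minimal usage sketch, assuming the dynamo package is installed and importable as dynamo. The import is placed inside the function so the snippet loads even without it. Every key below is a parameter documented above, and the values match the defaults in the signature except copy. RECIPE_SETTINGS and run_recipe are hypothetical names introduced only for illustration.

```python
# Illustrative settings: documented recipe_monocle parameters with their
# default values from the signature, except copy.
RECIPE_SETTINGS = {
    "method": "pca",
    "num_dim": 30,
    "feature_selection": "SVR",
    "n_top_genes": 2000,
    "keep_filtered_cells": True,
    "keep_filtered_genes": True,
    "copy": True,  # return a new AnnData instead of updating the input
}


def run_recipe(adata, **overrides):
    """Hypothetical helper: call recipe_monocle with the settings above."""
    import dynamo as dyn  # lazy import; assumes dynamo is installed

    settings = {**RECIPE_SETTINGS, **overrides}
    return dyn.pp.recipe_monocle(adata, **settings)
```

With this helper, a call such as run_recipe(adata, n_top_genes=1000) would override only n_top_genes and leave the other settings unchanged.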