dynamo.pp.calc_sz_factor(adata_ori, layers='all', total_layers=None, splicing_total_layers=False, X_total_layers=False, locfunc=<function nanmean>, chunk_size=None, round_exprs=False, method='median', scale_to=None, use_all_genes_cells=True, genes_use_for_norm=None, initial_dtype=None)

Calculate the size factor of each cell in an AnnData object, using the geometric mean or median of the total UMI counts across cells.

This function is partly based on the Monocle R package (https://github.com/cole-trapnell-lab/monocle3).

Parameters:

  • adata_ori (AnnData) – an AnnData object.

  • layers (Union[str, List[str]]) – the layer(s) to be normalized. Defaults to “all”, including RNA (X, raw) or spliced, unspliced, protein, etc.

  • total_layers (Optional[List[str]]) – the layer(s) that can be summed up to get the total mRNA. For example, [“spliced”, “unspliced”], [“uu”, “ul”, “su”, “sl”] or [“new”, “old”], etc. Defaults to None.

  • splicing_total_layers (bool) – whether to also normalize spliced / unspliced layers by size factor from total RNA. Defaults to False.

  • X_total_layers (bool) – whether to also normalize adata.X by size factor from total RNA. Defaults to False.

  • locfunc (Callable) – the function to normalize the data. Defaults to np.nanmean.

  • chunk_size (Optional[int]) – the number of cells to be processed at a time. Defaults to None.

  • round_exprs (bool) – whether the gene expression should be rounded into integers. Defaults to False.

  • method (Literal['mean-geometric-mean-total', 'geometric', 'median']) – the method used to calculate the expected total reads / UMI for the size factor calculation. Only mean-geometric-mean-total / geometric and median are supported. When mean-geometric-mean-total is used, size factors are calculated using the geometric mean with the given mean function (locfunc). When median is used, locfunc is replaced with np.nanmedian. When mean is used, locfunc is replaced with np.nanmean. Defaults to “median”.

  • scale_to (Optional[float]) – the total expression that each cell will be scaled to. Defaults to None.

  • use_all_genes_cells (bool) – whether all cells and genes should be used for the size factor calculation. Defaults to True.

  • genes_use_for_norm (Optional[List[str]]) – a list of gene names used to calculate the total RNA for each cell and, from that, the size factor for normalization. This is often useful when you want to normalize the dataset using only the host genes in a virus infection experiment (e.g. CMV or SARS-CoV-2 infection). Defaults to None.

  • initial_dtype (Optional[type]) – the data type used when initializing a new array. Should be one of the float types. Defaults to None.
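The interplay of method, locfunc, scale_to and genes_use_for_norm can be illustrated with a minimal NumPy sketch. This is not dynamo's implementation: the helper size_factors and its gene_mask argument are hypothetical, and it assumes that a size factor is a cell's total count divided by an expected total, and that scale_to, when given, simply replaces that expected total.

```python
import numpy as np


def size_factors(counts, method="median", locfunc=np.nanmean,
                 scale_to=None, gene_mask=None):
    """Illustrative per-cell size factors for a cells x genes count matrix.

    Hypothetical helper, not part of dynamo.
    """
    counts = np.asarray(counts, dtype=float)
    if gene_mask is not None:
        # Mimics genes_use_for_norm: only the selected genes enter the totals.
        counts = counts[:, gene_mask]

    cell_total = counts.sum(axis=1)
    cell_total[cell_total == 0] = 1.0  # avoid log(0) for empty cells

    if scale_to is not None:
        # Assumption: scale_to replaces the expected total.
        expected = scale_to
    elif method in ("mean-geometric-mean-total", "geometric"):
        # Geometric mean of the totals: exp of locfunc applied to log totals.
        expected = np.exp(locfunc(np.log(cell_total)))
    elif method == "median":
        expected = np.nanmedian(cell_total)
    else:
        raise ValueError(f"unsupported method: {method}")

    return cell_total / expected


# Three cells, three genes; the last gene stands in for a viral transcript.
counts = np.array([[1, 1, 0],
                   [2, 2, 0],
                   [4, 4, 8]])

sf_median = size_factors(counts)                      # totals 2, 4, 16
sf_geo = size_factors(counts, method="geometric")
sf_host = size_factors(counts, gene_mask=np.array([True, True, False]))
```

In this toy data the third cell has a large "viral" count; excluding that gene through the mask brings the cell's size factor down from 4 to 2, which is the effect the host-gene use case aims for.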

Return type:

AnnData

Returns:

An updated AnnData object with the Size_Factor (layer_ + Size_Factor) column(s) added to the obs attribute.
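The column naming described above can be sketched with a plain dictionary standing in for the obs attribute. The helper size_factor_key is hypothetical, and the sketch assumes that the X entry uses the bare name Size_Factor while every other layer is prefixed with its own name; the median-based size factors are only there to give the columns some content.

```python
import numpy as np


def size_factor_key(layer):
    # Assumption: X maps to "Size_Factor", other layers to "<layer>_Size_Factor".
    return "Size_Factor" if layer == "X" else f"{layer}_Size_Factor"


# Toy cells x genes matrices for each layer (two cells, two genes).
layers = {
    "X": np.array([[2.0, 2.0], [4.0, 4.0]]),
    "spliced": np.array([[1.0, 1.0], [3.0, 3.0]]),
    "unspliced": np.array([[1.0, 1.0], [1.0, 1.0]]),
}

obs = {}  # stand-in for the per-cell annotation table
for name, matrix in layers.items():
    totals = matrix.sum(axis=1)
    obs[size_factor_key(name)] = totals / np.median(totals)

print(sorted(obs))
```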