dynamo.pp.filter_cells_by_outliers

dynamo.pp.filter_cells_by_outliers(adata, filter_bool=None, layer='all', keep_filtered=False, min_expr_genes_s=50, min_expr_genes_u=25, min_expr_genes_p=1, max_expr_genes_s=inf, max_expr_genes_u=inf, max_expr_genes_p=inf, max_pmito_s=None, shared_count=None, spliced_key='spliced', unspliced_key='unspliced', protein_key='protein', obs_store_key='pass_basic_filter')

Select valid cells based on a collection of filters, including minimum and maximum thresholds on the spliced, unspliced and protein layers.

Parameters:
  • adata (AnnData) – an AnnData object.

  • filter_bool (Optional[ndarray]) – a boolean array from the user to select cells for downstream analysis. Defaults to None.

  • layer (str) – the layer (including X) used for feature selection. Defaults to “all”.

  • keep_filtered (bool) – whether to keep cells that don’t pass the filtering in the adata object. Defaults to False.

  • min_expr_genes_s (int) – minimal number of genes with expression for a cell in the data from the spliced layer (also used for X). Defaults to 50.

  • min_expr_genes_u (int) – minimal number of genes with expression for a cell in the data from the unspliced layer. Defaults to 25.

  • min_expr_genes_p (int) – minimal number of genes with expression for a cell in the data from the protein layer. Defaults to 1.

  • max_expr_genes_s (float) – maximal number of genes with expression for a cell in the data from the spliced layer (also used for X). Defaults to np.inf.

  • max_expr_genes_u (float) – maximal number of genes with expression for a cell in the data from the unspliced layer. Defaults to np.inf.

  • max_expr_genes_p (float) – maximal number of proteins with expression for a cell in the data from the protein layer. Defaults to np.inf.

  • max_pmito_s (Optional[float]) – maximal percentage of mitochondrial genes for a cell in the data from the spliced layer. Defaults to None.

  • shared_count (Optional[int]) – the minimal shared number of counts for each cell across genes between layers. Defaults to None.

  • spliced_key – name of the layer storing spliced data. Defaults to “spliced”.

  • unspliced_key – name of the layer storing unspliced data. Defaults to “unspliced”.

  • protein_key – name of the layer storing protein data. Defaults to “protein”.

  • obs_store_key – name of the key in adata.obs under which the filtering result is stored. Defaults to “pass_basic_filter”.
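
The min/max thresholds above can be illustrated with a minimal NumPy sketch. This is not dynamo's implementation: the helper `expressed_gene_mask` is hypothetical, and it assumes that a gene counts as "expressed" in a cell when its count is greater than zero and that both bounds are inclusive.

```python
import numpy as np


def expressed_gene_mask(counts, min_expr_genes, max_expr_genes=np.inf):
    """Hypothetical helper: flag cells whose number of expressed genes
    lies within [min_expr_genes, max_expr_genes]."""
    counts = np.asarray(counts)
    # Assumption: a gene is "expressed" in a cell when its count is > 0.
    n_expressed = (counts > 0).sum(axis=1)
    # Assumption: both bounds are inclusive.
    return (n_expressed >= min_expr_genes) & (n_expressed <= max_expr_genes)


# Toy cells x genes count matrices (3 cells, 4 genes).
spliced = np.array([[1, 0, 2, 0],
                    [0, 0, 0, 1],
                    [3, 1, 1, 1]])
unspliced = np.array([[0, 1, 0, 0],
                      [0, 0, 0, 0],
                      [1, 1, 0, 2]])

# A cell is kept only if it passes the threshold in every layer considered.
mask_s = expressed_gene_mask(spliced, min_expr_genes=2)
mask_u = expressed_gene_mask(unspliced, min_expr_genes=1)
pass_basic_filter = mask_s & mask_u
print(pass_basic_filter)
```

The toy thresholds (2 and 1) are far below the documented defaults of 50 and 25 so that the example fits in a few rows; real data would use values closer to the defaults.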

Raises:

ValueError – the layer provided is invalid.

Return type:

AnnData

Returns:

An updated AnnData object indicating the selection of cells for downstream analysis. If keep_filtered is False, adata is subset to only the cells that pass filtering.
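
The difference between flagging and subsetting can be sketched with plain NumPy arrays standing in for the AnnData object. The function `apply_cell_filter` and the dictionary used for per-cell annotations are hypothetical stand-ins, and the assumption that the flag is recorded before any subsetting is mine, not something the description above states.

```python
import numpy as np


def apply_cell_filter(X, obs, mask, keep_filtered=False,
                      obs_store_key="pass_basic_filter"):
    """Hypothetical stand-in for the keep_filtered behaviour.

    X    : cells x genes array
    obs  : dict of per-cell annotation arrays
    mask : boolean array, True for cells that pass filtering
    """
    mask = np.asarray(mask, dtype=bool)
    obs = dict(obs)  # shallow copy so the caller's dict is not modified
    # Assumption: the pass/fail flag is stored before any subsetting.
    obs[obs_store_key] = mask
    if keep_filtered:
        # Keep every cell; the flag alone marks which ones passed.
        return X, obs
    # Otherwise drop failing cells from the matrix and every annotation.
    return X[mask], {key: np.asarray(val)[mask] for key, val in obs.items()}


X = np.arange(6).reshape(3, 2)
obs = {"cell_id": np.array(["a", "b", "c"])}
mask = np.array([True, False, True])

flagged_X, flagged_obs = apply_cell_filter(X, obs, mask, keep_filtered=True)
subset_X, subset_obs = apply_cell_filter(X, obs, mask, keep_filtered=False)
print(flagged_X.shape, subset_X.shape)
```

Keeping the filtered cells is useful when the flag should be inspected before committing to the selection; subsetting is the documented default.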