dynamo.vf.VectorField

dynamo.vf.VectorField(adata, basis=None, layer=None, dims=None, genes=None, normalize=False, grid_velocity=False, grid_num=50, velocity_key='velocity_S', method='SparseVFC', min_vel_corr=0.6, restart_num=5, restart_seed=[0, 100, 200, 300, 400], model_buffer_path=None, return_vf_object=False, map_topography=False, pot_curl_div=False, cores=1, result_key=None, copy=False, n=25, **kwargs)

Robustly learn a high-dimensional vector field function over the entire space from sparse single-cell samples.

Parameters:
  • adata (AnnData) – An AnnData object that contains the embedding and velocity data.

  • basis (Optional[str]) – The embedding data to use. The vector field function will be learned on the low-dimensional embedding and can then be projected back to the high-dimensional space.

  • layer (Optional[str]) – Which layer of the data will be used for vector field function reconstruction. Once provided, the layer overrides the basis argument, and the vector field function is learned in the high-dimensional space.

  • dims (Union[int, list, None]) – The dimensions that will be used for reconstructing the vector field function. If it is an int, all dimensions from the first dimension up to dims will be used; if it is a list, the dimensions in the list will be used.

  • genes (Optional[list]) – The names of the genes whose expression will be used for vector field reconstruction. By default (when genes is set to None), the genes used for velocity embedding (var.use_for_transition) will be used. Note that the genes to be used need to have velocity calculated.

  • normalize (Optional[bool]) – Boolean flag that determines whether to normalize the data to have zero mean and unit covariance. This is often required for raw datasets (for example, raw UMI counts and RNA velocity values in high dimension), but it is normally not required for low-dimensional embeddings from PCA or other non-linear dimension reduction methods.

  • grid_velocity (bool) – Whether to generate grid velocity. Note that the default is False, but for datasets with an embedding dimension of less than 4, the grid velocity will still be generated. The total number of grid points increases exponentially with the number of dimensions, so memory can run out quickly: for example, the array cannot be allocated with grid_num set to 50 and 6 dimensions (50^6 total grid points) on a computer with 32 GB of memory. Even when grid velocity is not generated, the vector field function can still be learned for thousands of dimensions, and the transcriptomic cell states can still be predicted over long time periods.

  • grid_num (int) – The number of grid points in each dimension for generating the grid velocity.

  • velocity_key (str) – The key from the adata layer that corresponds to the velocity matrix.

  • method (str) – Method used to reconstruct the vector field function. Currently only SparseVFC is supported, but other improved approaches are under development.

  • min_vel_corr (float) – The minimal threshold for the cosine correlation between input velocities and learned velocities for a vector field reconstruction to be considered successful. If the cosine correlation is less than this threshold and restart_num > 1, restart_num trials will be attempted with different seeds to reconstruct the vector field function. This helps keep the reconstruction from being trapped in a local optimum.

  • restart_num (int) – The number of retrials for vector field reconstructions.

  • restart_seed (Optional[list]) – A list of seeds, one for each retrial. Must either contain restart_num entries or be None.

  • model_buffer_path (Optional[str]) – The directory path that holds all saved or to-be-saved torch variables and NN modules. When method is set to dynode, buffer_path will be constructed from the working directory, basis, and datetime.

  • return_vf_object (bool) – Whether to include an instance of a vectorfield class in the VecFld dictionary in the uns attribute.

  • map_topography (bool) – Whether to quantify the topography of the vector field. Note that for vector fields higher than 2D, only fixed points can be identified, because high-dimensional nullclines and separatrices are mathematically difficult to identify. In a high-dimensional vector field, nullclines and separatrices are also surfaces or manifolds.

  • pot_curl_div (bool) – Whether to calculate potential, curl, or divergence for each cell. Potential can be calculated for any basis, while curl and divergence are by default only applied to a 2D basis. However, divergence is applicable in any dimension, while curl is generally only defined for 2D or 3D systems.

  • cores (int) – Number of cores used to run the ddhodge function. If cores is set to > 1, multiprocessing will be used to parallelize the ddhodge calculation.

  • result_key (Optional[str]) – The key used as the prefix for the vector field key in .uns.

  • copy (bool) – Whether to return a new deep copy of adata instead of updating the adata object passed in and returning None.

  • n (int) – Number of samples for calculating the fixed points.

  • kwargs – Other additional parameters passed to the vectorfield class.
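
The memory warning under grid_velocity can be checked with simple arithmetic. The sketch below assumes each grid point stores one 8-byte float per dimension; that storage layout is an assumption made for illustration and is not taken from the dynamo source, and grid_memory_bytes is a hypothetical helper, not part of the library.

```python
def grid_memory_bytes(grid_num: int, n_dims: int, bytes_per_value: int = 8) -> int:
    """Estimate the bytes needed to hold the coordinates of a full grid.

    Assumes (for illustration only) that every grid point stores one value
    of `bytes_per_value` bytes per dimension.
    """
    n_points = grid_num ** n_dims  # grows exponentially with n_dims
    return n_points * n_dims * bytes_per_value


# Compare a 2D and 3D embedding with the 6D case mentioned for grid_velocity.
for n_dims in (2, 3, 6):
    n_bytes = grid_memory_bytes(grid_num=50, n_dims=n_dims)
    print(f"{n_dims}D: {50 ** n_dims:,} grid points, about {n_bytes / 1e9:.3g} GB")
```

Under these assumptions, grid_num=50 in six dimensions means 15,625,000,000 grid points and roughly 750 GB for the coordinates alone, which is why the allocation fails on a 32 GB machine while 2D and 3D grids remain small.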

Return type:

Union[AnnData, BaseVectorField]

Returns:

If the copy and return_vf_object arguments are both set to False, the AnnData object is updated with the VecFld dictionary in the uns attribute.

If return_vf_object is set to True, then a vector field class object is returned. If copy is set to True, a deep copy of the original adata object is returned.
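
The retry behavior described for min_vel_corr, restart_num, and restart_seed can be illustrated with a small, library-independent sketch. Everything here is hypothetical: mean_cosine, fit_with_restarts, and toy_fit are illustrative names, the early stop on the first acceptable fit, the "keep the best attempt" rule, and the fallback seeds used when restart_seed is None are all assumptions, and the actual dynamo implementation may differ.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vectors = List[List[float]]


def mean_cosine(a: Vectors, b: Vectors) -> float:
    """Mean cosine similarity between paired rows of a and b."""
    total = 0.0
    for u, v in zip(a, b):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        total += dot / norm if norm > 0 else 0.0
    return total / len(a)


def fit_with_restarts(
    fit: Callable[[int], Vectors],
    velocities: Vectors,
    min_vel_corr: float = 0.6,
    restart_num: int = 5,
    restart_seed: Optional[Sequence[int]] = (0, 100, 200, 300, 400),
) -> Tuple[float, Vectors]:
    """Try one fit per seed; stop once the fit reaches min_vel_corr."""
    if restart_seed is None:
        restart_seed = list(range(restart_num))  # assumed fallback, not from dynamo
    if len(restart_seed) != restart_num:
        raise ValueError("restart_seed must contain restart_num entries")

    best_corr, best_pred = -math.inf, None
    for seed in restart_seed:
        pred = fit(seed)                       # learned velocities for this seed
        corr = mean_cosine(velocities, pred)   # agreement with input velocities
        if corr > best_corr:
            best_corr, best_pred = corr, pred  # assumed: keep the best attempt
        if corr >= min_vel_corr:
            break                              # successful reconstruction
    return best_corr, best_pred


velocities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def toy_fit(seed: int) -> Vectors:
    """Stand-in for a reconstruction: seed 0 is deliberately bad (reversed
    vectors); any other seed returns a slightly noisy copy of the input."""
    if seed == 0:
        return [[-x for x in v] for v in velocities]
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, 0.05) for x in v] for v in velocities]


corr, pred = fit_with_restarts(toy_fit, velocities)
print(f"accepted fit with mean cosine similarity {corr:.3f}")
```

In this toy run the first seed produces a fit below the threshold, so a second attempt with the next seed is made and accepted.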