dynamo.vf.SparseVFC

dynamo.vf.SparseVFC(X, Y, Grid, M=100, a=5, beta=None, ecr=1e-05, gamma=0.9, lambda_=3, minP=1e-05, MaxIter=500, theta=0.75, div_cur_free_kernels=False, velocity_based_sampling=True, sigma=0.8, eta=0.5, seed=0, lstsq_method='drouin', verbose=1)

Apply the sparseVFC (vector field consensus) algorithm to robustly and efficiently learn a functional form of the vector field over the entire space from random samples that contain outliers (Ma, Jiayi, et al., Pattern Recognition, 2013).

Parameters:
  • X (ndarray) – Current state. This corresponds to, for example, the spliced transcriptomic state.

  • Y (ndarray) – Velocity estimates in delta t. This corresponds to, for example, the inferred spliced transcriptomic velocity or the total RNA velocity estimated by dynamo from metabolic labeling data.

  • Grid (ndarray) – Current state on a grid which is often used to visualize the vector field. This corresponds to, for example, the spliced transcriptomic state or total RNA state.

  • M (int) – The number of basis functions to approximate the vector field.

  • a (float) – Parameter of the outlier model. The outliers are assumed to follow a uniform distribution, and the volume of the outliers' variation space is a.

  • beta (Optional[float]) – Parameter of Gaussian Kernel, k(x, y) = exp(-beta*||x-y||^2). If None, a rule-of-thumb bandwidth will be computed automatically.

  • ecr (float) – The minimum limitation of energy change rate in the iteration process.

  • gamma (float) – Initial estimate of the fraction of inliers in the samples. It only initializes the EM iteration, so its exact value is not critical.

  • lambda_ – Trade-off between goodness of fit to the data and regularization. A larger lambda_ puts more weight on regularization.

  • minP (float) – Minimum value allowed in the posterior probability matrix P. P may otherwise be singular for matrix inversion, so its minimum value is set to minP.

  • MaxIter (int) – Maximum number of iterations.

  • theta (float) – Threshold that defines an inlier. A sample is regarded as an inlier if its posterior probability of being an inlier is larger than theta.

  • div_cur_free_kernels (bool) – Boolean flag that determines whether the divergence-free or curl-free kernels are used to learn the vector field.

  • sigma (float) – Bandwidth parameter.

  • eta (float) – Combination coefficient for the divergence-free or the curl-free kernels.

  • seed (Union[int, ndarray]) – Seed for RandomState: an int or 1-d array_like that must be convertible to 32-bit unsigned integers. Used when sampling control points. Defaults to 0 to ensure consistency between different runs.

  • lstsq_method (str) – The name of the linear least squares solver, either 'scipy' or 'drouin'.

  • verbose (int) – The level of printing running information.
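The Gaussian kernel formula given for beta and the inlier rule given for theta can be sketched in plain NumPy. This is only an illustration of those two definitions, not dynamo's implementation: the function names `gaussian_kernel` and `select_inliers` and the demo arrays are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, beta):
    """Gram matrix with entries k(x_i, y_j) = exp(-beta * ||x_i - y_j||^2).

    Hypothetical helper written from the formula in the `beta` description.
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * sq_dist)


def select_inliers(P, theta=0.75):
    """Indices of samples whose posterior inlier probability exceeds theta.

    Hypothetical helper written from the `theta` description.
    """
    return np.where(np.asarray(P).ravel() > theta)[0]


# Two points one unit apart: the off-diagonal kernel entry is exp(-beta).
X_demo = np.array([[0.0, 0.0], [1.0, 0.0]])
K_demo = gaussian_kernel(X_demo, X_demo, beta=0.5)

# Made-up posterior probabilities; only entries above theta are kept.
inlier_idx = select_inliers([0.9, 0.2, 0.8], theta=0.75)
```

A larger beta makes the kernel decay faster with distance, so each basis function influences a smaller neighborhood.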

Returns:

A dictionary which contains:

  • X – Current state.

  • valid_ind – The indices of cells that have finite velocity values.

  • X_ctrl – Sample control points of current state.

  • ctrl_idx – Indices for the sampled control points.

  • Y – Velocity estimates in delta t.

  • beta – Parameter of the Gaussian Kernel for the kernel matrix (Gram matrix).

  • V – Prediction of velocity of X.

  • C – Finite set of the coefficients for the

  • P – Posterior probability matrix of inliers.

  • VFCIndex – Indices of inliers found by sparseVFC.

  • sigma2 – Energy change rate.

  • grid – Grid of current state.

  • grid_V – Prediction of velocity of the grid.

  • iteration – Number of the last iteration.

  • tecr_traj – Vector of relative energy change rates compared to the previous step.

  • E_traj – Vector of energy at each iteration.

Here V = f(X). Note that V = con_K(Grid, X_ctrl, beta).dot(C) gives the prediction of velocity on Grid, which can also be replaced by any point in the gene expression state space.

Return type:

dict
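The prediction formula V = con_K(Grid, X_ctrl, beta).dot(C) can be illustrated with a small NumPy sketch. It assumes that con_K builds the Gaussian Gram matrix k(x, y) = exp(-beta*||x-y||^2) described for beta; `con_K_sketch` is a hypothetical stand-in, not dynamo's own function, and the arrays below are random placeholders rather than output of SparseVFC.

```python
import numpy as np


def con_K_sketch(x, y, beta):
    """Hypothetical stand-in for con_K: Gaussian Gram matrix between x and y."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * sq_dist)


rng = np.random.default_rng(0)

# Placeholder values; in practice these would be read from the returned
# dictionary under the keys "X_ctrl", "C" and "beta" listed above.
X_ctrl = rng.normal(size=(5, 2))   # 5 control points in a 2-d state space
C = rng.normal(size=(5, 2))        # one coefficient row per control point
beta = 0.1

# Any points in the state space can be evaluated, not only the original grid.
Grid = rng.normal(size=(10, 2))

# Each predicted velocity is a kernel-weighted sum of the coefficient rows.
grid_V = con_K_sketch(Grid, X_ctrl, beta).dot(C)
```

Because the prediction only needs the control points, the coefficients and beta, the learned vector field can be evaluated at new states without rerunning the EM iteration.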