dynamo.vf.SparseVFC

dynamo.vf.SparseVFC(X, Y, Grid, M=100, a=5, beta=None, ecr=1e-05, gamma=0.9, lambda_=3, minP=1e-05, MaxIter=500, theta=0.75, div_cur_free_kernels=False, velocity_based_sampling=True, sigma=0.8, eta=0.5, seed=0, lstsq_method='drouin', verbose=1)[source]

Apply the sparseVFC (vector field consensus) algorithm to learn a functional form of the vector field over the entire space, robustly and efficiently, from random samples that contain outliers (Ma, Jiayi, et al., Pattern Recognition, 2013).

Parameters
  • X ('np.ndarray') – Current state. This corresponds to, for example, the spliced transcriptomic state.

  • Y ('np.ndarray') – Velocity estimates in delta t. This corresponds to, for example, the inferred spliced transcriptomic velocity or the total RNA velocity estimated by dynamo from metabolic labeling data.

  • Grid ('np.ndarray') – Current state on a grid which is often used to visualize the vector field. This corresponds to, for example, the spliced transcriptomic state or total RNA state.

  • M ('int' (default: 100)) – The number of basis functions to approximate the vector field.

  • a ('float' (default: 5)) – Parameter of the outlier model. We assume the outliers follow a uniform distribution, and the volume of the outliers' variation space is a.

  • beta ('float' (default: None)) – Parameter of the Gaussian kernel, k(x, y) = exp(-beta*||x-y||^2). If None, a rule-of-thumb bandwidth will be computed automatically.

  • ecr ('float' (default: 1e-5)) – The minimum energy change rate allowed in the iteration process.

  • gamma ('float' (default: 0.9)) – Fraction of inliers in the samples. This is only an initial value for the EM iteration, so its exact value is not important.

  • lambda_ ('float' (default: 3)) – Represents the trade-off between the goodness of data fit and regularization. A larger lambda_ puts more weight on regularization.

  • minP ('float' (default: 1e-5)) – The posterior probability matrix P may be singular for matrix inversion. We set the minimum value of P to minP.

  • MaxIter ('int' (default: 500)) – Maximum number of iterations.

  • theta ('float' (default: 0.75)) – Threshold that defines an inlier. If the posterior probability that a sample is an inlier is larger than theta, the sample is regarded as an inlier.

  • div_cur_free_kernels ('bool' (default: False)) – A logical flag that determines whether the divergence-free or curl-free kernels will be used for learning the vector field.

  • sigma ('float' (default: 0.8)) – Bandwidth parameter.

  • eta ('float' (default: 0.5)) – Combination coefficient for the divergence-free or the curl-free kernels.

  • seed (int or 1-d array_like, optional (default: 0)) – Seed for RandomState. Must be convertible to 32 bit unsigned integers. Used in sampling control points. The default is 0 to ensure consistency between different runs.

  • lstsq_method ('str' (default: 'drouin')) – The name of the linear least squares solver, either 'scipy' or 'drouin'.

  • verbose (int (default: 1)) – The level of printing running information.
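The roles of M, beta, and seed in the list above can be illustrated with a small NumPy sketch. It is not dynamo's implementation: `gaussian_gram` and `sample_ctrl_points` are hypothetical helper names introduced here, and the sampling scheme (uniform, without replacement) is an assumption chosen only to show how a fixed seed makes the choice of control points reproducible.

```python
import numpy as np

def gaussian_gram(x, y, beta):
    """Hypothetical helper: K[i, j] = exp(-beta * ||x_i - y_j||^2)."""
    # Pairwise squared Euclidean distances via broadcasting
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-beta * sq_dist)

def sample_ctrl_points(X, M, seed=0):
    """Hypothetical helper: pick at most M distinct row indices of X."""
    # Assumption: uniform sampling without replacement from a seeded RandomState
    rng = np.random.RandomState(seed)
    m = min(M, X.shape[0])
    ctrl_idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[ctrl_idx], ctrl_idx

# Toy data: 200 samples in a 2-D state space
X = np.random.RandomState(1).normal(size=(200, 2))

# M = 100 control points act as centers of the basis functions
X_ctrl, ctrl_idx = sample_ctrl_points(X, M=100, seed=0)

# Kernel matrix between all samples and the control points
K = gaussian_gram(X, X_ctrl, beta=0.1)
```

Here K has one row per sample and one column per control point, so M directly sets the number of basis functions, while a larger beta makes each basis function more local.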

Returns

VecFld

A dictionary which contains:

  • X: Current state.
  • valid_ind: The indices of cells that have finite velocity values.
  • X_ctrl: Sample control points of current state.
  • ctrl_idx: Indices for the sampled control points.
  • Y: Velocity estimates in delta t.
  • beta: Parameter of the Gaussian kernel for the kernel matrix (Gram matrix).
  • V: Prediction of velocity of X.
  • C: Finite set of the coefficients for the
  • P: Posterior probability matrix of inliers.
  • VFCIndex: Indexes of inliers found by sparseVFC.
  • sigma2: Energy change rate.
  • grid: Grid of current state.
  • grid_V: Prediction of velocity of the grid.
  • iteration: Number of the last iteration.
  • tecr_vec: Vector of relative energy change rates compared to the previous step.
  • E_traj: Vector of energy at each iteration,

where V = f(X), P is the posterior probability and VFCIndex is the indexes of inliers found by sparseVFC. Note that V = con_K(Grid, X_ctrl, beta).dot(C) gives the prediction of velocity on Grid (but can also be any point in the gene expression state space).
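The grid prediction described above is a kernel matrix multiplied by the coefficient matrix. The sketch below reproduces that computation with plain NumPy on toy values. The helper `gaussian_gram` is a hypothetical stand-in for con_K, assuming con_K evaluates the Gaussian kernel k(x, y) = exp(-beta*||x-y||^2) between its first two arguments; X_ctrl and C are random placeholders, not fitted values.

```python
import numpy as np

def gaussian_gram(x, y, beta):
    """Hypothetical stand-in for con_K: K[i, j] = exp(-beta * ||x_i - y_j||^2)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-beta * np.sum(diff ** 2, axis=-1))

rng = np.random.RandomState(0)
X_ctrl = rng.normal(size=(5, 2))   # placeholder control points (M = 5)
C = rng.normal(size=(5, 2))        # placeholder coefficients, one row per control point
beta = 0.1

# Build a regular 4 x 3 grid over the 2-D state space, one point per row
xs = np.linspace(-2, 2, 4)
ys = np.linspace(-2, 2, 3)
gx, gy = np.meshgrid(xs, ys)
Grid = np.column_stack([gx.ravel(), gy.ravel()])

# Predicted velocity at every grid point: kernel matrix times coefficients
K_grid = gaussian_gram(Grid, X_ctrl, beta)
grid_V = K_grid.dot(C)
```

Because the prediction depends only on X_ctrl, beta, and C, the same expression can be evaluated at any set of points, not just a regular grid.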

Return type

‘dict’
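As an illustration of how theta relates P to the inlier indices, the following sketch thresholds a mock posterior matrix. The dictionary here is a toy object with made-up values, not the output of a real SparseVFC call, and the column-vector shape of P is an assumption.

```python
import numpy as np

# Mock result with made-up posterior probabilities (assumed shape: N x 1)
VecFld = {
    "P": np.array([[0.95], [0.10], [0.80], [0.75], [0.40]]),
}
theta = 0.75

# Flatten so the code works for both (N,) and (N, 1) layouts
P = np.asarray(VecFld["P"]).ravel()

# A sample counts as an inlier only if its posterior is strictly larger than theta
inlier_idx = np.where(P > theta)[0]
outlier_idx = np.where(P <= theta)[0]
```

With these toy values the sample whose posterior equals theta exactly is not counted as an inlier, because the comparison is strict.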