dynamo.vf.SparseVFC
- dynamo.vf.SparseVFC(X, Y, Grid, M=100, a=5, beta=None, ecr=1e-05, gamma=0.9, lambda_=3, minP=1e-05, MaxIter=500, theta=0.75, div_cur_free_kernels=False, velocity_based_sampling=True, sigma=0.8, eta=0.5, seed=0, lstsq_method='drouin', verbose=1)
Apply the sparseVFC (vector field consensus) algorithm to robustly and efficiently learn a functional form of the vector field over the entire space from random samples that contain outliers (Ma, Jiayi, et al., Pattern Recognition, 2013).
- Parameters:
X (ndarray) – Current state. This corresponds to, for example, the spliced transcriptomic state.
Y (ndarray) – Velocity estimates in delta t. This corresponds to, for example, the inferred spliced transcriptomic velocity, or the total RNA velocity based on metabolic labeling data, as estimated by dynamo.
Grid (ndarray) – Current state on a grid, which is often used to visualize the vector field. This corresponds to, for example, the spliced transcriptomic state or total RNA state.
M (int) – The number of basis functions used to approximate the vector field.
a (float) – Parameter of the outlier model. Outliers are assumed to follow a uniform distribution, and a is the volume of the space over which they vary.
beta (Optional[float]) – Parameter of the Gaussian kernel, k(x, y) = exp(-beta*||x-y||^2). If None, a rule-of-thumb bandwidth is computed automatically.
ecr (float) – Lower limit on the energy change rate during the iteration process.
gamma (float) – Percentage of inliers in the samples. This is only the initial value for the EM iteration, so its exact choice is not critical.
lambda_ – Represents the trade-off between goodness of fit to the data and regularization. A larger lambda_ puts more weight on regularization.
minP (float) – The posterior probability matrix P may be singular for matrix inversion, so the minimum value of P is set to minP.
MaxIter (int) – Maximum number of iterations.
theta (float) – Threshold for deciding whether a sample is an inlier. If the posterior probability that a sample is an inlier is larger than theta, it is regarded as an inlier.
div_cur_free_kernels (bool) – A logical flag that determines whether the divergence-free or curl-free kernels will be used for learning the vector field.
sigma (float) – Bandwidth parameter.
eta (float) – Combination coefficient for the divergence-free or the curl-free kernels.
seed (Union[int, ndarray]) – Seed for RandomState; an int or 1-d array_like, optional (default: 0). Must be convertible to 32-bit unsigned integers. Used when sampling control points. The default of 0 ensures consistency between different runs.
lstsq_method (str) – The name of the linear least squares solver; either 'scipy' or 'drouin'.
verbose (int) – The level of detail of the printed run information.
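The roles of minP and theta described above can be illustrated with a small NumPy sketch. This is not dynamo's internal code: the posterior values are made up for illustration, and the variable names `P_floored` and `inlier_idx` are hypothetical.

```python
import numpy as np

# Made-up posterior inlier probabilities for five samples (illustration only).
P = np.array([0.98, 0.0, 0.80, 0.40, 0.75])
minP = 1e-5   # same value as the documented default
theta = 0.75  # same value as the documented default

# Floor the posteriors at minP so that no entry is exactly zero.
P_floored = np.maximum(P, minP)

# A sample counts as an inlier only if its posterior is strictly larger than theta.
inlier_idx = np.where(P > theta)[0]
```

With these values, the sample whose posterior equals theta exactly (0.75) is not counted as an inlier, because the comparison described in the parameter text is "larger than".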
- Returns:
A dictionary which contains:
X: Current state.
valid_ind: The indices of cells that have finite velocity values.
X_ctrl: Sample control points of current state.
ctrl_idx: Indices for the sampled control points.
Y: Velocity estimates in delta t.
beta: Parameter of the Gaussian kernel for the kernel matrix (Gram matrix).
V: Prediction of velocity of X.
C: Finite set of the coefficients for the
P: Posterior probability matrix of inliers.
VFCIndex: Indexes of inliers found by sparseVFC.
sigma2: Energy change rate.
grid: Grid of current state.
grid_V: Prediction of velocity of the grid.
iteration: Number of the last iteration.
tecr_traj: Vector of relative energy change rates compared to the previous step.
E_traj: Vector of energy at each iteration.
Here V = f(X), P is the posterior probability, and VFCIndex holds the indexes of inliers found by sparseVFC. Note that V = con_K(Grid, X_ctrl, beta).dot(C) gives the prediction of velocity on Grid (Grid can also be any point in the gene expression state space).
- Return type:
dict
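The prediction formula V = con_K(Grid, X_ctrl, beta).dot(C) can be sketched in plain NumPy using the Gaussian kernel k(x, y) = exp(-beta*||x-y||^2) given in the parameter list. The helper `gaussian_gram` below is a hypothetical stand-in for con_K, not dynamo's implementation, and the control points, coefficients, and query points are random placeholders rather than output of SparseVFC.

```python
import numpy as np


def gaussian_gram(x, y, beta):
    """Gram matrix with entries exp(-beta * ||x_i - y_j||^2)."""
    diff = x[:, None, :] - y[None, :, :]   # pairwise differences, shape (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)   # squared Euclidean distances, shape (n, m)
    return np.exp(-beta * sq_dist)


rng = np.random.default_rng(0)
X_ctrl = rng.normal(size=(5, 2))   # placeholder control points (5 points, 2 dims)
C = rng.normal(size=(5, 2))        # placeholder coefficients, one row per control point
Grid = rng.normal(size=(10, 2))    # placeholder query points
beta = 0.1                         # arbitrary kernel parameter for the sketch

K = gaussian_gram(Grid, X_ctrl, beta)  # kernel between query and control points
grid_V = K.dot(C)                      # predicted velocity at each query point
```

Because the prediction depends only on the control points, beta, and C, the same two lines can be applied to any set of query points with the same number of dimensions, not just the grid.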