Class
Estimation
Conventional scRNA-seq (est.csc)

class csc.ss_estimation(U=None, Ul=None, S=None, Sl=None, P=None, US=None, S2=None, conn=None, t=None, ind_for_proteins=None, model='stochastic', est_method='gmm', experiment_type='deg', assumption_mRNA=None, assumption_protein='ss', concat_data=True, cores=1, **kwargs)
The class that estimates parameters with input data.
 Parameters
U (ndarray or sparse csr_matrix) – A matrix of unspliced mRNA counts.
Ul (ndarray or sparse csr_matrix) – A matrix of unspliced, labeled mRNA counts.
S (ndarray or sparse csr_matrix) – A matrix of spliced mRNA counts.
Sl (ndarray or sparse csr_matrix) – A matrix of spliced, labeled mRNA counts.
P (ndarray or sparse csr_matrix) – A matrix of protein counts.
US (ndarray or sparse csr_matrix) – A matrix of the second moment of unspliced/spliced gene expression counts for conventional or NTR velocity.
S2 (ndarray or sparse csr_matrix) – A matrix of the second moment of spliced gene expression counts for conventional or NTR velocity.
conn (ndarray or sparse csr_matrix) – The connectivity matrix that can be used to calculate the first/second moments of the data.
t (ndarray) – A vector of time points.
ind_for_proteins (ndarray) – A 1D vector of the indices in the U, Ul, S, Sl layers that correspond to the row names in the protein or X_protein key of the .obsm attribute.
experiment_type (str) – Labeling experiment type. Available options are: (1) ‘deg’: degradation experiment; (2) ‘kin’: synthesis experiment; (3) ‘oneshot’: one-shot kinetic experiment; (4) ‘mix_std_stm’: a mixed steady state and stimulation labeling experiment.
assumption_mRNA (str) – Parameter estimation assumption for mRNA. Available options are: (1) ‘ss’: pseudo steady state; (2) None: kinetic data with no assumption.
assumption_protein (str) – Parameter estimation assumption for protein. Available options are: (1) ‘ss’: pseudo steady state.
concat_data (bool (default: True)) – Whether to concatenate data.
cores (int (default: 1)) – Number of cores used to run the estimation. If cores is set to be > 1, multiprocessing will be used to parallelize the parameter estimation.
 Returns
t (ndarray) – A vector of time points.
data (dict) – A dictionary with uu, ul, su, sl, p as its keys.
extyp (str) – Labeling experiment type.
asspt_mRNA (str) – Parameter estimation assumption for mRNA.
asspt_prot (str) – Parameter estimation assumption for protein.
parameters (dict) – A dictionary with alpha, beta, gamma, eta, delta as its keys:
alpha: transcription rate
beta: RNA splicing rate
gamma: spliced mRNA degradation rate
eta: translation rate
delta: protein degradation rate

concatenate_data()
Concatenate available data into a single matrix.
See “concat_time_series_matrices” for details.

fit(intercept=False, perc_left=None, perc_right=5, clusters=None, one_shot_method='combined')
Fit the input data to estimate all or a subset of the parameters.
 Parameters
intercept (bool) – If using the steady state assumption for fitting, then: True – the linear regression is performed with an unfixed intercept; False – the linear regression is performed with a fixed zero intercept.
perc_left (float (default: None)) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the samples are included.
perc_right (float (default: 5)) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
clusters (list) – A list of n clusters; each element is a list of indices of the samples that belong to this cluster.

fit_alpha_oneshot(t, U, beta, clusters=None)
Estimate alpha with the one-shot data.
 Parameters
 Returns
alpha – A numpy array with the dimension of n_genes x clusters.
 Return type
ndarray
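
As a rough illustration of how a one-shot estimate can work: if labeled unspliced RNA starts at zero and follows du/dt = alpha - beta * u, then a single labeling time t gives u(t) = alpha / beta * (1 - exp(-beta * t)), which can be inverted for alpha. The helper name `alpha_from_oneshot`, the scalar inputs, and the zero initial condition are assumptions made for this sketch; it is not the library's implementation.

```python
import math


def alpha_from_oneshot(t, u, beta):
    """Invert u(t) = alpha / beta * (1 - exp(-beta * t)) for alpha.

    Hypothetical helper: assumes u(0) = 0 and a constant splicing rate beta.
    """
    return beta * u / (1.0 - math.exp(-beta * t))


# Round trip: simulate u at t = 2 with known rates, then recover alpha.
alpha_true, beta, t = 3.0, 0.5, 2.0
u_obs = alpha_true / beta * (1.0 - math.exp(-beta * t))
alpha_est = alpha_from_oneshot(t, u_obs, beta)
```

The same formula applies per gene and per cluster once u is replaced by the corresponding mean labeled count.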

fit_beta_gamma_lsq(t, U, S)
Estimate beta and gamma with the degradation data using the least squares method.
 Parameters
t (ndarray) – A vector of time points.
U (ndarray) – A matrix of unspliced mRNA counts. Dimension: genes x cells.
S (ndarray) – A matrix of spliced mRNA counts. Dimension: genes x cells.
 Returns
beta (ndarray) – A vector of betas for all the genes.
gamma (ndarray) – A vector of gammas for all the genes.
u0 (float) – Initial value of u.
s0 (float) – Initial value of s.

fit_gamma_nosplicing_lsq(t, L)
Estimate gamma with the degradation data using the least squares method when there is no splicing data.
 Parameters
t (ndarray) – A vector of time points.
L (ndarray) – A matrix of labeled mRNA counts. Dimension: genes x cells.
 Returns
gamma (ndarray) – A vector of gammas for all the genes.
l0 (float) – The estimated value for the initial spliced, labeled mRNA count.
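
To make the idea concrete, here is a minimal single-gene sketch under the assumption that labeled RNA decays as l(t) = l0 * exp(-gamma * t). It fits a straight line to log l versus t, which is a log-linearised least squares and may differ from the optimizer the library actually uses; the function name `fit_gamma_loglinear` is hypothetical.

```python
import math


def fit_gamma_loglinear(t, l):
    """Least-squares line through (t, log l); returns (gamma, l0).

    Hypothetical helper: assumes l(t) = l0 * exp(-gamma * t) and l > 0.
    """
    n = len(t)
    y = [math.log(v) for v in l]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # The slope of log l is -gamma; the intercept is log l0.
    return -slope, math.exp(intercept)


# Noise-free toy data generated with gamma = 0.3 and l0 = 10.
t = [0.0, 1.0, 2.0, 4.0]
l = [10.0 * math.exp(-0.3 * ti) for ti in t]
gamma, l0 = fit_gamma_loglinear(t, l)
```

With noisy counts, a fit on the original scale weights high-count time points more heavily than this log-scale fit does.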

fit_gamma_steady_state(u, s, intercept=True, perc_left=None, perc_right=5, normalize=True)
Estimate gamma using linear regression based on the steady state assumption.
 Parameters
u (ndarray or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.
s (ndarray or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.
intercept (bool) – If using the steady state assumption for fitting, then: True – the linear regression is performed with an unfixed intercept; False – the linear regression is performed with a fixed zero intercept.
perc_left (float) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the left samples are excluded.
perc_right (float) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
normalize (bool) – Whether to first normalize the data.
 Returns
k (float) – The slope of the linear regression model, which is gamma under the steady state assumption.
b (float) – The intercept of the linear regression model.
r2 (float) – Coefficient of determination (R-squared) for the extreme data points.
all_r2 (float) – Coefficient of determination (R-squared) for all data points.
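
The regression on extreme data points can be sketched for a single gene as follows. This is a simplified illustration, not the library's code: it keeps only the right tail, ranks cells by u + s (an assumption), and skips normalization and the R-squared values. The name `fit_gamma_extremes` is hypothetical.

```python
def fit_gamma_extremes(u, s, perc_right=5.0, intercept=False):
    """Regress u on s over the right-tail cells; returns (k, b).

    Hypothetical helper: cells are ranked by u + s, which is an assumption.
    """
    n = len(s)
    n_keep = min(n, max(2, int(round(n * perc_right / 100.0))))
    # Indices of the n_keep cells with the largest total expression.
    idx = sorted(range(n), key=lambda i: u[i] + s[i])[-n_keep:]
    us = [u[i] for i in idx]
    ss = [s[i] for i in idx]
    m = len(idx)
    if intercept:
        # Ordinary least squares with a free intercept.
        s_mean = sum(ss) / m
        u_mean = sum(us) / m
        sxy = sum((si - s_mean) * (ui - u_mean) for si, ui in zip(ss, us))
        sxx = sum((si - s_mean) ** 2 for si in ss)
        k = sxy / sxx
        b = u_mean - k * s_mean
    else:
        # Fixed zero intercept: least-squares line through the origin.
        k = sum(si * ui for si, ui in zip(ss, us)) / sum(si * si for si in ss)
        b = 0.0
    return k, b


# Noise-free toy gene with u = 0.2 * s, so the slope should be 0.2.
s = [float(i) for i in range(1, 101)]
u = [0.2 * si for si in s]
k, b = fit_gamma_extremes(u, s, perc_right=5.0)
k_free, b_free = fit_gamma_extremes(u, s, perc_right=5.0, intercept=True)
```

Restricting the fit to extreme cells is meant to select cells closest to steady state, where the ratio of unspliced to spliced counts reflects gamma.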

fit_gamma_stochastic(est_method, u, s, us, ss, perc_left=None, perc_right=5, normalize=True)
Estimate gamma using GMM (generalized method of moments) or the negbin (negative binomial) distribution based on the steady state assumption.
 Parameters
est_method (str {gmm, negbin}) – The estimation method to be used when using the stochastic model. Available options when the model is ‘ss’ include: (1) ‘gmm’: our new generalized method of moments, which is based on master equations, similar to the “moment” model in the excellent scVelo package; (2) ‘negbin’: our new method that models steady state RNA expression as a negative binomial distribution, also built upon master equations.
Note that all of these methods require using extreme data points for estimation, except negbin, which uses all data points. Extreme data points are defined as the data from cells whose expression of unspliced / spliced or new / total RNA, etc. is in the top or bottom 5%, for example. linear_regression only considers the mean of RNA species (based on the deterministic ordinary differential equations), while moment-based methods (gmm, negbin) consider both the first moment (mean) and the second moment (uncentered variance) of RNA species (based on the stochastic master equations).
The above methods are all (generalized) linear regression based methods. In addition to the estimated parameters (including RNA half-life), they return R-squared (either just for extreme data points or for all data points) as well as the log-likelihood of the fitting, which will be used for the transition matrix and velocity embedding. All est_method options use least squares to estimate optimal parameters, with a Latin hypercube sampler for initial sampling.
u (ndarray or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.
s (ndarray or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.
us (ndarray or sparse csr_matrix) – A matrix of the second moment of unspliced/spliced mRNA counts. Dimension: genes x cells.
ss (ndarray or sparse csr_matrix) – A matrix of the second moment of spliced mRNA counts. Dimension: genes x cells.
perc_left (float) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the left samples are excluded.
perc_right (float) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
normalize (bool) – Whether to first normalize the data.
 Returns
k (float) – The slope of the linear regression model, which is gamma under the steady state assumption.
b (float) – The intercept of the linear regression model.
r2 (float) – Coefficient of determination (R-squared) for the extreme data points.
all_r2 (float) – Coefficient of determination (R-squared) for all data points.

get_exist_data_names()
Get the names of all the data that are not ‘None’.

get_n_genes(key=None, data=None)
Get the number of genes.

set_parameter(name, value)
Set the value for the specified parameter.
 Parameters
name (string) – The name of the parameter. E.g. ‘beta’.
value (ndarray) – A vector of values for the parameter to be set to.

solve_alpha_mix_std_stm(t, ul, beta, clusters=None, alpha_time_dependent=True)
Estimate the steady state transcription rate and analytically calculate the stimulation transcription rate given beta and steady state alpha for a mixed steady state and stimulation labeling experiment.
This approach assumes the same constant beta or gamma for both the steady state and stimulation periods.
 Parameters
t (list or numpy.ndarray) – Time period for stimulation state labeling for each cell.
ul – A vector of labeled RNA amount in each cell.
beta (numpy.ndarray) – A list of splicing rates for genes.
clusters (list) – A list of n clusters; each element is a list of indices of the samples that belong to this cluster.
alpha_time_dependent (bool) – Whether or not to model the stimulation alpha rate as a time-dependent variable.
 Returns
alpha_std, alpha_stm – The constant steady state transcription rate (alpha_std) and the time-dependent or time-independent (determined by alpha_time_dependent) stimulation transcription rate (alpha_stm).
 Return type
numpy.ndarray, numpy.ndarray

class csc.velocity(alpha=None, beta=None, gamma=None, eta=None, delta=None, t=None, estimation=None)
The class that computes RNA/protein velocity given the kinetic parameters.
 Parameters
alpha (ndarray) – A matrix of transcription rates.
beta (ndarray) – A vector of splicing rate constants, one for each gene.
gamma (ndarray) – A vector of spliced mRNA degradation rate constants, one for each gene.
eta (ndarray) – A vector of protein synthesis rate constants, one for each gene.
delta (ndarray) – A vector of protein degradation rate constants, one for each gene.
t (ndarray or None (default: None)) – A vector of the measured time points for cells.
estimation (ss_estimation) – An instance of the estimation class. If this is not None, the parameters will be taken from this class instead of the input arguments.

get_n_cells()
Get the number of cells if the parameter alpha is given.
 Returns
n_cells – The second dimension of the alpha matrix, if alpha is given.
 Return type

get_n_genes()
Get the number of genes.
 Returns
n_genes – The first dimension of the alpha matrix, if alpha is given. Or, the length of beta, gamma, eta, or delta, if they are given.
 Return type

vel_p(S, P)
Calculate the protein velocity.
 Parameters
S (ndarray or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.
P (ndarray or sparse csr_matrix) – A matrix of protein counts. Dimension: genes x cells.
 Returns
V – Each column of V is a velocity vector for the corresponding cell. Dimension: genes x cells.
 Return type
ndarray or sparse csr_matrix

vel_s(U, S)
Calculate the spliced mRNA velocity.
 Parameters
U (ndarray or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.
S (ndarray or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.
 Returns
V – Each column of V is a velocity vector for the corresponding cell. Dimension: genes x cells.
 Return type
ndarray or sparse csr_matrix

vel_u(U)
Calculate the unspliced mRNA velocity.
 Parameters
U (ndarray or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.
 Returns
V – Each column of V is a velocity vector for the corresponding cell. Dimension: genes x cells.
 Return type
ndarray or sparse csr_matrix
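
For orientation, the three velocities can be written out for a single gene in a single cell using the standard first-order kinetic model: du/dt = alpha - beta * u, ds/dt = beta * u - gamma * s, dp/dt = eta * s - delta * p. That the class evaluates exactly these expressions is an assumption of this sketch; the functions below are standalone and take the rate constants explicitly rather than reading them from a velocity instance.

```python
def vel_u(alpha, beta, u):
    """Unspliced velocity: transcription minus splicing."""
    return alpha - beta * u


def vel_s(beta, gamma, u, s):
    """Spliced velocity: splicing minus spliced mRNA degradation."""
    return beta * u - gamma * s


def vel_p(eta, delta, s, p):
    """Protein velocity: translation minus protein degradation."""
    return eta * s - delta * p


# Illustrative rate constants (not taken from any dataset).
alpha, beta, gamma, eta, delta = 2.0, 0.5, 0.25, 1.0, 0.1

# At the steady state u = alpha / beta, s = beta * u / gamma, p = eta * s / delta,
# all three velocities vanish.
u_ss, s_ss, p_ss = 4.0, 8.0, 80.0
steady = (vel_u(alpha, beta, u_ss), vel_s(beta, gamma, u_ss, s_ss), vel_p(eta, delta, s_ss, p_ss))

# Below steady state in u: unspliced RNA is still accumulating, spliced RNA is decaying.
induced = (vel_u(alpha, beta, 2.0), vel_s(beta, gamma, 2.0, s_ss))
```

Applied to genes x cells matrices, the same expressions are evaluated elementwise, with the per-gene rate vectors broadcast across cells.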
Time-resolved metabolic labeling based scRNA-seq (est.tsc)
Base class: a general estimation framework

class tsc.kinetic_estimation(param_ranges, x0_ranges, simulator)
A general parameter estimation framework for all types of time-series data.
 Parameters
param_ranges (ndarray) – An n-by-2 numpy array containing the lower and upper ranges of n parameters (and initial conditions if not fixed).
x0_ranges (ndarray) – Lower and upper bounds for initial conditions for the integrators. To fix a parameter, set its lower and upper bounds to the same value.
simulator (utils_kinetic.Linear_ODE) – An instance of a Python class that solves ODEs. It should have the properties ‘t’ (k time points, 1d numpy array), ‘x0’ (initial conditions for m species, 1d numpy array), and ‘x’ (solution, k-by-m array), as well as two functions: integrate (numerical integration) and solve (analytical method).

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.
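
The ‘lhs’ option refers to Latin hypercube sampling. As a sketch of what that means for drawing n_p0 starting points inside the bounds: each parameter range is cut into n_p0 equal strata, one value is drawn per stratum, and the strata are shuffled independently per parameter. The function below is a standalone illustration with hypothetical names and a list of (lower, upper) pairs for the bounds; it is not the sampler used by the library.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so that every stratum of every dimension is used once."""
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n_samples
        # One uniform draw inside each stratum, then shuffle the stratum order.
        col = [lower + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose the per-parameter columns into a list of parameter vectors.
    return [list(point) for point in zip(*columns)]


rng = random.Random(0)
bounds = [(0.0, 1.0), (0.0, 10.0)]  # (lower, upper) for each parameter
p0_candidates = latin_hypercube(4, bounds, rng)
```

Compared with uniform sampling, this spreads a small number of initial guesses more evenly across each parameter range, which helps a local optimizer avoid starting every run in the same basin.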

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.
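
The statistic sum_i (O_i - E_i)^2 / E_i and its one-tailed p-value can be reproduced with the standard library alone. The sketch below is illustrative: the observed and expected values are made up, the degrees of freedom are passed in by hand rather than derived from a fitted model, and the upper-tail probability uses a series expansion of the regularized incomplete gamma function that is adequate for moderate statistic values.

```python
import math


def chi2_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all data points."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma: sum_n z^n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


observed = [10.0, 20.0, 30.0]  # made-up data points O_i
expected = [12.0, 18.0, 30.0]  # made-up model predictions E_i
c2 = chi2_statistic(observed, expected)
p = chi2_sf(c2, df=2)          # df chosen by hand for this toy example
```

A large p-value means the deviations between data and model are compatible with chance; a small one suggests the model does not describe the data well.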
Deterministic models via analytical solution of ODEs

class tsc.Estimation_DeterministicDeg(beta=None, gamma=None, x0=None)
An estimation class for degradation (with splicing) experiments.
Order of species: <unspliced>, <spliced>
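
As background for what an analytical solution looks like here: degradation with splicing is commonly modelled as du/dt = -beta * u and ds/dt = beta * u - gamma * s, which has a closed form when beta differs from gamma. That this class fits exactly this model is an assumption of the sketch; the function name is hypothetical.

```python
import math


def degradation_solution(t, beta, gamma, u0, s0):
    """Closed-form u(t), s(t) for du/dt = -beta*u, ds/dt = beta*u - gamma*s.

    Illustrative helper; requires beta != gamma.
    """
    eb = math.exp(-beta * t)
    eg = math.exp(-gamma * t)
    u = u0 * eb
    s = s0 * eg + beta * u0 / (gamma - beta) * (eb - eg)
    return u, s


# Illustrative rates and initial conditions (not from any dataset).
beta, gamma, u0, s0 = 0.5, 0.2, 10.0, 5.0
u1, s1 = degradation_solution(1.0, beta, gamma, u0, s0)
```

A least-squares fit then searches for the beta, gamma, and initial conditions whose curves best match the measured unspliced and spliced time series.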

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_DeterministicDegNosp(gamma=None, x0=None)
An estimation class for degradation (without splicing) experiments.

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_DeterministicKinNosp(alpha, gamma, x0=0)
An estimation class for kinetics (without splicing) experiments with the deterministic model.
Order of species: <unspliced>, <spliced>
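
For intuition, a kinetics experiment without splicing can be described by a single labeled RNA species r with dr/dt = alpha - gamma * r, whose solution rises from x0 toward the plateau alpha / gamma. That this is the model behind the class is an assumption of the sketch; the function name `labeled_rna` and the rate values are illustrative.

```python
import math


def labeled_rna(t, alpha, gamma, x0=0.0):
    """Solution of dr/dt = alpha - gamma * r with r(0) = x0."""
    decay = math.exp(-gamma * t)
    return alpha / gamma * (1.0 - decay) + x0 * decay


# Illustrative rates: synthesis 2.0 per unit time, degradation 0.5 per unit time.
alpha, gamma = 2.0, 0.5
times = (0.0, 1.0, 2.0, 4.0, 8.0)
trajectory = [labeled_rna(t, alpha, gamma) for t in times]
```

Early time points mostly constrain alpha (the initial slope), while the approach to the plateau constrains gamma, which is why labeling time courses that cover both regimes give better-identified parameters.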

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_DeterministicKin(alpha, beta, gamma, x0=array([0.0, 0.0]))
An estimation class for kinetics experiments with the deterministic model.
Order of species: <unspliced>, <spliced>

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.

Stochastic models via matrix form of moment equations

class tsc.Estimation_MomentDeg(beta=None, gamma=None, x0=None, include_cov=True)
An estimation class for degradation (with splicing) experiments.
Order of species: <unspliced>, <spliced>, <uu>, <ss>, <us>
Order of parameters: beta, gamma

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_MomentDegNosp(gamma=None, x0=None)
An estimation class for degradation (without splicing) experiments.
Order of species: <r>, <rr>

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_MomentKin(a, b, alpha_a, alpha_i, beta, gamma, include_cov=True)
An estimation class for kinetics experiments.
Order of species: <unspliced>, <spliced>, <uu>, <ss>, <us>

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either:
1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or
2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is an m-by-k matrix of data, where m is the number of species.
Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Estimation_MomentKinNosp(a, b, alpha_a, alpha_i, gamma)
An estimation class for kinetics experiments.
Order of species: <r>, <rr>

fit_lsq
(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)¶ Fit timeseris data using least squares
 Parameters
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A mbyn numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (
ndarray
) – Optimal parameters.cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either (1) stratified moments: ‘t’ is an array of k distinct time points and ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or (2) raw data: ‘t’ is an array of k time points for k cells and ‘x_data’ is an m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.
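
As a rough illustration of the statistic described above, the sketch below evaluates sum_i (O_i - E_i)^2 / E_i for paired observations and model predictions. The function name pearson_chi2 and the toy numbers are hypothetical. How the class itself flattens x_data, normalizes it, and chooses the degrees of freedom is not shown here.

```python
def pearson_chi2(observed, expected):
    """Return sum_i (O_i - E_i)^2 / E_i for paired data and predictions."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Toy values for illustration only.
observed = [18.0, 22.0, 30.0, 30.0]
expected = [20.0, 20.0, 30.0, 30.0]
c2 = pearson_chi2(observed, expected)
print(c2)
```

Given c2 and a number of degrees of freedom df, a one-tailed p-value is typically obtained from the chi-square survival function, for example scipy.stats.chi2.sf(c2, df).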

Mixture models for kinetic / degradation experiments

class tsc.Lambda_NoSwitching(model1, model2, alpha=None, lambd=None, gamma=None, x0=None, beta=None)
An estimation class with the mixture model. If beta is None, it is assumed that the data does not have the splicing process.

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: Latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, the default method is used.
normalize (bool, optional, default: True) – Whether to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either (1) stratified moments: ‘t’ is an array of k distinct time points and ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or (2) raw data: ‘t’ is an array of k time points for k cells and ‘x_data’ is an m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.


class tsc.Mixture_KinDeg_NoSwitching(model1, model2, alpha=None, gamma=None, x0=None, beta=None)
An estimation class with the mixture model. If beta is None, it is assumed that the data does not have the splicing process.

fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-series data using least squares.
 Parameters
t (ndarray) – A numpy array of n time points.
x_data (ndarray) – An m-by-n numpy array of m species, each having n values for the n time points.
p0 (numpy.ndarray, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.
n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: Latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, the default method is used.
normalize (bool, optional, default: True) – Whether to normalize values in x_data across species, so that large values do not dominate the optimizer.
 Returns
popt (ndarray) – Optimal parameters.
cost (float) – The cost function evaluated at the optimum.

test_chi2(t, x_data, species=None, method='matrix', normalize=True)
Perform a Pearson’s chi-square test. The statistic is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model prediction.
The data can be either (1) stratified moments: ‘t’ is an array of k distinct time points and ‘x_data’ is an m-by-k matrix of data, where m is the number of species; or (2) raw data: ‘t’ is an array of k time points for k cells and ‘x_data’ is an m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to be monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
 Returns
p (float) – The p-value of a one-tailed chi-square test.
c2 (float) – The chi-square statistic.
df (int) – Degrees of freedom.

Vector field
Vector field class

class dynamo.vf.vectorfield(X=None, V=None, Grid=None, **kwargs)
Initialize the VectorField class.
 Parameters
X ('np.ndarray' (dimension: n_obs x n_features)) – Original data.
V ('np.ndarray' (dimension: n_obs x n_features)) – Velocities of cells in the same order and dimension of X.
Grid ('np.ndarray') – The function that returns diffusion matrix which can be dependent on the variables (for example, genes)
M ('int' (default: None)) – The number of basis functions to approximate the vector field. By default it is calculated as min(len(X), int(1500 * np.log(len(X)) / (np.log(len(X)) + np.log(100)))). As a result, any dataset with fewer than about 900 data points (cells) uses the full data for vector field reconstruction, while any larger dataset uses at most 1500 data points.
a (float (default: 5)) – Parameter of the model of outliers. We assume the outliers obey a uniform distribution, and the volume of the outliers’ variation space is a.
beta (float (default: None)) – Parameter of the Gaussian kernel, k(x, y) = exp(-beta*||x-y||^2). If None, a rule-of-thumb bandwidth will be computed automatically.
ecr (float (default: 1e-5)) – Lower limit on the energy change rate in the iteration process.
gamma (float (default: 0.9)) – Percentage of inliers in the samples. This is an initial value for the EM iteration and is not critical.
lambda (float (default: 3)) – Represents the trade-off between the goodness of data fit and regularization.
minP (float (default: 1e-5)) – The posterior probability matrix P may be singular for matrix inversion. We set the minimum value of P as minP.
MaxIter (int (default: 500)) – Maximum number of iterations.
theta (float (default: 0.75)) – Defines the threshold for being an inlier. If the posterior probability that a sample is an inlier is larger than theta, it is regarded as an inlier.
div_cur_free_kernels (bool (default: False)) – A logic flag to determine whether the divergence-free or curl-free kernels will be used for learning the vector field.
sigma ('int') – Bandwidth parameter.
eta ('int') – Combination coefficient for the divergence-free or the curl-free kernels.
seed (int or 1d array_like, optional (default: 0)) – Seed for RandomState. Must be convertible to 32-bit unsigned integers. Used in sampling control points. Defaults to 0 to ensure consistency between different runs.
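
To see how the default for M behaves, the sketch below evaluates the formula quoted in the parameter list for a few dataset sizes. It uses math.log in place of np.log so that it needs only the standard library. The function name default_num_basis is hypothetical and is not part of the library.

```python
import math


def default_num_basis(n_cells):
    """Default M: min(n, int(1500 * log(n) / (log(n) + log(100))))."""
    log_n = math.log(n_cells)
    return min(n_cells, int(1500 * log_n / (log_n + math.log(100))))


# Small datasets keep every cell; larger ones are capped below 1500.
for n in (100, 500, 2000, 100000):
    print(n, default_num_basis(n))
```

The ratio log(n) / (log(n) + log(100)) grows slowly toward 1, so the cap approaches 1500 only for very large datasets.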

fit(normalize=False, method='SparseVFC', **kwargs)
Learn a function of the vector field from sparse single-cell samples in the entire space robustly. Reference: Regularized vector field learning with sparse approximation for mismatch removal, Ma, Jiayi, et al., Pattern Recognition.
 Parameters
normalize ('bool' (default: False)) – Logic flag to determine whether to normalize the data to have zero means and unit covariance. This is often required for raw datasets (for example, raw UMI counts and RNA velocity values in high dimension), but it is normally not required for low-dimensional embeddings by PCA or other nonlinear dimension reduction methods.
method ('string') – Method that is used to reconstruct the vector field functionally. Currently only SparseVFC is supported, but other improved approaches are under development.
 Returns
VecFld – A dictionary which contains X, Y, beta, V, C, P, VFCIndex, where V = f(X), P is the posterior probability and VFCIndex is the indexes of inliers found by VFC.
 Return type
dict

get_Jacobian(method='analytical', input_vector_convention='row', **kwargs)
Get the Jacobian of the vector field function. If method is ‘analytical’: the analytical Jacobian is returned, and it always takes row vectors as input no matter what input_vector_convention is.
If method is ‘numerical’: if input_vector_convention is ‘row’, fjac takes row vectors as input; otherwise the input should be an array of column vectors. Note that the returned Jacobian behaves exactly the same if the input is a 1d array.
The column vector convention is slightly faster than the row vector convention, so a matrix in row vector convention is converted into column vector convention under the hood.
Regardless of the method and input vector convention, the returned Jacobian has the following format:
df_1/dx_1 df_1/dx_2 df_1/dx_3 …
df_2/dx_1 df_2/dx_2 df_2/dx_3 …
df_3/dx_1 df_3/dx_2 df_3/dx_3 …
… … … …
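
The layout above places df_i/dx_j in row i, column j. The following sketch shows one way a numerical Jacobian with that layout could be computed by central differences. It is an illustration under stated assumptions, not the library's ‘numerical’ method; numerical_jacobian and toy_field are hypothetical names.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian with J[i][j] = d f_i / d x_j."""
    columns = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Column j: sensitivity of every output to input coordinate j.
        columns.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    # Transpose so that rows index outputs f_i and columns index inputs x_j.
    return [list(row) for row in zip(*columns)]


def toy_field(x):
    # Hypothetical 2D vector field used only for illustration.
    return [x[0] * x[1], x[0] + 3.0 * x[1]]


J = numerical_jacobian(toy_field, [2.0, 5.0])
for row in J:
    print(row)
```

For this toy field the analytical Jacobian at (2, 5) is [[5, 2], [1, 3]], which the finite-difference result should match closely.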

evaluate(CorrectIndex, VFCIndex, siz)
Evaluate the precision, recall, corrRate of the sparseVFC algorithm.
 Parameters
CorrectIndex ('List') – Ground truth indexes of the correct vector field samples.
VFCIndex ('List') – Indexes of the correct vector field samples learned by VFC.
siz ('int') – Number of initial matches.
 Returns
A tuple (precision, recall, corrRate): the precision and recall of VFC, and the percentage of initial correct matches.
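
A minimal sketch of how these three quantities could be computed from the index lists, assuming the usual definitions: precision is the fraction of VFC-selected samples that are truly correct, recall is the fraction of truly correct samples that VFC selected, and corrRate is the fraction of the siz initial matches that are correct. The function name evaluate_matches is hypothetical, and the library's own code may differ in detail.

```python
def evaluate_matches(correct_index, vfc_index, siz):
    """Return (precision, recall, corr_rate) under the assumed definitions."""
    correct = set(correct_index)
    found = set(vfc_index)
    n_hit = len(correct & found)      # samples both correct and selected by VFC
    precision = n_hit / len(found)    # share of selected samples that are correct
    recall = n_hit / len(correct)     # share of correct samples that were selected
    corr_rate = len(correct) / siz    # share of initial matches that are correct
    return precision, recall, corr_rate


# Toy index lists for illustration only.
precision, recall, corr_rate = evaluate_matches(
    correct_index=[0, 1, 2, 3], vfc_index=[1, 2, 3, 4, 5], siz=10
)
print(precision, recall, corr_rate)
```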
See also: sparseVFC()