Class
Preprocess
- class dynamo.pp.Preprocessor(collapse_speicies_adata_function=<function collapse_species_adata>, convert_gene_name_function=<function convert2symbol>, filter_cells_by_outliers_function=<function filter_cells_by_outliers>, filter_cells_by_outliers_kwargs={}, filter_genes_by_outliers_function=<function filter_genes_by_outliers>, filter_genes_by_outliers_kwargs={}, normalize_by_cells_function=<function normalize_cell_expr_by_size_factors>, normalize_by_cells_function_kwargs={}, select_genes_function=<function select_genes_by_dispersion_general>, select_genes_kwargs={}, normalize_selected_genes_function=None, normalize_selected_genes_kwargs={}, use_log1p=True, log1p_kwargs={}, pca_function=<function pca_monocle>, pca_kwargs={}, gene_append_list=[], gene_exclude_list=[], force_gene_list=None, sctransform_kwargs={})[source]
Preprocessor constructor.
The default preprocess functions are those of monocle recipe by default. You can pass your own Callable objects (functions) to this constructor directly, which wil be used in the preprocess steps later. These functions parameters are saved into Preprocessor instances. You can set these attributes directly to your own implementation.
- Parameters:
collapse_speicies_adata_function (
Callable
) – function for collapsing the species data. Defaults to collapse_species_adata.convert_gene_name_function (
Callable
) – transform gene names, by default convert2symbol, which transforms unofficial gene names to official gene names. Defaults to convert2symbol.filter_cells_by_outliers_function (
Callable
) – filter cells by thresholds. Defaults to monocle_filter_cells_by_outliers.filter_cells_by_outliers_kwargs (
dict
) – arguments that will be passed to filter_cells_by_outliers. Defaults to {}.filter_genes_by_outliers_function (
Callable
) – filter genes by thresholds. Defaults to monocle_filter_genes_by_outliers.filter_genes_by_outliers_kwargs (
dict
) – arguments that will be passed to filter_genes_by_outliers. Defaults to {}.normalize_by_cells_function (
Callable
) – function for performing cell-wise normalization. Defaults to normalize_cell_expr_by_size_factors.normalize_by_cells_function_kwargs (
dict
) – arguments that will be passed to normalize_by_cells_function. Defaults to {}.select_genes_function (
Callable
) – function for selecting gene features. Defaults to select_genes_by_dispersion_general.select_genes_kwargs (
dict
) – arguments that will be passed to select_genes. Defaults to {}.normalize_selected_genes_function (
Optional
[Callable
]) – function for normalize selected genes. Defaults to None.normalize_selected_genes_kwargs (
dict
) – arguments that will be passed to normalize_selected_genes. Defaults to {}.use_log1p (
bool
) – whether to use log1p to normalize layers in adata. Defaults to True.log1p_kwargs (
dict
) – arguments passed to use_log1p. Defaults to {}.pca_function (
bool
) – function to perform pca. Defaults to pca_monocle.pca_kwargs (
dict
) – arguments that will be passed pca. Defaults to {}.gene_append_list (
List
[str
]) – ensure that a list of genes show up in selected genes in monocle recipe pipeline. Defaults to [].gene_exclude_list (
List
[str
]) – exclude a list of genes in monocle recipe pipeline. Defaults to [].force_gene_list (
Optional
[List
[str
]]) – use this gene list as selected genes in monocle recipe pipeline. Defaults to None.sctransform_kwargs – arguments passed into sctransform function. Defaults to {}.
- add_experiment_info(adata, tkey=None, experiment_type=None)[source]
Infer the experiment type and experiment layers stored in the AnnData object and record the info in unstructured metadata (.uns).
- Parameters:
- Raises:
ValueError – the tkey is invalid.
- Return type:
- standardize_adata(adata, tkey, experiment_type)[source]
Process the AnnData object to make it meet the standards of dynamo.
The index of the observations would be ensured to be unique. The layers with sparse matrix would be converted to compressed csr_matrix. DKM.allowed_layer_raw_names() will be used to define only_splicing, only_labeling and splicing_labeling keys. The genes would be renamed to their official name.
- config_monocle_recipe(adata, n_top_genes=2000, gene_selection_method='SVR')[source]
Automatically configure the preprocessor for monocle recipe.
- preprocess_adata_monocle(adata, tkey=None, experiment_type=None)[source]
Preprocess the AnnData object based on Monocle style preprocessing recipe.
- Parameters:
- Return type:
- config_seurat_recipe(adata)[source]
Automatically configure the preprocessor for using the seurat style recipe.
- preprocess_adata_seurat(adata, tkey=None, experiment_type=None)[source]
The preprocess pipeline in Seurat based on dispersion, implemented by dynamo authors.
Stuart and Butler et al. Comprehensive Integration of Single-Cell Data. Cell (2019) Butler et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol
- Parameters:
- Return type:
- config_sctransform_recipe(adata)[source]
Automatically configure the preprocessor for using the sctransform style recipe.
- preprocess_adata_sctransform(adata, tkey=None, experiment_type=None)[source]
Python implementation of https://github.com/satijalab/sctransform.
Hao and Hao et al. Integrated analysis of multimodal single-cell data. Cell (2021)
- Parameters:
- Return type:
- config_pearson_residuals_recipe(adata)[source]
Automatically configure the preprocessor for using the Pearson residuals style recipe.
- preprocess_adata_pearson_residuals(adata, tkey=None, experiment_type=None)[source]
A pipeline proposed in Pearson residuals (Lause, Berens & Kobak, 2021).
Lause, J., Berens, P. & Kobak, D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome Biol 22, 258 (2021). https://doi.org/10.1186/s13059-021-02451-7
- Parameters:
- Return type:
- config_monocle_pearson_residuals_recipe(adata)[source]
Automatically configure the preprocessor for using the Monocle-Pearson-residuals style recipe.
Useful when you want to use Pearson residual to obtain feature genes and perform PCA but also using the standard size-factor normalization and log1p analyses to normalize data for RNA velocity and vector field analyses.
- preprocess_adata_monocle_pearson_residuals(adata, tkey=None, experiment_type=None)[source]
A combined pipeline of monocle and pearson_residuals.
Results after running pearson_residuals can contain negative values, an undesired feature for later RNA velocity analysis. This function combine pearson_residual and monocle recipes so that it uses Pearson residual to obtain feature genes and perform PCA but also uses monocle recipe to generate X_spliced, X_unspliced, X_new, X_total or other data values for RNA velocity and downstream vector field analyses.
- Parameters:
- Return type:
- preprocess_adata(adata, recipe='monocle', tkey=None)[source]
Preprocess the AnnData object with the recipe specified.
- Parameters:
adata (
AnnData
) – An AnnData object.recipe (
Literal
['monocle'
,'seurat'
,'sctransform'
,'pearson_residuals'
,'monocle_pearson_residuals'
]) – The recipe used to preprocess the data. Defaults to “monocle”.tkey (
Optional
[str
]) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.
- Raises:
NotImplementedError – the recipe is invalid.
- Return type:
Estimation
Conventional scRNA-seq (est.csc)
- class dynamo.est.csc.ss_estimation(U=None, Ul=None, S=None, Sl=None, P=None, US=None, S2=None, conn=None, t=None, ind_for_proteins=None, model='stochastic', est_method='gmm', experiment_type='deg', assumption_mRNA=None, assumption_protein='ss', concat_data=True, cores=1, **kwargs)
The class that estimates parameters with input data.
- Parameters:
U (
ndarray
or sparse csr_matrix) – A matrix of unspliced mRNA count.Ul (
ndarray
or sparse csr_matrix) – A matrix of unspliced, labeled mRNA count.S (
ndarray
or sparse csr_matrix) – A matrix of spliced mRNA count.Sl (
ndarray
or sparse csr_matrix) – A matrix of spliced, labeled mRNA count.P (
ndarray
or sparse csr_matrix) – A matrix of protein count.US (
ndarray
or sparse csr_matrix) – A matrix of second moment of unspliced/spliced gene expression count for conventional or NTR velocity.S2 (
ndarray
or sparse csr_matrix) – A matrix of second moment of spliced gene expression count for conventional or NTR velocity.conn (
ndarray
or sparse csr_matrix) – The connectivity matrix that can be used to calculate first /second moment of the data.t (
ss_estimation
) – A vector of time points.ind_for_proteins (
ndarray
) – A 1-D vector of the indices in the U, Ul, S, Sl layers that corresponds to the row name in the protein or X_protein key of .obsm attribute.experiment_type (str) – labelling experiment type. Available options are: (1) ‘deg’: degradation experiment; (2) ‘kin’: synthesis experiment; (3) ‘one-shot’: one-shot kinetic experiment; (4) ‘mix_std_stm’: a mixed steady state and stimulation labeling experiment.
assumption_mRNA (str) – Parameter estimation assumption for mRNA. Available options are: (1) ‘ss’: pseudo steady state; (2) None: kinetic data with no assumption.
assumption_protein (str) – Parameter estimation assumption for protein. Available options are: (1) ‘ss’: pseudo steady state;
concat_data (bool (default: True)) – Whether to concatenate data
cores (int (default: 1)) – Number of cores to run the estimation. If cores is set to be > 1, multiprocessing will be used to parallel the parameter estimation.
- Returns:
t (
ss_estimation
) – A vector of time points.data (dict) – A dictionary with uu, ul, su, sl, p as its keys.
extyp (str) – labelling experiment type.
asspt_mRNA (str) – Parameter estimation assumption for mRNA.
asspt_prot (str) – Parameter estimation assumption for protein.
parameters (dict) –
- A dictionary with alpha, beta, gamma, eta, delta as its keys.
alpha: transcription rate beta: RNA splicing rate gamma: spliced mRNA degradation rate eta: translation rate delta: protein degradation rate
- fit(intercept=False, perc_left=None, perc_right=5, clusters=None, one_shot_method='combined')
Fit the input data to estimate all or a subset of the parameters
- Parameters:
intercept (bool) – If using steady state assumption for fitting, then: True – the linear regression is performed with an unfixed intercept; False – the linear regression is performed with a fixed zero intercept.
perc_left (float (default: 5)) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the samples are included.
perc_right (float (default: 5)) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
clusters (list) – A list of n clusters, each element is a list of indices of the samples which belong to this cluster.
- fit_gamma_steady_state(u, s, intercept=True, perc_left=None, perc_right=5, normalize=True)
Estimate gamma using linear regression based on the steady state assumption.
- Parameters:
u (
ndarray
or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.s (
ndarray
or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.intercept (bool) – If using steady state assumption for fitting, then: True – the linear regression is performed with an unfixed intercept; False – the linear regresssion is performed with a fixed zero intercept.
perc_left (float) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the left samples are excluded.
perc_right (float) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
normalize (bool) – Whether to first normalize the data.
- Returns:
- k: float
The slope of the linear regression model, which is gamma under the steady state assumption.
- b: float
The intercept of the linear regression model.
- r2: float
Coefficient of determination or r square for the extreme data points.
- r2: float
Coefficient of determination or r square for the extreme data points.
- all_r2: float
Coefficient of determination or r square for all data points.
- fit_gamma_stochastic(est_method, u, s, us, ss, perc_left=None, perc_right=5, normalize=True)
Estimate gamma using GMM (generalized method of moments) or negbin distrubtion based on the steady state assumption.
- Parameters:
est_method (str {gmm, negbin} The estimation method to be used when using the stochastic model.) –
Available options when the model is ‘ss’ include:
(2) ‘gmm’: The new generalized methods of moments from us that is based on master equations, similar to the “moment” model in the excellent scVelo package; (3) ‘negbin’: The new method from us that models steady state RNA expression as a negative binomial distribution, also built upon on master equations. Note that all those methods require using extreme data points (except negbin, which use all data points) for estimation. Extreme data points are defined as the data from cells whose expression of unspliced / spliced or new / total RNA, etc. are in the top or bottom, 5%, for example. linear_regression only considers the mean of RNA species (based on the deterministic ordinary different equations) while moment based methods (gmm, negbin) considers both first moment (mean) and second moment (uncentered variance) of RNA species (based on the stochastic master equations). The above method are all (generalized) linear regression based method. In order to return estimated parameters (including RNA half-life), it additionally returns R-squared (either just for extreme data points or all data points) as well as the log-likelihood of the fitting, which will be used for transition matrix and velocity embedding. All est_method uses least square to estimate optimal parameters with latin cubic sampler for initial sampling.
u (
ndarray
or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.s (
ndarray
or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.us (
ndarray
or sparse csr_matrix) – A matrix of unspliced mRNA counts. Dimension: genes x cells.ss (
ndarray
or sparse csr_matrix) – A matrix of spliced mRNA counts. Dimension: genes x cells.perc_left (float) – The percentage of samples included in the linear regression in the left tail. If set to None, then all the left samples are excluded.
perc_right (float) – The percentage of samples included in the linear regression in the right tail. If set to None, then all the samples are included.
normalize (bool) – Whether to first normalize the
- Returns:
- k: float
The slope of the linear regression model, which is gamma under the steady state assumption.
- b: float
The intercept of the linear regression model.
- r2: float
Coefficient of determination or r square for the extreme data points.
- r2: float
Coefficient of determination or r square for the extreme data points.
- all_r2: float
Coefficient of determination or r square for all data points.
- fit_beta_gamma_lsq(t, U, S)
Estimate beta and gamma with the degradation data using the least squares method.
- Parameters:
- Returns:
- fit_gamma_nosplicing_lsq(t, L)
Estimate gamma with the degradation data using the least squares method when there is no splicing data.
- solve_alpha_mix_std_stm(t, ul, beta, clusters=None, alpha_time_dependent=True)
Estimate the steady state transcription rate and analytically calculate the stimulation transcription rate given beta and steady state alpha for a mixed steady state and stimulation labeling experiment.
This approach assumes the same constant beta or gamma for both steady state or stimulation period.
- Parameters:
t (list or numpy.ndarray) – Time period for stimulation state labeling for each cell.
ul – A vector of labeled RNA amount in each cell.
beta (numpy.ndarray) – A list of splicing rate for genes.
clusters (list) – A list of n clusters, each element is a list of indices of the samples which belong to this cluster.
alpha_time_dependent (bool) – Whether or not to model the simulation alpha rate as a time dependent variable.
- Returns:
- alpha_std, alpha_stm: numpy.ndarray, numpy.ndarray
The constant steady state transcription rate (alpha_std) or time-dependent or time-independent (determined by alpha_time_dependent) transcription rate (alpha_stm)
- fit_alpha_oneshot(t, U, beta, clusters=None)
Estimate alpha with the one-shot data.
- Parameters:
- Returns:
- alpha:
ndarray
A numpy array with the dimension of n_genes x clusters.
- alpha:
- concatenate_data()
Concatenate available data into a single matrix.
See “concat_time_series_matrices” for details.
- get_n_genes(key=None, data=None)
Get the number of genes.
- set_parameter(name, value)
Set the value for the specified parameter.
- Parameters:
name (string) – The name of the parameter. E.g. ‘beta’.
value (
ndarray
) – A vector of values for the parameter to be set to.
- get_exist_data_names()
Get the names of all the data that are not ‘None’.
- dynamo.est.csc.velocity
alias of <module ‘dynamo.estimation.csc.velocity’ from ‘/home/docs/checkouts/readthedocs.org/user_builds/dynamo-release/checkouts/latest/dynamo/estimation/csc/velocity.py’>
Time-resolved metabolic labeling based scRNA-seq (est.tsc)
Base class: a general estimation framework
- class dynamo.est.tsc.kinetic_estimation(param_ranges, x0_ranges, simulator)
A general parameter estimation framework for all types of time-seris data
- Parameters:
param_ranges (
ndarray
) – A n-by-2 numpy array containing the lower and upper ranges of n parameters (and initial conditions if not fixed).x0_ranges (
ndarray
) – Lower and upper bounds for initial conditions for the integrators. To fix a parameter, set its lower and upper bounds to the same value.simulator (
utils_kinetic.Linear_ODE
) – An instance of python class which solves ODEs. It should have properties ‘t’ (k time points, 1d numpy array), ‘x0’ (initial conditions for m species, 1d numpy array), and ‘x’ (solution, k-by-m array), as well as two functions: integrate (numerical integration), solve (analytical method).
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
Deterministic models via analytical solution of ODEs
- class dynamo.est.tsc.Estimation_DeterministicDeg(beta=None, gamma=None, x0=None)
An estimation class for degradation (with splicing) experiments. Order of species: <unspliced>, <spliced>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_DeterministicDegNosp(gamma=None, x0=None)
An estimation class for degradation (without splicing) experiments.
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_DeterministicKinNosp(alpha, gamma, x0=0)
An estimation class for kinetics (without splicing) experiments with the deterministic model. Order of species: <unspliced>, <spliced>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_DeterministicKin(alpha, beta, gamma, x0=array([0.0, 0.0]))
An estimation class for kinetics experiments with the deterministic model. Order of species: <unspliced>, <spliced>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
Stochastic models via matrix form of moment equations
- class dynamo.est.tsc.Estimation_MomentDeg(beta=None, gamma=None, x0=None, include_cov=True)
An estimation class for degradation (with splicing) experiments. Order of species: <unspliced>, <spliced>, <uu>, <ss>, <us> Order of parameters: beta, gamma
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_MomentDegNosp(gamma=None, x0=None)
An estimation class for degradation (without splicing) experiments.
An estimation class for degradation (without splicing) experiments. Order of species: <r>, <rr>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_MomentKin(a, b, alpha_a, alpha_i, beta, gamma, include_cov=True)
An estimation class for kinetics experiments. Order of species: <unspliced>, <spliced>, <uu>, <ss>, <us>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Estimation_MomentKinNosp(a, b, alpha_a, alpha_i, gamma)
An estimation class for kinetics experiments. Order of species: <r>, <rr>
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
Mixture models for kinetic / degradation experiments
- class dynamo.est.tsc.Lambda_NoSwitching(model1, model2, alpha=None, lambd=None, gamma=None, x0=None, beta=None)
An estimation class with the mixture model. If beta is None, it is assumed that the data does not have the splicing process.
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
- class dynamo.est.tsc.Mixture_KinDeg_NoSwitching(model1, model2, alpha=None, gamma=None, x0=None, beta=None)
An estimation class with the mixture model. If beta is None, it is assumed that the data does not have the splicing process.
- fit_lsq(t, x_data, p0=None, n_p0=1, bounds=None, sample_method='lhs', method=None, normalize=True)
Fit time-seris data using least squares
- Parameters:
t (
ndarray
) – A numpy array of n time points.x_data (
ndarray
) – A m-by-n numpy a array of m species, each having n values for the n time points.p0 (
numpy.ndarray
, optional, default: None) – Initial guesses of parameters. If None, a random number is generated within the bounds.n_p0 (int, optional, default: 1) – Number of initial guesses.
bounds (tuple, optional, default: None) – Lower and upper bounds for parameters.
sample_method (str, optional, default: lhs) – Method used for sampling initial guesses of parameters: lhs: latin hypercube sampling; uniform: uniform random sampling.
method (str or None, optional, default: None) – Method used for solving ODEs. See options in simulator classes. If None, default method is used.
normalize (bool, optional, default: True) – Whether or not normalize values in x_data across species, so that large values do not dominate the optimizer.
- Returns:
- popt:
ndarray
Optimal parameters.
- cost: float
The cost function evaluated at the optimum.
- popt:
- test_chi2(t, x_data, species=None, method='matrix', normalize=True)
perform a Pearson’s chi-square test. The statistics is computed as: sum_i (O_i - E_i)^2 / E_i, where O_i is the data and E_i is the model predication.
The data can be either 1. stratified moments: ‘t’ is an array of k distinct time points, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. or 2. raw data: ‘t’ is an array of k time points for k cells, ‘x_data’ is a m-by-k matrix of data, where m is the number of species. Note that if the method is ‘numerical’, t has to monotonically increasing.
If not all species are included in the data, use ‘species’ to specify the species of interest.
- Returns:
p: float The p-value of a one-tailed chi-square test.
c2: float The chi-square statistics.
df: int Degree of freedom.
Vector field
Vector field class
- class dynamo.vf.SvcVectorField(X=None, V=None, Grid=None, *args, **kwargs)[source]
Initialize the VectorField class.
- Parameters:
X (
ndarray
(dimension: n_obs x n_features)) – Original data.V (
ndarray
(dimension: n_obs x n_features)) – Velocities of cells in the same order and dimension of X.Grid (
ndarray
) – The function that returns diffusion matrix which can be dependent on the variables (for example, genes)M (int (default: None)) – The number of basis functions to approximate the vector field. By default it is calculated as min(len(X), int(1500 * np.log(len(X)) / (np.log(len(X)) + np.log(100)))). So that any datasets with less than about 900 data points (cells) will use full data for vector field reconstruction while any dataset larger than that will at most use 1500 data points.
a (float (default 5)) – Parameter of the model of outliers. We assume the outliers obey uniform distribution, and the volume of outlier’s variation space is a.
beta (float (default: None)) – Parameter of Gaussian Kernel, k(x, y) = exp(-beta*||x-y||^2). If None, a rule-of-thumb bandwidth will be computed automatically.
ecr (float (default: 1e-5)) – The minimum limitation of energy change rate in the iteration process.
gamma (float (default: 0.9)) – Percentage of inliers in the samples. This is an inital value for EM iteration, and it is not important. Default value is 0.9.
lambda (float (default: 3)) – Represents the trade-off between the goodness of data fit and regularization.
minP (float (default: 1e-5)) – The posterior probability Matrix P may be singular for matrix inversion. We set the minimum value of P as minP.
MaxIter (int (default: 500)) – Maximum iteration times.
theta (float (default 0.75)) – Define how could be an inlier. If the posterior probability of a sample is an inlier is larger than theta, then it is regarded as an inlier.
div_cur_free_kernels (bool (default: False)) – A logic flag to determine whether the divergence-free or curl-free kernels will be used for learning the vector field.
sigma (int) – Bandwidth parameter.
eta (int) – Combination coefficient for the divergence-free or the curl-free kernels.
seed (int or 1-d array_like, optional (default: 0)) – Seed for RandomState. Must be convertible to 32 bit unsigned integers. Used in sampling control points. Default is to be 0 for ensure consistency between different runs.
- train(normalize=False, **kwargs)[source]
Learn an function of vector field from sparse single cell samples in the entire space robustly. Reference: Regularized vector field learning with sparse approximation for mismatch removal, Ma, Jiayi, etc. al, Pattern Recognition
- Parameters:
normalize ('bool' (default: False)) – Logic flag to determine whether to normalize the data to have zero means and unit covariance. This is often required for raw dataset (for example, raw UMI counts and RNA velocity values in high dimension). But it is normally not required for low dimensional embeddings by PCA or other non-linear dimension reduction methods.
method ('string') – Method that is used to reconstruct the vector field functionally. Currently only SparseVFC supported but other improved approaches are under development.
- Returns:
- VecFld: `dict’
A dictionary which contains X, Y, beta, V, C, P, VFCIndex. Where V = f(X), P is the posterior probability and VFCIndex is the indexes of inliers which found by VFC.
- get_Jacobian(method='analytical', input_vector_convention='row', **kwargs)[source]
Get the Jacobian of the vector field function. If method is ‘analytical’: The analytical Jacobian will be returned and it always take row vectors as input no matter what input_vector_convention is.
If method is ‘numerical’: If the input_vector_convention is ‘row’, it means that fjac takes row vectors as input, otherwise the input should be an array of column vectors. Note that the returned Jacobian would behave exactly the same if the input is an 1d array.
The column vector convention is slightly faster than the row vector convention. So the matrix of row vector convention is converted into column vector convention under the hood.
No matter the method and input vector convention, the returned Jacobian is of the following format:
df_1/dx_1 df_1/dx_2 df_1/dx_3 … df_2/dx_1 df_2/dx_2 df_2/dx_3 … df_3/dx_1 df_3/dx_2 df_3/dx_3 … … … … …
- get_Hessian(method='analytical', **kwargs)[source]
Get the Hessian of the vector field function. If method is ‘analytical’: The analytical Hessian will be returned and it always take row vectors as input no matter what input_vector_convention is.
No matter the method and input vector convention, the returned Hessian is of the following format:
df^2/dx_1^2 df_1^2/(dx_1 dx_2) df_1^2/(dx_1 dx_3) … df^2/(dx_2 dx_1) df^2/dx_2^2 df^2/(dx_2 dx_3) … df^2/(dx_3 dx_1) df^2/(dx_3 dx_2) df^2/dx_3^2 … … … … …
- get_Laplacian(method='analytical', **kwargs)[source]
Get the Laplacian of the vector field. Laplacian is defined as the sum of the diagonal of the Hessian matrix. Because Hessian is originally defined for scalar function and here we extend it to vector functions. We will calculate the summation of the diagonal of each output (target) dimension.
A Laplacian filter is an edge detector used to compute the second derivatives of an image, measuring the rate at which the first derivatives change (so it is the derivative of the Jacobian). This determines if a change in adjacent pixel values is from an edge or continuous progression.
- evaluate(CorrectIndex, VFCIndex, siz)[source]
Evaluate the precision, recall, corrRate of the sparseVFC algorithm.
- Parameters:
CorrectIndex ('List') – Ground truth indexes of the correct vector field samples.
VFCIndex ('List') – Indexes of the correct vector field samples learned by VFC.
siz ('int') – Number of initial matches.
- Returns:
A tuple of precision, recall, corrRate
Precision, recall, corrRate (Precision and recall of VFC, percentage of initial correct matches.)
See also::
sparseVFC()
.
- assign_fixed_points(domain=None, cores=1, **kwargs)
assign each cell to the associated fixed points
- find_fixed_points(n_x0=100, X0=None, domain=None, sampling_method='random', **kwargs)
Search for fixed points of the vector field function.
- class dynamo.vf.Pot(Function=None, DiffMat=None, boundary=None, n_points=25, fixed_point_only=False, find_fixed_points=False, refpoint=None, stable=None, saddle=None)[source]
It implements the least action method to calculate the potential values of fixed points for a given SDE (stochastic differential equation) model. The function requires the vector field function and a diffusion matrix. This code is based on the MATLAB code from Ruoshi Yuan and Ying Tang. Potential landscape of high dimensional nonlinear stochastic dynamics with large noise. Y Tang, R Yuan, G Wang, X Zhu, P Ao - Scientific reports, 2017
- Parameters:
Function ('function') – The (reconstructed) vector field function.
DiffMat ('function') – The function that returns the diffusion matrix which can variable (for example, gene) dependent.
boundary ('list') – The range of variables (genes).
n_points ('int') – The number of points along the least action path.
fixed_point_only ('bool') – The logic flag to determine whether only the potential for fixed point or entire space should be mapped.
find_fixed_points ('bool') – The logic flag to determine whether only the gen_fixed_points function should be run to identify fixed points.
refpoint ('np.ndarray') – The reference point to define the potential.
stable ('np.ndarray') – The matrix for storing the coordinates (gene expression configuration) of the stable fixed point (characteristic state of a particular cell type).
saddle ('np.ndarray') – The matrix for storing the coordinates (gene expression configuration) of the unstable fixed point (characteristic state of cells prime to bifurcation).
- fit(adata, x_lim, y_lim, basis='umap', method='Ao', xyGridSpacing=2, dt=0.01, tol=0.01, numTimeSteps=1400)[source]
Function to map out the pseudo-potential landscape.
Although it is appealing to define “potential” for biological systems as it is intuitive and familiar from other fields, it is well-known that the definition of a potential function in open biological systems is controversial (Ping Ao 2009). In the conservative system, the negative gradient of potential function is relevant to the velocity vector by ma = −Δψ (where m, a, are the mass and acceleration of the object, respectively). However, a biological system is massless, open and nonconservative, thus methods that directly learn potential function assuming a gradient system are not directly applicable. In 2004, Ao first proposed a framework that decomposes stochastic differential equations into either the gradient or the dissipative part and uses the gradient part to define a physical equivalent of potential in biological systems (P. Ao 2004). Later, various theoretical studies have been conducted towards this very goal (Xing 2010; Wang et al. 2011; J. X. Zhou et al. 2012; Qian 2013; P. Zhou and Li 2016). Bhattacharya and others also recently provided a numeric algorithm to approximate the potential landscape.
This function implements the Ao, Bhattacharya method and Ying method and will also support other methods shortly.
- Parameters:
adata (
AnnData
) – AnnData object that contains U_grid and V_grid datax_lim (list) – Lower or upper limit of x-axis.
y_lim (list) – Lower or upper limit of y-axis
basis (str (default: umap)) – The dimension reduction method to use.
method ('string' (default: Bhattacharya)) – Method used to map the pseudo-potential landscape. By default, it is Bhattacharya (A deterministic map of Waddington’s epigenetic landscape for cell fate specification. Sudin Bhattacharya, Qiang Zhang and Melvin E. Andersen). Other methods will be supported include: Tang (), Ping (), Wang (), Zhou ().
- Returns:
if Bhattacharya is used –
- Xgrid: ‘np.ndarray’
The X grid to visualize “potential surface”
- Ygrid: ‘np.ndarray’
The Y grid to visualize “potential surface”
- Zgrid: ‘np.ndarray’
The interpolate potential corresponding to the X,Y grids.
if Tang method is used
retmat (‘np.ndarray’) – The action value for the learned least action path.
LAP (‘np.ndarray’) – The least action path learned
Predictions
Least action path
- class dynamo.pd.GeneTrajectory(adata, X=None, t=None, X_pca=None, PCs='PCs', mean='pca_mean', genes='use_for_pca', expr_func=None, **kwargs)[source]
This class is not fully functional yet.
- calc_msd(save_key='traj_msd', **kwargs)[source]
Calculate the mean squared displacement (MSD) of the curve with respect to a reference point.
- Parameters:
decomp_dim – If True, return the MSD of each dimension separately. If False, return the total MSD.
ref – Index of the reference point. Default is 0.
- Returns:
The MSD of the curve with respect to the reference point.
- calc_arclength()
Calculate the arc length of the trajectory.
- Return type:
- Returns:
arc length of the trajectory
- calc_curvature()
Calculate the curvature of the trajectory.
- Return type:
- Returns:
curvature of the trajectory, shape (n_points,)
- calc_tangent(normalize=True)
Calculate the tangent vectors of the trajectory.
- Parameters:
normalize (
bool
) – whether to normalize the tangent vectors. Defaults to True.- Returns:
tangent vectors of the trajectory, shape (n_points-1, n_dimensions)
- dim()
Returns the number of dimensions in the trajectory.
- Return type:
- Returns:
number of dimensions in the trajectory
- integrate(func)
Calculate the integral of a function along the curve.
- interp_X(num=100, **interp_kwargs)
Interpolates the curve at num equally spaced points in t.
- interp_t(num=100)
Interpolates the t parameter linearly.
- interpolate(t, **interp_kwargs)
Interpolate the curve at new time values.
- Parameters:
t (
ndarray
) – The new time values at which to interpolate the curve.**interp_kwargs – Additional arguments to pass to scipy.interpolate.interp1d.
- Return type:
- Returns:
The interpolated values of the curve at the specified time values.
- Raises:
Exception – If self.t is None, which is needed for interpolation.
- resample(n_points, tol=0.0001, inplace=True)
Resample the curve with the specified number of points.
- Parameters:
- Return type:
- Returns:
A tuple containing the resampled curve coordinates and time values (if available).
- Raises:
ValueError – If the specified number of points is less than 2.
- class dynamo.pd.Trajectory(X, t=None, sort=True)[source]
Base class for handling trajectory interpolation, resampling, etc.
- Parameters:
- set_time(t, sort=True)[source]
Set the time stamps for the trajectory. Sorts the time stamps if requested.
- dim()[source]
Returns the number of dimensions in the trajectory.
- Return type:
- Returns:
number of dimensions in the trajectory
- calc_tangent(normalize=True)[source]
Calculate the tangent vectors of the trajectory.
- Parameters:
normalize (
bool
) – whether to normalize the tangent vectors. Defaults to True.- Returns:
tangent vectors of the trajectory, shape (n_points-1, n_dimensions)
- calc_arclength()[source]
Calculate the arc length of the trajectory.
- Return type:
- Returns:
arc length of the trajectory
- calc_curvature()[source]
Calculate the curvature of the trajectory.
- Return type:
- Returns:
curvature of the trajectory, shape (n_points,)
- resample(n_points, tol=0.0001, inplace=True)[source]
Resample the curve with the specified number of points.
- Parameters:
- Return type:
- Returns:
A tuple containing the resampled curve coordinates and time values (if available).
- Raises:
ValueError – If the specified number of points is less than 2.
- interpolate(t, **interp_kwargs)[source]
Interpolate the curve at new time values.
- Parameters:
t (
ndarray
) – The new time values at which to interpolate the curve.**interp_kwargs – Additional arguments to pass to scipy.interpolate.interp1d.
- Return type:
- Returns:
The interpolated values of the curve at the specified time values.
- Raises:
Exception – If self.t is None, which is needed for interpolation.
- interp_X(num=100, **interp_kwargs)[source]
Interpolates the curve at num equally spaced points in t.
- class dynamo.pd.LeastActionPath(X, vf_func, D=1, dt=1)[source]
A class for computing the Least Action Path for a given function and initial conditions.
- Parameters:
X (
ndarray
) – The initial conditions as a 2D array of shape (n, m), where n is the number of points in the trajectory and m is the dimension of the system.vf_func (
Callable
) – The vector field function that governs the system.D (
float
) – The diffusion constant of the system. Defaults to 1.dt (
float
) – The time step for the simulation. Defaults to 1.
- func
The vector field function that governs the system.
- D
The diffusion constant of the system.
- _action
The Least Action Path action values for each point in the trajectory.
- action(t=None, **interp_kwargs)
Returns the Least Action Path action values at time t. If t is None, returns the action values for all time points. **interp_kwargs are passed to the interp1d function.
- mfpt(action=None)[source]
Returns the mean first passage time using the action values. If action is None, uses the action values stored in the _action attribute.
- optimize_dt()[source]
Optimizes the time step of the simulation to minimize the Least Action Path action.
Initializes the LeastActionPath class instance with the given initial conditions, vector field function, diffusion constant and time step.
- Parameters:
X (
ndarray
) – The initial conditions as a 2D array of shape (n, m), where n is the number of points in the trajectory and m is the dimension of the system.vf_func (
Callable
) – The vector field function that governs the system.D (
float
) – The diffusion constant of the system. Defaults to 1.dt (
float
) – The time step for the simulation. Defaults to 1.
- get_t()[source]
Returns the time points of the trajectory.
- Returns:
The time points of the trajectory.
- Return type:
ndarray
- get_dt()[source]
Returns the time step of the trajectory.
- Returns:
The time step of the trajectory.
- Return type:
- action_t(t=None, **interp_kwargs)[source]
Returns the Least Action Path action values at time t.
- Parameters:
- Returns:
The Least Action Path action value(s).
- Return type:
ndarray
- optimize_dt()[source]
Optimizes the time step of the simulation to minimize the Least Action Path action.
- Returns:
optimal time step
- Return type:
- calc_arclength()
Calculate the arc length of the trajectory.
- Return type:
- Returns:
arc length of the trajectory
- calc_curvature()
Calculate the curvature of the trajectory.
- Return type:
- Returns:
curvature of the trajectory, shape (n_points,)
- calc_msd(decomp_dim=True, ref=0)
Calculate the mean squared displacement (MSD) of the curve with respect to a reference point.
- calc_tangent(normalize=True)
Calculate the tangent vectors of the trajectory.
- Parameters:
normalize (
bool
) – whether to normalize the tangent vectors. Defaults to True.- Returns:
tangent vectors of the trajectory, shape (n_points-1, n_dimensions)
- dim()
Returns the number of dimensions in the trajectory.
- Return type:
- Returns:
number of dimensions in the trajectory
- integrate(func)
Calculate the integral of a function along the curve.
- interp_X(num=100, **interp_kwargs)
Interpolates the curve at num equally spaced points in t.
- interp_t(num=100)
Interpolates the t parameter linearly.
- interpolate(t, **interp_kwargs)
Interpolate the curve at new time values.
- Parameters:
t (
ndarray
) – The new time values at which to interpolate the curve.**interp_kwargs – Additional arguments to pass to scipy.interpolate.interp1d.
- Return type:
- Returns:
The interpolated values of the curve at the specified time values.
- Raises:
Exception – If self.t is None, which is needed for interpolation.
- resample(n_points, tol=0.0001, inplace=True)
Resample the curve with the specified number of points.
- Parameters:
- Return type:
- Returns:
A tuple containing the resampled curve coordinates and time values (if available).
- Raises:
ValueError – If the specified number of points is less than 2.
- class dynamo.pd.GeneLeastActionPath(adata, lap=None, X_pca=None, vf_func=None, D=1, dt=1, **kwargs)[source]
Calculates the least action path trajectory and action for a gene expression dataset. Inherits from GeneTrajectory class.
- Parameters:
adata – AnnData object containing the gene expression dataset.
lap (
Optional
[LeastActionPath
]) – LeastActionPath object. Defaults to None.X_pca – PCA transformed expression data. Defaults to None.
vf_func – Vector field function. Defaults to None.
D – Diffusivity value. Defaults to 1.
dt – Time step size. Defaults to 1.
**kwargs – Additional keyword arguments passed to the GeneTrajectory class.
- adata
AnnData object containing the gene expression dataset.
- X
Expression data.
- to_pca
Transformation matrix from gene expression space to PCA space.
- from_pca
Transformation matrix from PCA space to gene expression space.
- PCs
Principal components from PCA analysis.
- func
Vector field function reconstructed within the PCA space.
- D
Diffusivity value.
- t
Array of time values.
- action
Array of action values.
- from_lap(adata, lap, **kwargs)[source]
Initializes class from a LeastActionPath object.
- Parameters:
adata (
AnnData
) – AnnData object containing the gene expression dataset.lap (
LeastActionPath
) – LeastActionPath object.**kwargs – Additional keyword arguments passed to the GeneTrajectory class.
- get_t()[source]
Returns the array of time values.
- Returns:
Array of time values.
- Return type:
np.ndarray
- genewise_action()[source]
Calculates the genewise action values.
- Returns:
Array of genewise action values.
- Return type:
np.ndarray
- calc_arclength()
Calculate the arc length of the trajectory.
- Return type:
- Returns:
arc length of the trajectory
- calc_curvature()
Calculate the curvature of the trajectory.
- Return type:
- Returns:
curvature of the trajectory, shape (n_points,)
- calc_msd(save_key='traj_msd', **kwargs)
Calculate the mean squared displacement (MSD) of the curve with respect to a reference point.
- Parameters:
decomp_dim – If True, return the MSD of each dimension separately. If False, return the total MSD.
ref – Index of the reference point. Default is 0.
- Returns:
The MSD of the curve with respect to the reference point.
- calc_tangent(normalize=True)
Calculate the tangent vectors of the trajectory.
- Parameters:
normalize (
bool
) – whether to normalize the tangent vectors. Defaults to True.- Returns:
tangent vectors of the trajectory, shape (n_points-1, n_dimensions)
- dim()
Returns the number of dimensions in the trajectory.
- Return type:
- Returns:
number of dimensions in the trajectory
- integrate(func)
Calculate the integral of a function along the curve.
- interp_X(num=100, **interp_kwargs)
Interpolates the curve at num equally spaced points in t.
- interp_t(num=100)
Interpolates the t parameter linearly.
- interpolate(t, **interp_kwargs)
Interpolate the curve at new time values.
- Parameters:
t (
ndarray
) – The new time values at which to interpolate the curve.**interp_kwargs – Additional arguments to pass to scipy.interpolate.interp1d.
- Return type:
- Returns:
The interpolated values of the curve at the specified time values.
- Raises:
Exception – If self.t is None, which is needed for interpolation.
- resample(n_points, tol=0.0001, inplace=True)
Resample the curve with the specified number of points.
- Parameters:
- Return type:
- Returns:
A tuple containing the resampled curve coordinates and time values (if available).
- Raises:
ValueError – If the specified number of points is less than 2.
Movie
Animation class
- class dynamo.mv.StreamFuncAnim(adata, basis='umap', fp_basis=None, dims=None, n_steps=100, cell_states=None, color='ntr', fig=None, ax=None, logspace=False, max_time=None, frame_color=None)[source]
Animating cell fate commitment prediction via reconstructed vector field function.
Animating cell fate commitment prediction via reconstructed vector field function.
This class creates necessary components to produce an animation that describes the exact speed of a set of cells at each time point, its movement in gene expression and the long range trajectory predicted by the reconstructed vector field. Thus it provides intuitive visual understanding of the RNA velocity, speed, acceleration, and cell fate commitment in action.
This function is originally inspired by https://tonysyu.github.io/animating-particles-in-a-flow.html and relies on animation module from matplotlib. Note that you may need to install imagemagick in order to properly show or save the animation. See for example, http://louistiao.me/posts/notebooks/save-matplotlib-animations-as-gifs/ for more details.
- Parameters:
adata (
AnnData
) – AnnData object that already went through the fate prediction.basis (str or None (default: umap)) – The embedding data to use for predicting cell fate. If basis is either umap or pca, the reconstructed trajectory will be projected back to high dimensional space via the inverse_transform function. space.
fps_basis (str or None (default: None)) – The basis that will be used for identifying or retrieving fixed points. Note that if fps_basis is different from basis, the nearest cells of the fixed point from the fps_basis will be found and used to visualize the position of the fixed point on basis embedding.
dims (list or None (default: `None’)) – The dimensions of low embedding space where cells will be drawn and it should corresponds to the space fate prediction take place.
n_steps (int (default: 100)) – The number of times steps (frames) fate prediction will take.
cell_states (int, list or None (default: None)) – The number of cells state that will be randomly selected (if int), the indices of the cells states (if list) or all cell states which fate prediction executed (if None)
fig (matplotlib.figure.Figure or None (default: None)) – The figure that will contain both the background and animated components.
ax (matplotlib.Axis (optional, default None)) – The matplotlib axes object that will be used as background plot of the vector field animation. If ax is None, topography(adata, basis=basis, color=color, ax=ax, save_show_or_return=’return’) will be used to create an axes.
logspace (bool (default: False)) – Whether or to sample time points linearly on log space. If not, the sorted unique set of all time points from all cell states’ fate prediction will be used and then evenly sampled up to n_steps time points.
- Returns:
A class that contains .fig attribute and .update, .init_background that can be used to produce an animation
of the prediction of cell fate commitment.
Examples 1
>>> from matplotlib import animation >>> progenitor = adata.obs_names[adata.obs.clusters == 'cluster_1'] >>> fate_progenitor = progenitor >>> info_genes = adata.var_names[adata.var.use_for_transition] >>> dyn.pd.fate(adata, basis='umap', init_cells=fate_progenitor, interpolation_num=100, direction='forward', ... inverse_transform=False, average=False) >>> instance = dyn.mv.StreamFuncAnim(adata=adata, fig=None, ax=None) >>> anim = animation.FuncAnimation(instance.fig, instance.update, init_func=instance.init_background, ... frames=np.arange(100), interval=100, blit=True) >>> from IPython.core.display import display, HTML >>> HTML(anim.to_jshtml()) # embedding to jupyter notebook. >>> anim.save('fate_ani.gif',writer="imagemagick") # save as gif file.
Examples 2
>>> from matplotlib import animation >>> progenitor = adata.obs_names[adata.obs.clusters == 'cluster_1'] >>> fate_progenitor = progenitor >>> info_genes = adata.var_names[adata.var.use_for_transition] >>> dyn.pd.fate(adata, basis='umap', init_cells=fate_progenitor, interpolation_num=100, direction='forward', ... inverse_transform=False, average=False) >>> fig, ax = plt.subplots() >>> ax = dyn.pl.topography(adata_old, color='time', ax=ax, save_show_or_return='return', color_key_cmap='viridis') >>> ax.set_xlim(xlim) >>> ax.set_ylim(ylim) >>> instance = dyn.mv.StreamFuncAnim(adata=adata, fig=fig, ax=ax) >>> anim = animation.FuncAnimation(fig, instance.update, init_func=instance.init_background, ... frames=np.arange(100), interval=100, blit=True) >>> from IPython.core.display import display, HTML >>> HTML(anim.to_jshtml()) # embedding to jupyter notebook. >>> anim.save('fate_ani.gif',writer="imagemagick") # save as gif file.
Examples 3
>>> from matplotlib import animation >>> progenitor = adata.obs_names[adata.obs.clusters == 'cluster_1'] >>> fate_progenitor = progenitor >>> info_genes = adata.var_names[adata.var.use_for_transition] >>> dyn.pd.fate(adata, basis='umap', init_cells=fate_progenitor, interpolation_num=100, direction='forward', ... inverse_transform=False, average=False) >>> dyn.mv.animate_fates(adata)
See also::
animate_fates()