# Get Started ## 10 minutes to dynamo Welcome to dynamo! Dynamo is a computational framework that includes an inclusive model of expression dynamics with scSLAM-seq / multiomics, vector field reconstruction, and potential landscape mapping. ## Why dynamo Dynamo currently provides a complete solution (see below) to analyze expression dynamics of conventional scRNA-seq or time-resolved metabolic labeling based scRNA-seq. It aspires to become the leading tool in continuously integrating the most exciting developments in machine learning, systems biology, information theory, stochastic physics, etc. to model, understand, and interpret datasets generated from various cutting-edge single-cell genomics techniques (developments of dynamo 2/3 are underway). We hope those models, understandings, and interpretations not only facilitate your research but may also eventually lead to new biological discovery. Dynamo has a strong community so you will feel supported no matter whether you are a newcomer to computational biology or a veteran researcher who wants to contribute to dynamo's development. ## How to install Dynamo requires Python 3.6 or later. Dynamo now has been released to PyPi, you can install the PyPi version via: ```bash pip install dynamo-release ``` To install the newest version of dynamo, you can git clone our repo and then pip install: ```bash git clone https://github.com/aristoteleo/dynamo-release.git pip install dynamo-release/ --user ``` Conda version is also available: ```bash conda install -c conda-forge dynamo-release ``` Note that `--user` flag is used to install the package to your home directory, in case you don't have root privilege. Alternatively, you can install dynamo when you are in the dynamo-release folder by directly using python's setup install: ```bash git clone https://github.com/aristoteleo/dynamo-release.git cd dynamo-release/ python setup.py install --user ``` from source, using the following script: ```bash pip install git+https://github.com/aristoteleo/dynamo-release ``` In order to ensure dynamo runs properly, your python environment needs to satisfy dynamo's [dependencies](https://github.com/aristoteleo/dynamo-release/blob/master/setup.py). We provide a helper function for you to check the versions of dynamo's all dependencies: ```python import dynamo as dyn dyn.session_info() ``` ## Architecture of dynamo ![Dynamo Architecture](https://user-images.githubusercontent.com/7456281/93838294-2026f600-fc57-11ea-971b-c3ececba0d85.png) Dynamo has a few standard modules like most other single cell analysis toolkits (Scanpy, Monocle, or Seurat), for example, data loading (`dyn.read*`), preprocessing (`dyn.pp.*`), tool analysis (`dyn.tl.*`), and plotting (`dyn.pl.*`). Modules specific to dynamo include: - a comprehensive estimation framework (`dyn.est.*`) of expression dynamics that includes: - conventional single cell RNA-seq (scRNA-seq) modeling (`dyn.est.csc.*`) for **standard RNA velocity estimation** and more; - time-resolved metabolic labeling based single cell RNA-seq (scRNA-seq) modeling (`dyn.est.tsc.*`) for **labeling based RNA velocity estimation** and more; - vector field reconstruction and vector calculus (`dyn.vf.*`); - cell fate prediction (`dyn.pd.*`); - create movie of cell fate predictions (`dyn.mv.*`); - stochastic simulation of various metabolic labeling experiments (`dyn.sim.*`); - integration with external tools built by us or others (`dyn.ext.*`); - and more. ## Typical workflow ![Typical Workflow](https://user-images.githubusercontent.com/7456281/93838305-2b7a2180-fc57-11ea-8ec8-552b75446e32.png) A typical workflow in dynamo is similar to most of other single cell analysis toolkits (Scanpy, Monocle, or Seurat), including steps like importing dynamo (`import dynamo as dyn`), loading data (`dyn.read*`), preprocessing (`dyn.pp.*`), tool analysis (`dyn.tl.*`) and plotting (`dyn.pl.*`). To get the best of dynamo though, you need to use the `dyn.vf.*`, `dyn.pd.*` and `dyn.mv.*` modules. ### Import dynamo ```python import dynamo as dyn ``` We provide a few nice visualization defaults for different purposes: ```python dyn.configuration.set_figure_params('dynamo', background='white') # jupyter notebooks dyn.configuration.set_figure_params('dynamo', background='black') # presentation dyn.configuration.set_pub_style() # manuscript ``` ### Load data Dynamo relies on [anndata](https://anndata.readthedocs.io/en/latest/index.html) for data IO. You can read your own data via `read`, `read_loom`, `read_h5ad`, `read_h5` or `load_NASC_seq`, etc: ```python adata = dyn.read(filename) ``` Dynamo also comes with a few built-in sample datasets so you can familiarize with dynamo before analyzing your own dataset. For example, you can load the Dentate Gyrus example dataset: ```python adata = dyn.sample_data.DentateGyrus() ``` There are many sample datasets available. You can check other available datasets via `dyn.sample_data.*`. To process the scSLAM-seq data, please refer to the [NASC-seq analysis pipeline](https://github.com/sandberg-lab/NASC-seq). We are also working on a command line tool for this and will release it in due time. For processing splicing data, you can either use the [velocyto command line interface](http://velocyto.org/velocyto.py/tutorial/cli.html) or the [bustool from Pachter lab](http://pachterlab.github.io/kallistobus). ### Preprocess data After loading data, you are ready to perform some preprocessing. You can use the `Preprocessor` class that applies similar but a generalized strategy from [Monocle 3](https://cole-trapnell-lab.github.io/monocle3/) to normalize all datasets in different layers (the spliced and unspliced or new, i.e., metabolically labeled, and total mRNAs or others), followed by feature selection and PCA dimension reduction: ```python from dynamo.preprocessing import Preprocessor preprocessor = Preprocessor() preprocessor.preprocess_adata(adata, recipe="monocle") ``` ### Learn dynamics Next, you will want to estimate the kinetic parameters of expression dynamics and then learn the velocity values for all genes that pass some filters (selected feature genes, by default) across cells. The `dyn.tl.dynamics` does all the hard work for you: ```python dyn.tl.dynamics(adata) ``` implicitly calls `dyn.tl.moments` first ```python dyn.tl.moments(adata) ``` which calculates the first, second moments (and sometimes covariance between different layers) of the expression data. First / second moments are basically mean and uncentered variance of gene expression, which are calculated based on local smoothing via a nearest neighbors graph, constructed in the reduced PCA space from the spliced or total mRNA expression of single cells. And it then performs the following steps: - checks the data you have and determines the experimental type automatically, either the conventional scRNA-seq, `kinetics`, `degradation` or `one-shot` single-cell metabolic labeling experiment or the `CITE-seq` or `REAP-seq` co-assay, etc. - learns the velocity for each feature gene using either the original deterministic model based on a steady-state assumption from the seminal RNA velocity work or a few new methods, including the `stochastic` (default) or `negative binomial method` for conventional scRNA-seq or `kinetic`, `degradation` or `one-shot` models for metabolic labeling based scRNA-seq. Those later methods are based on moment equations. All those methods use all or part of the output from `dyn.tl.moments(adata)`. Kinetic estimation of the conventional scRNA-seq and metabolic labeling based scRNA-seq is often tricky and has a lot of pitfalls. Sometimes you may even observe undesired backward vector flow. You can evaluate the confidence of gene-wise velocity via: ```python dyn.tl.gene_wise_confidence(adata, group='group', lineage_dict={'Progenitor': ['terminal_cell_state']}) ``` and filter those low-confidence genes for downstream Velocity vectors analysis, etc (See more details in FAQ). ### Dimension reduction By default, we use the `umap` algorithm for dimension reduction. ```python dyn.tl.reduceDimension(adata) ``` If the requested reduced dimension already exists, dynamo won't touch it unless you set `enforce=True`. ```python dyn.tl.reduceDimension(adata, basis='umap', enforce=True) ``` ### Velocity vectors We need to project the velocity vector onto low-dimensional embedding for later visualization. To get there, we can either use the default `correlation/cosine kernel` or the novel Itô kernel from us. ```python dyn.tl.cell_velocities(adata) ``` The above function projects and evaluates velocity vectors on `umap` space, but you can also operate them on other bases, for example, `pca` space: ```python dyn.tl.cell_velocities(adata, basis='pca') ``` You can check the confidence of cell-wise velocity to understand how reliable the recovered velocity is across cells via: ```python dyn.tl.cell_wise_confidence(adata) ``` Obviously, dynamo doesn't stop here. The really exciting part of dynamo lies in the fact that it learns a `functional form of vector field` in the full transcriptomic space, which can be then used to predict cell fate and map single cell potential landscape. ### Vector field reconstruction In classical physics, including fluidics and aerodynamics, velocity and acceleration vector fields are used as fundamental tools to describe motion or external force of objects, respectively. In analogy, RNA velocity or protein accelerations estimated from single cells can be regarded as sparse samples in the velocity (La Manno et al. 2018) or acceleration vector field (Gorin, Svensson, and Pachter 2019) that defined on the gene expression space. In general, a vector field can be defined as a vector-valued function f that maps any points (or cells’ expression state) x in a domain Ω with D dimension (or the gene expression system with D transcripts / proteins) to a vector y (for example, the velocity or acceleration for different genes or proteins), that is f(x) = y. To formally define the problem of velocity vector field learning, we consider a set of measured cells with pairs of current and estimated future expression states. The difference between the predicted future state and current state for each cell corresponds to the velocity vector. We note that the measured single-cell velocity (conventional RNA velocity) is sampled from a smooth, differentiable vector field f that maps from xi to yi on the entire domain. Normally, single-cell velocity measurements are results of biased, noisy, and sparse sampling of the entire state space, thus the goal of velocity vector field reconstruction is to robustly learn a mapping function f that outputs yj given any point xj on the domain based on the observed data with certain smoothness constraints (Jiayi Ma et al. 2013). Under ideal scenario, the mapping function f should recover the true velocity vector field on the entire domain and predict the true dynamics in regions of expression space that are not sampled. To reconstruct vector field function in dynamo, you can simply use the following function to do all the heavy-lifting: ```python dyn.vf.VectorField(adata) ``` By default, it learns the vector field in the `pca` space, but you can of course learn it in any space or even the original gene expression space. ### Characterize vector field topology Since we learn the vector field function of the data, we can then characterize the topology of the full vector field space. For example, we are able to identify - the fixed points (attractor/saddles, etc.) which may correspond to terminal cell types or progenitors; - nullcline, separatrices of a recovered dynamic system, which may formally define the dynamical behavior or the boundary of cell types in gene expression space. Again, you only need to simply run the following function to get all that information. ```python dyn.vf.topography(adata, basis='umap') ``` ### Predict cell fate Cell fate prediction is a crucial problem in single cell analysis. With the continuous vector field function learned, Dynamo is able to calculate the historical and future cell states over arbitrary time scales. You can use the following function to predict the cell fate on given initial cells. ```python dyn.pd.fate(adata, init_cells) ``` ### Map potential landscape The concept of potential landscape is widely appreciated across various biological disciplines, for example, the adaptive landscape in population genetics, protein-folding funnel landscape in biochemistry, epigenetic landscape in developmental biology. In the context of cell fate transition, for example, differentiation, carcinogenesis, etc., a potential landscape will not only offer an intuitive description of the global dynamics of the biological process but also provides key insights to understand the multi-stability and transition rate between different cell types as well as to quantify the optimal path of cell fate transition. Because the classical definition of potential function in physics requires gradient systems (no `curl` or cycling dynamics), which is often not applicable to open biological systems. In dynamo, we provided several ways to quantify the potential of single cells by decomposing the vector field into gradient, curl parts, etc. The recommended method is built on the Hodge decomposition on simplicial complexes (a sparse directional graph) constructed based on the learned vector field function that provides fruitful analogy of gradient, curl, and harmonic (cyclic) flows on the manifold: ```python dyn.ext.ddhodge(adata) ``` In addition, we and others proposed various strategies to decompose the `stochastic differential equations` into either the gradient or the curl component from first principles. We then can use the gradient part to define the potential. Although an analytical decomposition on the reconstructed vector field is challenging, we are able to use a numerical algorithm we recently developed for our purpose. This approach uses a least action method under the A-type stochastic integration (Shi et al. 2012) to globally map the potential landscape Ψ(x) (Tang et al. 2017) by taking the vector field function f(x) as input. ```python dyn.vf.Potential(adata) ``` ### Visualization In two or three dimensions, a streamline plot can be used to visualize the paths of cells will follow if released in different regions of the gene expression state space under a steady flow field. Although we currently do not support this, for vector fields that change over time, similar methods, for example, streakline, pathline, timeline, etc., can also be used to visualize the evolution of single cells or cell populations. In dynamo, we have three standard visual representations of vector fields, including the `cell wise`, `grid` quiver plots, and the `streamline plot`. Another intuitive way to visualize the structure of vector field is the so-called line integral convolution method or LIC (Cabral and Leedom 1993), which works by adding random black-and-white paint sources on the vector field and letting the flowing particles on the vector field picking up some texture to ensure points on the same streamline having similar intensity. We rely on the yt's `annotate_line_integral_convolution` function to visualize the LIC vector field reconstructed from dynamo. ```python dyn.pl.cell_wise_vectors(adata, color=colors, ncols=3) dyn.pl.grid_vectors(adata, color=colors, ncols=3) dyn.pl.streamline_plot(adata, color=colors, ncols=3) dyn.pl.line_integral_conv(adata) ``` Note that `colors` here is a list or str that can be either the column name in `.obs` or `gene names`. To visualize the topological structure of the reconstructed 2D vector fields, we provide the `dyn.pl.topography` function in dynamo. ```python dyn.vf.VectorField(adata, basis='umap') dyn.pl.topography(adata) ``` Plotting functions in dynamo are designed to be extremely flexible. For example, you can combine different types of dynamo plots together (when you visualize only one item for each plot function). ```python import matplotlib.pyplot as plt fig1, f1_axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True, figsize=(12, 10)) f1_axes[0, 0] = dyn.pl.cell_wise_vectors(adata, color='umap_ddhodge_potential', pointsize=0.1, alpha=0.7, ax=f1_axes[0, 0], quiver_length=6, quiver_size=6, save_show_or_return='return') f1_axes[0, 1] = dyn.pl.grid_vectors(adata, color='speed_umap', ax=f1_axes[0, 1], quiver_length=12, quiver_size=12, save_show_or_return='return') f1_axes[1, 0] = dyn.pl.streamline_plot(adata, color='divergence_pca', ax=f1_axes[1, 0], save_show_or_return='return') f1_axes[1, 1] = dyn.pl.topography(adata, color='acceleration_umap', ax=f1_axes[1, 1], save_show_or_return='return') plt.show() ``` The above creates a 2x2 plot that puts `cell_wise_vectors`, `grid_vectors`, `streamline_plot`, and `topography` plots together. Last but not least, Dynamo also provides functions to create a movie of cell fate predictions. The animation needs predicted cell fate information in your dataset and requires a topography plot as the foundation. ```python from matplotlib import animation fig, ax = plt.subplots() ax = dyn.pl.topography(adata, ax=ax) instance = dyn.mv.StreamFuncAnim(adata=adata, ax=ax) anim = animation.FuncAnimation(instance.fig, instance.update, init_func=instance.init_background) ``` ## Compatibility Dynamo is fully compatible with velocyto, scanpy, and scvelo. So you can use your loom or annadata object as input for dynamo. The velocity vector samples estimated from either velocyto or scvelo can be also directly used to reconstruct the functional form of vector field and to map the potential landscape in the entire expression space. ``` This markdown version uses headings, code blocks, and lists to organize the information, making it easier to read and navigate.