Pancreatic endocrinogenesis differential geometry

In this tutorial, we will cover following topics:

  • learn contionus RNA velocity vector field functions in different spaces (e.g. umap or pca space)

  • rank genes based on the jacobian tensor

  • build and visualize gene regulatory network with top ranked genes

  • gene enrichment analyses of top ranked genes

  • visualize Jacobian derived regulatory interactions across cells

[1]:
# from IPython.core.display import display, HTML
# display(HTML("<style>.container { width:90% !important; }</style>"))
# %matplotlib inline
import warnings

warnings.filterwarnings("ignore")

import dynamo as dyn

|-----> setting visualization default mode in dynamo. Your customized matplotlib settings might be overritten.

Showing package dependecies which may help you debug dependency related bugs

[2]:
dyn.get_all_dependencies_version()

package dynamo-release pre-commit colorcet cvxopt loompy matplotlib networkx numba numdifftools numpy pandas pynndescent python-igraph scikit-learn scipy seaborn setuptools statsmodels tqdm umap-learn
version 1.0.0 2.16.0 2.0.6 1.2.7 3.0.6 3.4.3 2.6.3 0.54.1 0.9.40 1.20.3 1.3.4 0.5.5 0.9.8 1.0.2 1.8.0 0.11.2 58.0.4 0.13.2 4.62.3 0.5.2

Loading pancreatic endocrinogenesis dataset via dyn.sample_data

[3]:
dyn.configuration.set_figure_params("dynamo", background="white")
adata = dyn.sample_data.pancreatic_endocrinogenesis()

|-----> Downloading data to ./data/endocrinogenesis_day15.h5ad
|-----> [download] in progress: 100.0000%
|-----> [download] finished [5.4869s]

Defining pancreas_genes list, which we will investigate in the differential geometry analysis later in this notebook.

[4]:
pancreas_genes = [
    "Hes1",
    "Nkx6-1",
    "Nkx2-2",
    "Neurog3",
    "Neurod1",
    "Pax4",
    "Pax6",
    "Arx",
    "Pdx1",
    "Ins1",
    "Ins2",
    "Ghrl",
    "Ptf1a",
    "Iapp",
    "Isl1",
    "Sox9",
    "Gcg",
]

Proprocessing and learn RNA velocities

Normalizing adata with monocle recipe, reducing its dimension via PCA and computing velocities in PCA space

[5]:
dyn.pp.recipe_monocle(adata, n_top_genes=4000, fg_kwargs={"shared_count": 20}, genes_to_append=pancreas_genes)
dyn.tl.dynamics(adata, model="stochastic")
dyn.tl.reduceDimension(adata, n_pca_components=30)
dyn.tl.cell_velocities(adata, method="pearson", other_kernels_dict={"transform": "sqrt"})
dyn.tl.cell_velocities(adata, basis="pca")

|-----> recipe_monocle_keep_filtered_cells_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_cells_key=True
|-----> recipe_monocle_keep_filtered_genes_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_genes_key=True
|-----> recipe_monocle_keep_raw_layers_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_raw_layers_key=True
|-----> apply Monocole recipe to adata...
|-----> <insert> pp to uns in AnnData Object.
|-----------> <insert> has_splicing to uns['pp'] in AnnData Object.
|-----------> <insert> has_labling to uns['pp'] in AnnData Object.
|-----------> <insert> splicing_labeling to uns['pp'] in AnnData Object.
|-----------> <insert> has_protein to uns['pp'] in AnnData Object.
|-----> ensure all cell and variable names unique.
|-----> ensure all data in different layers in csr sparse matrix format.
|-----> ensure all labeling data properly collapased
|-----------> <insert> tkey to uns['pp'] in AnnData Object.
|-----------> <insert> experiment_type to uns['pp'] in AnnData Object.
|-----> filtering cells...
|-----> <insert> pass_basic_filter to obs in AnnData Object.
|-----> 3696 cells passed basic filters.
|-----> filtering gene...
|-----> <insert> pass_basic_filter to var in AnnData Object.
|-----> 6956 genes passed basic filters.
|-----> calculating size factor...
|-----> selecting genes in layer: X, sort method: SVR...
|-----> <insert> frac to var in AnnData Object.
|-----> size factor normalizing the data, followed by log1p transformation.
|-----> Set <adata.X> to normalized data
|-----> applying PCA ...
|-----> <insert> pca_fit to uns in AnnData Object.
|-----> <insert> ntr to obs in AnnData Object.
|-----> <insert> ntr to var in AnnData Object.
|-----> cell cycle scoring...
|-----> computing cell phase...
|-----> [cell phase estimation] in progress: 100.0000%
|-----> [cell phase estimation] finished [11.3934s]
|-----> <insert> cell_cycle_phase to obs in AnnData Object.
|-----> <insert> cell_cycle_scores to obsm in AnnData Object.
|-----> [Cell Cycle Scores Estimation] in progress: 100.0000%
|-----> [Cell Cycle Scores Estimation] finished [0.2938s]
|-----> [recipe_monocle preprocess] in progress: 100.0000%
|-----> [recipe_monocle preprocess] finished [5.2262s]
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False
|-----------> removing existing M layers:[]...
|-----------> making adata smooth...
|-----> calculating first/second moments...
OMP: Info #273: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
|-----> [moments calculation] in progress: 100.0000%
|-----> [moments calculation] finished [17.8821s]
estimating gamma: 100%|██████████| 4000/4000 [02:28<00:00, 27.02it/s]
|-----> retrive data for non-linear dimension reduction...
|-----? adata already have basis umap. dimension reduction umap will be skipped!
set enforce=True to re-performing dimension reduction.
|-----> [dimension_reduction projection] in progress: 100.0000%
|-----> [dimension_reduction projection] finished [0.0007s]
|-----> 0 genes are removed because of nan velocity values.
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] in progress: 100.0000%
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] finished [4.4131s]
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%
|-----> [projecting velocity vector to low dimensional embedding] finished [0.6581s]
|-----> 0 genes are removed because of nan velocity values.
Using existing pearson_transition_matrix found in .obsp.
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%
|-----> [projecting velocity vector to low dimensional embedding] finished [0.7268s]
[5]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'ntr', 'cell_cycle_phase'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_m', 'log_cv', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca'
    layers: 'spliced', 'unspliced', 'X_spliced', 'X_unspliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix'

Visualize preprocessing results with a streamline plot on the umap basis.

[6]:
dyn.pl.streamline_plot(adata, color=["clusters"], basis="umap", show_legend="on data", show_arrowed_spines=True)
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_12_1.png

Visualizing cell velocity vectors via dyn.pl.cell_wise_vectors

[7]:
dyn.pl.cell_wise_vectors(
    adata,
    color=["clusters"],
    basis="umap",
    show_legend="on data",
    quiver_length=6,
    quiver_size=6,
    figsize=(8, 6),
    show_arrowed_spines=False,
)

|-----> X shape: (3696, 2) V shape: (3696, 2)
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_14_1.png

Differential geometry analyses reveal dynamics and key gene interactions

The diagram below illustrates the differentiation process of pancreatic endocrine cells and key regulatory genes/motifs.

Diagram

Define the progenitor cells: in this pancreatic endocrinogenesis dataset, we treat Doctal cells as progenitor cells.

[8]:
progenitor = adata.obs_names[adata.obs.clusters.isin(["Ductal"])]
len(progenitor)
[8]:
916
Next we show how to use dynamo to learn a vectorfield. Here we learn the vectorfield on the pca basis. The vector field can be learned on the low dimensional embedding and can be then projected back to the high dimensional space.
Since curl is defined in 2d/3d space, we use umap vectorfield to compute curl. umap and pca basis vectorfields will not interfere with each other (prefixed pca or umap) as you can check computation results in adata. You may try other embeddings such as PCA with fewer principe components to compute curl.

Learning dynamics in PCA space via dyn.vf.VectorField and compute dynamics

[9]:
# dyn.tl.cell_velocities(adata, basis="pca")
dyn.vf.VectorField(adata, basis="pca", pot_curl_div=True)
dyn.vf.VectorField(adata, basis="umap", pot_curl_div=True)

dyn.vf.speed(adata, basis="pca")
dyn.vf.divergence(adata, basis="pca")
dyn.vf.acceleration(adata, basis="pca")
dyn.vf.curl(adata, basis="umap")

|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: PCA.
        Vector field will be learned in the PCA space.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.3509s]
|-----> <insert> velocity_pca_SparseVFC to obsm in AnnData Object.
|-----> <insert> X_pca_SparseVFC to obsm in AnnData Object.
|-----> <insert> VecFld_pca to uns in AnnData Object.
|-----> Running ddhodge to estimate vector field based pseudotime in pca basis...
|-----> graphizing vectorfield...
|-----> incomplete neighbor graph info detected: connectivities and distances do not exist in adata.obsp, indices not in adata.uns.neighbors.
|-----> Neighbor graph is broken, recomputing....
|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
|-----> <insert> connectivities to obsp in AnnData Object.
|-----> <insert> distances to obsp in AnnData Object.
|-----> <insert> neighbors to uns in AnnData Object.
|-----> <insert> neighbors.indices to uns in AnnData Object.
|-----> <insert> neighbors.params to uns in AnnData Object.
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] in progress: 100.0000%
|-----> [ddhodge completed] finished [19.4571s]
|-----> Computing divergence...
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 11.73it/s]
|-----> <insert> control_point_pca to obs in AnnData Object.
|-----> <insert> inlier_prob_pca to obs in AnnData Object.

|-----> <insert> obs_vf_angle_pca to obs in AnnData Object.
|-----> [VectorField] in progress: 100.0000%
|-----> [VectorField] finished [20.2223s]
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: UMAP.
        Vector field will be learned in the UMAP space.
|-----> Generating high dimensional grids and convert into a row matrix.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] in progress: 100.0000%
|-----> [SparseVFC] finished [0.9759s]
|-----> <insert> velocity_umap_SparseVFC to obsm in AnnData Object.
|-----> <insert> X_umap_SparseVFC to obsm in AnnData Object.
|-----> <insert> VecFld_umap to uns in AnnData Object.
|-----> Running ddhodge to estimate vector field based pseudotime in umap basis...
|-----> graphizing vectorfield...
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] in progress: 100.0000%
|-----> [ddhodge completed] finished [20.8427s]
|-----> Computing curl...
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 27838.20it/s]
|-----> Computing divergence...

Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 33.65it/s]
|-----> <insert> control_point_umap to obs in AnnData Object.
|-----> <insert> inlier_prob_umap to obs in AnnData Object.
|-----> <insert> obs_vf_angle_umap to obs in AnnData Object.
|-----> [VectorField] in progress: 100.0000%
|-----> [VectorField] finished [22.1098s]

Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 11.50it/s]
|-----> [Calculating acceleration] in progress: 100.0000%
|-----> [Calculating acceleration] finished [0.1408s]
|-----> <insert> acceleration to layers in AnnData Object.
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 28036.52it/s]

The acceleration and divergence accurately highlight hotspots, including a saddle point in ductal cells (negative divergence), exit from this state to early endocrine progenitors (high positive acceleration), the bifurcation point for late progenitors to differentiate into stable cell types (high acceleration and positive divergence), and stable cell types (negative divergence).

[10]:
dyn.configuration.set_figure_params("dynamo", background="black")

import matplotlib.pyplot as plt

fig1, fig1_axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True, figsize=(12, 8))
dyn.pl.cell_wise_vectors(
    adata,
    color="pca_ddhodge_potential",
    pointsize=0.1,
    alpha=0.7,
    ax=fig1_axes[0, 0],
    quiver_length=6,
    quiver_size=6,
    save_show_or_return="return",
    background="black",
)
dyn.pl.grid_vectors(
    adata,
    color="speed_pca",
    basis="umap",
    ax=fig1_axes[0, 1],
    quiver_length=12,
    quiver_size=12,
    save_show_or_return="return",
    background="black",
)
dyn.pl.streamline_plot(
    adata, color="divergence_pca", basis="umap", ax=fig1_axes[1, 0], save_show_or_return="return", background="black"
)
dyn.pl.streamline_plot(
    adata, color="acceleration_pca", basis="umap", ax=fig1_axes[1, 1], save_show_or_return="return", background="black"
)
plt.show()

|-----> X shape: (3696, 2) V shape: (3696, 2)
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_23_1.png

Visualizing vectorfield attractors and saddle points via dyn.pl.topography

The following 2D UMAP vector field topology shows stable fixed points (attractors) in alpha, beta, and ductal cells, and a saddle point #6 at the branching point between ductal cells and early progenitors.

[11]:
dyn.pl.topography(
    adata, basis="umap", background="white", color=["clusters"], streamline_color="black", show_legend="on data"
)

|-----> Vector field for umap is but its topography is not mapped. Mapping topography now ...
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_25_1.png

From the results above, we can observe that high acceleration is observed at the interface between early and late progenitors (magnitude change in velocity), and the bifurcation point where progenitors differentiate into stable cell types (direction change in velocity). Negative divergence is observed at the saddle point and attractors, and positive divergence at the bifurcation point and cell cycle of pancreatic buds.

Constructing a cell type state transition graph

[12]:
dyn.configuration.set_figure_params("dynamo", background="white")
dyn.pd.state_graph(adata, group='clusters', basis='pca', method='vf')
dyn.pl.state_graph(adata,
                   color=['clusters'],
                   group='clusters',
                   basis='umap',
                   show_legend='on data',
                   method='vf');
|-----> Estimating the transition probability between cell types...
|-----> Applying vector field
|-----> [KDTree computation] in progress: 100.0000%in progress: 0.0000%
|-----> [KDTree computation] finished [0.0013s]
|-----> [iterate groups] in progress: 12.5000%
integration with ivp solver: 100%|██████████| 100/100 [00:03<00:00, 25.14it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 146.29it/s]
|-----> [iterate groups] in progress: 25.0000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.93it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 144.02it/s]
|-----> [iterate groups] in progress: 37.5000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 24.49it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 142.90it/s]
|-----> [iterate groups] in progress: 50.0000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.77it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 135.77it/s]
|-----> [iterate groups] in progress: 62.5000%
integration with ivp solver: 100%|██████████| 70/70 [00:02<00:00, 31.55it/s]
uniformly sampling points along a trajectory: 100%|██████████| 70/70 [00:00<00:00, 144.03it/s]
|-----> [iterate groups] in progress: 75.0000%
integration with ivp solver: 100%|██████████| 100/100 [00:03<00:00, 26.10it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 139.76it/s]
|-----> [iterate groups] in progress: 87.5000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 23.40it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 146.82it/s]
|-----> [iterate groups] in progress: 100.0000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 24.71it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 137.70it/s]
|-----> [iterate groups] in progress: 100.0000%
|-----> [iterate groups] finished [42.2887s]
|-----> [State graph estimation] in progress: 100.0000%
|-----> [State graph estimation] finished [0.0012s]
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
<Figure size 600x400 with 0 Axes>
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_28_18.png

Computing Jacobian and reveal regulator/effector gene relations via dyn.vf.jacobian

[13]:
dyn.vf.jacobian(adata, regulators=pancreas_genes)
Transforming subset Jacobian: 100%|██████████| 3696/3696 [00:00<00:00, 59324.06it/s]
[13]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_m', 'log_cv', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
    layers: 'spliced', 'unspliced', 'X_spliced', 'X_unspliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'

Pax4/Neurog3

Ngn3 activates Pax4 in progenitors, initiating pancreatic endocrinogenesis

[14]:
dyn.pl.jacobian(
    adata,
    basis="umap",
    regulators=[
        "Neurog3",
    ],
    effectors=["Pax4"],
    alpha=1,
)

../_images/notebooks_tutorial_pancreatic_endocrinogenesis_32_0.png

Pax4/Arx

Jacobian analyses reveal mutual inhibition of Pax4 and Arx at the bifurcation point in progenitors.

[15]:
dyn.pl.jacobian(adata, basis="umap", regulators=["Arx"], effectors=["Pax4"], alpha=1)

../_images/notebooks_tutorial_pancreatic_endocrinogenesis_34_0.png
[16]:
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Arx"], alpha=1)

../_images/notebooks_tutorial_pancreatic_endocrinogenesis_35_0.png

Pax4/Ins2

Pdx1 activates Ins2 in beta cells.

[17]:
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Ins2"], alpha=1)

../_images/notebooks_tutorial_pancreatic_endocrinogenesis_37_0.png
[18]:
adata.obs["clusters"].unique()

[18]:
['Pre-endocrine', 'Ductal', 'Alpha', 'Ngn3 high EP', 'Delta', 'Beta', 'Ngn3 low EP', 'Epsilon']
Categories (8, object): ['Ductal', 'Ngn3 low EP', 'Ngn3 high EP', 'Pre-endocrine', 'Beta', 'Alpha', 'Delta', 'Epsilon']

We can rank the elements in the Jacobian. There are 5 ways to rank the Jacobian:

  1. 'full_reg': top regulators are ranked for each effector for each cell group;

  2. 'full_eff': top effectors are ranked for each regulator for each cell group;

  3. reg’: top regulators in each cell group;

  4. eff’: top effectors in each cell group;

  5. int’: top effector-regulator pairs in each cell group.

The default mdoe is 'full_reg'. Note that in full_reg and full_eff modes, a dictionary mapping clusters (cell type) to ranking dataframes is returned by dyn.vf.rank_jacobian_genes.

[19]:
full_reg_rank = dyn.vf.rank_jacobian_genes(
    adata, groups="clusters", mode="full_reg", abs=True, output_values=True, return_df=True
)
full_eff_rank = dyn.vf.rank_jacobian_genes(
    adata, groups="clusters", mode="full_eff", abs=True, exclude_diagonal=True, output_values=True, return_df=True
)

[20]:
full_reg_rank['Pre-endocrine'][:5]
[20]:
Neurog3 Neurog3_values Sox9 Sox9_values Isl1 Isl1_values Hes1 Hes1_values Ins1 Ins1_values Gcg Gcg_values Neurod1 Neurod1_values Nkx2-2 Nkx2-2_values Ptf1a Ptf1a_values Pax6 Pax6_values Nkx6-1 Nkx6-1_values Pdx1 Pdx1_values Pax4 Pax4_values Ghrl Ghrl_values Iapp Iapp_values Ins2 Ins2_values Arx Arx_values
0 Neurog3 0.000569 Gcg 0.000232 Gcg 0.000681 Gcg 0.000089 Gcg 0.000504 Gcg 0.001130 Neurog3 0.000081 Ghrl 0.000043 Gcg 0.000019 Gcg 0.000283 Isl1 0.000492 Isl1 0.000357 Neurog3 0.000335 Ghrl 0.002203 Iapp 0.001547 Gcg 0.000455 Isl1 0.000319
1 Iapp 0.000453 Iapp 0.000186 Isl1 0.000663 Iapp 0.000076 Iapp 0.000396 Ghrl 0.000568 Iapp 0.000070 Iapp 0.000032 Ghrl 0.000016 Isl1 0.000251 Ghrl 0.000390 Pdx1 0.000329 Isl1 0.000251 Isl1 0.000795 Isl1 0.001015 Isl1 0.000356 Arx 0.000288
2 Gcg 0.000402 Neurog3 0.000115 Nkx6-1 0.000399 Isl1 0.000063 Ins1 0.000365 Isl1 0.000251 Isl1 0.000062 Isl1 0.000026 Iapp 0.000015 Neurog3 0.000224 Neurog3 0.000370 Arx 0.000306 Iapp 0.000233 Pax4 0.000493 Gcg 0.000834 Ins1 0.000354 Pdx1 0.000281
3 Ghrl 0.000382 Ghrl 0.000096 Neurog3 0.000368 Neurog3 0.000049 Ghrl 0.000324 Arx 0.000200 Ins1 0.000047 Gcg 0.000025 Pdx1 0.000013 Ins1 0.000218 Arx 0.000251 Iapp 0.000266 Ghrl 0.000222 Iapp 0.000369 Pdx1 0.000772 Iapp 0.000301 Iapp 0.000200
4 Isl1 0.000366 Isl1 0.000089 Iapp 0.000330 Ghrl 0.000044 Isl1 0.000320 Neurog3 0.000194 Pdx1 0.000041 Neurog3 0.000023 Nkx6-1 0.000012 Iapp 0.000197 Gcg 0.000245 Ghrl 0.000244 Pax4 0.000179 Nkx6-1 0.000361 Neurog3 0.000430 Ins2 0.000271 Ghrl 0.000188

Arcplot shows gene regulatory networks in progenitors and beta cells

Next we use dyn.vf.build_network_per_cluster to build an cluster2edges dictionary which maps cluster name (celltype) to dataframes containing edge information that is necessary to build a networkx graph object. Then we can visualize regulatory networks via dyn.pl.arcPlot.

An example regulatory network in progenitor (Ductal) cells

[21]:
cell_type_regulators = ["Neurog3", "Arx", "Pax4", "Pax6", "Gcg", "Ins2", "Ss5", "Ghri"]
cluster2edges = dyn.vf.build_network_per_cluster(
    adata,
    cluster="clusters",
    cluster_names=None,
    full_reg_rank=full_reg_rank,
    full_eff_rank=full_eff_rank,
    genes=cell_type_regulators,
    n_top_genes=100,
)

import networkx as nx

network = nx.from_pandas_edgelist(
    cluster2edges["Ductal"], "regulator", "target", edge_attr="weight", create_using=nx.DiGraph()
)

ax = dyn.pl.arcPlot(
    adata, cluster="clusters", cluster_name="Beta", edges_list=None, network=network, color=None
)  # color="M_s")

|-----> [iterating reg_groups] in progress: 100.0000%
|-----> [iterating reg_groups] finished [0.1333s]
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_45_1.png

Another example regulatory network in Beta cells

[22]:
cell_type_regulators = ["Pax6", "Pdx1", "Iapp", "Ins2"]
cluster2edges = dyn.vf.build_network_per_cluster(
    adata,
    cluster="clusters",
    cluster_names=None,
    full_reg_rank=full_reg_rank,
    full_eff_rank=full_eff_rank,
    genes=cell_type_regulators,
    n_top_genes=100,
)

import networkx as nx

network = nx.from_pandas_edgelist(
    cluster2edges["Beta"], "regulator", "target", edge_attr="weight", create_using=nx.DiGraph()
)

ax = dyn.pl.arcPlot(
    adata, cluster="clusters", cluster_name="Beta", edges_list=None, network=network, color=None
)  # color="M_s")

|-----> [iterating reg_groups] in progress: 100.0000%
|-----> [iterating reg_groups] finished [0.1114s]
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_47_1.png

Acceleration Ranking

The acceleration can be computed based on the Jacobian. The acceleration indicates the rate of change in the velocity for a gene.

[23]:
dyn.vf.acceleration(adata, basis='pca')
dyn.vf.rank_acceleration_genes(adata, groups='clusters')
|-----> [Calculating acceleration] in progress: 100.0000%
|-----> [Calculating acceleration] finished [0.1464s]
|-----> <insert> acceleration to layers in AnnData Object.
[23]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_m', 'log_cv', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca', 'rank_acceleration', 'rank_abs_acceleration'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
    layers: 'spliced', 'unspliced', 'X_spliced', 'X_unspliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'
[24]:
adata.uns['rank_acceleration'][:5]
[24]:
Alpha Beta Delta Ductal Epsilon Ngn3 high EP Ngn3 low EP Pre-endocrine
0 Ghrl Ins1 Ins1 Cdc20 Tmsb4x Chga Neurog3 Tmsb4x
1 Cck Ins2 Cck Ccnb2 Gcg Chgb Tmsb4x Pyy
2 Mdk Xist Cdkn1a Hmgb2 Nkx6-1 Tm4sf4 Btbd17 Iapp
3 Rbp4 Krt8 Ins2 Hist1h2bc Rps19 Akr1c19 Mdk Calr
4 Serpina1c Gpx3 Krt7 Birc5 Tmsb10 Cryba2 Btg2 Ppp1r14a

Enrichment Analysis

Enrichment analysis allows us to see if top ranking genes are significantly enriched in certain biological pathways. For example, when we pick the top 100 accelerating genes in Ductal cells, we found that they are highly enriched in cell cycle related pathways.

[25]:
enr = dyn.ext.enrichr(adata.uns['rank_acceleration']['Ductal'][:100].to_list(), organism='mouse', outdir='./enrichr')
enr.results.head(5)
[25]:
Gene_set Term Overlap P-value Adjusted P-value Old P-value Old Adjusted P-value Odds Ratio Combined Score Genes
0 GO_Biological_Process_2018 regulation of mitotic cell cycle phase transit... 16/184 9.341163e-16 7.622389e-13 0 0 22.371882 774.222174 HSP90AA1;ANAPC16;NDE1;BUB1B;HMMR;DYNLL1;AURKA;...
1 GO_Biological_Process_2018 mitotic sister chromatid segregation (GO:0000070) 11/82 2.897512e-13 1.182185e-10 0 0 34.517962 996.525054 CENPE;CCNB1;KIFC1;PRC1;NUSAP1;CDCA8;KIF23;KIF2...
2 GO_Biological_Process_2018 mitotic spindle organization (GO:0007052) 9/74 1.142841e-10 3.108527e-08 0 0 30.180051 690.891800 CENPE;TPX2;CCNB1;KIFC1;PRC1;BIRC5;KIF23;AURKA;...
3 GO_Biological_Process_2018 metaphase plate congression (GO:0051310) 7/44 2.078922e-09 4.241002e-07 0 0 40.407149 807.796133 CENPE;CCNB1;CENPF;KIFC1;CDCA8;KIF2C;KIF22
4 GO_Biological_Process_2018 establishment of chromosome localization (GO:0... 5/13 3.521204e-09 5.746605e-07 0 0 130.868421 2547.283519 CENPE;CENPF;NDE1;KIF2C;KIF22
[26]:
from gseapy.plot import barplot, dotplot
barplot(enr.res2d, title='GO_Biological_Process_2018', cutoff=0.05)
dotplot(enr.res2d, title='KEGG_2016',cmap='viridis_r', cutoff=0.05)
[26]:
<AxesSubplot:title={'center':'KEGG_2016'}, xlabel='-log$_{10}$(Adjusted P-value)'>
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_55_1.png
../_images/notebooks_tutorial_pancreatic_endocrinogenesis_55_2.png
[27]:
enr = dyn.ext.enrichr(adata.uns['rank_acceleration']['Ngn3 low EP'][:100].to_list(), organism='mouse', outdir='./enrichr')
enr.results.head(5)
2022-03-12 13:48:23,207 Warning: No enrich terms using library GO_Biological_Process_2018 when cutoff = 0.05
[27]:
Gene_set Term Overlap P-value Adjusted P-value Old P-value Old Adjusted P-value Odds Ratio Combined Score Genes
0 GO_Biological_Process_2018 purine ribonucleotide metabolic process (GO:00... 4/66 0.000334 0.159565 0 0 13.331989 106.701278 TPST2;SULT2B1;ACLY;HSPA8
1 GO_Biological_Process_2018 oxoacid metabolic process (GO:0043436) 3/32 0.000541 0.159565 0 0 21.191966 159.391414 TPST2;SULT2B1;SULF2
2 GO_Biological_Process_2018 bone development (GO:0060348) 3/34 0.000648 0.159565 0 0 19.822747 145.516274 GNAS;PPIB;SULF2
3 GO_Biological_Process_2018 regulation of protein dephosphorylation (GO:00... 3/40 0.001048 0.159565 0 0 16.603232 113.917477 PPP1R14A;PPP1R14B;CALM2
4 GO_Biological_Process_2018 positive regulation of nuclear-transcribed mRN... 2/11 0.001322 0.159565 0 0 45.104308 298.984305 BTG2;TOB1