# in silico perturbation

$$\newcommand{\pdv}[2]{\dfrac{\partial #1}{\partial #2}} \newcommand{\trp}{\mathsf{T}}$$

We leverage the analytical Jacobian of the reconstructed vector field function to make $$\textit{in silico}$$ genetic perturbations (left panel in this figure.) and predict cell-fate outcomes after the perturbation (right panel in this figure.).

Intuitively, to simulate the genetic perturbation effects, we will introduce genetic perturbations to the system (encoded by the perturbation vector) and then let the perturbations propogate in the gene regulatory network (encoded by the Jacobian matrix) to execute downstream responses. Mathematically, for gene $$i$$ in any cell, the genetic perturbation effects or changes in its velocity (or more accurately, the vector field) w.r.t. to small perturbations in the expression of all genes in the network (encoded by the Jacobian matrix $$\boldsymbol J$$), $$\mathrm dx_1$$, $$\mathrm dx_2$$,…, $$\mathrm dx_n$$, can be calculated with the $$\textit{exact differential}$$:

\begin{align*} \mathrm d f_i = \pdv{f_i}{x_1}\mathrm dx_1 + \pdv{f_i}{x_2}\mathrm dx_2 + ... + \pdv{f_i}{x_n}\mathrm dx_n. \end{align*}

In vectorized form:

\begin{split}\begin{align*} \begin{bmatrix} \mathrm df_1 \\[1.5ex] \mathrm df_2 \\[1.5ex] \dots \\[1.5ex] \mathrm df_n \end{bmatrix} = \begin{bmatrix} \pdv{f_1}{x_1} \ &\pdv{f_1}{x_2} \ &\dots \ &\pdv{f_1}{x_n} \\[2ex] \pdv{f_2}{x_1} \ &\pdv{f_2}{x_2} \ &\dots \ &\pdv{f_2}{x_n} \\[2ex] \dots \ &\dots \ &\dots \ &\dots \\[2ex] \pdv{f_n}{x_1} \ &\pdv{f_n}{x_2} \ &\dots \ &\pdv{f_n}{x_n} \end{bmatrix} \begin{bmatrix} \mathrm dx_1 \\[1.5ex] \mathrm dx_2 \\[1.5ex] \dots \\[1.5ex] \mathrm dx_n \end{bmatrix}. \end{align*}\end{split}

The matrix on the right hand side is the Jacobian of the vector field. Replacing infinitesimal changes with finite perturbations, the above equation becomes:

\begin{align*} \Delta \boldsymbol f = \boldsymbol J \Delta \boldsymbol x. \end{align*}

In practice, a proportionality constant $$c$$ (i.e. setting a perturbation to be 100 or -100) is often added to the perturbation $$\Delta \boldsymbol x$$ to amplify the response $$\Delta \boldsymbol f$$. Furthermore, because vector fields are often learned in the PCA space, the perturbations in the $$d$$-dimensional gene space are first transformed to the $$k$$-dimensional PCA space by: \begin{align*} \Delta \boldsymbol x = \boldsymbol Q^\trp (\Delta \boldsymbol y - \boldsymbol \mu). \end{align*} where $$\boldsymbol Q$$ is the $$d$$-by-$$k$$ PCA loading matrix, and $$\boldsymbol \mu$$ is the mean of the PCA-transformed data. The response $$\Delta \boldsymbol f$$ can be transformed back to the PCA space: \begin{align*} \Delta \boldsymbol g = \boldsymbol Q \Delta \boldsymbol f + \boldsymbol \mu. \end{align*} One can then use $$\Delta \boldsymbol f$$, a gene by cell matrix, to identify the strongest positive or negative responders of the genetic perturbation across cells.

Importantly, because $$\Delta \boldsymbol f$$ implies how each cell state will be affected after genetic perturbations, we can predict the cell fate trajectory under genetic perturbations by integrating the perturbation effects across cells over gene expression space. To visualize the cell fate trajectory, pairs of $$\boldsymbol x$$ and $$\Delta \boldsymbol g$$ are used in the same vein as the gene expression and RNA velocity vector to be further projected onto the UMAP or other low dimensional embeddings using the transition matrix and then plotted with streamlines.