dynamo.pl.topography

dynamo.pl.topography(adata, basis='umap', fps_basis='umap', x=0, y=1, color='ntr', layer='X', highlights=None, labels=None, values=None, theme=None, cmap=None, color_key=None, color_key_cmap=None, background='white', ncols=4, pointsize=None, figsize=(6, 4), show_legend='on data', use_smoothed=True, xlim=None, ylim=None, t=None, terms=['streamline', 'fixed_points'], init_cells=None, init_states=None, quiver_source='raw', fate='both', approx=False, quiver_size=None, quiver_length=None, density=1, linewidth=1, streamline_color=None, streamline_alpha=0.4, color_start_points=None, markersize=200, marker_cmap=None, save_show_or_return='show', save_kwargs={}, aggregate=None, show_arrowed_spines=False, ax=None, sort='raw', frontier=False, s_kwargs_dict={}, q_kwargs_dict={}, n=25, **streamline_kwargs_dict)

Plot the streamlines, fixed points (attractors / saddles), nullclines, and separatrices of a recovered dynamical system for single cells. The plot is created in two-dimensional space.

The topography function plots the full vector field topology, including streamlines, fixed points, and characteristic lines. A key difference between dynamo and Velocyto or scVelo is that we learn a functional form of a vector field, which can be used to predict cell fate over arbitrary time and space. For states near observed cells, it retrieves the key kinetic dynamics from the observed data and smooths them. When the time and state are far from your observed single cell RNA-seq datasets, the accuracy of prediction will decay. The vector field can be efficiently reconstructed in high-dimensional space or in a lower-dimensional pca/umap space. Since we learn a vector field function, we can plot the full vector field via streamlines on the entire domain, as well as predict cell fates by providing a set of initial cell states (via init_cells, init_states). The nullclines and separatrices provide topological information about the reconstructed vector field. By definition, the x-nullcline (y-nullcline) is the set of points in the phase plane where dx/dt = 0 (dy/dt = 0). Geometrically, vectors on the x-nullcline point straight up or straight down, while vectors on the y-nullcline point straight left or straight right. Algebraically, for a system dx/dt = f(x, y), dy/dt = g(x, y), we find the x-nullcline by solving f(x, y) = 0 and the y-nullcline by solving g(x, y) = 0. The boundary between different attractor basins is the separatrix, because it separates the phase plane into subregions with distinct behavior. Finding separatrices is a very difficult problem, and the separatrices calculated by dynamo require manual inspection.
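
As a rough illustration of the nullcline definitions above, the sketch below uses a hand-written toy vector field (an assumption chosen for illustration, not a field learned by dynamo). The names f, g, and vector are hypothetical helpers. It checks that velocity vectors on the x-nullcline have no x component and that fixed points sit where the two nullclines cross.

```python
# Toy 2D vector field (an illustrative assumption, not a dynamo model):
#   dx/dt = f(x, y) = x - x**3
#   dy/dt = g(x, y) = -y
def f(x, y):
    return x - x**3


def g(x, y):
    return -y


def vector(x, y):
    """Velocity vector (dx/dt, dy/dt) at the state (x, y)."""
    return (f(x, y), g(x, y))


# x-nullcline: f(x, y) = 0, which holds for x in {-1, 0, 1} at any y.
x_nullcline_roots = [-1.0, 0.0, 1.0]
# y-nullcline: g(x, y) = 0, which holds for y = 0 at any x.
y_nullcline_value = 0.0

# On the x-nullcline the velocity has no x component (it points up or down).
on_x_nullcline = [vector(x0, y) for x0 in x_nullcline_roots for y in (-0.5, 0.5)]

# On the y-nullcline the velocity has no y component (it points left or right).
on_y_nullcline = [vector(x, y_nullcline_value) for x in (-0.5, 0.5, 2.0)]

# Fixed points are the intersections of the two nullclines.
fixed_points = [(x0, y_nullcline_value) for x0 in x_nullcline_roots]
```

For this toy field the nullclines are available in closed form. For a learned vector field they would generally have to be located numerically.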

Here are more details on the fixed points drawn on the vector field. Fixed points are a concept from dynamical systems theory. There are three types of fixed points: 1) repellers: repelling states that only have outflows, which may correspond to a pluripotent cell state (ESC) that tends to differentiate into other cell states automatically or under small perturbation; 2) saddle points (unstable fixed points): states that attract along some dimensions (genes or reduced dimensions) but diverge along at least one other dimension. Saddles may correspond to progenitors, which are differentiated from ESC/pluripotent cells and are relatively stable, but can further differentiate into multiple terminal cell types / states; 3) attractors (stable fixed points): states that only have inflows and attract all nearby cell states, which may correspond to stable cell types that can only be pushed out of their state under extreme perturbation or in very rare situations. Fixed points are numbered, and each number is color coded. The mapping from the color of the number to the type of fixed point is: red: repellers; blue: saddle points; black: attractors. The scatter point itself also has a fill color, which corresponds to the confidence of the estimated fixed point. The lighter the color, the higher the confidence, i.e., the closer the fixed point is to the sequenced single cells. The confidence of each fixed point can be used in conjunction with the Jacobian analysis for investigating regulatory networks with spatiotemporal resolution.
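
The three fixed point types can be illustrated with the standard linear stability criterion: the signs of the real parts of the Jacobian eigenvalues at the fixed point. The sketch below is a minimal pure-Python version for 2 x 2 Jacobians. It is not dynamo's implementation. The helper names (eigenvalues_2x2, classify_fixed_point, toy_jacobian) and the toy field dx/dt = x - x**3, dy/dt = -y are assumptions made for illustration. Only the label colors are taken from the text above.

```python
import cmath


def eigenvalues_2x2(jacobian):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]] from its trace and determinant."""
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)


def classify_fixed_point(jacobian, tol=1e-9):
    """Classify a 2D fixed point by the real parts of its Jacobian eigenvalues."""
    real_parts = [ev.real for ev in eigenvalues_2x2(jacobian)]
    if all(r < -tol for r in real_parts):
        return "attractor"  # only inflows
    if all(r > tol for r in real_parts):
        return "repeller"  # only outflows
    if min(real_parts) < -tol and max(real_parts) > tol:
        return "saddle"  # attracts along one direction, diverges along another
    return "undetermined"  # a real part is (numerically) zero


# Color of the fixed point number, as described in the text above.
label_color = {"repeller": "red", "saddle": "blue", "attractor": "black"}


def toy_jacobian(x, y):
    """Jacobian of the toy field dx/dt = x - x**3, dy/dt = -y (illustrative only)."""
    return [[1.0 - 3.0 * x**2, 0.0], [0.0, -1.0]]


toy_fixed_points = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
toy_types = {pt: classify_fixed_point(toy_jacobian(*pt)) for pt in toy_fixed_points}
```

In this toy system the two outer fixed points are attractors and the middle one is a saddle separating their basins.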

By default, we plot a figure with three subplots, each of which colors cells by potential, curl, or divergence. Potential is related to the intrinsic time: a small potential corresponds to a small intrinsic time and vice versa. Divergence can be used to indicate the state each cell is in. Negative values correspond to potential sinks while positive values correspond to potential sources. https://en.wikipedia.org/wiki/Divergence. Curl may be related to cell cycle or other cycling cell dynamics. In 2D, negative values correspond to clockwise rotation while positive values correspond to anticlockwise rotation. https://www.khanacademy.org/math/multivariable-calculus/greens-theorem-and-stokes-theorem/formal-definitions-of-divergence-and-curl/a/defining-curl In conjunction with the cell cycle score (dyn.pp.cell_cycle_scores), curl can be used to identify cells under active cell cycle progression.
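
The sign conventions for divergence and curl described above can be checked on simple hand-written fields. The following sketch estimates both quantities with central finite differences. The function divergence_and_curl and the four toy fields are hypothetical names introduced here for illustration. This is not how dynamo computes these quantities.

```python
def divergence_and_curl(field, x, y, h=1e-5):
    """Central-difference divergence and scalar 2D curl of field(x, y) -> (u, v)."""
    u_xp, v_xp = field(x + h, y)
    u_xm, v_xm = field(x - h, y)
    u_yp, v_yp = field(x, y + h)
    u_ym, v_ym = field(x, y - h)
    du_dx = (u_xp - u_xm) / (2 * h)
    dv_dx = (v_xp - v_xm) / (2 * h)
    du_dy = (u_yp - u_ym) / (2 * h)
    dv_dy = (v_yp - v_ym) / (2 * h)
    divergence = du_dx + dv_dy  # negative: sink, positive: source
    curl = dv_dx - du_dy  # negative: clockwise, positive: anticlockwise
    return divergence, curl


def sink(x, y):
    return (-x, -y)  # all vectors point toward the origin


def source(x, y):
    return (x, y)  # all vectors point away from the origin


def anticlockwise(x, y):
    return (-y, x)  # rigid anticlockwise rotation about the origin


def clockwise(x, y):
    return (y, -x)  # rigid clockwise rotation about the origin


div_sink, curl_sink = divergence_and_curl(sink, 0.3, -0.2)
div_source, curl_source = divergence_and_curl(source, 0.3, -0.2)
div_acw, curl_acw = divergence_and_curl(anticlockwise, 0.3, -0.2)
div_cw, curl_cw = divergence_and_curl(clockwise, 0.3, -0.2)
```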

Parameters:
  • adata (AnnData) – an AnnData object.

  • basis (str) – the reduced dimension stored in adata.obsm. The specific basis key will be constructed in the following priority if it exists: 1) specific layer input + basis 2) X_ + basis 3) basis. E.g. if basis is PCA, scatters will look for 1) spliced_pca, if the specified layer is spliced 2) X_pca (dynamo convention) 3) pca. Defaults to “umap”.

  • fps_basis (str) – the basis that will be used for identifying or retrieving fixed points. Note that if fps_basis is different from basis, the nearest cells of the fixed point from the fps_basis will be found and used to visualize the position of the fixed point on basis embedding. Defaults to “umap”.

  • x (int) – the column index of the low dimensional embedding for the x-axis. Defaults to 0.

  • y (int) – the column index of the low dimensional embedding for the y-axis. Defaults to 1.

  • color (str) – any column names or gene expression, etc. that will be used for coloring cells. Defaults to “ntr”.

  • layer (str) – the layer of data to use for the scatter plot. Defaults to “X”.

  • highlights (Optional[list]) – the color group that will be highlighted. If highlights is a list of lists, each list relates to one color element. Defaults to None.

  • labels (Optional[list]) – an array of labels (assumed integer or categorical), one for each data sample. This will be used for coloring the points in the plot according to their label. Note that this option is mutually exclusive to the values option. Defaults to None.

  • values (Optional[list]) – an array of values (assumed float or continuous), one for each sample. This will be used for coloring the points in the plot according to a colorscale associated to the total range of values. Note that this option is mutually exclusive to the labels option. Defaults to None.

  • theme (Optional[Literal['blue', 'red', 'green', 'inferno', 'fire', 'viridis', 'darkblue', 'darkred', 'darkgreen']]) – A color theme to use for plotting. A small set of predefined themes are provided which have relatively good aesthetics. Available themes are: {‘blue’, ‘red’, ‘green’, ‘inferno’, ‘fire’, ‘viridis’, ‘darkblue’, ‘darkred’, ‘darkgreen’}. Defaults to None.

  • cmap (Optional[str]) – The name of a matplotlib colormap to use for coloring or shading points. If no labels or values are passed this will be used for shading points according to density (largely only of relevance for very large datasets). If values are passed this will be used for shading according to the value. Note that if theme is passed then this value will be overridden by the corresponding option of the theme. Defaults to None.

  • color_key (Union[Dict[str, str], List[str], None]) – the method to assign colors to categoricals. This can either be an explicit dict mapping labels to colors (as strings of form ‘#RRGGBB’), or an array like object providing one color for each distinct category being provided in labels. Either way this mapping will be used to color points according to the label. Note that if theme is passed then this value will be overridden by the corresponding option of the theme. Defaults to None.

  • color_key_cmap (Optional[str]) – the name of a matplotlib colormap to use for categorical coloring. If an explicit color_key is not given a color mapping for categories can be generated from the label list and selecting a matching list of colors from the given colormap. Note that if theme is passed then this value will be overridden by the corresponding option of the theme. Defaults to None.

  • background (Optional[str]) – the color of the background. Usually this will be either ‘white’ or ‘black’, but any color name will work. Ideally one wants to match this appropriately to the colors being used for points etc. This is one of the things that themes handle for you. Note that if theme is passed then this value will be overridden by the corresponding option of the theme. Defaults to “white”.

  • ncols (int) – the number of columns for the figure. Defaults to 4.

  • pointsize (Optional[float]) – the scale of the point size. The actual point size is calculated as 500.0 / np.sqrt(adata.shape[0]) * pointsize. Defaults to None.

  • figsize (Tuple[float, float]) – the width and height of a figure. Defaults to (6, 4).

  • show_legend (str) – whether to display a legend of the labels. Defaults to “on data”.

  • use_smoothed (bool) – whether to use smoothed values (i.e. M_s / M_u instead of spliced / unspliced, etc.). Defaults to True.

  • xlim (Optional[ndarray]) – the range of x-coordinate. Defaults to None.

  • ylim (Optional[ndarray]) – the range of y-coordinate. Defaults to None.

  • t (Optional[ArrayLike]) – the length of the time period over which to predict cell states forward or backward in time. This is used by the odeint function. Defaults to None.

  • terms (List[str]) – a list of plotting items to include in the final topography figure. The supported items are ‘streamline’, ‘nullcline’, ‘fixed_points’, ‘separatrix’, ‘trajectory’, and ‘quiver’. Defaults to [“streamline”, “fixed_points”].

  • init_cells (Optional[List[int]]) – cell names or indices of the initial cell states for the historical or future cell state prediction with numerical integration. If the names in init_cells are not found in adata.obs_names, they will be treated as cell indices and must be integers. Defaults to None.

  • init_states (Optional[ndarray]) – the initial cell states for the historical or future cell state prediction with numerical integration. It can be either a one-dimensional array or an N x 2 array. If init_cells is not None, init_states will be replaced by the states defined by init_cells. Defaults to None.

  • quiver_source (Literal['raw', 'reconstructed']) – the data source that will be used to draw the quiver plot. If init_cells is provided, this will automatically be set to the projected RNA velocity before vector field reconstruction. If init_cells is not provided, this will automatically be set to the velocity vectors calculated from the reconstructed vector field function. If quiver_source is “reconstructed”, the velocity vectors calculated from the reconstructed vector field function will be used. Defaults to “raw”.

  • fate (Literal['history', 'future', 'both']) – whether to predict the historical cell fates, the future cell fates, or both. This corresponds to integrating the trajectory in the backward direction, the forward direction, or both directions, as defined by the reconstructed vector field function. Defaults to “both”.

  • approx (bool) – whether to use streamplot to draw the integration line from the init_state. Defaults to False.

  • quiver_size (Optional[float]) – the size of quiver. If None, quiver_size is set to 1. Note that quiver_size is used to calculate the head_width (10 x quiver_size), head_length (12 x quiver_size) and headaxislength (8 x quiver_size) of the quiver. This is done via the default_quiver_args function, which also calculates the scale of the quiver (1 / quiver_length). Defaults to None.

  • quiver_length (Optional[float]) – the length of quiver, which is used to calculate the scale of the quiver. Note that before applying default_quiver_args, velocity values are first rescaled via the quiver_autoscaler function. The scale of the quiver indicates the number of data units per arrow length unit, e.g., m/s per plot width; a smaller scale parameter makes the arrow longer. Defaults to None.

  • density (float) – the density of the plt.streamplot function. Defaults to 1.

  • linewidth (float) – the multiplier of automatically calculated linewidth passed to the plt.streamplot function. Defaults to 1.

  • streamline_color (Optional[str]) – the color of the vector field streamlines. Defaults to None.

  • streamline_alpha (float) – the alpha value applied to the vector field streamlines. Defaults to 0.4.

  • color_start_points (Optional[str]) – the color of the starting point that will be used to predict cell fates. Defaults to None.

  • markersize (float) – the size of the marker. Defaults to 200.

  • marker_cmap (Optional[str]) – the name of a matplotlib colormap to use for coloring or shading the confidence of fixed points. If None, the default color map will be set to viridis (inferno) when the background is white (black). Defaults to None.

  • save_show_or_return (Literal['save', 'show', 'return']) – whether to save, show or return the figure. Defaults to “show”.

  • save_kwargs (Dict[str, Any]) – a dictionary that will be passed to the save_show_ret function. By default, it is an empty dictionary and the save_show_ret function will use {“path”: None, “prefix”: ‘topography’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, you can provide a dictionary that modifies those keys according to your needs. Defaults to {}.

  • aggregate (Optional[str]) – the column in adata.obs that will be used to aggregate data points. Defaults to None.

  • show_arrowed_spines (bool) – whether to show a pair of arrowed spines representing the basis that the scatter is currently using. Defaults to False.

  • ax (Optional[Axes]) – the axis on which to make the plot. Defaults to None.

  • sort (Literal['raw', 'abs', 'neg']) – the method to reorder data so that points with high values will be on top of background points. Can be one of {‘raw’, ‘abs’, ‘neg’}, i.e. sort by raw data, sort by absolute values or sort by negative values. Defaults to “raw”.

  • frontier (bool) – whether to add the frontier. Scatter plots can be enhanced by using transparency (alpha) in order to show area of high density and multiple scatter plots can be used to delineate a frontier. See matplotlib tips & tricks cheatsheet (https://github.com/matplotlib/cheatsheets). Originally inspired by figures from scEU-seq paper: https://science.sciencemag.org/content/367/6482/1151. Defaults to False.

  • s_kwargs_dict (Dict[str, Any]) – the dictionary of the scatter arguments. Defaults to {}.

  • q_kwargs_dict (Dict[str, Any]) – additional parameters that will be passed to plt.quiver function. Defaults to {}.

  • n (int) – the number of samples for calculating the fixed points. Defaults to 25.

  • **streamline_kwargs_dict – any other kwargs that will be passed to plt.streamplot.
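
The size arithmetic quoted in the pointsize, quiver_size, and quiver_length entries above can be sketched as follows. The two functions are hypothetical stand-ins written from the formulas in this parameter list; they are not dynamo's own default_quiver_args or scatter code. The dictionary keys simply reuse the names given in the text. Because the behavior when pointsize or quiver_length is None is not stated here, the sketch assumes explicit numeric values for both.

```python
import math


def sketch_point_size(n_cells, pointsize):
    """Actual point size: 500.0 / sqrt(number of cells) * pointsize."""
    return 500.0 / math.sqrt(n_cells) * pointsize


def sketch_quiver_args(quiver_size, quiver_length):
    """Quiver head dimensions and scale, following the multipliers in the text."""
    if quiver_size is None:
        quiver_size = 1.0  # documented fallback when quiver_size is None
    return {
        "head_width": 10 * quiver_size,
        "head_length": 12 * quiver_size,
        "headaxislength": 8 * quiver_size,
        "scale": 1.0 / quiver_length,  # smaller scale means longer arrows
    }


# Example: 10,000 cells with pointsize=1, and a quiver twice the default size.
example_point_size = sketch_point_size(10_000, 1.0)
example_quiver = sketch_quiver_args(2.0, 0.5)
default_quiver = sketch_quiver_args(None, 1.0)
```

Since the point size scales with the inverse square root of the cell count, quadrupling the number of cells halves the drawn point size for a fixed pointsize.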

Return type:

Union[Axes, List[Axes], None]

Returns:

None is returned by default. If save_show_or_return is set to ‘return’, the Axes of the generated subplots are returned.
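
A minimal usage sketch follows. It uses only keyword arguments from the signature above. It assumes that adata already contains a reconstructed vector field in the umap basis (the preprocessing and reconstruction steps are not shown). The wrapper plot_topography and the dictionary topography_kwargs are hypothetical names introduced for this example.

```python
# Keyword arguments taken from the signature documented above.
topography_kwargs = {
    "basis": "umap",
    "fps_basis": "umap",
    "color": "ntr",
    "terms": ["streamline", "fixed_points"],
    "save_show_or_return": "return",  # ask for the Axes instead of showing the figure
}


def plot_topography(adata, **overrides):
    """Call dynamo.pl.topography with the defaults above and return its result.

    Assumes `adata` is an AnnData object on which the vector field has already
    been reconstructed in the chosen basis.
    """
    import dynamo as dyn  # imported lazily so the sketch loads without dynamo

    kwargs = {**topography_kwargs, **overrides}
    return dyn.pl.topography(adata, **kwargs)
```

A call such as plot_topography(adata, terms=["streamline", "fixed_points", "nullcline"]) would then request nullclines in addition to the default items and hand back the Axes for further customization.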