PyPI PyPIDownloads Docs

CoSpar - dynamic inference by integrating state and lineage information

https://user-images.githubusercontent.com/4595786/104988296-b987ce00-59e5-11eb-8dbe-a463b355a9fd.png

CoSpar is a toolkit for dynamic inference from lineage-traced single cells.
The methods are based on Wang et al. Nat. Biotech. (2022).

Dynamic inference based on single-cell state measurement alone requires serious simplifications. On the other hand, direct dynamic measurement via lineage tracing only captures partial information and its interpretation is challenging. CoSpar integrates both state and lineage information to infer a finite-time transition map of a development/differentiation system. It gains superior robustness and accuracy by exploiting both the local coherence and sparsity of differentiation transitions, i.e., neighboring initial states share similar yet sparse fate outcomes. Building around the AnnData object, CoSpar provides an integrated analysis framework for datasets with both state and lineage information. When only state information is available, CoSpar also improves upon existing dynamic inference methods by imposing sparsity and coherence. It offers essential toolkits for analyzing lineage data, state information, or their integration.

CoSpar’s key applications

  • infer transition maps from lineage data, state measurements, or their integration.

  • predict the fate bias of progenitor cells.

  • order cells along a differentiation trajectory leading to a given cell fate.

  • predict gene expression dynamics along a trajectory.

  • predict genes whose expression correlates with future fate outcome.

  • generate a putative fate hierarchy, ordering fates by their lineage distances.

Recorded Talks

Support

Feel free to submit an issue or send us an email. Your help to improve CoSpar is highly appreciated.

Acknowledgment

Shou-Wen Wang wants to acknowledge Xiaojie Qiu for inspiring him to make this website. He also wants to acknowledge the community that maintains scanpy and scvelo, where he learned about proper code documentation. He thanks Tal Debrobrah Scully, Qiu Wu and other lab members for testing the package. Shou-Wen wants to thank especially Allon Klein for his mentorship. Finally, he wants to acknowledge the generous support of the Damon Runyon Foundation through the Quantitative Biology Fellowship.

About CoSpar

The following information is adapted from Wang et al. Nat. Biotech. (2022). High-throughput single-cell measurements have enabled unbiased studies of development and differentiation, leading to numerous methods for dynamic inference. However, single-cell RNA sequencing (scRNA-seq) data alone does not fully constrain the differentiation dynamics, and existing methods inevitably operate under simplified assumptions. In parallel, the lineage information of individual cells can be profiled simultaneously along with their transcriptome by using a heritable and expressible DNA barcode as a lineage tracer (we call lineage-tracing scRNAseq, or LT-scSeq). The barcode may remain static or evolve over time.

However, the lineage data could be challenging to analyze. These challenges include stochastic differentiation and variable expansion of clones; cells loss during analysis; barcode homoplasy wherein cells acquire the same barcode despite not having a lineage relationship; access to clones only at a single time point; and clonal dispersion due to a lag time between labeling cells and the first sampling (the lag time is necessary to allow the clone to grow large for resampling).

CoSpar, developed by Wang et al. Nat. Biotech. (2022), is among the first tools to perform dynamic inference by integrating state and lineage information. It solves for the transition probability map from cell states at an earlier time point to states at a later time point. It achieves accuracy and robustness by learning a sparse and coherent transition map, where neighboring initial states share similar yet sparse fate outcomes. Built upon the finite-time transition map, CoSpar can 1) infer fate potential of early states; 2) detect early fate bias (thus, fate boundary) among a heterogeneous progenitor population; 3) identify putative driver genes for fate bifurcation; 4) infer fate coupling or hierarchy; 5) visualize gene expression dynamics along an inferred differential trajectory. CoSpar also provides several methods to analyze clonal data by itself, including the clonal coupling between fate clusters and the bias of a clone towards a given fate, etc. We envision CoSpar to be a platform to integrate key methods needed to analyze data with both state and lineage information.

https://user-images.githubusercontent.com/4595786/113746452-56e4cb00-96d4-11eb-8278-0aac0469ba9d.png

(copy right: Nature Biotechnology)

Coherent sparse optimization

One formalization of dynamic inference is to identify a transition map, a matrix \(T_{ij} (t_1,t_2)\), which describes the probability of a cell, initially in some state \(i\) at time \(t_1\), giving rise to progeny in a state \(j\) at time \(t_2\). We define \(T_{ij} (t_1,t_2)\) specifically as the fraction of progeny from state \(i\) that occupy state \(j\). This transition matrix averages the effects of cell division, loss, and differentiation, but it nonetheless proves useful for several applications.

We make two reasonable assumptions about the nature of biological dynamics to constrain inference of the transition map. We assume the map to be a sparse matrix, since most cells can access just a few states during an experiment. And we assume the map to be locally coherent, meaning that cells in similar states should share similar fate outcomes. These constraints together force transition maps to be parsimonious and smooth, which we reasoned would make them robust to practical sources of noise in LT-scSeq experiments. As inputs, CoSpar requires a barcode-by-cell matrix \(I(t)\) that encodes the clonal information at time \(t\), and a data matrix for observed cell states (e.g. from scRNA-seq). Clonal data may have nested structure reflecting subclonal labeling.

CoSpar is formulated assuming that we have information on the same clones at more than one time point. More often, one might observe clones at only a later time point \(t_2\). For these cases inference is not fully constrained, one must learn both the transition map T and the initial clonal data \(I(t_1)\). We approximate a solution additionally constrained by a minimum global transport cost. We show that this approach is robust to initialization in tested datasets. Finally, coherence and sparsity provide reasonable constraints to the simpler problem of predicting dynamics from state heterogeneity alone without lineage data. We extended CoSpar to this case. Thus, CoSpar is flexible to different experimental designs, as summarized by the above figure. Our core algorithms are illustrated below.

https://user-images.githubusercontent.com/4595786/113746670-93b0c200-96d4-11eb-89c0-d1e7d72383e7.png

(copy right: Nature Biotechnology)

Below, we formalize the coherent sparse optimization by which CoSpar infers the transition map.

In a model of stochastic differentiation, cells in a clone are distributed across states with a time-dependent density vector \(\vec{P}(t)\). A transition map \(T\) directly links clonal density profiles \(\vec{P}(t_{1,2})\) between time points:

\[\begin{equation} P_i(t_2 )= \sum_j P_j(t_1 )T_{ji}(t_1,t_2), \quad \quad \quad \text{Eq. (1)} \end{equation}\]

From multiple clonal observations, our goal is to learn \(T\). To do so, we consider each observed cell transcriptome as a distinct state (\(\vec{P}(t)\in R^{N_t}\)) for \(N_t\) cells profiled at time \(t\)), and introduce \(S(t)\in R^{N_t\times N_t}\) as a matrix of cell-cell similarity over all observed cell states, including those lacking clonal information. Denoting \(I(t)\in \{0,1\}^{M\times N_t}\) as a clone-by-cell matrix of \(M\) clonal barcodes, the density profiles of observed clones \(P(t)\in R^{M\times N_t}\) are estimated as \(P(t)\approx I(t)S(t)\). In matrix form, the constraint in Eq. (1) from all observed clones then becomes \(P(t_2)\approx P(t_1)T(t_1,t_2)\).

Since the matrices \(P(t_{1,2})\) are determined directly from data, with enough information \(T(t_1,t_2)\) could be learnt by matrix inversion. However, in most cases, the number of clones is far less than the number of states. To constrain the map, we require that: 1) \(T\) is a sparse matrix; 2) \(T\) is locally coherent; and 3) \(T\) is a non-negative matrix. With these requirements, the inference becomes an optimization problem:

\[\begin{equation} \min_{T} ||T||_1+\alpha ||LT||_2, \; \text{s.t.} \; ||P(t_2)- P(t_1) T(t_1,t_2)||_{2}\le\epsilon;\; T\ge 0; \text{Normalization}. \end{equation}\]

Here, \(‖T‖_1\) quantifies the sparsity of the matrix T through its l-1 norm, while \(‖LT‖_2\) quantifies the local coherence of \(T\) (\(L\) is the Laplacian of the cell state similarity graph, and \(LT\) is the local divergence). The remaining constraints enforce the observed clonal dynamics, non-negativity of \(T\), and map normalization, respectively. At \(\alpha=0\), the minimization takes the form of Lasso, an algorithm for compressed sensing. Our formulation extends compressed sensing from vectors to matrices, and to enforce local coherence. The local coherence extension is reminiscent of the fused Lasso problem. An iterative, heuristic approach solves the CoSpar optimization efficiently, replacing \((\alpha,\epsilon)\) with parameters that explicitly control coherence and sparsity. See Wang et al. Nat. Biotech. (2022) for a detailed exposition of the method and its implementation.

CoSpar - dynamic inference by integrating transcriptome and lineage information

API

Import CoSpar as:

import cospar as cs

CoSpar is built around the AnnData object (usually called adata). For each cell, we store its RNA count matrix at adata.X, the gene names at adata.var_names,time information at adata.obs['time_info'], state annotation at adata.obs['state_info'], clonal information at adata.obsm['X_clone'], and 2-d embedding at adata.obsm['X_emb'].

Once the AnnData object is initialized via cs.pp.initialize_adata_object(), the typical flow of analysis is to 1) perform preprocessing and dimension reduction (cs.pp.*); 2) visualize and analyze clonal data alone (cs.pl.*); 3) infer transition map (cs.tmap.*); and 4) analyze inferred map (cs.tl.*) and then visualize the results with the plotting functions (cs.pl.*). Typically, each cs.tl.* function has a corresponding cs.pl.* function. We also provide several built-in datasets (cs.datasets.*) and miscellaneous functions to assist with the analysis (cs.hf.*). See tutorial for details.

Preprocessing

pp.initialize_adata_object([adata, X_state, …])

Initialized the AnnData object.

pp.get_highly_variable_genes(adata[, …])

Get highly variable genes.

pp.remove_cell_cycle_correlated_genes(adata)

Remove cell-cycle correlated genes.

pp.get_X_pca(adata[, n_pca_comp])

Get X_pca.

pp.get_X_emb(adata[, n_neighbors, umap_min_dist])

Get X_emb using scanpy.tl.umap()

pp.get_X_clone(adata, clone_data_cell_id, …)

Build the X_clone matrix from data.

pp.get_state_info(adata[, n_neighbors, …])

Update state_info using scanpy.tl.leiden()

pp.refine_state_info_by_marker_genes(adata, …)

Refine state info according to marker gene expression.

pp.refine_state_info_by_leiden_clustering(adata)

Refine state info by clustering states at given time points.

Transition map inference

tmap.infer_Tmap_from_multitime_clones(adata_orig)

Infer transition map for clonal data with multiple time points.

tmap.infer_Tmap_from_one_time_clones(adata_orig)

Infer transition map from clones with a single time point

tmap.infer_Tmap_from_state_info_alone(adata_orig)

Infer transition map from state information alone.

tmap.infer_Tmap_from_clonal_info_alone(…)

Compute transition map using only the lineage information.

Analysis

tl.clonal_fate_bias(adata[, selected_fate, …])

Compute clonal fate bias towards a cluster.

tl.fate_biased_clones(adata, selected_fate)

Find clones that significantly biased towards a given fate

tl.fate_coupling(adata[, selected_fates, …])

Compute fate coupling determined by the transition map.

tl.fate_hierarchy(adata[, selected_fates, …])

Build fate hierarchy or lineage trees

tl.fate_map(adata[, selected_fates, source, …])

Compute transition probability to given fate/ancestor clusters.

tl.fate_potency(adata[, selected_fates, …])

Quantify how multi-potent a cell state is.

tl.fate_bias(adata[, selected_fates, …])

Compute fate bias to given two fate clusters (A, B).

tl.progenitor(adata[, selected_fates, …])

Identify trajectories towards/from two given clusters.

tl.iterative_differentiation(adata[, …])

Infer trajectory towards/from a cluster

tl.differential_genes(adata[, cell_group_A, …])

Perform differential gene expression analysis and plot top DGE genes.

Plotting

Clone analysis (clone visualization, clustering etc.)

pl.clones_on_manifold(adata[, …])

Plot clones on top of state embedding.

pl.barcode_heatmap(adata[, selected_times, …])

Plot barcode heatmap among different fate clusters.

pl.clonal_fate_bias(adata[, show_histogram, FDR])

Plot clonal fate bias towards a cluster.

pl.fate_coupling(adata[, source, color_bar, …])

Plot fate coupling determined by the transition map.

pl.fate_hierarchy(adata[, source, …])

Plot fate coupling determined by the transition map.

pl.clonal_fates_across_time(adata, …)

Scatter plot for clonal fate number across time point

pl.clonal_reports(adata[, selected_times])

Report the statistics of the clonal data.

pl.visualize_tree(input_tree[, …])

Visualize a tree structured in ete3 style.

Transition map analysis (fate bias etc.)

pl.single_cell_transition(adata, …[, …])

Plot transition probability from given initial cell states.

pl.fate_map(adata[, selected_fates, source, …])

Plot transition probability to given fate/ancestor clusters.

pl.fate_potency(adata[, source, …])

Plot fate potency.

pl.fate_bias(adata[, selected_fates, …])

Plot fate bias.

pl.progenitor(adata[, selected_fates, …])

Plot the progenitors of given fate clusters.

pl.iterative_differentiation(adata[, …])

Plot the progenitors of given fates across time points

pl.gene_expression_dynamics(adata, …[, …])

Plot gene trend along the inferred dynamic trajectory.

pl.fate_coupling(adata[, source, color_bar, …])

Plot fate coupling determined by the transition map.

pl.fate_hierarchy(adata[, source, …])

Plot fate coupling determined by the transition map.

General

pl.embedding(adata[, basis, color, cmap])

Scatter plot for user-specified embedding basis.

pl.embedding_genes(adata[, basis, color, …])

A test embedding method for plotting genes

pl.gene_expression_on_manifold(adata, …[, …])

Plot gene expression on the state manifold.

pl.gene_expression_heatmap(adata[, …])

Plot heatmap of gene expression within given clusters.

settings.set_figure_params([style, dpi, …])

Set resolution/size, styling and format of figures.

Datasets

datasets.hematopoiesis([data_des])

The hematopoiesis data set from

datasets.hematopoiesis_130K([data_des])

The hematopoiesis data set from

datasets.hematopoiesis_subsampled([data_des])

Top 15% most heterogeneous clones of the hematopoiesis data set from

datasets.hematopoiesis_Gata1_states([data_des])

All of the hematopoiesis data set from

datasets.lung([data_des])

The direct lung differentiation dataset from

datasets.reprogramming([data_des])

The reprogramming dataset from

datasets.reprogramming_Day0_3_28([data_des])

The reprogramming dataset from

datasets.synthetic_bifurcation([data_des])

Synthetic clonal dataset with static barcoding.

Help functions

hf.read(filename[, backed, sheet, ext, …])

Read file and return AnnData object.

hf.save_map(adata)

Save the adata and print the filename prefix.

hf.save_preprocessed_adata(adata[, data_des])

Save preprocessed adata.

hf.check_adata_structure(adata)

Check whether the adata has the right structure.

hf.check_available_choices(adata)

Check available parameter choices.

hf.update_time_ordering(adata[, …])

Update the ordering of time points at adata.uns[‘time_ordering’]

hf.update_data_description(adata[, …])

Update data_des, a string to distinguish different datasets

tl.get_normalized_covariance(data[, method])

Compute the normalized correlation of the data matrix.

hf.get_X_clone_with_reference_ordering(…)

Build the X_clone matrix from data.

Simulations

simulate.linear_differentiation_model([Nt1, …])

Simulate linear differentiation corrupted with barcode collision (See Fig.

simulate.bifurcation_model([progeny_N, t1, …])

Simulate bifurcation corrupted with clonal dispersion (See Fig.

simulate.quantify_correlation_with_ground_truth_fate_bias_BifurcationModel(…)

Quantify the correlation of an inferred fate bias with the actual one, using a map from the Bifurcation Model.

simulate.quantify_transition_peak_TPR_LinearDifferentiation(…)

Quantify the True positive rate of a transition map for linear differentiation model.

Release notes

v0.2.1

Major changes from v0.1.8 to v0.2.1:
  • Split each plotting function into two parts: computing the results (stored at cospar.tl.**) and actually plotting the result (stored at cospar.pl.**).

  • Update the notebooks to accomodate these changes.

  • Update the datasets in the cloud to add more annotations.

  • Re-organize the content of the plot, tool, and tmap modules.

  • Fix stochasticity when running HighVar method to generate the initialized map.

  • Fix generating X_clone from the cell_id-by-barcode_id list.

  • Add a few more functions: cospar.pl.clonal_fates_across_time(), cospar.pl.clonal_reports(), cospar.pl.embedding_genes(), cospar.tl.fate_biased_clones()

  • Update cospar.pl.barcode_heatmap() to order clones in a better way

  • Fix the docs.

  • Adopt “Raise ValueError” method for error handling.

  • Unify error checking at the beginning of several functions.

v0.1.8

This is used in running the notebooks that generate figures for the published paper. To run the original notebooks, you should switch to this version.

Installation

CoSpar requires Python 3.6 or later. We recommend using Miniconda for package management. For a computer that does not have a package management tool yet, please install Miniconda first, and activate it by running the following command in the terminal:

source ~/.bash_profile

PyPI

Install CoSpar from PyPI using:

pip install --upgrade cospar

If you get a Permission denied error, use pip install --upgrade cospar --user instead.

If you get errors related to ‘gcc’, try to specify the following gcc path for installation:

env CXX=/usr/local/Cellar/gcc/8.2.0/bin/g++-8 CC=/usr/local/Cellar/gcc/8.2.0/bin/gcc-8 pip install  cospar

If you get errors for version conflicts with existing packages, try:

pip install --ignore-installed --upgrade cospar

Development Version

To work with the latest development version, install from GitHub using:

pip install git+https://github.com/AllonKleinLab/cospar

or:

git clone https://github.com/AllonKleinLab/cospar
pip install -e cospar

-e is short for --editable and links the package to the original cloned location such that pulled changes are also reflected in the environment.

Dependencies

Jupyter Notebook

To run the tutorials in a notebook locally, please install:

conda install notebook

and run jupyter notebook in the terminal. If you get the error Not a directory: 'xdg-settings', use jupyter notebook --no-browser instead and open the url manually (or use this bugfix).

If you run into issues, do not hesitate to approach us or raise a GitHub issue.

Testing CoSpar in a new environment

In case you want to test cospar without affecting existing python packages, you can create a new conda environment and install CoSpar there:

conda create -n test_cospar python=3.6
conda activate test_cospar
pip install cospar

Now, install jupyter notebook in this environment:

pip install --user ipykernel

If you encounter an error related to nbconvert, run (this is optional):

pip3 install --upgrade --user nbconvert

Finally, install the jupyter notebook kernel related to this environment:

python -m ipykernel install --user --name=test_cospar

Now, you can open jupyter notebook by running jupyter notebook in the terminal, and select the kernel test_cospar to run CoSpar.

Getting Started

Here, we explain the basics of using CoSpar. CoSpar requires the count matrix not log-transformed. This is specifically assumed in selecting highly variable genes, in computing PCA, and in the HighVar method for initializing the joint optimization using a single clonal time point. CoSpar also assumes that the dataset has more than one time point. However, if you have only a snapshot, you can still manually cluster the cells into more than one time point to use CoSpar.

First, import CoSpar with:

import cospar as cs

For better visualization you can change the matplotlib settings to our defaults with:

cs.settings.set_figure_params()

If you want to adjust parameters for a particular plot, just pass the parameters into this function.

The workflow of CoSpar is summarized by the following illustration:

https://user-images.githubusercontent.com/4595786/145308761-a6532c6b-ac5b-4457-a00e-4a0f3972a360.png

Also, below is a summary of the main analyses after we infer the transition map, and its connection with the mathematical formulation in Wang et al. Nat. Biotech. (2022).

https://user-images.githubusercontent.com/4595786/161853386-04126382-6a9a-4817-b6a8-e5e950977357.jpg

Initialization

Given the gene expression matrix, clonal matrix, and other information, initialize the anndata object using:

adata_orig = cs.pp.initialize_adata_object(adata=None,**params)

The AnnData object adata_orig stores the count matrix (adata_orig.X), gene names (adata_orig.var_names), and temporal annotation of cells (adata_orig.obs['time_info']). Optionally, you can also provide the clonal matrix X_clone, selected PCA matrix X_pca, the embedding matrix X_emb, and the state annotation state_info, which will be stored at adata_orig.obsm['X_clone'], adata_orig.obsm['X_pca'], adata_orig.obsm['X_emb'], and adata_orig.obs['state_info'], respectively.

If an adata object is provided as an input, the initialization function will try to automatically generate the correct data structure, and all annotations associated with the provided adata will remain intact. You can add new annotations to supplement or override existing annotations in the adata object.

If you do not have a dataset yet, you can still play around using one of the built-in datasets, e.g.:

adata_orig = cs.datasets.hematopoiesis_subsampled()

Preprocessing & dimension reduction

Assuming basic quality control (excluding cells with low read count etc.) have been done, we provide basic preprocessing (gene selection and normalization) and dimension reduction related analysis (PCA, UMAP embedding etc.) at cs.pp.*:

cs.pp.get_highly_variable_genes(adata_orig,**params)
cs.pp.remove_cell_cycle_correlated_genes(adata_orig,**params)
cs.pp.get_X_pca(adata_orig,**params)
cs.pp.get_X_emb(adata_orig,**params)
cs.pp.get_state_info(adata_orig,**params)
cs.pp.get_X_clone(adata_orig,**params)

The first step get_highly_variable_genes also includes count matrix normalization. The second step, which is optional but recommended, removes cell cycle correlated genes among the selected highly variable genes. In get_X_pca, we apply z-score transformation for each gene expression before computing the PCA. In get_X_emb, we simply use the umap function from scanpy. With get_state_info, we extract state information using leiden clustering implemented in scanpy. In get_X_clone, we faciliate the conversion of the raw clonal data into a cell-by-clone matrix. As mentioned before, this preprocessing assumes that the count matrix is not log-transformed.

Basic clonal analysis

We provide a few plotting functions to help visually exploring the clonal data before any downstream analysis. You can visualize clones on state manifold directly:

cs.pl.clones_on_manifold(adata_orig,**params)

You can generate the barcode heatmap across given clusters to inspect clonal behavior:

cs.pl.barcode_heatmap(adata_orig,**params)

You can quantify the clonal coupling across different fate clusters:

cs.tl.fate_coupling(adata_orig,source='X_clone',**params)
cs.pl.fate_coupling(adata_orig,source='X_clone',**params)

Strong coupling implies the existence of bi-potent or multi-potent cell states at the time of barcoding. You can visualize the fate hierarchy by a simple neighbor-joining method:

cs.tl.fate_hierarchy(adata_orig,source='X_clone',**params)
cs.pl.fate_hierarchy(adata_orig,source='X_clone',**params)

Finally, you can infer the fate bias \(-log_{10}(P_{value})\) of each clone towards a designated fate cluster:

cs.pl.clonal_fate_bias(adata_orig,**params)

A biased clone towards this cluster has a statistically significant cell fraction within or outside this cluster.

Transition map inference

The core of the software is efficient and robust inference of a transition map by integrating state and clonal information. If the dataset has multiple clonal time points, you can run:

adata=cs.tmap.infer_Tmap_from_multitime_clones(adata_orig,clonal_time_points=None,later_time_point=None,**params)

It subsamples the input data at selected time points and computes the transition map, stored at adata.uns['transition_map'] and adata.uns['intraclone_transition_map'], with the latter restricted to intra-clone transitions. Depending on later_time_point, it has two modes of inference:

  1. When later_time_point=None, it infers a transition map between neighboring time points. For example, for clonal_time_points=[‘day1’, ‘day2’, ‘day3’], it computes transitions for pairs (‘day1’, ‘day2’) and (‘day2’, ‘day3’), but not for (‘day1’, ‘day3’).

  2. If later_time_point is specified, it generates a transition map between this time point and each of the earlier time points. In the previous example, if later_time_point=='day3', we infer transitions for pairs (‘day1’, ‘day3’) and (‘day2’, ‘day3’). This applies to the following map inference functions.


If the dataset has only one clonal time point, you can run:

adata=cs.tmap.infer_Tmap_from_one_time_clones(adata_orig,initial_time_points=None, later_time_point=None,initialize_method='OT',**params)

which jointly optimizes the transition map and the initial clonal structure. It requires initializing the transition map using state information alone. We provide two methods for such initialization: 1) OT for using the standard optimal transport approach; 2) HighVar for a customized approach, assuming that cells similar in gene expression across time points share clonal origin. For the OT method, if you wish to utilize the growth rate information as Waddington-OT, you can directly pass the growth rate estimate for each cell to the input AnnaData object at adata_orig.obs["cell_growth_rate"]. Depending on the choice, the initialized map is stored at adata.uns['OT_transition_map'] or adata.uns['HighVar_transition_map']. The final product is stored at adata.uns['transition_map'].

HighVar converts highly variable genes into pseudo multi-time clones and infers a putative map with coherent sparse optimization. We find the HighVar method performs better than the OT method, especially when there are large differentiation effects over the observed time window, or batch effects.

If initial_time_points and later_time_point are not specified, a map with transitions from all time points to the last time point is generated.


If you do not have any clonal information, you can still run:

adata=cs.tmap.infer_Tmap_from_state_info_alone(adata_orig,initial_time_points=None,later_time_point=None,initialize_method='OT',**params)

It is the same as cs.tmap.infer_Tmap_from_one_time_clones except that we assume a pseudo clonal data where each cell at the later time point occupies a unique clone.


We also provide simple methods that infer transition map from clonal information alone:

adata=cs.tmap.infer_Tmap_from_clonal_info_alone(adata_orig,clonal_time_points=None,later_time_point=None,**params)

The result is stored at adata.uns['clonal_transition_map'].

Analysis and visualization

Finally, each of the computed transition maps can be explored on state embedding at the single-cell level using a variety of analysis and plotting functions. There are some common parameters: 1) source, for choosing one of the pre-computed transition maps (or the raw clonal data) for analysis; 2) selected_fates, for visualizing the fate bias towards/against given fate clusters; 3) map_backward, for analyzing forward or backward transitions; 4) method, for different methods in fate probability analysis. See CoSpar basics for more details.

Below, we frame the task in the language of analyzing backward transitions for convenience. To see where a cell came from, run:

cs.pl.single_cell_transition(adata,**params)

To visualize the fate probability of initial cell states, run:

cs.tl.fate_map(adata,**params)
cs.pl.fate_map(adata,**params)

To infer the fate bias of initial cell states between two fate clusters, run:

cs.tl.fate_bias(adata,**params)
cs.pl.fate_bias(adata,**params)

To infer the dynamic trajectory towards given fate clusters, run:

cs.tl.progenitor(adata,**params)
cs.pl.progenitor(adata,**params)

or, alternatively if you have data with multiple clonal time points, run:

cs.tl.iterative_differentiation(adata,**params)
cs.pl.iterative_differentiation(adata,**params)

The first method (cs.tl.progenitor) assumes two input fate clusters and infers each trajectory by thresholding the corresponding fate bias. The second method (cs.tl.iterative_differentiation) infers the trajectory by iteratively tracing a selected fate cluster all the way back to its putative origin at the initial time point. For both methods, the inferred trajectory for each fate will be saved at adata.obs[f'diff_trajectory_{source}_{fate_name}'], and we can explore the gene expression dynamics along this trajectory using:

cs.pl.gene_expression_dynamics(adata,**params)

Additionally, the first method (cs.pl.progenitor) exports the selected ancestor states selected fate clusters at adata.obs[f'progenitor_{source}_{fate_name}'], which can be used to infer the driver genes for fate bifurcation by running:

cs.pl.differential_genes(adata,**params)

If there are multiple mature fate clusters, you can infer their differentiation coupling from the fate probabilities of initial cells or the raw clonal matrix by:

cs.tl.fate_coupling(adata,source='transition_map',**params)
cs.pl.fate_coupling(adata,source='transition_map',**params)

You can also infer the fate hierarchy from:

cs.tl.fate_hierarchy(adata,source='transition_map',**params)
cs.pl.fate_hierarchy(adata,source='transition_map',**params)

Loading data

We provide a simple guide in how to load your own data and initialize the adata object for downstream analysis. This tutorial assumes that the files are either in txt or mtx format. After running this pipeline, please see the folder test_data for how exactly each data is stored.

The first step to use CoSpar is to construct an anndata object that stores all relevant data. The key annotations after initialization are:

  • adata.X: state count matrix, shape (n_cell, n_gene). This should not be log-normalized.

  • adata.var_names: list of gene names, shape (n_genes,).

  • adata.obs[‘time_info’]: time annotation (type: str) for each cell, shape (n_cell,).

  • adata.obs[‘state_info’]: state annotation for each cell, shape (n_cell, 1). [Optional. Can be generated in preprocessing.

  • adata.obsm[‘X_clone’]: clonal labels for each cell in the form of np.array or sparse matrix, shape (n_cell, n_clone).

  • adata.obsm[‘X_pca’]: PCA matrix, shape (n_cell, n_pcs). [Optional. Can be generated in preprocessing.

  • adata.obsm[‘X_emb’]: two-dimensional embedding, shape (n_cell, 2). [Optional. Can be generated in preprocessing.

cs.pp.initialize_adata_object assists this initialization.

[1]:
import cospar as cs
import pandas as pd
import scipy.io as sio
import numpy as np
import os
[2]:
cs.logging.print_version()

# Set the messaging level. At a given value, a running function will
# print information at or below its level.
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).

# Plot setting. If you want to control a particular plot,
# just change the setting here, and run that plotting function.
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:25.

Each dataset should have its folder to avoid conflicts.

[3]:
# set the directory for figures and data. If not existed yet, they will be created automtaically.
cs.settings.data_path = "test_data"  # This name should be exactly 'text_data'
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

cs.datasets.raw_data_for_import_exercise()  # download the data

Import method A

Assuming that the X_clone data is stored in mtx format

[4]:
df_cell_id = pd.read_csv(os.path.join(cs.settings.data_path, "cell_id.txt"))
df_cell_id.head()
[4]:
Cell_ID
0 cell_10
1 cell_13
2 cell_18
3 cell_32
4 cell_70
[5]:
df_gene = pd.read_csv(os.path.join(cs.settings.data_path, "gene_names.txt"))
df_gene.head()
[5]:
gene_names
0 0610009L18Rik
1 0610037L13Rik
2 1110012L19Rik
3 1110020A21Rik
4 1110028F11Rik
[6]:
df_time = pd.read_csv(os.path.join(cs.settings.data_path, "time_info.txt"))
df_time.head()
[6]:
time_info
0 6
1 6
2 6
3 6
4 6
[7]:
RNA_count_matrix = sio.mmread(
    os.path.join(cs.settings.data_path, "gene_expression_count_matrx.mtx")
)
RNA_count_matrix
[7]:
<781x2481 sparse matrix of type '<class 'numpy.float64'>'
        with 225001 stored elements in COOrdinate format>
[8]:
X_clone = sio.mmread(os.path.join(cs.settings.data_path, "cell_by_clone_matrx.mtx"))
X_clone
[8]:
<781x339 sparse matrix of type '<class 'numpy.int64'>'
        with 781 stored elements in COOrdinate format>
[9]:
df_state = pd.read_csv(os.path.join(cs.settings.data_path, "state_info.txt"))
df_state.head()
[9]:
state_info
0 Neutrophil
1 Baso
2 Monocyte
3 Monocyte
4 Baso
[10]:
df_X_emb = pd.read_csv(os.path.join(cs.settings.data_path, "embedding.txt"))
df_X_emb.head()
[10]:
x y
0 1165.708 -2077.047
1 -980.871 -2055.629
2 2952.953 281.797
3 1314.821 -560.760
4 -1067.502 -1275.488
[11]:
X_emb = np.array([df_X_emb["x"], df_X_emb["y"]]).T

Now, initialize the adata object

[12]:
adata_orig = cs.pp.initialize_adata_object(
    X_state=RNA_count_matrix,
    gene_names=df_gene["gene_names"],
    cell_names=df_cell_id["Cell_ID"],
    time_info=df_time["time_info"],
    X_clone=X_clone,
    state_info=df_state["state_info"],
    X_emb=X_emb,
    data_des="my_data",
)
Create new anndata object
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.

Update the time ordering. A correct time ordering is assumed later.

[13]:
cs.hf.update_time_ordering(adata_orig, updated_ordering=["2", "4", "6"])
[14]:
adata_orig
[14]:
AnnData object with n_obs × n_vars = 781 × 2481
    obs: 'time_info', 'state_info'
    uns: 'data_des', 'time_ordering', 'clonal_time_points'
    obsm: 'X_clone', 'X_emb'

Alternatively, you can initialize the object by building on an existing adata object. This will keep existing annotations from the old adata. You can make modifications by specifying additional entries.

[15]:
adata_orig_new = cs.pp.initialize_adata_object(adata=adata_orig, X_clone=X_clone)
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[16]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210602_loading_data_23_0.png

Import method B

Assuming that the X_clone data is stored in a (cell_id, barcode_id) format

First, initialize the adata without concerning the clonal data

[17]:
df_cell_id = pd.read_csv(os.path.join(cs.settings.data_path, "cell_id.txt"))
df_gene = pd.read_csv(os.path.join(cs.settings.data_path, "gene_names.txt"))
df_time = pd.read_csv(os.path.join(cs.settings.data_path, "time_info.txt"))
RNA_count_matrix = sio.mmread(
    os.path.join(cs.settings.data_path, "gene_expression_count_matrx.mtx")
)
df_state = pd.read_csv(os.path.join(cs.settings.data_path, "state_info.txt"))
df_X_emb = pd.read_csv(os.path.join(cs.settings.data_path, "embedding.txt"))
X_emb = np.array([df_X_emb["x"], df_X_emb["y"]]).T
[18]:
adata_orig = cs.pp.initialize_adata_object(
    X_state=RNA_count_matrix,
    gene_names=df_gene["gene_names"],
    cell_names=df_cell_id["Cell_ID"],
    time_info=df_time["time_info"],
    state_info=df_state["state_info"],
    X_emb=X_emb,
    data_des="my_data",
)
Create new anndata object
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: []
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.

Now, load the clonal data. Here, we start with a table of (cell_id, clone_id) pair. The cell_id does not need to be ranked. It should match the cell_id in the adata_orig.obs_names

[19]:
df_X_clone = pd.read_csv(
    os.path.join(cs.settings.data_path, "clonal_data_in_table_format.txt")
)
df_X_clone
[19]:
Cell_ID Clone_ID
0 cell_0 clone_275
1 cell_1 clone_329
2 cell_2 clone_56
3 cell_3 clone_236
4 cell_4 clone_213
... ... ...
776 cell_776 clone_196
777 cell_777 clone_239
778 cell_778 clone_259
779 cell_779 clone_217
780 cell_780 clone_259

781 rows × 2 columns

[20]:
adata_orig.obs_names
[20]:
Index(['cell_10', 'cell_13', 'cell_18', 'cell_32', 'cell_70', 'cell_80',
       'cell_90', 'cell_97', 'cell_108', 'cell_117',
       ...
       'cell_7389', 'cell_7392', 'cell_7402', 'cell_7407', 'cell_7409',
       'cell_7417', 'cell_7423', 'cell_7435', 'cell_7436', 'cell_7437'],
      dtype='object', length=781)
[21]:
cs.pp.get_X_clone(adata_orig, df_X_clone["Cell_ID"], df_X_clone["Clone_ID"])
100%|██████████| 781/781 [00:00<00:00, 27474.68it/s]
[22]:
adata_orig
[22]:
AnnData object with n_obs × n_vars = 781 × 2481
    obs: 'time_info', 'state_info'
    uns: 'data_des', 'time_ordering', 'clonal_time_points', 'clone_id'
    obsm: 'X_clone', 'X_emb'
[23]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210602_loading_data_33_0.png

Preprocessing

Note that this step is optional. If the data do not yet have one of X_pca, X_emb, or state_info, you will need to run the preprocessing and dimension reduction.

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:31.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name).)

[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()

Dimension reduction

Select highly variable genes. Before this, there is count normalization. It requires that the count matrix is NOT log-normalized.

[5]:
cs.pp.get_highly_variable_genes(
    adata_orig,
    normalized_counts_per_cell=10000,
    min_counts=3,
    min_cells=3,
    min_gene_vscore_pctl=90,
)
Finding highly variable genes...
_images/20211010_preprocessing_8_1.png
Keeping 1615 genes

Compute for each gene its correlation with a set of cell cycle genes. The default cell cycle genes are for mouse. You need to use your own genes for a different species. This step is optional, but recommended.

[6]:
cs.pp.remove_cell_cycle_correlated_genes(
    adata_orig,
    cycling_gene_list=[
        "Ube2c",
        "Hmgb2",
        "Hmgn2",
        "Tuba1b",
        "Ccnb1",
        "Tubb5",
        "Top2a",
        "Tubb4b",
    ],
)
adata.var['highly_variable'] not updated.
Please choose corr_threshold properly, and set confirm_change=True
_images/20211010_preprocessing_10_1.png

Now, confirm the change at a specific cutoff threshold.

[7]:
cs.pp.remove_cell_cycle_correlated_genes(
    adata_orig, corr_threshold=0.2, confirm_change=True
)
Number of selected non-cycling highly variable genes: 1508
Remove 107 cell cycle correlated genes.
adata.var['highly_variable'] updated

Compute the X_pca, X_emb. X_pca will be used to build the similarity matrix later. X_emb is only used for visualization. You can also pass your favorite embedding directly to adata.obsm['X_emb'].

[8]:
cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
cs.pp.get_X_emb(adata_orig, n_neighbors=20, umap_min_dist=0.3)
WARNING: get_X_pca assumes that the count matrix adata.X is NOT log-transformed.
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/sklearn/utils/validation.py:585: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
  warnings.warn(
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/sklearn/utils/validation.py:585: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
  warnings.warn(

Note that previous X_pca and X_emb (if exists) will be kept and renamed as X_pca_old and X_emb_old

[9]:
adata_orig
[9]:
AnnData object with n_obs × n_vars = 7438 × 25289
    obs: 'time_info', 'state_info', 'n_counts'
    var: 'highly_variable', 'highly_variable_old'
    uns: 'clonal_time_points', 'data_des', 'neighbors', 'state_info_colors', 'time_ordering', 'umap'
    obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_pca', 'X_pca_old', 'X_umap'
    obsp: 'connectivities', 'distances'
[10]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20211010_preprocessing_17_0.png

State annotation

This generates adata_orig.obs['state_info'].

[11]:
cs.pp.get_state_info(adata_orig, n_neighbors=20, resolution=0.5)
[12]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20211010_preprocessing_20_0.png

Similarly, the pre-existing state_info will be renamed

[13]:
adata_orig
[13]:
AnnData object with n_obs × n_vars = 7438 × 25289
    obs: 'time_info', 'state_info', 'n_counts', 'leiden', 'state_info_old'
    var: 'highly_variable', 'highly_variable_old'
    uns: 'clonal_time_points', 'data_des', 'neighbors', 'state_info_colors', 'time_ordering', 'umap', 'leiden'
    obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_pca', 'X_pca_old', 'X_umap'
    obsp: 'connectivities', 'distances'
Refine state annotation by marker genes

The goal here is to refine adata_orig.obs['state_info']. In this method, a state is selected if it expresses all genes in the list of marker_genes, and the expression is above the relative threshold express_threshold. You can also specify which time point you want to focus on. In addition, we also include cell states neighboring to these valid states to smooth the selection (controlled by add_neighbor_N). First, explore parameters to find satisfactory annotation.

[14]:
confirm_change = False
marker_genes = ["Mpo", "Elane", "S100a8"]
cs.pp.refine_state_info_by_marker_genes(
    adata_orig,
    marker_genes,
    express_threshold=0.1,
    selected_times=["4"],
    new_cluster_name="new",
    add_neighbor_N=10,
    confirm_change=confirm_change,
)
_images/20211010_preprocessing_25_0.png

Once you are happy with the result, set confirm_change=True to confirm changes to adata.obs['state_info'].

[15]:
confirm_change = True
marker_genes = ["Mpo", "Elane", "S100a8"]
cs.pp.refine_state_info_by_marker_genes(
    adata_orig,
    marker_genes,
    express_threshold=0.1,
    selected_times=["4"],
    new_cluster_name="new",
    add_neighbor_N=10,
    confirm_change=confirm_change,
)
Change state annotation at adata.obs['state_info']
_images/20211010_preprocessing_27_1.png
_images/20211010_preprocessing_27_2.png
Refine state annotation by clustering states at given time points

First, explore the parameters.

[16]:
confirm_change = False
cs.pp.refine_state_info_by_leiden_clustering(
    adata_orig,
    selected_times=["2"],
    n_neighbors=20,
    resolution=0.5,
    confirm_change=confirm_change,
)
_images/20211010_preprocessing_30_0.png

Once you are happy with the result, set confirm_change=True to confirm changes to adata.obs['state_info'].

Miscellaneous

[17]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['7', '2', '8', '4', '1', '5', '6', '3', '0', '9', 'new', '10']
Available time points: ['2' '4' '6']
Clonal time points: ['2' '4' '6']

You can choose to save preprocessed data. It can be loaded using cs.hf.read(file_name).

[18]:
# cs.hf.save_preprocessed_adata(adata_orig)

Clonal analysis

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-01-30 14:45.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name).)

[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()

Show barcode heatmap as aggregated into given fate clusters (defined in adata_orig.obs['state_info'])

[5]:
selected_times = None
selected_fates = [
    "Ccr7_DC",
    "Mast",
    "Meg",
    "pDC",
    "Eos",
    "Lymphoid",
    "Erythroid",
    "Baso",
    "Neutrophil",
    "Monocyte",
]
celltype_names = ["mDC", "Mast", "Meg", "pDC", "Eos", "Ly", "Ery", "Baso", "Neu", "Mon"]
cs.pl.barcode_heatmap(
    adata_orig,
    selected_times=selected_times,
    selected_fates=selected_fates,
    color_bar=True,
    rename_fates=celltype_names,
    log_transform=False,
    binarize=True,
)
Data saved at adata.uns['barcode_heatmap']
[5]:
<AxesSubplot:>
_images/20211010_clonal_analysis_7_2.png

Fate coupling in the underlying clonal data, defined in our package as the normalized barcode covariance between cells annotated in different fates.

[6]:
cs.tl.fate_coupling(
    adata_orig, selected_fates=selected_fates, source="X_clone"
)  # compute the fate coupling
cs.pl.fate_coupling(adata_orig, source="X_clone")  # actually plot the coupling
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[6]:
<AxesSubplot:title={'center':'source: X_clone'}>
_images/20211010_clonal_analysis_9_2.png

Fate hierarchy constructed from fate coupling of the underlying clonal data, using the neighbor-joining method.

[7]:
cs.tl.fate_hierarchy(
    adata_orig, selected_fates=selected_fates, source="X_clone"
)  # compute the fate hierarchy
cs.pl.fate_hierarchy(adata_orig, source="X_clone")  # actually plot the hierarchy
Results saved as dictionary at adata.uns['fate_hierarchy_X_clone']

            /-Lymphoid
         /-|
        |   \-Ccr7_DC
      /-|
     |  |   /-Monocyte
     |   \-|
     |      \-Neutrophil
   /-|
  |  |         /-Baso
  |  |      /-|
  |  |   /-|   \-Mast
  |  |  |  |
--|   \-|   \-Eos
  |     |
  |     |   /-Erythroid
  |      \-|
  |         \-Meg
  |
   \-pDC

Next, we compute the clonal fate bias, -log(Q-value). We calculated a P-value that that a clone is enriched (or depleted) in a fate, using Fisher-Exact test (accounting for clone size). The P-value is then corrected to give a Q-value by Benjamini-Hochberg procedure. The alternative hypothesis options are: {‘two-sided’,’greater’,’less’}. The default is ‘two-sided’.

[8]:
cs.tl.clonal_fate_bias(
    adata_orig, selected_fate="Monocyte", alternative="two-sided"
)  # compute the fate hierarchy
cs.pl.clonal_fate_bias(adata_orig)  # actually plot the hierarchy
100%|██████████| 500/500 [00:01<00:00, 442.37it/s]
Data saved at adata.uns['clonal_fate_bias']
_images/20211010_clonal_analysis_13_2.png
_images/20211010_clonal_analysis_13_3.png
[9]:
result = adata_orig.uns["clonal_fate_bias"]
result
[9]:
Clone_ID Clone_size Q_value Fate_bias clonal_fraction_in_target_fate
0 488 58 1.037416e-20 19.984047 0.896552
1 387 37 1.608291e-16 15.793635 0.945946
2 227 45 1.665465e-15 14.778465 0.866667
3 162 42 6.210001e-13 12.206908 0.833333
4 302 112 6.210001e-13 12.206908 0.000000
... ... ... ... ... ...
495 366 9 1.000000e+00 -0.000000 0.222222
496 46 18 1.000000e+00 -0.000000 0.222222
497 408 27 1.000000e+00 -0.000000 0.259259
498 6 4 1.000000e+00 -0.000000 0.250000
499 463 3 1.000000e+00 -0.000000 0.333333

500 rows × 5 columns

Illustrate some most biased clones.

[14]:
ids = result["Clone_ID"][:2]
cs.pl.clones_on_manifold(
    adata_orig,
    selected_clone_list=ids,
    color_list=["black", "red", "blue"],
    clone_markersize=15,
)
_images/20211010_clonal_analysis_16_0.png
_images/20211010_clonal_analysis_16_1.png

Transition map inference

Below, we illustrate transition map inference from different methods in cospar. Once computed, they can be analyzed and visualized later.

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:17.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name).)

[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()

Transition map from clones across multiple time points

This method utilizes clones with multiple time points. It is the core algorithm for the CoSpar package. The first time it runs, it will compute the similarity matrix at different smooth depth and save them. Then, it will infer the transition map. Some key parameters:

  • clonal_time_points (list; default: all): List of time points with clonal data that you want to perform inference.

  • later_time_point (str; default: None): If not set, CoSpar learns a transition map between neighboring time points specified by clonal_time_points, i.e., transitions from day 2 to day 4, and from day 4 to day 6 if clonal_time_points=[‘2’,’4’,’6’]. Ohterwise, it learns transitions from any of clonal_time_points to the later_time_point.

  • smooth_array (list; default: [15,10,5]): List of smooth depth at initial runs of iteration. Suppose that it has a length N. For iteration nN, we use the last entry of smooth_array to compute the S matrix. We recommend starting with more smoothing depth and gradually reduce the depth, as inspired by simulated annealing. Data with higher clonal dispersion should start with higher smoothing depth. The final depth should depend on the manifold itself. For fewer cells, it results in a small KNN graph, and a small final depth should be used. We recommend to use a number at the multiple of 5 for computational efficiency i.e., smooth_array=[20, 15, 10, 5], or [20,15,10].

  • sparsity_threshold (float; default: 0.1): sparsity threshold to remove spurious transitions in the updated transition map, in the range [0,1].

  • max_iter_N (int; default: 5): maximum number of iterations in refining the transition map. We found that 3 iterations are sufficient for tested datasets.

  • epsilon_converge (float; default: 0.05): Iteration is considered converged when the correlation R between maps from consecutive iteration satisfies: R>1-epsilon_converge.

It takes ~3 mins to run for the first time, and ~20 s for later runs.

[5]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["2", "4"],
    later_time_point="6",
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
    intraclone_threshold=0.2,
    max_iter_N=10,
    epsilon_converge=0.01,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 185
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.942
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.956
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
-----------Total used time: 19.377027988433838 s ------------

We can see that after this step, we have two maps: 'transition_map' and 'intraclone_transition_map'. Both of them can be used for downstream analysis.

[6]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[6]:
['transition_map', 'intraclone_transition_map']

Transition map from end-point clones, or without clones

After initializing the map by either the OT or HighVar method, We jointly infer the likely clonal ancestors and the corresponding transition map between given time points. You need to choose the initialization method and set the corresponding parameters.

  • initialize_method=OT: using optimal transport. Key parameters:

    • OT_epsilon (float; default: 0.02): the entropic regularization

    • OT_cost ({GED, SPD}, default: SPD): method for computing the cost function. GED uses simple gene expression distances; it is faster. SPD uses the shortest path distances; it is slower but often more accurate.

  • initialize_method=HighVar: HighVar assumes that cells similar in gene expression across time points share clonal origin. It converts highly variable genes into pseudo multi-time clones and runs cs.tmap.infer_Tmap_from_multitime_clones to construct the map. We find HighVar performs better than OT, especially when there are large differentiation effects over the observed time window, or batch effects.

    • HighVar_gene_pctl (int; default: 85): percentile threshold to select highly variable genes. Range: [0,100]. A higher value selects more variable genes.

  • max_iter_N (list; default: [3,5]): a list of two entries, representing the maximum iteration number for joint optimization and CoSpar core function, respectively.

  • epsilon_converge (list; default: [0.05,0.05]), a list of two entries, indicating the map convergence threshold for joint optimization and CoSpar core function, respectively.

Infer transition map from a single clonal time point (using OT), please run ( ~ 1 min):

[7]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["4"],
    later_time_point="6",
    initialize_method="OT",
    OT_cost="GED",
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Finishing computing gene expression distance, used time 6.2248289585113525
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 36.00060701370239
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.961
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.275
Finishing Joint Optimization, used time 11.890204191207886
-----------Total used time: 54.800551891326904 s ------------
[8]:
cs.hf.check_available_map(adata_1)
adata_1.uns["available_map"]
[8]:
['transition_map', 'OT_transition_map']

Infer transition map from state information alone (using HighVar): (it takes ~ 2 mins)

[9]:
adata_2 = cs.tmap.infer_Tmap_from_state_info_alone(
    adata_orig,
    initial_time_points=["4"],
    later_time_point="6",
    initialize_method="HighVar",
    HighVar_gene_pctl=85,
    max_iter_N=[10, 10],
    epsilon_converge=[0.01, 0.01],
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 2136 (t1); 2245 (t2). Common set: 978
Step b: convert the shared highly variable genes into clonal info------
 87%|████████▋ | 853/978 [00:00<00:00, 5630.14it/s]
Total used genes=853 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.966
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.988
Iteration 6, Use smooth_round=5
Convergence (CoSpar, iter_N=6): corr(previous_T, current_T)=0.996
Finishing initialization using HighVar, used time 16.57118797302246
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.934
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.998
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.784
-----JointOpt Iteration 2: Infer initial clonal structure
-----JointOpt Iteration 2: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=2): corr(previous_T, current_T)=0.966
-----JointOpt Iteration 3: Infer initial clonal structure
-----JointOpt Iteration 3: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.994
Convergence (JointOpt, iter_N=3): corr(previous_T, current_T)=0.97
-----JointOpt Iteration 4: Infer initial clonal structure
-----JointOpt Iteration 4: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=4): corr(previous_T, current_T)=0.968
-----JointOpt Iteration 5: Infer initial clonal structure
-----JointOpt Iteration 5: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.938
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=5): corr(previous_T, current_T)=0.973
-----JointOpt Iteration 6: Infer initial clonal structure
-----JointOpt Iteration 6: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=6): corr(previous_T, current_T)=0.969
-----JointOpt Iteration 7: Infer initial clonal structure
-----JointOpt Iteration 7: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.935
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=7): corr(previous_T, current_T)=0.971
-----JointOpt Iteration 8: Infer initial clonal structure
-----JointOpt Iteration 8: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=8): corr(previous_T, current_T)=0.976
-----JointOpt Iteration 9: Infer initial clonal structure
-----JointOpt Iteration 9: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=9): corr(previous_T, current_T)=0.977
-----JointOpt Iteration 10: Infer initial clonal structure
-----JointOpt Iteration 10: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
Convergence (JointOpt, iter_N=10): corr(previous_T, current_T)=0.975
Finishing Joint Optimization, used time 242.97642278671265
-----------Total used time: 260.5958061218262 s ------------
[10]:
cs.hf.check_available_map(adata_2)
adata_2.uns["available_map"]
[10]:
['transition_map', 'HighVar_transition_map']

Transition map from clonal data alone, without state information

As a comparision, we can also learn the transition map using clonal information alone:

[11]:
# We use the Weinreb method, which selects uni-potent clones based on a pre-defined list of fates.
selected_fates = [
    "Baso",
    "Ccr7_DC",
    "Eos",
    "Erythroid",
    "Lymphoid",
    "Mast",
    "Meg",
    "Monocyte",
    "Neu_Mon",
    "Neutrophil",
    "pDC",
]

adata_3 = cs.tmap.infer_Tmap_from_clonal_info_alone(
    adata_orig, method="weinreb", later_time_point="6", selected_fates=selected_fates
)
Trying to set attribute `.uns` of view, copying.
Infer transition map between initial time points and the later time point.
--------Current initial time point: 2--------
Number of multi-time clones post selection: 185
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.6162162162162163
--------Current initial time point: 4--------
Number of multi-time clones post selection: 500
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.538
[12]:
cs.hf.check_available_map(adata_3)
adata_3.uns["available_map"]
[12]:
['clonal_transition_map']

Saving maps

[13]:
cs.hf.save_map(adata)
Saved file: data_cospar/LARRY_sp500_ranking1_MultiTimeClone_Later_FullSpace0_t*2*4*6_adata_with_transition_map.h5ad

It can be loaded again with adata=cs.hf.read(file_path)

Transition map analysis

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:23.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name).)

[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()

Generate a transition map

[5]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["2", "4"],
    later_time_point="6",
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
    intraclone_threshold=0.2,
    max_iter_N=10,
    epsilon_converge=0.01,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 185
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.942
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.956
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
-----------Total used time: 19.324687957763672 s ------------
[6]:
# adata=cs.hf.read('data_cospar/LARRY_sp500_ranking1_MultiTimeClone_Later_FullSpace0_t*2*4*6_adata_with_transition_map.h5ad')

Key parameters

The analysis is done with the plotting module. There are some common parameters for the APIs in this module:

  • source (str; default: transition_map). It determines which transition map to use for analysis. Choices: {transition_map, intraclone_transition_map, OT_transition_map, HighVar_transition_map, clonal_transition_map}

  • selected_fates (list of str). Selected clusters to aggregate differentiation dynamics and visualize fate bias etc.. It allows nested structure, e.g., selected_fates=[‘a’, [‘b’, ‘c’]] selects two clusters: cluster ‘a’ and the other that combines ‘b’ and ‘c’.

  • map_backward (bool; default: True). We can analyze either the backward transitions, i.e., where these selected states or clusters came from (map_backward=True); or the forward transitions, i.e., where the selected states or clusters are going (map_backward=False).

  • selected_times (list; default: all). List of time points to use. By default, all are used.

  • method (str; default: norm-sum). Method to aggregate the transition probability within a cluster. Choices: {norm-sum,sum}. norm-sum returns the probability that a fate cluster originates from an early state i; while sum gives the probability that an initial state i gives rise to a fate cluster.

Plotting transition profiles for single cells

First, check the forward transitions (i.e., future states) from the 'transition_map'.

[7]:
selected_state_id_list = [
    1,
    10,
]  # This is a relative ID. Its mapping to the actual cell id depends on map_backward.

map_backward = False
cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    source="transition_map",
    map_backward=map_backward,
)
_images/20211010_map_analysis_13_0.png

Now, backward transitions (i.e., past states) from the same map.

[8]:
selected_state_id_list = [2]
map_backward = True
cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    source="transition_map",
    map_backward=map_backward,
)
_images/20211010_map_analysis_15_0.png

Finally, switch to the 'intraclone_transition_map', and check the observed clonal transitions:

[9]:
selected_state_id_list = [2]
map_backward = True
cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    source="intraclone_transition_map",
    map_backward=map_backward,
)
_images/20211010_map_analysis_17_0.png

Fate map

Inspect the backward transitions, and ask where the selected fate clusters come from.

[10]:
cs.tl.fate_map(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    map_backward=True,
)
cs.pl.fate_map(
    adata,
    selected_fates=["Neutrophil"],
    source="transition_map",
    plot_target_state=True,
    show_histogram=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
_images/20211010_map_analysis_20_1.png

The results are stored at adata.obs[f'fate_map_{source}_{fate_name}'], which can be used for a customized analysis.

[11]:
adata.obs.keys()
[11]:
Index(['time_info', 'state_info', 'n_counts',
       'fate_map_transition_map_Neutrophil',
       'fate_map_transition_map_Monocyte'],
      dtype='object')

As an example of how to use this data, you can re-plot the fate map as follows:

[12]:
cs.pl.embedding(adata, color="fate_map_transition_map_Neutrophil")
_images/20211010_map_analysis_24_0.png

Fate potency

Count fate number directly (fate_count=True); otherwise, calculate the potencyusing entropy. The results are stored at adata.obs[f'fate_potency_{source}'].

[13]:
cs.tl.fate_potency(
    adata,
    source="transition_map",
    map_backward=True,
    method="norm-sum",
    fate_count=True,
)
cs.pl.fate_potency(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Ccr7_DC']
Results saved at adata.obs['fate_map_transition_map_Neu_Mon']
Results saved at adata.obs['fate_map_transition_map_undiff']
Results saved at adata.obs['fate_map_transition_map_Mast']
Results saved at adata.obs['fate_map_transition_map_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso']
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_pDC']
Results saved at adata.obs['fate_map_transition_map_Lymphoid']
Results saved at adata.obs['fate_map_transition_map_Meg']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_map_transition_map_Eos']
Results saved at adata.obs['fate_potency_transition_map']
_images/20211010_map_analysis_27_1.png

Fate bias

The fate bias of an initial state i is defined by the competition of fate probability towards two fate clusters A and B:

  • Bias_i=P(i;A)/[P(i;A)+P(i;B)].

Only states with fate probabilities satisfying this criterion will be shown:

  • P(i; A)+P(i; B)>sum_fate_prob_thresh

The inferred fate bias is stored at adata.obs[f'fate_bias_{source}_{fate_A}*{fate_B}'].

[14]:
cs.tl.fate_bias(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    pseudo_count=0,
)
cs.pl.fate_bias(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
_images/20211010_map_analysis_30_1.png

We can also study fate bias in the Gata1+ states. First, prepare a sate mask:

[15]:
x_emb = adata.obsm["X_emb"][:, 0]
y_emb = adata.obsm["X_emb"][:, 1]

index_2 = cs.hf.above_the_line(adata.obsm["X_emb"], [100, 500], [500, -1000])
index_5 = cs.hf.above_the_line(adata.obsm["X_emb"], [0, -500], [1000, 2])
final_mask = (~index_2) & (
    (index_5 | (x_emb < 0))
)  # & index_3 & index_4 & index_5 #mask_1 &

Now, compute the fate bias. We can concatenate cluster ‘Meg’ and ‘Erythroid’ using a nested list. Same for concatenating ‘Baso’, ‘Mast’, and ‘Eos’.

[16]:
# cs.pl.fate_bias(adata,selected_fates=[['Meg','Erythroid'],['Baso','Mast','Eos']],source='transition_map',
#         plot_target_state=False,mask=final_mask,map_backward=True,sum_fate_prob_thresh=0.01,method='norm-sum')
cs.tl.fate_bias(
    adata,
    selected_fates=[["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]],
    source="transition_map",
    pseudo_count=0,
    map_backward=True,
    sum_fate_prob_thresh=0.01,
    method="norm-sum",
)
cs.pl.fate_bias(
    adata,
    selected_fates=[["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]],
    source="transition_map",
    plot_target_state=False,
    mask=final_mask,
    background=False,
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']
_images/20211010_map_analysis_34_1.png

Dynamic trajectory inference

We can infer the dynamic trajectory and ancestor population using the fate bias from binary fate competition. Here, fate bias is a scalar between (0,1) at each state. Selected ancestor population satisfies:

  • P(i;A) + P(i;B) > sum_fate_prob_thresh;

  • Ancestor states {i} for A: Bias_i > bias_threshold_A

  • Ancestor states {i} for B: Bias_i < bias_threshold_B

They will be stored at adata.obs[f'progenitor_{source}_{fate_name}'] and adata.obs[f'diff_trajectory_{source}_{fate_name}'].

[17]:
cs.tl.progenitor(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    map_backward=True,
    bias_threshold_A=0.5,
    bias_threshold_B=0.5,
    sum_fate_prob_thresh=0.2,
    avoid_target_states=True,
)
cs.pl.progenitor(
    adata, selected_fates=["Neutrophil", "Monocyte"], source="transition_map"
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
Results saved at adata.obs[f'progenitor_transition_map_Neutrophil'] and adata.obs[f'diff_trajectory_transition_map_Neutrophil']
Results saved at adata.obs[f'progenitor_transition_map_Monocyte'] and adata.obs[f'diff_trajectory_transition_map_Monocyte']
_images/20211010_map_analysis_37_1.png
_images/20211010_map_analysis_37_2.png

We can visualize the progenitors for Neutrophil directly

[18]:
fate_name = "Neutrophil"  # Monocyte
traj_name = f"progenitor_transition_map_{fate_name}"
cs.pl.embedding(adata, color=traj_name)
_images/20211010_map_analysis_39_0.png

We can also visualize the differentiation trajectory for Neutrophil.

[19]:
fate_name = "Neutrophil"  # Monocyte
traj_name = f"diff_trajectory_transition_map_{fate_name}"
cs.pl.embedding(adata, color=traj_name)
_images/20211010_map_analysis_41_0.png

Differential genes for two ancestor groups

It would be interesting to see what genes are differentially expressed between these two ancestor populations, which might drive fate bifurcation. We provide a simple differentiation gene expression analysis that uses Wilcoxon rank-sum test to calculate P values, followed by Benjamini-Hochberg correction. You can always use your own method.

[20]:
import numpy as np

cell_group_A = np.array(adata.obs["diff_trajectory_transition_map_Neutrophil"])
cell_group_B = np.array(adata.obs["diff_trajectory_transition_map_Monocyte"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[21]:
dge_gene_A
[21]:
index gene Qvalue mean_1 mean_2 ratio
0 5 Wfdc17 0.000000e+00 0.256167 24.515341 -4.344265
1 9 Lpl 9.000377e-267 0.798759 27.861982 -4.004097
2 0 Mmp8 0.000000e+00 1.777057 40.070595 -3.886477
3 304 H2-Aa 1.071471e-33 0.083807 14.977456 -3.881858
4 8 Ctss 1.935638e-273 0.130347 15.299898 -3.850025
... ... ... ... ... ... ...
1061 1789 H3f3a 2.146904e-04 17.266771 18.038162 -0.059673
1062 3329 Gm37214 4.577627e-02 1.208660 1.298273 -0.057379
1063 1881 mt-Co2 3.645677e-04 195.855591 203.123032 -0.052301
1064 2978 Nom1 2.363414e-02 0.813894 0.820711 -0.005412
1065 2647 Ffar2 9.581307e-03 2.096291 2.100280 -0.001857

1066 rows × 6 columns

Gene expression can be explored directly:

[22]:
selected_genes = dge_gene_A["gene"][:2]
cs.pl.gene_expression_on_manifold(
    adata, selected_genes=selected_genes, color_bar=True, savefig=False
)
_images/20211010_map_analysis_47_0.png
_images/20211010_map_analysis_47_1.png

You can visualize the gene expression differences using a heat map.

[24]:
gene_list = list(dge_gene_A["gene"][:20]) + list(
    dge_gene_B["gene"][:20]
)  # select the top 20 genes from both populations

selected_fates = [
    "Neutrophil",
    "Monocyte",
    ["Baso", "Eos", "Erythroid", "Mast", "Meg"],
    ["pDC", "Ccr7_DC", "Lymphoid"],
]
renames = ["Neu", "Mon", "Meg-Ery-MBaE", "Lym-Dc"]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
    adata,
    selected_genes=gene_list,
    selected_fates=selected_fates,
    rename_fates=renames,
    fig_width=12,
)
_images/20211010_map_analysis_49_0.png

You can run the differential gene expression analysis on different cell clusters.

[25]:
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata, cell_group_A="Neutrophil", cell_group_B="Monocyte"
)

Gene expression dynamics

We can calculate the pseudotime along an inferred trajectory, and plot the gene expression along the pseudotime. This method requires that a trajectory has been inferred in previously steps.

[26]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
    adata,
    selected_fate,
    gene_name_list,
    traj_threshold=0.2,
    invert_PseudoTime=False,
    compute_new=True,
    gene_exp_percentile=99,
    n_neighbors=8,
    plot_raw_data=False,
)
_images/20211010_map_analysis_54_0.png
_images/20211010_map_analysis_54_1.png

Fate coupling and hierarchy

The inferred transition map can be used to estimate differentiation coupling between different fate clusters.

[27]:
selected_fates = [
    "Ccr7_DC",
    "Mast",
    "Meg",
    "pDC",
    "Eos",
    "Lymphoid",
    "Erythroid",
    "Baso",
    "Neutrophil",
    "Monocyte",
]
cs.tl.fate_coupling(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_coupling(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_coupling_transition_map']
[27]:
<AxesSubplot:title={'center':'source: transition_map'}>
_images/20211010_map_analysis_57_2.png

We can also infer fate hierarchy from a transition map, based on the fate coupling matrix and using the neighbor-joining method.

[28]:
cs.tl.fate_hierarchy(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_hierarchy(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_hierarchy_transition_map']

            /-Baso
         /-|
      /-|   \-Eos
     |  |
   /-|   \-Mast
  |  |
  |  |   /-Erythroid
  |   \-|
--|      \-Meg
  |
  |      /-Monocyte
  |   /-|
  |  |   \-Neutrophil
   \-|
     |      /-Lymphoid
     |   /-|
      \-|   \-Ccr7_DC
        |
         \-pDC

Propagate a cluster in time

One way to define the dynamic trajectory is simply mapping a given fate cluster backward in time. The whole trajectory across multiple time points will be saved at adata.obs[f'diff_trajectory_{source}_{fate_name}']. This method requires having multiple clonal time points, and a transition map between neighboring time points.

First, use all clonal time points (the default), and infer a transition map between neighboring time points (by not setting later_time_point).

[29]:
adata_5 = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.2,
    intraclone_threshold=0.2,
    max_iter_N=10,
    epsilon_converge=0.01,
)
------Compute the full Similarity matrix if necessary------
----Infer transition map between neighboring time points-----
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.912
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.993
-----------Total used time: 17.11899995803833 s ------------
[30]:
cs.tl.iterative_differentiation(
    adata_5, selected_fates="Neutrophil", source="intraclone_transition_map"
)
cs.pl.iterative_differentiation(adata_5, source="intraclone_transition_map")
Results saved at adata.obs[f'diff_trajectory_intraclone_transition_map_Neutrophil']
_images/20211010_map_analysis_64_1.png

The trajectory can be used for visualizing gene expression dynamics using cs.pl.gene_expression_dynamics, as shown above:

[31]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
    adata_5,
    selected_fate,
    gene_name_list,
    traj_threshold=0.1,
    invert_PseudoTime=True,
    source="intraclone_transition_map",
)
_images/20211010_map_analysis_66_0.png
_images/20211010_map_analysis_66_1.png
[ ]:

Hematopoiesis

Hematopoiesis dataset from Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Science 367, (2020).

This dataset has 3 time points for both the clonal and state measurements. It has ~50000 clonally-labeled cells. Running the whole pipeline for the first time could take 5.2 hours in a standard personal computer. Most of the time (3.3 h) is used for generating the similarity matrix, which are saved for later usage.

Key components:

  • Part I: Infer transition map from all clonal data

  • Part II: Infer transition using end-point clones

  • Part III: Infer transition map from state information alone

  • Part IV: Predict fate bias for Gata1+ states

[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.data_path = "LARRY_data"  # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "LARRY_figure"  # A relative path to save figures. If not existed before, create a new one.
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=2
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 21:21.
[3]:
# # This is for testing this notebook
# cs.settings.data_path='data_cospar'
# cs.settings.figure_path='fig_cospar'
# adata_orig=cs.datasets.hematopoiesis_subsampled()
# adata_orig.uns['data_des']=['LARRY_sp500_ranking1']

Loading data

[4]:
adata_orig = cs.datasets.hematopoiesis()
[5]:
adata_orig
[5]:
AnnData object with n_obs × n_vars = 49116 × 25289
    obs: 'time_info', 'state_info'
    uns: 'clonal_time_points', 'data_des', 'state_info_colors'
    obsm: 'X_clone', 'X_emb', 'X_pca'
[6]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210121_all_hematopoietic_data_v3_8_0.png
[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['undiff', 'Meg', 'Mast', 'Neu_Mon', 'Eos', 'Erythroid', 'Ccr7_DC', 'Baso', 'Lymphoid', 'pDC', 'Neutrophil', 'Monocyte']
Available time points: ['2' '4' '6']
Clonal time points: ['2' '4' '6']

Basic clonal analysis

[8]:
cs.pl.clones_on_manifold(
    adata_orig, selected_clone_list=[1], color_list=["black", "red", "blue"]
)
_images/20210121_all_hematopoietic_data_v3_11_0.png
[9]:
selected_times = "4"
selected_fates = [
    "Ccr7_DC",
    "Mast",
    "Meg",
    "pDC",
    "Eos",
    "Lymphoid",
    "Erythroid",
    "Baso",
    "Neutrophil",
    "Monocyte",
]
cs.tl.fate_coupling(
    adata_orig,
    source="X_clone",
    selected_fates=selected_fates,
    selected_times=selected_times,
)
cs.pl.fate_coupling(adata_orig, source="X_clone")
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>
_images/20210121_all_hematopoietic_data_v3_12_2.png
[10]:
cs.pl.barcode_heatmap(
    adata_orig,
    selected_times=selected_times,
    selected_fates=selected_fates,
    color_bar=True,
)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>
_images/20210121_all_hematopoietic_data_v3_13_2.png

Part I: Infer transition map from all clonal data

Map inference

When first run, it takes around 3 h 37 mins, in which 3 h 20 mins are used for computing the similarity matrices and saving these data. It takes only 20 mins for later runs.

[11]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["2", "4", "6"],
    later_time_point="6",
    smooth_array=[20, 15, 10],
    sparsity_threshold=0.2,
    max_iter_N=3,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 1216
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.903
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 3047
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.97
-----------Total used time: 1604.8337297439575 s ------------
Save pre-computed data (optional)
[12]:
save_data = False
if save_data:
    cs.hf.save_map(adata)
Plotting
Transition profiles for single cells

Forward transitions with map_backward=False.

[13]:
selected_state_id_list = [2]

cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    color_bar=False,
    source="transition_map",
    map_backward=False,
)
_images/20210121_all_hematopoietic_data_v3_22_0.png

Backward transitions with map_backward=True.

[14]:
selected_state_id_list = [4]
cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    color_bar=False,
    source="transition_map",
    map_backward=True,
)
_images/20210121_all_hematopoietic_data_v3_24_0.png
Fate map
[15]:
cs.tl.fate_map(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    map_backward=True,
)
cs.pl.fate_map(
    adata,
    selected_fates=["Neutrophil"],
    source="transition_map",
    plot_target_state=True,
    show_histogram=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
_images/20210121_all_hematopoietic_data_v3_26_1.png
Fate bias
[16]:
cs.tl.fate_bias(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    pseudo_count=0,
    sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    plot_target_state=False,
    selected_times=["4"],
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
_images/20210121_all_hematopoietic_data_v3_28_1.png
Fate coupling from the transition map
[17]:
selected_fates = [
    "Ccr7_DC",
    "Mast",
    "Meg",
    "pDC",
    "Eos",
    "Lymphoid",
    "Erythroid",
    "Baso",
    "Neutrophil",
    "Monocyte",
]
cs.tl.fate_coupling(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_coupling(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_coupling_transition_map']
[17]:
<AxesSubplot:title={'center':'source: transition_map'}>
_images/20210121_all_hematopoietic_data_v3_30_2.png
Hierarchy
[18]:
cs.tl.fate_hierarchy(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_hierarchy(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_hierarchy_transition_map']

            /-Erythroid
         /-|
      /-|   \-Meg
     |  |
   /-|   \-Mast
  |  |
  |  |   /-Baso
  |   \-|
--|      \-Eos
  |
  |      /-Monocyte
  |   /-|
  |  |   \-Neutrophil
   \-|
     |      /-Lymphoid
     |   /-|
      \-|   \-Ccr7_DC
        |
         \-pDC
Dynamic trajectory inference
[19]:
cs.tl.progenitor(
    adata,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    map_backward=True,
    bias_threshold_A=0.5,
    bias_threshold_B=0.5,
    sum_fate_prob_thresh=0.2,
    avoid_target_states=True,
)
cs.pl.progenitor(
    adata, selected_fates=["Neutrophil", "Monocyte"], source="transition_map"
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
Results saved at adata.obs[f'progenitor_transition_map_Neutrophil'] and adata.obs[f'diff_trajectory_transition_map_Neutrophil']
Results saved at adata.obs[f'progenitor_transition_map_Monocyte'] and adata.obs[f'diff_trajectory_transition_map_Monocyte']
_images/20210121_all_hematopoietic_data_v3_34_1.png
_images/20210121_all_hematopoietic_data_v3_34_2.png
Gene trend along the dynamic trajectory
[20]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
    adata, selected_fate, gene_name_list, traj_threshold=0.2, invert_PseudoTime=False
)
_images/20210121_all_hematopoietic_data_v3_36_0.png
_images/20210121_all_hematopoietic_data_v3_36_1.png

Part II: Infer transition map from end-point clones

When run for the firs time, assuming that the similarity matrices are pre-computed, it takes 73 mins. Around 40 mins of which are used to compute the initialized map.

We group time points ‘2’ and ‘4’ so that the inference on day 2 states (which has much fewer cells) are better.

[21]:
time_info = np.array(adata_orig.obs["time_info"])
time_info[time_info == "2"] = "24"
time_info[time_info == "4"] = "24"
adata_orig.obs["time_info"] = time_info
adata_orig = cs.pp.initialize_adata_object(adata_orig)
Clones without any cells are removed.
Time points with clonal info: ['24' '6']
WARNING: Default ascending order of time points are: ['24' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[22]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["24"],
    later_time_point="6",
    initialize_method="OT",
    OT_cost="GED",
    smooth_array=[20, 15, 10],
    max_iter_N=[1, 3],
    sparsity_threshold=0.2,
    use_full_Smatrix=True,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 24--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed custom OT matrix
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.974
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.144
Finishing Joint Optimization, used time 1308.7951350212097
-----------Total used time: 1341.8913249969482 s ------------
Fate bias
[23]:
cs.tl.fate_bias(
    adata_1,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    pseudo_count=0,
    sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
    adata_1,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
_images/20210121_all_hematopoietic_data_v3_43_1.png

Part III: Infer transition map from state information alone

[24]:
adata_2 = cs.tmap.infer_Tmap_from_state_info_alone(
    adata_orig,
    initial_time_points=["24"],
    later_time_point="6",
    initialize_method="OT",
    OT_cost="GED",
    smooth_array=[20, 15, 10],
    max_iter_N=[1, 3],
    sparsity_threshold=0.2,
    use_full_Smatrix=True,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 24--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed custom OT matrix
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.977
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.14
Finishing Joint Optimization, used time 1854.5912249088287
-----------Total used time: 1923.1848938465118 s ------------
[25]:
cs.tl.fate_bias(
    adata_2,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    pseudo_count=0,
    sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
    adata_2,
    selected_fates=["Neutrophil", "Monocyte"],
    source="transition_map",
    plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
_images/20210121_all_hematopoietic_data_v3_46_1.png

Part IV: Predict fate bias for Gata1+ states

Load data

We load a dataset that also includes non-clonally labeled cells in the Gata1+ states (in total ~38K cells), which increases the statistical power of DGE analysis.

[26]:
cs.settings.data_path = "LARRY_data_Gata1"  # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "LARRY_figure_Gata1"  # A relative path to save figures. If not existed before, create a new one.
adata_orig_1 = cs.datasets.hematopoiesis_Gata1_states()
[27]:
adata_orig_1 = cs.pp.initialize_adata_object(adata_orig_1)
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[28]:
cs.pl.embedding(adata_orig_1, color="time_info")
_images/20210121_all_hematopoietic_data_v3_52_0.png
[29]:
adata_orig_1
[29]:
AnnData object with n_obs × n_vars = 38457 × 25289
    obs: 'n_counts', 'Source', 'Well', 'mito_frac', 'time_info', 'state_info'
    var: 'highly_variable'
    uns: 'AllCellCyc_genes', 'clonal_time_points', 'data_des', 'max_mito', 'min_tot', 'new_highvar_genes', 'state_info_colors', 'time_info_colors', 'time_ordering'
    obsm: 'X_clone', 'X_emb', 'X_pca', 'X_umap'
[30]:
## This is for testing this notebook
# adata_orig_1=adata_orig
# adata_3=adata_1
Infer transition map from all clones
[31]:
adata_3 = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig_1,
    later_time_point="6",
    smooth_array=[20, 15, 10],
    sparsity_threshold=0.1,
    intraclone_threshold=0.1,
    extend_Tmap_space=True,
)
------Compute the full Similarity matrix if necessary------
Trying to set attribute `.uns` of view, copying.
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 355
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.969
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 1270
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.985
-----------Total used time: 477.8539500236511 s ------------
Fate bias
[32]:
selected_fates = [["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]]
cs.tl.fate_bias(
    adata_3,
    selected_fates=selected_fates,
    source="transition_map",
    pseudo_count=0,
    sum_fate_prob_thresh=0.05,
)
cs.pl.fate_bias(
    adata_3,
    selected_fates=selected_fates,
    source="transition_map",
    plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']
_images/20210121_all_hematopoietic_data_v3_58_1.png
Identify ancestor populations
[33]:
x_emb = adata_orig_1.obsm["X_emb"][:, 0]
y_emb = adata_orig_1.obsm["X_emb"][:, 1]

index_2 = cs.hf.above_the_line(adata_orig_1.obsm["X_emb"], [0, -500], [-300, 1000])
index_2_2 = cs.hf.above_the_line(adata_orig_1.obsm["X_emb"], [-450, -600], [-600, 800])

final_mask = (index_2_2) & (~index_2)  # & index_3 & index_4 & index_5 #mask_1 &
cs.pl.customized_embedding(x_emb, y_emb, final_mask)
[33]:
<AxesSubplot:>
_images/20210121_all_hematopoietic_data_v3_60_1.png
[34]:
selected_fates = [["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]]
cs.tl.progenitor(
    adata_3,
    selected_fates=selected_fates,
    source="transition_map",
    map_backward=True,
    bias_threshold_A=0.6,
    bias_threshold_B=0.3,
    sum_fate_prob_thresh=0.05,
    avoid_target_states=True,
)
cs.pl.progenitor(
    adata_3, selected_fates=selected_fates, source="transition_map", mask=final_mask
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']
Results saved at adata.obs[f'progenitor_transition_map_Meg_Erythroid'] and adata.obs[f'diff_trajectory_transition_map_Meg_Erythroid']
Results saved at adata.obs[f'progenitor_transition_map_Baso_Mast_Eos'] and adata.obs[f'diff_trajectory_transition_map_Baso_Mast_Eos']
_images/20210121_all_hematopoietic_data_v3_61_1.png
_images/20210121_all_hematopoietic_data_v3_61_2.png
DGE analysis
[35]:
cell_group_A = np.array(adata_3.obs[f"progenitor_transition_map_Meg_Erythroid"])
cell_group_B = np.array(adata_3.obs[f"progenitor_transition_map_Baso_Mast_Eos"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata_3, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[36]:
# All, ranked, DGE genes for group A
dge_gene_A
[36]:
index gene Qvalue mean_1 mean_2 ratio
0 4 Ctsg 0.000000e+00 0.103370 2.567811 -1.693123
1 3 Emb 0.000000e+00 0.186135 2.703714 -1.642705
2 26 Elane 3.264509e-241 0.151518 2.530409 -1.616299
3 2 Plac8 0.000000e+00 1.096986 4.811569 -1.470611
4 17 Igfbp4 0.000000e+00 1.042382 3.969988 -1.282990
... ... ... ... ... ... ...
965 2772 Rps29 2.823620e-02 2.905993 3.008932 -0.037529
966 2343 Rpl38 8.297305e-03 5.915980 6.061140 -0.029967
967 2976 Rpl5 4.697313e-02 4.714540 4.806636 -0.023065
968 2544 Ly6e 1.556172e-02 2.882883 2.944137 -0.022582
969 2705 Isg20 2.485104e-02 0.330973 0.342714 -0.012671

970 rows × 6 columns

[37]:
sub_predicted_upper = [
    "Slc14a1",
    "Plxnc1",
    "Mef2c",
    "Tnip3",
    "Pbx1",
    "Hbb-bt",
    "Tmem176b",
    "Car2",
    "Pim2",
    "Tsc22d1",
    "Plcg2",
    "Gmpr",
    "Cd44",
    "Irgm1",
    "Hsd17b10",
    "Dlk1",
]
sub_predicted_lower = [
    "Ctsg",
    "Emb",
    "Ccl9",
    "Igfbp7",
    "Ap3s1",
    "Gstm1",
    "Ms4a3",
    "Fcgr3",
    "Mpo",
    "Cebpb",
    "Thy1",
    "Anxa3",
    "Sell",
    "Pcsk9",
    "Fcgr2b",
    "Cebpa",
    "Cxcr4",
]
[38]:
state_info = np.array(adata_3.obs["state_info"])
state_info[cell_group_A > 0] = "MegEr_Prog"
state_info[cell_group_B > 0] = "EMB_Prog"
[39]:
adata_3.obs["state_info"] = state_info
cs.pl.embedding(adata_3, color="state_info")
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
... storing 'state_info' as categorical
_images/20210121_all_hematopoietic_data_v3_67_1.png
[40]:
cs.settings.set_figure_params(
    format="pdf", figsize=[4, 3.5], dpi=200, fontsize=10, pointsize=3
)

gene_list = sub_predicted_upper + sub_predicted_lower
selected_fates = ["MegEr_Prog", "EMB_Prog"]
renames = ["MkEr prog", "MaBaEos prog"]

gene_expression_matrix = cs.pl.gene_expression_heatmap(
    adata_3,
    selected_genes=gene_list,
    selected_fates=selected_fates,
    rename_fates=renames,
    horizontal=True,
    fig_width=6.5,
    fig_height=2,
)
_images/20210121_all_hematopoietic_data_v3_68_0.png

Reprogramming

The reprogramming dataset from Biddy, B. A. et al. Nature 564, 219–224 (2018). This dataset has multiple time points for both clonal and state measurements.

Key components:

  • Part I: Infer transition map using clones from all time points

  • Part II: Infer transition map from end-point clones

  • Part III: Infer transition map from state information alone

  • Part IV: Predict early fate bias on day 3

[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.set_figure_params(
    format="png", dpi=75, fontsize=14
)  # use png to reduce file size.

cs.settings.data_path = "CellTag_data"  # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "CellTag_figure"  # A relative path to save figures. If not existed before, create a new one.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:21.

Loading data

[3]:
adata_orig = cs.datasets.reprogramming()
[4]:
adata_orig = cs.pp.initialize_adata_object(adata_orig)
Time points with clonal info: ['Day12' 'Day15' 'Day21' 'Day28' 'Day6' 'Day9']
WARNING: Default ascending order of time points are: ['Day12' 'Day15' 'Day21' 'Day28' 'Day6' 'Day9']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[5]:
cs.hf.update_time_ordering(
    adata_orig, updated_ordering=["Day6", "Day9", "Day12", "Day15", "Day21", "Day28"]
)

The cells are barcoded over 3 rounds during the entire differentiation process. There are multiple ways to assemble the barcodes on day 0, day 3, and day 13 into a clonal ID. Below, we provide three variants:

  • Concatenate barcodes on day 0 and day 13, as in the original analysis (adata_orig.obsm['X_clone_Concat_D0D3'], the default);

  • Concatenate barcodes on day 0, day 3, and day 13 (adata_orig.obsm['X_clone_Concat_D0D3D13']);

  • No concatenation; each cell has up to 3 barcodes (adata_orig.obsm['X_clone_NonConcat_D0D3D13']).

The last choice keeps the nested clonal structure in the data. You can choose any one of the clonal arrangement for downstream analysis, by setting adata_orig.obsm['X_clone']=adata_orig.obsm['X_clone_Concat_D0D3']. The three clonal arrangements give very similar fate prediction.

[6]:
adata_orig
[6]:
AnnData object with n_obs × n_vars = 18803 × 28001
    obs: 'time_info', 'state_info', 'reprogram_trajectory', 'failed_trajectory', 'Reference_fate_bias'
    uns: 'clonal_time_points', 'data_des', 'state_info_colors', 'time_info_colors', 'time_ordering'
    obsm: 'X_clone', 'X_clone_Concat_D0D3', 'X_clone_Concat_D0D3D13', 'X_clone_NonConcat_D0D3D13', 'X_emb', 'X_pca'
[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['Reprogrammed', 'Failed', 'Others']
Available time points: ['Day6' 'Day9' 'Day12' 'Day15' 'Day21' 'Day28']
Clonal time points: ['Day6' 'Day9' 'Day12' 'Day15' 'Day21' 'Day28']
[8]:
re_propressing = False
if re_propressing:
    cs.pp.get_highly_variable_genes(adata_orig)
    cs.pp.remove_cell_cycle_correlated_genes(
        adata_orig, corr_threshold=0.03, confirm_change=True
    )
    cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
    cs.pp.get_X_emb(adata_orig, n_neighbors=20, umap_min_dist=0.3)
    # cs.pp.get_state_info(adata_orig,n_neighbors=20,resolution=0.5) # if this is changed, the cluster name used later will be wrong.
    cs.pl.embedding(adata_orig, color="time_info")
[9]:
cs.pl.embedding(adata_orig, color="time_info")
_images/20210121_reprogramming_static_barcoding_v2_12_0.png
[10]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210121_reprogramming_static_barcoding_v2_13_0.png
[11]:
adata_orig.obs["reprogram_trajectory"] = adata_orig.obs["reprogram_trajectory"].astype(
    int
)
cs.pl.embedding(adata_orig, color="reprogram_trajectory")
_images/20210121_reprogramming_static_barcoding_v2_14_0.png
[12]:
adata_orig.obs["failed_trajectory"] = adata_orig.obs["failed_trajectory"].astype(int)
cs.pl.embedding(adata_orig, color="failed_trajectory")
_images/20210121_reprogramming_static_barcoding_v2_15_0.png
Basic clonal analysis
[13]:
cs.tl.clonal_fate_bias(
    adata_orig, selected_fate="Reprogrammed", alternative="two-sided"
)
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 1451/1451 [00:01<00:00, 792.99it/s]
Data saved at adata.uns['clonal_fate_bias']
_images/20210121_reprogramming_static_barcoding_v2_17_2.png
_images/20210121_reprogramming_static_barcoding_v2_17_3.png
[14]:
result = adata_orig.uns["clonal_fate_bias"]
result
[14]:
Clone_ID Clone_size Q_value Fate_bias clonal_fraction_in_target_fate
0 1014 2382.0 1.000000e-20 20.000000 0.371537
1 611 1908.0 1.000000e-20 20.000000 0.265199
2 600 780.0 1.000000e-20 20.000000 0.014103
3 1188 289.0 1.000000e-20 20.000000 0.446367
4 545 283.0 9.769878e-15 14.010111 0.007067
... ... ... ... ... ...
1446 497 4.0 1.000000e+00 -0.000000 0.000000
1447 496 1.0 1.000000e+00 -0.000000 1.000000
1448 494 22.0 1.000000e+00 -0.000000 0.227273
1449 504 1.0 1.000000e+00 -0.000000 0.000000
1450 1450 3.0 1.000000e+00 -0.000000 0.000000

1451 rows × 5 columns

[15]:
ids = result["Clone_ID"][8:10]
# ids=[324,313,446,716,367]
cs.pl.clones_on_manifold(adata_orig, selected_clone_list=ids)
_images/20210121_reprogramming_static_barcoding_v2_19_0.png
_images/20210121_reprogramming_static_barcoding_v2_19_1.png

Part I: Infer transition map using clones from all time points

Map inference

Running it the first time takes ~20 mins, ~17 mins of which are used to compute the similarity matrix. When it is run again, it only takes ~3 mins.

[16]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["Day15", "Day21"],
    later_time_point="Day28",
    smooth_array=[15, 10, 5],
    sparsity_threshold=0.2,
    intraclone_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: Day15--------
Step 1: Select time points
Number of multi-time clones post selection: 179
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.929
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.995
--------Current initial time point: Day21--------
Step 1: Select time points
Number of multi-time clones post selection: 226
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.921
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.99
-----------Total used time: 165.4444100856781 s ------------
[17]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[17]:
['transition_map', 'intraclone_transition_map']
Save or load pre-computed data

This can be used to save adata with maps computed from different tools or parameters. Usually, different parameter choices will result in different data_des, a prefix to identify the anndata. Saving an adata would print the data_des, which can be used to load the corresponding adata.

[18]:
save_data = False
if save_data:
    cs.hf.save_map(adata)

load_data = False
if load_data:
    # file_path='CellTag_data/cospar_MultiTimeClone_Later_FullSpace0_t*Day15*Day21*Day28_adata_with_transition_map.h5ad'
    adata = cs.hf.read(file_path)
[19]:
cs.hf.check_available_choices(adata)
WARNING: Pre-computed time_ordering does not include the right time points. Re-compute it!
Available transition maps: ['transition_map', 'intraclone_transition_map']
Available clusters: ['Reprogrammed', 'Failed', 'Others']
Available time points: ['Day15' 'Day21' 'Day28']
Clonal time points: ['Day15' 'Day21' 'Day28']
Plotting
Single-cell transitions
[20]:
selected_state_id_list = [1000, 3500, 6000, 5500]

cs.pl.single_cell_transition(
    adata,
    selected_state_id_list=selected_state_id_list,
    source="transition_map",
    map_backward=False,
)
_images/20210121_reprogramming_static_barcoding_v2_31_0.png
Fate map
[21]:
cs.tl.fate_map(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
)
cs.pl.fate_map(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    plot_target_state=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
_images/20210121_reprogramming_static_barcoding_v2_33_1.png
_images/20210121_reprogramming_static_barcoding_v2_33_2.png
Fate bias
[22]:
cs.tl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
    method="norm-sum",
)
cs.pl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    plot_target_state=False,
    background=False,
    show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
_images/20210121_reprogramming_static_barcoding_v2_35_1.png
Identify differentially expressed genes
[23]:
selected_fates = ["Reprogrammed", "Failed"]
cs.tl.progenitor(
    adata,
    selected_fates,
    source="transition_map",
    sum_fate_prob_thresh=0.2,
    bias_threshold_A=0.5,
    bias_threshold_B=0.5,
)
cs.pl.progenitor(adata, selected_fates, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
Results saved at adata.obs[f'progenitor_transition_map_Reprogrammed'] and adata.obs[f'diff_trajectory_transition_map_Reprogrammed']
Results saved at adata.obs[f'progenitor_transition_map_Failed'] and adata.obs[f'diff_trajectory_transition_map_Failed']
_images/20210121_reprogramming_static_barcoding_v2_37_1.png
_images/20210121_reprogramming_static_barcoding_v2_37_2.png
Differential genes for two ancestor groups
[24]:
cell_group_A = np.array(adata.obs[f"progenitor_transition_map_Reprogrammed"])
cell_group_B = np.array(adata.obs[f"progenitor_transition_map_Failed"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[25]:
# All, ranked, DGE genes for group A
dge_gene_A
[25]:
index gene Qvalue mean_1 mean_2 ratio
0 35 Col3a1 6.087368e-223 0.807060 5.999794 -1.953668
1 57 Col1a2 1.345045e-161 0.297653 3.076256 -1.651340
2 41 Spp1 3.197045e-207 10.833703 32.003143 -1.479702
3 9 Col1a1 0.000000e+00 2.036317 7.257215 -1.443333
4 6 Il6st 0.000000e+00 2.108766 7.134142 -1.387648
... ... ... ... ... ... ...
4828 5607 Tbk1 2.006752e-02 0.224493 0.224873 -0.000448
4829 5172 Sec23a 1.005175e-02 0.344517 0.344837 -0.000344
4830 6047 Ints5 3.603720e-02 0.226673 0.226857 -0.000216
4831 5737 Taf2 2.431736e-02 0.278397 0.278525 -0.000144
4832 6101 Dgkq 3.874214e-02 0.144835 0.144836 -0.000002

4833 rows × 6 columns

Gene trend along the trajectory

The results are based on pre-computed dynamic trajectories from the preceding step. It is better to use the intraclone_transition_map. First, show the expression dynamics along the failed trajectory:

[26]:
gene_name_list = ["Col1a2", "Apoa1", "Peg3", "Spint2", "Mettl7a1", "Cdh1"]
selected_fate = "Failed"
cs.pl.gene_expression_dynamics(
    adata, selected_fate, gene_name_list, traj_threshold=0.1, invert_PseudoTime=True
)
_images/20210121_reprogramming_static_barcoding_v2_43_0.png
_images/20210121_reprogramming_static_barcoding_v2_43_1.png

Expression dynamics along the reprogramming trajectory:

[27]:
gene_name_list = ["Col1a2", "Apoa1", "Peg3", "Spint2", "Mettl7a1", "Cdh1"]
selected_fate = "Reprogrammed"
cs.pl.gene_expression_dynamics(
    adata, selected_fate, gene_name_list, traj_threshold=0.1, invert_PseudoTime=True
)
_images/20210121_reprogramming_static_barcoding_v2_45_0.png
_images/20210121_reprogramming_static_barcoding_v2_45_1.png

Part II: Infer transition map from end-point clones¶

It takes ~12 mins to compute for the first time (excluding the time for computing similarity matrix); and ~5 mins later.

[28]:
adata = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["Day15", "Day21"],
    later_time_point="Day28",
    initialize_method="OT",
    OT_cost="SPD",
    smooth_array=[15, 10, 5],
    sparsity_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day15--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 51.17336821556091
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.875
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.992
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.276
Finishing Joint Optimization, used time 80.46711325645447
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: Day21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 98.62810587882996
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.896
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.987
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.15
Finishing Joint Optimization, used time 162.67132997512817
-----------Total used time: 408.8195090293884 s ------------
[29]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[29]:
['transition_map', 'OT_transition_map']
Fate bias
[30]:
cs.tl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
    method="norm-sum",
)
cs.pl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    plot_target_state=False,
    background=False,
    show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
_images/20210121_reprogramming_static_barcoding_v2_51_1.png

Part III: Infer transition map from state information alone

It takes ~5 mins

[31]:
adata = cs.tmap.infer_Tmap_from_state_info_alone(
    adata_orig,
    initial_time_points=["Day15", "Day21"],
    later_time_point="Day28",
    initialize_method="OT",
    OT_cost="SPD",
    smooth_array=[15, 10, 5],
    sparsity_threshold=0.2,
    use_full_Smatrix=True,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day15--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 42.353046894073486
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.877
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.994
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.301
Finishing Joint Optimization, used time 97.98227381706238
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: Day21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 125.77742171287537
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.897
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.984
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.29
Finishing Joint Optimization, used time 114.50927495956421
-----------Total used time: 393.1687672138214 s ------------
[32]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[32]:
['transition_map', 'OT_transition_map']
Fate bias
[33]:
cs.tl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
    method="norm-sum",
)
cs.pl.fate_bias(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    plot_target_state=False,
    background=False,
    show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
_images/20210121_reprogramming_static_barcoding_v2_57_1.png

Part IV: Predict early fate bias on day 3

Load data

The above dataset only includes clonally-labeled states from day 6 to day 28. We load a dataset that has day-0 and day-3 states, which do not have clonal information.

[34]:
cs.settings.data_path = "CellTag_data_full"  # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "CellTag_figure_full"  # A relative path to save figures. If not existed before, create a new one.
adata_orig_1 = cs.datasets.reprogramming_Day0_3_28()
[35]:
adata_orig_1 = cs.pp.initialize_adata_object(adata_orig_1)
Time points with clonal info: ['Day28']
WARNING: Default ascending order of time points are: ['Day0' 'Day28' 'Day3']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[36]:
cs.hf.update_time_ordering(adata_orig_1, updated_ordering=["Day0", "Day3", "Day28"])
[37]:
cs.pl.embedding(adata_orig_1, color="time_info")
_images/20210121_reprogramming_static_barcoding_v2_64_0.png
[38]:
adata_orig_1
[38]:
AnnData object with n_obs × n_vars = 27718 × 28001
    obs: 'predicted_doublet', 'row_counts', 'time_info', 'state_info', 'batch'
    var: 'highly_variable-0'
    uns: 'clonal_time_points', 'data_des', 'state_info_colors', 'time_info_colors', 'time_ordering'
    obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_emb_v1', 'X_pca'
Infer transition map from end-point clones¶

It takes ~4 mins

[39]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig_1,
    initial_time_points=["Day3"],
    later_time_point=["Day28"],
    initialize_method="HighVar",
    smooth_array=[15, 10, 5],
    max_iter_N=[1, 3],
    sparsity_threshold=0.2,
    use_full_Smatrix=True,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day3--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 1684 (t1); 1696 (t2). Common set: 960
Step b: convert the shared highly variable genes into clonal info------
 92%|█████████▎| 888/960 [00:00<00:00, 1292.96it/s]
Total used genes=888 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.862
Finishing initialization using HighVar, used time 158.03640127182007
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.921
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.512
Finishing Joint Optimization, used time 126.52328705787659
-----------Total used time: 289.5595648288727 s ------------
Fate bias
[40]:
cs.tl.fate_map(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
)
cs.pl.fate_map(
    adata,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    plot_target_state=True,
)
Use pre-computed fate map
_images/20210121_reprogramming_static_barcoding_v2_69_1.png
_images/20210121_reprogramming_static_barcoding_v2_69_2.png
[41]:
cs.tl.fate_bias(
    adata_1,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    map_backward=True,
    method="norm-sum",
)
cs.pl.fate_bias(
    adata_1,
    selected_fates=["Reprogrammed", "Failed"],
    source="transition_map",
    selected_times=["Day3"],
    plot_target_state=False,
    background=False,
    show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
_images/20210121_reprogramming_static_barcoding_v2_70_1.png
Identify ancestor populations
[42]:
selected_fates = ["Reprogrammed", "Failed"]
cs.tl.progenitor(
    adata_1,
    selected_fates,
    source="transition_map",
    sum_fate_prob_thresh=0.1,
    bias_threshold_A=0.5,
    bias_threshold_B=0.5,
)
cs.pl.progenitor(adata_1, selected_fates, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
Results saved at adata.obs[f'progenitor_transition_map_Reprogrammed'] and adata.obs[f'diff_trajectory_transition_map_Reprogrammed']
Results saved at adata.obs[f'progenitor_transition_map_Failed'] and adata.obs[f'diff_trajectory_transition_map_Failed']
_images/20210121_reprogramming_static_barcoding_v2_72_1.png
_images/20210121_reprogramming_static_barcoding_v2_72_2.png
DGE analysis
[43]:
cell_group_A = np.array(adata_1.obs[f"progenitor_transition_map_Reprogrammed"])
cell_group_B = np.array(adata_1.obs[f"progenitor_transition_map_Failed"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata_1, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[44]:
# All, ranked, DGE genes for group A
dge_gene_A
[44]:
index gene Qvalue mean_1 mean_2 ratio
0 46 Ptn 1.714648e-105 1.406754 6.237863 -1.588475
1 11 Cnn1 9.708323e-167 0.914648 4.413725 -1.499543
2 25 Thy1 6.483313e-134 0.640454 2.997786 -1.285106
3 81 Penk 4.283470e-73 0.532019 2.376845 -1.140242
4 2 Cd248 0.000000e+00 1.403314 4.151762 -1.100041
... ... ... ... ... ... ...
1514 3217 Fads3 2.390519e-02 0.770464 0.798334 -0.022533
1515 3353 Tmem189 3.330180e-02 0.633464 0.658867 -0.022264
1516 3590 Rere 4.966121e-02 0.542979 0.565672 -0.021064
1517 1546 Tuba1b 5.925740e-06 5.398442 5.484683 -0.019315
1518 3545 Vegfa 4.597810e-02 1.169517 1.198003 -0.018820

1519 rows × 6 columns

Update cluster annotation on day 3

[45]:
x_emb = adata_1.obsm["X_emb"][:, 0]
state_info = np.array(adata_1.obs["state_info"]).astype(">U15")
sp_idx = (cell_group_A > 0) & (x_emb > 0)
state_info[sp_idx] = "Repro_Day3"

sp_idx = (cell_group_B > 0) & (x_emb > 0)
state_info[sp_idx] = "Failed_Day3"
adata_1.obs["state_info"] = state_info
cs.pl.embedding(adata_1, color="state_info")
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
... storing 'state_info' as categorical
_images/20210121_reprogramming_static_barcoding_v2_77_1.png
[46]:
cs.settings.set_figure_params(fontsize=12)
gene_list = [
    "FoxA1.HNF4a",
    "Spint2",
    "Apoa1",
    "Hes6",
    "S100a7a",
    "Krt19",
    "Clu",
    "S100a13",
    "Ppa1",
    "Mgst3",
    "Angptl4",
    "Kng1",
    "Igfbp2",
    "Mt2",
    "Areg",
    "Rbp1",
    "Anxa8",
    "Gsto1",
    "Ldhb",
    "Mmp13",
    "Cgref1",
    "Fbln5",
    "Gss",
    "Col1a2",
    "Dlk1",
    "Peg3",
    "Sfrp1",
    "Mrc2",
    "Col1a1",
]

selected_fates = ["Repro_Day3", "Failed_Day3"]
renames = ["Repro. prog.", "Failed prog."]

gene_expression_matrix = cs.pl.gene_expression_heatmap(
    adata_1,
    selected_genes=gene_list,
    selected_fates=selected_fates,
    rename_fates=renames,
    horizontal=True,
    fig_width=6.5,
    fig_height=2,
)
_images/20210121_reprogramming_static_barcoding_v2_78_0.png
[47]:
gene_list = [
    "FoxA1.HNF4a",
    "Spint2",
    "Apoa1",
    "Hes6",
    "S100a7a",
    "Krt19",
    "Clu",
    "S100a13",
    "Ppa1",
    "Mgst3",
    "Angptl4",
    "Kng1",
    "Igfbp2",
    "Mt2",
    "Areg",
    "Rbp1",
    "Anxa8",
    "Gsto1",
    "Ldhb",
    "Mmp13",
    "Cgref1",
    "Fbln5",
    "Gss",
    "Col1a2",
    "Dlk1",
    "Peg3",
    "Sfrp1",
    "Mrc2",
    "Col1a1",
]

selected_fates = ["Reprogrammed", "Failed"]
renames = ["Repro.", "Failed"]

gene_expression_matrix = cs.pl.gene_expression_heatmap(
    adata_1,
    selected_genes=gene_list,
    selected_fates=selected_fates,
    rename_fates=renames,
    horizontal=True,
    fig_width=6.5,
    fig_height=2,
)
_images/20210121_reprogramming_static_barcoding_v2_79_0.png

Lung differentiation

The direct lung differentiation dataset from Hurley, K. et al. Cell Stem Cell (2020) doi:10.1016/j.stem.2019.12.009.

This dataset has multiple time points for the state manifold, but only one time point for the clonal observation on day 27.

[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.data_path = "lung_data_paper"  # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "lung_figure_paper"  # A relative path to save figures. If not existed before, create a new one.
cs.settings.set_figure_params(
    format="png", dpi=75, fontsize=14
)  # use png to reduce file size.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:07.

Load data

[3]:
adata_orig = cs.datasets.lung()
[4]:
adata_orig
[4]:
AnnData object with n_obs × n_vars = 15832 × 26766
    obs: 'time_info', 'state_info'
    uns: 'clonal_time_points', 'data_des', 'state_info_colors'
    obsm: 'X_clone', 'X_emb', 'X_pca'

Preprocessing (optional)

[5]:
preprocessing = False
if preprocessing:
    cs.pp.get_highly_variable_genes(
        adata_orig,
        normalized_counts_per_cell=10000,
        min_counts=3,
        min_cells=3,
        min_gene_vscore_pctl=80,
    )

    cs.pp.remove_cell_cycle_correlated_genes(
        adata_orig, corr_threshold=0.03, confirm_change=False
    )  # optional step
    cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
    # cs.pp.get_X_umap(adata_orig,n_neighbors=20,umap_min_dist=0.3) # we want to keep the original embedding
    # cs.pp.get_state_info(adata_orig,leiden_resolution=0.5) # we want to keep the original state annotation
[6]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210121_lung_data_v2_9_0.png
[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['NLE', 'Endoderm', 'PNEC', 'Gut', 'iAEC2', 'Others']
Available time points: ['D27' 'pos_17' 'pos_21']
Clonal time points: ['D27']
Basic clonal analysis
[8]:
cs.pl.clones_on_manifold(
    adata_orig,
    selected_clone_list=[30, 198, 150, 58, 263],
    color_list=["blue", "red"],
    selected_times=["D27"],
)
_images/20210121_lung_data_v2_12_0.png
_images/20210121_lung_data_v2_12_1.png
_images/20210121_lung_data_v2_12_2.png
_images/20210121_lung_data_v2_12_3.png
_images/20210121_lung_data_v2_12_4.png
[9]:
cs.tl.fate_coupling(
    adata_orig,
    source="X_clone",
    selected_fates=["iAEC2", "PNEC", "Gut", "NLE", "Endoderm"],
)
cs.pl.fate_coupling(adata_orig, source="X_clone")
WARNING: ['Endoderm'] are ignored due to lack of cells
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>
_images/20210121_lung_data_v2_13_2.png
[10]:
cs.pl.barcode_heatmap(
    adata_orig,
    selected_times="D27",
    selected_fates=["iAEC2", "PNEC", "Gut", "NLE", "Endoderm"],
    binarize=True,
)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>
_images/20210121_lung_data_v2_14_2.png
[11]:
cs.tl.clonal_fate_bias(adata_orig, selected_fate="iAEC2", alternative="two-sided")
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 272/272 [00:00<00:00, 616.34it/s]
Data saved at adata.uns['clonal_fate_bias']
_images/20210121_lung_data_v2_15_2.png
_images/20210121_lung_data_v2_15_3.png

Infer transition map from end-point clones

It takes around 9 minutes to run it for the first time (the data has ~15000 cells). Later runs take < 3 mins.

[12]:
initial_time_points = ["pos_17", "pos_21"]
clonal_time_point = "D27"
adata = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["pos_17", "pos_21"],
    later_time_point="D27",
    initialize_method="HighVar",
    HighVar_gene_pctl=80,
    max_iter_N=[1, 3],
    smooth_array=[20, 15, 10],
    sparsity_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: pos_17--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 3689 (t1); 3729 (t2). Common set: 1075
Step b: convert the shared highly variable genes into clonal info------
Total used genes=863 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.923
Finishing initialization using HighVar, used time 75.68489861488342
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.959
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.413
Finishing Joint Optimization, used time 28.069818019866943
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: pos_21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 3794 (t1); 3729 (t2). Common set: 1152
Step b: convert the shared highly variable genes into clonal info------
Total used genes=1088 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.939
Finishing initialization using HighVar, used time 80.7336368560791
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.982
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.823
Finishing Joint Optimization, used time 29.09626293182373
-----------Total used time: 219.88412308692932 s ------------
Save pre-computed map
[13]:
save_data = False
if save_data:
    cs.hf.save_map(adata)

# adata=cs.hf.read('lung_data_paper/Lung_OneTimeClone_t*pos_17*pos_21*D27_adata_with_transition_map.h5ad')
Plotting
[14]:
cs.tl.fate_map(adata, source="transition_map", map_backward=True)
cs.pl.fate_map(
    adata, selected_fates=["iAEC2"], source="transition_map", plot_target_state=True
)
Results saved at adata.obs['fate_map_transition_map_NLE']
Results saved at adata.obs['fate_map_transition_map_Endoderm']
Results saved at adata.obs['fate_map_transition_map_PNEC']
Results saved at adata.obs['fate_map_transition_map_Gut']
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_Others']
_images/20210121_lung_data_v2_22_1.png
[15]:
# cs.settings.set_figure_params(figsize=(4,3.5),format='png',fontsize=17)
selected_fates = ["iAEC2", ["PNEC", "Gut", "NLE", "Endoderm"]]
cs.tl.fate_bias(
    adata, selected_fates, source="transition_map", map_backward=True, method="norm-sum"
)
cs.pl.fate_bias(
    adata,
    selected_fates=selected_fates,
    source="transition_map",
    plot_target_state=False,
    background=False,
    show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs['fate_bias_transition_map_iAEC2*PNEC_Gut_NLE_Endoderm']
_images/20210121_lung_data_v2_23_1.png
DGE analysis
[16]:
selected_fates = ["iAEC2", ["PNEC", "Gut", "NLE", "Endoderm"]]
cs.tl.progenitor(
    adata,
    selected_fates,
    source="transition_map",
    sum_fate_prob_thresh=0,
    bias_threshold_A=0.6,
    bias_threshold_B=0.3,
)

cs.pl.progenitor(
    adata, selected_fates, source="transition_map", selected_times=["pos_17"]
)
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs['fate_bias_transition_map_iAEC2*PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs[f'progenitor_transition_map_iAEC2'] and adata.obs[f'diff_trajectory_transition_map_iAEC2']
Results saved at adata.obs[f'progenitor_transition_map_PNEC_Gut_NLE_Endoderm'] and adata.obs[f'diff_trajectory_transition_map_PNEC_Gut_NLE_Endoderm']
_images/20210121_lung_data_v2_25_1.png
_images/20210121_lung_data_v2_25_2.png
[17]:
import numpy as np

cell_group_A = np.array(adata.obs["progenitor_transition_map_iAEC2"])
cell_group_B = np.array(adata.obs["progenitor_transition_map_PNEC_Gut_NLE_Endoderm"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
    adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[18]:
selected_genes = dge_gene_A["gene"][:2]
cs.pl.gene_expression_on_manifold(
    adata, selected_genes=selected_genes, color_bar=True, savefig=False
)
_images/20210121_lung_data_v2_27_0.png
_images/20210121_lung_data_v2_27_1.png

Update cluster annotation

[19]:
state_info = np.array(adata.obs["state_info"]).astype(">U15")

state_info[cell_group_A > 0] = "iAEC2 prog."
state_info[cell_group_B > 0] = "Non-iAEC2 prog."
adata.obs["state_info"] = state_info
[20]:
cs.settings.set_figure_params(fontsize=10)
state_info = np.array(adata.obs["state_info"])

gene_list = [
    "KLF9",
    "CEBPD",
    "YBX3",
    "ATF4",
    "NKX2-1",
    "IRX3",
    "SLC7A11",
    "LIFR",  #'HSPA5',
    "SFRP1",
    "CD55",
    "KIT",
    "CPM",  #'LDLR', 'GFRA1',
    "PITX1",
    "MSX2",
    "MYCN",
    "SOX9",
    "SOX11",
    "ID2",
    "PBX1",
    "FOXA1",
    "SP5",
    "TMPO",
    "DEK",
    "HMGB1",
    "PARP1",
]  # , 'TFDP1', 'HNRNPD', 'MYBL2']

selected_fates = ["iAEC2 prog.", "Non-iAEC2 prog."]
renames = ["iAEC2 prog.", "Non-iAEC2 prog."]

gene_expression_matrix = cs.pl.gene_expression_heatmap(
    adata,
    selected_genes=gene_list,
    selected_fates=selected_fates,
    rename_fates=renames,
    horizontal=True,
    fig_width=6.5,
    fig_height=2,
)
_images/20210121_lung_data_v2_30_0.png

Synthetic bifurcation

We simulated a differentiation process over a bifurcation fork. In this simulation, cells are barcoded in the beginning, and the barcodes remain un-changed. In the simulation we resample clones over time. The first sample is obtained after 5 cell cycles post labeling. The dataset has two time points. See Wang et al. (2021) for more details.

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.set_figure_params(
    format="png", dpi=75, fontsize=14
)  # use png to reduce file size.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:07.
[3]:
cs.settings.data_path = (
    "test"  # A relative path to save data. If not existed before, create a new one.
)
cs.settings.figure_path = (
    "test"  # A relative path to save figures. If not existed before, create a new one.
)

Loading data

[4]:
adata_orig = cs.datasets.synthetic_bifurcation()
[5]:
adata_orig.obsm["X_clone"]
[5]:
<2474x52 sparse matrix of type '<class 'numpy.float64'>'
        with 2474 stored elements in Compressed Sparse Row format>
[6]:
cs.pl.embedding(adata_orig, color="state_info")
_images/20210120_bifurcation_model_static_barcoding_8_0.png
[7]:
cs.pl.embedding(adata_orig, color="time_info")
_images/20210120_bifurcation_model_static_barcoding_9_0.png

Basic clonal analysis

[8]:
cs.pl.clones_on_manifold(adata_orig, selected_clone_list=[1])
_images/20210120_bifurcation_model_static_barcoding_11_0.png
[9]:
cs.tl.fate_coupling(adata_orig, source="X_clone")
cs.pl.fate_coupling(adata_orig, source="X_clone")
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>
_images/20210120_bifurcation_model_static_barcoding_12_2.png
[10]:
cs.pl.barcode_heatmap(adata_orig, selected_times="2", color_bar=True, binarize=True)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>
_images/20210120_bifurcation_model_static_barcoding_13_2.png
[11]:
cs.tl.clonal_fate_bias(adata_orig, selected_fate="Fate_A", alternative="two-sided")
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 52/52 [00:00<00:00, 254.70it/s]
Data saved at adata.uns['clonal_fate_bias']
_images/20210120_bifurcation_model_static_barcoding_14_2.png
_images/20210120_bifurcation_model_static_barcoding_14_3.png

Transition map inference

Transition map from multiple clonal time points.
[12]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["1", "2"],
    smooth_array=[10, 10, 10],
    CoSpar_KNN=20,
    sparsity_threshold=0.2,
)
------Compute the full Similarity matrix if necessary------
----Infer transition map between neighboring time points-----
Step 1: Select time points
Number of multi-time clones post selection: 52
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=10
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.96
-----------Total used time: 1.4379048347473145 s ------------
[13]:
cs.tl.fate_bias(adata, selected_fates=["Fate_A", "Fate_B"], source="transition_map")
cs.pl.fate_bias(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_XXX']
Results saved at adata.obs['fate_bias_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias
_images/20210120_bifurcation_model_static_barcoding_18_1.png
Transition map from a single clonal time point
[14]:
adata = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["1"],
    later_time_point="2",
    initialize_method="OT",
    smooth_array=[10, 10, 10],
    sparsity_threshold=0.2,
    compute_new=False,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 1--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 3.3392720222473145
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=10
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.98
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.094
Finishing Joint Optimization, used time 1.3652610778808594
-----------Total used time: 4.814281940460205 s ------------
[15]:
cs.tl.fate_bias(adata, selected_fates=["Fate_A", "Fate_B"], source="transition_map")
cs.pl.fate_bias(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_XXX']
Results saved at adata.obs['fate_bias_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias
_images/20210120_bifurcation_model_static_barcoding_21_1.png
Transition amp from only the clonal information
[16]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata_orig)

cs.tl.fate_bias(
    adata, selected_fates=["Fate_A", "Fate_B"], source="clonal_transition_map"
)
cs.pl.fate_bias(adata, source="clonal_transition_map")
Infer transition map between neighboring time points.
Number of multi-time clones post selection: 52
Use all clones (naive method)
Results saved at adata.obs['fate_map_clonal_transition_map_XXX']
Results saved at adata.obs['fate_bias_clonal_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias
_images/20210120_bifurcation_model_static_barcoding_23_1.png
[ ]:

Simulate differentiation process

[1]:
import cospar as cs

cs.settings.verbosity = 1
cs.settings.set_figure_params(
    format="pdf", figsize=[4, 3.5], dpi=100, fontsize=14, pointsize=10
)
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

The bifurcation model with clonal dispersion

[2]:
L = 10
adata = cs.simulate.bifurcation_model(t1=4, M=50, L=L)
Load existing data
[3]:
cs.pl.clones_on_manifold(adata, selected_clone_list=[1, 10, 20])
_images/20220402_simulate_differentiation_4_0.png
_images/20220402_simulate_differentiation_4_1.png
_images/20220402_simulate_differentiation_4_2.png
Using the observed clonal data, no fancy analysis
[4]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata, method="naive")
Tmap = adata.uns["clonal_transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
correlation_naive = (
    cs.simulate.quantify_correlation_with_ground_truth_fate_bias_BifurcationModel(
        Tmap, state_info, cell_id_t1, cell_id_t2
    )
)
print(
    f"Fate bias correlation from the observed transition map: {correlation_naive:.3f}"
)
Fate bias correlation from the observed transition map: 0.588
[5]:
cs.tl.fate_bias(adata, source="clonal_transition_map", selected_fates=["1", "0"])
cs.pl.fate_bias(adata, source="clonal_transition_map", selected_fates=["1", "0"])
_images/20220402_simulate_differentiation_7_0.png
Apply the CoSpar method
[6]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata, smooth_array=[10, 10, 10], compute_new=True
)
Tmap = adata.uns["transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
correlation_cospar = (
    cs.simulate.quantify_correlation_with_ground_truth_fate_bias_BifurcationModel(
        Tmap, state_info, cell_id_t1, cell_id_t2
    )
)
print(
    f"Fate bias correlation from the predicted transition map: {correlation_cospar:.3f}"
)
Fate bias correlation from the predicted transition map: 0.945
[7]:
cs.tl.fate_bias(adata, source="transition_map", selected_fates=["1", "0"])
cs.pl.fate_bias(adata, source="transition_map", selected_fates=["1", "0"])
_images/20220402_simulate_differentiation_10_0.png

Linear differentiation under barcode homoplasy

[8]:
adata = cs.simulate.linear_differentiation_model(
    Nt1=400, progeny_N=1, used_clone_N=100, always_simulate_data=True
)
Generate new data
Time elapsed for generating clonal data:  0.2516930103302002
Generate mixing matrix
[9]:
cs.pl.clones_on_manifold(adata, selected_clone_list=[1, 10, 20])
_images/20220402_simulate_differentiation_13_0.png
_images/20220402_simulate_differentiation_13_1.png
_images/20220402_simulate_differentiation_13_2.png
Using the observed clonal data, no fancy method
[10]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata, method="naive")
Tmap = adata.uns["clonal_transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]

X_t1 = adata.obsm["X_orig"][cell_id_t1]
X_t2 = adata.obsm["X_orig"][cell_id_t2]
TPR_naive = cs.simulate.quantify_transition_peak_TPR_LinearDifferentiation(
    Tmap, X_t1, X_t2
)
print(f"True positive rate for the observed transition map: {TPR_naive:.3f}")
True positive rate for the observed transition map: 0.300
_images/20220402_simulate_differentiation_15_1.png
Apply CoSpar
[11]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata, smooth_array=[10, 10, 10], compute_new=True
)
Tmap = adata.uns["transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]

X_t1 = adata.obsm["X_orig"][cell_id_t1]
X_t2 = adata.obsm["X_orig"][cell_id_t2]
TPR_cospar = cs.simulate.quantify_transition_peak_TPR_LinearDifferentiation(
    Tmap, X_t1, X_t2
)
print(f"True positive rate for the predicted transition map: {TPR_cospar:.3f}")
True positive rate for the predicted transition map: 0.833
_images/20220402_simulate_differentiation_17_1.png