CoSpar - dynamic inference by integrating state and lineage information¶

CoSpar is a toolkit for dynamic inference from lineage-traced single cells.
The methods are based on
Wang et al. Nat. Biotech. (2022).
Dynamic inference based on single-cell state measurement alone requires serious simplifications. On the other hand, direct dynamic measurement via lineage tracing only captures partial information and its interpretation is challenging. CoSpar integrates both state and lineage information to infer a finite-time transition map of a development/differentiation system. It gains superior robustness and accuracy by exploiting both the local coherence and sparsity of differentiation transitions, i.e., neighboring initial states share similar yet sparse fate outcomes. Building around the AnnData
object, CoSpar provides an integrated analysis framework for datasets with both state and lineage information. When only state information is available, CoSpar also improves upon existing dynamic inference methods by imposing sparsity and coherence. It offers essential toolkits for analyzing lineage data, state information, or their integration.
CoSpar’s key applications¶
infer transition maps from lineage data, state measurements, or their integration.
predict the fate bias of progenitor cells.
order cells along a differentiation trajectory leading to a given cell fate.
predict gene expression dynamics along a trajectory.
predict genes whose expression correlates with future fate outcome.
generate a putative fate hierarchy, ordering fates by their lineage distances.
Recorded Talks¶
Jun 1: Single-Cell Data Science 2022. This is a 20-min short talk focusing more on the utility of CoSpar: talk video
Oct 19, 2022: Invited MIA talk at Broad Institute. This is an one-hour talk focusing on the Machine Learning part of CoSpar: talk video. The talk slides can be found here.
Reference¶
S.-W. Wang*, M. Herriges, K. Hurley, D. Kotton, A. M. Klein*, CoSpar identifies early cell fate biases from single cell transcriptomic and lineage information, Nat. Biotech. (2022). [* corresponding authors]
Support¶
Feel free to submit an issue or send us an email. Your help to improve CoSpar is highly appreciated.
Acknowledgment¶
Shou-Wen Wang wants to acknowledge Xiaojie Qiu for inspiring him to make this website. He also wants to acknowledge the community that maintains scanpy and scvelo, where he learned about proper code documentation. He thanks Tal Debrobrah Scully, Qiu Wu and other lab members for testing the package. Shou-Wen wants to thank especially Allon Klein for his mentorship. Finally, he wants to acknowledge the generous support of the Damon Runyon Foundation through the Quantitative Biology Fellowship.
About CoSpar¶
The following information is adapted from Wang et al. Nat. Biotech. (2022). High-throughput single-cell measurements have enabled unbiased studies of development and differentiation, leading to numerous methods for dynamic inference. However, single-cell RNA sequencing (scRNA-seq) data alone does not fully constrain the differentiation dynamics, and existing methods inevitably operate under simplified assumptions. In parallel, the lineage information of individual cells can be profiled simultaneously along with their transcriptome by using a heritable and expressible DNA barcode as a lineage tracer (we call lineage-tracing scRNAseq, or LT-scSeq). The barcode may remain static or evolve over time.
However, the lineage data could be challenging to analyze. These challenges include stochastic differentiation and variable expansion of clones; cells loss during analysis; barcode homoplasy wherein cells acquire the same barcode despite not having a lineage relationship; access to clones only at a single time point; and clonal dispersion due to a lag time between labeling cells and the first sampling (the lag time is necessary to allow the clone to grow large for resampling).
CoSpar, developed by Wang et al. Nat. Biotech. (2022), is among the first tools to perform dynamic inference by integrating state and lineage information. It solves for the transition probability map from cell states at an earlier time point to states at a later time point. It achieves accuracy and robustness by learning a sparse and coherent transition map, where neighboring initial states share similar yet sparse fate outcomes. Built upon the finite-time transition map, CoSpar can 1) infer fate potential of early states; 2) detect early fate bias (thus, fate boundary) among a heterogeneous progenitor population; 3) identify putative driver genes for fate bifurcation; 4) infer fate coupling or hierarchy; 5) visualize gene expression dynamics along an inferred differential trajectory. CoSpar also provides several methods to analyze clonal data by itself, including the clonal coupling between fate clusters and the bias of a clone towards a given fate, etc. We envision CoSpar to be a platform to integrate key methods needed to analyze data with both state and lineage information.

(copy right: Nature Biotechnology)
Coherent sparse optimization¶
One formalization of dynamic inference is to identify a transition map, a matrix \(T_{ij} (t_1,t_2)\), which describes the probability of a cell, initially in some state \(i\) at time \(t_1\), giving rise to progeny in a state \(j\) at time \(t_2\). We define \(T_{ij} (t_1,t_2)\) specifically as the fraction of progeny from state \(i\) that occupy state \(j\). This transition matrix averages the effects of cell division, loss, and differentiation, but it nonetheless proves useful for several applications.
We make two reasonable assumptions about the nature of biological dynamics to constrain inference of the transition map. We assume the map to be a sparse matrix, since most cells can access just a few states during an experiment. And we assume the map to be locally coherent, meaning that cells in similar states should share similar fate outcomes. These constraints together force transition maps to be parsimonious and smooth, which we reasoned would make them robust to practical sources of noise in LT-scSeq experiments. As inputs, CoSpar requires a barcode-by-cell matrix \(I(t)\) that encodes the clonal information at time \(t\), and a data matrix for observed cell states (e.g. from scRNA-seq). Clonal data may have nested structure reflecting subclonal labeling.
CoSpar is formulated assuming that we have information on the same clones at more than one time point. More often, one might observe clones at only a later time point \(t_2\). For these cases inference is not fully constrained, one must learn both the transition map T and the initial clonal data \(I(t_1)\). We approximate a solution additionally constrained by a minimum global transport cost. We show that this approach is robust to initialization in tested datasets. Finally, coherence and sparsity provide reasonable constraints to the simpler problem of predicting dynamics from state heterogeneity alone without lineage data. We extended CoSpar to this case. Thus, CoSpar is flexible to different experimental designs, as summarized by the above figure. Our core algorithms are illustrated below.

(copy right: Nature Biotechnology)
Below, we formalize the coherent sparse optimization by which CoSpar infers the transition map.
In a model of stochastic differentiation, cells in a clone are distributed across states with a time-dependent density vector \(\vec{P}(t)\). A transition map \(T\) directly links clonal density profiles \(\vec{P}(t_{1,2})\) between time points:
From multiple clonal observations, our goal is to learn \(T\). To do so, we consider each observed cell transcriptome as a distinct state (\(\vec{P}(t)\in R^{N_t}\)) for \(N_t\) cells profiled at time \(t\)), and introduce \(S(t)\in R^{N_t\times N_t}\) as a matrix of cell-cell similarity over all observed cell states, including those lacking clonal information. Denoting \(I(t)\in \{0,1\}^{M\times N_t}\) as a clone-by-cell matrix of \(M\) clonal barcodes, the density profiles of observed clones \(P(t)\in R^{M\times N_t}\) are estimated as \(P(t)\approx I(t)S(t)\). In matrix form, the constraint in Eq. (1) from all observed clones then becomes \(P(t_2)\approx P(t_1)T(t_1,t_2)\).
Since the matrices \(P(t_{1,2})\) are determined directly from data, with enough information \(T(t_1,t_2)\) could be learnt by matrix inversion. However, in most cases, the number of clones is far less than the number of states. To constrain the map, we require that: 1) \(T\) is a sparse matrix; 2) \(T\) is locally coherent; and 3) \(T\) is a non-negative matrix. With these requirements, the inference becomes an optimization problem:
Here, \(‖T‖_1\) quantifies the sparsity of the matrix T through its l-1 norm, while \(‖LT‖_2\) quantifies the local coherence of \(T\) (\(L\) is the Laplacian of the cell state similarity graph, and \(LT\) is the local divergence). The remaining constraints enforce the observed clonal dynamics, non-negativity of \(T\), and map normalization, respectively. At \(\alpha=0\), the minimization takes the form of Lasso, an algorithm for compressed sensing. Our formulation extends compressed sensing from vectors to matrices, and to enforce local coherence. The local coherence extension is reminiscent of the fused Lasso problem. An iterative, heuristic approach solves the CoSpar optimization efficiently, replacing \((\alpha,\epsilon)\) with parameters that explicitly control coherence and sparsity. See Wang et al. Nat. Biotech. (2022) for a detailed exposition of the method and its implementation.
CoSpar - dynamic inference by integrating transcriptome and lineage information
API¶
Import CoSpar as:
import cospar as cs
CoSpar is built around the AnnData
object (usually called adata). For each cell, we store its RNA count matrix at adata.X
, the gene names at adata.var_names
,time information at adata.obs['time_info']
, state annotation at adata.obs['state_info']
, clonal information at adata.obsm['X_clone']
, and 2-d embedding at adata.obsm['X_emb']
.
Once the AnnData
object is initialized via cs.pp.initialize_adata_object()
, the typical flow of analysis is to 1) perform preprocessing and dimension reduction (cs.pp.*
); 2) visualize and analyze clonal data alone (cs.pl.*
); 3) infer transition map (cs.tmap.*
); and 4) analyze inferred map (cs.tl.*
) and then visualize the results with the plotting functions (cs.pl.*
). Typically, each cs.tl.*
function has a corresponding cs.pl.*
function. We also provide several built-in datasets (cs.datasets.*
) and miscellaneous functions to assist with the analysis (cs.hf.*
). See tutorial for details.
Preprocessing¶
|
Initialized the |
|
Get highly variable genes. |
Remove cell-cycle correlated genes. |
|
|
Get X_pca. |
|
Get X_emb using |
|
Build the X_clone matrix from data. |
|
Update state_info using |
|
Refine state info according to marker gene expression. |
Refine state info by clustering states at given time points. |
Transition map inference¶
|
Infer transition map for clonal data with multiple time points. |
|
Infer transition map from clones with a single time point |
|
Infer transition map from state information alone. |
Compute transition map using only the lineage information. |
Analysis¶
|
Compute clonal fate bias towards a cluster. |
|
Find clones that significantly biased towards a given fate |
|
Compute fate coupling determined by the transition map. |
|
Build fate hierarchy or lineage trees |
|
Compute transition probability to given fate/ancestor clusters. |
|
Quantify how multi-potent a cell state is. |
|
Compute fate bias to given two fate clusters (A, B). |
|
Identify trajectories towards/from two given clusters. |
|
Infer trajectory towards/from a cluster |
|
Perform differential gene expression analysis and plot top DGE genes. |
Plotting¶
Clone analysis (clone visualization, clustering etc.)
|
Plot clones on top of state embedding. |
|
Plot barcode heatmap among different fate clusters. |
|
Plot clonal fate bias towards a cluster. |
|
Plot fate coupling determined by the transition map. |
|
Plot fate coupling determined by the transition map. |
|
Scatter plot for clonal fate number across time point |
|
Report the statistics of the clonal data. |
|
Visualize a tree structured in ete3 style. |
Transition map analysis (fate bias etc.)
|
Plot transition probability from given initial cell states. |
|
Plot transition probability to given fate/ancestor clusters. |
|
Plot fate potency. |
|
Plot fate bias. |
|
Plot the progenitors of given fate clusters. |
|
Plot the progenitors of given fates across time points |
|
Plot gene trend along the inferred dynamic trajectory. |
|
Plot fate coupling determined by the transition map. |
|
Plot fate coupling determined by the transition map. |
General
|
Scatter plot for user-specified embedding basis. |
|
A test embedding method for plotting genes |
|
Plot gene expression on the state manifold. |
|
Plot heatmap of gene expression within given clusters. |
|
Set resolution/size, styling and format of figures. |
Datasets¶
|
The hematopoiesis data set from |
|
The hematopoiesis data set from |
|
Top 15% most heterogeneous clones of the hematopoiesis data set from |
|
All of the hematopoiesis data set from |
|
The direct lung differentiation dataset from |
|
The reprogramming dataset from |
|
The reprogramming dataset from |
|
Synthetic clonal dataset with static barcoding. |
Help functions¶
|
Read file and return |
|
Save the adata and print the filename prefix. |
|
Save preprocessed adata. |
|
Check whether the adata has the right structure. |
|
Check available parameter choices. |
|
Update the ordering of time points at adata.uns[‘time_ordering’] |
|
Update data_des, a string to distinguish different datasets |
|
Compute the normalized correlation of the data matrix. |
Build the X_clone matrix from data. |
Simulations¶
|
Simulate linear differentiation corrupted with barcode collision (See Fig. |
|
Simulate bifurcation corrupted with clonal dispersion (See Fig. |
|
Quantify the correlation of an inferred fate bias with the actual one, using a map from the Bifurcation Model. |
|
Quantify the True positive rate of a transition map for linear differentiation model. |
Release notes¶
v0.2.1¶
- Major changes from v0.1.8 to v0.2.1:
Split each plotting function into two parts: computing the results (stored at cospar.tl.**) and actually plotting the result (stored at cospar.pl.**).
Update the notebooks to accomodate these changes.
Update the datasets in the cloud to add more annotations.
Re-organize the content of the plot, tool, and tmap modules.
Fix stochasticity when running HighVar method to generate the initialized map.
Fix generating X_clone from the cell_id-by-barcode_id list.
Add a few more functions:
cospar.pl.clonal_fates_across_time()
,cospar.pl.clonal_reports()
,cospar.pl.embedding_genes()
,cospar.tl.fate_biased_clones()
Update
cospar.pl.barcode_heatmap()
to order clones in a better wayFix the docs.
Adopt “Raise ValueError” method for error handling.
Unify error checking at the beginning of several functions.
v0.1.8¶
This is used in running the notebooks that generate figures for the published paper. To run the original notebooks, you should switch to this version.
Installation¶
CoSpar requires Python 3.6 or later. We recommend using Miniconda for package management. For a computer that does not have a package management tool yet, please install Miniconda first, and activate it by running the following command in the terminal:
source ~/.bash_profile
PyPI¶
Install CoSpar from PyPI using:
pip install --upgrade cospar
If you get a Permission denied
error, use pip install --upgrade cospar --user
instead.
If you get errors related to ‘gcc’, try to specify the following gcc path for installation:
env CXX=/usr/local/Cellar/gcc/8.2.0/bin/g++-8 CC=/usr/local/Cellar/gcc/8.2.0/bin/gcc-8 pip install cospar
If you get errors for version conflicts with existing packages, try:
pip install --ignore-installed --upgrade cospar
Development Version¶
To work with the latest development version, install from GitHub using:
pip install git+https://github.com/AllonKleinLab/cospar
or:
git clone https://github.com/AllonKleinLab/cospar
pip install -e cospar
-e
is short for --editable
and links the package to the original cloned location such that pulled changes are also reflected in the environment.
Jupyter Notebook¶
To run the tutorials in a notebook locally, please install:
conda install notebook
and run jupyter notebook
in the terminal. If you get the error Not a directory: 'xdg-settings'
,
use jupyter notebook --no-browser
instead and open the url manually (or use this
bugfix).
If you run into issues, do not hesitate to approach us or raise a GitHub issue.
Testing CoSpar in a new environment¶
In case you want to test cospar without affecting existing python packages, you can create a new conda environment and install CoSpar there:
conda create -n test_cospar python=3.6
conda activate test_cospar
pip install cospar
Now, install jupyter notebook in this environment:
pip install --user ipykernel
If you encounter an error related to nbconvert
, run (this is optional):
pip3 install --upgrade --user nbconvert
Finally, install the jupyter notebook kernel related to this environment:
python -m ipykernel install --user --name=test_cospar
Now, you can open jupyter notebook by running jupyter notebook
in the terminal, and select the kernel test_cospar
to run CoSpar.
Getting Started¶
Here, we explain the basics of using CoSpar. CoSpar requires the count matrix not log-transformed
. This is specifically assumed in selecting highly variable genes, in computing PCA, and in the HighVar method for initializing the joint optimization using a single clonal time point. CoSpar also assumes that the dataset has more than one time point. However, if you have only a snapshot, you can still manually cluster the cells into more than one time point to use CoSpar.
First, import CoSpar with:
import cospar as cs
For better visualization you can change the matplotlib settings to our defaults with:
cs.settings.set_figure_params()
If you want to adjust parameters for a particular plot, just pass the parameters into this function.
The workflow of CoSpar is summarized by the following illustration:

Also, below is a summary of the main analyses after we infer the transition map, and its connection with the mathematical formulation in Wang et al. Nat. Biotech. (2022).

Initialization¶
Given the gene expression matrix, clonal matrix, and other information, initialize the anndata object using:
adata_orig = cs.pp.initialize_adata_object(adata=None,**params)
The AnnData
object adata_orig
stores the count matrix (adata_orig.X
), gene names (adata_orig.var_names
), and temporal annotation of cells (adata_orig.obs['time_info']
). Optionally, you can also provide the clonal matrix X_clone
, selected PCA matrix X_pca
, the embedding matrix X_emb
, and the state annotation state_info
, which will be stored at adata_orig.obsm['X_clone']
, adata_orig.obsm['X_pca']
, adata_orig.obsm['X_emb']
, and adata_orig.obs['state_info']
, respectively.
If an adata object is provided as an input, the initialization function will try to automatically generate the correct data structure, and all annotations associated with the provided adata will remain intact. You can add new annotations to supplement or override existing annotations in the adata object.
If you do not have a dataset yet, you can still play around using one of the built-in datasets, e.g.:
adata_orig = cs.datasets.hematopoiesis_subsampled()
Preprocessing & dimension reduction¶
Assuming basic quality control (excluding cells with low read count etc.) have been done, we provide basic preprocessing (gene selection and normalization) and dimension reduction related analysis (PCA, UMAP embedding etc.) at cs.pp.*
:
cs.pp.get_highly_variable_genes(adata_orig,**params)
cs.pp.remove_cell_cycle_correlated_genes(adata_orig,**params)
cs.pp.get_X_pca(adata_orig,**params)
cs.pp.get_X_emb(adata_orig,**params)
cs.pp.get_state_info(adata_orig,**params)
cs.pp.get_X_clone(adata_orig,**params)
The first step get_highly_variable_genes
also includes count matrix normalization. The second step, which is optional but recommended, removes cell cycle correlated genes among the selected highly variable genes. In get_X_pca
, we apply z-score transformation for each gene expression before computing the PCA. In get_X_emb
, we simply use the umap function from scanpy
. With get_state_info
, we extract state information using leiden clustering implemented in scanpy
.
In get_X_clone
, we faciliate the conversion of the raw clonal data into a cell-by-clone matrix. As mentioned before, this preprocessing assumes that the count matrix is not log-transformed.
Basic clonal analysis¶
We provide a few plotting functions to help visually exploring the clonal data before any downstream analysis. You can visualize clones on state manifold directly:
cs.pl.clones_on_manifold(adata_orig,**params)
You can generate the barcode heatmap across given clusters to inspect clonal behavior:
cs.pl.barcode_heatmap(adata_orig,**params)
You can quantify the clonal coupling across different fate clusters:
cs.tl.fate_coupling(adata_orig,source='X_clone',**params)
cs.pl.fate_coupling(adata_orig,source='X_clone',**params)
Strong coupling implies the existence of bi-potent or multi-potent cell states at the time of barcoding. You can visualize the fate hierarchy by a simple neighbor-joining method:
cs.tl.fate_hierarchy(adata_orig,source='X_clone',**params)
cs.pl.fate_hierarchy(adata_orig,source='X_clone',**params)
Finally, you can infer the fate bias \(-log_{10}(P_{value})\) of each clone towards a designated fate cluster:
cs.pl.clonal_fate_bias(adata_orig,**params)
A biased clone towards this cluster has a statistically significant cell fraction within or outside this cluster.
Transition map inference¶
The core of the software is efficient and robust inference of a transition map by integrating state and clonal information. If the dataset has multiple clonal time points, you can run:
adata=cs.tmap.infer_Tmap_from_multitime_clones(adata_orig,clonal_time_points=None,later_time_point=None,**params)
It subsamples the input data at selected time points and computes the transition map, stored at adata.uns['transition_map']
and adata.uns['intraclone_transition_map']
, with the latter restricted to intra-clone transitions. Depending on later_time_point
, it has two modes of inference:
When
later_time_point=None
, it infers a transition map between neighboring time points. For example, for clonal_time_points=[‘day1’, ‘day2’, ‘day3’], it computes transitions for pairs (‘day1’, ‘day2’) and (‘day2’, ‘day3’), but not for (‘day1’, ‘day3’).If
later_time_point
is specified, it generates a transition map between this time point and each of the earlier time points. In the previous example, iflater_time_point=='day3'
, we infer transitions for pairs (‘day1’, ‘day3’) and (‘day2’, ‘day3’). This applies to the following map inference functions.
If the dataset has only one clonal time point, you can run:
adata=cs.tmap.infer_Tmap_from_one_time_clones(adata_orig,initial_time_points=None, later_time_point=None,initialize_method='OT',**params)
which jointly optimizes the transition map and the initial clonal structure. It requires initializing the transition map using state information alone. We provide two methods for such initialization: 1) OT
for using the standard optimal transport approach; 2) HighVar
for a customized approach, assuming that cells similar in gene expression across time points share clonal origin. For the OT
method, if you wish to utilize the growth rate information as Waddington-OT, you can directly pass the growth rate estimate for each cell to the input AnnaData object at adata_orig.obs["cell_growth_rate"]
. Depending on the choice, the initialized map is stored at adata.uns['OT_transition_map']
or adata.uns['HighVar_transition_map']
. The final product is stored at adata.uns['transition_map']
.
HighVar
converts highly variable genes into pseudo multi-time clones and infers a putative map with coherent sparse optimization. We find the HighVar
method performs better than the OT method, especially when there are large differentiation effects over the observed time window, or batch effects.
If initial_time_points
and later_time_point
are not specified, a map with transitions from all time points to the last time point is generated.
If you do not have any clonal information, you can still run:
adata=cs.tmap.infer_Tmap_from_state_info_alone(adata_orig,initial_time_points=None,later_time_point=None,initialize_method='OT',**params)
It is the same as cs.tmap.infer_Tmap_from_one_time_clones
except that we assume a pseudo clonal data where each cell at the later time point occupies a unique clone.
We also provide simple methods that infer transition map from clonal information alone:
adata=cs.tmap.infer_Tmap_from_clonal_info_alone(adata_orig,clonal_time_points=None,later_time_point=None,**params)
The result is stored at adata.uns['clonal_transition_map']
.
Analysis and visualization¶
Finally, each of the computed transition maps can be explored on state embedding at the single-cell level using a variety of analysis and plotting functions. There are some common parameters: 1) source
, for choosing one of the pre-computed transition maps (or the raw clonal data) for analysis; 2) selected_fates
, for visualizing the fate bias towards/against given fate clusters; 3) map_backward
, for analyzing forward or backward transitions; 4) method
, for different methods in fate probability analysis. See CoSpar basics for more details.
Below, we frame the task in the language of analyzing backward transitions for convenience. To see where a cell came from, run:
cs.pl.single_cell_transition(adata,**params)
To visualize the fate probability of initial cell states, run:
cs.tl.fate_map(adata,**params)
cs.pl.fate_map(adata,**params)
To infer the fate bias of initial cell states between two fate clusters, run:
cs.tl.fate_bias(adata,**params)
cs.pl.fate_bias(adata,**params)
To infer the dynamic trajectory towards given fate clusters, run:
cs.tl.progenitor(adata,**params)
cs.pl.progenitor(adata,**params)
or, alternatively if you have data with multiple clonal time points, run:
cs.tl.iterative_differentiation(adata,**params)
cs.pl.iterative_differentiation(adata,**params)
The first method (cs.tl.progenitor
) assumes two input fate clusters and infers each trajectory by thresholding the corresponding fate bias. The second method (cs.tl.iterative_differentiation
) infers the trajectory by iteratively tracing a selected fate cluster all the way back to its putative origin at the initial time point. For both methods, the inferred trajectory for each fate will be saved at adata.obs[f'diff_trajectory_{source}_{fate_name}']
, and we can explore the gene expression dynamics along this trajectory using:
cs.pl.gene_expression_dynamics(adata,**params)
Additionally, the first method (cs.pl.progenitor
) exports the selected ancestor states selected fate clusters at adata.obs[f'progenitor_{source}_{fate_name}']
, which can be used to infer the driver genes for fate bifurcation by running:
cs.pl.differential_genes(adata,**params)
If there are multiple mature fate clusters, you can infer their differentiation coupling from the fate probabilities of initial cells or the raw clonal matrix by:
cs.tl.fate_coupling(adata,source='transition_map',**params)
cs.pl.fate_coupling(adata,source='transition_map',**params)
You can also infer the fate hierarchy from:
cs.tl.fate_hierarchy(adata,source='transition_map',**params)
cs.pl.fate_hierarchy(adata,source='transition_map',**params)
Loading data¶
We provide a simple guide in how to load your own data and initialize the adata object for downstream analysis. This tutorial assumes that the files are either in txt or mtx format. After running this pipeline, please see the folder test_data for how exactly each data is stored.
The first step to use CoSpar is to construct an anndata
object that stores all relevant data. The key annotations after initialization are:
adata.X: state count matrix, shape (n_cell, n_gene). This should not be log-normalized.
adata.var_names: list of gene names, shape (n_genes,).
adata.obs[‘time_info’]: time annotation (type:
str
) for each cell, shape (n_cell,).adata.obs[‘state_info’]: state annotation for each cell, shape (n_cell, 1). [Optional. Can be generated in preprocessing.
adata.obsm[‘X_clone’]: clonal labels for each cell in the form of np.array or sparse matrix, shape (n_cell, n_clone).
adata.obsm[‘X_pca’]: PCA matrix, shape (n_cell, n_pcs). [Optional. Can be generated in preprocessing.
adata.obsm[‘X_emb’]: two-dimensional embedding, shape (n_cell, 2). [Optional. Can be generated in preprocessing.
cs.pp.initialize_adata_object
assists this initialization.
[1]:
import cospar as cs
import pandas as pd
import scipy.io as sio
import numpy as np
import os
[2]:
cs.logging.print_version()
# Set the messaging level. At a given value, a running function will
# print information at or below its level.
cs.settings.verbosity = 2 # range: 0 (error),1 (warning),2 (info),3 (hint).
# Plot setting. If you want to control a particular plot,
# just change the setting here, and run that plotting function.
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:25.
Each dataset should have its folder to avoid conflicts.
[3]:
# set the directory for figures and data. If not existed yet, they will be created automtaically.
cs.settings.data_path = "test_data" # This name should be exactly 'text_data'
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
cs.datasets.raw_data_for_import_exercise() # download the data
Import method A¶
Assuming that the X_clone data is stored in mtx format
[4]:
df_cell_id = pd.read_csv(os.path.join(cs.settings.data_path, "cell_id.txt"))
df_cell_id.head()
[4]:
Cell_ID | |
---|---|
0 | cell_10 |
1 | cell_13 |
2 | cell_18 |
3 | cell_32 |
4 | cell_70 |
[5]:
df_gene = pd.read_csv(os.path.join(cs.settings.data_path, "gene_names.txt"))
df_gene.head()
[5]:
gene_names | |
---|---|
0 | 0610009L18Rik |
1 | 0610037L13Rik |
2 | 1110012L19Rik |
3 | 1110020A21Rik |
4 | 1110028F11Rik |
[6]:
df_time = pd.read_csv(os.path.join(cs.settings.data_path, "time_info.txt"))
df_time.head()
[6]:
time_info | |
---|---|
0 | 6 |
1 | 6 |
2 | 6 |
3 | 6 |
4 | 6 |
[7]:
RNA_count_matrix = sio.mmread(
os.path.join(cs.settings.data_path, "gene_expression_count_matrx.mtx")
)
RNA_count_matrix
[7]:
<781x2481 sparse matrix of type '<class 'numpy.float64'>'
with 225001 stored elements in COOrdinate format>
[8]:
X_clone = sio.mmread(os.path.join(cs.settings.data_path, "cell_by_clone_matrx.mtx"))
X_clone
[8]:
<781x339 sparse matrix of type '<class 'numpy.int64'>'
with 781 stored elements in COOrdinate format>
[9]:
df_state = pd.read_csv(os.path.join(cs.settings.data_path, "state_info.txt"))
df_state.head()
[9]:
state_info | |
---|---|
0 | Neutrophil |
1 | Baso |
2 | Monocyte |
3 | Monocyte |
4 | Baso |
[10]:
df_X_emb = pd.read_csv(os.path.join(cs.settings.data_path, "embedding.txt"))
df_X_emb.head()
[10]:
x | y | |
---|---|---|
0 | 1165.708 | -2077.047 |
1 | -980.871 | -2055.629 |
2 | 2952.953 | 281.797 |
3 | 1314.821 | -560.760 |
4 | -1067.502 | -1275.488 |
[11]:
X_emb = np.array([df_X_emb["x"], df_X_emb["y"]]).T
Now, initialize the adata object
[12]:
adata_orig = cs.pp.initialize_adata_object(
X_state=RNA_count_matrix,
gene_names=df_gene["gene_names"],
cell_names=df_cell_id["Cell_ID"],
time_info=df_time["time_info"],
X_clone=X_clone,
state_info=df_state["state_info"],
X_emb=X_emb,
data_des="my_data",
)
Create new anndata object
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
Update the time ordering. A correct time ordering is assumed later.
[13]:
cs.hf.update_time_ordering(adata_orig, updated_ordering=["2", "4", "6"])
[14]:
adata_orig
[14]:
AnnData object with n_obs × n_vars = 781 × 2481
obs: 'time_info', 'state_info'
uns: 'data_des', 'time_ordering', 'clonal_time_points'
obsm: 'X_clone', 'X_emb'
Alternatively, you can initialize the object by building on an existing adata object. This will keep existing annotations from the old adata. You can make modifications by specifying additional entries.
[15]:
adata_orig_new = cs.pp.initialize_adata_object(adata=adata_orig, X_clone=X_clone)
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[16]:
cs.pl.embedding(adata_orig, color="state_info")

Import method B¶
Assuming that the X_clone data is stored in a (cell_id, barcode_id) format
First, initialize the adata without concerning the clonal data
[17]:
df_cell_id = pd.read_csv(os.path.join(cs.settings.data_path, "cell_id.txt"))
df_gene = pd.read_csv(os.path.join(cs.settings.data_path, "gene_names.txt"))
df_time = pd.read_csv(os.path.join(cs.settings.data_path, "time_info.txt"))
RNA_count_matrix = sio.mmread(
os.path.join(cs.settings.data_path, "gene_expression_count_matrx.mtx")
)
df_state = pd.read_csv(os.path.join(cs.settings.data_path, "state_info.txt"))
df_X_emb = pd.read_csv(os.path.join(cs.settings.data_path, "embedding.txt"))
X_emb = np.array([df_X_emb["x"], df_X_emb["y"]]).T
[18]:
adata_orig = cs.pp.initialize_adata_object(
X_state=RNA_count_matrix,
gene_names=df_gene["gene_names"],
cell_names=df_cell_id["Cell_ID"],
time_info=df_time["time_info"],
state_info=df_state["state_info"],
X_emb=X_emb,
data_des="my_data",
)
Create new anndata object
WARNING: X_pca not provided. Downstream processing is needed to generate X_pca before computing the transition map.
Time points with clonal info: []
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
Now, load the clonal data. Here, we start with a table of (cell_id, clone_id) pair. The cell_id does not need to be ranked. It should match the cell_id in the adata_orig.obs_names
[19]:
df_X_clone = pd.read_csv(
os.path.join(cs.settings.data_path, "clonal_data_in_table_format.txt")
)
df_X_clone
[19]:
Cell_ID | Clone_ID | |
---|---|---|
0 | cell_0 | clone_275 |
1 | cell_1 | clone_329 |
2 | cell_2 | clone_56 |
3 | cell_3 | clone_236 |
4 | cell_4 | clone_213 |
... | ... | ... |
776 | cell_776 | clone_196 |
777 | cell_777 | clone_239 |
778 | cell_778 | clone_259 |
779 | cell_779 | clone_217 |
780 | cell_780 | clone_259 |
781 rows × 2 columns
[20]:
adata_orig.obs_names
[20]:
Index(['cell_10', 'cell_13', 'cell_18', 'cell_32', 'cell_70', 'cell_80',
'cell_90', 'cell_97', 'cell_108', 'cell_117',
...
'cell_7389', 'cell_7392', 'cell_7402', 'cell_7407', 'cell_7409',
'cell_7417', 'cell_7423', 'cell_7435', 'cell_7436', 'cell_7437'],
dtype='object', length=781)
[21]:
cs.pp.get_X_clone(adata_orig, df_X_clone["Cell_ID"], df_X_clone["Clone_ID"])
100%|██████████| 781/781 [00:00<00:00, 27474.68it/s]
[22]:
adata_orig
[22]:
AnnData object with n_obs × n_vars = 781 × 2481
obs: 'time_info', 'state_info'
uns: 'data_des', 'time_ordering', 'clonal_time_points', 'clone_id'
obsm: 'X_clone', 'X_emb'
[23]:
cs.pl.embedding(adata_orig, color="state_info")

Preprocessing¶
Note that this step is optional. If the data do not yet have one of X_pca
, X_emb
, or state_info
, you will need to run the preprocessing and dimension reduction.
[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2 # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:31.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name)
.)
[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()
Dimension reduction¶
Select highly variable genes. Before this, there is count normalization. It requires that the count matrix is NOT log-normalized.
[5]:
cs.pp.get_highly_variable_genes(
adata_orig,
normalized_counts_per_cell=10000,
min_counts=3,
min_cells=3,
min_gene_vscore_pctl=90,
)
Finding highly variable genes...

Keeping 1615 genes
Compute for each gene its correlation with a set of cell cycle genes. The default cell cycle genes are for mouse. You need to use your own genes for a different species. This step is optional, but recommended.
[6]:
cs.pp.remove_cell_cycle_correlated_genes(
adata_orig,
cycling_gene_list=[
"Ube2c",
"Hmgb2",
"Hmgn2",
"Tuba1b",
"Ccnb1",
"Tubb5",
"Top2a",
"Tubb4b",
],
)
adata.var['highly_variable'] not updated.
Please choose corr_threshold properly, and set confirm_change=True

Now, confirm the change at a specific cutoff threshold.
[7]:
cs.pp.remove_cell_cycle_correlated_genes(
adata_orig, corr_threshold=0.2, confirm_change=True
)
Number of selected non-cycling highly variable genes: 1508
Remove 107 cell cycle correlated genes.
adata.var['highly_variable'] updated
Compute the X_pca
, X_emb
. X_pca
will be used to build the similarity matrix later. X_emb
is only used for visualization. You can also pass your favorite embedding directly to adata.obsm['X_emb']
.
[8]:
cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
cs.pp.get_X_emb(adata_orig, n_neighbors=20, umap_min_dist=0.3)
WARNING: get_X_pca assumes that the count matrix adata.X is NOT log-transformed.
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/sklearn/utils/validation.py:585: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/sklearn/utils/validation.py:585: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
Note that previous X_pca
and X_emb
(if exists) will be kept and renamed as X_pca_old
and X_emb_old
[9]:
adata_orig
[9]:
AnnData object with n_obs × n_vars = 7438 × 25289
obs: 'time_info', 'state_info', 'n_counts'
var: 'highly_variable', 'highly_variable_old'
uns: 'clonal_time_points', 'data_des', 'neighbors', 'state_info_colors', 'time_ordering', 'umap'
obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_pca', 'X_pca_old', 'X_umap'
obsp: 'connectivities', 'distances'
[10]:
cs.pl.embedding(adata_orig, color="state_info")

State annotation¶
This generates adata_orig.obs['state_info']
.
[11]:
cs.pp.get_state_info(adata_orig, n_neighbors=20, resolution=0.5)
[12]:
cs.pl.embedding(adata_orig, color="state_info")

Similarly, the pre-existing state_info
will be renamed
[13]:
adata_orig
[13]:
AnnData object with n_obs × n_vars = 7438 × 25289
obs: 'time_info', 'state_info', 'n_counts', 'leiden', 'state_info_old'
var: 'highly_variable', 'highly_variable_old'
uns: 'clonal_time_points', 'data_des', 'neighbors', 'state_info_colors', 'time_ordering', 'umap', 'leiden'
obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_pca', 'X_pca_old', 'X_umap'
obsp: 'connectivities', 'distances'
Refine state annotation by marker genes¶
The goal here is to refine adata_orig.obs['state_info']
. In this method, a state is selected if it expresses all genes in the list of marker_genes
, and the expression is above the relative threshold express_threshold
. You can also specify which time point you want to focus on. In addition, we also include cell states neighboring to these valid states to smooth the selection (controlled by add_neighbor_N
). First, explore parameters to find satisfactory annotation.
[14]:
confirm_change = False
marker_genes = ["Mpo", "Elane", "S100a8"]
cs.pp.refine_state_info_by_marker_genes(
adata_orig,
marker_genes,
express_threshold=0.1,
selected_times=["4"],
new_cluster_name="new",
add_neighbor_N=10,
confirm_change=confirm_change,
)

Once you are happy with the result, set confirm_change=True
to confirm changes to adata.obs['state_info']
.
[15]:
confirm_change = True
marker_genes = ["Mpo", "Elane", "S100a8"]
cs.pp.refine_state_info_by_marker_genes(
adata_orig,
marker_genes,
express_threshold=0.1,
selected_times=["4"],
new_cluster_name="new",
add_neighbor_N=10,
confirm_change=confirm_change,
)
Change state annotation at adata.obs['state_info']


Refine state annotation by clustering states at given time points¶
First, explore the parameters.
[16]:
confirm_change = False
cs.pp.refine_state_info_by_leiden_clustering(
adata_orig,
selected_times=["2"],
n_neighbors=20,
resolution=0.5,
confirm_change=confirm_change,
)

Once you are happy with the result, set confirm_change=True
to confirm changes to adata.obs['state_info']
.
Miscellaneous¶
[17]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['7', '2', '8', '4', '1', '5', '6', '3', '0', '9', 'new', '10']
Available time points: ['2' '4' '6']
Clonal time points: ['2' '4' '6']
You can choose to save preprocessed data. It can be loaded using cs.hf.read(file_name)
.
[18]:
# cs.hf.save_preprocessed_adata(adata_orig)
Clonal analysis¶
[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2 # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-01-30 14:45.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name)
.)
[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()
Show barcode heatmap as aggregated into given fate clusters (defined in adata_orig.obs['state_info']
)
[5]:
selected_times = None
selected_fates = [
"Ccr7_DC",
"Mast",
"Meg",
"pDC",
"Eos",
"Lymphoid",
"Erythroid",
"Baso",
"Neutrophil",
"Monocyte",
]
celltype_names = ["mDC", "Mast", "Meg", "pDC", "Eos", "Ly", "Ery", "Baso", "Neu", "Mon"]
cs.pl.barcode_heatmap(
adata_orig,
selected_times=selected_times,
selected_fates=selected_fates,
color_bar=True,
rename_fates=celltype_names,
log_transform=False,
binarize=True,
)
Data saved at adata.uns['barcode_heatmap']
[5]:
<AxesSubplot:>

Fate coupling in the underlying clonal data, defined in our package as the normalized barcode covariance between cells annotated in different fates.
[6]:
cs.tl.fate_coupling(
adata_orig, selected_fates=selected_fates, source="X_clone"
) # compute the fate coupling
cs.pl.fate_coupling(adata_orig, source="X_clone") # actually plot the coupling
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[6]:
<AxesSubplot:title={'center':'source: X_clone'}>

Fate hierarchy constructed from fate coupling of the underlying clonal data, using the neighbor-joining method.
[7]:
cs.tl.fate_hierarchy(
adata_orig, selected_fates=selected_fates, source="X_clone"
) # compute the fate hierarchy
cs.pl.fate_hierarchy(adata_orig, source="X_clone") # actually plot the hierarchy
Results saved as dictionary at adata.uns['fate_hierarchy_X_clone']
/-Lymphoid
/-|
| \-Ccr7_DC
/-|
| | /-Monocyte
| \-|
| \-Neutrophil
/-|
| | /-Baso
| | /-|
| | /-| \-Mast
| | | |
--| \-| \-Eos
| |
| | /-Erythroid
| \-|
| \-Meg
|
\-pDC
Next, we compute the clonal fate bias, -log(Q-value). We calculated a P-value that that a clone is enriched (or depleted) in a fate, using Fisher-Exact test (accounting for clone size). The P-value is then corrected to give a Q-value by Benjamini-Hochberg procedure. The alternative hypothesis options are: {‘two-sided’,’greater’,’less’}. The default is ‘two-sided’.
[8]:
cs.tl.clonal_fate_bias(
adata_orig, selected_fate="Monocyte", alternative="two-sided"
) # compute the fate hierarchy
cs.pl.clonal_fate_bias(adata_orig) # actually plot the hierarchy
100%|██████████| 500/500 [00:01<00:00, 442.37it/s]
Data saved at adata.uns['clonal_fate_bias']


[9]:
result = adata_orig.uns["clonal_fate_bias"]
result
[9]:
Clone_ID | Clone_size | Q_value | Fate_bias | clonal_fraction_in_target_fate | |
---|---|---|---|---|---|
0 | 488 | 58 | 1.037416e-20 | 19.984047 | 0.896552 |
1 | 387 | 37 | 1.608291e-16 | 15.793635 | 0.945946 |
2 | 227 | 45 | 1.665465e-15 | 14.778465 | 0.866667 |
3 | 162 | 42 | 6.210001e-13 | 12.206908 | 0.833333 |
4 | 302 | 112 | 6.210001e-13 | 12.206908 | 0.000000 |
... | ... | ... | ... | ... | ... |
495 | 366 | 9 | 1.000000e+00 | -0.000000 | 0.222222 |
496 | 46 | 18 | 1.000000e+00 | -0.000000 | 0.222222 |
497 | 408 | 27 | 1.000000e+00 | -0.000000 | 0.259259 |
498 | 6 | 4 | 1.000000e+00 | -0.000000 | 0.250000 |
499 | 463 | 3 | 1.000000e+00 | -0.000000 | 0.333333 |
500 rows × 5 columns
Illustrate some most biased clones.
[14]:
ids = result["Clone_ID"][:2]
cs.pl.clones_on_manifold(
adata_orig,
selected_clone_list=ids,
color_list=["black", "red", "blue"],
clone_markersize=15,
)


Transition map inference¶
Below, we illustrate transition map inference from different methods in cospar. Once computed, they can be analyzed and visualized later.
[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2 # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:17.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name)
.)
[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()
Transition map from clones across multiple time points¶
This method utilizes clones with multiple time points. It is the core algorithm for the CoSpar package. The first time it runs, it will compute the similarity matrix at different smooth depth and save them. Then, it will infer the transition map. Some key parameters:
clonal_time_points (
list
; default: all): List of time points with clonal data that you want to perform inference.later_time_point (
str
; default: None): If not set, CoSpar learns a transition map between neighboring time points specified by clonal_time_points, i.e., transitions from day 2 to day 4, and from day 4 to day 6 if clonal_time_points=[‘2’,’4’,’6’]. Ohterwise, it learns transitions from any ofclonal_time_points
to thelater_time_point
.smooth_array (
list
; default: [15,10,5]): List of smooth depth at initial runs of iteration. Suppose that it has a length N. For iteration nN, we use the last entry of smooth_array to compute the S matrix. We recommend starting with more smoothing depth and gradually reduce the depth, as inspired by simulated annealing. Data with higher clonal dispersion should start with higher smoothing depth. The final depth should depend on the manifold itself. For fewer cells, it results in a small KNN graph, and a small final depth should be used. We recommend to use a number at the multiple of 5 for computational efficiency i.e., smooth_array=[20, 15, 10, 5], or [20,15,10].sparsity_threshold (
float
; default: 0.1): sparsity threshold to remove spurious transitions in the updated transition map, in the range [0,1].max_iter_N (
int
; default: 5): maximum number of iterations in refining the transition map. We found that 3 iterations are sufficient for tested datasets.epsilon_converge (
float
; default: 0.05): Iteration is considered converged when the correlation R between maps from consecutive iteration satisfies: R>1-epsilon_converge
.
It takes ~3 mins to run for the first time, and ~20 s for later runs.
[5]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
clonal_time_points=["2", "4"],
later_time_point="6",
smooth_array=[20, 15, 10, 5],
sparsity_threshold=0.1,
intraclone_threshold=0.2,
max_iter_N=10,
epsilon_converge=0.01,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 185
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.942
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.956
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
-----------Total used time: 19.377027988433838 s ------------
We can see that after this step, we have two maps: 'transition_map'
and 'intraclone_transition_map'
. Both of them can be used for downstream analysis.
[6]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[6]:
['transition_map', 'intraclone_transition_map']
Transition map from end-point clones, or without clones¶
After initializing the map by either the OT
or HighVar
method, We jointly infer the likely clonal ancestors and the corresponding transition map between given time points. You need to choose the initialization method and set the corresponding parameters.
initialize_method=
OT
: using optimal transport. Key parameters:OT_epsilon (
float
; default: 0.02): the entropic regularizationOT_cost ({
GED
,SPD
}, default:SPD
): method for computing the cost function.GED
uses simple gene expression distances; it is faster.SPD
uses the shortest path distances; it is slower but often more accurate.
initialize_method=
HighVar
:HighVar
assumes that cells similar in gene expression across time points share clonal origin. It converts highly variable genes into pseudo multi-time clones and runscs.tmap.infer_Tmap_from_multitime_clones
to construct the map. We findHighVar
performs better thanOT
, especially when there are large differentiation effects over the observed time window, or batch effects.HighVar_gene_pctl (
int
; default: 85): percentile threshold to select highly variable genes. Range: [0,100]. A higher value selects more variable genes.
max_iter_N (
list
; default: [3,5]): a list of two entries, representing the maximum iteration number for joint optimization and CoSpar core function, respectively.epsilon_converge (
list
; default: [0.05,0.05]), a list of two entries, indicating the map convergence threshold for joint optimization and CoSpar core function, respectively.
Infer transition map from a single clonal time point (using OT
), please run ( ~ 1 min):
[7]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig,
initial_time_points=["4"],
later_time_point="6",
initialize_method="OT",
OT_cost="GED",
smooth_array=[20, 15, 10, 5],
sparsity_threshold=0.1,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Finishing computing gene expression distance, used time 6.2248289585113525
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 36.00060701370239
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.961
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.275
Finishing Joint Optimization, used time 11.890204191207886
-----------Total used time: 54.800551891326904 s ------------
[8]:
cs.hf.check_available_map(adata_1)
adata_1.uns["available_map"]
[8]:
['transition_map', 'OT_transition_map']
Infer transition map from state information alone (using HighVar
): (it takes ~ 2 mins)
[9]:
adata_2 = cs.tmap.infer_Tmap_from_state_info_alone(
adata_orig,
initial_time_points=["4"],
later_time_point="6",
initialize_method="HighVar",
HighVar_gene_pctl=85,
max_iter_N=[10, 10],
epsilon_converge=[0.01, 0.01],
smooth_array=[20, 15, 10, 5],
sparsity_threshold=0.1,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 2136 (t1); 2245 (t2). Common set: 978
Step b: convert the shared highly variable genes into clonal info------
87%|████████▋ | 853/978 [00:00<00:00, 5630.14it/s]
Total used genes=853 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.966
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.988
Iteration 6, Use smooth_round=5
Convergence (CoSpar, iter_N=6): corr(previous_T, current_T)=0.996
Finishing initialization using HighVar, used time 16.57118797302246
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.934
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.998
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.784
-----JointOpt Iteration 2: Infer initial clonal structure
-----JointOpt Iteration 2: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=2): corr(previous_T, current_T)=0.966
-----JointOpt Iteration 3: Infer initial clonal structure
-----JointOpt Iteration 3: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.994
Convergence (JointOpt, iter_N=3): corr(previous_T, current_T)=0.97
-----JointOpt Iteration 4: Infer initial clonal structure
-----JointOpt Iteration 4: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=4): corr(previous_T, current_T)=0.968
-----JointOpt Iteration 5: Infer initial clonal structure
-----JointOpt Iteration 5: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.938
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=5): corr(previous_T, current_T)=0.973
-----JointOpt Iteration 6: Infer initial clonal structure
-----JointOpt Iteration 6: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=6): corr(previous_T, current_T)=0.969
-----JointOpt Iteration 7: Infer initial clonal structure
-----JointOpt Iteration 7: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.935
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=7): corr(previous_T, current_T)=0.971
-----JointOpt Iteration 8: Infer initial clonal structure
-----JointOpt Iteration 8: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=8): corr(previous_T, current_T)=0.976
-----JointOpt Iteration 9: Infer initial clonal structure
-----JointOpt Iteration 9: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=9): corr(previous_T, current_T)=0.977
-----JointOpt Iteration 10: Infer initial clonal structure
-----JointOpt Iteration 10: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
Convergence (JointOpt, iter_N=10): corr(previous_T, current_T)=0.975
Finishing Joint Optimization, used time 242.97642278671265
-----------Total used time: 260.5958061218262 s ------------
[10]:
cs.hf.check_available_map(adata_2)
adata_2.uns["available_map"]
[10]:
['transition_map', 'HighVar_transition_map']
Transition map from clonal data alone, without state information¶
As a comparision, we can also learn the transition map using clonal information alone:
[11]:
# We use the Weinreb method, which selects uni-potent clones based on a pre-defined list of fates.
selected_fates = [
"Baso",
"Ccr7_DC",
"Eos",
"Erythroid",
"Lymphoid",
"Mast",
"Meg",
"Monocyte",
"Neu_Mon",
"Neutrophil",
"pDC",
]
adata_3 = cs.tmap.infer_Tmap_from_clonal_info_alone(
adata_orig, method="weinreb", later_time_point="6", selected_fates=selected_fates
)
Trying to set attribute `.uns` of view, copying.
Infer transition map between initial time points and the later time point.
--------Current initial time point: 2--------
Number of multi-time clones post selection: 185
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.6162162162162163
--------Current initial time point: 4--------
Number of multi-time clones post selection: 500
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.538
[12]:
cs.hf.check_available_map(adata_3)
adata_3.uns["available_map"]
[12]:
['clonal_transition_map']
Saving maps¶
[13]:
cs.hf.save_map(adata)
Saved file: data_cospar/LARRY_sp500_ranking1_MultiTimeClone_Later_FullSpace0_t*2*4*6_adata_with_transition_map.h5ad
It can be loaded again with adata=cs.hf.read(file_path)
Transition map analysis¶
[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2 # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:23.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name)
.)
[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()
Generate a transition map
[5]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
clonal_time_points=["2", "4"],
later_time_point="6",
smooth_array=[20, 15, 10, 5],
sparsity_threshold=0.1,
intraclone_threshold=0.2,
max_iter_N=10,
epsilon_converge=0.01,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 185
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.942
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.956
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
-----------Total used time: 19.324687957763672 s ------------
[6]:
# adata=cs.hf.read('data_cospar/LARRY_sp500_ranking1_MultiTimeClone_Later_FullSpace0_t*2*4*6_adata_with_transition_map.h5ad')
Key parameters¶
The analysis is done with the plotting module. There are some common parameters for the APIs in this module:
source (
str
; default:transition_map
). It determines which transition map to use for analysis. Choices: {transition_map
,intraclone_transition_map
,OT_transition_map
,HighVar_transition_map
,clonal_transition_map
}selected_fates (
list
ofstr
). Selected clusters to aggregate differentiation dynamics and visualize fate bias etc.. It allows nested structure, e.g., selected_fates=[‘a’, [‘b’, ‘c’]] selects two clusters: cluster ‘a’ and the other that combines ‘b’ and ‘c’.map_backward (
bool
; default: True). We can analyze either the backward transitions, i.e., where these selected states or clusters came from (map_backward=True
); or the forward transitions, i.e., where the selected states or clusters are going (map_backward=False
).selected_times (
list
; default: all). List of time points to use. By default, all are used.method (
str
; default:norm-sum
). Method to aggregate the transition probability within a cluster. Choices: {norm-sum
,sum
}.norm-sum
returns the probability that a fate cluster originates from an early state i; whilesum
gives the probability that an initial state i gives rise to a fate cluster.
Plotting transition profiles for single cells¶
First, check the forward transitions (i.e., future states) from the 'transition_map'
.
[7]:
selected_state_id_list = [
1,
10,
] # This is a relative ID. Its mapping to the actual cell id depends on map_backward.
map_backward = False
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
source="transition_map",
map_backward=map_backward,
)

Now, backward transitions (i.e., past states) from the same map.
[8]:
selected_state_id_list = [2]
map_backward = True
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
source="transition_map",
map_backward=map_backward,
)

Finally, switch to the 'intraclone_transition_map'
, and check the observed clonal transitions:
[9]:
selected_state_id_list = [2]
map_backward = True
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
source="intraclone_transition_map",
map_backward=map_backward,
)

Fate map¶
Inspect the backward transitions, and ask where the selected fate clusters come from.
[10]:
cs.tl.fate_map(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
map_backward=True,
)
cs.pl.fate_map(
adata,
selected_fates=["Neutrophil"],
source="transition_map",
plot_target_state=True,
show_histogram=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']

The results are stored at adata.obs[f'fate_map_{source}_{fate_name}']
, which can be used for a customized analysis.
[11]:
adata.obs.keys()
[11]:
Index(['time_info', 'state_info', 'n_counts',
'fate_map_transition_map_Neutrophil',
'fate_map_transition_map_Monocyte'],
dtype='object')
As an example of how to use this data, you can re-plot the fate map as follows:
[12]:
cs.pl.embedding(adata, color="fate_map_transition_map_Neutrophil")

Fate potency¶
Count fate number directly (fate_count=True
); otherwise, calculate the potencyusing entropy. The results are stored at adata.obs[f'fate_potency_{source}']
.
[13]:
cs.tl.fate_potency(
adata,
source="transition_map",
map_backward=True,
method="norm-sum",
fate_count=True,
)
cs.pl.fate_potency(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Ccr7_DC']
Results saved at adata.obs['fate_map_transition_map_Neu_Mon']
Results saved at adata.obs['fate_map_transition_map_undiff']
Results saved at adata.obs['fate_map_transition_map_Mast']
Results saved at adata.obs['fate_map_transition_map_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso']
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_pDC']
Results saved at adata.obs['fate_map_transition_map_Lymphoid']
Results saved at adata.obs['fate_map_transition_map_Meg']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_map_transition_map_Eos']
Results saved at adata.obs['fate_potency_transition_map']

Fate bias¶
The fate bias of an initial state i is defined by the competition of fate probability towards two fate clusters A and B:
Bias_i=
P(i;A)/[P(i;A)+P(i;B)]
.
Only states with fate probabilities satisfying this criterion will be shown:
P(i; A)+P(i; B)>sum_fate_prob_thresh
The inferred fate bias is stored at adata.obs[f'fate_bias_{source}_{fate_A}*{fate_B}']
.
[14]:
cs.tl.fate_bias(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
pseudo_count=0,
)
cs.pl.fate_bias(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']

We can also study fate bias in the Gata1+ states. First, prepare a sate mask:
[15]:
x_emb = adata.obsm["X_emb"][:, 0]
y_emb = adata.obsm["X_emb"][:, 1]
index_2 = cs.hf.above_the_line(adata.obsm["X_emb"], [100, 500], [500, -1000])
index_5 = cs.hf.above_the_line(adata.obsm["X_emb"], [0, -500], [1000, 2])
final_mask = (~index_2) & (
(index_5 | (x_emb < 0))
) # & index_3 & index_4 & index_5 #mask_1 &
Now, compute the fate bias. We can concatenate cluster ‘Meg’ and ‘Erythroid’ using a nested list. Same for concatenating ‘Baso’, ‘Mast’, and ‘Eos’.
[16]:
# cs.pl.fate_bias(adata,selected_fates=[['Meg','Erythroid'],['Baso','Mast','Eos']],source='transition_map',
# plot_target_state=False,mask=final_mask,map_backward=True,sum_fate_prob_thresh=0.01,method='norm-sum')
cs.tl.fate_bias(
adata,
selected_fates=[["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]],
source="transition_map",
pseudo_count=0,
map_backward=True,
sum_fate_prob_thresh=0.01,
method="norm-sum",
)
cs.pl.fate_bias(
adata,
selected_fates=[["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]],
source="transition_map",
plot_target_state=False,
mask=final_mask,
background=False,
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']

Dynamic trajectory inference¶
We can infer the dynamic trajectory and ancestor population using the fate bias from binary fate competition. Here, fate bias is a scalar between (0,1) at each state. Selected ancestor population satisfies:
P(i;A) + P(i;B) > sum_fate_prob_thresh;
Ancestor states {i} for A: Bias_i > bias_threshold_A
Ancestor states {i} for B: Bias_i < bias_threshold_B
They will be stored at adata.obs[f'progenitor_{source}_{fate_name}']
and adata.obs[f'diff_trajectory_{source}_{fate_name}']
.
[17]:
cs.tl.progenitor(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
map_backward=True,
bias_threshold_A=0.5,
bias_threshold_B=0.5,
sum_fate_prob_thresh=0.2,
avoid_target_states=True,
)
cs.pl.progenitor(
adata, selected_fates=["Neutrophil", "Monocyte"], source="transition_map"
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
Results saved at adata.obs[f'progenitor_transition_map_Neutrophil'] and adata.obs[f'diff_trajectory_transition_map_Neutrophil']
Results saved at adata.obs[f'progenitor_transition_map_Monocyte'] and adata.obs[f'diff_trajectory_transition_map_Monocyte']


We can visualize the progenitors for Neutrophil directly
[18]:
fate_name = "Neutrophil" # Monocyte
traj_name = f"progenitor_transition_map_{fate_name}"
cs.pl.embedding(adata, color=traj_name)

We can also visualize the differentiation trajectory for Neutrophil.
[19]:
fate_name = "Neutrophil" # Monocyte
traj_name = f"diff_trajectory_transition_map_{fate_name}"
cs.pl.embedding(adata, color=traj_name)

Differential genes for two ancestor groups¶
It would be interesting to see what genes are differentially expressed between these two ancestor populations, which might drive fate bifurcation. We provide a simple differentiation gene expression analysis that uses Wilcoxon rank-sum test to calculate P values, followed by Benjamini-Hochberg correction. You can always use your own method.
[20]:
import numpy as np
cell_group_A = np.array(adata.obs["diff_trajectory_transition_map_Neutrophil"])
cell_group_B = np.array(adata.obs["diff_trajectory_transition_map_Monocyte"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[21]:
dge_gene_A
[21]:
index | gene | Qvalue | mean_1 | mean_2 | ratio | |
---|---|---|---|---|---|---|
0 | 5 | Wfdc17 | 0.000000e+00 | 0.256167 | 24.515341 | -4.344265 |
1 | 9 | Lpl | 9.000377e-267 | 0.798759 | 27.861982 | -4.004097 |
2 | 0 | Mmp8 | 0.000000e+00 | 1.777057 | 40.070595 | -3.886477 |
3 | 304 | H2-Aa | 1.071471e-33 | 0.083807 | 14.977456 | -3.881858 |
4 | 8 | Ctss | 1.935638e-273 | 0.130347 | 15.299898 | -3.850025 |
... | ... | ... | ... | ... | ... | ... |
1061 | 1789 | H3f3a | 2.146904e-04 | 17.266771 | 18.038162 | -0.059673 |
1062 | 3329 | Gm37214 | 4.577627e-02 | 1.208660 | 1.298273 | -0.057379 |
1063 | 1881 | mt-Co2 | 3.645677e-04 | 195.855591 | 203.123032 | -0.052301 |
1064 | 2978 | Nom1 | 2.363414e-02 | 0.813894 | 0.820711 | -0.005412 |
1065 | 2647 | Ffar2 | 9.581307e-03 | 2.096291 | 2.100280 | -0.001857 |
1066 rows × 6 columns
Gene expression can be explored directly:
[22]:
selected_genes = dge_gene_A["gene"][:2]
cs.pl.gene_expression_on_manifold(
adata, selected_genes=selected_genes, color_bar=True, savefig=False
)


You can visualize the gene expression differences using a heat map.
[24]:
gene_list = list(dge_gene_A["gene"][:20]) + list(
dge_gene_B["gene"][:20]
) # select the top 20 genes from both populations
selected_fates = [
"Neutrophil",
"Monocyte",
["Baso", "Eos", "Erythroid", "Mast", "Meg"],
["pDC", "Ccr7_DC", "Lymphoid"],
]
renames = ["Neu", "Mon", "Meg-Ery-MBaE", "Lym-Dc"]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
adata,
selected_genes=gene_list,
selected_fates=selected_fates,
rename_fates=renames,
fig_width=12,
)

You can run the differential gene expression analysis on different cell clusters.
[25]:
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata, cell_group_A="Neutrophil", cell_group_B="Monocyte"
)
Gene expression dynamics¶
We can calculate the pseudotime along an inferred trajectory, and plot the gene expression along the pseudotime. This method requires that a trajectory has been inferred in previously steps.
[26]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
adata,
selected_fate,
gene_name_list,
traj_threshold=0.2,
invert_PseudoTime=False,
compute_new=True,
gene_exp_percentile=99,
n_neighbors=8,
plot_raw_data=False,
)


Fate coupling and hierarchy¶
The inferred transition map can be used to estimate differentiation coupling between different fate clusters.
[27]:
selected_fates = [
"Ccr7_DC",
"Mast",
"Meg",
"pDC",
"Eos",
"Lymphoid",
"Erythroid",
"Baso",
"Neutrophil",
"Monocyte",
]
cs.tl.fate_coupling(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_coupling(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_coupling_transition_map']
[27]:
<AxesSubplot:title={'center':'source: transition_map'}>

We can also infer fate hierarchy from a transition map, based on the fate coupling matrix and using the neighbor-joining method.
[28]:
cs.tl.fate_hierarchy(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_hierarchy(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_hierarchy_transition_map']
/-Baso
/-|
/-| \-Eos
| |
/-| \-Mast
| |
| | /-Erythroid
| \-|
--| \-Meg
|
| /-Monocyte
| /-|
| | \-Neutrophil
\-|
| /-Lymphoid
| /-|
\-| \-Ccr7_DC
|
\-pDC
Propagate a cluster in time¶
One way to define the dynamic trajectory is simply mapping a given fate cluster backward in time. The whole trajectory across multiple time points will be saved at adata.obs[f'diff_trajectory_{source}_{fate_name}']
. This method requires having multiple clonal time points, and a transition map between neighboring time points.
First, use all clonal time points (the default), and infer a transition map between neighboring time points (by not setting later_time_point
).
[29]:
adata_5 = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
smooth_array=[20, 15, 10, 5],
sparsity_threshold=0.2,
intraclone_threshold=0.2,
max_iter_N=10,
epsilon_converge=0.01,
)
------Compute the full Similarity matrix if necessary------
----Infer transition map between neighboring time points-----
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.912
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.993
-----------Total used time: 17.11899995803833 s ------------
[30]:
cs.tl.iterative_differentiation(
adata_5, selected_fates="Neutrophil", source="intraclone_transition_map"
)
cs.pl.iterative_differentiation(adata_5, source="intraclone_transition_map")
Results saved at adata.obs[f'diff_trajectory_intraclone_transition_map_Neutrophil']

The trajectory can be used for visualizing gene expression dynamics using cs.pl.gene_expression_dynamics
, as shown above:
[31]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
adata_5,
selected_fate,
gene_name_list,
traj_threshold=0.1,
invert_PseudoTime=True,
source="intraclone_transition_map",
)


[ ]:
Hematopoiesis¶
Hematopoiesis dataset from Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Science 367, (2020).
This dataset has 3 time points for both the clonal and state measurements. It has ~50000 clonally-labeled cells. Running the whole pipeline for the first time could take 5.2 hours in a standard personal computer. Most of the time (3.3 h) is used for generating the similarity matrix, which are saved for later usage.
Key components:
Part I: Infer transition map from all clonal data
Part II: Infer transition using end-point clones
Part III: Infer transition map from state information alone
Part IV: Predict fate bias for Gata1+ states
[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.data_path = "LARRY_data" # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "LARRY_figure" # A relative path to save figures. If not existed before, create a new one.
cs.settings.set_figure_params(
format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=2
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 21:21.
[3]:
# # This is for testing this notebook
# cs.settings.data_path='data_cospar'
# cs.settings.figure_path='fig_cospar'
# adata_orig=cs.datasets.hematopoiesis_subsampled()
# adata_orig.uns['data_des']=['LARRY_sp500_ranking1']
Loading data¶
[4]:
adata_orig = cs.datasets.hematopoiesis()
[5]:
adata_orig
[5]:
AnnData object with n_obs × n_vars = 49116 × 25289
obs: 'time_info', 'state_info'
uns: 'clonal_time_points', 'data_des', 'state_info_colors'
obsm: 'X_clone', 'X_emb', 'X_pca'
[6]:
cs.pl.embedding(adata_orig, color="state_info")

[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['undiff', 'Meg', 'Mast', 'Neu_Mon', 'Eos', 'Erythroid', 'Ccr7_DC', 'Baso', 'Lymphoid', 'pDC', 'Neutrophil', 'Monocyte']
Available time points: ['2' '4' '6']
Clonal time points: ['2' '4' '6']
Basic clonal analysis¶
[8]:
cs.pl.clones_on_manifold(
adata_orig, selected_clone_list=[1], color_list=["black", "red", "blue"]
)

[9]:
selected_times = "4"
selected_fates = [
"Ccr7_DC",
"Mast",
"Meg",
"pDC",
"Eos",
"Lymphoid",
"Erythroid",
"Baso",
"Neutrophil",
"Monocyte",
]
cs.tl.fate_coupling(
adata_orig,
source="X_clone",
selected_fates=selected_fates,
selected_times=selected_times,
)
cs.pl.fate_coupling(adata_orig, source="X_clone")
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>

[10]:
cs.pl.barcode_heatmap(
adata_orig,
selected_times=selected_times,
selected_fates=selected_fates,
color_bar=True,
)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>

Part I: Infer transition map from all clonal data¶
Map inference¶
When first run, it takes around 3 h 37 mins, in which 3 h 20 mins are used for computing the similarity matrices and saving these data. It takes only 20 mins for later runs.
[11]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
clonal_time_points=["2", "4", "6"],
later_time_point="6",
smooth_array=[20, 15, 10],
sparsity_threshold=0.2,
max_iter_N=3,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 1216
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.903
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 3047
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.97
-----------Total used time: 1604.8337297439575 s ------------
Save pre-computed data (optional)¶
[12]:
save_data = False
if save_data:
cs.hf.save_map(adata)
Plotting¶
Transition profiles for single cells¶
Forward transitions with map_backward=False
.
[13]:
selected_state_id_list = [2]
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
color_bar=False,
source="transition_map",
map_backward=False,
)

Backward transitions with map_backward=True
.
[14]:
selected_state_id_list = [4]
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
color_bar=False,
source="transition_map",
map_backward=True,
)

Fate map¶
[15]:
cs.tl.fate_map(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
map_backward=True,
)
cs.pl.fate_map(
adata,
selected_fates=["Neutrophil"],
source="transition_map",
plot_target_state=True,
show_histogram=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']

Fate bias¶
[16]:
cs.tl.fate_bias(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
pseudo_count=0,
sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
plot_target_state=False,
selected_times=["4"],
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']

Fate coupling from the transition map¶
[17]:
selected_fates = [
"Ccr7_DC",
"Mast",
"Meg",
"pDC",
"Eos",
"Lymphoid",
"Erythroid",
"Baso",
"Neutrophil",
"Monocyte",
]
cs.tl.fate_coupling(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_coupling(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_coupling_transition_map']
[17]:
<AxesSubplot:title={'center':'source: transition_map'}>

Hierarchy¶
[18]:
cs.tl.fate_hierarchy(adata, selected_fates=selected_fates, source="transition_map")
cs.pl.fate_hierarchy(adata, source="transition_map")
Results saved as dictionary at adata.uns['fate_hierarchy_transition_map']
/-Erythroid
/-|
/-| \-Meg
| |
/-| \-Mast
| |
| | /-Baso
| \-|
--| \-Eos
|
| /-Monocyte
| /-|
| | \-Neutrophil
\-|
| /-Lymphoid
| /-|
\-| \-Ccr7_DC
|
\-pDC
Dynamic trajectory inference¶
[19]:
cs.tl.progenitor(
adata,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
map_backward=True,
bias_threshold_A=0.5,
bias_threshold_B=0.5,
sum_fate_prob_thresh=0.2,
avoid_target_states=True,
)
cs.pl.progenitor(
adata, selected_fates=["Neutrophil", "Monocyte"], source="transition_map"
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']
Results saved at adata.obs[f'progenitor_transition_map_Neutrophil'] and adata.obs[f'diff_trajectory_transition_map_Neutrophil']
Results saved at adata.obs[f'progenitor_transition_map_Monocyte'] and adata.obs[f'diff_trajectory_transition_map_Monocyte']


Gene trend along the dynamic trajectory¶
[20]:
gene_name_list = ["Gata1", "Mpo", "Elane", "S100a8"]
selected_fate = "Neutrophil"
cs.pl.gene_expression_dynamics(
adata, selected_fate, gene_name_list, traj_threshold=0.2, invert_PseudoTime=False
)


Part II: Infer transition map from end-point clones¶
When run for the firs time, assuming that the similarity matrices are pre-computed, it takes 73 mins. Around 40 mins of which are used to compute the initialized map.
We group time points ‘2’ and ‘4’ so that the inference on day 2 states (which has much fewer cells) are better.
[21]:
time_info = np.array(adata_orig.obs["time_info"])
time_info[time_info == "2"] = "24"
time_info[time_info == "4"] = "24"
adata_orig.obs["time_info"] = time_info
adata_orig = cs.pp.initialize_adata_object(adata_orig)
Clones without any cells are removed.
Time points with clonal info: ['24' '6']
WARNING: Default ascending order of time points are: ['24' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[22]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig,
initial_time_points=["24"],
later_time_point="6",
initialize_method="OT",
OT_cost="GED",
smooth_array=[20, 15, 10],
max_iter_N=[1, 3],
sparsity_threshold=0.2,
use_full_Smatrix=True,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 24--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed custom OT matrix
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.974
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.144
Finishing Joint Optimization, used time 1308.7951350212097
-----------Total used time: 1341.8913249969482 s ------------
Fate bias¶
[23]:
cs.tl.fate_bias(
adata_1,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
pseudo_count=0,
sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
adata_1,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']

Part III: Infer transition map from state information alone¶
[24]:
adata_2 = cs.tmap.infer_Tmap_from_state_info_alone(
adata_orig,
initial_time_points=["24"],
later_time_point="6",
initialize_method="OT",
OT_cost="GED",
smooth_array=[20, 15, 10],
max_iter_N=[1, 3],
sparsity_threshold=0.2,
use_full_Smatrix=True,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 24--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed custom OT matrix
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.977
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.14
Finishing Joint Optimization, used time 1854.5912249088287
-----------Total used time: 1923.1848938465118 s ------------
[25]:
cs.tl.fate_bias(
adata_2,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
pseudo_count=0,
sum_fate_prob_thresh=0.1,
)
cs.pl.fate_bias(
adata_2,
selected_fates=["Neutrophil", "Monocyte"],
source="transition_map",
plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Neutrophil']
Results saved at adata.obs['fate_map_transition_map_Monocyte']
Results saved at adata.obs['fate_bias_transition_map_Neutrophil*Monocyte']

Part IV: Predict fate bias for Gata1+ states¶
Load data¶
We load a dataset that also includes non-clonally labeled cells in the Gata1+ states (in total ~38K cells), which increases the statistical power of DGE analysis.
[26]:
cs.settings.data_path = "LARRY_data_Gata1" # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "LARRY_figure_Gata1" # A relative path to save figures. If not existed before, create a new one.
adata_orig_1 = cs.datasets.hematopoiesis_Gata1_states()
[27]:
adata_orig_1 = cs.pp.initialize_adata_object(adata_orig_1)
Time points with clonal info: ['2' '4' '6']
WARNING: Default ascending order of time points are: ['2' '4' '6']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[28]:
cs.pl.embedding(adata_orig_1, color="time_info")

[29]:
adata_orig_1
[29]:
AnnData object with n_obs × n_vars = 38457 × 25289
obs: 'n_counts', 'Source', 'Well', 'mito_frac', 'time_info', 'state_info'
var: 'highly_variable'
uns: 'AllCellCyc_genes', 'clonal_time_points', 'data_des', 'max_mito', 'min_tot', 'new_highvar_genes', 'state_info_colors', 'time_info_colors', 'time_ordering'
obsm: 'X_clone', 'X_emb', 'X_pca', 'X_umap'
[30]:
## This is for testing this notebook
# adata_orig_1=adata_orig
# adata_3=adata_1
Infer transition map from all clones¶
[31]:
adata_3 = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig_1,
later_time_point="6",
smooth_array=[20, 15, 10],
sparsity_threshold=0.1,
intraclone_threshold=0.1,
extend_Tmap_space=True,
)
------Compute the full Similarity matrix if necessary------
Trying to set attribute `.uns` of view, copying.
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 355
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.969
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 1270
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.985
-----------Total used time: 477.8539500236511 s ------------
Fate bias¶
[32]:
selected_fates = [["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]]
cs.tl.fate_bias(
adata_3,
selected_fates=selected_fates,
source="transition_map",
pseudo_count=0,
sum_fate_prob_thresh=0.05,
)
cs.pl.fate_bias(
adata_3,
selected_fates=selected_fates,
source="transition_map",
plot_target_state=False,
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']

Identify ancestor populations¶
[33]:
x_emb = adata_orig_1.obsm["X_emb"][:, 0]
y_emb = adata_orig_1.obsm["X_emb"][:, 1]
index_2 = cs.hf.above_the_line(adata_orig_1.obsm["X_emb"], [0, -500], [-300, 1000])
index_2_2 = cs.hf.above_the_line(adata_orig_1.obsm["X_emb"], [-450, -600], [-600, 800])
final_mask = (index_2_2) & (~index_2) # & index_3 & index_4 & index_5 #mask_1 &
cs.pl.customized_embedding(x_emb, y_emb, final_mask)
[33]:
<AxesSubplot:>

[34]:
selected_fates = [["Meg", "Erythroid"], ["Baso", "Mast", "Eos"]]
cs.tl.progenitor(
adata_3,
selected_fates=selected_fates,
source="transition_map",
map_backward=True,
bias_threshold_A=0.6,
bias_threshold_B=0.3,
sum_fate_prob_thresh=0.05,
avoid_target_states=True,
)
cs.pl.progenitor(
adata_3, selected_fates=selected_fates, source="transition_map", mask=final_mask
)
Results saved at adata.obs['fate_map_transition_map_Meg_Erythroid']
Results saved at adata.obs['fate_map_transition_map_Baso_Mast_Eos']
Results saved at adata.obs['fate_bias_transition_map_Meg_Erythroid*Baso_Mast_Eos']
Results saved at adata.obs[f'progenitor_transition_map_Meg_Erythroid'] and adata.obs[f'diff_trajectory_transition_map_Meg_Erythroid']
Results saved at adata.obs[f'progenitor_transition_map_Baso_Mast_Eos'] and adata.obs[f'diff_trajectory_transition_map_Baso_Mast_Eos']


DGE analysis¶
[35]:
cell_group_A = np.array(adata_3.obs[f"progenitor_transition_map_Meg_Erythroid"])
cell_group_B = np.array(adata_3.obs[f"progenitor_transition_map_Baso_Mast_Eos"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata_3, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[36]:
# All, ranked, DGE genes for group A
dge_gene_A
[36]:
index | gene | Qvalue | mean_1 | mean_2 | ratio | |
---|---|---|---|---|---|---|
0 | 4 | Ctsg | 0.000000e+00 | 0.103370 | 2.567811 | -1.693123 |
1 | 3 | Emb | 0.000000e+00 | 0.186135 | 2.703714 | -1.642705 |
2 | 26 | Elane | 3.264509e-241 | 0.151518 | 2.530409 | -1.616299 |
3 | 2 | Plac8 | 0.000000e+00 | 1.096986 | 4.811569 | -1.470611 |
4 | 17 | Igfbp4 | 0.000000e+00 | 1.042382 | 3.969988 | -1.282990 |
... | ... | ... | ... | ... | ... | ... |
965 | 2772 | Rps29 | 2.823620e-02 | 2.905993 | 3.008932 | -0.037529 |
966 | 2343 | Rpl38 | 8.297305e-03 | 5.915980 | 6.061140 | -0.029967 |
967 | 2976 | Rpl5 | 4.697313e-02 | 4.714540 | 4.806636 | -0.023065 |
968 | 2544 | Ly6e | 1.556172e-02 | 2.882883 | 2.944137 | -0.022582 |
969 | 2705 | Isg20 | 2.485104e-02 | 0.330973 | 0.342714 | -0.012671 |
970 rows × 6 columns
[37]:
sub_predicted_upper = [
"Slc14a1",
"Plxnc1",
"Mef2c",
"Tnip3",
"Pbx1",
"Hbb-bt",
"Tmem176b",
"Car2",
"Pim2",
"Tsc22d1",
"Plcg2",
"Gmpr",
"Cd44",
"Irgm1",
"Hsd17b10",
"Dlk1",
]
sub_predicted_lower = [
"Ctsg",
"Emb",
"Ccl9",
"Igfbp7",
"Ap3s1",
"Gstm1",
"Ms4a3",
"Fcgr3",
"Mpo",
"Cebpb",
"Thy1",
"Anxa3",
"Sell",
"Pcsk9",
"Fcgr2b",
"Cebpa",
"Cxcr4",
]
[38]:
state_info = np.array(adata_3.obs["state_info"])
state_info[cell_group_A > 0] = "MegEr_Prog"
state_info[cell_group_B > 0] = "EMB_Prog"
[39]:
adata_3.obs["state_info"] = state_info
cs.pl.embedding(adata_3, color="state_info")
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
... storing 'state_info' as categorical

[40]:
cs.settings.set_figure_params(
format="pdf", figsize=[4, 3.5], dpi=200, fontsize=10, pointsize=3
)
gene_list = sub_predicted_upper + sub_predicted_lower
selected_fates = ["MegEr_Prog", "EMB_Prog"]
renames = ["MkEr prog", "MaBaEos prog"]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
adata_3,
selected_genes=gene_list,
selected_fates=selected_fates,
rename_fates=renames,
horizontal=True,
fig_width=6.5,
fig_height=2,
)

Reprogramming¶
The reprogramming dataset from Biddy, B. A. et al. Nature 564, 219–224 (2018). This dataset has multiple time points for both clonal and state measurements.
Key components:
Part I: Infer transition map using clones from all time points
Part II: Infer transition map from end-point clones
Part III: Infer transition map from state information alone
Part IV: Predict early fate bias on day 3
[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.set_figure_params(
format="png", dpi=75, fontsize=14
) # use png to reduce file size.
cs.settings.data_path = "CellTag_data" # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "CellTag_figure" # A relative path to save figures. If not existed before, create a new one.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:21.
Loading data¶
[3]:
adata_orig = cs.datasets.reprogramming()
[4]:
adata_orig = cs.pp.initialize_adata_object(adata_orig)
Time points with clonal info: ['Day12' 'Day15' 'Day21' 'Day28' 'Day6' 'Day9']
WARNING: Default ascending order of time points are: ['Day12' 'Day15' 'Day21' 'Day28' 'Day6' 'Day9']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[5]:
cs.hf.update_time_ordering(
adata_orig, updated_ordering=["Day6", "Day9", "Day12", "Day15", "Day21", "Day28"]
)
The cells are barcoded over 3 rounds during the entire differentiation process. There are multiple ways to assemble the barcodes on day 0, day 3, and day 13 into a clonal ID. Below, we provide three variants:
Concatenate barcodes on day 0 and day 13, as in the original analysis (
adata_orig.obsm['X_clone_Concat_D0D3']
, the default);Concatenate barcodes on day 0, day 3, and day 13 (
adata_orig.obsm['X_clone_Concat_D0D3D13']
);No concatenation; each cell has up to 3 barcodes (
adata_orig.obsm['X_clone_NonConcat_D0D3D13']
).
The last choice keeps the nested clonal structure in the data. You can choose any one of the clonal arrangement for downstream analysis, by setting adata_orig.obsm['X_clone']=adata_orig.obsm['X_clone_Concat_D0D3']
. The three clonal arrangements give very similar fate prediction.
[6]:
adata_orig
[6]:
AnnData object with n_obs × n_vars = 18803 × 28001
obs: 'time_info', 'state_info', 'reprogram_trajectory', 'failed_trajectory', 'Reference_fate_bias'
uns: 'clonal_time_points', 'data_des', 'state_info_colors', 'time_info_colors', 'time_ordering'
obsm: 'X_clone', 'X_clone_Concat_D0D3', 'X_clone_Concat_D0D3D13', 'X_clone_NonConcat_D0D3D13', 'X_emb', 'X_pca'
[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['Reprogrammed', 'Failed', 'Others']
Available time points: ['Day6' 'Day9' 'Day12' 'Day15' 'Day21' 'Day28']
Clonal time points: ['Day6' 'Day9' 'Day12' 'Day15' 'Day21' 'Day28']
[8]:
re_propressing = False
if re_propressing:
cs.pp.get_highly_variable_genes(adata_orig)
cs.pp.remove_cell_cycle_correlated_genes(
adata_orig, corr_threshold=0.03, confirm_change=True
)
cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
cs.pp.get_X_emb(adata_orig, n_neighbors=20, umap_min_dist=0.3)
# cs.pp.get_state_info(adata_orig,n_neighbors=20,resolution=0.5) # if this is changed, the cluster name used later will be wrong.
cs.pl.embedding(adata_orig, color="time_info")
[9]:
cs.pl.embedding(adata_orig, color="time_info")

[10]:
cs.pl.embedding(adata_orig, color="state_info")

[11]:
adata_orig.obs["reprogram_trajectory"] = adata_orig.obs["reprogram_trajectory"].astype(
int
)
cs.pl.embedding(adata_orig, color="reprogram_trajectory")

[12]:
adata_orig.obs["failed_trajectory"] = adata_orig.obs["failed_trajectory"].astype(int)
cs.pl.embedding(adata_orig, color="failed_trajectory")

Basic clonal analysis¶
[13]:
cs.tl.clonal_fate_bias(
adata_orig, selected_fate="Reprogrammed", alternative="two-sided"
)
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 1451/1451 [00:01<00:00, 792.99it/s]
Data saved at adata.uns['clonal_fate_bias']


[14]:
result = adata_orig.uns["clonal_fate_bias"]
result
[14]:
Clone_ID | Clone_size | Q_value | Fate_bias | clonal_fraction_in_target_fate | |
---|---|---|---|---|---|
0 | 1014 | 2382.0 | 1.000000e-20 | 20.000000 | 0.371537 |
1 | 611 | 1908.0 | 1.000000e-20 | 20.000000 | 0.265199 |
2 | 600 | 780.0 | 1.000000e-20 | 20.000000 | 0.014103 |
3 | 1188 | 289.0 | 1.000000e-20 | 20.000000 | 0.446367 |
4 | 545 | 283.0 | 9.769878e-15 | 14.010111 | 0.007067 |
... | ... | ... | ... | ... | ... |
1446 | 497 | 4.0 | 1.000000e+00 | -0.000000 | 0.000000 |
1447 | 496 | 1.0 | 1.000000e+00 | -0.000000 | 1.000000 |
1448 | 494 | 22.0 | 1.000000e+00 | -0.000000 | 0.227273 |
1449 | 504 | 1.0 | 1.000000e+00 | -0.000000 | 0.000000 |
1450 | 1450 | 3.0 | 1.000000e+00 | -0.000000 | 0.000000 |
1451 rows × 5 columns
[15]:
ids = result["Clone_ID"][8:10]
# ids=[324,313,446,716,367]
cs.pl.clones_on_manifold(adata_orig, selected_clone_list=ids)


Part I: Infer transition map using clones from all time points¶
Map inference¶
Running it the first time takes ~20 mins, ~17 mins of which are used to compute the similarity matrix. When it is run again, it only takes ~3 mins.
[16]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
clonal_time_points=["Day15", "Day21"],
later_time_point="Day28",
smooth_array=[15, 10, 5],
sparsity_threshold=0.2,
intraclone_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: Day15--------
Step 1: Select time points
Number of multi-time clones post selection: 179
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.929
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.995
--------Current initial time point: Day21--------
Step 1: Select time points
Number of multi-time clones post selection: 226
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.921
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.99
-----------Total used time: 165.4444100856781 s ------------
[17]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[17]:
['transition_map', 'intraclone_transition_map']
Save or load pre-computed data¶
This can be used to save adata with maps computed from different tools or parameters. Usually, different parameter choices will result in different data_des
, a prefix to identify the anndata. Saving an adata would print the data_des
, which can be used to load the corresponding adata.
[18]:
save_data = False
if save_data:
cs.hf.save_map(adata)
load_data = False
if load_data:
# file_path='CellTag_data/cospar_MultiTimeClone_Later_FullSpace0_t*Day15*Day21*Day28_adata_with_transition_map.h5ad'
adata = cs.hf.read(file_path)
[19]:
cs.hf.check_available_choices(adata)
WARNING: Pre-computed time_ordering does not include the right time points. Re-compute it!
Available transition maps: ['transition_map', 'intraclone_transition_map']
Available clusters: ['Reprogrammed', 'Failed', 'Others']
Available time points: ['Day15' 'Day21' 'Day28']
Clonal time points: ['Day15' 'Day21' 'Day28']
Plotting¶
Single-cell transitions¶
[20]:
selected_state_id_list = [1000, 3500, 6000, 5500]
cs.pl.single_cell_transition(
adata,
selected_state_id_list=selected_state_id_list,
source="transition_map",
map_backward=False,
)

Fate map¶
[21]:
cs.tl.fate_map(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
)
cs.pl.fate_map(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
plot_target_state=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']


Fate bias¶
[22]:
cs.tl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
method="norm-sum",
)
cs.pl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
plot_target_state=False,
background=False,
show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']

Identify differentially expressed genes¶
[23]:
selected_fates = ["Reprogrammed", "Failed"]
cs.tl.progenitor(
adata,
selected_fates,
source="transition_map",
sum_fate_prob_thresh=0.2,
bias_threshold_A=0.5,
bias_threshold_B=0.5,
)
cs.pl.progenitor(adata, selected_fates, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
Results saved at adata.obs[f'progenitor_transition_map_Reprogrammed'] and adata.obs[f'diff_trajectory_transition_map_Reprogrammed']
Results saved at adata.obs[f'progenitor_transition_map_Failed'] and adata.obs[f'diff_trajectory_transition_map_Failed']


Differential genes for two ancestor groups¶
[24]:
cell_group_A = np.array(adata.obs[f"progenitor_transition_map_Reprogrammed"])
cell_group_B = np.array(adata.obs[f"progenitor_transition_map_Failed"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[25]:
# All, ranked, DGE genes for group A
dge_gene_A
[25]:
index | gene | Qvalue | mean_1 | mean_2 | ratio | |
---|---|---|---|---|---|---|
0 | 35 | Col3a1 | 6.087368e-223 | 0.807060 | 5.999794 | -1.953668 |
1 | 57 | Col1a2 | 1.345045e-161 | 0.297653 | 3.076256 | -1.651340 |
2 | 41 | Spp1 | 3.197045e-207 | 10.833703 | 32.003143 | -1.479702 |
3 | 9 | Col1a1 | 0.000000e+00 | 2.036317 | 7.257215 | -1.443333 |
4 | 6 | Il6st | 0.000000e+00 | 2.108766 | 7.134142 | -1.387648 |
... | ... | ... | ... | ... | ... | ... |
4828 | 5607 | Tbk1 | 2.006752e-02 | 0.224493 | 0.224873 | -0.000448 |
4829 | 5172 | Sec23a | 1.005175e-02 | 0.344517 | 0.344837 | -0.000344 |
4830 | 6047 | Ints5 | 3.603720e-02 | 0.226673 | 0.226857 | -0.000216 |
4831 | 5737 | Taf2 | 2.431736e-02 | 0.278397 | 0.278525 | -0.000144 |
4832 | 6101 | Dgkq | 3.874214e-02 | 0.144835 | 0.144836 | -0.000002 |
4833 rows × 6 columns
Gene trend along the trajectory¶
The results are based on pre-computed dynamic trajectories from the preceding step. It is better to use the intraclone_transition_map
. First, show the expression dynamics along the failed trajectory:
[26]:
gene_name_list = ["Col1a2", "Apoa1", "Peg3", "Spint2", "Mettl7a1", "Cdh1"]
selected_fate = "Failed"
cs.pl.gene_expression_dynamics(
adata, selected_fate, gene_name_list, traj_threshold=0.1, invert_PseudoTime=True
)


Expression dynamics along the reprogramming trajectory:
[27]:
gene_name_list = ["Col1a2", "Apoa1", "Peg3", "Spint2", "Mettl7a1", "Cdh1"]
selected_fate = "Reprogrammed"
cs.pl.gene_expression_dynamics(
adata, selected_fate, gene_name_list, traj_threshold=0.1, invert_PseudoTime=True
)


Part II: Infer transition map from end-point clones¶¶
It takes ~12 mins to compute for the first time (excluding the time for computing similarity matrix); and ~5 mins later.
[28]:
adata = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig,
initial_time_points=["Day15", "Day21"],
later_time_point="Day28",
initialize_method="OT",
OT_cost="SPD",
smooth_array=[15, 10, 5],
sparsity_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day15--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 51.17336821556091
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.875
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.992
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.276
Finishing Joint Optimization, used time 80.46711325645447
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: Day21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 98.62810587882996
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.896
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.987
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.15
Finishing Joint Optimization, used time 162.67132997512817
-----------Total used time: 408.8195090293884 s ------------
[29]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[29]:
['transition_map', 'OT_transition_map']
Fate bias¶
[30]:
cs.tl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
method="norm-sum",
)
cs.pl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
plot_target_state=False,
background=False,
show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']

Part III: Infer transition map from state information alone¶
It takes ~5 mins
[31]:
adata = cs.tmap.infer_Tmap_from_state_info_alone(
adata_orig,
initial_time_points=["Day15", "Day21"],
later_time_point="Day28",
initialize_method="OT",
OT_cost="SPD",
smooth_array=[15, 10, 5],
sparsity_threshold=0.2,
use_full_Smatrix=True,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day15--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 42.353046894073486
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.877
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.994
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.301
Finishing Joint Optimization, used time 97.98227381706238
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: Day21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 125.77742171287537
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.897
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.984
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.29
Finishing Joint Optimization, used time 114.50927495956421
-----------Total used time: 393.1687672138214 s ------------
[32]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[32]:
['transition_map', 'OT_transition_map']
Fate bias¶
[33]:
cs.tl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
method="norm-sum",
)
cs.pl.fate_bias(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
plot_target_state=False,
background=False,
show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']

Part IV: Predict early fate bias on day 3¶
Load data¶
The above dataset only includes clonally-labeled states from day 6 to day 28. We load a dataset that has day-0 and day-3 states, which do not have clonal information.
[34]:
cs.settings.data_path = "CellTag_data_full" # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "CellTag_figure_full" # A relative path to save figures. If not existed before, create a new one.
adata_orig_1 = cs.datasets.reprogramming_Day0_3_28()
[35]:
adata_orig_1 = cs.pp.initialize_adata_object(adata_orig_1)
Time points with clonal info: ['Day28']
WARNING: Default ascending order of time points are: ['Day0' 'Day28' 'Day3']. If not correct, run cs.hf.update_time_ordering for correction.
WARNING: Please make sure that the count matrix adata.X is NOT log-transformed.
[36]:
cs.hf.update_time_ordering(adata_orig_1, updated_ordering=["Day0", "Day3", "Day28"])
[37]:
cs.pl.embedding(adata_orig_1, color="time_info")

[38]:
adata_orig_1
[38]:
AnnData object with n_obs × n_vars = 27718 × 28001
obs: 'predicted_doublet', 'row_counts', 'time_info', 'state_info', 'batch'
var: 'highly_variable-0'
uns: 'clonal_time_points', 'data_des', 'state_info_colors', 'time_info_colors', 'time_ordering'
obsm: 'X_clone', 'X_emb', 'X_emb_old', 'X_emb_v1', 'X_pca'
Infer transition map from end-point clones¶¶
It takes ~4 mins
[39]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig_1,
initial_time_points=["Day3"],
later_time_point=["Day28"],
initialize_method="HighVar",
smooth_array=[15, 10, 5],
max_iter_N=[1, 3],
sparsity_threshold=0.2,
use_full_Smatrix=True,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: Day3--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 1684 (t1); 1696 (t2). Common set: 960
Step b: convert the shared highly variable genes into clonal info------
92%|█████████▎| 888/960 [00:00<00:00, 1292.96it/s]
Total used genes=888 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.862
Finishing initialization using HighVar, used time 158.03640127182007
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=15
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=5
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.921
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.512
Finishing Joint Optimization, used time 126.52328705787659
-----------Total used time: 289.5595648288727 s ------------
Fate bias¶
[40]:
cs.tl.fate_map(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
)
cs.pl.fate_map(
adata,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
plot_target_state=True,
)
Use pre-computed fate map


[41]:
cs.tl.fate_bias(
adata_1,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
map_backward=True,
method="norm-sum",
)
cs.pl.fate_bias(
adata_1,
selected_fates=["Reprogrammed", "Failed"],
source="transition_map",
selected_times=["Day3"],
plot_target_state=False,
background=False,
show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']

Identify ancestor populations¶
[42]:
selected_fates = ["Reprogrammed", "Failed"]
cs.tl.progenitor(
adata_1,
selected_fates,
source="transition_map",
sum_fate_prob_thresh=0.1,
bias_threshold_A=0.5,
bias_threshold_B=0.5,
)
cs.pl.progenitor(adata_1, selected_fates, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_Reprogrammed']
Results saved at adata.obs['fate_map_transition_map_Failed']
Results saved at adata.obs['fate_bias_transition_map_Reprogrammed*Failed']
Results saved at adata.obs[f'progenitor_transition_map_Reprogrammed'] and adata.obs[f'diff_trajectory_transition_map_Reprogrammed']
Results saved at adata.obs[f'progenitor_transition_map_Failed'] and adata.obs[f'diff_trajectory_transition_map_Failed']


DGE analysis¶
[43]:
cell_group_A = np.array(adata_1.obs[f"progenitor_transition_map_Reprogrammed"])
cell_group_B = np.array(adata_1.obs[f"progenitor_transition_map_Failed"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata_1, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[44]:
# All, ranked, DGE genes for group A
dge_gene_A
[44]:
index | gene | Qvalue | mean_1 | mean_2 | ratio | |
---|---|---|---|---|---|---|
0 | 46 | Ptn | 1.714648e-105 | 1.406754 | 6.237863 | -1.588475 |
1 | 11 | Cnn1 | 9.708323e-167 | 0.914648 | 4.413725 | -1.499543 |
2 | 25 | Thy1 | 6.483313e-134 | 0.640454 | 2.997786 | -1.285106 |
3 | 81 | Penk | 4.283470e-73 | 0.532019 | 2.376845 | -1.140242 |
4 | 2 | Cd248 | 0.000000e+00 | 1.403314 | 4.151762 | -1.100041 |
... | ... | ... | ... | ... | ... | ... |
1514 | 3217 | Fads3 | 2.390519e-02 | 0.770464 | 0.798334 | -0.022533 |
1515 | 3353 | Tmem189 | 3.330180e-02 | 0.633464 | 0.658867 | -0.022264 |
1516 | 3590 | Rere | 4.966121e-02 | 0.542979 | 0.565672 | -0.021064 |
1517 | 1546 | Tuba1b | 5.925740e-06 | 5.398442 | 5.484683 | -0.019315 |
1518 | 3545 | Vegfa | 4.597810e-02 | 1.169517 | 1.198003 | -0.018820 |
1519 rows × 6 columns
Update cluster annotation on day 3
[45]:
x_emb = adata_1.obsm["X_emb"][:, 0]
state_info = np.array(adata_1.obs["state_info"]).astype(">U15")
sp_idx = (cell_group_A > 0) & (x_emb > 0)
state_info[sp_idx] = "Repro_Day3"
sp_idx = (cell_group_B > 0) & (x_emb > 0)
state_info[sp_idx] = "Failed_Day3"
adata_1.obs["state_info"] = state_info
cs.pl.embedding(adata_1, color="state_info")
/Users/shouwenwang/miniconda3/envs/CoSpar_test/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
... storing 'state_info' as categorical

[46]:
cs.settings.set_figure_params(fontsize=12)
gene_list = [
"FoxA1.HNF4a",
"Spint2",
"Apoa1",
"Hes6",
"S100a7a",
"Krt19",
"Clu",
"S100a13",
"Ppa1",
"Mgst3",
"Angptl4",
"Kng1",
"Igfbp2",
"Mt2",
"Areg",
"Rbp1",
"Anxa8",
"Gsto1",
"Ldhb",
"Mmp13",
"Cgref1",
"Fbln5",
"Gss",
"Col1a2",
"Dlk1",
"Peg3",
"Sfrp1",
"Mrc2",
"Col1a1",
]
selected_fates = ["Repro_Day3", "Failed_Day3"]
renames = ["Repro. prog.", "Failed prog."]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
adata_1,
selected_genes=gene_list,
selected_fates=selected_fates,
rename_fates=renames,
horizontal=True,
fig_width=6.5,
fig_height=2,
)

[47]:
gene_list = [
"FoxA1.HNF4a",
"Spint2",
"Apoa1",
"Hes6",
"S100a7a",
"Krt19",
"Clu",
"S100a13",
"Ppa1",
"Mgst3",
"Angptl4",
"Kng1",
"Igfbp2",
"Mt2",
"Areg",
"Rbp1",
"Anxa8",
"Gsto1",
"Ldhb",
"Mmp13",
"Cgref1",
"Fbln5",
"Gss",
"Col1a2",
"Dlk1",
"Peg3",
"Sfrp1",
"Mrc2",
"Col1a1",
]
selected_fates = ["Reprogrammed", "Failed"]
renames = ["Repro.", "Failed"]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
adata_1,
selected_genes=gene_list,
selected_fates=selected_fates,
rename_fates=renames,
horizontal=True,
fig_width=6.5,
fig_height=2,
)

Lung differentiation¶
The direct lung differentiation dataset from Hurley, K. et al. Cell Stem Cell (2020) doi:10.1016/j.stem.2019.12.009.
This dataset has multiple time points for the state manifold, but only one time point for the clonal observation on day 27.
[1]:
import cospar as cs
import numpy as np
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.data_path = "lung_data_paper" # A relative path to save data. If not existed before, create a new one.
cs.settings.figure_path = "lung_figure_paper" # A relative path to save figures. If not existed before, create a new one.
cs.settings.set_figure_params(
format="png", dpi=75, fontsize=14
) # use png to reduce file size.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:07.
Load data¶
[3]:
adata_orig = cs.datasets.lung()
[4]:
adata_orig
[4]:
AnnData object with n_obs × n_vars = 15832 × 26766
obs: 'time_info', 'state_info'
uns: 'clonal_time_points', 'data_des', 'state_info_colors'
obsm: 'X_clone', 'X_emb', 'X_pca'
Preprocessing (optional)¶
[5]:
preprocessing = False
if preprocessing:
cs.pp.get_highly_variable_genes(
adata_orig,
normalized_counts_per_cell=10000,
min_counts=3,
min_cells=3,
min_gene_vscore_pctl=80,
)
cs.pp.remove_cell_cycle_correlated_genes(
adata_orig, corr_threshold=0.03, confirm_change=False
) # optional step
cs.pp.get_X_pca(adata_orig, n_pca_comp=40)
# cs.pp.get_X_umap(adata_orig,n_neighbors=20,umap_min_dist=0.3) # we want to keep the original embedding
# cs.pp.get_state_info(adata_orig,leiden_resolution=0.5) # we want to keep the original state annotation
[6]:
cs.pl.embedding(adata_orig, color="state_info")

[7]:
cs.hf.check_available_choices(adata_orig)
Available transition maps: []
Available clusters: ['NLE', 'Endoderm', 'PNEC', 'Gut', 'iAEC2', 'Others']
Available time points: ['D27' 'pos_17' 'pos_21']
Clonal time points: ['D27']
Basic clonal analysis¶
[8]:
cs.pl.clones_on_manifold(
adata_orig,
selected_clone_list=[30, 198, 150, 58, 263],
color_list=["blue", "red"],
selected_times=["D27"],
)





[9]:
cs.tl.fate_coupling(
adata_orig,
source="X_clone",
selected_fates=["iAEC2", "PNEC", "Gut", "NLE", "Endoderm"],
)
cs.pl.fate_coupling(adata_orig, source="X_clone")
WARNING: ['Endoderm'] are ignored due to lack of cells
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>

[10]:
cs.pl.barcode_heatmap(
adata_orig,
selected_times="D27",
selected_fates=["iAEC2", "PNEC", "Gut", "NLE", "Endoderm"],
binarize=True,
)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>

[11]:
cs.tl.clonal_fate_bias(adata_orig, selected_fate="iAEC2", alternative="two-sided")
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 272/272 [00:00<00:00, 616.34it/s]
Data saved at adata.uns['clonal_fate_bias']


Infer transition map from end-point clones¶
It takes around 9 minutes to run it for the first time (the data has ~15000 cells). Later runs take < 3 mins.
[12]:
initial_time_points = ["pos_17", "pos_21"]
clonal_time_point = "D27"
adata = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig,
initial_time_points=["pos_17", "pos_21"],
later_time_point="D27",
initialize_method="HighVar",
HighVar_gene_pctl=80,
max_iter_N=[1, 3],
smooth_array=[20, 15, 10],
sparsity_threshold=0.2,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: pos_17--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 3689 (t1); 3729 (t2). Common set: 1075
Step b: convert the shared highly variable genes into clonal info------
Total used genes=863 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.923
Finishing initialization using HighVar, used time 75.68489861488342
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.959
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.413
Finishing Joint Optimization, used time 28.069818019866943
Trying to set attribute `.uns` of view, copying.
--------Current initial time point: pos_21--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 3794 (t1); 3729 (t2). Common set: 1152
Step b: convert the shared highly variable genes into clonal info------
Total used genes=1088 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.939
Finishing initialization using HighVar, used time 80.7336368560791
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.982
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.823
Finishing Joint Optimization, used time 29.09626293182373
-----------Total used time: 219.88412308692932 s ------------
Save pre-computed map¶
[13]:
save_data = False
if save_data:
cs.hf.save_map(adata)
# adata=cs.hf.read('lung_data_paper/Lung_OneTimeClone_t*pos_17*pos_21*D27_adata_with_transition_map.h5ad')
Plotting¶
[14]:
cs.tl.fate_map(adata, source="transition_map", map_backward=True)
cs.pl.fate_map(
adata, selected_fates=["iAEC2"], source="transition_map", plot_target_state=True
)
Results saved at adata.obs['fate_map_transition_map_NLE']
Results saved at adata.obs['fate_map_transition_map_Endoderm']
Results saved at adata.obs['fate_map_transition_map_PNEC']
Results saved at adata.obs['fate_map_transition_map_Gut']
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_Others']

[15]:
# cs.settings.set_figure_params(figsize=(4,3.5),format='png',fontsize=17)
selected_fates = ["iAEC2", ["PNEC", "Gut", "NLE", "Endoderm"]]
cs.tl.fate_bias(
adata, selected_fates, source="transition_map", map_backward=True, method="norm-sum"
)
cs.pl.fate_bias(
adata,
selected_fates=selected_fates,
source="transition_map",
plot_target_state=False,
background=False,
show_histogram=True,
)
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs['fate_bias_transition_map_iAEC2*PNEC_Gut_NLE_Endoderm']

DGE analysis¶
[16]:
selected_fates = ["iAEC2", ["PNEC", "Gut", "NLE", "Endoderm"]]
cs.tl.progenitor(
adata,
selected_fates,
source="transition_map",
sum_fate_prob_thresh=0,
bias_threshold_A=0.6,
bias_threshold_B=0.3,
)
cs.pl.progenitor(
adata, selected_fates, source="transition_map", selected_times=["pos_17"]
)
Results saved at adata.obs['fate_map_transition_map_iAEC2']
Results saved at adata.obs['fate_map_transition_map_PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs['fate_bias_transition_map_iAEC2*PNEC_Gut_NLE_Endoderm']
Results saved at adata.obs[f'progenitor_transition_map_iAEC2'] and adata.obs[f'diff_trajectory_transition_map_iAEC2']
Results saved at adata.obs[f'progenitor_transition_map_PNEC_Gut_NLE_Endoderm'] and adata.obs[f'diff_trajectory_transition_map_PNEC_Gut_NLE_Endoderm']


[17]:
import numpy as np
cell_group_A = np.array(adata.obs["progenitor_transition_map_iAEC2"])
cell_group_B = np.array(adata.obs["progenitor_transition_map_PNEC_Gut_NLE_Endoderm"])
dge_gene_A, dge_gene_B = cs.tl.differential_genes(
adata, cell_group_A=cell_group_A, cell_group_B=cell_group_B, FDR_cutoff=0.05
)
[18]:
selected_genes = dge_gene_A["gene"][:2]
cs.pl.gene_expression_on_manifold(
adata, selected_genes=selected_genes, color_bar=True, savefig=False
)


Update cluster annotation
[19]:
state_info = np.array(adata.obs["state_info"]).astype(">U15")
state_info[cell_group_A > 0] = "iAEC2 prog."
state_info[cell_group_B > 0] = "Non-iAEC2 prog."
adata.obs["state_info"] = state_info
[20]:
cs.settings.set_figure_params(fontsize=10)
state_info = np.array(adata.obs["state_info"])
gene_list = [
"KLF9",
"CEBPD",
"YBX3",
"ATF4",
"NKX2-1",
"IRX3",
"SLC7A11",
"LIFR", #'HSPA5',
"SFRP1",
"CD55",
"KIT",
"CPM", #'LDLR', 'GFRA1',
"PITX1",
"MSX2",
"MYCN",
"SOX9",
"SOX11",
"ID2",
"PBX1",
"FOXA1",
"SP5",
"TMPO",
"DEK",
"HMGB1",
"PARP1",
] # , 'TFDP1', 'HNRNPD', 'MYBL2']
selected_fates = ["iAEC2 prog.", "Non-iAEC2 prog."]
renames = ["iAEC2 prog.", "Non-iAEC2 prog."]
gene_expression_matrix = cs.pl.gene_expression_heatmap(
adata,
selected_genes=gene_list,
selected_fates=selected_fates,
rename_fates=renames,
horizontal=True,
fig_width=6.5,
fig_height=2,
)

Synthetic bifurcation¶
We simulated a differentiation process over a bifurcation fork. In this simulation, cells are barcoded in the beginning, and the barcodes remain un-changed. In the simulation we resample clones over time. The first sample is obtained after 5 cell cycles post labeling. The dataset has two time points. See Wang et al. (2021) for more details.
[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2
cs.settings.set_figure_params(
format="png", dpi=75, fontsize=14
) # use png to reduce file size.
Running cospar 0.2.0 (python 3.8.12) on 2022-02-07 22:07.
[3]:
cs.settings.data_path = (
"test" # A relative path to save data. If not existed before, create a new one.
)
cs.settings.figure_path = (
"test" # A relative path to save figures. If not existed before, create a new one.
)
Loading data¶
[4]:
adata_orig = cs.datasets.synthetic_bifurcation()
[5]:
adata_orig.obsm["X_clone"]
[5]:
<2474x52 sparse matrix of type '<class 'numpy.float64'>'
with 2474 stored elements in Compressed Sparse Row format>
[6]:
cs.pl.embedding(adata_orig, color="state_info")

[7]:
cs.pl.embedding(adata_orig, color="time_info")

Basic clonal analysis¶
[8]:
cs.pl.clones_on_manifold(adata_orig, selected_clone_list=[1])

[9]:
cs.tl.fate_coupling(adata_orig, source="X_clone")
cs.pl.fate_coupling(adata_orig, source="X_clone")
Results saved as dictionary at adata.uns['fate_coupling_X_clone']
[9]:
<AxesSubplot:title={'center':'source: X_clone'}>

[10]:
cs.pl.barcode_heatmap(adata_orig, selected_times="2", color_bar=True, binarize=True)
Data saved at adata.uns['barcode_heatmap']
[10]:
<AxesSubplot:>

[11]:
cs.tl.clonal_fate_bias(adata_orig, selected_fate="Fate_A", alternative="two-sided")
cs.pl.clonal_fate_bias(adata_orig)
100%|██████████| 52/52 [00:00<00:00, 254.70it/s]
Data saved at adata.uns['clonal_fate_bias']


Transition map inference¶
Transition map from multiple clonal time points.¶
[12]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata_orig,
clonal_time_points=["1", "2"],
smooth_array=[10, 10, 10],
CoSpar_KNN=20,
sparsity_threshold=0.2,
)
------Compute the full Similarity matrix if necessary------
----Infer transition map between neighboring time points-----
Step 1: Select time points
Number of multi-time clones post selection: 52
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=10
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.96
-----------Total used time: 1.4379048347473145 s ------------
[13]:
cs.tl.fate_bias(adata, selected_fates=["Fate_A", "Fate_B"], source="transition_map")
cs.pl.fate_bias(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_XXX']
Results saved at adata.obs['fate_bias_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias

Transition map from a single clonal time point¶
[14]:
adata = cs.tmap.infer_Tmap_from_one_time_clones(
adata_orig,
initial_time_points=["1"],
later_time_point="2",
initialize_method="OT",
smooth_array=[10, 10, 10],
sparsity_threshold=0.2,
compute_new=False,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 1--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Load pre-computed shortest path distance matrix
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 3.3392720222473145
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=10
Iteration 2, Use smooth_round=10
Iteration 3, Use smooth_round=10
Convergence (CoSpar, iter_N=3): corr(previous_T, current_T)=0.98
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.094
Finishing Joint Optimization, used time 1.3652610778808594
-----------Total used time: 4.814281940460205 s ------------
[15]:
cs.tl.fate_bias(adata, selected_fates=["Fate_A", "Fate_B"], source="transition_map")
cs.pl.fate_bias(adata, source="transition_map")
Results saved at adata.obs['fate_map_transition_map_XXX']
Results saved at adata.obs['fate_bias_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias

Transition amp from only the clonal information¶
[16]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata_orig)
cs.tl.fate_bias(
adata, selected_fates=["Fate_A", "Fate_B"], source="clonal_transition_map"
)
cs.pl.fate_bias(adata, source="clonal_transition_map")
Infer transition map between neighboring time points.
Number of multi-time clones post selection: 52
Use all clones (naive method)
Results saved at adata.obs['fate_map_clonal_transition_map_XXX']
Results saved at adata.obs['fate_bias_clonal_transition_map_fateA*fateB']
selected_fates not specified. Using the first available pre-computed fate_bias

[ ]:
Simulate differentiation process¶
[1]:
import cospar as cs
cs.settings.verbosity = 1
cs.settings.set_figure_params(
format="pdf", figsize=[4, 3.5], dpi=100, fontsize=14, pointsize=10
)
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()
The bifurcation model with clonal dispersion¶
[2]:
L = 10
adata = cs.simulate.bifurcation_model(t1=4, M=50, L=L)
Load existing data
[3]:
cs.pl.clones_on_manifold(adata, selected_clone_list=[1, 10, 20])



Using the observed clonal data, no fancy analysis¶
[4]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata, method="naive")
Tmap = adata.uns["clonal_transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
correlation_naive = (
cs.simulate.quantify_correlation_with_ground_truth_fate_bias_BifurcationModel(
Tmap, state_info, cell_id_t1, cell_id_t2
)
)
print(
f"Fate bias correlation from the observed transition map: {correlation_naive:.3f}"
)
Fate bias correlation from the observed transition map: 0.588
[5]:
cs.tl.fate_bias(adata, source="clonal_transition_map", selected_fates=["1", "0"])
cs.pl.fate_bias(adata, source="clonal_transition_map", selected_fates=["1", "0"])

Apply the CoSpar method¶
[6]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata, smooth_array=[10, 10, 10], compute_new=True
)
Tmap = adata.uns["transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
correlation_cospar = (
cs.simulate.quantify_correlation_with_ground_truth_fate_bias_BifurcationModel(
Tmap, state_info, cell_id_t1, cell_id_t2
)
)
print(
f"Fate bias correlation from the predicted transition map: {correlation_cospar:.3f}"
)
Fate bias correlation from the predicted transition map: 0.945
[7]:
cs.tl.fate_bias(adata, source="transition_map", selected_fates=["1", "0"])
cs.pl.fate_bias(adata, source="transition_map", selected_fates=["1", "0"])

Linear differentiation under barcode homoplasy¶
[8]:
adata = cs.simulate.linear_differentiation_model(
Nt1=400, progeny_N=1, used_clone_N=100, always_simulate_data=True
)
Generate new data
Time elapsed for generating clonal data: 0.2516930103302002
Generate mixing matrix
[9]:
cs.pl.clones_on_manifold(adata, selected_clone_list=[1, 10, 20])



Using the observed clonal data, no fancy method¶
[10]:
adata = cs.tmap.infer_Tmap_from_clonal_info_alone(adata, method="naive")
Tmap = adata.uns["clonal_transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
X_t1 = adata.obsm["X_orig"][cell_id_t1]
X_t2 = adata.obsm["X_orig"][cell_id_t2]
TPR_naive = cs.simulate.quantify_transition_peak_TPR_LinearDifferentiation(
Tmap, X_t1, X_t2
)
print(f"True positive rate for the observed transition map: {TPR_naive:.3f}")
True positive rate for the observed transition map: 0.300

Apply CoSpar¶
[11]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
adata, smooth_array=[10, 10, 10], compute_new=True
)
Tmap = adata.uns["transition_map"]
state_info = adata.obs["state_info"]
cell_id_t1 = adata.uns["Tmap_cell_id_t1"]
cell_id_t2 = adata.uns["Tmap_cell_id_t2"]
X_t1 = adata.obsm["X_orig"][cell_id_t1]
X_t2 = adata.obsm["X_orig"][cell_id_t2]
TPR_cospar = cs.simulate.quantify_transition_peak_TPR_LinearDifferentiation(
Tmap, X_t1, X_t2
)
print(f"True positive rate for the predicted transition map: {TPR_cospar:.3f}")
True positive rate for the predicted transition map: 0.833
