Transition map inference

Below, we illustrate transition map inference from different methods in cospar. Once computed, they can be analyzed and visualized later.

[1]:
import cospar as cs
[2]:
cs.logging.print_version()
cs.settings.verbosity = 2  # range: 0 (error),1 (warning),2 (info),3 (hint).
cs.settings.set_figure_params(
    format="png", figsize=[4, 3.5], dpi=75, fontsize=14, pointsize=3
)
Running cospar 0.2.0 (python 3.8.12) on 2022-02-08 19:17.
[3]:
# Each dataset should have its folder to avoid conflicts.
cs.settings.data_path = "data_cospar"
cs.settings.figure_path = "fig_cospar"
cs.hf.set_up_folders()

Load an existing dataset. (If you have pre-processed data, you can load it with cs.hf.read(file_name).)

[4]:
adata_orig = cs.datasets.hematopoiesis_subsampled()

Transition map from clones across multiple time points

This method utilizes clones with multiple time points. It is the core algorithm for the CoSpar package. The first time it runs, it will compute the similarity matrix at different smooth depth and save them. Then, it will infer the transition map. Some key parameters:

  • clonal_time_points (list; default: all): List of time points with clonal data that you want to perform inference.

  • later_time_point (str; default: None): If not set, CoSpar learns a transition map between neighboring time points specified by clonal_time_points, i.e., transitions from day 2 to day 4, and from day 4 to day 6 if clonal_time_points=[‘2’,‘4’,‘6’]. Ohterwise, it learns transitions from any of clonal_time_points to the later_time_point.

  • smooth_array (list; default: [15,10,5]): List of smooth depth at initial runs of iteration. Suppose that it has a length N. For iteration n<N, the n-th entry of smooth_array determines the kernel exponent to build the S matrix at the n-th iteration. When n>N, we use the last entry of smooth_array to compute the S matrix. We recommend starting with more smoothing depth and gradually reduce the depth, as inspired by simulated annealing. Data with higher clonal dispersion should start with higher smoothing depth. The final depth should depend on the manifold itself. For fewer cells, it results in a small KNN graph, and a small final depth should be used. We recommend to use a number at the multiple of 5 for computational efficiency i.e., smooth_array=[20, 15, 10, 5], or [20,15,10].

  • sparsity_threshold (float; default: 0.1): sparsity threshold to remove spurious transitions in the updated transition map, in the range [0,1].

  • max_iter_N (int; default: 5): maximum number of iterations in refining the transition map. We found that 3 iterations are sufficient for tested datasets.

  • epsilon_converge (float; default: 0.05): Iteration is considered converged when the correlation R between maps from consecutive iteration satisfies: R>1-epsilon_converge.

It takes ~3 mins to run for the first time, and ~20 s for later runs.

[5]:
adata = cs.tmap.infer_Tmap_from_multitime_clones(
    adata_orig,
    clonal_time_points=["2", "4"],
    later_time_point="6",
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
    intraclone_threshold=0.2,
    max_iter_N=10,
    epsilon_converge=0.01,
)
Trying to set attribute `.uns` of view, copying.
------Compute the full Similarity matrix if necessary------
------Infer transition map between initial time points and the later time one------
--------Current initial time point: 2--------
Step 1: Select time points
Number of multi-time clones post selection: 185
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.942
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
--------Current initial time point: 4--------
Step 1: Select time points
Number of multi-time clones post selection: 500
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.956
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
-----------Total used time: 19.377027988433838 s ------------

We can see that after this step, we have two maps: 'transition_map' and 'intraclone_transition_map'. Both of them can be used for downstream analysis.

[6]:
cs.hf.check_available_map(adata)
adata.uns["available_map"]
[6]:
['transition_map', 'intraclone_transition_map']

Transition map from end-point clones, or without clones

After initializing the map by either the OT or HighVar method, We jointly infer the likely clonal ancestors and the corresponding transition map between given time points. You need to choose the initialization method and set the corresponding parameters.

  • initialize_method=OT: using optimal transport. Key parameters:

    • OT_epsilon (float; default: 0.02): the entropic regularization

    • OT_cost ({GED, SPD}, default: SPD): method for computing the cost function. GED uses simple gene expression distances; it is faster. SPD uses the shortest path distances; it is slower but often more accurate.

  • initialize_method=HighVar: HighVar assumes that cells similar in gene expression across time points share clonal origin. It converts highly variable genes into pseudo multi-time clones and runs cs.tmap.infer_Tmap_from_multitime_clones to construct the map. We find HighVar performs better than OT, especially when there are large differentiation effects over the observed time window, or batch effects.

    • HighVar_gene_pctl (int; default: 85): percentile threshold to select highly variable genes. Range: [0,100]. A higher value selects more variable genes.

  • max_iter_N (list; default: [3,5]): a list of two entries, representing the maximum iteration number for joint optimization and CoSpar core function, respectively.

  • epsilon_converge (list; default: [0.05,0.05]), a list of two entries, indicating the map convergence threshold for joint optimization and CoSpar core function, respectively.

Infer transition map from a single clonal time point (using OT), please run ( ~ 1 min):

[7]:
adata_1 = cs.tmap.infer_Tmap_from_one_time_clones(
    adata_orig,
    initial_time_points=["4"],
    later_time_point="6",
    initialize_method="OT",
    OT_cost="GED",
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
)
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use OT method for initialization-------
Finishing computing gene expression distance, used time 6.2248289585113525
Compute new custom OT matrix
Use uniform growth rate
OT solver: duality_gap
Finishing computing optial transport map, used time 36.00060701370239
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.961
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.275
Finishing Joint Optimization, used time 11.890204191207886
-----------Total used time: 54.800551891326904 s ------------
[8]:
cs.hf.check_available_map(adata_1)
adata_1.uns["available_map"]
[8]:
['transition_map', 'OT_transition_map']

Infer transition map from state information alone (using HighVar): (it takes ~ 2 mins)

[9]:
adata_2 = cs.tmap.infer_Tmap_from_state_info_alone(
    adata_orig,
    initial_time_points=["4"],
    later_time_point="6",
    initialize_method="HighVar",
    HighVar_gene_pctl=85,
    max_iter_N=[10, 10],
    epsilon_converge=[0.01, 0.01],
    smooth_array=[20, 15, 10, 5],
    sparsity_threshold=0.1,
)
Step I: Generate pseudo clones where each cell has a unique barcode-----
Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
Step II: Perform joint optimization-----
--------Infer transition map between initial time points and the later time one-------
--------Current initial time point: 4--------
Step 0: Pre-processing and sub-sampling cells-------
Step 1: Use the HighVar method for initialization-------
Step a: find the commonly shared highly variable genes------
Highly varable gene number: 2136 (t1); 2245 (t2). Common set: 978
Step b: convert the shared highly variable genes into clonal info------
 87%|████████▋ | 853/978 [00:00<00:00, 5630.14it/s]
Total used genes=853 (no cells left)
Step c: compute the transition map based on clonal info from highly variable genes------
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.966
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.988
Iteration 6, Use smooth_round=5
Convergence (CoSpar, iter_N=6): corr(previous_T, current_T)=0.996
Finishing initialization using HighVar, used time 16.57118797302246
Step 2: Jointly optimize the transition map and the initial clonal states-------
-----JointOpt Iteration 1: Infer initial clonal structure
-----JointOpt Iteration 1: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.934
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.998
Convergence (JointOpt, iter_N=1): corr(previous_T, current_T)=0.784
-----JointOpt Iteration 2: Infer initial clonal structure
-----JointOpt Iteration 2: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=2): corr(previous_T, current_T)=0.966
-----JointOpt Iteration 3: Infer initial clonal structure
-----JointOpt Iteration 3: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.994
Convergence (JointOpt, iter_N=3): corr(previous_T, current_T)=0.97
-----JointOpt Iteration 4: Infer initial clonal structure
-----JointOpt Iteration 4: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=4): corr(previous_T, current_T)=0.968
-----JointOpt Iteration 5: Infer initial clonal structure
-----JointOpt Iteration 5: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.938
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=5): corr(previous_T, current_T)=0.973
-----JointOpt Iteration 6: Infer initial clonal structure
-----JointOpt Iteration 6: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=6): corr(previous_T, current_T)=0.969
-----JointOpt Iteration 7: Infer initial clonal structure
-----JointOpt Iteration 7: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.935
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.995
Convergence (JointOpt, iter_N=7): corr(previous_T, current_T)=0.971
-----JointOpt Iteration 8: Infer initial clonal structure
-----JointOpt Iteration 8: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=8): corr(previous_T, current_T)=0.976
-----JointOpt Iteration 9: Infer initial clonal structure
-----JointOpt Iteration 9: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.936
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.997
Convergence (JointOpt, iter_N=9): corr(previous_T, current_T)=0.977
-----JointOpt Iteration 10: Infer initial clonal structure
-----JointOpt Iteration 10: Update the transition map by CoSpar
Load pre-computed similarity matrix
Iteration 1, Use smooth_round=20
Iteration 2, Use smooth_round=15
Iteration 3, Use smooth_round=10
Iteration 4, Use smooth_round=5
Convergence (CoSpar, iter_N=4): corr(previous_T, current_T)=0.937
Iteration 5, Use smooth_round=5
Convergence (CoSpar, iter_N=5): corr(previous_T, current_T)=0.996
Convergence (JointOpt, iter_N=10): corr(previous_T, current_T)=0.975
Finishing Joint Optimization, used time 242.97642278671265
-----------Total used time: 260.5958061218262 s ------------
[10]:
cs.hf.check_available_map(adata_2)
adata_2.uns["available_map"]
[10]:
['transition_map', 'HighVar_transition_map']

Transition map from clonal data alone, without state information

As a comparision, we can also learn the transition map using clonal information alone:

[11]:
# We use the Weinreb method, which selects uni-potent clones based on a pre-defined list of fates.
selected_fates = [
    "Baso",
    "Ccr7_DC",
    "Eos",
    "Erythroid",
    "Lymphoid",
    "Mast",
    "Meg",
    "Monocyte",
    "Neu_Mon",
    "Neutrophil",
    "pDC",
]

adata_3 = cs.tmap.infer_Tmap_from_clonal_info_alone(
    adata_orig, method="weinreb", later_time_point="6", selected_fates=selected_fates
)
Trying to set attribute `.uns` of view, copying.
Infer transition map between initial time points and the later time point.
--------Current initial time point: 2--------
Number of multi-time clones post selection: 185
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.6162162162162163
--------Current initial time point: 4--------
Number of multi-time clones post selection: 500
Use only uni-potent clones (weinreb et al., 2020)
Used uni-potent clone fraction 0.538
[12]:
cs.hf.check_available_map(adata_3)
adata_3.uns["available_map"]
[12]:
['clonal_transition_map']

Saving maps

[13]:
cs.hf.save_map(adata)
Saved file: data_cospar/LARRY_sp500_ranking1_MultiTimeClone_Later_FullSpace0_t*2*4*6_adata_with_transition_map.h5ad

It can be loaded again with adata=cs.hf.read(file_path)