cospar.tmap.infer_Tmap_from_one_time_clones

cospar.tmap.infer_Tmap_from_one_time_clones(adata_orig, initial_time_points=None, later_time_point=None, initialize_method='OT', OT_epsilon=0.02, OT_dis_KNN=5, OT_cost='SPD', HighVar_gene_pctl=85, padding_X_clone=False, normalization_mode=1, sparsity_threshold=0.2, CoSpar_KNN=20, use_full_Smatrix=True, smooth_array=[15, 10, 5], trunca_threshold=[0.001, 0.01], compute_new=False, max_iter_N=[1, 5], epsilon_converge=[0.05, 0.05], use_fixed_clonesize_t1=False, sort_clone=1, save_subset=True, use_existing_KNN_graph=False)

Infer a transition map from clonal data observed at a single time point.

We jointly infer a transition map and the initial clonal observation through iteration. The inferred map connects each of the initial time points ['day_1', 'day_2', …] to the time point with clonal observation. The transition map is initialized with either the OT method or the HighVar method.

Summary

  • Parameters relevant for cell state selection: initial_time_points, later_time_point.

  • Initialization methods:

    • 'OT': optimal transport based method. Key parameters: OT_epsilon, OT_dis_KNN. See infer_Tmap_from_optimal_transport().

    • 'HighVar': a customized approach, assuming that cells similar in gene expression across time points share a clonal origin. Key parameter: HighVar_gene_pctl. See infer_Tmap_from_HighVar().

  • Key parameters for the joint optimization itself (which relies on coherent sparse optimization): smooth_array, CoSpar_KNN, sparsity_threshold. See refine_Tmap_through_joint_optimization().

Parameters
adata_orig : AnnData object

It is assumed to be preprocessed and to contain multiple time points.

initial_time_points : list, optional (default: all time points)

List of initial time points to include in the transition map, e.g., ['day_1', 'day_2']. Entries should be consistent with adata.obs['time_info'].

later_time_point : str, optional (default: the last time point)

The time point with clonal observation. Its value should be consistent with adata.obs['time_info'].

initialize_method : str, optional (default: 'OT')

Method to initialize the transition map from state information. Choice: {'OT', 'HighVar'}.

OT_epsilon : float, optional (default: 0.02)

Entropic regularization strength; must be > 0. A larger value increases the uncertainty of the transition. Relevant when initialize_method='OT'.
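
The effect of OT_epsilon can be illustrated with a minimal Sinkhorn sketch of entropy-regularized optimal transport in NumPy. It assumes uniform marginals and a toy cost scaled to [0, 1]; the names sinkhorn_coupling and mean_row_entropy are hypothetical and not part of cospar, which may scale its costs or marginals differently. A larger epsilon should yield a more diffuse (higher-entropy) map.

```python
import numpy as np


def sinkhorn_coupling(cost, epsilon, n_iter=1000):
    """Entropy-regularized OT coupling between uniform marginals (sketch).

    Assumes `cost` is already scaled to roughly [0, 1], so that epsilon
    is on a scale comparable to OT_epsilon=0.02.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on initial-time cells
    b = np.full(m, 1.0 / m)  # uniform mass on later-time cells
    K = np.exp(-cost / epsilon)  # Gibbs kernel; smaller epsilon -> sharper
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):  # alternating Sinkhorn scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def mean_row_entropy(P):
    """Average entropy of the row-normalized coupling (a transition map)."""
    T = P / P.sum(axis=1, keepdims=True)
    return float(-(T * np.log(np.where(T > 0, T, 1.0))).sum(axis=1).mean())


rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))        # toy "initial" cells
y = rng.normal(size=(40, 2)) + 1.0  # toy "later" cells
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost /= cost.max()

P_sharp = sinkhorn_coupling(cost, epsilon=0.02)
P_diffuse = sinkhorn_coupling(cost, epsilon=0.5)
print(mean_row_entropy(P_sharp), mean_row_entropy(P_diffuse))
```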

OT_dis_KNN : int, optional (default: 5)

Number of nearest neighbors used to construct the KNN graph for computing the shortest-path distance. Relevant when initialize_method='OT'.

OT_cost : str, optional (default: 'SPD'). Options: {'GED', 'SPD'}

The cost metric: gene expression distance (GED) or shortest-path distance (SPD). GED is much faster, but SPD is more accurate. However, CoSpar is robust to the initialization.
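
To illustrate why OT_dis_KNN and OT_cost matter, the hypothetical sketch below compares a straight-line distance (standing in for GED) with a shortest-path distance on a symmetrized KNN graph (standing in for SPD) for cells lying on a curved manifold. The graph construction and the Floyd-Warshall solver are assumptions chosen for brevity, not necessarily what cospar does internally.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between rows of X (a GED-like cost)."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))


def knn_shortest_path(X, k):
    """Shortest-path distance on a symmetrized KNN graph (an SPD-like cost)."""
    D = pairwise_distances(X)
    n = len(X)
    W = np.full((n, n), np.inf)  # inf marks "no edge"
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1 : k + 1]  # position 0 is the cell itself
        W[i, nbrs] = D[i, nbrs]
        W[nbrs, i] = D[i, nbrs]  # keep the graph undirected
    # Floyd-Warshall: relax every path through intermediate node m
    for m in range(n):
        W = np.minimum(W, W[:, [m]] + W[[m], :])
    return W


# Cells on a half circle: the manifold bends, so the straight-line
# distance underestimates how far apart the two ends really are.
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
ged = pairwise_distances(X)
spd = knn_shortest_path(X, k=5)
print(ged[0, -1], spd[0, -1])  # chord across the circle vs. path along the arc
```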

HighVar_gene_pctl : int, optional (default: 85)

Percentile threshold for selecting the highly variable genes used to construct pseudo-clones. A higher value selects fewer, more highly variable genes. Range: [0, 100]. Relevant when initialize_method='HighVar'.
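
A minimal sketch of the percentile-based gene selection behind HighVar_gene_pctl, assuming gene variability is measured as per-gene variance across cells (cospar may use a different variability score). The helper select_high_var_genes is hypothetical, and the subsequent pseudo-clone construction is not shown.

```python
import numpy as np


def select_high_var_genes(X, pctl):
    """Indices of genes whose variance exceeds the `pctl`-th percentile."""
    gene_var = X.var(axis=0)
    cutoff = np.percentile(gene_var, pctl)
    return np.flatnonzero(gene_var > cutoff)


rng = np.random.default_rng(1)
# 200 cells x 100 genes; gene j is scaled by j + 1, so variances differ
X = rng.normal(size=(200, 100)) * np.arange(1, 101)
genes_85 = select_high_var_genes(X, 85)
genes_50 = select_high_var_genes(X, 50)
print(len(genes_85), len(genes_50))  # 15 50
```

Raising the percentile from 50 to 85 shrinks the selection from 50 genes to the 15 most variable ones.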

padding_X_clone : bool, optional (default: False)

If True, select cells at later_time_point that lack any clonal label and assign each of them a unique clonal label. This adds artificial clonal data, but makes the best use of the state information, especially when the data contain very few clonal barcodes.
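
The padding step can be sketched as follows, using a dense cell-by-clone NumPy array for readability (the real adata.obsm['X_clone'] may be stored as a sparse matrix). The helper pad_unlabeled_cells is hypothetical; it appends one single-cell clone per unlabeled cell at later_time_point and leaves all other cells untouched.

```python
import numpy as np


def pad_unlabeled_cells(X_clone, time_info, later_time_point):
    """Give each unlabeled later-time cell its own one-cell clone (sketch)."""
    X_clone = np.asarray(X_clone)
    time_info = np.asarray(time_info)
    unlabeled = (X_clone.sum(axis=1) == 0) & (time_info == later_time_point)
    idx = np.flatnonzero(unlabeled)
    pad = np.zeros((X_clone.shape[0], len(idx)), dtype=X_clone.dtype)
    pad[idx, np.arange(len(idx))] = 1  # one new clone column per such cell
    return np.hstack([X_clone, pad])


# rows = cells, columns = clones
X_clone = np.array([[1, 0], [0, 0], [0, 1], [0, 0], [0, 0]])
time_info = ["day_1", "day_2", "day_2", "day_2", "day_1"]
padded = pad_unlabeled_cells(X_clone, time_info, "day_2")
print(padded.shape)  # (5, 4)
```

The last cell is also unlabeled, but it belongs to 'day_1', so it receives no artificial clone.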

normalization_mode : int, optional (default: 1)

Normalization method. Choice: {0, 1}. 0: single-cell normalization; 1: clonal normalization. Clonal normalization suppresses the contribution of large clones and is much more robust.

smooth_array : list, optional (default: [15,10,5])

List of smoothing depths for the initial iterations. Suppose it has length N. For iteration n < N (counting from 0), the n-th entry of smooth_array determines the kernel exponent used to build the S matrix at that iteration; for n ≥ N, the last entry of smooth_array is used. We recommend starting with a larger smoothing depth and gradually reducing it, as inspired by simulated annealing. Data with higher clonal dispersion should start with a higher smoothing depth. The final depth should depend on the manifold itself: fewer cells result in a smaller KNN graph, so a smaller final depth should be used. We recommend using multiples of 5 for computational efficiency, e.g., smooth_array=[20, 15, 10, 5] or [20, 15, 10].
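
A minimal sketch of how a smoothing schedule and a kernel exponent could produce the S matrix, assuming S = K^depth, where K is the row-normalized KNN adjacency with self-loops, and assuming iterations are counted from 0. Both helpers are hypothetical, and cospar's exact construction of the S matrix may differ. On a path-shaped graph, a depth of d spreads each row over d hops, so deeper smoothing couples more distant cells.

```python
import numpy as np


def smoothing_depth(n, smooth_array):
    """Depth for (0-indexed) iteration n: the n-th entry, then the last one."""
    return smooth_array[n] if n < len(smooth_array) else smooth_array[-1]


def similarity_matrix(A, depth):
    """S = K**depth, with K the row-normalized KNN adjacency plus self-loops."""
    K = A + np.eye(len(A))
    K = K / K.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(K, depth)


# Path-shaped KNN graph over 30 cells: cell i is linked to i - 1 and i + 1.
n_cells = 30
A = np.zeros((n_cells, n_cells))
idx = np.arange(n_cells - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0

schedule = [smoothing_depth(n, [15, 10, 5]) for n in range(5)]
S5 = similarity_matrix(A, 5)
S15 = similarity_matrix(A, 15)
print(schedule)                                # [15, 10, 5, 5, 5]
print((S5[0] > 0).sum(), (S15[0] > 0).sum())  # 6 16
```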

max_iter_N : list, optional (default: [1,5])

Maximum number of iterations for the joint optimization and the CoSpar core function, respectively.

epsilon_converge : list, optional (default: [0.05,0.05])

Convergence thresholds for the joint optimization and the CoSpar core function, respectively. Each threshold applies to the change in map correlation between consecutive iterations. For the CoSpar core function, this convergence test is activated only after CoSpar has iterated 3 times.
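
A hedged sketch of the convergence test, assuming that the change between consecutive maps is measured as 1 minus their Pearson correlation, and that the test only starts once min_iter maps exist (min_iter=3 loosely mimics the CoSpar core function). map_change and has_converged are hypothetical helpers, not cospar functions.

```python
import numpy as np


def map_change(T_prev, T_new):
    """1 - Pearson correlation between two flattened transition maps."""
    return 1.0 - np.corrcoef(T_prev.ravel(), T_new.ravel())[0, 1]


def has_converged(history, epsilon, min_iter=3):
    """Stop once at least `min_iter` maps exist and the last change < epsilon."""
    if len(history) < max(min_iter, 2):
        return False
    return map_change(history[-2], history[-1]) < epsilon


rng = np.random.default_rng(2)
T0 = rng.random((10, 8))
T1 = T0 + 0.001 * rng.random((10, 8))  # tiny update
T2 = T1 + 0.001 * rng.random((10, 8))
print(has_converged([T0, T1], 0.05))      # False: test not yet active
print(has_converged([T0, T1, T2], 0.05))  # True: maps barely changed
```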

CoSpar_KNN : int, optional (default: 20)

Number of neighbors in the KNN graph used to compute the similarity matrix.

trunca_threshold : list, optional (default: [0.001,0.01])

Thresholds for resetting small matrix entries to zero. The first entry applies to the similarity matrix; the second applies to the transition map (Tmap). This is purely for computational and storage efficiency.

sparsity_threshold : float, optional (default: 0.2)

Relative threshold, in the range [0, 1], for removing noise from the updated transition map.
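
The two thresholds can be sketched as below. The semantics are assumptions: trunca_threshold is treated as an absolute cutoff, and sparsity_threshold as a cutoff relative to each row's maximum in the transition map. truncate and sparsify_rows are hypothetical helpers, not cospar functions.

```python
import numpy as np


def truncate(M, threshold):
    """Zero out entries below an absolute cutoff (assumed trunca_threshold semantics)."""
    out = M.copy()
    out[out < threshold] = 0.0
    return out


def sparsify_rows(T, sparsity_threshold):
    """Zero out entries below sparsity_threshold * row maximum (assumed semantics)."""
    out = T.copy()
    row_max = out.max(axis=1, keepdims=True)
    out[out < sparsity_threshold * row_max] = 0.0
    return out


T = np.array([[0.5, 0.3, 0.05, 0.0005],
              [0.1, 0.9, 0.2, 0.01]])
print(truncate(T, 0.001))      # only 0.0005 is removed
print(sparsify_rows(T, 0.2))   # keeps entries >= 20% of each row's maximum
```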

save_subset : bool, optional (default: True)

If True, save the Smatrix only at smooth rounds [5, 10, 15, …]; otherwise, save it at every round.

use_full_Smatrix : bool, optional (default: True)

If True, extract the relevant Smatrix from the full Smatrix defined by all cells. This tends to be more accurate, and the package is optimized around this choice.

use_fixed_clonesize_t1 : bool, optional (default: False)

If True, use the same number of initial states for all clones.

sort_clone : int, optional (default: 1)

Order in which initial states are inferred for each clone: {1, -1, other}. 1: sort clones by size from small to large; -1: sort clones by size from large to small; any other value: do not sort.
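
A minimal sketch of the clone ordering controlled by sort_clone, assuming clone size is the number of cells carrying that clone's barcode in a dense cell-by-clone array. clone_order is a hypothetical helper.

```python
import numpy as np


def clone_order(X_clone, sort_clone=1):
    """Order in which clones are processed, by clone size (sketch)."""
    sizes = np.asarray(X_clone).sum(axis=0)  # cells per clone
    if sort_clone == 1:
        return np.argsort(sizes, kind="stable")   # small to large
    if sort_clone == -1:
        return np.argsort(-sizes, kind="stable")  # large to small
    return np.arange(len(sizes))                  # keep original order


X_clone = np.array([[1, 0, 0],
                    [1, 0, 1],
                    [1, 1, 0],
                    [0, 0, 1]])  # clone sizes: 3, 1, 2
print(clone_order(X_clone, 1), clone_order(X_clone, -1), clone_order(X_clone, 0))
```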

compute_new : bool, optional (default: False)

If True, compute everything (ShortestPathDis, OT_map, etc.) from scratch, regardless of whether it was computed and saved before. The Smatrix, however, is recomputed only when use_full_Smatrix=False.

use_existing_KNN_graph : bool, optional (default: False)

If True and adata.obsp['connectivities'] exists, use the existing KNN graph to compute the shortest-path distance. Relevant if initialize_method='OT'. This overrides all other parameters relevant to building the shortest-path distance.

Returns

adata (AnnData object) – Updates adata.obsm['X_clone'] and adata.uns['transition_map'], as well as adata.uns['OT_transition_map'] or adata.uns['HighVar_transition_map'], depending on the initialization method. adata_orig.obsm['X_clone'] remains unchanged.