Abstract LB020: Epigenomic tumor evolution modeling with single-cell methylation data profiling
The heritability of methylation patterns in tumor cells, as shown in recent studies, suggests that tumor heterogeneity and progression can be interpreted and predicted in the context of methylation changes. To elucidate methylation-based evolutionary trajectories in tumors, we introduce a novel computational method for methylation phylogeny reconstruction that leverages single-cell bisulfite-treated whole-genome sequencing (scBS-seq) data and incorporates additional copy number information inferred independently from matched single-cell RNA sequencing (scRNA-seq) data, when available. We validate our method on scBS-seq data from multi-regionally sampled colorectal cancer cells and demonstrate that the cell lineages constructed by our method strongly correlate with the original sampling regions. Our method consists of three components: (i) noise-minimizing site selection, (ii) likelihood-based sequencing error correction, and (iii) pairwise expected distance calculation for cells, all designed to mitigate the effect of noise and uncertainty due to the data sparsity commonly observed in scBS-seq data. In (i), we present an integer linear program-based biclustering formulation to select a set of CpG-sites and cells so that the number of CpG-sites with non-zero coverage in the selected cells is maximized. This procedure filters out cells with read information in too few sites and CpG-sites with read information in too few cells. In (ii), we address the sequencing errors commonly encountered on currently available platforms with a maximum log-likelihood approach that corrects likely errors in scBS-seq reads, incorporating CpG-site copy number information when it can be obtained orthogonally. Given the copy number and read information for a site in a cell, together with the overall sequencing error probability, we compute the log likelihood for all possible underlying allele statuses.
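The likelihood computation in (ii) can be illustrated with a minimal Python sketch. The abstract does not specify the likelihood model, so everything here is an assumption made for illustration: an allele status is represented as the number k of methylated alleles out of the site's copy number, each read is assumed to come from a uniformly chosen allele, and the reported methylation status is assumed to be flipped with a single symmetric error probability. The function names and the correction rule are hypothetical.

```python
import math


def allele_log_likelihoods(meth_reads, unmeth_reads, copy_number, error_rate):
    """Log likelihood of the reads for each k = number of methylated alleles.

    Assumed model (not taken from the abstract): a read reports "methylated"
    with probability (k/c)(1 - e) + (1 - k/c)e, where c is the copy number and
    e the error rate. Requires copy_number >= 1 and 0 < error_rate < 0.5.
    The binomial coefficient is omitted because it does not depend on k.
    """
    lls = {}
    for k in range(copy_number + 1):
        frac = k / copy_number
        p_meth = frac * (1 - error_rate) + (1 - frac) * error_rate
        lls[k] = (meth_reads * math.log(p_meth)
                  + unmeth_reads * math.log(1 - p_meth))
    return lls


def correct_reads(meth_reads, unmeth_reads, copy_number, error_rate):
    """Reassign minority reads to the majority status when a homozygous
    status explains the reads better than any mixed status (hypothetical rule).
    """
    lls = allele_log_likelihoods(meth_reads, unmeth_reads,
                                 copy_number, error_rate)
    homozygous = max(lls[0], lls[copy_number])
    # Mixed statuses exist only for copy_number >= 2.
    mixed = max((lls[k] for k in range(1, copy_number)), default=-math.inf)
    if homozygous > mixed and meth_reads != unmeth_reads:
        total = meth_reads + unmeth_reads
        return (total, 0) if meth_reads > unmeth_reads else (0, total)
    return (meth_reads, unmeth_reads)


# Nine methylated reads and one unmethylated read at a diploid site:
# the lone unmethylated read is treated as an error under this model.
print(correct_reads(9, 1, copy_number=2, error_rate=0.01))
# Evenly split reads are better explained by a mixed status and are kept.
print(correct_reads(5, 5, copy_number=2, error_rate=0.01))
```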
If the mixed read statuses at the CpG-site for the cell are more likely due to sequencing error on homozygous alleles than to the presence of alleles with mixed methylation statuses, we correct the reads of the minority methylation status to the majority one. In (iii), we introduce a formulation to estimate distances between any pair of cells. As scBS-seq data is typically characterized by shallow read coverage, there is rarely read count evidence for two (or more, depending on CNV status) alleles at a CpG-site. Since allele-specific methylation has been shown to occur with increased frequency in cancer tissues, it is especially important, given the reads at a CpG-site, to consider the possibility of unobserved alleles and their methylation status when determining the CpG-site's possible methylation zygosities. Our method incorporates copy number information when available, and for each CpG-site in a cell, we compute a probability distribution across all possible methylation zygosities. Then, given specific distance values between pairs of distinct zygosities and the likelihood of each possible zygosity for each CpG-site shared by both cells, we compute the expected total distance between any pair of cells as the mean of expected distances across all shared CpG-sites. We leverage such pairwise distances in methylation phylogeny construction.
Citation Format: Xuan C. Li, Yuelin Liu, Farid Rashidi, Salem Malikic, Stephen M. Mount, Eytan Ruppin, Kenneth Aldape, Cenk Sahinalp. Epigenomic tumor evolution modeling with single-cell methylation data profiling [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr LB020.
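The expected-distance calculation in (iii) can be sketched in Python as follows. This is only an illustration under stated assumptions: the sites are treated as diploid with three zygosity states, the distance values in `ZYGOSITY_DISTANCE` are placeholders rather than the values used by the method, the per-site zygosity distributions are taken as given inputs, and the two cells are assumed independent so that joint probabilities factor into products. All names are hypothetical.

```python
# Zygosity states for an assumed diploid site, by index:
# 0 = homozygous unmethylated, 1 = heterozygous, 2 = homozygous methylated.
# Placeholder distances between states (an assumption for this sketch).
ZYGOSITY_DISTANCE = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.5, 0.0],
]


def expected_site_distance(probs_a, probs_b, dist=ZYGOSITY_DISTANCE):
    """Expected distance at one CpG-site, assuming the cells are independent."""
    return sum(pa * pb * dist[i][j]
               for i, pa in enumerate(probs_a)
               for j, pb in enumerate(probs_b))


def expected_cell_distance(cell_a, cell_b, dist=ZYGOSITY_DISTANCE):
    """Mean expected distance over the CpG-sites covered in both cells.

    Each cell maps a site identifier to its zygosity probability distribution.
    Returns None when the two cells share no sites.
    """
    shared = cell_a.keys() & cell_b.keys()
    if not shared:
        return None
    total = sum(expected_site_distance(cell_a[s], cell_b[s], dist)
                for s in shared)
    return total / len(shared)


# Toy input: two cells sharing two of three sites.
cell_a = {"chr1:100": [1.0, 0.0, 0.0], "chr1:200": [0.0, 0.0, 1.0]}
cell_b = {"chr1:100": [0.0, 0.0, 1.0], "chr1:200": [0.0, 0.0, 1.0],
          "chr2:50": [0.0, 1.0, 0.0]}
print(expected_cell_distance(cell_a, cell_b))
```

A matrix of such pairwise values could then be passed to a standard distance-based tree-building procedure; which procedure the method uses is not stated in the abstract.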