Copy number alterations (CNAs) arise during cancer evolution when regions of the genome are amplified or deleted, producing heterogeneous collections of cancer cells. Profiling and classifying CNAs plays a vital role in understanding cancer heterogeneity and evolution, and thereby in informing diagnosis and treatment. Several haplotype-specific CNA profiling tools exist for short reads, but short reads provide a limited phasing range. Long reads facilitate the direct phasing of genomic variants into megabase-scale haplotypes, which supports the reconstruction of longer, up to chromosome-scale, CNA profiles. Here we present Wakhan, a tool to analyze haplotype-specific chromosome-scale somatic copy number aberrations using long reads. Leveraging high-quality genome assembly coverage profiles, we show that Wakhan significantly outperforms other common short- and long-read CNA callers in achieving chromosome-level CNA consistency. Wakhan uses tumor-normal long-read BAMs and phased germline SNP calls as input. It first extends the input phasing to chromosome scale by exploiting haplotype coverage imbalance: it detects phase-switch regions and corrects them by taking into account the changes in haplotype-specific coverage. Next, Severus uses this enhanced phasing to generate phased structural variant (SV) calls. Finally, Wakhan's integrated CNA algorithm uses the SV calls as boundaries and employs a haplotype coverage model to assign integer copy-number states to the resulting CNA regions. Wakhan is available at https://github.com/KolmogorovLab/Wakhan. We sought to compare Wakhan's performance against several state-of-the-art haplotype-specific CNA calling tools. The tools selected for short-read analysis were PURPLE, HATCHet, and Battenberg; for long-read analysis, PURPLE and SAVANA were included. Benchmarks are available for small-variant and SV calling, but no similar benchmarks are available for somatic CNA calls. 
We designed a CASTLE panel-based CNA calling benchmark, consisting of 6 pairs of tumor/normal cell lines sequenced with multiple short- and long-read sequencing technologies. We define segment error (SE) as follows: for each CNA segment, we calculate the haplotype-specific mean squared distance between expected and reference coverage at heterozygous SNPs. This is then used to compute a weighted chromosomal average, normalized by the tumor haplotype's mean coverage. Similarly, for chromosome error (CE), we compare the phase of the whole chromosome against the reference coverage. In the five CASTLE datasets, Wakhan and PURPLE had the lowest SE50 and SE75, indicating high accuracy in reconstructing individual CNA segments. We also evaluated Wakhan on a tumor-only dataset. Both Wakhan and PURPLE handled the absence of normal samples well and accurately reflected the expected tumor/normal profiles. Tanveer Ahmad, Ayse Keskus, Mikhail Kolmogorov, Sergey Aganezov, Michael C. Dean, Midhat S. Farooqi, S. Cenk Sahinalp, Benedict Paten, Karen H. Miga, Salem Malikić, Yuelin Liu, Byunggil Yoo, Ataberk Donmez, Anton Goretsky. Wakhan: Reconstruction of chromosome-scale copy number profiles of tumor genomes with long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6900.
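The segment error (SE) defined in the Wakhan abstract above can be illustrated with a minimal Python sketch. The abstract does not specify the weighting scheme or the exact order of normalization, so the choices here (weighting each segment by its number of heterozygous SNPs, dividing the weighted average by the haplotype mean coverage) are assumptions, and the names `Segment`, `segment_msd`, and `chromosome_segment_error` are hypothetical rather than part of Wakhan.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    # Coverage implied by the called copy number for one haplotype (assumed input).
    expected_cov: float
    # Reference haplotype coverage observed at each heterozygous SNP in the segment.
    reference_cov: Sequence[float]


def segment_msd(seg: Segment) -> float:
    """Mean squared distance between expected and reference coverage at het SNPs."""
    return sum((seg.expected_cov - r) ** 2 for r in seg.reference_cov) / len(seg.reference_cov)


def chromosome_segment_error(segments: List[Segment], hap_mean_cov: float) -> float:
    """Weighted chromosomal average of per-segment errors, normalized by mean coverage.

    Assumption: each segment is weighted by its number of heterozygous SNPs.
    """
    total_snps = sum(len(s.reference_cov) for s in segments)
    weighted = sum(len(s.reference_cov) * segment_msd(s) for s in segments) / total_snps
    return weighted / hap_mean_cov


# Toy data: one slightly noisy segment and one perfectly matching segment.
toy_segments = [
    Segment(expected_cov=20.0, reference_cov=[19.0, 21.0]),
    Segment(expected_cov=40.0, reference_cov=[40.0, 40.0, 40.0, 40.0]),
]
toy_error = chromosome_segment_error(toy_segments, hap_mean_cov=30.0)
```

Summary statistics such as SE50 and SE75 would then be taken over the per-chromosome values; how exactly they are aggregated is not stated in the abstract.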
Tumor evolution is driven by various mutational processes, ranging from single nucleotide variants (SNVs) to large structural variants (SVs) to dynamic shifts in DNA methylation. Current short-read sequencing methods struggle to accurately capture the full spectrum of these genomic and epigenomic alterations, as well as their relations, due to inherent technical limitations. Here we used Nanopore long-read sequencing to profile 23 subclones, each derived from a single cell of a mouse melanoma cell line, for precise detection and evolutionary ordering of SNVs, SVs, copy number alterations (CNAs), and DNA methylation changes at the subclonal level. Through phylogenetic analysis of these subclones, we reconstruct the timing of mutational processes and their contributions to diverse clonal phenotypes. The analysis reveals recurrent amplifications of putative driver genes, generated by independent SVs across different lineages, suggesting parallel evolution. Additionally, we describe lineage-specific methylation changes associated with aggressive tumor subclones, highlighting epigenetic trajectories linked to tumor progression. Overall, we demonstrate that our long-read approach enables a uniquely comprehensive view of melanoma progression, highlighting that SVs and methylation played an important role in initiation, clonal diversification, and development of therapeutic resistance in this tumor, consistent with recent clinical findings. We will release the sequencing data and curated variant calls to encourage the development of new computational methods. Chi-Ping Day, Yuelin Liu, Anton Goretsky, Ayse Keskus, Salem Malikic, Eva Perez-Guijarro, Glenn Merlino, Eytan Ruppin, Suleyman Cenk Sahinalp, Mikhail Kolmogorov. Full-range genomic analysis at single-cell resolution reveals genetic, epigenetic, and parallel evolution of melanoma subclones [abstract]. 
In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 704.
Multi-sample bulk DNA sequencing enables reconstruction of a tumor’s clonal history, but scalable methods often rely on heuristic search and provide no optimality guarantees. We present CITUP2, an integrative combinatorial optimization framework that reconstructs clonal trees from descendant cell fractions (DCFs) of mutational clusters. CITUP2 formulates tree inference as a mixed-integer quadratic program (MIQP) that jointly determines the tree topology and clone prevalences across samples. It minimizes a weighted discrepancy between observed and inferred DCFs, with options to prioritize trees exhibiting consistency in the presence-absence patterns of parent-child clones. Under this formulation, CITUP2 returns provably optimal solutions (with respect to the model) and avoids the combinatorial explosion of exhaustive topology enumeration used by existing methods with optimality guarantees. In addition, CITUP2 can report a user-specified number of best trees. In simulations and analyses of a large, recently published multi-sample TRACERx cohort, CITUP2 scales to trees with tens of clones (approximately 30) and matches or improves on the fit attained by state-of-the-art approaches, while providing clear optimality certificates. Salem Malikic, Hamza Iseric, Chih Hao Wu, Erin Molloy, S. Cenk Sahinalp. Reconstruction of Tumor Clonal Trees with Multi-Sample Bulk Sequencing Data by Integrative Combinatorial Optimization [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6905.
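To make the relation between descendant cell fractions (DCFs) and clone prevalences in the CITUP2 abstract above concrete, here is a minimal Python sketch. It does not implement the MIQP. It only evaluates one fixed candidate tree, using the standard relation that a clone's prevalence in a sample is its DCF minus the DCFs of its children, and checks that no prevalence is negative. The function names and the toy DCF values are hypothetical.

```python
from typing import Dict, List, Optional


def clone_prevalences(parent: Dict[str, Optional[str]],
                      dcf: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Prevalence of each clone per sample: its DCF minus the DCFs of its children.

    parent maps each clone to its parent clone (the root maps to None);
    dcf maps each clone to one DCF value per sample.
    """
    n_samples = len(next(iter(dcf.values())))
    prev = {clone: list(values) for clone, values in dcf.items()}
    for child, par in parent.items():
        if par is not None:
            for s in range(n_samples):
                prev[par][s] -= dcf[child][s]
    return prev


def tree_is_feasible(parent: Dict[str, Optional[str]],
                     dcf: Dict[str, List[float]],
                     tol: float = 1e-9) -> bool:
    """A candidate tree is feasible if no clone has a negative prevalence in any sample."""
    prev = clone_prevalences(parent, dcf)
    return all(p >= -tol for values in prev.values() for p in values)


# Toy DCFs for three mutational clusters across two samples (made-up numbers).
toy_dcf = {"A": [1.0, 1.0], "B": [0.6, 0.3], "C": [0.3, 0.5]}
branching = {"A": None, "B": "A", "C": "A"}  # B and C are siblings under A
linear = {"A": None, "B": "A", "C": "B"}     # chain A -> B -> C
```

With these toy values the branching tree is feasible, whereas the linear tree would require clone B to have negative prevalence in the second sample. An optimization formulation such as the one described in the abstract searches over topologies and prevalences jointly rather than testing trees one at a time.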
Understanding and comparing tumor evolutionary histories is fundamental to cancer genomics, with direct implications for tracking subclonal population dynamics, treatment resistance, and tumor heterogeneity. Clonal trees, widely used to model tumor progression, are rooted, unordered trees in which each node represents a subclone labeled by a set of distinct mutations. Various principled and efficient methods have been developed for inferring clonal trees from either bulk or single-cell sequencing data. However, no existing computational approach offers a method that is both efficient and principled to fully align clonal trees and to compare their subclonal architectures, which limits the robustness of any downstream analysis based on inferred clonal trees. We introduce omlta, the optimal multi-label tree alignment of two clonal trees, which removes the minimum number of mutation labels, so that the remaining trees are isomorphic. Computing omlta is NP-hard. Here, we present a fixed-parameter tractable algorithm to compute the omlta, with a running time of O(L^3 log L 2^k) where L is the number of mutation labels shared between the input trees and k is the minimum possible number of mutation labels that need to be removed for the alignment - which we call omltd, the optimal multi-label tree edit distance. Our approach provides an exponentially better (in k) asymptotic runtime than the state-of-the-art algorithm by Akutsu et al. for computing the classic tree alignment and edit distance, concepts similar to what omlta/omltd optimizes on clonal trees. We applied omlta to 126 multi-sample bulk-sequencing data from the TRACERx study on non-small cell lung cancers by comparing clonal trees inferred by CONIPHER and PairTree. Despite the theoretically exponential runtime, we could compute the tree alignment for each tumor quickly, often within seconds. 
The omltd between CONIPHER and PairTree clonal trees on the same tumor varies substantially across tumors and the distances are negatively associated with the mean cancer cell fraction among mutations. For the tumors characterized by mutations with low cancer cell fractions, it is thus advisable not to use a single tree, but rather the alignment of multiple alternative trees, so that downstream inferences are informed only by robustly placed mutations. We further evaluated our algorithm on an in-house melanoma sample with clonal trees inferred by PhISCS and ScisTree, highlighting the utility of omlta on trees inferred from single-cell sequencing data. On these datasets, our algorithm completed all analyses in practical wall-clock times and showed that it can identify common evolutionary trajectories among clonal trees representing (i) distinct tumors, (ii) distinct samples from the same tumor, (iii) distinct sequencing data from the same sample. Additional supplementary results demonstrate the robustness of our approach in comparison to alternatives on simulated data. Jacob Gilbert, Chih Hao Wu, Marina Knittel, Alejandro Schaffer, Salem Malikić, S. Cenk Sahinalp. Identifying robust subclonal structures through tumor progression tree alignment [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6898.
Clonal evolution of cancer results in intratumor heterogeneity, making treatment and cure challenging. Single-cell sequencing has advanced our understanding of intratumor heterogeneity, but tracing subclonal evolution using mutational profiles of cells is limited by scale and noise. Moreover, available tumor progression tree inference methods usually offer a single tree to explain the progression of a tumor, and do not inform about alternative evolutionary scenarios. We introduce the bi-partition function for a tumor progression tree, to assess the reliability of any proposed subclonal structure in a single-cell sequenced tumor. By using the bi-partition function, we calculate the probability that any given subset R of mutation-profiled single cells from a tumor forms a clade rooted by a specified mutation ρ across all possible tumor progression trees. This provides the means to evaluate whether R forms a subclone with ρ as a possible subclonal driver, which is especially useful if the cells of R are biologically or clinically significant, e.g., have aggressive growth, therapy resistance, or metastatic potential. We also introduce an algorithm to estimate the bi-partition function, which treats the ground truth as a probability distribution derived from mutational profiles of single cells and samples a tumor progression tree from this distribution independently in each iteration. We prove that our algorithm’s estimate of the bi-partition function asymptotically approaches the ground truth and demonstrate its accuracy on simulated data. 
Applying our algorithm to the tumor progression tree inferred from single-cell-derived melanoma sublines revealed that, while major clades and their root mutations are robust, (i) the placement of one clade in the tree is unreliable, which we later observed to be a result of Loss of Heterozygosity, and (ii) some of the mutations identified as false positives in the tree are unreliable, which later turned out to be the result of a doublet - a subline which has contamination from another subline. Interestingly, bootstrapping, a technique commonly employed for species trees, failed to point out any of these issues. After correcting the input data for these issues, the reliability of the progression tree improved substantially, demonstrating how our bi-partition function algorithm can aid studies on tumor evolution and intratumor heterogeneity. Farid Rashidi Mehrabadi, Erfan Sadeqi Azer, John D. Bridgers, Teresa M. Przytycka, Salem Malikic, Funda Ergun, Cenk Sahinalp. A bi-partition function algorithm to evaluate inferred subclonal structures in single-cell sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6897.
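The sampling idea behind the bi-partition function estimate described above can be sketched in Python as a plain Monte Carlo average: draw trees independently and count how often the cell set R is exactly the clade rooted by mutation ρ. This is a simplified illustration and not the authors' algorithm. The tree representation (a mapping from each mutation to the cells below it), the function `estimate_clade_probability`, and the toy sampler are all assumptions made for this example.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical tree representation: mutation -> set of cells in the clade it roots.
Tree = Dict[str, FrozenSet[str]]


def estimate_clade_probability(sample_tree: Callable[[random.Random], Tree],
                               cells: Iterable[str],
                               mutation: str,
                               n_iter: int,
                               rng: random.Random) -> float:
    """Fraction of independently sampled trees in which `cells` is the clade of `mutation`."""
    target = frozenset(cells)
    hits = 0
    for _ in range(n_iter):
        tree = sample_tree(rng)
        if tree.get(mutation) == target:
            hits += 1
    return hits / n_iter


# Toy distribution over two candidate trees (stand-in for a data-derived distribution).
TREE_A: Tree = {"rho": frozenset({"c1", "c2"}), "m2": frozenset({"c3"})}
TREE_B: Tree = {"rho": frozenset({"c1", "c2", "c3"}), "m2": frozenset({"c3"})}


def toy_sampler(rng: random.Random) -> Tree:
    # Returns TREE_A with probability 0.7 and TREE_B otherwise.
    return TREE_A if rng.random() < 0.7 else TREE_B


toy_estimate = estimate_clade_probability(
    toy_sampler, {"c1", "c2"}, "rho", n_iter=20000, rng=random.Random(0)
)
```

In this toy setting the estimate should approach 0.7 as the number of iterations grows, mirroring the asymptotic convergence claimed in the abstract for the real estimator.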
Understanding and comparing tumor evolutionary histories is fundamental to cancer genomics. Clonal trees, used to model tumor progression, are rooted, unordered trees in which each node represents a subclone labeled by a set of distinct mutations. To compare two clonal trees, we introduce omlta, the optimal multi-label tree alignment, which removes the minimum number of mutation labels from the trees, so that the remaining trees are isomorphic. Computing omlta is NP-hard. Here, we present an algorithm to compute the omlta whose running time is bounded in terms of L and k, where L ≥ 1 is the total number of mutation labels occurring in the input trees and k is the minimum possible number of mutation labels that need to be removed for the alignment. Our implementation (https://github.com/algo-cancer/omlta) is the first computational tool for determining the optimal alignment between clonal trees. We applied omlta to 126 cases from the TRACERx study on non-small cell lung cancers and some melanoma single-cell data.
In the era of exponential data generation, a fast, consistent, and efficient string processing technique is necessary to represent extensive genomic data. One of the earliest string processing techniques, predating MinHash and minimizer-based sketching, is Locally Consistent Parsing (LCP). This technique partitions an input string and identifies short, exactly occurring substrings called cores, which collectively cover the input string while maintaining Partition and Labeling Consistency. The iterative application of LCP yields progressively longer cores in a compressed format, thereby substantially enhancing the efficiency of genomic sequence representation and subsequent downstream analysis. We have previously developed Lcptools as the first iterative implementation of LCP for the DNA alphabet and demonstrated its effectiveness in identifying cores with minimal collisions. Here, we introduce GenCore, a computational method that leverages LCP cores for the first time to sketch and estimate genomic distances for closely related large genomes, and successfully reconstruct simulated progression trees. GenCore also successfully recapitulates primate phylogeny using both telomere-to-telomere (T2T) assemblies and the PacBio HiFi reads for assembly-free comparisons. Availability: GenCore is available at https://github.com/BilkentCompGen/gencore.
Efficient and consistent string processing is critical in the exponentially growing genomic data era. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (e.g., "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP with Lcptools and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i^th iteration is O(n/c^i) for c ~ 2.34, while the average length and the average distance between consecutive cores are O(c^i). Compared to the popular sketching techniques, LCP produces significantly fewer cores, enabling a more compact representation and faster analyses. To demonstrate the advantages of LCP in genomic string processing in terms of computation and memory efficiency, we also introduce LCPan, an efficient variation graph constructor. We show that LCPan generates variation graphs >10x faster than vg, while using >13x less memory.
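The scaling reported in the Lcptools abstract above (about n/c^i cores at iteration i, with average core length and spacing growing like c^i, for c ~ 2.34) can be turned into a back-of-the-envelope calculation. The Python sketch below drops the constants hidden by the O(·) notation, so its numbers are order-of-magnitude illustrations only; the function names are hypothetical and not part of Lcptools.

```python
C = 2.34  # empirical per-iteration reduction factor reported in the abstract


def approx_core_count(n: int, level: int, c: float = C) -> float:
    """Rough number of cores after `level` LCP iterations on a string of length n.

    Ignores the constant factor hidden in O(n / c^level).
    """
    return n / c ** level


def approx_core_span(level: int, c: float = C) -> float:
    """Rough growth factor of average core length/spacing, ignoring constants in O(c^level)."""
    return c ** level


# Illustrative table for a 3 Gbp genome-sized input.
genome_length = 3_000_000_000
table = [
    (level, approx_core_count(genome_length, level), approx_core_span(level))
    for level in range(1, 9)
]
for level, count, span in table:
    print(f"iteration {level}: ~{count:,.0f} cores, span factor ~{span:,.1f}")
```

Each additional iteration shrinks the estimated core count by the same factor c, which is why a handful of iterations is enough to go from billions of positions to a far smaller set of cores.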
Tumor evolution is driven by various mutational processes, ranging from single-nucleotide variants (SNVs) to large structural variants (SVs) to dynamic shifts in DNA methylation. Current short-read sequencing methods struggle to accurately capture the full spectrum of these genomic and epigenomic alterations due to inherent technical limitations. To overcome that, here we introduce an approach for long-read sequencing of single-cell derived subclones, and use it to profile 23 subclones of a mouse melanoma cell line, characterized with distinct growth phenotypes and treatment responses. We develop a computational framework for harmonization and joint analysis of different variant types in the evolutionary context. Uniquely, our framework enables detection of recurrent amplifications of putative driver genes, generated by independent SVs across different lineages, suggesting parallel evolution. In addition, our approach revealed gradual and lineage-specific methylation changes associated with aggressive clonal phenotypes. We also show our set of phylogeny-constrained variant calls along with openly released sequencing data can be a valuable resource for the development of new computational methods.
Most human cancers arise from somatic alterations, ranging from single nucleotide variations to structural variations (SVs) that can alter the genomic organization. Pathogenic SVs are identified in various cancer types and subtypes, and they play a crucial role in diagnosis and patient stratification. However, the studies on structural variations have been limited due to biological and computational challenges, including tumor heterogeneity, aneuploidy, and the diverse spectrum of SVs from simpler deletions and focal amplifications to catastrophic events shuffling large fragments from one or multiple chromosomes. Long-read sequencing provides the advantage of improved mappability and direct haplotype phasing. Yet, no tool currently exists to comprehensively analyze complex rearrangements within the cancer genome using long-read sequencing. Here, we present Severus, a tool for somatic SV calling and complex SV characterization using long reads. Severus first detects individual SV junctions from phased split alignments, then constructs a phased breakpoint graph to cluster junctions into complex rearrangement events. We first benchmarked the somatic SV calling performance using six tumor/normal cell line pairs (HCC1395, H1437, H2009, HCC1937, HCC1954, Hs578T). We sequenced all cell lines with Illumina, ONT, and PacBio HiFi. We then established a set of high-confidence calls supported by multiple technologies and tools. Severus consistently had the highest F1 scores compared to the HiFi, ONT, and Illumina methods against this high-confidence SV call set. We then extend our analysis to complex SVs. Severus accurately detected complex events, i.e., chromothripsis and chromoplexy, and templated insertion cycles/chains (TIC), reported for these cell lines. We then compared Severus’ performance with Jabba and Linx, two widely used tools for complex SV calling in short-read sequencing. 
Our comparison revealed that Severus showed higher agreement with Linx, while Jabba failed to detect most of the SV clusters identified by both Severus and Linx. Severus also outperformed the other tools in characterizing complex reciprocal translocations and TICs. Most of the junctions in complex SVs called by either of the tools but not Severus were either simple SVs with a single long-read junction or were not present in long-read sequencing. In contrast, Severus effectively resolved overlapping SVs by utilizing long-read connectivity, allowing for more accurate clustering of smaller genomic segments. We have also applied Severus to seventeen pediatric leukemia cases. Severus identified two chromoplexy and two cryptic translocations, which were missed by FISH and karyotype panels and were incomplete in Illumina SV calls, further validated by RNA-seq. This highlights the potential of the long-read whole genome sequencing approach for diagnosing complex cases driven by SVs. Ayse Keskus, Asher Bryant, Tanveer Ahmad, Anton Goretsky, Byunggil Yoo, Sergey Aganezov, Ataberk Donmez, Lisa A. Lansdon, Isabel Rodriguez, Jimin Park, Yuelin Liu, Xiwen Cui, Joshua Gardner, Brandy McNulty, Samuel Sacco, Jyoti Shetty, Yongmei Zhao, Bao Tran, Giuseppe Narzisi, Adrienne Helland, Daniel Cook, Pi-Chuan Chang, Alexey Kolesnikov, Andrew Carroll, Erin Molloy, Chengpeng Bi, Adam Walter, Margaret Gibson, Irina Pushel, Erin Guest, Tomi Pastinen, Kishwar Shafin, Karen Miga, Salem Malikic, Chi-Ping Day, Nicolas Robine, Cenk Sahinalp, Michael Dean, Midhat S. Farooqi, Benedict Paten, Mikhail Kolmogorov. Severus: A tool for detecting and characterizing complex structural variants in cancer using long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2848.
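The idea of clustering SV junctions into complex events via a breakpoint graph, mentioned in the Severus abstract above, can be illustrated with a simplified Python sketch. This is not Severus's algorithm: it ignores haplotype phasing and read connectivity, and it assumes that two junctions belong to the same event whenever any of their breakpoints lie on the same chromosome within a distance threshold. The `Junction` type, `cluster_junctions`, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Junction:
    name: str
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def cluster_junctions(junctions: List[Junction], max_dist: int) -> List[List[str]]:
    """Group junctions into connected components linked by nearby breakpoints."""
    parent = list(range(len(junctions)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Collect both breakpoints of every junction, then sort by (chromosome, position).
    ends = []
    for idx, j in enumerate(junctions):
        ends.append((j.chrom_a, j.pos_a, idx))
        ends.append((j.chrom_b, j.pos_b, idx))
    ends.sort()

    # Link junctions whose consecutive breakpoints are close on the same chromosome.
    for (c1, p1, i1), (c2, p2, i2) in zip(ends, ends[1:]):
        if c1 == c2 and p2 - p1 <= max_dist:
            union(i1, i2)

    clusters: Dict[int, List[str]] = {}
    for idx, j in enumerate(junctions):
        clusters.setdefault(find(idx), []).append(j.name)
    return sorted(sorted(names) for names in clusters.values())


toy_junctions = [
    Junction("j1", "chr1", 1000, "chr2", 5000),
    Junction("j2", "chr2", 5200, "chr3", 9000),
    Junction("j3", "chr7", 100, "chr7", 90000),
]
toy_clusters = cluster_junctions(toy_junctions, max_dist=500)
```

In the toy input, j1 and j2 share nearby breakpoints on chr2 and form one cluster, while j3 stays on its own. A phased breakpoint graph as described in the abstract would additionally restrict links to breakpoints on the same haplotype.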
Melanoma, a highly heterogeneous cancer, evolves through a complex interplay of genetic alterations, including both single nucleotide variants (SNVs) and structural variants (SVs). To study the evolutionary trajectory of melanoma, we established a model system composed of 24 single-cell-derived clonal sublines (C1-C24) from the M4 melanoma model, developed in a genetically engineered hepatocyte growth factor (HGF)-transgenic mouse. While SNVs have been extensively used to construct phylogenetic trees using Trisicell (Triple-toolkit for single-cell intratumor heterogeneity inference), a tool that analyzes intratumor heterogeneity and single-cell RNA mutations, the role and timing of SVs in melanoma evolution remain less well understood. This study integrates SV data with an SNV-driven phylogeny to investigate whether SV patterns align with SNV-based evolutionary trajectories in the mouse melanoma model, providing insights into the functional impact of SVs during tumor progression. We performed long-read sequencing on the 24 clonal sublines and detected SVs using Severus, a tool optimized for phasing in long-read sequencing. The SVs were mapped to the SNV-driven phylogeny using R and classified as either concordant (aligning with the SNV-based tree) or discordant (deviating from the SNV phylogeny). Gene ontology enrichment analysis revealed that concordant SVs were significantly enriched in genes associated with the hepatocyte growth factor receptor signaling pathway and the negative regulation of peptidyl-threonine phosphorylation, both of which represent core drivers of tumor progression. In contrast, discordant SVs were associated with a broader range of functional pathways, including the positive regulation of antigen receptor-mediated signaling and the regulation of natural killer cell-mediated cytotoxicity, though the exact mechanisms underlying these associations remain unclear. 
By integrating these SVs with an established SNV-driven phylogeny, this study highlights the distinct and critical roles SVs play in melanoma evolution. Concordant SVs appear to drive core oncogenic processes, while discordant SVs may contribute to other aspects of tumor evolution. These findings underscore the importance of considering SVs alongside SNVs to fully capture the complexity of melanoma evolution. Ongoing investigations will continue to explore the functional implications of these SVs and how the gene disruption patterns they cause shape the evolutionary trajectory of melanoma, offering potential targets for future therapeutic strategies. Xiwen Cui, Ayse G. Keskus, Salem Malikic, Yuelin Liu, Anton Goretsky, Chi-Ping Day, Farid R. Mehrabadi, Mikhail Kolmogorov, Glenn Merlino, S. Cenk Sahinalp. Integrating structural variants and single nucleotide variants to uncover evolutionary trajectories in melanoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3898.
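The concordant/discordant classification described in the abstract above can be illustrated with a short Python sketch. The study reports doing this mapping in R and does not spell out the exact criterion, so the rule used here is an assumption: an SV is called concordant if the set of sublines carrying it is exactly the leaf set of some clade in the SNV-based tree, and discordant otherwise. The function names and the toy tree are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def clade_leaf_sets(children: Dict[str, List[str]], root: str) -> Set[FrozenSet[str]]:
    """Return the leaf set of every clade (subtree) in a rooted tree.

    children maps each internal node to its child nodes; leaves have no entry.
    """
    clades: Set[FrozenSet[str]] = set()

    def visit(node: str) -> FrozenSet[str]:
        kids = children.get(node, [])
        if not kids:
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(k) for k in kids))
        clades.add(leaves)
        return leaves

    visit(root)
    return clades


def classify_sv(carriers: Iterable[str], clades: Set[FrozenSet[str]]) -> str:
    """Label an SV by whether its carrier sublines form a clade of the SNV tree."""
    return "concordant" if frozenset(carriers) in clades else "discordant"


# Toy SNV-based tree over three sublines: ((C1, C2), C3).
toy_children = {"root": ["n1", "C3"], "n1": ["C1", "C2"]}
toy_clades = clade_leaf_sets(toy_children, "root")

label_shared = classify_sv({"C1", "C2"}, toy_clades)  # carriers form the clade under n1
label_split = classify_sv({"C2", "C3"}, toy_clades)   # carriers span two lineages
```

A stricter or looser criterion (for example, tolerating missed calls in one subline) would change which SVs count as discordant, so the labels depend on the chosen rule as much as on the data.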