Abstract 6900: Wakhan: Reconstruction of chromosome-scale copy number profiles of tumor genomes with long-read sequencing.
Copy number alterations (CNAs) arise during cancer evolution when regions of the genome are amplified or deleted, producing heterogeneous populations of cancer cells. Profiling and classifying CNAs plays a vital role in understanding cancer heterogeneity and evolution, and can better inform diagnosis and treatment. Several short-read tools perform haplotype-specific CNA profiling, but short reads provide only a limited phasing range. Long reads enable direct phasing of genomic variants into megabase-scale haplotypes, which supports the reconstruction of longer CNA profiles, up to chromosome scale.

Here we present Wakhan, a tool for analyzing haplotype-specific, chromosome-scale somatic copy number aberrations using long reads. Leveraging coverage profiles from high-quality genome assemblies, we show that Wakhan significantly outperforms other common short- and long-read CNA callers in achieving chromosome-level CNA consistency. Wakhan takes tumor-normal long-read BAMs and phased germline SNP calls as input. First, it extends the input phasing to chromosome scale by exploiting haplotype coverage imbalance: Wakhan detects phase switch regions and corrects them by taking into account changes in haplotype-specific coverage. Next, the Severus SV caller uses this improved phasing to generate phased structural variant (SV) calls. Finally, Wakhan's integrated CNA algorithm uses the SV calls as segment boundaries and applies a haplotype coverage model to assign integer copy-number states to the resulting CNA regions. Wakhan is available at https://github.com/KolmogorovLab/Wakhan.

We compared Wakhan's performance against several state-of-the-art haplotype-specific CNA calling tools: PURPLE, HATCHet, and Battenberg for short-read data, and PURPLE and SAVANA for long-read data. Although benchmarks exist for small-variant and SV calling, no comparable benchmark exists for somatic CNA calls.
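The two Wakhan-specific steps described above (phase extension from haplotype coverage imbalance, and integer copy-number assignment within SV-bounded segments) can be illustrated with a small Python sketch. This is not Wakhan's implementation. The function names, the per-bin input layout, the min_diff threshold, and the cov_per_copy parameter are hypothetical. The orientation rule is also a deliberately simplified stand-in for Wakhan's phase-switch correction, because it assumes the haplotype imbalance keeps the same direction along the chromosome.

```python
# Hypothetical sketch, NOT Wakhan's code: a toy version of (1) orienting
# phase blocks by haplotype coverage imbalance and (2) assigning integer
# copy numbers to segments bounded by SV breakpoints.
from statistics import mean


def orient_phase_blocks(blocks, min_diff=2.0):
    """blocks: list of (hp1_cov, hp2_cov) per-bin coverage lists, one pair per
    input phase block, in chromosome order. A block whose imbalance points the
    opposite way from the first informative block is treated as a phase switch
    and its haplotypes are swapped. Blocks with |imbalance| < min_diff
    (hypothetical threshold) carry no signal and are left as they are."""
    oriented = []
    ref_sign = 0  # direction of imbalance in the first informative block
    for hp1, hp2 in blocks:
        diff = mean(hp1) - mean(hp2)
        if abs(diff) >= min_diff:
            sign = 1 if diff > 0 else -1
            if ref_sign == 0:
                ref_sign = sign
            elif sign != ref_sign:
                hp1, hp2 = hp2, hp1  # correct the suspected phase switch
        oriented.append((list(hp1), list(hp2)))
    return oriented


def assign_copy_numbers(bin_pos, hp_cov, breakpoints, cov_per_copy):
    """bin_pos: sorted, non-empty bin start coordinates; hp_cov: coverage of one
    haplotype per bin; breakpoints: sorted SV breakpoint coordinates used as
    segment boundaries; cov_per_copy: hypothetical coverage expected from one
    copy. Returns half-open segments as (start, end, integer_copy_number)."""
    inner = [b for b in breakpoints if bin_pos[0] < b <= bin_pos[-1]]
    bounds = [bin_pos[0]] + inner + [bin_pos[-1] + 1]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        covs = [c for p, c in zip(bin_pos, hp_cov) if start <= p < end]
        if covs:
            # Toy haplotype coverage model: copy number = mean coverage / per-copy coverage.
            segments.append((start, end, round(mean(covs) / cov_per_copy)))
    return segments
```

For example, assign_copy_numbers([0, 10, 20, 30, 40, 50], [10, 10, 10, 30, 30, 30], [30], 10.0) splits the haplotype at the breakpoint at position 30 and yields one copy before it and three after it. In this toy model, dividing by a fixed per-copy coverage ignores tumor purity and ploidy, which a realistic haplotype coverage model would also need to account for.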
We designed a CNA calling benchmark based on the CASTLE panel, consisting of 6 tumor/normal cell line pairs sequenced with multiple short- and long-read sequencing technologies. We define segment error (SE) as follows: for each CNA segment, we calculate the haplotype-specific mean squared distance between expected and reference coverage at heterozygous SNPs, and then combine these values into a weighted chromosomal average normalized by the tumor haplotype's mean coverage. Chromosome error (CE) is defined similarly, but compares the phasing of the whole chromosome against the reference coverage. In the five CASTLE datasets, Wakhan and PURPLE had the lowest SE50 and SE75, indicating high accuracy in reconstructing individual CNA segments. We also evaluated Wakhan on a tumor-only dataset. Both Wakhan and PURPLE handled the absence of a matched normal sample well and accurately reflected the expected tumor/normal profiles.

Tanveer Ahmad, Ayse Keskus, Mikhail Kolmogorov, Sergey Aganezov, Michael C. Dean, Midhat S. Farooqi, S. Cenk Sahinalp, Benedict Paten, Karen H. Miga, Salem Malikić, Yuelin Liu, Byunggil Yoo, Ataberk Donmez, Anton Goretsky. Wakhan: Reconstruction of chromosome-scale copy number profiles of tumor genomes with long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6900.
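The segment error (SE) definition can be sketched in Python under several assumptions. Each segment is assumed to supply expected and reference coverage values at the same heterozygous SNPs for each haplotype. Segments are assumed to be weighted by their number of SNPs. Each haplotype's mean squared distance is divided by that haplotype's mean tumor coverage. The weighting scheme, the input layout, and the function name segment_error are hypothetical. The abstract does not specify them, nor how SE50 and SE75 are derived from per-chromosome values.

```python
# Hypothetical sketch of the SE metric described above; the weighting
# (number of SNPs) and the input layout are assumptions, not the authors' code.


def segment_error(segments, hp_mean_cov):
    """segments: list of dicts mapping haplotype id -> (expected, reference),
    two equal-length lists of coverage at the segment's heterozygous SNPs.
    hp_mean_cov: dict mapping haplotype id -> mean tumor coverage of that haplotype.
    Returns the SNP-weighted chromosomal average of normalized squared error."""
    total, weight = 0.0, 0
    for seg in segments:
        for hp, (expected, reference) in seg.items():
            n = len(expected)
            if n == 0:
                continue  # no heterozygous SNPs on this haplotype segment
            # Haplotype-specific mean squared distance at heterozygous SNPs.
            mse = sum((e - r) ** 2 for e, r in zip(expected, reference)) / n
            # Normalize by the haplotype's mean coverage, weight by SNP count.
            total += n * mse / hp_mean_cov[hp]
            weight += n
    return total / weight if weight else 0.0
```

For instance, a single segment where haplotype 1 has expected coverage [10, 10] against reference [12, 8] (mean coverage 10), and haplotype 2 matches exactly, gives (2 × 4 / 10) / 4 = 0.2.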