Akmuhammet Ashyralyyev, Ege Sirvan, Ecem İlgün, S. Malikić, Tuğkan Batu, S. C. Sahinalp, Can Alkan
12 January 2026

GenCore: Genomic distance estimation using Locally Consistent Parsing

In the era of exponential data generation, a fast, consistent, and efficient string processing technique is needed to represent extensive genomic data. One of the earliest string processing techniques, predating MinHash and minimizer-based sketching, is Locally Consistent Parsing (LCP). This technique partitions an input string and identifies short exact substrings, called cores, which together cover the input string while maintaining Partition and Labeling Consistency. Iterative application of LCP yields progressively longer cores in a compressed format, substantially improving the efficiency of genomic sequence representation and downstream analysis. We previously developed Lcptools, the first iterative implementation of LCP for the DNA alphabet, and demonstrated its effectiveness in identifying cores with minimal collisions. Here, we introduce GenCore, a computational method that, for the first time, leverages LCP cores to sketch closely related large genomes and estimate the genomic distances between them; GenCore also successfully reconstructs simulated progression trees. In addition, GenCore recapitulates primate phylogeny using both telomere-to-telomere (T2T) assemblies and PacBio HiFi reads, the latter for assembly-free comparisons.

Availability: GenCore is available at https://github.com/BilkentCompGen/gencore
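The abstract describes LCP only at a high level. As a rough illustration of the ideas it names (local parsing decisions, cores that cover the input, iteration toward longer cores, and comparing genomes through their cores), here is a minimal Python sketch built on a deliberately simplified rule: a position seeds a core when its label is a strict lexicographic local minimum, and the core spans that position plus its two neighbors. This rule is an assumption made for illustration; it is not the core definition used by Lcptools or GenCore, and every function name is hypothetical. Likewise, the plain Jaccard distance over core sets is an assumed stand-in, since the abstract does not say which distance estimator GenCore uses.

```python
from typing import List, Set, Tuple

# Half-open [start, end) interval on the input sequence.
Span = Tuple[int, int]


def local_minima_cores(labels: List[str], spans: List[Span]) -> List[Span]:
    """One toy parsing round (hypothetical, NOT the Lcptools rule): item i
    seeds a core if its label is a strict lexicographic local minimum among
    items i-1, i, i+1.

    The decision for item i looks only at its immediate neighbors, so two
    identical stretches of input receive identical cores away from their ends.
    Runs of equal labels never form a strict minimum and are skipped here;
    real LCP treats such repeats with a separate rule.
    """
    cores: List[Span] = []
    for i in range(1, len(labels) - 1):
        if labels[i - 1] > labels[i] < labels[i + 1]:
            # The new core spans the minimum together with both neighbors.
            cores.append((spans[i - 1][0], spans[i + 1][1]))
    return cores


def lcp_cores(seq: str, levels: int = 2) -> List[str]:
    """Apply the toy rule `levels` times; each round parses the previous
    round's cores, so cores grow longer at every level."""
    spans: List[Span] = [(i, i + 1) for i in range(len(seq))]  # level 0: characters
    for _ in range(levels):
        labels = [seq[a:b] for a, b in spans]
        spans = local_minima_cores(labels, spans)
    return [seq[a:b] for a, b in spans]


def core_sketch(seq: str, levels: int = 2) -> Set[str]:
    """Sketch a sequence as the set of distinct cores at the chosen level."""
    return set(lcp_cores(seq, levels))


def jaccard_distance(x: Set[str], y: Set[str]) -> float:
    """One minus the Jaccard similarity of two core sets (assumed metric);
    defined as 0.0 when both sets are empty."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


if __name__ == "__main__":
    a = "GATTACA"
    b = "TTGATTACAGG"
    print(lcp_cores(a, levels=1))  # ['GAT', 'TAC']
    print(round(jaccard_distance(core_sketch(a, 1), core_sketch(b, 1)), 3))  # 0.333
```

Because every decision depends only on a constant-size neighborhood, the level-1 cores of "GATTACA" reappear unchanged inside "TTGATTACAGG"; this locality is what lets shared substrings contribute shared entries to the two sketches.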

