19 March 2013

Boosting high throughput sequencing data compression algorithms using reordering

High-throughput sequencing (HTS) platforms generate unprecedented amounts of data, which challenge the computational infrastructure. Currently, most HTS data is compressed with general-purpose algorithms such as gzip. These algorithms are not designed for data generated by HTS platforms and do not take advantage of the specific nature of genomic sequence data. Here we present SCALCE, a “boosting” scheme based on the Locally Consistent Parsing technique. SCALCE reorganizes the reads so that they compress faster and to a higher compression rate. It works independently of the compression algorithm in use and without a reference genome. Our tests indicate that SCALCE significantly improves both the compression rate and the compression time of gzip. We also show that the reordering problem can be formulated as an instance of the set-cover problem, and that, in practice, Locally Consistent Parsing performs as well as the best known approximation algorithm for set cover.

Keywords: FASTQ, Genome Sequence Compression, High Throughput Sequencing Technology, Lempel-Ziv Techniques, Locally Consistent Parsing, Boosting
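The core idea of boosting by reordering is to place reads that share long substrings next to each other, so that a Lempel-Ziv compressor such as gzip finds them inside its sliding window. The sketch below illustrates only that idea. It is not SCALCE: instead of the cores selected by Locally Consistent Parsing, it swaps in a simpler technique, bucketing each read by its lexicographically smallest k-mer (a minimizer). The function names, the choice of `k = 12`, and the simulated error-free reads are all hypothetical.

```python
import gzip
import random


def minimizer(read: str, k: int) -> tuple[str, int]:
    """Return the lexicographically smallest k-mer of `read` and its offset.

    Hypothetical stand-in for the "core" substrings that Locally Consistent
    Parsing would select; this is NOT Locally Consistent Parsing itself.
    """
    best = min(range(len(read) - k + 1), key=lambda i: read[i:i + k])
    return read[best:best + k], best


def reorder(reads: list[str], k: int = 12) -> list[str]:
    """Group reads that share a minimizer so overlapping reads become neighbours."""
    def key(read: str) -> tuple[str, int]:
        kmer, offset = minimizer(read, k)
        # A larger offset means the read starts earlier in the genome,
        # so sorting by -offset roughly orders each bucket by start position.
        return kmer, -offset
    return sorted(reads, key=key)


def gzip_size(reads: list[str]) -> int:
    """Size in bytes of the newline-joined reads after gzip at level 9."""
    return len(gzip.compress("\n".join(reads).encode("ascii"), compresslevel=9))


def simulate_reads(genome_len: int = 20_000, n_reads: int = 4_000,
                   read_len: int = 100, seed: int = 0) -> list[str]:
    """Sample error-free reads uniformly from a random genome (hypothetical data)."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


if __name__ == "__main__":
    reads = simulate_reads()
    print(f"gzip, original order:   {gzip_size(reads)} bytes")
    print(f"gzip, after reordering: {gzip_size(reorder(reads))} bytes")
```

Note that `reorder` never calls gzip. Any other compressor could replace `gzip_size`, which mirrors the claim that the boosting step is independent of the compression algorithm. The sketch also assumes that the original read order does not need to be preserved.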

