BH akademski imenik

Comparison of high-throughput sequencing data compression tools

24. 10. 2016.

88

Ibrahim Numanagić, J. Bonfield, Faraz Hach, Jan Voges, J. Ostermann, C. Alberti, M. Mattavelli, S. C. Sahinalp

Nature Methods

The State of the Art in High Throughput Sequencing Data Compression Supplementary Materials

2016.

0

Ibrahim Numanagić, J. Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, C. Alberti, M. Mattavelli et al.

METHODS FOR THE DETECTION OF SINGLE NUCLEOTIDE VARIANTS AND INDELS FROM CELL-FREE DNA

2016.

0

B. McConeghy, K. Beja, A. Haegert, R. Bell, Yen-Yi Lin, Ibrahim Numanagić

Successful development and application of precision oncology approaches require robust elucidation of the genomic landscape of a patient's cancer and the ability to monitor therapy-induced genomic changes in the tumour in an inexpensive and minimally invasive manner. Thanks to recent advances in sequencing technologies, "liquid biopsy", the sampling of a patient's bodily fluids such as blood, is considered one of the most promising approaches to achieve this goal. In many cancer patients, especially those with advanced metastatic disease, deep sequencing of cell-free DNA (cfDNA) obtained from the patient's blood yields a mixture of reads originating from normal DNA and from multiple tumour subclones; the tumour-derived fraction is called circulating tumour DNA (ctDNA). The ctDNA/cfDNA ratio and the proportion of ctDNA originating from specific tumour subclones depend on multiple factors, making comprehensive detection of mutations difficult, especially at early stages of cancer. We introduce SiNVICT, a computational method for the analysis of cfDNA sequencing data.

Keywords: Cancer genomics, SNV calling, cell-free DNA

Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data

10. 6. 2015.

39

Ibrahim Numanagić, S. Malikić, V. Pratt, T. Skaar, D. Flockhart, S. C. Sahinalp

Bioinform.

Motivation: CYP2D6 is a highly polymorphic gene that encodes the CYP2D6 enzyme, which is involved in the metabolism of 20–25% of all clinically prescribed drugs and other xenobiotics in the human body. CYP2D6 genotyping is recommended prior to treatment decisions involving one or more of the numerous drugs sensitive to CYP2D6 allelic composition. In this context, high-throughput sequencing (HTS) technologies provide a promising time-efficient and cost-effective alternative to currently used genotyping techniques. To achieve accurate interpretation of HTS data, however, one needs to overcome several obstacles: high sequence similarity and genetic recombinations between CYP2D6 and the evolutionarily related pseudogenes CYP2D7 and CYP2D8, high copy number variation among individuals, and the short read lengths generated by HTS technologies. Results: In this work, we present the first algorithm to computationally infer CYP2D6 genotype at basepair resolution from HTS data. Our algorithm is able to resolve complex genotypes, including alleles that are the products of duplication, deletion and fusion events involving CYP2D6 and its evolutionarily related cousin CYP2D7. Through extensive experiments using simulated and real datasets, we show that our algorithm accurately solves this important problem with potential clinical implications. Availability and implementation: Cypiripi is available at http://sfu-compbio.github.io/cypiripi. Contact: cenk@sfu.ca.

DeeZ: reference-based compression by local assembly

30. 10. 2014.

46

Faraz Hach, Ibrahim Numanagić, S. C. Sahinalp

Nature Methods

ORMAN: Optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms

1. 3. 2014.

16

Phuong Dao, Ibrahim Numanagić, Yen-Yi Lin, Faraz Hach, E. Karakoç, Nilgun Donmez, C. Collins, E. Eichler et al.

Bioinform.

Motivation: RNA-Seq technology promises to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: more than 17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels, towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. Results: On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. Availability: ORMAN is available at http://orman.sf.net
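
The "minimum number of transcripts" objective described above resembles a set-cover problem, so a greedy approximation gives a rough feel for it. The following is only an illustrative sketch and not ORMAN's actual formulation: the function names, the toy read and transcript identifiers, and the rule of assigning each read to the first chosen transcript that contains it are all assumptions made for this example (ORMAN itself assigns reads using estimated coverage distributions).

```python
def greedy_transcript_cover(candidates):
    """Pick a small set of transcripts that explains every read.

    candidates: dict mapping a read id to the set of transcript ids it maps to.
    Uses the classic greedy set-cover heuristic (hypothetical stand-in,
    not ORMAN's own optimization).
    """
    # Invert the mapping: transcript id -> reads it could explain.
    by_transcript = {}
    for read, transcripts in candidates.items():
        for t in transcripts:
            by_transcript.setdefault(t, set()).add(read)

    uncovered = set(candidates)
    chosen = []
    while uncovered:
        # Take the transcript explaining the most still-uncovered reads;
        # sorting first makes tie-breaking deterministic.
        best = max(sorted(by_transcript),
                   key=lambda t: len(by_transcript[t] & uncovered),
                   default=None)
        if best is None or not (by_transcript[best] & uncovered):
            raise ValueError("some reads have no candidate transcript")
        chosen.append(best)
        uncovered -= by_transcript[best]
    return chosen


def assign_reads(candidates, chosen):
    """Assign each read to the first chosen transcript it maps to (a simplification)."""
    return {read: next(t for t in chosen if t in transcripts)
            for read, transcripts in candidates.items()}


# Toy input: five reads, three candidate transcripts (all names are made up).
candidates = {
    "r1": {"T1"},
    "r2": {"T1", "T2"},
    "r3": {"T2", "T3"},
    "r4": {"T3"},
    "r5": {"T1", "T3"},
}
chosen = greedy_transcript_cover(candidates)
assignment = assign_reads(candidates, chosen)
print(chosen)
print(assignment)
```

In this toy input, transcript T2 is never needed: every read that maps to it is also explained by T1 or T3, which is the kind of redundancy a minimum-transcript objective removes.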

Boosting high throughput sequencing data compression algorithms using reordering

19. 3. 2013.

1

Ibrahim Numanagić

High-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Currently, most HTS data are compressed through general-purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by HTS platforms, as they do not take advantage of the specific nature of genomic sequence data. Here we present SCALCE, a "boosting" scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Our tests indicate that SCALCE significantly improves the compression rate and running time of gzip. We also show that the reordering problem can be considered an instance of the set-cover problem, and that Locally Consistent Parsing is, in practice, as good as the best known approximation for the set-cover problem.

Keywords: FASTQ, Genome Sequence Compression, High Throughput Sequencing Technology, Lempel-Ziv Techniques, Locally Consistent Parsing, Boosting

SCALCE: boosting sequence compression algorithms using locally consistent encoding

1. 12. 2012.

153

Faraz Hach, Ibrahim Numanagić, C. Alkan, S. C. Sahinalp

Bioinform.

Motivation: High-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general-purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis, provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a "boosting" scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE-reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the combined running time of SCALCE + gzip improves on that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, using the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to a 3.34-fold improvement in the compression rate and a 1.26-fold improvement in running time. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
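
The reordering idea in the abstract above can be illustrated with a small experiment. The sketch below is not SCALCE: in place of Locally Consistent Parsing it uses the lexicographically smallest k-mer of each read (a minimizer) as a stand-in "core", and the reads are simulated from a random genome. All function names and parameters are assumptions chosen for this example. The point it shows is that DEFLATE, the algorithm behind gzip and zlib, can only match against a 32 KB sliding window, so placing overlapping reads next to each other gives it more repeats to exploit.

```python
import random
import zlib


def minimizer(read, k=12):
    """Lexicographically smallest k-mer of the read (stand-in for a SCALCE core)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads, k=12):
    """Sort reads by minimizer so that overlapping reads become neighbours."""
    return sorted(reads, key=lambda r: (minimizer(r, k), r))


def compressed_size(reads):
    """Bytes needed by zlib (DEFLATE, level 6) for the newline-joined reads."""
    return len(zlib.compress("\n".join(reads).encode("ascii"), 6))


def simulate_reads(genome_len=200_000, n_reads=10_000, read_len=100, seed=0):
    """Sample fixed-length, error-free reads from a random genome (toy data)."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


reads = simulate_reads()
original_size = compressed_size(reads)
reordered_size = compressed_size(reorder_reads(reads))
print("original order: ", original_size, "bytes")
print("reordered reads:", reordered_size, "bytes")
```

The simulated reads have no sequencing errors and no quality scores, so the gain seen here should not be read as an estimate of the factors reported in the abstract.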
