Marko Lalic, Emina Memic, Faruk Kesan, Edita Gondzic, N. Smajic, N. Nosovic
2 20. 5. 2013.

Comparison of a sequential and a MapReduce approach to joining large datasets

MapReduce is widely regarded as one of the most significant advances in the parallel processing of massive datasets. The growing volume of data being processed and stored has created a need for more efficient solutions to common problems, one of which is performing a join operation on two related datasets. This paper compares a classic sequential solution to this problem with a MapReduce approach, with the aim of identifying the relative advantages of each. The sequential application is shown to be prohibitively slow even for datasets that are small by today's standards. Furthermore, a MapReduce cluster of five Amazon EC2 nodes is shown to process ten times as much data as the sequential application in the same amount of time.
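
The abstract does not describe either implementation, so the following is only a minimal in-process sketch of the two styles of join being compared. It assumes a naive nested-loop join as one plausible sequential baseline and a reduce-side (repartition) join as one common MapReduce formulation. All function names and the sample records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def nested_loop_join(left, right):
    """Assumed sequential baseline: compare every left record with every right record."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def map_phase(left, right):
    """Emit (key, tagged value) pairs so the reducer can tell the two sources apart."""
    for key, value in left:
        yield key, ("L", value)
    for key, value in right:
        yield key, ("R", value)


def shuffle(pairs):
    """Group mapped values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values):
    """Pair every left value with every right value that shares this key."""
    lefts = [v for tag, v in tagged_values if tag == "L"]
    rights = [v for tag, v in tagged_values if tag == "R"]
    return [(key, lv, rv) for lv in lefts for rv in rights]


def mapreduce_join(left, right):
    """Run map, shuffle and reduce in one process; a real cluster spreads keys over nodes."""
    joined = []
    for key, tagged_values in shuffle(map_phase(left, right)).items():
        joined.extend(reduce_phase(key, tagged_values))
    return joined


# Hypothetical sample data: (user_id, name) and (user_id, item).
users = [(1, "ana"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "hat")]
```

Both functions produce the same inner join on this sample. The nested loop performs a comparison for every pair of records, whereas the reduce-side join only pairs records that already share a key, and each key group can be handled by a different node.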

