Comparison of a sequential and a MapReduce approach to joining large datasets
The MapReduce programming model is regarded as one of the most significant advances in large-scale data processing, owing to its use of parallelization. The growing volume of data being processed and stored has created a need to investigate more efficient solutions to common problems, one of which is performing a join operation on two interconnected datasets. In this paper, a classic sequential solution to this problem is compared with a MapReduce approach, with the intent of discovering the relative advantages of each. The sequential application's runtime is shown to be prohibitively slow even for datasets of sizes that are negligible by today's standards. Furthermore, a MapReduce cluster of five Amazon EC2 nodes is shown to process, in the same time period, ten times as much data as the sequential application.
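The core idea behind the MapReduce join evaluated here (a reduce-side join) can be sketched as a minimal single-process simulation. This is an illustrative assumption about the technique, not the paper's actual implementation; the dataset names, records, and function names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical sample records: (join_key, value) pairs from two datasets.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

def map_phase(users, orders):
    # Tag each record with its source dataset so the reducer
    # can tell user records apart from order records.
    for k, v in users:
        yield k, ("U", v)
    for k, v in orders:
        yield k, ("O", v)

def shuffle(pairs):
    # The framework's shuffle step: group all tagged values by join key.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # For each key, emit the cross product of user and order values:
    # this is exactly the joined output for that key.
    for k, values in sorted(groups.items()):
        user_vals = [v for tag, v in values if tag == "U"]
        order_vals = [v for tag, v in values if tag == "O"]
        for u in user_vals:
            for o in order_vals:
                yield k, u, o

result = list(reduce_phase(shuffle(map_phase(users, orders))))
# → [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'lamp')]
```

In a real cluster, the map and reduce phases run in parallel across nodes and the shuffle moves records over the network, which is where the speedup over a sequential scan comes from.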