Publication Date

Spring 2015

Document Type

Project Summary

Degree Name

Master of Science


Computer Science

First Advisor

(Clare) Xueqing Tang, Ph.D.

Second Advisor

Soon-Ok Park, Ph.D.

Third Advisor

Kong-Cheng Wong, Ph.D.


Hadoop is one of the tools designed to handle big data. Hadoop and other software products interpret or parse the results of big data searches through product-specific algorithms and methods. Hadoop is an open-source framework released under the Apache License and maintained by a global community of users. Its main components include a set of MapReduce functions and the Hadoop Distributed File System (HDFS). The idea behind MapReduce is that Hadoop first maps a large data set and then performs a reduction on the mapped content to produce specific results. A reduce function can be thought of as a kind of filter that condenses raw data.
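The map-then-reduce idea described above can be illustrated with a minimal word-count sketch. It runs in memory on a single machine and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In an actual Hadoop job, the framework would distribute the map and reduce steps across a cluster and perform the grouping step itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: condense all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data", "Big Hadoop data"]
    print(word_count(sample))
```

In this sketch the reduce step sums the values for each key, which shows how a reduction condenses many mapped records into a smaller result.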

HDFS then distributes data across a network or migrates it as necessary. The term "Hadoop" often refers not just to the base modules above but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others. Prominent corporate users of Hadoop include Facebook and Yahoo. Hadoop can be deployed in traditional onsite datacenters as well as in the cloud; for example, it is available on Microsoft Azure, Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3), Google App Engine, and IBM Bluemix cloud services.

In this paper, we identify and describe the major factors by which the Hadoop approach improves access to large data sets ("big data") to meet rapidly changing business environments. We also briefly compare Hadoop techniques with those of traditional systems and discuss the current state of Hadoop adoption. We speculate that Hadoop has emerged as an alternative to traditional methods out of the need to satisfy customers in a timely manner. The purpose of this paper is to provide an in-depth understanding of the major benefits of the Hadoop approach to data access, as well as a study report on the importance of Hadoop in the present scenario.


Co-authored capstone with authors listed in alphabetical order by OPUS staff.