Publication Date

Fall 2022

Document Type

Capstone Project

Degree Name

Master of Science

Department

Computer Science

First Advisor

Yunchuan Liu

Second Advisor

Xin (Jasmine) Chen

Third Advisor

Dae Wook Kim

Abstract

The Airline Search Engine Project is a tool that helps anyone find facts and data about airlines and airports. The raw data set for this project is available in .dat format and can be downloaded from [1].
The tool may also perform an initial cleaning of the data where this is needed to form dimensional data. The cleaning process includes data value unification, data type and size unification, deduplication, dropping columns, and correcting known errors.
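The cleaning steps listed above could look roughly like the following minimal sketch in plain Python. The column names, the sample rows, the `\N` missing-value marker, and the `KNOWN_FIXES` table are all assumptions made for illustration, since the layout of the .dat files is not described here. Size unification is not shown.

```python
import csv
import io

# Hypothetical column layout; the real .dat schema is not given in this abstract.
COLUMNS = ["airport_id", "name", "city", "country", "iata", "altitude_ft"]
DROP_COLUMNS = {"city"}                      # columns assumed not to be needed
INT_COLUMNS = {"airport_id", "altitude_ft"}  # columns unified to integers
# Hypothetical table of known errors: (column, bad value) -> corrected value.
KNOWN_FIXES = {("country", "PNG"): "Papua New Guinea"}

# Made-up sample rows; "\N" is assumed to mark a missing value.
RAW = r"""1,Goroka Airport,Goroka,Papua New Guinea,GKA,5282
1,Goroka Airport,Goroka,Papua New Guinea,GKA,5282
2, Madang Airport ,Madang,PNG,MAG,20
3,Unknown Strip,\N,Papua New Guinea,\N,\N
"""


def clean_rows(text):
    """Return cleaned rows as a list of dicts, one per unique airport_id."""
    seen_ids = set()
    cleaned = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        row = {}
        for col, value in zip(COLUMNS, fields):
            value = value.strip()                         # value unification
            if value == r"\N":
                value = None                              # unify missing values
            value = KNOWN_FIXES.get((col, value), value)  # correct known errors
            if col in INT_COLUMNS and value is not None:
                value = int(value)                        # data type unification
            row[col] = value
        if row["airport_id"] in seen_ids:
            continue                                      # deduplication
        seen_ids.add(row["airport_id"])
        # Drop the columns that are not needed.
        cleaned.append({k: v for k, v in row.items() if k not in DROP_COLUMNS})
    return cleaned


rows = clean_rows(RAW)
```

At the scale described in this abstract, the same steps would more likely be expressed as Spark transformations. The plain-Python version is only meant to make each step concrete.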
The data will be processed with Python and Spark and can be stored in distributed storage systems such as Hadoop and Amazon S3. The Integrated Development Environments (IDEs) used in this project are tools such as Google Colab and PyCharm.
This tool can be run as a job on different clusters such as EMR (Elastic MapReduce), HDInsight, Cloudera, and Databricks. It can turn terabytes of raw data into useful information, from which we can create reports for data analysts, data scientists, and businesspeople.
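As a rough illustration of the kind of report mentioned above, the sketch below counts airports per country. The records and field names are hypothetical, and a real job would run a comparable aggregation on the cluster rather than in memory.

```python
from collections import Counter

# Hypothetical cleaned records; the field names are assumptions for illustration.
airports = [
    {"name": "Goroka Airport", "country": "Papua New Guinea"},
    {"name": "Madang Airport", "country": "Papua New Guinea"},
    {"name": "Keflavik International Airport", "country": "Iceland"},
]


def airports_per_country(records):
    """Return (country, count) pairs, largest count first."""
    counts = Counter(record["country"] for record in records)
    return counts.most_common()


report = airports_per_country(airports)
for country, count in report:
    print(f"{country}: {count}")
```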
