Publication Date
Fall 2022
Document Type
Capstone Project
Degree Name
Master of Science
Department
Computer Science
First Advisor
Yunchuan Liu
Second Advisor
Xin (Jasmine) Chen
Third Advisor
Dae Wook Kim
Abstract
The Airline Search Engine Project is a tool that helps users find facts and data about airlines and airports. The raw data set for this project is available in .dat format and can be downloaded from [1].
The tool may also perform an initial cleaning of the data where this is needed to form dimensional data. The cleaning process includes data value unification, data type and size unification, deduplication, dropping columns, and correcting known errors.
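The cleaning steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the column layout in COLUMNS, the dropped column, the "missing value" tokens, and the function name clean_rows are all assumptions made for the example, since the abstract does not give the .dat schema.

```python
import csv
import io

# Hypothetical column layout; the real .dat schema is not given in the abstract.
COLUMNS = ["id", "name", "city", "country", "code", "altitude", "notes"]
DROP = {"notes"}                     # column assumed irrelevant to the dimension
NULL_TOKENS = {"", "\\N", "N/A"}     # assumed spellings of "missing"


def to_int(value):
    """Convert a field to int, returning None when it is missing or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean_rows(raw_text):
    """Return cleaned, deduplicated records from comma-separated .dat text."""
    seen_ids = set()
    cleaned = []
    for fields in csv.reader(io.StringIO(raw_text)):
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines
        record = {}
        for col, value in zip(COLUMNS, fields):
            if col in DROP:
                continue  # dropping columns
            value = value.strip()
            # Data value unification: one representation for "missing".
            record[col] = None if value in NULL_TOKENS else value
        # Data type unification: numeric fields become integers.
        record["id"] = to_int(record["id"])
        record["altitude"] = to_int(record["altitude"])
        if record["id"] is None:
            continue  # a record without a usable key cannot be kept
        # Value unification: codes are stored in upper case.
        if record["code"] is not None:
            record["code"] = record["code"].upper()
        # Deduplication on the key column; the first occurrence wins.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        cleaned.append(record)
    return cleaned


SAMPLE = '''1,"Goroka Airport","Goroka","Papua New Guinea","gka",5282,"x"
1,"Goroka Airport","Goroka","Papua New Guinea","GKA",5282,"x"
2,"Madang Airport","Madang","Papua New Guinea","MAG",\\N,"y"
'''

rows = clean_rows(SAMPLE)
```

The same steps map naturally onto Spark DataFrame operations when the data no longer fits on one machine; the standard-library version is used here only to keep the sketch self-contained.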
The data will be processed with Python and Spark. The data can be stored in distributed storage systems such as Hadoop and Amazon S3. The development environments used in this project are tools such as Google Colab and PyCharm.
This tool can be run as a job on different cluster platforms such as EMR (Elastic MapReduce), HDInsight, Cloudera, and Databricks. It can turn terabytes of raw data into useful information, from which we can create reports for data analysts, data scientists, and businesspeople.
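As a small illustration of the kind of report mentioned above, the sketch below counts airports per country from already cleaned records. The record shape (a dict with a "country" key) and the function name airports_per_country are assumptions made for this example; on a cluster the same aggregation would more likely be written as a Spark group-by.

```python
from collections import Counter


def airports_per_country(records):
    """Return (country, count) pairs, most frequent country first."""
    counts = Counter(r["country"] for r in records if r.get("country"))
    return counts.most_common()


# Hypothetical cleaned records, for illustration only.
records = [
    {"name": "Goroka Airport", "country": "Papua New Guinea"},
    {"name": "Madang Airport", "country": "Papua New Guinea"},
    {"name": "Akureyri Airport", "country": "Iceland"},
    {"name": "Unknown Strip", "country": None},
]

report = airports_per_country(records)
```

Records with no country are left out of the count rather than grouped under a placeholder, which keeps the report limited to values that are actually present in the data.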
Recommended Citation
Poldasu, Jayachandra, "Airline Search Engine Project" (2022). All Capstone Projects. 576.
https://opus.govst.edu/capstones/576