Publication Date

Summer 2017

Document Type


Degree Name

Master of Science



First Advisor

Andrius Tamulis, Ph.D.

Second Advisor

Anne Morlet, Ph.D.

Third Advisor

Jing Zhang, Ph.D.


People have built and sold houses since ancient times. But just how much is a house worth? The challenge is to use information about a house, such as its location and the area on which it is built, to predict its price. Such predictions matter to every participant in the real estate business, whether an agent, a buyer, a seller, or a bank, because they support informed decisions and the profits that come with them. A company's success, and with it its profitability, depends on how accurately it can forecast financial outcomes. The goal of this thesis is to demonstrate how to use the forecasting tools of the software R to forecast house prices. To achieve this, we use random forests, correlation plots, and scatter plots to select the variables to include in a model, which we build using the information in one data set (the training data set) and then test on another (the test data set). Next, we explore the relationships between these variables and decide whether it is appropriate to build a linear model (lm) or a generalized linear model (glm). Finally, we build our model on the training data set, taking care to avoid an overly complex or overfit model. Noting that our model suffers from unconditional heteroskedasticity, we discuss its goodness of fit. We then use the model to predict sale prices for the points in the test data set.