"Big" data analytics: projects for ISyE 6416

Prof. Yao Xie, Georgia Tech, Spring 2016

Welcome to the website for ISyE 6416, ISyE, Georgia Tech. We will post our projects on data analytics here. The goal of the project is to connect the course material to real-life data analysis. More information here.

Projects

1. Better utilization of the room

by Yi Wen, Xi Yang and Xue Zhang
  • In this project, we use data on factors that may determine room occupancy and apply several classification methods to develop an effective model for room occupancy determination.
  • Proposal
  • Final Report
2. Multi-temporal Classification of a Seasonally Flooded Wetland Using Optical Remote Sensing (Satellite) Indices

by Courtney Di Vittorio and Robert Woessner
  • Using the NDWI and NDVI values, we will experiment with both clustering and supervised classification techniques. We can classify images individually for one instance in time or incorporate multiple images into the classification algorithm.
  • Proposal
  • Final Report
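
For reference, the two indices named in the description above are usually defined as normalized band differences. The sketch below assumes the common definitions NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR) (the McFeeters form); the project reports may use a different NDWI variant. The function names and reflectance values are illustrative assumptions, not taken from the project.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (McFeeters form) for one pixel."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectances for two pixels (made-up values).
pixels = {
    "vegetation": {"green": 0.10, "red": 0.05, "nir": 0.45},
    "water": {"green": 0.08, "red": 0.05, "nir": 0.02},
}

for name, px in pixels.items():
    print(name,
          "NDVI:", round(ndvi(px["nir"], px["red"]), 3),
          "NDWI:", round(ndwi(px["green"], px["nir"]), 3))
```

Per-pixel index values like these could then serve as features for either clustering or supervised classification, one image at a time or stacked across several dates.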
3. Clustering and Classification of Handwritten Digits

by Caroline Roeger and Damon Frezza
  • We plan to identify the best algorithm or method for handwritten digit recognition. Handwritten digit recognition is important in a world where tasks and processes are becoming more automated. Banking, for example, is significantly more automated than it used to be.
  • Proposal
  • Final Report
4. Comparison of Classification Methods in Identifying Handwritten Digits

by Danielle Boccelli and William Henry Taslim
  • The results of our project will provide insight into each method for tech companies or computer scientists who are looking for the best way to recognize handwritten words.
  • Proposal
  • Final Report
5. What Industry Are You?

by Simon Chow and Pravara Harati
  • Our project is to build an image classification system. Given the headshot of a person, our classifier will determine which industry the person is likely to belong to. This could have numerous practical applications if a successful classifier is found.
  • Proposal
  • Final Report
6. Music Genre Classification (Best Project Award)

by Chen Feng, Mina Georgieva and Tony Yaacoub
  • The aim of this project is to improve upon the accuracy of genre classification. We are considering a 10-genre classification problem with the following categories: classic pop and rock; classical; dance and electronics; folk; hip-hop; jazz and blues; metal; pop; punk; soul and reggae. The features we will use for classification are timbre, tempo and loudness information.
  • Proposal
  • Final Report
7. HMM Model in Economic Recession Identification

by Ting Sun, Shiyu Zheng and Zhe Liu
  • In this project, we develop an extension of the Hidden Markov Model (HMM) to find the hidden economic states of the US behind the data. Specifically, the HMM has a Gaussian mixture at each state as the forecast generator.
  • Proposal
  • Final Report
8. Beijing PM2.5 Time Series Analysis and Prediction using Regression and Markov Model at Different Time Scales

by Mengmeng Liu and Xin Cao
  • We consider two different models for the time series analysis of PM2.5 in Beijing during 2009-2015. After fitting the models, we use them to predict PM2.5 in 2016 and compare our predictions with the real data.
  • Proposal
  • Final Report
9. Profit-based classification in customer churn prediction: a case study in banking industry

by Ashkan Zakaryazad and Taewoon Kong
  • The word “churn” means that a customer stops using the products of one company and switches to an interchangeable product from another company because of better quality, better service, or a lower price.
  • Proposal
  • Final Report
10. Women's Gymnastics (Best Project Award)

by Bryan Hartman, Scott Lynch and Ben Pope
  • For our data analysis, we focused our attention on the Women's Artistic Gymnastics events from the 2012 London Olympics. This was the most recent Summer Olympics and the first to adopt a new scoring system for women's gymnastics.
  • Proposal
  • Final Report
11. An Application of Clustering in Weighted Social Networks

by Junying He and Xi He
  • In this project, we apply the idea of clustering to the Enron email dataset. Our goal is to detect communities within the company based on the email communications among the employees. We will also compare the result of our method with the graphical representation of the social network to see if our result is reasonable.
  • Proposal
  • Final Report
12. Facial Recognition

by Hugo Acuna, James Netter and Mahadevan Vaidyanathan
  • With the rising importance of facial recognition, not only for law enforcement but also for websites like Facebook, the ability to create software that can recognize a human face is paramount. In this project, our goal is to create an effective algorithm for detecting key facial features in a picture of a person’s face.
  • Proposal
  • Final Report
13. Hidden Markov Model for Stock Market Index

by Ziqi Yang and Feng Gao
  • In this project, we use a Hidden Markov Model to infer which state the market is in, estimate the volatility of each state, and apply the Black-Scholes formula for option pricing.
  • Proposal
  • Final Report
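
To illustrate the last step of the description above, here is a minimal sketch of the standard Black-Scholes price of a European call, evaluated once per market state. The state names and volatility values are hypothetical placeholders, not results from the project; in the project, the per-state volatilities would come from the fitted Hidden Markov Model.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical annualized volatilities for two hidden market states.
state_volatility = {"calm": 0.10, "turbulent": 0.30}

for state, sigma in state_volatility.items():
    price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05,
                               sigma=sigma, maturity=1.0)
    print(state, round(price, 2))
```

Because the call price increases with volatility, the same option is worth more in the high-volatility state, which is why identifying the current state matters for pricing.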
14. Baseball Predictions

by Bella Smith and Betsie Last
  • Which is better for predicting which baseball teams will make the playoffs: k-means clustering or LDA classification?
  • Proposal
  • Final Report
15. Choosing Between Branch-Based and Branch-Avoiding Algorithms for Social Network Graphs

by Chris Meixell, Eisha Nathan and Chuanping Yu
  • We focus on graphs that model social networks. Much research on social networks focuses on identifying important players in a graph using some centrality metric. One such metric is the clustering coefficient, which is useful for finding key players in a network based on their local connectivity.
  • Proposal
  • Final Report
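
As a reference for the metric named above, the sketch below computes the local clustering coefficient of a vertex in an undirected graph: the fraction of pairs of its neighbors that are themselves connected. The adjacency-set representation and the tiny example graph are assumptions made for illustration; this is the plain definition, not the branch-based or branch-avoiding implementations the project compares.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Local clustering coefficient of vertex v.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Vertices with fewer than two neighbors get a coefficient of 0.0.
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges that exist between pairs of neighbors of v.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical four-person network: a, b, c form a triangle; d only knows a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for vertex in sorted(graph):
    print(vertex, round(local_clustering(graph, vertex), 3))
```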
16. Can We Know How Much Money You Make Based on Where You Went to College?

by Chris Shartrand and Toyya Pujol
  • Using the United States Government College Scorecard dataset, is it possible to use a combination of principal component analysis and k-means clustering to accurately predict the income of former college students, 6 and 10 years out, based on the college that they attended?
  • Proposal
  • Final Report
17. Motor Vehicle Accident Analysis

by Xueting Wang, Hao Wei and Hongao Yang
  • We would like to find the most important factors causing vehicle accidents, and their weights, by fitting suitable models, such as logistic regression and multivariable regression, to official data from the New York Department of Motor Vehicles.
  • Proposal
  • Final Report
18. Clustering of NFL Quarterbacks Based on Performance Metrics Correlated with the Probability of Winning a Game

by Benjamin Peters
  • The objectives of this project are threefold: 1) Determine the effect of quarterback performance on the probability of winning an NFL game. 2) Based on relevant performance metrics, cluster the players into groups that reflect tiers; for example, the best quarterbacks would be in tier 1 while the worst would be in the lower tiers. 3) Based on the tier a quarterback belongs to, assess how much they should be paid.
  • Proposal
  • Final Report