"Big" data analytics: projects for ISyE 6416

Prof. Yao Xie, Georgia Tech, Spring 2016

Welcome to the website for ISyE 6416, ISyE, Georgia Tech. We will post our projects on data analytics here. The goal of the project is to connect the course material to real-life data analysis. More information here.

Projects

1. Better utilization of the room

by Yi Wen, Xi Yang and Xue Zhang

In this project, we use data on factors that may indicate room occupancy and apply several classification methods to develop an effective model for determining room occupancy.

2. Multi-temporal Classification of a Seasonally Flooded Wetland Using Optical Remote Sensing (Satellite) Indices

by Courtney Di Vittorio and Robert Woessner

Using Normalized Difference Water Index (NDWI) and Normalized Difference Vegetation Index (NDVI) values, we will experiment with both clustering and supervised classification techniques. We can classify images individually for one instance in time or incorporate multiple images into the classification algorithm.
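As a rough illustration of the two indices named above, the sketch below computes them per pixel from band reflectances. The abstract does not say which NDWI variant the authors use, so the green/NIR form is an assumption here, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form; an assumption here)."""
    return normalized_difference(green, nir)


# Hypothetical reflectance values for two pixels: (green, red, nir).
# The first is water-like, the second vegetation-like.
pixels = [(0.10, 0.08, 0.02), (0.08, 0.05, 0.45)]

# Each pixel becomes an (NDVI, NDWI) feature pair that a clustering or
# supervised classification method could take as input.
features = [(ndvi(nir, red), ndwi(green, nir)) for green, red, nir in pixels]
```

Stacking the feature pairs from several acquisition dates for the same pixel would be one way to form a multi-temporal input.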

3. Clustering and Classification of Handwritten Digits

by Caroline Roeger and Damon Frezza

We plan to identify the best algorithm or method for handwritten digit recognition. Handwritten digit recognition is important in a world where tasks and processes are becoming more automated. Banking, for example, is significantly more automated than it used to be.

4. Comparison of Classification Methods in Identifying Handwritten Digits

by Danielle Boccelli and William Henry Taslim

The results of our project will provide insight into each method for tech companies or computer scientists who are looking for the best way to recognize handwritten words.

5. What Industry Are You?

by Simon Chow and Pravara Harati

Our project is to build an image classification system. Given the headshot of a person, our classifier will determine which industry the person is likely to belong to. This could have numerous practical applications if a successful classifier is found.

6. Music Genre Classification (Best Project Award)

by Chen Feng, Mina Georgieva and Tony Yaacoub

The aim of this project is to improve upon the accuracy of genre classification. We are considering a 10-genre classification problem with the following categories: classic pop and rock; classical; dance and electronics; folk; hip-hop; jazz and blues; metal; pop; punk; soul and reggae. The features we will use for classification are timbre, tempo and loudness information.

7. HMM Model in Economic Recession Identification

by Ting Sun, Shiyu Zheng and Zhe Liu

In this project, we develop an extension of the Hidden Markov Model (HMM) to find the hidden economic states of the US behind the data. Specifically, the HMM has a Gaussian mixture at each state as the forecast generator.
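To make the idea of a Gaussian mixture at each hidden state concrete, here is a minimal sketch of forward filtering in such a model. It is not the authors' extension. The two states, all parameter values, and the observation series are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given as (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def forward_filter(observations, initial, transition, emissions):
    """Return P(state at t | observations up to t) for every t."""
    n = len(initial)
    prior = list(initial)
    filtered = []
    for x in observations:
        # Weight each state's prior by its emission density, then normalize.
        unnorm = [prior[i] * mixture_pdf(x, emissions[i]) for i in range(n)]
        total = sum(unnorm)
        posterior = [u / total for u in unnorm]
        filtered.append(posterior)
        # Propagate one step through the transition matrix.
        prior = [sum(posterior[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    return filtered


# Hypothetical two-state model: state 0 = expansion, state 1 = recession.
initial = [0.8, 0.2]
transition = [[0.9, 0.1], [0.3, 0.7]]
emissions = [
    [(0.7, 3.0, 1.0), (0.3, 1.5, 1.5)],   # expansion mixture
    [(0.6, -2.0, 1.5), (0.4, 0.0, 1.0)],  # recession mixture
]
growth = [3.1, 2.4, -1.8, -2.5]  # made-up growth rates, in percent

state_probs = forward_filter(growth, initial, transition, emissions)
```

Each entry of `state_probs` is a probability vector over the hidden states, so the most likely economic state at each time step can be read off directly.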

8. Beijing PM2.5 Time Series Analysis and Prediction using Regression and Markov Model at Different Time Scales

by Mengmeng Liu and Xin Cao

We consider two different models for the time series analysis of PM2.5 in Beijing during 2009-2015. After fitting the models, we use them to predict PM2.5 in 2016 and compare our predictions with the real data.

9. Profit-Based Classification in Customer Churn Prediction: A Case Study in the Banking Industry

by Ashkan Zakaryazad and Taewoon Kong

The word “churn” means that a customer stops using the products of one company and switches to a substitutable product from another company because of its better quality, better service, or lower price.

10. Women's Gymnastics (Best Project Award)

by Bryan Hartman, Scott Lynch and Ben Pope

For our data analysis, we focused our attention on the Artistic Women's Gymnastics events from the 2012 London Olympics. This was the most recent Summer Olympics and the first to adopt a new scoring system for women's gymnastics.

11. An Application of Clustering in Weighted Social Networks

by Junying He and Xi He

In this project, we want to apply the idea of clustering to the Enron email dataset. Our goal is to detect communities within the company, based on the email communications among the employees. We will also compare the result of our method with the graphical representation of the social network, to see if our result is reasonable.

12. Facial Recognition

by Hugo Acuna, James Netter and Mahadevan Vaidyanathan

With the rising importance of facial recognition, not only for law enforcement but also for websites like Facebook, the ability to create software that can recognize a human face is paramount. In this project, our goal is to create an effective algorithm for detecting key facial features in a picture of a person’s face.

13. Hidden Markov Model for Stock Market Index

by Ziqi Yang and Feng Gao

In this project, we use a Hidden Markov Model to infer which state the market is in, estimate the volatility of each state, and use the Black-Scholes formula for option pricing.
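The following sketch shows how a per-state volatility could feed into the standard Black-Scholes price of a European call. The state names, volatilities, and contract terms are hypothetical, and the abstract does not specify how the authors combine the HMM with the pricing formula.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical annualized volatility for each hidden market state.
state_volatility = {"calm": 0.10, "turbulent": 0.30}

# Price the same at-the-money one-year call under each state's volatility.
prices = {
    state: black_scholes_call(spot=100.0, strike=100.0, rate=0.02,
                              sigma=vol, maturity=1.0)
    for state, vol in state_volatility.items()
}
```

Because the call price increases with volatility, the option is worth more in the higher-volatility state.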

14. Baseball Predictions

by Bella Smith and Betsie Last

Which is better for predicting which baseball teams will make the playoffs: k-means clustering or LDA classification?

15. Choosing Between Branch-Based and Branch-Avoiding Algorithms for Social Network Graphs

by Chris Meixell, Eisha Nathan and Chuanping Yu

We focus on graphs that model social networks. Much research on social networks focuses on identifying important players in a graph, using some centrality metric. One such metric is the clustering coefficient, which is useful for finding key players in a network based on their local connectivity.
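For reference, the local clustering coefficient of a vertex is the fraction of pairs of its neighbors that are themselves connected. The sketch below computes it directly from that definition on a small hypothetical undirected graph. It does not reproduce either of the branch-based or branch-avoiding variants the project compares.

```python
def local_clustering(adj, v):
    """Local clustering coefficient of vertex v in an undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs, so the coefficient is taken as 0
    # Count each connected neighbor pair once by requiring u < w.
    links = sum(1 for u in neighbors for w in neighbors
                if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical four-person network: a, b, c form a triangle; d knows only a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

coefficients = {v: local_clustering(adj, v) for v in adj}
```

Here `b` and `c` have a coefficient of 1 because their two neighbors are connected, while `a` has only one connected pair among its three neighbors.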

16. Can We Know How Much Money You Make Based on Where You Went to College?

by Chris Shartrand and Toyya Pujol

Using the United States Government College Scorecard dataset, is it possible to use a combination of principal component analysis and k-means clustering to accurately predict the income of former college students, 6 and 10 years out, based on the college that they attended?

17. Motor Vehicle Accident Analysis

by Xueting Wang, Hao Wei and Hongao Yang

To sum up, we would like to find the most important factors causing vehicle accidents, and their weights, by fitting suitable models, such as logistic regression and multivariable regression, to official data from the New York Department of Motor Vehicles.

18. Clustering of NFL Quarterbacks Based on Performance Metrics Correlated with the Probability of Winning a Game

by Benjamin Peters

The objectives of this project are threefold: 1) Determine the effect of quarterback performance on the probability of winning an NFL game. 2) Based on relevant performance metrics, cluster the players into groups. The groups should reflect tiers. For example, the best quarterbacks would be in tier 1 while the worst would be in the lower tiers. 3) Based on the tier a quarterback belongs to, assess how much they should be paid.