"Big" data analytics: projects for ISyE 6416
Welcome to the website for ISyE 6416, ISyE, Georgia Tech. We will post our data analytics projects here. The goal of each project is to connect the course material to real-life data analysis. More information here.
Projects
1. Better utilization of the room
by Yi Wen, Xi Yang and Xue Zhang
In this project, we use data on factors that may determine room occupancy and apply several classification methods to develop an effective model for room occupancy determination.
Proposal
Final Report
2. Multi-temporal Classification of a Seasonally Flooded Wetland Using Optical Remote Sensing (Satellite) Indices
by Courtney Di Vittorio and Robert Woessner
Using the NDWI and NDVI values, we will experiment with both clustering and
supervised classification techniques. We can classify images individually for one
instance in time or incorporate multiple images into the classification algorithm.
Proposal
Final Report
3. Clustering and Classification of Handwritten Digits
by Caroline Roeger and Damon Frezza
We plan to identify the best algorithm or method for handwritten digit recognition. Handwritten
digit recognition is important in a world where tasks and processes are becoming more
automated. Banking, for example, is significantly more automated than it used to be.
Proposal
Final Report
4. Comparison of Classification Methods in Identifying Handwritten Digits
by Danielle Boccelli and William Henry Taslim
The results of our project will provide insight into each method
for tech companies or computer scientists who are looking for
the best way to recognize handwritten words.
Proposal
Final Report
5. What Industry Are You?
by Simon Chow and Pravara Harati
Our project is to build an image classification system. Given the headshot of a
person, our classifier will determine which industry the person is likely to belong to. This
could have numerous practical applications if a successful classifier is found.
Proposal
Final Report
6. Music Genre Classification (Best Project Award)
by Chen Feng, Mina Georgieva and Tony Yaacoub
The aim of this project is to improve upon the accuracy of genre classification. We are
considering a 10-genre classification problem with the following categories: classic pop and rock;
classical; dance and electronics; folk; hip-hop; jazz and blues; metal; pop; punk; soul and
reggae. The features we will use for classification are timbre, tempo and loudness information.
Proposal
Final Report
7. HMM Model in Economic Recession Identification
by Ting Sun, Shiyu Zheng and Zhe Liu
In this project, we develop an extension of the Hidden Markov Model (HMM) to find the hidden economic states of the US behind the data. Specifically, the HMM has a Gaussian mixture at each state as the forecast generator.
Proposal
Final Report
8. Beijing PM2.5 Time Series Analysis and Prediction using Regression and Markov Model at Different Time Scales
by Mengmeng Liu and Xin Cao
We consider two different models for time series analysis of PM2.5 in
Beijing during 2009-2015. After fitting the models, we use them to predict PM2.5 in 2016
and compare our predictions with the real data.
Proposal
Final Report
9. Profit-based classification in customer churn prediction: a case study in banking industry
by Ashkan Zakaryazad and Taewoon Kong
The word “churn” means to stop using the products of a specific
company and switch to a fungible product from another company because of its better quality
or service or lower price.
Proposal
Final Report
10. Women's Gymnastics (Best Project Award)
by Bryan Hartman, Scott Lynch and Ben Pope
For our data analysis, we focused our attention on the Artistic Women's Gymnastics events from
the 2012 London Olympics. This was the most recent Summer Olympics and the first to adopt a new
scoring system for women's gymnastics.
Proposal
Final Report
11. An Application of Clustering in Weighted Social Networks
by Junying He and Xi He
In this project, we apply the idea of clustering to the Enron email dataset. Our goal is
to detect communities within the company based on the email communications among the
employees. We will also compare the result of our method with the graphical representation of
the social network to see if our result is reasonable.
Proposal
Final Report
12. Facial Recognition
by Hugo Acuna, James Netter and Mahadevan Vaidyanathan
With the rising importance of facial recognition, not only for law enforcement but also for
websites like Facebook, the ability to create software that can recognize a human
face is paramount. In this project, our goal is to create an effective algorithm for detecting key
facial features in a picture of a person’s face.
Proposal
Final Report
13. Hidden Markov Model for Stock Market Index
by Ziqi Yang and Feng Gao
In this project, we use a Hidden Markov Model to analyze which state the market is in, estimate the volatility of each state, and apply the Black-Scholes formula for option pricing.
Proposal
Final Report
14. Baseball Predictions
by Bella Smith and Betsie Last
Which is better for predicting which baseball teams will make the playoffs: k-means
clustering or LDA classification?
Proposal
Final Report
15. Choosing Between Branch-Based and Branch-Avoiding Algorithms for Social Network Graphs
by Chris Meixell, Eisha Nathan and Chuanping Yu
We focus on graphs that model social networks. Much research on social networks focuses on identifying important players in a graph using some centrality metric. One such metric is the clustering coefficient, which is useful for finding key players in a network based on their local connectivity.
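As a rough illustration of the metric named in this abstract (not the authors' implementation, and unrelated to their branch-based versus branch-avoiding comparison), the local clustering coefficient of a vertex can be sketched as the fraction of pairs of its neighbors that are themselves connected. The toy graph and the function name local_clustering below are assumptions made for this example.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are directly connected.

    adj is an undirected graph stored as {vertex: set of neighbors}.
    Vertices with fewer than two neighbors get a coefficient of 0.0.
    """
    nbrs = adj[v] - {v}  # ignore self-loops
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count edges that exist among the neighbors of v.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Divide by the number of possible neighbor pairs, k * (k - 1) / 2.
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network: a-b-c form a triangle, d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coeffs = {v: local_clustering(adj, v) for v in adj}
```

Here b and c sit inside a fully connected neighborhood, while a bridges the triangle and the otherwise isolated vertex d, so a has a lower coefficient than b or c.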
Proposal
Final Report
16. Can We Know How Much Money You Make Based on Where You Went to College?
by Chris Shartrand and Toyya Pujol
Using the United States Government College Scorecard dataset, is it possible to use a
combination of principal component analysis and k-means clustering to accurately predict the income
of former college students, 6 and 10 years out, based on the college that they attended?
Proposal
Final Report
17. Motor Vehicle Accident Analysis
by Xueting Wang, Hao Wei and Hongao Yang
We would like to find the most important factors causing vehicle
accidents, and their weights, by fitting suitable models, such as
logistic regression and multivariable regression, to
official data from the New York Department of Motor Vehicles.
Proposal
Final Report
18. Clustering of NFL Quarterbacks Based on Performance Metrics Correlated with the Probability of Winning a Game
by Benjamin Peters
The objectives of this project are threefold:
1) Determine the effect of quarterback performance on the probability of
winning an NFL game.
2) Based on relevant performance metrics, cluster the players into groups. The
groups should reflect tiers. For example, the best quarterbacks would be in
tier 1 while the worst would be in the lower tiers.
3) Based on which tier a quarterback belongs to, assess the amount of money he
should be paid.
Proposal
Final Report