"Big" data analytics: projects for ISyE 2028
Welcome to the website for ISyE 2028, ISyE, Georgia Tech. We will post our projects on "Big" data analytics here. The goal of the project is to connect the course material to real-life data analysis.
Big data is about the volume, variety, and velocity of data. But it is also about what we can do with the data and how we do it: extracting information that people were not previously able to find in a piece of data, or demonstrating a novel use of data to achieve something new. In this project, we will perform some "big" data analysis using methods you learned in class. The dataset you use does not have to be really large, but you should come up with a novel way of using data. More information here.
Projects
1. Price of used car on craigslist
by John Barbour
Build an application to determine the distribution of prices for used cars. The software written for this project is somewhat similar to the script here: http://chrisholdgraf.com/querying-craigslist-with-python/
Proposal
Presentation
Final Report
2. Monitoring ORP for GT swimming pool
by Morgan Jacobus
I work at the CRC as a pool technician. Basically, my job is to maintain the pools, including the chemical levels. Every day, we must go into the pump rooms and record measurements of pH, temperature, chlorine, etc. One of the measurements is ORP, otherwise known as oxidation reduction potential. This measures the pool's ability to fight off bacteria and other harmful agents. For each pool, there is a set point for what the ORP should be. My project would be to collect recorded data and compute a confidence interval to see if it contains the set point at which the ORP needs to be. I am excited for this project because it has real-world application to my job here at Georgia Tech.
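A minimal sketch of the interval check described in this project, assuming made-up ORP readings (in millivolts) and a hypothetical set point of 750 mV; neither value comes from the CRC logs. It uses a large-sample normal approximation from Python's standard library. With only a handful of readings, a t critical value would be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(samples)
    center = mean(samples)
    # Standard error of the sample mean
    se = stdev(samples) / sqrt(n)
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * se, center + z * se


# Hypothetical ORP readings in millivolts (illustrative only)
orp_readings = [742, 755, 748, 760, 751, 746, 758, 749, 753, 747]
set_point = 750  # hypothetical set point, not a real CRC value

low, high = mean_confidence_interval(orp_readings)
print(f"95% CI for mean ORP: ({low:.1f}, {high:.1f}) mV")
print("Set point inside interval:", low <= set_point <= high)
```

If the set point falls outside the interval, the readings suggest the pool's average ORP has drifted away from its target.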
Proposal
Final Report
Presentation
3. Time spent on social media?
by Wijaya, Mario
Estimate the expected difference in the time spent on social media by male versus female students.
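One way such a difference could be estimated is with a confidence interval for the difference of two means. The sketch below assumes two independent samples with possibly unequal variances and uses a normal approximation; the survey values are invented for illustration and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_of_means_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation CI for mean(group_a) - mean(group_b)."""
    diff = mean(group_a) - mean(group_b)
    # Standard error without assuming equal variances in the two groups
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


# Hypothetical daily hours on social media (illustrative only)
hours_male = [1.5, 2.0, 2.5, 1.0, 3.0, 2.0, 1.5, 2.5]
hours_female = [2.0, 3.0, 2.5, 3.5, 2.0, 3.0, 2.5, 4.0]

low, high = diff_of_means_ci(hours_male, hours_female)
print(f"95% CI for the difference in mean hours: ({low:.2f}, {high:.2f})")
```

An interval that excludes zero would suggest a real difference between the two groups, subject to how the survey sample was collected.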
proposal
final report
presentation
4. Crime report by GT Police
by Ousmane Kaba
Study the number of crimes reported to GT Police each week in the past 24 months (10/19/2015-8/19/2015). Construct a confidence interval for the number of crimes reported to GT Police each day over the past 3 months. Here's the website to get data from: http://www.police.gatech.edu/crimeinfo/crimelogs/crimelog.html
proposal
Final Report
Presentation
5. US Veterans suicide data
by Chris Noerjadi
It was said in the news that 22 veterans commit suicide every day, or about 1 every 65 minutes. Additionally, it has also been said that the suicide rate for veterans is higher than that of their civilian counterparts, when in fact it is actually lower. I'll be gathering the data from multiple sources and constructing a confidence interval for the proportion of veteran suicides relative to the total number of suicides (veterans + civilians).
proposal
Final report
Presentation
6. US women in engineering
by Rebeccah Sharpe
There are not many women in the engineering arena, so it will be interesting to see the distribution of the undergraduate, graduate, and PhD degrees received by women from colleges across the country.
I will use a mixture of estimators, histograms, and the five-number summary, combined with other methods learned in this class, to comment on the data. I have found a large amount of data for many colleges on women in engineering programs at the undergraduate, graduate, and PhD levels from this website.
proposal
Final Report
Presentation
7. Most efficient route from Tech Square to CULC
by Hudson Lynam
The goal is to find the most efficient way to walk from Tech Square to the CULC, a common route Georgia Tech students, including myself, take daily.
I will time myself walking several (2-3) different routes from point A (Tech Square) to point B (the northern entrance to the CULC), and calculate descriptive statistics as well as construct histograms and other meaningful graphical displays of the data, all to determine which route takes the least amount of time.
My data source would be, as mentioned, my own timed walks from Tech Square to the CULC.
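The descriptive statistics mentioned in this project could be computed along the following lines. This is only a sketch: the route names and walk times (in minutes) are made up, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is one of several in common use.

```python
from statistics import mean, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical walk times in minutes (illustrative only)
walk_times = {
    "Route A": [11.2, 10.8, 11.5, 12.0, 10.9],
    "Route B": [12.4, 12.1, 11.9, 12.8, 12.2],
}

for route, times in walk_times.items():
    print(route, "mean:", round(mean(times), 2), "summary:", five_number_summary(times))
```

Comparing the medians and the spread of each route gives a quick first impression before drawing histograms.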
Proposal
Final Report
Presentation
8. Price of vegetable oils in the global commodity
by Nathan Stefanick
I will analyze vegetable oils in the global commodity market and see how the prices of different oils change in comparison to each other. This could have an impact on global buying and selling.
Sources.
Proposal
Final Report
Presentation
9. Academic performance of student athletes
by Arthi Nithi
I want to analyze the difference in how various student athletes perform academically at Georgia Tech. I believe students who are part of the athletic department do not do as well as students who take part in club sports or intramurals. I will get data by conducting a survey and through research online.
Proposal
Final Report
Presentation
10. Does wearing shoes affect the height of a vertical jump?
by Ludwika Pankowska
I will conduct an experiment on around 30 people of the age group 18-24 and check the differences between the vertical jump with or without shoes. I am very curious about the result and how shoes affect the end result of our physical abilities given that sport shoes are supposed to increase our "effectiveness" in the sport environment.
Proposal
Final Report
Presentation
11. Yik Yak traffic
by Mary Alyce Martin
Yik Yak is an app popular on college campuses that students use to anonymously post statements and statuses. I am going to attempt to find the prime time for its usage on Georgia Tech's campus by observing the number of posts ("yaks") at a given time and the popularity of the top-rated yaks based on the number of "up-votes" these top yaks have earned. I plan to check this information at multiple specific times a day for a number of days and use this data to estimate the time of highest traffic on this app on any given day, while also comparing the difference between a weekday and a weekend day. My data source will be the Yik Yak app itself, which gives the numbers needed for my study. I can then construct confidence intervals for the number of yaks at a specific time for multiple times during a day.
Proposal
Final Report
Presentation
12. Main factor of motorcyclist accidents.
by Sana Fathima
I am interested in figuring out the main cause or factor, if there is one, of motorcyclist accidents. I will be getting data from a website called FARS, which has data about the time and day of the accident, age, the number of riders wearing helmets, and the side of the motorcycle on which the collision happened. My main goal in this project is to analyze when it is more common for motorcyclists to get in an accident: is it when they are young, or when they ride at night, and is it usually their fault (this can be found by seeing where the initial point of impact is)? Here is a link to the data that I will use.
Proposal
Final Report
Presentation
13. Traveling time in Atlanta's traffic.
by Ji Han Ko
I wish to study the estimated travel time from Georgia Tech to my residence in Duluth at different points during rush hour (e.g., at one-hour intervals from 3 to 7). As a commuting student, I want to find out if leaving school between 3 and 7 is worth the drive, as opposed to the average time of 30 minutes after 8. I will use Google Maps to take traffic into consideration and collect the travel time at the different intervals.
14. Airline boarding process.
by Christopher Fu
One of the issues major airlines face is flight delays and cancellations. These cause a chain reaction as other connecting flights are delayed or cancelled. This is very problematic not only for the passengers but for the airlines as well. According to the Federal Aviation Administration, it is estimated that flight delays cost airlines $22 billion yearly. One of the questions we want to figure out is as follows:
Based on the time history of the boarding pass scans, how is the probability of a late departure affected?
Proposal
Final Report
Presentation
15. Compare tennis players.
by Daniel Pare
I'm a big tennis fan and I want to compare my favorite player Rafael Nadal against his rival Roger Federer (and maybe Djokovic if that is possible). They are both players who are very good on one surface, but more vulnerable on others. My idea is to compile data for players who specialize on clay versus those who specialize on grass and compare how they did on other surfaces. I'll scrape the web using Python to collect my data.
Proposal
Final Report
Presentation
16. Majors and their study habits.
by Lauren Siegmann
I plan to determine if there is a difference between the amount of time engineering students spend studying and working on homework per week and the amount of time business/liberal arts students spend during the week. I will send a survey to various audiences (my sorority, some fraternities, my 2018 graduating class FB page, my student organizations, etc.) to get the most unbiased data, and I will calculate the average time each student spends studying, working on homework, and writing papers, then try to see if there is a relation to their major. My hypothesis is that engineering students spend the most time on studying, computer science students spend the most time on homework, and business/liberal arts students spend the most time on writing papers.
Proposal
Final report
Presentation
17. Effect of homegrown players on professional sports teams.
by Rahul Patel
For my project I will be looking at the successes of sports teams across the NFL and NBA over the past 5 years. While several things change that could affect performance over the course of 5 seasons (coaching staff, injuries, stadiums, etc.), I want to look at whether a team can experience a large amount of success while having the majority of its players come from its own drafts. Is it better for teams to hold onto the players they draft, or for teams to pursue big-contract, high-value targets in free agency? I will look at the percentage of each team's roster that played with the same organization the season before (and how many years they have been with the team) for 5 seasons. Then I will look at that team's successes over the same 5 seasons. Even though I have a sample from two different leagues, the logic behind the correlation should be the same. I will, however, separate the data by league to distinguish the two. This information is available on the Player's Football Focus and LandofBasketball.com. I will also use this data.
Proposal
Presentation
Final Report
18. Shoe size and height.
by Aditya Sehgal
For this project, I wish to perform a study on male and female height and shoe size. I wish to test the following hypotheses:
1. "mean male height is greater than mean female height";
2. "mean male shoe size is greater than mean female shoe size".
I will then attempt to determine if there is evidence to suggest a linear correlation between (i) male height and male shoe size, and (ii) female height and female shoe size.
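The strength of a linear relationship such as the one between height and shoe size is commonly summarized by the Pearson correlation coefficient. The sketch below computes it from first principles; the height and shoe-size values are invented for illustration and the function name is hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (cm) and US shoe sizes (illustrative only)
heights = [165, 170, 175, 180, 185, 190]
shoe_sizes = [8.0, 8.5, 9.5, 10.0, 11.0, 12.0]

print(f"r = {pearson_r(heights, shoe_sizes):.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear association.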
Proposal
Final Report
Presentation
19. Exchange rate of South Korean Won.
by Daniel Kim
I would like to investigate in which month the exchange rate most likely increases or decreases.
Proposal
Final Report
Presentation
20. STEM vs. non STEM majors academic performance at Georgia Tech.
by Chrystelle Nare
The goal of this project is to analyze the differences between the academic performance of STEM and non-STEM majors at Georgia Tech, using the grades (GPA) obtained by students during the Spring and Summer semesters in 2015 as a comparison between the two groups.
Proposal
Final Report
Presentation
21. Hold Time on Calls to Customer Service.
by Abigail Copeland
The problem that I want to research is the time spent on hold when people call customer service centers.
Proposal
Final Report
Presentation
22. Shows online or TV?
by Heer Patel
Study the relevance of the television set by comparing the hours spent watching shows online to the hours spent watching shows on television.
Proposal
Final Report
Presentation
23. Waiting time in emergency room.
by Young Jae Han
I would like to perform a study on the average waiting time in emergency rooms in the USA. I believe many of you have had the experience of suffering in the ER while waiting to receive care. As you know, however, it is almost impossible to get immediate care unless you have suffered a life-threatening condition such as a heart attack. So I want to analyze how much the system has improved over time.
Proposal
Final Report
Presentation
24. Goal of Europe's top soccer leagues.
by Ryan Sanders
I want to find out which of Europe's top soccer leagues (English Premier League, La Liga, Bundesliga, and Serie A) scored the most goals over the past few years, and the distribution of goals per year, trying to figure out in which months teams score the most often.
Proposal
Final Report
Presentation
25. 2016 Presidential Candidates.
by Kripa Varghese
I would like to see how each of the 5 major political candidates stands in various polls before and after each of the political debates that have been conducted thus far. It will be interesting to notice how their popularity fluctuates based on their performance in each debate as well as the effectiveness of their campaigns. In addition, I would like to see their general popularity levels based on the poll data for the past few months as a gauge of which candidate is likely to be their respective party's representative in the final election next year.
Proposal
Final Report
Presentation
26. Netflix and Study.
by Julia Gasbarro
I am very interested in examining the watching habits of my peers, and seeing if there is any correlation between the amount of time a person spends on Netflix and the person's age, major, and study habits.
Proposal
Final Report
Presentation
27. Stocks of healthcare companies.
by Viral Shah
I want to study the stocks of four healthcare competitors: Celgene Corporation (CELG), Gilead Sciences (GILD), Amgen (AMGN), and Biogen (BIIB). I want to measure how each of these corporations has deviated from historic projections of its stock's earnings per share (EPS).
Proposal
Presentation
Final Report
28. Relationship between University Endowment and Ranking.
by Alan Johnson
I plan to study the relationship between a university's ranking by U.S. News and World Report and its endowment. Each year many students look at the rankings of universities to help them decide which school to attend. Universities use endowments to generate a return which can be used to hire professors, upgrade facilities, fund scholarships, and lower tuition. For this reason, endowment size could be an important factor for undergraduate or postgraduate candidates to consider when making their university search and decision. All of the data for this project is available online on the U.S. News and World Report website and the National Center for Education Statistics website. Hopefully the result of this project will be a model that can predict a university's ranking based on its endowment.
Proposal
Final Report
Presentation
29. GT on 2016 Presidential Election.
by Max Bruccoleri
For my project, I would like to determine the general political leanings of the Georgia Tech population based on each student's chosen major by posing the question of which major political party (if any) they plan on voting for in the upcoming 2016 United States presidential election.
Proposal
Presentation
30. Better NFL drafting.
by Bhavna Choudhury
We would like to ask what qualities of a player make him a top prospect.
Proposal
Final Report
Presentation
31. Sleeping habits of college students.
by Connor Hutcherson
I want to collect data about the sleeping habits of college students, broken down into groups by major, gender, and GPA range. I will then use statistical analysis to attempt to draw a conclusion that shows correlation between sleep and academic success.
Proposal
Presentation
Final Report
32. Basketball Team Statistics as Predictors for Postseason Success.
by Zach Cole
Essentially, I intend to analyze which statistics lead to a team getting a better seed in the NCAA Tournament, and from there, which statistics lead to a team having more success in the tournament. By going back and analyzing every tournament team since 1985, there should be roughly 2000 teams, so 2000 data points for each statistic (points per game, strength of schedule, etc). The data will come largely from sports-reference.com, and more recent data will come from kenpom.com (that site has more in-depth statistics, but only goes back to 2002).
Proposal
Presentation
Final Report
33. The relationship between alcohol abuse and college GPA.
by Sunmin Kim
I'm going to collect professional data about college drinking and alcohol abuse on the Internet.
Proposal
Final Report
Presentation
34. Spread Betting Lines vs. Actual Results for NFL Teams.
by Jonathan Pang
In this study, trends will be analyzed to determine whether or not any teams in the NFL are consistently able to beat the spread.
Proposal
Final Report
Presentation
35. GT starbucks.
by Miwa Katamura
I would like to try to create a model of the monthly sales at Starbucks over the past years and compare data between the two locations.
Proposal
Final Report
Presentation
36. News and stocks.
by Kristiaan Sheedy
For the project, I would like to test the effect of news articles on the price movements of a stock to determine if news (aside from earnings reports) has a statistically significant effect.
Proposal
Presentation
Final Report
37. Engineering vs. Non-engineering - Time Spent Exercising.
by Nisha Shah
I would like to see whether or not there is a difference between the time that engineering vs non-engineering students spend doing physical exercise. I am curious to see whether or not the stereotype that engineering students are generally less physically fit than non-engineering students holds true.
Proposal
Presentation
Final Report
38. Luxury cars?
by Grant Herman
Proportion of cars that are considered luxury brands within the city compared to the suburbs.
Proposal
Final Report
Presentation
39. Correlation between standardized test scores (such as the SAT/ACT) and first year college Grade Point Average (GPA)
by Yesol Do
I will model the relationship between the two factors (test scores and GPA) using linear regression.
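A simple linear regression of this kind can be fit by ordinary least squares. The sketch below is one possible formulation, not the project's actual analysis; the SAT scores and GPAs are made up and the function name is hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    # Slope is the ratio of the cross-product sum to the sum of squares of x
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical SAT scores and first-year GPAs (illustrative only)
sat_scores = [1100, 1200, 1250, 1300, 1400, 1500]
first_year_gpa = [2.8, 3.0, 3.1, 3.3, 3.5, 3.8]

slope, intercept = fit_line(sat_scores, first_year_gpa)
print(f"Predicted GPA = {intercept:.3f} + {slope:.5f} * SAT")
```

The fitted slope estimates how much first-year GPA changes, on average, per additional test-score point in the sample.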
Proposal
Final Report
Presentation
40. Money on drinks?
by Hardik Tuteja
I will estimate the expected difference in the amount of money spent on drinks at bars for males versus females.
Proposal
41. Student loans.
by Jinghua Yang
I am interested in the percentage of students who graduate with loans.
Proposal
Final Report
Presentation
42. Coke Cola and the South.
by Mary Latimer
I wonder if Coke vs Pepsi is a regionally induced preference. I think that a greater proportion of people who grew up in Southeastern states prefer Coke as compared to people who grew up in non Southeastern states.
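A claim that one group has a greater proportion than another can be examined with a one-sided two-proportion z-test. The following sketch assumes independent samples large enough for the normal approximation; the survey counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of H1: p_a > p_b, using the pooled proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled estimate of the common proportion under H0: p_a == p_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability under the standard normal distribution
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical survey counts (illustrative only):
# 42 of 60 Southeastern respondents and 30 of 60 others prefer Coke
z, p_value = two_proportion_z_test(42, 60, 30, 60)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would support the idea that the preference is more common among people who grew up in the Southeast, at least for the surveyed sample.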
Proposal
Final Report
Presentation
43. Study hours and performance.
by James Karle
Track the number of hours people study and correlate that with how well they feel they are doing on tests and quizzes after the fact.
Proposal
Presentation
Final Report
44. How much do walks hurt the Georgia Tech Baseball Team?
by Kevin Breslin
Baseball players and coaches say that walking batters will greatly hurt a team's chance of winning. In fact, it is said by many professionals that 80% of walked runners score, so I want to see if 80% of the runners walked by the Georgia Tech Baseball Team in 2015 actually scored against them. I ultimately want to construct a two-sided confidence interval, along with a five-number summary and a histogram, to better visualize the data and see if the percentage for 2015 is at or near 80%.
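Checking a claimed percentage against observed data can be done with a two-sided confidence interval for a proportion. The sketch below uses the Wilson score interval, which is one reasonable choice among several; the counts of walks and runs are made up and do not come from Georgia Tech's 2015 season.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Two-sided Wilson score confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts (illustrative only): 45 of 120 walked runners scored
claimed_rate = 0.80
low, high = wilson_interval(45, 120)
print(f"95% CI for the scoring rate: ({low:.3f}, {high:.3f})")
print("Claimed 80% inside interval:", low <= claimed_rate <= high)
```

If the claimed 80% lies outside the interval, the season's data would be inconsistent with that rule of thumb.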
Proposal
Final Report
Presentation
45. Trending key words
by James Moriarty
I would like to find ways to identify highlights, or trending news, and find out what keywords and phrases would be most representative of what is currently trending.
Proposal
Presentation
Final Report