
Kevin Martin (Kevin_Martin16)

Recently Published

Food Access and Pricing Inequality Across NYC Neighborhoods
This proposal explores how grocery pricing and store availability differ between low-income and high-income neighborhoods in New York City. Focusing on Brownsville (Brooklyn) and Lower Manhattan, the project investigates whether essential items like eggs and milk are more expensive or less accessible in areas with higher poverty rates. The analysis uses a placeholder dataset for now, with plans to incorporate real data from NYC Open Data, USDA Food Access Research Atlas, and the U.S. Census for the final project.
Week 7 – Working with HTML, XML, and JSON in R
This project explores how different data formats—HTML, XML, and JSON—can represent the same information and be read into R for analysis. Each file was created manually to better understand structural differences and how R packages like rvest, xml2, and jsonlite handle them. The comparison confirmed that all formats matched perfectly after being normalized. This assignment helped me connect classroom learning to real-world data handling, especially how formats are chosen based on whether data is meant for humans or systems to read.
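The core of the exercise, parsing the same records from multiple formats and confirming they normalize to the same structure, can be sketched outside R as well. Below is a minimal Python illustration with hypothetical sample records; the assignment itself uses rvest, xml2, and jsonlite in R.

```python
import json
import xml.etree.ElementTree as ET

# The same two records expressed in JSON and XML (hypothetical sample data).
json_text = '[{"title": "R for Data Science", "year": 2017}, {"title": "Tidy Data", "year": 2014}]'
xml_text = """<books>
  <book><title>R for Data Science</title><year>2017</year></book>
  <book><title>Tidy Data</title><year>2014</year></book>
</books>"""

# Parse each format into a common normalized form: a list of records.
from_json = json.loads(json_text)
from_xml = [
    {"title": b.findtext("title"), "year": int(b.findtext("year"))}
    for b in ET.fromstring(xml_text).findall("book")
]

# After normalization (XML text nodes cast to the right types),
# the two representations match exactly.
print(from_json == from_xml)  # True
```

The cast of `year` to `int` is the "normalization" step: XML carries everything as text, while JSON distinguishes numbers, which is exactly the kind of structural difference the assignment compares.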
Exploring the Normal Distribution in Fast Food Nutrition Data
This lab explores the concept of the normal distribution using nutritional data from fast food restaurants. Through visualization, simulation, and probability analysis in R, we examine how well real-world data (like calories from fat, sodium, and carbohydrates) align with a theoretical normal distribution. Using the tidyverse and openintro packages, I compared McDonald’s and Dairy Queen menu items, generated Q-Q plots, and calculated both theoretical and empirical probabilities. This lab demonstrates how statistical concepts can be applied to everyday datasets — providing practical experience in data visualization, distribution analysis, and probability modeling.
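The theoretical-versus-empirical probability comparison at the heart of the lab can be sketched as follows. This is a Python illustration using simulated stand-in values, not the lab's actual menu data or R code; the cutoff of 300 calories is a hypothetical example.

```python
import math
import random
import statistics

random.seed(1)
# Hypothetical stand-in for a calories-from-fat column (the lab uses real menu data).
cal_fat = [random.gauss(285, 60) for _ in range(200)]

mu = statistics.mean(cal_fat)
sigma = statistics.stdev(cal_fat)

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Theoretical probability under the fitted normal: P(X > 300).
p_theoretical = 1 - normal_cdf(300, mu, sigma)
# Empirical probability: the observed proportion of items above 300.
p_empirical = sum(x > 300 for x in cal_fat) / len(cal_fat)

print(round(p_theoretical, 3), round(p_empirical, 3))
```

When the data really are close to normal, as in this simulated example, the two probabilities agree closely, which is the same check the lab performs with `pnorm()` against observed proportions.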
Project 2 – Data Transformation: Converting Wide Data into Tidy Formats
This project demonstrates how to transform wide datasets into tidy formats using R. Three datasets—Sales, Scores, and Vaccinations—were cleaned, reshaped, and summarized to prepare them for analysis and visualization. The project highlights the use of pivot_longer(), mutate(), and group_by() for data tidying, and includes visual summaries created with ggplot2. The completed outputs were exported to CSV files and packaged into a single zip file for easy sharing.
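The wide-to-long reshaping pattern the project applies with pivot_longer() and group_by() can be sketched in plain Python with a hypothetical sales table; the project itself does this in R with tidyr and dplyr.

```python
# Hypothetical wide-format sales records: one key per year.
wide = [
    {"region": "North", "2022": 100, "2023": 120},
    {"region": "South", "2022": 80, "2023": 90},
]

# Equivalent of tidyr::pivot_longer(): one row per (region, year, sales).
tidy = [
    {"region": row["region"], "year": year, "sales": row[year]}
    for row in wide
    for year in ("2022", "2023")
]

# Equivalent of group_by(year) followed by summarise(total = sum(sales)).
totals = {}
for row in tidy:
    totals[row["year"]] = totals.get(row["year"], 0) + row["sales"]

print(totals)  # {'2022': 180, '2023': 210}
```

The key property of the tidy form is that each row holds exactly one observation, which is what makes the grouped summary (and downstream ggplot2 plotting) straightforward.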
Lab 3 — Probability (Hot Hand)
This lab investigates the “hot hand” idea using Kobe Bryant’s 2009 NBA Finals shot data. I compute streak lengths from the real data and compare them to a simulation of an independent shooter with the same make rate (45%). Using histograms and summary statistics of streak lengths, I assess whether Kobe’s patterns look meaningfully different from randomness. The results suggest most streaks are short, and the longer ones we do see are consistent with what independence would produce.
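The two ingredients of the analysis, computing streak lengths and simulating an independent shooter, can be sketched as follows. This is a Python illustration with a randomly generated shot sequence, not the lab's R code or Kobe's actual shot log.

```python
import random

def streak_lengths(shots):
    """Lengths of consecutive-make runs in a sequence of 'H' (hit) / 'M' (miss).

    Each miss closes the current streak (possibly of length 0), matching the
    convention used in the lab's streak calculation.
    """
    streaks, run = [], 0
    for s in shots:
        if s == "H":
            run += 1
        else:
            streaks.append(run)
            run = 0
    streaks.append(run)
    return streaks

# Simulate an independent shooter with a 45% make rate over 133 attempts
# (45% is the make rate cited in the lab; 133 is a hypothetical attempt count).
random.seed(7)
sim = ["H" if random.random() < 0.45 else "M" for _ in range(133)]
sim_streaks = streak_lengths(sim)
print(max(sim_streaks), sum(sim_streaks) / len(sim_streaks))
```

Repeating the simulation many times gives the reference distribution of streak lengths under independence, against which the real data's streaks are compared.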
Assignment 5B – ELO Calculations and Performance Analysis
This report analyzes chess tournament results using ELO calculations to compare actual player performance against expected outcomes based on pre-tournament ratings. It identifies the top overperformers and underperformers, explains patterns using statistical modeling, and includes visualizations, tables, and a CSV export of results. The analysis demonstrates how data transformation and tidy data principles can be applied to real-world competitive data.
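The expected-outcome side of the comparison rests on the standard ELO expectation formula; a small sketch follows (the report's exact variant and rating inputs may differ).

```python
# Standard ELO expected-score formula: a 400-point rating gap corresponds
# to 10:1 odds. Overperformance is then actual points minus expected points.
def expected_score(rating_a, rating_b):
    """Expected score (between 0 and 1) for player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Example: a 200-point rating edge gives roughly a 76% expected score.
print(round(expected_score(1700, 1500), 2))  # 0.76
```

Summing `expected_score` over a player's opponents and subtracting the total from the player's actual tournament score is what surfaces the over- and underperformers.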
Assignment 5A – Airline Delays Analysis
This assignment analyzes airline delay data for two airlines across five cities. The dataset was initially provided in a wide format and transformed into a tidy long format using R. The analysis includes: 1. Cleaning and handling missing data. 2. Calculating the overall share of delays for each airline. 3. Comparing the percentage of delays within each city, visualized through a stacked bar plot. 4. Identifying discrepancies between overall totals and city-by-city breakdowns, illustrating Simpson’s Paradox. This work demonstrates how data transformation and visualization in R can expose patterns that aggregate totals alone would hide.
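The paradox in step 4 can be reproduced with a tiny hypothetical dataset (these counts are illustrative, not the assignment's data): airline A has the lower delay rate in every city yet the higher overall rate, because most of its flights leave from the delay-prone city.

```python
# Hypothetical (flights, delays) counts per (airline, city).
data = {
    ("A", "Phoenix"): (100, 5),    # A: 5% delayed in Phoenix
    ("A", "Seattle"): (900, 135),  # A: 15% delayed in Seattle
    ("B", "Phoenix"): (900, 72),   # B: 8% delayed in Phoenix
    ("B", "Seattle"): (100, 20),   # B: 20% delayed in Seattle
}

def rate(airline, city=None):
    """Delay rate for an airline, overall or within one city."""
    pairs = [v for k, v in data.items()
             if k[0] == airline and (city is None or k[1] == city)]
    flights = sum(f for f, _ in pairs)
    delays = sum(d for _, d in pairs)
    return delays / flights

# A is better in each city taken separately...
print(rate("A", "Phoenix") < rate("B", "Phoenix"))  # True (5% vs 8%)
print(rate("A", "Seattle") < rate("B", "Seattle"))  # True (15% vs 20%)
# ...yet worse overall, because its flights concentrate in Seattle.
print(rate("A") > rate("B"))  # True (14% vs 9.2%)
```

This is exactly the discrepancy between the overall totals and the city-by-city breakdowns that the assignment highlights.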
Project 1 - Chess Tournament Data
Analysis of chess tournament player data using R. Includes average opponent ratings and player statistics.
Week 3B: Window Functions — Moving Averages
This report analyzes stock price data for Apple and Microsoft using YTD averages and 6-day moving averages to highlight short-term vs. long-term trends.
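Both summaries, a year-to-date running average and a fixed-window moving average, can be sketched in a few lines. The prices below are hypothetical; the report applies the same idea to real Apple and Microsoft data in R.

```python
def moving_average(prices, window):
    """Trailing simple moving average; the first window-1 entries are None
    because a full window is not yet available."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(prices)):
        out.append(sum(prices[i - window + 1 : i + 1]) / window)
    return out

prices = [10, 11, 12, 13, 14, 15]  # hypothetical closing prices

# Year-to-date average: mean of everything seen so far.
ytd = [sum(prices[: i + 1]) / (i + 1) for i in range(len(prices))]

print(moving_average(prices, 3))  # [None, None, 11.0, 12.0, 13.0, 14.0]
print(ytd)                        # [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
```

The contrast between the two is the point of the report: the YTD average smooths toward the long-term level, while the short moving-average window tracks recent movement.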
Week 3A: Global Baseline Estimates (Movie Ratings)
Global Baseline recommender using μ + (user_avg − μ) + (movie_avg − μ), which simplifies to user_avg + movie_avg − μ. Includes cleaned data, baseline tables, predictions, recommendations, and visuals.
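A worked sketch of the baseline prediction on a small hypothetical ratings matrix (the report itself builds this in R from a real ratings table):

```python
# Hypothetical (user, movie) -> rating matrix.
ratings = {
    ("Ann", "Heat"): 5, ("Ann", "Up"): 3, ("Ann", "Big"): 4,
    ("Bob", "Heat"): 4, ("Bob", "Up"): 2, ("Bob", "Big"): 3,
}

# Global mean rating mu.
mu = sum(ratings.values()) / len(ratings)

def avg(name, idx):
    """Mean rating for a user (idx=0) or a movie (idx=1)."""
    vals = [r for k, r in ratings.items() if k[idx] == name]
    return sum(vals) / len(vals)

def predict(user, movie):
    """Global baseline: mu + (user_avg - mu) + (movie_avg - mu)."""
    return avg(user, 0) + avg(movie, 1) - mu

print(round(predict("Ann", "Heat"), 2))  # 5.0
```

Here mu = 3.5, Ann's average is 4.0, and Heat's average is 4.5, so the baseline predicts 4.0 + 4.5 − 3.5 = 5.0: each deviation from the global mean is added back onto μ.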
Week 2B: Evaluating Classification Model Performance
Null error rate, confusion matrices at thresholds 0.2/0.5/0.8, and accuracy/precision/recall/F1 for penguin predictions.
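The threshold sweep can be sketched as follows, using hypothetical predicted probabilities rather than the actual penguin model output:

```python
def confusion(probs, labels, threshold):
    """Confusion-matrix counts (tp, fp, fn, tn) for a probability cutoff."""
    tp = sum(p >= threshold and y for p, y in zip(probs, labels))
    fp = sum(p >= threshold and not y for p, y in zip(probs, labels))
    fn = sum(p < threshold and y for p, y in zip(probs, labels))
    tn = sum(p < threshold and not y for p, y in zip(probs, labels))
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1

# Hypothetical predicted probabilities and true labels (1 = positive class).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Null error rate: error from always predicting the majority class.
null_error = 1 - max(labels.count(1), labels.count(0)) / len(labels)

for t in (0.2, 0.5, 0.8):
    print(t, metrics(*confusion(probs, labels, t)))
```

Raising the threshold trades recall for precision (at 0.8 every positive call is correct but one positive is missed), which is the behavior the assignment's three cutoffs are meant to expose.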
Week 2A: SQL and R — Movie Ratings
Connecting to MySQL from RStudio, importing data, exporting to CSV, and generating movie ratings summaries.
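The workflow, querying a database from code, summarizing in SQL, and exporting to CSV, can be sketched with SQLite standing in for MySQL. The schema and rows below are hypothetical; the assignment itself connects to MySQL from RStudio.

```python
import csv
import io
import sqlite3

# In-memory SQLite database with a hypothetical ratings table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ratings (critic TEXT, movie TEXT, rating INTEGER)")
con.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("Ann", "Dune", 5), ("Bob", "Dune", 4), ("Ann", "Up", 3)],
)

# Summarize average rating per movie in SQL.
rows = con.execute(
    "SELECT movie, AVG(rating) FROM ratings GROUP BY movie ORDER BY movie"
).fetchall()

# Export the summary to CSV (written to a string here instead of a file).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["movie", "avg_rating"])
writer.writerows(rows)
print(buf.getvalue())
```

The shape is the same as the R version: connect, pull or aggregate with SQL, then hand the result to a CSV export step.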
DATA607 Week 1: Pima Indians Diabetes Analysis
Assignment for DATA607 showing how to load and clean the Pima Indians Diabetes dataset.