

John Cruz

Recently Published

Trees and Rules
Non-Linear Regression
Linear Regression
Exponential Smoothing
Data Pre-Processing
Time Series Analysis
Final Kaggle
Spam Email Classifier
Working with test document data, this project classifies documents as spam or non-spam email. Training and validation sets are created and fed into a decision tree model and a logistic regression model.
Tidyverse - Fuzzyjoin
Use geolocation data to determine the proximity between MTA subway data and NYC public hospitals.
Recommender Systems
Tidyverse - Lubridate
Using the lubridate package within the Tidyverse ensemble, I created examples exploring NYC Filming Permits data.
Sentiment Analysis - NY Times
An example of sentiment analysis from the Text Mining book, followed by a personal example using NY Times data.
Working with NY Times API
The New York Times (NYT) provides access to its data through the Times API. With it, data analysis and visualizations can be performed on trends or decisions made within its published articles.
Favorite Books
Focuses on working with different file types for analysis. I manually create HTML, XML, and JSON files that store three of my favorite books related to data science and programming.
MTA Daily Ridership
Compare daily ridership across different modes of transportation against estimated pre-pandemic levels.
Pokemon Pokedex
Build a frequency chart showing how many Pokemon fall into each type.
CDC Health Care Employment 2000-2020
Examine whether there is a relationship between the percentage change in employment and the percentage change in mean hourly wages.
Probability: Kobe Bryant and 'Hot Hand'
Comparing Kobe Bryant and a simulated player to determine if Kobe had a 'hot hand' or if he was making shots as expected.
Transforming Wide Data
The objective is to transform a wide-format data structure into a long format, 'tidying' the data so that the analysis is easier to perform. The data consists of flight arrival counts from two airlines in five different cities.
Chess Text File to CSV
Transform chess tournament text data into CSV
Character Manipulation
Working with strings and regular expressions
How Americans Like Their Steak
Walt Hickey of FiveThirtyEight collected data from people in the United States to see whether a risk-averse person would be more likely to order a steak well done. The analysis found no evidence that bigger risk takers prefer their steaks rare.