gravatar

sconnin

Sean Connin

Recently Published

Machine Learning Classification & Principal Components Analysis
The work consists of two separate sections. The first part compares classification outcomes for selected Machine Learning algorithms relative to a baseline model (logistic regression). Model building and evaluation is conducted using Tidymodels. The second part consists of a principal components analysis study of water chemistry from streams across New York State. The package Factoextra is used for calculations and data visualizations. The results demonstrate to use of this unsupervised learning technique to cluster waters based on their chemical composition.
Tree Regression Models
Based on Chapter 8 of Applied Predictive Modeling (Kuhn and Johnson). Includes application of Tidymodels.
Nonlinear Predictive Models
Based on Chapter 7 of Applied Predictive Modeling (Kuhn and Johnson). Includes application of Tidymodels.
STEM Completion: SVM Regression
An initial SVM model for the average percentage of women completing STEM programs (undergraduate) at public/private 4-yr + institutions in the United States.
Regularization
Based on Chapter 6 of Applied Predictive Modeling (Kuhn and Johnson).
STEM Completion Models
Initial tree and ensemble models for the average percentage of women completing STEM programs (undergraduate) at public/private 4-yr + institutions in the United States.
Forecasting Project
624 - Project 1. Forecasting models for ATM cash withdrawals and residential power consumption.
ARIMA
624 - Forecasting Principals and Practice. Chapter 9.
622-1 Classification Models
Comparison of Classifiers on a synthetic dataset(s).
Exponential Smoothing
624 - Forecasting Principals and Practice. Chapter 8.
Data Processing and Overfitting
Based on Chapter 3 of Kuhn and Johnson's Applied Predictive Modeling.
Forecasters Toolbox
624 - Forecasting Principals and Practice. Chapter 5.
Time Series Decomposition
624 - Forecasting Principals and Practice. Chapter 3.
Time Series Graphics
624 - Forecasting Principals and Practice. Chapter 2.
Insurance Classification and Prediction
The purpose of this work is to estimate the probability that an individual (seeking auto insurance) will be in an accident and then to forecast the potential cost of an ensuing claim. A synthetic data set of ~ 8000 observations will be used to construct predictive models for this purpose. The use of synthetic data permits model development absent proprietary information, while also providing a heuristic for quantitative reasoning.
Javascript Snippets
A entry-level javascript application for Data608 - Knowledge and Visual Analytics.
501C3 Study
Initial phase of 501c3 study for VT.
2016 Campaign Speeches
Initial sentiment analysis of 2016 presidential campaign speeches.
API Query - NY Times
NY Time API - code and query results. Data 607 exercise.
Files structures - xml, json, html
File format and manipulation. Exercise for Data 607.
New Orleans - Multiple Data Sources
Team project to explore the impact of hurricane Katrina on demographic characteristics of New Orleans at the parish level. This document includes code and an initial analysis of cyclone data extracted from Wikipedia tables.
Tidy Data
Exercise to clean untidy airline data using Dplyr and Tidyr.
Character Manipulation and Date Processing
Data processing exercise in R.
Congressional Redistricting in Texas
Initial cleaning and analysis of data from the "Atlas of Redistricting' project. The latter was published online by FiveThirtyEight in 2018.