
nomarpicasso

Ramon Rodriguez-Santana

Recently Published

Estimating the Size of a Smoking Population Using the Mark–Recapture Method in R
The Mark–Recapture method, originally developed in ecology to estimate wildlife populations, provides a statistically rigorous alternative for estimating the size of partially observed human populations. Here is an example demonstrating how to apply the Mark–Recapture method to estimate the size of a smoking population using simulated data.
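A minimal sketch of the idea in base R with simulated data; the two "capture" sources and the Chapman variant of the Lincoln-Petersen estimator shown here are illustrative choices, not necessarily the exact ones used in the post.

```r
# Simulate a "true" smoking population and two partially overlapping data sources
set.seed(123)
N_true  <- 10000                           # true (unknown) number of smokers
smokers <- seq_len(N_true)
source1 <- sample(smokers, 900)            # smokers "marked" in the first source
source2 <- sample(smokers, 1100)           # smokers captured in the second source
m <- length(intersect(source1, source2))   # recaptured (appear in both sources)

# Chapman estimator (a less biased variant of Lincoln-Petersen)
N_hat <- (length(source1) + 1) * (length(source2) + 1) / (m + 1) - 1
N_hat
```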
Greenfield Analysis (GFA) using a center-of-gravity (CoG) approach to identify 4 optimal health care facility locations in Connecticut
Greenfield Analysis (GFA) is a facility location optimization technique used to identify the most suitable placement of new service centers, warehouses, or healthcare facilities when no prior infrastructure constraints exist. In this analysis, we applied a center-of-gravity (CoG) approach to identify four optimal facility locations in Connecticut.
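As a rough illustration (not the post's exact workflow), a weighted k-means on demand-point coordinates can act as a multi-site center-of-gravity. The demand points and weights below are simulated placeholders.

```r
# Simulated demand points (e.g., patient locations) with visit-volume weights
set.seed(10)
demand <- data.frame(
  lon    = runif(500, -73.7, -71.8),   # approximate longitude range for Connecticut
  lat    = runif(500,  41.0,  42.05),  # approximate latitude range for Connecticut
  weight = rpois(500, 20)
)

# Weighted k-means: replicate each point by its weight, then find 4 centers
expanded <- demand[rep(seq_len(nrow(demand)), demand$weight), c("lon", "lat")]
km <- kmeans(expanded, centers = 4, nstart = 50)
km$centers   # candidate facility coordinates (centers of gravity)
```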
Comparing the Historical Limits Method (HLM) and Negative Binomial (NB) Regression for Detecting Quarterly Infectious Disease Outbreaks in R
When detecting HCV outbreaks, HLM will alert you if the current quarter’s case count is much higher than the average of previous quarters, without adjusting for changes in population size or time trends. In contrast, NB regression can model expected case counts based on factors such as time, region, population, and seasonality, providing a more accurate expected count along with confidence intervals.
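A minimal sketch of the NB side using {MASS}; the simulated quarterly counts, population offset, and seasonal terms below are placeholders for the real surveillance data used in the post.

```r
library(MASS)

# Simulated quarterly counts with trend, seasonality, and a population offset
set.seed(42)
quarter <- 1:40
season  <- factor(rep(1:4, 10))
pop     <- seq(100000, 120000, length.out = 40)
mu      <- exp(-9 + 0.01 * quarter + 0.1 * as.numeric(season) + log(pop))
cases   <- rnbinom(40, mu = mu, size = 5)
dat     <- data.frame(cases, quarter, season, pop)

fit <- glm.nb(cases ~ quarter + season + offset(log(pop)), data = dat)

# Expected count and approximate 95% interval for the most recent quarter
pred <- predict(fit, newdata = dat[40, ], type = "link", se.fit = TRUE)
exp(pred$fit + c(estimate = 0, lower = -1.96, upper = 1.96) * pred$se.fit)
```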
Time-Space Alert Detection (TSAD) in R
This tutorial provides a practical guide to using the R programming language to detect infectious disease outbreaks through time–space analysis. You’ll learn how to use R’s powerful data-handling capabilities and functions to identify unusual increases in specific diagnoses.
OneR (One Rule) classification model in R
The OneR (One Rule) classification model is a simple yet effective rule-based machine learning algorithm. It generates one rule for each feature in the dataset and then selects the rule with the lowest error rate for classification. Despite its simplicity, OneR performs surprisingly well on many classification tasks, often competing with more complex machine learning models.
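A quick sketch with the {OneR} package on the built-in iris data (not the post's dataset); optbin() discretizes the numeric predictors before the one-rule model is fit.

```r
library(OneR)

set.seed(1)
iris_binned <- optbin(iris)                     # optimal binning of numeric predictors
idx   <- sample(nrow(iris_binned), 0.7 * nrow(iris_binned))
model <- OneR(Species ~ ., data = iris_binned[idx, ], verbose = TRUE)

pred <- predict(model, iris_binned[-idx, ])
eval_model(pred, iris_binned[-idx, "Species"])  # confusion matrix and accuracy
```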
Forecasting time-series data using 19 {forecast} package algorithms in R
Forecasting plays a vital role in identifying patterns within time-series data and supporting informed decision-making across diverse fields such as business, healthcare, economics, and environmental studies.
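One of the workhorse algorithms from {forecast}, shown on the built-in AirPassengers series as a stand-in for the post's data.

```r
library(forecast)

fit <- auto.arima(AirPassengers)   # automatic ARIMA order selection
fc  <- forecast(fit, h = 24)       # forecast 24 months ahead
autoplot(fc)
accuracy(fit)                      # in-sample accuracy measures
```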
Machine Learning Regression Models in R
Regression analysis in machine learning (ML) is a method used to examine the connection between independent variables and a dependent variable. This type of analysis is known as predictive modeling, in which an algorithm or method is used to predict continuous outcomes. Here are the steps for conducting a ML regression analysis and deploying the final selected model.
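A compact sketch of one such workflow using {caret} with cross-validation on mtcars; the two models shown (linear regression and a regression tree) are placeholders for whichever algorithms the post compares.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 5)

set.seed(7)
fit_lm   <- train(mpg ~ ., data = mtcars, method = "lm",    trControl = ctrl)
set.seed(7)
fit_cart <- train(mpg ~ ., data = mtcars, method = "rpart", trControl = ctrl)

summary(resamples(list(LM = fit_lm, CART = fit_cart)))  # cross-validated RMSE / R-squared
predict(fit_lm, newdata = head(mtcars))                 # use the selected model
```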
Using the {tidycensus} package in R to estimate the populations of four Connecticut Hispanic subgroups
{tidycensus} enables users to interact with specific US Census Bureau data APIs. It returns tidyverse-compatible data frames and can optionally attach geographic boundary data for mapping.
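A sketch of the kind of call involved. The ACS variable codes below (B03001_004 through B03001_007 for Mexican, Puerto Rican, Cuban, and Dominican origin) and the survey year are my assumptions, not necessarily those used in the post, and a Census API key is required.

```r
library(tidycensus)
# census_api_key("YOUR_KEY", install = TRUE)   # one-time setup

hisp_vars <- c(Mexican     = "B03001_004",     # assumed ACS variable codes
               PuertoRican = "B03001_005",
               Cuban       = "B03001_006",
               Dominican   = "B03001_007")

ct_hisp <- get_acs(geography = "county", variables = hisp_vars,
                   state = "CT", year = 2021, survey = "acs5")
head(ct_hisp)   # estimate and margin of error per county and subgroup
```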
Hierarchical cluster analysis of prescription counts by town of residence in R
Hierarchical cluster analysis example in R.
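In base R, the core of such an analysis looks roughly like this, with simulated prescription counts standing in for the real town-level data.

```r
# Simulated prescription counts by town (rows = towns, columns = drug classes)
set.seed(1)
counts <- matrix(rpois(40, 20), nrow = 10,
                 dimnames = list(paste0("Town_", 1:10), paste0("Drug_", 1:4)))

d  <- dist(scale(counts))            # Euclidean distance on standardized counts
hc <- hclust(d, method = "ward.D2")  # Ward's linkage
plot(hc)                             # dendrogram
cutree(hc, k = 3)                    # assign towns to 3 clusters
```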
K-Means and K-Medoids clustering of prescription counts by Connecticut towns
Steps on how to perform K-Means and K-Medoids clustering. K-means clustering is an unsupervised machine learning algorithm that identifies groups in unlabeled data. K-medoids is also an unsupervised method for clustering unlabeled data; it is an improved version of the K-Means algorithm, designed mainly to reduce sensitivity to outliers.
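A minimal sketch of both methods, using {cluster} for PAM (k-medoids) on simulated town-level counts.

```r
library(cluster)

set.seed(1)
towns  <- data.frame(rx_count = rpois(50, 30), rate_per_10k = runif(50, 5, 60))
scaled <- scale(towns)

km <- kmeans(scaled, centers = 3, nstart = 25)   # k-means
pm <- pam(scaled, k = 3)                         # k-medoids (PAM), more outlier-robust

table(kmeans = km$cluster, pam = pm$clustering)  # compare cluster assignments
```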
Create maps in R by merging shapefile with data source file
Create {ggplot2} and {tmap} maps in R by merging a shapefile with a data source file.
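The general pattern looks like this; the file names, join key, and fill variable are hypothetical placeholders.

```r
library(sf)
library(dplyr)
library(ggplot2)

towns <- st_read("ct_towns.shp")             # hypothetical shapefile
rates <- read.csv("od_rates_by_town.csv")    # hypothetical data source file

map_data <- left_join(towns, rates, by = "town_name")   # assumed common key

ggplot(map_data) +
  geom_sf(aes(fill = rate)) +
  scale_fill_viridis_c() +
  theme_minimal()
```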
Local Moran and Local Getis-Ord Maps of OD Deaths by Town of Residence using the {rgeoda} package in R
Using the {rgeoda} package to create Local Moran and Local Getis-Ord Maps.
Create publication-ready analytical and summary tables using {gtsummary} package in R
The {gtsummary} package offers a stylish and adaptable way to produce publication-ready analytical and summary tables.
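A small example using the trial dataset that ships with {gtsummary} (not the post's data).

```r
library(gtsummary)
library(dplyr)

trial %>%
  select(age, grade, response, trt) %>%
  tbl_summary(by = trt, missing = "no") %>%   # summary statistics by treatment arm
  add_p() %>%                                 # add p-values for group comparisons
  bold_labels()
```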
Build a HeatMap in R using the {leaflet} and {leaflet.extras} packages
Creating a HeatMap from 2012 Starbucks locations in CT, MA and RI.
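The core pattern with {leaflet} and {leaflet.extras}; the built-in quakes data stands in here for the Starbucks location file.

```r
library(leaflet)
library(leaflet.extras)

# quakes (built-in) stands in for the 2012 Starbucks coordinates
leaflet(quakes) %>%
  addTiles() %>%
  addHeatmap(lng = ~long, lat = ~lat, intensity = ~mag,
             blur = 20, radius = 15)
```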
Detecting anomalies in your data using Benford’s Law and R
Analyzing large amounts of data in search of anomalies can be a frustrating task. You need techniques that let you quickly evaluate data in a way that highlights potential anomalies and keeps you from conducting meaningless analyses.
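The idea in base R: compare observed first-digit frequencies against Benford's expected proportions log10(1 + 1/d). The simulated amounts below are a stand-in for real transaction or claims data.

```r
set.seed(1)
amounts <- rlnorm(5000, meanlog = 6, sdlog = 1.5)          # simulated dollar amounts

first_digit <- floor(amounts / 10^floor(log10(amounts)))   # leading significant digit
observed <- prop.table(table(factor(first_digit, levels = 1:9)))
expected <- log10(1 + 1 / (1:9))                           # Benford proportions

round(cbind(observed = as.numeric(observed), expected = expected), 3)
chisq.test(table(factor(first_digit, levels = 1:9)), p = expected)
```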
Impute missing values using {missForest} package in R
In this presentation we will learn how to impute missing values using the {missForest} package. This package uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data.
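A short sketch on iris with artificially introduced missingness.

```r
library(missForest)

set.seed(81)
iris_mis <- prodNA(iris, noNA = 0.1)   # introduce 10% missing values at random

imp <- missForest(iris_mis)
head(imp$ximp)    # imputed data set
imp$OOBerror      # out-of-bag imputation error (NRMSE for numeric, PFC for factors)
```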
Time series forecasting using {modeltime} in R
Here are the instructions on how to perform classical time series analysis and machine learning modeling in one framework.
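A rough outline of that framework on the m4_monthly example series from {timetk}; the models and data used in the post may differ.

```r
library(modeltime)
library(tidymodels)
library(timetk)

series <- m4_monthly %>% filter(id == "M750") %>% select(date, value)
splits <- initial_time_split(series, prop = 0.9)

model_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

modeltime_table(model_arima) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_forecast(new_data = testing(splits), actual_data = series) %>%
  plot_modeltime_forecast(.interactive = FALSE)
```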
Model deployment using {plumber} package in R
This presentation shows the steps on how to deploy a GLM (Logistic Regression) machine learning model created in R via a Plumber API.
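In outline, a Plumber API file exposes the model behind an HTTP endpoint. The file names, endpoint, parameter, and saved model object below are hypothetical, and the prediction line is commented out in favor of a placeholder so the sketch runs on its own.

```r
# plumber.R --------------------------------------------------------------
# model <- readRDS("logit_model.rds")   # previously trained GLM (hypothetical file)

#* Predict the probability of the outcome for a given age
#* @param age:numeric
#* @get /predict
function(age) {
  newdata <- data.frame(age = as.numeric(age))
  # list(prob = predict(model, newdata, type = "response"))
  list(prob = 0.5)   # placeholder response
}

# launch.R ---------------------------------------------------------------
# plumber::plumb("plumber.R")$run(port = 8000)
# then: curl "http://localhost:8000/predict?age=45"
```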
Compare 13 models and select the best using the {caret} R package
Compare the estimated accuracy of different machine learning algorithms (models) and select the most accurate model for your predictive analytics project. When working on a machine learning project, you often have several good models to choose from, and each candidate model needs to be measured for accuracy. To select the best and final model(s), you should use several different methods to estimate the accuracy of your machine learning models. Here are the steps on how to select the best and final model(s) using the Vertical Box-and-Whisker Plot method.
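A condensed version of that workflow on the iris data; the post compares 13 models, and only three are shown here to keep the sketch short.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 10)

set.seed(7)
fit_lda  <- train(Species ~ ., data = iris, method = "lda",   trControl = ctrl)
set.seed(7)
fit_cart <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
set.seed(7)
fit_knn  <- train(Species ~ ., data = iris, method = "knn",   trControl = ctrl)

results <- resamples(list(LDA = fit_lda, CART = fit_cart, KNN = fit_knn))
summary(results)
bwplot(results)   # box-and-whisker comparison of resampled accuracy
```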
Using the {rayshader} package to visualize 3D maps in R
{rayshader} is an open-source package for producing 2D and 3D data visualizations in R.
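A tiny example using the built-in volcano elevation matrix.

```r
library(rayshader)

hill <- sphere_shade(volcano, texture = "desert")   # shade the elevation matrix

plot_map(hill)                        # 2D hillshaded map
plot_3d(hill, volcano, zscale = 10)   # interactive 3D scene (opens an rgl window)
render_snapshot()                     # capture the current 3D view
```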
Using the AutoML {forester} package for Tree-based Models
The {forester} package is an AutoML tool in R for tabular data regression and binary classification tasks. It wraps up all machine learning processes into a single train() function.
Spatial statistical modeling and prediction using the {spmodel} package in R
{spmodel} is an R package used to fit, summarize, and predict from a variety of spatial statistical models applied to point-referenced or areal (lattice) data. Parameters are estimated using various methods, including likelihood-based optimization and weighted least squares based on variograms.
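A minimal sketch with simulated point-referenced data; the column names, covariance type, and prediction location are illustrative assumptions.

```r
library(spmodel)

# Simulated point-referenced data (hypothetical)
set.seed(2)
dat   <- data.frame(x = runif(100), y = runif(100), elev = rnorm(100))
dat$z <- 2 + 0.5 * dat$elev + rnorm(100, sd = 0.3)

fit <- splm(z ~ elev, data = dat, spcov_type = "exponential",
            xcoord = x, ycoord = y)
summary(fit)

predict(fit, newdata = data.frame(x = 0.5, y = 0.5, elev = 0))
```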
Using the {sociome} package to identify high deprivation areas in Connecticut.
The ADI scores shown here identify areas of deprivation and affluence within Connecticut communities. Organizations implementing overdose, HIV, and hepatitis C prevention interventions can use this information to identify high-deprivation areas in Connecticut and are encouraged to focus their prevention efforts there.
Exploring data using the vtree package
The {vtree} package is a tool for calculating and displaying variable trees.
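A quick illustration on a small made-up data frame.

```r
library(vtree)

# Made-up data: sex and smoking status for 100 people
set.seed(5)
df <- data.frame(
  sex    = sample(c("F", "M"),    100, replace = TRUE),
  smoker = sample(c("Yes", "No"), 100, replace = TRUE, prob = c(0.2, 0.8))
)

vtree(df, "sex smoker")   # nested counts and percentages of smoking status by sex
```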
Using the {sf} package to count points in polygons via a spatial join
Here are the steps for counting spatial points (i.e., longitude/latitude coordinates) within polygons using the {sf} package.
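The general pattern; the file names and the town_name column are hypothetical.

```r
library(sf)
library(dplyr)

towns  <- st_read("ct_towns.shp")                    # polygon layer (hypothetical file)
points <- read.csv("events.csv") %>%                 # columns lon, lat (hypothetical file)
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(st_crs(towns))                        # match the polygon CRS

# Count the points falling inside each polygon
towns$n_events <- lengths(st_intersects(towns, points))

# Equivalent result via a spatial join
st_join(points, towns) %>% st_drop_geometry() %>% count(town_name)
```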
Finding variable importance in a logistic regression model in R
Steps on how to assess variable importance in a logistic regression analysis.
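One common approach is the absolute z-statistic per coefficient, which {caret}'s varImp() reports for glm objects; mtcars is used here as a stand-in dataset.

```r
library(caret)

df    <- mtcars
df$am <- factor(df$am, labels = c("automatic", "manual"))

fit <- glm(am ~ mpg + wt + hp, data = df, family = binomial)

varImp(fit)                  # absolute z-statistic for each predictor
summary(fit)$coefficients    # estimates, standard errors, z-values, p-values
```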
Binary Logistic Regression in R
A logistic regression is used to predict a class (or category) variable (y) based on one or more predictor variables (x).
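The canonical call in base R, using mtcars as an example.

```r
# vs (engine shape, 0/1) as the class to predict from two predictors
fit <- glm(vs ~ mpg + wt, data = mtcars, family = binomial)
summary(fit)

# Predicted probability of vs = 1 for a new observation
predict(fit, newdata = data.frame(mpg = 21, wt = 2.8), type = "response")
```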
Exploratory data analysis (EDA) using {summarytools}
The {summarytools} package allows you to quickly create an EDA report in R.
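For example:

```r
library(summarytools)

dfSummary(iris)          # whole-data-frame summary printed to the console
view(dfSummary(iris))    # render the same summary as an HTML report
freq(iris$Species)       # frequency table for a single variable
descr(iris)              # descriptive statistics for the numeric variables
```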