RPubs

by RStudio

lukahere007

Luke Wamalwa

Recently Published

Multiple Linear Regression: A Case Study on Insurance Redlining in Chicago

This report investigates whether racial composition influences the issuance of FAIR plan insurance policies in Chicago ZIP codes. Using multiple linear regression, the analysis controls for fire and theft rates, housing age, and income to assess the independent effect of minority population percentage on policy issuance.

3 days ago
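Controlling for covariates in a multiple linear regression means estimating the minority-percentage coefficient while holding the other predictors fixed. The sketch below assumes a plain linear specification. The symbol names are illustrative and are not taken from the report, which may transform some variables (for example, income).

$$
\text{policies}_i = \beta_0 + \beta_1\,\text{minority}_i + \beta_2\,\text{fire}_i + \beta_3\,\text{theft}_i + \beta_4\,\text{age}_i + \beta_5\,\text{income}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
$$

Here $\beta_1$ is the expected change in policy issuance per one-point increase in minority percentage, with fire, theft, housing age and income held constant. The question of an independent racial effect corresponds to testing $H_0\colon \beta_1 = 0$.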

Phishing Email Detection Analysis with Random Forest & GBM

End-to-end comparison of Random Forest and Gradient Boosting (GBM) classifiers on an imbalanced phishing-email dataset. We split the data 60/20/20 into training, validation and test sets, addressed the class imbalance with down-sampling, up-sampling, ROSE and SMOTE, tuned the probability threshold to maximize F₁, and evaluated the champion model on the held-out test set.

about 2 months ago
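Tuning the probability threshold to maximize F₁ can be written as a one-dimensional search over the cut-off $t$. This sketch assumes an email is labeled phishing when its predicted probability is at least $t$, and that the search is run on the validation set. The page does not say which split was used.

$$
\mathrm{Precision}(t) = \frac{TP(t)}{TP(t) + FP(t)}, \qquad
\mathrm{Recall}(t) = \frac{TP(t)}{TP(t) + FN(t)}
$$

$$
F_1(t) = \frac{2\,\mathrm{Precision}(t)\,\mathrm{Recall}(t)}{\mathrm{Precision}(t) + \mathrm{Recall}(t)} = \frac{2\,TP(t)}{2\,TP(t) + FP(t) + FN(t)},
\qquad t^{*} = \arg\max_{t \in [0,1]} F_1(t)
$$

With imbalanced classes, $t^{*}$ usually differs from the default 0.5, because F₁ ignores true negatives and trades precision against recall on the minority class.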

A Validation-Based Model Selection Strategy for Breast Cancer Diagnosis Using Logistic Regression

This analysis uses logistic regression to predict breast cancer malignancy from tumor characteristics derived from digitized medical images. Using the Breast Cancer Wisconsin (Diagnostic) dataset, we conducted exploratory data analysis to visualize key variables; performed feature selection based on statistical significance and multicollinearity checks; split the dataset into 60% training, 20% validation, and 20% test sets to evaluate generalization; and fitted several logistic regression models, refining them iteratively based on AIC, deviance, accuracy, and ROC/AUC. The final model uses three key predictors: texture_mean, concavity_mean, and radius_mean. On the test set it achieved strong predictive performance, with high sensitivity and specificity and an AUC of 0.974. The effect of each predictor on the probability of malignancy is visualized with ggplot2, pROC, and ggpmisc. The project shows the importance of model validation, feature interpretability, and visualization in clinical predictive modeling, and provides a reproducible pipeline for developing diagnostic models with logistic regression.

about 2 months ago
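The final three-predictor model has the standard logistic form shown below. Coefficient estimates are not reported on this page. The sketch assumes the predictors enter linearly, with no interactions or transformations.

$$
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1\,\text{texture\_mean}_i + \beta_2\,\text{concavity\_mean}_i + \beta_3\,\text{radius\_mean}_i,
\qquad p_i = \Pr(\text{malignant} \mid \mathbf{x}_i)
$$

The predicted probability is recovered as $p_i = 1 / (1 + e^{-\eta_i})$, where $\eta_i$ is the right-hand side. Sensitivity, specificity and AUC are then computed by comparing $p_i$ against a cut-off on the test set.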

Power Analysis for A/B Testing: Impact of Sample Size in R

This project demonstrates how small sample sizes in A/B testing can lead to inconclusive results, and how choosing the sample size through power analysis allows a real difference in conversion rates to be detected as statistically significant. Simulated data are used to compare p-values, conversion rates, and statistical power before and after the sample size is adjusted. Visualizations are built with the ggstatsplot, ggpubr, ggsignif, and ggpmisc packages.

about 2 months ago
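One common way to size an A/B test on conversion rates is the normal-approximation formula for comparing two proportions. The report may use a different method, such as Cohen's $h$. Here $\alpha$ is the two-sided significance level and $1 - \beta$ is the target power.

$$
n_{\text{per group}} \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left[\,p_1(1 - p_1) + p_2(1 - p_2)\,\right]}{(p_1 - p_2)^2}
$$

For a hypothetical baseline rate $p_1 = 0.10$ and variant rate $p_2 = 0.12$, with $\alpha = 0.05$ and power $0.80$ ($z_{0.975} \approx 1.96$, $z_{0.80} \approx 0.8416$):

$$
n \approx \frac{(2.8016)^2 \times (0.09 + 0.1056)}{(0.02)^2} = \frac{7.849 \times 0.1956}{0.0004} \approx 3838.1 \;\Rightarrow\; 3839 \text{ per group}
$$

A test run with only a few hundred users per arm would therefore be badly underpowered for a two-point lift, which is the kind of inconclusive result the project starts from.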

Simulated Vaccine Analysis

This report explores a simulated clinical dataset of vaccine-induced immune responses over time. It models changes in antibody titers across vaccine doses and patient characteristics using ANOVA and logistic regression, with visualizations built using the tidyverse, ggpubr, ggpmisc, and gghighlight packages.

about 2 months ago
