RPubs

by RStudio

Recently Published

BANA 7052: Applied Linear Regression - Module 3 Assignment

By mensor

5 months ago

My very first R Plot

By wildlife-conservation-steph

2020 bat emergence times relative to sunset and the air temperature at the time of emergence

5 months ago

Probability Distributions

By anahi01

Anahí Marín Isabella Escobar

5 months ago

Sleep Health and Lifestyle Analysis

By umer550

Exploring the dataset through various visualizations to better understand sleep health.

5 months ago

Exercise 11 - CPAD

By Lucas-X-A

Document for activity 11 of the CPAD course - BCC - UFRPE

5 months ago

Logistic Regression Analysis

This analysis examined crime rates across 466 Boston neighborhoods, using logistic regression to predict whether an area exceeds the median crime rate. The dataset contained 12 predictor variables, including residential zoning (zn), pollution levels (nox), housing characteristics (rm, age), accessibility metrics (dis, rad), and socioeconomic indicators (lstat, medv), with no missing values. Data preparation involved log-transforming the right-skewed variables (nox, lstat) and addressing multicollinearity by dropping highly correlated predictors: tax (correlation 0.91 with rad), indus (0.76 with nox), and medv (-0.74 with lstat). This brought all VIF values below 5. Three models were developed: Model 1 used all prepared variables, Model 2 applied stepwise selection for parsimony, and Model 3 added an interaction term (rm × lstat) and a polynomial feature (rm²) to capture non-linear relationships. Model 2 emerged as the preferred choice, balancing predictive accuracy (88.6% accuracy, 0.874 precision, 0.865 specificity) with simplicity (lowest AIC = 232.6, BIC = 269.9). It retains eight significant predictors, including nox_log, rad, dis, and rm, and excludes the theoretically problematic lstat_log variable, which showed a counter-intuitive negative coefficient in Model 1. Although Model 1's performance metrics were marginally better, an ANOVA test showed no significant improvement from the additional variable (p = 0.63), supporting Model 2 as the most parsimonious and interpretable model for predicting high-crime neighborhoods.
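The AIC and BIC figures quoted for Model 2 can be cross-checked against the stated number of predictors. The following is a minimal sketch, not the author's code: it assumes the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, and it assumes the model was fitted on all 466 observations (the summary does not say whether a train/test split was used). The function names are hypothetical.

```python
import math

# Figures quoted in the summary for Model 2.
AIC = 232.6
BIC = 269.9
N_OBS = 466  # assumption: all 466 neighborhoods were used to fit the model


def implied_param_count(aic: float, bic: float, n_obs: int) -> float:
    """Solve BIC - AIC = k * (ln(n) - 2) for k, the number of fitted parameters."""
    return (bic - aic) / (math.log(n_obs) - 2.0)


def implied_deviance(aic: float, k: float) -> float:
    """Recover -2 * log-likelihood from AIC = 2k - 2 ln L."""
    return aic - 2.0 * k


k = implied_param_count(AIC, BIC, N_OBS)
deviance = implied_deviance(AIC, round(k))

print(f"implied parameter count: {k:.2f}")
print(f"implied -2 log-likelihood: {deviance:.1f}")
```

Under these assumptions the implied parameter count comes out very close to 9, which is consistent with eight retained predictors plus an intercept.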

5 months ago

t-Test HW

By ssammak

5 months ago

Movie Success Prediction

By vivixntrxn

This data analysis explores the key factors influencing movie success using the ggplot2movies dataset of 58,788 films. Through statistical analysis and interactive visualizations, it addresses four industry questions: (1) Which genres consistently achieve higher ratings? (2) Does budget correlate with better ratings? (3) How have ratings evolved over time? (4) Do genre combinations outperform single genres? The presentation includes an interactive Shiny app for exploring genre combinations, which also demonstrates the limits of prediction. It is aimed at filmmakers, investors, and data enthusiasts interested in the cinema business.

5 months ago