Recently Published

85302_exam_output
My very first R Plot
2020 bat emergence times relative to sunset and the air temperature at the time of emergence
Probability Distributions
Anahí Marín, Isabella Escobar
Apply9
Sleep Health and Lifestyle Analysis
Exploring a dataset through various visualizations to better understand sleep health.
Exercise 11 - CPAD
Document for assignment 11 of the CPAD course - BCC - UFRPE
Logistic Regression
This analysis examined crime rates across 466 Boston neighborhoods, using logistic regression to predict whether an area exceeds the median crime rate. The dataset contained 12 predictor variables with no missing values, including residential zoning (zn), pollution levels (nox), housing characteristics (rm, age), accessibility metrics (dis, rad), and socioeconomic indicators (lstat, medv). Data preparation involved log-transforming the right-skewed variables (nox, lstat) and addressing multicollinearity by removing highly correlated predictors: tax (correlation 0.91 with rad), indus (0.76 with nox), and medv (-0.74 with lstat). This brought all VIF values below 5. Three models were developed: Model 1 used all prepared variables, Model 2 applied stepwise selection for parsimony, and Model 3 added an interaction term (rm × lstat) and a polynomial feature (rm²) to capture non-linear relationships. Model 2 was selected as the best choice, balancing predictive performance (88.6% accuracy, 0.874 precision, 0.865 specificity) against simplicity (lowest AIC = 232.6 and BIC = 269.9). It retained eight significant predictors, including nox_log, rad, dis, and rm, and excluded the theoretically problematic lstat_log variable, which had a counter-intuitive negative coefficient in Model 1. Although Model 1 had marginally better performance metrics, an ANOVA comparison showed that its additional variable did not significantly improve the fit (p = 0.63), confirming Model 2 as the most parsimonious and interpretable model for predicting high-crime neighborhoods.
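The evaluation quantities named in this summary can be sketched in a few lines. This is a minimal Python illustration, not the author's R code: every function name is hypothetical, the confusion-matrix counts and log-likelihoods in the usage lines are made-up placeholders rather than values from the analysis, and a one-degree-of-freedom likelihood-ratio (deviance) test stands in for the ANOVA comparison of the nested models (in R, `anova()` with `test = "Chisq"` on two nested `glm` fits performs this kind of test). The two-predictor VIF helper is a simplification; real VIFs are computed against all other predictors in the model.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and specificity from confusion-matrix counts.

    Positive class (assumed here) = "above the median crime rate".
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Of the areas predicted high-crime, the share that really are.
        "precision": tp / (tp + fp),
        # Of the truly low-crime areas, the share predicted low-crime.
        "specificity": tn / (tn + fp),
    }


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: each parameter costs ln(n) instead of 2."""
    return k * math.log(n) - 2 * log_likelihood


def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models that differ by one parameter.

    Returns the chi-square statistic and its p-value with 1 degree of freedom,
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2 * (ll_full - ll_reduced))
    return stat, math.erfc(math.sqrt(stat / 2))


def vif_two_predictors(r):
    """VIF in the special case of exactly two predictors with correlation r."""
    return 1 / (1 - r ** 2)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not results from the analysis.
    print(classification_metrics(tp=40, fp=5, tn=45, fn=10))
    print(aic(-100.0, k=9), bic(-100.0, k=9, n=466))
    print(lr_test_1df(ll_reduced=-110.0, ll_full=-108.0))
    print(vif_two_predictors(0.91))
```

A large likelihood-ratio p-value, such as the reported p = 0.63, means the extra parameter in the fuller model does not significantly improve the fit, so the smaller model is preferred. The helper also shows why a correlation of 0.91 is a concern: with only two predictors it alone gives a VIF of about 5.8, above the threshold of 5 used in the analysis.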
t-Test HW
Movie Success Prediction
This data analysis explores the key factors influencing movie success using the ggplot2movies dataset of 58,788 films. Through statistical analysis and interactive visualizations, it addresses four industry questions: (1) Which genres consistently achieve higher ratings? (2) Does budget correlate with better ratings? (3) How have ratings evolved over time? (4) Do genre combinations outperform single genres? The presentation includes an interactive Shiny app for exploring genre combinations, which also demonstrates the limits of prediction. It is aimed at filmmakers, investors, and data enthusiasts interested in the film business.