RPubs

by RStudio

Recently Published

Hypothesis Testing

In this analysis, I performed hypothesis testing on the "mtcars" dataset, which contains 32 car models from the 1970s.

5 months ago

Nonlinear Regression Analysis: Friedman1 Benchmark Dataset

This analysis explores nonlinear regression modeling using the Friedman1 benchmark dataset, a simulated dataset designed to evaluate machine learning algorithms on complex nonlinear relationships. The true data-generating function is y = 10·sin(π·X1·X2) + 20·(X3-0.5)² + 10·X4 + 5·X5 + ε, where only five of the ten predictors (X1-X5) are informative and the remaining five (X6-X10) are pure noise. Across Exercises 7.2 and 7.5, we trained and evaluated multiple regression models, including Linear Regression, GLMNET, K-Nearest Neighbors, Multivariate Adaptive Regression Splines (MARS), Support Vector Machines (SVM), and Random Forest. MARS emerged as the best model, with a test set RMSE of 1.159 and R² of 0.946, a 56.6% reduction in RMSE relative to the best linear model (RMSE = 2.670). MARS also achieved perfect feature selection accuracy (100%): it correctly identified all five informative predictors and excluded all noise variables, whereas the linear approaches assigned non-zero importance to spurious predictors. The analysis demonstrates that MARS's adaptive basis functions not only capture complex nonlinear patterns, including the multiplicative interaction (X1·X2) and the quadratic relationship in X3, but also perform automatic variable selection, which makes the method particularly valuable for high-dimensional datasets where distinguishing signal from noise is critical. These findings support MARS as a powerful tool for nonlinear regression that combines predictive accuracy with interpretability through its piecewise linear structure and transparent feature selection mechanism.
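
To make the data-generating function above concrete, here is a minimal Python sketch that simulates Friedman1-style data and rechecks the reported RMSE reduction. It is not the code used in the analysis. The function names `friedman1` and `rmse_improvement` are hypothetical, and the sketch assumes the usual benchmark setup, which the summary does not state: predictors drawn independently from Uniform(0, 1) and Gaussian noise with standard deviation 1.

```python
import math
import random


def friedman1(n, n_features=10, noise_sd=1.0, seed=0):
    """Simulate n rows of Friedman1-style data and return (X, y) as lists.

    Assumptions (not stated in the summary): predictors are Uniform(0, 1)
    and the noise term is Gaussian with standard deviation noise_sd.
    """
    if n_features < 5:
        raise ValueError("Friedman1 needs at least 5 predictors")
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(n_features)]
        # Only the first five columns enter the signal; the rest are noise.
        signal = (
            10 * math.sin(math.pi * row[0] * row[1])
            + 20 * (row[2] - 0.5) ** 2
            + 10 * row[3]
            + 5 * row[4]
        )
        X.append(row)
        y.append(signal + rng.gauss(0.0, noise_sd))
    return X, y


def rmse_improvement(baseline_rmse, model_rmse):
    """Percent reduction in RMSE relative to a baseline model."""
    return 100 * (baseline_rmse - model_rmse) / baseline_rmse


X, y = friedman1(200)
print(len(X), len(X[0]), len(y))                  # 200 10 200
print(round(rmse_improvement(2.670, 1.159), 1))   # 56.6
```

The second printed value reproduces the 56.6% figure from the two RMSE values quoted in the summary: (2.670 - 1.159) / 2.670 ≈ 0.566.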

5 months ago