Recently Published
Hypothesis Testing
In this analysis, I performed hypothesis testing on the "mtcars" dataset, which contains data on 32 car models from the 1970s.
Nonlinear Models
Nonlinear Regression Analysis: Friedman1 Benchmark Dataset
This analysis explores nonlinear regression modeling using the Friedman1 benchmark dataset,
a simulated dataset designed to evaluate machine learning algorithms on complex nonlinear
relationships. The true data-generating function is y = 10·sin(π·X1·X2) + 20·(X3 − 0.5)² +
10·X4 + 5·X5 + ε, where only five of the ten predictors (X1–X5) are informative and the
remaining five (X6–X10) are pure noise. Across Exercises 7.2 and 7.5, we trained and
evaluated multiple regression models, including Linear Regression, GLMNET, K-Nearest Neighbors,
Multivariate Adaptive Regression Splines (MARS), Support Vector Machines (SVM), and Random
Forest. MARS was the best-performing model, with a test-set RMSE of 1.159 and an R² of 0.946,
a 56.6% reduction in RMSE relative to the best linear model (RMSE = 2.670). MARS also
selected features perfectly: it used all five informative predictors and excluded all five
noise variables, whereas the linear approaches assigned non-zero importance to spurious
predictors. The analysis shows that MARS's adaptive hinge basis functions can approximate
complex nonlinear patterns, including the interaction between X1 and X2 and the quadratic
effect of X3, while also performing automatic variable selection. This makes MARS
particularly valuable for datasets with many candidate predictors, where distinguishing
signal from noise is critical. These findings support MARS as a powerful tool for nonlinear
regression that combines predictive accuracy with interpretability through its piecewise
linear structure and transparent feature selection.
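To make the data-generating function concrete, here is a minimal standard-library Python sketch that simulates Friedman1 data. The analysis itself was presumably run in R; the function name `friedman1`, the parameter `noise_sd`, and the default noise standard deviation of 1 are assumptions for illustration, not details taken from the analysis. (scikit-learn's `sklearn.datasets.make_friedman1` provides a ready-made equivalent.)

```python
import math
import random


def friedman1(n_samples, noise_sd=1.0, seed=None):
    """Hypothetical helper: simulate the Friedman1 benchmark.

    All 10 predictors are drawn uniformly on [0, 1), but only X1-X5
    enter the response; X6-X10 are pure noise columns.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(10)]  # X1..X10 ~ U(0, 1)
        signal = (10 * math.sin(math.pi * x[0] * x[1])  # sin of X1*X2
                  + 20 * (x[2] - 0.5) ** 2              # quadratic in X3
                  + 10 * x[3]                           # linear in X4
                  + 5 * x[4])                           # linear in X5
        # x[5:] (X6..X10) is never used: those predictors carry no signal.
        X.append(x)
        y.append(signal + rng.gauss(0.0, noise_sd))  # add Gaussian noise ε
    return X, y


X, y = friedman1(200, noise_sd=1.0, seed=42)
print(len(X), len(X[0]), len(y))  # 200 10 200
```

Because X6–X10 never appear in `signal`, any model that assigns them importance is fitting noise, which is exactly the property used above to judge feature selection.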
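The 56.6% figure is a relative reduction in test-set RMSE. Using the two RMSE values reported for MARS and the best linear model, the arithmetic works out as:

```latex
\text{relative RMSE reduction}
  = \frac{\mathrm{RMSE}_{\text{linear}} - \mathrm{RMSE}_{\text{MARS}}}{\mathrm{RMSE}_{\text{linear}}}
  = \frac{2.670 - 1.159}{2.670}
  = \frac{1.511}{2.670}
  \approx 0.566 = 56.6\%
```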
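MARS builds its fit from hinge functions h(x − t) = max(0, x − t) and products of them. As an illustration only, the Python sketch below hand-builds (rather than fits) a piecewise linear approximation to the quadratic term 20·(X3 − 0.5)² from three hinges. The knots at 0.25, 0.5 and 0.75 and the coefficients are assumptions chosen for this example, not terms from the model fitted in the analysis, and the forward pass and pruning that MARS actually uses to choose knots are not shown.

```python
def hinge(x, knot):
    """MARS-style hinge basis function: h(x - t) = max(0, x - t)."""
    return max(0.0, x - knot)


def quad_term(x3):
    """The quadratic component of Friedman1: 20 * (X3 - 0.5)^2."""
    return 20 * (x3 - 0.5) ** 2


def piecewise_approx(x3):
    """Intercept + linear term + three hinges (knots 0.25, 0.5, 0.75).

    Hand-derived coefficients (illustrative, NOT output of a fitted MARS
    model) make this interpolate quad_term at x3 = 0, 0.25, 0.5, 0.75, 1:
    slopes on the four segments are -15, -5, 5, 15.
    """
    return (5.0 - 15.0 * x3
            + 10.0 * hinge(x3, 0.25)
            + 10.0 * hinge(x3, 0.50)
            + 10.0 * hinge(x3, 0.75))


# Worst-case error of the piecewise linear fit on a fine grid over [0, 1].
grid = [i / 1000 for i in range(1001)]
max_err = max(abs(quad_term(x) - piecewise_approx(x)) for x in grid)
print(f"max |error| on [0, 1]: {max_err:.4f}")  # max |error| on [0, 1]: 0.3125
```

With knot spacing 0.25, the maximum error of 0.3125 matches the bound f''·h²/8 = 40 · 0.25² / 8 for linear interpolation of a quadratic, so adding knots shrinks it. Interactions such as the one between X1 and X2 are represented the same way, as products of hinges like h(X1 − a)·h(X2 − b).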