Recently Published
Forecasting Lumpy and Intermittent Stores Sales Series with Global Recursive Machine Learning Using Modeltime
This study compares the performance of traditional time series models with machine learning models, particularly XGBoost, for forecasting intermittent and lumpy demand. It also evaluates the effectiveness of global models against local approaches, given the heterogeneity of the product base.
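To make the "global recursive" idea concrete, here is a hedged toy sketch in Python (the study itself uses XGBoost via R's modeltime, not this simple lag-1 linear fit): one model is fit on lag pairs pooled across all series, then forecasts feed their own predictions back in as the next lag.

```python
# Toy sketch of a global recursive forecasting strategy (illustrative only;
# the actual study fits XGBoost with modeltime in R, not this linear model).

def fit_global_lag1(series_list):
    """Fit one shared y_t = a + b * y_{t-1} model on lag pairs pooled
    across all series (the 'global' part)."""
    xs, ys = [], []
    for s in series_list:
        for t in range(1, len(s)):
            xs.append(s[t - 1])
            ys.append(s[t])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def recursive_forecast(last_value, a, b, horizon):
    """Forecast h steps ahead by feeding each prediction back in
    as the next lag (the 'recursive' part)."""
    preds, y = [], last_value
    for _ in range(horizon):
        y = a + b * y
        preds.append(y)
    return preds

series = [[0, 3, 0, 0, 5, 0, 2], [1, 0, 0, 4, 0, 0, 3]]  # made-up lumpy demand
a, b = fit_global_lag1(series)
print(recursive_forecast(series[0][-1], a, b, horizon=3))
```

The same pooling idea scales to many SKUs: a single model sees lag features from every series, which is what lets heterogeneous, sparse products borrow strength from one another.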
Association Rules
This work explores association rule mining, focusing on the Apriori algorithm and the use of the Bootstrap to enhance the robustness of the results. The study explains how Apriori identifies frequent itemsets and generates association rules based on support and confidence metrics, with lift used to measure the strength of item relationships. The dataset used for the analysis is the Groceries dataset, where the algorithm identifies frequent item combinations such as "if buying milk, then buying bread." The paper also demonstrates the application of the Bootstrap to improve reliability and mitigate errors, supporting the claim that the identified rules generalize to new data. Lastly, the work presents the results visually through tables and graphs to interpret the findings more effectively.
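The three metrics the abstract names can be computed by hand on a toy basket list. The sketch below (plain Python counting, not the Apriori implementation the study uses, and made-up transactions rather than the Groceries data) shows support, confidence, and lift for a milk-then-bread rule:

```python
# Support, confidence and lift for the rule {milk} -> {bread} on toy baskets
# (hand-rolled counting, not the Apriori algorithm itself).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / n

sup_rule = support({"milk", "bread"})      # joint support of the rule
confidence = sup_rule / support({"milk"})  # P(bread | milk)
lift = confidence / support({"bread"})     # confidence vs. baseline bread rate
print(confidence, lift)  # -> 0.75 0.9375
```

A lift below 1, as here, means milk buyers are actually slightly *less* likely than average to buy bread; Apriori's job is to enumerate frequent itemsets efficiently so these metrics only need computing for candidate rules.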
Predictive Modeling of Dementia: Variable Analysis
This project investigates the relationship between ASF (Atlas Scaling Factor), EDUC (years of education), and nWBV (Normalized Whole Brain Volume) in predicting dementia. It employs univariate analyses, Random Forest, and decision trees to identify significant predictors. The results show that lower brain volumes and education levels are associated with a higher risk of dementia, achieving an AUC of 0.73, indicating moderate predictive ability. The final model selected EDUC and nWBV as key variables for predictions.
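The reported AUC of 0.73 has a direct interpretation: the probability that a randomly chosen dementia case receives a higher predicted score than a randomly chosen control. The sketch below computes AUC that way on made-up scores (not the dementia data):

```python
# AUC as the probability that a random positive outscores a random negative,
# with ties counted as half; labels and scores here are illustrative only.

def auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
print(auc(labels, scores))  # -> 0.875
```

On this reading, the study's 0.73 means the EDUC + nWBV model ranks a true case above a control about 73% of the time, which is why the abstract calls the predictive ability moderate.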
LMM Models - Random and Fixed Effects
This study investigates the application of Linear Mixed Models (LMM) to analyze clustered data, where traditional independence assumptions are violated. By distinguishing between fixed and random effects, we aim to capture the inherent variability within clusters, such as schools or companies. This approach enhances our understanding of how predictors influence outcomes while accounting for inter-cluster differences, ultimately providing deeper insights into complex datasets. The findings underscore the importance of LMM in transforming data into actionable insights in various fields.
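The fixed/random split the abstract describes can be written in the standard textbook form, with j indexing clusters (such as schools or companies) and i indexing observations within them:

```latex
% Fixed effects \beta are shared by all clusters; random effects u_j capture
% cluster-specific deviations (e.g., a random intercept per school).
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
       + \mathbf{z}_{ij}^{\top}\mathbf{u}_j
       + \varepsilon_{ij},
\qquad \mathbf{u}_j \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Because observations sharing a cluster also share u_j, they are correlated even after conditioning on the predictors, which is exactly the violation of independence that motivates the LMM.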
Non-Linear Regression Splines
This document explores nonlinear regression models using splines, focusing on solving linearity problems in data. Through the application of cubic and natural splines, analyses and visualizations are presented that demonstrate how these techniques improve the fit to the data. Furthermore, performance metrics are calculated to evaluate the effectiveness of the proposed models, contributing to a deeper understanding of the non-linear relationships between variables.
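One simple way a cubic spline is constructed is with a truncated power basis: an ordinary cubic polynomial plus one hinge term per interior knot, after which a standard linear fit applies. A minimal sketch (illustrative knots, not the document's data; natural splines additionally constrain the fit to be linear beyond the boundary knots):

```python
# Truncated power basis for a cubic spline with given interior knots — the
# features a plain linear regression can then be fit on.

def cubic_spline_basis(x, knots):
    """Return the basis row [x, x^2, x^3] + [(x - k)_+^3 for each knot k]."""
    row = [x, x ** 2, x ** 3]
    row += [max(0.0, x - k) ** 3 for k in knots]
    return row

print(cubic_spline_basis(2.0, knots=[1.0, 3.0]))  # -> [2.0, 4.0, 8.0, 1.0, 0.0]
```

Each hinge term is zero to the left of its knot and cubic to the right, which is what lets the fitted curve change shape locally while staying smooth at the knots.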
Three Optimization Methods for Linear Regressions
This project aims to demonstrate the fundamental calculations behind a regression function using three distinct optimization methods. Each approach offers a unique perspective on how the regression coefficients are determined, allowing for a comparative analysis of their efficiencies and applicability in different scenarios.
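The abstract does not name its three methods, so the sketch below is a hedged illustration with three common choices for fitting y ≈ a + b·x: the closed-form normal equations, batch gradient descent, and stochastic gradient descent. All three should agree on noiseless toy data:

```python
# Three ways to obtain the coefficients of a simple linear regression.
import random

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
n = len(xs)

# 1) Normal equations (closed form for simple regression)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# 2) Batch gradient descent on mean squared error
a_gd, b_gd, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    grad_a = sum(2 * (a_gd + b_gd * x - y) for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a_gd + b_gd * x - y) * x for x, y in zip(xs, ys)) / n
    a_gd -= lr * grad_a
    b_gd -= lr * grad_b

# 3) Stochastic gradient descent (one random data point per update)
random.seed(0)
data = list(zip(xs, ys))
a_sgd, b_sgd = 0.0, 0.0
for _ in range(5000):
    x, y = random.choice(data)
    err = a_sgd + b_sgd * x - y
    a_sgd -= 0.01 * 2 * err
    b_sgd -= 0.01 * 2 * err * x

print(a, b)  # -> 1.0 2.0
print(round(a_gd, 3), round(b_gd, 3))  # converges toward 1.0, 2.0
```

The closed form is exact but requires the whole dataset (and, in the multivariate case, a matrix inversion); the two descent methods trade exactness for scalability, which is the kind of efficiency/applicability comparison the project describes.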
t-Test for Natural Science Class Grades
Test Applied for Significance of Differences in Natural Science Scores
A t-test was conducted to determine whether the differences in natural science scores from the ENEM at School Y were significant. Additionally, the beta probability (the risk of a Type II error) was calculated to assess the reliability of the results.
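The core of such a test is the t-statistic: the difference in group means divided by its standard error. A stdlib sketch on made-up score samples (the actual ENEM data from School Y is not reproduced here; this uses the Welch form, which does not assume equal variances):

```python
# Two-sample Welch t-statistic on illustrative score samples.
from statistics import mean, variance
from math import sqrt

group_a = [520.0, 545.0, 560.0, 530.0, 550.0]
group_b = [500.0, 515.0, 505.0, 520.0, 510.0]

na, nb = len(group_a), len(group_b)
se = sqrt(variance(group_a) / na + variance(group_b) / nb)  # standard error
t = (mean(group_a) - mean(group_b)) / se
print(round(t, 3))
```

A large |t| relative to the reference t distribution gives a small p-value (significance, controlling alpha); the beta probability addresses the complementary question of how likely the test is to miss a real difference of a given size.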
Lasso Regression for High-Dimensional Datasets
When considering regression for data prediction, Ordinary Least Squares (OLS) is often the first method that comes to mind. However, with large datasets and many variables, the risk of overfitting increases. This is where Lasso and Ridge regressions come in. Ridge regression applies an L2 penalty to control variance, while Lasso uses an L1 penalty to shrink some coefficients to zero, resulting in sparser models that are more effective for variable selection. These techniques are essential for building more robust and generalizable models, especially in high-dimensional scenarios.
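The difference between the two penalties is easiest to see for a single coefficient with a unit-variance feature, where both updates have closed forms. A toy sketch (illustrative numbers, not a full solver; in the general case Lasso is typically fit by coordinate descent applying this soft-threshold to one coefficient at a time):

```python
# L1 vs L2 shrinkage of a single OLS coefficient at penalty strength lam.

def lasso_update(beta_ols, lam):
    """Soft-threshold: sign(b) * max(|b| - lam, 0); can hit exactly zero."""
    sign = 1.0 if beta_ols >= 0 else -1.0
    return sign * max(abs(beta_ols) - lam, 0.0)

def ridge_update(beta_ols, lam):
    """Ridge rescales toward zero but never zeroes the coefficient out."""
    return beta_ols / (1.0 + lam)

print(lasso_update(0.3, 0.5))           # -> 0.0 (variable dropped)
print(round(ridge_update(0.3, 0.5), 4)) # -> 0.2 (shrunk, still nonzero)
```

That hard zero is what makes Lasso a variable-selection tool in high dimensions, while Ridge keeps every predictor with a damped coefficient.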