nduonochie

Ndubuisi Chibuogwu

Recently Published

Premier League Match Prediction
This project combines exploratory data analysis, dimensionality reduction, clustering, and predictive modeling to analyze soccer match data and predict outcomes. The results provide valuable insights into team performance and the factors influencing match results, with potential applications in sports analytics, betting, and team strategy development.
Predicting Concrete Strength: A Multivariate and Logistic Regression Approach to Classifying Compressive Strength Outcomes
We wanted to know whether, given the ingredients of a concrete mix, we could accurately predict if the resulting compressive strength would meet the industry standard (4000 PSI). First, we performed EDA to refine a multiple linear regression model. Then, we took our model and our engineered term (above or below 4000 PSI), trained on a subset of our data, and assessed the accuracy against a testing subset. Our final model was 87% accurate. Below, you can find our interpretation of this value. Interesting insights and limitations: Some ingredients of concrete are not strictly necessary but are additives that strengthen concrete by enhancing the effects of primary ingredients such as cement. These additives (such as Superplasticizer, Slag, and Fly Ash) interact strongly with other concrete ingredients to improve the overall strength. Once we added interaction terms between these ancillary ingredients and the primary ingredients to our multiple linear regression, the model's R² value improved by 15%. Some variables have a direct effect on the strength of concrete without any interaction term (or added ingredients). For example, cement content correlates closely with concrete compressive strength. This is evident in the low p-value from the model summary and in a simple scatter plot of the two variables. Our logistic regression model had an accuracy of 87%. In other words, if someone has the ingredients to make concrete and plugs those values into our model, the model predicts whether the resulting compressive strength is above or below 4000 PSI, and that prediction is correct 87% of the time. Logistic regression models of this kind are likely widely used in the real world. If concrete unexpectedly fails, the consequences can be severe. While decent, our model could likely be improved. We estimate that more time would be needed to determine the exact interaction terms between variables. In this model, we captured a few obvious ones from our diagnostic plots and EDA. Given how those interaction terms improved our model accuracy, more refined ones may improve the model further.
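As a rough illustration of the steps described above (an engineered above/below-threshold target, an interaction term, and an accuracy score), here is a minimal Python sketch. It is not the authors' code: the function names, the field names `cement` and `superplasticizer`, the sample values, and the choice to treat exactly 4000 PSI as meeting the standard are all assumptions made for illustration.

```python
PSI_THRESHOLD = 4000  # cutoff named in the description above


def meets_standard(strength_psi, threshold=PSI_THRESHOLD):
    """Engineered binary target: 1 if strength is at or above the threshold.

    Counting a strength exactly equal to the threshold as a pass is an assumption.
    """
    return 1 if strength_psi >= threshold else 0


def add_interaction(row, a, b):
    """Return a copy of `row` with an added product term for fields a and b."""
    out = dict(row)
    out[f"{a}:{b}"] = row[a] * row[b]
    return out


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical mix; the field names and quantities are illustrative only.
mix = {"cement": 300.0, "superplasticizer": 5.0}
mix_with_term = add_interaction(mix, "cement", "superplasticizer")

# Hypothetical measured strengths in PSI, converted to class labels.
labels = [meets_standard(s) for s in (3500, 4000, 5200)]  # [0, 1, 1]
```

An accuracy of 0.87 from `accuracy` on a held-out test subset would correspond to the 87% figure quoted in the description.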
Concrete Strength
The document walks through a structured analysis pipeline to identify key predictors affecting concrete strength. It begins with loading a dataset, renaming and cleaning the data, and applying data normalization techniques. The study then develops multiple linear regression models, tests for model assumptions, and applies transformations (Box-Cox, square root) to improve model fit and interpretability. The analysis also includes influence diagnostics (Cook’s distance, leverage plots) and residual analysis to validate model robustness. Key variables such as Cement, BFS (Blast Furnace Slag), Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, and Age are evaluated for their contribution to compressive strength. Box-Cox analysis suggests a square root transformation is optimal. EDA visuals (histograms, scatter plots, pair plots) highlight variable distributions, correlations, and skewness. The report concludes with detailed interpretation of variable effects on concrete strength, identifying Cement and Age as the strongest positive contributors, and Water as the most negatively correlated factor. It provides a summary of statistical findings, model summaries, and recommendations for predictive modeling improvements.
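To show why a Box-Cox result can point to a square root transformation, the sketch below implements the standard one-parameter Box-Cox formula in plain Python. With lambda = 0.5 the transform equals 2(sqrt(y) - 1), which is just a shifted and scaled square root. The sample values are illustrative and not taken from the dataset, and the report's actual estimated lambda is not stated in the description.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform for a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Illustrative positive values (not real compressive strengths).
values = [4.0, 9.0, 16.0, 25.0]

# With lambda = 0.5, Box-Cox is an affine function of the square root.
pairs = [(box_cox(y, 0.5), 2.0 * (math.sqrt(y) - 1.0)) for y in values]
all_match = all(math.isclose(bc, sq) for bc, sq in pairs)
```

Because the two differ only by a shift and a scale, fitting a linear model to sqrt(y) is equivalent in fit to using the lambda = 0.5 Box-Cox response, while being easier to interpret.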
Document
Independently explore data files, which include students' performance on STAAR EOC for Algebra I and student demographic information. Merge datasets, perform an exploratory analysis, and communicate findings. This assignment aims to evaluate my proficiency in data cleaning, analysis, and visualization techniques.
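The merge step mentioned above can be sketched as an inner join on a shared student identifier. This is only an illustration in plain Python: the field names `student_id`, `algebra1_score`, and `grade_level`, and the sample records, are hypothetical and do not come from the actual data files.

```python
def merge_on_key(left, right, key):
    """Inner join of two lists of dict records on `key`.

    Assumes `key` is unique within `right`; later duplicates would overwrite earlier ones.
    """
    right_by_key = {record[key]: record for record in right}
    merged = []
    for record in left:
        match = right_by_key.get(record[key])
        if match is not None:
            merged.append({**record, **match})
    return merged


# Hypothetical records; real files would be loaded from disk.
scores = [
    {"student_id": 1, "algebra1_score": 3800},
    {"student_id": 2, "algebra1_score": 4100},
    {"student_id": 3, "algebra1_score": 3650},
]
demographics = [
    {"student_id": 1, "grade_level": 9},
    {"student_id": 2, "grade_level": 8},
]

combined = merge_on_key(scores, demographics, "student_id")
```

An inner join drops students who appear in only one file (student 3 here), so it is worth comparing row counts before and after merging.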
Data Confidentiality
This publication is about COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a case management and decision support tool developed and owned by Northpointe. It is used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. This publication showcases the replication of COMPAS using RMarkdown.
RMarkdown Publication
Sample Work of Technical Documentation Using LaTeX