amorciglio1027

Anthony Morciglio

Recently Published

Testing Homogeneity of Variance
In this R script, we compare the effects of medication on blood pressure for two independent groups. We apply the F-test (which assumes normality) and Levene's test (which is more robust to departures from normality) to test for homogeneity of variance. If the variances are the same for each group, we use the pooled variance to compare the means of the two groups. Otherwise, the variances are unequal, and we apply Welch's t-test to compare the means of the two groups.
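As a minimal sketch of that decision flow, using simulated data rather than the author's dataset: the group vectors, the 0.05 threshold, and the variable names below are assumptions for illustration. Levene's test is not in base R (the `car` package provides `leveneTest`), so only the F-test is shown.

```r
set.seed(1)
# Hypothetical blood-pressure reductions for two independent groups
group_a <- rnorm(30, mean = 10, sd = 4)
group_b <- rnorm(30, mean = 8, sd = 4)

# F-test for equal variances (assumes both samples are normal)
f_result <- var.test(group_a, group_b)

# Pool the variance only if the F-test does not reject equality;
# otherwise t.test() falls back to Welch's t-test
equal_var <- f_result$p.value > 0.05
t_result <- t.test(group_a, group_b, var.equal = equal_var)
print(t_result$method)
print(t_result$p.value)
```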
PBJ Healthcare Analysis
This R script summarizes the key findings from a healthcare dataset obtained from CMS.gov. The analysis covers hypothesis testing and the distribution of time worked by medical staff.
Central Limit Theorem of the Exponential Distribution
In this R script, we demonstrate that the distribution of the sample average of an exponentially distributed random variable resembles the normal distribution.
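A short simulation can illustrate the idea. The rate, sample size, and number of replications below are assumed values, not taken from the script.

```r
set.seed(42)
rate <- 2    # assumed rate; the population mean and sd are both 1 / rate
n <- 50      # observations per sample
sample_means <- replicate(5000, mean(rexp(n, rate = rate)))

# The CLT predicts mean 1 / rate and standard deviation (1 / rate) / sqrt(n)
observed <- c(mean = mean(sample_means), sd = sd(sample_means))
predicted <- c(mean = 1 / rate, sd = 1 / (rate * sqrt(n)))
print(rbind(observed, predicted))
```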
Shapiro-Wilk Test for Normality
In this R script, we test whether a continuous random variable is normally distributed. Empirical observations need not follow a specific distribution, so we conduct a statistical test to check whether a candidate distribution is plausible. Experimentally, we sample from an exponential and a normal distribution and apply the Shapiro-Wilk test to each sample (it is well suited to small samples; the Kolmogorov-Smirnov test is an alternative for large samples).
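A minimal sketch of the experiment, assuming 100 draws per sample (the sample size and seed are illustrative choices):

```r
set.seed(7)
normal_sample <- rnorm(100)
exp_sample <- rexp(100, rate = 1)

# Null hypothesis of shapiro.test(): the data come from a normal distribution
p_normal <- shapiro.test(normal_sample)$p.value
p_exp <- shapiro.test(exp_sample)$p.value
print(c(normal = p_normal, exponential = p_exp))
```

A small p-value for the exponential sample is evidence against normality. Note that `shapiro.test` accepts between 3 and 5000 observations.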
Bayesian Structure for Dirichlet Multinomial Model
In this R script, we illustrate the heterogeneity of the posterior distribution using a Dirichlet prior for the multinomial distribution. Contour plots show how the prior and posterior distributions vary under Bayesian conjugacy.
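The conjugate update behind such plots is simple to sketch: the posterior Dirichlet parameters are the prior parameters plus the observed counts. The prior, the counts, and the Gamma-based sampler below are assumptions for illustration, and the contour plotting is omitted.

```r
prior_alpha <- c(1, 1, 1)    # assumed flat Dirichlet prior over 3 categories
counts <- c(12, 5, 3)        # hypothetical multinomial counts
posterior_alpha <- prior_alpha + counts          # conjugate update
posterior_mean <- posterior_alpha / sum(posterior_alpha)

# Draw from the posterior by normalizing independent Gamma variates
set.seed(3)
n_draws <- 2000
g <- matrix(rgamma(3 * n_draws, shape = rep(posterior_alpha, each = n_draws)),
            ncol = 3)
draws <- g / rowSums(g)
print(rbind(exact = posterior_mean, simulated = colMeans(draws)))
```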
Dynamical Systems - Logistic Difference Equation
In this R script, we explore the finite difference equation known as the logistic equation. For each value of the parameter r, we compute a trajectory of 1200 time points and plot the last 400 as a function of r, so that only the long-run behavior is shown. The result is a bifurcation diagram that displays period-doubling bifurcations along with a region of chaotic (unpredictable) oscillations.
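A rough sketch of this procedure, assuming the standard form x[t+1] = r * x[t] * (1 - x[t]). The starting value, the grid of r values, and the function name `logistic_tail` are illustrative choices, not the author's.

```r
# Iterate the map and keep only the last n_keep points of the trajectory
logistic_tail <- function(r, x0 = 0.2, n_total = 1200, n_keep = 400) {
  x <- numeric(n_total)
  x[1] <- x0
  for (t in 2:n_total) {
    x[t] <- r * x[t - 1] * (1 - x[t - 1])
  }
  tail(x, n_keep)
}

r_values <- seq(2.5, 4, by = 0.005)
tails <- lapply(r_values, logistic_tail)

# One column of 400 points per r value gives the bifurcation diagram
plot(rep(r_values, each = 400), unlist(tails), pch = ".",
     xlab = "r", ylab = "x")
```

For r below 3 the tail collapses onto the single fixed point 1 - 1/r; just above 3 it alternates between two values, which is the first period doubling.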
Hypothesis Testing for Population Variance
In this R script, we test whether the population variance is greater than a hypothesized value using the sample standard deviation and the critical chi-square value. If the p-value is below a chosen threshold, we reject the null hypothesis and conclude that the population variance is greater than the hypothesized variance. The results are useful when comparing samples and making inferences about population variances before performing statistical tests on measures of central tendency.
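A minimal sketch of the one-sided test, assuming normally distributed data. The sample, the hypothesized variance, and the 5% level are illustrative values.

```r
set.seed(11)
x <- rnorm(25, mean = 0, sd = 3)   # hypothetical sample
sigma0_sq <- 4                     # hypothesized variance under H0
n <- length(x)

# Test statistic: (n - 1) * s^2 / sigma0^2, chi-square with n - 1 df under H0
chi_stat <- (n - 1) * var(x) / sigma0_sq
critical <- qchisq(0.95, df = n - 1)    # upper 5% critical value
p_value <- pchisq(chi_stat, df = n - 1, lower.tail = FALSE)
print(c(statistic = chi_stat, critical = critical, p_value = p_value))
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value falls below 0.05.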
Confidence Interval for Population Variance
In this R script, we explore the methods to estimate the population variance using the chi-square distribution and interval estimation. Results can be applied to a given data variable where the population variance is unknown, but the sample standard deviation can be computed.
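The interval can be sketched as follows, assuming the data are approximately normal. The helper name `variance_ci` and the simulated input are hypothetical.

```r
# Two-sided confidence interval for the population variance
variance_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf_level
  s_sq <- var(x)
  c(lower = (n - 1) * s_sq / qchisq(1 - alpha / 2, df = n - 1),
    upper = (n - 1) * s_sq / qchisq(alpha / 2, df = n - 1))
}

set.seed(5)
print(variance_ci(rnorm(40, sd = 2)))   # the true variance here is 4
```

Because the chi-square distribution is skewed, the interval is not symmetric around the sample variance.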
Incomplete Beta Function
In this R script, we illustrate how the Incomplete Beta Function (used in the Beta probability distribution) is related to the Gamma function.
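One way to check the relationship numerically is shown below. The parameters a and b and the evaluation point x are arbitrary illustrative values.

```r
a <- 2.5
b <- 4
x <- 0.3

# Complete Beta function expressed through the Gamma function
beta_from_gamma <- gamma(a) * gamma(b) / gamma(a + b)

# Incomplete Beta function B(x; a, b) by numerical integration
incomplete <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1),
                        lower = 0, upper = x)$value

# The regularized ratio should agree with the Beta CDF
print(c(ratio = incomplete / beta_from_gamma, pbeta = pbeta(x, a, b)))
```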
Health and Fitness Project
This R script performs a comprehensive statistical and machine learning analysis of a health and fitness dataset, which is available on Kaggle. It begins with descriptive statistics of key variables, then explores relationships between features. Specifically, it runs a regression model predicting Calories_Burned based on BMI, Workout_Frequency, and Workout_Type. The code also calculates correlations, including the relationship between Water_Intake and Avg_BPM, and between Workout_Type and Calories_Burned, to assess how hydration and exercise style influence performance. To uncover patterns in member behavior, it applies K-means clustering on variables such as Age, Avg_BPM, and Calories_Burned, grouping individuals into distinct activity-level categories. Additionally, the script visualizes the distributions of calories and exercise types, highlighting measures of central tendency along with the 25th and 75th percentiles. Together, these analyses provide insights into how workout habits, body metrics, and exercise choices shape overall fitness outcomes.
Geometric Distribution - Method of Moments
The accompanying R code generates a visualization of the Expected Value and Median as functions of p, using ggplot2 to plot both curves on the same graph. The results show that as p approaches 1, both statistics converge to 1 (success almost always occurs immediately), while for small p, the Expected Value grows rapidly like 1/p while the Median increases more moderately, reflecting the skewness of the distribution. Together, the blog and code provide both a mathematical derivation and a graphical interpretation that clarify how the mean and median behave under different probabilities.
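The two curves can be tabulated with a few lines of base R. This sketch assumes the convention that the variable counts trials up to and including the first success (support 1, 2, ...), which matches both statistics converging to 1; the ggplot2 plotting step is omitted.

```r
p <- seq(0.05, 0.95, by = 0.05)

expected_value <- 1 / p
# Smallest k with P(X <= k) >= 0.5, where P(X <= k) = 1 - (1 - p)^k
median_trials <- ceiling(-1 / log2(1 - p))

print(data.frame(p, expected_value, median_trials))
```

R's built-in `qgeom` counts failures before the first success, so under this convention the median corresponds to `qgeom(0.5, p) + 1`.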
Uniform Distribution: Moment Generating Function
This R script is a hands-on guide to the Moment Generating Function (MGF) of the Discrete Uniform Distribution. It defines a custom function, uniform_mgf, to calculate the MGF for any discrete uniform random variable. The code then uses this function to demonstrate a powerful application: approximating the distribution's mean and variance by taking numerical derivatives of the MGF evaluated at t=0. The script also provides two informative plots to visualize these concepts. The first plot displays the MGF for different distributions (e.g., a 6-sided die, a 10-sided die, etc.), showing how the function's curve changes with the number of possible outcomes. The second plot illustrates the fundamental relationship between the number of outcomes and the key moments of the distribution, clearly showing that the mean grows linearly, while the variance increases quadratically. This visualization effectively reinforces the theoretical properties of the discrete uniform distribution discussed in the blog post.
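The numerical-derivative idea can be sketched as below. The name `uniform_mgf` comes from the description, but this body is an assumed reconstruction (support 1 to n, central differences with step h), not the author's code.

```r
# MGF of a discrete uniform variable on 1..n: M(t) = mean(exp(t * k))
uniform_mgf <- function(t, n) {
  sapply(t, function(ti) mean(exp(ti * seq_len(n))))
}

n <- 6       # a fair six-sided die
h <- 1e-4    # step size for the finite differences

# First derivative at 0 approximates E[X]; second approximates E[X^2]
m1 <- (uniform_mgf(h, n) - uniform_mgf(-h, n)) / (2 * h)
m2 <- (uniform_mgf(h, n) - 2 * uniform_mgf(0, n) + uniform_mgf(-h, n)) / h^2

print(c(mean = m1, variance = m2 - m1^2))
```

The closed forms for comparison are (n + 1) / 2 for the mean and (n^2 - 1) / 12 for the variance.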
Visualizing the Measures of Spread and Central Tendency of the Discrete Uniform Distribution
The R script is a comprehensive tool for analyzing the Discrete Uniform Distribution. It begins by defining the parameters of a fair eight-sided die, which serves as the core example. The code then systematically calculates and displays the key descriptive statistics: median, mean, and variance. A series of `ggplot2` plots visually complements these numerical results.
Binomial Distribution: PMF and CDF
Analyzing the probability mass function (PMF) and cumulative distribution function (CDF) of the Binomial Distribution.
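For instance, with assumed parameters n = 10 and p = 0.3, both functions are available in base R:

```r
n <- 10      # assumed number of trials
p <- 0.3     # assumed success probability
k <- 0:n

pmf <- dbinom(k, size = n, prob = p)   # P(X = k)
cdf <- pbinom(k, size = n, prob = p)   # P(X <= k)
print(data.frame(k, pmf = round(pmf, 4), cdf = round(cdf, 4)))
```

The CDF is the running sum of the PMF, so `cumsum(pmf)` reproduces `cdf`.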
Exponential Distribution: An Experiment
We use the Mean Time To Failure (MTTF) to estimate the rate parameter of the Exponential Distribution. After that, we visualize the density and distribution functions.
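A minimal sketch of the estimation step, using simulated failure times with an assumed true rate of 0.25 (the data and variable names are hypothetical):

```r
set.seed(9)
failure_times <- rexp(500, rate = 0.25)   # hypothetical failure times

# The exponential mean is 1 / rate, so the rate estimate is 1 / MTTF
mttf <- mean(failure_times)
rate_hat <- 1 / mttf
print(c(mttf = mttf, rate_hat = rate_hat))

# Density and distribution functions under the fitted rate
curve(dexp(x, rate = rate_hat), from = 0, to = 20, ylab = "density")
curve(pexp(x, rate = rate_hat), from = 0, to = 20, ylab = "probability")
```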
Real Estate Analysis - Texas 2021 - Draft 1
This is a summary of the different property listings in the state of Texas. In this analysis, I make a hypothesis about the influence of School Performance on the property value indicated by listing price and sales velocity. I use visualizations to depict the qualitative differences based on the data.
A Bakery Sales Mock Project
In this mock project, we evaluate the weekly sales at a local bakery. Compared to other projects, this is the simplest analysis that can be used for small businesses to better understand which items are most popular.
Portfolio Project Smartwatch
In this project, I combine multiple datasets (some of which use different time units) into a single data frame. The resulting data frame contains multiple numerical columns, and I use binning to categorize the data by minutes of activity. Visualizations are then created to understand the trends within the data.
Portfolio Project Cyclist Trips Q1
This analysis was performed by Anthony at the beginning of June as part of his Coursera Google Analytics Certification. The data is from Coursera, and the study aims to understand the characteristics of cyclists who purchase the membership. For clarity, a Customer is a person who purchases a single-day pass, whereas a Subscriber purchases an annual membership. For simplicity, the analysis covers the first quarter (Q1) of 2019 and 2020. The goal is to develop better marketing strategies that convince Customers to become Subscribers by identifying trends in historical cyclist data from Chicago.
R Markdown - Demo - Penguins
This markdown file is created to showcase the capabilities of RStudio for data analysis including using ggplot2, dplyr, and corrplot. The dataset is imported from the library 'palmerpenguins' and consists of 344 observations and 8 features.