NYC 2013 Flights Data Cleaning
The main objective of this cleaning process is to transform the raw, segmented time fields of the NYC 2013 flights dataset into a continuous, usable timeline. The script converts military-style integer times (for example, 517 for 5:17 a.m.) into proper time objects and reconciles scheduled times with their corresponding delays to build accurate datetime columns. A critical part of this work is a logical correction for overnight flights: arrivals that occur after midnight are attributed to the following calendar day, which preserves temporal integrity for subsequent analysis.
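The conversion and overnight correction described above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the author's script. The helper names make_datetime, actual_from_schedule, and arrival_datetime are hypothetical. Times are assumed to be HHMM integers, with 2400 meaning midnight. Time zones are ignored, so an arrival clock time earlier than the departure is simply taken to mean the next day. The dates and times in the example are illustrative values.

```python
from datetime import datetime, timedelta


def make_datetime(year, month, day, hhmm):
    """Combine a calendar date with a military-style HHMM integer (e.g. 517 -> 05:17)."""
    hours, minutes = divmod(hhmm, 100)
    # Adding a timedelta (instead of calling time(hours, minutes)) lets 2400
    # roll over to 00:00 on the following day instead of raising an error.
    return datetime(year, month, day) + timedelta(hours=hours, minutes=minutes)


def actual_from_schedule(scheduled, delay_minutes):
    """Reconcile a scheduled datetime with its delay in minutes (negative = early)."""
    return scheduled + timedelta(minutes=delay_minutes)


def arrival_datetime(departure, year, month, day, arr_hhmm):
    """Build an arrival datetime, pushing post-midnight arrivals to the next day."""
    arrival = make_datetime(year, month, day, arr_hhmm)
    # Overnight correction (time zones ignored): an arrival clock time earlier
    # than the departure is taken to mean the flight landed after midnight.
    if arrival < departure:
        arrival += timedelta(days=1)
    return arrival


dep = make_datetime(2013, 1, 1, 2355)
arr = arrival_datetime(dep, 2013, 1, 1, 125)
print(dep, arr)  # 2013-01-01 23:55:00 2013-01-02 01:25:00
print(actual_from_schedule(make_datetime(2013, 1, 1, 2350), 30))  # 2013-01-02 00:20:00
```

Building the datetime with timedelta arithmetic means that a raw value such as 2400, or a delay that pushes a late departure past midnight, rolls into the next day automatically.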
Bavanam Poojitha
In this analysis, I examined whether the intervention reduced participants’ stress levels. The same participants were measured before the intervention (Stress_Pre) and after the intervention (Stress_Post), making this a dependent (paired) design.
My research question was: Is there a significant difference in stress levels before and after the intervention?
The null hypothesis stated that there is no difference in stress scores before and after the intervention. The alternative hypothesis stated that stress scores differ before and after the intervention.
Before conducting the inferential test, I examined the normality assumption using a histogram, a boxplot, and a Shapiro-Wilk test on the difference scores (After − Before). Although the histogram appeared roughly symmetrical and moderately bell-shaped, the Shapiro-Wilk test was statistically significant (p < .05), indicating that the difference scores departed from normality. The boxplot showed a couple of outliers, but they lay close to the whiskers and were not considered severe. Because the normality assumption was violated, I used a Wilcoxon Signed-Rank test instead of a paired t-test.
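The difference scores and the boxplot outlier check described above can be sketched as follows. This assumes a Tukey-style 1.5 × IQR fence computed with Python's statistics.quantiles. Plotting tools differ in how they place whiskers (R's boxplot() uses Tukey's hinges, for example), so fences may differ slightly on small samples. The Shapiro-Wilk test itself is not reproduced here; it is available as scipy.stats.shapiro in Python and shapiro.test in R. The function names and sample values are hypothetical.

```python
from statistics import quantiles


def difference_scores(before, after):
    """Return After - Before for each participant, paired by position."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    return [a - b for b, a in zip(before, after)]


def tukey_outliers(values, k=1.5):
    """Return values lying outside the Q1 - k*IQR and Q3 + k*IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical stress scores for five participants
before = [50, 48, 52, 47, 49]
after = [45, 44, 47, 43, 30]
diffs = difference_scores(before, after)
print(diffs)                  # [-5, -4, -5, -4, -19]
print(tukey_outliers(diffs))  # [-19]
```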
The results showed a statistically significant difference between stress levels before and after the intervention, V = 620, p < .001. The median stress score decreased from 47.24 before the intervention to 40.85 after it. The effect size was large (rank-biserial correlation r = .84), indicating a strong reduction in stress levels following the intervention.
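The two statistics reported above can be computed directly from the paired scores. The sketch below assumes R's convention for V: the sum of the ranks of the positive differences x − y, with zero differences dropped and tied absolute differences given average ranks. It uses the matched-pairs rank-biserial correlation r = (W+ − W−) / (W+ + W−). It omits the p-value, which requires the exact null distribution or a normal approximation. Function names and data are hypothetical. With pre-intervention scores passed as x, a reduction in stress yields a positive r.

```python
def rank_with_ties(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


def signed_rank_stats(x, y):
    """Return (V, rank-biserial r) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = rank_with_ties([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    r_rb = (w_plus - w_minus) / (w_plus + w_minus)
    return w_plus, r_rb


pre = [10, 12, 15, 9]
post = [8, 13, 11, 9]
v, r = signed_rank_stats(pre, post)
print(v, round(r, 3))  # 5.0 0.667
```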
Bavanam Poojitha
In this analysis, I examined whether a physical activity program reduced students’ stress levels. The dataset included the same group of participants measured at two time points: before the program (Stress_Pre) and after the program (Stress_Post). Because the same students were measured twice, this was a dependent (paired) design.
My research question was: Is there a significant difference in stress levels before and after the program?
The null hypothesis stated that there is no difference in stress scores before and after participation in the program. The alternative hypothesis stated that stress scores differ before and after participation in the program.
Before conducting the inferential test, I checked the normality assumption using a histogram, a boxplot, and a Shapiro-Wilk test on the difference scores. Although the histogram appeared slightly negatively skewed and somewhat flat, the Shapiro-Wilk test did not indicate a significant departure from normality (p > .05). Therefore, I proceeded with a dependent samples (paired) t-test.
The results showed a statistically significant difference in stress scores between the pre-program and post-program measurements. Stress levels were significantly lower after the program. The effect size was medium (Cohen’s d = 0.66), indicating a meaningful reduction in stress following participation in the physical activity program.
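The paragraph above reports only the effect size. The following sketch therefore just shows how a dependent samples t statistic and a paired effect size are typically computed from difference scores. It assumes the d_z variant of Cohen's d: the mean difference divided by the standard deviation of the differences. The source does not state which paired variant was used; others, such as d_av, divide by the average of the two SDs instead. Differences are taken as After − Before, so a reduction gives negative t and d, and the magnitude is what is usually reported. The p-value needs the t distribution with n − 1 degrees of freedom and is omitted. Names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_dz(pre, post):
    """Return (t, df, d_z) for paired scores, using After - Before differences."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff
    return t, n - 1, d_z


pre = [10, 12, 14, 16]
post = [8, 11, 11, 15]
t, df, d_z = paired_t_and_dz(pre, post)
print(round(t, 2), df, round(d_z, 2))  # -3.66 3 -1.83
```

Because t equals d_z multiplied by the square root of n, the two quantities carry the same sign and differ only by sample size.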
Bavanam Poojitha
In this analysis, I examined whether students who work study a different number of hours per week than students who do not work. The dataset included two independent groups, working and non-working students, and the variable of interest was weekly study hours.
My research question was: Is there a significant difference in weekly study hours between working and non-working students?
The null hypothesis stated that there is no difference in weekly study hours between the two groups. The alternative hypothesis stated that weekly study hours differ between working and non-working students.
Because the two groups consisted of different students, an independent samples comparison was required. Before conducting the inferential test, I evaluated the normality assumption using histograms, boxplots, and Shapiro-Wilk tests for each group. The visual inspections showed positive skewness and potential outliers, particularly in the non-working group, and the Shapiro-Wilk tests confirmed that at least one group departed significantly from normality. Therefore, I conducted a Mann-Whitney U test instead of an independent samples t-test.
The results indicated no statistically significant difference in weekly study hours between students who work and those who do not. Although non-working students had a higher median number of study hours than working students, this difference was not statistically significant.
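The comparison above can be sketched as follows, assuming the standard rank-sum definition of U. Both groups are pooled, ties receive average ranks, and U for one group is its rank sum minus n(n + 1) / 2. R's wilcox.test reports this quantity as W for its first sample. The sketch also returns each group's median, the descriptive summary the conclusion relies on. The p-value (exact or normal approximation) is omitted. The function names and study-hour values are hypothetical.

```python
from statistics import median


def rank_with_ties(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Return (U_a, U_b, median_a, median_b) for two independent groups."""
    pooled = list(group_a) + list(group_b)
    ranks = rank_with_ties(pooled)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])  # the first n_a pooled positions belong to group_a
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # U_a + U_b always equals n_a * n_b
    return u_a, u_b, median(group_a), median(group_b)


working = [6, 8, 5, 10, 7]
not_working = [9, 12, 7, 15, 11]
print(mann_whitney(working, not_working))  # (3.5, 21.5, 7, 11)
```

U_a also counts the pairs in which a working student studies more than a non-working student, with ties counted as one half. This gives it a direct interpretation alongside the group medians.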