
mtazike

Mahya

Recently Published

Data Dive 8 — Regression Modeling
This notebook continues the analysis of the World Bank Statistical Performance Indicators (SPI) dataset, a longitudinal country-level dataset covering 217 countries from 2004 to 2023. Each row represents one country-year observation and includes multiple measures of statistical capacity, such as data use, data products, and infrastructure. Building on the hypothesis testing work from Week 7, this analysis shifts toward modeling. Two questions are addressed. First, does income group explain differences in overall statistical performance across countries? This is tested using a one-way ANOVA. Second, can a single continuous sub-indicator, data_products_score, predict a country's overall performance? This is explored through simple linear regression. Together, the two approaches move from testing whether group means differ toward modeling the relationship between statistical capacity indicators.
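The two techniques named in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's own code: the data below are synthetic stand-ins rather than SPI values, the group means are invented, and the name `overall_score` is an assumption (only `data_products_score` appears in the abstract).

```python
import random
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Synthetic stand-in data (NOT the SPI values); group means are made up.
rng = random.Random(0)
income_groups = {
    "Low income": [rng.gauss(45, 8) for _ in range(30)],
    "Middle income": [rng.gauss(60, 8) for _ in range(30)],
    "High income": [rng.gauss(80, 8) for _ in range(30)],
}
f_stat = one_way_anova_f(list(income_groups.values()))

data_products_score = [rng.uniform(20, 90) for _ in range(100)]
# Hypothetical outcome column, generated with a known linear relationship.
overall_score = [10 + 0.8 * x + rng.gauss(0, 5) for x in data_products_score]
intercept, slope, r_squared = simple_ols(data_products_score, overall_score)

print(f"ANOVA F = {f_stat:.2f}")
print(f"overall_score ~ {intercept:.2f} + {slope:.2f} * data_products_score, "
      f"R^2 = {r_squared:.3f}")
```

A large F statistic indicates that variation between income groups is large relative to variation within them; the regression slope estimates how much the outcome changes per one-point change in the predictor.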
Week 7 Data Dive — Hypothesis Testing
This notebook continues the analysis of the World Bank Statistical Performance Indicators (SPI) dataset, a longitudinal country-level dataset covering 217 countries from 2004 to 2023. Each row represents one country-year observation and includes multiple measures of statistical capacity, such as data use, data products, and infrastructure. This week, hypothesis testing is used to examine whether meaningful differences in statistical performance exist between income groups. Specifically, A/B testing compares High income countries (Group A) and Low income countries (Group B) across two performance indicators. Two hypothesis testing frameworks are applied. Hypothesis 1 uses the Neyman–Pearson framework, which involves pre-specified error rates, power analysis, and a decision to reject or fail to reject the null hypothesis. Hypothesis 2 uses Fisher’s significance testing framework, which focuses on interpreting the p-value and assessing the strength of evidence against the null hypothesis. Understanding the relationship between income level and statistical capacity has policy relevance, as it may inform decisions related to development funding, technical assistance, and governance priorities.
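A two-group comparison of this kind could be sketched as follows. The abstract does not say which test statistic the notebook uses, so Welch's unequal-variance t statistic is an assumption here, and the scores are synthetic rather than SPI values.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df): Welch's t statistic and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Synthetic stand-in scores (NOT SPI data); the group means are made up.
rng = random.Random(7)
group_a = [rng.gauss(80, 10) for _ in range(40)]  # "High income" stand-in
group_b = [rng.gauss(50, 12) for _ in range(40)]  # "Low income" stand-in

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Under Neyman–Pearson, the statistic would be compared against a critical value fixed in advance by the chosen error rate; under Fisher's approach, the corresponding p-value would be read as a graded measure of evidence against the null. Obtaining that p-value requires the t distribution, which the standard library does not provide.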
Week 6 Data Dive — Confidence Intervals
In this data dive, I explore the Statistical Performance Indicators (SPI) dataset from the World Bank, accessed via TidyTuesday. Each row in this dataset represents a country–year observation, tracking how well countries manage and use statistical data across multiple dimensions over the years 2004–2023.
Data Dive 5 - Documentation
In this data dive, I examine the Statistical Performance Indicators dataset from the World Bank to identify unclear elements in the data and documentation. The dataset contains 4,340 rows and 12 columns, with each row representing a country in a specific year. The goal is to critically evaluate what's clear, what's unclear, and what issues might affect analysis.
Data Dive 4 - Sampling and Drawing Conclusions
In this data dive, I will explore how different random samples from the same dataset can produce varying results. This helps demonstrate how sampling variability can influence the conclusions we draw from data.
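Sampling variability is easy to see with a small simulation. The sketch below is illustrative only: the population is synthetic, its size of 4,340 simply echoes the row count quoted elsewhere on this page, and the helper `sample_means` is a hypothetical name.

```python
import random
from statistics import mean


def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples (without replacement) and return each mean."""
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]


rng = random.Random(42)
# Synthetic population of scores (NOT the real dataset).
population = [rng.gauss(60, 15) for _ in range(4340)]

means = sample_means(population, sample_size=50, n_samples=5, rng=rng)
print(f"population mean: {mean(population):.2f}")
for i, m in enumerate(means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```

Each sample comes from the same population, yet the sample means differ from one another and from the population mean, which is why a conclusion drawn from any single sample carries uncertainty.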
Week 3 Data Dive - Group By and Probabilities
This notebook explores group-by analysis and probability concepts using a country–year dataset. It examines regions, income levels, population summaries, and region–income combinations to identify rare, common, and missing patterns.
Data Dive
Summary statistics and visual exploration of the dataset