Recently Published

Thermal data panel corrections
Recommendation Engines
Using Market Basket Analysis & Collaborative Filtering models to evaluate data.
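A minimal Python sketch of the market-basket half of this idea: it computes support, confidence, and lift for every one-item to one-item rule found in a list of baskets. The function name `pair_rules` and the example baskets are hypothetical and do not come from the publication. Collaborative filtering is not shown.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.0):
    """Score every one-item -> one-item rule found in the baskets (hypothetical helper).

    Returns (lhs, rhs, support, confidence, lift) tuples sorted by lift, highest first.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))           # ignore duplicate items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # confidence relative to P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)
```

For the toy baskets `[["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk", "eggs"]]`, the rule butter → bread has confidence 1.0 (every basket with butter also has bread) and lift of about 1.33. A lift above 1 means the pair co-occurs more often than independence would predict.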
Trying TWD_norm on MPJ data
Recommendation Engines
Shinylive Demo
Demo for shinylive
Document
Acceptance-Rejection Method
Examples of the acceptance-rejection method: one with a known density and the second with a density function
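A minimal Python sketch of the acceptance-rejection method, assuming an illustrative target that does not come from the publication: the Beta(2, 2) density f(x) = 6x(1 - x) on [0, 1], with a Uniform(0, 1) proposal g(x) = 1. The bound M must satisfy f(x) ≤ M·g(x) everywhere. Since f peaks at 1.5 when x = 0.5, M = 1.5 is the tightest choice. The function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def target_pdf(x: float) -> float:
    """Illustrative target density (assumption): Beta(2, 2), f(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def accept_reject(n: int, m: float = 1.5,
                  rng: Optional[random.Random] = None) -> Tuple[List[float], int]:
    """Draw n samples from target_pdf using a Uniform(0, 1) proposal g(x) = 1.

    Returns the accepted samples and the total number of proposals drawn.
    """
    if rng is None:
        rng = random.Random()
    samples: List[float] = []
    tries = 0
    while len(samples) < n:
        y = rng.random()                     # candidate drawn from the proposal g
        u = rng.random()                     # independent U(0, 1) for the acceptance test
        tries += 1
        if u <= target_pdf(y) / (m * 1.0):   # accept with probability f(y) / (M g(y))
            samples.append(y)
    return samples, tries
```

On average a fraction 1/M of proposals is accepted, about 2/3 here. A looser bound (larger M) still gives correct samples but wastes more proposals.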
US Data Salaries
This presentation analyzes U.S. Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS) May 2024 data to explore salary distributions across nine data practitioner roles, nationally and by state. It uses box plots, ranked bar charts, connected dot plots, and choropleth maps to show how role and geography shape earning potential.
hw4 solongo
investment portfolio analysis
Data Dive 10 — Generalized Linear Models
This notebook extends the regression analysis from previous weeks by introducing a generalized linear model (GLM), specifically logistic regression, to model a binary outcome. Prior analyses focused on continuous outcomes, using linear regression to predict overall_score from sub-indicator scores and income group. Logistic regression is needed here because the response variable is binary: rather than modeling a score, the goal is to model the probability that a country is high-performing, defined as having an overall_score above 85. This threshold was chosen because it represents a meaningfully high level of statistical performance, well above the dataset mean, while still leaving enough countries in the smaller class to model. The split is 48 high-performing countries (about 26% of the 186 classified) and 138 that fall below the threshold. The three predictors are data_use_score, data_services_score, and data_infrastructure_score. These sub-indicators reflect distinct dimensions of a country’s statistical capacity and were not directly used as outcome variables in prior models. The dataset is the World Bank Statistical Performance Indicators dataset, covering 217 countries from 2004 to 2023. The analysis uses only the 2023 cross-sectional snapshot.
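A minimal Python sketch of the modeling step, not the notebook's own code (in R the usual tool would be `glm()` with `family = binomial`). It builds the binary outcome from the 85-point threshold and fits an intercept plus one coefficient per predictor by gradient ascent on the logistic log-likelihood. The function names are hypothetical. Predictors are assumed to be standardized before fitting, and no SPI data is included.

```python
import math
import random  # used only to generate synthetic data for trying the sketch


def make_outcome(overall_scores, threshold=85.0):
    """Binary response: 1 if overall_score is above the threshold (high-performing), else 0."""
    return [1 if s > threshold else 0 for s in overall_scores]


def standardize(X):
    """Center and scale each predictor column (sample standard deviation)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit beta = [intercept, b1, ..., bp] by batch gradient ascent on the log-likelihood.

    X rows would hold e.g. (data_use_score, data_services_score, data_infrastructure_score),
    already standardized, so coefficients are per standard deviation of each predictor.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, yi in zip(X, y):
            residual = yi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            grad[0] += residual
            for j, x in enumerate(row):
                grad[j + 1] += residual * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta, row):
    """Estimated probability that a country with these (standardized) scores is high-performing."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
```

Each coefficient is a change in the log-odds of being high-performing per one standard deviation of its predictor. `math.exp(b)` turns it into an odds ratio.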
program_5