
r4_data

Juan Fernando Mosquera Araujo

Recently Published

Sampling methods, IBM Attrition dataset
We'll look at sampling methods using a synthetic (fictional) employee attrition dataset from IBM.
Paired t-test US presidential elections 2008-2012
Here's a dataset of US presidential elections. Each row represents a presidential election at the county level. The variables in the dataset are the US state, the county within that state, and the percentage of votes that went to the Democratic candidate in 2008 and in 2012.
Late Shipments Hypothesis testing
The late_shipments dataset contains supply chain data on the delivery of medical supplies. Each row represents one delivery of a part. The late column denotes whether or not the part was delivered late. A value of "Yes" means that the part was delivered late, and a value of "No" means the part was delivered on time.
Hypothesis testing, Stack Overflow developer survey 2020
Each year, Stack Overflow surveys its users, who are primarily software developers, about themselves, how they use Stack Overflow, their work, and the development tools they use. In this analysis, we'll look at a subset of the survey responses from users who identified as Data Scientists.
Exploratory Data Analysis DC & Marvel Comics
Two publishers, Marvel and DC, have created a host of superheroes that have made their way into popular culture. You’re probably familiar with Batman and Spider-Man, but what about Mor the Mighty? The comics dataset has information on all comic characters that have been introduced by DC and Marvel.