Recently Published

It's Not Always Sunny in New York City
This is an in-depth analysis of the Flights dataset, which contains flights that departed from New York City airports in 2013.
Assignment 3
Weather and the Relation to Flight Cancellations and Departures
A MAT-210 project using the nycflights13 data to look for a relationship between weather and flight delays and cancellations.
Publish Document
tarea
Assignment 3
Homework 1
HTML
Latihan Week 4
Plot
In this case study, you will perform data analysis for Bellabeat, a high-tech manufacturer of health-focused products for women. You will analyze smart device data to gain insight into how consumers are using their smart devices. Your analysis will help guide future marketing strategies for your team. Along the way, you will perform numerous real-world tasks of a junior data analyst by following the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. By the time you are done, you will have a portfolio-ready case study to help you demonstrate your knowledge and skills to potential employers.

Questions for the analysis

1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat marketing strategy?

Business task

Identify potential opportunities for growth and make recommendations for improving the Bellabeat marketing strategy, based on trends in smart device usage.

Loading packages

    install.packages("tidyverse")  # run once; the package name must be quoted
    library(tidyverse)
    library(lubridate)
    library(dplyr)
    library(ggplot2)
    library(tidyr)

    > library(tidyverse)
    ── Attaching core tidyverse packages ──────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.4     ✔ readr     2.1.5
    ✔ forcats   1.0.0     ✔ stringr   1.5.1
    ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
    ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
    ✔ purrr     1.1.0
    ── Conflicts ────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package to force all conflicts to become errors
    > install.packages("tidyverse")

    Attaching package: ‘lubridate’

    The following objects are masked from ‘package:base’:

        date, intersect, setdiff, union

Importing datasets

For this project, I will use the FitBit Fitness Tracker Data.

    bellabeat <- read.csv("bellabeat_merged.csv")
    # https://www.kaggle.com/arashnic/fitbit
    # Remember, there are many different CSV files in the dataset.
    # We have uploaded two CSVs into the project, but you will likely
    # want to use more than just these two CSV files.
    # Create a dataframe named 'bellabeat' and read in one
    # of the CSV files from the dataset. Remember, you can name your dataframe
    # something different, and you can also save your CSV file under a different name as well.

I already checked the data in Google Sheets. I just need to make sure that everything was imported correctly by using the View() and head() functions.

    head(bellabeat)

    > head(bellabeat)
              Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitiesDistance
    1 1503960366    3/25/2016      11004          7.11            7.11                        0
    2 1503960366    3/26/2016      17609         11.55           11.55                        0
    3 1503960366    3/27/2016      12736          8.53            8.53                        0
    4 1503960366    3/28/2016      13231          8.93            8.93                        0
    5 1503960366    3/29/2016      12041          7.85            7.85                        0
    6 1503960366    3/30/2016      10970          7.16            7.16                        0
      VeryActiveDistance ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
    1               2.57                     0.46                4.07                       0
    2               6.92                     0.73                3.91                       0
    3               4.66                     0.16                3.71                       0
    4               3.19                     0.79                4.95                       0
    5               2.16                     1.09                4.61                       0
    6               2.36                     0.51                4.29                       0
      VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
    1                33                  12                  205              804     1819
    2                89                  17                  274              588     2154
    3                56                   5                  268              605     1944
    4                39                  20                  224             1080     1932
    5                28                  28                  243              763     1886
    6                30                  13                  223             1174     1820

I spotted some problems with the timestamp data, so before the analysis I need to convert it to date-time format and split it into date and time.

    > intensities$ActivityHour = as.POSIXct(intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())

    # sleep_day and intensities are assumed to have been read in from their own CSV files,
    # in the same way as bellabeat above.
    colnames(bellabeat)
    head(sleep_day)
    colnames(sleep_day)

Exploring and summarizing data

    n_distinct(bellabeat$Id)
    n_distinct(sleep_day$Id)
    nrow(bellabeat)
    nrow(sleep_day)

This tells us the number of participants in each data set. There are 33 participants in the activity, calories and intensities data sets, 24 in the sleep data set and only 8 in the weight data set.
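The participant counts above can also be collected into a single overview table. The following is a minimal sketch in base R rather than the notebook's own code: the three small data frames and their Ids are made up for illustration, and `length(unique(...))` stands in for `n_distinct()` so that no packages are needed.

```r
# Toy stand-ins for the real data frames; Ids and row counts are invented.
datasets <- list(
  activity = data.frame(Id = c(1, 1, 2, 3, 3)),
  sleep    = data.frame(Id = c(1, 1, 2)),
  weight   = data.frame(Id = c(3))
)

# One row per data set: distinct participants and total records.
overview <- data.frame(
  dataset      = names(datasets),
  participants = sapply(datasets, function(df) length(unique(df$Id))),
  rows         = sapply(datasets, nrow),
  row.names    = NULL
)

print(overview)
```

Seeing participants and rows side by side makes it easier to judge which data sets are large enough to draw conclusions from.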
Eight participants are too few to support any recommendations or conclusions based on the weight data.

Let’s have a look at summary statistics of the data sets:

    bellabeat %>%
      select(TotalSteps, TotalDistance, SedentaryMinutes) %>%
      summary()

    sleep_day %>%
      select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
      summary()

       TotalSteps    TotalDistance    SedentaryMinutes
     Min.   :    0   Min.   : 0.000   Min.   :  32.0
     1st Qu.: 1988   1st Qu.: 1.410   1st Qu.: 728.0
     Median : 5986   Median : 4.090   Median :1057.0
     Mean   : 6547   Mean   : 4.664   Mean   : 995.3
     3rd Qu.:10198   3rd Qu.: 7.160   3rd Qu.:1285.0
     Max.   :28497   Max.   :27.530   Max.   :1440.0

Some interesting discoveries from this summary:

- Average sedentary time is about 995 minutes, or roughly 16.6 hours. This definitely needs to be reduced!
- The majority of the participants are lightly active.
- On average, participants sleep once a day for about 7 hours.
- Average total steps per day are 6,547, which is below the level associated with health benefits in CDC research. That research found that, compared with taking 4,000 steps per day, taking 8,000 steps per day was associated with a 51% lower risk of all-cause mortality (death from all causes), and taking 12,000 steps per day was associated with a 65% lower risk.

Merging data

Before beginning to visualize the data, I need to merge two data sets. I’m going to merge (inner join) activity and sleep on the columns Id and date (which I created earlier when converting the data to date-time format).

Visualization

    ggplot(data = bellabeat, aes(x = TotalSteps, y = SedentaryMinutes)) + geom_point()

    ggplot(data = sleep_day, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) + geom_point()

    # Caution: by = "Id" alone pairs every sleep record with every activity record
    # of the same participant; the text above describes joining on Id and date.
    combined_data <- merge(sleep_day, bellabeat, by = "Id")
    n_distinct(combined_data$Id)

    # Note that there were more participant Ids in the daily activity
    # dataset that have been filtered out using merge. Consider using a full (outer) join,
    # e.g. merge(..., all = TRUE) or dplyr::full_join(), to keep those in the dataset.
    # Now you can explore some different relationships between activity and sleep as well.
    # For example, do you think participants who sleep more also take more steps or fewer
    # steps per day? Is there a relationship at all? How could these answers help inform
    # the marketing strategy of how you position this new product?
    # This is just one example of how to get started with this data - there are many other
    # files and questions to explore as well!
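The join on Id and date described under Merging data can be sketched as follows. This is a minimal illustration in base R under stated assumptions, not the notebook's actual code: the two toy data frames and all their values are invented, the helper column name `date` is hypothetical, and the sleep timestamps are assumed to look like "4/12/2016 12:00:00 AM". `as.Date()` ignores trailing characters beyond the given format, so the same format string works for both columns.

```r
# Toy stand-ins for the activity and sleep data; all values are made up.
activity <- data.frame(
  Id = c(1, 1, 2, 3),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016"),
  TotalSteps = c(11004, 17609, 5000, 8000)
)
sleep <- data.frame(
  Id = c(1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(420, 390, 450)
)

# Create a shared Date column (hypothetical name 'date') in both data frames.
activity$date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$date    <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

# Joining on Id alone pairs every sleep row with every activity row per participant.
by_id <- merge(sleep, activity, by = "Id")

# Inner join on Id and date: one row per participant-day present in both data sets.
by_id_date <- merge(sleep, activity, by = c("Id", "date"))

# Full (outer) join keeps participant-days that appear in only one data set.
all_rows <- merge(sleep, activity, by = c("Id", "date"), all = TRUE)

print(c(by_id = nrow(by_id), by_id_date = nrow(by_id_date), all_rows = nrow(all_rows)))
```

In this toy data the Id-only merge yields more rows than there are sleep records, which is the duplication to watch for before plotting sleep against steps.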