RPubs

by RStudio

Recently Published

QUARTO PENGUINS - quarto trial

By Ellelouise

First attempt at using quarto software for research methods and data analysis.

over 1 year ago

Statistics and Probability for Data Analytics - Kobe Bryant and the "Hot Hand Theory" Lab 3 -

By AkeemLawrence

Week 5 of Data Science. Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which refutes the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief and showed that successive shots are independent events. This paper started a great controversy that continues to this day, as you can see by Googling "hot hand basketball." For this assignment, we used an open-source dataset of Kobe's shot data from @OpenIntro to focus on the performance of one player: Kobe Bryant of the Los Angeles Lakers. His performance against the Orlando Magic in the 2009 NBA Finals earned him the title Most Valuable Player and many spectators commented on how he appeared to show a hot hand. From my analysis, I debunk the theory of Kobe (RIP) having a hot hand. #RIPKobe #DataScience #hotHandTheory #Probability #independentEvents #GoogleCloud #NBA #Lakers #OrlandoMagic

over 1 year ago

Industry Proximity Network | Sep 29, 2024

By feshbach

# Load required libraries library(tidyverse) library(igraph) library(ggraph) library(colorspace) library(plotly) # Read the data data <- read_csv("Downloads/cgt_county_data_sep_29_2024.csv") # Filter data and calculate RCA data_filtered <- data %>% filter(!grepl("county not reported", county_name, ignore.case = TRUE)) # Create a matrix of industry co-occurrences industry_matrix <- data_filtered %>% filter(M == 1) %>% select(county_geoid, industry_code) %>% table() %>% as.matrix() # Calculate the co-occurrence matrix co_occurrence <- t(industry_matrix) %*% industry_matrix # Calculate the proximity matrix diag_values <- diag(co_occurrence) proximity <- co_occurrence / pmax(rep(diag_values, each = nrow(co_occurrence)), rep(diag_values, times = nrow(co_occurrence))) diag(proximity) <- 0 # Create an edge list from the proximity matrix edge_list <- which(proximity > quantile(proximity[proximity > 0], 0.95), arr.ind = TRUE) edge_df <- data.frame( from = rownames(proximity)[edge_list[,1]], to = colnames(proximity)[edge_list[,2]], weight = proximity[edge_list] ) # Create the graph object g <- graph_from_data_frame(edge_df, directed = FALSE) # Add node attributes V(g)$name <- data_filtered$industry_desc[match(V(g)$name, data_filtered$industry_code)] V(g)$sector <- substr(data_filtered$industry_code[match(V(g)$name, data_filtered$industry_desc)], 1, 2) # Create a color palette for sectors n_sectors <- length(unique(V(g)$sector)) sector_colors <- setNames( rainbow_hcl(n_sectors), sort(unique(V(g)$sector)) ) # Create the ggplot object with tooltips set.seed(42) # For reproducibility p <- ggraph(g, layout = "fr") + geom_edge_link(aes(edge_alpha = weight), show.legend = FALSE, edge_colour = "lightgray") + geom_node_point(aes(color = sector, text = name), size = 3) + # 'text' stores the industry name for tooltips geom_node_text(aes(label = name), repel = TRUE, size = 2, max.overlaps = 10) + scale_edge_alpha(range = c(0.1, 0.5)) + scale_color_manual(values = sector_colors) + theme_graph() + theme(legend.position = "none") + labs(title = "Industry Space: Co-location Based Proximity") # Convert ggplot to an interactive plotly object with hover info p_interactive <- ggplotly(p, tooltip = "text") # 'tooltip = "text"' ensures industry names appear on hover # Display the interactive plot p_interactive # Save the interactive plot as an HTML file htmlwidgets::saveWidget(p_interactive, "industry_space_interactive.html")

over 1 year ago