Recently Published

Consensus Tree of Soviet Authors (Sholokhov's Contemporaries)
Suppose a pointed question of authorship has been raised in our field (for example: is Sholokhov really the author of "And Quiet Flows the Don" ("Тихий Дон"), rather than, say, Fadeyev?). One method that can help answer this question is hierarchical clustering, visualized through consensus trees. The homework assignment was to download a file of stylometric data for "And Quiet Flows the Don" and for novels by other authors, Sholokhov's contemporaries, in order to test the hypothesis above from a stylometric standpoint.
Hand On Mg3 Wasis
Assignment: Hand On Mg3
Project-Data Science Presentations
Complete 5-Slide R Presentation for SwiftKey Capstone
Project-Data Science Capstone
This report presents an exploratory analysis of the three text data sets provided for the SwiftKey Capstone Project: blogs, news, and Twitter. The goal is to understand the basic characteristics of these data sets before building a next-word prediction algorithm. Key findings include: the Twitter data set has the most lines (over 2 million) but the smallest file size; the blogs data set contains the longest individual lines (over 40,000 characters); the word "love" appears about 4 times more frequently than "hate" in the Twitter data; and all three data sets show similar patterns in their word frequency distributions.
Document
Government Funding
Infrastructure Funding: Which States Score Big, and Is It Fair? This analysis investigates whether federal funding under the Infrastructure Investment and Jobs Act (IIJA) is distributed equitably across the 50 U.S. states. Using three datasets (federal funding allocations, U.S. Census population data, and 2020 presidential election results), we examine the relationship between population size, political alignment, and federal investment. The analysis unfolds in three phases. First, we compare the 30 most populous states against the 30 most funded states, identifying exceptions where funding does not follow population. A dumbbell chart quantifies how far each of these outlier states is from its expected share. Second, we shift to a per capita lens, mapping federal funding per person across all 50 states to reveal which residents receive the most and least federal investment regardless of state size. Third, we introduce the political dimension, using choropleth maps, diverging bar charts, scatter plots with regression lines, and a Wilcoxon rank-sum test to determine whether states that voted for Biden in 2020 received preferential treatment in IIJA funding allocations. Key findings include a strong but imperfect correlation between population and total funding, significant per capita disparities favoring smaller states, and a statistical verdict on whether red or blue alignment predicts funding outcomes. Built in R using ggplot2, dplyr, ggtext, ggrepel, and the maps package. Author: Candace Grant | AI Engineer & Data Scientist | Birds and Roses LLC
Analysis and Visualization of Diamonds
This visualization uses data sourced from R, namely the diamonds dataset, which contains information on the cut, color, clarity, and other attributes of the diamonds.
Linh Le - DV Lab HW 6