Recently Published
Spam vs Ham Classification
This project uses the SpamAssassin corpus to build a Naive Bayes classifier that predicts whether emails are spam or ham. After preprocessing and vectorizing the text, the model is trained and evaluated on a test set. The results highlight the challenges of imbalanced text data while demonstrating the full workflow for building a spam detection model in R.
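A minimal sketch of the modeling step, assuming the tm and e1071 packages and an `emails` data frame with `text` and `label` columns; none of this is the report's actual code:

```r
library(tm)     # corpus handling and document-term matrix
library(e1071)  # naiveBayes()

# Assumed input: `emails` data frame with `text` and `label` (spam/ham) columns
corpus <- VCorpus(VectorSource(emails$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))

dtm <- removeSparseTerms(DocumentTermMatrix(corpus), 0.99)

# Naive Bayes tends to work better here on presence/absence than raw counts
X <- apply(as.matrix(dtm), 2, function(x) ifelse(x > 0, "yes", "no"))
y <- factor(emails$label)

train <- sample(nrow(X), floor(0.8 * nrow(X)))   # simple 80/20 split
model <- naiveBayes(X[train, ], y[train])
pred  <- predict(model, X[-train, ])
table(predicted = pred, actual = y[-train])      # confusion matrix
```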
Chess Elo Calculations
This publication applies the Elo expected-score formula to a chess tournament dataset. By estimating each player’s predicted score from their rating and average opponent strength, I calculate expected performance for all seven rounds. I then compare these expectations to actual results, identify overperformers and underperformers, and demonstrate how Elo metrics can be used to evaluate tournament performance.
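For reference, the expected-score formula is E = 1 / (1 + 10^((R_opp - R) / 400)); a small R helper makes the seven-round calculation concrete (the ratings below are made up):

```r
# Elo expected score for a player rated `r` against an opponent rated `r_opp`,
# using the standard logistic curve with the 400-point scale
elo_expected <- function(r, r_opp) 1 / (1 + 10^((r_opp - r) / 400))

elo_expected(1600, 1500)       # ~0.64 expected points per game
7 * elo_expected(1600, 1500)   # ~4.5 expected points over seven rounds
```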
Tidying and Transforming Data (Corrected)
In this assignment, I tidied a flight delay dataset, reshaped it into long format, and compared on-time vs delayed flights. The analysis showed how overall results differed from city-level patterns.
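A minimal sketch of the reshape, assuming tidyr/dplyr and illustrative column names rather than the assignment's actual ones:

```r
library(dplyr)
library(tidyr)

# Assumed wide shape: one row per airline/status, one count column per city
flights_long <- flights_wide %>%
  pivot_longer(-c(airline, status), names_to = "city", values_to = "count")

# City-level delay rates, to contrast with the overall rates
flights_long %>%
  group_by(airline, city) %>%
  summarise(delay_rate = count[status == "delayed"] / sum(count),
            .groups = "drop")
```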
Analyzing Nobel Prize Data Using the Nobel Prize API
This report explores Nobel Prize data retrieved directly from the official Nobel Prize API. Using R, the analysis extracts and transforms JSON data to answer key questions about laureate distribution, average age at award, and global trends across countries and continents.
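A minimal sketch of the retrieval step with jsonlite; the endpoint and flattened field names are written from memory of the public API and should be checked against its documentation:

```r
library(jsonlite)
library(dplyr)

# Nobel Prize API v2.1 laureates endpoint (verify against the current docs)
resp <- fromJSON("https://api.nobelprize.org/2.1/laureates?limit=100",
                 flatten = TRUE)
laureates <- as_tibble(resp$laureates)

# Example question: laureates per birth country (flattened column name assumed)
laureates %>%
  count(`birth.place.country.en`, sort = TRUE) %>%
  head(10)
```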
Scenario Design Analysis: Netflix Recommender System
This report explores Netflix’s recommendation system through the scenario design framework. It examines both user and organizational perspectives to understand how personalization supports engagement and satisfaction. The analysis also includes a brief reverse engineering of Netflix’s hybrid model and recommendations to improve transparency, diversity, and user experience.
Sentiment Analysis with Tidy Data
Using Jane Austen’s novels, this analysis extends examples from Text Mining with R by applying the NRC, Bing, and AFINN lexicons to study sentiment and emotion.
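The core join is short enough to show; this follows the Text Mining with R pattern using the Bing lexicon bundled with tidytext (NRC and AFINN additionally require the textdata package):

```r
library(janeaustenr)  # the novels, one row per line of text
library(tidytext)     # unnest_tokens(), get_sentiments()
library(dplyr)
library(tidyr)

tidy_books <- austen_books() %>%
  unnest_tokens(word, text)           # one word per row

# Net Bing sentiment (positive minus negative) per book
tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(book, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n) %>%
  mutate(net = positive - negative)
```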
New York Times API — Most Emailed Articles Analysis
This report uses the New York Times Most Popular API to retrieve and analyze data on the most emailed articles from the past seven days. The JSON response was parsed and transformed into a clean R data frame, allowing exploration of the most discussed sections and topics trending among readers.
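A minimal sketch of the request, assuming httr/jsonlite and an API key stored in an environment variable; the URL follows the documented Most Popular pattern but is worth double-checking:

```r
library(httr)
library(jsonlite)

url  <- "https://api.nytimes.com/svc/mostpopular/v2/emailed/7.json"
resp <- GET(url, query = list(`api-key` = Sys.getenv("NYT_KEY")))

articles <- fromJSON(content(resp, as = "text"), flatten = TRUE)$results

table(articles$section)   # which sections dominate the most-emailed list
```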
Most Valued Data Science Skills: A Relational Database and R Analysis
This project analyzes Kaggle’s Data Science Job Postings & Skills (2024) dataset using PostgreSQL and RStudio to identify the most in-demand skills in data science. After cleaning and normalizing the data, the results show that Python, SQL, and Machine Learning are the top skills sought by employers in today’s data-driven market.
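A minimal sketch of the database side with DBI/RPostgres; the connection details and the `job_skills` table schema are placeholders, not the project's actual setup:

```r
library(DBI)
library(RPostgres)

con <- dbConnect(Postgres(), dbname = "jobs", host = "localhost",
                 user = "analyst", password = Sys.getenv("PGPASSWORD"))

# Top skills by number of postings, from an assumed normalized job_skills table
top_skills <- dbGetQuery(con, "
  SELECT skill, COUNT(*) AS n_postings
  FROM job_skills
  GROUP BY skill
  ORDER BY n_postings DESC
  LIMIT 10;")

dbDisconnect(con)
```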
Most Valued Data Science Skills
This project explores the most valued data science skills using a Kaggle dataset of over 12,000 LinkedIn job postings. The analysis was completed in R through data cleaning, transformation, and visualization to identify which technical and analytical skills are most in demand. The results show that Python, SQL, Machine Learning, and Communication are among the top abilities sought by employers in 2024.
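A minimal sketch of the skill-counting step in the tidyverse, assuming a `postings` data frame with a comma-separated `job_skills` column:

```r
library(dplyr)
library(tidyr)
library(stringr)

top_skills <- postings %>%
  separate_rows(job_skills, sep = ",\\s*") %>%            # one skill per row
  mutate(job_skills = str_to_title(str_trim(job_skills))) %>%
  count(job_skills, sort = TRUE)

head(top_skills, 10)
```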
Favorite Books: HTML, XML, and JSON Data Representation
This assignment demonstrates how the same dataset can be represented in different formats (HTML, XML, JSON) and how each can be read back into R for comparison. The example uses three personal books to show how structure and purpose vary across formats while maintaining consistent information.
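A minimal sketch of reading each format back into R, with hypothetical file names and the usual parsers (rvest, xml2, jsonlite):

```r
library(rvest)     # read_html(), html_table()
library(xml2)      # read_xml(), xml_find_all()
library(jsonlite)  # fromJSON()

books_html <- read_html("books.html") %>%
  html_element("table") %>%
  html_table()                           # tibble from the HTML table

books_xml  <- read_xml("books.xml")
xml_titles <- xml_text(xml_find_all(books_xml, "//book/title"))

books_json <- fromJSON("books.json")     # often parses straight to a data frame

str(books_html); str(xml_titles); str(books_json)
```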
Tidying and Transforming Travel Agency Price Data
This final part of the project focuses on tidying a travel pricing dataset. Converting wide-format seasonal data and separating combined fields made it easier to analyze and visualize price trends across agencies and service types. The section concludes the full data tidying workflow by emphasizing organization, transformation, and clarity in analysis.
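A minimal sketch of the two tidying moves, with assumed column names such as a combined `agency_service` field and one price column per season:

```r
library(dplyr)
library(tidyr)

prices_tidy <- prices_wide %>%
  separate(agency_service, into = c("agency", "service"), sep = "\\.") %>%
  pivot_longer(c(winter, spring, summer, fall),
               names_to = "season", values_to = "price")
```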
Tidying and Transforming Country GDP and Population Data
In this continuation of the project, I tidied and transformed a wide-format dataset containing population and GDP data for the USA, China, and India from 2000 to 2010. Converting the dataset into long format made it easier to analyze growth patterns and visualize economic trends across countries and years.
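A minimal sketch assuming columns named like `pop_2000` through `gdp_2010`; pivot_longer can split the metric and the year in one pass:

```r
library(dplyr)
library(tidyr)

econ_long <- econ_wide %>%
  pivot_longer(-country,
               names_to = c("metric", "year"), names_sep = "_",
               values_to = "value") %>%
  mutate(year = as.integer(year))
```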
Tidying and Transforming Travel Expense Report Data
In the first part of this project, I worked with an untidy Travel Expense Report dataset to demonstrate how data can be cleaned, structured, and transformed using R. The analysis compared total and average spending across two cities, San Jose and Seattle, and the final visualization highlighted how data tidying enables clear and reliable insights.
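After tidying, the comparison reduces to a grouped summary; a sketch assuming one row per expense with `city` and `amount` columns:

```r
library(dplyr)

expenses %>%
  group_by(city) %>%
  summarise(total_spend = sum(amount),
            avg_spend   = mean(amount),
            .groups = "drop")
```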
Tidying and Transforming Data
In this assignment, I tidied a flight delay dataset, reshaped it into long format, and compared on-time vs delayed flights. The analysis showed how overall results differed from city-level patterns.
Chess Tournament Project
This project takes raw chess tournament data from a text file and transforms it into a clean dataset. I extracted player information, calculated average opponent ratings, and exported the results to a CSV file for further analysis.
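A minimal sketch of the parsing approach in base R; the delimiter and field positions are illustrative guesses at the file's layout, not the project's actual patterns:

```r
lines <- readLines("tournamentinfo.txt")

# Player rows are assumed to start with an ID followed by a pipe delimiter
player_rows <- grep("^\\s*\\d+\\s*\\|", lines, value = TRUE)

fields <- strsplit(player_rows, "\\|")
name   <- trimws(sapply(fields, `[`, 2))
points <- as.numeric(sapply(fields, `[`, 3))

results <- data.frame(name = name, total_points = points)
write.csv(results, "tournament_results.csv", row.names = FALSE)
```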