Recently Published

Text Mining Project: EDA and Prediction Plan
This project demonstrates the end-to-end development of a data-driven application using R and Shiny. The goal is to build an interactive Next Word Predictor powered by an N-gram language model with a backoff strategy. The application processes text data from the Reuters crude oil dataset, cleans and tokenizes it, and constructs unigrams, bigrams, and trigrams to predict the most likely next word in a user-provided phrase. The model prioritizes trigram matches for context, falls back to bigrams when necessary, and defaults to unigrams for general predictions. To quantify uncertainty, the app calculates entropy, providing users with a measure of prediction confidence. The Shiny interface allows users to input text, view top predictions, and explore visualizations such as word frequency charts, bigram and trigram plots, and word clouds.
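The backoff logic described above can be sketched in a few lines. This is a minimal illustration written in Python, not the project's R/Shiny code. It makes several assumptions: the toy corpus and the names `tokenize`, `build_model`, and `predict` are hypothetical, the backoff is the simplest possible kind (use the longest matching context, with no discounting or smoothing), and "entropy" is taken to mean Shannon entropy in bits over the candidate next words at the matched level.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; the project itself uses the Reuters crude oil dataset.
TOY_CORPUS = [
    "oil prices rose sharply",
    "oil prices fell sharply",
    "oil prices rose again",
    "crude oil output increased",
]

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())

def build_model(sentences):
    # Count unigrams, and next-word counts keyed by 1-word and 2-word contexts.
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    trigrams = defaultdict(Counter)
    for sentence in sentences:
        toks = tokenize(sentence)
        unigrams.update(toks)
        for a, b in zip(toks, toks[1:]):
            bigrams[a][b] += 1
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            trigrams[(a, b)][c] += 1
    return unigrams, bigrams, trigrams

def predict(model, phrase, k=3):
    # Try the trigram context first, then the bigram context, then unigrams.
    unigrams, bigrams, trigrams = model
    toks = tokenize(phrase)
    if len(toks) >= 2 and tuple(toks[-2:]) in trigrams:
        counts, level = trigrams[tuple(toks[-2:])], "trigram"
    elif toks and toks[-1] in bigrams:
        counts, level = bigrams[toks[-1]], "bigram"
    else:
        counts, level = unigrams, "unigram"
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    # Shannon entropy in bits: higher values mean a less certain prediction.
    entropy = -sum(p * math.log2(p) for p in probs)
    top = [word for word, _ in counts.most_common(k)]
    return top, level, entropy

model = build_model(TOY_CORPUS)
print(predict(model, "the oil prices"))  # matched at the trigram level
print(predict(model, "heating oil"))     # falls back to the bigram level
print(predict(model, "zebra"))           # falls back to unigrams
```

An entropy near zero means one continuation dominates the candidates, while larger values indicate the probability mass is spread across many words. A production model would usually add smoothing or discounting, which this sketch leaves out.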
Palmers Penguins Data
DSLabs
New York Times API — Most Emailed Articles Analysis
This report uses the New York Times Most Popular API to retrieve and analyze the most emailed articles from the past seven days. The JSON response was parsed and transformed into a clean R data frame, making it possible to explore which sections and topics are trending among readers.
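The parse-and-tabulate step might look roughly like the following. This is a hedged Python sketch, not the report's R code. The payload is a made-up placeholder rather than real API output, the field names `results`, `title`, `section`, and `published_date` are assumptions about the response shape, and `to_rows` and `section_counts` are hypothetical helper names.

```python
import json
from collections import Counter

# Placeholder payload for illustration only; not real API output.
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "num_results": 3,
  "results": [
    {"title": "Placeholder headline A", "section": "Health", "published_date": "2000-01-01"},
    {"title": "Placeholder headline B", "section": "Opinion", "published_date": "2000-01-02"},
    {"title": "Placeholder headline C", "section": "Health", "published_date": "2000-01-03"}
  ]
}
"""

def to_rows(payload, fields=("title", "section", "published_date")):
    # Flatten each article into a dict holding only the selected fields.
    data = json.loads(payload)
    return [{f: item.get(f) for f in fields} for item in data.get("results", [])]

def section_counts(rows):
    # Count articles per section, most frequent first.
    return Counter(row["section"] for row in rows).most_common()

rows = to_rows(SAMPLE_RESPONSE)
print(section_counts(rows))
```

Keeping only a few named fields per article gives a flat, table-like structure that is easy to group and count, which is the same role the data frame plays in the report.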
Document
DSLabs
Tidyverse_Vignette
Using one or more TidyVerse packages and any dataset from fivethirtyeight.com, Kaggle, or another source of your choosing, create a programming sample “vignette” that demonstrates how to use one or more capabilities of the selected TidyVerse package with your selected dataset.
DATA 624 - Project 1