HARIVARDHINI

HARIVARDHINI V

Recently Published

Exploratory Data Analysis for Next-Word Prediction Using the SwiftKey Text Dataset
This report presents the exploratory analysis completed for the Coursera Data Science Capstone project, which involves building a next-word prediction model using the SwiftKey text corpus (Blogs, News, and Twitter data). The analysis includes: loading and sampling the raw dataset; text cleaning and preprocessing; summary statistics such as line and word counts per source; tokenization and creation of unigram, bigram, and trigram frequency tables; and visualizations of the most frequent words and n-grams. The report also outlines the planned predictive modeling approach, which uses n-grams with a backoff strategy, and the development of an interactive Shiny application that will provide next-word suggestions to users.
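
As a rough illustration of the n-gram frequency tables and backoff strategy mentioned in the summary, here is a minimal Python sketch. It is not the report's code: the function names (tokenize, ngram_counts, predict_next), the tokenization rule, and the four-line sample corpus are all assumptions made for this example, and the backoff shown is the simplest variant (try the longest matching context, then fall back to shorter ones) with no discounting or smoothing.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and keep alphabetic tokens (apostrophes allowed).
    Assumed cleaning rule; the report's preprocessing may differ."""
    return re.findall(r"[a-z']+", line.lower())


def ngram_counts(lines, n):
    """Count every run of n consecutive tokens within each line."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def predict_next(context, tables, k=3):
    """Suggest up to k next words using simple backoff.

    tables maps n -> Counter of n-gram tuples. The highest order is tried
    first; if its context was never seen, the next lower order is used,
    ending with plain unigram frequency.
    """
    tokens = tokenize(context)
    for n in sorted(tables, reverse=True):
        # An n-gram model conditions on the last n-1 tokens of the context.
        prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
        if len(prefix) != n - 1:
            continue  # context too short for this order
        candidates = Counter({
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == prefix
        })
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []


# Hypothetical toy corpus standing in for sampled blog/news/Twitter lines.
sample_lines = [
    "I love data science",
    "I love data analysis",
    "I love coffee",
    "data science is fun",
]
tables = {n: ngram_counts(sample_lines, n) for n in (1, 2, 3)}

print(predict_next("I love", tables))       # trigram context is found
print(predict_next("really love", tables))  # backs off to the bigram table
```

In this toy setup, "I love" is answered from the trigram table, while "really love" has no matching trigram context and falls back to bigrams that start with "love". A production model would normally prune rare n-grams and apply some form of discounting when backing off.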