
TAFADZWA

LOVENESS

Recently Published

Document
This project uses t-SNE purely for dimensionality reduction on the Wine Quality dataset, which consists of physicochemical measurements of red wine samples and the quality ratings assigned by expert tasters. The objective is to explore whether wines with similar quality scores have similar physicochemical characteristics when projected into a two-dimensional space.
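As a rough illustration of the projection described above, the following sketch runs t-SNE on synthetic data with scikit-learn. It is not the project's own code: the random feature matrix, the `quality` array, the sample count, and the parameter values (`perplexity=30`, `init="pca"`) are all assumptions chosen for the example, standing in for the real physicochemical measurements and ratings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the wine data (assumption): 200 samples,
# 11 numeric features, and a hypothetical integer quality score per sample.
X = rng.normal(size=(200, 11))
quality = rng.integers(3, 9, size=200)

# Standardize features so that no single measurement dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Project to two dimensions; parameter values here are illustrative only.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)
```

Coloring a scatter plot of `embedding` by `quality` would then show whether samples with similar scores sit close together in the two-dimensional map.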
Document
This report analyzes the New York City Jobs dataset using Principal Component Analysis (PCA) and K-means clustering. The analysis focuses on numeric features such as the number of positions and salary ranges to identify patterns in job postings. First, the data is cleaned and numeric columns are standardized. PCA is then applied to reduce dimensionality, summarizing the main variation in the dataset into two principal components. The optimal number of clusters is determined using the Elbow and Silhouette methods, followed by K-means clustering on the PCA-reduced data. The resulting clusters are visualized and summarized in a table, providing insights into the distribution of job positions and salary ranges. This workflow allows readers to quickly understand patterns in the NYC Jobs dataset and can serve as a foundation for further analysis, such as investigating salary trends by agency or job type.
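The workflow described above (standardize, reduce with PCA, choose the number of clusters, run K-means) might look roughly like the sketch below. It uses scikit-learn on synthetic data rather than the report's own code or dataset: the three columns (positions, lower salary bound, upper salary bound), the group centers, and the candidate range of 2 to 6 clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): columns are number of positions,
# lower salary bound, and upper salary bound, drawn around three groups.
centers = np.array([
    [1.0, 40_000.0, 60_000.0],
    [5.0, 70_000.0, 100_000.0],
    [2.0, 120_000.0, 180_000.0],
])
X = np.vstack([
    c + rng.normal(scale=[0.5, 3_000.0, 4_000.0], size=(100, 3))
    for c in centers
])

# Standardize the numeric columns, then keep two principal components.
X_scaled = StandardScaler().fit_transform(X)
pcs = PCA(n_components=2).fit_transform(X_scaled)

# Record inertia (for an elbow plot) and silhouette score for each candidate k.
inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pcs)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(pcs, km.labels_)

# Pick the k with the highest silhouette score and fit the final model.
best_k = max(silhouettes, key=silhouettes.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(pcs)

print(best_k, np.bincount(labels))
```

Here the silhouette score makes the final choice of `best_k`, while `inertias` is kept so the elbow can be inspected visually. In practice the two criteria can disagree, and the choice is a judgment call.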