UN General Debate Speeches (1946–2024): Unsupervised Clustering with Dimension Reduction
This study analyzes United Nations General Debate (UNGD) speeches from 1946 to 2024 to identify latent thematic and geopolitical structures without predefined labels. Using a classic information‑retrieval pipeline of TF‑IDF vectorization, truncated singular value decomposition (latent semantic analysis, LSA) for dimension reduction, and spherical k‑means clustering, we recover coherent clusters that align with major shifts in multilateral discourse. Cluster labels are derived with a class‑based TF‑IDF (c‑TF‑IDF) procedure, and robustness is supported by stability diagnostics across random initializations. The temporal distribution of clusters highlights clear transitions in agenda setting (e.g., post‑war and decolonization, Cold War realignments, development eras, sustainability and health shocks), while country‑level summaries show geographic concentration within clusters. The results demonstrate that simple, transparent linear methods can yield interpretable structure in large political text corpora.
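The pipeline and labeling steps described above can be sketched end to end in plain NumPy. This is a minimal illustration under stated assumptions, not the study's implementation: the tokenizer, the smoothed IDF, the farthest-first centroid seeding, the BERTopic-style c-TF-IDF weighting W = tf · log(1 + A/f) (with A the average word count per cluster), and the adjusted Rand index as the stability diagnostic are all choices made here for illustration. The function names (`tfidf_matrix`, `lsa`, `spherical_kmeans`, `ctfidf_labels`, `adjusted_rand_index`) are hypothetical, and the six toy documents are invented stand-ins, not UNGD speeches.

```python
# Illustrative sketch: helper names and weighting choices are assumptions, not the study's code.
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Deliberately simple tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Documents x terms TF-IDF matrix with L2-normalized rows and smoothed IDF."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(token_lists):
        for t, c in Counter(toks).items():
            tf[i, index[t]] = c
    idf = np.log((1 + len(docs)) / (1 + (tf > 0).sum(axis=0))) + 1
    x = tf * idf
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12), vocab


def lsa(x, n_components):
    """Truncated SVD (LSA): keep the top singular directions, then unit-normalize rows."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    z = u[:, :n_components] * s[:n_components]  # same as x @ V_k (no mean-centering)
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)


def spherical_kmeans(z, k, seed=0, n_iter=100):
    """k-means on unit vectors: assign by cosine similarity, re-normalize centroids."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(z)))]  # random first seed
    while len(chosen) < k:
        # Farthest-first (assumed seeding): next seed is the row least similar to all seeds so far.
        closest = np.max(z @ z[chosen].T, axis=1)
        closest[chosen] = np.inf
        chosen.append(int(np.argmin(closest)))
    centroids = z[chosen].copy()
    labels = np.full(len(z), -1)
    for _ in range(n_iter):
        new_labels = np.argmax(z @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            total = z[labels == c].sum(axis=0)
            if np.linalg.norm(total) > 0:  # an empty cluster keeps its old centroid
                centroids[c] = total / np.linalg.norm(total)
    return labels


def ctfidf_labels(docs, labels, top_n=3):
    """Top terms per cluster via BERTopic-style c-TF-IDF: W[c, t] = tf[c, t] * log(1 + A / f[t])."""
    classes = sorted({int(lab) for lab in labels})
    counts = {c: Counter() for c in classes}
    for doc, lab in zip(docs, labels):
        counts[int(lab)].update(tokenize(doc))  # pool each cluster into one "class document"
    vocab = sorted(set().union(*counts.values()))
    tf = np.array([[counts[c][t] for t in vocab] for c in classes], dtype=float)
    avg_words = tf.sum() / len(classes)  # A: average number of words per class
    w = tf * np.log(1 + avg_words / tf.sum(axis=0))  # f[t]: count of t across all classes
    return {c: [vocab[j] for j in np.argsort(-w[i])[:top_n]] for i, c in enumerate(classes)}


def adjusted_rand_index(a, b):
    """Chance-corrected agreement of two partitions (1.0 = identical up to relabeling)."""
    def pairs(n):
        return n * (n - 1) / 2
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    sum_a, sum_b = pairs(table.sum(axis=1)).sum(), pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs(table).sum() - expected) / (max_index - expected)


# Hypothetical toy "speeches" standing in for UNGD texts.
docs = [
    "decolonization independence self determination colonial peoples",
    "colonial peoples independence decolonization trusteeship",
    "nuclear disarmament arms race missiles",
    "arms control nuclear disarmament treaty",
    "climate change sustainable development emissions",
    "sustainable development climate finance emissions",
]
z = lsa(tfidf_matrix(docs)[0], n_components=3)
labels = spherical_kmeans(z, k=3, seed=0)
stability = adjusted_rand_index(labels, spherical_kmeans(z, k=3, seed=1))
topics = ctfidf_labels(docs, labels)
print(labels, topics, f"ARI across seeds: {stability:.2f}")
```

Because the three toy groups share no vocabulary, both seeds recover the same partition and the adjusted Rand index is 1.0; on real speeches, agreement well below 1 across seeds would signal less stable clusters. In practice the number of LSA components would be far larger than 3; it equals the cluster count here only so the toy output is easy to check.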