Recently Published

UN General Debate Speeches (1946–2024): Unsupervised Clustering with Dimension Reduction
This study analyzes United Nations General Debate (UNGD) speeches from 1946 to 2024 to identify latent thematic and geopolitical structures without predefined labels. Using a classic information‑retrieval pipeline—TF‑IDF vectorization, truncated singular value decomposition (LSA) for dimension reduction, and spherical k‑means clustering—we recover coherent clusters that align with major shifts in multilateral discourse. Cluster labels are derived with a class‑based TF‑IDF (c‑TF‑IDF) procedure, and robustness is supported by stability diagnostics across random initializations. The temporal distribution of clusters highlights clear transitions in agenda setting (e.g., post‑war/decolonization, Cold War realignments, development eras, sustainability and health shocks), while country‑level summaries show geographic concentration within clusters. The results demonstrate that simple, transparent linear methods can yield interpretable structure in large political text corpora.
JAMBU ANALYSIS (IN VITRO)
Two Jambu varieties under two temperatures
Pitch Deck
Probabilistic Analysis of Oil Well Operators in Brazil
Statistical study of oil wells in Brazil that classifies their operators into strategic categories (State-Owned, National, and Foreign). Includes frequency tables, graphical visualization, and an empirical probability model for data inference.
Project 2 PCA
PCA
Project 1 Clustering
Clustering
Association Rules for Business Trust Risk Signals (The InBillo Project)
This project applies association rule mining to a subset of the InBillo dataset to find interpretable combinations of business characteristics associated with low customer trust. The analysis focuses on non-score attributes such as firm age, size, legal form, financial transparency (debts), and online presence, and uses the Apriori algorithm to extract recurring patterns.
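As a rough illustration of the kind of analysis this abstract describes, the sketch below runs a small pure-Python Apriori pass and then lists rules whose consequent is a low-trust flag. It is not the project's code. The item names (`no_website`, `has_debts`, `low_trust`, and so on), the six toy records, and the thresholds are all hypothetical and do not come from the InBillo dataset.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support (Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rules_for(frequent, consequent, min_confidence):
    """List rules antecedent -> {consequent} as (antecedent, confidence, lift)."""
    target = frozenset([consequent])
    rules = []
    for itemset, support in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - target
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                lift = confidence / frequent[target]
                rules.append((sorted(antecedent), confidence, lift))
    return sorted(rules, key=lambda r: (-r[1], r[0]))

# Hypothetical toy records; each set holds the attributes of one business.
records = [
    {"young_firm", "no_website", "has_debts", "low_trust"},
    {"young_firm", "no_website", "low_trust"},
    {"young_firm", "has_website"},
    {"old_firm", "has_website"},
    {"old_firm", "no_website", "has_debts", "low_trust"},
    {"old_firm", "has_website", "has_debts"},
]

frequent = frequent_itemsets(records, min_support=0.3)
rules = rules_for(frequent, "low_trust", min_confidence=0.8)
for antecedent, confidence, lift in rules:
    print(antecedent, "-> low_trust", round(confidence, 2), round(lift, 2))
```

Restricting the consequent to the trust flag keeps the rule list short and readable. Lift is reported next to confidence because a high-confidence rule is only informative when the consequent is not already common in the data.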
Global Development Patterns via PCA and Clustering
This report applies unsupervised learning methods to World Development Indicators (WDI) data to explore latent global development structures. Principal Component Analysis (PCA) is used to reduce dimensionality and identify interpretable development dimensions, followed by hierarchical clustering in the reduced space to derive stable country groups. The analysis emphasizes methodological justification, validation, and interpretability.
Reporting Flexdashboard by Prof. Dr. Solym Manou-Abi
Reporting Flexdashboard – Analysis of NYC 2013 Flights