tiangenglu

Tiangeng Lu

Recently Published

aws s3
Access data in an AWS S3 bucket from R.
readxl-dplyr
A few things to note when reading in .xlsx data.
connect-postgresql
Testing an R connection to in-memory SQL databases using `RPostgres`. The package gets around the error message "SCRAM authentication requires libpq version 10 or above".
Two-mode Network Visualization
Visualize two-mode (m*n) networks using `igraph`
Skilled Migration Literature Review
A systematic literature review of skilled migration that covers scholarly publications between 1957 and 2023.
World Location List
A uniform world location list of countries and cities, merged with U.S. Census ACS country codes. This is ongoing work.
Twitter Data Construction and Analyses
After the raw Twitter data have been downloaded, cleaned, reshaped, and saved, it is time to construct datasets for different analyses. This tutorial covers exploratory/descriptive longitudinal patterns in Twitter data, mention-network attributes, construction of the hashtag co-occurrence adjacency matrix, and hashtag-network visualization. Specifically, it demonstrates some complex data manipulation tricks such as reshaping, nested loops, and customizing network visualization parameters.
AcademictwitteR Referenced Tweets Download
This tutorial documents collecting referenced tweets by their IDs.
academictwitteR User Download
Download and process (flatten) Twitter user data.
academictwitteR
This is a demonstration of downloading and cleaning non-retweet Twitter data using the academic research API. I’m glad to share more best practices and lessons learned in Twitter data collection and analysis.
Twitter (Retweet) Network Gini Coefficient Calculation
This tutorial demonstrates how to use the Gini coefficient to describe and measure large, weighted social networks (e.g., social media data). It then reduces the large network, using the Lorenz curve as a reference, for better visualization.
Data Cleaning .pdf(.txt) to .csv
This is the second tutorial in my R administrative records data cleaning series. The first tutorial is available here. The raw data were downloaded from the United States Department of State, Bureau of Consular Affairs.
PDF Administrative Records Data Cleaning
This is a demonstration of cleaning Administrative Records (AR) downloaded from the United States Department of State, Bureau of Consular Affairs. The raw data were stored in 67 .txt files. Each .txt file contains the monthly non-immigrant visa issuances by nationality and visa class.
datacamp Unsupervised Learning
Replications of the R code from the DataCamp Unsupervised Learning course, with notes.
Support Vector Machines
This is an optional assignment on Classification: Nonparametric Methods, Support Vector Machines, for STAT 508 Applied Data Mining and Statistical Learning, Pennsylvania State University. The course instructor is Dr. Lingzhou Xue, associate professor of statistics at Penn State. This demonstration is replicable without local datasets. Packages used include ggplot2, e1071, and dplyr.
How to Read SPSS Files in R
This tutorial shows how to: (1) import .sav SPSS files into R; (2) read variable attributes from the SPSS “variable view” table; (3) organize the variable coding into data frames; (4) fix reverse coding in survey data; and (5) run best-subset regression.
Trends of Legislative Status 2011-2020
The legislative progress of all immigration bills introduced between the 112th and the 116th Congresses
Hashtag Networks S386 Passed Senate
S.386 passed the Senate on 12/02/2020.
Hashtag Networks
S.386 Debate - 7/21/20
Hashtag Co-Occurrence Networks
H.R. 1044 passed/agreed to in the House on 07/10/2019.
Hashtag Co-Occurrence Networks
S.386 passed the Senate on 12/02/2020.
R Web Scraping Demonstration
Scraping a table with embedded URLs, and then extracting texts from these URLs.
Matching Multiple Patterns & N-grams Text Mining Using R base
This tutorial shows how to extract frequency tables of individual words and n-grams. I mostly rely on base R commands. The data come from an ongoing research project, a systematic literature review of hundreds of abstracts in public policy & public administration.