RPubs

by RStudio

ParichartP

Parichart Pattarapanitchai

Recently Published

slide-chapter10_regression_analysis

about 15 hours ago

slide-chapter09_data_classification

about 15 hours ago

slide-chapter08_data_clustering

about 15 hours ago

slide-chapter07_dimension_reduction

about 15 hours ago

slide-chapter06_data_preprocessing

about 15 hours ago

slide-chapter05_data_sampling

about 15 hours ago

slide-chapter04_test_of_independence

about 15 hours ago

slide-chapter03_hypothesis_testing

about 15 hours ago

slide-chapter02_data_distribution_probability

about 15 hours ago

slide-chapter01_descriptive_statistics

about 16 hours ago

Chapter 10: Regression Analysis

This chapter introduces the "Mathematical Compass" of Data Science: Regression Analysis. While previous chapters focused on grouping and labeling, students will now learn to predict continuous, numerical outcomes. By mastering the ability to quantify exactly how one variable influences another, students will move from simply describing data to building models that can estimate everything from market trends to housing prices with statistical precision. Core Topics covered: Chapter Overview; Simple Linear Regression; Multiple Linear Regression; Regression Diagnostics; Variable Selection; Regularized Regression; Non-Linear Regression; Regression for Count and Categorical Outcomes. Chapter Lab Activity: Housing Price Regression with Boston Dataset.

about 1 month ago

Statistics for Data Science (229711) - Chapter 9: Data Classification

This chapter shifts the focus to the most popular domain of Supervised Learning: Data Classification. Students will learn how to build models that can "decide" and "predict" categorical labels for new data. From determining whether an email is spam to diagnosing a medical condition, this chapter provides a robust toolkit for making evidence-based predictions by learning from historical patterns. Core Topics covered: Introduction to Classification; Logistic Regression; K-Nearest Neighbors; Decision Trees; Random Forest and Ensemble Methods; Model Evaluation; Model Comparison and Selection. Chapter Lab Activity: Medical Diagnosis Classification with Pima Data.

about 1 month ago

Statistics for Data Science (229711) - Chapter 8: Data Clustering

This chapter introduces the concept of Unsupervised Learning through the lens of Data Clustering. Students will learn how to find "hidden structures" in data without predefined labels, mastering the techniques used to group similar observations together. From identifying customer segments to discovering natural patterns in biology, this chapter provides the tools to make sense of unlabeled datasets by letting the data speak for itself. Core Topics covered: Introduction to Clustering; K-Means Clustering; Hierarchical Clustering; DBSCAN; Cluster Validation; Gaussian Mixture Models; Practical Clustering Workflow. Chapter Lab Activity: Customer Segmentation with wholesales Data.

about 1 month ago

Statistics for Data Science (229711) - Chapter 7: Data Dimension Reduction

This chapter explores the "Art of Information Distillation": Dimension Reduction. Students will learn how to navigate the "Curse of Dimensionality," discovering how to condense massive, complex datasets into their most essential structures. The focus is on finding the "signal" within the "noise," transforming hundreds of variables into a few meaningful dimensions that tell the real story. Core Topics covered: The Curse of Dimensionality; Principal Component Analysis (PCA); Factor Analysis; Linear Discriminant Analysis (LDA); t-SNE; Feature Selection Methods; Evaluating Dimension Reduction. Chapter Lab Activity: Dimension Reduction Pipeline with decathlon2.

about 1 month ago

Statistics for Data Science (229711) - Chapter 6: Data Preprocessing

This chapter dives into the "engine room" of Data Science: Preprocessing. Students will learn that the quality of a model is determined long before it is trained, focusing on the critical steps required to turn messy, real-world data into a "model-ready" format. Core Topics covered: Why Preprocessing Matters; Handling Missing Data; Outlier Detection and Treatment; Data Transformation; Encoding Categorical Variables; Feature Scaling; Data Integration and Reshaping. Chapter Lab Activity: Full Preprocessing Pipeline with msleep.

about 1 month ago

Statistics for Data Science (229711) - Chapter 5: Data Sampling Techniques

This chapter addresses the foundational question of data science: "How do we ensure our data truly represents the world?" It explores the mechanics of selection, the math of sample size, and the power of computational resampling. Core Topics covered: Why Sampling Matters; Probability Sampling Methods; Non-Probability Sampling Methods; Sample Size Determination; Sampling Bias and Common Pitfalls; Bootstrap Resampling; Evaluating Sample Quality. Chapter Lab Activity: Exploring Sampling with nhanes-Style Data.

about 1 month ago

Statistics for Data Science (229711) - Chapter 4: Test of Independence of Variables

This chapter explores the statistical frameworks used to detect and quantify relationships between variables. It moves from testing the independence of categorical factors to measuring the strength and direction of associations in both discrete and continuous data. Core Topics covered: The Concept of Independence; Chi-Square Test of Independence; Fisher’s Exact Test; Cramér’s V and Effect Size for Categorical Association; Correlation Tests; Point-Biserial and Phi Coefficients; Partial Correlation. Chapter Lab Activity: Exploring Independence with the titanic and mtcars Datasets.

about 1 month ago

Statistics for Data Science (229711) - Chapter 3: Hypothesis Testing

This chapter introduces the core engine of statistical decision-making: Hypothesis Testing. It provides a rigorous framework for making inferences about populations based on sample evidence, a critical skill for any Data Scientist. Core Topics covered: The Logic of Hypothesis Testing; One-Sample Tests; Two-Sample Tests; Paired Sample Test; One-Way ANOVA; Non-Parametric Alternatives; Effect Size and Statistical Power. Chapter Lab Activity: Exploring Hypothesis Testing with the ToothGrowth Dataset.

about 1 month ago

Statistics for Data Science (229711) - Chapter 2: Data Distribution and Probability

This chapter serves as the theoretical bridge between descriptive analysis and statistical inference. It introduces the fundamental concepts of probability and explores the mathematical distributions that model real-world data behavior. Core Topics covered: Types of Data and Measurement Scales; Probability Fundamentals; Conditional Probability and Bayes’ Theorem; Discrete Probability Distributions; Continuous Probability Distributions; Sampling Distributions and the Central Limit Theorem; Assessing Normality. Chapter Lab Activity: Exploring Distributions with the airquality Dataset.

about 1 month ago

Statistics for Data Science (229711) - Chapter 1: Descriptive Statistics

This document serves as the introductory chapter for the Statistics for Data Science course at the graduate level. It focuses on the fundamental principles of Exploratory Data Analysis (EDA), shifting the focus from simple computation to critical statistical interpretation. Topics covered: Measures of Central Tendency; Measures of Dispersion; Measures of Shape: Skewness and Kurtosis; Data Visualization for Descriptive Statistics; Multivariate Descriptive Statistics. Chapter Lab Activity: Exploring the mtcars Dataset.

about 1 month ago

208251_LAB5_Nonparametric Statistics

Students are able to 1) perform descriptive statistics and 2) apply appropriate non-parametric statistical tests to answer research questions of interest.

about 2 years ago

208251_LAB4_Nonparametric Statistics

Students are able to 1) perform descriptive statistics and 2) apply appropriate non-parametric statistical tests to answer research questions of interest.

about 2 years ago

208251_LAB3_Model diagnostics

Students are able to use the R language to analyse data using multiple linear regression: 1. perform linear regression analysis; 2. check the normality assumption; 3. check the constant-variance assumption; 4. check the independence (autocorrelation) assumption; 5. deal with invalid model assumptions.

about 2 years ago

208251_LAB1_SimpleLinearRegression

Students are able to use the R language to: 1. perform descriptive statistics; 2. construct a scatterplot between two quantitative variables; 3. perform correlation analysis; 4. perform linear regression analysis and inference on regression parameters; 5. interpret the results.

about 2 years ago

208251_LAB2_MultipleLinearRegression

Students are able to use the R language to analyse data using multiple linear regression: 1. perform descriptive statistics; 2. transform qualitative independent variables into dummy variables; 3. select independent variables; 4. perform linear regression analysis and inference on regression parameters; 5. interpret the results.

about 2 years ago
