Data 622 Assignment 2
Introduction
In machine learning, experimentation is the systematic process of designing, executing, and analyzing different configurations to identify the settings that perform best on a given task. Experimentation is learning by doing: you change parameters, evaluate the results with metrics, and compare approaches, refining the model through controlled experiments until you find the best solution.
The key is to change only one or a few variables at a time, so you can isolate the impact of each change and understand its effect on model performance. In this assignment you will conduct at least six (6) experiments. In practice, data scientists run anywhere from a dozen to hundreds of experiments, depending on the dataset and problem domain.
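A minimal sketch of the one-variable-at-a-time idea, written in Python. The baseline settings and the helper names `one_at_a_time` and `changed_keys` are hypothetical and chosen only for illustration. Each generated configuration copies the baseline and changes exactly one setting, so any difference in results can be attributed to that change.

```python
from copy import deepcopy

# Hypothetical baseline configuration; keys and values are illustrative,
# not settings prescribed by the assignment.
BASELINE = {"algorithm": "decision_tree", "max_depth": 5, "test_size": 0.2, "scaling": False}


def one_at_a_time(baseline, changes):
    """Yield (name, config) pairs, each differing from the baseline in one setting."""
    for key, value in changes:
        if key not in baseline:
            raise KeyError(f"unknown setting: {key}")
        config = deepcopy(baseline)  # never mutate the baseline itself
        config[key] = value
        yield f"{key}={value}", config


def changed_keys(baseline, config):
    """Return the names of settings whose values differ from the baseline."""
    return [k for k in baseline if baseline[k] != config[k]]


experiments = list(one_at_a_time(BASELINE, [("max_depth", 3), ("max_depth", 10), ("scaling", True)]))
for name, config in experiments:
    print(name, changed_keys(BASELINE, config))
```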
Assignment
This assignment consists of conducting at least two (2) experiments for each of three algorithms: Decision Trees, Random Forest, and AdaBoost; that is, at least six (6) experiments in total (3 algorithms x 2 experiments each). For each experiment, you will define what you are trying to achieve before the run, conduct the experiment, and afterward review how it went. These experiments will allow you to compare the algorithms and choose the optimal model.
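The 3 x 2 design can be sketched with scikit-learn, assuming a Python workflow. The breast-cancer dataset bundled with scikit-learn stands in for the dataset from the previous assignment. The two settings tried per algorithm (tree depth, number of estimators, learning rate) are illustrative choices, not requirements. The split and random seed are fixed so that only the model settings differ between runs.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; replace with the dataset and preprocessing from the previous assignment.
X, y = load_breast_cancer(return_X_y=True)

# One fixed, stratified split and seed so only the model settings change between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Two illustrative experiments per algorithm: 3 algorithms x 2 experiments = 6 runs.
EXPERIMENTS = {
    "Decision Tree": [
        DecisionTreeClassifier(max_depth=3, random_state=42),
        DecisionTreeClassifier(max_depth=None, random_state=42),
    ],
    "Random Forest": [
        RandomForestClassifier(n_estimators=50, random_state=42),
        RandomForestClassifier(n_estimators=300, random_state=42),
    ],
    "AdaBoost": [
        AdaBoostClassifier(n_estimators=50, random_state=42),
        AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42),
    ],
}

results = []  # (algorithm, experiment number, test accuracy)
for algorithm, models in EXPERIMENTS.items():
    for i, model in enumerate(models, start=1):
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        results.append((algorithm, i, acc))
        print(f"{algorithm} experiment {i}: accuracy = {acc:.3f}")
```

A single holdout split can be noisy on small datasets. The cross-validation strategies listed under Variations give more stable comparisons at the cost of extra training time.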
Using the dataset and EDA from the previous assignment, perform the following:
Algorithm Selection
You will perform experiments using the following algorithms:
Decision Trees
Random Forest
AdaBoost
Experiment
For each of the algorithms (above), perform at least two (2) experiments. In a typical experiment you should:
Define the objective of the experiment (hypothesis)
Decide what will change, and what will stay the same
Select the evaluation metric (what you want to measure)
Perform the experiment
Document the experiment so you can compare results (track progress)
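One way to keep the documentation step consistent is a small, structured log with one record per experiment. The sketch below uses only the Python standard library. The `ExperimentRecord` fields and the `to_csv` and `best` helpers are hypothetical names that mirror the steps above. The hypothesis, what changes, what stays the same, and the metric are written before the run. The result and review are filled in afterward.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """One row of a hypothetical experiment log; field names are illustrative."""
    experiment_id: str
    algorithm: str
    hypothesis: str               # objective, written before the run
    changed: str                  # the setting that varies
    held_constant: str            # settings kept fixed
    metric: str                   # evaluation metric chosen up front
    result: Optional[float] = None  # filled in after the run
    review: str = ""                # post-run notes


def to_csv(records: List[ExperimentRecord]) -> str:
    """Serialize the log to CSV text (None results are written as empty cells)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExperimentRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def best(records: List[ExperimentRecord]) -> Optional[ExperimentRecord]:
    """Return the completed record with the highest result (assumes higher is better)."""
    done = [r for r in records if r.result is not None]
    return max(done, key=lambda r: r.result) if done else None


# Planning stage: the record exists before the run, so result is still None.
log = [
    ExperimentRecord(
        experiment_id="DT-1",
        algorithm="Decision Tree",
        hypothesis="Limiting max_depth reduces overfitting and raises test F1",
        changed="max_depth: None -> 5",
        held_constant="features, train-test split, random_state",
        metric="F1-score",
    ),
]
print(to_csv(log))
```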
Variations
There are many things you can vary between experiments; here are some examples:
Data sampling and feature selection
Data preprocessing, e.g., normalization and scaling, and regularization (constraints on model complexity)
Hyperparameter optimization (your choice: random search, grid search, etc.)
Decision tree breadth and depth (an example of a hyperparameter)
Evaluation metrics, e.g., accuracy, precision, recall, F1-score, AUC-ROC
Cross-validation strategy, e.g., holdout, k-fold, leave-one-out
Number of trees (for ensemble models)
Train-test split, e.g., using different data splits to assess the model's ability to generalize
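Several of these variations can be combined in one run. The sketch below assumes scikit-learn and again uses its bundled breast-cancer data as a stand-in. It tunes a decision tree's depth and minimum leaf size with grid search and scores each candidate by F1 under stratified 5-fold cross-validation on the training portion. It then reports F1 on a held-out test split. The grid values are illustrative. If the grid grows large, `RandomizedSearchCV` can sample the same search space instead of trying every combination.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; replace with the dataset from the previous assignment.
X, y = load_breast_cancer(return_X_y=True)

# Holdout split: the test portion is never seen during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameter grid: tree depth and minimum samples per leaf.
param_grid = {"max_depth": [2, 3, 5, 8, None], "min_samples_leaf": [1, 5, 10]}

# k-fold cross-validation strategy (stratified, shuffled, seeded for repeatability).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",  # evaluation metric used to rank candidates
    cv=cv,
)
search.fit(X_train, y_train)  # refits the best candidate on all training data

print("best params:", search.best_params_)
print(f"mean CV F1: {search.best_score_:.3f}")
print(f"held-out F1: {search.score(X_test, y_test):.3f}")
```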