Intro to Statistical Learning 2nd Edition (ISLR2e): Porting R to Python
R and Python
Bootstrap Sampling, Cross Validation, Regression, Neural Networks, Survival Analysis, and Hypothesis Testing
Background
To learn more about machine learning algorithms, I've been working through the book Introduction to Statistical Learning (Second Edition), commonly referred to as ISLR (2e), pictured below:

![](images/ISLR_Python/islr_2e_image.png)
The labs are performed in R, and the exercises in the book are meant to be as well. However, everything can be done in Python if you know the right libraries and how to use them. There are several great GitHub repos out there where people have performed most of the labs or exercises from this book in Python, but not all labs and exercises have been ported over yet.
ISLR2e contains several chapters that weren't present in the first edition, and at the time of this writing, the second edition had only been out for a little over a year (published in August 2021). Consequently, I haven't been able to find any Python resources for the labs or exercises in these new chapters: Deep Learning (Chapter 10), Survival Analysis and Censored Data (Chapter 11), and Multiple Testing (Chapter 13).
To fill the void, I've ported the R code from the labs in these new chapters to Python and used Python to answer each chapter's exercises; both are linked below. I'll also include any code I write for labs or exercises from other chapters in the textbook.
Update (June 2023): The authors of the textbook released a new version in which all labs are performed in Python. Unfortunately, that renders this project moot; I had hoped my contributions would stay useful for longer. Regardless, the project significantly leveled up my R, Python, and ML skills.
Code:
GitHub repo found here

- Chapter 5 - Resampling Methods: Pandas, NumPy, Sklearn, Statsmodels, Scipy, Rpy2, Matplotlib
- Chapter 10 - Deep Learning: Pandas, NumPy, Sklearn, Statsmodels, Matplotlib, Tensorflow/Keras, Patsy, glmnet_python
- Chapter 11 - Survival Analysis and Censored Data: Pandas, NumPy, Survive, Lifelines, Rpy2, Scikit-survival, Matplotlib
- Chapter 13 - Multiple Testing: Pandas, NumPy, Statsmodels, Scipy, Rpy2, Matplotlib
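
To give a flavor of what such a port involves, here is a minimal sketch of the bootstrap idea behind the Chapter 5 material: resample the data with replacement many times and use the spread of the recomputed statistic as a standard error estimate. This is not code from the repo. The function name `bootstrap_se`, its parameters, and the sample data are all hypothetical, and it sticks to the Python standard library so it runs without installing anything (a NumPy version would look much the same).

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling `data` with replacement.

    Hypothetical helper for illustration only; not taken from the book or the repo.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement, much like sample(x, replace = TRUE) in R
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    # The standard deviation of the bootstrap estimates approximates the standard error
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Made-up data for illustration, not a dataset from the book
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
    se = bootstrap_se(sample, statistics.mean)
    print(f"Bootstrap SE of the mean: {se:.3f}")
```

Passing the statistic in as a function loosely mirrors how R's `boot()` takes a statistic function, which keeps the resampling loop reusable for medians, regression coefficients, and so on.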
References:
James, G., Witten, D., Hastie, T., Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R, Second Edition, Springer Science+Business Media, New York. https://www.statlearning.com/