Personal Projects

Acupuncture Analysis

Constructed confidence intervals, fit linear regression models, and performed A/B testing on data from a scholarly article that evaluated acupuncture as a treatment for hypertension.

Python
A/B Testing, Confidence Intervals, Linear Regression
Pandas, NumPy, Sklearn, Statsmodels, Matplotlib, Seaborn
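An A/B comparison of a treatment group and a control group can be summarized as a confidence interval for the difference in group means. Below is a minimal standard-library sketch. It assumes two independent groups, a normal approximation, and the unpooled (Welch) standard error. The function name `diff_in_means_ci` and the blood-pressure numbers are hypothetical; they are not the article's data or the project's actual code.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_ci(treatment, control, confidence=0.95):
    """Normal-approximation CI for mean(treatment) - mean(control).

    Uses the unpooled (Welch) standard error. With small samples, a
    t critical value would give a somewhat wider interval.
    """
    diff = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return diff - z * se, diff + z * se


# Hypothetical systolic blood-pressure reductions in mmHg (illustrative only).
acupuncture = [12, 9, 15, 7, 11, 14, 10, 8]
sham = [5, 7, 3, 6, 8, 4, 6, 5]

low, high = diff_in_means_ci(acupuncture, sham)
print(f"95% CI for the difference in means: ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, the difference between the groups is significant at roughly the matching level (5% for a 95% interval). Statsmodels and SciPy offer exact t-based versions of this test.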

Craigslist Tutoring Exploratory Data Analysis

Using data collected by my ETL Craigslist web scraper, I analyzed tutoring prices on Craigslist to better understand my competition as a math tutor. I queried the data from a local PostgreSQL database and visualized it with Plotly Express and Matplotlib.

Python, PostgreSQL
Data Analysis, Visualization
Pandas, Psycopg2, Plotly Express, and Matplotlib

Craigslist Tutoring Rates Dashboard

I built this interactive dashboard with data from my ETL Craigslist Web Scraper project to compare tutoring prices at the national, regional, state, and local levels.

Tableau
Visualization

Datasheet Question Answering (QA) Tool

A tool that uses large language models (LLMs) to answer domain-specific questions and cites pages from official documentation as its sources.

Python, LLMs
Web Scraping, Vectorstores
LangChain, Streamlit
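The page-citation step comes from retrieval: the question is matched against individual documentation pages, and the best-matching pages are passed to the LLM and cited as sources. Below is a minimal standard-library sketch of that retrieval step only. It swaps in plain word-count vectors with cosine similarity instead of the LLM embeddings and vectorstore the project uses, and it does not call an LLM. The function names and the sample datasheet pages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_pages(question, pages, k=2):
    """Return up to k page numbers whose text best matches the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(text))), page) for page, text in pages.items()]
    scored.sort(reverse=True)
    return [page for score, page in scored[:k] if score > 0]


# Hypothetical datasheet pages keyed by page number.
pages = {
    3: "Absolute maximum ratings: supply voltage 5.5 V, operating temperature -40 to 85 C.",
    7: "Pin configuration and pin descriptions for the SPI interface.",
    12: "Supply current versus temperature.",
}

print(retrieve_pages("What is the maximum supply voltage?", pages))
```

In the full tool, the text of the retrieved pages would go into the LLM prompt along with the question, and the page numbers would be reported back as citations.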

ETL Craigslist Web Scraper

I created this extract/transform/load (ETL) tool to understand my competition as a math tutor. It extracts tutoring prices from the services section of Craigslist, cleans and transforms them into the format I need, and loads them into a PostgreSQL database for analysis.

Python, PostgreSQL, HTML, RegEx
Web Scraping, Data Cleaning
Requests, BeautifulSoup, Pandas, NumPy, Psycopg2, and Sklearn
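Below is a minimal sketch of the transform and load stages. It uses only the standard library and does no live scraping. The posts are hypothetical stand-ins for scraped listings. The sketch assumes every stated dollar amount is an hourly rate, and it uses an in-memory SQLite database in place of the project's PostgreSQL database. The names `transform`, `load`, and the `rates` table are illustrative, not the project's actual code.

```python
import re
import sqlite3

# Hypothetical post titles standing in for scraped Craigslist listings.
raw_posts = [
    {"title": "Math tutor - $40/hr, algebra through calculus", "city": "Denver"},
    {"title": "SAT prep $65 per hour", "city": "Boston"},
    {"title": "Experienced physics tutor, rates negotiable", "city": "Austin"},
]

# First dollar amount in the title; assumed here to be an hourly rate.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def transform(post):
    """Return (city, hourly_rate), or None when the post states no price."""
    match = PRICE_RE.search(post["title"])
    if match is None:
        return None
    return post["city"], float(match.group(1))


def load(rows, conn):
    """Create the table if needed and insert the cleaned rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS rates (city TEXT, hourly_rate REAL)")
    conn.executemany("INSERT INTO rates VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
rows = [row for row in map(transform, raw_posts) if row is not None]
load(rows, conn)
print(conn.execute("SELECT city, hourly_rate FROM rates ORDER BY hourly_rate").fetchall())
```

With Psycopg2 and PostgreSQL, the same insert would use `%s` placeholders instead of `?`.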

Pokémon Legendary Classifier

Classification models that predict whether a Pokémon is legendary. The models were trained on Pokémon from Generations 1-6 and tested on the new Pokémon introduced in Generation 7.

Python
Visualization, k-Nearest Neighbors, Logistic Regression, Oversampling and Undersampling Techniques
Pandas, Sklearn, Statsmodels, Matplotlib, Seaborn, Imblearn
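Legendary Pokémon are rare, so the classes are imbalanced. Below is a minimal standard-library sketch of two steps described above: holding out Generation 7 as the test set, and random oversampling of the minority class in the training set only. The rows and field names (`generation`, `total_stats`, `is_legendary`) are hypothetical. The project itself uses imblearn, whose `RandomOverSampler` does the equivalent on arrays.

```python
import random


def split_by_generation(rows, test_generation=7):
    """Train on earlier generations and test on the held-out generation."""
    train = [r for r in rows if r["generation"] < test_generation]
    test = [r for r in rows if r["generation"] == test_generation]
    return train, test


def random_oversample(rows, label="is_legendary", seed=0):
    """Duplicate random minority-class rows until both classes are the same size.

    Assumes both classes are present in rows.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label]]
    negatives = [r for r in rows if not r[label]]
    minority, majority = sorted([positives, negatives], key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# Hypothetical rows for illustration only.
rows = [
    {"generation": 1, "total_stats": 580, "is_legendary": True},
    {"generation": 1, "total_stats": 318, "is_legendary": False},
    {"generation": 2, "total_stats": 405, "is_legendary": False},
    {"generation": 5, "total_stats": 300, "is_legendary": False},
    {"generation": 6, "total_stats": 490, "is_legendary": False},
    {"generation": 7, "total_stats": 570, "is_legendary": True},
    {"generation": 7, "total_stats": 310, "is_legendary": False},
]

train, test = split_by_generation(rows)
balanced = random_oversample(train)
print(len(train), len(test), sum(r["is_legendary"] for r in balanced), len(balanced))
```

Resampling is applied only to the training split, so the Generation 7 test set keeps its real class balance.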

Porting Code from ISLR (2e)

The second edition of the textbook An Introduction to Statistical Learning (ISLR) was released in August 2021 with three chapters that were not in the first edition. This project ports the R code from the labs in those three chapters to Python and answers each chapter's exercises in Python as well.

R and Python
Regression, Neural Networks, Survival Analysis, and Hypothesis Testing
Pandas, NumPy, Sklearn, Statsmodels, SciPy, Matplotlib, TensorFlow/Keras, Patsy, glmnet_python, Survive, Lifelines, scikit-survival

U.S. Accounting Survey Analysis

This project is an exploratory data analysis inspired by my wife's entry into accounting last year. I cleaned and reshaped publicly available survey data to analyze segments of the profession, visualize the results, and identify the factors associated with the highest long-term salaries.

Python, Excel
Data Cleaning, Exploratory Data Analysis, Visualization, Linear Regression
Pandas, NumPy, Matplotlib, Seaborn, SciPy, Sklearn, Statsmodels

U.S. Accounting Viz

A Tableau story that visualizes salary differences between male and female accountants, built with the cleaned data from my U.S. Accounting Survey Analysis project.

Tableau
Visualization