Analytics Portfolio

Download .zip Download .tar.gz View on GitHub


Hello! Thanks for visiting my webpage. My name is Michelle Tat, and I am currently a principal data scientist at the City of Boston. I encourage you to browse my GitHub repositories (click the “View on GitHub” button above, or click here). I’ve started a few fun projects lately, which include:

  • A script exploring cryptocurrency forecasting (with the use of autoregressive models)
  • A module that will allow individuals to easily access Boston 311 data for analysis

Feel free to visit my LinkedIn profile, or the sites for my Insight Health Data Science projects, Happy Helper and PubMed Topic Modeler, if you would like to know more about me.

Any and all work-related code can be seen at the City of Boston GitHub. Unfortunately, many of the analytics and data science projects I am working on are in private repos. They will become public as our team completes those projects.

I also have a blog, where I recently wrote an introductory tutorial on Random Forest in R. Feel free to check it out!

PubMed Topic Modeler

  • This project takes a keyword search for PubMed, and uses natural language processing to clean and run topic modeling on the retrieved text.

  • Notably, this project was an exercise in implementing more computer science principles, such as object-oriented programming, unit testing, and the use of version control on GitHub.

Happy Helper Notebooks and Scripts

  • Notebooks and scripts covering my development process for Happy Helper. Happy Helper was my initial Insight project, which I demoed at various Boston-area companies.

  • This project pulls 90,000 Reddit comments from Google BigQuery (specifically from the subreddits /r/anxiety and /r/depression), cleans them up, and runs a classification analysis on the text using various models (e.g., Support Vector Machines, Naive Bayes).

  • A web app came out of this, where a user could input a chunk of text and have it classified as being similar to anxious or depressive text. Further, the user would be referred to links for an appropriate Reddit support forum, where the user input is matched on similarity with posts in that forum. Check out Happy Helper for an example.

  • The web app was developed using the Flask framework, and deployed on Amazon EC2.
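
For readers curious what the classification step above looks like in miniature, here is a rough sketch of a Naive Bayes text classifier written from scratch in plain Python. It is not the Happy Helper code: the class name, the tokenizer, and the four training sentences are all invented for illustration, and the real project presumably relied on a machine learning library and far more data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a stand-in cleaning step)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example (not taken from the project).
train_texts = [
    "I keep worrying about everything and my heart races",
    "I feel panic before every meeting and cannot stop worrying",
    "I feel empty and have no energy to get out of bed",
    "nothing feels enjoyable anymore and I feel hopeless",
]
train_labels = ["anxiety", "anxiety", "depression", "depression"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("my heart races and I keep worrying"))
```

With only four training sentences this is purely a demonstration of the mechanics; a realistic version would need a held-out test set and a comparison against the other models mentioned above.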

Other Fun Stuff

Boston 311 Request RShiny Example Dashboard

  • This was an exploratory project, where I learned to put together a basic dashboard in RShiny, using the Boston 311 Requests dataset.

Transgender Natural Language Project

  • This is an ongoing project where I use natural language processing (e.g., frequency analyses, document classification, word embeddings) to examine the concerns of the transgender community. You can read more about it on my blog. This project is written in Python.

Updated 9/23/2017

  • Cleaned up webpage and personal repos.