my happy place

After several months of continuous writing, I have published a good number of blog posts, and I feel it is time to organize them in one place. I cannot express how grateful I am to those who viewed, read, clapped for, and responded to my articles. Watching the number of followers…

Data Science, Data Visualization, Programming

my first for-fun data science project

Recently I became obsessed with a Japanese TV show. I found I could not stop checking Twitter, Instagram, and a Chinese app called Douban for updates and discussions about the show. In the meantime, I ran into an introductory article about the Python library Twint, which is very convenient…

part 1: preprocessing text data

It is estimated that 80% of the world’s data is unstructured. Thus, deriving information from unstructured data is an essential part of data analysis. Text mining is the process of deriving valuable insights from unstructured text data, and sentiment analysis is one application of text mining. It is using natural…

A year ago today, I published my first article on Medium. At that time, I had been evacuated because of a California wildfire and had just received a rejection after months of interviewing with my dream company. At my lowest point, I made the best decision: to start writing on Medium, the platform that…

Part 2, A beginner’s guide to the LDA Model

In my previous post, I introduced the clustering algorithm and discussed the K-Means algorithm in detail as the first part of the topic modeling series:

Part 1: A beginner’s guide to K-Means

Part 2: A beginner’s guide to LDA (this article)

Part 3: Use K-Means and LDA for topic modeling…

Part 1: A beginner's guide to K-means

Clustering is one of the most used unsupervised machine learning algorithms. You can think of clustering as putting unorganized data points into different categories so that you can learn more about the structures of your data. Clustering has a variety of applications in extracting information from data without labels. For…
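To make the idea of "putting data points into categories" concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is only an illustration, not the implementation used in the article: the function name `kmeans`, the toy `points`, and the choice to start from the first `k` points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal K-Means sketch on a list of (x, y) tuples."""
    # Assumption for this sketch: start from the first k points so the
    # result is deterministic. Real implementations usually pick random
    # or k-means++ starting centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical toy data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No labels are supplied anywhere; the grouping comes only from the distances between points, which is what makes the method unsupervised.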

conduct reliable causal inference with historical data

A causal relationship is a much stronger relationship between two variables than a correlation. Although it is hard to claim a causal relationship, it gives meaningful insights and informative guidance once proven. In my previous article, I discussed the what, why, and how of causal inference:

As mentioned in the…

Office Hours

Nail the data science interviews with confidence, part 5

In my previous articles on data science interview preparation, I listed technical questions to practice in machine learning, statistics, and probability theory. I also discussed strategies for preparing case study questions before and during data science interviews. This article is the fifth…

Thinking like a Bayesian

You have probably heard about Bayes’ theorem in statistics classes. The magic formula based on conditional probability unlocks a novel way of interpreting the probability of an event. In this article, I will discuss the basics of Bayesian statistics and the difference between the Frequentist and Bayesian approaches in terms…
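As a small illustration of the conditional-probability formula mentioned above, the sketch below applies Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), to a made-up screening test. The function name `posterior` and all three input numbers are hypothetical and chosen only for this example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior               -- P(H), belief in the hypothesis before seeing evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of the evidence, P(E), over both cases.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% base rate, 90% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the updated probability stays low because the prior is small, which is the kind of belief update the Bayesian view emphasizes.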

Making Sense of Big Data

Using seasonality indicators or Fourier analysis

Seasonality, as its name suggests, refers to the seasonal characteristics of time series data. It is the predictable pattern that repeats at a certain frequency within one year, such as weekly, monthly, quarterly, etc. The most straightforward example to demonstrate seasonality is temperature data. …
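The two approaches named in the title can be sketched as feature generators for a regression model. This is a minimal illustration under assumptions: the helper names `seasonal_dummies` and `fourier_terms`, and the monthly period of 12, are invented for this example and are not taken from the article.

```python
import math

def seasonal_dummies(t, period):
    """One-hot indicator of the position of time index t within the cycle."""
    return [1 if t % period == s else 0 for s in range(period)]

def fourier_terms(t, period, order):
    """Sine/cosine pairs sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..order."""
    terms = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

# Hypothetical monthly series: period of 12 observations per year.
print(seasonal_dummies(14, period=12))        # 1 in position 2 (the third month)
print(fourier_terms(0, period=12, order=2))   # [0.0, 1.0, 0.0, 1.0]
```

Indicators need one column per position in the cycle, while Fourier terms need only two columns per harmonic, which matters when the period is long (for example, daily data with a yearly cycle).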

