After several months of continuous writing, I have published a good number of blog posts, and I feel it is time to organize them in one place. I cannot express how grateful I am to those who have viewed, read, clapped for, and responded to my articles. Watching the number of followers grow from zero to three digits has encouraged and inspired me to keep producing meaningful content. This article serves as a list of all my blog posts and will be updated as I write more.
Recently I became obsessed with a Japanese TV show. I could not stop checking Twitter, Instagram, and a Chinese app called Douban for updates and discussions about the show. In the meantime, I ran into an introductory article about the Python library Twint, which is very convenient for gathering Twitter data. Since waiting for the new episode every week is torture, I decided to use the waiting time to explore Twint and derive some insights about the show from Twitter.
This article will discuss the use of twint, including how to install twint, how to…
It is estimated that 80% of the world’s data is unstructured, so deriving information from unstructured data is an essential part of data analysis. Text mining is the process of deriving valuable insights from unstructured text data, and sentiment analysis is one application of text mining. It uses natural language processing and machine learning techniques to understand and classify subjective emotions in text data. In business settings, sentiment analysis is widely used for understanding customer reviews, detecting spam in emails, etc. This article is the first part of the tutorial that introduces the specific techniques used to conduct sentiment…
2020 was probably the worst year for job hunting, especially as a new grad. Being in the same job-hunting pool as many experienced candidates competing for limited job opportunities was a nightmare for me. Luckily, after 7 months of constant job hunting, I finally got an offer I am super excited about and satisfied with. I want to thank myself for always having the strength to pull myself together, even when facing rejection over and over again. How to deal with failure is the most important lesson I learned from 2020. …
In April 2020, I attended a Bootcamp and completed its intensive 8-week data science training program virtually. Looking back now, those two months definitely deserve to be the highlight of my 2020. In this article, I want to share my experience attending the Bootcamp and discuss how I benefited in seven aspects, hoping to give some insights to those considering attending one.
The number of data science Bootcamps is rising as data science positions become more and more popular worldwide. A Bootcamp prepares candidates with data science training and helps them work on a data science project…
Statistical inference is the process of making reasonable guesses about the population's distribution and parameters given the observed data. Conducting hypothesis tests and constructing confidence intervals are two examples of statistical inference. Hypothesis testing is the process of calculating the probability of observing sample statistics at least as extreme as those observed, given that the null hypothesis is true. By comparing this probability (the P-value) with the significance level (ɑ), we make reasonable guesses about the parameters of the population from which the sample is taken. With a similar process, we can calculate a confidence interval at a chosen confidence level (1-ɑ). A confidence interval is an interval estimate for a population…
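To make the two ideas above concrete, here is a minimal sketch of a two-sided one-sample test together with the matching confidence interval. The function name `z_test_and_ci`, the sample values, and the choice of a normal approximation with the sample standard deviation are all assumptions made for illustration; they are not taken from the original article.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test_and_ci(sample, mu0, alpha=0.05):
    """Two-sided one-sample z-test plus a (1 - alpha) confidence interval.

    Assumption for this sketch: the sample mean is approximately normal,
    and the sample standard deviation stands in for the population one.
    """
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)           # standard error of the mean
    z = (x_bar - mu0) / se                 # test statistic under H0: mu = mu0
    std_normal = NormalDist()
    # P-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value for a two-sided interval at confidence level 1 - alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    reject_null = p_value < alpha          # compare P-value with alpha
    return p_value, ci, reject_null


# Hypothetical measurements, made up for illustration only.
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
p_value, ci, reject_null = z_test_and_ci(sample, mu0=5.0)
print(p_value, ci, reject_null)
```

With this construction the test and the interval agree: the null value `mu0` is rejected at level `alpha` exactly when it falls outside the (1 - `alpha`) confidence interval.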
California Governor Gavin Newsom has recently announced new stay-at-home orders to cope with the increased number of confirmed coronavirus cases. Following the curfew order a few weeks ago, residents in the regions of the Bay Area with ICU capacity below 15% are now advised to stay at home until January 4th, 2021. The Bay Area's economic activities are widely affected by the adjusted indoor capacities of restaurants, shopping centers, etc., and by the limits on outdoor activities.
This is not the first time residents in the Bay Area have experienced this. In March 2020, the shelter-in-place orders were issued to cope…
In the various NLP projects I have worked on, I have dealt with text data in English. What should we do when the text data are not in English? This article will discuss how I derive some insights from tweets in foreign languages by analyzing the universal language: Emojis🎈.
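As a rough illustration of the idea, emojis can be pulled out of tweets in any language with a regular expression and then counted. The sketch below is not the article's code: the helper name `count_emojis`, the sample tweets, and the Unicode ranges are assumptions, and the ranges cover only common emoji blocks rather than the full emoji set.

```python
import re
from collections import Counter

# Rough emoji ranges (an assumption for this sketch, not a complete list):
# pictographs and emoticons in U+1F300..U+1FAFF, plus misc symbols/dingbats.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def count_emojis(tweets):
    """Count how often each emoji appears across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts


# Hypothetical tweets, made up for illustration only.
tweets = ["最高🎈🎈", "泣いた😭", "no emoji here"]
print(count_emojis(tweets).most_common())
```

Because the pattern matches single code points, emojis built from several code points (for example, skin-tone or family sequences) would be split into their parts; a dedicated emoji library would handle those more carefully.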
Recently I started a for-fun project analyzing Twitter posts about a Japanese show I am watching. In my previous posts, I discussed using the Twint library to gather all show-related tweets, along with some analysis of the tweets and tweet-related actions such as the number of replies, retweets, and…
As a Ph.D. in Economics, I have devoted myself to finding causal relationships among certain variables while finishing my dissertation. A causal relationship is so powerful that it gives us enough confidence to make decisions, prevent losses, find optimal solutions, and so forth. In this article, I will discuss what causality is, why we need to discover causal relationships, and the common techniques for conducting causal inference.
A causal relationship describes a relationship between two variables such that one causes the other to occur. It is a much stronger relationship than correlation, which only describes the co-movement patterns between…
I have discussed the questions to prepare in machine learning, statistics, and probability theory for data science interviews in my previous articles. In this article, I will discuss the preparation for the case study questions.
During data science interviews, interviewers will sometimes propose a series of business questions and discuss potential solutions using data science techniques. This is a typical example of case study questions during data science interviews. Based on the candidate’s performance, the interviewer can gain a thorough understanding of the candidate’s ability in critical thinking, business intelligence, problem-solving skills with vague business questions, and the practical use…