As a biologist, I often deal with data. No matter how fluent I am in programming, deciding the steps of my analysis pipeline is always a matter of trial and error. It sometimes requires consulting the literature and discussing with fellow researchers. My analysis pipeline always starts with Exploratory Data Analysis (EDA). EDA helps me investigate what is happening in my data and detect critical points that can be important for further analysis.

By this I mean that EDA is the first step of *my analysis*. That does not necessarily make EDA the initial step of data analysis in general. EDA is an approach to analysing data that summarises the data's main characteristics, often with graphical illustrations. There is a separate term for the initial step of data analysis: Initial Data Analysis (IDA). IDA focuses on checking the assumptions for model fitting, handling missing values, and transforming variables. …

I started learning data science as an environmentalist. Statistics was, and always will be, my go-to tool for organizing data to solve real-life problems.

I studied a branch of environmental science that hardly anyone considers as their first choice when entering university: forest and agricultural science. It is an interdisciplinary subject, so I could focus not only on forests, but also on plant physiology, genetics, ecology and landscape science, environmental science, epidemiology, and many more. …

Correlation is one of statistics’ all-time classics, yet it is still a measure that everyone uses in their analysis process. In the classic interpretation, correlation is a measure of the relationship or correspondence between two variables. It is usually visualized through a correlation plot and measured using the correlation coefficient (r), which ranges from -1 to 1. The important takeaway from r is that it shows the degree of relationship between two variables: how a change in one variable is associated with a change in the other.
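To make the coefficient concrete, here is a minimal sketch that computes r directly from deviations around the mean. It assumes r here means Pearson's correlation coefficient, which the text above does not state explicitly. The function name `pearson_r` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from its definition (assumed to be the r discussed above)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its variable's mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of products of paired deviations. Its sign gives the sign of r.
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    # Denominator: product of the root sums of squared deviations, never negative.
    scale = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if scale == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_deviation / scale


# Hypothetical measurements, made up for this example.
height = [1.0, 2.0, 3.0, 4.0, 5.0]
growth = [2.1, 3.9, 6.2, 8.1, 9.8]   # rises with height
shade = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls as height rises

r_pos = pearson_r(height, growth)
r_neg = pearson_r(height, shade)
print(f"height vs growth: r = {r_pos:.3f}")
print(f"height vs shade:  r = {r_neg:.3f}")
```

In this sketch the sign of r comes entirely from the numerator: when both variables tend to sit on the same side of their means at the same time, the products are mostly positive, and when they sit on opposite sides, the products are mostly negative. The denominator only rescales the result so that it cannot fall outside -1 to 1.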

While interpreting the correlation plot is widely known and quite straightforward, an intuitive understanding of the equation behind the correlation coefficient (r) is less common. That is what this article is about: why the result lies between -1 and 1, and where the sign comes from. …

*Why not start considering the median and paying more attention to the standard deviation?*

*Disclaimer: This post describes my thinking about the statistical measures that I believe deserve more attention. I am not an expert in statistics, and in this article I’m just sharing my opinion.*

I first started working with statistics when I wrote my bachelor’s thesis, but looking back at the manuscript, I can see that I lacked a basic foundation in statistics. …
