Photo by Ilya Pavlov on Unsplash

Sooner or later, you will meet a data set so big that it no longer fits comfortably in a CSV or TXT file. In retail, for example, you may deal with customer records and sales information, which usually come in large volumes.

These kinds of data are stored and maintained in a database. Simply put, a database stores information in an organized, structured way to preserve its accessibility, security, and integrity. The information in a database is maintained and controlled by a database management system (DBMS). …

Photo by Luke Chesser on Unsplash

As a biologist, I often deal with data. No matter how fluent I am in programming, deciding the steps of my analysis pipeline is always a matter of trial and error. It sometimes requires consulting the literature and discussing with fellow researchers. My analysis pipeline always starts with exploratory data analysis (EDA). EDA helps me investigate what is happening in my data and detect critical points that can be important for further analysis.

By this I mean that EDA is the first step of my analysis. However, that does not necessarily mean that EDA is the initial step of…

Photo by Imat Bagja Gumilar on Unsplash

I started learning data science as an environmentalist. Statistics was, and always will be, my go-to tool for organizing data to solve real-life problems.

I studied a branch of environmental science that hardly anyone thinks of as their first choice when entering university: forest and agricultural science. It is an interdisciplinary subject, because I could focus not only on forests but also on plant physiology, genetics, ecology and landscape science, environmental science, epidemiology, and many more. …

Photo by Dose Media on Unsplash

Correlation is one of the all-time classics of statistics, yet it remains a measure that almost everyone uses in their analysis. In its classic interpretation, correlation measures the relationship or correspondence between two variables. It is usually visualized with a correlation plot and quantified by the correlation coefficient (r), which ranges from -1 to 1. The important takeaway is that r shows the strength and direction of the relationship between two variables: how closely a change in one variable is associated with a change in the other.
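To make the coefficient described above concrete, here is a minimal sketch that computes Pearson's r by hand in plain Python. It assumes that "correlation coefficient" refers to Pearson's r, which the text does not state explicitly. The function name `pearson_r` and the tree measurements are made up for illustration and are not real data.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements: tree height (m) and trunk diameter (cm)
heights = [5.0, 7.5, 10.0, 12.5, 15.0]
diameters = [10.0, 15.0, 20.0, 25.0, 30.0]

r_up = pearson_r(heights, diameters)                    # variables rise together
r_down = pearson_r(heights, list(reversed(diameters)))  # one rises, the other falls
print(r_up, r_down)
```

Because the made-up diameters are an exact linear function of the heights, the two results sit at the extremes of the range: r is 1 when both variables rise together and -1 when one falls as the other rises. Real data will usually land somewhere in between.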

While interpreting the correlation plot is widely known and quite straightforward…

Why not start considering the median and paying more attention to the standard deviation?

This cover picture may not be related to the story post, but I want to use this space for the pictures I took.

Disclaimer: This post reflects my thinking about the statistical measures that I believe deserve more attention. I am not an expert in statistics, and in this article I'm just sharing my opinion.

I first worked with statistics when I wrote my bachelor's thesis, but looking back at the manuscript, I can see that I lacked a basic foundation in statistics. …

Firza Riany

Hi! I like to confuse people: I use data to study forests

