2020-11-08

Data Representation in NumPy

12 mins read The NumPy package is the workhorse of data analysis, machine learning, and scientific computing in the Python ecosystem. It vastly simplifies manipulating […]
2020-07-24

Image classification example with Gradio and Keras

12 mins read Image classification is a subset of machine learning that categorizes a group of images into labeled classes. We train an […]
2020-07-15

What is the Bias-Variance Trade-off?

9 mins read Whenever you are using a statistical, econometric, or machine learning model, no matter how simple the model is, you should […]
2020-07-13

Common loss functions for training deep neural networks in PyTorch

17 mins read Neural networks can do a lot of different tasks. Whether it’s classifying data, like grouping pictures of animals into cats […]
2020-07-13

Illustrated calculation of cross-entropy for binary, multi-class, and multi-label classification

8 mins read Cross-entropy is a commonly used loss function for classification tasks. Let’s see why and where to use it. We’ll start with […]
2020-07-12

A complete tutorial on evaluation metrics for imbalanced classification

38 mins read A classifier is only as good as the metric used to evaluate it. If you choose the wrong metric to […]
2020-07-01

Exploratory Data Analysis (EDA) example: Road safety dataset case study

20 mins read Getting a good feeling about a new dataset is not always easy and takes time. However, a good and broad […]
2020-06-24

Pandas data selection using .loc and .iloc

8 mins read When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, […]
2020-05-31

Understanding hypothesis testing with Covid-19 case study (Z-test and t-test)

13 mins read The coronavirus pandemic has made a statistician out of us all. We are constantly checking the numbers, making our […]
2020-04-25

Styling Pandas DataFrames using Style API

10 mins read Python’s Pandas library allows you to present tabular data in a similar way as Excel. What’s not so similar is […]
2020-04-20

Understanding the probabilistic interpretation of linear regression

6 mins read Linear regression is about finding a linear model that best fits a given dataset. For example, in a simple linear […]
2020-03-19

Understanding Beta Distribution

9 mins read When to use the Beta distribution: The Beta distribution is a probability distribution on probabilities. For example, we can use it to model […]
2020-03-13

The intuition behind Shapley Values

10 mins read The first time I heard about Shapley values was when I was reading up on model interpretability. I came across […]
2020-02-21

Walkthrough of an exploratory analysis for classification problems

20 mins read In this post, I’ll outline how to perform an exploratory analysis for a binary classification problem. I am going to […]
2020-02-05

Dealing with imbalanced data in machine learning

8 mins read Imbalanced classes are a common problem in machine learning classification where there is a disproportionate ratio of observations in each […]
2020-02-03

List of useful tutorials for Exploratory Data Analysis (EDA)

< 1 min https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15 https://medium.com/omarelgabrys-blog/statistics-probability-exploratory-data-analysis-714f361b43d1 https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python https://www.kaggle.com/dvigneshwer/kernele7f4dbb964/notebook Visualizing the distribution of a dataset — seaborn 0.10.0 documentation https://seaborn.pydata.org/tutorial/distributions.html https://www.kaggle.com/kashnitsky/topic-1-exploratory-data-analysis-with-pandas https://iq.opengenus.org/exploratory-data-analysis-python/ Plotting with categorical data […]
2020-02-03

Types of Data & Measurement Scales: Nominal, Ordinal, Interval and Ratio

6 mins read There are four measurement scales: nominal, ordinal, interval, and ratio. These are simply ways to categorize different types of variables […]
2019-11-27

How to split data in decision tree nodes?

17 mins read The problem: We need to recommend apps to users according to what they’re likely to download. Recommendation systems are one […]
2019-11-14

Machine Learning From Scratch Series: Gradient Descent

9 mins read Gradient Descent is an iterative algorithm that is used to minimize a function by finding the optimal parameters. Gradient Descent can […]
2019-11-13

Logistic Regression Implementation From Scratch in Python

4 mins read The objective of this tutorial is to implement our own Logistic Regression from scratch. This is going to be different […]