2022-03-24

When to avoid using Random Forest Regression?

8 mins read In this article, we’ll look at a major problem with using Random Forest for regression: extrapolation. Random Forest Regression […]
2022-03-23

A comprehensive tutorial on Transformers Architecture

43 mins read We’ve been hearing a lot about Transformers and with good reason. They have taken the world of NLP by storm […]
2022-03-22

Categorical data type in Pandas

8 mins read You may have categorical data in your dataset. A categorical variable takes one of two or more categories. If […]
2022-03-22

NumPy Broadcasting tutorial

13 mins read In operations between NumPy arrays (ndarray), the shapes are automatically made compatible by broadcasting. This article describes the following […]
2022-03-22

PySpark equivalent methods for Pandas dataframes

8 mins read Pandas is the go-to library for every data scientist. It is essential for every person who wishes to manipulate data […]
2022-03-17

Methods for sampling from complex distributions

8 mins read This writeup includes descriptions from a recent paper on algorithmic sampling, to describe in simpler terms the motivation and approach for […]
2022-03-11

A tutorial on Bayesian Statistics and Bayesian Machine Learning basics with Python Code

31 mins read Introduction: Conditional probability and Bayes’ theorem are fundamental ideas in statistics that even laymen have heard of. Bayes’ theorem also […]
2022-03-08

Review of intuitions behind the recent advances in NLP: From RNNs to Transformers and BERT

48 mins read Few areas of AI are more exciting than NLP right now. In recent years, language models (LMs), which can perform […]
2022-02-28

Understanding Pandas and NumPy views vs copies to handle SettingWithCopyWarning

33 mins read Table of Contents: Prerequisites; Example of a SettingWithCopyWarning; Views and Copies in NumPy and Pandas; Understanding Views and Copies in […]
2022-02-27

Classical Time Series Forecasting Models in Python

11 mins read Machine learning methods can be used for time series classification and forecasting problems. Before exploring machine learning methods for time […]
2022-02-25

Autocorrelation and Partial Autocorrelation explained with Python code

10 mins read What is correlation? In statistics, correlation or dependence refers to any statistical association between two random variables or bivariate data, whether causal […]
2022-02-23

Understanding Attention Mechanism with example

14 mins read For decades, Statistical Machine Translation was the dominant translation model, until the birth of Neural Machine Translation (NMT). NMT is an […]
2022-02-22

Hyperparameter optimization techniques in machine learning with Python code

10 mins read In every Machine Learning project, it is possible and recommended to search the hyperparameter space to get the best performance […]
2022-02-20

Bayesian view of linear regression – Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP)

16 mins read Linear Regression is commonly the first machine learning problem that people study in this area. For […]
2022-02-17

Useful magic commands in Jupyter Notebook/Lab

30 mins read Jupyter Notebook/Lab is the go-to tool used by data scientists and developers worldwide to perform data analysis nowadays. It provides […]
2022-02-15

Different approaches for finding feature importance using Random Forests

16 mins read In many (business) cases it is important to have a model that is not only accurate but also interpretable. Oftentimes, […]
2022-02-14

Understanding GROUP BY, GROUPING SETS, ROLLUP, and CUBE in SQL

18 mins read GROUP BY: A table in a database has columns of information in it. Each column in a table represents an […]
2022-02-13

Handling skewness in features by applying transformation in Python

13 mins read In this tutorial, you will learn how to deal with your data when it does not follow a normal distribution. One […]
2022-02-09

Out of Bag (OOB) score in Random Forests with example

12 mins read Introduction: This post describes the intuition behind the Out of Bag (OOB) score in Random Forest, how it is calculated, […]
2022-02-08

Understanding the Random Forest algorithm and its hyperparameters

17 mins read In this post, we will see how the Random Forest algorithm works internally. To truly appreciate it, it might be […]