2022-02-18

A guide on PySpark Window Functions with Partition By

11 mins read PySpark window functions are useful when you want to examine relationships within groups of data rather than between groups of […]
2022-02-17

Setting up a multi-node Apache Spark Cluster on a local Windows machine with VirtualBox

6 mins read Prerequisite: Understand how to install Ubuntu inside Windows using Oracle VM VirtualBox from this link. Apache Spark is a fast and […]
2022-02-17

Useful magic commands in Jupyter Notebook/Lab

30 mins read Jupyter Notebook/Lab is the go-to tool used by data scientists and developers worldwide to perform data analysis nowadays. It provides […]
2022-02-15

Different approaches for finding feature importance using Random Forests

16 mins read In many (business) cases it is important to have a model that is not only accurate but also interpretable. Oftentimes, […]
2022-02-13

Common loss functions for training deep neural networks with Keras examples

30 mins read Deep neural networks are trained using the stochastic gradient descent optimization algorithm. As part of the optimization algorithm, the error for […]
2022-02-13

Handling skewness in features by applying transformation in Python

13 mins read In this tutorial, you will learn how to deal with your data when it does not follow a normal distribution. One […]
2022-02-09

Out of Bag (OOB) score in Random Forests with example

12 mins read Introduction: This post describes the intuition behind the Out of Bag (OOB) score in Random Forest, how it is calculated, […]
2022-02-08

Understanding the Random Forest algorithm and its hyperparameters

17 mins read In this post, we will see how the Random Forest algorithm works internally. To truly appreciate it, it might be […]
2022-02-07

Machine Learning From Scratch Series: K-means Clustering

22 mins read Introduction: Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of […]
2022-02-03

Feature selection for categorical data with Python code

17 mins read Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target […]
2022-02-03

Basic feature engineering tasks for numeric and categorical data with Python code

34 mins read Machine learning pipelines: Any intelligent system basically consists of an end-to-end pipeline starting from ingesting raw data and leveraging data […]
2022-01-25

Interpreting ACF and PACF Plots for AR and MA models

12 mins read Autocorrelation analysis is an important step in the Exploratory Data Analysis of time series forecasting. The autocorrelation analysis helps detect patterns […]
2022-01-25

Understanding Alternating Least Squares algorithm for implicit collaborative filtering recommendations

23 mins read Overview: We’re going to write a simple implementation of an implicit (more on that below) recommendation algorithm. We want to […]
2022-01-23

Implementing Attention Mechanism in Python

7 mins read The attention mechanism was introduced to improve the performance of the encoder-decoder model for machine translation. The idea behind the […]
2022-01-23

An illustrated guide to Attention Mechanism in Sequence Models with PyTorch code

22 mins read In this article, I will be covering the main concepts behind Attention, including the implementation of a sequence-to-sequence Attention model, […]
2022-01-23

Understanding Self-Attention in Transformers with example

10 mins read What do BERT, RoBERTa, ALBERT, SpanBERT, DistilBERT, SesameBERT, SemBERT, SciBERT, BioBERT, MobileBERT, TinyBERT and CamemBERT all have in common? And […]
2021-12-26

Walk-forward optimization for algorithmic trading strategies on cloud architecture

11 mins read Table of Contents: Introduction; Terminology; Walk-forward Optimization; Design of walk-forwards; The Architecture; Configuring cloud machines using Ansible; Docker Swarm Optimization […]
2021-12-04

Sampling from a multivariate Gaussian (Normal) distribution with Python code

3 mins read Steps: A widely used method for drawing (sampling) a random vector from the N-dimensional multivariate normal distribution with a given mean vector and covariance […]
2021-11-20

Understanding Expectation-Maximization (EM) algorithm with an example in Python

7 mins read Suppose we have some data sampled from two different groups, red and blue: Here, we can see which data point […]
2021-11-15

Using pre-commit and Makefile for Python code development workflow

5 mins read Introduction: When developing Python code, we are constantly adding and committing changes. However, nothing stops us from committing low-quality code, e.g. code […]