2021-06-27

Resampling time series in Pandas: resample and asfreq methods

23 mins read This article is an introductory dive into the technical aspects of resampling methods in pandas. 1. Resampling: Resampling is necessary […]
2021-06-26

Time series analysis with Pandas: Power consumption case study

24 mins read Originally developed for financial time series such as daily stock market prices, the robust and flexible data structures in pandas […]
2021-06-24

A complete guide on Pandas Hierarchical Indexing (MultiIndex)

31 mins read Pandas is the go-to library for data analysis when working with tabular datasets. It is the best solution available for […]
2021-06-24

Data selection (indexing and slicing) in Pandas MultiIndex DataFrames

6 mins read A MultiIndex (also known as a hierarchical index) DataFrame allows you to have multiple columns acting as a row identifier and multiple […]
2021-06-21

Data Science and Machine Learning Cheat Sheets

5 mins read Click on the links to get the high-resolution cheat sheets. Algebra, Linear Algebra, Calculus, Probability, Statistics, Python, R, Machine Learning […]
2021-05-26

5 steps to start becoming a Machine Learning Engineer

16 mins read Step 1: Adjusting Your Mindset Whenever I lead my workshops I always get a lot of questions afterward from developers […]
2021-05-11

Which Mean should we use? A guide on Arithmetic, Geometric, and Harmonic Means in Data Analysis

45 mins read Introduction It’s probably the most common data analytic task: You have a bunch of numbers. You want to summarize them […]
2021-04-28

Python Scipy sparse matrices explained

8 mins read What is a Sparse Matrix? Imagine you have a two-dimensional data set with 10 rows and 10 columns such that […]
2021-04-17

Understanding intuition behind Markov Chain Monte Carlo Methods (MCMC)

15 mins read For many of us, Bayesian statistics is voodoo magic at best or completely subjective nonsense at worst. Among the trademarks […]
2021-03-23

Review of important offline evaluation metrics for recommendation systems

28 mins read We are in an era of personalization. The user wants personalized content and businesses are capitalizing on the same. Recommendation […]
2021-03-12

Bayesian Linear Regression using PyMC3

8 mins read Introduction In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within […]
2021-03-02

ARIMA for time series forecasting in Python

11 mins read Making out-of-sample forecasts can be confusing when getting started with time series data. The statsmodels Python API provides functions for […]
2021-02-25

Identifying time series AR, MA, ARMA, or ARIMA Models using ACF and PACF plots

4 mins read In time series analysis, the Autocorrelation Function (ACF) and the partial autocorrelation function (PACF) plots are essential in providing the […]
2021-02-19

Pivot, Melt, Stack, and Unstack methods in Pandas

5 mins read Data does not come in a usable format by default; a data science professional has to spend 70–80% of their […]
2021-02-15

Recommended tools and environment setup for a Data Scientist

16 mins read Intro and motivation In this post, I would like to describe in detail our setup and development environment (hardware and […]
2020-12-18

How to determine epsilon and MinPts parameters of DBSCAN clustering

9 mins read Every data mining task has the problem of parameters. Every parameter influences the algorithm in specific ways. DBSCAN (Density-Based Spatial […]
2020-11-18

Basics of Convolutional Neural Networks (CNN) from Deep Learning specialization

8 mins read These notes are taken from the first two weeks of the Convolutional Neural Networks course (part of Deep Learning specialization) by Andrew Ng […]
2020-11-14

Machine Learning From Scratch Series: Linear Regression with Gradient Descent

10 mins read In the following sections, we are going to implement linear regression in a step-by-step fashion using just Python and NumPy. We will […]
2020-11-13

Machine Learning From Scratch Series: Logistic Regression

10 mins read In this article, we are going to implement the most commonly used Classification algorithm called Logistic Regression. First, we will […]
2020-11-09

Restricted Boltzmann Machines (RBMs) Simply Explained

16 mins read Table of Contents: Definition & Structure, Reconstructions, Probability Distributions, Code Sample: Stacked RBMs, Parameters & k, Continuous RBMs, Next Steps […]